SoCQA: Socio-computational qualitative analysis

This project explored the application of Natural Language Processing (NLP) and Machine Learning (ML) tools to the domain of organizational behaviour, more specifically to a study of group maintenance in a novel setting. Information scientists worked collaboratively with domain scientists to develop an innovative NLP- and ML-based research tool to support qualitative social science research, specifically content analysis. Content analysis is an increasingly popular qualitative research technique for finding evidence of concepts of interest using text, rather than numbers, as raw data. The process of identifying and labelling significant features in text is referred to as “coding”, and the result of such an analysis is a text annotated with codes for the concepts exhibited. In this project, the problem of coding qualitative data was conceptualized as an Information Extraction (IE) problem amenable to automation using NLP. However, rather than seeking to fully automate the process, the technologies were used in a supporting role, keeping the human coder in the loop. ML was used to induce rules from examples of codes, avoiding the need to develop rules manually. To reduce the amount of training data needed, an active learning process was employed, in which a few hand-coded examples are used to create an initial model that can be further evolved through interaction with the user. These approaches were combined in a prototype tool to support qualitative content analysis. As a demonstration and test of the tool, it was applied to a study of group maintenance behaviour in cyber-infrastructure-supported distributed groups, specifically free/libre open source software development teams. For more information, please contact the PI, Nancy McCracken.
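
The active learning process described above can be illustrated with a minimal sketch. It is not the project's tool: the description does not specify the learning algorithm or the query strategy, so this example assumes a tiny naive Bayes classifier and uncertainty sampling (asking the human coder about the segment the model is least sure of). All names (`TinyCoder`, `active_learning`, `ask_human`) and the example segments are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TinyCoder:
    """Hypothetical binary model: does a text segment exhibit a given code?

    Assumption: word-count naive Bayes with add-one smoothing, chosen only
    to keep the sketch self-contained.
    """

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def learn(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def prob(self, text):
        """Estimated probability that the segment carries the code."""
        vocab_size = len(set(self.counts[True]) | set(self.counts[False])) or 1
        total_t = sum(self.counts[True].values())
        total_f = sum(self.counts[False].values())
        log_odds = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for tok in tokenize(text):
            p_t = (self.counts[True][tok] + 1) / (total_t + vocab_size)
            p_f = (self.counts[False][tok] + 1) / (total_f + vocab_size)
            log_odds += math.log(p_t / p_f)
        return 1 / (1 + math.exp(-log_odds))


def active_learning(seed, pool, ask_human, rounds):
    """Build an initial model from a few hand-coded examples, then refine it.

    seed:      list of (text, label) pairs coded by hand
    pool:      uncoded text segments
    ask_human: callable returning the human coder's label for a segment
    """
    model = TinyCoder()
    for text, label in seed:
        model.learn(text, label)
    pool = list(pool)
    for _ in range(min(rounds, len(pool))):
        # Uncertainty sampling: query the segment whose probability is closest to 0.5.
        query = min(pool, key=lambda t: abs(model.prob(t) - 0.5))
        pool.remove(query)
        model.learn(query, ask_human(query))
    return model
```

In use, `ask_human` would prompt the coder through the tool's interface; here a stand-in function plays that role:

```python
seed = [("thanks for the great patch", True), ("the build fails on linux", False)]
pool = ["thanks a lot", "build fails again", "great work everyone", "segfault in parser"]
model = active_learning(seed, pool, lambda t: "thanks" in t or "great" in t, rounds=4)
```

The human coder stays in the loop throughout: the model only proposes which segment to look at next, and every label it learns from comes from the coder.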