This project will explore the application of Natural Language Processing (NLP) and Machine Learning (ML) tools to the domain of organizational behaviour, specifically to a study of group maintenance in a novel setting. Information scientists will work collaboratively with domain scientists to develop an innovative NLP- and ML-based research tool that supports qualitative social science research, in particular content analysis. Content analysis is an increasingly popular qualitative research technique for finding evidence of concepts of interest using text, rather than numbers, as raw data. The process of identifying and labelling significant features in text is referred to as “coding”, and the result of such an analysis is a text annotated with codes for the concepts exhibited.
In this project, the problem of coding qualitative data is conceptualized as an Information Extraction (IE) problem amenable to automation using NLP. However, rather than seeking to automate the process fully, the technologies will be used in a supporting role, keeping the human coder in the loop. ML will be used to induce rules from examples of codes, avoiding the need to develop rules manually. To reduce the amount of training data needed, an active learning process will be employed, in which a few hand-coded examples are used to create an initial model that is then refined through interaction with the user. These approaches will be combined in a prototype tool to support qualitative content analysis. As a demonstration and test of the tool, it will be applied to a study of group maintenance behaviour in cyber-infrastructure-supported distributed groups, specifically free/libre open source software development teams.
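The active learning process described above can be illustrated with a minimal sketch. This is not the project's tool or method: the word-count model, the uncertainty-based selection, the function names, and the example messages are all assumptions made for illustration. A small set of hand-coded examples trains an initial model, which then repeatedly asks the human coder to label the message it is least sure about.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled):
    """Learn a weight per word: +1 per occurrence in a coded example, -1 otherwise."""
    weights = Counter()
    for text, has_code in labeled:
        for word in tokenize(text):
            weights[word] += 1 if has_code else -1
    return weights

def score(weights, text):
    """Positive scores suggest the code applies; scores near zero are uncertain."""
    return sum(weights[w] for w in tokenize(text))

def most_uncertain(weights, pool):
    """Pick the uncoded text whose score is closest to zero."""
    return min(pool, key=lambda t: abs(score(weights, t)))

def active_learning(seed, unlabeled, ask_coder, rounds):
    """Retrain, query the human coder on the least certain text, and repeat."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(min(rounds, len(pool))):
        weights = train(labeled)
        query = most_uncertain(weights, pool)
        pool.remove(query)
        labeled.append((query, ask_coder(query)))  # human stays in the loop
    return train(labeled), labeled

# Invented example data for a hypothetical "appreciation" code.
seed = [("thanks for the great patch", True),
        ("the build fails on linux", False)]
unlabeled = ["thanks everyone",
             "build fails again",
             "great work on the release"]

# Stand-in for the human coder; a real tool would prompt the user instead.
def ask_coder(text):
    return "thanks" in text or "great" in text

weights, labeled = active_learning(seed, unlabeled, ask_coder, rounds=3)
```

In a real setting the simple word counts would be replaced by a trained classifier or induced extraction rules, and `ask_coder` would be an interactive prompt, but the loop structure (train, select an uncertain item, ask the coder, retrain) stays the same.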
For more information, please contact the PI, Nancy McCracken <njmccrac [at] syr [dot] edu>.