SoCQA: Socio-computational qualitative analysis

This project will explore the application of Natural Language Processing (NLP) and Machine Learning (ML) tools to the domain of organizational behaviour, more specifically to a study of group maintenance in a novel setting. The project involves information scientists working collaboratively with domain scientists, with the goal of developing an innovative NLP- and ML-based research tool to support qualitative social science research, specifically content analysis. Content analysis is an increasingly popular qualitative research technique for finding evidence of concepts of interest, using text rather than numbers as raw data. The process of identifying and labelling significant features in text is referred to as “coding”, and the result of such an analysis is a text annotated with codes for the concepts exhibited.

In this project, the problem of coding qualitative data is conceptualized as an Information Extraction (IE) problem amenable to automation using NLP. However, rather than seeking to automate the process fully, the technologies will be used in a supporting role, keeping the human coder in the loop. ML will be used to induce rules from examples of codes, avoiding the need to develop rules manually. To reduce the amount of training data needed, an active learning process will be employed, in which a few hand-coded examples are used to create an initial model that can be further evolved through interaction with the user. These approaches will be combined in a prototype tool to support qualitative content analysis. As a demonstration and test of the tool, it will be applied to a study of group maintenance behaviour in cyber-infrastructure-supported distributed groups, specifically free/libre open source software development teams.
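The active learning process described above can be illustrated with a minimal sketch. This is not the SoCQA implementation: the tiny Naive Bayes classifier, the uncertainty-sampling strategy, the example messages, and every name here (`TinyNaiveBayes`, `active_learn`, `ask_coder`) are assumptions chosen only to show the idea of starting from a few hand-coded examples and letting a human coder confirm or correct the model's suggestions.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TinyNaiveBayes:
    """Binary multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_positive(self, text):
        """Estimated probability that `text` exhibits the code."""
        log_odds = math.log((self.docs[1] + 1) / (self.docs[0] + 1))
        v = len(self.vocab) + 1
        totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        for tok in tokenize(text):
            p1 = (self.counts[1][tok] + 1) / (totals[1] + v)
            p0 = (self.counts[0][tok] + 1) / (totals[0] + v)
            log_odds += math.log(p1 / p0)
        log_odds = max(-30.0, min(30.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))


def active_learn(seed_texts, seed_labels, pool, ask_coder, rounds=3):
    """Grow the training set by querying the human coder on uncertain items."""
    texts, labels, pool = list(seed_texts), list(seed_labels), list(pool)
    model = TinyNaiveBayes().fit(texts, labels)
    for _ in range(rounds):
        if not pool:
            break
        # Uncertainty sampling: pick the item whose probability is nearest 0.5.
        query = min(pool, key=lambda t: abs(model.prob_positive(t) - 0.5))
        pool.remove(query)
        suggested = int(model.prob_positive(query) >= 0.5)
        # The human coder sees the suggestion and confirms or corrects it.
        texts.append(query)
        labels.append(ask_coder(query, suggested))
        model = TinyNaiveBayes().fit(texts, labels)
    return model, texts, labels


# Hypothetical data: 1 = message exhibits the code, 0 = it does not.
seed_texts = ["thanks for the great work", "the build fails on linux"]
seed_labels = [1, 0]
gold = {
    "thanks a lot": 1,
    "patch fails to compile": 0,
    "great patch thanks": 1,
    "compile error on linux": 0,
}
# Stand-in for the human coder: ignores the suggestion and returns the gold label.
model, texts, labels = active_learn(
    seed_texts, seed_labels, list(gold), lambda text, suggested: gold[text]
)
```

In a real tool the `ask_coder` callback would be an interface where the coder accepts or corrects the suggested code; here it is replaced by a lookup so the sketch runs on its own.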

For more information, please contact the PI, Nancy McCracken <njmccrac [at] syr [dot] edu>.

Design of an Active Learning System with Human Correction for Content Analysis

Yan, J. L. S., McCracken, N., & Crowston, K. (2014). Design of an Active Learning System with Human Correction for Content Analysis. Workshop on Interactive Language Learning, Visualization, and Interfaces, 52nd Annual Meeting of the Association for Computational Linguistics.

Academic Year REU Intern Positions have been filled.

REU Research Intern Positions Available Fall 2013 and Spring 2014

Undergraduate Research Intern positions are available on campus for the academic year (fall 2013 and spring 2014). These positions are funded under the NSF Research Experiences for Undergraduates (REU) program and provide an $8,000 stipend paid over the academic year. Undergraduate students from information science, the social sciences and computer science who are interested in participating in an interdisciplinary research team are encouraged to apply by September 9, 2013.

System Development Update

We have recently completed most of the functionality of the SoCQA tool. This includes ingesting documents as annotated in Atlas-ti, learning a model from that data and reporting the model's performance, applying the model to additional data, and allowing the user to verify whether the model's predictions are correct.
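The update does not say which performance measures the tool reports. As an assumption, the sketch below computes precision, recall and F1 for a single code by comparing human annotations with model predictions; the function name and the example labels are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compare human codes (gold) with model predictions for one code.

    Both arguments are equal-length sequences of 0/1 values,
    where 1 means the code was applied to that text segment.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # correctly applied
    fp = sum(1 for g, p in pairs if not g and p)      # applied but not in gold
    fn = sum(1 for g, p in pairs if g and not p)      # missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels for five text segments.
human_codes = [1, 1, 0, 0, 1]
model_codes = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(human_codes, model_codes)
```

The same comparison could be applied to the user's verification decisions, treating each confirmed prediction as a true positive and each rejected one as a false positive.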

System design and development update

We're near the end of the first year of the grant and system development is progressing well, albeit a bit slower than we'd hoped. The high-level system design is set and we've been implementing functionality in a series of sprints. By the end of the current sprint, we should have a basic system in place, allowing us to import email messages, import human annotations of some of the data from an Atlas-ti file, learn a model and apply the model to additional data. The final piece will be an interface to allow a human coder to correct the machine-applied annotations.

Tutorial introduction to content analysis

Here are the slides for the tutorial I gave on content analysis for the PI meeting for the NSF Socio-computational Systems (SoCS) program.
