Completed Projects


    The goal of this research is to generate an empirically based understanding of the ramifications of adding spoken language capabilities to text-based dialogue tutors, and to understand how these ramifications might differ between human-human and human-computer spoken interactions. This research will explore the relative effectiveness of speech versus text-based tutoring in the context of ITSPOKE, a speech-based dialogue system that uses a text-based system for tutoring conceptual physics (VanLehn et al., 2002) as its "back-end." The results of this work will demonstrate whether spoken dialogues yield increased performance compared to text with respect to a variety of evaluation measures, whether the same or different student and tutor behaviors correlate with learning gains in speech and text, and how such findings generalize both across and within human and computer tutoring conditions. These results will impact the development of future dialogue tutoring systems incorporating speech, by highlighting the performance gains that can be expected and the requirements for achieving such gains.


    The focus of our proposed work is to provide an infrastructure that will allow learning researchers to study dialogue in new ways and educational technology researchers to quickly build dialogue-based help systems for their tutoring systems. Most tutorial dialogue systems that have undergone successful evaluations to date (CIRCSIM, AutoTutor, WHY-Atlas, the Geometry Explanation Tutor) represent development efforts of many man-years. These systems were instrumental in pushing the technology forward and in proving that tutorial dialogue systems are feasible and useful in realistic educational contexts, although not always provably better on a pedagogical level than the more challenging alternatives to which they have been compared. We are now entering a new phase in which we as a research community must not only continue to improve the effectiveness of basic tutorial dialogue technology, but also provide tools that support both investigating the effective use of dialogue as a learning intervention and the application of tutorial dialogue systems by those who are not dialogue system researchers. We propose to develop a community resource that addresses all three of these problems on a grand scale, building upon our prior work developing both basic dialogue technology and tools for rapid development of running dialogue systems. This grant is led by Pamela Jordan at the University of Pittsburgh and Carolyn Rose at Carnegie Mellon University.


    Most existing tutoring systems respond based only on the correctness of student answers. Although the tutoring community has shown that incorrectness and uncertainty both represent learning impasses (and thus opportunities to learn), and has also shown correlations between uncertainty and learning, to date there have been very few controlled experiments investigating whether system responses to student uncertainty improve learning. We thus propose a small controlled study to test this hypothesis under "ideal" system conditions. The study uses a Wizard of Oz (WOZ) version of a qualitative physics spoken dialogue tutoring system, where the human Wizard performs speech recognition, natural language understanding, and uncertainty recognition for each student answer. In the experimental condition, the Wizard then tells the system that correct but uncertain answers are incorrect, causing the system to respond to both uncertain and incorrect student answers in the same way, namely with further dialogue, thereby reinforcing the student's understanding of the principle(s) under discussion. In the first control condition, the system responds in this way only to incorrect student answers. In the second control condition, the system responds in this way to a percentage of correct answers, to control for the additional tutoring in the experimental condition.


    This research investigates the feasibility and utility of monitoring student emotions in spoken dialogue tutorial systems. While human tutors respond to both the content of student utterances and underlying perceived emotions, most tutorial dialogue systems cannot detect student emotions, and furthermore are text-based, which may limit their success at emotion prediction. While there has been increasing interest in identifying problematic emotions (e.g. frustration, anger) in spoken dialogue applications such as call centers, little work has addressed the tutorial domain. The PIs are investigating the use of lexical, syntactic, dialogue, prosodic and acoustic cues to enable a computer tutor to automatically predict and respond to student emotions. The research is being performed in the context of ITSPOKE, a speech-based tutoring dialogue system for conceptual physics. The PIs are recording students interacting with ITSPOKE, manually annotating student emotions in these as well as in human-human dialogues, identifying linguistic and paralinguistic cues to the annotations, and using machine learning to predict emotions from potential cues. The PIs are then deriving strategies for adapting the system's tutoring based upon emotion identification. The major scientific contribution will be an understanding of whether cues available to spoken dialogue systems can be used to predict emotion, and ultimately to improve tutoring performance. The results will be of value to other applications that can benefit from monitoring emotional speech. Progress towards closing the performance gap between human tutors and current machine tutors will also expand the usefulness of current computer tutors. This grant is in collaboration with Julia Hirschberg and her group at Columbia University.
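
    To make the machine learning step concrete, the sketch below shows one way such a predictor could be trained from per-turn cues; the feature set, data, and labels are invented for illustration and are not the actual ITSPOKE features or annotations.

```python
# Minimal sketch (not the actual ITSPOKE feature set): predicting an emotion
# label for each student turn from a handful of illustrative prosodic/lexical
# features, using an off-the-shelf classifier and cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Each row is one student turn: [mean pitch (Hz), pitch range, RMS energy,
# turn duration (s), prior-turn correctness (0/1), hedge-word count]
X = np.array([
    [210.0, 80.0, 0.12, 2.1, 1, 0],
    [180.0, 30.0, 0.05, 4.8, 0, 2],
    [225.0, 95.0, 0.15, 1.7, 1, 0],
    [175.0, 25.0, 0.04, 5.2, 0, 3],
])
# Hand-annotated emotion labels for the same turns (here: neutral vs. uncertain)
y = np.array(["neutral", "uncertain", "neutral", "uncertain"])

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=2)   # toy data, so tiny folds
print("cross-validated accuracy:", scores.mean())
```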


    It is widely acknowledged, both in academic studies and the marketplace, that the most effective form of education is the professional human tutor. A major difference between human tutors and computer tutors is that only human tutors understand unconstrained natural language input. Recently, a few tutoring systems have been developed that carry on a natural language (NL) dialogue with students. Our research problem is to find ways to make NL-based tutoring systems more effective. Our basic approach is to derive new dialogue strategies from studies of human tutorial dialogues, incorporate them in an NL-based tutoring system, and determine if they make the tutoring system more effective. For instance, some studies are determining if learning increases when human tutors are constrained to follow certain strategies. In order to incorporate the new dialogue strategies into our existing text and spoken NL-based tutoring systems, two completely new modules are being developed. One new module will interpret student utterances using a large directed graph of propositions called an explanation network, which is halfway between the shallow and deep representations of knowledge that are currently used. The second new module uses machine learning to improve the selection of dialogue management strategies. The research is thus a multidisciplinary effort whose intellectual merit lies in new results in the cognitive psychology of human tutoring, in the technology of NL processing, and in the design of effective tutoring systems. Improved NL-based tutoring systems could have a broad impact on education and society. This grant is in collaboration with Kurt VanLehn, Michelene Chi, and Pamela Jordan at the Learning Research and Development Center, University of Pittsburgh, and with Carolyn Rose (now at CMU).
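
    To make the explanation network idea concrete, the sketch below represents propositions as nodes in a small directed graph and checks whether one proposition supports another through a chain of links; the propositions and representation are illustrative assumptions, not the module's actual design.

```python
# Minimal sketch of an explanation network as a directed graph of propositions,
# using a plain adjacency-list representation; the propositions and links
# below are invented, not taken from the project's actual network.
from collections import defaultdict

class ExplanationNetwork:
    def __init__(self):
        self.edges = defaultdict(list)   # proposition -> propositions it supports

    def add_link(self, premise, conclusion):
        self.edges[premise].append(conclusion)

    def supports(self, premise, conclusion):
        """True if a chain of propositions leads from premise to conclusion."""
        stack, seen = [premise], set()
        while stack:
            node = stack.pop()
            if node == conclusion:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return False

net = ExplanationNetwork()
net.add_link("net force is zero", "acceleration is zero")
net.add_link("acceleration is zero", "velocity is constant")
print(net.supports("net force is zero", "velocity is constant"))   # True
```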


    Research on the factors that make one-on-one tutoring a very effective mode of instruction has converged on an important finding: that the critical term in "tutorial interaction" is "interaction." That is, what the tutor says or does during tutoring, and what the student says or does are less important than the dynamic, coordinated interplay between their dialogue turns. It is now important to identify the discourse mechanisms that drive highly interactive human tutoring, so that these mechanisms can be simulated by natural-language dialogue engines in intelligent tutoring systems (ITSs). In the first stage of this project, we will analyze a corpus of naturalistic tutorial dialogues to accomplish this goal. Specifically, we will identify the mechanisms that achieve cohesion in tutorial dialogues, since highly interactive tutorial dialogue is intrinsically highly cohesive. In the second stage, we will run a series of controlled studies to test the hypothesis that more highly cohesive tutorial dialogue is more effective for promoting learning than less cohesive dialogue, and to assess the effectiveness of a few selected mechanisms of cohesion. Finally, in the third stage of the project, we will explore the extent to which database tools developed by the computational linguistics community (e.g., WordNet and FrameNet) can automatically tag cohesion in tutorial dialogue. We will also extend these tools and develop algorithms that will allow them to be used to automatically generate cohesive tutor turns for a small sample of student turns, as a first step towards developing a natural-language dialogue engine that can use these tools to generate highly cohesive tutorial dialogue. This grant is in collaboration with Sandra Katz.
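
    As an illustration of the third stage, the sketch below uses NLTK's WordNet interface to flag simple lexical cohesive ties (repetition, shared synsets, hypernymy) between adjacent turns; the word lists and tie labels are illustrative rather than the project's actual tagging scheme.

```python
# Minimal sketch, assuming NLTK's WordNet interface: flag lexical cohesive ties
# between a student turn and the following tutor turn. Turn contents are invented.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def cohesive_ties(turn_a, turn_b):
    ties = []
    for w1 in turn_a:
        for w2 in turn_b:
            if w1 == w2:
                ties.append((w1, w2, "repetition"))
                continue
            syns1, syns2 = set(wn.synsets(w1)), set(wn.synsets(w2))
            if syns1 & syns2:
                ties.append((w1, w2, "shared sense"))
            elif any(h in syns2 for s in syns1 for h in s.hypernyms()):
                ties.append((w1, w2, "hypernymy"))
    return ties

student = ["force", "pushes", "car"]
tutor = ["force", "vehicle", "accelerates"]
print(cohesive_ties(student, tutor))
```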


    SWoRD is a web-based system to support peer reviewing in a wide variety of disciplinary classroom settings. One result of prior research with SWoRD is an enormous database of written materials that are ripe for analysis and exploitation in support of research on natural language processing (NLP), intelligent tutoring systems (ITS), cognitive science, educational data mining, and improving learning from peer review. In this project we will both analyze existing SWoRD-generated data, and develop an improved version of SWoRD for use in further experimentation. In particular, we will explore using SWoRD to teach substantive skills in domains involving ill-defined problems, and will explore techniques for automatically identifying key concepts and flagging issue understanding. Second, given a SWoRD toolkit of what can be accomplished robustly with peer interactions, we will explore the use of natural language processing to automatically support and improve those interactions. Finally, we will develop a new version of the SWoRD program that incorporates improved features and control facilities, and that incorporates Artificial Intelligence techniques to improve learning in a variety of ways. This is an internal LRDC grant, and is in collaboration with Christian Schunn and Kevin Ashley.


    This research investigates whether responding to student uncertainty over and above correctness improves learning during computer tutoring. The investigation is performed in the context of a spoken dialogue tutoring system, where student speech provides many linguistic cues (e.g. intonation, pausing, word usage) that computational linguistics research suggests can be used to detect uncertainty. Intelligent tutoring systems research suggests that uncertainty is part of the learning process, and has hypothesized that to increase system effectiveness, it is critical to respond to more than correctness. However, most existing tutoring systems respond only to student correctness, and few controlled experiments have yet investigated whether also responding to uncertainty can improve learning. This research designs and implements two different enhancements to the spoken dialogue tutoring system, to test two hypotheses in the tutoring literature concerning how tutors can effectively respond to uncertainty over and above correctness. The first hypothesis is that student uncertainty and incorrectness both represent learning impasses, i.e., opportunities to improve understanding. This hypothesis is addressed with an enhanced system version that treats uncertainty in the same way that incorrectness is currently treated (i.e., with additional subdialogue to increase understanding). The second hypothesis is that more optimal responses can be developed by modeling how human tutor responses to correctness change when the student is uncertain. This hypothesis is addressed by analyzing human tutor dialogue act responses (i.e. content and presentation) to student uncertainty over and above correctness in an existing tutoring corpus, then implementing these responses in a second enhanced system version. Two controlled experiments are then performed. The first tests the relative impact of the two adaptations on learning using a Wizard of Oz version of the system, with a human (Wizard) detecting uncertainty and performing speech recognition and language understanding. The second experiment tests the impact of the best-performing adaptation from the first experiment in the context of the real system, with the system processing the speech and language and detecting uncertainty in a fully automated manner. The major intellectual contribution of the research is to demonstrate whether significant improvements in learning are achieved by adapting to student uncertainty over and above correctness during tutoring, to advance the state of the art by fully automating and evaluating user uncertainty detection and adaptation in a working spoken dialogue system, and to investigate any different effects of this adaptation under ideal versus actual system conditions. This NSF grant is in collaboration with Kate Forbes-Riley.
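
    A minimal sketch of the first hypothesized adaptation appears below: correct-but-uncertain answers are routed to the same remediation subdialogue as incorrect answers. The function and label names are placeholders, not the system's actual dialogue acts.

```python
# Minimal sketch of the impasse-based adaptation described above: uncertain
# answers are handled like incorrect ones (remediation subdialogue), while
# certain+correct answers get simple positive feedback.
def choose_response(correct: bool, uncertain: bool) -> str:
    if not correct:
        return "remediation_subdialogue"   # standard response to incorrectness
    if uncertain:
        return "remediation_subdialogue"   # enhanced system: treat uncertainty as an impasse
    return "positive_feedback"

for c, u in [(True, False), (True, True), (False, False)]:
    print((c, u), "->", choose_response(c, u))
```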


    Recent studies show that U.S. students lag behind students in other developed countries in math and science. Because one-on-one tutoring has been shown to be a highly effective form of instruction, many educators and education policy makers have looked to intelligent tutoring systems (ITSs) as a means of providing cost-effective, individualized instruction to students that can improve their conceptual understanding of, and problem-solving skills in, math and science. However, even though many ITSs have been shown to be effective, they are still not as effective as human tutors.

    The goal of this Cognition and Student Learning development project is to take a step towards meeting President Obama's challenge to produce "learning software as effective as a personal tutor." We will do this by building an enhanced version of a natural-language dialogue system that engages students in deep-reasoning, reflective dialogues after they solve quantitative problems in Andes, an intelligent tutoring system for physics. Improvements to this system will focus on addressing a key limitation of natural-language (NL) tutoring systems: although these systems are "interactive" in the sense that they try to elicit explanations from students instead of lecturing to them, automated tutors do not align their dialogue turns with those of the student to the same degree, and in the same ways, that human tutors do. In particular, automated tutors often fail to reuse parts of the student's dialogue turns in their own turns, to adjust the level of abstraction that the student is working from when the student is over-generalizing or missing important distinctions between concepts, and to abstract or specialize correct student input when doing so might enhance the student's understanding. Empirical research shows that these forms of lexical and semantic alignment in human tutoring predict learning. The main outcome of this development effort will be a fully working, prototype reflective dialogue version of Andes that can carry out these functions and serve as a research platform for a future study that compares the effectiveness of the enhanced NL tutoring system with the current system, which lacks these alignment capabilities--thereby allowing us to test the hypothesis that it is not interaction per se that explains the effectiveness of human tutoring, but how it is carried out.

    The enhanced version of this reflective dialogue system will be developed through an iterative process of preparing a prototype for experienced physics teachers and students to try out using the "Wizard of Oz" paradigm, identifying cases in which the system does not work as intended (e.g., the tutor prompts the student to generalize or make distinctions when this is not warranted by the discourse context), refining the software to correct these problems, and testing the revised software in a subsequent field trial. The subject pool for these trials will be students enrolled in a first-year physics course at the University of Pittsburgh and high school students taking physics in Pittsburgh urban and suburban schools. During the third (final) year of the project, we will collect pilot data that addresses the feasibility of implementing the system in authentic high school physics classes, and the promise of the system to increase students' conceptual understanding of physics and ability to solve physics problems. The latter will be determined by comparing students' pre- and post-test performance on measures of conceptual understanding and problem-solving ability in physics, and by comparing the performance of students who use the current and enhanced version of the system on these measures. This IES grant is in collaboration with Sandra Katz, Pamela Jordan, and Michael Ford.


    From the instructor's viewpoint, a class writing assignment is a black box. Until instructors actually read the first or final drafts, they do not have much information about how well the assignment has succeeded as a pedagogical activity, and even then, it is hard to get a complete picture. Computer-supported peer review systems such as SWoRD, a scaffolded peer review system that helps students write higher-quality compositions in classroom assignments, can help in this regard. The goal of this project is to develop and evaluate methods to provide instructors with a comprehensive overview of the progress of a class writing assignment in terms of how well students understand the issues, based on structured reviewing rubrics, the feedback students provide and receive in the peer review process, and machine learning and computational linguistics analysis of the resulting texts. The SWoRD-based peer-review system will present the instructor's overview via a kind of "Teacher-side Dashboard" that will summarize salient information for the class as a whole, cluster students based on common features of their texts, and enable instructors to delve into particular students' writings more effectively in a guided manner. This is an internal LRDC grant, and is in collaboration with Kevin Ashley, Christian Schunn and Jingtao Wang.
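
    As one illustration of the clustering component of such a dashboard, the sketch below groups student texts by TF-IDF similarity; the essays are invented, and the actual SWoRD analyses may use different features and algorithms.

```python
# Minimal sketch, with made-up student texts: cluster essays by TF-IDF
# similarity so a dashboard can group students with similar writing for the
# instructor to review together.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

essays = {
    "student_01": "The author argues that the evidence for the policy is weak.",
    "student_02": "Evidence in the article does not support the author's policy claim.",
    "student_03": "I liked the story because the characters were interesting.",
    "student_04": "The characters in the story felt realistic and interesting to me.",
}

X = TfidfVectorizer(stop_words="english").fit_transform(essays.values())
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for name, label in zip(essays, labels):
    print(label, name)
```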


    There has been increasing interest in affective dialogue systems, motivated by the belief that in human-human dialogues, participants seem to be (at least to some degree) detecting and responding to the emotions, attitudes and metacognitive states of other participants. The goal of the proposed research is to improve the state of the art in affective spoken dialogue systems along three dimensions, by drawing on the results of prior research in the wider spoken dialogue and affective system communities. First, prior research has shown that not all users interact with a system in the same way; the proposed research hypothesizes that employing different affect adaptations for users with different domain aptitude levels will yield further performance improvement in affective spoken dialogue systems. Second, prior research has shown that users display a range of affective states and attitudes while interacting with a system; the proposed research hypothesizes that adapting to multiple user states will yield further performance improvement in affective spoken dialogue systems. Third, while prior research has shown preliminary performance gains for affect adaptation in semi-automated dialogue systems, similar gains have not yet been realized in fully automated systems. The proposed research will use state-of-the-art empirical methods to build fully automated affect detectors. It is hypothesized that both fully and semi-automated versions of a dialogue system that either adapt to affect differently depending on user class, or that adapt to multiple user affective states, can improve performance compared to non-adaptive counterparts, with semi-automation generating the most improvement. The three hypotheses will be investigated in the context of an existing spoken dialogue tutoring system that adapts to the user state of uncertainty. The task domain is conceptual physics typically covered in a first-year physics course (e.g., Newton's Laws, gravity). To investigate the first hypothesis, a first enhanced system version will be developed; it will use the existing uncertainty adaptation for lower aptitude users with respect to domain knowledge, and a new uncertainty adaptation will be developed and implemented to be employed for higher aptitude users. To investigate the second hypothesis, a second enhanced system version will be developed; it will use the existing uncertainty adaptation for all turns displaying uncertainty, and a new disengagement adaptation will be developed and implemented to be employed for all student turns displaying disengagement, a second user state. A controlled experiment with the two enhanced systems will then be conducted in a Wizard-of-Oz (WOZ) setup, with a human Wizard detecting affect and performing speech recognition and language understanding. To investigate the third hypothesis, a second controlled experiment will be conducted, which replaces the WOZ system versions with fully-automated systems.
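
    The two enhancements can be pictured as a small adaptation policy keyed by the user's aptitude class and detected state, as in the sketch below; the adaptation names are placeholders rather than the system's actual strategies.

```python
# Minimal sketch of the two enhancements as a policy table: the response to an
# affect-bearing student turn depends on the user's aptitude class and on which
# state (uncertainty vs. disengagement) was detected. Names are illustrative.
POLICY = {
    ("low",  "uncertain"):  "existing_uncertainty_adaptation",
    ("high", "uncertain"):  "new_uncertainty_adaptation",
    ("low",  "disengaged"): "disengagement_adaptation",
    ("high", "disengaged"): "disengagement_adaptation",
}

def adapt(aptitude: str, state: str) -> str:
    return POLICY.get((aptitude, state), "no_adaptation")

print(adapt("high", "uncertain"))   # -> new_uncertainty_adaptation
print(adapt("low", "neutral"))      # -> no_adaptation
```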

    The major intellectual contribution of this research will be to demonstrate whether significant performance gains can be achieved in both partially and fully-automated affective spoken dialogue tutoring systems 1) by adapting to user uncertainty based on user aptitude levels, and 2) by adapting to multiple user states hypothesized to be of primary importance within the tutoring domain, namely uncertainty and disengagement. The research project will thus advance the state of the art in both spoken dialogue and computer tutoring technologies, while at the same time demonstrating any differing effects of affect-adaptive systems under ideal versus realistic conditions. More broadly, the research and resulting technology will lead to more natural and effective spoken dialogue-based systems, both for tutoring as well as for more traditional information-seeking domains. In addition, improving the performance of computer tutors will expand their usefulness and thus have substantial benefits for education and society. This NSF grant is in collaboration with Kate Forbes-Riley.


    Peer assessments provided by students are widely used in massive open online courses (MOOCs) due to the difficulty of fully automating assessment for many types of assignments. However, the use of student assessment provides an overwhelming amount of textual information for instructors to process. The proposed research will develop Natural Language Processing methods to support search and large-scale analytics of student assessment comments in MOOCs. My liaison for this Google Faculty Research Award is Daniel Russell.
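
    A minimal sketch of the kind of comment search this could support appears below, using TF-IDF retrieval over a few invented peer comments; the project's actual methods may differ.

```python
# Minimal sketch: index peer-assessment comments with TF-IDF and return the
# ones most similar to an instructor's query, as a simple stand-in for the
# proposed search/analytics support. The comments are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

comments = [
    "The proof skips the induction step entirely.",
    "Great visuals, but the conclusion restates the introduction.",
    "Your induction argument never establishes the base case.",
    "Consider citing the lecture on dynamic programming here.",
]

vec = TfidfVectorizer(stop_words="english")
index = vec.fit_transform(comments)

def search(query, k=2):
    sims = cosine_similarity(vec.transform([query]), index)[0]
    return sorted(zip(sims, comments), reverse=True)[:k]

for score, text in search("missing induction base case"):
    print(round(float(score), 2), text)
```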


    Assessing analytic writing in response to text (RTA) is a means for understanding students’ analytic writing ability as well as measures of effective teaching. Current writing assessments typically examine “content-free” writing (i.e., writing in response to open-ended prompts divorced from text), although prior work demonstrates that it is also possible to administer and rate student writing in response to text. However, there is a significant barrier to scoring such writing at scale, as scoring is labor intensive and requires extensive training and expertise on the part of raters to obtain reliable scores.

    Recent advances in artificial intelligence offer a promising way forward for scoring students’ analytic writing at scale. Natural language processing (NLP) experts have been working for decades on producing ways to reliably score student writing holistically. The state of the art in automated essay scoring (AES) indicates that such systems can produce scores as reliable as human ratings, in the sense that they can be trained to score similarly to humans on holistic measures of writing, especially for short, timed student responses. To move the field forward, however, there is a need for writing assessments that are aligned with authentic writing tasks. Second, there is a need to explore whether AES algorithms can reliably score across multiple dimensions of student writing. Our assessment includes five dimensions (analysis, evidence, organization, style/vocabulary, and mechanics/usage/grammar/syntax), and it will be important to see if AES designs can rate substantive dimensions such as analysis and evidence as well as they can rate more surface and structural dimensions of writing. This LRDC internal grant is in collaboration with Rip Correnti and Lindsay Clare Matsumura.
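
    One simple way to frame multi-dimensional AES is to train a separate model per rubric dimension, as in the sketch below; the features, essays, and scores are invented, and a real system would use far richer representations and data.

```python
# Minimal sketch of scoring multiple rubric dimensions: one regression model
# per dimension, trained on shallow text features. Everything here is toy data.
import numpy as np
from sklearn.linear_model import Ridge

def features(text):
    words = text.split()
    return [len(words), len(set(words)), text.count("because")]

essays = [
    "The author claims schools need art because it builds focus and because data show gains.",
    "Art is nice. I like it.",
    "Evidence from the study shows test scores rose, so the program worked because funding grew.",
    "School is fun and art is fun too and that is all.",
]
X = np.array([features(e) for e in essays])
# One human score (1-4) per essay for each rubric dimension
scores = {"analysis": [4, 1, 3, 1], "evidence": [3, 1, 4, 1]}

models = {dim: Ridge(alpha=1.0).fit(X, y) for dim, y in scores.items()}
new_essay = "The author gives evidence because the survey results support her claim."
print({dim: round(float(m.predict([features(new_essay)])[0]), 2)
       for dim, m in models.items()})
```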


    This study will adapt and apply existing Artificial Intelligence techniques from Natural Language Processing and Machine Learning to automatically scaffold the peer reviewing and revising-from-peer-review process. Following an iterative development plan, increasingly complex and refined versions of the system will be deployed, with heavy testing in October and March of each year. Researchers will undertake three different but partially integrated interventions: automatic detection of effective review comment features, automatic detection of thesis statements and related comments, and facilitating author revision by organizing review comments and author response planning. The pilot experiment will take place in a high school setting during the last six months of the grant.
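
    As an illustration of the first intervention, the sketch below flags two comment features often associated with helpful feedback (localizing the problem and offering a solution); the cue lists are illustrative assumptions, not the project's trained detectors.

```python
# Minimal sketch: flag whether a peer-review comment localizes the problem and
# whether it offers a solution, using simple illustrative keyword cues.
import re

LOCALIZATION_CUES = re.compile(
    r"\b(paragraph|page|line|section|sentence|intro|conclusion)\b", re.I)
SOLUTION_CUES = re.compile(
    r"\b(you (could|should|might)|consider|try|add|replace|reword)\b", re.I)

def comment_features(comment: str) -> dict:
    return {
        "localized": bool(LOCALIZATION_CUES.search(comment)),
        "offers_solution": bool(SOLUTION_CUES.search(comment)),
    }

print(comment_features("The second paragraph is confusing; consider adding a topic sentence."))
print(comment_features("Nice job overall!"))
```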

    Iterative development will be conducted in four classroom environments: high school science, high school English/social studies, a university physics lab, and a university psychology class. The comparison group will include students using the same web-based peer review system as the treatment group, but with all the intelligent scaffolding interventions disabled. This IES grant is in collaboration with Kevin Ashley, Amanda Godley and Christian Schunn.


    The degree and quality of interaction between students and instructors are critical factors for students' engagement, retention, and learning outcomes across domains. This is especially true for introductory STEM courses at the undergraduate level, since these courses are generally taught in lecture halls due to the large number of students enrolled. Recent developments in educational technology, such as MOOCs, together with financial pressures on universities, make it safe to predict that the class-size problem will only get worse in both traditional face-to-face and online classes. So, how can we modify the passive nature of lectures and increase interaction, while actively involving both students and instructors in the learning process, under these circumstances?

    In order to address this problem, we propose integrating Natural Language Processing (NLP) with a mobile application that prompts students to reflect and provides immediate and continuous feedback to instructors about the difficulties that their students encounter. By enhancing the student reflection and instructor feedback cycle with technological tools, this project will incorporate three lines of research: 1) the role of students' reflections and instructors' feedback in students' retention and learning outcomes, 2) the effectiveness and reliability of NLP for summarizing written responses in a meaningful way, and 3) the value and design of mobile technologies to improve retention and learning in STEM domains. This LRDC internal grant is in collaboration with Muhsin Menekse and Jingtao Wang.
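
    A minimal sketch of the NLP summarization idea appears below: rank reflections by how representative they are of the class and surface the top few for the instructor. The reflections are invented, and the project's actual summarization methods may differ.

```python
# Minimal sketch: rank each open-ended reflection by its average TF-IDF cosine
# similarity to all others and report the most typical ones as a short summary.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reflections = [
    "I am still confused about how to apply Gauss's law.",
    "Gauss's law examples went too fast for me.",
    "The clicker questions on circuits really helped.",
    "I liked working through the circuit problems in groups.",
    "Still lost on Gauss's law and flux.",
]

X = TfidfVectorizer(stop_words="english").fit_transform(reflections)
centrality = cosine_similarity(X).mean(axis=1)   # how typical each response is
ranked = sorted(zip(centrality, reflections), reverse=True)
for score, text in ranked[:2]:                   # a two-line "summary" for the instructor
    print(round(float(score), 2), text)
```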


    This project evaluates the viability of revision as a pedagogical technique by determining whether student interactions with an NLP-based revision assistant enable them to learn to write better -- that is, whether certain forms of feedback (in terms of the perceived purposes and scopes of changes) encourage students to learn to make more effective revisions. More specifically, the project works toward three objectives: (1) Define a schema for characterizing the types of changes that occur at different levels of the rewriting. For example, the writer might add one or more sentences to provide evidence to support a thesis; or the writer might add just one or two words to make a phrase more precise. (2) Based on the schema, design a computational model for recognizing the purpose and scope of each change within a revision. One application of such a model is a revision assistant that serves as a sounding board for students as they experiment with different revision alternatives. (3) Conduct experiments to study the interactions between students and the revision writing environment in which variations of idealized computational models are simulated. The findings of the experiments pave the way for developing better technologies to support student learning. This NSF grant is in collaboration with Rebecca Hwa.
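
    The sketch below illustrates objective (2) at a very coarse grain: sentences in a draft and its revision are aligned with difflib, and each change receives a rough purpose/scope label. The labels, threshold, and example drafts are placeholders, not the project's actual schema or model.

```python
# Minimal sketch: align sentences between a draft and its revision, then assign
# each change a coarse label based on character-level similarity.
import difflib

def label_change(old: str, new: str) -> str:
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    if ratio > 0.8:
        return "surface edit (wording/precision)"
    return "content edit (e.g., new evidence or claim)"

draft = ["The experiment worked.", "We used ten subjects."]
revision = ["The experiment worked well.",
            "We used ten subjects.",
            "Prior studies report similar effect sizes, which supports our claim."]

matcher = difflib.SequenceMatcher(None, draft, revision)
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op == "equal":
        continue
    if op == "replace":
        for old, new in zip(draft[i1:i2], revision[j1:j2]):
            print("MODIFIED:", label_change(old, new))
    elif op == "insert":
        for new in revision[j1:j2]:
            print("ADDED (content edit, new sentence):", new)
    elif op == "delete":
        for old in draft[i1:i2]:
            print("DELETED:", old)
```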


The PIs are investigating the design of intelligent tutoring systems (ITSs) that are aimed at learning in unstructured domains. Such systems are not able to do as much automatically as ITSs working in traditionally narrow and well-structured domains, but rather need to share responsibilities for scaffolding learning with a teacher and/or peers. In the work proposed, the three PIs, who share expertise in automated natural language understanding, intelligent tutoring systems, machine learning, argumentation (especially in law), complex problem solving, and engineering education, are integrating intelligent tutoring, data mining, machine learning, and language processing to design a socio-technical system (people and machines working together) that helps undergraduates and law students write better argumentative essays. The work of helping learners derive an argument is shared by the computer and peers, as is the work of helping peer reviewers review the writing of others and the work of helping learners turn their argument diagrams into well-written documents. Research questions address the roles computers might take on in promoting writing and the technology that enables that, how to distribute scaffolding between an intelligent machine and human agents, how to promote better writing (especially the relationship between diagramming and writing), and how to promote learning through peer review of the writing of others.

This project is bringing together outstanding researchers from a variety of different disciplines -- artificial intelligence, law education, engineering and science education, and cognitive psychology -- to address an education issue of national concern -- writing, especially writing that makes and substantiates a point -- and to explore ways of extending intelligent tutoring systems beyond fact-based domains. It fulfills all aims of the Cyberlearning program -- to imagine, design, and learn how to best design and use the next generation of learning technologies, to address learning issues of national importance, and to contribute to understanding of how people learn. This NSF grant is in collaboration with Kevin Ashley and Christian Schunn.