The BoB project, along with the dialogues collected in its course, has given rise to several research activities conducted at the KRDB Research Centre (have a look at the BoB-related publications by Raffaella Bernardi and Manuel Kirschner). When these activities culminate in research theses, we briefly describe them on this page. We are always interested in learning about your research activities involving the BoB dialogue data. Please get in touch and have your research presented on this page!

Question and Answer Classifier for closed domain Interactive Question Answering (2009 EMLCT Masters thesis by Đinh Lê Thành)

Abstract
Natural language processing has made great progress in recent years thanks to the application of statistical approaches and to the large amount of data available to train systems. This progress is driven by several evaluation campaigns, which allow systems to be compared and progress to be measured. These evaluations are mostly based on data sets artificially developed by the organizers of the campaigns. In our work we show that, though useful, these data sets are biased, and that there is a need to develop data generated in a more natural setting by real users. As a case study we consider the classification of questions. In particular, we look at the classification of question types needed in Question Answering systems, and the classification of follow-up questions into topic continuation and topic shift needed in Interactive Question Answering. We evaluate classifiers first on TREC data and then on a corpus of real user data. In both cases the performance of the classifiers drops significantly, showing the need to work on more user-centered systems. The results also show that the classifiers could be better fine-tuned by taking into account the new challenges that real user data pose to NLP systems. We leave this for future research.

Further information
The BoB question classifier code repository

Deep analysis in IQA: evaluation on real Users dialogues (2009 EMLCT Masters thesis by Zorana Ratkovic)

Abstract
Interactive Question
Answering (IQA) is a natural and cohesive way for a user to obtain
information by interacting with a system using natural language. With
the advancement in Natural Language Processing, research in the field of
IQA has started to focus on the role of semantics and the discourse
structure in these systems. The need for a deeper analysis, which
examines the syntax and semantics of the questions and the answers, is
evident. Using this deeper analysis allows us to model the context of
the interaction. I will look at a current closed-domain IQA system which
is based on Linear Regression modeling. This system uses superficial and
non-semantically motivated features. I propose adding deep analysis and
semantic features in order to improve the system and show the need for
such analysis. Particular attention will be placed on the so-called
follow-up questions (questions that the user poses after having received
some answer from the system) and the role of context. I propose that
adding the linguistically heavy features will prove beneficial, thereby
showing the need for such analysis in IQA systems.

The Structure of Real User-System Dialogues in Interactive Question Answering (2010 PhD thesis by Manuel Kirschner)

Abstract
When users engage in (typed) conversations with an Interactive Question Answering (IQA) system, user questions are typically not asked in isolation. The questions' context, i.e., the preceding interactions, should be useful for understanding Follow-Up Questions (FU Qs) and helping the system pinpoint the correct answer. In this work, we study how much context, and what elements of it, should be considered to answer FU Qs. We harness Logistic Regression Models (LRMs), both for learning which aspects of dialogue structure are relevant to answering FU Qs, and for comparing the accuracy with which the resulting IQA systems can correctly answer these questions. Unlike much of the related research in IQA, which uses artificial collections of user questions, our work is based on real user-system dialogues we collected via a chatbot-inspired help-desk IQA system we deployed on the web site of our University library.

Further information
Manuel's personal homepage with the PhD thesis and related publications. The data set (dialogue snippet set) used for the Machine Learning experiments in the thesis is based on a subset of the BoB dialogue corpus, namely the English dialogues gathered between September 2008 and June 2009. The snippet set can be obtained under the same terms and conditions as the full BoB dialogue corpus.