BoB dialogue corpus info


Basic characterization of BoB dialogues

As the example dialogues between BoB and library users in the following table show, user-BoB dialogues are conducted in English, German and Italian. A dialogue contains around 4 user queries (i.e., questions, keywords or assertions) on average, and the average user query is 4 words long. BoB answers with one of several hundred canned-text system responses, which typically consist of one or a few short sentences (26 words on average).

[Table image: bobDataPreview (example dialogues between BoB and library users)]



Feel free to try out BoB yourself, to get a better idea of the type of dialogues contained in the corpus.

Corpus size

Constantly increasing number of dialogues in 3 languages

BoB has been collecting user-system dialogues on the library website of the Free University of Bozen-Bolzano for several months. More precisely, the three language versions of BoB have been online as follows:

English

German

Italian

Quantitative description of BoB dialogue corpus files

We have been saving the collected dialogue data into separate spreadsheet tables every couple of months. The individual files are well suited for adding the different levels of hand annotation described below. The following table gives an overview of the corpus files that are currently available and the annotation levels each of them contains (refer to the section dedicated to these annotation levels).

Levels of meta-data in BoB dialogue corpus

Meta-data from hand annotation

The following table lists the table columns of the BoB dialogue data (as exemplified in the table on the main page), specifying which of the columns stores hand-annotated information:

The values in the columns filled by hand-annotation are described as follows:

Column ID Description
C Question relevance. 0 = we might want to ignore this Q (typo, strange syntax, test/keywords, out-of-domain, sub-dialogue continuation of ‘0’); 1 = interesting Q (or sub-dialogue continuation of ‘1’); 2 = semi-interesting Q, with no relevance for library domain (greeting, thanks, smalltalk, sub-dialogue continuation of ‘2’)
D Answer correctness. 1 = current answer is correct (this may also be the ‘no pattern matched’ apology message); 2 = corrected answer is given in column J, or an answer has to be added to BoB’s repository
E Follow-up Question (FU Q) type. 1 = FU Q; 10 = FU Q semantically breaking sub-dialogue; 2 = Topic Shift Q; 20 = Topic Shift Q semantically breaking sub-dialogue; 4 = Q semantically follows sub-dialogue (possible only if previous answer was marked ‘correct’); 8 = Q rephrases previous Q; 80 = Q rephrases previous Q and semantically breaks sub-dialogue
F FU Q sub-type. 1 = context-dependent FU Q; 0 = fully specified FU Q
J Hand-corrected system answer. Provided for some rows where column D == 2
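
The numeric codes above can be turned into readable labels with a few lookup tables. The following Python sketch is only an illustration: it assumes that a corpus row has already been loaded as a dictionary keyed by column letter (a hypothetical layout, not something the corpus files prescribe), and the names `to_code` and `decode_row` as well as the label strings are invented for this example. Blank cells are mapped to None, since hand annotation is not available for every row.

```python
from typing import Any, Dict, Optional

# Code tables transcribed from the column descriptions above.
RELEVANCE = {0: "ignorable", 1: "interesting", 2: "semi-interesting"}
CORRECTNESS = {1: "correct", 2: "needs correction"}
# Column E: (question type, semantically breaks the sub-dialogue?)
QUESTION_TYPE = {
    1: ("follow-up", False),
    10: ("follow-up", True),
    2: ("topic shift", False),
    20: ("topic shift", True),
    4: ("follows sub-dialogue", False),
    8: ("rephrase", False),
    80: ("rephrase", True),
}
FU_SUBTYPE = {0: "fully specified", 1: "context-dependent"}


def to_code(cell: Any) -> Optional[int]:
    """Turn a spreadsheet cell into an int code; blank cells become None."""
    if cell is None or str(cell).strip() == "":
        return None
    return int(float(cell))  # tolerates 1, "1" and 1.0


def decode_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Decode columns C, D, E, F and J of one row (assumed dict layout)."""
    q_type, breaks = QUESTION_TYPE.get(to_code(row.get("E")), (None, None))
    return {
        "relevance": RELEVANCE.get(to_code(row.get("C"))),
        "answer": CORRECTNESS.get(to_code(row.get("D"))),
        "question_type": q_type,
        "breaks_sub_dialogue": breaks,
        "fu_subtype": FU_SUBTYPE.get(to_code(row.get("F"))),
        "corrected_answer": row.get("J") or None,
    }
```

In column E, the codes 10, 20 and 80 are the "semantically breaking" variants of 1, 2 and 8, which is why the sketch stores the question type and the break flag as a pair.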

Please note that hand annotation is currently only available for some of the dialogue data files (as specified in the table at the top of this page), and even for the files where it is available, it is not provided for all dialogues.

Automatic meta-data from BoB dialogue management

The remaining columns of the BoB dialogue corpus files contain information that does not rely on hand annotation. Whereas some columns contain the dialogue utterances along with time stamp information, other columns contain meta information from BoB's dialogue management routine. We briefly describe these latter columns here. Please refer to the BoB dialogue corpus manual for further details, including details on all remaining data columns.

Column ID Description
M [aIsLocal]: Context-dependent FU Qs (“local” question patterns). For each answer it returns to the user, BoB keeps track of whether this answer was part of a question pattern marked as context-dependent (“local”). 1 iff the answer was retrieved from a question pattern flagged as “context-dependent FU Q” / “local”
N [aIsError]: Apology message (user question not understood). 1 iff the answer is an apology message that BoB did not understand the previous user input
Q [qTechnicallyContinuesSD]: sub-dialogue continuation. 1 iff this question was issued when BoB was in sub-dialogue mode, and the question continues the sub-dialogue by following one of the proposed sub-dialogue paths
R [qTechnicallyBreaksSD]: breaking out of sub-dialogue. 1 iff this question was issued when BoB was in sub-dialogue mode, and the question breaks the sub-dialogue by NOT following any of the proposed sub-dialogue paths
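
As a rough illustration of how these flags could be used, the Python sketch below derives a sub-dialogue status from columns Q and R and computes the share of apology answers from column N. It again assumes rows loaded as dictionaries keyed by column letter; the names `is_set`, `sub_dialogue_status` and `apology_rate` are hypothetical. Treating a row with both Q and R unset as "outside sub-dialogue mode" is an inference from the definitions above, not something the corpus documentation states.

```python
from typing import Any, Dict, Iterable


def is_set(cell: Any) -> bool:
    """True iff a flag cell holds 1; blank cells count as not set."""
    if cell is None or str(cell).strip() == "":
        return False
    return int(float(cell)) == 1


def sub_dialogue_status(row: Dict[str, Any]) -> str:
    """Classify one question via columns Q and R (assumed dict layout)."""
    continues = is_set(row.get("Q"))
    breaks = is_set(row.get("R"))
    if continues and breaks:
        # By definition a question cannot both follow and not follow a path.
        raise ValueError("columns Q and R are both 1")
    if continues:
        return "continues"
    if breaks:
        return "breaks"
    return "outside"  # inferred: BoB was not in sub-dialogue mode


def apology_rate(rows: Iterable[Dict[str, Any]]) -> float:
    """Fraction of rows whose answer is an apology message (column N)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(is_set(r.get("N")) for r in rows) / len(rows)
```

Note that columns Q and R record the technical view of BoB's dialogue manager, whereas the "semantically breaking" codes in the hand-annotated column E record an annotator's judgement, so the two need not agree.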

The BoB dialogue corpus manual

You can download the manual, containing all the information above, and more details, in the attachments section at the bottom of this page.
Manuel Kirschner,
Jul 2, 2010, 1:46 AM