(2005-10-31) A [[Chatbot]] System as a Tool to Animate a Corpus

Bayan Abu Shawar and Eric Atwell: A chatbot system as a tool to animate a corpus. ELIZA... Loebner prize competition

Most chatbots are restricted to knowledge that is manually “hand coded” in their files, and to a specific natural language which is written or spoken. To overcome this problem and to generate chat which is closer to human language, we developed a program to convert a machine readable text (corpus) to a specific chatbot format, which is used to retrain the chatbot.

the British National Corpus (BNC)

In this paper, we present the chatbot system as a tool to explore or visualize different types of English language used in the BNC corpus in a qualitative manner in contrast to tools such as Wmatrix which visualises a corpus in terms of quantitative statistics.

Section 2 reviews the ALICE chatbot system

Section 4 presents the BNC corpus and the problems which arose during the automation process, and the software tool that maps the BNC spoken transcripts to ALICE chatbot internal knowledge representation, called AIML

2 The ALICE chatbot system

first implemented by Wallace in 1995

AIML, or Artificial Intelligence Mark-up Language, is a derivative of Extensible Mark-up Language (XML). It was developed by Wallace and the Alicebot free software community during 1995-2000 to enable people to input dialogue pattern knowledge into chatbots based on the A.L.I.C.E. open-source software technology. AIML consists of data objects called AIML objects, which are made up of units called topics and categories
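To make the topic/category structure concrete, here is a minimal sketch that parses a tiny AIML document and groups its category patterns by topic. The sample AIML and the function name `categories_by_topic` are made up for illustration and are not taken from the paper or from the ALICE distribution.

```python
import xml.etree.ElementTree as ET

# Hypothetical AIML sample: one top-level category and one category
# nested inside a named topic.
SAMPLE_AIML = """
<aiml version="1.0">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there.</template>
  </category>
  <topic name="WEATHER">
    <category>
      <pattern>IS IT RAINING</pattern>
      <template>I cannot see outside.</template>
    </category>
  </topic>
</aiml>
"""


def categories_by_topic(aiml_text):
    """Return a dict mapping topic name (None for no topic) to patterns."""
    root = ET.fromstring(aiml_text.strip())
    grouped = {}
    for child in root:
        if child.tag == "category":
            # Category that sits directly under <aiml>, outside any topic.
            grouped.setdefault(None, []).append(child.findtext("pattern"))
        elif child.tag == "topic":
            name = child.get("name")
            for category in child.iter("category"):
                grouped.setdefault(name, []).append(category.findtext("pattern"))
    return grouped


if __name__ == "__main__":
    print(categories_by_topic(SAMPLE_AIML))
```

Each category pairs one input pattern with one response template; a topic is just an optional named wrapper around a group of categories.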

2.1 Types of ALICE/AIML categories

There are three types of categories: atomic categories, default categories, and recursive categories.

Atomic categories

Default categories

Recursive categories
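A rough sketch of how the three category types can be told apart, assuming the usual AIML conventions: an atomic category has a pattern without wildcards, a default category has a pattern containing the `*` or `_` wildcard, and a recursive category has a template that uses `<srai>` or `<sr/>`. The sample categories and the helper names `classify` and `classify_all` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical categories, one of each type.
AIML_SAMPLE = """
<aiml version="1.0">
  <category>
    <pattern>HELLO ALICE</pattern>
    <template>Hello there.</template>
  </category>
  <category>
    <pattern>HELLO *</pattern>
    <template>Hi, nice to meet you.</template>
  </category>
  <category>
    <pattern>HI THERE</pattern>
    <template><srai>HELLO ALICE</srai></template>
  </category>
</aiml>
"""


def classify(category):
    """Label one <category> element as recursive, default or atomic."""
    pattern = category.findtext("pattern", default="")
    template = category.find("template")
    # Recursive: the template hands the input back to the matcher.
    if template is not None and (
        template.find(".//srai") is not None or template.find(".//sr") is not None
    ):
        return "recursive"
    # Default: the pattern contains a wildcard word.
    words = pattern.split()
    if "*" in words or "_" in words:
        return "default"
    # Atomic: a fixed pattern with a fixed template.
    return "atomic"


def classify_all(aiml_text):
    root = ET.fromstring(aiml_text.strip())
    return [(c.findtext("pattern"), classify(c)) for c in root.iter("category")]


if __name__ == "__main__":
    for pattern, kind in classify_all(AIML_SAMPLE):
        print(f"{pattern!r}: {kind}")
```

A category whose pattern has a wildcard and whose template also uses `<srai>` is reported here as recursive; that ordering is a choice made for this sketch, not something the notes specify.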

3 Human to human versus human to chatbot dialogues

3.1 ALICE against Spoken Professional American English transcripts

4 Training ALICE with conversation transcripts

To retrain ALICE with English corpora, we examined these sources of English dialogue: the Dialogue Diversity Corpus (DDC) and the spoken part of the British National Corpus (BNC), and online FAQ (Frequently Asked Questions) websites. Two main goals are achieved using the BNC: the ability of the program to generate more than one million categories extracted from the BNC; and the ability to use the chatbot as a tool to explore and visualise or animate the domain-specific English dialogue.
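A minimal sketch of what mapping transcript turns to AIML categories could look like. It assumes that each utterance becomes a pattern and the utterance that follows it becomes the template; these notes do not spell out that rule, so treat the pairing, the normalisation step, and the names `normalise` and `turns_to_aiml` as assumptions rather than the authors' actual tool.

```python
import re
from xml.sax.saxutils import escape


def normalise(utterance):
    """Upper-case the utterance and drop punctuation, as AIML patterns expect."""
    words = re.findall(r"[A-Za-z0-9']+", utterance.upper())
    return " ".join(words)


def turns_to_aiml(turns):
    """Build an AIML string from a list of consecutive dialogue turns.

    Assumption: turn i is the pattern and turn i+1 is the template.
    """
    categories = []
    for prompt, reply in zip(turns, turns[1:]):
        pattern = normalise(prompt)
        if not pattern:
            continue  # skip turns with no usable words (e.g. pure noise)
        categories.append(
            "<category>\n"
            f"  <pattern>{escape(pattern)}</pattern>\n"
            f"  <template>{escape(reply.strip())}</template>\n"
            "</category>"
        )
    return '<aiml version="1.0">\n' + "\n".join(categories) + "\n</aiml>"


if __name__ == "__main__":
    sample_turns = ["Hello, how are you?", "Fine thanks.", "Good."]
    print(turns_to_aiml(sample_turns))
```

A real converter for the BNC would also have to handle speaker overlap, annotation markup and very long turns before any such pairing could be applied.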

4.1 Problems with using BNC as chatbot training data

4.2 Using the BNC word-frequency list

5 A chatbot as a tool to visualize or animate corpus texts

6 Conclusions

