Experimenting with the Neural Conversation Model

These are the experiments on the Neural Conversation Model carried out during Spring 2016 at CMU. The GitHub repo (private for now) is maintained here.

This is the model used for the experiments. The datasets have been developed by me.

Idea

The problem at hand is to model dialogs so that they appear human-like. When the system communicates with a person through text, it learns about their knowledge and interests. Using Memory Networks built from LSTMs (a modified form of RNN) on the predefined toy tasks T ∈ {“where is”, “what is in”}, we can build a smart conversation agent that also learns information over the course of the dialogue, a step towards a cognitive agent that is able to converse with a person.

Data

The data is in the form of a dialogue between the user and the system, and the system answers a question based on the context of all the previous information provided to it.

Consider a use case with several buildings, each housing restaurants that serve different food items. To keep things simple while still producing sufficient training data, consider a scenario with ten buildings, ten restaurants and ten dishes.

Buildings	Restaurants	Foods
BUILDING1, BUILDING2, BUILDING3, BUILDING4, BUILDING5, BUILDING6, BUILDING7, BUILDING8, BUILDING9, BUILDING10	REST1, REST2, REST3, REST4, REST5, REST6, REST7, REST8, REST9, REST10	FOOD1, FOOD2, FOOD3, FOOD4, FOOD5, FOOD6, FOOD7, FOOD8, FOOD9, FOOD10

Creating the Dataset

The dataset was created from different possible combinations of the above entities, but only one-to-one mappings between them were considered. Apart from the sentences that contain the answer, noise statements were also added to the dialogue to simulate a real-life scenario. The experiments cover Wh questions (i.e. Where is, What is in, etc.) and Yes/No questions.
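The generation procedure described above can be sketched as follows. This is a minimal illustration, not the original generator: the function name `make_story` and its parameters are hypothetical, and it assumes that a "noise statement" means one extra building (with its restaurant and dish) that is not the user's location.

```python
import random

BUILDINGS = [f"BUILDING{i}" for i in range(1, 11)]
RESTAURANTS = [f"REST{i}" for i in range(1, 11)]
FOODS = [f"FOOD{i}" for i in range(1, 11)]


def make_story(num_noise, question_type="wh", rng=None):
    """Return (statements, question, answer) for a single turn dialogue.

    num_noise is the number of buildings that are NOT the user's location
    (hypothetical reading of "noise statements").
    """
    if rng is None:
        rng = random.Random()
    n = num_noise + 1
    # Sampling without replacement keeps the mapping one-to-one.
    buildings = rng.sample(BUILDINGS, n)
    restaurants = rng.sample(RESTAURANTS, n)
    foods = rng.sample(FOODS, n)

    here, closest = buildings[0], restaurants[0]
    has = [f"{b} has {r}" for b, r in zip(buildings, restaurants)]
    serves = [f"{r} serves {f}" for r, f in zip(restaurants, foods)]
    # Shuffle so the relevant statement is not always listed first.
    rng.shuffle(has)
    rng.shuffle(serves)
    statements = [f"I am at {here}"] + has + serves

    if question_type == "wh":
        return statements, "Which is the closest restaurant?", closest
    # Y/N question about a randomly chosen restaurant from the story.
    asked = rng.choice(restaurants)
    question = f"Is {asked} closest to my current location?"
    return statements, question, "Yes" if asked == closest else "No"


statements, question, answer = make_story(2, "wh", random.Random(0))
print("\n".join(statements))
print(question)
print(answer)
```

Passing a seeded `random.Random` makes a generated story reproducible, which is convenient when comparing runs with different numbers of noise statements.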

Types of Dataset

The following types of datasets were synthetically generated to carry out various experiments:

Randomized dataset with one, two and three noise statements, single turn dialogue and a Wh question.
Randomized dataset with one, two, three, four, five and six noise statements, two turn dialogue and Wh questions.
Randomized dataset with one, two and three noise statements, single turn dialogue and a Y/N question.
Randomized dataset with one, two, three, four, five and six noise statements, two turn dialogue and Y/N questions.
Randomized dataset with one, two and three noise statements, two turn dialogue and a combination of Wh and Y/N questions.

Examples of the above datasets are:

Randomized dataset with two noise statements, single turn dialogue and a Wh question
  
I am at BUILDING10
BUILDING10 has REST9
BUILDING1 has REST7
BUILDING5 has REST8
REST7 serves FOOD1
REST9 serves FOOD3
REST8 serves FOOD5
Which is the closest restaurant?
 
REST9

Randomized dataset with one noise statement, single turn dialogue and a Y/N question

I am at BUILDING4
BUILDING9 has REST2
BUILDING4 has REST5
REST2 serves FOOD2
REST5 serves FOOD7
Is REST2 closest to my current location?    

No

Randomized dataset with two noise statements, single turn dialogue and a Y/N question

I am at BUILDING3
BUILDING3 has REST2
BUILDING9 has REST4
BUILDING10 has REST9
REST9 serves FOOD6
REST4 serves FOOD3
REST2 serves FOOD8
Is REST2 closest to my current location?    

Yes

Randomized dataset with three noise statements, two turn dialogue and Wh questions

I am at BUILDING4
BUILDING8 has REST10
BUILDING10 has REST4
BUILDING4 has REST1
BUILDING6 has REST6
REST10 serves FOOD10
REST4 serves FOOD6
REST1 serves FOOD3
REST6 serves FOOD2
Which is the closest restaurant?

REST1

REST1 is closest to my current location
What does it serve?

FOOD3

Randomized dataset with three noise statements, two turn dialogue and Y/N questions

I am at BUILDING2
BUILDING1 has REST2
BUILDING3 has REST4
BUILDING9 has REST1
BUILDING2 has REST10
REST10 serves FOOD6
REST4 serves FOOD9
REST1 serves FOOD10
REST2 serves FOOD5
Is REST10 closest to my current location?

Yes

REST10 is closest to my current location
Does it serve FOOD6?

Yes 

Randomized dataset with one noise statement, two turn dialogue and a combination of Wh and Y/N questions

I am at BUILDING8
BUILDING6 has REST1
BUILDING8 has REST4
REST4 serves FOOD2
REST1 serves FOOD10
Is REST1 closest to my current location?

No

REST4 is closest to my current location
What does it serve?

FOOD2

Randomized dataset with one noise statement, single turn dialogue and a Wh question
  
I am at BUILDING8
BUILDING9 has REST9
BUILDING8 has REST6
REST6 serves FOOD3
REST9 serves FOOD10
Which is the closest restaurant?

REST6

Training

The memory network is trained on a set of training stories. Each story lists a set of factual statements, a question, its answer and the set of supporting factual statements that help answer that question. Using these as inputs, the network learns how to answer questions.

Consider the single conversation story:

I am at BUILDING6
BUILDING6 has REST9
BUILDING1 has REST2
BUILDING4 has REST10
which is the closest restaurant? REST9	 1 2

where:

Statements 1-4 are facts. Statement 5 consists of a question, its answer (REST9) and the two supporting statements (1 and 2) that help derive that answer.
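A story in this format could be read with a small parser along these lines. It is a sketch under stated assumptions: the name `parse_story` is hypothetical, every line of the story counts towards the 1-based statement numbers, a question line is recognised by its question mark, and the answer is a single token followed by the supporting statement numbers.

```python
def parse_story(lines):
    """Split a story into numbered facts and (question, answer, support) tuples."""
    facts = {}       # statement number -> fact text
    questions = []   # (question, answer, supporting statement numbers)
    for number, line in enumerate(lines, start=1):
        if "?" in line:
            question, rest = line.split("?", 1)
            # After the question mark: the answer, then the supporting ids.
            fields = rest.split()
            answer = fields[0]
            support = [int(x) for x in fields[1:]]
            questions.append((question.strip() + "?", answer, support))
        else:
            facts[number] = line.strip()
    return facts, questions


story = [
    "I am at BUILDING6",
    "BUILDING6 has REST9",
    "BUILDING1 has REST2",
    "BUILDING4 has REST10",
    "which is the closest restaurant? REST9\t 1 2",
]
facts, questions = parse_story(story)
for question, answer, support in questions:
    print(question, "->", answer)
    for number in support:
        print("  supported by", number, ":", facts[number])
```

Splitting on whitespace after the question mark means the parser accepts both tab-separated and space-separated fields.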

Consider the double conversation story:

I am at BUILDING6
BUILDING6 has REST9
BUILDING1 has REST2
BUILDING4 has REST10
which is the closest restaurant?	REST9	 1 2
REST9 serves FOOD3
REST2 serves FOOD9
REST10 serves FOOD7
what does it serve?	FOOD3	1 2 6

The context for both the questions “which is the closest restaurant” and “what does it serve” is the same:

` [I, am, at, BUILDING6], [BUILDING6, has, REST9], [BUILDING1, has, REST2], [BUILDING4, has, REST10], [REST9, serves, FOOD3], [REST2, serves, FOOD9], [REST10, serves, FOOD7] `
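Before such a context can be fed to a network, the tokens are usually mapped to integer ids and padded to a common length. The sketch below shows one way to do this for the story above; the helper names `tokenize` and `encode` are hypothetical, and reserving id 0 for padding is an assumption, not something the original setup confirms.

```python
context = [
    "I am at BUILDING6",
    "BUILDING6 has REST9",
    "BUILDING1 has REST2",
    "BUILDING4 has REST10",
    "REST9 serves FOOD3",
    "REST2 serves FOOD9",
    "REST10 serves FOOD7",
]
questions = ["which is the closest restaurant?", "what does it serve?"]
answers = ["REST9", "FOOD3"]


def tokenize(sentence):
    """Split on whitespace, treating the question mark as its own token."""
    return sentence.replace("?", " ?").split()


tokenized_context = [tokenize(s) for s in context]
tokenized_questions = [tokenize(q) for q in questions]

# Vocabulary over context, questions and answers; id 0 is kept for padding.
words = {w for sent in tokenized_context + tokenized_questions for w in sent}
words.update(answers)
word_to_id = {w: i for i, w in enumerate(sorted(words), start=1)}


def encode(tokens, length):
    """Map tokens to ids and right-pad with zeros up to the given length."""
    ids = [word_to_id[t] for t in tokens]
    return ids + [0] * (length - len(ids))


max_len = max(len(s) for s in tokenized_context)
encoded_context = [encode(s, max_len) for s in tokenized_context]

print(tokenized_context)
print(encoded_context)
```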

Experiments

The following are a few screenshots of the final web application, taken while running it on a localhost server.

Example 1

The question presented is “Does it serve food1”. To completely answer this question, the memory network must:

Understand what “it” refers to
Relate the entity “food1” to another entity iteratively in different networks (denoted by Mem 1, Mem 2 and Mem 3)

The different steps entailed in calculating the final answer are as follows:

Based on all the sentences present in each compartment of each memory network, first find the most recent sentence, which is the one at the last time stamp. This gives the entity that “it” refers to.
a. The solution that the memnets find for this step is “rest8”, with a confidence value of 0.38 calculated by Mem 2.
b. For the memnet, the question now becomes “Does rest8 serve food1?”
Next, the memnet needs to find the truth value of the question, which can be either yes or no, based on the confidence value of each.
a. Mem 3 finds that “rest8 serves food8” with a confidence value of 0.37.
b. Mem 1 finds that “rest2 serves food1” with a confidence value of 0.81.
c. These three statements coupled together form a transitive relation, which outputs “no”, the final result that we see.

Note that the sum of the confidence values for all the episodic memories in one memory is approximately equal to 1. This indicates the confidence of that memory in finding the current entity relation.
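One common reason for weights that sum to 1 is that attention over memory slots is computed with a softmax over match scores, as in end-to-end memory networks. The sketch below assumes that this is how the confidence values here are produced, which the description above does not confirm; the vectors are made-up numbers for illustration only.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(question_vec, memory_vecs):
    """Score each memory slot by its dot product with the question."""
    scores = [
        sum(q * m for q, m in zip(question_vec, memory))
        for memory in memory_vecs
    ]
    return softmax(scores)


# Hypothetical embeddings: one question vector and three memory slots.
question_vec = [0.2, 1.0, -0.5]
memory_vecs = [
    [0.1, 0.9, -0.4],
    [1.0, -0.2, 0.3],
    [0.0, 0.5, 0.5],
]
weights = attention(question_vec, memory_vecs)
print(weights)
print(sum(weights))
```

Displayed values are typically rounded, which would explain a sum that is only approximately 1.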

Example 2

The question presented is “Which is the closest restaurant”. To completely answer this question, the memory network must:

Understand what entity “which” refers to
Relate the entity “restaurant” to another entity iteratively in different networks (denoted by Mem 1, Mem 2 and Mem 3)

Note that the memory network knows that “restaurant” is an entity from the training data, and it is able to figure out how “closest” relates to “restaurant”. The network may not know what these entities actually are in the real world.

The different steps entailed in calculating the final answer are as follows:

Based on all the sentences present in each compartment of each memory network, first find the sentence that gives the current location of the user. This gives the entity that “closest” relates to.
a. The solution that the memnets find for this step is “building10”, with a confidence value of 1.00 calculated by Mem 2.
b. For the memnet, the question now becomes “Which is the closest restaurant to building10?”
Next, the memnet needs to find the answer to the question, which must be a word such as “rest10”, based on the confidence value of each. As mentioned above, the relation of “Restaurant” to “Rest” is learnt from the training data.
a. Mem 3 finds that “building10 has rest10” with a confidence value of 0.98.
b. These statements coupled together form a transitive relation, which outputs the answer “rest10”, the final result that we see.
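The chain of lookups that the network has to learn in these examples (current location, then building, then restaurant, then dish) can be written out symbolically. The sketch below is only a hand-coded analogue for reasoning about the expected answers, not how the memory network computes them; the function name `answer_chain` is hypothetical, and the story is the two turn Wh example from the Data section.

```python
def answer_chain(statements):
    """Follow location -> building -> restaurant -> food by explicit lookup."""
    location = None
    has = {}      # building -> restaurant
    serves = {}   # restaurant -> food
    for statement in statements:
        words = statement.split()
        if words[:3] == ["I", "am", "at"]:
            location = words[3]
        elif words[1] == "has":
            has[words[0]] = words[2]
        elif words[1] == "serves":
            serves[words[0]] = words[2]
    closest = has[location]          # hop 1: which restaurant is here?
    return closest, serves[closest]  # hop 2: what does it serve?


story = [
    "I am at BUILDING4",
    "BUILDING8 has REST10",
    "BUILDING10 has REST4",
    "BUILDING4 has REST1",
    "BUILDING6 has REST6",
    "REST10 serves FOOD10",
    "REST4 serves FOOD6",
    "REST1 serves FOOD3",
    "REST6 serves FOOD2",
]
closest, food = answer_chain(story)
print(closest)  # REST1
print(food)     # FOOD3
```

A Y/N question such as “Does it serve FOOD6?” reduces to comparing the looked-up dish with the one named in the question.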

Results

The system performs poorly when the number of irrelevant support sentences in the story is too high. These statements end up adding noise to the model. This could be addressed by classifying statements according to their relevance. Another possibility is to increase the number of hops or the history limit up to which the memory network can remember the context.
A question whose answer doesn’t lie in the context/story is difficult to handle. This could possibly be alleviated by adding more support statements related to the context.


Published

01 May 2016