Summer work (ICST)

Multi-turn dialogue: should have better objective functions and evaluation metrics

1. Neural Responding Machine for Short-Text Conversation

2015, a relatively old paper.
Data: Weibo posts, each limited to 140 characters. Each post has on average about 20 replies; roughly 220k posts in total, i.e., about 4.4 million post-reply pairs.
Method: a seq2seq translation-style model with attention.
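Below is a minimal, illustrative PyTorch sketch of such an attention-based seq2seq model (my own code, not the paper's; layer names and sizes are assumptions):

```python
import torch
import torch.nn as nn

class AttnSeq2Seq(nn.Module):
    """Toy encoder-decoder with a simple concat-score attention (sketch only)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRUCell(emb_dim + hid_dim, hid_dim)
        self.attn = nn.Linear(2 * hid_dim, 1)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        enc_out, h = self.encoder(self.emb(src))            # (B, Ls, H), (1, B, H)
        h = h.squeeze(0)                                     # (B, H)
        logits = []
        for t in range(tgt.size(1)):
            # attention weights over encoder states given the current decoder state
            q = h.unsqueeze(1).expand(-1, enc_out.size(1), -1)
            a = torch.softmax(self.attn(torch.cat([q, enc_out], -1)).squeeze(-1), dim=1)
            ctx = torch.bmm(a.unsqueeze(1), enc_out).squeeze(1)   # (B, H) context vector
            h = self.decoder(torch.cat([self.emb(tgt[:, t]), ctx], -1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                    # (B, Lt, V)

# usage: loss = nn.CrossEntropyLoss()(model(src, tgt_in).transpose(1, 2), tgt_out)
```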

2. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

2016 AAAI

For me there are two main takeaways. First, the use of a hierarchy of RNNs: one to model the sequence of utterances in the dialogue, and one to model the sequence of tokens within an individual turn. Second, the value of bootstrapping the model with external data, which makes a significant difference to model performance.

An utterance is what a single speaker says in one turn; tokens are just words. The web-query task is the kind where you search a question on Baidu and it directly returns an answer; this is essentially a reading-comprehension task. Bootstrapping has two parts: one is to use a large-scale pretrained word embedding so the model knows more words; the other is to pretrain on a separate, non-dialogue corpus so the model learns to produce fluent language.
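A rough sketch of the two-level encoder idea (token-level RNN per utterance, dialogue-level RNN over the utterance vectors); module names and sizes are my own assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Token-level RNN encodes each utterance; a second RNN runs over the
    resulting utterance vectors to encode the whole dialogue (sketch)."""
    def __init__(self, vocab_size, emb_dim=128, utt_dim=256, ctx_dim=512):
        super().__init__()
        # bootstrapping: self.emb.weight could be initialized from pretrained word2vec
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.utt_rnn = nn.GRU(emb_dim, utt_dim, batch_first=True)
        self.ctx_rnn = nn.GRU(utt_dim, ctx_dim, batch_first=True)

    def forward(self, dialogue):
        # dialogue: (batch, n_turns, n_tokens) of token ids
        b, n_turns, n_tokens = dialogue.shape
        flat = dialogue.view(b * n_turns, n_tokens)
        _, utt_vec = self.utt_rnn(self.emb(flat))            # (1, b*n_turns, utt_dim)
        utt_vec = utt_vec.squeeze(0).view(b, n_turns, -1)    # (b, n_turns, utt_dim)
        ctx_out, _ = self.ctx_rnn(utt_vec)                   # (b, n_turns, ctx_dim)
        return ctx_out  # each position can condition the decoder for the next turn
```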

3. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Introduces a stochastic latent variable to model the relations between utterances.

4. Hierarchical Recurrent Attention Network for Response Generation

Extended the hierarchical structure with the attention mechanism [2] to attend to important parts within and among utterances, using word-level attention and utterance-level attention, respectively.
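For illustration, a small additive-attention module that could, in principle, be applied at both levels (over the words of each utterance and over the utterance summaries); this is my sketch of the idea, not the paper's implementation:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention over a set of key vectors (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)
        self.score = nn.Linear(dim, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, dim); keys: (batch, length, dim)
        q = query.unsqueeze(1).expand(-1, keys.size(1), -1)
        s = self.score(torch.tanh(self.proj(torch.cat([q, keys], dim=-1))))
        w = torch.softmax(s, dim=1)                  # (batch, length, 1)
        return (w * keys).sum(dim=1), w.squeeze(-1)  # context vector, weights
```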

5. Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

6. An Abstractive Approach to Question Answering

7. Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation

8. Topic Aware Neural Response Generation (AAAI 2017)

9. How to Make Context More Useful? An Empirical Study on Context-Aware Neural Conversational Models

Counting lines:

wc -l

Counting the average number of turns per dialogue:

4244675 / 492065 = 8.6262485647
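The same ratio in a couple of lines of Python (file names are placeholders; this assumes one utterance per line in the first file and one dialogue per line in the second):

```python
# Count utterances and dialogues, then take the ratio (file names are placeholders).
def avg_turns(utterance_file, dialogue_file):
    with open(utterance_file) as f:
        n_utterances = sum(1 for _ in f)   # equivalent to `wc -l`
    with open(dialogue_file) as f:
        n_dialogues = sum(1 for _ in f)
    return n_utterances / n_dialogues

# e.g. 4244675 / 492065 ≈ 8.63 turns per dialogue
```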

# AAAI 2018 dialogue-related papers

Dialogue Act Sequence Labeling Using Hierarchical Encoder with CRF

In some approaches, a hierarchical convolutional and recurrent neural encoder model is used to learn utterance representations by processing the whole conversation. These utterance representations are then used to classify DA classes with a conditional random field (CRF) as a linear-chain classifier. However, such models might fail in a dialogue system where one can perceive past utterances but cannot see future ones.

The task is called Dialogue Act recognition: encode each utterance with an RNN, then feed the utterance representations into a CRF for sequence labeling.
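A sketch of the "hierarchical utterance encoder + CRF over the dialogue-act sequence" idea, assuming the third-party pytorch-crf package (`pip install pytorch-crf`); all module names and sizes are my own:

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # third-party pytorch-crf package (assumption)

class DASequenceLabeler(nn.Module):
    def __init__(self, vocab_size, num_acts, emb_dim=128, hid_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.utt_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)   # encode each utterance
        self.conv_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)  # run over the conversation
        self.emit = nn.Linear(hid_dim, num_acts)
        self.crf = CRF(num_acts, batch_first=True)

    def forward(self, dialogue, acts=None):
        # dialogue: (batch, n_turns, n_tokens); acts: (batch, n_turns) DA labels
        b, n_turns, n_tokens = dialogue.shape
        _, u = self.utt_rnn(self.emb(dialogue.view(b * n_turns, n_tokens)))
        conv, _ = self.conv_rnn(u.squeeze(0).view(b, n_turns, -1))
        emissions = self.emit(conv)                  # (batch, n_turns, num_acts)
        if acts is not None:
            return -self.crf(emissions, acts)        # negative log-likelihood for training
        return self.crf.decode(emissions)            # best DA sequence per dialogue
```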

Eliciting Positive Emotion through Affect-Sensitive Dialogue Response Generation: A Neural Network Approach

Generative responses.

Augmenting End-to-End Dialogue Systems with Commonsense Knowledge

Improving Variational Encoder-Decoders in Dialogue Generation

Variational encoder-decoders (VED) have been widely applied to dialogue generation, but in contrast to the powerful RNN structures used for encoding and decoding, the latent distribution is usually approximated by a much simpler model, which leads to the KL-vanishing problem and makes training difficult. In this paper, the authors split training into two stages: the first stage learns, via an autoencoder (AE), to convert discrete text into continuous embeddings; the second stage learns to generalize the latent representation by reconstructing these embeddings. In this way, a separately trained VED model transforms Gaussian noise into samples of the latent variable, yielding a more flexible distribution.
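A minimal sketch of the second stage only: a small VAE trained to reconstruct the (frozen) autoencoder embeddings from Gaussian noise. Names, sizes, and the loss weighting are my assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class EmbeddingVAE(nn.Module):
    """Stage 2 (sketch): reconstruct frozen stage-1 autoencoder embeddings z_ae
    through a Gaussian latent variable (reparameterization trick)."""
    def __init__(self, dim=128, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.to_mu = nn.Linear(dim, latent)
        self.to_logvar = nn.Linear(dim, latent)
        self.dec = nn.Sequential(nn.Linear(latent, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, z_ae):
        h = self.enc(z_ae)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = self.dec(z)
        rec_loss = ((recon - z_ae) ** 2).mean()                   # reconstruction term
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return rec_loss + kl                                      # ELBO-style loss
```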

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Personalizing a Dialogue System with Transfer Reinforcement Learning

It is difficult to train a personalized task-oriented dialogue system because the data collected from each individual is often insufficient. Personalized dialogue systems trained on a small dataset can overfit and make it difficult to adapt to different user needs. One way to solve this problem is to consider a collection of multiple users' data as a source domain and an individual user's data as a target domain, and to perform transfer learning from the source to the target domain. By following this idea, we propose "PETAL" (PErsonalized Task-oriented diALogue), a transfer-learning framework based on POMDP to learn a personalized dialogue system. The system first learns common dialogue knowledge from the source domain and then adapts this knowledge to the target user. This framework can avoid the negative transfer problem by considering differences between source and target users. The policy in the personalized POMDP can learn to choose different actions appropriately for different users. Experimental results on real-world coffee-shopping data and simulation data show that our personalized dialogue system can choose different optimal actions for different users, and thus effectively improve the dialogue quality under the personalized setting.

Elastic Responding Machine for Dialog Generation with Dynamically Mechanism Selecting

RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems

Addressee and Response Selection in Multi-Party Conversations with Speaker Interaction RNNs

Context Aware Conversational Understanding for Intelligent Agents with a Screen

Conversational Model Adaptation via KL Divergence Regularization

Towards a Neural Conversation Model with Diversity Net Using Determinantal Point Processes

Towards Building Large Scale Multimodal Domain-Aware Conversation Systems

Exploring Implicit Feedback for Open Domain Conversation Generation

Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory

Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

A Knowledge-Grounded Neural Conversation Model

Proposes a knowledge-grounded approach which infuses the output utterance with factual information relevant to the conversational context, without slot filling.

The neural architecture of the knowledge-grounded model uses a set of external world facts to augment the output utterance generated by the model. Instead of just having a set of facts to augment the conversation, a richer way could be to use knowledge graphs or commonsense knowledge bases, which consist of [entity-relation-entity] triples.

Proposed a knowledge-grounded neural conversation model [3], aiming at combining conversational dialogue with task-oriented knowledge using unstructured data, such as Twitter data for conversation and Foursquare data for external knowledge. However, the task is still limited to a restaurant information service and has not yet been tested on a wide variety of dialogue tasks.

CoChat: Enabling Bot and Human Collaboration for Task Completion

propose a memory-enhanced hierarchical RNN (MemHRNN) to handle the one-shot learning challenges caused by instantly introducing new actions in CoChat.

# AAAI 2019

Why social bots?
• Maximize user engagement by generating enjoyable and more human-like conversations
• Help reduce user frustration
• Influence dialogue research in general
(social bot papers often cited in task-completion dialogue papers)

• 2010: Response retrieval system (IR) [Jafarpour+ 10]
• 2011: Response generation using Statistical Machine Translation (phrase-based MT) [Ritter+ 11]
• 2015: First neural response generation systems (RNN, seq2seq) [Sordoni+ 15; Vinyals & Le 15; Shang+ 15]

Similar to sequence models in Neural Machine Translation (NMT), summarization, etc.; uses RNN, LSTM, or GRU cells.
Source: the conversation history.
Target: the response.
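For concreteness, a tiny sketch of how (source, target) pairs could be built from a multi-turn conversation (the `__eou__` separator token is an assumption):

```python
def make_pairs(turns, sep=" __eou__ "):
    """Pair each response with the concatenated history before it (sketch)."""
    pairs = []
    for i in range(1, len(turns)):
        source = sep.join(turns[:i])   # conversation history
        target = turns[i]              # response
        pairs.append((source, target))
    return pairs

# make_pairs(["hi", "hello, how are you?", "fine, thanks"])
# -> [("hi", "hello, how are you?"),
#     ("hi __eou__ hello, how are you?", "fine, thanks")]
```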

Blandness problem: cause and remedies. Cause: the common MLE (maximum likelihood) training objective, which favors safe, generic responses.
Remedy: [Li+ 16a] proposed a Maximum Mutual Information (MMI) objective for neural response generation (see the formulas below).
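The MMI reranking objectives from [Li+ 16a], written down from memory, so treat the exact form as a reader's note rather than a quotation:

```latex
% MMI-antiLM: penalize generic targets with a language-model term
\hat{T} = \arg\max_{T}\ \big\{ \log p(T \mid S) - \lambda \log p(T) \big\}

% MMI-bidi: equivalently, mix forward and backward likelihoods
\hat{T} = \arg\max_{T}\ \big\{ (1-\lambda)\log p(T \mid S) + \lambda \log p(S \mid T) \big\}
```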

Consistency problem:

  • Personalized Response Generation [Li+ 2016b]
  • Personal modeling as multi-task learning [Luan+ 17]
  • Improving personalization with multiple losses [Al-Rfou+ 16]

Long conversational context problem:

  • It can be challenging for LSTM/GRU to encode a very long context (e.g. more than 200 words: [Khandelwal+ 18])
  • Hierarchical Encoder-Decoder (HRED) [Serban+ 16]: encodes the conversation turn by turn (a word-level RNN within each turn plus a context RNN across turns)
  • Hierarchical Latent Variable Encoder-Decoder (VHRED) [Serban+ 17]: adds a latent variable to the decoder and is trained by maximizing a variational lower bound on the log-likelihood. Related to the persona model [Li+ 2016b]: it deals with the 1-to-N problem, but in an unsupervised way.
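The variational lower bound mentioned above, as I recall it from [Serban+ 17] (one latent variable z_n per utterance; again a note, not a quotation):

```latex
\log P(w_1,\dots,w_N) \;\ge\; \sum_{n=1}^{N} \Big(
  \mathbb{E}_{Q(z_n \mid w_{\le n})}\!\big[\log P(w_n \mid z_n, w_{<n})\big]
  - \mathrm{KL}\big(Q(z_n \mid w_{\le n}) \,\|\, P(z_n \mid w_{<n})\big) \Big)
```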

Grounding problem: A Knowledge-Grounded Neural Conversation Model [Ghazvininejad+ 17]; conversations around images, e.g. Q&As [Das+ 16] or chat [Mostafazadeh+ 17]; grounding in affect [Huber+ 18]

DSTC7 Challenge: Knowledge-Grounded Conversation ("Sentence Generation" track, 61 registrants as of June). Registration link: http://workshop.colips.org/dstc7/call.html

Emergence of reinforcement learning (RL) for E2E dialogue

Tries to promote long-term dialogue success

  • REINFORCE algorithm [Williams+ 92]

Reward functions:

  1. Ease of answering: the negative log-probability that the generated turn is followed by a dull, generic reply (turns that invite dull replies get low reward)

  2. Information flow: the negative log of the (sigmoid of the) cosine similarity between the agent's consecutive turn representations, penalizing semantic repetition

  3. Meaningfulness (semantic coherence): the forward log-likelihood of the response given the context plus the backward log-likelihood of the context given the response
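The three rewards as I remember them from [Li+ 16]'s deep-RL dialogue paper, where S is a hand-picked set of dull responses, a the generated turn, q_i and p_i the two preceding turns, and h_{p_i} an encoding of the agent's i-th turn; the exact normalizations may differ from the paper:

```latex
% Ease of answering: penalize turns that are likely to be followed by a dull reply
r_1 = -\frac{1}{N_S} \sum_{s \in S} \frac{1}{N_s} \log p_{\mathrm{seq2seq}}(s \mid a)

% Information flow: penalize semantic repetition across the agent's consecutive turns
r_2 = -\log \cos\big(h_{p_i}, h_{p_{i+1}}\big)

% Meaningfulness (semantic coherence): forward plus backward likelihood
r_3 = \frac{1}{N_a} \log p_{\mathrm{seq2seq}}(a \mid q_i, p_i)
    + \frac{1}{N_{q_i}} \log p^{\mathrm{back}}_{\mathrm{seq2seq}}(q_i \mid a)
```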

Survey on dialogue datasets [Serban+ 15]

Evaluation problem:

  1. Human evaluation (crowdsourcing)

  2. Automatic, machine-translation-based metrics (a toy BLEU computation is sketched below):
    • BLEU [Papineni+ 02]: n-gram overlap metric
    • NIST [Doddington+ 02]: seldom used in dialogue, but copes with the blandness issue by weighting each n-gram by its information gain: score("interesting calculation") >> score("of the")
    • METEOR: accounts for synonyms, paraphrases, etc.

  3. Trainable metrics:
    • Towards an Automatic Turing Test [Lowe+ 17]: ADEM, a metric based on a hierarchical RNN (VHRED)

  4. Problem: "How NOT to evaluate dialogue systems" [Liu+ 16] for the dialogue task; but the same problem exists even for the translation task [Graham+ 15].
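A minimal example of computing sentence-level BLEU with NLTK (smoothing is needed for short dialogue responses); this only makes the metric concrete and is not an endorsement of it for dialogue:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["i", "am", "doing", "great", "thanks"]]   # list of reference token lists
hypothesis = ["i", "am", "fine", "thanks"]
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, hypothesis, smoothing_function=smooth)
print(round(score, 4))
```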

Motivation (summary of the challenges above):

  • 1. Challenge: The blandness problem
  • 2. Challenge: The consistency problem
  • 3. Challenge: Long conversational context
  • 4. Challenge: Grounding
  • 5. Reward functions: ease of answering, information flow, meaningfulness (see the formulas above)
  • 6. Datasets
  • 7. Evaluation

In a nutshell:

  • MLE causes blandness (mitigated by MMI)
  • Evaluation metrics (BLEU, METEOR, etc.) reliable only on large datasets ➡️ expensive for optimization (e.g., sequence-level training [Ranzato+ 15])
  • RL reward functions currently too ad-hoc

Open Benchmarks

  • Alexa Challenge (2017-)

– Academic competition, 15 sponsored teams in 2017, 8 in 2018

– $250,000 research grant (2018)

– Proceedings [Ram+ 17]

  • Dialogue System Technology Challenge (DSTC) (2013-) (formerly Dialogue State Tracking Challenge)
    Focused this year on grounded conversation: Visual-Scene [Hori +18], background article [Galley +18]

  • Conversational Intelligence Challenge (ConvAI) (2017-)
    Focused this year on personalized chat (FB Persona-Chat dataset)

Code

Old HRED: python hred_main.py --path=./hred_pretrain/ --w2v=./word2vec/word2vec.128d.117k.bin --emb_dim=128 --max_sent=10 --batch=32 --vsize=30000 -seshid=512 --max_word=50

New HRED: python hred_main.py --path=./hred_pretrain/ --emb_dim=128 --max_sent=10 --batch=32 --vsize=30000 -seshid=512 --max_word=50 --emtraining --model=GRU --lr_p=5 --w2v=./word2vec/word2vec.128d.117k.bin

RL: python train_full_rl.py --path=./saverl_model/ --abs_dir=../fast_rl_init/hred_pretrain --ext_dir=./hred_extractor --lr_p=5 --ckpt_freq=5000 --patience=10