CN113094475B - Dialog intention recognition system and method based on context attention flow - Google Patents
Classifications
- G06F16/3329 — Natural language query formulation
- G06F16/3344 — Query execution using natural language analysis
- G06F16/35 — Clustering; Classification
- G06F40/211 — Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
The invention provides a dialogue intention recognition system and method based on context attention flow. The system comprises an input encoding module, an autocorrelation coefficient analysis module, a feedforward neural network, and a multi-task learning module. The input encoding module encodes an input sentence containing a plurality of words into a corresponding representation vector. The autocorrelation coefficient analysis module concatenates the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and computes above-sentence representation vectors fused with question information; it then performs feature fusion on the above-sentence representation vectors to obtain context-sentence representation vectors fused with dialogue-context information; finally, it performs a dot-product operation on the representation vector of the current sentence and the context-sentence representation vectors to obtain a feature vector for intention recognition. The multi-task learning module optimizes the feature vector according to the system's total loss function, thereby improving the efficiency and accuracy of dialogue intention recognition.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a system and a method for recognizing dialog intentions based on contextual attention flow.
Background
The core functional module of a dialogue robot is intention recognition. The robot must first predict the intention of the dialogue sentence sent by the user and then, based on that intention, send the corresponding answer, thereby completing an online automatic response. A conversation is a multi-turn question-and-answer process, yet current online robots consider only the content of a single sentence when recognizing intention, and many dialogue intentions cannot be recognized from a single sentence alone. As a result, a considerable number of sentence intentions cannot be accurately recognized on a single-sentence basis, causing the robot's question-answer responses to fail.
In order to solve the problem of intention recognition in multi-turn dialogue question answering, industry and academia currently adopt two main types of methods:
Memory-network-based methods: a memory network generally includes an input encoding module, a memory module, and an output prediction module. Such methods maintain a memory-slot space (the memory module) that stores the historical sentences of the conversation, dynamically update the network's memory state by applying an attention mechanism, generate a feature vector from the memory state, and predict the dialogue intention based on that feature vector.
Reading-comprehension-based methods: a reading comprehension model generally uses an encoder to encode the input article and question, obtains word-granularity representations of the article through techniques such as mutual attention between article and question content and self-attention, constructs two prediction heads for the start and end positions, predicts for each word the probability P(start) of being the start position and P(end) of being the end position of the answer, and finally selects the phrase span with the maximum P(start)·P(end) as the answer to the question.
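The span-selection step of such reading-comprehension models can be sketched numerically. This is a minimal illustration only: the probability values and the `best_span` helper are invented for the example, not taken from the patent.

```python
import numpy as np

def best_span(p_start, p_end, max_len=10):
    """Return the (start, end) pair maximizing P(start) * P(end), with start <= end."""
    best, best_score = (0, 0), -1.0
    for i in range(len(p_start)):
        for j in range(i, min(i + max_len, len(p_end))):
            s = p_start[i] * p_end[j]
            if s > best_score:
                best_score, best = s, (i, j)
    return best, best_score

# Made-up per-word probabilities for a 4-word article:
p_start = np.array([0.1, 0.6, 0.2, 0.1])
p_end = np.array([0.1, 0.1, 0.7, 0.1])
span, score = best_span(p_start, p_end)   # words 1..2 form the answer span
```

Enumerating start/end pairs with a length cap keeps the search quadratic in the (short) article length while ruling out degenerate spans.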
When reading-comprehension techniques are applied to the contextual dependencies of multi-turn conversations, the relevant historical sentences can be located accurately; however, the topic articles required for reading comprehension are difficult to obtain in industry, and after the relevant historical sentences are retrieved, a model must still be constructed to fuse their information with the current sentence in order to predict the dialogue intention. Memory-network-based models, in turn, cannot directly select the relevant historical dialogue sentences as the contextual dependency of the conversation, so it is difficult for them to accurately fuse the conversational context into the current sentence. Furthermore, they may repeatedly select the encoded features of certain sentences at each step, so the model cannot adequately attend to other relevant features, impairing its ability to model multi-turn conversations.
Therefore, a scheme is needed to improve the efficiency and accuracy of dialogue intention recognition and enhance the robot's response capability.
Disclosure of Invention
The invention aims to provide a system and a method for recognizing dialog intentions based on contextual attention flow, which are used for realizing the technical effect of improving the efficiency and the accuracy of dialog intention recognition.
In a first aspect, the present invention provides a contextual attention flow based dialog intent recognition system comprising: the system comprises an input coding module, an autocorrelation coefficient analysis module, a feedforward neural network and a multitask learning module;
the input encoding module is used for encoding an input sentence containing a plurality of words to obtain a corresponding representation vector; the input sentences comprise the current sentence and a plurality of historical dialogue sentences from the dialogue sample set whose dialogue intentions and dialogue types are known;
the autocorrelation coefficient analysis module is used for concatenating the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and computing above-sentence representation vectors fused with question information; then performing feature fusion on the above-sentence representation vectors to obtain context-sentence representation vectors fused with dialogue-context information; and finally performing a dot-product operation on the representation vector of the current sentence and the context-sentence representation vectors to obtain a feature vector for intention recognition;
the feedforward neural network is used for processing the feature vector and inputting the processed feature vector into the multi-task learning module;
the multi-task learning module is used for computing a dialogue intention recognition loss function from the processing result of the feedforward neural network and the actual dialogue intention of each historical dialogue sentence; computing a dialogue above-text type recognition loss function from the processing result of the feedforward neural network and the actual type of each historical dialogue sentence; concatenating the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and computing a dialogue intention evidence loss function through a conditional random field; and then computing the system's total loss function from the dialogue intention recognition loss function, the dialogue above-text type recognition loss function, and the dialogue intention evidence loss function, and optimizing the feature vector according to the total loss function;
the calculation mode of the expression vector of the above sentence is as follows:
in the formula,u i 1representing the expression vector of the above sentence; tanh represents a hyperbolic tangent function;W cq andb cq all represent the learning parameters of the above-question attention layer;qrepresenting a current sentence;u i is shown asiA history dialogue statement;Nrepresenting a total number of historical conversational utterances;irepresenting variables with the value range of 1-N;
the calculation mode of the expression vector of the context sentence is as follows:
in the formula,u i 2representing a context sentence representation vector;Nrepresenting a total number of historical conversational utterances;W self ∈Rd×d,Rd×drepresenting a real number matrix with d-dimension of rows and columns, wherein R represents a real number;attn ij representing the attention weight after the normalization processing of the softmax function;score ij an attention weight between the ith and jth history statements above representing the current statement;kis a variable, representing the second in a range of valueskA plurality of;
the calculation mode of the feature vector is as follows:
in the formula,vec feature the feature vector is represented by a vector of features,W qc andb qc all represent questions-the learning parameters of the above attention layers,qwhich is indicative of the current sentence,dotrepresenting a dot product operation.
Further, the autocorrelation coefficient analysis module comprises an above-question attention layer, a self-attention layer, and a question-above attention layer. The above-question attention layer concatenates the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and computes, through a hyperbolic tangent function, above-sentence representation vectors fused with question information. The self-attention layer performs feature fusion on the above-sentence representation vectors through a self-attention mechanism to obtain context-sentence representation vectors fused with historical dialogue-context information. The question-above attention layer performs a dot-product operation on the representation vector of the current sentence and the context-sentence representation vectors to obtain a feature vector for intention recognition.
Further, the multi-task learning module comprises a dialogue intention recognition unit, a dialogue above-text type recognition unit, and a dialogue above-text evidence selection unit. The dialogue intention recognition unit computes the dialogue intention recognition loss function from the processing result of the feedforward neural network and the intention of each historical dialogue sentence; the dialogue above-text type recognition unit computes the dialogue above-text type recognition loss function from the processing result of the feedforward neural network and the type of each historical dialogue sentence; and the dialogue above-text evidence selection unit concatenates the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and computes the dialogue above-text evidence selection loss function from the relevance between the current sentence and each historical dialogue sentence.
Further, the dialog intention recognition unit, the dialog context class recognition unit and the dialog context evidence selection unit are implemented in the following manner:
in the above equation, Loss1 represents the dialog intent recognition Loss function; loss2 represents the class recognition penalty function above the dialog; loss3 represents the evidence selection Loss function above the dialog; crf denotes a conditional random field;ffrepresenting a feed-forward neural network;θ acflow network parameters representing an autocorrelation coefficient analysis module;θ ff a network parameter representing a feedforward neural network;θ crf network parameters representing conditional random fields;qrepresenting a current sentence;u 1,u 2,…,u N representing each historical dialog statement;x k representing the second in a sample set of dialogkA sample is obtained;CErepresenting a cross entropy operation;MLErepresenting a maximum likelihood estimation operation;intent k to representx k A corresponding intent;type k to representx k A corresponding type;tag k to representx k A flag as to whether the current statement is relevant;sel N then the mark of each historical dialogue sentence in the mark sequence is represented, 0 represents irrelevant, and 1 represents relevant; d represents a data set in which each sample is containedx k Corresponding intentionintent k Type (c) oftype k And related indiciatag k 。
Further, the total loss function is calculated as:

min obj = λ_1·Loss1 + λ_2·Loss2 + λ_3·Loss3

where min obj denotes the total loss function; Loss1 denotes the dialogue intention recognition loss function; Loss2 denotes the dialogue above-text type recognition loss function; Loss3 denotes the dialogue above-text evidence selection loss function; and λ_1, λ_2, λ_3 denote hyper-parameters.
In a second aspect, an embodiment of the present invention provides a dialog intention recognition method based on contextual attention flow, which is applied to the dialog intention recognition system described above, and includes:
s1, coding an input sentence containing a plurality of words to obtain a corresponding characterization vector; the input statement comprises a plurality of historical dialogue statements and a current statement;
s2, splicing the characterization vectors of the current statement and the characterization vectors of the historical dialogue statements and then calculating to obtain a previous statement expression vector fused with problem information; then, performing feature fusion according to the above sentence expression vector to obtain a context sentence expression vector fused with historical dialogue context information; finally, performing dot product operation according to the representation vector of the current statement and the representation vector of the context statement to obtain a feature vector for intention identification;
s3, processing the feature vectors through a feedforward neural network and inputting the processed feature vectors into the multi-task learning module;
s4, optimizing the feature vector through a multi-task learning module according to a total loss function of the system;
and S5, analyzing according to the optimized feature vector to obtain the intention of the current statement.
The beneficial effects achievable by the invention are as follows: the system and method for dialogue intention recognition based on context attention flow obtain the feature vector for recognizing the current sentence's intention through the autocorrelation coefficient analysis module and optimize that feature vector through the trained multi-task learning module, thereby improving the efficiency and accuracy of dialogue intention recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic diagram of a topology of a dialog intention recognition system based on contextual attention flow according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a dialog intention recognition method based on contextual attention flow according to an embodiment of the present invention.
Icon: 10-dialog intention recognition system; 100-input coding module; 200-an autocorrelation coefficient analysis module; 300-a feed-forward neural network; 400-multitask learning module.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a topology of a dialog intention recognition system based on contextual attention flow according to an embodiment of the present invention.
In one embodiment, the present invention provides a contextual attention flow based dialog intent recognition system 10 comprising an input encoding module 100, an autocorrelation coefficient analysis module 200, a feedforward neural network 300, and a multitasking learning module 400;
the input encoding module 100 is configured to encode an input sentence containing a plurality of words to obtain a corresponding representation vector; the input sentences comprise the current sentence and a plurality of historical dialogue sentences from the dialogue sample whose dialogue intentions and dialogue types are known;
the autocorrelation coefficient analysis module 200 is configured to concatenate the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and compute above-sentence representation vectors fused with question information; then perform feature fusion on the above-sentence representation vectors to obtain context-sentence representation vectors fused with dialogue-context information; and finally perform a dot-product operation on the representation vector of the current sentence and the context-sentence representation vectors to obtain a feature vector for intention recognition;
the feedforward neural network 300 is used for processing the feature vectors and inputting the processed feature vectors into the multi-task learning module;
the multitask learning module 400 is configured to calculate according to the processing result of the feedforward neural network and the actual dialogue intention of each historical dialogue statement to obtain a corresponding dialogue intention identification loss function; analyzing according to the processing result of the feedforward neural network and the actual type of each historical dialogue statement to obtain a corresponding dialogue upper-text type recognition loss function; simultaneously splicing the characterization vectors of the current statement and the characterization vectors of each historical dialogue statement, and calculating by a conditional random field to obtain a corresponding dialogue intention evidence loss function; and then, calculating to obtain a total loss function of the system according to the conversation intention identification loss function, the conversation upper text type identification loss function and the conversation intention evidence loss function, and optimizing the feature vector according to the total loss function.
Through the embodiment, the complexity of the system is reduced, and the efficiency and the accuracy of dialog intention recognition are improved.
In one embodiment, the input encoding module 100 may use an LSTM-CNN encoder. The encoder first uses a word-embedding layer based on GloVe word vectors to encode a sentence of n words into an n×d matrix, where each word corresponds to a d-dimensional vector in the matrix; an LSTM encoder then reads in the matrix, and the LSTM output is fed into a multi-kernel CNN. The CNN contains convolution kernels of lengths 1, 3, 5, 7, and 9 units; the convolution results of all kernels are concatenated and max-pooled to generate the representation vector of each encoded sentence.
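A minimal numpy sketch of the multi-kernel convolution and max-pooling step follows. The tensor sizes, the shortened kernel-width list, and the `conv1d_maxpool` helper are illustrative assumptions; a real implementation would use a deep-learning framework and include the GloVe embedding and LSTM stages.

```python
import numpy as np

def conv1d_maxpool(H, W):
    """Valid 1-D convolution of H (T x d) with kernel W (w x d x f), max-pooled over time."""
    T, _ = H.shape
    w = W.shape[0]
    outs = np.stack([np.tensordot(H[t:t + w], W, axes=([0, 1], [0, 1]))
                     for t in range(T - w + 1)])  # shape (T - w + 1, f)
    return outs.max(axis=0)                       # shape (f,) after max pooling

rng = np.random.default_rng(0)
T, d, f = 9, 8, 4                   # 9 tokens, 8-dim LSTM states, 4 filters per width
H = rng.standard_normal((T, d))     # stand-in for the LSTM output sequence
widths = [1, 3, 5]                  # the patent lists widths 1, 3, 5, 7, 9; shortened here
vec = np.concatenate([conv1d_maxpool(H, rng.standard_normal((w, d, f)))
                      for w in widths])
```

Concatenating the max-pooled outputs of all widths yields one fixed-length vector per sentence regardless of its token count.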
It should be noted that the input encoding module 100 is not limited to the LSTM-CNN encoder; other encoders, such as a Transformer network, may be used instead.
In one embodiment, the autocorrelation coefficient analysis module 200 comprises an above-question attention layer, a self-attention layer, and a question-above attention layer. The above-question attention layer concatenates the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and computes, through a hyperbolic tangent function, above-sentence representation vectors fused with question information. The self-attention layer performs feature fusion on the above-sentence representation vectors through a self-attention mechanism to obtain context-sentence representation vectors fused with historical dialogue-context information. The question-above attention layer performs a dot-product operation on the representation vector of the current sentence and the context-sentence representation vectors to obtain a feature vector for intention recognition.
Specifically, the above-sentence representation vector is calculated as:

u_i^1 = tanh(W_{cq} [u_i; q] + b_{cq}), i = 1, …, N

where u_i^1 denotes the above-sentence representation vector; tanh denotes the hyperbolic tangent function; W_{cq} and b_{cq} denote the learnable parameters of the above-question attention layer; q denotes the current sentence; u_i denotes the i-th historical dialogue sentence; N denotes the total number of historical dialogue sentences; and i is a variable ranging from 1 to N.
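The above-question attention layer can be sketched as follows. The dimensions and the `above_question_attention` helper name are illustrative; the computation follows the variable definitions above (concatenate each history vector with the current-sentence vector, apply a linear map, then tanh).

```python
import numpy as np

def above_question_attention(U, q, W_cq, b_cq):
    """u_i^1 = tanh(W_cq [u_i; q] + b_cq) for each history sentence vector u_i."""
    Q = np.tile(q, (U.shape[0], 1))                     # repeat q for every u_i
    return np.tanh(np.concatenate([U, Q], axis=1) @ W_cq.T + b_cq)

rng = np.random.default_rng(1)
N, d = 3, 4
U = rng.standard_normal((N, d))        # representation vectors of N history sentences
q = rng.standard_normal(d)             # representation vector of the current sentence
W_cq = rng.standard_normal((d, 2 * d)) # maps the concatenated 2d vector back to d dims
b_cq = rng.standard_normal(d)
U1 = above_question_attention(U, q, W_cq, b_cq)
```

Each row of `U1` is a history-sentence vector already fused with the question information, bounded to (-1, 1) by the tanh.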
The context-sentence representation vector is calculated as:

score_{ij} = (u_i^1)^T W_{self} u_j^1, attn_{ij} = exp(score_{ij}) / Σ_{k=1}^{N} exp(score_{ik}), u_i^2 = Σ_{k=1}^{N} attn_{ik} u_k^1

where u_i^2 denotes the context-sentence representation vector; N denotes the total number of historical dialogue sentences; W_{self} ∈ R^{d×d}, where R^{d×d} denotes a real matrix with d rows and d columns and R denotes the real numbers; attn_{ij} denotes the attention weight after softmax normalization; score_{ij} denotes the attention weight between the i-th and j-th historical sentences above the current sentence; and k is a summation variable ranging from 1 to N.
The feature vector is calculated as:

a_i = dot(q, tanh(W_{qc} u_i^2 + b_{qc})), vec_feature = Σ_{i=1}^{N} softmax(a)_i u_i^2

where vec_feature denotes the feature vector; W_{qc} and b_{qc} denote the learnable parameters of the question-above attention layer; q denotes the current sentence; and dot denotes the dot-product operation.
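The self-attention and question-above steps can be sketched together. The exact pooling the patent uses after the dot-product step is not reproduced in this text, so the `context_features` helper below uses a standard attention-weighted sum as an assumed stand-in.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_features(U1, q, W_self, W_qc, b_qc):
    score = U1 @ W_self @ U1.T               # score_ij between fused history sentences
    attn = softmax(score, axis=1)            # attn_ij: each row sums to 1
    U2 = attn @ U1                           # u_i^2 = sum_k attn_ik * u_k^1
    gate = np.tanh(U2 @ W_qc.T + b_qc) @ q   # dot(q, tanh(W_qc u_i^2 + b_qc)) per i
    return softmax(gate) @ U2                # assumed attention-weighted pooling

rng = np.random.default_rng(2)
N, d = 3, 4
U1 = rng.standard_normal((N, d))  # outputs of the above-question attention layer
q = rng.standard_normal(d)
vec = context_features(U1, q,
                       rng.standard_normal((d, d)),   # W_self
                       rng.standard_normal((d, d)),   # W_qc
                       rng.standard_normal(d))        # b_qc
```

The result is a single d-dimensional feature vector that the feedforward network can consume for intent classification.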
In one embodiment, the multi-task learning module 400 comprises a dialogue intention recognition unit, a dialogue above-text type recognition unit, and a dialogue above-text evidence selection unit. The dialogue intention recognition unit computes the dialogue intention recognition loss function from the processing result of the feedforward neural network and the intention of each historical dialogue sentence; the dialogue above-text type recognition unit computes the dialogue above-text type recognition loss function from the processing result of the feedforward neural network and the type of each historical dialogue sentence; and the dialogue above-text evidence selection unit concatenates the representation vector of the current sentence with the representation vectors of the historical dialogue sentences and computes the dialogue above-text evidence selection loss function from the relevance between the current sentence and each historical dialogue sentence.
Specifically, the dialog intention recognition unit, the dialog context class recognition unit and the dialog context evidence selection unit are implemented in the following manners:
in the above equation, Loss1 represents the dialog intent recognition Loss function; loss2 represents the class recognition penalty function above the dialog; loss3 represents the evidence selection Loss function above the dialog; crf denotes a conditional random field;ffrepresenting a feed-forward neural network;θ acflow network parameters representing an autocorrelation coefficient analysis module;θ ff a network parameter representing a feedforward neural network;θ crf network parameters representing conditional random fields;qrepresenting a current sentence;u 1,u 2,…,u N representing each historical dialog statement;x k representing the second in a sample set of dialogkA sample is obtained;CErepresenting a cross entropy operation;MLErepresenting a maximum likelihood estimation operation;intent k to representx k A corresponding intent;type k to representx k A corresponding type;tag k to representx k A flag as to whether the current statement is relevant;sel N then the mark of each historical dialogue sentence in the mark sequence is represented, 0 represents irrelevant, and 1 represents relevant;d represents a data set in which each sample is containedx k Corresponding intentionintent k Type (c) oftype k And related indiciatag k 。
In one embodiment, the total loss function is calculated as:

min obj = λ_1·Loss1 + λ_2·Loss2 + λ_3·Loss3

where min obj denotes the total loss function; Loss1 denotes the dialogue intention recognition loss function; Loss2 denotes the dialogue above-text type recognition loss function; Loss3 denotes the dialogue above-text evidence selection loss function; and λ_1, λ_2, λ_3 denote hyper-parameters. The values of λ_1, λ_2, λ_3 can be obtained by a hyper-parameter grid search: for example, split a training data set into a training set and a validation set, measure the intention recognition accuracy under different hyper-parameter settings, and select the setting that achieves the highest accuracy on the validation set.
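The grid search over (λ_1, λ_2, λ_3) can be sketched as follows; `fake_eval` is an invented stand-in for the real train-and-validate loop, which would return validation accuracy.

```python
import itertools

def grid_search(candidates, evaluate):
    """Return the (l1, l2, l3) triple scoring highest under `evaluate`."""
    return max(itertools.product(candidates, repeat=3), key=evaluate)

def fake_eval(lams):
    # Stand-in for "train with these loss weights, return validation accuracy";
    # here the best point is placed at (1.0, 0.5, 0.5) by construction.
    l1, l2, l3 = lams
    return -((l1 - 1.0) ** 2 + (l2 - 0.5) ** 2 + (l3 - 0.5) ** 2)

best = grid_search([0.25, 0.5, 1.0], fake_eval)
```

With `c` candidate values per weight, the search trains c³ models; the candidate list is kept small for that reason.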
Through this embodiment, the feature vector used for dialog intent prediction becomes more accurate, thereby improving the accuracy of dialog intent recognition.
Referring to fig. 2, fig. 2 is a flowchart illustrating a dialog intention recognition method based on contextual attention flow according to an embodiment of the present invention.
In one embodiment, the present invention further provides a method for recognizing dialog intention based on contextual attention flow for the above-mentioned dialog intention recognition system, which is described in detail as follows.
S1, encoding an input sentence containing a plurality of words to obtain a corresponding characterization vector; the input sentences comprise a plurality of historical dialog sentences and a current sentence;
S2, splicing the characterization vector of the current sentence with the characterization vectors of the historical dialog sentences, then calculating to obtain above-sentence expression vectors fused with question information; next, performing feature fusion on the above-sentence expression vectors to obtain context sentence expression vectors fused with historical dialog context information; finally, performing a dot product operation on the characterization vector of the current sentence and the context sentence expression vectors to obtain a feature vector for intent recognition;
S3, processing the feature vector through a feedforward neural network and inputting the processed feature vector into the multi-task learning module;
S4, optimizing the feature vector through the multi-task learning module according to the total loss function of the system;
S5, analyzing the optimized feature vector to obtain the intent of the current sentence.
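The attention-flow computation in step S2 can be sketched as follows. This is a minimal toy illustration, not the patented implementation: the learnable parameters W_cq, b_cq, W_self, W_qc, b_qc are omitted (the above-question fusion is approximated here by element-wise addition plus tanh), and all function names and dimensions are illustrative assumptions.

```python
import math

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_flow(q, history):
    # Above-question attention: fuse the current sentence q into each
    # history sentence representation (patent uses tanh(W_cq[...] + b_cq))
    u1 = [tanh_vec([qi + ui for qi, ui in zip(q, u)]) for u in history]
    # Self-attention over the fused history representations:
    # score_ij -> softmax -> weighted sum gives context representations u2
    u2 = []
    for ui in u1:
        scores = [dot(ui, uj) for uj in u1]
        attn = softmax(scores)
        u2.append([sum(a * uk[d] for a, uk in zip(attn, u1))
                   for d in range(len(ui))])
    # Question-above attention: dot product of q with each context
    # representation yields the features used for intent recognition
    return [dot(q, u) for u in u2]

q = [0.1, 0.2, 0.3]                                   # current sentence vector
history = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]          # two history sentences
features = attention_flow(q, history)
print(len(features))  # one feature per history sentence
```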
Through the above process, the efficiency and accuracy of dialog intent recognition are improved.
Further, to address context-dependent intent recognition, the industry currently applies NLI (natural language inference) methods, memory networks, and other approaches. The present invention is compared against the following methods:
- BERT-NLI: uses BERT, an advanced natural language model, as the sentence encoder; the dialog context sentences are spliced together with the current dialog sentence into a single sequence and fed into BERT, whose pooled vector is then used as the feature vector for intent recognition.
- E2EMEM: an end-to-end memory network that forms a closed loop of input, memory update, and output for parameter updating.
- DMN: a dynamic memory network that updates its memory state with a dynamic gating algorithm, continuously refreshing the network's internal memory module.
- KVNet: a key-value network with parameterized key hashing that greatly enlarges the range of fact retrieval and improves retrieval-fusion precision; the dialog context can be regarded as the facts and the current dialog sentence as the retrieval request.
- DANet: a deep dialog-history fusion network that fuses dialog context information into the representation of the current dialog sentence via an attention mechanism, improving dialog intent recognition accuracy.
To test accuracy, we obtained about 900,000 dialogs from Taobao and manually labeled them (dialog intent, dialog type, and facts relevant to the dialog intent). We then trained and tested the method of the present invention (ACFlow) and the five industry methods above, using 90% of the data as the training set and 10% as the test set. The experimental results are shown in Table 1. As can be seen from Table 1, the accuracy of the proposed method is about 6-7% higher than that of the representative methods.
TABLE 1
In summary, embodiments of the present invention provide a dialog intent recognition system and method based on contextual attention flow, comprising an input encoding module, an autocorrelation coefficient analysis module, a feedforward neural network, and a multi-task learning module. The input encoding module encodes an input sentence containing a plurality of words to obtain a corresponding characterization vector; the input sentences comprise a current sentence and a plurality of historical dialog sentences whose dialog intents and dialog types are known in the dialog sample. The autocorrelation coefficient analysis module splices the characterization vector of the current sentence with the characterization vectors of the historical dialog sentences and calculates above-sentence expression vectors fused with question information; it then performs feature fusion on these vectors to obtain context sentence expression vectors fused with dialog context information; finally, it performs a dot product operation on the characterization vector of the current sentence and the context sentence expression vectors to obtain a feature vector for intent recognition. The feedforward neural network processes the feature vector and inputs the processed feature vector into the multi-task learning module, which optimizes the feature vector through the total loss function of the system. The efficiency and accuracy of dialog intent recognition are thereby improved.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (6)
1. A contextual attention flow based dialog intent recognition system comprising: the system comprises an input coding module, an autocorrelation coefficient analysis module, a feedforward neural network and a multitask learning module;
the input coding module is used for coding an input sentence containing a plurality of words to obtain a corresponding representation vector; the input sentences comprise a plurality of historical conversation sentences and current sentences of known conversation intents and conversation types in the conversation sample set;
the autocorrelation coefficient analysis module is used for splicing the characterization vectors of the current statement and the characterization vectors of the historical dialogue statements and then calculating to obtain the expression vector of the previous statement fused with the problem information; then, performing feature fusion according to the above sentence expression vector to obtain a context sentence expression vector fused with the dialogue context information; finally, performing dot product operation according to the representation vector of the current statement and the representation vector of the context statement to obtain a feature vector for intention identification;
the feedforward neural network is used for processing the characteristic vector and inputting the processed characteristic vector into the multi-task learning module;
the multi-task learning module is used for calculating according to the processing result of the feedforward neural network and the actual dialogue intention of each historical dialogue statement to obtain a corresponding dialogue intention identification loss function; analyzing according to the processing result of the feedforward neural network and the actual type of each historical dialogue statement to obtain a corresponding dialogue upper-part type recognition loss function; simultaneously splicing the characterization vectors of the current statement and the characterization vectors of each historical dialogue statement, and calculating by a conditional random field to obtain a corresponding dialogue intention evidence loss function; then, calculating to obtain a total loss function of the system according to the conversation intention identification loss function, the conversation upper text type identification loss function and the conversation intention evidence loss function, and optimizing the feature vector according to the total loss function;
the calculation mode of the expression vector of the above sentence is as follows:
In the formula, u_i^1 denotes the above-sentence expression vector; tanh denotes the hyperbolic tangent function; W_cq and b_cq denote learnable parameters of the above-question attention layer; q denotes the current sentence; u_i denotes the i-th historical dialog sentence; N denotes the total number of historical dialog sentences; and i is a variable ranging from 1 to N;
the calculation mode of the expression vector of the context sentence is as follows:
In the formula, u_i^2 denotes the context sentence expression vector; N denotes the total number of historical dialog sentences; W_self ∈ R^{d×d}, where R^{d×d} denotes a real-valued matrix with d rows and d columns and R denotes the real numbers; attn_ij denotes the attention weight after softmax normalization; score_ij denotes the attention weight between the i-th and j-th above sentences of the current sentence; and k is a variable denoting the k-th element of its value range;
the calculation mode of the feature vector is as follows:
In the formula, vec_feature denotes the feature vector; W_qc and b_qc denote learnable parameters of the question-above attention layer; q denotes the current sentence; and dot denotes the dot product operation.
2. The dialog intent recognition system of claim 1 wherein the autocorrelation coefficient analysis module comprises a above-question attention layer, a self-attention layer, and a question-above attention layer; the upper-question attention layer is used for splicing the representation vectors of the current statement and the representation vectors of the historical dialogue statements and then calculating through a hyperbolic tangent function to obtain an upper statement representation vector fused with question information; the self-attention layer is used for performing feature fusion on the above-mentioned sentence expression vector through a self-attention mechanism to obtain a context sentence expression vector fused with historical dialogue context information; the question-above attention layer is used for performing dot product operation according to the representation vector of the current statement and the representation vector of the context statement to obtain a feature vector for intention identification.
3. The dialog intent recognition system of claim 1 wherein the multitask learning module comprises a dialog intent recognition unit, a dialog context class recognition unit and a dialog context evidence selection unit; the dialogue intention recognition unit is used for calculating according to the processing result of the feedforward neural network and the intention of each historical dialogue statement to obtain a corresponding dialogue intention recognition loss function, and the dialogue upper class recognition unit is used for calculating according to the processing result of the feedforward neural network and the type of each historical dialogue statement to obtain a corresponding dialogue upper class recognition loss function; and the evidence selection unit above the dialog is used for splicing the representation vector of the current statement and the representation vectors of the historical dialog statements and then calculating according to the relevance between the current statement and the historical dialog statements to obtain a corresponding evidence selection loss function above the dialog.
4. The dialog intention recognition system according to claim 3, characterized in that the dialog intention recognition unit, the dialog context class recognition unit and the dialog context evidence selection unit are implemented in the following manner:
In the above equations, Loss1 denotes the dialog intent recognition loss function; Loss2 denotes the dialog-context type recognition loss function; Loss3 denotes the dialog-context evidence selection loss function; crf denotes a conditional random field; ff denotes a feedforward neural network; θ_acflow denotes the network parameters of the autocorrelation coefficient analysis module; θ_ff denotes the network parameters of the feedforward neural network; θ_crf denotes the network parameters of the conditional random field; q denotes the current sentence; u_1, u_2, …, u_N denote the historical dialog sentences; x_k denotes the k-th sample in the dialog sample set; CE denotes the cross-entropy operation; MLE denotes the maximum likelihood estimation operation; intent_k denotes the intent corresponding to x_k; type_k denotes the type corresponding to x_k; tag_k denotes the label indicating whether x_k is relevant to the current sentence; sel_N denotes the label of each historical dialog sentence in the label sequence, where 0 means irrelevant and 1 means relevant; and D denotes the data set containing each sample x_k together with its corresponding intent intent_k, type type_k, and relevance label tag_k.
5. The dialog intent recognition system of claim 4 wherein the total loss function is calculated by:
In the formula, min_obj denotes the total loss function; Loss1 denotes the dialog intent recognition loss function; Loss2 denotes the dialog-context type recognition loss function; Loss3 denotes the dialog-context evidence selection loss function; and λ1, λ2, λ3 denote hyperparameters.
6. A dialog intention recognition method based on contextual attention flow, applied to the dialog intention recognition system according to any one of claims 1 to 5, comprising:
S1, encoding an input sentence containing a plurality of words to obtain a corresponding characterization vector; the input sentences comprise a plurality of historical dialog sentences and a current sentence;
S2, splicing the characterization vector of the current sentence with the characterization vectors of the historical dialog sentences, then calculating to obtain above-sentence expression vectors fused with question information; next, performing feature fusion on the above-sentence expression vectors to obtain context sentence expression vectors fused with historical dialog context information; finally, performing a dot product operation on the characterization vector of the current sentence and the context sentence expression vectors to obtain a feature vector for intent recognition;
S3, processing the feature vector through a feedforward neural network and inputting the processed feature vector into the multi-task learning module;
S4, optimizing the feature vector through the multi-task learning module according to the total loss function of the system;
S5, analyzing the optimized feature vector to obtain the intent of the current sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110634398.4A CN113094475B (en) | 2021-06-08 | 2021-06-08 | Dialog intention recognition system and method based on context attention flow |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110634398.4A CN113094475B (en) | 2021-06-08 | 2021-06-08 | Dialog intention recognition system and method based on context attention flow |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113094475A CN113094475A (en) | 2021-07-09 |
CN113094475B true CN113094475B (en) | 2021-09-21 |
Family
ID=76664440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110634398.4A Active CN113094475B (en) | 2021-06-08 | 2021-06-08 | Dialog intention recognition system and method based on context attention flow |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113094475B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113590798B (en) * | 2021-08-09 | 2024-03-26 | 北京达佳互联信息技术有限公司 | Dialog intention recognition, training method for a model for recognizing dialog intention |
CN113849647B (en) * | 2021-09-28 | 2024-05-31 | 平安科技(深圳)有限公司 | Dialogue identity recognition method, device, equipment and storage medium |
CN114238549A (en) * | 2021-12-15 | 2022-03-25 | 平安科技(深圳)有限公司 | Training method and device of text generation model, storage medium and computer equipment |
CN114818738B (en) * | 2022-03-01 | 2024-08-02 | 达观数据有限公司 | Method and system for identifying intention track of customer service hotline user |
CN114611527B (en) * | 2022-03-01 | 2024-07-19 | 华南理工大学 | Task-oriented dialogue strategy learning method for user personality perception |
CN114420169B (en) * | 2022-03-31 | 2022-06-21 | 北京沃丰时代数据科技有限公司 | Emotion recognition method and device and robot |
CN115129844B (en) * | 2022-07-01 | 2025-02-07 | 北京师范大学 | A problem sustainability evaluation system based on supervised contrastive learning |
CN116822522B (en) * | 2023-06-13 | 2024-05-28 | 连连银通电子支付有限公司 | Semantic analysis method, semantic analysis device, semantic analysis equipment and storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101163010B1 (en) * | 2008-12-15 | 2012-07-09 | 한국전자통신연구원 | Apparatus for online advertisement selecting based on content affective and intend analysis and method thereof |
US10963273B2 (en) * | 2018-04-20 | 2021-03-30 | Facebook, Inc. | Generating personalized content summaries for users |
US10679613B2 (en) * | 2018-06-14 | 2020-06-09 | Accenture Global Solutions Limited | Spoken language understanding system and method using recurrent neural networks |
CN108920622B (en) * | 2018-06-29 | 2021-07-20 | 北京奇艺世纪科技有限公司 | Training method, training device and recognition device for intention recognition |
CN109241255B (en) * | 2018-08-20 | 2021-05-18 | 华中师范大学 | An Intent Recognition Method Based on Deep Learning |
US12087288B2 (en) * | 2018-09-06 | 2024-09-10 | Google Llc | Language understanding and dialogue state tracking in dialogue systems |
CN112699686B (en) * | 2021-01-05 | 2024-03-08 | 浙江诺诺网络科技有限公司 | Semantic understanding method, device, equipment and medium based on task type dialogue system |
CN112800196B (en) * | 2021-01-18 | 2024-03-01 | 南京明略科技有限公司 | FAQ question-answering library matching method and system based on twin network |
-
2021
- 2021-06-08 CN CN202110634398.4A patent/CN113094475B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN113094475A (en) | 2021-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113094475B (en) | Dialog intention recognition system and method based on context attention flow | |
CN113297364B (en) | Natural language understanding method and device in dialogue-oriented system | |
US20220129621A1 (en) | Bert-based machine-learning tool for predicting emotional response to text | |
CN110046221A (en) | A kind of machine dialogue method, device, computer equipment and storage medium | |
CN111061847A (en) | Dialogue generation and corpus expansion method and device, computer equipment and storage medium | |
Chen et al. | Joint multiple intent detection and slot filling via self-distillation | |
CN115186147B (en) | Dialog content generation method and device, storage medium, and terminal | |
CN118227769B (en) | Knowledge graph enhancement-based large language model question-answer generation method | |
CN111177325A (en) | Method and system for automatically generating answers | |
CN116991982B (en) | Interactive dialogue method, device, equipment and storage medium based on artificial intelligence | |
CN115497465A (en) | Voice interaction method and device, electronic equipment and storage medium | |
CN111914553A (en) | Financial information negative subject judgment method based on machine learning | |
CN113420136A (en) | Dialogue method, system, electronic equipment, storage medium and program product | |
CN110489730B (en) | Text processing method, device, terminal and storage medium | |
CN116341651A (en) | Entity recognition model training method and device, electronic equipment and storage medium | |
Rauf et al. | BCE4ZSR: Bi-encoder empowered by teacher cross-encoder for zero-shot cold-start news recommendation | |
CN113688636B (en) | Recommended method, device, computer equipment and storage medium for extended questions | |
CN112667788A (en) | Novel BERTEXT-based multi-round dialogue natural language understanding model | |
CN112818688B (en) | Text processing method, device, equipment and storage medium | |
Shi | E-Commerce Products Personalized Recommendation Based on Deep Learning | |
CN116136868A (en) | Reinforcement learning agent for multidimensional conversational action selection | |
Nishimoto et al. | Dialogue management with deep reinforcement learning: Balancing exploration and exploitation | |
Selamat et al. | Arabic script web documents language identification using decision tree-ARTMAP model | |
US12153884B1 (en) | Advanced transformer architecture with epistemic embedding for enhanced natural language processing | |
Aldabergen et al. | Question answering model construction by using transfer learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||