US20140309990A1 - Semantic re-ranking of nlu results in conversational dialogue applications - Google Patents
- Publication number
- US20140309990A1 (application US14/314,248)
- Authority
- US
- United States
- Prior art keywords
- nlu
- nlu interpretations
- type
- sets
- interpretations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
Definitions
- This application generally relates to natural language processing applications, and more specifically, to identifying and resolving anaphora that occur in conversational dialogue applications.
- Natural Language Processing (NLP) and Natural Language Understanding (NLU) involve using computer processing to extract meaningful information from natural language inputs such as human generated speech and text.
- One recent application of such technology is processing speech and/or text queries in multi-modal conversational dialog applications such as for mobile devices like smartphones.
- FIG. 1 shows some example screen shots of one such conversational dialogue application for a mobile device, Dragon Go!, which processes speech query inputs and obtains simultaneous search results from a variety of top websites and content sources.
- Such conversational dialogue applications require adding a natural language understanding component to an existing web search algorithm in order to extract semantic meaning from the input queries. This can involve using approximate string matching to discover semantic template structures. One or more semantic meanings can be assigned to each semantic template. Parsing rules and classifier training samples can be generated and used to train NLU models that determine query interpretations (sometimes referred to as query intents).
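- As a minimal illustration of the template-matching idea described above (the template strings, intents, and threshold below are hypothetical and not taken from this application), a query can be compared against known semantic templates with approximate string matching and assigned the interpretation of the closest template:

```python
from difflib import SequenceMatcher

# Hypothetical semantic templates mapped to query interpretations (intents).
TEMPLATES = {
    "book a table at <restaurant>": "RESTAURANT_RESERVATION",
    "play music by <artist>": "PLAY_MUSIC",
    "what is the weather in <city>": "WEATHER_QUERY",
}

def interpret(query: str, threshold: float = 0.5):
    """Return the intent of the closest template, or None if nothing is close enough."""
    best_intent, best_score = None, 0.0
    for template, intent in TEMPLATES.items():
        score = SequenceMatcher(None, query.lower(), template).ratio()
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None

print(interpret("book a table at luigi's"))  # RESTAURANT_RESERVATION
```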
- the arrangement may include multiple computer-implemented dialogue components, which may be configured to intercommunicate and use context to narrow down understanding, recognition, and/or reasoning errors.
- a user client may deliver output prompts to a human user and may receive dialogue inputs including speech inputs from the human user.
- An automatic speech recognition (ASR) engine may process the speech inputs to determine corresponding sequences of representative text words.
- a natural language understanding (NLU) engine may process the text words to determine corresponding semantic interpretations.
- a dialogue manager (DM) may generate output prompts and/or respond to the semantic interpretations so as to manage a dialogue process with the human user.
- the dialogue components may share context information with each other using a common context sharing mechanism such that the operation of each dialogue component reflects available context information.
- the context sharing mechanism may be based on key-value pairs including a key element characterizing a specific context type and a value element characterizing a specific context value.
- the context information may include dialog context information reflecting context of the dialogue manager within the dialogue process.
- the dialogue context information may include one or more of:
- the context information may include client context information, for example, reflecting context of the user client within the dialogue process and/or NLU context information reflecting context of the NLU engine within the dialogue process.
- a user client may deliver output prompts to a human user and may receive dialogue inputs from the human user including speech inputs.
- An automatic speech recognition (ASR) engine may process the speech inputs to determine corresponding sequences of representative text words.
- a natural language understanding (NLU) engine may process the text words to determine corresponding NLU-ranked semantic interpretations.
- a semantic re-ranking module may re-rank the NLU-ranked semantic interpretations based on at least one of dialog context information and world knowledge information.
- a dialogue manager may respond to the re-ranked semantic interpretations and may generate output prompts so as to manage a dialogue process with the human user.
- the semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using dialog context information characterized by a context sharing mechanism using key value pairs including a key element characterizing a specific context type and a value element characterizing a specific context value. Additionally or alternatively, the semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using dialogue context information including one or more of: a belief state reflecting collective knowledge accumulated during the dialogue process, an expectation agenda reflecting new information expected by the dialogue manager, a dialogue focus reflecting information most recently prompted by the dialogue manager, and one or more selected items reflecting user dialogue choices needed by the dialogue manager.
- the semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using dialog context information that includes NLU context information reflecting context of the NLU engine within the dialogue process.
- the semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using semantic feature confidence scoring.
- the semantic feature confidence scoring may be combined in a decision tree to re-rank the NLU-ranked semantic interpretations.
- aspects of the disclosure are directed to an automatic conversational system having multiple computer-implemented dialogue components for conducting an automated dialogue process with a human user.
- the system may detect and/or resolve anaphora based on linguistic cues, dialogue context, and/or general knowledge.
- a user client may deliver dialogue output prompts to the human user and may receive dialogue input responses from the human user including speech inputs.
- An automatic speech recognition engine may process the speech inputs to determine corresponding sequences of representative text words.
- a natural language understanding (NLU) processing arrangement may process the dialogue input responses and the text words to determine corresponding semantic interpretations.
- the NLU processing arrangement may include an anaphora processor that may be configured to access one or more information sources characterizing dialogue context, linguistic features, and/or NLU features to identify unresolved anaphora in the text words that need resolution in order to determine a semantic interpretation.
- a dialogue manager may manage the dialogue process with the human user based on the semantic interpretations.
- the anaphora processor may further resolve an identified unresolved anaphora by associating it with a previous concept occurring in the text words.
- the anaphora processor may favor recent actions in the dialogue process, use one or more dialogue scope rules, semantic distance relations, semantic coherence relations, and/or concept default values to resolve an identified unresolved anaphora.
- the system may utilize a client-server architecture, for example, where the user client resides on a mobile device.
- the NLU interpretation selection models may include a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type (e.g., a name/meaning pair type), a specialized NLU interpretation selection model specific to a first set of NLU interpretations type, and a specialized NLU interpretation selection model specific to a second set of NLU interpretations type.
- the second set of NLU interpretations type may be different from the first set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the first set of NLU interpretations type
- the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the second set of NLU interpretations type.
- the generic NLU interpretation selection model may be utilized to process natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
- the type of an N-best of potential semantic interpretations may be used. Additionally or alternatively, the type(s) used might not correspond directly to the N-best but may instead correspond to the input utilized by the semantic re-ranking model. In some embodiments, the whole N-best may be used as input. In some embodiments, the re-ranker model may score the N-best interpretations one by one. Additionally or alternatively, the re-ranker model may work on pairs of interpretations taken from the N-best.
- One common type for semantic interpretations is the value of a semantic slot used to identify the action to be taken by the application. In some embodiments, this common scheme may be enriched by grouping some values with one or more common characteristics together. In some embodiments, this common scheme may be enriched with information about the other semantic slots of the interpretations (e.g., those that do not control action taken by the application).
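- The sketch below illustrates one way such an enriched type scheme could be derived: the type of an N-best list is taken from the action slot of its top interpretation, with related action values grouped together. The slot names, groupings, and helper function are assumptions for illustration only.

```python
# Hypothetical sketch: derive a "type" key for an N-best list of interpretations
# from the value of the action slot, grouping related action values together.
ACTION_GROUPS = {
    "create_meeting": "calendar", "move_meeting": "calendar", "cancel_meeting": "calendar",
    "send_message": "messaging", "read_message": "messaging",
}

def nbest_type(nbest):
    """Return a coarse type for an N-best list based on its top interpretation's action slot."""
    top = nbest[0]                       # interpretations assumed sorted by NLU rank
    action = top.get("INTENTION", "unknown")
    return ACTION_GROUPS.get(action, action)

nbest = [{"INTENTION": "move_meeting", "DATE": "tomorrow"},
         {"INTENTION": "create_meeting", "DATE": "tomorrow"}]
print(nbest_type(nbest))  # calendar
```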
- a plurality of sets of NLU interpretations types may be extracted from a dataset comprising natural language input data.
- Each set of NLU interpretations type of the plurality of sets of NLU interpretations types may be classified as corresponding to a set of NLU interpretations type.
- a group of sets of NLU interpretations type classified as corresponding to the first set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types
- a group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types
- multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types.
- a determination to generate the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be made based on a number of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type.
- a determination to generate the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be made based on a number of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be generated by executing a machine learning algorithm on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be generated by executing the machine learning algorithm on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type.
- the determination to generate a specialized NLU interpretation selection model for a set of NLU interpretations type may be based on variability in the natural language data classified as part of the set of NLU interpretations type (e.g., a higher variability may indicate that more training data should be obtained for the type before a specialized NLU interpretation selection model is generated for the set of NLU interpretations type).
- a determination not to generate a specialized NLU interpretation selection model specific to the set of NLU interpretations type may be made based on a number of sets of NLU interpretations types classified as corresponding to the set of NLU interpretations type.
- each NLU interpretations type corresponding to the set of NLU interpretations type may be added to a common dataset to form a dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type.
- the generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type may be generated by executing a machine learning algorithm on the dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type.
- the natural language input data comprising data corresponding to the first set of NLU interpretations type may be parsed to identify the data corresponding to the first set of NLU interpretations type.
- the natural language input data comprising data corresponding to the second set of NLU interpretations type may be parsed to identify the data corresponding to the second set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be identified for utilization to process the natural language input data comprising the data corresponding to the first set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be identified for utilization to process the natural language input data comprising the data corresponding to the second set of NLU interpretations type.
- the natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be parsed to identify the data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type. Responsive to identifying the data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type, the generic NLU interpretation selection model may be identified for utilization to process the natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
- FIG. 1 depicts example screen shots of a conversational dialog application for a mobile device;
- FIG. 2 depicts an example multi-modal conversational dialog application arrangement that shares context information between components in accordance with one or more example embodiments;
- FIG. 3 depicts an illustrative method, including various example functional steps performed by a context-sharing conversational dialog application, in accordance with one or more example embodiments;
- FIG. 4 depicts an example of an automated conversational dialogue system for performing a semantic re-ranking of NLU results using dialogue context and world knowledge in accordance with one or more example embodiments;
- FIG. 5 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application performing a semantic re-ranking of NLU results using dialogue context and world knowledge, in accordance with one or more example embodiments;
- FIG. 6 depicts an example of an automated conversational dialogue system for identifying and resolving anaphora in accordance with one or more example embodiments;
- FIG. 7 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application identifying and resolving anaphora, in accordance with one or more example embodiments;
- FIG. 8 depicts an illustrative method for generating and utilizing NLU interpretation selection models in accordance with one or more example embodiments;
- FIG. 9 depicts an illustrative method for generating NLU interpretation selection models in accordance with one or more example embodiments;
- FIG. 10 depicts an illustrative method for utilizing NLU interpretation selection models in accordance with one or more example embodiments.
- a conversational dialogue arrangement may be provided which allows the various system components to keep track of dialogue context and share such information with other system components.
- FIG. 2 depicts an example multi-modal conversational dialog application arrangement that shares context information between components in accordance with one or more example embodiments
- FIG. 3 depicts an illustrative method, including various example functional steps performed by a context-sharing conversational dialog application, in accordance with one or more example embodiments.
- a user client 201 may deliver output prompts to a human user, step 301 , and may receive natural language dialogue inputs, including speech inputs, from the human user, step 302 .
- An automatic speech recognition (ASR) engine 202 may process the speech inputs to determine corresponding sequences of representative text words, step 303 .
- a natural language understanding (NLU) engine 203 may process the text words to determine corresponding semantic interpretations, step 304 .
- a dialogue manager (DM) 204 may generate the output prompts and respond to the semantic interpretations so as to manage a dialogue process with the human user, step 305 .
- Context sharing module 205 may provide a common context sharing mechanism so that each of the dialogue components—user client 201 , ASR engine 202 , NLU engine 203 , and dialogue manager 204 —may share context information with each other so that the operation of each dialogue component reflects available context information.
- the context sharing module 205 may manage dialogue context information of the dialogue manager 204 based on maintaining a dialogue belief state that represents the collective knowledge accumulated from the user input throughout the dialogue.
- An expectation agenda may represent what new pieces of information the dialogue manager 204 still expects to collect at any given point in the dialogue process.
- the dialogue focus may represent what specific information the dialogue manager 204 just explicitly requested from the user. Similarly, the dialogue manager 204 may also track the currently selected items, which typically are candidate values among which the user needs to choose: for disambiguation, for selecting a given specific option (one itinerary, one reservation hour, etc.), or for choosing one of multiple possible next actions (“book now”, “modify reservation”, “cancel”, etc.).
- a dialogue context protocol may be defined, for example, as:
- SELECTED_ITEMS: a list of key-value pairs of currently selected concept candidates among which the user needs to pick.
- a dialogue prompt: “do you mean Debbie Sanders or Debbie Xanders?” would yield SELECTED_ITEMS {(CONTACT, Debbie Sanders), (CONTACT, Debbie Xanders)}.
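- A minimal sketch of such a key-value context payload follows; only SELECTED_ITEMS appears verbatim above, while the other key names (FOCUS, BELIEF, EXPECTATION) are assumed from the surrounding description of dialogue focus, belief state, and expectation agenda.

```python
# Hedged sketch of a context-sharing payload built from key-value pairs.
# Only SELECTED_ITEMS appears verbatim in the text above; the other key names
# (FOCUS, BELIEF, EXPECTATION) are assumed for illustration.
dialogue_context = {
    "FOCUS": ("CONTACT", None),                        # concept just requested from the user
    "BELIEF": {"INTENTION": "sendMessage"},            # knowledge collected so far
    "EXPECTATION": ["CONTACT", "MESSAGE_BODY"],        # concepts still expected
    "SELECTED_ITEMS": [("CONTACT", "Debbie Sanders"),  # candidates the user must pick from
                       ("CONTACT", "Debbie Xanders")],
}

# Each component (ASR engine, NLU engine, user client) can read the shared context,
# e.g. to bias recognition or to weight focus/expectation concepts more heavily.
for key, value in dialogue_context.items():
    print(key, "->", value)
```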
- Communicating this dialogue context information back to the NLU engine 203 may enable the NLU engine 203 to weight focus and expectation concepts more heavily. Communicating such dialogue context information back to the ASR engine 202 may allow for smart dynamic optimization of the recognition vocabulary, and communicating it back to the user client 201 may help determine part of the current visual display on that device.
- the context sharing module 205 may also manage visual/client context information of the user client 201 .
- An example of visual context would be when the user looks at a specific day of her calendar application on the visual display of the user client 201 and says: "Book a meeting at 1 pm." She probably means to book it for the date currently in view in the calendar application.
- the user client 201 may also communicate touch input information via the context sharing module 205 to the dialogue manager 204 by sending the semantic interpretations corresponding to the equivalent natural language command. For instance, clicking on a link to “Book now” may translate into INTENTION:confirmBooking.
- the user client 201 may send contextual information by prefixing each such semantic key-value input pair with the keyword CONTEXT. In that case, the dialogue manager 204 may treat this information as “contextual” and may consider it for default values, but not as explicit user input.
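- As a hedged sketch of this convention (the helper name, the colon-separated prefix format, and the example values are hypothetical), a client payload can be split into explicit semantic input and CONTEXT-prefixed defaults:

```python
# Hypothetical sketch: a client touch event is sent as the equivalent semantic
# key-value pair, while CONTEXT-prefixed pairs are kept only as default values.
def split_client_input(pairs):
    explicit, defaults = {}, {}
    for key, value in pairs:
        if key.startswith("CONTEXT:"):
            defaults[key[len("CONTEXT:"):]] = value   # contextual, usable as a default
        else:
            explicit[key] = value                     # treated as explicit user input
    return explicit, defaults

# Example: clicking "Book now" plus the date currently shown in the calendar view.
client_pairs = [("INTENTION", "confirmBooking"), ("CONTEXT:DATE", "2014-06-24")]
print(split_client_input(client_pairs))
```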
- the context sharing module 205 may also manage NLU/general knowledge context with regard to the NLU engine 203. For example, when a person says: "Book a flight to London," it may be safe to assume that the destination is not London, Ontario, but rather that the user most probably means London, UK. Moreover, depending on the user's current location and/or other information in a user profile, it might even be reasonable to propose which specific London airport is most likely.
- the NLU engine 203 may access knowledge databases and return contextual information about concepts that have not been explicitly mentioned in the user's current sentence, and may communicate context by defining complex hierarchical concepts and concept properties (or attributes) associated with a concept.
- ASR and NLU engines process natural language user inputs in isolation, one input at a time. Each engine typically produces a set of output candidates. Each ASR candidate can have multiple semantic interpretations—language is ambiguous and a given sequence of words can mean many different things. A semantic interpretation can be thought of as a set of (possibly hierarchical) semantic slots, each corresponding to a concept in the natural language input.
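- One possible representation of such output, assuming illustrative slot names, is a list of ASR candidates, each carrying NLU-ranked interpretations made of (possibly nested) semantic slots:

```python
# Hedged sketch: one possible representation of ASR/NLU output, where each ASR
# candidate carries NLU-ranked semantic interpretations made of (possibly
# hierarchical) semantic slots. Slot names are illustrative only.
asr_nbest = [
    {
        "text": "book a meeting with debbie at one pm",
        "interpretations": [  # NLU-ranked, best first
            {"INTENTION": "createMeeting",
             "MEETING": {"PARTICIPANT": "Debbie", "TIME": "13:00"}},
            {"INTENTION": "createReminder",
             "REMINDER": {"PERSON": "Debbie", "TIME": "13:00"}},
        ],
    },
]

top_interpretation = asr_nbest[0]["interpretations"][0]
print(top_interpretation["MEETING"]["TIME"])  # 13:00
```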
- the ASR recognition candidates are ranked in terms of acoustic and language model match.
- the ASR engine can be bypassed, which is equivalent to a 1-best high accuracy ASR output.
- the ASR and NLU semantic interpretations typically are ranked by various heuristics ranging from parsing accuracy to semantic model probabilities.
- both the ASR engine and the NLU engine have no notion of conversation history. Their combined semantic interpretation candidates are ranked based on local features only. However, sometimes, knowing what question was asked in the dialogue process (the focus), what information is already known (the belief state), and what other pieces of information can be still expected from the user (the expectation agenda) can influence the likelihood of one interpretation candidate over another. Moreover, having some notion of world knowledge may help make a better informed decision of which of the interpretation candidates is actually correct; for example, knowing that the scheduling of a 13 minute meeting is much less probable than the scheduling of a 30 minute meeting.
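- A toy sketch of how such world knowledge could be combined with NLU scores follows; the duration priors and scores are invented for illustration and are not part of the described system:

```python
# Hedged sketch: combine an NLU score with a world-knowledge prior so that an
# implausible value (a 13-minute meeting) is ranked below a plausible one
# (a 30-minute meeting). The priors and scores are invented for illustration.
DURATION_PRIOR = {13: 0.01, 15: 0.2, 30: 0.5, 60: 0.29}

def rescore(interpretations):
    def score(interp):
        prior = DURATION_PRIOR.get(interp.get("DURATION_MIN"), 0.05)
        return interp["nlu_score"] * prior
    return sorted(interpretations, key=score, reverse=True)

candidates = [{"DURATION_MIN": 13, "nlu_score": 0.55},
              {"DURATION_MIN": 30, "nlu_score": 0.45}]
print(rescore(candidates)[0]["DURATION_MIN"])  # 30
```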
- a human-machine dialogue arrangement with multiple computer-implemented dialogue components that performs a semantic re-ranking of NLU results in conversational applications using dialogue context and world knowledge is provided.
- FIG. 4 depicts an example of an automated conversational dialogue system for performing a semantic re-ranking of NLU results using dialogue context and world knowledge in accordance with one or more example embodiments
- FIG. 5 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application performing a semantic re-ranking of NLU results using dialogue context and world knowledge, in accordance with one or more example embodiments.
- a user client 401 may deliver output prompts to a human user, step 501 , and may receive dialogue inputs from the human user, including speech inputs, step 502 .
- An automatic speech recognition (ASR) engine 402 may process the speech inputs to determine corresponding sequences of representative text words, step 503 .
- a natural language understanding (NLU) engine 403 may process the text words to determine corresponding NLU-ranked semantic interpretations, step 504 .
- a semantic re-ranking module 404 may re-rank the NLU-ranked semantic interpretations based on at least one of dialogue context information 407 and world knowledge information 408 , step 505 .
- a dialogue manager 405 may respond to the re-ranked semantic interpretations and may generate the output prompts so as to manage a dialogue process with the human user, step 506 .
- the semantic re-ranking module 404 may re-rank the N-best NLU-ranked semantic interpretations.
- Dialogue context information 407 may be characterized by a context sharing mechanism using key-value pairs including a key element characterizing a specific context type and a value element characterizing a specific context value, thereby reflecting context of the NLU engine within the dialogue process.
- the dialogue context information 407 may include one or more of:
- semantic re-ranking module 404 may use a machine learning approach to learn a statistical re-ranking model on annotated examples with the semantic slots that a 1-best output should contain.
- a default re-ranking model may be included with the semantic re-ranking module 404, but an application developer may also produce a custom or adapted model using an offline training tool. The application developer may also define rules that would have precedence over the statistical re-ranking model to fix specific cases.
- a set of robust, application independent and language independent confidence features may be computed, including, for example:
- These features can be characterized by a multi-dimensional feature vector to which a polynomial transformation may be applied to produce a prediction target that reflects the adequacy of a given semantic interpretation, based on its similarity to the annotation as measured by the F1-score of their respective lists of associated semantic slots.
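- A minimal sketch of this training step, under assumed feature values and using ordinary least squares as a stand-in for the actual fitting procedure, might look as follows:

```python
import numpy as np

def f1(predicted_slots, annotated_slots):
    """F1-score between an interpretation's slots and the annotated slots."""
    tp = len(set(predicted_slots) & set(annotated_slots))
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_slots)
    recall = tp / len(annotated_slots)
    return 2 * precision * recall / (precision + recall)

def poly_expand(x):
    """Degree-2 polynomial transformation of a feature vector (with a bias term)."""
    x = np.asarray(x, dtype=float)
    quad = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate(([1.0], x, quad))

# Training: fit weights so poly_expand(features) predicts the F1-based adequacy target.
# The two-dimensional feature vectors and the slot annotations are invented examples.
X = np.array([poly_expand(f) for f in [[0.9, 0.1], [0.4, 0.7], [0.2, 0.2]]])
y = np.array([f1(["INTENTION", "DATE"], ["INTENTION", "DATE"]),   # adequate -> 1.0
              f1(["INTENTION"], ["INTENTION", "DATE"]),            # partial  -> ~0.67
              f1(["TIME"], ["INTENTION", "DATE"])])                # wrong    -> 0.0
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# At runtime, the predicted adequacy score can be used to re-rank interpretations.
print(float(poly_expand([0.8, 0.3]) @ weights))
```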
- confidence score computation may be implemented using two types of re-ranking decisions.
- Heuristic weighting may be based on a neural net model that computes feature weights and processes a weighted sum of the features.
- Confidence score features may be combined in a decision tree, and a new ranking of the semantic interpretations may be obtained (e.g., with the most likely one ranked first).
- parsing the decision tree in effect answers a series of questions about the confidence features that are used to compute a confidence-based re-ranking score.
- a question about one feature or one combination of features may be answered to produce a new semantic ranking score for the complete sentence.
- the re-ranked semantic interpretations may then be returned to the dialogue manager.
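- The following sketch shows one way a decision tree could combine confidence features into a re-ranking score; the feature set, the toy training data, and the use of scikit-learn are assumptions rather than the described implementation:

```python
from sklearn.tree import DecisionTreeRegressor

# Hedged sketch: a decision tree combines confidence features into a score used
# to re-rank the N-best semantic interpretations (most likely one first).
# The three features and the toy training data are invented for illustration.
FEATURES = ["nlu_rank_score", "parse_coverage", "context_match"]

train_X = [[0.9, 1.0, 1.0], [0.8, 0.6, 0.0], [0.3, 0.9, 1.0], [0.2, 0.4, 0.0]]
train_y = [1.0, 0.4, 0.8, 0.1]          # adequacy targets (e.g., slot F1 vs. annotation)
tree = DecisionTreeRegressor(max_depth=3).fit(train_X, train_y)

def rerank(nbest_with_features):
    """Return interpretations sorted by the tree's confidence-based score."""
    scores = tree.predict([feats for _, feats in nbest_with_features])
    ranked = sorted(zip(nbest_with_features, scores), key=lambda p: p[1], reverse=True)
    return [interp for (interp, _), _ in ranked]

nbest = [({"INTENTION": "createMeeting"}, [0.7, 0.5, 0.0]),
         ({"INTENTION": "moveMeeting"},   [0.4, 0.9, 1.0])]
print(rerank(nbest)[0]["INTENTION"])
```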
- A dialogue reference to a previously discussed concept is called an anaphora, and a sentence containing such references is called anaphoric.
- the mechanisms by which such references are resolved are referred to as anaphora resolution. For example, suppose a person is placing an order for pizza delivery, and at some point says “make it extra-large.” One could assume that it refers to the pizza size. Yet it could also be the size of a drink, and only the conversational context can help resolve this ambiguity. If the customer had said “make it all dressed,” one would use common world knowledge (only pizzas have the property of being all-dressed) to deduce that the sentence refers to the pizza.
- a generic application-independent algorithm is provided that allows automated conversational dialogue applications to detect and resolve anaphora based on linguistic cues, dialogue context, and/or general knowledge.
- FIG. 6 depicts an example of an automated conversational dialogue system for identifying and resolving anaphora in accordance with one or more example embodiments
- FIG. 7 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application identifying and resolving anaphora, in accordance with one or more example embodiments.
- a user client 601 may deliver output prompts to a human user, step 701 , and may receive natural language dialogue inputs, including speech inputs from the human user, step 702 .
- An automatic speech recognition (ASR) engine 602 may process the speech inputs to determine corresponding sequences of representative text words, step 703 .
- a natural language understanding (NLU) engine 603 may process the text words to determine corresponding semantic interpretations, step 704 .
- the NLU engine 603 may include an anaphora processor 604 that may access different information sources 606 characterizing dialogue context, linguistic features, and NLU features to identify and resolve anaphora in the text words needing resolution, step 705 , in order to determine a semantic interpretation.
- a dialogue manager (DM) 605 may generate the output prompts and may respond to the semantic interpretations so as to manage a dialogue process with the human user, step 706 .
- when the output of the NLU engine 603 contains concepts whose value is "context," this may be a reliable indication that the particular concept needs to be mapped by the anaphora processor 604 to a mention earlier in the conversation.
- the NLU predictions are meant to be more generic: "her" might refer to a MEETING_PARTICIPANT, yet the anaphora processor 604 may map it to PERSON:context. "It" could mean anything and the anaphora processor 604 may map that to CONCEPT:context.
- the information sources 606 that the anaphora processor 604 accesses may also include dialogue context data. Elliptic input queries may be detected when slots that are mandatory at a given point in the dialogue are missing. For example, if the recognized sentence lacks a slot for INTENTION but there are other action parameters (date, time, person, etc.), that would be evidence of context carry-over.
- the information sources 606 may also include linguistic features such as missing verbs (“How about tomorrow?”), elliptic sentences (“not that”), presence of pronouns (“with her”), presence of definite articles (“the pizza”).
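- A crude sketch of such surface checks follows; the word lists are illustrative only and not a language-independent rule set:

```python
# Hedged sketch: crude surface checks for the linguistic cues mentioned above.
# The word lists are illustrative, not an exhaustive or language-independent rule set.
PRONOUNS = {"it", "her", "him", "them", "that", "this"}
COMMON_VERBS = {"book", "make", "send", "call", "is", "are", "do", "have", "want", "schedule"}

def linguistic_anaphora_cues(sentence: str) -> dict:
    tokens = sentence.lower().strip("?.!").split()
    return {
        "has_pronoun": any(t in PRONOUNS for t in tokens),
        "has_definite_article": "the" in tokens,
        "missing_verb": not any(t in COMMON_VERBS for t in tokens),
        "very_short": len(tokens) <= 3,   # elliptic inputs like "not that"
    }

print(linguistic_anaphora_cues("How about tomorrow?"))
print(linguistic_anaphora_cues("with her"))
```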
- the dialogue manager 605 may keep track of the dialogue history, recording each step in the dialogue (user input, system prompt) along with the set of currently selected items (or search results) at each step and the current belief state (the collected values that define the query).
- the anaphora processor 604 may iterate through the dialogue history, starting from the previous user query and working back toward earlier interactions, and may compute a correlation measure optimized over data examples.
- the anaphora processor 604 may base this correlation measure on various features such as:
- if the correlation measure is sufficiently high, the anaphora processor 604 may accept the resolution.
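- The sketch below illustrates this history scan under assumed scoring weights and an assumed concept-compatibility table; it favors recent turns and type-compatible antecedents and accepts the best candidate above a threshold:

```python
# Hedged sketch: walk the dialogue history from most recent to oldest, score each
# candidate antecedent with a simple correlation measure (recency plus concept-type
# compatibility), and accept the best candidate above a threshold. The weights and
# the compatibility table are invented for illustration.
COMPATIBLE = {"PERSON": {"CONTACT", "MEETING_PARTICIPANT"},
              "CONCEPT": {"PIZZA", "DRINK", "CONTACT", "MEETING_PARTICIPANT"}}

def resolve_anaphora(unresolved_type, history, threshold=0.5):
    best, best_score = None, 0.0
    for age, turn in enumerate(reversed(history)):       # age 0 = previous user query
        for concept_type, value in turn["concepts"]:
            recency = 1.0 / (1 + age)                     # favor recent actions
            compatible = concept_type in COMPATIBLE.get(unresolved_type, set())
            score = 0.6 * recency + 0.4 * (1.0 if compatible else 0.0)
            if compatible and score > best_score:
                best, best_score = (concept_type, value), score
    return best if best_score >= threshold else None

history = [{"concepts": [("PIZZA", "large pepperoni")]},
           {"concepts": [("CONTACT", "Debbie Sanders")]}]
print(resolve_anaphora("PERSON", history))  # ('CONTACT', 'Debbie Sanders')
```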
- a dialogue context protocol may be defined to include one or more sets of NLU interpretations types (e.g., a BELIEF that comprises one or more name/meaning or name/value pairs).
- multiple NLU interpretation selection models may be generated.
- the NLU interpretation selection models may include a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type and one or more specialized NLU interpretation selection models, each of which may be specific to a particular set of NLU interpretations type.
- the specialized NLU interpretation selection model(s) may be utilized to process natural language input data comprising data corresponding to their respective sets of NLU interpretations type(s).
- the generic NLU interpretation selection model may be utilized to process natural language input data comprising data that does not correspond to the sets of NLU interpretations type(s) associated with the specialized NLU interpretation selection model(s).
- FIG. 8 depicts an illustrative method for generating and utilizing NLU interpretation selection models in accordance with one or more example embodiments.
- multiple NLU interpretation selection models may be generated. For example, as will be described in greater detail below, a plurality of sets of NLU interpretations types may be identified within a dataset comprising natural language input data (e.g., training, validation, and/or test data). A determination may be made (e.g., based on a number of sets of NLU interpretations types in the dataset that correspond to a given set of NLU interpretations type) to generate one or more specialized NLU interpretation selection models, each of which may be specific to a particular set of NLU interpretations type.
- a specialized NLU interpretation selection model specific to a first set of NLU interpretations type may be generated, and a specialized NLU interpretation selection model specific to a second set of NLU interpretations type may also be generated.
- the second set of NLU interpretations type may be different from the first set of NLU interpretations type.
- a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type may also be generated, for example, based on sets of NLU interpretations types in the dataset that correspond to neither the first set of NLU interpretations type nor the second set of NLU interpretations type (e.g., sets of NLU interpretations types with a lower number of occurrences in the dataset than the number of occurrences of sets of NLU interpretations types in the dataset that correspond to the first set of NLU interpretations type and/or the number of occurrences of sets of NLU interpretations types in the dataset that correspond to the second set of NLU interpretations type).
- the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the first set of NLU interpretations type.
- natural language input data may be parsed to identify sets of NLU interpretations types within the data, each of which may be categorized as corresponding to a set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be utilized to process a portion of the natural language input data comprising sets of NLU interpretations types that correspond to the first set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the second set of NLU interpretations type.
- natural language input data may be parsed to identify sets of NLU interpretations types within the data, each of which may be categorized as corresponding to a set of NLU interpretations type.
- the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be utilized to process a portion of the natural language input data comprising sets of NLU interpretations types that correspond to the second set of NLU interpretations type.
- the generic NLU interpretation selection model may be utilized to process natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
- the generic NLU interpretation selection model may be utilized to process portions of the natural language input data that comprise data including sets of NLU interpretations types categorized as corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
- FIG. 9 depicts an illustrative method for generating NLU interpretation selection models in accordance with one or more example embodiments.
- a plurality of sets of NLU interpretations types may be extracted from a dataset comprising natural language input data.
- a plurality of sets of NLU interpretations types may be extracted from a dataset comprising natural language input data (e.g., training, validation, and/or test data).
- each set of NLU interpretations types of the plurality of sets of NLU interpretations types may be classified as corresponding to a set of NLU interpretations type (e.g., based on a possible associated application and/or interpretation).
- a group of sets of NLU interpretations types may be classified as corresponding to a first set of NLU interpretations type, a group of sets of NLU interpretations types may be classified as corresponding to a second set of NLU interpretations type, and one or more other sets of NLU interpretations types may be classified as corresponding to one or more other sets of NLU interpretations types (e.g., sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- each of the groups of sets of NLU interpretations types may be identified (e.g., based on their classified sets of NLU interpretations types).
- the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types in the dataset
- the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types in the dataset
- multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types in the dataset.
- a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, and the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- a determination may be made whether to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type that remains to be processed. For example, a determination may be made to generate a specialized NLU interpretation selection model specific to the first set of NLU interpretations type (e.g., based on a number of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type).
- the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be generated by executing a machine learning algorithm on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type.
- the method may then return to step 908 , and a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type and the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- a determination may be made whether to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type that remains to be processed. For example, a determination may be made to generate a specialized NLU interpretation selection model specific to the second set of NLU interpretations type (e.g., based on a number of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type).
- the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be generated by executing a machine learning algorithm (e.g., the same machine learning algorithm or a different machine learning algorithm) on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type.
- the method may then return to step 908 , and a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- a determination may be made whether to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type that remains to be processed. For example, a determination may be made not to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type corresponding to a group of the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type (e.g., based on a number of sets of NLU interpretations types classified as corresponding to the set of NLU interpretations type).
- each NLU interpretations type corresponding to the set of NLU interpretations type may be added to a common dataset to form a dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type (e.g., a generic dataset).
- the method may then return to step 908 , and a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., any remaining groups of the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- Steps 910 , 916 , and 908 may be repeated for each group of the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, until a determination is made, at step 908 , that no more groups of sets of NLU interpretations types remain to be processed, at which point the method may proceed to step 914 .
- a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type may be generated by executing a machine learning algorithm (e.g., the same machine learning algorithm or a different machine learning algorithm) on the generic dataset (e.g., the dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type).
- FIG. 10 depicts an illustrative method for utilizing NLU interpretation selection models in accordance with one or more example embodiments.
- natural language input data may be received.
- natural language input data comprising data corresponding to the first set of NLU interpretations type, data corresponding to the second set of NLU interpretations type, and data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be received.
- the received natural language input data may be parsed to identify sets of NLU interpretations types, each of which may be categorized as corresponding to a specific set of NLU interpretations type (e.g., the first set of NLU interpretations type, the second set of NLU interpretations type, or a set of NLU interpretations type other than the first set of NLU interpretations type and the second set of NLU interpretations type).
- the natural language input data comprising data corresponding to the first set of NLU interpretations type, data corresponding to the second set of NLU interpretations type, and data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be parsed to identify sets of NLU interpretations types, each of which may be categorized as corresponding to a specific set of NLU interpretations type (e.g., sets of NLU interpretations types within the data corresponding to the first set of NLU interpretations type may be identified and categorized as corresponding to the first set of NLU interpretations type, sets of NLU interpretations types within the data corresponding to the second set of NLU interpretations type may be identified and categorized as corresponding to the second set of NLU interpretations type, and sets of NLU interpretations types within the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be identified and categorized as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- Data may be identified for processing (e.g., the data corresponding to the first set of NLU interpretations type, the data corresponding to the second set of NLU interpretations type, or the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type), and, at step 1006 , a determination may be made as to whether a specialized NLU interpretation selection model exists for the data identified for processing.
- the data corresponding to the first set of NLU interpretations type may be identified for processing; at step 1006 , a determination may be made that a specialized NLU interpretation selection model exists for the data corresponding to the first set of NLU interpretations type (e.g., the specialized NLU interpretation selection model specific to the first set of NLU interpretations type); at step 1008 , the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the first set of NLU interpretations type) may be identified for processing the data corresponding to the first set of NLU interpretations type; and, at step 1010 , the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the first set of NLU interpretations type) may be utilized to process the data corresponding to the first set of NLU interpretations type.
- the data corresponding to the second set of NLU interpretations type may be identified for processing; at step 1006 , a determination may be made that a specialized NLU interpretation selection model exists for the data corresponding to the second set of NLU interpretations type (e.g., the specialized NLU interpretation selection model specific to the second set of NLU interpretations type); at step 1008 , the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the second set of NLU interpretations type) may be identified for processing the data corresponding to the second set of NLU interpretations type; and, at step 1010 , the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the second set of NLU interpretations type) may be utilized to process the data corresponding to the second set of NLU interpretations type.
- the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be identified for processing; at step 1006 , a determination may be made that a specialized NLU interpretation selection model does not exist for the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type; at step 1012 , the generic NLU interpretation selection model may be identified for processing the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type; and, at step 1014 , the generic NLU interpretation selection model may be utilized to process the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
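- A short sketch of this routing step, reusing the models from the previous sketch and a hypothetical featurize helper, might look as follows:

```python
import numpy as np

def select_interpretation(nbest, nlu_type, specialized_models, generic_model, featurize):
    """Route the N-best to the specialized model for its type if one exists, otherwise
    to the generic model, and return the highest-scoring interpretation. `featurize`
    (hypothetical) turns an interpretation into a feature vector for the classifier."""
    model = specialized_models.get(nlu_type, generic_model)
    # Probability of the "correct interpretation" class, assuming a binary classifier.
    scores = model.predict_proba([featurize(interp) for interp in nbest])[:, 1]
    return nbest[int(np.argmax(scores))]
```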
- One or more embodiments may be implemented in any conventional computer programming language.
- embodiments may be implemented in a procedural programming language (e.g., “C”) or an object-oriented programming language (e.g., “C++”, Python).
- Some embodiments may be implemented as pre-programmed hardware elements, other related components, or as a combination of hardware and software components.
- Embodiments can be implemented as a computer program product for use with a computer system.
- Such implementations may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium.
- the medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques).
- the series of computer instructions may embody all or part of the functionality previously described herein with respect to the system.
- Such computer instructions may be written in a number of programming languages for use with one or more computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical, or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies.
- a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over a network (e.g., the Internet or World Wide Web).
- Some embodiments may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments may be implemented as entirely hardware, or entirely software (e.g., a computer program product).
- a described “process” is the performance of a described function in a computer using computer hardware (such as a processor, field-programmable gate array, or other electronic combinatorial logic, or similar device), which may be operating under control of software or firmware or a combination of any of these or operating outside control of any of the foregoing. All or part of the described function may be performed by active or passive electronic components, such as transistors or resistors. Use of the term “process” does not necessarily imply a schedulable entity, although, in some embodiments, a process may be implemented by such a schedulable entity.
- a “process” may be implemented using more than one processor or more than one (single- or multi-processor) computer and it may be an instance of a computer program or an instance of a subset of the instructions of a computer program.
- One or more aspects of the disclosure may be embodied in computer-usable data or computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices to perform the operations described herein.
- program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types when executed by one or more processors in a computer or other data processing device.
- the computer-executable instructions may be stored on a computer-readable medium such as a hard disk, optical disk, removable storage media, solid-state memory, RAM, and the like.
- the functionality of the program modules may be combined or distributed as desired in various embodiments.
- the functionality may be embodied in whole or in part in firmware or hardware equivalents, such as integrated circuits, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGA), and the like.
- Particular data structures may be used to more effectively implement one or more aspects of the disclosure, and such data structures are contemplated to be within the scope of computer executable instructions and computer-usable data described herein.
- aspects described herein may be embodied as a method, an apparatus, or as one or more computer-readable media storing computer-executable instructions. Accordingly, those aspects may take the form of an entirely hardware embodiment, an entirely software embodiment, an entirely firmware embodiment, or an embodiment combining software, hardware, and firmware aspects in any combination.
- various signals representing data or events as described herein may be transferred between a source and a destination in the form of light or electromagnetic waves traveling through signal-conducting media such as metal wires, optical fibers, or wireless transmission media (e.g., air or space).
- the one or more computer-readable media may comprise one or more non-transitory computer-readable media.
- the various methods and acts may be operative across one or more computing devices and one or more networks.
- the functionality may be distributed in any manner, or may be located in a single computing device (e.g., a server, a client computer, or the like).
Abstract
Description
- This application is a continuation-in-part of and claims priority to U.S. patent application Ser. No. 13/793,854, filed Mar. 11, 2013, and entitled “Semantic Re-Ranking of NLU Results in Conversational Dialogue Applications,” the disclosure of which is incorporated by reference herein in its entirety.
- This application generally relates to natural language processing applications, and more specifically, to identifying and resolving anaphora that occur in conversational dialogue applications.
- Natural Language Processing (NLP) and Natural Language Understanding (NLU) involve using computer processing to extract meaningful information from natural language inputs such as human generated speech and text. One recent application of such technology is processing speech and/or text queries in multi-modal conversational dialog applications such as for mobile devices like smartphones.
-
FIG. 1 shows some example screen shots of one such conversational dialogue application for a mobile device, Dragon Go!, which processes speech query inputs and obtains simultaneous search results from a variety of top websites and content sources. Such conversational dialogue applications require adding a natural language understanding component to an existing web search algorithm in order to extract semantic meaning from the input queries. This can involve using approximate string matching to discover semantic template structures. One or more semantic meanings can be assigned to each semantic template. Parsing rules and classifier training samples can be generated and used to train NLU models that determine query interpretations (sometimes referred to as query intents). - In a typical conversational dialog application, there are several interconnected components:
-
- the dialogue manager (DM), which decides what the next action should be after each user input,
- the automatic speech recognition engine (ASR), which translates spoken utterances into sequences of text words,
- the natural language understanding engine (NLU), which maps the words into semantic interpretations, or concepts, and
- the client, typically the component which resides on a mobile device or embedded platform and deals with visual displays and touch input.
- The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosure. This summary is not an extensive overview of the disclosure. It is intended neither to identify key or critical elements of the disclosure nor to delineate the scope of the disclosure. The following summary merely presents some concepts of the disclosure in a simplified form as a prelude to the description below.
- Aspects of the disclosure are directed to a human-machine dialogue arrangement. In some embodiments, the arrangement may include multiple computer-implemented dialogue components, which may be configured to intercommunicate and use context to narrow down understanding, recognition, and/or reasoning errors. A user client may deliver output prompts to a human user and may receive dialogue inputs including speech inputs from the human user. An automatic speech recognition (ASR) engine may process the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) engine may process the text words to determine corresponding semantic interpretations. A dialogue manager (DM) may generate output prompts and/or respond to the semantic interpretations so as to manage a dialogue process with the human user. The dialogue components may share context information with each other using a common context sharing mechanism such that the operation of each dialogue component reflects available context information.
- In some embodiments, the context sharing mechanism may be based on key value pairs including a key element characterizing a specific context type and a value element characterizing a specific context value. The context information may include dialog context information reflecting context of the dialogue manager within the dialogue process. For example, the dialogue context information may include one or more of:
-
- a belief state reflecting collective knowledge accumulated during the dialogue process,
- an expectation agenda reflecting new information expected by the dialogue manager,
- a dialogue focus reflecting information most recently prompted by the dialogue manager, and
- one or more selected items reflecting user dialogue choices needed by the dialogue manager.
- In some embodiments, the context information may include client context information, for example, reflecting context of the user client within the dialogue process and/or NLU context information reflecting context of the NLU engine within the dialogue process.
- Aspects of the disclosure are directed to a human-machine dialogue arrangement with multiple computer-implemented dialogue components that may perform a semantic re-ranking of NLU results in conversational applications using dialogue context and world knowledge. A user client may deliver output prompts to a human user and may receive dialogue inputs from the human user including speech inputs. An automatic speech recognition (ASR) engine may process the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) engine may process the text words to determine corresponding NLU-ranked semantic interpretations. A semantic re-ranking module may re-rank the NLU-ranked semantic interpretations based on at least one of dialog context information and world knowledge information. A dialogue manager may respond to the re-ranked semantic interpretations and may generate output prompts so as to manage a dialogue process with the human user.
- In some embodiments, the semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using dialog context information characterized by a context sharing mechanism using key value pairs including a key element characterizing a specific context type and a value element characterizing a specific context value. Additionally or alternatively, the semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using dialogue context information including one or more of: a belief state reflecting collective knowledge accumulated during the dialogue process, an expectation agenda reflecting new information expected by the dialogue manager, a dialogue focus reflecting information most recently prompted by the dialogue manager, and one or more selected items reflecting user dialogue choices needed by the dialogue manager.
- In some embodiments, the semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using dialog context information that includes NLU context information reflecting context of the NLU engine within the dialogue process. The semantic re-ranking module may re-rank the NLU-ranked semantic interpretations using semantic feature confidence scoring. For example, in some embodiments, the semantic feature confidence scoring may be combined in a decision tree to re-rank the NLU-ranked semantic interpretations.
- Aspects of the disclosure are directed to an automatic conversational system having multiple computer-implemented dialogue components for conducting an automated dialogue process with a human user. In some embodiments, the system may detect and/or resolve anaphora based on linguistic cues, dialogue context, and/or general knowledge. A user client may deliver dialogue output prompts to the human user and may receive dialogue input responses from the human user including speech inputs. An automatic speech recognition engine may process the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) processing arrangement may process the dialogue input responses and the text words to determine corresponding semantic interpretations. In some embodiments, the NLU processing arrangement may include an anaphora processor that may be configured to access one or more information sources characterizing dialogue context, linguistic features, and/or NLU features to identify unresolved anaphora in the text words that need resolution in order to determine a semantic interpretation. A dialogue manager may manage the dialogue process with the human user based on the semantic interpretations.
- In some embodiments, the anaphora processor may further resolve an identified unresolved anaphora by associating it with a previous concept occurring in the text words. For example, the anaphora processor may favor recent actions in the dialogue process, use one or more dialogue scope rules, semantic distance relations, semantic coherence relations, and/or concept default values to resolve an identified unresolved anaphora.
- In some embodiments, the system may utilize a client-server architecture, for example, where the user client resides on a mobile device.
- In accordance with one or more embodiments, multiple NLU interpretation selection models may be generated. The NLU interpretation selection models may include a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type (e.g., a name/meaning pair type), a specialized NLU interpretation selection model specific to a first set of NLU interpretations type, and a specialized NLU interpretation selection model specific to a second set of NLU interpretations type. The second set of NLU interpretations type may be different from the first set of NLU interpretations type. The specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the first set of NLU interpretations type, and the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the second set of NLU interpretations type. The generic NLU interpretation selection model may be utilized to process natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
- In some embodiments, the term N-best of potential semantic interpretations type may be used. Additionally or alternatively, the type(s) used might not correspond directly to the N-best but may instead correspond to the input utilized by the semantic re-ranking model. In some embodiments, the whole N-best may be used as input. In some embodiments, the interpretation selection model may score the N-best entries one by one. Additionally or alternatively, the re-ranker model may work on pairs of interpretations taken from the N-best. One common type for semantic interpretations is the value of a semantic slot used to identify the action to be taken by the application. In some embodiments, this common scheme may be enriched by grouping together values that share one or more common characteristics. In some embodiments, this common scheme may be enriched with information about the other semantic slots of the interpretations (e.g., those that do not control the action taken by the application).
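- As a hedged illustration of the two scoring styles just mentioned (one-by-one scoring of the N-best versus pairwise comparison of interpretations taken from the N-best), consider the sketch below. The scoring functions are hypothetical stand-ins for trained models and are not part of the disclosure.

```python
# Hedged sketch: two ways a selection model may consume the N-best list.
from functools import cmp_to_key

def score_one(interp):
    # One-by-one scoring: each interpretation receives an independent score.
    return interp["nlu_score"]

def prefer(a, b):
    # Pairwise re-ranking: compare two interpretations drawn from the N-best.
    return -1 if a["nlu_score"] >= b["nlu_score"] else 1

nbest = [{"intent": "new_meeting", "nlu_score": 0.6},
         {"intent": "new_email", "nlu_score": 0.4}]

ranked_pointwise = sorted(nbest, key=score_one, reverse=True)
ranked_pairwise = sorted(nbest, key=cmp_to_key(prefer))
```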
- In some embodiments, a plurality of sets of NLU interpretations types may be extracted from a dataset comprising natural language input data. Each set of NLU interpretations type of the plurality of sets of NLU interpretations types may be classified as corresponding to a set of NLU interpretations type. A group of sets of NLU interpretations type classified as corresponding to the first set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types, a group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types, and multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types.
- In some embodiments, a determination to generate the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be made based on a number of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type. Similarly, a determination to generate the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be made based on a number of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type. Responsive to determining to generate the specialized NLU interpretation selection model specific to the first set of NLU interpretations type, the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be generated by executing a machine learning algorithm on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type. Similarly, responsive to determining to generate the specialized NLU interpretation selection model specific to the second set of NLU interpretations type, the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be generated by executing the machine learning algorithm on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type. In some embodiments, the determination to generate a specialized NLU interpretation selection model for a set of NLU interpretations type may be based on variability in the natural language data classified as part of the set of NLU interpretations type (e.g., a higher variability may indicate that more training data should be obtained for the type before a specialized NLU interpretation selection model is generated for the set of NLU interpretations type).
- In some embodiments, for each set of NLU interpretations type of the sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, a determination not to generate a specialized NLU interpretation selection model specific to the set of NLU interpretations type may be made based on a number of sets of NLU interpretations types classified as corresponding to the set of NLU interpretations type. For each set of NLU interpretations type of the sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, responsive to determining not to generate the specialized NLU interpretation selection model specific to the set of NLU interpretations type, each NLU interpretations type corresponding to the set of NLU interpretations type may be added to a common dataset to form a dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type. The generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type may be generated by executing a machine learning algorithm on the dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type.
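- For illustration only, the partitioning and training procedure described above might look like the following sketch. The count threshold, the train() helper, and the type labels are assumptions made for the example and are not required by the disclosure.

```python
# Hedged sketch: group training examples by their set-of-NLU-interpretations
# type, train a specialized selection model for each type with enough data,
# and pool the remaining data into a common dataset for the generic model.
from collections import defaultdict

MIN_EXAMPLES = 1000  # hypothetical threshold for building a specialized model

def train(examples):
    # Placeholder for the machine learning algorithm executed on the dataset.
    return {"trained_on": len(examples)}

def build_models(dataset):
    # dataset: iterable of (type_label, example) pairs.
    groups = defaultdict(list)
    for type_label, example in dataset:
        groups[type_label].append(example)

    specialized, generic_pool = {}, []
    for type_label, examples in groups.items():
        if len(examples) >= MIN_EXAMPLES:
            # Specialized model trained only on its own group's data.
            specialized[type_label] = train(examples)
        else:
            # Under-represented types are added to the common dataset.
            generic_pool.extend(examples)
    generic = train(generic_pool)
    return specialized, generic
```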
- In some embodiments, the natural language input data comprising data corresponding to the first set of NLU interpretations type may be parsed to identify the data corresponding to the first set of NLU interpretations type. Similarly, the natural language input data comprising data corresponding to the second set of NLU interpretations type may be parsed to identify the data corresponding to the second set of NLU interpretations type. Responsive to identifying the data corresponding to the first set of NLU interpretations type, the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be identified for utilization to process the natural language input data comprising the data corresponding to the first set of NLU interpretations type. Similarly, responsive to identifying the data corresponding to the second set of NLU interpretations type, the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be identified for utilization to process the natural language input data comprising the data corresponding to the second set of NLU interpretations type.
- In some embodiments, the natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be parsed to identify the data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type. Responsive to identifying the data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type, the generic NLU interpretation selection model may be identified for utilization to process the natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
- Other details and features will be described in the sections that follow.
- The present disclosure is pointed out with particularity in the appended claims. Features of the disclosure will become more apparent upon a review of this disclosure in its entirety, including the drawing figures provided herewith.
- Some features herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and wherein:
- FIG. 1 depicts example screen shots of a conversational dialog application for a mobile device;
- FIG. 2 depicts an example multi-modal conversational dialog application arrangement that shares context information between components in accordance with one or more example embodiments;
- FIG. 3 depicts an illustrative method, including various example functional steps performed by a context-sharing conversational dialog application, in accordance with one or more example embodiments;
- FIG. 4 depicts an example of an automated conversational dialogue system for performing a semantic re-ranking of NLU results using dialogue context and world knowledge in accordance with one or more example embodiments;
- FIG. 5 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application performing a semantic re-ranking of NLU results using dialogue context and world knowledge, in accordance with one or more example embodiments;
- FIG. 6 depicts an example of an automated conversational dialogue system for identifying and resolving anaphora in accordance with one or more example embodiments;
- FIG. 7 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application identifying and resolving anaphora, in accordance with one or more example embodiments;
- FIG. 8 depicts an illustrative method for generating and utilizing NLU interpretation selection models in accordance with one or more example embodiments;
- FIG. 9 depicts an illustrative method for generating NLU interpretation selection models in accordance with one or more example embodiments; and
- FIG. 10 depicts an illustrative method for utilizing NLU interpretation selection models in accordance with one or more example embodiments.
- In the following description of various illustrative embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown, by way of illustration, various embodiments in which aspects of the disclosure may be practiced. It is to be understood that other embodiments may be utilized, and structural and functional modifications may be made, without departing from the scope of the present disclosure.
- It is noted that various connections between elements are discussed in the following description. It is noted that these connections are general and, unless specified otherwise, may be direct or indirect, wired or wireless, and that the specification is not intended to be limiting in this respect.
- Dialogue Context Sharing
- In traditional conversational dialog applications, all of the components function in a context-less mode: each user input is recognized and understood in isolation, and deciding what the next step should be is done by taking into account only the current state of a given component and the last user input. But human reasoning and natural language understanding rely heavily on using dialogue context information such as conversation history, visual cues, user profile, world knowledge, etc. In accordance with aspects of the disclosure, a conversational dialogue arrangement is provided, which allows the various system components to keep track of dialogue context and share such information with other system components.
- FIG. 2 depicts an example multi-modal conversational dialog application arrangement that shares context information between components in accordance with one or more example embodiments, and FIG. 3 depicts an illustrative method, including various example functional steps performed by a context-sharing conversational dialog application, in accordance with one or more example embodiments. A user client 201 may deliver output prompts to a human user, step 301, and may receive natural language dialogue inputs, including speech inputs, from the human user, step 302. An automatic speech recognition (ASR) engine 202 may process the speech inputs to determine corresponding sequences of representative text words, step 303. A natural language understanding (NLU) engine 203 may process the text words to determine corresponding semantic interpretations, step 304. A dialogue manager (DM) 204 may generate the output prompts and respond to the semantic interpretations so as to manage a dialogue process with the human user, step 305. Context sharing module 205 may provide a common context sharing mechanism so that each of the dialogue components—user client 201, ASR engine 202, NLU engine 203, and dialogue manager 204—may share context information with each other so that the operation of each dialogue component reflects available context information.
- For example, the context sharing module 205 may manage dialogue context information of the dialogue manager 204 based on maintaining a dialogue belief state that represents the collective knowledge accumulated from the user input throughout the dialogue. An expectation agenda may represent what new pieces of information the dialogue manager 204 still expects to collect at any given point in the dialogue process. The dialogue focus may represent what specific information the dialogue manager 204 just explicitly requested from the user, and similarly the dialogue manager 204 may also track the currently selected items, which typically may be candidate values among which the user needs to choose for disambiguation, for selecting a given specific option (one itinerary, one reservation hour, etc.), and for choosing one of multiple possible next actions ("book now", "modify reservation", "cancel", etc.).
-
- BELIEF=list of pairs of concepts (key, values) collected throughout the dialogue where the key is a name that identifies a specific kind of concept and the values are the corresponding concept values. For example “I want to book a meeting on May first” would yield a BELIEF={(DATE, “2012 May 1”), (INTENTION=“new meeting”)}.
- FOCUS=the concept key. For example, following a question of the system “What time would you like the meeting at?”, the focus may be START_TIME.
- EXPECTATION=list of concept keys the system may expect to receive. For instance, in the example above, while FOCUS is START_TIME, EXPECTATION may contain DURATION, END_TIME, PARTICIPANTS, LOCATION, . . . .
- SELECTED_ITEMS: a list of key-value pairs of currently selected concept candidates among which the user needs to pick. Thus a dialogue prompt: “do you mean Debbie Sanders or Debbie Xanders?” would yield SELECTED_ITEMS={(CONTACT, Debbie Sanders), (CONTACT, Debbie Xanders)}.
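- For illustration only, one possible (hypothetical) representation of this protocol is sketched below; the disclosure does not mandate any particular data structure.

```python
# Minimal sketch of the dialogue context protocol described above.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DialogueContext:
    # BELIEF: (key, value) concept pairs collected throughout the dialogue.
    belief: List[Tuple[str, str]] = field(default_factory=list)
    # FOCUS: the concept key the system most recently asked about.
    focus: str = ""
    # EXPECTATION: concept keys the system may expect to receive next.
    expectation: List[str] = field(default_factory=list)
    # SELECTED_ITEMS: candidate (key, value) pairs the user must choose among.
    selected_items: List[Tuple[str, str]] = field(default_factory=list)

# "I want to book a meeting on May first", then the system asks
# "What time would you like the meeting at?"
ctx = DialogueContext(
    belief=[("DATE", "2012-05-01"), ("INTENTION", "new meeting")],
    focus="START_TIME",
    expectation=["DURATION", "END_TIME", "PARTICIPANTS", "LOCATION"],
)
```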
- Communicating this dialogue context information back to the NLU engine 203 may enable the NLU engine 203 to weight focus and expectation concepts more heavily. And communicating such dialogue context information back to the ASR engine 202 may allow for smart dynamic optimization of the recognition vocabulary, and communicating the dialogue context information back to the user client 201 may help determine part of the current visual display on that device.
- Similarly, the context sharing module 205 may also manage visual/client context information of the user client 201. One specific example of visual context would be when the user looks at a specific day of her calendar application on the visual display of the user client 201 and says: "Book a meeting at 1 pm"; she probably means to book it for the date currently in view in the calendar application.
- The user client 201 may also communicate touch input information via the context sharing module 205 to the dialogue manager 204 by sending the semantic interpretations corresponding to the equivalent natural language command. For instance, clicking on a link to "Book now" may translate into INTENTION:confirmBooking. In addition, the user client 201 may send contextual information by prefixing each such semantic key-value input pair with the keyword CONTEXT. In that case, the dialogue manager 204 may treat this information as "contextual" and may consider it for default values, but not as explicit user input.
- The context sharing module 205 may also manage NLU/general knowledge context with regard to the NLU engine 203. For example, when a person says: "Book a flight to London," it may be safe to assume that the destination is not London, Ontario, but that the user most probably means London, UK. Moreover, depending on the user's current location and/or other information in a user profile, it might even be reasonable to propose which specific London airport is most likely. The NLU engine 203 may access knowledge databases and return contextual information about concepts that have not been explicitly mentioned in the user's current sentence, and may communicate context by defining complex hierarchical concepts and concept properties (or attributes) associated with a concept.
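- For illustration only, the key-value context messages described above might be formed as in the sketch below; the message format is hypothetical, since the disclosure only calls for key-value pairs with contextual input marked, for example, by a CONTEXT prefix.

```python
# Hedged sketch of client-to-dialogue-manager context messages.

def touch_input_message(intention):
    # Touch input sent as the semantic interpretation of the equivalent
    # natural language command, e.g., clicking "Book now".
    return {"INTENTION": intention}

def visual_context_message(date_in_view):
    # Prefixed with CONTEXT so the dialogue manager treats it only as a
    # default value, not as explicit user input.
    return {"CONTEXT:DATE": date_in_view}

messages = [touch_input_message("confirmBooking"),
            visual_context_message("2012-05-01")]
print(messages)
```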
- Conventional ASR and NLU engines process natural language user inputs in isolation, one input at a time. Each engine typically produces a set of output candidates. Each ASR candidate can have multiple semantic interpretations—language is ambiguous and a given sequence of words can mean many different things. A semantic interpretation can be thought of as a set of (possibly hierarchical) semantic slots, each corresponding to a concept in the natural language input. The ASR recognition candidates are ranked in terms of acoustic and language model match. In the special case of a natural language input from the user in the form of text from a keyboard, the ASR engine can be bypassed, which is equivalent to a 1-best high accuracy ASR output. The ASR and NLU semantic interpretations typically are ranked by various heuristics ranging from parsing accuracy to semantic model probabilities.
- But neither the ASR engine nor the NLU engine has any notion of conversation history. Their combined semantic interpretation candidates are ranked based on local features only. However, sometimes, knowing what question was asked in the dialogue process (the focus), what information is already known (the belief state), and what other pieces of information can still be expected from the user (the expectation agenda) can influence the likelihood of one interpretation candidate over another. Moreover, having some notion of world knowledge may help make a better-informed decision about which of the interpretation candidates is actually correct; for example, knowing that the scheduling of a 13-minute meeting is much less probable than the scheduling of a 30-minute meeting.
- This suggests that it would be useful to perform a re-ranking of the N-best semantic interpretations using dialogue context and world knowledge to order all likely interpretations of an utterance by their adequacy in representing the user intent. Thus, in accordance with aspects of the disclosure, a human-machine dialogue arrangement with multiple computer-implemented dialogue components that performs a semantic re-ranking of NLU results in conversational applications using dialogue context and world knowledge is provided.
- FIG. 4 depicts an example of an automated conversational dialogue system for performing a semantic re-ranking of NLU results using dialogue context and world knowledge in accordance with one or more example embodiments, and FIG. 5 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application performing a semantic re-ranking of NLU results using dialogue context and world knowledge, in accordance with one or more example embodiments. A user client 401 may deliver output prompts to a human user, step 501, and may receive dialogue inputs from the human user, including speech inputs, step 502. An automatic speech recognition (ASR) engine 402 may process the speech inputs to determine corresponding sequences of representative text words, step 503. A natural language understanding (NLU) engine 403 may process the text words to determine corresponding NLU-ranked semantic interpretations, step 504. A semantic re-ranking module 404 may re-rank the NLU-ranked semantic interpretations based on at least one of dialogue context information 407 and world knowledge information 408, step 505. A dialogue manager 405 may respond to the re-ranked semantic interpretations and may generate the output prompts so as to manage a dialogue process with the human user, step 506.
- The semantic re-ranking module 404 may re-rank the N-best NLU-ranked semantic interpretations. Dialogue context information 407 may be characterized by a context sharing mechanism using key value pairs including a key element characterizing a specific context type and a value element characterizing a specific context value, thereby reflecting context of the NLU engine within the dialogue process. In some embodiments, the dialogue context information 407 may include one or more of:
- a belief state reflecting collective knowledge accumulated during the dialogue process,
- an expectation agenda reflecting new information expected by the dialogue manager 405,
- a dialogue focus reflecting information most recently prompted by the dialogue manager 405, and
- one or more selected items reflecting user dialogue choices needed by the dialogue manager 405.
- Conventional approaches to semantic re-ranking are based on a pipeline of ad hoc rules. The tuning of those rules for specific applications can be very difficult since the impacts of modifying a rule are difficult to predict and some rules seem more adapted to a given application than to another. Thus, in some embodiments, the semantic re-ranking module 404 may use a machine learning approach to learn a statistical re-ranking model on annotated examples with the semantic slots that a 1-best output should contain. A default re-ranking model may be included with the semantic re-ranking module 404, but an application developer may also produce a custom or adapted model using an offline training tool. The application developer may also define rules that would take precedence over the statistical re-ranking model to fix specific cases.
-
- Internalization status: Categorizing the relevancy of the semantic interpretation to the application domain. Interpretations that are out of vocabulary or not matching may be filtered.
- Parsing confidence: Confidence of the NLU in the interpretation parsing.
- Focus weight: Categorizing the interpretation on how well it fits the expectation of the application.
- Parsed word ratio: The ratio of words attributed to a semantic slot in the utterance.
- Slot internalization ratio: The ratio of slots relevant to the dialog application in the current context.
- Internalized word ratio: The ratio of words attributed to a semantic slot relevant to the dialog application in the current context.
- Raw Score: Score attributed to the ASR result on which the interpretation is based.
- ASR index: Position of the ASR result on which the interpretation is based in the list of all ASR results.
- Slot in focus count: Number of slots in the interpretation that are expected by the dialog application.
- Parsing score: Score attributed by the NLU ranker to the interpretation.
- Average prior: Average of the semantic slot prior value.
- Correction Score: Ratio of corrected slots.
- Correction slot count: Number of slots that have been corrected.
- Slot count: The number of slots in the interpretation.
- Ratio of slots in focus: Ratio of slots expected by the application.
- Raw score cluster: The raw score clustered into groups (e.g., 5 groups) based on its normalized value.
- Average interpretation similarity: The average similarity of the interpretation to other interpretations of the N-best list. The similarity between two different interpretations may be measured by the F1-score. A good interpretation tends to be generated several times with some variation.
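- As a hedged illustration of the F1-based similarity just mentioned, the following sketch compares the semantic slot sets of two interpretations; the slot representation is an assumption made for the example.

```python
# Hedged sketch: F1-score between the slot sets of two interpretations.
def slot_f1(slots_a, slots_b):
    slots_a, slots_b = set(slots_a), set(slots_b)
    if not slots_a and not slots_b:
        return 1.0
    overlap = len(slots_a & slots_b)
    precision = overlap / len(slots_b) if slots_b else 0.0
    recall = overlap / len(slots_a) if slots_a else 0.0
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

print(slot_f1({("DATE", "2012-05-01"), ("INTENTION", "new meeting")},
              {("DATE", "2012-05-01")}))  # 0.666...
```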
- These features can be characterized by a multi-dimensional feature vector to which a polynomial transformation may be applied to produce a prediction target that reflects the adequacy of a given semantic interpretation based on its similarity to the annotation, measured by the F1-score of their respective lists of associated semantic slots.
- Once the confidence feature criteria are included for each semantic interpretation, two types of re-ranking decisions may be implemented: confidence score computation and heuristic weighting. Heuristic weighting may be based on a neural net model that computes feature weights and processes a weighted sum of the features. Confidence score features may be combined in a decision tree, and a new ranking of the semantic interpretations may be obtained (e.g., with the most likely one ranked first). Specifically, traversing the decision tree in effect answers a series of questions about the confidence features that are used to compute a confidence-based re-ranking score. At each node in the decision tree, a question about one feature or one combination of features may be answered to produce a new semantic ranking score for the complete sentence. The re-ranked semantic interpretations may then be returned to the dialogue manager.
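- For illustration only, the sketch below scores each interpretation from a small feature vector and reorders the N-best list. A hand-set weighted sum stands in for the trained decision-tree or neural-net scorer described above; the feature names and weights are hypothetical.

```python
# Hedged sketch: re-rank N-best semantic interpretations by a confidence score
# computed from per-interpretation features.

FEATURE_WEIGHTS = {              # hypothetical weights, normally learned offline
    "parsing_confidence": 0.4,
    "focus_weight": 0.3,         # how well the interpretation fits the expectation
    "parsed_word_ratio": 0.2,
    "average_prior": 0.1,
}

def confidence_score(features):
    return sum(weight * features.get(name, 0.0)
               for name, weight in FEATURE_WEIGHTS.items())

def rerank(nbest):
    # nbest: list of (interpretation, feature_dict) pairs from the NLU engine.
    return sorted(nbest, key=lambda item: confidence_score(item[1]), reverse=True)

nbest = [
    ("schedule a 13 minute meeting", {"parsing_confidence": 0.9, "focus_weight": 0.2,
                                      "parsed_word_ratio": 0.8, "average_prior": 0.1}),
    ("schedule a 30 minute meeting", {"parsing_confidence": 0.8, "focus_weight": 0.9,
                                      "parsed_word_ratio": 0.8, "average_prior": 0.7}),
]
print([text for text, _ in rerank(nbest)])  # the 30 minute reading ranks first
```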
- Anaphora Resolution
- A dialogue reference to a previously discussed concept is called an anaphora, and a sentence containing such references is called anaphoric. The mechanisms by which such references are resolved are referred to as anaphora resolution. For example, suppose a person is placing an order for pizza delivery, and at some point says "make it extra-large." One could assume that it refers to the pizza size. Yet it could also be the size of a drink, and only the conversational context can help resolve this ambiguity. If the customer had said "make it all dressed," one would use common world knowledge, knowing that only pizzas have the property of being all-dressed, to deduce that the sentence refers to pizza.
- In accordance with aspects of the disclosure, a generic application-independent algorithm is provided that allows automated conversational dialogue applications to detect and resolve anaphora based on linguistic cues, dialogue context, and/or general knowledge.
- FIG. 6 depicts an example of an automated conversational dialogue system for identifying and resolving anaphora in accordance with one or more example embodiments, and FIG. 7 depicts an illustrative method, including various example functional steps performed by an automated conversational dialog application identifying and resolving anaphora, in accordance with one or more example embodiments. A user client 601 may deliver output prompts to a human user, step 701, and may receive natural language dialogue inputs, including speech inputs from the human user, step 702. An automatic speech recognition (ASR) engine 602 may process the speech inputs to determine corresponding sequences of representative text words, step 703. A natural language understanding (NLU) engine 603 may process the text words to determine corresponding semantic interpretations, step 704. The NLU engine 603 may include an anaphora processor 604 that may access different information sources 606 characterizing dialogue context, linguistic features, and NLU features to identify and resolve anaphora in the text words needing resolution, step 705, in order to determine a semantic interpretation. A dialogue manager (DM) 605 may generate the output prompts and may respond to the semantic interpretations so as to manage a dialogue process with the human user, step 706.
- Among the different information sources 606 accessed by the anaphora processor 604 to flag zero or more concepts as anaphoric are NLU features that reflect when the anaphora processor 604 learns that certain wordings project to concepts (slots) being carried over from context. For example, when a sentence starts with "how about . . . ", the previous user intent will apply to the current query, and so the anaphora processor 604 may generate an INTENTION="context" concept. If a sentence contains a personal pronoun ("call her"), the person is somebody mentioned in the past conversation history and the anaphora processor 604 may generate a PERSON="context" concept. So, whenever the output of the NLU engine 603 contains concepts whose value is "context," this may be a reliable indication that the particular concept needs to be mapped by the anaphora processor 604 to a mention earlier in the conversation. The NLU predictions are meant to be more generic: "her" might refer to a MEETING_PARTICIPANT, yet the anaphora processor 604 may map it to PERSON: context. "It" could mean anything and the anaphora processor 604 may map that to CONCEPT:context.
- The information sources 606 that the anaphora processor 604 accesses may also include dialogue context data. Elliptic input queries may be detected when slots that are mandatory at a given point in the dialogue are missing. For example, if the recognized sentence lacks a slot for INTENTION but there are other action parameters (date, time, person, etc.), that would be evidence of context carry-over. The information sources 606 may also include linguistic features such as missing verbs ("How about tomorrow?"), elliptic sentences ("not that"), presence of pronouns ("with her"), presence of definite articles ("the pizza").
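- For illustration only, the cues above might be combined as in the sketch below to flag concepts that need anaphora resolution; the cue list mirrors the description, but the function names and data shapes are hypothetical.

```python
# Hedged sketch: flag anaphoric concepts from NLU "context" values, missing
# mandatory slots, and simple linguistic cues.
import re

PRONOUNS = {"it", "her", "him", "them", "that"}

def flag_anaphora(nlu_slots, mandatory_slots, text):
    # NLU cue: slots whose value is the special marker "context".
    flagged = [key for key, value in nlu_slots.items() if value == "context"]
    # Dialogue-context cue: mandatory slots missing despite other parameters.
    missing = [key for key in mandatory_slots if key not in nlu_slots]
    # Linguistic cues: pronouns or definite articles suggest a prior mention.
    words = set(re.findall(r"[a-z']+", text.lower()))
    linguistic_cue = bool(words & PRONOUNS) or "the" in words
    return flagged, missing, linguistic_cue

slots = {"PERSON": "context", "INTENTION": "placeCall"}
print(flag_anaphora(slots, ["INTENTION"], "call her"))
# (['PERSON'], [], True)
```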
anaphora processor 604 may also need to be resolved. Thedialogue manager 605 may keep track of the dialogue history, record each step in the dialogue (user input, system prompt) along with the set of current selected items (or search results) at each step and the current belief state (the collected values that define the query). At each new user input, theanaphora processor 604 may iterate through the dialogue history, starting from the previous user query and working back towards further back interactions, and may compute a correlation measure optimized over data examples. Theanaphora processor 604 may base this correlation measure on various features such as: -
- Dialogue history. For example, how far back in the conversation history are the “missing” concept slots being found?
- Dialogue scope/task configuration. Independent stand-alone tasks may be configured as boundaries for context carry-over. For example, given an application that can schedule meetings, make restaurant reservations, place calls, send emails, etc., some of these tasks may be marked as “incompatible” so that no carry over is allowed.
- Semantic/ontology distance. Typically there may be a hierarchy of "is a" relations in a given ontology (a MEETING_PARTICIPANT is a PERSON). When the NLU engine 603 outputs a context slot, the anaphora processor 604 may look in the dialogue history for any concept of the same type, or of a more general type, linked through an "is a" relation.
- Semantic/ontology coherence. The system may represent "has a" relations in the ontology ("PIZZA has a SIZE" and "PIZZA has a TOPPINGS_TYPE"). In each anaphoric sentence, the anaphora processor 604 may replace the context concept with its resolution candidate and may compute how "semantically compatible" the sentence is (e.g., a sentence "make the pizza all dressed" may have a higher semantic coherence than "make the drink all dressed").
- Default values. Sometimes the anaphora processor 604 may resolve missing concepts not from the dialogue history, but from default values. Deciding when a concept has a default value and when it is probable enough or more probable than a matching value may be computed by the anaphora processor 604 as a combination of all of the above measures.
- When the resolution probability is high enough (e.g., above a configurable threshold), the anaphora processor 604 may accept the resolution.
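- For illustration only, the sketch below combines a few of the features above into a resolution score over the dialogue history and accepts the best candidate only above a configurable threshold. The weights, the toy "is a" ontology, and the threshold are hypothetical.

```python
# Hedged sketch: resolve a flagged "context" concept by scoring candidate
# mentions from the dialogue history (most recent turn first).

IS_A = {"MEETING_PARTICIPANT": "PERSON", "CONTACT": "PERSON"}  # toy ontology

def compatible(needed_type, candidate_type):
    # Same type, or a more general type reached through an "is a" relation.
    return candidate_type == needed_type or IS_A.get(candidate_type) == needed_type

def resolve(needed_type, history, threshold=0.5):
    best, best_score = None, 0.0
    for distance, turn in enumerate(history):
        recency = 1.0 / (1 + distance)        # favor recent mentions
        for concept_type, value in turn:
            if compatible(needed_type, concept_type):
                score = 0.7 * recency + 0.3   # 0.3: flat ontology-match bonus
                if score > best_score:
                    best, best_score = value, score
    return best if best_score >= threshold else None

history = [[("DATE", "2012-05-01")],
           [("MEETING_PARTICIPANT", "Debbie Sanders")]]
print(resolve("PERSON", history))  # Debbie Sanders
```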
- As indicated above, conventional approaches to semantic re-ranking are based on a pipeline of ad hoc rules, which may be referred to as an NLU interpretation model. The tuning of such a model for specific applications can be very difficult since the impacts of modifying a rule are difficult to predict and some rules seem more adapted to a given application than to another. Additionally, if a model is tuned based on a dataset that includes a disproportionate amount of data related to a specific application, the model can become biased to the application, which may result in poor performance when interpreting data that is unrelated to the application.
- As indicated above, a dialogue context protocol may be defined to include one or more sets of NLU interpretations types (e.g., a BELIEF that comprises one or more name/meaning or name/value pairs). In accordance with one or more embodiments, multiple NLU interpretation selection models may be generated. The NLU interpretation selection models may include a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type and one or more specialized NLU interpretation selection models, each of which may be specific to a particular set of NLU interpretations type. The specialized NLU interpretation selection model(s) may be utilized to process natural language input data comprising data corresponding to their respective sets of NLU interpretations type(s). The generic NLU interpretation selection model may be utilized to process natural language input data comprising data that does not correspond to the sets of NLU interpretations type(s) associated with the specialized NLU interpretation selection model(s).
- FIG. 8 depicts an illustrative method for generating and utilizing NLU interpretation selection models in accordance with one or more example embodiments. Referring to FIG. 8, at step 802, multiple NLU interpretation selection models may be generated. For example, as will be described in greater detail below, a plurality of sets of NLU interpretations types may be identified within a dataset comprising natural language input data (e.g., training, validation, and/or test data). A determination may be made (e.g., based on a number of sets of NLU interpretations types in the dataset that correspond to a given set of NLU interpretations type) to generate one or more specialized NLU interpretation selection models, each of which may be specific to a particular set of NLU interpretations type. For example, a specialized NLU interpretation selection model specific to a first set of NLU interpretations type may be generated, and a specialized NLU interpretation selection model specific to a second set of NLU interpretations type may also be generated. The second set of NLU interpretations type may be different from the first set of NLU interpretations type. A generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type may also be generated, for example, based on sets of NLU interpretations types in the dataset that correspond to neither the first set of NLU interpretations type nor the second set of NLU interpretations type (e.g., sets of NLU interpretations types with a lower number of occurrences in the dataset than the number of occurrences of sets of NLU interpretations types in the dataset that correspond to the first set of NLU interpretations type and/or the number of occurrences of sets of NLU interpretations types in the dataset that correspond to the second set of NLU interpretations type).
- At step 804, the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the first set of NLU interpretations type. For example, as will be described in greater detail below, natural language input data may be parsed to identify sets of NLU interpretations types within the data, each of which may be categorized as corresponding to a set of NLU interpretations type. The specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be utilized to process a portion of the natural language input data comprising sets of NLU interpretations types that correspond to the first set of NLU interpretations type. Similarly, at step 806, the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be utilized to process natural language input data comprising data corresponding to the second set of NLU interpretations type. For example, as described with respect to step 804, natural language input data may be parsed to identify sets of NLU interpretations types within the data, each of which may be categorized as corresponding to a set of NLU interpretations type. The specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be utilized to process a portion of the natural language input data comprising sets of NLU interpretations types that correspond to the second set of NLU interpretations type. At step 808, the generic NLU interpretation selection model may be utilized to process natural language input data comprising data corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type. For example, the generic NLU interpretation selection model may be utilized to process portions of the natural language input data that comprise data including sets of NLU interpretations types categorized as corresponding to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
FIG. 9 depicts an illustrative method for generating NLU interpretation selection models in accordance with one or more example embodiments. Referring toFIG. 9 , atstep 902, a plurality of sets of NLU interpretations types may be extracted from a dataset comprising natural language input data. For example, a plurality of sets of NLU interpretations types may be extracted from a dataset comprising natural language input data (e.g., training, validation, and/or test data). Atstep 904, each set of NLU interpretations types of the plurality of sets of NLU interpretations types may be classified as corresponding to a set of NLU interpretations type (e.g., based on a possible associated application and/or interpretation). For example, a group of sets of NLU interpretations types may be classified as corresponding to a first set of NLU interpretations type, a group of sets of NLU interpretations types may be classified as corresponding to a second set of NLU interpretations type, and one or more other sets of NLU interpretations types may be classified as corresponding to one or more other sets of NLU interpretations types (e.g., sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type). At step 906, each of the groups of sets of NLU interpretations types may be identified (e.g., based on their classified sets of NLU interpretations types). For example, the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types in the dataset, the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types in the dataset, and multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type may be identified from amongst the plurality of sets of NLU interpretations types in the dataset. - At
- At step 908, a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, and the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type). At step 910, a determination may be made whether to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type that remains to be processed. For example, a determination may be made to generate a specialized NLU interpretation selection model specific to the first set of NLU interpretations type (e.g., based on a number of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type). At step 912, responsive to determining to generate the specialized NLU interpretation selection model specific to the first set of NLU interpretations type, the specialized NLU interpretation selection model specific to the first set of NLU interpretations type may be generated by executing a machine learning algorithm on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type. The method may then return to step 908, and a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type and the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- At step 910, a determination may be made whether to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type that remains to be processed. For example, a determination may be made to generate a specialized NLU interpretation selection model specific to the second set of NLU interpretations type (e.g., based on a number of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type). At step 912, responsive to determining to generate the specialized NLU interpretation selection model specific to the second set of NLU interpretations type, the specialized NLU interpretation selection model specific to the second set of NLU interpretations type may be generated by executing a machine learning algorithm (e.g., the same machine learning algorithm or a different machine learning algorithm) on a dataset comprising natural language input data that includes the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type. The method may then return to step 908, and a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- At step 910, a determination may be made whether to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type that remains to be processed. For example, a determination may be made not to generate a specialized NLU interpretation selection model specific to a set of NLU interpretations type corresponding to a group of the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type (e.g., based on a number of sets of NLU interpretations types classified as corresponding to the set of NLU interpretations type). At step 916, responsive to determining not to generate a specialized NLU interpretation selection model specific to the set of NLU interpretations type corresponding to the group of the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, each NLU interpretations type corresponding to the set of NLU interpretations type may be added to a common dataset to form a dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type (e.g., a generic dataset). The method may then return to step 908, and a determination may be made regarding whether any groups of sets of NLU interpretations types remain to be processed. For example, a determination may be made that groups of sets of NLU interpretations types remain to be processed (e.g., any remaining groups of the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- Steps 908 through 916 may be repeated until it is determined, at step 908, that no more groups of sets of NLU interpretations types remain to be processed, at which point the method may proceed to step 914. At step 914, a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type may be generated by executing a machine learning algorithm (e.g., the same machine learning algorithm or a different machine learning algorithm) on the generic dataset (e.g., the dataset comprising natural language input data that includes the multiple groups of sets of NLU interpretations types classified as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type, does not include the group of sets of NLU interpretations types classified as corresponding to the first set of NLU interpretations type, and does not include the group of sets of NLU interpretations types classified as corresponding to the second set of NLU interpretations type).
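- The decision loop of steps 908-916 and the final training pass of step 914 might be approximated by the following sketch; `train_model` stands in for whatever machine learning algorithm is used, and the `min_examples` threshold is an assumed criterion for step 910, since the disclosure leaves the exact decision basis open (e.g., the number of sets classified as corresponding to a type).

```python
# Hypothetical sketch of steps 908-916: for each remaining group, either train
# a specialized NLU interpretation selection model (when the group appears
# large enough to support one) or pool its examples into a generic dataset;
# then train the generic model on that pooled dataset (step 914).
def build_selection_models(groups, train_model, min_examples=1000):
    specialized_models = {}
    generic_dataset = []
    for type_label, examples in groups.items():                    # step 908: groups remaining
        if len(examples) >= min_examples:                          # step 910: assumed criterion
            specialized_models[type_label] = train_model(examples)  # step 912
        else:
            generic_dataset.extend(examples)                       # step 916: add to common dataset
    generic_model = train_model(generic_dataset)                   # step 914: generic model
    return specialized_models, generic_model
```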
- FIG. 10 depicts an illustrative method for utilizing NLU interpretation selection models in accordance with one or more example embodiments. Referring to FIG. 10, at step 1002, natural language input data may be received. For example, natural language input data comprising data corresponding to the first set of NLU interpretations type, data corresponding to the second set of NLU interpretations type, and data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be received. At step 1004, the received natural language input data may be parsed to identify sets of NLU interpretations types, each of which may be categorized as corresponding to a specific set of NLU interpretations type (e.g., the first set of NLU interpretations type, the second set of NLU interpretations type, or a set of NLU interpretations type other than the first set of NLU interpretations type and the second set of NLU interpretations type). For example, the natural language input data comprising data corresponding to the first set of NLU interpretations type, data corresponding to the second set of NLU interpretations type, and data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be parsed to identify sets of NLU interpretations types, each of which may be categorized as corresponding to a specific set of NLU interpretations type (e.g., sets of NLU interpretations types within the data corresponding to the first set of NLU interpretations type may be identified and categorized as corresponding to the first set of NLU interpretations type, sets of NLU interpretations types within the data corresponding to the second set of NLU interpretations type may be identified and categorized as corresponding to the second set of NLU interpretations type, and sets of NLU interpretations types within the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be identified and categorized as corresponding to sets of NLU interpretations types different from both the first set of NLU interpretations type and the second set of NLU interpretations type).
- Data may be identified for processing (e.g., the data corresponding to the first set of NLU interpretations type, the data corresponding to the second set of NLU interpretations type, or the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type), and, at step 1006, a determination may be made as to whether a specialized NLU interpretation selection model exists for the data identified for processing. For example, the data corresponding to the first set of NLU interpretations type may be identified for processing; at step 1006, a determination may be made that a specialized NLU interpretation selection model exists for the data corresponding to the first set of NLU interpretations type (e.g., the specialized NLU interpretation selection model specific to the first set of NLU interpretations type); at step 1008, the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the first set of NLU interpretations type) may be identified for processing the data corresponding to the first set of NLU interpretations type; and, at step 1010, the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the first set of NLU interpretations type) may be utilized to process the data corresponding to the first set of NLU interpretations type.
- Additionally or alternatively, the data corresponding to the second set of NLU interpretations type may be identified for processing; at step 1006, a determination may be made that a specialized NLU interpretation selection model exists for the data corresponding to the second set of NLU interpretations type (e.g., the specialized NLU interpretation selection model specific to the second set of NLU interpretations type); at step 1008, the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the second set of NLU interpretations type) may be identified for processing the data corresponding to the second set of NLU interpretations type; and, at step 1010, the specialized NLU interpretation selection model (e.g., the specialized NLU interpretation selection model specific to the second set of NLU interpretations type) may be utilized to process the data corresponding to the second set of NLU interpretations type. Additionally or alternatively, the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type may be identified for processing; at step 1006, a determination may be made that a specialized NLU interpretation selection model does not exist for the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type; at step 1012, the generic NLU interpretation selection model may be identified for processing the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type; and, at step 1014, the generic NLU interpretation selection model may be utilized to process the data that corresponds to neither the first set of NLU interpretations type nor the second set of NLU interpretations type.
- One or more embodiments may be implemented in any conventional computer programming language. For example, embodiments may be implemented in a procedural programming language (e.g., "C") or an object-oriented programming language (e.g., "C++", Python). Some embodiments may be implemented as pre-programmed hardware elements, other related components, or as a combination of hardware and software components.
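- In that spirit, a Python-style sketch of the selection flow of steps 1002-1014 might look as follows; `parse_and_categorize`, `specialized_models`, and the `select` method are assumed names standing in for the parsing, categorization, and model-scoring machinery described above, not an implementation recited in the disclosure.

```python
# Hypothetical sketch of steps 1002-1014: parse and categorize the received
# natural language input data, then process each portion with its specialized
# model when one exists (steps 1006-1010) or with the generic model otherwise
# (steps 1012-1014).
def select_nlu_interpretations(raw_input, parse_and_categorize,
                               specialized_models, generic_model):
    selections = []
    for type_label, interpretation_sets in parse_and_categorize(raw_input):  # steps 1002-1004
        model = specialized_models.get(type_label)             # step 1006: specialized model exists?
        if model is not None:
            selections.append(model.select(interpretation_sets))          # steps 1008-1010
        else:
            selections.append(generic_model.select(interpretation_sets))  # steps 1012-1014
    return selections
```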
- Embodiments can be implemented as a computer program product for use with a computer system. Such implementations may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions may embody all or part of the functionality previously described herein with respect to the system. Such computer instructions may be written in a number of programming languages for use with one or more computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical, or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. Such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over a network (e.g., the Internet or World Wide Web). Some embodiments may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments may be implemented as entirely hardware, or entirely software (e.g., a computer program product).
- A described "process" is the performance of a described function in a computer using computer hardware (such as a processor, field-programmable gate array, or other electronic combinatorial logic, or similar device), which may be operating under control of software or firmware or a combination of any of these or operating outside control of any of the foregoing. All or part of the described function may be performed by active or passive electronic components, such as transistors or resistors. Use of the term "process" does not necessarily imply a schedulable entity, although, in some embodiments, a process may be implemented by such a schedulable entity. Furthermore, unless the context otherwise requires, a "process" may be implemented using more than one processor or more than one (single- or multi-processor) computer and it may be an instance of a computer program or an instance of a subset of the instructions of a computer program.
- One or more aspects of the disclosure may be embodied in computer-usable data or computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices to perform the operations described herein. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types when executed by one or more processors in a computer or other data processing device. The computer-executable instructions may be stored on a computer-readable medium such as a hard disk, optical disk, removable storage media, solid-state memory, RAM, and the like. The functionality of the program modules may be combined or distributed as desired in various embodiments. In addition, the functionality may be embodied in whole or in part in firmware or hardware equivalents, such as integrated circuits, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGA), and the like. Particular data structures may be used to more effectively implement one or more aspects of the disclosure, and such data structures are contemplated to be within the scope of computer executable instructions and computer-usable data described herein.
- Various aspects described herein may be embodied as a method, an apparatus, or as one or more computer-readable media storing computer-executable instructions. Accordingly, those aspects may take the form of an entirely hardware embodiment, an entirely software embodiment, an entirely firmware embodiment, or an embodiment combining software, hardware, and firmware aspects in any combination. In addition, various signals representing data or events as described herein may be transferred between a source and a destination in the form of light or electromagnetic waves traveling through signal-conducting media such as metal wires, optical fibers, or wireless transmission media (e.g., air or space). In general, the one or more computer-readable media may comprise one or more non-transitory computer-readable media.
- As described herein, the various methods and acts may be operative across one or more computing devices and one or more networks. The functionality may be distributed in any manner, or may be located in a single computing device (e.g., a server, a client computer, or the like).
- Aspects of the disclosure have been described in terms of illustrative embodiments thereof. Numerous other embodiments, modifications, and variations within the scope and spirit of the appended claims will occur to persons of ordinary skill in the art from a review of this disclosure. For example, one or more of the steps depicted in the illustrative figures may be performed in other than the recited order, and one or more depicted steps may be optional in accordance with aspects of the disclosure.
Claims (20)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/314,248 US9761225B2 (en) | 2013-03-11 | 2014-06-25 | Semantic re-ranking of NLU results in conversational dialogue applications |
PCT/US2015/037318 WO2015200422A1 (en) | 2014-06-25 | 2015-06-24 | Semantic re-ranking of nlu results in conversational dialogue applications |
EP15736732.7A EP3161666A1 (en) | 2014-06-25 | 2015-06-24 | Semantic re-ranking of nlu results in conversational dialogue applications |
US15/700,438 US10540965B2 (en) | 2013-03-11 | 2017-09-11 | Semantic re-ranking of NLU results in conversational dialogue applications |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/793,854 US9269354B2 (en) | 2013-03-11 | 2013-03-11 | Semantic re-ranking of NLU results in conversational dialogue applications |
US14/314,248 US9761225B2 (en) | 2013-03-11 | 2014-06-25 | Semantic re-ranking of NLU results in conversational dialogue applications |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/793,854 Continuation-In-Part US9269354B2 (en) | 2013-03-11 | 2013-03-11 | Semantic re-ranking of NLU results in conversational dialogue applications |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/700,438 Continuation US10540965B2 (en) | 2013-03-11 | 2017-09-11 | Semantic re-ranking of NLU results in conversational dialogue applications |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140309990A1 (en) | 2014-10-16
US9761225B2 US9761225B2 (en) | 2017-09-12 |
Family ID=51687383
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/314,248 Active 2034-01-13 US9761225B2 (en) | 2013-03-11 | 2014-06-25 | Semantic re-ranking of NLU results in conversational dialogue applications |
US15/700,438 Active 2033-03-23 US10540965B2 (en) | 2013-03-11 | 2017-09-11 | Semantic re-ranking of NLU results in conversational dialogue applications |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/700,438 Active 2033-03-23 US10540965B2 (en) | 2013-03-11 | 2017-09-11 | Semantic re-ranking of NLU results in conversational dialogue applications |
Country Status (1)
Country | Link |
---|---|
US (2) | US9761225B2 (en) |
Cited By (95)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160322050A1 (en) * | 2015-04-30 | 2016-11-03 | Kabushiki Kaisha Toshiba | Device and method for a spoken dialogue system |
US9805028B1 (en) * | 2014-09-17 | 2017-10-31 | Google Inc. | Translating terms using numeric representations |
US9886958B2 (en) * | 2015-12-11 | 2018-02-06 | Microsoft Technology Licensing, Llc | Language and domain independent model based approach for on-screen item selection |
WO2018057166A1 (en) | 2016-09-23 | 2018-03-29 | Intel Corporation | Technologies for improved keyword spotting |
CN107871500A (en) * | 2017-11-16 | 2018-04-03 | 百度在线网络技术(北京)有限公司 | One kind plays multimedia method and apparatus |
US10032451B1 (en) * | 2016-12-20 | 2018-07-24 | Amazon Technologies, Inc. | User recognition for speech processing systems |
US20180330721A1 (en) * | 2017-05-15 | 2018-11-15 | Apple Inc. | Hierarchical belief states for digital assistants |
US10347146B2 (en) | 2014-12-23 | 2019-07-09 | International Business Machines Corporation | Managing answer feasibility |
US10372824B2 (en) | 2017-05-15 | 2019-08-06 | International Business Machines Corporation | Disambiguating concepts in natural language |
US10418032B1 (en) * | 2015-04-10 | 2019-09-17 | Soundhound, Inc. | System and methods for a virtual assistant to manage and use context in a natural language dialog |
US10437836B2 (en) | 2014-12-18 | 2019-10-08 | International Business Machines Corporation | Scoring attributes in a deep question answering system based on syntactic or semantic guidelines |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10796219B2 (en) * | 2016-10-31 | 2020-10-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Semantic analysis method and apparatus based on artificial intelligence |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
WO2020222444A1 (en) * | 2019-05-02 | 2020-11-05 | Samsung Electronics Co., Ltd. | Server for determining target device based on speech input of user and controlling target device, and operation method of the server |
US10878808B1 (en) * | 2018-01-09 | 2020-12-29 | Amazon Technologies, Inc. | Speech processing dialog management |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US10964327B2 (en) | 2019-05-02 | 2021-03-30 | Samsung Electronics Co., Ltd. | Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US20210117681A1 (en) | 2019-10-18 | 2021-04-22 | Facebook, Inc. | Multimodal Dialog State Tracking and Action Prediction for Assistant Systems |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11043205B1 (en) * | 2017-06-27 | 2021-06-22 | Amazon Technologies, Inc. | Scoring of natural language processing hypotheses |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11081104B1 (en) * | 2017-06-27 | 2021-08-03 | Amazon Technologies, Inc. | Contextual natural language processing |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11145302B2 (en) * | 2018-02-23 | 2021-10-12 | Samsung Electronics Co., Ltd. | System for processing user utterance and controlling method thereof |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
EP3380949B1 (en) * | 2015-11-25 | 2021-10-20 | Microsoft Technology Licensing, LLC | Automatic spoken dialogue script discovery |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11289074B2 (en) * | 2019-10-01 | 2022-03-29 | Lg Electronics Inc. | Artificial intelligence apparatus for performing speech recognition and method thereof |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11567788B1 (en) | 2019-10-18 | 2023-01-31 | Meta Platforms, Inc. | Generating proactive reminders for assistant systems |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11887585B2 (en) | 2019-05-31 | 2024-01-30 | Apple Inc. | Global re-ranker |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US12301635B2 (en) | 2020-05-11 | 2025-05-13 | Apple Inc. | Digital assistant hardware abstraction |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9761225B2 (en) * | 2013-03-11 | 2017-09-12 | Nuance Communications, Inc. | Semantic re-ranking of NLU results in conversational dialogue applications |
US10713441B2 (en) * | 2018-03-23 | 2020-07-14 | Servicenow, Inc. | Hybrid learning system for natural language intent extraction from a dialog utterance |
US10339919B1 (en) * | 2018-04-20 | 2019-07-02 | botbotbotbot Inc. | Task-independent conversational systems |
US11144735B2 (en) | 2019-04-09 | 2021-10-12 | International Business Machines Corporation | Semantic concept scorer based on an ensemble of language translation models for question answer system |
US11404050B2 (en) * | 2019-05-16 | 2022-08-02 | Samsung Electronics Co., Ltd. | Electronic apparatus and method for controlling thereof |
CN110288995B (en) * | 2019-07-19 | 2021-07-16 | 出门问问(苏州)信息科技有限公司 | Interaction method and device based on voice recognition, storage medium and electronic equipment |
US11822894B1 (en) * | 2022-12-30 | 2023-11-21 | Fmr Llc | Integrating common and context-specific natural language understanding processing in a virtual assistant application |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070156392A1 (en) * | 2005-12-30 | 2007-07-05 | International Business Machines Corporation | Method and system for automatically building natural language understanding models |
US20130246315A1 (en) * | 2012-03-16 | 2013-09-19 | Orbis Technologies, Inc. | Systems and methods for semantic inference and reasoning |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7711672B2 (en) | 1998-05-28 | 2010-05-04 | Lawrence Au | Semantic network methods to disambiguate natural language meaning |
AU6225199A (en) | 1998-10-05 | 2000-04-26 | Scansoft, Inc. | Speech controlled computer user interface |
WO2001098942A2 (en) | 2000-06-19 | 2001-12-27 | Lernout & Hauspie Speech Products N.V. | Package driven parsing using structure function grammar |
US7092928B1 (en) | 2000-07-31 | 2006-08-15 | Quantum Leap Research, Inc. | Intelligent portal engine |
US6963831B1 (en) | 2000-10-25 | 2005-11-08 | International Business Machines Corporation | Including statistical NLU models within a statistical parser |
US6754626B2 (en) | 2001-03-01 | 2004-06-22 | International Business Machines Corporation | Creating a hierarchical tree of language models for a dialog system based on prompt and dialog context |
CA2536270A1 (en) * | 2003-08-21 | 2005-03-03 | Idilia Inc. | Internet searching using semantic disambiguation and expansion |
US7475010B2 (en) | 2003-09-03 | 2009-01-06 | Lingospot, Inc. | Adaptive and scalable method for resolving natural language ambiguities |
US6969831B1 (en) | 2004-08-09 | 2005-11-29 | Sunbeam Products, Inc. | Heating pad assembly |
US20080208586A1 (en) | 2007-02-27 | 2008-08-28 | Soonthorn Ativanichayaphong | Enabling Natural Language Understanding In An X+V Page Of A Multimodal Application |
US20110258543A1 (en) | 2008-10-30 | 2011-10-20 | Talkamatic Ab | Dialog system |
US9978365B2 (en) | 2008-10-31 | 2018-05-22 | Nokia Technologies Oy | Method and system for providing a voice interface |
US9502025B2 (en) | 2009-11-10 | 2016-11-22 | Voicebox Technologies Corporation | System and method for providing a natural language content dedication service |
US8959014B2 (en) | 2011-06-30 | 2015-02-17 | Google Inc. | Training acoustic models using distributed computing techniques |
US9292603B2 (en) | 2011-09-30 | 2016-03-22 | Nuance Communications, Inc. | Receipt and processing of user-specified queries |
US9082402B2 (en) | 2011-12-08 | 2015-07-14 | Sri International | Generic virtual personal assistant platform |
US9761225B2 (en) * | 2013-03-11 | 2017-09-12 | Nuance Communications, Inc. | Semantic re-ranking of NLU results in conversational dialogue applications |
- 2014-06-25: US application US14/314,248 (published as US9761225B2), status: Active
- 2017-09-11: US application US15/700,438 (published as US10540965B2), status: Active
Cited By (181)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11979836B2 (en) | 2007-04-03 | 2024-05-07 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US12165635B2 (en) | 2010-01-18 | 2024-12-10 | Apple Inc. | Intelligent automated assistant |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US12009007B2 (en) | 2013-02-07 | 2024-06-11 | Apple Inc. | Voice trigger for a digital assistant |
US12277954B2 (en) | 2013-02-07 | 2025-04-15 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US12118999B2 (en) | 2014-05-30 | 2024-10-15 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US12067990B2 (en) | 2014-05-30 | 2024-08-20 | Apple Inc. | Intelligent assistant for home automation |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US12200297B2 (en) | 2014-06-30 | 2025-01-14 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9805028B1 (en) * | 2014-09-17 | 2017-10-31 | Google Inc. | Translating terms using numeric representations |
US10503837B1 (en) | 2014-09-17 | 2019-12-10 | Google Llc | Translating terms using numeric representations |
US10437835B2 (en) | 2014-12-18 | 2019-10-08 | International Business Machines Corporation | Scoring attributes in a deep question answering system based on syntactic or semantic guidelines |
US10437836B2 (en) | 2014-12-18 | 2019-10-08 | International Business Machines Corporation | Scoring attributes in a deep question answering system based on syntactic or semantic guidelines |
US10957213B2 (en) | 2014-12-23 | 2021-03-23 | International Business Machines Corporation | Managing answer feasibility |
US10347146B2 (en) | 2014-12-23 | 2019-07-09 | International Business Machines Corporation | Managing answer feasibility |
US10957214B2 (en) | 2014-12-23 | 2021-03-23 | International Business Machines Corporation | Managing answer feasibility |
US10347147B2 (en) | 2014-12-23 | 2019-07-09 | International Business Machines Corporation | Managing answer feasibility |
US12236952B2 (en) | 2015-03-08 | 2025-02-25 | Apple Inc. | Virtual assistant activation |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10418032B1 (en) * | 2015-04-10 | 2019-09-17 | Soundhound, Inc. | System and methods for a virtual assistant to manage and use context in a natural language dialog |
US20160322050A1 (en) * | 2015-04-30 | 2016-11-03 | Kabushiki Kaisha Toshiba | Device and method for a spoken dialogue system |
US9865257B2 (en) * | 2015-04-30 | 2018-01-09 | Kabushiki Kaisha Toshiba | Device and method for a spoken dialogue system |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US12154016B2 (en) | 2015-05-15 | 2024-11-26 | Apple Inc. | Virtual assistant in a communication session |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US12204932B2 (en) | 2015-09-08 | 2025-01-21 | Apple Inc. | Distributed personal assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11188297B2 (en) * | 2015-11-25 | 2021-11-30 | Microsoft Technology Licensing, Llc | Automatic spoken dialogue script discovery |
EP3380949B1 (en) * | 2015-11-25 | 2021-10-20 | Microsoft Technology Licensing, LLC | Automatic spoken dialogue script discovery |
US9886958B2 (en) * | 2015-12-11 | 2018-02-06 | Microsoft Technology Licensing, Llc | Language and domain independent model based approach for on-screen item selection |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US12223282B2 (en) | 2016-06-09 | 2025-02-11 | Apple Inc. | Intelligent automated assistant in a home environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US12175977B2 (en) | 2016-06-10 | 2024-12-24 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US12293763B2 (en) | 2016-06-11 | 2025-05-06 | Apple Inc. | Application integration with a digital assistant |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
EP3516651A4 (en) * | 2016-09-23 | 2020-04-22 | Intel Corporation | Technologies for improved keyword spotting |
WO2018057166A1 (en) | 2016-09-23 | 2018-03-29 | Intel Corporation | Technologies for improved keyword spotting |
US10796219B2 (en) * | 2016-10-31 | 2020-10-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Semantic analysis method and apparatus based on artificial intelligence |
US20230139140A1 (en) * | 2016-12-20 | 2023-05-04 | Amazon Technologies, Inc. | User recognition for speech processing systems |
US11455995B2 (en) * | 2016-12-20 | 2022-09-27 | Amazon Technologies, Inc. | User recognition for speech processing systems |
US10032451B1 (en) * | 2016-12-20 | 2018-07-24 | Amazon Technologies, Inc. | User recognition for speech processing systems |
US11990127B2 (en) * | 2016-12-20 | 2024-05-21 | Amazon Technologies, Inc. | User recognition for speech processing systems |
US10755709B1 (en) * | 2016-12-20 | 2020-08-25 | Amazon Technologies, Inc. | User recognition for speech processing systems |
US12260234B2 (en) | 2017-01-09 | 2025-03-25 | Apple Inc. | Application integration with a digital assistant |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US10372824B2 (en) | 2017-05-15 | 2019-08-06 | International Business Machines Corporation | Disambiguating concepts in natural language |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US20180330721A1 (en) * | 2017-05-15 | 2018-11-15 | Apple Inc. | Hierarchical belief states for digital assistants |
US10565314B2 (en) | 2017-05-15 | 2020-02-18 | International Business Machines Corporation | Disambiguating concepts in natural language |
US10482874B2 (en) * | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US12026197B2 (en) | 2017-05-16 | 2024-07-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US12254887B2 (en) | 2017-05-16 | 2025-03-18 | Apple Inc. | Far-field extension of digital assistant services for providing a notification of an event to a user |
US11043205B1 (en) * | 2017-06-27 | 2021-06-22 | Amazon Technologies, Inc. | Scoring of natural language processing hypotheses |
US11081104B1 (en) * | 2017-06-27 | 2021-08-03 | Amazon Technologies, Inc. | Contextual natural language processing |
CN107871500A (en) * | 2017-11-16 | 2018-04-03 | 百度在线网络技术(北京)有限公司 | One kind plays multimedia method and apparatus |
US10878808B1 (en) * | 2018-01-09 | 2020-12-29 | Amazon Technologies, Inc. | Speech processing dialog management |
US11145302B2 (en) * | 2018-02-23 | 2021-10-12 | Samsung Electronics Co., Ltd. | System for processing user utterance and controlling method thereof |
US12211502B2 (en) | 2018-03-26 | 2025-01-28 | Apple Inc. | Natural assistant interaction |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US12061752B2 (en) | 2018-06-01 | 2024-08-13 | Apple Inc. | Attention aware virtual assistant dismissal |
US12080287B2 (en) | 2018-06-01 | 2024-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US12136419B2 (en) | 2019-03-18 | 2024-11-05 | Apple Inc. | Multimodality in digital assistant systems |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US12183346B2 (en) | 2019-05-02 | 2024-12-31 | Samsung Electronics Co., Ltd. | Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same |
US11721343B2 (en) | 2019-05-02 | 2023-08-08 | Samsung Electronics Co., Ltd. | Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same |
WO2020222444A1 (en) * | 2019-05-02 | 2020-11-05 | Samsung Electronics Co., Ltd. | Server for determining target device based on speech input of user and controlling target device, and operation method of the server |
US10964327B2 (en) | 2019-05-02 | 2021-03-30 | Samsung Electronics Co., Ltd. | Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US12154571B2 (en) | 2019-05-06 | 2024-11-26 | Apple Inc. | Spoken notifications |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US12216894B2 (en) | 2019-05-06 | 2025-02-04 | Apple Inc. | User configurable task triggers |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11887585B2 (en) | 2019-05-31 | 2024-01-30 | Apple Inc. | Global re-ranker |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11289074B2 (en) * | 2019-10-01 | 2022-03-29 | Lg Electronics Inc. | Artificial intelligence apparatus for performing speech recognition and method thereof |
US11669918B2 (en) | 2019-10-18 | 2023-06-06 | Meta Platforms Technologies, Llc | Dialog session override policies for assistant systems |
US11699194B2 (en) | 2019-10-18 | 2023-07-11 | Meta Platforms Technologies, Llc | User controlled task execution with task persistence for assistant systems |
US11636438B1 (en) | 2019-10-18 | 2023-04-25 | Meta Platforms Technologies, Llc | Generating smart reminders by assistant systems |
US12019685B1 (en) * | 2019-10-18 | 2024-06-25 | Meta Platforms Technologies, Llc | Context carryover across tasks for assistant systems |
US11341335B1 (en) | 2019-10-18 | 2022-05-24 | Facebook Technologies, Llc | Dialog session override policies for assistant systems |
US11403466B2 (en) | 2019-10-18 | 2022-08-02 | Facebook Technologies, Llc. | Speech recognition accuracy with natural-language understanding based meta-speech systems for assistant systems |
US11314941B2 (en) | 2019-10-18 | 2022-04-26 | Facebook Technologies, Llc. | On-device convolutional neural network models for assistant systems |
US11308284B2 (en) | 2019-10-18 | 2022-04-19 | Facebook Technologies, Llc. | Smart cameras enabled by assistant systems |
US11861674B1 (en) | 2019-10-18 | 2024-01-02 | Meta Platforms Technologies, Llc | Method, one or more computer-readable non-transitory storage media, and a system for generating comprehensive information for products of interest by assistant systems |
US12299755B2 (en) | 2019-10-18 | 2025-05-13 | Meta Platforms Technologies, Llc | Context carryover across tasks for assistant systems |
US11688021B2 (en) | 2019-10-18 | 2023-06-27 | Meta Platforms Technologies, Llc | Suppressing reminders for assistant systems |
US11948563B1 (en) | 2019-10-18 | 2024-04-02 | Meta Platforms, Inc. | Conversation summarization during user-control task execution for assistant systems |
US11443120B2 (en) | 2019-10-18 | 2022-09-13 | Meta Platforms, Inc. | Multimodal entity and coreference resolution for assistant systems |
US12182883B2 (en) | 2019-10-18 | 2024-12-31 | Meta Platforms Technologies, Llc | In-call experience enhancement for assistant systems |
US11704745B2 (en) | 2019-10-18 | 2023-07-18 | Meta Platforms, Inc. | Multimodal dialog state tracking and action prediction for assistant systems |
US20210117681A1 (en) | 2019-10-18 | 2021-04-22 | Facebook, Inc. | Multimodal Dialog State Tracking and Action Prediction for Assistant Systems |
US11238239B2 (en) | 2019-10-18 | 2022-02-01 | Facebook Technologies, Llc | In-call experience enhancement for assistant systems |
US11567788B1 (en) | 2019-10-18 | 2023-01-31 | Meta Platforms, Inc. | Generating proactive reminders for assistant systems |
US11688022B2 (en) | 2019-10-18 | 2023-06-27 | Meta Platforms, Inc. | Semantic representations using structural ontology for assistant systems |
US11694281B1 (en) | 2019-10-18 | 2023-07-04 | Meta Platforms, Inc. | Personalized conversational recommendations by assistant systems |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US12197712B2 (en) | 2020-05-11 | 2025-01-14 | Apple Inc. | Providing relevant data items based on context |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US12301635B2 (en) | 2020-05-11 | 2025-05-13 | Apple Inc. | Digital assistant hardware abstraction |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US12219314B2 (en) | 2020-07-21 | 2025-02-04 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
Also Published As
Publication number | Publication date |
---|---|
US20180075846A1 (en) | 2018-03-15 |
US9761225B2 (en) | 2017-09-12 |
US10540965B2 (en) | 2020-01-21 |
Similar Documents
Publication | Title |
---|---|
US10540965B2 (en) | Semantic re-ranking of NLU results in conversational dialogue applications |
US9269354B2 (en) | Semantic re-ranking of NLU results in conversational dialogue applications |
US9171542B2 (en) | Anaphora resolution using linguisitic cues, dialogue context, and general knowledge |
US9361884B2 (en) | Communicating context across different components of multi-modal dialog applications |
US20240419659A1 (en) | Method and system of classification in a natural language user interface |
US11861315B2 (en) | Continuous learning for natural-language understanding models for assistant systems |
US11948563B1 (en) | Conversation summarization during user-control task execution for assistant systems |
US11055355B1 (en) | Query paraphrasing |
US11568855B2 (en) | System and method for defining dialog intents and building zero-shot intent recognition models |
US20220383190A1 (en) | Method of training classification model, method of classifying sample, and device |
US20210319051A1 (en) | Conversation oriented machine-user interaction |
EP3513324B1 (en) | Computerized natural language query intent dispatching |
US11016968B1 (en) | Mutation architecture for contextual data aggregator |
US20140163959A1 (en) | Multi-Domain Natural Language Processing Architecture |
US11393475B1 (en) | Conversational system for recognizing, understanding, and acting on multiple intents and hypotheses |
US11847424B1 (en) | Natural language generation |
EP3251114B1 (en) | Transcription correction using multi-token structures |
US20230094730A1 (en) | Model training method and method for human-machine interaction |
CN110060674A (en) | Form management method, apparatus, terminal and storage medium |
US12288549B2 (en) | Spoken query processing for image search |
GB2604317A (en) | Dialogue management |
US11893994B1 (en) | Processing optimization using machine learning |
Korpusik et al. | Dialogue state tracking with convolutional semantic taggers |
US20060155673A1 (en) | Method and apparatus for robust input interpretation by conversation systems |
WO2015200422A1 (en) | Semantic re-ranking of nlu results in conversational dialogue applications |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GANDRABUR, SIMONA;LAVALLEE, JEAN-FRANCOIS;TREMBLAY, REAL;SIGNING DATES FROM 20140609 TO 20140620;REEL/FRAME:033176/0054 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NUANCE COMMUNICATIONS, INC.;REEL/FRAME:065533/0389; Effective date: 20230920 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |