US8527262B2 - Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications - Google Patents
- Publication number
- US8527262B2 (application US11/767,104)
- Authority
- US
- United States
- Prior art keywords
- constituent
- verb
- semantic
- morphemes
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Definitions
- the present invention relates generally to systems and methods for automated natural language processing and, in particular, systems and methods for automated semantic role labeling for natural language processing of languages having complex morphology.
- natural language processing systems implement various techniques to analyze a natural language text sentence to achieve some level of machine understanding of text input.
- natural language processing applications typically employ automated morphological, syntactic, and semantic analysis techniques to extract and process grammatical/linguistic features of a natural language text sentence based on rules that define the grammar of the target language.
- a grammar of a given language defines rules that govern the structure of words (morphology), rules that govern the structure of sentences (syntax) and rules that govern the meanings of words and sentences (semantics).
- morphological rules of grammar are rules that define the syntactic roles, or POS (parts of speech), that a word may have such as noun, verb, adjective etc.
- morphological rules dictate the manner in which words can be modified by adding affixes (i.e., prefixes or suffixes) to generate different, related words.
- a word can have one of several possible inflections within a given POS category, where each inflection marks a distinct use such as gender, number, tense, person, mood or voice.
- syntax rules of grammar govern proper sentence structure, i.e., the correct sequences of the syntactic categories (POSs).
- Syntactic analysis is a process by which syntax rules of grammar are used to combine the words of an input text sentence into phrases and combine the phrases (constituents) into a complete sentence.
- Syntactic analysis is typically performed by constructing one or more hierarchical trees called syntax parse trees. For instance, FIG. 6A depicts an exemplary syntax parse tree for the English language sentence “The man broke the glass” and FIG. 6B depicts an exemplary syntax parse tree for the English language sentence “The man with black hair broke the glass”.
- Each syntax parse tree includes leaf nodes that represent each word of the input sentence, a single root node (S) that represents the complete sentence, and intermediate-level nodes, such as NP (noun phrase), VP (verb phrase), PP (prepositional phrase) nodes, etc, between the root and leaf nodes, which are hierarchically arranged and connected based on the syntax rules of grammar.
- Semantics rules of grammar govern the meanings of words and sentences.
- Semantic analysis is a process by which semantic rules are used to identify the “semantic roles” of a particular syntactic category within the sentence. For example, “subjects” are generally assigned the role of “who” (agent, actor, doer, or cause of the action, and the like), direct objects are assigned the role of “what” (patient, affected, done-to, or effect of the action, and the like), and modifiers can have a variety of roles such as source, goal, time, and the like.
- Semantic role labeling generally refers to a process of assigning appropriate semantic roles to the arguments of a verb, where for a target verb in a sentence, the goal is to identify constituents that are arguments of the verb and then assign appropriate semantic roles to the verb arguments.
- the “arguments” of a verb are those phrases that are needed in a clause (sentence) to make the clause semantically complete.
- the verb “give” requires three arguments (i) a giver (ii) a taker, and (iii) an object given.
- the verb arguments are (i) John (the giver); (ii) Mary (the taker) and (iii) the book (the object given).
- Semantic role information of sentence constituents is a crucial component in natural language processing (NLP) and natural language understanding (NLU) applications in which semantic parsing of sentences is needed to understand the grammatical relations between the arguments of natural language predicates and resolve syntactic ambiguity.
- the ability to recognize and label semantic arguments is a key task for answering “Who”, “When”, “What”, “Where”, “Why”, etc., questions in applications such as machine translation, information extraction, natural language generation, question answering, text summarization, etc., which require some form of semantic interpretation.
- the role of the verb's direct object can differ even in transitive uses, such as in the following example sentences: (6) “The sergeant played taps” and (7) “The sergeant played a beat-up old bugle.” This alternation in the syntactic realization of semantic arguments is widespread, affecting most verbs in some way, and the patterns exhibited by specific verbs vary widely.
- although a parsed corpus makes it possible in some instances to identify the subjects and objects of verbs in sentences such as the above examples, and may provide semantic function tags such as temporal and locative for certain constituents (generally syntactic adjuncts), the parsed corpus does not necessarily distinguish the different roles played by a verb's grammatical subject or object in the above examples. Again, this is because the same verb used with the same syntactic sub-categorization can assign different semantic roles. As such, semantic role labeling is difficult using pure syntactic parsers, as these parsers are not capable of representing the full, deep semantic meaning of a sentence.
- semantic role labeling systems have been implemented using supervised machine learning techniques to train syntactic parsers using a corpus of words annotated with semantic role labels for each verb argument.
- the well-known Proposition Bank project provides a human-annotated corpus of semantic verb-argument relations, where for each verb appearing in the corpus, a set of semantic roles is defined for purposes of providing task independent semantic representations that are independent of the given application.
- the possible labels of arguments are core argument labels ARG [0-5] and modifier argument labels such as ARGM-LOC and ARGM-TMP, for location and temporal modifiers, respectively.
- the entry specific roles for the verb offer are given as:
- the secondary roles include: Location, Time, Manner, Direction, Cause, Discourse, Extent, Purpose, Negation, Modal, and Adverbial, which are represented in PropBank as “ArgM” with an additional function tag, for example ArgM-TMP for temporal.
- a set of roles corresponding to a distinct usage of a verb is called a roleset, and can be associated with a set of syntactic frames indicating allowable syntactic variations in the expression of that set of roles.
- the roleset with its associated frames is called a Frameset.
- a polysemous verb may have more than one Frameset when the differences in meaning are distinct enough to require different sets of roles, one for each Frameset. This lexical resource provides consistent argument labels across different syntactic realizations of the same verb. For example, in the following sentences:
- the input to the SRL system is a sequence of white-space delimited words, where each verb is presented by a white-space delimited word and a constituent is presented as a sequence of white-space delimited words, and where punctuations and special characters are assumed to be separated from the words.
- the proposed SRL systems are configured to predict a semantic role label for each white-space delimited verb and each constituent (sequence of white space delimited words).
- the proposed SRL systems are configured to process the input text sentence at the character level.
- Arabic is a Semitic language with rich templatic morphology where an Arabic word may be composed of a stem (consisting of a consonantal root and a template), or a stem plus one or more affixes (prefix or suffix) attached to the beginning and/or end of the stem.
- affixes include inflectional markers for tense, gender, and/or number, as well as prepositions, conjunctions, determiners, possessive pronouns and pronouns, for example.
- Arabic white-space delimited words may be composed of zero or more prefixes, followed by a stem and zero or more suffixes.
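- The prefix/stem/suffix composition described above can be sketched as a small data structure. This is a minimal illustration, not the representation used in the patent: the class name `SegmentedWord` and its fields are hypothetical, and the sample word is an illustrative Buckwalter-style transliteration (roughly "and they will write it") that does not come from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentedWord:
    """One white-space delimited word split into morphemes (hypothetical type)."""

    stem: str
    prefixes: List[str] = field(default_factory=list)  # zero or more prefixes
    suffixes: List[str] = field(default_factory=list)  # zero or more suffixes

    def morphemes(self) -> List[str]:
        """Return the morphemes in surface order: prefixes, stem, suffixes."""
        return [*self.prefixes, self.stem, *self.suffixes]

    def surface(self) -> str:
        """Rebuild the white-space delimited word by concatenating morphemes."""
        return "".join(self.morphemes())


# Illustrative transliterated word; the segmentation is an assumption for the demo.
word = SegmentedWord(stem="ktb", prefixes=["w", "s", "y"], suffixes=["wn", "hA"])
print(word.morphemes())
print(word.surface())
```

- Keeping the affixes as separate list entries is what lets a later stage treat a pronoun suffix as its own token, which a word-level representation cannot do.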
- because Arabic white-space delimited words may be composed of multiple prefixes, a stem, and multiple suffixes, important morphological information can be missed if Arabic text is processed at the word level or the character level, as is done for English and Chinese respectively, resulting in poor performance.
- a method for processing natural language text includes receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes, identifying a target verb as a stem of an inflected word in the text sentence, grouping morphemes from one or more inflected words with the same syntactic role into constituents, and predicting a semantic role of a constituent for the target verb.
- a method for processing natural language text includes receiving as input a natural language text sentence comprising a sequence of white-space delimited words including at least one inflected word comprising a stem and one or more affixes, automatically segmenting the white-space delimited words into separate morphemes including prefixes, stems and suffixes, automatically grouping morphemes into constituents and identifying morphemes that are target verbs, and automatically predicting a semantic role of a constituent for a target verb using a trained statistical model.
- a method for processing natural language text includes receiving as input a natural language text sentence comprising a sequence of white-space delimited words including at least one inflected word comprising a stem and one or more affixes, automatically performing a morphological analysis on the text sentence as a sequence of characters to extract morphological information, automatically detecting stems of inflected words that are target verbs and grouping stems and affixes of different words into constituents, using the extracted morphological information, and automatically predicting a semantic role of each constituent for a target verb using a trained statistical model using a plurality of feature data including morphological features extracted during morphological analysis.
- FIG. 1 is a high-level block diagram of a system for automatic semantic role labeling of natural language sentences for languages with complex morphology, according to an exemplary embodiment of the invention.
- FIGS. 2A and 2B schematically illustrate systems and methods for training a statistical semantic role labeling model according to an exemplary embodiment of the invention.
- FIGS. 3A and 3B illustrate a method for segmenting an Arabic text sentence into morphemes, according to an exemplary embodiment of the invention.
- FIGS. 4A and 4B are exemplary parse tree diagrams that illustrate results of syntactic parsing and annotation of semantic role labeling for the exemplary segmented text of FIG. 3B .
- FIG. 5 is a block diagram that illustrates a system and method for semantic role classification of verb arguments for languages with complex morphology, according to an exemplary embodiment of the invention.
- FIGS. 6A and 6B are exemplary syntactic parse trees of English text sentences generated using conventional syntactic analysis methods.
- FIG. 1 is a block diagram of a semantic role labeling system according to an exemplary embodiment of the invention.
- FIG. 1 illustrates an exemplary architecture of an automated semantic role labeling (SRL) system ( 100 ) comprising various components such as a morphological analysis/segmentation module ( 101 ), a verb detection/constituent detection module ( 102 ), a semantic role labeling classification module ( 103 ), a knowledge base repository ( 104 ), an SRL model training module ( 105 ) and an SRL model repository ( 106 ), to provide an integrated computational framework that supports automated semantic role labeling for natural language processing applications targeted for languages with complex morphology.
- the SRL system ( 100 ) may be implemented as part of an automatic natural language processing system (e.g., machine translation, information extraction, NLG, question answering, text summarization, etc.), wherein the SRL system ( 100 ) is employed to process input text data ( 110 ) to identify verb-argument structures in input sentences, predict the semantic roles for verb arguments, and output semantic role labeled textual data ( 120 ) in any suitable format that enables machine understanding of the natural language text input for the given application.
- the raw text data ( 110 ) input to the SRL system ( 100 ) may be the recognition result of a user input (e.g., speech input, handwriting input, etc.) to a front-end machine recognition system (e.g., speech recognition system, handwriting recognition system, etc.) for example, where the textual data stream (recognition results) are input to the SRL system ( 100 ) for semantic parsing and interpretation.
- the raw text input ( 110 ) may be text input from a keyboard or memory or any other input source, depending on the application.
- the morphological analysis/segmentation module ( 101 ), verb detection/constituent detection module ( 102 ) and semantic role labeling classifier module ( 103 ) implement morphological, syntactic, and semantic processing functions to extract various lexical/syntactical features that are processed by the SRL system ( 100 ) to identify and classify verb arguments in input text sentences for languages with high morphology.
- the knowledge base repository ( 104 ) includes diverse sources of information and knowledge, which are used by the model builder ( 105 ) to build/train an SRL model ( 106 ) during a training phase.
- the SRL model ( 106 ) is used by the SRL classifier module ( 103 ) during a decoding phase to make identification and classification decisions with regard to semantic role labeling of verb arguments within input text sentences using various lexical/syntactic features and other information extracted during the decoding process.
- the knowledge base repository ( 104 ) includes data structures, rules, models, configuration files, etc., that are used by the various processing modules ( 101 ), ( 102 ) and ( 103 ) to perform morphological, syntactic, and semantic analysis on input text to thereby extract the lexical features and information that is used for semantic role identification and annotation.
- the SRL system ( 100 ) is trained to identify and classify semantic roles of verb arguments within input text sentences using an SRL model that is configured to make identification and classification decisions regarding an utterance in accordance with an aggregate of a plurality of information sources.
- the model building module ( 105 ) may implement one or more machine-learning and/or model-based methods to construct an SRL model ( 106 ) that is used by the processing module ( 103 ) during a decoding phase to make decisions for semantic role labeling of input text sentences over a set of lexical and syntactic feature data.
- a statistical SRL model can be trained using various machine learning techniques such as maximum entropy modeling, voted perceptron, support vector machines, boosting, statistical decision trees, and/or combinations thereof.
- FIGS. 2A and 2B schematically illustrate systems and methods for training a semantic role labeling model according to an exemplary embodiment of the invention.
- the semantic role labeling model ( 106 ) is configured to make identification and classification decisions regarding an utterance in accordance with an aggregate of a plurality of information sources.
- FIG. 2A is a block diagram that illustrates the various functions and methodologies that may be implemented by the model training module ( 105 ) in FIG. 1 to train the statistical semantic role labeling system ( 100 ), according to an exemplary embodiment of the invention.
- FIG. 2B illustrates various sources of information that may be used to train and decode the statistical semantic role labeling system ( 100 ), according to an exemplary embodiment of the invention.
- a model building process implemented by the model trainer ( 105 ) is generally based on the use of a large human-annotated corpus of semantic verb-argument relations (such as the PropBank annotation) to train a statistical model ( 106 ), together with other sources of syntactic and lexical features that are integrated into the statistical model ( 105 _ 1 ).
- a classification SRL model ( 106 ) for the argument identification and classification processes is trained using a maximum entropy approach ( 105 _ 2 ) on a large real corpus annotated with argument structures and other relevant features, where good performance in natural language processing applications can be achieved by integrating diverse sources of information or features.
- a Maximum Entropy classification model is used to integrate arbitrary types of information and make a classification decision by aggregating all information available for a given classification.
- Maximum Entropy has many advantages over rule based methods of the prior art. For example, Maximum Entropy permits the use of many information sources and provides flexibility and accuracy needed for changing dynamic language models.
- the Maximum Entropy method is a flexible statistical modeling framework that has been used widely in many areas of natural language processing. Maximum entropy modeling produces a probability model that is as uniform as possible while matching empirical feature expectations, which can be interpreted as making as few assumptions as possible in the model.
- any type of feature can be used, enabling a system designer to experiment with different feature types.
- Maximum entropy modeling permits combinations of multiple overlapping information sources.
- the information sources may be combined as follows:
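- The equation itself is not reproduced in this text. A standard conditional maximum entropy form that is consistent with the surrounding description (an outcome o, a history h standing for the verb, constituent and context, feature functions f_i, weights λ_i, and a normalizing sum over outcomes o′) would be the following; the notation h and the exponential parameterization are assumptions, and the published patent may use an equivalent product form.

```latex
P(o \mid h) \;=\; \frac{\exp\!\left(\sum_{i} \lambda_i \, f_i(o, h)\right)}
                       {\sum_{o'} \exp\!\left(\sum_{i} \lambda_i \, f_i(o', h)\right)},
\qquad h = (\text{verb},\ \text{constituent},\ \text{context})
```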
- This equation describes the probability of a particular outcome (o) (e.g., one of the arguments) given a pair (verb, constituent), and the context.
- λ_i is a weighting function or constant used to place a level of importance on the information being considered for the feature.
- the denominator includes a sum over all possible outcomes (o′), which is essentially a normalization factor for probabilities to sum to 1.
- the indicator functions or features f_i are activated when certain outcomes are generated for certain contexts.
- the maximum entropy models may be trained using improved iterative scaling, which is known in the art.
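- As a rough numerical illustration of how weighted indicator features turn into a normalized distribution over outcomes, the sketch below scores each outcome and normalizes with a sum over all outcomes. It assumes the exponential form of a maximum entropy model; the function name, the two toy features, the weights and the context keys are all hypothetical, and training (e.g., by improved iterative scaling) is not shown.

```python
import math
from typing import Callable, Dict, Sequence

Context = Dict[str, str]
Feature = Callable[[str, Context], float]


def maxent_probabilities(
    outcomes: Sequence[str],
    features: Sequence[Feature],
    weights: Sequence[float],
    context: Context,
) -> Dict[str, float]:
    """Return P(outcome | context) under an exponential (maxent) model."""
    scores = {
        o: sum(w * f(o, context) for f, w in zip(features, weights))
        for o in outcomes
    }
    shift = max(scores.values())  # subtract the max for numerical stability
    exps = {o: math.exp(s - shift) for o, s in scores.items()}
    z = sum(exps.values())  # normalization over all outcomes o'
    return {o: e / z for o, e in exps.items()}


# Hypothetical indicator features: each fires for one (outcome, context) pattern.
def f_suffix_arg1(outcome: str, context: Context) -> float:
    return 1.0 if outcome == "ARG1" and context.get("position") == "suffix_of_verb" else 0.0


def f_before_arg0(outcome: str, context: Context) -> float:
    return 1.0 if outcome == "ARG0" and context.get("position") == "before_verb" else 0.0


probs = maxent_probabilities(
    ["ARG0", "ARG1", "NONE"],
    [f_suffix_arg1, f_before_arg0],
    [1.5, 1.0],  # made-up weights, not trained values
    {"position": "suffix_of_verb"},
)
print(probs)
```

- Because features only add to a score when they fire, overlapping information sources can be combined by appending more feature functions without changing the normalization step.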
- FIG. 2B schematically illustrates various sources/features that may be integrated into an SRL model ( 106 ) using Maximum Entropy modeling, where information or features extracted from these sources are used to train a Maximum Entropy model.
- the multiple sources of information in statistical modeling may include, e.g., a lexical or surface analysis. This may include analyzing strings or sequences of characters, morphemes or words.
- Another source may include syntactic analysis information. This looks to the patterned relations that govern the way the words in a sentence come together, the meaning of the words in context, parts of speech, information from parse tree, etc. Semantic analysis looks to the meaning of (parts of) words, phrases, sentences, and texts.
- Morphological analysis may explore all the possible solutions to a multi-dimensional, non-quantified problem, for example, the identification of a word stem from a full word form (e.g., morphemes). In addition, any other information may be employed which is helpful in training statistical models for SRL classification.
- the model builder process ( 105 ) may use data in the knowledge base ( 104 ) to train classification models, and possibly dynamically update previously trained classification models that are implemented by the classification process ( 103 ).
- the model builder ( 105 ) may be implemented “off-line” for building/training a classification model that learns to provide proper SRL identification and classification assessments, or the model builder process ( 105 ) may employ “continuous” learning methods that can use domain knowledge in repository ( 104 ) which is updated with additional learned data derived from newly SRL annotated textual data generated by (or input to) the SRL system ( 100 ).
- a continuous learning functionality adds to the robustness of the SRL system ( 100 ) by enabling the classification process ( 103 ) to continually evolve and improve over time without costly human intervention.
- a decoding phase of the SRL system ( 100 ) includes various sub-tasks to process an input text sentence, including (i) segmenting the text into morphemes or tokens via processing module ( 101 ), (ii) detecting target verbs and grouping tokens into constituents via processing module ( 102 ), and (iii) for a given target verb and constituent, predicting the verb's arguments, including NONE (no-argument), via processing module ( 103 ).
- This last step is a classification process implemented by the processing module ( 103 ), wherein the SRL model ( 106 ) is used to attribute a semantic role label, or NONE, if the constituent does not fill any role.
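- The three decoding sub-tasks can be sketched as a pipeline of pluggable stages. This is only an outline under stated assumptions: the `decode` function, the span convention and the three toy stage functions are hypothetical stand-ins for modules ( 101 ), ( 102 ) and ( 103 ), and the English sentence is used purely for readability.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) token indices of a constituent


def decode(
    sentence: str,
    segment: Callable[[str], List[str]],
    detect: Callable[[List[str]], Tuple[List[int], List[Span]]],
    classify: Callable[[List[str], int, Span], str],
) -> Dict[Tuple[int, Span], str]:
    """Run segmentation, verb/constituent detection, then role classification."""
    tokens = segment(sentence)              # (i) words -> morphemes/tokens
    verb_positions, spans = detect(tokens)  # (ii) target verbs and constituents
    labels: Dict[Tuple[int, Span], str] = {}
    for verb in verb_positions:             # (iii) label every (verb, constituent) pair
        for span in spans:
            labels[(verb, span)] = classify(tokens, verb, span)
    return labels


# Toy stages (assumptions for the demo, not trained components).
def toy_segment(sentence: str) -> List[str]:
    return sentence.split()


def toy_detect(tokens: List[str]) -> Tuple[List[int], List[Span]]:
    return [2], [(0, 2), (3, 5)]  # verb at index 2, two candidate constituents


def toy_classify(tokens: List[str], verb: int, span: Span) -> str:
    return "ARG0" if span[1] <= verb else "ARG1"


labels = decode("The man broke the glass", toy_segment, toy_detect, toy_classify)
print(labels)
```

- A real classifier stage would also be free to return NONE for a pair whose constituent fills no role for that verb.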
- These various sub-tasks may employ various types and combinations of linguistic computational methodologies depending on the application.
- An exemplary mode of operation of the SRL system ( 100 ) will now be discussed with regard to exemplary operating modes and architectures for implementing the processing stages ( 101 ), ( 102 ) and ( 103 ) for automated semantic role labeling of language with high morphology such as Arabic.
- an exemplary operating mode of the processing stages will be described in the context of processing a sample Arabic text input sentence as depicted in FIG. 3A , wherein the input sentence ( 300 ) is a sequence of white-space delimited words that are written from left to right.
- the SRL system ( 100 ) is assumed to be adapted to determine verb-argument structures and predict semantic labels for given input-sentences using the set of semantic roles as defined by PropBank as discussed above, including those roles traditionally viewed as arguments and as adjuncts.
- the morphological analysis/segmentation module ( 101 ) receives the raw text input ( 110 ) and segments words into tokens or morphemes.
- the morphological analysis/segmentation module ( 101 ) will segment white-space delimited words into (hypothesized) prefixes, stems, and suffixes, which may then become the subjects of analysis for further processing of the text. In this manner, verbs and pronouns can be processed as separate tokens. This segmenting process separates a surface word form into its component derivational and inflectional morphemes.
- FIG. 3A illustrates an input sentence in Arabic text, where a target verb (V) and constituent (C) are part of the same word (W) (where constituent (C) has a role of Arg1 and is a suffix).
- the English translation for the Arabic text sentence in FIG. 3A is: “if legislation members returned to those they elected them from different regions and counties they will found popular agreement in this sense”, where the phrase “elected them” is one Arabic word (W), where “elected” is the verb (V) and “them” is a constituent (C). It is important to note that in the example of FIG. 3A , the target verb and one of its constituents are part of the same white-space delimited word, where the verb and the constituent are each a sequence of characters in the white-space delimited word.
- FIG. 3B illustrates a segmented text sentence ( 301 ) that is generated by performing segmentation on the input text sentence ( 300 ) of FIG. 3A .
- FIG. 3B is the Arabic segmented text sentence ( 301 ) where the original white-space delimited Arabic word (W) is segmented into separate “tokens” including the verb (V) token and the constituent (C) token.
- the character “ ⁇ ” is used in the segmented sentence ( 301 ) to mark the separation of the tokens in the original white-space delimited word (W).
- FIG. 3B illustrates a “light segmentation” process in which the input text is segmented into tokens by segmenting stems and affixes for the semantic role labeling task.
- the input text sentence can be segmented into more detailed morphemes (e.g., POS segmentation) with POS tags which can then be grouped together to form lighter segmentation as features that are used for semantic role labeling.
- segmentation can be implemented using machine learning techniques such as FST (finite state transducer) or Maximum Entropy to train the model on a corpus of training data already segmented using a plurality of information sources, wherein the trained model can then be used to segment new raw text.
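- The text describes segmentation with trained FST or Maximum Entropy models. As a much simpler stand-in that only illustrates the shape of the output of "light segmentation" (stems split from affixes), the sketch below strips affixes greedily from fixed lists. The affix lists, the minimum stem length and the sample string are hypothetical placeholders in Latin transliteration, and this rule-based approach is not the method of the source.

```python
from typing import List, Sequence

# Hypothetical affix inventories for the demo; a real system would learn these.
PREFIXES = ("w", "b", "l")
SUFFIXES = ("hm", "hA", "h")


def light_segment(
    word: str,
    prefixes: Sequence[str] = PREFIXES,
    suffixes: Sequence[str] = SUFFIXES,
    min_stem: int = 2,
) -> List[str]:
    """Split one word into [prefixes..., stem, suffix?] by greedy stripping."""
    found_prefixes: List[str] = []
    rest = word
    stripped = True
    while stripped:  # strip prefixes repeatedly while one matches
        stripped = False
        for p in prefixes:
            if rest.startswith(p) and len(rest) - len(p) >= min_stem:
                found_prefixes.append(p)
                rest = rest[len(p):]
                stripped = True
                break
    found_suffixes: List[str] = []
    for s in suffixes:  # strip at most one suffix, first match wins
        if rest.endswith(s) and len(rest) - len(s) >= min_stem:
            found_suffixes.append(s)
            rest = rest[: len(rest) - len(s)]
            break
    return found_prefixes + [rest] + found_suffixes


tokens = light_segment("wktbhm")  # placeholder string, not a corpus example
print(tokens)
```

- Greedy stripping over-segments any stem that happens to begin or end with an affix string, which is one reason a trained statistical segmenter is preferable in practice.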
- the segmentation process provides important morphological feature data that enables increased accuracy in the SRL process. This is in contrast to conventional methods for processing English text, as noted above, where the SRL process will first detect a target verb (a white-space delimited word) and then determine constituents (sequences of white-space delimited words) with their correct arguments. This approach is not well suited to high morphology languages, as it will allow detection of neither the verb nor the constituent in the exemplary sentence of FIG. 3A . Moreover, if a character-based approach is used, such as the Chinese SRL methodology, important morphological information such as pronouns, etc., can be missed, resulting in poor performance.
- the initial segmentation process is an optional process.
- high morphological text such as Arabic text can be processed on a character level (rather than word level) in instances where morphological information is integrated into the SRL model during training and decoding.
- the segmentation process is not needed where morphological analysis is performed at a character level using morphological information during the classification process (discussed below).
- the verb detection and parsing/constituent detection module ( 102 ) receives as input the segmented text (sequence of tokens), and groups the segmented text into constituents, where sequences of tokens that have the same role are grouped together—indicated by the label (argument).
- the constituents are formed by building a parse tree, where each node in the tree is a constituent.
- FIG. 4A is an exemplary diagram of a parse tree ( 400 ) of a portion of the sentence of FIG. 3B which illustrates the use of the parse to group a portion of the tokens in FIG. 3B into constituents (nodes in the tree) using PropBank constituent labels.
- NP noun phrase
- PP prepositional phrase
- VP verb phrase
- VBD verb, past tense
- SBAR clause introduced by complementizer
- WHNP Wh-noun Phrase
- This process can be implemented using known statistical parsing approaches.
- a statistical model based on maximum entropy and a plurality of information sources can be used to build a reasonably accurate parser.
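The idea that every node of the parse tree is a constituent can be illustrated with a minimal sketch. The `Node` class, the `constituents` helper, and the English example sentence (borrowed from the "John broke the window" example elsewhere in this document) are assumptions made for illustration; they are not the parser described here, which is a trained statistical model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node: a label plus children, or a single token."""
    label: str                                   # e.g. "NP", "VP", "VBD"
    children: List["Node"] = field(default_factory=list)
    token: str = ""                              # set only on leaf nodes

    def tokens(self) -> List[str]:
        """Return the tokens covered by this node, left to right."""
        if not self.children:
            return [self.token]
        covered: List[str] = []
        for child in self.children:
            covered.extend(child.tokens())
        return covered


def constituents(root: Node) -> Iterator[Tuple[str, List[str]]]:
    """Yield (label, covered tokens) for every node, depth-first."""
    yield root.label, root.tokens()
    for child in root.children:
        yield from constituents(child)


# Hand-built tree for illustration only (not produced by a parser).
tree = Node("S", [
    Node("NP", [Node("NNP", token="John")]),
    Node("VP", [
        Node("VBD", token="broke"),
        Node("NP", [Node("DT", token="the"), Node("NN", token="window")]),
    ]),
])

all_constituents = list(constituents(tree))
```

Each `(label, tokens)` pair is a candidate that a later stage could pair with the target verb.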
- the SRL classifier ( 103 ) receives the parse tree as input and processes the target verbs and constituents (e.g., nodes in the parse tree). For each pair (verb, constituent), the semantic role labeling classifier ( 103 ) predicts the argument of the verb, including NONE if there is no argument (i.e., the constituent does not fill any role).
- This process is a classification process in which the classification module ( 103 ) uses the trained statistical SRL model ( 106 ) as well as the input utterance (text) and other relevant features to compute for each pair (verb, constituent) the likelihood of each possible semantic role label (argument) for the given verb and context.
- the argument with the highest score may be assigned, or the N-best arguments may be assigned and subjected to further post-processing steps to eventually select the most probable argument assignment.
- a binary classification process may first be used to identify whether a candidate constituent is an argument, and then to predict the argument's number (among the set of arguments) if the candidate is identified as an argument.
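The two-stage variant (first decide argument versus non-argument, then choose the argument label) can be sketched as follows. The function `label_pair`, the feature names, the threshold, and the toy scoring functions are all hypothetical stand-ins; in the described system both decisions would come from trained statistical models.

```python
from typing import Callable, Dict

NONE = "NONE"  # label for constituents that fill no role of the verb


def label_pair(features: Dict[str, bool],
               is_argument: Callable[[Dict[str, bool]], float],
               role_scores: Callable[[Dict[str, bool]], Dict[str, float]],
               threshold: float = 0.5) -> str:
    """Label one (verb, constituent) pair in two stages."""
    # Stage 1: binary decision, argument or not.
    if is_argument(features) < threshold:
        return NONE
    # Stage 2: pick the highest-scoring argument label.
    scores = role_scores(features)
    return max(scores, key=scores.get)


# Toy stand-ins for trained models (assumptions for illustration).
def toy_is_argument(f: Dict[str, bool]) -> float:
    return 0.9 if f.get("is_np") else 0.1


def toy_role_scores(f: Dict[str, bool]) -> Dict[str, float]:
    if f.get("before_verb"):
        return {"ARG0": 0.7, "ARG1": 0.3}
    return {"ARG0": 0.2, "ARG1": 0.8}


subject_label = label_pair({"is_np": True, "before_verb": True},
                           toy_is_argument, toy_role_scores)
object_label = label_pair({"is_np": True, "before_verb": False},
                          toy_is_argument, toy_role_scores)
other_label = label_pair({"is_np": False},
                         toy_is_argument, toy_role_scores)
```

Splitting the decision this way lets the first stage discard the many constituents that fill no role before the finer-grained label choice is made.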
- FIG. 4B is an exemplary Arabic parse tree with Semantic Role Annotations which is generated as a result of SRL classification of the parse tree of FIG. 4A .
- the semantic role label Arg1 is attributed to the constituent (they) (NP-OBJ node), of those elected, resulting in the node NP-OBJ-Arg1.
- FIG. 5 is a flow diagram of a semantic role labeling classification method according to an exemplary embodiment of the invention.
- FIG. 5 illustrates an exemplary process flow which may be implemented by the SRL classification module ( 103 ) in FIG. 1 during a decoding phase in which the SRL classifier ( 103 ) uses the trained SRL model to determine whether the constituents in an input sentence represent semantic arguments of a given verb and assign appropriate argument labels to those constituents, including NONE (no-argument) labels for those constituents that do not correspond to a semantic argument of the given verb.
- the SRL classification module ( 103 ) receives as input raw text data and other associated syntactic/lexical features that were extracted during prior processing stages (step 500 ).
- the input data can include raw text along with labels/tags for segmented text, parse tree features, target verb labels, constituent labels, and other diverse sources of information, similar to those features computed to train the SRL model.
- the input text may be subjected to various decoding stages (e.g., processing stages 101 , and 102 in FIG. 1 ) to obtain lexical/grammatical features that were used to train the SRL model.
- the types of features that may be used for classification will vary.
- the SRL classification module ( 103 ) will process the input data/features using the trained SRL model, which is trained to make identification and classification decisions regarding role assignments for constituents over a plurality of input features (step 501 ).
- the SRL classification module ( 103 ) processes this input text and associated features using the trained SRL model to predict argument roles and determine the highest probable role assignment(s) for all constituents of a given verb in the sentence, given the set of features of each constituent in the parse tree.
- This classification process (step 501 ) may be implemented in various ways.
- the process (step 501 ) may be implemented to include sub-processes ( 501 _ 1 ) and ( 501 _ 2 ).
- the classifier module will use the SRL model and input data/features to compute/estimate for each verb/constituent pair, a set or distribution of probabilities that indicate how likely the given constituent fills a semantic role as an argument of the given verb for all possible semantic roles of the verb, given the set of features processed by the SRL model (step 501 _ 1 ).
- an SRL model can be based on a maximum entropy framework where the SRL model is trained to process the set of feature data for each verb/constituent pair to thereby assign probabilities to each possible argument, including a NONE or NULL label that indicates the constituent is not a semantic argument of the verb.
- the SRL model may be a maximum entropy model that is trained to output all possible semantic roles for a given verb/constituent pair.
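As a rough illustration of how a maximum entropy model turns features of a verb/constituent pair into a probability for each label (including NONE), the sketch below uses the log-linear form, with a weighted sum of active features followed by normalization. The feature names, weights, and label set are invented for the example and are not taken from the trained model described here.

```python
import math
from typing import Dict, List, Tuple


def maxent_distribution(active_features: List[str],
                        weights: Dict[Tuple[str, str], float],
                        labels: List[str]) -> Dict[str, float]:
    """Return P(label | features) under a log-linear (maximum entropy) model."""
    # Sum the weights of the (feature, label) pairs that fire.
    scores = {
        label: sum(weights.get((feat, label), 0.0) for feat in active_features)
        for label in labels
    }
    # Normalize with a numerically stable softmax.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical weights; a real model would learn these from annotated data.
toy_weights = {
    ("position=after_verb", "ARG1"): 2.0,
    ("position=after_verb", "ARG0"): -0.5,
    ("phrase=NP", "NONE"): -1.0,
}
distribution = maxent_distribution(
    ["position=after_verb", "phrase=NP"],
    toy_weights,
    ["ARG0", "ARG1", "NONE"],
)
best_label = max(distribution, key=distribution.get)
```

Because the output is a full distribution rather than a single label, later stages can keep the N-best labels for each constituent.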
- once the SRL model has been used to estimate probable semantic role assignments (i.e., to assign probabilities to each argument), a search may be conducted over the role assignments to explore the context and predict the most likely argument(s) (step 501 _ 2 ) using known techniques such as Viterbi searching or dynamic programming, etc. This process takes into consideration previous states and current observations to determine the next state probabilities, thereby ensuring that the sequence of classifications produced by the SRL model is coherent.
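A minimal Viterbi sketch shows how a search over the per-constituent probabilities can yield a coherent label sequence. The transition table, which here only penalizes repeating the same core argument, and the probabilities are assumptions chosen for illustration; the text does not specify the actual state or transition model. All probabilities are assumed to be strictly positive.

```python
import math
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transition: Dict[Tuple[str, str], float],
            default_transition: float = 1.0) -> List[str]:
    """Return the highest-scoring label sequence over a list of constituents.

    emissions[i][label] is the classifier probability of `label` for the
    i-th constituent; transition[(prev, cur)] is a multiplicative score.
    """
    labels = list(emissions[0])
    # best[i][label] = (log-score of best path ending in label, backpointer)
    best = [{lab: (math.log(emissions[0][lab]), None) for lab in labels}]
    for i in range(1, len(emissions)):
        column = {}
        for lab in labels:
            prev, score = max(
                ((p, best[i - 1][p][0]
                  + math.log(transition.get((p, lab), default_transition)))
                 for p in labels),
                key=lambda pair: pair[1],
            )
            column[lab] = (score + math.log(emissions[i][lab]), prev)
        best.append(column)
    # Follow backpointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab][0])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


toy_emissions = [
    {"ARG0": 0.6, "ARG1": 0.3, "NONE": 0.1},
    {"ARG0": 0.5, "ARG1": 0.4, "NONE": 0.1},
]
# Hypothetical constraint: the same core argument rarely appears twice.
toy_transition = {("ARG0", "ARG0"): 0.01, ("ARG1", "ARG1"): 0.01}
best_sequence = viterbi(toy_emissions, toy_transition)
```

Picking the top label for each constituent independently would assign ARG0 twice in this toy case, while the sequence search prefers ARG0 followed by ARG1.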
- the classification process (step 501 ) may be configured to produce different outputs depending on whether post-processing steps are applied to the classification results.
- the SRL classification process ( 501 ) may be configured to (i) output a set of N-best argument labels for each constituent of a given verb when post-processing steps ( 502 , 503 ) are employed or (ii) output the best argument label for each constituent of a given verb when no post-processing steps are implemented.
- the classification results (step 501 ) can be further processed using an n-best parse hypothesis process (step 502 ) and a re-scoring or re-ranking process (step 503 ) to enhance the accuracy of semantic role labeling of verb arguments.
- an argument lattice is generated using an N-best hypotheses analysis for each node in the syntax tree.
- Each of the N-best arguments for the given constituents is considered as a potential argument candidate while performing a search through the argument lattice using argument sequence information to find the maximum likelihood path through the lattice (step 502 ).
- the argument labels for the best path or the N-best paths can be assigned.
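The search through an argument lattice can be sketched by scoring every path through the N-best candidates of each constituent and keeping the top paths. Exhaustive enumeration is used only because the toy lattice is tiny; it grows exponentially and is not the search the text describes. The `no_repeat_penalty` sequence score is likewise a hypothetical stand-in for "argument sequence information".

```python
import heapq
import itertools
import math
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (argument label, classifier probability)


def n_best_paths(lattice: List[List[Candidate]],
                 sequence_score: Callable[[Tuple[str, ...]], float],
                 k: int = 2) -> List[Tuple[float, Tuple[str, ...]]]:
    """Return the k best (log-score, labels) paths through the lattice."""
    scored = []
    for combo in itertools.product(*lattice):
        labels = tuple(label for label, _ in combo)
        log_score = sum(math.log(p) for _, p in combo) + sequence_score(labels)
        scored.append((log_score, labels))
    return heapq.nlargest(k, scored)


def no_repeat_penalty(labels: Tuple[str, ...]) -> float:
    """Hypothetical sequence score: penalize duplicated non-NONE labels."""
    core = [label for label in labels if label != "NONE"]
    return -5.0 * (len(core) - len(set(core)))


# Two constituents, each with its 2-best labels from the classifier.
toy_lattice = [
    [("ARG0", 0.6), ("ARG1", 0.3)],
    [("ARG0", 0.5), ("ARG1", 0.4)],
]
top_paths = n_best_paths(toy_lattice, no_repeat_penalty, k=2)
top_labels = [labels for _, labels in top_paths]
```

The resulting list of paths is the kind of pool from which a re-ranking step could then select a single solution.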
- An optional re-ranking and rescoring process may be implemented to increase the accuracy of the semantic role labeling system.
- the process of re-ranking is to select the overall best solution from a pool of complete solutions (i.e., the best path from a set of best paths (N-best paths) output from the N-best hypothesis process (step 502 )).
- Re-ranking can integrate different types of information into a statistical model, using maximum entropy or voted perceptron for example, to classify the N-best list.
- the predicted scores of the arguments are used as the input features to the re-ranking process.
- Other types of information can be input to the re-ranking method using well-known methods.
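A re-ranker of this kind can be sketched as a linear scorer over features of each complete solution, with the model's own predicted score as one input feature. The feature names, the weights, and the plain linear scoring are assumptions for illustration; the text mentions maximum entropy or voted perceptron models, whose weights would be learned rather than hand-set.

```python
from typing import Callable, Dict, List

Solution = Dict[str, object]  # hypothetical: {"labels": tuple, "model_logprob": float}


def rerank(candidates: List[Solution],
           weights: Dict[str, float],
           extract: Callable[[Solution], Dict[str, float]]) -> Solution:
    """Return the candidate with the highest weighted feature score."""
    def score(candidate: Solution) -> float:
        feats = extract(candidate)
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return max(candidates, key=score)


def toy_features(candidate: Solution) -> Dict[str, float]:
    """Predicted score plus one extra, purely illustrative, global feature."""
    return {
        "model_logprob": float(candidate["model_logprob"]),
        "has_arg0": 1.0 if "ARG0" in candidate["labels"] else 0.0,
    }


toy_candidates = [
    {"labels": ("ARG1", "ARG2"), "model_logprob": -1.0},
    {"labels": ("ARG0", "ARG1"), "model_logprob": -1.2},
]
toy_rerank_weights = {"model_logprob": 1.0, "has_arg0": 0.5}
chosen = rerank(toy_candidates, toy_rerank_weights, toy_features)
```

In this toy case the second candidate wins despite its lower model score, because the extra feature contributes enough to change the ranking.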
- a semantic role labeling representation of the input text is generated and output where for each sentence, the verb arguments are labeled according to the best semantic role determined via the classification process (step 504 ).
- the SRL representation may be a parse tree where each node in the parse tree has a semantic role label that represents a semantic role played by the verb argument or a Null label that indicates that the node does not correspond to a semantic argument of the verb.
- the arguments of a verb can be labeled ARG0 to ARG5, as core arguments, and possibly adjunctive arguments, ARGMs.
- the classification process ( 501 ) outputs the best argument label for each constituent of a given verb for a given sentence, which label is used to generate the SRL representation (step 504 ).
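One simple way to present the final labeling is the bracketed notation used in this document's examples. The sketch below assumes that the labeled constituents are given as non-overlapping token spans. The `render_srl` function and the span format are hypothetical, and a labeled parse tree, as described above, would be an equally valid output form.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, role label)


def render_srl(tokens: List[str], spans: List[Span]) -> str:
    """Render tokens with [LABEL ...] brackets around labeled constituents.

    Spans labeled NONE are left unbracketed; spans must not overlap.
    """
    starts = {start: (end, label)
              for start, end, label in spans if label != "NONE"}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in starts:
            end, label = starts[i]
            out.append("[" + label + " " + " ".join(tokens[i:end]) + "]")
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


rendered = render_srl(
    ["John", "broke", "the", "window"],
    [(0, 1, "ARG0"), (2, 4, "ARG1")],
)
```

For the example sentence this produces `[ARG0 John] broke [ARG1 the window]`.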
- the systems and methods described herein in accordance with the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
- the present invention may be implemented in software as an application comprising program instructions that are tangibly embodied on one or more program storage devices (e.g., magnetic floppy disk, RAM, CD Rom, DVD, ROM and flash memory), and executable by any device or machine comprising a suitable architecture.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Description
- Arg0 entity offering
- Arg1 commodity
- Arg2 price
- Arg3 benefactive or entity offered to
-
- [ARG0 the company] to offer [ARG1 a 15% to 20% stake] [ARG2 to the public];
- [ARG0 Sotheby's] . . . offered [ARG2 the Dorrance heirs] [ARG1 a money-back guarantee];
- [ARG1 an amendment] offered by [ARG0 Rep. Peter DeFazio]; and
- [ARG2 Subcontractors] will be offered [ARG1 a settlement].
-
- [ARG0 John] broke [ARG1 the window]
- [ARG1 The window] broke,
the arguments of the verbs are labeled as numbered arguments: Arg0 and Arg1, and so on according to their specific roles despite the different syntactic positions of the labeled phrases (words between brackets). In particular, in the above example, it is recognized that each argument plays the same role (as indicated by the numbered label Arg) in the meaning of the particular sense of the verb broke. These phrases are called “constituents” of semantic roles. In this example, the constituent [the window] is recognized as the verb's object in both sentences.
where o_i is the outcome associated with feature f_i, and q_i(h) is an indicator function for histories. The maximum entropy models may be trained using improved iterative scaling, which is known in the art.
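For orientation, a commonly used form of a conditional maximum entropy model that is consistent with the symbols named above is given below. It is offered as an assumption, since the equation itself does not survive in this text; the weights α_i and the normalizer Z(h) are notation introduced here for illustration.

```latex
p(o \mid h) = \frac{1}{Z(h)} \prod_{i} \alpha_i^{\, f_i(h, o)},
\qquad
Z(h) = \sum_{o'} \prod_{i} \alpha_i^{\, f_i(h, o')},
\qquad
f_i(h, o) =
\begin{cases}
1 & \text{if } o = o_i \text{ and } q_i(h) = 1, \\
0 & \text{otherwise.}
\end{cases}
```

Under this reading, o is an outcome (for example, a semantic role label), h is the history or context, and each binary feature f_i fires only when the outcome matches o_i and the context test q_i(h) holds.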
Claims (17)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/767,104 US8527262B2 (en) | 2007-06-22 | 2007-06-22 | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/767,104 US8527262B2 (en) | 2007-06-22 | 2007-06-22 | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
Publications (2)
Publication Number | Publication Date |
---|---|
US20080319735A1 US20080319735A1 (en) | 2008-12-25 |
US8527262B2 true US8527262B2 (en) | 2013-09-03 |
Family
ID=40137416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/767,104 Active 2030-08-18 US8527262B2 (en) | 2007-06-22 | 2007-06-22 | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
Country Status (1)
Country | Link |
---|---|
US (1) | US8527262B2 (en) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100250335A1 (en) * | 2009-03-31 | 2010-09-30 | Yahoo! Inc | System and method using text features for click prediction of sponsored search advertisements |
US20110123967A1 (en) * | 2009-11-24 | 2011-05-26 | Xerox Corporation | Dialog system for comprehension evaluation |
US20110257962A1 (en) * | 2008-10-22 | 2011-10-20 | Kumar Bulusu Gopi | System and method for automatically generating sentences of a language |
US20130311166A1 (en) * | 2012-05-15 | 2013-11-21 | Andre Yanpolsky | Domain-Specific Natural-Language Processing Engine |
US20140109058A1 (en) * | 2012-10-12 | 2014-04-17 | Vmware,Inc. | Test language interpreter |
US20140180692A1 (en) * | 2011-02-28 | 2014-06-26 | Nuance Communications, Inc. | Intent mining via analysis of utterances |
US20140200879A1 (en) * | 2013-01-11 | 2014-07-17 | Brian Sakhai | Method and System for Rating Food Items |
US8839202B2 (en) | 2012-10-12 | 2014-09-16 | Vmware, Inc. | Test environment managed within tests |
US8839201B2 (en) | 2012-10-12 | 2014-09-16 | Vmware, Inc. | Capturing test data associated with error conditions in software item testing |
US9069902B2 (en) | 2012-10-12 | 2015-06-30 | Vmware, Inc. | Software test automation |
US9152623B2 (en) | 2012-11-02 | 2015-10-06 | Fido Labs, Inc. | Natural language processing system and method |
US9269353B1 (en) * | 2011-12-07 | 2016-02-23 | Manu Rehani | Methods and systems for measuring semantics in communications |
US9292422B2 (en) | 2012-10-12 | 2016-03-22 | Vmware, Inc. | Scheduled software item testing |
US9292416B2 (en) | 2012-10-12 | 2016-03-22 | Vmware, Inc. | Software development kit testing |
US20160299884A1 (en) * | 2013-11-11 | 2016-10-13 | The University Of Manchester | Transforming natural language requirement descriptions into analysis models |
US9633008B1 (en) * | 2016-09-29 | 2017-04-25 | International Business Machines Corporation | Cognitive presentation advisor |
US9684587B2 (en) | 2012-10-12 | 2017-06-20 | Vmware, Inc. | Test creation with execution |
US20180011830A1 (en) * | 2015-01-23 | 2018-01-11 | National Institute Of Information And Communications Technology | Annotation Assisting Apparatus and Computer Program Therefor |
CN107818082A (en) * | 2017-09-25 | 2018-03-20 | 沈阳航空航天大学 | With reference to the semantic role recognition methods of phrase structure tree |
CN108427667A (en) * | 2017-02-15 | 2018-08-21 | 北京国双科技有限公司 | A kind of segmentation method and device of legal documents |
US10067858B2 (en) | 2012-10-12 | 2018-09-04 | Vmware, Inc. | Cloud-based software testing |
US20190074005A1 (en) * | 2017-09-06 | 2019-03-07 | Zensar Technologies Limited | Automated Conversation System and Method Thereof |
CN109994103A (en) * | 2019-03-26 | 2019-07-09 | 北京博瑞彤芸文化传播股份有限公司 | A kind of training method of intelligent semantic Matching Model |
US10387294B2 (en) | 2012-10-12 | 2019-08-20 | Vmware, Inc. | Altering a test |
CN110634336A (en) * | 2019-08-22 | 2019-12-31 | 北京达佳互联信息技术有限公司 | Method and device for generating audio electronic book |
US20200074322A1 (en) * | 2018-09-04 | 2020-03-05 | Rovi Guides, Inc. | Methods and systems for using machine-learning extracts and semantic graphs to create structured data to drive search, recommendation, and discovery |
US10606946B2 (en) * | 2015-07-06 | 2020-03-31 | Microsoft Technology Licensing, Llc | Learning word embedding using morphological knowledge |
US20200134018A1 (en) * | 2018-10-25 | 2020-04-30 | Srivatsan Laxman | Mixed-initiative dialog automation with goal orientation |
US10956670B2 (en) | 2018-03-03 | 2021-03-23 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
WO2021147875A1 (en) * | 2020-01-20 | 2021-07-29 | 华为技术有限公司 | Text screening method and apparatus |
US11176329B2 (en) | 2020-02-18 | 2021-11-16 | Bank Of America Corporation | Source code compiler using natural language input |
US11182560B2 (en) | 2019-02-15 | 2021-11-23 | Wipro Limited | System and method for language independent iterative learning mechanism for NLP tasks |
US11210472B2 (en) | 2019-05-08 | 2021-12-28 | Tata Consultancy Services Limited | Automated extraction of message sequence chart from textual description |
US11250128B2 (en) | 2020-02-18 | 2022-02-15 | Bank Of America Corporation | System and method for detecting source code anomalies |
US11568153B2 (en) | 2020-03-05 | 2023-01-31 | Bank Of America Corporation | Narrative evaluator |
US11657229B2 (en) | 2020-05-19 | 2023-05-23 | International Business Machines Corporation | Using a joint distributional semantic system to correct redundant semantic verb frames |
US11822589B2 (en) | 2020-09-16 | 2023-11-21 | L&T Technology Services Limited | Method and system for performing summarization of text |
DE102023205209A1 (en) | 2022-06-30 | 2024-01-04 | Bosch Global Software Technologies Private Limited | Control unit for assigning at least one element of a plurality of documents and methods therefor |
US12106221B2 (en) | 2019-06-13 | 2024-10-01 | International Business Machines Corporation | Predicting functional tags of semantic role labeling |
Families Citing this family (144)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8799776B2 (en) * | 2001-07-31 | 2014-08-05 | Invention Machine Corporation | Semantic processor for recognition of whole-part relations in natural language documents |
US9009590B2 (en) * | 2001-07-31 | 2015-04-14 | Invention Machines Corporation | Semantic processor for recognition of cause-effect relations in natural language documents |
US8671341B1 (en) * | 2007-01-05 | 2014-03-11 | Linguastat, Inc. | Systems and methods for identifying claims associated with electronic text |
US9342588B2 (en) * | 2007-06-18 | 2016-05-17 | International Business Machines Corporation | Reclassification of training data to improve classifier accuracy |
US8521511B2 (en) | 2007-06-18 | 2013-08-27 | International Business Machines Corporation | Information extraction in a natural language understanding system |
US8812296B2 (en) * | 2007-06-27 | 2014-08-19 | Abbyy Infopoisk Llc | Method and system for natural language dictionary generation |
US20090024385A1 (en) * | 2007-07-16 | 2009-01-22 | Semgine, Gmbh | Semantic parser |
US8543565B2 (en) * | 2007-09-07 | 2013-09-24 | At&T Intellectual Property Ii, L.P. | System and method using a discriminative learning approach for question answering |
US8655643B2 (en) * | 2007-10-09 | 2014-02-18 | Language Analytics Llc | Method and system for adaptive transliteration |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US8170969B2 (en) * | 2008-08-13 | 2012-05-01 | Siemens Aktiengesellschaft | Automated computation of semantic similarity of pairs of named entity phrases using electronic document corpora as background knowledge |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8489388B2 (en) | 2008-11-10 | 2013-07-16 | Apple Inc. | Data detection |
WO2010105216A2 (en) * | 2009-03-13 | 2010-09-16 | Invention Machine Corporation | System and method for automatic semantic labeling of natural language texts |
US20130166303A1 (en) * | 2009-11-13 | 2013-06-27 | Adobe Systems Incorporated | Accessing media data using metadata repository |
US20110119050A1 (en) * | 2009-11-18 | 2011-05-19 | Koen Deschacht | Method for the automatic determination of context-dependent hidden word distributions |
US8731943B2 (en) * | 2010-02-05 | 2014-05-20 | Little Wing World LLC | Systems, methods and automated technologies for translating words into music and creating music pieces |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
EP2583421A1 (en) * | 2010-06-16 | 2013-04-24 | Sony Mobile Communications AB | User-based semantic metadata for text messages |
AU2011274318A1 (en) * | 2010-06-29 | 2012-12-20 | Royal Wins Pty Ltd | System and method of providing a computer-generated response |
CN101908042B (en) | 2010-08-09 | 2016-04-13 | 中国科学院自动化研究所 | A kind of mask method of bilingual combination semantic role |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8903707B2 (en) * | 2012-01-12 | 2014-12-02 | International Business Machines Corporation | Predicting pronouns of dropped pronoun style languages for natural language translation |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
CN104169909B (en) * | 2012-06-25 | 2016-10-05 | 株式会社东芝 | Context resolution device and context resolution method |
US9720903B2 (en) * | 2012-07-10 | 2017-08-01 | Robert D. New | Method for parsing natural language text with simple links |
US10810368B2 (en) | 2012-07-10 | 2020-10-20 | Robert D. New | Method for parsing natural language text with constituent construction links |
US9280520B2 (en) * | 2012-08-02 | 2016-03-08 | American Express Travel Related Services Company, Inc. | Systems and methods for semantic information retrieval |
US20140067394A1 (en) * | 2012-08-28 | 2014-03-06 | King Abdulaziz City For Science And Technology | System and method for decoding speech |
GB201216640D0 (en) | 2012-09-18 | 2012-10-31 | Touchtype Ltd | Formatting module, system and method for formatting an electronic character sequence |
US20140172767A1 (en) * | 2012-12-14 | 2014-06-19 | Microsoft Corporation | Budget optimal crowdsourcing |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
CN110442699A (en) | 2013-06-09 | 2019-11-12 | 苹果公司 | Operate method, computer-readable medium, electronic equipment and the system of digital assistants |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
WO2015184186A1 (en) | 2014-05-30 | 2015-12-03 | Apple Inc. | Multi-command single utterance input method |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9805028B1 (en) * | 2014-09-17 | 2017-10-31 | Google Inc. | Translating terms using numeric representations |
US9886432B2 (en) * | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10621390B1 (en) * | 2014-12-01 | 2020-04-14 | Massachusetts Institute Of Technology | Method and apparatus for summarization of natural language |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
EP3144822A1 (en) * | 2015-09-21 | 2017-03-22 | Tata Consultancy Services Limited | Tagging text snippets |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10282411B2 (en) * | 2016-03-31 | 2019-05-07 | International Business Machines Corporation | System, method, and recording medium for natural language learning |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10366234B2 (en) * | 2016-09-16 | 2019-07-30 | Rapid7, Inc. | Identifying web shell applications through file analysis |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10402499B2 (en) * | 2016-11-17 | 2019-09-03 | Goldman Sachs & Co. LLC | System and method for coupled detection of syntax and semantics for natural language understanding and generation |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
CN106776576B (en) * | 2016-12-29 | 2020-04-03 | 竹间智能科技(上海)有限公司 | Clause and semantic role marking method and system based on CoNLL format |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | User interface for correcting recognition errors |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770428A1 (en) | 2017-05-12 | 2019-02-18 | Apple Inc. | Low-latency intelligent automated assistant |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | Far-field extension for digital assistant services |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
CN107491440B (en) * | 2017-09-19 | 2021-07-16 | 马上消费金融股份有限公司 | Natural language word segmentation construction method and system and natural language classification method and system |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
CN109299455A (en) * | 2017-12-20 | 2019-02-01 | 北京联合大学 | A computer language processing method for Chinese gerunds with unusual collocations |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US11023684B1 (en) * | 2018-03-19 | 2021-06-01 | Educational Testing Service | Systems and methods for automatic generation of questions from text |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | Virtual assistant operation in multi-device environments |
DK179822B1 (en) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10944859B2 (en) | 2018-06-03 | 2021-03-09 | Apple Inc. | Accelerated task performance |
CN110895659B (en) * | 2018-08-22 | 2023-05-26 | 阿里巴巴集团控股有限公司 | Model training method, recognition device and computing equipment |
US10909324B2 (en) * | 2018-09-07 | 2021-02-02 | The Florida International University Board Of Trustees | Features for classification of stories |
CN109582949B (en) * | 2018-09-14 | 2022-11-22 | 创新先进技术有限公司 | Event element extraction method and device, computing equipment and storage medium |
CN109840327B (en) * | 2019-01-31 | 2023-05-12 | 北京嘉和海森健康科技有限公司 | Vocabulary recognition method and device |
US11227102B2 (en) * | 2019-03-12 | 2022-01-18 | Wipro Limited | System and method for annotation of tokens for natural language processing |
CN110362656A (en) * | 2019-06-03 | 2019-10-22 | 广东幽澜机器人科技有限公司 | A kind of semantic feature extracting method and device |
US11262978B1 (en) * | 2019-06-19 | 2022-03-01 | Amazon Technologies, Inc. | Voice-adapted reformulation of web-based answers |
KR20240129242A (en) | 2019-09-16 | 2024-08-27 | 도큐가미, 인크. | Cross-document intelligent authoring and processing assistant |
US20230025835A1 (en) * | 2019-10-25 | 2023-01-26 | Nippon Telegraph And Telephone Corporation | Workflow generation support apparatus, workflow generation support method and workflow generation support program |
US11481548B2 (en) * | 2019-12-05 | 2022-10-25 | Tencent America LLC | Zero pronoun recovery and resolution |
CN111475650B (en) * | 2020-04-02 | 2023-04-07 | 中国人民解放军国防科技大学 | Russian semantic role labeling method, system, device and storage medium |
CN111738019B (en) * | 2020-06-24 | 2025-03-11 | 深圳前海微众银行股份有限公司 | A method and device for recognizing a paraphrase sentence |
WO2022006244A1 (en) * | 2020-06-30 | 2022-01-06 | RELX Inc. | Methods and systems for performing legal brief analysis |
US20230343333A1 (en) | 2020-08-24 | 2023-10-26 | Unlikely Artificial Intelligence Limited | A computer implemented method for the automated analysis or use of data |
CN112270169B (en) * | 2020-10-14 | 2023-07-25 | 北京百度网讯科技有限公司 | Method and device for predicting dialogue roles, electronic equipment and storage medium |
CN112528676B (en) * | 2020-12-18 | 2022-07-08 | 南开大学 | Document-level event argument extraction method |
CN112765987A (en) * | 2021-01-26 | 2021-05-07 | 武汉大学 | Event identification method and system based on recursive conditional random field decoder |
CN113297367B (en) * | 2021-06-29 | 2024-08-02 | 中国平安人寿保险股份有限公司 | Method and related equipment for generating user dialogue links |
US11977854B2 (en) | 2021-08-24 | 2024-05-07 | Unlikely Artificial Intelligence Limited | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
US11989527B2 (en) | 2021-08-24 | 2024-05-21 | Unlikely Artificial Intelligence Limited | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
US12073180B2 (en) | 2021-08-24 | 2024-08-27 | Unlikely Artificial Intelligence Limited | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
US12067362B2 (en) | 2021-08-24 | 2024-08-20 | Unlikely Artificial Intelligence Limited | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
US11989507B2 (en) | 2021-08-24 | 2024-05-21 | Unlikely Artificial Intelligence Limited | Computer implemented methods for the automated analysis or use of data, including use of a large language model |
CN113987104B (en) * | 2021-09-28 | 2024-06-21 | 浙江大学 | Generating type event extraction method based on ontology guidance |
CN113971205A (en) * | 2021-11-04 | 2022-01-25 | 杭州安恒信息技术股份有限公司 | Threat report attack behavior extraction method, device, equipment and storage medium |
CN114283411B (en) * | 2021-12-20 | 2022-11-15 | 北京百度网讯科技有限公司 | Text recognition method, and training method and device of text recognition model |
CN114861672A (en) * | 2022-04-12 | 2022-08-05 | 中央民族大学 | HowNet dynamic role knowledge system construction method and system based on Tibetan language features |
CN114896394B (en) * | 2022-04-18 | 2024-04-05 | 桂林电子科技大学 | Event trigger word detection and classification method based on multilingual pre-training model |
WO2024020603A2 (en) * | 2022-07-22 | 2024-01-25 | President And Fellows Of Harvard College | Augmenting sports videos using natural language |
US12147757B2 (en) * | 2022-12-28 | 2024-11-19 | Tencent America LLC | Unifying text segmentation and long document summarization |
CN116204642B (en) * | 2023-03-06 | 2023-10-27 | 上海阅文信息技术有限公司 | Intelligent character implicit attribute recognition analysis method, system and application in digital reading |
CN116720501A (en) * | 2023-06-08 | 2023-09-08 | 广州大学 | Attack entity and relation extraction method and system for open source network threat information |
CN117436459B (en) * | 2023-12-20 | 2024-05-31 | 商飞智能技术有限公司 | Verb-verb semantic relationship identification method and device |
CN118964549B (en) * | 2024-07-17 | 2025-04-22 | 唯界科技(杭州)有限公司 | A method for constructing multi-role intelligent agents based on large language models |
CN118690748B (en) * | 2024-08-26 | 2025-01-28 | 科大讯飞股份有限公司 | A method and related device for identifying English fixed collocations |
Citations (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4964044A (en) * | 1986-05-20 | 1990-10-16 | Kabushiki Kaisha Toshiba | Machine translation system including semantic information indicative of plural and singular terms |
US5099425A (en) * | 1988-12-13 | 1992-03-24 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for analyzing the semantics and syntax of a sentence or a phrase |
US5299125A (en) * | 1990-08-09 | 1994-03-29 | Semantic Compaction Systems | Natural language processing system and method for parsing a plurality of input symbol sequences into syntactically or pragmatically correct word messages |
US5640501A (en) * | 1990-10-31 | 1997-06-17 | Borland International, Inc. | Development system and methods for visually creating goal oriented electronic form applications having decision trees |
US5680511A (en) * | 1995-06-07 | 1997-10-21 | Dragon Systems, Inc. | Systems and methods for word recognition |
US5835888A (en) * | 1996-06-10 | 1998-11-10 | International Business Machines Corporation | Statistical language model for inflected languages |
US5875334A (en) * | 1995-10-27 | 1999-02-23 | International Business Machines Corporation | System, method, and program for extending a SQL compiler for handling control statements packaged with SQL query statements |
US5995922A (en) * | 1996-05-02 | 1999-11-30 | Microsoft Corporation | Identifying information related to an input word in an electronic dictionary |
US6067520A (en) * | 1995-12-29 | 2000-05-23 | Lee And Li | System and method of recognizing continuous mandarin speech utilizing chinese hidden markou models |
US20010014899A1 (en) * | 2000-02-04 | 2001-08-16 | Yasuyuki Fujikawa | Structural documentation system |
US6292771B1 (en) * | 1997-09-30 | 2001-09-18 | Ihc Health Services, Inc. | Probabilistic method for natural language processing and for encoding free-text data into a medical database by utilizing a Bayesian network to perform spell checking of words |
US6415250B1 (en) * | 1997-06-18 | 2002-07-02 | Novell, Inc. | System and method for identifying language using morphologically-based techniques |
US20020152202A1 (en) * | 2000-08-30 | 2002-10-17 | Perro David J. | Method and system for retrieving information using natural language queries |
US6477488B1 (en) * | 2000-03-10 | 2002-11-05 | Apple Computer, Inc. | Method for dynamic context scope selection in hybrid n-gram+LSA language modeling |
US20020164006A1 (en) * | 2001-05-04 | 2002-11-07 | Weiss Lewis E. | Electronic document call back system |
US20020198810A1 (en) * | 2001-06-11 | 2002-12-26 | Donatien Roger | Online creation and management of enterprises |
US20030018469A1 (en) * | 2001-07-20 | 2003-01-23 | Humphreys Kevin W. | Statistically driven sentence realizing method and apparatus |
US20030033284A1 (en) * | 2000-11-13 | 2003-02-13 | Peter Warren | User defined view selection utility |
US20030036912A1 (en) * | 2001-08-15 | 2003-02-20 | Sobotta Thu Dang | Computerized tax transaction system |
US20030055754A1 (en) * | 2000-11-30 | 2003-03-20 | Govone Solutions, Lp | Method, system and computer program product for facilitating a tax transaction |
US20030061022A1 (en) * | 2001-09-21 | 2003-03-27 | Reinders James R. | Display of translations in an interleaved fashion with variable spacing |
US6594783B1 (en) * | 1999-08-27 | 2003-07-15 | Hewlett-Packard Development Company, L.P. | Code verification by tree reconstruction |
US20030182102A1 (en) * | 2002-03-20 | 2003-09-25 | Simon Corston-Oliver | Sentence realization model for a natural language generation system |
US20030182631A1 (en) * | 2002-03-22 | 2003-09-25 | Xerox Corporation | Systems and methods for determining the topic structure of a portion of text |
US20040078271A1 (en) * | 2002-10-17 | 2004-04-22 | Ubs Painewebber Inc. | Method and system for tax reporting |
US20040230415A1 (en) * | 2003-05-12 | 2004-11-18 | Stefan Riezler | Systems and methods for grammatical text condensation |
US6823325B1 (en) * | 1999-11-23 | 2004-11-23 | Trevor B. Davies | Methods and apparatus for storing and retrieving knowledge |
US20050192807A1 (en) * | 2004-02-26 | 2005-09-01 | Ossama Emam | Hierarchical approach for the statistical vowelization of Arabic text |
US7024399B2 (en) * | 1999-04-05 | 2006-04-04 | American Board Of Family Practice, Inc. | Computer architecture and process of patient generation, evolution, and simulation for computer based testing system using bayesian networks as a scripting language |
US20060116860A1 (en) * | 2004-11-30 | 2006-06-01 | Xerox Corporation | Systems and methods for user-interest sensitive condensation |
US7080062B1 (en) * | 1999-05-18 | 2006-07-18 | International Business Machines Corporation | Optimizing database queries using query execution plans derived from automatic summary table determining cost based queries |
US20060204945A1 (en) * | 2005-03-14 | 2006-09-14 | Fuji Xerox Co., Ltd. | Question answering system, data search method, and computer program |
US20060212859A1 (en) * | 2005-03-18 | 2006-09-21 | Microsoft Corporation | System and method for generating XML-based language parser and writer |
US20060235689A1 (en) * | 2005-04-13 | 2006-10-19 | Fuji Xerox Co., Ltd. | Question answering system, data search method, and computer program |
US7421386B2 (en) * | 2003-10-23 | 2008-09-02 | Microsoft Corporation | Full-form lexicon with tagged data and methods of constructing and using the same |
US7475015B2 (en) * | 2003-09-05 | 2009-01-06 | International Business Machines Corporation | Semantic language modeling and confidence measurement |
US20090076795A1 (en) * | 2007-09-18 | 2009-03-19 | Srinivas Bangalore | System And Method Of Generating Responses To Text-Based Messages |
US8214196B2 (en) * | 2001-07-03 | 2012-07-03 | University Of Southern California | Syntax-based statistical translation model |
US8296127B2 (en) * | 2004-03-23 | 2012-10-23 | University Of Southern California | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US8412645B2 (en) * | 2008-05-30 | 2013-04-02 | International Business Machines Corporation | Automatic detection of undesirable users of an online communication resource based on content analytics |
- 2007-06-22: US application US11/767,104 (patent US8527262B2), status: Active
Patent Citations (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4964044A (en) * | 1986-05-20 | 1990-10-16 | Kabushiki Kaisha Toshiba | Machine translation system including semantic information indicative of plural and singular terms |
US5099425A (en) * | 1988-12-13 | 1992-03-24 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for analyzing the semantics and syntax of a sentence or a phrase |
US5299125A (en) * | 1990-08-09 | 1994-03-29 | Semantic Compaction Systems | Natural language processing system and method for parsing a plurality of input symbol sequences into syntactically or pragmatically correct word messages |
US5640501A (en) * | 1990-10-31 | 1997-06-17 | Borland International, Inc. | Development system and methods for visually creating goal oriented electronic form applications having decision trees |
US5680511A (en) * | 1995-06-07 | 1997-10-21 | Dragon Systems, Inc. | Systems and methods for word recognition |
US5875334A (en) * | 1995-10-27 | 1999-02-23 | International Business Machines Corporation | System, method, and program for extending a SQL compiler for handling control statements packaged with SQL query statements |
US6067520A (en) * | 1995-12-29 | 2000-05-23 | Lee And Li | System and method of recognizing continuous mandarin speech utilizing chinese hidden markou models |
US5995922A (en) * | 1996-05-02 | 1999-11-30 | Microsoft Corporation | Identifying information related to an input word in an electronic dictionary |
US5835888A (en) * | 1996-06-10 | 1998-11-10 | International Business Machines Corporation | Statistical language model for inflected languages |
US6415250B1 (en) * | 1997-06-18 | 2002-07-02 | Novell, Inc. | System and method for identifying language using morphologically-based techniques |
US6292771B1 (en) * | 1997-09-30 | 2001-09-18 | Ihc Health Services, Inc. | Probabilistic method for natural language processing and for encoding free-text data into a medical database by utilizing a Bayesian network to perform spell checking of words |
US7024399B2 (en) * | 1999-04-05 | 2006-04-04 | American Board Of Family Practice, Inc. | Computer architecture and process of patient generation, evolution, and simulation for computer based testing system using bayesian networks as a scripting language |
US7080062B1 (en) * | 1999-05-18 | 2006-07-18 | International Business Machines Corporation | Optimizing database queries using query execution plans derived from automatic summary table determining cost based queries |
US6594783B1 (en) * | 1999-08-27 | 2003-07-15 | Hewlett-Packard Development Company, L.P. | Code verification by tree reconstruction |
US6823325B1 (en) * | 1999-11-23 | 2004-11-23 | Trevor B. Davies | Methods and apparatus for storing and retrieving knowledge |
US20010014899A1 (en) * | 2000-02-04 | 2001-08-16 | Yasuyuki Fujikawa | Structural documentation system |
US6477488B1 (en) * | 2000-03-10 | 2002-11-05 | Apple Computer, Inc. | Method for dynamic context scope selection in hybrid n-gram+LSA language modeling |
US20020152202A1 (en) * | 2000-08-30 | 2002-10-17 | Perro David J. | Method and system for retrieving information using natural language queries |
US20030033284A1 (en) * | 2000-11-13 | 2003-02-13 | Peter Warren | User defined view selection utility |
US20030055754A1 (en) * | 2000-11-30 | 2003-03-20 | Govone Solutions, Lp | Method, system and computer program product for facilitating a tax transaction |
US20020164006A1 (en) * | 2001-05-04 | 2002-11-07 | Weiss Lewis E. | Electronic document call back system |
US20020198810A1 (en) * | 2001-06-11 | 2002-12-26 | Donatien Roger | Online creation and management of enterprises |
US8214196B2 (en) * | 2001-07-03 | 2012-07-03 | University Of Southern California | Syntax-based statistical translation model |
US20030018469A1 (en) * | 2001-07-20 | 2003-01-23 | Humphreys Kevin W. | Statistically driven sentence realizing method and apparatus |
US20030036912A1 (en) * | 2001-08-15 | 2003-02-20 | Sobotta Thu Dang | Computerized tax transaction system |
US20030061022A1 (en) * | 2001-09-21 | 2003-03-27 | Reinders James R. | Display of translations in an interleaved fashion with variable spacing |
US20030182102A1 (en) * | 2002-03-20 | 2003-09-25 | Simon Corston-Oliver | Sentence realization model for a natural language generation system |
US7526424B2 (en) * | 2002-03-20 | 2009-04-28 | Microsoft Corporation | Sentence realization model for a natural language generation system |
US20030182631A1 (en) * | 2002-03-22 | 2003-09-25 | Xerox Corporation | Systems and methods for determining the topic structure of a portion of text |
US20040078271A1 (en) * | 2002-10-17 | 2004-04-22 | Ubs Painewebber Inc. | Method and system for tax reporting |
US20040230415A1 (en) * | 2003-05-12 | 2004-11-18 | Stefan Riezler | Systems and methods for grammatical text condensation |
US7475015B2 (en) * | 2003-09-05 | 2009-01-06 | International Business Machines Corporation | Semantic language modeling and confidence measurement |
US7421386B2 (en) * | 2003-10-23 | 2008-09-02 | Microsoft Corporation | Full-form lexicon with tagged data and methods of constructing and using the same |
US20050192807A1 (en) * | 2004-02-26 | 2005-09-01 | Ossama Emam | Hierarchical approach for the statistical vowelization of Arabic text |
US8296127B2 (en) * | 2004-03-23 | 2012-10-23 | University Of Southern California | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US20060116860A1 (en) * | 2004-11-30 | 2006-06-01 | Xerox Corporation | Systems and methods for user-interest sensitive condensation |
US20060204945A1 (en) * | 2005-03-14 | 2006-09-14 | Fuji Xerox Co., Ltd. | Question answering system, data search method, and computer program |
US20060212859A1 (en) * | 2005-03-18 | 2006-09-21 | Microsoft Corporation | System and method for generating XML-based language parser and writer |
US20060235689A1 (en) * | 2005-04-13 | 2006-10-19 | Fuji Xerox Co., Ltd. | Question answering system, data search method, and computer program |
US20090076795A1 (en) * | 2007-09-18 | 2009-03-19 | Srinivas Bangalore | System And Method Of Generating Responses To Text-Based Messages |
US8412645B2 (en) * | 2008-05-30 | 2013-04-02 | International Business Machines Corporation | Automatic detection of undesirable users of an online communication resource based on content analytics |
Cited By (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8849650B2 (en) * | 2008-10-22 | 2014-09-30 | Sankhya Technologies Private Limited | System and method for automatically generating sentences of a language |
US20110257962A1 (en) * | 2008-10-22 | 2011-10-20 | Kumar Bulusu Gopi | System and method for automatically generating sentences of a language |
US20100250335A1 (en) * | 2009-03-31 | 2010-09-30 | Yahoo! Inc | System and method using text features for click prediction of sponsored search advertisements |
US20110123967A1 (en) * | 2009-11-24 | 2011-05-26 | Xerox Corporation | Dialog system for comprehension evaluation |
US20140180692A1 (en) * | 2011-02-28 | 2014-06-26 | Nuance Communications, Inc. | Intent mining via analysis of utterances |
US9269353B1 (en) * | 2011-12-07 | 2016-02-23 | Manu Rehani | Methods and systems for measuring semantics in communications |
US20130311166A1 (en) * | 2012-05-15 | 2013-11-21 | Andre Yanpolsky | Domain-Specific Natural-Language Processing Engine |
US8839201B2 (en) | 2012-10-12 | 2014-09-16 | Vmware, Inc. | Capturing test data associated with error conditions in software item testing |
US9684587B2 (en) | 2012-10-12 | 2017-06-20 | Vmware, Inc. | Test creation with execution |
US8839202B2 (en) | 2012-10-12 | 2014-09-16 | Vmware, Inc. | Test environment managed within tests |
US8949794B2 (en) * | 2012-10-12 | 2015-02-03 | Vmware, Inc. | Binding a software item to a plain english control name |
US9069902B2 (en) | 2012-10-12 | 2015-06-30 | Vmware, Inc. | Software test automation |
US10387294B2 (en) | 2012-10-12 | 2019-08-20 | Vmware, Inc. | Altering a test |
US20140109058A1 (en) * | 2012-10-12 | 2014-04-17 | Vmware,Inc. | Test language interpreter |
US9292422B2 (en) | 2012-10-12 | 2016-03-22 | Vmware, Inc. | Scheduled software item testing |
US9292416B2 (en) | 2012-10-12 | 2016-03-22 | Vmware, Inc. | Software development kit testing |
US10067858B2 (en) | 2012-10-12 | 2018-09-04 | Vmware, Inc. | Cloud-based software testing |
US9152623B2 (en) | 2012-11-02 | 2015-10-06 | Fido Labs, Inc. | Natural language processing system and method |
US20140200879A1 (en) * | 2013-01-11 | 2014-07-17 | Brian Sakhai | Method and System for Rating Food Items |
US20160299884A1 (en) * | 2013-11-11 | 2016-10-13 | The University Of Manchester | Transforming natural language requirement descriptions into analysis models |
US20180011830A1 (en) * | 2015-01-23 | 2018-01-11 | National Institute Of Information And Communications Technology | Annotation Assisting Apparatus and Computer Program Therefor |
US10157171B2 (en) * | 2015-01-23 | 2018-12-18 | National Institute Of Information And Communications Technology | Annotation assisting apparatus and computer program therefor |
US10606946B2 (en) * | 2015-07-06 | 2020-03-31 | Microsoft Technology Licensing, Llc | Learning word embedding using morphological knowledge |
US9633008B1 (en) * | 2016-09-29 | 2017-04-25 | International Business Machines Corporation | Cognitive presentation advisor |
CN108427667B (en) * | 2017-02-15 | 2021-08-10 | 北京国双科技有限公司 | Legal document segmentation method and device |
CN108427667A (en) * | 2017-02-15 | 2018-08-21 | 北京国双科技有限公司 | A kind of segmentation method and device of legal documents |
US20190074005A1 (en) * | 2017-09-06 | 2019-03-07 | Zensar Technologies Limited | Automated Conversation System and Method Thereof |
CN107818082B (en) * | 2017-09-25 | 2020-12-04 | 沈阳航空航天大学 | A Semantic Role Recognition Method Combined with Phrase Structure Tree |
CN107818082A (en) * | 2017-09-25 | 2018-03-20 | 沈阳航空航天大学 | With reference to the semantic role recognition methods of phrase structure tree |
US11151318B2 (en) | 2018-03-03 | 2021-10-19 | SAMURAI LABS sp. z. o.o. | System and method for detecting undesirable and potentially harmful online behavior |
US10956670B2 (en) | 2018-03-03 | 2021-03-23 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
US11663403B2 (en) | 2018-03-03 | 2023-05-30 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
US11507745B2 (en) | 2018-03-03 | 2022-11-22 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
US20200074322A1 (en) * | 2018-09-04 | 2020-03-05 | Rovi Guides, Inc. | Methods and systems for using machine-learning extracts and semantic graphs to create structured data to drive search, recommendation, and discovery |
US20200134018A1 (en) * | 2018-10-25 | 2020-04-30 | Srivatsan Laxman | Mixed-initiative dialog automation with goal orientation |
US10853579B2 (en) * | 2018-10-25 | 2020-12-01 | Srivatsan Laxman | Mixed-initiative dialog automation with goal orientation |
US11182560B2 (en) | 2019-02-15 | 2021-11-23 | Wipro Limited | System and method for language independent iterative learning mechanism for NLP tasks |
CN109994103A (en) * | 2019-03-26 | 2019-07-09 | 北京博瑞彤芸文化传播股份有限公司 | A kind of training method of intelligent semantic Matching Model |
US11210472B2 (en) | 2019-05-08 | 2021-12-28 | Tata Consultancy Services Limited | Automated extraction of message sequence chart from textual description |
US12106221B2 (en) | 2019-06-13 | 2024-10-01 | International Business Machines Corporation | Predicting functional tags of semantic role labeling |
CN110634336A (en) * | 2019-08-22 | 2019-12-31 | 北京达佳互联信息技术有限公司 | Method and device for generating audio electronic book |
WO2021147875A1 (en) * | 2020-01-20 | 2021-07-29 | 华为技术有限公司 | Text screening method and apparatus |
US11176329B2 (en) | 2020-02-18 | 2021-11-16 | Bank Of America Corporation | Source code compiler using natural language input |
US11657232B2 (en) | 2020-02-18 | 2023-05-23 | Bank Of America Corporation | Source code compiler using natural language input |
US11657151B2 (en) | 2020-02-18 | 2023-05-23 | Bank Of America Corporation | System and method for detecting source code anomalies |
US11250128B2 (en) | 2020-02-18 | 2022-02-15 | Bank Of America Corporation | System and method for detecting source code anomalies |
US11568153B2 (en) | 2020-03-05 | 2023-01-31 | Bank Of America Corporation | Narrative evaluator |
US11657229B2 (en) | 2020-05-19 | 2023-05-23 | International Business Machines Corporation | Using a joint distributional semantic system to correct redundant semantic verb frames |
US11822589B2 (en) | 2020-09-16 | 2023-11-21 | L&T Technology Services Limited | Method and system for performing summarization of text |
DE102023205209A1 (en) | 2022-06-30 | 2024-01-04 | Bosch Global Software Technologies Private Limited | Control unit for assigning at least one element of a plurality of documents and methods therefor |
Also Published As
Publication number | Publication date |
---|---|
US20080319735A1 (en) | 2008-12-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8527262B2 (en) | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications | |
Lin et al. | ASRNN: A recurrent neural network with an attention model for sequence labeling | |
US12222970B2 (en) | Generative event extraction method based on ontology guidance | |
Loftsson | Tagging Icelandic text: A linguistic rule-based approach | |
Megyesi | Data-driven syntactic analysis | |
Mohammed | Using machine learning to build POS tagger for under-resourced language: the case of Somali | |
Megyesi | Shallow parsing with PoS taggers and linguistic features | |
Lata et al. | Mention detection in coreference resolution: survey | |
Almanea | Automatic methods and neural networks in Arabic texts diacritization: a comprehensive survey | |
Feldman et al. | A resource-light approach to morpho-syntactic tagging | |
Comas et al. | Sibyl, a factoid question-answering system for spoken documents | |
Jayasuriya et al. | Learning a stochastic part of speech tagger for sinhala | |
Wilson | Toward automatic processing of English metalanguage | |
Ramesh et al. | Interpretable natural language segmentation based on link grammar | |
Janicki | Statistical and Computational Models for Whole Word Morphology | |
Lief | Deep contextualized word embeddings from character language models for neural sequence labeling | |
Lee et al. | Interlingua-based English–Korean two-way speech translation of Doctor–Patient dialogues with CCLINC | |
Hillard | Automatic sentence structure annotation for spoken language processing | |
Quecedo | Neural models for unsupervised disambiguation in morphologically rich languages | |
Marin | Effective use of cross-domain parsing in automatic speech recognition and error detection | |
Ruzsics | Multi-level Modelling for Upstream Text Processing | |
Nabende | Applying dynamic Bayesian Networks in transliteration detection and generation | |
Morley et al. | Data Driven Grammatical Error Detection in Transcripts of Children’s Speech | |
Hegde et al. | Syllable‐Level Morphological Segmentation of Kannada and Tulu Words | |
Muradovich et al. | APPLICATION OF THE N-GRAM MODEL TO THE KARAKALPAK |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KAMBHATLA, NANDAKISHORE;ZITOUNI, IMOD;SIGNING DATES FROM 20070814 TO 20070816;REEL/FRAME:020110/0817 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 8 |
 | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |