WO2005122144A1 - Dispositif de reconnaissance vocale, méthode et programme de reconnaissance vocale - Google Patents

Dispositif de reconnaissance vocale, méthode et programme de reconnaissance vocale Download PDF

Info

Publication number
WO2005122144A1
WO2005122144A1 (PCT/JP2005/010183)
Authority
WO
WIPO (PCT)
Prior art keywords
word
unregistered
unregistered word
speech recognition
search
Prior art date
Application number
PCT/JP2005/010183
Other languages
English (en)
Japanese (ja)
Inventor
Yoshiyuki Okimoto
Tsuyoshi Inoue
Takashi Tsuzuki
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd. filed Critical Matsushita Electric Industrial Co., Ltd.
Priority to JP2006514478A priority Critical patent/JP4705023B2/ja
Priority to US11/628,887 priority patent/US7813928B2/en
Publication of WO2005122144A1 publication Critical patent/WO2005122144A1/fr

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0631Creating reference templates; Clustering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221Announcement of recognition results

Definitions

  • Speech recognition apparatus, speech recognition method, and program
  • the present invention relates to a speech recognition device used for a man-machine interface based on speech recognition, and more particularly to a response technique for unregistered word utterances.
  • Patent Document 1 describes a method in which the similarity between the input speech and each word in the speech recognition dictionary is obtained, a reference similarity is obtained by freely combining unit standard patterns, the similarity of each word is corrected by the reference similarity, and the user's utterance is regarded as an unregistered word when the corrected similarity does not reach a certain threshold.
  • Patent Document 2 describes a method for detecting unregistered words with a small amount of processing and high accuracy using a phoneme HMM (Hidden Markov Model) and a garbage HMM.
  • Patent Document 3 describes a method of presenting to the user, when the utterance of an unregistered word is detected, a list of words that the device can accept in the current situation. With this method, even a user who does not know which words the device recognizes is taught, each time he or she utters an unregistered word, the words that can be spoken in that situation, and can thus realize the desired operation without repeating futile utterances.
  • Patent Document 4 describes a method in which speech recognition is performed using, as the recognition dictionary, the combination of an internal dictionary corresponding to a conventional speech recognition dictionary and an external dictionary storing a large number of words unregistered in the conventional dictionary; when a word from the external dictionary appears in the recognition result, it is simultaneously indicated that this word is unregistered.
  • This makes possible a response such as “Taro Matsushita does not exist.”
  • Patent Document 1 Japanese Patent No. 2808906
  • Patent Document 2 Japanese Patent No. 2886117
  • Patent Document 3 Japanese Patent No. 3468572
  • Patent Document 4 Japanese Patent Laid-Open No. 9-230889
  • Non-Patent Document 1 Kiyohiro Shikano, Satoshi Nakamura, Shiro Ise, “Digital Signal Processing Series 5: Digital Signal Processing of Voice / Sound Information”, Shosodo, November 10, 1997, p. 45, 53
  • The present invention has been made in view of these problems, and an object of the present invention is to provide a speech recognition device that can reduce situations in which the user repeats utterances in vain.
  • The device is characterized in that it comprises unregistered word candidate retrieving means for retrieving, from the unregistered words stored in the unregistered word storage means, unregistered word candidates that seem to correspond to the spoken speech, and result displaying means for displaying the search results.
  • The speech recognition apparatus may include communication means for communicating with an unregistered word server that stores the group of unregistered words held in the unregistered word storage means; the communication means may receive the unregistered word group from the unregistered word server and update the unregistered words stored in the unregistered word storage means.
  • The present invention can be realized not only as such a speech recognition device but also as a speech recognition method having, as steps, the characteristic means included in such a device, and as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
  • According to the present invention, when the user's utterance of an unregistered word causes speech recognition to fail, the unregistered word itself can be presented to the user, and at the same time the user can be shown that the failure is not due to a recognition error.
  • Moreover, the recognition rate for utterances of words in the speech recognition dictionary, which is the original purpose, is not reduced.
  • Furthermore, the unregistered word storage means used for searching for unregistered word candidates is very large and requires constant maintenance. By separating this function out as a server, it is possible to reduce the manufacturing cost of the device and at the same time to reduce the maintenance cost of the unregistered word storage means.
  • FIG. 2 is a flowchart showing the operation of the speech recognition apparatus according to the first embodiment.
  • FIG. 3 is a diagram showing an output example of a speech recognition unit when a recognized vocabulary is uttered according to the first embodiment.
  • FIG. 4 is a diagram showing an output example of a reference similarity calculation unit when a recognized vocabulary is uttered according to Embodiment 1.
  • FIG. 5 is a diagram showing a result display example when a recognized vocabulary is uttered according to the first embodiment.
  • FIG. 6 is a diagram showing an output example of a speech recognition unit when an unregistered word is uttered according to the first embodiment.
  • FIG. 7 is a diagram showing an output example of a reference similarity calculation unit when an unregistered word is uttered according to the first embodiment.
  • FIG. 8 is a diagram showing an output example of an unregistered word candidate search unit according to the first embodiment.
  • FIG. 10 is a diagram showing a calculation method of similarity between phoneme sequences at the time of unregistered word search according to the first embodiment.
  • FIG. 11 is a block diagram showing a functional configuration of the unknown utterance detection apparatus.
  • FIG. 12 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 2 of the present invention.
  • FIG. 13 is a diagram showing an example of unregistered word categories according to the second embodiment.
  • FIG. 14 is a block diagram showing a functional configuration of an unregistered word class determination unit using a class N-gram language model.
  • FIG. 17 is a diagram showing an example of a class N-gram language model for unregistered word class determination according to the second embodiment.
  • FIG. 18 is a diagram showing a display example of a result when an unregistered word of a different class according to the second embodiment is uttered.
  • FIG. 19 is a diagram showing a configuration of an unregistered word class determination unit that acquires information for unregistered word class determination from an external application according to the second embodiment.
  • FIG. 20 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 3 of the present invention.
  • FIG. 21 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 4 of the present invention.
  • Explanation of symbols
  • A speech recognition device according to the present invention is a speech recognition device that recognizes spoken speech, comprising: speech recognition word storage means that defines a vocabulary for speech recognition and stores it as registered words; speech recognition means for collating the spoken speech against the registered words stored in the speech recognition word storage means; unregistered word determining means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word; unregistered word storage means for storing unregistered words; unregistered word candidate retrieving means for retrieving, from the unregistered words stored in the unregistered word storage means, unregistered word candidates that seem to correspond to the spoken speech when the unregistered word determining means determines that it is an unregistered word; and result displaying means for displaying the search results.
  • The unregistered word candidate search means may search for a plurality of unregistered word candidates from among the unregistered words stored in the unregistered word storage means.
  • The unregistered word storage means preferably stores the unregistered words classified according to the category to which each unregistered word belongs.
  • More preferably, the speech recognition apparatus further includes unregistered word class determining means for determining, based on the spoken speech, the category to which the unregistered word belongs, and the unregistered word candidate search means searches for the unregistered word candidates within the corresponding category in the unregistered word storage means based on the determination result of the unregistered word class determining means.
  • The speech recognition apparatus may further include an information acquisition unit that acquires information about the category, and the unregistered word candidate search unit may search for the unregistered word candidates within the corresponding category in the unregistered word storage means based on the information acquired by the information acquisition unit.
  • Preferably, the unregistered word candidate search means searches for the unregistered word candidates by calculating an unregistered word score that quantifies the degree of similarity to the spoken speech, the result display means displays each unregistered word candidate together with its unregistered word score as the search result, and the display of each candidate is varied according to its unregistered word score.
  • In this way, the likelihood of each unregistered word candidate is quantified, and by emphasizing the most likely candidates, the unregistered word candidates can be presented in an easily understandable manner.
  • the unregistered words stored in the unregistered word storage means may be updated under a predetermined condition.
  • The speech recognition apparatus may include communication means for communicating with an unregistered word server that stores the group of unregistered words held in the unregistered word storage means; the communication means may receive the unregistered word group from the unregistered word server and update the unregistered words stored in the unregistered word storage means.
  • the registered words may be updated under predetermined conditions.
  • The present invention can also be realized as a speech recognition system. That is, it is a speech recognition system for recognizing spoken speech, comprising a speech recognition device that recognizes spoken speech and an unregistered word search server that searches for words not registered in the speech recognition device. The speech recognition device includes: speech recognition word storage means that defines a vocabulary for speech recognition and stores it as registered words; speech recognition means for collating the spoken speech against the registered words stored in the speech recognition word storage means; unregistered word determining means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word; search request transmitting means for requesting, when the unregistered word determining means determines that the utterance is an unregistered word, that the unregistered word search server search for unregistered word candidates that seem to correspond to the spoken speech; search result receiving means for acquiring the search results of the unregistered word candidates from the unregistered word search server; and result display means for displaying the search results.
  • The unregistered word search server includes: unregistered word storage means that stores the unregistered words; search request receiving means that receives the search request from the search request transmitting means; search means that performs the search in response to the received search request; and search result transmitting means that transmits the search results to the speech recognition device.
  • The present invention can be realized not only as such a speech recognition device but also as a speech recognition method having, as steps, the characteristic means included in such a device, and as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
  • FIG. 1 is a block diagram showing a functional configuration of the speech recognition apparatus according to Embodiment 1 of the present invention.
  • The speech recognition apparatus 100 shown in FIG. 1 is used as one of the man-machine interfaces; it is a device that accepts speech input from the user and outputs the recognition result of the input speech.
  • The speech recognition apparatus 100 includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105, an unregistered word storage unit 106, and a result display unit 107.
  • the speech recognition unit 101 is a processing unit that captures input speech and recognizes the utterance content.
  • the voice recognition vocabulary storage unit 102 is a storage device such as a hard disk that defines and stores the vocabulary recognized by the voice recognition unit 101.
  • This speech recognition vocabulary storage unit 102 stores the standard acoustic pattern of each word as a standard pattern, or a representation of the acoustic pattern of each word in a model such as an HMM (Hidden Markov Model) or a neural network. Alternatively, it stores standard patterns or models (such as HMMs or neural networks) for shorter acoustic units, from which a word pattern or word model is synthesized and provided to the speech recognition unit 101.
  • the reference similarity calculation unit 103 is a processing unit that calculates a reference similarity used to determine whether or not the input speech is an unregistered word. This reference similarity calculation unit 103 searches for a subword sequence having the highest similarity to the input speech by arbitrarily combining patterns or models of acoustic units shorter than words called subwords, and has the maximum similarity. Ask for.
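As a deliberately simplified sketch (not the patent's method, which matches subword patterns or models against variable-length speech intervals), assume each frame of the input speech has already been scored against every subword; if subwords may combine freely with unpenalized transitions, the highest-similarity subword sequence reduces to taking the best-scoring subword at each frame:

```python
# Simplified sketch: reference similarity as the score of the best
# unconstrained subword sequence. ASSUMPTION (not in the patent): one
# subword per frame and no transition penalty, so the optimum is the
# per-frame maximum.

def reference_similarity(frame_scores):
    """frame_scores: list of dicts mapping subword -> acoustic score.
    Returns (best subword sequence, total similarity)."""
    sequence, total = [], 0.0
    for scores in frame_scores:
        best = max(scores, key=scores.get)  # best-matching subword this frame
        sequence.append(best)
        total += scores[best]
    return sequence, total

# Toy example with three frames and three candidate subwords
frames = [
    {"ma": 0.9, "ta": 0.2, "ro": 0.1},
    {"ma": 0.1, "ta": 0.8, "ro": 0.3},
    {"ma": 0.2, "ta": 0.1, "ro": 0.7},
]
seq, sim = reference_similarity(frames)
print(seq)  # ['ma', 'ta', 'ro']
```

A real implementation would instead let each subword model span a variable-length interval and find the best segmentation by dynamic programming.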
  • The unregistered word determination unit 104 determines whether or not the user's utterance content is an unregistered word based on the results of both the speech recognition unit 101 and the reference similarity calculation unit 103. When the user's utterance content is a word stored in the speech recognition vocabulary storage unit 102, that is, a registered word, it outputs to the result display unit 107 the recognition result indicating that the utterance content has been recognized; when it is a word not stored in the speech recognition vocabulary storage unit 102, that is, an unregistered word, it outputs the determination result that the utterance content is an unregistered word to the unregistered word candidate search unit 105.
  • The unregistered word candidate search unit 105 is a processing unit that, when the user's utterance content is determined to be an unregistered word, searches for unregistered words likely to correspond to the utterance content.
  • The unregistered word storage unit 106 is a storage device such as a hard disk that stores the large number of words that are the targets of the unregistered word search by the unregistered word candidate search unit 105.
  • Since the unregistered word candidate search unit 105 searches for unregistered words among the very large vocabulary stored in the unregistered word storage unit 106, it is preferable, as will be described later, that it performs the search with a simpler and faster method (that is, one with shorter calculation time) than that of the speech recognition unit 101.
  • The result display unit 107 is a display device such as a CRT display or a liquid crystal display; by displaying a screen showing the recognition result, the determination result, and the unregistered word search results output from the unregistered word determination unit 104, it shows the user whether the user's utterance content was recognized or was an unregistered word.
  • FIG. 2 is a flowchart showing the processing operation of the speech recognition apparatus 100.
  • When the speech recognition apparatus 100 receives an input of speech uttered by the user (S10), the speech recognition unit 101 recognizes, based on the input speech, the words in the speech recognition vocabulary storage unit 102 that resemble the input speech (S12). More specifically, the speech recognition unit 101 collates the standard pattern or word model of each word stored in the speech recognition vocabulary storage unit 102 with the input speech, calculates the similarity to the input speech for each word, and extracts those with high similarity as candidates. At the same time, the speech recognition apparatus 100 searches, in the reference similarity calculation unit 103, for the subword sequence closest to the input speech and obtains its similarity as the reference similarity (S14).
  • Next, the speech recognition apparatus 100 uses the unregistered word determination unit 104 to compare the similarity of the first candidate word (the candidate word with the highest similarity) obtained by the speech recognition unit 101 with the reference similarity obtained by the reference similarity calculation unit 103, and determines whether their difference is within a predetermined threshold (S16).
  • The predetermined threshold here is a threshold for determining whether the utterance content of the user is a registered word or an unregistered word. Using a large number of sampled utterances, the similarities produced by the speech recognition unit 101 and by the reference similarity calculation unit 103 are obtained, and the optimum threshold is determined from their statistical distributions.
  • If the difference between the similarity of the first candidate word from the speech recognition unit 101 and the reference similarity from the reference similarity calculation unit 103 is within the statistically predetermined threshold (Yes in S16), the unregistered word determination unit 104 determines that the utterance content of the user is a word (registered word) included in the speech recognition vocabulary storage unit 102 (S18). Thereafter, the speech recognition apparatus 100 presents the recognition result to the user via the result display unit 107 (S26) and ends the processing operation.
  • If the difference between the similarity of the first candidate word and the reference similarity exceeds the statistically predetermined threshold (No in S16), the unregistered word determination unit 104 determines that the user's utterance content is a word (unregistered word) not included in the speech recognition vocabulary storage unit 102 (S20), and outputs the determination result to the unregistered word candidate search unit 105.
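The decision of steps S16 to S20 can be sketched as a simple comparison; the threshold value of 200 follows the worked example given later in this description, and would in practice be determined from the statistical distributions of sampled utterances:

```python
# Sketch of the unregistered-word decision (S16-S20): the utterance is
# judged unregistered when the reference similarity exceeds the first
# candidate's similarity by more than the statistically chosen threshold.
THRESHOLD = 200  # example value from the text; tuned on sampled utterances

def is_unregistered(first_candidate_similarity, reference_similarity,
                    threshold=THRESHOLD):
    return (reference_similarity - first_candidate_similarity) > threshold

# Registered-word example from the text: similarity 2041, reference 2225
print(is_unregistered(2041, 2225))  # False (difference 184 is within 200)
# Unregistered-word example: similarity 1431, reference 2225
print(is_unregistered(1431, 2225))  # True (difference 794 exceeds 200)
```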
  • In this case, the speech recognition apparatus 100 uses the unregistered word candidate search unit 105 to search for unregistered words based on the utterance content (S22).
  • Specifically, the unregistered word candidate search unit 105 compares the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106, and computes an unregistered word score, a score related to their similarity, in order to find unregistered words with high scores, that is, unregistered words likely to be the user's utterance content.
  • The unregistered word candidate search unit 105 extracts a plurality of unregistered word candidates considered likely to be the user's utterance content, for example in descending order of score (S24), and outputs them to the result display unit 107 together with their unregistered word scores.
  • The speech recognition apparatus 100 presents the determination result, the extracted unregistered word candidates, and the unregistered word scores to the user via the result display unit 107 (S26), and ends the processing operation.
  • The speech recognition device 100 defines the words to be recognized, that is, the speech recognition vocabulary, according to the application that uses the speech recognition device 100 as an input device for its man-machine interface. For example, in a program search application, the names of programs to be searched, the names of performers that serve as search keys, and the like are defined as the speech recognition vocabulary.
  • The speech recognition apparatus 100 displays different results depending on whether or not the user's utterance content is a word included in the speech recognition vocabulary storage unit 102. That is, when the utterance content is a word included in the speech recognition vocabulary storage unit 102, the speech recognition unit 101, as described above, collates the standard pattern or word model of each word stored in the speech recognition vocabulary storage unit 102 with the input speech, calculates the similarity for each word, obtains the top candidates in descending order of similarity, and outputs them to the result display unit 107.
  • FIG. 3 shows an output example for the case in which a user utters “Matsushita Taro,” on the assumption that the word “Taro Matsushita” exists in the speech recognition vocabulary storage unit 102.
  • the reference similarity calculation unit 103 searches for a subword sequence closest to the input speech and obtains the similarity as the reference similarity.
  • FIG. 4 shows an output example of the reference similarity calculation unit 103 for the utterance “Matsushita Taro” by the user.
  • In this example, the difference between the similarity “2041” of the first candidate and the reference similarity “2225” is within the statistically calculated threshold (for example, “200”), so the unregistered word determination unit 104 determines that the user's utterance content is a registered word. Since the determination result is not “unregistered word,” the unregistered word candidate search unit 105 outputs the recognition result as it is to the result display unit 107 without performing an unregistered word search, and “Taro Matsushita” is correctly displayed as the recognition result via the result display unit 107.
  • FIG. 5 shows an example of the result display in the result display unit 107.
  • When the utterance content is a word that is not included in the speech recognition vocabulary, the speech recognition unit 101 still collates the input speech with each word stored in the speech recognition vocabulary storage unit 102, calculates the similarity for each word, and outputs the top candidates in descending order of similarity. However, since no word matches the utterance content, the output becomes as in the example shown in FIG. 6. Here the user's utterance content is again “Matsushita Taro,” but this time the word “Taro Matsushita” is assumed not to be included in the speech recognition vocabulary storage unit 102.
  • Meanwhile, the reference similarity calculation unit 103 searches for the subword sequence most similar to the input speech and calculates its similarity; this processing is not affected at all by whether or not the utterance content is included in the speech recognition vocabulary. As a result, the output of the reference similarity calculation unit 103, shown in FIG. 7, is the same as the output example for the case where the utterance content is included in the speech recognition vocabulary (see FIG. 4).
  • The unregistered word determination unit 104 compares, as described above, the similarity of the first candidate from the speech recognition unit 101 with the reference similarity from the reference similarity calculation unit 103. When the utterance content is not included in the speech recognition vocabulary, the two similarities differ greatly and their difference exceeds the predetermined threshold, so the unregistered word determination unit 104 determines the utterance content to be an unregistered word. For example, in the examples shown in FIG. 6 and FIG. 7, the similarity “1431” of the first candidate in the speech recognition unit 101 and the reference similarity “2225” differ greatly, and the difference exceeds the predetermined threshold, so the unregistered word determination unit 104 determines that the user's utterance content is an unregistered word.
  • Next, the unregistered word candidate search unit 105 compares the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106, and calculates the unregistered word score, a score related to the similarity. The unregistered word candidate search unit 105 then extracts, for example, the top five candidates in descending order of unregistered word score and outputs them to the result display unit 107 together with their scores.
  • FIG. 8 is a diagram showing an example of the result of the unregistered word candidate search unit 105 searching for unregistered words based on the subword string “Matsushima Kanou” obtained by the reference similarity calculation unit 103 for the user's utterance “Matsushita Taro.”
  • “Taro Matsushita” is stored in the unregistered word storage unit 106.
  • The search results of the unregistered word candidate search unit 105 are sent to the result display unit 107 together with the information that these words are unregistered words, and the fact that the user's utterance was recognized as an unregistered word is communicated to the user.
  • For example, the result shown in FIG. 9 is output. A user who sees the recognition result in the format illustrated in FIG. 9 can tell at a glance that his or her utterance content was unknown to the system.
  • High search accuracy is not strictly required of the unregistered word candidate search unit 105; indeed, keeping low the hardware resources needed to achieve a given search accuracy can be a great merit. Even if the search accuracy is not very high, displaying multiple candidates means that the word spoken by the user will be included among them with high probability, and since that word is an unregistered word, it is useful for the user to learn that repeating the utterance would be futile.
  • In the unregistered word candidate search unit 105 of the first embodiment, a value based on the phoneme edit distance is used as the search measure for unregistered word candidates.
  • Specifically, the unregistered word candidate search unit 105 calculates the edit distance between the phoneme symbol string representation of the subword sequence obtained by the reference similarity calculation unit 103 and the phoneme symbol string of each word stored in the unregistered word storage unit 106, normalizes it by the string length, and subtracts it from 1 to obtain the unregistered word score.
  • The unregistered word candidate search unit 105 performs this process for all the words stored in the unregistered word storage unit 106, extracts unregistered word candidates in descending order of unregistered word score, and outputs them to the result display unit 107.
  • the diagram shown in FIG. 8 is an example of the unregistered word candidate and the unregistered word score obtained in this way.
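A minimal sketch of this edit-distance scoring and candidate search follows; the romanized dictionary entries are illustrative (not from the patent), and normalizing by the longer string's length is an assumption of this sketch:

```python
# Sketch of the phoneme-edit-distance search: unregistered word score =
# 1 - (edit distance normalized by the longer string's length), with
# candidates returned in descending score order.

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over a rolling row."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,                      # deletion
                dp[j - 1] + 1,                  # insertion
                prev + (a[i - 1] != b[j - 1]),  # substitution (0 if match)
            )
    return dp[len(b)]

def unregistered_word_score(subword_seq, word):
    return 1.0 - edit_distance(subword_seq, word) / max(len(subword_seq), len(word))

def search_candidates(subword_seq, dictionary, top_n=5):
    scored = [(w, unregistered_word_score(subword_seq, w)) for w in dictionary]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]

# Hypothetical romanized dictionary entries
dictionary = ["matsushitataro", "matsushimananako", "matsudaseiko"]
for word, score in search_candidates("matsushitataroo", dictionary):
    print(f"{word}: {score:.2f}")
```

Because each comparison is a cheap string operation, the whole dictionary can be scanned quickly, which matches the light-processing advantage described below.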
  • The advantage of realizing the unregistered word search by comparing phoneme sequences in this way is that the entire unregistered word storage unit 106, in which a large number of words are stored, can be searched with light processing, so that unregistered word candidates are found and displayed to the user in a short time, giving the user a responsive feel.
  • the reference similarity calculation unit 103 is provided for unregistered word determination, but this is not an essential requirement. It is also possible to use other methods for determining unregistered words.
  • For example, an unknown utterance detection apparatus as shown in FIG. 11 may be used for this purpose.
  • FIG. 11 is a block diagram showing a functional configuration of the unknown utterance detection apparatus.
  • the speech segment pattern storage unit 111 stores speech segments of standard speech used for matching with the feature parameters of the input speech.
  • here, a speech segment means a set of VC patterns, each concatenating the latter half of a vowel interval with the first half of the consonant interval that follows it, and CV patterns, each concatenating the latter half of a consonant interval with the first half of a vowel interval.
  • alternatively, a speech piece may be a set of phonemes, each roughly equivalent to one letter of the alphabet when Japanese is written in Roman letters; a set of morae, each roughly equivalent to one character when Japanese is written in hiragana; a set of subwords, each meaning a chain of multiple morae; or a mixture of these sets.
  • the word dictionary storage unit 112 stores a rule for synthesizing the word pattern of the speech recognition vocabulary by connecting the speech pieces.
  • the word matching unit 113 compares the input speech expressed in a time series of feature parameters with the synthesized word pattern, and calculates the likelihood corresponding to the similarity for each word.
  • the transition probability storage unit 114 stores a transition probability that expresses the naturalness of the connection as a continuous value when the speech pieces are arbitrarily combined.
  • here, the 2-gram probability of phonemes is used as the transition probability.
  • the 2-gram probability of a phoneme means the probability P(y | x) that a phoneme y follows the preceding phoneme x, and is obtained in advance using a large amount of Japanese text data.
  • the transition probability may be a 2-gram probability of morae, a 2-gram probability of subwords, or a 2-gram probability of a mixture of these, or an N-gram probability other than a 2-gram, such as a 3-gram probability.
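As a sketch of how such a 2-gram probability table can be estimated from text, the following counts phoneme chains and takes relative frequencies. The toy training sequences are hypothetical and no smoothing is applied:

```python
from collections import Counter

def train_bigram(sequences):
    # Count phoneme pairs (x, y) and context occurrences of x, then
    # P(y | x) can be estimated by relative frequency.
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            pair_counts[(x, y)] += 1
            context_counts[x] += 1
    return pair_counts, context_counts

def bigram_prob(pair_counts, context_counts, x, y):
    # P(y | x); returns 0.0 for unseen contexts (no smoothing here).
    if context_counts[x] == 0:
        return 0.0
    return pair_counts[(x, y)] / context_counts[x]
```

In practice the counts would come from a large corpus of Japanese text, as the description states.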
  • the speech sequence matching unit 115 calculates the maximum likelihood, taking the transition probabilities into account, between patterns formed by arbitrarily combining the speech segment patterns and the input speech expressed as a time series of feature parameters.
  • the candidate score difference calculation unit 116 calculates, from the likelihoods of the words calculated by the word matching unit 113, the difference between the likelihood of the word with the highest value (first candidate) and that of the word with the next highest value (second candidate), normalized by the word length.
  • the candidate-phoneme sequence similarity calculation unit 117 calculates the distance between the phoneme sequence of the first candidate and that of the second candidate in order to obtain the acoustic similarity between the two candidates.
  • the candidate-speech sequence score difference calculation unit 118 calculates the difference between the likelihood of the first candidate and the reference likelihood calculated by the speech sequence matching unit 115, normalized by the word length.
  • the candidate-speech sequence phoneme sequence similarity calculation unit 119 calculates the acoustic similarity between the first candidate and the sequence determined as the optimal sequence by the speech sequence matching unit 115, as the distance between their phoneme sequences.
  • the unregistered word determination unit 104 combines the values obtained by the candidate score difference calculation unit 116, the candidate-phoneme sequence similarity calculation unit 117, the candidate-speech sequence score difference calculation unit 118, and the candidate-speech sequence phoneme sequence similarity calculation unit 119 to determine whether or not the input speech is an unregistered word. By statistically combining a plurality of measures for detecting unregistered words in this way, the determination accuracy for unregistered words is improved.
  • here, the above four measures are used in the unregistered word determination unit 104, but other measures can also be used, such as the likelihood of each word candidate itself, its distribution, the amount of variation of the local score within the word interval, and the duration information of the phonemes that make up the word.
  • as the method for determining an unregistered word from a plurality of measures, a linear discriminant obtained in advance using a large number of recognition result cases is used here.
  • using learning machines such as neural networks, decision trees, and SVMs (support vector machines) is also effective.
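A linear discriminant over the four measures can be sketched as below. The weights, bias, and sign conventions are placeholders; in the actual scheme they would be estimated in advance from a large number of recognition-result cases, as the description states:

```python
def is_unregistered(score_diff, cand_phoneme_sim, cand_seq_diff,
                    cand_seq_phoneme_sim,
                    weights=(1.2, -0.8, 1.5, -0.6), bias=-0.5):
    # Linear discriminant: weighted sum of the four measures plus a bias.
    # The weights and bias here are hypothetical placeholders; in
    # practice they are learned from labeled recognition-result cases.
    features = (score_diff, cand_phoneme_sim, cand_seq_diff,
                cand_seq_phoneme_sim)
    g = sum(w * f for w, f in zip(weights, features)) + bias
    # Positive side of the decision boundary => judged unregistered.
    return g > 0.0
```

Replacing this function with a neural network, decision tree, or SVM trained on the same features gives the learning-machine variants mentioned above.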
  • in the unregistered word candidate search unit 105, an unregistered word search method based on the edit distance between phoneme sequences has been described.
  • as for the definition of the edit distance between phonemes, rather than assigning every insertion error, deletion error, and substitution error an edit distance of 1, it is also effective to use continuous values based on experimentally obtained error occurrence probabilities as the distance.
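Such a probability-weighted distance can be sketched by replacing the uniform substitution cost with a per-phoneme-pair lookup. The cost table in the test usage is hypothetical; in practice the costs would be derived from experimentally measured error occurrence probabilities:

```python
def weighted_edit_distance(a, b, sub_cost, ins_cost=1.0, del_cost=1.0):
    # Like the standard edit distance, but the substitution cost is a
    # continuous value looked up per phoneme pair (e.g. derived from
    # measured confusion probabilities); unlisted pairs default to 1.0.
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = (0.0 if a[i - 1] == b[j - 1]
                 else sub_cost.get((a[i - 1], b[j - 1]), 1.0))
            d[i][j] = min(d[i - 1][j] + del_cost,
                          d[i][j - 1] + ins_cost,
                          d[i - 1][j - 1] + c)
    return d[m][n]
```

With a table entry such as `{('b', 'd'): 0.3}` for an acoustically confusable pair, "bat" and "dat" come out closer than under the uniform cost.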
  • it is also possible to store data in the same format as the speech recognition vocabulary storage unit 102 in the unregistered word storage unit 106, and to have the unregistered word candidate search unit 105 match words directly against the input speech parameters, as the speech recognition unit 101 does, and output unregistered word candidates and their unregistered word scores. With such a configuration, the resources required for the unregistered word search increase, but the search accuracy for unregistered words improves. Even in this case, the effect of not reducing the recognition rate for the target vocabulary, which is a feature of the present invention, is maintained.
  • the unregistered word storage unit 106 may contain words that are also included in the speech recognition vocabulary storage unit 102; in that case, when the unregistered word candidate search unit 105 retrieves such a word, it may be excluded before output to the result display unit 107. By doing so, the vocabulary of the unregistered word storage unit 106 can be determined regardless of the contents of the speech recognition vocabulary storage unit 102, which has the effect of making maintenance of the unregistered word storage unit 106 easy.
  • for sentence utterances, the unregistered word determination unit 104 must determine whether an unregistered word is included in the utterance and, if included, at which position it occurs; the other operations are exactly the same.
  • in the above description, the unregistered word candidate search unit 105 outputs five unregistered word candidates, but it is effective to change this number according to the unregistered word search accuracy, and the number of output candidates may be made variable by the unregistered word candidate search unit 105 according to the similarity of each unregistered word. Depending on the search accuracy of the unregistered word candidate search unit 105, or on the unregistered word score of a retrieved word, the number of output unregistered word candidates may even be one. With this configuration, the user is spared unnecessary effort when checking whether the spoken word is present in the candidate list.
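The variable candidate count can be sketched as a score threshold plus a dominance check that collapses the list to a single candidate when the top score is far ahead. The threshold values here are hypothetical:

```python
def select_candidates(scored, min_score=0.6, max_candidates=5):
    # scored: list of (word, unregistered_word_score) pairs.
    # Keep only candidates whose score clears a threshold, up to a
    # maximum; if the top candidate is far ahead of the runner-up,
    # show it alone to spare the user a needless scan of the list.
    kept = [(w, s) for w, s in
            sorted(scored, key=lambda x: x[1], reverse=True)
            if s >= min_score][:max_candidates]
    if len(kept) >= 2 and kept[0][1] - kept[1][1] > 0.2:
        return kept[:1]
    return kept
```

The thresholds (0.6 minimum score, 0.2 dominance margin) would in practice be tuned to the measured search accuracy of the candidate search unit.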
  • the output example shown in FIG. 9 displays all unregistered word candidates in the same manner, but the result display unit 107 can also emphasize the candidate most likely to be the user's utterance by changing the font size according to the unregistered word score of each candidate, changing the font to bold, or changing the color. This has the effect of reducing the load on the user when searching the list for the spoken word.
  • FIG. 12 is a block diagram showing a functional configuration of the speech recognition apparatus according to the second embodiment.
  • the speech recognition apparatus 200 is the same as the speech recognition apparatus 100 according to the first embodiment in that it includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105, and a result display unit 107.
  • however, the speech recognition apparatus 200 according to the second embodiment differs from the speech recognition apparatus 100 according to the first embodiment in that it includes an unregistered word class determination unit 201 and an unregistered word class-specific word storage unit 202. Hereinafter, this difference will be mainly described. The same reference numerals are assigned to the same parts as in the first embodiment, and their description is omitted.
  • the unregistered word class determination unit 201 is a processing unit that determines to which category an unregistered word belongs, based on the content of the user's utterance and the usage status of the system.
  • the unregistered word class-specific word storage unit 202 is a storage device such as a hard disk that stores unregistered word words classified into categories.
  • the operation up to the point where the unregistered word determination unit 104 performs unregistered word determination, based on the reference similarity obtained by the reference similarity calculation unit 103, is the same as that shown in the first embodiment.
  • when the utterance is judged to be an unregistered word, the unregistered word class determination unit 201 determines to which category the unregistered word belongs.
  • the unregistered word categories refer to, as shown in FIG. 13, proper personal names such as celebrity names, proper title names such as program titles, and proper place names such as travel destinations. The method for determining the unregistered word category in the unregistered word class determination unit 201 will be described later.
  • next, the unregistered word candidate search unit 105 searches for the unregistered word. At this time, the unregistered word candidate search unit 105 narrows the search range of the unregistered word class-specific word storage unit 202 based on the class determination result from the unregistered word class determination unit 201, and then performs the search. When the speech recognition apparatus 200 has acquired unregistered word candidates in this manner, they are presented to the user via the result display unit 107, as in the first embodiment.
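The class-narrowed search can be sketched as follows. The class names and the trivial similarity function used in the example are hypothetical; any scorer, such as the edit-distance score of the first embodiment, can be plugged in:

```python
def search_by_class(subword_seq, lexicon_by_class, word_class,
                    score_fn, top_n=5):
    # lexicon_by_class: dict mapping a class name (e.g. "person_name",
    # "program_title", "place_name") to its word list.  Restricting
    # the search to the determined class shrinks the candidate set
    # before any similarity scoring is done.
    candidates = lexicon_by_class.get(word_class, [])
    scored = [(w, score_fn(subword_seq, w)) for w in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]
```

Because only one class's word list is scored, the per-query cost falls roughly in proportion to the size of the excluded classes.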
  • the unregistered word category can be determined from the information before and after the unregistered word in the recognized sentence. For example, if the user's utterance is "I want to see the program in which ○○ appears," then "○○" is presumed to be an unregistered word of the proper personal name class, while for the utterance "Record ○○," "○○" is regarded as an unregistered word of the program title class. As a model for estimating the word class at the target location from the surrounding context in the sentence, a class N-gram language model including an unregistered word class can be used.
  • FIG. 14 shows the functional configuration of the unregistered word class determination unit when a class N-gram language model including an unregistered word class is used.
  • when a class N-gram language model is used, the unregistered word class determination unit 201a includes a word string hypothesis generation unit 211, a class N-gram generation/accumulation unit 221, and a class-dependent word N-gram generation/accumulation unit 231.
  • the word string hypothesis generation unit 211 generates word string hypotheses from the word matching result, referring to the class N-gram, which evaluates sequences of words and unregistered word classes, and the class-dependent word N-gram, which evaluates the word strings that form an unregistered word class, and acquires a recognition result.
  • the class N-gram generation/accumulation unit 221 generates a class N-gram for assigning a language likelihood, the logarithm of a linguistic probability, to a context including an unregistered word class, and accumulates the generated class N-gram.
  • the class-dependent word N-gram generation/accumulation unit 231 generates a class-dependent word N-gram for assigning a language likelihood, the logarithm of a linguistic probability, to a word sequence within an unregistered word class, and accumulates the generated class-dependent word N-gram.
  • FIG. 15 shows a functional configuration of the class N-gram generation/storage unit 221.
  • the class N-gram generation/storage unit 221 includes a sentence expression corpus storage unit 222, in which a large number of sentence expressions to be recognized are stored in advance as text, and a sentence expression morpheme analysis unit 223, which performs morphological analysis of those sentence expressions.
  • the sentence expression corpus storage unit 222 accumulates in advance a large data library of sentence expressions to be recognized.
  • the sentence expression morpheme analysis unit 223 analyzes relatively long sentence expressions stored in the sentence expression corpus storage unit 222, such as "Record the weather forecast for tomorrow," into morphemes, the language units that carry meaning.
  • the class N-gram generation unit 224 extracts the word strings contained in the text parsed into morphemes, refers to the unregistered word class definitions input from the class-dependent word N-gram generation/storage unit 231 described later, and, if a corresponding unregistered word class exists, replaces the unregistered word class contained in the text with a virtual word; it then obtains the statistics of chains of words or unregistered word classes and generates a class N-gram that associates those chains with their probabilities. The class N-gram generated by the class N-gram generation unit 224 is stored in the class N-gram storage unit 225.
  • the class-dependent word N-gram generation/storage unit 231 includes a class corpus storage unit 232, a class morpheme analysis unit 233, a class-dependent word N-gram generation unit 234, a class-dependent word N-gram storage unit 235, an unregistered word class definition generation unit 236, and an unregistered word class definition storage unit 237.
  • the class corpus storage unit 232 stores in advance a data library of unregistered words having the same semantic and syntactic properties (for example, TV program titles or personal names).
  • the class morphological analyzer 233 performs morphological analysis on the class corpus.
  • for example, the class morpheme analysis unit 233 analyzes, in morpheme units, relatively short unregistered words with common properties, such as TV program names like "MMM weather forecast," stored in the class corpus storage unit 232.
  • the class-dependent word N-gram generation unit 234 processes the morphological analysis results, obtains the statistics of word chains, and generates the class-dependent word N-gram, which is information associating word strings with their probabilities.
  • the class-dependent word N-gram storage unit 235 stores the class-dependent word N-gram generated by the class-dependent word N-gram generation unit 234.
  • the class-dependent word N-gram stored in the class-dependent word N-gram storage unit 235 is referred to by the word string hypothesis generation unit 211 during speech recognition.
  • the unregistered word class definition generation unit 236 generates, from the morphological analysis results of the class corpus, a definition of an unregistered word class in which unregistered words having common characteristics are defined as a class. In other words, morphological analysis is performed on unregistered words with common characteristics, and a class definition is generated in which each obtained word string is a word string of the unregistered word class.
  • the unregistered word class definition storage unit 237 stores the unregistered word class definitions generated by the unregistered word class definition generation unit 236. These unregistered word class definitions are referenced by the class N-gram generation unit 224 of the class N-gram generation/storage unit 221 when generating the above class N-gram.
  • here, P(Ci | Ci-n+1, ..., Ci-1) means the probability that an n-chain of word classes occurs.
  • P(wj | Cj) means the probability that a specific word wj occurs from the class Cj.
  • a class means a grouping that takes the connectivity of words into account, such as the part of speech of a word or a unit that further subdivides it.
  • the probability of occurrence of the utterance "Record ○○" is then obtained, if "○○" is regarded as a word of the proper program-name class, as the product of the class chain probabilities and the word occurrence probabilities within each class.
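Under these definitions, the probability of a word sequence factors into class-chain probabilities and within-class word probabilities. The bigram-case factorization can be sketched as below; the toy model in the example, including its classes and probability values, is entirely hypothetical:

```python
import math

def class_bigram_logprob(words, word_class, class_bigram, word_given_class):
    # log P(w_1..w_n) = sum_i log P(C_i | C_{i-1}) + log P(w_i | C_i),
    # where each word is mapped to its class.  An unregistered word
    # class is treated as one virtual word in the class chain, as in
    # the class N-gram generation described above.  "<s>" marks the
    # sentence start.
    logp = 0.0
    prev = "<s>"
    for w in words:
        c = word_class[w]
        logp += math.log(class_bigram[(prev, c)])     # class chain term
        logp += math.log(word_given_class[(w, c)])    # word-in-class term
        prev = c
    return logp
```

For "Record ○○," the unregistered word "○○" would carry the program-title class in the chain, while its internal word string is scored separately by the class-dependent word N-gram.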
  • above, an example of a determination method using a class N-gram language model was shown as the method of determining the unregistered word category in the unregistered word class determination unit.
  • besides this, a method using the context information of the dialogue is also possible.
  • the dialog management unit of the voice dialog system generates estimated information on word categories that the user is likely to utter from the dialog history information, and transmits this to the unregistered word class determination unit.
  • the unregistered word class determination unit then determines the category of the unregistered word from the transmitted estimated information regarding the word category.
  • FIG. 19 shows a block diagram of the unregistered word class determination unit in such a configuration.
  • the unregistered word class determination unit 201b includes a word category information receiving unit 241, which acquires the category of the spoken word, and an unregistered word class determining unit 242, which determines the category of a word judged to be an unregistered word, based on the category acquired by the word category information receiving unit 241.
  • the effect of adopting such a configuration is that, whereas determining the unregistered word category with a class N-gram language model requires the expected input to be a sentence utterance, with estimation results from an application such as the dialog management unit the category can be determined even when the input speech is a word utterance.
  • FIG. 20 is a block diagram showing a functional configuration of the speech recognition apparatus according to the third embodiment.
  • the speech recognition apparatus 300 includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105, and a result display unit 107, as does the speech recognition apparatus 100 according to the first embodiment.
  • however, the speech recognition apparatus 300 according to the third embodiment differs from the speech recognition apparatus 100 according to the first embodiment in that it includes an unregistered word storage unit 301 connected to an unregistered word server 303 via a network 302. Hereinafter, this difference will be mainly described. The same reference numerals are assigned to the same parts as in the first embodiment, and their description is omitted.
  • the unregistered word storage unit 301 has the function of storing a large number of unregistered words to be searched by the unregistered word candidate search unit 105, and at the same time of updating the stored information via communication means.
  • the network 302 is a communication network such as the Internet or a telephone line.
  • the unregistered word server 303 is a server device that stores the necessary up-to-date unregistered words and provides this information to clients (in this case, the speech recognition apparatus 300).
  • the output flow of the speech recognition apparatus for the user's utterance in the third embodiment is the same as that shown in the first embodiment.
  • the difference in the third embodiment lies in the maintenance method of the unregistered word storage unit 301 referred to by the unregistered word candidate search unit 105.
  • since the unregistered word storage unit 301 can be updated arbitrarily, words that change and increase daily, such as proper personal names and proper program titles, can be kept current; unless they are, the user's spoken word cannot be found when searching for unregistered word candidates. For example, when television programming is reorganized or a new professional sports season begins, new titles, new entertainers, and new athletes' names appear, and these become unregistered words.
  • the update operation of the words stored in the unregistered word storage unit 301 is specifically performed as follows.
  • a date on which unregistered words are expected to increase rapidly is registered in advance in the unregistered word storage unit 301; when this date arrives, an unregistered word update request is automatically transmitted to the unregistered word server 303 via the network 302, such as a telephone line or the Internet. Alternatively, the unregistered word storage unit 301 may always issue update requests according to a predetermined schedule, or a user who feels that the registered unregistered words are insufficient may trigger an update request, which is transmitted from the unregistered word storage unit 301 to the unregistered word server 303.
  • instead of the unregistered word storage unit 301 always actively transmitting update requests, the unregistered word server 303 may detect that a certain amount of unregistered words has been added and transmit update information to the unregistered word storage unit 301 of each client.
  • the unregistered word server 303, having received an update request, or having determined that an update is necessary because new unregistered words have reached a specified amount, returns the information about the added words to the unregistered word storage unit 301 of the client.
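The two update styles described above, a client pull on pre-registered dates and a server push once additions reach a specified amount, can be sketched as follows. The class names, the push threshold, and the date strings are illustrative only:

```python
class UnregisteredWordServer:
    PUSH_THRESHOLD = 2  # specified amount; kept small for illustration

    def __init__(self):
        self.new_words = []

    def add_word(self, w):
        self.new_words.append(w)

    def take_added_words(self):
        # Hand over the accumulated additions and clear the buffer.
        batch, self.new_words = self.new_words, []
        return batch

    def maybe_push(self, clients):
        # Server-initiated update: once enough new words accumulate,
        # send them to the unregistered word storage of every client.
        if len(self.new_words) >= self.PUSH_THRESHOLD:
            batch = self.take_added_words()
            for c in clients:
                c.receive_update(batch)


class UnregisteredWordStore:
    def __init__(self, words, update_dates):
        self.words = set(words)
        self.update_dates = set(update_dates)  # dates registered in advance

    def receive_update(self, batch):
        self.words |= set(batch)

    def maybe_request_update(self, server, today):
        # Client-initiated update: on a pre-registered date (e.g. when
        # programming is reorganized), pull the newly added words.
        if today in self.update_dates:
            self.receive_update(server.take_added_words())
```

A real implementation would of course exchange these requests over the network 302 rather than by direct method calls.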
  • in the third embodiment, the description has been devoted to the operation in which the unregistered word storage unit 301 updates its stored words in accordance with the unregistered word server 303 so that they are optimally maintained; it is also possible, however, to select the words to be updated using additional information.
  • for example, using an EPG (Electronic Program Guide), genres that the user is unlikely to utter (for example, genre information such as professional baseball player names, foreign movie actor names, or Japanese movie titles) can be extracted in advance, and unregistered words of these extracted genres need not be acquired from the unregistered word server 303. This has the effect of preventing the unregistered word storage unit 301 from becoming unnecessarily large.
  • as another modification, updating the words stored in the speech recognition vocabulary storage unit 102 is also conceivable.
  • in this modification, a server provided outside the figure selects words that the user is likely to utter in the near future, and the contents of the speech recognition vocabulary storage unit 102 are updated with the selected words.
  • as such words, for example, when the speech recognition apparatus 300 is applied to a recording reservation system, performer names and program titles recorded in the EPG described above that relate to programs scheduled to be broadcast within one week can suitably be used.
  • the server generates the information used by the speech recognition unit 101 to recognize the extracted words, and updates the contents of the speech recognition vocabulary storage unit 102 with the generated information.
  • such an update operation can be performed in exactly the same manner as the operation of updating the contents of the unregistered word storage unit 301 from the unregistered word server 303 via the network 302.
  • for example, the information for recognizing words related to programs whose broadcast date has passed may be deleted each day, and information for recognizing words related to programs scheduled to be broadcast one week ahead may be added.
  • in this way, the recognition vocabulary is kept limited to words whose frequency of use is expected to be high. Since only this relatively small amount of recognition information need be stored in the speech recognition vocabulary storage unit 102, the recognition time can easily be shortened and a good recognition rate obtained.
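This daily rolling-window update can be sketched as follows. The EPG is modeled as a simple mapping from broadcast date to word list, which is an assumption for illustration:

```python
from datetime import date, timedelta

def update_vocabulary(vocab, epg, today):
    # vocab: dict mapping a recognition word (performer name or program
    # title) to its broadcast date.  epg: dict mapping a broadcast date
    # to the words for programs airing that day.
    # 1) Drop words whose broadcast date has already passed.
    for w in [w for w, d in vocab.items() if d < today]:
        del vocab[w]
    # 2) Add the words for programs airing exactly one week ahead,
    #    so the vocabulary always covers a rolling one-week window.
    week_ahead = today + timedelta(days=7)
    for w in epg.get(week_ahead, []):
        vocab[w] = week_ahead
    return vocab
```

Run once per day, this keeps the speech recognition vocabulary storage small and current, which is the effect the description claims.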
  • FIG. 21 is a block diagram showing a functional configuration of the speech recognition apparatus according to Embodiment 4 of the present invention.
  • the speech recognition apparatus 400 includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, and a result display unit 107.
  • however, the speech recognition apparatus 400 according to the fourth embodiment differs from the speech recognition apparatuses according to the first to third embodiments in that it includes an unregistered word search request transmission/reception unit 401 connected to an unregistered word search server 403 via a network 402. Hereinafter, this difference will be mainly described. The same reference numerals are assigned to the same parts as in the first embodiment, and their description is omitted.
  • the unregistered word search request transmission/reception unit 401 is a processing unit that transmits an unregistered word search request to the unregistered word search server 403 via the network 402 and receives the search result from the unregistered word search server 403; it is realized by a communication interface or the like.
  • when a search for an unregistered word is necessary, the unregistered word search request transmission/reception unit 401 transmits the subword sequence of the input speech obtained by the reference similarity calculation unit 103, as described in the first embodiment.
  • the network 402 is a communication network such as the Internet or a telephone line.
  • the unregistered word search server 403 is a server device that searches for unregistered words in response to requests from the client (the speech recognition apparatus 400), and includes an unregistered word search unit 404 and an unregistered word storage unit 405.
  • the unregistered word search unit 404 is a processing unit that performs the unregistered word search; it also has a communication function for receiving information about unregistered words from the client via the network 402 and returning the search result via the network 402.
  • the unregistered word storage unit 405 is a storage device such as a hard disk that stores information related to unregistered words.
  • the output flow of the speech recognition apparatus for the user's utterance is the same as that shown in the first embodiment.
  • the difference in the fourth embodiment is that the unregistered word candidate search unit 105 of the first embodiment is not provided internally; the search for unregistered word candidates is delegated to an external server.
  • that is, the subword sequence of the unregistered word portion obtained by the reference similarity calculation unit 103 is transmitted by the unregistered word search request transmission/reception unit 401 to the unregistered word search server 403.
  • the unregistered word search unit 404, having received the subword sequence of the unregistered word portion from the client, searches the word group stored in the unregistered word storage unit 405 for the unregistered word uttered by the user.
  • for this search, the method described with reference to FIG. 10 in the first embodiment is effective.
  • the search results obtained in this way are returned to the unregistered word search request transmission / reception unit 401 via the network 402 as unregistered word candidates.
  • the unregistered word search request transmission/reception unit 401 passes the returned unregistered word search result to the result display unit 107, which presents to the user the fact that the spoken word was an unregistered word.
  • since the server side can generally be given a relatively large hardware configuration, search algorithms that would be difficult to implement on a client such as a mobile terminal can be implemented there, which can improve the search accuracy for unregistered words.
  • the present invention can be used for various electronic devices that employ speech recognition technology as an input means, such as AV devices such as televisions and video recorders, in-vehicle devices such as car navigation systems, and portable terminals such as PDAs and mobile phones; its industrial applicability is therefore very wide.


Abstract

This speech recognition device can show a user whether the word uttered by the user is a word not registered in a speech recognition dictionary, and whether the word should be re-uttered because of a recognition error. The speech recognition device comprises: a speech recognition vocabulary storage unit (102) for defining a speech recognition vocabulary and storing it as registered words; a speech recognition unit (101) for matching the uttered speech against a registered word; an unregistered word determination unit (104) for judging whether the uttered speech is a registered word or an unregistered word based on the matching result of the speech recognition unit (101); an unregistered word storage unit (106) for storing unregistered words; an unregistered word candidate search unit (105), used when the speech is judged to be an unregistered word by the unregistered word determination unit (104), for searching the unregistered word storage unit (106) for an unregistered word candidate considered to correspond to the uttered speech; and a result display unit (107) for displaying the search result.
PCT/JP2005/010183 2004-06-10 2005-06-02 Dispositif de reconnaissance vocale, méthode et programme de reconnaissance vocale WO2005122144A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2006514478A JP4705023B2 (ja) 2004-06-10 2005-06-02 音声認識装置、音声認識方法、及びプログラム
US11/628,887 US7813928B2 (en) 2004-06-10 2005-06-02 Speech recognition device, speech recognition method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2004173147 2004-06-10
JP2004-173147 2004-06-10

Publications (1)

Publication Number Publication Date
WO2005122144A1 true WO2005122144A1 (fr) 2005-12-22

Family

ID=35503310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2005/010183 WO2005122144A1 (fr) 2004-06-10 2005-06-02 Speech recognition device, speech recognition method, and program

Country Status (3)

Country Link
US (1) US7813928B2 (fr)
JP (1) JP4705023B2 (fr)
WO (1) WO2005122144A1 (fr)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070132834A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Speech disambiguation in a composite services enablement environment
WO2007138875A1 (fr) * 2006-05-31 2007-12-06 Nec Corporation Speech recognition word dictionary/language model creation system, method, and program, and speech recognition system
US20080154600A1 (en) * 2006-12-21 2008-06-26 Nokia Corporation System, Method, Apparatus and Computer Program Product for Providing Dynamic Vocabulary Prediction for Speech Recognition
KR100897554B1 (ko) * 2007-02-21 2009-05-15 Samsung Electronics Co., Ltd. Distributed speech recognition system and method, and terminal for distributed speech recognition
WO2008126347A1 (fr) * 2007-03-16 2008-10-23 Panasonic Corporation Voice analysis device, voice analysis method, voice analysis program, and system integration circuit
US8756527B2 (en) * 2008-01-18 2014-06-17 Rpx Corporation Method, apparatus and computer program product for providing a word input mechanism
JP5024154B2 (ja) * 2008-03-27 2012-09-12 Fujitsu Limited Association device, association method, and computer program
KR101427686B1 (ko) 2008-06-09 2014-08-12 Samsung Electronics Co., Ltd. Program selection method and device therefor
JP2010154397A (ja) * 2008-12-26 2010-07-08 Sony Corp Data processing device, data processing method, and program
JP5692493B2 (ja) * 2009-02-05 2015-04-01 Seiko Epson Corporation Hidden Markov model creation program, information storage medium, hidden Markov model creation system, speech recognition system, and speech recognition method
US9659559B2 (en) * 2009-06-25 2017-05-23 Adacel Systems, Inc. Phonetic distance measurement system and related methods
KR20110006004A (ko) * 2009-07-13 2011-01-20 Samsung Electronics Co., Ltd. Apparatus and method for optimizing combined recognition units
US20150279354A1 (en) * 2010-05-19 2015-10-01 Google Inc. Personalization and Latency Reduction for Voice-Activated Commands
US8522283B2 (en) 2010-05-20 2013-08-27 Google Inc. Television remote control data transfer
JP5739718B2 (ja) * 2011-04-19 2015-06-24 Honda Motor Co., Ltd. Dialogue device
JP5642037B2 (ja) * 2011-09-22 2014-12-17 Toshiba Corporation Search device, search method, and program
JP5853653B2 (ja) * 2011-12-01 2016-02-09 Sony Corporation Server device, information terminal, and program
JP5675722B2 (ja) * 2012-07-23 2015-02-25 Toshiba Tec Corporation Recognition dictionary processing device and recognition dictionary processing program
US9311914B2 (en) * 2012-09-03 2016-04-12 Nice-Systems Ltd Method and apparatus for enhanced phonetic indexing and search
JP6221301B2 (ja) * 2013-03-28 2017-11-01 Fujitsu Limited Speech processing device, speech processing system, and speech processing method
US10170114B2 (en) 2013-05-30 2019-01-01 Promptu Systems Corporation Systems and methods for adaptive proper name entity recognition and understanding
JP6100101B2 (ja) * 2013-06-04 2017-03-22 Alpine Electronics, Inc. Candidate selection device and candidate selection method using speech recognition
US9384731B2 (en) * 2013-11-06 2016-07-05 Microsoft Technology Licensing, Llc Detecting speech input phrase confusion risk
US9653071B2 (en) * 2014-02-08 2017-05-16 Honda Motor Co., Ltd. Method and system for the correction-centric detection of critical speech recognition errors in spoken short messages
JP5921601B2 (ja) * 2014-05-08 2016-05-24 Nippon Telegraph and Telephone Corporation Speech recognition dictionary update device, speech recognition dictionary update method, and program
CN107112007B (zh) * 2014-12-24 2020-08-07 Mitsubishi Electric Corporation Speech recognition device and speech recognition method
US9392324B1 (en) 2015-03-30 2016-07-12 Rovi Guides, Inc. Systems and methods for identifying and storing a portion of a media asset
JP6744025B2 (ja) * 2016-06-21 2020-08-19 NEC Corporation Work support system, management server, mobile terminal, work support method, and program
US9984688B2 (en) * 2016-09-28 2018-05-29 Visteon Global Technologies, Inc. Dynamically adjusting a voice recognition system
WO2018173295A1 (fr) * 2017-03-24 2018-09-27 Yamaha Corporation User interface device, user interface method, and sound use system
CN107103903B (zh) * 2017-05-05 2020-05-29 Baidu Online Network Technology (Beijing) Co., Ltd. Artificial-intelligence-based acoustic model training method, device, and storage medium
CN107240395B (zh) * 2017-06-16 2020-04-28 Baidu Online Network Technology (Beijing) Co., Ltd. Acoustic model training method and device, computer equipment, and storage medium
CN107293296B (zh) * 2017-06-28 2020-11-20 Baidu Online Network Technology (Beijing) Co., Ltd. Speech recognition result correction method, device, equipment, and storage medium
US20190147855A1 (en) * 2017-11-13 2019-05-16 GM Global Technology Operations LLC Neural network for use in speech recognition arbitration
KR102455067B1 (ko) * 2017-11-24 2022-10-17 Samsung Electronics Co., Ltd. Electronic device and control method thereof
CN109325227A (zh) * 2018-09-14 2019-02-12 Beijing ByteDance Network Technology Co., Ltd. Method and device for generating corrected sentences
US11024310B2 (en) 2018-12-31 2021-06-01 Sling Media Pvt. Ltd. Voice control for media content search and selection
KR102738700B1 (ko) * 2019-09-18 2024-12-06 Samsung Electronics Co., Ltd. Electronic device and voice recognition control method thereof
WO2022198474A1 (fr) * 2021-03-24 2022-09-29 Sas Institute Inc. Speech analysis framework with support for large n-gram corpora
US11875780B2 (en) * 2021-02-16 2024-01-16 Vocollect, Inc. Voice recognition performance constellation graph

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5127055A (en) * 1988-12-30 1992-06-30 Kurzweil Applied Intelligence, Inc. Speech recognition apparatus & method having dynamic reference pattern adaptation
JP2808906B2 (ja) * 1991-02-07 1998-10-08 NEC Corporation Speech recognition device
JPH06282293A (ja) 1993-03-29 1994-10-07 Sony Corp Speech recognition device
JP3468572B2 (ja) 1994-03-22 2003-11-17 Mitsubishi Electric Corporation Dialogue processing device
JP3459712B2 (ja) * 1995-11-01 2003-10-27 Canon Inc. Speech recognition method and device, and computer control device
JPH09230889A (ja) 1996-02-23 1997-09-05 Hitachi Ltd Speech recognition response device
US6195641B1 (en) * 1998-03-27 2001-02-27 International Business Machines Corp. Network universal spoken language vocabulary
EP1088299A2 (fr) * 1999-03-26 2001-04-04 Scansoft, Inc. Client-server speech recognition
JP3976959B2 (ja) * 1999-09-24 2007-09-19 Mitsubishi Electric Corporation Speech recognition device, speech recognition method, and speech recognition program recording medium
JP4543294B2 (ja) * 2000-03-14 2010-09-15 Sony Corporation Speech recognition device, speech recognition method, and recording medium
JP4072718B2 (ja) * 2002-11-21 2008-04-09 Sony Corporation Speech processing device and method, recording medium, and program

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0981181A (ja) * 1995-09-11 1997-03-28 Atr Onsei Honyaku Tsushin Kenkyusho:Kk Speech recognition device
JP2000259645A (ja) * 1999-03-05 2000-09-22 Fuji Xerox Co Ltd Speech processing device and speech data retrieval device
JP2002540479A (ja) * 1999-03-26 2002-11-26 Koninklijke Philips Electronics N.V. Client-server speech recognition
JP2001236089A (ja) * 1999-12-17 2001-08-31 Atr Interpreting Telecommunications Res Lab Statistical language model generation device, speech recognition device, information retrieval processing device, and kana-kanji conversion device
JP2002215670A (ja) * 2001-01-15 2002-08-02 Omron Corp Voice response device, voice response method, voice response program, recording medium recording the voice response program, and reservation system
JP2002297179A (ja) * 2001-03-29 2002-10-11 Fujitsu Ltd Automatic response dialogue system
JP2002358095A (ja) * 2001-03-30 2002-12-13 Sony Corp Speech processing device, speech processing method, program, and recording medium
JP2003044091A (ja) * 2001-07-31 2003-02-14 Ntt Docomo Inc Speech recognition system, portable information terminal, speech information processing device, speech information processing method, and speech information processing program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAKAMOTO H. ET AL: "Detection of Unregistered-Words Using Phoneme Cluster Models", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J80-D-II, no. 9, 25 September 1997 (1997-09-25), pages 2261 - 2269, XP002996904 *
TANGAKI H. ET AL: "Hierarchical Language Model Incorporating Probabilistic Description of Vocabulary in Classes", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J84-D-II, no. 11, 1 November 2001 (2001-11-01), pages 2371 - 2378, XP002996905 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070185713A1 (en) * 2006-02-09 2007-08-09 Samsung Electronics Co., Ltd. Recognition confidence measuring by lexical distance between candidates
US8990086B2 (en) * 2006-02-09 2015-03-24 Samsung Electronics Co., Ltd. Recognition confidence measuring by lexical distance between candidates
WO2007097390A1 (fr) * 2006-02-23 2007-08-30 Nec Corporation Speech recognition system, speech recognition result output method, and speech recognition result output program
US8756058B2 (en) 2006-02-23 2014-06-17 Nec Corporation Speech recognition system, speech recognition result output method, and speech recognition result output program
US8688451B2 (en) * 2006-05-11 2014-04-01 General Motors Llc Distinguishing out-of-vocabulary speech from in-vocabulary speech
US7881928B2 (en) * 2006-09-01 2011-02-01 International Business Machines Corporation Enhanced linguistic transformation
DE112007002665B4 (de) * 2006-12-15 2017-12-28 Mitsubishi Electric Corp. Speech recognition system
KR101300839B1 (ko) 2007-12-18 2013-09-10 Samsung Electronics Co., Ltd. Voice search query expansion method and system
JP2010014885A (ja) * 2008-07-02 2010-01-21 Advanced Telecommunication Research Institute International Information processing terminal with speech recognition function
CN113869281A (zh) * 2018-07-19 2021-12-31 Beijing Moviebook Science and Technology Co., Ltd. Person recognition method, device, equipment, and medium

Also Published As

Publication number Publication date
US20080167872A1 (en) 2008-07-10
US7813928B2 (en) 2010-10-12
JPWO2005122144A1 (ja) 2008-04-10
JP4705023B2 (ja) 2011-06-22

Similar Documents

Publication Publication Date Title
JP4705023B2 (ja) Speech recognition device, speech recognition method, and program
US7421387B2 (en) Dynamic N-best algorithm to reduce recognition errors
US9336769B2 (en) Relative semantic confidence measure for error detection in ASR
US8401840B2 (en) Automatic spoken language identification based on phoneme sequence patterns
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
US8612212B2 (en) Method and system for automatically detecting morphemes in a task classification system using lattices
US8719021B2 (en) Speech recognition dictionary compilation assisting system, speech recognition dictionary compilation assisting method and speech recognition dictionary compilation assisting program
US7124080B2 (en) Method and apparatus for adapting a class entity dictionary used with language models
US7620548B2 (en) Method and system for automatic detecting morphemes in a task classification system using lattices
JP5440177B2 (ja) Word category estimation device, word category estimation method, speech recognition device, speech recognition method, program, and recording medium
US8577679B2 (en) Symbol insertion apparatus and symbol insertion method
US20050256715A1 (en) Language model generation and accumulation device, speech recognition device, language model creation method, and speech recognition method
US20060009965A1 (en) Method and apparatus for distribution-based language model adaptation
JP6323947B2 (ja) Acoustic event recognition device and program
US7912707B2 (en) Adapting a language model to accommodate inputs not found in a directory assistance listing
US20050187767A1 (en) Dynamic N-best algorithm to reduce speech recognition errors
US7085720B1 (en) Method for task classification using morphemes
JP4764203B2 (ja) Speech recognition device and speech recognition program
JP5243325B2 (ja) Terminal, method, and program using a kana-kanji conversion system for speech recognition
JP2000172294A (ja) Speech recognition method, device therefor, and program recording medium
JP5124012B2 (ja) Speech recognition device and speech recognition program
JP4986301B2 (ja) Content search device, program, and method using speech recognition processing function
WO2009147745A1 (fr) Retrieval device
JP5585111B2 (ja) Utterance content estimation device, language model creation device, and method and program used therefor
Zhang Making an effective use of speech data for acoustic modeling

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2006514478

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 11628887

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase