WO2005122144A1 - Speech recognition device, speech recognition method, and program - Google Patents

Speech recognition device, speech recognition method, and program

Publication number
WO2005122144A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
unregistered
unregistered word
speech recognition
search
Prior art date
Application number
PCT/JP2005/010183
Other languages
French (fr)
Japanese (ja)
Inventor
Yoshiyuki Okimoto
Tsuyoshi Inoue
Takashi Tsuzuki
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd.
Priority to JP2006514478A (patent JP4705023B2)
Priority to US11/628,887 (patent US7813928B2)
Publication of WO2005122144A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221 Announcement of recognition results

Definitions

  • Speech recognition apparatus, speech recognition method, and program
  • The present invention relates to a speech recognition device used for a man-machine interface based on speech recognition, and more particularly to a technique for responding to utterances of unregistered words.
  • Patent Document 1 describes a method in which the similarity between the input speech and each word in the speech recognition dictionary is obtained, a reference similarity is obtained by combining unit standard patterns, and the word similarity is corrected using the reference similarity. If the corrected similarity does not reach a certain threshold, the user's utterance is regarded as an unregistered word.
  • Patent Document 2 describes a method for detecting unregistered words with a small amount of processing and high accuracy using a phoneme HMM (Hidden Markov Model) and a garbage HMM.
  • Patent Document 3 describes a method in which, when an utterance of an unregistered word by the user is detected, a list of the words that the device can accept in the current situation is presented to the user. According to this method, even if the user does not know which words the device recognizes, each time he or she utters an unregistered word the user is shown the words that can be spoken in that situation, so the desired operation can be achieved without repeating such utterances.
  • Patent Document 4 describes a method in which an internal dictionary corresponding to a conventional speech recognition dictionary is combined with an external dictionary that stores a large number of words not registered in the conventional speech recognition dictionary, and the combination is used as the dictionary for speech recognition. When a word included in the external dictionary is obtained as the recognition result, the device also indicates that it is an unregistered word.
  • With this method, a response such as “Taro Matsushita does not exist” becomes possible.
  • Patent Document 1: Japanese Patent No. 2808906
  • Patent Document 2: Japanese Patent No. 2886117
  • Patent Document 3: Japanese Patent No. 3468572
  • Patent Document 4: Japanese Patent Laid-Open No. 9-230889
  • Non-Patent Document 1: Kiyohiro Shikano, Satoshi Nakamura, Shiro Ise, “Digital Signal Processing Series 5: Digital Signal Processing of Voice / Sound Information”, Shosodo, November 10, 1997, pp. 45, 53
  • The present invention has been made in view of such problems, and an object of the present invention is to provide a speech recognition device that can reduce the situations in which a user futilely repeats an utterance.
  • To achieve this object, the speech recognition device according to the present invention comprises, among other means, unregistered word candidate search means for searching the unregistered words stored in the unregistered word storage means for unregistered word candidates that seem to correspond to the spoken speech, and result display means for displaying the search result.
  • The speech recognition apparatus may include communication means for communicating with an unregistered word server that stores a group of unregistered words to be stored in the unregistered word storage means, and the communication means may receive the unregistered word group from the unregistered word server to update the unregistered words stored in the unregistered word storage means.
  • The present invention can be realized not only as such a speech recognition device, but also as a speech recognition method having the characteristic means of such a speech recognition device as steps, and as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
  • When the user utters an unregistered word, which causes speech recognition to fail, the device can tell the user that the utterance was an unregistered word and, at the same time, that the failure is not due to a recognition error.
  • The recognition rate for utterances of words in the speech recognition dictionary, which is the original purpose of the device, is not reduced.
  • The unregistered word storage means used to search for unregistered word candidates is very large and constantly requires maintenance. By separating this function out into a server, it is possible to reduce the manufacturing cost of the device and, at the same time, the maintenance cost of the unregistered word storage means.
  • FIG. 2 is a flowchart showing the operation of the speech recognition apparatus according to the first embodiment.
  • FIG. 3 is a diagram showing an output example of a speech recognition unit when a recognized vocabulary is uttered according to the first embodiment.
  • FIG. 4 is a diagram showing an output example of a reference similarity calculation unit when a recognized vocabulary is uttered according to Embodiment 1.
  • FIG. 5 is a diagram showing a result display example when a recognized vocabulary is uttered according to the first embodiment.
  • FIG. 6 is a diagram showing an output example of a speech recognition unit when an unregistered word is uttered according to the first embodiment.
  • FIG. 7 is a diagram showing an output example of a reference similarity calculation unit when an unregistered word is uttered according to the first embodiment.
  • FIG. 8 is a diagram showing an output example of an unregistered word candidate search unit according to the first embodiment.
  • FIG. 10 is a diagram showing a calculation method of similarity between phoneme sequences at the time of unregistered word search according to the first embodiment.
  • FIG. 11 is a block diagram showing a functional configuration of the unknown utterance detection apparatus.
  • FIG. 12 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 2 of the present invention.
  • FIG. 13 is a diagram showing an example of unregistered word categories according to the second embodiment.
  • FIG. 14 is a block diagram showing a functional configuration of an unregistered word class determination unit using a class N-gram language model.
  • FIG. 17 is a diagram showing an example of a class N-gram language model for unregistered word class determination according to the second embodiment.
  • FIG. 18 is a diagram showing a display example of a result when an unregistered word of a different class according to the second embodiment is uttered.
  • FIG. 19 is a diagram showing a configuration of an unregistered word class determination unit that acquires information for unregistered word class determination from an external application according to the second embodiment.
  • FIG. 20 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 3 of the present invention.
  • FIG. 21 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 4 of the present invention.
  • A speech recognition device according to the present invention is a speech recognition device that recognizes spoken speech, and comprises the following means:
  • speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words;
  • speech recognition means for collating the spoken speech against the registered words stored in the speech recognition word storage means;
  • unregistered word determining means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word that is not stored there;
  • unregistered word storage means for storing unregistered words;
  • unregistered word candidate search means for searching, when the unregistered word determining means determines that the spoken speech is an unregistered word, the unregistered words stored in the unregistered word storage means for unregistered word candidates that seem to correspond to the spoken speech; and
  • result display means for displaying the search result.
  • The unregistered word candidate search means may search for a plurality of unregistered word candidates among the unregistered words stored in the unregistered word storage means.
  • The unregistered word storage means preferably stores the unregistered words classified by the category to which each unregistered word belongs.
  • The speech recognition apparatus further includes unregistered word class determining means for determining, based on the spoken speech, the category to which the unregistered word belongs. More preferably, the unregistered word candidate search means searches for the unregistered word candidate within the categories of the unregistered word storage means selected on the basis of the determination result of the unregistered word class determining means.
  • The speech recognition apparatus may further include information acquisition means for acquiring information about the category, and the unregistered word candidate search means may search for the unregistered word candidate within the categories of the unregistered word storage means on the basis of the information acquired by the information acquisition means.
  • The unregistered word candidate search means searches for the unregistered word candidate by calculating an unregistered word score that quantifies the degree of similarity to the spoken speech. The result display means displays the unregistered word candidate and its unregistered word score as the search result, and preferably changes how the candidate is displayed according to its unregistered word score.
  • Because the likelihood of each unregistered word candidate is quantified, the most likely candidates can be emphasized, which has the effect of presenting the candidates to the user in an easily understood way.
  • the unregistered words stored in the unregistered word storage means may be updated under a predetermined condition.
  • The speech recognition apparatus may include communication means for communicating with an unregistered word server that stores a group of unregistered words to be stored in the unregistered word storage means, and the communication means may receive the unregistered word group from the unregistered word server to update the unregistered words stored in the unregistered word storage means.
  • the registered words may be updated under predetermined conditions.
  • The present invention can be realized not only as such a speech recognition device but also as a speech recognition system. That is, it is a speech recognition system for recognizing spoken speech, comprising a speech recognition device that recognizes spoken speech and an unregistered word search server that searches for unregistered words not registered in the speech recognition device.
  • The speech recognition device comprises: speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words; speech recognition means for collating the spoken speech against the registered words stored in the speech recognition word storage means; unregistered word determining means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word that is not stored there; search request transmitting means for requesting, when the unregistered word determining means determines that the spoken speech is an unregistered word, the unregistered word search server to search for unregistered word candidates that seem to correspond to the spoken speech; search result receiving means for acquiring the search result for the unregistered word candidates from the unregistered word search server; and result display means for displaying the search result.
  • The unregistered word search server may comprise: unregistered word storage means for storing the unregistered words; search request receiving means for receiving the search request from the search request transmitting means; search means for searching the stored unregistered words for the unregistered word candidates when the search request receiving means has received the search request; and search result transmitting means for transmitting the search result to the speech recognition device.
  • The present invention can be realized not only as such a speech recognition device, but also as a speech recognition method having the characteristic means of such a speech recognition device as steps, and as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
  • FIG. 1 is a block diagram showing a functional configuration of the speech recognition apparatus according to Embodiment 1 of the present invention.
  • The speech recognition apparatus 100 shown in FIG. 1 is used as one form of man-machine interface. It is a device that accepts speech input from the user and outputs the recognition result for the input speech. It includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105, an unregistered word storage unit 106, and a result display unit 107.
  • the speech recognition unit 101 is a processing unit that captures input speech and recognizes the utterance content.
  • The speech recognition vocabulary storage unit 102 is a storage device such as a hard disk that defines and stores the vocabulary recognized by the speech recognition unit 101.
  • The speech recognition vocabulary storage unit 102 stores a standard acoustic pattern of each word as a standard pattern, or a representation of the acoustic pattern of each word as a model such as an HMM (Hidden Markov Model) or a neural net. Alternatively, it stores standard patterns for shorter acoustic units, or models of those units such as HMMs or neural nets, and at recognition time a word pattern or word model is synthesized from them and provided to the speech recognition unit 101.
  • The reference similarity calculation unit 103 is a processing unit that calculates a reference similarity used to determine whether or not the input speech is an unregistered word. By arbitrarily combining patterns or models of acoustic units shorter than words, called subwords, the reference similarity calculation unit 103 searches for the subword sequence with the highest similarity to the input speech and obtains that maximum similarity.
  • The unregistered word determination unit 104 determines whether or not the user's utterance content is an unregistered word based on the results of both the speech recognition unit 101 and the reference similarity calculation unit 103. When the utterance content is a word stored in the speech recognition vocabulary storage unit 102, that is, a registered word, the unregistered word determination unit 104 outputs the recognition result to the result display unit 107. When the utterance content is a word not stored in the speech recognition vocabulary storage unit 102, that is, an unregistered word, it outputs the determination result that the utterance content is an unregistered word to the unregistered word candidate search unit 105.
  • The unregistered word candidate search unit 105 is a processing unit that searches for unregistered words based on the utterance content when the user's utterance content is determined to be an unregistered word.
  • The unregistered word storage unit 106 is a storage device such as a hard disk that stores the large number of words searched by the unregistered word candidate search unit 105.
  • Because the unregistered word candidate search unit 105 searches a very large vocabulary stored in the unregistered word storage unit 106, it is preferable, as described later, for it to use a method that is simpler and faster (that is, has a shorter calculation time) than the method used by the speech recognition unit 101.
  • The result display unit 107 is a display device such as a CRT display or a liquid crystal display. By displaying a screen that shows the recognition result or determination result output from the unregistered word determination unit 104 and the search result for unregistered words, it shows the user whether the utterance was recognized or was an unregistered word.
  • FIG. 2 is a flowchart showing the processing operation of the speech recognition apparatus 100.
  • When the speech recognition apparatus 100 receives speech uttered by the user (S10), the speech recognition unit 101 recognizes the words in the speech recognition vocabulary storage unit 102 that resemble the input speech (S12). More specifically, the speech recognition unit 101 compares the standard pattern or word model of each word stored in the speech recognition vocabulary storage unit 102 with the input speech, calculates the similarity to the input speech for each word, and extracts the words with high similarity as candidates. At this time, the reference similarity calculation unit 103 searches for the subword sequence closest to the input speech and obtains its similarity as the reference similarity (S14).
  • Next, the unregistered word determination unit 104 compares the similarity of the first candidate word (the candidate word with the highest similarity) obtained by the speech recognition unit 101 with the reference similarity obtained by the reference similarity calculation unit 103, and determines whether the result of the comparison is within a predetermined threshold (S16).
  • The predetermined threshold here is a threshold for determining whether the user's utterance content is a registered word or an unregistered word. Using a large number of sampled utterances, the similarities produced by the speech recognition unit 101 and the reference similarity calculation unit 103 are obtained, and the optimum threshold is determined from their statistical distributions.
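The text says only that the threshold is chosen from the statistical distributions of the two similarities. As a minimal sketch of one way this could be done, the code below picks the gap value that misclassifies the fewest labelled sample utterances. The selection criterion, the function name `choose_threshold`, and the sample gaps in the usage note are all assumptions for illustration and do not come from the patent.

```python
def choose_threshold(registered_gaps, unregistered_gaps):
    """Return the gap threshold that misclassifies the fewest samples.

    Each gap is |reference similarity - first-candidate similarity| for one
    sampled utterance. An utterance counts as registered when its gap is
    within the threshold. (Assumed criterion: minimum total error.)
    """
    candidates = sorted(set(registered_gaps) | set(unregistered_gaps))
    best_threshold, best_errors = None, None
    for t in candidates:
        # Registered utterances wrongly rejected: gap exceeds the threshold.
        false_rejects = sum(1 for g in registered_gaps if g > t)
        # Unregistered utterances wrongly accepted: gap within the threshold.
        false_accepts = sum(1 for g in unregistered_gaps if g <= t)
        errors = false_rejects + false_accepts
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold
```

For instance, with made-up sample gaps `[50, 120, 184, 190]` for registered words and `[400, 650, 794]` for unregistered words, the function returns `190`, the smallest tested value that separates the two groups.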
  • If the difference between the similarity of the first candidate word from the speech recognition unit 101 and the reference similarity from the reference similarity calculation unit 103 is within the statistically predetermined threshold (Yes in S16), the unregistered word determination unit 104 determines that the user's utterance content is a word included in the speech recognition vocabulary storage unit 102 (a registered word) (S18). Thereafter, the speech recognition apparatus 100 presents the recognition result to the user via the result display unit 107 (S26), and ends the processing operation.
  • If the difference between the similarity of the first candidate word from the speech recognition unit 101 and the reference similarity from the reference similarity calculation unit 103 exceeds the statistically predetermined threshold (No in S16), the unregistered word determination unit 104 determines that the user's utterance content is a word not included in the speech recognition vocabulary storage unit 102 (an unregistered word) (S20), and outputs the determination result to the unregistered word candidate search unit 105.
  • Next, the speech recognition apparatus 100 uses the unregistered word candidate search unit 105 to search for unregistered words based on the utterance content (S22).
  • The unregistered word candidate search unit 105 compares the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106 and obtains an unregistered word score, which is a score related to their similarity. In this way it searches for unregistered words with a high score, that is, unregistered words that seem to be the user's utterance content.
  • The unregistered word candidate search unit 105 extracts a plurality of unregistered word candidates that are considered to be the user's utterance content, for example in descending order of score (S24), and outputs them together with their unregistered word scores to the result display unit 107.
  • The speech recognition apparatus 100 presents the determination result, the extracted unregistered word candidates, and their unregistered word scores to the user via the result display unit 107 (S26), and ends the processing operation.
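The flow of steps S10 to S26 can be sketched as a single function. This is an illustrative outline under stated assumptions, not the patent's implementation: `recognize`, `reference_similarity`, and `search_unregistered` are hypothetical callables standing in for units 101, 103, and 105, the `Result` type is invented here, and the comparison in S16 is assumed to be an absolute difference.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Scored = Tuple[str, float]  # (word, similarity or unregistered word score)


@dataclass
class Result:
    is_registered: bool       # outcome of the S16 decision
    candidates: List[Scored]  # what the result display would show (S26)


def process_utterance(
    speech: object,
    recognize: Callable[[object], List[Scored]],
    reference_similarity: Callable[[object], Tuple[Sequence[str], float]],
    search_unregistered: Callable[[Sequence[str]], List[Scored]],
    threshold: float,
    top_n: int = 5,
) -> Result:
    # S12: candidates from the recognition vocabulary, best first.
    # Assumes the recognizer always returns at least one candidate.
    ranked = recognize(speech)
    # S14: best subword sequence and its similarity (the reference similarity).
    subwords, ref_sim = reference_similarity(speech)
    _, best_sim = ranked[0]
    # S16: compare the first candidate with the reference similarity.
    if abs(ref_sim - best_sim) <= threshold:
        return Result(True, ranked)  # S18: registered word
    # S20 to S24: unregistered word, so search the large store instead.
    return Result(False, search_unregistered(subwords)[:top_n])
```

Keeping the three units as injected callables mirrors the block diagram: the decision logic stays the same whether the unregistered word search runs locally or on a server.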
  • The speech recognition apparatus 100 defines the words to be recognized, that is, the speech recognition vocabulary, according to the application that uses the speech recognition apparatus 100 as an input device for its man-machine interface. For example, the names of programs to be searched and the names of performers that serve as search keys are defined as the speech recognition vocabulary.
  • The speech recognition apparatus 100 displays different results depending on whether or not the user's utterance content is a word included in the speech recognition vocabulary storage unit 102. When the utterance content is a word included in the speech recognition vocabulary storage unit 102, the speech recognition unit 101, as described above, collates the standard pattern or word model of each stored word with the input speech, calculates the similarity for each word, obtains the top candidates in descending order of similarity, and outputs them to the result display unit 107.
  • FIG. 3 shows an output example for a case in which the user utters “Matsushita Taro”, on the assumption that the word “Taro Matsushita” exists in the speech recognition vocabulary storage unit 102.
  • the reference similarity calculation unit 103 searches for a subword sequence closest to the input speech and obtains the similarity as the reference similarity.
  • FIG. 4 shows an output example of the reference similarity calculation unit 103 for the utterance “Matsushita Taro” by the user.
  • In this example, the difference between the similarity “2041” of the first candidate and the reference similarity “2225” is within the statistically determined threshold (for example, “200”). Therefore, the unregistered word determination unit 104 determines that the user's utterance content is a registered word. Since the utterance content is judged not to be an unregistered word, the unregistered word candidate search unit 105 does not perform an unregistered word search, and the recognition result is output as is to the result display unit 107. “Taro Matsushita” is correctly displayed as the recognition result via the result display unit 107.
  • FIG. 5 shows an example of the result display in the result display unit 107.
  • On the other hand, when the utterance content is not included in the speech recognition vocabulary, the speech recognition unit 101 still collates the input speech with each word stored in the speech recognition vocabulary storage unit 102, calculates the similarity for each word, and outputs the top candidates in descending order of similarity.
  • However, because the utterance content is a word that is not included in the speech recognition vocabulary, none of these candidates matches the utterance content, and the output looks like the example shown in FIG. 6.
  • Here, the user's utterance is “Matsushita Taro”, but the word “Taro Matsushita” is not included in the speech recognition vocabulary storage unit 102.
  • Meanwhile, the reference similarity calculation unit 103 searches for the subword sequence most similar to the input speech and calculates its similarity. This is not affected at all by whether or not the utterance content is included in the speech recognition vocabulary. As a result, the output of the reference similarity calculation unit 103, shown in FIG. 7, is the same as the output example (see FIG. 4) for the case in which the utterance content is included in the speech recognition vocabulary.
  • The unregistered word determination unit 104 compares the similarity of the first candidate from the speech recognition unit 101 with the reference similarity from the reference similarity calculation unit 103, as described above. When the utterance content is not included in the speech recognition vocabulary, the two similarities differ greatly and their difference exceeds the predetermined threshold. On this basis, the unregistered word determination unit 104 determines that the utterance content is an unregistered word. For example, in the example shown in FIG. 6 and FIG. 7, the similarity “1431” of the first candidate from the speech recognition unit 101 and the reference similarity “2225” differ greatly, and their difference exceeds the predetermined threshold. Therefore, the unregistered word determination unit 104 determines that the user's utterance content is an unregistered word.
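The two cases can be checked by hand with the figures quoted in the text. The arithmetic below uses the example threshold of 200 given for the registered case and assumes that the comparison is an absolute difference, which the text does not state explicitly.

```latex
\begin{align*}
\text{registered case (FIG. 3, FIG. 4):}\quad & |2225 - 2041| = 184 \le 200 \\
\text{unregistered case (FIG. 6, FIG. 7):}\quad & |2225 - 1431| = 794 > 200
\end{align*}
```

The reference similarity is 2225 in both cases because it depends only on the input speech, not on the recognition vocabulary.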
  • When the utterance content is determined to be an unregistered word, the unregistered word candidate search unit 105 compares the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106, and calculates an unregistered word score, which is a score related to their similarity.
  • The unregistered word candidate search unit 105 extracts the top five candidates in descending order of unregistered word score and outputs them to the result display unit 107 together with their unregistered word scores.
  • FIG. 8 shows an example of the result obtained when the unregistered word candidate search unit 105 searches for unregistered words based on the subword string “Matsushima Kanou” obtained by the reference similarity calculation unit 103 for the user's utterance “Matsushita Taro”.
  • “Taro Matsushita” is stored in the unregistered word storage unit 106.
  • the search result by the unregistered word candidate search unit 105 is sent to the result display unit 107 together with information that these words are unregistered words, and the user's utterance is recognized as an unregistered word. This is communicated to the user.
  • As a result, a display such as that shown in FIG. 9 is output. A user who sees the recognition result in the format illustrated in FIG. 9 can tell at a glance that the utterance content was unknown to the system.
  • Because multiple candidates are displayed, the unregistered word candidate search unit 105 is not required to have high search accuracy, which can also be a great merit, for example by keeping down the hardware resources needed to achieve the search accuracy. Even if the search accuracy is not very high, displaying multiple candidates makes it highly probable that the word the user spoke is among them. Since that word is shown as an unregistered word, the user learns that repeating the utterance would be futile.
  • The unregistered word candidate search unit 105 of the first embodiment uses a value based on the phoneme edit distance to search for unregistered word candidates.
  • The unregistered word candidate search unit 105 calculates the edit distance between the phoneme symbol string representing the subword sequence obtained by the reference similarity calculation unit 103 and the phoneme symbol string of a word stored in the unregistered word storage unit 106. The edit distance is normalized by length, and the normalized value is subtracted from 1 to obtain the unregistered word score.
  • The unregistered word candidate search unit 105 performs this process for all the words stored in the unregistered word storage unit 106, extracts unregistered word candidates in descending order of unregistered word score, and outputs the result to the result display unit 107.
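The score described above can be sketched as follows. This is a minimal illustration, not the patent's exact procedure: the text does not say which length is used for normalization, so dividing by the longer of the two phoneme strings is an assumption, and the function names and the dictionary layout of the word store are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unregistered_word_score(recognized, entry):
    """1 minus the length-normalized edit distance, so 1.0 means identical."""
    longest = max(len(recognized), len(entry))  # assumed normalizer
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(recognized, entry) / longest


def search_candidates(recognized, store, top_n=5):
    """Score every stored word and return the top_n (word, score) pairs."""
    scored = [(word, unregistered_word_score(recognized, phonemes))
              for word, phonemes in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

Here `store` maps each unregistered word to its phoneme symbol list. Normalizing by the longer string keeps the score between 0 and 1, because the edit distance can never exceed that length.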
  • the diagram shown in FIG. 8 is an example of the unregistered word candidate and the unregistered word score obtained in this way.
  • The advantage of implementing the unregistered word search by comparing phoneme sequences in this way is that the entire unregistered word storage unit 106, in which a large number of words are stored, can be searched with light processing. Unregistered word candidates can therefore be found and displayed in a short time, which gives the user a responsive feel.
  • the reference similarity calculation unit 103 is provided for unregistered word determination, but this is not an essential requirement. It is also possible to use other methods for determining unregistered words.
  • For example, an unknown utterance detection apparatus such as that shown in FIG. 11 may be used for this purpose.
  • FIG. 11 is a block diagram showing a functional configuration of the unknown utterance detection apparatus.
  • the speech segment pattern storage unit 111 stores speech segments of standard speech used for matching with the feature parameters of the input speech.
  • Here, a speech segment means a set of VC patterns, each concatenating the latter half of a vowel interval with the first half of the consonant interval that follows it, and CV patterns, each concatenating the latter half of a consonant interval with the first half of the vowel interval that follows it.
  • The speech segments may instead be a set of phonemes, each roughly equivalent to one letter when Japanese is written in Roman letters; a set of morae, each roughly equivalent to one character when Japanese is written in hiragana; a set of subwords, each a chain of multiple morae; or a mixture of these sets.
  • The word dictionary storage unit 112 stores rules for synthesizing the word patterns of the speech recognition vocabulary by concatenating the speech segments.
  • the word matching unit 113 compares the input speech expressed in a time series of feature parameters with the synthesized word pattern, and calculates the likelihood corresponding to the similarity for each word.
  • The transition probability storage unit 114 stores transition probabilities that express, as continuous values, the naturalness of the connection when speech segments are arbitrarily combined.
  • The 2-gram probability of phonemes is used as the transition probability.
  • The 2-gram probability of a phoneme means the probability P(y | x) that a phoneme y follows the preceding phoneme x, and is obtained in advance from a large amount of Japanese text data.
  • The transition probability may also be a 2-gram probability of morae, a 2-gram probability of subwords, a 2-gram probability of a mixture of these, or a 3-gram probability instead of a 2-gram probability.
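As a rough illustration of how the 2-gram probability P(y | x) might be obtained from text data, the sketch below counts adjacent phoneme pairs. The function name is hypothetical, and plain relative-frequency estimation without smoothing is an assumption; the text does not say how the probabilities are estimated.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def estimate_bigram(sequences: Iterable[Sequence[str]]
                    ) -> Dict[Tuple[str, str], float]:
    """Relative-frequency estimate of P(y | x) for adjacent phonemes x, y.

    Assumption: no smoothing, so unseen pairs are simply absent.
    """
    pair_counts: Counter = Counter()
    context_counts: Counter = Counter()
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            pair_counts[(x, y)] += 1
            context_counts[x] += 1
    return {(x, y): count / context_counts[x]
            for (x, y), count in pair_counts.items()}
```

The same counting scheme applies unchanged to morae or subwords, since only the symbol inventory differs.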
  • The speech sequence matching unit 115 calculates, taking the transition probabilities into account, the maximum likelihood obtainable between a pattern formed by arbitrarily concatenating the speech segment patterns and the input speech expressed as a time series of feature parameters.
  • The candidate score difference calculation unit 116 calculates, from the word likelihoods computed by the word matching unit 113, the difference between the likelihood of the word with the highest value (first candidate) and that of the word with the next highest value (second candidate), normalized by word length.
  • The candidate/phoneme sequence similarity calculation unit 117 calculates the distance between the phoneme sequence of the first candidate and that of the second candidate in order to obtain the acoustic similarity between the two candidates.
  • The candidate/speech sequence score difference calculation unit 118 calculates the difference between the likelihood of the first candidate and the reference likelihood calculated by the speech sequence matching unit 115, normalized by word length.
  • The candidate/speech sequence/phoneme sequence similarity calculation unit 119 calculates the acoustic similarity between the first candidate and the sequence determined to be optimal by the speech sequence matching unit 115, as the distance between their phoneme sequences.
  • The unregistered word determination unit 104 combines the values obtained by the candidate score difference calculation unit 116, the candidate/phoneme sequence similarity calculation unit 117, the candidate/speech sequence score difference calculation unit 118, and the candidate/speech sequence/phoneme sequence similarity calculation unit 119 to determine whether or not the input speech is an unregistered word.
  • Statistically combining multiple measures for detecting unregistered words in this way improves the determination accuracy.
  • Here, four measures are used in the unregistered word determination unit 104. Besides these, it is also possible to use measures such as the likelihood of each word candidate itself, its distribution, the amount of variation of the local score within the word interval, and the duration information of the phonemes that make up the word.
  • A linear discriminant obtained in advance from a large number of recognition result cases is used as the method for determining an unregistered word from multiple measures.
  • Using learning machines such as neural networks, decision trees, and SVMs (support vector machines) is also effective.
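A linear discriminant over the four measures could look like the sketch below. Everything concrete here is an assumption made for illustration: the class and function names, the sign convention (a positive value means "unregistered"), and above all the weights, which in practice would be fitted to recognition result cases as the text describes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Measures:
    candidate_score_diff: float           # from unit 116
    candidate_phoneme_distance: float     # from unit 117
    sequence_score_diff: float            # from unit 118
    sequence_phoneme_distance: float      # from unit 119


def is_unregistered(m: Measures, weights: Sequence[float], bias: float) -> bool:
    """Linear discriminant: weighted sum of the four measures plus a bias.

    Assumption: a positive value is read as 'unregistered word'.
    """
    features = (m.candidate_score_diff,
                m.candidate_phoneme_distance,
                m.sequence_score_diff,
                m.sequence_phoneme_distance)
    value = bias + sum(w * x for w, x in zip(weights, features))
    return value > 0.0


# Placeholder parameters with no empirical basis, for demonstration only.
EXAMPLE_WEIGHTS = (0.0, 0.0, 1.0, 1.0)
EXAMPLE_BIAS = -1.0
```

Swapping in a neural network, decision tree, or SVM would replace only the body of `is_unregistered`; the four input measures stay the same.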
  • For the unregistered word candidate search unit 105, an unregistered word search method based on the edit distance between phoneme sequences has been described.
  • In defining the edit distance between phonemes, rather than setting every insertion, omission, and substitution error to an edit distance of 1, it is also effective to use as the distance a continuous value based on experimentally obtained error occurrence probabilities.
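The continuous-valued variant can be sketched by letting each edit operation carry its own cost. The function below is illustrative only: its name and callback interface are hypothetical, and the sample confusion table is a made-up placeholder, not measured data. How costs are derived from error probabilities is left open, as in the text.

```python
from typing import Callable, Sequence


def weighted_edit_distance(a: Sequence[str],
                           b: Sequence[str],
                           sub_cost: Callable[[str, str], float],
                           ins_cost: Callable[[str], float],
                           del_cost: Callable[[str], float]) -> float:
    """Edit distance where each operation has a phoneme-dependent cost."""
    prev = [0.0]
    for pb in b:
        prev.append(prev[-1] + ins_cost(pb))
    for pa in a:
        curr = [prev[0] + del_cost(pa)]
        for j, pb in enumerate(b, start=1):
            match_or_sub = 0.0 if pa == pb else sub_cost(pa, pb)
            curr.append(min(prev[j] + del_cost(pa),       # omission
                            curr[j - 1] + ins_cost(pb),   # insertion
                            prev[j - 1] + match_or_sub))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical confusion costs: easily confused pairs get a cost below 1.
CONFUSION_COST = {("b", "p"): 0.3, ("p", "b"): 0.3}


def example_sub_cost(x: str, y: str) -> float:
    return CONFUSION_COST.get((x, y), 1.0)
```

With all costs fixed at 1 this reduces to the ordinary edit distance, so the unit-cost search can be treated as a special case.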
  • It is also possible to store data in the same format as that of the speech recognition vocabulary storage unit 102 in the unregistered word storage unit 106, and to have the unregistered word candidate search unit 105, like the speech recognition unit 101, match words directly against the input speech parameters and output unregistered word candidates and their unregistered word scores. With such a configuration, the resources required for the unregistered word search increase, but the search accuracy for unregistered words improves. Even in this case, the effect of not reducing the recognition rate for the target words, which is a feature of the present invention, is maintained.
  • The unregistered word storage unit 106 may contain words that are also included in the speech recognition vocabulary storage unit 102. In that case, when such a word is found by the unregistered word candidate search unit 105, it may be excluded before the result is output to the result display unit 107. This makes it possible to determine the vocabulary of the unregistered word storage unit 106 independently of the contents of the speech recognition vocabulary storage unit 102, which simplifies maintenance of the unregistered word storage unit 106.
  • For sentence utterances, the unregistered word determination unit 104 must determine whether an unregistered word is included in the utterance and, if so, at which position it occurs; the other operations are exactly the same.
  • The unregistered word candidate search unit 105 has been described as outputting five unregistered word candidates, but it is effective to change this number according to the search accuracy of the unregistered word candidate search unit 105, and the number of candidates to output may also be varied according to the similarity of each unregistered word. Depending on the search accuracy of the unregistered word candidate search unit 105 or the unregistered word scores of the retrieved words, the number of output candidates may therefore be as small as one. This configuration avoids placing an unnecessary load on the user when the user is asked to judge whether the spoken word is in the candidate list.
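One simple way to make the number of displayed candidates depend on their scores is sketched below. The policy is an assumption chosen for illustration (keep at most five candidates, drop those under a score threshold, always keep the best one); the function name and the threshold value are hypothetical.

```python
from typing import List, Tuple


def select_for_display(scored: List[Tuple[str, float]],
                       max_candidates: int = 5,
                       min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Keep high-scoring candidates only, but never fewer than one."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    kept = [item for item in ranked[:max_candidates] if item[1] >= min_score]
    # Fall back to the single best candidate when none passes the threshold.
    return kept or ranked[:1]
```

A shorter list means the user has fewer entries to scan when checking whether the spoken word is present.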
  • The output example shown in FIG. 9 displays all unregistered word candidates in the same manner, but the result display unit 107 may also emphasize the candidate most likely to be the user's utterance by changing the font size, using bold type, or changing the color according to the unregistered word score of each candidate. This reduces the load on the user when looking for the spoken word in the list.
  • FIG. 12 is a block diagram showing a functional configuration of the speech recognition apparatus according to the second embodiment.
  • The speech recognition apparatus 200 is the same as the speech recognition apparatus 100 according to the first embodiment in that it includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105, and a result display unit 107.
  • The speech recognition apparatus 200 according to the second embodiment differs from the speech recognition apparatus 100 according to the first embodiment in that it includes the unregistered word class determination unit 201 and the unregistered word class-specific word storage unit 202. The following description focuses on this difference. Parts that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.
  • The unregistered word class determination unit 201 is a processing unit that determines which category the unregistered word belongs to, based on the content of the user's utterance and the usage status of the system.
  • The unregistered word class-specific word storage unit 202 is a storage device, such as a hard disk, that stores unregistered words classified into categories.
  • the operation is the same as that shown in the first embodiment.
  • Based on the reference similarity calculated by the reference similarity calculation unit 103, the unregistered word determination unit 104 performs unregistered word determination.
  • The unregistered word class determination unit 201 determines which category the unregistered word belongs to.
  • The unregistered word categories are, for example, proper person names such as celebrity names, proper titles such as program titles, and proper place names such as tourist spots, as shown in FIG. The method for determining the unregistered word category in the unregistered word class determination unit 201 will be described later.
  • The unregistered word candidate search unit 105 then searches for the unregistered word. At this time, the unregistered word candidate search unit 105 narrows the search range within the unregistered word class-specific word storage unit 202 based on the class determination result from the unregistered word class determination unit 201. When the speech recognition apparatus 200 has acquired unregistered word candidates in this manner, they are presented to the user via the result display unit 107, as in the first embodiment.
  • The unregistered word category can be determined from the information before and after the unregistered word in the recognized sentence. For example, if the user's utterance is "I want to see the program in which ○○ appears," then "○○" is assumed to be an unregistered word of the proper person name class, and for the utterance "Record ○○," "○○" is assumed to be an unregistered word of the program title class. A class N-gram language model that includes unregistered word classes can be used as a model for estimating the word class at the target position from the surrounding context in the sentence.
  • FIG. 14 shows the functional configuration of the unregistered word class determination unit when using a class N-gram language model that includes unregistered word classes.
  • When a class N-gram language model is used, the unregistered word class determination unit 201a includes a word string hypothesis generation unit 211, a class N-gram generation/storage unit 221, and a class-dependent word N-gram generation/storage unit 231.
  • The word string hypothesis generation unit 211 refers to the class N-gram, which evaluates sequences of words and unregistered word classes, and to the class-dependent word N-gram, which evaluates the word strings that make up an unregistered word class; it generates word string hypotheses from the word matching results and obtains a recognition result.
  • The class N-gram generation/storage unit 221 generates a class N-gram for assigning a language likelihood, which is the logarithm of a linguistic probability, to a context that includes an unregistered word class, and stores the generated class N-gram.
  • The class-dependent word N-gram generation/storage unit 231 generates a class-dependent word N-gram for assigning a language likelihood, which is the logarithm of a linguistic probability, to a word sequence within an unregistered word class, and stores the generated class-dependent word N-gram.
  • FIG. 15 shows a functional configuration of the class N gram generation / storage unit 221.
  • The class N-gram generation/storage unit 221 includes a sentence expression corpus storage unit 222 in which a large number of sentence expressions to be recognized are stored in advance as text, a sentence expression morpheme analysis unit 223 that performs morphological analysis of the sentence expressions, a class N-gram generation unit 224, and a class N-gram storage unit 225.
  • The sentence expression corpus storage unit 222 stores in advance a large data library of sentence expressions to be recognized.
  • The sentence expression morpheme analysis unit 223 analyzes relatively long sentence expressions stored in the sentence expression corpus storage unit 222, such as "Record the weather forecast for tomorrow," into morphemes, which are the smallest meaningful units of language.
  • The class N-gram generation unit 224 extracts the word strings contained in the morphologically analyzed text and refers to the unregistered word classes input from the class-dependent word N-gram generation/storage unit 231 described later. If a corresponding unregistered word class exists, the unregistered word class contained in the text is replaced with a virtual word, and statistics on chains of words or unregistered word classes are collected to generate a class N-gram that associates each chain of words or unregistered word classes with its probability. The class N-gram generated by the class N-gram generation unit 224 is stored in the class N-gram storage unit 225.
  • The class-dependent word N-gram generation/storage unit 231 includes a class corpus storage unit 232, a class morpheme analysis unit 233, a class-dependent word N-gram generation unit 234, a class-dependent word N-gram storage unit 235, an unregistered word class definition generation unit 236, and an unregistered word class definition storage unit 237.
  • the class corpus storage unit 232 stores in advance a data library of unregistered words (for example, a title of a TV program, a person's name, etc.) having the same semantic properties and syntactic properties.
  • The class morpheme analysis unit 233 performs morphological analysis on the class corpus.
  • Specifically, the class morpheme analysis unit 233 analyzes, in morpheme units, relatively short unregistered words with common properties stored in the class corpus storage unit 232, such as the TV program name "MMM weather forecast."
  • The class-dependent word N-gram generation unit 234 processes the morphological analysis results, collects statistics on word chains, and generates the class-dependent word N-gram, which is information associating word strings with their probabilities.
  • the class-dependent word N-gram storage unit 235 stores the class-dependent word N-gram generated by the class-dependent word N-gram generation unit 234.
  • the class-dependent word N-gram stored in the class-dependent word N-gram storage unit 235 is referred to by the word string hypothesis generation unit 211 during speech recognition.
  • The unregistered word class definition generation unit 236 generates, from the morphological analysis results of the class corpus, definitions of unregistered word classes in which unregistered words with common properties are defined as classes. In other words, morphological analysis is performed on unregistered words with common properties, and a class definition is generated in which the obtained word strings are treated as word strings of the unregistered word class.
  • the unregistered word class definition storage unit 237 stores the unregistered word class definition generated by the unregistered word class definition generation unit 236. This unregistered word class definition is referenced by the class N gram generation unit 224 of the class N gram generation storage unit 221 when generating the above class N gram.
  • $P(C_i \mid C_{i-n+1}, \ldots, C_{i-1})$ means the probability that an n-chain of word classes will occur.
  • $P(w_i \mid C_i)$ means the probability that the specific word $w_i$ will occur from the word class $C_i$.
  • a class means a group that takes into account the connectivity of the word, such as the part of speech of the word and the unit that further subdivides it.
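The two probabilities above are the building blocks of a class N-gram model. Assuming the standard class N-gram factorization (the symbols $w_i$, $C_i$, and the sentence length $L$ are notation chosen here for illustration), the probability of a word string can be written as:

```latex
P(w_1, \ldots, w_L) \approx \prod_{i=1}^{L}
  P(C_i \mid C_{i-n+1}, \ldots, C_{i-1}) \, P(w_i \mid C_i)
```

Under this reading, a word inside an unregistered word class contributes one factor for the class given its context and one factor for the word given the class.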
  • the probability of occurrence of “recording 00” is that if “00” is considered a word of a unique program name class,
  • Above, a determination method using a class N-gram language model has been shown as one way of determining the unregistered word category in the unregistered word class determination unit.
  • Alternatively, a method using the context information of the dialogue is possible.
  • The dialog management unit of the voice dialog system generates, from the dialog history information, estimated information on the word categories that the user is likely to utter, and transmits it to the unregistered word class determination unit.
  • The unregistered word class determination unit determines the category of the unregistered word from the transmitted estimated information on word categories.
  • FIG. 19 shows a block diagram of the unregistered word class determination unit in such a configuration.
  • The unregistered word class determination unit 201b includes a word category information receiving unit 241 that acquires the category of the spoken word, and an unregistered word class determining unit 242 that determines, based on the category acquired by the word category information receiving unit 241, the category of the word determined to be an unregistered word.
  • The effect of adopting this configuration is as follows. When the unregistered word category is determined using the class N-gram language model, the expected input must be a sentence utterance. By using estimation results from an application such as the dialog management unit, the category can be determined even if the input is a single-word utterance.
  • FIG. 20 is a block diagram showing a functional configuration of the speech recognition apparatus according to the third embodiment.
  • the speech recognition apparatus 300 includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105,
  • The speech recognition apparatus 300 according to the third embodiment differs from the speech recognition apparatuses of the preceding embodiments, such as the speech recognition apparatus 100 according to the first embodiment, in that it includes the unregistered word storage unit 301 connected to the unregistered word server 303 via the network 302. The following description focuses on this difference. Parts that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.
  • The unregistered word storage unit 301 has a function of storing a large number of unregistered words to be searched by the unregistered word candidate search unit 105, and also a function of updating the stored information via communication means.
  • the network 302 is a communication network such as the Internet or a telephone line.
  • The unregistered word server 303 is a server device that stores the latest necessary unregistered words and provides this information to a client (in this case, the speech recognition apparatus 300).
  • the output flow of the speech recognition apparatus for the user's utterance in the third embodiment is the same as that shown in the first embodiment.
  • the difference in the third embodiment is in the maintenance method of the unregistered word word storage unit 301 referred to by the unregistered word candidate search unit 105.
  • the unregistered word storage unit 301 can be arbitrarily updated.
  • For words that change and increase daily, such as proper person names and program titles, the user's spoken word cannot be found when searching for unregistered word candidates unless the stored words are kept up to date.
  • For example, at the time of program reorganization in television broadcasting or the start of a new season in professional sports, new titles, new entertainers, and new athlete names appear, and these become unregistered words.
  • the update operation of the words stored in the unregistered word storage unit 301 is specifically performed as follows.
  • Dates on which unregistered words are expected to increase rapidly are registered in advance in the unregistered word storage unit 301. When such a date arrives, an unregistered word update request is automatically transmitted to the unregistered word server 303 via the network 302, such as a telephone line or the Internet. Alternatively, the unregistered word storage unit 301 may issue update requests regularly according to a predetermined schedule, or an update request may be transmitted from the unregistered word storage unit 301 to the unregistered word server 303 when a user who feels that the registered unregistered words are insufficient requests an update.
  • In addition to the unregistered word storage unit 301 actively transmitting update requests to the unregistered word server 303, the unregistered word server 303 may, on detecting that a certain amount of unregistered words has been added, transmit update information to the unregistered word storage unit 301 of each client.
  • The unregistered word server 303, having received an update request or having determined that an update is necessary because the newly added unregistered words have reached a specified amount, returns the information on the added words to the unregistered word storage unit 301 of the client.
  • According to the third embodiment, the stored words are kept optimally maintained by this update operation.
  • In the above description, the unregistered word storage unit 301 updates its stored words from the unregistered word server 303, which is dedicated to the update work; however, the words may also be updated using information held by a server that is not dedicated to this work.
  • EPG Electronic Program Guide
  • Genres that the user is unlikely to utter (for example, professional baseball player names, foreign movie actor names, or Japanese movie titles) may be extracted in advance as genre information.
  • The unregistered words of these extracted genres need not be acquired from the unregistered word server 303. This prevents the unregistered word storage unit 301 from growing unnecessarily large.
  • a modification in which the word stored in the speech recognition vocabulary storage unit 102 is updated is also conceivable.
  • For example, a server (not shown) may select words that the user is likely to utter in the near future and update the contents of the speech recognition vocabulary storage unit 102 for the selected words.
  • As such words, for example, when the speech recognition apparatus 300 is applied to a recording reservation system, those among the performer names and program titles recorded in the EPG described above that relate to programs scheduled to be broadcast within one week can suitably be used.
  • the server generates information used by the speech recognition unit 101 for recognizing the extracted word, and updates the content of the speech recognition vocabulary storage unit 102 with the generated information.
  • Such an update operation can be performed in exactly the same manner as the operation of updating the contents of the unregistered word storage unit 301 from the unregistered word server 303 via the network 302.
  • In this case, every day, the information for recognizing words related to programs whose scheduled broadcast has passed may be deleted, and the information for recognizing words related to programs scheduled to be broadcast one week ahead may be added.
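The daily rolling update described above can be sketched as a one-week window over EPG entries. The data layout (pairs of broadcast date and associated words), the function names, and the inclusive handling of the window boundaries are all assumptions made for this illustration.

```python
from datetime import date, timedelta
from typing import Iterable, Sequence, Set, Tuple


def refresh_vocabulary(epg_entries: Iterable[Tuple[date, Sequence[str]]],
                       today: date,
                       window_days: int = 7) -> Set[str]:
    """Collect words for programs airing from today up to window_days ahead."""
    horizon = today + timedelta(days=window_days)
    vocabulary: Set[str] = set()
    for air_date, words in epg_entries:
        if today <= air_date <= horizon:
            vocabulary.update(words)
    return vocabulary


def vocabulary_changes(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (words to add, words to delete) for the daily update."""
    return new - old, old - new
```

Running `refresh_vocabulary` once a day and applying the result of `vocabulary_changes` keeps the recognition vocabulary limited to words tied to upcoming programs.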
  • With this configuration, only a relatively small amount of recognition information, limited to words expected to be used frequently, is stored in the speech recognition vocabulary storage unit 102, which makes it easier to shorten the recognition time and obtain a good recognition rate.
  • FIG. 21 is a block diagram showing a functional configuration of the speech recognition apparatus according to Embodiment 4 of the present invention.
  • the speech recognition apparatus 400 includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, and a result display unit 107.
  • The speech recognition apparatus 400 according to the fourth embodiment differs from the speech recognition apparatus 100 and the others according to the first to third embodiments in that it includes the unregistered word search request transmission/reception unit 401 connected to the unregistered word search server 403 via the network 402. The following description focuses on this difference. Parts that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.
  • The unregistered word search request transmission/reception unit 401 is a processing unit that transmits an unregistered word search request to the unregistered word search server 403 via the network 402 and receives the result of the unregistered word search from the unregistered word search server 403. It is realized by a communication interface or the like.
  • This unregistered word search request transmission / reception unit 401 receives the subword sequence obtained by the reference similarity calculation unit 103 as described in Embodiment 1 and the input speech when search for an unregistered word is necessary.
  • the network 402 is a communication network such as the Internet or a telephone line.
  • The unregistered word search server 403 is a server device that searches for unregistered words in response to a request from the client (speech recognition apparatus 400), and includes an unregistered word search unit 404 and an unregistered word storage unit 405.
  • The unregistered word search unit 404 is a processing unit that performs the unregistered word search. It also has a communication function for receiving information about the unregistered word from the client via the network 402 and returning the search result via the network 402.
  • The unregistered word storage unit 405 is a storage device, such as a hard disk, that stores information on unregistered words.
  • the output flow of the speech recognition apparatus for the user's utterance is the same as that shown in the first embodiment.
  • the difference in the fourth embodiment is that the unregistered word candidate search unit 105 in the first embodiment is not provided inside, and the search operation for unregistered word candidates is outsourced to an external server.
  • The subword sequence of the unregistered word portion obtained by the reference similarity calculation unit 103 is transmitted to the unregistered word search server 403 by the unregistered word search request transmission/reception unit 401 as an unregistered word search request.
  • The unregistered word search unit 404, having received the subword sequence of the unregistered word portion from the client, searches the words stored in the unregistered word storage unit 405 for the unregistered word uttered by the user.
  • As the search method, the method described with reference to FIG. 10 in the first embodiment is effective.
  • the search results obtained in this way are returned to the unregistered word search request transmission / reception unit 401 via the network 402 as unregistered word candidates.
  • The unregistered word search request transmission/reception unit 401 passes the returned unregistered word search result to the result display unit 107, which presents to the user the fact that the spoken word was an unregistered word.
  • Since the server side can generally have a relatively large hardware configuration, an unregistered word search algorithm that would be difficult to install on the client side, such as on a mobile terminal, can be implemented, and the search accuracy for unregistered words may be improved.
  • The present invention can be used in various electronic devices that use speech recognition technology as a means of input, such as AV devices such as TVs and video equipment, in-vehicle devices such as car navigation systems, and portable terminals such as PDAs and mobile phones. Its industrial applicability is therefore very wide.

Abstract

A speech recognition device can show a user whether a word the user has pronounced is an unregistered word for the speech recognition dictionary or whether the word should be pronounced again because of erroneous recognition. The speech recognition device includes: a speech recognition vocabulary storage unit (102) for defining a vocabulary for speech recognition and storing it as registered words; a speech recognition unit (101) for correlating the pronounced speech with the registered words; an unregistered word judgment unit (104) for judging whether the pronounced speech is a registered word or an unregistered word according to the correlation result of the speech recognition unit (101); an unregistered word storage unit (106) for storing unregistered words; an unregistered word candidate search unit (105) which, when the speech is judged to be an unregistered word by the unregistered word judgment unit (104), searches the unregistered word storage unit (106) for unregistered word candidates considered to correspond to the pronounced speech; and a result display unit (107) for displaying the search result.

Description

Specification
Speech recognition apparatus, speech recognition method, and program
Technical field
[0001] The present invention relates to a speech recognition apparatus used for a man-machine interface based on speech recognition, and more particularly to a technique for responding to utterances of unregistered words.
Background art
[0002] Conventionally, speech recognition technology has been applied as an input front end for user-friendly device control. In general, as described in Non-Patent Document 1, speech recognition compares the uttered speech with the standard pattern of each word defined in advance in a speech recognition dictionary and takes the closest word as the recognition result.
[0003] However, users of a device do not remember every word that speech recognition covers, so they may utter a word outside the recognition vocabulary. In such a case, the basic framework described above returns the closest word in the speech recognition dictionary as the result, which inevitably causes a misrecognition. To address this problem, methods have been devised for detecting a user's utterance of a word that does not exist in the speech recognition dictionary (an unregistered word).
[0004] For example, Patent Document 1 describes a method in which the similarity between the input speech and each word in the speech recognition dictionary is obtained, a reference similarity is obtained from patterns formed by concatenating unit standard patterns, the similarity obtained for each word is corrected using this reference similarity, and the user's utterance is regarded as an unregistered word if the corrected similarity does not reach a certain threshold.
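The threshold test summarized in [0004] can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say how the reference similarity corrects the word similarity, so a simple subtraction is assumed here, and the function name, score values, and threshold are all hypothetical.

```python
def is_unregistered(word_similarities, reference_similarity, threshold):
    """Return (best_word, corrected_similarity, unregistered_flag).

    word_similarities: dict mapping each dictionary word to its similarity
        with the input speech (higher means closer).
    reference_similarity: similarity of the best unconstrained sequence of
        unit patterns to the same input.
    The subtraction used as the "correction" is an assumption; the source
    only says that the similarity is corrected and compared to a threshold.
    """
    best_word = max(word_similarities, key=word_similarities.get)
    corrected = word_similarities[best_word] - reference_similarity
    return best_word, corrected, corrected < threshold
```

With log-likelihood-style scores, a dictionary word that explains the input almost as well as the unconstrained unit sequence yields a corrected value near zero and is accepted; a large gap suggests that no dictionary word fits the utterance.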
[0005] Patent Document 2 describes a method for detecting unregistered words with high accuracy and a small amount of processing by using phoneme HMMs (Hidden Markov Models) and a garbage HMM.
[0006] When a user's utterance of an unregistered word is detected, it is easy to conceive of indicating this to the user with a warning sound such as a beep, or with a response that refers to the utterance by a pronoun, such as "There is no such item."
[0007] However, merely returning such a response is insufficient for the user, because the user cannot clearly tell from it whether the word they uttered merely happened not to be recognized or is an unregistered word.
[0008] As a result, the user has no choice but to keep repeating the utterance, paying ever more attention to pronunciation, until satisfied or until giving up, which reduces the convenience of controlling a device by voice input.
[0009] To address this problem, Patent Document 3 describes a method in which, when a user's utterance of an unregistered word is detected, a list of the words that the device can accept in the current situation is presented to the user. With this method, even a user who does not know the words the device recognizes is shown the words that can be uttered in that situation each time an unregistered word is uttered, and can therefore achieve the intended operation without repeating the same word many times.
[0010] Patent Document 4 describes a method in which speech recognition is performed using, as the speech recognition dictionary, an internal dictionary corresponding to a conventional speech recognition dictionary together with an external dictionary storing many words that would be unregistered in the conventional dictionary; when a word in the external dictionary becomes the recognition result, the device also indicates that the word is unregistered. For example, if the word "Taro Matsushita" is in the external dictionary and the user says "Taro Matsushita", a response such as "There is no Taro Matsushita." becomes possible.
Patent Document 1: Japanese Patent No. 2808906
Patent Document 2: Japanese Patent No. 2886117
Patent Document 3: Japanese Patent No. 3468572
Patent Document 4: Japanese Unexamined Patent Application Publication No. H9-230889
Non-Patent Document 1: Kiyohiro Shikano, Satoshi Nakamura, Shiro Ise, "Digital Signal Processing Series 5: Digital Signal Processing of Speech and Sound Information", Shokodo, November 10, 1997, pp. 45, 53
Disclosure of the invention
Problems to be solved by the invention
[0011] However, with the method of Patent Document 3, when the number of acceptable words becomes very large, the user has to look for the desired word among many words, which can lead to oversights and annoyance. For example, suppose the user utters the name "Taro Matsushita", a person who does not exist in the system, and then tries to find "Taro Matsushita" by a person search among the acceptable words, and 100 names are listed as searchable person names. The user must then check whether "Taro Matsushita" is in the list and whether the list contains someone who could substitute for "Taro Matsushita". In such a case the user may overlook "Taro Matsushita", and finding the name is both tedious and difficult.
[0012] With the method of Patent Document 4, a very large number of words must be registered in the external dictionary, which serves as the unregistered word dictionary, for responses such as the one above to be returned reliably. When speech recognition is performed with such a large-vocabulary dictionary, however, recognition errors become more likely precisely because many similar words are registered, which creates a conflicting problem. As a result, the user's utterance "Taro Matsushita" may draw a response such as "There is no Toru Matsushita." or "Toru Matsushita, correct?", so that the user is needlessly confused or forced to speak again.
[0013] The present invention has been made in view of these problems, and its object is to provide a speech recognition device that can reduce the situations in which a user attempts futile re-utterances.
Means for solving the problem
[0014] To achieve the above object, a speech recognition device according to the present invention is a speech recognition device that recognizes uttered speech, comprising: speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words; speech recognition means for matching the uttered speech against the registered words stored in the speech recognition word storage means; unregistered word determination means for determining, based on the matching result of the speech recognition means, whether the uttered speech is a registered word stored in the speech recognition word storage means or an unregistered word that is not stored there; unregistered word storage means for storing unregistered words; unregistered word candidate search means for searching, when the unregistered word determination means determines that the speech is an unregistered word, the unregistered words stored in the unregistered word storage means for unregistered word candidates considered to correspond to the uttered speech, based on the uttered speech; and result display means for displaying the search result.
[0015] Here, the speech recognition device may include communication means for communicating with an unregistered word server that stores a group of unregistered words not stored in the unregistered word storage means, and the unregistered words stored in the unregistered word storage means may be updated when the communication means receives the group of unregistered words from the unregistered word server.
[0016] The present invention can be realized not only as such a speech recognition device but also as a speech recognition method having, as its steps, the characteristic means of such a device, or as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
Effects of the invention
[0017] According to the present invention, the user can be informed that they uttered an unregistered word, for which speech recognition cannot succeed, and at the same time can be shown in an easily understood manner that the failure is not due to a recognition error.
[0018] Further, according to the present invention, the recognition rate for utterances of words in the speech recognition dictionary, which is the original purpose of the device, is not degraded.
[0019] Furthermore, the unregistered word storage means used to search for unregistered word candidates becomes very large and requires constant maintenance; by separating this function from the device and providing it as a server, the manufacturing cost of the device can be lowered, and the maintenance cost of the unregistered word storage means can be lowered as well.
Brief description of the drawings
[0020] FIG. 1 is a block diagram showing the functional configuration of the speech recognition device according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the operation of the speech recognition device according to Embodiment 1.
FIG. 3 is a diagram showing an example of the output of the speech recognition unit when a word in the recognition vocabulary is uttered, according to Embodiment 1.
FIG. 4 is a diagram showing an example of the output of the reference similarity calculation unit when a word in the recognition vocabulary is uttered, according to Embodiment 1.
FIG. 5 is a diagram showing an example of the result display when a word in the recognition vocabulary is uttered, according to Embodiment 1.
FIG. 6 is a diagram showing an example of the output of the speech recognition unit when an unregistered word is uttered, according to Embodiment 1.
FIG. 7 is a diagram showing an example of the output of the reference similarity calculation unit when an unregistered word is uttered, according to Embodiment 1.
FIG. 8 is a diagram showing an example of the output of the unregistered word candidate search unit according to Embodiment 1.
FIG. 9 is a diagram showing an example of the result display when an unregistered word is uttered, according to Embodiment 1.
FIG. 10 is a diagram showing the method of calculating the similarity between phoneme sequences during unregistered word search, according to Embodiment 1.
FIG. 11 is a block diagram showing the functional configuration of an unknown utterance detection device.
FIG. 12 is a block diagram showing the functional configuration of the speech recognition device according to Embodiment 2 of the present invention.
FIG. 13 is a diagram showing an example of unregistered word categories according to Embodiment 2.
FIG. 14 is a block diagram showing the functional configuration of an unregistered word class determination unit that uses a class N-gram language model.
FIG. 15 is a block diagram showing the functional configuration of the class N-gram generation and accumulation unit.
FIG. 16 is a block diagram showing the functional configuration of the class-dependent word N-gram generation and accumulation unit.
FIG. 17 is a diagram showing an example of a class N-gram language model for unregistered word class determination according to Embodiment 2.
FIG. 18 is a diagram showing an example of the result display when an unregistered word of a different class is uttered, according to Embodiment 2.
FIG. 19 is a diagram showing the configuration of an unregistered word class determination unit according to Embodiment 2 that acquires information for unregistered word class determination from an external application.
FIG. 20 is a block diagram showing the functional configuration of the speech recognition device according to Embodiment 3 of the present invention.
FIG. 21 is a block diagram showing the functional configuration of the speech recognition device according to Embodiment 4 of the present invention.
Explanation of symbols
100, 200, 300, 400  Speech recognition device
101  Speech recognition unit
102  Speech recognition vocabulary storage unit
103  Reference similarity calculation unit
104  Unregistered word determination unit
105  Unregistered word candidate search unit
106, 301  Unregistered word storage unit
107  Result display unit
111  Speech fragment pattern storage unit
112  Word dictionary storage unit
113  Word matching unit
114  Transition probability storage unit
115  Speech sequence matching unit
116  Candidate score difference calculation unit
117  Candidate/phoneme sequence similarity calculation unit
118  Candidate/speech sequence score difference calculation unit
119  Candidate/speech sequence/phoneme sequence similarity calculation unit
201, 201a, 201b  Unregistered word class determination unit
202  Class-specific unregistered word storage unit
211  Word string hypothesis generation unit
221  Class N-gram generation and accumulation unit
222  Sentence expression corpus accumulation unit
223  Morphological analysis unit for sentence expressions
224  Class N-gram generation unit
225  Class N-gram accumulation unit
231  Class-dependent word N-gram generation and accumulation unit
232  Class corpus accumulation unit
233  Morphological analysis unit for classes
234  Class-dependent word N-gram generation unit
235  Class-dependent word N-gram accumulation unit
236  Unregistered word class definition generation unit
237  Unregistered word class definition accumulation unit
241  Word category information reception unit
242  Unregistered word class decision unit
302, 402  Network (communication means)
303  Unregistered word server
401  Unregistered word search request transmission/reception unit
403  Unregistered word search server
404  Unregistered word search unit
405  Unregistered word storage unit
Best mode for carrying out the invention
[0022] To achieve the above object, a speech recognition device according to the present invention is a speech recognition device that recognizes uttered speech, comprising: speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words; speech recognition means for matching the uttered speech against the registered words stored in the speech recognition word storage means; unregistered word determination means for determining, based on the matching result of the speech recognition means, whether the uttered speech is a registered word stored in the speech recognition word storage means or an unregistered word that is not stored there; unregistered word storage means for storing unregistered words; unregistered word candidate search means for searching, when the unregistered word determination means determines that the speech is an unregistered word, the unregistered words stored in the unregistered word storage means for unregistered word candidates considered to correspond to the uttered speech, based on the uttered speech; and result display means for displaying the search result.
[0023] With this configuration, when the word uttered by the user is an unregistered word, unregistered word candidates are searched for and presented, so the user can realize that they uttered an unregistered word simply by confirming that the word they uttered is among the candidates. In addition, because the search for unregistered word candidates is performed separately from the comparison against the words in the speech recognition dictionary, the performance of speech recognition itself is not degraded.
[0024] Here, the unregistered word candidate search means may search the unregistered words stored in the unregistered word storage means for a plurality of unregistered word candidates.
[0025] With this configuration, the unregistered word candidates are not narrowed down to a single word, so high accuracy is not required of the candidate search and the hardware resources needed can be kept low.
[0026] Preferably, the unregistered word storage means stores the unregistered words classified by the category to which each belongs. More preferably, the speech recognition device further includes unregistered word class determination means for determining, based on the uttered speech, the category to which the unregistered word belongs, and the unregistered word candidate search means searches for the unregistered word candidates within the corresponding category in the unregistered word storage means, based on the determination result of the unregistered word class determination means.
[0027] Because the search range for unregistered word candidates is thus narrowed according to the category of the unregistered word, words in categories the user did not intend are prevented from being presented as candidates. Narrowing the search range also makes it possible to improve the accuracy of the candidate search.
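The category-restricted search of [0026] and [0027] can be pictured with a minimal sketch. The store layout, the category labels, and the sample entries (including the made-up program title) are assumptions for illustration only; the text does not prescribe a data structure.

```python
# Hypothetical store: unregistered words grouped by category.
UNREGISTERED_BY_CATEGORY = {
    "person": ["Matsushita Taro", "Matsushita Toru"],
    "program": ["Matsuri Special"],  # invented title, for illustration only
}


def search_scope(category, store=UNREGISTERED_BY_CATEGORY):
    """Return the words to search: one category if it is known, else all words."""
    if category in store:
        return list(store[category])
    # Category could not be determined: fall back to the whole store.
    return [word for words in store.values() for word in words]
```

Restricting the scope this way keeps a similar-sounding program title from being offered when the utterance was judged to be a person name.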
[0028] The speech recognition device may further include information acquisition means for acquiring information on the category, and the unregistered word candidate search means may search for the unregistered word candidates within the classified categories in the unregistered word storage means, based on the information acquired by the information acquisition means.
[0029] With this configuration, candidates that are similar in pronunciation but unlikely to have been uttered given the situation are not output, so a speech recognition device is realized that reduces the number of candidates presented and presents them to the user in an easily understood manner.
[0030] Furthermore, the unregistered word candidate search means preferably searches for the unregistered word candidates by calculating an unregistered word score that quantifies the degree of similarity to the uttered speech, and the result display unit preferably displays, as the search result, the unregistered word candidates together with their unregistered word scores, and changes how each candidate is displayed according to its score.
[0031] Thus, when unregistered word candidates are presented, their likelihood is shown as a number and the most plausible candidates are emphasized, so that the candidates can be presented to the user in an easily understood manner.
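A rough sketch of the scored, emphasized presentation described in [0030] and [0031] follows. It assumes the score is a normalized edit-distance similarity between phoneme sequences; the text only says that the score quantifies similarity to the uttered speech, so this metric, the 0.8 emphasis threshold, and the ">> " marker are illustrative choices rather than the method of the invention.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def score(recognized, candidate):
    """Similarity in [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(recognized), len(candidate)) or 1
    return 1.0 - edit_distance(recognized, candidate) / longest


def rank_candidates(recognized, unregistered, top_n=3):
    """Score every stored word and return the top_n (word, score) pairs."""
    scored = [(w, score(recognized, p)) for w, p in unregistered.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def render(ranked, emphasis_threshold=0.8):
    """Format candidates with scores; mark the most plausible ones."""
    return [
        f"{'>> ' if s >= emphasis_threshold else ''}{word} ({s:.2f})"
        for word, s in ranked
    ]
```

Showing several ranked candidates rather than a single answer matches [0024] and [0025]: the search does not have to be accurate enough to pick exactly one word.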
[0032] The unregistered words stored in the unregistered word storage means may be updated under predetermined conditions.
[0033] This makes it possible to quickly reflect, in the unregistered word storage means, unregistered words such as personal names and program titles, which increase daily.
[0034] Here, the speech recognition device may include communication means for communicating with an unregistered word server that stores a group of unregistered words not stored in the unregistered word storage means, and the unregistered words stored in the unregistered word storage means may be updated when the communication means receives the group of unregistered words from the unregistered word server.
[0035] Because new unregistered words are thus supplied from an external server, the unregistered word storage means can be kept in an optimal state without requiring the user to take the trouble of registering unregistered words such as personal names and titles, which increase daily.
[0036] The registered words stored in the speech recognition word storage means may also be updated under predetermined conditions.
[0037] This allows the registered words to follow changes over time in how frequently words are used, so that only a relatively small number of registered words expected to be used frequently need be kept in the speech recognition word storage means; recognition time is thereby shortened and a good recognition rate is easily obtained.
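One possible reading of the frequency-based update in [0036] and [0037] is sketched below. The "predetermined conditions" are not specified in this passage, so the usage log and the fixed vocabulary size are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def select_registered_words(usage_log, max_words):
    """Keep only the most frequently used words as the registered vocabulary.

    usage_log: iterable of words the user has recently uttered (assumed input).
    max_words: size limit of the registered vocabulary (assumed parameter).
    """
    counts = Counter(usage_log)
    return [word for word, _ in counts.most_common(max_words)]
```

Words that drop out of the registered vocabulary would then be handled through the unregistered word search rather than being lost entirely.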
[0038] The present invention can be realized not only as such a speech recognition device but also as a speech recognition system. That is, the system may be a speech recognition system that recognizes uttered speech and includes a speech recognition device that recognizes uttered speech and an unregistered word search server that searches for unregistered words not registered in the speech recognition device. The speech recognition device includes: speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words; speech recognition means for matching the uttered speech against the registered words stored in the speech recognition word storage means; unregistered word determination means for determining, based on the matching result of the speech recognition means, whether the uttered speech is a registered word stored in the speech recognition word storage means or an unregistered word that is not stored there; search request transmission means for requesting, when the unregistered word determination means determines that the speech is an unregistered word, the unregistered word search server to search for unregistered word candidates considered to correspond to the uttered speech; search result reception means for acquiring the search result for the unregistered word candidates from the unregistered word search server; and result display means for displaying the search result. The unregistered word search server includes: unregistered word storage means for storing the unregistered words; search request reception means for receiving the search request from the search request transmission means; unregistered word candidate search means for searching, when the search request reception means receives the search request, the unregistered words stored in the unregistered word storage means for unregistered word candidates considered to correspond to the uttered speech, based on the uttered speech; and search result transmission means for transmitting the search result to the speech recognition device.
[0039] This configuration makes it possible to implement the speech recognition interface compactly and, at the same time, lowers the maintenance cost of the unregistered word storage unit. Moreover, the unregistered word storage means, which needs constant updating, can be consolidated into one for a plurality of devices, which also lowers the maintenance cost.
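The division of roles in [0038] can be sketched with two in-process classes standing in for the device and the server. This is an assumption-laden illustration: no real network transport is shown, the class and method names are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is swapped in as the similarity measure, which is not the measure used by the invention.

```python
from difflib import SequenceMatcher


class UnregisteredWordSearchServer:
    """Holds the large unregistered word store shared by many devices."""

    def __init__(self, words):
        self._words = dict(words)  # word -> phoneme sequence

    def search(self, phonemes, top_n=3):
        """Handle a search request and return (word, similarity) pairs."""
        scored = [
            (word, SequenceMatcher(None, phonemes, stored).ratio())
            for word, stored in self._words.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_n]


class SpeechRecognitionDevice:
    """Keeps only the small recognizer locally; delegates the search."""

    def __init__(self, server):
        self._server = server  # stands in for the network link

    def handle(self, recognized_word, phonemes, judged_unregistered):
        if not judged_unregistered:
            return {"type": "recognized", "word": recognized_word}
        candidates = self._server.search(phonemes)
        return {"type": "unregistered", "candidates": candidates}
```

Because the store lives only in the server object, several device objects can share it, which mirrors the maintenance argument in [0039].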
[0040] なお、本発明は、このような音声認識装置として実現することができるだけでなぐこ のような音声認識装置が備える特徴的な手段をステップとする音声認識方法として実 現したり、それらのステップをコンピュータに実行させるプログラムとして実現したりす ることもできる。そして、そのようなプログラムは、 CD—ROM等の記録媒体やインタ 一ネット等の伝送媒体を介して配信することができるのは言うまでもない。  [0040] It should be noted that the present invention can be realized as a speech recognition method that can be realized as such a speech recognition device, and has a characteristic means included in the speech recognition device such as NAGKO as a step. It can also be realized as a program that causes a computer to execute steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
[0041] 以下、本発明を実施するための最良の形態について、図 1から図 21を参照しなが ら詳細に説明する。  Hereinafter, the best mode for carrying out the present invention will be described in detail with reference to FIGS. 1 to 21.
[0042] (実施の形態 1) [0042] (Embodiment 1)
図 1は、本発明の実施の形態 1に係る音声認識装置の機能的な構成を示すブロッ ク図である。  FIG. 1 is a block diagram showing a functional configuration of the speech recognition apparatus according to Embodiment 1 of the present invention.
[0043] 図 1に示す音声認識装置 100は、マン 'マシン'インタフェースの 1つとして用いられ 、利用者力 音声入力を受け付け、入力された音声の認識結果を出力する装置であ り、音声認識部 101、音声認識語彙格納部 102、参照類似度計算部 103、未登録語 判定部 104、未登録語候補検索部 105、未登録語単語格納部 106、および、結果 表示部 107を備える。 [0043] The speech recognition apparatus 100 shown in FIG. 1 is used as one of the man 'machine' interfaces. User power is a device that accepts speech input and outputs the recognition result of the input speech.Speech recognition unit 101, speech recognition vocabulary storage unit 102, reference similarity calculation unit 103, unregistered word determination unit 104, An unregistered word candidate search unit 105, an unregistered word word storage unit 106, and a result display unit 107 are provided.
[0044] 音声認識部 101は、入力音声を取り込んでその発話内容を認識する処理部である  The speech recognition unit 101 is a processing unit that captures input speech and recognizes the utterance content.
[0045] 音声認識語彙格納部 102は、音声認識部 101で認識する語彙を規定し格納する ハードディスク等の記憶装置である。この音声認識語彙格納部 102は、各単語の標 準的な音響パタンを標準パタンとして、または、各単語の音響パタンを HMM(Hidden Markov Model)や-ユーラルネットと呼ばれるモデルで表現したものを格納して!/、る。 この音声認識語彙格納部 102は、あるいは、より短い音響単位ごとのパタンを表現し た標準パタンや、 HMMや-ユーラルネットなどのモデルで表現したものを格納して おり、音声認識時には単語ごとに単語パタンや単語モデルを合成して、音声認識部 101に提供する。 The voice recognition vocabulary storage unit 102 is a storage device such as a hard disk that defines and stores the vocabulary recognized by the voice recognition unit 101. This speech recognition vocabulary storage unit 102 stores a standard acoustic pattern of each word as a standard pattern, or a representation of the acoustic pattern of each word in a model called HMM (Hidden Markov Model) or -Uralnet. And! This speech recognition vocabulary storage unit 102 stores a standard pattern expressing a pattern for each shorter acoustic unit or a model expressed by a model such as an HMM or -Ural net. A word pattern or a word model is synthesized and provided to the speech recognition unit 101.
[0046] The reference similarity calculation unit 103 is a processing unit that calculates the reference similarity used to determine whether the input speech is an unregistered word. It freely combines patterns or models of subwords, which are acoustic units shorter than words, searches for the subword sequence with the highest similarity to the input speech, and obtains that maximum similarity.
[0047] The unregistered word determination unit 104 determines whether the user's utterance is an unregistered word, based on the results of both the speech recognition unit 101 and the reference similarity calculation unit 103. If the utterance is a word stored in the speech recognition vocabulary storage unit 102, that is, a registered word, the unregistered word determination unit 104 outputs to the result display unit 107 a recognition result indicating that the utterance was recognized. If the utterance is a word not stored in the speech recognition vocabulary storage unit 102, that is, an unregistered word, it outputs to the unregistered word candidate search unit 105 a determination result indicating that the utterance is an unregistered word.
[0048] The unregistered word candidate search unit 105 is a processing unit that, when the user's utterance has been determined to be an unregistered word, searches for the unregistered word based on that utterance.
[0049] The unregistered word storage unit 106 is a storage device, such as a hard disk, that stores the large number of words targeted by the unregistered word search in the unregistered word candidate search unit 105.
[0050] Because the unregistered word candidate search unit 105 is expected to search a very large vocabulary stored in the unregistered word storage unit 106, it preferably uses a method that differs from that of the speech recognition unit 101 and is simpler and faster (that is, requires less computation time), as described later.
[0051] The result display unit 107 is a display device such as a CRT display or a liquid crystal display. By displaying a screen showing the recognition result output from the unregistered word determination unit 104, or a screen showing the determination result and the unregistered word search results, it informs the user whether the utterance was recognized and whether it is an unregistered word.
[0052] Next, the operation of the speech recognition device 100 configured as described above will be explained.
[0053] FIG. 2 is a flowchart showing the processing operation of the speech recognition device 100.
[0054] First, when the speech recognition device 100 receives speech uttered by the user (S10), the speech recognition unit 101 recognizes, from the speech recognition vocabulary storage unit 102, words similar to the input speech (S12). More specifically, the speech recognition unit 101 matches the input speech against the standard pattern or word model of each word stored in the speech recognition vocabulary storage unit 102, calculates the similarity to the input speech for each word, and extracts the words with high similarity as candidates. At the same time, the reference similarity calculation unit 103 searches for the subword sequence closest to the input speech and obtains its similarity as the reference similarity (S14).
[0055] Next, the unregistered word determination unit 104 compares the similarity of the first candidate word obtained by the speech recognition unit 101 (the candidate with the highest similarity) with the reference similarity obtained by the reference similarity calculation unit 103, and determines whether the difference between them is within a predetermined threshold (S16). This threshold is used to discriminate whether the user's utterance is a registered word or an unregistered word. It is determined as follows: the similarities from the speech recognition unit 101 and from the reference similarity calculation unit 103 are obtained for a large number of sample utterances of registered words and a large number of sample utterances of unregistered words, and the optimal threshold is chosen from their statistical distributions.
[0056] If the difference between the similarity of the first candidate word from the speech recognition unit 101 and the reference similarity from the reference similarity calculation unit 103 is within the statistically predetermined threshold (Yes in S16), the unregistered word determination unit 104 determines that the user's utterance is a word contained in the speech recognition vocabulary storage unit 102 (a registered word) (S18). The speech recognition device 100 then presents the recognition result to the user via the result display unit 107 (S26) and ends the processing operation.
[0057] If, on the other hand, the difference between the similarity of the first candidate word from the speech recognition unit 101 and the reference similarity from the reference similarity calculation unit 103 exceeds the statistically predetermined threshold (No in S16), the unregistered word determination unit 104 determines that the user's utterance is a word not contained in the speech recognition vocabulary storage unit 102 (an unregistered word) (S20) and outputs the determination result to the unregistered word candidate search unit 105.
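The threshold used in S16 is described only as being chosen from the statistical distributions of similarities over registered-word and unregistered-word sample utterances. One way this might be done is sketched below, assuming the criterion is simply to minimize the number of misclassified samples. The function name `choose_threshold`, the error-count criterion, and all sample values are hypothetical and are not taken from the specification.

```python
def choose_threshold(registered_diffs, unregistered_diffs):
    """Pick the gap threshold that misclassifies the fewest samples.

    Each value is (reference similarity - first-candidate similarity) for one
    sample utterance. A gap within the threshold means "registered word".
    The error-count criterion is an assumption made for illustration.
    """
    candidates = sorted(set(registered_diffs) | set(unregistered_diffs))
    best_threshold, best_errors = None, None
    for t in candidates:
        # Registered samples whose gap exceeds t would be wrongly rejected.
        false_rejects = sum(1 for d in registered_diffs if d > t)
        # Unregistered samples whose gap is within t would be wrongly accepted.
        false_accepts = sum(1 for d in unregistered_diffs if d <= t)
        errors = false_rejects + false_accepts
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors


# Hypothetical sample gaps, for illustration only.
registered = [90, 120, 150, 184, 210]
unregistered = [260, 400, 650, 794]
threshold, errors = choose_threshold(registered, unregistered)
```

A real system would more likely fit distributions to the two sample sets and weigh the two error types differently; the exhaustive scan above only illustrates the idea of deriving the threshold from labelled samples.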
[0058] When the unregistered word determination unit 104 determines that the user's utterance is an unregistered word, the unregistered word candidate search unit 105 searches for the unregistered word based on that utterance (S22). It compares the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106 and obtains an unregistered word score, a score related to similarity, for each. In this way it finds the unregistered words with high scores, that is, the unregistered words likely to be what the user uttered. The unregistered word candidate search unit 105 then extracts several such candidates, for example in descending order of score (S24), and outputs them with their unregistered word scores to the result display unit 107. The speech recognition device 100 then presents the determination result, the extracted unregistered word candidates, and their unregistered word scores to the user via the result display unit 107 (S26) and ends the processing operation.
[0059] In general, the words to be recognized by the speech recognition device 100, that is, the speech recognition vocabulary, are defined according to the application that uses the device as an input device for a man-machine interface. For example, in an application that searches for programs using speech recognition as the input means, the names of the programs to be searched and the names of performers that serve as search keys are defined as the speech recognition vocabulary.
[0060] Assuming such an application, the speech recognition device 100 presents a different display depending on whether the user's utterance is a word contained in the speech recognition vocabulary storage unit 102.
[0061] If the utterance is a word contained in the speech recognition vocabulary storage unit 102, then, as described above, the speech recognition unit 101 matches the input speech against the standard pattern or word model of each word stored in the speech recognition vocabulary storage unit 102, calculates the similarity for each word, and obtains the top candidates in descending order of similarity. The result is output to the result display unit 107.
[0062] As a concrete example, FIG. 3 shows the case in which the word "松下太郎" (Matsushita Taro) exists in the speech recognition vocabulary storage unit 102 and the user utters "マツシタタロウ" (Matsushita Taro). At this time, the reference similarity calculation unit 103 also searches for the subword sequence closest to the input speech and obtains its similarity as the reference similarity.
[0063] FIG. 4 shows an example of the output of the reference similarity calculation unit 103 for the user's utterance "マツシタタロウ".
[0064] In the example shown in FIGS. 3 and 4, the difference between the first candidate's similarity "2041" and the reference similarity "2225" is smaller than the statistically predetermined threshold (for example, "200"), so the unregistered word determination unit 104 determines that the user's utterance is a registered word. Because the utterance was determined not to be an unregistered word, the unregistered word candidate search unit 105 performs no unregistered word search; the recognition result is output as is to the result display unit 107, which correctly displays "松下太郎" as the recognition result. FIG. 5 shows an example of this result display on the result display unit 107.
[0065] A user who sees a recognition result in the format illustrated in FIG. 5 can tell at a glance that the utterance was a registered word.
[0066] On the other hand, even when the user's utterance is a word that does not exist in the speech recognition vocabulary storage unit 102, the speech recognition unit 101 still matches the input against each word stored in the speech recognition vocabulary storage unit 102, obtains the similarity for each word, and outputs the top candidates in descending order of similarity. In this case, however, the utterance is a word not included in the speech recognition vocabulary, so none of the candidates matches the utterance, and the output looks like the example shown in FIG. 6. Here, as in the case described above, the user's utterance is "マツシタタロウ", but the speech recognition vocabulary storage unit 102 is assumed not to contain this word, that is, the word "松下太郎".
[0067] At this time, the reference similarity calculation unit 103 searches for the subword sequence most similar to the input speech and calculates its similarity, and this process is entirely unaffected by whether the utterance is included in the speech recognition vocabulary. As a result, the output of the reference similarity calculation unit 103, shown in FIG. 7, is the same as the output when the utterance is included in the speech recognition vocabulary (see FIG. 4).
[0068] Next, as described above, the unregistered word determination unit 104 compares the similarity of the first candidate from the speech recognition unit 101 with the reference similarity from the reference similarity calculation unit 103. When the utterance is not included in the speech recognition vocabulary, the two similarities differ greatly and their difference exceeds the predetermined threshold; on this basis, the unregistered word determination unit 104 determines that the utterance is an unregistered word. In the example shown in FIGS. 6 and 7, the first candidate's similarity "1431" from the speech recognition unit 101 and the reference similarity "2225" are far apart, and their difference is larger than the predetermined threshold (for example, "200"), so the unregistered word determination unit 104 determines that the user's utterance is an unregistered word.
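The two worked cases (FIGS. 3 and 4 versus FIGS. 6 and 7) can be checked with a short sketch of the S16 comparison. The function name `is_registered_word` is hypothetical, and treating a gap exactly equal to the threshold as "within" is an assumption; the similarity values and the example threshold of 200 come from the description.

```python
def is_registered_word(top_similarity, reference_similarity, threshold=200):
    """Return True when the similarity gap is within the threshold (S16: Yes)."""
    gap = reference_similarity - top_similarity
    return gap <= threshold


# "松下太郎" is in the vocabulary: the gap is 2225 - 2041 = 184.
registered_case = is_registered_word(2041, 2225)

# "松下太郎" is not in the vocabulary: the gap is 2225 - 1431 = 794.
unregistered_case = is_registered_word(1431, 2225)
```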
[0069] When the unregistered word determination unit 104 determines that the user's utterance is an unregistered word, the unregistered word candidate search unit 105 compares the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106 and calculates an unregistered word score, a score related to similarity, for each. The unregistered word candidate search unit 105 then extracts the top five candidates in descending order of unregistered word score and outputs them with their scores to the result display unit 107.
[0070] FIG. 8 shows an example of the result obtained when, for the user's utterance "マツシタタロウ", the unregistered word candidate search unit 105 searches for unregistered words based on the subword sequence "マツシマカノウ" obtained by the reference similarity calculation unit 103. Here, "松下太郎" is assumed to be stored in the unregistered word storage unit 106.
[0071] The search results from the unregistered word candidate search unit 105 are sent to the result display unit 107 together with the information that these words are unregistered words, and the user is informed that the utterance was recognized as an unregistered word. For the example shown in FIG. 8, the result shown in FIG. 9 is output. A user who sees a recognition result in the format illustrated in FIG. 9 can tell at a glance that the utterance was unknown to the system.
[0072] With this result display method, the content the user uttered appears on the screen, so the user has no need to doubt whether the utterance was recognized correctly and can clearly see that the correctly recognized word was simply not included in the speech recognition vocabulary.
[0073] With this result display method, multiple words are displayed as unregistered word candidates, so the user has to look for the uttered word among them. If the number of candidates output is small, however, this takes little effort. Moreover, the purpose of displaying the unregistered word candidates is to state that no further processing can be performed on the displayed words because they are unregistered, so the user is never required to select the uttered word from among the candidates. The disadvantage of displaying multiple unregistered word candidates is therefore very small.
[0074] From the standpoint of implementing a speech recognition system, not having to narrow the unregistered word candidates down to a single word means that the unregistered word candidate search unit 105 does not need high search accuracy, and the hardware resources required to achieve that accuracy can be kept low, which can be a significant advantage. Moreover, even if the search accuracy is not very high, displaying multiple candidates makes it highly probable that the word the user uttered is among them. This is also of great practical value to the user, who learns that the word is unregistered and that repeating the utterance would be pointless.
[0075] The operation of the unregistered word candidate search unit 105 is described more specifically below.
[0076] In Embodiment 1, the unregistered word candidate search unit 105 uses a value based on the phoneme edit distance as its method of searching for unregistered word candidates.
[0077] With two words each written as a phoneme symbol string, this method counts how many editing steps are needed to rewrite the phoneme symbol string of one word into that of the other.
[0078] FIG. 10 shows an example. It shows the phoneme symbol string "ABCDEF" (sequence 1) and the phoneme symbol string "AXBYDF" (sequence 2), and shows that rewriting sequence 2 into sequence 1 requires one insertion (insertion error), one substitution (substitution error), and one deletion (omission error). In the example of FIG. 10, the edit distance required to rewrite sequence 2 into sequence 1 is therefore 3 (1 insertion + 1 substitution + 1 deletion).
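The count in FIG. 10 can be reproduced with the standard Levenshtein dynamic program. The sketch below is a generic textbook formulation, not code from the specification; it treats each character of the string as one phoneme symbol, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to rewrite `source` into `target` (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j symbols of target.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,             # delete s from source
                curr[j - 1] + 1,         # insert t into source
                prev[j - 1] + (s != t),  # substitute, free when symbols match
            ))
        prev = curr
    return prev[-1]


# Sequence 2 -> sequence 1 from FIG. 10:
# delete X, substitute Y with C, insert E.
distance = edit_distance("AXBYDF", "ABCDEF")
```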
[0079] The unregistered word candidate search unit 105 calculates the edit distance described above between the phoneme symbol string of the subword sequence obtained by the reference similarity calculation unit 103 and the phoneme symbol string of a word stored in the unregistered word storage unit 106, normalizes the distance by length, and subtracts the result from 1 to obtain the unregistered word score. It performs this process for every word stored in the unregistered word storage unit 106, extracts unregistered word candidates in descending order of unregistered word score, and outputs them to the result display unit 107. FIG. 8, referred to earlier, is an example of the unregistered word candidates and unregistered word scores obtained in this way.
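The scoring and ranking in [0079] might be sketched as follows. The description does not say which length is used for normalization; the sketch assumes the longer of the two strings so that the score stays between 0 and 1. The function names, the lexicon entries, and the use of romanized characters as stand-ins for phoneme symbols are all hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def unregistered_word_score(subword_phonemes, word_phonemes):
    """1 minus the length-normalized edit distance (normalizer is assumed)."""
    longest = max(len(subword_phonemes), len(word_phonemes)) or 1
    return 1.0 - edit_distance(subword_phonemes, word_phonemes) / longest


def search_candidates(subword_phonemes, lexicon, top_n=5):
    """Score every stored word and return the top_n in descending score order."""
    scored = [(word, unregistered_word_score(subword_phonemes, phonemes))
              for word, phonemes in lexicon.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical store: word -> romanized stand-in for its phoneme string.
lexicon = {
    "松下太郎": "matsushitatarou",
    "松島花子": "matsushimahanako",
    "山田一郎": "yamadaichirou",
}
results = search_candidates("matsushimakanou", lexicon, top_n=2)
```

Because every stored word is scored with string operations alone, the exhaustive scan stays cheap compared with acoustic matching, which is the trade-off the description relies on.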
[0080] The advantage of implementing the unregistered word search by comparing phoneme sequences is that the exhaustive search over the unregistered word storage unit 106, which stores a very large vocabulary, can be performed with lightweight processing, keeping the computational resources required for the search (computation time, memory, processor load, power consumption, and so on) small. As a result, even on a device whose computational resources tend to be limited, such as a portable information terminal, unregistered word candidates can be found and displayed to the user in a short time, giving the user a responsive experience.
[0081] On the other hand, there is a concern that simplifying the search will lower the search accuracy. As described above, however, multiple unregistered word candidates may be output, so outputting several top candidates raises the probability that the word the user uttered is among them and compensates for the lower accuracy. In addition, because the unregistered word search is executed independently of the speech recognition unit 101, it does not adversely affect the recognition processing of the speech recognition unit 101.
[0082] In Embodiment 1, the reference similarity calculation unit 103 is provided for unregistered word determination, but it is not an essential requirement; other unregistered word determination methods, such as adding a garbage model to the acoustic models, may also be used.
[0083] The speech recognition unit 101, the speech recognition vocabulary storage unit 102, and the reference similarity calculation unit 103 described in Embodiment 1 may also be replaced by an unknown utterance detection device such as that shown in FIG. 11.
[0084] FIG. 11 is a block diagram showing the functional configuration of the unknown utterance detection device.
[0085] The speech segment pattern storage unit 111 stores speech segments of standard speech, which are used for matching against the feature parameters of the input speech.
[0086] Here, the speech segments are a set of VC patterns, each joining the latter half of a vowel section with the first half of the consonant section that follows it, and CV patterns, each joining the latter half of a consonant section with the first half of the vowel section that follows it. The speech segments may instead be a set of phonemes, which correspond roughly to the individual letters when Japanese is written in Roman letters; a set of moras, which correspond roughly to the individual characters when Japanese is written in hiragana; a set of subwords, each a chain of multiple moras; or a mixture of these sets.
[0087] The word dictionary storage unit 112 stores rules for concatenating the speech segments to synthesize the word patterns of the speech recognition vocabulary.
[0088] The word matching unit 113 compares the input speech, expressed as a time series of feature parameters, with the synthesized word patterns and obtains, for each word, a likelihood corresponding to their similarity.
[0089] The transition probability storage unit 114 stores transition probabilities that express, as continuous values, how natural a connection is when speech segments are joined arbitrarily. Here, phoneme 2-gram probabilities are used as the transition probabilities. The phoneme 2-gram probability is the probability P(y | x) that phoneme y follows a preceding phoneme x, and is obtained in advance from a large amount of Japanese text data or the like. The transition probability may instead be a mora 2-gram probability, a subword 2-gram probability, or a 2-gram probability over a mixture of these, and may be a 3-gram probability or the like rather than a 2-gram probability.
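A phoneme 2-gram table of the kind stored in the transition probability storage unit 114 could be estimated by relative frequency, as in the minimal sketch below. The specification does not state the estimation method, so plain counting without smoothing is an assumption; the function name `bigram_probabilities` and the tiny corpus are hypothetical.

```python
from collections import Counter


def bigram_probabilities(phoneme_sequences):
    """Estimate P(y | x) as count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in phoneme_sequences:
        # Walk over adjacent phoneme pairs (x, y) in each sequence.
        for x, y in zip(seq, seq[1:]):
            pair_counts[(x, y)] += 1
            context_counts[x] += 1
    return {(x, y): n / context_counts[x] for (x, y), n in pair_counts.items()}


# Hypothetical phoneme transcriptions standing in for a large text corpus.
corpus = [
    ["m", "a", "ts", "u"],
    ["m", "a", "k", "a"],
]
probs = bigram_probabilities(corpus)
```

Pairs that never occur in the corpus are simply absent from the table; a practical system would normally smooth the estimates so that unseen transitions do not receive zero probability.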
[0090] The speech sequence matching unit 115 calculates, taking the transition probabilities into account, the likelihood between the input speech, expressed as a time series of feature parameters, and the patterns formed by arbitrarily joining the speech segment patterns, and takes the maximum likelihood obtained as the reference likelihood.
[0091] The candidate score difference calculation unit 116 takes, from the per-word likelihoods calculated by the word matching unit 113, the word with the highest value (the first candidate) and the word with the next highest value (the second candidate), and calculates the difference between their likelihoods, normalized by word length.
[0092] The candidate phoneme sequence similarity calculation unit 117 calculates the distance between the phoneme sequence of the first candidate and the phoneme sequence of the second candidate in order to obtain the acoustic similarity between the two candidates.
[0093] The candidate/speech sequence score difference calculation unit 118 calculates the difference between the likelihood of the first candidate and the reference likelihood calculated by the speech sequence matching unit 115, normalized by word length.
[0094] The candidate/speech sequence phoneme sequence similarity calculation unit 119 calculates the acoustic similarity between the first candidate and the sequence selected as optimal by the speech sequence matching unit 115, as the distance between their phoneme sequences.
[0095] When such an unknown utterance detection device is used, the unregistered word determination unit 104 combines the values obtained by the candidate score difference calculation unit 116, the candidate phoneme sequence similarity calculation unit 117, the candidate/speech sequence score difference calculation unit 118, and the candidate/speech sequence phoneme sequence similarity calculation unit 119 to determine whether the input speech is an unregistered word. Statistically combining multiple measures for unregistered word detection in this way improves the accuracy of the determination. Four measures are given here for use in the unregistered word determination unit 104, but other measures may be used with them, such as the likelihood of each word candidate itself, the distribution of those likelihoods, the variation of the local score within a word section, and duration information for the phonemes that make up a word.
[0096] In this case, a linear discriminant obtained in advance from a large number of recognition result examples is used as the method of determining an unregistered word from the multiple measures. Besides this, so-called learning machines such as neural networks, decision trees, and SVMs (support vector machines) are also effective.
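The combination of measures by a linear discriminant can be sketched as follows. This is only an illustration: the function names, the sign convention of the score difference, and the weights and bias are assumptions, since the text says only that the discriminant is obtained in advance from recognition result examples.

```python
def normalized_score_difference(first_likelihood, reference_likelihood, word_length):
    """Difference between the reference likelihood and the first candidate's
    likelihood, normalized by word length (sign convention is an assumption)."""
    return (reference_likelihood - first_likelihood) / word_length


def is_unregistered(measures, weights, bias):
    """Linear discriminant: weighted sum of the measures plus a bias.

    A positive value is taken to mean "unregistered word" (an assumption).
    """
    value = sum(w * x for w, x in zip(weights, measures)) + bias
    return value > 0.0


# Hypothetical weights and bias; in practice they would be trained on
# a large number of recognition results, as the text describes.
WEIGHTS = (1.0, 1.0, 1.0, 1.0)
BIAS = -4.0

# Four measures in the order of units 116, 117, 118, 119 (illustrative values).
print(is_unregistered((2.0, 1.0, 2.0, 1.0), WEIGHTS, BIAS))
```

Replacing `is_unregistered` with a neural network, decision tree, or SVM would leave the surrounding flow unchanged, since each takes the same vector of measures and returns a yes/no decision.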
[0097] For the unregistered word candidate search unit 105, an unregistered word search method based on the edit distance between phoneme sequences has been described. As the definition of the edit distance between phonemes, instead of assigning an edit distance of 1 to every insertion, deletion, and substitution error, it is also effective to use as the distance a continuous value based on the experimentally obtained occurrence probabilities of those errors.
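A minimal sketch of such a weighted edit distance is shown below. The cost table and its key layout are hypothetical; the text does not state how error probabilities are converted into distances, so the costs here are simply supplied by the caller, with 1 as the default.

```python
def weighted_edit_distance(src, dst, costs=None, default=1.0):
    """Edit distance between two phoneme sequences with per-error costs.

    costs maps ("sub", a, b), ("ins", b) or ("del", a) to a continuous cost;
    any error not listed costs `default`. The key layout is an assumption.
    """
    costs = costs or {}

    def cost(*key):
        return costs.get(key, default)

    n, m = len(src), len(dst)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost("del", src[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost("ins", dst[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = src[i - 1], dst[j - 1]
            substitution = 0.0 if a == b else cost("sub", a, b)
            d[i][j] = min(
                d[i - 1][j] + cost("del", a),      # phoneme dropped
                d[i][j - 1] + cost("ins", b),      # phoneme inserted
                d[i - 1][j - 1] + substitution,    # match or substitution
            )
    return d[n][m]


# Confusing /m/ with /n/ is made cheaper than other substitutions (illustrative).
print(weighted_edit_distance(list("matsu"), list("natsu"), {("sub", "m", "n"): 0.25}))
```

With an empty cost table the function reduces to the ordinary edit distance in which every error counts as 1.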
[0098] Furthermore, it is also possible to store data in the unregistered word storage unit 106 in the same format as in the speech recognition vocabulary storage unit 102, and to have the unregistered word candidate search unit 105 match words directly against the parameters of the input speech, as the speech recognition unit 101 does, and output unregistered word candidates and their unregistered word scores. With this configuration, the resources required for the unregistered word search increase, but the search accuracy for unregistered words improves. Even in this case, the feature of the present invention, namely that the recognition rate for the target words is not lowered, is maintained.
[0099] The description so far has assumed that the words in the unregistered word storage unit 106 and the words in the speech recognition vocabulary storage unit 102 do not overlap. However, the unregistered word storage unit 106 may also store words that are included in the speech recognition vocabulary storage unit 102. In that case, when the unregistered word candidate search unit 105 retrieves a word that is included in the speech recognition vocabulary storage unit 102, that word is excluded before the results are output to the result display unit 107. This makes it possible to fix the vocabulary of the unregistered word storage unit 106 regardless of the contents of the speech recognition vocabulary storage unit 102, which makes the unregistered word storage unit 106 easier to maintain.
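The exclusion step can be sketched as a simple filter applied to the search results. The function name and the (word, score) data layout are assumptions made for illustration.

```python
def exclude_registered(candidates, recognition_vocabulary):
    """Drop retrieved candidates that are already in the recognition vocabulary.

    candidates: list of (word, unregistered_word_score) pairs, best first.
    recognition_vocabulary: iterable of words held by the recognizer.
    """
    vocabulary = set(recognition_vocabulary)
    return [(word, score) for word, score in candidates if word not in vocabulary]


retrieved = [("weather", 0.9), ("whether", 0.8), ("feather", 0.6)]
print(exclude_registered(retrieved, ["weather", "news"]))
```

Because the filter runs at search time, the unregistered word store can be built without knowing which words the recognizer currently holds.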
[0100] In Embodiment 1, the input utterance has been described as a word utterance, but it may also be a sentence utterance. In that case, the unregistered word determination unit 104 needs additional processing to determine whether the sentence utterance contains an unregistered word and, if so, at which position; the other operations are exactly the same.
[0101] In Embodiment 1, the unregistered word candidate search unit 105 has been described as outputting five unregistered word candidates. It is effective to change this number according to the unregistered word search accuracy of the unregistered word candidate search unit 105, and the unregistered word candidate search unit 105 may also vary the number of candidates it outputs according to the similarity of each unregistered word. Depending on the search accuracy of the unregistered word candidate search unit 105 or the unregistered word scores of the retrieved words, therefore, only one unregistered word candidate may be output. With this configuration, the user is not burdened unnecessarily when asked to judge whether the spoken word is in the candidate list.
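One way to vary the number of output candidates is sketched below. The rule used here (keep candidates whose score lies within a margin of the best score, up to a maximum) and the assumption that a higher score means a closer match are illustrative choices, not taken from the text.

```python
def select_candidates(scored, max_candidates=5, margin=0.1):
    """Keep at most max_candidates whose score is within `margin` of the best.

    scored: list of (word, score) pairs; a higher score is assumed to mean
    a closer match. The margin rule is a hypothetical selection criterion.
    """
    if not scored:
        return []
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best = ranked[0][1]
    kept = [word for word, score in ranked if best - score <= margin]
    return kept[:max_candidates]


print(select_candidates([("a", 0.9), ("b", 0.5), ("c", 0.85)]))
```

When one candidate clearly dominates, the list collapses to that single word, which matches the case described above in which only one candidate is output.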
[0102] The output example in FIG. 9 shows all unregistered word candidates displayed in the same way. However, the result display unit 107 can also emphasize the candidates that are more likely to be what the user said, by changing the font size, using bold type, or changing the color according to the unregistered word score of each candidate. This reduces the user's effort in finding the spoken word in the list.
[0103] (Embodiment 2)
Next, a speech recognition device according to Embodiment 2 of the present invention will be described.
[0104] FIG. 12 is a block diagram showing the functional configuration of the speech recognition device according to Embodiment 2.
[0105] As shown in FIG. 12, the speech recognition device 200 has in common with the speech recognition device 100 of Embodiment 1 that it includes the speech recognition unit 101, the speech recognition vocabulary storage unit 102, the reference similarity calculation unit 103, the unregistered word determination unit 104, the unregistered word candidate search unit 105, and the result display unit 107. It differs from the speech recognition device 100 of Embodiment 1 in that it includes an unregistered word class determination unit 201 and an unregistered word class-specific word storage unit 202. The description below focuses on these differences. Parts in common with Embodiment 1 are given the same reference numerals, and their description is omitted.
[0106] The unregistered word class determination unit 201 is a processing unit that, when the spoken word is an unregistered word, determines which category the unregistered word belongs to from the content of the user's utterance and the usage status of the system.
[0107] The unregistered word class-specific word storage unit 202 is a storage device, such as a hard disk, that stores unregistered words classified by category.
[0108] Next, the operation of the speech recognition device 200 according to Embodiment 2 will be described.
[0109] In Embodiment 2 as well, when the user's utterance is a word included in the speech recognition vocabulary storage unit 102, the operation is the same as that described in Embodiment 1.
[0110] When the user's utterance is an unregistered word, the unregistered word determination unit 104 makes the unregistered word determination based on the reference similarity from the reference similarity calculation unit 103. At the same time, the unregistered word class determination unit 201 determines which category the unregistered word belongs to. Here, the categories of unregistered words are, as shown in FIG. 13 for example, proper personal names such as celebrity names, proper title names such as program titles, and proper place names such as famous places and tourist spots. The method by which the unregistered word class determination unit 201 determines the unregistered word category is described later.
[0111] Once it has been estimated that the word spoken by the user is an unregistered word and which category it belongs to, the unregistered word candidate search unit 105 searches for the unregistered word. In doing so, the unregistered word candidate search unit 105 narrows the search range within the unregistered word class-specific word storage unit 202 based on the class determination result from the unregistered word class determination unit 201. Having obtained the unregistered word candidates in this way, the speech recognition device 200 presents them to the user via the result display unit 107, as in Embodiment 1.
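The class-restricted search can be sketched as follows. The store layout (a dictionary from category to word list) and the category names are hypothetical, and `difflib.SequenceMatcher` on romanized strings is used only as a stand-in for the phoneme-sequence matching described in Embodiment 1.

```python
from difflib import SequenceMatcher


def search_in_class(query, category, class_store, top_n=5):
    """Return up to top_n words of the given category, most similar first.

    Only the word list of the determined category is scanned; words of
    other categories are never considered.
    """
    words = class_store.get(category, [])
    scored = [(SequenceMatcher(None, query, word).ratio(), word) for word in words]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:top_n]]


# Hypothetical contents of the class-specific word storage.
STORE = {
    "person": ["matsushita taro", "yamada hanako"],
    "program": ["taiyo wo ute", "matsuri tokushu"],
}
print(search_in_class("matsushita taro", "person", STORE, top_n=1))
```

Restricting the scan to one category both shortens the search and guarantees that no candidate from an unintended category reaches the user.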
[0112] The operation of the unregistered word class determination unit 201 is now described in detail.
[0113] When the user's utterance is a sentence utterance, the unregistered word category can be determined from the information before and after the unregistered word in the recognized sentence. For example, if the user says "I want to watch a program in which 〇〇 appears", 〇〇 is regarded as an unregistered word of the proper personal name class; for the utterance "Record tomorrow's △△", △△ is regarded as an unregistered word of the program title class. As a model for estimating the class of the word at the target position from the context before and after it, a class N-gram language model that includes unregistered word classes can be used. FIG. 14 shows the functional configuration of the unregistered word class determination unit when such a class N-gram language model is used.
[0114] As shown in FIG. 14, the unregistered word class determination unit 201a for the case where a class N-gram language model is used includes a word string hypothesis generation unit 211, a class N-gram generation and storage unit 221, and a class-dependent word N-gram generation and storage unit 231.
[0115] The word string hypothesis generation unit 211 refers to the class N-grams, which evaluate sequences of words and unregistered word classes, and to the class-dependent word N-grams, which evaluate the word strings that make up an unregistered word class, to generate word string hypotheses from the word matching results and obtain the recognition result.
[0116] The class N-gram generation and storage unit 221 generates class N-grams for assigning a language likelihood, which is the logarithm of a linguistic probability, to a context containing an unregistered word class, and stores the generated class N-grams.
[0117] The class-dependent word N-gram generation and storage unit 231 generates class-dependent word N-grams for assigning a language likelihood, which is the logarithm of a linguistic probability, to a word sequence within an unregistered word class, and stores the generated class-dependent word N-grams.
[0118] FIG. 15 shows the functional configuration of the class N-gram generation and storage unit 221.
[0119] As shown in FIG. 15, the class N-gram generation and storage unit 221 consists of a sentence expression corpus storage unit 222, in which a large number of sentence expressions to be recognized are stored in advance as text; a sentence expression morphological analysis unit 223, which performs morphological analysis on the sentence expressions; a class N-gram generation unit 224, which generates class N-grams by obtaining statistics on chains of words and unregistered word classes from the morphological analysis results with reference to the word string class definitions; and a class N-gram storage unit 225, which stores the class N-grams and outputs them to the word string hypothesis generation unit 211.
[0120] The sentence expression corpus storage unit 222 stores in advance a large data library of sentence expressions to be recognized.
[0121] The sentence expression morphological analysis unit 223 analyzes the text stored in the sentence expression corpus storage unit 222, which consists of relatively long sentence expressions such as "Record tomorrow's weather forecast", into morphemes, the smallest meaningful linguistic units.
[0122] The class N-gram generation unit 224 extracts the word strings contained in the morphologically analyzed text and refers to the unregistered word classes input from the class-dependent word N-gram generation and storage unit 231, described later. Where a matching unregistered word class exists, it replaces the unregistered word class contained in the text with a virtual word. By then obtaining statistics on chains of words or unregistered word classes, it generates class N-grams that associate each chain of words or unregistered word classes with its probability. The class N-grams generated by the class N-gram generation unit 224 are stored in the class N-gram storage unit 225.
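The replacement of class members by a virtual word, followed by chain counting, can be sketched as below. For brevity the sketch uses bigrams (n = 2) and already tokenized English text; the class label `<title>`, the class definitions, and the toy corpus are all hypothetical, and no real morphological analyzer is involved.

```python
from collections import Counter


def replace_classes(tokens, class_defs):
    """Replace each token span that matches a class definition with the
    class label, so the class is treated as a single virtual word.
    Longer definitions are tried first at each position."""
    out, i = [], 0
    max_len = max((len(key) for key in class_defs), default=0)
    while i < len(tokens):
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            label = class_defs.get(tuple(tokens[i:i + length]))
            if label is not None:
                out.append(label)
                i += length
                break
        else:  # no class definition matched at position i
            out.append(tokens[i])
            i += 1
    return out


def bigram_probabilities(sentences):
    """Relative-frequency estimate of P(current | previous) from chain counts."""
    pair_counts, history_counts = Counter(), Counter()
    for sentence in sentences:
        for previous, current in zip(sentence, sentence[1:]):
            pair_counts[(previous, current)] += 1
            history_counts[previous] += 1
    return {pair: count / history_counts[pair[0]] for pair, count in pair_counts.items()}


CLASS_DEFS = {("weather", "forecast"): "<title>", ("sunset", "drama"): "<title>"}
CORPUS = [
    ["record", "weather", "forecast"],
    ["record", "sunset", "drama"],
    ["record", "it"],
]
replaced = [replace_classes(sentence, CLASS_DEFS) for sentence in CORPUS]
print(bigram_probabilities(replaced))
```

Because both program titles collapse to the same virtual word, their counts are pooled, which is what lets the model assign a probability to a title it has never seen.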
[0123] Measuring the frequency of each word chain in this way makes it possible to calculate conditional probabilities, and because an unregistered word class can be treated virtually as a single word, the result is a language model in which a conditional probability is attached to each word.
[0124] Next, FIG. 16 shows the functional configuration of the class-dependent word N-gram generation and storage unit 231.
[0125] As shown in FIG. 16, the class-dependent word N-gram generation and storage unit 231 consists of a class corpus storage unit 232, a class morphological analysis unit 233, a class-dependent word N-gram generation unit 234, a class-dependent word N-gram storage unit 235, an unregistered word class definition generation unit 236, and an unregistered word class definition storage unit 237.
[0126] The class corpus storage unit 232 stores in advance a data library of unregistered words that share the same semantic and syntactic properties (for example, television program titles or personal names).
[0127] The class morphological analysis unit 233 performs morphological analysis on the class corpus. Specifically, the class morphological analysis unit 233 analyzes, into morpheme units, the relatively short unregistered words with common properties that are stored in the class corpus storage unit 232, such as television program names like "MMM Weather Forecast".
[0128] The class-dependent word N-gram generation unit 234 processes the morphological analysis results, obtains statistics on word chains, and generates class-dependent word N-grams, which are information associating word strings with their probabilities.
[0129] The class-dependent word N-gram storage unit 235 stores the class-dependent word N-grams generated by the class-dependent word N-gram generation unit 234. The class-dependent word N-grams stored in the class-dependent word N-gram storage unit 235 are referred to by the word string hypothesis generation unit 211 during speech recognition.
[0130] The unregistered word class definition generation unit 236 generates, from the morphological analysis results of the class corpus, definitions of unregistered word classes, in which unregistered words with common properties are defined as a class. That is, it morphologically analyzes unregistered words with common properties and generates a class definition in which the resulting word strings are the word strings of the unregistered word class.
[0131] The unregistered word class definition storage unit 237 stores the unregistered word class definitions generated by the unregistered word class definition generation unit 236. These unregistered word class definitions are referred to by the class N-gram generation unit 224 of the class N-gram generation and storage unit 221 when the class N-grams are generated.
[0132] In the class N-gram language model used by the unregistered word class determination unit 201a configured as above, the probability of occurrence of a word sequence $W_1 \ldots W_I$ consisting of $I$ words is generally formulated using the probabilities of n-chains of words, as in Equation 1 below.
[Equation 1]
\[ P(W_1, W_2, \ldots, W_I) = \prod_{j=1}^{I} \left\{ P(C_j \mid C_{j-n+1}, \ldots, C_{j-1}) \cdot P(W_j \mid C_j) \right\} \]
Here, $W_1, W_2, \ldots, W_I$ represent the individual words, and $C_1, C_2, \ldots, C_I$ represent the classes to which the corresponding words belong.
[0133] Accordingly, $P(C_j \mid C_{j-n+1}, \ldots, C_{j-1})$ is the probability that the n-chain of word classes occurs, and $P(W_j \mid C_j)$ is the probability that the specific word $W_j$ arises from the class $C_j$. Here, a class is a grouping that takes word connectivity into account, such as a word's part of speech or a finer subdivision of it.
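Equation 1 can be evaluated as in the following sketch, which sums logarithms instead of multiplying probabilities. The table layouts, the function name, and the handling of the sentence start (the class history is simply truncated) are assumptions made for illustration.

```python
import math


def class_ngram_log_prob(words, classes, class_ngram, word_given_class, n=3):
    """Log of Equation 1: sum over j of
    log P(C_j | C_{j-n+1}, ..., C_{j-1}) + log P(W_j | C_j).

    class_ngram maps (history_tuple, class) to a probability;
    word_given_class maps (word, class) to a probability.
    At the start of the sequence the history is shorter than n - 1 classes.
    """
    total = 0.0
    for j, (word, cls) in enumerate(zip(words, classes)):
        history = tuple(classes[max(0, j - n + 1):j])
        total += math.log(class_ngram[(history, cls)])
        total += math.log(word_given_class[(word, cls)])
    return total


# Toy tables with n = 2 (all values hypothetical).
CLASS_NGRAM = {((), "A"): 0.5, (("A",), "B"): 0.5}
WORD_GIVEN_CLASS = {("x", "A"): 0.5, ("y", "B"): 0.5}
log_p = class_ngram_log_prob(["x", "y"], ["A", "B"], CLASS_NGRAM, WORD_GIVEN_CLASS, n=2)
print(math.exp(log_p))
```

Working in the log domain corresponds to the language likelihood mentioned earlier, which is defined as the logarithm of the linguistic probability.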
[0134] When such a general class N-gram language model is used, $W_j$ is an unregistered word, so $P(W_j \mid C_j)$ cannot be obtained in advance for the unregistered word class. One model that does have $P(W_j \mid C_j)$ for unregistered words is the method whose functional configuration is shown in FIGS. 14 to 16 above, which models the unregistered word $W_j$ as a chain of smaller, more basic words (see Japanese Patent Application No. 2003-276844, "Continuous Speech Recognition Device and Continuous Speech Recognition Method").
[0135] When such a model is used to determine the category of an unregistered word, a class is defined for each unregistered word category to be determined, such as an "unregistered word proper personal name class" and an "unregistered word proper program class", and the language model is trained.
[0136] FIG. 17 shows an example of a language model with n = 3 obtained by such training. Using the language model in this example, when 〇〇 is taken to be a word of the proper program name class, the probability of occurrence of "record 〇〇" is
[Equation 2]
\[ P(\langle\text{action "record"}\rangle \mid \langle\text{unregistered word proper program name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{record} \mid \langle\text{action "record"}\rangle) = 0.8 \times 0.35 = 0.28. \]
[0137] In contrast, when 〇〇 is taken to be of the proper personal name class, the probability is
[Equation 3]
\[ P(\langle\text{action "record"}\rangle \mid \langle\text{unregistered word proper personal name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{record} \mid \langle\text{action "record"}\rangle) = 0.2 \times 0.35 = 0.07. \]
[0138] That is, in this case the occurrence probability is higher when 〇〇 is taken to be of the proper program name class, so 〇〇 can be determined to belong to the proper program name class.
[0139] In exactly the same way, when △△ is taken to be of the proper program name class, the probability of occurrence of "△△ appears" is
[Equation 4]
\[ P(\langle\text{action "appear"}\rangle \mid \langle\text{unregistered word proper program name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{appear} \mid \langle\text{action "appear"}\rangle) = 0.1 \times 0.35 = 0.035. \]
[0140] In contrast, when △△ is taken to be of the proper personal name class, the probability is
[Equation 5]
\[ P(\langle\text{action "appear"}\rangle \mid \langle\text{unregistered word proper personal name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{appear} \mid \langle\text{action "appear"}\rangle) = 0.7 \times 0.35 = 0.245. \]
[0141] That is, the occurrence probability is higher when △△ is taken to be of the proper personal name class, so △△ can be determined to belong to the proper personal name class.
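The comparison carried out in Equations 2 to 5 can be reproduced with the short sketch below. The probabilities are the ones quoted in the equations; the table layout, the class label strings, and the function names are hypothetical.

```python
PROGRAM = "<unregistered word proper program name>"
PERSON = "<unregistered word proper personal name>"
PARTICLE = "<case particle>"

# P(action class | unregistered word class, case particle), values from Equations 2-5.
CLASS_TRIGRAM = {
    (PROGRAM, PARTICLE, '<action "record">'): 0.8,
    (PERSON, PARTICLE, '<action "record">'): 0.2,
    (PROGRAM, PARTICLE, '<action "appear">'): 0.1,
    (PERSON, PARTICLE, '<action "appear">'): 0.7,
}
# P(word | action class), values from Equations 2-5.
WORD_GIVEN_CLASS = {
    ("record", '<action "record">'): 0.35,
    ("appear", '<action "appear">'): 0.35,
}


def occurrence_probability(candidate_class, action_word, action_class):
    """Probability of the context given that the unregistered word
    belongs to candidate_class."""
    return (CLASS_TRIGRAM[(candidate_class, PARTICLE, action_class)]
            * WORD_GIVEN_CLASS[(action_word, action_class)])


def determine_class(action_word, action_class, candidates=(PROGRAM, PERSON)):
    """Pick the candidate class with the highest occurrence probability."""
    return max(candidates,
               key=lambda c: occurrence_probability(c, action_word, action_class))


print(determine_class("record", '<action "record">'))
print(determine_class("appear", '<action "appear">'))
```

Since $P(W_j \mid C_j)$ for the action word is the same under both hypotheses, the decision here is driven entirely by the class trigram term.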
[0142] Thus, with the speech recognition device 200 according to Embodiment 2, when, for example, the user says "the program with 'Taro Matsushita' in it today" and "Taro Matsushita" is an unregistered word, "Taro Matsushita" is estimated to be an unregistered word belonging to the proper personal name class, and unregistered word candidates are searched for in the proper personal name class portion of the unregistered word class-specific word storage unit 202. The unregistered word candidates obtained as a result of the search are then presented to the user via the result display unit 107, and a response like the one already shown in FIG. 9 is given.
[0143] In contrast, when the user says "Record tomorrow's 'Shoot the Sun'" and "Shoot the Sun" is an unregistered word, "Shoot the Sun" is estimated to be an unregistered word belonging to the proper program name class, and unregistered word candidates are searched for in the proper program name class portion of the unregistered word class-specific word storage unit 202. The unregistered word candidates obtained as a result of the search, for example word candidates of the unregistered word proper program class as shown in FIG. 18, are then presented to the user via the result display unit 107.
[0144] Narrowing the search range for unregistered words according to their category in this way has two effects. First, it prevents the device from confusing the user by presenting, as unregistered word candidates, words from a category the user never intended, such as presenting program names when the user is searching for a person. Second, narrowing the search range improves the search accuracy for unregistered words.
[0145] Embodiment 2 has shown a determination method using a class N-gram language model as an example of how the unregistered word class determination unit determines the unregistered word category. Besides this, when a speech recognition system equipped with this unregistered word presentation means is used as the input interface of a spoken dialog system, a method that uses the context information of the dialog is also possible. In this method, the dialog management unit of the spoken dialog system generates, from the dialog history information, estimation information about the word categories the user is likely to utter, and passes it to the unregistered word class determination unit. The unregistered word class determination unit determines the category of the unregistered word from this estimation information. FIG. 19 shows a block diagram of the unregistered word class determination unit in this configuration.
[0146] In this case, the unregistered word class determination unit 201b includes a word category information receiving unit 241, which obtains the category of the spoken word from an external application, and an unregistered word class decision unit 242, which decides the category of the word determined to be an unregistered word based on the category obtained by the word category information receiving unit 241. The effect of this configuration is that, whereas determining the unregistered word category with the class N-gram language model requires the expected input utterance to be a sentence utterance, using an estimation result from an application such as the dialog management unit allows the category to be determined even when the input speech is a word utterance.
[0147] (Embodiment 3)
Next, a speech recognition device according to Embodiment 3 of the present invention will be described.
[0148] FIG. 20 is a block diagram showing the functional configuration of the speech recognition device according to Embodiment 3.
[0149] As shown in FIG. 20, the speech recognition device 300 has in common with the speech recognition device 100 and the like of Embodiment 1 and the like that it includes the speech recognition unit 101, the speech recognition vocabulary storage unit 102, the reference similarity calculation unit 103, the unregistered word determination unit 104, the unregistered word candidate search unit 105, and the result display unit 107. It differs from the speech recognition device 100 and the like in that it includes an unregistered word storage unit 301 connected to an unregistered word server 303 via a network 302. The description below focuses on these differences. Parts in common with Embodiment 1 and the like are given the same reference numerals, and their description is omitted.
[0150] The unregistered word storage unit 301 stores the many unregistered words that are the target of the unregistered word search by the unregistered word candidate search unit 105, and also has a function of updating its stored information through communication means.
[0151] The network 302 is a communication network such as the Internet or a telephone line.
[0152] The unregistered word server 303 is a server device that stores the latest required unregistered words and provides this information in response to requests from clients (here, the speech recognition device 300).
[0153] Next, the operation of the speech recognition device 300 configured as described above will be described.
[0154] In Embodiment 3, the flow from the user's utterance to the output of the speech recognition device is the same as that shown in Embodiment 1. The difference in Embodiment 3 lies in how the unregistered word storage unit 301, which the unregistered word candidate search unit 105 refers to, is maintained.
[0155] In Embodiment 3, the unregistered word storage unit 301 can be updated at any time. If words that change and multiply from day to day, such as personal names and program names, were held in a fixed list, the word spoken by the user might not be found when unregistered word candidates are searched. For example, at times of program reorganization in television broadcasting or at the start of a new season in professional sports, new titles appear among the programs being broadcast, and the names of new entertainers and new athletes begin to appear; these become unregistered words.
[0156] For this reason, the words stored in the unregistered word storage unit 301 are made updatable. Storing these new unregistered words in the unregistered word storage unit 301 avoids the situation in which the word spoken by the user cannot be found during the unregistered word candidate search.
[0157] Specifically, the words stored in the unregistered word storage unit 301 are updated as follows.
[0158] A day on which unregistered words not yet registered in the unregistered word storage unit 301 are expected to increase sharply is determined in advance, and when that day arrives an unregistered word update request is automatically sent to the unregistered word server 303 via the network 302, such as a telephone line or the Internet. Alternatively, in addition to the unregistered word storage unit 301 updating according to a predetermined schedule, an update request may be sent from the unregistered word storage unit 301 to the unregistered word server 303 when a user who feels that too few unregistered words are registered requests an update. Furthermore, in addition to the unregistered word storage unit 301 actively sending update requests to the unregistered word server 303, the unregistered word server 303 may send update information to the unregistered word storage unit 301 of each client when it detects that a certain amount of unregistered words has been added. The unregistered word server 303, having received an update request or having determined that an update is necessary because the new unregistered words have reached the specified amount, returns information about the added words to the client's unregistered word storage unit 301.
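The three update triggers in paragraph [0158] (a scheduled day, a user request, and a server-side push once enough words accumulate) can be sketched as follows. This is a minimal illustration only: the class names, the `cursor` bookkeeping, and the `push_threshold` value are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class UnregisteredWordServer:
    """Hypothetical stand-in for the unregistered word server 303."""
    words: list = field(default_factory=list)  # append-only log of added words
    push_threshold: int = 3                    # assumed "specified amount"

    def words_since(self, cursor):
        # Return the words added after the client's cursor and the new cursor.
        return self.words[cursor:], len(self.words)

    def should_push(self, cursor):
        # Server-initiated update: enough new words have accumulated.
        return len(self.words) - cursor >= self.push_threshold


@dataclass
class UnregisteredWordStore:
    """Hypothetical stand-in for the unregistered word storage unit 301."""
    scheduled_days: set = field(default_factory=set)
    words: set = field(default_factory=set)
    cursor: int = 0

    def needs_update(self, today: date, user_requested: bool = False):
        # Client-initiated update: a scheduled day or an explicit user request.
        return user_requested or today in self.scheduled_days

    def update_from(self, server: UnregisteredWordServer):
        # Fetch only the words added since the last update and store them.
        added, self.cursor = server.words_since(self.cursor)
        self.words.update(added)
        return added
```

Tracking a cursor per client lets the server return only the added words, which matches the statement that the server replies with "information about the added words" rather than the full list.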
[0159] In this way, the unregistered words need to be maintained correctly only on the unregistered word server 303, and each client, provided it has a communication means for accessing the unregistered word server 303, can always keep its unregistered word storage unit 301 in an optimal state.
[0160] Moreover, because new unregistered words are supplied from an external server in this way, the unregistered word storage unit 301 can be kept in an optimal state without requiring the user to register unregistered words such as personal names and titles, which increase daily.
[0161] In Embodiment 3, the words stored in the unregistered word storage unit 301 are updated from the unregistered word server 303, which is optimally maintained and dedicated to word updating. However, the words can also be updated using information held by servers that are not specialized for this update task.
[0162] For example, in television broadcasting, an electronic program guide called an EPG (Electronic Program Guide) is transmitted along with the broadcast signal. Performer names and program titles recorded in it can be extracted automatically and stored in the unregistered word storage unit 301. Similarly, on the Web there are many sites carrying information about entertainers or about programs. The necessary information can be collected by crawling these sites in turn and stored in the unregistered word storage unit 301. Furthermore, genres that the user is unlikely to utter, for example professional baseball player names, foreign film actor names, or Japanese film titles, can be identified in advance from the user's past history of unregistered word references, and unregistered words of those genres can be excluded from what is acquired from the unregistered word server 303. This prevents the unregistered word storage unit 301 from growing needlessly large.
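The genre-based pruning at the end of paragraph [0162] might look like the sketch below. The specification does not say how "unlikely" genres are identified from the reference history; the reference-count threshold used here (`min_refs`) and both function names are assumptions made for illustration.

```python
from collections import Counter


def unlikely_genres(reference_history, all_genres, min_refs=1):
    """Return genres the user referenced fewer than min_refs times.

    reference_history: genre labels of unregistered words the user has
    looked up in the past (one label per lookup).
    """
    counts = Counter(reference_history)
    return {genre for genre in all_genres if counts[genre] < min_refs}


def filter_update(entries, excluded_genres):
    """Drop (word, genre) entries whose genre is excluded from acquisition."""
    return [(word, genre) for word, genre in entries
            if genre not in excluded_genres]
```

Filtering before storage, rather than after, keeps the client-side store small, which is the effect the paragraph describes.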
[0163] A variation in which the words stored in the speech recognition vocabulary storage unit 102 are updated is also conceivable. As a specific example, a server not shown in the figures may select words that the user is considered likely to utter in the near future and update the contents of the speech recognition vocabulary storage unit 102 for the selected words. For example, when the speech recognition device 300 is applied to a recording reservation system, words related to programs scheduled to be broadcast within one week, taken from the performer names and program titles recorded in the EPG described above, are well suited as such words. The server then generates the information that the speech recognition unit 101 uses to recognize the extracted words and updates the contents of the speech recognition vocabulary storage unit 102 with the generated information.
[0164] Such an update can be performed in exactly the same way as the operation of updating the contents of the unregistered word storage unit 301 from the unregistered word server 303 via the network 302. Preferably, each day, the recognition information for words related to programs whose scheduled broadcast has passed is deleted, and the recognition information for words related to programs scheduled for broadcast one week ahead is added.
[0165] With this configuration, when the usage frequency of registered words shows a time variation that is known in advance, recognition information (speech recognition vocabulary) supplied from outside can follow that variation, so that only a relatively small amount of recognition information with a high expected usage frequency needs to be held in the speech recognition vocabulary storage unit 102. This makes it easier to shorten recognition time and obtain a good recognition rate.
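The daily rolling update described in paragraphs [0164] and [0165] can be sketched as a one-week sliding window over EPG entries. The function name, the dictionary layout (word mapped to its latest scheduled broadcast date), and the inclusive seven-day horizon are assumptions for this example; the specification only states that past entries are removed and entries one week ahead are added.

```python
from datetime import date, timedelta


def refresh_vocabulary(vocabulary, epg_entries, today, horizon_days=7):
    """Return the recognition vocabulary after one daily refresh.

    vocabulary: dict mapping word -> latest scheduled broadcast date.
    epg_entries: iterable of (word, broadcast_date) pairs from the EPG.
    """
    horizon = today + timedelta(days=horizon_days)
    # Delete words whose last scheduled broadcast is already in the past.
    kept = {word: day for word, day in vocabulary.items() if day >= today}
    # Add or extend words tied to programs inside the one-week window.
    for word, air_date in epg_entries:
        if today <= air_date <= horizon:
            if word not in kept or air_date > kept[word]:
                kept[word] = air_date
    return kept
```

Because the window is bounded, the vocabulary size stays roughly proportional to one week of programming, which is what keeps recognition time short in the configuration described above.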
[0166] (Embodiment 4)
Next, a speech recognition device according to Embodiment 4 of the present invention will be described.
[0167] FIG. 21 is a block diagram showing the functional configuration of the speech recognition device according to Embodiment 4 of the present invention.
[0168] As shown in FIG. 21, the speech recognition device 400 is the same as the speech recognition device 100 and the like of Embodiments 1 to 3 in that it includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, and a result display unit 107. It differs, however, in that it includes an unregistered word search request transmission/reception unit 401 that is connected to an unregistered word search server 403 via a network 402. The following description focuses on this difference. Components shared with Embodiment 1 and the like carry the same reference numerals, and their description is omitted.
[0169] The unregistered word search request transmission/reception unit 401 is a processing unit that sends an unregistered word search request to the unregistered word search server 403 via the network 402 and receives the unregistered word search result from the unregistered word search server 403; it is implemented by a communication interface or the like. When an unregistered word search becomes necessary, the unit 401 sends information representing the content of the unregistered word portion of the utterance, such as the subword sequence obtained by the reference similarity calculation unit 103 as described in Embodiment 1 or the parameters of the unregistered word portion of the input speech, to the unregistered word search server 403 via the network 402. It then outputs the reply from the unregistered word search server to the result display unit 107 as the unregistered word search result.
[0170] The network 402 is a communication network such as the Internet or a telephone line.
[0171] The unregistered word search server 403 is a server device that searches for unregistered words in response to requests from clients (the speech recognition device 400); it includes an unregistered word search unit 404 and an unregistered word storage unit 405.
[0172] The unregistered word search unit 404 is a processing unit that performs the unregistered word search. It also has a communication function for receiving information about an unregistered word from a client via the network 402 and returning the search result via the network 402.
[0173] The unregistered word storage unit 405 is a storage device, such as a hard disk, that stores information about unregistered words.
[0174] Next, the operation of the speech recognition device 400 configured as described above will be described.
[0175] In Embodiment 4, the flow from the user's utterance to the output of the speech recognition device is the same as that shown in Embodiment 1. The difference in Embodiment 4 is that the device does not contain the unregistered word candidate search unit 105 of Embodiment 1 and instead delegates the search for unregistered word candidates to an external server.
[0176] That is, when it is determined that the user's utterance contained an unregistered word, the unregistered word search request transmission/reception unit 401 sends the subword sequence of the unregistered word portion, obtained by the reference similarity calculation unit 103, to the unregistered word search server 403. On receiving the subword sequence of the unregistered word portion from the client, the unregistered word search unit 404 searches the words stored in the unregistered word storage unit 405 for the unregistered word uttered by the user. An effective way to search for an unregistered word from a subword sequence is, for example, the method described with reference to FIG. 10 in Embodiment 1. The search result obtained in this way is returned as unregistered word candidates to the unregistered word search request transmission/reception unit 401 via the network 402. The unit 401 passes the returned search result to the result display unit 107, which informs the user that the spoken word was an unregistered word.
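The server-side lookup in paragraph [0176] can be illustrated with a small sketch. The method of FIG. 10 is not reproduced in this section, so the example substitutes a plain Levenshtein (edit) distance over subword symbols as a stand-in similarity measure; the function names, the lexicon layout, and the `top_n` parameter are all assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two subword sequences (lists of symbols)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def search_unregistered(subwords, lexicon, top_n=3):
    """Rank stored unregistered words by distance to the received sequence.

    lexicon: dict mapping word -> its subword sequence, standing in for
    the contents of the unregistered word storage unit 405.
    Returns up to top_n (word, distance) pairs, closest first.
    """
    scored = sorted((edit_distance(subwords, pron), word)
                    for word, pron in lexicon.items())
    return [(word, dist) for dist, word in scored[:top_n]]
```

In this sketch the client would send only the subword sequence and receive the ranked list back, so the large lexicon and the distance computation both stay on the server.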
[0177] As described above, having an external server perform the unregistered word search and placing it remotely centralizes maintenance of the unregistered word storage unit, whose contents change and grow daily, and thereby keeps maintenance costs low.
[0178] In addition, searching a large vocabulary list for a target word requires substantial computing resources; delegating this work to an external server also allows the hardware of the speech recognition device itself to be made compact.
[0179] Conversely, because a server can generally have a relatively large hardware configuration, it becomes possible to implement unregistered word search algorithms that would be difficult to run on the hardware of a client such as a mobile terminal, which may also improve the accuracy of the unregistered word search.
[0180] Although Embodiment 4 uses a subword sequence as the search data for the unregistered word search, the unregistered word search server can also be implemented so that, as noted in Embodiment 1, the search uses the user's speech itself or acoustic parameters extracted from it.
Industrial Applicability
[0181] The present invention can be used in a variety of electronic devices that use speech recognition technology as an input means, for example AV equipment such as televisions and video recorders, in-vehicle equipment such as car navigation systems, and portable terminals such as PDAs and mobile phones. Its industrial applicability is therefore very broad.

Claims

[1] A speech recognition device that recognizes spoken speech, comprising:
speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words;
speech recognition means for collating the spoken speech with the registered words stored in the speech recognition word storage means;
unregistered word determination means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word not stored therein;
unregistered word storage means for storing the unregistered words;
unregistered word candidate search means for searching, when the unregistered word determination means determines that the speech is an unregistered word, the unregistered words stored in the unregistered word storage means for an unregistered word candidate considered to correspond to the spoken speech, based on the spoken speech; and
result display means for displaying the search result.
[2] The speech recognition device according to claim 1, wherein
the unregistered word candidate search means searches the unregistered words stored in the unregistered word storage means for a plurality of unregistered word candidates.
[3] The speech recognition device according to claim 1 or 2, wherein
the unregistered word storage means stores the unregistered words classified by the category to which each unregistered word belongs.
[4] The speech recognition device according to claim 3, further comprising
unregistered word class determination means for determining, based on the spoken speech, the category to which the unregistered word belongs.
[5] The speech recognition device according to claim 4, wherein
the unregistered word candidate search means searches for the unregistered word candidate within the classified categories of the unregistered word storage means, based on the determination result of the unregistered word class determination means.
[6] The speech recognition device according to claim 3, further comprising
information acquisition means for acquiring information about the category, wherein
the unregistered word candidate search means searches for the unregistered word candidate within the classified categories of the unregistered word storage means, based on the information acquired by the information acquisition means.
[7] The speech recognition device according to claim 1, wherein
the result display means displays the search result after excluding, from the search result of the unregistered word candidate search means, any registered words stored in the speech recognition word storage means.
[8] The speech recognition device according to claim 1, wherein
the unregistered word candidate search means searches for the unregistered word candidate by calculating an unregistered word score that quantifies the degree of similarity to the spoken speech.
[9] The speech recognition device according to claim 8, wherein
the result display unit displays, as the search result, the unregistered word candidate together with its unregistered word score.
[10] The speech recognition device according to claim 9, wherein
the result display unit changes how the unregistered word candidate is displayed according to the unregistered word score.
[11] The speech recognition device according to claim 1, wherein
the unregistered words stored in the unregistered word storage means are updated under a predetermined condition.
[12] The speech recognition device according to claim 11, further comprising
communication means for communicating with an unregistered word server that stores a group of unregistered words not stored in the unregistered word storage means, wherein
the unregistered words stored in the unregistered word storage means are updated when the communication means receives the group of unregistered words from the unregistered word server.
[13] The speech recognition device according to claim 1, wherein
the registered words stored in the speech recognition word storage means are updated under a predetermined condition.
[14] A speech recognition system that recognizes spoken speech, comprising:
a speech recognition device that recognizes spoken speech, and an unregistered word search server that searches for unregistered words not registered in the speech recognition device,
wherein the speech recognition device includes:
speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words;
speech recognition means for collating the spoken speech with the registered words stored in the speech recognition word storage means;
unregistered word determination means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word not stored therein;
search request transmission means for requesting, when the unregistered word determination means determines that the speech is an unregistered word, the unregistered word search server to search for an unregistered word candidate considered to correspond to the spoken speech;
search result reception means for obtaining the search result for the unregistered word candidate from the unregistered word search server; and
result display means for displaying the search result,
and wherein the unregistered word search server includes:
unregistered word storage means for storing the unregistered words;
search request reception means for receiving the search request from the search request transmission means;
unregistered word candidate search means for searching, when the search request reception means receives the search request, the unregistered words stored in the unregistered word storage means for an unregistered word candidate considered to correspond to the spoken speech, based on the spoken speech; and
search result transmission means for transmitting the search result to the speech recognition device.
[15] A speech recognition device in a speech recognition system composed of a speech recognition device that recognizes spoken speech and an unregistered word search server that searches for unregistered words not registered in the speech recognition device, the speech recognition device comprising:
speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words;
speech recognition means for collating the spoken speech with the registered words stored in the speech recognition word storage means;
unregistered word determination means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word not stored therein;
search request transmission means for requesting, when the unregistered word determination means determines that the speech is an unregistered word, the unregistered word search server to search for an unregistered word candidate considered to correspond to the spoken speech;
search result reception means for obtaining the search result for the unregistered word candidate from the unregistered word search server; and
result display means for displaying the search result.
[16] An unregistered word search server in a speech recognition system composed of a speech recognition device that recognizes spoken speech and an unregistered word search server that searches for unregistered words not registered in the speech recognition device, the unregistered word search server comprising:
unregistered word storage means for storing the unregistered words;
search request reception means for receiving the search request from the search request transmission means;
unregistered word candidate search means for searching, when the search request reception means receives the search request, the unregistered words stored in the unregistered word storage means for an unregistered word candidate considered to correspond to the spoken speech, based on the spoken speech; and
search result transmission means for transmitting the search result to the speech recognition device.
[17] A speech recognition method for recognizing spoken speech, comprising:
a speech recognition step of collating the spoken speech with registered words stored in a speech recognition word database that defines a vocabulary for speech recognition and stores it as registered words;
an unregistered word determination step of determining, based on the collation result of the speech recognition step, whether the spoken speech is a registered word stored in the speech recognition word database or an unregistered word not stored therein;
an unregistered word candidate search step of searching, when the speech is determined to be an unregistered word in the unregistered word determination step, the unregistered words stored in an unregistered word database that stores the unregistered words for an unregistered word candidate considered to correspond to the spoken speech, based on the spoken speech; and
a result display step of displaying the search result.
[18] A program for a speech recognition device that recognizes spoken speech, the program being characterized by causing a computer to execute:
a speech recognition step of collating the spoken speech with registered words stored in a speech recognition word database that defines a vocabulary for speech recognition and stores it as registered words;
an unregistered word determination step of determining, based on the collation result in the speech recognition step, whether the spoken speech is a registered word stored in the speech recognition word database or an unregistered word not stored therein;
an unregistered word candidate search step of searching, when the spoken speech is determined to be an unregistered word in the unregistered word determination step, and based on the spoken speech, for unregistered word candidates considered to correspond to the spoken speech from among the unregistered words stored in an unregistered word database that stores the unregistered words; and
a result display step of displaying the search result.
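The four steps recited in claim 18 (collation, unregistered word determination, candidate search, result display) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the utterance is represented as a romanized string rather than audio, `difflib.SequenceMatcher` stands in for acoustic matching, and the function names, the 0.8 threshold, and the sample word lists are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    # Stand-in for an acoustic matching score in [0, 1] (assumption).
    return SequenceMatcher(None, a, b).ratio()


def recognize(utterance: str, registered: List[str]) -> Tuple[str, float]:
    # Speech recognition step: collate the utterance with every registered word.
    best = max(registered, key=lambda word: similarity(utterance, word))
    return best, similarity(utterance, best)


def is_unregistered(score: float, threshold: float = 0.8) -> bool:
    # Unregistered word determination step; the threshold value is hypothetical.
    return score < threshold


def search_candidates(utterance: str, unregistered_db: List[str], top_n: int = 3) -> List[str]:
    # Unregistered word candidate search step: rank the stored unregistered words.
    ranked = sorted(unregistered_db, key=lambda word: similarity(utterance, word), reverse=True)
    return ranked[:top_n]


def process(utterance: str, registered: List[str], unregistered_db: List[str]) -> Dict[str, object]:
    # Run the steps in order and report which branch was taken.
    word, score = recognize(utterance, registered)
    if not is_unregistered(score):
        return {"status": "registered", "results": [word]}
    return {"status": "unregistered", "results": search_candidates(utterance, unregistered_db)}


def display(outcome: Dict[str, object]) -> None:
    # Result display step: print the status and the word or candidate list.
    print(outcome["status"], ", ".join(outcome["results"]))


registered_words = ["matsui", "ichiro", "nakata"]        # hypothetical recognition vocabulary
unregistered_words = ["takahashi", "sasaki", "uehara"]   # hypothetical unregistered word database
display(process("matsui", registered_words, unregistered_words))
display(process("takahashi", registered_words, unregistered_words))
```

In this sketch the candidate search runs only when the best registered-word score falls below the threshold, mirroring the conditional wording of the claim. A real system would score acoustic features against word models rather than compare strings.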
PCT/JP2005/010183 2004-06-10 2005-06-02 Speech recognition device, speech recognition method, and program WO2005122144A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2006514478A JP4705023B2 (en) 2004-06-10 2005-06-02 Speech recognition apparatus, speech recognition method, and program
US11/628,887 US7813928B2 (en) 2004-06-10 2005-06-02 Speech recognition device, speech recognition method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2004173147 2004-06-10
JP2004-173147 2004-06-10

Publications (1)

Publication Number Publication Date
WO2005122144A1 true WO2005122144A1 (en) 2005-12-22

Family

ID=35503310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2005/010183 WO2005122144A1 (en) 2004-06-10 2005-06-02 Speech recognition device, speech recognition method, and program

Country Status (3)

Country Link
US (1) US7813928B2 (en)
JP (1) JP4705023B2 (en)
WO (1) WO2005122144A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070185713A1 (en) * 2006-02-09 2007-08-09 Samsung Electronics Co., Ltd. Recognition confidence measuring by lexical distance between candidates
WO2007097390A1 (en) * 2006-02-23 2007-08-30 Nec Corporation Speech recognition system, speech recognition result output method, and speech recognition result output program
JP2010014885A (en) * 2008-07-02 2010-01-21 Advanced Telecommunication Research Institute International Information processing terminal with voice recognition function
US7881928B2 (en) * 2006-09-01 2011-02-01 International Business Machines Corporation Enhanced linguistic transformation
KR101300839B1 (en) 2007-12-18 2013-09-10 삼성전자주식회사 Voice query extension method and system
US8688451B2 (en) * 2006-05-11 2014-04-01 General Motors Llc Distinguishing out-of-vocabulary speech from in-vocabulary speech
DE112007002665B4 (en) * 2006-12-15 2017-12-28 Mitsubishi Electric Corp. Voice recognition system
CN113869281A (en) * 2018-07-19 2021-12-31 北京影谱科技股份有限公司 A person identification method, device, equipment and medium

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070132834A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Speech disambiguation in a composite services enablement environment
WO2007138875A1 (en) * 2006-05-31 2007-12-06 Nec Corporation Speech recognition word dictionary/language model making system, method, and program, and speech recognition system
US20080154600A1 (en) * 2006-12-21 2008-06-26 Nokia Corporation System, Method, Apparatus and Computer Program Product for Providing Dynamic Vocabulary Prediction for Speech Recognition
KR100897554B1 (en) * 2007-02-21 2009-05-15 삼성전자주식회사 Distributed speech recognition system and method and terminal for distributed speech recognition
WO2008126347A1 (en) * 2007-03-16 2008-10-23 Panasonic Corporation Voice analysis device, voice analysis method, voice analysis program, and system integration circuit
US8756527B2 (en) * 2008-01-18 2014-06-17 Rpx Corporation Method, apparatus and computer program product for providing a word input mechanism
JP5024154B2 (en) * 2008-03-27 2012-09-12 富士通株式会社 Association apparatus, association method, and computer program
KR101427686B1 (en) 2008-06-09 2014-08-12 삼성전자주식회사 Program selection method and apparatus
JP2010154397A (en) * 2008-12-26 2010-07-08 Sony Corp Data processor, data processing method, and program
JP5692493B2 (en) * 2009-02-05 2015-04-01 セイコーエプソン株式会社 Hidden Markov Model Creation Program, Information Storage Medium, Hidden Markov Model Creation System, Speech Recognition System, and Speech Recognition Method
US9659559B2 (en) * 2009-06-25 2017-05-23 Adacel Systems, Inc. Phonetic distance measurement system and related methods
KR20110006004A (en) * 2009-07-13 2011-01-20 삼성전자주식회사 Combined recognition unit optimization device and method
US20150279354A1 (en) * 2010-05-19 2015-10-01 Google Inc. Personalization and Latency Reduction for Voice-Activated Commands
US8522283B2 (en) 2010-05-20 2013-08-27 Google Inc. Television remote control data transfer
JP5739718B2 (en) * 2011-04-19 2015-06-24 本田技研工業株式会社 Interactive device
JP5642037B2 (en) * 2011-09-22 2014-12-17 株式会社東芝 SEARCH DEVICE, SEARCH METHOD, AND PROGRAM
JP5853653B2 (en) * 2011-12-01 2016-02-09 ソニー株式会社 Server device, information terminal, and program
JP5675722B2 (en) * 2012-07-23 2015-02-25 東芝テック株式会社 Recognition dictionary processing apparatus and recognition dictionary processing program
US9311914B2 (en) * 2012-09-03 2016-04-12 Nice-Systems Ltd Method and apparatus for enhanced phonetic indexing and search
JP6221301B2 (en) * 2013-03-28 2017-11-01 富士通株式会社 Audio processing apparatus, audio processing system, and audio processing method
US10170114B2 (en) 2013-05-30 2019-01-01 Promptu Systems Corporation Systems and methods for adaptive proper name entity recognition and understanding
JP6100101B2 (en) * 2013-06-04 2017-03-22 アルパイン株式会社 Candidate selection apparatus and candidate selection method using speech recognition
US9384731B2 (en) * 2013-11-06 2016-07-05 Microsoft Technology Licensing, Llc Detecting speech input phrase confusion risk
US9653071B2 (en) * 2014-02-08 2017-05-16 Honda Motor Co., Ltd. Method and system for the correction-centric detection of critical speech recognition errors in spoken short messages
JP5921601B2 (en) * 2014-05-08 2016-05-24 日本電信電話株式会社 Speech recognition dictionary update device, speech recognition dictionary update method, program
CN107112007B (en) * 2014-12-24 2020-08-07 三菱电机株式会社 Speech recognition apparatus and speech recognition method
US9392324B1 (en) 2015-03-30 2016-07-12 Rovi Guides, Inc. Systems and methods for identifying and storing a portion of a media asset
JP6744025B2 (en) * 2016-06-21 2020-08-19 日本電気株式会社 Work support system, management server, mobile terminal, work support method and program
US9984688B2 (en) * 2016-09-28 2018-05-29 Visteon Global Technologies, Inc. Dynamically adjusting a voice recognition system
WO2018173295A1 (en) * 2017-03-24 2018-09-27 ヤマハ株式会社 User interface device, user interface method, and sound operation system
CN107103903B (en) * 2017-05-05 2020-05-29 百度在线网络技术(北京)有限公司 Acoustic model training method and device based on artificial intelligence and storage medium
CN107240395B (en) * 2017-06-16 2020-04-28 百度在线网络技术(北京)有限公司 Acoustic model training method and device, computer equipment and storage medium
CN107293296B (en) * 2017-06-28 2020-11-20 百度在线网络技术(北京)有限公司 Voice recognition result correction method, device, equipment and storage medium
US20190147855A1 (en) * 2017-11-13 2019-05-16 GM Global Technology Operations LLC Neural network for use in speech recognition arbitration
KR102455067B1 (en) * 2017-11-24 2022-10-17 삼성전자주식회사 Electronic apparatus and control method thereof
CN109325227A (en) * 2018-09-14 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for generating amendment sentence
US11024310B2 (en) 2018-12-31 2021-06-01 Sling Media Pvt. Ltd. Voice control for media content search and selection
KR102738700B1 (en) * 2019-09-18 2024-12-06 삼성전자주식회사 Electrical Apparatus and Method for controlling Voice Recognition thereof
WO2022198474A1 (en) * 2021-03-24 2022-09-29 Sas Institute Inc. Speech-to-analytics framework with support for large n-gram corpora
US11875780B2 (en) * 2021-02-16 2024-01-16 Vocollect, Inc. Voice recognition performance constellation graph

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0981181A (en) * 1995-09-11 1997-03-28 Atr Onsei Honyaku Tsushin Kenkyusho:Kk Voice recognition device
JP2000259645A (en) * 1999-03-05 2000-09-22 Fuji Xerox Co Ltd Speech processor and speech data retrieval device
JP2001236089A (en) * 1999-12-17 2001-08-31 Atr Interpreting Telecommunications Res Lab Statistical language model generating device, speech recognition device, information retrieval processor and kana/kanji converter
JP2002215670A (en) * 2001-01-15 2002-08-02 Omron Corp Audio response device, audio response method, audio response program, recording medium stored with audio response program, and reservation system
JP2002297179A (en) * 2001-03-29 2002-10-11 Fujitsu Ltd Automatic answering conversation system
JP2002540479A (en) * 1999-03-26 2002-11-26 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Client-server speech recognition
JP2002358095A (en) * 2001-03-30 2002-12-13 Sony Corp Method and device for speech processing, program, recording medium
JP2003044091A (en) * 2001-07-31 2003-02-14 Ntt Docomo Inc Voice recognition system, portable information terminal, device and method for processing audio information, and audio information processing program

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5127055A (en) * 1988-12-30 1992-06-30 Kurzweil Applied Intelligence, Inc. Speech recognition apparatus & method having dynamic reference pattern adaptation
JP2808906B2 (en) * 1991-02-07 1998-10-08 日本電気株式会社 Voice recognition device
JPH06282293A (en) 1993-03-29 1994-10-07 Sony Corp Voice recognition device
JP3468572B2 (en) 1994-03-22 2003-11-17 三菱電機株式会社 Dialogue processing device
JP3459712B2 (en) * 1995-11-01 2003-10-27 キヤノン株式会社 Speech recognition method and device and computer control device
JPH09230889A (en) 1996-02-23 1997-09-05 Hitachi Ltd Voice recognition response device
US6195641B1 (en) * 1998-03-27 2001-02-27 International Business Machines Corp. Network universal spoken language vocabulary
EP1088299A2 (en) * 1999-03-26 2001-04-04 Scansoft, Inc. Client-server speech recognition
JP3976959B2 (en) * 1999-09-24 2007-09-19 三菱電機株式会社 Speech recognition apparatus, speech recognition method, and speech recognition program recording medium
JP4543294B2 (en) * 2000-03-14 2010-09-15 ソニー株式会社 Voice recognition apparatus, voice recognition method, and recording medium
JP4072718B2 (en) * 2002-11-21 2008-04-09 ソニー株式会社 Audio processing apparatus and method, recording medium, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAKAMOTO H. ET AL: "Detection of Unregistered-Words Using Phoneme Cluster Models", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J80-D-II, no. 9, 25 September 1997 (1997-09-25), pages 2261 - 2269, XP002996904 *
TANGAKI H. ET AL: "Hierarchical Language Model Incorporating Probabilistic Description of Vocabulary in Classes", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J84-D-II, no. 11, 1 November 2001 (2001-11-01), pages 2371 - 2378, XP002996905 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070185713A1 (en) * 2006-02-09 2007-08-09 Samsung Electronics Co., Ltd. Recognition confidence measuring by lexical distance between candidates
US8990086B2 (en) * 2006-02-09 2015-03-24 Samsung Electronics Co., Ltd. Recognition confidence measuring by lexical distance between candidates
WO2007097390A1 (en) * 2006-02-23 2007-08-30 Nec Corporation Speech recognition system, speech recognition result output method, and speech recognition result output program
US8756058B2 (en) 2006-02-23 2014-06-17 Nec Corporation Speech recognition system, speech recognition result output method, and speech recognition result output program
US8688451B2 (en) * 2006-05-11 2014-04-01 General Motors Llc Distinguishing out-of-vocabulary speech from in-vocabulary speech
US7881928B2 (en) * 2006-09-01 2011-02-01 International Business Machines Corporation Enhanced linguistic transformation
DE112007002665B4 (en) * 2006-12-15 2017-12-28 Mitsubishi Electric Corp. Voice recognition system
KR101300839B1 (en) 2007-12-18 2013-09-10 삼성전자주식회사 Voice query extension method and system
JP2010014885A (en) * 2008-07-02 2010-01-21 Advanced Telecommunication Research Institute International Information processing terminal with voice recognition function
CN113869281A (en) * 2018-07-19 2021-12-31 北京影谱科技股份有限公司 A person identification method, device, equipment and medium

Also Published As

Publication number Publication date
US20080167872A1 (en) 2008-07-10
US7813928B2 (en) 2010-10-12
JPWO2005122144A1 (en) 2008-04-10
JP4705023B2 (en) 2011-06-22

Similar Documents

Publication Publication Date Title
JP4705023B2 (en) Speech recognition apparatus, speech recognition method, and program
US7421387B2 (en) Dynamic N-best algorithm to reduce recognition errors
US9336769B2 (en) Relative semantic confidence measure for error detection in ASR
US8401840B2 (en) Automatic spoken language identification based on phoneme sequence patterns
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
US8612212B2 (en) Method and system for automatically detecting morphemes in a task classification system using lattices
US8719021B2 (en) Speech recognition dictionary compilation assisting system, speech recognition dictionary compilation assisting method and speech recognition dictionary compilation assisting program
US7124080B2 (en) Method and apparatus for adapting a class entity dictionary used with language models
US7620548B2 (en) Method and system for automatic detecting morphemes in a task classification system using lattices
JP5440177B2 (en) Word category estimation device, word category estimation method, speech recognition device, speech recognition method, program, and recording medium
US8577679B2 (en) Symbol insertion apparatus and symbol insertion method
US20050256715A1 (en) Language model generation and accumulation device, speech recognition device, language model creation method, and speech recognition method
US20060009965A1 (en) Method and apparatus for distribution-based language model adaptation
JP6323947B2 (en) Acoustic event recognition apparatus and program
US7912707B2 (en) Adapting a language model to accommodate inputs not found in a directory assistance listing
US20050187767A1 (en) Dynamic N-best algorithm to reduce speech recognition errors
US7085720B1 (en) Method for task classification using morphemes
JP4764203B2 (en) Speech recognition apparatus and speech recognition program
JP5243325B2 (en) Terminal, method and program using kana-kanji conversion system for speech recognition
JP2000172294A (en) Method of speech recognition, device thereof, and program recording medium thereof
JP5124012B2 (en) Speech recognition apparatus and speech recognition program
JP4986301B2 (en) Content search apparatus, program, and method using voice recognition processing function
WO2009147745A1 (en) Retrieval device
JP5585111B2 (en) Utterance content estimation device, language model creation device, method and program used therefor
Zhang Making an effective use of speech data for acoustic modeling

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2006514478

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 11628887

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase