WO2005122144A1

WO2005122144A1 - Speech recognition device, speech recognition method, and program

Info

Publication number: WO2005122144A1
Application number: PCT/JP2005/010183
Authority: WO
Inventors: Yoshiyuki Okimoto; Tsuyoshi Inoue; Takashi Tsuzuki
Original assignee: Matsushita Electric Industrial Co., Ltd.
Priority date: 2004-06-10
Filing date: 2005-06-02
Publication date: 2005-12-22
Also published as: US20080167872A1; US7813928B2; JPWO2005122144A1; JP4705023B2

Abstract

A speech recognition device can show a user whether the word pronounced by the user is an unregistered word for a speech recognition dictionary or whether the word should be re-pronounced because of erroneous recognition. The speech recognition device includes: a speech recognition vocabulary storage unit (102) for defining a vocabulary for speech recognition and storing it as registered words; a speech recognition unit (101) for correlating the pronounced speech with the registered words; an unregistered word judgment unit (104) for judging whether the pronounced speech is a registered word or an unregistered word according to the correlation result of the speech recognition unit (101); an unregistered word storage unit (106) for storing unregistered words; an unregistered word candidate search unit (105) for searching the unregistered word storage unit (106), when the unregistered word judgment unit (104) judges the speech to be an unregistered word, for an unregistered word candidate considered to correspond to the pronounced speech; and a result display unit (107) for displaying the search result.

Description

Specification

Speech recognition apparatus, speech recognition method, and program

Technical field

[0001] The present invention relates to a speech recognition device used for a man-machine interface based on speech recognition, and more particularly to a technique for responding to utterances of unregistered words.

Background art

[0002] Conventionally, speech recognition technology has been applied as an input front end for device control and for ease of use by users. In general, as described in Non-Patent Document 1, speech recognition compares the spoken speech with the standard patterns of words set in advance in the speech recognition dictionary and adopts the closest word as the recognition result.

[0003] However, since the user of the device does not necessarily remember all of the words targeted for speech recognition, the user may utter a word that is not a target of speech recognition. In such a case, under the basic framework of speech recognition described above, the closest word in the speech recognition dictionary is returned as the result, so erroneous recognition inevitably occurs. To solve this problem, methods have been devised for detecting a user's utterance of a word that does not exist in the speech recognition dictionary (an unregistered word).

[0004] For example, Patent Document 1 describes a method in which the similarity between the input speech and each word in the speech recognition dictionary is obtained, a reference similarity is obtained by combining unit standard patterns, the similarity of each word is corrected using the reference similarity, and, if the corrected similarity does not reach a certain threshold, the user's utterance is regarded as an unregistered word.

[0005] Patent Document 2 describes a method for detecting unregistered words with a small amount of processing and high accuracy using a phoneme HMM (Hidden Markov Model) and a garbage HMM.

[0006] When an utterance of an unregistered word by a user is detected, this can easily be indicated to the user by a warning sound such as a beep or by a spoken response such as “it does not exist”. However, simply returning such a response is insufficient for the user, because the response does not let the user clearly tell whether the word he or she spoke is an unregistered word or was merely misrecognized.

[0008] For this reason, the user is forced to keep speaking, paying ever more attention to pronunciation, until he or she is satisfied or gives up, which reduces the convenience of device control by voice input.

[0009] To address this problem, Patent Document 3 describes a method in which, when an utterance of an unregistered word by the user is detected, a list of the words the device can accept in that situation is presented to the user. With this method, even if the user does not know the words the device recognizes, each time he or she utters an unregistered word, he or she is shown the words that can be spoken in that situation, so the desired operation can be achieved without repeating useless utterances.

[0010] Patent Document 4 describes a method in which speech recognition is performed using, as the recognition dictionary, the combination of an internal dictionary corresponding to a conventional speech recognition dictionary and an external dictionary storing a large number of words not registered in the conventional speech recognition dictionary, and in which, when a word contained in the external dictionary is obtained as the recognition result, the device also indicates that it is an unregistered word. With this method, for example, when the user utters “Taro Matsushita” while the word “Taro Matsushita” is included in the external dictionary, a response such as “Taro Matsushita does not exist” becomes possible.

Patent Document 1: Japanese Patent No. 2808906

Patent Document 2: Japanese Patent No. 2886117

Patent Document 3: Japanese Patent No. 3468572

Patent Document 4: Japanese Patent Laid-Open No. 9-230889

Non-Patent Document 1: Kiyohiro Shikano, Satoshi Nakamura, Shiro Ise, “Digital Signal Processing Series 5: Digital Signal Processing of Voice / Sound Information”, Shosodo, November 10, 1997, p. 45, 53

Disclosure of the invention

Problems to be solved by the invention

However, in the method of Patent Document 3 described above, when the number of acceptable words becomes very large, the user must search for the desired word among a large number of words, which can cause oversights and annoyance. For example, suppose a user utters the name of a person who does not exist in the system, “Taro Matsushita”, and then tries to find “Taro Matsushita” in the list of acceptable person names. If 100 names are listed, the user must check whether “Taro Matsushita” is in the list, or whether the list contains a person who could stand in for “Taro Matsushita”. In such a case, the user may overlook “Taro Matsushita”, and finding “Taro Matsushita” is both annoying and not easy.

[0012] Further, to return the above response satisfactorily using the method of Patent Document 4, a very large number of words must be registered in the external dictionary serving as the unregistered word dictionary. However, when speech recognition is performed with such a large-vocabulary dictionary, recognition errors become more likely because many similar words are registered, which defeats the purpose. As a result, for example, the user's utterance “Taro Matsushita” may draw a response naming a different word, such as “Toru Matsushita”, causing needless confusion or useless re-utterance.

[0013] Therefore, the present invention has been made in view of these problems, and an object of the present invention is to provide a speech recognition device that can reduce situations in which a user attempts useless re-utterances.

Means for solving the problem

In order to achieve the above object, a speech recognition device according to the present invention is a speech recognition device that recognizes spoken speech, comprising: speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words; speech recognition means for collating the spoken speech with the registered words stored in the speech recognition word storage means; unregistered word determining means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word not stored therein; unregistered word storage means for storing unregistered words; unregistered word candidate search means for searching, when the unregistered word determining means determines that the spoken speech is an unregistered word, the unregistered words stored in the unregistered word storage means for unregistered word candidates that appear to correspond to the spoken speech, based on the spoken speech; and result display means for displaying the search result. Here, the speech recognition device may further include communication means for communicating with an unregistered word server that stores the group of unregistered words to be stored in the unregistered word storage means, and the communication means may receive the unregistered word group from the unregistered word server and update the unregistered words stored in the unregistered word storage means.

It should be noted that the present invention can be realized not only as such a speech recognition device but also as a speech recognition method having, as steps, the characteristic means included in such a speech recognition device, or as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.

The invention's effect

[0017] According to the present invention, the user can be shown that speech recognition failed because he or she uttered an unregistered word and, at the same time, that the failure is not due to a recognition error.

[0018] Further, according to the present invention, the recognition rate for utterances of words in the speech recognition dictionary, which is the original purpose, is not reduced.

[0019] Further, the unregistered word storage means searched for unregistered word candidates is very large and requires constant maintenance. By separating this function out to a server, the manufacturing cost of the device can be reduced and, at the same time, so can the maintenance cost of the unregistered word storage means.

Brief Description of Drawings

FIG. 1 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing the operation of the speech recognition apparatus according to the first embodiment.

FIG. 3 is a diagram showing an output example of a speech recognition unit when a recognized vocabulary is uttered according to the first embodiment.

FIG. 4 is a diagram showing an output example of a reference similarity calculation unit when a recognized vocabulary is uttered according to the first embodiment.

FIG. 5 is a diagram showing a result display example when a recognized vocabulary is uttered according to the first embodiment.

FIG. 6 is a diagram showing an output example of a speech recognition unit when an unregistered word is uttered according to the first embodiment.

FIG. 7 is a diagram showing an output example of a reference similarity calculation unit when an unregistered word is uttered according to the first embodiment.

FIG. 8 is a diagram showing an output example of an unregistered word candidate search unit according to the first embodiment.

FIG. 9 is a diagram showing a result display example when an unregistered word is uttered according to the first embodiment.

FIG. 10 is a diagram showing a calculation method of similarity between phoneme sequences at the time of unregistered word search according to the first embodiment.

FIG. 11 is a block diagram showing a functional configuration of the unknown utterance detection apparatus.

FIG. 12 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 2 of the present invention.

FIG. 13 is a diagram showing an example of unregistered word categories according to the second embodiment.

FIG. 14 is a block diagram showing a functional configuration of an unregistered word class determination unit using a class N-gram language model.

FIG. 15 is a block diagram showing a functional configuration of a class N-gram generation / storage unit.

FIG. 16 is a block diagram showing a functional configuration of a class-dependent word N-gram generation / storage unit.

FIG. 17 is a diagram showing an example of a class N-gram language model for unregistered word class determination according to the second embodiment.

FIG. 18 is a diagram showing a display example of a result when an unregistered word of a different class according to the second embodiment is uttered.

FIG. 19 is a diagram showing a configuration of an unregistered word class determination unit that acquires information for unregistered word class determination from an external application according to the second embodiment.

FIG. 20 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 3 of the present invention.

FIG. 21 is a block diagram showing a functional configuration of a speech recognition apparatus according to Embodiment 4 of the present invention.

Explanation of symbols

100, 200, 300, 400 Voice recognition device

101 Voice recognition unit

102 Voice recognition vocabulary storage

103 Reference similarity calculator

104 Unregistered word determination part

105 Unregistered word candidate search section

106, 301 Unregistered word storage

107 Result display area

111 Voice pattern unit

112 Word dictionary storage

113 Word matching part

114 Transition probability storage

115 Voice sequence matching section

116 Candidate score difference calculator

117 candidates

118 candidates · Voice sequence score difference calculator

119 Candidate · Speech sequence · Phoneme sequence similarity calculator

201, 201a, 201b Unregistered word class judgment part

202 Unregistered word class word storage

211 Word string hypothesis generator

221 class N-gram generator

222 sentence expression corpus storage

223 Morphological analyzer for sentence expression

224 Class N-gram generator

225 class N-gram storage

231 Class-dependent words N-gram generator

232 class corpus storage

233 Class Morphological Analyzer

234 Class-dependent word N-gram generator

235 Class-dependent words N-gram storage

236 Unregistered word class definition generator

237 Unregistered word class definition storage

241 Word category information receiver

242 Unregistered word class decision part

302, 402 network (communication means)

303 Unregistered word server

401 Unregistered word search request transmission / reception unit

403 Unregistered word search server

404 Unregistered word search part

405 Unregistered word storage

BEST MODE FOR CARRYING OUT THE INVENTION

In order to achieve the above object, a speech recognition device according to the present invention is a speech recognition device that recognizes spoken speech, comprising: speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words; speech recognition means for collating the spoken speech with the registered words stored in the speech recognition word storage means; unregistered word determining means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word not stored therein; unregistered word storage means for storing unregistered words; unregistered word candidate search means for searching, when the unregistered word determining means determines that the spoken speech is an unregistered word, the unregistered words stored in the unregistered word storage means for unregistered word candidates that appear to correspond to the spoken speech, based on the spoken speech; and result display means for displaying the search result.

[0023] With this configuration, when the word spoken by the user is an unregistered word, unregistered word candidates are searched for and presented, so the user can recognize that he or she uttered an unregistered word simply by confirming that the word he or she spoke is among the candidates. In addition, since the unregistered word candidates are searched separately from the collation against words in the speech recognition dictionary, the performance of the speech recognition itself is not degraded.

[0024] Here, the unregistered word candidate search means may search for a plurality of unregistered word candidates among the unregistered words stored in the unregistered word storage means.

[0025] According to this configuration, since unregistered word candidates are not narrowed down to one word, high accuracy is not required for searching for unregistered word candidates, and hardware resources can be kept low.

[0026] The unregistered word storage means preferably stores the unregistered words classified by the category to which each unregistered word belongs. More preferably, the speech recognition apparatus further includes unregistered word class determining means for determining, based on the spoken speech, the category to which the unregistered word belongs, and the unregistered word candidate search means searches for the unregistered word candidates within the classified categories in the unregistered word storage means based on the determination result of the unregistered word class determining means.

[0027] This narrows down the search range of unregistered word candidates according to the category of the unregistered word, so words in categories the user did not intend can be prevented from being presented as unregistered word candidates. In addition, since the search range is narrowed down, the search accuracy for unregistered word candidates can be improved.
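As a rough illustration of the narrowing described above, the following Python sketch restricts the search range to one category before any similarity scoring takes place. The store layout, the category names, the function name `search_range`, and the sample entry "Evening News" are all hypothetical and are not taken from this specification.

```python
from typing import Dict, List

# Hypothetical store layout: category name -> unregistered words filed under it.
UnregisteredStore = Dict[str, List[str]]


def search_range(store: UnregisteredStore, judged_category: str) -> List[str]:
    """Return only the words filed under the judged category.

    Words in every other category are excluded from the candidate search,
    and an unknown category yields an empty search range.
    """
    return list(store.get(judged_category, []))


# Illustrative data only; the category names and titles are assumptions.
store = {
    "person name": ["Taro Matsushita", "Toru Matsushita"],
    "program title": ["Evening News"],
}
narrowed = search_range(store, "person name")
```

With this layout, a candidate search that follows the category judgment only ever scores the words in `narrowed`, which is one simple way to keep words from unintended categories out of the presented candidates.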

[0028] In addition, the speech recognition apparatus may further include information acquisition means for acquiring information about the category, and the unregistered word candidate search means may search for the unregistered word candidates within the classified categories in the unregistered word storage means based on the information acquired by the information acquisition means.

[0029] With this configuration, unregistered word candidates that are similar in pronunciation but unlikely to have been uttered are not output, so a speech recognition device is realized that narrows down the number of unregistered word candidates and presents them to the user in an easy-to-understand manner.

[0030] Further, it is preferable that the unregistered word candidate search means searches for the unregistered word candidates by calculating an unregistered word score that quantifies the degree of similarity to the spoken speech and outputs each unregistered word candidate together with its unregistered word score as the search result, and that the result display means changes the display of the unregistered word candidates according to their unregistered word scores.

[0031] With this configuration, when unregistered word candidates are presented, their likelihood is quantified and the most likely candidates are emphasized, so the unregistered word candidates can be presented in an easy-to-understand manner.

[0032] Further, the unregistered words stored in the unregistered word storage means may be updated under a predetermined condition.

[0033] This makes it possible to quickly reflect, in the unregistered word storage means, unregistered words such as proper names and program titles, which increase daily.

[0034] Here, the speech recognition apparatus may include communication means for communicating with an unregistered word server that stores the group of unregistered words to be stored in the unregistered word storage means, and the communication means may receive the unregistered word group from the unregistered word server and update the unregistered words stored in the unregistered word storage means.

[0035] As a result, new unregistered words are supplied from an external server, so the unregistered word storage means can be kept in an optimum state without requiring the user to go to the trouble of registering unregistered words such as proper names and titles, which increase daily.

[0036] The registered words stored in the speech recognition word storage means may also be updated under predetermined conditions.

[0037] Thus, by following changes over time in the usage frequency of registered words, only a relatively small number of registered words expected to be used frequently need be stored in the speech recognition word storage means, which makes it easy to shorten the recognition time and obtain a good recognition rate.

Further, the present invention can be realized not only as such a speech recognition device but also as a speech recognition system. That is, it may be a speech recognition system for recognizing spoken speech, comprising a speech recognition device that recognizes spoken speech and an unregistered word search server that searches for unregistered words not registered in the speech recognition device. The speech recognition device includes: speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words; speech recognition means for collating the spoken speech with the registered words stored in the speech recognition word storage means; unregistered word determining means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word not stored therein; search request transmitting means for requesting, when the unregistered word determining means determines that the spoken speech is an unregistered word, the unregistered word search server to search for unregistered word candidates that appear to correspond to the spoken speech; search result receiving means for acquiring the search result for the unregistered word candidates from the unregistered word search server; and result display means for displaying the search result. The unregistered word search server includes: unregistered word storage means for storing the unregistered words; search request receiving means for receiving the search request from the search request transmitting means; unregistered word candidate search means for searching, when the search request receiving means receives the search request, the unregistered words stored in the unregistered word storage means for unregistered word candidates that appear to correspond to the spoken speech, based on the spoken speech; and search result transmitting means for transmitting the search result to the speech recognition device.

[0039] With this configuration, the speech recognition interface can be realized compactly and, at the same time, the maintenance cost of the unregistered word storage means can be reduced. In addition, the unregistered word storage means, which needs constant updating, can be consolidated into one for multiple devices, reducing maintenance costs.

[0040] It should be noted that the present invention can be realized not only as such a speech recognition device but also as a speech recognition method having, as steps, the characteristic means included in such a speech recognition device, or as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.

Hereinafter, the best mode for carrying out the present invention will be described in detail with reference to FIGS. 1 to 21.

[0042] (Embodiment 1)

FIG. 1 is a block diagram showing a functional configuration of the speech recognition apparatus according to Embodiment 1 of the present invention.

[0043] The speech recognition apparatus 100 shown in FIG. 1 is used as one of the man-machine interfaces. It is a device that accepts speech input from the user and outputs the recognition result of the input speech, and includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105, an unregistered word storage unit 106, and a result display unit 107.

The speech recognition unit 101 is a processing unit that captures input speech and recognizes the utterance content.

The speech recognition vocabulary storage unit 102 is a storage device such as a hard disk that defines and stores the vocabulary recognized by the speech recognition unit 101. It stores either a standard acoustic pattern of each word (a standard pattern) or a word model that represents the acoustic pattern of each word with a model such as an HMM (Hidden Markov Model) or a neural network. Alternatively, the speech recognition vocabulary storage unit 102 stores standard patterns, or models such as HMMs or neural networks, for shorter acoustic units, and at recognition time a word pattern or word model is synthesized from them and provided to the speech recognition unit 101.

[0046] The reference similarity calculation unit 103 is a processing unit that calculates the reference similarity used to determine whether the input speech is an unregistered word. It searches for the subword sequence with the highest similarity to the input speech by arbitrarily combining patterns or models of acoustic units shorter than words, called subwords, and obtains that maximum similarity as the reference similarity.

The unregistered word determination unit 104 determines whether the user's utterance is an unregistered word based on the results of both the speech recognition unit 101 and the reference similarity calculation unit 103. When the user's utterance is a word stored in the speech recognition vocabulary storage unit 102, that is, a registered word, the unregistered word determination unit 104 outputs the recognition result for the utterance to the result display unit 107. When the utterance is a word not stored in the speech recognition vocabulary storage unit 102, that is, an unregistered word, it outputs the determination result that the utterance is an unregistered word to the unregistered word candidate search unit 105.

[0048] The unregistered word candidate search unit 105 is a processing unit that searches for unregistered words based on the utterance when the user's utterance is determined to be an unregistered word.

[0049] The unregistered word storage unit 106 is a storage device such as a hard disk that stores the large number of words searched by the unregistered word candidate search unit 105 as unregistered words.

The unregistered word candidate search unit 105 is assumed to search for unregistered words among the very large vocabulary stored in the unregistered word storage unit 106. As described later, it therefore preferably performs the search using a method that is simpler and faster (that is, requires less computation time) than that of the speech recognition unit 101.

[0051] The result display unit 107 is a display device such as a CRT display or a liquid crystal display. By displaying a screen showing the recognition result output from the unregistered word determination unit 104, or a screen showing the determination result and the search result for unregistered words, it shows the user whether the user's utterance has been recognized or is an unregistered word.

Next, the operation of the speech recognition apparatus 100 configured as described above will be described.

FIG. 2 is a flowchart showing the processing operation of the speech recognition apparatus 100.

First, when the speech recognition apparatus 100 receives speech uttered by the user (S10), the speech recognition unit 101 recognizes, from the words in the speech recognition vocabulary storage unit 102, words that resemble the input speech (S12). More specifically, the speech recognition unit 101 collates the standard pattern or word model of each word stored in the speech recognition vocabulary storage unit 102 with the input speech, calculates the similarity to the input speech for each word, and extracts the words with high similarity as candidates. At the same time, the speech recognition apparatus 100 uses the reference similarity calculation unit 103 to search for the subword sequence closest to the input speech and obtains its similarity as the reference similarity (S14).

Next, the speech recognition apparatus 100 uses the unregistered word determination unit 104 to compare the similarity of the first candidate word (the candidate word with the highest similarity) obtained by the speech recognition unit 101 with the reference similarity obtained by the reference similarity calculation unit 103, and determines whether the difference between them is within a predetermined threshold (S16). This predetermined threshold is used to decide whether the user's utterance is a registered word or an unregistered word. It is set in advance by obtaining the similarities from the speech recognition unit 101 and the reference similarity calculation unit 103 for a large number of sample utterances and determining the optimum threshold from their statistical distributions. If the difference between the similarity of the first candidate word from the speech recognition unit 101 and the reference similarity from the reference similarity calculation unit 103 is within the statistically predetermined threshold (Yes in S16), the unregistered word determination unit 104 determines that the user's utterance is a word included in the speech recognition vocabulary storage unit 102 (a registered word) (S18). The speech recognition apparatus 100 then presents the recognition result to the user via the result display unit 107 (S26) and ends the processing operation.

[0057] On the other hand, if the difference between the similarity of the first candidate word from the speech recognition unit 101 and the reference similarity from the reference similarity calculation unit 103 exceeds the statistically predetermined threshold (No in S16), the unregistered word determination unit 104 determines that the user's utterance is a word not included in the speech recognition vocabulary storage unit 102 (an unregistered word) (S20) and outputs the determination result to the unregistered word candidate search unit 105.
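The branch at S16 through S20 can be sketched in Python as follows. This is only a minimal illustration: the names `Decision` and `judge_utterance` are hypothetical, and the use of an absolute difference and an inclusive comparison (`<=`) are assumptions, since the text only says the comparison result must be "within" the threshold. The sample values mirror the FIG. 3 and FIG. 4 example given in this description (similarity 2041, reference similarity 2225, threshold 200).

```python
from dataclasses import dataclass


@dataclass
class Decision:
    is_registered: bool   # True -> S18 (registered word), False -> S20 (unregistered word)
    difference: float     # gap between reference similarity and first-candidate similarity


def judge_utterance(top_similarity: float,
                    reference_similarity: float,
                    threshold: float) -> Decision:
    """Compare the first candidate's similarity with the reference similarity (S16)."""
    # Assumption: the gap is taken as an absolute value and "within" means <=.
    difference = abs(reference_similarity - top_similarity)
    return Decision(is_registered=difference <= threshold, difference=difference)


# Values from the FIG. 3 / FIG. 4 example in this description.
decision = judge_utterance(top_similarity=2041,
                           reference_similarity=2225,
                           threshold=200)
```

For these values the gap is 184, which is within 200, so the utterance is treated as a registered word. A much lower first-candidate similarity against the same reference similarity would fall on the unregistered side.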

[0058] When the unregistered word determination unit 104 determines that the user's utterance is an unregistered word, the speech recognition apparatus 100 uses the unregistered word candidate search unit 105 to search for unregistered words based on the utterance (S22). At this time, the unregistered word candidate search unit 105 collates the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106 and obtains an unregistered word score, a score reflecting their similarity, in order to find unregistered words with high scores, that is, unregistered words likely to be what the user uttered. The unregistered word candidate search unit 105 then extracts a plurality of unregistered word candidates considered likely to be the user's utterance, for example in descending order of score (S24), and outputs them together with their unregistered word scores to the result display unit 107. The speech recognition apparatus 100 then presents the determination result, the extracted unregistered word candidates, and their unregistered word scores to the user via the result display unit 107 (S26) and ends the processing operation.
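The search in S22 and S24 can be sketched as below. This passage does not say how the unregistered word score is computed, so the sketch substitutes a normalized Levenshtein (edit) distance between symbol sequences as a stand-in; that choice, the function names, the romanized spellings, and the use of single letters in place of real phonemes or subwords are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unregistered_word_score(subwords: Sequence[str],
                            entry: Sequence[str]) -> float:
    """Stand-in score in [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(subwords), len(entry))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(subwords, entry) / longest


def search_candidates(subwords: Sequence[str],
                      store: Dict[str, Sequence[str]],
                      top_n: int = 3) -> List[Tuple[str, float]]:
    """Score every stored word (S22) and keep the top_n in descending order (S24)."""
    scored = [(word, unregistered_word_score(subwords, symbols))
              for word, symbols in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Illustrative store: letters of a romanized spelling stand in for phonemes.
store = {
    "Taro Matsushita": list("matsushitataro"),
    "Toru Matsushita": list("matsushitatoru"),
    "Hanako Yamada": list("yamadahanako"),
}
results = search_candidates(list("matsushitataro"), store, top_n=2)
```

Because several candidates are returned rather than a single best match, the scoring can be cheap and approximate, which is in line with the preference stated above for a simpler and faster method than the one used by the speech recognition unit 101.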

By the way, in general, the speech recognition device 100 defines a word to be recognized, that is, a speech recognition vocabulary according to an application that uses the speech recognition device 100 as an input device for a man-machine interface. For example, in the case of an application that searches for a program using voice recognition as an input means, the name of the program to be searched, the name of a performer who becomes a key in the search, and the like are defined as the voice recognition vocabulary.

Assuming such an application, the speech recognition apparatus 100 produces a different display depending on whether or not the user's utterance is a word included in the speech recognition vocabulary storage unit 102. When the utterance is a word included in the speech recognition vocabulary storage unit 102, the speech recognition unit 101, as described above, collates the input speech with the standard pattern or word model of each word stored in the speech recognition vocabulary storage unit 102, calculates the similarity for each word, obtains the top candidates in descending order of similarity, and outputs them to the result display unit 107.

As a specific example, FIG. 3 shows an example in which the user utters “Matsushita Taro” on the assumption that the word “Taro Matsushita” exists in the speech recognition vocabulary storage unit 102. At this time, the reference similarity calculation unit 103 searches for the subword sequence closest to the input speech and obtains its similarity as the reference similarity.

FIG. 4 shows an output example of the reference similarity calculation unit 103 for the utterance “Matsushita Taro” by the user.

In the example shown in FIG. 3 and FIG. 4, the difference between the similarity “2041” of the first candidate and the reference similarity “2225” is 184, which is smaller than the statistically calculated threshold (for example, “200”). Therefore, the unregistered word determination unit 104 determines that the user's utterance is a registered word. Since the utterance is not judged to be an unregistered word, the unregistered word candidate search unit 105 does not perform the unregistered word search and passes the recognition result as it is to the result display unit 107, and “Taro Matsushita” is correctly displayed as the recognition result via the result display unit 107. FIG. 5 shows an example of the result display in the result display unit 107.

[0065] The user who sees the recognition result in the format illustrated in FIG. 5 can know at a glance that his / her utterance content is a registered word.

[0066] On the other hand, even when the user's utterance is a word that does not exist in the speech recognition vocabulary storage unit 102, the speech recognition unit 101 collates the input with each word stored in the speech recognition vocabulary storage unit 102, calculates the similarity for each word, and outputs the top candidates in descending order of similarity. In this case, however, since the utterance is a word not included in the speech recognition vocabulary, none of these candidates matches the utterance, and the output is as shown in FIG. 6. Here, as in the case described above, the user's utterance is “Matsushita Taro”, but the word “Taro Matsushita” is not included in the speech recognition vocabulary storage unit 102.

[0067] At this time, the reference similarity calculation unit 103 searches for the subword sequence most similar to the input speech and calculates its similarity. This is not affected at all by whether or not the utterance is included in the speech recognition vocabulary. As a result, as shown in FIG. 7, the output of the reference similarity calculation unit 103 is the same as the output example for the case where the utterance is included in the speech recognition vocabulary (see FIG. 4).

Subsequently, the unregistered word determination unit 104 compares the similarity of the first candidate from the speech recognition unit 101 with the reference similarity from the reference similarity calculation unit 103, as described above. When the utterance is not included in the speech recognition vocabulary, the two similarities differ greatly and their difference exceeds the predetermined threshold; based on this, the unregistered word determination unit 104 determines that the utterance is an unregistered word. For example, in the example shown in FIG. 6 and FIG. 7, the similarity “1431” of the first candidate in the speech recognition unit 101 and the reference similarity “2225” differ greatly, and the difference exceeds the predetermined threshold. Therefore, the unregistered word determination unit 104 determines that the user's utterance is an unregistered word.
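The determination in S16 can be illustrated with a minimal sketch. It assumes that the comparison result is the plain difference between the reference similarity and the first-candidate similarity; the function name is hypothetical, and the threshold of 200 is only the example value mentioned in the description.

```python
def is_unregistered(first_candidate_similarity: float,
                    reference_similarity: float,
                    threshold: float = 200.0) -> bool:
    """Return True when the utterance is judged to be an unregistered word.

    Assumption: the comparison result is the difference between the
    reference similarity and the similarity of the first candidate.
    """
    difference = reference_similarity - first_candidate_similarity
    return difference > threshold


# Similarities quoted for FIG. 3/4 (registered) and FIG. 6/7 (unregistered).
print(is_unregistered(2041, 2225))  # difference 184 -> False
print(is_unregistered(1431, 2225))  # difference 794 -> True
```

In practice the threshold would be chosen from the statistical distributions of both similarities, as the description states, rather than fixed by hand.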

[0069] When the unregistered word determination unit 104 determines that the user's utterance is an unregistered word, the unregistered word candidate search unit 105 compares the subword sequence obtained by the reference similarity calculation unit 103 with each of the many unregistered words stored in the unregistered word storage unit 106, and calculates an unregistered word score, which is a score related to their similarity. The unregistered word candidate search unit 105 then extracts the top five candidates in descending order of unregistered word score and outputs them to the result display unit 107 together with their unregistered word scores.

[0070] FIG. 8 shows an example of the result of the unregistered word candidate search unit 105 searching for unregistered words based on the subword sequence “Matsushima Kanou” obtained by the reference similarity calculation unit 103 for the user's utterance “Matsushita Taro”. Here, “Taro Matsushita” is stored in the unregistered word storage unit 106.

[0071] The search result from the unregistered word candidate search unit 105 is thus sent to the result display unit 107 together with information indicating that these words are unregistered words, and the user is informed that the utterance was recognized as an unregistered word. For the example shown in FIG. 8, the result shown in FIG. 9 is output. The user who sees the recognition result in the format illustrated in FIG. 9 can know at a glance that his or her utterance was unknown to the system.

[0072] With this result display method, the user's utterance is displayed on the screen, so the user need not wonder whether the utterance was recognized correctly and can clearly see that the word is not included in the speech recognition vocabulary.

[0073] With this result display method, a plurality of words are displayed as unregistered word candidates, and the user needs to find the word he or she spoke among them. However, if the number of candidates output is small, this effort is small. Moreover, the purpose of displaying the unregistered word candidates is to indicate that the displayed word is an unregistered word and cannot be processed further; the user is not asked to go to the trouble of selecting the spoken word from among the candidates. Therefore, the disadvantage of displaying multiple candidates as unregistered word candidates is very small.

[0074] Further, from the viewpoint of implementing a speech recognition system, the fact that unregistered word candidates need not be narrowed down to one word means that the unregistered word candidate search unit 105 is not required to have high search accuracy, which can be a great merit, for example in keeping the hardware resources needed to achieve the search accuracy low. Even if the search accuracy is not very high, displaying multiple candidates means that the word spoken by the user is included among them with high probability. The user can thereby learn that the word is an unregistered word and that repeating the utterance would be useless.

Hereinafter, the operation of the unregistered word candidate search unit 105 will be described more specifically.

In the unregistered word candidate search unit 105 of the first embodiment, a value based on the phoneme edit distance is used to search for unregistered word candidates.

[0077] In this search method, when two words are each represented by a phoneme symbol string, the number of editing steps needed to rewrite the phoneme symbol string of one word into that of the other is counted.

An example of this is shown in FIG. 10. FIG. 10 shows the phoneme symbol string “ABDEF” (series 1) and the phoneme symbol string “AXBYDF” (series 2), and shows that rewriting series 2 into series 1 requires one editing operation each for insertion (insertion error), replacement (replacement error), and deletion (omission error). In other words, in the example shown in FIG. 10, the edit distance required to rewrite series 2 into series 1 is 3 (insertion 1 + replacement 1 + deletion 1).

[0079] The unregistered word candidate search unit 105 calculates the edit distance described above between the phoneme symbol string representation of the subword sequence obtained by the reference similarity calculation unit 103 and the phoneme symbol string of each word stored in the unregistered word storage unit 106, normalizes it by the length of the phoneme symbol string, and subtracts the normalized value from 1 to obtain the unregistered word score. The unregistered word candidate search unit 105 performs this process for all the words stored in the unregistered word storage unit 106, extracts unregistered word candidates in descending order of unregistered word score, and outputs the result to the result display unit 107. FIG. 8 is an example of the unregistered word candidates and unregistered word scores obtained in this way.
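The edit distance and the score derived from it can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes each phoneme symbol is one character, that every operation costs 1, and that the distance is normalized by the longer of the two strings (the description does not say which length is used). All function names are hypothetical.

```python
def edit_distance(source, target):
    """Unit-cost edit distance between two phoneme symbol strings."""
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (s != t),  # match or replacement
            ))
        previous = current
    return previous[-1]


def unregistered_word_score(recognized, entry):
    """1 minus the length-normalized edit distance (assumed normalization)."""
    longest = max(len(recognized), len(entry), 1)
    return 1.0 - edit_distance(recognized, entry) / longest


def search_candidates(recognized, unregistered_words, top_n=5):
    """Score every stored word and return the top_n as (word, score) pairs."""
    scored = [(w, unregistered_word_score(recognized, w))
              for w in unregistered_words]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


print(edit_distance("AXBYDF", "ABDEF"))  # 3, as in the FIG. 10 example
print(search_candidates("matsushimakanou",
                        ["matsushitatarou", "yamadahanako"]))
```

Because every stored word is scored with a small dynamic-programming table, the whole store can be scanned without running the acoustic matching again.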

[0080] The advantage of realizing the unregistered word search by comparing phoneme sequences in this way is that the entire unregistered word storage unit 106, in which a large number of words are stored, can be searched with light processing. This means that the computational resources required for the unregistered word search (calculation time, amount of memory required for computation, processor load, power consumption, etc.) are kept small. As a result, even on a device whose computing resources tend to be limited, such as a portable information terminal, unregistered word candidates can be searched for and displayed to the user in a short time, giving the user a responsive feel.

[0081] On the other hand, there is a concern that simplifying the search in this way may reduce search accuracy. However, as described above, a plurality of unregistered word candidates may be output. Outputting multiple candidates increases the probability that the word spoken by the user is included among them and compensates for the decrease in search accuracy. Further, executing the unregistered word search independently of the speech recognition unit 101 has the effect that the recognition process of the speech recognition unit 101 is not adversely affected.

[0082] In the first embodiment, the reference similarity calculation unit 103 is provided for unregistered word determination, but this is not an essential requirement. It is also possible to use other methods for determining unregistered words.

Further, instead of the speech recognition unit 101, the speech recognition vocabulary storage unit 102, and the reference similarity calculation unit 103 described in the first embodiment, an unknown utterance detection apparatus as shown in the drawings may be used for this purpose.

The speech segment pattern storage unit 111 stores speech segments of standard speech used for matching with the feature parameters of the input speech.

[0086] Here, the speech segments are a set of VC patterns, each concatenating the latter half of a vowel interval with the first half of the consonant interval that follows it, and CV patterns, each concatenating the latter half of a consonant interval with the first half of the vowel interval that follows it. However, the speech segments may instead be a set of phonemes, which roughly correspond to one letter of the alphabet when Japanese is written in Roman letters; a set of morae, which roughly correspond to one hiragana character when Japanese is written in hiragana; a set of subwords, meaning chains of multiple morae; or a mixture of these sets.

[0087] The word dictionary storage unit 112 stores a rule for synthesizing the word pattern of the speech recognition vocabulary by connecting the speech pieces.

[0088] The word matching unit 113 compares the input speech expressed in a time series of feature parameters with the synthesized word pattern, and calculates the likelihood corresponding to the similarity for each word.

[0089] The transition probability storage unit 114 stores transition probabilities that express, as continuous values, the naturalness of the connection when speech segments are arbitrarily combined. Here, the 2-gram probability of phonemes is used as the transition probability. The 2-gram probability of a phoneme means the probability P(y | x) that phoneme y follows the preceding phoneme x, and is obtained in advance from a large amount of Japanese text data. However, the transition probability may be a 2-gram probability of morae, a 2-gram probability of subwords, a 2-gram probability of a mixture of these, or a probability other than a 2-gram probability, such as a 3-gram probability.
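A phoneme 2-gram probability of this kind can be estimated by simple counting. The sketch below is a maximum-likelihood estimate without smoothing, which is an assumption; the description only says the probabilities are obtained in advance from text data. The function name and the toy corpus are hypothetical.

```python
from collections import Counter


def train_phoneme_bigram(phoneme_sequences):
    """Estimate P(y | x) for each adjacent phoneme pair (x, y) by counting."""
    pair_counts = Counter()
    context_counts = Counter()
    for sequence in phoneme_sequences:
        for x, y in zip(sequence, sequence[1:]):
            pair_counts[(x, y)] += 1
            context_counts[x] += 1  # x counted only when it has a successor
    return {(x, y): count / context_counts[x]
            for (x, y), count in pair_counts.items()}


# Toy corpus of two phoneme sequences (illustrative only).
corpus = [["t", "a", "r", "o"], ["t", "a", "k", "a"]]
probabilities = train_phoneme_bigram(corpus)
print(probabilities[("t", "a")])  # 1.0: "a" always follows "t" here
print(probabilities[("a", "r")])  # 0.5: "a" is followed by "r" or "k"
```

The same counting applies to morae or subwords by changing the unit of the input sequences.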

The speech sequence matching unit 115 calculates the maximum likelihood obtained, with the transition probability taken into account, between patterns formed by arbitrarily combining the speech segment patterns and the input speech expressed as a time series of feature parameters.

[0091] The candidate score difference calculation unit 116 calculates, from the likelihoods of the words calculated by the word matching unit 113, the difference between the likelihood of the word with the highest value (first candidate) and that of the word with the next highest value (second candidate), normalized by the word length.

[0092] The candidate-phoneme sequence similarity calculation unit 117 calculates the distance between the phoneme sequence of the first candidate and that of the second candidate in order to obtain the acoustic similarity between the first candidate and the second candidate.

The candidate/speech sequence score difference calculation unit 118 calculates the difference between the likelihood of the first candidate and the reference likelihood calculated by the speech sequence matching unit 115, normalized by the word length.

The candidate/speech sequence/phoneme sequence similarity calculation unit 119 calculates the acoustic similarity between the first candidate and the sequence determined to be optimal by the speech sequence matching unit 115 as the distance between their phoneme sequences.

When such an unknown utterance detection apparatus is used, the unregistered word determination unit 104 combines the values obtained by the candidate score difference calculation unit 116, the candidate-phoneme sequence similarity calculation unit 117, the candidate/speech sequence score difference calculation unit 118, and the candidate/speech sequence/phoneme sequence similarity calculation unit 119 to determine whether or not the input speech is an unregistered word. Statistically combining a plurality of measures for detecting unregistered words in this way improves the accuracy of unregistered word determination. Although four measures are used here in the unregistered word determination unit 104, other measures may also be used, such as the likelihood of each word candidate itself, its distribution, the amount of variation of the local score within the word section, and the duration information of the phonemes that make up the word.

[0096] In this case, a linear discriminant obtained in advance from a large number of recognition result cases is used as the method for determining an unregistered word from the plurality of measures. However, so-called learning machines such as neural networks, decision trees, and SVMs (support vector machines) are also effective.
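How a linear discriminant could combine the four measures is sketched below. The weights and bias are hypothetical placeholders chosen only so that the example runs; in the described apparatus they would be obtained in advance from a large number of recognition result cases. The class and field names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Measures:
    candidate_score_diff: float        # from unit 116
    candidate_phoneme_distance: float  # from unit 117
    reference_score_diff: float        # from unit 118
    reference_phoneme_distance: float  # from unit 119


# Hypothetical parameters; real values would be learned from data.
WEIGHTS = (-1.0, 0.5, 2.0, 1.5)
BIAS = -1.0


def is_unregistered(m: Measures) -> bool:
    """Linear discriminant: unregistered when the weighted sum is positive."""
    features = (m.candidate_score_diff, m.candidate_phoneme_distance,
                m.reference_score_diff, m.reference_phoneme_distance)
    value = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return value > 0.0


print(is_unregistered(Measures(0.5, 0.1, 0.05, 0.0)))  # False
print(is_unregistered(Measures(0.05, 0.6, 0.8, 0.7)))  # True
```

Replacing the weighted sum with a trained neural network, decision tree, or SVM would leave the surrounding interface unchanged.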

Further, the unregistered word search method in the unregistered word candidate search unit 105 has been described as based on the edit distance between phoneme sequences. As the definition of the edit distance between phonemes, rather than setting every insertion error, omission error, and replacement error to an edit distance of “1”, it is also effective to use as the distance a continuous value based on experimentally obtained error occurrence probabilities.
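An edit distance with continuous costs can be sketched by passing cost functions to the dynamic program. The cost table below is purely illustrative: it assumes that some phoneme pairs are confused more often and are therefore cheaper to replace. The description does not specify how error probabilities are mapped to costs, so that mapping is left out.

```python
def weighted_edit_distance(source, target, sub_cost, ins_cost, del_cost):
    """Edit distance where each operation has its own continuous cost."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost(source[i - 1])
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost(target[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost(source[i - 1]),
                dist[i][j - 1] + ins_cost(target[j - 1]),
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


# Hypothetical costs: frequently confused pairs are cheaper to replace.
CONFUSABLE = {frozenset(("m", "n")): 0.2, frozenset(("t", "k")): 0.4}


def sub_cost(a, b):
    if a == b:
        return 0.0
    return CONFUSABLE.get(frozenset((a, b)), 1.0)


def unit_cost(_symbol):
    return 1.0


print(weighted_edit_distance("m", "n", sub_cost, unit_cost, unit_cost))  # 0.2
print(weighted_edit_distance("tama", "kana", sub_cost, unit_cost, unit_cost))
```

With all costs set to 1 this reduces to the ordinary edit distance.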

Further, data in the same format as that of the speech recognition vocabulary storage unit 102 may be stored in the unregistered word storage unit 106, and the unregistered word candidate search unit 105 may, like the speech recognition unit 101, collate words directly against the input speech parameters and output unregistered word candidates and their unregistered word scores. With such a configuration, the resources required for the unregistered word search increase, but the search accuracy for unregistered words improves. Even in this case, the effect that is a feature of the present invention, namely that the recognition rate for the target words is not reduced, is maintained.

Furthermore, although the description has assumed that the words included in the unregistered word storage unit 106 and the words included in the speech recognition vocabulary storage unit 102 do not overlap, the unregistered word storage unit 106 may contain words that are also included in the speech recognition vocabulary storage unit 102. In that case, when the unregistered word candidate search unit 105 finds a word that is included in the speech recognition vocabulary storage unit 102, that word may be excluded before output to the result display unit 107. This makes it possible to determine the vocabulary of the unregistered word storage unit 106 regardless of the contents of the speech recognition vocabulary storage unit 102, which makes the unregistered word storage unit 106 easier to maintain.

[0100] In Embodiment 1, the description has assumed that the input utterance is a word utterance, but the input utterance may be a sentence utterance. In this case, the unregistered word determination unit 104 needs to determine whether or not an unregistered word is included in the sentence utterance and, if so, at which position; the other operations are exactly the same.

[0101] Further, in Embodiment 1, the unregistered word candidate search unit 105 has been described as outputting five unregistered word candidates, but it is effective to change this number according to the unregistered word search accuracy of the unregistered word candidate search unit 105, and the unregistered word candidate search unit 105 may vary the number of candidates output according to the similarity of each unregistered word. Accordingly, depending on the search accuracy of the unregistered word candidate search unit 105 or the unregistered word scores of the retrieved unregistered words, the number of unregistered word candidates output may be one. This configuration has the effect of not imposing an unnecessary load on the user when the user judges whether the spoken word is in the candidate list.

[0102] Furthermore, although the output example shown in FIG. 9 displays all unregistered word candidates in the same manner, the result display unit 107 may emphasize the candidates most likely to be the user's utterance by changing the font size, using bold type, or changing the color according to the unregistered word scores of the candidates. This reduces the load on the user when searching the list for the spoken word.

[Embodiment 2]

Next, a speech recognition apparatus according to Embodiment 2 of the present invention will be described.

FIG. 12 is a block diagram showing a functional configuration of the speech recognition apparatus according to the second embodiment.

As shown in FIG. 12, the speech recognition apparatus 200 is the same as the speech recognition apparatus 100 according to the first embodiment in that it includes the speech recognition unit 101, the speech recognition vocabulary storage unit 102, the reference similarity calculation unit 103, the unregistered word determination unit 104, the unregistered word candidate search unit 105, and the result display unit 107. However, the speech recognition apparatus 200 according to the second embodiment differs from the speech recognition apparatus 100 according to the first embodiment in that it includes an unregistered word class determination unit 201 and an unregistered word class-specific word storage unit 202. The following description focuses on this difference. The same reference numerals are assigned to the same parts as in the first embodiment, and their description is omitted.

[0106] The unregistered word class determination unit 201 is a processing unit that, when the spoken word is an unregistered word, determines which category the unregistered word belongs to based on the content of the user's utterance and the usage status of the system.

The unregistered word class-specific word storage unit 202 is a storage device, such as a hard disk, that stores unregistered words classified into categories.

Next, the operation of speech recognition apparatus 200 according to Embodiment 2 will be described.

Also in the second embodiment, when the user's utterance content is a word included in the speech recognition vocabulary storage unit 102, the operation is the same as that shown in the first embodiment.

[0110] When the user's utterance is an unregistered word, the unregistered word determination unit 104 performs the unregistered word determination based on the reference similarity from the reference similarity calculation unit 103. At the same time, the unregistered word class determination unit 201 determines which category the unregistered word belongs to. Here, the unregistered word categories are, for example, proper person names such as celebrity names, proper title names such as program titles, and proper place names such as tourist spots, as illustrated in the drawings. The method by which the unregistered word class determination unit 201 determines the unregistered word category will be described later.

[0111] When the word spoken by the user has been estimated to be an unregistered word and its category has been estimated, the unregistered word candidate search unit 105 searches for unregistered words. At this time, the unregistered word candidate search unit 105 narrows the search range within the unregistered word class-specific word storage unit 202 based on the class determination result from the unregistered word class determination unit 201 and then performs the search. When the speech recognition apparatus 200 has acquired unregistered word candidates in this manner, the candidates are presented to the user via the result display unit 107 as in the first embodiment.
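Narrowing the search range by class can be sketched as a lookup keyed by category followed by ranking within that category only. The dictionary layout, the class labels, and the toy scoring function (a count of matching positions, standing in for the unregistered word score) are all assumptions made for illustration.

```python
def matching_positions(a, b):
    """Toy stand-in for the unregistered word score (illustrative only)."""
    return sum(x == y for x, y in zip(a, b))


def search_in_class(recognized, words_by_class, word_class, score, top_n=5):
    """Rank only the words stored under the determined class."""
    words = words_by_class.get(word_class, [])
    ranked = sorted(words, key=lambda w: score(recognized, w), reverse=True)
    return ranked[:top_n]


# Hypothetical contents of the class-specific word store.
words_by_class = {
    "person": ["matsushitatarou", "yamadahanako"],
    "program": ["matsushitanews"],
}

print(search_in_class("matsushimakanou", words_by_class, "person",
                      matching_positions, top_n=1))
```

Because only one category is scanned, words from other categories can never appear among the candidates, and the amount of scoring work shrinks accordingly.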

[0112] Here, the operation of the unregistered word class determination unit 201 will be described in detail.

[0113] When the user's utterance is a sentence utterance, the unregistered word category can be determined from the information before and after the unregistered word in the recognized sentence. For example, if the user's utterance is “I want to see the programs in which ○○ appears”, “○○” is considered to be an unregistered word of the proper person name class, and for the utterance “Record △△”, “△△” is considered to be an unregistered word of the program title class. A class N-gram language model including unregistered word classes can be used as a model for estimating the word class at the target position from the context before and after it in the sentence. FIG. 14 shows the functional configuration of the unregistered word class determination unit when a class N-gram language model including unregistered word classes is used.

As shown in FIG. 14, when a class N-gram language model is used, the unregistered word class determination unit 201a includes a word string hypothesis generation unit 211, a class N-gram generation/storage unit 221, and a class-dependent word N-gram generation/storage unit 231.

[0115] The word string hypothesis generation unit 211 refers to the class N-gram, which evaluates sequences of words and unregistered word classes, and the class-dependent word N-gram, which evaluates the word strings that make up an unregistered word class, generates word string hypotheses from the word matching results, and obtains a recognition result.

[0116] The class N-gram generation/storage unit 221 generates a class N-gram for assigning a language likelihood, which is the logarithm of a linguistic probability, to a context including an unregistered word class, and stores the generated class N-gram.

[0117] The class-dependent word N-gram generation/storage unit 231 generates a class-dependent word N-gram for assigning a language likelihood, which is the logarithm of a linguistic probability, to a word sequence within an unregistered word class, and stores the generated class-dependent word N-gram.

FIG. 15 shows a functional configuration of the class N gram generation / storage unit 221.

As shown in FIG. 15, the class N-gram generation/storage unit 221 comprises a sentence expression corpus storage unit 222 in which a large number of sentence expressions to be recognized are stored in advance as text; a sentence expression morphological analysis unit 223 that performs morphological analysis of the sentence expressions; a class N-gram generation unit 224 that generates a class N-gram by obtaining, from the morphological analysis result and with reference to the word string class definition, statistics on chains of words and unregistered word classes; and a class N-gram storage unit 225 that stores the class N-gram and outputs it to the word string hypothesis generation unit 211.

[0120] The sentence expression corpus storage unit 222 stores in advance a large data library of sentence expressions to be recognized.

[0121] The sentence expression morphological analysis unit 223 analyzes relatively long sentence expressions stored in the sentence expression corpus storage unit 222, such as “Record the weather forecast for tomorrow”, into morphemes, which are the smallest language units that carry meaning.

[0122] The class N-gram generation unit 224 extracts the word strings included in the text that has been parsed into morphemes and refers to the unregistered word classes input from the class-dependent word N-gram generation/storage unit 231 described later. If a corresponding unregistered word class exists, it replaces the word string of that unregistered word class in the text with a virtual word, obtains statistics on the chains of words and unregistered word classes, and thereby generates a class N-gram that associates chains of words or unregistered word classes with their probabilities. The class N-gram generated by the class N-gram generation unit 224 is stored in the class N-gram storage unit 225.
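The replacement of class word strings by a virtual word and the subsequent chain counting can be sketched as follows. The sketch assumes the text is already split into tokens and that the class definition maps token tuples to a class tag; the tag name, the longest-match strategy, and the English token order are illustrative assumptions.

```python
from collections import Counter


def replace_class_words(tokens, class_definitions):
    """Replace each token string defined as a class with its class tag."""
    max_len = max((len(key) for key in class_definitions), default=0)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible match first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            tag = class_definitions.get(tuple(tokens[i:i + n]))
            if tag is not None:
                out.append(tag)
                i += n
                break
        else:  # no class string starts here; keep the token as it is
            out.append(tokens[i])
            i += 1
    return out


def count_ngrams(sentences, n=3):
    """Count chains of n consecutive words or class tags."""
    counts = Counter()
    for sentence in sentences:
        for i in range(len(sentence) - n + 1):
            counts[tuple(sentence[i:i + n])] += 1
    return counts


definitions = {("MMM", "weather", "forecast"): "<program_title>"}
sentence = replace_class_words(
    ["record", "MMM", "weather", "forecast", "tomorrow"], definitions)
print(sentence)
print(count_ngrams([sentence], n=3))
```

Dividing each chain count by the count of its preceding context would then give the conditional probabilities stored as the class N-gram.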

[0123] By measuring the frequency of each word chain in this way, conditional probabilities can be calculated, and, because an unregistered word class can be treated virtually as a single word, the result is a language model with a conditional probability for each word.

Next, FIG. 16 shows a functional configuration of the class-dependent word N-gram generation/storage unit 231.

As shown in FIG. 16, the class-dependent word N-gram generation/storage unit 231 comprises a class corpus storage unit 232, a class morpheme analysis unit 233, a class-dependent word N-gram generation unit 234, a class-dependent word N-gram storage unit 235, an unregistered word class definition generation unit 236, and an unregistered word class definition storage unit 237.

[0126] The class corpus storage unit 232 stores in advance a data library of unregistered words (for example, a title of a TV program, a person's name, etc.) having the same semantic properties and syntactic properties.

[0127] The class morpheme analysis unit 233 performs morphological analysis on the class corpus. Specifically, the class morpheme analysis unit 233 analyzes, in morpheme units, relatively short unregistered words having common properties, such as the TV program name “MMM weather forecast”, stored in the class corpus storage unit 232.

[0128] The class-dependent word N-gram generation unit 234 processes the morphological analysis result, obtains statistics on word chains, and generates a class-dependent word N-gram, which is information associating word strings with their probabilities.

[0129] The class-dependent word N-gram storage unit 235 stores the class-dependent word N-gram generated by the class-dependent word N-gram generation unit 234. The class-dependent word N-gram stored in the class-dependent word N-gram storage unit 235 is referred to by the word string hypothesis generation unit 211 during speech recognition.

[0130] The unregistered word class definition generation unit 236 generates, from the morphological analysis result of the class corpus, a definition of an unregistered word class in which unregistered words having common properties are defined as a class. In other words, it performs morphological analysis on unregistered words with common properties and generates a class definition in which the resulting word strings are the word strings of the unregistered word class.

[0131] The unregistered word class definition storage unit 237 stores the unregistered word class definition generated by the unregistered word class definition generation unit 236. This unregistered word class definition is referenced by the class N gram generation unit 224 of the class N gram generation storage unit 221 when generating the above class N gram.

[0132] In the class N-gram language model used in the unregistered word class determination unit 201a having the above configuration, the probability that a word sequence $W_1, W_2, \ldots, W_I$ consisting of I words is generated is, in general, formulated as follows using the probability of n-chains of word classes:

[Equation 1]

$$P(W_1, W_2, \ldots, W_I) = \prod_{j=1}^{I} P(C_j \mid C_{j-n+1}, \ldots, C_{j-1}) \cdot P(W_j \mid C_j)$$

where $W_1, W_2, \ldots, W_I$ represent the individual words, and $C_1, C_2, \ldots, C_I$ are the classes to which the corresponding words belong.

[0133] Therefore, $P(C_j \mid C_{j-n+1}, \ldots, C_{j-1})$ means the probability that an n-chain of word classes occurs, and $P(W_j \mid C_j)$ means the probability that the specific word $W_j$ occurs from the class $C_j$. Here, a class means a group that takes the connectivity of words into account, such as the part of speech of a word or a finer subdivision of it.

[0134] When such a general class N-gram language model is used, P(W | C) cannot be obtained in advance for the unregistered word class, because W is an unregistered word. One way of obtaining a model that has P(W | C) for unregistered words is the method shown in Figs. 14 to 16, in which the unregistered word W is modeled as a chain of smaller basic words (see Japanese Patent Application No. 2003-276844, "Continuous Speech Recognition Device and Continuous Speech Recognition Method").

[0135] When such a model is used to determine the category of unregistered words, a class such as an "unregistered word proper person name class" or an "unregistered word unique program name class" is defined for each category of unregistered words to be determined, and the language model is trained with these classes.

[0136] Fig. 17 shows an example of a language model with n = 3 obtained by such training. Using the language model shown in this example, the probability of occurrence of "recording ○○", if "○○" is considered a word of the unique program name class, is

[Equation 2]

$$P(\langle\text{action "recording"}\rangle \mid \langle\text{unregistered word unique program name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{recording} \mid \langle\text{action "recording"}\rangle) = 0.8 \times 0.35 = 0.28.$$

[0137] On the other hand, if "○○" is considered a word of the proper person name class,

[Equation 3]

$$P(\langle\text{action "recording"}\rangle \mid \langle\text{unregistered word proper person name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{recording} \mid \langle\text{action "recording"}\rangle) = 0.2 \times 0.35 = 0.07.$$

[0138] That is, in this case, since the occurrence probability is higher when "○○" is considered a word of the unique program name class, "○○" can be determined to belong to the unique program name class.

[0139] In exactly the same way, the probability of occurrence of "△△ appearance", if "△△" is considered a word of the unique program name class, is

[Equation 4]

$$P(\langle\text{action "appearance"}\rangle \mid \langle\text{unregistered word unique program name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{appearance} \mid \langle\text{action "appearance"}\rangle) = 0.1 \times 0.35 = 0.035.$$

[0140] On the other hand, when "△△" is considered a word of the proper person name class,

[Equation 5]

$$P(\langle\text{action "appearance"}\rangle \mid \langle\text{unregistered word proper person name}\rangle, \langle\text{case particle}\rangle) \cdot P(\text{appearance} \mid \langle\text{action "appearance"}\rangle) = 0.7 \times 0.35 = 0.245.$$

[0141] That is, since the probability of occurrence is higher when "△△" is considered a word of the proper person name class, "△△" can be determined to belong to the proper person name class.
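The two comparisons above can be written as a short calculation. The following is a minimal sketch, assuming only the probability values quoted in the text for Fig. 17 (0.8, 0.2, 0.1, 0.7, and 0.35). The names `CLASS_TRIGRAM`, `WORD_GIVEN_CLASS`, and `classify_unregistered_word` are hypothetical and are not part of the described apparatus.

```python
# P(<action> | <unregistered word class>, <case particle>), limited to the
# values quoted in the text for Fig. 17. The dictionary keys are hypothetical labels.
CLASS_TRIGRAM = {
    ("program_name", "recording"): 0.8,
    ("person_name", "recording"): 0.2,
    ("program_name", "appearance"): 0.1,
    ("person_name", "appearance"): 0.7,
}

# P(word | <action class>), as quoted in the text.
WORD_GIVEN_CLASS = {"recording": 0.35, "appearance": 0.35}


def classify_unregistered_word(action):
    """Return (best_class, scores) for an unregistered word followed by `action`."""
    scores = {
        cls: CLASS_TRIGRAM[(cls, action)] * WORD_GIVEN_CLASS[action]
        for cls in ("program_name", "person_name")
    }
    # The class with the higher occurrence probability is selected.
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    for action in ("recording", "appearance"):
        print(action, classify_unregistered_word(action))
```

With these values, "recording" selects the program name class and "appearance" selects the person name class, matching Equations 2 to 5.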

[0142] Thus, according to the speech recognition apparatus 200 of the second embodiment, if, for example, the user's utterance is "a program in which 'Taro Matsushita' appears today" and "Taro Matsushita" is an unregistered word, "Taro Matsushita" is estimated to be an unregistered word belonging to the proper person name class, and unregistered word candidates are searched for in the unregistered word storage unit for the proper person name class within the class-specific unregistered word storage unit 202. Then, a plurality of unregistered word candidates acquired as a result of the search are presented to the user via the result display unit 107, and a response as shown in FIG.

[0143] On the other hand, if the user's utterance is "record tomorrow's 'Shoot the Sun'" and "Shoot the Sun" is an unregistered word, "Shoot the Sun" is estimated to be an unregistered word belonging to the unique program name class, and unregistered word candidates are searched for in the unregistered word storage unit for the unique program name class within the class-specific unregistered word storage unit 202. Then, a plurality of unregistered word candidates acquired as a result of the search, for example word candidates of the unregistered word unique program name class as shown in FIG. 18, are presented to the user via the result display unit 107.

[0144] Narrowing the search range according to the category of the unregistered word in this way has the following effects. First, it prevents words of a category the user did not intend from being presented as unregistered word candidates, for example a program name being presented in response to a person search, and thus prevents the user from being confused. Second, the search for unregistered words is performed over a narrowed search range.

[0145] In the second embodiment, a determination method using a class N-gram language model has been shown as an example of how the unregistered word class determination unit determines the category of an unregistered word. In addition to this method, when a speech recognition system equipped with this unregistered word presenting means is used as the input interface of a spoken dialogue system, a method using the context information of the dialogue is also possible. In this method, the dialogue management unit of the spoken dialogue system generates, from the dialogue history information, estimated information on the word categories that the user is likely to utter, and transmits it to the unregistered word class determination unit. The unregistered word class determination unit determines the category of the unregistered word from the transmitted estimated information on the word category. FIG. 19 shows a block diagram of the unregistered word class determination unit in such a configuration.
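The dialogue-context method can be illustrated as follows. This is a minimal sketch under a strong simplifying assumption: the category the user is likely to utter is taken to be the category elicited by the system's most recent prompt. The class `DialogManager` and the function `determine_unregistered_word_class` are hypothetical names, and the text does not specify how the dialogue management unit actually forms its estimate.

```python
from dataclasses import dataclass, field


@dataclass
class DialogManager:
    """Hypothetical dialogue manager that records which category each prompt elicits."""
    history: list = field(default_factory=list)

    def ask(self, expected_category):
        # Record the word category that the system's prompt is expected to elicit.
        self.history.append(expected_category)

    def estimate_category(self):
        # Assumption: the most recent prompt determines the likely category.
        return self.history[-1] if self.history else None


def determine_unregistered_word_class(estimated_category, default="unknown"):
    """Adopt the estimate received from the dialogue manager as the word class."""
    return estimated_category if estimated_category is not None else default


if __name__ == "__main__":
    manager = DialogManager()
    manager.ask("person_name")  # e.g. the system asked which performer to search for
    print(determine_unregistered_word_class(manager.estimate_category()))
```

Because the estimate comes from the dialogue state rather than from the surrounding words, this kind of determination does not require the utterance to be a full sentence.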

[0146] In this case, the unregistered word class determination unit 201b includes a word category information receiving unit 241 that acquires the category of the spoken word, and an unregistered word class determining unit 242 that determines, based on the category acquired by the word category information receiving unit 241, the category of a word determined to be an unregistered word. When the category of an unregistered word is determined using the class N-gram language model, the expected input utterance must be a sentence utterance. By contrast, with this configuration, which uses estimation results from an application such as the dialogue management unit, the category can be determined even if the input speech is a single-word utterance.

[Embodiment 3]

[0147] Subsequently, a speech recognition apparatus according to Embodiment 3 of the present invention will be described.

[0148] FIG. 20 is a block diagram showing a functional configuration of the speech recognition apparatus according to the third embodiment.

[0149] As shown in FIG. 20, the speech recognition apparatus 300 is common to the speech recognition apparatus 100 and the like according to the first embodiment and the like in that it includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, an unregistered word candidate search unit 105, and a result display unit 107. However, the speech recognition apparatus 300 according to the third embodiment differs from the speech recognition apparatus 100 and the like according to the first embodiment and the like in that it includes an unregistered word storage unit 301 connected to an unregistered word server 303 via a network 302. In the following, this difference will be mainly described. Note that the same reference numerals are assigned to the same parts as those in the first embodiment and the description thereof is omitted.

[0150] The unregistered word storage unit 301 stores a large number of unregistered words that are the search targets of the unregistered word candidate search unit 105, and also has a function of updating the stored information via communication means.

[0151] The network 302 is a communication network such as the Internet or a telephone line.

[0152] The unregistered word server 303 is a server device that stores the latest necessary unregistered words and provides this information to a client (in this case, the speech recognition apparatus 300).

[0153] Next, the operation of the speech recognition apparatus 300 configured as described above will be described.

[0154] The output flow of the speech recognition apparatus for the user's utterance in the third embodiment is the same as that shown in the first embodiment. The difference in the third embodiment lies in the method of maintaining the unregistered word storage unit 301 referred to by the unregistered word candidate search unit 105.

[0155] In the third embodiment, the unregistered word storage unit 301 can be updated at any time. If words that change and increase daily, such as proper person names and unique program names, are held in a fixed form, the word spoken by the user may not be found when unregistered word candidates are searched. For example, at the time of program schedule changes in television broadcasting or at the start of a new season in professional sports, new titles, new entertainers, and new athlete names appear, and these become unregistered words.

[0156] Therefore, the words stored in the unregistered word storage unit 301 are made updatable. By storing these new unregistered words in the unregistered word storage unit 301, it is possible to avoid a situation in which the word spoken by the user cannot be found during the unregistered word candidate search.

[0157] The update operation of the words stored in the unregistered word storage unit 301 is specifically performed as follows.

[0158] Dates on which unregistered words are expected to increase rapidly are registered in advance in the unregistered word storage unit 301. When such a date arrives, an unregistered word update request is automatically transmitted to the unregistered word server 303 via the network 302, such as a telephone line or the Internet. Alternatively, the unregistered word storage unit 301 may issue update requests regularly according to a predetermined schedule, or a user who feels that the registered unregistered words are insufficient may make an update request, whereupon the update request is transmitted from the unregistered word storage unit 301 to the unregistered word server 303. Furthermore, instead of the unregistered word storage unit 301 always actively transmitting an update request to the unregistered word server 303, the unregistered word server 303 may detect that a certain amount of unregistered words has been added and transmit update information to the unregistered word storage unit 301 of each client. The unregistered word server 303, having received the update request or having determined that an update is necessary because the new unregistered words have reached the specified amount, returns the information about the added words to the unregistered word storage unit 301 of the client.
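The update triggers described above (a pre-registered date, a user request, and a server-side threshold) can be sketched as follows. This is an illustrative sketch, not the described implementation. The classes `UnregisteredWordServer` and `UnregisteredWordStore`, the `push_threshold` parameter, and the assumption that the server only appends words (so a client can identify new words by count) are all hypothetical.

```python
from datetime import date


class UnregisteredWordServer:
    """Hypothetical stand-in for the unregistered word server 303."""

    def __init__(self, push_threshold=3):
        self.words = []  # all words, in order of addition (append-only assumption)
        self.push_threshold = push_threshold

    def add_word(self, word):
        self.words.append(word)

    def words_since(self, known_count):
        # Return only the words added after the client's last update.
        return self.words[known_count:]

    def should_push(self, known_count):
        # Server-initiated update: enough new words have accumulated.
        return len(self.words) - known_count >= self.push_threshold


class UnregisteredWordStore:
    """Hypothetical stand-in for the unregistered word storage unit 301."""

    def __init__(self, update_dates):
        self.words = []
        self.update_dates = set(update_dates)  # dates registered in advance

    def needs_update(self, today, user_requested=False):
        # Client-initiated update: a registered date arrived or the user asked.
        return user_requested or today in self.update_dates

    def update_from(self, server):
        # Fetch the added words and store them locally.
        new_words = server.words_since(len(self.words))
        self.words.extend(new_words)
        return new_words


if __name__ == "__main__":
    server = UnregisteredWordServer(push_threshold=2)
    store = UnregisteredWordStore([date(2005, 4, 1)])
    server.add_word("New Drama Title")
    if store.needs_update(date(2005, 4, 1)):
        print(store.update_from(server))
```

Keeping both `needs_update` (client side) and `should_push` (server side) reflects the two directions of initiation mentioned in the paragraph; a real system would also need to handle deletions and network failures, which this sketch omits.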

[0159] In this way, the unregistered words need to be properly maintained only on the unregistered word server 303, and as long as each client is provided with communication means for accessing the unregistered word server 303, its unregistered word storage unit 301 can always be kept in an optimum state.

[0160] In addition, by providing new unregistered words from an external server in this way, the unregistered word storage unit 301 can be kept in an optimal state without requiring the user to register in it the unregistered words, such as proper person names and unique titles, that increase daily.

[0161] In the third embodiment, the unregistered word storage unit 301 updates its stored words from the unregistered word server 303, which is dedicated to the update task and is optimally maintained. However, it is also possible to update the words using information held by a server that is not specialized for the update task.

[0162] For example, in television broadcasting, an electronic program guide called an EPG (Electronic Program Guide) is transmitted along with the broadcast wave. The performer names and program titles recorded in it can be extracted automatically and stored in the unregistered word storage unit 301. Similarly, there are many websites that contain information about entertainers or about programs, and it is also possible to collect the necessary information by visiting these sites in turn and to store it in the unregistered word storage unit 301. Furthermore, genres that the user is unlikely to utter, for example genres such as professional baseball player names, foreign movie actor names, or Japanese movie titles, may be extracted in advance from the user's past unregistered word reference history, and the unregistered words of these extracted genres need not be acquired from the unregistered word server 303. This prevents the unregistered word storage unit 301 from growing unnecessarily large.
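The extraction and genre filtering described above can be sketched as follows. This is a minimal illustration only: the dictionary fields `title`, `genre`, and `performers` are hypothetical and do not correspond to any actual EPG data format, and `collect_unregistered_words` is an assumed helper name.

```python
def collect_unregistered_words(epg_entries, excluded_genres):
    """Collect program titles and performer names from EPG-like entries.

    Entries whose genre is in `excluded_genres` (genres the user is judged
    unlikely to utter) are skipped, and duplicates are stored only once.
    """
    collected = []
    seen = set()
    for entry in epg_entries:
        if entry["genre"] in excluded_genres:
            continue
        for word in [entry["title"], *entry["performers"]]:
            if word not in seen:
                seen.add(word)
                collected.append(word)
    return collected


if __name__ == "__main__":
    entries = [
        {"title": "Show A", "genre": "drama", "performers": ["P1", "P2"]},
        {"title": "Game B", "genre": "baseball", "performers": ["P3"]},
    ]
    print(collect_unregistered_words(entries, excluded_genres={"baseball"}))
```

Filtering by genre before storing, rather than after, is what keeps the local word store from growing with entries the user is not expected to need.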

[0163] Further, a modification in which the words stored in the speech recognition vocabulary storage unit 102 are updated is also conceivable. As a specific example of this modification, a server (not shown) may select words that are likely to be spoken by the user in the near future and update the contents of the speech recognition vocabulary storage unit 102 with the selected words. For example, when the speech recognition apparatus 300 is applied to a recording reservation system, the performer names and program titles recorded in the EPG described above that relate to programs scheduled to be broadcast within one week can suitably be used as such words. The server then generates the information used by the speech recognition unit 101 for recognizing the extracted words, and updates the contents of the speech recognition vocabulary storage unit 102 with the generated information.

[0164] Such an update operation can be performed in exactly the same manner as the operation of updating the contents of the unregistered word storage unit 301 from the unregistered word server 303 via the network 302. Preferably, the information for recognizing words related to programs whose scheduled broadcast has already passed is deleted every day, and the information for recognizing words related to programs scheduled to be broadcast one week ahead is added.

[0165] According to this configuration, when the usage frequency of registered words shows a known variation over time, recognition information (speech recognition vocabulary) is supplied from outside in step with that variation, so that only a relatively small amount of recognition information that is expected to be used frequently is stored in the speech recognition vocabulary storage unit 102. This makes it easier to shorten the recognition time and to obtain a good recognition rate.
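The daily rolling update of the recognition vocabulary can be sketched as follows. This is an illustrative sketch under stated assumptions: the vocabulary is modeled as a mapping from each word to the latest broadcast date that justifies keeping it, the schedule is a list of `(broadcast_date, words)` pairs, and the function name `refresh_vocabulary` and the `window_days` parameter are hypothetical.

```python
from datetime import date, timedelta


def refresh_vocabulary(vocabulary, schedule, today, window_days=7):
    """Return the recognition vocabulary for the window [today, today + window_days].

    `vocabulary` maps each word to the latest related broadcast date;
    `schedule` is a list of (broadcast_date, words) pairs.
    """
    horizon = today + timedelta(days=window_days)
    # Delete words whose related broadcasts have all passed.
    refreshed = {w: d for w, d in vocabulary.items() if d >= today}
    # Add words for programs scheduled inside the window.
    for broadcast_date, words in schedule:
        if today <= broadcast_date <= horizon:
            for w in words:
                refreshed[w] = max(refreshed.get(w, broadcast_date), broadcast_date)
    return refreshed


if __name__ == "__main__":
    today = date(2005, 6, 2)
    vocab = {"Old Show": date(2005, 6, 1), "Current Show": date(2005, 6, 3)}
    schedule = [(date(2005, 6, 9), ["New Show"])]
    print(sorted(refresh_vocabulary(vocab, schedule, today)))
```

Running this once per day keeps the vocabulary limited to words tied to the coming week, which is the property the paragraph relies on to keep recognition fast.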

[Embodiment 4]

[0166] Subsequently, a speech recognition apparatus according to Embodiment 4 of the present invention will be described.

[0167] FIG. 21 is a block diagram showing a functional configuration of the speech recognition apparatus according to Embodiment 4 of the present invention.

[0168] As shown in FIG. 21, the speech recognition apparatus 400 is common to the speech recognition apparatus 100 and the like according to the first to third embodiments in that it includes a speech recognition unit 101, a speech recognition vocabulary storage unit 102, a reference similarity calculation unit 103, an unregistered word determination unit 104, and a result display unit 107. However, the speech recognition apparatus 400 according to the fourth embodiment differs from the speech recognition apparatus 100 and the like according to the first to third embodiments in that it includes an unregistered word search request transmission/reception unit 401 connected to an unregistered word search server 403 via a network 402. Hereinafter, this difference will be mainly described. Note that the same reference numerals are given to the same portions as those in the first embodiment and the description thereof is omitted.

[0169] The unregistered word search request transmission/reception unit 401 is a processing unit that transmits an unregistered word search request to the unregistered word search server 403 via the network 402 and receives the result of the unregistered word search performed by the unregistered word search server 403, and is realized by a communication interface or the like. When a search for an unregistered word is necessary, the unregistered word search request transmission/reception unit 401 sends information indicating the utterance content of the unregistered word portion, such as the subword sequence obtained by the reference similarity calculation unit 103 as described in Embodiment 1 or the parameters of the unregistered word portion of the input speech, to the unregistered word search server 403 via the network 402, and outputs the reply from the unregistered word search server 403 to the result display unit 107 as the unregistered word search result.

[0170] The network 402 is a communication network such as the Internet or a telephone line.

[0171] The unregistered word search server 403 is a server device that searches for unregistered words in response to a request from the client (speech recognition apparatus 400), and includes an unregistered word search unit 404 and an unregistered word storage unit 405.

[0172] The unregistered word search unit 404 is a processing unit that performs the unregistered word search. It also has a communication function for receiving information about unregistered words from the client via the network 402 and for returning the search result via the network 402.

[0173] The unregistered word storage unit 405 is a storage device, such as a hard disk, that stores information related to unregistered words.

[0174] Next, the operation of the speech recognition apparatus 400 configured as described above will be described.

[0175] In the fourth embodiment, the output flow of the speech recognition apparatus for the user's utterance is the same as that shown in the first embodiment. The difference in the fourth embodiment is that the unregistered word candidate search unit 105 of the first embodiment is not provided internally, and the search operation for unregistered word candidates is delegated to an external server.

[0176] That is, when it is determined that an unregistered word is included in the user's utterance, the unregistered word search request transmission/reception unit 401 transmits the subword sequence of the unregistered word portion obtained by the reference similarity calculation unit 103 to the unregistered word search server 403. The unregistered word search unit 404, having received the subword sequence of the unregistered word portion from the client, searches the word group stored in the unregistered word storage unit 405 for the unregistered word uttered by the user. Here, as a method of searching for an unregistered word using a subword sequence, the method described with reference to FIG. 10 in Embodiment 1 above is effective. The search results obtained in this way are returned to the unregistered word search request transmission/reception unit 401 via the network 402 as unregistered word candidates. The unregistered word search request transmission/reception unit 401 passes the returned unregistered word search result to the result display unit 107, and the fact that the word spoken by the user was an unregistered word is presented to the user via the result display unit 107.
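The client/server division described above can be sketched as follows. This is a hedged illustration, not the method of FIG. 10: it substitutes plain Levenshtein (edit) distance between subword sequences as the similarity measure, and it replaces the network exchange with a direct function call. The function names, the dictionary-based word store, and the sample subword segmentations are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two subword sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def search_unregistered_words(query_subwords, word_store, max_candidates=3):
    """Server side: rank stored words by distance to the query subword sequence."""
    ranked = sorted(
        word_store.items(),
        key=lambda item: (edit_distance(query_subwords, item[1]), item[0]),
    )
    return [word for word, _ in ranked[:max_candidates]]


def handle_unregistered_utterance(query_subwords, search_fn):
    """Client side: delegate the search and package the result for display."""
    candidates = search_fn(query_subwords)
    return {"unregistered": True, "candidates": candidates}


if __name__ == "__main__":
    store = {
        "Matsushita Taro": ["ma", "tsu", "shi", "ta", "ta", "ro"],
        "Matsuda Taro": ["ma", "tsu", "da", "ta", "ro"],
    }
    query = ["ma", "tsu", "shi", "ta", "ta", "ro"]
    print(handle_unregistered_utterance(
        query, lambda q: search_unregistered_words(q, store)))
```

Passing the search as `search_fn` keeps the client independent of where the search runs, which mirrors the point of this embodiment: the client only sends a subword sequence and displays whatever candidates come back.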

[0177] As described above, by having an external server perform the process of searching for unregistered words and placing the unregistered word storage unit remotely, maintenance of the unregistered word storage unit, whose contents change and increase daily, can be consolidated in one place, and the maintenance cost can be kept low.

[0178] Moreover, searching for a target word in a large vocabulary list requires large computational resources. By entrusting such work to an external server, the hardware configuration of the speech recognition apparatus itself can be made compact.

[0179] On the other hand, since the server side can generally have a relatively large hardware configuration, it can implement unregistered word search algorithms that would be difficult to install on a client such as a mobile terminal, and this may improve the search accuracy for unregistered words.

[0180] In the fourth embodiment, an example has been shown in which a subword sequence is used as the search data for the unregistered word search. However, as described in the first embodiment, it is also possible to implement the unregistered word search server so that the unregistered word search is performed using the speech uttered by the user itself or the acoustic parameters extracted from it.

Industrial applicability

[0181] The present invention can be used in various electronic devices that use speech recognition technology as a means of input to the device, such as AV devices including TVs and video recorders, in-vehicle devices including car navigation systems, and portable terminals including PDAs and mobile phones, and its industrial applicability is very wide.

Claims

The scope of the claims

[1] A speech recognition device for recognizing spoken speech,

Speech recognition word storage means for defining a vocabulary for speech recognition and storing it as registered words;

Speech recognition means for collating the spoken speech with the registered words stored in the speech recognition word storage means;

Unregistered word determination means for determining, based on the collation result of the speech recognition means, whether the spoken speech is a registered word stored in the speech recognition word storage means or an unregistered word not stored therein;

Unregistered word storage means for storing the unregistered words;

Unregistered word candidate search means for searching, when the unregistered word determination means determines that the spoken speech is an unregistered word, the unregistered words stored in the unregistered word storage means for an unregistered word candidate that is considered to correspond to the spoken speech, based on the spoken speech; and

Result display means for displaying the search result.

A speech recognition apparatus characterized by that.

[2] The unregistered word candidate search means includes:

Searches for a plurality of unregistered word candidates from among the unregistered words stored in the unregistered word storage means.

The speech recognition apparatus according to claim 1, wherein:

[3] The unregistered word storage means

Stores the unregistered words classified by category, according to the category to which each unregistered word belongs.

The speech recognition apparatus according to claim 1 or 2, wherein

[4] The speech recognition device further includes:

Unregistered word class determining means for determining a category to which the unregistered word belongs based on the spoken speech.

The speech recognition apparatus according to claim 3.

[5] The unregistered word candidate search means includes:

Based on the determination result of the unregistered word class determining means, the unregistered word candidate is searched for in the corresponding category among the categories classified in the unregistered word storage means.

The speech recognition apparatus according to claim 4, wherein:

[6] The speech recognition device further includes:

Information acquisition means for acquiring information relating to the category;

The unregistered word candidate search means includes:

Based on the information acquired by the information acquisition means, the unregistered word candidate is searched for in the corresponding category among the categories classified in the unregistered word storage means.

The speech recognition apparatus according to claim 3.

[7] The result display means includes:

The search result is displayed after excluding, from the search result of the unregistered word candidate search means, the registered words stored in the speech recognition word storage means.

The speech recognition apparatus according to claim 1, wherein:

[8] The unregistered word candidate search means searches for the unregistered word candidate by calculating an unregistered word score obtained by quantifying the degree of similarity with the spoken speech.

The speech recognition apparatus according to claim 1, wherein:

[9] The result display unit displays the unregistered word candidate and the unregistered word score as the search result.

The speech recognition apparatus according to claim 8.

[10] The result display unit changes the display of the unregistered word candidates according to the unregistered word score.

The speech recognition apparatus according to claim 9.

[11] Unregistered words stored in the unregistered word storage means are updated under predetermined conditions.

The speech recognition apparatus according to claim 1, wherein:

[12] The speech recognition device further includes:

Communication means for communicating with an unregistered word server that stores unregistered words;

When the communication means receives an unregistered word group from the unregistered word server, the unregistered words stored in the unregistered word storage means are updated.

The speech recognition apparatus according to claim 11, wherein:

[13] The registered words stored in the speech recognition word storage means are updated under predetermined conditions.

The speech recognition apparatus according to claim 1, wherein:

[14] A speech recognition system for recognizing spoken speech,

The speech recognition system includes a speech recognition device for recognizing spoken speech, and an unregistered word search server for searching for unregistered words that are not registered in the speech recognition device,

The speech recognition device includes:

Search request transmitting means for requesting the unregistered word search server to search for an unregistered word candidate that is considered to correspond to the spoken speech when the unregistered word determining means determines that the spoken speech is an unregistered word;

Search result receiving means for obtaining a search result of the unregistered word candidate from the unregistered word search server; and

Result display means for displaying the search result,

The unregistered word search server includes:

Unregistered word storage means for storing the unregistered words;

Search request receiving means for receiving the search request from the search request transmitting means;

Unregistered word candidate search means for searching, when the search request receiving means receives the search request, the unregistered words stored in the unregistered word storage means for an unregistered word candidate that is considered to correspond to the spoken speech, based on the spoken speech; and

Search result transmission means for transmitting the search result to the speech recognition device.

A speech recognition system characterized by that.

[15] A speech recognition device in a speech recognition system comprising a speech recognition device that recognizes spoken speech and an unregistered word search server that searches for unregistered words that are not registered in the speech recognition device,

Speech recognition means for collating the spoken speech with registered words stored in the speech recognition word storage means;

And a result display means for displaying the search result.

A speech recognition apparatus characterized by that.

[16] An unregistered word search server in a speech recognition system comprising a speech recognition device that recognizes spoken speech and an unregistered word search server that searches for unregistered words that are not registered in the speech recognition device,

Unregistered word storage means for storing the unregistered words;

Search request receiving means for receiving the search request from the search request transmitting means;

Unregistered word candidate search means for searching, when the search request receiving means receives the search request, the unregistered words stored in the unregistered word storage means for an unregistered word candidate that is considered to correspond to the spoken speech, based on the spoken speech; and

Search result transmission means for transmitting the search result to the speech recognition device.

An unregistered word search server characterized by that.

[17] A speech recognition method for recognizing spoken speech,

A speech recognition step of collating the spoken speech with a registered word stored in a speech recognition word database that defines a vocabulary for speech recognition and stores it as a registered word;

An unregistered word determination step of determining, based on the collation result in the speech recognition step, whether the spoken speech is a registered word stored in the speech recognition word database or an unregistered word not stored therein;

An unregistered word candidate search step of searching, when the spoken speech is determined to be an unregistered word in the unregistered word determination step, the unregistered words stored in an unregistered word database that stores unregistered words for an unregistered word candidate that is considered to correspond to the spoken speech, based on the spoken speech; and

A result display step of displaying the search result.

A speech recognition method characterized by the above.

[18] A program for a speech recognition device for recognizing spoken speech,

An unregistered word candidate search step of searching, when the spoken speech is determined to be an unregistered word in the unregistered word determination step, the unregistered words stored in an unregistered word database that stores unregistered words for an unregistered word candidate that is considered to correspond to the spoken speech, based on the spoken speech;

A program for causing a computer to execute a result display step for displaying the search result.