CN106921560B

CN106921560B - Voice communication method, device and system

Info

Publication number: CN106921560B
Application number: CN201710111296.8A
Authority: CN
Inventors: 王柯
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2017-02-28
Filing date: 2017-02-28
Publication date: 2020-06-02
Anticipated expiration: 2037-02-28
Also published as: CN106921560A; EP3367379A3; EP3367379A2; EP3367379B1; US10728196B2; US20180248823A1

Abstract

The disclosure relates to a voice communication method, device and system. The method comprises the following steps: acquiring a first operation instruction, wherein the first operation instruction is used for indicating to acquire additional voice content of first voice information; acquiring additional voice content of the first voice information according to the first operation instruction; generating second voice information according to the additional voice content and the additional identification, wherein the additional identification is used for indicating that the voice content included in the second voice information is the additional voice content; and sending the second voice information to the server. In the disclosure, the terminal may add the additional voice content on the basis of the transmitted voice information according to the user instruction, and by sending the additional voice content carrying the additional identifier to the server and instructing the server to send the additional voice content to the opposite terminal, the situation that the voice received by the opposite terminal is incomplete due to a network environment or human reasons and the like is effectively avoided.

Description

Voice communication method, device and system

Technical Field

The present disclosure relates to the field of communications technologies, and in particular, to a voice communication method, apparatus, and system.

Background

With the development of communication technology, mobile phones are widely used, and the social range of people is greatly expanded. Meanwhile, based on the expansion of the social contact range, the situation that the user uses the instant messaging software of the mobile phone to communicate is more and more. In the related art, a user can send characters, pictures and voice through instant messaging software. Because the voice information does not need to be typed by the user and is sent more conveniently and quickly, the voice information becomes the preferred form for the user to carry out instant messaging.

Disclosure of Invention

To overcome the problems in the related art, embodiments of the present disclosure provide a voice communication method, apparatus, and system. The technical scheme is as follows:

according to a first aspect of the embodiments of the present disclosure, there is provided a voice communication method, including:

acquiring a first operation instruction, wherein the first operation instruction is used for indicating to acquire additional voice content of first voice information;

acquiring additional voice content of the first voice information according to the first operation instruction;

generating second voice information according to the additional voice content and the additional identification, wherein the additional identification is used for indicating that the voice content included in the second voice information is the additional voice content;

and sending the second voice information to the server.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: in this embodiment, the terminal may add the additional voice content on the basis of the voice information that has been sent according to the user instruction, and by sending the additional voice content carrying the additional identifier to the server, may instruct the server to send the additional voice content to the opposite terminal. By the embodiment, when the terminal sends the voice information to the opposite terminal, the situation that the voice received by the opposite terminal is incomplete due to network environment or human reasons and the like is effectively avoided; and moreover, the opportunity of modifying the sent voice information is provided for the terminal user, and the user experience is favorably improved.

In one embodiment, the obtaining the first operation instruction comprises:

determining whether the first voice information is complete;

and when the first voice information is incomplete, acquiring a first operation instruction.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: after first voice information sends for the server, whether terminal can judge first voice information at first is complete, when first voice information is incomplete, acquire the first operating instruction of user input, and then acquire first voice information's additional information, it is unclear to have avoided adding the semanteme that additional voice content leads to under the complete circumstances of first voice information, cause misleading or puzzlement for the contralateral terminal user who receives first voice information, user experience has been promoted.

In one embodiment, the method further comprises:

when the first voice information is incomplete, prompt information is displayed and used for prompting a user that the first voice information is incomplete.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: when the terminal determines that the first voice information is incomplete, prompt information can be displayed on a screen for prompting that the first voice information input by a user is incomplete, so that the user can timely indicate the terminal to add the additional voice content of the first voice information, the integrity of the voice information sent to the opposite side terminal is guaranteed, misleading or puzzling caused by the incomplete voice information to the opposite side terminal user is avoided, and user experience is improved.

In one embodiment, the determining whether the first speech information is complete comprises:

determining whether a first time length occupied by effective voice content in the first voice information is smaller than or equal to a preset proportion of a second time length, wherein the second time length is the time length of starting a microphone when the first voice information is received;

and when the first time length occupied by the effective voice content in the first voice information is less than or equal to the preset proportion of the second time length, determining that the first voice information is incomplete.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: whether the first voice information is complete or not is determined according to the time occupied by the effective voice content in the first voice information, so that the accuracy of determining whether the voice information is complete or not by the terminal is improved, and the misjudgment of the terminal is avoided.

In one embodiment, the method further comprises:

and storing the additional voice content at the storage address corresponding to the first voice information.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: and storing the additional voice content in a storage address corresponding to the first voice information, so that the relevance between the first voice information and the additional voice content is ensured, and the user can read the complete voice information conveniently.

In one embodiment, the obtaining the first operation instruction comprises:

and acquiring a first operation instruction through a voice icon corresponding to the first voice information on the user interface.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: when a user needs to input the additional voice content of the first voice information, a first operation instruction can be input to the terminal by operating the voice icon of the first voice information on the user interface, so that the terminal takes the acquired voice as the additional voice content of the first voice information. The realization mode carries out voice additional speaking aiming at the sent voice information for the terminal side user, and the operation mode is more visual and convenient.

According to a second aspect of the embodiments of the present disclosure, there is provided a voice communication method, including:

receiving third voice information sent by the first terminal;

detecting whether the third voice information carries an additional identifier, wherein the additional identifier is used for indicating that the voice content included in the third voice information is additional voice content;

and if the third voice information carries the additional identification, sending the voice content included in the third voice information to the second terminal as the additional voice content of fourth voice information, wherein the fourth voice information is the voice information sent to the second terminal by the first terminal before the third voice information is sent.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the server can send the voice content of one voice message to the terminal as the additional voice content of another voice message, so that the integrity of the voice message received by the terminal is ensured, the problems of misleading or puzzlement to a terminal user and the like caused by incomplete voice message received by the terminal due to network environment or human reasons and the like are avoided, and the user experience is improved.

In an embodiment, if the third voice information carries the additional identifier, sending the voice content included in the third voice information to the second terminal as the additional voice content of the fourth voice information includes:

if the third voice information carries the additional identification, detecting whether the third voice information carries a second terminal identification, wherein the second terminal identification is used for uniquely identifying the second terminal;

and if the third voice information carries the second terminal identification, the voice content included in the third voice information is used as the additional voice content of the fourth voice information and is sent to the second terminal.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the server can determine which terminal the additional voice content is sent to through the terminal identification carried by the voice information, so that the accuracy of judgment of the sending object is ensured, and the server is prevented from misjudging.

if the third voice information carries an additional identifier, detecting whether the third voice information carries a fourth voice information identifier, wherein the fourth voice information identifier is used for uniquely identifying the fourth voice information;

and if the third voice information carries the fourth voice information identifier, sending the voice content included in the third voice information to the second terminal as the additional voice content of the fourth voice information.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the server can determine the additional voice content to be attached to the voice information through the voice information identification carried by the voice information, so that the accuracy of judging the voice additional object is ensured, and the server is prevented from misjudging.

In one embodiment, the sending the voice content included in the third voice information to the second terminal as the additional voice content of the fourth voice information includes:

instructing the second terminal to store the voice content included in the third voice information to the storage address of the fourth voice information and withdraw the fourth voice information; or

And instructing the second terminal to add the voice content included in the third voice information to the voice content included in the fourth voice information.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: by the embodiment, the additional voice content can replace the voice information at the same storage address, and under the situation, the situation that the sending end wants to re-express due to the wrong voice expression is supported, so that the opportunity of modifying the sent voice information is provided for the terminal user, and the user experience is effectively improved; or the additional voice content and the voice information can be stored according to the receiving sequence, under the situation, the situation that the information sent by the terminal is incomplete and the receiving end cannot receive the complete voice due to the network and the like is supported, and the integrity of the voice received by the receiving end can be ensured.

In one embodiment, said revoking said fourth voice information comprises:

determining whether read feedback information of fourth voice information sent by the second terminal is received;

and if the read feedback information of the fourth voice information sent by the second terminal is not received, withdrawing the fourth voice information.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the embodiment supports the withdrawal of the voice information under the condition that the voice information is not read by the user, fully considers the feeling of the user at the information receiving end, saves the time for the user to read the information and improves the reading efficiency.

According to a third aspect of the embodiments of the present disclosure, there is provided a voice communication method, including:

receiving a voice additional request sent by a server, wherein the voice additional request is used for requesting that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is the voice information received before the third voice information is received, the voice additional request carries the third voice information, an additional identifier and a fourth voice information identifier, the additional identifier is used for indicating that the voice content included in the third voice information is the additional voice content, and the fourth voice information identifier is used for uniquely identifying the fourth voice information;

and according to the voice additional request, taking the voice content included in the third voice information as the additional voice content of the fourth voice information.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the terminal can add the additional voice content to the user after the incomplete voice information when determining that the received voice information is the additional voice information of another voice information, so that the problems of misleading or puzzlement to the terminal user and the like caused by incomplete voice information received by the terminal due to network environment or human reasons and the like are avoided, and the user experience is improved.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a voice communication apparatus including:

the first acquisition module is used for acquiring a first operation instruction, and the first operation instruction is used for indicating to acquire additional voice content of first voice information;

the second acquisition module is used for acquiring the additional voice content of the first voice message according to the first operation instruction;

a generating module, configured to generate second voice information according to the additional voice content and an additional identifier, where the additional identifier is used to indicate that the voice content included in the second voice information is additional voice content;

and the first sending module is used for sending the second voice information to a server.

In one embodiment, the first obtaining module comprises:

the determining submodule is used for determining whether the first voice information is complete;

and the first acquisition submodule is used for acquiring the first operation instruction when the first voice information is incomplete.

In one embodiment, the apparatus further comprises:

and the prompt module is used for displaying prompt information when the first voice information is incomplete, and the prompt information is used for prompting a user that the first voice information is incomplete.

In one embodiment, the determining sub-module includes:

a first determining unit, configured to determine whether a first duration occupied by valid voice content in the first voice message is less than or equal to a preset proportion of a second duration, where the second duration is a duration for which a microphone is turned on when the first voice message is received;

and the second determining unit is used for determining that the first voice information is incomplete when the first time length occupied by the effective voice content in the first voice information is less than or equal to the preset proportion of the second time length.

In one embodiment, the apparatus further comprises:

and the storage module is used for storing the additional voice content in a storage address corresponding to the first voice message.

In one embodiment, the first obtaining module comprises:

and the second obtaining submodule is used for obtaining the first operation instruction through a voice icon corresponding to the first voice information on a user interface.

According to a fifth aspect of the embodiments of the present disclosure, there is provided a voice communication apparatus including:

the first receiving module is used for receiving third voice information sent by the first terminal;

a detection module, configured to detect whether the third voice information carries an additional identifier, where the additional identifier is used to indicate that a voice content included in the third voice information is an additional voice content;

and a second sending module, configured to send, when the third voice information carries the additional identifier, a voice content included in the third voice information to a second terminal as an additional voice content of fourth voice information, where the fourth voice information is the voice information sent by the first terminal to the second terminal before the third voice information is sent.

In one embodiment, the second sending module comprises:

a first detection submodule, configured to detect whether the third voice information carries a second terminal identifier when the third voice information carries the additional identifier, where the second terminal identifier is used to uniquely identify the second terminal;

and a first sending submodule, configured to send, when the third voice information carries the second terminal identifier, the voice content included in the third voice information to the second terminal as an additional voice content of the fourth voice information.

In one embodiment, the second sending module further comprises:

a second detection submodule, configured to detect whether the third voice information carries a fourth voice information identifier when the third voice information carries the additional identifier, where the fourth voice information identifier is used to uniquely identify the fourth voice information;

and a second sending submodule, configured to send, when the third voice information carries the fourth voice information identifier, the voice content included in the third voice information to the second terminal as an additional voice content of the fourth voice information.

In one embodiment, the second sending module comprises:

the first indication submodule is used for indicating the second terminal to store the voice content included in the third voice message into the storage address of the fourth voice message and withdraw the fourth voice message; or

And the second indicating submodule is used for indicating the second terminal to add the voice content included in the third voice message to the voice content included in the fourth voice message.

In one embodiment, the first indication sub-module includes:

a third determining unit, configured to determine whether read feedback information of the fourth voice information sent by the second terminal is received;

and the withdrawing unit is used for withdrawing the fourth voice message when the read feedback information of the fourth voice message sent by the second terminal is not received.

According to a sixth aspect of the embodiments of the present disclosure, there is provided a voice communication apparatus including:

a second receiving module, configured to receive a voice additional request sent by a server, where the voice additional request is used to request that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is voice information received before the third voice information is received, the voice additional request carries the third voice information, an additional identifier, and a fourth voice information identifier, the additional identifier is used to indicate that the voice content included in the third voice information is an additional voice content, and the fourth voice information identifier is used to uniquely identify the fourth voice information;

and the processing module is used for taking the voice content included in the third voice information as the additional voice content of the fourth voice information according to the voice additional request.

According to a seventh aspect of the embodiments of the present disclosure, there is provided a voice communication system including:

any one of the voice communication apparatuses provided by the fourth aspect, any one of the voice communication apparatuses provided by the fifth aspect, and the voice communication apparatus provided by the sixth aspect.

According to an eighth aspect of the embodiments of the present disclosure, there is provided a voice communication apparatus including:

a first processor;

a first memory for storing first processor-executable instructions;

wherein the first processor is configured to:

acquiring additional voice content of the first voice message according to the first operation instruction;

generating second voice information according to the additional voice content and an additional identifier, wherein the additional identifier is used for indicating that the voice content included in the second voice information is the additional voice content;

and sending the second voice information to a server.

According to a ninth aspect of the embodiments of the present disclosure, there is provided a voice communication apparatus including:

a second processor;

a second memory for storing second processor-executable instructions;

wherein the second processor is configured to:

receiving third voice information sent by the first terminal;

and if the third voice information carries the additional identifier, sending the voice content included in the third voice information to a second terminal as the additional voice content of fourth voice information, wherein the fourth voice information is the voice information sent to the second terminal by the first terminal before the third voice information is sent.

According to a tenth aspect of the embodiments of the present disclosure, there is provided a voice communication apparatus including:

a third processor;

a third memory for storing third processor-executable instructions;

wherein the third processor is configured to:

receiving a voice additional request sent by a server, wherein the voice additional request is used for requesting that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is the voice information received before the third voice information is received, the voice additional request carries the third voice information, an additional voice identifier and a fourth voice information identifier, the additional voice identifier is used for indicating that the voice content included in the third voice information is the additional voice content, and the fourth voice information identifier is used for uniquely identifying the fourth voice information;

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1a is a flow chart illustrating a method of voice communication according to an example embodiment.

FIG. 1b is an interface diagram of a terminal shown in accordance with an example embodiment.

FIG. 1c is a flow chart illustrating a method of voice communication according to an example embodiment.

FIG. 1d is a flow chart illustrating a method of voice communication according to an example embodiment.

FIG. 1e is a flow chart illustrating a method of voice communication according to an example embodiment.

Fig. 2a is a flow chart illustrating a method of voice communication according to an example embodiment.

Fig. 2b is a flow chart illustrating a method of voice communication according to an example embodiment.

Fig. 2c is a flow chart illustrating a method of voice communication according to an example embodiment.

Fig. 3 is a flow chart illustrating a method of voice communication according to an example embodiment.

Fig. 4 is an interaction diagram illustrating a voice communication method according to an example embodiment.

Fig. 5 is an interaction diagram illustrating a voice communication method according to an example embodiment.

Fig. 6a is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 6b is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 6c is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 6d is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 6e is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 6f is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 7a is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 7b is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 7c is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 7d is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 7e is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 7f is a schematic diagram illustrating the structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 8 is a schematic diagram illustrating a structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 9 is a block diagram illustrating a structure of a voice communication apparatus according to an exemplary embodiment.

Fig. 10 is a block diagram illustrating another voice communication apparatus according to an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The technical scheme provided by the embodiment of the disclosure relates to a server and a terminal. The terminal can be a mobile phone, a tablet personal computer, an intelligent watch and other devices capable of carrying out voice communication; the server may be a server provided by an operator or a server provided by a third-party platform, which is not limited in this disclosure. In the related technology, when a user sends voice information, the user can click a voice icon on a chat interface, at the moment, a microphone is started by a sending end, the voice content input by the user is received and sent to a server, and the server sends the voice information to a receiving end, so that information interaction between the sending end and the receiving end is realized.

In practical application, situations sometimes occur that the voice received by the receiving end is incomplete due to network or human reasons, or the voice sent by the sending end is not the voice content really intended to be sent by the user, or the voice content sent by the user is misspoken and cannot be recovered. For example, if the user mistakenly touches the first terminal to turn on the microphone, and the first terminal user does not input any voice content, no valid information exists in the transmitted voice information; or due to a network reason, the first terminal only sends a part of voice content of the first voice message, and the other part of voice content is lost, so that the voice message received by the receiving end is incomplete.

The above situation will cause misleading or puzzlement to the user of the receiving end or the sending end, and the user experience is not good. In the embodiment of the disclosure, a terminal can add additional voice content on the basis of the transmitted voice information according to the instruction of a user, and can instruct a server to transmit the additional voice content to an opposite terminal by transmitting the additional voice content carrying an additional identifier to the server. By the embodiment, when the terminal sends the voice information to the opposite terminal, the situation that the voice received by the opposite terminal is incomplete due to poor network environment or user reasons and the like is effectively avoided; and moreover, the opportunity of modifying the sent voice information is provided for the terminal user, and the user experience is favorably improved.

Fig. 1a is a flowchart illustrating a voice communication method according to an exemplary embodiment, as shown in fig. 1a, the voice communication method includes the following steps 101 to 104:

in step 101, a first operation instruction is obtained, wherein the first operation instruction is used for indicating that the additional voice content of the first voice information is obtained.

For example, the terminal may send the voice information in various ways, for example, in an instant messaging application, or in a short message application, or in a game application with a voice chat function, and the like, which is not limited in this disclosure.

In the embodiment of the present disclosure, taking the instant messaging application as an example, after the user sends the first voice message, for example, if the user finds that the first voice message is incomplete, or the user finds that the voice content included in the first voice message is expressed incorrectly, the user may perform further operation on the terminal to instruct the terminal to acquire the additional voice content of the first voice message. Optionally, the additional voice content may be displayed on the terminal in a manner that the additional voice content corresponds to the same voice icon as the first voice information. It should be noted that, it is assumed that the first voice message can be read on the terminal by clicking a "first voice bubble" on the chat interface, and the corresponding same voice icon means that after the voice addition is successful, the user can read the additional voice content by clicking the first voice bubble.

It should be noted that the displaying manner of the additional voice content includes, but is not limited to, the above-mentioned manner, and the disclosure does not specifically limit this. For example, the additional voice contents can be sequentially displayed on the chat interface according to the occurrence time sequence of the voice contents which have already occurred. For example, the additional voice content may occupy a separate voice bubble and show the association between the additional voice content and the attached voice message (the first voice message) on the chat interface. For example, the voice bubble corresponding to the additional voice content and the voice bubble corresponding to the first voice information may be connected by a wire. For another example, after receiving the additional voice content of the first voice message, the voice bubble corresponding to the first voice message and the voice bubble corresponding to the additional voice content "flash" at the same time to prompt the user of the association between the two.

Optionally, the process of obtaining the first operation instruction may be implemented by a user clicking an icon (e.g., a voice bubble) corresponding to the first voice message on the chat interface. And after receiving the clicking operation, the terminal opens the microphone for receiving the additional voice content input by the user. For example, if the user holds down a voice bubble, the interface will present a number of actionable options, including an "add voice" function; after clicking the "add voice" function, the user will automatically open the microphone to receive the voice input by the user, i.e., to enable receiving additional voice content.

For example, as shown in fig. 1b, when the user presses the icon b01 of the first voice information for a long time, the terminal pops up the operation menu b02, and the operation menu b02 is provided with a plurality of operable options b03, which include "copy", "favorite", "withdraw", "delete", and "add voice", etc. The user may click the "add voice" option, at which point the terminal instructs the microphone to turn on, receives additional voice content of the first voice message input by the user, and when the user releases his hand, the terminal instructs the microphone to turn off. Note that the long press method may be replaced by a single click, a double click, or the like.

Or, when the user presses the icon of the first voice message for a long time, the terminal can pop up a prompt box, the prompt box displays the character of 'whether to continue speaking', and if the user selects to continue speaking, the terminal indicates that the microphone is turned on to receive the voice content input by the user.

It should be noted that, the manner of receiving the first operation instruction in the present embodiment includes, but is not limited to, the foregoing manner, and the present disclosure does not specifically limit this. For example, a button is set on the terminal, after the user clicks the button, each piece of chat information on the interface is displayed in an optional state (for example, a click can be performed), the user determines that the piece of chat information is information to be subjected to voice addition by selecting the target chat content (that is, the first voice information), and controls the microphone to be turned on to receive the additional voice content. The first operation instruction is used for instructing the terminal to acquire the additional voice content of the first voice message.

Optionally, when the terminal user suspects that the voice information may be incomplete and may affect reading by others, the terminal user may click a voice icon corresponding to the first voice information (e.g., click), turn on a speaker to play the voice content of the first voice information, and manually detect whether the first voice information is complete. If the first voice information is incomplete, the user can input a first operation instruction to input the additional voice content. It should be noted that, the judgment of whether the voice content is complete may be made manually by the user or by the terminal, and this disclosure does not specifically limit this.

Optionally, the expression form of the first operation instruction may be a preset type of operation, for example, the first operation instruction may be in a double-click mode, a single-click mode, or a long-press mode, which is not limited in this disclosure.

In step 102, according to the first operation instruction, additional voice content of the first voice information is acquired.

For example, after determining that the first operation instruction is received, in response to the first operation instruction, the terminal may instruct the microphone to turn on, where the voice content collected by the microphone is the additional voice content of the first voice information input by the user.

In step 103, second voice information is generated according to the additional voice content and the additional identifier, wherein the additional identifier is used for indicating that the voice content included in the second voice information is the additional voice content.

For example, after acquiring the additional voice content of the first voice information, the terminal may package the additional voice content and the additional identifier to generate the second voice information. The additional identification is used for identifying that the second voice information comprises additional voice information.

An additional identification field may be generally set in an extension field of a header of the second voice information, and by writing a preset number in the additional identification field, an additional identification is added to the second voice information where the additional voice content is located. For example, the header field of regular voice information (excluding additional voice content) is the number 0; the header field of the voice information including the additional voice content is the number 1.

In step 104, the second voice information is sent to the server.

For example, the terminal may access a hotspot or Wi-Fi (Wireless-Fidelity), send the second voice information to the server through the internet, and may also send the second voice information to the server through the cellular data network, which is not limited in this disclosure.

Optionally, the user may input different additional voice contents of the first voice message for multiple times, and the terminal may generate multiple different voice messages according to the different additional voice contents input by the user, and send the multiple different voice messages to the server.

In this embodiment, after receiving the second voice information sent by the terminal, the server may parse the additional voice content contained therein and send it to the opposite terminal of the terminal, so as to implement communication between the terminal and the opposite terminal.

In the technical scheme provided by the embodiment of the disclosure, a terminal can add additional voice content on the basis of the sent voice information according to the instruction of a user, and the server can be instructed to send the additional voice content to the opposite terminal by sending the additional voice content carrying the additional identifier to the server. By the embodiment, when the terminal sends the voice information to the opposite terminal, the situation that the voice received by the opposite terminal is incomplete due to network environment or human reasons and the like is effectively avoided; and moreover, the opportunity of modifying the sent voice information is provided for the terminal user, and the user experience is favorably improved.

In one embodiment, as shown in fig. 1c, in step 101, acquiring a first operation instruction may be implemented by steps 1011 and 1012:

in step 1011, it is determined whether the first speech information is complete.

In step 1012, when the first voice message is not complete, a first operation instruction is obtained.

In this embodiment, it is determined whether the first voice information is complete, which may be determined by the user or determined by the terminal. For example, the user may turn on the speaker to listen to the transmitted voice information to determine whether the voice is complete. Alternatively, the terminal may perform speech analysis on the transmitted content to determine whether the transmitted speech is complete.

For example, after the terminal has sent the first voice message, the terminal may obtain a first duration occupied by the valid voice content in the first voice message, and then determine whether the first duration is less than or equal to a preset proportion of a second duration, where the second duration is a time when the microphone is turned on when the first voice message is received. And when the first time length occupied by the effective voice content in the first voice information is less than or equal to the preset proportion of the second time length, determining that the first voice information is incomplete. The preset proportion can be set according to actual conditions. For example, the terminal may determine whether the first duration is less than or equal to 30% of the second duration, and determine that the first voice message is incomplete when the first duration occupied by the valid voice content in the first voice message is less than or equal to 30% of the second duration. Whether the first voice information is complete or not is determined according to the time occupied by the effective voice content in the first voice information, so that the accuracy of determining whether the voice information is complete or not by the terminal is improved, the misjudgment of the terminal is avoided, and the user experience is improved.

It should be noted that, the manner for determining whether the first speech information is complete in the present disclosure includes, but is not limited to, the manner described above. For example, the terminal may divide the voice content included in the first voice information into a plurality of voice segments. Taking the first voice segment as an example, the terminal may detect a sound wave frequency of the first voice segment, indicate that the first voice segment includes valid voice content if the sound wave frequency is greater than or equal to a preset threshold, and indicate that the first voice segment is invalid voice content if the sound wave frequency is less than the preset threshold. The sum of the time occupied by the plurality of voice segments including the effective voice content is the first duration.

It should be noted that, in different application scenarios, the process of determining whether the first speech information is complete may be replaced with a process of determining other content. For example, whether the voice content contained in the first voice information meets the intention of the user is judged; or judging whether the voice content contained in the first voice information contains the uncivilized phrase or not; or, whether the voice content included in the first voice information is too long or too short is determined. Step 1012 may be adaptively adjusted for different application scenarios.

In the technical scheme provided by the embodiment of the disclosure, after first voice information is sent to the server, the terminal can firstly judge whether the first voice information is complete, when the first voice information is incomplete, the first operation instruction input by the user is acquired, and then the additional information of the first voice information is acquired, so that the situation that the added additional voice content leads to unclear semantics under the condition that the first voice information is complete is avoided, misleading or puzzling is caused to the opposite side terminal user receiving the first voice information, and the user experience is improved.

In one embodiment, as shown in fig. 1d, the method further comprises step 105:

in step 105, when the first voice message is incomplete, a prompt message for prompting the user that the first voice message is incomplete is displayed.

For example, after the user finishes sending the first voice message, the user may not know whether the first voice message is completely sent, and when the terminal detects that the first voice message is incomplete, prompt information may be displayed on the display screen, for example, a word "the first voice message is incomplete" is displayed on the display screen, and the user is prompted to input additional voice content of the first voice message in time.

In the technical scheme provided by the embodiment of the disclosure, when the terminal determines that the first voice information is incomplete, the prompt information can be displayed on the screen, the prompt information is used for prompting that the first voice information input by the user is incomplete, so that the user can timely indicate the terminal to add the additional voice content of the first voice information, the integrity of the voice information sent to the opposite side terminal is ensured, misleading or puzzlement caused by the incomplete voice information to the opposite side terminal user is avoided, and the user experience is improved.

In one embodiment, as shown in fig. 1e, the method further comprises step 106:

in step 106, the additional voice content is stored in the storage address corresponding to the first voice information.

For example, the additional voice content of the first voice information included in the first voice information and the second voice information may be sequentially stored in the storage address corresponding to the first voice information according to the acquired time sequence. When the terminal transmits the voice information, the voice information can be sequentially transmitted according to the stored sequence, namely, the voice information stored firstly is transmitted firstly, and then the voice information stored later is transmitted.

Alternatively, in order to save the memory of the terminal, when the first voice message is already transmitted, the additional voice content of the first voice message included in the second voice message may be used to overwrite the first voice message.

Accordingly, the opposite terminal, upon receiving the additional voice content transmitted by the terminal (via the server), stores the acquired additional voice content in the same storage address as the first voice information that has been acquired. On the interface of the opposite terminal, it may appear that the first voice information and the additional voice content correspond to the same voice icon (e.g., voice bubble). In fact, in this case, for the opposite end user, the first voice information and the additional voice information have merged into one piece of information (or, the additional voice content replaces the voice content contained in the first voice information).

The above embodiments are equally applicable to the solutions shown in fig. 1c and 1 d.

In the technical scheme provided by the embodiment of the disclosure, the second voice information including the additional voice content of the first voice information is stored in the storage address corresponding to the first voice information, so that the relevance between the first voice information and the second voice information is ensured, and the user can conveniently read the complete voice information.

Fig. 2a is a flowchart illustrating a voice communication method according to an exemplary embodiment, and as shown in fig. 2a, the voice communication method includes the following steps 201 to 203:

in step 201, third voice information sent by the first terminal is received.

For example, the server may receive the third voice information sent by the first terminal through the internet, and may also receive the third voice information sent by the first terminal through the cellular data network, which is not limited in this disclosure.

In step 202, it is detected whether the third voice information carries an additional identifier, where the additional identifier is used to indicate that the voice content included in the third voice information is an additional voice content.

For example, after receiving the third voice information, the terminal may parse the third voice information to determine whether the third voice information includes the additional identifier. The additional identifier may be set in an extension field of a header of the third voice information, and the additional identifier is added to the third voice information where the additional voice content is located by writing a preset number in the additional identifier field. For example, the header field of regular voice information (excluding additional voice content) is the number 0; the header field of the voice information including the additional voice content is the number 1. When the server analyzes that the header field of the third voice message is number 1, it is determined that the third voice message carries the additional identifier, that is, it is determined that the voice content included in the third voice message is the additional voice content.

It should be noted that, the carrying manner of the additional identifier includes, but is not limited to, the above manner, and the manners that can be used to identify a voice message as the additional voice content are within the protection scope of the present disclosure, and the present disclosure is not particularly limited thereto.

In step 203, if the third voice message carries the additional identifier, the voice content included in the third voice message is sent to the second terminal as the additional voice content of the fourth voice message, and the fourth voice message is the voice message sent by the first terminal to the second terminal before the third voice message is sent.

For example, if the third voice information carries the additional identifier, the server may send the additional voice content to the second terminal, so that the additional voice content is attached to the voice content of the fourth voice information.

The voice content included in the third voice information is used as the additional voice content of the fourth voice information, which includes but is not limited to: (1) replacing the voice content included in the fourth voice message with the voice content included in the third voice message (i.e., the additional voice content), i.e., withdrawing the fourth voice message; optionally, the additional voice content is stored in the storage address where the original fourth voice information is located; (2) after the voice content included in the third voice message is added to the voice content included in the fourth voice message, the third voice message and the fourth voice message correspond to the same voice icon (e.g., a voice bubble), that is, are stored in the same storage address (e.g., the storage address where the fourth voice message is located), and the newly formed voice content includes the voice content included in the third voice message and the voice content included in the fourth voice message; (3) after the voice content included in the third voice message is deployed in the voice content included in the fourth voice message, the third voice message and the fourth voice message correspond to different voice icons, namely, correspond to different storage addresses and are stored in the different storage addresses; a connection may be established between the two voice icons, such as a dashed connection, to give an end user a prompt, etc.

Optionally, if the third voice information carries the additional identifier, the server may withdraw the fourth voice information sent to the second terminal, then obtain the voice content of the fourth voice information, add the additional voice content of the third voice information to the voice content of the fourth voice information, package the additional voice content of the third voice information to generate fifth voice information, and send the fifth voice information to the second terminal. The embodiment can ensure the consistency of voice information and improve the listening effect of users.

Or, if the third voice information carries the additional identifier, the server may also directly obtain the voice content of the fourth voice information from the local storage without withdrawing the fourth voice information, and then add the additional voice content of the third voice information to the voice content of the fourth voice information, package the additional voice content of the third voice information to generate fifth voice information, and send the fifth voice information to the second terminal.

Or, if the third voice information carries the additional identifier, the server may directly send the additional voice content of the third voice information to the second terminal without withdrawing the fourth voice information, and instruct the second terminal to add the additional voice content to the voice content of the fourth voice information.

According to the technical scheme, the server can send the voice content of the third voice information to the second terminal as the additional voice content of the fourth voice information, the integrity of the voice information sent to the second terminal is guaranteed, misdirection or trouble caused to a second terminal user due to incomplete voice information is avoided, and user experience is improved.

In an embodiment, as shown in fig. 2b, in step 203, if the third voice information carries an additional identifier, the voice content included in the third voice information is sent to the second terminal as an additional voice content of the fourth voice information, which may be implemented by step 2031 and step 2032:

in step 2031, if the third voice information carries the additional identifier, it is detected whether the third voice information carries a second terminal identifier, where the second terminal identifier is used to uniquely identify the second terminal.

In step 2032, if the third voice information carries the second terminal identifier, the voice content included in the third voice information is sent to the second terminal as the additional voice content of the fourth voice information.

Illustratively, the second terminal identifier in this embodiment is used to indicate that the third voice information includes voice content for sending to the second terminal. For example, the server attaches the received additional voice content to the last piece of information received before by default (that is, the fourth voice information described above), and when it is detected that the third voice information carries the additional identifier and the second terminal identifier, the server sends the voice content included in the third voice information to the second terminal and attaches the voice content to the last piece of voice information received by the second terminal.

In the embodiment, the server can determine which terminal the additional voice content is sent to through the terminal identification carried by the voice information, so that the accuracy of judgment of the sending object is ensured, and the server is prevented from misjudging.

In an embodiment, as shown in fig. 2c, in step 203, if the third voice information carries an additional identifier, the voice content included in the third voice information is sent to the second terminal as an additional voice content of the fourth voice information, which may be implemented by step 2033 and step 2034:

in step 2033, if the third voice information carries an additional identifier, it is detected whether the third voice information carries a fourth voice information identifier, where the fourth voice information identifier is used to uniquely identify the fourth voice information.

In step 2034, if the third voice information carries the fourth voice information identifier, the voice content included in the third voice information is sent to the second terminal as the additional voice content of the fourth voice information.

Illustratively, the fourth voice information identifier described in this embodiment is used to indicate that the third voice information includes voice content for attaching to the fourth voice information. For example, the server defaults to send the voice group to all terminals (or default terminals) that have sent messages within a preset time period, and when it is detected that the third voice information carries the additional identifier and the fourth voice information identifier, sends the additional voice content group included in the third voice information to all terminals (or default terminals) and attaches the additional voice content group to the fourth voice information of the corresponding terminal.

According to the technical scheme provided by the embodiment of the disclosure, the server can determine the voice information to which the additional voice content is attached through the voice information identifier carried by the voice information, so that the accuracy of judging the voice additional object is ensured, and the server is prevented from misjudging.

In one embodiment, when the server withdraws the fourth voice message, it may first determine whether read feedback information of the fourth voice message sent by the second terminal is received, and if the read feedback information of the fourth voice message sent by the second terminal is not received, withdraw the fourth voice message.

For example, after the user clicks to listen to the voice message, the terminal may send read feedback information of the voice message to the server to inform the server that the voice message has been read. The embodiment supports the withdrawal of the voice information under the condition that the voice information is not read by the user, fully considers the feeling of the user at the information receiving end, saves the time for the user to read the information and improves the reading efficiency.

In addition, optionally, when the server determines that the third voice message includes the additional voice content of the fourth voice message, if the read feedback information of the fourth voice message sent by the second terminal has been received, the server may send incomplete prompt information to the second terminal, where the incomplete prompt information is used to prompt the user that the fourth voice message is incomplete voice message, and then the server may send the processed complete voice content to the second terminal.

The embodiment supports the withdrawal of the voice information under the condition that the voice information is not read by the user, fully considers the feeling of the user at the information receiving end, saves the time for the user to read the information and improves the reading efficiency.

Fig. 3 is a flowchart illustrating a voice communication method according to an exemplary embodiment, and as shown in fig. 3, the voice communication method includes the following steps 301 to 302:

in step 301, a voice attachment request transmitted by a server is received.

The voice additional request is used for requesting that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is the voice information received before the third voice information is received, the voice additional request carries the third voice information, an additional identifier and a fourth voice information identifier, the additional identifier is used for indicating that the voice content included in the third voice information is the additional voice content, and the fourth voice information identifier is used for uniquely identifying the fourth voice information.

For example, after receiving a voice addition request sent by a server, a terminal first parses the voice addition request, obtains information included in the voice addition request, and then determines, according to an additional identifier and a fourth voice information identifier included in the voice addition request, that a voice content included in third voice information included in the voice addition request is an additional voice content of fourth voice information. At this time, the terminal may store the voice content included in the third voice information to the storage address where the fourth voice information is located.

In step 302, according to the voice addition request, the voice content included in the third voice information is used as the additional voice content of the fourth voice information.

For example, after the terminal determines that the voice content included in the third voice message included in the voice addition request is the additional voice content of the fourth voice message, if the user does not read the fourth voice message, the terminal may combine the voice content of the fourth voice message and the voice content included in the third voice message into a complete voice message, and display the complete voice message on the screen of the terminal through the voice icon; if the user has read the fourth voice message, the terminal can combine the voice content of the fourth voice message and the voice content included in the third voice message into a complete voice message, and then display the voice message on the screen of the terminal through the original voice icon of the fourth voice message, and mark the voice icon as unread; or after combining the voice content of the fourth voice message and the voice content included in the third voice message into a complete voice message, displaying the complete voice message on the screen of the terminal through the new voice icon, and simultaneously displaying the relationship between the new voice icon and the voice icon of the fourth voice message, for example, displaying a dotted line connecting the new voice icon and the voice icon of the fourth voice message, so that the user can conveniently know the relationship between the two voice messages.

In the technical scheme provided by the embodiment of the disclosure, when the terminal determines that the received voice information is the additional voice information of another voice information, the additional voice content is added to the incomplete voice information and then fed back to the user, so that the problems of misleading or puzzlement to the terminal user and the like caused by incomplete voice information received by the terminal due to network environment or human reasons and the like are avoided, and the user experience is improved.

The implementation is described in detail below by way of several embodiments.

Fig. 4 is an interaction diagram of a voice communication method according to an exemplary embodiment, which is applicable to a system consisting of a terminal and a server, where the terminal may be a mobile phone, a tablet computer, a smart watch, or other devices capable of performing voice communication; the server may be a server provided by an operator or a server provided by a third-party platform, which is not limited in this disclosure. As shown in fig. 4, the voice communication method includes the following steps 401 to 409:

in step 401, a first terminal acquires a first operation instruction input by a user.

In step 402, the first terminal acquires additional voice content of the first voice information in response to a first operation instruction input by a user.

In step 403, the first terminal generates second voice information according to the additional voice content and the additional identifier.

In step 404, the first terminal stores the second voice message at the storage address corresponding to the first voice message.

In step 405, the first terminal sends the second voice information to the server.

In step 406, the server determines whether the voice content included in the second voice message is the additional voice content of the first voice message according to the additional identifier included in the second voice message.

In step 407, when the server determines that the second voice message includes additional voice content whose voice content is the first voice message, the server withdraws the first voice message transmitted to the second terminal.

When the second voice information includes additional voice content whose voice content is not the first voice information, the server transmits the second voice information to the second terminal.

In step 408, the server generates third voice information by appending the additional voice content of the second voice information to the voice content of the first voice information.

In step 409, the server transmits the third voice information to the second terminal.

In the embodiment of the disclosure, the terminal may add the additional voice content on the basis of the voice information that has been sent according to the user instruction, and by sending the additional voice content carrying the additional identifier to the server, the server may be instructed to send the additional voice content to the opposite terminal. By the embodiment, when the terminal sends the voice information to the opposite terminal, the situation that the voice received by the opposite terminal is incomplete due to network environment or human reasons and the like is effectively avoided; and moreover, the opportunity of modifying the sent voice information is provided for the terminal user, and the user experience is favorably improved.

Fig. 5 is an interaction diagram of a voice communication method according to an exemplary embodiment, which is applicable to a system consisting of a terminal and a server, where the terminal may be a mobile phone, a tablet computer, a smart watch, or other devices capable of performing voice communication; the server may be a server provided by an operator or a server provided by a third-party platform, which is not limited in this disclosure. As shown in fig. 5, the voice communication method includes the following steps 501 to 515:

in step 501, the first terminal generates first voice information according to the received voice content.

In step 502, the first terminal sends the first voice message to the server.

In step 503, the server forwards the first voice information to the second terminal.

In step 504, the first terminal determines whether a first duration occupied by the valid voice content in the first voice message is less than or equal to a preset proportion of a second duration, where the second duration is a time when the microphone is turned on when the first voice message is received.

In step 505, if the first duration occupied by the valid voice content in the first voice message is less than or equal to the preset ratio of the second duration, the first terminal displays the prompt message.

If the first duration occupied by the effective voice content in the first voice message is less than or equal to the preset proportion of the second duration, the first terminal determines that the first voice message is incomplete.

And if the first duration occupied by the effective voice content in the first voice message is greater than the preset proportion of the second duration, the first terminal confirms that the first voice message is complete.

In step 506, the first terminal obtains a first operation instruction input by a user.

In step 507, the first terminal acquires the additional voice content of the first voice information in response to the first operation instruction.

In step 508, the first terminal generates second voice information according to the additional voice content, the second terminal identifier and the additional identifier.

In step 509, the first terminal stores the second voice message at the storage address corresponding to the first voice message.

In step 510, the first terminal sends the second voice information to the server.

In step 511, the server determines whether the second voice information includes the additional identity and the second terminal identity.

In step 512, if the second voice message includes the additional identifier and the second terminal identifier, the server determines whether the read feedback information of the first voice message sent by the second terminal is received.

In step 513, if the read feedback information of the first voice message sent by the second terminal is not received, the server withdraws the first voice message sent to the second terminal.

In step 514, the server adds the additional voice content of the second voice information to the voice content of the first voice information, and generates third voice information.

In step 515, the server transmits the third voice information to the second terminal.

The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.

Fig. 6a is a schematic diagram illustrating a structure of a voice communication apparatus 60 according to an exemplary embodiment, where the apparatus 60 may be implemented as part of or all of an electronic device through software, hardware or a combination of both. As shown in fig. 6a, the voice communication apparatus 60 includes a first obtaining module 601, a second obtaining module 602, a generating module 603 and a first sending module 604.

The first obtaining module 601 is configured to obtain a first operation instruction, where the first operation instruction is used to instruct to obtain additional voice content of first voice information;

a second obtaining module 602, configured to obtain additional voice content of the first voice message according to the first operation instruction;

a generating module 603, configured to generate second voice information according to the additional voice content and an additional identifier, where the additional identifier is used to indicate that the voice content included in the second voice information is an additional voice content;

a first sending module 604, configured to send the second voice information to a server.

In one embodiment, as shown in fig. 6b, the first obtaining module 601 includes a determining submodule 6011 and a first obtaining submodule 6012.

The determining submodule 6011 is configured to determine whether the first voice information is complete.

The first obtaining submodule 6012 is configured to obtain the first operation instruction when the first voice information is incomplete.

In one embodiment, as shown in fig. 6c, the apparatus 60 further comprises a prompting module 605.

The prompting module 605 is configured to display a prompting message when the first voice message is incomplete, where the prompting message is used to prompt a user that the first voice message is incomplete.

In one embodiment, as shown in fig. 6d, the determining submodule 6011 includes a first determining unit 6011a and a second determining unit 6011 b.

The first determining unit 6011a is configured to determine whether a first time duration occupied by an effective voice content in the first voice information is less than or equal to a preset ratio of a second time duration, where the second time duration is a time duration for turning on a microphone when the first voice information is received.

A second determining unit 6011b, configured to determine that the first voice information is incomplete when a first duration occupied by the valid voice content in the first voice information is less than or equal to a preset ratio of the second duration.

The above embodiments are equally applicable to the voice communication apparatus 60 shown in fig. 6 b.

In one embodiment, as shown in fig. 6e, the apparatus 60 further comprises a storage module 606.

The storage module 606 is configured to store the additional voice content at a storage address corresponding to the first voice information.

The above embodiments are equally applicable to the voice communication apparatus 60 shown in fig. 6b, 6c and 6 d.

In one embodiment, as shown in fig. 6f, the first obtaining module 601 includes:

the second obtaining sub-module 6013 is configured to obtain the first operation instruction through a voice icon corresponding to the first voice information on a user interface.

Embodiments of the present disclosure provide a voice communication apparatus that can add additional voice content on the basis of voice information that has been transmitted according to a user instruction, and can instruct a server to transmit the additional voice content to an opposite-side apparatus by transmitting the additional voice content carrying an additional identification to the server. By the embodiment, when the device sends the voice information to the opposite side device, the situation that the voice received by the opposite side device is incomplete due to network environment or human reasons and the like is effectively avoided; and moreover, the opportunity of modifying the sent voice information is provided for the device user, and the user experience is favorably improved.

Fig. 7a is a schematic structural diagram illustrating a voice communication apparatus 70 according to an exemplary embodiment, where the apparatus 70 may be implemented as part of or all of an electronic device through software, hardware or a combination of both. As shown in fig. 7a, the voice communication apparatus 70 includes a first receiving module 701, a detecting module 702 and a second sending module 703.

The first receiving module 701 is configured to receive third voice information sent by the first terminal.

A detecting module 702, configured to detect whether the third voice information carries an additional identifier, where the additional identifier is used to indicate that a voice content included in the third voice information is an additional voice content.

A second sending module 703, configured to send, when the third voice information carries the additional identifier, the voice content included in the third voice information to a second terminal as an additional voice content of fourth voice information, where the fourth voice information is the voice information sent by the first terminal to the second terminal before sending the third voice information.

In one embodiment, as shown in fig. 7b, the second sending module 703 includes a first detecting sub-module 7031 and a first sending sub-module 7032.

The first detecting sub-module 7031 is configured to detect whether the third voice information carries a second terminal identifier when the third voice information carries the additional identifier, where the second terminal identifier is used to uniquely identify the second terminal.

A first sending sub-module 7032, configured to send, when the third voice information carries the second terminal identifier, the voice content included in the third voice information as the additional voice content of the fourth voice information to the second terminal.

In one embodiment, as shown in fig. 7c, the second sending module 703 further includes a second detecting sub-module 7033 and a second sending sub-module 7034.

The second detecting submodule 7033 is configured to detect whether the third voice information carries a fourth voice information identifier when the third voice information carries the additional identifier, where the fourth voice information identifier is used to uniquely identify the fourth voice information.

A second sending sub-module 7034, configured to send, when the third voice information carries the fourth voice information identifier, the voice content included in the third voice information as an additional voice content of the fourth voice information to the second terminal.

The above embodiments are equally applicable to the voice communication apparatus 70 shown in fig. 7 b.

In one embodiment, as shown in fig. 7d, the second sending module 703 includes a first indication sub-module 7035.

The first instruction submodule 7035 is configured to instruct the second terminal to store the voice content included in the third voice information in the storage address where the fourth voice information is located, and withdraw the fourth voice information;

alternatively, as shown in fig. 7e, the second sending module 703 includes a second indicating sub-module 7036.

The second indicating sub-module 7036 is configured to instruct the second terminal to add the voice content included in the third voice information to the voice content included in the fourth voice information.

The above embodiments are equally applicable to the voice communication apparatus 70 shown in fig. 7b and 7 c.

In one embodiment, as shown in fig. 7f, the first indication submodule 7035 comprises a third determining unit 7035a and a revoking unit 7035 b.

Wherein, the third determining unit 7035a is configured to determine whether read feedback information of the fourth voice information sent by the second terminal is received.

A withdrawing unit 7035b, configured to withdraw the fourth voice message when the read feedback information of the fourth voice message sent by the second terminal is not received.

The embodiment of the disclosure provides a voice communication device, which can send the voice content of one voice message to a terminal as the additional voice content of another voice message, so as to ensure the integrity of the voice message received by the terminal, avoid the problem that the voice message received by the terminal is incomplete due to network environment or human reasons, etc., and cause misleading or puzzling to a terminal user, and improve user experience.

Fig. 8 is a schematic structural diagram of a voice communication apparatus 80 according to an exemplary embodiment, where the apparatus 80 may be implemented as part or all of an electronic device through software, hardware or a combination of the two. As shown in fig. 8, the voice communication apparatus 80 includes a second receiving module 801 and a processing module 802.

A second receiving module 801, configured to receive a voice additional request sent by a server, where the voice additional request is used to request that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is voice information received before the third voice information is received, the voice additional request carries the third voice information, an additional identifier, and a fourth voice information identifier, the additional identifier is used to indicate that the voice content included in the third voice information is an additional voice content, and the fourth voice information identifier is used to uniquely identify the fourth voice information.

A processing module 802, configured to use, according to the voice additional request, the voice content included in the third voice information as the additional voice content of the fourth voice information.

The embodiment of the disclosure provides a voice communication device, which can add additional voice content to a user after incomplete voice information when determining that received voice information is additional voice information of another voice information, thereby avoiding the problem of misleading or puzzling to a terminal user due to incomplete voice information received by a terminal caused by network environment or human reasons, and the like, and improving user experience.

The disclosed embodiment provides a system for voice communication, the system includes:

any of the voice communication devices 60 shown in fig. 6a to 6f described above, any of the voice communication devices 70 shown in fig. 7a to 7f described above, and any of the voice communication devices 80 shown in fig. 8.

The disclosed embodiment provides a voice communication apparatus, which includes:

a first processor;

a first memory for storing first processor-executable instructions;

wherein the first processor is configured to:

and sending the second voice information to a server.

a second processor;

a second memory for storing second processor-executable instructions;

wherein the second processor is configured to:

receiving third voice information sent by the first terminal;

a third processor;

a third memory for storing third processor-executable instructions;

wherein the third processor is configured to:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 9 is a block diagram illustrating a structure of a voice communication apparatus 90, which is suitable for a terminal device, according to an exemplary embodiment. For example, the apparatus 90 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

The apparatus 90 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916.

The processing component 902 generally controls overall operation of the device 90, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.

The memory 904 is configured to store various types of data to support operation at the apparatus 90. Examples of such data include instructions for any application or method operating on the device 90, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 906 provides power to the various components of the device 90. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 90.

The multimedia component 908 comprises a screen providing an output interface between the device 90 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 90 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 90 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.

I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 914 includes one or more sensors for providing various aspects of status assessment for the device 90. For example, the sensor assembly 914 may detect the open/closed status of the device 90, the relative positioning of the components, such as the display and keypad of the device 90, the sensor assembly 914 may also detect a change in the position of the device 90 or a component of the device 90, the presence or absence of user contact with the device 90, the orientation or acceleration/deceleration of the device 90, and a change in the temperature of the device 90. The sensor assembly 914 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 916 is configured to facilitate wired or wireless communication between the apparatus 90 and other devices. The device 90 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 90 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the apparatus 90 to perform the method of fig. 1a, 1 c-1 e, or 3, described above, is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 10 is a block diagram illustrating an apparatus for voice communication 100 according to an example embodiment. For example, the apparatus 100 may be provided as a server. The apparatus 100 includes a processing component 1002 that further includes one or more processors, and memory resources, represented by memory 1003, for storing instructions, such as applications, that are executable by the processing component 1002. The application programs stored in memory 1003 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1002 is configured to execute instructions to perform the methods illustrated in fig. 2a, 2b and 2c described above.

The device 100 may also include a power component 1006 configured to perform power management of the device 100, a wired or wireless network interface 1005 configured to connect the device 100 to a network, and an input/output (I/O) interface 1008. The apparatus 100 may operate based on an operating system stored in the memory 1003, such as Windows Server, Mac OSXTM, UnixTM, LinuxTM, FreeBSDTM, or the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of voice communication, comprising:

sending the second voice information to a server;

wherein the obtaining the first operation instruction comprises:

determining whether the first voice information is complete;

and when the first voice information is incomplete, acquiring the first operation instruction.

2. The method of claim 1, further comprising:

and when the first voice information is incomplete, displaying prompt information, wherein the prompt information is used for prompting a user that the first voice information is incomplete.

3. The method of claim 1, wherein the determining whether the first speech information is complete comprises:

4. The method according to any one of claims 1 to 3, further comprising:

and storing the additional voice content at a storage address corresponding to the first voice information.

5. The method of claim 1, wherein the obtaining the first operation instruction comprises:

and acquiring the first operation instruction through a voice icon corresponding to the first voice information on a user interface.

6. A method of voice communication, comprising:

receiving third voice information sent by a first terminal, wherein the third voice information is sent by the first terminal when the first terminal detects that fourth voice information is incomplete, and the fourth voice information is the voice information sent by the first terminal to a second terminal before the third voice information is sent;

and if the third voice information carries the additional identifier, sending the voice content included in the third voice information to the second terminal as the additional voice content of the fourth voice information.

7. The method of claim 6, wherein if the third voice message carries the additional identifier, sending the voice content included in the third voice message to the second terminal as the additional voice content of the fourth voice message comprises:

if the third voice information carries the additional identifier, detecting whether the third voice information carries a second terminal identifier, wherein the second terminal identifier is used for uniquely identifying the second terminal;

and if the third voice information carries the second terminal identification, sending the voice content included in the third voice information to the second terminal as the additional voice content of the fourth voice information.

8. The method of claim 6, wherein if the third voice message carries the additional identifier, sending the voice content included in the third voice message to the second terminal as the additional voice content of the fourth voice message comprises:

if the third voice information carries the additional identifier, detecting whether the third voice information carries a fourth voice information identifier, wherein the fourth voice information identifier is used for uniquely identifying the fourth voice information;

9. The method according to any one of claims 6 to 8, wherein the sending the voice content included in the third voice information to the second terminal as the additional voice content of the fourth voice information comprises:

instructing the second terminal to store the voice content included in the third voice message to the storage address of the fourth voice message, and withdrawing the fourth voice message; or

10. The method of claim 9, wherein said revoking the fourth voice information comprises:

determining whether read feedback information of the fourth voice information sent by the second terminal is received;

11. A method of voice communication, comprising:

receiving a voice additional request sent by a server, wherein the voice additional request is used for requesting that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is the voice information received before the third voice information is received, the third voice information is the voice information received when the fourth voice information is incomplete, the voice additional request carries the third voice information, an additional identifier and a fourth voice information identifier, the additional identifier is used for indicating that the voice content included in the third voice information is the additional voice content, and the fourth voice information identifier is used for uniquely identifying the fourth voice information;

12. A voice communication apparatus, comprising:

the first sending module is used for sending the second voice information to a server;

wherein the first obtaining module comprises:

13. The apparatus of claim 12, further comprising:

14. The apparatus of claim 12, wherein the determination submodule comprises:

15. The apparatus of any one of claims 12 to 14, further comprising:

16. The apparatus of claim 12, wherein the first obtaining module comprises:

17. A voice communication apparatus, comprising:

the first receiving module is used for receiving third voice information sent by a first terminal, wherein the third voice information is sent by the first terminal when the first terminal detects that fourth voice information is incomplete, and the fourth voice information is the voice information sent by the first terminal to a second terminal before the third voice information is sent;

and a second sending module, configured to send, when the third voice information carries the additional identifier, a voice content included in the third voice information to the second terminal as an additional voice content of the fourth voice information.

18. The apparatus of claim 17, wherein the second sending module comprises:

19. The apparatus of claim 17, wherein the second sending module further comprises:

20. The apparatus according to any one of claims 17 to 19, wherein the second sending module comprises:

21. The apparatus of claim 20, wherein the first indication submodule comprises:

22. A voice communication apparatus, comprising:

a second receiving module, configured to receive a voice additional request sent by a server, where the voice additional request is used to request that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is voice information received before the third voice information is received, the third voice information is voice information received when the fourth voice information is incomplete, the voice additional request carries the third voice information, an additional identifier, and a fourth voice information identifier, the additional identifier is used to indicate that the voice content included in the third voice information is an additional voice content, and the fourth voice information identifier is used to uniquely identify the fourth voice information;

23. A voice communication system, comprising: the voice communication device of any one of claims 12 to 16, the voice communication device of any one of claims 17 to 21, and the voice communication device of claim 22.

24. A voice communication apparatus, comprising:

a first processor;

a first memory for storing first processor-executable instructions;

wherein the first processor is configured to:

sending the second voice information to a server;

wherein the obtaining the first operation instruction comprises:

determining whether the first voice information is complete;

25. A voice communication apparatus, comprising:

a second processor;

a second memory for storing second processor-executable instructions;

wherein the second processor is configured to:

26. A voice communication apparatus, comprising:

a third processor;

a third memory for storing third processor-executable instructions;

wherein the third processor is configured to:

receiving a voice additional request sent by a server, wherein the voice additional request is used for requesting that a voice content included in third voice information is used as an additional voice content of fourth voice information, the fourth voice information is the voice information received before the third voice information is received, the third voice information is the voice information received when the fourth voice information is incomplete, the voice additional request carries the third voice information, an additional voice identifier and a fourth voice information identifier, the additional voice identifier is used for indicating that the voice content included in the third voice information is the additional voice content, and the fourth voice information identifier is used for uniquely identifying the fourth voice information;