US20190018645A1 - Systems and methods for automatically generating enhanced audio output - Google Patents
- Publication number
- US20190018645A1 (U.S. application Ser. No. 16/137,901)
- Authority
- US
- United States
- Prior art keywords
- audio recording
- frequency range
- power level
- act
- automatically
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G5/00—Tone control or bandwidth control in amplifiers
- H03G5/005—Tone control or bandwidth control in amplifiers of digital signals
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G5/00—Tone control or bandwidth control in amplifiers
- H03G5/16—Automatic control
- H03G5/165—Equalizers; Volume or gain control in limited frequency bands
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G9/00—Combinations of two or more types of control, e.g. gain control and tone control
- H03G9/005—Combinations of two or more types of control, e.g. gain control and tone control of digital or coded signals
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G9/00—Combinations of two or more types of control, e.g. gain control and tone control
- H03G9/02—Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers
- H03G9/025—Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers frequency-dependent volume compression or expansion, e.g. multiple-band systems
Definitions
- Audio production tools exist that enable users to produce high-quality audio.
- For example, some audio production tools enable users to record sound produced by one or more sound sources (e.g., vocals and/or speech captured by a microphone, music played with an instrument, etc.), process the audio (e.g., to master, mix, design, and/or otherwise manipulate the audio), and/or control its playback.
- Audio production tools may be used to produce audio comprising music, speech, sound effects, and/or other sounds.
- Some computer-implemented audio production tools provide a graphical user interface with which users may complete various production tasks on an audio recording. For example, some tools may receive audio input and generate one or more digital representations of the input, which a user may manipulate using the graphical user interface to obtain audio output having desired characteristics.
- A user may employ an audio production tool to perform any of numerous production tasks.
- For example, many audio production tools enable a user to perform sound equalization, which is a technique used to alter a sound recording by applying filters to sound in one or more frequency ranges, so as to boost or attenuate spectral portions of a recording.
- Many audio production tools also enable users to perform sound compression, which is a technique for attenuating loud sounds so that other sounds are more easily perceived by a listener.
- Various aspects and embodiments of the invention are described below with reference to the following figures. It should be appreciated that the figures are not necessarily drawn to scale. Items appearing in multiple figures are indicated by the same reference number in all the figures in which they appear.
- FIG. 1 is a flowchart illustrating a representative audio production process, in accordance with some embodiments of the invention.
- FIGS. 2A-2C are plots depicting representative predefined templates, in accordance with some embodiments of the invention.
- FIG. 3 is a tree diagram depicting a representative hierarchy of predefined templates, presets and modes, according to some embodiments of the invention.
- FIG. 4A is a flowchart depicting a representative process for automatically equalizing sound in an audio recording, according to some embodiments of the invention.
- FIG. 4B is a plot depicting representative output of audio equalization, according to some embodiments of the invention.
- FIG. 5 is a plot depicting representative functions for audio compression, according to some embodiments of the invention.
- FIG. 6 is a plot depicting representative cross-overs and frequency bands, according to some embodiments of the invention.
- FIG. 7 is a block diagram illustrating a representative computer system which may be used to implement certain aspects of the invention.
- Some conventional audio production tools are capable of automatically recognizing the source of sound included in an audio track. For example, techniques are known for automatically recognizing whether the sound included in a track was produced by a particular instrument, by vocals, and/or by one or more other sources.
- Some conventional audio production tools may also be capable of applying a predefined “template” of audio production settings based upon a recognized sound source. These settings may, for example, define sound equalization parameters (e.g., the application of one or more digital filters to boost or attenuate sound in certain frequency ranges) to produce audio output which is generally considered to be pleasing to a listener. For example, some conventional audio production tools may apply one collection of settings to a track that is recognized as including sound produced by a guitar, another collection of settings to a track recognized as including sound produced by drums, another collection of settings to a track that is recognized as including sound produced by vocals, and so forth.
- The Assignee has recognized that a user may have any of numerous goals for a track that includes sound from a particular source, and that the settings defined by a “one size fits all” predefined template may not serve those goals. For example, a user may wish to achieve a particular mood, sound quality and/or other characteristic for a guitar track that one or more settings in a predefined guitar template does not allow the user to achieve.
- As such, some embodiments of the invention may enable a user to modify the manner in which one or more settings specified by a predefined template for a particular sound source are applied. For example, some embodiments may enable a user to specify that the amplitude at which one or more digital filters specified by a predefined template is applied is varied, by selecting a “preset” for the template.
- Further, some embodiments may enable the user to define the extent to which the amplitude for one or more digital filters is varied, by selecting a “mode” for a preset.
- As a result, the user may have greater control over the settings which are applied to a track than conventional audio production tools afford. The user may therefore benefit from the time savings that come with having access to a collection of predefined settings for a particular sound source, without being restricted to a “one size fits all” scheme for the sound source.
- Additionally, some embodiments of the invention may allow the user to switch from using one collection of settings (e.g., defined by a predefined template, as modified per the user's selection of a preset and/or mode) to another, so that he/she may “try out” different settings before deciding on a particular collection that allows him/her to achieve the goals he/she has in mind for a track.
- The Assignee has also recognized that many users may expend significant time and effort defining settings for a track, regardless of whether a predefined template of settings is applied. Some embodiments of the invention are directed to reducing the amount of time and effort a user expends, and/or enabling the user to produce higher quality music than he/she may have been capable of producing on his/her own.
- For example, some embodiments may automatically determine one or more settings for a track based at least in part upon an analysis of the spectral and/or dynamic content of the track.
- The settings which are automatically determined may take any of numerous forms.
- For example, some embodiments may automatically perform sound equalization by applying one or more digital filters to a track, and/or defining the frequency range(s) in which the filter(s) are applied.
- Some embodiments may automatically apply dynamic range compression to a track, so as to attenuate loud sounds in the track, without diminishing the track's overall character. Some embodiments may automatically define the manner in which compression is applied in multiple sub-bands of the audible spectrum, such as by intelligently positioning “cross-overs” between the sub-bands so as to promote overall sound quality. Any suitable setting(s) may be automatically determined based at least in part upon the spectral content of a track, as the invention is not limited in this respect.
- the settings for a track may be automatically determined in any suitable way.
- For example, one or more heuristics, algorithms, and/or other processing technique(s) may be used to determine how various spectral characteristics of a track may influence the settings for the track.
- In this respect, settings may be automatically determined for a track so as to achieve any of numerous (e.g., artistic) goals.
- For example, some embodiments may automatically determine certain settings to bring to the forefront certain elements of the natural character of the sound in a track, to enhance the track's overall balance and/or clarity, and/or achieve any of numerous other objectives.
- Although automatically determining the settings for a track may save the user considerable time and effort, and/or enable him/her to produce sound of a quality which he/she may not otherwise have been capable of producing, the Assignee has recognized that audio production is ultimately a creative endeavor by which a user may seek to express his/her own unique “voice.” Thus, some embodiments of the invention may enable a user to modify any of the settings which are automatically applied to a track. As such, some embodiments of the invention may enable the user to reap the benefits of increased audio production efficiency and/or enhanced audio quality, to the extent he/she deems appropriate, while still producing audio output that suits his/her unique tastes and objectives.
- FIG. 1 depicts a representative process 100 for automatically generating enhanced audio.
- In some embodiments, representative process 100 may be performed via execution of software, by a system which includes an audio recording system, digital audio workstation, personal computer, and/or portable device (e.g., a tablet, smartphone, gaming console, and/or other suitable portable device) that presents a graphical user interface which a user may employ to invoke certain functions.
- However, it should be appreciated that representative process 100 is not limited to being performed via execution of software, by any particular component(s), and that embodiments of the invention may be implemented using any suitable combination of hardware and/or software.
- Representative process 100 begins at act 102 , wherein one or more tracks are received.
- Each track received in act 102 may, for example, include sound which is produced by a particular sound source, such as a musical instrument, microphone, computer system, and/or any other suitable source(s).
- Of course, a track need not include only sound from a single source, as any suitable number of sound sources may be represented in a particular track.
- If act 102 comprises receiving multiple tracks, then the tracks may collectively form a multi-track recording.
- In some embodiments, each track received in act 102 may comprise a digital representation of a time-delimited audio recording. As such, act 102 may comprise storing each track in computer-readable memory.
- In act 104, each track received in act 102 is analyzed to identify the sound source(s) represented in the track.
- Those skilled in the art will recognize that any of numerous techniques, whether now known or later developed, may be used to identify the sound source(s) represented in a track. These identification techniques are not considered by the Assignee to be a part of the invention, and so they will not be described in further detail here.
- In act 106, a predefined template of settings is selected for each track received in act 102, based at least in part on the sound source(s) identified in act 104.
- In representative process 100, the settings which are specified by the selected predefined template may be varied based upon the user's selection of a “preset” and “mode” (described in further detail below), which are received in act 105.
- In representative process 100, acts 104 and 105 may occur concurrently or at different times. To illustrate the manner in which one or more settings specified by a predefined template may be modified based upon the user's selection of a preset and mode, three representative predefined templates are symbolically depicted in FIGS. 2A-2C.
- Each of FIGS. 2A-2C shows a two-dimensional plot in which frequency is represented on the x-axis and amplitude is represented on the y-axis.
- Each plot shows a series of nodes, with the placement of each node indicating the frequency and amplitude of a corresponding digital filter. The various nodes may therefore be considered the settings of a digital equalizer.
- FIG. 2A depicts a collection of equalizer settings predefined for a first sound source (i.e., nodes 212 1 through 212 10);
- FIG. 2B depicts a collection of settings predefined for a second sound source (i.e., nodes 222 1 through 222 11); and
- FIG. 2C depicts a collection of settings predefined for a third sound source (i.e., nodes 232 1 through 232 9).
- A predefined template is not limited to having the same number of settings as any of those shown, as a predefined template may include any suitable number of settings.
- The template represented in FIG. 2A for the first sound source boosts high frequencies and cuts low frequencies;
- the template represented in FIG. 2B for the second sound source boosts high frequencies and cuts some middle frequencies; and
- the template shown in FIG. 2C for the third sound source boosts some middle frequencies and cuts high frequencies.
- A template that is predefined for a particular sound source may include settings that are designed to achieve any suitable frequency response. It can be seen in FIGS. 2A-2C that the line segments extending between the nodes in each template create a “shape” for the template.
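- The patent does not prescribe a data structure for such templates, but one simple way to represent a template in software is as a list of (frequency, amplitude) nodes, each standing for one digital filter. The sketch below is an illustrative assumption; the class names and example node values are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FilterNode:
    """One equalizer setting: a digital filter centered at a frequency, with a gain."""
    frequency_hz: float   # center frequency of the filter
    gain_db: float        # positive values boost, negative values cut

@dataclass
class Template:
    """A predefined collection of equalizer settings for one sound source."""
    sound_source: str
    nodes: List[FilterNode]

# Hypothetical template in the spirit of FIG. 2A: cut low frequencies, boost high ones.
guitar_template = Template(
    sound_source="acoustic guitar",
    nodes=[
        FilterNode(frequency_hz=80.0,    gain_db=-4.0),
        FilterNode(frequency_hz=250.0,   gain_db=-1.5),
        FilterNode(frequency_hz=2_000.0, gain_db=+1.0),
        FilterNode(frequency_hz=8_000.0, gain_db=+3.0),
    ],
)
```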
- Some embodiments of the invention enable a user to modify the frequency response associated with a predefined template by specifying a “preset” which varies the amplitude at which one or more of the filters shown in FIGS. 2A-2C is applied.
- A preset may be established so as to modify a given predefined template in any suitable way.
- For example, some embodiments may allow the user to select a “broadband clarity” preset to enhance sound clarity across the entire audible spectrum, a “warm and open” preset to make low frequencies more pronounced, an “upfront midrange” preset to make certain midrange frequencies more pronounced, and/or any other suitable preset. Any suitable number of presets may be made available to a user.
- The way that a particular preset alters the frequency response of one predefined template may be different from the way that the same preset alters the frequency response of another predefined template. For example, selecting a “warm and open” preset may cause the amplitudes at which three of the digital filters shown in FIG. 2A are applied to be modified, but may cause the amplitudes at which five of the digital filters shown in FIG. 2B are applied to be modified.
- Of course, a preset may vary a given predefined template in any suitable fashion.
- A mode may define the extent to which a selected preset varies the amplitude at which one or more digital filters is applied. For example, some embodiments allow the user to select a “subtle” mode in which the amplitudes of one or more of the digital filters defined by a template are increased or decreased by no more than a first amount (e.g., 0.5 dB), a “medium” mode wherein the amplitudes of one or more of the digital filters are increased or decreased by no more than a second amount which is larger than the first amount (e.g., 2 dB), and an “aggressive” mode wherein the amplitudes of one or more of the digital filters in a template are increased or decreased by an amount that exceeds the second amount.
- A representative scheme 300 of predefined templates, presets and modes is shown in FIG. 3.
- In representative scheme 300, predefined templates, presets and modes are arranged in a hierarchy, with predefined templates at the highest level in the hierarchy, followed by presets and modes at successively lower levels in the hierarchy.
- For each predefined template there may be one or more presets, and for each preset there may be one or more modes.
- In the example shown, there are two predefined templates A and B, each corresponding to a particular sound source.
- For each predefined template there are two presets A and B.
- It should be appreciated, however, that any suitable number of predefined templates, presets, and/or modes may be employed.
- Some templates may have a different number and type of associated presets than other templates, and some presets may have a different number and type of associated modes than other presets.
- Further, templates, presets and modes need not be arranged in a hierarchy; if arranged in a hierarchy, the hierarchy may include any suitable number of levels.
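- A minimal sketch of how the template → preset → mode hierarchy described above could be realized in code follows, reusing the FilterNode and Template classes from the previous sketch. The preset names, per-node offsets, and per-mode dB caps are assumptions chosen for illustration (the 0.5 dB and 2 dB caps echo the “subtle” and “medium” examples given earlier); the patent does not specify this representation.

```python
# Maximum change (in dB) a preset is allowed to make in each mode -- assumed values.
MODE_MAX_DB = {"subtle": 0.5, "medium": 2.0, "aggressive": 6.0}

# A preset maps a node index in a template to a requested gain change in dB.
PRESETS = {
    "warm and open":     {0: +2.0, 1: +1.0},            # emphasize low frequencies
    "broadband clarity": {0: +1.0, 2: +1.5, 3: +1.5},   # small boosts across the spectrum
}

def apply_preset(template: Template, preset_name: str, mode: str) -> List[FilterNode]:
    """Return the template's nodes with the preset's changes applied, clamped by the mode."""
    cap = MODE_MAX_DB[mode]
    offsets = PRESETS[preset_name]
    adjusted = []
    for i, node in enumerate(template.nodes):
        delta = offsets.get(i, 0.0)
        delta = max(-cap, min(cap, delta))   # the mode limits how far a preset may move a filter
        adjusted.append(FilterNode(node.frequency_hz, node.gain_db + delta))
    return adjusted

# Example: the "warm and open" preset applied to the hypothetical guitar template in "medium" mode.
warmed = apply_preset(guitar_template, "warm and open", "medium")
```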
- Accordingly, act 106 involves selecting, for each track received in act 102, a predefined template based upon the sound source(s) identified for the track in act 104, as modified based upon a user's selection of a preset and/or mode received in act 105.
- For example, act 106 may involve applying the template/preset/mode combination shown at 301 in FIG. 3.
- a predefined template may include settings which are established to suit the preferences of a particular user, which may vary over time. These preferences may be determined by gathering information on how the user commonly applies filters to tracks that include particular sound sources over time.
- a predefined template may also, or alternatively, include settings which are established to suit the preferences of multiple users, which may also vary over time. These preferences may be determined by gathering information on how the users apply filters over time to tracks that include particular sound sources.
- information on how one or more users applies a particular preset and/or mode may cause the manner in which a preset and/or mode modifies a setting specified by a template to vary over time.
- Modifying a template, preset and/or mode over time may be accomplished in any suitable fashion.
- For example, one or more machine learning algorithms may process information on preferences exhibited by one or more users over time to determine the ways in which a template, preset or mode is to be modified.
- Following act 106, representative process 100 proceeds to act 110, wherein the spectral and/or dynamic (time-domain) content of each track received in act 102 is automatically analyzed.
- The spectral and/or dynamic content of a track may be automatically analyzed in any of numerous ways, to identify any of numerous spectral and/or dynamic characteristics.
- For example, act 110 may involve executing software which takes as input a digital representation of a track and applies one or more encoded algorithms to identify characteristics such as the frequency range(s) in which a track exceeds a particular threshold power level, a relationship between the power density in one frequency range and the power density in another frequency range, the frequency range(s) in which the power density is below a certain threshold, the presence and/or amplitude of peaks, and/or any of numerous other spectral characteristics of a track.
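- As one hedged illustration of the kind of spectral analysis act 110 might perform, the sketch below estimates the average power of a track in a few frequency bands using Welch's method from SciPy. The band edges, threshold, and function names are assumptions for demonstration, not details from the patent.

```python
import numpy as np
from scipy.signal import welch

def band_powers(samples, sample_rate, bands):
    """Estimate the average power of a track in each (low_hz, high_hz) band.

    `bands` is a list of (low_hz, high_hz) tuples; the return value maps each band
    to its mean power spectral density, which downstream steps can compare against
    thresholds or against the power in other bands.
    """
    freqs, psd = welch(samples, fs=sample_rate, nperseg=4096)
    powers = {}
    for low, high in bands:
        mask = (freqs >= low) & (freqs < high)
        powers[(low, high)] = float(psd[mask].mean()) if mask.any() else 0.0
    return powers

# Example usage with assumed bands and threshold:
# track = np.asarray(...)  # mono samples
# powers = band_powers(track, 44_100, [(20, 250), (250, 4_000), (4_000, 16_000)])
# hot_bands = {band for band, p in powers.items() if p > 1e-6}
```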
- In act 112, one or more settings are automatically determined for each track and applied, based at least in part upon the spectral and/or dynamic characteristics of the track.
- The settings which are determined and applied in act 112 may be designed to achieve any of numerous objectives, such as enhancing certain characteristics of the sound in the track, making one or more sounds in the track more or less pronounced, enhancing the track's balance and/or clarity, etc.
- Various processing techniques may be used to achieve these objectives, including but not limited to sound equalization, single-band compression, multi-band compression, limiting and panning.
- act 112 may involve automatically performing sound equalization for a track.
- a representative process 400 for performing automatic sound equalization is shown in FIG. 4A .
- Representative process 400 begins in act 402 , wherein the track's spectral content (i.e., identified in act 110 ( FIG. 1 )) is compared to a spectral content model for the sound source(s) in the track, to determine the ways in which the track's spectral content varies from the model. Any suitable spectral content model may be used, and a spectral content model may be defined in any suitable way.
- For example, the shape associated with the predefined template identified in act 106 for the sound source(s) included in the track may be used as a spectral content model.
- Because the predefined template was applied to the track in act 106, the track should already conform to some extent to the shape associated with the predefined template, at least in the frequency ranges in which the template's digital filters were applied.
- However, the predefined template may not include digital filters for all of the frequency ranges in which sound is present in the track.
- Consider, for example, the template shown in FIG. 2C, which includes filters designed to boost some middle frequencies and cut high frequencies.
- A particular track may include sound in a frequency range in which the predefined template does not include a digital filter, such as sound in the low frequencies.
- As a result, even though the predefined template was applied to the track in act 106, the spectral content of the track may not fully conform to the shape associated with the predefined template.
- Of course, the invention is not limited to employing a predefined template as the spectral content model in act 402.
- Any suitable spectral content model(s) may be compared with a track's spectral content in act 402 .
- In representative process 400, the result of the comparison in act 402 is an identification of one or more frequency ranges in which the track's spectral content varies from the model, and of the manner and extent to which the content varies from the model.
- In act 404, one or more digital filters is applied in the identified frequency range(s), so as to reduce or eliminate this variance.
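- One plausible, purely illustrative realization of acts 402-404 is to compare per-band levels of the track against the model's levels and propose corrective filters wherever the deviation exceeds a tolerance. The function below assumes levels are given in dB and keyed by band center frequency, and reuses the FilterNode class from the earlier sketch; none of these specifics come from the patent.

```python
def corrective_filters(track_levels_db, model_levels_db, tolerance_db=1.0):
    """Propose (frequency, gain) corrections where the track departs from the model.

    Both arguments map a band's center frequency (Hz) to a level in dB. Wherever the
    track deviates from the model by more than `tolerance_db`, a filter is suggested
    that boosts or cuts by the negated deviation, pulling the track toward the model.
    """
    corrections = []
    for freq, track_level in track_levels_db.items():
        target = model_levels_db.get(freq)
        if target is None:
            continue   # the model says nothing about this band
        deviation = track_level - target
        if abs(deviation) > tolerance_db:
            corrections.append(FilterNode(frequency_hz=freq, gain_db=-deviation))
    return corrections
```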
- FIG. 4B An illustrative example is shown in FIG. 4B . Specifically, FIG. 4B depicts a portion of the predefined template shown in FIG. 2A , in a frequency range which extends from a frequency lower than f 1 at which node 212 1 is placed to f 2 , at which node 212 2 is placed.
- Line segment 425 extends between nodes 212 1 and 212 2, and thus comprises a portion of the “shape” of the predefined template shown in FIG. 2A.
- no shape is explicitly formed at frequencies lower than f 1 .
- Dotted line segment 420 depicts where the shape may be located at frequencies lower than f 1 if line segment 425 continued along the same path as between f 1 and f 2 . It can be seen that dotted line segment 420 crosses the y-axis at amplitude a 2 .
- FIG. 4B also depicts the spectral content of a representative track 410 in the frequency range shown.
- a threshold 415 resides at frequencies less than or equal to f 1 , at amplitude a 2 . It can be seen in FIG. 4B that the power of the track at frequencies lower than f 1 exceeds threshold 415 .
- act 404 may include placing one or more additional digital filters (not shown in FIG. 4B ) at frequencies lower than f 1 .
- one or more digital filters may each be placed at a frequency lower than f 1 at an amplitude which approximates a 2 , at an amplitude at which dotted line segment 420 intersects the frequency, and/or at any other suitable amplitude. Any suitable number of digital filters may be applied at any suitable amplitude and frequency, as the invention is not limited in this respect.
- Representative process 400 then proceeds to act 406, wherein the amplitude and/or frequency at which one or more digital filters is applied to the track is modified.
- Act 406 may involve modifying the amplitude and/or frequency of a digital filter applied in act 404 , and/or modifying the amplitude and/or frequency of a digital filter applied as part of a predefined template in act 106 ( FIG. 1 ). This modification may, for example, be based upon predefined heuristics or rules, be based upon information which is dynamically determined (e.g., the spectral content of the track), and/or defined in any other suitable way.
- a predefined heuristic may provide an optimal ratio between the bandwidth in which “boost” filters are applied and the bandwidth in which “cut” filters are applied.
- act 406 may involve modifying the bandwidths in which “boost” and “cut” filters are applied so that the optimal ratio is achieved.
- The extent to which any one or more of the bandwidths in which filters are applied is modified to achieve the optimal ratio may be defined based at least in part on the spectral content of the track, the sound source(s) included in the track, and/or any other suitable characteristic(s) of the track.
- a predefined heuristic may provide that a track with excessive content in the high frequencies sounds too “cold.”
- act 406 may involve modifying the frequency and/or amplitude at which one or more digital filters is applied, so as to make the track sound “warmer” by making spectral content in the middle and/or lower frequencies more prominent.
- the frequencies which constitute “high” frequencies, and the threshold defining whether an amount of content in those frequencies is excessive may each be defined in any suitable fashion.
- a predefined heuristic may provide for modifying the frequency and/or amplitude of one or more digital filters based upon a particular sound system which is to be used to reproduce the track, the environment in which the track is to be reproduced, and/or any other suitable information. For example, if a particular loudspeaker tends to suppress the low frequencies when used in a particular setting, one or more digital filters may be modified so as to boost the content of a track in the low frequencies and/or suppress the content in other frequencies. It should be appreciated that the frequencies which constitute “low” frequencies in this example may be defined in any suitable fashion.
- the introduction of one or more digital filters in act 404 , and/or the modification of the amplitude and/or frequency at which one or more digital filters is applied in act 406 may be governed by one or more rules.
- a rule may provide a maximum extent to which a predefined template may be modified in acts 404 and/or 406 , such as to preserve the fundamental character of a particular sound source with which the template is associated.
- a rule may specify that if the average power of a track in a particular frequency range over a particular time period exceeds a particular threshold, then at least one digital filter is to be applied. Any suitable rule(s) may govern the automatic performance of sound equalization to an audio track, in any suitable way.
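- As a concrete, hypothetical reading of the average-power rule just described, the sketch below slides a window over the track and reports whether any window's power in a given band exceeds a threshold, reusing the band_powers() helper from the earlier sketch. The window length and threshold are assumptions, not values from the patent.

```python
def needs_filter(samples, sample_rate, band, window_s, power_threshold):
    """Return True if the track's average power in `band` exceeds `power_threshold`
    over any window of `window_s` seconds, suggesting at least one filter is warranted."""
    hop = int(window_s * sample_rate)
    for start in range(0, max(1, len(samples) - hop + 1), hop):
        window = samples[start:start + hop]
        if band_powers(window, sample_rate, [band])[band] > power_threshold:
            return True
    return False

# Example with assumed parameters: a 2-second window over the 20-250 Hz band.
# if needs_filter(track, 44_100, (20, 250), window_s=2.0, power_threshold=1e-6):
#     ...apply at least one digital filter in that range...
```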
- act 112 may involve automatically performing audio compression.
- compression is an audio production technique in which loud sounds are attenuated, to an extent determined by one or more compression parameters.
- One of these parameters is the compression threshold, which is the gain level which a track must exceed in a frequency range to be attenuated.
- Another parameter is the compression ratio, which defines the extent to which sound that exceeds the compression threshold is attenuated. For example, if a 2:1 compression ratio is used, then sounds above the compression threshold will be attenuated by a factor of 2.
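- The static behavior of such a compressor can be written down directly. The small sketch below computes the output level for a given input level, threshold, and ratio, with a worked 2:1 example; it is a generic textbook formulation rather than anything specific to the patent.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Static compression curve: output level (dB) for a given input level (dB).

    Below the threshold the signal is untouched; above it, each dB of input above
    the threshold yields only 1/ratio dB of output above the threshold.
    """
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

# With a -20 dB threshold and a 2:1 ratio, an input at -10 dB
# (10 dB over the threshold) comes out at -15 dB (5 dB over it).
assert compressed_level_db(-10.0, -20.0, 2.0) == -15.0
```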
- FIG. 5 depicts how the level of a track is modified if different compression ratios are applied.
- FIG. 5 includes two regions, separated by the compression threshold L th . Below the compression threshold (i.e., to the left of L th in FIG. 5 ), the ratio of the increase in the gain output level of the track to the increase in the gain input level of the track is roughly 1:1 (as indicated by curve 302 ), as no compression is applied.
- When the input level exceeds the compression threshold L th, however, compression is performed, and the ratio of the increase in the gain output level of the track to the increase in the gain input level of the track is less than 1:1, as compression is applied and the dynamic range of the track is decreased.
- the extent to which the gain output level is compressed depends on the compression ratio.
- curves labeled 304 A , 304 B , 304 C , and 304 D represent four successively higher compression ratios being applied.
- For example, curve 304 A may be associated with a 2:1 compression ratio,
- curve 304 B may be associated with a 5:1 compression ratio,
- curve 304 C may be associated with a 10:1 compression ratio, and
- curve 304 D may be associated with an ∞:1 compression ratio (so that a compressor which applies compression corresponding to curve 304 D behaves essentially as a limiter).
- Two other compression parameters are the attack time and the release time. The attack time is the period which starts when compression is applied and ends when the compression ratio is reached.
- The release time is the period which starts when the audio level falls below the compression threshold and ends when the ratio between the output level and input level of the signal is 1:1.
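- Attack and release times are commonly realized as smoothing time constants applied to the gain-reduction signal. The sketch below shows one conventional one-pole formulation, offered as an assumption about how such parameters might be applied; the patent does not specify an implementation.

```python
import math

def smooth_gain(target_gains_db, sample_rate, attack_ms, release_ms):
    """Smooth per-sample gain-reduction targets (in dB, typically <= 0) with separate
    attack and release time constants.

    The attack coefficient is used while gain reduction is increasing (the compressor
    is clamping down); the release coefficient is used while it is recovering.
    """
    attack_coeff = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release_coeff = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    smoothed, current = [], 0.0
    for target in target_gains_db:
        coeff = attack_coeff if target < current else release_coeff
        current = coeff * current + (1.0 - coeff) * target
        smoothed.append(current)
    return smoothed
```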
- tuning compression parameters to produce pleasing audio can be cumbersome and time-consuming, as it often involves multiple trial-and-error iterations before a satisfying output is produced.
- tuning the attack time parameter often involves finding the right balance between the duration of a drum kick sound and that of a guitar sound. Selecting too long an attack time may result in an overly extended guitar sound, and selecting too short an attack time can “choke” the sound produced by the kick drum.
- Some embodiments of the invention are directed to automatically applying compression to a track or multi-track mix.
- the application of compression may involve tuning one or more compression parameters based at least in part on the spectral content and/or dynamic characteristics of the track or mix, so as to produce clear and balanced audio without affecting its character.
- the compression threshold and/or compression ratio may be automatically set based upon one or more characteristics of peaks in the track or mix, such as the presence, amplitude, duration, and/or regularity of peaks.
- the compression threshold and/or compression ratio may be automatically set based at least in part upon the spectral bandwidth(s) in which peaks occur in a track or mix.
- the compression threshold and/or compression ratio may be automatically set based at least in part upon the ratio between the power associated with one or more peaks and the average power of the track or mix, or between the power associated with one or more peaks and the average power of portions of the track or mix which do not include the peak(s). Any suitable information, which may or may not relate to peaks, may be used to automatically set the compression threshold and/or compression ratio for a track or mix.
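- One illustrative heuristic in the spirit of the passage above is to derive a threshold and ratio from the track's crest factor, i.e., its peak level relative to its RMS level. The constants and the mapping below are assumptions chosen for demonstration, not values from the patent.

```python
import numpy as np

def suggest_compression(samples, base_threshold_db=-18.0):
    """Heuristic, illustrative only: derive a compression threshold and ratio from peak statistics.

    The crest factor (peak over RMS, in dB) is used as a rough proxy for how prominent
    the peaks are; larger crest factors lead to a lower threshold and a higher ratio.
    """
    samples = np.asarray(samples, dtype=float)
    rms = np.sqrt(np.mean(samples ** 2)) + 1e-12
    peak = np.max(np.abs(samples)) + 1e-12
    crest_db = 20.0 * np.log10(peak / rms)

    threshold_db = base_threshold_db - 0.25 * crest_db   # more peaky -> lower threshold
    ratio = min(10.0, 1.0 + crest_db / 4.0)              # more peaky -> stronger ratio
    return threshold_db, ratio
```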
- the attack time and/or release time may be automatically set based at least in part upon one or more dynamic characteristics of a track or mix, such as the duration and/or amplitude of “tails” generated by a particular sound source (e.g., a kick drum hit, a guitar strum, etc.), the ratio between the durations and/or amplitudes of tails generated by different sound sources, and/or the frequency of tails (e.g., how many occur in a given predetermined time interval).
- any suitable information (which may or may not relate to tails generated by a sound source) may be used to automatically set an attack time and/or release time for a track or mix.
- the manner in which compression is automatically applied may be governed by one or more rules.
- For example, a rule may specify admissible ranges for a compression threshold, compression ratio, attack time, and/or release time, to ensure that compression which is automatically applied does not alter the fundamental character of a track or mix.
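- Such a rule could be enforced by clamping whatever parameters the automatic analysis produces to admissible ranges, as in the sketch below; the ranges shown are hypothetical, since the patent gives no concrete limits.

```python
# Hypothetical admissible ranges for automatically derived compression parameters.
ADMISSIBLE = {
    "threshold_db": (-40.0, -6.0),
    "ratio":        (1.5, 10.0),
    "attack_ms":    (1.0, 100.0),
    "release_ms":   (20.0, 1000.0),
}

def clamp_parameters(params):
    """Clamp each automatically derived parameter to its admissible range,
    one way of keeping automatic settings from altering a track's character."""
    return {name: min(max(value, ADMISSIBLE[name][0]), ADMISSIBLE[name][1])
            for name, value in params.items()}

# Example: clamp_parameters({"threshold_db": -50.0, "ratio": 12.0})
# -> {"threshold_db": -40.0, "ratio": 10.0}
```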
- Any suitable rule(s) may govern the automatic application of compression to a track or mix, in any suitable way.
- act 112 may involve automatically specifying one or more cross-overs.
- a cross-over is defined so as to delimit a portion of the frequency spectrum so that different frequency bands may have compression applied differently.
- FIG. 6 shows a frequency spectrum which is segmented using cross-overs.
- two cross-overs are applied, at frequencies f 1 and f 2 , thus creating three bands (i.e., bands A, B and C) in which compression may be applied differently.
- Applying compression differently in different frequency bands is known as multi-band compression, and may be performed when performing single-band compression tends to negatively affect the relationships between different sounds in a track or mix.
- some embodiments of the present invention are directed to automatically determining the manner in which multi-band compression is applied. This determination may, for example, be based at least in part upon the spectral and/or dynamic characteristics of a track or mix.
- the position and number of cross-overs, and the compression threshold and/or ratio to be applied in each of multiple frequency bands may be automatically identified so as to balance the level of a track across the entire frequency spectrum. For example, if frequent and large peaks occur within a particular frequency range, then a cross-over may be positioned so as to isolate these peaks, and compression within the isolated area may employ a low compression threshold and/or high compression ratio.
- the position and number of cross-overs, and the attack time and release time to be applied in each of multiple frequency bands may be automatically identified so as to balance the duration of sounds across the frequency spectrum. For example, if high-frequency sounds tend to exhibit long tails and low-frequency sounds tend to exhibit short tails, then one or more cross-overs may be positioned to isolate the bands in which the short and long tails tend to occur, the attack time in the low-frequency band may be increased, and the attack time in the high-frequency band may be decreased.
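- A cross-over can be realized by splitting the signal into complementary frequency bands and compressing each band separately. The sketch below uses fourth-order Butterworth filters from SciPy as an assumed cross-over design; the patent does not mandate any particular filter type, and the cut-off values in the usage example are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(samples, sample_rate, crossovers_hz):
    """Split a track into frequency bands at the given cross-over frequencies.

    For N cross-overs this returns N+1 bands (low, mid..., high), each of which can
    then be compressed with its own threshold, ratio, attack, and release.
    """
    edges = [0.0] + sorted(crossovers_hz) + [sample_rate / 2.0]
    bands = []
    for low, high in zip(edges[:-1], edges[1:]):
        if low <= 0.0:
            sos = butter(4, high, btype="lowpass", fs=sample_rate, output="sos")
        elif high >= sample_rate / 2.0:
            sos = butter(4, low, btype="highpass", fs=sample_rate, output="sos")
        else:
            sos = butter(4, [low, high], btype="bandpass", fs=sample_rate, output="sos")
        bands.append(sosfilt(sos, samples))
    return bands

# Two cross-overs at f1 and f2 (as in FIG. 6) yield three bands A, B, and C.
# band_a, band_b, band_c = split_bands(track, 44_100, [250.0, 4_000.0])
```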
- the manner in which multi-band compression is automatically applied may be governed by one or more rules.
- a rule may provide a minimum and/or maximum number of cross-overs that may be applied to a track or mix. Any suitable rule(s) may govern the automatic application of multi-band compression to a track or mix, in any suitable way.
- the manner in which different audio production operations are applied may be governed by one or more rules.
- a rule may provide a sequence in which certain production operations are performed, such as to specify that automatic sound equalization is to be performed before automatic compression.
- Any suitable rule(s) may govern the performance of different audio production operations, in any suitable way.
- Representative process 100 then proceeds to act 114, wherein a user is allowed to modify any of the settings applied in act 106 and/or act 112.
- audio production is ultimately a creative task in which a user seeks to express a particular perspective, convey a particular emotion, create a particular mood, etc.
- Thus, while some embodiments of the invention may provide features designed to improve the overall efficiency of the audio production process, and may enhance the quality of the output of that process, some embodiments may also provide features designed to ensure that the user's creativity is not abridged.
- settings which are automatically determined for a track or mix may evolve over time.
- the settings which are automatically determined for a given track at a first time may be different than the settings which are automatically determined for the track at a second time.
- Any differences in the way that settings are automatically determined over time may, for example, be the result of analyzing how one or more users employ an audio production tool providing the functionality described herein, how one or more users modifies one or more settings subsequent to the setting(s) being automatically determined, and/or based upon any other suitable information.
- one or more machine learning algorithms may process information on user habits over time to change the way in which certain settings are automatically determined.
- FIG. 7 depicts a general purpose computing device, in the form of computer 910 , which may be used to implement certain aspects of the invention.
- Components of computer 910 may include, but are not limited to, a processing unit 920, a system memory 930, and a system bus 921 that couples various system components including the system memory to the processing unit 920.
- the system bus 921 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
- bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
- Computer 910 typically includes a variety of computer readable media.
- Computer readable media can be any available media that can be accessed by computer 910 and includes both volatile and nonvolatile media, removable and non-removable media.
- Computer readable media may comprise computer storage media and communication media.
- Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other one or more media which may be used to store the desired information and may be accessed by computer 910 .
- Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
- modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- By way of example, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
- the system memory 930 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 931 and random access memory (RAM) 932 .
- A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 910 (e.g., during start-up), is typically stored in ROM 931.
- RAM 932 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 920 .
- FIG. 7 illustrates operating system 934 , application programs 935 , other program modules 939 , and program data 937 .
- the computer 910 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
- FIG. 7 illustrates a hard disk drive 941 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 951 that reads from or writes to a removable, nonvolatile magnetic disk 952 , and an optical disk drive 955 that reads from or writes to a removable, nonvolatile optical disk 959 such as a CD ROM or other optical media.
- Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary computing system include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
- The hard disk drive 941 is typically connected to the system bus 921 through a non-removable memory interface such as interface 940, and magnetic disk drive 951 and optical disk drive 955 are typically connected to the system bus 921 by a removable memory interface, such as interface 950.
- In FIG. 7, for example, hard disk drive 941 is illustrated as storing operating system 944, application programs 945, other program modules 949, and program data 947.
- Note that these components can either be the same as or different from operating system 934, application programs 935, other program modules 939, and program data 937.
- Operating system 944 , application programs 945 , other program modules 949 , and program data 947 are given different numbers here to illustrate that, at a minimum, they are different copies.
- a user may enter commands and information into the computer 910 through input devices such as a keyboard 992 and pointing device 991 , commonly referred to as a mouse, trackball or touch pad.
- Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
- These and other input devices are often connected to the processing unit 920 through a user input interface 990 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
- a monitor 991 or other type of display device is also connected to the system bus 921 via an interface, such as a video interface 990 .
- Computers may also include other peripheral output devices such as speakers 997 and printer 999, which may be connected through an output peripheral interface 995.
- the computer 910 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 980 .
- the remote computer 980 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 910 , although only a memory storage device 981 has been illustrated in FIG. 7 .
- the logical connections depicted in FIG. 7 include a local area network (LAN) 971 and a wide area network (WAN) 973 , but may also include other networks.
- Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
- When used in a LAN networking environment, the computer 910 is connected to the LAN 971 through a network interface or adapter 970.
- When used in a WAN networking environment, the computer 910 typically includes a modem 972 or other means for establishing communications over the WAN 973, such as the Internet.
- The modem 972, which may be internal or external, may be connected to the system bus 921 via the user input interface 990, or other appropriate mechanism.
- program modules depicted relative to the computer 910 may be stored in the remote memory storage device.
- FIG. 7 illustrates remote application programs 985 as residing on memory device 981 . It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
- Embodiments of the invention may be embodied as a computer readable storage medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CD), optical discs, digital video disks (DVD), magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above.
- a computer readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form.
- Such a computer readable storage medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above.
- the term “computer-readable storage medium” encompasses only a tangible machine, mechanism or device from which a computer may read information.
- the invention may be embodied as a computer readable medium other than a computer-readable storage medium. Examples of computer readable media which are not computer readable storage media include transitory media, like propagating signals.
- the invention may be embodied as a method, of which various examples have been described.
- the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include different (e.g., more or less) acts than those which are described, and/or which may involve performing some acts simultaneously, even though the acts are shown as being performed sequentially in the embodiments specifically described above.
Abstract
Description
- This application is a continuation of commonly assigned International Patent Application No. PCT/US2018/034340, filed May 24, 2018, entitled “Systems and Methods for Automatically Generating Enhanced Audio Output,” which claims priority to commonly assigned U.S. Provisional Application Ser. No. 62/516,605, filed Jun. 7, 2017, entitled “Systems and Methods for Automatically Generating Enhanced Audio Output.” The entirety of each of the documents listed above is incorporated herein by reference.
- Audio production tools exist that enable users to produce high-quality audio. For example, some audio production tools enable users to record sound produced by one or more sound sources (e.g., vocals and/or speech captured by a microphone, music played with an instrument, etc.), process the audio (e.g., to master, mix, design, and/or otherwise manipulate the audio), and/or control its playback. Audio production tools may be used to produce audio comprising music, speech, sound effects, and/or other sounds.
- Some computer-implemented audio production tools provide a graphical user interface with which users may complete various production tasks on an audio recording. For example, some tools may receive audio input and generate one or more digital representations of the input, which a user may manipulate using the graphical user interface to obtain audio output having desired characteristics.
- A user may employ an audio production tool to perform any of numerous production tasks. For example, many audio production tools enable a user to perform sound equalization, which is a technique used to alter a sound recording by applying filters to sound in one or more frequency ranges, so as to boost or attenuate spectral portions of a recording. Many audio production tools also enable users to perform sound compression, which is a technique for attenuating loud sounds so that other sounds are more easily perceived by a listener.
- Various aspects and embodiments of the invention are described below with reference to the following figures. It should be appreciated that the figures are not necessarily drawn to scale. Items appearing in multiple figures are indicated by the same reference number in all the figures in which they appear.
-
FIG. 1 is a flowchart illustrating a representative audio production process, in accordance with some embodiments of the invention; -
FIGS. 2A-2C are plots depicting representative predefined templates, in accordance with some embodiments of the invention; -
FIG. 3 is a tree diagram depicting a representative hierarchy of predefined templates, presets and modes, according to some embodiments of the invention; -
FIG. 4A is a flowchart depicting a representative process for automatically equalizing sound in an audio recording, according to some embodiments of the invention; -
FIG. 4B is a plot depicting representative output of audio equalization, according to some embodiments of the invention; -
FIG. 5 is a plot depicting representative functions for audio compression, according to some embodiments of the invention; -
FIG. 6 is a plot depicting representative cross-overs and frequency bands, according to some embodiments of the invention; and -
FIG. 7 is a block diagram illustrating a representative computer system which may be used to implement certain aspects of the invention. - Some conventional audio production tools are capable of automatically recognizing the source of sound included in an audio track. For example, techniques are known for automatically recognizing whether the sound included in a track was produced by a particular instrument, by vocals, and/or by one or more other sources.
- Some conventional audio production tools may also be capable of applying a predefined “template” of audio production settings based upon a recognized sound source. These settings may, for example, define sound equalization parameters (e.g., the application of one or more digital filters to boost or attenuate sound in certain frequency ranges) to produce audio output which is generally considered to be pleasing to a listener. For example, some conventional audio production tools may apply one collection of settings to a track that is recognized as including sound produced by a guitar, another collection of settings to a track recognized as including sound produced by drums, another collection of settings to a track that is recognized as including sound produced by vocals, and so forth.
- The Assignee has recognized that a user may have any of numerous goals for a track that includes sound from a particular source, and that the settings defined by a “one size fits all” predefined template may not serve those goals. For example, a user may wish to achieve a particular mood, sound quality and/or other characteristic for a guitar track that one or more settings in a predefined guitar template does not allow the user to achieve. As such, some embodiments of the invention may enable a user to modify the manner in which one or more settings specified by a predefined template for a particular sound source are applied. For example, some embodiments may enable a user to specify that the amplitude at which one or more digital filters specified by a predefined template is applied is varied, by selecting a “preset” for the template. Further, some embodiments may enable the user to define the extent to which the amplitude for one or more digital filters is varied, by selecting a “mode” for a preset. As a result, the user may have greater control over the settings which are applied to a track than conventional audio production tools afford. The user may therefore benefit from the time savings that come with having access to a collection of predefined settings for a particular sound source, without being restricted to a “one size fits all” scheme for the sound source.
- Additionally, some embodiments of the invention may allow the user to switch from using one collection of settings (e.g., defined by a predefined template, as modified per the user's selection of a preset and/or mode) to another, so that he/she may “try out” different settings before deciding on a particular collection that allows him/her to achieve the goals he/she has in mind for a track.
- The Assignee has also recognized that many users may expend significant time and effort defining settings for a track, regardless of whether a predefined template of settings is applied. Some embodiments of the invention are directed to reducing the amount of time and effort a user expends, and/or enabling the user to produce higher quality music than he/she may have been capable of producing on his/her own. For example, some embodiments may automatically determine one or more settings for a track based at least in part upon an analysis of the spectral and/or dynamic content of the track. The settings which are automatically determined may take any of numerous forms. For example, some embodiments may automatically perform sound equalization by applying one or more digital filters to a track, and/or defining the frequency range(s) in which the filter(s) are applied. Some embodiments may automatically apply dynamic range compression to a track, so as to attenuate loud sounds in the track, without diminishing the track's overall character. Some embodiments may automatically define the manner in which compression is applied in multiple sub-bands of the audible spectrum, such as by intelligently positioning “cross-overs” between the sub-bands so as to promote overall sound quality. Any suitable setting(s) may be automatically determined based at least in part upon the spectral content of a track, as the invention is not limited in this respect.
- The settings for a track may be automatically determined in any suitable way. For example, one or more heuristics, algorithms, and/or other processing technique(s) may be used to determine how various spectral characteristics of a track may influence the settings for the track. In this respect, settings may be automatically determined for a track so as to achieve any of numerous (e.g., artistic) goals. For example, some embodiments may automatically determine certain settings to bring to the forefront certain elements of the natural character of the sound in a track, to enhance the track's overall balance and/or clarity, and/or to achieve any of numerous other objectives.
- Although automatically determining the settings for a track may save the user considerable time and effort, and/or enable him/her to produce sound of a quality which he/she may not otherwise have been capable of producing, the Assignee has recognized that audio production is ultimately a creative endeavor by which a user may seek to express his/her own unique “voice.” Thus, some embodiments of the invention may enable a user to modify any of the settings which are automatically applied to a track. As such, some embodiments of the invention may enable the user to reap the benefits of increased audio production efficiency and/or enhanced audio quality, to the extent he/she deems appropriate, while still producing audio output that suits his/her unique tastes and objectives.
-
FIG. 1 depicts a representative process 100 for automatically generating enhanced audio. In some embodiments, representative process 100 may be performed via execution of software, by a system which includes an audio recording system, digital audio workstation, personal computer, and/or portable device (e.g., a tablet, smartphone, gaming console, and/or other suitable portable device) that presents a graphical user interface which a user may employ to invoke certain functions. However, it should be appreciated that representative process 100 is not limited to being performed via execution of software, by any particular component(s), and that embodiments of the invention may be implemented using any suitable combination of hardware and/or software. -
Representative process 100 begins at act 102, wherein one or more tracks are received. Each track received in act 102 may, for example, include sound which is produced by a particular sound source, such as a musical instrument, microphone, computer system, and/or any other suitable source(s). Of course, a track need not include only sound from a single source, as any suitable number of sound sources may be represented in a particular track. If act 102 comprises receiving multiple tracks, then the tracks may collectively form a multi-track recording. In some embodiments, each track received in act 102 may comprise a digital representation of a time-delimited audio recording. As such, act 102 may comprise storing each track in computer-readable memory. - In
act 104, each track received in act 102 is analyzed to identify the sound source(s) represented in the track. Those skilled in the art will recognize that any of numerous techniques, whether now known or later developed, may be used to identify the sound source(s) represented in a track. These identification techniques are not considered by the Assignee to be a part of the invention, and so they will not be described in further detail here. - In
act 106, a predefined template of settings is selected for each track received in act 102, based at least in part on the sound source(s) identified in act 104. In representative process 100, the settings which are specified by the selected predefined template may be varied based upon the user's selection of a “preset” and “mode” (described in further detail below), which are received in act 105. In representative process 100, acts 104 and 105 may occur concurrently or at different times. - To illustrate the manner in which one or more settings specified by a predefined template may be modified based upon the user's selection of a preset and mode, three representative predefined templates are symbolically depicted in
FIGS. 2A-2C. Specifically, each of FIGS. 2A-2C shows a two-dimensional plot in which frequency is represented on the x-axis and amplitude is represented on the y-axis. Each plot shows a series of nodes, with the placement of each node indicating the frequency and amplitude of a corresponding digital filter. The various nodes may therefore be considered the settings of a digital equalizer. FIG. 2A depicts a collection of equalizer settings predefined for a first sound source (i.e., nodes 212 1 through 212 10), FIG. 2B depicts a collection of settings predefined for a second sound source (i.e., nodes 222 1 through 222 11), and FIG. 2C depicts a collection of settings predefined for a third sound source (i.e., nodes 232 1 through 232 9). It should be appreciated that although three representative predefined templates are shown in FIGS. 2A-2C, any suitable number of predefined templates may be used to process audio input. It should also be appreciated that a predefined template is not limited to having the same number of settings as any of those shown, as a predefined template may include any suitable number of settings. - In the examples shown, the template represented in
FIG. 2A for the first sound source boosts high frequencies and cuts low frequencies, the template represented in FIG. 2B for the second sound source boosts high frequencies and cuts some middle frequencies, and the template shown in FIG. 2C for the third sound source boosts some middle frequencies and cuts high frequencies. Of course, a template that is predefined for a particular sound source may include settings that are designed to achieve any suitable frequency response. It can be seen in FIGS. 2A-2C that the line segments extending between the nodes in each template create a “shape” for the template. - Some embodiments of the invention enable a user to modify the frequency response associated with a predefined template by specifying a “preset” which varies the amplitude at which one or more of the filters shown in
FIGS. 2A-2C is applied. A preset may be established so as to modify a given predefined template in any suitable way. As an example, some embodiments may allow the user to select a “broadband clarity” preset to enhance sound clarity across the entire audible spectrum, a “warm and open” preset to make low frequencies more pronounced, an “upfront midrange” preset to make certain midrange frequencies more pronounced, and/or any other suitable preset. Any suitable number of presets may be made available to a user. - It should be appreciated that the manner in which a particular preset alters the frequency response of one predefined template may be different than the way that the same preset alters the frequency response of another predefined template. For example, selecting a “warm and open” preset may cause the amplitude at which three digital filters shown in
FIG. 2A are applied to be modified, but may cause the amplitude at which five digital filters shown in FIG. 2B are applied to be modified. A preset may vary a given predefined template in any suitable fashion. - Some embodiments of the invention may also enable the user to select a mode. A mode may define the extent to which a selected preset varies the amplitude at which one or more digital filters is applied. For example, some embodiments allow the user to select a “subtle” mode in which the amplitudes of one or more of the digital filters defined by a template are increased or decreased by no more than a first amount (e.g., 0.5 dB), a “medium” mode wherein the amplitudes of one or more of the digital filters are increased or decreased by no more than a second amount which is larger than the first amount (e.g., 2 dB), and an “aggressive” mode wherein the amplitudes of one or more of the digital filters in a template are increased or decreased by an amount that exceeds the second amount. Of course, any suitable number of modes may be defined, and each mode may be designed to achieve any suitable variation on the amplitude at which one or more digital filters is applied.
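By way of a non-limiting illustration, the sketch below (in Python) models a predefined template as a list of frequency/gain nodes, a preset as a set of per-node gain offsets, and a mode as a cap on how far any offset may move a node. The template nodes, preset names, and dB values are assumptions introduced solely for illustration; they are not the settings of any embodiment described above.

```python
# Illustrative sketch only: the template nodes, preset offsets, and mode caps
# below are hypothetical values, not those of any particular embodiment.

# A predefined template: (frequency in Hz, filter gain in dB) per node.
GUITAR_TEMPLATE = [(80, -3.0), (250, -1.0), (1200, 1.5), (3500, 2.5), (9000, 1.0)]

# A preset nominally offsets the gain of some nodes (keyed by frequency).
PRESETS = {
    "warm and open":    {80: 2.0, 250: 1.5, 9000: -1.0},
    "upfront midrange": {1200: 2.0, 3500: 1.0},
}

# A mode caps how far the preset may move any node, in dB.
MODE_CAPS = {"subtle": 0.5, "medium": 2.0, "aggressive": 4.0}


def apply_preset(template, preset_name, mode_name):
    """Return a new node list with preset offsets applied, limited by the mode cap."""
    offsets = PRESETS[preset_name]
    cap = MODE_CAPS[mode_name]
    adjusted = []
    for freq, gain in template:
        delta = offsets.get(freq, 0.0)
        # Clamp the preset's offset to the range allowed by the selected mode.
        delta = max(-cap, min(cap, delta))
        adjusted.append((freq, gain + delta))
    return adjusted


if __name__ == "__main__":
    for f, g in apply_preset(GUITAR_TEMPLATE, "warm and open", "subtle"):
        print(f"{f:>6} Hz: {g:+.1f} dB")
```

In this sketch, switching the mode from “subtle” to “aggressive” simply widens the clamp, so the same preset produces a more pronounced departure from the template.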
- A representative scheme 300 of predefined templates, presets and modes is shown in
FIG. 3 . In representative scheme 300, predefined templates, presets and modes are arranged in a hierarchy, with predefined templates at the highest level in the hierarchy, followed by presets and modes at successively lower levels in the hierarchy. For each predefined template, there may be one or more presets, and for each preset there may be one or more modes. Thus, in representative scheme 300, there are two predefined templates A and B, each corresponding to a particular sound source. For each predefined template there are two presets A and B. For each preset there are two modes A and B. Of course, any suitable number of predefined templates, presets, and/or modes may be employed. Some templates may have a different number and type of associated presets than other templates, and some presets may have a different number and type of associated modes than other presets. In some embodiments, templates, presets and modes may not be arranged in a hierarchy. If arranged in a hierarchy, the hierarchy may include any suitable number of levels. - Referring again to
FIG. 1, act 106 (FIG. 1) involves selecting, for each track received in act 102, a predefined template based upon the sound source(s) identified for the track in act 104, as modified based upon a user's selection of a preset and/or mode received in act 105. As an illustrative example, if the result of act 104 is the identification of a sound source which is associated with predefined template A shown in FIG. 3, and act 105 involves receiving a user's selection of a preset B and mode A, then act 106 may involve applying the template/preset/mode combination shown at 301 in FIG. 3. - It should be appreciated that the settings associated with a predefined template, preset or mode need not be static, or uniform across all users. For example, a predefined template may include settings which are established to suit the preferences of a particular user, which may vary over time. These preferences may be determined by gathering information on how the user commonly applies filters to tracks that include particular sound sources over time. A predefined template may also, or alternatively, include settings which are established to suit the preferences of multiple users, which may also vary over time. These preferences may be determined by gathering information on how the users apply filters over time to tracks that include particular sound sources. Similarly, information on how one or more users applies a particular preset and/or mode may cause the manner in which a preset and/or mode modifies a setting specified by a template to vary over time. Modifying a template, preset and/or mode over time may be accomplished in any suitable fashion. For example, one or more machine learning algorithms may process information on preferences exhibited by one or more users over time to determine the ways in which a template, preset or mode are to be modified.
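One possible, purely illustrative encoding of a hierarchy such as scheme 300 is a nested mapping keyed by identified sound source, then preset, then mode, which a selection step in the spirit of act 106 can simply look up. The source names, preset names, and cap values below are hypothetical.

```python
# Hypothetical encoding of a template/preset/mode hierarchy in the spirit of
# scheme 300; the names and nesting are illustrative assumptions only.
SCHEME = {
    "guitar": {                                              # predefined template A
        "warm and open":     {"subtle": 0.5, "medium": 2.0},  # preset -> mode caps (dB)
        "broadband clarity": {"subtle": 0.5, "medium": 2.0},
    },
    "vocals": {                                              # predefined template B
        "upfront midrange":  {"subtle": 0.5, "aggressive": 4.0},
    },
}


def select(source, preset, mode):
    """Resolve one template/preset/mode combination from the hierarchy."""
    try:
        cap = SCHEME[source][preset][mode]
    except KeyError as exc:
        raise ValueError(f"no such template/preset/mode combination: {exc}") from None
    return {"template": source, "preset": preset, "mode": mode, "cap_db": cap}


print(select("guitar", "warm and open", "subtle"))
```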
- At the completion of
act 106, representative process 100 proceeds to act 110, wherein the spectral and/or dynamic (time-domain) content of each track received in act 102 is automatically analyzed. The spectral and/or dynamic content of a track may be automatically analyzed in any of numerous ways, to identify any of numerous spectral and/or dynamic characteristics. For example, act 110 may involve executing software which takes as input a digital representation of a track, and applies one or more encoded algorithms to identify characteristics such as the frequency range(s) in which a track exceeds a particular threshold power level, a relationship between the power density in one frequency range and the power density in another frequency range, the frequency range(s) in which the power density is below a certain threshold, the presence and/or amplitude of peaks, and/or any of numerous other spectral characteristics of a track. - In
act 112, one or more settings are automatically determined for each track and applied, based at least in part upon the spectral and/or dynamic characteristics of the track. The settings which are determined and applied inact 112 may be designed to achieve any of numerous objectives, such as enhancing certain characteristics of the sound in the track, making one or more sounds in the track more or less pronounced, enhancing the track's balance and/or clarity, etc. Various processing techniques may be used to achieve these objectives, including but not limited to sound equalization, single-band compression, multi-band compression, limiting and panning. Some representative techniques for automatically determining and applying the settings for a track are described in more detail in the sections that follow. - In some embodiments of the invention, act 112 may involve automatically performing sound equalization for a track. A
representative process 400 for performing automatic sound equalization is shown in FIG. 4A. Representative process 400 begins in act 402, wherein the track's spectral content (i.e., identified in act 110 (FIG. 1)) is compared to a spectral content model for the sound source(s) in the track, to determine the ways in which the track's spectral content varies from the model. Any suitable spectral content model may be used, and a spectral content model may be defined in any suitable way. - In some embodiments, the shape associated with the predefined template identified in
act 106 for the sound source(s) included in the track may be used as a spectral content model. In this respect, it should be appreciated that although the predefined template was applied to the track in act 106, and although the predefined template may include digital filters applied in corresponding frequency ranges (so that the track should conform to some extent to the shape associated with the predefined template at the completion of act 106), the predefined template may not include digital filters for all of the frequency ranges in which sound is present in the track. To illustrate, consider the predefined template shown in FIG. 2C, which includes filters designed to boost some middle frequencies and cut high frequencies. A particular track may include sound in a frequency range in which the predefined template does not include a digital filter, such as sound in the low frequencies. As a result, although the predefined template was applied to the track in act 106, the spectral content of the track may not fully conform to the shape associated with the predefined template. - Of course, it should be appreciated that the invention is not limited to employing a predefined template as the spectral content model in
act 402. Any suitable spectral content model(s) may be compared with a track's spectral content in act 402. - In
representative process 400, the result of the comparison in act 402 is an identification of one or more frequency ranges in which the track's spectral content varies from the model, and the manner and extent to which the content varies from the model. In act 404, then, one or more digital filters is applied in the identified frequency range(s), so as to reduce or eliminate this variance. An illustrative example is shown in FIG. 4B. Specifically, FIG. 4B depicts a portion of the predefined template shown in FIG. 2A, in a frequency range which extends from a frequency lower than f1 (the frequency at which node 212 1 is placed) to f2 (the frequency at which node 212 2 is placed). Line segment 425 extends between nodes 212 1 and 212 2, and thus comprises a portion of the “shape” of the predefined template shown in FIG. 2A. As no node is placed at a frequency lower than f1 in the predefined template shown in FIG. 2A, no shape is explicitly formed at frequencies lower than f1. Dotted line segment 420 depicts where the shape may be located at frequencies lower than f1 if line segment 425 continued along the same path as between f1 and f2. It can be seen that dotted line segment 420 crosses the y-axis at amplitude a2. -
FIG. 4B also depicts the spectral content of a representative track 410 in the frequency range shown. A threshold 415 resides at frequencies less than or equal to f1, at amplitude a2. It can be seen in FIG. 4B that the power of the track at frequencies lower than f1 exceeds threshold 415. As a result, act 404 may include placing one or more additional digital filters (not shown in FIG. 4B) at frequencies lower than f1. As an example, one or more digital filters may each be placed at a frequency lower than f1 at an amplitude which approximates a2, at an amplitude at which dotted line segment 420 intersects the frequency, and/or at any other suitable amplitude. Any suitable number of digital filters may be applied at any suitable amplitude and frequency, as the invention is not limited in this respect. - Referring again to
FIG. 4A, at the completion of act 404, representative process 400 proceeds to act 406, wherein the amplitude and/or frequency at which one or more digital filters is applied to the track is modified. Act 406 may involve modifying the amplitude and/or frequency of a digital filter applied in act 404, and/or modifying the amplitude and/or frequency of a digital filter applied as part of a predefined template in act 106 (FIG. 1). This modification may, for example, be based upon predefined heuristics or rules, be based upon information which is dynamically determined (e.g., the spectral content of the track), and/or be defined in any other suitable way. - As an example, a predefined heuristic may provide an optimal ratio between the bandwidth in which “boost” filters are applied and the bandwidth in which “cut” filters are applied. As a result, act 406 may involve modifying the bandwidths in which “boost” and “cut” filters are applied so that the optimal ratio is achieved. The extent to which any one or more of the bandwidths in which filters are applied is adjusted to achieve the optimal ratio may be defined based at least in part on the spectral content of the track, the sound source(s) included in the track, and/or any other suitable characteristic(s) of the track.
- As another example, a predefined heuristic may provide that a track with excessive content in the high frequencies sounds too “cold.” Thus, if a track includes an amount of spectral content in the high frequencies which exceeds a predefined threshold, then act 406 may involve modifying the frequency and/or amplitude at which one or more digital filters is applied, so as to make the track sound “warmer” by making spectral content in the middle and/or lower frequencies more prominent. It should be appreciated that the frequencies which constitute “high” frequencies, and the threshold defining whether an amount of content in those frequencies is excessive, may each be defined in any suitable fashion.
- As another example, a predefined heuristic may provide for modifying the frequency and/or amplitude of one or more digital filters based upon a particular sound system which is to be used to reproduce the track, the environment in which the track is to be reproduced, and/or any other suitable information. For example, if a particular loudspeaker tends to suppress the low frequencies when used in a particular setting, one or more digital filters may be modified so as to boost the content of a track in the low frequencies and/or suppress the content in other frequencies. It should be appreciated that the frequencies which constitute “low” frequencies in this example may be defined in any suitable fashion.
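By way of a non-limiting illustration, the following sketch estimates per-band spectral levels with an FFT, compares them to a model “shape”, and proposes a cut filter wherever the track exceeds the model by more than a tolerance, in the spirit of the comparison and filter placement described above for acts 402 and 404. The band edges, the flat demonstration model, and the 3 dB tolerance are assumptions made for illustration only.

```python
# Sketch of an act 402/404 style comparison under assumed numbers: the band
# edges, model "shape", and tolerance are illustrative, not prescribed values.
import numpy as np


def band_levels_db(signal, sr, band_edges):
    """Average spectral level (dB) of the signal in each [lo, hi) frequency band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    levels = []
    for lo, hi in band_edges:
        band = spectrum[(freqs >= lo) & (freqs < hi)]
        power = band.mean() if band.size else 1e-12
        levels.append(10.0 * np.log10(power + 1e-12))
    return np.array(levels)


def propose_filters(track_db, model_db, band_edges, tol_db=3.0):
    """Suggest a cut (negative gain) wherever the track exceeds the model shape."""
    suggestions = []
    for (lo, hi), t, m in zip(band_edges, track_db, model_db):
        if t - m > tol_db:
            center = (lo + hi) / 2.0
            suggestions.append((center, round(float(m - t), 1)))  # cut at the band centre
    return suggestions


if __name__ == "__main__":
    sr = 44100
    rng = np.random.default_rng(0)
    t = np.arange(sr) / sr
    # Synthetic "track": strong low-frequency tone, some mid content, a noise floor.
    track = (np.sin(2 * np.pi * 80 * t)
             + 0.2 * np.sin(2 * np.pi * 1000 * t)
             + 0.01 * rng.standard_normal(sr))
    edges = [(20, 200), (200, 2000), (2000, 8000)]
    track_db = band_levels_db(track, sr, edges)
    model_db = track_db.mean() * np.ones(len(edges))   # flat model shape, for the demo
    print(propose_filters(track_db, model_db, edges))
```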
- In some embodiments, the introduction of one or more digital filters in
act 404, and/or the modification of the amplitude and/or frequency at which one or more digital filters is applied in act 406, may be governed by one or more rules. For example, a rule may provide a maximum extent to which a predefined template may be modified in acts 404 and/or 406, such as to preserve the fundamental character of a particular sound source with which the template is associated. As another example, a rule may specify that if the average power of a track in a particular frequency range over a particular time period exceeds a particular threshold, then at least one digital filter is to be applied. Any suitable rule(s) may govern the automatic performance of sound equalization on an audio track, in any suitable way. - In some embodiments of the invention, act 112 may involve automatically performing audio compression. In this respect, compression is an audio production technique in which loud sounds are attenuated, to an extent determined by one or more compression parameters. One of these parameters is the compression threshold, which is the gain level which a track must exceed in a frequency range to be attenuated. Another parameter is the compression ratio, which defines the extent to which sound that exceeds the compression threshold is attenuated. For example, if a 2:1 compression ratio is used, then sounds above the compression threshold will be attenuated by a factor of 2.
-
FIG. 5 depicts how the level of a track is modified if different compression ratios are applied. FIG. 5 includes two regions, separated by the compression threshold Lth. Below the compression threshold (i.e., to the left of Lth in FIG. 5), the ratio of the increase in the gain output level of the track to the increase in the gain input level of the track is roughly 1:1 (as indicated by curve 302), as no compression is applied. When the input level exceeds the compression threshold Lth, however, compression is performed, and the ratio of the increase in the gain output level of the track to the increase in the gain input level of the track is less than 1:1, as compression is applied and the dynamic range of the track is decreased. The extent to which the gain output level is compressed depends on the compression ratio. The curves labeled 304 A, 304 B, 304 C, and 304 D represent four successively higher compression ratios being applied. For example, curve 304 A may be associated with a 2:1 compression ratio, curve 304 B may be associated with a 5:1 compression ratio, curve 304 C may be associated with a 10:1 compression ratio, and curve 304 D may be associated with an ∞:1 compression ratio (so that a compressor which applies compression corresponding to curve 304 D behaves essentially as a limiter). - Other parameters often used in audio compression include the attack time and release time. The attack time is the period which starts when compression is applied and ends when the compression ratio is reached. The release time is the period which starts when the audio level falls below the compression threshold and ends when the ratio between the output level and input level of the signal is 1:1.
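The static behavior depicted in FIG. 5 can be summarized, purely as an illustration, by the small function below: below the threshold the output level tracks the input level 1:1, and above it the excess is divided by the compression ratio, with an infinite ratio behaving as a limiter. The threshold and ratio values used here are arbitrary examples, not settings taken from any embodiment.

```python
# Static compression curve in the spirit of FIG. 5. The threshold and ratios
# below are arbitrary illustrative values.
def compressed_level_db(input_db, threshold_db=-18.0, ratio=2.0):
    """Output level for a given input level: 1:1 below the threshold,
    excess divided by the ratio above it (an infinite ratio acts as a limiter)."""
    if input_db <= threshold_db:
        return input_db
    excess = input_db - threshold_db
    return threshold_db + (excess / ratio if ratio != float("inf") else 0.0)


if __name__ == "__main__":
    for ratio in (2.0, 5.0, 10.0, float("inf")):   # roughly the four curves 304 A-304 D
        out = [round(compressed_level_db(x, ratio=ratio), 1) for x in (-30, -18, -12, -6, 0)]
        print(f"{ratio!s:>5}:1 ->", out)
```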
- The Assignee has appreciated that while compression may improve sound clarity, suppress background noise, reduce sibilance, and/or boost low sounds without affecting others, it can also negatively affect the quality of a mix if not applied judiciously. The Assignee has also appreciated that tuning compression parameters to produce pleasing audio can be cumbersome and time-consuming, as it often involves multiple trial-and-error iterations before a satisfying output is produced. As an example, tuning the attack time parameter often involves finding the right balance between the duration of a drum kick sound and that of a guitar sound. Selecting too long an attack time may result in an overly extended guitar sound, and selecting too short an attack time can “choke” the sound produced by the kick drum.
- Some embodiments of the invention, then, are directed to automatically applying compression to a track or multi-track mix. The application of compression may involve tuning one or more compression parameters based at least in part on the spectral content and/or dynamic characteristics of the track or mix, so as to produce clear and balanced audio without affecting its character. For example, in some embodiments, the compression threshold and/or compression ratio may be automatically set based upon one or more characteristics of peaks in the track or mix, such as the presence, amplitude, duration, and/or regularity of peaks. In some embodiments, the compression threshold and/or compression ratio may be automatically set based at least in part upon the spectral bandwidth(s) in which peaks occur in a track or mix. In some embodiments, the compression threshold and/or compression ratio may be automatically set based at least in part upon the ratio between the power associated with one or more peaks and the average power of the track or mix, or between the power associated with one or more peaks and the average power of portions of the track or mix which do not include the peak(s). Any suitable information, which may or may not relate to peaks, may be used to automatically set the compression threshold and/or compression ratio for a track or mix.
- In some embodiments, the attack time and/or release time may be automatically set based at least in part upon one or more dynamic characteristics of a track or mix, such as the duration and/or amplitude of “tails” generated by a particular sound source (e.g., a kick drum hit, a guitar strum, etc.), the ratio between the durations and/or amplitudes of tails generated by different sound sources, and/or the frequency of tails (e.g., how many occur in a given predetermined time interval). As with the compression threshold and compression ratio discussed above, any suitable information (which may or may not relate to tails generated by a sound source) may be used to automatically set an attack time and/or release time for a track or mix.
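By way of a non-limiting illustration, the sketch below shows one way measurements of the kinds described above (peak level relative to average level, and how long the envelope stays above the threshold after a peak) might be mapped to starting compression parameters. Every constant, and the mapping itself, is an assumption made for illustration, not a parameter prescribed by any embodiment.

```python
# One hypothetical mapping from measured peak/tail statistics to starting
# compression parameters; all constants here are assumptions for illustration.
import numpy as np


def auto_compressor_settings(x, sr):
    """Derive rough threshold/ratio/attack/release values from a mono float signal."""
    x = np.asarray(x, dtype=float)
    eps = 1e-12
    rms_db = 20 * np.log10(np.sqrt(np.mean(x ** 2)) + eps)   # average level
    peak_db = 20 * np.log10(np.max(np.abs(x)) + eps)         # peak level
    crest_db = peak_db - rms_db                               # "peakiness" of the material

    # More pronounced peaks -> threshold set further below the peak, higher ratio.
    threshold_db = peak_db - min(crest_db, 12.0)
    ratio = 2.0 + crest_db / 6.0

    # Estimate how long the envelope stays above the threshold after a peak
    # (a crude stand-in for the "tail" duration) and scale the release to it.
    env = np.abs(x)
    above = env > 10 ** (threshold_db / 20.0)
    longest_run, run = 0, 0
    for flag in above:
        run = run + 1 if flag else 0
        longest_run = max(longest_run, run)
    release_ms = np.clip(1000.0 * longest_run / sr, 50.0, 500.0)
    attack_ms = np.clip(release_ms / 10.0, 1.0, 50.0)

    return {"threshold_db": round(float(threshold_db), 1), "ratio": round(float(ratio), 1),
            "attack_ms": round(float(attack_ms), 1), "release_ms": round(float(release_ms), 1)}


if __name__ == "__main__":
    sr = 44100
    t = np.arange(sr) / sr
    drum_like = np.exp(-t * 20.0) * np.sin(2 * np.pi * 100 * t)   # decaying "hit"
    print(auto_compressor_settings(drum_like, sr))
```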
- In some embodiments, the manner in which compression is automatically applied may be governed by one or more rules. For example, a rule may specify admissible ranges for a compression threshold, compression ratio, attack time, and/or release time, to ensure that compression which is automatically applied does not alter the fundamental character of a track or mix. Any suitable rule(s) may govern the automatic application of compression to a track or mix, in any suitable way.
- In some embodiments of the invention, act 112 may involve automatically specifying one or more cross-overs. In this respect, a cross-over is defined so as to delimit a portion of the frequency spectrum so that different frequency bands may have compression applied differently.
FIG. 6 shows a frequency spectrum which is segmented using cross-overs. In the example shown, two cross-overs are applied, at frequencies f1 and f2, thus creating three bands (i.e., bands A, B and C) in which compression may be applied differently. Applying compression differently in different frequency bands is known as multi-band compression, and may be performed when performing single-band compression tends to negatively affect the relationships between different sounds in a track or mix. - The Assignee has appreciated that improperly positioning cross-overs across the frequency spectrum can have significant negative effects on a track or mix, such as by introducing excess noise, ringing and/or distortion. The Assignee has also appreciated, however, that determining where to place cross-overs is a complex task which often involves time-consuming trial-and-error. As such, some embodiments of the present invention are directed to automatically determining the manner in which multi-band compression is applied. This determination may, for example, be based at least in part upon the spectral and/or dynamic characteristics of a track or mix.
- For example, the position and number of cross-overs, and the compression threshold and/or ratio to be applied in each of multiple frequency bands, may be automatically identified so as to balance the level of a track across the entire frequency spectrum. For example, if frequent and large peaks occur within a particular frequency range, then a cross-over may be positioned so as to isolate these peaks, and compression within the isolated area may employ a low compression threshold and/or high compression ratio.
- As another example, the position and number of cross-overs, and the attack time and release time to be applied in each of multiple frequency bands, may be automatically identified so as to balance the duration of sounds across the frequency spectrum. For example, if high-frequency sounds tend to exhibit long tails and low-frequency sounds tend to exhibit short tails, then one or more cross-overs may be positioned to isolate the bands in which the short and long tails tend to occur, the attack time in the low-frequency band may be increased, and the attack time in the high-frequency band may be decreased.
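As a purely illustrative sketch of automatic cross-over placement, the code below measures, per octave band, how strongly short-time power peaks stand out from that band's average, and returns the edges of the “peakiest” band as candidate cross-over frequencies. The octave grid, frame size, and synthetic test signal are assumptions for illustration only.

```python
# Sketch of automatic cross-over placement: find the octave band whose frames
# are most "peaky" and report cross-over candidates around it. Numbers are illustrative.
import numpy as np


def peaky_band(signal, sr, frame=2048):
    """Return (lo, hi) of the octave band with the highest peak-to-mean power ratio."""
    edges = [(62.5 * 2 ** k, 62.5 * 2 ** (k + 1)) for k in range(8)]   # 62.5 Hz .. 16 kHz
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    n_frames = len(signal) // frame
    powers = np.zeros((n_frames, len(edges)))
    for i in range(n_frames):
        spec = np.abs(np.fft.rfft(signal[i * frame:(i + 1) * frame])) ** 2
        for b, (lo, hi) in enumerate(edges):
            sel = (freqs >= lo) & (freqs < hi)
            powers[i, b] = spec[sel].sum() if sel.any() else 0.0
    # Peak-to-mean ratio per band: high values mean that band has frequent, large peaks.
    ratios = powers.max(axis=0) / (powers.mean(axis=0) + 1e-12)
    lo, hi = edges[int(np.argmax(ratios))]
    return lo, hi   # candidate cross-over frequencies bracketing the peaky band


if __name__ == "__main__":
    sr = 44100
    rng = np.random.default_rng(0)
    t = np.arange(2 * sr) / sr
    bed = 0.05 * np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)
    bursts = np.sin(2 * np.pi * 3000 * t) * (np.sin(2 * np.pi * 2 * t) > 0.995)
    print(peaky_band(bed + bursts, sr))   # expect the 2-4 kHz octave
```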
- In some embodiments, the manner in which multi-band compression is automatically applied may be governed by one or more rules. For example, a rule may provide a minimum and/or maximum number of cross-overs that may be applied to a track or mix. Any suitable rule(s) may govern the automatic application of multi-band compression to a track or mix, in any suitable way.
- In some embodiments, the manner in which different audio production operations are applied may be governed by one or more rules. For example, a rule may provide a sequence in which certain production operations are performed, such as to specify that automatic sound equalization is to be performed before automatic compression. Any suitable rule(s) may govern the performance of different audio production operations, in any suitable way.
- Referring again to
FIG. 1, at the completion of act 112, representative process 100 proceeds to act 114, wherein a user is allowed to modify any of the settings applied in act 106 and/or act 110. In this respect, the Assignee has appreciated that audio production is ultimately a creative task in which a user seeks to express a particular perspective, convey a particular emotion, create a particular mood, etc. While some embodiments of the invention may provide features designed to improve the overall efficiency of the audio production process, and may enhance the quality of the output of that process, some embodiments may also provide features designed to ensure that the user's creativity is not abridged. - At the completion of
act 114, representative process 100 completes. - It should be appreciated that settings which are automatically determined for a track or mix may evolve over time. For example, the settings which are automatically determined for a given track at a first time may be different than the settings which are automatically determined for the track at a second time. Any differences in the way that settings are automatically determined over time may, for example, be the result of analyzing how one or more users employ an audio production tool providing the functionality described herein, how one or more users modify one or more settings subsequent to the setting(s) being automatically determined, and/or any other suitable information. For example, one or more machine learning algorithms may process information on user habits over time to change the way in which certain settings are automatically determined.
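One minimal, purely illustrative way such evolution could be realized is to track how far a user habitually moves each automatically determined setting and fold a smoothed version of that offset into future suggestions. The class below is a sketch under those assumptions; the smoothing constant and setting names are hypothetical, and no particular learning technique is implied.

```python
# Minimal illustration of letting automatic settings drift toward a user's habits.
# The smoothing constant and setting names are assumptions, not described values.
class AdaptiveDefaults:
    def __init__(self, alpha=0.2):
        self.alpha = alpha          # how quickly the bias follows recent edits
        self.bias = {}              # per-setting learned offset

    def suggest(self, auto_settings):
        """Automatic settings, nudged by what the user usually changes them to."""
        return {k: v + self.bias.get(k, 0.0) for k, v in auto_settings.items()}

    def observe(self, auto_settings, user_settings):
        """Record how far the user moved each setting from the automatic value."""
        for k, auto_v in auto_settings.items():
            delta = user_settings.get(k, auto_v) - auto_v
            self.bias[k] = (1 - self.alpha) * self.bias.get(k, 0.0) + self.alpha * delta


defaults = AdaptiveDefaults()
auto = {"threshold_db": -18.0, "ratio": 2.0}
defaults.observe(auto, {"threshold_db": -14.0, "ratio": 2.0})   # user prefers gentler compression
print(defaults.suggest(auto))   # the suggested threshold drifts toward the user's preference
```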
- It should also be appreciated that various embodiments of the invention may vary from the specific techniques and processes described above, in any of numerous ways, without departing from the spirit and scope of the invention. Using
representative process 100 as an illustrative example, certain embodiments of the invention may omit some of the acts described above with reference to FIG. 1, may include additional acts not described above with reference to FIG. 1, and/or may involve performing acts in a different order than that which is described above with reference to FIG. 1. As an example, some embodiments may involve automatically determining one or more settings for a track (as described above with reference to act 112 in FIG. 1) without applying a predefined template associated with one or more sound sources included in the track (as described above with reference to act 106 in FIG. 1). - It should be appreciated from the foregoing that some embodiments of the invention may employ a computing device. For example,
representative process 100 may be performed via execution of software by such a computing device. FIG. 7 depicts a general purpose computing device, in the form of computer 910, which may be used to implement certain aspects of the invention. - In
computer 910, components include, but are not limited to, a processing unit 920, a system memory 930, and a system bus 921 that couples various system components including the system memory to the processing unit 920. The system bus 921 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus. -
Computer 910 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 910 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other one or more media which may be used to store the desired information and may be accessed by computer 910. Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media. - The
system memory 930 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 931 and random access memory (RAM) 932. A basic input/output system 933 (BIOS), containing the basic routines that help to transfer information between elements within computer 910, such as during start-up, is typically stored in ROM 931. RAM 932 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 920. By way of example, and not limitation, FIG. 7 illustrates operating system 934, application programs 935, other program modules 939, and program data 937. - The
computer 910 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 7 illustrates a hard disk drive 941 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 951 that reads from or writes to a removable, nonvolatile magnetic disk 952, and an optical disk drive 955 that reads from or writes to a removable, nonvolatile optical disk 959 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary computing system include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 941 is typically connected to the system bus 921 through a non-removable memory interface such as interface 940, and magnetic disk drive 951 and optical disk drive 955 are typically connected to the system bus 921 by a removable memory interface, such as interface 950. - The drives and their associated computer storage media discussed above and illustrated in
FIG. 7, provide storage of computer readable instructions, data structures, program modules and other data for the computer 910. In FIG. 7, for example, hard disk drive 941 is illustrated as storing operating system 944, application programs 945, other program modules 949, and program data 947. Note that these components can either be the same as or different from operating system 934, application programs 935, other program modules 939, and program data 937. Operating system 944, application programs 945, other program modules 949, and program data 947 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 910 through input devices such as a keyboard 992 and pointing device 991, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 920 through a user input interface 590 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 991 or other type of display device is also connected to the system bus 921 via an interface, such as a video interface 990. In addition to the monitor, computers may also include other peripheral output devices such as speakers 997 and printer 999, which may be connected through an output peripheral interface 995. - The
computer 910 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 980. The remote computer 980 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 910, although only a memory storage device 981 has been illustrated in FIG. 7. The logical connections depicted in FIG. 7 include a local area network (LAN) 971 and a wide area network (WAN) 973, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. - When used in a LAN networking environment, the
computer 910 is connected to theLAN 971 through a network interface oradapter 970. When used in a WAN networking environment, thecomputer 910 typically includes amodem 972 or other means for establishing communications over theWAN 973, such as the Internet. Themodem 972, which may be internal or external, may be connected to thesystem bus 921 via theuser input interface 990, or other appropriate mechanism. In a networked environment, program modules depicted relative to thecomputer 910, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,FIG. 7 illustratesremote application programs 985 as residing onmemory device 981. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used. - Embodiments of the invention may be embodied as a computer readable storage medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs (CD), optical discs, digital video disks (DVD), magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above. As is apparent from the foregoing examples, a computer readable storage medium may retain information for a sufficient time to provide computer-executable instructions in a non-transitory form. Such a computer readable storage medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above. As used herein, the term “computer-readable storage medium” encompasses only a tangible machine, mechanism or device from which a computer may read information. Alternatively or additionally, the invention may be embodied as a computer readable medium other than a computer-readable storage medium. Examples of computer readable media which are not computer readable storage media include transitory media, like propagating signals.
- Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Further, though advantages of the present invention are indicated, it should be appreciated that not every embodiment of the invention will include every described advantage. Some embodiments may not implement any features described as advantageous herein. Accordingly, the foregoing description and drawings are by way of example only.
- Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing, and it is, therefore, not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.
- The invention may be embodied as a method, of which various examples have been described. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include different (e.g., more or less) acts than those which are described, and/or which may involve performing some acts simultaneously, even though the acts are shown as being performed sequentially in the embodiments specifically described above.
- Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term) to distinguish the claim elements.
- Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having,” “containing,” “involving,” and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.
Claims (30)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/137,901 US10635389B2 (en) | 2017-06-07 | 2018-09-21 | Systems and methods for automatically generating enhanced audio output |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762516605P | 2017-06-07 | 2017-06-07 | |
PCT/US2018/034340 WO2018226419A1 (en) | 2017-06-07 | 2018-05-24 | Systems and methods for automatically generating enhanced audio output |
US16/137,901 US10635389B2 (en) | 2017-06-07 | 2018-09-21 | Systems and methods for automatically generating enhanced audio output |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2018/034340 Continuation WO2018226419A1 (en) | 2017-06-07 | 2018-05-24 | Systems and methods for automatically generating enhanced audio output |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190018645A1 true US20190018645A1 (en) | 2019-01-17 |
US10635389B2 US10635389B2 (en) | 2020-04-28 |
Family
ID=64566646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/137,901 Active US10635389B2 (en) | 2017-06-07 | 2018-09-21 | Systems and methods for automatically generating enhanced audio output |
Country Status (2)
Country | Link |
---|---|
US (1) | US10635389B2 (en) |
WO (1) | WO2018226419A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10672371B2 (en) | 2015-09-29 | 2020-06-02 | Amper Music, Inc. | Method of and system for spotting digital media objects and event markers using musical experience descriptors to characterize digital music to be automatically composed and generated by an automated music composition and generation engine |
US10854180B2 (en) | 2015-09-29 | 2020-12-01 | Amper Music, Inc. | Method of and system for controlling the qualities of musical energy embodied in and expressed by digital music to be automatically composed and generated by an automated music composition and generation engine |
US10964299B1 (en) | 2019-10-15 | 2021-03-30 | Shutterstock, Inc. | Method of and system for automatically generating digital performances of music compositions using notes selected from virtual musical instruments based on the music-theoretic states of the music compositions |
US11024275B2 (en) | 2019-10-15 | 2021-06-01 | Shutterstock, Inc. | Method of digitally performing a music composition using virtual musical instruments having performance logic executing within a virtual musical instrument (VMI) library management system |
US11037538B2 (en) | 2019-10-15 | 2021-06-15 | Shutterstock, Inc. | Method of and system for automated musical arrangement and musical instrument performance style transformation supported within an automated music performance system |
US20210357174A1 (en) * | 2020-05-18 | 2021-11-18 | Waves Audio Ltd. | DIgital Audio Workstation with Audio Processing Recommendations |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018226418A1 (en) * | 2017-06-07 | 2018-12-13 | iZotope, Inc. | Systems and methods for identifying and remediating sound masking |
CN110610702B (en) * | 2018-06-15 | 2022-06-24 | 惠州迪芬尼声学科技股份有限公司 | Method for sound control equalizer by natural language and computer readable storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101359473A (en) * | 2007-07-30 | 2009-02-04 | 国际商业机器公司 | Auto speech conversion method and apparatus |
US9310959B2 (en) * | 2009-06-01 | 2016-04-12 | Zya, Inc. | System and method for enhancing audio |
GB2503867B (en) * | 2012-05-08 | 2016-12-21 | Landr Audio Inc | Audio processing |
JP6453314B2 (en) * | 2013-05-17 | 2019-01-16 | ハーマン・インターナショナル・インダストリーズ・リミテッド | Audio mixer system |
-
2018
- 2018-05-24 WO PCT/US2018/034340 patent/WO2018226419A1/en active Application Filing
- 2018-09-21 US US16/137,901 patent/US10635389B2/en active Active
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11430418B2 (en) | 2015-09-29 | 2022-08-30 | Shutterstock, Inc. | Automatically managing the musical tastes and preferences of system users based on user feedback and autonomous analysis of music automatically composed and generated by an automated music composition and generation system |
US11468871B2 (en) | 2015-09-29 | 2022-10-11 | Shutterstock, Inc. | Automated music composition and generation system employing an instrument selector for automatically selecting virtual instruments from a library of virtual instruments to perform the notes of the composed piece of digital music |
US12039959B2 (en) | 2015-09-29 | 2024-07-16 | Shutterstock, Inc. | Automated music composition and generation system employing virtual musical instrument libraries for producing notes contained in the digital pieces of automatically composed music |
US11011144B2 (en) | 2015-09-29 | 2021-05-18 | Shutterstock, Inc. | Automated music composition and generation system supporting automated generation of musical kernels for use in replicating future music compositions and production environments |
US11017750B2 (en) | 2015-09-29 | 2021-05-25 | Shutterstock, Inc. | Method of automatically confirming the uniqueness of digital pieces of music produced by an automated music composition and generation system while satisfying the creative intentions of system users |
US11776518B2 (en) | 2015-09-29 | 2023-10-03 | Shutterstock, Inc. | Automated music composition and generation system employing virtual musical instrument libraries for producing notes contained in the digital pieces of automatically composed music |
US11030984B2 (en) | 2015-09-29 | 2021-06-08 | Shutterstock, Inc. | Method of scoring digital media objects using musical experience descriptors to indicate what, where and when musical events should appear in pieces of digital music automatically composed and generated by an automated music composition and generation system |
US11037539B2 (en) | 2015-09-29 | 2021-06-15 | Shutterstock, Inc. | Autonomous music composition and performance system employing real-time analysis of a musical performance to automatically compose and perform music to accompany the musical performance |
US11657787B2 (en) | 2015-09-29 | 2023-05-23 | Shutterstock, Inc. | Method of and system for automatically generating music compositions and productions using lyrical input and music experience descriptors |
US11651757B2 (en) | 2015-09-29 | 2023-05-16 | Shutterstock, Inc. | Automated music composition and generation system driven by lyrical input |
US10854180B2 (en) | 2015-09-29 | 2020-12-01 | Amper Music, Inc. | Method of and system for controlling the qualities of musical energy embodied in and expressed by digital music to be automatically composed and generated by an automated music composition and generation engine |
US11037541B2 (en) | 2015-09-29 | 2021-06-15 | Shutterstock, Inc. | Method of composing a piece of digital music using musical experience descriptors to indicate what, when and how musical events should appear in the piece of digital music automatically composed and generated by an automated music composition and generation system |
US11037540B2 (en) | 2015-09-29 | 2021-06-15 | Shutterstock, Inc. | Automated music composition and generation systems, engines and methods employing parameter mapping configurations to enable automated music composition and generation |
US10672371B2 (en) | 2015-09-29 | 2020-06-02 | Amper Music, Inc. | Method of and system for spotting digital media objects and event markers using musical experience descriptors to characterize digital music to be automatically composed and generated by an automated music composition and generation engine |
US11430419B2 (en) | 2015-09-29 | 2022-08-30 | Shutterstock, Inc. | Automatically managing the musical tastes and preferences of a population of users requesting digital pieces of music automatically composed and generated by an automated music composition and generation system |
US11037538B2 (en) | 2019-10-15 | 2021-06-15 | Shutterstock, Inc. | Method of and system for automated musical arrangement and musical instrument performance style transformation supported within an automated music performance system |
US11024275B2 (en) | 2019-10-15 | 2021-06-01 | Shutterstock, Inc. | Method of digitally performing a music composition using virtual musical instruments having performance logic executing within a virtual musical instrument (VMI) library management system |
US10964299B1 (en) | 2019-10-15 | 2021-03-30 | Shutterstock, Inc. | Method of and system for automatically generating digital performances of music compositions using notes selected from virtual musical instruments based on the music-theoretic states of the music compositions |
GB2595222A (en) * | 2020-05-18 | 2021-11-24 | Waves Audio Ltd | Digital audio workstation with audio processing recommendations |
US20210357174A1 (en) * | 2020-05-18 | 2021-11-18 | Waves Audio Ltd. | DIgital Audio Workstation with Audio Processing Recommendations |
CN113691909A (en) * | 2020-05-18 | 2021-11-23 | 波音频有限公司 | Digital audio workstation with audio processing recommendations |
US11687314B2 (en) * | 2020-05-18 | 2023-06-27 | Waves Audio Ltd. | Digital audio workstation with audio processing recommendations |
Also Published As
Publication number | Publication date |
---|---|
WO2018226419A1 (en) | 2018-12-13 |
US10635389B2 (en) | 2020-04-28 |
WO2018226419A8 (en) | 2019-01-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10635389B2 (en) | Systems and methods for automatically generating enhanced audio output | |
US8649531B2 (en) | Method and system for approximating graphic equalizers using dynamic filter order reduction | |
KR102477001B1 (en) | Method and apparatus for adjusting audio playback settings based on analysis of audio characteristics | |
KR102074135B1 (en) | Volume leveler controller and controlling method | |
US9530396B2 (en) | Visually-assisted mixing of audio using a spectral analyzer | |
US7774078B2 (en) | Method and apparatus for audio data analysis in an audio player | |
US20190074807A1 (en) | Audio control system and related methods | |
US10433089B2 (en) | Digital audio supplementation | |
CN102881283B (en) | Method and system for speech processing | |
US10014841B2 (en) | Method and apparatus for controlling audio playback based upon the instrument | |
WO2011035626A1 (en) | Audio playing method and audio playing apparatus | |
US20240314499A1 (en) | Techniques for audio track analysis to support audio personalization | |
CN109147739B (en) | Sound effect adjusting method, medium, device and computing equipment based on voice control | |
TWI607321B (en) | System and method for optimizing music | |
JPWO2020066681A1 (en) | Information processing equipment and methods, and programs | |
US12204814B2 (en) | Computer implemented method, device and computer program product for setting a playback speed of media content comprising audio | |
CN102045619B (en) | Recording apparatus, recording method, audio signal correction circuit, and program | |
CN119094959A (en) | Audio parameter setting method and electronic device | |
Case | Mix smart: Professional techniques for the home studio | |
US11935552B2 (en) | Electronic device, method and computer program | |
JP2019205114A (en) | Data processing apparatus and data processing method | |
CN112185325B (en) | Audio playback style adjustment method, device, electronic device and storage medium | |
Seppänen | Production of an audiobook: recording, editing and mastering | |
US20240321320A1 (en) | Harmonizing system for optimizing sound in content | |
CN117765900A (en) | Audio curve generation method, electronic device, driving device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
AS | Assignment |
Owner name: CAMBRIDGE TRUST COMPANY, MASSACHUSETTS Free format text: SECURITY INTEREST;ASSIGNORS:IZOTOPE, INC.;EXPONENTIAL AUDIO, LLC;REEL/FRAME:050499/0420 Effective date: 20190925 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
AS | Assignment |
Owner name: IZOTOPE, INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MCCLELLAN, JAMES;WICHERN, GORDON;WISHNICK, AARON;AND OTHERS;SIGNING DATES FROM 20170901 TO 20171003;REEL/FRAME:051757/0605 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: EXPONENTIAL AUDIO, LLC, MASSACHUSETTS Free format text: TERMINATION AND RELEASE OF GRANT OF SECURITY INTEREST IN UNITED STATES PATENTS;ASSIGNOR:CAMBRIDGE TRUST COMPANY;REEL/FRAME:055627/0958 Effective date: 20210310 Owner name: IZOTOPE, INC., MASSACHUSETTS Free format text: TERMINATION AND RELEASE OF GRANT OF SECURITY INTEREST IN UNITED STATES PATENTS;ASSIGNOR:CAMBRIDGE TRUST COMPANY;REEL/FRAME:055627/0958 Effective date: 20210310 |
|
AS | Assignment |
Owner name: LUCID TRUSTEE SERVICES LIMITED, UNITED KINGDOM Free format text: INTELLECTUAL PROPERTY SECURITY AGREEMENT;ASSIGNOR:IZOTOPE, INC.;REEL/FRAME:056728/0663 Effective date: 20210630 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: NATIVE INSTRUMENTS USA, INC., MASSACHUSETTS Free format text: CHANGE OF NAME;ASSIGNOR:IZOTOPE, INC.;REEL/FRAME:065317/0822 Effective date: 20231018 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |