WO2020102005A1 - Dynamic music creation in gaming - Google Patents
- Publication number
- WO2020102005A1 (PCT/US2019/060306)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- emotion
- musical
- game
- motifs
- event
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/50—Controlling the output signals based on the game progress
- A63F13/54—Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
- A63F13/67—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/70—Game security or game management aspects
- A63F13/77—Game security or game management aspects involving data related to game devices or game servers, e.g. configuration data, software version or amount of memory
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/20—Input arrangements for video game devices
- A63F13/21—Input arrangements for video game devices characterised by their sensors, purposes or types
- A63F13/212—Input arrangements for video game devices characterised by their sensors, purposes or types using sensors worn by the player, e.g. for measuring heart beat or leg activity
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/10—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
- A63F2300/1012—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals involving biosensors worn by the player, e.g. for measuring heart beat, limb activity
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6027—Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/021—Background music, e.g. for video sequences or elevator music
- G10H2210/026—Background music, e.g. for video sequences or elevator music for games, e.g. videogames
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/111—Automatic composing, i.e. using predefined musical rules
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/125—Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/091—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith
- G10H2220/101—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters
- G10H2220/106—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters using icons, e.g. selecting, moving or linking icons, on-screen symbols, screen regions or segments representing musical elements or parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/075—Musical metadata derived from musical analysis or for use in electrophonic musical instruments
- G10H2240/085—Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
Definitions
- Mapping the musical elements to create a corpus of musical performances need not be, and likely will not be, a one-to-one mapping; rather, it will be set on a scale of an emotional continuum, e.g., a set of musical vectors. For example, a piece might be 8 out of 10 on a scale of sensuality and also 4 out of 10 on a scale of sadness.
- The model can be tested on titles that are not in the training dataset. Humans can listen to the results and fine-tune the model until it is more accurate. Ultimately, the model will become very accurate for a given point in time.
- the model can map not only the pieces and sections to emotional vectors but may also map the constituent melodies, rhythms and chord progressions to those same emotional vectors.
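By way of illustration only (this structure is not part of the disclosure), a minimal Python sketch of how a piece, section or constituent element might carry a continuous emotion vector, following the 8-out-of-10 sensuality / 4-out-of-10 sadness example above; the class, field and axis names are assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict

# Emotion axes drawn from the disclosure (Tension, Power, Joy, ...).
EMOTION_AXES = ["tension", "power", "joy", "wonder", "tenderness",
                "transcendence", "peacefulness", "nostalgia",
                "sadness", "sensuality", "fear"]


@dataclass
class EmotionVector:
    """Scores on a 0-10 continuum, one per emotion axis actually present."""
    scores: Dict[str, float] = field(default_factory=dict)

    def get(self, axis: str) -> float:
        return self.scores.get(axis, 0.0)


@dataclass
class MusicalElement:
    """A piece, section, melody, rhythm or chord progression and its mapping."""
    element_id: str
    kind: str              # "piece", "section", "melody", "rhythm", "progression"
    emotion: EmotionVector


# The example from the text: 8/10 sensuality and 4/10 sadness for one piece.
piece = MusicalElement("piece_042", "piece",
                       EmotionVector({"sensuality": 8.0, "sadness": 4.0}))
print(piece.emotion.get("sensuality"))  # 8.0
```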
- the machine learns melodic structures (like a melodic leap upward followed by a step downward is generally considered beautiful and a leap of a minor ninth is generally considered discordant, etc.).
- the structural musical analysis 310 may involve a number of components.
- beauty, dissonance and all measures of subjective description could be placed in time as a compositional reference.
- What counts as dissonant or beautiful also shifts over time: in Bach’s era, for example, it was considered too dissonant to play the interval of a major 7th unless the 7th was carried over from the previous, more consonant chord, whereas today a major seventh chord is considered normal and sometimes even sappy.
- the moods of songs are divided according to psychologist Robert Thayer’s traditional model of mood.
- the model divides songs along the lines of energy and stress, from happy to sad and calm to energetic, respectively.
- the eight categories created by Thayer’s model include the extremes of the two lines as well as each of the possible intersections of the lines (e.g. happy-energetic or sad-calm).
- This historic analysis may be of limited value; the approach described herein is more nuanced and flexible because the size of the corpus, and therefore of the training dataset, permits a much richer analysis.
- Some of the components to be analyzed 402 may include, but are not limited to, harmonic groupings, modes and scales, time signature(s), tempo(s), harmonic density, rhythmic density, melodic structure, phrase length (and structure), dynamics, phrasing and compositional techniques (transposition, inversion, retrograde, etc.), and grooves (including rhythms like Funky, Lounge, Latin, Reggae, Swing, Tango, Merengue, Salsa, Fado, 60s disco, 70s disco, Heavy Metal, etc. - there are hundreds of established rhythmic styles).
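A hedged sketch of how the analyzed components listed above might be stored as per-section metadata; the field names and units (e.g., chords per bar as a proxy for harmonic density) are illustrative assumptions, not taken from the disclosure:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MusicalAnalysis:
    """Per-section feature record; field names and units are illustrative."""
    time_signatures: List[Tuple[int, int]]  # e.g. [(4, 4)]
    tempos_bpm: List[float]
    mode: str                               # e.g. "dorian", "major"
    harmonic_density: float                 # e.g. chords per bar
    rhythmic_density: float                 # e.g. note onsets per beat
    phrase_lengths_bars: List[int]
    dynamics: str                           # e.g. "pp", "mf", "ff"
    groove: str                             # e.g. "swing", "salsa", "70s disco"


section = MusicalAnalysis(
    time_signatures=[(4, 4)], tempos_bpm=[118.0], mode="dorian",
    harmonic_density=1.5, rhythmic_density=2.3,
    phrase_lengths_bars=[4, 4, 8], dynamics="mf", groove="swing",
)
```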
- The corpus of musical data is divided into three buckets.
- the first bucket, Corpus A 501 is a set of musical data 502 for which the Collected Emotional Descriptions 503 have been mapped based on analysis of listener review data.
- The second bucket, Corpus B 506, is a set of musical data 507 for which its own set of Collected Emotional Descriptions exists but is held out for validating the model’s predictions.
- the third bucket, Corpus C is a set of Music Data 511 for which there are no Collected Emotional Descriptions.
- the model is initially trained on Corpus A.
- Using a convolutional neural network 504, the Collected Emotional Descriptions 503 are mapped to the musical data 502.
- The result of this mapping is a Trained Engine 505, which should be capable of deriving Emotional Descriptions from Musical Data.
- The Musical Data 507 for Corpus B 506 is then fed into the Trained Engine 505.
- the resultant output is the Predicted Emotional Descriptions of Corpus B Music 509.
- These predictions are then compared against Collected Descriptions 510 to determine how accurate the predictions are. This process of predicting, comparing and iterating (e.g., refining the model and predicting again) can be repeated until the predictions are sufficiently accurate.
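A minimal sketch of the Corpus A / Corpus B workflow under stated assumptions: feature extraction is reduced to random placeholder arrays, and scikit-learn's MLPRegressor stands in for the convolutional network described, purely to show the train / predict / compare loop:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
corpus_a_features = rng.random((200, 32))   # musical data 502, as feature vectors
corpus_a_emotions = rng.random((200, 11))   # collected emotional descriptions 503
corpus_b_features = rng.random((50, 32))    # musical data 507
corpus_b_emotions = rng.random((50, 11))    # held-out collected descriptions

# Train the engine on Corpus A (MLP regressor standing in for the CNN).
engine = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
engine.fit(corpus_a_features, corpus_a_emotions)

# Predict emotional descriptions for Corpus B and compare against the
# collected descriptions; in practice this predict/compare step is iterated.
predicted = engine.predict(corpus_b_features)
print("mean absolute error:", mean_absolute_error(corpus_b_emotions, predicted))
```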
- All of these game elements 605 are collected along with their triggers (in or out) and modifiers. Additionally, these may be collected as game vectors 606. They are referred to as vectors because many of the components have multiple states. For example, a field might be different at night than in the daytime and different in the rain than in the sunshine. An enemy might have variable powers that increase or decrease based on time, place, or level. From this perspective, any element could have an array of vectors. All game elements and their vectors have moods associated with them. The mood may vary with the value of the vector but, nonetheless, moods can be associated with the game elements. Once we have mapped the game elements to their Emotions 607, we can store that relationship in a corpus or Collection of Annotated Game Elements 608.
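One possible (hypothetical) representation of a game vector whose states carry different emotion weights, as in the field-at-night versus field-in-daytime example above; the names and numbers are illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GameElementVector:
    """A game element whose states each carry their own emotion weights."""
    element_id: str
    states: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def emotions_for(self, state: str) -> Dict[str, float]:
        return self.states.get(state, {})


forest = GameElementVector(
    element_id="forest",
    states={
        "day_clear":  {"peacefulness": 0.7, "wonder": 0.4},
        "night_rain": {"tension": 0.8, "fear": 0.6},
    },
)
annotated_game_elements = {forest.element_id: forest}  # cf. collection 608
print(forest.emotions_for("night_rain"))
```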
- Compositional tools could be used to create music on their own from the game play, but the focus of this embodiment is to create the music from compositional primitives.
- the composer will create musical elements or motifs and associate them with characters or elements in the game.
- The tools described here need not be used alone and will often be used to augment traditional scoring techniques, where prerecorded music is combined both sequentially and in layers, with multiple components being mixed together to create a whole. That is, today, different layers may start at different times, some will overlap with others, and still others may run independently of other layers.
- The idea here is to develop additional tools that can be used 1) in addition to (e.g., on top of) existing techniques, 2) instead of existing techniques in some or all places, or 3) as a mechanism to inform the use of existing techniques. Additionally, these mechanisms could be used to create entirely new forms of interactive media (for example, people trying to control their blood pressure or brain wave states could use musical feedback as a training tool or even use biometric markers as a compositional tool).
- motifs typically refer to small melodic segments.
- Motifs can be melodic segments, harmonic structures, rhythmic structures and/or specific tonalities. Motifs can be created for as many of the individual elements/participants of the game as desired, including but not limited to characters (lead person, partner, primary enemy, wizard, etc.), activity types (fighting, resting, planning, hiding, etc.), areas (forest, city, desert, etc.), and the personality of the person playing the game (young, old, male, female, introvert, extrovert, etc.).
- the motifs can be melodic, harmonic, rhythmic, etc. Additionally, there can be multiple motifs for an individual element, for example a rhythmic pattern and a melodic pattern that might be used individually or together or there might be both a sad motif and a happy motif for the same character to be used in different circumstances.
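A sketch of a motif registry allowing several motifs per game element, tagged by musical type and by emotion, as described above; the payloads (note lists, rhythm patterns) are placeholders rather than real score or MIDI data:

```python
# Each game element can own several motifs, tagged by musical type and by the
# emotion or circumstance they serve; payloads here are simple placeholders.
motif_registry = {
    "wizard": [
        {"type": "melodic",  "emotion": "wonder",  "notes": ["E4", "G4", "B4", "D5"]},
        {"type": "melodic",  "emotion": "sadness", "notes": ["E4", "D4", "C4", "B3"]},
        {"type": "rhythmic", "emotion": "tension", "pattern": [1, 0, 1, 1, 0, 1, 0, 0]},
    ],
    "forest": [
        {"type": "harmonic", "emotion": "peacefulness", "chords": ["Cmaj7", "Am9"]},
    ],
}


def motifs_for(element: str, emotion: str):
    """Return every motif for a game element that matches the requested emotion."""
    return [m for m in motif_registry.get(element, []) if m["emotion"] == emotion]


print(motifs_for("wizard", "sadness"))
```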
- buttons could be real physical buttons or faders or could be virtual buttons or faders.
- physical buttons and faders will be more intuitive to use and likely have better outcomes.
- actual physical faders as used in computerized audio mixing consoles
- The physicality will create serendipitous results (e.g., raising the sensuality fader might have a better and more interesting effect than raising the tension fader even though it is a tense environment).
- For this application, either may work.
- the logical order depicted in Figure 7 may begin with a Collection of Annotated Game Elements 701, which may be carried over from the Collection of Annotated Game Elements 608 in Figure 6.
- Game triggers are then assigned to switches at 702. These triggers could be the appearance of an enemy, an obstacle, a level up, etc.
- Primary Emotional Vectors 703 are selected and visual markers are assigned at 704 so that element types may be displayed to the Composer.
- Game elements are then assigned to emotional elements in multi-dimensional arrays, as indicated at 705.
- The multi-dimensional arrays are then assigned to faders and switches, which are used to map musical emotional markers to game markers at 710. Once a set of emotional markers for game elements has been established, music can be applied to the game elements.
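A hedged sketch of how fader positions might be combined with a game element's baseline emotions to produce a target emotion vector; the blending arithmetic is an assumption made for illustration, since the disclosure does not prescribe a formula:

```python
def blend(baseline: dict, faders: dict) -> dict:
    """Scale a game element's baseline emotions by the live fader positions.

    Faders run 0.0-1.0; 0.5 leaves the baseline unchanged, higher boosts it.
    The arithmetic is purely illustrative.
    """
    axes = set(baseline) | set(faders)
    return {a: baseline.get(a, 0.0) * (0.5 + faders.get(a, 0.5)) for a in axes}


baseline = {"tension": 0.6, "sensuality": 0.2}  # e.g. a tense environment
faders = {"sensuality": 0.9, "tension": 0.3}    # composer raises sensuality

target = blend(baseline, faders)
print(sorted(target.items(), key=lambda kv: -kv[1]))
```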
- the Composer writes the motifs 707.
- the motifs can represent characters, events, areas or moods.
- baseline emotions for motifs 708. These are the default emotions. For example, anytime a wizard appears, there might be a default motif with its default emotion. These can vary based on circumstance but if no variables are applied, they will operate in their default mode.
- Buttons can be arranged, perhaps in a colored matrix, so it is easy to see many at once. Some of these may be the same as the game switches but others will be musical switches. These could be grouped by type of character (hero, villain, wizard, etc.), musical composition component (say grooves on one side and melodies on another, modes across the top) and scenes (city, country, etc.).
- Game simulations can be run in real time against early versions of the game, even if only storyboards exist, or even with no storyboards at all, simply to write music that applies to different scenarios.
- The composer can select motif combinations and emotional mappings, apply them to various simulations and test-drive them. As the game develops, these scenarios can be fine-tuned.
- This can also be used as a generalized composition tool, where a composer or performer can use the machine to create music based on primitives.
- Figure 8 illustrates an example of how game simulations and musical programming can actually be utilized.
- motifs 801, game triggers 802 and game components 803 are mapped to faders and switches 804.
- The motifs, triggers and components mapped to the faders and switches 804 are then plugged into a game scenario playback engine 805.
- The composer now plays the music while testing various gaming scenarios 806, and the result is recorded by a recorded music scene / event mapping module 807.
- the composer can look at a cuing display screen 808 showing what elements are being used or are coming up. If a fader is touched, the element to which it is mapped may be highlighted on the screen to provide visual feedback.
- The cuing display can show the elements from the game scenario playback engine 805 as they are playing out from the live faders and switches. This is represented by the play music while gaming module 806 and the recorded music scene / event mapping module 807.
- all the faders and switches can be updated and modified in real time at 809 and remembered for later playback.
- motifs, components and triggers can be updated at 810 and remembered by the system.
- there are unlimited undos and different performances can be saved as different versions, which can be used at different times and in combination with each other.
- Figure 9 displays additional detail on how a Cuing Monitor 907 may be used according to aspects of the present disclosure.
- Upcoming events 901, characters 902 and locations 903 can all be seen on the Cuing Monitor (there may be multiple Cuing Monitors).
- These and the Fader (and Switch) Mapping Matrix 905 are all displayed on one or more Cuing Monitors 907.
- One other input to the fader mapping matrix 905 and visible on a Cuing Monitor is the Player History / Prediction Matrix 906. Based on the player’s style of play or other factors (time of day, number of minutes or hours in the current session, state of the game, etc.), this matrix can vary the music either automatically or based on parameters set by the composer or sound designer.
- Many foreshadowings are possible, from fear overtones to happy projections. Again, because this is designed to be a visceral tool, this functionality may well end up creating unanticipated results, some of which will be useful to program into the game.
- A set of Possible Next Events 1001 can be seen on the Cuing Matrix 1003.
- There is also a Next Events Timing Matrix 1002, which can be used to set the timing values and ramping characteristics. Another factor that can play into the foreshadowing is the Probability Emotion Weighting Engine 1004. This Engine weighs the expected emotional state of the upcoming event and is used to affect the foreshadowing audio. It can be seen on the Cuing Monitor and the weighting of its effect can be varied. The likelihood of an upcoming event having any particular emotion, and the predicted intensity of that emotion, is mapped by the Event to Emotion Likelihood Mapping Engine 1005. Foreshadowing is also affected by Player History and the Player History / Prediction Matrix 1007. This Matrix can be seen on the Cuing Monitor and also feeds into the Foreshadowing Matrix Engine 1006.
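A minimal sketch, under assumed names and numbers, of how the Probability Emotion Weighting Engine and the Event to Emotion Likelihood Mapping Engine might combine to produce a probability-weighted foreshadowing vector:

```python
possible_next_events = [
    {"event": "boss_appears",   "probability": 0.3,
     "emotions": {"fear": 0.9, "tension": 0.8}},
    {"event": "treasure_found", "probability": 0.5,
     "emotions": {"joy": 0.7, "wonder": 0.5}},
    {"event": "nothing",        "probability": 0.2, "emotions": {}},
]


def foreshadow_vector(events, weighting=1.0):
    """Probability-weighted emotion vector; `weighting` scales the overall effect."""
    out = {}
    for ev in events:
        for axis, intensity in ev["emotions"].items():
            out[axis] = out.get(axis, 0.0) + ev["probability"] * intensity * weighting
    return out


print(foreshadow_vector(possible_next_events, weighting=0.8))
```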
- These compositional tools are not limited to game use only. Interactive VR environments, even in non-game uses, can take advantage of these techniques. Additionally, this could be used as a compositional tool for scoring a traditional TV show or film. One final use might be the pure creation of music, to create albums or beds for pop songs, etc.
- FIG. 11 depicts a system for dynamic music creation in gaming according to aspects of the present disclosure.
- the system may include a computing device 1100 coupled to a user input device 1102.
- the user input device 1102 may be a controller, touch screen, microphone, keyboard, mouse, joystick, fader board or other device that allows the user to input information including sound data in to the system.
- The system may also be coupled to a biometric device 1123 configured to measure electro-dermal activity, pulse and respiration, body temperature, blood pressure, or brain wave activity.
- The biometric device may be, for example and without limitation, a thermal or IR camera pointed at a user and configured to determine the respiration and heart rate of the user from thermal signatures; for more information see commonly owned Patent No. 8,638,364 to Chen et al., “USER INTERFACE SYSTEM AND METHOD USING
- The biometric device may be such other devices as, for example and without limitation, a pulse oximeter, blood pressure cuff, electroencephalograph machine, electrocardiograph machine, wearable activity tracker or a smartwatch with bio-sensing.
- the computing device 1100 may include one or more processor units 1103, which may be configured according to well-known architectures, such as, e.g., single-core, dual-core, quad- core, multi-core, processor-coprocessor, cell processor, and the like.
- the computing device may also include one or more memory units 1104 (e.g., random access memory (RAM), dynamic random access memory (DRAM), read-only memory (ROM), and the like).
- the processor unit 1103 may execute one or more programs, portions of which may be stored in the memory 1104 and the processor 1103 may be operatively coupled to the memory, e.g., by accessing the memory via a data bus 1105.
- the programs may be configured to generate or use sound motifs 1108 to create music based on game vectors 1109 and emotion vectors 1110 of a videogame.
- the sound motifs may be short musical motifs composed by musicians, users or machines.
- the Memory 1104 may contain programs that implement training of a sound categorization and classification NNs 1121.
- the memory 1104 may also contain one or more databases 1122 of annotated performance data and emotional descriptions.
- Neural network modules 1121 e.g., convolutional neural networks for associating musical motifs with emotions may also be stored in the memory 1104.
- The memory 1104 may store a report 1110 listing items not identified by the neural network modules 1121 as being in the databases 1122.
- The sound motifs, game vectors, emotional vectors, neural network modules and annotated performance data 1108, 1109, 1121, 1122 may also be stored as data 1118 in the Mass Store 1115 or at a server coupled to the Network 1120 accessed through the network interface 1114.
- Data for a videogame may be stored in the memory 1104 as data in the database or elsewhere, or as a program 1117 or data 1118 in the mass store 1115.
- the overall structure and probabilities of the NNs may also be stored as data 1118 in the Mass Store 1115.
- The processor unit 1103 is further configured to execute one or more programs 1117 stored in the mass store 1115 or in memory 1104 which cause the processor to carry out a method of dynamic music creation using musical motifs 1108, game vectors 1109 and emotional vectors 1110 as described herein. Music generated from the musical motifs may be stored in the database 1122. Additionally, the processor may carry out the method for NN 1121 training and classification of musical motifs to emotions as described herein.
- The system 1100 may generate the Neural Networks 1121 as part of an NN training process and store them in memory 1104. Completed NNs may be stored in memory 1104 or as data 1118 in the mass store 1115.
- the NN 1121 may be trained using actual responses from users with the biometric device 1123 being used to provide biological feedback from the user.
- The computing device 1100 may also include well-known support circuits, such as input/output (I/O) circuits 1107, power supplies (P/S) 1111, a clock (CLK) 1112, and cache 1113, which may communicate with other components of the system, e.g., via the bus 1105.
- the computing device may include a network interface 1114.
- the processor unit 1103 and network interface 1114 may be configured to implement a local area network (LAN) or personal area network (PAN), via a suitable network protocol, e.g., Bluetooth, for a PAN.
- the computing device may optionally include a mass storage device 1115 such as a disk drive, CD-ROM drive, tape drive, flash memory, or the like, and the mass storage device may store programs and/or data.
- the computing device may also include a user interface 1116 to facilitate interaction between the system and a user.
- the user interface may include a monitor, Television screen, speakers, headphones or other devices that communicate information to the user.
- The computing device 1100 may include a network interface 1114 to facilitate communication via an electronic communications network 1120.
- the network interface 1114 may be configured to implement wired or wireless communication over local area networks and wide area networks such as the Internet.
- the device 1100 may send and receive data and/or requests for files via one or more message packets over the network 1120.
- Message packets sent over the network 1120 may temporarily be stored in a buffer 1109 in memory 1104.
- the annotated performance data, sound motifs, and annotated game elements may be available through the network 1120 and stored partially in memory 1104 for use.
- neural networks used in dynamic music generation may include one or more of several different types of neural networks and may have many different layers.
- the classification neural network may consist of one or multiple convolutional neural networks (CNN), recurrent neural networks (RNN) and/or dynamic neural networks (DNN).
- FIG 12A depicts the basic form of an RNN having a layer of nodes 1220, each of which is characterized by an activation function S, one input weight U, a recurrent hidden node transition weight W, and an output transition weight V.
- The activation function S may be any non-linear function known in the art and is not limited to the hyperbolic tangent (tanh) function.
- The activation function S may be a sigmoid or ReLU function.
- RNNs have one set of activation functions and weights for the entire layer.
- The RNN may be considered as a series of nodes 1220 having the same activation function moving through time T and T+1.
- The RNN maintains historical information by feeding the result from a previous time T to a current time T+1.
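A minimal numpy sketch of the recurrent step just described, with input weights U, recurrent hidden-to-hidden weights W, output weights V and a tanh activation; the dimensions and values are arbitrary examples:

```python
import numpy as np

n_in, n_hidden, n_out = 8, 16, 4
rng = np.random.default_rng(0)
U = rng.normal(size=(n_hidden, n_in))      # input weights
W = rng.normal(size=(n_hidden, n_hidden))  # recurrent hidden-to-hidden weights
V = rng.normal(size=(n_out, n_hidden))     # output weights


def rnn_step(x_t, s_prev):
    """One recurrent step: new hidden state and output, with a tanh activation."""
    s_t = np.tanh(U @ x_t + W @ s_prev)
    y_t = V @ s_t
    return s_t, y_t


s = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):  # a short input sequence
    s, y = rnn_step(x_t, s)             # the state s carries history forward
```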
- a convolutional RNN may be used.
- Another type of RNN that may be used is a Long Short-Term Memory (LSTM) Neural Network, which adds a memory block in an RNN node with an input gate activation function, an output gate activation function and a forget gate activation function, resulting in a gating memory that allows the network to retain some information for a longer period of time, as described by Hochreiter & Schmidhuber, “Long Short-Term Memory,” Neural Computation 9(8): 1735-1780 (1997), which is incorporated herein by reference.
- Figure 12C depicts an example layout of a convolution neural network such as a CRNN according to aspects of the present disclosure.
- the convolution neural network is generated for training data in the form of an array 1232, e.g., with 4 rows and 4 columns giving a total of 16 elements.
- the depicted convolutional neural network has a filter 1233 size of 2 rows by 2 columns with a skip value of 1 and a channel 1236 of size 9.
- For clarity, in Figure 12C only the connections 1234 between the first column of channels and their filter windows are depicted. Aspects of the present disclosure, however, are not limited to such implementations.
- the convolutional neural network that implements the classification 1229 may have any number of additional neural network node layers 1231 and may include such layer types as additional convolutional layers, fully connected layers, pooling layers, max pooling layers, local contrast normalization layers, etc. of any size.
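A numpy sketch of the layout described for Figure 12C: a 4 x 4 input, a 2 x 2 filter and a skip (stride) of 1 yield 3 x 3 = 9 filter windows, i.e. a channel of size 9; the filter values are arbitrary:

```python
import numpy as np

inputs = np.arange(16, dtype=float).reshape(4, 4)  # 4 rows x 4 columns
filt = np.array([[1.0, -1.0],
                 [0.5,  0.5]])                     # 2 x 2 filter, arbitrary values
stride = 1                                         # the "skip value" of 1

out_rows = (inputs.shape[0] - filt.shape[0]) // stride + 1  # 3
out_cols = (inputs.shape[1] - filt.shape[1]) // stride + 1  # 3
channel = np.zeros((out_rows, out_cols))
for r in range(out_rows):
    for c in range(out_cols):
        window = inputs[r * stride:r * stride + 2, c * stride:c * stride + 2]
        channel[r, c] = np.sum(window * filt)

print(channel.size)  # 9 -- the channel size described in the text
```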
- Training a neural network begins with initialization of the weights of the NN 1241.
- the initial weights should be distributed randomly.
- An NN with a tanh activation function should have random values distributed between -1/√n and 1/√n, where n is the number of inputs to the node.
- the NN is then provided with a feature vector or input dataset 1242.
- Each of the different features vectors may be generated by the NN from inputs that have known labels.
- the NN may be provided with feature vectors that correspond to inputs having known labeling or classification.
- the NN then predicts a label or classification for the feature or input 1243.
- the predicted label or class is compared to the known label or class (also known as ground truth) and a loss function measures the total error between the predictions and ground truth over all the training samples 1244.
- the loss function may be a cross entropy loss function, quadratic cost, triplet contrastive function, exponential cost, etc.
- For classification, a cross entropy loss function may be used, whereas for learning a pre-trained embedding a triplet contrastive function may be employed.
- the NN is then optimized and trained, using the result of the loss function and using known methods of training for neural networks such as backpropagation with adaptive gradient descent etc. 1245.
- the optimizer tries to choose the model parameters (i.e., weights) that minimize the training loss function (i.e. total error). Data is partitioned into training, validation, and test samples.
- The Optimizer minimizes the loss function on the training samples. After each training epoch, the model is evaluated on the validation sample by computing the validation loss and accuracy. If there is no significant change, training can be stopped and the resulting trained model may be used to predict the labels of the test data.
- The neural network may be trained from inputs having known labels or classifications.
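A minimal PyTorch sketch of the training procedure described above (loss computation, backpropagation with a gradient-based optimizer, and early stopping on a validation split); the data, dimensions and emotion-class labels are placeholders, not values from the disclosure:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
X = torch.randn(512, 32)            # placeholder feature vectors
y = torch.randint(0, 11, (512,))    # placeholder known labels (11 classes)
train_X, val_X = X[:400], X[400:]
train_y, val_y = y[:400], y[400:]

model = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 11))
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

best_val, patience = float("inf"), 0
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(train_X), train_y)  # total error on the training samples
    loss.backward()                          # backpropagation
    optimizer.step()

    with torch.no_grad():
        val_loss = loss_fn(model(val_X), val_y).item()
    if val_loss < best_val - 1e-4:           # significant improvement
        best_val, patience = val_loss, 0
    else:
        patience += 1
    if patience >= 5:                        # no significant change: stop early
        break
```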
Abstract
A method and system for dynamic music creation is disclosed. An emotion is assigned to one or more musical motifs and a game vector is associated with the emotion. The one or more musical motifs are mapped to the game vector based on the emotion. A musical composition is generated based on the game vector and desired emotions.
Description
DYNAMIC MUSIC CREATION IN GAMING
FIELD OF THE DISCLOSURE
The present disclosure relates to the fields of music composition, music orchestration, machine learning, game design and psychological mapping of emotions.
BACKGROUND OF THE INVENTION
Gaming has always been a dynamic pursuit, with game play responding to the actions of the players. As games become more cinematic and more immersive, music continues to grow in importance. Currently, music in games is mostly created from pre-written snippets (usually pre-recorded) that are pieced together like puzzle pieces. Occasionally they are slowed down, sped up or pitch transposed, and often overlain on top of each other. A composer can guess about likely paths through gameplay, but because gameplay is interactive, much of it is unpredictable; certainly, the timing of most sections is rarely predictable.
In parallel, Machine Learning and Artificial Intelligence have been making it possible to generate content based on training sets of existing content as labeled by human reviewers.
Additionally, there is a large corpus of review data and emotional mapping to various forms of artistic expression.
As one further element, we are learning more and more about the players that participate in the game. Players who have opted in can be tracked on social media, analysis about their personalities can be made based on their behavior, and as more and more users take advantage of biometric devices which track them (electrodermal activity, pulse and respiration, body temperature, blood pressure, brain wave activity, genetic predispositions, etc.), environmental customization can be applied to musical environments.
SUMMARY OF THE DISCLOSURE
The present disclosure describes a mechanism for analyzing music, separating out its musical components (rhythms, time signature, melodic structure, modality, harmonic structure, harmonic density, rhythmic density and timbral density) and mapping those components to emotional components individually and in combination, based on published reviews and social media expressing human opinions about concerts, records, etc. This is done at both a macro and micro level within the musical works. Based on this training set, faders (or virtual faders in software) are given emotional components like Tension, Power, Joy, Wonder, Tenderness, Transcendence, Peacefulness, Nostalgia, Sadness, Sensuality, Fear, etc. These Musical Components are mapped against motifs which have been created for individual elements/participants of the game including but not limited to Characters (Lead Person, Partner, Primary Enemy, Wizard, etc.), Activity Types (fighting, resting, planning, hiding, etc.), Areas (forest, city, desert, etc.), Personality of the person playing the game, etc. The motifs can be melodic, harmonic, rhythmic, etc. Once the composer has created the motifs and assigned the expected emotions to the faders, game simulations can be run where the composer selects motif combinations and emotional mappings and applies them to various simulations. These simulations can be described a priori or generated using a similar algorithm to map the game to similar emotional environments. It is possible that actual physical faders (as used in computerized audio mixing consoles) will make the process of mapping emotions to scenarios much more intuitive and visceral and that the physicality will create serendipitous results (e.g., raising the sensuality fader might have a better and more interesting effect than raising the tension fader even though it is a tense environment).
BRIEF DESCRIPTION OF THE DRAWING FIGURES
The above and still further objects, features and advantages of the present invention will become apparent upon consideration of the following detailed description of some specific embodiments thereof, especially when taken in conjunction with the accompanying drawings wherein like reference numerals in the various figures are utilized to designate like components, and wherein:
Figure 1 is a block diagram depicting an overview of the Dynamic Music Creation
Architecture according to aspects of the present disclosure.
Figure 2 is a block diagram showing the collection and an analysis of musical elements used to generate a corpus of musical elements according to aspects of the present disclosure.
Figure 3 is a block diagram depicting the collection and analysis of reviews and
commentaries used to generate a corpus of annotated performance data according to aspects of the present disclosure.
Figure 4 is a representative depiction of an embodiment of emotional parameters of music according to aspects of the present disclosure.
Figure 5 is a systematic view of training of a model using the musical and emotional review elements according to aspects of the present disclosure.
Figure 6 is a block diagram of showing creation of an annotated collection of game elements according to aspects of the present disclosure.
Figure 7 is a block diagram showing the creation and mapping of musical elements against game elements using virtual or real faders according to aspects of the present disclosure.
Figure 8 is a block diagram depicting the interaction of Faders and Switches with the Game Scenario Playback Engine according to aspects of the present disclosure.
Figure 9 is a block diagram showing how the cuing display is used in mapping the game and musical elements by previewing upcoming events according to aspects of the present disclosure.
Figure 10 is a block diagram of how foreshadowing is enabled using the cuing display to manage upcoming musical and gaming elements according to aspects of the present disclosure.
Figure 11 depicts a block schematic diagram of a system for dynamic music creation in gaming according to aspects of the present disclosure.
Figure 12A is a simplified node diagram of a recurrent neural network for use in dynamic music creation in gaming according to aspects of the present disclosure.
Figure 12B is a simplified node diagram of an unfolded recurrent neural network for use in dynamic music creation in gaming according to aspects of the present disclosure.
Figure 12C is a simplified diagram of a convolutional neural network for use in dynamic music creation in gaming according to aspects of the present disclosure.
Figure 12D is a block diagram of a method for training a neural network in dynamic music creation in gaming according to aspects of the present disclosure.
DETAILED DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
Component Overview
As can be seen in Figure 1, there are a number of components that comprise the complete system. Details will follow but it is useful to see the complete system from a high level. The components break down as follows:
First, the musical emotional data must be constructed. Musical Data can be collected from a corpus of written and recorded music 101. This can be scores, transcriptions created by people or transcriptions created by intelligent software systems. Next, the system has to break the musical data into its individual components - melodies, harmonies, rhythms, etc. 102 - and these must be stored as metadata associated with pieces or parts of pieces. Now, in order to determine the emotional components or markers associated with the individual musical elements, we will rely on the wisdom of the crowds. This is accurate, by definition, because it is people’s emotions in relation to the music that we are trying to capture. We will use reviews, blog posts, liner notes and other forms of commentary to generate emotional metadata for whole pieces and individual sections 103. The next step is to map musical metadata to emotional metadata 104. At this point, we will have a fairly large corpus of musical data that is associated with emotional data, but it will by no means be complete. The next step is to use a convolutional neural network (CNN) to compare the actual music metadata with the analysis of the emotional components 105. This CNN can then be used to suggest emotional associations for music that the model has not previously been trained on. This can then be checked for accuracy. Review and repeat - as iteration continues, accuracy will improve.
The next step is to analyze the gaming environment. The phases, components, emotions and characteristics of the game must first be collected and mapped. These scenes and components include things like locales, characters, environments, and verbs like fighting, casting spells, etc. 106. The next step is to map game components to the emotional metadata components 107.
Once the gaming environment is mapped out, it is time to write the musical motifs 108. Motifs can be written for all of the key characters, for scenes, for moods or for any recurring element of the game. Once the basic motifs are written, faders (virtual or real) can be used to map musical emotional markers to game markers 109. The personality or behavioral components of the user (the player of the game) can also be mapped as additional emotional characteristics to be considered 110. Next, continued testing, review and iteration are done to try the model on different scenarios in the game under development 111.
What follows is a more detailed description of the various processes.
Analysis of Musical Components
There are a few elements required to do an analysis of musical components. These are outlined in Figure 2. First, there is a large corpus of musical scores. Scores are available in printed form 201. These printed scores can be scanned and analyzed to be converted into electronic versions of the scores. Any form of electronic scoring can be used (MIDI, Sibelius, etc.). There are also scores already available in digital form 202, which can be added to the corpus. Finally, we can use machine learning to transcribe music into its musical elements (including the scores) 203. Machine Learning approaches can generate scores for music that may not be in notation form.
This will include multiple versions of the same song by the same artist. Analyses of live and recorded performances have the advantage that they can be mapped in greater detail, including things like variations in tempo and in dynamics - and for improvisational material many additional variables can be considered. We must keep track of the analyses of different performances 204 so that we can know the differences between performances - e.g. the NY Philharmonic performance of Beethoven’s 5th symphony conducted by Leonard Bernstein in 1964 would be different from Seiji Ozawa’s performance of the same piece with the Boston Symphony years later. These scores have to be analyzed for tempo 205, rhythmic components 206, melody 207, harmonic structure 208, dynamics 209, and harmonic and improvisational elements 210. All of this is now stored in a detailed corpus of musical performances 211.
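By way of illustration and not by way of limitation, the following Python sketch shows one possible way to store the per-performance analysis elements described above (tempo 205, rhythmic components 206, melody 207, harmonic structure 208, dynamics 209 and improvisational elements 210) as structured metadata; the class and field names are assumptions made for this sketch, not terms taken from the figures.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PerformanceAnalysis:
    work: str                        # e.g. "Beethoven, Symphony No. 5"
    performance: str                 # e.g. "NY Philharmonic / Bernstein, 1964"
    tempo_bpm: List[float]           # tempo curve sampled across the performance (205)
    rhythmic_components: List[str]   # e.g. ["4/4", "short-short-short-long motif"] (206)
    melody: List[int]                # melodic line as MIDI note numbers (207)
    harmonic_structure: List[str]    # e.g. ["Cm", "G7", "Cm"] (208)
    dynamics: List[str]              # e.g. ["ff", "p", "crescendo"] (209)
    improvisational_elements: Dict[str, str] = field(default_factory=dict)  # (210)

# The detailed corpus of musical performances (211) is then simply a
# collection of these records, one per analyzed performance.
corpus_of_performances: List[PerformanceAnalysis] = []
```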
Once there are deep mechanical analyses of a large corpus of music, these analyses are then mapped to reviews of those pieces using descriptions from reviews and analyses in literature about the pieces. This can be seen in Figure 3. The reviews and analyses could be in textual, aural or visual form and include but not be limited to Articles 301, Reviews 302, Blogs 303, Podcasts 304, Liner Notes 305 and Social Media Comments 306. From this large corpus of musical analysis 307 will come descriptions of various pieces and sections of pieces 308. Further, these analyses must have context for the period in which they were written and performed 309. For example, one may look at the work of Mozart and read the descriptions in the literature of his work (which may be from different periods, e.g. a Mozart piece performed in the 20th century). Next, a known structural musical analysis 310 is applied to determine which pieces and/or sections of pieces or melodic phrases are beautiful, heroic, melancholy, strident, plaintive, solemn, powerful, expressive, etc. This may involve looking at the music literature and applying general principles of compositional development to the analysis. For example, a melodic leap followed by a stepwise reversal of direction is considered beautiful. Melodic intervals can be associated along a scale of pleasing, unpleasing or neutral (e.g., a jump of a minor 9th is unpleasant, a fifth is heroic and a step within the key signature is neutral). All of this performance and compositional data may be used to Map Emotional Key Words to Performances and Segments 311.
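A minimal sketch of how such structural rules might be encoded follows: melodic intervals are rated on a pleasing/neutral/unpleasing scale and the "leap followed by a stepwise reversal" shape is detected directly from a melody. The numeric scores and thresholds below are invented placeholders, not values from the specification.

```python
# Rough pleasantness scores per interval size in semitones (invented values).
INTERVAL_SCORES = {
    2: 0.0,     # step within the key: neutral
    7: +0.8,    # perfect fifth: "heroic"
    13: -0.9,   # minor ninth: "unpleasant"
}

def interval_score(a, b):
    """Look up the rough pleasantness of the interval between two pitches."""
    return INTERVAL_SCORES.get(abs(b - a), 0.0)

def leap_then_step_back(melody):
    """Return True if a leap (> 4 semitones) is followed by a step in the
    opposite direction -- the shape the text calls 'beautiful'."""
    for a, b, c in zip(melody, melody[1:], melody[2:]):
        leap, answer = b - a, c - b
        if abs(leap) > 4 and 0 < abs(answer) <= 2 and leap * answer < 0:
            return True
    return False

print(interval_score(60, 67), leap_then_step_back([60, 67, 65]))  # 0.8 True
```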
“Emotion” words may come out of the process of digitally parsing the reviews and may include (and any manual analysis will almost certainly include) the words: Tension, Power, Joy, Wonder, Tenderness, Transcendence, Peacefulness, Nostalgia, Sadness, and Sensuality.
Note that for the keywords to work, it is not necessarily important that they be accurate, only that they have a consistent effect. When actually using the final system, it will not matter if the fader associated with the word sensuality actually creates music that is more sensual but only that it is predictable and emotionally understandable to the composer. This is for two reasons: 1) the labels can always be changed to something more intuitive and 2) humans are adaptable when working with music and sounds (in the history of music synthesizers, the knobs, buttons and faders - for example - going back to the Yamaha DX-7 did not affect the sound in any way that was intuitive based on the name but rather had an effect that became part of the musician’s sense memory and muscle memory and so was easy to use in spite of the meaningless labeling).
This detailed analysis of musical pieces and components is placed into a Corpus of
Annotated Performance Data 312.
Note that mapping the musical elements to create a corpus of musical performances need not be, and will not likely be, a one to one mapping but will rather be set on a scale of an emotional continuum, e.g., a set of musical vectors. For example, a piece might be 8 out of 10 on a scale of sensuality and also 4 out of 10 on a scale of sadness. Once musical elements are mapped to these emotional vectors the model can be tested on titles that are not in the training dataset. Humans can listen to the results and fine tune the model until it is more accurate. Ultimately, the model will get very accurate for a place in time. Different models can be run in different time frames to create 50s style horror music or 21st century style horror music - remembering that the actual descriptors are not as important as the classification groupings (that is, what is mellow to one composer might be boring to another).
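By way of example and not by way of limitation, an emotional vector of this kind might be represented as a set of 0-10 scores over the emotion keywords listed above; the helper function and values below are illustrative assumptions.

```python
# The descriptor set follows the emotion keywords named earlier in the text.
EMOTIONS = ["tension", "power", "joy", "wonder", "tenderness",
            "transcendence", "peacefulness", "nostalgia", "sadness", "sensuality"]

def emotion_vector(**scores):
    """Build a full vector from partial 0-10 scores, defaulting the rest to 0."""
    return {e: float(scores.get(e, 0.0)) for e in EMOTIONS}

# A piece that is 8/10 sensual and 4/10 sad at the same time.
piece = emotion_vector(sensuality=8, sadness=4)
print(piece["sensuality"], piece["sadness"])   # 8.0 4.0
```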
Note that the model can map not only the pieces and sections to emotional vectors but may also map the constituent melodies, rhythms and chord progressions to those same emotional vectors. At the same time, the machine learns melodic structures (e.g., that a melodic leap upward followed by a step downward is generally considered beautiful and that a leap of a minor ninth is generally considered discordant).
The structural musical analysis 310 may involve a number of components. For example, as depicted in Figure 4, beauty, dissonance and all measures of subjective description could be placed in time as a compositional reference. For example, what is considered dissonant in Bach’s era (for example, it was considered too dissonant to play the interval of a major 7th unless the 7th was carried over from the previous, more consonant chord) is considered beautiful in later eras (today a major seventh chord is considered normal and sometimes even sappy).
In one existing method of music mood classification 401, the moods of songs are divided according to psychologist Robert Thayer’s traditional model of mood. The model divides songs along the lines of energy and stress, from happy to sad and calm to energetic, respectively. The eight categories created by Thayer’s model include the extremes of the two lines as well as each of the possible intersections of the lines (e.g. happy-energetic or sad-calm).
This historic analysis may be of limited value and the approach described herein may be much more nuanced and flexible. Because of the size of the corpus and therefore the training dataset, a much richer and more nuanced analysis can be applied.
Some of the components to be analyzed 402 may include, but are not limited to, harmonic groupings, modes and scales, time signature(s), tempo(s), harmonic density, rhythmic density, melodic structure, phrase length (and structure), dynamics, phrasing and compositional techniques (transposition, inversion, retrograde, etc.), grooves (including rhythms like Funky, Lounge, Latin, Reggae, Swing, Tango, Merengue, Salsa, Fado, 60s disco, 70s disco, Heavy Metal, etc. - there are hundreds of established rhythmic styles)
Training the Model
Just as people would label photos of peaches to train a convolutional neural network to recognize a photo of a peach when it is shown a photo of a peach it has never seen before, the data about opinions of consumers and reviewers can be used to train a Machine Learning model and then use the model to classify new material. An example of this process is shown in Figure 5. In the illustrated implementation, the wisdom of crowds (e.g., reviews, posts, etc.) may be used to train our model to recognize the emotions associated with different musical passages.
In the implementation shown in Figure 5, the corpus of musical data is divided into three buckets. The first bucket, Corpus A 501, is a set of musical data 502 for which the Collected Emotional Descriptions 503 have been mapped based on analysis of listener review data. The second bucket, Corpus B 506, is a set of musical data 507 for which its own set of Collected Emotional Descriptions 508 have been mapped. The third bucket, Corpus C, is a set of Musical Data 511 for which there are no Collected Emotional Descriptions. The model is initially trained on Corpus A. Using a convolutional neural network 504, the Collected Emotional Descriptions 503 are mapped to the musical data 502. The result of this mapping is a Trained Engine 505, which should be capable of deriving Emotional Descriptions from Musical Data. The Musical Data 507 for Corpus B 506 is then fed into the Trained Engine 505. The resultant output is the Predicted Emotional Descriptions of Corpus B Music 509. These predictions are then compared against the Collected Descriptions 510 to determine how accurate the predictions are. This process of predicting, comparing and iterating (e.g. by moving the musical elements from Corpus B into Corpus A and repeating) continues until the Predicted Emotional Descriptions of Corpus B Music 509 closely match the actual results. This process can continue indefinitely with the system getting smarter and smarter over time. After the Trained Engine 505 has gone through enough iterations, it can be used as a Prediction Engine 512 on Corpus C 511 to predict the Emotional Descriptions of Corpus C Music 513.
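The following sketch illustrates the Corpus A / B / C loop of Figure 5 in simplified form. A trivial nearest-neighbour stand-in plays the role of the convolutional neural network 504 so that the example is self-contained and runnable; the feature values and labels are invented.

```python
corpus_a = {0.1: "peaceful", 0.9: "tense"}    # musical data 502 -> descriptions 503
corpus_b = {0.2: "peaceful", 0.8: "tense"}    # data 507 with held-out descriptions 508
corpus_c = [0.15, 0.85]                       # data 511 with no descriptions

def train(labelled):
    """Stand-in for producing the Trained Engine 505: memorise feature -> label pairs."""
    return dict(labelled)

def predict(engine, x):
    """Stand-in prediction 509: label of the nearest memorised feature."""
    nearest = min(engine, key=lambda known: abs(known - x))
    return engine[nearest]

engine = train(corpus_a)
while True:
    predictions = {x: predict(engine, x) for x in corpus_b}
    if all(predictions[x] == corpus_b[x] for x in corpus_b):   # compare at 510
        break
    corpus_a.update(corpus_b)          # move Corpus B elements into Corpus A
    engine = train(corpus_a)           # and repeat

print([predict(engine, x) for x in corpus_c])   # descriptions of Corpus C Music (513)
```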
Mapping the Game Elements
To match dynamic musical elements to different elements in a game, a corpus of elements that may require music or changes in music may be created. There are many different elements that can have an impact on what the music should be at that moment. These elements are, of course, used by game developers and can be made available to the composer and sound designer in order for the music to be matched. As can be seen in Figure 6, there are numerous game elements to be tracked and mapped. These include but are not limited to game characters 601, game environments or locations 602, game moods or tones 603 and perhaps most importantly, game events 604. Events in a game can be anything from the appearance of an enemy or friend to the change of a location to a battle to the casting of a spell to a race (and many, many more). All of these game elements 605 are collected along with their triggers (in or out) and modifiers. Additionally, these may be collected as game vectors 606. They are referred to as vectors because many of the components have multiple states. For example, a field might be different at night than in the daytime and different in the rain than in the sunshine. An enemy might have variable powers that increase or decrease based on time, place, or level. From this perspective, any element could have an array of vectors. All game elements and their vectors have moods associated with them. The mood may vary with the value of the vector but, nonetheless, moods can be associated with the game elements. Once we have mapped the game elements to their Emotions 607, we can store that relationship in a corpus or Collection of Annotated Game Elements 608.
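By way of illustration and not by way of limitation, a single annotated game element with its vector of states (606), emotion mapping (607) and triggers might be recorded as follows; all names and values are placeholders for this sketch.

```python
field_location = {
    "element": "open field",
    "type": "location",                        # character / location / mood / event
    "vector": {"time": ["day", "night"],       # each axis can take several states
               "weather": ["sun", "rain"]},
    "emotion_by_state": {                      # the mood varies with the vector value
        ("day", "sun"):    {"peacefulness": 8, "joy": 5},
        ("night", "rain"): {"tension": 7, "sadness": 4},
    },
    "triggers": {"in": "player_enters_field", "out": "player_exits_field"},
}

# Collection of Annotated Game Elements (608)
annotated_game_elements = [field_location]
```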
Gathering the Components Needed to Feed the Environment
Ultimately, the compositional tools could be used to create music on their own from the game play, but the focus of this embodiment is to create the music from compositional primitives. The
composer will create musical elements or motifs and associate them with characters or elements in the game.
The tools here need not be used alone and will often be used to augment traditional scoring techniques where prerecorded music is combined both sequentially and in layers, with multiple components being mixed together to create a whole. That is, today, different layers may start at different times, some will overlap with others and still others may run independently of other layers. The idea here is to develop additional tools that can be used 1) in addition to (e.g. on top of) existing techniques, 2) instead of existing techniques in some or all places or 3) as a mechanism to inform the use of existing techniques. Additionally, these mechanisms could be used to create entirely new forms of interactive media (for example, people trying to control their blood pressure or brain wave states could use musical feedback as a training tool or even use biometric markers as a compositional tool).
In traditional composition studies, motifs typically refer to small melodic segments.
However, in this context motifs can be melodic segments, harmonic structures, rhythmic structures and/or specific tonalities. Motifs can be created for as many of the individual elements/participants of the game as desired, including but not limited to characters (lead person, partner, primary enemy, wizard, etc.), activity types (fighting, resting, planning, hiding, etc.), areas (forest, city, desert, etc.), and the personality of the person playing the game (young, old, male, female, introvert, extrovert, etc.). The motifs can be melodic, harmonic, rhythmic, etc. Additionally, there can be multiple motifs for an individual element, for example a rhythmic pattern and a melodic pattern that might be used individually or together, or there might be both a sad motif and a happy motif for the same character to be used in different circumstances.
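A minimal sketch, assuming invented motif data, of how several motifs of different kinds and moods could be registered against a single game element and retrieved as described above.

```python
motifs = {
    "wizard": [
        {"kind": "melodic",  "mood": "happy", "notes": [62, 66, 69, 74]},
        {"kind": "melodic",  "mood": "sad",   "notes": [62, 65, 69, 72]},
        {"kind": "rhythmic", "mood": None,    "pattern": [1, 0, 1, 1, 0, 1, 0, 0]},
    ],
    "forest": [
        {"kind": "harmonic", "mood": None,    "chords": ["Em", "Cmaj7", "Am7"]},
    ],
}

def motifs_for(element, mood=None):
    """Return the motifs assigned to an element, optionally filtered by mood."""
    return [m for m in motifs.get(element, [])
            if mood is None or m["mood"] in (None, mood)]

print(motifs_for("wizard", mood="sad"))   # the sad melody plus the mood-neutral rhythm
```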
Once the composer has created the motifs, they can be assigned to the elements/characters. This could be done by the composer or sound designers and can be changed as the game is developed or even dynamically inside the game after it has been released.
Utilizing the Tools to Map the Musical Components to the Game Elements
The aforementioned aspects may be combined to map musical components to the game elements. It should be assumed throughout the document that reference to faders and buttons
could be real physical buttons or faders or could be virtual buttons or faders. Based on the experience of musicians and composers, it is expected that physical buttons and faders will be more intuitive to use and likely have better outcomes. It is possible, likely even, that actual physical faders (as used in computerized audio mixing consoles) will make the process of mapping emotions to scenarios much more intuitive and visceral and that the physicality will create serendipitous results (e.g. raising the sensuality fader might have a better and more interesting effect than raising the tension fader even though it is a tense environment). However, for the purposes of this application either may work.
An example of a possible logical order of events is shown in Figure 7, though any order may yield results and the process will undoubtedly be iterative, recursive and non-linear.
The logical order depicted in Figure 7 may begin with a Collection of Annotated Game Elements 701, which may be carried over from the Collection of Annotated Game Elements 608 in Figure 6. Game triggers are then assigned to switches at 702. These triggers could be the appearance of an enemy or an obstacle or a level up, etc. Next, Primary Emotional Vectors 703 are selected and visual markers are assigned at 704 so that element types may be displayed to the Composer. Game elements are then assigned to emotional elements in multi-dimensional arrays, as indicated at 705. The multi-dimensional arrays are then assigned to faders and switches, which are used to map musical emotional markers to game markers at 710. Once a set of emotional markers for game elements has been established, music can be applied to the game elements. The Composer writes the motifs 707. As noted above, the motifs can represent characters, events, areas or moods. Next, we select baseline emotions for motifs 708. These are the default emotions. For example, anytime a wizard appears, there might be a default motif with its default emotion. These can vary based on circumstance but if no variables are applied, they will operate in their default mode. Now we assign emotional vectors to faders 709. Now that the game vectors have been assigned and the musical vectors assigned, the faders and switches can be used to map musical emotional markers to game markers at 710. A fader does not have to correspond to a single emotion. As this is much like fine tuning the timbre of an instrument, the composer can be creative. Perhaps a fader that is to be 80% heroic and 20% sad will yield unexpected and delightful results.
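By way of example and not by way of limitation, the blending behavior of such a fader (e.g. the 80% heroic / 20% sad fader mentioned above) might be modeled as a weighted emotion mix scaled by the fader position; the arithmetic below is an illustrative assumption rather than a prescribed mapping.

```python
def blended_fader(weights):
    """weights: {emotion: share}. Normalise the shares into a unit emotion mix."""
    total = sum(weights.values())
    return {emotion: share / total for emotion, share in weights.items()}

def target_emotion(faders, positions):
    """Combine each fader's mix, scaled by its 0..1 position, into one target."""
    target = {}
    for name, position in positions.items():
        for emotion, share in faders[name].items():
            target[emotion] = target.get(emotion, 0.0) + position * share
    return target

faders = {"hero_sad": blended_fader({"heroic": 0.8, "sad": 0.2}),
          "tension":  blended_fader({"tension": 1.0})}
print(target_emotion(faders, {"hero_sad": 0.75, "tension": 0.25}))
# {'heroic': 0.6, 'sad': 0.15, 'tension': 0.25}
```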
Next, the various themes may be mapped to a set of buttons (perhaps in a colored matrix so it is easy to see many at once). Some of these may be the same as the game switches but others will be musical switches. These could be grouped by type of character (hero, villain, wizard, etc.), musical composition component (say grooves on one side and melodies on another, modes across the top) and scenes (city, country, etc.).
Note, many of the game triggers that have been mapped to the buttons or switches will eventually be “pushed” by the game play itself, but in the early stages it will be useful to be able to simulate things like the arrival of an enemy or a sunrise or impending doom.
Now, game simulations can be run in real time against early versions of the game - even if only storyboards exist, or even with no storyboards at all - simply to write music that applies to different scenarios. The composer can select motif combinations and emotional mappings, apply them to various simulations and test drive them. As the game develops, these scenarios can be fine-tuned.
In fact, this can be used as a generalized composition tool, where a composer or performer can use the machine to create music based on primitives.
Using the Faders and Switches to Program the Music
Figure 8 illustrates an example of how game simulations and musical programming can actually be utilized. As was seen in Figure 7 and can be seen in Figure 8, motifs 801, game triggers 802 and game components 803 are mapped to faders and switches 804. These motifs 801, game triggers 802 and game components 803 mapped to faders and switches 804 are plugged into a game scenario playback engine 805. The composer now plays the music while testing various gaming scenarios 806 and it is recorded by a recorded music scene / event mapping module 807.
In order to see what the elements are, the composer can look at a cuing display screen 808 showing what elements are being used or are coming up. If a fader is touched, the element to which it is mapped may be highlighted on the screen to provide visual feedback. The cuing display can show the elements from the game scenario playback engine 805 as they are playing out from the live faders and switches. This is represented by the play music while gaming module 806 and by the recorded music scene / event mapping module 807. Just like updating the faders while
mixing with automated faders, all the faders and switches can be updated and modified in real time at 809 and remembered for later playback. Additionally, motifs, components and triggers can be updated at 810 and remembered by the system. Of course, there are unlimited undos and different performances can be saved as different versions, which can be used at different times and in combination with each other.
Figure 9 displays additional detail on how a Cuing Monitor 907 may be used according to aspects of the present disclosure. Upcoming events 901, characters 902 and locations 903 can all be seen on the Cuing Monitor (there may be multiple Cuing Monitors). There may be a default weighting 904 based on the expected parameters of the game play. These and the Fader (and Switch) Mapping Matrix 905 are all displayed on one or more Cuing Monitors 907. One other input to the fader mapping matrix 905 and visible on a Cuing Monitor is the Player History / Prediction Matrix 906. Based on the player’s style of play or other factors (time of day, number of minutes or hours in the current session, state of the game, etc.), this matrix can vary the music either automatically or based on parameters set by the composer or sound designer.
Music that Foreshadows a Change
In film, the music often changes before the visuals do. This process of presaging or foreshadowing is important for the emotional connection to the piece and to prepare the viewer / listener for a change in mood or to create other emotional preparation (even if it is a false prediction and the viewer is surprised). Now, as we are writing music on the fly, we will want to be able to foreshadow changes. This could be associated with being close to the completion of a level or signaling the entrance of a new character or environment (or setting up the viewer for one kind of change but actually surprising them with another kind of change).
How will our Compositional Engine foreshadow effectively? We can use triggers that are known in advance and use faders or dials to control the timing of the foreshadowing and also the ramp of the foreshadowing. For example, if there is a timer running out on a level, the
foreshadowing might be set to begin 30 seconds before the time ends and to rise in intensity using an exponential curve such as y = 2^x. Many foreshadowings are possible, from fear overtones to happy projections. Again, because this is designed to be a visceral tool, this functionality may well end up creating unanticipated results, some of which will be useful to program into the game. Looking at Figure 10, a set of Possible Next Events 1001 can be seen on the Cuing Matrix 1003. There is also a Next Events Timing Matrix 1002, which can be used to set the timing values and ramping characteristics. Another factor that can play into the Foreshadowing is the Probability Emotion Weighting Engine 1004. This Engine weighs the expected emotional state of the upcoming event and is used to affect the foreshadowing audio. It can be seen on the Cuing Monitor and the weighting of its effect can be varied. The likelihood of an upcoming event having any particular emotion and the predicted intensity of that emotion are mapped by the Event to Emotion Likelihood Mapping Engine 1005. Foreshadowing is also affected by Player History and the Player History / Prediction Matrix 1007. This Matrix can be seen on the Cuing Monitor and also feeds into the Foreshadowing Matrix Engine 1006. Though all of the elements in Figure 10 can be seen in the Cuing Monitor, they would likely be relegated to secondary Cuing Monitors as the Foreshadowing Matrix Engine will be using machine intelligence to surface the most likely scenarios. The visible components can be set in preferences or programmed but as material will be coming quickly (though, of course, the composer can have many takes), it would be best for the Learning Engine to automate as many of these components as possible.
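A minimal sketch of the timed foreshadowing ramp described above, assuming a 30-second lead window and the exponential y = 2^x style curve from the example; the normalization of the output to a 0..1 intensity is an added assumption.

```python
def foreshadow_intensity(seconds_to_event, lead_time=30.0):
    """0 before the lead window; exponential rise to 1.0 as the event arrives."""
    if seconds_to_event > lead_time or seconds_to_event < 0:
        return 0.0
    x = 1.0 - seconds_to_event / lead_time   # 0 at window start, 1 at the event
    return 2.0 ** x - 1.0                    # y = 2^x style ramp, normalised to 0..1

for t in (45, 30, 15, 5, 0):
    print(t, round(foreshadow_intensity(t), 2))  # 45->0.0, 30->0.0, 15->0.41, 5->0.78, 0->1.0
```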
Uses Outside of Games
The uses of this compositional tool are not limited to game use only. Interactive VR environments, even in non-game uses, can take advantage of these techniques. Additionally, this could be used as a compositional tool for scoring a traditional TV show or film. One final use might be the pure creation of music, e.g. to create albums or beds for pop songs.
System
Figure 11 depicts a system for dynamic music creation in gaming according to aspects of the present disclosure. The system may include a computing device 1100 coupled to a user input device 1102. The user input device 1102 may be a controller, touch screen, microphone, keyboard, mouse, joystick, fader board or other device that allows the user to input information, including sound data, into the system. The system may also be coupled to a biometric device 1123 configured to measure electro-dermal activity, pulse and respiration, body temperature, blood pressure, or brain wave activity. The biometric device may be, for example and without limitation, a thermal or IR camera pointed at a user and configured to determine the respiration and heart rate of the user from thermal signatures; for more information see commonly owned Patent No. 8,638,364 to Chen et al., “USER INTERFACE SYSTEM AND METHOD USING THERMAL IMAGING,” the contents of which are incorporated herein by reference.
Alternatively, the biometric device may be such other devices as, for example and without limitation, a pulse oximeter, blood pressure cuff, electroencephalograph machine,
electrocardiograph machine, wearable activity tracker or a smartwatch with bio-sensing.
The computing device 1100 may include one or more processor units 1103, which may be configured according to well-known architectures, such as, e.g., single-core, dual-core, quad-core, multi-core, processor-coprocessor, cell processor, and the like. The computing device may also include one or more memory units 1104 (e.g., random access memory (RAM), dynamic random access memory (DRAM), read-only memory (ROM), and the like).
The processor unit 1103 may execute one or more programs, portions of which may be stored in the memory 1104, and the processor 1103 may be operatively coupled to the memory, e.g., by accessing the memory via a data bus 1105. The programs may be configured to generate or use sound motifs 1108 to create music based on game vectors 1109 and emotion vectors 1110 of a videogame. The sound motifs may be short musical motifs composed by musicians, users or machines. Additionally the memory 1104 may contain programs that implement training of sound categorization and classification NNs 1121. The memory 1104 may also contain one or more databases 1122 of annotated performance data and emotional descriptions. Neural network modules 1121, e.g., convolutional neural networks for associating musical motifs with emotions, may also be stored in the memory 1104. The memory 1104 may store a report 1110 listing items not identified by the neural network modules 1121 as being in the databases 1122. The sound motifs, game vectors, emotional vectors, neural network modules and annotated performance data, 1108, 1109, 1121, 1122 may also be stored as data 1118 in the Mass Store 1115 or at a server coupled to the Network 1120 accessed through the network interface 1114. Additionally, data for a videogame may be stored in the memory 1104 as data in the database or elsewhere, or as a program 1117 or data 1118 in the mass store 1115.
The overall structure and probabilities of the NNs may also be stored as data 1118 in the Mass Store 1115. The processor unit 1103 is further configured to execute one or more programs 1117 stored in the mass store 1115 or in memory 1104 which cause the processor to carry out a method of dynamic music creation using musical motifs 1108, game vectors 1109 and emotional vectors 1110 as described herein. Music generated from the musical motifs may be stored in the database 1122. Additionally the processor may carry out the method for NN 1121 training and classification of musical motifs to emotions as described herein. The system 1100 may generate the Neural Networks 1121 as part of a NN training process and store them in memory 1104. Completed NNs may be stored in memory 1104 or as data 1118 in the mass store 1115.
Additionally the NN 1121 may be trained using actual responses from users with the biometric device 1123 being used to provide biological feedback from the user.
The computing device 1100 may also include well-known support circuits, such as input/output (I/O) circuits 1107, power supplies (P/S) 1111, a clock (CLK) 1112, and cache 1113, which may communicate with other components of the system, e.g., via the bus 1105. The computing device may include a network interface 1114. The processor unit 1103 and network interface 1114 may be configured to implement a local area network (LAN) or personal area network (PAN), via a suitable network protocol, e.g., Bluetooth, for a PAN. The computing device may optionally include a mass storage device 1115 such as a disk drive, CD-ROM drive, tape drive, flash memory, or the like, and the mass storage device may store programs and/or data. The computing device may also include a user interface 1116 to facilitate interaction between the system and a user. The user interface may include a monitor, television screen, speakers, headphones or other devices that communicate information to the user.
The computing device 1100 may include a network interface 1114 to facilitate
communication via an electronic communications network 1120. The network interface 1114 may be configured to implement wired or wireless communication over local area networks and wide area networks such as the Internet. The device 1100 may send and receive data and/or
requests for files via one or more message packets over the network 1120. Message packets sent over the network 1120 may temporarily be stored in a buffer 1109 in memory 1104. The annotated performance data, sound motifs, and annotated game elements may be available through the network 1120 and stored partially in memory 1104 for use.
Neural Network Training
Generally, neural networks used in dynamic music generation may include one or more of several different types of neural networks and may have many different layers. By way of example and not by way of limitation the classification neural network may consist of one or multiple convolutional neural networks (CNN), recurrent neural networks (RNN) and/or dynamic neural networks (DNN).
Figure 12A depicts the basic form of an RNN having a layer of nodes 1220, each of which is characterized by an activation function S, one input weight U, a recurrent hidden node transition weight W, and an output transition weight V. The activation function S may be any non-linear function known in the art and is not limited to the hyperbolic tangent (tanh) function. For example, the activation function S may be a Sigmoid or ReLu function. Unlike other types of neural networks, RNNs have one set of activation functions and weights for the entire layer. As shown in Figure 12B, the RNN may be considered as a series of nodes 1220 having the same activation function moving through time T and T+1. Thus, the RNN maintains historical information by feeding the result from a previous time T to a current time T+1.
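By way of illustration, the recurrence of Figures 12A-12B can be reduced to a single scalar node with input weight U, recurrent weight W, output weight V and a tanh activation S; the numeric weight and input values below are arbitrary choices for this sketch.

```python
import math

U, W, V = 0.5, 0.9, 1.2            # input, recurrent and output transition weights

def rnn_run(inputs, h=0.0):
    """Step a single-node RNN through a sequence, carrying the hidden state."""
    outputs = []
    for x in inputs:
        h = math.tanh(U * x + W * h)   # S(U * x_t + W * h_{t-1}) -> new hidden state
        outputs.append(V * h)          # output transition at this time step
    return outputs

print(rnn_run([1.0, 0.0, -1.0]))   # later outputs still reflect earlier inputs
```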
In some embodiments, a convolutional RNN may be used. Another type of RNN that may be used is a Long Short-Term Memory (LSTM) Neural Network, which adds a memory block in an RNN node with an input gate activation function, output gate activation function and forget gate activation function, resulting in a gating memory that allows the network to retain some information for a longer period of time, as described by Hochreiter & Schmidhuber, “Long Short-Term Memory,” Neural Computation 9(8): 1735-1780 (1997), which is incorporated herein by reference.
Figure 12C depicts an example layout of a convolution neural network such as a CRNN according to aspects of the present disclosure. In this depiction, the convolution neural network is generated for training data in the form of an array 1232, e.g., with 4 rows and 4 columns giving a total of 16 elements. The depicted convolutional neural network has a filter 1233 size of 2 rows by 2 columns with a skip value of 1 and a channel 1236 of size 9. For clarity, in Figure 12C only the connections 1234 between the first column of channels and their filter windows are depicted. Aspects of the present disclosure, however, are not limited to such implementations. According to aspects of the present disclosure, the convolutional neural network that implements the classification 1229 may have any number of additional neural network node layers 1231 and may include such layer types as additional convolutional layers, fully connected layers, pooling layers, max pooling layers, local contrast normalization layers, etc., of any size.
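A plain-Python sketch of the convolution layout just described: a 4 x 4 input array, a 2 x 2 filter and a skip value of 1 yield 3 x 3 = 9 output positions, matching the channel of size 9. The input and filter values are arbitrary.

```python
def convolve(inputs, kernel, skip=1):
    """Slide a square kernel over a square input with the given skip (stride)."""
    k, n = len(kernel), len(inputs)
    out = []
    for r in range(0, n - k + 1, skip):
        row = []
        for c in range(0, n - k + 1, skip):
            row.append(sum(inputs[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out

inputs = [[1, 0, 2, 1],
          [0, 1, 1, 0],
          [2, 1, 0, 1],
          [1, 0, 1, 2]]
kernel = [[1, 0],
          [0, 1]]                        # 2 x 2 filter
channel = convolve(inputs, kernel)       # 3 x 3 output
print(len(channel) * len(channel[0]))    # 9 elements, the channel of size 9
```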
As seen in Figure 12D, training a neural network (NN) begins with initialization of the weights of the NN 1241. In general, the initial weights should be distributed randomly. For example, an NN with a tanh activation function should have random values distributed between -1/√n and 1/√n, where n is the number of inputs to the node.
After initialization, the activation function and optimizer are defined. The NN is then provided with a feature vector or input dataset 1242. Each of the different feature vectors may be generated by the NN from inputs that have known labels. Similarly, the NN may be provided with feature vectors that correspond to inputs having known labeling or classification. The NN then predicts a label or classification for the feature or input 1243. The predicted label or class is compared to the known label or class (also known as ground truth) and a loss function measures the total error between the predictions and ground truth over all the training samples 1244. By way of example and not by way of limitation the loss function may be a cross entropy loss function, quadratic cost, triplet contrastive function, exponential cost, etc. Multiple different loss functions may be used depending on the purpose. By way of example and not by way of limitation, for training classifiers a cross entropy loss function may be used whereas for learning pre-trained embedding a triplet contrastive function may be employed. The NN is then optimized and trained, using the result of the loss function and using known methods of training for neural
networks such as backpropagation with adaptive gradient descent etc. 1245. In each training epoch, the optimizer tries to choose the model parameters (i.e., weights) that minimize the training loss function (i.e. total error). Data is partitioned into training, validation, and test samples.
During training, the Optimizer minimizes the loss function on the training samples. After each training epoch, the model is evaluated on the validation sample by computing the validation loss and accuracy. If there is no significant change, training can be stopped and the resulting trained model may be used to predict the labels of the test data.
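The epoch / validation / early-stopping flow described above is sketched below with a deliberately tiny one-parameter model and squared-error loss; the data, learning rate and stopping threshold are invented for illustration.

```python
train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (input, ground-truth label)
val_data   = [(4.0, 8.0)]                           # validation sample

def loss(w, data):
    """Mean squared error of the one-parameter model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

w, lr, prev_val = 0.0, 0.01, float("inf")
for epoch in range(1000):
    # Optimizer step: gradient descent on the training loss.
    grad = sum(2 * (w * x - y) * x for x, y in train_data) / len(train_data)
    w -= lr * grad
    # Evaluate on the validation sample after each epoch.
    val = loss(w, val_data)
    if abs(prev_val - val) < 1e-9:      # no significant change -> stop training
        break
    prev_val = val

print(round(w, 3), round(val, 6))       # w approaches 2.0 as training converges
```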
Thus, the neural network may be trained from inputs having known labels or
classifications to identify and classify those inputs.
While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article “A” or “An” refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase “means for.”
Claims
1. A method for dynamic music creation comprising:
a) assigning an emotion to one or more musical motifs;
b) associating a game vector with the emotion;
c) mapping the one or more musical motifs to the game vector based on the emotion;
d) generating a musical composition based on the game vector and desired emotions.
2. The method of claim 1 wherein the emotion is assigned to the one or more musical motifs by a neural network trained to assign emotions to music based on content.
3. The method of claim 1 wherein a level is assigned to the emotion and the mapping of the one or more musical motifs to the game vector changes with the level assigned to the emotion.
4. The method of claim 3 wherein the level assigned to the emotion is changed using an emotion level slider.
5. The method of claim 1 further comprising associating a user behavior to the emotion and mapping the one or more musical motifs to the user behavior based on the emotion.
6. The method of claim 1 further comprising assignment of the one or more musical motifs to the emotion based on a physical response of a user to each of the one or more musical motifs.
7. The method of claim 1 further comprising assigning one of the one or more motifs to a game vector as a default motif and wherein the default motif is included in the musical composition whenever the game vector is present within a videogame.
8. The method of claim 1, wherein the game vector includes a game element and a trigger for the game element within a videogame.
9. The method of claim 8 further comprising assigning one of the one or more motifs to a game element as a default motif for the game element and including the default motif in the musical composition whenever the game element is present within the videogame.
10. The method of claim 8 further comprising assigning one of the one or more motifs to the trigger as a default motif for the trigger and including the default motif in the musical composition whenever the trigger is activated within the videogame.
11. The method of claim 1 wherein generating a musical composition comprises selecting one of the one or more musical motifs to play while a videogame is being played wherein the musical motifs are played contemporaneously with the videogame.
12. The method of claim 1 further comprising generating an event cue for a videogame wherein the event cue includes upcoming events within a videogame and wherein the event cue is used during generation of the musical composition.
13. The method of claim 12 wherein the event cue includes an event probability, event timing, and an event emotional weight, wherein the event probability defines the likelihood an event will occur within the event timing and wherein the event emotional weight provides the emotion of the event.
13. The method of claim 12 wherein the event emotional weight includes a level for the emotion.
14. A system for dynamic music creation comprising:
a processor;
a memory coupled to the processor;
non-transitory instructions embedded in the memory that when executed cause the processor to carry out the method comprising:
a) assigning an emotion to one or more musical motifs;
b) associating a game vector with the emotion;
c) mapping one or more musical motifs to the game vector based on the emotion;
d) generating a musical composition based on the game vector and desired emotions.
15. The system of claim 14 wherein the emotion is assigned to the one or more musical motifs by a neural network trained to assign emotions to music based on content.
16. The system of claim 14 wherein a level is assigned to the emotion and the mapping of the one or more musical motifs to the game vector changes with the level assigned to the
emotion.
17. The system of claim 16 further comprising an emotion slider coupled to the processor wherein the emotion slider is configured to control the level assigned to the emotion.
18. The system of claim 14 further comprising assignment of the one or more musical motifs to the emotion based on a physical response of a user to each of the one or more musical motifs.
19. The system of claim 18 further comprising a biometric device coupled to the processor and configured to track the physical response of the user.
20. Non-transitory instructions embedded in a computer readable medium that when executed cause a computer to perform the method comprising:
a) assigning an emotion to one or more musical motifs;
b) associating a game vector with the emotion;
c) mapping one or more musical motifs to the game vector based on the emotion;
d) generating a musical composition based on the game vector and desired emotions.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19884029.0A EP3880324A4 (en) | 2018-11-15 | 2019-11-07 | DYNAMIC MUSIC CREATION IN GAMING |
JP2021526653A JP7223848B2 (en) | 2018-11-15 | 2019-11-07 | Dynamic music generation in gaming |
CN201980075286.0A CN113038998B (en) | 2018-11-15 | 2019-11-07 | Dynamic music creation in games |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862768045P | 2018-11-15 | 2018-11-15 | |
US62/768,045 | 2018-11-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020102005A1 true WO2020102005A1 (en) | 2020-05-22 |
Family
ID=70732126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2019/060306 WO2020102005A1 (en) | 2018-11-15 | 2019-11-07 | Dynamic music creation in gaming |
Country Status (5)
Country | Link |
---|---|
US (2) | US11969656B2 (en) |
EP (1) | EP3880324A4 (en) |
JP (1) | JP7223848B2 (en) |
CN (1) | CN113038998B (en) |
WO (1) | WO2020102005A1 (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022026864A1 (en) * | 2020-07-31 | 2022-02-03 | Maestro Games, SPC | Systems and methods to improve a user's mental state |
US11514877B2 (en) | 2021-03-31 | 2022-11-29 | DAACI Limited | System and methods for automatically generating a musical composition having audibly correct form |
US11978426B2 (en) | 2021-03-31 | 2024-05-07 | DAACI Limited | System and methods for automatically generating a musical composition having audibly correct form |
CN113143289B (en) * | 2021-03-31 | 2024-07-26 | 华南理工大学 | Intelligent brain wave music earphone capable of realizing interconnection and interaction |
WO2022207765A2 (en) * | 2021-03-31 | 2022-10-06 | DAACI Limited | System and methods for automatically generating a musical composition having audibly correct form |
EP4348644A4 (en) * | 2021-05-27 | 2025-02-12 | Xdmind Inc | Selecting supplemental audio segments based on video analysis |
CN113239694B (en) * | 2021-06-04 | 2022-06-14 | 北京理工大学 | Argument role identification method based on argument phrase |
CN113908548A (en) * | 2021-11-15 | 2022-01-11 | 网易(杭州)网络有限公司 | Method and device for controlling music in game, storage medium and electronic equipment |
WO2024020497A1 (en) * | 2022-07-22 | 2024-01-25 | Sony Interactive Entertainment LLC | Interface customized generation of gaming music |
US12128308B2 (en) | 2022-07-22 | 2024-10-29 | Sony Interactive Entertainment LLC | Game environment customized generation of gaming music |
US20240024775A1 (en) * | 2022-07-22 | 2024-01-25 | Sony Interactive Entertainment LLC | User preference customized generation of gaming music |
US12179114B2 (en) * | 2022-07-22 | 2024-12-31 | Sony Interactive Entertainment LLC | Customized audio spectrum generation of gaming music |
Family Cites Families (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5451709A (en) | 1991-12-30 | 1995-09-19 | Casio Computer Co., Ltd. | Automatic composer for composing a melody in real time |
JP3704980B2 (en) | 1997-12-17 | 2005-10-12 | ヤマハ株式会社 | Automatic composer and recording medium |
US7076035B2 (en) | 2002-01-04 | 2006-07-11 | Medialab Solutions Llc | Methods for providing on-hold music using auto-composition |
JP4081789B2 (en) | 2002-03-07 | 2008-04-30 | ベスタクス株式会社 | Electronic musical instruments |
WO2004025306A1 (en) | 2002-09-12 | 2004-03-25 | Musicraft Ltd | Computer-generated expression in music production |
EP1703491B1 (en) | 2005-03-18 | 2012-02-22 | Sony Deutschland GmbH | Method for classifying audio data |
US20070044639A1 (en) | 2005-07-11 | 2007-03-01 | Farbood Morwaread M | System and Method for Music Creation and Distribution Over Communications Network |
US7663045B2 (en) * | 2005-09-20 | 2010-02-16 | Microsoft Corporation | Music replacement in a gaming system |
GB0606119D0 (en) | 2006-03-28 | 2006-05-03 | Telex Communications Uk Ltd | Sound mixing console |
US8168877B1 (en) | 2006-10-02 | 2012-05-01 | Harman International Industries Canada Limited | Musical harmony generation from polyphonic audio signals |
US7696426B2 (en) | 2006-12-19 | 2010-04-13 | Recombinant Inc. | Recombinant music composition algorithm and method of using the same |
US8445767B2 (en) * | 2009-04-11 | 2013-05-21 | Thomas E. Brow | Method and system for interactive musical game |
US8629342B2 (en) * | 2009-07-02 | 2014-01-14 | The Way Of H, Inc. | Music instruction system |
CA2725722A1 (en) * | 2010-12-17 | 2012-06-17 | Karen E. Collins | Method and system for incorporating music into a video game using game parameters |
WO2012165978A1 (en) * | 2011-05-30 | 2012-12-06 | Auckland Uniservices Limited | Interactive gaming system |
DE202013104376U1 (en) | 2012-09-12 | 2013-11-12 | Ableton Ag | Dynamic diatonic musical instrument |
TWI486949B (en) * | 2012-12-20 | 2015-06-01 | Univ Southern Taiwan Sci & Tec | Music emotion classification method |
TWI482149B (en) * | 2012-12-20 | 2015-04-21 | Univ Southern Taiwan Sci & Tec | The Method of Emotional Classification of Game Music |
US9183849B2 (en) | 2012-12-21 | 2015-11-10 | The Nielsen Company (Us), Llc | Audio matching with semantic audio recognition and report generation |
US9583084B1 (en) | 2014-06-26 | 2017-02-28 | Matthew Eric Fagan | System for adaptive demarcation of selectively acquired tonal scale on note actuators of musical instrument |
US10276058B2 (en) | 2015-07-17 | 2019-04-30 | Giovanni Technologies, Inc. | Musical notation, system, and methods |
US10854180B2 (en) | 2015-09-29 | 2020-12-01 | Amper Music, Inc. | Method of and system for controlling the qualities of musical energy embodied in and expressed by digital music to be automatically composed and generated by an automated music composition and generation engine |
US20170228745A1 (en) * | 2016-02-09 | 2017-08-10 | UEGroup Incorporated | Tools and methods for capturing and measuring human perception and feelings |
US9799312B1 (en) | 2016-06-10 | 2017-10-24 | International Business Machines Corporation | Composing music using foresight and planning |
CN106362260B (en) * | 2016-11-09 | 2019-09-27 | 武汉智普天创科技有限公司 | VR mood regulation device |
US10675544B2 (en) * | 2017-03-31 | 2020-06-09 | Sony Interactive Entertainment LLC | Personalized user interface based on in-application behavior |
JP7041270B2 (en) | 2017-12-18 | 2022-03-23 | バイトダンス・インコーポレイテッド | Modular automatic music production server |
IL259059A (en) | 2018-04-30 | 2018-06-28 | Arcana Instr Ltd | A musical instrument with a joystick with variable tension and variable travel distance and a method of use thereof |
US11508393B2 (en) * | 2018-06-12 | 2022-11-22 | Oscilloscape, LLC | Controller for real-time visual display of music |
WO2020006452A1 (en) | 2018-06-29 | 2020-01-02 | Godunov Vladimir | Music composition aid |
US20200138356A1 (en) * | 2018-11-01 | 2020-05-07 | Moodify Ltd. | Emotional state monitoring and modification system |
US11328700B2 (en) | 2018-11-15 | 2022-05-10 | Sony Interactive Entertainment LLC | Dynamic music modification |
US10964299B1 (en) | 2019-10-15 | 2021-03-30 | Shutterstock, Inc. | Method of and system for automatically generating digital performances of music compositions using notes selected from virtual musical instruments based on the music-theoretic states of the music compositions |
US11475867B2 (en) | 2019-12-27 | 2022-10-18 | Spotify Ab | Method, system, and computer-readable medium for creating song mashups |
-
2019
- 2019-11-07 WO PCT/US2019/060306 patent/WO2020102005A1/en unknown
- 2019-11-07 EP EP19884029.0A patent/EP3880324A4/en active Pending
- 2019-11-07 CN CN201980075286.0A patent/CN113038998B/en active Active
- 2019-11-07 JP JP2021526653A patent/JP7223848B2/en active Active
- 2019-11-07 US US16/677,303 patent/US11969656B2/en active Active
-
2024
- 2024-04-29 US US18/649,362 patent/US20240278135A1/en active Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010007542A1 (en) | 2000-01-06 | 2001-07-12 | Konami Corporation | Game system and computer readable storage medium therefor |
WO2007043679A1 (en) * | 2005-10-14 | 2007-04-19 | Sharp Kabushiki Kaisha | Information processing device, and program |
US20100307320A1 (en) * | 2007-09-21 | 2010-12-09 | The University Of Western Ontario | flexible music composition engine |
KR20180005277A (en) * | 2009-07-16 | 2018-01-15 | 블루핀 랩스, 인코포레이티드 | Estimating and displaying social interest in time-based media |
US20120047447A1 (en) * | 2010-08-23 | 2012-02-23 | Saad Ul Haq | Emotion based messaging system and statistical research tool |
US20140058735A1 (en) | 2012-08-21 | 2014-02-27 | David A. Sharp | Artificial Neural Network Based System for Classification of the Emotional Content of Digital Music |
US20170092247A1 (en) | 2015-09-29 | 2017-03-30 | Amper Music, Inc. | Machines, systems, processes for automated music composition and generation employing linguistic and/or graphical icon based musical experience descriptors |
US20170365277A1 (en) * | 2016-06-16 | 2017-12-21 | The George Washington University | Emotional interaction apparatus |
US20180226063A1 (en) | 2017-02-06 | 2018-08-09 | Kodak Alaris Inc. | Method for creating audio tracks for accompanying visual imagery |
Non-Patent Citations (2)
Title |
---|
KIM ET AL.: "MUSIC EMOTION RECOGNITION: A STATE OF THE ART REVIEW", 11TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL CONFERENCE, 2010, pages 255 - 256, XP055707165 * |
See also references of EP3880324A4 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220262329A1 (en) * | 2018-11-15 | 2022-08-18 | Sony Interactive Entertainment LLC | Dynamic music modification |
Also Published As
Publication number | Publication date |
---|---|
CN113038998A (en) | 2021-06-25 |
US20200188790A1 (en) | 2020-06-18 |
US11969656B2 (en) | 2024-04-30 |
EP3880324A4 (en) | 2022-08-03 |
CN113038998B (en) | 2024-12-10 |
US20240278135A1 (en) | 2024-08-22 |
JP2022507579A (en) | 2022-01-18 |
JP7223848B2 (en) | 2023-02-16 |
EP3880324A1 (en) | 2021-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240278135A1 (en) | Dynamic music creation | |
US12236160B2 (en) | Audio techniques for music content generation | |
Herremans et al. | A functional taxonomy of music generation systems | |
Kaliakatsos-Papakostas et al. | Artificial intelligence methods for music generation: a review and future perspectives | |
Kirke et al. | A survey of computer systems for expressive music performance | |
Kirke et al. | An overview of computer systems for expressive music performance | |
Wallis et al. | Computer-generating emotional music: The design of an affective music algorithm | |
Canazza et al. | Expressiveness in music performance: analysis, models, mapping, encoding | |
Sun et al. | Research on pattern recognition of different music types in the context of AI with the help of multimedia information processing | |
Assayag et al. | Cocreative Interaction: Somax2 and the REACH Project | |
Tsatsishvili | Automatic subgenre classification of heavy metal music | |
Collins | " There is no reason why it should ever stop": Large-scale algorithmic composition | |
Nika et al. | Composing structured music generation processes with creative agents | |
Aljanaki | Emotion in Music: representation and computational modeling | |
Leman | Foundations of musicology as content processing science | |
Assayag et al. | Interaction with machine improvisation | |
Thompson IV | Creating Musical Scores Inspired by the Intersection of Human Speech and Music Through Model-Based Cross Synthesis | |
Durão | L-Music: uma abordagem para composição musical assistida usando L-Systems | |
da Silva Durão | L-music: Uma Abordagem Para Composição Musical Assistida Usando L-systems | |
Giavitto | All Watched Over by Machines of Loving Grace & A brief and subjective chronology of AI technics in music composition | |
Khosravi Mardakheh | The Sound of the hallmarks of cancer | |
Giavitto | All Watched Over by Machines of Loving Grace | |
Rawat et al. | Challenges in Music Generation Using Deep Learning | |
Karnop | Prediction of audio features of self-selected music by situational and person-related factors | |
Chechelashvili | Modular Synthesis and the Unconscious: An Exploration of the Role of Technology and Self-Reflection in Sound Creation and Compositional Process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19884029 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021526653 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2019884029 Country of ref document: EP Effective date: 20210615 |