On Composing Interactive Music
by John Szinger
I have been doing research, development, and production of interactive music projects for a number of years. This is an investigation of some issues relevant to a composer of music for an interactive software application and an explanation of my personal approach to interactive music development.
Concept and Background
Approach to music and interactivity.
Conceptual Models
Cartoons and video games.
Issues in Composing Music for Interactivity
Timing, segues & transitions, musical depth, improvisation, local & global orientations, AI models of music, visual representation, and lyrics.
Methods of Generating Interactive Music
Cueing & queuing, muting & unmuting, asynchronous firing of sequences, parametric filtering of sequences, generative sequences.
Sound Rendering Technology Considerations
MIDI sequencing and digital audio sampling.
Character-Based Music-Driven Animation
MIDI Puppets and algorithmic animation techniques.
Music, Narrative and Navigation
Object differentiation, spatial differentiation, and temporal or state-dependent differentiation.
Concept and Background
Interactive music is a field that is still in its infancy but is becoming more widespread as multimedia technology becomes more
capable of supporting it. There are many approaches one can take when creating an interactive music piece, and there are many issues
one must take into consideration when developing an approach to a project. Often music in a multimedia project takes on secondary
importance to the other elements such as the graphics, action or narrative. But music can be produced to be much more contextual
and meaningful. Indeed, a whole new genre of applications can be created in which music is the driving force. These may be dubbed
participatory music environments. In such environments the user or player controls the music in a virtual world rich with visual
and spatial cues to reinforce the musical actions.
When composing music for an interactive application, the goal is to tap the inherent ability of music to express emotion, evoke a
sense of drama, and communicate a story. One should take advantage of many musical devices, such as tension, release,
consonance, dissonance, rhythm, tempo, timbre, voice, lyric, structure, repetition, variation, and a variety of other elements, and
adapt these elements to an interactive context. A key element of my approach to interactive music development has been that the
environments are real-time simulation-based, as opposed to the many so-called "interactive" database-browser or slide show-
type products currently available.
My work has been focused in two main areas: perceptual and technological. On the perceptual side, the issues are knowing
what elements in music people will respond to emotionally and intuitively; what are appropriate musical parameters for user
control and for "smart" computer control; what are appropriate methods of input control and visual feedback; how can one design a
musical environment so that users can easily identify their contribution to an interactive composition and have a sense that
they are "making it happen"; and what are appropriate metaphors for the role of the user(s) in a participatory musical experience. On the
technological side, we are simply interested in finding and developing the proper tools to build interactive music environments
in terms of hardware platforms, controllers and interface devices, software operating systems and authoring environments, musical
data protocols, audio standards, and sound generation gear (synthesizers and samplers).
To a musician, all music is interactive in the sense that one is an active participant in the creation of the music, even when that
participation involves only listening. To an audience, the main difference between experiencing a live musical performance and
listening to a recording is the sense of involvement and interaction with the music, the crowd and the musicians. The knowledge that the
music still lies in the future, unrealized, adds a sense of excitement and unpredictability.
Improvisation is a major feature of musical performances in many genres. Musicians typically improvise off of a composed piece,
or a composition may have parts in it that have room for varying degrees of improvisation. In either case, the piece being improvised
can be thought of as a musical "space", or non-linear domain, (bounded by the parameters of the composition, style, etc.) with
each performance being a unique instance of expression in that realm, a squiggly line that maps a "fly-through" of that musical
space. The musician's moment-to-moment decisions shape this line in real time and these decisions are influenced by many
simultaneous factors: the global parameters of the tune he is playing, the music being made by the musicians he is jamming with,
the mood and expectations of the audience, his own mood, and perhaps the desire to make a specific statement or reach a
particular musical destination. The audience can be a direct participant in this process by providing context, response, and
collective influence for the musicians. Indeed, many performers make the audience active participants in their live shows.
Similarly, the goal in an interactive music piece is to make the player an active participant in the creation and direction of the
music through his actions in the environment.
Conceptual Models
A genre that has been a great source of inspiration is animated cartoons. In many great classic cartoons, the entire action proceeds from the music,
as does the pacing, tone, and choreography. In the best ones, the soundtrack is so well crafted that the line between the score
and sound effects is indistinguishable. Furthermore, many cartoons are overtly musical in their themes and actions, or proceed directly
from the music as the source of inspiration for the rest of the work. They stand as important examples of the use of a musical score as
the basis for a multimedia production, in which the scores closely support carefully choreographed actions and themes. Indeed, the
genres of cartoons and video games may ultimately merge into a single art form.
An examination of video games can provide us with some other conceptual models. Video games in general have solved many
of the problems one faces in the areas of interface, point of view, graphic representation of abstract data, and the user's identification
of characters and situations in a complex, simulated environment. Furthermore, the experience of playing a video game can be strikingly
similar to that of playing music. Each requires a high level of control over some physical instrument (such as a joystick or saxophone)
with reactions based on recognizing where one is in the moment, and in each case the player relies on a combination of learned patterns
and contextually appropriate improvisation for success.
Various possibilities have been suggested for adapting different video game scenarios to an interactive music context. One of these
is the flight simulator or navigable Three-Space model, which also includes some racing and combat games. Environments
like these are often real-time and simulation-based, which fits in well with my approach to interactivity in music. There is a strong
parallel between a multi-dimensional virtual space and an abstract musical "space" with different axes corresponding to different
musical parameters.
Quest-oriented adventure games also offer some useful insights. Many allow multiple players to work together to achieve a
common goal, such as fighting a common enemy. This concept can obviously be extended towards multiple players controlling multiple
musical elements, contributing to a single harmony. Adventure games usually also have a map or other representation of the game
space. Each room in the game space represents an encounter in the adventure, and the local environment defines the parameters of the
encounter and influences the outcome. The sequence of the encounters is influenced by their relative locations and often by the
necessity of solving puzzles in order to proceed into a new area. Players acquire skills and items that enable them to carry out their
quest. In the same way, one could construct an environment with different "rooms" that represent musical themes and encounters that
amount to playing a song or part of a song, while players carry with them items that enable them to complete certain melodies and move
into new musical territory.
Existing sports and action games already feature interactive sound to a degree, although not usually in a musical context. Objects
in the environment may produce sound effects, or alter the tone or tempo of the background music, or trigger a segue to a different
piece of music. Many activities in these games are inherently rhythmic, such as running, jumping, bobbing and weaving, or
dribbling a ball. There is a tremendous potential to exploit these types of on-screen movements to musical ends.
Issues in Composing Music for Interactivity
There are many issues which must be considered when creating music for an interactive project, especially if one is to deliver
continuous control in a real-time participatory environment. First among these is the problem of timing and resolving user input
to musically consistent events. If an interface allows a user to trigger an event at any moment, the timing of that event usually
must be evaluated in terms of the pulse of the music (bars and beats, etc.). Often a delay will be required (such as waiting until the
next downbeat), so the activation of the event does not throw the music out of time, and some form of interim feedback must be
provided. Conversely, the computer will sometimes have to anticipate an expected input that may arrive late. Both
contingencies must be provided for in the music and the interface.
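To make the delay case concrete, here is a minimal sketch of snapping a user's trigger to the next beat or bar line. It assumes a fixed tempo and meter and measures time in seconds from the start of the music; the function names and the `grid` parameter are hypothetical, not taken from any particular authoring tool. The anticipation case (input that arrives late) is not covered.

```python
import math

def next_boundary(event_time, tempo_bpm, beats_per_bar=4, grid="bar"):
    """Return the time (seconds) of the first beat or bar line at or after event_time."""
    seconds_per_beat = 60.0 / tempo_bpm
    step = seconds_per_beat * (beats_per_bar if grid == "bar" else 1)
    # The small tolerance keeps an input that lands exactly on the grid
    # from being pushed a full step later by rounding error.
    return math.ceil(event_time / step - 1e-9) * step

def schedule(event_time, tempo_bpm, beats_per_bar=4, grid="bar"):
    """Return (fire_at, delay): when to fire the event and how long the user waits."""
    fire_at = next_boundary(event_time, tempo_bpm, beats_per_bar, grid)
    return fire_at, fire_at - event_time

# At 120 beats per minute in 4/4, a bar lasts 2 seconds.
fire_at, delay = schedule(2.3, tempo_bpm=120)
print(f"fires at {fire_at:.2f} s after waiting {delay:.2f} s")
```

The returned delay is the interval during which the interface would need to show interim feedback, so the user knows the input was registered.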
Similarly, segues and transitions between different themes must be handled with consideration. The music as a whole must
"hang together", and jarring or abrupt changes from one segment to another (caused perhaps by a global change to the screen
environment) can be very disruptive. A musical phrase often needs to be resolved before a new one can be introduced. In general,
themes that can be juxtaposed will have to be composed so that they dovetail together. The general timing issues already mentioned
apply here as well.
A third issue is that of depth. Interactive music is by definition non-linear, so a composer will have to write significantly
more music than the intended length of the musical experience if the composition is intended to be heard repeatedly in various ways.
Additionally, he will have to write many more tracks in a given section of music than will necessarily be heard, if he intends to give
the user control over that dimension. He may have to write parts to bridge disparate sections or themes, compose multiple endings,
intros, harmonies, counterpoints, turnarounds and breaks, depending on the intended nature of the experience and level of interaction.
As mentioned before, improvisation is a major element in many forms of music, and a major opportunity for us as developers
of interactive music. Obviously, improvised music cannot be wholly composed ahead of time, but will be generated interactively by the
computer and user as a collaborative musical experience. Improvisation can take place on many levels, from simple variations on a
theme to tripping free-form space jams. Allowing opportunities for improvisation is a major challenge in composing interactive music.
Another issue is the resolution of the local and global orientations at a given moment of the music. Usually, a given
music event can be thought of in more than one way. For example, a chord can be thought of in terms of its relationship to the previous
chord or the chord following it (local orientations), or in terms of its relationship to the current key or its absolute tonic value (global
orientations). This can get many levels deep in some kinds of music. Similarly, questions of where a groove is (in terms of strong and
weak beats, etc.) can be answered in multiple ways, depending on one's musical orientation. These considerations lead us to questions
of Artificial Intelligence models of music, chiefly how much of a composition ought to exist at a global score level, how much
is realized by individual AI "players" when they perform the composition, and how best to represent a musical mind as an AI player.
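The two orientations of a chord can be illustrated with a small sketch. It assumes chords are reduced to their root names and that intervals are counted in ascending semitones; the `orientations` function and the `PITCH_CLASS` table are hypothetical helpers for illustration only.

```python
# Pitch classes of the natural notes, in semitones above C.
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def orientations(roots, tonic):
    """For each chord root, return (root, global_interval, local_interval).

    global_interval: semitones above the tonic of the key (global orientation).
    local_interval: semitones above the previous chord's root (local orientation),
    or None for the first chord.
    """
    result = []
    previous = None
    for root in roots:
        pc = PITCH_CLASS[root]
        global_interval = (pc - PITCH_CLASS[tonic]) % 12
        local_interval = None if previous is None else (pc - previous) % 12
        result.append((root, global_interval, local_interval))
        previous = pc
    return result

for entry in orientations(["C", "F", "G", "C"], tonic="C"):
    print(entry)
```

In this progression the final C is "home" in the global view (0 semitones from the tonic) but a rising fourth (5 semitones) in the local view, which is the kind of double reading described above.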
In most cases, visual representation of the music will be an important consideration. There are many ways one can graphically
depict music, from traditional sheet music to bouncing ball piano rolls to animated musicians to abstract shapes and colors to wacky
creatures and instruments. In general, the graphics ought to illustrate some relationships among the various elements present in
the music, such as key, harmony, rhythm, meter, voicing or instrumentation. Of course this can be done very imaginatively, and
the graphics should reinforce the content of the music. The visuals of an interactive work may also contribute to any story elements
present by providing characters and helping define a point of view and role for the user.
So far we have considered primarily instrumental music, but writing lyrics for interactivity poses a whole other set of
challenges. Like music, language must follow certain grammatical and semantic consistencies. The lyric element of an interactive
music work will likely convey a large aspect of the narrative or overt dramatic content, and is closely tied to the issues of
interactive fiction. As with the visual portion of a work, the issues of point of view, feedback, user role, narrative, and non-linear story
development will require careful consideration, in addition to all the musical elements.
Methods of Generating Interactive Music
There are multiple methods a composer can use when writing music for interactivity. All of them proceed from the basic premise of
taking a more-or-less defined composition and making it manipulable in any of several ways, which are dependent on the
musical authoring tools available to the composer. Generally the composer ought to be aware of the method(s) to be employed and
write the music with their opportunities and limitations in mind.
The most basic level of imparting an element of interactivity to music is by cueing and queuing sequences. This simply means
that many pre-composed segments of music can be strung together and played back in any random or user-defined order. This is
essentially the same as using the shuffle feature on a CD player, although some sort of navigable tree could greatly aid the user in
sensibly controlling the music and establishing context.
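As a rough illustration of cueing and queuing with a navigable structure behind it, the sketch below keeps a queue of segments and only accepts a cue if the segment may follow the last one queued. The segment names, the `FOLLOWS` table, and the `SegmentQueue` class are all hypothetical.

```python
from collections import deque

# Hypothetical navigation structure: each segment lists the segments that may follow it.
FOLLOWS = {
    "intro": ["verse"],
    "verse": ["chorus", "bridge"],
    "chorus": ["verse", "outro"],
    "bridge": ["chorus"],
    "outro": [],
}

class SegmentQueue:
    def __init__(self, start):
        self.current = start
        self.pending = deque()

    def cue(self, name):
        """Queue a segment if it may follow the last queued (or current) one."""
        last = self.pending[-1] if self.pending else self.current
        if name not in FOLLOWS[last]:
            return False
        self.pending.append(name)
        return True

    def advance(self):
        """Call when the current segment ends; returns the segment to play next.

        With nothing queued, the current segment simply repeats.
        """
        if self.pending:
            self.current = self.pending.popleft()
        return self.current

queue = SegmentQueue("intro")
queue.cue("verse")
queue.cue("chorus")
print(queue.advance(), queue.advance())
```

Rejecting a cue that does not fit the structure is only one possible policy; a product could just as well substitute a bridging segment.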
The next deeper level of control is provided by muting and unmuting tracks within a musical sequence or series of
sequences. This represents a musical dimension perpendicular to the ordering of parts, and combining the two can give the impression
of significant musical depth. Like the first method, it relies on pre-composed material, but allows for control of the mix. For
example, a user could choose between one of several bass lines, or elect to have a horn section provide an accompaniment. Again, a
navigable tree structure in the background could control groups of tracks and lead to logical musical choices.
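A minimal sketch of the muting approach follows, assuming the pre-composed tracks are organized into groups in which at most one track is audible at a time (for example, several alternative bass lines). The `Mixer` class and the track names are hypothetical.

```python
class Mixer:
    """Tracks are pre-composed; the user only chooses which ones are unmuted."""

    def __init__(self, groups):
        # groups maps a group name to its alternative tracks.
        self.groups = groups
        # None means the whole group is muted.
        self.active = {group: None for group in groups}

    def select(self, group, track):
        """Unmute one track in a group (muting its alternatives), or pass None to mute all."""
        if track is not None and track not in self.groups[group]:
            raise ValueError(f"{track!r} is not in group {group!r}")
        self.active[group] = track

    def audible(self):
        """Return the names of the tracks currently unmuted."""
        return sorted(track for track in self.active.values() if track is not None)

mixer = Mixer({
    "bass": ["walking_bass", "funk_bass", "pedal_bass"],
    "horns": ["horn_section"],
})
mixer.select("bass", "funk_bass")
mixer.select("horns", "horn_section")
print(mixer.audible())
```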
To gain more interactivity, the third level calls for asynchronous firing of sequences. This means having individual riffs or other
segments of music exist independently (with respect to time or meter) from other tracks, lines or patterns. The parts can then be
recombined with a much greater flexibility, allowing for a degree of genuine interactive music composition as opposed to merely slicing
up and shuffling pre-composed songs.
The fourth level of control involves parametrically filtering sequences. This will enable a user to manipulate a track,
sequence, or group of tracks or sequences along a host of parameters such as volume, timbre, tempo, and key. More advanced applications
will allow tracks to re-harmonize themselves in a different mode or voicing, automatically follow chord progressions, or employ a
rhythmic or harmonic template, either pre-composed or generated on the fly. Additionally, this method allows for continual application
of modifiers such as tremolo or pitch bend differentially to individual musical voices or sub-groups.
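The simpler parameters can be sketched as a filter applied to note data. This example assumes notes are stored as (start, pitch, velocity, duration) records with MIDI-style 0 to 127 ranges; the `Note` type and `filter_sequence` function are hypothetical, and re-harmonization or chord following would need considerably more machinery.

```python
from typing import NamedTuple

class Note(NamedTuple):
    start: float      # in beats
    pitch: int        # MIDI note number, 0-127
    velocity: int     # MIDI velocity, 1-127
    duration: float   # in beats

def clamp(value, low, high):
    return max(low, min(high, value))

def filter_sequence(notes, transpose=0, velocity_scale=1.0, time_scale=1.0):
    """Return a new sequence shifted in key, scaled in loudness, and stretched in time."""
    return [
        Note(
            start=note.start * time_scale,
            pitch=clamp(note.pitch + transpose, 0, 127),
            # Velocity is kept at 1 or above because a Note On with velocity 0
            # is treated as a Note Off in MIDI.
            velocity=clamp(round(note.velocity * velocity_scale), 1, 127),
            duration=note.duration * time_scale,
        )
        for note in notes
    ]

riff = [Note(0.0, 60, 100, 1.0), Note(1.0, 64, 80, 0.5)]
# Up a fifth, louder, and twice as fast.
print(filter_sequence(riff, transpose=7, velocity_scale=1.5, time_scale=0.5))
```

Because the filter returns a new list and leaves the source riff untouched, the same pre-composed material can be rendered differently for each voice or sub-group.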
The deepest level of musical interactivity is provided by generative sequences. In this method, there are no
pre-composed sequences per se; the computer "improvises" the music in real time according to a set of rules set forth by the composer. This
provides opportunities for user input to control the music at a full range of levels and in a huge variety of ways. This is the only
compositional method that is wholly real-time simulation-based. Obviously, it also requires the most sophisticated and intelligent
drivers, and the development of these drivers will require a significant amount of technological research.
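A toy example of a generative sequence is sketched below. The rules here are my own simplifying assumptions, chosen only to show the idea: stay within a scale, move by small steps, and begin and end on the tonic. The `generate_melody` function, the `max_leap` parameter, and the scale table are hypothetical; `max_leap` stands in for a parameter a user might control.

```python
import random

C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers, C4 to C5

def generate_melody(length, scale=C_MAJOR, max_leap=2, seed=None):
    """Improvise a melody as a bounded random walk over scale degrees."""
    rng = random.Random(seed)
    index = 0                      # Rule: begin on the tonic.
    melody = [scale[index]]
    while len(melody) < length - 1:
        # Rule: move at most max_leap scale degrees, staying inside the scale.
        step = rng.randint(-max_leap, max_leap)
        index = max(0, min(len(scale) - 1, index + step))
        melody.append(scale[index])
    melody.append(scale[0])        # Rule: resolve to the tonic.
    return melody[:length]

print(generate_melody(8, seed=1))
```

Passing a seed makes a given "improvisation" repeatable, which is convenient for testing; leaving it out gives a different line each time.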
Although the above methods provide increasing degrees of interactive control, they are not completely separate or distinct. An
interactive work will probably employ several of the methods to varying extents. Still, the method(s) used will strongly influence
the nature of composition, the kinds of drivers required to generate the music, and the resulting musical experience. An important point
is that each progressively deeper method is also more computationally efficient in terms of maximizing the usage of
existing data, and providing opportunities for thematic development and variation.
Sound Rendering Technology Considerations
In order to create interactive music experiences, we will generally need to have highly malleable musical data at our disposal, and the
capacity for high-quality sound delivery. Currently the two primary means of processing and producing or reproducing sound in
computer-mediated environments are MIDI sequencing and digital audio sampling. Each has its relative merits and drawbacks, but
the two technologies can be used in tandem to get the most advantage from each.
MIDI sequencing has evolved into a primary way of working for many electronic musicians and composers. The MIDI (Musical
Instrument Digital Interface) standard is universally supported by computers, synthesizers, and a vast host of other gear. The great
strength of MIDI is that it treats music as data, representing individual notes in terms of their pitch, velocity, and duration. Many
other control parameters are supported that can affect timbre, volume, instrument, key, and any other imaginable factor that might
affect the sound of a note. A MIDI sequence is simply a list of these note and control messages indexed with respect to time. An additional
advantage is that MIDI has evolved from a live performance orientation, and is very well suited for real-time applications. The
major drawback of MIDI is that since the music is represented as performance data, it requires external sound renderers
(synthesizers, effects processors, mixers, etc.) to realize the music. These machines vary greatly in their capabilities and programming
implementation. In general, every MIDI studio is unique, and the sound must be custom designed with the specific studio environment
and gear in mind.
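The idea of a sequence as a time-indexed list of messages can be sketched as follows. The status bytes (0x90 Note On, 0x80 Note Off, 0xB0 Control Change) and controller 7 for channel volume come from the MIDI standard; the `build_sequence` function and the choice to measure time in beats are assumptions made for illustration.

```python
NOTE_ON, NOTE_OFF, CONTROL_CHANGE = 0x90, 0x80, 0xB0  # status bytes for channel 0
CHANNEL_VOLUME = 7                                    # controller number for volume

def build_sequence(notes, controls=(), channel=0):
    """Turn notes and controller changes into a time-ordered list of MIDI messages.

    notes: (start, pitch, velocity, duration) tuples, times in beats.
    controls: (time, controller, value) tuples.
    Each event is (time, (status, data1, data2)).
    """
    events = []
    for time, controller, value in controls:
        events.append((time, (CONTROL_CHANGE | channel, controller, value)))
    for start, pitch, velocity, duration in notes:
        # A note with a duration becomes a Note On / Note Off pair.
        events.append((start, (NOTE_ON | channel, pitch, velocity)))
        events.append((start + duration, (NOTE_OFF | channel, pitch, 0)))
    # The sort is stable, so simultaneous events keep the order they were added in.
    events.sort(key=lambda event: event[0])
    return events

sequence = build_sequence(
    notes=[(0.0, 60, 100, 1.0), (1.0, 64, 90, 1.0)],
    controls=[(0.0, CHANNEL_VOLUME, 110)],
)
for time, message in sequence:
    print(time, [hex(byte) for byte in message])
```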
Digital audio sampling and playback is the major alternative to MIDI sequencing and sound synthesis. With digital audio, sound is
recorded directly into the memory of a computer, where it can be processed, filtered, edited, looped, and otherwise manipulated. This
is a less flexible method, since all the music must be performed and recorded ahead of time, and cannot be composed on the fly.
Additionally, audio samples require enormous amounts of disk space for storage and RAM for playback, especially with high quality
stereo sound. This also limits the number of samples that can be played back and manipulated simultaneously. However, digital audio
files rely much less on quirky external hardware, and can be ported across multiple platforms with reasonably consistent results. Also,
well-produced and completed source material can be directly adapted to digital audio for interactive applications. It is also currently the
best means of reproducing dialogue, vocal music with lyrics, and other sounds that cannot be easily synthesized in a computer environment.
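To give a sense of scale for the storage cost, the following arithmetic assumes uncompressed audio at CD-quality settings (44,100 samples per second, 16 bits per sample, two channels). These figures are my assumption; the text above does not specify a format.

```python
def audio_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Size in bytes of uncompressed PCM audio of the given length."""
    return int(seconds * sample_rate * bytes_per_sample * channels)

per_second = audio_bytes(1)    # 176,400 bytes
per_minute = audio_bytes(60)   # 10,584,000 bytes, roughly 10.6 MB
print(per_second, per_minute)
```

By comparison, a note in a MIDI sequence costs only a few bytes, which is one reason the two technologies complement each other.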
The digital audio and MIDI realms can be combined using MIDI-controlled sampling devices, which can trigger samples and
manipulate them with the same flexibility as with any other sound source. Such sample playback devices can be emulated on a
computer and may allow a complete, self-contained system for realizing an interactive music product.
Character-Based Music-Driven Animation
The concept of a character as a fundamental organizing unit has been central to my approach. A character consists of several
things: the component artwork that comprises the different cel cycles or behavioral loops, the musical sequences, samples, or
algorithms for the character, and modules of code to generate behavior, to map MIDI input to the animation, to receive and
interpret input signals from the user to the character, and coordinate the component elements in real time. One of the main
musical objectives is to consistently identify the individual screen characters with different voices or instruments in the musical
arrangement.
To this end, an important method of achieving a meaningful link between musical and visual elements is through a technology I have
dubbed MIDI Puppets. These are animated screen characters whose moment-to-moment movements are generated by reading
musical data from an incoming MIDI stream and calculating the appropriate position to correspond to a musical event. For example,
a singer would open its mouth when a Note On command arrived on the MIDI channel that triggered the voice associated with the
character. This is a very strong technique, since the character is being controlled by the same data as the synthesizer responsible for
rendering the audio portion of the simulation, and the same animation engine will drive a character for any music, whether a
sequenced composition, an algorithmically-created composition, or input from a live musical performance.
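The singer example can be sketched in a few lines. This is an illustration of the idea only, not the actual MIDI Puppets code: the `SingerPuppet` class and the frame names are hypothetical, while the status bytes and the convention that a Note On with velocity 0 counts as a Note Off come from the MIDI standard.

```python
NOTE_ON, NOTE_OFF = 0x90, 0x80

class SingerPuppet:
    """A character whose mouth follows the notes on one MIDI channel."""

    def __init__(self, channel):
        self.channel = channel
        self.held = set()   # pitches currently sounding on this channel

    def handle(self, status, data1, data2):
        """Feed one three-byte MIDI message to the puppet."""
        kind, channel = status & 0xF0, status & 0x0F
        if channel != self.channel:
            return          # message belongs to another character's voice
        if kind == NOTE_ON and data2 > 0:
            self.held.add(data1)
        elif kind == NOTE_OFF or (kind == NOTE_ON and data2 == 0):
            self.held.discard(data1)

    @property
    def frame(self):
        """Name of the artwork frame to draw for the current moment."""
        return "mouth_open" if self.held else "mouth_closed"

singer = SingerPuppet(channel=0)
singer.handle(0x90, 60, 100)   # Note On, middle C
print(singer.frame)
singer.handle(0x80, 60, 0)     # Note Off
print(singer.frame)
```

Nothing in the puppet depends on where the messages come from, which reflects the point above: a sequence, an algorithm, or a live performer can all drive the same character.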
For some puppets, I have employed a refined version of this method which differentiates note triggers and pitch values and
combinatorically derives the correct behavior for the puppet for the current moment. For example, my GigMe Drummer (note: the version of the Drummer on the WWW has been modified to operate completely within the Shockwave
environment and uses samples instead of MIDI, but the end application is similar) can differentiate six ranges of pitch values,
corresponding (in the General MIDI specification) to bass drum, snare drum, hi-hat pedal, hi-hat stick, crash cymbal, and ride cymbal.
Whenever a MIDI signal is received it is evaluated against the six drum types (remaining valid for fifty milliseconds, which is 1.5 times the
duration of an animation frame). Based on a default state of the Drummer hitting no drums, I have derived what is essentially a
six-dimensional matrix of behavioral response. The code that comprises the drummer's animation engine is very modular and can be easily
recalibrated to control any set of character artwork along any set of MIDI parameters.
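One way to picture the six-dimensional response is as a tuple of six on/off values, one per drum type, giving 64 possible poses. The sketch below is a reconstruction under assumptions, not the Drummer's actual code: the note numbers are taken from the General MIDI percussion key map, but the grouping of notes into the six types and the `DrummerState` class are my own.

```python
DRUMS = ("bass", "snare", "hihat_pedal", "hihat_stick", "crash", "ride")

# General MIDI percussion note numbers, grouped (by assumption) into six drum types.
NOTE_TO_DRUM = {
    35: "bass", 36: "bass",
    38: "snare", 40: "snare",
    44: "hihat_pedal",
    42: "hihat_stick", 46: "hihat_stick",
    49: "crash", 57: "crash",
    51: "ride", 59: "ride",
}

HOLD_MS = 50  # a hit stays valid for fifty milliseconds

class DrummerState:
    def __init__(self):
        self.last_hit = {}   # drum type -> time of most recent hit, in ms

    def note_on(self, note, time_ms):
        drum = NOTE_TO_DRUM.get(note)
        if drum is not None:
            self.last_hit[drum] = time_ms

    def pose(self, now_ms):
        """Return six booleans, one per drum type: is that drum being hit right now?"""
        return tuple(
            drum in self.last_hit and 0 <= now_ms - self.last_hit[drum] < HOLD_MS
            for drum in DRUMS
        )

drummer = DrummerState()
drummer.note_on(36, time_ms=0)    # bass drum
drummer.note_on(42, time_ms=10)   # closed hi-hat
print(drummer.pose(20))           # both hits still valid
print(drummer.pose(100))          # back to the default pose: hitting nothing
```

Each distinct tuple can then be mapped to a piece of artwork, and recalibrating for a different character amounts to swapping the two tables.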