On Composing Interactive Music

John Szinger, 1993


Concept and Background

Interactive music is a field still in its infancy, but it is becoming more widespread as multimedia technology becomes more capable of supporting it. There are many approaches one can take when creating an interactive music piece, and there are many issues one must take into consideration when developing an approach to a project. Often the music in a multimedia project is of secondary importance to other elements such as the graphics, action, or narrative. But music can be produced to be much more contextual and meaningful. Indeed, a whole new genre of applications can be created in which music is the driving force. These may be dubbed participatory music environments. In such environments the user or player controls the music in a virtual world rich with visual and spatial cues to reinforce the musical actions.

When composing music for an interactive application, the goal is to tap the inherent ability of music to express emotion, evoke a sense of drama, and communicate a story. One should take advantage of many musical devices, such as tension, release, consonance, dissonance, rhythm, tempo, timbre, voice, lyric, structure, repetition, variation, and a variety of other elements, and adapt these elements to an interactive context. A key element of my approach to interactive music development has been that the environments are real-time simulation-based, as opposed to the many so-called "interactive" database-browser or slide show-type products currently available.

My work has been focused in two main areas: perceptual and technological. On the perceptual side, the issues include which elements in music people will respond to emotionally and intuitively; which musical parameters are appropriate for user control and for "smart" computer control; which methods of input control and visual feedback are appropriate; how one can design a musical environment so that users can easily identify their contribution to an interactive composition and have a sense that they are "making it happen"; and which metaphors are appropriate for the role of the user(s) in a participatory musical experience. On the technological side, we are simply interested in finding and developing the proper tools to build interactive music environments in terms of hardware platforms, controllers and interface devices, software operating systems and authoring environments, musical data protocols, audio standards, and sound generation gear (synthesizers and samplers).

To a musician, all music is interactive in the sense that one is an active participant in the creation of the music, even when that participation involves only listening. To an audience, the main difference between experiencing a live musical performance and listening to a recording is the sense of involvement and interaction with the music, the crowd and the musicians. The knowledge that the music is still in the future and unrealized adds a sense of excitement and unpredictability.

Improvisation is a major feature of musical performances in many genres. Musicians typically improvise off of a composed piece, or a composition may have parts in it that have room for varying degrees of improvisation. In either case, the piece being improvised can be thought of as a musical "space", or non-linear domain (bounded by the parameters of the composition, style, etc.), with each performance being a unique instance of expression in that realm, a squiggly line that maps a "fly-through" of that musical space. The musician's moment-to-moment decisions shape this line in real time, and these decisions are influenced by many simultaneous factors: the global parameters of the tune he is playing, the music being made by the musicians he is jamming with, the mood and expectations of the audience, his own mood, and perhaps the desire to make a specific statement or reach a particular musical destination. The audience can be a direct participant in this process by providing context, response, and collective influence for the musicians. Indeed, many performers make the audience active participants in their live shows. Similarly, the goal in an interactive music piece is to make the player an active participant in the creation and direction of the music through his actions in the environment.


Conceptual Models

A genre that has been a great source of inspiration is animated cartoons. In many great classic cartoons, the entire action proceeds from the music, as does the pacing, tone, and choreography. In the best ones, the soundtrack is so well crafted that the line between the score and sound effects is indistinguishable. Furthermore, many cartoons are overtly musical in their themes and actions, or proceed directly from the music as the source of inspiration for the rest of the work. They stand as important examples of the use of a musical score as the basis for a multimedia production, in which the scores closely support carefully choreographed actions and themes. Indeed, the genres of cartoons and video games may ultimately merge into a single art form.

An examination of video games can provide us with some other conceptual models. Video games in general have solved many of the problems one faces in the areas of interface, point of view, graphic representation of abstract data, and the user's identification of characters and situations in a complex, simulated environment. Furthermore, the experience of playing a video game can be strikingly similar to that of playing music. Each requires a high level of control over some physical instrument (such as a joystick or saxophone), with reactions based on recognizing where one is in the moment, and in each case the player relies on a combination of learned patterns and contextually appropriate improvisation for success.

Various possibilities have been suggested for adapting different video game scenarios to an interactive music context. One of these is the flight simulator or navigable Three-Space model, which also includes some racing and combat games. Environments like these are often real-time and simulation-based, which fits in well with my approach to interactivity in music. There is a strong parallel between a multi-dimensional virtual space and an abstract musical "space" with different axes corresponding to different musical parameters.

Quest-oriented adventure games also offer some useful insights. Many allow multiple players to work together toward a common goal, such as fighting a shared enemy. This concept can obviously be extended toward multiple players controlling multiple musical elements, contributing to a single harmony. Adventure games usually also have a map or other representation of the game space. Each room in the game space represents an encounter in the adventure, and the local environment defines the parameters of the encounter and influences the outcome. The sequence of the encounters is influenced by their relative locations and often by the necessity of solving puzzles in order to proceed into a new area. Players acquire skills and items that enable them to carry out their quest. In the same way, one could construct an environment with different "rooms" that represent musical themes and encounters that amount to playing a song or part of a song, while players carry with them items that enable them to complete certain melodies and move into new musical territory.

Existing sports and action games already feature interactive sound to a degree, although not usually in a musical context. Objects in the environment may produce sound effects, or alter the tone or tempo of the background music, or trigger a segue to a different piece of music. Many activities in these games are inherently rhythmic, such as running, jumping, bobbing and weaving, or dribbling a ball. There is a tremendous potential to exploit these types of on-screen movements to musical ends.


Issues in Composing for Interactivity

There are many issues which must be considered when creating music for an interactive project, especially if one is to deliver continuous control in a real-time participatory environment. First among these is the problem of timing and resolving user input to musically consistent events. If an interface allows a user to trigger an event at any moment, the timing of that event usually must be evaluated in terms of the pulse of the music (bars and beats, etc.). Often a delay will be required (such as waiting until the next down beat) so that the activation of the event does not throw the music out of time, and some form of interim feedback must be provided. Conversely, the computer will sometimes have to anticipate an expected input that may arrive late. Both contingencies must be provided for in the music and the interface.
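
To make the timing problem concrete, here is a minimal sketch in Python of one way an interface might defer a user-triggered event to the next downbeat. The play_event and show_feedback callbacks are hypothetical stand-ins for whatever sound driver and on-screen feedback the environment provides.

    import math

    BEATS_PER_BAR = 4
    TEMPO_BPM = 120
    SECONDS_PER_BEAT = 60.0 / TEMPO_BPM

    def next_boundary(event_time, beats_per_unit=1):
        """Return the time of the next beat (or bar) boundary at or after event_time."""
        unit = SECONDS_PER_BEAT * beats_per_unit
        return math.ceil(event_time / unit) * unit

    def schedule_user_event(event, event_time, play_event, show_feedback):
        """Defer a user-triggered event to the next downbeat.

        play_event and show_feedback are hypothetical callbacks supplied by the
        host environment; the delay is returned so interim feedback (a flash,
        a count-in, etc.) can fill the gap before the event actually sounds.
        """
        fire_time = next_boundary(event_time, beats_per_unit=BEATS_PER_BAR)
        delay = fire_time - event_time
        if delay > 0:
            show_feedback(event, delay)        # acknowledge the input right away
        play_event(event, at_time=fire_time)   # but sound it in time with the music
        return delay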

Similarly, segues and transitions between different themes must be handled with consideration. The music as a whole must "hang together", and jarring or abrupt changes from one segment to another (caused perhaps by a global change to the screen environment) can be very disruptive. A musical phrase often needs to be resolved before a new one can be introduced. In general, themes that can be juxtaposed will have to be composed so that they dovetail together. The general timing issues already mentioned apply here as well.
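
As a rough sketch of how such a segue might be queued, the fragment below (Python, with a hypothetical schedule routine) waits for the current phrase to resolve before starting the next theme, optionally inserting a pre-composed bridge written to dovetail the two.

    def request_segue(next_theme, phrase_end_beats, now_beat, schedule, bridge=None):
        """Queue a segue so it lands at the end of the current phrase.

        phrase_end_beats lists the beats on which the current theme's phrases
        resolve; schedule is a hypothetical routine that starts a theme at a
        given beat; bridge is an optional pre-composed passage written to
        dovetail the two themes.
        """
        boundary = next((b for b in phrase_end_beats if b >= now_beat), now_beat)
        if bridge is not None:
            schedule(bridge['theme'], at_beat=boundary)
            boundary += bridge['length_beats']
        schedule(next_theme, at_beat=boundary)
        return boundary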

A third issue is that of depth. Interactive music is by definition non-linear, so a composer will have to write significantly more music than the intended length of the musical experience if the composition is intended to be heard repeatedly in various ways. Additionally, he will have to write many more tracks in a given section of music than will necessarily be heard, if he intends to give the user control over that dimension. He may have to write parts to bridge disparate sections or themes, compose multiple endings, intros, harmonies, counterpoints, turnarounds and breaks, depending on the intended nature of the experience and level of interaction.

As mentioned before, improvisation is a major element in many forms of music, and a major opportunity for us as developers of interactive music. Obviously, improvised music cannot be wholly composed ahead of time, but will be generated interactively by the computer and user as a collaborative musical experience. Improvisation can take place on many levels, from simple variations on a theme to tripping free-form space jams. Allowing opportunities for improvisation is a major challenge in composing interactive music.

Another issue is the resolution of the local and global orientations at a given moment of the music. Usually, a given musical event can be thought of in more than one way. For example, a chord can be thought of in terms of its relationship to the previous chord or the chord following it (local orientations), or in terms of its relationship to the current key or its absolute tonic value (global orientations). This can get many levels deep in some kinds of music. Similarly, questions of where a groove is (in terms of strong and weak beats, etc.) can be answered in multiple ways, depending on one's musical orientation. These considerations lead us to questions of Artificial Intelligence models of music: chiefly, how much of a composition ought to exist at a global score level, how much is realized by individual AI "players" when they perform the composition, and how best to represent a musical mind as an AI player.

In most cases, visual representation of the music will be an important consideration. There are many ways one can graphically depict music, from traditional sheet music to bouncing ball piano rolls to animated musicians to abstract shapes and colors to wacky creatures and instruments. In general, the graphics ought to illustrate some relationships among the various elements present in the music, such as key, harmony, rhythm, meter, voicing or instrumentation. Of course this can be done very imaginatively, and the graphics should reinforce the content of the music. The visuals of an interactive work may also contribute to any story elements present by providing characters and helping define a point of view and role for the user.

So far we have considered primarily instrumental music, but writing lyrics for interactivity poses a whole other set of challenges. Like music, language must follow certain grammatical and semantic consistencies. The lyric element of an interactive music work will likely convey a large aspect of the narrative or overt dramatic content, and is closely tied to the issues of interactive fiction. As with the visual portion of a work, the issues of point of view, feedback, user role, narrative, and non-linear story development will require careful consideration, in addition to all the musical elements.


Methods of Generating Interactive Music

There are multiple methods a composer can use when writing music for interactivity. All of them proceed from the basic premise of taking a more-or-less defined composition and making it manipulable in any of several ways, which are dependent on the musical authoring tools available to the composer. Generally the composer ought to be aware of the method(s) to be employed and write the music with their opportunities and limitations in mind.

The most basic level of imparting an element of interactivity to music is by cueing and queuing sequences. This simply means that many pre-composed segments of music can be strung together and played back in any random or user-defined order. This is essentially the same as using the shuffle feature on a CD player, although some sort of navigable tree could greatly aid the user in sensibly controlling the music and establishing context.
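
A minimal sketch of this first level, assuming a hypothetical play_sequence routine that plays one pre-composed segment to completion before returning:

    import random

    def play_segments(segments, play_sequence, order=None, shuffle=False):
        """String pre-composed segments together in a user-defined or shuffled order.

        segments maps names to sequence data; play_sequence is a hypothetical
        routine that plays one segment to completion before returning.
        """
        names = order if order is not None else list(segments)
        if shuffle:
            names = random.sample(names, len(names))
        for name in names:
            play_sequence(segments[name])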

The next deeper level of control is provided by muting and unmuting tracks within a musical sequence or series of sequences. This represents a musical dimension perpendicular to the ordering of parts, and combining the two can give the impression of significant musical depth. Like the first method, it relies on pre-composed material, but it allows for control of the mix. For example, a user could choose between one of several bass lines, or elect to have a horn section provide an accompaniment. Again, a navigable tree structure in the background could control groups of tracks and lead to logical musical choices.
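
A sketch of such a mute/unmute layer over pre-composed tracks; the class and method names here are illustrative, not any particular authoring tool's API.

    class Mixer:
        """Toggle pre-composed tracks in and out of the running mix."""

        def __init__(self, track_names):
            self.muted = {name: True for name in track_names}   # start with everything muted

        def set_muted(self, name, muted):
            self.muted[name] = muted

        def choose_one_of(self, group, chosen):
            """Pick one of several alternatives (e.g. bass lines) and mute the rest."""
            for name in group:
                self.muted[name] = (name != chosen)

        def audible(self, events_by_track):
            """Filter a sequence's events down to the unmuted tracks."""
            return {name: events for name, events in events_by_track.items()
                    if not self.muted.get(name, True)}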

To gain more interactivity, the third level calls for asynchronous firing of sequences. This means having individual riffs or other segments of music exist independently (with respect to time or meter) from other tracks, lines or patterns. The parts can then be recombined with a much greater flexibility, allowing for a degree of genuine interactive music composition as opposed to merely slicing up and shuffling pre-composed songs.
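
One way to picture asynchronous firing is to give each riff its own clock. The sketch below uses a thread per riff purely for illustration; send_midi is a hypothetical output routine.

    import threading

    def fire_riff(riff_events, send_midi, stop_flag, loop=True):
        """Play one riff on its own clock, independently of every other part.

        riff_events is a list of (delta_seconds, message) pairs; send_midi is a
        hypothetical output routine; stop_flag silences the riff when the
        environment changes.
        """
        while not stop_flag.is_set():
            for delta, message in riff_events:
                if stop_flag.wait(timeout=delta):   # the wait doubles as the inter-note delay
                    return
                send_midi(message)
            if not loop:
                return

    def start_riff(riff_events, send_midi):
        """Fire a riff asynchronously and return the flag that stops it."""
        stop_flag = threading.Event()
        threading.Thread(target=fire_riff, args=(riff_events, send_midi, stop_flag),
                         daemon=True).start()
        return stop_flag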

The fourth level of control involves parametrically filtering sequences. This will enable a user to manipulate a track, sequence, or group of tracks or sequences along a host of parameters such as volume, timbre, tempo, and key. More advanced applications will allow tracks to re-harmonize themselves in a different mode or voicing, automatically follow chord progressions, or employ a rhythmic or harmonic template, either pre-composed or generated on the fly. Additionally, this method allows for continual application of modifiers such as tremolo or pitch bend differentially to individual musical voices or sub-groups.
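
A sketch of simple parametric filtering over a list of note events; re-harmonization and chord-following would require considerably more machinery than is shown here.

    def filter_sequence(events, transpose=0, velocity_scale=1.0, time_scale=1.0):
        """Apply parametric modifiers to a list of note events.

        Each event is a dict with 'time', 'pitch', and 'velocity' keys;
        time_scale > 1.0 stretches the music (a slower tempo), transpose
        shifts every pitch by a number of semitones, and velocity_scale
        adjusts the overall dynamic level.
        """
        filtered = []
        for ev in events:
            filtered.append({
                'time': ev['time'] * time_scale,
                'pitch': ev['pitch'] + transpose,
                'velocity': max(1, min(127, int(ev['velocity'] * velocity_scale))),
            })
        return filtered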

The deepest level of musical interactivity is achieved with generative sequences. In this method, there are no pre-composed sequences per se; the computer "improvises" the music in real time according to a set of rules set forth by the composer. This provides opportunities for user input to control the music at a full range of levels and in a huge variety of ways. This is the only compositional method that is wholly real-time simulation-based. Obviously, it also requires the most sophisticated and intelligent drivers, and the development of these drivers requires a significant amount of technological research.
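
As a toy illustration of rule-based generation, the sketch below improvises a phrase from a pentatonic scale under a few composer-set constraints; the scale, leap limit, and resolution rule are stand-ins for a real rule set, not a full driver.

    import random

    C_MINOR_PENTATONIC = [60, 63, 65, 67, 70]   # MIDI note numbers, one octave

    def generate_phrase(length, scale=C_MINOR_PENTATONIC, max_leap=2, seed=None):
        """Improvise a phrase from a small rule set.

        Rules (illustrative): stay in the given scale, never leap more than
        max_leap scale steps at a time, and resolve to the first scale degree
        on the final note. User input could change the scale, the leap limit,
        or the phrase length in real time.
        """
        rng = random.Random(seed)
        index = 0
        phrase = [scale[index]]
        for _ in range(max(0, length - 2)):
            index = max(0, min(len(scale) - 1, index + rng.randint(-max_leap, max_leap)))
            phrase.append(scale[index])
        phrase.append(scale[0])   # resolve home
        return phrase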

Although each of the above methods provides increasing degrees of interactive control, they are not completely separate or distinct. An interactive work will probably employ several of the methods to varying extents. Still, the method(s) used will strongly influence the nature of the composition, the kinds of drivers required to generate the music, and the resulting musical experience. An important point is that each progressively deeper method is also more computationally efficient in terms of maximizing the usage of existing data and providing opportunities for thematic development and variation.


Sound Rendering Technology Considerations

In order to create interactive music experiences, we will generally need to have highly malleable musical data at our disposal, and the capacity for high quality sound delivery. Currently the two primary means of processing and producing or reproducing sound in computer mediated environments are MIDI Sequencing and Digital Audio Sampling. Each has its relative merits and drawbacks, but the two technologies can be used in tandem to get the most advantage from each.

MIDI sequencing has evolved into a primary way of working for many electronic musicians and composers. The MIDI (Musical Instrument Digital Interface) standard is universally supported by computers, synthesizers, and a vast host of other gear. The great strength of MIDI is that it treats music as data, representing individual notes in terms of their pitch, velocity, and duration. Many other control parameters are supported that can affect timbre, volume, instrument, key, and any other imaginable factor that might affect the sound of a note. A MIDI sequence is simply a list of these note and control messages indexed with respect to time. An additional advantage is that MIDI has evolved from a live performance orientation, and is very well suited for real time applications. The major drawback of MIDI is that since the music is represented as performance data, it requires external sound renderers (synthesizers, effects processors, mixers, etc.) to realize the music. These machines vary greatly in their capabilities and programming implementation. In general, every MIDI studio is unique, and the sound must be custom designed with the specific studio environment and gear in mind.
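
Schematically, a MIDI sequence can be pictured as nothing more than time-indexed messages. The sketch below shows the information such a sequence carries (not the binary Standard MIDI File layout); controller 7 is the standard channel volume controller.

    # A schematic picture of a MIDI sequence as time-indexed performance data
    # (the information it carries, not the binary Standard MIDI File layout).
    sequence = [
        # (time_in_beats, message_type, channel, data)
        (0.0, 'note_on',        1, {'pitch': 60, 'velocity': 96}),
        (0.0, 'note_on',        1, {'pitch': 64, 'velocity': 90}),
        (1.0, 'note_off',       1, {'pitch': 60}),
        (1.0, 'note_off',       1, {'pitch': 64}),
        (1.0, 'control_change', 1, {'controller': 7, 'value': 100}),   # channel volume
        (1.0, 'program_change', 1, {'program': 33}),                   # change instrument
    ]

    def events_between(sequence, start_beat, end_beat):
        """Pull out the messages falling inside a window of musical time."""
        return [ev for ev in sequence if start_beat <= ev[0] < end_beat]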

Digital audio sampling and playback is the major alternative to MIDI sequencing and sound synthesis. With digital audio, sound is recorded directly into the memory of a computer, where it can be processed, filtered, edited, looped, and otherwise manipulated. This is a less flexible method, since all the music must be performed and recorded ahead of time, and cannot be composed on the fly. Additionally, audio samples require enormous amounts of disk space for storage and RAM for playback, especially with high quality stereo sound. This also limits the number of samples that can be played back and manipulated simultaneously. However, digital audio files rely much less on quirky external hardware, and can be ported across multiple platforms with reasonably consistent results. Also, well-produced and completed source material can be directly adapted to digital audio for interactive applications. It is also currently the best means of reproducing, in a computer environment, dialogue, vocal music with lyrics, and other sounds which cannot be easily synthesized.
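
The storage demands are easy to make concrete. Assuming CD-quality figures of 44,100 samples per second, 16 bits per sample, and two channels, uncompressed audio consumes on the order of ten megabytes per minute:

    def audio_storage_bytes(seconds, sample_rate=44100, bits_per_sample=16, channels=2):
        """Rough storage requirement for uncompressed digital audio."""
        return seconds * sample_rate * (bits_per_sample // 8) * channels

    # One minute of CD-quality stereo:
    #   60 * 44100 * 2 * 2 = 10,584,000 bytes, on the order of 10 megabytes.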

The digital audio and MIDI realms can be combined using MIDI controlled sampling devices, which can trigger samples and manipulate them with the same flexibility as with any other sound source. Such sample playback devices can be emulated on a computer and may allow a complete, self-contained system for realizing an interactive music product.
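
A software emulation of such a MIDI-triggered sampler can be sketched in a few lines; sample_bank and play_sample are hypothetical stand-ins for the loaded audio buffers and the audio-output call.

    def handle_note_on(message, sample_bank, play_sample):
        """Emulate a MIDI-triggered sampler in software.

        sample_bank maps MIDI note numbers to loaded audio buffers; play_sample
        is a hypothetical audio-output call that accepts a playback gain.
        """
        buffer = sample_bank.get(message['pitch'])
        if buffer is not None:
            play_sample(buffer, gain=message['velocity'] / 127.0)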


Character-Based Music-Driven Animation Techniques

The concept of a character as a fundamental organizing unit has been central to my approach. A character consists of several things: the component artwork that comprises the different cel cycles or behavioral loops; the musical sequences, samples, or algorithms for the character; and modules of code to generate behavior, to map MIDI input to the animation, to receive and interpret input signals from the user to the character, and to coordinate the component elements in real time. One of the main musical objectives is to consistently identify the individual screen characters with different voices or instruments in the musical arrangement.
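
A schematic sketch of how such a character might be bundled follows; the field names and method bodies are illustrative rather than my actual data structures.

    class Character:
        """Organizing unit for one on-screen musician (a schematic sketch)."""

        def __init__(self, name, cel_cycles, sequences, midi_channel):
            self.name = name
            self.cel_cycles = cel_cycles      # behavioral animation loops
            self.sequences = sequences        # riffs, samples, or generative rules
            self.midi_channel = midi_channel  # the voice this character "owns"
            self.state = 'idle'
            self.current_frame = None

        def on_midi(self, message, draw_frame):
            # Only react to the voice assigned to this character, so the
            # on-screen figure stays identified with one instrument in the mix.
            if message.get('channel') == self.midi_channel:
                self.current_frame = self.cel_cycles[self.state].next_frame(message)
                draw_frame(self.current_frame)

        def on_user_input(self, signal):
            # A click, drag, or collision aimed at this character can change
            # its behavioral state and hence the music it contributes.
            self.state = signal.get('new_state', self.state)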

To this end, an important method of achieving a meaningful link between musical and visual elements is through a technology I have dubbed MIDI Puppets. These are animated screen characters whose moment-to-moment movements are generated by reading musical data from an incoming MIDI stream and calculating the appropriate position to correspond to a musical event. For example, a singer would open its mouth when a Note On command arrived on the MIDI channel that triggered the voice associated with the character. This is a very strong technique, since the character is being controlled by the same data as the synthesizer responsible for rendering the audio portion of the simulation, and the same animation engine will drive a character for any music, whether a sequenced composition, an algorithmically-created composition, or input from a live musical performance.
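
In Python-flavored pseudocode, the core of such a puppet driver is simple; the channel number and the draw_frame call are illustrative assumptions.

    SINGER_CHANNEL = 3   # illustrative: the channel carrying the singer's voice

    def drive_puppet(midi_message, current_frame, draw_frame):
        """Move a singer puppet from the same MIDI stream that feeds the synthesizer.

        midi_message is a dict with 'type', 'channel', 'pitch', and 'velocity'
        fields; draw_frame is a hypothetical call into the animation engine.
        """
        if midi_message['channel'] != SINGER_CHANNEL:
            return current_frame
        if midi_message['type'] == 'note_on' and midi_message.get('velocity', 0) > 0:
            current_frame = 'mouth_open'
        elif midi_message['type'] in ('note_off', 'note_on'):   # a velocity-0 note_on also ends a note
            current_frame = 'mouth_closed'
        draw_frame(current_frame)
        return current_frame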

For some puppets, I have employed a refined version of this method which differentiates note triggers and pitch values and combinatorially derives the correct behavior for the puppet for the current moment. For example, my GigMe Drummer can differentiate six ranges of pitch values, corresponding (in the General MIDI specification) to bass drum, snare drum, hi-hat pedal, hi-hat stick, crash cymbal, and ride cymbal. Whenever a MIDI signal is received, it is evaluated against the six drum types (remaining valid for fifty milliseconds, which is 1.5 times the duration of an animation frame). Based on a default state of the Drummer hitting no drums, I have derived what is essentially a six-dimensional matrix of behavioral response. The code that comprises the drummer's animation engine is very modular and can be easily recalibrated to control any set of character artwork along any set of MIDI parameters.
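
A simplified sketch of the hit-classification step follows; the exact pitch boundaries shown are illustrative approximations of the General MIDI percussion map rather than the precise ranges used in the Drummer.

    FRAME_MS = 33            # roughly one animation frame at 30 frames per second
    HIT_VALID_MS = 50        # a hit stays "active" for about 1.5 frames

    # Illustrative pitch ranges for the six drum types; General MIDI percussion
    # places bass drum, snare, hi-hats, crash, and ride in roughly these
    # neighborhoods, but the exact boundaries here are an assumption.
    DRUM_RANGES = {
        'bass_drum':    range(35, 37),
        'snare_drum':   range(38, 41),
        'hihat_stick':  (42, 46),          # closed and open hi-hat
        'hihat_pedal':  range(44, 45),
        'crash_cymbal': range(49, 50),
        'ride_cymbal':  range(51, 54),
    }

    def classify_hit(pitch):
        for drum, pitches in DRUM_RANGES.items():
            if pitch in pitches:
                return drum
        return None

    def drummer_state(recent_hits, now_ms):
        """Combine all hits still valid within the 50 ms window into one pose.

        recent_hits is a list of (time_ms, pitch) pairs; the returned set of
        active drums indexes into the behavioral matrix that selects artwork.
        """
        active = set()
        for hit_time, pitch in recent_hits:
            if now_ms - hit_time <= HIT_VALID_MS:
                drum = classify_hit(pitch)
                if drum:
                    active.add(drum)
        return active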


Narrative and Navigation

Another part of the interface issue deals with how to navigate a non-linear musical space, how to enable a user to go to different parts of a song. This becomes especially important when using music to relate a story and particular pieces of music must be matched with specific narrative events. The three basic methods of moving forward in this type of environment are object differentiation, spatial differentiation and temporal or state-dependent differentiation.

Object differentiation is best exemplified by my implementation of the musician-character concept. Each character has his own musical state which is influenced by global factors. To the user this means that interacting with a given character (by clicking, dragging, shooting, colliding, or whatever) will have a consistent musical result, usually associated with a particular instrument in the mix.

With spatial differentiation, different areas or objects on the screen are mapped to specific musical events. For example, the melody being sung by a backup singer in a choir is directly dependent on which riser the character is standing on. Similarly, a bass player can groove in one of several ways depending on where on the stage he is standing. Another character, when being driven around the stage by the user, will trigger a percussion sound whenever he takes a step. The volume at which each character is playing may be directly proportional to the distance between the cursor and that character.
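
A sketch of two of these spatial mappings; the specific functions are illustrative.

    def riser_melody(riser_index, melodies):
        """The riser a backup singer stands on selects the line being sung."""
        return melodies[riser_index % len(melodies)]

    def character_volume(char_pos, cursor_pos, max_distance):
        """Scale a character's MIDI volume (0-127) with its distance from the cursor.

        The mapping could just as easily be inverted so that nearness means loudness.
        """
        dx, dy = char_pos[0] - cursor_pos[0], char_pos[1] - cursor_pos[1]
        distance = (dx * dx + dy * dy) ** 0.5
        return int(min(distance, max_distance) / max_distance * 127)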

A third method is temporal or state differentiation. By this I mean that a change in behavior is triggered by a sequence of specific inputs in time or a complex relationship in the change of several conditional states. One example of this is that when a character is clicked in an Attract Mode, it triggers a solo riff, whereas the same click in a Groove Mode results in a different behavior, such as moving to a new position and singing a different line. Many other examples can be created from this principle.
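
A sketch of this kind of mode-dependent response, with hypothetical callbacks standing in for the actual behaviors:

    ATTRACT_MODE, GROOVE_MODE = 'attract', 'groove'

    def on_character_clicked(character, mode, play_solo_riff, move_and_sing):
        """The same click means different things in different modes.

        play_solo_riff and move_and_sing are hypothetical stand-ins for the
        behaviors wired into a particular environment.
        """
        if mode == ATTRACT_MODE:
            play_solo_riff(character)
        elif mode == GROOVE_MODE:
            move_and_sing(character, new_position='next_mark', line='alternate')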

These methods can be combined to create a variety of rich interactive musical experiences.