Sound Design at Edinburgh Napier University

How Do You Design a Virtual Instrument? Alejandro Cabrera on Sampling, Sound Design, and Building Kontakt Libraries

How do you design a virtual instrument?

Every virtual instrument begins long before the first note is recorded. Musicians often experience sample libraries as polished products that load instantly inside a digital audio workstation, responding naturally to every performance. Hidden behind that apparent simplicity lies an extraordinary amount of planning, recording, editing and technical development. During an online guest lecture for Edinburgh Napier University, Sound Design alumnus Alejandro Cabrera drew upon his professional experience developing sample libraries at 8Dio to reveal how professional virtual instruments are created. Although he used Kontakt to illustrate many of the techniques, his wider message extended well beyond any individual software platform. Successful sound design depends as much upon preparation, organisation and critical listening as it does upon recording itself.

Cabrera began by challenging a common misconception. Building a virtual instrument is not simply a matter of recording every note and loading the resulting files into a sampler. Instead, it is a carefully structured process comprising pre-production, recording, editing and software development, with each stage influencing everything that follows. Recording sessions may occupy only a small proportion of the overall project, yet their success depends almost entirely upon the decisions made beforehand. Choosing the instrument, selecting an appropriate recording space, determining microphone configurations, deciding which articulations should be captured and calculating the number of samples required all take place before the recording engineer presses record. By the time the first note is performed, many of the most significant creative decisions have already been made.

Planning emerged as one of the defining themes of Cabrera’s presentation. Recording studios are expensive environments in which every unnecessary decision consumes valuable time. Arriving without a detailed recording plan risks producing inconsistent material, overlooking essential articulations or capturing far more audio than the finished instrument will ever require. To avoid these problems, Cabrera demonstrated the production sheets used to calculate precisely how many samples each instrument will need. The combination of notes, microphone positions, dynamic layers, articulations and recorded variations quickly expands into thousands of individual files. Even a comparatively modest instrument can generate an unexpectedly large collection of audio once every variation has been considered. Careful preparation therefore becomes far more than administrative organisation. It provides the framework upon which the entire virtual instrument will later be constructed.
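
The arithmetic behind such a production sheet can be sketched in a few lines of Python. The figures below are hypothetical and were not taken from Cabrera's sheets. The sketch also assumes that every microphone position is captured during the same take, so microphones multiply the number of files but not the number of performances.

```python
from math import prod

# Hypothetical planning figures for a modest instrument (not from the lecture).
PLAN = {
    "notes": 36,
    "articulations": 4,
    "dynamic_layers": 3,
    "round_robins": 4,
    "mic_positions": 3,
}


def total_files(plan):
    """Files on disk: every dimension of the plan multiplies the count."""
    return prod(plan.values())


def performances(plan):
    """Takes the player must perform, assuming all microphones record each take at once."""
    return prod(count for name, count in plan.items() if name != "mic_positions")


print(total_files(PLAN))   # 5184
print(performances(PLAN))  # 1728
```

Adding a single extra dynamic layer to this hypothetical plan would raise the file count from 5,184 to 6,912, which is why each dimension is worth questioning before the session is booked.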

This emphasis upon preparation reflects a broader principle that extends well beyond sample library development. Whether recording Foley, ambience, dialogue or musical instruments, professional sound designers rarely begin by placing microphones in front of a source and hoping for the best. They begin by asking what the finished project needs to achieve. Every technical decision should support that objective. Microphone placement depends upon the character of the instrument, the intended listening experience and the amount of flexibility required during production. Recording an intimate acoustic instrument demands different decisions from sampling a full drum kit with multiple microphone positions, while noisy environments require different strategies from carefully controlled studio spaces. Cabrera encouraged students to think of recording not as an isolated technical exercise, but as one stage within a much larger design process in which every decision influences those that follow.

One particularly revealing discussion centred upon how rapidly complexity increases once realism becomes the goal. Professional sample libraries rarely rely upon a single recording of each note. Different playing dynamics, alternative articulations, multiple microphone positions and repeated performances all contribute towards creating an instrument that responds naturally to the performer. Cabrera introduced concepts such as velocity layers and round robins, not simply as software features, but as perceptual design decisions. Human listeners detect repeated sounds remarkably quickly. Replaying exactly the same recording whenever a note is triggered produces an artificial, mechanical quality that immediately reveals the illusion. Recording carefully controlled variations allows the instrument to remain convincing even during repeated passages, illustrating that realism often depends less upon producing more sound than upon introducing meaningful variation. The objective is not to simulate every possible performance. It is to create enough believable variation that musicians stop thinking about the technology and simply play.
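
The interaction between velocity layers and round robins can be illustrated with a minimal sketch. It is written in Python rather than Kontakt's own scripting language, and the layer boundaries, layer names, number of round robins and sample naming are all assumptions made for illustration, not details from the lecture.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper MIDI velocity bound of each dynamic layer (hypothetical split points).
LAYER_UPPER_BOUNDS = [50, 100, 127]
LAYER_NAMES = ["soft", "medium", "loud"]
ROUND_ROBINS = 3  # recorded variations per note and layer (assumed)


class RoundRobinSampler:
    """Chooses which recording to play for an incoming note."""

    def __init__(self):
        # Remembers the next variation to use for each (note, layer) pair.
        self._next_rr = defaultdict(int)

    def pick(self, note, velocity):
        if not 1 <= velocity <= 127:
            raise ValueError("MIDI note-on velocity must be between 1 and 127")
        # The velocity decides the dynamic layer.
        layer = LAYER_NAMES[bisect_left(LAYER_UPPER_BOUNDS, velocity)]
        # The round-robin counter decides which variation of that layer plays,
        # so an immediate repeat never reuses the same recording.
        key = (note, layer)
        rr = self._next_rr[key]
        self._next_rr[key] = (rr + 1) % ROUND_ROBINS
        return f"note{note}_{layer}_rr{rr + 1}"


sampler = RoundRobinSampler()
for _ in range(4):
    print(sampler.pick(60, 80))
```

Repeating the same note at the same velocity cycles through three different recordings before returning to the first, which is the behaviour that prevents the mechanical quality described above.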

By this point, a recurring theme had become unmistakable. Building a convincing virtual instrument is not primarily a software problem. It is a sound design problem. The quality of the finished library depends upon understanding the instrument, anticipating how musicians will perform with it and making thoughtful decisions long before the first recording session begins. Technology undoubtedly provides the tools, though preparation, organisation and critical listening determine how successfully those tools can ultimately be used.

Once the recordings have been completed, the project enters what is often the longest and least visible stage of development. Thousands of individual recordings must be reviewed, edited and organised before they can become a playable instrument. Cabrera emphasised that this work extends far beyond removing unwanted noise or trimming the beginnings and endings of files. Every sample must behave consistently alongside every other sample, allowing the finished instrument to respond naturally regardless of how it is played. Editing therefore becomes a continuation of the design process rather than a separate technical activity. Decisions made at this stage shape the responsiveness of the instrument every bit as much as the recordings themselves.

Organisation proved equally important. A professional sample library may contain many thousands of individual audio files representing different notes, articulations, dynamic levels, microphone positions and performance variations. Without a rigorous naming convention and carefully structured file management, even relatively modest projects quickly become difficult to maintain. Cabrera demonstrated how systematic organisation supports every subsequent stage of development. Samples can be located immediately, revisions become easier to implement and future updates remain manageable long after the original recording sessions have finished. Good organisation rarely attracts attention, yet it underpins almost every successful production workflow.
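
A naming convention of this kind can be enforced in code as well as by habit. The sketch below uses an invented pattern (instrument, articulation, note, dynamic, round robin, microphone); it is not the convention Cabrera demonstrated, only an example of how one consistent scheme allows files to be both generated and read back reliably.

```python
import re
from typing import NamedTuple


class SampleName(NamedTuple):
    """The fields encoded in one sample's filename (hypothetical scheme)."""
    instrument: str
    articulation: str
    note: str
    dynamic: str
    round_robin: int
    mic: str

    def filename(self):
        return (f"{self.instrument}_{self.articulation}_{self.note}_"
                f"{self.dynamic}_rr{self.round_robin:02d}_{self.mic}.wav")


_PATTERN = re.compile(
    r"(?P<instrument>[a-z0-9]+)_(?P<articulation>[a-z0-9]+)_"
    r"(?P<note>[A-G]#?-?\d)_(?P<dynamic>[a-z]+)_rr(?P<round_robin>\d{2})_"
    r"(?P<mic>[a-z0-9]+)\.wav"
)


def parse(filename):
    """Recover the fields from a filename, rejecting anything off-convention."""
    match = _PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"does not follow the naming convention: {filename}")
    fields = match.groupdict()
    fields["round_robin"] = int(fields["round_robin"])
    return SampleName(**fields)


name = SampleName("cello", "sustain", "C3", "mf", 2, "close")
print(name.filename())         # cello_sustain_C3_mf_rr02_close.wav
print(parse(name.filename()))
```

Because every field can be read back from the filename, a misnamed file is caught as soon as it is parsed rather than months later during an update.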

The discussion then turned to Kontakt, the software platform used to assemble these recordings into fully playable virtual instruments. Rather than presenting Kontakt as a collection of technical features, Cabrera used it to demonstrate a broader principle. Software should serve the behaviour of the instrument rather than dictate it. Every mapping decision, performance control and scripting choice exists to make the instrument respond in ways that feel intuitive to the musician. The objective is not simply to trigger recordings accurately, but to create the impression that a real instrument is responding naturally to performance. Technology becomes valuable only when it disappears behind the experience of playing.

This philosophy also shaped Cabrera’s discussion of scripting. Many musicians never see the programming that sits beneath the graphical interface, yet these invisible systems determine how the instrument behaves. Scripts decide which recordings should be triggered, how different articulations are selected, how repeated notes vary over time and how controls respond to the performer. Much of the intelligence within a modern virtual instrument therefore lies not in the recordings themselves, but in the logic that governs their behaviour. Sound design, software engineering and user experience become closely interconnected, each contributing towards the illusion that the performer is interacting with a coherent musical instrument rather than a collection of audio files.

Throughout the discussion, Cabrera consistently resisted the temptation to equate realism with complexity. Recording more samples, adding more controls or increasing the number of available options does not automatically produce a better instrument. Every additional recording increases editing time, complicates organisation and places greater demands upon storage, processing power and the musician using the library. The more important question concerns value rather than volume. Which additional recordings genuinely improve the playing experience, and which merely increase complexity without offering meaningful benefit? Successful virtual instruments emerge through thoughtful selection rather than unlimited accumulation.

These decisions reflect a much broader principle within sound design. Whether recording dialogue, creating Foley, designing interactive game audio or developing sample libraries, practitioners continually shape the listener’s experience by deciding which details deserve attention and which can remain implicit. Technology undoubtedly expands the range of available possibilities, though it rarely removes the need for editorial judgement. Every successful project depends upon identifying the information that listeners or performers genuinely need, then presenting it clearly without unnecessary complication. The objective is not technical excess, but meaningful communication.

The discussion also highlighted the collaborative nature of professional practice. Developing a virtual instrument combines disciplines that are often treated separately within education and industry. Recording engineers, musicians, software developers, editors, interface designers and producers each contribute different forms of expertise, yet the finished instrument succeeds only when those contributions work together coherently. Cabrera’s examples demonstrated that professional sound design rarely develops in isolation. The most effective solutions emerge when technical and creative perspectives continually inform one another throughout the production process rather than being treated as independent stages.

Taken together, these discussions revealed that virtual instruments represent far more than collections of recorded sounds. They are carefully designed systems that combine acoustics, performance, recording, editing and software into a single expressive tool. Every decision, from the earliest planning documents to the final user interface, contributes towards the illusion that a performer is interacting with a living instrument rather than triggering digital recordings. For sound designers, perhaps that is the most enduring lesson. The success of a design is rarely determined by the sophistication of its technology alone. It depends upon how completely the technology disappears, allowing creativity, expression and musical performance to take centre stage.

4 May 2026
How Do You Make an Orchestra Fit Inside a Television Show? Phil McGowan on Recording, Mixing, and the Sound of Star Trek: Picard

How do you make an orchestra fit inside a television show?

At first glance, the answer appears straightforward. Musicians gather in a studio, microphones are placed around the room, a conductor raises a baton, and the music is recorded. Yet during his online guest lecture for Edinburgh Napier University, recording and mixing engineer Phil McGowan revealed a process that is considerably more complex. Drawing upon his work on Star Trek: Picard, McGowan described a world of orchestral recording that combines musical performance, engineering, editing, production management, and problem-solving. By the end of the lecture, it became clear that recording an orchestra is only one small part of a much larger process. Throughout the lecture, McGowan repeatedly returned to the importance of preparation, organisation, and communication. Although microphones, software, and recording techniques played important roles, many of the challenges he described ultimately concerned coordinating people, decisions, and workflows across an unusually complex production process.

McGowan began by introducing the recording sessions for the third season of Star Trek: Picard. Across ten episodes, the score was recorded using large orchestral forces, with most episodes featuring a sixty-five-piece ensemble recorded at Warner Brothers Studios in Burbank. For the majority of the season, the orchestra was divided across separate recording sessions. Strings and woodwinds were recorded together, while brass was recorded later. Only the final episode brought the entire eighty-piece orchestra into the room simultaneously. Although audiences often imagine a film score as a single orchestra performing together, McGowan explained that modern production frequently relies upon these layered recording approaches. Recording sections separately provides greater flexibility during mixing while allowing music editors and dubbing mixers more control later in the production process.

Yet even before a note is recorded, a surprising number of decisions have already been made. The placement of every section within the room affects both the recording and the eventual mix. Strings, woodwinds, brass, piano, harp, and other instruments each occupy carefully chosen positions. Microphone placement becomes equally important. Looking at the recording diagrams shown during the lecture, it was difficult not to be struck by the sheer number of microphones involved. Individual sections receive dedicated spot microphones, larger groups receive overhead microphones, and the entire orchestra is captured by an array of room microphones positioned high above the ensemble.

What was particularly interesting, however, was McGowan’s repeated emphasis that the most important microphones are often not the closest ones. In a well-designed scoring stage, much of the orchestra’s character emerges from a relatively small number of carefully positioned room microphones. Spot microphones provide detail, definition, and control, though the overall impression of the orchestra often comes from the way the ensemble interacts with the acoustic space itself. Rather than constructing an orchestral sound entirely from individual instruments, the recording process begins with capturing the orchestra as a unified musical body.

This relationship between detail and cohesion appeared repeatedly throughout the lecture. Modern recording technology allows engineers to place microphones extremely close to instruments. Individual players can be isolated with remarkable precision. Yet McGowan’s approach demonstrates considerable restraint. Spot microphones are available when needed, though many remain relatively low in the final mix. The objective is not to maximise separation. Instead, it is to preserve the sense that listeners are hearing a single orchestra performing together within a shared acoustic environment.

Recording the orchestra is only the beginning. Once the sessions finish, the material enters a complex process of editing and mixing. Here, McGowan’s role becomes particularly interesting. The raw recordings arrive alongside extensive collections of programmed material supplied by the composer. Modern television scores often combine live orchestral recordings with sampled instruments, synthesizers, percussion libraries, pads, textures, and electronic elements. One of the mixer’s responsibilities is deciding how these different layers should coexist.

What emerged from the lecture was a strong preference for using the live recordings whenever possible. Sampled instruments often provide useful support, additional weight, or subtle reinforcement, though McGowan repeatedly emphasised that the live orchestra remains the foundation of the sound. The samples are rarely intended to replace the musicians. Instead, they are carefully blended into the mix where appropriate.

Organisation becomes essential at this stage. Large orchestral sessions generate enormous numbers of tracks. Strings, brass, woodwinds, percussion, piano, harp, synthesizers, effects, and auxiliary elements all require separate management. McGowan demonstrated how sessions are organised into stems, allowing different components of the score to be adjusted independently later in the production process. These stems become particularly important when the music eventually reaches the dubbing stage, where it must coexist with dialogue, sound effects, Foley, ambience, and every other element of the soundtrack.
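
The practical value of stems can be shown with a small numerical sketch. It assumes only the general principle that stems are submixes which sum to the full music mix; the stem names, sample values and trim amount are invented, and nothing here represents McGowan's actual session layout.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear gain factor."""
    return 10 ** (db / 20)


def mix(stems, trims_db=None):
    """Sum the stems sample by sample, applying an optional per-stem trim in dB."""
    trims_db = trims_db or {}
    lengths = {len(samples) for samples in stems.values()}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same length")
    out = [0.0] * lengths.pop()
    for name, samples in stems.items():
        gain = db_to_gain(trims_db.get(name, 0.0))
        for i, value in enumerate(samples):
            out[i] += gain * value
    return out


# Two tiny hypothetical stems, two samples long.
stems = {"strings": [0.5, 0.5], "brass": [0.25, -0.25]}

print(mix(stems))                    # full mix: [0.75, 0.25]
print(mix(stems, {"brass": -20.0}))  # brass pulled down, strings untouched
```

With a single stereo mix, lowering the brass under a line of dialogue would lower everything else as well. Delivering stems keeps that decision open until the dubbing stage.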

This relationship between music and the rest of the soundtrack formed one of the most revealing parts of the discussion. Audiences often imagine that a score reaches the screen in essentially the same form in which it leaves the recording studio. McGowan demonstrated that the reality is considerably more complicated. The music mixer occupies a position between composition and final dubbing, shaping material that has yet to meet the rest of the soundtrack.

This creates an unusual challenge. During the mixing process, the final soundtrack often does not yet exist. Dialogue may still be evolving. Effects tracks may be incomplete. Editorial changes may continue arriving. The mixer therefore works partly with the present version of the programme and partly with an anticipated future version. Decisions must account not only for what is currently on screen but also for what will eventually happen when the material reaches the dubbing stage.

In this sense, music mixing becomes an act of translation. The composer’s intentions need to remain intact, though they must also survive the practical realities of television production. A passage that sounds spectacular in isolation may compete with dialogue once the final soundtrack is assembled. A delicate orchestral texture may disappear beneath effects. A dramatic crescendo may need flexibility if the editorial structure changes. The mixer therefore balances musical priorities with narrative requirements, ensuring that the score remains expressive while still serving the larger needs of the programme.

McGowan described the importance of communication throughout this process. Conversations with composers, music editors, producers, and re-recording mixers help establish how the material will ultimately be used. Stem structures become especially valuable here. By separating different orchestral and electronic elements into organised groups, later stages of production retain the flexibility needed to support storytelling decisions. What appears to be a purely technical workflow is therefore deeply connected to narrative concerns.

Seen in this light, the music mixer occupies a remarkably important position within the production chain. The role involves much more than balancing levels or applying plug-ins. It requires understanding composition, orchestration, recording, editing, post-production, and storytelling simultaneously. The objective is not simply to make the music sound good. The objective is to ensure that the music can fulfil its dramatic function once every other element of the soundtrack is finally assembled.

Questions of storytelling therefore remain central throughout the process. Although the lecture contained detailed discussions of microphones, reverbs, routing structures, and plug-ins, these technical topics were rarely presented as ends in themselves. Instead, they were framed as tools supporting dramatic communication. Reverb is not merely an acoustic effect. It helps create scale, atmosphere, and emotional character. Stem structures are not simply organisational devices. They provide flexibility for storytelling. Even microphone choices ultimately serve narrative goals.

A particularly striking example emerged in McGowan’s discussion of reverberation. For Star Trek: Picard, the production deliberately embraced a more expansive orchestral sound inspired by earlier generations of science-fiction scoring. Rather than pursuing absolute clarity or dryness, the score was allowed to inhabit larger acoustic spaces. The resulting sound connects contemporary production practices with a tradition associated with composers such as Jerry Goldsmith and James Horner. Listening to McGowan describe these decisions, it became clear that technical choices often carry historical and aesthetic significance as well.

The lecture also offered a fascinating glimpse into the practical realities of large-scale media production. Television schedules are rarely generous. Recording sessions must fit within union regulations, musicians’ availability, studio bookings, editorial deadlines, and dubbing schedules. Scores are often recorded while other parts of the production remain unfinished. Picture edits may continue evolving. Visual effects may still be in development. Deadlines continue approaching regardless.

Under such conditions, consistency becomes invaluable. McGowan described how recording setups, templates, routing structures, and mixing approaches are designed to remain stable across multiple episodes. Establishing reliable systems allows creative decisions to happen more efficiently. Rather than reinventing workflows repeatedly, engineers can focus their attention on the musical and dramatic needs of each project.

Another recurring theme throughout the lecture was collaboration. Large orchestral productions depend upon extensive networks of expertise. Composers, orchestrators, contractors, recording engineers, Pro Tools operators, music editors, re-recording mixers, musicians, producers, and showrunners all contribute to the final result. No individual controls every aspect of the process. Instead, successful productions emerge through coordination between specialists whose work overlaps at crucial moments.

Listening to McGowan describe recording sessions, one gains a strong sense of the trust involved. Musicians are trusted to perform complex scores with remarkable efficiency. Engineers are trusted to capture those performances accurately. Music editors are trusted to manage revisions and conforming. Dubbing mixers are trusted to integrate the score into the larger soundtrack. The finished music reflects not only technical skill but also a highly collaborative production culture.

Perhaps the most interesting aspect of the lecture was the way it challenged romantic ideas about orchestral recording. Popular accounts often focus on dramatic moments: the orchestra enters the room, the conductor raises a baton, and the music comes to life. Those moments certainly exist. Yet McGowan’s account suggests that the real craft often lies elsewhere. It lies in preparation, organisation, consistency, communication, editing, and the countless small decisions that allow large productions to function successfully.

Looking back across the lecture, what emerges most clearly is not simply a story about recording orchestras. It is a story about connecting different stages of a creative process. Recording sessions, editing workflows, stem preparation, music mixing, and final dubbing all form part of a chain in which every decision influences what follows. Managing that chain requires technical expertise, though it also requires communication, anticipation, and an understanding of how music functions within narrative storytelling.

Every stage of the process involves balancing competing demands. Technical precision must coexist with musical expression. Flexibility must coexist with consistency. Individual details must support larger dramatic goals. The orchestra must sound impressive in its own right while still serving the needs of the programme.

For students interested in recording, mixing, or film music production, this may be the lecture’s most valuable lesson. Technology remains important. Microphones matter. Software matters. Recording techniques matter. Yet none of these elements exist in isolation. They are part of a larger system whose purpose is ultimately narrative. The audience does not hear microphone placements, stem structures, or routing templates. They hear music supporting a story.

For Phil McGowan, the challenge is not simply recording an orchestra. The challenge is shaping hundreds of performances, thousands of audio tracks, and countless technical decisions into something that helps bring a fictional world to life. By the time audiences sit down to watch Star Trek: Picard, most of that work has become invisible. The orchestra feels as though it simply belongs there. Achieving that illusion, however, requires an extraordinary amount of craft.

2 February 2026
How Does a Whisky Glass Become an Orchestra? Trevor Wishart on Transformation, Imagination, and Sound

How much can a sound become?

Most of us think of sounds as belonging to identifiable sources. A glass sounds like a glass. A bell sounds like a bell. A voice sounds like a voice. Recording technology allows sounds to be edited, layered, stretched, filtered, and transformed, though we often assume that their essential identity remains tied to the object that created them. During his online guest lecture for Edinburgh Napier University, composer, author, and software developer Trevor Wishart challenged this assumption repeatedly. Drawing on examples from his electroacoustic composition Imago, he explored how a single recorded sound can evolve into something entirely different, revealing possibilities hidden within the material itself.

The lecture centred on a piece whose title provides an important clue to Wishart’s thinking. Imago refers to the final stage of insect metamorphosis, the moment when an apparently unremarkable pupa becomes a butterfly. For Wishart, this process offered more than a title. It provided the conceptual foundation for the composition itself. The piece begins with an extremely modest source: two whisky glasses gently clinking together. From that brief event, lasting only fractions of a second, an entire musical world gradually emerges. Bells, birds, voices, gamelan-like textures, immense resonant structures, and oceanic soundscapes all grow from the same source material. The lecture therefore became an exploration of how transformation occurs, not only within music but within listening itself.

Wishart explained that his compositions often begin with two parallel motivations. One is technical. He wants a problem to investigate, a process to develop, or a question that requires experimentation. The other is poetic. There needs to be a broader reason for making the piece beyond demonstrating a particular technique. Neither is sufficient on its own. Technical ingenuity without expressive purpose quickly becomes sterile, while expressive intentions without any technical challenge provide little opportunity for discovery. Much of his work emerges from the interaction between these two impulses. The technical challenge creates opportunities. The artistic idea provides direction.

This relationship also helps explain why software occupies such an important place in his practice. During the lecture, Wishart reflected on the period when electronic composers often relied upon specialised hardware systems. Such equipment was often expensive and inflexible, and it was frequently superseded. Learning to program offered a different possibility. Rather than adapting ideas to the limitations of existing tools, it became possible to create processes tailored to specific creative questions. More importantly, software allowed entirely new forms of transformation to be explored. If a process did not already exist, it might be possible to invent it.

Yet what emerges most clearly from Wishart’s account is that invention is rarely the final objective. Again and again, he described composition as a process of exploration. Sounds are transformed not simply to produce novel effects but to discover possibilities hidden within them. Certain experiments fail. Others reveal unexpected directions. Some transformations produce results that could never have been predicted in advance. Listening becomes as important as designing. The composer is not merely constructing sounds. The composer is searching for relationships, behaviours, and opportunities that emerge through experimentation.

The opening of Imago illustrates this approach particularly clearly. The piece begins with isolated whisky-glass impacts separated by substantial periods of silence. The pace is deliberately restrained. Contemporary listeners, accustomed to rapid development, may initially wonder where the material is heading. Yet this simplicity serves an important purpose. If the work concerns metamorphosis, the listener needs to encounter the pupa before encountering the butterfly. The source material remains visible, or rather audible, long enough for its later transformations to carry meaning.

What makes the whisky glass such productive material is the complexity concealed within an apparently simple sound. Strike a glass and a resonance emerges. Listen more carefully and the sound reveals an intricate internal structure. The attack contains numerous frequencies that appear and disappear extremely rapidly. Ordinarily these details pass unnoticed. The event ends too quickly for individual components to be heard. By stretching the sound in time, however, hidden layers become accessible. Frequencies separate. Tiny fluctuations become audible. A sound that initially appeared straightforward begins to reveal unexpected richness.

One of the most memorable moments in the lecture emerged from a story about washing glasses. Wishart described noticing that repeated impacts between two heavy whisky glasses produced an unusual perceptual effect. As the impacts accelerated, there came a point at which they ceased to be heard as individual events. Instead, they fused into a continuous rising pitch. What began as a mundane domestic observation suddenly revealed a remarkable musical possibility. A sequence of impacts had become a tone. More importantly, it suggested a route through which one kind of sound might transform into another.
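
The transition from separate impacts to a tone can be sketched numerically. The code below is not Wishart's software, and the threshold of roughly twenty events per second is a commonly quoted approximation for the lower limit of pitch perception, used here as an assumption rather than a figure from the lecture. It builds a series of accelerating impact times and finds where the repetition rate first crosses that threshold.

```python
FUSION_RATE_HZ = 20.0  # approximate threshold, assumed for illustration


def accelerating_impacts(first_gap_s, ratio, count):
    """Onset times of impacts where each gap is `ratio` times the previous gap."""
    times, t, gap = [], 0.0, first_gap_s
    for _ in range(count):
        times.append(t)
        t += gap
        gap *= ratio
    return times


def instantaneous_rates(times):
    """Repetition rate in events per second between each pair of neighbouring impacts."""
    return [1.0 / (later - earlier) for earlier, later in zip(times, times[1:])]


def first_fused_index(times, threshold=FUSION_RATE_HZ):
    """Index of the first gap whose rate reaches the threshold, or None if none does."""
    for i, rate in enumerate(instantaneous_rates(times)):
        if rate >= threshold:
            return i
    return None


times = accelerating_impacts(first_gap_s=0.5, ratio=0.5, count=8)
print(instantaneous_rates(times))  # the rate doubles with every impact
print(first_fused_index(times))
```

Once the gaps are short enough, the rate itself behaves like a frequency, so continuing to accelerate the impacts would be heard as a rising pitch rather than a faster rhythm.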

Experiences such as this appear repeatedly throughout Wishart’s creative process. New ideas often emerge from moments that initially seem insignificant. A process behaves differently than expected. A sound reveals an unanticipated quality. An experiment generates an unexpected result. The challenge is recognising which discoveries deserve further attention. Throughout the lecture, curiosity appeared less as a personality trait than as a working method. Creative progress depends upon noticing what others might ignore.

As Imago unfolds, the whisky glasses gradually begin producing sounds that seem increasingly distant from their origin. Resonances expand into bell-like structures. Repeated transformations generate textures that suggest birdsong. Elsewhere, spectral manipulations create sounds with distinctly vocal qualities, as though fragments of speech are beginning to emerge from within the glass itself. None of these transformations completely abandons the original material. Traces of the source remain present, even as new identities begin to appear.

This ambiguity plays an important role within the work. Wishart is rarely concerned with creating perfect imitations. The objective is not to convince listeners that a whisky glass has literally become a bird or a human voice. Instead, he creates sounds that occupy a space between recognition and uncertainty. Listeners hear associations rather than direct representations. A transformed sound may suggest several different identities simultaneously. That tension between familiarity and strangeness gives many of the transformations their expressive character.

The lecture contained numerous examples of this process. Through synchronised transpositions, simple resonances begin forming complex harmonic structures. Spectral blurring allows sounds to emerge gradually from dense textures, creating the impression of material coming into focus. Distortions generate new timbral characteristics that feel organic rather than mechanical. Spatial movement contributes to the sense of evolution, allowing listeners to follow streams of sound as they separate, merge, and transform across the listening space. Each process extends the possibilities contained within the original material.

One particularly striking example involved a large gamelan-like passage that emerges later in the composition. Wishart was careful to explain that he had not set out with the intention of creating a gamelan ensemble from whisky glasses. The possibility emerged through experimentation. Once discovered, however, it became a major structural feature of the work. Earlier sections began functioning as anticipations. Later sections reflected upon what had been revealed. Relationships between different materials gradually became apparent. The composition developed not through the execution of a predetermined blueprint but through recognising patterns that emerged during the process itself.

A similar principle governs some of the work’s largest sonic landscapes. Through extensive transformation, the original material eventually produces textures that evoke oceans and breaking waves. These sounds are not realistic recordings of the sea, nor are they intended to be. Their effectiveness lies in the way they balance abstraction and association. Listeners recognise qualities that resemble waves while remaining aware that they are hearing something more complex. The illusion never becomes complete, and that incompleteness is part of its fascination.

Throughout the lecture, Wishart repeatedly returned to the importance of structure. Transformations alone are not enough. A composition requires relationships between events, phrases, sections, and larger formal shapes. To manage this complexity, he described working hierarchically. Individual sounds become events. Events become phrases. Phrases become sections. Sections become complete works. This approach allows material to remain flexible throughout development. Elements can be revised, expanded, condensed, or reorganised without losing their connection to the broader structure.

An equally revealing observation concerned sounds that might initially appear unsuccessful. Students often assume that every sound within a composition must be remarkable. Wishart suggested otherwise. Certain sounds function primarily as connections. They establish continuity, provide context, or prepare the listener for future developments. Their significance lies not in their individual impact but in their contribution to larger processes. The value of a sound cannot always be judged in isolation.

Looking back across the lecture, what emerges most clearly is not a philosophy of technology but a philosophy of listening. Software matters. Technical processes matter. Spectral transformations, distortions, interpolations, filters, and spatial manipulations all play important roles. Yet they ultimately serve a larger purpose. They create opportunities to discover possibilities hidden within sounds themselves.

For students of sound design, composition, and audio production, this may be the lecture’s most valuable lesson. Creativity is often imagined as the ability to invent entirely new ideas. Wishart’s work suggests something slightly different. New ideas may emerge through paying closer attention to existing ones. A familiar sound may contain far more than it initially reveals. The challenge is learning how to listen deeply enough, experiment patiently enough, and remain curious enough to discover what it might become.

In that sense, Imago is more than a composition about metamorphosis. It demonstrates a way of thinking about sound itself. Every sound contains unrealised possibilities. Given enough imagination, patience, and exploration, even the simplest of sources can become an entire world.

5 January 2026
Playing Along: When Music Is Part of the Game World

“We talk about music that originates from within the diegesis — and not from some non-diegetic player outside of it.”
— Axel Berndt

In a guest lecture on game audio, Dr.-Ing. Axel Berndt examined the role of diegetic music — music that exists within a game’s fictional world and can be heard, performed, or even disrupted by its characters. This kind of music, Berndt argued, is not background or emotional subtext. It is part of the world itself.

Berndt is a member of the Center of Music and Film Informatics within the Detmold University of Music, working at the intersection of sound design, musical interaction, and adaptive systems. His lecture brought together commercial examples, music-theoretic distinctions, and design considerations to illustrate how music behaves differently when it belongs to the world rather than framing it from outside.

Inside the World: What Makes Music Diegetic

Diegetic music refers to music that originates within the game’s diegesis — its fictional environment. Berndt described it as everything “within this world”: sounds that characters can hear and react to, including wind, speech, and music performed or played through in-world devices.

“If someone switches the radio on, triggers the music box, sings a song, or plays an instrument… their music is also diegetic.”

Examples included a street musician in The Patrician, a pipe player at a party, and the bard at the start of Conquest of the Longbow. In Doom 3, a gaming machine plays music within the scene; in Oceanarium, a robot performs in a clearly defined virtual space. These are not aesthetic flourishes — they anchor music in the logic of the world.

Berndt contrasted this with non-diegetic music, which accompanies a scene without being part of it — such as a film score swelling during a battle. “There is no orchestra sitting on an asteroid during the space battle,” he remarked, highlighting the artificiality of non-diegetic scoring in game environments that otherwise strive for realism.

Sound That Can Be Interrupted

Once music is part of the world, it becomes subject to physical space, interruption, and interaction.

“The simplest type of interaction may be to switch a radio on and off, but there is much more possible.”

Berndt categorised musical interactions as either destructive — disrupting a performance — or constructive, where player input enriches or alters the musical output. In Monkey Island 3, players must stop their crew from singing an extended shanty by choosing responses that are woven into the rhyme scheme. Each interruption is musical and interactive.

“The sequential order of verses and interludes is arranged according to the multiple choice decisions the player makes.”

Such scenes turn performance into a mechanic. Music is not a layer applied to gameplay — it is the gameplay.

When Music Isn’t Polished — And Why That Matters

Berndt emphasised that diegetic music should not always sound flawless. Real live performance includes irregularities: tuning fluctuations, missed notes, imperfect timing. Simulating these can enhance believability.

“Fluctuations of intonation, rhythmic asynchrony, wrong notes — these things simply happen in life situations. Including them brings a gain of authenticity.”

He cited the harmonica player in Gabriel Knight, whose wavering tone subtly reinforces the impression of a street musician with limited technical control. Imperfection isn’t failure — it is context-aware design.
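
One way to picture the kind of imperfection Berndt describes is a small "humanising" pass over a scored melody. The sketch below is only an illustration, not a method presented in the lecture: the `Note` type, the `humanise` function, and every parameter value (timing jitter, detune range, wrong-note probability) are hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Note:
    onset_s: float              # when the note starts, in seconds
    pitch_midi: int             # MIDI note number
    detune_cents: float = 0.0   # deviation from perfect intonation


def humanise(
    notes: List[Note],
    timing_jitter_s: float = 0.02,    # assumed value, not from the lecture
    max_detune_cents: float = 15.0,   # assumed value, not from the lecture
    wrong_note_prob: float = 0.02,    # assumed value, not from the lecture
    seed: Optional[int] = None,
) -> List[Note]:
    """Return a copy of `notes` with small, random performance flaws."""
    rng = random.Random(seed)
    performed = []
    for note in notes:
        # Rhythmic asynchrony: shift each onset slightly early or late.
        onset = max(0.0, note.onset_s + rng.uniform(-timing_jitter_s, timing_jitter_s))
        # Wrong notes: occasionally slip a semitone up or down.
        pitch = note.pitch_midi
        if rng.random() < wrong_note_prob:
            pitch += rng.choice([-1, 1])
        # Intonation fluctuation: a small random detune per note.
        detune = rng.uniform(-max_detune_cents, max_detune_cents)
        performed.append(Note(onset, pitch, detune))
    return performed


# A short scored phrase, then a slightly imperfect "street musician" rendition.
melody = [Note(0.0, 60), Note(0.5, 62), Note(1.0, 64), Note(1.5, 65)]
rendition = humanise(melody, seed=1)
```

Keeping the flaw ranges as parameters means a virtuoso character and a struggling busker could share the same score while sounding noticeably different.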

Berndt also warned against repetitive loops that expose the limits of a system. When the player leaves and re-enters a scene, and the same music starts again from the beginning, the world appears frozen. “We reached the end of the world,” he said. “There is nothing more to come.”

To counter this, he advocated techniques such as generative variation, asynchronous playback, and music that continues even when not audible — preserving the impression of an autonomous, living environment.
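
The idea of music that keeps playing while nobody is listening can be sketched very simply: rather than restarting a track when the player re-enters a scene, the playback position is derived from a continuously running world clock. This is a minimal sketch under that assumption; the function name `playback_offset` and its parameters are hypothetical and do not come from the lecture or from any particular game engine.

```python
def playback_offset(world_time_s: float, start_time_s: float, loop_length_s: float) -> float:
    """Return the position (in seconds) a looping piece has reached.

    world_time_s  -- current time on the game world's clock
    start_time_s  -- world time at which the in-world performance began
    loop_length_s -- duration of one pass through the piece
    """
    if loop_length_s <= 0:
        raise ValueError("loop_length_s must be positive")
    elapsed = world_time_s - start_time_s
    # The music has been "playing" the whole time, audible or not,
    # so the position depends only on elapsed world time.
    return elapsed % loop_length_s


# The tavern band started at world time 10 s and plays a 50 s piece.
# A player who walks back in at 130 s hears it 20 s into the piece,
# not from the beginning.
offset = playback_offset(130.0, 10.0, 50.0)
```

Because the offset depends on world time rather than on when the player arrived, two visits to the same scene will generally catch the music at different points, which helps avoid the frozen-world impression Berndt warned about.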

Games Where Music Is the Environment

Berndt’s second category of diegetic music is visualised music, where players engage not just with music in the scene but with music as the environment itself. This includes rhythm games like Guitar Hero, Dance Dance Revolution, and Crypt of the NecroDancer, where music structures time, space, and action.

“What we actually interact with is music itself. The visuals are just a transformation — an interface that eases our visually coined interaction techniques.”

In Audiosurf, players import their own tracks and race through colour-coded lanes shaped by the waveform. In Rez, players shoot targets that trigger rhythmic events. These games represent a shift from music as accompaniment to music as system.

“The diegesis is the domain of musical possibilities. The visual layer follows the routines of the music.”

Berndt emphasised that this kind of interaction demands careful timing, expressive range, and sometimes even simplification to make musical gameplay accessible.

From Instruments to Systems

Not all music-based interaction takes the form of traditional games. Electroplankton allowed Nintendo DS users to create sound patterns through direct manipulation — drawing curves, arranging nodes, or triggering plankton-like agents.

“Interestingly, all these concepts don’t really need introduction. Give it to the players, let them try it out, and they will soon find out by themselves how it works.”

Berndt distinguished between note-level interaction (e.g. triggering individual sounds, as in Donkey Konga) and structural interaction, where players influence arrangement, progression, or generative systems. Both approaches are valid, but they ask different things of the player — and of the designer.

Designing with Music in Mind

Berndt’s lecture underscored a recurring principle: if music is situated in the world, it should behave accordingly. It must continue when out of frame, shift based on player presence, and reflect changes in the environment. When music is visualised or systematised, it should offer feedback and form, not simply decoration.

“Music as part of the world has to be interactive, too.”

This is not a stylistic preference — it is a design commitment. When music is embedded in the rules of the world, it becomes not only more believable, but more meaningful. It can reflect character, reinforce consequence, and establish rhythm within both narrative and mechanics.

Berndt’s examples — from Monkey Island to Rez, from ambient performance to interactive music toys — show how music can operate on multiple levels at once: as texture, mechanic, and presence. His lecture made clear that diegetic music in games is not a solved problem or a historical curiosity. It remains a rich site for experimentation and design.

1 September 2025