Sound Design at Edinburgh Napier University

How Do You Make ADR Sound Like It Was Never Replaced? Paul Carden and Chris Navarro on Performance, Technology, and the Art of Dialogue Replacement

How do you make ADR sound like it was never replaced?

A line of dialogue may last only a few seconds, yet replacing it convincingly can require an extraordinary combination of preparation, performance, technology and judgement. The original production recording must first be identified as unusable, the replacement carefully documented and prepared, the actor returned to the emotional and physical circumstances of a performance recorded months earlier, and the new dialogue captured so that it matches the timing, vocal quality, microphone perspective and acoustic character of the original scene. If the process succeeds, the audience should never know that any of this work happened.

During their joint online guest lecture for Edinburgh Napier University, ADR supervisor Paul Carden and ADR mixer Chris Navarro demonstrated this complete process from beginning to end. Rather than discussing Automated Dialogue Replacement only in theory, they created a deliberately compromised line of production dialogue, prepared it for replacement, recorded it on an ADR stage and evaluated the result. Their demonstration revealed a process in which meticulous preparation and technical fluency serve a deceptively simple objective: allowing everyone involved to concentrate upon the performance.

Carden began not in a recording studio, but outside beside a car. His objective was to demonstrate how an apparently simple piece of dialogue could become unusable during production. Wearing a lavalier microphone, he performed the five-word line “I’m late for work” while getting into the vehicle. The exercise immediately revealed how vulnerable production dialogue can be. Clothing could obscure the microphone, a zipped jacket could change the sound, movement could complicate recording and the closing car door could mask the line itself. The example was deliberately simple. Carden was standing almost still, knew exactly what he was going to say and had constructed the situation specifically for the demonstration. On a real production, actors may be walking, running, interacting with objects and performing emotionally demanding scenes while the production sound team manages multiple radio microphones and one or more booms. From this perspective, the surprise is often not that some lines require replacement, but that so much production dialogue survives at all.

The demonstration also established one of the central tensions within ADR. Production dialogue is not simply speech recorded on location. It contains a performance created at a particular moment, within a particular physical environment, as part of an interaction with other performers. Months later, an actor may arrive on an ADR stage while already immersed in an entirely different project and be asked to recreate a few seconds from that earlier performance. The recording environment offers little of the original context. The actor stands in a comparatively neutral room, watches an image on a screen, listens for cues and attempts to reproduce not merely the words, but the emotional and physical conditions under which those words were originally spoken. Carden emphasised that this is one reason actors can find ADR difficult. Matching a line involves returning to a performance that may no longer feel immediate or familiar.

Before any actor reaches the stage, however, the line must become an ADR cue. Carden demonstrated this preparation process by locating the damaged dialogue within the picture, defining the cue and documenting the reason for replacement. The cue was assigned a unique identifier, linked to its scene information and accompanied by notes explaining that the car door had closed over the line. He also demonstrated the value of professional flexibility. Where a production line was imperfect but potentially usable, he might identify the replacement as optional rather than forcing an unnecessary argument over whether it must be replaced. The cue sheet then carried the information required by the recording stage, including project details, version information, character and actor names, cue number, timecode and recording information. A five-word performance therefore arrived at the stage supported by an extensive information system designed to ensure that everybody was working on the correct material.
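The information carried by a cue sheet can be pictured as a simple record. The sketch below is illustrative only and does not reproduce Carden's actual paperwork: the field names, the cue identifier, the timecode and the version label are all placeholder assumptions, while the line, the performer and the reason for replacement come from the demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdrCue:
    """One ADR cue as it might travel from supervisor to stage (illustrative only)."""
    cue_id: str             # unique identifier; this numbering scheme is hypothetical
    project: str
    picture_version: str    # version of the cut the cue was prepared against
    scene: str
    character: str
    actor: str
    timecode_in: str        # placeholder timecode, written HH:MM:SS:FF
    line: str
    reason: str             # why the production line needs replacing
    optional: bool = False  # True when the production line may still be usable


# Placeholder values, apart from the line, performer and reason from the lecture.
cue = AdrCue(
    cue_id="DEMO-001",
    project="Lecture demonstration",
    picture_version="v1",
    scene="1",
    character="Driver",
    actor="Paul Carden",
    timecode_in="01:00:10:00",
    line="I'm late for work",
    reason="Car door closes over the line",
)


def cue_summary(c: AdrCue) -> str:
    """Return a one-line description suitable for a cue sheet row."""
    status = "optional" if c.optional else "required"
    return f"{c.cue_id} [{status}] {c.character}: {c.line} ({c.reason})"


print(cue_summary(cue))
```

Marking a cue as optional is modelled here as a single flag, which mirrors the idea that a borderline line can be offered for replacement without insisting upon it.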

Version control was particularly important. Carden described production teams as living and dying by version dates, reflecting the practical reality that picture changes can quickly make carefully prepared cues inaccurate. Cue numbering performs a similarly essential role. Each take is voice-slated so that the recording itself retains its identity even if paperwork becomes separated from the audio. These details may appear administrative when viewed from outside professional production, yet they protect the continuity of the entire process. An ADR session may contain hundreds of replacement lines recorded across multiple days, actors and facilities. The actor should not have to think about whether the cue is correctly identified or whether the stage is working from the right picture version. Preparation creates the stability within which performance can happen.
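The risk that a picture change silently invalidates prepared cues can be illustrated with a small check. This is a minimal sketch under assumed conventions: the dictionary keys, cue identifiers and version dates are hypothetical and are not taken from the lecture.

```python
def stale_cues(cues, stage_version):
    """Return the IDs of cues prepared against a different picture version."""
    return [c["cue_id"] for c in cues if c["picture_version"] != stage_version]


# Hypothetical cues, each tagged with the version date of the cut it was spotted against.
cues = [
    {"cue_id": "DEMO-001", "picture_version": "2026-05-01"},
    {"cue_id": "DEMO-002", "picture_version": "2026-05-08"},
]

# The stage is working from the later cut, so the first cue needs rechecking.
print(stale_cues(cues, "2026-05-08"))  # ['DEMO-001']
```

A check of this kind only flags cues for review; deciding whether a flagged cue still lines up with the new cut remains a human judgement.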

Carden also offered one deceptively simple instruction for aspiring ADR mixers: keep recording until somebody clearly indicates that the take has ended. An actor may continue the performance, a director may decide to record another version immediately or the session may move into wild recording without a formal interruption. Stopping too early can lose useful material for no meaningful benefit. Storage is cheap; an unrecoverable performance is not. The advice reflected a wider principle that would continue through the lecture. Good ADR practice depends upon remaining attentive to what is happening in the room rather than allowing the machinery of recording to dictate the session.

The second stage of the demonstration moved into Navarro’s ADR facility. Even though the project consisted of a single line created for the lecture, he established the session as though it were a conventional production. Folder structures, documents, Pro Tools sessions and media were organised according to the same principles he would use for a project involving one session or hundreds. His ADR template was already configured for the ordinary demands of recording, while remaining adaptable when unusual situations arose, such as multiple actors performing together with several microphones each. A good template does not prescribe every session in advance. It removes predictable technical work so that attention remains available for situations that cannot be predicted.

Even the imported picture introduced a practical lesson. Navarro noticed that the system was responding sluggishly and identified the compressed H.264 picture as a likely cause, since the computer had to decode it continuously during playback. The moment was minor, but revealing. Professional technical fluency often appears not through dramatic troubleshooting, but through the ability to recognise quickly why a system is behaving unexpectedly and continue without allowing the problem to dominate the room. Earlier, Carden had offered similarly pragmatic advice about computer failure: save the work, restart the system and continue rather than allowing panic to consume valuable time. Both speakers treated technical expertise as calm familiarity rather than technological display.

Microphone selection then revealed how ADR matching differs from conventional voice recording. The objective is not to capture the most beautiful possible version of the actor’s voice. It is to create a recording that can inhabit the existing production soundtrack without attracting attention. Navarro showed that the stage had previously been configured for voice-over work, with microphones positioned closely and directly to produce the clear, present sound required for narration. ADR demanded a different approach. Carden had recorded the original line using a lavalier, so the replacement needed to reproduce the qualities of that production perspective rather than simply offering a technically superior recording.

Carden explained that ADR sessions commonly record both boom and lavalier microphones, ideally using the same models employed during production. Even when the original line appears to come predominantly from one microphone, alternatives can prove unexpectedly useful. Navarro demonstrated this by positioning a shotgun microphone close to Carden but substantially off-axis. Pointing it directly at the performer would have produced a cleaner and more extended sound, but that was not necessarily desirable. The off-axis position rejected particular frequencies and created a tonal character that could potentially sit closer to the lavalier recording. The result was counterintuitive: the boom microphone could, under those conditions, sound more like the production lavalier than the replacement lavalier itself. Matching therefore began not with assumptions about microphone categories, but with listening.

This distinction between technical quality and contextual suitability runs throughout professional ADR. A beautifully recorded line can fail if it sounds too clean, too close, too rich or too controlled for the image and surrounding production dialogue. Conversely, a microphone position that would appear unconventional in another recording context may provide exactly the spectral character required for a convincing match. The mixer must understand microphones well enough to use their imperfections deliberately. The question is never simply which microphone sounds best. It is which recording can become part of the scene without revealing the process that created it.

Once Carden stepped in front of the microphone, the demonstration moved from equipment towards performance. The familiar system of three audible beeps established the timing, with the actor beginning the line where an imaginary fourth beep would occur. In principle, the task appeared straightforward. In practice, Carden repeatedly entered early, became self-conscious about the timing and discovered how quickly an apparently trivial five-word line could become difficult once performance, synchronisation and technical awareness competed for attention. His reaction gave the students an unusually useful demonstration of the psychological demands placed upon actors during ADR. Knowing the line is not enough. Understanding the timing is not enough. Even achieving synchronisation is not enough. The replacement still has to sound as though it belongs to the original performance.
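The beep system is easy to express as arithmetic: three evenly spaced audible beeps lead up to the line, and the actor speaks where a fourth would fall. The sketch below assumes a one-second spacing purely for illustration; the lecture does not state the interval used on the stage, and the function name is hypothetical.

```python
def beep_times(line_start, interval=1.0, beeps=3):
    """Times in seconds of the audible beeps that precede a line.

    The line itself begins at line_start, where an imaginary further beep
    would fall. The interval is an assumed value, not one from the lecture.
    """
    return [line_start - interval * (beeps - i) for i in range(beeps)]


# A line starting 10 seconds into the reel, with the assumed 1-second spacing.
print(beep_times(10.0))  # [7.0, 8.0, 9.0]
```

Entering early, as Carden repeatedly did, simply means starting before the point where that fourth beep would have sounded.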

At one point, Carden produced a take that was synchronised correctly but immediately recognised that the performance itself was wrong. Navarro’s response was subtle. He did not ask for shouting or a dramatically louder delivery. He suggested only slightly more projection. The difference was not simply level. A small change in the physical production of the voice altered its tone and gave the line the additional presence heard in the original recording. When the replacement was compared directly with production dialogue, the improvement became obvious. The exercise demonstrated that ADR matching cannot be reduced to waveform alignment, pitch correction or equalisation. Performance changes the spectrum of the voice before any microphone or processor becomes involved.

Navarro later developed this point in greater depth. Two performances can have similar apparent loudness and pitch while differing substantially in vocal quality. An actor performing naturally on set may have a relaxed throat and a particular physical relationship with the surrounding scene. On the ADR stage, tension, self-consciousness or the effort to satisfy technical instructions can change the voice. A performer may become tighter, brighter or less natural even while reproducing the words and timing accurately. Matching therefore requires attention to qualities that are difficult to describe numerically. Projection, pitch, volume and rhythm matter, but so do muscular relaxation, breath and the physical origin of the voice.

This presents the ADR mixer with a delicate problem. The mixer may hear precisely what is preventing a line from matching, yet communicating every technical observation to the actor may make the performance worse. Navarro warned that performers can absorb only so many notes before they begin thinking about the mechanics of speech rather than the character. A useful intervention must therefore translate technical listening into language that supports performance. Sometimes the correct decision is to offer a suggestion. Sometimes it is to communicate through the director. Sometimes it is to recognise that an imperfection is not important enough to justify disturbing the creative balance of the room. Hearing a problem and knowing whether to mention it are separate professional skills.

The lecture also demonstrated an older approach to ADR recording that remains remarkably useful. Navarro sampled the original production line and played it repeatedly into Carden’s headphones. Instead of concentrating simultaneously upon picture, beeps and performance, Carden could hear the original line and immediately reproduce it, repeating the process several times. Navarro then edited the resulting takes into position and compared them with the production recording. The method offered a direct reference for timing, intonation, volume and vocal character, allowing the performer to respond to the sound of the original performance rather than attempting to reconstruct every detail intellectually.

Navarro’s explanation of this technique revealed an important insight into perception. Picture can sometimes become a distraction. An actor watching for a mouth movement may wait until it becomes visually apparent, by which time the correct moment to begin has already passed. If the original audio is correctly synchronised to picture, matching its rhythm and timing can naturally reproduce picture sync. The performer can therefore concentrate upon hearing and responding rather than continually monitoring several streams of information at once. Despite the age of the sampling technique, Navarro regarded it as one of the most useful tools available on the stage. Its continued value comes not from technological sophistication, but from the way it simplifies the performer’s task.

The same objective shaped Navarro’s control room. His system contained extensive routing, multiple microphone inputs, separate monitoring paths and a heavily customised control surface, yet the purpose of this complexity was to make the session itself feel simple. He might need to manage separate mixes for the control room, recording stage, actor headphones, supervisor headphones and remote participants, with each requiring different material during rehearsal, recording and playback. Attempting to reconfigure every route manually between passes would slow the session and create repeated opportunities for error. Navarro therefore designed systems that allowed complex changes to happen immediately.

His customised keypad provided a particularly revealing example. Individual buttons could trigger sequences of macros that armed tracks, initiated recording and performed repetitive editing operations once a take had finished. Navarro had spent months developing the core system and continued refining it whenever he noticed himself repeating unnecessary keyboard operations. He had begun his career as an ADR recordist and understood the division of labour on a two-person stage, where one person could manage recordings and files while the mixer concentrated upon the performers. Working alone, he used automation to reproduce some of that support, delegating repetitive technical operations to macros so that his attention could remain directed towards the stage.
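The idea of one button standing in for a sequence of operations can be sketched in a few lines. This is not Navarro's system and it does not use any Pro Tools interface: every step name, button name and state key below is a hypothetical placeholder chosen only to show the pattern.

```python
def arm_tracks(session):
    session["armed"] = True
    return "arm tracks"


def start_recording(session):
    session["recording"] = True
    return "start recording"


def stop_recording(session):
    session["recording"] = False
    return "stop recording"


def file_take(session):
    # Stand-in for the repetitive tidying performed once a take has finished.
    session["takes"] = session.get("takes", 0) + 1
    return "file take"


# Each hypothetical button maps to an ordered list of steps.
MACROS = {
    "record": [arm_tracks, start_recording],
    "finish": [stop_recording, file_take],
}


def press(button, session):
    """Run every step bound to a button, in order, and report what was done."""
    return [step(session) for step in MACROS[button]]


session = {}
print(press("record", session))
print(press("finish", session))
```

Keeping the steps as small separate functions means a new button can be assembled from existing steps, which reflects the habit of refining the system whenever a repeated operation is noticed.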

This led to one of Navarro’s most important ideas: the ADR mixer must work at the speed of creativity. An actor or director may suddenly discover a new approach to a line and want to record it immediately. If the mixer responds by asking everyone to wait while tracks are configured, routes changed or files prepared, the idea may lose its immediacy. A delay of only a few seconds can alter the atmosphere of a performance. Technical speed therefore has a human purpose. The mixer learns the system thoroughly enough that machinery does not interrupt thought.

Navarro connected this principle to something he had once been told about the respected ADR mixer Tommy O’Connell. Asked what distinguished O’Connell’s work, a sound editor gave a simple answer: he anticipates. Useful actions are completed before anyone needs to request them. Navarro interpreted this not as a mysterious talent, but as the result of attention. If the mixer is watching the stage, listening to conversations and understanding the direction in which a session is moving, preparation for the next action can begin before a formal instruction arrives. Anticipation therefore depends upon both technical readiness and social awareness. A mixer whose attention is buried in the workstation may complete every requested task correctly while still remaining one step behind the session.

Carden’s preparation of the cue and Navarro’s management of the stage reveal two sides of the same professional process. Carden reduces uncertainty before the session begins through accurate cueing, documentation, version control and communication. Navarro reduces friction during the session through templates, routing, automation and anticipation. Neither form of preparation is intended to make the process more rigid. Both preserve the possibility that an actor, director or mixer can respond immediately when something unexpected and valuable occurs.

Microphone signal flow offered another example of this balance. Navarro described a deliberately clean recording path from microphone through preamplifier into Pro Tools, avoiding unnecessary outboard processing. Within the workstation, compression and equalisation could be used sparingly, but he warned against making irreversible decisions without good reason. Recording completely flat preserves maximum flexibility for the re-recording mixer, while careful decisions made during the session can still produce useful, committed tracks. Heavy processing may be difficult to undo later. A technically impressive recording decision is not valuable if it reduces the options available to the production.

Navarro’s session was configured to make microphone comparison immediate. Several microphone inputs could remain available, feeding dedicated record tracks at appropriate levels. A boom and lavalier could be recorded simultaneously as discrete channels, preserving both perspectives for later evaluation. The workflow reflected the uncertainty inherent in matching. The microphone expected to work best may not produce the most convincing result once the line is placed against production. Recording alternatives gives later editors and mixers material from which to construct the most believable transition.

His discussion of recording format was equally pragmatic. For conventional ADR, he worked at the established production standard of 48 kHz, 24-bit audio. Higher sample rates could be valuable for sound effects intended for extensive manipulation, where recordings might later be slowed or processed heavily. ADR serves a different purpose. Recording at unnecessarily high rates would increase storage requirements before the material was eventually converted to the format used by the production. The appropriate technical choice depends upon what will happen to the sound. More data is not automatically more useful.
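The storage argument can be checked with simple arithmetic for uncompressed PCM audio. The sketch below ignores file headers and metadata, and the 96 kHz figure is an assumed comparison point rather than a rate mentioned in the lecture.

```python
def bytes_per_minute(sample_rate_hz, bit_depth, channels=1):
    """Size of one minute of uncompressed PCM audio, ignoring file headers."""
    return sample_rate_hz * (bit_depth // 8) * channels * 60


standard = bytes_per_minute(48_000, 24)     # 8_640_000 bytes, about 8.6 MB
doubled = bytes_per_minute(96_000, 24)      # 17_280_000 bytes (assumed higher rate)
two_mics = bytes_per_minute(48_000, 24, 2)  # boom and lavalier as discrete channels

print(standard, doubled, two_mics)
```

Doubling the sample rate doubles the storage for every take and every microphone, with no benefit once the material is converted back to the production format.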

As the lecture progressed, it became increasingly clear that the most difficult parts of ADR were not contained within any equipment specification. Navarro estimated that learning the technical fundamentals and becoming comfortable with Pro Tools took years, yet he distinguished that competence from the broader ability to run a session. Recording dialogue in sync with picture is ultimately a technical process that can be learned. Managing actors, directors, supervisors, performance, uncertainty and the emotional energy of the room represents a different level of expertise.

Actors may arrive nervous. Directors may be highly collaborative, completely self-sufficient or resistant to suggestions. A performer may struggle with a line while becoming increasingly aware of the difficulty. The mixer must understand how much intervention the situation can support. Navarro described the importance of creating a calm and comfortable environment, particularly when clients are unfamiliar with the process. Confidence can be communicated without dominance. If someone is uncertain, the mixer can explain what will happen, answer questions and demonstrate that the session is under control. Technical authority becomes useful when it reduces anxiety rather than displaying superiority.

Carden and Navarro also acknowledged the professional judgement involved in deciding when not to intervene. A mixer may recognise an aspect of a performance that could be improved, yet nobody has asked for technical input and the issue may not be important enough to disrupt the session. Navarro framed this as a balancing act. Would the intervention materially improve the line? Is the director open to suggestions? Will another note distract the actor from a performance that is already working? Expertise includes recognising problems, while professional maturity requires distinguishing consequential problems from imperfections that do not matter.

The ADR stage therefore brings technical precision and emotional sensitivity into the same process. The mixer must hear minute changes in vocal quality, understand microphone behaviour, maintain synchronisation, manage several monitoring environments and operate the recording system almost instinctively. At the same time, attention must remain on body language, conversation, confidence, frustration and creative momentum. Mastery of the technology is essential precisely so that the technology no longer consumes the attention needed elsewhere.

The demonstration also placed the scale of professional ADR into perspective. A single five-word line required production recording, cue preparation, documentation, stage setup, microphone selection, rehearsal, multiple takes, alternative recording methods, editing and comparison. A skilled actor might complete around ten or twelve conventional cues in an hour, while a feature film may contain 150 or 200 ADR lines. Complicated performances naturally take longer. What appears to an audience as a few moments of seamless dialogue can therefore represent days of concentrated work.
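The figures quoted here give a rough sense of recording time alone. The sketch below uses only the cue rates and line counts mentioned in the lecture; the eight-hour studio day is an assumption added for illustration, and the estimate excludes preparation, editing and any complicated cues.

```python
def recording_hours(lines, cues_per_hour):
    """Hours of stage time needed at a steady cue rate."""
    return lines / cues_per_hour


best_case = recording_hours(150, 12)   # 12.5 hours
worst_case = recording_hours(200, 10)  # 20.0 hours

HOURS_PER_DAY = 8  # assumed length of a studio day, not a figure from the lecture
print(best_case / HOURS_PER_DAY, worst_case / HOURS_PER_DAY)  # 1.5625 2.5
```

Even under these generous assumptions the recording alone spans more than a working day, before any of the surrounding preparation and editorial work is counted.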

Yet the success of that work is measured largely through its invisibility. When Navarro compared Carden’s replacement line with the production recording, the evaluation concerned relationships: volume, tone, vocal quality, perspective and the way the replacement interacted with the surrounding scene. The door slam that had originally damaged the line was restored around the new performance, returning the replacement to the event from which it had temporarily been separated. Heard alone, the ADR recording was merely a voice on a stage. Placed back into the scene, it became part of an action.

This transformation captures the philosophy shared by both speakers. ADR is often described as the replacement of unusable dialogue, but their demonstration showed that replacement is only the beginning of the problem. The objective is reconstruction. The actor reconstructs a performance. The mixer reconstructs a microphone perspective. Editors reconstruct timing and continuity. The final soundtrack reconstructs the relationship between voice, action and environment so successfully that the audience experiences a single uninterrupted moment.

The joint nature of the lecture made this especially clear. Carden approached ADR through the complete production process, showing how a line travels from location problem to documented cue and finally to the recording stage. Navarro approached the same process from inside the control room, revealing the technical systems, listening skills and interpersonal judgement required to capture a convincing replacement. Their perspectives met at the point where professional preparation serves human performance.

By the end of the session, the deliberately damaged line had become something much larger than a technical demonstration. It revealed why ADR demands more than synchronisation, why the cleanest microphone is not always the correct microphone, why a performer can match timing while missing the voice, why an old sampler can remain useful in a modern digital workflow and why a highly automated control room can make a session feel more human rather than less. Above all, Carden and Navarro showed that successful ADR depends upon where attention is directed. The actor should be thinking about performance rather than machinery. The director should be thinking about the scene rather than routing. The mixer should be watching and listening to the room rather than fighting the workstation. The audience, finally, should be thinking about none of these things. When every part of the process works together, they simply hear a character speak.

15 June 2026
How Does a Crowd Find Its Voice? David Monteath on Crowd ADR, Performance, and Creating Believable Worlds

How does a crowd find its voice?

When audiences watch a film or television programme, their attention naturally settles upon the principal actors. Far less notice is taken of the countless background voices that transform a collection of images into a believable social world. Conversations drifting through a restaurant, murmured discussions in an office, distant arguments in a crowded street or the indistinct atmosphere of a busy marketplace all contribute to the impression that life continues beyond the central characters. Remove those voices, and even the most carefully photographed scene can feel strangely artificial.

During his online guest lecture for Edinburgh Napier University, David Monteath returned to the University as a Sound Design alumnus to explore the specialised craft of crowd ADR. Drawing upon more than three decades working as an actor and voice artist, he demonstrated that believable crowd performances depend upon observation, improvisation and an understanding of dramatic context rather than simply recording large numbers of voices. One principle underpinned the discussion. Context is king.

Rather than replacing the dialogue of principal actors, crowd ADR creates the sense that an entire world exists beyond them. A small group of performers may become the customers in a restaurant, the spectators at a football match, the passengers waiting on a railway platform or the crowd gathered in a courtroom. Individual conversations overlap, reactions ripple through the group and emotional responses emerge at precisely the right moments, creating the impression that every person visible on screen possesses a life extending beyond the immediate story. Audiences rarely notice these performances consciously, yet they immediately recognise when they are missing. Scenes that lack convincing crowd performances often feel unexpectedly empty, regardless of how carefully they have been photographed or edited.

Monteath repeatedly challenged the assumption that this work consists simply of creating background noise. Crowd ADR is first and foremost a form of acting. Every performance responds to the circumstances of the scene, the relationships between characters and the emotional atmosphere established by the director. People waiting quietly in a hospital corridor behave differently from supporters leaving a football stadium. Conversations in an expensive restaurant differ from those heard in a busy café, while voices surrounding a royal procession carry a very different energy from those accompanying a political protest. Every reaction, interruption and fragment of conversation exists to support the dramatic reality of the scene rather than to attract attention in its own right. Authenticity emerges from understanding how people genuinely behave in different situations, not from making scenes louder or busier.

This emphasis upon dramatic context shaped every practical discussion throughout the lecture. Monteath encouraged students to think beyond individual words and instead consider the circumstances in which those words are spoken. Before deciding how loudly to speak, how quickly to react or even what might be said, performers first need to understand where they are, who surrounds them and what is happening within the story. The same phrase may require entirely different delivery depending upon whether it takes place in a library, an airport, a football ground or the middle of a battlefield. Successful crowd performers therefore begin by observing people. Everyday behaviour, casual conversations, shared laughter, hesitation, disagreement and excitement all provide material that can later be adapted naturally within the recording studio. The objective is not to invent behaviour, but to recognise and recreate it convincingly.

Perhaps the most revealing insight from this opening part of the lecture concerned the relationship between realism and audibility. Many beginning sound designers instinctively assume that important sounds should always be heard clearly. Monteath argued for almost the opposite approach. Successful crowd ADR often succeeds precisely when audiences remain largely unaware of it. Background voices should usually be felt rather than heard, contributing movement, texture and emotional energy without competing with the principal dialogue. Monteath returned repeatedly to the idea that audiences should sense the presence of a living world long before they consciously identify individual voices. Crowd ADR achieves its greatest success not when listeners admire the performance, but when they accept the world on screen without ever questioning how it came to life.

One of the most valuable themes running through the lecture concerned the difference between sounding natural and sounding believable. These ideas are not always identical. Performers working in crowd ADR rarely speak at the same level they would use in everyday conversation, yet exaggeration can become equally unconvincing. Monteath described the continual process of judging how voices should sit within the perspective of the scene. A performer passing close to the camera requires a different vocal presence from someone crossing the background several metres away, while conversations taking place outdoors demand a different energy from those occurring in confined interior spaces. Every decision depends upon dramatic perspective rather than fixed performance rules. Context, once again, determines everything. For sound designers, these distinctions become equally important during editing and mixing. A crowd recording that sounds entirely convincing in isolation may feel unexpectedly prominent once placed alongside production dialogue, Foley and ambience. Perspective therefore emerges through the relationship between every element of the soundtrack rather than through any individual recording considered on its own.

This attention to perspective extends beyond volume alone. Monteath discussed the subtle adjustments people make instinctively when speaking in different environments. Outside, voices naturally rise in level before settling into an appropriate projection as people unconsciously judge the surrounding space. He compared this process to a form of echolocation. Speakers continually test their surroundings, modifying projection almost instantly until their voice feels appropriate for the environment. Recording inside a studio removes many of the environmental cues that normally guide these unconscious adjustments, requiring performers to recreate them deliberately. The challenge is not simply to speak more loudly for an exterior scene, but to reproduce the natural behaviour that accompanies speaking outdoors. Audiences rarely analyse these details consciously, though they recognise immediately when they feel unconvincing. Successful crowd ADR therefore depends upon recreating patterns of human behaviour rather than merely increasing vocal intensity.

The physical demands of crowd ADR also proved far greater than many students had expected. Scenes involving panic, conflict or large-scale action often require sustained shouting over many hours, placing considerable strain on performers’ voices. Monteath reflected upon sessions in which actors had pushed themselves to the point of temporary vocal exhaustion, particularly when recording intense battle scenes. Curiously, he observed that shouting repeatedly inside a recording studio often proves more tiring than raising the voice naturally outdoors. In everyday life, people instinctively project according to their surroundings. Within the artificial environment of a studio, performers can find themselves holding unnecessary tension in the throat in ways that feel surprisingly unnatural. Maintaining vocal health therefore becomes an important professional skill alongside acting itself. It also reflects another aspect of professional sound design that audiences rarely consider. Recordings capable of conveying fear, excitement or urgency often depend upon performers sustaining physically demanding work throughout lengthy recording sessions while preserving consistency from one take to the next.

The discussion of large battle sequences illustrated another revealing aspect of the profession. Crowd performers may spend an entire day creating layers of screams, reactions and movement for scenes involving hundreds or even thousands of people, fully aware that much of their work will eventually disappear beneath music, sound effects and the principal action. Monteath recalled recording material for a major battle sequence in Game of Thrones, where hours of physically demanding vocal performances ultimately became almost imperceptible within the finished soundtrack. Rather than expressing disappointment, he presented this as an inevitable consequence of professional sound design. The objective was never for individual performances to stand out. Their purpose was to contribute energy, scale and credibility to the scene, even if audiences remained almost entirely unaware of their presence. The irony is that some of the hardest work in post-production often becomes the least conspicuous in the finished mix.

Monteath’s recurring phrase, “Context is king,” captures this philosophy particularly well. Every vocal decision derives from the dramatic situation rather than from the performer. Voices rise or fall according to the surrounding environment, emotional reactions emerge in response to the unfolding action and every fragment of conversation exists to reinforce the illusion that life extends beyond the principal characters. Successful crowd ADR is therefore measured not by how clearly individual voices are heard, but by how convincingly they allow audiences to believe in the world unfolding around them. Like many aspects of professional sound design, its greatest achievement lies in remaining almost invisible while making the fictional world feel entirely real.

The lecture concluded with a discussion that moved beyond recording techniques and towards the broader decisions that shape professional sound design. One student described the challenge of creating the atmosphere for a bank robbery scene. Adding more and more voices had seemed the obvious solution, yet the result quickly became cluttered and distracted from the drama. Monteath’s response illustrated once again why crowd ADR depends upon judgement rather than quantity. Real crowds rarely behave as a single, unified group. Even in moments of fear, surprise or excitement, different people react at different times and in different ways. Some remain silent, others whisper, a few call out, while many simply watch events unfold. Attempting to represent every visible person with an equally prominent vocal performance often produces a soundtrack that feels less realistic rather than more so. Believability emerges through carefully judged variation, allowing individual reactions to appear and disappear naturally instead of competing continuously for the listener’s attention.

This observation extends well beyond crowd ADR. Throughout post-production, sound designers continually decide what deserves the audience’s attention and what should remain part of the wider acoustic environment. A convincing soundtrack is not created through the accumulation of detail, but through the careful organisation of that detail into a coherent dramatic experience. Crowd performances occupy a role similar to ambience, Foley and environmental sound. They establish context, scale and emotional texture without constantly demanding attention. Their purpose is not to demonstrate how much work has been carried out, but to convince audiences that the world extending beyond the principal characters already exists. Like every other element of a soundtrack, their success depends upon supporting the story rather than competing with it.

Towards the end of the lecture, discussion turned to the growing influence of artificial intelligence within the voice industry. Monteath acknowledged that AI is already beginning to affect areas such as commercial voice-over, where some clients have started experimenting with synthetic voices. He regarded crowd ADR rather differently. While aspects of the work may eventually become automated, authentic crowd performance depends upon subtle variations that emerge naturally whenever people work together. Voices change over the course of a recording session as performers become tired. Emotional intensity shifts between takes. Individual personalities influence rhythm, timing and vocal colour in ways that are difficult to predict or reproduce consistently. These variations might appear inconvenient from a purely technical perspective, yet they contribute directly to the richness, unpredictability and authenticity that audiences instinctively recognise as human. Technology will continue to evolve, though observation, collaboration and performance remain at the heart of believable sound design.

For sound design students, perhaps the most valuable lesson lay in the way Monteath described his profession. Crowd ADR may appear to occupy the margins of post-production, hidden beneath dialogue, music and sound effects, yet it influences how audiences perceive almost every scene they watch. Every murmur in the background of a restaurant, every distant conversation in a station concourse and every carefully judged reaction during a moment of crisis contributes to the illusion that life continues beyond the frame. These performances do not simply fill silence. They create social spaces that feel inhabited, allowing viewers to concentrate on the story without questioning the reality of the world surrounding it.

Throughout the lecture, Monteath returned repeatedly to one deceptively simple principle: “Context is king.” Crowd ADR succeeds not through memorable performances or individually recognisable voices, but through creating the impression that every environment extends beyond the limits of the frame. Every carefully judged laugh, argument, whispered conversation and fleeting reaction reinforces a believable social world without distracting from the principal narrative. For sound designers, this represents a broader lesson that reaches far beyond dialogue replacement. Successful audio is rarely measured by how noticeable it becomes. More often, it is measured by how completely it allows audiences to believe in the world they are experiencing. Crowd ADR exemplifies that philosophy. It remains one of the least visible aspects of professional sound design, yet it is also one of the crafts that most quietly transforms moving images into convincing places inhabited by believable people.

27 April 2026
How Does a Film Speak Every Language? George Mikrogiannakis on Film Localisation, Dubbing, and International Sound Production

How does a film speak every language?

Most audiences rarely stop to consider the question. A film arrives in a cinema or on a streaming service, characters speak naturally in the local language, performances feel convincing, and the soundtrack seems entirely coherent. Nothing suggests that thousands of individual decisions, spread across months of work and involving specialists in numerous countries, have contributed to what feels like a seamless experience. During his online guest lecture for Edinburgh Napier University, George Mikrogiannakis drew back the curtain on that process. With many years of experience supervising international localisation for Walt Disney Studios, DreamWorks Animation, and other major productions, he revealed that dubbing represents only one small part of a much larger undertaking. Localisation combines translation, performance, dialogue editing, sound design, recording, mixing, quality control, and project management into a production process whose success depends upon remaining almost completely invisible.

Before discussing localisation itself, Mikrogiannakis addressed a question that many sound design students might reasonably ask. Why should someone interested in sound effects, Foley, ambience, or mixing concern themselves with dubbing? His answer challenged the assumption that localisation belongs solely to translators or dialogue editors. Film sound does not end when the final mix has been approved. Modern productions are expected to travel internationally, and that expectation influences decisions made throughout post-production. Deliverables, session organisation, music and effects mixes, dialogue editing, documentation, and recording practice all determine whether a soundtrack can later be adapted successfully into dozens of different languages. Understanding localisation therefore provides a wider understanding of professional sound production itself.

The distinction between dubbing and localisation formed the starting point for the discussion. Dubbing describes the replacement of spoken dialogue with performances recorded in another language. Localisation encompasses everything required to ensure that a film functions naturally within another culture while preserving the creative intentions of the original production. Dialogue must communicate the same dramatic ideas, fit the visible movements of actors’ mouths, respect timing, preserve emotional performances, and integrate seamlessly into the original soundtrack. A successful localisation should never feel like a compromise. Audiences should simply experience the film as though it had always belonged in their own language.

Commercial realities make this work indispensable. For many major studio productions, international audiences account for the majority of ticket sales. A film that performs well domestically may still depend upon worldwide distribution for its overall commercial success. Localisation therefore becomes an essential stage of production rather than an optional addition. Mikrogiannakis illustrated the scale of this work with one striking example. Pirates of the Caribbean eventually required more than six hundred separate versions to satisfy different languages, territories, exhibition formats, airlines, and distribution requirements. Once work reaches this scale, localisation no longer resembles a straightforward translation exercise. It becomes an international production pipeline operating alongside the creation of the original film.

Maintaining consistency across so many versions requires careful coordination between numerous creative and technical disciplines. Scripts pass from translators to dialogue adaptors, from recording directors to voice actors, from editors to mixers, and through repeated rounds of quality control before final approval. Every participant contributes something different while working towards the same objective. The audience should experience the same characters, the same emotional performances, and the same dramatic pacing regardless of which language they hear.

Translation itself proved far more creative than many students had expected. Mikrogiannakis explained that translators receive extensive supporting documentation describing characters, situations, cultural references, jokes, and dramatic context. Their task is not to reproduce individual words as literally as possible. Instead, they seek to preserve the intention behind the dialogue. Humour frequently illustrates this challenge. A joke that depends upon an English idiom or a cultural reference may simply fail when translated directly. Rather than forcing audiences to decode unfamiliar expressions, adaptors reconstruct the underlying comic idea so that viewers in another country experience a similar moment of humour, even if the dialogue itself changes substantially.

Lip synchronisation introduces further complications. Different languages occupy different amounts of time. A short English sentence may require considerably more syllables elsewhere, while other languages express the same meaning much more concisely. Dialogue therefore undergoes continual adjustment until it satisfies several competing requirements simultaneously. It must sound natural, preserve the original dramatic meaning, fit within the available time, and remain synchronised with the visible movements of the actor’s mouth. Accuracy alone is never enough. Rhythm, emphasis, breathing, pacing, and performance all contribute to whether audiences believe what they are watching.
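One of the competing requirements described above, fitting within the available time, can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the speaking rate, the tolerance, the `CandidateLine` type and the function names are all hypothetical, and the syllable count is assumed to be supplied by the adaptor rather than computed. It says nothing about mouth shapes, rhythm or performance, which still require human judgement.

```python
from dataclasses import dataclass


@dataclass
class CandidateLine:
    """A candidate adaptation of one line of dialogue (hypothetical type)."""
    text: str
    syllables: int  # counted by the adaptor; not computed here


def estimated_duration(line: CandidateLine, syllables_per_second: float = 5.0) -> float:
    """Estimate spoken duration in seconds from an assumed speaking rate."""
    return line.syllables / syllables_per_second


def fits_window(
    line: CandidateLine,
    window_seconds: float,
    tolerance: float = 0.15,
    syllables_per_second: float = 5.0,
) -> bool:
    """Return True if the estimated duration is within a fractional
    tolerance of the time available for the line on screen."""
    duration = estimated_duration(line, syllables_per_second)
    return abs(duration - window_seconds) <= tolerance * window_seconds


# Two invented candidates for a one-second window.
short_version = CandidateLine(text="candidate A", syllables=5)
long_version = CandidateLine(text="candidate B", syllables=10)

print(fits_window(short_version, window_seconds=1.0))  # True
print(fits_window(long_version, window_seconds=1.0))   # False
```

A check like this could only flag candidates that are obviously too long or too short before a session. Whether a line that passes actually sounds natural and matches the actor's mouth remains a decision for the adaptor and the dialogue director.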

The recording process reflects the same attention to detail. Unlike dramatic productions in which actors frequently perform together, dubbing sessions normally record performers individually, allowing every voice to remain completely controllable throughout the final mix. Consistency becomes one of the engineer’s principal responsibilities. Mikrogiannakis emphasised the importance of using the same recording environment, microphone, and acoustic conditions throughout an entire production. Local studios are generally expected to deliver clean recordings with minimal processing. Equalisation, dynamics processing, reverberation, and other creative treatments remain the responsibility of the originating production rather than the individual dubbing facility. Every language version therefore begins from comparable source material before being shaped into the finished soundtrack.

One particularly memorable example demonstrated just how carefully these productions preserve even the smallest creative details. During the localisation of How to Train Your Dragon, one character briefly speaks while wearing a leather mask. Rather than leaving each territory to interpret the scene independently, the production required two separate recordings of every affected line. One version was performed normally. The second was recorded with an obstruction placed in front of the actor’s mouth to recreate the acoustic effect of speaking through the mask. Such requests may appear unusually specific, though they illustrate a broader principle running throughout the lecture. Localisation seeks to reproduce the experience of the original production as faithfully as possible, even when that requires remarkably detailed technical preparation.

For sound design students, the most revealing discussion centred upon the music and effects mix, more commonly known as the M&E. At first glance, creating an international version might appear straightforward. Remove the original dialogue, record new voices, and place them into the existing soundtrack. Mikrogiannakis demonstrated why this assumption quickly breaks down. Production dialogue rarely contains voices alone. Clothing movement, footsteps, room reflections, environmental ambience, prop handling, breathing, incidental vocalisations, and countless other sounds often exist within the same recordings. Removing dialogue therefore removes far more than speech.

Producing a convincing M&E requires many of these elements to be rebuilt separately before localisation can even begin. Foley artists recreate physical actions. Ambience editors restore the acoustic character of locations. Sound editors recover or redesign details that disappear when production dialogue is removed. Every reconstructed element must integrate naturally with the remaining soundtrack so that audiences remain unaware that significant parts of the scene have effectively been recreated. Localisation therefore exposes something that audiences rarely notice. Successful dialogue replacement depends upon the invisible work of many other sound professionals whose contributions make the reconstructed world feel complete.

The same attention to detail extends into the organisation of production sessions. Mikrogiannakis explained that major studios prescribe how dialogue sessions should be structured long before recording begins. Lead characters, supporting roles, incidental dialogue, and background voices occupy predetermined locations within Pro Tools sessions so that material arriving from different countries can be assembled without confusion. Track layouts, naming conventions, file structures, and version numbers follow equally strict standards. These systems may appear administrative rather than creative, though they exist for a practical reason. Hundreds of dialogue files may pass between translators, recording studios, editors, mixers, and quality-control teams before a film reaches cinemas. Small inconsistencies introduced at the beginning of the process can rapidly become expensive problems once productions begin moving between countries.
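The value of strict naming conventions becomes clearer when they are checked automatically. The Python sketch below validates file names against an invented pattern of the form `TITLE_LANG_Rnn_CHARACTER_vnn.wav`. The pattern, the field names and the functions are all assumptions made for illustration. They are not the convention of any studio mentioned in the lecture, which was not specified.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical convention, invented for illustration only:
#   <TITLE>_<LANG>_R<reel>_<CHARACTER>_v<version>.wav
FILENAME_PATTERN = re.compile(
    r"^(?P<title>[A-Z0-9]+)_"
    r"(?P<lang>[A-Z]{2,3})_"
    r"R(?P<reel>\d{2})_"
    r"(?P<character>[A-Z0-9]+)_"
    r"v(?P<version>\d{2})\.wav$"
)


class DialogueFile(NamedTuple):
    title: str
    lang: str
    reel: int
    character: str
    version: int


def parse_dialogue_filename(name: str) -> Optional[DialogueFile]:
    """Parse a file name, or return None if it breaks the convention."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return DialogueFile(
        title=match["title"],
        lang=match["lang"],
        reel=int(match["reel"]),
        character=match["character"],
        version=int(match["version"]),
    )


def find_nonconforming(names: List[str]) -> List[str]:
    """Return the names that do not follow the convention."""
    return [n for n in names if parse_dialogue_filename(n) is None]


delivered = [
    "DEMOFILM_FR_R03_HERO_v02.wav",
    "demo film hero final FINAL.wav",
]
print(find_nonconforming(delivered))  # ['demo film hero final FINAL.wav']
```

Rejecting a badly named file at the point of delivery is far cheaper than discovering the problem after hundreds of files from different territories have been assembled into one session.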

For students accustomed to working alone, this offers an interesting perspective on professional practice. Large productions depend upon predictability as much as originality. Other members of the production team must be able to identify recordings immediately, locate the correct version of every file, and understand how sessions have been organised without needing lengthy explanations. Good organisation does not restrict creativity. It allows creativity to survive within projects involving hundreds of contributors working across multiple continents.

The recording sessions themselves reflect similar priorities. As noted earlier, actors rarely record together, even when their characters share a conversation. Instead, every performance is captured independently under carefully controlled acoustic conditions. Recording engineers seek consistency above all else, maintaining the same microphones, recording chains, and studio environments wherever possible. Performances can then be balanced, edited, and integrated into the soundtrack with considerably greater precision than would otherwise be possible. The objective is not simply to record dialogue. It is to provide material that remains flexible throughout every subsequent stage of post-production.

Security introduces another layer of complexity. Long before a film reaches cinemas, localisation teams may already be working on dialogue in numerous languages. Scripts, images, and recordings therefore become highly confidential. Mikrogiannakis described productions protected through extensive non-disclosure agreements, secure online workflows, watermarked media, and carefully controlled distribution systems. In particularly sensitive cases, even the picture supplied to dubbing studios may reveal only a small area surrounding a character’s mouth while the remainder of the image remains concealed. The performers receive enough visual information to synchronise their dialogue without exposing details of the story before release.

These precautions reveal another aspect of contemporary sound production that audiences rarely encounter. Localisation frequently begins while visual effects continue to evolve, editorial changes remain possible, and marketing campaigns have yet to reveal significant elements of the film. Sound departments therefore work within productions that remain in constant development. Flexibility becomes as valuable as technical expertise. Dialogue may require revision, scenes may be shortened, and editorial decisions may continue long after recording has begun. Every change must then be reflected consistently across every language version.

Once recording has finished, another phase begins. Every performance is reviewed against the original production to evaluate synchronisation, pronunciation, dramatic intention, technical quality, and consistency with previously approved material. Recordings that satisfy one requirement may still require revision for another. A technically perfect recording may not match the emotional intensity of the original actor. A convincing performance may reveal a slight synchronisation problem. A translation may preserve meaning while sounding unnatural when spoken aloud. Each stage of review narrows these differences until the finished soundtrack supports the same dramatic experience as the original production.

The process depends upon specialists whose expertise is complementary rather than interchangeable. Translators evaluate language. Dialogue directors shape performances. Recording engineers concentrate on technical quality. Editors refine timing and synchronisation. Mixers integrate new dialogue into the existing soundtrack. Supervisors compare each completed version with the original production before granting approval. None of these roles can replace another. The finished film emerges through collaboration between people whose responsibilities remain distinct while contributing towards a shared creative objective.

One aspect of the discussion resonated particularly strongly for sound design students. Many university projects naturally emphasise creating interesting sounds. Professional productions require that creativity to coexist with organisation, documentation, planning, and consistency. A beautifully designed soundtrack that cannot be delivered reliably to another department quickly becomes difficult to maintain. Localisation demonstrates this reality with unusual clarity. Every recording created during production may later support dozens of additional versions distributed across the world. Decisions made while organising sessions, preparing stems, documenting edits, or recording apparently insignificant details may continue influencing the production years after the original mix has been completed.

The relationship between creativity and organisation also changes the way professional sound departments approach collaboration. Rather than treating editing, Foley, dialogue, sound effects, ambience, and mixing as isolated activities, localisation reveals how closely each depends upon the others. Replacing dialogue successfully requires carefully prepared music and effects mixes. Those mixes depend upon dialogue editors separating production material accurately. Dialogue editors depend upon clean recordings, consistent session management, and comprehensive documentation. Every department inherits decisions made by the departments before it. Strong workflows therefore support creative outcomes rather than competing with them.

Audiences rarely recognise any of this work, and perhaps they should not. Successful localisation draws attention towards the story rather than the production process. Viewers become absorbed in performances, relationships, humour, and dramatic tension without considering how many different versions of the soundtrack exist or how many specialists contributed to the one they happen to hear. The technical achievement lies precisely in making reconstruction disappear.

For sound design students, localisation offers an unusually clear picture of contemporary professional practice. It demonstrates that sound production extends well beyond recording and mixing. Projects continue to evolve after the original soundtrack has been completed, passing through new languages, cultures, technologies, and distribution platforms while preserving a coherent creative identity. Every carefully organised session, every clean recording, every reconstructed ambience, and every accurately prepared deliverable helps make that possible.

A film may begin life in a single language, though its soundtrack is often expected to communicate with audiences across much of the world. Making that transition successfully depends upon considerably more than translation. It depends upon planning, technical precision, collaboration, and a shared commitment to preserving the creative intentions embedded within the original production. The better those foundations have been established, the more naturally the film speaks to audiences, regardless of which language they hear.

6 April 2026