16 Oct 2025
A novel sound synthesis technique generates realistic sounds from visual scenes by analyzing objects and simulating pressure waves within voxelized representations. This method creates dynamic and physically accurate audio environments without relying on pre-recorded samples or artificial intelligence.

The technique analyzes the objects in a scene and generates their associated sounds, without pre-recorded audio or artificial intelligence.
The simulated sounds are remarkably realistic, often indistinguishable from real recordings, to the point that it is hard at first to believe they were generated digitally.
The technique operates by deconstructing objects into 'voxels' (small volumetric pieces) and then simulating the behavior of pressure waves as they interact and propagate within these voxelized representations to create sound.
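
To make the voxel-and-pressure-wave idea concrete, here is a minimal sketch, not the authors' solver: a textbook finite-difference update of the scalar wave equation on a small 2D grid, in plain Python. The grid size, the solid block, the source and listener positions, the Gaussian pulse, and the function name `simulate` are all illustrative assumptions. Solid voxels are treated as perfectly rigid, and the outer border is simply held at zero pressure.

```python
import math

def simulate(nx=40, ny=40, steps=200, courant=0.5):
    """Return the pressure heard at a listener cell, one sample per time step.

    Illustrative only: 2D scalar wave equation, leapfrog time stepping,
    rigid solid voxels, outer border held at zero pressure.
    The hard-coded positions below assume the default 40 x 40 grid.
    """
    # Voxel occupancy: True marks a solid voxel, False marks air.
    solid = [[False] * nx for _ in range(ny)]
    for y in range(15, 25):
        for x in range(18, 22):
            solid[y][x] = True  # a solid block between source and listener

    source = (20, 8)      # (row, column) of a point-like source (assumed)
    listener = (20, 32)   # (row, column) where pressure is recorded (assumed)
    c2 = courant ** 2     # (c * dt / dx)^2; must not exceed 0.5 in 2D

    p_prev = [[0.0] * nx for _ in range(ny)]  # pressure one step ago
    p = [[0.0] * nx for _ in range(ny)]       # current pressure
    signal = []

    for n in range(steps):
        p_next = [[0.0] * nx for _ in range(ny)]
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                if solid[y][x]:
                    continue  # no air here, so no pressure update
                lap = 0.0
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    # Rigid wall: a solid neighbour mirrors this cell's own
                    # pressure, which enforces a zero normal gradient.
                    neighbour = p[y][x] if solid[yy][xx] else p[yy][xx]
                    lap += neighbour - p[y][x]
                p_next[y][x] = 2.0 * p[y][x] - p_prev[y][x] + c2 * lap
        # Point-like source: add a short Gaussian pulse at a single cell.
        p_next[source[0]][source[1]] += math.exp(-((n - 20) / 6.0) ** 2)
        signal.append(p_next[listener[0]][listener[1]])
        p_prev, p = p, p_next
    return signal
```

Calling `simulate()` returns the pressure recorded at the listener cell; the first samples are zero because the pulse has to travel around the block before it arrives. Every air cell applies the same small stencil, which is why uniform grids of this kind parallelize well.
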
The method seamlessly morphs the air between voxelized representations of objects as they move or deform, allowing for smooth sound updates without audible cuts or pops, similar to a DJ blending tracks.
The solver inherently understands the acoustic properties of the space where sounds occur, automatically differentiating, for example, a splash near a wall from one in an open field, creating physically accurate auditory experiences.
This technique eliminates the need for manual placement of sound effects in games and films, as the physics engine automatically generates the required audio, saving significant development time.
The system accurately accounts for geometry, demonstrated by how a sound source enclosed by hands produces a muffled sound, precisely reflecting real-world physical acoustic damping.
A single unified solver integrates diverse sound interactions, including pre-recorded sounds, vibrating shells, sloshing liquids, and rigid objects like Lego bricks, eliminating the need for multiple specialized algorithms.
The technique leverages uniform grids, making it highly GPU-friendly and enabling substantial speed improvements: typical speedups of about 140x, and up to 1000x, over traditional multi-core CPU solvers.
Some simulations, at low resolutions, already run faster than real time, a significant step toward interactive sound simulation in various applications.
Smooth interpolation between animation frames is achieved, preventing the "popping" artifacts common in earlier methods and ensuring a seamless auditory experience.
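
One way to picture the frame-to-frame blending is as a crossfade on the voxel grid itself. The sketch below is an illustration under stated assumptions, not the published method: it assumes a 60 fps animation, a 44.1 kHz audio step rate, a per-voxel "solid fraction" between 0 and 1, and a smoothstep easing curve, none of which are given in the source. All function names are hypothetical.

```python
def smoothstep(t):
    """Cubic ease from 0 to 1 with zero slope at both ends."""
    return t * t * (3.0 - 2.0 * t)

def blend_occupancy(frame_a, frame_b, t):
    """Per-voxel solid fraction between two animation frames.

    frame_a, frame_b: flat lists of solid fractions (0 = air, 1 = solid).
    t: position between the frames, from 0.0 (frame_a) to 1.0 (frame_b).
    """
    w = smoothstep(t)
    return [(1.0 - w) * a + w * b for a, b in zip(frame_a, frame_b)]

def substeps_per_frame(audio_rate_hz=44100, frame_rate_hz=60):
    """Number of audio-rate solver steps that fit in one animation frame."""
    return audio_rate_hz // frame_rate_hz

# A voxel that is air in one frame and solid in the next changes gradually
# over the audio steps in between instead of flipping in a single step.
steps = substeps_per_frame()
ramp = [blend_occupancy([0.0], [1.0], k / steps)[0] for k in range(steps + 1)]
```

With these assumed rates there are 735 solver steps per animation frame, so the geometry change is spread over hundreds of small updates rather than applied as one abrupt jump, which is the kind of discontinuity that is heard as a pop.
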
The system robustly handles drastic geometric transformations, such as cavities opening and closing, without numerical instability.
The technique is capable of simulating over 300,000 concurrent candy impact sounds, although not yet in real-time, requiring approximately 15 seconds to compute 1 second of audio.
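
The cost figure above translates into a real-time factor with simple arithmetic. The helper names below are hypothetical, and the 10-second clip is just an example input.

```python
def real_time_factor(compute_seconds, audio_seconds):
    """Seconds of computation needed per second of audio (1.0 = real time)."""
    return compute_seconds / audio_seconds

def compute_time(audio_seconds, factor):
    """Wall-clock seconds needed to synthesize a clip at a given factor."""
    return audio_seconds * factor

factor = real_time_factor(15.0, 1.0)     # figure quoted for the candy scene
clip_cost = compute_time(10.0, factor)   # hypothetical 10-second clip
```

Here `factor` is 15, so a 10-second clip of this scene would take about 150 seconds to compute, and a further 15x speedup would be needed for it to run in real time.
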
It resolves the challenge of newly appearing air after object movement by globally solving for missing pressure and velocity fields using a least-squares method, maintaining simulation stability.
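
As a rough illustration of filling in newly exposed air by least squares, the sketch below works on a single row of pressure values in which fresh cells are marked `None`. The residual it minimizes (squared discrete second differences, a simple smoothness penalty) is a stand-in chosen for brevity; the source does not say which residual the actual solver uses, and the velocity field is omitted here. The function names are hypothetical.

```python
def solve_linear(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fill_fresh_cells(p):
    """Fill None entries of a 1D pressure row by least squares.

    All unknowns are solved for together (one global solve): they are
    chosen to minimize the sum of squared second differences
    p[i-1] - 2 p[i] + p[i+1] over every stencil touching an unknown.
    Assumes enough known cells remain for the system to be solvable.
    """
    unknown = [i for i, v in enumerate(p) if v is None]
    col = {i: j for j, i in enumerate(unknown)}
    rows, rhs = [], []
    for i in range(1, len(p) - 1):
        stencil = ((i - 1, 1.0), (i, -2.0), (i + 1, 1.0))
        if not any(j in col for j, _ in stencil):
            continue  # stencil involves only known cells
        row, b = [0.0] * len(unknown), 0.0
        for j, w in stencil:
            if j in col:
                row[col[j]] += w   # coefficient on an unknown
            else:
                b -= w * p[j]      # known value moves to the right-hand side
        rows.append(row)
        rhs.append(b)
    n = len(unknown)
    # Normal equations (A^T A) x = A^T b of the least-squares problem.
    ata = [[sum(r[a] * r[c] for r in rows) for c in range(n)] for a in range(n)]
    atb = [sum(r[a] * b for r, b in zip(rows, rhs)) for a in range(n)]
    x = solve_linear(ata, atb)
    filled = list(p)
    for i, j in col.items():
        filled[i] = x[j]
    return filled
```

For example, `fill_fresh_cells([0.0, 1.0, None, None, 4.0, 5.0])` fills the two gaps with values close to 2 and 3, continuing the trend of the surrounding cells instead of starting the new air at an arbitrary value such as zero.
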
The method supports tiny point-like sound sources for detailed elements like debris or splashes, reducing the need for ultra-fine grids to capture subtle audio events.
The system allows for the integration of "phantom" geometry (mathematical constructs, not physical objects) to shape and customize sound output, providing advanced sound design capabilities.
For moving objects, boundary conditions are intelligently reset, preventing sudden sound pops when an object enters a noisy area and ensuring physical believability.
The technique is nearing real-time interactive sound synthesis, pointing to a future in which VR, games, and simulations feature dynamic, physics-computed soundscapes instead of static, pre-recorded audio.
The code and dataset for this groundbreaking work are freely available to the public.
The future of sound is not recorded - it’s computed, and it’s going to be spectacular.

| Feature | Description |
|---|---|
| Unified Solver | Integrates various sound interactions (e.g., liquids, vibrating shells, rigid bodies) into a single, comprehensive algorithm. |
| GPU Acceleration | Achieves speedups of roughly 140x (typical) to 1000x (best case) over multi-core CPU solvers by running on uniform grids on a single GPU. |
| Real-time Capability | Some low-resolution demonstrations already run faster than real time, paving the way for interactive sound simulations. |
| Smooth Sound Transitions | Employs smooth interpolation between animation frames to eliminate "popping" artifacts during object movement and deformation. |
| Geometry-Aware Sound | Accurately accounts for complex geometry and environmental acoustics, producing sounds that reflect physical reality (e.g., muffled sounds in enclosed spaces). |
| Physics-Driven Soundscapes | Generates dynamic and believable audio entirely from physics simulations, removing reliance on pre-recorded audio and AI. |
