Sound is a physical phenomenon before it is a digital signal. Mastering Audio AI begins with understanding the physics of the wave.
1. Waves of Pressure
Sound is a Longitudinal Wave that travels through a medium (air, water, or solids). It consists of regions of high pressure (Compressions) and low pressure (Rarefactions). When we record sound, we are measuring the displacement of a microphone's diaphragm caused by these pressure changes. This physical displacement is what we eventually convert into the digital numbers that an AI model can process. If you don't grasp this, you won't understand what those numbers in your matrices actually represent.
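To make that conversion concrete, here is a minimal sketch of how a continuous displacement becomes a list of numbers. It assumes a pure 440 Hz tone stands in for real diaphragm motion and that the target format is 16-bit signed integers; the names `sampleSineWave` and `quantizeTo16Bit` are illustrative and do not come from any library.

```javascript
// Hypothetical helper: sample a pure tone as if it were diaphragm displacement.
// frequencyHz, durationSec and sampleRate are illustrative parameters.
function sampleSineWave(frequencyHz, durationSec, sampleRate) {
  const sampleCount = Math.round(durationSec * sampleRate);
  const samples = new Float32Array(sampleCount);
  for (let n = 0; n < sampleCount; n++) {
    const t = n / sampleRate; // time of the n-th sample, in seconds
    samples[n] = Math.sin(2 * Math.PI * frequencyHz * t); // displacement in [-1, 1]
  }
  return samples;
}

// Hypothetical helper: quantize floating-point displacement into
// 16-bit signed integers, the way uncompressed PCM audio is commonly stored.
function quantizeTo16Bit(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i])); // guard against overshoot
    pcm[i] = Math.round(clamped * 32767);
  }
  return pcm;
}

const tone = sampleSineWave(440, 0.01, 16000); // 10 ms of a 440 Hz tone at 16 kHz
const pcm = quantizeTo16Bit(tone);
console.log(tone.length, pcm.slice(0, 5));
```

Each entry in `pcm` is one snapshot of pressure at one instant, which is exactly what a row of numbers in an audio tensor represents.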
// Modeling Pressure Changes over time
class SoundWave {
  constructor(sampleRate) {
    this.sampleRate = sampleRate;
    this.pressureSamples = [];
  }
  recordDisplacement(pressureValue) {
    // In real life, the mic diaphragm moves in & out
    this.pressureSamples.push(pressureValue);
  }
}
2. Frequency (Pitch)
Frequency is the number of cycles a wave completes in one second, measured in Hertz (Hz). Higher frequencies produce 'High Pitch' sounds (like a whistle), while lower frequencies produce 'Low Pitch' sounds (like a bass drum). In audio AI, we often focus on the human voice, whose energy typically falls between about 80 Hz and 14,000 Hz: the fundamental pitch sits at the low end (roughly 80 to 300 Hz), while harmonics and consonant sounds fill the range above it. The full range of human hearing extends from about 20 Hz up to 20,000 Hz. If you're building a speech recognizer, filtering out frequencies above 8,000 Hz can often save compute while losing little phonetic information, which is why many speech systems work with audio sampled at 16 kHz.
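The definition of frequency as cycles per second can be checked directly on sampled data. The sketch below is a rough illustration, not a production pitch detector: it assumes a clean, single tone and counts upward zero crossings (one per cycle). The name `estimateFrequency` and the 200 Hz test tone are hypothetical choices for this example.

```javascript
// Hypothetical helper: estimate the frequency of a clean, single tone by counting
// negative-to-positive zero crossings (one per cycle) and dividing by the duration.
function estimateFrequency(samples, sampleRate) {
  let cycles = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i - 1] < 0 && samples[i] >= 0) cycles++; // one upward crossing per cycle
  }
  const durationSec = samples.length / sampleRate;
  return cycles / durationSec; // cycles per second = Hz
}

// One second of a 200 Hz test tone sampled at 16 kHz.
// The small phase offset keeps samples from landing exactly on zero.
const sampleRate = 16000;
const testTone = Array.from({ length: sampleRate }, (_, n) =>
  Math.sin(2 * Math.PI * 200 * (n / sampleRate) + 0.1)
);

console.log(estimateFrequency(testTone, sampleRate)); // should be close to 200
```

Real speech contains many frequencies at once, so practical systems use a Fourier transform rather than zero crossings; this sketch only shows what "cycles per second" means in terms of samples.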
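// Frequency Band Filtering Concept
// Input: frequency values in Hz (e.g. detected spectral peaks), not raw time-domain samples
function filterVoiceBand(frequenciesHz, lowHz = 80, highHz = 8000) {
  const voiceBand = [];
  for (const freq of frequenciesHz) {
    if (freq >= lowHz && freq <= highHz) {
      voiceBand.push(freq); // Keep human speech range
    }
  }
  return voiceBand;
}
3. Amplitude (Volume)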
Amplitude represents the strength or intensity of the sound wave. In the digital world, we often measure this in Decibels (dB). It's important to remember that decibels are a logarithmic scale: an increase of 10 dB represents a sound that is 10 times more intense in power, which corresponds to about 3.16 times the amplitude. Understanding amplitude is critical for 'Normalizing' audio data so that different recordings have consistent volume levels for training. If your dataset has quiet whispers and loud screams, your model will struggle unless you normalize the amplitude first.
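// Basic Audio Normalization Concept
function normalizeAmplitude(audioBuffer, targetPeak = 0.95) {
  // Find the peak with a loop; Math.max(...array) can overflow the call stack on long buffers
  let maxAmp = 0;
  for (const sample of audioBuffer) {
    maxAmp = Math.max(maxAmp, Math.abs(sample));
  }
  if (maxAmp === 0) return audioBuffer.slice(); // Silent input: nothing to scale
  const ratio = targetPeak / maxAmp;
  // Scale all samples uniformly
  return audioBuffer.map(sample => sample * ratio);
}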
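The logarithmic nature of the decibel scale described in this section can be sketched in a few lines. This is an illustrative example: the function names `amplitudeToDb` and `powerToDb` and the reference level of 1.0 (full scale) are assumptions made here, not part of any particular audio library.

```javascript
// Hypothetical helper: convert a linear amplitude ratio to decibels.
// Amplitude ratios use 20 * log10 because power grows with amplitude squared.
function amplitudeToDb(amplitude, reference = 1.0) {
  return 20 * Math.log10(amplitude / reference);
}

// Hypothetical helper: convert a power (intensity) ratio to decibels.
function powerToDb(power, reference = 1.0) {
  return 10 * Math.log10(power / reference);
}

console.log(powerToDb(10));      // 10 times the intensity is +10 dB
console.log(amplitudeToDb(10));  // 10 times the amplitude is +20 dB
console.log(amplitudeToDb(0.5)); // half the amplitude is roughly -6 dB
```

The difference between the two formulas explains why a 10 dB step means 10 times the intensity but only about 3.16 times the sample values you would see in a buffer.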