How does a computer know the difference between Jazz and Heavy Metal? By looking for the unique spectral signatures that define each genre.
1. The Center of Mass
The Spectral Centroid is a measure used in digital signal processing to characterize a spectrum. It indicates where the 'center of mass' of the spectrum is located. Perceptually, it has a strong correlation with the Brightness of a sound. A song with many high-frequency instruments (like cymbals or electric guitars) will have a much higher centroid than a song dominated by low-frequency instruments (like a double bass or kick drum).
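To make the 'center of mass' idea concrete, here is a minimal sketch that computes the centroid of a single frame by hand, as the magnitude-weighted mean of the bin frequencies. The helper name `spectral_centroid_frame` and the toy spectra are illustrative assumptions, not part of librosa.

```python
def spectral_centroid_frame(freqs, mags):
    """Magnitude-weighted mean frequency of one spectral frame."""
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(f * m for f, m in zip(freqs, mags)) / total

# Hypothetical bin centre frequencies in Hz
freqs = [100.0, 200.0, 400.0, 800.0, 1600.0]

# Magnitude concentrated in the low bins (bass-heavy) vs. the high bins (bright)
bass_heavy = [8.0, 4.0, 2.0, 1.0, 1.0]
bright = [1.0, 1.0, 2.0, 4.0, 8.0]

dark_centroid = spectral_centroid_frame(freqs, bass_heavy)
bright_centroid = spectral_centroid_frame(freqs, bright)
print(dark_centroid, bright_centroid)  # 300.0 1068.75
```

The same five bins give a centroid more than three times higher once the magnitude shifts toward the top of the spectrum, which is the 'brightness' effect described above.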
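import librosa
# Load the audio ("song.wav" is a placeholder path)
y, sr = librosa.load("song.wav")
# Calculate Spectral Centroid (one value per frame, in Hz)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
print(f"Average Brightness: {centroid.mean():.1f} Hz")
2. The Spectral Edge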
Spectral Rolloff is the frequency below which a chosen percentage (commonly 85%) of the total spectral magnitude of the signal is contained. This feature is useful for distinguishing between different kinds of 'noisiness' and timbre. It approximates the effective Cutoff Frequency of the recording, which can hint at both the genre and the quality of the audio equipment used.
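The definition above can be sketched for a single frame as a running sum over the bins: the rolloff is the first bin frequency at which the cumulative magnitude reaches the chosen fraction of the total. The helper name `spectral_rolloff_frame` and the toy spectrum are assumptions made for illustration only.

```python
def spectral_rolloff_frame(freqs, mags, roll_percent=0.85):
    """Lowest bin frequency where cumulative magnitude reaches roll_percent of the total."""
    threshold = roll_percent * sum(mags)
    running = 0.0
    for f, m in zip(freqs, mags):
        running += m
        if running >= threshold:
            return f
    return freqs[-1]  # fallback if rounding keeps the sum below the threshold

# Hypothetical bin centre frequencies (Hz) and magnitudes
freqs = [100.0, 200.0, 400.0, 800.0, 1600.0]
mags = [8.0, 4.0, 2.0, 1.0, 1.0]

rolloff_85 = spectral_rolloff_frame(freqs, mags)
rolloff_50 = spectral_rolloff_frame(freqs, mags, roll_percent=0.5)
print(rolloff_85, rolloff_50)  # 400.0 100.0
```

Lowering `roll_percent` moves the 'edge' down toward where most of the energy sits, while values near 1.0 push it toward the highest frequency that still carries energy.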
import librosa
# y and sr are the samples and sample rate returned by librosa.load
# Calculate Spectral Rolloff at 85%
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85)
print(f"85% Edge: {rolloff.mean():.1f} Hz")
3. Building the Classifier
No single feature is enough to classify a genre reliably. Instead, we create a Feature Vector that combines MFCCs (the overall spectral envelope, i.e. timbre), Spectral Centroid (brightness), Spectral Rolloff (energy distribution), and Zero-Crossing Rate (noisiness). This multi-dimensional 'fingerprint' is then fed into a machine learning model such as an SVM or a CNN, which learns to predict the genre from labelled examples.
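Zero-Crossing Rate is the one feature in this list not introduced earlier, so here is a minimal sketch of the idea: count how often adjacent samples change sign and divide by the number of adjacent pairs. The function and the toy signals are illustrative assumptions; librosa computes the rate frame by frame and may normalise slightly differently.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

# A slowly varying (tonal) signal vs. a rapidly alternating (noisy) one
smooth = [1.0, 0.8, 0.5, 0.1, -0.3, -0.7, -0.9, -0.6]
noisy = [1.0, -0.8, 0.5, -0.1, 0.3, -0.7, 0.9, -0.6]

smooth_zcr = zero_crossing_rate(smooth)
noisy_zcr = zero_crossing_rate(noisy)
print(smooth_zcr, noisy_zcr)
```

The smooth signal crosses zero once in seven pairs, while the noisy one crosses on every pair, which is why the rate works as a rough noisiness measure.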
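from sklearn.svm import SVC
import numpy as np
import librosa
# Remaining features (y, sr, centroid and rolloff come from the earlier steps)
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
zcr = librosa.feature.zero_crossing_rate(y=y)
# Combine features: MFCCs, Centroid, Rolloff, ZCR
features = np.hstack([mfccs.mean(axis=1), centroid.mean(), rolloff.mean(), zcr.mean()])
# X_train / y_train: feature vectors and genre labels for a labelled set of tracks, built the same way
model = SVC(kernel='linear')
model.fit(X_train, y_train)
prediction = model.predict([features])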
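One practical detail worth considering: the centroid and rolloff are measured in Hz and can be in the thousands, while MFCC means and the zero-crossing rate are much smaller, so a linear SVM may be dominated by the large-scale columns. The sketch below shows one common way to handle this by standardising the features inside a scikit-learn pipeline. The data is synthetic and the labels are made up, so it demonstrates only the mechanics, not a real accuracy figure.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for per-track feature vectors (assumed layout):
# 13 MFCC means + centroid (Hz) + rolloff (Hz) + ZCR = 16 columns
n_tracks, n_features = 40, 16
X_train = rng.normal(size=(n_tracks, n_features))
X_train[:, 13] = X_train[:, 13] * 500 + 2000   # centroid-like scale
X_train[:, 14] = X_train[:, 14] * 1000 + 4000  # rolloff-like scale
y_train = np.array(["jazz", "metal"] * (n_tracks // 2))  # made-up labels

# Standardise each column, then fit the same linear SVM as above
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Predict for one feature vector (here simply the first training row)
prediction = model.predict(X_train[:1])
print(prediction.shape)  # (1,)
```

With real data, `X_train` would hold one row per labelled track, each built with the same feature recipe used for the track being classified.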