Frequency Analysis


Sound waves can be combined by adding their samples.  For example, we can add the left and right stereo channels together to get a mono wave.  What about breaking a single wave into useful pieces?

We can't easily break it into left and right channels, but we can represent it as a sum of simpler pieces.

The human ear responds to frequencies-- changes of air pressure that repeat at a given rate.  A mathematical technique known as the Fourier transform can represent a signal in terms of frequencies.  The Fourier transform is the basic tool of signal analysis.

The mathematics of Fourier analysis is not covered here, but can be found in many places, for example "Signal Processing and Digital Filters".  What follows is a discussion of the concepts, intended to build intuition, in which many of the complexities have been avoided.


Fourier analysis is a way of representing a wave as a sum of sinusoids, or sinewaves, which correspond to the vibrations that the ear detects.


A section of a sinusoid, or simple oscillator

Sinusoids occur in nature as a result of circular motion.  For example, picture a planet going around a star:


As the viewpoint drops into the plane of the planet's orbit, the motion appears like this:


If we trace out this circular motion over time, the resulting curve is a sinusoid:


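The correspondence between circular motion and sinusoids is easy to check numerically.  Here is a minimal NumPy sketch (the duration and sample count are arbitrary choices):

```python
import numpy as np

# A point moving around a circle at a constant rate sweeps out an
# angle that grows linearly with time.
t = np.linspace(0.0, 2.0, 201)      # two revolutions' worth of time
angle = 2.0 * np.pi * t             # one full turn per unit of time

x = np.cos(angle)                   # position across the line of sight
y = np.sin(angle)                   # position along the line of sight

# The point never leaves the circle...
radius = np.sqrt(x**2 + y**2)
# ...yet either coordinate, traced out over time, is a sinusoid.
```

Plotting `y` against `t` reproduces the traced-out curve described above.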
•Representing a signal with sinusoids

Waves in the air are continuous-- there is an air pressure value at every instant in time.  A digital sound signal, however, is a collection of air pressure samples measured at equally-spaced time intervals.

Consider the 4 sample values { 0, 1, 1, 0 }.  We could plot this as a signal like this:


A signal: the step function

This is sometimes called a time domain representation, because each point is the signal value at a certain time.  In a sound wave, each point is an air pressure measurement at a certain moment.  

We could think of the signal as a sum of these two simple pieces.  Each is zero everywhere except at one specific time:


Time domain representation of the step function

Another way to express these values is as a sum of these two pieces:


This is useful because these pieces are derived from sinusoids (the first one has frequency zero):


If our sample signal has a few more points:


Then the first few parts look like this:


Adding these parts gives an approximation of the original signal:


As more pieces are added up, the original function gradually reappears:


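This gradual reconstruction can be sketched with NumPy's FFT routines.  The sample values below are illustrative, not the exact ones in the figures:

```python
import numpy as np

# A step-like signal, as in the figures above (hypothetical values).
signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
N = len(signal)

weights = np.fft.fft(signal)        # the weight of each sinusoid

def partial_sum(num_harmonics):
    """Rebuild the signal keeping only the lowest-frequency pieces."""
    kept = np.zeros(N, dtype=complex)
    for k in range(N):
        freq = k if k <= N // 2 else k - N   # signed frequency label
        if abs(freq) <= num_harmonics:
            kept[k] = weights[k]
    return np.fft.ifft(kept).real

rough = partial_sum(1)              # frequency zero and the fundamental only
exact = partial_sum(N // 2)         # every harmonic: the original reappears
```

Keeping only a few harmonics gives a rough, rounded approximation; keeping all of them recovers the sample values exactly.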
•The Fourier transform

The method of representing a signal as a weighted sum of sinusoids is the Fourier transform.  Usually we just graph the weights:


Frequency domain representation of the step function

The weights are labeled with integers (positive and negative whole numbers), which correspond to positive and negative frequencies.  This is a frequency domain representation-- each point is the weight of a certain frequency.  In this case, frequency zero has the largest contribution, and the other even-numbered frequencies are all zero.

For real-world signals, positive and negative frequencies are related in a simple way, so usually we just refer to the positive-numbered ones.  Often we are only concerned with the magnitude of each weight, so a frequency graph might look like this:


Fourier transform magnitude for the positive frequencies of the step function
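These properties can be checked directly on the 4-sample example from earlier.  A small NumPy sketch:

```python
import numpy as np

signal = np.array([0.0, 1.0, 1.0, 0.0])
weights = np.fft.fft(signal)        # one complex weight per frequency
magnitudes = np.abs(weights)

# np.fft.fft orders the weights by frequency label 0, 1, +/-2, -1.
# Frequency zero (the sum of the samples) has the largest magnitude,
# and the other even frequency is zero.

# For a real-valued signal, the weights of a positive frequency and
# its negative partner are complex conjugates of each other, so the
# magnitude graph for the positive frequencies tells the whole story.
```

This conjugate symmetry is the "simple way" positive and negative frequencies are related.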

The sinusoids used to represent the signal are harmonics; each makes a whole number of complete cycles over the duration of the signal:


Fundamental and its first 3 harmonics

For sound waves, the zeroth harmonic corresponds to the average air pressure.  The other harmonics represent deviations from the average.  The first harmonic, or fundamental, is the lowest frequency that can be measured in the signal.  Its actual value, in cycles per second, depends on the number of samples in the signal and the time delay between samples.
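Concretely, the fundamental completes exactly one cycle over the whole signal, so its frequency is the reciprocal of the signal's duration.  A sketch with hypothetical numbers (a CD-rate signal of 1024 samples):

```python
num_samples = 1024                  # hypothetical signal length
sample_rate = 44100.0               # hypothetical samples per second
delay_between_samples = 1.0 / sample_rate

# Duration of the whole signal, in seconds.
duration = num_samples * delay_between_samples

# The fundamental makes one complete cycle over that duration.
fundamental_hz = 1.0 / duration     # equivalently, sample_rate / num_samples
```

For these numbers the fundamental is about 43 cycles per second; doubling the number of samples would halve it.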

The wonderful thing about the Fourier transform is that it represents sound much the way our ears do.   Here is the graph of a single plucked guitar string:


Plucked guitar string, waveform

Here is a section of the transform, showing how much energy is present at each frequency:


Plucked string Fourier transform magnitude, detail

The transform shows that this sound is composed primarily of certain specific frequencies: the ones musicians refer to as partials (also called harmonics or overtones).  Partials will be discussed in detail later.

•Windows and windowing artifacts

Since sinusoids repeat, we can picture them on a circle.  This is like saying that no matter how many times the earth circles the sun, it never leaves its orbit:


Left: one period of a sinusoid.  Right: the same sinusoid wrapped around a circle, drawn in three dimensions

Now the harmonics look like this:


Fundamental and its first 3 harmonics

Since real-world sound signals don't repeat forever, we snip out a window to perform the transform on.  Taking a simple example signal:


Time blocks are clipped out, using a tiling of rectangular windows:


Windowed section of waveform

Now we can put the windowed samples on a circle and take the transform.  On a circle, though, the signal looks like this:


Windowed signal graphed on a circle

Where the left and right edges of the window meet, the wave jumps between two distant values.  

Unfortunately, the smooth sinusoids have a difficult time representing a jump like this, requiring many high-frequency terms to provide a good reconstruction of the signal.  These high frequencies aren't part of the signal we're interested in; they're part of the window.  Fourier transforms are easily corrupted by such windowing artifacts.

The transform of the rectangular window shows how important the high-frequency terms are to representing the jump:


Fourier transform magnitude for the rectangular window, detail

The small but numerous high-frequency components away from the central peak cause noticeable degradation of the frequency information.
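This slow decay is easy to observe numerically.  A sketch, using an arbitrary window length within a longer span:

```python
import numpy as np

span = 256
window = np.zeros(span)
window[:64] = 1.0                   # rectangular window: a jump at each edge

spectrum = np.abs(np.fft.rfft(window))
peak = spectrum[0]                  # central peak: the window's area (64)

# Far from the peak, the components are small but not negligible --
# the jumps at the window's edges demand many high-frequency terms.
distant = spectrum[100:].max()
```

The distant components remain above one percent of the central peak, which is exactly the corruption described above.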

•Smooth windows

A solution to the problem of the jump is to fade the signal in and out, using a smooth-edged window like this:


Now the windowed clip fades to zero at the edges, like this:


Here is a zoom-in of the transform of the window.  The peak is at frequency zero:


Fourier transform magnitude for the smoothed window, detail

The peak is wider than that of the rectangular window, telling us that the window spills some energy onto nearby frequencies.  Away from the central peak, however, it goes to zero much more quickly, indicating that there are fewer terms needed to approximate the signal accurately.

We get better frequency results using a smoothed window to break a signal into pieces.  However, instead of nicely adjacent rectangles, the windows look like this:


Clearly we are missing part of the original signal.  We could try overlapping the windows, like this:


Now parts of the signal are measured twice.  

Fourier transforms are theoretically reversible, which can be useful for synthesis of sound with specific frequency characteristics.  If the transforms use windows that miss or duplicate energy measurements, however, this useful property is compromised.
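The reversibility itself is easy to verify on a single block of samples-- the inverse transform recovers them exactly, up to rounding error.  A minimal check:

```python
import numpy as np

signal = np.array([0.0, 1.0, 1.0, 0.0])   # the 4-sample example again

weights = np.fft.fft(signal)          # forward: samples -> weights
recovered = np.fft.ifft(weights).real  # inverse: weights -> samples
```

The trouble arises only at the seams: once the windows miss or double-count samples, inverting the individual blocks no longer restores the original signal.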

Windowed Fourier transforms, while remaining the key tool of sound analysis, are not perfect.  However, the offshoot of Fourier analysis known as wavelet analysis helps address these problems.

© 2003 N. Resnikoff

 (March 3, 2003)