Sound waves can be combined by adding the samples. For example, we can add left and right stereo channels together to get a mono wave. What about breaking a single wave into useful pieces?
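For instance, with both channels held as NumPy arrays (an assumption for this sketch), a mono mix is just elementwise addition, scaled to stay in range:

```python
import numpy as np

# Hypothetical stereo content: two one-second tones at 44.1 kHz.
rate = 44100
t = np.arange(rate) / rate
left = np.sin(2 * np.pi * 440 * t)    # 440 Hz in the left channel
right = np.sin(2 * np.pi * 660 * t)   # 660 Hz in the right channel

# Add the samples; halving the sum keeps it within [-1, 1].
mono = 0.5 * (left + right)
```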

We can't easily break it into left and right channels, but we can represent it as a sum of simpler pieces.

The human ear responds to *frequencies*-- changes
in air pressure that repeat at a given rate. A
mathematical technique known as the *Fourier transform* can
represent a signal in terms of frequencies. The Fourier
transform is the basic tool of signal analysis.

The mathematics of Fourier analysis is not covered here; it can be found in many places, for example in "Signal Processing and Digital Filters". What follows is a discussion of the concepts, intended to build intuition, in which many of the complexities are avoided.

Fourier analysis is a way of representing a wave as a
sum of *sinusoids*, or sinewaves, which correspond to the
vibrations that the ear detects.

A section of a sinusoid, or simple oscillator

Sinusoids occur in nature as a result of circular motion. For example, picture a planet going around a star:

As the viewpoint drops into the plane of the planet's orbit, the motion appears like this:

If we trace out this circular motion over time, the resulting curve is a sinusoid:
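This projection is easy to verify numerically. Modeling the circular motion as a complex exponential (one convenient choice), each coordinate of the moving point traces a sinusoid:

```python
import numpy as np

t = np.linspace(0.0, 2.0, 500)       # two full revolutions
orbit = np.exp(2j * np.pi * t)       # uniform motion around the unit circle

# Viewed edge-on, only one coordinate of the motion is visible: a sinusoid.
trace = orbit.real
assert np.allclose(trace, np.cos(2 * np.pi * t))
```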

Waves in the air are *continuously valued*--
there are air pressure values at every instant in time. A
digital sound signal, however, is a collection of air pressure samples
measured at equally-spaced time intervals.

Consider the four sample values { 0, 1, 1, 0 }. We could plot them as a signal:

A signal: the step function

This is sometimes called a *time domain*
representation, because each point is the signal value at a certain
time. In a sound wave, each point is an air pressure
measurement at a certain moment.

We could think of the signal as a sum of these two simple pieces. Each is zero everywhere except at one specific time:

Time domain representation of the step function

Another way to express these values is as a sum of these two pieces:

This is useful because these pieces are derived from sinusoids (the first one has frequency zero):

If our sample signal has a few more points:

Then the first few parts look like this:

Adding these parts gives an approximation of the original signal:

As more pieces are added up, the original function gradually reappears:
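This gradual reappearance can be sketched with NumPy's FFT (the 16-sample step below is an assumed stand-in for the figure's signal): keep only the lowest k harmonics, invert, and watch the error shrink.

```python
import numpy as np

# An assumed 16-sample step signal, similar to the one in the figures.
x = np.array([0.0] * 4 + [1.0] * 8 + [0.0] * 4)
X = np.fft.fft(x)                        # the harmonic weights

def partial_sum(X, k):
    """Rebuild the signal from harmonics -k..k only, zeroing the rest."""
    Y = np.zeros_like(X)
    Y[: k + 1] = X[: k + 1]              # frequencies 0..k
    if k > 0:
        Y[-k:] = X[-k:]                  # the matching negative frequencies
    return np.fft.ifft(Y).real

# Squared error falls (or holds) as each pair of harmonics is added.
errors = [np.sum((partial_sum(X, k) - x) ** 2) for k in range(9)]
assert all(e2 <= e1 + 1e-12 for e1, e2 in zip(errors, errors[1:]))
assert errors[-1] < 1e-12                # all harmonics: exact reconstruction
```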

The method of representing a signal as a weighted sum
of sinusoids is the *Fourier transform*. Usually we
just graph the weights:

Frequency domain representation of the step function

The weights are labeled with *integers* (positive
and negative whole numbers), which correspond to positive and negative
frequencies. This is a *frequency domain*
representation-- each point is the weight of a certain
frequency. In this case, frequency zero has the largest
contribution, and the even-numbered frequencies are all
zero.
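These weights can be checked with NumPy's FFT for the four-sample example, using the convention that the second half of the output holds the negative frequencies:

```python
import numpy as np

x = np.array([0.0, 1.0, 1.0, 0.0])
X = np.fft.fft(x)                    # weights for frequencies 0, 1, 2, -1
mags = np.abs(X)

assert mags[0] == mags.max()         # frequency zero contributes the most
assert np.isclose(mags[2], 0.0)      # the even-numbered frequency vanishes
assert np.isclose(mags[1], mags[3])  # positive/negative pair, equal magnitude
```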

For real-world signals, positive and negative frequencies are related in a simple way, so usually we just refer to the positive-numbered ones. Often we are only concerned with the magnitude of each weight, so a frequency graph might look like this:

Fourier transform magnitude for the positive frequencies of the step function

The sinusoids used to represent the signal are *harmonics*;
each makes a whole number of complete cycles over the duration of the signal:

Fundamental and its first 3 harmonics

For sound waves, the zeroth harmonic corresponds to the
average air pressure. The other harmonics represent
deviations from the average. The first harmonic, or *fundamental*,
is the lowest frequency that can be measured in the
signal. Its actual value, in cycles per second, depends on
the number of samples in the signal and the time delay between samples.
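Concretely, with N samples spaced dt seconds apart (sample rate fs = 1/dt), the fundamental is fs/N cycles per second. A sketch with assumed values:

```python
import numpy as np

fs = 44100                # samples per second (an assumed CD-quality rate)
N = 1024                  # samples in the analysis block
f0 = fs / N               # fundamental: one full cycle across the block

# NumPy's bin frequencies agree: harmonic k sits at k * f0.
freqs = np.fft.fftfreq(N, d=1.0 / fs)
assert np.isclose(freqs[1], f0)
print(f"fundamental = {f0:.2f} Hz")   # about 43.07 Hz
```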

The wonderful thing about the Fourier transform is that it represents sound much the way our ears do. Here is the graph of a single plucked guitar string:

Plucked guitar string, waveform

Here is a section of the transform, showing how much energy is present at each frequency:

Plucked string Fourier transform magnitude, detail

The transform shows that this sound is composed
primarily of certain specific frequencies: the ones musicians refer to
as *partials* (also called *harmonics* or *overtones*). Partials
will be discussed in detail later.

Since sinusoids repeat, we can picture them on a circle. This is like saying that no matter how many times the earth circles the sun, it never leaves its orbit:

Left: one period of a sinusoid. Right: the sinusoid wrapped around a circle, drawn in three dimensions

Now the harmonics look like this:

Fundamental and its first 3 harmonics

Since real-world sound signals don't repeat forever, we snip out a window to perform the transform on. Taking a simple example signal:

Time blocks are clipped out, using a tiling of rectangular windows:

Windowed section of waveform

Now we can put the windowed samples on a circle and take the transform. On a circle, though, the signal looks like this:

Windowed signal graphed on a circle

Where the left and right edges of the window meet, the wave jumps between two distant values.

Unfortunately, the smooth sinusoids have a difficult
time representing a jump like this: many high-frequency terms
are needed to provide a good reconstruction of the signal. These high
frequencies aren't part of the signal we're interested in; they're part
of the window. Fourier transforms are easily corrupted by
such *windowing artifacts*.

The transform of the rectangular window shows how important the high-frequency terms are to representing the jump:

Fourier transform magnitude for the rectangular window, detail

The small but numerous high-frequency components away from the central peak cause noticeable degradation of the frequency information.
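The corruption is easy to reproduce: take a sinusoid that does not complete a whole number of cycles in a rectangular window, so its edges jump, and look at how far the energy spreads:

```python
import numpy as np

N = 256
n = np.arange(N)
# 10.5 cycles in the window: the rectangular cut-off creates an edge jump.
x = np.sin(2 * np.pi * 10.5 * n / N)
X = np.abs(np.fft.rfft(x))

peak = X.argmax()
# Bins well away from the peak still carry noticeable energy.
far = np.delete(X, range(max(peak - 5, 0), peak + 6))
assert far.max() > 0.01 * X.max()
```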

A solution to the problem of the jump is to fade the signal in and out, using a smooth-edged window like this:

Now the windowed clip fades to zero at the edges, like this:

Here is a zoom-in of the transform of the window. The peak is at frequency zero:

Fourier transform magnitude for the smoothed window, detail

The peak is wider than that of the rectangular window, telling us that the window spills some energy onto nearby frequencies. Away from the central peak, however, it goes to zero much more quickly, indicating that there are fewer terms needed to approximate the signal accurately.
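A quick comparison bears this out (using a Hann window, one common smooth-edged choice): far less of the energy ends up away from the peak.

```python
import numpy as np

N = 256
n = np.arange(N)
x = np.sin(2 * np.pi * 10.5 * n / N)   # non-integer cycles: worst case for leakage

rect = np.abs(np.fft.rfft(x))                  # rectangular window (no fade)
hann = np.abs(np.fft.rfft(x * np.hanning(N)))  # smooth-edged window

def leakage(X, width=5):
    """Fraction of energy outside a small band around the peak."""
    p = X.argmax()
    band = np.zeros(len(X), dtype=bool)
    band[max(p - width, 0) : p + width + 1] = True
    return float((X[~band] ** 2).sum() / (X ** 2).sum())

assert leakage(hann) < leakage(rect)   # the smooth window leaks far less
```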

We get better frequency results using a smoothed window to break a signal into pieces. However, instead of nicely adjacent rectangles, the windows look like this:

Clearly we are missing part of the original signal. We could try overlapping the windows, like this:

Now parts of the signal are measured twice.
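With the right window-and-overlap pairing, this double measurement is harmless: the overlapped windows sum to a constant, so every sample is weighted equally overall. A sketch with a periodic Hann window at 50% overlap (one standard such pairing):

```python
import numpy as np

N = 64                                                # window length
hop = N // 2                                          # 50% overlap
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann window

# Sum the shifted copies of the window across a stretch of signal.
total = np.zeros(4 * N)
for start in range(0, len(total) - N + 1, hop):
    total[start : start + N] += w

# Away from the ends (which lack an overlapping partner), the sum is flat.
assert np.allclose(total[N:-N], 1.0)
```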

Fourier transforms are theoretically reversible, which can be useful for synthesis of sound with specific frequency characteristics. If the transforms use windows that miss or duplicate energy measurements, however, this useful property is compromised.
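The round trip is easy to demonstrate; with no windowing involved, the inverse transform recovers the samples to machine precision:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)      # an arbitrary stand-in signal

X = np.fft.fft(x)                  # analysis: signal -> frequency weights
y = np.fft.ifft(X).real            # synthesis: weights -> signal

assert np.allclose(x, y)           # reversible, up to float rounding
```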

Windowed Fourier transforms remain the key tool of
sound analysis, but they are not perfect. The
offshoot of Fourier analysis known as *wavelet analysis* helps
address these problems.

© 2003 N. Resnikoff

(March 3, 2003)