The Physical Nature of Musical Sound

Boris Reitman
Sep 30, 2018

This is an educational post that will be useful to anyone who wants to understand the keys on a piano keyboard from a mathematical perspective. It is the first article in a three-part series, and it focuses on the physics of sound. Although I use the piano as the case study, the explanation applies to all musical instruments.

A sound is a physical vibration of air, or some other medium like water, that hits your eardrum in a repeated fashion like a woodpecker. Sound requires a medium and there can’t be any sound in a vacuum. I will assume air as the medium in all further examples.

In order to be perceived as sound, a vibration of air has to hit the eardrum repeatedly, anywhere from tens to thousands of times per second. We perceive the frequency of vibration as pitch. When the frequency is high, we call it high pitch or treble. When it is low, we call it low pitch or bass.

The lowest pitch that a piano can produce — by pressing the leftmost key on the piano keyboard — vibrates your eardrum at about 30 times per second. The rightmost key produces a high pitch sound that vibrates your eardrum at about 4200 times per second.

The key “A” located in the middle of the piano, also known as “la”, produces a sound that vibrates the eardrum 440 times per second. Normally this key is denoted as A4. That’s because it belongs to the octave numbered 4 in the standard octave-numbering scheme; on a full 88-key piano, whose leftmost key is A0, it is the fifth key called “A”, counting from the left. In this article, however, I will denote this key as A-440 for reasons that will soon be apparent.

How does a piano key cause a sound? There are tightened metal strings inside the piano, at least one for each key. When a key is pressed, a hammer inside the piano hits a metal string and causes it to vibrate. Other keys on the piano are connected to hammers that hit other metal strings. Those strings may be thicker or thinner, and are tightened differently. When they are hit by hammers, they vibrate your eardrum, via the air, at different rates. Here’s a YouTube video of the inside of a piano.

You can experiment with how different speeds of air vibration sound using this handy Online Tone Generator app. If you have tried that app to play A-440, you may have noticed that the sound produced is hollow and unpleasant. Yet, if you hit the A-440 key on a piano it would sound much better. Why is that? That’s because of the harmonic overtones produced by a piano, described later in this article.

Now that I have described the basics of sound, let’s continue. There are other keys called “A” on the piano keyboard, and they are all spaced 7 white keys apart. The one to the left of A-440 vibrates the eardrum 220 times per second, which is half as fast. The one to the right vibrates it 880 times per second, which is twice as fast. Notice the doubling pattern: 220, 440, 880.

Red keys: A-220, A-440, A-880. Yellow key: E-660

Our hearing faculty has a peculiarity: while it perceives absolute frequencies such as 220 and 440, it also perceives them relatively. For instance, if we played the A-220 key, and then the A-440 key, our hearing faculty “notices” that the vibration is twice as fast. This perceived relation between two pitches is called a pitch interval. All pitch intervals in which one frequency is double the other feel the same. This interval even has a name: an octave.

Moreover, many people cannot perceive absolute frequencies but can perceive them relatively. If A-220 followed by A-440 is played, a listener would be able to tell you that it sounds like an octave, but would not be able to tell you that you played the “A” keys on the piano. Those who can determine the actual keys by ear are said to have perfect pitch. This kind of ability is a natural talent that may be strengthened in early childhood. Here’s a YouTube video in which a father demonstrates how he trained his son to have perfect pitch.

A visual analogy with color demonstrates the difference between relative vs perfect pitch. A person may be able to reproduce from memory a gradient from red to orange, but not the exact starting red and the exact ending orange.

Another interesting thing about our perception of vibration frequencies is that we perceive them on a logarithmic scale. In mathematics a logarithmic scale converts a multiplicative relationship to an additive relationship. Repeated multiplication, which yields an exponential growth, is converted to repeated addition, which is only a linear growth. The following two charts show how an exponential curve is linearized. Notice the change in gradation of the vertical axis.
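The linearizing effect can also be checked numerically. The following is a minimal sketch, assuming the three “A” frequencies mentioned earlier (220, 440 and 880) as sample data; the variable names are chosen only for illustration.

```python
import math

# Frequencies of three neighbouring "A" keys, as used in this article.
frequencies = [220, 440, 880]

# On a linear scale the gap between neighbours keeps growing.
linear_gaps = [b - a for a, b in zip(frequencies, frequencies[1:])]

# On a base-2 logarithmic scale each doubling becomes the same step.
log_positions = [math.log2(f) for f in frequencies]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(linear_gaps)  # [220, 440]
print(log_gaps)     # both gaps are 1 (one doubling each), up to rounding
```

Base 2 is used here because a doubling then corresponds to a step of exactly one; any other base would give equal steps of a different size.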

Pitch is not the only attribute of sound that we perceive logarithmically. Another is volume. One author describes it:

Our ears detect changes in volume in a non-linear fashion. A decibel is a logarithmic scale of loudness. A difference of 1 decibel is perceived as a minimum change in volume, 3 decibels is a moderate change, and 10 decibels is perceived by the listener as a doubling of volume. Decibels are designated by the letters: dB.

The need to invent a unit such as the decibel arose because we perceive changes in volume additively, while physically those changes are multiplicative. Likewise, we perceive pitch intervals additively, while physically they are multiplicative. All octaves sound like the same pitch interval, despite the fact that the difference between the two frequencies is not the same; only their ratio is the same. So, while mathematically frequencies are multiplied or divided, pitch intervals are added or subtracted. To avoid possible confusion, the terms transpose up and transpose down are used. For instance, multiplying a frequency by 2 is described as transposing up by an octave.
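Both statements can be illustrated with a few lines of Python. This is only a sketch: the function names are made up for this example, and the decibel formula used is the standard one for ratios of sound power (10 times the base-10 logarithm), which says nothing about how loud the change feels.

```python
import math

def power_ratio_to_db(ratio: float) -> float:
    """Decibel change corresponding to a ratio of sound power."""
    return 10 * math.log10(ratio)

def transpose(frequency_hz: float, octaves: int) -> float:
    """Transpose up (positive) or down (negative) by whole octaves."""
    return frequency_hz * 2 ** octaves

def interval_in_octaves(low_hz: float, high_hz: float) -> float:
    """Size of a pitch interval, measured in octaves."""
    return math.log2(high_hz / low_hz)

# Multiplying the power by 10 twice (x100) adds 10 dB twice (+20 dB).
print(power_ratio_to_db(10), power_ratio_to_db(100))

# Every octave has the same ratio, but a different frequency difference.
for low in (110, 220, 440):
    high = transpose(low, 1)
    print(low, high, "difference:", high - low, "ratio:", high / low)

# Two octaves up is "1 + 1" for the ear, but "x2 x2" for the frequency.
print(interval_in_octaves(220, transpose(220, 2)))
```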

So far we have talked about the perception of musical sounds voiced one after the other. Now, I would like to talk about human perception of simultaneous musical sounds. For a long time people did not focus on playing two sounds at the same time. This was partly because some musical instruments, such as the flute, can only produce one sound at a time. The same is true of singing. The piano is one of the few musical instruments on which playing two sounds at the same time is easy: just press any two keys. Moreover, the piano allows many more sounds to be played simultaneously. Ten fingers pressing ten keys can produce ten sounds at once. With the sustain pedal, which prolongs sounds, as many as 40 sounds can be voiced easily by pressing keys in sequence while holding the pedal.

Let’s talk about human perception of two simultaneous sounds. Suppose that keys A-220 and A-440 are pressed on the piano. What is the resulting effect on the vibration of the eardrum? Let’s understand this using the “woodpecker” analogy: a woodpecker hits the eardrum at a certain rate. There is one woodpecker that hits the eardrum 220 times per second, and another woodpecker that hits it 440 times per second.

Let’s divide an interval of 1-second into 440 time units. If we number them from 0 to 439, then in every even interval the eardrum will be hit by both woodpeckers, but in every odd interval, only by one. The hearing faculty detects this pattern and delivers it to the mind as a distinct perception of a simultaneous pitch interval.

A simultaneous pitch interval feels different from a sequential pitch interval involving the same two sounds. It has a texture. One musical exercise is to hear a simultaneous pitch interval, and then to sing it as a sequential pitch interval.

In the previous example we pressed the A-220 and A-440 keys. But things get more interesting when one frequency is not double the other. Another piano key called “E”, found in the middle of the piano keyboard, produces a sound that vibrates at a frequency of 660 times per second. I will refer to this key as E-660.

So, let’s play A-440 and E-660. This is no longer a 2-to-1 vibration ratio, but a ratio of 3-to-2. Using the woodpecker analogy, if we were to divide one second into 2*3*220 or 1320 time intervals and number them from 0, then within every group of six intervals the one numbered 0 would get a hit from both woodpeckers, those numbered 1 and 5 would get no hits, and those numbered 2, 3 and 4 would get one hit each. Our hearing faculty detects this pattern and perceives it as another characteristic pitch interval. Let’s label it as 3:2, according to the ratio involved.
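The woodpecker counting for both key pairs can be reproduced with a short sketch. It assumes, as the analogy does, that both woodpeckers land their first hit at the same instant; the function name `hit_pattern` is made up for this example.

```python
from math import lcm  # available in Python 3.9 and later

def hit_pattern(freq_a: int, freq_b: int) -> list[int]:
    """Number of hits in each time slot of one second.

    The second is divided into lcm(freq_a, freq_b) slots, the smallest
    number of slots on which both rhythms land exactly.
    """
    slots = lcm(freq_a, freq_b)
    step_a = slots // freq_a  # woodpecker A hits every step_a slots
    step_b = slots // freq_b  # woodpecker B hits every step_b slots
    return [(slot % step_a == 0) + (slot % step_b == 0) for slot in range(slots)]

octave = hit_pattern(220, 440)  # 440 slots
fifth = hit_pattern(440, 660)   # 1320 slots

print(octave[:4])  # [2, 1, 2, 1]
print(fifth[:6])   # [2, 0, 1, 1, 1, 0]
```

The octave pattern repeats every two slots and the 3:2 pattern every six, which is one way to see why the second one is a slightly more complex rhythm for the ear to follow.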

The mathematician and philosopher Pythagoras, back in the 6th century BCE, made a crucial observation: pairs of sounds whose frequencies are related by a simple ratio are perceived as a pleasing pitch interval. For instance, a pair of vibrations with frequencies in a ratio of 2 to 1 sounds pleasing. This interval is called an octave. A ratio of 3 to 2 also sounds pleasing. It is called a fifth. And a ratio of 4 to 3 sounds pleasing too. It is called a fourth. (Why these names were chosen for those intervals is explained in my next article on musical scales.)

According to Pythagoras, a simple ratio of vibration frequencies means that the numerator and denominator are low whole numbers. For instance, a ratio of vibration frequencies of 21 to 17 would not sound pleasant. An explanation for this may be that it is hard for our hearing faculty to grok a complex vibration pattern. (It is important to mention that the perception of what sounds pleasing is not universal.) However, two millennia after Pythagoras, the physicist Simon Stevin suggested that there is no need to stick to such simple fractions. He described the equal-tempered tuning that is used for pianos today, which I explain in my article on musical scales.
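To make the notion of a “simple ratio” concrete, here is a small sketch that reduces frequency pairs to lowest terms. The `complexity` score (numerator plus denominator) is a hypothetical measure invented for this example, not something Pythagoras defined, and the frequency pairs other than 220/440 and 440/660 are chosen only to produce the ratios mentioned above.

```python
from fractions import Fraction

def interval_ratio(low_hz: int, high_hz: int) -> Fraction:
    """Frequency ratio of an interval, reduced to lowest terms."""
    return Fraction(high_hz, low_hz)

def complexity(ratio: Fraction) -> int:
    """Hypothetical simplicity score: smaller means a simpler ratio."""
    return ratio.numerator + ratio.denominator

pairs = {
    "octave": (220, 440),
    "fifth": (440, 660),
    "fourth": (330, 440),    # illustrative pair giving 4 to 3
    "21 to 17": (170, 210),  # illustrative pair giving 21 to 17
}

for name, (low, high) in pairs.items():
    ratio = interval_ratio(low, high)
    print(f"{name}: {ratio.numerator}:{ratio.denominator}, complexity {complexity(ratio)}")
```

With this score the octave, fifth and fourth come out at 3, 5 and 7, while 21 to 17 comes out at 38.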

Let’s summarize: we have discussed what happens when an eardrum is hit with two vibrations simultaneously. This happens when two different piano keys, such as A-440 and E-660, are pressed at the same time. You may be surprised to know that, to a degree, there are several vibration frequencies hitting the eardrum even if you press a single piano key. So far we have tacitly assumed that each vibration looks like a basic sine function:

Pure sine wave vibration

But, in fact, a tight metal string, when hit with a hammer, vibrates somewhat like this:

Vibration of a tight metal string

Mathematically, this wavy shape can be expressed as a sum of several basic sine waves. The mathematical term for this is a Fourier series. For the A-440 sound, such a decomposition would show that the highest-amplitude sine wave is the one that has 440 periods per second. Sine waves of other frequencies will have smaller amplitudes and contribute less to the overall sound profile. Although this kind of decomposition is merely an on-paper analysis of a sum total, it appears that the human hearing faculty also does it to an extent. The primary sine wave is heard as the dominant pitch, while musicians call the others harmonic overtones.
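The idea of building a wave from sine waves and then taking it apart again can be sketched in plain Python. Everything specific here is an assumption made for illustration: the sample rate, the 0.1-second duration, and above all the `RECIPE` of amplitudes, which is not measured from a real piano. The component frequencies are taken to be whole-number multiples of 440, as for an idealized string.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
NUM_SAMPLES = 800   # 0.1 s, so every frequency below completes whole cycles

# Hypothetical recipe: frequency in Hz -> amplitude of that sine wave.
RECIPE = {440: 1.0, 880: 0.5, 1320: 0.25}

def synthesize(recipe: dict[int, float]) -> list[float]:
    """Add the sine waves of the recipe together, sample by sample."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in recipe.items())
        for n in range(NUM_SAMPLES)
    ]

def amplitude_at(samples: list[float], frequency_hz: float) -> float:
    """Amplitude of one sine component (a single term of a discrete Fourier transform)."""
    total = sum(
        x * cmath.exp(-2j * math.pi * frequency_hz * n / SAMPLE_RATE)
        for n, x in enumerate(samples)
    )
    return 2 * abs(total) / len(samples)

wave = synthesize(RECIPE)
for frequency in RECIPE:
    print(frequency, round(amplitude_at(wave, frequency), 3))
```

The loop should recover the amplitudes that went into the recipe, with 440 as the strongest component. A real recording would need a proper FFT and windowing, which this sketch leaves out.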

To reiterate, the sound produced by the A-440 piano key will have a 440Hz sine-wave vibration component at high loudness, but will also contain sine-waves at whole-number multiples of 440Hz, such as 880Hz and 1320Hz, as overtones at lesser loudness. The effect on human perception is that the resultant sound is rich, full and “rounded”.

Also, since the sound of A-440 includes the 1320Hz sine-wave, which is exactly double 660Hz, it should not surprise the reader that playing both piano keys A-440 and E-660 sounds pleasing. The E, transposed up by an octave, was already contained in the A as a harmonic overtone, and 1320Hz is also the first overtone of E-660 itself.

Overtones are where music and physics meet. The following StackExchange thread gives a physics explanation for harmonic overtones (it has to do with the principle of resonance).

While acoustic and string musical instruments always produce sound with overtones, electronic speakers are able to produce pure sine-wave vibrations. In fact, the electronics driving a speaker must be told exactly which wave pattern to produce. While this gives more control, it does require extra work to produce a nice sound.

For instance, cheap musical toys for children often play melodies of famous composers. But, the way the melodies are played sounds squeaky and flat. This happens because the sounds of the melody are not played with all of the natural overtones.

Electronic piano keyboards simulate the overtones, using Sound Fonts. These fonts are produced by recording a real acoustic piano per key and per velocity of attack (speed of pressing and release). The recorded sounds are combined using interpolation, to give the approximate output for any key combination. Generating these sounds happens computationally at “runtime” (immediately after a person presses a key), which limits the number of simultaneous keys that an electronic piano can play.

The same reasoning about overtones applies to singers. Why is it that one singer has a nicer voice than another? That’s because when he sings the note “A” at 440Hz, his sound also has overtones at other frequencies that make it sound full and rounded. Another singer may have a different combination of overtones, which is not as pleasant.

Differences in the way various musical instruments sound can be explained in the same fashion. The characteristic sound of a saxophone, a violin or a trumpet is instantly recognizable. Even if the same note A-440 is played on each of them, they sound very different. This musical characteristic has a name: timbre.

Timbre even differs within the same family of instruments. That is because a musical instrument can only produce a rich sound in a limited pitch range. Instruments like the tenor sax, tuba, and contrabass play low-pitched sounds with a rich timbre. Instruments like the alto sax, violin, flute, and piccolo play high-pitched sounds well. For instance, a high-pitched sound of A-880 will sound better on an alto sax than on a tenor sax. Likewise, it will sound better on a violin than on a contrabass.

Before I end this article, here are two short video clips that demonstrate overtones:

Video 1
Video 2

I hope that with this introduction, the reader has an intuitive understanding of the nature of sound. In the next article, I talk about more pitch intervals, and how they are combined into a musical scale. I will also explain the peculiar order of black and white keys on the piano keyboard.
