# Problems with Ancient Musical Scales

*One-sentence summary: I explain why an exponential scale was invented, and how it compares with the older scales based on ratios of integers.*

This article continues the previous article on the mathematical nature of musical scales. In that article we used ratios to come up with a division of an octave. We gave Latin names to musical intervals (or ratios of frequencies): octave as 2/1, tertia as 5/4, quinta as 3/2, quarta as 4/3, tone as 9/8, and semitone as 16/15.

In American English the tone, tertia, quarta and quinta intervals are familiarly known as the second, third, fourth and fifth notes of a basic scale. However, I will use the Latin names for reasons described in the previous article. (In Russian, these are more familiarly known as секунда, терция, кварта, квинта.)

When combining intervals we “add” them, but in terms of frequency ratios we multiply them. That’s because the human ear perceives frequencies on a logarithmic scale. Squaring a frequency ratio feels like doubling a musical interval. Narrowing a musical interval corresponds to multiplying its frequency ratio by a number less than 1.

For instance, a two-octave interval has a frequency ratio of 2/1 * 2/1 or 4/1. And a seven-octave interval has a frequency ratio of (2/1)⁷ or 128/1.

(The logarithmic scale converts multiplication to addition. Before calculators were invented, men used a logarithmic slide rule to compute fast multiplications by hand.)
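Here is a minimal Python sketch of this idea. The variable names and the helper `size_in_octaves` are my own choices for illustration; the only facts used are the ratios given above.

```python
import math
from fractions import Fraction

octave = Fraction(2, 1)
quinta = Fraction(3, 2)
quarta = Fraction(4, 3)

# "Adding" a quinta and a quarta means multiplying their frequency ratios.
combined = quinta * quarta            # exactly 2/1, an octave

def size_in_octaves(ratio):
    """Size of an interval on the logarithmic scale, measured in octaves."""
    return math.log2(float(ratio))

# On the logarithmic scale the same combination is an ordinary sum.
log_sum = size_in_octaves(quinta) + size_in_octaves(quarta)

# Seven stacked octaves: multiply 2/1 by itself seven times.
seven_octaves = octave ** 7           # exactly 128/1
```

The product of the ratios is an octave, and the sum of the logarithms is one octave as well, which is the slide rule trick in miniature.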

All approaches to tuning based on ratios are called *just intonation* tunings. The term *just tuning* may also refer specifically to the Ptolemaic tuning. The idea behind this name is that the musical intervals are most pleasing to listen to.

However, tuning an instrument based on ratios creates inconsistencies, because there is no single unit from which all the intervals are made up. As an analogy, take the case of mixing inches with centimeters. These units of length are incongruent. (On the other hand, inches are congruent with feet, and centimeters are congruent with meters.)

With the just tuning, for instance, a note 12 quintas away from a certain base note is dissonant with a note 7 octaves away from the same base note. The ratio between them is called the “comma of Pythagoras” and it can be calculated like this: (3/2)¹² ÷ (2/1)⁷ = 3¹²/2¹⁹ = 531441/524288 ≈ 1.0136.
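As a sketch, the same ratio can be computed with exact fractions. Using Python's `fractions` module is my choice here; the variable names are only illustrative.

```python
from fractions import Fraction

# Twelve quintas stacked on the base note.
twelve_quintas = Fraction(3, 2) ** 12
# Seven octaves stacked on the same base note.
seven_octaves = Fraction(2, 1) ** 7

# The ratio between the two resulting notes is the comma of Pythagoras.
pythagorean_comma = twelve_quintas / seven_octaves

print(pythagorean_comma)          # exact fraction
print(float(pythagorean_comma))   # decimal approximation
```

The exact arithmetic avoids any rounding, so the numerator and denominator can be read off directly.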

We can abuse the concept of “overtones” to say that the comma of Pythagoras is a distant 531,441st overtone of the base note, translated 19 octaves down (because 524,288 = 2¹⁹). Clearly, such a note is not going to sound good together with the base note. This is the reason that this highly dissonant interval is called a “comma.”

The word “comma” came from Greek κόμμα which means “an act of cutting.” The idea behind this name is that a dissonant sound is “cutting the ear.” Although such “comma” intervals are a theoretical possibility, could one simply avoid such intervals in music? After all, why would a musical piece have a seven-octave range? It turns out that the commas are more prevalent than that and are hard to avoid.

The music theorist Philolaus of Tarentum, of the 5th century BCE, noticed that two hemitones and a comma of Pythagoras combine to a tone interval of 9/8. (A hemitone is an interval with a frequency ratio of 256/243, and it is Pythagoras’ choice for a “half” of a tone.)

Philolaus’ observation shows that if F# and G♭ are tuned according to the Pythagorean scale, then they would clash in the Pythagorean comma dissonance. (The # and ♭ notation here means raising and lowering by a hemitone.) How likely is it to happen? Some instruments are tuned with “black” keys as sharps of white keys, while other instruments are tuned with “black” keys as flats of white keys, so such instruments would clash in a dissonance when played together.
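The clash can be checked with a few lines of Python. This is only a sketch: I take F as the reference pitch with ratio 1, and I follow the convention stated above that # and ♭ raise and lower a note by a hemitone. The variable names are mine.

```python
from fractions import Fraction

tone = Fraction(9, 8)
hemitone = Fraction(256, 243)

f = Fraction(1)             # reference pitch (assumption: F has ratio 1)
g = f * tone                # G lies a tone above F
f_sharp = f * hemitone      # F raised by a hemitone
g_flat = g / hemitone       # G lowered by a hemitone

# Interval between the two candidate tunings of the same black key.
clash = g_flat / f_sharp

# The same fact stated the way it was first observed:
# two hemitones and the clash interval make up a whole tone.
rebuilt_tone = hemitone * hemitone * clash
```

The value of `clash` is the comma of Pythagoras, 531441/524288, and `rebuilt_tone` comes back to 9/8.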

There are other notable commas that occur in just intonation tunings. The Ptolemaic comma occurs in the Ptolemaic tuning when the notes C, G, D, A, E, C are played in this ascending and descending sequence, known as a “comma drift.”

The frequency ratios between the notes are 3/2, 3/4, 3/2, 3/4, 4/5. (For the ascending direction the ratios are greater than 1, and for the descending direction the ratios are less than 1.) Multiplied together they give the ratio of 81/80, the Ptolemaic comma. The sequence is known as comma drift, because the original base note (here C) drifts to a rogue “base” note that is dissonant with the first. This comma is also known as the Syntonic comma.
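The drift can be traced step by step. The following Python sketch multiplies the ratios listed above, starting from C with ratio 1; the list names are my own.

```python
from fractions import Fraction
from itertools import accumulate
import operator

# Ratios between consecutive notes of the sequence C, G, D, A, E, C.
steps = [Fraction(3, 2), Fraction(3, 4), Fraction(3, 2),
         Fraction(3, 4), Fraction(4, 5)]
names = ["C", "G", "D", "A", "E", "C (drifted)"]

# Running product: the pitch of each note relative to the starting C.
pitches = list(accumulate(steps, operator.mul, initial=Fraction(1)))

for name, pitch in zip(names, pitches):
    print(name, pitch)

drift = pitches[-1]   # how far the final C has moved from the first C
```

The last pitch in the list is 81/80 rather than 1, so the sequence does not return to the note it started from.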

Another dissonance occurs when the Ptolemaic comma clashes with the Pythagorean comma. Suppose that one instrument plays G♭ instead of F#, resulting in the Pythagorean comma 531441/524288, and another instrument plays a drifted F# resulting in the Ptolemaic comma 81/80. What’s the interval between the two notes? The interval is called the “schisma”: (531441/524288) ÷ (81/80) = 32805/32768 ≈ 1.0011.
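As a sketch, the schisma is the quotient of the two commas. The exact fraction below is computed by Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

pythagorean_comma = Fraction(531441, 524288)
ptolemaic_comma = Fraction(81, 80)

# The interval left over when one comma is measured against the other.
schisma = pythagorean_comma / ptolemaic_comma

print(schisma)          # exact fraction, reduced to lowest terms
print(float(schisma))   # decimal approximation
```

The result reduces to 32805/32768, a much smaller interval than either comma.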

The syntonic comma 81/80 is also the interval between the Pythagorean tertia and the Ptolemaic tertia. From the Wikipedia article on the syntonic comma:

The Pythagorean major third (81:64) and minor third (32:27) were dissonant, and this prevented musicians from using triads and chords, forcing them for centuries to write music with relatively simple texture. In late Middle Ages, musicians realized that by slightly tempering the pitch of some notes, the Pythagorean thirds could be made consonant. For instance, if the frequency of E is decreased by a syntonic comma (81:80), C-E (a major third), and E-G (a minor third) become just.

Thus we can see that the syntonic comma is important. The German physicist Helmholtz of the 19th century developed a musical notation that indicates by how many syntonic commas a Pythagorean-tuned note should be raised or lowered.

The medieval Pythagorean scale was derived by repeating quintas (multiplying 3/2 by itself) and transposing down by octaves (dividing by 2). In this way, the frequency 81/64 for a tertia was derived as (3/2)⁴ divided by 4. How could we modify this scheme in order to get the more ear-pleasing 5/4 ratio for the tertia? If we decrease the quinta by a quarter of the syntonic comma 81/80, then a total of four quintas will contribute an overall decrease by a full syntonic comma. In numbers, (3/2 ÷ ⁴√(81/80))⁴ ÷ 4 = (81/16 × 80/81) ÷ 4 = 5/4.

This exact approach was taken by musicians of the 16th century, and the result was the Quarter-comma Meantone scale. They sacrificed the perfect sounding quinta (the “perfect fifth” in American English), to get a better sounding tertia (“major third”). This scale was prevalent in the 16th and 17th centuries.

If we multiply both sides of the above equation by 4, and then take a fourth root, we have a simple and revealing expression for the meantone quinta as the fourth root of 5: meantone quinta = ⁴√5 ≈ 1.4953.

Thus, here we see the first example of an interval that is not tuned to a ratio of integers, but is, in fact, an “irrational” number. More on that later.
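The quarter-comma construction is easy to verify numerically. The following Python sketch uses floating point numbers, because the meantone quinta is no longer a ratio of integers; the variable names are my own.

```python
just_quinta = 3 / 2
syntonic_comma = 81 / 80

# Narrow the quinta by a quarter of the syntonic comma.
# A "quarter" of an interval is the fourth root of its ratio.
meantone_quinta = just_quinta / syntonic_comma ** 0.25

# Four meantone quintas, transposed two octaves down, give the tertia.
meantone_tertia = meantone_quinta ** 4 / 4

# The same quinta expressed directly as the fourth root of 5.
fourth_root_of_5 = 5 ** 0.25
```

Up to rounding error, `meantone_tertia` equals 5/4 and `meantone_quinta` equals `fourth_root_of_5`.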

A limitation of scales based on ratios is that they do not permit *modulation*. Modulation refers to changing to a new octave interval, one that starts from another base frequency. Mozart and Bach used modulation extensively. It is prevalent in later music, including jazz and pop. Here is a quote from the 18th-century French composer Charles-Henri Blainville:

Modulation is the essential part of the art. Without it there is little music, for a piece derives its true beauty not from the large number of fixed modes which it embraces but rather from the subtle fabric of its modulation.

For example, one may wish to begin a song in the A-to-A octave which has the base frequency of 440 Hz, and then switch to the D-to-D octave which has the base frequency of 440*4/3 or 586.66… Hz. The base note (or the base frequency) is called a *tonic* or *key*. Thus, a song may start in the key of A, and then modulate into the key of D. (Do not confuse “key” used in the sense of “tonic” with a physical piano key that one presses.)

In just intonation tunings, in order to modulate in the middle of a performance to a new tonic, one would have to switch, in some cases, to another instrument that is tuned to that tonic. Why is that? In the Pythagorean tuning of (tone, tone, hemitone, tone, tone, tone, hemitone) sometimes a black key would have to be tuned to be a tone away from a previous note, or a semitone away from the previous note. Once you choose a tuning for the black keys, some tonics can’t be used. For example, if F# is tuned as a tone ahead of E (this is needed when D is the tonic), yet F must be ahead of E by a semitone when C is the tonic, then F# can never be a semitone above F. But the latter is needed if C# is to be the tonic. Thus, if it is possible to modulate to the key of D, then it is no longer possible to modulate to the key of C#.

A similar issue occurs in the Ptolemaic scale. Starting from the tonic, the scale is (tone, minor tone, semitone, tone, minor tone, tone, semitone) or in numbers (9/8, 10/9, 16/15, 9/8, 10/9, 9/8, 16/15). Let’s say that I tuned the piano keys to match this scale with A as tonic. Now, suppose I pretend that B is the new tonic because I want to raise the whole music by a tone in pitch. The note (C#) following B must be an interval of a tone relative to B. Is it? No, the next interval is the Ptolemaic minor tone, which is 10/9.
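The mismatch can be made concrete with a short Python sketch. I assume the piano is tuned to the Ptolemaic scale with A as tonic, as in the paragraph above, and I name the notes of that scale A, B, C#, D, E, F#, G#. The variable names are mine.

```python
from fractions import Fraction
from itertools import accumulate
import operator

# Step sizes of the Ptolemaic scale, starting from the tonic.
steps = [Fraction(9, 8), Fraction(10, 9), Fraction(16, 15), Fraction(9, 8),
         Fraction(10, 9), Fraction(9, 8), Fraction(16, 15)]
names = ["A", "B", "C#", "D", "E", "F#", "G#", "A'"]

# Pitch of every note relative to the tonic A.
pitches = list(accumulate(steps, operator.mul, initial=Fraction(1)))
tuned = dict(zip(names, pitches))

# With B as the new tonic, the first step of the scale should be a tone (9/8).
needed = steps[0]
available = tuned["C#"] / tuned["B"]

# How far the available interval falls short of the needed one.
mismatch = needed / available
```

The available step is 10/9 rather than 9/8, and the mismatch between them comes out as 81/80, the same comma seen earlier.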

The Ptolemaic scale is less forgiving than the Pythagorean scale when it comes to modulation, because it has tones of different sizes (9/8 and 10/9). This observation shows that it may be possible to develop a tuning that is more forgiving for modulations, allowing us to maximize the number of different tonics that can be used without retuning the instrument. Such a scale was developed in the 17th century, and it is called the Well Tempered scale.

From the Wikipedia article on the Well Tempered scale:

As the term was used in the 17th century, “Well tempered” meant that the twelve notes per octave of the standard keyboard were tuned in such a way that it was possible to play music in all major or minor keys that were commonly in use, and it would not sound perceptibly out of tune (Duffin 2007, 37).

The only way modulation could work unrestricted is if all intervals were composed from one repeated small interval. However, this is impossible using ratios. The Pythagorean philosopher Archytas of Tarentum, of the 4th century BCE, showed that there is no solution with “a” and “b” as integers to the equation (a/b)ᵏ = (n+1)/n, for any whole number k greater than 1.

The expression on the right side of the equation is 2/1 when n is equal to 1. Archytas’ result shows that it is impossible to express 2/1 as a repeated product of a smaller ratio.

(A number expressed as a ratio of two integers is called “rational” in mathematics. Conversely, if a number can’t be expressed in such way, it is called “irrational.” These terms are not to be confused with the meaning of “rational” and “irrational” in philosophy.)

The Well Tempered scale was the answer to the problems with the Quarter-comma Meantone scale described above. To permit modulation, the Meantone scale defined frequencies for all white and black keys of the piano, of which there are 12. The idea behind the name “mean-tone” is that all the tones are about the same average size. The value for each black key was derived by combinations of narrowed meantone quintas and octaves. If we take D as the tonic, then the black keys are found by moving by quintas in both directions, like this: A♭ ← E♭ ← B♭ ← F ← C ← G ← D → A → E → B → F# → C# → G#.

For example, take D as the tonic and let Y be the fourth root of 5. (Recall that this is the frequency ratio of the meantone quinta.) Then the black key E♭ that follows D in the same octave is tuned to 8/Y⁵. The black key C# that precedes D is tuned to Y⁵/8. We expect the interval between two adjacent black keys to be a tone interval. Let’s check: the ratio between them is 64/Y¹⁰. The denominator Y¹⁰ is 5 to the power of 2.5, which is approximately 55.9. Thus, the interval has a frequency ratio of approximately 640/559. Is this close to the expected “just” tone interval of 9/8? The corresponding decimal expansions are 1.1449… and 1.1250.

As you can see, the interval between two adjacent black keys is not 9/8. Is it at least, the same as the interval between two white keys in this scale? No. The scale sets E to the frequency of Y²/2 relative to the tonic D. That simplifies to a half of square root of 5, or 1.1180… in decimal. As you can see, this is a different kind of tone.
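These numbers can be reproduced with a few lines of Python. This is a sketch using floating point arithmetic, with D as the tonic at ratio 1; the variable names are my own.

```python
Y = 5 ** 0.25                 # the meantone quinta

# Five quintas down from D, then three octaves up.
e_flat = 8 / Y ** 5
# Five quintas up from D, then three octaves down.
c_sharp = Y ** 5 / 8

# Interval between the two black keys on either side of D.
black_key_tone = e_flat / c_sharp     # equals 64 / Y**10

# Two quintas up from D, then one octave down, gives E.
white_key_tone = Y ** 2 / 2           # half of the square root of 5

just_tone = 9 / 8

for label, value in [("black-key tone", black_key_tone),
                     ("white-key tone", white_key_tone),
                     ("just tone", just_tone)]:
    print(f"{label}: {value:.4f}")
```

Three different sizes of “tone” appear: roughly 1.1449 between the black keys, 1.1180 between the white keys D and E, and 1.1250 for the just tone.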

Things get more out of hand if we try to decide what to do with A♭ and G#. Are they the same note? They are computed to have different frequencies, but the computation put them several octaves apart. How should they be tuned when they are in the same octave? If we transpose the lower note up by octaves, and the higher note down by octaves, they will meet in the octave interval of the D tonic. (Curiously, the down-transposed G# is lower than the up-transposed A♭.) In this case, the ratio of the higher frequency to the lower frequency would be 128/125, which is perceived as out-of-tune because it is less than half of a semitone. (Western music doesn’t have intervals less than a semitone, but such intervals are found in Eastern music.)

If we get rid of G#, and just keep A♭, transposing it up by octaves as needed to replace the G#s, then we have a new problem. The new G# is now further from the notes below it, and particularly it is further away from C#. Before the change, the interval between C# and G# was the meantone quinta, but now it is neither the perfect quinta nor the quarter-comma reduced meantone quinta. It is yet another kind of quinta. This new interval is called the “wolf quinta” (Russian: волчья квинта) or “wolf fifth” in American English terminology. It has this name because it causes distinctive beats, which sound like a wolf howling.
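Here is a Python sketch of the two computations just described, again with D as the tonic at ratio 1. The octave transpositions (three octaves down for G#, four octaves up for A♭) are my working assumption for bringing both notes into the octave above D; the helper `cents` converts a ratio to cents with the usual formula of 1200 times the base-2 logarithm.

```python
import math

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

Y = 5 ** 0.25                  # the meantone quinta

g_sharp = Y ** 6 / 2 ** 3      # six quintas up from D, three octaves down
a_flat = 2 ** 4 / Y ** 6       # six quintas down from D, four octaves up
c_sharp = Y ** 5 / 8           # five quintas up from D, three octaves down

# Gap between the two candidates for the same black key.
gap = a_flat / g_sharp         # 128/125

# Regular meantone quinta versus the quinta that results
# when A-flat is used in place of G-sharp.
regular_quinta = g_sharp / c_sharp
wolf_quinta = a_flat / c_sharp

print(f"gap: {cents(gap):.2f} cents")
print(f"regular quinta: {cents(regular_quinta):.2f} cents")
print(f"wolf quinta: {cents(wolf_quinta):.2f} cents")
```

The wolf quinta exceeds the regular meantone quinta by exactly the 128/125 gap, which is why it stands out so sharply from every other quinta on the instrument.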

In summary, the Meantone Quarter-comma scale would have a wolf quinta somewhere, and that place depends on the choice of the center note to which the instrument is tuned. This disqualified modulating into many other tonics on this instrument. (In our example D was the center note.) So, the musician couldn’t modulate to other tonics which would have to feature the C#-G# combinations in the music. (For example, you couldn’t play the E6/C# chord which has the notes C#-E-G#-B.)

If quintas repeated downward would meet quintas repeated upward (after octave transpositions), then they would create a “Circle of Quintas” (known in American English as the “Circle of Fifths.”) Alas, that doesn’t happen in the Quarter-comma meantone scale, resulting in the wolf interval, as we have seen.

As a response to these difficulties, Andreas Werckmeister, of the 17th century, proposed the creation of a *well tempered* scale that would yield such a closed circle of quintas, while retaining desirable qualities found in other scales. From Wikipedia:

[Werckmeister] described a series of tunings where enharmonic notes had the same pitch: in other words, the same note was used as both (say) E♭ and D♯, thereby “bringing the keyboard into the form of a circle”. This refers to the fact that the notes or keys may be arranged in a circle of fifths and it is possible to modulate from one key to another unrestrictedly.

The term “well tempered” is an umbrella term for various tunings that have possibly non-equal intervals, with the goal of maximizing the ability to modulate between tonics. Furthermore, playing in a different tonic resulted in a different combination of deviations from just intervals, which gave a unique texture to playing in that tonic key. Wikipedia gives a good summary of this phenomenon:

The term “well temperament” or “good temperament” usually means some sort of irregular temperament in which the tempered fifths are of different sizes but no key has very impure intervals. Historical irregular temperaments usually have the narrowest fifths between the diatonic notes (“naturals”) producing purer thirds, and wider fifths among the chromatic notes (“sharps and flats”). Each key then has a slightly different intonation, hence different keys have distinct characters.

Such “key-color” was an essential part of much 18th- and 19th-century music and was described in treatises of the period.[my emphasis]

The key-color phenomenon is the reason why the concept of *temperament* was chosen to describe tuning systems. The chosen “temperament” by which to tune a piano, or by which a symphony or orchestra tunes its instruments, determined the distinct texture, a temperament, of a musical composition played. That is, the musical composition may modulate through different keys, yielding a different key-color in each case, but the overall effect is covered by the overall temperament of the tuning.

(Ironically, the word *temperamentum* in Latin means “mixture,” which could be the name for a mixture of various intervals and frequencies. Psychological temperament, however, is a different matter. Wikipedia writes that temperament “broadly refers to consistent individual differences in behavior that are biologically based and are relatively independent of learning, system of values and attitudes.” This is incorrect, because a person’s character, and hence his behavior, are determined by morality and thus by Free Will. However, the biological differences indeed affect the speed at which different people experience emotions. Some people react quickly and the emotion is gone quickly. Others react slower, and the emotion lasts longer. It is speculated that those who react quickly have stronger emotions.)

Several people invented versions of well tempered tunings. As mentioned, Werckmeister developed one of them. And so did Thomas Young, the 18th-century physicist. Another was developed by a student of J.S. Bach, Kirnberger. The physicist Huygens (17th century) wrote a letter “concerning the harmonic cycle” in which he described a 31-fold division of an octave.

However, the most successful “well tempered” scale is the Equal Temperament. Western music saw a dramatic shift when the physicist Simon Stevin invented the Equal Tempered scale at the turn of the 17th century. This scale enabled modulation to any key, and it worked by abandoning “rational” ratios of frequencies altogether.

As mentioned, there are no rational solutions to the equation xᵏ = 2 (for k greater than 1), but the solution is readily available as the *k*-th root of 2. What should “k” be? Should we split the octave into 12 notes (k = 12), or should we split it into 11 or 13 notes?

After all, why do we have 12 notes in the first place? The diatonic scale (the familiar “major scale”) split the octave into only 7 intervals. But more subdivisions are needed to allow modulations into diatonic scales with other base notes. (The subdivisions are made at the black keys of the piano.) With rational frequency ratios it was hard enough to come up with 12 subdivisions, trying to keep them as even and as versatile as possible. However, when using exponentials, we are free to subdivide the octave into any number of equal intervals.

Stevin indeed chose k to be 12, which resulted in the familiar 12 chromatic keys per octave. But was another choice possible? If we take as the guiding principle that the “just” intervals are desirable (9/8 for a tone, 5/4 for the tertia, and 3/2 for the quinta), then we should at least hope that the exponential subdivisions would yield combinations close to these values.

Here’s a diagram showing k-th roots as dots on horizontal lines and just intonation intervals as intersecting verticals. On line 12, the dots are close to the vertical lines. This means that when the number of divisions is 12, the resultant frequencies fall close to those predicted via just intonation schemes. Thus, the 12 semitone Equal Temperament scale, appears to be a good scheme for a division into equal intervals.
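The comparison can also be sketched numerically. The Python code below measures, for each number of divisions k, how far the just intervals lie from the nearest step of the equal division. The choice of target intervals (tone, tertia, quarta, quinta), the range of k from 5 to 18, and the use of the worst case as the score are all my assumptions for illustration; other scoring rules are possible.

```python
import math

# Just intervals that an equal division should approximate.
JUST = {"tone": 9 / 8, "tertia": 5 / 4, "quarta": 4 / 3, "quinta": 3 / 2}

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

def worst_error(k):
    """Largest distance, in cents, from a just interval to the
    nearest step of an octave divided into k equal parts."""
    step = 1200 / k
    errors = []
    for ratio in JUST.values():
        target = cents(ratio)
        nearest = round(target / step) * step
        errors.append(abs(target - nearest))
    return max(errors)

for k in range(5, 19):
    print(k, round(worst_error(k), 2))

best_k = min(range(5, 19), key=worst_error)
```

Within this range and under this scoring rule, k = 12 gives the smallest worst-case error, with the tertia being its weakest interval.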

This scale or tuning is also known under several names and abbreviations: 12-TET, which stands for 12-Tone Equal Temperament, or 12-ET, or 12-EDO for Equally Divided Octave, or simply ET for Equal Temperament, since the 12-fold division is the only one widely used in Western music. The word “tempered” is also used interchangeably with “temperament.”

Dividing by more than 12 results in new unexplored sounds between the just intonation frequencies. The Blues note of 7/4 ratio described in the previous article could be one of such sounds.

Let’s look more closely at the Equal Temperament scale numerically. Its unit interval is called a semitone and has the frequency ratio of the 12th root of 2.

The Equal Tempered semitone has the frequency ratio of 1.0595… in decimal. (We still talk about frequency ratios, it’s just that they can no longer be expressed as ratios of whole numbers.) Contrast it with the Pythagorean hemitone and the Ptolemaic semitone. The Pythagorean hemitone is 256/243 or approximately 1.0535, and a Ptolemaic semitone is 16/15 or approximately 1.0666. Thus, these are all similar musical intervals. How similar are they, and how significant are the differences to the human perception?

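Comparing small differences in musical intervals is convenient using the units of “cents.” This logarithmic unit is the small interval that would arise if we divided an octave into 1200 equal pieces. Numerically, it is equal to the 1200th root of 2, or approximately 1.0005778. The size in cents of an interval with frequency ratio r is therefore 1200 × log₂(r).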

Thus, the Equal Tempered semitone is 100 cents, and the full octave is 1200 cents. The Equal Tempered semitone is narrower than the Ptolemaic semitone by 11.73 cents.

Since a tone is two semitones, it is equal to 200 cents. Likewise, the tertia is 4 semitones, quarta is 5 semitones and the quinta is 7 semitones, and therefore in units of cents they are 400, 500 and 700 cents respectively. Their differences from the Ptolemaic just intervals are: tone -3.91, tertia 13.69, quarta 1.96, quinta -1.96 cents.
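These deviations can be recomputed with a short Python sketch. The helper `cents` uses the standard conversion of 1200 times the base-2 logarithm of the ratio; the dictionary names are my own.

```python
import math

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

# Just ratios and the number of equal-tempered semitones for each interval.
just = {"tone": 9 / 8, "tertia": 5 / 4, "quarta": 4 / 3, "quinta": 3 / 2}
semitones = {"tone": 2, "tertia": 4, "quarta": 5, "quinta": 7}

# Equal-tempered size minus just size, in cents.
deviation = {name: 100 * semitones[name] - cents(just[name]) for name in just}

for name, value in deviation.items():
    print(f"{name}: {value:+.2f} cents")

# The Ptolemaic semitone compared with the equal-tempered one.
semitone_gap = cents(16 / 15) - 100
```

A positive deviation means the equal-tempered interval is wider than the just one, and a negative deviation means it is narrower.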

How bad are these deviations? Can human perception distinguish such differences as the 13.69 cents in the tertia? For comparison, the syntonic comma of 81/80 is about 21.51 cents, and the Pythagorean comma is 23.46 cents. The schisma is only 1.9537 cents. The quarter-comma meantone wolf quinta is about 35.68 cents wider than the just quinta.

Normal adults can reliably distinguish a pitch difference above 25 cents, thus they would perceive a 25 cent difference as out of tune. On the other hand, vibrato deviates from a mean frequency by an average of 60 cents, and yet it sounds pleasing. Experiments showed that the perception also depends on the timbre of the instrument.

Widogast Iring, at the turn of the 20th century, defined a unit now known as the *centitone*, which equals two cents. What was his motivation? He noticed that the schisma is about two cents (1.95…). He also noticed that if he divided the Pythagorean comma by 12, he got a similar number which he too approximated as two cents.
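Both of these near-coincidences are easy to check. The following Python sketch computes the two quantities in cents; the helper `cents` and the variable names are my own.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))

pythagorean_comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7
syntonic_comma = Fraction(81, 80)
schisma = pythagorean_comma / syntonic_comma

schisma_cents = cents(schisma)
twelfth_of_comma = cents(pythagorean_comma) / 12

print(f"schisma: {schisma_cents:.4f} cents")
print(f"Pythagorean comma / 12: {twelfth_of_comma:.4f} cents")
```

The two values agree to about a thousandth of a cent, and both round to two cents.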

How are the commas relevant to the Equal Temperament (ET) scale? We can think of the equal temperament scale as an extension of the idea first seen in the Quarter-comma meantone tuning. There we have distributed the Ptolemaic comma over the quinta intervals. In the Equal Temperament (ET) tuning, we have instead distributed the *Pythagorean* comma into the 12 chromatic semitone intervals.

(The Ptolemaic comma interval plus the schisma interval equals the Pythagorean comma interval. Thus, the ET scale distributes the Ptolemaic comma into 11 semitones, and the schisma onto the remaining one.)

Thus, because the ET scale distributes the Pythagorean comma into 12 notes, and the 12-th part of this comma is a centitone, Iring thought that the centitone would be a more natural unit than the cent.

Here’s a visual of the difference between the Equal Temperament scale and Ptolemy’s just intonation scale:

Note that Ptolemy’s scale does not specify tunings for chromatic notes (black keys on the piano with C as tonic). It only specifies tunings for a diatonic division of the octave into (tone, minor tone, semitone, tone, minor tone, tone, semitone). In the diagram above the positions of just notes are the right edges of these areas: (M2, M3, P4, P5, m6, M6, M7). The other positions are what the black keys could be tuned to.

Notice that both scales align closely on the quarta (right edge of the P4 area) and on the quinta (right edge of the P5 area). On the major tertia (the right edge of M3) they are off: just intonation scale has it as a smaller interval.

Furthermore, we can find the minor tertia in the diagram. The minor tertia is formed between the right edge of M3, and the right edge of P5. It is easy to see that this interval is bigger in just intonation.

Another way to say it is that the equal temperament scale has sharp major tertias and flat minor tertias. Its sixth and septima are also sharp.

A piano tuned to the Equal Temperament tuning can modulate in all keys. Although all of the notes are a little off relative to the “just” intervals, the music is pleasing enough to human perception.

Yet, when musicians are performing without piano accompaniment, then they naturally switch to the just intonation tuning.

The following Music StackExchange thread has some singers discussing the practice of switching back and forth between equal temperament and just intonation scales.

One of the comments in that thread states:

A study done by Lottermoser and Meyer in 1960 found that professional choirs tend to sing the major thirds sharp and the minor thirds flat, thus veering toward Pythagorean [tuning].

Another commenter states that musicians can hear when they hit the right, just frequency. The note begins to “ring” beautifully in the ear.

A human eardrum vibrates, and this vibration produces its own overtones. These personal overtones may clash with notes and overtones coming from the piano. An absence of the clashes allows a stronger *resonance*, which gives a perception of ringing.

In summary, the Equal Tempered scale solved a technical problem of how to tune a physical piano, because the physical piano can’t be re-tuned mid-performance. However, an electronic piano can indeed be instantly re-tuned. With a press of a button, or as a response to the musician’s playing of dissonant intervals, a computer program can reconfigure the frequencies emitted by the electronic piano. In other words, it can detect modulations and switch to a diatonic scale based on another tonic.
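To make the idea of re-tuning concrete, here is a minimal Python sketch of the frequency tables such a program might switch between. It does not detect modulations; it only shows how the note frequencies would be recomputed once a new tonic is chosen. The function names `just_scale` and `equal_tempered_scale` are hypothetical, and the degrees of the just scale are the Ptolemaic ratios used earlier in this article.

```python
from fractions import Fraction

# Ptolemaic diatonic scale: pitch of each degree relative to the tonic.
PTOLEMAIC_DEGREES = [Fraction(1), Fraction(9, 8), Fraction(5, 4),
                     Fraction(4, 3), Fraction(3, 2), Fraction(5, 3),
                     Fraction(15, 8), Fraction(2)]

# The same diatonic scale counted in equal-tempered semitones.
MAJOR_SCALE_SEMITONES = [0, 2, 4, 5, 7, 9, 11, 12]

def just_scale(tonic_hz):
    """Frequencies of the just diatonic scale built on the given tonic."""
    return [float(tonic_hz * degree) for degree in PTOLEMAIC_DEGREES]

def equal_tempered_scale(tonic_hz):
    """Frequencies of the equal-tempered diatonic scale on the given tonic."""
    return [tonic_hz * 2 ** (s / 12) for s in MAJOR_SCALE_SEMITONES]

# Start in the key of A, then "re-tune" to the key of D,
# which lies a quarta above A.
scale_in_a = just_scale(440)
scale_in_d = just_scale(scale_in_a[3])

tempered_in_a = equal_tempered_scale(440)
```

With a just table rebuilt for every new tonic, each key keeps its pure intervals, which is exactly what a fixed acoustic tuning cannot offer.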

Some electronic pianos already have the ability to manually switch between tunings. One of them is the Yamaha Harmony Director HD-200. In this YouTube video, it is used to show the difference between 12-TET and just intonation scales. The C-major chord that is played is made of a major tertia (C-E) and a minor tertia (E-G).

Professional musicians switch between tunings dynamically to give the best sound effect. In the following video a violin soloist is advised to stay in the Pythagorean tuning, despite the Equal Temperament of the piano. On the other hand, a string quartet is advised to play in Ptolemy’s just intonation. This creates a clash between the soloist and the supporting string quartet. The clash must be resolved by a case-by-case approach.

Another solution to the problem of tuning is to adjust frequencies not in real time, but in recorded music. A program called Auto-Tune adjusts the frequencies of notes sung by singers to align with the equal temperament tuning of the accompaniment. This program was so successful that “auto-tune” became a new term in the musical lexicon.

The same principle of re-tuning can be done to convert a music recording into Ptolemy’s just intonation scale, or into the Pythagorean scale.

Let’s summarize. We have seen problems with just intonation scales with respect to dissonances and modulations. While the equal temperament tuning is a clear winner for the acoustic piano, the just intonation tunings are still used by other musicians. Electronic pianos and post processing music software can further improve music to have better pitches.

Since the Equal Temperament is not always used, you can still hear “key-color” in music performances. The same musical composition played note-by-note may sound significantly different when played in different keys, in different tunings. (Tempo, timbre and volume dynamics also contribute greatly to the overall differences.)

Furthermore, divisions of an octave into more than 12 equal tones may lead to new musical genres. Such music exists and it is called “microtonal.” However, it is still waiting for its Mozart and Bach.

The Equal Tempered tuning also removes the tuning complexities when exploring modulation and chord progressions. I will take this up in the next article of this series.

In closing, here is a MIDI rendering of Pachelbel’s Canon in the three tunings discussed: just, mean-tone and equal-tempered. Can you hear the difference?