Place your hand on your neck and say “zzz.” Feel the vibration? Now say “sss,” and you’ll notice none. Do note: this won’t work if you’re whispering. If you speak Standard Canadian English, which we will use for the duration of this article, these two sounds differ primarily by whether or not your vocal folds vibrate. The sounds accompanied by vibration are called “voiced.”
This isn’t the only pair of sounds that differ like this, either. The voiced “d” in “dock” and the voiceless “t” in “talk,” or the voiced “v” in “vowel” and the voiceless “f” in “foul” work similarly. Try it out!
In spoken languages, which are more common than signed languages, we send linguistic signals to each other through the air as sound waves. There are many minute acoustic and physical phenomena that we perceive as categorically different sounds.
Our vocal folds, or vocal cords, are found in our throats, specifically in the larynx, or “voice box.” They are two folds of tissue separated by an opening called the glottis. You can feel the glottis in action when you say “uh-oh”; most speakers produce the break between “uh” and “oh” by briefly closing the glottis.
In “uh-oh,” we only close and open the glottis once. When we vibrate our vocal folds while articulating a sound — like the vast majority of vowels or the “n” or the “s” in “nose” — the glottis opens and closes extremely quickly, producing between 85 and 255 vibrations per second during regular speech. This is called “voicing.”
Of course, it’s rather difficult to deliberately vibrate our vocal folds that quickly. Even our fastest muscle twitches take approximately 50 milliseconds on average, far too slow to drive each individual vibration. Instead, voicing relies on a fundamental rule of fluid dynamics that you may be familiar with: Bernoulli’s principle.
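The mismatch between those two timescales can be checked with a quick back-of-the-envelope calculation. The Python sketch below uses only the figures quoted above (85 to 255 vibrations per second, and roughly 50 milliseconds per twitch). The names are illustrative, and treating one twitch as one open-and-close cycle is a simplifying assumption.

```python
# Back-of-the-envelope comparison using the figures quoted in the text.
# Assumption: one muscle twitch could drive at most one open-close cycle.

def cycle_duration_ms(vibrations_per_second: float) -> float:
    """Duration of a single open-close cycle, in milliseconds."""
    return 1000.0 / vibrations_per_second

TWITCH_MS = 50.0  # approximate fastest muscle twitch, from the text

slow_cycle = cycle_duration_ms(85)    # longest cycle in regular speech
fast_cycle = cycle_duration_ms(255)   # shortest cycle in regular speech
max_twitch_rate = 1000.0 / TWITCH_MS  # cycles per second if driven by twitches

print(f"Cycle at 85 per second:  {slow_cycle:.1f} ms")
print(f"Cycle at 255 per second: {fast_cycle:.1f} ms")
print(f"Twitch-limited rate:     {max_twitch_rate:.0f} per second")
```

Under these assumptions, even the slowest voiced cycle lasts about 12 milliseconds, roughly a quarter of a single twitch, so the vibration cannot be driven one twitch at a time.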
Bernoulli’s principle states that faster moving fluids — like air — exert lower pressures than slower moving fluids. The airflow at the narrowest point of the vocal tract, the glottis, will be faster than at the rest of the vocal tract. If you push air through the glottis at a fast enough speed, the air pressure at the glottis becomes much lower than the surrounding air pressure.
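As a rough illustration of the principle, the sketch below applies the textbook form of Bernoulli’s equation for steady, incompressible, frictionless flow, together with the continuity equation, to air passing from a wide tube into a narrow gap. The areas, airspeed, and air density are made-up illustrative values, not measurements of a real vocal tract, and real airflow through the glottis is far messier than this idealization.

```python
# Idealized Bernoulli sketch: steady, incompressible, frictionless flow.
# All numeric values are illustrative assumptions, not vocal-tract data.

AIR_DENSITY = 1.2  # kg/m^3, approximate density of air at room temperature

def speed_in_constriction(wide_area, narrow_area, wide_speed):
    """Continuity: the same volume per second passes both cross-sections."""
    return wide_speed * wide_area / narrow_area

def pressure_drop(wide_speed, narrow_speed, density=AIR_DENSITY):
    """Bernoulli: p_wide - p_narrow = 0.5 * density * (v_narrow^2 - v_wide^2)."""
    return 0.5 * density * (narrow_speed**2 - wide_speed**2)

wide_area = 2.0e-4    # m^2, hypothetical wide section
narrow_area = 1.0e-5  # m^2, hypothetical narrow gap
wide_speed = 1.0      # m/s, hypothetical airspeed in the wide section

narrow_speed = speed_in_constriction(wide_area, narrow_area, wide_speed)
drop = pressure_drop(wide_speed, narrow_speed)

print(f"Airspeed in the gap: {narrow_speed:.1f} m/s")
print(f"Pressure in the gap is lower by about {drop:.0f} Pa")
```

Narrowing the hypothetical gap further raises the airspeed and deepens the pressure drop, which is the effect the principle describes.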
This pressure differential, the difference between the two pressures, produces a pressure-gradient force, which acts from the higher-pressure area toward the lower-pressure area. It is similar to the feeling of being “sucked in” when a large truck passes you by. This pressure-gradient force draws the vocal folds closed. Once they close, the air above the vocal folds comes into equilibrium with the surrounding air, and pressure builds up behind the vocal folds until they are pushed open again, repeating the process. As you might imagine, each cycle takes only a minuscule amount of time, allowing for the rapid vibration characteristic of voicing.
The consequence of this process is that voiced sounds are, as a rule, louder than unvoiced sounds. Above a certain volumetric flow rate, that is, the volume of air passing through the glottis per unit of time, the effects of Bernoulli’s principle and the resultant pressure-gradient forces are unavoidable. The loudness of a sound, in turn, depends on a property called sound intensity.
Sound waves are essentially rapid variations of air pressure between high and low pressure. The magnitude of the pressure differential is called the pressure amplitude, and sound intensity, which we hear as loudness, is proportional to the square of the pressure amplitude. All that is to say, as the pressure differential increases, a sound sounds louder.
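To make the square relationship concrete, here is a small Python sketch of how intensity scales with pressure amplitude. The function names are illustrative. The decibel conversion is the standard 10 times the base-10 logarithm of an intensity ratio; it is added here for illustration and is not part of the explanation above.

```python
import math

def relative_intensity(amplitude_ratio: float) -> float:
    """Intensity ratio for a given pressure-amplitude ratio.

    Intensity is proportional to the square of the pressure amplitude.
    """
    return amplitude_ratio ** 2

def level_change_db(amplitude_ratio: float) -> float:
    """Change in sound level, in decibels, for a given amplitude ratio."""
    return 10.0 * math.log10(relative_intensity(amplitude_ratio))

for ratio in (1.0, 2.0, 3.0, 10.0):
    print(f"amplitude x{ratio:g} -> intensity x{relative_intensity(ratio):g}, "
          f"{level_change_db(ratio):+.1f} dB")
```

Doubling the pressure amplitude quadruples the intensity, and a tenfold increase in amplitude multiplies the intensity by one hundred.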
The pressure differential itself depends on two things. The first is the volumetric flow rate of the air: as the lungs push out more air, the air immediately above them is pushed to a greater pressure. The second is the point at which the airflow is restricted, which causes points of lesser pressure in the stream of air. The pressure differential is the difference between the greatest pressure and the least pressure in the stream of air. Unrestricted airflow doesn’t have a pressure differential within its current, which is why we don’t hear most breathing sounds. We only hear breathing if the volumetric flow rate is high enough for the air to be restricted at some point in the vocal tract, or if we narrow the vocal tract at some point, such as by pulling our lips into a small “o” shape.
This means that for a sound’s intensity to exceed a certain level, the volumetric flow rate of the air has to be great enough to cause voicing, which is why voiced sounds are, as a rule, louder than unvoiced sounds.
If you vibrate the vocal folds faster, the vibration is heard as a higher pitch, and a slower vibration as a lower pitch. Similar effects occur when we roll our “r”s for certain language varieties: at the tip of the tongue for the “rr” in Spanish or Scottish English, or at the uvula for the “r” in French.
Other physical properties are at play as well. Rivalling the glottis and vocal folds in importance to spoken language is the tongue, the unsung hero of speech. The tongue plays a huge acoustic role in creating sounds, acting as the main restrictor of airflow for most consonants and vowels.
Most consonants are produced with your tongue: “s” as in “sip” is pronounced with the tip, “ng” as in “thing” with the back of the tongue, and so on. By altering the way air flows between the tongue and the surface it presses against (the part that remains stationary during articulation, which linguists accordingly call the “passive articulator”), we can create a huge variety of sound waveforms, with different frequencies and overtones that we perceive as categorically distinct.
The “s” in “sip” and the “f” in “food” are made by allowing only a very small opening between the moving articulator (the tongue for “s,” the lower lip for “f”) and the passive articulator, creating a turbulent stream of air with quite high frequencies that our ears readily perceive. This is why these sounds stand out, especially when whispering, when we don’t voice any of our speech sounds.
Conversely, the “d” in “dock” or the “k” in “king” close the gap between the tongue and the stationary passive articulator entirely before suddenly opening it, resulting in a short and loud burst of air.
The tongue affects our vowels too. By moving the tongue up and down or forward and backward in the mouth, we can affect the shape of the oral cavity and create different acoustic properties that we hear as different vowels. The “ea” in “beat” has your tongue to the front whereas the “oo” in “boot” has it to the back, and the “a” in “cat” has it to the front but lower than either of these.
The physics of speech, especially fluid dynamics, is crucial to how language is articulated and is much more complicated than explained here. For further reading, explore topics such as phones and phonemes, phonology, speech articulators, and the International Phonetic Alphabet, or consider taking an introductory course in phonology and phonetics.