Why do we feel uneasy when we expect someone to say a certain
word but hear something else? The question was raised in a Nature
Neuroscience article this past June describing a study by Arnal
and colleagues, who used magnetoencephalography (MEG) to show how
we use vision to predict what we hear. As it turns out, our brains
naturally generate speech predictions from visual cues. In other
words, the brain guesstimates what others will say before it hears
them say it.
This is possible because our facial movements begin slightly before
the sounds our vocal cords produce. The brain uses these facial
movements to predict the sounds that will follow.
In addition, as is well known in linguistics, the lip movements
associated with certain phonemes are significantly better
speech predictors than others. For instance, readily visible sounds
like /p/ and /m/ are much easier to predict than less visible phonemes
like /k/ and /g/, which are produced at the back of the mouth.
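To make that intuition concrete, here is a minimal sketch, not taken from the study, of how a visible mouth shape could narrow down the set of candidate sounds. The viseme labels and phoneme groupings below are simplified assumptions chosen purely for illustration.

```python
# Illustrative only: a crude lookup from a visible mouth shape ("viseme")
# to the phonemes consistent with it. The labels and groupings are
# simplified assumptions, not stimuli or categories from the study.

VISEME_CANDIDATES = {
    # Bilabial closure is easy to see, so it narrows the guess to a few sounds.
    "lips_pressed_together": {"p", "b", "m"},
    # Velar sounds are articulated at the back of the mouth; the face alone
    # reveals little, so many candidate sounds remain.
    "mouth_open_neutral": {"k", "g", "t", "d", "n", "s", "h"},
}

def candidate_sounds(viseme):
    """Return the phonemes consistent with an observed lip movement.
    A smaller set means the visual cue is a stronger predictor."""
    return VISEME_CANDIDATES.get(viseme, set())

for viseme in VISEME_CANDIDATES:
    sounds = candidate_sounds(viseme)
    print(f"{viseme}: {len(sounds)} candidate sounds -> {sorted(sounds)}")
```

The point of the sketch is simply that a distinctive, visible articulation leaves fewer candidate sounds, and so makes a stronger prediction, than an articulation the viewer cannot see.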
The researchers investigated the patterns of brain activation
involved in this phenomenon by showing subjects videos of actors
articulating a particular syllable (for example, /pa/) dubbed with a
mismatched sound (for example, /ka/). They varied the degree of match
between the visuals and the sounds by choosing spoken syllables
that ranged from low to high oral visibility. For example, the /ka/
sound is less visible than the /ga/ sound. A visual /pa/ paired with an
audio /ka/ should therefore produce a larger mismatch than /pa/ paired
with /ga/, since /ga/ visually resembles /pa/ more than /ka/ does.
The results showed differences in brain rhythms depending on the degree
of match between the visual stimuli and the sounds. High-frequency
brain rhythms appeared when the prediction error was large (i.e.,
when the visual cues did not match the subsequent speech), whereas
low-frequency rhythms dominated when the prediction error was small.
The investigators attribute the high-frequency activity in the former
case to enhanced brain activity attempting to resolve the error,
demonstrating the brain's ability to continually reshape its
neuronal connections. Notably, most of the activity associated with
incorrect predictions occurred late in the neural response. This result
is consistent with current predictive coding theories, which hold that
the brain constantly scans new information to update its model of its
external surroundings.
However, the problem of localizing these rhythms in the brain remains.
For example, beta activity (the normal brainwaves of an awake
and alert person) can increase in certain regions of the brain, such as
the visual cortex, yet decrease in the frontal cortex (the area of higher
reasoning and planning). Additionally, the timing of the prediction errors
is hard to pinpoint, since multiple prediction errors appear to
occur simultaneously.
So how does the brain establish a connection between visual
stimuli and speech? The working assumption is a cortical hierarchy:
a higher-order area predicts an upcoming sensory signal
and transmits this prediction to a lower-order area. When the lower-order
area picks up the actual sensory signal, it sends it back up to the
higher-order area. Through this exchange between higher- and lower-order
areas, the circuit computes the mismatch between the prediction and the
incoming sensory signal.
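As a rough illustration of that loop, here is a minimal sketch in Python of a two-level predictive coding exchange. The syllable feature vectors, learning rate, and function names are invented for illustration; this is not the model or data from the study.

```python
import numpy as np

# Toy two-level predictive coding loop (illustrative only). A "higher-order"
# area holds an estimate of the upcoming syllable, sends its prediction down,
# and revises that estimate using the mismatch (prediction error) reported
# back by the "lower-order" area.

# Hypothetical acoustic feature vectors for two syllables.
FEATURES = {
    "pa": np.array([1.0, 0.2, 0.0]),
    "ka": np.array([0.1, 0.9, 0.8]),
}

def reconcile(visual_guess, heard, steps=20, rate=0.3):
    """Iteratively reconcile a visually driven prediction with the sound
    actually heard; returns the size of the prediction error at each step."""
    estimate = FEATURES[visual_guess].copy()   # prior set by the lip movements
    sensory = FEATURES[heard]                  # signal arriving at the lower level
    errors = []
    for _ in range(steps):
        prediction_error = sensory - estimate  # computed at the lower level
        estimate += rate * prediction_error    # higher level updates its guess
        errors.append(float(np.linalg.norm(prediction_error)))
    return errors

matched = reconcile("pa", "pa")     # visuals and audio agree: no error
mismatched = reconcile("pa", "ka")  # visuals and audio conflict: large initial error
print(f"initial error (matched):    {matched[0]:.2f}")
print(f"initial error (mismatched): {mismatched[0]:.2f}")
```

In the mismatched case the error starts large and takes several iterations to die away, which loosely mirrors the extra processing the researchers associate with resolving a prediction error.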
Predictive coding is a young area of research that has generated
much interest within the scientific community, and further investigation
seems promising. The valuable work of Arnal and his colleagues
provides the first insights into how such coding operates in audiovisual
speech and reveals the complex, interdependent nature
of the visual and auditory systems.