During the pandemic, video calls became a way for me to connect with my aunt in the nursing home and with extended family during the holidays. Zoom is how I enjoy trivia nights, happy hours and live performances. As a university professor, I also use Zoom for all of my work meetings, coaching and teaching.
But I often feel exhausted after Zoom sessions, even the ones I've scheduled for fun. Several well-known factors can cause Zoom fatigue: intense eye contact, slightly misaligned eye contact, being on camera, limited body movement and a lack of nonverbal communication. But I was curious why conversation feels more laborious and awkward over Zoom and other videoconferencing software than in face-to-face interactions.
As a researcher who studies psychology and linguistics, I decided to examine the impact of videoconferencing on conversation. Working together with three undergraduate students, I ran two experiments.
The first experiment found that response times to pre-recorded yes/no questions more than tripled when the questions were played through Zoom rather than from the participants’ own computers.
A second experiment replicated this finding in natural, spontaneous conversations between friends. In that experiment, the transition time between speakers averaged 135 milliseconds in person, but 487 milliseconds when the same pair of speakers talked over Zoom. While less than half a second may seem fast, in terms of natural conversational rhythm the difference is an eternity.
We also found that during Zoom conversations, people spoke for longer, so there was less switching between speakers. These experiments showed that the natural rhythm of conversation was disrupted by videoconferencing apps like Zoom.
Cognitive anatomy of dialogue
I already have some expertise in the study of conversation. Before the pandemic, I conducted several experiments to investigate how topic shifts and working memory load affect the timing of turn-taking in conversation.
In that research, I found that the pauses between speakers were longer when the two speakers were talking about different things, or when a speaker was distracted by another task while talking. I was initially interested in the timing of turn transitions because planning a response during conversation is a complex process that people complete at lightning speed.
The average pause between speakers in two-party conversations is about one-fifth of a second. By comparison, it takes more than half a second to move your foot from the accelerator to the brake while driving, which is more than twice as long.
The speed of turn transitions indicates that listeners do not wait for a speaker to finish before planning a response. Instead, listeners simultaneously understand the current speaker, plan a response and predict the appropriate time to initiate that response. All this multitasking should make conversation rather taxing, but it doesn't.
Brain waves are the rhythmic firing, or oscillation, of neurons in the brain. These oscillations may be one of the factors that make conversation feel effortless. Some researchers have proposed that a neural oscillatory mechanism automatically synchronizes the firing rate of a group of neurons to the speech rate of a conversation partner. This oscillatory timing mechanism would relieve some of the mental effort of planning when to start speaking, especially if it were combined with predictions about the remainder of the partner's utterance.
While there are many unanswered questions about how oscillatory mechanisms affect perception and behavior, there is direct evidence that neural oscillators track syllable rates when syllables occur periodically. For example, when you hear four syllables per second, the electrical activity in your brain peaks at the same rate.
There is also evidence that oscillators can accommodate some variation in syllable rate. This makes plausible the idea that an automatic neural oscillator could track the fuzzy rhythm of speech. For example, an oscillator with a period of 100 milliseconds could keep in sync with speech that varies from 80 to 120 milliseconds per short syllable. Longer syllables are not a problem if their duration is a multiple of the duration of a short syllable.
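The tolerance described above can be illustrated with a minimal Python sketch. It is a toy model, not a description of how neurons work: the function name `fits_oscillator`, the `max_multiple` limit and the rule that a long syllable counts as a whole number of short cycles are all assumptions made for illustration. Only the 100-millisecond period and the 80-to-120-millisecond window come from the text.

```python
def fits_oscillator(duration_ms, low_ms=80.0, high_ms=120.0, max_multiple=3):
    """Return True if a syllable could stay in sync with the oscillator.

    Illustrative assumption: a syllable fits when its duration, divided by
    some whole number of cycles from 1 to max_multiple, lands inside the
    tolerated window of low_ms to high_ms per cycle.
    """
    for cycles in range(1, max_multiple + 1):
        per_cycle = duration_ms / cycles
        if low_ms <= per_cycle <= high_ms:
            return True
    return False


# Short syllables inside the 80-120 ms window stay in sync.
short_ok = [fits_oscillator(d) for d in (80, 100, 120)]

# A long syllable twice the length of a 100 ms short one also fits.
long_ok = fits_oscillator(200)

# Durations that fall outside or between the windows do not fit.
misfits = [fits_oscillator(d) for d in (60, 140)]

print(short_ok, long_ok, misfits)
```

In this toy model a 140-millisecond syllable is too long to count as one cycle and too short to count as two, which is the kind of mismatch an oscillator with limited tolerance could not absorb.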
Internet lag is a spanner in the mental gears
My hunch is that this proposed oscillatory mechanism doesn't work well over Zoom because of variable transmission lag. In a video call, the audio and video signals are split into packets that are sent rapidly across the Internet. In our study, each packet took approximately 30 to 70 milliseconds to travel from sender to receiver, including disassembly and reassembly.
While this is very fast, it adds so much extra variability that brain waves cannot automatically synchronize with the rate of speech, and more effortful mental operations have to take over. This may help explain why I find Zoom conversations more tiring than the same conversations in person.
Our experiments showed that the natural rhythm of turn transitions between speakers is disrupted by Zoom. This disruption is consistent with what would happen if the neural ensemble that researchers believe normally synchronizes with speech fell out of sync because of electronic transmission delays.
Our evidence for this interpretation is circumstantial. We did not measure cortical oscillations, nor did we manipulate the electronic transmission delay. Overall, research into the connection between neural oscillatory timing mechanisms and speech is promising, but not definitive.
Researchers in this field need to pin down an oscillatory mechanism for naturally occurring speech. From there, cortical tracking techniques could show whether such a mechanism is more stable in face-to-face conversations than in videoconferencing conversations, and how much lag and how much variability cause disruption.
Could the syllable-tracking oscillator tolerate relatively short but realistic electronic lags of less than 40 milliseconds, even if they varied dynamically between 15 and 39 milliseconds? Could it tolerate a relatively long lag of 100 milliseconds if the transmission lag were constant rather than variable?
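The difference between constant and variable lag can be made concrete with a short Python sketch. It does not answer the question of what the brain can tolerate. It only shows the arithmetic: a constant delay shifts every syllable by the same amount and leaves the rhythm intact, while a variable delay stretches and squeezes the gaps between syllables. The function name `received_intervals`, the 250-millisecond syllable period (four syllables per second) and the uniformly random lag are assumptions chosen for illustration.

```python
import random


def received_intervals(n_syllables, period_ms, lags_ms):
    """Gaps between syllable onsets as heard by the listener.

    Each syllable is sent at a regular period and arrives after its own lag.
    """
    sent = [i * period_ms for i in range(n_syllables)]
    received = [t + lag for t, lag in zip(sent, lags_ms)]
    return [later - earlier for earlier, later in zip(received, received[1:])]


def spread(values):
    """Difference between the longest and the shortest interval."""
    return max(values) - min(values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 50
period_ms = 250  # assumed rate of four syllables per second

# Relatively long but constant lag of 100 ms.
constant = received_intervals(n, period_ms, [100] * n)

# Short lag that varies between 15 and 39 ms from one syllable to the next.
variable = received_intervals(
    n, period_ms, [rng.uniform(15, 39) for _ in range(n)]
)

print("constant lag spread (ms):", spread(constant))
print("variable lag spread (ms):", round(spread(variable), 1))
```

With a constant lag every interval stays at exactly 250 milliseconds. With a lag between 15 and 39 milliseconds, each interval can land anywhere from 226 to 274 milliseconds, so the listener hears an irregular rhythm even though the speaker produced a regular one.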
Knowledge gained from such studies could open the door to technological improvements that help people stay in sync and make videoconferencing conversations less cognitively disruptive.