What does this sound like to you? I am not referring to the words themselves or the syntactic make-up of the sentence but to the sounds as an experience when you read. The phenomenology of reading is one way of understanding how you hear the inaudible voice in your head as you process the words on the screen.
The Four Modalities
There are four modalities for communication: oral expression, written expression, auditory comprehension and reading comprehension.
All four modalities rest on the same unit of meaning, which is produced as sound. There are also rhythmic differences between mediums of expression that extend beyond the communication modalities and are specific to each medium. When we read aloud, the tone of the spoken word is stifled to fit the transfer of words to sounds. Poetry reads differently than a sales ad and a research article reads differently than a children’s book. Each also sounds different when spoken.
Put simply, talking doesn’t sound like writing and writing doesn’t sound like talking when it is read aloud. You can tell when someone is reading as opposed to speaking simply by the tone. But what we are exploring here is how those tonal or prosodic features manifest themselves in your mind.
Words have a melody. Even if we don’t know the song, we improvise it. It is like learning a song from reading the lyrics as opposed to hearing it sung. Consider someone reading “The Star-Spangled Banner.” “The land of the free” is flat in text, but once you hear the exuberant stretch of the word ‘free’ you can never read it the same way again. The shared context of the external representation evolves the inner representation.
Who’s Talking, Me or the Author?
My first job out of college was at a local radio station. The owner had a distinct, pretentious, condescending tone that still makes me quiver in disgust when I think about it. He had given me one of his many mind-numbing books of radio transcripts to read. I remember taking it home and opening it up when, to my surprise, there was a new voice in my head.
I was reading his book in his voice, not the usual voice that I had known but never considered until that moment. I was bewildered and appalled by this, to the point where I threw his book in the garbage, where it belonged. I had to take a shower, but there was no cleaning thorough enough to drain that awful echo from my head.
Bone Conduction and Reverb
The digital world has saturated our minds with text, each with a unique prosody.
We are anatomically designed to never hear our actual voice. Technology has enabled us as a species to experience the phenomenon of our voice, something our ancestors only rarely heard except through the echoes of nature. Even then, the reverb effect layers the sound of our voices with a delay, so the perception is altered. We were biologically wired to only hear our voice through echoes.
The reason our voice sounds so different when we hear it on a recording is that we are taking in the sound in a different manner entirely. It is often jarring and unsettling not because we are self-conscious but because it is, well, unnatural. It stirs our lizard brain by forcing it to process something it never considered: our own voice.
Bone conduction is the way sounds reverberate through the skull. We have a labyrinth of air-filled bone behind each ear called the mastoid process. The process of weaving the sounds through the bones filters how our voice is perceived. Since the skull conducts lower frequencies of sound better than the air does, it gives us the false sense that our voices are deeper and fuller than they are to others.
When we hear our voice from a recording, it is traveling through the air and doesn’t have the vibrations and resonance of our bones. We can hear it and even perceive it as our own voice, but it’s not the same one that is in our head when we read.
Text to Speech and Speech to Text
Sounds and the symbols of text are inextricably linked in experience. There is an auditory activation as various parts of the brain light up in response to seeing a word. It isn’t simply a visual or an auditory experience; it is a synthesis.
When you text with a friend whose voice is familiar, it is difficult to register their words as their own voice until they say, or in this case type, something that they say frequently. Like when my friend says “oh yeah.” The orthographic representation instantly transforms into his distinct, exuberant “Oh yeah!” that he has been saying since sixth grade. It was hilarious then and is still just as hilarious now, untainted by time.
In a group chat, the bombardment of text from different voices is also impossible to interpret auditorily. If we were listening, it would all sound jumbled and indecipherable, like a loud tavern, but because we can parse it out with our eyes, we are able to integrate the information.
But the sound of those voices we read is far from what their actual voices sound like. We are transmuting their voices into the voice in our head, especially if we don’t know what they actually sound like. Even after we hear their voice, it still reverts to the sound we have in our head when we read their words.
The digitally saturated world we live in is one of text and therefore of sounds. But the more time we spend reading without hearing, the more nebulous and incoherent those sounds become. We are in a unique position in human history where we can talk all day without uttering a word. There is no substitute for the human voice. Once we hear someone’s voice, their written words will never sound the same way again. That’s something worth listening to.
Find Matt on Twitter.