Hearing Oneself VoIP

Consider this piece a small contribution to the literature of ambivalence over the turn to online communication in an age of mass quarantine. In the world of wealthy universities, the world where I work, there has been a massive move to videoconferencing as an attempt to keep our work going, to maintain an illusion of “business as usual.” Even before the current moment, voice over IP (VoIP) applications like Zoom, Skype, WhatsApp, Messenger, and Teams were celebrated as an assistive technology and an alternative to physical copresence for people with various kinds of disabilities who cannot easily get from one place to another. But in at least one way, they come with a small cost. While commentators have reflected on the limited video and audio quality of Zoom meetings, or the problems of interacting with people through that modality all day, the main issue videoconferencing has produced for me is one of interacting with my own voice.


Old-fashioned landline phones have a mostly forgotten feature: feedback. When a person speaks into the microphone, their voice comes out of the speaker in the handset as well as going down the line. VoIP applications, like modern mobile phones, do not have this feature. It’s a small thing, probably not noticed by most people, and engineers presumably eliminated it either because they considered it superfluous or because it presented some technical problem that was deemed not worth solving. Or perhaps it was forgotten: when engineers designed the specifications for Video Compact Disc in the 1980s, they initially forgot to leave room for the soundtrack. Either way, this is one reason why people are more likely to carry on an otherwise convivial mobile phone or VoIP conversation at a level closer to yelling: they are trying to get their own voices in their heads to a volume similar to the voices coming from the other end of the line and into their ears.


There is a vast literature on hearing oneself speak. Philosophers usually celebrate the possibility that a subject can perceive its own speech, and they usually celebrate it in an ableist, audist way. Not everyone can hear themselves speak. Some d/Deaf and hard of hearing speakers have spoken without hearing themselves for generations. At the same time, not hearing oneself speak is not a form of liberation, at least not for people like me who have a paralyzed vocal cord. When I was learning to speak again, while my vocal cord paralysis was new, my speech therapist warned me not to overuse the phone or VoIP, because it would strain or blow out my voice. Over time, vocal strain wears me out and hits me by surprise; I am working harder than I feel like I am.


But now that VoIP provides a supplement to confinement, one that allows a distant approximation of “business as usual,” I am condemned to video chat. For the first few weeks it was a special kind of exhausting hell because of the extra effort to speak. I knew that the only way to get through many hours of video chats for work would be to construct a technique (really an assembly of technologies) for hearing myself speak. This involves a kludge that makes use of a studio microphone, an audio interface for getting sound in and out of my computer, and a way of getting that sound into my VoIP. On one level, this should be a simple proposition. Musicians routinely need this feature because microtiming matters when one is recording music to a beat. Audio interface companies market the feature as “zero-latency monitoring” or “input monitoring,” and most modern interfaces have it. Unfortunately, these interfaces run on specialized software drivers that get installed in the hidden nether regions of the computer’s operating system. There was a glitch in my driver that did not allow me to connect my microphone to my VoIP applications. I tried every troubleshooting trick I could imagine. Then I contacted technical support, which is the problem-solving equivalent of that moment in a horror movie when the protagonist enters a dark basement: you know it will not end well. Sure enough, technical support for the operating system and technical support for the audio interface created an infinite, recursive loop in which they blamed the problem on each other.


I will spare you the details of how I fixed it in the end, but I now have the ability to hear myself speak, so long as I wear headphones. I also invested in one of those cool broadcast arms to move the microphone back and forth. Because this setup uses a fairly sensitive microphone (again, to make it easier for me to speak quietly) and is located in front of a set of speakers, it produces a more heavy-metal kind of feedback, at skull-ripping volumes, if I accidentally switch on the speakers while the microphone is on. I also have to be careful not to blast the sound into my headphones.


To my knowledge, nobody is clamoring for audio feedback mechanisms as accessibility features in computers and VoIP programs, but I can’t help wondering whether, as with other technologies for access (curb cuts and closed captioning are two classic examples), my access need points to a broader problem introduced by the lack of audio feedback in new telephonic media. Who, besides those of us with crip voices, might need or want to hear themselves VoIP?