As a 40-plus-year veteran of the sound industry, when I’m asked to reflect on the importance of good audio, and specifically of microphone performance, in a videoconferencing system, I wonder why it’s even a question at all. Quite often, audio quality is the primary indicator of connection quality, and video-processing and network issues all get reported as “audio problems.” In reality, they are not audio issues, although adjustments to the audio system can often mask or suppress those connection problems. Good microphone design and placement can mitigate a poor audio signal, but only within the physics of sound that governs the design of every microphone. Advances in processor power and design do open new, faster avenues for manipulating sound, but the underlying constraints remain.
At the beginning of unified communications (UC), when video and audio were combined to create what we now call videoconferencing, the emphasis was always on the picture. Comments invariably concerned “dropped frames” and “video tiling,” whereas audio was rarely mentioned unless there was catastrophic “echo” or a complete loss of sound. In reality, the audio part of the process was far more difficult to get right, because the ability to control echo returning from the far end developed slowly and the connection between the two points lacked speed.
Slowly, as the technology became more readily available and developers began to understand the issues, acoustic echo control got steadily better. As audio, video and connectivity technology all improved, more manufacturers began releasing devices for the conferencing market, and the complexity of installed systems increased substantially. People began asking, “If I can use one box for eight microphones, then surely I can use four boxes for 32 microphones, right?”
What was originally a complex room with eight microphones on a conference table has become a 600-seat conference auditorium with 200 or more inputs. The $100,000 boardroom has become the $100 million audiovisual extravaganza, boasting a broadcast-style control room behind glass, in full view of the conference participants. At the same time, it did not escape the notice of enterprise CFOs that they could have an equally satisfying, completely free FaceTime chat with their kids on their cell phones. The real question then became this: “Does the room sound $100 million better than my cell phone does?”
We’ve all seen the effects of transmission delay in video. Just look at most live-TV news reports in which two people in locations many miles apart are trying to hold a conversation. One person asks a question, and the video shows the lag before the person at the other end hears it and responds. Equally jarring are cases in which video and audio arrive at different times; this often appears as poor lip sync. The problem exists in conferencing, too, and it often depends on the quality of the connection and of the endpoint. It’s compounded by the likelihood of multiple far ends, each with different audio and video arrival times. Try explaining to a CEO why the audio on his or her $100 million conferencing system does not match the lip movement in the video. Good luck with that!
We spend a great deal of effort testing to find an average delay time that helps compensate for this non-audio issue. Our findings show that, psychologically, matching the voice to the lip movement improves speech intelligibility. One day soon, someone will show me an automated way to measure and account for all the variables. Hopefully, someone is working on that problem!
We have all learned over time, especially through familiarity with cell-phone technology, to discount minor glitches in audio and video. We dismiss them as connection problems and will even redial to see if things improve. In a big, expensive conference room, however, there’s far less tolerance for connection issues; users typically open a service ticket before even trying to redial. The problems are frequently caused by an issue at one of the many far ends; for example, one participant reports that he or she cannot hear well, even though every other far end is just fine. That is not, and never will be, an issue that can be resolved by adjusting the transmitting side of the audio.
Speaking of mobile phones, their typically lower-quality audio, especially on calls from noisy environments, has contributed to poorer audio performance in conferences. That said, cell-phone sound has improved in two ways: (a) through headsets and (b) through sophisticated in-car infotainment systems with cabin microphones and speakers.
Many audio problems can be linked directly to microphone type and placement. Originally, small conference-room systems had microphones on the table in front of each person talking. Simple speakerphones sat on the desk in front of the user. Lectern microphones were located directly in front of the presenter, and each person on the dais had a microphone directly in front of him or her. All of these placed the microphone a short distance from the source of the sound: the talker’s mouth.
Regrettably, minor improvements in microphone technology are now routinely oversold by manufacturers’ marketing departments, in many cases directly to architects and business owners, creating the unreasonable expectation that technology can correct for the “annoying” physics of sound. When an architect tells us that we cannot place microphones on the table or on the ceiling in front of the conference participants, and that the only acceptable location is on the ceiling behind their heads, one has to wonder what the architect is thinking. AV consultants are in a difficult position because saying “no” to an architect puts further work from that firm in jeopardy; unfortunately, that can leave no one to argue the case for properly placed microphones. (The architect, of course, points to marketing materials claiming that microphones on the market can work in those locations.) Let’s examine that hype.
I have personally listened to and measured products that claim to bend or alter the laws of physics. I can honestly say that the obvious point remains true: The farther the microphone is from the person talking, the less effective the resulting audio will be. Although it is obvious to everyone in the audio business, it might be news to some manufacturers and consultants that, when people talk, the sound comes out of their mouths, not the backs of their heads. We shouldn’t expect a microphone to pick up sound clearly from that position.
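To put a rough number on the distance penalty, the short Python sketch below applies the free-field inverse-square law, under which the direct sound from a talker falls by about 6 dB each time the talker-to-microphone distance doubles. It is an illustration under stated assumptions, not a room model: the function name `direct_level_drop_db` and the example distances are hypothetical, and the calculation covers only direct sound in a free field. It says nothing about room reflections and reverberation, which stay roughly constant while the direct sound fades, or about the talker’s own directivity, which typically costs additional high-frequency energy when the microphone is behind the head.

```python
import math


def direct_level_drop_db(reference_distance_m: float, distance_m: float) -> float:
    """Hypothetical helper: free-field drop in direct-sound level (dB) when the
    talker-to-microphone distance grows from reference_distance_m to distance_m.
    Inverse-square law only; reverberation and talker directivity are ignored."""
    if reference_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    # Sound pressure falls as 1/r, so the level change is 20*log10(r / r0).
    return 20.0 * math.log10(distance_m / reference_distance_m)


# Illustrative distances: a table microphone at 0.5 m versus ceiling positions farther away.
for d in (0.5, 1.0, 2.0, 4.0):
    print(f"{d:.1f} m: {direct_level_drop_db(0.5, d):5.1f} dB below the 0.5 m level")
```

With a 0.5 m reference, a microphone 2 m away receives roughly 12 dB less direct sound, before any of the room or directivity effects above are considered.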
I can attest to this from experience. At live trade-show demonstrations and in various video presentations, I’ve listened to microphones that supposedly track people as they speak around a room. Manufacturers have told me that the voice quality is the same whether you talk directly at these devices or face away from them; it never is. The problem is that people who don’t take the time to test for themselves believe the product claims. It has become so bad that architects and designers dictate compromised microphone positions on the assumption that we can simply use technology to make them work. Why make it difficult when it doesn’t have to be?
One might think that beamforming microphones are a newly developed technology; that’s not the case. I was shown an early example installed in an auditorium at Bell Labs in Holmdel, NJ, in the mid-1990s. It was mounted over the center of the stage and used to focus pickup toward individual audience members who had a question. I saw another example in the late ’90s: a microphone designed by an Austrian company and mounted on the rearview mirror of a high-end Mercedes. It was meant to focus pickup toward whichever occupant was talking, making the car a four-seat conference room on wheels!
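For readers who have not looked inside one, the Python sketch below shows the simplest form of the idea, a delay-and-sum beamformer on a uniform linear array: each microphone’s signal is delayed so that sound arriving from the chosen direction lines up across all elements, and the aligned signals are then averaged. This is a textbook illustration, not a description of the Bell Labs or automotive systems mentioned above. The function names, the 343 m/s speed of sound and the whole-sample delays are simplifying assumptions; commercial products use considerably more elaborate processing, which is part of why they need careful positioning and commissioning.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def steering_delays_s(num_elements, spacing_m, angle_deg, c=SPEED_OF_SOUND_M_S):
    """Hypothetical helper: delays (seconds) that align a plane wave arriving
    angle_deg off broadside on a uniform linear array of num_elements mics."""
    # Relative arrival time of the wave at element n, measured from element 0.
    arrivals = [n * spacing_m * math.sin(math.radians(angle_deg)) / c
                for n in range(num_elements)]
    latest = max(arrivals)
    # Delay each earlier-arriving element so all line up with the latest arrival.
    return [latest - t for t in arrivals]


def delay_and_sum(channels, delays_samples):
    """Hypothetical helper: shift each channel by a whole number of samples and
    average them (real systems use fractional delays and filtering)."""
    length = max(len(ch) + k for ch, k in zip(channels, delays_samples))
    out = [0.0] * length
    for ch, k in zip(channels, delays_samples):
        for i, x in enumerate(ch):
            out[i + k] += x / len(channels)
    return out


# Toy check: an impulse reaches element n at sample n (sound from one side).
channels = [[0.0] * n + [1.0] + [0.0] * (3 - n) for n in range(4)]
print(delay_and_sum(channels, [3, 2, 1, 0]))  # → [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

In the toy check, the four delayed copies of the impulse coincide at one sample and sum to full strength; sound from other directions would not line up and would partially cancel, which is what “focusing” the pickup means here.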
Yes, there are excellent beamforming and tracking microphones on the market today; however, none of the good ones is inexpensive. Put into the wrong hands, this type of product creates unreasonable expectations of what it can do. I have yet to find one of these devices that can set itself up; all of them require very specific positioning and commissioning. Despite this, room designers are now designating “Technology Zones” and telling us how and where we can position microphones. Of course, we’ve tried to comply with those requests to preserve relationships; unfortunately, as we continue to do so, the requests become even more challenging.
Excellent UC audio is now compromised not only by the technology limitations described earlier but also by design overreach and difficult commissioning, all so that a room can look “modern” or “slick.” The entire industry must be careful not to let difficult situations arise just because someone with no audio background, acting as an AV designer, has decided that things must be done a certain way.
It’s not all bad news, though. Some small telecom companies and their products have been acquired by strong, audio-focused companies, while other firms have taken the time to hire engineers who can turn their clever ideas into real audio products. We now have more sophisticated microphones in our toolbox for resolving difficult problems, and that helps everyone. Echo-cancelling and noise-cancelling algorithms keep getting better as digital signal processing (DSP) and core-based controllers become more powerful. The outlook is very good.
It’s an exciting time to be in the audio business. There are many excellent products available, and many of them sound better than ever. Embrace the moment, but don’t for one minute lose sight of the physics of sound. Feel empowered to say that there’s an alternative or better way to get good sound to the far end of a conference call. You will find that doing so is well worth the effort.