Introduction to Polycom Technology
Over the last 10 years, Polycom has developed the most advanced
speakerphone technology available today. Systems available in 1992, unstable
and hard to use, only approximated full-duplex performance, and wore price
tags from $2,500 to $10,000. Today Polycom offers a wide range of products
incorporating the Clarity by Polycom technology, matched to a variety of
applications, for prices starting at a tenth of that. But even today, the
question is often voiced, "what is so hard about building a speakerphone, of
Well, this is a reasonable question, given how simple the function seems:
you talk, they hear; they talk, you hear. What could be easier? This paper
is intended to explain some of the challenges involved in developing a
premier audio conferencing system, and how Polycom has approached these
challenges to produce Clarity by Polycom: a technology, or rather, a family
of interlocking technologies, that exploit a high order of internal
intricacy to produce the effect of external simplicity and transparency.
What is a speakerphone? It is something that acoustically links one or
more users, through the open air, to an electronic communication medium,
with no need to hold a handset, or wear a headset, next to the ear or mouth.
A speakerphone contains something to hear with, something to make noises
with, and something to control things with; jobs that are most commonly
performed by a microphone, a speaker, and a keypad. The simplest way to use
these elements is to take the microphone and the speaker and connect them
to the telephone or communications line. If this is done properly, then
the far-end talks and the sound comes out the near-end speaker; when the
near-end talks, the microphone picks up the sound of this talking and sends
it to the far end.
This sounds too simple to be true but it does, in fact, work. It works
well, up to a point: this is exactly what happens in a normal telephone
handset. The problem comes when the loudspeaker gets loud enough to hear
farther than an inch or two away: the microphone then hears not only the
person talking, but also its own loudspeaker. The result of this is much
like that in a badly adjusted auditorium PA system: feedback and howling.
And it occurs for the same reason—the microphone is hearing too much of the
loudspeaker, and the signal just feeds back on itself. The person at the far
end of this kind of phone call also hears echo, their own voice coming back
after a fraction of a second's delay. This is a very unsettling effect in
To overcome these problems, conventional speakerphones have for years
added a selector switch that allows only the microphone or the loudspeaker,
but not both, to be connected at the same time. The switch is controlled
automatically. Whichever end is talking louder gets control, an approach
that solves the howling and echo problems. But it now introduces the problem
of "clipping." Because of this "loudest noise wins" strategy for setting the
switch, either end can now block out the other completely; coughs and
dropped pencils cut out important parts of the conversation. To compensate
for this clipping, we learn to shout at the speakerphone when we want to be
heard, to move stealthily and with great caution when we don't, and to ask,
"what did you say?" and "can you back up? I missed that" a lot. The whole
meeting is conducted in an uncomfortable, stilted fashion. Some companies
have even institutionalized the use of the "mute" button commonly found on
speakerphones, assigning to one attendee at each end of a speakerphone
meeting the task of pushing this button each time it is the far end's turn
to talk.
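The "loudest noise wins" switch described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the circuit used in any particular product; the function name voice_switch and the frame levels are hypothetical.

```python
def voice_switch(near_levels, far_levels):
    """Half-duplex "loudest wins" switch (illustrative sketch only).

    For each frame, the louder side is passed and the other side is muted.
    Returns a list of (near_sent, far_played) level pairs.
    """
    out = []
    for near, far in zip(near_levels, far_levels):
        if near > far:
            out.append((near, 0.0))   # microphone path open, loudspeaker muted
        else:
            out.append((0.0, far))    # loudspeaker path open, microphone muted
    return out


# Hypothetical frame levels: the far end talks steadily, and someone at the
# near end drops a pencil during frame 2.
far = [0.5, 0.5, 0.5, 0.5]
near = [0.0, 0.0, 0.9, 0.0]
result = voice_switch(near, far)
```

In frame 2 the pencil is louder than the far-end talker, so the loudspeaker is muted and that part of the far-end speech is lost. This is the clipping effect.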
The optimal strategy for this problem has only become available during
the past 10 years. It is built around a technique called echo
cancellation, which works by using a very fast, specialized computer to
analyze the acoustics of a room, monitor the sound coming out of its own
speaker, and then predict the echoes that will result, in order to eliminate
them from the microphone signal. When done perfectly, this yields
“full-duplex” operation, allowing the microphone and speaker to remain on
all the time and conversation to proceed easily and naturally. Both sides
can talk, interrupt, drop those pencils or cough those coughs, without
impeding the flow of information. Because conversation is more natural, time
spent is not nearly as fatiguing, and meetings can go as long as required.
This effect has begun to change the way in which businesses hold meetings.
Many businesses are moving from the short, uncomfortable “speakerphone call”
to audioconferences in which conversation flows naturally, people are free
to move about, and work gets done nearly as well as it would in person.
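As a minimal sketch of the principle, the following Python fragment predicts the echo from the loudspeaker signal and subtracts it from the microphone signal. The echo path (room), the sample values, and the helper name convolve are illustrative assumptions. The model is simply set equal to the true echo path here; a real system has to estimate it from the signals themselves.

```python
def convolve(signal, impulse_response):
    """Causal FIR convolution, truncated to the length of the input signal."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


# Hypothetical echo path: a reflection after 1 sample and a weaker one after 3.
room = [0.0, 0.5, 0.0, 0.25]
far = [1.0, 0.0, -1.0, 0.5, 0.0, 0.0]     # signal sent to the loudspeaker
near = [0.0, 0.25, 0.0, 0.0, -0.5, 0.0]   # local talker (what we want to send)

# The microphone hears the local talker plus the echo of the loudspeaker.
mic = [s + e for s, e in zip(near, convolve(far, room))]

# Assumption: the canceller's model matches the room perfectly.
model = list(room)
predicted_echo = convolve(far, model)
cleaned = [m - p for m, p in zip(mic, predicted_echo)]
```

With a perfect model, cleaned equals near: only the local talker remains, even though both sides were active at the same time.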
But as the reader has probably begun to suspect, there is more to a
state-of-the-art audio conferencing system than meets the eye. Let us look
at some of the challenges inherent in the functions described above.
Room echo simulation
It is not just the sound that travels directly from the loudspeaker to the
microphone that is a problem; reflections within the room matter as well. With
walls, furniture, doors, and people in different places, every room has a
different echo pattern and must be analyzed independently. Even the
difference between a door open and closed can result in feedback and howling
if not detected and compensated for. In addition, room environments change
continually during meetings as people lean back and turn in their chairs,
sip from coffee cups, push things around, open and close notebooks
(excellent planar reflectors!), and so forth. Although these changes can
seem small, they often create a big difference in the pattern of
reflections, just as a tiny chip of mirror can reflect a lot of sunlight. So
these changes also must be captured and compensated for.
Earlier systems that measured the room response only once with a burst of
noise at the beginning of the call quickly lost track of the room
environment as people moved around, and became unstable. So while it is much
more difficult to continually update this model than to do it just once,
this kind of continuous updating, called "adaptive echo cancellation," is
essential for trouble-free operation.
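One common textbook way to update an echo model continuously is the normalized least-mean-squares (NLMS) algorithm. The sketch below uses it purely as an illustration; the source does not say which algorithm Polycom uses. The echo paths, filter length, step size, and function names are all assumptions, and the simulation contains no near-end speech, so it sidesteps the harder problem of adapting while both sides talk.

```python
import random


def simulate_mic(far, path_before, path_after, change_at):
    """Microphone signal containing only echo; the room changes at change_at."""
    mic = []
    for n in range(len(far)):
        path = path_before if n < change_at else path_after
        mic.append(sum(h * far[n - k] for k, h in enumerate(path) if n - k >= 0))
    return mic


def nlms_echo_canceller(far, mic, taps=4, mu=0.5, eps=1e-6):
    """Adapt an FIR echo model on every sample (normalized LMS).

    Returns (residual, weights): the microphone signal with the predicted
    echo removed, and the final estimate of the echo path.
    """
    w = [0.0] * taps
    history = [0.0] * taps                      # far-end samples, newest first
    residual = []
    for x, d in zip(far, mic):
        history = [x] + history[:-1]
        predicted = sum(wi * xi for wi, xi in zip(w, history))
        e = d - predicted                       # what is left after cancellation
        norm = sum(xi * xi for xi in history) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        residual.append(e)
    return residual, w


rng = random.Random(0)                          # fixed seed for repeatability
far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
room_before = [0.6, 0.3, 0.1, 0.0]              # hypothetical echo path
room_after = [0.2, -0.4, 0.3, 0.1]              # e.g. after a door is opened
mic = simulate_mic(far, room_before, room_after, change_at=2000)
residual, weights = nlms_echo_canceller(far, mic)
```

Because the model is corrected on every sample, the estimate follows the room when it changes halfway through the run. A canceller that measured the room only once at the start would keep subtracting the old, wrong echo.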
Clear microphone pickup
Although the human ear is not highly directional by itself, the brain works
in conjunction with the two ears to separate sounds from room reverberation.
This is why a person talking from across a room can sound much better than
if heard on a conventional speakerphone: the ear may not be that good, but
the brain, using techniques that are still not well understood, cleans up
the signal. An electronic system, however, must depend instead on clever
acoustical design to be able to send the clearest sound to the far end of a
call. Microphone frequency response, orientation with respect to users and
table, noise level, consistency, sensitivity, and a variety of other factors
all play a part in crafting the clearest sound.
In some cases, multiple microphones are used in combination. The
SoundStation Premier®, for example, uses three independent, highly directional
hypercardioid microphones, each with a full independent echo canceller. This
allows the system to select sound only from the direction of the talker,
which markedly cuts down on room reverberation. Microphones in Polycom's
conferencing systems are also mounted in a pressure-zone-microphone
configuration, which reinforces sensitivity in the direction of the talker
while eliminating almost half of the ambient noise.
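To give a feel for why directional microphones help, the following sketch uses the textbook idealization of a hypercardioid pickup pattern, 0.25 + 0.75 cos(angle), and picks the microphone that responds most strongly to a talker. The 120-degree spacing, the selection rule, and the function names are assumptions made for illustration; the source does not describe how the actual product chooses a microphone.

```python
import math

# Assumed layout: three microphones aimed 120 degrees apart (not from the source).
MIC_AXES_DEG = (0.0, 120.0, 240.0)


def hypercardioid_gain(angle_deg):
    """Ideal first-order hypercardioid response at angle_deg off the mic axis."""
    return 0.25 + 0.75 * math.cos(math.radians(angle_deg))


def select_mic(talker_deg):
    """Return the index of the microphone with the strongest response."""
    gains = [abs(hypercardioid_gain(talker_deg - axis)) for axis in MIC_AXES_DEG]
    return gains.index(max(gains))


chosen = select_mic(100.0)   # talker sitting at 100 degrees around the table
```

For a talker at 100 degrees, the microphone aimed at 120 degrees is selected. The other two respond much more weakly, so the sound and reverberation they would have contributed are left out.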
Because the conventional telephone signal has very limited bandwidth, from
about 300 to 3300 Hz, it is essential to convey and reproduce all of this
signal as transparently as possible. High-end loudspeaker designs, although
still uncommon in most telephony systems, make a big difference in producing
clear, pleasant sound that is not tiring to listen to. In addition to
incorporating advanced acoustic suspension loudspeaker system designs, most
of Polycom’s audioconferencing systems contain custom loudspeaker drivers
optimized for wide bandwidth, high efficiency, broad dispersion, and plenty
of loudness without distortion for conferencing applications. These are all
factors that add extra cost, but are essential for transparent performance
in an audioconferencing system.
Because echo cancellers, as described above, must create a model of the room
before they are fully functional, the speed with which they can deduce an
accurate acoustic model is an important factor. A well-designed echo
canceller must be optimized for speed as well as accuracy, another function
that places heavy demands on the computational engine. In Polycom's new
systems, the audio environment is analyzed every 1/8000 of a second, and each
of these analyses involves, on average, more than 10,000 computations.
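The two figures above imply a lower bound on the processing load, which the short calculation below works out. The constant names are chosen here for illustration, and "more than 10,000" is treated as exactly 10,000, so the result is a minimum.

```python
ANALYSES_PER_SECOND = 8000        # one analysis every 1/8000 of a second
OPS_PER_ANALYSIS = 10_000         # "more than 10,000", so this is a lower bound

ops_per_second = ANALYSES_PER_SECOND * OPS_PER_ANALYSIS
time_budget_us = 1_000_000 / ANALYSES_PER_SECOND   # microseconds per analysis

print(f"at least {ops_per_second:,} computations per second")
print(f"{time_budget_us} microseconds available per analysis")
```

That is at least 80 million computations per second, with each analysis completed in 125 microseconds.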
Redundancy and reliability
As we have all experienced, a meeting that is halted by problems with
speakerphones is an expensive and embarrassing disaster. Just as airplanes
and spacecraft must operate redundant systems to assure continued operation,
so must a system as complex as a high-performance audio conferencing unit.
An integral strength of the Clarity by Polycom technology lies in its
control and supervision processing, in which the basic elements of the
system, as described above, are linked with numerous levels of management
and coordination software to ensure that changing conditions are compensated
for, that impending problems are detected and quickly corrected, and that
the audio conferencing unit will thus continue to perform in a manner as
transparent and invisible as possible.
So here we have the fundamental irony behind the best audio conferencing
systems: that it takes a dense and carefully tuned concoction of acoustics,
algorithms, electronics, mechanics, and user interface design to produce a
simple-to-use, transparent channel for continued productive audio
communications. This is the strength of the Clarity by Polycom technology:
not one algorithm, but a suite of techniques and processes that are melded
to create the optimal, focused communication system.