Adaptive step size sub-band non-linear filter (AS3NLF) technology
Channel Control
Adjustable suppression factor post filter
Non-artificial noise smoother
SAM (Small Array Microphone) Technology
SAM consists of the following major functions:
(1) AS3NLF
(2) Channel Control
(3) Small Array Microphone and Cone Shaped Beam-forming
(4) Non-artificial noise smoother.
(1) AS3NLF
AS3NLF is implemented in our state-of-the-art single-chip SAM
echo cancellers such as the FM1182E and FM1093. With echo cancellation
in both the time and frequency domains and innovative non-linear processing,
AS3NLF is able to handle extremely large echoes. In addition,
a special Channel Control and Beam-forming
with array microphones are also applied.
AS3NLF has two major portions: the Adaptive
Step-Sized Sub-band NLMS and the Non-Linear Filter,
as shown in figure 2. With AS3NLF, SAM achieves 65dB (the Adaptive
Step-Sized Sub-band NLMS contributes 30dB and the Non-Linear Filter
contributes 35dB) of acoustic echo cancellation performance in
a mostly linear system.
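As a quick check of the arithmetic, the sketch below adds the two quoted contributions and converts the total to a linear amplitude ratio. It assumes the two stages act in cascade, so that their attenuations add in dB, and uses the 20·log10 amplitude convention. The function name is illustrative only.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20.0)

nlms_db = 30.0  # contribution quoted for the Adaptive Step-Sized Sub-band NLMS
nlf_db = 35.0   # contribution quoted for the Non-Linear Filter
total_db = nlms_db + nlf_db  # assumption: cascaded stages, so dB values add

print(total_db)                                # 65.0
print(round(db_to_amplitude_ratio(total_db)))  # 1778
```

Under these assumptions, 65dB corresponds to reducing the echo amplitude by a factor of roughly 1,800.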
(1.1) AS3 NLMS (Adaptive Step-Sized Sub-band Normalized Least
Mean Square Algorithm)
AS3 NLMS does not use noise to train the echo canceller.
Rather, it continuously trains on the normal exchange of speech
during the conversation. This results in a more natural flow
of conversation from the beginning of the conversation. Fortemedia
technology also allows talkers to move around.
AEC adaptation can be roughly divided into two phases: large,
rapid changes are required to adapt to major acoustical changes
(such as moving to a new room); smaller changes are required
to adapt to minor perturbations or echo path changes (people
moving, doors opening, etc.). When an AEC is first operated
in a room/car or moved to a new location, it needs to adapt
to the new acoustics of its surroundings. A good AEC approaches
this level of acoustical change quickly and unobtrusively
by determining when it is in the receive state and adapting
rapidly during that state. AS3 NLMS uses an adaptive step size
to automatically adjust the rate of change based on the echo
environment. When only the near end is talking, the step size
is minimized; during double-talk it takes a medium value; and
when only the far end is talking, it is maximized. This method
allows the system to switch quickly among the various states
while maintaining stability. Hence it performs extremely
well when the echo path changes.
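The state-dependent step size can be illustrated with a minimal sketch. This is not Fortemedia's implementation: it is a plain full-band, time-domain NLMS rather than a sub-band one, the talk state is supplied by the caller rather than detected, and the step-size values and the echo path are made-up numbers chosen only to show the idea.

```python
import random

# Hypothetical values: the text only says "minimized", "medium" and "maximized".
STEP_SIZE = {"near_end": 0.0, "double_talk": 0.1, "far_end": 0.8}

def nlms_update(weights, ref_buf, mic_sample, state, eps=1e-6):
    """One NLMS step. ref_buf holds the latest far-end samples, newest first."""
    echo_estimate = sum(w * x for w, x in zip(weights, ref_buf))
    error = mic_sample - echo_estimate            # echo-cancelled output sample
    energy = sum(x * x for x in ref_buf) + eps    # normalization term
    mu = STEP_SIZE[state]                         # step size chosen by talk state
    new_weights = [w + mu * error * x / energy for w, x in zip(weights, ref_buf)]
    return new_weights, error

def simulate(n_samples=4000, seed=0):
    """Far-end-only talk through a made-up 4-tap echo path."""
    rng = random.Random(seed)
    echo_path = [0.5, -0.3, 0.2, 0.1]
    weights = [0.0] * len(echo_path)
    ref_buf = [0.0] * len(echo_path)
    for _ in range(n_samples):
        ref_buf = [rng.uniform(-1.0, 1.0)] + ref_buf[:-1]
        mic = sum(h * x for h, x in zip(echo_path, ref_buf))  # pure echo, no near-end talk
        weights, _ = nlms_update(weights, ref_buf, mic, "far_end")
    return weights, echo_path

weights, echo_path = simulate()
print([round(w, 3) for w in weights])
```

With the large far-end step size, the estimated weights should closely approach the echo path. With the near-end step size of zero used here, the weights are frozen, so near-end speech cannot pull the model away from the echo path.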
(1.2) Nonlinear Filter
Non-linear echo is caused by an overdriven speaker, imperfect
audio design, low supply voltage, mechanical vibrations, microphone
saturation, and echo path changes. These issues often arise
in cellphones and other portable devices, where:
- the speaker is very small, with very low output volume;
- the battery voltage source, 3.7 to 4.4V, is too low to drive
  the audio amplifier to deliver enough loudness in its linear
  operating range;
- the light weight, small form factor, and compact system design
  prohibit a good sound chamber effect to increase loudness;
- there is enormous mechanical vibration from speaker to microphone,
  since they are in the same mechanical enclosure.
If non-linear echo is not dealt with properly, it will result
in howling and echo from the speaker, or the system will
revert to half duplex if such a mechanism is built in.
Traditionally, center clipping is widely used to suppress
non-linear echo. The major disadvantage of center clipping
is that it does not suppress echo without significantly degrading
the full duplex performance and voice quality. On the other
hand, Non-Linear Filter uses very different techniques to
suppress non-linear echo. There are three major elements in
the Non-Linear Filter:
- Big Echo Cancellation
- Post Filter
- Frequency Domain Non-Linear Filter
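For contrast with these three elements, the conventional center clipping mentioned above can be sketched in a few lines. This is one simple variant, which zeroes any sample below a threshold; the threshold and the sample values are illustrative assumptions.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below the threshold; pass the rest unchanged."""
    return [0.0 if abs(s) < threshold else s for s in samples]

residual_echo = [0.02, -0.03, 0.01]
quiet_speech = [0.04, -0.02, 0.03]
loud_speech = [0.5, -0.4, 0.6]

print(center_clip(residual_echo, 0.05))  # [0.0, 0.0, 0.0]
print(center_clip(quiet_speech, 0.05))   # [0.0, 0.0, 0.0]
print(center_clip(loud_speech, 0.05))    # [0.5, -0.4, 0.6]
```

The clipper cannot tell residual echo from quiet near-end speech, so both are removed. This is the loss of full duplex performance and voice quality described above.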
Fortemedia's Big Echo Cancellation process first
detects and classifies the non-linear echo element by correlating
the reference signal from the reference microphone with the main
signal from the voice pickup microphone. Processing the resulting
correlated signal allows it to handle very large and severe non-linear echo.
Fortemedia’s non-linear process also suppresses the non-linear
component in proportion to the degree of non-linearity; therefore
it doesn’t affect full-duplex performance.
Fortemedia's Post Filter utilizes sophisticated
suppression factors to eliminate residual echo. Depending on the
proportion of the linear and non-linear echo, the Post Filter
can achieve an additional 25 to 35dB of echo cancellation with
a faster convergence time than other echo cancellers. This allows
Fortemedia’s SAM products to remove echo tail and adapt
rapidly and unobtrusively to changing acoustic conditions. Users
can move about freely during the conversation without degrading
communications quality.
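The source does not describe how the suppression factors are computed. Purely as an illustration of what an adjustable suppression factor can mean, the sketch below uses a generic spectral-subtraction-style gain per frequency band. The formula, the gain floor and all names are assumptions, not Fortemedia's Post Filter.

```python
def post_filter_gains(signal_power, echo_power, suppression_factor, gain_floor=0.05):
    """Per-band gain in [gain_floor, 1]: more estimated residual echo gives a lower gain."""
    gains = []
    for s, e in zip(signal_power, echo_power):
        if s <= 0.0:
            gains.append(gain_floor)  # nothing to preserve in an empty band
            continue
        gain = 1.0 - suppression_factor * e / s   # subtract the scaled echo share
        gains.append(max(gain_floor, min(1.0, gain)))
    return gains

signal_power = [1.0, 1.0, 1.0]   # hypothetical per-band power after the linear canceller
echo_power = [0.0, 0.4, 0.9]     # hypothetical residual-echo power estimate per band

mild = post_filter_gains(signal_power, echo_power, suppression_factor=1.0)
strong = post_filter_gains(signal_power, echo_power, suppression_factor=2.0)
print(mild)
print(strong)
```

In this sketch, raising the suppression factor never increases a band's gain: bands free of echo pass unchanged, while echo-dominated bands are pushed toward the floor.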
The Frequency Domain Non-Linear Filter utilizes
the difference in acoustic characteristics between two microphones,
preferably an omni-directional and a unidirectional microphone,
to further identify and suppress specific non-linear echo elements.
This mechanism takes advantage of correlated acoustic information
from the two microphones and effectively eliminates echo created
by a loudspeaker operating in its non-linear range. Compared to other
non-linear filters, it achieves much better full duplex performance
while suppressing non-linear echo.
The combined SAM Non-Linear Filter automatically
adapts to changes in the placement of both the loudspeaker and
microphone as well as to changes in loudspeaker volume. The system
integrator is freed from the design constraints required of other
echo cancellers, and end users gain flexible usage
models.
The 65dB AEC performance from Fortemedia (compared to 35dB AEC
performance with conventional acoustic echo cancellers) offers
significantly greater acoustic power output. This permits great
flexibility in microphone and loudspeaker placement and volume
adjustment. With the AS3NLF technologies, SAM
achieves between 25dB and 35dB of side tone reduction.
(2) Channel Control
A traditional AEC uses “Voice Activity Detection”
to decide when to converge on the echo. But “Voice Activity Detection”
is not decisive and may even make incorrect decisions during
this critical phase, because Voice Activity Detection
is inherently inaccurate. As a result, the AEC remains un-converged
for a long time and, in extreme cases, never properly adapts.
Some echo cancellers force the speakerphone to store the room
characteristics after the initial convergence. This compensates
for the fact that the echo canceller is not capable of converging
quickly to major acoustical changes. In this scenario, the AEC
must undergo a rapid training procedure to learn the interior
environment from its un-initialized state. Once trained, it adapts
to small acoustical changes, but major changes require retraining.
This training usually takes the form of a loud burst of noise
or a sequence of tones, which the AEC uses to adapt to the gross
acoustical characteristics of its environment. Fortemedia does
not use “Voice Activity Detection” in its AEC algorithm;
therefore no initial training is required.
Fortemedia’s Channel Control compensates
for inadequacies in the conventional AEC by restricting the rate
of change and the amount of change allowed in the adaptive filters.
This prevents the AEC from going too far out of convergence by
adapting too rapidly when it is confused by a major disturbance,
while allowing the AEC to track relatively minor changes such
as a door opening or slow movement of a driver in a car. The most
complex and difficult task of an effective AEC is reliably determining
when to permit its internal acoustic model to adapt to changes
in the acoustic character of the car/room. Such changes occur
when volume levels are changed, people move about, doors are opened
or closed, the loudspeaker or the microphone is moved, etc. Adaptation
should only occur when in receive mode (for example, when the
near end party is silent and the far end party is talking). Inaccurate
mode decisions cause the AEC’s internal model to diverge
resulting in echoes which are not effectively canceled. Inaccuracies
in this decision process can cause another significant problem,
one that is handled very elegantly with Fortemedia’s Channel
Control. In addition to reducing the effectiveness of the echo
cancellation, inaccurate state or mode decisions may introduce
artifacts into the transmitted or received speech signals. Words
may be clipped or exhibit dropouts. Switch loss or center clipping,
which are nonlinear processes, may be applied to a speech signal
in the wrong mode. This causes chirping or warbling artifacts
that can be annoying and distracting. This can be noticeable in
any mode, but particularly in doubletalk (both parties talking
simultaneously).
AS3NLF and SAM Channel Control
create a smooth transition between states in double talk mode and
permit natural, undistorted doubletalk while enabling fast and
accurate adaptation.
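The idea of restricting both the rate and the amount of change in the adaptive filter can be sketched as two clamps on each coefficient update. This is a hypothetical illustration: the limits, the anchor (a last known-good set of coefficients) and the function name are assumptions, not details given in the source.

```python
def constrained_update(weights, proposed, anchor, max_step, max_total):
    """Limit the per-update change (rate) and the total drift from an anchor (amount)."""
    result = []
    for w, p, a in zip(weights, proposed, anchor):
        step = max(-max_step, min(max_step, p - w))               # rate limit
        drift = max(-max_total, min(max_total, (w + step) - a))   # amount limit
        result.append(a + drift)
    return result

anchor = [0.5, -0.3]       # last known-good coefficients (made-up values)
weights = list(anchor)
disturbed = [0.52, 2.0]    # a small genuine change and a wild, "confused" estimate

for _ in range(10):
    weights = constrained_update(weights, disturbed, anchor, max_step=0.05, max_total=0.2)
print([round(w, 3) for w in weights])
```

In this sketch the first coefficient tracks its small change within one update, while the second moves by at most 0.05 per update and never ends up more than 0.2 away from its anchor, however long the disturbance lasts.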
(3) Small Array Microphone and Cone Shaped Beam-forming
The array microphone solution has emerged as the most promising technology
to suppress non-stationary noise such as human babble, background
music, or passing traffic noise. By arranging multiple microphones
in an array, companies such as Fortemedia, AKG, Knowles, and even
Microsoft can further reduce surrounding noise, providing a more
natural sounding voice. Leveraging the information gathered by
the multiple microphones about the voice and surrounding environment,
an array microphone can process the signals in such a way that
effectively forms a beam to pick up the wanted signal within the
beam, and cancel out noise outside the beam. Several hands-free
car kits using the array microphone solution have already been
deployed.
Although it improves noise suppression, the
traditional broad array microphone is still impractical and limited
in three ways:
- It requires at least 30mm between each microphone, putting placement
  and space constraints on the end solution.
- It can only cancel noise on a 2-D plane. This makes it harder
  to pin-point the talker, while allowing noise to leak into the
  beam; diffused noise, engine noise, rattling of the dash board,
  and general road noise coming from above and below the pie-shaped
  beam will cause major problems for voice recognition related
  applications.
- It is not able to suppress wind noise (from wind blowing and
  applying mechanical pressure on the membrane).
SAM (Small Array Microphone) is the next step in the voice interface
market. Requiring only 5 to 10 mm between microphones, three to
six times closer than broad array microphones, SAM can be deployed
in practically any situation or application. SAM uses a fundamentally
different algorithm than the traditional array microphone to process
the voice, effectively forming a 3-D cone shaped beam. As such,
any noise outside of the beam, whether above or below, will be
cancelled out, without any leakage. The discussion that follows
will provide more background in the differences between these
two array microphone setups.
(3.1) Traditional Beam-forming
Traditional beam-forming utilizes the difference in time delay
between signals received at different microphones in the array.
As such, the microphones are placed further apart so the information
received at each microphone is sufficiently different. The width
of a broadside array beam is based on the wavelength of the signal
divided by the length of the aperture. So, at low frequencies
(longer wavelength), the beam will need to be wider than that
of higher frequencies (shorter wavelength).
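The proportionality can be made concrete with a short calculation. The sketch assumes a speed of sound of 343 m/s, a 30 mm aperture and band edges of 300 Hz and 3.3 kHz, and it computes only the ratio wavelength / aperture, which gives beam width up to a constant factor that the source does not specify.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def wavelength(freq_hz):
    """Acoustic wavelength in metres."""
    return SPEED_OF_SOUND / freq_hz

def relative_beam_width(freq_hz, aperture_m):
    """Beam width up to a constant factor: wavelength divided by aperture length."""
    return wavelength(freq_hz) / aperture_m

low = relative_beam_width(300.0, 0.030)    # low edge of the voice band
high = relative_beam_width(3300.0, 0.030)  # high edge of the voice band
print(round(wavelength(300.0), 3), round(wavelength(3300.0), 3))  # 1.143 0.104
print(round(low / high, 1))                                      # 11.0
```

For a fixed aperture, the beam at 300 Hz is therefore about eleven times wider than at 3.3 kHz, whatever the constant factor is.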
Due to the need to process the difference in time delay, and
the need to capture frequencies from 300Hz to 3.3kHz, the microphones
in a traditional array need to be at least 30mm apart. This brings
about many limitations.
To understand why, please look at figure 1. In this example,
the 2 microphones are facing 0º, meaning that the beam center
is the y-axis. Now, let’s assume the signal source at point
A is playing at the same dB level as the signal source at point
B. Let’s also assume that point A and point B are the same
distance away from the center of the array. In this case, the
signal from source A will be suppressed because the array microphone
can obviously detect that source A is outside the beam (time delay
to Mic 1 is much longer than time delay to Mic 2). However, the
signal from source B will not be suppressed, because to the traditional
array microphone, source B is effectively in the middle of the
beam, since the time delay to Mic 1 is exactly the same as the time
delay to Mic 2. This limitation applies to every plane along
the z-axis, as well as directly behind the array (180 degrees).
Thus, the traditional array microphone can only effectively suppress
noise in a 2-D manner (in our example, only noise on the xy-plane
is canceled). Please refer to figure 3 for the effective beam.
Figure 1. Traditional Array Microphone Setup
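The ambiguity in this example can be reproduced numerically. The sketch below assumes the two microphones sit on the x-axis, 30 mm apart, with the beam center along the y-axis. The exact coordinates of sources A and B are illustrative guesses at the figure: A off to the side in the xy-plane, B directly above the array.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def tdoa(source, mic_spacing):
    """Arrival-time difference in seconds: (time to Mic 1) minus (time to Mic 2).

    Mic 1 is placed at (-d/2, 0, 0) and Mic 2 at (+d/2, 0, 0).
    """
    mic1 = (-mic_spacing / 2.0, 0.0, 0.0)
    mic2 = (mic_spacing / 2.0, 0.0, 0.0)
    return (math.dist(source, mic1) - math.dist(source, mic2)) / SPEED_OF_SOUND

spacing = 0.030                 # 30 mm broad-array spacing
on_axis = (0.0, 1.0, 0.0)       # talker in the beam center
source_a = (1.0, 0.0, 0.0)      # off to the side, in the xy-plane
source_b = (0.0, 0.0, 1.0)      # directly above the array

for name, position in [("on-axis", on_axis), ("A", source_a), ("B", source_b)]:
    print(name, round(tdoa(position, spacing) * 1e6, 1), "microseconds")
```

Source A produces a clear delay difference (about 87 microseconds here) and can be rejected, while source B produces the same zero difference as the on-axis talker, so a method based purely on time delay cannot tell them apart.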
Another major disadvantage of the traditional approach is
wind noise suppression. Since wind has a random blow pattern
and spans various frequencies, it is very difficult to differentiate
between the voice and wind noise in the car given the long distance
between the array microphones. Fortemedia’s SAM technology
is ideal for providing wind noise suppression. The following few
paragraphs explain how this unique SAM feature is capable
of suppressing wind noise.
(3.2) SAM (Small Array Microphone) Beam-forming
SAM beam-forming technology is unlike traditional setups. SAM
beam-forming technology uses 2 omni directional microphones, or
1 uni-directional microphone and 1 omni-directional microphone.
Since these 2 microphones can be placed very close to each other,
the information coming to both microphones is highly correlated
(virtually the same). Consequently, the beam-forming capability
relies on the intelligence of Fortemedia’s algorithm to
decipher the commonality of the information as opposed to the
difference.
Since SAM’s elements can be placed virtually right next
to each other, the effective beam is a 3-D cone shaped beam with
its vertex right in the middle of the 2 microphones. This has
many advantages compared to the traditional array microphone.
To understand the advantages, please refer to Figure 2. In this
example, the setup is exactly the same as Figure 1, except the
receiving device is a small array microphone instead of the traditional
broad array microphone. For SAM, the signals from source A and
source B are exactly the same (in this case, both outside the
beam). This applies throughout the y axis, forming a 3-D cone-shaped
beam. Noise above, below, and behind the beam is effectively suppressed.
Please see Figure 3 for the effective beam.
Figure 2. SAM Setup
Figure 3. Beam Comparison
With Fortemedia’s patent pending SAM technology, the distance
between 2 microphones may be shortened to a few millimeters. SAM
supports two types of array microphone configuration: omni-omni
microphones or uni-omni microphones. The resulting
unique cone-shaped beam is able to suppress noise right above
and below the microphone array (see figure 3).
Figure 4. Beam-forming Effect of Small Array Microphones
Figure 5. SAM Polar Pattern
For the uni-omni configuration, SAM processes acoustic
information based on the different characteristics of the
unidirectional (main) microphone and omni-directional (reference)
microphone. It achieves an exceptional beam-forming effect (see
figure 4), suppressing non-stationary noise by up to 25 dB and
enhancing voice quality while still maintaining a very small
form factor. This configuration, however, does not support wind
noise suppression due to the big difference between the uni- and
omni-directional microphone phase characteristics.
(3.3) SAM (Small Array Microphone) Wind Noise Suppression
In addition to beam-forming, SAM provides unique wind
noise suppression with the omni-omni configuration. Since the
distance between the two omni-directional microphones (assuming
the omni-directional microphones have the same phase characteristics)
is 1cm, SAM processes the “similarity” of the two omni-directional
microphones to form the beam. In the process of extracting
the similar acoustic and phase information from both microphones,
the effect created by the wind is easily filtered
out. This is due to the speed difference between the voice and
the wind (330 m/s for sound vs., for example, an average of 10 m/s
for a 36km/h wind). The pressure applied to each microphone by the wind
is totally uncorrelated, unlike the phase characteristics created
by voice. Therefore, these two effects can be easily differentiated
by SAM with two omni-directional microphones only 1cm apart! SAM can
achieve 25dB of wind noise suppression without much voice distortion
using a very simple yet elegant approach.
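The argument rests on two facts that can be sketched numerically: sound crosses the 1cm gap far faster than wind does, and a signal common to both microphones is strongly correlated while independent pressure fluctuations are not. The sketch below is only an illustration, not SAM's algorithm. The 8 kHz sampling rate, the 300 Hz test tone standing in for voice, and the modelling of wind as independent Gaussian noise at each microphone are all assumptions.

```python
import math
import random

def correlation(a, b):
    """Normalized zero-lag correlation coefficient of two equal-length signals."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

spacing_m = 0.01
sound_transit_s = spacing_m / 330.0   # about 30 microseconds
wind_transit_s = spacing_m / 10.0     # 1 millisecond at 10 m/s (36km/h)
print(round(sound_transit_s * 1e6, 1), round(wind_transit_s * 1e6, 1))  # 30.3 1000.0

rng = random.Random(1)
n = 2000
# At 8 kHz the sample period (125 microseconds) exceeds the acoustic transit time,
# so the voice is modelled as the same samples at both microphones.
voice = [math.sin(2.0 * math.pi * 300.0 * t / 8000.0) for t in range(n)]
wind_mic1 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent turbulence per microphone
wind_mic2 = [rng.gauss(0.0, 1.0) for _ in range(n)]

print("voice correlation:", round(correlation(voice, voice), 3))
print("wind correlation:", round(correlation(wind_mic1, wind_mic2), 3))
```

Under these assumptions the voice correlation is essentially 1, while the wind correlation stays close to 0, which is the distinction the text relies on.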
Fortemedia systems work best with 2 microphones, but they can
also work with just one microphone. To form a conical beam, 2 microphones
are used in a small array configuration: an omni-directional microphone
is simply paired with either a unidirectional microphone or
another omni-directional microphone.
(4) Non-Artificial Noise Smoother
A CDMA handset has its own noise suppression to:
- suppress the background noise, but it requires a long convergence
  time, and
- cut off the background completely when there is no voice
  from the near end.
The problem occurs in the scenario where there is a hands-free
car kit at the near end: there is an attempt to suppress noise
twice. First by the hands free car kit, then by the CDMA handset.
When the near end is in a noisy environment, the far end will
hear unstable noise coupled with the near end’s distorted
voice due to the long convergence time from the CDMA handset’s
noise suppression. This effect reduces the intelligibility and
deteriorates the quality of communication. A noise smoother is
typically used for adding some noise back to the background to
shorten the convergence time of the handset noise suppression,
thereby improving the communication quality. However, conventional
noise smoothers are artificial: they introduce unnatural background
noises that create a very unpleasant user experience.
Fortemedia’s Non-Artificial Noise Smoother uses the actual
environment noise as the base, as opposed to artificially generated
noise. When applied to hands free applications, the noise level
can be adjusted for the handset to maximize the comfort noise
generator.
Conclusion
Combining AS3NLF, powerful Channel Control and small array microphone
Beam-forming, Fortemedia’s SAM delivers
(1) a superior AEC performance of up to 65dB (including non-linear
echo cancellation),
(2) 25dB of non-stationary noise suppression, and
(3) 25dB of wind noise suppression.
Fortemedia then embodies these technologies in a very small single
chip. Integrated with DSP, memory, CODEC, and hardware accelerator,
Fortemedia’s low power solutions are the ideal single-chip
solutions for hands free communication applications such as hands
free car kits, speaker phones, mobile or other embedded communications
and voice input devices.