Pending Patents:

  • Adaptive step size sub-band non-linear filter (AS3NLF) technology
  • Channel Control
  • Adjustable suppression factor post filter
  • Non-artificial noise smoother

SAM (Small Array Microphone) Technology

SAM consists of the following major functions:

(1) AS3NLF
(2) Channel Control
(3) Small Array Microphone and Cone Shaped Beam-forming
(4) Non-artificial noise smoother.

(1) AS3NLF

AS3NLF is implemented in our state-of-the-art single-chip SAM echo cancellers, such as the FM1182E and FM1093. With echo cancellation in both the time and frequency domains and innovative non-linear processing, AS3NLF is able to handle extremely large echoes. In addition, SAM applies a special Channel Control and beam-forming with array microphones.

AS3NLF has two major parts, the Adaptive Step-Sized Sub-band NLMS and the Non-Linear Filter, as shown in figure 2. With AS3NLF, SAM achieves 65dB of acoustic echo cancellation performance in a mostly linear system (the Adaptive Step-Sized Sub-band NLMS contributes 30dB and the Non-Linear Filter contributes 35dB).
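Echo attenuation figures in decibels add when stages are cascaded, because the underlying power ratios multiply. The short Python sketch below checks that the quoted 30dB and 35dB contributions combine to 65dB; the helper names are illustrative, and treating the two stages as independent is a simplifying assumption.

```python
import math


def db_to_power_ratio(db: float) -> float:
    """Convert an attenuation in dB to a linear power ratio."""
    return 10 ** (db / 10)


def power_ratio_to_db(ratio: float) -> float:
    """Convert a linear power ratio back to dB."""
    return 10 * math.log10(ratio)


# Per-stage contributions quoted in the text.
nlms_db = 30.0        # Adaptive Step-Sized Sub-band NLMS
nonlinear_db = 35.0   # Non-Linear Filter

# Cascaded stages multiply in the linear power domain, which is the same as adding in dB.
total_ratio = db_to_power_ratio(nlms_db) * db_to_power_ratio(nonlinear_db)
total_db = power_ratio_to_db(total_ratio)
print(f"total: {total_db:.1f} dB (echo power reduced by a factor of {total_ratio:,.0f})")
```

In other words, 65dB corresponds to reducing the echo power by a factor of roughly 3.2 million.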

(1.1) AS3 NLMS (Adaptive Step-Sized Sub-band Normalized Least Mean Square Algorithm)

AS3 NLMS does not use noise to train the echo canceller. Instead, it trains continuously on the normal exchange of speech during the conversation, which gives a more natural flow of conversation from the very start of the call. Fortemedia technology also allows talkers to move around freely.

AEC adaptation can be roughly divided into two phases: large, rapid changes are required to adapt to major acoustical changes (such as moving to a new room), while smaller changes are required to adapt to minor perturbations or echo path changes (people moving, doors opening, etc.). When an AEC is first operated in a room/car or moved to a new location, it needs to adapt to the new acoustics of its surroundings. A good AEC handles this level of acoustical change quickly and unobtrusively by determining when it is in the receive state and adapting rapidly during that state. AS3 NLMS uses an adaptive step size to adjust its adaptation rate automatically based on the echo environment. When only the near end is talking, the step size is minimized; during double-talk it is set to a medium value; and when only the far end is talking, it is maximized. This method allows the system to switch quickly among the various states while still maintaining stability, so it handles echo path changes extremely well.
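The state-dependent step size described above can be illustrated with a plain full-band NLMS echo canceller. This is a minimal Python sketch, not Fortemedia's sub-band implementation: the step-size values in `STEP_SIZE`, the toy echo path `h`, the modeling of near-end speech as Gaussian noise, and the assumption that the talk state is already known (a real system must detect it) are all hypothetical.

```python
import random

# Hypothetical step sizes per talk state; the real AS3 NLMS values are not published.
STEP_SIZE = {
    "near_end_only": 0.005,  # minimized: the mic mostly carries near-end speech, not echo
    "double_talk": 0.1,      # medium: adapt cautiously while both sides talk
    "far_end_only": 0.8,     # maximized: the mic carries (almost) pure echo
}


def nlms_update(w, x_hist, d, mu, delta=1.0):
    """One regularized NLMS iteration.

    w      -- current echo-path estimate (filter taps)
    x_hist -- most recent far-end (loudspeaker) samples, newest first
    d      -- current microphone sample (echo plus any near-end speech)
    mu     -- step size chosen from the talk state
    delta  -- regularization that bounds the update when the far-end signal is quiet
    Returns the updated taps and the echo-cancelled output sample.
    """
    y_hat = sum(wi * xi for wi, xi in zip(w, x_hist))   # estimated echo
    e = d - y_hat                                        # residual after cancellation
    norm = sum(xi * xi for xi in x_hist) + delta         # far-end power normalization
    w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_hist)]
    return w, e


def run(state, n, w, h, rng, near_end_std=0.0):
    """Simulate n samples in one talk state against a fixed echo path h."""
    x_hist = [0.0] * len(h)
    for _ in range(n):
        x_hist = [rng.gauss(0.0, 1.0)] + x_hist[:-1]     # new far-end sample
        echo = sum(hi * xi for hi, xi in zip(h, x_hist))
        d = echo + rng.gauss(0.0, near_end_std)          # add near-end "speech", if any
        w, _ = nlms_update(w, x_hist, d, STEP_SIZE[state])
    return w


def misalignment(w, h):
    """Largest absolute tap error between the estimate and the true echo path."""
    return max(abs(wi - hi) for wi, hi in zip(w, h))


rng = random.Random(0)
h = [0.5, -0.3, 0.2, 0.1]      # toy echo path (assumed)
w = [0.0] * len(h)
w = run("far_end_only", 2000, w, h, rng)
converged = misalignment(w, h)
w = run("double_talk", 2000, w, h, rng, near_end_std=0.5)
after_double_talk = misalignment(w, h)
print(f"after far-end-only: {converged:.1e}, after double-talk: {after_double_talk:.3f}")
```

The large far-end-only step lets the filter lock onto the echo path quickly, while the medium double-talk step keeps near-end speech from pulling the taps far away from `h`.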

(1.2) Nonlinear Filter

Non-linear echo is caused by an overdriven speaker, imperfect audio design, low supply voltage, mechanical vibrations, microphone saturation, and echo path changes. These issues occur often in cellphones and other portable devices, where:

  • the speaker is very small and has very low output volume
  • the battery voltage, 3.7 to 4.4V, is too low to drive the audio amplifier to deliver enough loudness within its linear operating range
  • light weight, small form factor, and compact system design prevent a good sound chamber that would increase loudness
  • there is enormous mechanical vibration from speaker to microphone, since they share the same mechanical enclosure.

If non-linear echo is not dealt with properly, it results in howling and echo from the speaker or, if such a mechanism is built in, forces the system to revert to half-duplex operation.

Traditionally, center clipping has been widely used to suppress non-linear echo. Its major disadvantage is that it cannot suppress echo without significantly degrading full-duplex performance and voice quality. The Non-Linear Filter, on the other hand, uses very different techniques to suppress non-linear echo. There are three major elements in the Non-Linear Filter:

  • Big Echo Cancellation
  • Post Filter
  • Frequency Domain Non-Linear Filter
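The center clipping mentioned above is simple enough to show in a few lines. The Python sketch below implements one common variant (zeroing every sample whose magnitude falls below a threshold); the threshold and sample values are made up for illustration. It shows the trade-off described above: the low-level residual echo disappears, but so does a soft near-end talker.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below threshold; pass louder samples unchanged."""
    return [0.0 if abs(s) < threshold else s for s in samples]


# Illustrative values only.
residual_echo = [0.02, -0.03, 0.01, -0.02]     # low-level echo left after linear cancellation
quiet_near_end = [0.05, -0.06, 0.04, -0.05]    # a soft near-end talker during double-talk
loud_near_end = [0.5, -0.4, 0.6, -0.5]
threshold = 0.1

print(center_clip(residual_echo, threshold))   # echo removed
print(center_clip(quiet_near_end, threshold))  # ...but the soft talker is removed too
print(center_clip(loud_near_end, threshold))   # only loud speech survives
```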

Fortemedia's Big Echo Cancellation process first detects and classifies the non-linear echo element by correlating the reference signal from the reference microphone with the main signal from the voice pickup microphone. This correlation-based process can handle very large and severe non-linear echo. Fortemedia’s non-linear process also suppresses the non-linear component according to the degree of non-linearity; therefore, it doesn’t affect full-duplex performance.

Fortemedia's Post Filter utilizes sophisticated suppression factors to eliminate residual echo. Depending on the proportion of the linear and non-linear echo, the Post Filter can achieve an additional 25 to 35dB of echo cancellation with a faster convergence time than other echo cancellers. This allows Fortemedia’s SAM products to remove echo tail and adapt rapidly and unobtrusively to changing acoustic conditions. Users can move about freely during the conversation without degrading communications quality.
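The idea of an adjustable suppression factor can be sketched as a per-frequency-bin gain. The Python below uses a generic spectral-subtraction-style residual echo suppressor, which is a swapped-in textbook technique and not Fortemedia's published design; `suppression_factor`, `floor`, and the example powers are assumptions. A larger suppression factor, as might be chosen when more of the echo is non-linear, attenuates echo-dominated bins harder, while bins without residual echo pass through at unity gain.

```python
def post_filter_gains(error_power, residual_echo_power, suppression_factor=2.0, floor=0.05):
    """Per-bin gains for a residual-echo post filter (spectral-subtraction style).

    error_power         -- power of the AEC output in each frequency bin
    residual_echo_power -- estimated residual echo power in each bin
    suppression_factor  -- values above 1 suppress more aggressively
    floor               -- minimum gain, so near-end speech is never fully muted
    """
    gains = []
    for e, r in zip(error_power, residual_echo_power):
        if e <= 0.0:
            gains.append(floor)                 # empty bin: apply the floor
            continue
        g = 1.0 - suppression_factor * r / e    # remove the echo share of the bin's power
        gains.append(min(1.0, max(floor, g)))   # keep the gain within [floor, 1]
    return gains


error_power = [1.0, 1.0, 1.0]        # AEC output power in three bins (illustrative)
residual_echo = [0.0, 0.2, 0.6]      # estimated residual echo power in the same bins
mild = post_filter_gains(error_power, residual_echo, suppression_factor=1.0)
strong = post_filter_gains(error_power, residual_echo, suppression_factor=2.0)
print(mild)
print(strong)
```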

The Frequency Domain Non-Linear Filter utilizes the difference in acoustic characteristics between two microphones, preferably an omni-directional and a unidirectional microphone, to further identify and suppress specific non-linear echo elements. This mechanism takes advantage of correlated acoustic information from the two microphones and effectively eliminates echo created by a loudspeaker operating in its non-linear range. Compared with other non-linear filters, this approach achieves much better full-duplex performance while suppressing non-linear echo.

The combined SAM Non-Linear Filter automatically adapts to changes in the placement of both the loudspeaker and microphone, as well as to changes in loudspeaker volume. This frees the system integrator from the design constraints required by other echo cancellers and gives end users flexible usage models.

The 65dB AEC performance from Fortemedia (compared to 35dB with conventional acoustic echo cancellers) allows significantly greater acoustic power output. This permits great flexibility in microphone and loudspeaker placement and in volume adjustment. With the AS3NLF technologies, SAM achieves between 25dB and 35dB of side tone reduction.

(2) Channel Control

A traditional AEC relies on “Voice Activity Detection” to decide when to adapt while it converges on the echo. However, Voice Activity Detection is inherently imprecise, so it can be indecisive or even make incorrect decisions during this critical initial phase. As a result, the AEC remains un-converged for a long time and, in extreme cases, never properly adapts. Some echo cancellers force the speakerphone to store the room characteristics after the initial convergence, to compensate for the echo canceller's inability to converge quickly to major acoustical changes. In this scenario, the AEC must undergo a rapid training procedure to learn the interior environment from its un-initialized state. Once trained, it adapts to small acoustical changes, but major changes require retraining. This training usually takes the form of a loud burst of noise or a sequence of tones, which the AEC uses to adapt to the gross acoustical characteristics of its environment. Fortemedia does not use “Voice Activity Detection” in its AEC algorithm; therefore, no initial training is required.

Fortemedia’s Channel Control compensates for inadequacies in the conventional AEC by restricting the rate of change and the amount of change allowed in the adaptive filters. This prevents the AEC from drifting far out of convergence by adapting too rapidly when it is confused by a major disturbance, while still allowing it to track relatively minor changes such as a door opening or the slow movement of a driver in a car. The most complex and difficult task of an effective AEC is reliably determining when to permit its internal acoustic model to adapt to changes in the acoustic character of the car/room. Such changes occur when volume levels are changed, people move about, doors are opened or closed, the loudspeaker or the microphone is moved, etc. Adaptation should only occur in receive mode (for example, when the near end party is silent and the far end party is talking). Inaccurate mode decisions cause the AEC’s internal model to diverge, resulting in echoes that are not effectively canceled. Inaccuracies in this decision process can also cause another significant problem, one that Fortemedia’s Channel Control handles very elegantly: in addition to reducing the effectiveness of the echo cancellation, inaccurate state or mode decisions may introduce artifacts into the transmitted or received speech signals. Words may be clipped or exhibit dropouts. Switch loss or center clipping, which are nonlinear processes, may be applied to a speech signal in the wrong mode, causing chirping or warbling artifacts that can be annoying and distracting. These artifacts can be noticeable in any mode, but particularly during doubletalk (both parties talking simultaneously).
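Restricting both the rate and the amount of change can be expressed as two clamps applied to each proposed filter update. The Python below is a hypothetical sketch of that idea only; Fortemedia does not publish how Channel Control works, and `max_step`, `max_drift`, and the "anchor" snapshot of the last trusted coefficients are assumed names and mechanisms.

```python
def constrained_update(w, proposed, anchor, max_step, max_drift):
    """Apply an adaptive-filter update with rate and amount limits (hypothetical sketch).

    w         -- current taps
    proposed  -- taps the adaptation algorithm (e.g. NLMS) wants to move to
    anchor    -- last trusted, well-converged taps
    max_step  -- largest change allowed per tap per update (limits the rate of change)
    max_drift -- largest deviation allowed from the anchor (limits the amount of change)
    """
    out = []
    for wi, pi, ai in zip(w, proposed, anchor):
        step = max(-max_step, min(max_step, pi - wi))            # rate limit
        new = wi + step
        new = max(ai - max_drift, min(ai + max_drift, new))      # amount limit
        out.append(new)
    return out


w = [0.5, -0.3]
anchor = list(w)
proposed = [2.0, -0.31]   # a confused update wants a huge jump on tap 0, a small one on tap 1
for _ in range(10):
    w = constrained_update(w, proposed, anchor, max_step=0.05, max_drift=0.2)
print(w)
```

Tap 0 creeps toward the wild target by at most 0.05 per update and stops at the anchor plus 0.2, while the small, plausible change on tap 1 goes through immediately.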

AS3NLF and SAM Channel Control create smooth transitions between states in double-talk mode and permit natural, undistorted doubletalk while enabling fast and accurate adaptation.

(3) Small Array Microphone and Cone Shaped Beam-forming

The array microphone has emerged as the most promising technology for suppressing non-stationary noise such as human babble, background music, or passing traffic. By arranging multiple microphones in an array, companies such as Fortemedia, AKG, Knowles, and even Microsoft can further reduce surrounding noise, providing a more natural sounding voice. Using the information that multiple microphones gather about the voice and the surrounding environment, an array microphone can process the signals so that they effectively form a beam that picks up the wanted signal within the beam and cancels noise outside it. Several hands-free car kits using array microphones have already been deployed.

Despite these improvements in noise suppression, the traditional broad array microphone is still impractical and limited in three ways:

  • Requires at least 30mm between microphones, putting placement and space constraints on the end solution.
  • Can only cancel noise on a 2-D plane. This makes it harder to pinpoint the talker and allows noise to leak into the beam: diffuse noise, engine noise, dashboard rattle, and general road noise coming from above and below the pie-shaped beam cause major problems for voice recognition applications.
  • Cannot suppress wind noise (caused by wind blowing on the microphone and applying mechanical pressure to the membrane).

SAM (Small Array Microphone) is the next step in the voice interface market. Requiring only 5 to 10 mm between microphones, three to six times closer than the 30mm needed by broad array microphones, SAM can be deployed in practically any situation or application. SAM processes the voice with a fundamentally different algorithm than the traditional array microphone, effectively forming a 3-D cone-shaped beam. As such, any noise outside the beam, whether above or below, is cancelled out without any leakage. The discussion that follows provides more background on the differences between these two array microphone setups.

(3.1) Traditional Beam-forming

Traditional beam-forming exploits the difference in time delay between signals received at different microphones in the array. The microphones are therefore placed farther apart so that the signals received at each microphone differ sufficiently. The width of a broadside array beam is roughly proportional to the wavelength of the signal divided by the length of the aperture, so at low frequencies (longer wavelengths) the beam is wider than at high frequencies (shorter wavelengths).

Because it must process differences in time delay and capture frequencies between 300Hz and 3.3kHz, the traditional array microphone needs its microphones to be at least 30mm apart. This brings about many limitations.
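A quick calculation shows why time-delay beam-forming pushes microphones apart. The Python sketch below computes the largest possible inter-microphone delay and the corresponding phase difference at the edges of the 300Hz to 3.3kHz band; the 343 m/s speed of sound and the 8kHz sampling rate are assumptions, not figures from the text.

```python
SPEED_OF_SOUND = 343.0   # m/s, air at about 20 degrees C (assumed)
FS = 8000                # Hz, narrowband telephony sampling rate (assumed)


def max_delay_s(spacing_m: float) -> float:
    """Largest arrival-time difference between two mics: source on the line through both."""
    return spacing_m / SPEED_OF_SOUND


def phase_difference_deg(freq_hz: float, spacing_m: float) -> float:
    """Inter-mic phase difference produced by that worst-case delay at one frequency."""
    return 360.0 * freq_hz * max_delay_s(spacing_m)


for spacing_mm in (30, 10, 5):
    d = spacing_mm / 1000
    print(
        f"{spacing_mm:2d} mm: {max_delay_s(d) * 1e6:5.1f} us "
        f"({max_delay_s(d) * FS:.2f} samples at {FS} Hz), "
        f"{phase_difference_deg(300, d):4.1f} deg at 300 Hz, "
        f"{phase_difference_deg(3300, d):5.1f} deg at 3.3 kHz"
    )
```

At 300Hz even a 30mm pair sees under 10 degrees of phase difference, and the whole delay is less than one sample at 8kHz, so shrinking the spacing further leaves a purely delay-based beam-former very little to work with.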

To understand why, please look at figure 1. In this example, the 2 microphones are facing 0º, meaning that the beam center is the y-axis. Now, let’s assume the signal source at point A is playing at the same dB level as the signal source at point B, and that points A and B are the same distance from the center of the array. In this case, the signal from source A will be suppressed because the array microphone can clearly detect that source A is outside the beam (the time delay to Mic 1 is much longer than the time delay to Mic 2). However, the signal from source B will not be suppressed, because to the traditional array microphone, source B is effectively in the middle of the beam: its time delay to Mic 1 is exactly the same as to Mic 2. This limitation applies to every plane throughout the z-axis, as well as directly behind the array (180 degrees). Thus, the traditional array microphone can only effectively suppress noise in a 2-D manner (in our example, only noise on the xy-plane is canceled). Please refer to figure 3 for the effective beam.
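The blind spot described above can be checked numerically. The Python sketch below places a 30mm broadside pair on the x-axis with the beam pointing along the y-axis, as in the example, and computes the time difference of arrival (TDOA) for several sources 1m away. Figure 1 is not reproduced here, so the source positions, including placing a source like B directly above the array, are assumptions.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)


def tdoa_us(source, mic1, mic2, c=C):
    """Arrival time at mic1 minus arrival time at mic2, in microseconds."""
    return (math.dist(source, mic1) - math.dist(source, mic2)) / c * 1e6


d = 0.03                                          # 30 mm broadside pair on the x-axis
mic1, mic2 = (-d / 2, 0.0, 0.0), (d / 2, 0.0, 0.0)
r = 1.0                                           # every source 1 m from the array center
theta = math.radians(60)
sources = {
    "on beam axis (+y)": (0.0, r, 0.0),
    "A: 60 deg off axis, xy-plane": (r * math.sin(theta), r * math.cos(theta), 0.0),
    "B: directly above (+z)": (0.0, 0.0, r),
    "directly behind (-y)": (0.0, -r, 0.0),
}
for name, pos in sources.items():
    print(f"{name:30s} TDOA = {tdoa_us(pos, mic1, mic2):+6.1f} us")
```

Only source A produces a non-zero delay; the on-axis source, the source above, and the source behind all give exactly zero, so a purely delay-based pair cannot tell them apart.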


Figure 1. Traditional Array Microphone Setup

Another major disadvantage of the traditional approach is wind noise suppression. Since wind blows in a random pattern and spans various frequencies, it is very difficult to differentiate between voice and wind noise in the car given the long distance between the array microphones. Fortemedia’s SAM technology is ideal for wind noise suppression. The following paragraphs explain how this unique SAM feature suppresses wind noise.

(3.2) SAM (Small Array Microphone) Beam-forming

SAM beam-forming technology is unlike traditional setups. It uses 2 omni-directional microphones, or 1 uni-directional microphone and 1 omni-directional microphone. Since these 2 microphones can be placed very close to each other, the information arriving at both microphones is highly correlated (virtually the same). Consequently, the beam-forming capability relies on the intelligence of Fortemedia’s algorithm to decipher the commonality of the information as opposed to the difference.

Since SAM’s elements can be placed virtually right next to each other, the effective beam is a 3-D cone shaped beam with its vertex right in the middle of the 2 microphones. This has many advantages compared to the traditional array microphone. To understand the advantages, please refer to Figure 2. In this example, the setup is exactly the same as Figure 1, except the receiving device is a small array microphone instead of the traditional broad array microphone. For SAM, the signals from source A and source B are exactly the same (in this case, both outside the beam). This applies throughout the y axis, forming a 3-D cone-shaped beam. Noise above, below, and behind the beam is effectively suppressed. Please see Figure 3 for the effective beam.
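Fortemedia's SAM algorithm is proprietary, so it cannot be reproduced here, but a textbook first-order differential (delay-and-subtract) pair of closely spaced omni microphones shows why a small array naturally yields a beam that is symmetric around the microphone axis. In the Python sketch below, which is a swapped-in standard technique and not SAM itself, the pair lies along the y-axis and the far-field response depends only on the angle between the source direction and that axis. Source A (60° off axis in the xy-plane) and source B (60° off axis in the yz-plane) therefore receive identical gain. The 10mm spacing, 1kHz test frequency, and 343 m/s speed of sound are assumptions.

```python
import cmath
import math

C = 343.0  # speed of sound, m/s (assumed)


def differential_response(direction, freq_hz, spacing_m, c=C):
    """Magnitude response of a delay-and-subtract pair of omni mics.

    The mics sit on the y-axis at +spacing/2 (front) and -spacing/2 (rear). The rear
    signal is delayed by the acoustic travel time spacing/c and subtracted from the
    front signal, giving a cardioid that points along +y.
    direction -- unit vector from the array toward a far-field source.
    """
    tau = spacing_m / c                       # internal electronic delay
    cos_theta = direction[1]                  # cosine of the angle to the +y axis
    omega = 2 * math.pi * freq_hz
    # Plane wave: the front mic leads the array center by (d/2)cos(theta)/c, the rear lags it.
    front = cmath.exp(1j * omega * (spacing_m / 2) * cos_theta / c)
    rear = cmath.exp(-1j * omega * (spacing_m / 2) * cos_theta / c)
    return abs(front - rear * cmath.exp(-1j * omega * tau))


def direction(theta_deg, phi_deg):
    """Unit vector at angle theta from the +y axis, rotated by phi around that axis."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.cos(t), math.sin(t) * math.sin(p))


SPACING = 0.01
FREQ = 1000.0
on_axis = differential_response(direction(0, 0), FREQ, SPACING)
for label, theta, phi in [("A (xy-plane)", 60, 0), ("B (yz-plane)", 60, 90),
                          ("side", 90, 0), ("rear", 180, 0)]:
    gain = differential_response(direction(theta, phi), FREQ, SPACING) / on_axis
    print(f"{label:12s} theta={theta:3d} deg  relative gain={gain:.3f}")
```

Because every direction at the same angle from the axis gets the same gain, the pickup region is a cone around the axis rather than a flat pie slice.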


Figure 2. SAM Setup

Figure 3. Beam Comparison

With Fortemedia’s patent pending SAM technology, the distance between the 2 microphones may be shortened to a few millimeters. SAM supports two types of array microphone configurations: omni-omni and uni-omni. The resulting unique cone-shaped beam is able to suppress noise directly above and below the microphone array (see figure 3).

Figure 4. Beam-forming Effect of Small Array Microphones


Figure 5. SAM Polar Pattern

In the uni-omni configuration, SAM processes acoustic information based on the differing characteristics of the unidirectional (main) microphone and the omni-directional (reference) microphone. It achieves an exceptional beam-forming effect (see figure 4) that suppresses non-stationary noise by up to 25 dB and enhances voice quality, while still maintaining a very small form factor. This configuration, however, does not support wind noise suppression, because the phase characteristics of uni-directional and omni-directional microphones differ greatly.

(3.3) SAM (Small Array Microphone) Wind Noise Suppression

In addition to beam-forming, SAM provides unique wind noise suppression in the omni-omni configuration. Since the two omni-directional microphones are only 1cm apart (assuming they have the same phase characteristics), SAM processes the “similarity” of the two signals to form the beam. In the process of extracting the similar acoustic and phase information from both microphones, the effect of wind blowing on the microphones is easily filtered out. This is due to the difference between the speed of sound and the speed of the wind (about 330 m/s versus, for example, 10 m/s for a 36km/h wind). The pressure that the wind applies to each microphone is totally uncorrelated, whereas the phase characteristics created by voice are highly correlated. Therefore, SAM can easily differentiate these two effects with two omni-directional microphones only 1cm apart! SAM achieves 25dB of wind noise suppression without much voice distortion through a very simple yet elegant approach.
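The physical premise, that voice arrives almost identically at two closely spaced microphones while wind buffeting does not, can be illustrated by measuring the correlation between the two microphone signals. The Python sketch below only illustrates that premise and is not SAM's wind noise algorithm: the two-tone "voice", the modeling of wind as independent Gaussian noise at each membrane, and the 8kHz sampling rate are simplifying assumptions.

```python
import math
import random

SPEED_OF_SOUND = 330.0   # m/s, the figure quoted in the text
SPACING = 0.01           # 1 cm between the two omni mics
FS = 8000                # Hz sampling rate (assumed)
N = 4000                 # 0.5 s of signal


def correlation(a, b):
    """Zero-lag normalized cross-correlation (Pearson coefficient) of two signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def voice(t):
    """Stand-in for a voiced sound: two tones (illustrative only)."""
    return math.sin(2 * math.pi * 440 * t) + 0.5 * math.sin(2 * math.pi * 1250 * t)


# Sound crosses the 1 cm gap in about 30 us, so both mics see nearly the same waveform.
tau = SPACING / SPEED_OF_SOUND
mic1_voice = [voice(k / FS) for k in range(N)]
mic2_voice = [voice(k / FS - tau) for k in range(N)]

# Wind turbulence presses on each membrane independently; modeled as independent noise.
rng = random.Random(7)
mic1_wind = [rng.gauss(0.0, 1.0) for _ in range(N)]
mic2_wind = [rng.gauss(0.0, 1.0) for _ in range(N)]

wind_speed = 36 * 1000 / 3600   # 36 km/h expressed in m/s
print(f"wind speed: {wind_speed:.0f} m/s vs. sound: {SPEED_OF_SOUND:.0f} m/s")
print(f"voice correlation between mics: {correlation(mic1_voice, mic2_voice):.3f}")
print(f"wind correlation between mics:  {correlation(mic1_wind, mic2_wind):+.3f}")
```

A processor that keeps only what the two signals have in common therefore retains the voice and discards most of the wind.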

Fortemedia systems work best with 2 microphones, but they can also work with just one. In a small array microphone configuration, the 2 microphones form a conical beam by pairing an omni-directional microphone with either a unidirectional microphone or another omni-directional microphone.

(4) Non-Artificial Noise Smoother

A CDMA handset has its own noise suppression, which:

  • suppresses the background noise, but requires a long convergence time, and
  • cuts off the background completely when there is no voice from the near end.

The problem occurs when there is a hands-free car kit at the near end: noise is suppressed twice, first by the hands-free car kit and then by the CDMA handset. When the near end is in a noisy environment, the far end hears unstable noise coupled with the near end’s distorted voice, due to the long convergence time of the CDMA handset’s noise suppression. This effect reduces intelligibility and degrades the quality of communication. A noise smoother typically adds some noise back into the background to shorten the convergence time of the handset noise suppression, thereby improving communication quality. However, conventional noise smoothers generate artificial noise, which introduces unnatural background sounds and creates a very unpleasant user experience.

Fortemedia’s Non-Artificial Noise Smoother uses the actual environmental noise as its base, as opposed to an artificially generated one. In hands-free applications, the noise level can be adjusted to suit the handset and get the most out of its comfort noise generator.
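The idea of reusing real background noise rather than synthesizing it can be sketched in a few lines. The Python below is a hypothetical illustration, not Fortemedia's implementation: it assumes a speech/no-speech flag is already available for every frame, and the `NoiseSmoother` class, its `level` parameter, and the frame-replay strategy are invented for this example.

```python
from collections import deque


class NoiseSmoother:
    """Hypothetical sketch: mix real recorded background noise back into the output.

    Frames captured while nobody at the near end is talking are stored and replayed
    at an adjustable level, so the far end (and the handset's own noise suppressor)
    hears a steady, natural background instead of silence or synthetic noise.
    """

    def __init__(self, level=0.5, history=8):
        self.level = level                          # adjustable noise level (0 = off)
        self.noise_frames = deque(maxlen=history)   # most recent noise-only frames
        self._index = 0

    def process(self, raw_frame, suppressed_frame, speech_present):
        """raw_frame: unprocessed mic samples; suppressed_frame: noise-suppressed output."""
        if not speech_present:
            self.noise_frames.append(list(raw_frame))   # learn the actual environment noise
            noise = raw_frame
        elif self.noise_frames:
            # Cycle through stored noise so the background stays continuous under speech.
            noise = self.noise_frames[self._index % len(self.noise_frames)]
            self._index += 1
        else:
            return list(suppressed_frame)               # nothing learned yet
        return [s + self.level * n for s, n in zip(suppressed_frame, noise)]


smoother = NoiseSmoother(level=0.5)
quiet = smoother.process([0.2, -0.2], [0.0, 0.0], speech_present=False)
talking = smoother.process([1.3, -1.3], [1.0, -1.0], speech_present=True)
print(quiet, talking)
```

Lowering `level` trades a quieter background against a less stable noise floor for the handset's own suppressor.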

Conclusion

Combining AS3NLF, powerful Channel Control and small array microphone Beam-forming, Fortemedia’s SAM delivers

(1) a superior AEC performance of up to 65dB (including non-linear echo cancellation),
(2) 25dB of non-stationary noise suppression, and
(3) 25dB of wind noise suppression.

Fortemedia then embodies these technologies in a very small single chip. Integrated with DSP, memory, CODEC, and hardware accelerator, Fortemedia’s low-power solutions are the ideal single-chip solutions for hands-free communication applications such as hands-free car kits, speakerphones, mobile devices, and other embedded communications and voice input devices.

© 1997-2006 Fortemedia, Inc. All Rights Reserved.