a novel frequency domain bwe with relaxed synchronization and associated bwe switching lei miao,...
DESCRIPTION
3GPP EVS codec overview 3GPP Enhanced Voice Services (EVS) codec standardized in Sep Significantly improves user experience EVS codec encodes the signal based on the signal content: Time-Domain LP coding techniques are used: ACELP and GSC GSC (Generic audio Signal Coding) is used to improve the quality of music/mixed segments in LP domain. Frequency-Domain coding techniquesTRANSCRIPT
A Novel Frequency Domain BWE with Relaxed Synchronization
and Associated BWE Switching
Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu,
Jon GibbsHuawei Technologies Co. Ltd
Beijing, China
Kihyun Choo, Eunmi OhSamsung Electronics Co., Ltd.,
Seoul, Korea
Václav Eksler VoiceAge Corp.,
Montreal, QC, Canada
Agenda 3GPP EVS codec overview
Prior-art review of BWE techniques
Multi-mode FD BWE
Multi-Mode FD BWE with Relaxed Synchronization
BWE Switching Mechanism
Quality Evaluation
Conclusions
3GPP EVS codec overview
3GPP Enhanced Voice Services (EVS) codec standardized in Sep. 2014. Significantly improves user experience
EVS codec encodes the signal based on the signal content: Time-Domain LP coding techniques are used: ACELP and GSC
GSC (Generic audio Signal Coding) is used to improve the quality of music/mixed segments in LP domain.
Frequency-Domain coding techniques
Prior-art Review of BWE techniques
BWE exploits the intrinsic correlation between the low and high frequency parts of a signal’s spectrum in order to reconstruct the high frequency part. Spectral Band Replication (SBR) in MPEG-4 HE‑AAC. A multi-mode bandwidth extension scheme in Recommendations
ITU-T G.711.1 Annex D and G.722 Annex B. SWB is a key feature of EVS as a new 3GPP codec, important to
extend the bandwidth from WB to SWB/FB. Time domain (TD) or Frequency domain (FD) BWE for different types
of input signals. The paper focuses on FD-BWE on top of either ACELP or GSC. This switched bandwidth extension (BWE) approach improves the
EVS codec LP based coding efficiency.
Multi-mode FD BWE Concept standardized for the first time in Recommendations ITU-
T G.711.1 Annex D and G.722 Annex B. A transient detector identifies rapid variations of the high band
signal over time. 4 classes: TRANSIENT (TS), HARMONIC (HM), NORMAL (NM) or
NOISE (NS). A combination of adaptive spectral envelope and time
envelope coding, derived from the high band signal. TRANSIENT frames: four spectral envelopes and four time
envelopes. Non-TRANSIENT frames: fourteen spectral envelopes and no time
envelope. The high frequency band excitation is generated by either
normalizing the selected region of the low frequency band with an adaptive normalization length or generated by random noise.
The design constraints imposed upon the EVS codec specify that the total algorithmic delay of the codec must not exceed 32 ms.
Transform used by FD-BWE: an Asymmetric Low Delay Optimized (ALDO) window with a time support of 40 ms while the non-zero window length is 28.75 ms.
Still insufficient delay allowance remaining for the FD BWE to achieve the overall 32 ms delay requirement.
Multi-mode BWE in ITU-T G.711.1 Annex D and G.722 Annex B as a baseline. Then the new low delay FD-BWE approach relies on a relaxed synchronization scheme, based on the multi-mode BWE in ITU-T codecs.
FD BWE constraints in EVS
The relaxed synchronization of multi-mode FD BWE is achieved by utilizing the time difference between the high frequency band excitation and the high frequency band envelope.
Assume D1 is the delay of the low frequency band coding, D2 is the delay of high frequency band coding where D2 is introduced by the windowing prior to the MDCT.
The total algorithmic delay of the prior bandwidth extension algorithms is (D1+ D2). By the proposed scheme, the delay may be adaptively reduced to the range [max(D1, D2), (D1+D2)] with minimal impact on the perceptual quality. When the spectrum is relatively stable, the phase of the high frequency
band signal is of relatively minor perceptual significance when compared to its energy.
By reducing the delay, the time alignment between the energy envelope of the low and high frequency band signals is maintained while a short time misalignment (or a relaxation) of the synchronization between the high frequency band excitation and envelopes is permitted.
Relaxed synchronization
An asymmetric low delay optimized (ALDO) window is used for FD BWE.
Assumes that the target overall delay is Dt, which is in the range [max(D1, D2), (D1+D2)]. The input time domain signal may be delayed by (Dt−D2).
frame m+1 frame m-1 frame m
Dt – D2
Encoder
frame m
frame m
frame m
Delayed low frequency band signal frame m
High frequency band excitation
High frequency band spectral
envelope frame m Dt-D2
Dt
Decoder
Input signal
D1+D2
Decoded high frequency band signal
Dt
frame m Decoded low
frequency band signalD1
The time misalignment between the high frequency band excitation and the high frequency band envelope is {(D1+D2)−Dt}. Consequently a lower delay than (D1+D2) is achieved by the proposed FD BWE.
Time alignment
The time envelope is calculated on top of the delayed high frequency band time domain signal:
It is then adjusted by an attenuation factor R, which represents the energy attenuation of the low frequency band due to the LP based low band coding:
Finally the time envelope is adjusted:
Spectral envelopes: multi-stage split VQ. The envelopes at even positions are quantized by Split VQs. The prediction errors at odd positions are calculated withinterpolation and quantized by another stage of the Split VQ.
.3,,0,))80((801)( 279
0
jnjsjtn hbrms
,))(())((1
021
02
LFLF N
n oriN
n syn nsnsR
,3,,0,,otherwise)(1ifelse)(
,5.0if)(5.1)(
j
jtRRjt
RjtRjt
rms
rms
rms
rms
Transient frames
Non-Transient frames Energy control in each sub-band to prevent unpleasant distortion. The distortion may occur due to the un-matched characteristics of the original
and the generated spectra. Energy control adjusts the energies depending on the comparison of the
tonalities of the two spectra to avoid the distortion. Adaptive normalization length to generate the high frequency band
excitation The more harmonic the high frequency band is, the longer normalization length. Depends on the number of the sub-bands of the low frequency band whose
peak to average ratio is larger than a threshold, nh,
A pre-echo reduction is performed to improve the performance of fricatives for Non-Transient frames.
,if24,232max
,if5.08,if25.04
HMmodenNSorNMmoden
TSmodenL
h
h
h
Other techniques
The proposed relaxed synchronization FD BWE scheme is applied in the EVS codec for WB at 13.2 kbps and for SWB at 13.2 kbps and 32 kbps.
The BWE bit budget for SWB is 31 bits, while it is 6 bits for WB. For WB coding, there are fewer spectral envelopes and no time
envelopes encoded since the WB FD BWE covers only frequencies from 6~8 kHz.
The delay parameters correspond in the EVS codec to Dt = 12 ms (overall delay constraint minus the frame length), D1 = 9.6875 ms (encoder look-ahead plus encoder resampling) D2 = 8.75 ms (overlap length).
This results in the time misalignment between the high frequency band excitation and the high frequency band envelope being 6.4375 ms.
EVS implementation details
BWE Switching Mechanism
In general, TD BWE on top of ACELP performs well when encoding active speech segments and FD BWE on top of GSC performs well when encoding inactive and mixed/music segments. However, some mixed/music segments are better coded with ACELP coding and FD BWE.
If the input signal is classified as a music signal, or the low frequency band signal is classified as inactive, multi-mode FD BWE is used irrespective of whether the low frequency band is coded with ACELP or GSC. Otherwise, if the input signal is classified as a speech signal, TD BWE is used no matter how the low frequency band has been coded.
When the high frequency band signal is judged to contain inactive or mixed/music content signals then FD BWE is used as the high band coding technology.
ACELP coding
TD-BWE
bandwidth
GSC coding
0 Technology
FD-BWE FD-BWE
bandwidth
0 Technology
bandwidth
0 Technology
TD-BWE
FD-BWE TD-BWE TD-BWEFD-BWETD-BWE
GSC coding
ACELP coding
GSC coding
GSC coding
ACELP coding
ACELP coding
Quality Evaluation MUSHRA, 95% confidence intervals, 16 expert listeners, 16
mixed content items and 16 music items. Two variants were evaluated:
Low delay (LD) FD BWE configured to have overall delay Dt = 12 ms.
High delay (HD) FD BWE configured to have overall delay of Dt = (D1+D2) = 18.4375 ms.
EVS SWB at 13.2 kbps.
Low delay FD BWE is statistically equivalent to the high delay FD BWE.
Conclusions A novel multi-mode FD BWE scheme with relaxed
synchronization optimized for inactive and mixed/music content signals is presented.
It forms a part of the LP based coding of the 3GPP EVS codec.
High subjective quality and low algorithmic delay are achieved by relaxing the time alignment constraints between the high frequency band excitation and its envelope.
A switching mechanism between two different BWE technologies shows a performance advantage.
Thank you!