
Examensarbete 15 hp (Degree project, 15 credits)
October 2017

Development of a microcontroller based DLCT end-point device

Adam Myrén, Simon von Schmalensee


Faculty of Science and Technology, UTH unit. Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0. Postal address: Box 536, 751 21 Uppsala. Telephone: 018 – 471 30 03. Telefax: 018 – 471 30 00. Website: http://www.teknat.uu.se/student

Abstract

Development of a microcontroller based DLCT end-point device

Adam Myrén, Simon von Schmalensee

In this bachelor thesis, the possibility of implementing a fully functioning Digital Signal Processing system based on the ARM Cortex-M7 microcontroller from STMicroelectronics is investigated, and such a system is implemented. The microcontroller is equipped with a Floating Point Unit, which allowed the filter calculations to be performed in floating-point arithmetic instead of fixed-point. The system is intended to be used for audio room correction, with filter coefficients calculated by DLCT (Dirac Live Calibration Tool), software distributed by the company Dirac Research. The main system components are a run-time where the audio is processed and a TCP/IP server for communication over Ethernet between the system and DLCT. The system is also able to play stimuli sounds on command from DLCT.

The final system is capable of executing the filter calculations required for room correction with the filter topology used. The communication between DLCT and the subsystem was not fully established, but the TCP/IP server was implemented and is a good foundation if the project is resumed in the future.

The work showed that a modern microcontroller is able to perform real-time audio signal processing without the use of a dedicated digital signal processor, which is more expensive and has a higher development cost.

ISSN: 1654-7616, 17 010 oktober
Examiner: Hana Barankova
Subject reader: Ladislav Bardos
Supervisor: Patrik Berglund


Acknowledgements

The authors wish to express sincere appreciation to our supervisor Patrik Berglund at Dirac Research for the massive support throughout the project. We would also like to thank Professor Mikael Sternad from the Department of Signals and Systems for all his guidance. And a special thank you to Kjell Staffas for making the project possible in the first place.


Contents

1 Introduction
  1.1 Background
  1.2 Purpose
  1.3 Goal

2 Theory
  2.1 Binary representation of numbers
    2.1.1 Fixed point vs floating point in DSP applications
  2.2 Sampling
  2.3 Reconstruction/Interpolation
  2.4 Digital Filters
    2.4.1 FIR filter
    2.4.2 IIR filter
  2.5 Microcontroller
  2.6 Communication protocols
    2.6.1 UART
    2.6.2 I2S
    2.6.3 TCP/IP
  2.7 DMA
  2.8 Room correction
    2.8.1 System requirements for room correction
  2.9 DLCT

3 Software structure
  3.1 Sections
  3.2 Audio loop section
    3.2.1 Collection of input data
    3.2.2 Signal Processing
    3.2.3 Outputting processed samples
  3.3 Stimuli playback
  3.4 Ethernet Communication
  3.5 GUI

4 Results and Discussion
  4.1 Signal Processing
  4.2 Software structure
  4.3 Storage
  4.4 Communication

5 Conclusion


Abbreviations & Explanations

ADC Analog-to-Digital Converter

API Application Programming Interface

AVR Audio/Video Receiver

BSP Board Support Package

CMSIS Cortex Microcontroller Software Interface Standard

CODEC Coder/Decoder

DAC Digital-to-Analog Converter

DLCT Dirac Live Calibration Tool

DSP Digital Signal Processing

FATFS File Allocation Table File System

FFT Fast Fourier Transform

FIR Finite Impulse Response

FPU Floating Point Unit

GUI Graphical User Interface

I2S Inter-IC sound

IFFT Inverse Fast Fourier Transform

IIR Infinite Impulse Response

ISR Interrupt service routine

LTI Linear Time Invariant

LWIP Lightweight Internet Protocol

MAC Media Access Control

MCU Microcontroller Unit

PHY Physical layer

RPC Remote Procedure Call

UART Universal asynchronous receiver/transmitter

int8/16/32 Signed 8/16/32-bit integer


1 Introduction

1.1 Background

Since sound systems entered the consumer market, technologies have been used to optimize the sound with respect to the acoustics of the environment. Earlier, this was only possible by constructing different types of analog filters, specified for a particular position in a room. With the development of the modern computer and the ability to digitize sound and process it digitally, new methods based on digital signal processing have been developed. Room correction can thus be done with software that creates tailored digital filters, and the same software can be used for different speakers and different environments. Dedicated signal processors (DSPs) have usually been used for this signal processing. Because systems based on DSPs usually also require an MCU for communication and interfacing with external devices, it would be desirable to perform the signal processing with the MCU alone. At present, there are several MCUs suitable for signal processing, which makes this possible. Another reason is that the development costs for MCU applications are lower than the equivalent costs for DSP applications.

Dirac Research is a company which develops algorithms and software for audio optimization of speaker systems. One of their products is room correction software that is used to minimize the speaker's and room's coloring of the sound. The frequency response of the room and speakers is measured at different locations, and the measurements are then combined to create filters that compensate for this coloring. At the moment, the music must be played through the same computer on which the program is installed. To reduce this limitation, it is desirable to be able to play music from other devices and easily switch between them. This can be achieved by implementing the filter on a stand-alone platform and using the computer only for measuring the room acoustics and calculating the filter coefficients.

1.2 Purpose

The main purpose of this bachelor thesis is to investigate the possibility of implementing a DSP system, powerful enough to be used for high-end real-time room correction, on a small embedded platform based on an ARM Cortex-M7 microcontroller with a limited amount of processing power and memory capacity.

1.3 Goal

The goal of this project is to implement a system for real-time audio signal processing on an STM32F746NG microcontroller. The filter coefficients used in the filter should be calculated by DLCT. The system and the computer where DLCT is installed should communicate over Ethernet for transmitting filter coefficients as well as commands to start and stop stimuli. The system should also be able to store the received filter coefficients on a µSD-card and load them into RAM (Random Access Memory) at each start-up.


2 Theory

2.1 Binary representation of numbers

Digital electronics uses a base-2 number system called binary. A number is represented as a sequence of 1's and 0's, where each individual element is called a bit. The sequence by itself does not represent a unique number; its value depends on how the computer interprets it. There are two common ways to represent real numbers in a binary format, called floating point and fixed point. A floating point number consists of three parts: the sign, the exponent and the mantissa. The standard definition of a 32-bit floating point number is given by IEEE 754. A floating point number can be written as

$(-1)^s \cdot 2^{e-127} \cdot \left(1 + \frac{m}{2^{23}}\right) \qquad (1)$

where $s$ represents the sign bit, $e$ the exponent and $m$ the mantissa [1].

Figure 1: Structure of the 32-bit floating point representation

Figure 1 shows how a floating point number is structured in a computer. This way of representing numbers makes it possible to have a wide dynamic range. With fixed point representation, one uses an integer which is scaled by a specific factor. The scaling factor cannot be changed during computation, which means that, unlike a floating point representation, a given fixed point data type always has the same number of digits after the radix point. A disadvantage of fixed point representation is that every part of a computer program which processes a given fixed point type must keep track of where in the binary number the radix point is located.

2.1.1 Fixed point vs floating point in DSP applications

DSP applications use either floating point or fixed point representations. Floating point representation gives a larger dynamic range, and it is usually easier to write algorithms for floating point data types than for fixed point [2]. However, the increase in precision requires additional storage and significantly more computing power than fixed point. Some processors are equipped with dedicated FPUs, which take care of the floating point computations in hardware and speed up processing. Floating point hardware is more complex, requires more power and has a higher production cost than fixed point hardware [2].
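To make the bookkeeping concrete, the following minimal C sketch (illustrative only, not taken from the system described in this thesis) contrasts a Q15 fixed-point multiply, where the programmer must track the radix point manually, with the equivalent floating point multiply:

```c
#include <stdint.h>
#include <stdio.h>

/* Q15 fixed point: an int16 interpreted as a value scaled by 2^-15,
   representing the range [-1.0, 1.0). */
typedef int16_t q15_t;

static q15_t float_to_q15(float x)  { return (q15_t)(x * 32768.0f); }
static float  q15_to_float(q15_t x) { return (float)x / 32768.0f; }

/* Fixed-point multiply: the 32-bit product carries 30 fractional bits,
   so it must be shifted right by 15 to restore the Q15 radix point. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    return (q15_t)(((int32_t)a * (int32_t)b) >> 15);
}

int main(void)
{
    q15_t a = float_to_q15(0.5f), b = float_to_q15(0.25f);
    printf("fixed: %f\n", q15_to_float(q15_mul(a, b))); /* prints 0.125000 */
    printf("float: %f\n", 0.5f * 0.25f);                /* prints 0.125000 */
    return 0;
}
```

With a hardware FPU, as on the Cortex-M7 used in this project, the floating point multiply is handled in hardware, which is what makes the float-based approach viable.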


2.2 Sampling

To be able to represent an analog signal in a computer, the signal must go through a process called sampling. Sampling is done by taking measurements of a signal at distinct time points, as shown in figure 2. It is common for the time interval to be uniform throughout the process. If the signal can be reconstructed from the samples without any errors, the process is considered well performed. An important theorem regarding sampling is the Nyquist theorem, which states the condition under which a sampled signal can be reconstructed without errors. It states that for a signal x(t) where

$\mathcal{F}\{x(t)\} = X(\omega) = 0 \quad \text{for } |\omega| > \omega_m \qquad (2)$

x(t) can be reconstructed from its samples x[nT] if the sampling frequency $\omega_s > 2\omega_m$. To reconstruct a signal without errors, a sampling frequency of at least twice the largest frequency component occurring in the sampled signal must be used. This rate is usually referred to as the Nyquist rate. If a signal is sampled with a sampling frequency lower than the Nyquist rate, a phenomenon called aliasing will occur, which makes perfect reconstruction impossible [3].

Figure 2: Figure showing the analog signal and its digital representation with a sampling period of $S_t$

A human can hear sound in the frequency range 20 Hz – 20 kHz. When sampling an arbitrary sound signal, one therefore has to use a sampling frequency of at least 40 kHz to be sure that the signal can be perfectly reconstructed. In audio applications, the common standard is to use a sampling frequency of 44.1 kHz or 48 kHz.


2.3 Reconstruction/Interpolation

A continuous signal x(t), bandlimited to a frequency F, which has been sampled with a sampling frequency of 2F can, according to the Nyquist theorem, theoretically be reconstructed without error by passing the sampled signal through an ideal reconstruction filter, which in the frequency domain has the following form [3]

$H(2\pi f) = \begin{cases} A & \text{if } |f| < F \\ 0 & \text{otherwise} \end{cases} \qquad (3)$

This theoretical method of ideal reconstruction cannot be implemented in a real system, because the filter has an infinite impulse response and is not causal. An easy and common way to implement an interpolation method in a real system is to use the ZOH (zero-order hold) method [3]. The output signal x(t) from an interpolator using the ZOH model is given by

$x(t) = \sum_{n=-\infty}^{\infty} x[n] \cdot \mathrm{rect}\!\left(\frac{t - \frac{T}{2} - nT}{T}\right) \qquad (4)$

Equation 4 shows that the output signal is obtained by convolving each sample with a causal rectangular pulse.

2.4 Digital Filters

LTI (Linear Time Invariant) filters are often characterized by their magnitude and phase response in the frequency domain and their impulse response in the time domain. All of this information can be derived from the filter's transfer function H(z). The magnitude and phase of the output are related to those of the transfer function and the input by

$|Y(z)| = |H(z)||X(z)| \qquad (5)$

and

$\angle Y(z) = \angle H(z) + \angle X(z) \qquad (6)$

A discrete filter's impulse response can be obtained by taking the inverse z-transform of the transfer function. The two main types of digital filters are FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) filters. These are described in the following two sections.

2.4.1 FIR filter

The main advantages of FIR filters are [4]:

1. They are always stable.

2. They are less sensitive than IIR filters to round-off errors due to finite-precision arithmetic.

3. They can be constructed to have linear phase.


These advantages come with the disadvantage that FIR filters require a higher order than IIR filters to achieve a given roll-off. A block diagram of a FIR filter structure can be seen in figure 3, where the $z^{-1}$ blocks represent a delay by one sample and the b's are the individual samples of the impulse response, or filter coefficients. To compute an output sample from the filter, the input samples are shifted in from the left, multiplied by the respective impulse response samples and added to the output. In this example the output would be calculated as

$y[n] = x[n]h[0] + x[n-1]h[1] + x[n-2]h[2] + \ldots + x[0]h[n] \qquad (7)$

where $h[0] = b_0$ and so on. This calculation is the discrete convolution of the input signal x[n] with the impulse response h[n], and convolution is the standard method for calculating the output of a FIR filter. The discrete linear convolution sum can be described mathematically by

$y[n] = x[n] * h[n] = \sum_{k=0}^{n} x[k]\,h[n-k] \qquad (8)$

where n is the number of taps in the impulse response, and k starts at zero because the filter is causal.

Figure 3: Block diagram of a FIR filter
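As a concrete illustration of equation (8), a direct-form FIR routine in C could look as follows (a minimal sketch, not the implementation used in this project; the CMSIS-DSP library mentioned later provides the equivalent arm_fir_f32 function):

```c
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k], equation (8).
   Inputs before the start of the buffer are treated as zero. */
void fir_filter(const float *x, float *y, size_t len,
                const float *h, size_t taps)
{
    for (size_t n = 0; n < len; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps && k <= n; k++)
            acc += h[k] * x[n - k];
        y[n] = acc;
    }
}
```

The nested loop makes the cost explicit: every output sample requires one multiply-accumulate per tap, which is the quadratic complexity that motivates the FFT convolution discussed below.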

Since FIR filters have a finite impulse response and the output only depends on a finite number of previous inputs, they are always stable. This can be shown with the BIBO (bounded-input, bounded-output) stability condition, which states that a system is BIBO stable if the impulse response h[n] of the system satisfies

$\sum_{n=0}^{\infty} |h[n]| < \infty \qquad (9)$

Since the impulse response always contains a finite number of elements, the resulting sum will be finite and the system will always be stable.


Since FIR filters are frequently used but do not have great computational efficiency, different convolution methods exist. The convolution theorem states that convolution in one domain corresponds to point-wise multiplication in the other domain. Linear convolution in the time domain can be achieved either with the straightforward linear convolution sum or with FFT (Fast Fourier Transform) convolution. The linear convolution sum produces a linear convolution of the signal and the filter kernel, which is what is desired when convolving real-time or very long, non-periodic signals, where the longer signal needs to be divided into smaller blocks.

The convolution sum has a quadratic computational complexity of approximately O(n²) and takes a lot of computational power when convolving with large filters. With FFT convolution this computational complexity can be reduced to O(n log n) [5]. Because of this improvement, FFT convolution is often used when convolving with large filter kernels and dealing with long signals.

The FFT convolution produces circular convolution [4], which is not desired when the signal to be processed is divided into smaller blocks. To achieve linear convolution, both the input signal and the filter kernel must be zero-padded to the FFT length before the FFT is computed. If a filter h[n] of length M and a signal block size of L is used, the FFT size N must satisfy N ≥ M + L − 1. After the signal and filter kernel have been zero-padded and their FFTs computed, they are multiplied together point-wise. The IFFT (Inverse Fast Fourier Transform) is then computed to bring the signal back to the time domain. This results in a raw output that is M + L − 1 samples long when the input block was L samples long. To compose the final output signal, the last M − 1 samples from the previously processed block are saved and added to the beginning of the current output block. This is called overlap-add and is a common method to produce linear convolution from circular convolution. Figure 4 shows the steps involved in the time-domain part of the overlap-add process.


Figure 4: Graphical interpretation of the overlap-add method [6]

The steps involved in the overlap-add method are:

1. Take L samples of the incoming signal x[n] and zero-pad them to a length of N samples. Zero-pad the filter kernel h[n] to the same length.

2. Compute an N-point FFT of the zero-padded blocks to produce the frequency-domain X[k] and H[k].

3. Multiply X[k] and H[k] point-wise to get the frequency-domain output Y[k] = X[k]H[k].

4. Take the IFFT of Y[k] to get the raw output y*[n].

5. Take the last M − 1 samples saved from the previous output block and add them to the first samples of y*[n] to get y[n]. Save the last M − 1 samples of the current output block to be added to the next output block.

For computational efficiency, the FFT of the zero-padded filter kernel is often computed once at the beginning of a program or application and saved for all subsequent filter operations.
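The time-domain bookkeeping of step 5 can be sketched in a few lines of C (hypothetical code, assuming the IFFT result y_raw of length N = L + M − 1 is already available and that M − 1 ≤ L, as is the case for the filter sizes used later in this work):

```c
#include <stddef.h>

/* Overlap-add tail handling: emit L output samples and save the
   (M-1)-sample tail of the raw block for the next iteration. */
void overlap_add(const float *y_raw, float *y_out, float *tail,
                 size_t L, size_t M)
{
    for (size_t i = 0; i < L; i++)
        y_out[i] = y_raw[i] + (i < M - 1 ? tail[i] : 0.0f);
    for (size_t i = 0; i < M - 1; i++)   /* save tail for the next block */
        tail[i] = y_raw[L + i];
}
```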

2.4.2 IIR filter

IIR filters are more computationally efficient than FIR filters but come with the disadvantage that stability cannot always be guaranteed. An IIR filter depends on previous inputs and on previous outputs, which makes it a feedback, or recursive, filter. This recursion makes IIR filters more efficient, since they produce a longer impulse response with fewer coefficients


and operations [7]. IIR filters are, however, more sensitive to quantization and round-off errors. The transfer function of a second-order IIR filter can be expressed as

$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} \qquad (10)$

where the a's and b's denote the filter's coefficients.

When implementing an IIR filter with floating point arithmetic, the transposed direct form II structure is suitable, since floating point arithmetic is not sensitive to overflow and the structure saves two memory locations in the state variables compared to direct form I [8]. This structure is not suitable when fixed point arithmetic is used, because of the wide dynamic range needed in the feedback state variables. Figure 5 shows a block diagram of the transposed direct form II structure. The input-output equations for this structure can be expressed as

$y[n] = b_0 x[n] + d_1$
$d_1 = b_1 x[n] + a_1 y[n] + d_2 \qquad (11)$
$d_2 = b_2 x[n] + a_2 y[n]$

where x and y denote input and output respectively, the a's and b's are the filter coefficients, and $d_1$ and $d_2$ are the state variables saved for the computation of the next output sample.

Figure 5: Block diagram of an IIR filter using the direct form II transposed structure
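Written out in C, one biquad section of this structure follows equation (11) directly (a sketch; note that a1 and a2 carry the sign convention of equation (11), i.e. the negated denominator coefficients of equation (10), as in the CMSIS-DSP biquad functions):

```c
typedef struct {
    float b0, b1, b2;  /* numerator coefficients                */
    float a1, a2;      /* negated denominator coefficients      */
    float d1, d2;      /* state variables, initialized to zero  */
} biquad_t;

/* One sample through a transposed direct form II biquad, equation (11). */
static float biquad_process(biquad_t *s, float x)
{
    float y = s->b0 * x + s->d1;
    s->d1   = s->b1 * x + s->a1 * y + s->d2;
    s->d2   = s->b2 * x + s->a2 * y;
    return y;
}
```

Higher-order filters are realized as a cascade of such sections, as with the 26 biquad sections reported in section 4.1.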


2.5 Microcontroller

The microcontroller used in this project was selected according to the peripheral devices required and the requirement that the CPU be able to handle many calculations quickly. The peripherals that were initially required were I2S communication, an Ethernet MAC (Media Access Control), and some communication interface for an SD card. The microcontroller also had to be equipped with an FPU to run all calculations in floating point arithmetic. One important factor was that the microcontroller should be available on a development board with an audio codec, audio input and output, an Ethernet PHY (Physical Layer) with an RJ45 connector, and an SD card reader. The board should also have a programmer/debugger mounted on it to ease development. The development board chosen was the 32F746GDISCOVERY development board from STMicroelectronics. The microcontroller on this board is the STM32F746NG, which has an ARM Cortex-M7 CPU core with an FPU. The board is equipped with all the required devices, as well as other devices used during the project: an external SDRAM and a capacitive touch display.

2.6 Communication protocols

2.6.1 UART

UART (Universal Asynchronous Receiver/Transmitter) is a basic asynchronous serial protocol that performs full-duplex communication with three signal lines. A UART device is equipped with pins for receiving packets (RX), sending packets (TX), and ground. Being an asynchronous protocol means that no common clock signal is required. Instead, the user must set up individual clocks for both units, configured to the same data-transmission frequency. For flawless UART communication it is therefore important that the clocks are accurate relative to the transmission frequency and stable over time and temperature. If this is not the case, there is a risk that the data being sent will be misinterpreted or missed completely.

Figure 6: A correct connection for UART-communication and the structure of a packet.

To alert the receiver to start/stop reading, the transmitter adds start and stop bits to the data packet being transferred. The UART packet is structured in four blocks, as


shown in figure 6. It is most common to send the data bits with the LSB first. The parity bit, which is optional, can be used by the receiver for error checking.

2.6.2 I2S

I2S (Inter-IC Sound) is a communication protocol developed by Philips Semiconductors. The protocol is a serial interface used to transmit stereo PCM audio data. The bus consists of at least three lines, which are shown in figure 7. These are:

• Word select (WS)

• Serial data (SD)

• Continuous serial clock (SCK)

The word select line decides which channel is being transferred, the SD line transfers the audio as serial data, and SCK is a common clock shared between the two communicating units. I2S is a master/slave protocol where the master provides the shared clock.

Figure 7: A setup of two units communicating with I2S where the transmitter acts as master.

The serial data is transferred in two's complement with the most significant bit first. This gives flexibility to the transfer, since the transmitter and receiver may have different word lengths. If the receiver receives more bits than its word length, the bits after the least significant bit are ignored. Conversely, if the receiver receives fewer bits, the remaining bits are set to zero [9]. The clock frequency that should be used is given by the following equation

$f_c = f_s \cdot S \cdot 2 \qquad (12)$

where $f_c$ is the clock frequency, $f_s$ is the system's sampling frequency and $S$ is the sample size in bits.
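For example, with the 48 kHz sample rate and 16-bit samples used in this project, equation (12) gives a bit clock of $f_c = 48000 \cdot 16 \cdot 2 = 1.536$ MHz.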

2.6.3 TCP/IP

TCP/IP, or Transmission Control Protocol/Internet Protocol, is a set of layered communication protocols used to transmit data between different types of devices. It has become the de-facto standard for communication between computer systems [10]. TCP/IP


uses five layers of protocols. Together, these protocols are often referred to as a protocol stack. Figure 8 shows the topology of the TCP/IP stack. On the transmitting side, the stack successively encapsulates the payload data with headers and trailers carrying information about format, packet order and addresses. The task of the stack on the receiving side is then to unwrap the received message and, in each layer, examine, use and strip off the header information related to that specific layer.

1. Physical Layer: This layer contains the necessary functions needed to be able tosend and receive a bit stream over a physical medium.

2. Data Link Layer: Takes care of the decoding, encoding and organization of the bitstream from and into frames.

3. Network Layer: Creates or disassembles the packets which are moved around thenetwork. It uses IP addresses to associate the packets with a source and destination.

4. Transport Layer: Establishes a connection between applications on different hosts.

5. Application Layer: Where the actual payload data is generated.

Figure 8: Flow chart showing the flow of a message through the TCP/IP stacks of the transmitter and receiver

Although the TCP/IP model is referred to as a five-layer stack, it is mostly concerned with the network, transport and application layers. It defines how the network layer should interface with the data link and physical layers, but it is not concerned with those two layers themselves.


2.7 DMA

DMA, or Direct Memory Access, is a technique which lets peripherals access the main system memory directly without involving the CPU. This means that the CPU can perform other tasks while data is being transferred. The hardware which makes this possible is called a DMA controller. Figure 9 shows a basic layout of a DMA system. In a typical DMA transfer, a peripheral notifies the DMA controller that it wants to read/write from/to memory; the DMA controller then asserts a request signal to the CPU, asking for permission to take control of the data bus. The CPU stops driving the bus and returns an acknowledge signal to the DMA controller. The DMA controller now drives the data and memory buses as if it were the CPU. When the transfer between the peripheral and memory is done, the DMA controller signals to the CPU that the transfer is finished. The DMA controller itself never processes any of the data being transferred; it only directs the data to the requested address.

Figure 9: DMA structure

2.8 Room correction

To optimize the performance of a loudspeaker in an arbitrary room, one must take into consideration that the sound coming directly from the loudspeaker will interfere with the sound reflected from the surrounding surfaces. This alters the sound, and the effect is referred to as unwanted convolution. To minimize the effect of this phenomenon, a process of deconvolution is performed [13]. The main goal of the deconvolution is to recreate the signal that existed before it was altered by the room acoustics. In its most basic form, the deconvolution process can be understood by examining the LTI system in figure 10, where x(t) is the system input signal and z(t) is the output signal. H(ω) is the system transfer function, which is responsible for the unwanted convolution, and F(ω) is the deconvolution filter, also referred to as the inverse filter. A perfect deconvolution filter satisfies the following equation

$\big(x(t) * f(t)\big) * h(t) = z(t), \quad \text{where } z(t) = x(t) \qquad (13)$


Figure 10: LTI-system with a deconvolution/inverse filter

Assuming that both the loudspeaker and the listener remain in the exact same location, the room can be regarded as a linear and time-invariant system characterized by an impulse response h(t). Many acoustical parameters can be derived from the impulse response; it is therefore of great importance that the measurement used to obtain the impulse response is done with high accuracy. The most common approach to performing the impulse response measurement is to apply a known input signal and measure the output of the system. There exist many different choices of input signal and deconvolution technique. The application described in this report uses the methods called Exponential Sine Sweep and pink noise. The pink noise method uses an input signal with a power spectral density that is inversely proportional to the signal's frequency. This section describes the sine sweep method very briefly; a more in-depth description can be found in [11]. The input signal used in this technique is based on the following equation

$x(t) = \sin\!\left(\frac{\omega_1 T}{\ln\!\left(\frac{\omega_2}{\omega_1}\right)}\left(e^{\frac{t}{T}\ln\left(\frac{\omega_2}{\omega_1}\right)} - 1\right)\right) \qquad (14)$

where $\omega_1 = 2\pi f_0$ and $\omega_2 = 2\pi f_s$, with $f_0$ and $f_s$ being the start and stop frequencies of the sweep, respectively, and T the sweep length in seconds. Figure 11 shows a part of the input signal used in the application described in this report.
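A sweep following equation (14) could be generated with a few lines of C (an illustrative sketch; the actual stimulus was generated from code provided by Dirac Research, as described in section 3.3):

```c
#include <math.h>
#include <stddef.h>

/* Fill buf with n samples of an exponential sine sweep, equation (14),
   from start frequency f0 to stop frequency f1 at sample rate fs (Hz). */
void generate_sweep(float *buf, size_t n, float fs, float f0, float f1)
{
    const float pi = 3.14159265f;
    const float T  = (float)n / fs;          /* sweep length in seconds */
    const float w1 = 2.0f * pi * f0;
    const float w2 = 2.0f * pi * f1;
    const float K  = w1 * T / logf(w2 / w1);

    for (size_t i = 0; i < n; i++) {
        float t = (float)i / fs;
        buf[i] = sinf(K * (expf((t / T) * logf(w2 / w1)) - 1.0f));
    }
}
```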


Figure 11: Representation of the logarithmic sweep used in the application. The graph is zoomed in to better show how the frequency increases. The actual frequency interval is 10–24 kHz.

There are different approaches to the design of the inverse filter f(t). The most trivial approach is to first reverse and then delay the logarithmic sweep. The inverse signal must also be scaled to obtain a linear frequency response, and the delay is necessary to make the inverse filter causal. After constructing the inverse filter, it can be used to obtain the impulse response of the room, which is then used to design a room correction filter [12]. As mentioned, this is a trivial approach; DLCT uses a more sophisticated method, which is described in [16]. To be able to do impulse response correction, a non-causal part of the filter is needed. For computational efficiency, an IIR filter would be ideal. The problem is that an IIR filter has an infinite impulse response, and an infinite number of samples would have to be delayed to create the non-causal part with an IIR filter alone. An IIR filter can, however, model the causal part of the filter in combination with a FIR filter that models the non-causal part [14]. This makes the filter more computationally efficient than if only a FIR filter were used.

2.8.1 System requirements for room correction

There are certain requirements which a filter intended for digital room correction must meet. These requirements depend largely on the type of filter topology used. Figure 12 shows some guidelines for the type of filter topology used in this application [15].


Figure 12: Table showing filter requirements per channel at a sample rate of 44.1kHz

2.9 DLCT

The Dirac Live Calibration Tool is software intended for calculating room correction filters. These filters are then applied to audio through the Dirac Audio Processor or a standalone device such as an AVR (Audio/Video Receiver). The measurement process involved in calculating a filter for a standalone device is divided into six pages that the user interacts with.

1. The first page contains information about the sound system supported by the device.

2. The second page is the mic configuration page. Here the user selects a calibrated microphone to be used during the measurement process.

3. On the third page the input and output levels can be tested, to check that the volume is sufficient but not too loud.

4. On the fourth page the measurement process takes place. A stimulus is played through one speaker at a time, and DLCT records the output, on which the deconvolution is then performed.

5. On the fifth page the filter design takes place. Here the user can change how the resulting impulse response or magnitude response will look.

6. On the sixth and last page the filter can be downloaded to the standalone device, where it will be applied to the audio played.

The protocol used for communication between DLCT and the endpoint device is DiracIO over TCP/IP. DiracIO uses RPCs (Remote Procedure Calls) to call functions implemented on the endpoint device. The RPC framework used is gRPC, an open-source RPC framework developed at Google. gRPC uses Google's protobuf, a protocol for serializing data, which is used to create callback functions in the different programming languages supported by protobuf. These callback functions are written and defined in protobuf and compiled to the desired language with different plugins.


3 Software structure

3.1 Sections

All parts of the software were written in C and compiled with the SW4STM32 toolchain, which has a GCC-based compiler. The HAL library was used for all peripheral initialization. The HAL is a hardware abstraction layer written by ST to simplify development and the portability of their devices. With the HAL, the developer of an application does not need to deal with the hardware registers of each peripheral; instead, each peripheral is described by a software structure where the configuration takes place. The initialization function then takes these structures as inputs and configures the peripherals accordingly.

The application is developed without any operating system. The software is structured in a branch-like manner with four main sections, responsible for:

1. Handling the input, output and signal processing of audio.

2. Handling the playback of the stimuli.

3. Communicating with the user and external software via the Ethernet interface.

4. Checking for GUI (Graphical User Interface) input and updating the display.

These sections are covered in more depth below. The overall structure of the software follows the flow of figure 13. The program runs in an infinite loop where, in every iteration, it calls the functions that handle the initialization and startup of the tasks mentioned above, and the functions which handle input from the GUI and the TCP stack.


Figure 13: Flowchart of the overall structure of the main function

Which of the functions responsible for the audio loopback and the playback of stimuli sounds will fully execute depends on input received over the Ethernet interface. Figure 14 shows the code structure of these three functions.

Figure 14: Flow chart showing the overall structure of the functions Init Audio, Init Sweep and Init Noise


3.2 Audio loop section

The part of the software which takes care of the audio loop and the signal processing is mainly developed around two libraries. The first is the STM32F7 BSP (Board Support Package), a library containing a set of APIs related to the external hardware components such as the audio codec. The other is the CMSIS-DSP (Cortex Microcontroller Software Interface Standard) library, which handles the computational work in the audio loop, such as conversion between data types and the signal processing itself. The code is a further development of an example provided by ST, which showed how to record audio from the two microphones located on the board and output it through the 3.5 mm AUX output jack. The main modification of the example was to take the recorded audio from the 3.5 mm AUX input instead of the microphones. This change made it possible to take audio from an external device and pass it through the STM32F7 Discovery board.

3.2.1 Collection of input data

Collecting and storing samples is vital for doing any signal processing. The first thing that has to be done is to convert the incoming sound from analog to digital. This is done by the WM8994 audio codec's internal ADC (Analog-to-Digital Converter), which operates at a sample rate of 48 kHz. The codec and the MCU transfer the audio data over I2S. The I2S peripheral then stores the samples in a buffer located in the SDRAM. The transfer between the I2S peripheral and the memory is done with DMA. When one half of the buffer is filled, the DMA controller alerts the MCU by triggering an ISR, which collects the new samples that are ready to be processed. The full audio loop process is shown in figure 15. This technique of having the codec write to one part of the buffer while the other part is sent to the MCU for processing is referred to as double buffering. It helps prevent overwriting samples that have not yet been sent to the MCU for processing; a sketch of the corresponding interrupt callbacks follows figure 15.

Figure 15: Flow chart showing the audio loop process
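The double-buffering handshake could look roughly like this (callback names follow the ST Discovery BSP audio API; the buffer layout and the ready_half flag are illustrative assumptions, not the thesis code):

```c
#include <stdint.h>

#define AUDIO_BUF_SAMPLES 4600               /* whole buffer: two halves */
static int16_t audio_in_buf[AUDIO_BUF_SAMPLES];
static int16_t * volatile ready_half;        /* half ready for processing */

/* DMA has filled the first half and now writes the second,
   so the first half is safe to hand to the processing code. */
void BSP_AUDIO_IN_HalfTransfer_CallBack(void)
{
    ready_half = &audio_in_buf[0];
}

/* Second half filled; the DMA wraps around to the first half. */
void BSP_AUDIO_IN_TransferComplete_CallBack(void)
{
    ready_half = &audio_in_buf[AUDIO_BUF_SAMPLES / 2];
}
```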


3.2.2 Signal Processing

The signal processing operates on blocks, i.e. it processes one block of data at a time. Block processing reduces the function-call overhead of the DSP functions. Some algorithms, like the FFT, are implemented in such a way that block processing is the only option. The flow of the function is as follows:

1. Convert the processed samples from the previous iteration from floats to int16 (signed 16-bit integers) and send the buffer to the SDRAM.

2. Collect the new samples from the SDRAM and convert them from int16 to floats.

3. Split the samples into two separate buffers, one for each channel.

4. Do the filtering on each of the channels.

5. Interleave the samples from each channel back into a single buffer.

A block diagram of the process is shown in figure 16. The conversion from int16 to float and back is done because the samples collected from the SDRAM are 16-bit integers, whereas the DSP algorithms are written to work with floats. The choice to work with floats is based on the statements in the theory section about floating point versus fixed point arithmetic. The conversion is made with functions from the CMSIS library. The channel splitting is necessary to be able to apply filters with different filter coefficients to the two channels. The filter function relies heavily on the CMSIS functions for FFT/IFFT, complex multiplication and IIR filtering. As mentioned earlier, the FFT/IFFT functions operate on blocks of data. A section performing overlap-add is implemented in the filter function; why this is important is covered in the theory section on FIR filters. A sketch of the per-block flow follows figure 16.

Figure 16: Flow chart showing the signal processing function
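The sketch below uses the CMSIS-DSP conversion functions; the block size, buffer names and the filter_channel() helper are illustrative assumptions, and only arm_q15_to_float and arm_float_to_q15 are actual CMSIS-DSP calls:

```c
#include "arm_math.h"   /* CMSIS-DSP: q15_t, float32_t, conversions */

#define BLOCK 1150      /* samples per channel per block, cf. section 4.1 */

extern void filter_channel(float32_t *buf, uint32_t n, int ch); /* FFT/IIR chain */

void process_block(q15_t *in, q15_t *out)
{
    static float32_t work[2 * BLOCK], left[BLOCK], right[BLOCK];

    arm_q15_to_float(in, work, 2 * BLOCK);     /* int16 -> float (step 2) */
    for (uint32_t i = 0; i < BLOCK; i++) {     /* de-interleave (step 3)  */
        left[i]  = work[2 * i];
        right[i] = work[2 * i + 1];
    }
    filter_channel(left,  BLOCK, 0);           /* per-channel filter (4)  */
    filter_channel(right, BLOCK, 1);
    for (uint32_t i = 0; i < BLOCK; i++) {     /* re-interleave (step 5)  */
        work[2 * i]     = left[i];
        work[2 * i + 1] = right[i];
    }
    arm_float_to_q15(work, out, 2 * BLOCK);    /* float -> int16 (step 1) */
}
```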


3.2.3 Outputting processed samples

The process of playing the processed samples through the 3.5 mm output follows a structure similar to that for collecting the incoming signal. The processed samples are sent from the SDRAM to the I2S peripheral with DMA, and then to the codec. The codec uses its internal DAC (Digital-to-Analog Converter), running at a 48 kHz sample rate, to convert the samples to a continuous analog signal. The codec then routes the analog signal out through the 3.5 mm AUX output.

3.3 Stimuli playback

The section in charge of outputting the stimuli is structured in a way similar to the audio loop, but instead of having the input signal come from an external audio source, the input data is fetched from a µSD-card. The samples, which represent a frequency sweep and pink noise, are stored as individual WAV files on the µSD-card. To have the MCU interface with a µSD-card, a file system is required. The file system used is FatFs (File Allocation Table File System), an open-source FAT/exFAT file system module targeted specifically at embedded platforms. It is platform independent and very well documented, which made it a good choice for this application. The frequency sweep was generated from code provided by Dirac Research, and the pink noise was generated in MATLAB.

The WAV file format is a good choice in this application because it can store uncompressed PCM data, so no extra decoding has to be done. The WAV file stores its bytes in little-endian format, which means that the data is stored with the least significant byte first. The application reads data in big-endian order, so a conversion between the two formats is made in the software (sketched below).

Similar to the audio loop, the stimuli playback is based around two functions: one which initializes the codec and I2S and opens the desired file, and one which updates the output buffer from which the codec gets its data through the I2S peripheral. In the same manner as with the audio loop, the initialization function is called in the main loop, while the function responsible for updating the output buffer is called from an ISR (Interrupt Service Routine). The ISR is triggered when new samples must be sent to the SDRAM. The playback function handles the buffer with the same technique as the audio loop, i.e. double buffering.
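The byte-order conversion and the FatFs read could look roughly like this (a sketch under the assumptions above; f_read is the actual FatFs API, while the helper names are hypothetical):

```c
#include <stdint.h>
#include "ff.h"    /* FatFs: FIL, FRESULT, UINT, f_read */

/* Swap one 16-bit PCM sample from the WAV file's little-endian
   layout to the byte order the application expects. */
static inline int16_t swap16(uint16_t u)
{
    return (int16_t)((u >> 8) | (u << 8));
}

/* Read the next chunk of 16-bit samples from an open WAV file
   and byte-swap them in place. */
FRESULT read_wav_chunk(FIL *fp, int16_t *buf, UINT samples)
{
    UINT bytes_read;
    FRESULT res = f_read(fp, buf, samples * sizeof(int16_t), &bytes_read);
    if (res == FR_OK)
        for (UINT i = 0; i < bytes_read / sizeof(int16_t); i++)
            buf[i] = swap16((uint16_t)buf[i]);
    return res;
}
```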

3.4 Ethernet Communication

The communication between the MCU and the computer running DLCT is TCP/IP over Ethernet, with the MCU as the server and DLCT as the client. The MCU used has an Ethernet MAC peripheral which handles all the incoming data in hardware, which makes the communication faster. The MAC is connected to a PHY chip, which is the link to the physical layer, in this case the RJ45 connector. The MAC and PHY communicate over the standard RMII protocol for control and data transfers. The initialization of the Ethernet peripheral and the communication with the PHY were done with the HAL libraries. To be able to interpret the data, a TCP/IP stack was needed. The lwIP (Lightweight Internet Protocol)


stack was chosen because it targets embedded systems, is widely used, and is well documented. The application was built from an application note written by ST targeting the STM32F4 microcontroller with another PHY chip, which used MII communication instead of RMII. The main difference between the MII and RMII protocols is the number of data lines used. The internal registers of the two PHYs are very similar; the only thing that distinguishes them is vendor-specific registers, which were not needed. This made the application note a suitable starting point.

Figure 17: Flow of the lwIP implementation running in stand-alone mode.

The implemented lwIP stack runs raw, i.e. without any operating system. In this raw environment the stack is based around continuous software polling to check whether a packet has been received. As shown in figure 17, if a packet is received it is copied from the buffers handled by the Ethernet drivers into the lwIP buffers. The packet is then handed to the lwIP stack and processed.
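The polling loop itself is small. A sketch in the style of ST's lwIP port follows (gnetif and ethernetif_input() follow the naming of that port and are assumptions here; sys_check_timeouts() is the lwIP raw-API timer hook, with include paths as in lwIP 2.x):

```c
#include "lwip/netif.h"
#include "lwip/timeouts.h"

extern struct netif gnetif;                     /* the single network interface */
extern void ethernetif_input(struct netif *n);  /* driver: pass frames to lwIP  */

/* Called continuously from the main loop (no operating system). */
void lwip_poll(void)
{
    ethernetif_input(&gnetif);  /* copy any received frame into lwIP buffers  */
    sys_check_timeouts();       /* drive TCP retransmissions, ARP timers, ... */
}
```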


3.5 GUI

The GUI was not part of the project from the beginning, but since the development board had a touch display, a simple GUI was implemented. The touch LCD screen was controlled by the MCU using functions from a BSP package in the STM32CubeF7 software. The GUI consists of a volume slider and a play/pause button which mutes and unmutes the audio when touched. The volume slider changes the bar level and the volume level when a touch is detected in the area of the bar. Since the play/pause button toggles state each time it is touched, a simple state machine was implemented to keep track of the button state. Figure 18 shows a picture of the GUI.

Figure 18: A picture of the simple GUI with the button in the playing state

4 Results and Discussion

4.1 Signal Processing

The final system is capable of performing signal processing powerful enough to be suitable for room correction. The system is able to filter audio at 48 kHz with a filter consisting of a FIR filter with 899 taps and 26 IIR biquad sections. The length of the FIR filter and the number of biquad sections meet the requirements presented in figure 12. The number of taps in the FIR filter was chosen to meet the requirement N = M + L − 1, where N is the FFT size, L is the audio buffer size and M is the number of FIR taps. With an audio buffer size of 1150 samples and an FFT size of 2048 points, a FIR filter with 899 taps was required. Doing the FIR filtering in the frequency domain proved to be vital for having a filter which meets the requirements for a room correction filter of the specific topology used in the application. Figure 19 shows the difference in execution time between doing the filtering in the frequency domain and doing it in the time domain. The measurements were made with a Saleae Logic 8 logic analyzer. Figure 19 shows that it takes approximately ten times as long to perform the filtering in the time domain. The large time difference was expected, given the computational complexity of the two algorithms. The frequency-domain filtering introduces a bit more complexity to the code, coming from the need for code that handles the circular convolution introduced by filtering in the


frequency domain. The method this application uses is the overlap-add method, presented in the theory section on FIR filters. Another method to handle the circular convolution, called overlap-save, could also have been used, but the overlap-add implementation fitted better with the rest of the signal processing code.

Figure 19: Graphs showing the difference in execution time between two 899-tap, 48 kHz FIR filters.

4.2 Software structure

The resulting overall software structure runs well. There were discussions about making the software more interrupt-driven instead of polling in the main functions, since polling is generally seen as an inefficient method, but the inefficiency is not a problem in this application. Early in the project there were also ideas about using a real-time operating system. This would have made it possible to use features like threads and scheduling, and it would also have simplified the implementation of the TCP/IP stack, which took a great deal of time.

4.3 Storage

The system is able to store and read data on a µSD-card using the FatFs file system. The system is also capable of playing WAV files stored on the µSD-card. This feature is used to play the stimuli signals, which are intended to be used as excitation signals during the deconvolution process for obtaining the impulse response of a room. The µSD-card can also be used to store filter coefficients.

4.4 Communication

The communication between the system and DLCT was never fully established, due to lack of time. However, a TCP/IP server was implemented, which made it possible to communicate over Ethernet. This was used to communicate with a remote client which was able to set and reset the flags responsible for playback of the stimuli signals and the audio loop. The gRPC interface with the callback functions that DLCT requires was not implemented. Since this project was written in C, and the protobuf used by gRPC does not have any C-based APIs (Application Programming Interfaces), gRPC and the callback functions would have to


be implemented in C++. This would require some kind of wrapping of the functions to make it possible to call them from the code written in C. Another possibility is to use the gRPC core code, which is written in C; this approach requires a solid understanding of gRPC, which was out of the scope of this thesis, and a lot of the functionality would be lost by using the C core. No investigation of the memory requirements for implementing a fully functional gRPC system was done, so no conclusion could be drawn regarding the possibility of communicating with DLCT through gRPC. There are ideas on how to avoid porting a gRPC interface to the application and instead use the implemented interface in combination with network sockets. This would require some type of middleware on the remote client side, responsible for parsing the messages sent between the application and DLCT.

5 Conclusion

The MCU-based audio system is able to process two channels of audio data at a sample rate of 48 kHz, which eliminates the need for a dedicated digital signal processor in this application. Since the filter implementation was not tested with filter coefficients calculated by DLCT, no conclusions can be drawn about whether the implementation is flawless, but a solid foundation for further work on the project exists. The Ethernet communication with TCP/IP works as expected, but more work has to be done if gRPC is to be used for the communication between the system and DLCT. Establishing the communication between DLCT and the application with gRPC could probably serve well as an independent thesis project. There are many different approaches to the overall software structure, and to whether to use a real-time operating system or not; all these factors make the project suitable for further work and investigation.

There were some troubles with the development board. One issue was that it was not possible to use the on-board debugger, due to a silicon bug. Because of this, debugging was done using the on-board LED, GPIOs and UART. The development process could probably have gone faster if the debugger had functioned properly.

The fact that the authors had no previous experience with any STM32 or other ARM-based device made the start-up time longer than one would have hoped. To those who may continue the work, the authors strongly recommend having some experience with development on an STM32 platform and solid knowledge of the C language.


References

[1] D. Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic (Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, March 1991)

[2] Gene Frantz, Ray Simar, Comparing Fixed- and Floating-Point DSPs (Texas Instruments Incorporated, Dallas, Texas, 2004)

[3] Fred J. Taylor, Digital Filters: Principles and Applications with MATLAB (John Wiley and Sons, Incorporated, 09/2011)

[4] Sen M. Kuo, Bob H. Lee, Wenshun Tian, Real-Time Digital Signal Processing: Fundamentals, Implementations and Applications [pages 102, 222], 3rd ed. (Wiley, 2013)

[5] Julius O. Smith III, Review of the Discrete Fourier Transform: FFT Convolution vs. Direct Convolution (CCRMA, Department of Music, Stanford University)

[6] Deepa Kundur, Overlap-Save and Overlap-Add (University of Toronto)

[7] Steven W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing [ch. 19], 2nd ed. (California Technical Publishing, 1999)

[8] Nigel Redmon, Biquads (http://www.earlevel.com/main/2003/02/28/biquads/, 2003). Information acquired 2017-07-15

[9] Philips Semiconductors, I2S bus specification (February 1986)

[10] Behrouz A. Forouzan, TCP/IP Protocol Suite, 4th ed. (McGraw-Hill Forouzan networking series, 2010)

[11] Angelo Farina, Simultaneous measurement of impulse response and distortion with a swept-sine technique (Dipartimento di Ingegneria Industriale, Università di Parma, February 1, 2000)

[12] Guy-Bart Stan, Jean-Jacques Embrechts, Dominique Archambeau, Comparison of different impulse response measurement techniques (Sound and Image Department, University of Liège, Institut Montefiore B28, Sart Tilman, B-4000 Liège, Belgium, December 2002)

[13] Malcolm J. Crocker, Handbook of Acoustics (John Wiley and Sons Inc., 9 March 1998)

[14] Mathias Johansson, On Room Correction and Equalization of Sound Systems (Dirac Research AB)

[15] Internal Document (Dirac Research AB)


[16] Lars-Johan Brännmark and Anders Ahlén, Spatially robust audio compensation based on SIMO feedforward control (IEEE Transactions on Signal Processing, vol. 57, no. 5, May 2009)
