[ieee 2009 third uksim european symposium on computer modeling and simulation - athens, greece...

An Ultra-Low Power Real Time Embedded System for Map Generation Using Ultrasound Sensors

Prabhakar Mishra†, H N Shankar††, Pai Dipti G*, Mudit Mathur**

†Author for correspondence, Department of Telecommunication Engineering, PESIT, Bangalore

††Department of Telecommunication Engineering,

*Department of Computer Science and Engineering,

**Department of Electronics and Communication Engineering, PESIT, [email protected]

[email protected]

Abstract— In this paper, we propose a low-power structure that acquires range readings from a set of ultrasound sensors and calculates the probability of occupancy of cells in the region under the sonar scan. The architecture considerably lowers the switching activity at various stages of acquiring and processing the sensor data to provide updates of cell occupancy values for rapid in-motion mapping for robot navigation. Power reduction has been attempted at various architectural levels, and simulation results show that the proposed architecture lowers the switching activity by over 70% and reduces the power consumption by up to 60% compared to the conventional fixed-point processor based architecture.

Keywords— Ultrasound sensors, occupancy grids, autonomous robot navigation, switching activity reduction, low-power design.

I. INTRODUCTION

Autonomous robot navigation in unknown and unstructured environments is central to many industrial and research applications. Autonomous navigation requires mapping, localization, obstacle avoidance and path planning capabilities. Mapping involves creation of a world model of the environment around the robot using data provided by the robot's sensors. Previous work in this area includes the use of ultrasound sensors for map generation using occupancy grids [1] [2]. Many mapping techniques use a probabilistic approach to detect the presence or absence of obstacles in the environment as perceived by the robot. The ultrasound sensor's range data is divided into cells and probability functions are applied to them to ascertain whether each cell is empty or occupied. The map is incrementally updated based on Bayesian estimation procedures to improve the map definition.

The sensor array consists of 24 transducers arranged as a ring, each spaced 15º apart to cover the entire 360º panorama around the robot. Each sensor has a beam width of 30º and a maximum range of 20 feet. The sensors in close vicinity are fired sequentially to avoid interference and each sensor reading is converted into a probability profile.

The sonar beam is divided into two parts, the empty region and the somewhere-occupied region [3] [4]. The final sonar map is a two-dimensional array of cells with values in the range (0, 1). The values below a certain threshold are considered probably empty and the values above it are considered probably occupied. Fig. 1 shows the sonar model and associated parameters.

Fig. 1 The Sonar Model

R is the range measurement returned by the sonar sensor
$\epsilon$ is the mean sonar deviation error
$\omega$ is the beam aperture
S (x, y, z) is the position of the sonar sensor
$\delta$ is the distance between P and S
$\theta$ is the angle between the main axis and SP

In the empty region, the probability is calculated using the formula

$P_E(X, Y) = E_r(\delta) \cdot E_a(\theta)$

$E_r(\delta)$ is the estimate that the cell is empty based on its range from the sensor. The closer the cell is to the sensor, the more likely it is that the cell is not occupied.

$E_a(\theta)$ is the estimate that a cell is unoccupied based on the difference in angle, $\theta$, between it and the central beam of the sonar.

2009 Third UKSim European Symposium on Computer Modeling and Simulation

978-0-7695-3886-0/09 $26.00 © 2009 IEEE

DOI 10.1109/EMS.2009.87


Cells closer to the central beam of the sonar are more strongly updated as empty than cells near the extremities of the beam.

The probability that the cell is occupied is calculated using the formula

$P_O(X, Y) = O_r(\delta) \cdot O_a(\theta)$

$O_r = 0$ otherwise. $O_r(\delta)$ is the probability that the cell is occupied based on its range from the sensor. The closer it is to the range reading received, the higher the probability that the cell is occupied.

$O_a(\theta)$ is the probability that the cell is occupied based on the difference in angle between the obstacle and the central beam of the sonar. The closer the cell is to the centre of the beam, the more likely it is that the cell is occupied.

These probability values are calculated and thresholding is then applied: values above a certain upper bound or below a certain lower bound are clamped to the corresponding end point of the range of probability values.
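The way the range and angle factors combine can be sketched as follows. The paper does not state closed forms for $E_r$, $E_a$, $O_r$ and $O_a$; the quadratic profiles below are the ones commonly given for the sonar model of [3] and [4] and are an assumption here, as are the function names, the default deviation error `eps = 0.5` feet, and the neglect of the sensor's minimum range.

```python
BEAM_WIDTH_DEG = 30.0  # beam aperture (omega), taken from the sensor description


def e_a(theta_deg, omega=BEAM_WIDTH_DEG):
    """Angular factor: 1 on the beam axis, 0 at the beam edge (assumed form)."""
    if abs(theta_deg) > omega / 2:
        return 0.0
    return 1.0 - (2.0 * theta_deg / omega) ** 2


def e_r(delta, r, eps=0.5):
    """Range factor for the empty region: highest close to the sensor (assumed form)."""
    if 0.0 <= delta < r - eps:
        return 1.0 - (delta / (r - eps)) ** 2
    return 0.0


def o_r(delta, r, eps=0.5):
    """Range factor for the occupied region: peaks at delta == r, 0 otherwise."""
    if r - eps <= delta <= r + eps:
        return 1.0 - ((delta - r) / eps) ** 2
    return 0.0


def p_empty(delta, theta_deg, r, eps=0.5):
    """P_E = E_r * E_a for a cell at distance delta and angle theta."""
    return e_r(delta, r, eps) * e_a(theta_deg)


def p_occupied(delta, theta_deg, r, eps=0.5):
    """P_O = O_r * O_a; O_a is taken to have the same form as E_a (assumption)."""
    return o_r(delta, r, eps) * e_a(theta_deg)
```

With a 30º aperture, this assumed angular profile gives equal values at +7.5º and -7.5º and zero at 15º, which is the property the lookup-table layout described later relies on.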

Our robot: Freelancer

Fig. 2 Our robot freelancer.

Freelancer (Fig. 2) is our multi-sensor robot. It measures 65 cm × 45 cm × 40 cm with a ground clearance of 8 cm and has a four-wheel differential drive. It has (i) provision for up to 24 sonar sensors (Devantech SRF-04/07/08), of which one is currently mounted at each corner for detection and ranging of obstacles up to 3 m; (ii) 15 infrared LED sensor pairs to detect obstacles in close proximity of the robot and so facilitate guiding it through a clutter of closely spaced obstacles; and (iii) a Devantech CMPS03 digital compass for precise orientation. The Logitech webcam seen in the front has been mounted very recently. Freelancer has a distributed control architecture with an IRFZ44N MOSFET based full-bridge chopper drive in Class E configuration, driven by an ATmega88 microcontroller. The top level processing unit is built around an AMD Athlon 2600+ processor and an ASUS A7S266-VM/U2 motherboard. Freelancer is powered by two sources, one for drive and one for control. It can take a payload of 5.8 kg with a top speed of 30 cm/s.

Fig. 3 Schematic of the control architecture for freelancer

The overall schematic of the control architecture is shown in Fig. 3 [5]. The Intermediate Processing Unit (IPU) takes the sensor inputs and generates different descriptions of the world model; these are the inputs to different algorithms as per their individual requirements. In addition, the IPU generates an estimate of the obstacle density in the polar reference frame. This in turn is used to decide whether or not a path with sufficient clearance exists within a specific range and in a specific orientation in relation to the current state of the robot. The inputs to the fuzzy controller are the speed commands and the steering commands of the individual algorithms along with the polar obstacle density. The fuzzy controller outputs the speed and angle control, which is transformed into PWM signals for the individual drive motors. The details of these are omitted here.

II. PROPOSED SOLUTION

Sensor data acquisition and evaluation of the probability values impose significant overheads on processor time. Hence a single-purpose fixed-point processor supporting a multi-channel ultrasound sensor interface and a dedicated memory block is used in the present system [6].

In this method, the processing sequence of one sonar scan includes the following steps.

• The sonar sensor returns a pulse whose width is proportional to the range of the obstacle.

• This pulse is used to enable an 8-bit counter.

• The range value and the constants used in the evaluation of the probability are stored in a set of registers.

• A fixed-point ALU with separate instances of adder and multiplier calculates the probabilities and a finite state machine sequences the flow of operands.

• Two RAM areas hold the values of probability of empty and occupied cells, which are used by the main processor for generating and updating the map of the environment as perceived by the robot.

The architecture of the custom fixed-point processor based system is shown in Fig. 4. In robotic applications, as in other real time portable systems, power consumption is a major design constraint: it determines the ampere-hour usage of the battery and thus the reliability and uninterrupted operation of the robot. We have identified 4 major areas of power dissipation in the system: (a) activity in the binary counter; (b) power dissipation in the multiplier; (c) activity due to RAM access; (d) subtraction for thresholding.

In the previous scheme, a binary counter is used for acquisition of sensory data from each of the 24 sensors mounted on the robot. The sonar sensor has a maximum range of 20 feet and an 8-bit counter is used to obtain the binary number corresponding to the range of the obstacle as detected by the sensor. The clock for the counter is chosen so that the counter does not overflow for the maximum range of the obstacle detected, which results in the longest pulse width returned by the sensor. Hence, for the longest range, the counter toggles through $2^8$ states and for the minimum range it toggles through 12 states. For all values of the range between the minimum and maximum, the counter toggles through between 12 and $2^8$ states. This leads to significant switching activity in calculating the range of the obstacle.
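A rough sketch of the clock choice and of the resulting binary-counter activity follows. The speed of sound (about 1125 ft/s), the assumption that the echo pulse width equals the round-trip time, and the helper names are not given in the paper; flip-flop toggles are counted as the Hamming distance between successive counter values.

```python
SPEED_OF_SOUND_FT_S = 1125.0  # assumed: approximate value in air at room temperature
MAX_RANGE_FT = 20
COUNTER_BITS = 8


def counter_clock_hz(max_range_ft=MAX_RANGE_FT, bits=COUNTER_BITS):
    """Highest clock rate for which the longest echo pulse does not overflow the counter."""
    max_pulse_s = 2.0 * max_range_ft / SPEED_OF_SOUND_FT_S  # round-trip time
    return (2 ** bits - 1) / max_pulse_s


def binary_counter_activity(range_ft, max_range_ft=MAX_RANGE_FT, bits=COUNTER_BITS):
    """Return (increments, flip-flop output toggles) for one range reading."""
    final_count = int((2 ** bits - 1) * range_ft / max_range_ft)
    # Each increment i -> i + 1 flips every bit in which the two values differ.
    toggles = sum(bin(i ^ (i + 1)).count("1") for i in range(final_count))
    return final_count, toggles
```

Under these assumptions a 1-foot reading produces 12 increments, in line with the minimum quoted above, and a full-scale reading runs through the whole 8-bit count.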

In the proposed architecture, we use a low-power ring counter instead of the binary counter, keeping the resolution in range measurement at 1 foot and dividing the sensor space into cells which are spaced 1 foot apart on the range axis. Hence, a 20-bit ring counter is used to encode the range measurement by the position of the ‘1’ bit. The counter is initialized to a reset value of 00000000000000000001, which corresponds to the least measurement of 1 foot. The pulse returned by the sonar sensor is used as the enable to this counter and the clock frequency is chosen so that for every increment of one foot, the ‘1’ bit shifts one place left. Thus the position of the ‘1’ bit in the ring counter directly gives the range returned in feet. In this architecture, there is no toggle in the counter for the minimum range measured, and for the maximum range measured the toggles are 40. Thus, the average-case toggle count is brought down significantly compared to the conventional 8-bit counter.
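The one-hot range encoding can be sketched as below; the function names are illustrative and not from the paper. Each shift changes exactly two flip-flop outputs, so counting 19 shifts from the reset value gives 38 toggles at 20 feet; the figure of 40 quoted above presumably follows a slightly different counting convention.

```python
RING_BITS = 20


def ring_encode(range_ft, bits=RING_BITS):
    """One-hot word whose single '1' sits at bit position range_ft - 1."""
    if not 1 <= range_ft <= bits:
        raise ValueError("range outside counter span")
    return 1 << (range_ft - 1)


def ring_decode(word):
    """Recover the range in feet from the position of the '1' bit."""
    if word <= 0 or word & (word - 1):
        raise ValueError("not a one-hot word")
    return word.bit_length()


def ring_counter_toggles(range_ft):
    """Flip-flop output toggles while shifting from the reset value to range_ft."""
    word, toggles = 1, 0
    for _ in range(range_ft - 1):
        shifted = word << 1
        toggles += bin(word ^ shifted).count("1")  # two bits change per shift
        word = shifted
    return toggles
```

For example, `format(ring_encode(1), "020b")` reproduces the reset value `00000000000000000001`.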

Fig. 4 Custom single purpose fixed-point processor architecture for data acquisition and calculation of probability of empty and occupied cells.

By removing or minimizing any of the above sources of power dissipation, the overall power consumption can be lowered. In this paper, we propose an ultra-low power real time architecture in which we attempt to eliminate some of these sources and to reduce the power consumption of the others.

III. IMPLEMENTATION

To derive the low-power architecture, we propose methods to eliminate or reduce the power consumption in areas identified in the previous section.

(a) Reduction of activity in the counter

Fig. 5 Low-power ring counter with block size being 4.

Fig. 5 shows the low-power ring counter with block size 4. The ring counter is further optimized for power dissipation by partitioning it into 5 blocks of 4 bits each and employing a clock gating circuit to shut off the clock to those blocks which are not in the immediate neighborhood of the ‘1’ bit. To determine which blocks must be clocked, entrance and exit criteria are used [7]: the clock is supplied to a block only from the time its Entrance criterion is met until its Exit criterion is met. During this period, the clock to all the other blocks is shut off. Thus, the switching activity in the flip-flops and the clock transitions are minimized.
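The effect of block-level clock gating can be illustrated with a simplified model. The exact Entrance and Exit criteria are not spelled out here, so the rule used below (a block is clocked only if it currently holds the ‘1’ bit or is about to receive it) is an assumption, as are the function names.

```python
RING_BITS = 20
BLOCK_SIZE = 4


def clocked_blocks(position, bits=RING_BITS, block=BLOCK_SIZE):
    """Blocks needing a clock edge to move the '1' from `position` to `position + 1`.

    `position` is the 0-based index of the '1' bit. A block is clocked if it
    holds the '1' now (exit side) or will receive it next (entrance side).
    """
    current = position // block
    nxt = ((position + 1) % bits) // block
    return {current, nxt}


def clocked_flipflops(range_ft, bits=RING_BITS, block=BLOCK_SIZE):
    """Total flip-flop clock events while shifting from reset to range_ft."""
    return sum(len(clocked_blocks(p, bits, block)) * block
               for p in range(range_ft - 1))


def ungated_flipflops(range_ft, bits=RING_BITS):
    """Clock events if all flip-flops are clocked on every shift."""
    return (range_ft - 1) * bits
```

In this model a full-scale reading clocks 92 flip-flop inputs instead of 380, because two blocks are active only on the four shifts that cross a block boundary.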

In our architecture (Fig. 6), we have used 2 such ring counters for calculating the probability values of the cells in the range of each sensor. The first ring counter (R1) contains the range R from the sensor output. The second ring counter (R2) holds $\delta$, which ranges between 1 and the range measurement returned by the sensor, R.

Fig. 6 Proposed low-power architecture acquiring range data from 6 sensors in parallel. Here, sensors 1 and 6 show a range of 1 foot and 19 feet respectively.

(b) Elimination of the multiplier for computation of probability values.

By observing the probability functions for calculation of probability of empty and occupied cells, we notice that the varying quantities in the function $E_r(\delta)$ are $\delta$ and R. The other parameters remain constant. Also, to compute the probability of empty cells, $P_E(x, y)$, $E_r(\delta)$ and $E_a(\theta)$ are to be multiplied. To eliminate the multiplier, we maintain two lookup tables which hold the pre-computed values of $E_a(\theta) R^2$ and $E_a(\theta) \delta^2$. The contents of the lookup tables are de-normalized with respect to $R^2$. Every row of the first lookup table (L1) contains a set of four values corresponding to a fixed R. The $\theta$ values vary as -7.5º, 0º, 7.5º, 15º. The second lookup table (L2) is organized in the same way, with each row corresponding to a fixed $\delta$. There are 20 such rows corresponding to each discrete value of the range measurement. A row of L1 is enabled by the output of the ring counter R1, which corresponds to the appropriate value of R. Similarly a row of L2 is enabled by the output of R2.

Hence, the multiplication for the calculation of probability of empty cells gets completely eliminated. We assume that all cells in the somewhere-occupied region are occupied.
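One way to see why two tables and a subtraction can replace the multiplier is the short derivation below. It assumes the quadratic range profile $E_r(\delta) = 1 - (\delta/R)^2$ that is commonly used with this sonar model, with the minimum range and the deviation error neglected; the paper itself does not state this form.

```latex
\begin{aligned}
R^2 \, P_E(x, y) &= R^2 \, E_r(\delta) \, E_a(\theta) \\
                 &= R^2 \left( 1 - \frac{\delta^2}{R^2} \right) E_a(\theta) \\
                 &= \underbrace{E_a(\theta) \, R^2}_{\text{entry of L1}}
                    \; - \; \underbrace{E_a(\theta) \, \delta^2}_{\text{entry of L2}}
\end{aligned}
```

Under that assumption the de-normalized probability is simply the difference of one L1 entry and one L2 entry, and comparing it against a threshold that is itself scaled by $R^2$ avoids any division.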

(c) One-hot encoding for RAM addressing

In the previous scheme, after computation of the probability values, an address generation unit generates the address of the RAM location where each value needs to be stored and the value is transferred to that location over an 8-bit data bus. Power is dissipated each time a write operation occurs due to the effective capacitance of the RAM cells.

We use a one-hot encoding scheme to transfer data from the RAM location to the subtractor and then the thresholding unit thereby reducing the power dissipation. The ‘1’ bit of the ring counter enables a write operation from the corresponding RAM location to a register in the thresholding unit.

(d) Elimination of subtraction for thresholding using pre-computation architecture

Once the probability values of empty and occupied cells are obtained, thresholding is applied, wherein the values greater than a threshold are assumed to be occupied (probability value equal to 1) and the values below it are considered to be empty (probability value equal to 0). To achieve this in a low power manner, a pre-computation architecture is designed so as to minimize the power dissipation in thresholding. In the previous scheme, a subtractor is used whose inputs are the threshold value and the probability value to which thresholding is to be applied.

In our proposed architecture, a 4-bit pre-computation unit has been designed. Its inputs are the 4 most significant bits of the probability value to which thresholding is to be applied and the 4 most significant bits of the threshold value. If this comparison is inconclusive, we use the less significant bits to conclusively determine whether the probability value is greater than or less than the threshold. The block diagram of the pre-computation unit is shown in Fig. 7. We observe that in a majority of cases the comparison can be resolved with the four most significant bits alone, and hence switching activity due to subtraction is minimized.

Fig. 7 Precomputation architecture for thresholding unit.


A parallel architecture has been designed; it takes four $\theta$ values, namely -7.5º, 0º, 7.5º and 15º. It applies thresholding to the probability value corresponding to each $\theta$ and stores the result in a RAM area dedicated to it. The threshold value is stored in a register (input B in fig. 8 is the threshold value) and the probability value is shown as A. The circuit checks whether A<B, A>B or A=B. For the latter two cases, the probability value is 1, i.e. the cell is occupied if the probability value is greater than or equal to the threshold value. In the former case, the probability value is 0, i.e. the cell is empty.
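The two-stage comparison described above can be sketched in software as follows. This is a behavioural model only; the function names and the returned flag indicating whether the lower bits had to be examined are illustrative and not part of the paper's design.

```python
def threshold_precompute(a, b):
    """Return (bit, used_low_bits) for 8-bit probability a and threshold b.

    The upper nibbles are compared first; the lower nibbles are examined only
    when the upper nibbles are equal (the inconclusive case). The bit is 1
    when a >= b and 0 when a < b.
    """
    a_hi, b_hi = a >> 4, b >> 4
    if a_hi != b_hi:
        return (1 if a_hi > b_hi else 0), False
    a_lo, b_lo = a & 0x0F, b & 0x0F
    return (1 if a_lo >= b_lo else 0), True


def early_resolution_rate(b):
    """Fraction of all 8-bit values of a that are resolved by the upper nibble alone."""
    early = sum(1 for a in range(256) if not threshold_precompute(a, b)[1])
    return early / 256
```

If the probability values were uniformly distributed over the 8-bit range, 15 out of 16 comparisons would be settled by the upper four bits for any threshold; the actual proportion depends on the distribution of the values.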

The proposed architecture was implemented for a maximum range of 8 feet. Accordingly the size of the ring counter is 8 bits. For power optimization, the counter is divided into 3 blocks of 3 flip-flops each wherein the output of the last flip-flop is unused. The circuit diagram of the implementation of the ring counter is shown in fig. 8.

Fig. 8 Implementation of low-power ring counter.

The two lookup tables mentioned above store the pre-computed values of $E_a(\theta) R^2$ and $E_a(\theta) \delta^2$ for varying values of $\theta$. Each lookup table hence contains 8 rows corresponding to the 8 distinct range readings. The implementation of one such row is shown in Fig. 9.

Fig. 9 Implementation of a row of the lookup table.

The row is enabled by a bit of the ring counter R1 through a tri-state buffer; the first 8 bits correspond to $\theta$ = 0º. We notice that the values of $E_a(\theta) R^2$ and $E_a(\theta) \delta^2$ are identical for $\theta$ values +7.5º and -7.5º; the next 8 bits correspond to these values of $\theta$. The remaining 8 bits contain the threshold value for a given range R. This threshold value is set to $R^2/2$. The registers employed are implemented with a clock gating structure such that they are clocked only when being loaded with a value. Since the values corresponding to $\theta$ = +7.5º and -7.5º are identical and the value corresponding to $\theta$ equal to 15º is always zero, 24 bits in each row of L1 and 16 bits in each row of L2 suffice. The selected values from the two lookup tables are given to two 8-bit subtractors in parallel. The output of each of the subtractors and the threshold value stored in the least significant 8 bits of L1 are fed to the thresholding unit to calculate the final probability values of the occupancy cells.
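A possible software model of the row layout and the subtract-then-threshold step is sketched below. The field order, the integer truncation, and the values $E_a(0º) = 1$ and $E_a(\pm 7.5º) = 0.75$ (which follow from an assumed quadratic angular profile over a 30º aperture) are assumptions for illustration; the paper fixes only the field widths and the threshold $R^2/2$.

```python
def l1_row(r_ft):
    """Pack an assumed L1 row as [E_a(0)*R^2 | E_a(7.5)*R^2 | R^2/2], 8 bits each."""
    r2 = r_ft * r_ft
    if not 1 <= r_ft or r2 > 255:
        raise ValueError("R^2 does not fit in 8 bits")
    center = r2              # E_a(0 deg) = 1 under the assumed profile
    side = (3 * r2) // 4     # E_a(+/-7.5 deg) = 0.75, truncated to an integer
    threshold = r2 // 2      # threshold R^2/2 in the least significant 8 bits
    return (center << 16) | (side << 8) | threshold


def l2_row(delta_ft):
    """Pack an assumed L2 row as [E_a(0)*delta^2 | E_a(7.5)*delta^2], 8 bits each."""
    d2 = delta_ft * delta_ft
    if d2 > 255:
        raise ValueError("delta^2 does not fit in 8 bits")
    return (d2 << 8) | ((3 * d2) // 4)


def unpack_l1(row):
    """Split a 24-bit L1 row into (center, side, threshold)."""
    return (row >> 16) & 0xFF, (row >> 8) & 0xFF, row & 0xFF


def threshold_bits(r_ft, delta_ft):
    """Thresholded outputs for theta = 0 and theta = +/-7.5 deg (1 if difference >= threshold)."""
    center, side, threshold = unpack_l1(l1_row(r_ft))
    row2 = l2_row(delta_ft)
    center2, side2 = (row2 >> 8) & 0xFF, row2 & 0xFF
    return int(center - center2 >= threshold), int(side - side2 >= threshold)
```

With an 8-foot maximum range, $R^2$ is at most 64, so every field fits in 8 bits; this would not hold for the full 20-foot range without wider fields or scaling.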

IV. RESULTS

The proposed architecture was implemented on a Virtex-E field programmable gate array and simulated using Xilinx tools. The schematic for the proposed architecture was drawn using the Xilinx schematic tool, and the results were validated using the ISE Simulator built into Xilinx ISE 9.2i. Performance metrics such as the number of slices used, maximum combinational path delay, and register and gate count were extracted from the reports generated after synthesis and implementation of the design. These values are tabulated in Table I and Table II.

TABLE I
Values of parameters on implementation

Parameter                      Count
Slices                         114
LUTs                           39
Register count                 485
Total equivalent gate count    4324

TABLE II
Maximum combinational path delay on implementation

Parameter                           Value (ns)
Maximum combinational path delay    8.016

Full custom layout of this architecture is in progress and will provide a more comprehensive estimate of the power consumption of the proposed architecture.


V. CONCLUSION

In the present work we have demonstrated the implementation of an ultra low-power embedded system which computes the probability values of cells for a complete 360º panorama. Since power consumption in an FPGA depends on logic utilization, an attempt has been made here to minimize the logic utilization and routing resources. Routing resources consume 45% of FPGA power, and LUTs consume 68% of the total power for logic on an FPGA [8]. Here, the use of lookup tables has eliminated real-time multiplication. In the thresholding unit described in [6], the subtractor has been replaced by an MSB comparator. A ring counter has been used instead of a binary or Gray counter whose count must be decoded for distance measurement. This requires less logic and consumes less power [9]. By eliminating the need to store probability values for 2 of the 4 cells for a given delta, logic consumption has been further reduced, leading to reduced power consumption. The number of slices in this architecture is 114, which is 82.6% fewer than in the architecture in [6].

ACKNOWLEDGEMENT

We wish to acknowledge the support provided by PES Institute of Technology, Bangalore, India for carrying out this work. We also acknowledge the excellent support provided by the Centre for Intelligent Systems for completion of this work.

REFERENCES

[1] K. Konolige, “Improved Occupancy Grids for Map Building”, Autonomous Robots, Vol. 4, No. 4, pp. 351-367, 1997.

[2] A. Elfes, “Using Occupancy Grids for Mobile Robot Perception and Navigation”, Computer, Vol. 22, Issue 6, IEEE Computer Society, June 1989.

[3] A. Elfes, “Sonar-Based Real World Mapping and Navigation”, IEEE J. Robotics and Automation, Vol. RA-3, No. 3, June 1987.

[4] H. P. Moravec and A. Elfes, “High-Resolution Maps from Wide-Angle Sonar”, Proc. IEEE, CS Press, Los Alamitos, Calif., March 1985.

[5] Prabhakar Mishra, H. N. Shankar et al., “A Fuzzy Controller for a Multi-Sensor Based Autonomous Robot Navigating in an Unknown Environment”, IEEE International Conference on Signal and Image Processing, ICSIP, Hubli, India, Dec 2006.

[6] Prabhakar Mishra, H. N. Shankar, B. Rajeshwari, Pai Dipti G, “A Time-Power Efficient Fixed-Point Single Purpose Processor for Mapping Using Ultrasound Sensors”, Submitted to ICUMT 2009, St. Petersburg, Russia.

[7] M. Mottaghi-Dastjerdi, A. Afzali-Kusha, and M. Pedram, “BZ-FAD: A Low-Power Low-Area Multiplier Based on Shift-and-Add Architecture”, IEEE Trans. on VLSI Systems, 2008.

[8] Vijay Degalahal, Tim Tuan, “Methodology for High Level Estimation of FPGA Power Consumption”, ASP-DAC 2005.

[9] Hichem Belhadj, Vishal Aggrawal, Ajay Pradhan, and Amal Zerrouki, Actel, “Power-aware FPGA design”, Programmable Logic Design Line (02/17/2009).
