data compression project presentation

Stockpile Resource Center – Aircraft Compatibility Summer Work Presentation: Graflab Data Compression Study Myuran Kanga August 12, 2010 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Upload: myuran-kanga-ms-mba

Post on 07-Aug-2015


TRANSCRIPT

  1. Stockpile Resource Center, Aircraft Compatibility
     Summer Work Presentation: Graflab Data Compression Study
     Myuran Kanga
     August 12, 2010
     Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
  2. Presentation Outline
     - Introduction
     - Project Overview
     - Sam Stearns
     - Data Compression
     - Uses for Data Compression
     - Types of Data Compression
     - Three Algorithms
     - Testing Procedure
     - Compression/Decompression Example
     - Findings
     - Conclusion
  4. Introduction
     Myuran Kanga
     - Bachelor's Degree: Oklahoma State University, Electrical Engineering
     - Master's Fellowship Program: Rice University, Electrical Engineering (Communications Specialization)
     - Sandia: Meaningful Work/Projects:
       - Team assimilation
       - Shaker testing
       - Cadence ORCAD electronic design software familiarization
       - ORCAD installation/licensing procedure documentation
       - Courses: Quality for Project Management, Engineering Excellence, Labview Core I, and Labview Core II
       - Graflab Data Compression Study/Evaluation
  6. Project Overview: Graflab Data Compression Study
     Summary: Evaluation of three data compression algorithms created by Dr. Samuel D. Stearns.
     Primary Investigator/Technical Project Lead: Myuran Kanga
     Key Personnel: Jerry Cap and Troy Skousen
     Biography of the author of the compression algorithms, Dr. Sam Stearns [1]:
     - Electrical engineer specializing in digital signal processing and adaptive signal processing
     - Distinguished Member of the Technical Staff at Sandia National Laboratories for 27 years; retired in 1996
     - Author or co-author of 7 signal processing textbooks
     - Professor Emeritus at the University of New Mexico, involved with teaching and research at the university since 1960
  7. Project Overview: Graflab Data Compression Study (cont.)
     Project: Evaluation and interpretation of three data compression algorithms.
     - Algorithms are labeled 2, 3, and 4
     - Code is written in Matlab
     - The three are similar in nature
     - Each successive algorithm implements additional, more sophisticated methods of compression
     - The more complex algorithms are said to require longer computational time but to give greater accuracy
     - The hope is to use compression with GRAFLAB
     - GRAFLAB is a database, analysis, and plotting package used for data reduction, analysis, and archival purposes at Sandia.
  9. What is Data Compression? [2]
  10. Data Compression
      Definition: The process of encoding information using fewer units of storage than an un-encoded representation of the data would use, through specific encoding schemes. [3]
      Data compression, sometimes called source coding, is the process of converting input data into another data stream that is smaller but retains the essential information contained in the original data stream.
  12. Data Compression Implementations
      - Compression is useful because it reduces the consumption of resources such as hard disk space or transmission bandwidth.
      - With the growing interest in, and surge of, environmental test data for the Surveillance Program, computer storage resources will come under significant strain.
      - Archiving of environmental test data from legacy systems, including data for the Environment Test lab.
      - Familiar examples of compressed files include the .zip and .rar file extensions; a .tar archive is compressed only when combined with a compressor, as in .tar.gz. [4]
  14. Lossless vs. Lossy Compression
      Two forms of compression: lossless and lossy.
      Lossless compression:
      - These algorithms usually exploit statistical redundancy to represent the user's data more concisely without error.
      - Most real-world data has statistical redundancy.
      - Example: In English text, the letter e is much more common than the letter z. Similarly, the probability that the letter q will be followed by the letter z is very small.
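The redundancy argument on this slide can be made concrete with Shannon entropy, which measures the average number of bits per symbol that an ideal lossless coder would need given only the symbol frequencies. The following is a minimal Python sketch, not part of the study (whose code was written in Matlab); the function name `entropy_bits_per_symbol` and the sample sentence are assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text):
    """Shannon entropy of the symbol frequencies in `text`, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


sample = "the quick brown fox jumps over the lazy dog and then the dog sleeps"

# Entropy with the actual (skewed) letter frequencies of the sample.
print(round(entropy_bits_per_symbol(sample), 3))

# Bits per symbol needed if every distinct symbol were equally likely.
print(round(math.log2(len(set(sample))), 3))
```

The first number is smaller than the second whenever some symbols occur more often than others, and that gap is the redundancy a lossless coder can remove.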
  15. Lossless vs. Lossy Compression
      Lossy compression:
      - Guided by research on how people perceive the data in question.
      - Used when some loss of fidelity is acceptable.
      - As an example, the human eye is more sensitive to subtle variations in luminance than to variations in color. Color detail can therefore be reduced while the perceived integrity of the image is maintained.
      - JPEG image compression works in part by rounding off some of this less important information.
      - Lossy data compression provides a method of obtaining the best fidelity for a given amount of compression.
  17. Compression Algorithms [5]
      Compression 2:
      - Quantizes the data signal and packs the result into a sequence of bytes.
      Compression 3:
      - Predicts the quantized data and packs the prediction error into a sequence of bytes.
      Compression 4:
      - Said to provide the maximum compression
      - Encodes the prediction error into a sequence of bytes using adaptive arithmetic coding.
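Packing quantized values into a sequence of bytes, the step attributed to Compression 2 above, can be sketched as follows. This is an illustrative Python example and not the algorithm's actual Matlab code; the helper names `pack_codes` and `unpack_codes` and the 5-bit code width are assumptions.

```python
def pack_codes(codes, bits):
    """Pack non-negative integers, each smaller than 2**bits, into bytes."""
    acc = 0
    for code in codes:
        if not 0 <= code < (1 << bits):
            raise ValueError("code does not fit in the given bit width")
        acc = (acc << bits) | code
    total_bits = bits * len(codes)
    pad = (-total_bits) % 8          # zero bits appended to reach a whole byte
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def unpack_codes(data, bits, count):
    """Recover `count` codes of width `bits` from bytes made by pack_codes."""
    acc = int.from_bytes(data, "big")
    acc >>= len(data) * 8 - bits * count   # drop the padding bits
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]


codes = [i % 32 for i in range(1000)]      # hypothetical 5-bit quantizer output
packed = pack_codes(codes, 5)
assert unpack_codes(packed, 5, len(codes)) == codes
print(len(packed))                         # 625 bytes for 1000 five-bit codes
```

Stored as 8-byte floating-point values, the same 1000 samples would take 8000 bytes, so the saving in a scheme of this kind comes entirely from how few bits each quantized value needs.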
  18. Compression Algorithms (cont.)
      Quantization [6]
      - The process of approximating a continuous range of values with a relatively small set of discrete symbols or integer values.
      - Sampling occurs on a periodic basis to convert the continuous signal to discrete values.
      - Can be viewed as accumulating data in bins.
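The "bins" view of quantization can be illustrated with a uniform quantizer: each sample is replaced by the index of the nearest multiple of a step size, so the reconstruction error is at most half a step. This is a minimal Python sketch under that assumption; the step size of 0.05, the test signal, and the function names are illustrative and do not come from the study.

```python
import math


def quantize(samples, step):
    """Return the integer bin index of the nearest multiple of `step`."""
    return [round(x / step) for x in samples]


def dequantize(indices, step):
    """Map bin indices back to the centre value of each bin."""
    return [k * step for k in indices]


step = 0.05                                            # assumed bin width
signal = [math.sin(2 * math.pi * n / 50) for n in range(200)]
indices = quantize(signal, step)
restored = dequantize(indices, step)

# Worst-case reconstruction error; bounded by half the bin width.
worst = max(abs(a - b) for a, b in zip(signal, restored))
print(len(set(indices)), worst <= step / 2 + 1e-12)
```

A larger step gives fewer distinct bins, and so fewer bits per sample, at the cost of a larger maximum error.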
  19. Compression Algorithms (cont.)
      Linear Prediction [7]
      - Signal processing tool in which future values of a digital signal are estimated as a linear function of previous samples in the data.
      - Components: time-varying digital filter, excitation function, desired output y(n)
      - Goal: find the excitation function and filter coefficients that minimize the error between the predicted y(n) and the original y(n).
      - Also called Linear Predictive Coding
      - Common application: speech compression
        - Transmit only the filter coefficients (Hk) and the excitation sequence x(n)
        - For extreme compression, transmit only the filter coefficients and use a fixed-frequency excitation (voice coder)
      Equations:
      $$y(n) = \sum_{j=1}^{N} a_j\, y(n-j) + \sum_{j=0}^{M} b_j\, x(n-j)$$
      $$y(n) = \sum_{j=1}^{N} a_j\, y(n-j) + e(n)$$
      $$\hat{y}(n) = \sum_{j=1}^{N} a_j\, y(n-j)$$
      $$e(n) = y(n) - \hat{y}(n)$$
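The relation between a signal, its prediction, and the prediction error e(n) can be sketched in Python. The example uses fixed coefficients a_1 = 2 and a_2 = -1 (a straight-line extrapolation from the two previous samples), which is an assumption made to keep the sketch short; a real linear predictive coder would fit the coefficients to the data, and nothing here reproduces the study's Matlab code. Samples before n = 0 are taken as zero.

```python
import math


def predict_residual(y, a):
    """e(n) = y(n) - sum over j of a[j-1] * y(n-j); earlier samples count as 0."""
    e = []
    for n in range(len(y)):
        y_hat = sum(a[j - 1] * y[n - j] for j in range(1, len(a) + 1) if n - j >= 0)
        e.append(y[n] - y_hat)
    return e


def reconstruct(e, a):
    """Invert predict_residual: rebuild y(n) from the residual and coefficients."""
    y = []
    for n in range(len(e)):
        y_hat = sum(a[j - 1] * y[n - j] for j in range(1, len(a) + 1) if n - j >= 0)
        y.append(e[n] + y_hat)
    return y


# Quantized sine wave as integer samples (hypothetical test signal).
y = [round(1000 * math.sin(2 * math.pi * n / 200)) for n in range(1000)]
a = [2, -1]                       # assumed fixed predictor coefficients
e = predict_residual(y, a)

assert reconstruct(e, a) == y     # integer arithmetic, so the round trip is exact
print(max(abs(v) for v in y), max(abs(v) for v in e[2:]))
```

After the first two start-up samples, the residual spans a far smaller range than the signal itself, so it needs fewer bits per sample. That is the reason to encode the prediction error instead of the raw quantized data.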
  20. Compression Algorithms (cont.)
      Arithmetic Coding [8]
      - Long data strings are represented by a single number, which is obtained by repeatedly partitioning the range of possible values in proportion to the probabilities of the symbols in the data string.
      - Example string: DABDDB

      Symbol   Part 1      Part 2 (freq. product)   Total
      D        6^5 x 3     (none)                   23328
      A        6^4 x 0     3                            0
      B        6^3 x 1     3 x 1                      648
      D        6^2 x 3     3 x 1 x 2                  648
      D        6^1 x 3     3 x 1 x 2 x 3              324
      B        6^0 x 1     3 x 1 x 2 x 3 x 3           54
      Sum                                           25002

      Coded data: any value in the range from the sum to the sum plus the product of all frequencies, [25002, 25002 + 3 x 1 x 2 x 3 x 3 x 2) = [25002, 25110), for example 25100.
      Part 1:
      - 6-symbol string, so the radix is 6
      - Each power of 6 is multiplied by the index of the letter (A = 0 to D = 3)
      Part 2:
      - Multiply by the accumulated product of the frequencies of the preceding symbols in the data
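The DABDDB arithmetic can be checked with a short Python sketch. It computes the lower bound as the sum of radix power times symbol index times the product of the frequencies of the preceding symbols, and the upper bound by adding the product of all frequencies. The function name `radix_encode` is an assumption, and the symbol index is taken as the cumulative frequency of the alphabetically earlier symbols, which for this string gives A = 0, B = 1, D = 3 as on the slide. This is a static illustration, not the adaptive arithmetic coder used by Compression 4.

```python
from collections import Counter


def radix_encode(message):
    """Return the (lower, upper) integer bounds that identify `message`."""
    n = len(message)
    freq = Counter(message)

    # Cumulative frequency of all symbols that sort before each symbol.
    cum, running = {}, 0
    for sym in sorted(freq):
        cum[sym] = running
        running += freq[sym]

    lower, prod = 0, 1
    for i, sym in enumerate(message):
        lower += n ** (n - 1 - i) * cum[sym] * prod   # Part 1 times Part 2
        prod *= freq[sym]                             # accumulate frequency product
    return lower, lower + prod


print(radix_encode("DABDDB"))   # (25002, 25110)
```

Any integer in the half-open range, such as 25100, identifies the string once the symbol frequencies are known.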
  22. Evaluation Procedure/Analysis
      Classical Waveform Compression Study:
      - Triangle Wave
      - Trapezoid Wave
      - Sine Wave
      - Sawtooth Wave
      - Hanning Window
      - Harmonic Sine Waves
      - Combined Sine Waves
      - Gap Analysis
      - White Noise
      - Sine Wave with Noise
      - Power Spectral Density
      - Square Wave
      - .wav File
      Waveforms were created manually in individual m-files so that the vector arrangement in Matlab is predictable. Frequencies and signal durations are easily modifiable.
  23. Waveform Examples
      [Figure: four example waveforms (Trapezoid Wave, White Noise, Gap Analysis, Sawtooth Wave). Each is shown in three panels, Original, Decompressed Waveform, and Difference, plotted as Amplitude versus Time (Seconds).]
  24. Testing and Measurements
      Implemented Analysis and Measurements:
      - Input and output data array sizes
      - Percentage accuracy of compression
      - Compression ratio
      - Relative computational time
      - Percent difference: max. and min. values of original and decompressed waveforms
      - Percent difference: standard deviation value of original and decompressed waveforms
      - Percent error: max. and min. values of original and decompressed waveforms
      - Percent error: standard deviation value of original and decompressed waveforms
      - Root Mean Square values of original and decompressed waveforms
      - Normal values of original and decompressed waveforms
      - Difference in RMS values
      - Difference in Normal values
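Several of the measurements listed above have common textbook definitions, sketched here in Python. The slides do not give the exact formulas used in the study, so every definition below is an assumption: compression ratio as original size over compressed size, percent error relative to the original value, percent difference relative to the mean of the two values, and the sample standard deviation (N - 1 divisor). The two short waveforms are made-up values for illustration.

```python
import math
import statistics


def compression_ratio(original_size, compressed_size):
    """Original size divided by compressed size; 1.0 means no compression."""
    return original_size / compressed_size


def rms(x):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def percent_error(reference, measured):
    """Absolute error relative to the reference value, in percent."""
    return 100.0 * abs(measured - reference) / abs(reference)


def percent_difference(a, b):
    """Absolute difference relative to the mean magnitude of a and b, in percent."""
    return 100.0 * abs(a - b) / ((abs(a) + abs(b)) / 2)


original = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
decompressed = [0.0, 0.99, 2.01, 1.0, 0.0, -1.0, -1.99, -1.01]

print(percent_error(max(original), max(decompressed)))
print(percent_difference(min(original), min(decompressed)))
print(percent_error(statistics.stdev(original), statistics.stdev(decompressed)))
print(rms(original) - rms(decompressed))
```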
  26. Compression/Decompression Example
      - An m-file was written to create this .wav file for real-world compression/decompression testing.
      - Using Compression 4, the compression ratio of the file was 1.52, with an accuracy of 99.6078 percent.
      - Compressed output using Compression 2 and 4: turn up your volume, because the amplitude of the compressed file is much lower.
      - Compressed data should not resemble the original data string, so this example demonstrates the inefficiency of Compression 2.
      Audio clips: Original Song; Compressed Song (Compression 2); Compressed Song (Compression 4); Decompressed Song
  28. Findings
      Compression 2:
      - Generally, this algorithm produced a compression ratio of about 1 (a ratio of 1 means no compression). For simple waveforms like the square wave, compression did occur.
      - Fastest compression algorithm of the three
      - Inefficient compression
      Compression 3 and 4:
      - Compression ratio increases with increased data length/duration
      - Increased data length/duration causes longer calculation times (within limits)
      - Compression 4 produced a much higher compression ratio than the other algorithms
      - Compression 4 is the slowest algorithm
      Special cases (all three compression methods):
      - The square wave produces 100% accuracy and very high compression with all three algorithms
      - White noise does not seem to compress much past a ratio of 1
      - Code has been modified to handle gaps in the input data
      - The accuracy of compression/decompression for all three algorithms has proven to be above 99% in all cases
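The two special cases, a square wave that compresses very well and white noise that barely compresses at all, are general properties of lossless coding and can be reproduced with any general-purpose compressor. The Python sketch below uses zlib (DEFLATE) from the standard library as a stand-in; it is not one of the three algorithms studied, and the signal length, period, and 8-bit sample format are assumptions.

```python
import random
import zlib


def ratio(samples_8bit):
    """Compression ratio (raw size / compressed size) of 8-bit samples under zlib."""
    raw = bytes(samples_8bit)
    return len(raw) / len(zlib.compress(raw, 9))


n = 20000
# Square wave: 100 samples low, 100 samples high, repeated.
square = [255 if (i // 100) % 2 else 0 for i in range(n)]

# White noise: independent, uniformly distributed 8-bit samples (fixed seed).
rng = random.Random(0)
noise = [rng.randrange(256) for _ in range(n)]

print(round(ratio(square), 1), round(ratio(noise), 3))
```

The square wave is almost entirely redundant, so its ratio is large, while the noise samples carry close to 8 bits of information each and the ratio stays near 1.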
  30. Future Work
      - Similar waveform analysis with the raw data files provided by Dr. Sam Stearns
      - Additional error or warning messages:
        - Noise
        - Gaps
        - Invalid array data
      - Implementation of compression algorithms into the Graflab database
      - Investigate possibilities of real-time compression/decompression
      Recommendations:
      - Filter noise from data prior to compression
      - Compress all data, disregarding size
      - Continue implementation of replacing gaps with zeros
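The recommendation to replace gaps with zeros can be sketched in Python as a preprocessing step before compression. How gaps appear in the Graflab data is not stated in the slides; this example assumes they arrive as `None` or NaN values, and the function name `fill_gaps` is hypothetical.

```python
import math


def fill_gaps(samples):
    """Return a copy of `samples` with missing values (None or NaN) set to 0.0."""
    return [0.0 if v is None or math.isnan(v) else v for v in samples]


print(fill_gaps([0.5, float("nan"), None, -0.25]))   # [0.5, 0.0, 0.0, -0.25]
```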
  31. Summer Work Applicability / Benefit [9] [10]
      - Applicability to our organization
        - Meaningful work
        - Storing new and legacy environmental test data from the surveillance program
        - Environmental Test lab data storage
      - Opportunity to continue education
        - Improved Matlab skills
        - Introduction to Labview
        - ORCAD familiarity
        - Organizational and leadership skills (management course)
      - Assimilation to Albuquerque, the work environment at Sandia National Laboratories, and Aircraft Compatibility
  32. Citations and Questions
      [1] University of New Mexico ECE, Dr. Samuel D. Stearns, 2010. [Online]. Available: http://www.ece.unm.edu/faculty/stearns/. [Accessed: July 2010].
      [2] Plus Magazine, Text, Bytes and Videotape, January 1, 2003. [Online]. Available: http://plus.maths.org/issue23/features/data/data.jpg. [Accessed: August 2010].
      [3] Wikipedia, Data compression, July 20, 2010. [Online]. Available: http://en.wikipedia.org/wiki/Data_compression. [Accessed: August 2010].
      [4] Hoax-Slayer.com, Burning-hard-drive, 2010. [Online]. Available: http://www.hoax-slayer.com/images/burning-hard-drive.jpg. [Accessed: August 2010].
      [5] S. Stearns, Encoding and Decoding of Instrumentation and Telemetry Waveforms. Samuel D. Stearns: Sandia National Laboratories. January 25, 2008.
      [6] Wikipedia, Quantization (signal processing), July 2, 2010. [Online]. Available: http://en.wikipedia.org/wiki/Quantization_(signal_processing). [Accessed: June 2010].
      [7] Connexions, Linear Prediction and Cross Synthesis, March 18, 2008. [Online]. Available: http://cnx.org/content/m15478/latest/. [Accessed: June 2010].
      [8] Wikipedia, Arithmetic coding, August 7, 2010. [Online]. Available: http://en.wikipedia.org/wiki/Arithmetic_coding. [Accessed: June 2010].
      [9] Rice University, Home page, 2010. [Online]. Available: http://www.rice.edu. [Accessed: August 2010].
  33. Citations and Questions (cont.)
      [10] Sandia National Laboratories, Home page, 2010. [Online]. Available: http://www.sandia.gov. [Accessed: August 2010].
      [11] T. Skousen. (private communication). 2010.
      [12] J. Cap. (private communication). 2010.