DIMENSION REDUCTION FOR HYPERSPECTRAL IMAGING: FIRST SEMESTER PROGRESS REPORT
Yiran Li AMSC 663
Advisors: John Benedetto, Wojtek Czaja Department of Mathematics
Background
Light is described in terms of its wavelength.
A reflectance spectrum shows the reflectance of a material measured across a range of wavelengths. It can help identify certain materials uniquely.
Hyperspectral images are three dimensional (x-coordinate, y-coordinate, spectrum).
Each pixel has its own spectrum, and different spectra may represent different materials.
An image can have over 100 spectral bands and a large number of pixels.
Spectrum and hyperspectral imagery
Left: Reflectance spectra measured by laboratory
spectrometers for three materials: a green bay laurel leaf, the
mineral talc, and a silty loam soil.
Right: The concept of hyperspectral imagery. (Shippert, 2003)
Project Goal
Reduce dimensionality of hyperspectral imaging data
Because hyperspectral imaging data contains a large amount of (possibly redundant) information, we want to reduce the dimensionality (and thus the size) of the data while preserving key features of the original data.
Two algorithms, Laplacian eigenmaps and Randomized Principal Component Analysis, are tested and compared.
Laplacian eigenmaps: the idea
We view each pixel of the hyperspectral image as a node of a graph G, and the distance between two nodes is measured by the Euclidean distance between their spectra:
Distance $= \|x_i - x_j\|$
Because the spectrum is long (usually hundreds of bands), we want to map the graph G to a lower dimensional space so that the size of the data is reduced and connected points stay as close together as possible. Let $y = (y_1, y_2, \dots, y_n)^T$ be such a map. Our goal is to minimize
$\sum_{i,j} (y_i - y_j)^2 W_{ij}$
where
$W_{ij} = e^{-\|x_i - x_j\|^2 / t}$
is the weight on each edge (large if points are close), and t is the time parameter of the heat kernel. In my code, I chose t = 10000, due to the large distances between nodes.
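As a rough illustration of the weights and the objective above, here is a minimal Python sketch (not the project's MATLAB code). The function names `heat_kernel_weight` and `objective` and the toy spectra are hypothetical, and the sum runs over all pairs of points, which is an assumption made only to keep the example short.

```python
import math

def heat_kernel_weight(x_i, x_j, t):
    """W_ij = exp(-||x_i - x_j||^2 / t) for two spectra given as lists."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / t)

def objective(y, spectra, t):
    """Sum over all pairs (i, j) of (y_i - y_j)^2 * W_ij for a 1-D map y."""
    n = len(spectra)
    total = 0.0
    for i in range(n):
        for j in range(n):
            w = heat_kernel_weight(spectra[i], spectra[j], t)
            total += (y[i] - y[j]) ** 2 * w
    return total

# Toy data (made up): three 2-band "spectra"; the first two are close together.
spectra = [[100.0, 200.0], [110.0, 190.0], [900.0, 50.0]]
t = 10000.0  # value quoted on the slide

w_close = heat_kernel_weight(spectra[0], spectra[1], t)  # squared distance 200
w_far = heat_kernel_weight(spectra[0], spectra[2], t)    # squared distance 662500

# A map that keeps the two close points together costs less than one
# that separates them.
cost_together = objective([0.0, 0.0, 1.0], spectra, t)
cost_apart = objective([0.0, 1.0, 0.0], spectra, t)
```

With a small t such as t = 1, even the close pair would get a weight of about e^{-200}, which is practically zero. This is consistent with the slide's reason for choosing a large t when distances between nodes are large.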
Laplacian eigenmaps: the idea
Since
$\sum_{i,j} (y_i - y_j)^2 W_{ij} = 2 y^T L y$,
where
$D_{ii} = \sum_j W_{ji}$
and
$L = D - W$,
the problem of finding $\arg\min_y y^T L y$ given that $y^T D y = 1$, $y^T D \mathbf{1} = 0$ becomes the minimum eigenvalue problem:
$L f = \lambda D f$ (Belkin, Niyogi, 2002)
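The identity above follows from expanding the square. The derivation below is a short sketch that assumes W is symmetric, which holds for the heat-kernel weights since they depend only on the distance between the two points:

```latex
\begin{aligned}
\sum_{i,j} (y_i - y_j)^2 W_{ij}
  &= \sum_{i,j} \left( y_i^2 + y_j^2 - 2 y_i y_j \right) W_{ij} \\
  &= \sum_i y_i^2 D_{ii} + \sum_j y_j^2 D_{jj} - 2 \sum_{i,j} y_i y_j W_{ij} \\
  &= 2\, y^T D y - 2\, y^T W y \\
  &= 2\, y^T (D - W)\, y = 2\, y^T L y .
\end{aligned}
```

The constraint $y^T D y = 1$ removes the arbitrary scaling of y, and $y^T D \mathbf{1} = 0$ rules out the constant solution, which would otherwise give the trivial minimum of zero.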
Laplacian eigenmaps: the algorithm
Step 1: Constructing the Adjacency Graph
Construct a weighted graph with n nodes (n is the number of data points) and a set of edges connecting neighboring points.
In our context, each node represents one pixel of the image, with its position given by its spectrum of length l (an l-dimensional vector).
Two nodes are connected if
$\|x_i - x_j\|^2 < \varepsilon$
In my code, I chose $\varepsilon$ to be 1/5 of the maximum distance between nodes.
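Step 1 could be sketched in Python as follows (an illustration, not the project's MATLAB code). The names `squared_distance` and `epsilon_graph` and the toy data are hypothetical. The slide compares the squared norm with epsilon but describes epsilon as a fraction of the maximum distance; the example squares that fraction before comparing, which is only one possible reading.

```python
import math

def squared_distance(x_i, x_j):
    """Squared Euclidean distance between two spectra given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x_i, x_j))

def epsilon_graph(spectra, eps):
    """Return a symmetric 0/1 adjacency matrix: i ~ j when ||x_i - x_j||^2 < eps."""
    n = len(spectra)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if squared_distance(spectra[i], spectra[j]) < eps:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency

# Toy data (made up): three nearby points and one far away.
spectra = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]

# Assumption: take 1/5 of the maximum pairwise distance and square it so that
# it is comparable with the squared norm in the inequality.
max_dist = max(
    math.sqrt(squared_distance(a, b)) for a in spectra for b in spectra
)
eps = (max_dist / 5) ** 2

adjacency = epsilon_graph(spectra, eps)
```

In this toy example the far-away point ends up with no neighbors. An isolated node has $D_{ii} = 0$, so in practice epsilon needs to be large enough to avoid that case.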
Laplacian eigenmaps: the algorithm
Step 2:
Choosing the weights using the heat kernel:
$W_{ij} = e^{-\|x_i - x_j\|^2 / t}$
Step 3:
Compute eigenvalues and eigenvectors for the generalized eigenvector problem:
$L f = \lambda D f$ (1)
where $W$ is the weight matrix defined earlier, $D$ is the diagonal weight matrix, and $L = D - W$.
Laplacian eigenmaps: the algorithm
Result:
Let $f_0, f_1, \dots, f_{n-1}$ be the solutions of equation (1), ordered such that
$0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{n-1}$
Then the first m eigenvectors (excluding $f_0$),
$\{f_1, f_2, \dots, f_m\}$
are the desired vectors for embedding in m-dimensional Euclidean space.
(Belkin, Niyogi, 2002)
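The three steps and the final selection of eigenvectors can be put together in a short NumPy sketch. This is an illustration under stated assumptions, not the project's MATLAB code: the function name `laplacian_eigenmaps`, the toy data, and the parameter values are hypothetical, and the generalized problem is solved by rewriting it as an ordinary symmetric eigenproblem, which requires every node to have at least one neighbor.

```python
import numpy as np

def laplacian_eigenmaps(X, eps, t, m):
    """Embed the rows of X (n points, l bands) into m dimensions.

    Sketch of Steps 1-3: eps-graph on squared distances, heat-kernel
    weights, then the generalized problem L f = lambda D f.
    """
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)

    # Steps 1 and 2: heat-kernel weight on edges, zero elsewhere, no self-loops.
    W = np.where(sq_dist < eps, np.exp(-sq_dist / t), 0.0)
    np.fill_diagonal(W, 0.0)

    d = W.sum(axis=1)  # diagonal of D; assumes every node has a neighbor
    L = np.diag(d) - W

    # Step 3: L f = lambda D f is equivalent to the symmetric problem
    # (D^-1/2 L D^-1/2) g = lambda g with f = D^-1/2 g.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigenvalues, G = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    F = d_inv_sqrt[:, None] * G

    # Skip f_0 (eigenvalue 0) and keep f_1, ..., f_m as embedding coordinates.
    return eigenvalues[1:m + 1], F[:, 1:m + 1]

# Toy data (made up): two small clusters of 2-band "spectra".
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
vals, Y = laplacian_eigenmaps(X, eps=100.0, t=10.0, m=2)
```

Each row of `Y` is the m-dimensional coordinate of one point. In this toy example the first coordinate takes one sign on the first cluster and the opposite sign on the second. The dense pairwise-difference array grows as n × n × l, so this form is only suitable for small inputs.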
Discussion: dimension reduction
Each eigenvector in $\{f_1, f_2, \dots, f_m\}$ maps the image onto a one-dimensional space.
We pick the m eigenvectors corresponding to the m smallest eigenvalues (excluding $\lambda_0$), so that our minimization goal is satisfied as closely as possible.
The resulting image, represented as a graph (the nodes are the pixels, each located at the vector assigned to that pixel), lies in the m-dimensional space.
Key structure of the graph is preserved.
Discussion: Dimension reduction
Visualization of dimension reduction (Shen-En Qian)
Implementation
Software: MATLAB
Hardware: personal computer
Databases:
• Salinas A scene, SalinasA, 1.5 MB
86*83 pixels, a subscene of Salinas, which was collected by the 224-band AVIRIS sensor over Salinas Valley, California. Contains 6 classes.
• Indian Pines, 6.0 MB
145*145 pixels, gathered by the 224-band AVIRIS sensor over the Indian Pines test site in northwestern Indiana. Contains 16 classes.
(Hyperspectral Remote Sensing Scenes)
Achievements of the semester
Implementation of Laplacian eigenmaps
Understanding of the math behind Laplacian eigenmaps
Code validation
Results compared with ground truth images (verification)
Results with error percentage computed (verification)
Code validation
Compared results directly with Laplacian eigenmaps code that is publicly available (from the DR toolbox from Delft University)
Ran my code and the code from the Delft University toolbox on the same data sets (pseudo)
Compared the results directly by looking at the eigenvalues and eigenvectors generated by each algorithm
Verification: ground truth image
The ground truth image is the classification of the hyperspectral image based on the real objects in the scene
Left: Indian Pines hyperspectral image
Right: groundtruth image (Hyperspectral Remote Sensing Scenes)
Verification: ground truth image
Ground truth classes for the Indian Pines scene and their respective sample counts
#   Class                          Samples
1   Alfalfa                        46
2   Corn-notill                    1428
3   Corn-mintill                   830
4   Corn                           237
5   Grass-pasture                  483
6   Grass-trees                    730
7   Grass-pasture-mowed            28
8   Hay-windrowed                  478
9   Oats                           20
10  Soybean-notill                 972
11  Soybean-mintill                2455
12  Soybean-clean                  593
13  Wheat                          205
14  Woods                          1265
15  Buildings-Grass-Trees-Drives   386
16  Stone-Steel-Towers             93
Verification: classification of pixels
In order to verify that the image of reduced dimension preserves key structure of the original data, we classify the vectors produced by the algorithm based on ground truth data.
In each category of the ground truth image, pick one pixel as a representative (training data).
In the vector space of reduced dimension, identify the vectors at the locations of the training data.
Verification: 1NN-classifier
Classify each vector in the same category as its nearest training vector, nearest in terms of distance (k-nearest-neighbor classification with k = 1)
Example of a KNN classifier (Wikipedia)
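A minimal Python sketch of a 1-nearest-neighbor rule, consistent with the slide title, is shown here (not the project's MATLAB code). It assumes one training vector per class and Euclidean distance in the reduced space. The function names and the toy vectors are hypothetical.

```python
import math

def nearest_training_label(vector, training):
    """Return the label of the training vector closest to `vector`.

    `training` is a list of (label, vector) pairs, one pair per class.
    """
    best_label, best_dist = None, math.inf
    for label, train_vec in training:
        dist = math.dist(vector, train_vec)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def classify_all(vectors, training):
    """Label every embedded vector by its nearest training vector."""
    return [nearest_training_label(v, training) for v in vectors]

# Toy data (made up): two classes in a 2-dimensional reduced space.
training = [(1, [0.0, 0.0]), (2, [1.0, 1.0])]
embedded = [[0.1, -0.1], [0.9, 1.2], [0.4, 0.3], [0.7, 0.6]]
predicted = classify_all(embedded, training)
```

Because each class is represented by a single pixel, the outcome depends strongly on which pixel is picked as the representative.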
Results: SalinasA vs groundtruth
Left: my result
Right: ground truth
Quantitative analysis: error percentage
Calculated the percentage of pixels whose class disagrees with the ground truth data set.
Percentage: ≈7.1%
Possible reasons: unrepresentative training data; dimension is too low; epsilon is too large
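The error percentage can be computed along the following lines. This is a Python sketch rather than the project's MATLAB code. The function name `error_percentage`, the optional `ignore_label` argument, and the toy labels are hypothetical, and whether unlabeled pixels were excluded in the reported figure is not stated in the slides.

```python
def error_percentage(predicted, ground_truth, ignore_label=None):
    """Percentage of compared pixels whose predicted class differs from ground truth.

    Pixels whose ground truth equals `ignore_label` are skipped (assumption:
    such a label marks pixels without a ground truth class).
    """
    pairs = [(p, g) for p, g in zip(predicted, ground_truth) if g != ignore_label]
    if not pairs:
        raise ValueError("no labeled pixels to compare")
    mismatches = sum(1 for p, g in pairs if p != g)
    return 100.0 * mismatches / len(pairs)

# Toy labels (made up), flattened to one entry per pixel.
predicted = [1, 2, 2, 3, 1, 0, 3, 3]
ground_truth = [1, 2, 3, 3, 1, 0, 2, 3]

pct_all = error_percentage(predicted, ground_truth)
pct_labeled = error_percentage(predicted, ground_truth, ignore_label=0)
```

Two of the eight toy pixels disagree, so `pct_all` is 25.0. Skipping the pixel labeled 0 leaves seven pixels and raises the percentage slightly.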
Time analysis
Running time: 15829.7s ≈ 4.40h
Timeline/milestones
October 17th: Project proposal
Now to November, 2014: Implement and test Laplacian eigenmaps, prepare for implementation of randomized PCA
December, 2014: Midyear presentation
January to March: Implement and test randomized PCA, compare the two methods in various situations
April to May: Final presentation and final report
Deliverables
Presentation of data sets with reduced dimensions from both algorithms (presented as images)
Comparison charts in terms of running time and accuracy of the two methods (half completed)
Comparison charts with other methods that are available from the DR MATLAB toolbox
Data sets (available), MATLAB code (half available), presentations (two available), proposals (completed), mid-year report, final report
Possible modifications
Algorithm: different ways of choosing neighboring nodes (the value of epsilon can be modified)
Verification: use different classification methods, for example K-means
Parallelization in C to improve running time
Summary
Implementation of the algorithm was successful
Validated the code
Verified and compared results to the ground truth image
The eigenvectors of reduced dimensionality preserved the information about the original data set while reducing the complexity of the data and allowing for further processing
Bibliography
Shippert, Peg. Introduction to Hyperspectral Image Analysis. Online Journal of Space Communication, issue No. 3: Remote Sensing of Earth via Satellite. Winter 2003. http://spacejournal.ohio.edu/pdf/shippert.pdf
Hyperspectral Imaging. From Wikipedia. Oct. 6th, 2014. http://en.wikipedia.org/wiki/Hyperspectral_imaging
Belkin, Mikhail; Niyogi, Partha. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, vol 15. Dec. 8th, 2002. Web. http://web.cse.ohio-state.edu/~mbelkin/papers/LEM_NC_03.pdf
Rokhlin, Vladimir; Szlam, Arthur; Tygert, Mark. A Randomized Algorithm for Principal Component Analysis. SIAM Journal on Matrix Analysis and Applications Volume 31 Issue 3. August 2009. Web. ftp://ftp.math.ucla.edu/pub/camreport/cam08-60.pdf
Matlab Toolbox for Dimension Reduction. Delft University. Web. Oct. 6th, 2014. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
IC: Hyperspectral Remote Sensing Scenes. Web. Oct. 6th, 2014. http://www.ehu.es/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes
Qian, Shen-En. Dimensionality reduction of multidimensional satellite imagery. SPIE Newsroom. 21 March 2011. http://spie.org/x45097.xml