Detection and Segmentation of Bird Song in Noisy Environments
DESCRIPTION
Detection and Segmentation of Bird Song in Noisy Environments. Lawrence Neal, UHC Honors Thesis. Bioacoustics Project. Bird species: identifiable by species; presence/absence and activity data are useful. Bird activity may shift in response to climate change and ecological factors.
TRANSCRIPT
![Page 1: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/1.jpg)
Detection and Segmentation of Bird Song in Noisy Environments
Lawrence Neal, UHC Honors Thesis
![Page 2: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/2.jpg)
Bioacoustics Project
Bird Species
◦Identifiable by species
◦Presence/absence and activity data are useful
Bird activity may shift in response to climate change and ecological factors
![Page 3: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/3.jpg)
Bioacoustics Project
![Page 4: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/4.jpg)
Automated Recording
Song Meter automated recorders
Collected May-August, beginning in 2009
![Page 5: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/5.jpg)
Audio Data Analysis
Involves several steps:
◦Extracting Bird Sound from Audio
◦Identifying Bird Species
◦Mapping species data back to sites
![Page 6: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/6.jpg)
Audio Data Analysis
Involves several steps:
◦Extracting Bird Sound from Audio (Segmentation)
◦Identifying Bird Species
◦Mapping species data back to sites
![Page 7: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/7.jpg)
Segmentation
Time-Domain Segmentation
◦Separates audio into multiple clips
◦Energy Thresholding, Onset/Offset Detection
◦Has been applied to bird song (Harma 2003, Fagerlund 2004, Lee 2008)
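A minimal sketch of time-domain segmentation by energy thresholding, assuming NumPy is available. The frame length, the threshold ratio, and the synthetic test clip are illustrative assumptions and are not taken from the cited papers.

```python
import numpy as np

def energy_segments(signal, frame_len=256, threshold_ratio=0.1):
    """Return (start, end) sample indices of runs of high-energy frames."""
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)            # mean power per frame
    active = energy > threshold_ratio * energy.max()
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                              # onset
        elif not is_active and start is not None:
            segments.append((start * frame_len, i * frame_len))  # offset
            start = None
    if start is not None:                          # clip ends while active
        segments.append((start * frame_len, n_frames * frame_len))
    return segments

# Synthetic clip (assumed): silence with two short tone bursts.
fs = 8000
t = np.arange(fs) / fs
clip = np.zeros(fs)
clip[1024:2048] = np.sin(2 * np.pi * 1000 * t[1024:2048])
clip[4096:5120] = np.sin(2 * np.pi * 2000 * t[4096:5120])
segments = energy_segments(clip)
print(segments)
```

Each returned pair is one audio clip. Two sounds that overlap in time would fall into the same clip.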
![Page 8: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/8.jpg)
Segmentation
Time-Domain Segmentation
![Page 9: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/9.jpg)
Segmentation
Time-Domain Segmentation
◦Cannot separate overlapping sounds
![Page 10: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/10.jpg)
Segmentation
Time-Frequency Segmentation
◦Segment regions of the 2D spectrogram
![Page 11: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/11.jpg)
Segmentation
Spectrogram Segmentation
◦Similar to image segmentation
![Page 12: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/12.jpg)
Spectrograms
Two-dimensional representation of sound
◦Audio amplitude at each (time, frequency)
◦Generated by short-time Fourier Transform (STFT)
Male voice saying 'nineteenth century'.
Violin playing (note harmonics)
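A minimal sketch of computing a magnitude spectrogram with a short-time Fourier transform, assuming NumPy. The Hann window, the FFT size of 256, the hop of 128 samples, and the 8 kHz test tone are assumptions chosen for illustration, not the parameters used in the thesis.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# A pure 1 kHz tone should concentrate its energy in a single frequency row.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * fs / 256
print(spec.shape, peak_bin, peak_hz)
```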
![Page 13: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/13.jpg)
Spectrograms
Tradeoffs in parameters
◦Larger STFT size: higher frequency resolution
![Page 14: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/14.jpg)
Spectrograms
Tradeoffs in parameters
◦Shorter step size: higher time resolution
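The tradeoff can be made concrete with a little arithmetic. This sketch assumes a 16 kHz sample rate (a placeholder, since the recorders' actual rate is not stated here) and reports the width of one frequency bin and the spacing between successive spectrogram columns.

```python
def stft_resolution(fs, n_fft, hop):
    """Return (Hz per frequency bin, seconds between successive frames)."""
    return fs / n_fft, hop / fs

fs = 16000  # assumed sample rate in Hz
for n_fft, hop in [(256, 128), (1024, 128), (256, 32)]:
    bin_hz, step_s = stft_resolution(fs, n_fft, hop)
    print(f"n_fft={n_fft:5d} hop={hop:4d} -> {bin_hz:7.3f} Hz/bin, "
          f"{step_s * 1000:.1f} ms/frame")
```

A larger `n_fft` narrows each frequency bin, and a smaller `hop` places spectrogram columns closer together in time. Note that each column still averages over a full `n_fft`-sample window.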
![Page 15: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/15.jpg)
Spectrogram Segments
Each segment is a continuous region
◦Defined by a binary mask over the spectrogram
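One way to turn a binary mask into individual segments is connected-component labeling. The sketch below assumes NumPy and 4-connectivity, and uses a tiny hand-written mask. The thesis may define connectivity differently.

```python
import numpy as np

def label_segments(mask):
    """Label 4-connected regions of a boolean mask; 0 marks background."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue                      # already part of an earlier segment
        current += 1
        labels[r, c] = current
        stack = [(r, c)]
        while stack:                      # flood fill from this seed pixel
            y, x = stack.pop()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    stack.append((ny, nx))
    return labels, current

# Hypothetical mask: rows are frequency bins, columns are time frames.
mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
], dtype=bool)
labels, n_segments = label_segments(mask)
segment_masks = [labels == k for k in range(1, n_segments + 1)]
print(n_segments)
```

Each entry of `segment_masks` is itself a binary mask over the spectrogram, one per segment.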
![Page 16: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/16.jpg)
Spectrogram Segments
Can be converted back to audio with inverse STFT, or left as 2D segments
![Page 17: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/17.jpg)
Segmentation Methods
Per-Pixel Random Forest
◦Trains on one feature vector per pixel
◦Outputs probability per-pixel
Superpixel Merger Method
◦First splits spectrogram into ‘superpixels’
◦Trains on one feature vector per superpixel
◦Second classifier trains per superpixel pair
◦Outputs connected sets of superpixels
![Page 18: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/18.jpg)
Random Forest
Supervised Classifier
◦Trains on human-provided data with labels: “feature vectors” of values, each vector with a yes/no label
◦Learns to mimic the human’s labels
Based on decision trees:
◦Tree is traversed with feature vector X
◦Each interior node is a decision of the type: if (X_d < θ) go left; else go right
◦Each leaf node contains a class label. In this case, two classes: ‘Bird Sound’ and ‘Negative’
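The traversal rule can be sketched in a few lines. The `Node` class, the example tree, and its thresholds are hypothetical and exist only to illustrate the "if X_d < θ go left, else go right" decision.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    label: Optional[str] = None       # set on leaf nodes only
    feature: int = 0                  # d: index into the feature vector
    threshold: float = 0.0            # theta
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def classify(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf and return the leaf's class label."""
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label

# Hypothetical two-level tree over a 2-value feature vector.
tree = Node(feature=0, threshold=0.5,
            left=Node(label="Negative"),
            right=Node(feature=1, threshold=2.0,
                       left=Node(label="Negative"),
                       right=Node(label="Bird Sound")))
print(classify(tree, [0.9, 3.0]))
```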
![Page 19: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/19.jpg)
Random Forest
Constructed by recursive procedure
◦Check if all remaining examples share the same class label. If so, finish with a leaf node
◦Select a random subset of features. For each one, find the optimal split (largest decrease in Gini impurity)
◦Choose the (feature, split) pair with the largest decrease in Gini impurity and create a new interior node
◦Split the examples and recursively create two child nodes
Classification is a vote among all trees
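The recursive procedure and the final vote can be sketched in plain Python. This is a simplified illustration, not the thesis implementation: the split criterion is the decrease in Gini impurity, bootstrap sampling per tree follows Breiman's standard formulation (an assumption, since the slides do not mention it), and the toy data and all names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return (impurity decrease, feature, threshold), or None if no split exists."""
    best, parent = None, gini(y)
    for d in features:
        values = sorted(set(row[d] for row in X))
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[d] < theta]
            right = [lab for row, lab in zip(X, y) if row[d] >= theta]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or parent - child > best[0]:
                best = (parent - child, d, theta)
    return best

def build_tree(X, y, n_features, rng):
    if len(set(y)) == 1:                       # all examples agree: leaf
        return {"label": y[0]}
    features = rng.sample(range(len(X[0])), n_features)
    split = best_split(X, y, features)
    if split is None:                          # nothing to split on: majority leaf
        return {"label": Counter(y).most_common(1)[0][0]}
    _, d, theta = split
    left = [(r, lab) for r, lab in zip(X, y) if r[d] < theta]
    right = [(r, lab) for r, lab in zip(X, y) if r[d] >= theta]
    return {"feature": d, "threshold": theta,
            "left": build_tree([r for r, _ in left], [l for _, l in left], n_features, rng),
            "right": build_tree([r for r, _ in right], [l for _, l in right], n_features, rng)}

def build_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample (assumed)
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_features, rng))
    return forest

def predict_tree(tree, x):
    while "label" not in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["label"]

def predict(forest, x):
    """Majority vote among all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Toy training set (hypothetical): two features, two classes.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.8], [0.8, 0.9], [0.85, 0.7]]
y = ["Negative"] * 3 + ["Bird Sound"] * 3
forest = build_forest(X, y)
print(predict(forest, [0.95, 0.95]), predict(forest, [0.05, 0.05]))
```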
![Page 20: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/20.jpg)
Per-Pixel Training
Hand-Drawn mask over spectrogram
◦Pixels are randomly sampled
![Page 21: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/21.jpg)
Per-Pixel Training
Feature vector includes:
◦Pixel Frequency
◦Window Variance
◦All window pixel values
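A sketch of how such a per-pixel feature vector might be assembled, assuming NumPy. The 5x5 window (`half=2`), the edge padding, and the random stand-in spectrogram are assumptions. The slides do not give the window size.

```python
import numpy as np

def pixel_features(spec, row, col, half=2):
    """Feature vector for one pixel: frequency bin, window variance, window values."""
    padded = np.pad(spec, half, mode="edge")     # replicate edges so windows fit
    window = padded[row:row + 2 * half + 1, col:col + 2 * half + 1]
    return np.concatenate(([row, window.var()], window.ravel()))

# Stand-in spectrogram: 64 frequency bins by 100 time frames of random values.
rng = np.random.default_rng(0)
spec = rng.random((64, 100))
features = pixel_features(spec, row=10, col=20)
print(features.shape)
```

With a 5x5 window the vector has 2 + 25 = 27 entries. The row index stands in for the pixel's frequency.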
![Page 22: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/22.jpg)
Per-Pixel Output
Probability Mask over the spectrogram
Threshold is applied to extract segments
![Page 23: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/23.jpg)
Per-Pixel Output
![Page 24: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/24.jpg)
Per-Pixel Output
![Page 25: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/25.jpg)
Per-Pixel Output
![Page 26: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/26.jpg)
Per-Pixel Limitations
Scope is limited to window size
High threshold causes oversegmentation
Low threshold causes undersegmentation
Slow: must classify for each pixel
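The threshold sensitivity can be illustrated on a single slice through a probability mask. The probabilities below are invented for the example: two calls, the first with a dip in the middle, separated by a weak bridge.

```python
def count_runs(probabilities, threshold):
    """Count contiguous runs of values above the threshold."""
    runs, inside = 0, False
    for p in probabilities:
        above = p > threshold
        if above and not inside:
            runs += 1
        inside = above
    return runs

# Hypothetical per-pixel probabilities along one row of the mask.
row = [0.1, 0.9, 0.6, 0.9, 0.4, 0.8, 0.9, 0.1]
for threshold in (0.7, 0.5, 0.3):
    print(threshold, count_runs(row, threshold))
```

At 0.7 the first call breaks into two pieces (oversegmentation), at 0.5 the two calls are recovered, and at 0.3 they fuse into one segment (undersegmentation).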
![Page 27: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/27.jpg)
Superpixel Method
Begins with an initial pre-segmentation
◦Modification of Simple Linear Iterative Clustering (SLIC) image segmentation
◦Uses computed features that describe regions of the spectrogram
Segments are sets of superpixels
![Page 28: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/28.jpg)
Superpixel Clustering
Based on SLIC method:
◦Each pixel is assigned a 5-valued vector (x, y, L, a, b) for position and color
Locally-constrained K-Means Clustering
◦Each centroid searches only a radius of 2S, where S = sqrt(N/K) for N pixels and K superpixels
Creates a set of regularly-sized regions
◦Some regions’ boundaries follow the edges of larger objects in the image
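A simplified, SLIC-like sketch of locally-constrained K-means on a single-channel image, assuming NumPy. It is not the reference SLIC implementation: the distance measure, the `compactness` weight, the iteration count, and the test image are assumptions, and the per-pixel vector is reduced to (y, x, value).

```python
import numpy as np

def slic_like(image, k, n_iter=5, compactness=0.1):
    """Cluster pixels into roughly k superpixels using local K-means."""
    h, w = image.shape
    s = int(np.sqrt(h * w / k))                  # grid interval S = sqrt(N/K)
    ys, xs = np.meshgrid(np.arange(s // 2, h, s), np.arange(s // 2, w, s),
                         indexing="ij")
    # One centroid per grid cell: (y, x, pixel value).
    centers = np.stack([ys.ravel(), xs.ravel(), image[ys, xs].ravel()],
                       axis=1).astype(float)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=int)
    for _ in range(n_iter):
        best = np.full((h, w), np.inf)
        for i, (cy, cx, cv) in enumerate(centers):
            # Each centroid only examines pixels within 2S of its position.
            y0, y1 = max(int(cy) - 2 * s, 0), min(int(cy) + 2 * s + 1, h)
            x0, x1 = max(int(cx) - 2 * s, 0), min(int(cx) + 2 * s + 1, w)
            spatial = (yy[y0:y1, x0:x1] - cy) ** 2 + (xx[y0:y1, x0:x1] - cx) ** 2
            dist = (image[y0:y1, x0:x1] - cv) ** 2 + compactness * spatial / s ** 2
            closer = dist < best[y0:y1, x0:x1]
            best[y0:y1, x0:x1][closer] = dist[closer]
            labels[y0:y1, x0:x1][closer] = i
        for i in range(len(centers)):            # move centroids to cluster means
            member = labels == i
            if member.any():
                centers[i] = [yy[member].mean(), xx[member].mean(),
                              image[member].mean()]
    return labels

# Test image (assumed): dark left half, bright right half.
image = np.zeros((40, 40))
image[:, 20:] = 1.0
labels = slic_like(image, k=16)
print(labels.shape, len(np.unique(labels)))
```

On this two-tone image no superpixel straddles the vertical edge, which is the boundary-following behavior described above.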
![Page 29: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/29.jpg)
Superpixel Clustering
Over-segments an image
◦Edges of clusters are along image edges
But, doesn’t work for spectrograms
![Page 30: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/30.jpg)
Superpixel Clustering
Spectrograms lack edges
◦Also, only one channel of color
Instead of (x, y, L, a, b), we use a new vector:
◦(x, y, B, V, Gx, Gy, Px, Py)
![Page 31: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/31.jpg)
Superpixel Clustering
x, y
◦Time and frequency values in the spectrogram
B, V
◦Pixel values after Gaussian blur, variance of pixel values
Gx, Gy
◦Horizontal/Vertical Sobel Gradient values
Px, Py
◦Time and Frequency values of nearest peak (weighted by Gaussian kernel)
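A sketch of computing the first six channels (x, y, B, V, Gx, Gy) for every pixel, assuming NumPy. The 3x3 kernels and the 3x3 variance window are assumptions, and the nearest-peak features Px, Py are omitted because the slides do not specify how peaks are found.

```python
import numpy as np

def filter3x3(image, kernel):
    """Correlate a 2-D array with a 3x3 kernel, replicating edge pixels."""
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0],
                                           dx:dx + image.shape[1]]
    return out

BLUR = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0    # small Gaussian approximation
MEAN = np.full((3, 3), 1 / 9.0)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # along time
SOBEL_Y = SOBEL_X.T                                                    # along frequency

def pixel_channels(spec):
    """Return an array of shape (rows, cols, 6): x, y, B, V, Gx, Gy per pixel."""
    b = filter3x3(spec, BLUR)
    v = filter3x3(spec ** 2, MEAN) - filter3x3(spec, MEAN) ** 2   # local variance
    gx = filter3x3(spec, SOBEL_X)
    gy = filter3x3(spec, SOBEL_Y)
    freq, time = np.mgrid[0:spec.shape[0], 0:spec.shape[1]]
    return np.stack([time, freq, b, v, gx, gy], axis=-1)

# Stand-in spectrogram: amplitude rises by 1 per time frame, constant in frequency.
spec = np.tile(np.arange(8, dtype=float), (6, 1))
channels = pixel_channels(spec)
print(channels[3, 4])
```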
![Page 32: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/32.jpg)
Superpixel Clustering
![Page 33: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/33.jpg)
Foreground/Background Classifier
Random Forest trained using the same manual spectrogram labels as per-pixel
◦Each superpixel is labeled positive (foreground) if more than 10% of its area overlaps with a positive-labeled region
Feature vector describes superpixel:
◦Mean and variance of pixel values, blurred pixel values, peak frequencies
◦Histogram of Oriented Gradients
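The 10% overlap rule for deriving superpixel training labels from the hand-drawn mask can be sketched as follows, assuming NumPy. The superpixel map and mask are toy arrays, and the function name is hypothetical.

```python
import numpy as np

def label_superpixels(superpixels, hand_mask, min_overlap=0.10):
    """Map each superpixel id to True if more than min_overlap of it is labeled positive."""
    labels = {}
    for sp in np.unique(superpixels):
        overlap = hand_mask[superpixels == sp].mean()   # fraction of positive pixels
        labels[int(sp)] = bool(overlap > min_overlap)
    return labels

# Two superpixels of 20 pixels each (left and right halves of a 4x10 patch).
superpixels = np.zeros((4, 10), dtype=int)
superpixels[:, 5:] = 1
hand_mask = np.zeros((4, 10), dtype=bool)
hand_mask[0, 0:3] = True      # 3 of 20 pixels (15%) of superpixel 0
hand_mask[0, 9] = True        # 1 of 20 pixels (5%) of superpixel 1
training_labels = label_superpixels(superpixels, hand_mask)
print(training_labels)
```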
![Page 34: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/34.jpg)
Foreground/Background Classifier
![Page 35: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/35.jpg)
Superpixel Merger Classifier
Random Forest trained to classify pairs of adjacent superpixels
◦Positive classification: Merge together
◦Negative classification: Split apart
After background pixels are discarded, all remaining edges between superpixels are classified
◦All edges above a threshold are merged
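Merging every edge above a threshold yields connected sets of superpixels, which can be computed with a union-find structure. In this sketch the edge scores stand in for the merger classifier's outputs, and the ids, scores, and threshold are invented for illustration.

```python
def merge_superpixels(foreground, edge_scores, threshold=0.5):
    """Group foreground superpixels joined by edges scoring above the threshold."""
    parent = {sp: sp for sp in foreground}

    def find(sp):
        while parent[sp] != sp:
            parent[sp] = parent[parent[sp]]   # path halving
            sp = parent[sp]
        return sp

    for (a, b), score in edge_scores.items():
        # Edges touching a discarded (background) superpixel are ignored.
        if a in parent and b in parent and score > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for sp in foreground:
        groups.setdefault(find(sp), set()).add(sp)
    return sorted(groups.values(), key=min)

# Hypothetical superpixel ids; superpixel 4 was classified as background.
foreground = [1, 2, 3, 5, 6]
edge_scores = {(1, 2): 0.9, (2, 3): 0.8, (3, 5): 0.2, (5, 6): 0.7, (4, 5): 0.95}
segments = merge_superpixels(foreground, edge_scores)
print(segments)
```

Each returned set is one output segment.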
![Page 36: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/36.jpg)
Superpixel Merger Classifier
![Page 37: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/37.jpg)
Superpixel Method Output
![Page 38: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/38.jpg)
Superpixel Method Output
![Page 39: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/39.jpg)
Superpixel Method Output
![Page 40: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/40.jpg)
Superpixel Method Output
![Page 41: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/41.jpg)
Superpixel Method Output
![Page 42: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/42.jpg)
Superpixel Method Output
![Page 43: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/43.jpg)
Evaluation Datasets
HJ Andrews dataset, 625 recordings
◦Each 15 seconds long
◦Drawn 2 each from 24 hours
“Set A” dataset, 166 recordings
◦All from early and mid morning
◦Paired by year, 2009/2010
![Page 44: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/44.jpg)
Differences in Training Data
![Page 45: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/45.jpg)
Results
![Page 46: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/46.jpg)
Results
![Page 47: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/47.jpg)
Results
![Page 48: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/48.jpg)
Results
![Page 49: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/49.jpg)
Future Work
Superpixel Method is promising
◦Faster than per-pixel classification
◦Could use more sophisticated merger technique
![Page 50: Detection and Segmentation of Bird Song in Noisy Environments](https://reader036.vdocuments.mx/reader036/viewer/2022062305/56815d88550346895dcb94a3/html5/thumbnails/50.jpg)
Bibliography
A. Harma, “Automatic identification of bird species based on sinusoidal modeling of syllables,” in IEEE International Conference on Acoustics, Speech and Signal Processing, April 2003, pp. 545–548.
Chang-Hsing Lee, Chin-Chuan Han, and Ching-Chien Chuang, “Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 8, pp. 1541–1550, 2008.
Leo Breiman, “Random forests,” Machine Learning, pp. 5–32, January 2001.
Seppo Fagerlund, “Automatic Recognition of Bird Species by Their Sounds,” Master’s Thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Nov. 8, 2004.