
Research Statement

Geometric Audiovisual Signal Processing (GASP!):

Video And Music Processing with A Twist

Christopher J. [email protected]

1 Introduction

My research lies broadly in the field of "geometric signal processing" (GSP), with a particular focus on multimedia data, which means I leverage skills from computer science, math, electrical engineering, and data science in my work. Geometric signal processing has historically meant analysis of shapes in 2D/3D in computer graphics/vision [Tau00], but emerging tools in geometric/topological data analysis have recently broadened the scope of applications substantially, from computational biology [PDHH15], to orthodontia [GLMH12, BLC+11], to botany [GMB+12], to neuroscience [Cur17, BMM+16, GPCI15], to materials science [KGKM13], and more. Hence, it is an important, broad field with growth potential. Furthermore, since it is concerned with the development of fundamental tools and techniques to be applied elsewhere, it is naturally interdisciplinary.

One of my primary application areas for these tools is music information retrieval (MIR), an increasingly active field of research¹. Using geometric tools on top of more traditional signal processing pipelines, I have shown state-of-the-art results on the long-standing problem of automatic audio cover song identification (Section 2.2). I accomplished this in an entirely unsupervised (no training data) fashion, which means that my approach naturally generalizes to any genre. I have also developed new geometry-based techniques for clean representations and visualizations of multiscale music structure (Section 4). Beyond music, I have done related work on cross-modal (e.g. audio to video, seismic to acoustic) time series, where I have brought together a number of mathematical tools (e.g. self-similarity matrices [JDLP11], similarity network fusion [WJW+12], and the scattering transform [Mal]) into a very general, unsupervised pipeline used for both pattern identification and time synchronization across modalities (Section 2.3).

Outside of MIR and cross-modal analysis, I have applied my techniques in a dynamical systems context to quantify periodicity and quasiperiodicity in video data (Section 3).

¹The flagship conference of the International Society for Music Information Retrieval (ISMIR) had 90 participants at its inception back in 2000, while the 2018 conference had over 300 participants.

For instance, in the periodicity case, using topological data analysis [ELZ00, EH10], I am able to improve onset detection of stereotypical motor motions in children with autism spectrum disorder. In the quasiperiodic case, I am able to differentiate between healthy and partially paralyzed vocal fold motions from "high speed glottography" videos by quantifying whether the videos make a topological loop or a topological torus after an appropriate transformation I refer to as "sliding window videos." The use of geometric tools completely eliminates the need to do motion tracking, which can often fail in this scenario. Furthermore, to my knowledge, this is also one of the first applications of higher homology (i.e. persistent H2) to high dimensional point clouds. This was enabled in part by fast software for TDA computations, to which I contributed (Section 5).

Beyond the research problems I have spelled out, my research style also lends itself nicely to interdisciplinary interactions and integration of undergraduates (Section 7). I am a very hands-on researcher who gets his hands dirty with implementations, and I can model best practices of software development for my students (Section 5). I also put lots of time into developing intuitive visualizations to illustrate complicated concepts (Section 4), which helps both with research presentations and with teaching. My work even has applications to the "digital humanities" (Section 6), making it a natural fit for a small liberal arts environment.

Finally, my open, exploratory style of research makes it so that I never get stuck or stagnate for too long on any particular idea or line of work if it isn't bearing fruit (Section 8), and I have identified a number of future projects to hedge my bets in getting this started (Section 9). I have also put a lot of time into scoping out publication venues for undergraduate research within my area that are practical for students within a short timeframe, to ensure that they have something to show for their work (Section 7).


Figure 1: SSMs on a 5 second clip of audio from the Covers80 dataset [Ell07]. The left figure shows an SSM on MFCC features of the audio, while the right figure shows an SSM on the raw audio sliding window embedding. MFCC hones in more clearly on relevant structural details.

2 Self-Similarity Based Time Series Analysis

2.1 Sliding Windows And Self-Similarity Matrices

The fundamental data stream that I study in my research is the time series, which I use in a general way to refer to any time-ordered sequence of measurements, ranging from traditional 1D time series (e.g. audio) to sequences of images in video, to sequences of 3D shapes, to sequences of graphs, etc. In my work, I rely crucially on a transformation from a time series to a geometric shape known as the sliding window embedding. In the case of a 1D time series f : R → R, a time index t, an integer M ≥ 0, and a real number τ > 0, the sliding window embedding at time t is defined as

    S_{M,τ} f(t) = [ f(t), f(t + τ), ..., f(t + Mτ) ]^T ∈ R^{M+1}    (1)

In other words, one takes a sample along with M lags and stacks them all up into an (M + 1)-dimensional Euclidean vector. As t varies, the sliding window embedding traces out a space curve in R^{M+1}, though one usually takes discrete samples of that curve, yielding a time-ordered point cloud. In the case of more general time series, this construction still applies and is necessary to geometrically disambiguate states (Section 3).
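To make this concrete, the discrete version of Equation 1 is only a few lines of NumPy; the following is a minimal illustrative sketch (not the exact code from my papers):

    import numpy as np

    def sliding_window(x, M, tau):
        # Discrete sliding window embedding (Equation 1): row t is
        # [x[t], x[t + tau], ..., x[t + M*tau]], an (M+1)-dimensional vector.
        N = len(x) - M * tau  # number of windows that fit entirely inside the signal
        return np.array([x[t:t + M * tau + 1:tau] for t in range(N)])

    # A noisy periodic signal traces out a topological loop in R^{M+1}
    t = np.linspace(0, 8 * np.pi, 1000)
    x = np.cos(t) + 0.1 * np.random.randn(len(t))
    X = sliding_window(x, M=20, tau=5)  # X has shape (900, 21)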

Sliding windows have myriad applications, including cover song identification [SSA09] (which I build on in Section 2.2), music structure analysis [Bel11], EEG data analysis [Sta05, PMT+14], activity and gait recognition [FMP10, VT16], and gene expression time series data [PDHH15], among many others.

There are a number of feature summaries that have been devised on top of this construction [KS04], though I give a particular focus to summaries which retain both the geometry and the time order. As a first stop, in the absence of any other information about the time series f, I utilize time-ordered self-similarity matrices (SSMs) on top of the sliding window embedding. That is, given a set T = {t1, t2, ..., tN} of N time indices, I construct an N × N matrix D so that Dij = ||S_{M,τ}f(ti) − S_{M,τ}f(tj)||_2. These images encode all geometric information up to an isometry, which is particularly advantageous when comparing time series which are rotated and translated in their feature spaces (e.g. Figure 7). They are also always 2D matrices, regardless of the dimensionality of the sliding window they summarize, and so analysis techniques derived on top of them are quite general. Hence, they have been used in applications ranging from activity recognition [JDLP08] and periodicity analysis in video [CD00], to music segment boundary detection [Foo00] and music structure understanding [Bel09, KS10, SMGA12, ME14].
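Computing the SSM itself is equally direct; here is a short sketch on top of the sliding window point cloud above (again, just an illustration):

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def ssm(X):
        # Self-similarity matrix of a time-ordered point cloud X (one point per row):
        # D[i, j] = ||X[i] - X[j]||_2. The result is a 2D image no matter how high
        # dimensional the underlying sliding window space is.
        return squareform(pdist(X, metric='euclidean'))

    D = ssm(X)  # X from the sliding window sketch above; D has shape (N, N)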

Note that doing the sliding window embedding on the raw time series is often not a good idea, and I instead often turn to feature summaries of each window, which consist of a (possibly nonlinear) map from the sliding window space to some other feature space². For instance, Figure 1 shows an SSM of a sliding window on raw audio, in which hardly any structure is visible (all windows are far apart in sliding window space), but applying the MFCC [BHT63] leads to much more visible structure.
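In practice I compute such feature-based SSMs with standard audio tooling; a hypothetical librosa-based sketch (the file name and parameter values below are placeholders) might look like:

    import librosa
    from scipy.spatial.distance import pdist, squareform

    # Load a short clip and summarize each short-time window with MFCCs
    y, sr = librosa.load("song.wav", duration=5.0)        # "song.wav" is a placeholder path
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T  # (num_frames, 20)

    # SSM on the MFCC sequence; compare to an SSM on raw audio sliding windows (Figure 1)
    D_mfcc = squareform(pdist(mfcc))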

I will now describe several promising research directions I have explored using SSMs as sliding window summaries.

2.2 Cover Song Identification

Relevant Contributions

[TB15] Christopher J Tralie and Paul Bendich. Cover song identification with timbral shape sequences. In 16th International Society for Music Information Retrieval (ISMIR), pages 38–44, 2015.

[Tra17] Christopher J Tralie. Early MFCC and HPCP fusion for robust cover song identification. In 18th International Society for Music Information Retrieval (ISMIR), 2017.

[Tra18] Christopher J Tralie. Cover song synthesis by analogy. In 19th International Society for Music Information Retrieval (ISMIR), 2018.

A "cover song" is a different version of the same song, usually performed by a different artist, and often with different instruments, recording setting, mixing/balance, tempo, and key. For our purposes, live performances by the same artist are also considered "cover songs." This notion is not well-defined mathematically, so to evaluate algorithms, I report results on several widely used benchmark datasets.

²I have in the past referred to these as "community-accepted feature spaces" in the NSF Big Data grant "Topological Data Analysis and Machine-Learning with Community-Accepted Features" (Award No. 1447491), for which I wrote the majority of the technical background.


Figure 2: An example of beat-synchronous HPCP features between two different versions of the song "Grand Illusion" from the covers80 dataset [Ell07].

Figure 3: An 8 beat block from "We Can Work It Out" by The Beatles with a live cover by Tesla. In both clips, the singers sing the words "We Can Work It Out" twice, and so there is one strong secondary diagonal in each one.

I have also tested methods on my own curated dataset, which I refer to as "covers 1000" [Tra17], which filled the need for a dataset that is diverse in both genre and time period, and for which custom features are available.

Automatic audio cover song identification is a surprisingly difficult problem. Though the "Shazam" technique of audio fingerprinting [W+03, Wan06] works quite well at identifying songs from just a few seconds of audio in incredibly noisy environments, it matches to exact recordings in a database. The more general problem of identifying different versions of a song, possibly even by the same artist, remains challenging. Recently, there has been increasing interest in this more general case from companies such as Spotify and Soundcloud, who would like to automate cover identification and similarity measures for lesser-known artists, and from companies such as Youtube and Facebook, who have to contend with copyright infringement where users perform warps on the original content to disguise it, which render traditional audio fingerprinting useless.

Nearly all approaches to this problem in the past decade have focused on an obvious "invariant to cover": the notes [Ell06, SGHS08, Bel07, SSA09, HNB13, NB14, SYB+16, CLX17, OVDE16]. The most common approach is to use features on the sliding window which map frequencies to pitches, such as the "harmonic pitch class profiles" [Gom06], and this works quite well on a lot of pop songs. Figure 2 shows an example.

Figure 4: An 8 beat block from "Time" by Tori Amos and Tom Waits. In this block, the pattern is "time" + instrumentals + "time" + different instrumentals. The recurrence for the word "time" is visible in both about 2/3 of the way through.

Figure 5: A toy example of similarity network fusion (SNF) from three different (top row) noisy measurements of 3 simulated clusters. Simple averaging (bottom left) of the similarities is far inferior to SNF (bottom center).

One obvious drawback of note-based approaches, however, is that notes are not the most prominent element in certain songs and genres, such as hip hop. Motivated by this, I took on the challenge of trying to use MFCC features [BHT63] in the cover songs task, which are exactly complementary to note-based features; they pick up on timbre and overall spectral envelope only. What I eventually discovered is that even if sequences of MFCC features lie in different parts of the feature space between cover songs, their SSMs are approximately the same in small regions spanning several beats. In mathematical terms, sequences of MFCC features are approximate isometries locally in time. In intuitive terms, the relative evolution of the sound is the same between songs, even if the absolute sound is different. Figure 3 and Figure 4 show two examples from cover songs in the covers80 cover songs benchmark database [Ell07]. In [TB15], I showed that a simple L2 distance between sequences of beat-synchronous (aligned to the beat for tempo normalization) SSMs nearly reaches the performance of note-based features, which was surprising to many in the community.
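The comparison in [TB15] boils down to L2 distances between normalized SSMs of beat-synchronous blocks; the sketch below is a simplification of that idea, and it assumes the beat-synchronous features have already been extracted:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def block_ssms(features, block_len=8):
        # features: (num_beats, num_features) beat-synchronous features (e.g. MFCC).
        # Return a flattened, normalized SSM for each block of block_len beats, so that
        # only the *relative* evolution of the sound within the block matters.
        blocks = []
        for i in range(features.shape[0] - block_len + 1):
            D = squareform(pdist(features[i:i + block_len]))
            blocks.append(D.ravel() / (np.linalg.norm(D) + 1e-12))
        return np.array(blocks)

    def cross_similarity(feats_a, feats_b, block_len=8):
        # Small L2 distance between block SSMs suggests locally isometric feature
        # sequences, i.e. candidate matching regions between the two versions.
        A, B = block_ssms(feats_a, block_len), block_ssms(feats_b, block_len)
        return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))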


Figure 6: An example of early SNF on blocks of MFCC and blocks of HPCP features on the song "Before You Accuse Me," with versions by Eric Clapton and Creedence Clearwater Revival. The corresponding cross-similarity portions of all three matrices are shown on the bottom.

2.2.1 Fusing Features

Since my approach is completely complementary to prior approaches, a natural question is whether it is possible to combine my approach with others for state-of-the-art results. I discovered that the way to do this was using an algorithm known as "similarity network fusion" [WJW+12, WMD+14], which does a joint random walk between two or more SSMs to come up with a fused SSM that leverages the strengths and mitigates the weaknesses of the original SSMs. Figure 5 shows a synthetic example. This algorithm was originally designed to improve shape classification by leveraging different comparison methods at the object level [WJW+12], but I realized this could also be used within a single object, for different features (MFCC, MFCC SSMs, HPCP) on sliding windows.
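A stripped-down sketch of the cross-diffusion at the heart of SNF is below; it omits the careful normalizations of [WJW+12, WMD+14], and it assumes the SSMs have already been converted to affinities (e.g. W = exp(-D**2 / scale)):

    import numpy as np

    def snf(W1, W2, k=10, n_iters=20):
        # Fuse two affinity matrices (higher = more similar): each view repeatedly
        # diffuses the *other* view's similarities through its own k-nearest-neighbor
        # graph, and the two diffused matrices are averaged at the end.
        def row_normalize(W):
            return W / (W.sum(axis=1, keepdims=True) + 1e-12)

        def knn_kernel(W, k):
            S = np.zeros_like(W)
            idx = np.argsort(-W, axis=1)[:, :k]          # k strongest neighbors per row
            rows = np.arange(W.shape[0])[:, None]
            S[rows, idx] = W[rows, idx]
            return row_normalize(S)

        P1, P2 = row_normalize(W1), row_normalize(W2)
        S1, S2 = knn_kernel(W1, k), knn_kernel(W2, k)
        for _ in range(n_iters):
            P1, P2 = S1 @ P2 @ S1.T, S2 @ P1 @ S2.T      # simultaneous cross-diffusion
        return (P1 + P2) / 2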

Figure 6 shows an example of this application of SNF on the concatenation of two cover songs. In the fused matrix, diagonals in the cross-similarity region (upper right block, or windows compared from one song to the other) are much crisper and more distinct from the background than they are in the original two. This means it is easier to pick out many matching windows in sequence between the two covers, indicating a good match. With this technique, I showed state-of-the-art identification in [Tra17] on both the covers80 benchmark dataset [Ell07] and a larger dataset I curated myself, which I call "covers 1000" [Tra17]. With this scheme, I was able to identify covers of hip hop songs that failed with more traditional techniques, and I even identified a cluster of 7 covers of Frank Zappa's "The Black Page," which is entirely percussive. For a live demo of these matchings comparing my technique to others, please visit http://www.covers1000.net/demo.html. My ability to align cover songs with this scheme is so accurate that I was even able to devise a simple algorithm based on 2D Convolutional Nonnegative Matrix Factorization (2D-NMF) [SM06] to synthesize cover songs from aligned examples, which I call "cover song analogies" [Tra18]³. My current and future work in this area includes scaling up algorithms to massive datasets and exploring other "cover sequence" applications (e.g. video copyright infringement detection [BBK10]).

2.3 Cross-Modal Time Series Analysis

Relevant Contributions

[TBH19] Christopher J Tralie, Paul Bendich, and John Harer. Multi-scale geometric summaries for similarity-based sensor fusion. In The 40th IEEE Aerospace Conference (in submission), 2019.

[Tra17] Christopher J Tralie. Self-similarity based time warping. arXiv preprint arXiv:1711.07513 (in submission), 2017.

[TSB+18] Christopher J Tralie, Abraham Smith, Nathan Borggren, Jay Hineman, Paul Bendich, Peter Zulch, and John Harer. Geometric cross-modal comparison of heterogeneous sensor data. In Proceedings of The 39th IEEE Aerospace Conference, 2018.

Though HPCP and MFCC features are built on top of the same sliding window embedding, they are sufficiently different that one could conceivably think of them as different modalities. To extend my approach further in this direction, I have been exploring additional cross-modal problems (e.g. audio/video, seismic/acoustic) in which a sliding window + self-similarity matrix approach can add value. In one application, I showed that summary statistics, including the L2 distance, between appropriately normalized SSMs can be used to associate driving behaviors in video with the same behaviors measured by Doppler radar profiles [TSB+18].

One challenge in this scenario, however, is that, unlike with musical audio, there is usually not a "beat tracker" at hand. Thus, if one wants to cluster similar behaviors together across different runs, one must account for global rescaling and local time warps another way. Most approaches to temporally aligning time series across modalities require learning a mapping to a common space in which they are spatially aligned, using, for example, CCA [ZDlT16], manifold learning [GM11], or deep learning [TNZS16]. By contrast, SSMs at least already factor out isometries, and re-parameterizing a time series has a very particular effect on its SSM. Figure 7 shows a sketch of this observation and my ensuing unsupervised algorithm ("isometry-blind dynamic time warping" (IBDTW)) between two time-ordered point clouds which are isometric, up to a parameterization [Tra17].

³Audio examples can be found at http://www.covers1000.net/analogies.html


Figure 7: A concept figure for my unsupervised "isometry-blind dynamic time warping" (IBDTW) technique for aligning time-ordered point clouds which are rotated/translated/flipped and re-parameterized versions of each other.

Figure 8: Using IBDTW to synchronize a video of two men doing jumping jacks with motion capture data of someone else doing jumping jacks.

Rows of SSMs of points which are in correspondence are re-parameterized versions of each other, which reduces the global alignment problem to a series of 1D time warping problems. If, in the cross-modal scenario, the distances are appropriately scaled in the SSMs, then this can be used, for example, to synchronize a video of a motion with that same motion represented as motion capture data (Figure 8). In addition to synchronization, I show that the similarity measure returned by my algorithm is related to the Gromov-Hausdorff distance between metric spaces [Gro07].
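A deliberately simplified sketch of this row-alignment idea is below; it is not the IBDTW algorithm as published in [Tra17], but it makes the O(N^4) cost mentioned next easy to see. Each pair of SSM rows is scored with a small dynamic time warping routine, and the resulting cost matrix can then be fed to a final alignment step.

    import numpy as np

    def dtw_cost(a, b):
        # Classical dynamic time warping cost between two 1D sequences a and b
        D = np.full((len(a) + 1, len(b) + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                D[i, j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[len(a), len(b)]

    def row_alignment_scores(DA, DB):
        # Rows of corresponding points are re-parameterized versions of each other,
        # so a low warping cost between row i of DA and row j of DB suggests that
        # points i and j are in correspondence.
        C = np.zeros((DA.shape[0], DB.shape[0]))
        for i in range(DA.shape[0]):
            for j in range(DB.shape[0]):
                C[i, j] = dtw_cost(DA[i], DB[j])
        return C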

One drawback of IBDTW is its computational complexity, which is O(N^4) for an N-length time series. Recently, in [TBH19], I have been exploring an alternative of uniform scaling followed by the scattering transform [Mal, BM, Bru13]. This is similar to the 2D wavelet transform, but it is stable to small perturbations in the SSMs induced by local time warping. Its architecture is also similar to 2D convolutional neural networks (CNNs) [LB+95, LBH15], but instead of learning the weights, it is unsupervised, with weights fixed by a choice of mother wavelet⁴. Finally, by contrast to IBDTW, it can be computed in O(N^2 log N) time.

⁴One huge advantage of fixing the weights this way is stability, which CNNs sorely lack in certain situations [SVK17].

Figure 9: Precision-recall curves for different features on SSMs for classifying digit sequences captured by audio and video. The scattering transform (orange curves) vastly improves results over straight L2 (blue curves). SNF also has a positive impact, especially when applied at a "late stage" to the object-level similarities.

With a combination of similarity network fusion to combine modalities, followed by the scattering transform, I show nearly perfect discrimination between speech sequences in the "OuluVS2" dataset [AZZP15], as shown in Figure 9. This is notable because my pipeline is completely unsupervised, while the vast majority of multimedia fusion approaches require labeled data and a training phase [AHEK10].
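For readers who want to experiment with scattering coefficients of SSM images themselves, libraries such as Kymatio expose them directly; the sketch below is an assumption-laden illustration (it presumes the kymatio package's numpy frontend and its Scattering2D class, and that the SSM has been resampled to a fixed size after uniform scaling):

    import numpy as np
    from kymatio.numpy import Scattering2D

    D = np.random.rand(64, 64).astype(np.float32)   # placeholder for a resampled SSM image

    # J sets how many dyadic scales the fixed wavelet filters cover; nothing is learned.
    scattering = Scattering2D(J=3, shape=(64, 64))
    coeffs = scattering(D)       # coefficients stable to small warps/translations of the SSM
    feature = coeffs.ravel()     # flatten into a feature vector for retrieval or clustering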

3 Topological (Quasi)Periodicity Analysis in Video

Relevant Contributions

[TB18] Christopher J Tralie and Matthew Berger. Topological Eulerian synthesis of slow motion periodic videos. In IEEE International Conference on Image Processing, 2018.

[TMS18] Christopher J. Tralie, Matthew S. Goodwin, and Guillermo Sapiro. Automated detection of stereotypical motor movements in children with autism spectrum disorder using geometric feature fusion. International Society for Autism Research (INSAR) Abstract, 2018.

[TP18] Christopher J. Tralie and Jose A. Perea. (Quasi)periodicity quantification in video data, using topology. SIAM Journal on Imaging Sciences, 11(2):1049–1077, 2018.

[Tra16] Christopher J Tralie. High dimensional geometry of sliding window embeddings of periodic videos. In Proceedings of the 32nd International Symposium on Computational Geometry (SOCG), 2016.

[Tra17] Christopher J Tralie. Geometric Multimedia Time Series. Ph.D. dissertation, Department of Electrical and Computer Engineering, Duke University, 2017.


Figure 10: An example of a 1D persistence diagram (multiscale loop quantification) on a point cloud sampled from a figure 8. No single scale appropriately captures both of the holes in the original figure 8 from which these points were sampled, but the persistence diagram has two dots of non-negligible persistence, indicating that there are two nontrivial ranges of scales during which these two holes individually exist.


I now move beyond 1D time series to video data, but the analysis is slightly simplified, since I have access to a geometric model in all applications. In particular, there is a beautiful connection between nonlinear 1D time series analysis and geometry known as Takens' theorem [T+81, Nol10], which states that, under certain conditions, a single observable (1D time series) of a dynamical system is enough to reconstruct the topology of the state space. For periodic time series, this means that a sliding window embedding (Equation 1) yields a point cloud sampled from a topological loop. One way to quantify the existence of a loop, regardless of the dimension in which it lives, is with a tool from topological data analysis (TDA) known as persistent homology. Briefly, persistent homology is a linear algebraic framework for tracking topological features (e.g. 0D connected components, 1D loops, 2D voids, etc.) over multiple scales in a point cloud via a 2D "persistence diagram." There is a point in the persistence diagram for every nontrivial topological feature, whose birth time is the scale at which the feature first forms in the point cloud, and whose death time is the scale at which the feature disappears (i.e. is "filled in"). The death minus the birth for each point is known as the "persistence" of the corresponding topological feature, and points with higher persistence generally represent more "important" features. Figure 10 shows an example for a 2D point cloud sampled from a thickened figure 8. Please refer to [ELZ00], [EH08], [EH10], [Car09], or [Ghr14] for more detailed explanations of TDA and persistent homology.
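Persistence diagrams like the one in Figure 10 take only a few lines with the ripser.py package described in Section 5; the figure-8 sampling below is purely for illustration:

    import numpy as np
    from ripser import ripser

    # Sample a noisy figure 8: two unit circles glued at the origin
    t = np.random.uniform(0, 2 * np.pi, 400)
    left = np.stack([np.cos(t[:200]) - 1, np.sin(t[:200])], axis=1)
    right = np.stack([np.cos(t[200:]) + 1, np.sin(t[200:])], axis=1)
    X = np.concatenate([left, right]) + 0.05 * np.random.randn(400, 2)

    dgms = ripser(X, maxdim=1)['dgms']            # dgms[0]: H0 (components), dgms[1]: H1 (loops)
    persistence = dgms[1][:, 1] - dgms[1][:, 0]
    print("Two dominant H1 persistences expected:", np.sort(persistence)[-2:])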

In the case of sliding windows of periodic processes, one generally expects a single dot in the 1D persistence diagram to reside very far above the diagonal, with a possible smattering of small persistence points corresponding to "noise."

Figure 11: Three persistence diagrams from 3-axis accelerometers on the trunk, left wrist, and right wrist, respectively, of a child with autism spectrum disorder performing a "rocking" action. In this case, the child's trunk is the main part that registers with a high persistence, which is consistent with a rocking action.

The persistence of the most persistent dot can be used to quantify how periodic a time series is. This is precisely the approach taken with 1D gene expression data in [PH15]. In my work, I extend sliding window embeddings + persistent homology beyond 1D time series to video (and general multivariate time series) by making each lag an entire grayscale video frame instead of a single time series sample [Tra16, TP18]. That is, for a W × H grayscale video with M lags, each window in a sliding window video is a W × H × (M + 1) dimensional vector⁵; the sliding window is a simple way to geometrically disambiguate all states, and it acts as a form of "time regularization" for all periodic videos. I have demonstrated that this works better for quantifying periodicity in videos than more traditional approaches based on Fourier analysis and SSMs [CD00]. I also used this approach to quantify stereotypical motor motions from three-axis accelerometer data⁶ from children with autism spectrum disorder performing stereotypical motor motions [TMS18], which are ubiquitous for those with this disorder, and which clinicians want to study over time. I showed that maximum persistence adds value when combined with more traditional approaches to this problem, such as recurrence quantification analysis [GMM+17].

⁵This leads to very high dimensional data, but I describe some tricks to cut down on memory usage [TP18]. One may question whether just a raw embedding of frames would do the trick, but I demonstrated that this does not work for all videos [TP18].

⁶A joint embedding of these 3 axes can be thought of as a 3 pixel video.
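Conceptually, a sliding window video is just the 1D construction applied to flattened frames; a minimal sketch (ignoring the memory-saving tricks described in [TP18]) is below.

    import numpy as np

    def sliding_window_video(frames, M, tau=1):
        # frames: (num_frames, H, W) grayscale video. Each window stacks M+1 lagged
        # frames into one H*W*(M+1) dimensional vector, yielding a time-ordered point
        # cloud whose topology (loop vs. torus) can be measured with persistent homology.
        F = frames.reshape(frames.shape[0], -1)
        N = F.shape[0] - M * tau
        return np.stack([F[t:t + M * tau + 1:tau].ravel() for t in range(N)])

    # Synthetic "video" of a horizontally drifting sinusoidal pattern
    H, W, num_frames = 16, 16, 200
    xs = np.arange(W)[None, None, :]
    ts = np.arange(num_frames)[:, None, None]
    frames = np.sin(2 * np.pi * xs / W + 2 * np.pi * ts / 25.0) * np.ones((1, H, 1))
    X = sliding_window_video(frames, M=10)   # shape (190, 16 * 16 * 11)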


Figure 12: The power spectral densities of the time series fh(t) = cos(t) + cos(t/3) and fg(t) = cos(t) + cos(t/π). Their profiles are nearly identical.

Figure 11 shows an example of this analysis. Finally, I used this framework to synthesize slow motion videos of periodic motions [TB18].

There are other interesting topological structures that can emerge from very simple time series. Figure 13 shows the sliding window embedding of the time series fh(t) = cos(t) + cos(t/3) (a frequency ratio of 3). Even though this is a periodic signal with only two harmonics, projecting down to lower dimensions for visualization via principal component analysis (PCA) hints at a much more complicated geometry, though there is still only a single dot in the 1D persistence diagram. By contrast, if one makes a very slight alteration and instead creates a time series with a ratio of π between the frequencies, fg(t) = cos(t) + cos(t/π), which is very similar to fh(t), there is a dramatic change in the geometry (Figure 14), while the difference between the two is not discernible in the ordinary power spectral density (Figure 12). This is an instance of quasiperiodicity [HBTS94, BMM+15, GMFL17], in which the frequencies involved are non-commensurate (i.e. irrational with respect to each other), and its state space is a torus. Hence, by Takens' theorem, a sliding window embedding should yield a torus. Indeed, as shown in Figure 14, the persistence diagrams match the homology of a torus. Surprisingly, I was also able to find a video application in which the simple sliding window video transformation I described yields the exact same patterns under different scenarios. Figure 15 shows the SSM and persistence diagram for a sliding window embedding of a high speed video of the vocal folds of a healthy patient. Vocal folds are oscillatory, and the pattern is a single highly persistent dot in 1D homology, as for the 1D time series in Figure 13. However, for a patient with asymmetric vocal folds, it is possible to have quasiperiodic behavior. Indeed, when I apply my sliding window embedding, a nearly perfect torus emerges (Figure 16). Please refer to [TP18] for more details.
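The synthetic 1D version of this experiment is easy to reproduce with the pieces already shown; the sketch below subsamples the point cloud so that the H2 computation stays tractable, and the window parameters are illustrative only (the window length relative to the two periods matters; see [TP18] for principled choices):

    import numpy as np
    from ripser import ripser

    def sliding_window(x, M, tau):
        N = len(x) - M * tau
        return np.array([x[t:t + M * tau + 1:tau] for t in range(N)])

    t = np.linspace(0, 150, 6000)
    fg = np.cos(t) + np.cos(t / np.pi)                   # non-commensurate frequencies

    X = sliding_window(fg, M=40, tau=20)
    X = X - X.mean(axis=1, keepdims=True)                # pointwise center each window
    X = X / np.linalg.norm(X, axis=1, keepdims=True)     # and normalize, as is common in sliding window TDA
    X = X[np.random.choice(len(X), 400, replace=False)]  # subsample to keep H2 affordable

    dgms = ripser(X, maxdim=2)['dgms']
    # For a torus one expects two high-persistence points in dgms[1] (H1) and one in
    # dgms[2] (H2), mirroring Figure 14; this computation can take a minute or two.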

Overall, my approach to this problem is also a good example of my philosophy of applying math where appropriate and in concert with existing tools. It also requires minimal preprocessing, which means less parameter tuning. And finally, unlike many other video techniques, it is Eulerian; that is, it requires no tracking.

This is in contrast to, for example, other approaches to vocal fold quasiperiodicity detection in video, which can become quite complicated due to the vocal folds closing and occluding edges [LTR+07, HUH+16]. So the implementation is actually quite simple, as long as one can compute persistent homology (Section 5).

Finally, based on some foundational work by Perea and Harer [PH15], I discovered a truly bizarre result on rhythm hierarchies [TB18] and certain subharmonic vocal fold patterns [Tra17]: they have sliding window embeddings that concentrate on the boundary of the Möbius strip. This is a very different and general way to analyze harmonic hierarchies in which a topological understanding is crucial.

4 Geometry / Topology Fueled Visualizations

Relevant Contributions

[BGHT16] Paul Bendich, Ellen Gasparovic, John Harer, and Christopher Tralie. Geometric models for musical audio data. In Proceedings of the 32nd International Symposium on Computational Geometry (SOCG), 2016.

[BGHT18] Paul Bendich, Ellen Gasparovic, John Harer, and Christopher J Tralie. Scaffoldings and spines: organizing high-dimensional data using cover trees, local principal component analysis, and persistent homology. In Research in Computational Topology, pages 93–114. Springer, 2018.

[XTA+18] Boyan Xu, Christopher J. Tralie, Alice Antia, Michael Lin, and Jose A. Perea. Twisty Takens: A geometric characterization of good observations on dense trajectories. arXiv preprint arXiv:1809.07131 (in submission), 2018.

As I have demonstrated thus far, visualizations are essential to my research process. Beyond carefully applying existing visualization techniques, there are instances where I have used geometric and topological ideas to design new, intuitive visualizations of complex data. One of my first such visualization algorithms came out of some work I did with mathematicians coming up with a "scaffolding" representation of estimated stratified spaces from point cloud data [BGHT18]. As an example, I used this to visualize the structure of music [BGHT16]. A more recent incarnation of this idea is software which I refer to as "GraphDitty," which uses a force-weighted graph to visualize a self-similarity matrix resulting from similarity network fusion of MFCC and HPCP features on a song (Section 2.2.1). The fusion helps to keep the ensuing graph clean, and the force-weighting results in an interactive version of nonlinear dimension reduction. Figure 17 shows an annotated example in which key song structures are visible as loops, and in which recurrence is visible as loops even within structures (e.g. there is repetition within the verse and chorus).


Figure 13: The sliding window embedding of fh(t) = cos(t) + cos(t/3) is a topological loop, so one dot rises above the diagonal in the 1D persistence diagram.

Figure 14: The sliding window embedding of fg(t) = cos(t) + cos(t/π) is a torus, so two dots rise above the diagonal in the 1D persistence diagram (one for each core cycle of the torus), and one dot rises above the diagonal for 2D persistence (the torus encloses a void).

Figure 15: Persistence diagrams on the sliding window embedding of a video of periodic vocal fold oscillations in a healthy patient.

Figure 16: Persistence diagrams on the sliding window embedding of a video of quasiperiodic vocal fold oscillations in a patient with asymmetric vocal fold oscillations. This is analogous to the synthetic 1D time series in Figure 14.


Figure 17: An annotated screenshot from my "GraphDitty" software on the song "Smooth Criminal" by Michael Jackson. Colors correspond to occurrences of sliding windows in time, so clusters that have different colors contain recurrent sections in the music.

The user can listen to the music synchronized with the graph, so this makes it easy to visually assess whether different similarity measures are picking up on perceptual differences.
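GraphDitty itself is an interactive Javascript tool, but the underlying idea can be sketched offline; the snippet below is only an approximation of that idea (assuming a recent networkx and a precomputed fused affinity matrix, here loaded from a hypothetical file):

    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt

    def ssm_graph(W, keep=0.05):
        # Build a force-weighted graph from a fused affinity matrix W (higher = more
        # similar) by keeping only the strongest `keep` fraction of off-diagonal edges.
        W = W.copy()
        np.fill_diagonal(W, 0)
        thresh = np.quantile(W[W > 0], 1 - keep)
        G = nx.from_numpy_array(np.where(W >= thresh, W, 0))
        pos = nx.spring_layout(G, weight='weight', seed=0)   # force-directed layout
        return G, pos

    W = np.load('fused_ssm.npy')          # hypothetical precomputed fused SSM/affinity
    G, pos = ssm_graph(W)
    nx.draw(G, pos, node_size=20, node_color=np.arange(len(G)), cmap='viridis')
    plt.show()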

Finally, working with one of my collaborators, Jose Perea, I have been part of an effort to implement new nonlinear dimension reduction techniques based on topology, including cohomology circular coordinates [DSMVJ11, Per18a] and projective coordinates [Per18b]. Recently, I worked closely with a group of undergraduate students at Brown ICERM to design time series whose sliding window embeddings reside on non-orientable manifolds [XTA+18], and I used my Python implementation of projective coordinates to help show that a Klein bottle time series they discovered could in fact be viewed as two Möbius strips glued together along their boundaries (Figure 18), along with many other such figures. Not only did this help us to validate that they were correct empirically as well as mathematically, but it also helped us to understand the time series much better and to relate it to previous discoveries about time series on the Möbius strip [PH15].

5 Open Source Software

Relevant Contributions

[TSBO18] Christopher Tralie, Nathaniel Saul, and Rann Bar-On. Ripser.py: A lean persistent homology library for Python. The Journal of Open Source Software (JOSS), 2018.

Figure 18: A time series we discovered whose sliding window embedding lies on the Klein bottle. Projective coordinates are shown on the bottom. Please refer to [XTA+18] for my intuitive explanation.

As a reaction to the reproducibility crisis in science, many communities are starting to recognize the value of open source software as a part of the solution. Furthermore, since my work is on algorithms that are highly applicable across many domains, I am motivated to develop software that is easier for non-experts to use. Hence, I have been honing skills in this area by learning continuous integration tools and other best practices that make it more likely for others to use my code (skills which I can naturally teach to students who are learning coding for the first time). As a recent example, I helped to develop a lightweight Python interface to the "ripser" algorithm for persistent homology [TSBO18], which is as simple to install as "pip install ripser." My collaborator and I expect this to get more people from both the math and applications sides involved in topological data analysis. Furthermore, as I discuss in Section 8, I also have a deep understanding of existing techniques as a result of my implementations, and this makes it much easier for me to draw on them in research and to teach them effectively in a classroom setting⁷.

In addition to working on my own software, I also review others' open source software in several ways. I have been an active contributor of detailed bug reports and pull requests on Github for others' academic software for several years, including a bug I discovered in a paper from ISMIR in 2014 [ME14], which received the best paper award that year⁸.

⁷A recent example of this is the week-long introduction to TDA workshop I ran with my adviser John Harer: https://github.com/ctralie/TDALabs

⁸The authors are also very dedicated to open source software and were very gracious. Thankfully, the bug did not impact their results significantly. If anything, it moved their results in an overall positive direction.


Figure 19: My talking heads transfer tool, in which I animated the "head of a virtue" to speak as I was speaking. My 3D model time series was taken with a Kinect, and I used a combination of triangle mesh deformation transfer [SP04] and Laplacian mesh editing [SCOL+04], which I implemented in Python, to accomplish the speech cloning.

In addition, I was involved for 3 years (2015-2017) in the "music information retrieval evaluation exchange" (MIREX) [DEBJ10] as a "task captain" for the cover song identification task. Not only did I put in the work to help others evaluate their algorithms on larger datasets which are behind a firewall due to copyright reasons, but I also stayed on the bleeding edge of development in one of my key research areas.

6 The Digital Humanities

Relevant Contributions

[TL14] Christopher Tralie and Amanda Lazarus. A Head of Our Times: Reimagining the heads in the Brummer collection via real-time face mapping. The Age of Sensing: 5th International Conference on Remote Sensing in Archaeology, 2014.

Though I am in STEM, I believe strongly in the value of the humanities. Truly, the humanities are the joy of life, and the value of STEM is mostly to help keep us alive (and to train our minds to solve technical problems, but that in isolation is fruitless). This is the reason I myself chose to study at a small liberal arts college (Princeton) as an undergraduate, even though my primary academic interests are in STEM.

In my current research life, I look for connections to the humanities wherever possible. Since music information retrieval is built upon music, an artistic endeavor, the connections there are natural. However, I have explored other connections of my work on geometric time series to the humanities. As an example, I worked on a project called "A Head of Our Times" [TL14], in which I created technology that could texture map and re-animate 3D models scanned from statues in the "Brummer collection," a collection of busts from medieval cathedrals in France [Mer94]. The goal was to enable scholars of this collection to bring the statues "back to life" by having them literally tell their own story. Figure 19 shows a prototype of the tool I created to do the speech transfer.


Others are starting to recognize the value of this type of work, and numerous conferences are springing up, such as the "Age of Sensing," at which I presented this work back in 2014. Thus, in addition to being fun and meaningful, practical means exist to support this research, even as a STEM academic.

One important lesson I learned as a technical person doing this research is how crucial it is to have feedback from the people who would actually use the technologies. Originally, I was working to design a prototype in which museum visitors would project their own faces onto the statue heads as a form of engagement with the art. However, the art scholars I was working with deemed this to be distracting and irrelevant. Thanks to this feedback, we were able to settle on a much more artistically relevant solution with the talking heads.

7 Undergraduate Development

The visual nature of my research and the subject matter involving music and videos should naturally get undergraduate students interested and excited, and this sort of work can help them draw their own mental connections between math, CS, ECE, art, music, etc. The research is definitely challenging, but certainly within the reach of any undergraduate who is willing to put in the work. By contrast to pure math, I also have the advantage that there are generally fewer courses needed for students to jump into novel work, and my research is diverse enough to support multiple projects that can progress with different skill levels. Instead, the main challenge is training the students in time management and the process of research. To that end, I have developed a number of strategies for undergraduate research in the process of supervising 13 undergraduate students across 7 projects so far. The biggest challenge is to keep momentum throughout the semester via consistent progress, however incremental it may be. Every week (or daily if it is a summer program), I work with them to set incredibly specific, realistic goals for the next time we meet. I also stress very heavily that negative results are a part of the process, and I ask them to please share them. To help break the ice on this, I am very open about my own process and struggles in research as we go along. Likewise, I encourage them to let me know the moment they are stuck so I can help them⁹. Since I am a strong coder, I am able to provide lots of skeleton code (as I do with teaching) and to help with debugging to keep them out of a rut and back working on interesting things.

So far, most of the work I have supervised has turned into senior theses, with one journal publication in submission [XTA+18] (please refer to my CV for more details).

⁹We have been doing this by e-mail, but in the future I would like to move over to Slack, since it is designed specifically for these kinds of interactions, and it has worked very well for me in research collaborations.


In the future, I would additionally like to explore various publication venues for shorter vignettes more appropriate on an undergraduate time scale than lengthy journal papers. Having small projects can boost morale by giving students a sense of accomplishment, and it gives them a deliverable that looks good for graduate school or job applications, even if they don't have the time to follow up after the fact like my students in [XTA+18]. In my line of work, one excellent venue for them is the Symposium on Computational Geometry (SOCG) multimedia session, which asks for 2 page extended abstracts with either a companion video or a demo showing a geometry algorithm (novel or not). Making such demos or videos about geometry can be a particularly fun creative exercise for a group of several students, and I have the knowledge of GUI and data visualization programming in Javascript to help them get started. In music information retrieval, there is MIREX, which I mentioned in Section 5, and which is also accompanied by a 2 page abstract. MIREX has the added bonus of giving students quantitative feedback in a competition. The International Society for Music Information Retrieval (ISMIR) also has a serious "late breaking demo" session accepting 2 page papers with preliminary ideas / results for any research in progress, and the feedback and exposure to the ISMIR community has turned a lot of these initial presentations into excellent full papers later on. Finally, there is always a need for robust, well documented implementations of algorithms which are open to the community. The Journal of Open Source Software (JOSS) is an excellent academic venue for undergraduates to get credit for this kind of work that is done supporting bigger projects, regardless of whether the research works out. A similar, more specific venue in image processing is "Image Processing Online" (IPOL) [LM11].

8 My Algorithm for Generating New Research

Now that I have given an overview of my work to date, I will describe my general approach to a sustained research program. I approach research in a very open and exploratory fashion. When faced with a new problem, my main tools for exploration are visualizations that I code up to help me probe data. Some of them are interactive (Section 4), but many of them are simply videos that I make for myself and my collaborators showing some evolving statistics or plots synchronized with the time series under study (e.g. https://youtu.be/BGQiZbqj-4U). Sometimes, I even provide easy-to-use interactive interfaces (usually in Javascript) for my collaborators who know less about programming to share in this process (Section 5). There is much overlap here with what I do for my students in active learning exercises. As my students have also experienced, these visualizations often aid in cutting directly through to the core mathematical issues at play.

When solving the original problem turns out to be too difficult or uninteresting, I make observations during my process that help me branch out into new, more approachable or relevant problems. This is precisely how I stumbled into the largest, most fruitful line of my work, using self-similarity matrices on a plethora of problems (Section 2), rather than the topological data analysis tools for music genre identification that I started with (which turned out both not to work and not to be interesting). Furthermore, since I studied computer science and electrical engineering during my undergraduate and masters, but also began working with mathematicians during my Ph.D., I have a wide breadth of knowledge and can often find holes where these areas can complement each other. Additionally, I have a depth of knowledge of applied algorithms and signal processing that I have built up over the years by implementing virtually every algorithm I use from scratch, which makes it easy for me to understand the "trade space" in which different algorithms are useful.

Finally, I am a huge proponent of small vignettes and "hackathons," where I sequester myself for a few days to try out "pie in the sky" ideas, which also often feed research or at least get me to learn more tools that are useful later in unexpected ways. In fact, I recently won a hackathon by making a program called "Face Jam" (video demo: https://www.youtube.com/watch?v=nCy7NGGN-3U), in which a face in an image changes expressions to the music and raises its eyebrows to the beat, and I now have the tools to normalize for facial expressions in video, which I hope to use in heartbeat analysis in video (Section 3). I have held hackathons for my students during class assignments as well, to teach them the importance of taking space and time to work on problems and learn new ideas deeply.

9 Future Directions

[REMOVED FOR ONLINE VERSION]

External References

[AHEK10] Pradeep K Atrey, M Anwar Hossain, Abdulmotaleb El Saddik, and Mohan S Kankanhalli. Multimodal fusion for multimedia analysis: a survey. Multimedia Systems, 16(6):345–379, November 2010.

[AZZP15] I. Anina, Z. Zhou, G. Zhao, and M. Pietikäinen. OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis. In 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), volume 1, pages 1–5, May 2015.



[BBK10] Alexander M Bronstein, Michael M Bronstein, and Ron Kimmel. The video genome. arXiv preprint arXiv:1003.5320, 2010.

[Bel07] Juan Pablo Bello. Audio-based cover song retrieval using approximate chord sequences: Testing shifts, gaps, swaps and beats. In ISMIR, volume 7, pages 239–244, 2007.

[Bel09] Juan P. Bello. Grouping recorded music by structural similarity. Int. Conf. Music Inf. Retrieval (ISMIR-09), 2009.

[Bel11] Juan P Bello. Measuring structural similarity in music. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2013–2025, 2011.

[BHT63] Bruce P Bogert, Michael JR Healy, and John W Tukey. The quefrency alanysis of time series for echoes: Cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking. In Proceedings of the Symposium on Time Series Analysis, volume 15, pages 209–243, 1963.

[BLC+11] Doug M Boyer, Yaron Lipman, Elizabeth St Clair, Jesus Puente, Biren A Patel, Thomas Funkhouser, Jukka Jernvall, and Ingrid Daubechies. Algorithms to automatically quantify the geometric similarity of anatomical surfaces. Proceedings of the National Academy of Sciences, 2011.

[BM] Joan Bruna and Stephane Mallat. Classification with scattering operators. In CVPR 2011.

[BMM+15] Elodie F Briefer, Anne-Laure Maigrot, Roi Mandel, Sabrina Briefer Freymond, Iris Bachmann, and Edna Hillmann. Segregation of information about emotional arousal and valence in horse whinnies. Scientific Reports, 4:9989, 2015.

[BMM+16] Paul Bendich, James S Marron, Ezra Miller, Alex Pieloch, and Sean Skwerer. Persistent homology analysis of brain artery trees. The Annals of Applied Statistics, 10(1):198, 2016.

[Bru13] Joan Bruna. Scattering representations for recognition. PhD thesis, Ecole Polytechnique X, 2013.

[Car09] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.

[CD00] Ross Cutler and Larry S. Davis. Robust real-time periodic motion detection, analysis, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):781–796, 2000.

[CLX17] Ning Chen, Wei Li, and Haidong Xiao. Fusing similarity functions for cover song identification. Multimedia Tools and Applications, pages 1–24, 2017.

[Cur17] Carina Curto. What can topology tell us about the neural code? Bulletin of the American Mathematical Society, 54(1):63–78, 2017.

[DEBJ10] J Stephen Downie, Andreas F Ehmann, Mert Bay, and M Cameron Jones. The music information retrieval evaluation exchange: Some observations and insights. In Advances in Music Information Retrieval, pages 93–115. Springer, 2010.

[DSMVJ11] Vin De Silva, Dmitriy Morozov, and Mikael Vejdemo-Johansson. Persistent cohomology and circular coordinates. Discrete & Computational Geometry, 45(4):737–759, 2011.

[EH08] Herbert Edelsbrunner and John Harer. Persistent homology - a survey. Contemporary Mathematics, 453:257–282, 2008.

[EH10] Herbert Edelsbrunner and John Harer. Computational Topology: An Introduction. American Mathematical Society, 2010.

[Ell06] Daniel PW Ellis. Identifying 'cover songs' with beat-synchronous chroma features. MIREX 2006, pages 1–4, 2006.

[Ell07] Daniel PW Ellis. The "covers80" cover song data set. URL: http://labrosa.ee.columbia.edu/projects/coversongs/covers80, 2007.

[ELZ00] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 454–463. IEEE, 2000.

[FMP10] Jordan Frank, Shie Mannor, and Doina Precup. Activity and gait recognition with time-delay embeddings. In AAAI. Citeseer, 2010.

[Foo00] Jonathan Foote. Automatic audio segmentation using a measure of audio novelty. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on, volume 1, pages 452–455. IEEE, 2000.

[Ghr14] Robert W Ghrist. Elementary Applied Topology. Createspace, 2014.

[GLMH12] Jennifer Gamble, Manuel O Lagravere, Paul W Major, and Giseon Heo. New statistical method to analyze three-dimensional landmark configurations obtained with cone-beam CT: basic features and clinical application for rapid maxillary expansion. Korean Journal of Radiology, 13(2):126–135, 2012.

[GM11] Dian Gong and Gerard Medioni. Dynamic manifold warping for view invariant action recognition. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 571–578. IEEE, 2011.

[GMB+12] Taras Galkovskyi, Yuriy Mileyko, Alexander Bucksch, Brad Moore, Olga Symonova, Charles A Price, Christopher N Topp, Anjali S Iyer-Pascuzzi, Paul R Zurek, Suqin Fang, et al. GiA Roots: software for the high throughput analysis of plant root system architecture. BMC Plant Biology, 12(1):116, 2012.

[GMFL17] Bryan Glaz, Igor Mezic, Maria Fonoberova, and Sophie Loire. Quasi-periodic intermittency in oscillating cylinder flow. Journal of Fluid Mechanics, 828:680–707, 2017.


[GMM+17] Ulf Großekathofer, Nikolay V Manyakov, Vojkan Mihajlovic, Gahan Pandina, Andrew Skalkin, Seth Ness, Abigail Bangerter, and Matthew S Goodwin. Automated detection of stereotypical motor movements in autism spectrum disorder using recurrence quantification analysis. Frontiers in Neuroinformatics, 11:9, 2017.

[Gom06] Emilia Gomez. Tonal description of polyphonic audio for music content processing. INFORMS Journal on Computing, 18(3):294–304, 2006.

[GPCI15] Chad Giusti, Eva Pastalkova, Carina Curto, and Vladimir Itskov. Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences, 2015.

[Gro07] Mikhail Gromov. Metric structures for Riemannian and non-Riemannian spaces. Springer Science & Business Media, 2007.

[HBTS94] Hanspeter Herzel, David Berry, Ingo R Titze, and Marwa Saleh. Analysis of vocal disorders with methods from nonlinear dynamics. Journal of Speech, Language, and Hearing Research, 37(5):1008–1019, 1994.

[HNB13] Eric J Humphrey, Oriol Nieto, and Juan Pablo Bello. Data driven and discriminative projections for large-scale cover song identification. In ISMIR, pages 149–154, 2013.

[HUH+16] Christian T Herbst, Jakob Unger, Hanspeter Herzel, Jan G Svec, and Jorg Lohscheller. Phasegram analysis of vocal fold vibration documented with laryngeal high-speed video endoscopy. Journal of Voice, 30(6):771.e1, 2016.

[JDLP08] Imran N Junejo, Emilie Dexter, Ivan Laptev, and Patrick Perez. Cross-view action recognition from temporal self-similarities. In Proceedings of the 10th European Conference on Computer Vision: Part II, pages 293–306. Springer-Verlag, 2008.

[JDLP11] Imran N Junejo, Emilie Dexter, Ivan Laptev, and Patrick Perez. View-independent action recognition from temporal self-similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):172–185, 2011.

[KGKM13] M Kramar, A Goullet, L Kondic, and K Mischaikow. Persistence of force networks in compressed granular media. Physical Review E, 87(4):042207, 2013.

[KS04] Holger Kantz and Thomas Schreiber. Nonlinear time series analysis, volume 7. Cambridge University Press, 2004.

[KS10] Florian Kaiser and Thomas Sikora. Music structure discovery in popular music using non-negative matrix factorization. In ISMIR, pages 429–434, 2010.

[LB+95] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10):1995, 1995.

[LBH15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.

[LM11] Nicolas Limare and Jean-Michel Morel. The IPOL initiative: Publishing and testing algorithms on line for reproducible research in image processing. In International Conference on Computational Science, 2011.

[LTR+07] Jorg Lohscheller, Hikmet Toy, Frank Rosanowski, Ulrich Eysholdt, and Michael Dollinger. Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Medical Image Analysis, 11(4):400–413, 2007.

[Mal] Stephane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.

[ME14] Brian McFee and Daniel PW Ellis. Analyzing song structure with spectral clustering. In 15th International Society for Music Information Retrieval (ISMIR) Conference, 2014.

[Mer94] Jill Meredith. Romancing the stone: resolving some provenance mysteries of the Brummer collection at Duke University. Gesta, 33(1):38–46, 1994.

[NB14] Oriol Nieto and Juan Pablo Bello. Music segment similarity using 2D-Fourier magnitude coefficients. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pages 664–668. IEEE, 2014.

[Nol10] David D Nolte. The tangled tale of phase space. Physics Today, 63(4):33–38, 2010.

[OVDE16] Julien Osmalsky, Marc Van Droogenbroeck, and Jean-Jacques Embrechts. Enhancing cover song identification with hierarchical rank aggregation. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 136–142, 2016.

[PDHH15] Jose A Perea, Anastasia Deckard, Steve B Haase, and John Harer. SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics, 16(1):257, 2015.

[Per18a] Jose A Perea. Towards sparse and stable circular coordinates. 2018.

[Per18b] Jose A Perea. Multiscale projective coordinates via persistent cohomology of sparse filtrations. Discrete & Computational Geometry, 59(1):175–225, 2018.

[PH15] Jose A Perea and John Harer. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3):799–838, 2015.

[PMT+14] Emil Plesnik, Olga Malgina, Jurij F Tasic, Saso Tomazic, and Matej Zajc. Detection and delineation of the electrocardiogram QRS-complexes from phase portraits. 2014.

[SCOL+04] Olga Sorkine, Daniel Cohen-Or, Yaron Lipman, Marc Alexa, Christian Rossl, and H-P Seidel. Laplacian surface editing. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pages 175–184. ACM, 2004.

[SGHS08] Joan Serra, Emilia Gomez, Perfecto Herrera, and Xavier Serra. Chroma binary similarity and local alignment applied to cover song identification. Audio, Speech, and Language Processing, IEEE Transactions on, 16(6):1138–1151, 2008.

[SM06] Mikkel N Schmidt and Morten Mørup. Non-negative matrix factor 2-D deconvolution for blind single channel source separation. In International Conference on Independent Component Analysis and Signal Separation, pages 700–707. Springer, 2006.

[SMGA12] Joan Serra, Meinard Muller, Peter Grosche, and Josep Lluis Arcos. Unsupervised detection of music boundaries by time series structure features. In Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.

[SP04] Robert W Sumner and Jovan Popovic. Deformation transfer for triangle meshes. In ACM Transactions on Graphics (TOG), volume 23, pages 399–405. ACM, 2004.

[SSA09] Joan Serra, Xavier Serra, and Ralph G Andrzejak. Cross recurrence quantification for cover song identification. New Journal of Physics, 11(9):093017, 2009.

[Sta05] Cornelis J Stam. Nonlinear dynamical analysis of EEG and MEG: review of an emerging field. Clinical Neurophysiology, 116(10):2266–2301, 2005.

[SVK17] Jiawei Su, Danilo Vasconcellos Vargas, and Sakurai Kouichi. One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864, 2017.

[SYB+16] Diego F Silva, Chin-Chin M Yeh, Gustavo Enrique de Almeida Prado Alves Batista, Eamonn Keogh, et al. SiMPle: assessing music similarity using subsequences joins. In International Society for Music Information Retrieval Conference, XVII. International Society for Music Information Retrieval-ISMIR, 2016.

[T+81] Floris Takens et al. Detecting strange attractors in turbulence. Lecture Notes in Mathematics, 898(1):366–381, 1981.

[Tau00] Gabriel Taubin. Geometric signal processing on polygonal meshes. 2000.

[TNZS16] George Trigeorgis, Mihalis A Nicolaou, Stefanos Zafeiriou, and Bjorn W Schuller. Deep canonical time warping. 2016.

[VT16] V Venkataraman and P Turaga. Shape descriptions of nonlinear dynamical systems for video-based inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[W+03] Avery Wang et al. An industrial strength audio search algorithm. In ISMIR, pages 7–13. Washington, DC, 2003.

[Wan06] Avery Wang. The Shazam music recognition service. Communications of the ACM, 49(8):44–48, 2006.

[WJW+12] Bo Wang, Jiayan Jiang, Wei Wang, Zhi-Hua Zhou, and Zhuowen Tu. Unsupervised metric fusion by cross diffusion. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2997–3004. IEEE, 2012.

[WMD+14] Bo Wang, Aziz M Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains, and Anna Goldenberg. Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3):333, 2014.

[ZDlT16] Feng Zhou and Fernando De la Torre. Generalized canonical time warping. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):279–294, 2016.
