| Time series, shape and, more generally, multimedia data are
ubiquitous; large volumes of such data are routinely created in
scientific, industrial, entertainment, medical and biological
domains. Examples include anthropological imagery, gene
expression data, medical images, electrocardiograms, gait
analysis, stock market quotes, space telemetry, military
intelligence, zoology, etc. To mine such data efficiently and
accurately, we must carefully choose algorithms and data
representations. While most representations used in the past
have been real-valued (e.g., wavelets and Fourier methods), this
tutorial will advocate for using discrete (symbolic)
representations of the data. Symbolic representations allow us
to avail of very useful algorithms and data structures that are
not available for real-valued data, for example suffix trees, hashing
and Markov models. The tutorial will be illustrated with
numerous real-world examples created just for this tutorial,
including examples from archeology (petroglyphs and projectile
points), microscopy (nematodes and blood cells), historical
manuscripts, zoology, motion capture and biometrics. The data
mining tasks considered include indexing, classification,
clustering, novelty discovery, motif discovery and
visualization.
Dr. Keogh's research interests are in Data Mining, Machine
Learning and Information Retrieval. He has published more than
90 papers, including 11 papers in SIGKDD and 12 papers in IEEE ICDM.
Several of his papers have won "best paper" awards. He is the
recipient of a 5-year NSF Career Award for "Efficient Discovery
of Previously Unknown Patterns and Relationships in Massive Time
Series Databases". Dr Keogh has given well received tutorials on
time series, machine learning and data mining all over the
world, and his papers have been referenced well over 3,000
times. |