Abstract: |
In statistical data analysis, we are often looking for structure in high
dimensional data. In classification problems, we are interested in how different
known classes separate from (and relate to) one another in the data space
of measured values. In clustering, we are hoping to discover distinct groups
of points in this space. In model building, we are often interested in which
data points agree/disagree with the conjectured model and whether important
structure has been missed. And we hope to do all of this without
prejudging the nature of the structure itself, even to the point of
discovering the unanticipated!
In three or fewer dimensions, our visual system is an important asset,
as much (even unanticipated) structure can be recognized effortlessly when
points can be plotted in so few dimensions. Unfortunately, even after formal
dimension reduction methods have been applied, we are often faced with many
more dimensions than three.
In this talk, I will explore some visualization methods for high dimensional
data. I will review and illustrate methods based on radial, parallel, and
orthogonal coordinates. These three axis systems have different strengths
and weaknesses. In all cases, however, improvements may be had by casting
the axis arrangement in a graph theoretic framework. I will explore the relevant
graph theoretic representations and illustrate their use on real data sets.
I will pay particular attention to the orthogonal axis system and show how
graph traversal can be used to meaningfully navigate through high dimensional
space.
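As a rough illustration of the graph theoretic idea (a sketch under stated assumptions, not the software described in this talk), treat each variable as a node of a complete graph, so that each edge stands for a pair of variables, that is, one scatterplot or one pair of adjacent axes. A closed walk that uses every edge exactly once (an Eulerian circuit) then gives an axis sequence in which every pair of variables appears side by side. The function name `eulerian_axis_order` and the variable names are hypothetical, and the sketch assumes an odd number of distinct variables so that such a circuit exists.

```python
def eulerian_axis_order(variables):
    """Return a closed walk on the complete graph of `variables` that
    traverses every edge (variable pair) exactly once.

    Assumption of this sketch: the variables are distinct and there is
    an odd number of them (>= 3), so every node has even degree and an
    Eulerian circuit exists. Uses Hierholzer's algorithm.
    """
    n = len(variables)
    if n < 3 or n % 2 == 0 or len(set(variables)) != n:
        raise ValueError("need an odd number (>= 3) of distinct variables")

    # For each node, the neighbours whose connecting edge is still unused.
    unused = {v: set(variables) - {v} for v in variables}

    stack = [variables[0]]
    walk = []
    while stack:
        v = stack[-1]
        if unused[v]:
            # Pick the first unused neighbour in the given variable order,
            # so the result is deterministic.
            w = next(u for u in variables if u in unused[v])
            unused[v].remove(w)
            unused[w].remove(v)
            stack.append(w)
        else:
            # No unused edges left at v: it is final in the circuit.
            walk.append(stack.pop())
    return walk


# Hypothetical variable names, for illustration only.
order = eulerian_axis_order(["a", "b", "c", "d", "e"])
pairs = [frozenset(p) for p in zip(order, order[1:])]
```

Here each consecutive pair in `order` could be shown as a two-dimensional scatterplot, or the whole sequence used as a parallel coordinate axis ordering. With an even number of variables no Eulerian circuit exists on the complete graph, so a different traversal would be needed.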
All software used is (or shortly will be) available as a package in the
open source statistical system called R.
This is based on joint work with Catherine Hurley of the National
University of Ireland, Maynooth and Adrian Waddell of the University
of Waterloo.
|