| mond {heR.Misc} | R Documentation |
Create a Mondrian "World" plot for use in visualizing relationships in an n-dimensional data space. Useful for large data sets with many records and multiple variates.
mond(z, colors, levels=4, pretty=FALSE, dep=TRUE,
nlab=4, samples=TRUE, plot=TRUE,
legend=TRUE, xlab, ylab, main, ... )
z |
a dataframe object containing at least two variates (columns) and any number of records (rows) |
colors |
a vector of integers or color names, defaults to ("white", "black", "green", "red", "blue", ... ) |
levels |
number of splits (class intervals). Can be an integer, a vector, or a list. A single integer indicates that all variates will be cut into that number of levels. A vector (of at least length 2) specifies the number of cuts for each variate. A list (with at least 2 components) specifies the exact N cut points (giving N-1 different levels) for each variate. |
pretty |
a logical specifying whether the levels are generated by
the pretty function (pretty=TRUE) or as even splits (pretty=FALSE). This argument is only relevant
when levels is a single integer or a vector of integers. |
dep |
logical, whether to plot variates as dependent or independent. Currently unimplemented. |
nlab |
the number of tick marks and labels for the vertical axis |
samples |
a logical specifying whether the vertical axis is scaled by the number of samples (samples=TRUE) or by the proportion of samples (samples=FALSE). |
legend |
A logical specifying whether or not a legend will be drawn
for each variate (legend=TRUE). |
plot |
A logical specifying whether or not the plot will be drawn. If
plot=FALSE, then the function returns a list (see below). |
xlab,ylab,main |
Axis labels and plot title |
... |
additional graphical parameters |
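The three forms of the levels argument, and the effect of pretty, can be illustrated with base R alone. The following is a minimal sketch, not the internals of mond: the data vector and the variable names are hypothetical, and it assumes that "even splits" means equal-width intervals over the data range.

```r
# Hypothetical data; not taken from the heR.Misc package.
x <- c(1, 2, 4, 7, 8, 9)

# levels = 4 with pretty = FALSE: four even class intervals.
# Assumption: "even" means equal width over the range of the data.
even_breaks <- seq(min(x), max(x), length.out = 4 + 1)
even_codes  <- cut(x, breaks = even_breaks, include.lowest = TRUE, labels = FALSE)

# levels = 4 with pretty = TRUE: pretty() picks rounded breakpoints,
# so the resulting number of intervals may differ from the request.
pretty_breaks <- pretty(x, n = 4)
pretty_codes  <- cut(x, breaks = pretty_breaks, include.lowest = TRUE, labels = FALSE)

# levels given as a list component: N exact cut points give N - 1 levels.
exact_breaks <- c(0, 3, 6, 9)
exact_codes  <- cut(x, breaks = exact_breaks, include.lowest = TRUE, labels = FALSE)
```

Note that pretty() treats the requested count as a suggestion, which is one reason the two settings can yield different class intervals for the same data.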
A Mondrian or "World" plot is a simple sorted scatterplot of all the data in a given dataframe. It provides a quick "bird's eye" or "world" view of the data space. Colors are used to represent class intervals and the records of sequential variates are sorted in a nested fashion to reveal conditional dependencies.
The function makes use of existing R functions:
* cut – to create coded levels in the dataframe
* order – to create a nested sort of the data space
* image – to plot the data samples (records) as a series of colored blocks; each block color (for each variate) corresponds to a specific class interval
* hist – to check that given cut points span the entire data range and to provide frequency counts to aid in the plotting of clarifying lines.
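The cut, order, and image steps listed above can be sketched with base R as follows. This is an illustrative approximation under stated assumptions, not the source of mond: the data frame, the helper recode, and the choice of four equal-width intervals are all hypothetical.

```r
# Hypothetical data frame with two variates; names are illustrative only.
set.seed(1)
z <- data.frame(a = runif(50), b = rnorm(50))

# Step 1 (cut): recode each variate into integer class intervals.
# recode() is a hypothetical helper using k equal-width intervals.
recode <- function(v, k = 4) {
  cut(v, breaks = seq(min(v), max(v), length.out = k + 1),
      include.lowest = TRUE, labels = FALSE)
}
zc <- as.data.frame(lapply(z, recode))

# Step 2 (order): nested sort, by the first variate, then by each later one.
zord <- zc[do.call(order, unname(as.list(zc))), , drop = FALSE]

# Step 3 (image): one column of colored blocks per variate, one block per record.
m <- t(as.matrix(zord))   # rows of m run along the x axis (variates)
image(x = seq_len(nrow(m)), y = seq_len(ncol(m)), z = m,
      col = c("white", "black", "green", "red"),
      xlab = "variate", ylab = "record", axes = FALSE)
axis(1, at = seq_len(nrow(m)), labels = names(zord))
```

Because the sort is nested, records that share a class interval in the first variate appear as one contiguous band, and the coloring of the second variate within that band shows how it is distributed conditional on the first.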
Light gray lines are drawn to mark the number of samples (or the proportion of samples) that fall within each class interval of the root variate.
When plot=FALSE, the function returns a list with the following components:
zord |
The sorted and recoded dataframe z. This is the dataframe that is
plotted with the image function to produce the basic Mondrian plot. |
brks |
The breakpoints used for the root variate. |
cnts |
The counts in each class interval for the root variate. |
cumcnts |
The cumulative counts in each class interval for the root variate. |
cumperc |
The cumulative proportion of counts in each class interval for the root variate. |
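As a rough illustration of how the count components relate to one another, the sketch below derives them from hist for a single variate. The data are hypothetical and the breakpoints are assumed to be equal-width; the component names are reused from the list above only for readability, and the actual computation inside mond may differ.

```r
# Hypothetical root variate; not taken from the heR.Misc package.
set.seed(2)
root <- runif(40)

# Assumed breakpoints: four equal-width class intervals over the data range.
brks <- seq(min(root), max(root), length.out = 5)

# hist() raises an error if the breaks do not span the range of the data.
h <- hist(root, breaks = brks, plot = FALSE)

cnts    <- h$counts            # counts in each class interval
cumcnts <- cumsum(cnts)        # cumulative counts
cumperc <- cumcnts / sum(cnts) # cumulative proportion, ending at 1
```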
Neil E. Klepeis
neil AT exposurescience DOT org
http://klepeis.net
http://exposurescience.org