mond {heR.Misc}    R Documentation

Mondrian Plot

Description

Creates a Mondrian ("World") plot for visualizing relationships in an n-dimensional data space. Useful for large data sets with many records and multiple variates.

Usage

mond(z, colors, levels=4, pretty=FALSE, dep=TRUE,
     nlab=4, samples=TRUE, plot=TRUE,
     legend=TRUE, xlab, ylab, main, ... )

Arguments

z a data frame containing at least two variates (columns) and any number of records (rows)
colors a vector of integers or color names, defaults to ("white", "black", "green", "red", "blue", ... )
levels number of splits (class intervals). Can be an integer, a vector, or a list. A single integer indicates that all variates will be cut into that number of levels. A vector (of at least length 2) specifies the number of cuts for each variate. A list (with at least 2 components) specifies the exact N cut points (giving N-1 different levels) for each variate.
pretty a logical specifying whether to use levels generated by the pretty function (pretty=TRUE) or even splits (pretty=FALSE). This argument is only relevant when levels is a single integer or a vector of integers.
dep logical, whether to plot variates as dependent or independent. Currently unimplemented.
nlab the number of tick marks and labels for the vertical axis
samples a logical specifying whether the vertical axis is scaled by the number of samples (samples=TRUE) or by the proportion of samples (samples=FALSE).
legend a logical specifying whether a legend will be drawn for each variate (legend=TRUE).
plot a logical specifying whether the plot will be drawn. If plot=FALSE, the function returns a list instead (see Value below).
xlab,ylab,main Axis labels and plot title
... additional graphical parameters
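
The difference between even splits and pretty splits, and the N cut points / N-1 levels rule for explicit cut points, can be illustrated with base R alone. This is a minimal sketch, not the code used inside mond; the variable names (x, n_levels, even_breaks, pretty_breaks, explicit_breaks) are hypothetical and the data are simulated.

```r
# Simulated variate (assumption: any numeric column of z would do).
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
n_levels <- 4

# Even splits: n_levels intervals of equal width spanning the data range.
even_breaks <- seq(min(x), max(x), length.out = n_levels + 1)

# Pretty splits: "round" break values covering the data range;
# pretty() treats n as a target, so the interval count may differ.
pretty_breaks <- pretty(x, n = n_levels)

# Explicit cut points: N = 5 cut points give N - 1 = 4 levels.
explicit_breaks <- c(0, 25, 50, 75, 100)

even_codes     <- cut(x, breaks = even_breaks, include.lowest = TRUE)
pretty_codes   <- cut(x, breaks = pretty_breaks, include.lowest = TRUE)
explicit_codes <- cut(x, breaks = explicit_breaks, include.lowest = TRUE)

# Counts per class interval under each scheme.
print(table(even_codes))
print(table(pretty_codes))
print(table(explicit_codes))
```

Even splits always give exactly the requested number of levels, whereas pretty splits trade that guarantee for readable legend labels.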

Details

A Mondrian or "World" plot is a simple sorted scatterplot of all the data in a given dataframe. It provides a quick "bird's eye" or "world" view of the data space. Colors are used to represent class intervals and the records of sequential variates are sorted in a nested fashion to reveal conditional dependencies.

The function makes use of existing R functions:
* cut – to create coded levels in the dataframe
* order – to create a nested sort of the data space
* image – to plot the data samples (records) as a series of colored blocks; each block color (for each variate) corresponds to a specific class interval
* hist – to check that given cut points span the entire data range and to provide frequency counts to aid in the plotting of clarifying lines.
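
The way these functions can be combined is sketched below. This is an illustrative reconstruction under stated assumptions, not the source of mond: the data frame, the use of even splits, and the names coded, ord and zord_sketch are all hypothetical.

```r
# Simulated data frame with three variates (assumption for illustration).
set.seed(1)
z <- data.frame(a = rnorm(200), b = runif(200), c = rexp(200))
n_levels <- 4
cols <- c("white", "black", "green", "red")  # first four default colors

# cut: recode every variate into integer class codes 1..n_levels,
# here using even splits over each variate's range.
coded <- as.data.frame(lapply(z, function(v) {
  brks <- seq(min(v), max(v), length.out = n_levels + 1)
  as.integer(cut(v, breaks = brks, include.lowest = TRUE))
}))

# order: nested sort, by the first (root) variate, then the second
# within ties of the first, and so on.
ord <- do.call(order, unname(as.list(coded)))
zord_sketch <- coded[ord, , drop = FALSE]

# image: one column of colored blocks per variate, one row per record.
image(x = seq_len(ncol(zord_sketch)), y = seq_len(nrow(zord_sketch)),
      z = t(as.matrix(zord_sketch)), col = cols,
      zlim = c(1, n_levels), axes = FALSE,
      xlab = "variate", ylab = "samples")
axis(1, at = seq_len(ncol(zord_sketch)), labels = names(zord_sketch))
axis(2)
```

Because the sort is nested, the root variate appears as solid color bands, and the pattern of colors in later columns within each band shows how those variates are distributed conditional on the root interval.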

Light gray lines mark the number of samples (or the proportion of samples) that fall within each class interval of the root variate, i.e., the first variate in the nested sort.

Value

When plot=FALSE, the function returns a list with the following components:

zord The sorted and recoded dataframe z. This is the dataframe that is plotted with the image function to produce the basic Mondrian plot.
brks The breakpoints used for the root variate.
cnts The counts in each class interval for the root variate.
cumcnts The cumulative counts in each class interval for the root variate.
cumperc The cumulative proportion of counts in each class interval for the root variate.
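
The count-related components can be approximated with hist, as in the sketch below. It assumes even splits on a simulated root variate; the names root and h are hypothetical, and the other names simply mirror the list components above rather than reproduce the function's internals.

```r
# Simulated root variate (assumption for illustration).
set.seed(1)
root <- rnorm(200)

# Breakpoints for four class intervals spanning the data range.
brks <- seq(min(root), max(root), length.out = 5)

# hist() stops with an error if the breaks do not span the range of
# the data, so it doubles as a check on the cut points.
h <- hist(root, breaks = brks, plot = FALSE)

cnts    <- h$counts            # counts per class interval
cumcnts <- cumsum(cnts)        # cumulative counts
cumperc <- cumcnts / sum(cnts) # cumulative proportion of counts

print(data.frame(upper = brks[-1], cnts, cumcnts, cumperc))
```

The cumulative counts (or proportions) are the natural vertical positions for the light gray lines that separate the root variate's class intervals.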

Author(s)

Neil E. Klepeis
neil AT exposurescience DOT org
http://klepeis.net
http://exposurescience.org


[Package heR.Misc version 0.0.4 Index]