Unix Data Manipulation

August 13th, 2008

Standard Unix-like distributions (e.g. CentOS, RHEL) come with many commands that are useful for extracting and manipulating graph data. Here is a brief list of some of them:

  • grep - to extract data
  • sort - to order data
  • cut - to extract columns from data
  • paste - to reattach columns of data
  • uniq - to remove adjacent duplicated lines (usually after sort)
  • join - to perform a relational join of lines
  • cmp - to compare files
  • diff - to show the differences between files
  • head - to extract the first lines of a file
  • tail - to extract the last lines of a file
  • tr - to translate character sets
  • sed - to edit a stream of text, typically with regular-expression substitutions

These can be used in combination to extract data from a file, parse out the necessary fields, sort the fields, and add them to a data file. Sure beats cut & paste by hand!

Portable Anymap (PNM) Formats

July 22nd, 2008

I have used ImageMagick quite often for a variety of projects, and I often noticed the many “pnm” utilities installed alongside it, such as pnmrotate, pnmscale, and pnmcat (these actually belong to the Netpbm package). I never really paid attention to what these formats are, however. PNM provides a standard format for uncompressed bitmaps (PBM), grayscale images (PGM), and color images (PPM) that is very easy to parse and write. These are perfect for manipulating with Perl, C, or any other language where you want to do a low-level hack.

There are 6 types of these images, and each is identified by the first two bytes of the file. These bytes form a “magic number” that is decoded as:

  • P1: ASCII bitmap
  • P2: ASCII grayscale
  • P3: ASCII color
  • P4: Binary bitmap
  • P5: Binary grayscale
  • P6: Binary color

After the magic number comes whitespace, usually just a newline. The next line is an optional comment if it begins with a “#”, and the comment ends at the newline. Then the size of the image is given in pixels as ASCII decimal numbers, width first. For example, “640 480” would be a 640 wide by 480 tall image. The header information can be summarized like this example:

 P3
 # feep.ppm
 4 4
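
The header layout described above can be read with a small tokenizer. This is only a minimal sketch in Python: the function name read_pnm_header is my own, and it assumes comments start at the beginning of a token and run to the end of the line.

```python
def read_pnm_header(data: bytes):
    """Return (magic, width, height, offset) for a PNM byte string.

    offset is the index just past the height token; for grayscale and
    color files the maximum value still has to be read after it.
    """
    tokens = []
    pos = 0
    while len(tokens) < 3:
        # Skip whitespace between tokens.
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated PNM header")
        if data[pos:pos + 1] == b"#":
            # A comment runs to the end of the line.
            while pos < len(data) and data[pos:pos + 1] != b"\n":
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos].decode("ascii"))
    magic, width, height = tokens[0], int(tokens[1]), int(tokens[2])
    if magic not in {"P1", "P2", "P3", "P4", "P5", "P6"}:
        raise ValueError(f"not a PNM file: {magic!r}")
    return magic, width, height, pos


print(read_pnm_header(b"P3\n# feep.ppm\n4 4\n15\n")[:3])
```

Feeding it the example header gives back the magic number "P3" and a 4 by 4 size.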

What follows next depends on the image format. Each one is presented briefly.

For P1/P4 bitmap formats, the rasterized bitmap data follows the header directly. For P1 types, this data is just ASCII “0”s and “1”s with optional whitespace between them. For P4 types, the bits are packed eight to a byte, most significant bit first, and each row is padded with don’t-care bits to fill out its last byte. It is important, however, that in the ASCII formats NO LINE EXTENDS MORE THAN 70 CHARACTERS. Some tools ignore this, but others don’t. ASCII example:

P1
# feep.pbm
24 7
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0
0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0
0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 0
0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
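
To make the P4 packing concrete, here is a rough Python sketch that turns rows of 0/1 pixels into the binary form. The names pack_row and pbm_binary are hypothetical helpers of mine, and the padding bits are simply set to 0.

```python
def pack_row(bits):
    """Pack one row of 0/1 pixels into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        # Pad a short final chunk on the right so the row ends on a byte.
        byte <<= 8 - len(chunk)
        out.append(byte)
    return bytes(out)


def pbm_binary(width, rows):
    """Return the bytes of a P4 file for a list of equal-length rows."""
    if any(len(row) != width for row in rows):
        raise ValueError("every row must have exactly `width` pixels")
    header = f"P4\n{width} {len(rows)}\n".encode("ascii")
    return header + b"".join(pack_row(row) for row in rows)


print(pack_row([0, 1, 1, 1, 1, 0, 0, 1]).hex())
```

Each 24-pixel row of the feep example would pack into exactly 3 bytes, so no padding is needed there.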

For P2/P5 grayscale formats, we first get a single integer that gives the maximum gray value. It must be greater than 0 and less than 65536. A sample equal to this maximum is white, while 0 is black. Following this, the data is again encoded as either ASCII numbers or binary samples of 1 byte each (2 bytes, most significant byte first, when the maximum is 256 or more). ASCII example:

P2
# feep.pgm
24 7
15
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
0  3  3  3  3  0  0  7  7  7  7  0  0 11 11 11 11  0  0 15 15 15 15  0
0  3  0  0  0  0  0  7  0  0  0  0  0 11  0  0  0  0  0 15  0  0 15  0
0  3  3  3  0  0  0  7  7  7  0  0  0 11 11 11  0  0  0 15 15 15 15  0
0  3  0  0  0  0  0  7  0  0  0  0  0 11  0  0  0  0  0 15  0  0  0  0
0  3  0  0  0  0  0  7  7  7  7  0  0 11 11 11 11  0  0 15  0  0  0  0
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
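
For the binary (P5) variant, the sample width depends on the maximum value. A minimal Python sketch of the decoding, assuming the raw bytes after the header are already in hand (decode_samples is my own name, not part of any library):

```python
def decode_samples(raw: bytes, count: int, maxval: int):
    """Decode `count` binary samples: 1 byte each if maxval < 256, else 2."""
    width = 1 if maxval < 256 else 2
    if len(raw) < count * width:
        raise ValueError("not enough sample data")
    return [
        int.from_bytes(raw[i * width:(i + 1) * width], "big")
        for i in range(count)
    ]


print(decode_samples(b"\x00\x0f", 2, 15))
print(decode_samples(b"\x01\x00\xff\xff", 2, 65535))
```

Dividing each sample by the maximum value then gives a gray level between 0.0 (black) and 1.0 (white).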

For P3/P6 color images, the format is the same as the grayscale images except that there are now 3 numbers per pixel, instead of 1, to represent red, green, and blue, in that order. ASCII example:

P3
# feep.ppm
4 4
15
 0  0  0    0  0  0    0  0  0   15  0 15
 0  0  0    0 15  7    0  0  0    0  0  0
 0  0  0    0  0  0    0 15  7    0  0  0
15  0 15    0  0  0    0  0  0    0  0  0
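
Writing one of these files is just as easy as reading it. Here is a small Python sketch that emits P3 text; ppm_ascii is a hypothetical helper, and it puts one pixel on each line as a simple way to stay under the 70 character limit.

```python
def ppm_ascii(width, height, maxval, pixels):
    """Return P3 text for a list of (red, green, blue) tuples in row order."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    lines = ["P3", f"{width} {height}", str(maxval)]
    for r, g, b in pixels:
        if not all(0 <= v <= maxval for v in (r, g, b)):
            raise ValueError("sample outside 0..maxval")
        # One pixel per line keeps every line far shorter than 70 characters.
        lines.append(f"{r} {g} {b}")
    return "\n".join(lines) + "\n"


print(ppm_ascii(2, 1, 15, [(15, 0, 15), (0, 15, 7)]), end="")
```

Saving the returned string to a file with a .ppm extension should give an image that the usual viewers and converters can open.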

Animations

June 3rd, 2008

A really fun feature of gnuplot in the more recent versions is animated GIFs. This requires version 2.0.28 of the Boutell gd library and 4.3(?) of gnuplot. You do this with the terminal type:

set terminal gif animate delay 10

and then plot several items, each of which becomes one frame. The delay is in units of 1/100 of a second, so each frame in this case lasts 0.1s (10 frames per second). Combined with yesterday’s post, we can make animated droplets:

Animated GIF Example
The entire source code for the example is here.
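
If you would rather generate the frames than type them, a short Python script can write the gnuplot commands for you. This is only a sketch and not the droplet example: the moving sine wave, the output name wave.gif, and the function name animation_script are all my own assumptions.

```python
import math


def animation_script(frames=20, delay=10, output="wave.gif"):
    """Return gnuplot commands that draw one shifted sine wave per frame."""
    lines = [
        f"set terminal gif animate delay {delay}",
        f"set output '{output}'",
        "set xrange [0:2*pi]",
        "set yrange [-1.2:1.2]",
    ]
    for i in range(frames):
        phase = 2 * math.pi * i / frames
        # Each plot command adds one frame to the animation.
        lines.append(f"plot sin(x + {phase:.4f}) notitle")
    # An empty "set output" closes the GIF file.
    lines.append("set output")
    return "\n".join(lines) + "\n"


print(animation_script(frames=3), end="")
```

Redirect the output to a file (say wave.gp, another made-up name) and run gnuplot on it to produce the GIF.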

Variable Size Points

June 2nd, 2008

Example of variable-size data points.

I remember writing a Perl hack to do this some years ago, but it appears that you can now specify the size of the points in a plot as a 3rd variable. For example,

plot '-' using 1:2:3 with points lt 1 pt 6 ps variable
1 3 8
6 2 2
5 5 4
e

Point type 6 is circles and the 3rd column specifies the size of the circles.
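
The old Perl hack is still handy for one thing: raw data values rarely make good point sizes. Here is a rough Python sketch that rescales a value column into a size range and emits the inline plot. The names scale_sizes and points_script and the 1 to 8 default range are my own choices.

```python
def scale_sizes(values, smallest=1.0, largest=8.0):
    """Linearly map data values onto point sizes in [smallest, largest]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [smallest] * len(values)
    return [smallest + (v - lo) * (largest - smallest) / (hi - lo) for v in values]


def points_script(points):
    """Return a gnuplot script that plots (x, y, size) tuples as circles."""
    lines = ["plot '-' using 1:2:3 with points lt 1 pt 6 ps variable"]
    lines += [f"{x} {y} {size:g}" for x, y, size in points]
    lines.append("e")
    return "\n".join(lines) + "\n"


print(points_script([(1, 3, 8), (6, 2, 2), (5, 5, 4)]), end="")
```

With the three points from the example above, points_script reproduces the same gnuplot input.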

Geometric Layout Plots

June 1st, 2008

One of the most frequent things that I use gnuplot for is to view geometric layouts. This can be done by simply generating data for the shapes in clockwise or counter-clockwise order. For example,

set xrange [-1:21]
set yrange [-1:21]
plot '-' with lp
0 0
0 10
10 10
10 0
0 0

10 10
10 20
20 20
20 10
10 10
e

will plot two squares of size “10” on a side. Multiple shapes are added to the same data source by leaving a blank line between the data sets.
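
For real layouts you will want to generate this data rather than type it. A minimal Python sketch, assuming each shape is an axis-aligned rectangle given by two opposite corners (rect_block and layout_script are hypothetical names):

```python
def rect_block(x0, y0, x1, y1):
    """Return the closed outline of a rectangle, one corner per line."""
    # Walk the corners in clockwise order and return to the start.
    corners = [(x0, y0), (x0, y1), (x1, y1), (x1, y0), (x0, y0)]
    return "\n".join(f"{x} {y}" for x, y in corners)


def layout_script(rects):
    """Return a gnuplot inline plot with a blank line between rectangles."""
    blocks = "\n\n".join(rect_block(*rect) for rect in rects)
    return f"plot '-' with lp\n{blocks}\ne\n"


print(layout_script([(0, 0, 10, 10), (10, 10, 20, 20)]), end="")
```

For the two squares above this prints the same plot command and data; set the xrange and yrange yourself as before.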

I have used this to plot VLSI circuit layouts in a very portable manner (since gnuplot is on almost every platform). Here is a small circuit:

If the data is not “in line” (i.e. you put it in a file instead of the ‘-’) then you can also interactively zoom when using the X11 terminal. This is a newer feature of gnuplot 3.8+. You simply use the right mouse button to select a region and then click the left button to execute the zoom. Pressing “p” will return to the previous scale.

Welcome

June 1st, 2008

I’ve always been fascinated by cool graphs. To quote one of my favorite movies:

(1) Mathematics is the language of nature; (2) Everything around us can be represented and understood through numbers; (3) If you graph the numbers of any system, patterns emerge.

- Maximillian Cohen, Pi by Darren Aronofsky (1998)

This is a site dedicated to creating, sharing, and enabling the creation of interesting graphs. I use many free tools to do this such as gnuplot, graphviz, xgraph, xfig, Inkscape, and hacks with Perl and Unix to manipulate data.

Part of the value of this site is also as a tutorial for people who are new to data visualization on Unix/Linux/OSX platforms. Periodically, I will post “tips” about tools and key features of these tools.