Wednesday, October 14, 2009

Day 2

Today, we will create a simple network, read it into R, and draw it using igraph. One way of representing a network is as an edge list. Edge lists are easy to create. Consider yourself and a few of your immediate friends. Then think of some of your friends' immediate friends. You can represent this in a file that contains one line for each friendship, like this:

Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus

Save this list in a file named peanuts.txt, and start up R. In the Misc menu, change the working directory to the directory that contains the file. Now load the file into a data frame, which is R's standard structure for tabular data. Since the file has no header row naming its two columns (think column titles in Excel), we turn headers off:

> friends.data <- read.table("peanuts.txt", header=FALSE)
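
To check that the file was read as intended, it helps to look at the resulting data frame. The following is a minimal sketch, not part of the original walkthrough: it writes the same twelve friendships to a temporary file (the path edge_file is just a stand-in for your own peanuts.txt) and then inspects what read.table returns.

```r
# Write the twelve friendships to a temporary file (a stand-in for peanuts.txt).
edge_file <- file.path(tempdir(), "peanuts.txt")
writeLines(c(
  "Charlie Lucy", "Charlie Patty", "Charlie Linus", "Charlie Sally",
  "Charlie Snoopy", "Patty Marcie", "Lucy Linus", "Lucy Schroeder",
  "Snoopy Woodstock", "Snoopy Spike", "Snoopy Patty", "Sally Linus"
), edge_file)

# The file has no header row, so R names the columns V1 and V2.
friends.data <- read.table(edge_file, header = FALSE, stringsAsFactors = FALSE)

dim(friends.data)      # 12 rows (friendships) and 2 columns
names(friends.data)    # "V1" "V2"
head(friends.data, 3)  # the first three friendships
```

If dim reports something other than 12 rows and 2 columns, the file probably contains a stray line or a name with a space in it.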

Create a graph from this edge list.
Load the igraph library (see Day 1 for how to do this), then enter:

> friends <- graph.data.frame(friends.data, directed=FALSE)

Note that we specify the graph as undirected. This means that all links between nodes of the graph are bidirectional. Since Lucy is a friend of Charlie, Charlie is also a friend of Lucy. We could model directionality as well, but we won't in most of this course. To get a summary of the graph:

> summary(friends)
Vertices: 10
Edges: 12
Directed: FALSE
No graph attributes.
Vertex attributes: name.
No edge attributes.
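
The numbers in the summary can also be queried one at a time, and the effect of the directed argument can be seen by building the graph both ways. This is a sketch that assumes a recent igraph release (1.0 or later), where graph.data.frame is spelled graph_from_data_frame and the helper functions have underscore names; the variable friends.directed is introduced here only for comparison.

```r
library(igraph)

# The same twelve friendships as in the edge list above.
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)

friends <- graph_from_data_frame(friends.data, directed = FALSE)
vcount(friends)       # 10 vertices
ecount(friends)       # 12 edges
is_directed(friends)  # FALSE

# Undirected: the friendship holds whichever name comes first.
are_adjacent(friends, "Charlie", "Lucy")  # TRUE
are_adjacent(friends, "Lucy", "Charlie")  # TRUE

# Directed: each line of the file becomes an arrow from column 1 to column 2.
friends.directed <- graph_from_data_frame(friends.data, directed = TRUE)
are_adjacent(friends.directed, "Charlie", "Lucy")  # TRUE
are_adjacent(friends.directed, "Lucy", "Charlie")  # FALSE
```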

Now that we have created the graph, we can draw it. In R you use the plot command to draw a graph. If you provide the graph as the only argument to plot, the result will be less than satisfying (try it!). Instead, you want to specify a layout that will draw the graph nicely according to criteria of what constitutes a good visualization. For example:

> plot(friends, layout=layout.kamada.kawai)
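
A layout is nothing more than a table of x and y coordinates, one row per node. The sketch below computes the coordinates once and passes them to plot. It assumes a recent igraph release, where layout.kamada.kawai goes by the name layout_with_kk; the variable coords is introduced here for illustration.

```r
library(igraph)

# The same twelve friendships as in the edge list above.
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)
friends <- graph_from_data_frame(friends.data, directed = FALSE)

# Compute the Kamada-Kawai positions once and keep them.
coords <- layout_with_kk(friends)
dim(coords)  # one row per node, two columns (x and y)

# Passing the stored matrix draws the nodes at exactly these positions,
# so repeated plots look the same.
plot(friends, layout = coords)
```

Storing the coordinates is handy when you want to redraw the same picture with different labels or colors without the nodes moving around.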

BTW, if you can't remember the names of the layouts, remember just one thing: help is only a few clicks away. Enter "layout" in the search box on the R GUI, or enter the following on the command line:

> help(layout)

One thing to improve in the resulting diagram: it shows the numerical index of each node instead of the names of our friends. We can get a more readable diagram by adding the vertex.label parameter to the plot command, like this:

> plot(friends, layout=layout.kamada.kawai, vertex.label=V(friends)$name)

This tells R to use the names of the nodes instead of their indices. V(friends) returns the sequence of all nodes (vertices) in the graph, and $name refers to the name attribute of those nodes, so V(friends)$name returns a vector of node names. You can convince yourself as follows:

> V(friends)$name
[1] "Charlie" "Patty" "Lucy" "Snoopy" "Sally" "Linus"
[7] "Marcie" "Schroeder" "Woodstock" "Spike"
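
Because V(friends)$name is an ordinary character vector, it can be searched and subset like any other vector in R. The following sketch, which assumes a recent igraph release (graph_from_data_frame instead of graph.data.frame), combines the names with the degree function to see who has the most friends; the variable deg is introduced here for illustration.

```r
library(igraph)

# The same twelve friendships as in the edge list above.
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)
friends <- graph_from_data_frame(friends.data, directed = FALSE)

# Position of a node in the name vector.
which(V(friends)$name == "Snoopy")  # 4

# degree() counts the friendships of each node.
deg <- degree(friends)
deg["Charlie"]                      # 5
V(friends)$name[deg == max(deg)]    # "Charlie"
V(friends)$name[deg == 1]           # "Marcie" "Schroeder" "Woodstock" "Spike"
```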

This is the output of drawing the friendship network:

[Figure: the friendship network, drawn with the Kamada-Kawai layout and name labels]

With that, we should call it a day.

Wednesday, October 7, 2009

Day 1

Start by installing R. R is an open source platform for statistical computing. The R project is hosted at http://www.r-project.org. Download R for your platform. The current version is 2.9.2, but pretty much any version should work.

Pick up a cup of coffee while you are waiting for the file to download.

Install R by following the instructions. Usually this means little more than clicking on the disk image and running the installer.

Test the installation. You can think of R as a gigantic calculator. Start up R and run a few tests to see that your install worked:

> 2+2
[1] 4
> exp(-2)
[1] 0.1353353
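>

The > is the prompt, and the number in square brackets at the start of each line of the answer helps you find your way through the result: it is the index of the first value printed on that line. Results are often given as a vector, and even a single number, as in the examples above, is a vector of length one.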
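
A few more lines at the prompt make the role of these indices visible. This is a small sketch using only base R; the variable x is introduced here for illustration.

```r
# A single number is a vector of length one, hence the [1] in front of it.
length(2 + 2)        # 1

# Arithmetic works element by element on longer vectors.
x <- c(1, 2, 3) * 2  # 2 4 6
x[2]                 # 4, the second element

# A long vector wraps over several lines when printed; each line starts
# with the index of its first element in square brackets.
print(seq(10, 400, by = 10))
```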

Next, install igraph. igraph is an R package for social network analysis. Many packages have been written for R for all kinds of purposes. You can find most of them on CRAN, the R archive. Here is a Canadian CRAN mirror site: http://cran.stat.sfu.ca. Select the Package Installer from the Packages & Data menu, click Get List and enter "igraph" in the search box. Select igraph and hit Install Selected. The current version of igraph is 0.5.2-2.
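
If you prefer the command line to the menus, or your R GUI has no Package Installer, the same installation can be done at the prompt. This is a sketch: the mirror URL below is an assumption, and any CRAN mirror should do.

```r
# Install igraph only if it is not already present.
# The repos URL is an assumed mirror; substitute the one closest to you.
if (!("igraph" %in% rownames(installed.packages()))) {
  install.packages("igraph", repos = "https://cloud.r-project.org")
}
```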

If you were successful, you should see something like the following:

trying URL 'http://probability.ca/cran/bin/macosx/universal/contrib/2.9/igraph_0.5.2-2.tgz'
Content type 'application/x-gzip' length 2401199 bytes (2.3 Mb)
opened URL
==================================================
downloaded 2.3 Mb

The downloaded packages are in
/var/folders/A6/A6HuYMhtGsSKIXTAuzxo9U+++TI/-Tmp-//RtmpX7GvKg/downloaded_packages

To verify the installation, load the igraph library and search for the help page on an igraph function. Something like this:

> library(igraph)
> help(degree)
>
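
Beyond opening a help page, a quick way to confirm that the package actually works is to build a tiny graph and call the function you just looked up. The sketch below assumes a recent igraph release, where graph_from_literal is available; the three-node graph g is made up for the test.

```r
library(igraph)

# A three-node path: A - B - C.
g <- graph_from_literal(A - B, B - C)

# A and C each have one neighbour, B has two.
degree(g)
```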

Enough for today. Give yourself a pat on the back.