Wednesday, October 14, 2009

Day 2

Today, we will create a simple network, read it into R, and draw it using igraph. One way of representing a network is as an edge list. Edge lists are easy to create. Consider yourself and a few of your immediate friends. Then think of some of your friends' immediate friends. You can represent this in a file that contains one line for each friendship, with the two names separated by a space, like this:

Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
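If you would rather create the file without leaving R, here is a minimal sketch that writes the same edge list with base R's writeLines. The file name peanuts.txt matches the one read below. The file lands in R's current working directory, which getwd() reports.

```r
# Write the Peanuts edge list to a plain-text file: one friendship per line,
# the two names separated by a space (the format read.table expects).
edges <- c(
  "Charlie Lucy", "Charlie Patty", "Charlie Linus", "Charlie Sally",
  "Charlie Snoopy", "Patty Marcie", "Lucy Linus", "Lucy Schroeder",
  "Snoopy Woodstock", "Snoopy Spike", "Snoopy Patty", "Sally Linus"
)
writeLines(edges, "peanuts.txt")

# Sanity check: the file should hold one line per friendship.
length(readLines("peanuts.txt"))
```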

Save this list in a file, and start up R. In the Misc menu of the Mac R GUI, change the working directory to the directory that contains the file (on any platform you can also do this at the prompt with setwd()). Now, load the file into a data frame, which is R's standard structure for tabular data. Since the file has no header row (think of column names in Excel), we tell read.table not to expect one:

> friends.data <- read.table("peanuts.txt", header=FALSE)
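To see what read.table produced, here is a minimal sketch that reads the same data from an inline string and inspects the result. The text= argument is used only so the example runs without peanuts.txt. header = FALSE is the full name of the argument that turns off the header row. stringsAsFactors = FALSE keeps the names as plain strings and is an addition not in the command above.

```r
# Read the edge list from an inline string instead of a file.
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)

# With no header row, read.table names the columns V1 and V2.
str(friends.data)
nrow(friends.data)  # one row per friendship
```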

Next, create a graph from this edge list. Load the igraph library (see Day 1 for how to do this), then enter:

> friends <- graph.data.frame(friends.data, directed=F)

Note that we specify the graph as undirected. This means that all links between nodes of the graph are bidirectional: since Lucy is a friend of Charlie, Charlie is also a friend of Lucy. We could model directionality as well, but we won't for most of this course. To get a summary of the graph:
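What "undirected" means for the edge list can be shown with a small base R sketch: a line "A B" counts as a friendship in both directions, so a person's friends are found by looking in both columns. friends.of is a hypothetical helper written for illustration; it is not an igraph function.

```r
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)

# Hypothetical helper: treat each edge as undirected by collecting the
# partner from rows where `who` appears in either column.
friends.of <- function(edges, who) {
  sort(c(edges$V2[edges$V1 == who], edges$V1[edges$V2 == who]))
}

friends.of(friends.data, "Lucy")     # Charlie appears even though "Lucy Charlie" is never listed
friends.of(friends.data, "Charlie")
```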

> summary(friends)
Vertices: 10
Edges: 12
Directed: FALSE
No graph attributes.
Vertex attributes: name.
No edge attributes.
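The two counts in the summary can be cross-checked against the data frame with plain base R, as in this sketch: the vertices are the distinct names appearing in either column, and the edges are the rows.

```r
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)

# Distinct names across both columns = vertices; rows = edges.
n.vertices <- length(unique(c(friends.data$V1, friends.data$V2)))
n.edges <- nrow(friends.data)
c(vertices = n.vertices, edges = n.edges)
```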

Now that we have created the graph, we can draw it. In R you use the plot command to draw a graph. If you provide the graph as the only argument to plot, the result will be less than satisfying (try it!). Instead, specify a layout, an algorithm that positions the nodes according to criteria for a readable drawing. For example, the Kamada-Kawai layout:

> plot(friends, layout=layout.kamada.kawai)
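Under the hood, a layout is essentially a two-column matrix of (x, y) positions with one row per vertex, and plot places each node at its row's coordinates. The sketch below illustrates that idea with base R graphics only. It swaps in a much simpler layout than Kamada-Kawai: nodes evenly spaced on a circle. circle.layout is a hypothetical helper written for illustration, not igraph's own code.

```r
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)
nodes <- unique(c(friends.data$V1, friends.data$V2))

# Hypothetical helper: place n vertices evenly on the unit circle and
# return an n x 2 matrix of (x, y) rows named after the vertices.
circle.layout <- function(names) {
  angle <- 2 * pi * (seq_along(names) - 1) / length(names)
  coords <- cbind(x = cos(angle), y = sin(angle))
  rownames(coords) <- names
  coords
}

coords <- circle.layout(nodes)

# Draw an empty frame, one segment per edge, then the names on top.
plot(coords, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "",
     xlim = c(-1.3, 1.3), ylim = c(-1.3, 1.3))
segments(coords[friends.data$V1, "x"], coords[friends.data$V1, "y"],
         coords[friends.data$V2, "x"], coords[friends.data$V2, "y"])
text(coords, labels = rownames(coords))
```

A circle is easy to compute but ignores the graph's structure. Layouts like Kamada-Kawai instead try to place connected nodes near each other, which is why they usually read better.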

BTW, if you can't remember the names of the layouts, remember just one thing: help is only a few clicks away. Enter "layout" in the search box on the R GUI, or enter the following on the command line:

> help(layout)

One improvement we would like to make to the resulting diagram is to show the names of our friends instead of the numerical index of each node. We get a more readable diagram by adding the vertex.label parameter to the plot command like this:

> plot(friends, layout=layout.kamada.kawai, vertex.label=V(friends)$name)

This tells R to label each node with its name instead of its index. V(friends) returns the vertex sequence of the graph, and $name selects the name attribute of those vertices, so V(friends)$name is a character vector of node names. You can check this at the prompt:

> V(friends)$name
[1] "Charlie" "Patty" "Lucy" "Snoopy" "Sally" "Linus"
[7] "Marcie" "Schroeder" "Woodstock" "Spike"
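The order of these names can be reproduced from the data frame alone, as this base R sketch shows. It assumes that graph.data.frame numbers the vertices in order of first appearance, scanning the first column and then the second. The output above is consistent with that assumption.

```r
friends.data <- read.table(text = "
Charlie Lucy
Charlie Patty
Charlie Linus
Charlie Sally
Charlie Snoopy
Patty Marcie
Lucy Linus
Lucy Schroeder
Snoopy Woodstock
Snoopy Spike
Snoopy Patty
Sally Linus
", header = FALSE, stringsAsFactors = FALSE)

# Distinct names in order of first appearance: column V1 first, then V2.
vertex.names <- unique(c(friends.data$V1, friends.data$V2))
print(vertex.names)
```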

This is the output of drawing the friendship network:

[Figure: the friendship network drawn with layout.kamada.kawai, nodes labeled with names]

With that, we should call it a day.
