Network Analysis

In this section, we'll explore network analysis as a tool for understanding relationships between people and things.

What is network analysis?

Networks are everywhere. Once you start, it's hard to stop seeing them. The internet is a network, Facebook is a network, and Kevin Bacon is the central node in the Hollywood network. Studying the connections between people and things is a major activity in the humanities, so "network analysis" feels like a natural method for working with humanities data. Network analysis varies by discipline, but it originates in graph theory, a branch of mathematics. To simplify, the idea is that you can gauge how significant a person is in a network by how many connections they have to other people and by the weight or significance of those connections. You can also learn who might serve as a bridge between social groups, or whether you have a dense network of highly connected people or a more dispersed group. We can visualize these networks through a network graph: points connected by lines.

Network analysis isn't limited to people, but studying people is a common approach. The specific name for that methodology is Social Network Analysis (SNA). SNA is used in many disciplines, with a heavy presence in the social sciences. It's important to recognize that each discipline has its own conventions and expectations for SNA, so tread carefully until you have a firm foundation. As Scott Weingart reminds us in "Networks Demystified":

"Networks can be used on any project. Networks should be used on far fewer."

This is not to scare you! But learning about network analysis is the perfect moment to pause and remember that you are not working in a vacuum. The methods you use have a historical and modern context and you are responsible for learning that context.

That being said, network visualizations can be a great place to start. They can help you learn more about your data or illuminate a new direction for your research agenda. Plus, they're fun. Learning the basics will help you be a better judge of whether network analysis is, in fact, a good fit for your project.

Nodes, edges, ties, what?

Let's take a moment to define some of the terms you'll encounter in network analysis.

  • Nodes/points/agents/vertices - All of these words refer to the points on your network graph. If you're doing social network analysis, the points are the people in your graph.

  • Edges/ties - The connections between nodes, usually represented by lines between points. Edges can be weighted (some ties are stronger than others) or directional (some relationships are mutual, some are not). A line can be drawn thin or thick to show weight, or with an arrow to show direction. The connection between you and your roommate might have a strong weight, for example. If you follow a celebrity on Twitter but they don't follow you back, that is a relationship with a single direction.

  • Centrality - A measure of how important or well-positioned a node is within the network. The simplest version, degree centrality, counts connections: the more connections a node has, the more central it is. Other measures capture different things, such as betweenness (how often a node sits on the shortest paths between other nodes) or eigenvector centrality (a way to measure influence, in which connections to well-connected nodes count for more). Think about your friends. The person who seems to know everyone might not be the person who can spread information most effectively, or reach far-off parts of the network.

  • Attributes - Information about your nodes. If your network contains people, attributes might be their age, birth dates, affiliation, occupation, etc.

  • Bimodal or multi-modal - a network in which more than one type of thing is being connected. While a unimodal network might connect people to people, a bimodal network looks at authors and their books, or people to places. Scott Weingart has more to say about bimodal networks.
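To make the difference between "knows everyone" and "bridges groups" concrete, here is a minimal sketch in plain Python. The nine names and their friendships are invented for illustration: two tight groups of four friends, plus Ivy, who knows one person in each group. The betweenness function follows the standard shortest-path counting approach (Brandes' algorithm) for an unweighted, undirected network; treat it as a teaching sketch rather than a replacement for a network analysis package.

```python
from collections import deque
from itertools import combinations

# Hypothetical friendship network: two groups where everyone knows everyone,
# plus Ivy, who knows exactly one person in each group.
group_one = ["Ana", "Ben", "Cal", "Dee"]
group_two = ["Eve", "Fay", "Gus", "Hal"]
edges = list(combinations(group_one, 2)) + list(combinations(group_two, 2))
edges += [("Dee", "Ivy"), ("Ivy", "Eve")]

# Build an undirected adjacency map: node -> set of neighbors.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

# Degree centrality (unnormalized): how many connections each node has.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}


def betweenness_centrality(adj):
    """Count how many shortest paths between other nodes pass through each node."""
    scores = {node: 0.0 for node in adj}
    for start in adj:
        # Breadth-first search from start, tracking shortest-path counts.
        order = []
        predecessors = {node: [] for node in adj}
        path_count = {node: 0 for node in adj}
        distance = {node: -1 for node in adj}
        path_count[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbor in adj[current]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[current] + 1:
                    path_count[neighbor] += path_count[current]
                    predecessors[neighbor].append(current)
        # Walk back from the farthest nodes, passing credit to predecessors.
        dependency = {node: 0.0 for node in adj}
        while order:
            node = order.pop()
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != start:
                scores[node] += dependency[node]
    # Each pair is visited from both ends in an undirected network.
    return {node: score / 2 for node, score in scores.items()}


betweenness = betweenness_centrality(adjacency)

for node in sorted(adjacency):
    print(f"{node}: degree={degree[node]}, betweenness={betweenness[node]}")
```

In this made-up network, Ivy has the fewest connections but the highest betweenness, because every path between the two groups runs through her. Dee and Eve have the most connections and nearly as much betweenness, since they are Ivy's only contacts.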

Network data

To visualize your network, you need to set up your data in a particular way. The details might vary depending on the software you use, but the format below is a good place to start:

An edge list is a two-column spreadsheet that forms the network by listing the nodes and their connections. It's in a deceptively simple format:

Source                Target
Kevin Bacon           John Lithgow
Kevin Bacon           Sarah Jessica Parker
Sarah Jessica Parker  Matthew Broderick
Matthew Broderick     Jennifer Grey
Jennifer Grey         Patrick Swayze
Patrick Swayze        Demi Moore
Demi Moore            Kevin Bacon
Jennifer Grey         Laurence Fishburne
Laurence Fishburne    Keanu Reeves
Kevin Bacon           Laurence Fishburne

You'll notice that some names can appear more than once, and not necessarily in the same column. Each of these rows represents a single connection, in this case, a movie. Take a minute to try to draw this extremely limited 80s-90s movie network on paper. What do you expect to see? Who has the most connections? Who seems central and who seems like an outlier? How flawed is this network? Add to it if you can.
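Once you have drawn the network by hand, you can check your answer to "who has the most connections?" with a few lines of code. The sketch below is one possible approach in plain Python: it stores the edge list above as pairs and counts each person's connections, treating every row as a mutual, unweighted tie. The variable names are illustrative.

```python
from collections import defaultdict

# The edge list from the table above: one (source, target) pair per row.
edge_list = [
    ("Kevin Bacon", "John Lithgow"),
    ("Kevin Bacon", "Sarah Jessica Parker"),
    ("Sarah Jessica Parker", "Matthew Broderick"),
    ("Matthew Broderick", "Jennifer Grey"),
    ("Jennifer Grey", "Patrick Swayze"),
    ("Patrick Swayze", "Demi Moore"),
    ("Demi Moore", "Kevin Bacon"),
    ("Jennifer Grey", "Laurence Fishburne"),
    ("Laurence Fishburne", "Keanu Reeves"),
    ("Kevin Bacon", "Laurence Fishburne"),
]

# A name can appear in either column, so record the tie in both directions.
neighbors = defaultdict(set)
for source, target in edge_list:
    neighbors[source].add(target)
    neighbors[target].add(source)

# Degree: the number of distinct people each person is connected to.
degree = {name: len(linked) for name, linked in neighbors.items()}

# Print from most to least connected, breaking ties alphabetically.
for name, count in sorted(degree.items(), key=lambda item: (-item[1], item[0])):
    print(f"{name}: {count}")
```

Compare the printed ranking with your drawing. If they disagree, check whether you missed a name that appears in both columns.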

An edge list might build the network, but we need an attribute table to add context to our network. An attribute table lists each node only once, then displays information about that node in subsequent columns.

Name                  Gender  Is known for a dance scene
Kevin Bacon           M       Y
Matthew Broderick     M       Y
Laurence Fishburne    M       N
Jennifer Grey         F       Y
John Lithgow          M       N
Demi Moore            F       Y
Sarah Jessica Parker  F       N
Keanu Reeves          M       N
Patrick Swayze        M       Y

Now I can use this information to filter or slice the view of my network. I can also use it to refine my questions or my data set. It's easy for me to see that I have more men in my network than women.
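Filtering and slicing by attribute can be sketched in plain Python. The example below assumes the attribute table is stored as a dictionary keyed by name (the field names gender and dance_scene are made up for this sketch) and reuses the edge list from earlier in this section. It counts nodes by gender, then keeps only the connections where both people are known for a dance scene.

```python
from collections import Counter

# Attribute table: each node appears once, with its attributes as fields.
attributes = {
    "Kevin Bacon": {"gender": "M", "dance_scene": True},
    "Matthew Broderick": {"gender": "M", "dance_scene": True},
    "Laurence Fishburne": {"gender": "M", "dance_scene": False},
    "Jennifer Grey": {"gender": "F", "dance_scene": True},
    "John Lithgow": {"gender": "M", "dance_scene": False},
    "Demi Moore": {"gender": "F", "dance_scene": True},
    "Sarah Jessica Parker": {"gender": "F", "dance_scene": False},
    "Keanu Reeves": {"gender": "M", "dance_scene": False},
    "Patrick Swayze": {"gender": "M", "dance_scene": True},
}

# Edge list: one (source, target) pair per connection.
edge_list = [
    ("Kevin Bacon", "John Lithgow"),
    ("Kevin Bacon", "Sarah Jessica Parker"),
    ("Sarah Jessica Parker", "Matthew Broderick"),
    ("Matthew Broderick", "Jennifer Grey"),
    ("Jennifer Grey", "Patrick Swayze"),
    ("Patrick Swayze", "Demi Moore"),
    ("Demi Moore", "Kevin Bacon"),
    ("Jennifer Grey", "Laurence Fishburne"),
    ("Laurence Fishburne", "Keanu Reeves"),
    ("Kevin Bacon", "Laurence Fishburne"),
]

# Slice by attribute: how many nodes fall under each gender value?
gender_counts = Counter(info["gender"] for info in attributes.values())
print(gender_counts)

# Filter the network: keep a connection only if both people have a dance scene.
dancers = {name for name, info in attributes.items() if info["dance_scene"]}
dance_edges = [(s, t) for s, t in edge_list if s in dancers and t in dancers]
for source, target in dance_edges:
    print(f"{source} - {target}")
```

Keeping attributes in their own table means you can change the filter without touching the edge list.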

If you want to see this data as a visualization, try loading it into Palladio.

Copy and paste the following text into the white box in Palladio, then press Load:

Source,Target
Kevin Bacon,John Lithgow
Kevin Bacon,Sarah Jessica Parker
Sarah Jessica Parker,Matthew Broderick
Matthew Broderick,Jennifer Grey
Jennifer Grey,Patrick Swayze
Patrick Swayze,Demi Moore
Demi Moore,Kevin Bacon
Jennifer Grey,Laurence Fishburne
Laurence Fishburne,Keanu Reeves
Kevin Bacon,Laurence Fishburne

To generate a network, visit the Graph tab. Use the menu on the right of the screen to select the "source" column in Source and the "target" column in Target. You should see a network appear! Does it look like what you expected?
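If your edge list already lives in code, you can generate the comma-separated text instead of typing it. The following is a minimal sketch using Python's standard csv module. The helper name edge_list_to_csv is invented for this example. It trims stray spaces around names because, depending on the tool, a name with a leading space may be treated as a different node from the same name without one.

```python
import csv
import io

# Edge list as (source, target) pairs. Two rows deliberately carry stray
# spaces to show the clean-up step.
edge_list = [
    ("Kevin Bacon", "John Lithgow"),
    ("Kevin Bacon", "Sarah Jessica Parker"),
    ("Sarah Jessica Parker", "Matthew Broderick"),
    ("Matthew Broderick", "Jennifer Grey"),
    ("Jennifer Grey", "Patrick Swayze"),
    ("Patrick Swayze", "Demi Moore"),
    ("Demi Moore", "Kevin Bacon"),
    ("Jennifer Grey", " Laurence Fishburne"),
    ("Laurence Fishburne ", "Keanu Reeves"),
    ("Kevin Bacon", "Laurence Fishburne"),
]


def edge_list_to_csv(edges):
    """Return the edges as CSV text with a Source,Target header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for source, target in edges:
        # Trim whitespace so each person is spelled identically everywhere.
        writer.writerow([source.strip(), target.strip()])
    return buffer.getvalue()


csv_text = edge_list_to_csv(edge_list)
print(csv_text)
```

The printed text can be pasted into a tool that accepts comma-separated data. The csv module also adds quotation marks automatically if a name contains a comma.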

Tools

  • Palladio is a data visualization tool created by Stanford's Humanities + Design Research Lab. It's a browser-based tool that accepts structured data and creates network, geospatial, and gallery visualizations. It's easy to use, but can crash under too much data. It's a great way to create first draft visualizations of your data.

  • Gephi is a robust network/graph visualization tool. It runs on your computer and can accept large data sets (though you will need some patience). Gephi has a lot of options for changing the appearance of your network via filters and layouts. You can also generate statistics about your network, such as graph density. There are a number of tutorials available.

  • Cytoscape was designed for the sciences, but is now being used for more general network analysis and visualization purposes.

  • Nodegoat is a data modeling and visualization platform with options for network analysis.

  • igraph is a network analysis library with interfaces for R, Python, Mathematica, and C/C++.

  • UCINET is available on Windows only. It is not free software, but 90-day trials are available. Tutorials are available on the UCINET site. Hanneman and Riddle created this textbook for use with UCINET.

Activities

Activity 5.1

In order to practice using the terminology and methods of network analysis, let's design a network from scratch. I recommend using the whiteboard feature in Zoom, a Google doc, or a Box note for this exercise.

  1. In your group, select a topic for your network. It should be approachable for all members of your group. Game of Thrones? W&L students? Sports? A novel or TV show?
  2. List all the nodes in your network. Do they have types? Do they have attributes? Is this a bimodal or multi-modal network?
  3. Start making connections or edges in your network. What type of edges do you need? Do the edges have a weight?
  4. Think about centrality. Do you have an ego network? How might you start calculating centrality?
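An ego network is the network seen from one focal node (the "ego"): that node, the nodes directly connected to it (its "alters"), and any connections among them. As a rough sketch of how you might pull one out of an edge list, the example below uses the movie edge list from the Network data section as a stand-in for your group's network. The function name ego_network is made up for this illustration.

```python
# Stand-in data: the movie edge list from the Network data section.
edge_list = [
    ("Kevin Bacon", "John Lithgow"),
    ("Kevin Bacon", "Sarah Jessica Parker"),
    ("Sarah Jessica Parker", "Matthew Broderick"),
    ("Matthew Broderick", "Jennifer Grey"),
    ("Jennifer Grey", "Patrick Swayze"),
    ("Patrick Swayze", "Demi Moore"),
    ("Demi Moore", "Kevin Bacon"),
    ("Jennifer Grey", "Laurence Fishburne"),
    ("Laurence Fishburne", "Keanu Reeves"),
    ("Kevin Bacon", "Laurence Fishburne"),
]


def ego_network(edges, ego):
    """Return the ego's members (ego plus alters) and the edges among them."""
    # Alters are the nodes that share an edge with the ego, in either column.
    alters = set()
    for source, target in edges:
        if source == ego:
            alters.add(target)
        elif target == ego:
            alters.add(source)
    members = alters | {ego}
    # Keep only the edges whose endpoints are both inside the ego network.
    ego_edges = [(s, t) for s, t in edges if s in members and t in members]
    return members, ego_edges


members, ego_edges = ego_network(edge_list, "Kevin Bacon")
print(sorted(members))
print(ego_edges)
```

Counting the ego's alters gives you its degree, which is the simplest place to start when calculating centrality by hand.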

Activity 5.2

Let's continue practicing our skills by reviewing established projects. Instead of a blog post, be prepared to talk about this project with your classmates. Select one of the following projects to explore. Use the review criteria from Reviews in DH to analyze the project. Make some notes if you need to.

During class, we'll divide into groups to discuss the project you chose.

  1. What is this project about? What are the goals?
  2. Where is the data?
  3. How is this a humanities project?
  4. How effective are the network visualizations? Why do you say that?
  5. What about the design, layout, and organization of the project?

Activity 5.3

Let's practice putting together a network visualization from our trusty cemetery dataset.

  1. First, what are our questions? Who are the people in our data set? What networks should we try to explore?
  2. Next, we need an edge list. We know that this is a two-column spreadsheet, so what belongs in each column?
  3. Go ahead and put some data together. This is a big data set, so we'll have to start somewhere.
  4. Upload your data into Palladio and create your network.
  5. What worked? What didn't?
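One common way to get from a table of records to an edge list is to connect people who share a value in some column. The sketch below is only an illustration of that idea: the rows, the column names name and plot, and the placeholder values are all hypothetical and are not taken from the cemetery dataset. It builds both a bimodal edge list (person to plot) and a unimodal one (person to person, for people who share a plot).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: placeholder names and column names, not real data.
records = [
    {"name": "Person A", "plot": "Plot 1"},
    {"name": "Person B", "plot": "Plot 1"},
    {"name": "Person C", "plot": "Plot 1"},
    {"name": "Person D", "plot": "Plot 2"},
    {"name": "Person E", "plot": "Plot 2"},
    {"name": "Person F", "plot": "Plot 3"},
]

# Bimodal edge list: each row connects a person to the thing they share.
bimodal_edges = [(row["name"], row["plot"]) for row in records]

# Group people by the shared value.
people_by_plot = defaultdict(list)
for row in records:
    people_by_plot[row["plot"]].append(row["name"])

# Unimodal edge list: connect every pair of people within the same group.
person_edges = []
for plot, people in people_by_plot.items():
    for source, target in combinations(sorted(people), 2):
        person_edges.append((source, target))

print("Source,Target")
for source, target in person_edges:
    print(f"{source},{target}")
```

Notice that Person F shares a plot with no one, so they never appear in the person-to-person edge list. Deciding what to do with unconnected nodes like this is part of answering "what worked, and what didn't?"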

Resources

Readings
