Goals:

  1. Basics of Social Network Analysis.
  2. Basics of Gephi.

Software

Files & Scripts

Class

  • Basic explanations
  • Hands-on tutorial

SNA Concepts

  • Networks
    • Patterns of relationships that connect individuals, institutions, or objects (or leave them unconnected)
    • Examples: family ties—vertical and horizontal; patronage—who pays whom for what services; shared membership in organizations; paterns of contracts among companies.
  • Main elements
    • Node (vertex, point, item)—person, organization, country, document, etc.
    • Edge (tie, link, line)—relationship, friendship, common attendance, co-occurance, kinship, membership
      • Directed and Undirected links: marriage—undirected; patronage—directed.
    • Graphs (Sociograms) / Matrices (pl. of matrix)
  • Graph properties
    • Density: number of edges / number of edges in a complete graph (example of complete and incomplete graph)
    • Centrality measures
      • Degree: Proportional to the number of other nodes to which a node is links – Number of links divided by (n-1). For example: it may measure exposure to what is flowing through the network (opportunity to influence directly; in-degree and out-degree).
      • Closeness – The sum of shortest paths to all other points in the graph. Divide by (n-1), then invert. For example: it may measure how quickly information can travel.
      • Betweenness – The extent to which a particular point lies ‘between’ other points in the graph; how many shortest paths (geodesics) is it on? A measure of brokerage or gatekeeping (How often a node lies on the shortest path between two other nodes)
      • Eigenvector – A weighted measure of centrality that takes into account the centrality of other nodes to which a node is connected. That is, being connect with other central nodes increases centrality. E.g., secretary of powerful person. Google’s page rank algorithm is based on a variation of this approach.
    • Modularity—interconnected clusters.

Gephi Data Format (Basic)

Nodes file

A CSV file with as many columns as you might need, but one must be id. These ids must be used in your edges data file in fields source and target.

Often scholars use numbers as ids in the edges table, while providing all the robust relevant information on the edges in the nodes table. This allows to accommodate as must information on the nodes as one might need (for example, coordinates, if nodes are locations; of affiliations, if nodes are people). This also helps to make edges file significantly smaller, as the information on nodes is not repeated.

Edges file

A CSV file with three columns:

1) source 2) target 3) weight

Lots of tutorials for Gephi can be found here: https://gephi.wordpress.com/.

Let’s try “Star Wars”

Download files with nodes and edges from the top of the page. Let’s try to load them into Gephi.

Collecting SNA data from “Dispatch”

Essentially, you can use the same script that you used to collect toponymic data from articles; here, however, the lines of code that collected toponymic data from a given artice should be replaces with the lines of code that collect network data:

1) collect all relevant items; remove duplicates; 2) convert them into edges (see, function edges() below); 3) after all the edges are collected, you will need to count their frequencies; 4) the result should be a table (or, a csv/tsv file) with three columns: source, target, weight; 5) weight value can be counted in a number of different ways; by and large, it is up to you, as it depends on what specifically you want to measure.

Code snippets

NB: edgesDic = {} must be declared in the very beginning of the script.

# updating dictionary
def updateDic(dic, key):
    if key in dic:
        dic[key] += 1
    else:
        dic[key]  = 1

# creating edges from a list
import itertools
def edges(edgesList, edgesDic):
    edges = list(itertools.combinations(edgesList, 2))
    for e in edges:
        key = "\t".join(sorted(list(e))) # A > B (sorted alphabetically, to avoid cases of B > A)
        updateDic(edgesDic, key)

# collectign tagged toponyms into a network
def collectTaggedToponyms(xmlText, dic, date):
    xmlText = re.sub("\s+", " ", xmlText)

    topList = []

    for t in re.findall(r"<placeName[^<]+</placeName>", xmlText):
        t = t.lower()

        if re.search(r'"(tgn,\d+)', t):
            reg = re.search(r'"(tgn,\d+)', t).group(1)

            if reg in mapData:
                topList.append(reg)

    # generating edges and updating their freqs
    edges(topList, edgesDic)

Reference Materials:

Some interesting examples

Homework:

  1. Create a network of places in the “Dispatch”—if places are mentioned in the same article, they must be connected one way or the other—we can use that as the basis for creating edges. Generate a vizualisation in Gephi, only here you will need to create a geographical network, i.e., a network put over a real map: use Geo Layout plugin (your file with the data on nodes must also include coordinates). Describe your observations on your website as a separate blog post.