In this R Markdown Notebook, I walk through how to construct the longitudinal network object for Austin in the 2010s, as used in The Civic Elite. This replication file covers the analysis of the longitudinal Austin network (1998-2016), but the steps can be generalized to other cities.
This file is very similar, but not identical, to the file for building the dataset in the 1990s. Most importantly, it does not contain the steps for attaching financial information, which is not used here.
This file relies heavily on tsna, a package for manipulating longitudinal network objects. The end result will be longitudinal network objects for Austin that can be easily analyzed to study the temporal evolution of board interlock networks.
This guide covers constructing “one-mode” interlock networks, as used in The Civic Elite. It is also possible to construct two-mode bipartite networks using tsna networkDynamic objects, but that functionality is not covered here.
The first thing we need to do is load in our packages and the edgelist constructed in the previous step.
rm(list=ls())
library("tidyverse") # Data Management
library("statnet") # SNA and related suites
library(tsna) # Tools for Temporal Social Network Analysis
#library(ndtv) # Network Dynamic Temporal Visualizations
library(DescTools)
########
#1. Load Data and Subset
URL <- "https://github.com/AFMessamore42/the-civic-elite/raw/main/data/texas_filers_boards.csv"
texas_filers <- read.csv(URL)
The first thing we are going to do is filter our Texas data down to Austin. In the paper, I look at all nonprofits located in Travis County, the county that contains the city. Next, we are going to construct a list of all unique organizations in this subset, based on EIN. This will be useful for constructing the network object: since the underlying data form a bipartite network of organizations and directors, we need the organizational information to construct the network properly.
Important to note: the networkDynamic objects that we are going to build will need unique IDs for each organization, so we are going to assign this ID at this step.
# This datafile should include all board members with attached data on that orgs year variables
officers <- subset(texas_filers, subset = FIPS == 48453) #Filtering texas to Austin
nodelist <- officers %>%
  dplyr::select(EIN, FISYR, TAXPAYER_NAME, NTEE1) %>%
  group_by(FISYR) %>%
  filter(!duplicated(EIN)) %>%
  arrange(EIN) %>% ungroup() %>% mutate(ID= as.numeric(as.factor(EIN)))
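To make the ID step concrete, here is a minimal sketch in base R of what `as.numeric(as.factor(EIN))` does. The EIN values and the `toy_nodes` name are invented for illustration and are not taken from the replication data.

```r
# Toy org-year table; the EIN values are invented for illustration.
toy_nodes <- data.frame(
  EIN   = c(742000001, 741000002, 742000001, 746000003),
  FISYR = c(2012, 2012, 2013, 2013)
)

# as.factor() sorts the distinct EINs into levels; as.numeric() returns each
# row's level position, so an EIN gets the same integer ID in every year.
toy_nodes$ID <- as.numeric(as.factor(toy_nodes$EIN))

print(toy_nodes)
```

Because the IDs run from 1 to the number of distinct EINs with no gaps, they can later serve directly as vertex ids.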
We now get to the hard part: taking our list of names and our list of organizations and turning them into an easily usable temporal network object for analysis.
The first thing we need to do is construct an edgelist (see Analyzing Social Networks for more on the definitions), which at its simplest is a two-column dataframe that pairs the nodes in our network that share a tie. This edgelist will turn the list of directors present at each organization into a dataframe that indicates when a tie occurs between two organizations.
Critically for longitudinal network analysis, this will be a temporal edgelist indicating not only that a tie occurred, but when the tie occurred, in order to create a temporal structure. Also, the edgelist will need unique IDs for EIN that match the IDs for organizations created in the last step.
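One more detail: based on conversations with Richard Benton, it seemed appropriate to drop unnecessary information from the network, since I hoped to study its evolution over time and wanted to reduce computation time. So, I filtered the list of directors down to only those that either serve at more than one organization (served_at > 1) or have exactly four records at the same organization (years_of_service == 4), as implemented by the filter in the code that follows.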
This information is not particularly relevant for this project. But any study of bipartite networks may need to think carefully about structures like these.
## Creating a clean edgelist object and removing orgs that were isolates
edgelist <- officers
edgelist <- edgelist %>% select(EIN, FISYR, cleaned_name) %>% #selecting edgelist items
  rename(BOARDMEMBER = cleaned_name)
head(edgelist)
edgelist <- edgelist %>% filter(!is.na(BOARDMEMBER)) #remove "NA" orgs that didnt have board members
#Creating a Counter for Years of service and interlock placement, and removing ineligible directors
edgelist <- edgelist %>% group_by(EIN, BOARDMEMBER) %>% mutate(years_of_service = n()) %>% ungroup()
edgelist <- edgelist %>%
  group_by(BOARDMEMBER) %>%
  mutate(served_at= length(unique(EIN))) %>% ungroup()
edgelist <- edgelist %>% filter(served_at > 1 | years_of_service ==4)
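As a small illustration of the two counters and the filter, the following base R sketch reproduces the same logic on invented rows. The director names, EINs, and the `toy` and `kept` objects are hypothetical, and `ave()` stands in for the grouped `mutate()` calls used above.

```r
# Toy director-year rows; names and EINs are invented for illustration.
toy <- data.frame(
  EIN         = c(101, 101, 101, 101, 102, 103, 104),
  FISYR       = c(2012, 2013, 2014, 2015, 2012, 2012, 2013),
  BOARDMEMBER = c("A SMITH", "A SMITH", "A SMITH", "A SMITH",
                  "B JONES", "B JONES", "C LEE")
)

# Number of rows for each organization-director pair.
pair_key <- paste(toy$EIN, toy$BOARDMEMBER)
toy$years_of_service <- ave(rep(1, nrow(toy)), pair_key, FUN = sum)

# Number of distinct organizations each director appears at.
toy$served_at <- ave(toy$EIN, toy$BOARDMEMBER,
                     FUN = function(x) length(unique(x)))

# Keep directors on more than one board, or with four records at one board.
kept <- toy[toy$served_at > 1 | toy$years_of_service == 4, ]
print(kept)
```

In this toy data the director who appears once at a single organization is dropped, while the interlocking director and the four-record director are kept.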
## Constructing the director to director edgelist
edgelist <- left_join(edgelist, edgelist, by=c("FISYR","BOARDMEMBER")) # exact name matches in the same year for edge list
edgelist <- edgelist %>% subset(EIN.x != EIN.y) %>% select(EIN.x,FISYR, EIN.y) #removing self edges
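The self-join can be hard to picture, so here is a minimal sketch of the same idea on invented data. It uses base R `merge()` in place of `left_join()`, and all names and EINs are hypothetical.

```r
# Toy director-year rows; all names and EINs are invented for illustration.
toy <- data.frame(
  EIN         = c(101, 102, 103, 101, 102),
  FISYR       = c(2012, 2012, 2012, 2013, 2013),
  BOARDMEMBER = c("A SMITH", "A SMITH", "B JONES", "A SMITH", "C LEE")
)

# Self-join on year and name: each matched pair of rows is one director
# sitting on two boards in the same fiscal year.
pairs <- merge(toy, toy, by = c("FISYR", "BOARDMEMBER"))

# Drop the rows where an organization is matched with itself.
pairs <- pairs[pairs$EIN.x != pairs$EIN.y, c("EIN.x", "FISYR", "EIN.y")]
print(pairs)
```

Each interlock appears twice at this stage, once in each direction, which is why a de-duplication step is needed for an undirected network.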
#Removing duplicate undirected edges
edgelist_clean <- edgelist %>%
  apply(1L, sort) %>% # sort dyads (the year ends up first, assuming it is smaller than every EIN)
  t() %>% # transpose resulting matrix to get the original shape back
  unique() %>% # get the unique rows
  as.data.frame() %>% # back to data frame
  setNames(c("FISYR", "EIN.x", "EIN.y"))
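The row-wise sort above depends on the year being numerically smaller than every EIN. As an alternative sketch, not the method used in the replication code, the two EIN columns can be ordered directly with `pmin()` and `pmax()`, which leaves the year column untouched. The data and object names below are invented.

```r
# Toy directed pairs; each undirected tie appears once in each direction.
toy_edges <- data.frame(
  EIN.x = c(101, 102, 101, 103),
  FISYR = c(2012, 2012, 2013, 2013),
  EIN.y = c(102, 101, 103, 101)
)

# Put the smaller EIN first in every dyad, independent of the year column.
dyads <- data.frame(
  FISYR = toy_edges$FISYR,
  EIN.x = pmin(toy_edges$EIN.x, toy_edges$EIN.y),
  EIN.y = pmax(toy_edges$EIN.x, toy_edges$EIN.y)
)

# Identical year-dyad rows now collapse into one.
toy_clean <- unique(dyads)
print(toy_clean)
```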
# Inducing the Temporal Structure Into the Network
edgelist <- edgelist_clean %>%
  rename(head =EIN.x, tail=EIN.y, onset=FISYR) %>%
  mutate(terminus = onset+1) %>%
  mutate(duration = 1) %>%
  mutate(edge.id = row_number())
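To illustrate what the onset and terminus columns mean, the sketch below treats each tie as a one-year spell that is active from its onset up to, but not including, its terminus, which is the interval convention networkDynamic uses. The `active_at()` helper and the toy data are hypothetical and exist only for this illustration.

```r
# Toy ties observed in single fiscal years (IDs invented for illustration).
toy_spells <- data.frame(head = c(1, 1), tail = c(2, 3), onset = c(2012, 2013))
toy_spells$terminus <- toy_spells$onset + 1

# Hypothetical helper: rows whose spell [onset, terminus) contains time t.
active_at <- function(spells, t) {
  spells[spells$onset <= t & t < spells$terminus, ]
}

print(active_at(toy_spells, 2012))
print(active_at(toy_spells, 2013))
```

A tie observed in fiscal year 2012 is therefore active at 2012 but no longer active at 2013, unless it is observed again in that year.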
# Creating unique IDS for each EIN
key <- nodelist %>% select(EIN, ID) %>% distinct(EIN, .keep_all=TRUE) #key for IDs from EIN
key1 <- key %>% rename(head=EIN, ID_head=ID)
key2 <- key %>% rename(tail=EIN, ID_tail=ID)
# Joining these IDS to the edgelist
edgelist <- left_join(edgelist, key1, by="head")
edgelist <- left_join(edgelist, key2, by="tail")
edgelist <- edgelist %>% select(-head, -tail) %>% rename(head=ID_head, tail=ID_tail)
edgelist
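After a key join like this, it can be worth confirming that every endpoint received an ID. The following sketch shows one way to check, using base R `match()` on invented data; the unmatched EIN 999 is included deliberately to show what a failed lookup looks like.

```r
# Toy key and edgelist; EIN 999 is deliberately absent from the key.
toy_key   <- data.frame(EIN = c(101, 102, 103), ID = 1:3)
toy_edges <- data.frame(head = c(101, 102), tail = c(102, 999),
                        onset = c(2012, 2013))

# Look up the integer ID for each endpoint; unmatched EINs become NA.
toy_edges$head_id <- toy_key$ID[match(toy_edges$head, toy_key$EIN)]
toy_edges$tail_id <- toy_key$ID[match(toy_edges$tail, toy_key$EIN)]

# Any rows listed here would produce edges with missing vertices.
unmatched <- toy_edges[is.na(toy_edges$head_id) | is.na(toy_edges$tail_id), ]
print(unmatched)
```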
Now, we can take our nodelist of organizations and our edgelist of interlocking directorates and construct a longitudinal network object. More information about networkDynamic objects can be found in the networkDynamic package documentation.
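The basic steps are the following: format the nodelist as vertex spells, format the edgelist as edge spells, initialize an empty base network, build the networkDynamic object from the two sets of spells, and then check the result and attach vertex attributes.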
#4. Construct Temporal Network
# Using Network Dynamic objects
#creating "spells" of presence within the network for vertices (this is where I would include stats)
#nodelist formatting
nodelist <- nodelist %>% select(-EIN) %>% rename(EIN=ID)
nodelist <- nodelist %>%
  rename(onset=FISYR) %>%
  mutate(terminus=onset+1) %>%
  rename(vertex.id = EIN) %>%
  mutate(vertex.id =as.integer(vertex.id)) %>%
  select(onset, terminus, vertex.id, TAXPAYER_NAME, NTEE1) %>% as.data.frame()
#edgelist formatting
edgelist <- edgelist %>%
  mutate(onset.censored=FALSE, terminus.censored = FALSE) %>%
  select(onset, terminus,tail, head,onset.censored,terminus.censored, duration, edge.id) %>% as.data.frame()
#specify initial base.net
net <- network.initialize(
  length(unique(nodelist$vertex.id)),
  directed = FALSE,
  hyper = FALSE,
  loops = FALSE,
  multiple = FALSE
)
#initialize network
dynamicNetwork <- networkDynamic(
  net,
  edge.spells = edgelist,
  vertex.spells = nodelist,
  directed=FALSE
)
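For readers new to networkDynamic, here is a minimal sketch of the same construction on a three-organization toy network. It assumes the networkDynamic package is installed; the spells are invented, and the `toy_` object names are hypothetical. The column order follows the layout the constructor expects: onset, terminus, vertex id for vertex spells, and onset, terminus, tail, head for edge spells.

```r
library(networkDynamic)  # also attaches the network package

# Invented vertex spells: orgs 1 and 2 file in 2012 and 2013, org 3 only in 2013.
toy_vertices <- data.frame(
  onset     = c(2012, 2012, 2013, 2013, 2013),
  terminus  = c(2013, 2013, 2014, 2014, 2014),
  vertex.id = c(1L, 2L, 1L, 2L, 3L)
)

# Invented edge spells: a 1-2 interlock in 2012 and a 1-3 interlock in 2013.
toy_edges <- data.frame(
  onset    = c(2012, 2013),
  terminus = c(2013, 2014),
  tail     = c(1L, 1L),
  head     = c(2L, 3L)
)

# Empty undirected base network with one vertex per organization.
toy_base <- network.initialize(3, directed = FALSE)

# Build the dynamic network from the two sets of spells.
toy_dyn <- networkDynamic(toy_base,
                          edge.spells   = toy_edges,
                          vertex.spells = toy_vertices)

# Count the organizations and ties that are active in a given year.
print(network.size.active(toy_dyn, at = 2012))
print(network.edgecount.active(toy_dyn, at = 2013))
```

Querying the object at a single time point like this is a quick way to confirm that the spells were read in as intended before running the full checks.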
# Checks
network.dynamic.check(dynamicNetwork) # All Good?
dynamic_output <- as.data.frame(dynamicNetwork) #Edgelist correct?
list.vertex.attributes(dynamicNetwork)
names <- nodelist %>% #for names
  distinct(vertex.id, .keep_all = TRUE) %>% select(TAXPAYER_NAME)
names <- names$TAXPAYER_NAME
ntee <- nodelist %>% #for NTEE codes
  distinct(vertex.id, .keep_all = TRUE) %>% select(NTEE1)
ntee <- ntee$NTEE1
set.vertex.attribute(dynamicNetwork, "vertex.names", names)
set.vertex.attribute(dynamicNetwork, "ntee", ntee)
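One detail worth illustrating is that `set.vertex.attribute()` assigns values by position, so the attribute vector needs to be in vertex id order. The sketch below shows this on a small static network, assuming the network package is installed; the NTEE letters and the `toy_net` name are invented.

```r
library(network)

# Three-vertex undirected toy network.
toy_net <- network.initialize(3, directed = FALSE)

# Values are assigned by position: the first value goes to vertex 1, and so on.
# The NTEE letters here are invented for illustration.
set.vertex.attribute(toy_net, "ntee", c("A", "B", "P"))

# Read the attribute back to confirm the assignment.
print(get.vertex.attribute(toy_net, "ntee"))
```

This is why the name and NTEE vectors above are built from the nodelist with one row per vertex.id: the nodelist was sorted by EIN before the IDs were assigned, so the rows come out in vertex id order.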
Now, you are done and the network can be saved!
save.image('Austin_2010s.RData')
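`save.image()` stores every object in the workspace. If only a single object is needed later, one alternative is `saveRDS()`, sketched here on a stand-in list rather than the real network. The file path and the `toy_object` name are hypothetical.

```r
# Stand-in object; in practice this would be the network you want to keep.
toy_object <- list(city = "Austin", years = 2012:2013)

# Write the single object to a temporary file and read it back.
path <- file.path(tempdir(), "toy_network.rds")
saveRDS(toy_object, path)
restored <- readRDS(path)

# The restored copy matches the original.
print(identical(toy_object, restored))
```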