In this R Markdown Notebook, I walk through how to construct the longitudinal network object for Austin in the 1990s, as used in The Civic Elite. This replication file covers the analysis of the longitudinal Austin network (1998-2016), but the steps generalize to other cities.
This file is very similar, but not identical, to the file for building the 2010s dataset.
This file relies heavily on tsna, a package for manipulating longitudinal network objects. The end result will be longitudinal network objects for Austin that can be easily analyzed to study the temporal evolution of board interlock networks.
This guide covers constructing “one-mode” interlock networks, as used in The Civic Elite. It is also possible to construct two-mode bipartite networks with networkDynamic objects, but that functionality is not covered here.
The first thing we need to do is load in our packages and the edgelist constructed in the previous step.
rm(list=ls())
library("tidyverse") # Data Management
library("statnet") # SNA and related suites
library(tsna) # Tools for Temporal Social Network Analysis
#library(ndtv) # Network Dynamic Temporal Visualizations
library(DescTools)
########
#1. Load Data and Subset
URL <- "https://media.githubusercontent.com/media/AFMessamore42/the-civic-elite/main/data/filers_boards_1990s.csv"
texas_filers <- read.csv(URL)
First, we filter the Texas data down to Austin. In the paper, I look at all nonprofits located in Travis County, the county enclosing the city. Next, we construct a list of all unique organizations in the filtered data, based on EIN. This will be useful for constructing the network object: since the underlying data form a bipartite network of organizations and directors, we need the organizational information to construct the network properly.
Important to note: the networkDynamic objects that we are going to build will need unique IDs for each organization, so we are going to assign this ID at this step.
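As a small illustration of the ID step, here is a sketch on made-up EINs (the `toy` data frame is hypothetical, not part of the replication data). Converting EIN to a factor and then to a number gives every organization the same integer in every year it appears.

```r
# Made-up EINs for illustration; each organization can appear in several years
toy <- data.frame(
  EIN   = c(742000001, 741000002, 742000001, 746000003),
  FISYR = c(1998, 1998, 1999, 1999)
)

# as.factor() sorts the distinct EINs; as.numeric() returns each EIN's position
# in that sorted list, so IDs run from 1 to the number of organizations
toy$ID <- as.numeric(as.factor(toy$EIN))
toy
```

Because the ID depends only on the EIN, the same organization keeps its ID across years, which is what the vertex and edge spells later rely on.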
# This datafile should include all board members with attached data on that orgs year variables
officers <- subset(texas_filers, subset = FIPS == 48453) # Filter Texas to Austin (Travis County)

nodelist <- officers %>%
  select(EIN, FISYR, NAME.x, NTEE1) %>%
  group_by(FISYR) %>%
  filter(!duplicated(EIN)) %>% # keep one row per organization-year
  arrange(EIN) %>% ungroup() %>% mutate(ID = as.numeric(as.factor(EIN)))
I used the IRS business master file to obtain additional information about the nonprofits. I did not use the following information in the manuscript, ultimately, but it is useful to link other organizational level covariates to learn about your case. This information can also be used in future projects.
#founding date from BMF
githubURL <- "https://github.com/AFMessamore42/the-civic-elite/raw/main/data/BMF_Loaded.RData"
load(url(githubURL))
age <- bmfofficers %>% select(EIN, RULEDATE)
age$RULEDATE[age$RULEDATE == 0] <- NA # 0 means no ruling date recorded
age$RULEYR <- substr(age$RULEDATE, 1, 4) # first four characters are the year
age$RULEYR <- as.numeric(age$RULEYR)
age <- age %>% select(EIN, RULEYR)
rm(bmfofficers)
#financial records from revexp
# -total assets
# -total liability
# -total contributions
# -total revenue
# -total expenses
# -fundraising expenses
covariates <- officers %>%
  select(EIN, FISYR, P4E_ASST, P4E_LIAB, P1TCONT, P1TOTREV, P1TOTEXP, P1FREXP, P1DIRSUP) %>% # rename for clarity
  rename(eoy_assets = P4E_ASST, liabilities = P4E_LIAB, contributions = P1TCONT,
         tot_rev = P1TOTREV, tot_exp = P1TOTEXP, fr_exp = P1FREXP, dir_sup = P1DIRSUP) %>% # create variables that will be used
  # Winsorize at the 1st and 99th percentiles. Depending on the installed
  # DescTools version, Winsorize() may expect val = ... instead of probs = ...
  mutate_at(c("eoy_assets", "liabilities", "contributions", "tot_rev",
              "tot_exp", "fr_exp", "dir_sup"), Winsorize, probs = c(0.01, 0.99)) %>%
  mutate(res_vuln = liabilities / eoy_assets, res_depn = contributions / tot_rev,
         fin_perform = tot_rev / tot_exp, pub_support = contributions / tot_rev,
         fund_effic = dir_sup / fr_exp, net_rev = tot_rev - tot_exp) %>%
  group_by(FISYR) %>%
  filter(!duplicated(EIN)) %>% ungroup() # eliminate "boardmember-year" records
#fix weird values Constructed from the above.
covariates$fund_effic[covariates$dir_sup > 0 & covariates$fr_exp == 0] <- 1
covariates$fund_effic[covariates$dir_sup == 0 & covariates$fr_exp == 0] <- 0

covariates$res_vuln[covariates$liabilities > 0 & covariates$eoy_assets == 0] <- 1
covariates$res_vuln[covariates$liabilities == 0 & covariates$eoy_assets == 0] <- 0
# -board size
board_size <- officers %>%
  group_by(EIN, FISYR) %>%
  filter(!is.na(cleaned_name)) %>% # remember not to misattribute if no board
  mutate(board_size = n()) %>%
  ungroup() %>%
  group_by(FISYR) %>%
  filter(!duplicated(EIN)) %>% ungroup() %>%
  select(board_size, EIN, FISYR)
board_size$board_size[is.na(board_size$board_size)] <- 0 # NA means no board reported
#concatenate
full_covariates <- left_join(covariates, board_size, by = c("EIN", "FISYR")) # NA means no board reported
full_covariates <- left_join(full_covariates, age, by = "EIN")
full_covariates <- full_covariates %>% # generate age
  mutate(age = FISYR - RULEYR)
#add time-varying covariates to our nodelist of organizations
nodelist <- left_join(nodelist, full_covariates, by = c("EIN", "FISYR"))
We now get to the hard part: taking our list of names and our list of organizations with financial data and turning them into an easily usable temporal network object for analysis.
The first thing we need to do is construct an edgelist (see Analyzing Social Networks for more on the definitions), which at its simplest is a two-column data frame that pairs nodes in the network and indicates that they share a tie. This edgelist turns the list of directors present at each organization into a data frame that indicates when a tie occurs between two organizations.
Critically for longitudinal network analysis, this will be a temporal edgelist indicating not only that a tie occurred, but when the tie occurred, in order to create a temporal structure. Also, the edgelist will need unique IDs for EIN that match the IDs for organizations created in the last step.
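To make the idea concrete, here is a minimal sketch on invented board records (the `seats` table and its names are hypothetical). It uses base R `merge()` and keeps only rows where `EIN.x < EIN.y`. That is a shortcut that drops self-pairs and mirrored duplicates in one step; it is not the join-then-sort approach used later in this file.

```r
# Invented director-year records: one row per board seat
seats <- data.frame(
  EIN         = c(1, 2, 2, 3, 3),
  FISYR       = c(1998, 1998, 1998, 1998, 1999),
  BOARDMEMBER = c("A SMITH", "A SMITH", "B JONES", "B JONES", "A SMITH")
)

# Self-join on year and name: every pair of seats held by the same person
# in the same year becomes one row with columns EIN.x and EIN.y
pairs <- merge(seats, seats, by = c("FISYR", "BOARDMEMBER"))

# Keep each undirected organization pair once and drop self-pairs
pairs <- pairs[pairs$EIN.x < pairs$EIN.y, c("FISYR", "EIN.x", "EIN.y")]
pairs <- unique(pairs)
pairs
```

In this toy data, A SMITH ties organizations 1 and 2 in 1998 and B JONES ties organizations 2 and 3 in 1998. A SMITH's lone 1999 seat creates no tie.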
One more detail: based on conversations with Richard Benton, it seemed appropriate to drop unnecessary information from the network, since I wanted to study its evolution over time and keep computation time down. So, I filtered the list of directors down to only those who either served on the board of more than one organization or appear in four annual filings for the same organization (served_at > 1 or years_of_service == 4 in the code below).
This information is not particularly relevant for this project. But any study of bipartite networks may need to think carefully about structures like these.
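The two retention conditions applied in the code below (served_at > 1 or years_of_service == 4) can be tried out on a toy table first. This is only a sketch: the `seats` data are invented, and I use base R `ave()` here rather than the grouped `mutate()` calls in the actual pipeline.

```r
# Invented board seats: A serves four years at org 1, B sits on two boards,
# C appears once at a single organization
seats <- data.frame(
  EIN         = c(1, 1, 1, 1, 2, 2, 3),
  FISYR       = c(1998, 1999, 2000, 2001, 1998, 1998, 1999),
  BOARDMEMBER = c("A", "A", "A", "A", "B", "C", "B")
)

# Number of filings in which each person appears at each organization
seats$years_of_service <- ave(seats$FISYR, seats$EIN, seats$BOARDMEMBER,
                              FUN = length)

# Number of distinct organizations each person served at
seats$served_at <- ave(seats$EIN, seats$BOARDMEMBER,
                       FUN = function(x) length(unique(x)))

# Keep interlockers and four-year directors; everyone else is dropped
kept <- seats[seats$served_at > 1 | seats$years_of_service == 4, ]
kept
```

Director C is removed because C neither interlocks nor appears in four filings, while A and B are retained.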
## Creating a clean edgelist object and removing orgs that were isolates
edgelist <- officers
edgelist <- edgelist %>% select(EIN, FISYR, cleaned_name) %>% # selecting edgelist items
  rename(BOARDMEMBER = cleaned_name)
head(edgelist)
edgelist <- edgelist %>% filter(!is.na(BOARDMEMBER)) # remove "NA" orgs that didn't have board members
#Creating a counter for years of service and interlock placement, and removing ineligible directors
edgelist <- edgelist %>% group_by(EIN, BOARDMEMBER) %>% mutate(years_of_service = n()) %>% ungroup()
edgelist <- edgelist %>%
  group_by(BOARDMEMBER) %>%
  mutate(served_at = n_distinct(EIN)) %>% ungroup() # number of distinct organizations per director
edgelist <- edgelist %>% filter(served_at > 1 | years_of_service == 4)
## Constructing the organization-to-organization edgelist from shared directors
edgelist <- left_join(edgelist, edgelist, by = c("FISYR", "BOARDMEMBER")) # exact name matches in the same year for edge list
edgelist <- edgelist %>% subset(EIN.x != EIN.y) %>% select(EIN.x, FISYR, EIN.y) # removing self edges
#Removing duplicate undirected edges
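# Note: sort() orders all three values in each row, so this assumes EINs are
# numeric and larger than any fiscal year (the year then always comes first)
edgelist_clean <- edgelist %>%
  apply(1L, sort) %>% # sort dyads
  t() %>% # transpose resulting matrix to get the original shape back
  unique() %>% # get the unique rows
  as.data.frame() %>% # back to data frame
  setNames(c("FISYR", "EIN.x", "EIN.y"))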
# Inducing the Temporal Structure Into the Network
edgelist <- edgelist_clean %>%
  rename(head = EIN.x, tail = EIN.y, onset = FISYR) %>%
  mutate(terminus = onset + 1) %>% # each tie lasts one fiscal year
  mutate(duration = 1) %>%
  mutate(edge.id = row_number())
# Creating unique IDS for each EIN
key <- nodelist %>% select(EIN, ID) %>% distinct(EIN, .keep_all = TRUE) # key for IDs from EIN
key1 <- key %>% rename(head = EIN, ID_head = ID)
key2 <- key %>% rename(tail = EIN, ID_tail = ID)
# Joining these IDS to the edgelist
edgelist <- left_join(edgelist, key1, by = "head")
edgelist <- left_join(edgelist, key2, by = "tail")
edgelist <- edgelist %>% select(-head, -tail) %>% rename(head = ID_head, tail = ID_tail)

edgelist
Now, we can take our nodelist of organizations and our edgelist of interlocking directorates and construct a longitudinal network object. More information about networkDynamic objects can be found in the networkDynamic package documentation.
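The basic steps are the following: format the nodelist as vertex spells, format the edgelist as edge spells, initialize an empty base network, build the networkDynamic object from these pieces, and then check the result and attach the static vertex attributes.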
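Before running this on the full Austin data, it can help to see the same construction on a tiny example. The sketch below assumes the networkDynamic package is installed and uses three invented organizations. All object names (`vertex_spells`, `edge_spells`, `toy_dyn`) are hypothetical.

```r
library(networkDynamic) # also loads the network package

# One vertex spell per organization-year, active from onset up to (not including) terminus
vertex_spells <- expand.grid(onset = c(1998, 1999), vertex.id = 1:3)
vertex_spells$terminus <- vertex_spells$onset + 1
vertex_spells <- vertex_spells[, c("onset", "terminus", "vertex.id")]

# Two one-year ties: orgs 1-2 interlock in 1998, orgs 2-3 interlock in 1999
edge_spells <- data.frame(
  onset    = c(1998, 1999),
  terminus = c(1999, 2000),
  tail     = c(1, 2),
  head     = c(2, 3)
)

# Empty undirected base network with one vertex per organization
base_net <- network.initialize(3, directed = FALSE)

toy_dyn <- networkDynamic(
  base.net      = base_net,
  edge.spells   = edge_spells,
  vertex.spells = vertex_spells,
  verbose       = FALSE
)

# Cross-sections of the network at two points in time
net_1998 <- network.extract(toy_dyn, at = 1998)
net_1999 <- network.extract(toy_dyn, at = 1999)
network.edgecount(net_1998)
network.edgecount(net_1999)
```

The column order matters here: networkDynamic reads edge spells as onset, terminus, tail, head and vertex spells as onset, terminus, vertex.id, which is why the formatting code below reorders columns with select() before building the object.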
#4. Construct Temporal Network
# Using Network Dynamic objects
#creating "spells" of presence within the network for vertices (this is where I would include stats)
#nodelist formatting
nodelist <- nodelist %>% select(-EIN) %>% rename(EIN = ID)
nodelist <- nodelist %>%
  rename(onset = FISYR, TAXPAYER_NAME = NAME.x) %>%
  mutate(terminus = onset + 1) %>%
  rename(vertex.id = EIN) %>%
  mutate(vertex.id = as.integer(vertex.id)) %>%
  select(onset, terminus, vertex.id, TAXPAYER_NAME, NTEE1,
         age, board_size, res_vuln, res_depn, fin_perform,
         pub_support, fund_effic, net_rev) %>%
  as.data.frame()
#edgelist formatting
edgelist <- edgelist %>%
  mutate(onset.censored = FALSE, terminus.censored = FALSE) %>%
  select(onset, terminus, tail, head, onset.censored, terminus.censored, duration, edge.id) %>% as.data.frame()
#specify initial base.net
net <- network.initialize(
  length(unique(nodelist$vertex.id)),
  directed = FALSE,
  hyper = FALSE,
  loops = FALSE,
  multiple = FALSE
)
#initialize network
dynamicNetwork <- networkDynamic(
  base.net = net,
  edge.spells = edgelist,
  vertex.spells = nodelist,
  directed = FALSE,
  create.TEAs = TRUE # store the extra spell columns as time-varying attributes
)
# Checks
network.dynamic.check(dynamicNetwork) # All Good?
dynamic_output <- as.data.frame(dynamicNetwork) # Edgelist correct?
list.vertex.attributes(dynamicNetwork)

names <- nodelist %>% # for names
  arrange(vertex.id) %>% # attribute values must follow vertex order
  distinct(vertex.id, .keep_all = TRUE) %>% select(TAXPAYER_NAME)
names <- names$TAXPAYER_NAME

ntee <- nodelist %>% # for NTEE codes
  arrange(vertex.id) %>%
  distinct(vertex.id, .keep_all = TRUE) %>% select(NTEE1)
ntee <- ntee$NTEE1

set.vertex.attribute(dynamicNetwork, "vertex.names", names)
set.vertex.attribute(dynamicNetwork, "ntee", ntee)
Now, you are done and the network can be saved!
save.image("Austin_1990s.RData")