Introduction

In this R Markdown Notebook, I go through the steps of how to construct the longitudinal network object for Austin in the 1990s, used in The Civic Elite. This replication file covers the analysis of the longitudinal Austin network (1998-2016) but the steps can be generalized to other cities.

This file is very similar – but not identical– to the file for building the dataset in the 2010s

This file relies heavily on tsna, a package for manipulating longitudinal network objects. The end result will be longitudinal network objects for Austin that can be easily analyzed to study the temporal evolution of board interlock networks.

This guide covers constructing “one-mode” interlock networks, as is utilized in The Civic Elite. However, it is important to note that it is also possible to construct two-mode bipartite networks using tsna networkDynamic objects. This functionality is not covered here.

The first thing we need to do is load in our packages and the edgelist constructed in the previous step.

rm(list=ls())
library("tidyverse") # Data Management
library("statnet") # SNA and related suites
library(tsna) # Tools for Temporal Social Network Analysis
#library(ndtv) # Network Dynamic Temporal Visualizations
library(DescTools)
########

#1. Load Data and Subset
URL <- "https://media.githubusercontent.com/media/AFMessamore42/the-civic-elite/main/data/filers_boards_1990s.csv"
texas_filers <- read.csv( URL )

First Steps: Getting to Austin and Generating a List of Organizations

The first thing we are going to do is filter our Texas data down to Austin. In the paper, I choose to look at all nonprofits enclosed in the county (Travis) enclosing the city. Next, we are going to construct a list of all unique organizations in Texas, based on EIN. This will be useful for constructing the network object: since this is a bipartite network, of organizations and directors, we need to have the organizational information to construct the network properly

Important to note: the networkDynamic objects that we are going to build will need unique IDs for each organization, so we are going to assign this ID at this step.

# This datafile should include all board members with attached data on that orgs year variables

officers <- subset(texas_filers, subset = FIPS == 48453) #Filtering texas to Austin

nodelist <- officers %>% 
  select(EIN, FISYR, NAME.x, NTEE1) %>%
  group_by(FISYR) %>% 
  filter(!duplicated(EIN)) %>%
  arrange(EIN) %>% ungroup() %>%  mutate(ID= as.numeric(as.factor(EIN)))

Financial Covariates

I used the IRS business master file to obtain additional information about the nonprofits. I did not use the following information in the manuscript, ultimately, but it is useful to link other organizational level covariates to learn about your case. This information can also be used in future projects.

  • year of founding
  • total assets
  • total liability
  • total contributions
  • total revenue
  • total expenses
  • fundraising expenses
  • board size
#founding date from BMF
githubURL <- "https://github.com/AFMessamore42/the-civic-elite/raw/main/data/BMF_Loaded.RData"
load(url(githubURL))

age <- bmfofficers %>% select(EIN, RULEDATE)
age$RULEDATE[age$RULEDATE ==0] <- NA
age$RULEYR <- substr(age$RULEDATE, 0, 4)
age$RULEYR  <- as.numeric(age$RULEYR )
age <- age %>% select(EIN, RULEYR)
rm(bmfofficers)

#financial records from revexp
# -total assets
# -total liability
# -total contributions
# -total revenue
# -total expenses
# -fundraising expenses

covariates <- officers %>% 
  select(EIN, FISYR, P4E_ASST, P4E_LIAB, P1TCONT, P1TOTREV, P1TOTEXP,P1FREXP, P1DIRSUP) %>% #rename clear
  rename(eoy_assets=P4E_ASST, liabilities=P4E_LIAB,contributions=P1TCONT,
         tot_rev=P1TOTREV,tot_exp=P1TOTEXP, fr_exp =P1FREXP, dir_sup = P1DIRSUP) %>%# create variables that will be used
  mutate_at(c("eoy_assets","liabilities","contributions","tot_rev",
              "tot_rev", "tot_exp","fr_exp","dir_sup"),Winsorize, probs = c(0.01, 0.99)) %>%
  mutate(res_vuln = liabilities/(eoy_assets), res_depn= contributions/(tot_rev),
         fin_perform =tot_rev/(tot_exp), pub_support= contributions/(tot_rev),
         fund_effic=dir_sup/(fr_exp), net_rev=tot_rev-tot_exp) %>% 
  group_by(FISYR) %>% 
  filter(!duplicated(EIN)) %>% ungroup() #eliminate "boardmember-year" records

#fix weird values Constructed from the above.
covariates$fund_effic[covariates$dir_sup>0 & covariates$fr_exp==0] <- 1
covariates$fund_effic[covariates$dir_sup==0 & covariates$fr_exp==0] <-0

covariates$res_vuln[covariates$liabilities>0 & covariates$eoy_assets==0] <- 1
covariates$res_vuln[covariates$liabilities==0 & covariates$eoy_assets==0] <-0


# -board size
board_size <- officers %>% 
  group_by(EIN, FISYR) %>%
  filter(!is.na(cleaned_name)) %>%  #remember not to misattribute if no board
  mutate(board_size=n()) %>% 
  ungroup() %>% 
  group_by(FISYR) %>% 
  filter(!duplicated(EIN)) %>% ungroup() %>% 
  select(board_size, EIN, FISYR)

board_size$board_size[is.na(board_size$board_size)] <- 0 #NA means no board reported

#concatenate 
full_covariates <- left_join(covariates, board_size) #NA means no board reported
full_covariates <- left_join(full_covariates, age, by="EIN")
full_covariates <- full_covariates %>%  #generate age
  mutate(age=FISYR-RULEYR)

#add time-varying covariates to our nodelist of organizations
nodelist <- left_join(nodelist, full_covariates, by=c("EIN", "FISYR"))

Constructing Temporal Network Objects

We now get into the hard part of taking our list of names and list of organizations with financial data, and turn it into an easily usable temporal network object for analysis.

The Temporal Edgelist

The first thing we need to do is to construct an edgelist (see Analyzing Social Networks for more info on the definitions)-which is at its simplest a two column dataframe that matches the nodes in our networks and indicates they will have a tie. This edgelist will turn the list of names of directors present at each organization into a dataframe that indicates when a tie occurs between two organizations.

Critically for longitudinal network analysis, this will be a temporal edgelist indicating not only that a tie occurred, but when the tie occurred, in order to create a temporal structure. Also, the edgelist will need unique IDs for EIN that match the IDs for organizations created in the last step.

One more detail: based on conversations with Richard Benton, it seemed appropriate to reduce unnecessary information in the network in the case that I hoped to study the evolution of the network over time and wanted to reduce computation time. So, I filtered the list of directors down only to those that either:

  1. Served on multiple boards
  2. Served at least four years on the board

This information is not particularly relevant for this project. But any study of bipartite networks may need to think carefully about structures like these.

## Creating a clean edgelist object and removing orgs that were isolates
edgelist <- officers
edgelist <- edgelist %>% select(EIN, FISYR, cleaned_name) %>%  #selecting edgelist items
  rename(BOARDMEMBER = cleaned_name)
head(edgelist)
edgelist <- edgelist %>% filter(!is.na(BOARDMEMBER)) #remove "NA" orgs that didnt have board members

#Creating a Counter for Years of service and interlock placement, and removing inelligble directors
edgelist <- edgelist %>% group_by(EIN, BOARDMEMBER) %>% mutate(years_of_service = n()) %>% ungroup()
edgelist <- edgelist %>% 
  group_by(BOARDMEMBER) %>% 
  mutate(served_at= length(unique(EIN))) %>% ungroup()
edgelist <- edgelist %>% filter(served_at > 1 | years_of_service ==4) 

## Constructing the director to director edgelist
edgelist <- left_join(edgelist, edgelist, by=c("FISYR","BOARDMEMBER")) # exact name matches in the same year for edge list
edgelist <- edgelist %>% subset(EIN.x != EIN.y) %>% select(EIN.x,FISYR, EIN.y)  #removing self edges

#Removing duplicate undirected edges
edgelist_clean <- edgelist %>% 
  apply(1L, sort) %>%              # sort dyads
  t() %>%                          # transpose resulting matrix to get the original shape back
  unique() %>%                     # get the unique rows
  as.data.frame() %>%              # back to data frame
  setNames(c("FISYR", "EIN.x", "EIN.y"))

# Inducing the Temporal Structure Into the Network 
edgelist <- edgelist_clean %>%
  rename(head =EIN.x, tail=EIN.y, onset=FISYR) %>%
  mutate(terminus = onset+1) %>% 
  mutate(duration = 1) %>% 
  mutate(edge.id = row_number())

# Creating unique IDS for each EIN
key <-nodelist %>% select(EIN, ID) %>% distinct(EIN, .keep_all=TRUE) #key for IDs from EIN
key1 <- key %>% rename(head=EIN, ID_head=ID)
key2 <- key %>% rename(tail=EIN, ID_tail=ID)

# Joining these IDS to the edgelist 
edgelist <- left_join(edgelist, key1, by="head")
edgelist <- left_join(edgelist, key2, by="tail")
edgelist <- edgelist %>% select(-head, -tail) %>% rename(head=ID_head, tail=ID_tail)

edgelist

Bringing it all together: The Longitudinal Network Object

Now, we can take our nodelist of organizations and our edgelist of interlocking directorates and construct a longitudinal network object. More information about networkDynamic objects can be found here.

The basic steps are the following:

  1. Format the nodelist
  2. Format the edgelist (also indicating that in this case no information about time was censored)
  3. Initialize a network object with as many nodes as there are organizations
  4. Join the nodelist (with time varying covariates) and edgelist to this empty network object to create the networkDynamic object
  5. Check the networkDynamic object to see it looks good
  6. Add any time invairiant attributes to the organizations in the network at the end: name of organization and ntee code
#4. Construct Temporal Network 
# Using Network Dynamic objects

#creating "spells" of presence within the network for verticies (this is where I would include stats)
#nodelist formatting 
nodelist <-  nodelist %>% select(-EIN) %>% rename(EIN=ID)
nodelist <- nodelist %>% 
  rename(onset=FISYR, TAXPAYER_NAME = NAME.x) %>% 
  mutate(terminus=onset+1) %>% 
  rename(vertex.id = EIN) %>% 
  mutate(vertex.id =as.integer(vertex.id)) %>% 
  select(onset, terminus, vertex.id, TAXPAYER_NAME, NTEE1,
         age, board_size,res_vuln, res_depn,fin_perform, 
         pub_support, fund_effic, net_rev) %>% 
  as.data.frame()

#edgelist formatting 
edgelist <- edgelist %>% 
  mutate(onset.censored=FALSE, terminus.censored = FALSE) %>% 
  select(onset, terminus,tail, head,onset.censored,terminus.censored, duration, edge.id) %>% as.data.frame()


#specify initial base.net
net <- network.initialize(
  length(unique(nodelist$vertex.id)),
  directed = FALSE,
  hyper = FALSE,
  loops = FALSE,
  multiple = FALSE
)

#initialize network
dynamicNetwork <- networkDynamic(
  net,
  edge.spells = edgelist,
  vertex.spells = nodelist, 
  directed=FALSE,
  create.TEAs=TRUE,
)

# Checks
network.dynamic.check(dynamicNetwork) # All Good?
dynamic_output <- as.data.frame(dynamicNetwork) #Edgelist correct?



list.vertex.attributes(dynamicNetwork)

names <- nodelist %>% #for names
  distinct(vertex.id, .keep_all = TRUE) %>% select(TAXPAYER_NAME)
names <- names$TAXPAYER_NAME

ntee <- nodelist %>% #for names
  distinct(vertex.id, .keep_all = TRUE) %>% select(NTEE1)
ntee <- ntee$NTEE1

set.vertex.attribute(dynamicNetwork, "vertex.names", names)
set.vertex.attribute(dynamicNetwork, "ntee", ntee)

Now, you are done and the network can be saved!

save.image("Austin_1990s.RData)