Introduction

In this R Markdown Notebook, I provide the code used to construct the board edgelists in The Civic Elite. This replication file covers the analysis of the longitudinal Austin network (1998-2016), but the steps can be generalized to other cities. We first construct the data for the recent time period, from AWS, and then move to the earlier time period.

The end result of this analysis is a dataset of Form 990/Form 990-EZ nonprofits in Texas from which longitudinal networks of interlocking directorates can be constructed.

Please note: Because the raw data files are very large, this file is provided for reference only. The data itself can be found here

2010s

Initial Loading and Cleaning

In order to construct board networks from 2010 on, we need data from Amazon Web Services (AWS) that contains the board information for Form 990/990-EZ filers across the United States from 2010 to 2017. We then need to match this board data to a set of all active nonprofits in Texas between 2013 and 2016.

Both of these data files can be accessed, with some work, from AWS via IRSx, which provides guides on how to construct the appropriate files. A special shout-out to Jesse Lecy, who was very kind in providing me with the board information from AWS.

These are the files that are necessary:

  • “FULL-PART-VII-TABLE-01.rds”: The board data for the United States
  • “texas_efilers.csv”: The e-filer records for Texas between 2013 and 2016 (list of active e-filers)
  • “parsednames.csv”: A cleaned database of names for 2013-2016
library(tidyverse) # provides the dplyr verbs and na_if() used throughout

board_data <- readRDS("FULL-PART-VII-TABLE-01.rds") # Board data from AWS 
texas_filers <- read.csv("texas_efilers.csv", numerals = "no.loss") # Records of Texas e-filers

# Select only the variables we need from the board data, and align names and types for linking
officers <- board_data %>% select(EIN, NAME, TAXYR, FORMTYPE, F9_07_PZ_DTK_NAME, F9_07_PZ_DTK_TITLE)
officers <- officers %>% filter(TAXYR > 2012 & TAXYR < 2017) # keep tax years 2013-2016
officers <- officers %>% rename(FISYR = TAXYR)

texas_filers <- texas_filers %>% mutate(EIN = as.character(EIN))
officers <- officers %>% mutate(FISYR = as.numeric(FISYR))
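
Before running the join, a quick base-R guard can confirm the key columns now have matching types on both sides:

# the join keys must have matching types on both sides
stopifnot(class(texas_filers$EIN) == class(officers$EIN),
          class(texas_filers$FISYR) == class(officers$FISYR))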

Now join the officer records to the Texas filers.

texas_filers <- left_join(texas_filers, officers, by = c("EIN", "FISYR"))

# Drop Private Foundations

texas_filers <- texas_filers %>% filter(RETURN_TYPE != "990PF")
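
As a quick check of match coverage (mirroring the check we run on the 1990s data below), we can count how many filer-years picked up a board record; unmatched rows carry NA names:

table(is.na(texas_filers$F9_07_PZ_DTK_NAME)) # FALSE = matched a board record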

Name Cleaning and Save

The names need to be cleaned before using them for matching, removing items such as honorifics, suffixes, and salutations. Jesse Lecy has developed peopleparser for this task. Parsing is very computationally intensive for a board list of this size, so I show the code below, but after producing the parsed names the first time, I load in the already-cleaned dataset.
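
To give a flavor of what this cleaning involves, here is a toy base-R sketch. This is not peopleparser, just an illustration of stripping honorifics and suffixes with regular expressions:

# a toy illustration only -- peopleparser handles far more cases
toy_clean <- function(x) {
  x <- tolower(x)
  x <- gsub("^(mr|mrs|ms|dr|rev)\\.?\\s+", "", x)          # leading honorifics
  x <- gsub(",?\\s*(jr|sr|ii|iii|iv|md|phd)\\.?$", "", x)  # trailing suffixes
  trimws(x)
}
toy_clean("Dr. Jane Doe, PhD") # "jane doe"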

texas_filers$F9_07_PZ_DTK_NAME <- tolower(texas_filers$F9_07_PZ_DTK_NAME) # going lower case

# parsed_names <- parse.names(texas_filers$F9_07_PZ_DTK_NAME) # write this and load it to use in the future
# write.csv(census.names, "census.names.csv")
# write.csv(parsed_names, "parsednames.csv")
## run this below
parsed_names <- read.csv("parsednames.csv")
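# sanity check: cbind() matches rows purely by position, so the parsed
# names must line up one-to-one with the filer records
stopifnot(nrow(parsed_names) == nrow(texas_filers))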
texas_filers <- cbind(parsed_names, texas_filers)

We now do a little cleaning up to create single names for matching and save the dataset. An important part of this step is to eliminate multiple records of the same person at any given organization: a single person may serve in multiple roles, but we don't necessarily want to inflate their representation. I delete duplicate records of individuals with the same name in the same organization in the same year.

#single name
texas_filers$cleaned_name = paste0(texas_filers$first_name, texas_filers$last_name)
# recode the literal string "NA" back to a true missing value
texas_filers$cleaned_name <- na_if(texas_filers$cleaned_name, "NA")

texas_filers <- texas_filers[,2:33] # drop the row-index column picked up from read.csv

#remove any duplicate mentions of a director in an organization in a given year. We don't need that information. 
texas_filers$record_indicator=paste0(texas_filers$cleaned_name, texas_filers$EIN, texas_filers$FISYR)
texas_filers <- texas_filers %>% 
  distinct(record_indicator, .keep_all = TRUE)
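# equivalently, dedupe on the key columns directly and skip the helper column:
# texas_filers <- texas_filers %>%
#   distinct(cleaned_name, EIN, FISYR, .keep_all = TRUE)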

### For the sake of storing data on GitHub I am going to reduce the size of this dataset here. 
texas_filers <- subset(texas_filers, subset = FIPS == 48453) # Travis County (Austin)

write.csv(texas_filers, "texas_filers_boards.csv")
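
The edgelist construction itself happens in the other replication files, but as a sketch of where this dataset is headed, here is one way (using dplyr, with 2016 as an arbitrary example year) to turn the saved board data into an organization-level interlock edgelist:

boards <- read.csv("texas_filers_boards.csv")

# person-to-organization edgelist for a single year
el <- boards %>%
  filter(FISYR == 2016, !is.na(cleaned_name)) %>%
  select(cleaned_name, EIN)

# organizations sharing a director form an interlock; a self-join on the
# director's name yields the org-org pairs
interlocks <- el %>%
  inner_join(el, by = "cleaned_name") %>%
  filter(EIN.x < EIN.y) %>%                      # keep each undirected pair once
  count(EIN.x, EIN.y, name = "shared_directors") # tie weight = number of shared directors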

1990s

Initial Loading and Cleaning

The steps for constructing the edgelists for the 1990s are a little different: the data come from the Urban Institute's historical “Digitized Data,” covering 1998-2003. Here, we need to load two files available from the Urban Institute.

  1. ‘BMF_RevExp_Loaded.RData’: The Revenues and Expenses from the Business Master File (for organizational data)
  2. ‘Officers_Loaded.RData’: The Officers file, which contains information on board members.

This data can all be downloaded here: Urban Institute

rm(list=ls())

load('BMF_RevExp_Loaded.RData')
load('Officers_Loaded.RData')

We need to join the BMF Revenues and Expenses file and the Officers file to get the board information we want. First, a little subsetting and checking.

# Reduce the officers data
officers <- officers %>% select(EIN, NAME, FISYR, P5NAME, P5TTL)

# check revexp to make sure each organization's filings are unique
revexp <- revexp %>% 
  group_by(EIN) %>% 
  mutate(count = n()) 
table(revexp$count) # at most 6 filings per EIN across six years, so org-year records are unique

# subset the revenues/expenses file to Texas, since we use it for matching
revexp <- revexp %>% subset(STATE == 'TX')

# Now we have what looks to be six years of board data from 1998-2003
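range(revexp$FISYR) # should span 1998-2003
table(revexp$FISYR) # filings per fiscal year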

Now we start joining files to create a proper edgelist.

filers <- left_join(revexp, officers, by = c("EIN", "FISYR"))
# change "" to NA on names
filers$P5NAME[filers$P5NAME==""] <- NA

table(is.na(filers$P5NAME))
table(is.na(filers$P5NAME), filers$FORMTYPE) # No PF in this data set

Name Cleaning and Save

Finally, we do the name cleaning. As before, we skip the actual parsing step because it is so time intensive, and load the pre-parsed names instead.

filers$P5NAME <- tolower(filers$P5NAME) #converting to lower
filers <- filers %>% ungroup()
# parsed_names <- parse.names(filers$P5NAME) # write this and load it to use in the future
# write.csv(census.names, "census.names_1990s.csv")
# write.csv(parsed_names, "parsednames_1990s.csv")
# run this below
parsed_names <- read.csv("parsednames_1990s.csv")
filers <- cbind(parsed_names, filers)

At this point, we repeat the same cleanup as before (single names, then dropping duplicate person-organization-year records) and save the dataset.

#single name
filers$cleaned_name = paste0(filers$first_name, filers$last_name)
# recode the literal string "NA" back to a true missing value
filers$cleaned_name <- na_if(filers$cleaned_name, "NA")

#remove any duplicate mentions of a director in an organization in a given year
filers$record_indicator=paste0(filers$cleaned_name, filers$EIN, filers$FISYR)
filers <- filers %>% 
  distinct(record_indicator, .keep_all = TRUE)


### For the sake of storing data on GitHub I am going to reduce the size of this dataset here. 

filers <- subset(filers, subset = FIPS == 48453) # Travis County (Austin)

write.csv(filers, "filers_boards_1990s.csv")
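
Finally, as a sketch of the end goal, here is one way to project the two-mode person-organization data into a one-mode interlock network for a single year (1998 as an example). This assumes the igraph package, which is not otherwise used in this file:

library(igraph)

# two-mode (person-organization) edgelist for one year
el <- subset(filers, FISYR == 1998 & !is.na(cleaned_name),
             select = c(cleaned_name, EIN))

g <- graph_from_data_frame(el, directed = FALSE)
V(g)$type <- V(g)$name %in% el$EIN # TRUE for organizations, FALSE for people

# organizations become tied when they share a director; edge weights
# count the number of shared directors
org_net <- bipartite_projection(g)$proj2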