In Democracy’s Data, Dan Bouk uses the 1940 U.S. Census as a case study for reading into the “depths” of data. These depths include the data’s designers: those who decide what information to collect, what not to collect, and from whom to collect. The depths include the process through which a data set is produced: the mechanisms — people, paper forms, digital interfaces — by which data is inputted and, potentially, the unseen pressures imposed upon the dataset by political and social forces. The depths of data, moreover, include the choice of how to present the data and how those presentations are manipulated, utilized, and challenged. And the depths are rife with “silences,” important gaps in the data elided — maybe intentionally, maybe not — in its presentation. But above all, the depths of data, at every layer, are filled with hidden stories, waiting patiently to be read.
Within this framework, Bouk unpacks the 1940 U.S. Census piece by piece:
- Designers: government bureaucrats from several federal agencies, statisticians and academics, special invitees of President Roosevelt's administration, and business interests (including the life insurance, retail, and manufacturing industries). Importantly, this cadre of designers consisted almost exclusively of white men.
- Process: enumerators (commonly called "census takers"), often selected through state-level political patronage, carried long sheets of grid paper from door to door across their assigned "enumeration districts." They sought to locate and interview the heads of households, the fundamental organizational unit of the census. Enumerators recorded street addresses, names, races, education and income levels, occupations, and, as Bouk explores in his chapter on "partners," the relationship between the head of household and each person living in that household. Depending on each enumerator's political and social inclinations, and on how they interpreted census guidelines, the attributes recorded on the form may or may not have reflected the wishes of the person being counted.
- Presentation: the most fundamental output of the census is a number. For the 1940 Census, that number was 131,669,275, the official size of the U.S. population. The Census Bureau's report also included reapportionment data. After all, the whole premise of the census is that the number of House representatives and federal spending ought to remain proportional to state population sizes. High-level census data is published within a few years of the count, but individual census records are, by law, sealed for 72 years, ostensibly to protect the privacy of those counted. Yet despite the Census Bureau's stated commitment to privacy, the political pressures of World War II proved too great: Bouk recounts how, in an effort to make itself useful, the Bureau eagerly aided the military's efforts to round up Japanese Americans for internment early in the war.
- Silences: the story of America and its development is undeniably a story of white supremacy, and the census did not escape this racism. With the help of a census statistician, a major (and deeply flawed!) work of scientific racism was published. It pointed to 1890 census data showing a dramatic drop in the Black population's growth rate and concluded that federal aid ought not to be provided to a population supposedly trending toward "extinction." As it turns out, several censuses systematically undercounted the Black population. The Census Bureau, however, fiercely resisted Kelly Miller, the Black professor who first made the undercount claim. Without a deeper, intentional reading of data for silences, large swathes of Black people would have remained unrepresented in the so-called "inventory of democracy."
- Stories: perhaps the most striking aspect of Democracy's Data was its rich, individual stories. Bouk profiles enumerators, the enumerated, specific "Question Men," senators, Census Bureau statisticians (and the family of one Japanese staff member), and even members of his own immediate and extended family. Reading these stories made me want to dig through census records to see who I might discover; I specifically thought about looking for my family in the data. Indeed, another Coding it Forward fellow went down this very route and located his grandfather in the 1950 census.
All in all, Democracy’s Data was an engaging crash course in how to critically read data, and I’d highly recommend it for anyone who collects, analyzes, or operates on data.