Census Tree Project
We are creating a Census Tree that will link the 188 million people who lived in the United States between 1900 and 1940 across every census record from those years in which they appear, and connect each person to all of their one-hop relatives (parents, siblings, spouses, and children). The Census Tree will provide the largest longitudinal data set ever created in the United States and will open up many opportunities for research in economics, demography, sociology, and public health. Here is a link to the paper and slides about the project that we recently presented in Oslo. We are working with Steve Ruggles, Cathy Fitch, and Jonas Helgertz on this project.
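To make the structure described above concrete, here is a minimal Python sketch of one way such a tree could be represented: each person keeps a map from census year to a record identifier, plus sets of one-hop relatives. The class names, field names, and identifiers are hypothetical illustrations, not the project's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Person:
    """One individual in the tree (hypothetical schema, for illustration only)."""
    person_id: str
    # census year -> identifier of this person's record in that census
    census_records: dict = field(default_factory=dict)
    # relationship type ("parent", "child", "spouse", "sibling") -> set of person_ids
    relatives: dict = field(default_factory=lambda: defaultdict(set))


class CensusTree:
    def __init__(self):
        self.people = {}

    def _get(self, person_id):
        # Create the person on first reference.
        if person_id not in self.people:
            self.people[person_id] = Person(person_id)
        return self.people[person_id]

    def link_record(self, person_id, year, record_id):
        """Attach one census record to a person."""
        self._get(person_id).census_records[year] = record_id

    def add_parent_child(self, parent_id, child_id):
        """Store the one-hop link in both directions."""
        self._get(parent_id).relatives["child"].add(child_id)
        self._get(child_id).relatives["parent"].add(parent_id)

    def timeline(self, person_id):
        """Return the person's census records as (year, record_id) pairs, ordered by year."""
        return sorted(self.people[person_id].census_records.items())


# Made-up identifiers, purely for illustration.
tree = CensusTree()
tree.link_record("p1", 1910, "rec-1910-001")
tree.link_record("p1", 1900, "rec-1900-042")
tree.add_parent_child("p0", "p1")
print(tree.timeline("p1"))
```

Storing each relationship in both directions keeps one-hop lookups cheap from either person; spouse and sibling links could be added the same way.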
We have gathered data on every student who attended Harvard during the early 1900s. During this period, Harvard randomly assigned roommates, and we are using this exogenous variation to examine the impact of a student's college roommate on long-run economic outcomes. The project involves a combination of computer vision, natural language processing, and record linking, the three core strengths of our lab. We are working with Seth Zimmerman (University of Chicago) on this project.
We currently have funding from two NIH grants to develop tools to automatically index historical records. We are focused on Ohio death certificates and the 1940 US census, but the tools we are developing will allow us to automatically index records from many different countries. We've created a mobile indexing app that lets you see how well the automatic indexing is working: bit.ly/rll_index. Each of the images in this app was snipped and indexed by a machine learning algorithm. Currently, we can narrow the correct name to a small set of options, and the indexing app will allow us to improve our handwriting recognition until it matches the accuracy of a human. We are working with Mark Clement (BYU) on this project.
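As a rough illustration of what narrowing a name to a small set of options can look like, the Python sketch below snaps a noisy machine transcription to its closest entries in a reference list of names. It uses string similarity from the standard library's `difflib` as a stand-in; this is not the lab's handwriting-recognition method, and the name list and function name are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real one would be far larger.
KNOWN_NAMES = ["John", "Joan", "Jon", "Mary", "Martha"]


def candidate_names(raw_transcription, names=KNOWN_NAMES, n=3, cutoff=0.6):
    """Return up to n known names that are most similar to the raw transcription.

    cutoff is the minimum similarity ratio (0 to 1) a name needs to be kept.
    """
    return get_close_matches(raw_transcription, names, n=n, cutoff=cutoff)


# "Jhon" stands in for an imperfect machine reading of a handwritten name.
candidates = candidate_names("Jhon")
print(candidates)
```

A human indexer, or a second model, would then only need to choose among the few returned candidates rather than transcribe the name from scratch.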
Academic Partner Projects
We have worked on projects for several academic partners. The students in our lab helped Martha Bailey (University of Michigan) with the LIFEM project, Adriana Lleras-Muney (UCLA) and Anna Aizer (Brown) with the Mother’s Pension project, Rick Hornbeck (Chicago) with a project to digitize and index the US manufacturing census, and Ran Abramitzky (Stanford) and Leah Boustan (UCLA) with a project related to Ellis Island oral histories. We will help with any projects that can directly benefit FamilySearch either by improving the Family Tree or expanding their indexed data collections. Our main motivation for these projects is to provide meaningful research experiences to BYU students. Contact Joe Price (firstname.lastname@example.org) if you would like the Record Linking Lab to help with a project.