Before the Industrial Revolution in the United States, Canada and Europe, you might have ended up married to a fourth cousin. People didn’t travel far to find a spouse, and the closer you were to home, the more likely it was you’d marry within your family.
Then, in the late 19th century, something changed, and people stopped marrying their cousins.
It has been conventional wisdom that Europeans and North Americans married more outside their families as geographic dispersal ramped up between 1825 and 1875, with the advent of mass railroad travel. But over the same period, the genetic relatedness of many couples actually increased. It wasn’t until after 1875 that partners started to become less and less related.
This 50-year lag might indicate that shifts in social norms played a bigger role than geographic mobility in getting people to wed outside their bloodline. It’s also just one example of the insights that can be gleaned from the world’s largest scientifically vetted family tree, presented in a study published Thursday in Science.
Compiling and validating 86 million public profiles from Geni.com, a genealogy-driven social media site, the authors generated 5 million family trees. The largest tree consisted of 13 million people, spanned an average of 11 generations and included both Sewall Wright, a founder of human population genetics, and actor Kevin Bacon (the two are separated by 24 degrees, in case you were wondering).
The researchers then used this data set to test several genetic and historical hypotheses, showing “you can harness the hard work of so many people around the globe just documenting their own family history, and learn something about humanity,” said Yaniv Erlich, chief science officer of MyHeritage, the parent company of Geni.com, and senior author of the paper.
“It’s very impressive as a data collection and harmonization effort, and of course they have only scratched the surface of what it might have to offer,” said Philip Cohen, a sociology professor at the University of Maryland, who was not involved in the research.
The study is the latest example of scientists using big, crowdsourced data collected by private companies to do research. Last year, one study spearheaded by Ancestry.com mapped North American migrations. There have also been efforts to track food poisoning via Yelp reviews and drug usage via Instagram.
The trend raises new questions for researchers to think about, such as how representative such data are of populations at large, and whether commercial entities like Geni.com have vested interests, said Emily Klancher Merchant, a science and technology studies professor at the University of California, Davis.
“When private companies control the data and fund the research, they’re the ones gatekeeping what kind of science gets done,” she said. A DNA testing company might, for instance, be more interested in financing studies that discover sellable genetic markers than in funding open-ended basic research.
What’s interesting about genealogies, however, is that there are currently few better ways to get the data. In the past, researchers had to laboriously cobble together church records or local birth and death certificates to construct large family trees.
With crowdsourced genealogy, “we have the ability to connect a much more vast network of individuals and locations around the world, in a faster, cheaper way,” said Erlich, who is also a computer science professor at Columbia University.
Geni.com has millions of users worldwide, and the website allows its members to merge trees, in an effort to create a single, massive family tree for the world.
Erlich and his collaborators took steps to validate the company’s trees, then reported several new findings from the data, such as a lower heritability of life span than others have reported, and a greater tendency for mothers to migrate than for fathers.
But with all of their findings, it’s important to consider that not everyone is represented equally in the data.
“It’s like reading a Jane Austen novel — it gives you lots of great insights, but you have to be careful about generalizing to society as a whole,” Cohen said.
In addition to likely underreporting infant mortalities and people who never bore children, genealogy data skews toward families that have the privilege of accessing and maintaining a detailed history, Merchant said. Sites like Geni.com require a paid membership to access all features. And families that were forced to migrate, through the slave trade or to flee persecution, are probably not as well represented in their databases.
While these studies have limitations, they also offer opportunities. Biomedical research that marries family trees with genetic and health information could lead to new discoveries about heritable disease risk. In social sciences, genealogies merged with census and tax records could yield findings on topics such as inequality.
“As soon as you start linking these things, your analytical power goes way up — as do privacy risks,” Cohen said.
With any of these efforts, he added, it’s important to insist “that the norms of science, as far as transparency and openness, still apply.”