
Don’t Call It a Throwback: Keolu Fox on the Past, Present, and Future of Genetic Data

Dr. Keolu Fox is a Kānaka Maoli (Native Hawaiian) genetic scientist, and professor at the University of California, San Diego. His multidisciplinary research includes genomic sequencing and editing, and Indigenizing medical research. While the history of genetics has been marred by notions of “normality”—such as Sir Francis Galton’s insistence on a “normal distribution” to justify eugenic ideas—Dr. Fox’s research is flipping the script on norms in genetic science. Many of his projects combine cutting edge technologies, Indigenous data sovereignty and futures, alongside questions of equity and how benefits of medical research are distributed. We sat down with Dr. Fox to learn more about what genetic data is, and the promises and pitfalls of genetic research.

What is digital genetic data and how exactly is it used? 

I hadn’t thought about digital genetic data until a professor by the name of Maynard Olson brought it up during medical school. When you enter graduate school, you have these retreats that you go on, which is very popular in medical school-ville. I was at a retreat and we were sitting around the fire, drinking whiskey with my department grandpa, Maynard Olson. Maynard is a cool cat. He asks me, “You know what the funny thing about genomics is, Keolu?”, and I was like, “No, uncle, what is it?” He replied, “We’re using technology to take something that’s analog and making it digital.” That led to this whole conversation about what bioinformatics is, where it comes from and what technologies enabled it to happen. 

It really comes down to a few characters that really envisioned this. A lot of it happened at Caltech with this guy named Lee Hood. He, in my opinion, is the technologist that enabled the genomic and post-genomic revolution, as well as its evolution. He set out to create and automate a lot of these technologies and is probably the most influential person when it comes to that. 

Think about this: a chemical composition that exists, and how we use technologies to literally take something that is chemical in nature—a pattern, a sequence that exists in nature—and how it can be translated into zeros and ones. Or as Uncle Maynard would say, zeros, ones, twos and threes: because AGCT, right? So then you’re now in this whole different world about the way we think about genes as a form of data and how it’s coded, mined, and operationalized. 
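The "zeros, ones, twos and threes" idea can be sketched in a few lines of Python. This is only an illustration: the mapping A=0, G=1, C=2, T=3 follows the "AGCT" order in the quote and is an assumption, and the function names are hypothetical rather than taken from any real bioinformatics tool.

```python
# Assumed mapping, following the "AGCT" order in the text; not a standard.
BASE_TO_CODE = {"A": 0, "G": 1, "C": 2, "T": 3}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def encode(sequence):
    """Turn a DNA string into a list of integers in the range 0..3."""
    return [BASE_TO_CODE[base] for base in sequence.upper()]


def pack(codes):
    """Pack the 0..3 codes into one integer, two bits per base."""
    value = 0
    for code in codes:
        value = (value << 2) | code
    return value


def unpack(value, length):
    """Recover the DNA string from a packed integer of known length."""
    bases = []
    for _ in range(length):
        bases.append(CODE_TO_BASE[value & 0b11])  # read the lowest two bits
        value >>= 2
    return "".join(reversed(bases))


codes = encode("AGCT")
packed = pack(codes)
print(codes, bin(packed), unpack(packed, len(codes)))
```

Each base needs only two bits, which is why four letters map naturally onto zeros and ones.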

In your lab, how are you using genetic data? 

In a number of different ways—there’s certain things that you can do with it. You can read it—that is the sequencing modality, and there’s all of these different types of sequencing. There was the technology that Fred Sanger created early on in the ’70s and ’80s, which enabled Lee Hood to automate and create the ABI SOLiD, which enabled the Human Genome Project. From there, we have had new disruptive technologies from 2007 to 2010, and we get into this next-generation sequencing mode, and that is the birth of these technologies like Illumina, which is based here in La Jolla. 

That is a very strong reason why I work here—there are certain cities that are these technological epicenters, and La Jolla is one of those. It also has incredible surfing and great burritos and beer—and also has legal weed. So it makes sense that you’re going to get a critical mass of some really interesting characters in this dynamic ecosystem of people that want to innovate in the genetic technology space. 

Sequencing is our primary tool, our bread and butter. But we don’t like to just read, we like to write, so we love genome editing and everything that’s come out of the back end of the genome editing revolution. The specific tools that we like to use include base editing or prime editing, and this comes from Alexis Komor, who is in the Department of Chemistry here at UCSD. Base editing is a precise form of genome editing that allows us to make individual nucleotide changes. So we can use both of those tools to do a bunch of cool shit.

A new fundamental idea that’s been going on in our laboratory is to try to understand genetic data as a resource. We generate sequences from humans because we want to be able to predict and prevent disease—but then what is the actual value of that data? What exactly did you sequence?

That brings us to another set of questions around diversity—not just as a buzzword, but as something that actually has immense value. What you sequence matters, because 90 percent or so of people that are sequenced are of Western European ancestry. The rest of the world, including Big Pharma, now knows that we’ve been looking in the wrong place. 

How so? 

Well, the most interesting things to me are throwback evolutionary questions. Evolution and natural selection, and what is medically actionable or not—these are not independent, they’re the same thing, right? Think about it this way. Darwin’s finches are this great example of natural selection because they allow us to understand variation in the context of geography. But so does sickle cell disease, and high elevation adaptation in Tibet and Nepal. 

Now, let’s take it a step further. I can actually utilize that perspective and that understanding of natural selection to better inform the development of pharmaceutical drugs. As an example: women who are ancestral descendants of communities that have lived for 10,000 years at high elevation in the Himalayas have adapted and have mutations that are advantageous at that elevation—in something called the HIF pathway. That’s a really important mutation for understanding something like oxygen metabolism, and how we might develop new classes of drugs that are involved in cardiovascular disease, or different responses to Covid and respiratory health. Or the next Viagra.

So you see how these are all just avenues for the development of these new drugs. And if I know that natural selection did all the work for me, it allows me to focus on a biomarker that I can then reverse engineer, and save myself a lot of money from the research and development point of view because I don’t have to try a million different isoforms. I can just zero right into the thing that I want. 

When we find a genetic mutation that is associated with protecting against type 2 diabetes, that’s very interesting, because we might be able to develop a new drug that competes with metformin, which classically sometimes doesn’t work in minority populations. Because on the back side of this, in 95 percent of pharmaceutical drugs, the clinical trials don’t include anybody except white people.

DNA Equity

For something like user data, Google collects data and makes it profitable. For genetic data, what’s the landscape of different actors like? Do you have open source datasets of genetic data, or is it mostly companies like 23andMe collecting genetic data?

A timely question, given 23andMe’s decision to go public. Genomic data by itself is valuable, but it’s not as valuable as genomic data plus an electronic medical health record, or genomic data plus cholesterol data. It needs to have genotype and phenotype. So the metadata that’s associated with genomic data is super, super important.

There is a lot of encouragement from the federal government, and obviously Silicon Valley and Seattle and DC to have an open data environment. Because once it’s all open, it can be aggregated and used to model and predict and prevent things, right? It’s a double-edged sword, because on the one hand you can see how effective an open data environment was in sharing a lot of the discoveries and insights from this last year of having a pandemic, and then how fast vaccines were developed. But there are always going to be perpetual linesteppers and companies that want to abuse this, right?

Vertex Pharmaceuticals, for example, used genetic sequence data derived from cystic fibrosis patients, which they got from the nonprofit Cystic Fibrosis Foundation. Vertex incorporated and recruited patients through that avenue, sequenced their genomes, and then refined the development of this new drug, and then sold it back to those exact same people for $300,000 a year.

To me, that’s super fucked up—I don’t care who you are. That’s a violation of the Common Rule, that’s a violation of the United States Constitution, that’s a violation of human rights. It’s also highly problematic because Vertex is actively taking advantage of loopholes that exist in our broken healthcare system, where it’s estimated that almost 60 million people don’t have access to healthcare. So, you know, it serves them to socially stratify and create this brave new world of health disparities, which is also gross. 

These companies, they’re taking advantage of this, and not only taking advantage of it, they have lobbying groups to ensure that their assets stay their assets. It’s pretty dangerous. You see companies like 23andMe sell access to their database, a primitive version of their database, to GlaxoSmithKline for $300 million. And then this year you watch Ancestry get sold to the Blackstone Group for $4.7 billion.

That’s a lot of money. It’s very obvious that the value of genomic data is on the rise. With 23andMe going public, you can see that it’s very obvious that the number one commodity on planet Earth is data, and digital sequencing information is a part of that apparatus. 

There are solutions, though, I should note.

Give us some optimism. 

So let’s be positive for a moment. It’s easy to point fingers at all of the terrible stuff that happens with capitalism, and we can get stuck in that kind of negative feedback loop really easily. 

We have been thinking about different types of solutions. One of them would be for the example of 23andMe. For the people who gave you their genomic information for you to create this company, and you’re selling that information to Big Pharma, and then they’re developing drugs and selling it back to those same people for money—why not give them equity? Why not give them a stake?

I feel like that’s a more sustainable model, financially. And would be a new cool kind of stockholder, shareholder-based benefit-sharing model. That’s the way we should be thinking about circular economies. What is just? And when we say equitable and you say equity, what do you mean? When I say equity, I’m talking about financial institutions and structures that allow us to move past the dollar sign and buy back our land. Something like stock shares in 23andMe would enable that, or trusts where that money can go to avoid corruption and cronyism. 

In one of your papers, you talk about the “illusion of inclusion” and the NIH “All of Us” project. One of the common arguments about data is that if data is open instead of proprietary, it will benefit the public instead of just companies. But, in that paper, you talk about how we need to rethink what public benefit really means. Could you elaborate on that idea? 

First, I would say, whose benefit? Whose greater good? I think that’s the bigger question. If we look at that historically, we know very well whose greater good we’re prioritizing over others, and we know that that’s quite hierarchical, right? 

Couching it historically, what does it mean when the federal government uses your tax dollars to pay for a large-scale initiative to sequence one million people’s genomes in America, and that data is open? Does that mean that you, as an average US citizen, are going to benefit directly from that? You know, I think there are probably better uses of taxpayer dollars, maybe breaking up this multibillion dollar project into smaller grants and serving other basic scientific questions. 

But, in the meantime, the reason why this is problematic is because once that data hits the open market, it gets aggregated by these companies like Regeneron and others and is used in the development of pharmaceutical drugs. Meanwhile, the communities who graciously contributed their genomes to this, based on the false pretense of this having an impact on their health, will basically receive nothing. 

Think about the history of the Human Genome Project from 2001. If we think about the actual impact of that project on health, it’s been kind of negligible with respect to common complex disease. What do I mean? The number one and number two causes of death in America are heart disease and cancer. If you look at the rates of people in America that have those disorders, they’re pretty consistent. And because those are the number one and number two causes of death, we have to look at it and say, “Have large-scale genome sequencing projects changed these death rates?”

Common complex diseases are fucking complex. There are multiple genes at play with multiple mutations and many, many other things. And there are also lifestyle choices and stress and sleep and a number of other things that are independent of the innateness of these conditions. But if I keep selling you innateness, and keep trying to say we need to sequence more people’s genomes because it will allow us to reduce the widening gap in health disparities when we know that there are better uses of federal money… I don’t know. There’s a hundred different things that we could be using to do that. 

That’s not to say that genomic technology hasn’t resulted in some pretty incredible stuff—look at this Moderna vaccine. It’s pretty remarkable, but it’s a targeted thing. It is a piece of drug development. It is innovative, but it has a particular goal. 

With respect to common complex disease, I would say that it’s been almost an abject failure. For things like Mendelian disease, though, or identifying the cause of Kabuki syndrome and Miller syndrome and these other really, really rare things, genomic technology has been more successful. 

The whole “All of Us” project is couched on increasing diversity, because we know there’s a dearth of diversity in genetic data; we know that the people who have been sequenced so far in America do not reflect the full spectrum of human genetic variation. It’s a clever trick though: it uses the illusion of widespread benefits to public health outcomes in order to include minority populations so companies can understand their intimate historical relationships with natural selection, and then use those genetic mutations to fast-track the development of expensive pharmaceutical drugs that are out of reach of those populations.

And that’s why that paper was in The New England Journal of Medicine: because I was super critical, but I also spoke about the facts, and that’s the only thing they respect. Sometimes you have to be honest. I don’t think that that paper particularly made me a lot of fans at the federal government, but I also think that we need to be critical. Part of science is correcting the mistakes we make and really thinking about what true equity and inclusion means. 

“Just-So” Stories

There are many popular genetic origin tracing projects that have made a lot of claims around race, biology, and Indigeneity that are pretty problematic. Do these tracing projects have any scientific value? As a genetic scientist, what do you see as the reason for these types of projects? 

There’s a lot to unpack there. There’s some complex shit. We actually just created this new exhibit at the Bishop Museum in Hawaii called “(Re)Generations: Challenging Scientific Racism in Hawaii.” A lot of the subtext and inspiration for connecting genetics to race comes from the early days of, like, the Franz Boas, you know, Indiana Jones bullshit, a.k.a. eugenics. We’ve been thinking about phrenology and all these old forms of science. 

I think the problem is we consider ourselves totally separate from our past and say, “Those tools of the past, like phrenology and all of these different ways that you measure a human, you know, we’re independent from that now.” Like: “We would never measure the kinkiness of someone’s hair in that way. We would never associate that with race, racism, and social hierarchy.” Meanwhile, what’s going on in La Jolla or the Bay Area is the same exact thing. But it’s really a lot harder to put your finger on those things, because digital data is not something that you can see or taste or touch in that way. So the new form of eugenics and extraction is invisible. 

You also have to know that there’s a certain level of certainty and accuracy with the way we return those results, too. Nothing is 100 percent, in the way that the techniques used to survey the genome vary as well. What 23andMe uses or what National Geographic uses is a very low resolution look at the genome. They use a tiny fraction of the actual variation that you have. And then don’t get me started on the way we create categories, because that’s just biased in itself. Come to Hawaii, where we have the highest percentage of people of mixed ancestry in the United States, and show me who’s Hawaiian—good luck. What characteristic are you going to use? Is it skin color? Because that is exactly the same thing that we were criticizing.

Could you talk about the connection between genetic determinism and disease likelihood, because one of the things that you mentioned in your papers is “just-so” evolutionary explanations. If you get a high likelihood of a disease on 23andMe, are you just doomed forever? What is a just-so evolutionary explanation?

Are you familiar with polygenic risk scores? They’re super interesting. That’s what’s hot right now in our field. They are these algorithms or heuristics that we can use to predict the potential for what we call the pathogenicity of a mutation. We use them to predict whether a mutation is going to be cancerous, or cause heart disease or something like that. 

That predictive quality of genomics is something that a lot of people see as the Holy Grail that we’ve been searching for: the difference between predictive medicine and reactive medicine, and that’s definitely where we want to go. Especially with common, complex disease. But when you train every single algorithm on white people’s data, you bias everything, and so none of these polygenic risk scores work in populations that are not white. 
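One common way to compute a polygenic risk score is as a weighted sum: for each variant, the number of risk alleles a person carries (0, 1, or 2) is multiplied by an effect weight estimated from a reference study. The sketch below assumes that form. The variant names, weights, and genotypes are invented for illustration and do not come from any real study.

```python
# Hypothetical effect weights, as if estimated from one reference population.
# Variant IDs and values are made up for illustration.
REFERENCE_WEIGHTS = {
    "variant_a": 0.2,
    "variant_b": -0.1,
    "variant_c": 0.5,
}


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Variants missing from `dosages` are treated as 0 copies, which is
    itself a simplifying assumption.
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in weights.items())


# Hypothetical genotype for one person.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
print(polygenic_risk_score(person, REFERENCE_WEIGHTS))
```

The score depends entirely on the weights. If they were estimated only in people of one ancestry, nothing in the arithmetic adjusts them for anyone else, which is the bias described above.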

Then there’s a second part of it. And this gets to the kind of “just-so” world of it. If you say a mutation is cancerous or pathogenic based on a correlative basis, and you don’t have causative data to show that, that’s problematic, right? You have no proof except the P value, you have no proof except some statistical phenomenon or correlation or racist-ass narrative. And so that has just played out through the field of population genetics forever. There are just all these levels of inaccuracy that get baked into these things and they become real. 

One of the most famous examples of this, one that I was happy to shoot down, is called the thrifty gene theory. The thrifty gene theory states that Polynesian and Pacific Islander people today have really high rates of obesity and type 2 diabetes because of our history as voyaging people, because if you had these mutations that predispose you to hypercaloric storage on these journeys, it was an advantage. But then once modernity hits, you develop type 2 diabetes and obesity, and it’s because of our genes.

It’s just so racist because it’s like—no, maybe the reason why we have high rates of obesity is because you took away our access to reefs, to fishing, hunting rights and all these other things, and then you replaced our highly nutritious diet of poi and fish with spam, white rice, and soy sauce. Why are you blaming this on innateness and our evolutionary history? You’re also discrediting our accomplishments as probably the greatest seafaring people in human history.

Okay, you said you found this thrifty mutation in 2015 in Samoa. This is the most fucked up story of all. You found this mutation in the gene CREBRF, and you say that it’s a thrifty mutation in fucking Nature. Everybody reads Nature. The gold standard. You said that in Nature and yet you have no functional evidence in your paper other than a mouse overexpression, which is essentially punting it? I can tell when things get appended as an experiment on the back end of a paper, like an auxiliary addition. They were asked by a reviewer to do that to show something—they did this bullshit that basically made the data fit a model. Super sketchy, but they got it through. Which speaks to how racist and faulty a lot of these journals are too.

Luckily, I was doing my postdoc in genomic engineering and genome editing. And so I said, okay, what if we use Crispr and take their mutation and then create a stem cell line that includes the Polynesian mutation in one population of cells, and then in the other population of cells we don’t include the mutation? It’s called an “isogenic cell line model”: we’re reverse engineering a mutation from natural selection’s point of view and we’re putting it in the cellular model. Now we can starve it of insulin and see what happens, we can measure the mutation’s effect on a range of things that are involved in producing metabolites and actually speculate on it from the point of view of a causative experiment. That’s the cool shit about genome editing for us. We can actually rewrite evolutionary history and empower Indigenous people by using genome editing. Thank you, Jennifer Doudna.

But, again, it’s about who uses the tools, right? And, lo and behold, what we’re seeing is very different from that paper. So we’re seeing that that mutation is at a much higher percentage of people that are of Polynesian ancestry that are professional athletes, seeing that it’s associated with bone density, muscle density, and other things—we don’t quite know yet fully, but it’s definitely not a fucking thrifty gene, tell you that much.

Now the other thing that they did—Steve McGarvey from Brown University—is they tried to file an international patent claim around the mutation and they didn’t include anyone of Polynesian ancestry in the patent claim. And they put it up there. And I think because I went to like every single genetics society and medical society and just buried them since 2018, it didn’t work. Not just me, but others that are from the culture, to my Maori colleagues, my Tahitian colleagues, my Samoan colleagues, my Chamorro, my Taiwanese colleagues, everybody was like: fuck this and fuck you, you have no right to try and commodify this, you know? 

That was the moment where I started thinking intensely about IP, commodification and natural selection. It was so close to home and was this weird moment where I was like, “Oh shit. They are going to just keep doing this.”

Indigenous Futures

In the context of patenting and data sovereignty, could you talk about what you’re doing at the Indigenous Futures Institute at UCSD? 

We have begun to think about data as a resource, and we’ve begun to really recognize data as the number one commodity on planet Earth, much like everyone else. And it’s not just us. It’s not a moment, it’s a movement, and so there are tons of different people that are thinking about this in different spaces. 

My hope is that people like Deb Haaland, the new Secretary of the Interior who is Laguna Pueblo, will represent our interests. And I’m hoping that as we have these conversations about land and what the Department of the Interior is responsible for, they’ll really begin to think about data as a resource and being responsible for data. Because what that does is it allows Indigenous people to be in control and have governance over our data. And once that happens, that can then be used to buy back land and begin all of these new forms of stewardship and guardianship of land and revitalization of our cultures. So that’s a very holistic way of thinking about Indigenous data sovereignty. 

What we’re doing at the Indigenous Futures Institute is we’re really thinking about ways to do that. This work is couched within our design lab and engineering school. We’re intentionally trying to make it as multidisciplinary as possible, but sometimes it’s hard to be disruptive and chaotic in academia. But we’re trying our best, you know. We have a range of different focuses, but it’s about codesigning technology with communities and being iterative about that process. 

That could range from the story I just told you about what happens when Indigenous people use genome editing to think about our past, or what happens when Indigenous people use machine learning and deep learning to shine a light on the past. It could include automating our archival records. It could also be thinking about creating deterrent and safeguarding technologies to prevent museum collections from sequencing our ancient genomes without our consent. There could be a number of different ways that we use those technologies.

Then there’s the focus on education, educational tools and creating a safe space for all of our Indigenous and historically marginalized communities of people to come, learn, and be productive and innovative. We have a huge focus on sustainability, whether it’s environmental sustainability or looking towards Indigenous innovation. Within architecture and urban planning, what does it mean when Indigenous communities are in the driver’s seats of those fields? And how do we actually begin to look towards knowledge that is quite ancient for modern solutions? 

So that’s it in a nutshell. Some of the people that are involved there are Provost Wayne Yang and Theresa Ambo, Sarah Aarons, Manuel Carrillo. That’s the core group, but it’s very new. We’re also thinking about Indigenous takes on economics. Like, why is your financial projection quarterly? Why isn’t it, you know, ten generations?

Keolu Fox is a genetic scientist and professor at the University of California, San Diego.

This piece appears in Logic's issue 13, "Distribution".