Closing the diversity gap in genetic studies

Scientists developed a new statistical method to improve genome-wide studies of diseases and traits in people with diverse ancestries.

Image Credit: Photo by John Schaidler on Unsplash

A core aspect of biomedical research is figuring out what drives physical characteristics, often called phenotypes, and how best to fix anomalies like diseases. Scientists use genetic techniques to hunt for and identify specific locations in the human genome related to diseases. Studies like these are called genome-wide association studies, or GWAS. Scientists use the results of these studies to predict disease risks and develop relevant prevention or treatment strategies.

However, researchers agree there is a long-standing problem with this approach. Most of the data used in GWAS are from people of European origin because their genetic data are often readily available in large quantities. This narrow sampling makes it difficult for scientists to apply the results to patients of other ancestries, such as Asian and African ancestries, because the genetic structure of each population is unique. Researchers in the past used limited data from other ancestries for GWAS but found it didn’t adequately represent the populations, as these researchers reported for rheumatoid arthritis.

Scientists use GWAS analyses to produce a collection of statistical values that estimate the likelihood an individual will develop a trait or disease based on their genetic makeup. These values, known as polygenic scores, work like the scores on a report card. GWAS results also tell scientists how genes are passed down through generations and how differences in each gene can affect traits like height, weight, and blood pressure.
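To make the idea of a polygenic score concrete, here is a minimal sketch in Python. It assumes the common textbook form of the score: a weighted sum of how many copies of each genetic variant a person carries, with the weights taken from a GWAS. The function name, the effect sizes, and the example person are all made up for illustration and do not come from the study.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts (0, 1, or 2 copies per variant)."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size per variant is required")
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


# Hypothetical per-variant effect sizes, standing in for GWAS estimates.
effect_sizes = [0.5, -0.25, 1.0]

# Hypothetical person: copies of the effect allele carried at each variant.
person = [2, 1, 0]

score = polygenic_score(person, effect_sizes)
print(score)  # 2*0.5 + 1*(-0.25) + 0*1.0 = 0.75
```

Real polygenic scores add up thousands to millions of variants, but the arithmetic is the same.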

Researchers from different institutes in Australia, Japan, Taiwan, and the Republic of Korea developed a method to address this diversity gap in genome-wide studies. They hypothesized that integrating polygenic scores from European GWASs into analyses of people with different ancestries could account for diversity during genetic research.

They used data from biobanks, which are collections of human biological samples stored for research purposes. The biobanks used were the UK Biobank, BioBank Japan, Taiwan Biobank, and the Korean Genome and Epidemiology Study, representing populations in the UK, Japan, Taiwan, and Korea, respectively. They analyzed seven traits: height, body mass index, blood pressure, stored fat content, type-2 diabetes, good cholesterol, and bad cholesterol. They included all these traits to ensure their method was versatile. They adopted statistical models to calculate polygenic scores for each trait and computed how GWAS results changed when they included the European polygenic scores in their analyses.
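The intuition behind adjusting a GWAS for a polygenic score can be shown with a toy simulation. The sketch below is not the authors' implementation. It uses simulated data and a simple two-step stand-in: first subtract the part of the trait that the polygenic score explains, then test a single variant against what is left. All names and numbers are hypothetical.

```python
import random


def simple_regression(x, y):
    """Least-squares slope, intercept, and slope standard error for y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, intercept, se


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500

# Simulated people: one variant (0, 1, or 2 copies) plus a polygenic score
# that captures the rest of the genetic background. Values are made up.
variant = [rng.choice([0, 1, 2]) for _ in range(n)]
pgs = [rng.gauss(0, 1) for _ in range(n)]
trait = [0.2 * g + p + rng.gauss(0, 1) for g, p in zip(variant, pgs)]

# Standard test: trait against the variant alone.
_, _, se_plain = simple_regression(variant, trait)

# Adjusted test: remove what the polygenic score explains, then test the variant.
pgs_slope, pgs_intercept, _ = simple_regression(pgs, trait)
residual = [t - pgs_intercept - pgs_slope * p for t, p in zip(trait, pgs)]
_, _, se_adjusted = simple_regression(variant, residual)

print("standard error without adjustment:", se_plain)
print("standard error with adjustment:   ", se_adjusted)
```

In this simulation the adjusted test has the smaller standard error, because the polygenic score soaks up trait variation that would otherwise count as noise. A smaller standard error means the same variant effect stands out more clearly, which is the sense in which adjustment can boost statistical power.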

The researchers tested their hypothesis across different data combinations from the biobanks and past research. They also used statistics to combine the results of similar studies from the past and compared them with the results obtained from their method. They made similar comparisons within the same ancestry and across different ancestries. 
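Combining results from several studies is usually called meta-analysis. The summary does not say which technique the authors used, so the sketch below shows one widely used option, a fixed-effect inverse-variance meta-analysis, purely as an illustration. The effect sizes and standard errors are hypothetical.

```python
def inverse_variance_meta(effects, standard_errors):
    """Pool per-study effect estimates, giving more weight to precise studies."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_effect = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled_effect, pooled_se


# Hypothetical estimates of the same variant's effect from two studies.
effects = [0.10, 0.20]
standard_errors = [0.1, 0.1]

pooled_effect, pooled_se = inverse_variance_meta(effects, standard_errors)
print(pooled_effect, pooled_se)
```

Because the two studies here are equally precise, the pooled effect sits halfway between them, and the pooled standard error is smaller than either study's own.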

Their goal was to obtain an accurate and reliable estimate of how well their method could enable relevant medical discoveries to help underrepresented populations. They also analyzed genome segments responsible for specific traits to identify whether any single nucleotide base in the segment is unique to that trait. The researchers employed prediction tests to assess whether they could use their method on larger datasets than the ones they used in their analyses.

They found the polygenic score-adjusted GWAS analyses could explore traits and diseases using small datasets of underrepresented populations with better statistical significance than previous methods. They showed their method was good at detecting rare genetic differences and the relationship between traits. 

The authors reported that even though they mainly used data of East Asian origin in their analyses, scientists could use their new method on people with other ancestries, provided they could access their polygenic scores. They also acknowledged applying the method would be complex and require high computational resources. However, they noted the computation could be broken into smaller tasks and processed simultaneously.

The authors concluded their method will improve researchers’ ability to discover new features in genetic data. They stressed the method was flexible and could easily be integrated with existing software tools for GWAS analyses. Finally, they recommended researchers use it in future GWAS, especially on data from underrepresented populations, to explore the impact of genetic interactions on traits and diseases.

Study Information

Original study: Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores

Study was published on: September 18, 2023

Study author(s): Adrian I. Campos, Shinichi Namba, Shu-Chin Lin, Kisung Nam, Julia Sidorenko, Huanwei Wang, Yoichiro Kamatani, The Biobank Japan Project, Ling-Hua Wang, Seunggeun Lee, Yen-Feng Lin, Yen-Chen Anne Feng, Yukinori Okada, Peter M. Visscher, Loic Yengo

The study was done at: Institute for Molecular Bioscience (Australia), Osaka University Graduate School of Medicine (Japan), National Health Research Institute (Taiwan), Seoul National University (Republic of Korea), Institute of Health Data Analytics and Statistics (Taiwan), Institute of Epidemiology and Preventive Medicine (Taiwan), The University of Tokyo (Japan), RIKEN Center for Integrative Medical Sciences (Japan), Immunology Frontier Research Center (WPI-IFReC) (Japan), Institute for Open and Transdisciplinary Research Initiatives (Japan), The Biobank Japan Project (Japan)

The study was funded by: Australian Research Council, National Health Research Institutes, National Science and Technology Council of Taiwan, National Taiwan University, Yushan Young Fellow Program, Japan Society for the Promotion of Science, Population Health Research Center, Japan Agency for Medical Research and Development, Takeda Science Foundation, Bioinformatics Initiative of Osaka University, National Research Foundation of Korea

Raw data availability: Code can be found here.

This summary was edited by: Aubrey Zerkle