
Creating a better algorithm to detect ovarian cancer

Studying chromosome lengths could lead to a better algorithm for estimating a woman’s probability of developing ovarian cancer.

Image Credit: Teal Ribbon, by Freepik

Most people know someone who has battled cancer. For most cancers, successful treatment depends on an early diagnosis so that care can begin right away. However, for some types of cancer that is easier said than done. Ovarian serous carcinoma falls into this category.

This type of ovarian cancer is very difficult to diagnose early, and the consequences are often fatal. Women who are diagnosed are usually in the late stages of the disease, when the odds of beating it are poor. One reason is that the current methods for estimating a woman’s likelihood of getting ovarian cancer are not as accurate or sensitive as they could be. So the question remains: is there a way to improve them?

Christopher Toh and James P. Brody address this question in a new study, “Genetic risk score for ovarian cancer based on chromosomal-scale length variation.” They set out to create a new method that would read patients’ chromosomes and, they hoped, better predict a woman’s likelihood of getting ovarian cancer.

Chromosomes are long strings of genes that hold all of our genetic information. They often become longer or shorter as cancer develops. Looking at how long these strings of genes are may be able to tell scientists how likely a woman is to get ovarian cancer.

Toh and Brody used data from The Cancer Genome Atlas (TCGA) project, which was funded by the National Cancer Institute. TCGA includes samples of 33 different types of cancer from 11,000 people. Toh and Brody studied samples from 4,639 women, 414 of whom had ovarian cancer and 4,225 of whom did not. The women who did not have ovarian cancer served as the control group for the study. A control group gives a baseline, telling scientists what the data would look like under normal circumstances.

The researchers measured chromosome length using their own process, called chromosomal-scale length variation. Put more simply, this term describes how the measured length of each chromosome differs from person to person. They separated all the data into two groups, “ovarian cancer” and “normal.” Using those two groups, they searched for the machine learning algorithm that could most accurately look through the patients’ genetic data and distinguish the women who are likely to get ovarian cancer from the women who are not.

Using a software package called H2O in the R programming language, Toh and Brody ran the samples from all the women in the study, both those with ovarian cancer and those without, through the algorithm they chose, called a Gradient Boosting Machine. This step teaches the program to tell the difference between the women with ovarian cancer and the women without it. If the algorithm can do this, it may be able to pick out, from a random sample, the women who may get ovarian cancer in the future. It may also be able to detect those who have recently developed it.
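
To make the idea of gradient boosting more concrete, here is a minimal sketch in plain Python. It is not the H2O implementation the researchers used, and the data are made up: three hypothetical length measurements per person, with the first one shifted upward for the “case” group by a deliberately exaggerated amount. All of the function names (fit_stump, train_gbm, predict_proba, make_data) are illustrative assumptions. The model starts from the overall case rate, then repeatedly adds a one-split decision tree (a “stump”) that corrects the errors still left over.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the single
    split that fits the residuals best in the least-squares sense."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for t in values[:-1]:  # split "<= t" vs "> t"; both sides are non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def train_gbm(X, y, rounds=20, rate=1.0):
    """Boost stumps on the log-loss gradient for labels 0 (control) / 1 (case)."""
    p = sum(y) / len(y)
    base = math.log(p / (1.0 - p))  # log-odds of the overall case rate
    scores = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        # Residual = observed label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + rate * (lv if row[j] <= t else rv) for s, row in zip(scores, X)]
    return {"base": base, "rate": rate, "stumps": stumps}


def predict_proba(model, row):
    score = model["base"]
    for j, t, lv, rv in model["stumps"]:
        score += model["rate"] * (lv if row[j] <= t else rv)
    return sigmoid(score)


def make_data(n=100, seed=0):
    """Synthetic stand-in data: every fourth person is a 'case' whose first
    measurement is shifted upward. These are NOT real chromosome lengths."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 4 == 0 else 0
        row = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if label:
            row[0] += 3.0
        X.append(row)
        y.append(label)
    return X, y


X, y = make_data()
model = train_gbm(X, y)
predicted = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X]
accuracy = sum(p == t for p, t in zip(predicted, y)) / len(y)
print(f"accuracy on the toy training data: {accuracy:.2f}")
```

The accuracy printed here is measured on the same toy data the model learned from, so it is optimistic. A real evaluation would test the model on people it has never seen.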

The results Toh and Brody got from this test run were promising for future research. Humans have 23 pairs of chromosomes, and the program was able to rank them by how important they are in estimating a woman’s probability of having ovarian cancer. However, the researchers do not yet know how the program will perform when faced with a sample of people who have not yet been diagnosed with any type of cancer. Toh and Brody concluded that future studies could apply the program to people who are not part of the TCGA dataset.
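
One common way tree-based programs produce such a ranking is to score each input by how much splitting on it reduces prediction error. The sketch below is a simplified stand-in for that idea, not the variable-importance calculation the researchers’ software performs. The data are synthetic, and the labels chr_A, chr_B, and chr_C are hypothetical placeholders rather than real chromosomes; only chr_B is built to carry a signal.

```python
import random


def best_split_gain(values, labels):
    """Largest drop in squared error from one threshold split on this feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total = sum(label for _, label in pairs)
    left_sum = 0.0
    best = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        right_sum = total - left_sum
        # Squared-error reduction of splitting into the first i and last n - i items.
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i) - total ** 2 / n
        best = max(best, gain)
    return best


def rank_features(X, y, names):
    """Return (name, share of total gain) pairs, most important first."""
    gains = [best_split_gain([row[j] for row in X], y) for j in range(len(names))]
    total = sum(gains) or 1.0  # avoid dividing by zero if nothing helps
    scored = [(name, g / total) for name, g in zip(names, gains)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Synthetic data: 200 people, every fourth one a "case" with chr_B shifted upward.
rng = random.Random(1)
names = ["chr_A", "chr_B", "chr_C"]  # hypothetical labels
X, y = [], []
for i in range(200):
    label = 1 if i % 4 == 0 else 0
    row = [rng.gauss(0.0, 1.0) for _ in names]
    if label:
        row[1] += 2.0
    X.append(row)
    y.append(label)

for name, share in rank_features(X, y, names):
    print(f"{name}: {share:.2f}")
```

Scores like these only say which inputs the model leaned on most for this particular dataset. They do not show that a given chromosome causes the disease.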

Because ovarian cancer is so difficult to diagnose early, a more accurate way to estimate risk could make a real difference. Hopefully, data like this will one day help women across the globe assess their probability of developing ovarian cancer and act before the disease takes hold.

Study Information

Original study: Genetic risk score for ovarian cancer based on chromosomal-scale length variation

Study was published on: March 9, 2021

Study author(s): Christopher Toh, James P. Brody

The study was done at: University of California (USA)

The study was funded by: No funding was used. Authors declare that the results from this study used data generated by The Cancer Genome Atlas (TCGA) which was funded by the National Cancer Institute.

Raw data availability: Database used was The Cancer Genome Atlas, at

Featured image credit: Teal Ribbon, by Freepik

This summary was edited by: Mary Sabuda