DNA is a biological molecule made of four nucleotide bases: adenine, guanine, cytosine, and thymine. It is often called the basic building block of life because it encodes our genes. In many organisms, a chemical reaction adds a small molecule called a methyl group to some DNA bases, most often cytosine. This reaction, called DNA methylation, can change how a gene normally behaves.
Earlier research showed that DNA methylation helps the body's systems function through a process called gene regulation. Gene regulation happens when a cell switches a gene on or off in a specific pattern to ensure normal development. This process is why human organs like the heart, brain, and liver behave differently and perform different functions, even though they contain the same set of genes and chromosomes. Sometimes DNA methylation regulates genes the wrong way, which can lead to diseases like cancer. So researchers want to know why and where methylation happens in the genome, and what percentage of DNA is methylated versus unmethylated.
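To make "percentage methylated" concrete, here is a minimal Python sketch. It assumes a hypothetical situation in which sequencing has produced several reads covering the same cytosine, and each read has already been labeled as methylated or not at that spot. The fraction methylated is simply the share of reads carrying the methyl group. The data and the function name `methylation_fraction` are illustrative only and are not part of Methylartist.

```python
# Hypothetical per-read calls at one cytosine (illustrative data only):
# True = this read shows a methyl group at the site, False = unmethylated.
calls = [True, True, False, True, False, True, True, False]


def methylation_fraction(read_calls):
    """Return the fraction of reads that are methylated at one site."""
    if not read_calls:
        # With no reads covering the site, there is nothing to measure.
        raise ValueError("no reads cover this site")
    # True counts as 1 and False as 0, so sum() counts the methylated reads.
    return sum(read_calls) / len(read_calls)


fraction = methylation_fraction(calls)
print(f"{fraction:.1%} methylated")  # prints "62.5% methylated"
```

Here 5 of the 8 reads are methylated, so the site is 62.5% methylated.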
Scientists have used DNA sequencing technology to study changes in genes because it reads genes as strings of the four DNA bases, which are easier to analyze. But as the technology becomes more efficient, the data it produces becomes more detailed and harder to interpret. Researchers need better processing tools to handle all this data.
Researchers at the Mater Research Institute and the University of Queensland recently set out to develop new tools for processing and interpreting sequencing data. The authors noted that a newer sequencing technology, called nanopore sequencing, uses electrical signals and machine learning to detect DNA methylation across long stretches of DNA, such as human DNA. However, analyzing and interpreting nanopore methylation data is challenging.
Their main aim was to create a suite of tools, which they called Methylartist, that processes and visualizes nanopore DNA methylation data. They used the Python programming language and several data analysis and visualization libraries to build 12 tools for exploring the data. Used alone or in combination, these tools can check the sequencing data for errors and flag them, and generate labeled, color-coded graphs showing the types, amounts, and exact locations of DNA methylation. The authors made the Methylartist tools and a set of how-to tutorials freely available to the public on GitHub, an online code-sharing platform.
To confirm that the software could identify DNA methylation changes, the researchers used nanopore technology to sequence DNA from two types of breast cancer cells. First, they used other existing software to convert the nanopore data into a format Methylartist can read. Then they used Methylartist to check the data quality, compare methylation levels in each type of breast cancer cell, and generate labeled plots showing how methylation differed between the two cell types.
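As a rough illustration of what "checking quality and comparing methylation levels" can involve, the Python sketch below compares two samples site by site. It assumes hypothetical counts of methylated reads and total reads at each site, and a hypothetical cutoff (`MIN_READS`) that sets aside sites covered by too few reads to trust. This is a simplified stand-in for the idea, not how Methylartist itself works; the sample names, sites, and numbers are invented for illustration.

```python
# Hypothetical data: for each site, (methylated reads, total reads covering it).
sample_a = {"site1": (18, 20), "site2": (3, 25), "site3": (1, 2)}
sample_b = {"site1": (4, 20), "site2": (2, 22), "site3": (2, 2)}

# Hypothetical quality cutoff: ignore sites with fewer reads than this.
MIN_READS = 5


def compare(a, b, min_reads=MIN_READS):
    """Return {site: fraction in a minus fraction in b} for well-covered sites."""
    diffs = {}
    # Only sites measured in both samples can be compared.
    for site in a.keys() & b.keys():
        meth_a, total_a = a[site]
        meth_b, total_b = b[site]
        if total_a < min_reads or total_b < min_reads:
            # Too few reads: treat the site as low quality and leave it out.
            continue
        diffs[site] = meth_a / total_a - meth_b / total_b
    return diffs


for site, diff in sorted(compare(sample_a, sample_b).items()):
    print(f"{site}: {diff:+.2f}")
```

With these made-up numbers, site1 is far more methylated in the first sample (90% versus 20%), site2 differs only slightly, and site3 is skipped because only two reads cover it.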
The researchers also compared Methylartist with three similar tools and used it to visualize data from other sequencing technologies. They found that their tools performed as well as existing tools, offered more detailed visualizations, and could work either alone or alongside existing tools. They also found that Methylartist supports all detectable types of DNA methylation, so scientists can study every modified base in their data with one set of tools.
The authors concluded that researchers can use the Methylartist suite to analyze and visualize nanopore methylation data, which can help reveal how modified bases contribute to disease. They plan to keep developing Methylartist to improve its accuracy and efficiency as nanopore data evolves. They suggested other researchers could use it to explore methylation in different diseases, like diabetes and heart disease, or in different organisms, like bacteria and viruses.