Read Time: 7 minutes

An interactive virtual assistant for cancer research

Scientists developed an Alexa-based virtual assistant to help researchers and clinicians understand and interpret cancer genome data.



Imagine asking your virtual assistant, “Hey, Google/Alexa, get me the lyrics of Ed Sheeran’s ‘Beautiful People.’” You’ve just interacted with a voice user interface and likely received the information you needed in seconds. Cancer doctors and researchers face the challenge of exploring and interpreting cancer genome data, which is like a massive library holding billions of works in different categories. What if they had a tool like Alexa that could answer questions about their data in seconds?

Traditionally, researchers analyze cancer genome data with computer programming or through interactive websites with point-and-click functionality. Both of these methods are time-consuming and, researchers agree, often require advanced technical knowledge that not all clinicians and researchers have. Scientists in Singapore and the United States collaborated to develop a conversational virtual assistant, which they called Melvin, to navigate the massive library of cancer genomes. Their goal was to make relevant information quickly accessible to all users, regardless of their technical expertise.

The scientists described Melvin as a software tool that allows users to interact with cancer genome data using simple conversations through Amazon Alexa. They incorporated familiar Alexa features such as understanding and speaking everyday English and allowing researchers to initiate a conversation by saying the name “Alexa.” Additionally, the scientists included a knowledge base with genomic data of 33 cancer types from a global cancer database called The Cancer Genome Atlas, gene expression data, and mutations known to make cancer more likely. They also incorporated secondary information like the definitions and locations of human genes, protein information, and records of cancer drug performances from their respective databases to help users interpret their results effectively. 

The scientists curated approximately 24,000 pronunciation samples for cancer genes, cancer types, mutations, genomic data types, and synonyms for all the terms in these categories from 9 cancer experts at the Cancer Science Institute of Singapore. These experts came from Singapore, Indonesia, Sri Lanka, the United States, and India, which added the accent diversity Melvin needed. The scientists noted that the pronunciations did not cover all known cancer genes and features, because doing so would have prolonged data collection.

The scientists explained that voice user interfaces work well when they correctly hear and understand the user, including the context of the conversation. Since cancer terms differ from everyday English vocabulary, they equipped Melvin with a cancer vocabulary through a machine learning process that gives meaning to previously unknown words, a design they called the Out-of-Vocabulary Mapper Service.
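The study summary does not describe the mapper's inner workings at the code level, so the following is only a loose illustration of the general idea, not the authors' implementation: a misheard or unfamiliar term is matched to the closest entry in a known vocabulary. The gene list and function name below are made up for the example, and the fuzzy matching comes from Python's standard library rather than Melvin itself.

```python
# Illustrative sketch only: map an out-of-vocabulary term Alexa heard
# to the closest known gene symbol using simple fuzzy string matching.
# This is NOT Melvin's Out-of-Vocabulary Mapper Service, just an analogy.
from difflib import get_close_matches

# Hypothetical vocabulary of known cancer gene symbols
KNOWN_GENES = ["TP53", "EGFR", "BRCA1", "BRCA2", "KRAS", "PIK3CA"]

def map_out_of_vocabulary(heard_term):
    """Return the closest known gene symbol for a term the assistant heard."""
    matches = get_close_matches(heard_term.upper(), KNOWN_GENES, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(map_out_of_vocabulary("egrf"))  # a garbled term still resolves to "EGFR"
```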

In addition, they developed a web portal where users can submit their own pronunciations for cancer features that Melvin might not recognize at first, so that when Melvin hears those words, it knows what the user means. To address potential security concerns about recordings, the scientists pointed out that users can keep their data from being stored by following Amazon Alexa’s account instructions for deleting recordings. They also discussed opportunities to extend Melvin’s capabilities through crowdsourced pronunciation improvements, which they expect would provide more data to fit regional and national accents for Melvin to understand and speak.

The scientists highlighted Melvin’s ability to work on any device that supports Alexa and to answer users’ questions like “Tell me about gene_name” and “What percentage of lung cancer patients have mutations in that gene?” They reported that Melvin processes these queries and provides responses in audio and visual formats within seconds. 
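To give a feel for what answering such a question involves, here is a minimal sketch of how a mutation-frequency query could be computed from a simplified, made-up mutation table. The patient records, cohort, and function are invented for illustration and do not reflect Melvin's actual code or The Cancer Genome Atlas data.

```python
# Toy sketch (not Melvin's code) of answering:
# "What percentage of lung cancer patients have mutations in TP53?"

# Hypothetical records: (patient_id, cancer_type, mutated_gene)
mutations = [
    ("P1", "lung adenocarcinoma", "TP53"),
    ("P1", "lung adenocarcinoma", "KRAS"),
    ("P2", "lung adenocarcinoma", "EGFR"),
    ("P3", "lung adenocarcinoma", "TP53"),
]
lung_patients = {"P1", "P2", "P3", "P4"}  # full cohort, including unmutated patients

def mutation_frequency(gene, cohort, records):
    """Fraction of patients in the cohort with at least one mutation in `gene`."""
    carriers = {pid for pid, _, g in records if g == gene and pid in cohort}
    return len(carriers) / len(cohort)

pct = 100 * mutation_frequency("TP53", lung_patients, mutations)
print(f"{pct:.0f}% of lung cancer patients in this toy cohort carry a TP53 mutation.")
```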

They also reported that Melvin allows follow-up questions that build on the previous conversation. They explained how difficult it is to get valuable information from a single question and emphasized Melvin’s ability to retain context through incremental questioning. The scientists affirmed that this design makes it easy for users to explore multiple related inquiries in a single conversation. They also demonstrated that Melvin performs advanced analytical tasks, such as comparing mutations in specific genes across different cancer types or analyzing how gene expression varies.
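The summary does not say how Melvin stores conversation state, so the sketch below only illustrates the general idea of context retention: a session remembers the gene and cancer type from earlier turns, so a follow-up that omits one of them still makes sense. The class and slot names are assumptions made up for this example.

```python
# Illustrative sketch of conversational context retention (not Melvin's design).
class Session:
    def __init__(self):
        # Remember the most recent gene and cancer type mentioned by the user
        self.context = {"gene": None, "cancer_type": None}

    def ask(self, gene=None, cancer_type=None):
        # Update only the slots the user filled in this turn;
        # unfilled slots fall back to what was said earlier.
        if gene:
            self.context["gene"] = gene
        if cancer_type:
            self.context["cancer_type"] = cancer_type
        return f"Looking up {self.context['gene']} in {self.context['cancer_type']}..."

s = Session()
print(s.ask(gene="TP53", cancer_type="lung cancer"))  # full question
print(s.ask(cancer_type="breast cancer"))             # follow-up reuses TP53
```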

The scientists concluded that Melvin will promote scientific discovery in cancer research and help convert research findings into solutions clinicians can apply to their patients. They acknowledged that while Melvin’s framework is currently centered around cancer genes, it could be extended to support more attributes of cancer. The team plans to enhance Melvin by adding more high-value datasets and functionalities based on user feedback.

Study Information

Original study: Melvin is a conversational voice interface for cancer genomics data

Study was published on: January 5, 2024

Study author(s): Akila R. Perera, Vinay Warrier, Shwetha Sundararaman, Yi Hsiao, Soumita Ghosh, Linganesan Kularatnarajah, Jason J. Pitt

The study was done at: University of Michigan (USA), National University of Singapore (Singapore), Agency for Science, Technology and Research (A*STAR) (Singapore)

The study was funded by: National Research Foundation Singapore, The Singapore Ministry of Education, National Supercomputing Center, Singapore

Raw data availability: From The Cancer Genome Atlas and other publicly available sources

Featured image credit: Designed by the author using Canva

This summary was edited by: Ben Pauley