How COVID-19 Will Drive Adoption of Natural Language Processing
Harry Potter and his wizard friends often hit the books to solve a mystery. They spent hours searching library stacks trying to learn about Nicolas Flamel, eventually discovering that he created the Elixir of Life.
Outside the wizarding world, we don’t have enough time to read everything. This is one reason COVID-19 remains a mystery to doctors, researchers and political leaders. They just don’t have time to connect the dots. But that is going to change.
On March 16, the White House asked experts across the public and private sector for help understanding 29,000 articles about COVID-19, SARS-CoV-2, and the Coronavirus group. Organizations including Microsoft, the Chan Zuckerberg Initiative and Georgetown helped collect the articles, which swelled to 44,000 in two weeks. Google Cloud’s Kaggle platform hosts both the articles and analytics tools that experts worldwide contribute.
Healthcare, life sciences and academic organizations will rely on natural language processing (NLP) and machine learning (ML) to tackle this challenge. NLP is the subset of AI that summarizes, translates or responds to speech or text. NLP often relies on ML to improve its accuracy by learning from word patterns. Together these technologies will help explore and analyze the fast-growing body of research on COVID-19.
Many industries benefit from NLP because it saves experts the trouble of reading everything. Lawyers use NLP to rapidly identify relevant legal precedents or regulatory requirements. Engineers use NLP to automatically scan myriad articles, then drill into the most relevant passages for their research effort. Chemists use it to quickly collate a wide range of peer findings before drawing their own conclusions.
NLP similarly speeds and enriches medical research of all types. Doctors, academics and scientists navigate more documents, faster, than they would otherwise. They discover more facts and correlate more data points per hour, freeing up time to apply their irreplaceable domain knowledge to the most actionable material.
OpenText Magellan, for example, helps navigate and process technical medical text. It exposes key entities and topics, such as “antibiotics” or “pneumonia,” and converts them to search filters. Analysts can quickly sort numerous documents by entity, topic or type. They can get a quantified summary of references to entities such as drugs, symptoms, organizations or locations – perhaps 3 references to China, 17 to the United States, etc. They can peruse short auto-written summaries of documents turned up by a given search.
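To make the idea of a quantified entity summary concrete, here is a minimal Python sketch. It is not how Magellan works internally; it assumes a tiny hand-written taxonomy (the `TAXONOMY` dictionary and the `count_entities` function are hypothetical names invented for this illustration) and simply counts whole-phrase mentions across a few sample sentences.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy for illustration: entity type -> terms to look for.
TAXONOMY = {
    "location": ["China", "United States"],
    "drug": ["antibiotics"],
    "condition": ["pneumonia"],
}

def count_entities(documents, taxonomy=TAXONOMY):
    """Count case-insensitive, whole-phrase mentions of each taxonomy term."""
    counts = Counter()
    for entity_type, terms in taxonomy.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for text in documents:
                counts[(entity_type, term)] += len(pattern.findall(text))
    return counts

# Made-up sample sentences, not real article text.
docs = [
    "Patients in China with pneumonia received antibiotics.",
    "The United States reported pneumonia cases; antibiotics were not effective.",
    "Researchers in the United States and China shared data.",
]

summary = count_entities(docs)
for (entity_type, term), n in sorted(summary.items()):
    print(f"{entity_type:10} {term:15} {n}")
```

A production system would likely recognize entities with trained models rather than exact phrase matching, but the output has the same shape: a count per entity that can be turned into a search filter.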
How do Magellan and other NLP-enabled products penetrate the jargon? They provide preconfigured, industry-specific taxonomies that users customize both manually and automatically via ML. While nothing replaces human expertise, NLP gives the experts a head start and a map of the territory.
Such features will help COVID-19 experts quickly identify nuggets floating in that ocean of text. The result should be faster research in the areas of diagnosis, vaccine development, clinical trials and assessments of population health risks. Here are a few examples.
- Cast a wide net. Researchers will broaden their source data by using ML-enabled NLP to translate terms across disciplines or time. For example, when they search for “H1N1,” NLP will turn up documents about the “Spanish Influenza” of 1918 once it learns that the 1918 pandemic was caused by an H1N1 virus.
- Speed things up. Academics, biotech firms and pharmaceutical firms have developed more than 40 potential vaccines for COVID-19 so far, and Johnson & Johnson aims to start human trials in September. NLP can help identify risky side effects of a vaccine candidate faster by improving the breadth and accuracy of searches of clinical trial records for vaccine components such as proteins.
- Make new correlations. ML-enabled NLP creates lucky surprises by connecting dots across disciplines that normally don’t overlap. Researchers can easily search new types of documents for certain symptoms or treatment outcomes, confident that key terms are being translated correctly along the way. Chris Wynder, who holds a Ph.D. in molecular genetics and is currently Product Marketing Director at OpenText, likens NLP to a super version of the lead character in the hit TV show “House,” who solves medical mysteries by drawing on his encyclopedic knowledge of seemingly unrelated facts.
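The term translation described in these examples can be sketched as simple query expansion. The Python below is an illustrative assumption, not any product's method: the `SYNONYMS` groups are written by hand here, whereas an ML-enabled system would learn such links from the text, and `expand_query` and `search` are hypothetical helper names.

```python
# Hypothetical synonym groups for illustration. A real system would learn
# these relationships with ML rather than rely on a hand-written list.
SYNONYMS = [
    {"h1n1", "spanish influenza", "1918 flu"},
]

def expand_query(term, synonym_groups=SYNONYMS):
    """Return the search term plus every related term from its group."""
    term = term.lower()
    expanded = {term}
    for group in synonym_groups:
        if term in group:
            expanded |= group
    return expanded

def search(term, documents, synonym_groups=SYNONYMS):
    """Return documents that mention the term or any related term."""
    terms = expand_query(term, synonym_groups)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]

# Made-up sample sentences, not real article text.
docs = [
    "The Spanish Influenza of 1918 spread in waves.",
    "H1N1 re-emerged in 2009.",
    "Pneumonia is a common complication.",
]

hits = search("H1N1", docs)
for doc in hits:
    print(doc)
```

A search for “H1N1” now also returns the document that mentions only the Spanish Influenza, which is the wide-net behavior described above.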
These NLP capabilities also help govern data. Analysts can organize documents with common formats and metadata. Compliance personnel can use NLP to identify and filter Personally Identifiable Information (PII) for auditing purposes. Data stewards can use NLP-generated metadata and terminology to help build data catalogs and business glossaries.
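As a rough illustration of PII filtering, the Python sketch below flags and masks two simple kinds of PII using regular expressions. The patterns, the `PII_PATTERNS` table and the `redact_pii` function are assumptions made for this example; real compliance tools typically combine such rules with trained models to catch names, addresses and other less regular identifiers.

```python
import re

# Deliberately simple, hypothetical patterns: email addresses and
# phone numbers written as 3-3-4 digits separated by dashes or dots.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_pii(text):
    """Mask matches of each pattern and return (redacted_text, findings)."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        found.extend((label, match) for match in pattern.findall(text))
        text = pattern.sub(f"[{label}]", text)
    return text, found

# Made-up sample text with fictional contact details.
redacted, found = redact_pii(
    "Contact jane.doe@example.org or 555-123-4567 about the trial."
)
print(redacted)
print(found)
```

The list of findings can feed an audit log, while the redacted text can be shared more widely.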
COVID-19 forces all types of health-related organizations to open up and share their data. As they do so, they will increase usage of NLP to navigate different formats, languages, terminologies, and biases, and other industries will follow. Doctor House is on the way.