It would be an exaggeration to say you’ll never again read a news article overhyping a medical breakthrough. But, thanks to researchers at the University of Copenhagen, spotting hyperbole may one day get more manageable.
In a new paper, Dustin Wright and Isabelle Augenstein explain how they used NVIDIA GPUs to train an “exaggeration detection system” to identify overenthusiastic claims in health science reporting.
The paper comes amid a pandemic that has fueled demand for understandable, accurate information. And social media has made health misinformation more widespread.
Research like Wright and Augenstein’s could speed more precise health sciences news to more people.
A ‘Sobering Realization’
“Part of the reason why things in popular journalism tend to get sensationalized is some of the journalists don’t read the papers they’re writing about,” Wright says. “It’s a bit of a sobering realization.”
It’s hard to blame them. Many journalists need to summarize a lot of information fast and often don’t have the time to dig deeper.
That task falls to the press offices of universities and research institutions. They employ writers to produce press releases, the short, news-style summaries that news outlets rely on.
That makes the problem of detecting exaggeration in health sciences press releases a good “few-shot learning” use case.
Few-shot learning techniques can train AI in areas where data isn’t plentiful — there are only a few items to learn from.
It’s not the first time researchers have put natural language techniques to work detecting hype. Wright points to the earlier work of colleagues in scientific exaggeration detection and misinformation.
Wright and Augenstein’s contribution is to reframe the problem and apply a novel, multitask-capable version of a technique called Pattern Exploiting Training, which they dubbed MT-PET.
The co-authors started by curating a collection that included both the releases and the papers they were summarizing.
Each pair, or “tuple,” has annotations from experts comparing claims made in the papers with those in corresponding press releases.
These 563 tuples gave them a strong base of training data.
They then broke the problem of detecting exaggeration into two related issues.
The first is gauging the strength of the claims made in press releases and in the scientific papers they summarize. The second is identifying the level of exaggeration.
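One way to picture how the two issues relate is the short sketch below. It is an illustration only, not the researchers' trained model: it assumes a hypothetical four-level strength scale and a simple comparison rule, and the names Strength and exaggeration_level are made up for this example.

```python
from enum import IntEnum


class Strength(IntEnum):
    """Hypothetical ordinal scale of claim strength (an assumption for illustration)."""
    NO_RELATIONSHIP = 0
    CORRELATIONAL = 1
    CONDITIONAL_CAUSAL = 2
    CAUSAL = 3


def exaggeration_level(paper: Strength, press_release: Strength) -> str:
    """Compare the strength of a press release claim with the paper's claim."""
    if press_release > paper:
        return "exaggerates"
    if press_release < paper:
        return "downplays"
    return "same"


# A paper reports a correlation, but the press release states a causal link.
print(exaggeration_level(Strength.CORRELATIONAL, Strength.CAUSAL))
```

In this toy version, the strength labels are given by hand. The point of the research is to have a model predict them from the text itself.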
They then ran this data through a novel kind of PET model, which learns much the way some second-grade students learn reading comprehension.
The training procedure relies on cloze-style phrases — phrases that mask a keyword an AI needs to fill — to ensure it understands a task.
For example, a teacher might ask a student to fill in the blanks in a sentence such as “I ride a big ____ bus to school.”
If they answer “yellow,” the teacher knows they understand what they see. If not, the teacher knows the student needs more help.
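The same fill-in-the-blank idea can be sketched in a few lines of Python. This is a minimal illustration in the spirit of PET, not the authors' code: the pattern wording, the label names, the word chosen for each label and the scores are all assumptions, and a real system would get the scores from a language model rather than a hand-written dictionary.

```python
from typing import Dict

MASK = "[MASK]"  # placeholder for the word the model must fill in

# Hypothetical mapping from each label to one word that could fill the blank.
VERBALIZER: Dict[str, str] = {
    "exaggerates": "stronger",
    "same": "identical",
    "downplays": "weaker",
}


def build_cloze(paper_claim: str, press_claim: str) -> str:
    """Wrap two claims in a fill-in-the-blank sentence."""
    return (
        f'Paper: "{paper_claim}" '
        f'Press release: "{press_claim}" '
        f"Compared with the paper, the press release claim is {MASK}."
    )


def pick_label(mask_word_scores: Dict[str, float]) -> str:
    """Return the label whose word scores highest at the blank."""
    return max(
        VERBALIZER,
        key=lambda label: mask_word_scores.get(VERBALIZER[label], float("-inf")),
    )


prompt = build_cloze("X is associated with Y.", "X causes Y.")
# Made-up scores standing in for a language model's output at the blank.
scores = {"stronger": 0.7, "identical": 0.2, "weaker": 0.1}
print(prompt)
print(pick_label(scores))
```

Framing the task this way lets a pretrained language model reuse what it already knows about words like "stronger" and "weaker", which helps when only a few labeled examples exist.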
Wright and Augenstein expanded on the idea to train a PET model both to detect the strength of claims made in press releases and to assess whether a press release overstates a paper’s claims.
The researchers trained their models on a shared computing cluster, using four Intel Xeon CPUs and a single NVIDIA TITAN X GPU.
With that setup, Wright and Augenstein were able to show that MT-PET outperforms both PET and supervised learning.
Such technology could allow researchers to spot exaggeration in fields where few experts are available to label training data.
AI-enabled grammar checkers can already help writers polish the quality of their prose.
One day, similar tools could help journalists summarize new findings more accurately, Wright says.
To be sure, putting this research to work would need investment in production, marketing and usability, Wright says.
Wright’s also realistic about the human factors that can lead to exaggeration.
Press releases convey information. But they also need to be bold enough to generate interest from reporters. Not always easy.
“Whenever I tweet about stuff, I think, ‘how can I get this tweet out without exaggeration,’” Wright says. “It’s hard.”
You can catch Dustin Wright and Isabelle Augenstein on Twitter at @dustin_wright37 and @IAugenstein. Read their full paper, “Semi-Supervised Exaggeration Detection of Health Science Press Releases,” here: https://arxiv.org/pdf/2108.13493.pdf.
Featured image credit: Vintage postcard, copyright expired.