Bada Bing Bada Boom: Microsoft Turns to Turing-NLG, NVIDIA GPUs to Instantly Suggest Full-Phrase Queries

by Murat Guney

Hate hunting and pecking away at your keyboard every time you have a quick question? You’ll love this.

Microsoft’s Bing search engine has turned to Turing-NLG and NVIDIA GPUs to suggest full sentences for you as you type.

Turing-NLG is a cutting-edge, large-scale unsupervised language model that has achieved strong performance on language modeling benchmarks.

It’s just the latest example of an AI technique called unsupervised learning, which makes sense of vast quantities of data by extracting features and patterns without the need for humans to provide any pre-labeled data.

Microsoft calls this Next Phrase Prediction, and it can feel like magic, making full-phrase suggestions in real time for long search queries.

Turing-NLG is among several innovations — from model compression to state caching and hardware acceleration — that Bing has harnessed with Next Phrase Prediction.
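
Microsoft has not detailed how its state caching works, but the general idea can be sketched in a few lines. The Python below is a toy illustration under stated assumptions, not Bing's implementation: `PrefixStateCache` and `encode_token` are hypothetical names, and `encode_token` stands in for one expensive model step. The cache stores the state reached after each prefix it has seen, so a query that grows by one word pays only for that word.

```python
from typing import Dict, Tuple


class PrefixStateCache:
    """Toy cache that reuses per-prefix state as a query grows."""

    def __init__(self) -> None:
        # The empty prefix maps to an initial state, so a lookup always succeeds.
        self._states: Dict[Tuple[str, ...], int] = {(): 0}
        self.steps_computed = 0  # counts the expensive steps actually run

    def encode_token(self, state: int, token: str) -> int:
        # Hypothetical stand-in for one expensive model step (assumption).
        self.steps_computed += 1
        return (state * 31 + sum(map(ord, token))) % 1_000_003

    def state_for(self, tokens: Tuple[str, ...]) -> int:
        # Walk back to the longest prefix whose state is already cached.
        k = len(tokens)
        while tokens[:k] not in self._states:
            k -= 1
        state = self._states[tokens[:k]]
        # Compute and cache only the tokens that come after that prefix.
        for i in range(k, len(tokens)):
            state = self.encode_token(state, tokens[i])
            self._states[tokens[: i + 1]] = state
        return state


cache = PrefixStateCache()
cache.state_for(("how", "can", "i"))
steps_after_first_query = cache.steps_computed
cache.state_for(("how", "can", "i", "replace"))
steps_after_second_query = cache.steps_computed
```

In this sketch the second lookup runs a single extra step rather than four, which is the kind of saving that matters when a suggestion has to be refreshed on every keystroke.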

Over the summer, Microsoft worked with engineers at NVIDIA to optimize Turing-NLG for its needs, accelerating the model on NVIDIA GPUs to power the feature for users worldwide.

A key part of this optimization was getting this massive AI model to run fast enough to power a real-time search experience. With a combination of hardware and model optimization, Microsoft and NVIDIA achieved an average latency below 10 milliseconds.

By contrast, it takes more than 100 milliseconds to blink your eye.

Learn more about the next wave of AI innovations at Bing.

Before Next Phrase Prediction, suggestions for longer queries were limited to completing the word the user was currently typing.
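
To make that earlier behavior concrete, here is a minimal sketch of current-word completion. It is an illustration only, not Bing's code: the function name `complete_current_word` and the tiny vocabulary are invented for the example. The function looks at the last, partially typed word and offers vocabulary entries that start with it.

```python
from typing import List


def complete_current_word(query: str, vocabulary: List[str], limit: int = 3) -> List[str]:
    """Suggest completions for the last, partially typed word only."""
    # Split the query into everything before the last space and the partial word.
    head, _, partial = query.rpartition(" ")
    if not partial:
        # The query ends with a space: there is no current word to complete.
        return []
    partial = partial.lower()
    matches = [w for w in vocabulary if w.startswith(partial) and w != partial]
    prefix = head + " " if head else ""
    return [prefix + w for w in matches[:limit]]


# Hypothetical vocabulary, for illustration only.
vocabulary = ["replace", "replay", "battery", "reply"]
suggestions = complete_current_word("how can i repl", vocabulary)
after_space = complete_current_word("how can i replace ", vocabulary)
```

Once the user finishes a word and hits the space bar, a completer like this has nothing left to offer, which is the gap that full-phrase suggestions fill.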

Now type in “The best way to replace,” and you’ll immediately see three suggestions for completing the phrase: wood, plastic and metal. Type in “how can I replace a battery for,” and Bing suggests iphone, samsung, ipad and kindle.

With Next Phrase Prediction, Bing can now present users with full-phrase suggestions.

The more characters you type, the closer Bing gets to what you probably want to ask.

And because these suggestions are generated on the fly rather than looked up, they’re not limited to previously seen data or to the current word being typed.

So, for some queries, Bing will save you not just a few keystrokes but multiple words.

As a result of this work, the coverage of autosuggestion completions increases considerably, Microsoft reports, improving the overall user experience “significantly.”