The next time a virtual assistant seems particularly thoughtful when rescheduling your appointment, you could thank it. Who knows, maybe it was built to learn from compliments. But you might actually have Gabor Angeli to thank.
The engineering manager and members of his team at Square Inc. published a paper on techniques for creating AI assistants that are sympathetic listeners. It described AI models that approach human performance at skills like reflective listening, rephrasing someone's request so they feel heard.
These days his team is hard at work expanding Square Assistant from a virtual scheduler to a conversational AI engine driving all the company’s products.
“There is a huge surface area of conversations between buyers and sellers that we can and should help people navigate,” said Angeli, who will describe the work in a session available now with a free registration to GTC Digital.
Square, best known for its stylish payment terminals, offers small businesses a wide range of services from handling payroll to creating loyalty programs.
Hearing the Buzz on Conversational AI
A UC Berkeley professor’s intro to AI course lit a lasting fire in Angeli for natural-language processing more than a decade ago. He researched the emerging field in the university’s AI lab and eventually co-founded Eloquent, an NLP startup acquired by Square last May.
Six months later, Square Assistant was born as a virtual scheduler.
“We wanted to get something good but narrowly focused in front of customers quickly,” Angeli said. “We’re adding advanced features to Square Assistant now, and our aim is to get it into nearly everything we offer.”
Results so far are promising. Square Assistant can understand and provide help for 75 percent of customers' questions, and it's reducing appointment no-shows by 10 percent.
But to make NLP the talk of the town, the team faces knotty linguistic and technical challenges. For example, is “next Saturday” this coming one or the one after it?
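The "next Saturday" ambiguity is easy to make concrete. Here's a minimal sketch, using only Python's standard library and not Square's actual resolution logic, that computes the date under each of the two common readings:

```python
from datetime import date, timedelta

SATURDAY = 5  # in Python's datetime convention, Monday is 0, so Saturday is 5

def upcoming_saturday(today: date) -> date:
    """The soonest Saturday strictly after today (the "this coming one" reading)."""
    days_ahead = (SATURDAY - today.weekday()) % 7
    return today + timedelta(days=days_ahead or 7)

def saturday_of_next_week(today: date) -> date:
    """The Saturday a full week later (the "the one after it" reading)."""
    return upcoming_saturday(today) + timedelta(days=7)

# On Wednesday, March 25, 2020, the two readings land a week apart:
today = date(2020, 3, 25)
print(upcoming_saturday(today))      # 2020-03-28
print(saturday_of_next_week(today))  # 2020-04-04
```

A scheduling assistant can't pick between the two readings from the words alone; it has to fall back on convention, context, or a clarifying question, which is exactly the kind of navigation Angeli describes.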
What’s more, there’s a long tail of less common customer queries. As the job description of Square Assistant expands from dozens to thousands of tasks, its neural network models grow and require more training.
“It’s exciting to see BERT [Bidirectional Encoder Representations from Transformers] do things we didn’t think were possible, like showing AI for reading comprehension. It amazes me this is possible, but these are much larger models that present challenges in the time it takes to train and deploy them,” he said.
GPUs Speed Up Inference, Training
Angeli’s team started training AI models at Eloquent on single NVIDIA GPUs running CUDA in desktop PCs. At Square, it uses desktops with dual GPUs, supplemented by large hyperparameter jobs run on GPUs in the AWS cloud.
In its tests, Square found inference jobs on average-size models run twice as fast on GPUs as on CPUs. Inference on large models such as RoBERTa runs 10x faster on the AWS GPU service than on CPUs.
The difference for training jobs is “even more stark,” he reported. “It’s hard to train a modern machine-learning model without a GPU. If we had to run deep learning on CPUs, we’d be a decade behind,” he added.
Faster training also helps motivate AI developers to iterate designs more often, resulting in better models, he said.
His team uses a mix of small, medium and large NLP models, applying pre-training tricks that proved their worth with computer vision apps. Long term, he believes engineers will find general models that work well across a broad range of tasks.
In the meantime, conversational AI is a three-legged race with developers like Angeli’s team crafting more efficient models as GPU architects design beefier chips.
“Half the work is in algorithm design, and half is in NVIDIA making hardware that’s more optimized for machine learning and runs bigger models,” he said.