Crossing the Chasm of AI Interpretability

by Andy Steinbach

The Gold Rush to monetize AI is on. Organizations across the globe — whether they’re seeking the profit that comes from differentiated products or creating value for the public good — are scrambling to take part.

The explosion of machine learning model complexity over the last five years has been driven by the ready availability of big data and parallel NVIDIA GPU-compute power. Together, these are the fuel for new generations of complex models.

H2O.ai has partnered with NVIDIA to tap into the power of GPU computing to launch its latest solution, called Driverless AI. It uses GPUs to turbocharge automated feature engineering and the building of thousands of models to optimize hyperparameters rapidly.

One key question that is often asked in the midst of this AI revolution is: Can I understand “why”? That is, why does my trained model output the answer it produces? What reasoning is it using? What input variables did it use to reach an answer or prediction?

Given that AI algorithms are expected to drive vehicles, make healthcare decisions, and control financial processes, these questions of model explainability must be addressed. We need to have confidence in the root logic behind these critical decisions, and regulated industries such as banking and insurance require this for their model approval process.

It’s sometimes said that advanced machine learning models are “black boxes” that are hard to understand and, therefore, hard to trust. This viewpoint won’t prevail for long, as powerful new tools are being deployed for explaining and understanding why complex AI models produce the decisions they do.

One such AI interpretability toolkit is the MLI (Machine Learning Interpretability) module within H2O.ai’s Driverless AI solution. Driverless AI accelerates the process of building advanced models through automated feature engineering, model stacking and the use of powerful algorithms such as gradient-boosted trees. Such complex models increase the demand for powerful explainability tools, and H2O.ai delivers the most sophisticated integrated package for this purpose that I’ve seen.

The MLI module brings together an impressive array of techniques for both so-called global and local model explainability. For example, MLI allows the examination of surrogate models, such as linear regression or decision tree proxies, letting data scientists easily explore and understand deviations from monotonic behavior in more complex candidate models. Users can automatically generate reason codes, partial dependence plots and variable importance reports. And it’s easy to drill down into outliers to discover possible valid reasons for deviations. The toolkit is built around a powerful set of data visualization capabilities that allow for sophisticated, fast analysis and real-time exploration of complex exceptions or behavior.
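To make the surrogate idea concrete, here is a minimal sketch using scikit-learn rather than the MLI APIs themselves (the dataset, model choices and parameters are illustrative assumptions): it trains a gradient-boosted model, fits a shallow decision tree to that model’s predictions as a global surrogate, then computes permutation-based variable importance and a partial dependence curve.

```python
# A minimal sketch of a global surrogate model, shown with scikit-learn
# for illustration (not the Driverless AI MLI API).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_regression(n_samples=2000, n_features=8, noise=0.1, random_state=0)

# 1. Train the complex "black box" model.
gbm = GradientBoostingRegressor(random_state=0).fit(X, y)

# 2. Fit a shallow decision tree to the complex model's predictions,
#    so the tree approximates the GBM rather than the raw labels.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, gbm.predict(X))

# 3. The surrogate's structure is small enough to read directly.
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(8)]))

# 4. Permutation-based variable importance for the original model.
imp = permutation_importance(gbm, X, y, n_repeats=5, random_state=0)
for i in imp.importances_mean.argsort()[::-1]:
    print(f"x{i}: {imp.importances_mean[i]:.3f}")

# 5. Partial dependence of the prediction on the first feature.
pdp = partial_dependence(gbm, X, features=[0])
print(pdp["average"].shape)  # averaged response over a grid of x0 values
```

Because the surrogate tree is trained on the complex model’s outputs rather than the original labels, its splits describe how the black box behaves, not just how the data happens to be distributed.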

Local Interpretable Model-agnostic Explanations (LIME) is another powerful explainability technique integrated into the MLI toolkit. It approximates a complex model around any point in its input space using a simpler linear model, a high-dimensional analog of a Taylor series expansion about a fixed example or instance of interest. Local behavior can then be understood through a linear, monotonically varying local model.
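The same idea can be sketched in a few lines of NumPy and scikit-learn (an illustrative stand-in, not the MLI or reference lime package API): perturb the instance of interest, weight the perturbations by their proximity to it, and fit a weighted linear model whose coefficients act as the local explanation.

```python
# A minimal LIME-style local explanation sketch (illustrative assumptions:
# Gaussian perturbations, an RBF proximity kernel and a ridge local model).
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(predict_fn, x, n_samples=5000, scale=0.5, kernel_width=1.0):
    rng = np.random.default_rng(0)
    # Perturb the instance of interest with Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    # Query the black-box model on the perturbed points.
    yz = predict_fn(Z)
    # Weight each perturbation by its closeness to x.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Fit a simple linear model locally; its coefficients are the explanation.
    local_model = Ridge(alpha=1.0).fit(Z, yz, sample_weight=weights)
    return local_model.coef_

# Example usage with the gradient-boosted model from the surrogate sketch:
# coefs = explain_locally(gbm.predict, X[0])
# print(coefs)  # per-feature local slopes around X[0]
```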

Unsupervised data mining techniques such as clustering and autoencoders can uncover hidden latent variables that often elucidate the fundamental relationships in a complex system. In this way, complicated models can actually reveal simpler, underlying low-dimensional behavior that would be impossible to uncover with basic linear techniques.
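As a rough illustration of that idea, the sketch below (again using scikit-learn as an assumed stand-in for whatever tooling is at hand) clusters a dataset with k-means and trains a tiny autoencoder whose two-unit bottleneck yields a low-dimensional latent representation of each sample.

```python
# A minimal sketch of unsupervised structure discovery: k-means clustering
# plus a small autoencoder built from scikit-learn's MLPRegressor
# (an illustrative stand-in; a deep learning framework would also work).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=1000, n_features=10, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

# Clustering: group similar rows to expose coarse latent structure.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # cluster sizes

# Autoencoder: reconstruct the input through a 2-unit bottleneck, so the
# hidden layer learns a compressed, low-dimensional representation.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                  max_iter=2000, random_state=0).fit(X, X)

# Recover the 2-D latent codes by applying the encoder half manually.
W, b = ae.coefs_[0], ae.intercepts_[0]
latent = np.tanh(X @ W + b)
print(latent.shape)  # (1000, 2) latent coordinates for each sample
```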

To see how the challenge of model explainability is being solved with state-of-the-art tools, register for the H2O.ai webinar on August 17, which will take a deep dive into the MLI toolkit.