Long Pedal to a Kaggle Medal: How a Thousand-Kilometer Bike Trip Ended with a Win for Rainforests

by Jamie Beckett

Sometimes, you need to see the trees to see the forest.

Shubin Dai celebrated a recent Christmas with a 1,000-kilometer mountain biking trip that gave him a close-up view of China’s largest rainforest.

This year, the Changsha-based data scientist is celebrating with a $30,000 check, won in a Kaggle data science competition using satellite imagery to protect an even larger rainforest, the Amazon. (Update: Dai recently reached the top rank among 83,500 Kaggle participants around the world, the data science site announced in a May 2018 blog post.)

The challenge was to track the human footprint in the Amazon rainforest by distinguishing human causes of forest loss from natural ones. That could make rainforest protection more effective and more timely, according to the competition website.

Dai faced almost a thousand competitors, but he had an edge others lacked: his on-the-ground experience in the rainforest, and our GPUs. The satellite images were tagged to indicate whether they pictured forest cleared for farming, illegal logging, or some other activity.

Because he’d seen deforestation up close, Dai could look at satellite data and picture exactly what was happening on the ground. He used that insight to design a GPU-accelerated deep learning model to label images automatically.

Shubin Dai's deep learning system could help reduce forest loss. His trip to China's largest rainforest gave him the insight to make it happen.
Shubin Dai crosses a stream during his bike trip to the Yanoda rainforest, China’s largest. (Photo courtesy of Dai.)

The Cost of Forest Loss

The Amazon rainforest, the world’s largest, plays an invaluable role in sustaining life — stabilizing climate, producing oxygen and water, and housing some 30 percent of the world’s species.

But it’s steadily being destroyed. Over the past 40 years the Brazilian Amazon alone has lost about a fifth of its forest — an area larger than California and New Mexico combined — to climate change, cattle grazing, industrial agriculture and logging, according to the respected Mongabay environmental science website.

In the Kaggle competition, Planet, which designs and builds satellites to collect images of earth, and its Brazilian counterpart, SCCON, challenged participants to label atmospheric conditions and land use in high-resolution satellite images.

The idea of scanning satellite images to monitor rainforest conditions isn’t new. But according to Planet, most methods rely on lower-resolution images, which limits their effectiveness for observing forest degradation and small-scale deforestation, and for distinguishing human from natural causes of forest loss.

The Amazon is home to more species of plants and animals — including this tree frog — than any other terrestrial ecosystem. (Photo by Benedict Adam, under a Creative Commons license.)

A Man and His Bike

At the close of the competition in December, Dai — better known as “bestfitting” on Kaggle and “Bingo” by his friends — was fourth among all 66,453 competitors on the Kaggle platform. Later, after placing in the top one percent in 14 straight Kaggle contests, he rose to first on the data science platform, which in May 2018 boasted more than 83,500 Kagglers.

In his day job, Dai manages a team of 200 computer scientists at a company he founded that provides consulting and software development to Chinese banks. On weekends, he’s on his mountain bike, logging long miles across China’s grasslands, mountains and forests.

But the rainforest ride on Hainan Island off the coast of southern China was the one that changed everything.

“My first impression was of how beautiful it was. But that changed to sadness when I saw how much of the forest had been damaged,” he said. “Even in protected zones, it was not easy to find a very big tree.”

That lit a fire under Dai to learn more about rainforest destruction. When the Kaggle competition came up, he was quick to enter it, even though he was already enmeshed in another tough Kaggle contest.

“When I was facing difficulties during the Kaggle competition, I remembered moments in the rainforest. We’d often start at 6 a.m. and finish at 11 p.m. or even 1 a.m. I was very tired,” Dai said. “But thanks to all that, I learned to keep calm when facing a challenge, to keep moving when it seems hopeless and to be grateful all the time.”

A section of the Amazon rainforest that has been cleared for farming. (Image by Matt Zimmerman under a Creative Commons license.)

GPUs 24/7

Dai trained his neural network on half of the dataset’s 40,000 images, using the CUDA parallel computing platform and GeForce GTX TITAN X GPUs with the cuDNN-accelerated PyTorch deep learning framework.

His task was to train the network to add one or more of 17 labels to each image to indicate whether it contains signs of forest perils or what atmospheric conditions it depicts. He discusses his solution in depth on the competition site, or you can read his interview on the Kaggle blog.
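
Assigning "one or more" labels to each image is usually framed as multi-label classification: the network emits one score per label, and each score is judged on its own rather than competing with the others. The sketch below illustrates only that final decision step, in plain Python. It is not Dai's solution. The label names, the 0.5 threshold and the example scores are all hypothetical, chosen for illustration, and only five labels are shown instead of the competition's 17.

```python
import math

# Hypothetical label names for illustration only; the article does not list
# the competition's 17 labels.
LABELS = ["clear", "cloudy", "primary_forest", "agriculture", "road"]


def sigmoid(x):
    """Map a raw score (logit) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold.

    Each label is decided separately, so an image can receive several
    labels at once, or none at all.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up raw scores standing in for a network's output on one image.
example_logits = [2.0, -3.0, 1.5, 0.2, -1.0]
print(predict_labels(example_logits))
```

Raising the threshold makes the labeler more conservative. In practice the cutoff is often tuned per label on held-out data rather than fixed at 0.5.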

Dai also used the TITAN X GPUs for inference, the process of running his trained model on new images to produce labels.

“I kept my GPUs running almost 24 hours every day, and their stability was beyond my expectations,” he said.

He has since purchased additional NVIDIA GPUs, which he plans to apply to more deep learning projects to benefit the environment and advance medicine.

To learn about another deep learning approach to protecting the rainforest, see “Tree’s Company: AI Maps Biological Riches of the Rainforest.”

* Main image for this story, a scene from the Amazon, is courtesy of Neil Palmer, CIAT, via Wikimedia Commons.