Long Pedal to a Kaggle Medal: How a Thousand-Kilometer Bike Trip Ended with a Win for Rainforests

by Jamie Beckett

Sometimes, you need to see the trees to see the forest.

Shubin Dai celebrated a recent Christmas with a 1,000-kilometer mountain biking trip that gave him a close-up view of China’s largest rainforest.

This year, the Changsha-based data scientist is celebrating with a $30,000 check, won in a data science competition using satellite imagery to protect an even larger rainforest, the Amazon.

Organized by the Kaggle platform for data science competitions, the challenge was to track the human footprint in the Amazon rainforest by distinguishing human causes of forest loss from natural ones. That could make rainforest protection more effective and more timely, according to the competition website.

Dai faced almost a thousand competitors, but he had an edge others lacked: his on-the-ground experience in the rainforest, and our GPUs. The satellite images were tagged to indicate whether they pictured forest cleared for farming, illegal logging, or some other activity.

Because he’d seen deforestation up close, Dai could look at satellite data and picture exactly what was happening on the ground. He used that insight to design a GPU-accelerated deep learning model to label images automatically.

Shubin Dai's deep learning system could help reduce forest loss. His trip to China's largest rainforest gave him the insight to make it happen.
Shubin Dai crosses a stream during his bike trip to the Yanoda rainforest, China’s largest. (Photo courtesy of Dai.)

The Cost of Forest Loss

The Amazon rainforest, the world’s largest, plays an invaluable role in sustaining life — stabilizing climate, producing oxygen and water, and housing some 30 percent of the world’s species.

But it’s steadily being destroyed. Over the past 40 years the Brazilian Amazon alone has lost about a fifth of its forest — an area larger than California and New Mexico combined — to climate change, cattle grazing, industrial agriculture and logging, according to the respected Mongabay environmental science website.

In the Kaggle competition, Planet, which designs and builds satellites to collect images of earth, and its Brazilian counterpart, SCCON, challenged participants to label atmospheric conditions and land use in high-resolution satellite images.

The idea of scanning satellite images to monitor rainforest conditions isn’t new. But according to Planet, most methods rely on lower-resolution images, which limits their effectiveness for observing forest degradation and small-scale deforestation, and for distinguishing human from natural causes of forest loss.

The Amazon is home to more species of plants and animals — including this tree frog — than any other terrestrial ecosystem. (Photo by Benedict Adam, under a Creative Commons license.)

A Man and His Bike

Dai, who goes by the name “bestfitting” on Kaggle, is fourth among all 66,453 Kaggle competitors after just a year on the platform. He’s reached the top one percent in each of his last seven contests, but this was his first top prize.

In his day job, Dai manages a team of 200 computer scientists at a company he founded that provides consulting and software development to Chinese banks. On weekends, he’s on his mountain bike, logging long miles across China’s grasslands, mountains and forests.

But the rainforest ride on Hainan Island off the coast of southern China was the one that changed everything.

“My first impression was of how beautiful it was. But that changed to sadness when I saw how much of the forest had been damaged,” he said. “Even in protected zones, it was not easy to find a very big tree.”

That lit a fire under Dai to learn more about rainforest destruction. When the Kaggle competition came up, he was quick to enter it, even though he was already enmeshed in another tough Kaggle contest.

“When I was facing difficulties during the Kaggle competition, I remembered moments in the rainforest. We’d often start at 6 a.m. and finish at 11 p.m. or even 1 a.m. I was very tired,” Dai said. “But thanks to all that, I learned to keep calm when facing a challenge, to keep moving when it seems hopeless and to be grateful all the time.”

A section of the Amazon rainforest that has been cleared for farming. (Image by Matt Zimmerman under a Creative Commons license.)

GPUs 24/7

Dai trained his neural network on half of the dataset’s 40,000 images, using the CUDA parallel computing platform and GeForce GTX TITAN X GPUs with the cuDNN-accelerated PyTorch deep learning framework.

His task was to train the network to add one or more of 17 labels to each image, indicating whether it contains signs of forest perils or what atmospheric conditions it depicts. He discusses his solution in depth on the competition site, or you can read his interview on the Kaggle blog.
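For readers curious what "one or more labels per image" looks like in practice, here is a minimal sketch of multi-label classification in plain Python. It is not Dai's solution. The four label names are placeholders (the competition's 17 labels are not listed here), the 0.5 threshold is an assumption, and the helper functions are hypothetical. The idea it illustrates is that each label gets its own independent yes/no score, so an image can carry several tags at once.

```python
import math

# Placeholder vocabulary: the real task had 17 labels, which this article
# does not list, so these names are illustrative assumptions.
LABELS = ["clear", "cloudy", "agriculture", "logging"]


def encode(tags):
    """Turn a set of tag names into a multi-hot vector (one 0/1 slot per label)."""
    unknown = set(tags) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if name in tags else 0 for name in LABELS]


def sigmoid(x):
    """Numerically stable logistic function mapping a raw score to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, threshold=0.5):
    """Score every label independently and keep those at or above the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one raw score per label")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def binary_cross_entropy(logits, target):
    """Mean per-label binary cross-entropy, a common multi-label training loss."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, target):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(target)


if __name__ == "__main__":
    target = encode({"clear", "agriculture"})
    raw_scores = [2.0, -3.0, 0.5, -0.1]  # made-up network outputs for one image
    print(target, decode(raw_scores), binary_cross_entropy(raw_scores, target))
```

Unlike a single-answer classifier, which picks exactly one category, this setup lets the same image be tagged both "clear" and "agriculture". In a framework such as PyTorch, the same loss is typically computed on the GPU over whole batches of images.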

Dai also used the TITAN X GPUs for inferencing, the process of deploying his algorithm.

“I kept my GPUs running almost 24 hours every day, and their stability was beyond my expectations,” he said.

He has since purchased additional NVIDIA GPUs, which he plans to apply to more deep learning projects to benefit the environment and advance medicine.

To learn about another deep learning approach to protecting the rainforest, see “Tree’s Company: AI Maps Biological Riches of the Rainforest.”

* Main image for this story, a scene from the Amazon, is courtesy of Neil Palmer, CIAT, via Wikimedia Commons.