AI and NVIDIA Parallel Processing: A PhD Student’s Research

by Martin Peniak

Humanoid robots and technology in general have always fascinated me. This caused my parents a lot of trouble during my childhood. I would secretly disassemble pretty much every electrical device in our household in the hope of understanding the underlying principles that made them work.

As I grew up, this curiosity broadened and I started asking questions about the fundamental principles of the universe, the nature of reality and our consciousness. I finished engineering school in my home country, Slovakia, and soon after moved to the United Kingdom, where I studied computing and astronomy.

I am now doing a PhD at the University of Plymouth as part of the iTalk project (Integration of Action and Language in Humanoid Robots). The iTalk project proposal beat competition from 31 other applications and won a £4.7 million grant from the European Commission’s Seventh Framework Programme. This highly ambitious project, coordinated by my supervisor Professor Angelo Cangelosi, aims to develop biologically inspired artificial systems that can progressively develop their cognitive capabilities through interaction with their environments.

Based on insights from neuroscience, developmental psychology, robotics, linguistics and other fields, we argue that cognitive skills (e.g. memory, reasoning, symbolic thinking, visual and auditory processing) have their foundations in the morphology and material properties of our bodies. The iTalk project emphasizes the role of embodiment by using one of the most complex humanoid robots in the world. This intricate humanoid robotic platform, called iCub, is approximately 105 cm tall, weighs around 20.3 kg and was designed by the RobotCub Consortium.

I originally heard about the CUDA framework from a Russian friend who was planning to use it to accelerate the Mars Rover Simulator that I had developed during our collaboration with ESA (the European Space Agency). I could not spend any more time on the ESA work, as I had to resume my PhD research. However, one day I found an article about a neural network implementation using CUDA and was impressed by the performance increases that were achieved. A day later, I showed the article to my colleague Anthony Morse, and after many discussions we agreed that GPU processing was exactly what we needed in our laboratory, as most of the systems we use can easily be parallelised.

We looked into OpenCL as an alternative, but the CUDA framework provided much more support and the API was really good. We therefore decided to go with CUDA, ordered six servers with several Tesla C1060 and GeForce GTX 470 cards and built an affordable Linux-based supercomputing cluster capable of performing over 12 TFLOPS (trillion floating-point operations per second). To start utilising this power, we began developing CUDA-enabled software named Aquila, which is tailored for the iCub humanoid robot and the execution of several different bio-inspired systems.

For my PhD research, I use Aquila to develop complex artificial neural networks, inspired by those found in the brain, and use them for the real-time control of the iCub robot. These artificial neural networks often consist of thousands of neurons that are connected to many other neurons as well as to several modalities (somatosensory, vision, language, etc.) of the iCub robot through millions of synaptic connections. The multidimensional input from the various senses is abstracted into internal representations that are meaningful to the system. This is achieved through the use of so-called self-organising maps, which closely resemble the topologically organised cortices found in the brain. Often reaching sizes of several thousand neurons, these maps abstract the visual data obtained by applying special filters to millions of pixels. Apart from this visual processing, the system needs to work with linguistic and somatosensory inputs while performing the millions of calculations needed to activate the neural network every 50-100 ms.
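To make the idea of a self-organising map more concrete, here is a minimal sketch of one training step of a textbook Kohonen map in plain Python. It is an illustration only and not the code used in Aquila: the function names (`make_map`, `best_matching_unit`, `train_step`), the Gaussian neighbourhood and the parameter values are all assumptions chosen for readability.

```python
import math
import random


def make_map(rows, cols, dim, seed=0):
    """Create a rows x cols grid of neurons with random weight vectors.

    Hypothetical helper: returns (weights, positions), where positions[i]
    is the grid coordinate of neuron i.
    """
    rng = random.Random(seed)
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in positions]
    return weights, positions


def best_matching_unit(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def train_step(weights, positions, x, learning_rate, sigma):
    """Pull every neuron toward input x, weighted by a Gaussian of its
    grid distance to the best matching unit. Modifies weights in place
    and returns the index of the best matching unit."""
    bmu = best_matching_unit(weights, x)
    bx, by = positions[bmu]
    for i, (px, py) in enumerate(positions):
        grid_sq = (px - bx) ** 2 + (py - by) ** 2
        influence = math.exp(-grid_sq / (2.0 * sigma ** 2))
        weights[i] = [w + learning_rate * influence * (xi - w)
                      for w, xi in zip(weights[i], x)]
    return bmu
```

Because neighbouring neurons on the grid are pulled toward the same inputs, nearby neurons end up responding to similar inputs, which is the topological organisation mentioned above. Both the distance search and the weight update treat each neuron independently, so this kind of step lends itself to parallel execution.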

The CUDA framework accelerated the online neural network control by a factor of several hundred on average, and the algorithms responsible for iCub’s training showed a speed increase of around 50x. I have developed both CPU and GPU versions, and although I haven’t completed extensive optimisations, the nice thing about CUDA is that simply by naïve parallelisation of the CPU code one can achieve massive speedups using GPU devices.
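The reason naïve parallelisation works so well for neural networks is that each neuron's activation depends only on shared, read-only data. The sketch below emulates that structure in plain Python: the body of the sequential loop becomes a per-neuron function, which in CUDA would roughly correspond to a kernel with the neuron index taken from the thread index. The function names, the `tanh` activation and the thread pool are assumptions made for illustration, not Aquila's code, and Python threads will not deliver a GPU-like speedup; only the shape of the transformation is shown.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def activate_neuron(j, inputs, weights, biases):
    """Work for one output neuron j; reads only shared, read-only data."""
    total = biases[j] + sum(w * x for w, x in zip(weights[j], inputs))
    return math.tanh(total)


def layer_sequential(inputs, weights, biases):
    """CPU-style version: a single loop visits every neuron in turn."""
    return [activate_neuron(j, inputs, weights, biases)
            for j in range(len(weights))]


def layer_parallel(inputs, weights, biases, workers=4):
    """Same computation with one task per neuron, mimicking a launch of
    one GPU thread per neuron. No neuron reads another neuron's output,
    so the tasks can run in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda j: activate_neuron(j, inputs, weights, biases),
            range(len(weights))))
```

Both functions return the same activations in the same order; only the scheduling of the per-neuron work differs.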

As quantum computing is still in its infancy, it seems to me that massively parallel GPU processing is the way forward, since CPU architectures are simply not suited to parallel tasks, consume too much energy and do not scale well.