by Will Ramey

For developers wanting to get ready for the impending arrival of Fermi, we’ve got exciting news. NVIDIA just released version 3.0 of its CUDA Toolkit, which gives developers all the tools needed to start preparing their code for Fermi-based GPUs, due to hit the market in a few weeks’ time. You can download it today at

There’s been a lot of anticipation about how the Fermi architecture can help accelerate research. The release of the 3.0 toolkit means that the scientific community is a step closer to realizing some of these performance gains. We talked to Professor Richard Brower, U.S. Software Coordinator for USQCD, Boston University Physics and Electrical Engineering Departments, about what the new toolkit means for his research in quantum chromodynamics (QCD), an area of particle physics.

“QCD codes need all the compute cycles they can get and we're really excited about the results we're getting by using GPUs – we've already reduced the cost of our calculations by a factor of 5," Professor Brower said, adding that features in the new CUDA Toolkit that are essential for high performance computing – like GPU acceleration for more complex linear algebra routines – are going to help advance his QCD research.

The new toolkit lets developers take advantage of innovations in the Fermi architecture that make these new GPUs exceptionally well suited for scientific applications.

Here are some of the key features the CUDA Toolkit v3.0 includes:

  • Support for new GPUs based on the Fermi architecture – including ECC, optimized double precision, support for linear algebra libraries such as BLAS and LAPACK, the CUDA-GDB debugger, and the Visual Profiler
  • C++ support – delivering improved productivity with class and template inheritance
  • GPGPU/Graphics interoperability – delivering Direct3D 9, 10, and 11 and OpenGL interoperability for both CUDA and OpenCL
  • Improved developer tools for Linux – including the new CUDA Memory Checker that reports misalignment and out-of-bounds errors
  • Tesla Compute Cluster (TCC) – improving performance and cluster management

We hope you’ll take a look at the new CUDA Toolkit 3.0 and learn more about the tools and resources we’ve got for all NVIDIA developers in the Developer Zone. We have also published guides on tuning for Fermi here.

Similar Stories

  • jazzquezz

    What do you really mean by “supports a full range of languages and APIs including CUDA C […] and PGI CUDA Fortran”? Does this mean that in order to compile a GPGPU kernel written in Fortran you must pay for the PGI Fortran compiler?

  • Will Ramey

    Hi @jazzquezz,
    Thanks for asking!
    NVIDIA provides the NVCC compiler, which supports CUDA C/C++, and works with other companies to support additional languages and APIs.
    PGI currently offers two solutions for compiling Fortran for the GPU.
    Their CUDA Fortran[1] solution supports an explicit programming model similar to what NVIDIA supports via NVCC and the CUDA C Runtime. Their Accelerator Compilers[2] support a more implicit programming model in which programmers use compiler directives to specify subroutines and regions of code to be accelerated by the GPU.
    Hope this is helpful,


    Hi Will,
    Thanks a lot for your answer. I am afraid that the answer to my question is “yes, you have to pay for the PGI compiler,” right?
    Just to be sure that there are no other options.
    Thanks again!

  • Seth

    Where can I find additional information about “Tesla Compute Cluster?”

  • Will Ramey @ NVIDIA

    Yes, that’s correct.

  • LeBen

    Hi, jazzquezz,
    Another compiler exists for Fortran that targets CUDA: HMPP[1] from CAPS entreprise. Unfortunately, it is also a commercial product.
    Check it out; it is worth a look.
    1. http://

  • Jonathan Bailleul

    Hi all,
    Sorry for the newbie question, but is there any point in using CUDA 3.0 with my GTX260 card? (GT200, CUDA 2.3)
    In other words, can I use CUDA 3.0 to compile and run applications on my current hardware?
    I can imagine that some features will carry over naturally from CUDA 2.3 to CUDA 3.0, but maybe some others will simply not work due to GT200 hardware limitations.
    Thank you for any hint. I’ve run through the provided technical specifications, but as far as I could tell this information was not explicitly stated.

  • Will Ramey @ NVIDIA

    Hi Jonathan,
    >> … can I use CUDA 3.0, compile and run applications on my current hardware?
    Of course!
    All of the features in CUDA Toolkit 3.0 will work on your GT200 card except things like multiple copy engines and ECC reporting that depend on new hardware in Fermi-based GPUs.
    Many of the new features in CUDA Toolkit 3.0 and the supporting drivers and SDK code samples are listed in the Release Highlights section on the downloads page at

  • marc

    OS: Ubuntu 9.10 64-bit with GCC 4.4.1 compiler (all updates applied)
    Video: NVIDIA 8800GT 1024MB
    I downloaded the software and followed the instructions:
    1) Downloaded files from
    2) Installed driver.
    3) Installed toolkit.
    4) Updated environment settings
    4.1) added library entries ‘/usr/local/cuda/lib64’ and ‘/usr/local/cuda/lib’ to /etc/
    4.2) added path entry ‘export PATH=/usr/local/cuda/bin:$PATH’ to /etc/profile.d/
    5) Installed .
    So far so good, but when I try to compile the examples in ~/NVIDIA_GPU_Computing_SDK/C/ it breaks.
    OK, I can debug the files and alter the code and hopefully it compiles, but that does not seem logical. This is supposed to be release material, so what extra configuration must I make to be able to work with the CUDA 3.0 software?
    Thank you.

  • Will Ramey @ NVIDIA

    Hi Marc,
    Probably something simple. Should “just work” as installed.
    Please post to our support forums for help.

  • peter d.

    C++ support… does it mean that a whole C++ program can run on the GPU using all cores? Or how exactly does that C++ support work??? 🙂

  • Will Ramey @ NVIDIA

    Hi Peter,
    Support for C++ means you can now use C++ language features like operator overloading, polymorphism, class inheritance, templates, and more in your functions/kernels that run on the GPU.
    The Toolkit 3.0 release is the first to support C++ class inheritance, enabling programmers to more easily port their existing C++ code (where appropriate) to the GPU and take advantage of higher level, object-oriented language features when writing new code.

  • diego

    I have a problem here. I have a 1.8 GHz dual core, 4 GB of memory, a 1 TB hard drive, and an 8600GT that I use for games. I installed the 196.21 driver and the CUDA 2.3 Toolkit and my machine got a hundred times better, a real spectacle.
    But then the new 197.13 driver and the new CUDA 3.0 came out and I installed them, and in videos and in “some” games I felt things got heavier, even though the graphics improved. For example, in PES 2010, after you score a goal, the replay lags when it zooms in…
    Could it be that my processor is too weak for CUDA 3.0 and the 197.13 driver?
    It’s a 1.8 GHz dual core with 1 MB of cache. I’m considering an upgrade to a 2.93 GHz Core 2 Duo E7500 with 3 MB of cache; will that solve my problem? Could someone help me pleaseee …..

  • Will Ramey @ NVIDIA

    Hi Diego,
    Which games are you playing, so we can test them here at NVIDIA?
    (via Google Translate)

  • Diego

    Hi, how are you?
    Yes, I use CUDA for games and videos, and I just have to thank you all, because the software is spectacular; it greatly improved even the modest performance of my 8600GT.
    About what I sent earlier regarding CUDA 3.0: it made my PC struggle a bit to process the data. It’s a 1.8 GHz dual core, and most of the heavier games require at least 2.0 GHz.
    So, to settle my doubts about whether it was the processor, I did a basic overclock, taking my 1.8 GHz up to 2.01 GHz.
    And that solved my problem with CUDA 3.0. The problem really was my processor, so I’m already preparing an upgrade to a 2.93 GHz Core 2 Duo.
    Thank you very much!

  • Diego

    I had a problem here: I have a 1.8 GHz dual core, 4 GB of memory, a 1 TB hard drive, and an 8600GT that I use for games. I installed the 196.21 driver and the CUDA 2.3 Toolkit and my machine got a hundred times better; the system was smooth and stable.
    Then the new 197.13 driver and the new CUDA 3.0 came out and I installed them. In “some” games I felt the dynamics and the physics effects improved a lot, but PES 2010 started to lag in replay scenes and corner kicks, and in videos, though only when skipping ahead.
    Is my processor too weak for CUDA 3.0 and driver 197.13? That question was right! huahahauahuahah
    It’s a 1.8 GHz dual core with 1 MB cache, and I’m trying to upgrade to a 2.93 GHz Core 2 Duo E7500 with 3 MB of cache; will it solve my problem? Could someone help pleaseee …..
    It was the CPU clock, yes. I did an overclock from 1.8 GHz to 2.01 GHz and it somewhat solved my problem, but I’m at the limit for my configuration, so I will upgrade to a 2.93 GHz dual-core processor. What would you recommend for swapping the processor, with games as the main focus?
    And my motherboard only supports the Core 2 Duo.

  • Diego

    Will Ramey, thanks for the attention. I’m playing Pro Evolution Soccer 2010, but the problem is not the 197.13 driver or CUDA 3.0; it’s my processor.
    As I said, after overclocking to 2.01 GHz things improved by 99%.
    My motherboard only supports up to a Core 2 Duo.
    And as I don’t have much money available, I’m thinking of a Core 2 Duo E7500 at 2.93 GHz with 3 MB cache, exclusively for games.
    I plan to do a minimal overclock on it.
    What is your suggestion for upgrading my processor?

  • Zbigniew Koza

    Hi Marc,
    I don’t know CUDA 3.0 yet, but earlier versions were not compatible with GCC 4.4, and I suspect CUDA 3.0 is no exception. You need to make nvcc use GCC 4.3:
    a) install gcc 4.3 NEXT TO your current compiler (using standard package manager).
    b) create soft links in your $HOME/bin directory that will point to the 4.3 binaries of gcc and g++. These links should be named “gcc” and “g++” respectively. The effect looks like this:
    zkoza@zbyszek:~$ ls -l ./bin
    […] g++ -> /usr/bin/g++-4.3
    […] gcc -> /usr/bin/gcc-4.3
    c) add the compiler option “--compiler-bindir=…” to tell the nvcc compiler which gcc/g++ to use. In my case it reads
    > nvcc --compiler-bindir=/home/zkoza/bin [OTHER OPTIONS]


  • jim (docdrabs)

    This might be a dumb question for you guys. I know a lot about my computer, but I have a problem with Call of Duty: World at War. My card is a 9500GT. The game works great in multiplayer, but when I try to play single player I get “wrong CD entered.” There is only one CD, so that can’t be the issue. I have asked everywhere but cannot get any help; the only answer I got said it had to do with my video C++, and I have no idea what that means. Plzz help, I’m at a dead end. jim, docdrabs