• http://profiles.google.com/rtfss1 rtfss none

    Sorry to say this, but AMD DirectGMA on FirePro SDI cards seems better, as it is equivalent to CUDA P2P GPU-to-GPU functionality and avoids host copies. Yours goes through host memory, although it sensibly avoids extra copies on the host. Also, sorry, but I don't know what the magical feature of NVIDIA GPUDirect for Video is: since CUDA 4.0 there has been cudaHostRegister for pinning host memory that was allocated without CUDA host memory allocation calls. I hope you soon improve it to use P2P PCIe transfers that avoid host transfers, similar to DirectGMA from AMD.

  • Anonymous

    Thanks for reading our blog – we appreciate your feedback!

    Let me address some of your points:
    In terms of P2P, we actually do P2P transfers in our own NVIDIA Digital Video Pipeline (DVP) product today. After several years of experience with DVP, we found that we could achieve similarly low latencies with the P2H2P approach that we have taken with GPUDirect for Video.

    Among other things, this gives us the benefit of being able to provide long term support for third-party video I/O cards without having our partners make changes as our GPU architectures change over time.

    GPUDirect for Video also allows transfers of data directly to the relevant GPU (whether Quadro or Tesla). Moving the data using CUDA C functions would invoke an extra interop stage to transfer it into the graphics API, which of course increases latency. Similarly, GPUDirect has extensive support for synchronization, which allows independent threads to work as efficiently as possible, further minimizing the latency of transfers. So yes, we understand the merits of P2P, but our architectural choices were made with all these factors in mind, and so far we are very happy with the results.

  • http://profiles.google.com/rtfss1 rtfss none

    That makes sense now, thanks.