How to call CUDA Programs from a C/C++ Application?
Calling a CUDA function from your C/C++ file is straightforward: you call it just as you would any extern function in C/C++. To start with, I assume you’ve added the CUDA program to your workspace (or copied the program provided below and saved it as a .cu file), compiled it with the CUDA compiler, and generated the object files. Please check my previous post to learn more about how to compile CUDA source in Visual Studio. CUDA follows C language constructs and rationales, and the CUDA compiler generates object files that contain the functions and definitions of your CUDA program.
Let’s take a sample presented in the DDJ article CUDA, Supercomputing for the Masses: Part 2. The program increments each element of an array by one. Rob Farber has done an excellent job of presenting CUDA in a simple manner in his high-performance computing series on CUDA (check DDJ). See that article to learn more about the program presented here.
[sourcecode language='cpp']
// incrementArray.cu
#include <stdlib.h>        // malloc, free
#include <assert.h>        // assert
#include <cuda_runtime.h>  // cudaMalloc, cudaMemcpy, cudaFree

// CPU reference version: add one to every element.
void incrementArrayOnHost(float *a, int N)
{
    int i;
    for (i = 0; i < N; i++) a[i] = a[i] + 1.f;
}

// GPU version: each thread adds one to a single element.
__global__ void incrementArrayOnDevice(float *a, int N)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    // The last block may have more threads than remaining elements.
    if (idx < N) a[idx] = a[idx] + 1.f;
}

// Entry point called from the C/C++ application.
extern "C" void IncrementArray(void)
{
    float *a_h, *b_h; // pointers to host memory
    float *a_d;       // pointer to device memory
    int i, N = 10;
    size_t size = N*sizeof(float);
    // allocate arrays on host
    a_h = (float *)malloc(size);
    b_h = (float *)malloc(size);
    // allocate array on device
    cudaMalloc((void **) &a_d, size);
    // initialization of host data
    for (i = 0; i < N; i++) a_h[i] = (float)i;
    // copy data from host to device
    cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);
    // do calculation on host
    incrementArrayOnHost(a_h, N);
    // do calculation on device:
    // Part 1 of 2. Compute execution configuration
    int blockSize = 4;
    int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
    // Part 2 of 2. Call incrementArrayOnDevice kernel
    incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);
    // Retrieve result from device and store in b_h
    cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
    // check results: host and device must agree
    for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);
    // cleanup
    free(a_h); free(b_h); cudaFree(a_d);
}
[/sourcecode]
The program above is a simple CUDA program that increments each element of the array by one. The only change I made to the original source is that I turned the main function into a new function, IncrementArray, that follows “C” linkage rules; that’s why I put extern “C” in front of it. Even if you don’t add extern “C”, it will work fine with your C++ compiler, as long as the declaration in your C++ file leaves it out as well.
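To see how the execution configuration covers the array, and why the kernel needs a bounds check on idx, it can help to replay the index arithmetic on the CPU. The sketch below is plain C++, not CUDA: blocksNeeded and emulateIncrement are hypothetical helper names I made up for illustration, and the two nested loops only stand in for the blocks and threads that the GPU would run in parallel.

```cpp
#include <vector>

// Round-up division; gives the same result as
// N/blockSize + (N%blockSize == 0 ? 0 : 1) in the CUDA program.
int blocksNeeded(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// CPU stand-in for the kernel launch (an illustration, not real CUDA).
// Visits every (block, thread) pair, applies the kernel's index formula
// and bounds check, and returns how many threads landed past the end
// of the array and were therefore skipped.
int emulateIncrement(std::vector<float>& a, int blockSize)
{
    const int n = static_cast<int>(a.size());
    const int nBlocks = blocksNeeded(n, blockSize);
    int skipped = 0;
    for (int block = 0; block < nBlocks; ++block) {          // plays the role of blockIdx.x
        for (int thread = 0; thread < blockSize; ++thread) { // plays the role of threadIdx.x
            int idx = block * blockSize + thread;
            if (idx < n) a[idx] = a[idx] + 1.f;
            else ++skipped;
        }
    }
    return skipped;
}
```

With N = 10 and blockSize = 4, as in the program above, this gives 3 blocks and 12 threads, so the last 2 threads fall past the end of the array and the guard skips them.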
Instead of pulling in the CUDA file with the #include preprocessor directive, it’s better to declare the function as external with the extern keyword. So your main program may appear as follows.
[sourcecode language='cpp']
// IntegrationWithCPP.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
// Forward declare the function
extern "C" void IncrementArray();
int _tmain(int argc, _TCHAR* argv[])
{
IncrementArray();
return 0;
}
[/sourcecode]
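If you want the same declaration to work from both a C file and a C++ file, one common pattern is to wrap it in an `#ifdef __cplusplus` guard and put it in a shared header. The sketch below is only an illustration under my own assumptions: the header would be something like IncrementArray.h (a name I made up), and the function body here is a CPU stand-in with a hypothetical standInCalls counter so the example links without nvcc. In a real project the definition would come from the compiled .cu file instead.

```cpp
// --- What could live in a shared header (hypothetical IncrementArray.h) ---
#ifdef __cplusplus
extern "C" {               // C++ compilers: use C linkage, no name mangling
#endif

void IncrementArray(void); // same signature as the function in the .cu file

#ifdef __cplusplus
}
#endif

// --- Stand-in for the .cu side, for illustration only ---
// Counts calls instead of launching a kernel.
static int standInCalls = 0;

extern "C" void IncrementArray(void)
{
    ++standInCalls;
}
```

A C compiler never sees the extern "C" lines because __cplusplus is not defined there, while a C++ compiler sees them and looks for the unmangled symbol.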
Finally, link the program with the CUDA runtime library, cudartd.lib (debug) or cudart.lib (release), and enjoy your program!
That’s it. One thing I noticed is that even if I define a main function in both the CUDA file and my C++ file, I don’t get any error from the linker. The linker reports an error only if the main functions in the CUDA and C++ files have the same prototype. Otherwise, the version in the CUDA file is called (in my experience so far).





