
Archive for the ‘CUDA’ Category

How to call CUDA Programs from a C/C++ Application?

September 29th, 2008 Sarath

Calling a CUDA function from your C/C++ file is very simple: it’s as straightforward as calling any extern function in C/C++. To start with, I assume you’ve added the CUDA program to your workspace (or copied the program below and saved it as a .cu file), compiled it with the CUDA compiler, and that the object file has been generated. Please check my previous post to learn more about how to compile CUDA source in Visual Studio. CUDA follows C language constructs and conventions, and the CUDA compiler generates object files that contain the functions and definitions of your CUDA program.

Let’s take a sample presented in the DDJ article CUDA, Supercomputing for the Masses: Part 2. The program increments each element of an array by one. Rob Farber has done an excellent job of presenting CUDA in a simple manner in his high-performance computing series on CUDA (check DDJ). See the article for more about the program presented here.

[sourcecode language='cpp']
// incrementArray.cu
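#include <stdio.h>
#include <stdlib.h>       // malloc, free
#include <assert.h>       // assert, used to check the results
#include <cuda_runtime.h> // cudaMalloc, cudaMemcpy, cudaFree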

void incrementArrayOnHost(float *a, int N)
{
int i;
for (i=0; i < N; i++) a[i] = a[i]+1.f;
}
__global__ void incrementArrayOnDevice(float *a, int N)
{
int idx = blockIdx.x*blockDim.x + threadIdx.x;
if (idx<N) a[idx] = a[idx]+1.f; // guard: the last block may have surplus threads
}

extern "C" void IncrementArray(void)
{
float *a_h, *b_h; // pointers to host memory
float *a_d; // pointer to device memory
int i, N = 10;
size_t size = N*sizeof(float);
// allocate arrays on host
a_h = (float *)malloc(size);
b_h = (float *)malloc(size);
// allocate array on device
cudaMalloc((void **) &a_d, size);
// initialization of host data
for (i=0; i<N; i++) a_h[i] = (float)i;
// copy data from host to device
cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);
// do calculation on host
incrementArrayOnHost(a_h, N);
// do calculation on device:
// Part 1 of 2. Compute execution configuration
int blockSize = 4;
int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
// Part 2 of 2. Call incrementArrayOnDevice kernel
incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);
// Retrieve result from device and store in b_h
cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
// check results
for (i=0; i<N; i++) assert(a_h[i] == b_h[i]);
// cleanup
free(a_h); free(b_h); cudaFree(a_d);
}
[/sourcecode]
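To see why the listing computes nBlocks = N/blockSize + (N%blockSize == 0?0:1) and why the kernel checks idx against N, here is a minimal host-only C++ sketch that replays the launch with ordinary loops standing in for blockIdx.x and threadIdx.x. NumBlocks and SimulateIncrementLaunch are hypothetical names for illustration only; no CUDA runtime calls are involved.

```cpp
// Host-only sketch: replays incrementArrayOnDevice <<< nBlocks, blockSize >>>
// with ordinary loops. No CUDA runtime calls are made.
#include <cassert>
#include <vector>

// Same rounding-up rule as the listing: one extra block when N is not an
// exact multiple of blockSize (equal to (N + blockSize - 1) / blockSize).
int NumBlocks(int N, int blockSize)
{
    assert(N >= 0 && blockSize > 0);
    return N / blockSize + (N % blockSize == 0 ? 0 : 1);
}

// Runs the kernel body once per (block, thread) pair, the way the GPU
// runs it once per CUDA thread.
void SimulateIncrementLaunch(std::vector<float>& a, int blockSize)
{
    const int N = static_cast<int>(a.size());
    const int nBlocks = NumBlocks(N, blockSize);
    for (int block = 0; block < nBlocks; ++block)           // plays blockIdx.x
    {
        for (int thread = 0; thread < blockSize; ++thread)  // plays threadIdx.x
        {
            int idx = block * blockSize + thread;           // blockDim.x == blockSize
            if (idx < N)            // surplus threads in the last block do nothing
                a[idx] = a[idx] + 1.f;
        }
    }
}
```

With N = 10 and blockSize = 4, NumBlocks returns 3, so 12 threads run and the last two (idx 10 and 11) skip the update. On the GPU the threads run in parallel rather than in this loop order, which is safe here because every thread touches a different element.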

The program above increments each element of the array by one. The only change I made to the original source is that I turned its main function into an ordinary function with C linkage; that’s why extern "C" appears in front of it. You can also leave out extern "C", as long as the declaration in your C++ file leaves it out too: nvcc compiles the host code of a .cu file as C++, so both sides then agree on the same C++ (mangled) name.
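If more than one C/C++ file needs to call into the .cu file, a common convention is to write the declaration once in a header and wrap it in #ifdef __cplusplus / extern "C", so it gives C linkage whether the header is compiled as C or as C++. Below is a minimal sketch, assuming a hypothetical header incrementArray.h and a hypothetical IncrementArrayHost(float*, int) that receives the array instead of allocating it. The header and a plain C++ stand-in definition share one file only so the example compiles by itself.

```cpp
#include <cassert>

// ---- incrementArray.h (hypothetical header shared by the .cpp and .cu files) ----
#ifdef __cplusplus
extern "C" {   // C++ compilers: give these declarations C linkage (no name mangling)
#endif

void IncrementArrayHost(float* a, int N);   // hypothetical signature

#ifdef __cplusplus
}
#endif
// ---- end of header ----

// Stand-in definition so this sketch compiles on its own. In a real project
// the definition would live in the .cu file compiled by nvcc, and the linker
// would resolve the call from the .cpp file against that object file.
extern "C" void IncrementArrayHost(float* a, int N)
{
    assert(a != nullptr || N == 0);
    for (int i = 0; i < N; ++i)
        a[i] = a[i] + 1.f;
}
```

A C file that includes the same header sees only the plain declaration, because __cplusplus is not defined there.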
Instead of including the CUDA file with the #include preprocessor directive, it’s better to declare the function as external with the extern keyword. Your main program may then look like this:

[sourcecode language='cpp']
// IntegrationWithCPP.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

// Forward declare the function defined in the .cu file
extern "C" void IncrementArray();

int _tmain(int argc, _TCHAR* argv[])
{
IncrementArray();
return 0;
}
[/sourcecode]

Finally, link the program with the CUDA runtime library, cudartd.lib (debug) or cudart.lib (release), and enjoy your program!

That’s it. One thing I noticed: even if I define a main function in both the CUDA file and my C++ file, the linker doesn’t report an error. It complains only when the two main functions have the same prototype; otherwise, the version in the CUDA file gets called (in my experience so far).

Sharing my thoughts...

Categories: C++, CUDA, Code, GPGPU, Misc, Tips, Visual Studio

How to Get Properties of your CUDA Device?

September 22nd, 2008 Sarath

If you’re getting into High Performance Computing (HPC), you’re unlikely to miss the name CUDA. I’m just getting into it myself. NVIDIA GeForce 8 Series and later hardware supports CUDA programming. You might be interested in the various properties of your CUDA device, from its compute capability version to the number of multiprocessors and the sizes of shared and global memory. How can you get these details? Interestingly, you can have multiple GPUs in one machine, so the program has to enumerate every CUDA device and query the properties of each one. Here’s a sample CUDA snippet for that. Save it to disk and compile it with the NVIDIA CUDA compiler (nvcc.exe, which comes with the CUDA Toolkit).

[sourcecode language='cpp']
// Device properties of CUDA hardware
#include <stdio.h>
#include <string.h>        // memset
#include <cuda_runtime.h>  // cudaDeviceProp, cudaGetDeviceCount, cudaGetDeviceProperties

void DisplayProperties( cudaDeviceProp* pDeviceProp )
{
if( !pDeviceProp )
return;

// size_t fields are printed with %zu, int fields with %d
printf( "\nDevice Name \t - %s ", pDeviceProp->name );
printf( "\n**************************************");
printf( "\nTotal Global Memory\t\t - %zu KB", pDeviceProp->totalGlobalMem/1024 );
printf( "\nShared memory available per block \t - %zu KB", pDeviceProp->sharedMemPerBlock/1024 );
printf( "\nNumber of registers per thread block \t - %d", pDeviceProp->regsPerBlock );
printf( "\nWarp size in threads \t - %d", pDeviceProp->warpSize );
printf( "\nMemory Pitch \t - %zu bytes", pDeviceProp->memPitch );
printf( "\nMaximum threads per block \t - %d", pDeviceProp->maxThreadsPerBlock );
printf( "\nMaximum Thread Dimension (block) \t - %d %d %d", pDeviceProp->maxThreadsDim[0], pDeviceProp->maxThreadsDim[1], pDeviceProp->maxThreadsDim[2] );
printf( "\nMaximum Grid Size \t - %d %d %d", pDeviceProp->maxGridSize[0], pDeviceProp->maxGridSize[1], pDeviceProp->maxGridSize[2] );
printf( "\nTotal constant memory \t - %zu bytes", pDeviceProp->totalConstMem );
printf( "\nCompute capability \t - %d.%d", pDeviceProp->major, pDeviceProp->minor );
printf( "\nClock rate \t - %d kHz", pDeviceProp->clockRate );
printf( "\nTexture Alignment \t - %zu bytes", pDeviceProp->textureAlignment );
printf( "\nDevice Overlap \t - %s", pDeviceProp->deviceOverlap ? "Allowed" : "Not Allowed" );
printf( "\nNumber of Multi processors \t - %d", pDeviceProp->multiProcessorCount );
}

int main(void)
{
cudaDeviceProp deviceProp;
int nDevCount = 0;

cudaGetDeviceCount( &nDevCount );
printf( "Total devices found: %d", nDevCount );
for (int nDeviceIdx = 0; nDeviceIdx < nDevCount; ++nDeviceIdx )
{
memset( &deviceProp, 0, sizeof(deviceProp));
if( cudaSuccess == cudaGetDeviceProperties(&deviceProp, nDeviceIdx))
DisplayProperties( &deviceProp );
else
printf( "\n%s", cudaGetErrorString(cudaGetLastError()));
}
}
[/sourcecode]
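The printed properties become more useful once the program acts on them. For example, threads are scheduled in groups of warpSize threads, so block sizes are usually chosen as multiples of warpSize, and a launch must not ask for more than maxThreadsPerBlock threads per block. Here is a minimal sketch under those assumptions; ChooseBlockSize is a hypothetical helper, not a CUDA API.

```cpp
#include <cassert>

// Hypothetical helper (not a CUDA API): choose a block size from two fields
// of cudaDeviceProp, warpSize and maxThreadsPerBlock.
// 1) round the requested thread count up to a whole number of warps,
// 2) cap it at the largest multiple of warpSize the device allows per block.
int ChooseBlockSize(int requested, int warpSize, int maxThreadsPerBlock)
{
    assert(requested > 0 && warpSize > 0 && maxThreadsPerBlock >= warpSize);
    int rounded = ((requested + warpSize - 1) / warpSize) * warpSize;
    int cap = (maxThreadsPerBlock / warpSize) * warpSize;
    return rounded < cap ? rounded : cap;
}
```

In the program above you could call ChooseBlockSize(N, deviceProp.warpSize, deviceProp.maxThreadsPerBlock) after cudaGetDeviceProperties succeeds, then get the number of blocks by rounding N divided by the block size up.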



Categories: CUDA, Code, GPGPU, Tips

How to Enable Syntax Highlighting for CUDA files in Visual Studio 2005?

September 4th, 2008 Sarath

It’s really awkward when a CUDA file is displayed as a plain text file in Visual Studio. CUDA essentially follows C syntax, but it adds keywords that C/C++ doesn’t have (such as __global__ and __shared__). Fortunately, Visual Studio is flexible enough to provide a proper editing experience for custom file types.


So what do you have to do to enable syntax highlighting for CUDA source files? It’s clearly described in the CUDA SDK help files.

1. Set up CUDA on your machine (install the CUDA SDK and the CUDA Toolkit).

2. Browse to the “Microsoft Visual Studio 8\Common7\IDE” folder under your Program Files folder.

3. Open the “usertype.dat” file in that folder. If the file doesn’t exist, create a new file with that name.

4. Open the %Program Files%\NVIDIA Corporation\NVIDIA CUDA SDK\doc\syntax_highlighting\visual_studio_8 folder.

5. Append the contents of the “usertype.dat” file in that folder to the “usertype.dat” file you opened from “Microsoft Visual Studio 8\Common7\IDE”.

6. Save the file.

7. Open your IDE and go to Tools -> Options.

8. Under Text Editor -> File Extension, add the extension “cu” as a new type (as pictured below).

[Screenshot: Tools -> Options -> Text Editor -> File Extension, with the extension “cu” added]

9. Restart your IDE.

10. Enjoy syntax highlighting!
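Steps 3 to 6 amount to merging one plain-text keyword list into another, since usertype.dat holds one keyword per line. If you set up several machines, a small C++ program along these lines could automate the merge. This is a sketch, not part of the CUDA SDK: MergeUserTypes is a hypothetical helper that skips keywords already present (so running it twice adds nothing) and creates the destination file if it doesn’t exist, as step 3 asks. Back up the original usertype.dat first.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Strips a trailing '\r' so files saved with Windows line endings compare cleanly.
static std::string TrimCR(std::string s)
{
    if (!s.empty() && s.back() == '\r') s.pop_back();
    return s;
}

// Hypothetical helper: append each keyword from the SDK's usertype.dat
// (srcPath) to Visual Studio's usertype.dat (dstPath), skipping keywords that
// are already there. dstPath is created if missing. Returns the number of
// keywords appended, or -1 if srcPath cannot be opened.
int MergeUserTypes(const std::string& srcPath, const std::string& dstPath)
{
    std::ifstream src(srcPath);
    if (!src) return -1;

    // Read the current destination contents (empty if the file doesn't exist).
    std::string content;
    {
        std::ifstream dst(dstPath, std::ios::binary);
        content.assign(std::istreambuf_iterator<char>(dst),
                       std::istreambuf_iterator<char>());
    }
    std::set<std::string> existing;
    std::istringstream lines(content);
    for (std::string line; std::getline(lines, line); )
        if (!TrimCR(line).empty()) existing.insert(TrimCR(line));

    std::ofstream out(dstPath, std::ios::app);    // append, create if missing
    if (!content.empty() && content.back() != '\n')
        out << '\n';                              // don't glue onto the last keyword
    int appended = 0;
    for (std::string line; std::getline(src, line); )
    {
        std::string keyword = TrimCR(line);
        if (keyword.empty() || !existing.insert(keyword).second)
            continue;                             // blank line or already present
        out << keyword << '\n';
        ++appended;
    }
    return appended;
}
```

You would pass the usertype.dat paths from steps 4 and 2. Writing under Program Files may require running the program as an administrator.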

Note that the steps above apply to Visual Studio 8 (2005) only. The setup for Visual Studio 7 is slightly different; once you install the CUDA SDK, you can find the instructions in the NVIDIA Corporation\NVIDIA CUDA SDK\doc\syntax_highlighting\visual_studio_7 folder.


Categories: CUDA, GPGPU, Tips