Archive

Posts Tagged ‘CUDA’

How to call CUDA Programs from a C/C++ Application?

September 29th, 2008 Sarath Comments

Calling a CUDA function from your C/C++ file is very simple. It’s pretty straight forward as you call an extern function in C/C++. To start with, I believe you’ve added the CUDA program in your work space (or copy the program provided below and save as .cu file) and you could compile the file using CUDA compiler and finally the object files has been generated. Please check my previous post to know more about how to compile CUDA source in Visual Studio. CUDA follows C language constructs and rationales. The CUDA compiler will generate the object files which contains the functions and definitions of your CUDA program.

Let’s take a sample presented in a DDJ article CUDA, Supercomputing for the Masses: Part 2. The program increments the content of the array by one. Robb Farber has put an excellent effort to present CUDA in a simple manner in his high performance computing series on CUDA (Check DDJ). Please click on the link to know more about the program presented here.

[sourcecode language='cpp']
// incrementArray.cu
#include
#include
#include

void incrementArrayOnHost(float *a, int N)
{
int i;
for (i=0; i < N; i++) a[i] = a[i]+1.f;
}
__global__ void incrementArrayOnDevice(float *a, int N)
{
int idx = blockIdx.x*blockDim.x + threadIdx.x;
if (idx }

extern "C" void IncrementArray(void)
{
float *a_h, *b_h; // pointers to host memory
float *a_d; // pointer to device memory
int i, N = 10;
size_t size = N*sizeof(float);
// allocate arrays on host
a_h = (float *)malloc(size);
b_h = (float *)malloc(size);
// allocate array on device
cudaMalloc((void **) &a_d, size);
// initialization of host data
for (i=0; i // copy data from host to device
cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);
// do calculation on host
incrementArrayOnHost(a_h, N);
// do calculation on device:
// Part 1 of 2. Compute execution configuration
int blockSize = 4;
int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
// Part 2 of 2. Call incrementArrayOnDevice kernel
incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);
// Retrieve result from device and store in b_h
cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
// check results
for (i=0; i // cleanup
free(a_h); free(b_h); cudaFree(a_d);
}
[/sourcecode]

So the above program creates a simple CUDA program which increments the contents of each element in the array by one. What I changed from the original source code is, I made the main function as a new function which follows the “C” language rules.[that’s why I put extern “C” in front of the code. Even if you don’t add extern “C” it will work fine with your C++ compiler.
Instead of including the CUDA file by #include preprocessor, it’s better to define it as an external function by extern keyword.So your main program may appear as follows.

[sourcecode language='cpp']
// IntegrationWithCPP.cpp : Defines the entry point for the console application.
//

#include “stdafx.h”

// Forward declare the function
extern “C” void IncrementArray();

int _tmain(int argc, _TCHAR* argv[])
{
IncrementArray();
return 0;
}
[/sourcecode]

Finally link the program with cudart.lib cudartd.lib(debug) or cudart.lib(release) and enjoy your program!!!

That’s it. One thing I noticed is that, even if I define the main functions in CUDA file and my C++ file, I’m not getting any error from the linker. The linker gives error only if the main function in CUDA and C++ file having same prototype. Otherwise the version in the CUDA file will be called (from my experience so far).

Sharing my thoughts...

Categories: C++, CUDA, Code, GPGPU, Misc, Tips, Visual Studio Tags: , , ,

How to Integrate CUDA with Visual C++

September 26th, 2008 Sarath Comments

If you want to program CUDA, first you will have to install latest nVidia CUDA Driver for your Graphics Hardware which supports desired CUDA version. Then you will have to install the CUDA Toolkit which includes the CUDA Compiler, Include file, lib file and binary files to develop your CUDA application.

The bin folder under your CUDA installation location (most probably C:\CUDA), you can see nvcc.exe which helps you to compile the CUDA program. If you give, nvcc my_cuda_filename.cu, the compiler will compile the source file and creates the executables (a.exe).

But in Windows world, most of the developers are much satisfied with the IDE Visual Studio. So may have to leave the world of command line compilation and source editing in favor of improving our productivity. If we can integrate the CUDA development to Visual Studio IDE, that’s pretty nice no? In one of my previous post, I said about enabling syntax highlighting for CUDA files under Visual Studio.

Now let’s check how we can support CUDA compilation under Visual Studio. CUDA compiler has a dependency with C++ compiler. It supports either Visual C++ 7.1 or 8.0. Currently it doesn’t support Visual C++ 9 (VS 2008).

Method 1 – Install CUDA Build Rule for Visual Studio 2005

There’s a painless method by installing a custom build rule for Visual Studio 2005 developed by JaredHoberock. A cool installer is available to do the necessary settings. When you add the “.cu” files to your application, it shows a dialog box to select the rule you installed. Press “OK” button.

Select CUDA Build Rule

Select CUDA Build Rule

You can control the per-file (.cu file) configuration for your application by selecting the properties of the file from your solution explorer. See the figure below.

Detailed Build Configuration under Wizard

Detailed Build Configuration under Wizard

The installer also adds new Project Type to create your new CUDA project which does the necessary settings for your application.

Wizard to create new CUDA Project

Wizard to create new CUDA Project

Finally you can put the required CUDA libraries (e.g CUDA Runtime library – cudart.lib) in Project->Properties->Linker->Input->Additional Dependency.

I think the installer won’t work for Visual Studio 2003. I think you can manually copy the rule file from $:\Program Files\Microsoft Visual Studio 8\VC\VCProjectDefaults\cuda.rules and specify this as the rule file for compilation. Anyway try it yourself.

Method 2 – Manually Configure by Custom Build Event

Without Visual Studio, we can use nvcc compiler for compiling the CUDA files with various command line parameters. So the same setup we can use with Visual Studio as well. One of the advantages is that, we can use various Visual Studio Specific variables to identify various file locations.

 

Custom Build

Custom Build

 

1. Select the CUDA source file from the solution explorer and take properties.

2.  Select “Custom Build Setup” from the tree

3.  You can select the active configuration from “Configuration” combo box.

4. Specify the following options under “Command Line”

“$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/RTC1,/MTd -I”$(CUDA_INC_PATH)” -I./ -o $(ConfigurationName)\<your_cu_filename>.obj <your_cu_filename>.cu

The above configuration is to compile the source under debug configuration. For a release configuration, you will have to remove the symbol definition, _DEBUG (-D switch is to define symbols). Also the above command line ask to link with the static-debug version of C-Runtime library by specifying /MTd option. In the case of release build you will have to modify it to /MD or /MT. Please check the various C-Runtime versions from MSDN and apply accordingly. Thus a sample command line for release version may look as follows. 

“$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/RTC1,/MD -I”$(CUDA_INC_PATH)” -I./ -o $(ConfigurationName)\<your_cu_filename>.obj <your_cu_filename>.cu

If you need to add Device Emulation option, you add -deviceemu switch in the command line. Thus you the compiler will generate code for GPGPU Emulation library. You can understand more about each switches from MSDN itself. The information about the other variables used has been specified at the end of this post.

5. You can specify your own description for the custom build event under “Description” text box. (See the figure)

6. Finally you have to specify the output file name under “Outputs”

$(ConfigurationName)\<your_cu_filename>.obj

7. If you  have included any CUDA files (probably will be having extension .cu or .cuh), you can specify it in the “Additional Dependencies option

In the command line string, we’ve used some visual studio specific build macros. You can see the entire list of Visual Studio Build Macros in MSDN.

8. Finally build your project and now see the output of CUDA compilation in your output Window.

Adding a new build configuration

Sometimes you have to build your file with CUDA emulation for many debugging purpose.  Please check MSDN to know more about adding build configuration.

I’ll explain how to call a CUDA Program from a C++ file in the next post. Thanks for your patience. I’m a newbie in CUDA. If you’ve better ideas and suggestions, please share with me through your comments.

 

Sharing my thoughts...

Categories: C++, Code, Softwares, Tips Tags: , , ,

How to Get Properties of your CUDA Device?

September 22nd, 2008 Sarath Comments

If you’re a going into High Performance Computing (HPC), there’s a very rare chance to miss the name CUDA. I’m just getting into that. nVdia GeForce 8 Series + hardware supports CUDA programming. You might be interested in various parameters of CUDA. From the version to number of multi processors, size of shared memory, global memory etc… How you can get the details of your CUDA hardware? Interestingly you can have multiple GPU in your machine. So how can we enumerate the properties of your CUDA hardware? Here’s the sample CUDA snippet for you. Save in your disk and compile it using nVidia CUDA compiler (nvcc.exe coming with CUDA Toolkit)

[sourcecode language='cpp']
Device properties of CUDA Hardware
#include
#include
#include

void DisplayProperties( cudaDeviceProp* pDeviceProp )
{
if( !pDeviceProp )
return;

printf( “\nDevice Name \t – %s “, pDeviceProp->name );
printf( “\n**************************************”);
printf( “\nTotal Global Memory\t\t -%d KB”, pDeviceProp->totalGlobalMem/1024 );
printf( “\nShared memory available per block \t – %d KB”, pDeviceProp->sharedMemPerBlock/1024 );
printf( “\nNumber of registers per thread block \t – %d”, pDeviceProp->regsPerBlock );
printf( “\nWarp size in threads \t – %d”, pDeviceProp->warpSize );
printf( “\nMemory Pitch \t – %d bytes”, pDeviceProp->memPitch );
printf( “\nMaximum threads per block \t – %d”, pDeviceProp->maxThreadsPerBlock );
printf( “\nMaximum Thread Dimension (block) \t – %d %d %d”, pDeviceProp->maxThreadsDim[0], pDeviceProp->maxThreadsDim[1], pDeviceProp->maxThreadsDim[2] );
printf( “\nMaximum Thread Dimension (grid) \t – %d %d %d”, pDeviceProp->maxGridSize[0], pDeviceProp->maxGridSize[1], pDeviceProp->maxGridSize[2] );
printf( “\nTotal constant memory \t – %d bytes”, pDeviceProp->totalConstMem );
printf( “\nCUDA ver \t – %d.%d”, pDeviceProp->major, pDeviceProp->minor );
printf( “\nClock rate \t – %d KHz”, pDeviceProp->clockRate );
printf( “\nTexture Alignment \t – %d bytes”, pDeviceProp->textureAlignment );
printf( “\nDevice Overlap \t – %s”, pDeviceProp-> deviceOverlap?”Allowed”:”Not Allowed” );
printf( “\nNumber of Multi processors \t – %d”, pDeviceProp->multiProcessorCount );
}

int main(void)
{
cudaDeviceProp deviceProp;
int nDevCount = 0;

cudaGetDeviceCount( &nDevCount );
printf( “Total Device found: %d”, nDevCount );
for (int nDeviceIdx = 0; nDeviceIdx < nDevCount; ++nDeviceIdx )
{
memset( &deviceProp, 0, sizeof(deviceProp));
if( cudaSuccess == cudaGetDeviceProperties(&deviceProp, nDeviceIdx))
DisplayProperties( &deviceProp );
else
printf( "\n%s", cudaGetErrorString(cudaGetLastError()));
}
}
[/sourcecode]

Technorati Tags: ,,,

 

Sharing my thoughts...

Categories: CUDA, Code, GPGPU, Tips Tags: ,