working CUDA example

tthsqe

Joined: 20 May 2009
Posts: 767

tthsqe 19 Aug 2011, 23:34

I finally tried CUDA in fasm, and it does work.
This simple test program shows how accurate the approximate log2 function on the GPU is compared with the CPU.

Note: The program posted 8 posts down seems to work better.

Attachment: CUDA.zip (7.95 KB)

Last edited by tthsqe on 02 Dec 2013, 08:07; edited 2 times in total

tthsqe

Joined: 20 May 2009
Posts: 767

tthsqe 20 Aug 2011, 12:53

If you have a CUDA-enabled GPU and this doesn't display a table of values, please let me know what error you get.

sinsi

Joined: 10 Aug 2007
Posts: 794
Location: Adelaide

sinsi 21 Aug 2011, 10:32

error code dec:999

Does a GTX580 support CUDA? I assume so.

edit: fails at cleanup 'cuMemFree'

ctl3d32

Joined: 30 Dec 2009
Posts: 206
Location: Brazil

ctl3d32 21 Aug 2011, 13:35

Works fine for me: GT540M

tthsqe

Joined: 20 May 2009
Posts: 767

tthsqe 21 Aug 2011, 16:22

sinsi,
It seems strange that everything would work except the cleanup.
Error code 999 is CUDA_UNKNOWN_ERROR in cuda.inc.
Try taking out jnz Error and see whether the table is computed correctly.
Also, I hate to ask, but which driver version are you using?
The GTX580 does support CUDA; that is what I tested it on.
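
For reference, a few driver API return codes can be decoded with a small lookup. This is only a sketch: the numeric values below are the ones in NVIDIA's cuda.h, where 999 is spelled CUDA_ERROR_UNKNOWN, and the helper name describe_cuda_error is made up for illustration.

```python
# A handful of CUresult values as defined in NVIDIA's cuda.h (not the full list).
CUDA_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    4: "CUDA_ERROR_DEINITIALIZED",
    999: "CUDA_ERROR_UNKNOWN",
}


def describe_cuda_error(code):
    """Hypothetical helper: map a driver API return code to a readable name."""
    return CUDA_RESULT_NAMES.get(code, "unlisted CUresult %d" % code)


print(describe_cuda_error(999))
```

A code of 999 only says that the driver could not classify the failure, so the name alone does not explain why cuMemFree is the call that reports it.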

sinsi

Joined: 10 Aug 2007
Posts: 794
Location: Adelaide

sinsi 21 Aug 2011, 23:36

Driver is 280.26. The first column is 1 to 20, the second is all 0, and the third is 0 to 4.321.
OS is Windows 7 Pro x64, CPU is AMD Phenom II X6 1100T.
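
A quick way to sanity-check a table like this is to rebuild the CPU column and measure how far the other column is from it. The sketch below assumes the second column is the GPU result and the third is the CPU's log2(n) for n = 1..20; the function names are hypothetical and nothing here touches the GPU.

```python
import math


def reference_log2(count=20):
    """CPU column: log2(n) for n = 1..count."""
    return [math.log2(n) for n in range(1, count + 1)]


def max_abs_error(gpu_column, cpu_column):
    """Largest absolute difference between the two columns."""
    return max(abs(g - c) for g, c in zip(gpu_column, cpu_column))


cpu = reference_log2()
gpu = [0.0] * len(cpu)  # stand-in for the all-zero column reported above

print("last CPU value: %.4f" % cpu[-1])
print("max |gpu - cpu|: %.4f" % max_abs_error(gpu, cpu))
```

With an all-zero GPU column, the maximum error equals the largest CPU value, which suggests the kernel's results never reached the output buffer rather than being slightly inaccurate.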

Kuemmel

Joined: 30 Jan 2006
Posts: 200
Location: Stuttgart, Germany

Kuemmel 22 Aug 2011, 16:14

Works fine here on my GTX260. Great stuff! As far as I understand, this is NVIDIA's CUDA syntax, so I guess AMD uses a different one.

Wasn't there a common shader language, something like GLSL? Couldn't that be used with the same effort, to avoid having to write code for both companies?

f0dder

Joined: 19 Feb 2004
Posts: 3174
Location: Denmark

f0dder 22 Aug 2011, 16:38

Kuemmel wrote:

Wasn't there a common shader language, something like GLSL? Couldn't that be used with the same effort, to avoid having to write code for both companies?

DirectCompute :)

tthsqe

Joined: 20 May 2009
Posts: 767

tthsqe 23 Aug 2011, 00:52

@sinsi
The only relevant difference between our systems is the driver version. I have 275.33, and I have had problems with newer ones in the past. You can see exactly how it is failing with this:

Attachment: CUDA.zip (9.91 KB)

sinsi

Joined: 10 Aug 2007
Posts: 794
Location: Adelaide

sinsi 23 Aug 2011, 01:10

OK, the last one works fine.

tthsqe

Joined: 20 May 2009
Posts: 767

tthsqe 23 Aug 2011, 01:42

What?
OK. This is further proof that cuParamSeti sucks and cuParamSetv is the way to go; that is the only thing I changed.
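
The thread does not say why cuParamSeti failed here. One plausible explanation, offered only as an assumption, is pointer width: in the driver API cuParamSeti takes an unsigned int, so it writes 4 bytes, while cuParamSetv copies however many bytes the caller supplies, which matters when a 64-bit process passes an 8-byte device pointer. The sketch below models the kernel's parameter block with a plain byte buffer; param_seti, param_setv, and the device address are made up for illustration, and nothing here calls the real driver.

```python
import struct


def param_seti(buf, offset, value):
    """Model of a cuParamSeti-style write: exactly 4 bytes, little-endian."""
    struct.pack_into("<I", buf, offset, value & 0xFFFFFFFF)


def param_setv(buf, offset, data):
    """Model of a cuParamSetv-style write: copy the caller's bytes verbatim."""
    buf[offset:offset + len(data)] = data


device_ptr = 0x0000000500200000  # hypothetical 64-bit device address

via_seti = bytearray(8)
param_seti(via_seti, 0, device_ptr)  # only the low 32 bits are stored

via_setv = bytearray(8)
param_setv(via_setv, 0, struct.pack("<Q", device_ptr))  # all 8 bytes are stored

print(hex(struct.unpack("<Q", via_seti)[0]))
print(hex(struct.unpack("<Q", via_setv)[0]))
```

If the kernel reads an 8-byte pointer from that slot, the 4-byte write hands it a different address whenever the upper half is nonzero, which would be consistent with results that vary by driver version.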

gunblade

Joined: 19 Feb 2004
Posts: 209

gunblade 11 Jan 2012, 03:52

Sorry for reviving a (slightly) old thread, but I got an Nvidia GTS 450 not long ago and was keen to test out CUDA. However, I use Linux as my primary (and only) system, so I decided to port your code to 64-bit Linux to see if it would work. (I hope you don't mind. If you do, let me know and I'll take it down. :) )

I've attached the code. cuda.inc hasn't changed; api_cuda.inc has been changed to use ELF64 "extrn" syntax rather than the import mechanism that the Windows version uses; and the main cudatest.asm has been completely rewritten to work on 64-bit Linux.
api_cuda.inc actually changed more than that: I dumped the list of function names from the CUDA library and generated the include file by declaring every available function as extrn. There are fewer functions in the Linux CUDA library than in the Windows one (no Direct3D functions, obviously).

I uploaded the archive as a .tar.bz2. It should extract fine with tar -xvf cuda.tar.bz2 or with a GUI archive program (if you insist on using the GUI :P). Inside are the code, etc., and a makefile. Typing make should build it. The second stage may vary depending on the system. I'm pretty sure the location and name of the dynamic linker are quite standard, but if yours differs from mine (/lib/ld-linux-x86-64.so.2), you can change it in the makefile.

Thanks a lot, tthsqe, for the original code. You've saved me from using C/C++ and Nvidia's big SDK/compiler. Now I can have fun writing some CUDA code in assembly (under Linux, of course). :D

- Speaking of which, where did you get the syntax for the assembler used in your PTX function, the one that is actually assembled/run on the GPU? I looked up some of the CUDA documentation on the Nvidia site, but it was mainly reference material for the functions in the CUDA library rather than for the CUDA language itself. :P

Attachment: cuda.tar.bz2 (7.22 KB), 64-bit Linux version of tthsqe's cudatest application.

Tyler

Joined: 19 Nov 2009
Posts: 1215
Location: NC, USA

Tyler 11 Jan 2012, 05:49

Kuemmel wrote:

Wasn't there a common shader language, something like GLSL? Couldn't that be used with the same effort, to avoid having to write code for both companies?

OpenCL does that.

LocoDelAssembly

Joined: 06 May 2005
Posts: 4623
Location: Argentina

LocoDelAssembly 11 Jan 2012, 15:27

Quote:

Speaking of which, where did you get the syntax for the assembler used in your PTX function, the one that is actually assembled/run on the GPU?

Maybe from "PTX: Parallel Thread Execution ISA"? If I remember right, though, PTX is not what actually runs on the GPU; it is some sort of "Java assembly". (Still, it may be faster than using HLL code. At least in the DirectX world, drivers receive assembled code, not HLL code for them to compile.)

gunblade

Joined: 19 Feb 2004
Posts: 209

gunblade 11 Jan 2012, 15:40

Ah, thanks for the link, Loco. I know the "assembly" code is probably not the code that's actually executed on the card, but it's probably as close as we'll get. :D

ohara

Joined: 13 Oct 2006
Posts: 20

ohara 11 Apr 2012, 18:56

Fantastic!
I have this working on a GeForce 9400 GT.
One thing I wondered: as I increase the memory arrays past 10 MB, the size shows up in the .exe file. Putting the data last does not seem to work as it does in 32-bit fasm. Does anyone know how I can make the .exe file much smaller?

ohara

Joined: 13 Oct 2006
Posts: 20

ohara 12 Apr 2012, 11:03

Ah, found the answer: you must put assigned (initialized) data before unassigned data, so that the unassigned data comes at the end. The exe file is 5K now.
