flat assembler
Message board for the users of flat assembler.

Running code on GPU

r22



Joined: 27 Dec 2004
Posts: 805
Has anyone here tried to run non-graphics code on the GPU?
I.e., arbitrary math calculations.

I don't know modern GPU assembly, but I assume that if I did, I'd be able to write a shader in GPU asm and then use something like DX9CompileVertexShader to turn it into machine code. But I don't know how the input and output work: once it's done with the calculations, how would I retrieve the results?

If you google this topic you get a bunch of USELESS stuff about HIGH-LEVEL shader languages: Cg (DX/GL), HLSL (DX), Brook stream programming in C, Sh (GL), and advertisements for books (all useless).

If anyone's searching skills are better than mine, or you have a simple example program that (for example) accumulates all the single-precision FP values in an array on the GPU and returns the result, I would be very interested in your reply.
Post 26 Jun 2005, 19:14
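For the array-summing case asked about above, GPU code usually does not accumulate serially; a common pattern is a pairwise (tree) reduction, where every pass halves the number of partial sums. The following is only a CPU-side sketch of that pattern in Python, not GPU code; the function name `tree_sum` is made up for illustration.

```python
def tree_sum(values):
    """Sum values by pairwise (tree) reduction.

    Each pass halves the number of partial sums. On a GPU, every addition
    inside one pass is independent of the others, so a whole pass could run
    in parallel; here the passes are simply simulated one after another.
    """
    partial = list(values)
    if not partial:
        return 0.0
    while len(partial) > 1:
        if len(partial) % 2:
            # Odd count: pad with the additive identity so pairs line up.
            partial.append(0.0)
        half = len(partial) // 2
        # One "parallel step": element i adds its partner at i + half.
        partial = [partial[i] + partial[i + half] for i in range(half)]
    return partial[0]


if __name__ == "__main__":
    data = [float(i) for i in range(1, 9)]  # 1.0 .. 8.0
    print(tree_sum(data))                   # 36.0
```

With n values this needs about log2(n) passes instead of n serial additions, which is why the pattern suits hardware with many simple arithmetic units.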
farrier



Joined: 26 Aug 2004
Posts: 274
Location: North Central Mississippi

_________________
Some Assembly Required
It's a good day to code!
U.S.Constitution; Bill of Rights; Amendment 1:
... the right of the people peaceably to assemble, ...
The code is dark, and full of errors!
Post 26 Jun 2005, 20:37
THEWizardGenius



Joined: 14 Jan 2005
Posts: 382
Location: California, USA
Be careful, as not all GPUs are compatible. Some people might not even have a GPU, so your code won't work for everyone unless you do it right. Still, it's a good idea.
Post 27 Jun 2005, 01:07
Micah II



Joined: 04 Jun 2005
Posts: 5
As a C++ user with a little OpenGL experience, I believe it's possible to do some minor computational work on current GPUs, though most of that is still related to 3D maths, i.e. rotations and translations (and thus sine and cosine).

As for programming the GPU for other purposes, I don't think anyone really has enough information about it, since NVIDIA and ATI keep their GPUs' secrets very well hidden.

And THEWizardGenius is right: for example, the latest GeForce 6 models use the NV40 core, I believe, whereas the chip used in the Xbox has an NV2A core with a different architecture and thus requires different ASM sources. That makes it more logical to let the graphics card producers write the nitty-gritty stuff like HLSL or Cg, and to code your own things in a high-level language.

Simply put, ASM for GPUs wouldn't make any sense IMO.
Post 27 Jun 2005, 15:30
r22



Joined: 27 Dec 2004
Posts: 805
Agreed, sadly. After doing a bit more research, there's no real low-level approach to running code on the GPU.
Because of the different architectures you need a runtime that compiles the correct code.
For instance, the Brook general-purpose GPU project uses streams and kernel functions that are converted to HLSL, and the runtime that compiles the code to GPU machine code is put inside the executable.

It's a shame that the greatest speed optimization requires such a high level approach.
Post 27 Jun 2005, 21:45
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
I was thinking about this subject recently. I had no luck with OpenGL (only stupid GLSL stuff and things like "the ARB group decided not to publish the bytecodes for OGL 2.0"). With DirectX, however, I found this: http://msdn2.microsoft.com/en-us/library/bb219840.aspx

The DDK has documentation about the bytecodes, as that document says. Unfortunately I have no time to even try writing bytecodes by hand and testing them.

Regardless of the possibility of doing non-graphics stuff, I want to write macros to have some kind of shader assembler, at least for DirectX (though I would much prefer OpenGL).

About collecting results, I think you can capture the screen after using some pixel shaders (not sure how to capture the result when using vertex shaders).

This is the best thing I found for OpenGL: http://cs.anu.edu.au/~Hugh.Fisher/shaders/lowlevel.html . However, it uses the oldest shader version and plain-text assembly shaders instead of bytecode.

I hope you can get this to work and share it here
Post 12 Jun 2007, 18:22
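The "run a pixel shader, then capture the result" idea above can be pictured as three steps: lay the input array out as a 2D texture, run a small function once per texel into a render target, and read the target back into a flat array. The following Python sketch only simulates that data flow on the CPU; it calls no graphics API, and the names `pack_texture`, `render_pass`, `read_back` and `square_shader` are invented for illustration.

```python
def pack_texture(values, width):
    """Lay a flat list out as rows of `width` texels, zero-padding the last row."""
    padded = list(values) + [0.0] * (-len(values) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def render_pass(texture, pixel_shader):
    """Run pixel_shader once per texel and return the resulting 'render target'."""
    return [
        [pixel_shader(texture, x, y) for x in range(len(row))]
        for y, row in enumerate(texture)
    ]


def read_back(target, count):
    """Flatten the render target and drop the padding texels."""
    flat = [value for row in target for value in row]
    return flat[:count]


def square_shader(texture, x, y):
    """Example per-texel computation: square the input value."""
    value = texture[y][x]
    return value * value


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    texture = pack_texture(data, width=4)
    target = render_pass(texture, square_shader)
    print(read_back(target, len(data)))  # [1.0, 4.0, 9.0, 16.0, 25.0]
```

In a real setup the read-back step would be whatever the graphics API offers for copying a render target to system memory; which call that is depends on the API and is left open here.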
zir_blazer



Joined: 05 Dec 2006
Posts: 66
What about this?
http://arstechnica.com/news.ars/post/20070219-8878.html

It's a C-based extension for using NVIDIA GPUs for calculations, but it's pretty close to what you want. ATI has also disclosed some data for using their GPUs directly from assembly language, but as I am not a developer I didn't search around for it.
Post 12 Jun 2007, 21:11
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
I read about CUDA, but it's intended for the GeForce 8800. The processing capability is tremendous, but it requires an excessively new graphics card...
Post 12 Jun 2007, 21:57
zir_blazer



Joined: 05 Dec 2006
Posts: 66
LocoDelAssembly wrote:
I read about CUDA, but it's intended for the GeForce 8800. The processing capability is tremendous, but it requires an excessively new graphics card...

According to this: http://forums.nvidia.com/index.php?s=c496218c5d508bece0ac9292c7968023&showtopic=28492
it applies to the whole GeForce 8xxx line. That means that a GeForce 8300GT could work with the code, as might an IGP based on that architecture.
Post 13 Jun 2007, 02:20
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Just an update: my link doesn't work anymore (there is a redirect to the Asm Shader Reference, but it doesn't provide the information needed to make your own assembler). The information about the instruction encoding is here now.
Post 28 Dec 2009, 19:48
MattBro



Joined: 08 Nov 2003
Posts: 37
GPU computing, and more generally heterogeneous computing, is the wave of the future. AMD has their Fusion vision, which fuses their CPU and GPU efforts, as Intel is doing in some ways with their upcoming Larrabee (though it's been canceled as a gaming platform).

For my purposes I desperately need all the performance I can get. I've got problems that take days to compute. These days that means taking advantage of NVIDIA's CUDA development environment or, even better, using OpenCL. OpenCL is basically a C99 compiler that compiles code for various GPU/CPU devices. It's embraced by Apple, NVIDIA, AMD and supposedly Intel (though Intel hasn't released any drivers and maybe never will).
http://www.khronos.org/opencl/

Here's the weird part. The C99 code is compiled on the fly. The reason is that all these different devices have entirely different architectures and machine opcodes. In fact NVIDIA has different machine instruction sets for different graphics cards. It only unifies at the much higher level of abstraction, the OpenCL code. That probably means that a lot of the performance of hand-optimized assembly is lost, even though performance is critical for these applications. I personally think OpenCL should have agreed upon a low-level virtual machine and assembler, rather than C99, and then they could supply the compilers to that target.

They are using the LLVM infrastructure for this anyway. Maybe there are GPU back ends for LLVM. Wouldn't it be nice if there were a FASM for heterogeneous computing? That's probably more like a high-level assembler project though.

_________________
-- -------------------------------------------------------
"I am the Way and the Truth and the Light, no one comes to the Father except through me" - Jesus
---------------------------------------------------------
Post 31 Dec 2009, 08:08
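The "compiled on the fly" model described above means the program ships its kernel as source text and builds it at run time for whatever device is present (in OpenCL that is the job of `clCreateProgramWithSource` and `clBuildProgram`). The sketch below is only an analogy in plain Python, using the built-in `compile` and `exec`; it does not use OpenCL, and `KERNEL_SOURCE`, `build_kernel` and `run_kernel` are hypothetical names.

```python
# Kernel shipped as text inside the program, not as machine code.
KERNEL_SOURCE = """
def scale(x, factor):
    return x * factor
"""


def build_kernel(source, name):
    """Compile kernel source at run time and return the named function."""
    namespace = {}
    exec(compile(source, "<kernel>", "exec"), namespace)
    return namespace[name]


def run_kernel(kernel, data, *args):
    """Apply the kernel to every element, as a device would do per work-item."""
    return [kernel(x, *args) for x in data]


if __name__ == "__main__":
    scale = build_kernel(KERNEL_SOURCE, "scale")
    print(run_kernel(scale, [1.0, 2.0, 3.0], 2.0))  # [2.0, 4.0, 6.0]
```

The trade-off is the one raised in the post: the shipped text stays portable across devices, but the final machine code is decided by the run-time compiler rather than by the programmer.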
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Thing is, would targeting assembly for a "virtual" CPU be a good idea? You'd probably be stuck with either some pretty high-level assembly that still needs to be translated to The Real Deal (and with less leeway to do so, because you're working on a lower abstraction level), or you'd end up with a lowest-common-denominator that can't take advantage of everything...
Post 31 Dec 2009, 12:24
r22



Joined: 27 Dec 2004
Posts: 805
All we need is for Intel/AMD to make a chip with just SSE/2/3/4 compatibility and 32+ cores. Intel tried with Larrabee but ended up canceling it due to performance issues (probably SSE not having trig functions sin/cos etc.).
However, Larrabee was compatible with the standard x86 instruction set as well, which added unneeded overhead for a companion processor. The inefficient encoding of SSE (because it's an extension) probably doesn't help matters either, and the die size for 32+ cores would probably be prohibitively large.

MOV rax, [RANT]
RET 0
Post 31 Dec 2009, 14:15
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Do you reckon 32 x86-like cores would be enough to reach computational power of today's GPUs?
Post 31 Dec 2009, 14:34
Borsuc



Joined: 29 Dec 2005
Posts: 2466
Location: Bucharest, Romania
They would do around 3.2 teraflops, so yeah, quite a bit faster than today's GPUs.
Post 31 Dec 2009, 15:35
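The thread does not show how the 3.2 teraflops figure was reached. A rough way to sanity-check such numbers is the usual peak estimate: cores × SIMD lanes × floating-point operations per lane per cycle × clock rate. The sketch below uses that formula; every configuration value in it (lane counts, operations per cycle, clock speeds) is an assumption chosen for illustration, not a specification of any real chip.

```python
def peak_gflops(cores, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak in GFLOPS; real code rarely gets close to this."""
    return cores * simd_lanes * flops_per_lane_per_cycle * clock_ghz


def clock_needed_ghz(target_gflops, cores, simd_lanes, flops_per_lane_per_cycle):
    """Clock rate required to hit target_gflops with the given configuration."""
    return target_gflops / (cores * simd_lanes * flops_per_lane_per_cycle)


if __name__ == "__main__":
    # Assumed: 32 cores, 4-wide single-precision SSE, one multiply plus one
    # add per lane per cycle, 3 GHz.
    print(peak_gflops(32, 4, 2, 3.0))          # 768.0

    # Assumed: 32 cores with a hypothetical 16-lane vector unit and a fused
    # multiply-add (counted as 2 flops) per lane per cycle, 2 GHz.
    print(peak_gflops(32, 16, 2, 2.0))         # 2048.0

    # Clock the second configuration would need to reach 3.2 TFLOPS.
    print(clock_needed_ghz(3200.0, 32, 16, 2))  # 3.125
```

Under these assumptions, 32 cores with 4-wide SSE stay below 1 TFLOPS, and reaching 3.2 TFLOPS takes both much wider vector units and a clock above 3 GHz.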
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
Quote:

Thing is, would targeting assembly for a "virtual" CPU be a good idea?

Well, I don't know how CUDA interfaces with the hardware, but with DirectX it is highly probable that when you compile your HLSL code you actually get shader assembly bytecode rather than the GPU's native opcodes. Because of that, I think you could still do better at times using shader assembly instead of a high-level language (in the same way as with Java and .NET: the JIT could be good, but it is better to pre-optimize than to trust that it will also optimize your pseudo-code).
Post 31 Dec 2009, 19:25
MattBro



Joined: 08 Nov 2003
Posts: 37
r22 wrote:
All we need is for Intel/AMD to make a chip with just SSE/2/3/4 compatibility and 32+ cores. Intel tried with Larrabee but ended up canceling it due to performance issues (probably SSE not having trig functions sin/cos etc.).
However, Larrabee was compatible with the standard x86 instruction set as well, which added unneeded overhead for a companion processor. The inefficient encoding of SSE (because it's an extension) probably doesn't help matters either, and the die size for 32+ cores would probably be prohibitively large.

MOV rax, [RANT]
RET 0


Yes, if you are going to make a lot of cores, why even have SSE units? IMHO the architecture for SSE is completely misguided. I'd like to use the word retarded, but I know they've got some smart engineers at Intel. If you take a look at the way the GPU vendors have handled SIMD, you'll see why SSE seems stupid. Packing several floats/ints into one register and then dealing with all the packing and unpacking permutations has created a mess that very few compiler vendors can deal with, let alone poor assembler programmers.

It's far better to make a lot of cheap, lower-complexity cores than a few high-complexity ones that are difficult to program. The success of CUDA has shown that you can still do a lot without the use of semaphores or complex data interdependencies between cores. Of course the problem with x86 or x64 is that for floating point you are stuck with either the old floating-point stack or SSE. Not a lot of good choices there in terms of leveraging your silicon.

Post 01 Jan 2010, 08:43
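The packing complaint above can be made concrete. With packed SIMD, data stored as (x, y, z) records first has to be rearranged so that one register holds the same component of several points; in CUDA's model each thread is instead written as scalar code on one element. The Python sketch below only imitates the two styles on the CPU. `LANES = 4` assumes four single-precision floats per 128-bit register, and all function names are made up for illustration.

```python
LANES = 4  # assumed: four single-precision floats per 128-bit register


def aos_to_soa(points):
    """Transpose [(x, y, z), ...] into separate x, y and z lists (the repacking step)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    return xs, ys, zs


def packed_lengths_squared(points):
    """Compute x*x + y*y + z*z, handling LANES points per simulated register op."""
    xs, ys, zs = aos_to_soa(points)
    out = []
    for base in range(0, len(points), LANES):
        # Each slice stands in for one packed register of up to LANES values.
        x = xs[base:base + LANES]
        y = ys[base:base + LANES]
        z = zs[base:base + LANES]
        out.extend(a * a + b * b + c * c for a, b, c in zip(x, y, z))
    return out


def per_thread_length_squared(point):
    """Scalar-per-thread style: one point in, one result out, no repacking."""
    x, y, z = point
    return x * x + y * y + z * z


if __name__ == "__main__":
    pts = [(1.0, 2.0, 2.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0),
           (1.0, 1.0, 1.0), (2.0, 0.0, 0.0)]
    print(packed_lengths_squared(pts))                  # [9.0, 25.0, 1.0, 3.0, 4.0]
    print([per_thread_length_squared(p) for p in pts])  # same values
```

Both paths give the same results; the difference is who does the bookkeeping. In the packed style the programmer or compiler handles the transposition and the partial last group, while in the per-thread style the hardware groups the scalar threads.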
MattBro



Joined: 08 Nov 2003
Posts: 37
f0dder wrote:
Thing is, would targeting assembly for a "virtual" CPU be a good idea? You'd probably be stuck with either some pretty high-level assembly that still needs to be translated to The Real Deal (and with less leeway to do so, because you're working on a lower abstraction level), or you'd end up with a lowest-common-denominator that can't take advantage of everything...


Perhaps you are correct. It may be that the ambitions of OpenCL can only be unified at the higher level of abstraction afforded by the use of C. Nevertheless I do note that the use of llvm implies a virtual machine assembler as a target of the C code. The APIs that set up the "kernels" that are run as multiple threads on the GPU device would still call appropriate library routines, and that wouldn't really change with a virtual machine either.

Post 01 Jan 2010, 08:48
Copyright © 1999-2020, Tomasz Grysztar.