flat assembler
Message board for the users of flat assembler.
Running code on GPU
farrier 26 Jun 2005, 20:37
THEWizardGenius 27 Jun 2005, 01:07
Be careful, as not all GPUs are compatible. Some people might not even have a GPU, so your code won't work for everyone unless you do it right. Still, it's a good idea.
Micah II 27 Jun 2005, 15:30
As a C++ user with a little OpenGL experience, I believe it's possible to do some minor computational work on current GPUs, though most of it is still related to 3D math, i.e. rotations and translations (and thus sine() and cosine()).
As for programming the GPU for other purposes, I don't think anyone really has enough information about it, since nVIDIA and ATI keep their GPUs' secrets very well hidden. And THEWizardGenius is right: the latest GeForce 6 models use the NV40 core, I believe, whereas the chip used in the Xbox has an NV2A core with a different architecture, and thus requires different ASM sources. That makes it more logical to let the graphics card producers handle the nitty-gritty stuff behind languages like HLSL or Cg, while you code your things in a high-level language. Simply put, ASM for GPUs wouldn't make any sense IMO.
r22 27 Jun 2005, 21:45
Agreed, sadly. After doing a bit more research, there's no real low-level approach to running code on the GPU.
Because of the different architectures, you need a runtime that compiles the correct code. For instance, the Brook general-purpose GPU project uses streams and kernel functions that are converted to HLSL, and the runtime that compiles that code to GPU machine code is embedded in the executable. It's a shame that the greatest speed optimization requires such a high-level approach.
LocoDelAssembly 12 Jun 2007, 18:22
I was thinking about this subject recently. I had no luck with OpenGL (only stupid GLSL stuff and things like "the ARB decided not to publish the bytecodes for OGL 2.0"). With DirectX, however, I found this: http://msdn2.microsoft.com/en-us/library/bb219840.aspx
The DDK has documentation about the bytecodes, as that document says. Unfortunately I have no time to even try writing bytecodes by hand and testing them. Regardless of the possibility of doing non-graphics stuff, I want to write macros to have some kind of shader assembler, at least for DirectX (though I would much prefer OpenGL). As for collecting results, I think you can capture the screen after running some pixel shaders (not sure how to capture the result when using vertex shaders). This is the best thing I found for OpenGL: http://cs.anu.edu.au/~Hugh.Fisher/shaders/lowlevel.html . However, it uses the oldest shader version and plain-text assembly shaders instead of bytecode. I hope you can get this to work and share it here.
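To give an idea of what such hand-written bytecode looks like, here is a minimal sketch of a ps_2_0 token stream containing only the version and END tokens; the instruction and parameter tokens that would sit between them are omitted, and all values should be checked against the DDK documentation rather than taken from this sketch.
Code:
#include <stdint.h>

/* Bare ps_2_0 token stream: every D3D9 shader starts with a version token and
   ends with an END token; real instructions (an opcode token followed by its
   destination/source parameter tokens) would go in between.                  */
static const uint32_t ps_2_0_skeleton[] = {
    0xFFFF0200,   /* version token: 0xFFFF marks a pixel shader, version 2.0  */
    /* ... instruction tokens, encoded per the DDK docs, go here ...          */
    0x0000FFFF    /* END token terminating the shader                         */
};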
zir_blazer 12 Jun 2007, 21:11
What about this?
http://arstechnica.com/news.ars/post/20070219-8878.html It's a C-based extension for using the nVidia GPUs for calculations, but it's pretty close to what you want. ATI has also disclosed some data for programming their GPUs directly in assembly language, but as I am not a developer I didn't search around for it.
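To show what that C-based extension (CUDA) looks like in practice, here is a minimal sketch; the kernel and variable names are made up and all error checking is omitted.
Code:
#include <cuda_runtime.h>
#include <stdio.h>

// Each GPU thread squares one element of the array.
__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];
}

int main(void)
{
    const int n = 1024;
    float host[1024];
    for (int i = 0; i < n; i++) host[i] = (float)i;

    float *dev;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    square<<<(n + 255) / 256, 256>>>(dev, n);   // grid of 256-thread blocks

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("%f\n", host[10]);                   // prints 100.000000
    return 0;
}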
LocoDelAssembly 12 Jun 2007, 21:57
I read about CUDA, but it's intended for the GeForce 8800. The processing capability is tremendous, but it requires an excessively new graphics card...
zir_blazer 13 Jun 2007, 02:20
LocoDelAssembly wrote: I read about CUDA, but it's intended for the GeForce 8800. The processing capability is tremendous, but it requires an excessively new graphics card...
According to this, http://forums.nvidia.com/index.php?s=c496218c5d508bece0ac9292c7968023&showtopic=28492 , it applies to the whole GeForce 8xxx line. That means a GeForce 8300GT could work with the code, as might an IGP based on that architecture.
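A hedged sketch of letting the CUDA runtime report which installed cards are usable, instead of hard-coding a particular model; these are the documented device-query calls.
Code:
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);                 // number of CUDA-capable GPUs
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}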
LocoDelAssembly 28 Dec 2009, 19:48
Just an update: my link doesn't work anymore (it redirects to the Asm Shader Reference, which doesn't provide the information needed to make your own assembler). The information about the instruction encoding is here now.
MattBro 31 Dec 2009, 08:08
GPU computing, and more generally heterogeneous computing, is the wave of the future. AMD has their Fusion vision, which fuses their CPU and GPU efforts, as Intel is doing in some ways with their upcoming Larrabee (though it's been canceled as a gaming platform).
For my purposes I desperately need all the performance I can get; I've got problems that take days to compute. These days that means taking advantage of NVIDIA's CUDA development environment, or better yet, using OpenCL. OpenCL is basically a C99 compiler that compiles code to various GPU/CPU devices. It's embraced by Apple, NVIDIA, AMD and supposedly Intel (though Intel hasn't released any drivers and maybe never will). http://www.khronos.org/opencl/
Here's the weird part: the C99 code is compiled on the fly. The reason is that all these different devices have entirely different architectures and machine-code ops. In fact NVIDIA has different machine instruction sets for different graphics cards. Everything only unifies at a much higher level of abstraction, the OpenCL code. That probably means a lot of the performance of hand-optimized assembly is lost, even though performance is critical for these applications.
I personally think OpenCL should have agreed on a low-level virtual machine and assembler, rather than C99, and then the vendors could supply the compilers for that target. They are using the LLVM infrastructure for this anyway; maybe there are GPU back ends for the LLVM code. Wouldn't it be nice if there were a FASM for heterogeneous computing? That's probably more like a high-level assembler project, though.
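For what it's worth, NVIDIA already ships something close to that low-level virtual machine and assembler: PTX, a device-independent instruction set that the driver translates into each card's native machine code. A hedged sketch of reaching it from CUDA via inline PTX (syntax as described in NVIDIA's inline-PTX documentation; verify before relying on it):
Code:
// Adds 1.0 to every element; the add itself is written as a PTX instruction.
__global__ void add_one(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i], one = 1.0f, r;
        // add.f32 is PTX, the virtual ISA; the driver translates it to the
        // GPU's real opcode when the module is loaded.
        asm("add.f32 %0, %1, %2;" : "=f"(r) : "f"(v), "f"(one));
        x[i] = r;
    }
}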
f0dder 31 Dec 2009, 12:24
Thing is, would targeting assembly for a "virtual" CPU be a good idea? You'd probably be stuck with either some pretty high-level assembly that still needs to be translated to The Real Deal (and with less leeway to do so, because you're working on a lower abstraction level), or you'd end up with a lowest common denominator that can't take advantage of everything...
r22 31 Dec 2009, 14:15
All we need is for Intel/AMD to make a chip with just SSE/2/3/4 compatibility and 32+ cores. Intel tried with Larrabee but ended up canceling it due to performance issues (probably SSE not having trig functions like sin/cos, etc.).
However, Larrabee was compatible with the standard x86 instruction set as well, which added unneeded overhead for a companion processor. The inefficient encoding of SSE, because it's an extension, probably doesn't help matters either, and the die size for 32+ cores would probably be prohibitively large. MOV rax, [RANT] RET 0
f0dder 31 Dec 2009, 14:34
Do you reckon 32 x86-like cores would be enough to reach the computational power of today's GPUs?
Borsuc 31 Dec 2009, 15:35
They would do around 3.2 teraflops, so yeah, quite a bit faster than today's GPUs.
LocoDelAssembly 31 Dec 2009, 19:25
Well, I don't know how CUDA interfaces with the hardware, but with DirectX it is highly probable that when you compile your HLSL code you actually get shader assembly bytecodes rather than the GPU's native opcodes. Because of that, I think you could still do better at times using shader assembly instead of a high-level language (in the same way as with Java and .NET: the JIT may be good, but it is better to pre-optimize rather than trust it to also optimize your pseudo-code).
MattBro 01 Jan 2010, 08:43
r22 wrote: All we need is for Intel/AMD to make a chip with just SSE/2/3/4 compatibility and 32+ cores. Intel tried with Larrabee but ended up canceling it due to performance issues (probably SSE not having trig functions like sin/cos, etc.).
Yes, if you are going to make a lot of cores, why even have SSE units? IMHO the architecture for SSE is completely misguided. I'd like to use the word retarded, but I know they've got some smart engineers at Intel. If you take a look at the way the GPU vendors have handled SIMD, you'll see why SSE seems stupid. Packing several floats/ints into one register and then dealing with all the packing and unpacking permutations has created a mess that very few compiler vendors can deal with, let alone poor assembler programmers. It's far better to make a lot of cheap, lower-complexity cores than a few high-complexity ones that are difficult to program. The success of CUDA has shown that you can still do a lot without the use of semaphores or complex inter-core data dependencies. Of course, the problem with x86 or x64 is that for floating point you are stuck with either the old floating-point stack or SSE. Not a lot of good choices there in terms of leveraging your silicon.
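To make the contrast concrete, here is a hedged sketch of the same element-wise add written both ways; the function names are invented, the SSE intrinsics are the standard ones from xmmintrin.h, and n is assumed to be a multiple of 4 on the SSE side.
Code:
#include <xmmintrin.h>

// SSE style: the programmer packs four floats per register and manages the lanes.
void add_sse(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
}

// GPU style (CUDA): each thread is plain scalar code; the hardware supplies the
// parallel lanes, so there is no packing or shuffling to write by hand.
__global__ void add_gpu(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}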
MattBro 01 Jan 2010, 08:48
f0dder wrote: Thing is, would targeting assembly for a "virtual" CPU be a good idea? You'd probably be stuck with either some pretty high-level assembly that still needs to be translated to The Real Deal (and with less leeway to do so, because you're working on a lower abstraction level), or you'd end up with a lowest common denominator that can't take advantage of everything...
Perhaps you are correct. It may be that the ambitions of OpenCL can only be unified at the higher level of abstraction afforded by the use of C. Nevertheless, I note that the use of LLVM implies a virtual-machine assembler as a target of the C code. The APIs that set up the "kernels" run as multiple threads on the GPU device would still call the appropriate library routines, and that wouldn't really change with a virtual machine either.