flat assembler
Message board for the users of flat assembler.
THEWizardGenius
Be careful, as not all GPUs are compatible. Some people might not even have a GPU, so your code won't work for everyone unless you do it right. Still, it's a good idea.
Micah II
As a C++ user with a little OpenGL experience, I believe it's possible to do some minor computational work on current GPUs, though most of that is still related to 3D maths, i.e. rotations and translations (and thus sine() and cosine(); see the sketch below).

As for programming the GPU for other purposes, I don't think anyone really has enough information about it, since nVIDIA and ATI keep their GPUs' secrets very well hidden. And THEWizardGenius is right: for example, the latest GeForce 6 models use the NV40 core, I believe, whereas the chip used in the Xbox has an NV2A core with a different architecture, thus requiring different ASM sources. That makes it more logical to let the graphics card producers handle the nitty-gritty stuff through the likes of HLSL or Cg, with you coding your things in a high-level language. Simply put, ASM for GPUs wouldn't make any sense IMO.
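For concreteness, here is the kind of sine/cosine-heavy "3D maths" work meant above, as a plain C sketch (the function name is illustrative): a rotation about the origin, which a GPU applies to every vertex.

Code:
#include <math.h>

/* Rotate the point (x, y) about the origin by 'angle' radians -- the
   sin/cos-heavy per-vertex work referred to above. */
void rotate2d(float angle, float *x, float *y)
{
    float c = cosf(angle), s = sinf(angle);
    float nx = c * *x - s * *y;
    float ny = s * *x + c * *y;
    *x = nx;
    *y = ny;
}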
r22
Agreed. Sadly, after doing a bit more research, there's no real low-level approach to running code on the GPU.

Because of the different architectures you need a runtime that compiles the correct code. For instance, the Brook general-purpose GPU project uses streams and kernel functions that are converted to HLSL, and the runtime that compiles the code to GPU machine code is put inside the executable. It's a shame that the greatest speed optimization requires such a high-level approach.
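For illustration, a Brook kernel looks roughly like the sketch below (adapted from the style of the BrookGPU papers, not from any shipping code). Brook extends C with stream parameters (the <> syntax); the runtime converts the kernel to HLSL and compiles it for the GPU behind your back.

Code:
/* Brook-style kernel sketch: x, y and result are streams, alpha is a
   plain scalar. The runtime maps each stream element to a GPU thread. */
kernel void saxpy(float alpha, float4 x<>, float4 y<>, out float4 result<>)
{
    result = alpha * x + y;
}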
LocoDelAssembly
I was thinking about this subject recently. I've had no luck with OpenGL (only stupid GLSL stuff and things like "the ARB decided not to publish the bytecodes for OpenGL 2.0"). With DirectX, however, I found this: http://msdn2.microsoft.com/en-us/library/bb219840.aspx

The DDK has documentation about the bytecodes, as that document says. Unfortunately I have no time to even try writing bytecodes by hand and test. Regardless of the possibility of doing non-graphics stuff, I want to write macros to have some kind of shader assembler, at least for DirectX (though I would much prefer OpenGL). As for collecting results, I think you can capture the screen after using some pixel shaders (not sure how to capture the result when using vertex shaders). This is the best thing I found for OpenGL: http://cs.anu.edu.au/~Hugh.Fisher/shaders/lowlevel.html . However, it uses the oldest shader version and plain-text assembly shaders instead of bytecode. I hope you can get this to work and share it here.
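For reference, the "lowlevel" approach linked above boils down to feeding plain-text ARB assembly straight to the driver. A minimal sketch, assuming a GL context, the ARB_fragment_program extension, and a program id from glGenProgramsARB (the shader text itself is illustrative; in real code the ARB entry points are fetched via wglGetProcAddress/glXGetProcAddress):

Code:
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

static const char *fp_src =
    "!!ARBfp1.0\n"
    "MUL result.color, fragment.color, {0.5, 0.5, 0.5, 1.0};\n" /* halve RGB */
    "END\n";

void load_fragment_program(GLuint id)
{
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, id);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(fp_src), fp_src);
    glEnable(GL_FRAGMENT_PROGRAM_ARB);
}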
zir_blazer
What about this?
http://arstechnica.com/news.ars/post/20070219-8878.html It's a C-based extension for using nVidia GPUs for calculations, but it's pretty close to what you want. ATI has also disclosed some data for using their GPUs directly in assembly language, but as I am not a developer I didn't search around for it.
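A minimal sketch of what that C-based extension (CUDA) looks like; the kernel below is illustrative, not taken from the article. The __global__ qualifier and the <<<blocks, threads>>> launch syntax are CUDA's additions to C.

Code:
/* Each GPU thread handles one array element. */
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n)
        y[i] = a * x[i] + y[i];
}

/* Host side: launch 256-thread blocks covering n elements:
   saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);          */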
LocoDelAssembly
I read about CUDA but it's intended for the GeForce 8800. The processing capability is tremendous, but it requires an excessively new graphics card...
zir_blazer
LocoDelAssembly wrote: I read about CUDA but it's intended for the GeForce 8800. The processing capability is tremendous, but it requires an excessively new graphics card...

According to this: http://forums.nvidia.com/index.php?s=c496218c5d508bece0ac9292c7968023&showtopic=28492 it applies to the whole GeForce 8xxx line. That means a GeForce 8300GT could work with the code, as might an IGP based on that architecture.
LocoDelAssembly
Just an update: my link doesn't work anymore (there is a redirect to the Asm Shader Reference, but it doesn't provide the information needed to make your own assembler). The information about the instruction encoding is here now.
MattBro
GPU computing, and more generally heterogeneous computing, is the wave of the future. AMD has their Fusion vision, which fuses their CPU and GPU efforts, as Intel is doing in some ways with their upcoming Larrabee (though it's been canceled as a gaming platform).

For my purposes I desperately need all the performance I can get; I've got problems that take days to compute. These days that means taking advantage of NVIDIA's CUDA development environment, or even better, using OpenCL. OpenCL is basically a C99 compiler that compiles code to various GPU/CPU devices. It's embraced by Apple, NVIDIA, AMD and supposedly Intel (though Intel hasn't released any drivers and maybe never will). http://www.khronos.org/opencl/

Here's the weird part: the C99 code is compiled on the fly. The reason is that all these devices have entirely different architectures and machine-code ops. In fact, NVIDIA has different machine instruction sets for different graphics cards; everything unifies only at the much higher level of abstraction of the OpenCL code. That probably means a lot of the performance of hand-optimized assembly is lost, even though performance is critical for these applications.

I personally think OpenCL should have agreed on a low-level virtual machine and assembler rather than C99, and then they could supply the compilers to that target. They are using the LLVM infrastructure for this anyway; maybe there are GPU back ends for the LLVM code. Wouldn't it be nice if there were a FASM for heterogeneous computing? That's probably more like a high-level assembler project, though.
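A sketch of the "compiled on the fly" model described above, assuming nothing beyond the standard OpenCL API: the kernel source is C99 text handed to the driver at run time, which JIT-compiles it for whatever device is present (error checking omitted for brevity).

Code:
#include <CL/cl.h>

static const char *src =
    "__kernel void saxpy(float a, __global const float *x,\n"
    "                    __global float *y) {\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

cl_program build_saxpy(cl_context ctx, cl_device_id dev)
{
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL); /* JIT for this device */
    return prog;
}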
f0dder
Thing is, would targeting assembly for a "virtual" CPU be a good idea? You'd probably be stuck either with some pretty high-level assembly that still needs to be translated to The Real Deal (and with less leeway to do so, because you're working at a lower abstraction level), or with a lowest common denominator that can't take advantage of everything...
r22
All we need is for Intel/AMD to make a chip with just SSE/2/3/4 compatibility and 32+ cores. Intel tried with Larrabee but ended up canceling it due to performance issues (probably SSE not having trig functions like sin/cos).

However, Larrabee was compatible with the standard x86 instruction set as well, which added unneeded overhead for a companion processor. The inefficient encoding of SSE (because it's an extension) probably doesn't help matters either, and the die size for 32+ cores would probably be prohibitively large.

MOV rax, [RANT]
RET 0
f0dder
Do you reckon 32 x86-like cores would be enough to reach the computational power of today's GPUs?
Borsuc
They would do around 3.2 teraflops, so yeah, quite a bit faster than today's GPUs.
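(A back-of-envelope check, with clock and vector-width assumptions not stated in the post: 3.2 TFLOPS matches 32 cores each retiring a 16-wide Larrabee-style multiply-add per cycle, i.e. 32 × 16 × 2 × 3.125 GHz = 3.2 TFLOPS; with plain 4-wide SSE issuing a packed multiply and a packed add per cycle at 3 GHz it would be 32 × 4 × 2 × 3 GHz ≈ 0.77 TFLOPS.)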
LocoDelAssembly
Well, I don't know how CUDA interfaces with the hardware, but with DirectX it is highly probable that when you compile your HLSL code you actually get shader-assembly bytecodes rather than the GPU's native opcodes. Because of that, I think you could still do better at times using shader assembly instead of a high-level language (in the same way you can with Java and .NET: the JIT may be good, but it is better to pre-optimize rather than trusting it to also optimize your pseudo-code).
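A sketch of feeding hand-written shader assembly to Direct3D 9 instead of HLSL: D3DXAssembleShader() turns the text into the same bytecode tokens the HLSL compiler emits. The trivial ps_2_0 shader below is illustrative; error handling is omitted.

Code:
#include <d3dx9.h>
#include <string.h>

static const char *ps_asm =
    "ps_2_0\n"
    "def c0, 0.5, 0.5, 0.5, 1.0\n"
    "mov oC0, c0\n";                 /* constant gray pixel shader */

LPD3DXBUFFER assemble_ps(void)
{
    LPD3DXBUFFER code = NULL, errors = NULL;
    D3DXAssembleShader(ps_asm, (UINT)strlen(ps_asm), NULL, NULL, 0,
                       &code, &errors);
    /* code->GetBufferPointer() now holds the bytecode for
       IDirect3DDevice9::CreatePixelShader() */
    return code;
}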
MattBro
r22 wrote: All we need is for Intel/AMD to make a chip with just SSE/2/3/4 compatibility and 32+ cores. Intel tried with Larrabee but ended up canceling it due to performance issues (probably SSE not having trig functions like sin/cos).

Yes, if you are going to make a lot of cores, why even have SSE units? IMHO the architecture of SSE is completely misguided. I'd like to use the word retarded, but I know they've got some smart engineers at Intel. If you take a look at the way the GPU vendors have handled SIMD, you'll see why SSE seems stupid. Coupling several floats/ints into one register and then dealing with all the packing and unpacking permutations has created a mess that very few compiler vendors can deal with, let alone poor assembler programmers.

It's far better to make a lot of cheap, lower-complexity cores than a few high-complexity ones that are difficult to program. The success of CUDA has shown that you can still do a lot without the use of semaphores or complex inter-core data dependencies. Of course, the problem with x86 or x64 is that for floating point you are stuck with either the old floating-point stack or SSE. Not a lot of good choices there in terms of leveraging your silicon.
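A small illustration of the pack/unpack permutation mess: a horizontal sum of four packed floats in SSE intrinsics, where every instruction but the adds exists only to shuffle data around inside the register. A GPU-style scalar-per-thread model never needs this dance.

Code:
#include <xmmintrin.h>   /* SSE intrinsics */

float hsum(__m128 v)
{
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* swap pairs        */
    __m128 sums = _mm_add_ps(v, shuf);                           /* pairwise sums     */
    shuf = _mm_movehl_ps(shuf, sums);                            /* high pair -> low  */
    sums = _mm_add_ss(sums, shuf);                               /* final scalar add  */
    return _mm_cvtss_f32(sums);
}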
MattBro
f0dder wrote: Thing is, would targeting assembly for a "virtual" CPU be a good idea? You'd probably be stuck either with some pretty high-level assembly that still needs to be translated to The Real Deal, or with a lowest common denominator that can't take advantage of everything...

Perhaps you are correct. It may be that the ambitions of OpenCL can only be unified at the higher level of abstraction afforded by the use of C. Nevertheless, I note that the use of LLVM implies a virtual-machine assembler as a target of the C code. The APIs that set up the "kernels" run as multiple threads on the GPU device would still call the appropriate library routines, and that wouldn't really change with a virtual machine either.
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.