flat assembler
Message board for the users of flat assembler.
Tomasz Grysztar 27 Apr 2018, 20:15
A friend of mine who knows that I deal a lot with x86 assembly sent me a link to this discussion today:
What are some examples of beautiful x86 assembly code?
I wonder what the users of this board would answer. I see that someone there already mentioned HeavyThing - this would probably be my pick, too. In recent years this was the code that impressed me the most - even though, or perhaps exactly for the reason that, its style is in many aspects different from mine.
And I could not overlook someone mentioning "xor eax,eax" as "the coolest one liner". How many disputes over this simple instruction have we had already? Perhaps never enough.
Speaking of which, if we talk not about beautifully structured programs, but just tiny tricky snippets that do impressive things with at most a few instructions, then my favorite is perhaps a trick to convert a nibble to a hexadecimal digit without jumps or tables:
The Svin wrote:
revolution 28 Apr 2018, 02:15
I'd not seen the two instruction Fibonacci calculator before. Very nice.
DimonSoft 28 Apr 2018, 07:28
Tomasz Grysztar wrote: Speaking of which, if we talk not about beautifully structured programs, but just tiny tricky snippets that do impressive things with at most a few instructions, then my favorite is perhaps a trick to convert a nibble to a hexadecimal digit without jumps or tables:
AFAIK, it’s called Allison’s algorithm.
Tomasz Grysztar 28 Apr 2018, 09:04
revolution wrote: I'd not seen the two instruction Fibonacci calculator before. Very nice.
DimonSoft wrote: AFAIK, it’s called Allison’s algorithm.
Code:
    ADD AL, 90h
    DAA
    ADC AL, 40h
    DAA
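For anyone who wants to trace why the four-instruction sequence above yields '0'..'9' and 'A'..'F', here is a rough C model of it. It is only a sketch: the `Cpu` struct and the helper names are made up for illustration, and `daa` follows the documented DAA behavior (adjust the low nibble by 6, then the high nibble by 60h, tracking AF and CF).

```c
#include <stdint.h>

/* Hypothetical minimal CPU state: AL plus the two flags DAA cares about. */
typedef struct { uint8_t al; int cf; int af; } Cpu;

/* Models ADD/ADC AL, imm8 (pass carry_in = 0 for plain ADD). */
static void adc8(Cpu *c, uint8_t imm, int carry_in) {
    unsigned sum = (unsigned)c->al + imm + (unsigned)carry_in;
    c->af = ((c->al & 0x0F) + (imm & 0x0F) + carry_in) > 0x0F;
    c->cf = sum > 0xFF;
    c->al = (uint8_t)sum;
}

/* Models DAA: decimal adjust AL after addition. */
static void daa(Cpu *c) {
    uint8_t old_al = c->al;
    int old_cf = c->cf;
    c->cf = 0;
    if ((c->al & 0x0F) > 9 || c->af) {
        unsigned sum = (unsigned)c->al + 6;
        c->cf = old_cf || sum > 0xFF;
        c->al = (uint8_t)sum;
        c->af = 1;
    } else {
        c->af = 0;
    }
    if (old_al > 0x99 || old_cf) {
        c->al = (uint8_t)(c->al + 0x60);
        c->cf = 1;
    } else {
        c->cf = 0;
    }
}

/* Runs the modeled sequence on a value 0..15 and returns the ASCII digit. */
char nibble_to_hex(uint8_t nibble) {
    Cpu c = { (uint8_t)(nibble & 0x0F), 0, 0 };
    adc8(&c, 0x90, 0);     /* ADD AL, 90h */
    daa(&c);               /* DAA: for 10..15 AL wraps to 00h..05h with CF=1 */
    adc8(&c, 0x40, c.cf);  /* ADC AL, 40h: the carry turns 40h into 41h */
    daa(&c);               /* DAA: for 0..9 adds 60h, giving 30h..39h */
    return (char)c.al;
}
```

In this model, inputs 0..9 take the "add 60h in the second DAA" path, while 10..15 wrap in the first DAA and pick up the carry in the ADC.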
revolution 28 Apr 2018, 09:35
Tomasz Grysztar wrote:
Tomasz Grysztar 28 Apr 2018, 10:01
revolution wrote: Are you sure that xadd was designed to compute Fibonacci numbers? I thought its original intention was something else. And it takes a bit of inspiration to realise that it can also be used for Fibonacci.
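The Fibonacci snippet itself is not reproduced in this thread, so the following is only a guess at the general idea: XADD stores the sum in its destination and the old destination in its source, which is exactly one Fibonacci step. A small C model, with hypothetical names and starting values chosen for illustration:

```c
#include <stdint.h>

/* Models XADD dest, src: dest receives dest+src, src receives the old dest. */
static void xadd(uint32_t *dest, uint32_t *src) {
    uint32_t sum = *dest + *src;
    *src = *dest;
    *dest = sum;
}

/* Returns F(n) by applying the modeled XADD n times (wraps modulo 2^32). */
uint32_t fib(unsigned n) {
    uint32_t eax = 0;  /* F(0) */
    uint32_t edx = 1;  /* acts as F(-1), so the first step yields F(1) = 1 */
    while (n--) {
        xadd(&eax, &edx);
    }
    return eax;
}
```

In assembly the repetition would presumably come from a second instruction such as a loop around the XADD, which is an assumption here rather than something the posts confirm.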
alexfru 28 Apr 2018, 10:41
I think x86 code is more of an accidental beauty. There are just a few good things about the instruction set: it's compact, it's extensible, an instruction can encode the full memory operand offset (not just 12 bits or something). The rest is pretty ugly: too few registers, a bunch of non-orthogonal instructions, some of them with hard-coded operands.
The MIPS equivalent of the hex->ASCII is just one instruction longer:
Code:
    # a0 = input: 0 through 15
    # v0 = output: corresponding ASCII char: '0' ... '9', 'A' ... 'F'
    addiu v0, a0, 0x30 # output candidate 1: input + 0x30
    addiu t0, a0, 0x37 # output candidate 2: input + 0x37
    sltiu t1, a0, 10   # input < 10?
    movz  v0, t0, t1   # if input >= 10, choose output candidate 2
No tricky instructions, no hard-coded registers. Everything's simple and regular.
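The same compute-both-candidates-then-select idea can be written in C, one line per MIPS instruction. This is just a sketch with a made-up function name; whether a compiler turns the final select into a conditional move is up to the compiler and target.

```c
/* Converts a value 0..15 to its uppercase hexadecimal ASCII digit. */
char hex_digit(unsigned nibble) {
    unsigned cand1 = nibble + 0x30;    /* addiu v0, a0, 0x30 */
    unsigned cand2 = nibble + 0x37;    /* addiu t0, a0, 0x37 */
    unsigned is_digit = nibble < 10;   /* sltiu t1, a0, 10 */
    /* movz v0, t0, t1: take candidate 2 only when t1 is zero */
    return (char)(is_digit ? cand1 : cand2);
}
```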
revolution 28 Apr 2018, 11:08
alexfru wrote: I think x86 code is more of an accidental beauty. There are just a few good things about the instruction set: it's compact, it's extensible, an instruction can encode the full memory operand offset (not just 12 bits or something).
alexfru wrote: The MIPS equivalent of the hex->ASCII is just one instruction longer:
Tomasz Grysztar 28 Apr 2018, 11:10
alexfru wrote: No tricky instructions, no hard-coded registers. Everything's simple and regular.
DimonSoft 28 Apr 2018, 20:47
Tomasz Grysztar wrote:
Well, not many sources, but… Both variations are described in [url=https://groups.google.com/forum/#!msg/comp.lang.asm.x86/TJg1gpsY8FQ/khvpAflvzpMJ]this old comp.lang.asm.x86 thread[/url] along with explanations of how/why they work. I feel I’ve seen a page some day that told a bit more about who this Allison is/was but can’t find it now.
Furs 28 Apr 2018, 23:05
alexfru wrote: The MIPS equivalent of the hex->ASCII is just one instruction longer:
alexfru 29 Apr 2018, 00:20
Furs wrote:
LOL
bitRAKE 29 Apr 2018, 01:12
On another machine, I have an archive going back over 20 years - lots of saved snippets and whole projects from dubious sources. But for right now, when so much of the web has moved on - the MadWizard remains:
...and http://www.azillionmonkeys.com/qed/asmexample.html
Bubble sort:
Code:
    ; by Andrew Howe
    outerloop:
        lea ebx,[edi+ecx*4]
        mov eax,[edi]
    cmploop:
        sub ebx,4
        cmp eax,[ebx]
        jle notyet
        xchg eax,[ebx]
    notyet:
        cmp ebx,edi
        jnz cmploop
        stosd
        loop outerloop
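To follow what the sort above does per pass, here is a line-by-line C rendering. It is a sketch under a few assumptions: EDI points at an array of signed 32-bit integers, ECX holds the element count, and the function name is invented. Each pass scans from the end toward the front, keeps the smallest value seen in a register, and stores it at the front.

```c
#include <stddef.h>
#include <stdint.h>

/* Sorts n signed 32-bit integers in ascending order. */
void sort_i32(int32_t *a, size_t n) {
    for (; n > 0; ++a, --n) {        /* stosd advances edi, loop decrements ecx */
        int32_t min = a[0];          /* mov eax,[edi] */
        size_t j = n;                /* lea ebx,[edi+ecx*4] */
        do {
            --j;                     /* sub ebx,4 */
            if (min > a[j]) {        /* cmp eax,[ebx] / jle notyet */
                int32_t t = a[j];    /* xchg eax,[ebx] */
                a[j] = min;
                min = t;
            }
        } while (j != 0);            /* cmp ebx,edi / jnz cmploop */
        a[0] = min;                  /* stosd */
    }
}
```

One difference from the assembly: this version does nothing for n = 0, whereas the original would need ECX to be nonzero on entry.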
revolution 29 Apr 2018, 01:47
bitRAKE wrote: ...and http://www.azillionmonkeys.com/qed/asmexample.html
bitRAKE 29 Apr 2018, 02:06
Couldn't agree more, revolution. Projects like GMPlib do that sort of thing. (Even though they don't always have the best algorithms, imho.)
...and a Windows 64 snip:
Code:
    ; structure is created on stack by ENTER instruction.
    mov rbp,$00003FFF00000008   ; INITCOMMONCONTROLSEX trickery
    enter 32,0                  ;
    mov ecx,ebp                 ; #32#
    call [InitCommonControlsEx] ;
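As I read the snippet above, ENTER 32,0 pushes RBP, points RBP at the pushed qword, and reserves 32 bytes below it, so the 64-bit constant becomes an 8-byte structure in memory. The C sketch below shows how that constant splits into the two 32-bit fields of INITCOMMONCONTROLSEX; the struct here is a stand-in defined for illustration, not the one from the Windows headers.

```c
#include <stdint.h>

/* Stand-in for the INITCOMMONCONTROLSEX layout: two consecutive 32-bit fields. */
typedef struct {
    uint32_t dwSize;  /* size of the structure in bytes */
    uint32_t dwICC;   /* bit flags selecting control classes */
} IccexModel;

/* Splits the pushed qword the way little-endian memory lays it out. */
IccexModel unpack_qword(uint64_t pushed) {
    IccexModel s;
    s.dwSize = (uint32_t)(pushed & 0xFFFFFFFFu);  /* low dword at the lower address */
    s.dwICC  = (uint32_t)(pushed >> 32);          /* high dword four bytes above */
    return s;
}
```

With $00003FFF00000008 this gives dwSize = 8 and dwICC = 3FFFh. Note that `mov ecx,ebp` passes only the low 32 bits of the pointer, so the trick presumably relies on the stack sitting below 4 GiB.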
revolution 29 Apr 2018, 02:24
Plus with things like OOO execution and LRU caches, when trying to do a short timing test, it is almost impossible to get consistent results. Add to that the fact that some CPUs have decoupled the TSC from the CPU clock, making it so much harder to get precise results.
The solution is to simply forget about it and concentrate on the overall performance, rather than individual parts. And even then it ain't easy; change one instruction and suddenly the whole chain of events is different and you get a huge change. All the while accounting for the fact that only on that CPU do you see such effects. Non-determinism FTW
bitRAKE 29 Apr 2018, 03:15
You'd think it'd be in the best interest of the CPU manufacturers to actually publish accurate CPU details, but the marketing department has convinced them that that is just not the case. This is seen most blatantly in the consumer line versus the Xeon line of Intel chips.
I don't think anyone is doing short timing tests anymore - maybe for the last decade. There are different results for each cache level, and the length of the test needs to account for that. I code for size by default for that reason - code that runs once takes longer to load than to execute.
revolution 29 Apr 2018, 03:35
bitRAKE wrote: You'd think it'd be in the best interest of the CPU manufacturers to actually publish accurate CPU details, but the marketing department has convinced them that that is just not the case. This is seen most blatantly in the consumer line versus the Xeon line of Intel chips.
revolution 29 Apr 2018, 03:44
Trying to keep on topic
Here is an ARM snippet to range check a value in two instructions:
Code:
    cmp r0,'0'        ;carry is set if r0 >= '0'
    rsbcss ip,r0,'9'  ;carry is set if r0 <= '9'. ip is not used
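The carry logic of the two ARM instructions above can be modeled in C as follows. This is a sketch with an invented function name: on ARM, a subtraction sets the carry flag when there is no borrow, and the second instruction only executes (and only updates the flags) if the first left carry set.

```c
#include <stdint.h>

/* Returns the final carry flag: 1 if r0 is in '0'..'9', else 0. */
int is_ascii_digit(uint32_t r0) {
    int carry = r0 >= (uint32_t)'0';      /* cmp r0,'0': C set when r0 - '0' has no borrow */
    if (carry) {                          /* rsbcss runs only when C is set */
        carry = (uint32_t)'9' >= r0;      /* '9' - r0: C set when there is no borrow */
    }
    return carry;
}
```

In plain C the same check is often written as `(uint32_t)(c - '0') <= 9`, which uses a single unsigned comparison instead of two.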
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.