flat assembler
Message board for the users of flat assembler.
Topic: Beauty in x86 assembly? (Main forum)

bitRAKE



Joined: 21 Jul 2003
Posts: 2651
Location: dank orb


revolution wrote:
It's mind-bogglingly complex.

...and we aren't getting any younger, lol. :D
That's what good tools are for - to leverage a limited mind and body.

Here is a GCD macro - it's not fast, but it works:

Code:
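macro gcd? reg0,reg1
  local _0,_1
        jmp _1
_0:     neg reg1          ; reg1 < reg0: make the difference positive
        xchg reg1,reg0    ; the smaller value becomes the new subtrahend
_1:     sub reg1,reg0     ; reg1 -= reg0
        jg _1             ; keep subtracting while the result is positive
        jne _0            ; result went negative: negate and swap
  ; done: reg1 = 0, reg0 = GCD (both inputs must be positive)
end macro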

...create your own inline instruction. ;)
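To check the macro's logic without assembling it, here is a rough C model of the same subtractive Euclidean loop (`gcd_model` is a hypothetical name, not part of fasmg). Like the macro, it assumes both operands are positive signed values; with reg0 = 0 and reg1 > 0, for instance, the `jg` loop would never exit.

```c
#include <assert.h>

/* Rough C model of the gcd? macro: a plays reg0, b plays reg1.
   Assumption: both inputs are positive, as the macro requires. */
static int gcd_model(int a, int b)
{
    assert(a > 0 && b > 0);
    for (;;) {
        b -= a;            /* _1: sub reg1,reg0 */
        if (b > 0)         /* jg _1: keep subtracting */
            continue;
        if (b == 0)        /* jne _0 not taken: finished */
            return a;      /* reg0 holds the GCD */
        b = -b;            /* _0: neg reg1 */
        int t = a;         /* xchg reg1,reg0 */
        a = b;
        b = t;
    }
}
```

For example, `gcd_model(12, 18)` reduces b to 6 and then -6, negates and swaps to a = 6, b = 12, then subtracts down to 0 and returns 6.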

_________________
Discovery without power nor ownership.
Post 29 Apr 2018, 06:58
Furs



Joined: 04 Mar 2016
Posts: 1125


revolution wrote:
It isn't just the CPU though. Even if you knew exactly every transistor, it is still the code you are running that affects things. On some CPUs the OOO buffer is more than 100 instructions long. So you also have to know every one of those 100+ instructions ahead of your snippet, and which port they will go into, and what instructions are currently in each port, and how many ports you have, and whether or not the memory read/write buffers are full, and the current state of the BTB and caches, whether or not another SMP instruction stream is interleaved with your stream, etc. etc. etc. It's mind-bogglingly complex.

Sometimes it's good to have unused ports or units in a thread, because they become free to use for another thread with Hyperthreading.

It's not often you see hyperthreading double the performance, but it does happen (and it did for me) when I ran a brute-force test of a very long, latency-bottlenecked algorithm (where individual inputs can of course be processed in parallel). With 8 threads and hyperthreading it finished in almost half the time I had estimated for 4 threads (4 cores, 8 threads with HT), which was a 2-hour gain. And that's with me still using my PC for lightweight stuff (I lowered the priority of the test program to the minimum).
Post 29 Apr 2018, 12:24
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15727
Location: (514107) 2015 BZ509


Furs wrote:
Sometimes it's good to have unused ports or units in a thread, because they become free to use for another thread with Hyperthreading.

Yes. If you know all about that particular CPU then you can do such things. But in six months' time, when a newer system is being used, all that knowledge becomes useless.
Post 29 Apr 2018, 13:15
rugxulo



Joined: 09 Aug 2005
Posts: 2279
Location: Usono (aka, USA)

Remember the 62-byte Sudoku solver (DOS .COM) we discussed years ago?

There's also Assembly Gems (archived).

Bob Swart wrote about Borland Pascal Efficiency, and one interesting snippet is this:


Code:

function IsAscii(C: Char): Boolean;
 InLine(
   $5B/          {      POP     BX      }
   $31/$C0/      {      XOR     AX,AX   }
   $D0/$E3/      {      SHL     BL,1    }
   $1C/$FF);     {      SBB     AL,$FF  }
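The trick is branch-free: `SHL BL,1` moves bit 7 of the character into CF, and with AL already 0, `SBB AL,$FF` computes 0 - 255 - CF modulo 256, which is 1 - CF. So AL ends up 1 (True) for codes below 128 and 0 (False) otherwise. A small C model of that arithmetic makes it easy to check (`is_ascii_model` is a hypothetical name):

```c
#include <assert.h>

/* Rough C model of the IsAscii inline code; the character arrives in BL. */
static unsigned char is_ascii_model(unsigned char bl)
{
    unsigned char al = 0;                   /* XOR AX,AX (also clears CF)  */
    unsigned cf = (bl >> 7) & 1u;           /* SHL BL,1: bit 7 goes to CF  */
    al = (unsigned char)(al - 0xFFu - cf);  /* SBB AL,$FF: AL - $FF - CF   */
    assert(al == (bl < 0x80));              /* 1 for ASCII, 0 otherwise    */
    return al;
}
```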


Post 02 May 2018, 02:25
Picnic



Joined: 05 May 2007
Posts: 1224
Location: Mikrolimano

This snippet prints a two-digit number in AL (11 bytes).

Code:

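putn:
        aam             ; AH = AL / 10, AL = AL mod 10
        push A          ; return address for the first ret: run A twice
A:      xchg al,ah      ; 1st pass: tens digit; 2nd pass: ones digit
        add al,48       ; convert to ASCII ('0' = 48)
        int 0x29        ; DOS fast console output of AL
        ret             ; 1st: jumps back to A; 2nd: returns to caller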
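The neat part is that `push A` makes the tail of the routine run twice: the first pass swaps the tens digit into AL and prints it, the first `ret` pops A and jumps back, the second `xchg` brings the ones digit back into AL, and the second `ret` returns to the caller. Values below 10 are printed with a leading zero, and `push` with an immediate operand needs an 80186 or later. Below is a rough C model that records the two characters instead of calling int 0x29 (`putn_model` and its `out` buffer are hypothetical):

```c
#include <assert.h>

/* Rough C model of putn: out receives what int 0x29 would print. */
static void putn_model(unsigned char al, char out[3])
{
    assert(al <= 99);                       /* above 99 the first char is not a digit */
    unsigned char ah = al / 10;             /* aam: AH = AL / 10 */
    al = al % 10;                           /*      AL = AL mod 10 */
    for (int pass = 0; pass < 2; pass++) {  /* fall-through into A, then ret to A */
        unsigned char t = al;               /* xchg al,ah */
        al = ah;
        ah = t;
        al = (unsigned char)(al + 48);      /* add al,48 */
        out[pass] = (char)al;               /* int 0x29 prints AL */
    }
    out[2] = '\0';
}
```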


Post 03 May 2018, 14:20
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 15727
Location: (514107) 2015 BZ509


Picnic wrote:
This snippet prints a two-digit number in AL (11 bytes).

I presume this is intended for real mode DOS int 0x29?
Post 03 May 2018, 14:29
Picnic



Joined: 05 May 2007
Posts: 1224
Location: Mikrolimano

Yes, it is INT 29h, the fast console output service (more about it here: Undocumented DOS Programming).

If I recall correctly, I saw it in one of the 256-byte demos many years ago.
Post 03 May 2018, 14:40


Copyright © 2004-2018, Tomasz Grysztar.
Powered by rwasa.