flat assembler
Message board for the users of flat assembler.

How do I read a file byte by byte?

Roman



Joined: 21 Apr 2012
Posts: 1769
Roman 27 May 2020, 12:39
I heard that jmp and jr stall the CPU pipeline.

Using a single jr or jmp is fine.
Using more than one jmp/jnz/jz/jae/jbe is bad.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 27 May 2020, 13:16
That all depends on the CPU brand/generation that you work with. When you write something that is going to work on a wide range of hardware, my usual suggestion is to optimize primarily for size and simplicity.
bitRAKE



Joined: 21 Jul 2003
Posts: 4024
Location: vpcmpistri
bitRAKE 27 May 2020, 13:19
Study the BTB (Branch Target/Tag Buffer); it's how the processor predicts where to go next. Multiple Jcc instructions close together can be a performance problem. As far as I can remember, JMP has no penalty, since it is obviously predicted 100% of the time.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20303
Location: In your JS exploiting you and your system
revolution 27 May 2020, 13:35
Forget about trying to predict what the BTB predictor will do. Really! It's a fool's game to go down that path. It might give you a vague idea about things, but not much more. With over 100 instructions in flight at any one time inside the CPU, there is just too much happening to predict any of this statically by simply looking at the code by eye and counting clock cycles or something. Hehe, well, unless you are still using an old P3 system; then maybe you can get away with it there.

Test your code to see what happens when it runs for real. No need to guess, when you can know for sure by running it and seeing what happens.

bitRAKE 27 May 2020, 15:00
It was more about branch locality than trying to predict the processor. There are a limited number of bits to indicate where the Jcc instruction is. If there are too many branches in a small region of memory this can artificially slow things down.

All this is very academic at the extreme edge of optimization.
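
To make the locality point concrete, here is a toy model in Python. It is not the BTB of any real CPU: the 16-byte block size, the 128 sets, and the 4 entries per set are made-up parameters chosen only for illustration, and `btb_set` and `overflowing_sets` are hypothetical helper names. The model picks a set from a few address bits and reports the sets that are asked to hold more branches than they have entries.

```python
from collections import Counter

# All three parameters are assumptions for this toy model, not real hardware values.
BLOCK_BYTES = 16   # branches in the same aligned 16-byte block share a set
NUM_SETS = 128     # number of sets selected by the address bits
WAYS = 4           # entries available per set


def btb_set(branch_addr: int) -> int:
    """Return the set index a branch at this address maps to in the toy model."""
    return (branch_addr // BLOCK_BYTES) % NUM_SETS


def overflowing_sets(branch_addrs):
    """Return {set_index: branch_count} for sets with more branches than WAYS."""
    counts = Counter(btb_set(addr) for addr in branch_addrs)
    return {s: n for s, n in counts.items() if n > WAYS}


if __name__ == "__main__":
    # Six 2-byte Jcc instructions packed back to back starting at 0x401000.
    dense = [0x401000 + 2 * i for i in range(6)]
    # The same six branches spread out, one per 16-byte block.
    sparse = [0x401000 + 16 * i for i in range(6)]
    print(overflowing_sets(dense))   # all six compete for one set
    print(overflowing_sets(sparse))  # no set is oversubscribed
```

In this model the densely packed branches fight over the entries of a single set, while the spread-out ones do not. Whether anything like this matters on a given CPU is something only measurement can tell.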

bitRAKE 29 May 2020, 01:28
As a concrete example, assume we want to execute some function, but only if a folder exists:
Code:
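        lea rcx,[folderW]                 ; RCX = pointer to the wide-character path
        call [GetFileAttributesW]         ; EAX = attributes, or INVALID_FILE_ATTRIBUTES (-1) on error
        test eax,FILE_ATTRIBUTE_DIRECTORY ; ZF=0 if the directory bit is set
        lea ecx,[rax+2] ; check for error INVALID_FILE_ATTRIBUTES: ECX = 1 only if EAX = -1; flags unchanged
        loopnz execute_function_on_folder ; taken only if RCX-1 <> 0 and ZF=0
; fall through for multiple error conditions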
...this is a rather common pattern in Windows.
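
The test/lea/loopnz sequence is easy to misread, so here is a small Python model of its flag and counter arithmetic. It is a sketch, not a replacement for running the real code. It assumes the documented Windows values INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF and FILE_ATTRIBUTE_DIRECTORY = 0x10; `loopnz_taken` is a hypothetical helper name.

```python
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF   # documented error return of GetFileAttributesW
FILE_ATTRIBUTE_DIRECTORY = 0x10        # documented directory attribute bit


def loopnz_taken(eax: int) -> bool:
    """Model test/lea/loopnz: True when the jump to the folder handler is taken."""
    eax &= 0xFFFFFFFF
    # test eax,FILE_ATTRIBUTE_DIRECTORY sets ZF when the directory bit is clear.
    zf = (eax & FILE_ATTRIBUTE_DIRECTORY) == 0
    # lea ecx,[rax+2] keeps the low 32 bits, zero-extends into RCX, leaves flags alone.
    rcx = (eax + 2) & 0xFFFFFFFF
    # loopnz decrements RCX (64-bit mode) without touching flags,
    # then jumps only if RCX is nonzero and ZF is clear.
    rcx = (rcx - 1) & 0xFFFFFFFFFFFFFFFF
    return rcx != 0 and not zf


if __name__ == "__main__":
    print(loopnz_taken(0x10))                     # plain directory
    print(loopnz_taken(0x20))                     # ordinary file (archive bit only)
    print(loopnz_taken(INVALID_FILE_ATTRIBUTES))  # path not found or other error
```

Both the "not a directory" case and the API error case fall through, which is why a single branch instruction is enough here.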

Roman 30 May 2020, 06:18
I read https://www.agner.org/optimize/instruction_tables.pdf
CALL far takes 16-22 CPU cycles (on a modern CPU at 3 to 4 GHz that is roughly 4 to 7 nanoseconds)
JMP far takes 16-20 CPU cycles

Jcc takes 1-2 CPU cycles

Question: in 64-bit fasm, does invoke do a far call?
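
For reference, converting a cycle count to time only needs the clock frequency: nanoseconds = cycles / frequency in GHz. The sketch below uses the far CALL cycle range quoted from the tables; the 1 GHz and 4 GHz clock speeds are assumed example values, and `cycles_to_ns` is a hypothetical helper.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds for a core clocked at clock_ghz."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


if __name__ == "__main__":
    for ghz in (1.0, 4.0):  # assumed example clock speeds
        low, high = cycles_to_ns(16, ghz), cycles_to_ns(22, ghz)
        print(f"far CALL, 16-22 cycles at {ghz} GHz: {low:.1f} to {high:.1f} ns")
```

Sixteen cycles only take 16 ns on a 1 GHz core; at 4 GHz the same count is 4 ns.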

revolution 30 May 2020, 06:38
Roman wrote:
I read https://www.agner.org/optimize/instruction_tables.pdf
CALL far takes 16-22 CPU cycles (on a modern CPU at 3 to 4 GHz that is roughly 4 to 7 nanoseconds)
JMP far takes 16-20 CPU cycles

Jcc takes 1-2 CPU cycles

Question: in 64-bit fasm, does invoke do a far call?
64-bit x86 code doesn't support far calls. The segment registers are not used.
Furs



Joined: 04 Mar 2016
Posts: 2493
Furs 30 May 2020, 15:00
Actually aren't far calls used to switch between 32-bit and 64-bit code?
Ali.Z



Joined: 08 Jan 2018
Posts: 716
Ali.Z 30 May 2020, 18:36
Furs wrote:
Actually aren't far calls used to switch between 32-bit and 64-bit code?

So do far jmps.

_________________
Asm For Wise Humans

Roman 30 May 2020, 19:21

Copyright © 1999-2024, Tomasz Grysztar.