flat assembler
Message board for the users of flat assembler.

Any point in "optimizing" with smaller numbers?

moveax41h



Joined: 18 Feb 2018
Posts: 59
moveax41h 11 Dec 2019, 02:34
I come from C and this is sorta a C/fasm hybrid question.

Say I have a function like this in HLL:

Code:
void display_menu_item(unsigned char menu_index)
{
  printf("%s\n", some_lookup_table[menu_index]);
  return;
}
    


A lot of programmers would just use an int here, or maybe an unsigned int. However, in my case, there will never be a menu index above 255. In fact, there won't be one above 50. I would NEVER have a negative menu item, so I typically would not use a signed number here... But int seems to be the universally abused type, in C at least.

In the past, I have tried to consider this and typically picked the smallest type that can store the value. However, as I have recently been programming more assembler, I wanted to confirm that this probably doesn't actually save any space or optimize anything at all, because the word size of x64 is already way bigger than 1 byte anyway.

Is this a correct assumption? In programming fasm, I've realized that the fastest operations probably involve 64-bit or in some cases 32-bit operands, and anything smaller doesn't seem to help optimization.

_________________
-moveax41h
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 11 Dec 2019, 03:06
You have to test the result to see if it makes a difference in your case.

Every case is different. You can't analyse this statically because there are many things going on inside the CPU that are unpredictable until the code actually runs.

Sometimes using smaller allocations is a win, sometimes it is a loss, other times it makes no meaningful difference.
redsock



Joined: 09 Oct 2009
Posts: 435
Location: Australia
redsock 11 Dec 2019, 09:30
moveax41h wrote:
A lot of programmers would just use an int here, or maybe an unsigned int. However, in my case, there will never be a menu index above 255. In fact, there won't be one above 50. I would NEVER have a negative menu item, so I typically would not use a signed number here... But int seems to be the universally abused type, in C at least.
Agree with @revo that every case is really different, but I would like to add:

A lot of the HLL people (HLL here meaning languages higher-level than C/C++) argue that having the ability to shoot ourselves in the foot with an integer here is inherently evil, and all that is wrong in the world of IT (exaggerating for effect, but it doesn't take much searching online to find a decent flame war about this).

Consider this:
Code:
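#include <stdbool.h>
#include <stddef.h>

static inline bool is_some_array_value_positive(int index) {
    /* valid indices are 0 .. count-1, so index == count must be rejected too */
    if (index < 0 || (size_t)index >= sizeof(array) / sizeof(array[0]))
        return false;
    return array[index] > 0;
}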


For an unsigned char, the compilers on x86_64 usually promote it to 32 bits to avoid partial register stalls (search Agner Fog's optimization guides for more facedesk exercises here).
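Separately from the compiler's register choices, the C language itself widens narrow operands: anything smaller than int is promoted to int before arithmetic. The following is a minimal sketch of that rule, assuming a platform where int is wider than 8 bits (true on x86_64); the helper name `next_index` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer promotion: both operands of + are converted to int first,
   so the expression has the size of int, not of unsigned char. */
static_assert(sizeof((uint8_t)1 + (uint8_t)1) == sizeof(int),
              "uint8_t operands are promoted to int before arithmetic");

/* Hypothetical helper: the sum is computed as an int (up to 256 here),
   and only the conversion back to uint8_t wraps it modulo 256. */
uint8_t next_index(uint8_t i)
{
    return (uint8_t)(i + 1);
}
```

So even with an 8-bit parameter, the arithmetic on it is done at int width; the narrow type mostly affects how the value is stored and when it wraps.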

So, if it promotes it to a 32 bit index (because I used int, instead of long int), then does it really matter whether it is signed or not?

The unsigned version could just eliminate the < 0 conditional and be just as effective (read: do the same thing). If a signed integer is the -desired- effect (and I agree it rarely is), then the index checking is still important regardless, if you want "robust" HLL-like "please don't crash" code :)
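The point about dropping the < 0 test can be sketched as follows. This is an illustration only: the table contents and the function names are hypothetical, and the upper bound is written with >= so that an index equal to the element count is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table standing in for the "array" in the post. */
static const int array[] = { 3, -1, 0, 7 };
#define ARRAY_COUNT (sizeof(array) / sizeof(array[0]))

/* Signed index: needs both the < 0 test and the upper-bound test. */
bool is_positive_signed(int index)
{
    if (index < 0 || (size_t)index >= ARRAY_COUNT)
        return false;
    return array[index] > 0;
}

/* Unsigned index: a negative value passed in wraps to a huge number,
   so the single upper-bound test rejects it as well. */
bool is_positive_unsigned(unsigned int index)
{
    if (index >= ARRAY_COUNT)
        return false;
    return array[index] > 0;
}

/* Usage: both versions agree, including on "negative" and too-large input. */
void bounds_check_demo(void)
{
    assert(is_positive_signed(3) && is_positive_unsigned(3));
    assert(!is_positive_signed(-1) && !is_positive_unsigned((unsigned int)-1));
    assert(!is_positive_signed(4) && !is_positive_unsigned(4));
}
```

Either way one bounds comparison remains; the unsigned form only saves the second one.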

Performance wise, Garbage-In, Garbage-Out is a decent mantra. If you -know- your index is good, you don't need to check it, do you? You can call it an int, or an unsigned, or whatever you like and it will "just work".

:)

$0.02

_________________
2 Ton Digital - https://2ton.com.au/
guignol



Joined: 06 Dec 2008
Posts: 763
guignol 11 Dec 2019, 09:46
revolution wrote:
Every case is different. You can't analyse this statically because there are many things going on inside the CPU that are unpredictable until the code actually runs.
one can analyze a finite machine statically.
revolution wrote:
Sometimes using smaller allocations is a win, sometimes it is a loss, other times it makes no meaningful difference.
regular mislead.
redsock



Joined: 09 Oct 2009
Posts: 435
Location: Australia
redsock 11 Dec 2019, 09:57
guignol wrote:
one can analyze a finite machine statically ....regular mislead.
I don't really understand how this is constructive or helpful to the OP.

I have spent a significant amount of time with static analysis tools, and the codepath and pipelines for my work in x86_64 can -never- accurately be computed by a third-party static analysis method.

Further, you say it is a "regular mislead" that sometimes it is a win and sometimes it isn't... There are so many variables at play inside a modern CPU core that I can't call it a "finite machine" unless we get to completely isolate its function and purpose. The OP was not asking about writing code for an airgapped CPU with no operating system or anything else going on to pollute caches, affect dynamic register renaming, poison memory caches, or a whole host of other things.

I'd beg you to provide helpful and constructive comments rather than useless dead-end ones.

/rant.

_________________
2 Ton Digital - https://2ton.com.au/
guignol



Joined: 06 Dec 2008
Posts: 763
guignol 11 Dec 2019, 11:37
so that, you're thinking that "non"-static "third-party" analytic tools are.. of some use?

CPU is a finite-state machine
nothing in real life is finite

you're living in a fairy-tale finite-state of misconceptions of your life
as in "evolution of species & nix", or "aloneness in space (what can ever be better)"; or, idk, finiteness of your purse whilst having everlasting popping-up problems, all the time, on and on, and without end.......(for sakes!)

The conclusion. Why the rant, if there should be grudge?
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 11 Dec 2019, 15:15
Anecdotally, I was reading code for a raytracer once and noticed the structure for a ray was 68 bytes. After reducing the size to 64 bytes, I saw a 15% speed increase. I emailed the author.
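One plausible contributor to a gain like that is cache-line alignment, though the post does not say what the actual cause was. The sketch below assumes 64-byte cache lines and 4-byte floats; both struct layouts are hypothetical, since the raytracer's real structure is not shown. It counts how many elements of a cache-line-aligned array span two lines.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64u  /* assumption: 64-byte cache lines */

/* Hypothetical layouts, 17 floats versus 16 floats. */
struct ray68 { float origin[4]; float dir[4]; float inv_dir[4]; float color[4]; float t_max; };
struct ray64 { float origin[4]; float dir[4]; float inv_dir[4]; float color[3]; float t_max; };

static_assert(sizeof(struct ray68) == 68, "17 floats of 4 bytes");
static_assert(sizeof(struct ray64) == 64, "16 floats of 4 bytes");

/* Count how many of the first n elements of a cache-line-aligned array
   span more than one cache line, for a given element size. */
size_t count_straddling(size_t elem_size, size_t n)
{
    size_t straddling = 0;
    for (size_t i = 0; i < n; i++) {
        size_t first = (i * elem_size) / CACHE_LINE;
        size_t last  = (i * elem_size + elem_size - 1) / CACHE_LINE;
        if (first != last)
            straddling++;
    }
    return straddling;
}
```

With these assumptions every 68-byte element touches two cache lines, while every 64-byte element fits in exactly one.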

There are situations where it matters. Modern x86 has instructions like MOVSX/MOVZX that can promote a smaller value to the optimal machine size. Exceeding the range of the type (some future unexpected use) should produce an error from the compiler. Maybe someone else has an example where using the smaller size would be a problem?
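As one possible answer to the closing question: in C, an out-of-range value is usually not an error. Compilers tend to warn only about out-of-range constants, and runtime values are silently truncated. A minimal sketch of two ways this can bite, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Does the value fit the 8-bit type at all? */
int fits_in_u8(unsigned int value)
{
    return value <= UINT8_MAX;
}

/* Converting to uint8_t keeps only the low 8 bits: 300 becomes 300 - 256 = 44.
   No runtime error is raised. */
uint8_t truncate_to_u8(unsigned int value)
{
    return (uint8_t)value;
}

/* Count loop iterations using an 8-bit counter, giving up after `limit` steps.
   With count > 255 the counter wraps from 255 to 0 and never reaches count. */
unsigned int iterations_with_u8_counter(unsigned int count, unsigned int limit)
{
    unsigned int steps = 0;
    for (uint8_t i = 0; i < count && steps < limit; i++)
        steps++;
    return steps;
}

/* Usage: 50 items behave as expected; 300 items would loop without the limit. */
void narrow_counter_demo(void)
{
    assert(iterations_with_u8_counter(50, 1000) == 50);
    assert(iterations_with_u8_counter(300, 1000) == 1000);
}
```

For a menu that never exceeds 50 entries neither problem arises, but both appear as soon as the data outgrows the type.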
Estece



Joined: 08 Feb 2017
Posts: 10
Estece 11 Dec 2019, 16:07
The C language is an abstraction that doesn't know what processor is running its program.
You can force the compiler to optimize for your type or in general.
In asm, you are the optimization algorithm.
Read the documentation of your CPU for best results.
On x86_64, 32-bit operands are generally the fastest and have the smallest instruction encoding.
Then you have cache and memory, and there is a lot you need to know about what you are doing with your data.
Furs



Joined: 04 Mar 2016
Posts: 2566
Furs 11 Dec 2019, 17:04
For function parameters in C, it doesn't really matter, since each parameter ends up occupying a full register or stack slot (32-bit or 64-bit, depending on arch).

To fit multiple byte parameters compactly, you'd probably have to use something like this:
Code:
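#include <stdint.h>

void _internal_function_(uint32_t);

/* Pack four byte-sized arguments into one 32-bit parameter. */
static inline void actual_function(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    /* cast before shifting: a uint8_t promotes to signed int, and d << 24 could overflow it */
    _internal_function_((uint32_t)a | (uint32_t)b << 8 | (uint32_t)c << 16 | (uint32_t)d << 24);
}


// use the function
actual_function(1, 2, 3, 4);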
The wrapper will normally be inlined away by an optimizing compiler, and the 4 bytes will then be packed into one 32-bit int as a single parameter to the function. It is similar in asm, if you want to pack a parameter.

The compiled call will look like this (optimized, 32-bit stack-based calling convention):
Code:
push 0x04030201
call _internal_function_    
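On the receiving side, the packed parameter has to be split back into bytes. A minimal sketch, with hypothetical helper names `pack4` and `unpack_byte`, and with the first argument placed in the lowest byte as in the packing expression above:

```c
#include <assert.h>
#include <stdint.h>

/* Pack four bytes into one 32-bit value, a in the low byte. */
uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint32_t)a | (uint32_t)b << 8 | (uint32_t)c << 16 | (uint32_t)d << 24;
}

/* Extract byte number n (0 = lowest, 3 = highest) from a packed value.
   n must be in the range 0..3. */
uint8_t unpack_byte(uint32_t packed, unsigned int n)
{
    return (uint8_t)(packed >> (8u * n));
}

/* Usage: the arguments 1, 2, 3, 4 give the constant seen in the asm listing. */
void pack_demo(void)
{
    uint32_t p = pack4(1, 2, 3, 4);
    assert(p == 0x04030201u);
    assert(unpack_byte(p, 0) == 1 && unpack_byte(p, 3) == 4);
}
```

The shifts and masks cost a few instructions on each side, so packing trades some extra work for a smaller parameter footprint.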



For structures, however, smaller types usually do improve efficiency. But watch out for padding if you use C! (Use #pragma pack to remove padding if you want the layout compact.)
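The padding effect can be sketched like this. The struct names are hypothetical, and the sizes asserted here assume a 4-byte alignment for uint32_t, as on the common x86 and x86_64 ABIs. `#pragma pack(push, 1)` is accepted by GCC, Clang and MSVC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 byte + 3 padding + 4 bytes + 1 byte + 3 padding = 12 bytes */
struct padded { uint8_t flag; uint32_t value; uint8_t kind; };

/* Same members, largest first: 4 + 1 + 1 + 2 padding = 8 bytes */
struct reordered { uint32_t value; uint8_t flag; uint8_t kind; };

/* Same order as `padded`, but with padding removed: 6 bytes.
   `value` is then misaligned, which can make access to it slower. */
#pragma pack(push, 1)
struct packed { uint8_t flag; uint32_t value; uint8_t kind; };
#pragma pack(pop)

static_assert(sizeof(struct padded) == 12, "padding after flag and kind");
static_assert(offsetof(struct padded, value) == 4, "value is aligned to 4");
static_assert(sizeof(struct reordered) == 8, "only tail padding remains");
static_assert(sizeof(struct packed) == 6, "no padding at all");
```

Reordering members is often enough to recover most of the space without giving up alignment.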
guignol



Joined: 06 Dec 2008
Posts: 763
guignol 12 Dec 2019, 13:28
C language is an abstraction, is an abstraction of a compiler, compiler of an abstraction
it doesn't know anything

And what if I want to read the documentation of somebody else's CPU?

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.