flat assembler
Message board for the users of flat assembler.

Assembly Unions

The_Unknown_Member



Joined: 28 Aug 2017
Posts: 17
The_Unknown_Member 13 Sep 2017, 14:42
Code:
format PE console
use32   ; x86_32
entry start 

include 'win32a.inc'

struct PNT 
        x dd ? 
        y dd ?
        z dd ?
        q dd ? 
ends

struct IPV4 
        union
                struct
                        a db ?
                        b db ?
                        c db ?
                        d db ?
                ends 
                addr dd ?
        ends 
ends 
; This is the data section:
; =======================================================
section '.data' data readable writeable 
        lhost     IPV4 <127, 0, 0, 1>

; =======================================================
section '.text' code readable executable 

start:
        ; Your program begins here
        mov eax, 0 ; Clear eax 
        mov ah, byte [lhost.a] ; Output -> 7f00
        mov eax, 0 ; Clear eax 
        mov ah, byte [lhost.b] ; Output -> 0
        mov eax, 0 ; Clear eax 
        mov ah, byte [lhost.c] ; Output -> 0
        mov eax, 0 ; Clear eax 
        mov ah, byte [lhost.d] ; Output -> 100
        mov eax, 0 ; Clear eax 
        mov eax, dword [lhost.addr] ; Output -> 100007f 
        ; Also getting the same output if I use only lhost 

        ; Exit the process: 
        push 0
        call [ExitProcess]
    


Can someone please explain what is happening in this code?
The first thing that I can't understand is this line:
lhost IPV4 <127, 0, 0, 1>
How am I able to initialize a, b, c, d with different values in the union here? (I thought they all share one value; at least that is how it works in C.) The second thing is the values. I commented the values I get for a, b, c, d next to each instruction, so please read them and explain why I am getting such outputs. The third thing that confuses me is the final result "100007f". Why is the result "100007f"?
Furs



Joined: 04 Mar 2016
Posts: 2566
Furs 13 Sep 2017, 14:44
They don't share the same value in C either. They're wrapped in a struct inside the union. The struct is "one element" and contains four elements itself (but the union sees it as one element).

The struct is shared with the 'addr' member though.

EDIT: The final result 100007f is because it's little endian -- it stores the least significant byte first in memory. (this actually makes far more sense since we read left to right -- our numbers, by default, are written the wrong way around; we just got used to it, but it's not logical at all, since we write words left-to-right but read numbers right-to-left since you can't even pronounce how high a number is without seeing how deep it goes first)
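
To make the layout concrete, here is a minimal sketch that reuses the IPV4 definition from the first post and lets fasm check the addresses at assembly time. It assumes that win32a.inc is reachable through the include path exactly as in the original program, and that the fasm 1 release in use supports the assert directive. The flat binary output and the name "whole" are only conveniences for this sketch.

```asm
; Sketch only: assembles to a 4-byte flat binary, nothing is executed.
include 'win32a.inc'            ; provides the struct/union/ends macros

struct IPV4
        union
                struct          ; anonymous struct: ONE union member holding four bytes
                        a db ?
                        b db ?
                        c db ?
                        d db ?
                ends
                addr dd ?       ; second union member, overlays a..d
        ends
ends

lhost IPV4 <127, 0, 0, 1>

; a, b, c, d sit next to each other, one byte apart...
assert lhost.b = lhost.a + 1
assert lhost.c = lhost.a + 2
assert lhost.d = lhost.a + 3
; ...while addr starts at the same address as a
assert lhost.addr = lhost.a

; read the same four bytes back as one dword
load whole dword from lhost.addr
assert whole = 0100007Fh        ; a = 7Fh ends up as the lowest byte
```

If any assert does not hold, fasm stops with an error instead of producing output, so a successful assembly is the check.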
The_Unknown_Member 13 Sep 2017, 16:15
Thanks. But why is it little endian? Shouldn't it be little endian only in memory? Look at this:
Code:
format PE console
use32   ; x86_32
entry start 

include 'win32a.inc'

; This is the data section:
; =======================================================
section '.data' data readable writeable 
        num     dd 1 dup(0)

; =======================================================
section '.text' code readable executable 

start:
        ; Your program begins here
        mov dword [num], 18693h
        mov eax, [num] ; Output -> 18693

        ; Exit the process: 
        push 0
        call [ExitProcess]
    


In this code I get the result in big-endian order. Hexadecimal is read from right to left just like binary, right? The most significant bit is the leftmost bit and the least significant bit is the rightmost bit?
Furs 13 Sep 2017, 16:28
I'm not sure I understand your confusion, but let me try to explain.

The first byte, which is a, is the least significant byte. Your value is "100007f". This value's least significant byte is 7F, which is correct since a is 7F. It makes sense that a lower memory address contains a less significant byte, right?
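
This also accounts for the "7f00" and "100" shown in the first post: AH is bits 8 to 15 of EAX, so a byte loaded into AH appears shifted left by 8 bits when the whole register is displayed. A small sketch, assuming the same four bytes as lhost (the label lhost_bytes and the flat 32-bit layout are made up for illustration; the register values in the comments follow from the x86 register layout):

```asm
use32
start:
        xor     eax, eax                  ; eax = 00000000h
        mov     ah, [lhost_bytes]         ; eax = 00007F00h (7Fh lands in bits 8..15)
        movzx   eax, byte [lhost_bytes]   ; eax = 0000007Fh (byte zero-extended)
        xor     eax, eax
        mov     ah, [lhost_bytes + 3]     ; eax = 00000100h
        xor     eax, eax
        mov     al, [lhost_bytes + 3]     ; eax = 00000001h
        ret

lhost_bytes db 127, 0, 0, 1

; the same arithmetic, checked at assembly time
assert (7Fh shl 8) = 7F00h
assert (01h shl 8) = 100h
```

Loading into AL, or using movzx, is probably what was intended when inspecting the individual bytes.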

In a decimal number, say 1234, the least significant digit is 4 (rightmost). Adding 1 to it gives you 1235 -- least significant carries to the left (i.e. right-to-left, backwards, due to human number notation)

Again, in human notation, we read the number from right to left subconsciously. They're just written backwards compared to words.

That's why I think little endian is just "logical". Lower memory -> lower byte.


I guarantee if you start to consciously read numbers from right to left, all confusion will be gone. ;) (After all, we can't process a number like "1000000" without first seeing all of its digits; we don't know how big '1' is before we see the least significant digit, so subconsciously we need to read it from right to left.)

If our numbers were written left-to-right, we wouldn't have this problem. Then when we see a digit we don't have to rewind for its meaning later. We know the first one is always the least significant and know exactly of its impact without knowing the number's "length". Big endian is just stupid like that.


Last edited by Furs on 13 Sep 2017, 16:32; edited 1 time in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 13 Sep 2017, 16:29
The data stored in memory is little endian. The first byte is 0x93, the next is 0x86, etc.

0x18693 is stored as four bytes: 0x93, 0x86, 0x01, 0x00. And is read back the same way, so 0x93 is put into the lowest byte of eax.

Edit: Cross post with Furs.
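
The byte order described above can be checked without a debugger. The following is a minimal sketch that uses fasm's load and assert directives to inspect the assembled bytes; the constant names b0..b3 and back are made up for the example, and it assumes a fasm 1 release that supports assert.

```asm
num dd 18693h                   ; written most significant digit first

load b0 byte from num           ; lowest address
load b1 byte from num + 1
load b2 byte from num + 2
load b3 byte from num + 3       ; highest address

assert b0 = 93h
assert b1 = 86h
assert b2 = 01h
assert b3 = 00h

load back dword from num        ; reading all four bytes as one dword
assert back = 18693h            ; gives the original value again
```

Storing and loading apply the same convention, which is why mov eax, [num] shows 18693 even though the bytes in memory are 93 86 01 00.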
The_Unknown_Member 13 Sep 2017, 20:24
But look at this example:
Code:
section '.bss' readable writeable
        num dd 1 dup (?)
section '.text' code readable executable  

start: 
        ; Your program begins here   
        mov [num], 1034fh
        mov eax, [num] ; Output -> 1034f 

        ; Exit the process:  
        push 0 
        call [ExitProcess] 
    

Here I am moving 1034f to memory and the output is still 1034f; it's not in reversed order.
revolution 13 Sep 2017, 20:28
The assembler converts and stores your number in little endian format. You see it in the source code as big endian, but if you disassemble the code and look at the hex output it is little endian. So it is only an optical illusion tricking you.
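
As a rough illustration of what a hex dump of the code would show, the sketch below assembles an instruction with the same constant and inspects its bytes at assembly time. It uses mov eax, imm32 rather than the mov [num], imm32 from the post, only because its encoding is a single opcode byte followed by the immediate; the constant names are made up.

```asm
use32
start:
        mov     eax, 1034Fh             ; encodes as B8 4F 03 01 00

load opc  byte from start               ; B8 = mov eax, imm32
load imm0 byte from start + 1           ; immediate, lowest byte first
load imm1 byte from start + 2
load imm2 byte from start + 3
load imm3 byte from start + 4

assert opc  = 0B8h
assert imm0 = 4Fh
assert imm1 = 03h
assert imm2 = 01h
assert imm3 = 00h
```

The source says 1034Fh, the file contains 4F 03 01 00, and the CPU puts those bytes back together as 0001034Fh when it executes the instruction.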
The_Unknown_Member 13 Sep 2017, 21:09
So when I write code I must think in the big-endian way, and when I read the code in a hex editor I must think in the little-endian way?
revolution 13 Sep 2017, 21:11
Yes. But only for writing numbers larger than one byte. Normal English number rules use big endian, so our source code tends to follow this convention.
The_Unknown_Member 13 Sep 2017, 21:49
Okay, I understand now. Thanks very much!
Furs 13 Sep 2017, 21:53
@The_Unknown_Member: Here's an example of two cases:
Code:
db 1, 2, 3, 4  ; 4 bytes in this order
dd 0x04030201  ; assembles to the exact same thing, but because we store the "dword" at once, in source code it's "big endian" so it's reversed    
The simplest solution, to me, is to read numbers from right to left intentionally when you really have to think about this.

Or, alternatively, imagine the memory backwards: it starts from the right and goes to the left (i.e. byte 0 is to the right of byte 1, though most hex editors don't have such an option; but then you have to think that way when using "db" or the like too).
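
The claim that both definitions assemble to the same bytes can be verified by the assembler itself. A minimal sketch, with made-up names (seq, val, as_seq, as_val) and assuming a fasm 1 release that supports assert:

```asm
seq db 1, 2, 3, 4               ; four bytes, lowest address first
val dd 0x04030201               ; one dword, most significant digit first

load as_seq dword from seq      ; read the four db bytes as one dword
load as_val dword from val

assert as_seq = as_val          ; identical contents
assert as_seq = 04030201h

load first byte from val        ; byte at the lowest address of the dword
assert first = 1
```

Listing the bytes of val from the highest address down to the lowest gives 04 03 02 01, which is the "memory backwards" picture.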

@revolution: I loved your use of "optical illusion" ;)

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.