flat assembler
Message board for the users of flat assembler.

Index > Windows > I have the assembler, now what?

Trinitek



Joined: 06 Nov 2011
Posts: 257
Trinitek
C0deHer3tic wrote:
Why can't I print out them both?
EAX is used as the return register by functions, so it gets destroyed by the first printf.

https://en.wikipedia.org/wiki/X86_calling_conventions#stdcall

EAX, ECX, and EDX are caller-saved (volatile): a called function is free to overwrite them, so they aren't guaranteed to be preserved across a call.
Post 27 Mar 2017, 22:04
C0deHer3tic



Joined: 25 Mar 2017
Posts: 49
C0deHer3tic
Okay, so this is what I came up with. Any suggestions?

Code:

format PE console
entry start

include 'win32a.inc'
include 'dota.inc' ; this has my imports

section '.data' data readable

Hello db 'Hello World! The number is now %d.',0
String1 db "The number is %d",0

section '.main' data readable

subtract:
        mov eax,0    ; nop out eax
        sub ebx,1     ; subtract 1 from ebx (equals eax when we copied eax into ebx)
        mov eax,ebx  ; mov ebx into eax
        call multiply   ; call the multiply func


multiply:
        imul eax,2    ; multiply 2 by eax
        push eax      

        push Hello
        call [printf]
        add esp,4
        push 0
        call [ExitProcess]



start:
        mov eax, 4     ; eax = 4
        add eax,6      ;  add 6 to 4
        mov ebx,eax  ; copy eax into ebx (since printf will erase eax) 
        push eax       ; push eax to the stack
        push String1  
        call [printf]
        add esp, 4

        call subtract    ; call the subtract func
    


PS. I would like to make two lines of text.
Example:
Code:
The number is 10.
Hello World! The number is now 18.
    

_________________
- Just because something is taught one way, does not mean there is not a different way, possibly more efficient. -
Post 27 Mar 2017, 23:08
system error



Joined: 01 Sep 2013
Posts: 671
system error
@Heretic

Seems to me that you got yourself into the wrong starting point of learning assembly. But I can see that you are legitimately trying. That's nice.

We could go through your code's problems one by one, since there are a lot of errors, but instead I'll give you a simple, clean example of how to achieve a similar objective, in the hope that you can slowly digest it and make your next move after that. This code shows a lot of the things you should have when writing assembly and dealing with functions:

Code:
section '.data' data readable writeable
greet db 'Hello. My first program',0ah,0
fmt db 'Result: %d + %d = %d',0ah,0
inp1 dd 6
inp2 dd 4
ans dd ?

section '.code' code readable executable
main:
        push    greet           ;arg1
        call    [printf]        
        add     esp,4           ;cleanup for arg1 push. cdecl convention

        push    [inp2]          ;arg2
        push    [inp1]          ;arg1
        call    addTwo
        mov     [ans],eax       ;copy the return value in EAX to ans
        
        push    [ans]           ;arg4
        push    [inp2]          ;arg3
        push    [inp1]          ;arg2
        push    fmt             ;arg1
        call    [printf]
        add     esp,4*4         ;cdecl stack cleanup (calling convention)

        push    0
        call    [ExitProcess]

;add two integers
;Requires 2 arguments
;Returns to EAX
;Calling convention: stdcall (callee cleanup stack)
addTwo:
        push    ebp             ;function prologue
        mov     ebp,esp
        push    ebx             ;save EBX
        
        mov     ebx,[ebp+12]    ;arg2 from stack
        mov     eax,[ebp+8]     ;arg1 from stack
        add     eax,ebx         ;EAX stores the answer
        
        pop     ebx             ;restore EBX
        mov     esp,ebp         ;function epilogue
        pop     ebp
        ret     4*2             ;stdcall stack cleanup (calling convention)    


You can slowly modify and expand it at your own pace. Others can help you more easily if you start from working code like this. It resembles the following C code, which should be easier for you to follow given your C background (the only slight difference is the calling convention used for addTwo):

Code:
#include <stdio.h>

int addTwo(int x,int y);    /* declared before main uses it */

int main()
{
    int inp1=6, inp2=4;
    int ans;

    printf("Hello. My first program\n");
    ans = addTwo(inp1,inp2);
    printf("Result: %d + %d = %d\n",inp1,inp2,ans);
    return 0;
}

int addTwo(int x,int y)
{
    return x+y;
}
Post 27 Mar 2017, 23:40
Furs



Joined: 04 Mar 2016
Posts: 1520
Furs
Your functions never return; that's not good practice even if it works in this case. Use the 'ret' instruction to return from a function (and if *your* function takes parameters on the stack, optionally use 'ret X', where X is the number of bytes to pop).

Also you forgot newlines in your strings (they end up on the same line).

Here's a small attempt with commented changes, but CAUTION: I didn't compile it because it's missing 'dota.inc', so I couldn't test it. If there's a typo etc. please ignore it:

Code:
format PE console
entry start

include 'win32a.inc'
include 'dota.inc'

section '.data' data readable

Hello db 'Hello World! The number is now %d.',13,10,0    ; add newline to the strings (windows uses CR LF)
String1 db "The number is %d",13,10,0

section '.main' code readable executable    ; code has to live in an executable section

subtract:
        dec ebx        ; subtract 1 with 'dec' instruction, smaller encoding
        mov eax,ebx    ; could have used here 'lea eax, [ebx-1]' and get rid of 'dec'
                       ; and the mov (the lea does the same thing), but let's keep it basic :P
        ret            ; return from subtract!


multiply:
        shl eax, 1     ; multiply by 2 with left shift by 1 bit, same thing but faster
        ret            ; return from multiply with result in eax!


start:
        mov eax, 4
        add eax, 6
        mov ebx, eax   ; keep a copy in ebx, since printf will overwrite eax
        push eax       ; push eax to the stack
        push String1  
        call [printf]
        add esp, 8     ; two 4-byte arguments were pushed (cdecl: caller cleans up)

        call subtract  ; takes input in ebx, returns result in eax
        call multiply  ; multiply takes value in eax and returns in eax

        push eax
        push Hello
        call [printf]
        add esp, 8     ; again two arguments, so 8 bytes
        push 0
        call [ExitProcess]
    


What happens is that your subtract function takes as INPUT 'ebx', decrements it and places the result in 'eax'. You have to understand the "flow" of this, just like in C, you have functions that take parameters and return a value.

Of course in asm you can return multiple values (registers), but always document what you use (simple comments before your function) so you can have a clear grasp of what takes what and returns what.

A call is the exact same thing as jmp (jump/goto) instruction except that it pushes the return address on the stack. ret takes the return address found at 'esp' and jumps back.

So if you don't have a ret instruction and don't use hacks to get the return address, then 'call' makes no sense at all; you could just use 'jmp'. call is used for functions that are supposed to return, just like in C.

system error wrote:
Alignment is a processor thingy. Not OS. 64-bit calling conventions is tailored to suit such CPU requirement.
Huh? I was talking about requiring alignment on a function call -- the processor does not need that at all.

It's there only to make sure the tiny minority of functions that use vectorized SSE aligned loads/stores don't have to realign the stack, which is quite stupid. First of all, that's a tiny minority of functions at the expense of bloating 99% of the functions' stack (resulting in less code cache/stack cache too).

Secondly, it does NOT even work for anything better than 128-bit SSE. If you use 256-bit vectors, then it's nothing but a pure waste of space. You have to realign the stack anyway to 256-bits. Thus the 128-bit alignment is senseless in anything that uses AVX. And now we're stuck with catering to vectorized SSE even if we don't use it whatsoever (and use AVX instead). So fucking dumb. Keep in mind we're stuck with this forever as long as x86 64-bit exists with this dumb calling convention. Retarded decision.


Last edited by Furs on 27 Mar 2017, 23:52; edited 1 time in total
Post 27 Mar 2017, 23:46
C0deHer3tic
I thank you all for being patient with me. I will study both of these answers. I appreciate the efforts from you all.


- Sincerely, C0deHer3tic.

Post 27 Mar 2017, 23:51
Trinitek
C0deHer3tic wrote:
PS. I would like to make two lines of text.
Example:
Code:
The number is 10.
Hello World! The number is now 18.    
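Put 0x0D, 0x0A (CR, LF) before the terminating 0 of each message string. Writing "\r\n" inside the quotes will not work here: fasm does not translate C-style escape sequences, so the backslashes and letters would be printed literally. My dot product example uses the byte values.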

As an additional suggestion, you should consider using a debugger and view the disassembly of a C program you'd like to convert. That should give you some implementation hints.


Last edited by Trinitek on 28 Mar 2017, 00:20; edited 1 time in total
Post 27 Mar 2017, 23:51
Furs
C0deHer3tic wrote:
I thank you all for being patient with me. I will study both of these answers. I appreciate the efforts from you all.


- Sincerely, C0deHer3tic.
Try and use a good debugger like OllyDbg (for 32-bit code), and see how your program works one instruction at a time. Just don't forget to "Step Over" (i.e. skip) printf and other system functions, and only follow your own code. So when you arrive at printf just press F8 (Step Over). The rest of the time press F7 (Step Into) to advance your code by 1 instruction.

It's a great way to see the flow of control IMO.
Post 27 Mar 2017, 23:54
system error
Furs wrote:
Huh? I was talking about requiring alignment on a function call -- the processor does not need that at all.


What do you mean the processor doesn't need aligned memory? The stack is just an abstract view of the same memory in the same address space. IT IS MEMORY. So suggesting that SSE instructions require aligned memory but not an aligned stack shows your fundamental understanding of how a 64-bit CPU works on bare metal is quite low.

SSE/AVX instructions exist, and are required, inside many API functions. So where do you think they get aligned memory from if not from the aligned stack?
Post 27 Mar 2017, 23:58
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17662
Location: In your JS exploiting you and your system
revolution
64-bit Windows uses the SSE/AVX instructions to move data from/to the stack when doing various internal things within the APIs. It is not just for arithmetic operations, so every API has the potential to cause a stack fault if you don't align the stack correctly.
Post 28 Mar 2017, 00:38
C0deHer3tic
Thank you everyone. So much studying! :P

Post 28 Mar 2017, 02:55
C0deHer3tic
@system error

Your code is very confusing to me. I would love to know more about what everything does, and why. Your comments helped some, however I am not understanding things like:

Code:
add     esp,4*4
    


I am not sure I understand why 4 times 4 is added to esp.
Is this because 3 args + printf were pushed to the stack? Thus making 4?

Also why does this ret 4*2? I am not quite understanding.

Code:
ret     4*2             ;stdcall stack cleanup (calling convention)
    


@Furs

Code:
shl eax, 1     ; multiply by 2 with left shift by 1 bit, same thing but faster 
    


What? I am not clear with this command. How does it multiply by 2?

Here is what I found on this command, and forgive my ignorance, but I still don't understand.

Quote:

shl shifts the destination operand left by the number of bits specified in the second operand. The destination operand can be byte, word, or double word general register or memory. The second operand can be an immediate value or the CL register. The processor shifts zeros in from the right (low order) side of the operand as bits exit from the left side. The last bit that exited is stored in CF. sal is a synonym for shl.


Also, why do you use 13,10,0 after each string?

Code:
Hello db 'Hello World! The number is now %d.',13,10,0    ; add newline to the strings (windows uses CR LF) 
String1 db "The number is %d",13,10,0 
    


Now is this because 13 is the carriage return, and would do the same if I did 0dh?
If so, why then is the newline used? Dec 10 (0ah) hex, or 1010b
Of course 0 is null. Just like "Hello World!\0" Right?


- Sincerely and curious, CodeHer3tic



Last edited by C0deHer3tic on 28 Mar 2017, 04:23; edited 1 time in total
Post 28 Mar 2017, 04:13
Trinitek
C0deHer3tic wrote:
@Furs
Code:
shl eax, 1     ; multiply by 2 with left shift by 1 bit, same thing but faster 
    
What? I am not clear with this command. How does it multiply by 2?

Here is what I found on this command, and forgive my ignorance, but I still don't understand.
Quote:

shl shifts the destination operand left by the number of bits specified in the second operand. The destination operand can be byte, word, or double word general register or memory. The second operand can be an immediate value or the CL register. The processor shifts zeros in from the right (low order) side of the operand as bits exit from the left side. The last bit that exited is stored in CF. sal is a synonym for shl.
If we shift a base-2 number to the left by one binary place, we effectively multiply it by 2. In the same way, we can shift a base-10 number to the left by one decimal place to multiply it by 10. Try it for yourself in the Windows calculator's programmer mode.
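
The same equivalence can be checked in C, the language the thread already uses for comparison. This is only a minimal sketch: the names `times_two_shift` and `times_two_mul` are made up for illustration, and unsigned 32-bit values are assumed so that a bit shifted out of the top is simply lost, as it is in EAX.

```c
#include <stdint.h>

/* Hypothetical helper: doubles x the way `shl eax, 1` does. */
uint32_t times_two_shift(uint32_t x)
{
    return x << 1;   /* every bit moves one place left; a 0 enters on the right */
}

/* Hypothetical helper: doubles x the way `imul eax, 2` does. */
uint32_t times_two_mul(uint32_t x)
{
    return x * 2u;   /* wraps modulo 2^32, just like the shift */
}
```

For example, 9 is 1001b; shifted left by one place it becomes 10010b, which is 18, the value the program in this thread prints.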
Post 28 Mar 2017, 04:23
C0deHer3tic
@Trinitek

Thank you for the explanation. That was silly of me for not seeing that on my own.
Post 28 Mar 2017, 04:28
revolution
C0deHer3tic wrote:
Also, why do you use 13,10,0 after each string?

Code:
Hello db 'Hello World! The number is now %d.',13,10,0    ; add newline to the strings (windows uses CR LF) 
String1 db "The number is %d",13,10,0 
    


Now is this because 13 is the carriage return, and would do the same if I did 0dh?
If so, why then is the newline used? Dec 10 (0ah) hex, or 1010b
Of course 0 is null. Just like "Hello World!\0" Right?
The following are all equivalent: 10, 0xa, 0ah, 1010b, 12o, 5+5, 5 shl 1, 5*2, 0x5*0x2, 0x14/2, 0x14 shr 1

In Windows
CR: cursor goes to the beginning of the current line
LF: cursor goes down to the next line
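
For readers coming from C, the list above can be spelled in C and checked directly. This is a sketch with hypothetical function names. C has no `0ah`, `1010b` or `12o` forms, so hex is written `0x..`, octal gets a leading `0`, and fasm's `shl` / `shr` become `<<` / `>>`.

```c
/* Hypothetical helper: returns 1 when every C spelling of "ten" agrees. */
int all_equal_ten(void)
{
    return 0xa == 10 && 0xA == 10 && 012 == 10
        && 5 + 5 == 10 && (5 << 1) == 10 && 5 * 2 == 10
        && 0x5 * 0x2 == 10 && 0x14 / 2 == 10 && (0x14 >> 1) == 10;
}

/* The two control characters, compared by value (ASCII assumed). */
int cr_is_13(void) { return '\r' == 13 && '\r' == 0x0D; }
int lf_is_10(void) { return '\n' == 10 && '\n' == 0x0A; }
```

All three functions return 1 on an ASCII system, which is the point: the assembler and the C compiler only ever see the number, not the way it was written.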
Post 28 Mar 2017, 04:44
C0deHer3tic
Thank you, revolution. That was helpful. I understand now. :)
Post 28 Mar 2017, 05:04
Furs
I tested and it seems just '10' (i.e. \n, newline) is enough in a Windows console app (printf writes to stdout in text mode, where the C runtime expands a lone LF to CR LF), so you can just use that if you want :)

But yeah, 13,10 is equivalent to "\r\n" in C string, since that's their ASCII/ANSI encoding, nothing special.
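
A small C sketch makes the byte-for-byte equivalence visible. The names below are made up for illustration; the second array mirrors the fasm line `db "The number is %d",13,10,0` by listing the trailing bytes as plain numbers.

```c
#include <string.h>

/* The same string written with C escapes ... */
static const char with_escapes[] = "The number is %d\r\n";

/* ... and with raw byte values, as fasm's  db "...",13,10,0  produces. */
static const char with_bytes[] = { 'T','h','e',' ','n','u','m','b','e','r',' ',
                                   'i','s',' ','%','d', 13, 10, 0 };

/* Hypothetical helper: returns 1 when both spellings yield identical bytes. */
int same_bytes(void)
{
    return sizeof with_escapes == sizeof with_bytes
        && memcmp(with_escapes, with_bytes, sizeof with_bytes) == 0;
}
```

Both arrays are 19 bytes long: 16 characters of text, then 13, 10 and the terminating 0. The escapes are translated by the C compiler, not by printf, which is why they have to be written as numbers in fasm.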


system error wrote:
What do you mean the processor doesn't need aligned memory? The stack is just an abstract view of the same memory in the same address space. IT IS MEMORY. So suggesting that SSE instructions require aligned memory but not an aligned stack shows your fundamental understanding of how a 64-bit CPU works on bare metal is quite low.
Actually it shows YOUR understanding of the stack is low and you think it's something magical and can't be altered.

Dude, a function can realign the stack with one instruction (and a frame pointer). Even if the stack is completely messed up and aligned to 1 byte only. This is REQUIRED anyway for any function using AVX, if you want performance.

ALL that the stupid ABI does is guarantee functions that want to use 128-bit SSE (but not anything higher!!) that the stack is aligned so it will save 1-3 instructions in the function prolog at MOST and bloat EVERYTHING ELSE. So, we waste the stack for *every single function* (because the ABI applies to every single function, as long as it follows it) for that tiny minority of functions which use SSE (and not AVX)? Just to save a few stupid instructions in the prolog? Let's say on average 50% of functions need to waste 8 bytes of stack space to align the stack.

Keep in mind, this applies ONLY to 128-bit SSE. Any new code using AVX will have to realign the stack anyway, doesn't matter if it's 16-byte, 8-byte or 1-byte aligned, you get the same extra prolog. The ONLY code that benefits from this is strictly SSE, and that's it.

Ok, want an assembly example? Here's our AVX function prolog (this is required for performance regardless of alignment of stack to 16-bytes):

Code:
push rbp
mov rbp, rsp
and rsp, -32    ; realigns the stupid stack, WOW magic!!!

[...]    
Go on, call that function with ANY stack pointer alignment, and it WILL WORK, wow magic! Now for SSE, replace -32 with -16, same thing.
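
What `and rsp, -32` does to the pointer value can be checked in a few lines of C. This is only a sketch: `align_down` is a made-up helper, it assumes the alignment is a power of two, and it models the arithmetic on the address, not the real stack.

```c
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of `align`
   (a power of two), which is what `and rsp, -align` does to RSP. */
uint64_t align_down(uint64_t addr, uint64_t align)
{
    /* ~(align - 1) has the same bit pattern as -align in two's complement:
       for align = 32 it is ...FFE0, so the low five bits are cleared. */
    return addr & ~(align - 1);
}
```

For example, `align_down(0x7FFF1234, 32)` gives `0x7FFF1220` and `align_down(0x7FFF1238, 16)` gives `0x7FFF1230`. The result is never larger than the input, which suits a stack that grows downward: the pointer only moves further into unused space, and the saved RBP is what lets the epilogue find the old value again.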

So to save that stupid instruction (which has so much overhead obviously and functions using SSE are 99% of them right?) we get this idiotic alignment requirement, WTF?

This alignment requirement doesn't even work for AVX, only produces bloat in the stack (and stack is considered "hot" to store stuff to be in the cache). So technically all code using AVX+ will not benefit from it in any way, in fact it makes it worse.

Short-sighted and pathetic design, period.

system error wrote:
SSE/AVX instructions exist, and are required, inside many API functions. So where do you think they get aligned memory from if not from the aligned stack?
Simple, they realign the stack with my magic code above if they need it that way!

Don't mix up AVX with SSE. AVX already has to realign the stack, so they already do it. This ABI shit is ONLY for SSE.

If this is such a wonderful idea, why not align the stack to 1024-bits just in case future AVX extensions will use 1024-bit vectors? Or let's align it to 4k bytes (a page) and be done with it, that way we can be sure it will work with any future vector instructions, right?

After all, saving that "and rsp, -4096" for that ONE function using this massive vector is extremely important yea? Let's pollute every other function in existence with this requirement for that one function using 4096-byte vectors!
Post 28 Mar 2017, 12:04
revolution
Furs wrote:
... for that tiny minority of functions which use SSE ...
Did you miss my post above? In fact a large number of API functions use the SSE/AVX instructions for moving data around (without any arithmetic being done). It is not a tiny minority. I'm not saying it is good or bad, just that there are in fact many functions that require the alignment.

Anyhow, we have it now. It is what it is. If you want to write code that interfaces with the API then you have to comply or have your code crashing.
Post 28 Mar 2017, 12:23
system error
Furs wrote:
Dude, a function can realign the stack with one instruction (and a frame pointer). Even if the stack is completely messed up and aligned to 1 byte only. This is REQUIRED for any functions using AVX regardless if you want performance.


This is where you get the idea wrong. In any 64-bit ABI, aligning the stack is not the function's job but the responsibility of the user code / caller, so that a function is free to do its job without any hassle. So the function does not bloat its code with a prologue and epilogue like your pathetic code is suggesting.

It is no different from 32-bit calling conventions where the users / callers need to re-align the stack, especially in CDECL. Same old, same old.

I understand your INCOMPETENCE when dealing with the 64-bit thingy. You don't have to bark at Microsoft or Linus Torvalds :D

You DO know that PUSH is a high-level / complex instruction, right? ;)

You DO know that PUSH RCX consumes more microcode than plain
Code:
sub rsp,8
mov [rsp],rcx    


right? right? Let's see how good your brain is vs. your big mouth. :D
Post 28 Mar 2017, 12:30
system error
revolution wrote:
Furs wrote:
... for that tiny minority of functions which use SSE ...
Did you miss my post above? In fact a large number of API functions use the SSE/AVX instructions for moving data around (without any arithmetic being done). It is not a tiny minority. I'm not saying it is good or bad, just that there are in fact many functions that require the alignment.

Anyhow, we have it now. It is what it is. If you want to write code that interfaces with the API then you have to comply or have your code crashing.


She doesn't get the idea that 64-bit programming is not for everybody. 64-bit programming is not for the faint of heart. If she tries to understand 64-bit calling conventions with 32-bit INVOKERS mindset, then she/he is going to be hysterical (like she's now).
Post 28 Mar 2017, 12:36
system error
C0deHer3tic wrote:
Thank you everyone. So much studying! :P


The add esp,4*x restores the stack pointer (aka Top of Stack, aka ESP) to the position it had before the arguments were pushed. That means if you PUSHED 3 items onto the stack as arguments for a cdecl function such as printf, then after the function returns you're responsible for restoring ESP to its original value, because ESP is going to be used by everybody else. With stdcall the callee does this cleanup itself, which is what ret 4*2 in addTwo is for: it returns and also drops the 2 arguments (8 bytes).

In 32-bit code, a push is 4 bytes. So 3 pushes need add esp,4*3 to undo them.
In 64-bit code, a push is 8 bytes, so 5 pushes need 8*5 to restore the Top of Stack.

Code:
; Example (cdecl: the caller removes the arguments it pushed)

push a
push b
push c
call D
add esp,4*3 ;or simply add esp, 12    
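
The bookkeeping above can be modelled in C with a toy stack. This is a sketch under stated assumptions, not how a real CPU is programmed: `ToyStack` and its functions are invented names, `esp` is just a byte offset into a small array, and only the 32-bit case (4 bytes per push) is modelled.

```c
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* Hypothetical toy model: esp is a byte offset into mem. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;                 /* offset of the current top of stack */
} ToyStack;

/* An empty stack starts at the high end, because pushes move esp DOWN. */
void toy_init(ToyStack *s) { s->esp = STACK_BYTES; }

/* push value: esp drops by 4, then the value is stored at the new top. */
void toy_push(ToyStack *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* add esp, n: discard n bytes without reading them. */
void toy_add_esp(ToyStack *s, uint32_t n) { s->esp += n; }
```

After `toy_init`, three calls to `toy_push` leave `esp` 12 bytes lower, and `toy_add_esp(&s, 4 * 3)` brings it back to where it started. That is all `add esp,4*3` does after a cdecl call.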


I told you you're picking the wrong entry point to learn assembly. Jumping right to calling convention or stack programming is not a wise move. You need to go back down a little bit to the basics.
Post 28 Mar 2017, 12:44
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
