flat assembler
Message board for the users of flat assembler.
![]() |
Author |
|
revolution 24 Mar 2018, 00:40
Why not just do a direct division for each coin?
quarters = floor (cents / 25) dimes = floor ((cents - quarters * 25) / 10) nickels = floor ((cents - quarters * 25 - dimes * 10) / 5) pennies = cents - quarters * 25 - dimes * 10 - nickels * 5 No loops required. |
|||
![]() |
|
tthsqe 24 Mar 2018, 02:50
This problem can be solved in constant time (i.e. you dont need loops). Im sure revolution can cook up such constant time code for us in x86 here in a jiffy.
Edit: however, this will require revo to properly understand the question first. ![]() Lets make sure. With 11 pennies we got 4 combos? 11 = 10+1 = 5+5+1 = 5+1+...+1 = 1+...+1 |
|||
![]() |
|
moveax41h 24 Mar 2018, 08:00
Well, I need to find the number of all combinations together of each coin type to create some given number of pennies... I'm not sure if straight division will give me that, I think it will just give me how many of each coin type individually make up the whole 200 pennies for example. I'm actually doing all of those math operations in my code above but just getting the number of quarters, dimes, nickels, and pennies in and of itself doesn't give me the total number of unique combos of all of them if that makes sense. E.G. given 200 pennies, the correct answer is 1463 combinations. Given 900 pennies, the answer is 104650 combinations.
_________________ -moveax41h |
|||
![]() |
|
donn 24 Mar 2018, 16:14
Haven't looked at your math, but have you considered back-to-back conditional jumps? I often use a style like this (which is similar to yours) and it seems efficient enough:
Code: .nextXItemLabel: ; Could be quarters or the beginning of some loop mov ecx, [someVariableIndexVal] ; Load loop entry criteria mov edx, [someOtherCount] ; Load a comparison count, maybe total items cmp ecx, edx ; Criteria to begin loop jnl .exitLabel jne .anotherExitPointLabel ; Back- to-back jumps to other exit points. For efficiency, these can be checked outside the loop so they don't slow down loop performance, but may help during debugging. cmp ecx, 0 ; Other entry checks jne .noItemsLabel ; Can jump to different parts within the loop conditionally ; loop body .noItemsLabel: ; Within body, may want to jump here when 0, for example .nextYItemLabel: ; Inner loop, structured the same as outer mov ecx, [someYVariableIndexVal] ; Load Y loop entry criteria mov edx, [someOtherYCount] ; Load a Y comparison count, maybe total items cmp ecx, edx ; Criteria to begin loop jnl .exitYLabel ; inner loop body jmp nextYItemLabel .exitYLabel: jmp nextXItemLabel .exitLabel: ; exited... .anotherExitPointLabel: So, the previous in short is structured like this: Code: nextXItem: ; Multiple entry critera checks performed immediately after label nextYItem: ; Entry criteria checks performeda after label, similar to nextXItem's jmp nextYItem jmp nextXItem That should solve nested control flow. Alternatively, you could put loop entry criteria after each loop label, and redirect to another loop, like so: Code: nextXItem: ; Some entry checks could redirect to nextYItem jmp nextXItem nextYItem: ; Entry checks performed same as other example jmp nextYItem but that's not really nested. Those are two loops of the same nesting level. If anyone else has best practices, maybe they can shed some insight. The first example is what I've been using lately. I may have misunderstood your intended control-flow structure, however. |
|||
![]() |
|
revolution 24 Mar 2018, 16:35
tthsqe wrote: Edit: however, this will require revo to properly understand the question first. moveax41h wrote: Well, I need to find the number of all combinations together of each coin type to create some given number of pennies... I'm not sure if straight division will give me that, I think it will just give me how many of each coin type individually make up the whole 200 pennies for example. I'm actually doing all of those math operations in my code above but just getting the number of quarters, dimes, nickels, and pennies in and of itself doesn't give me the total number of unique combos of all of them if that makes sense. E.G. given 200 pennies, the correct answer is 1463 combinations. Given 900 pennies, the answer is 104650 combinations. |
|||
![]() |
|
tthsqe 24 Mar 2018, 16:36
For all of you with windoze boxes, could you give this a try? It should popup a message box with the correct answers.
Code: format PE GUI 5.0 entry Start include 'win32ax.inc' section '.text' code readable executable madd: ; ebx[0:1:2:3] = ebx[0:1:2:3] * eax + ecx push esi mov esi, eax mov eax, dword[ebx + 4*2] mul esi push edx push eax mov eax, dword[ebx + 4*0] mul esi push edx push eax mov eax, dword[ebx + 4*1] mul esi imul esi, dword[ebx + 4*3] add dword[esp + 4*0], ecx adc dword[esp + 4*1], eax adc dword[esp + 4*2], edx adc dword[esp + 4*3], esi pop dword[ebx + 4*0] pop dword[ebx + 4*1] pop dword[ebx + 4*2] pop dword[ebx + 4*3] pop esi ret divide: ; ebx[0:1:2:3] = ebx[0:1:2:3] / ecx ; output remainder in edx xor edx, edx mov eax, dword[ebx + 4*3] div ecx mov dword[ebx + 4*3], eax mov eax, dword[ebx + 4*2] div ecx mov dword[ebx + 4*2], eax mov eax, dword[ebx + 4*1] div ecx mov dword[ebx + 4*1], eax mov eax, dword[ebx + 4*0] div ecx mov dword[ebx + 4*0], eax ret round: ; if 2*edx > ecx, add one to ebx[0:1:2:3] xor eax, eax add edx, edx jc .big cmp ecx, edx .big: adc dword[ebx + 4*0], eax adc dword[ebx + 4*1], eax adc dword[ebx + 4*2], eax adc dword[ebx + 4*3], eax ret print_num: ; print ebx[0:1:2:3] to edi ; also clobbers ebx[0:1:2:3] push ebp mov ebp, esp mov ecx, 10 .loop1: call divide push edx mov eax, dword[ebx + 4*0] or eax, dword[ebx + 4*1] or eax, dword[ebx + 4*2] or eax, dword[ebx + 4*3] jnz .loop1 .loop2: pop eax add eax, '0' stosb cmp esp, ebp jne .loop2 pop ebp ret print_calculation: ; in : eax number to calculate ; in/out: edi address of buffer to write calculation push esi sub esp, 4*4 mov ebx, esp mov esi, eax ; esi = number to calculate mov eax, 'f(' stosw mov dword[ebx + 4*0], esi mov dword[ebx + 4*1], 0 mov dword[ebx + 4*2], 0 mov dword[ebx + 4*3], 0 call print_num mov eax, ') = ' stosd mov eax, esi xor edx, edx mov ecx, 5 div ecx mov esi, eax ; esi = floor(number to calculate/5) mov dword[ebx + 4*0], 4 mov dword[ebx + 4*1], 0 mov dword[ebx + 4*2], 0 mov dword[ebx + 4*3], 0 mov eax, esi mov ecx, 54 call madd mov eax, esi mov ecx, 212 call madd mov eax, esi mov ecx, 225 call madd mov ecx, 240 call divide call round call print_num add esp, 4*4 pop esi ret Start: lea edi, [Message] mov eax, 0 call print_calculation mov al, 10 stosb mov eax, 11 call print_calculation mov al, 10 stosb mov eax, 99 call print_calculation mov al, 10 stosb mov eax, 200 call print_calculation mov al, 10 stosb mov eax, 900 call print_calculation mov al, 10 stosb mov eax, 999 call print_calculation mov al, 10 stosb mov eax, 12345 call print_calculation mov al, 10 stosb mov eax, -1 call print_calculation xor eax, eax stosb invoke MessageBoxA, 0, Message, "result", MB_OK invoke ExitProcess, 0 section '.data' data readable writeable Message: rb 1000 section '.idata' import data readable writeable library kernel32,'KERNEL32.DLL',\ user32,'USER32.DLL' include 'api\kernel32.inc' include 'api\user32.inc' |
|||
![]() |
|
donn 24 Mar 2018, 18:08
That was quick, numbers look correct?
|
||||||||||
![]() |
|
moveax41h 25 Mar 2018, 21:19
So I re-arranged my code because I realized my looping branches looked like crap and frankly to you experienced folks they probably still do...
It still doesn't work properly though, something is wrong, if I enter 200 I only get 40 combinations so it's a premature exit... Gonna have to do some more debugging but I'm still tryin: Code: format PE console entry main include 'macro/import32.inc' section '.rdata' data readable Msg db 'Enter an amount in PENNIES and we will give the change combinations for quarters, dimes, nickels, and pennies', 0 frmspec db '%d', 0 Msg2 db 13, 10, 'Total combinations: %d', 0 pnull db 'pause>nul', 0 section '.data' data readable writeable combos dd 0 postquarter_total dd 0 postdime_total dd 0 q dd 0 d dd 0 n dd 0 p dd 0 userinput dd 0 quarters_divided dd 0 section '.code' readable executable main: ;setup main stack push ebp mov ebp, esp push Msg call [printf] add esp, 4 push userinput push frmspec call [scanf] add esp, 8 ; Doesn't need to be in loop because this will be constant mov eax, [userinput] ; Move the penny count into eax mov edx, 0 ; prepare to do a divide mov ecx, 25 ; we are dividing by 25 for quarters div ecx ; do the divide mov [quarters_divided], eax .quarter_loop: ; q as counter mov eax, [quarters_divided] cmp eax, [q] ; compare the quotient with the counter jle endme ; if the quotient is not less than or equal to the counter, jmp out mov eax, [q] ; Quarter counter into eax mov ecx, 25 ; Prepare for mult by 25 mul ecx ; mult by 25 product in eax mov ebx, [userinput] ; put pennies into ebx sub ebx, eax ; subtract the quotient of q * 25 from pennies mov [postquarter_total], ebx ; assign to postquarter_total .dime_loop: mov eax, [postquarter_total] ; Problem loop... need to base cmp off d here mov edx, 0 mov ecx, 10 div ecx cmp eax, [d] jle .after_dime mov eax, [d] ; Move the dime cou mov ecx, 10 mul ecx mov ebx, [postquarter_total] sub ebx, eax mov [postdime_total], ebx .nickel_loop: mov eax, [postdime_total] mov edx, 0 mov ecx, 5 div ecx cmp eax, [n] ; This is never satisfied because n never increments and is always 0 jle .after_nickel ; This is our exit condition to go up to the prev loop mov eax, [n] mov ecx, 5 mul ecx mov ebx, [postdime_total] sub ebx, eax mov [p], ebx inc [combos] inc [n] ; Bug fix hopefully... jmp .nickel_loop .after_nickel: inc [d] jmp .dime_loop .after_dime: inc [q] jmp .quarter_loop endme: push [combos] push Msg2 call [printf] add esp, 8 mov esp, ebp pop ebp push pnull call [system] add esp, 4 push 0 call [exit] section '.idata' import data readable library msvcrt, 'msvcrt.dll' import msvcrt, \ printf, 'printf', \ scanf, 'scanf', \ system, 'system', \ exit, 'exit' _________________ -moveax41h |
|||
![]() |
|
moveax41h 25 Mar 2018, 22:36
This is fast as hell. Works like a charm. Did you write it without even testing it?
tthsqe wrote: For all of you with windoze boxes, could you give this a try? It should popup a message box with the correct answers. _________________ -moveax41h |
|||
![]() |
|
tthsqe 26 Mar 2018, 11:50
moveax41h, the logic of your asm code leaves me in doubt of your c code. Why do you need to divide? Try translating the following into asm. Of course the variable total doesn't need to be passed around in your asm version because you can reserve a register for it.
Code: #include <stdio.h> void countP(int n, int& total) {total++;} void countN(int n, int& total) {do countP(n, total); while ((n-=5) >= 0);} void countD(int n, int& total) {do countN(n, total); while ((n-=10) >= 0);} void countQ(int n, int& total) {do countD(n, total); while ((n-=25) >= 0);} int f(int n) { int total = 0; if (n >= 0) countQ(n, total); return total; } int main() { int number, result; loop: printf("enter integer to calculate\n"); result = scanf("%d", &number); if (result == EOF || result == 0) { printf("exiting\n"); return 0; } printf("f(%d) = %d\n",number,f(number)); goto loop; } |
|||
![]() |
|
Furs 26 Mar 2018, 12:16
Technically, that's C++ code, because int& is a reference, and C doesn't have references. (i.e. you'll need a C++ compiler to compile it, not that it's a problem, just mentioning it if OP can't compile it)
I'm sure this can be simplified to a math problem with Combinations/Permutations stuff (so just need to calculate some factorials), but I'm not that knowledgeable there. |
|||
![]() |
|
moveax41h 26 Mar 2018, 20:47
Here's the C program:
Code: #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { if (argc != 2) { printf("Usage: change [total cents]\n"); return 1; } int total = atoi(argv[1]); printf("Total combinations of quarters, dimes, nickels, and pennies to make %d\n", total); int combos = 0; int postquarter_total; int postdime_total; for (int q = 0; q <= total / 25; q++) // quarters { postquarter_total = total - q * 25; for (int d = 0; d <= postquarter_total / 10; d++) // dimes { postdime_total = postquarter_total - d * 10; for (int n = 0; n <= postdime_total / 5; n++) // nickels { int p = postdime_total - n * 5; // pennies combos++; } } } printf("%d combinations\n", combos); return 0; } |
|||
![]() |
|
tthsqe 26 Mar 2018, 22:19
Ok c code is fine, you are simply suffering from a GCE concerning labels. I will give you control flow for a for loop, and hopefully you can then apply it three times.
Code: for (a=b; a<=c; a+=d) {body} Code: mov a, b .check_cond: cmp a, c jg .exit ; exit loop if a>c body add a, d jmp .check_cond ; unconditional jump back .exit: Some people would be up in arms about that extra uncondition jump from a performance point of view. This is also possible: Code: mov a, b .check_cond: cmp a, c jg .exit ; exit loop if a>c .cond_true: body add a, d cmp a, c jle .cond_true ; dont exit if a<=c .exit: x86 is a register-starved arch. I hope you fix your conversion and also convert the c code I recommended, and compare them for speed. |
|||
![]() |
|
DimonSoft 27 Mar 2018, 15:09
tthsqe wrote: Some people would be up in arms about that extra uncondition jump from a performance point of view. <OffTop>They shouldn’t really be since static prediction always predicts unconditional jump as taken and, if memory serves me, it has a very small penalty. Unlike conditional branches which are predicted with more complex rules and the predictions for which may fail.</OffTop> |
|||
![]() |
|
revolution 27 Mar 2018, 22:24
Predictions, and their rules, and even whether or not the CPU supports branch prediction, is not an absolute. Each CPU does things differently. Try not to fall into the trap of reading one CPU manual and thinking all other CPUs do the same thing or have the same behaviours. Some people use old CPUs, some people use new CPUs, some people use a different brand of CPU, some people use embedded single core CPUs, some people use server-class multi-core CPUs., some systems have fast SRAM, some systems have slow ROM, some CPUs have no cache, etc.
|
|||
![]() |
|
moveax41h 28 Mar 2018, 07:50
Thanks for the help everyone. I finally squashed all the bugs and got this thing working.
#1 mistake was that I forgot to set each loop counter to 0 before entering the loop. Thus after each loop ran through all of its cycles in 1 iteration of the outer loop, it would never actually run again, severely misrepresenting the count. The #2 mistake was that I should have been using jl instead of jle because the loops need to execute when the numbers being compared are equal to one another as well... It ONLY needs to jump out of the loop when the number is less. These 2 fixes made the program work finally... Code: format PE console entry main include 'macro/import32.inc' section '.rdata' data readable Msg db 'Enter an amount in PENNIES and we will give the change combinations for quarters, dimes, nickels, and pennies', 0 frmspec db '%d', 0 Msg2 db 13, 10, 'Total combinations: %d', 0 pnull db 'pause>nul', 0 section '.data' data readable writeable combos dd 0 postquarter_total dd 0 postdime_total dd 0 q dd 0 d dd 0 n dd 0 p dd 0 userinput dd 0 quarters_divided dd 0 section '.code' readable executable main: ;setup main stack push ebp mov ebp, esp push Msg call [printf] add esp, 4 push userinput push frmspec call [scanf] add esp, 8 ; Doesn't need to be in loop because this will be constant mov eax, [userinput] ; Move the penny count into eax mov edx, 0 ; prepare to do a divide mov ecx, 25 ; we are dividing by 25 for quarters div ecx ; do the divide mov [quarters_divided], eax mov [q], 0 .quarter_loop: ; q as counter mov eax, [quarters_divided] cmp eax, [q] ; compare the quotient with the counter jl endme ; if the quotient is not less than or equal to the counter, jmp out mov eax, [q] ; Quarter counter into eax mov ecx, 25 ; Prepare for mult by 25 mul ecx ; mult by 25 product in eax mov ebx, [userinput] ; put pennies into ebx sub ebx, eax ; subtract the quotient of q * 25 from pennies mov [postquarter_total], ebx ; assign to postquarter_total mov [d], 0 .dime_loop: mov eax, [postquarter_total] ; Problem loop... need to base cmp off d here mov edx, 0 mov ecx, 10 div ecx cmp eax, [d] jl .after_dime mov eax, [d] ; Move the dime cou mov ecx, 10 mul ecx mov ebx, [postquarter_total] sub ebx, eax mov [postdime_total], ebx mov [n], 0 .nickel_loop: mov eax, [postdime_total] mov edx, 0 mov ecx, 5 div ecx cmp eax, [n] ; This is never satisfied because n never increments and is always 0 jl .after_nickel ; This is our exit condition to go up to the prev loop mov eax, [n] mov ecx, 5 mul ecx mov ebx, [postdime_total] sub ebx, eax mov [p], ebx inc [combos] inc [n] ; Bug fix hopefully... jmp .nickel_loop .after_nickel: inc [d] jmp .dime_loop .after_dime: inc [q] jmp .quarter_loop endme: push [combos] push Msg2 call [printf] add esp, 8 mov esp, ebp pop ebp push pnull call [system] add esp, 4 push 0 call [exit] section '.idata' import data readable library msvcrt, 'msvcrt.dll' import msvcrt, \ printf, 'printf', \ scanf, 'scanf', \ system, 'system', \ exit, 'exit' _________________ -moveax41h |
|||
![]() |
|
DimonSoft 28 Mar 2018, 11:05
revolution wrote: Predictions, and their rules, and even whether or not the CPU supports branch prediction, is not an absolute. Each CPU does things differently. Try not to fall into the trap of reading one CPU manual and thinking all other CPUs do the same thing or have the same behaviours. Some people use old CPUs, some people use new CPUs, some people use a different brand of CPU, some people use embedded single core CPUs, some people use server-class multi-core CPUs., some systems have fast SRAM, some systems have slow ROM, some CPUs have no cache, etc. Well, I’ve read quite a lot of optimization manuals by now, from very old x86 processors (the oldest was around 80286, I guess) to quite modern ones (like Core i7). The rules are the same for this particular case. Why would one ever decide to predict an unconditional branch as not taken? I guess, this is the case where your (generally useful) habit to remind about the relativity of optimization techniques becomes useless. Besides, there’s nothing wrong with taking two pieces of code that have a small difference and talking about possible pros and cons of each one. Engineering is all about making decisions and choosing implementations. We need some playground to share our knowledge on optimizations and the playground is made of such code snippets. Nearly every pipelined CPU out there has a branch predictor. 99.999% of x86 CPUs (except really old ones, I guess) these days do have it. People with old CPUs already cope with the fact that most software either fails to run or is not optimized for their particular CPU model. The number of cores doesn’t render branch prediction talks wrong since each core has the same rules and the code snippet runs on the same core (with respect to task switching, of course). What I was trying to say is that there’s no need to rewrite a loop just because someone might complain about unconditional jumps. If JMP suits the needs of this particular loop better, any attempts to avoid it with Jcc will almost definitely render its performance worse. A good advice comes with a rationale. There seems to be no rationale behind the fear of unconditional jumps ending loops. |
|||
![]() |
|
tthsqe 28 Mar 2018, 11:33
moveax41h: I'm glad you got it working, now can you make it faster? I will test your code when I can on a windows box, but at first sight it looks like it will be on the slow side.
I more aptly should have said "aesthetic" in place of "performance". Use of an unconditional jump is a hint that there might be a better way to layout your basic blocks. Besides, everyone else is doing it nowadays. Here is gcc output at O3 (you have to go all of the way to O0 to get a jmp) for you javascriptless users: Code: int f(int x); int g(int a, int b, int c, int d) { for (a=b; a<=c; a+=d) a= f(a); return a; } Code: g(int, int, int, int): cmp esi, edx mov edi, esi jg .L6 push rbp push rbx mov ebp, ecx mov ebx, edx sub rsp, 8 .L3: call f(int) lea edi, [rax+rbp] cmp ebx, edi jge .L3 add rsp, 8 mov eax, edi pop rbx pop rbp ret .L6: mov eax, esi ret |
|||
![]() |
|
revolution 28 Mar 2018, 12:19
DimonSoft wrote: Why would one ever decide to predict an unconditional branch as not taken? |
|||
![]() |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.