flat assembler
Message board for the users of flat assembler.
fasmnewbie 20 Jan 2018, 02:42
@Furs, I don't think u understand how division by multiplication-constant akin to Agner Fog's style works. Logically, there's no way you can extract the last digit (Least Significant Digit) after the first MUL. It is lost forever ;D
Furs 20 Jan 2018, 12:58
Yes you can? Say we use base 10 for simplicity. You have the number 1234, and it's 123 after division. To extract the 4, using a bit of logic, just multiply 123 by 10 -> 1230, then subtract: 1234-1230 = 4.
So for base26:
Code:
    ; input number in ecx
    mov eax, 0x4EC4EC4F
    mul ecx
    shr edx, 3
    imul eax, edx, 26
    sub ecx, eax
    ; ecx = last digit, do stuff with it
    mov ecx, edx  ; next input
    ; now loop until ecx = 0
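For readers who prefer to check the arithmetic outside of assembly, here is a minimal C sketch of the same multiply-back idea. The function names `div26` and `base26_digits` are made up for illustration and are not from the thread; the constant and the shift are the ones in the post above. It produces the digits least significant first, and a zero input yields a single digit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient n / 26 via the constant from the post: 0x4EC4EC4F is
   ceil(2^35 / 26), so the high 32 bits of the 64-bit product,
   shifted right by 3, give floor(n / 26) for any 32-bit n. */
uint32_t div26(uint32_t n)
{
    uint64_t product = (uint64_t)n * 0x4EC4EC4Fu;  /* mul ecx    */
    return (uint32_t)(product >> 32) >> 3;         /* shr edx, 3 */
}

/* Writes the base-26 digits of n (values 0..25, least significant
   first) into out and returns how many were written. A 32-bit value
   needs at most 7 base-26 digits. */
int base26_digits(uint32_t n, uint8_t out[7])
{
    int count = 0;
    do {
        uint32_t q = div26(n);
        out[count++] = (uint8_t)(n - q * 26u);     /* imul + sub */
        n = q;                                     /* next input */
    } while (n != 0);
    assert(count <= 7);
    return count;
}
```

Each iteration costs one widening multiply for the quotient and one ordinary multiply plus a subtract for the remainder. Whether that beats a DIV in a real conversion routine depends on the CPU and the surrounding code.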
Tomasz Grysztar 20 Jan 2018, 14:11
Multiplying the result by the divisor and then subtracting from the original number is a universal method of obtaining the remainder, no matter what algorithm was used to divide. However, even in the case of division through multiplication by a "magic" number it is possible, at least in some cases, to have the remainder obtained directly from the main algorithm. See what I wrote about my alternative approach to these techniques.
fasmnewbie 20 Jan 2018, 14:50
@Furs
Ofc you can always start from the back digits, if you focus your algorithm that way. But by doing so, you're making your algorithm even slower than a regular DIV. It is pointless. It involves 3 MULs even before you get to the next iteration. If you're doing it from the back, just use a DIV ;D

If u need a faster approach, start from the front digits. A simplified technique for a 2-digit BASE-26 conversion of decimal 274 (AE):
Code:
    mov eax,274
    mov ebx,0x4ec4ec4f
    xor edx,edx
    mul ebx
    shr edx,3
    mov ecx,edx  ;first digit
    mov eax,edx
    mov ebx,26
    mul ebx
    mov esi,274
    sub esi,eax  ;second digit
But this too is not any better than a regular DIV due to the need for branches. So, just like I said, just use a DIV for a more standardized way to convert to any base.

Last edited by fasmnewbie on 20 Jan 2018, 14:53; edited 1 time in total
fasmnewbie 20 Jan 2018, 14:51
Tomasz Grysztar wrote: Multiplying the result by the divisor and then subtracting from the original number is a universal method of obtaining the remainder, no matter what algorithm was used to divide. However, even in the case of division through multiplication by a "magic" number it is possible, at least in some cases, to have the remainder obtained directly from the main algorithm. See what I wrote about my alternative approach to these techniques.
revolution 20 Jan 2018, 14:58
fasmnewbie wrote: ... slower than a regular DIV .... not any better than a regular DIV

"better" is a subjective term. Different people will interpret that differently. Maybe you can qualify in which way you suggest it is better. For readability? For "speed"? For ease of programming? For fewer instruction bytes? Something else?
fasmnewbie 20 Jan 2018, 15:12
revolution wrote:
revolution, let the different ideas flow more freely on this board. This board is lacking this specific kind of discussion because every time, there are some people who say "stop it peeps, it all depends on the system. So these discussions are useless. End the discussions now". And as I recall, this is the first time in about two years or so that we have had this kind of beginner's question on conversion. The last one was handled gracefully by AsmGuru. This latest one is already bombarded with high-performance advanced optimization techniques. No wonder beginners' questions are so rare on this board. hahaha ;D
revolution 20 Jan 2018, 15:18
Hmm, okay, I wasn't trying to stop you discussing anything. If it appears that way then I guess I word things badly. I was trying to get you to define what you mean more clearly. And also to realise that faster/slower are not absolutes. Others will experience different behaviour on their systems. I think it is important to acknowledge that.
fasmnewbie 20 Jan 2018, 15:23
It doesn't matter what terminology is being used. People will eventually get to it in their own ways. OhEmGee, you're so tight! ;D
Furs 20 Jan 2018, 16:53
Tomasz Grysztar wrote: Multiplying the result by the divisor and then subtracting from the original number is a universal method of obtaining the remainder, no matter what algorithm was used to divide. However, even in the case of division through multiplication by a "magic" number it is possible, at least in some cases, to have the remainder obtained directly from the main algorithm. See what I wrote about my alternative approach to these techniques.

fasmnewbie wrote: Ofc you can always start from the back digits, if you focus your algorithm that way. But by doing so, you're making your algorithm even slower than a regular DIV. It is pointless. It involves 3 MULs even before you get to the next iteration. If you're doing it from the back, just use a DIV ;D

According to Agner (for my CPU), mul/imul have like 3 clock cycle latency, and div is like 22-29 (for a 32-bit number; for 64-bit numbers it's even more). 2 serial muls would have a combined 6 clock cycle latency, which is still far from 22, even if you add the sub (1 clock cycle).
fasmnewbie 20 Jan 2018, 17:53
@Furs ... in this thread you need to separate the idea of optimization in the mathematical sense from optimization in the string conversion sense. In the mathematical sense, people talk about how fast an algorithm is based on the speed of a computational result. Look at Tomasz's own thread. He's talking "speed" purely from a hypothetical mathematics POV. No strings attached.
This thread discusses string conversion, where IMO the fastest division algorithm out there does not make any significant improvement when strings are involved. In this particular sense, DIV is not as slow as many people like to believe. That's misleading.
fasmnewbie 20 Jan 2018, 18:04
Quote: It's 2 muls, but I have a hard time understanding how div can be "faster"

I've seen code employing MULs which is slower than a DIV operation.

Quote: According to Agner (for my CPU), mul/imul have like 3 clock cycle latency, and div is like 22-29 (for 32-bit number, for 64-bit numbers it's even more). 2 serial muls would have combined 6 clock cycle latency which is still far from 22, even if you add the sub (1 clock cycle).

Ofc, in theory they have lower latency. But in practice, a string conversion reads / writes from memory (that's a heavy latency). So your 'fast' division algorithm will be completely overwhelmed by the sum of the latencies of memory WRITE/READ. Not to mention the I/O processes being used in, say, C's printf or the kernel's I/O routines. Pretty pointless isn't it? ;D
Ali.Z 21 Jan 2018, 10:52
yeohhs wrote:
well that was simple in c/c++, i didnt know that %8.x will display the result as hex.
about all the other posts from great users: shr and shl for simplicity, honestly. the in-depth control is under div (and it is actually more advanced), but the algorithm required for it can be a bit long, which will result in extra microseconds (not milliseconds).
_________________
Asm For Wise Humans
rugxulo 07 Feb 2018, 20:05
In assembly (but not x64?), for hex (base 16), you can use the old low-nibble conversion trick: "cmp al,10 // sbb al,105 // das".
My own HLL code is fairly naive. But yes, the big advantage to hex is that it's fast and easy to convert to string without slow DIV.

It may not matter for major platforms (e.g. Windows or Linux), but printf() can be a pig. It's both bloated and slow, at least when statically linking (e.g. DJGPP). When writing a partial hexdump / od clone, I wrote my own crude routine (in C) which was significantly smaller and faster. Similarly, I wrote my own for Turbo Pascal 5.5 since it lacked hexstr() (TP 7 ??; well, at least FPC has it). Of course, buffering helped a lot, too.

It's also faster to not have one function for everything. FPC's hexstr() is fast, but I found that it was faster in TP 5.5 to use two separate, specialized routines (bytehex, longhex) instead of only one universal one. The more specific and isolated you can be, the more optimized it is. Writing generic functions that do it all is great, but sometimes you only need the bare minimum. You don't need to call a full printf() implementation, reparsing your format string over and over again, if all you need is simple hex output.
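As a rough illustration of the "small specialized routine instead of printf()" idea, here is a minimal C sketch. It is not rugxulo's code: `nibble_to_hex` and this `bytehex` are hypothetical stand-ins, and the nibble conversion uses a lookup table, which gives the same result as the cmp/sbb/das sequence and also works in 64-bit mode, where DAS is not available.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a value 0..15 to '0'..'9' or 'A'..'F' with a lookup table. */
char nibble_to_hex(uint8_t nibble)
{
    assert(nibble < 16);
    return "0123456789ABCDEF"[nibble];
}

/* Specialized routine for one byte: writes exactly two hex digits
   plus a terminating NUL into out (3 bytes). No format string is
   parsed and no division is involved. */
void bytehex(uint8_t value, char out[3])
{
    out[0] = nibble_to_hex((uint8_t)(value >> 4));    /* high nibble */
    out[1] = nibble_to_hex((uint8_t)(value & 0x0F));  /* low nibble  */
    out[2] = '\0';
}
```

Calling `bytehex(0xAF, buf)` leaves the string `AF` in `buf`. A hexdump loop would typically append such pairs to a larger buffer and write it out in one go, in line with the buffering remark above.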
Ali.Z 08 Feb 2018, 04:49
i think my main problem with consoles is:
displaying and clearing the console screen. im not sure why i have issues with the C runtime library (msvcrt.dll), it can be my bad for using it wrong. currently im using console APIs (WriteConsole) but its not easy to use.
as for algorithms there are many, and people here mentioned many as well. some of them are a bit confusing, some of them look simple.
also for sure it might not be a big problem to display a 1-byte hex value, but what if i need to display a 4-byte long hex value? say: 3 million decimal to hex, in which case i have no idea how to deal with it.
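The 4-byte case asked about here works the same way as the 1-byte case, just with eight nibbles instead of two. The following C sketch is only an illustration under that assumption; `dword_to_hex` is a hypothetical name, not a routine from this thread. It peels off the top four bits on each pass, which is the shr/shl approach mentioned earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Writes value as exactly 8 uppercase hex digits plus a NUL
   terminator into out (9 bytes), most significant digit first. */
void dword_to_hex(uint32_t value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    assert(out != 0);
    for (int i = 0; i < 8; i++) {
        out[i] = digits[value >> 28];  /* top nibble, 0..15       */
        value <<= 4;                   /* bring next nibble to top */
    }
    out[8] = '\0';
}
```

For 3 million decimal the buffer ends up holding `002DC6C0`. The number never needs a separate "decimal to hex" step, since the dword is already stored in binary and hex is only a way of printing it.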
Ali.Z 29 Jun 2018, 23:02
ok guys, previously you all helped me with converting to hex format.
now in my program for some reason i need to convert some hex values (reading from a file, so its actually chars) to a 4-byte value (dword). say i have a string in my file: 4001FC8 as hex. converting from base16 to base10 is difficult for me. (ive no idea how, too)
Picnic 30 Jun 2018, 07:28
Ali.A wrote: say i have a string in my file: 4001FC8 as a hex.

Hi Ali.A,
Here is a simple HexToDword routine. Converts 4001FC8 to 67117000 in EAX. No error checking whatsoever.
Code:
    ; input: ESI pointer to string buffer
    ; output: EAX
    HexToDword:
        push ebx esi
        xor ebx, ebx
        cld
    .loop:
        lodsb
        test al, al
        je .return
        sub al, '0'
        cmp al, 10
        jl @F
        sub al, 7
    @@:
        shl ebx, 4
        or bl, al
        jmp .loop
    .return:
        mov eax, ebx
        pop esi ebx
        ret

Ali.A wrote: i think my main problem with consoles are:

Here is a CLS routine to get you started.
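For comparison, the HexToDword routine in this post can be sketched in C roughly as follows. This is a transcription for illustration, with `hex_to_dword` as a made-up name; like the assembly, it does no error checking, expects uppercase A-F, and stops at the terminating zero byte.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulates one hex digit per character, most significant first. */
uint32_t hex_to_dword(const char *s)
{
    uint32_t result = 0;
    assert(s != 0);
    for (; *s != '\0'; s++) {
        /* '0'..'9' become 0..9; 'A'..'F' become 17..22 */
        uint8_t digit = (uint8_t)(*s - '0');
        if (digit >= 10)
            digit = (uint8_t)(digit - 7);   /* 17..22 -> 10..15 */
        /* Shifting left by 4 frees the low nibble; on the first pass
           result is 0, so the shift leaves it 0. */
        result = (result << 4) | digit;
    }
    return result;
}
```

`hex_to_dword("4001FC8")` returns the value 67117000. There is no separate base-16 to base-10 step: the result is a plain binary dword, and decimal only appears when the value is printed, for example with printf's `%u`.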
Ali.Z 30 Jun 2018, 09:11
thanks bro, downloaded your cls function.
already im using 0x0D 0x0A to write 100 lines lol, anyhow i want to understand:
- sub al,'0' ; subtract 30 hex (ascii table 30 = 0)
- then compare if its 10? 0x0A, if its less go forward, otherwise sub 7? why?
- bx is 0, shifting it to left will result in 64d, then or 64 with the content of al!
this algorithm is out of my thinking range.
btw, i was thinking to use SetConsoleCursorPosition .. do i really have to use STRUCT and pass a pointer for this struct? cant i just create a label and define dwXpos and Ypos?
Picnic 30 Jun 2018, 11:05
Yes you can, just as you imagine it.
Code:
    ; data section
    dwCoord dw 40,10

    ; data section
    dwCoord dd 0x000A0028

    ; data section
    dwCoord:
      x dw 40
      y dw 10
Code:
    ; code section
    invoke SetConsoleCursorPosition, dword [hOut], dword [dwCoord]
p.s. CLS.asm was a wrong choice of name. It will conflict with cmd.exe CLS command. Please rename it after download.

I added a few comments. I'm sure you'll figure it out.
Code:
    sub al, '0'   ; subtract 48 from the char
    cmp al, 10    ; see if it was a digit 0-9 and not a letter (assume A-F)
    jl @F         ; jump if it was a digit
    sub al, 7     ; it was a letter: adjust 17..22 ('A'..'F' minus 48) to 10..15
    @@:
    shl ebx, 4    ; get space for next nibble
    or bl, al     ; store the nibble
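The three dwCoord definitions in this post describe the same four bytes, because the console COORD structure is two 16-bit fields, X followed by Y, and x86 stores the low word first. A small C sketch of that packing follows; `pack_coord` and `pack_coord_example` are illustrative names only, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Combines two 16-bit coordinates into the dword that `dw x, y`
   occupies in memory on little-endian x86: X in the low word,
   Y in the high word. */
uint32_t pack_coord(int16_t x, int16_t y)
{
    return (uint32_t)(uint16_t)x | ((uint32_t)(uint16_t)y << 16);
}

/* Self-check with the values from the post: X = 40 (0x28), Y = 10 (0x0A). */
void pack_coord_example(void)
{
    assert(pack_coord(40, 10) == 0x000A0028u);
}
```

This is why passing `dword [dwCoord]` works: SetConsoleCursorPosition takes the COORD by value, so the two words travel together as one 32-bit argument.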
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.