flat assembler
Message board for the users of flat assembler.
Convert variable byte string to hex value
bitRAKE 22 Feb 2009, 16:40
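SHL is faster on some processors than ROL, and there is no need to preserve the least significant bits. Your use of ADC is slightly misleading because the carry flag is always clear at that point, so ADD is sufficient. There is also no need to clear EAX at the start.
Code:
; in: ESI -> string, ECX = number of characters (used by LODSB and LOOP)
toHex:
        xor edx,edx
.1:     lodsb
        shl edx,4       ; rol
        aam 16          ; AH = AL / 16, AL = AL mod 16
        cmp ah,4        ; sub
        jc .3           ; AH < 4: decimal digit, AL is already the value
        add al,9        ; adc
.3:     or dl,al        ; add
        loop .1
        xchg eax,edx    ; result in EAX
        ret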
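A minimal sketch of the per-character step in the routine above, pulled out into a hypothetical helper named `hexDigit` (the name and the stand-alone form are assumptions for illustration, not code from the thread). It traces what `aam 16` leaves in AH and AL, and why ADC and ADD give the same result at that point. Like the routine above, it does not validate its input.

```asm
use32                   ; AAM is not available in 64-bit mode

; hexDigit: hypothetical helper, not part of the routine above.
; in:  AL = one ASCII hex digit, assumed valid (0-9, A-F, a-f)
; out: AL = value 0..15, AH clobbered
hexDigit:
        aam 16          ; AH = AL / 16, AL = AL mod 16
                        ;   '7' = 37h -> AH=3, AL=7
                        ;   'C' = 43h -> AH=4, AL=3
                        ;   'c' = 63h -> AH=6, AL=3
        cmp ah,4        ; CF=1 only for AH < 4, i.e. the digits 30h..39h
        jc .done        ; decimal digit: AL is already the value
        add al,9        ; letter: low nibble 1..6 becomes 10..15
                        ; reached only with CF=0, so ADC AL,9 would add the same 9
.done:
        ret
```

Because the upper and lower case letters share the same low nibble, no case conversion is needed before the `add al,9`.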
hopcode 22 Feb 2009, 21:09
Ah yes, pardon... they were cut and pasted unaltered from other versions of the same routine. It was my 66th post... Anyway, I find no practical use for it without a zero-terminated string. I have updated the version with a check on the character set 0-9, a-f, A-F. Strings must be zero-terminated. Thanks for pointing it out.
hopcode
bitRAKE 23 Feb 2009, 05:06
This old thread [www.asmcommunity.net] might be of interest - several techniques.
hopcode 25 Feb 2009, 00:15
Very interesting thread!
I have updated my source with a variant of the bitmap technique. More importantly, I need a profiler that reports things like latency times and AGI stalls between instructions. Do you know of any?
Regards, hopcode
Madis731 26 Feb 2009, 07:59
I usually calculate the latency/stalls by hand (in my head) and test my theory by applying RDTSC to the code.
Actually, Agner has it all on his homepage: code to test single-instruction timings, and also PDFs describing all the timings he has measured on different CPUs. The i7 is not on the list... yet. I'm planning on helping him.
hopcode 12 Mar 2009, 13:39
Madis731 wrote: ...the latency/stalls by hand...
New processors follow µop rules, right? If so, should latency/stall calculations be applied in the same way as for old processors?
Regards, hopcode
I have updated the source to use only bitmask tables, but have not yet optimized it using AF's manuals.
revolution 12 Mar 2009, 14:02
Calculating latency and stalls only makes sense if the code and data are already in the L1 cache. Otherwise, fetching the code/data from L2, L3 or main memory (or a paging file) will kill the performance and make timing estimates worthless. Why not just use RDTSC? It mostly gives a good idea of the real-world performance of the test. But be careful to make sure the thread is fixed to run on the same core for the entire duration.
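A minimal sketch of such an RDTSC measurement, assuming 32-bit code. The helper name `measure`, the constant `RUNS` and the calling convention are assumptions made for illustration; this is not the profiler skeleton referred to elsewhere on the board. It repeats the call so that later runs find code and data in the L1 cache, and keeps the smallest count seen. Pinning the thread to one core is left to the operating system (on Windows, for example, with SetThreadAffinityMask) and is not shown.

```asm
use32

; measure: hypothetical timing helper, not from this thread.
; in:  EBP = address of the routine under test, which is assumed to
;      preserve EBX, ESI, EDI and EBP and to leave the stack balanced
; out: EAX = smallest cycle count (low 32 bits) over RUNS calls,
;      including the fixed overhead of CALL and CPUID
RUNS = 16

measure:
        push ebx        ; CPUID clobbers EBX
        push esi
        push edi
        or edi,-1       ; best time so far = 0FFFFFFFFh
        mov esi,RUNS    ; the first run warms the caches
.again:
        xor eax,eax
        cpuid           ; serialize: earlier instructions finish first
        rdtsc           ; EDX:EAX = time-stamp counter
        push eax        ; keep the low dword of the start time
        call ebp
        xor eax,eax
        cpuid           ; serialize again before reading the end time
        rdtsc
        pop ecx         ; start time
        sub eax,ecx     ; elapsed count, valid modulo 2^32
        cmp eax,edi
        jae .keep
        mov edi,eax     ; new minimum
.keep:
        dec esi
        jnz .again
        mov eax,edi
        pop edi
        pop esi
        pop ebx
        ret
```

A routine that takes its input in a register needs a small wrapper that loads the register, calls the routine and restores whatever `measure` expects to be preserved. The fixed overhead can be estimated by timing a wrapper that contains only `ret`.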
hopcode 12 Mar 2009, 22:37
revolution wrote: ...use RDTSC...
Alrighty!! And thanks to revolution.
Well, using a simple RDTSC profiler skeleton found on this board, here is the smallest and fastest version I have managed to write so far: ~450 µsecs for the entire string
szDb db "0123456789ABCDEFabcdef",0
No BT, no bitmask!! Here is the code:
- updated source, 16 March 2009: ~310 µsecs, with a check for "'" in the string, for example "00AA'BBCC",0
- updated 14 March 2009: ~330 µsecs for the same string
Code:
;------------------------------------------
; toHex: convert var bytestr to hex val
; by hopcode[mrk]
; Updated 16 March 2009
; - added ' quoting in large numbers
; Updated: 25 Feb
; - validate 0-9,a-f,A-F
; Date: 22 Feb 2009
;------------------------------------------
toHex:
        xor edx,edx
        dec esi
        xor eax,eax
.1:     inc esi
        or dl,al            ; merge the previous nibble
        nop
.4:     mov al,byte[esi]
        sub al,30h
        jb .err             ; below '0': terminator, quote or invalid
        shl edx,4
        cmp al,9
        jbe .1              ; '0'..'9'
        or al,20h           ; fold upper and lower case together
        sub al,30h          ; 'A'..'F' / 'a'..'f' -> 1..6
        jbe .err1
        cmp al,6
        ja .err1
        add al,9            ; 1..6 -> 10..15
        jmp .1
.err:   cmp al,0D0h         ; 0 - 30h = 0D0h: zero terminator
        jz .3               ; done, carry clear
        inc esi
        cmp al,0F7h         ; 27h - 30h = 0F7h: the ' separator
        jz .4               ; skip it
.err1:  stc                 ; invalid character
.3:     xchg eax,edx
        ret 0
;-------------------------usage-------------
        mov esi,szDb
        call toHex
        ;in ESI string to convert
        ;ret EAX=value / EAX=? carry=1
;---------------------------------------------
Do you have faster procs to share? Ideas? BTW, the conditions are:
- validation of a-f/A-F/0-9
- result in EAX
- use only 2 registers + the source string register
hopcode [mrk]
Last edited by hopcode on 16 Mar 2009, 02:39; edited 3 times in total
bitRAKE 14 Mar 2009, 16:27
It seems strange to return with the carry flag set for invalid characters when the filtering doesn't block all invalid characters. I would test on the full byte range to ensure the correct return.
I came up with the following:
Code:
        align 64
toHex:
        xor edx,edx
        ; set bit = invalid; clear bits 48..57 are '0'..'9'
        mov rbx,1111'1100'0000'0000'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111'1111b
        ; clear bits 10..15 and 42..47 are the letters after ADD AL,9
        mov rcx,1111'1111'1111'1111'0000'0011'1111'1111'1111'1111'1111'1111'0000'0011'1111'1111b
        jmp .go
.0:     js .end             ; bit 7 set: invalid
        bt rbx,rax          ; bit index is taken modulo 64
        jc .end
.10:    shl rdx,4
        and al,1111b
        or dl,al
.go:    mov al,[rsi]
        inc rsi
        test al,1100'0000b
        jle .0              ; top two bits are 00 or 1x
        add al,9
        bt rcx,rax
        jnc .10
.end:   dec rsi             ; RSI -> the character that stopped the scan
        xchg rax,rdx
        retn
Code:
        push rsi
        call toHex
.A:     cmp byte [rsi],"'"
        jne .B
        inc rsi
        xchg rax,rdx        ; put the running value back in RDX
        inc dword [rsp]     ; do not count the ' as a digit (low dword only)
        call toHex.go
        jmp .A
.B:     pop rdx
        cmp rsi,rdx
        jz .underflow       ; no digits at all
        sub rdx,rsi
        cmp rdx,-16
        jl .overflow        ; more than 16 digits
Notice the use of JLE to branch on the top two bits.
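The two tricks in that routine can be sketched on a single character. The helper below, `hexValue`, is a hypothetical single-character variant written for illustration (its name, interface and use of hexadecimal mask constants are assumptions, not bitRAKE's code). After TEST the overflow flag is always clear, so JLE is taken when the result is zero (top bits 00) or negative (bit 7 set). The register form of BT uses the bit index modulo 64, which is why a 64-bit mask can classify each 64-character range.

```asm
use64

; hexValue: hypothetical single-character variant, not bitRAKE's routine.
; in:  AL = character
; out: CF=0 and EAX = 0..15 for 0-9, A-F, a-f; CF=1 otherwise
; clobbers RCX
hexValue:
        movzx eax,al
        test al,11000000b           ; OF=0, SF = bit 7, ZF=1 if bits 7 and 6 are clear
        jle .low_or_high            ; ZF=1 or SF<>OF: 00xxxxxx or 1xxxxxxx
        ; 01xxxxxx: 40h..7Fh, the range that holds the letters
        add al,9                    ; 'A'..'F' -> 4Ah..4Fh, 'a'..'f' -> 6Ah..6Fh
        mov rcx,0FFFF03FFFFFF03FFh  ; clear bits 10..15 and 42..47 = valid
        jmp .check
.low_or_high:
        js .bad                     ; flags are still those of TEST: 80h..FFh
        mov rcx,0FC00FFFFFFFFFFFFh  ; clear bits 48..57 = '0'..'9'
.check:
        bt rcx,rax                  ; CF = mask bit (RAX mod 64)
        jc .done                    ; set bit: invalid, return with CF=1
        and al,15                   ; keep the low nibble; AND also clears CF
.done:
        ret
.bad:
        stc
        ret
```

The two constants are the same masks as in the routine above, written in hexadecimal. Adding 9 before the test moves the six letters of each case into the upper part of a 16-character group, so that `and al,15` yields 10..15 directly.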
hopcode 16 Mar 2009, 02:13
Quote: test al,1100'0000b
Quote: ...when the filtering doesn't block all invalid characters...
That was caused by
Code:
        test al,01000000b
        jz .2
Code:
.0:     js .end
        bt rbx,rax
...BT is far too slow...
Code:
.A:     cmp byte [rsi],"'"
        jne .B
        inc rsi
This is not needed for every byte in the string, IMHO, but anyway, if you really want
Quote:
On the contrary, my updated version handles the quote not as a syntax rule but as a visual aid. You can write "11AA'BBCC" or "1'1AAB'BC'C" or, as usual, "11AABBCC" in your string with no big difference in speed. As long as the procedure lies entirely in the cache, it runs at ~310 µsec for the whole test string (szDb db "012'3456789ABCDEFabcdef",0) on an
- Intel(R) Pentium(R) 4 650 Prescott
- CPU 3.40GHz
- Socket 775 LGA (platform ID = 4h)
- Core Stepping N0
- Core Speed 2797.3 MHz (14.0 x 199.8 MHz)
- Rated Bus speed 799.2 MHz
- Stock frequency 3400 MHz
- Instruction sets MMX, SSE, SSE2, SSE3, EM64T
- L1 Data cache 16 KBytes, 8-way set associative, 64-byte line size
- Trace cache 12 Kuops, 8-way set associative
- L2 cache 2048 KBytes, 8-way set associative, 64-byte line size
Unfortunately, I cannot check 64-bit source code because of my 32-bit XP OS. Anyway, thank you; I always learn something new from your asm code.
Regards, hopcode[mrk]
bitRAKE 16 Mar 2009, 03:44
BT reg,reg/imm is only one cycle on my processor (Core2 45nm, but any P3/PM/Athlon+ has the same timing). That P4 is a very special beast.
(My routine measured at 259 cycles for the first test string. Your latest measures at 217 on average.)
Quote: Shouldn't there also be an error check for user-mishandled repetition of "'" in large numbers?
_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.