number dq 17777777777777777777
I'm still getting 9's with the first Overclick version; I've updated that post with the results. I also ran one loop too many for the Overclick results in the original post, so that code is probably ~5% faster than what was originally shown.
Here is the new Overclick code, modified to 64-bit for comparison:
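xor rdx,rdx ; rdx = index of the current digit
mov rax,[number] ; rax = value still to be converted
mov rcx,0x3030303030303030 ; Need to initialize buffer each time
mov qword[result],rcx ; fill all 20 bytes of result with ASCII '0'
mov qword[result+8],rcx
mov dword[result+0x10],ecx
beg:
sub sil,sil ; sil = count for the current digit
mov rcx,[simple+rdx*8] ; rcx = next entry of simple (presumably a power of ten)
begin:
cmp rax,rcx
jb @F ; remainder is below the table entry, so this digit is done
sub rax,rcx
inc sil
jmp begin
@@:
add [result+rdx],sil ; '0' + count = ASCII digit
inc rdx
cmp rdx,19
jnz beg
add byte[result+rdx],al ; whatever is left in rax is the last digit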
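For anyone who would rather read the loop in C, here is a rough model of the listing above. It is only a sketch: the `simple` table isn't shown in this post, so its contents below (10^19 down to 10^1) are my assumption based on the 19 passes through the loop, and `u64_to_dec20` is a made-up name.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: `simple` holds 10^19 down to 10^1, one qword per entry. */
static const uint64_t simple[19] = {
    10000000000000000000ULL, 1000000000000000000ULL,
    100000000000000000ULL,   10000000000000000ULL,
    1000000000000000ULL,     100000000000000ULL,
    10000000000000ULL,       1000000000000ULL,
    100000000000ULL,         10000000000ULL,
    1000000000ULL,           100000000ULL,
    10000000ULL,             1000000ULL,
    100000ULL,               10000ULL,
    1000ULL,                 100ULL,
    10ULL
};

/* Hypothetical helper: writes 20 ASCII digits and a NUL into out[21]. */
static void u64_to_dec20(uint64_t n, char out[21])
{
    memset(out, '0', 20);           /* the three mov stores of 0x30 bytes */
    out[20] = '\0';

    for (int i = 0; i < 19; i++) {  /* rdx runs from 0 to 18 */
        uint64_t p = simple[i];     /* rcx */
        char count = 0;             /* sil */
        while (n >= p) {            /* cmp rax,rcx / jb @F */
            n -= p;                 /* sub rax,rcx */
            count++;                /* inc sil */
        }
        out[i] += count;            /* add [result+rdx],sil */
    }
    out[19] += (char)n;             /* add byte[result+rdx],al */
}
```

Calling `u64_to_dec20(17777777777777777777ULL, buf)` leaves "17777777777777777777" in `buf`. The inner loop runs once per unit of each digit, which is why the all-9's inputs are the slowest rows in the timings.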
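The results below now include clock cycles from RDTSC, with the CPU running at the HFM (2GHz). The ms column is the total for all 1 million iterations; Clocks is per iteration.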
1 Million Iterations @2GHz, Counter Frequency 10000000
Number Used           Number Ret                 ms  Clocks
===========================================================
9999999999999999999   09999999999999999999  155.293     311
9999999999999999999   09999999999999999999  153.471     307
9999999999999999999   09999999999999999999  152.972     306
999999999999999       00000999999999999999  139.653     279
999999999999999       00000999999999999999  140.957     282
999999999999999       00000999999999999999  141.443     283
99999999999           00000000099999999999  123.067     246
99999999999           00000000099999999999  124.466     249
99999999999           00000000099999999999  122.532     245
9999999               00000000000009999999   83.985     168
9999999               00000000000009999999   83.759     168
9999999               00000000000009999999   83.854     168
999                   00000000000000000999   56.915     114
999                   00000000000000000999   57.067     114
999                   00000000000000000999   56.610     113
5555555555555555555   05555555555555555555  102.313     205
5555555555555555555   05555555555555555555   95.783     192
5555555555555555555   05555555555555555555   98.954     198
555555555555555       00000555555555555555   96.333     193
555555555555555       00000555555555555555   96.899     194
555555555555555       00000555555555555555   96.711     193
55555555555           00000000055555555555   85.598     171
55555555555           00000000055555555555   85.215     170
55555555555           00000000055555555555   88.553     177
5555555               00000000000005555555   76.872     154
5555555               00000000000005555555   76.221     152
5555555               00000000000005555555   76.553     153
555                   00000000000000000555   51.490     103
555                   00000000000000000555   51.183     102
555                   00000000000000000555   51.902     104
0                     00000000000000000000   46.033      92
0                     00000000000000000000   44.347      89
0                     00000000000000000000   46.035      92
1                     00000000000000000001   46.018      92
1                     00000000000000000001   46.038      92
1                     00000000000000000001   46.029      92
18446744073709551615  18446744073709551615   99.439     199
18446744073709551615  18446744073709551615  102.999     206
18446744073709551615  18446744073709551615  100.036     200
17777777777777777777  17777777777777777777  120.813     242
17777777777777777777  17777777777777777777  119.089     238
17777777777777777777  17777777777777777777  120.917     242
5000000000000000000   05000000000000000000   47.151      94
5000000000000000000   05000000000000000000   45.065      90
5000000000000000000   05000000000000000000   45.056      90
5000000000000000      00005000000000000000   45.026      90
5000000000000000      00005000000000000000   45.187      90
5000000000000000      00005000000000000000   45.026      90
5000000000000         00000005000000000000   45.033      90
5000000000000         00000005000000000000   45.027      90
5000000000000         00000005000000000000   45.028      90
5000000000            00000000005000000000   45.027      90
5000000000            00000000005000000000   45.034      90
5000000000            00000000005000000000   45.030      90
5000000               00000000000005000000   45.034      90
5000000               00000000000005000000   45.025      90
5000000               00000000000005000000   45.034      90
5000                  00000000000000005000   45.028      90
5000                  00000000000000005000   45.035      90
5000                  00000000000000005000   45.027      90
50                    00000000000000000050   45.018      90
50                    00000000000000000050   45.030      90
50                    00000000000000000050   45.061      90
I'm guessing Roman wanted to see whether add/sub would be faster than mul/div, but his code seems to run consistently at around 5 cycles per digit, which seems pretty nice to me.
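In case anyone wants to cross-check the Clocks column in the table above against the ms column, the relationship can be sketched in C as below. This is only a sanity check on the arithmetic, not the timing harness itself; `clocks_per_iteration` is a hypothetical helper, and it assumes the ms figure is the total for the whole run at a fixed 2GHz.

```c
/* Hypothetical helper: converts a total run time in milliseconds into
 * clocks per iteration, rounded to the nearest whole clock.
 * ASSUMPTION: the CPU stays at cpu_hz for the whole run. */
static long clocks_per_iteration(double total_ms, double cpu_hz,
                                 double iterations)
{
    double clocks = total_ms / 1000.0 * cpu_hz / iterations;
    return (long)(clocks + 0.5);    /* inputs are non-negative */
}
```

For the first row, `clocks_per_iteration(155.293, 2e9, 1e6)` gives 311, and 46.033 ms gives 92, matching the table.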