flat assembler
Message board for the users of flat assembler.
fast strlen
asmfan 08 Feb 2008, 17:44
revolution wrote: You can always use a macro, which you selectively include just for P4 optimisation. See here where I posted such a macro 3 years ago.
Wrong. This applies to the P4 and later CPUs as well. See the Intel® 64 and IA-32 Architectures Optimization Reference Manual (248966.pdf):
Quote: 3.5.1.1 Use of the INC and DEC Instructions
As I said above, dependency on flags should be avoided.
_________________ Any offers?
revolution 08 Feb 2008, 18:03
The P4 got a lot of things wrong, but thankfully that is in the past now. And the µcode can't fix it; it is hardwired in the CPU.
asmfan 08 Feb 2008, 18:17
Yeah, if only performance were the only thing that suffers in the newest CPUs... No way: manufacturers make new mistakes in addition to the existing ones, and it is not only performance that suffers.
"A microcode reliability update is available that improves the reliability of systems that use Intel processors."
Microcode is the only medicine.
_________________ Any offers?
f0dder 09 Feb 2008, 00:02
Microcode isn't as flexible as you think, edfed... it's not like you can change any and all parts of the CPU design; that would require something like an FPGA.
itsnobody 17 Feb 2008, 11:29
I did some speed tests:
lstrlen API - 7300 milliseconds
your strlen - 4150 milliseconds
fastest strlen - 3600 milliseconds
The fastest one was one I found in an Intel Pentium manual ( http://www.agner.org/optimize/#manuals ):
Code:
    proc strlen,pointer
        push  ebx
        mov   eax, [pointer]        ; get pointer s
        lea   edx, [eax+3]          ; pointer+3 used in the end
    l1: mov   ebx, [eax]            ; read 4 bytes of string
        add   eax, 4                ; increment pointer
        lea   ecx, [ebx-01010101H]  ; subtract 1 from each byte
        not   ebx                   ; invert all bytes
        and   ecx, ebx              ; and these two
        and   ecx, 80808080H        ; test all sign bits
        jz    l1                    ; no zero bytes, continue loop
        mov   ebx, ecx
        shr   ebx, 16
        test  ecx, 00008080H        ; test first two bytes
        cmovz ecx, ebx              ; shift if not in first 2 bytes
        lea   ebx, [eax+2]          ; .. and increment pointer by 2
        cmovz eax, ebx
        add   cl, cl                ; test first byte
        sbb   eax, edx              ; compute length
        pop   ebx
        ret
    endp
You would think Microsoft would try to optimize their API or something; it's very slow.
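For anyone who wants to look at the zero-byte test from the loop in isolation, here is a minimal C sketch of the same bit trick. The names zero_byte_mask and strlen_by_words are made up for illustration and do not come from the manual. Like the assembly version, it assumes that up to 3 bytes past the terminator are readable, so it should only be fed padded buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if and only if at least one of the four bytes of x is zero.
   Same expression as the lea/not/and/and sequence in the assembly. */
static uint32_t zero_byte_mask(uint32_t x)
{
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

/* Hypothetical helper: scans 4 bytes at a time.
   Assumption: the buffer is readable up to 3 bytes past the terminator. */
static size_t strlen_by_words(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);      /* 4-byte load that is safe when unaligned */
        if (zero_byte_mask(w) != 0) {
            /* The terminator is somewhere in this word: find it bytewise,
               which keeps the sketch independent of byte order. */
            while (*p != '\0')
                p++;
            return (size_t)(p - s);
        }
        p += 4;                       /* no zero byte, continue with next word */
    }
}
```

The assembly avoids the final bytewise search by using cmovz and sbb on the mask; the sketch trades that for simplicity.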
edfed 17 Feb 2008, 11:42
Quote: You would think Microsoft would try to optimize their API or something, its very slow
No, M$ only optimises its capital. They make some OSes, and Vi$ta is their latest soup; if you buy M$, you give them power. And M$ doesn't want to be fast, it wants to be king of the world.
cmovz is not supported by earlier Pentiums, so this fast strlen is meant for the latest µPs.
I just thought of something: how do you build the function list for the machine at boot?
Code:
    f:
    .null   dd 0
    .strlen dd strlen1
    ..
        mov  eax,[f.strlen]
        call eax
    ..
With this, we can build a machine-specific function list. I don't know how it is done at M$, but it works like this in Menuet, and I'll do the same for my OS and the fasmb project.
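The function-list idea can be sketched in C as a table of function pointers that is filled in once at startup. Everything named here (func_table, build_func_table, strlen_bytewise, strlen_tuned, the cpu_has_cmov flag) is hypothetical and only illustrates the pattern; strlen_tuned is a portable stand-in for a CPU-specific routine, not a real CMOV implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Baseline version: works on any CPU. */
static size_t strlen_bytewise(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Stand-in for a CPU-specific version (for example one using CMOVcc).
   It only unrolls the loop by two so that the sketch stays portable. */
static size_t strlen_tuned(const char *s)
{
    const char *p = s;
    for (;;) {
        if (p[0] == '\0')
            return (size_t)(p - s);
        if (p[1] == '\0')
            return (size_t)(p - s) + 1;
        p += 2;
    }
}

/* The "function list": one slot per routine that has several variants. */
struct func_table {
    strlen_fn str_len;
};

/* Called once at boot; cpu_has_cmov is assumed to come from a CPUID check. */
static void build_func_table(struct func_table *f, int cpu_has_cmov)
{
    assert(f != NULL);
    f->str_len = cpu_has_cmov ? strlen_tuned : strlen_bytewise;
}
```

Callers then go through the table, for example f.str_len(text), which corresponds to the mov eax,[f.strlen] / call eax pair above.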
revolution 17 Feb 2008, 11:43
It all depends on how long your strings are. One algorithm will not suit all situations. Short strings require algo A, medium strings require algo B, long strings require algo C.
edfed 17 Feb 2008, 11:47
And how do you know the length of the string?
revolution 17 Feb 2008, 11:50
edfed wrote: And how do you know the length of the string?
If your particular app always deals with long (or short) strings then you can tune a strlen algo to suit your situation.
itsnobody 17 Feb 2008, 18:16
edfed wrote:
How far does cmovz go back? It should work on all x86 processors.
edfed 17 Feb 2008, 18:19
The first µP to support CMOVcc is the PII or PIII; before that, it doesn't exist and is an invalid opcode.
itsnobody 17 Feb 2008, 19:47
edfed wrote: The first µP to support CMOVcc is the PII or PIII; before that, it doesn't exist and is an invalid opcode.
Hmm... Wikipedia says it was added with the "Pentium Pro", which came out in 1995.
mattst88 18 Feb 2008, 00:42
itsnobody wrote:
This is correct.
_________________ My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions. Assembly Programmer's Journal
rugxulo 19 Feb 2008, 03:03
I think it's only on some PPros, so you really need to check if CPUID is available, then check if CMOV is supported, and then you can use it.
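A rough C sketch of that check, assuming GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid` (it returns 0 when the requested leaf is not available). CMOV is reported in bit 15 of EDX for CPUID leaf 1. The helper names edx_has_cmov and cpu_has_cmov are made up for illustration, and on other compilers or architectures the sketch simply reports "not supported".

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* CPUID leaf 1, EDX bit 15: CMOVcc instructions are supported. */
#define CPUID1_EDX_CMOV (1u << 15)

/* Decode the feature bit from an EDX value returned by CPUID leaf 1. */
static int edx_has_cmov(uint32_t edx)
{
    return (edx & CPUID1_EDX_CMOV) != 0;
}

/* Returns 1 if CMOV can be used, 0 if not or if it cannot be determined. */
static int cpu_has_cmov(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;

    /* Step 1: is CPUID leaf 1 available at all? */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    /* Step 2: is the CMOV feature bit set? */
    return edx_has_cmov(edx);
#else
    return 0;   /* assumption: treat unknown platforms as "no CMOV" */
#endif
}
```

The result would typically be computed once at startup and then used to pick between a CMOV-based routine and a fallback.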
daniel.lewis 12 Mar 2008, 07:07
Heh, I suppose there's probably still a niche.
I stopped caring about 1995 stuff a *decade* ago. In automotive terms, you're designing something amazingly fast for those whose primary mode of transportation is by donkey. Why wouldn't someone simply upgrade their hardware, if they truly cared one iota about performance? Considering the volume of persons who travel by donkey, your shaving a camel-hair's width off their stonethrows per fortnight probably isn't worth your expertise. Just a thought.
_________________ dd 0x90909090 ; problem solved.
victor 12 Mar 2008, 09:21
daniel.lewis 12 Mar 2008, 22:58
Well no, but the bloke certainly seems eloquent and charismatic and classy enough that we could be confused.
/end ego trip
No, but we are both Welsh and probably related two to six hundred years ago. I'm a lesser-known Daniel Lewis, currently residing on a beautiful tropical island, working as a scripter for the world's biggest bank. I have been programming since the age of 12. I unfortunately don't get to see much of my beautiful island because I work 8-5. I someday hope to reside on a yacht which I have already designed. I am married, and have an exceptionally cute 1 1/2 year old daughter. I enjoy Dilbert and other realist comics such as Denis Leary, but my sense of humor is otherwise dark and bitter.
_________________ dd 0x90909090 ; problem solved.
victor 13 Mar 2008, 01:20
Quote: Dilbert
Another comics fan!
daniel.lewis 13 Mar 2008, 04:57
In my life, the only way I manage to stay marginally sane is to laugh at all the stupidity and irrationality caused by dumb people.
How marginal is an exercise left up to each of you to decide.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.