flat assembler
Message board for the users of flat assembler.

fast strlen

asmfan 08 Feb 2008, 17:44
revolution wrote:
You can always use a macro, which you selectively include just for P4 optimisation. See here where I posted such a macro 3 years ago.

BTW: This is only needed for the P4, not for earlier or later CPUs.

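Wrong. It applies to the P4 and to later CPUs as well.
[Intel® 64 and IA-32 Architectures Optimization Reference Manual] 248966.pdf
Quote:
3.5.1.1 Use of the INC and DEC Instructions
INC and DEC should be replaced.

As I said above, the dependency on flags should be avoided: INC and DEC leave CF untouched, so they depend on the previous flags value, while ADD and SUB overwrite all the flags.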

_________________
Any offers?

edfed 08 Feb 2008, 17:52
It's really strange that the dependency on flags exists on the P4 but not on earlier CPUs.
Perhaps we can modify it with the microcode?
On later Pentiums there is microcode, which decides how instructions execute on a RISC-like core; the P4 is a RISC-like core with µcode to execute complex instructions, and that µcode can be modified.
This mechanism exists to fix bugs in the µP: if they find a bug after release, they simply correct it via the µcode.

Here is the BIG problem with the µP industry: they make changes without the consent of the community.

We are the users and they are the manufacturer; they should make it simpler, but no, they say:
replace
inc xxx
with
add xxx,1

revolution 08 Feb 2008, 18:03
The P4 had a lot of bad things done, but that is in the past now thankfully. And the µcode can't fix it, it is hardwired in the CPU.

asmfan 08 Feb 2008, 18:17
Yeah, if only performance were the only thing that suffers in the newest CPUs... No way: manufacturers make new mistakes in addition to the existing ones, and it is not only performance that suffers: "A microcode reliability update is available that improves the reliability of systems that use Intel processors". Microcode is the only medicine.

_________________
Any offers?

f0dder 09 Feb 2008, 00:02
Microcode isn't as flexible as you think, edfed... it's not like you can change any and all parts of the CPU design; that would require something like an FPGA.

itsnobody 17 Feb 2008, 11:29
I did some speed tests:
lstrlen API - 7300 milliseconds
your strlen - 4150 milliseconds
fastest strlen - 3600 milliseconds

The fastest one is from one of Agner Fog's optimization manuals ( http://www.agner.org/optimize/#manuals ):
Code:
proc strlen,pointer
        push    ebx
        mov     eax, [pointer]          ; get pointer s
        lea     edx, [eax+3]            ; pointer+3 used in the end
l1:     mov     ebx, [eax]              ; read 4 bytes of string
        add     eax, 4                  ; increment pointer
        lea     ecx, [ebx-01010101H]    ; subtract 1 from each byte
        not     ebx                     ; invert all bytes
        and     ecx, ebx                ; and these two
        and     ecx, 80808080H          ; test all sign bits
        jz      l1                      ; no zero bytes, continue loop
        mov     ebx, ecx
        shr     ebx, 16
        test    ecx, 00008080H          ; test first two bytes
        cmovz   ecx, ebx                ; shift if not in first 2 bytes
        lea     ebx, [eax+2]            ; .. and increment pointer by 2
        cmovz   eax, ebx
        add     cl, cl                  ; test first byte
        sbb     eax, edx                ; compute length
        pop     ebx
        ret
endp
    


You would think Microsoft would try to optimize their API or something; it's very slow.
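
For readers who want the idea without the register juggling: the loop in the listing above tests four bytes at once with (x - 01010101h) and (not x) and 80808080h, which is nonzero exactly when some byte of x is zero. Below is a rough C sketch of that test. It is not the code from the manual: the names has_zero_byte and strlen_swar are made up for illustration, the sketch aligns the pointer first (the listing does not), and it assumes the string sits in a buffer that may be read up to the next 4-byte boundary after the terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero exactly when at least one of the four bytes in w is 0x00.
   Same expression as the lea/not/and/and sequence in the listing. */
uint32_t has_zero_byte(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Hypothetical word-at-a-time strlen.
   Assumption: reading the aligned 4-byte word that contains the
   terminator is safe (it may touch up to 3 bytes past the terminator,
   which ISO C does not guarantee for arbitrary objects). */
size_t strlen_swar(const char *s)
{
    const char *p = s;

    /* Step byte by byte until p is 4-byte aligned. */
    while ((uintptr_t)p % 4 != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Skip whole words that contain no zero byte. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += 4;
    }

    /* The terminator is somewhere in this word; find it bytewise. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The assembly version avoids the final byte loop by locating the zero byte with cmovz and sbb; the sketch trades that for readability.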

edfed 17 Feb 2008, 11:42
Quote:
You would think Microsoft would try to optimize their API or something; it's very slow.

No, m$ only optimises its capital. They make some OSes, vi$ta is their latest soup; if you buy m$, you give them power.
And m$ doesn't want to be fast, it wants to be the king of the world.
cmovz is not supported by earlier Pentiums, so this fast strlen is meant for the latest µPs.

I was just thinking about something:
how to build the function list for the machine at boot?

f:
.null dd 0
.strlen dd strlen1
..
mov eax,[f.strlen]
call eax
..

With this, we can build a CPU-specific function list.

I don't know how it is for m$, but it's like this for Menuet, and I'll do the same for my OS and the fasmb project.
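
The function list described above can be sketched in C as a table of function pointers that is filled in once at startup. This is only an illustration under assumptions: the names func_list, funcs, build_func_list, strlen_bytewise and strlen_cmov are hypothetical, and the "CPU-specific" routine is a placeholder that reuses the fallback so the sketch stays portable.

```c
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Portable fallback: works on any CPU. */
size_t strlen_bytewise(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Placeholder for a CPU-specific routine (for example one using CMOVcc).
   It simply reuses the fallback here; a real build would plug in the
   tuned code instead. */
size_t strlen_cmov(const char *s)
{
    return strlen_bytewise(s);
}

/* The function list: one slot per routine, like ".strlen dd strlen1". */
struct func_list {
    strlen_fn str_len;
};

struct func_list funcs;

/* Called once at startup with the result of a CPU feature check. */
void build_func_list(int cpu_has_cmov)
{
    funcs.str_len = cpu_has_cmov ? strlen_cmov : strlen_bytewise;
}
```

Callers then go through the table, for example funcs.str_len("abc"), which corresponds to the mov eax,[f.strlen] / call eax pair in the post.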

revolution 17 Feb 2008, 11:43
It all depends on how long your strings are. One algorithm will not suit all situations. Short strings require algo A, medium strings require algo B, long strings require algo C.

edfed 17 Feb 2008, 11:47
And how do you know the length of the string?

revolution 17 Feb 2008, 11:50
edfed wrote:
And how do you know the length of the string?
That is something you can't know for sure. That is why MS and other standard libraries will put in an algo for what they expect to be the most common cases.

If your particular app always deals with long (or short) strings then you can tune a strlen algo to suit your situation.

itsnobody 17 Feb 2008, 18:16
edfed wrote:
cmovz is not supported by earlier Pentiums, so this fast strlen is meant for the latest µPs.


How far back does cmovz go? It should work on all x86 processors.

edfed 17 Feb 2008, 18:19
The first µP to support CMOVcc is the PII or PIII; before that, it doesn't exist and is an invalid opcode.

itsnobody 17 Feb 2008, 19:47
edfed wrote:
The first µP to support CMOVcc is the PII or PIII; before that, it doesn't exist and is an invalid opcode.


Hmm... Wikipedia says it was added with the "Pentium Pro", which came out in 1995.

mattst88 18 Feb 2008, 00:42
itsnobody wrote:
edfed wrote:
The first µP to support CMOVcc is the PII or PIII; before that, it doesn't exist and is an invalid opcode.


Hmm... Wikipedia says it was added with the "Pentium Pro", which came out in 1995.


This is correct.

_________________
My x86 Instruction Reference -- includes SSE, SSE2, SSE3, SSSE3, SSE4 instructions.
Assembly Programmer's Journal

rugxulo 19 Feb 2008, 03:03
I think it's only on some PPros, so you really need to check if CPUID is available, then check if CMOV is supported, and then you can use it.
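
The check described above (is CPUID available, then is CMOV reported) can be sketched in C. CPUID leaf 1 reports CMOV support in bit 15 of EDX. The sketch assumes GCC or Clang on x86, whose <cpuid.h> provides __get_cpuid; on any other compiler or target it conservatively answers "no". The names edx_has_cmov, cpu_has_cmov and HAVE_CPUID_H are made up for illustration.

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* CPUID leaf 1 reports CMOV support in bit 15 of EDX. */
int edx_has_cmov(uint32_t edx)
{
    return (int)((edx >> 15) & 1u);
}

/* Returns 1 if CMOVcc may be used, 0 otherwise (including "unknown"). */
int cpu_has_cmov(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx_has_cmov(edx);
#endif
    return 0;
}
```

The result of cpu_has_cmov() is the kind of flag a startup routine could use to pick between a CMOV-based strlen and a plain one.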

daniel.lewis 12 Mar 2008, 07:07
Heh, I suppose there's probably still a niche.

I stopped caring about 1995 stuff a *decade* ago. In automotive terms, you're designing something amazingly fast for those whose primary mode of transportation is by donkey.

Why wouldn't someone simply upgrade their hardware, if they truly cared one iota about performance?

Considering the volume of persons who travel by donkey, your shaving a camel-hair's width off their stonethrows per fortnight probably isn't worth your expertise?

Just a thought.

_________________
dd 0x90909090 ; problem solved.

victor 12 Mar 2008, 09:21
Off topic.

@daniel.lewis: Are you the Oscar-winning best actor Daniel Day-Lewis?

Refer to this.

daniel.lewis 12 Mar 2008, 22:58
Well no, but the bloke certainly seems eloquent and charismatic and classy enough that we could be confused.

/end ego trip

No, but we are both Welsh and probably related two to six hundred years ago.

I'm a lesser known Daniel Lewis, currently residing on a beautiful tropical island, working as a scripter for the world's biggest bank. I have been programming since the age of 12. I unfortunately don't get to see much of my beautiful island because I work 8-5. I someday hope to reside on a yacht which I have already designed. I am married, and have an exceptionally cute 1 1/2 year old daughter.

I enjoy Dilbert and other realist comics such as Denis Leary, but my sense of humor is otherwise dark and bitter.

_________________
dd 0x90909090 ; problem solved.

victor 13 Mar 2008, 01:20
Quote:
Dilbert

A cartoon computer worker drawn by Scott Adams, who works in Silicon Valley. The cartoon became so popular he left his day job. The cartoon satirises typical corporate life, especially that which revolves around computers.

Another comics fan!

daniel.lewis 13 Mar 2008, 04:57
In my life, the only way I manage to stay marginally sane is to laugh at all the stupidity and irrationality caused by dumb people.

How marginal is an exercise left up to each of you to decide.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
