flat assembler
Message board for the users of flat assembler.

What is faster? calling lstrcat or using only mov and add?

OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY
What is faster?

Code:
invoke lstrcpy, buffer, 'This'
invoke lstrcat, buffer, ' is '
invoke lstrcat, buffer, 'a te'
invoke lstrcat, buffer, 'st!'
    


or

Code:
mov edi,buffer
mov dword[edi],'This'
add edi,4
mov dword[edi],' is '
add edi,4
mov dword[edi],'a te'
add edi,4
mov dword[edi],'st!'
    


Of course this is a dumb example. But I think the second one is faster, right?
Post 01 Feb 2008, 02:19
System86



Joined: 15 Aug 2007
Posts: 77
System86
The second one, because it doesn't have the overhead of a Win32 function call. In general, using your own assembly code is faster than using the standard library.
Post 01 Feb 2008, 02:25
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Why the "add edi, 4" when you can add a static offset? :)

The question is, of course, "it depends". If you're concatenating static strings, why would you use lstrcat anyway? You know the string size(s) and can use memcpy instead.

And of course the same-old applies: unless you have very specific needs, speed doesn't really matter.
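
As a rough sketch of the static-offset idea mentioned above (assuming `buffer` is a writable area of at least 16 bytes, as in the original example), the pointer arithmetic can be folded into the addressing:

```asm
; Sketch only: 'buffer' is assumed to be at least 16 bytes of writable data.
mov dword [buffer],    'This'
mov dword [buffer+4],  ' is '
mov dword [buffer+8],  'a te'
mov dword [buffer+12], 'st!'   ; 3 characters: fasm zero-extends the constant,
                               ; so the high byte doubles as the terminating 0
```

No register is needed to hold the pointer, and each store is independent of the others.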
Post 01 Feb 2008, 11:04
calpol2004



Joined: 16 Dec 2004
Posts: 110
calpol2004
Quote:

invoke lstrcpy, buffer, 'This'
invoke lstrcat, buffer, ' is '
invoke lstrcat, buffer, 'a te'
invoke lstrcat, buffer, 'st!'


invoke is just a macro. Before fasm assembles your code, it turns those invokes into a series of push instructions followed by a call to the procedure. Even pushing all those values onto the stack takes time comparable to the second method, and Windows functions aren't exactly 100% efficient :?
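
For illustration, here is roughly what a stdcall `invoke` boils down to. This sketch assumes the string lives at a label (`str_is` is a made-up name) rather than being passed as a literal, and that `lstrcat` is imported in the usual way:

```asm
; invoke lstrcat, buffer, str_is   expands to approximately:
push str_is          ; last argument is pushed first (stdcall)
push buffer          ; first argument is pushed last
call [lstrcat]       ; indirect call through the import table;
                     ; the callee removes both arguments from the stack

; hypothetical data for the sketch (belongs in a data section)
str_is db ' is ',0
```

So each invoke in the first version costs two pushes and an indirect call before lstrcat has done any work at all.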
Post 01 Feb 2008, 11:56
r22



Joined: 27 Dec 2004
Posts: 805
r22
re: calpol
I've looked at the Win XP 64bit kernel functions and have found that 99% of them are very well optimized. Whether this holds true for the 32bit kernel as well, I can't be sure.
But using the fact that Windows is a bloated OS to justify inferring that its library functions are inefficient is a fallacy.

Decompile the Win XP/2003 64-bit system DLLs: the RtlMoveMemory function was especially impressive to me, because of the sheer amount of optimization for almost all cases (doing a better job without inflating the size of the function would be close to impossible).

re: OzzY
You can easily create a simple benchmarking program to answer these questions for yourself in the future. Use GetTickCount or RDTSC on both ends of a loop with your test code in the middle and see how long each version of the test code takes to run. Make the program a console application and use msvcrt.dll's printf function to output the resulting time and you're done. NOTE: if you use two loops, make sure the labels are similarly aligned.
Code:
call [GetTickCount] ;used for simplicity of example RDTSC is better
push eax
mov ecx,7FFFFFFh ;number of iterations
call test1
call [GetTickCount]
pop edx
sub eax,edx
cinvoke printf, <">test1 %d">, eax
call [GetTickCount] ;used for simplicity of example RDTSC is better
push eax
mov ecx,7FFFFFFh
call test2
call [GetTickCount]
pop edx
sub eax,edx
cinvoke printf, <">test2 %d">, eax
ret 0

;; align
align 16
;;
test1:
;;;;;test code 1
dec ecx
jnz test1
ret 0

;; align
align 16
;;
test2:
;;;;;test code 2
dec ecx
jnz test2
ret 0
    
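
Since RDTSC is named above as the better timer, here is a hedged sketch of the same measurement using it. `start_lo` and `start_hi` are hypothetical dword variables; `test1` is the routine from the example. RDTSC counts clock ticks rather than milliseconds, and it is not a serializing instruction, so very short test bodies may need extra care (for example, executing CPUID first).

```asm
rdtsc                    ; edx:eax = time-stamp counter
mov [start_lo], eax
mov [start_hi], edx
mov ecx, 7FFFFFFh        ; number of iterations
call test1
rdtsc
sub eax, [start_lo]      ; 64-bit subtraction: low half...
sbb edx, [start_hi]      ; ...then high half with borrow
; edx:eax now holds the elapsed tick count

; hypothetical variables (belong in a writable data section)
start_lo dd ?
start_hi dd ?
```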
Post 01 Feb 2008, 16:20
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
32-bit windows versions (at least pre-vista, haven't looked at that) aren't something to write home about - but again, unless you have specific Needs For Speed, the routines will be just fine.
Post 01 Feb 2008, 16:26
System86



Joined: 15 Aug 2007
Posts: 77
System86
Still, there is the overhead of pushing the values on the stack, calling the Win32 function, having the function (optimized or not) do the work, and returning back to your program. The second example has just the actual work being done and avoids the other overhead.
Post 01 Feb 2008, 18:39
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
System86 wrote:
Still, there is the overhead of pushing the values on the stack, calling the Win32 function, having the function (optimized or not) do the work, and returning back to your program. The second example has just the actual work being done and avoids the other overhead.


I think the "push/call" tandem is optimized in the cpu (Pentium or newer), so it's not inherently that slow.
Post 01 Feb 2008, 20:33
Raedwulf



Joined: 13 Jul 2005
Posts: 375
Location: United Kingdom
Raedwulf
But still slower!
Post 02 Feb 2008, 08:18
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
Wow, I just did a speed test: using direct instructions instead of the Windows API comes out around 30 times faster on my Vista notebook.

Comes up to be around 100 milliseconds using the direct instructions and around 3100 milliseconds using the Windows API (of course I had to loop it around 10 million times to see any time difference)

The Windows Developers really have to optimize their code...even the one call ( invoke lstrcpy, buffer, 'This is a test!') comes up to be 6 times slower than all the direct mov and add instructions


Last edited by itsnobody on 03 Feb 2008, 14:22; edited 1 time in total
Post 02 Feb 2008, 11:49
Goplat



Joined: 15 Sep 2006
Posts: 181
Goplat
It's not Microsoft's fault that lstrcpy is a lot slower than direct movs. It couldn't be any other way. lstrcpy has to be slow because:
  • It's a function so it has call/ret overhead (and since it's in a DLL, that's an indirect call)
  • It must be able to work for any size string
  • It must be able to work for any string contents
  • It uses an exception handler to trap and ignore page faults (stupid, but this behavior can't be changed now; it's documented so there are almost certainly programs out there that rely on it).
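
To make the second and third points concrete: a general-purpose copy cannot know the length ahead of time, so it has to inspect every byte. A minimal sketch of such a loop follows. It is not the actual lstrcpy code, and `my_strcpy` is a hypothetical stdcall routine:

```asm
; my_strcpy(dest, src) - hypothetical, stdcall, byte-at-a-time
my_strcpy:
    mov edx, [esp+4]     ; dest
    mov ecx, [esp+8]     ; src
.next:
    mov al, [ecx]        ; load one byte...
    mov [edx], al        ; ...store it...
    inc ecx
    inc edx
    test al, al          ; ...and check for the terminator every time
    jnz .next
    mov eax, [esp+4]     ; return dest, as strcpy-style routines usually do
    ret 8
```

The four dword stores from the first post skip all of this because the length and contents are known at assembly time.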
Post 03 Feb 2008, 03:06
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
Goplat wrote:
It's not Microsoft's fault that lstrcpy is a lot slower than direct movs. It couldn't be any other way. lstrcpy has to be slow because:
  • It's a function so it has call/ret overhead (and since it's in a DLL, that's an indirect call)
  • It must be able to work for any size string
  • It must be able to work for any string contents
  • It uses an exception handler to trap and ignore page faults (stupid, but this behavior can't be changed now; it's documented so there are almost certainly programs out there that rely on it).


That's true

But what I'm wondering is whether your own custom strcpy and strcat functions would be faster than the supposedly optimized Windows API
Post 03 Feb 2008, 17:12
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
Hmm

I just quickly made (what I thought was) a slow strlen function, and even it benchmarks around 40% faster than the Windows API lstrlen function
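
The function itself was not posted. For reference, a deliberately naive byte-scanning strlen might look like the sketch below (`my_strlen` is a hypothetical name; stdcall with one pointer argument is an assumption):

```asm
; my_strlen(str) - hypothetical, stdcall, returns the length in eax
my_strlen:
    mov eax, [esp+4]     ; eax = pointer to the string
.scan:
    cmp byte [eax], 0    ; reached the terminator?
    je .done
    inc eax
    jmp .scan
.done:
    sub eax, [esp+4]     ; length = end pointer - start pointer
    ret 4
```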
Post 03 Feb 2008, 19:00
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
itsnobody wrote:
Comes up to be around 100 milliseconds using the direct instructions and around 3100 milliseconds using the Windows API (of course I had to loop it around 10 million times to see any time difference)

I've just highlighted the main point of this whole thing. Think about it for a while.

itsnobody wrote:
The Windows Developers really have to optimize their code...even the one call ( invoke lstrcpy, buffer, 'This is a test!') comes up to be 6 times slower than all the direct mov and add instructions

And why do they have to do that, keeping the highlighted text from above in mind? :)

Yes, you can gain some real-world measurable speed if you're doing a lot of string manipulation or working with extremely large strings. But if you aren't, you can't, and you're wasting your time.

Post 03 Feb 2008, 23:35
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
f0dder wrote:
itsnobody wrote:
Comes up to be around 100 milliseconds using the direct instructions and around 3100 milliseconds using the Windows API (of course I had to loop it around 10 million times to see any time difference)

I've just highlighted the main point of this whole thing. Think about it for a while.

Right....if you do it just once you won't see any difference obviously, because it happens too fast

This is true for virtually any and everything

Quote:

itsnobody wrote:
The Windows Developers really have to optimize their code...even the one call ( invoke lstrcpy, buffer, 'This is a test!') comes up to be 6 times slower than all the direct mov and add instructions

And why do they have to do that, keeping the highlighted text from above in mind? :)

Yes, you can gain some real-world measurable speed if you're doing a lot of string manipulation or working with extremely large strings. But if you aren't, you can't, and you're wasting your time.


So then why use Assembly? Why not use Java or Visual Basic if speed doesn't matter?

Of course it matters a lot more than people think and is VERY significant. All of that time adds up: right now, for instance, there are lots of programs running that constantly call lstrlen and other functions. But if you're writing a program and only use it once or twice, you won't see much difference

For people who want the very fastest and for people interested in optimizing code it does matter
Post 04 Feb 2008, 03:59
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
itsnobody wrote:
So then why use Assembly? Why not use Java or Visual Basic if speed doesn't matter?

Indeed, why not? :)

If you're writing a database frontend, you might as well write it in VB and save yourself a lot of trouble. Assembly wouldn't gain you anything, except perhaps personal satisfaction and wanking rights.

itsnobody wrote:
Of course it matters a lot more than people think and is VERY significant. All of that time adds up: right now, for instance, there are lots of programs running that constantly call lstrlen and other functions. But if you're writing a program and only use it once or twice, you won't see much difference

And for basically all those apps that "that constantly call lstrlen and other functions", you wouldn't be able to measure any improvement. Now if we're talking about very specific applications, like parsing several gigabytes of httpd logs, the situation is entirely different.

itsnobody wrote:
For people who want the very fastest and for people interested in optimizing code it does matter

Then it matters for wanking rights, and not much else.

Optimizing strlen, strcat (etc.) is focusing on a wrong spot, anyway. Why "push string; call strlen" when you can use smarter strings and "mov eax, [string.length]"?
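
One way to read the "smarter strings" suggestion is to store the length next to the text. The layout below is only a sketch; the label names (`msg`, `.len`, `.chars`, `.stop`) are made up for illustration:

```asm
; hypothetical layout: a dword length stored in front of the characters
msg:
  .len   dd .stop - .chars      ; computed by the assembler
  .chars db 'This is a test!'
  .stop  db 0                   ; terminator kept for API compatibility

; getting the length is then a single load, with no scan and no call:
mov eax, [msg.len]              ; eax = 15
```

For strings that change at run time, whatever code modifies the text also has to keep the length field up to date.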

Post 04 Feb 2008, 11:59
calpol2004



Joined: 16 Dec 2004
Posts: 110
calpol2004
r22 wrote:
re: calpol
I've looked at the Win XP 64bit kernel functions and have found that 99% of them are very well optimized. Whether this holds true for the 32bit kernel as well, I can't be sure.
But using the fact that Windows is a bloated OS to justify inferring that its library functions are inefficient is a fallacy.

Decompile the Win XP/2003 64-bit system DLLs: the RtlMoveMemory function was especially impressive to me, because of the sheer amount of optimization for almost all cases (doing a better job without inflating the size of the function would be close to impossible).


Calm down. I only said they weren't 100% efficient, I didn't say they were horrible. I doubt they used assembly, so there are probably some very minor inefficiencies, but that's a matter of a dozen or so CPU cycles.

Most functions have to deal with a large range of inputs and have to parse/interpret them, so a home-brewed specialized routine would be faster even if the libraries are especially good. The main reason I was arguing against using the library (in this case) unless you need to is that you have to push the arguments onto the stack (64-bit Windows uses fastcall now?) and jump to the routine. For the given example, which could be executed in a few hundred cycles, a call to the API could easily halve the speed.
Post 04 Feb 2008, 14:14
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
f0dder wrote:
itsnobody wrote:
So then why use Assembly? Why not use Java or Visual Basic if speed doesn't matter?

Indeed, why not? :)

If you're writing a database frontend, you might as well write it in VB and save yourself a lot of trouble. Assembly wouldn't gain you anything, except perhaps personal satisfaction and wanking rights.

Well that's true for some simple apps or apps that don't require any real speed

But I think that's the problem with programmers, they don't care about speed anymore

Quote:

itsnobody wrote:
Of course it matters a lot more than people think and is VERY significant. All of that time adds up: right now, for instance, there are lots of programs running that constantly call lstrlen and other functions. But if you're writing a program and only use it once or twice, you won't see much difference

And for basically all those apps that "that constantly call lstrlen and other functions", you wouldn't be able to measure any improvement. Now if we're talking about very specific applications, like parsing several gigabytes of httpd logs, the situation is entirely different.

itsnobody wrote:
For people who want the very fastest and for people interested in optimizing code it does matter

Then it matters for wanking rights, and not much else.

Optimizing strlen, strcat (etc.) is focusing on a wrong spot, anyway. Why "push string; call strlen" when you can use smarter strings and "mov eax, [string.length]"?

Well lots of applications right now running are constantly calling some type of strlen function, like for instance a web browser or text editor or many other applications...all that wasted time adds up from milliseconds to seconds to lag...it would be noticeable if all applications used the fastest strlen and string functions...if all applications used the fastest super-optimized functions it would be a lot faster
Post 04 Feb 2008, 17:04
edfed



Joined: 20 Feb 2006
Posts: 4242
Location: 2018
edfed
The fastest is to do nothing.
Post 04 Feb 2008, 17:41
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
itsnobody wrote:
f0dder wrote:
itsnobody wrote:
So then why use Assembly? Why not use Java or Visual Basic if speed doesn't matter?

Indeed, why not? :)

If you're writing a database frontend, you might as well write it in VB and save yourself a lot of trouble. Assembly wouldn't gain you anything, except perhaps personal satisfaction and wanking rights.

Well that's true for some simple apps or apps that don't require any real speed


Which is most applications that most regular users will be using.

itsnobody wrote:
But I think that's the problem with programmers, they don't care about speed anymore


A lot of people don't have it as their prime design criterion. Personally I value correctness/robustness and ease-of-use over speed, for most stuff.

itsnobody wrote:
Well lots of applications right now running are constantly calling some type of strlen function, like for instance a web browser or text editor or many other applications...


Do they, now? Where's your data to back up this claim? :)

itsnobody wrote:
all that wasted time adds up from milliseconds to seconds to lag...it would be noticeable if all applications used the fastest strlen and string functions...


Oh really?

In the real world, we profile applications to detect bottlenecks, so we know where to put our optimization efforts.

itsnobody wrote:
if all applications used the fastest super-optimized functions it would be a lot faster


Not really.

First of all, you should stay away from most of the libc str* functions, as they are inherently unsafe. Next, one size doesn't fit all, and if you try to cover all cases, you end up with franken-functions with lots of (slow) branches.

Post 05 Feb 2008, 01:51

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.
