flat assembler
Message board for the users of flat assembler.
Null termination?
revolution 30 Mar 2009, 16:51
Because they are easily scanned, 'just keep loading bytes until you find a zero' is easy to code and easy to understand. Also, if you come into a comms channel halfway through a string, you can synchronise by waiting for the next zero and then start processing future information.
I think the use is now mostly traditional. Old habits die hard.
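The resynchronisation idea above can be sketched in C roughly as follows. This is only an illustration, not code from the thread: the helper name `resync_offset` and the idea of working on a captured buffer are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: buf holds bytes captured from the middle of a
   stream of zero-terminated messages. Returns the index of the first
   byte of the next complete message, or len if no terminator has
   arrived yet. */
size_t resync_offset(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0)
            return i + 1;   /* the byte after the zero starts a new message */
    }
    return len;             /* no zero seen: keep waiting for more data */
}
```

Joining a stream at `"llo\0next\0"` would give an offset of 4, which is where `next` begins. A length-prefixed stream offers no such marker, since any byte could be data or a length.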
Azu 30 Mar 2009, 16:56
revolution wrote: Because they are easily scanned, 'just keep loading bytes until you find a zero' is easy to code and easy to understand.
    mov ecx,[esi]
    rep movsd
Easier than
    @@: mov eax,[esi]
        test eax,eax
        jz @f
        movsd
        jmp @b
    @@:
???
revolution wrote: Also, if you come into a comms channel halfway through a string, you can synchronise by waiting for the next zero and then start processing future information.
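For readers who prefer C, the two copy styles being compared might look something like this. It is a sketch under assumptions, not a translation of the snippets above: the struct `lp_string`, its fixed 64-byte capacity, and the function names are invented for the example, and the length here counts bytes rather than dwords.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout for the example: a 32-bit byte count, then the data. */
struct lp_string {
    uint32_t len;
    char     data[64];
};

/* Length-prefixed copy: the count is known up front, so one block copy. */
void lp_copy(struct lp_string *dst, const struct lp_string *src)
{
    assert(src->len <= sizeof dst->data);
    dst->len = src->len;
    memcpy(dst->data, src->data, src->len);
}

/* Null-terminated copy: every byte is tested on the way through.
   Returns the number of characters copied, not counting the terminator. */
size_t z_copy(char *dst, const char *src)
{
    size_t n = 0;
    while ((dst[n] = src[n]) != '\0')
        n++;
    return n;
}
```

Whether the block copy is actually faster in a given program depends on string sizes and on how the compiler or library implements `memcpy`, so treat this as a way to see the structural difference only.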
buzzkill 30 Mar 2009, 16:59
Personally, I really prefer C-strings (null-terminated) over Pascal-strings (length byte). For one thing, you're not limited to an arbitrary number of characters, and I also think they're faster to work with, i.e. "just go on until you hit a 0" is easier than "read 1 byte, then read that many following bytes" (the latter would require an extra helper variable).
The Linux write syscall, for instance, doesn't use null-terminated strings, so one of the first things I do is create a wrapper around it that does. Also, it's handy (if you're on a *nix platform) that you can interface with libc, which does of course use C-strings (and as far as I know, so does most library code).
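A wrapper of the kind described could be as small as the sketch below. The name `write_cstr` is made up for the example; `write` and `strlen` are the usual POSIX and C library calls. Note that it makes a single `write` call and does not retry on a short write.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: write() wants an explicit byte count, so the
   length of the C-string is measured first and passed along.
   Returns whatever write() returns (bytes written, or -1 on error). */
ssize_t write_cstr(int fd, const char *s)
{
    assert(s != NULL);
    return write(fd, s, strlen(s));
}
```

A call such as `write_cstr(1, "hello\n")` would send the text to standard output. The `strlen` call is exactly the extra scan the length-prefix camp in this thread objects to.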
Azu 30 Mar 2009, 17:04
buzzkill wrote: Personally, I really prefer C-strings (null-terminated) over Pascal-strings (length byte). For one thing, you're not limited to any arbitrary amount of characters
buzzkill wrote: , and also I think they're faster to work with, ie. "just go on until you hit a 0" is easier than "read 1 byte, then read that many following bytes" (the latter would require an extra helper variable).
buzzkill wrote: Also, it's handy (if you're on a *nix platform) that you can interface with libc, which does of course use C-strings (and as far as I know, so does most (library)code).
Azu wrote: (besides interacting with other code that uses them for no reason)
revolution 30 Mar 2009, 17:04
Azu wrote:
Azu wrote:
Azu 30 Mar 2009, 17:07
revolution wrote:
It seems to me that, in every case I could think of, it's easier/faster/smaller to write code that uses length-prefixed strings rather than null-terminated strings. I think I must be missing something, or else C wouldn't have been made to use null termination.
revolution 30 Mar 2009, 17:08
Azu wrote: Isn't it slower since you have to keep iterating through the whole string just to find how long it is?
Azu 30 Mar 2009, 17:10
revolution wrote:
revolution 30 Mar 2009, 17:10
Probably C started from the old old processors where registers were a limited resource. The extra loop counter may have been a problem? I don't expect anyone really knows, but it doesn't matter: if you write your own code, you can do what suits you best.
Azu 30 Mar 2009, 17:12
revolution wrote: I don't expect anyone really knows but it doesn't matter if you write your own code, you can do what suits you best.
revolution wrote: Probably C started from the old old processor where registers were a limited resource. The extra loop counter may have been a problem?
revolution 30 Mar 2009, 17:16
Performance metrics are very application specific; I doubt that you can find any sensible advice that works in everyone's situation.
What is your usage model? How many strings are you dealing with each microsecond in your application? What percentage of processor time will be spent dealing with strings? What string operations do you do mostly? These are the sort of questions you need to ask yourself in order to decide what the best solution will be.
Azu 30 Mar 2009, 17:18
revolution wrote: Performance metrics are very application specific, I doubt that you can find any sensible advice that can work in everyone's situation.
Which?
P.S. the operations would mainly be scanning, copying, or combinations of the two.
buzzkill 30 Mar 2009, 17:19
It may also matter whether you want your strings mutable or immutable (like some newer HLLs). If you, e.g., add to your string, you have to go back to the length byte/word and change that. With a null-terminated string, your algorithms/functions don't change; it's still "keep going until 0".
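To make the append case concrete, here is a rough C sketch of both variants. The layouts and names are assumptions for illustration: `pstr_append_char` treats byte 0 as the length with up to 255 characters after it, and `cstr_append_char` takes the buffer capacity as an extra parameter.

```c
#include <assert.h>
#include <string.h>

/* Length-byte string: p[0] is the length, p[1..] the characters.
   The caller is assumed to own a 256-byte buffer. */
void pstr_append_char(unsigned char *p, char c)
{
    assert(p[0] < 255);
    p[1 + p[0]] = (unsigned char)c;   /* write behind the existing characters */
    p[0]++;                           /* then go back and update the length */
}

/* Null-terminated string in a buffer of cap bytes. */
void cstr_append_char(char *s, size_t cap, char c)
{
    size_t n = strlen(s);             /* walk to the terminator first */
    assert(n + 1 < cap);
    s[n] = c;
    s[n + 1] = '\0';                  /* and put a new terminator after it */
}
```

The length-byte version touches two places but never scans; the null-terminated version writes only at the end but has to find the end first.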
revolution 30 Mar 2009, 17:20
Azu wrote: Which?
Azu 30 Mar 2009, 17:20
buzzkill wrote: It may also matter whether you want your strings mutable or immutable (like some newer HLLs). If you eg add to your string, you have to go back to the length byte/word and change that. With a null-terminated string, your algorithms/functions don't change, it's still "keep going until 0".
revolution wrote:
buzzkill 30 Mar 2009, 17:23
This just occurred to me: performance-wise, if you calculate the length of a C-string once, the entire string will be in your (L1) cache, so future operations on the string could be sped up. Is this a real advantage, or am I just onto nothing here?
Azu 30 Mar 2009, 17:24
buzzkill wrote: This just occurred to me: performance-wise, if you calculate the length of a C-string once, the entire string will be in your (L1) cache, so future operations on the string could be sped up. Is this a real advantage, or am I just onto nothing here?
And if the L1 cache is too valuable to store a length in, isn't it too valuable to store a string in, anyway?
buzzkill 30 Mar 2009, 17:32
Quote:
No, with Pascal-strings you have to write in two places: behind the original string, and before it to the length byte. With C-strings you only write the part behind the original string. Now, with Pascal-strings you could build in more safety when accessing the string like an array, because you could check the subscript against the length byte (i.e., string[10] is allowed only if length(string) >= 10 if you count your subscripts from 1, which is, I believe, the Pascal way). But since C has always been about speed and control, and not so much safety, I would guess that the C inventors chose the null-terminated string for those reasons (though I have no literature at hand to back that up).
Azu 30 Mar 2009, 17:35
buzzkill wrote:
I meant writing the length of the string to the beginning, instead of writing a null to the end and not being able to use nulls in the string. I think it would make comparisons, scanning, and copying faster. I just want to know if I'm missing anything (besides it taking a register).
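Two of the points in this post, nulls inside the string and cheaper comparisons, can be illustrated with a small C sketch. The struct `lstr` (an explicit length plus a pointer) and the function `lstr_equal` are invented for the example and are just one possible way to carry the length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed representation: explicit byte count plus pointer to the data. */
struct lstr {
    size_t      len;
    const char *data;
};

/* Equality with lengths known up front: strings of different length are
   rejected without looking at a single character, and embedded zero
   bytes are compared like any other byte. */
bool lstr_equal(struct lstr a, struct lstr b)
{
    assert(a.data != NULL && b.data != NULL);
    if (a.len != b.len)
        return false;
    return memcmp(a.data, b.data, a.len) == 0;
}
```

For the three-byte strings `"a\0b"` and `"a\0c"`, `lstr_equal` reports a difference, while `strcmp` stops at the first zero and treats them as equal.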
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.