flat assembler
Message board for the users of flat assembler.

Null termination?

Azu



Joined: 16 Dec 2008
Posts: 1160
Are there ANY scenarios (besides interacting with other code that uses them for no reason) where null-terminated strings are better than prepended length strings?

The only advantage I can think of is saving a byte on strings over 255 bytes long. But unless you have lots of these, the smaller amount of code needed for prepended length strings easily compensates.

There must be SOME reason that popular HLLs like C use them, right??
Post 30 Mar 2009, 16:43
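To make the two layouts in the question concrete, here is a minimal C sketch. The helper names `lp_len` and `c_len` are hypothetical, and the 4-byte native-endian length prefix is an assumption for illustration (the thread does not fix a prefix size). It only shows the difference in how the length is obtained: one fixed-size load versus a scan over every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: a 4-byte count in native byte order, then that many bytes. */

/* Length of a length-prefixed string: one 4-byte load, independent of size. */
uint32_t lp_len(const void *s) {
    uint32_t n;
    assert(s != NULL);
    memcpy(&n, s, sizeof n);   /* memcpy avoids any alignment requirement */
    return n;
}

/* Length of a null-terminated string: must touch every byte up to the 0. */
size_t c_len(const char *s) {
    const char *p = s;
    assert(s != NULL);
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}
```

Whether that difference matters in practice depends on how often the length is actually needed, which is where the rest of the thread goes.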
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17254
Location: In your JS exploiting you and your system
Because they are easily scanned, 'just keep loading bytes until you find a zero' is easy to code and easy to understand. Also, if you come into a comms channel halfway through a string, you can synchronise by waiting for the next zero and then start processing future information.

I think the use is now mostly traditional. Old habits die hard.
Post 30 Mar 2009, 16:51
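The resynchronisation idea can be sketched in a few lines of C. This is only an illustration under assumptions: `resync_offset` is a hypothetical helper, and the "channel" is modelled as a byte buffer captured from somewhere in the middle of a stream of null-terminated strings.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first complete string in buf, i.e. the byte
   just after the first 0. Returns len if no terminator was captured. */
size_t resync_offset(const unsigned char *buf, size_t len) {
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len && buf[i] != 0) {
        i++;                          /* discard the partial string */
    }
    return (i < len) ? i + 1 : len;   /* step past the terminator */
}
```

Everything from the returned offset onward starts on a string boundary. A length-prefixed stream has no such marker unless one is added, because a length byte is indistinguishable from data when you join mid-string.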
Azu



revolution wrote:
Because they are easily scanned, 'just keep loading bytes until you find a zero' is easy to code and easy to understand.
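Isn't

mov ecx,[esi]   ; ecx = length prefix, counted in dwords
add esi,4       ; step past the prefix to the data
rep movsd       ; copy ecx dwords from [esi] to [edi]


easier than

@@:
mov eax,[esi]   ; peek at the next dword
test eax,eax    ; a zero dword is the terminator
jz @f
movsd           ; copy it and advance esi and edi
jmp @b
@@:

???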



revolution wrote:
Also, if you come into a comms channel halfway through a string, you can synchronise by waiting for the next zero and then start processing future information.
You could just waste one more byte and put a null in front of the length..
Post 30 Mar 2009, 16:56
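The same comparison can be written in C for readers who do not follow the assembly. This is a sketch under assumptions: `lp_copy` and `c_copy` are hypothetical names, the length prefix is taken to be a 4-byte byte count in native byte order, and the caller is assumed to supply a destination that is large enough.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a length-prefixed string (4-byte count, then bytes).
   Returns the number of bytes written, prefix included. */
size_t lp_copy(void *dst, const void *src) {
    uint32_t n;
    assert(dst != NULL && src != NULL);
    memcpy(&n, src, sizeof n);        /* read the count once */
    memcpy(dst, src, sizeof n + n);   /* prefix and payload as one block */
    return sizeof n + n;
}

/* Copy a null-terminated string one byte at a time.
   Returns the number of bytes written, terminator included. */
size_t c_copy(char *dst, const char *src) {
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    do {
        dst[i] = src[i];              /* copy first, then test for the 0 */
    } while (src[i++] != '\0');
    return i;
}
```

With the count known up front the copy is a single block move; without it every byte has to be inspected as it is copied.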
buzzkill



Joined: 15 Mar 2009
Posts: 111
Location: the nether lands
Personally, I really prefer C-strings (null-terminated) over Pascal-strings (length byte). For one thing, you're not limited to any arbitrary amount of characters, and also I think they're faster to work with, i.e. "just go on until you hit a 0" is easier than "read 1 byte, then read that many following bytes" (the latter would require an extra helper variable).
The Linux write syscall for instance doesn't use null-terminated strings, and so one of the first things I do is create a wrapper around it that does... Also, it's handy (if you're on a *nix platform) that you can interface with libc, which does of course use C-strings (and as far as I know, so does most (library)code).
Post 30 Mar 2009, 16:59
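A wrapper of the kind buzzkill describes might look like the following in C on a POSIX system. The name `write_cstr` is hypothetical; `write` and `strlen` are the standard calls. Note that the wrapper has to measure the string first, because the syscall itself wants an explicit byte count.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write a null-terminated string to fd. Returns whatever write() returns:
   the number of bytes written, or -1 on error. */
ssize_t write_cstr(int fd, const char *s) {
    assert(s != NULL);
    return write(fd, s, strlen(s));   /* the terminator itself is not sent */
}
```

For example, `write_cstr(1, "hello\n")` sends six bytes to standard output. As with any call to `write`, a short count is possible, so robust code would loop until everything is sent.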
Azu



buzzkill wrote:
Personally, I really prefer C-strings (null-terminated) over Pascal-strings (length byte). For one thing, you're not limited to any arbitrary amount of characters
You could just reserve 1 bit of the length byte to indicate whether another length byte follows; then there is no limit on size.


buzzkill wrote:
, and also I think they're faster to work with, i.e. "just go on until you hit a 0" is easier than "read 1 byte, then read that many following bytes" (the latter would require an extra helper variable).
The Linux write syscall for instance doesn't use null-terminated strings, and so one of the first things I do is create a wrapper around it that does...
Isn't it slower since you have to keep iterating through the whole string just to find how long it is?

buzzkill wrote:
Also, it's handy (if you're on a *nix platform) that you can interface with libc, which does of course use C-strings (and as far as I know, so does most (library)code).
Azu wrote:
(besides interacting with other code that uses them for no reason)


Last edited by Azu on 30 Mar 2009, 17:04; edited 1 time in total
Post 30 Mar 2009, 17:04
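The "reserve 1 bit for another length byte" idea is essentially the scheme used by LEB128-style variable-length integers. A minimal C sketch follows; `len_encode` and `len_decode` are hypothetical names, and the choice of least-significant group first is an assumption, since the post does not specify an order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode len with 7 payload bits per byte; a set top bit means
   "another length byte follows". Returns bytes written (1 to 10). */
size_t len_encode(uint64_t len, unsigned char *out) {
    size_t n = 0;
    assert(out != NULL);
    do {
        unsigned char b = (unsigned char)(len & 0x7F);
        len >>= 7;
        if (len != 0) {
            b |= 0x80;               /* continuation bit */
        }
        out[n++] = b;
    } while (len != 0);
    return n;
}

/* Decode a length written by len_encode. Stores it in *len and returns
   the number of bytes consumed. Input is assumed to be well formed. */
size_t len_decode(const unsigned char *in, uint64_t *len) {
    size_t n = 0;
    unsigned shift = 0;
    uint64_t v = 0;
    assert(in != NULL && len != NULL);
    for (;;) {
        unsigned char b = in[n++];
        v |= (uint64_t)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            break;                   /* last length byte */
        }
        shift += 7;
        assert(shift < 64);          /* more than 10 bytes is malformed */
    }
    *len = v;
    return n;
}
```

Strings up to 127 bytes still cost a single length byte, the same overhead as a terminator, and longer strings pay one extra byte per additional 7 bits of length.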
revolution
Azu wrote:
revolution wrote:
Because they are easily scanned, 'just keep loading bytes until you find a zero' is easy to code and easy to understand.
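Isn't

mov ecx,[esi]   ; ecx = length prefix, counted in dwords
add esi,4       ; step past the prefix to the data
rep movsd       ; copy ecx dwords from [esi] to [edi]


easier than

@@:
mov eax,[esi]   ; peek at the next dword
test eax,eax    ; a zero dword is the terminator
jz @f
movsd           ; copy it and advance esi and edi
jmp @b
@@:

???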
But you have to remember that most C functions used are scan-type functions like printf, so the scanning is made simpler without the need for a loop counter. And things like movsd are completely nonsensical to C programmers.
Azu wrote:
revolution wrote:
Also, if you come into a comms channel halfway through a string, you can synchronise by waiting for the next zero and then start processing future information.
You could just waste one more byte and put a null in front of the length..
Sure, there are many ways to do that, but if the null byte serves double duty then a lot of programmers of old would have preferred it.
Post 30 Mar 2009, 17:04
Azu



revolution wrote:
Azu wrote:
revolution wrote:
Because they are easily scanned, 'just keep loading bytes until you find a zero' is easy to code and easy to understand.
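Isn't

mov ecx,[esi]   ; ecx = length prefix, counted in dwords
add esi,4       ; step past the prefix to the data
rep movsd       ; copy ecx dwords from [esi] to [edi]


easier than

@@:
mov eax,[esi]   ; peek at the next dword
test eax,eax    ; a zero dword is the terminator
jz @f
movsd           ; copy it and advance esi and edi
jmp @b
@@:

???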
But you have to remember that most C functions used are scan-type functions like printf, so the scanning is made simpler without the need for a loop counter. And things like movsd are completely nonsensical to C programmers.
It was just an example, and I meant in ASM. I only used C as an example of a popular language that uses null termination, which is why I think there must be a reason for it.

It seems to me that it's easier/faster/smaller to write code that uses prepended length strings rather than null-terminated strings, in all the cases I could think of.
I think that I am missing something or else C wouldn't have been made to use null termination.
Post 30 Mar 2009, 17:07
revolution
Azu wrote:
Isn't it slower since you have to keep iterating through the whole string just to find how long it is?
Yes, but it depends upon your usage model. If you are doing many string length calls then a length prefix may be best in your case. If you do mostly scanning and an extra register to hold the length becomes a problem, then null termination may be best. But what about a hybrid approach: have both a length prefix and a null terminator?
Post 30 Mar 2009, 17:08
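One possible shape for the hybrid approach, sketched in C. The `hybrid_string` type and `hybrid_new` function are hypothetical, and the 4-byte length field is an assumption. The stored bytes are followed by a 0, so the data can be handed to code that scans for a terminator, while the length is also available without scanning.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hybrid string: explicit length, the bytes, then a 0 terminator. */
typedef struct {
    uint32_t len;    /* number of bytes, terminator not counted */
    char data[];     /* len bytes followed by '\0' (C99 flexible array) */
} hybrid_string;

/* Build a hybrid string from a C string. Returns NULL if allocation
   fails. The caller releases the result with free(). */
hybrid_string *hybrid_new(const char *src) {
    size_t n;
    hybrid_string *h;
    assert(src != NULL);
    n = strlen(src);
    assert(n <= UINT32_MAX);
    h = malloc(sizeof *h + n + 1);
    if (h == NULL) {
        return NULL;
    }
    h->len = (uint32_t)n;
    memcpy(h->data, src, n + 1);   /* copies the terminator as well */
    return h;
}
```

The cost is that both the length and the terminator must be kept correct on every modification, and an embedded 0 byte is still ambiguous to any code that relies on the terminator.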
Azu



revolution wrote:
Azu wrote:
Isn't it slower since you have to keep iterating through the whole string just to find how long it is?
Yes, but it depends upon your usage model. If you are doing many string length calls then a length prefix may be best in your case. If you do mostly scanning and an extra register to hold the length becomes a problem, then null termination may be best. But what about a hybrid approach: have both a length prefix and a null terminator?
Okay.. but isn't it faster to test a register than a memory location? I really think there's something that I overlooked here.


Last edited by Azu on 30 Mar 2009, 17:10; edited 1 time in total
Post 30 Mar 2009, 17:10
revolution
Probably C started from the old old processors where registers were a limited resource. The extra loop counter may have been a problem? I don't expect anyone really knows but it doesn't matter if you write your own code, you can do what suits you best.
Post 30 Mar 2009, 17:10
Azu



revolution wrote:
I don't expect anyone really knows but it doesn't matter if you write your own code, you can do what suits you best.
I'm trying to decide which to use and want to make an educated decision. All the discussions I could find were mainly about which is easier to secure or more compatible or something else, rather than which performs better.



revolution wrote:
Probably C started from the old old processor where registers were a limited resource. The extra loop counter may have been a problem?
Okay. Thanks. I guess I'll just go with prepended length if that's the only reason not to.
Post 30 Mar 2009, 17:12
revolution
Performance metrics are very application specific; I doubt that you can find any sensible advice that works in everyone's situation.

What is your usage model? How many strings are you dealing with each microsecond in your application? What percentage of processor time will be spent dealing with strings? What string operations do you do mostly?

These are the sort of questions you need to ask yourself in order to decide what will be the best solution.
Post 30 Mar 2009, 17:16
Azu



revolution wrote:
Performance metrics are very application specific; I doubt that you can find any sensible advice that works in everyone's situation.

What is your usage model? How many strings are you dealing with each microsecond in your application? What percentage of processor time will be spent dealing with strings? What string operations do you do mostly?

These are the sort of questions you need to ask yourself in order to decide what will be the best solution.
So basically one of them scales better under heavy load than the other?

Which?



P.S. the operations would mainly be scanning, copying, or combinations of the two.


Last edited by Azu on 30 Mar 2009, 17:19; edited 1 time in total
Post 30 Mar 2009, 17:18
buzzkill



It may also matter whether you want your strings mutable or immutable (like some newer HLLs). If you, e.g., add to your string, you have to go back to the length byte/word and change that. With a null-terminated string, your algorithms/functions don't change, it's still "keep going until 0".
Post 30 Mar 2009, 17:19
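The append case can be sketched in C to show where the writes land in each scheme. `lp_append` and `c_append` are hypothetical names, the prefix is assumed to be a 4-byte byte count in native byte order, and both functions assume the caller has already reserved enough room in the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Append to a length-prefixed buffer (4-byte count, then bytes).
   Two stores: the new bytes at the end, then the count at the front. */
void lp_append(unsigned char *s, const void *extra, uint32_t extra_len) {
    uint32_t n;
    assert(s != NULL && extra != NULL);
    memcpy(&n, s, sizeof n);                      /* end is known at once */
    memcpy(s + sizeof n + n, extra, extra_len);   /* write behind old data */
    n += extra_len;
    memcpy(s, &n, sizeof n);                      /* go back, fix the count */
}

/* Append to a null-terminated buffer: only the tail is written, but the
   end has to be found by scanning first. */
void c_append(char *s, const char *extra) {
    assert(s != NULL && extra != NULL);
    memcpy(s + strlen(s), extra, strlen(extra) + 1);   /* new terminator too */
}
```

So the length-prefixed version does write in two places, but it skips the scan for the end of the existing string, which the null-terminated version cannot avoid.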
revolution
Azu wrote:
Which?
Only you can know the answer to that. You have to define your usage model first. Even simple things like the average length of strings will affect the answer.
Post 30 Mar 2009, 17:20
Azu



buzzkill wrote:
It may also matter whether you want your strings mutable or immutable (like some newer HLLs). If you, e.g., add to your string, you have to go back to the length byte/word and change that. With a null-terminated string, your algorithms/functions don't change, it's still "keep going until 0".
Doesn't it take just as much work to write a 0 to the end as it does to write a number to the beginning?


revolution wrote:
Azu wrote:
Which?
Only you can know the answer to that. You have to define your usage model first. Even simple things like the average length of strings will affect the answer.
Heavy load with lots of string operations taking up CPU time. Variable string sizes and amounts of strings.
Post 30 Mar 2009, 17:20
buzzkill



This just occurred to me: performance-wise, if you calculate the length of a C-string once, the entire string will be in your (L1) cache, so future operations on the string could be sped up. Is this a real advantage, or am I just onto nothing here?
Post 30 Mar 2009, 17:23
Azu



buzzkill wrote:
This just occurred to me: performance-wise, if you calculate the length of a C-string once, the entire string will be in your (L1) cache, so future operations on the string could be sped up. Is this a real advantage, or am I just onto nothing here?
If you did it twice in a row (little or nothing in between).. but still slower on the first pass, right?


And if the L1 cache is too valuable to store a length in, isn't it too valuable to store a string in, anyway?
Post 30 Mar 2009, 17:24
buzzkill



Quote:

Doesn't it take just as much work to write a 0 to the end as it does to write a number to the beginning?

No, with Pascal-strings you have to write in two places: behind the original string, and before it to the length byte, and with C-strings you only write the part behind the original string.

Now, with Pascal-strings you could build in more safety with accessing the string like an array, because you could check the subscript with the length byte (i.e., string[10] is allowed only if length(string) >= 10 if you count your subscripts from 1, which is, I believe, the Pascal way). But since C has always been about speed and control, and not so much safety, I would guess that the C inventors chose the null-terminated string for those reasons (though I have no literature at hand to back that up).
Post 30 Mar 2009, 17:32
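The subscript check described above is cheap when the length is stored. A minimal C sketch, assuming a 4-byte native-endian length prefix and 1-based subscripts as in the post; `lp_get` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Checked, 1-based access into a length-prefixed string (4-byte count,
   then bytes). Subscript i is valid only when 1 <= i <= length.
   Returns false and leaves *out untouched when i is out of range. */
bool lp_get(const unsigned char *s, uint32_t i, char *out) {
    uint32_t n;
    assert(s != NULL && out != NULL);
    memcpy(&n, s, sizeof n);
    if (i < 1 || i > n) {
        return false;                     /* refuse the access */
    }
    *out = (char)s[sizeof n + (i - 1)];   /* skip prefix, convert to 0-based */
    return true;
}
```

The equivalent check on a null-terminated string would need a scan of up to i bytes before every access, which is one reason unchecked indexing is the norm there.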
Azu



buzzkill wrote:
Quote:

Doesn't it take just as much work to write a 0 to the end as it does to write a number to the beginning?

No, with Pascal-strings you have to write in two places: behind the original string, and before it to the length byte, and with C-strings you only write the part behind the original string.

Now, with Pascal-strings you could build in more safety with accessing the string like an array, because you could check the subscript with the length byte (i.e., string[10] is allowed only if length(string) >= 10 if you count your subscripts from 1, which is, I believe, the Pascal way). But since C has always been about speed and control, and not so much safety, I would guess that the C inventors chose the null-terminated string for those reasons (though I have no literature at hand to back that up).
I don't know about Pascal strings.
I meant writing the length of the string to the beginning, instead of writing a null to the end and not being able to use nulls in the string.
I think it would make comparisons and scanning and copying faster. I just want to know if I'm missing anything (besides it taking a register).


Last edited by Azu on 30 Mar 2009, 17:37; edited 1 time in total
Post 30 Mar 2009, 17:35
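Two of the points in this last post, faster comparison and being able to store nulls inside the string, can be illustrated together. The sketch below is in C with a hypothetical `lp_equal` helper and an assumed 4-byte native-endian length prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Equality test for length-prefixed strings (4-byte count, then bytes).
   Zero bytes inside the payload are ordinary data. */
bool lp_equal(const void *a, const void *b) {
    uint32_t na, nb;
    assert(a != NULL && b != NULL);
    memcpy(&na, a, sizeof na);
    memcpy(&nb, b, sizeof nb);
    if (na != nb) {
        return false;   /* different lengths: no byte scan needed */
    }
    return memcmp((const unsigned char *)a + sizeof na,
                  (const unsigned char *)b + sizeof nb, na) == 0;
}
```

Strings of different lengths are rejected after two loads. Payloads such as the three bytes 'a', 0, 'b' and 'a', 0, 'c' compare as different here, whereas `strcmp` on the same bytes stops at the first 0 and reports them equal.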
Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.