flat assembler
Message board for the users of flat assembler.
Parsing strings
typedef 11 Apr 2011, 19:51
HINT: check for nulls
MattDiesel 11 Apr 2011, 20:00
Is that a hint as to how to do it, or a hint in general? There shouldn't be any nulls in there.
As an example, this would be the legendary statement in a simple programming language:
Code:
print "Hello, World!"
Output of the lexer:
Code:
Token: Identifier ('print')
Token: String ("Hello, World!")
In that case the string is 13 characters long. The lexer sees a " and sets the state to string, which then deals with all sorts of escape codes and characters before ending the string. But how do I allocate the space when the length is not known in advance and could be anything?
vid 11 Apr 2011, 20:06
Well, unless you use some library which already does this for you, you need to do it the hard way: estimate the size, read into the buffer, and if the buffer is full, grow (reallocate) it and go on. Reallocation can be relatively expensive, especially on a large block, but I wouldn't worry about this unless you plan on parsing megabytes of strings.
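A minimal C sketch of the "estimate, fill, grow when full" approach described above. The `StrBuf` type and the `strbuf_*` names are hypothetical, and growing by doubling is an assumption; the post does not name a growth policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; names are illustrative only. */
typedef struct {
    char  *data;  /* heap block holding the characters read so far */
    size_t len;   /* bytes currently used */
    size_t cap;   /* bytes currently allocated */
} StrBuf;

/* Start from an estimated size. Returns 0 on success, -1 if malloc fails. */
int strbuf_init(StrBuf *b, size_t estimate)
{
    assert(estimate > 0);
    b->data = malloc(estimate);
    b->len = 0;
    b->cap = estimate;
    return b->data ? 0 : -1;
}

/* Append one character, reallocating first if the block is full. */
int strbuf_push(StrBuf *b, char c)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap * 2;      /* assumed growth policy */
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;                    /* old block is still valid */
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    return 0;
}

/* Release the block and reset the fields. */
void strbuf_free(StrBuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

A lexer would call `strbuf_init` when it sees the opening quote, `strbuf_push` for each decoded character, and hand `data` and `len` to the token when the closing quote arrives.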
typedef 11 Apr 2011, 20:07
No need to allocate. Just set a pointer to a blank space... on top of the other variables, of course, in case you overwrite them.
Just make sure you add it after you have added the other variables, because you may not know how long it is. It could cause a buffer overflow. So:
SUB ESP,4 ; dword etc
SUB ESP,8 ; another dword etc
SUB ESP,12 ; pointer to string
MattDiesel 11 Apr 2011, 20:12
I doubt it will be megabytes... But I suppose that's up to the user.
What is a good size? I suppose I need to find a point where allocation is still trivial but the buffer is big enough to minimise the number of allocations overall... I'll run a few tests and see what happens.
typedef 11 Apr 2011, 20:14
Assume the all-standard size of 1 KB (1024), LOL.
Hey, even Winsock uses that as its initial buffer size.
MattDiesel 11 Apr 2011, 20:25
Having come from a high-level background... allocating a fixed number always looks ugly to me.
Looking at what I need, I'm going to go with 256 (1024 is a bit OTT for what I need), but I'm going to keep it as a constant at the top of the lexer so changing it later is trivial. I'll also have a look at using the stack pointer like you said, typedef.
JohnFound 11 Apr 2011, 22:21
If I understood you correctly, you want to reach the end of the string and only then copy it somewhere else, right?
In this case you need to support arbitrary buffer sizes, because the string can have arbitrary size. If you need a really optimal solution, I would suggest you use an arbitrary number of buffers instead. When one buffer ends, just allocate a new one, read the next chunk of the file and continue. At the same time you have to use a double pointer to mark the beginning and the end of the string: [buffer number] and [buffer offset]. When you locate the end of the string, you can allocate the needed memory, copy the string and then free all buffers except the last one. Then the algorithm loops from the beginning. You can keep the pointers to the buffers in a dynamic array. It will be small enough that re-allocations cost you little, or it can even hold all the pointers without any reallocation at all. A 4096-byte array can address 1024 buffers, which is (if you allocate 4K for each) 4 megabytes before you need to resize it.
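The multi-buffer scheme above can be sketched in C roughly as follows. All names (`ChunkList`, `Pos`, `chunks_*`) are hypothetical, and the pointer table is a fixed 1024-entry array for brevity where the post suggests a dynamic array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* bytes per buffer, as in the post's example */
#define MAX_CHUNKS 1024   /* fixed table here; the post uses a dynamic array */

/* A position is a (buffer number, buffer offset) pair. */
typedef struct { size_t chunk; size_t offset; } Pos;

typedef struct {
    char  *chunks[MAX_CHUNKS];
    size_t count;   /* buffers allocated so far */
    size_t used;    /* bytes used in the last buffer */
} ChunkList;

void chunks_init(ChunkList *cl) { cl->count = 0; cl->used = CHUNK_SIZE; }

/* Position where the next byte will be stored. */
Pos chunks_pos(const ChunkList *cl)
{
    Pos p;
    if (cl->used == CHUNK_SIZE) { p.chunk = cl->count; p.offset = 0; }
    else { p.chunk = cl->count - 1; p.offset = cl->used; }
    return p;
}

/* Store one byte, allocating a new buffer when the current one is full. */
int chunks_put(ChunkList *cl, char c)
{
    if (cl->used == CHUNK_SIZE) {
        if (cl->count == MAX_CHUNKS) return -1;
        char *p = malloc(CHUNK_SIZE);
        if (p == NULL) return -1;
        cl->chunks[cl->count++] = p;
        cl->used = 0;
    }
    cl->chunks[cl->count - 1][cl->used++] = c;
    return 0;
}

/* Copy the bytes in [begin, end) into one exactly sized, NUL-terminated block. */
char *chunks_copy(const ChunkList *cl, Pos begin, Pos end)
{
    assert(begin.chunk <= end.chunk);
    size_t total = (end.chunk - begin.chunk) * CHUNK_SIZE + end.offset - begin.offset;
    char *out = malloc(total + 1);
    if (out == NULL) return NULL;
    size_t done = 0;
    Pos p = begin;
    while (done < total) {
        size_t n = CHUNK_SIZE - p.offset;       /* bytes left in this buffer */
        if (n > total - done) n = total - done;
        memcpy(out + done, cl->chunks[p.chunk] + p.offset, n);
        done += n;
        p.chunk++;
        p.offset = 0;
    }
    out[total] = '\0';
    return out;
}

/* Free every buffer except the last, which becomes buffer 0 for the next string. */
void chunks_recycle(ChunkList *cl)
{
    if (cl->count == 0) return;
    for (size_t i = 0; i + 1 < cl->count; i++) free(cl->chunks[i]);
    cl->chunks[0] = cl->chunks[cl->count - 1];
    cl->count = 1;
}

/* Free everything and return to the initial empty state. */
void chunks_free(ChunkList *cl)
{
    for (size_t i = 0; i < cl->count; i++) free(cl->chunks[i]);
    chunks_init(cl);
}
```

The lexer would record `chunks_pos` at the opening quote and again at the closing quote, call `chunks_copy` once, then `chunks_recycle` before scanning on. Positions taken before a recycle are no longer valid afterwards.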
MattDiesel 12 Apr 2011, 08:43
I have it all working, but that's a good idea, thanks JohnFound... If the buffer size is fixed then I could almost have a singly linked list of buffers, each with just a pointer followed by the buffer, so no need for the dynamic array, just a pointer to the first buffer. I'll certainly keep that in mind when I need to deal with bigger memory blocks.
There is a slight misunderstanding in that I won't be copying the string (if I can help it). Once it's in memory as a string I'll just be using a pointer to it, which will be stored in the token and given to the parser... So by doing it without the list I save a single allocation at the end, so doing it the other way would be better for small strings (e.g. less than my arbitrary buffer size).
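For comparison, the singly linked variant mentioned here might look like this in C. It is only a sketch: `Node`, `NodeList` and the `list_*` functions are hypothetical names, and a tail pointer is kept alongside the head so that appending does not walk the list.

```c
#include <assert.h>
#include <stdlib.h>

#define NODE_BYTES 256   /* fixed payload per node; value taken from the thread */

/* Each block is "just a pointer, then the buffer". */
typedef struct Node {
    struct Node *next;
    char data[NODE_BYTES];
} Node;

typedef struct {
    Node  *head;   /* first buffer */
    Node  *tail;   /* last buffer, kept so appends are O(1) */
    size_t used;   /* bytes used in the tail buffer */
} NodeList;

/* Append one byte, linking in a new node when the tail is full. */
int list_put(NodeList *l, char c)
{
    assert(l != NULL);
    if (l->tail == NULL || l->used == NODE_BYTES) {
        Node *n = malloc(sizeof *n);
        if (n == NULL) return -1;
        n->next = NULL;
        if (l->tail != NULL) l->tail->next = n; else l->head = n;
        l->tail = n;
        l->used = 0;
    }
    l->tail->data[l->used++] = c;
    return 0;
}

/* Total bytes stored: full nodes plus the partly filled tail. */
size_t list_length(const NodeList *l)
{
    size_t nodes = 0;
    for (const Node *n = l->head; n != NULL; n = n->next) nodes++;
    return nodes == 0 ? 0 : (nodes - 1) * NODE_BYTES + l->used;
}

/* Free every node and reset the list to empty. */
void list_free(NodeList *l)
{
    Node *n = l->head;
    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
    l->used = 0;
}
```

As the post notes, the string is not contiguous in this form, so a token that wants a plain pointer to the text still needs one final copy.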
vid 12 Apr 2011, 10:23
JohnFound's idea is better performance-wise, but the string in such a representation is of course harder to work with. You must decide what's better for you.
As for growing the buffer, the usual way is to start with a size "good enough for most", and then double the size each time it grows (e.g. 256 -> 512 -> 1024 -> 2048, etc.).
MattDiesel 12 Apr 2011, 12:16
At the moment I've gone for 256 as the initial size (good enough for 99.9% of uses) and then it keeps going up by 256... Doubling would probably be faster and easier than adding (just left shift by 1).
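To put rough numbers on adding 256 versus doubling, the C sketch below counts how many bytes get copied while a buffer grows to a target size. It assumes, pessimistically, that every reallocation moves the whole block; a real allocator can sometimes extend a block in place. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes copied while growing from `start` to at least `target`
   by adding `step` each time (assumes every grow moves the block). */
size_t copied_additive(size_t start, size_t step, size_t target)
{
    assert(step > 0);
    size_t cap = start, copied = 0;
    while (cap < target) {
        copied += cap;   /* the old contents are moved to the new block */
        cap += step;
    }
    return copied;
}

/* Same count when the capacity is doubled (left shift by 1) each time. */
size_t copied_doubling(size_t start, size_t target)
{
    assert(start > 0);
    size_t cap = start, copied = 0;
    while (cap < target) {
        copied += cap;
        cap <<= 1;
    }
    return copied;
}
```

Under that assumption, growing from 256 bytes to 64 KiB copies 8,355,840 bytes in 255 steps when adding 256 each time, against 65,280 bytes in 8 steps when doubling.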
JohnFound 12 Apr 2011, 12:53
I did some research on memory reallocation strategies. Additive grow is really very, very slow. You have to do it by multiplication.
IMHO, x1.5 is probably the best multiplier for most of my needs. You can do it for example with:
Code:
lea ecx, [ecx+2*ecx+256]
shr ecx, 1
It will make the series: 0, 128, 320, 608, 1040, 1688, etc. The additive constant serves to make the series grow faster at small sizes. You can vary this behavior by changing the constant in the brackets. It must be at least 2 in order to avoid hanging if ecx happens to be 0 on input.
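The same growth step written in C, as a quick check of the series quoted above. `grow_1_5` and `grow_series` are hypothetical names; the arithmetic is new = (3 * old + 256) / 2 in unsigned 32-bit, which matches what the two instructions compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C equivalent of:  lea ecx, [ecx+2*ecx+256]  /  shr ecx, 1 */
uint32_t grow_1_5(uint32_t size)
{
    return (3u * size + 256u) >> 1;
}

/* Fill out[0..n-1] with the sizes produced by repeated growth from `start`. */
void grow_series(uint32_t start, uint32_t *out, size_t n)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = start;
        start = grow_1_5(start);
    }
}
```

Starting from 0, the first six values are 0, 128, 320, 608, 1040 and 1688. Without the additive constant, a size of 0 would stay at 0 forever (and a size of 1 would stay at 1), which is the hang the post warns about.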
revolution 12 Apr 2011, 14:11
JohnFound wrote: Additive grow is really very, very slow. You have to do it by multiplication.
typedef 12 Apr 2011, 18:29
hmmm.... i smell buffer overflows lol.
Can't you use physical files, or mapped files? Write the buffer to a file, get the file size, then allocate that amount? Kinda works but slow... good security-wise too.
MattDiesel 12 Apr 2011, 21:24
There's no chance of buffer overflows. Don't worry about that.
I'm going to stick with what I'm doing now, which is:
1) Allocate an initial buffer that depends on the type of content (256*sizeof.TCHAR for strings), and set an initial counter to -1.
2) Add the characters to the buffer until the counter = size, then do a simple shl on the size and reallocate to that size.
Thanks for all the help though. Testing all the alternatives has taught me a lot.
Mat
vid 12 Apr 2011, 23:54
Quote: hmmm.... i smell buffer overflows lol.
Depends on who does the coding
typedef 13 Apr 2011, 00:57
vid wrote:
I'll kill you
vid 13 Apr 2011, 01:05
Quote: I'll kill you
Depends on who does the killing
typedef 13 Apr 2011, 01:27
vid wrote:
I'll rape you |
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.