flat assembler
Message board for the users of flat assembler.

Index > Main > question on datastructures for lists for variable length str

Author
Thread Post new topic Reply to topic
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 22 Feb 2014, 07:53
What would be a good data structure for maintaining a list of strings each of an arbitrary length?
The following operations should be FAST:
- get the address of the i^th string
- replace the i^th string with another string of arbitrary length
- append a string to the end of the list
Post 22 Feb 2014, 07:53
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20344
Location: In your JS exploiting you and your system
revolution 22 Feb 2014, 08:03
A simple index should be fine.
Code:
mov eax,[i]
mov esi,[eax*4+string_index_table]    
You didn't state the rules for allocating/freeing memory used by the strings so can we assume that it is not important?
Post 22 Feb 2014, 08:03
View user's profile Send private message Visit poster's website Reply with quote
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 22 Feb 2014, 08:18
nono, memory allocation is important.
So for each index i, I should use HeapAlloc and keep track of how much space is available at that index?
And, when replacing the i^th string, only if there is not enough space do we call HeapFree on [i*4+string_index_table] followed by a call to HeapAlloc with the new larger size?
And, if the heap run out, I have to reallocate the whole table?
Post 22 Feb 2014, 08:18
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20344
Location: In your JS exploiting you and your system
revolution 22 Feb 2014, 08:27
So this is for Windows? If so then you can use HeapReAlloc / HeapCompact and Windows will move things around accordingly without destroying the data. For this to work properly the index and each string is allocated separately. Since you only state that the index operations need to be fast then preallocate an index of a large size and minimise the number of changes to its size.
Post 22 Feb 2014, 08:27
View user's profile Send private message Visit poster's website Reply with quote
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 22 Feb 2014, 11:58
If I am understanding the docs correctly,
HeapReAlloc is not usefull because it copies the old data to the new location, which I do not need. Might as well just use HeapFree+HeapAlloc.
HeapCompact just tells the length of the longest contiguous component. I'm not sure for what what this is usefull, since if any call to HeapAlloc fails, I can assume that he heap is essentially full. At this point I would have to create a new heap of maybe twice the size of the old one and copy everything from the old heap to the new heap, then delete the old heap.
Post 22 Feb 2014, 11:58
View user's profile Send private message Reply with quote
AsmGuru62



Joined: 28 Jan 2004
Posts: 1631
Location: Toronto, Canada
AsmGuru62 22 Feb 2014, 12:51
If HeapAlloc fails, then new heap creation will also fail.
It usually happens when process has no more pages to allocate or 2Gb of room (in 32 bit process)
has become fragmented -- it is enough memory, but no continuous block to satisfy the request.

It all depends on how intensive your code is.
How often strings need to be added or replaced? Occasional chance of that or
it happens all the time when code runs and in great frequency.

Every call to HeapAlloc() will have 24 bytes added for the block service structures, so having each string allocated with high frequency may be not the ideal choice. For high frequency allocation you may need the custom allocator, based on getting large blocks with VirtualAlloc() and then cutting them onto the smaller pieces you need with less overhead than 24 bytes, maybe just 4 or 8 bytes.

Or you can go with what C# does -- probably the fastest way: get a large block, say 512Kb (smaller is better to reduce fragmentation) and then just start adding strings into the block and keep the pointers in a vector of DWORDs, like in the code from revolution.

Then when new string comes -- simply append it again into large block and forget about the old text, but keep the count of empty bytes you just forgot. When block becomes filled - allocate another 512Kb and continue adding. Of course at some point you need to check if blocks become filled with empty bytes, say by 40% - if 40% of 512Kb is empty bytes, then start repacking the strings into yet another 512Kb and once done - VirtualFree() old large blocks. That takes some time, but initial allocation is very fast, because you only move pointer forward and copy strings into it (1st copy -- then move, of course).
Post 22 Feb 2014, 12:51
View user's profile Send private message Send e-mail Reply with quote
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 22 Feb 2014, 15:21
I like your last suggestion, namely
- allocate 256KiB initially
- maintain an array of string pointers and block sizes
- if a string to be put in some position cannot fit, then forget about that string data, and appen the new string to the end of the growing block inside the 256KiB region
- if we ever overflow the 256*n KiB region,
-- allocate a new 256*(n+1) KiB region, if this fails, then we are out of memory
-- transfer the old 256*n KiB region to the new 256*(n+1) KiB, and fill in the holes in the process
-- free the old 256*n KiB region

This should require a little care to get things working, but when it does, the calls to memory functions should be minimal. Smile
Post 22 Feb 2014, 15:21
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20344
Location: In your JS exploiting you and your system
revolution 22 Feb 2014, 15:51
tthsqe wrote:
- allocate 256KiB initially
- maintain an array of string pointers and block sizes
- if a string to be put in some position cannot fit, then forget about that string data, and appen the new string to the end of the growing block inside the 256KiB region
- if we ever overflow the 256*n KiB region,
-- allocate a new 256*(n+1) KiB region, if this fails, then we are out of memory
-- transfer the old 256*n KiB region to the new 256*(n+1) KiB, and fill in the holes in the process
-- free the old 256*n KiB region
You just pretty much described the functions of HeapAlloc, HeapReAlloc and HeapCompact. I think you just reinvented the Heap Razz The Windows functions are very robust and also quite smart about allocating/moving stuff. Naturally certain pathological situations can trip them up and cause some inefficiency but generally I find they work well.

BTW: HeapCompact does more than give the largest free space, it defragments the heap.
Post 22 Feb 2014, 15:51
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 22 Feb 2014, 16:58
I am not sure what you want to create, but I would suggest using FreshLib.

It is fast, it is assembly-centric, it works with variable length strings (StrLib) and has a dynamically resizeable arrays.
Post 22 Feb 2014, 16:58
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
gens



Joined: 18 Feb 2013
Posts: 161
gens 22 Feb 2014, 18:05
if you want speed over simplicity you need to make the custom allocator on top of mmap

since you know how long an avg string will be and min/max lengths, that a generic allocator can't know


Last edited by gens on 22 Feb 2014, 18:09; edited 1 time in total
Post 22 Feb 2014, 18:05
View user's profile Send private message Reply with quote
tthsqe



Joined: 20 May 2009
Posts: 767
tthsqe 22 Feb 2014, 18:09
revolution,
When it is a proprietary wheel, that has been built for a task more general than the one required, and when I do not understand the inner workings of that wheel, I do not mind reinventing it.

For example,

VirtualAlloc and VirtuallFree, which work for large chunks, are my friends. I will use them to get the 256KiB chunks. The heap functions, which can work on small blocks, have operations that are mysterious to me.

There is nothing in the documentation page for HeapCompact that leads me to believe that it fills in holes. The hole filling process that AsmGuru62 and I were talking about has the possibility of changing the offsets of all of the string addresses in the list of strings. This should not be a problem because it is happening only every 256KiB overflow.
Post 22 Feb 2014, 18:09
View user's profile Send private message Reply with quote
gens



Joined: 18 Feb 2013
Posts: 161
gens 22 Feb 2014, 18:11
yes, VirtualAlloc in windows
mmap is same thing just more portable
Post 22 Feb 2014, 18:11
View user's profile Send private message Reply with quote
gens



Joined: 18 Feb 2013
Posts: 161
gens 22 Feb 2014, 18:40
sry, didn't bring much to the topic

pages are 4k in size on x86 so you'd want to work around that
you can mmap anywhere in virtual memory and any size (if you take more then 4k the addresses will be continuous, size and starting address aligned to 4k ofc)

if you don't pick an address, the next available will be chosen
this is the portable way since if you have a shared library doing malloc-ing you would have to run checks as to not map over it

i didn't get to making a complex memory management so check out some malloc implementation
like dlmalloc http://g.oswego.edu/dl/html/malloc.html as a general idea
or tcmalloc (a kind of lazy free malloc, that is "caching" parts of memory thus being faster with more fragmentation)
also http://en.wikipedia.org/wiki/Buddy_memory_allocation that is used internally in the linux kernel might give another perspective
all you need to put on top of it is an index
Post 22 Feb 2014, 18:40
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.