flat assembler
Message board for the users of flat assembler.

Index > Main > traversing memory, fastest way to find a "null"

LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 04 Nov 2008, 02:37
Code:
include 'win32ax.inc'
POOL_SIZE  = 1 * 1024*1024
ALLOC_SIZE = 64

.data
  pPool dd 0
  pHead dd 0
.code

; On error EAX=0
proc heap_init
  invoke  VirtualAlloc, NULL, POOL_SIZE, MEM_COMMIT, PAGE_READWRITE
  test    eax, eax
  mov     [pPool], eax
  mov     [pHead], eax
  jz      .exit

  mov     ecx, POOL_SIZE / ALLOC_SIZE - 1 ; Make sure that ecx will be 1 at least here

.loop:
  lea     edx, [eax+ALLOC_SIZE]
  mov     [eax], edx
  mov     eax, edx
  dec     ecx
  jnz     .loop

  mov     dword [eax], 0 ; Close list

.exit:
  ret
endp

; EAX=NULL on fail
proc heap_alloc
  mov     eax, [pHead]
  test    eax, eax
  jz      .exit

  mov     edx, [eax]
  mov     [pHead], edx

.exit:
  ret
endp

; pointer must be valid (a pointer that was already freed is considered invalid of course)
proc heap_free pointer
  mov     eax, [pointer]
  mov     edx, [pHead]
  mov     [eax], edx
  mov     [pHead], eax

  ret
endp

start:
  call    heap_init
  call    heap_alloc
  stdcall heap_free, eax

  invoke  MessageBox, 0, "Test", "No crash Very Happy", 0
  invoke  ExitProcess, 0
.end start    


Check it very well, because I haven't tested it much (only the silly test you can see in the source was performed). The design flaw of heap_free can in fact be mitigated by checking that the pointer lies inside the boundaries of the pool and is a multiple of ALLOC_SIZE, but that is not enough: such a check cannot detect double calls to heap_free, nor frees of memory that was never allocated. Using a bitmap with the BTx instructions should solve this problem, though perhaps you will find some way to reuse the free space so that separate storage for the bitmap is not required (in the worst case, remember that each dword holds the allocated/free state of 32 blocks, so it is not a huge space overhead anyway).
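A minimal C sketch of the bitmap idea (easier to test than asm; the names pool, mark_alloc and heap_free_checked are illustrative, not from the code above). One bit per ALLOC_SIZE block records the allocated/free state, so a free can reject out-of-pool pointers, misaligned pointers, and double frees. The bit operations map directly onto BTS/BT/BTR:

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  (1 * 1024 * 1024)
#define ALLOC_SIZE 64
#define NBLOCKS    (POOL_SIZE / ALLOC_SIZE)

static unsigned char pool[POOL_SIZE];
static uint32_t bitmap[NBLOCKS / 32];      /* one bit per block, 1 = allocated */

/* Mark a block allocated (what heap_alloc would also do) - like BTS. */
static void mark_alloc(void *p)
{
    size_t i = ((uintptr_t)p - (uintptr_t)pool) / ALLOC_SIZE;
    bitmap[i / 32] |= 1u << (i % 32);
}

/* Validating free: returns 1 on success, 0 on refusal. */
static int heap_free_checked(void *p)
{
    uintptr_t off = (uintptr_t)p - (uintptr_t)pool;
    if (off >= POOL_SIZE || off % ALLOC_SIZE != 0)
        return 0;                              /* outside pool / misaligned */
    size_t i = off / ALLOC_SIZE;
    if (!(bitmap[i / 32] & (1u << (i % 32))))  /* test the bit - like BT */
        return 0;                              /* never allocated, or double free */
    bitmap[i / 32] &= ~(1u << (i % 32));       /* clear it - like BTR */
    /* ...here the block would be pushed back onto the free list... */
    return 1;
}
```

At 1 bit per 64-byte block the bitmap costs 2 KB for the whole 1 MB pool, which matches the "not a huge space overhead" remark.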

[edit] Another idea: you could build the list in your original array, using the slot position to compute the address of the block inside the pool, which leaves the slot space itself free to hold the list. You could also use a bit in each slot to indicate whether it belongs to the list, so heap_free is no longer vulnerable to being called on already-freed blocks. Another advantage is that overrunning the boundaries of a block cannot put the list's integrity at risk, since the list is stored outside the memory pool.
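A C sketch of this [edit] idea, under illustrative names (next, head, FREE_BIT): the free list lives in a separate index array, the top bit of each slot marks "currently free", and a block's address would simply be pool + slot * ALLOC_SIZE, so nothing written inside a block can corrupt the links:

```c
#include <stdint.h>

#define NBLOCKS   16                 /* small for illustration */
#define FREE_BIT  0x80000000u        /* slot is on the free list */
#define END       0x7fffffffu        /* list terminator */

static uint32_t next[NBLOCKS];       /* per-slot: link index + FREE_BIT */
static uint32_t head;

static void pool_init(void)
{
    for (uint32_t i = 0; i < NBLOCKS; i++)
        next[i] = ((i + 1 < NBLOCKS) ? i + 1 : END) | FREE_BIT;
    head = 0;
}

static int alloc_slot(uint32_t *slot)   /* returns 0 when the pool is empty */
{
    if (head == END) return 0;
    *slot = head;
    head = next[head] & ~FREE_BIT;
    next[*slot] = 0;                    /* clears FREE_BIT: slot is in use */
    return 1;
}

static int free_slot(uint32_t slot)     /* rejects bad indices and double frees */
{
    if (slot >= NBLOCKS || (next[slot] & FREE_BIT)) return 0;
    next[slot] = head | FREE_BIT;
    head = slot;
    return 1;
}
```

Because membership is checked via FREE_BIT before relinking, a second free of the same slot is refused instead of corrupting the list.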
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 04 Nov 2008, 14:12
Try googling for jemalloc; it's supposed to be a pretty decent (aka fast) memory allocator for current hardware.
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 10 Nov 2008, 03:02
Any news about this? Given the silence in this thread, I'll keep the thread-safe version to myself ;)
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 11 Nov 2008, 09:07
This example shows what I use for cacheline-sized objects. Not thread safe, but threads shouldn't overlap at such a fine granularity, imho.
adnimo



Joined: 18 Jul 2008
Posts: 49
adnimo 14 Nov 2008, 07:48
Hi, sorry for the absence; I couldn't get internet access lately (yeah, I know...)

f0dder: all I found was a PDF, and I'm still trying to understand the algorithm; it mostly bragged about how good it was, and the explanation of the system was quite odd for my taste...

LocoDelAssembly: I'm still playing with your code. Mind you, I'm quite a slow learner :)

bitRAKE: at the moment I'm mostly concerned with how to deal with the memory block and all the pointers, rather than how to allocate it, unless I missed something in your code.
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 14 Nov 2008, 12:13
adnimo: well, they put quite some effort into the algorithm, and the benchmarks against the classic BSD PHK allocator were positive. It was also adopted for Firefox 3, where it apparently showed some nice improvements. It's not just about allocating blocks of memory fast, but also about reducing multithreaded sync overhead and improving cache coherency.
adnimo



Joined: 18 Jul 2008
Posts: 49
adnimo 15 Nov 2008, 12:47
Yes, but that's the problem: I cannot develop anything _that_ advanced, since I'm still learning. My idea was to make a simple memory-pool library so I can get rid of the system malloc/free calls (huge overhead) for things like linked lists, arrays, etc., by allocating a big chunk of memory up front and then managing the pointers myself.

Also, with malloc you never know whether it will happen to give you a portion of memory close to the last one you requested, or one at the very end of the address space.

Right now I would only use a memory pool for node systems and the like, where the allocation calls are very frequent, not so much for arrays.
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 16 Nov 2008, 18:03
adnimo wrote:
bitRAKE: I'm mostly concerned on how to deal with the memory block and all pointers rather than how to allocate it at the moment, unless I missed something from your code.
I was specifically referring to the getBlock/putBlock macros. Not a general solution by any means, but impossible to beat for specific solutions that do many small de-/allocations.
Code:
macro getBlock reg0*,reg1*{
        mov reg0,reg1
        mov reg1,[reg1]
}
macro putBlock reg0*,reg1*{
        mov [reg0],reg1
        mov reg1,reg0
}    
Any other solution needs at least one more level of indirection, usually several.
(Looks like the same thing LocoDelAssembly posted, just in another form.)
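For reference, the two macros can be rendered in C (illustrative names get_block/put_block; the struct block type and the head variable standing in for reg1 are assumptions, not part of the original). Like the macros, get_block does not check for an empty list; each operation is just two moves:

```c
#include <stddef.h>

struct block { struct block *next; };  /* the payload reuses this space */

static struct block *head;             /* free-list head, i.e. reg1 */

static struct block *get_block(void)   /* getBlock reg0, reg1 */
{
    struct block *b = head;            /* mov reg0, reg1   */
    head = head->next;                 /* mov reg1, [reg1] */
    return b;
}

static void put_block(struct block *b) /* putBlock reg0, reg1 */
{
    b->next = head;                    /* mov [reg0], reg1 */
    head = b;                          /* mov reg1, reg0   */
}
```

This is the same pop/push on an intrusive singly linked free list as in LocoDelAssembly's heap_alloc/heap_free, with the list head kept in a register instead of in pHead.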

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 16 Nov 2008, 18:28
Quote:

(Looks like the same thing LocoDelAssembly posted - just in another form.)

If you mean your mem_hog.asm, I think it is the same; at least your initialization loop seems to do exactly what mine does, just written slightly differently. I'm unsure about your macros, though: how are they supposed to be used?

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
