flat assembler
Message board for the users of flat assembler.

Index > Main > Search hash in hash-table [algorithm]

Author
Thread Post new topic Reply to topic
Bob++



Joined: 12 Feb 2013
Posts: 92
Bob++ 18 Feb 2013, 21:41
I am coding a word-processing that there is so much array access(static array, no changes at run-time). I am hasing the string and then check if there is in my array(lookup). But what is a good implementation of it? I am doing by the simple way. Checking value per value if match to my hash input. Ideas to make it fastest?

I am currently checking:

if loop unroling will make really different.
if disorderly array is really slow than a sorted
I'II go to see if vectorization will be fine in this case.

do you recommend it? better,how do you implement this algorith?

here's current routine(it's returns into EAX the index of hash in array or a negative value,if don't any):

Code:
Index_of:
push edx
push ecx
push ebx
mov ecx,123456          ;hash example. in the real code,it's set by a routine.
xor ebx,ebx
mov eax,array
.LOOP1:
        cmp [eax],ecx           ;match hash?
        je .FOUND
        cmp byte [eax],0
        je .NOTFOUND
        add ebx,1
        add eax,4
        jmp .LOOP1
.NOTFOUND:
        neg eax
        jmp .DONE
.FOUND:
        mov eax,ebx
.DONE:
pop ebx
pop ecx
pop edx
ret
    


array is:

Code:
;hashs for examples
array:
dd 33389990
dd 1234567
dd 81919191
dd 938383737
0
    


Thanks in advance.

I am on x86/32bit machine. Runnning Linux. But not really matter. The algorith has no(and shouldn't) OS-specific.
Post 18 Feb 2013, 21:41
View user's profile Send private message Reply with quote
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 18 Feb 2013, 22:23
You should use the (hash_value mod array.size) as an index of the array instead of searching, that way you get O(1) access instead of the O(n) you have now.

If you have collisions you'll need to store in the array pointers to lists (or make each array entry hold several entries if that solves the issue). Also google for perfect hashing which is suitable for static arrays.
Post 18 Feb 2013, 22:23
View user's profile Send private message Reply with quote
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode 19 Feb 2013, 06:28
there are lot of ways to implement it. but essentially as Loco suggested.
a simple one
http://code.google.com/p/x64lab/source/browse/develop/ide/wspace.asm
line 2534 for utf-16 directory. you allocate the table on the stack, etc.etc, accordinng the number of bits you need (choos it carefully!)
look at the sdbm hash, good for general purpose.

here some macros i wrote (my improvement on it tested exstensively works without doubt better on labels L_00000, directory, and english dictionary)
http://code.google.com/p/x64lab/source/browse/develop/macro/mrk_macrow.inc
line 519,550

Cheers,
Very Happy

_________________
⠓⠕⠏⠉⠕⠙⠑
Post 19 Feb 2013, 06:28
View user's profile Send private message Visit poster's website Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound 19 Feb 2013, 07:05
If you can't build hash tables, because of some reason, you can try to keep the array sorted and use binary search. It will speed the things up.
Post 19 Feb 2013, 07:05
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 19 Feb 2013, 23:30
1) you're linearly scanning a table of hash values - you don't have a hash-table. O(n) lookup speed is bad.
2) checking for table end with byte(0) instead of dword(0) seems potentially dangerous.
3) as already mentioned, if your hash-table is static, you really should google perfect hashing - if it's suitable for your scenario, you get true O(1) lookups.
4) if for some reason you can't/won't do a true hash-table, at least sort the input values and use a binary search, that gives you O(log n) lookup speed (i.e. searching the entire DWORD space would require a worst-case of 32 iterations to find your item).
Post 19 Feb 2013, 23:30
View user's profile Send private message Visit poster's website Reply with quote
idle



Joined: 06 Jan 2011
Posts: 440
Location: Ukraine
idle 20 Feb 2013, 12:39
Post 20 Feb 2013, 12:39
View user's profile Send private message Reply with quote
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode 20 Feb 2013, 13:48
ths is becoming a very useful thread, i think it's time to post this instructional
links if i dint it yet, http://home.comcast.net/~bretm/hash/

Cheers,
Very Happy

_________________
⠓⠕⠏⠉⠕⠙⠑
Post 20 Feb 2013, 13:48
View user's profile Send private message Visit poster's website Reply with quote
AsmGuru62



Joined: 28 Jan 2004
Posts: 1661
Location: Toronto, Canada
AsmGuru62 20 Feb 2013, 16:37
I needed that for education!
I know "by guts" what is a hash, but this article going into much needed details.
Thanks.
Post 20 Feb 2013, 16:37
View user's profile Send private message Send e-mail Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 21 Feb 2013, 18:05
Post 21 Feb 2013, 18:05
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode 23 Feb 2013, 08:17
reading above about the load factor, i would like to point out now that
collisions rate should be solved keeping this points as reference:

a) the hash function
b) the scheme
c) the context

a)
the hash function generate values from inputs.
those values may collide. here a graph, not much readable imho,
about how the function works:
http://www.fantasy-coders.de/projects/gh/html/x435.html


b)
the scheme is about the way you manage the table in memory,
because you have got some collisions coming from a good/bad a)
also deallocation/reallocation in order to increase/decrease/sort
free slots (buckets). this scheme for example,
http://en.wikipedia.org/wiki/Cuckoo_hashing
or linear probing

the focus here is on the birthday's paradox
http://en.wikipedia.org/wiki/Birthday_problem


c)
the context is the last but not least. because one thing is
hashing an english dictionary, one other thing is hashing
programmer's labels and symbols, where you may have
symbols like j,m,n,o,j1,j2 etc.
i cannot find the link of a valuable explanation about python using
a clever and elegant scheme on numbers and variables.
but here an implementation of the function:
http://stackoverflow.com/questions/793761/built-in-python-hash-function

Cheers,
Very Happy

_________________
⠓⠕⠏⠉⠕⠙⠑
Post 23 Feb 2013, 08:17
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.