flat assembler
Message board for the users of flat assembler.
Index
> Projects and Ideas > 32 chars hashed into one DWORD max value Goto page 1, 2 Next |
Author |
|
tthsqe 17 Feb 2014, 05:49
sleepsleep, you should really read up on the math. What you are requesting is possible only if you have a three character alphabet:
Code: Log[10^5]/Log[32] = 3.321928094887362 Since you are talking about a 63 character alphabet, I would first compress the 32 characters to 24 bytes. This can be done 1-1 since Code: Log[63^32]/Log[256] = 23.90911969399966 If you want to go fewer than 24 bytes you are going to get collisions of which you will have to keep track as you hash values. You can easily hash the 24 bytes (3 dwords) into 8 bytes by xor'ing the 3 dwords... |
|||
17 Feb 2014, 05:49 |
|
revolution 17 Feb 2014, 10:25
FYI: This is known as the pigeonhole principle.
|
|||
17 Feb 2014, 10:25 |
|
BAiC 17 Feb 2014, 12:18
Quote:
FYI: 24 bytes = 6 dwords (or 3 qwords). |
|||
17 Feb 2014, 12:18 |
|
sleepsleep 17 Feb 2014, 13:04
thanks for the term, revolution,
btw, tthsqe, i am really bad with maths, i only know simple add,subtract,divide,multiply. i was thinking about calculate all chars into a number. if got clash after strcmp (even if same number, then only i mark it as special) so, the hash would be, 1. calculate ascii value 2. check hash table if no clash 3. check special hash table if clash eg. window1 = 119 105 110 100 111 119 049 = 713 but, as we all know windov2 = 119 105 110 100 111 118 050 = 713 i was thinking the hash system should be intelligent and flexible to decide based on input that provided to it and deduce a best solution. (not about hashing all possibilities, since it is near impossible to use all possibilities) objective is prevent clash, and plan b and plan c if clash found. since a DWORD max value is 4294967295 i was trying to fit, and prevent inside this max value. |
|||
17 Feb 2014, 13:04 |
|
revolution 17 Feb 2014, 13:07
There is such a thing called a "perfect hash" which can do what you want. But it is slow and complex, I don't know if that is a problem for you. Anyhow, do some research into different types of hashes. I'm sure you can find one that will do what you want.
|
|||
17 Feb 2014, 13:07 |
|
sleepsleep 17 Feb 2014, 13:18
they are complex, and probably needs maths,,, =P
another idea is using 3D coordinate, as if these variables live in 3D space, maybe need a function to identify unique between 2 clashes, was thinking about extend by prepend at the back with a counter + 1 or more. |
|||
17 Feb 2014, 13:18 |
|
revolution 17 Feb 2014, 13:22
Why are you trying to invent your own hash? There are already many to choose from and all the analysis has been done by experienced people.
|
|||
17 Feb 2014, 13:22 |
|
sleepsleep 17 Feb 2014, 13:35
idk, like i said, i don't have maths to use their superior system, and how to use a thing that i don't even understand?
btw, i just thought of a new one, maybe this one will work =) 119 105 110 100 111 119 049 , they will be added to below max DWORD, then second row 4 294 967 295 1 191 051 10 = 1 191 051 100 if let say, inputTextDate ascii = 105110112117116084101120116068097116101 hash = Code: 1 051 101 121 171 160 841 011 201 160 680 971 161 01 so, add them all, then that is answer, if first char start with 4, then 1 will be prepend infront. horray, mission solve, lol =) i am simple minded, and maths is too complex for me. there is for sure possibilities for collision, but maybe, enough for my 1000 variables with no clash. maybe safer if i just start with 0 to prevent possible overflow 4kkk. |
|||
17 Feb 2014, 13:35 |
|
revolution 17 Feb 2014, 13:47
I expect even a basic CRC32 would give you better results. But there is still going to be that pesky little pigeon problem.
The old classic MD5 is still good value you know. Not secure, but then you never mentioned that as a requirement. |
|||
17 Feb 2014, 13:47 |
|
sleepsleep 17 Feb 2014, 13:51
the requirement would be, hashed of those chars must fit inside DWORD,
let me change the thread title, sorry. thanks for the CRC32 revolution, =) there are bunch of values and require calculus i guess. |
|||
17 Feb 2014, 13:51 |
|
edfed 17 Feb 2014, 14:48
if you use your string as a seed for a p-random generator, you can theorically have a good hash by picking some arbitrary bytes somewhere in the random string generated by the prng.
now, you just have to find a prng that fits your requirements. |
|||
17 Feb 2014, 14:48 |
|
cod3b453 17 Feb 2014, 20:32
You can take a larger hash function with better compression properties and use a dword of its output e.g. MD5/SHA. As you're not hashing that many values (~2^10) the odds of a collision are pretty small but for safety you should add do a full comparison when a hash match occurs (if it collides you can use another dword from the hash or rehash to enter in the table).
|
|||
17 Feb 2014, 20:32 |
|
JohnFound 17 Feb 2014, 21:39
My suggestion is as always FNV-1b. It is small and fast, with pretty good distribution (i.e. low collision rate):
Code: proc DataHash, .ptrData, .len begin push ecx edx esi mov esi, [.ptrData] mov ecx, [.len] mov eax, $811C9DC5 ; 2166136261 ; FNV offset basis inc ecx .hashloop: dec ecx jz .exit movzx edx, byte [esi] xor eax, edx inc esi imul eax, $01000193 ; 16777619 ; FNV prime jmp .hashloop .exit: pop esi edx ecx return endp If it is good enough for FASM it should be good for you as well. |
|||
17 Feb 2014, 21:39 |
|
sleepsleep 18 Feb 2014, 10:34
JohnFound wrote:
thanks for the information JohnFound, i will try it, as you said, if it is good enough for FASM, it should be good for me, =) thanks.! |
|||
18 Feb 2014, 10:34 |
|
revolution 18 Feb 2014, 10:37
sleepsleep wrote: ... if it is good enough for FASM, it should be good for me, =) |
|||
18 Feb 2014, 10:37 |
|
sleepsleep 18 Feb 2014, 11:11
revolution wrote:
ah.hah,, probably beyond my ability., but i will try, there got to be something simple using add,minus, only. after all, is numbers only imagination? idk, |
|||
18 Feb 2014, 11:11 |
|
sleepsleep 18 Feb 2014, 13:26
ok, what i gonna said might be really stupid,
but here the idea, 2 factors, 1st = strlen, 2nd = ascii value totalled. then i allocated page (decide by user, how many items per page) eg. 100 so, since, 32 chars is max, so we got 32 pages. if, let say, window1 = 119 105 110 100 111 119 049 = 713 7 chars, to identify it from name, 7 x 100, (initial address of page 7), compare first dword if calculated ascii total, 713, search through list until 0, if only 1, the next dword is what we intend to represent with this string, if more than 1, then we do strcmp on identified 713. i guess i could only use simple solution. |
|||
18 Feb 2014, 13:26 |
|
r22 27 Feb 2014, 20:22
CRC32 isn't a very difficult algorithm (to implement anyways). After reading the last sentence I feel compelled to prove it... for my own entertainment...
Code: .data hash_me dd 12345678h, 87654321h, 1a2b3c4dh, 9f8e7d6ch MAX_DW dd 0FFFFFFFFh CRC32_CONST dd 0EDB88320h CRC32_Table rd 256 .code CALL CRC32_BuildTable PUSH hash_me CALL CRC32_Consume32 CRC32_BuildTable: MOV edx, CRC32_Table PUSH ebx XOR ecx, ecx .do256: MOV eax, ecx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV ebx, eax SHR ebx, 1 XOR ebx, dword[CRC32_CONST] SHR eax, 1 CMOVC eax, ebx MOV dword[edx+ecx*4], eax ADD ecx, 1 CMP ecx, 256 JL .do256 POP ebx RET 0 CRC32_Consume32: PUSH esi PUSH edi MOV esi, dword[esp+12] MOV edi, CRC32_Table MOV eax, dword[MAX_DW] XOR ecx, ecx .do32: MOVZX edx, byte[esi + ecx] XOR dl, al MOV edx, dword[edi + edx * 4] SHR eax, 8 XOR eax, edx ADD ecx, 1 CMP ecx, 32 JL .do32 XOR eax, dword[MAX_DW] POP edi POP esi RET 4 There's probably more than a handful of CRC32 implementations already on the FASM board (that are faster, more correct, and have a higher probability of compiling), but the above fills that niche "I wrote it in the reply box after googling 'crc32 javascript' for a reference" implementation. |
|||
27 Feb 2014, 20:22 |
|
sleepsleep 27 Feb 2014, 23:31
awesome, r22
thanks,! |
|||
27 Feb 2014, 23:31 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.