flat assembler
Message board for the users of flat assembler.

 Index > Projects and Ideas > 32 chars hashed into one DWORD max value Goto page 1, 2  Next
Author
 Thread
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
idk what the term for this kinda process,
but what i intended to do is hashing the 32 chars into 5 digits number,

was thinking the following idea,

a replacement for variable name,

assume variable name only consists of A-Z, a-z, 0-9, _,
how to create a hash system that able to uniquely identify each possible name? without same hash for 2 different variable name?

the easy one inside my mind is count the ASCII value, add them, but this is for sure not fault proof,

any ideas that you wish to share?

Last edited by sleepsleep on 17 Feb 2014, 13:52; edited 2 times in total
17 Feb 2014, 03:54
tthsqe

Joined: 20 May 2009
Posts: 731
tthsqe
sleepsleep, you should really read up on the math. What you are requesting is possible only if you have a three character alphabet:
Code:
` Log[10^5]/Log[32] = 3.321928094887362    `

Since you are talking about a 63 character alphabet, I would first compress the 32 characters to 24 bytes. This can be done 1-1 since
Code:
`Log[63^32]/Log[256] = 23.90911969399966    `

If you want to go fewer than 24 bytes you are going to get collisions of which you will have to keep track as you hash values. You can easily hash the 24 bytes (3 dwords) into 8 bytes by xor'ing the 3 dwords...
17 Feb 2014, 05:49
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18182
Location: In your JS exploiting you and your system
revolution
FYI: This is known as the pigeonhole principle.
17 Feb 2014, 10:25
BAiC

Joined: 22 Mar 2011
Posts: 272
Location: California
BAiC
Quote:

24 bytes (3 dwords)

FYI: 24 bytes = 6 dwords (or 3 qwords).
17 Feb 2014, 12:18
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
thanks for the term, revolution,
btw, tthsqe, i am really bad with maths, i only know simple add,subtract,divide,multiply.

i was thinking about calculate all chars into a number. if got clash after strcmp (even if same number, then only i mark it as special)

so, the hash would be,
1. calculate ascii value
2. check hash table if no clash
3. check special hash table if clash

eg.
window1 = 119 105 110 100 111 119 049 = 713
but, as we all know
windov2 = 119 105 110 100 111 118 050 = 713

i was thinking the hash system should be intelligent and flexible to decide based on input that provided to it and deduce a best solution. (not about hashing all possibilities, since it is near impossible to use all possibilities)

objective is prevent clash, and plan b and plan c if clash found.

since a DWORD max value is 4294967295
i was trying to fit, and prevent inside this max value.
17 Feb 2014, 13:04
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18182
Location: In your JS exploiting you and your system
revolution
There is such a thing called a "perfect hash" which can do what you want. But it is slow and complex, I don't know if that is a problem for you. Anyhow, do some research into different types of hashes. I'm sure you can find one that will do what you want.
17 Feb 2014, 13:07
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
they are complex, and probably needs maths,,, =P

another idea is using 3D coordinate, as if these variables live in 3D space,
maybe need a function to identify unique between 2 clashes,

was thinking about extend by prepend at the back with a counter + 1 or more.
17 Feb 2014, 13:18
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18182
Location: In your JS exploiting you and your system
revolution
Why are you trying to invent your own hash? There are already many to choose from and all the analysis has been done by experienced people.
17 Feb 2014, 13:22
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
idk, like i said, i don't have maths to use their superior system, and how to use a thing that i don't even understand?

btw, i just thought of a new one, maybe this one will work =)

119 105 110 100 111 119 049 , they will be added to below max DWORD, then second row

4 294 967 295
1 191 051 10 = 1 191 051 100

if let say, inputTextDate
ascii = 105110112117116084101120116068097116101
hash =
Code:
```1 051 101 121
171 160 841
011 201 160
680 971 161
01
```

so, add them all, then that is answer, if first char start with 4, then 1 will be prepend infront.
horray, mission solve, lol =)

i am simple minded, and maths is too complex for me.

there is for sure possibilities for collision, but maybe, enough for my 1000 variables with no clash.

maybe safer if i just start with 0 to prevent possible overflow 4kkk.
17 Feb 2014, 13:35
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18182
Location: In your JS exploiting you and your system
revolution
I expect even a basic CRC32 would give you better results. But there is still going to be that pesky little pigeon problem.

The old classic MD5 is still good value you know. Not secure, but then you never mentioned that as a requirement.
17 Feb 2014, 13:47
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
the requirement would be, hashed of those chars must fit inside DWORD,

let me change the thread title, sorry.

thanks for the CRC32 revolution, =)
there are bunch of values and require calculus i guess.
17 Feb 2014, 13:51
edfed

Joined: 20 Feb 2006
Posts: 4252
Location: Now
edfed
if you use your string as a seed for a p-random generator, you can theorically have a good hash by picking some arbitrary bytes somewhere in the random string generated by the prng.

now, you just have to find a prng that fits your requirements.
17 Feb 2014, 14:48
cod3b453

Joined: 25 Aug 2004
Posts: 618
cod3b453
You can take a larger hash function with better compression properties and use a dword of its output e.g. MD5/SHA. As you're not hashing that many values (~2^10) the odds of a collision are pretty small but for safety you should add do a full comparison when a hash match occurs (if it collides you can use another dword from the hash or rehash to enter in the table).
17 Feb 2014, 20:32
JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria
JohnFound
My suggestion is as always FNV-1b. It is small and fast, with pretty good distribution (i.e. low collision rate):

Code:
```proc DataHash, .ptrData, .len
begin
push    ecx edx esi

mov     esi, [.ptrData]
mov     ecx, [.len]
mov     eax, \$811C9DC5                  ; 2166136261              ; FNV offset basis
inc     ecx

.hashloop:
dec     ecx
jz      .exit
movzx   edx, byte [esi]
xor     eax, edx
inc     esi
imul    eax, \$01000193                  ;   16777619              ; FNV prime
jmp     .hashloop

.exit:
pop     esi edx ecx
return
endp    ```

If it is good enough for FASM it should be good for you as well.
17 Feb 2014, 21:39
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
JohnFound wrote:

If it is good enough for FASM it should be good for you as well.

thanks for the information JohnFound,

i will try it, as you said, if it is good enough for FASM, it should be good for me, =)
thanks.!
18 Feb 2014, 10:34
revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 18182
Location: In your JS exploiting you and your system
revolution
sleepsleep wrote:
... if it is good enough for FASM, it should be good for me, =)
Be careful there because fasm has a specific purpose for the hash to generate a binary search tree (and it also allows for collisions). But in your case the needs might be different. It would pay for you to properly study the available hashes to see which best suits your requirements.
18 Feb 2014, 10:37
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
revolution wrote:

t would pay for you to properly study...

ah.hah,, probably beyond my ability., but i will try, there got to be something simple using add,minus, only.
after all, is numbers only imagination? idk,
18 Feb 2014, 11:11
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
ok, what i gonna said might be really stupid,
but here the idea,

2 factors, 1st = strlen, 2nd = ascii value totalled.
then i allocated page (decide by user, how many items per page) eg. 100
so,
since, 32 chars is max, so we got 32 pages.
if, let say,
window1 = 119 105 110 100 111 119 049 = 713
7 chars,
to identify it from name,
7 x 100, (initial address of page 7),
compare first dword if calculated ascii total, 713, search through list until 0,
if only 1, the next dword is what we intend to represent with this string,
if more than 1, then we do strcmp on identified 713.

i guess i could only use simple solution.
18 Feb 2014, 13:26
r22

Joined: 27 Dec 2004
Posts: 805
r22
CRC32 isn't a very difficult algorithm (to implement anyways). After reading the last sentence I feel compelled to prove it... for my own entertainment...

Code:
```.data
hash_me dd 12345678h, 87654321h, 1a2b3c4dh, 9f8e7d6ch

MAX_DW dd 0FFFFFFFFh
CRC32_CONST dd 0EDB88320h
CRC32_Table rd 256

.code
CALL CRC32_BuildTable

PUSH hash_me
CALL CRC32_Consume32

CRC32_BuildTable:
MOV edx, CRC32_Table
PUSH ebx

XOR ecx, ecx
.do256:

MOV eax, ecx

MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx
MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx
MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx
MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx
MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx
MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx
MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx
MOV ebx, eax
SHR ebx, 1
XOR ebx, dword[CRC32_CONST]
SHR eax, 1
CMOVC eax, ebx

MOV dword[edx+ecx*4], eax

ADD ecx, 1
CMP ecx, 256
JL .do256

POP ebx
RET 0

CRC32_Consume32:
PUSH esi
PUSH edi
MOV esi, dword[esp+12]
MOV edi, CRC32_Table
MOV eax, dword[MAX_DW]

XOR ecx, ecx
.do32:

MOVZX edx, byte[esi + ecx]
XOR dl, al
MOV edx, dword[edi + edx * 4]
SHR eax, 8
XOR eax, edx

ADD ecx, 1
CMP ecx, 32
JL .do32

XOR eax, dword[MAX_DW]

POP edi
POP esi
RET 4
```

There's probably more than a handful of CRC32 implementations already on the FASM board (that are faster, more correct, and have a higher probability of compiling), but the above fills that niche "I wrote it in the reply box after googling 'crc32 javascript' for a reference" implementation.
27 Feb 2014, 20:22
sleepsleep

Joined: 05 Oct 2006
Posts: 9830
Location: ˛　　　　　　　　　　　　　　　　　　　　　　　　　　　　　⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
awesome, r22
thanks,!
27 Feb 2014, 23:31
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First

 Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area
Goto page 1, 2  Next

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.