flat assembler
Message board for the users of flat assembler.

Index > Projects and Ideas > 32 chars hashed into one DWORD max value

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
idk what the term for this kinda process,
but what i intended to do is hashing the 32 chars into 5 digits number,

was thinking the following idea,

a replacement for variable name,

assume variable name only consists of A-Z, a-z, 0-9, _,
how to create a hash system that able to uniquely identify each possible name? without same hash for 2 different variable name?

the easy one inside my mind is count the ASCII value, add them, but this is for sure not fault proof,

any ideas that you wish to share?


Last edited by sleepsleep on 17 Feb 2014, 13:52; edited 2 times in total
Post 17 Feb 2014, 03:54
View user's profile Send private message Reply with quote
tthsqe



Joined: 20 May 2009
Posts: 730
tthsqe
sleepsleep, you should really read up on the math. What you are requesting is possible only if you have a three character alphabet:
Code:
 Log[10^5]/Log[32] = 3.321928094887362    

Since you are talking about a 63 character alphabet, I would first compress the 32 characters to 24 bytes. This can be done 1-1 since
Code:
Log[63^32]/Log[256] = 23.90911969399966    

If you want to go fewer than 24 bytes you are going to get collisions of which you will have to keep track as you hash values. You can easily hash the 24 bytes (3 dwords) into 8 bytes by xor'ing the 3 dwords...
Post 17 Feb 2014, 05:49
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17658
Location: In your JS exploiting you and your system
revolution
FYI: This is known as the pigeonhole principle.
Post 17 Feb 2014, 10:25
View user's profile Send private message Visit poster's website Reply with quote
BAiC



Joined: 22 Mar 2011
Posts: 272
Location: California
BAiC
Quote:

24 bytes (3 dwords)

FYI: 24 bytes = 6 dwords (or 3 qwords).
Post 17 Feb 2014, 12:18
View user's profile Send private message Visit poster's website Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
thanks for the term, revolution,
btw, tthsqe, i am really bad with maths, i only know simple add,subtract,divide,multiply.

i was thinking about calculate all chars into a number. if got clash after strcmp (even if same number, then only i mark it as special)

so, the hash would be,
1. calculate ascii value
2. check hash table if no clash
3. check special hash table if clash

eg.
window1 = 119 105 110 100 111 119 049 = 713
but, as we all know
windov2 = 119 105 110 100 111 118 050 = 713

i was thinking the hash system should be intelligent and flexible to decide based on input that provided to it and deduce a best solution. (not about hashing all possibilities, since it is near impossible to use all possibilities)

objective is prevent clash, and plan b and plan c if clash found.

since a DWORD max value is 4294967295
i was trying to fit, and prevent inside this max value.
Post 17 Feb 2014, 13:04
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17658
Location: In your JS exploiting you and your system
revolution
There is such a thing called a "perfect hash" which can do what you want. But it is slow and complex, I don't know if that is a problem for you. Anyhow, do some research into different types of hashes. I'm sure you can find one that will do what you want.
Post 17 Feb 2014, 13:07
View user's profile Send private message Visit poster's website Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
they are complex, and probably needs maths,,, =P

another idea is using 3D coordinate, as if these variables live in 3D space,
maybe need a function to identify unique between 2 clashes,

was thinking about extend by prepend at the back with a counter + 1 or more.
Post 17 Feb 2014, 13:18
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17658
Location: In your JS exploiting you and your system
revolution
Why are you trying to invent your own hash? There are already many to choose from and all the analysis has been done by experienced people.
Post 17 Feb 2014, 13:22
View user's profile Send private message Visit poster's website Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
idk, like i said, i don't have maths to use their superior system, and how to use a thing that i don't even understand?

btw, i just thought of a new one, maybe this one will work =)

119 105 110 100 111 119 049 , they will be added to below max DWORD, then second row

4 294 967 295
1 191 051 10 = 1 191 051 100

if let say, inputTextDate
ascii = 105110112117116084101120116068097116101
hash =
Code:
1 051 101 121
   171 160 841
   011 201 160
   680 971 161
   01
    

so, add them all, then that is answer, if first char start with 4, then 1 will be prepend infront.
horray, mission solve, lol =)

i am simple minded, and maths is too complex for me.

there is for sure possibilities for collision, but maybe, enough for my 1000 variables with no clash.

maybe safer if i just start with 0 to prevent possible overflow 4kkk.
Post 17 Feb 2014, 13:35
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17658
Location: In your JS exploiting you and your system
revolution
I expect even a basic CRC32 would give you better results. But there is still going to be that pesky little pigeon problem.

The old classic MD5 is still good value you know. Not secure, but then you never mentioned that as a requirement.
Post 17 Feb 2014, 13:47
View user's profile Send private message Visit poster's website Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
the requirement would be, hashed of those chars must fit inside DWORD,

let me change the thread title, sorry.

thanks for the CRC32 revolution, =)
there are bunch of values and require calculus i guess.
Post 17 Feb 2014, 13:51
View user's profile Send private message Reply with quote
edfed



Joined: 20 Feb 2006
Posts: 4242
Location: 2018
edfed
if you use your string as a seed for a p-random generator, you can theorically have a good hash by picking some arbitrary bytes somewhere in the random string generated by the prng.

now, you just have to find a prng that fits your requirements.
Post 17 Feb 2014, 14:48
View user's profile Send private message Visit poster's website Reply with quote
cod3b453



Joined: 25 Aug 2004
Posts: 619
cod3b453
You can take a larger hash function with better compression properties and use a dword of its output e.g. MD5/SHA. As you're not hashing that many values (~2^10) the odds of a collision are pretty small but for safety you should add do a full comparison when a hash match occurs (if it collides you can use another dword from the hash or rehash to enter in the table).
Post 17 Feb 2014, 20:32
View user's profile Send private message Reply with quote
JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
My suggestion is as always FNV-1b. It is small and fast, with pretty good distribution (i.e. low collision rate):

Code:
proc DataHash, .ptrData, .len
begin
        push    ecx edx esi

        mov     esi, [.ptrData]
        mov     ecx, [.len]
        mov     eax, $811C9DC5                  ; 2166136261              ; FNV offset basis
        inc     ecx

.hashloop:
        dec     ecx
        jz      .exit
        movzx   edx, byte [esi]
        xor     eax, edx
        inc     esi
        imul    eax, $01000193                  ;   16777619              ; FNV prime
        jmp     .hashloop

.exit:
        pop     esi edx ecx
        return
endp    


If it is good enough for FASM it should be good for you as well. Smile
Post 17 Feb 2014, 21:39
View user's profile Send private message Visit poster's website ICQ Number Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
JohnFound wrote:

If it is good enough for FASM it should be good for you as well.

thanks for the information JohnFound,

i will try it, as you said, if it is good enough for FASM, it should be good for me, =)
thanks.!
Post 18 Feb 2014, 10:34
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17658
Location: In your JS exploiting you and your system
revolution
sleepsleep wrote:
... if it is good enough for FASM, it should be good for me, =)
Be careful there because fasm has a specific purpose for the hash to generate a binary search tree (and it also allows for collisions). But in your case the needs might be different. It would pay for you to properly study the available hashes to see which best suits your requirements.
Post 18 Feb 2014, 10:37
View user's profile Send private message Visit poster's website Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
revolution wrote:

t would pay for you to properly study...

ah.hah,, probably beyond my ability., but i will try, there got to be something simple using add,minus, only.
after all, is numbers only imagination? idk,
Post 18 Feb 2014, 11:11
View user's profile Send private message Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
ok, what i gonna said might be really stupid,
but here the idea,

2 factors, 1st = strlen, 2nd = ascii value totalled.
then i allocated page (decide by user, how many items per page) eg. 100
so,
since, 32 chars is max, so we got 32 pages.
if, let say,
window1 = 119 105 110 100 111 119 049 = 713
7 chars,
to identify it from name,
7 x 100, (initial address of page 7),
compare first dword if calculated ascii total, 713, search through list until 0,
if only 1, the next dword is what we intend to represent with this string,
if more than 1, then we do strcmp on identified 713.

i guess i could only use simple solution.
Post 18 Feb 2014, 13:26
View user's profile Send private message Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22
CRC32 isn't a very difficult algorithm (to implement anyways). After reading the last sentence I feel compelled to prove it... for my own entertainment...

Code:
.data
hash_me dd 12345678h, 87654321h, 1a2b3c4dh, 9f8e7d6ch

MAX_DW dd 0FFFFFFFFh
CRC32_CONST dd 0EDB88320h
CRC32_Table rd 256

.code
CALL CRC32_BuildTable

PUSH hash_me
CALL CRC32_Consume32

CRC32_BuildTable:
MOV edx, CRC32_Table
PUSH ebx

XOR ecx, ecx
.do256:

MOV eax, ecx

MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx
MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx
MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx
MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx
MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx
MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx
MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx
MOV ebx, eax
 SHR ebx, 1
  XOR ebx, dword[CRC32_CONST]
   SHR eax, 1
    CMOVC eax, ebx

MOV dword[edx+ecx*4], eax

ADD ecx, 1
CMP ecx, 256
JL .do256

POP ebx
RET 0

CRC32_Consume32:
PUSH esi
PUSH edi
MOV esi, dword[esp+12]
MOV edi, CRC32_Table
MOV eax, dword[MAX_DW]

XOR ecx, ecx
.do32:

MOVZX edx, byte[esi + ecx]
XOR dl, al
MOV edx, dword[edi + edx * 4]
SHR eax, 8
XOR eax, edx

ADD ecx, 1
CMP ecx, 32
JL .do32

XOR eax, dword[MAX_DW]

POP edi
POP esi
RET 4
    


There's probably more than a handful of CRC32 implementations already on the FASM board (that are faster, more correct, and have a higher probability of compiling), but the above fills that niche "I wrote it in the reply box after googling 'crc32 javascript' for a reference" implementation.
Post 27 Feb 2014, 20:22
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
sleepsleep



Joined: 05 Oct 2006
Posts: 9128
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
awesome, r22
thanks,!
Post 27 Feb 2014, 23:31
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.