flat assembler
Message board for the users of flat assembler.

[contest] database, maybe ascii text flat file is better?

sleepsleep
Joined: 05 Oct 2006
Posts: 8864
sleepsleep wrote:

ok, i attached a zip of sample data, randomly generated by the vbscript i wrote several days ago. let us try 90,000 records first


1st column = id (unique id) (1 to 90,000)
2nd = barcode (14 digits)
3rd = date the item was inserted (year 2000 to 2011)
4th = product name
5th = product price (0.10 to 2000.00)
6th = product available stock quantity (1 to 500)
7th = product type (1 to 70)

i set some objectives, listed below; we need to know the "time" it takes to get each answer.

1. list of conflicting barcode numbers (those that exist twice, three times or more)
2. total sum by product type
3. quantity of products with stock below 200 units
4. sum of products and their quantity inserted, by year
5. products with price above 1,000 and quantity >= 250 units
6. search by product name
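
To give the objectives something concrete to time against, here is a minimal Python sketch of objectives 1, 3 and 5. The post does not say how the columns are separated, so the comma delimiter is an assumption, as are all function names; only the column order comes from the list above.

```python
import csv
import io
from collections import Counter
from decimal import Decimal


def load_records(text, delimiter=","):
    """Parse the flat file into tuples: (id, barcode, date, name, price, qty, type).

    The delimiter is an assumption; the thread does not specify one.
    """
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        rec_id, barcode, date, name, price, qty, ptype = row
        rows.append((int(rec_id), barcode, date, name,
                     Decimal(price), int(qty), int(ptype)))
    return rows


def duplicate_barcodes(rows):
    """Objective 1: barcodes that occur two or more times."""
    counts = Counter(r[1] for r in rows)
    return sorted(b for b, n in counts.items() if n > 1)


def low_stock_count(rows, limit=200):
    """Objective 3: number of products with stock below `limit` units."""
    return sum(1 for r in rows if r[5] < limit)


def pricey_and_stocked(rows, min_price=Decimal("1000"), min_qty=250):
    """Objective 5: ids of products priced above min_price with qty >= min_qty."""
    return [r[0] for r in rows if r[4] > min_price and r[5] >= min_qty]
```

Wrapping each call in `time.perf_counter()` readings would give the per-query time the contest asks for.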

i think maybe a sort of contest for fasmers to design a text file reader that is capable of answering in the shortest time?? i think this would be interesting

or if anyone wants to put this data into a database and see how much time the sql server takes to obtain those answers, then we can see how our reader's time compares against a professional database server.

i uploaded (around 3mb) the data file to mediafire.com
http://www.mediafire.com/?i63xez1ra4w0xqp


Rules
sleepsleep wrote:

well, notepad or any other ascii editor that is able to display ascii text the way notepad does.

hex editor is okay, as long as the data is displayed in a way that we can read/write/modify it like in notepad.


1. please put your engine into a DLL file and call it from a console exe file

2. you can use whatever API is available in Win32 or Win64, no third-party DLLs, only DLLs that ship with a default installation of Windows XP, Vista, 7.




-----------------------------------------------------------------------------------------

when an application database somehow gets corrupted, that is when i appreciate the ascii (maybe UTF-8) flat file approach.

i was thinking, what about a directory based approach to design a simple flat file database, with truecrypt to encrypt the files into a container if we want more security for our data.

and Findstr as our search engine to replace SQL.

sometimes i think that using something i totally don't understand is so dangerous; if data corruption happens, then i must rely on other tools/software to fix the corruption.

are we using overly complex things to solve a simple application data storage problem?


Last edited by sleepsleep on 25 Aug 2011, 23:01; edited 1 time in total
Post 16 Aug 2011, 22:09

AsmGuru62
Joined: 28 Jan 2004
Posts: 1408
Location: Toronto, Canada
We used flat text data files in our product - it lived up to 2004. :)
It was not slow - there was no noticeable difference against an SQL database.
However, with SQL you have more flexibility with data - for example
you can JOIN tables, and with our FLAT file we could not - it was limited.

We could only SELECT some records using simple criteria. We could
delete records by writing a byte at the beginning of a flat file record.
It was simple, but it was in production and working for customers for quite some time.

And a cool thing is that we could load these files into NOTEPAD!
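
The delete-by-marker trick mentioned above could look roughly like the Python sketch below. It assumes every record is one line whose first byte is reserved as a status byte, and that `*` marks a deleted record; the post names neither the byte nor the layout, so both are illustrative.

```python
DELETED = b"*"  # assumed marker; the post does not say which byte was used


def mark_deleted(path, record_offset):
    """Flag a record as deleted by overwriting its first byte in place.

    record_offset is the byte offset of the record's first byte in the file.
    The file length does not change, so the other records stay where they are.
    """
    with open(path, "r+b") as f:
        f.seek(record_offset)
        f.write(DELETED)


def live_records(path):
    """Yield every record (line) whose first byte is not the delete marker."""
    with open(path, "rb") as f:
        for line in f:
            if not line.startswith(DELETED):
                yield line.rstrip(b"\r\n")
```

Because only one byte is overwritten, nothing after the record has to move, and the file still opens cleanly in Notepad.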
Post 17 Aug 2011, 00:29

r22
Joined: 27 Dec 2004
Posts: 805
XML databases were all the rage 6 or 7 years ago.

I've always wanted to create a good file/directory structure based RDBMS using symbolic links to avoid data duplication, but never had the time to plan it out.
Post 17 Aug 2011, 12:11

revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17247
Location: In your JS exploiting you and your system
Which data handling/storage strategy you should be using is entirely dependent upon what type of data and how much data you have, and what you need to do with it. There is no one-size-fits-all data strategy.

A 100TB database of plain flat ASCII would be awesomely lethargic at searching.

A 256B database of uppercase translation bytes would be awesomely complex if stored in an RDBMS server with SQL querying.
Post 17 Aug 2011, 19:10

pabloreda
Joined: 24 Jan 2007
Posts: 99
Location: Argentina
I use CSV-like text files to manage the database, but with 3 separators: record, field and multivalue (like Pick databases).

I coded the filtering and sorting all in memory. I do not have a large database, but you can limit this by design.

For now it works very well... I do not need SQL.
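
A minimal Python sketch of the three-separator idea, for illustration only. The actual separator characters are not given in the post, so newline, `|` and `~` are assumptions, as are the function names.

```python
RECORD_SEP = "\n"  # assumed: one record per line keeps the file Notepad-friendly
FIELD_SEP = "|"    # assumed field separator
VALUE_SEP = "~"    # assumed multivalue separator (several values in one field)


def parse_table(text):
    """Return a list of records; each record is a list of fields;
    each field is a list of one or more values."""
    records = []
    for line in text.split(RECORD_SEP):
        if not line:
            continue  # ignore empty lines, e.g. after the final separator
        records.append([field.split(VALUE_SEP) for field in line.split(FIELD_SEP)])
    return records


def dump_table(records):
    """Inverse of parse_table: serialise the records back to text."""
    return RECORD_SEP.join(
        FIELD_SEP.join(VALUE_SEP.join(values) for values in fields)
        for fields in records
    ) + RECORD_SEP
```

With this layout a contact with two phone numbers stays on one line, for example `Ana|555-1234~555-9999|ana@example.com`, instead of needing a second table.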
Post 17 Aug 2011, 19:21

sleepsleep
thanks for the replies,
i coded a vbscript for data generation; only after that will i try to find tools for searching flat files.

Code:
ask me if you want, otherwise keep loading this code is wasting bytes. :D
    


output, do those random words make sense? idk..
Code:
SOCE WAVUD UOHEM KINU QUJEJ
DITUK CUSA IHA CUXEJ RU EFAG
JU VEZ ADA FIY VOV LO GOTOX
QOS HU FO NIKO ETAAS LUDUH JASUE GAHIT
KO JAM KUYA WEVA
HIKUG WE AT HOFER GU MUNUU IJ UZAQI BAPI
EIX DIO WAS LOLU PAPE ENEV OLOE
CE CEMU PE
VUXO MOACU TEY
AHUEX EPAU VI DEITO KUG
    


Last edited by sleepsleep on 24 Aug 2011, 13:09; edited 1 time in total
Post 23 Aug 2011, 22:34

sleepsleep
pabloreda wrote:
I use CSV-like text files to manage the database, but with 3 separators: record, field and multivalue (like Pick databases).

I coded the filtering and sorting all in memory. I do not have a large database, but you can limit this by design.

For now it works very well... I do not need SQL.


hi there pabloreda,
wanna know how many records your flat file db currently holds?
and which operations does your application usually perform?
add new record, delete or update?

you mentioned filtering and sorting, hmm, care to explain in a bit more detail how it works, and what "all in memory" means? i mean, how do you request the memory and do everything there?

waiting for your reply ;)
Post 23 Aug 2011, 22:38

sleepsleep
revolution wrote:

A 100TB database of plain flat ASCII would be awesomely lethargic at searching.


agree :)

because i think my customers seldom search for, let's say, point of sales transactions performed "last year" or "last month"; they just want a total for every month.

well, i am still thinking about how to solve the stock movement query, to know the exact stock units on hand in "sort of real time" through this flat file db.
Post 23 Aug 2011, 22:44

sleepsleep
AsmGuru62 wrote:
We used flat text data files in our product - it lived up to 2004. :)


cool!!

AsmGuru62 wrote:
And a cool thing is that we could load these files into NOTEPAD!

i was thinking about using the .NFO extension >:)
Post 23 Aug 2011, 22:46

sleepsleep
btw, anyone got ideas on how to deal with a UNIQUE field?
hmm, a flat file with 100,000 records, but i need a UNIQUE field for, let's say, product barcode...

or products that cost less than 4.50

or a manufactured date around 8th August 2010?

:'( :'(
Post 23 Aug 2011, 22:53

revolution
sleepsleep wrote:
btw, anyone got ideas on how to deal with a UNIQUE field?
hmm, a flat file with 100,000 records, but i need a UNIQUE field for, let's say, product barcode...

or products that cost less than 4.50

or a manufactured date around 8th August 2010?

:'( :'(
Seems to me that you ARE searching the DB. Since you have 100k records then perhaps SQL will suit your needs.
Post 24 Aug 2011, 00:01

AsmGuru62
100K records are OK for a flat file. You just need an additional file beside the flat one, which will hold the record indexes of a SORTED unique field. SORTED, because finding the record number in a sorted vector is VERY fast using binary search.

Here is how to do it:

1. Define a structure with two fields:
- record number in flat file: 1,2,3, etc.
- UNIQUE field value (barcode or whatever)

2. Scan all records from flat file - recording these two values and adding structures into vector.

3. When vector is done - SORT it by the barcode value! The sorting will also sort the record numbers.

4. Write the whole vector to a file beside the flat file.
This is your index file.

5. To use it - say, you need to find a specific barcode - load the whole index file into memory and use binary search to find a structure with proper barcode. The first element of that structure will tell you the record number where you can find that barcode in flat file. Seek to that record and read it. It is proven to be a very fast method to search without SQL.
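
Steps 1 to 5 could be sketched in Python roughly as follows. This is an illustration under assumptions, not the original product's code: the comma delimiter, the text format of the index file and all function names are made up for the example.

```python
import bisect


def build_index(path, key_column=1, delimiter=","):
    """Steps 1-3: collect (key, record_number) pairs and sort them by key.

    Record numbers start at 1. The delimiter is an assumption.
    """
    entries = []
    with open(path, "r", encoding="ascii", newline="") as f:
        for record_number, line in enumerate(f, start=1):
            fields = line.rstrip("\r\n").split(delimiter)
            entries.append((fields[key_column], record_number))
    entries.sort()  # sorts by key, carrying the record numbers along
    return entries


def save_index(entries, index_path):
    """Step 4: write the sorted vector to a file beside the flat file."""
    with open(index_path, "w", encoding="ascii") as f:
        for key, record_number in entries:
            f.write(f"{key},{record_number}\n")


def load_index(index_path):
    """Step 5a: load the whole index file into memory."""
    entries = []
    with open(index_path, "r", encoding="ascii") as f:
        for line in f:
            key, record_number = line.rstrip("\n").split(",")
            entries.append((key, int(record_number)))
    return entries


def find_record_numbers(entries, key):
    """Step 5b: binary search; returns every record number holding `key`."""
    pos = bisect.bisect_left(entries, (key, 0))  # 0 sorts before any record number
    hits = []
    while pos < len(entries) and entries[pos][0] == key:
        hits.append(entries[pos][1])
        pos += 1
    return hits
```

A key that returns more than one record number is a duplicate, which also answers the UNIQUE question. Keys are compared as strings here, which matches numeric order only for fixed-width values such as 14-digit barcodes. Seeking straight to a record number only works when records have a fixed length; with variable-length lines, storing each record's byte offset in the index instead is a common variation.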

You can create these index files for any field - even dates, so you can find the range of dates also using binary search.

Here is a good way to store dates: pack it all into 4 bytes:
- day: 1-31 = 1 byte of storage
- month: 1-12 = 1 byte of storage
- year: 1-65535 = 2 bytes of storage.

Even better:
YEAR*10000 + MONTH*100 + DAY.

Today: 20110823
Tomorrow: 20110824

It fits into 4 bytes also.

To compare it simply use a single comparison - instead of doing first by year then month then day.
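
The YEAR*10000 + MONTH*100 + DAY packing can be checked with a few lines of Python. The function names are illustrative, and the little-endian unsigned 32-bit layout in `to_bytes` is an assumption.

```python
import struct


def pack_date(year, month, day):
    """Pack a date into one integer: YEAR*10000 + MONTH*100 + DAY."""
    return year * 10000 + month * 100 + day


def unpack_date(value):
    """Inverse of pack_date: returns (year, month, day)."""
    year, rest = divmod(value, 10000)
    month, day = divmod(rest, 100)
    return year, month, day


def to_bytes(value):
    """Store the packed date in 4 bytes (little-endian unsigned 32-bit, assumed)."""
    return struct.pack("<I", value)


def in_range(value, first, last):
    """Date range test using plain integer comparisons."""
    return first <= value <= last
```

Because the year occupies the most significant digits, integer order equals calendar order, which is why one comparison is enough. Even year 65535 gives 655351231, which is below 2^32.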
Post 24 Aug 2011, 00:37

pabloreda
Hi sleepsleep

Right now I have nearly 2000 records (approx. 200 KB). The program separates them by year, but the limit is the RAM, I guess...

the complete code is in http://code.google.com/p/reda4/source/browse/trunk/r4/Lib/db2.txt

I use the db for:
phone numbers.. agenda example (http://code.google.com/p/reda4/source/browse/trunk/r4/Apps/agenda.txt)
and, NOT on the site (some people pay for these!!):
* fishing contest system
* registering IN/OUT mail (not e-mail) and other papers
* cemetery files (really.. I have a program for this!! with a map for finding the plots)
* budgets and billing (I am not sure if I translated that well..)
* stock and accounts (simple bookkeeping)
* printing forms for car registration

All these programs use this db. Really simple: when it fails... open Notepad!!

How does it work?

I load the entire file and make a one-pass index of the records, and I reserve the same amount of memory for a second index.
Whenever something is modified (insert, delete, update), I rewrite the file (I never keep another version of the table in memory).

ALL access goes through the second index. This index is used for filtering and sorting, or is a copy of the first index when there is no filter or sort.
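
The two-index scheme could be sketched in Python as below. This is not a translation of db2.txt; the newline record separator and every name are assumptions made for illustration.

```python
def build_primary_index(data):
    """One pass over the loaded file: (start, end) byte span of every record.

    Records are assumed to be separated by newlines.
    """
    index = []
    start = 0
    while start < len(data):
        end = data.find(b"\n", start)
        if end == -1:
            end = len(data)  # last record without a trailing newline
        if end > start:
            index.append((start, end))
        start = end + 1
    return index


def record(data, span):
    """Return the bytes of one record given its span."""
    start, end = span
    return data[start:end].rstrip(b"\r")


def build_view(data, primary, keep=None, sort_key=None):
    """Second index: a filtered and/or sorted copy of the primary index.

    With no filter and no sort key it is simply a copy of the primary index.
    """
    view = [span for span in primary if keep is None or keep(record(data, span))]
    if sort_key is not None:
        view.sort(key=lambda span: sort_key(record(data, span)))
    return view
```

The record bytes are never moved; filtering and sorting only rearrange small (start, end) pairs, so the second index never needs more room than the first.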

If you need explanations or a translation to asm, tell me!!
Post 24 Aug 2011, 02:03

sleepsleep
revolution wrote:

then perhaps SQL will suit your needs.

what scares the hell out of me is when it gets corrupted: the stress of finding tools to fix it, and even if i want to fix it manually, i have no idea how to do it.
Post 24 Aug 2011, 04:05

sleepsleep

the reda4 site looks COOL! the main front page has the crayon style and maybe some "animation"; is that some kind of standalone OS?

pabloreda wrote:

if you need explanations or traslation to asm, tell me!!

yeah, let me say thanks to you first. :)
Post 24 Aug 2011, 04:17

sleepsleep
AsmGuru62,
thanks for the info,
i am thinking of having it run like an http text file database server :)

reserve a portion of memory and load everything into it; every update / modify / delete is written to another text file, and the changes are only written back to the actual text file on a scheduled time-out or upon a query.
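
One possible reading of that plan, as a minimal Python sketch: the table lives in memory, every change is appended to a separate log file, and a checkpoint merges the changes back into the main file. The tab-separated `key<TAB>value` layout, the PUT/DEL log format and the class name are all assumptions; values must not contain newlines in this simple form.

```python
import os


class LoggedTable:
    """In-memory table backed by a data file plus an append-only change log."""

    def __init__(self, data_path, log_path):
        self.data_path = data_path
        self.log_path = log_path
        self.rows = {}
        self._load()

    def _load(self):
        # Load the main file first, then replay changes not yet merged into it.
        if os.path.exists(self.data_path):
            with open(self.data_path, encoding="utf-8") as f:
                for line in f:
                    key, value = line.rstrip("\n").split("\t", 1)
                    self.rows[key] = value
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    op, key, value = line.rstrip("\n").split("\t", 2)
                    if op == "PUT":
                        self.rows[key] = value
                    else:  # "DEL"
                        self.rows.pop(key, None)

    def _log(self, op, key, value=""):
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(f"{op}\t{key}\t{value}\n")

    def put(self, key, value):
        """Insert or update: log the change first, then apply it in memory."""
        self._log("PUT", key, value)
        self.rows[key] = value

    def delete(self, key):
        self._log("DEL", key)
        self.rows.pop(key, None)

    def checkpoint(self):
        """Write the full table to the main file, then empty the log.

        Intended to be called on a timer or on demand.
        """
        tmp = self.data_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for key, value in self.rows.items():
                f.write(f"{key}\t{value}\n")
        os.replace(tmp, self.data_path)  # swap in the new file in one step
        open(self.log_path, "w").close()
```

Both files stay plain text, so after a crash the log can still be inspected or repaired in Notepad before it is replayed.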
Post 24 Aug 2011, 05:00

f0dder
Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
sleepsleep wrote:
revolution wrote:

then perhaps SQL will suit your needs.

what scares the hell out of me is when it gets corrupted: the stress of finding tools to fix it, and even if i want to fix it manually, i have no idea how to do it.
A proper database system is a lot less likely to become corrupt than something you'll be handcoding in a few weeks. Ever heard of ACID? And you do plan on doing backups anyway, right?

AsmGuru62 wrote:
Here is a good way to store dates: pack it all into 4 bytes:
- day: 1-31 = 1 byte of storage
- month: 1-12 = 1 byte of storage
- year: 1-65535 = 2 bytes of storage.

Even better:
YEAR*10000 + MONTH*100 + DAY.
Why yet another custom format instead of sticking to an industry-standard timestamp?

Post 24 Aug 2011, 11:20

AsmGuru62
The question is: why does the format matter if the customer is happy?
He did not care about the format and it was done in 1992 or so.
The main point of code is to run businesses. It can be full of repetitions
of code or "bad" formats, but if it runs - everyone is happy!
Post 24 Aug 2011, 13:02

AsmGuru62
sleepsleep: SQL has something called a TRANSACTION and if used properly - database will never get corrupted.
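
For readers who have not used transactions, here is a small Python sketch using the standard `sqlite3` module. SQLite is only a stand-in (the thread does not name a database), and the `product` table and function names are made up for the example. A transaction makes a group of changes apply all together or not at all; it does not replace backups.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the example table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product "
        "(id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)"
    )
    return conn


def transfer_stock(conn, from_id, to_id, qty):
    """Move stock between two products: both updates apply, or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE product SET qty = qty - ? WHERE id = ?", (qty, from_id))
        conn.execute("UPDATE product SET qty = qty + ? WHERE id = ?", (qty, to_id))
        (left,) = conn.execute(
            "SELECT qty FROM product WHERE id = ?", (from_id,)
        ).fetchone()
        if left < 0:
            raise ValueError("not enough stock")  # triggers the rollback
```

If the check fails after both UPDATE statements have run, the rollback undoes them, so the table never shows a half-finished transfer.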

Also, rewriting the whole flat file is a bad idea - performance suffers and the index files need to be rebuilt - very bad!
Post 24 Aug 2011, 13:06

edfed
Joined: 20 Feb 2006
Posts: 4237
Location: 2018
i like the 20110824h date format a lot.

looks like a very good solution for dates up to 99991231h

the BCD style for the date is very good and is a real industry standard.
if you add a time of day like 23:59:59:99, you just need another dword.
then, with a single qword compare, the test is fast.

or just use something like a millisecond count since jesus christ or ramses2, in a qword..
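
As a rough illustration of the BCD idea in this post, the Python sketch below packs the date into the high dword and the time of day into the low dword of one 64-bit value. The field order and the helper names are assumptions.

```python
def to_bcd(value, digits):
    """Encode a decimal number as packed BCD: one decimal digit per hex nibble."""
    result = 0
    for shift in range(digits):
        value, digit = divmod(value, 10)
        result |= digit << (4 * shift)
    return result


def bcd_timestamp(year, month, day, hour=0, minute=0, second=0, hundredths=0):
    """Build a 64-bit value: BCD date in the high dword, BCD time in the low dword."""
    date = to_bcd(year, 4) << 16 | to_bcd(month, 2) << 8 | to_bcd(day, 2)
    time = (to_bcd(hour, 2) << 24 | to_bcd(minute, 2) << 16
            | to_bcd(second, 2) << 8 | to_bcd(hundredths, 2))
    return date << 32 | time
```

Printed in hex, the value reads like the date and time themselves, and because the most significant fields come first, a single integer (qword) comparison orders timestamps chronologically.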
Post 24 Aug 2011, 13:11

Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.