flat assembler
Message board for the users of flat assembler.

[contest] database, maybe ascii text flat file is better?

sleepsleep



Joined: 05 Oct 2006
Posts: 8864
Location: ˛                             ⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣⁣Posts: 334455
sleepsleep
f0dder wrote:

A proper database system is a lot less likely to become corrupt than something you'll be handcoding in a few weeks

what you said is true, but the problem is usually not the database system; it could be the OS, a human, the hardware, or the network.

and BACKUP probably only works if the company has an IT guy, and not every company here has one; they don't even think they need one.

and they don't back up, and they don't check whether their backup files are free of corrupted records.

basically, they don't care; they only care when it stops working.

and when it stops working, they call me and waste my time. yeah, they pay me, but i don't want to waste my time solving these kinds of problems.

and the software they bought from local software companies is worse than shit. yeah, they use a farking database (MS SQL, MySQL, FoxPro Dbase, Oracle Express); maybe they just don't know how to integrate their application with the database.

i gave up on databases because i have faced this problem several times already with different clients, and i am not going to support software that was not written by me; that just wastes more and more time.

i need something simple, with no extra tools needed to solve a problem if one arises.
Post 24 Aug 2011, 13:20
sleepsleep
edfed wrote:

i like a lot the 20110824h date format.

me too. if you add the time, it is still easy to sort and can be compared like a number:
201108241655
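
A minimal sketch of why that works, in Python rather than fasm. The helper name `stamp` is made up for illustration. Because the fields run from most significant (year) to least significant (minute) and are zero padded, sorting the numbers gives the same order as sorting the moments themselves.

```python
from datetime import datetime

def stamp(moment: datetime) -> int:
    """Format a datetime as a YYYYMMDDHHMM integer, e.g. 201108241655."""
    return int(moment.strftime("%Y%m%d%H%M"))

moments = [
    datetime(2011, 8, 24, 16, 55),
    datetime(2006, 10, 5, 9, 0),
    datetime(2011, 8, 24, 13, 20),
]

# Sort the plain integers, then compare with a real chronological sort.
by_number = sorted(stamp(m) for m in moments)
by_time = [stamp(m) for m in sorted(moments)]
print(by_number)
print(by_number == by_time)
```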

Quote:

Also, rewriting the whole flat file is a bad idea - performance suffers and index files need to be rebuilt - very bad!

i am thinking about multiple flat files, with scheduled rewrites of particular lines in the text file.
Post 24 Aug 2011, 13:24
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
AsmGuru62 wrote:
Question is: why format matters if customer is happy?
He did not care about the format and it was done in 1992 or so.
The main point of code is to run businesses. It can be full of repetitions
of code or "bad" formats, but if it runs - everyone is happy!
Let me start with another question: why waste time inventing yet another format when there are existing formats that work, and the new format doesn't bring any benefits?

"if it runs - everyone is happy" - except for the maintenance programmer X years from now who has to deal with weirdness like this. But of course, if your strategy is to make yourself irreplaceable and you're charging by the hour, stuff like this might be a good strategy until someone figures you out :)

edfed wrote:
the BCD style for the date is very good and is a real industry standard.
What makes it "very good" in the context of timestamps? Is it "a real industry standard" in the context of timestamps?

sleepsleep wrote:
f0dder wrote:
A proper database system is a lot less likely to become corrupt than something you'll be handcoding in a few weeks

what you said is true, but the problem is usually not the database system; it could be the OS, a human, the hardware, or the network.
If you've got failure at that level, a flat file format is at least as likely to get corrupted as a proper database file. At least a proper database system has transactions (including a transaction log), making it a lot more resilient to damage.
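
To illustrate what transactions buy you, here is a small sketch using SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in chosen because it runs without a server; the `stock` table and the simulated crash are made up for the example. The two updates either both happen or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [(1, 81), (2, 449)])
conn.commit()

try:
    # The connection as a context manager commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE stock SET qty = qty - 50 WHERE id = 1")
        conn.execute("UPDATE stock SET qty = qty + 50 WHERE id = 2")
        raise RuntimeError("simulated failure in the middle of the update")
except RuntimeError:
    pass

rows = conn.execute("SELECT id, qty FROM stock ORDER BY id").fetchall()
print(rows)  # both rows still hold their original quantities
```

A hand-written flat file reader would have to implement this all-or-nothing behaviour itself.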

sleepsleep wrote:
i gave up on databases because i have faced this problem several times already with different clients, and i am not going to support software that was not written by me; that just wastes more and more time.
So because you've been dealing with difficult clients (and you probably don't know database systems very well), you've decided to make the situation worse for your clients by wasting their $$$ designing a less resilient solution? :roll:

Post 24 Aug 2011, 13:37
edfed



Joined: 20 Feb 2006
Posts: 4237
Location: 2018
edfed
f0dder wrote:
What makes it "very good" in the context of timestamps? Is it "a real industry standard" in the context of timestamps?


very good because it just needs a hexadecimal conversion, so it is faster to display.
it is up to the client coder to do the conversion with a convenient algorithm.

BCD is an industry standard, that's all. if you feed this kind of format to any BCD to 7-segment driver on a parallel port, it will work, and it will need fewer components on the board.

in the context of assembly/hardware programming, it is a real industry standard.
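
For readers who have not met packed BCD: each decimal digit is stored in one 4-bit nibble, so the date 2011-08-24 becomes the value 20110824h and a hex dump shows it as readable decimal digits. A minimal sketch in Python (the function names are made up, and only non-negative numbers are handled):

```python
def to_packed_bcd(number: int) -> int:
    """Store each decimal digit in its own nibble: 20110824 -> 0x20110824."""
    return int(str(number), 16)

def from_packed_bcd(value: int) -> int:
    """Read the nibbles back as decimal digits.

    Raises ValueError if a nibble holds A..F, which is not valid BCD.
    """
    return int(format(value, "x"))

packed = to_packed_bcd(20110824)
print(hex(packed))              # 0x20110824
print(from_packed_bcd(packed))  # 20110824
```

Displaying the value is just a hex print, but arithmetic such as "add 30 days" needs a conversion back to binary first.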
Post 24 Aug 2011, 13:46
f0dder
edfed wrote:
very good because it just needs a hexadecimal conversion, so it is faster to display.
That's a joke, right? Since when has "conversion time for the sake of displaying" been important? When dealing with data, how often do you need to display it, compared to how often you need to perform calculations on it?

edfed wrote:
it is up to the client coder to do the conversion with a convenient algorithm.
Trading performance for programmer convenience, eh?

edfed wrote:
BCD is an industry standard, that's all. if you feed this kind of format to any BCD to 7-segment driver on a parallel port, it will work, and it will need fewer components on the board.
And how is that relevant in the current context?

edfed wrote:
in the context of assembly/hardware programming, it is a real industry standard.
The current context isn't exactly hardware programming. Try again.
Post 24 Aug 2011, 14:02
sleepsleep
f0dder wrote:


sleepsleep wrote:
f0dder wrote:
A proper database system is a lot less likely to become corrupt than something you'll be handcoding in a few weeks

what you said is true, but the problem is usually not the database system; it could be the OS, a human, the hardware, or the network.
If you've got failure at that level, a flat file format is at least as likely to get corrupted as a proper database file. At least a proper database system has transactions (including a transaction log), making it a lot more resilient to damage.

the flat file might get corrupted, but at least with a txt file and a transaction log, i can recover it for them, in a state where i know what i am doing.
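
One way such a recovery could work, sketched in Python: keep the last good copy of the data plus an append-only text log, and rebuild the current state by replaying the log. The log layout here (`PUT|id|rest of record` and `DEL|id`) is purely an assumption for the example, not something specified in this thread.

```python
def replay(snapshot, log_lines):
    """Rebuild the records from a snapshot (dict of id -> record text) plus a text log.

    Replay stops at the first entry that does not parse, so a half-written
    or damaged tail of the log is ignored instead of being applied.
    """
    records = dict(snapshot)
    for line in log_lines:
        parts = line.rstrip("\r\n").split("|", 2)
        if parts[0] == "PUT" and len(parts) == 3:
            records[parts[1]] = parts[2]
        elif parts[0] == "DEL" and len(parts) == 2:
            records.pop(parts[1], None)
        else:
            break
    return records

snapshot = {"16": "ITEM A|341.00|81", "17": "ITEM B|578.75|449"}
log = [
    "PUT|18|ITEM C|1875.25|129",
    "DEL|17",
    "PUT|16|ITEM A|341.00|80",
    "@#*$ damaged entry",
    "PUT|19|ITEM D|11.40|263",
]
print(replay(snapshot, log))
```

Entries after the damaged one are deliberately not applied, since their meaning may depend on the entry that was lost.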

f0dder wrote:

sleepsleep wrote:
i gave up on databases because i have faced this problem several times already with different clients, and i am not going to support software that was not written by me; that just wastes more and more time.
So because you've been dealing with difficult clients (and you probably don't know database systems very well), you've decided to make the situation worse for your clients by wasting their $$$ designing a less resilient solution? :roll:

it never occurred to me to present a far worse solution than whatever they are using right now. if i am going to sell/present it to them, it means that, in my mind, it is a better solution compared to the existing one.

f0dder wrote:

(and you probably don't know database systems very well)

probably.
Post 24 Aug 2011, 15:15
pabloreda



Joined: 24 Jan 2007
Posts: 99
Location: Argentina
pabloreda
@sleepsleep

my language runs only on Windows for now, but the interface is very simple (like all Forths). I use the r4asm.asm I posted in the other thread to compile.

If you would like to test the flat file approach, send me a huge CSV (even better if you use the separators I use!!) and I can build an insert, update, delete, sort, filter and paginator application... you can test the speed and tell us the result!!
Post 24 Aug 2011, 21:58
sleepsleep
Quote:

If you would like to test the flat file approach, send me a huge CSV (even better if you use the separators I use!!) and I can build an insert, update, delete, sort, filter and paginator application... you can test the speed and tell us the result!!

thanks a lot :)
ok, is it ok if i use any separator? and should text come wrapped in "apostrophes"?

i am interested in the speed of SELECT and SORT. is sorting by date supported in your engine? and SUM?

if it could reach 60% of the current sql server speed, that would be more than enough :)
Post 25 Aug 2011, 15:58
pabloreda
the format I use is

1|name1|dir1~
2|name2|dir2~
etc..

_ is a multivalue separator:
..|field_with_many_values|..

the fields can contain any character except | ~ _
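
A rough reader for the layout described above, sketched in Python rather than Forth (the function name `parse_db` is made up): `~` ends a record, `|` separates fields, and `_` separates the values inside a field.

```python
def parse_db(text):
    """Split text into records, each record into fields, each field into values."""
    records = []
    for raw in text.split("~"):
        raw = raw.strip()  # drop the line break that follows each "~"
        if not raw:
            continue
        records.append([field.split("_") for field in raw.split("|")])
    return records

sample = "1|name1|dir1~\n2|name2|red_green_blue~\n"
for record in parse_db(sample):
    print(record)
```

Every field comes back as a list, so a single-valued field is simply a list with one item.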

sort by date is not implemented now but is easy to code.

I have a function (a word in Forth) called DBMAP; it executes a word for every record... sum can be built with this.
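
The idea behind DBMAP, sketched in Python since the Forth source is not shown here. `dbmap` and `sum_field` are illustrative names, not the actual r4 words: one function walks the records and calls a callback for each, and SUM is just a callback that accumulates.

```python
def dbmap(records, word):
    """Call word(record) once for every record."""
    for record in records:
        word(record)

def sum_field(records, index):
    """Build SUM on top of dbmap by accumulating one field."""
    total = 0.0

    def add(record):
        nonlocal total
        total += float(record[index])

    dbmap(records, add)
    return total

records = [["1", "name1", "10.5"], ["2", "name2", "4.5"]]
print(sum_field(records, 2))
```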

I don't know how it performs compared to sql server
Post 25 Aug 2011, 17:50
sleepsleep
ok, i attached sample zip data, randomly generated by a vbscript i wrote several days ago. let us try 90,000 records first.


1st column = id (unique id) (1 to 90,000)
2nd = barcode (14 digits)
3rd = date the item was inserted (year 2000 to 2011)
4th = product name
5th = product price (0.10 to 2000.00)
6th = product available stock quantity (1 to 500)
7th = product type (1 to 70)
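
As a reference for anyone writing a reader, here is a sketch of that record layout as a typed structure in Python. It assumes the `|` separator and `YYYYMMDD` dates seen in the sample lines posted later in this thread; the class and field names are made up.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal

@dataclass
class Product:
    id: int
    barcode: str        # kept as text so the 14 digits are preserved exactly
    inserted: date
    name: str
    price: Decimal      # Decimal avoids binary floating point rounding on money
    stock: int
    type: int

def parse_line(line: str) -> Product:
    fields = line.rstrip("\r\n").split("|")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}")
    return Product(
        id=int(fields[0]),
        barcode=fields[1],
        inserted=datetime.strptime(fields[2], "%Y%m%d").date(),
        name=fields[3],
        price=Decimal(fields[4]),
        stock=int(fields[5]),
        type=int(fields[6]),
    )

print(parse_line("16|13441488146781|20060609|ZOJUO XEOXAMERI AFENAJIMA|341.00|81|40"))
```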

i set some objectives below; we need to know the "time" taken to get each answer.

1. list of conflicting barcode numbers (those that exist two, three or more times)
2. total sum by product type
3. number of products with stock below 200 units
4. sum of products and their quantity inserted, by year
5. products priced above 1,000 with quantity >= 250 units
6. search by product name
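
To pin down what each objective asks for, here is an unoptimised reference sketch in Python; a contest entry would of course be written in fasm and timed. Two things are assumptions: objective 2 is read as the sum of price times quantity per type, and the row with id 99 is made up so that the duplicate barcode query has something to find. The other rows come from the sample posted later in this thread.

```python
from collections import Counter, defaultdict
from decimal import Decimal

DATA = """\
16|13441488146781|20060609|ZOJUO XEOXAMERI AFENAJIMA|341.00|81|40
17|19041232466697|20040906|FUIVE NEO JUYAM|578.75|449|37
18|12733403444290|20080611|ME LE XUQI YIVEMI|1875.25|129|33
21|13884602785110|20030617|AIXIOGU DIKAATO IHEFAFOO EUWADIOA FOKOX AI PUC KI|1549.25|344|65
24|16769604086875|20070914|QULANUGE HUA ZOHOXOYIX|1589.75|392|16
99|13441488146781|20060101|MADE UP DUPLICATE|1.00|1|40
"""

rows = []
for line in DATA.splitlines():
    rid, barcode, ymd, name, price, qty, ptype = line.split("|")
    rows.append((int(rid), barcode, ymd, name, Decimal(price), int(qty), int(ptype)))

# 1. barcodes that occur more than once
seen = Counter(r[1] for r in rows)
dup_barcodes = sorted(code for code, n in seen.items() if n > 1)

# 2. total by product type (assumed to mean price * quantity)
value_by_type = defaultdict(Decimal)
for r in rows:
    value_by_type[r[6]] += r[4] * r[5]

# 3. products with stock below 200 units
low_stock = [r[0] for r in rows if r[5] < 200]

# 4. number of products and total quantity per insert year
by_year = defaultdict(lambda: [0, 0])
for r in rows:
    by_year[r[2][:4]][0] += 1
    by_year[r[2][:4]][1] += r[5]

# 5. price above 1,000 and quantity >= 250
pricey = [r[0] for r in rows if r[4] > 1000 and r[5] >= 250]

# 6. search by product name (case-insensitive substring match)
def search(term):
    return [r[0] for r in rows if term.upper() in r[3].upper()]

print(dup_barcodes, low_stock, pricey, search("xuqi"))
```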

i think this could be a sort of contest for fasmers: design a text file reader that can answer in the shortest time. i think this would be interesting.

or if anyone wants to put this data into a database and see how long sql server takes to obtain those answers, then we can see how our readers' times compare against a professional database server.

i uploaded (around 3mb) the data file to mediafire.com
http://www.mediafire.com/?i63xez1ra4w0xqp
Post 25 Aug 2011, 18:44
sleepsleep
after we achieve this, we will try corrupting the data, and we want the reader to identify which lines were corrupted.
Post 25 Aug 2011, 18:51
AsmGuru62



Joined: 28 Jan 2004
Posts: 1408
Location: Toronto, Canada
AsmGuru62
And how do you detect a corrupted record?

P.S. The file you posted is not really a FLAT FILE.
A flat file has all records of the same size, so it is easy to seek to a record.
Can I reformat it to be a 'real' flat file?
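
What fixed-size records buy you, as a small Python sketch: record N starts at byte N times the record size, so it can be read without scanning the lines before it. The 32-byte layout below (6-digit id, `|`, name padded to 23 characters, CRLF) is invented for the example, and an in-memory `BytesIO` stands in for the file on disk.

```python
import io

NAME_WIDTH = 23
RECORD_SIZE = 6 + 1 + NAME_WIDTH + 2  # id, "|", padded name, "\r\n" = 32 bytes

def pack(record_id: int, name: str) -> bytes:
    """Build one fixed-size record; long names are cut, short ones space padded."""
    line = f"{record_id:06d}|{name:<{NAME_WIDTH}.{NAME_WIDTH}}\r\n"
    return line.encode("ascii")

def read_record(f, index: int) -> str:
    """Jump straight to record number index (0-based) and return its text."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()

db = io.BytesIO(b"".join(pack(i, f"item{i}") for i in range(1, 6)))
print(read_record(db, 2))
```

The padding keeps the file readable in a text editor, at the cost of wasted space and a hard limit on field length.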
Post 25 Aug 2011, 20:50
sleepsleep
AsmGuru62 wrote:

P.S. The file you posted is not really a FLAT FILE.
Flat file has all records of same size so, it is easy to seek the record.
Can I reformat it to be a 'real' flat file?

no problem, you can reformat it to a 'real' flat file,
as long as the data file can still be edited in notepad.

btw, for those who want to join this contest, it is ok to use either a not so 'real' flat file or a 'real' flat file; we will see the results later. just make sure your reader is the fastest of all :lol:


AsmGuru62 wrote:

And how do you detect a corrupted record?

yeah, what we would do is randomly delete 1/3 or 1/4 of any line from 1 to 90000, then the reader "function" should be able to tell us which line(s) probably have errors.

since we are dealing with only 1 data file, and we don't deal with "transaction"-based data input validation, we assume those who need such a function should use an SQL or noSQL database server out there.
Post 25 Aug 2011, 21:27
AsmGuru62
I am not following you - what is "delete from 1/3 of any line"? You mean cut the line off, or remove IDs, so there will be missing IDs? Not clear...
Post 25 Aug 2011, 21:40
sleepsleep
AsmGuru62 wrote:
I am not following you - what is "delete from 1/3 of any line"? You mean cut the line off, or remove IDs, so it will be a missing IDs? Not clear...


sorry for not making myself clear,

ok, example
Code:
16|13441488146781|20060609|ZOJUO XEOXAMERI AFENAJIMA|341.00|81|40
17|19041232466697|20040906|FUIVE NEO JUYAM|578.75|449|37
18|12733403444290|20080611|ME LE XUQI YIVEMI|1875.25|129|33
19|15303876996040|20071022|BE LUG ZUOIBEL KII UNEEM SOKAWE KUG ZUIX ZU|11.40|263|17
20|14553640484809|20010905|YADULUWO QAYIDA|680.75|49|61
21|13884602785110|20030617|AIXIOGU DIKAATO IHEFAFOO EUWADIOA FOKOX AI PUC KI|1549.25|344|65
22|12758241891860|20030808|WEGEBIJEW ZUPUCUNU VO TITIBO YA NEOK KETUSEJOS ODEAP|1770.10|257|68
23|15393635034561|20060607|JUBUYE XONURE CIANOBOB XAQUXAZ LIKUL NAPIWIXIX IESEXIH|981.45|210|45
24|16769604086875|20070914|QULANUGE HUA ZOHOXOYIX|1589.75|392|16
    


the corrupted version
Code:
16|13441488146781|2006 #&@*$^@#^$&&^&!()!@)8(#@IMA|341.00|81|40
@#*$*(8&$#@$*@#&&jhfkja7324832uhnjdfngk5893io324
290|20080611|ME LE XUQI YIVEMI|1875.25|129|33
19|15303876996040|2007102EL KII UNEEM SOKAWE KUG ZUIX ZU|11.40|263|17
20|1455|640484809|2001090AYIDA|680.75|49|61
21|13884                             20030617A||||||| FOKOX AI PUC KI|1549.25|344|65
22|1275824189~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |||| ODEAP|1770.10|257|6834561|20060607|JUBUYE XONURE CIANOBOB XAQUXAZ LIKUL NAPIWIXIX IESEXIH|981.45|210|45
24|16769604086875|20070914|QULANUGE HUA ZOHOXOYIX|1589.75|392|16
    


it is ok if your implementation needs to add a field in front of each record, as long as you mention it so that everybody knows how your data file design works :D
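
A sketch of how a reader might flag lines like the corrupted ones above using structure alone, in Python. The rules (7 fields, 14-digit barcode, 8-digit calendar date, price with two decimals) are inferred from the sample data and are assumptions, as are the function names.

```python
import re
from datetime import datetime

def check_line(line):
    """Return None if the line looks structurally valid, else a short reason."""
    fields = line.rstrip("\r\n").split("|")
    if len(fields) != 7:
        return f"expected 7 fields, found {len(fields)}"
    rid, barcode, ymd, name, price, qty, ptype = fields
    if not re.fullmatch(r"[0-9]+", rid):
        return "id is not a number"
    if not re.fullmatch(r"[0-9]{14}", barcode):
        return "barcode is not 14 digits"
    if not re.fullmatch(r"[0-9]{8}", ymd):
        return "date is not 8 digits"
    try:
        datetime.strptime(ymd, "%Y%m%d")
    except ValueError:
        return "date is not a real calendar date"
    if not name:
        return "name is empty"
    if not re.fullmatch(r"[0-9]+\.[0-9]{2}", price):
        return "price is not in 0.00 form"
    if not re.fullmatch(r"[0-9]+", qty) or not re.fullmatch(r"[0-9]+", ptype):
        return "quantity or type is not a number"
    return None

def find_bad_lines(lines):
    """Return (line number, reason) for every line that fails check_line."""
    bad = []
    for number, line in enumerate(lines, start=1):
        reason = check_line(line)
        if reason is not None:
            bad.append((number, reason))
    return bad

sample = [
    "16|13441488146781|20060609|ZOJUO XEOXAMERI AFENAJIMA|341.00|81|40",
    "20|1455|640484809|2001090AYIDA|680.75|49|61",
    "290|20080611|ME LE XUQI YIVEMI|1875.25|129|33",
    "24|16769604086875|20070914|QULANUGE HUA ZOHOXOYIX|1589.75|392|16",
]
for number, reason in find_bad_lines(sample):
    print(number, reason)
```

A check like this cannot notice damage that still looks valid, for example a price digit that changed from 3 to 8.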
Post 25 Aug 2011, 21:48
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17247
Location: In your JS exploiting you and your system
revolution
Corruption should be very simple to detect. Just use a CRC for each record.
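
A minimal sketch of a per-record CRC in Python, using CRC-32 from the standard `zlib` module. Storing the checksum as an extra trailing field of 8 hex digits is an assumption made for the example; the helper names are made up.

```python
import zlib

def add_crc(record: str) -> str:
    """Append the CRC-32 of the record text as a final field of 8 hex digits."""
    crc = zlib.crc32(record.encode("utf-8"))
    return f"{record}|{crc:08X}"

def crc_ok(line: str) -> bool:
    """Recompute the CRC over everything before the last '|' and compare."""
    record, sep, stored = line.rstrip("\r\n").rpartition("|")
    if not sep or len(stored) != 8:
        return False
    try:
        expected = int(stored, 16)
    except ValueError:
        return False
    return zlib.crc32(record.encode("utf-8")) == expected

good = add_crc("16|13441488146781|20060609|ZOJUO XEOXAMERI AFENAJIMA|341.00|81|40")
damaged = good.replace("341.00", "841.00")
print(crc_ok(good), crc_ok(damaged))
```

Unlike a structural check, this also catches a changed digit that still looks like a valid price.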

And if you have corruption then you should be restoring from backups and examining why you have the corruption. If it is just a bad HDD then this is easy to solve. If it is because of bad RAM or a cosmic ray on a non-ECC server then this is also easy to solve. But, if your code is faulty and creates corrupted records then you have a serious problem. Code that creates bad records means your entire DB is suspect. How will you explain to your customer that all the data is of no value anymore?
Post 25 Aug 2011, 21:57
sleepsleep
dear revolution,
would love it if you could join this contest if you have extra time, since you are one of the greatest coders around the board,

then we can see how everybody's readers perform.
Post 25 Aug 2011, 22:07
sleepsleep
i just thought of 1 rule which is important:

if you are going to reformat the data for your reader, please make sure you have an import function.

we ASSUME the raw data will be like what i uploaded to mediafire, so please code an "import" function to translate the data, calculate record CRCs, or whatever else you want to do, into whatever format you find suitable for your reader.

the "RULE" is: the data file must be viewable and editable in NOTEPAD
Post 25 Aug 2011, 22:10
revolution
It would be pretty tricky to edit CRCs in notepad manually. I suppose it could be stored as hex, but calculating it is not so friendly.

But also, remember that notepad is a very restricted editor. If your DB is larger than your available RAM then there is no way you can edit it in notepad. Can you always guarantee that your DB will be smaller than your available RAM? IIRC notepad will convert ASCII files upon loading and uses two bytes of RAM per single ASCII character. Plus inserting/deleting text with notepad in large files takes a long time.
Post 25 Aug 2011, 22:18
AsmGuru62
CRC is good.
Post 25 Aug 2011, 22:19

Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.