flat assembler
Message board for the users of flat assembler.
Another settings file format...
JohnFound 01 Dec 2012, 16:35
Well, I already created one simple micro database for FreshLib (discussed in this thread).
But today I got some inspiration and invented another simple file format for a hierarchical database (i.e. for storing settings, application preferences, etc.). Unlike .INI files, which have only two levels of nesting (sections and keys), this one has a truly hierarchical structure. It is inspired by Python's indentation model and seems to be really easy to search and parse. An additional advantage is that it is human readable and editable. Here is an example:
Code:
; comment
; keys on root level
key1="some string value"
key2=1234  ; some number value in FASM format.
key3=#0f0a030405060708  ; binary field encoded in hex

dir1:
    key1="another key"  ; the full name is: /dir1/key1
    key2=1234
    dir2:
        key1=1234  ; /dir1/dir2/key1
        dir3:
            key1=1234  ; /dir1/dir2/dir3/key1
    key3=#010203  ; /dir1/key3
key4=1234  ; /key4 - on the root level.

The colon symbol opens a new sub-directory. If one of the following rows begins at a position less than or equal to the current sub-directory's indent, it ends the current directory and returns to the previous one, where the same check is applied.
Please comment on this format. What am I missing? How should it be implemented? Where are the possible pitfalls? I would like to have some discussion before implementing it in code. The goal is to implement the full API in less than 512 bytes.
LocoDelAssembly 01 Dec 2012, 17:45
If you need some extra ideas, you might want to take a look at YAML, which is also indentation based.
SeproMan 02 Dec 2012, 14:15
In your example you used 4 SPACE characters for indentation, didn't you?
How would you deal with the TAB character?
AsmGuru62 02 Dec 2012, 14:41
What's with the 512 bytes dogma?
Every piece of code must fit into 512 bytes -- just for the fun of it?
JohnFound 02 Dec 2012, 20:30
SeproMan: the number of indentation spaces is not important. It serves only to make the file more readable. The rule is the following: every level of nesting has some start indentation. If a row has indentation less than the current level, it ends the block and belongs to the previous level. For example, see "/dir1/key3" from the above example. "key3" has indentation less than "/dir3/key1" and less than "dir3". That is why it belongs to the "/dir1" sub-block. "key3" can have indentations 4, 5, 6 or 7 and will still belong to the same sub-block. (I am not sure I am being clear, and I still don't have an exact formal definition of this behavior. AFAIK, the Python language has a similar indentation rule. Maybe I have to read it in its manuals.)
About TAB characters: they are a problem. The simplest solution is to forbid TAB characters altogether. Or expand them correctly... if that is possible at all.
AsmGuru62: 512 bytes is not a dogma. It is a desire to make it small and simple.
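The rule as stated in this post can be sketched in a few lines. The following is a minimal Python illustration, not the FreshLib code: it assumes that a block's level is the indentation of its first row, that TAB characters are forbidden, and that ";" starts a comment unless it is inside double quotes. The names `strip_comment`, `flatten` and `SAMPLE` are hypothetical, and `SAMPLE` is the example from the first post with the 4-space indentation mentioned above.

```python
def strip_comment(line):
    """Cut a ';' comment, ignoring ';' inside double-quoted strings."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ';' and not in_quotes:
            return line[:i]
    return line


def flatten(text):
    """Map every full path (e.g. '/dir1/key3') to its raw value string."""
    result = {}
    stack = []  # open directories as [name, header_indent, level]
    for raw in text.splitlines():
        line = strip_comment(raw).rstrip()
        body = line.lstrip(" ")
        if not body:
            continue  # blank or comment-only row
        indent = len(line) - len(body)
        while stack:
            name, header_indent, level = stack[-1]
            if level is None and indent > header_indent:
                stack[-1][2] = indent  # first row fixes the block's level
                break
            if level is not None and indent >= level:
                break  # row still belongs to this block
            stack.pop()  # row is left of the block's level: close the block
        if body.endswith(":") and "=" not in body:
            stack.append([body[:-1].strip(), indent, None])
        else:
            key, _, value = body.partition("=")
            path = "/" + "/".join([d[0] for d in stack] + [key.strip()])
            result[path] = value.strip()
    return result


SAMPLE = """
; keys on root level
key1="some string value"
key2=1234
dir1:
    key1="another key"
    dir2:
        key1=1234
        dir3:
            key1=1234
    key3=#010203
key4=1234
"""

for path, value in flatten(SAMPLE).items():
    print(path, "=", value)
```

Under these assumptions, shifting "key3" to an indentation of 5, 6 or 7 still yields "/dir1/key3", because each of those is below the level of "dir2" (8) and not below the level of "dir1" (4).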
SeproMan 02 Dec 2012, 21:50
You say:
"key3" can have indentations 4, 5, 6 or 7 and will still belong to the same sub-block.
This means that you define the current level by the position of the colon character. Nothing wrong with that! But it allows for some ugly layout. If key3 is shifted from indent 4 to indent 6, it still legally belongs to dir1, but optically it seems to belong to dir2. Perhaps you could consider defining the current level by the first character of the directory name. I would even suggest "No TABs allowed and a fixed 1 SPACE indentation (per level)".
Also, see what happens when you rewrite your example with (much) longer directory names, e.g. "ColorSettingsOfSpreadsheetApplication:".
JohnFound 03 Dec 2012, 06:06
Not the colon position. What matters is the indentation of the line that ends with the colon. A fixed indentation of 1 space per level is too strict IMHO. It would raise errors all the time.
JohnFound 08 Dec 2012, 22:35
I am working slowly on this library. The results are pretty good so far. The discussed format is really easy to parse. The subroutine that searches the database tree is only 121 bytes long:
Code:
; Arguments:
;   esi - pointer to the text of the database
;   edx - pointer to the path string
; Returns:
;   CF=0 if the path is found.
;     esi - pointer to the value.
;     ah=":" if the path is directory.
;     ah="=" if the path is data field.
;
;   CF=1 the path was not found.
;     esi - points to the first row after the end of the block where the path
;           is supposed to be (but is not).
;     edx - points to the remainder of the path that was not found.
proc __SearchData
begin
        mov     edi, edx
        xor     ebx, ebx        ; current block indent
        dec     ebx

.outer_loop:
; compute the line indent
        mov     ecx, esi

.scan_indent:
        lodsb
        test    al, al
        jz      .end_of_file
        cmp     al, ' '
        je      .scan_indent
        jb      .outer_loop

        cmp     al, ';'
        jne     .indent_ok

; skips to the end of the line.
.SkipIt:
        lodsb
        cmp     al, ' '
        jb      .outer_loop
        test    al, al
        jnz     .SkipIt

.end_of_file:
        dec     esi
.error:
        stc
        return

.indent_ok:
        dec     esi
        sub     ecx, esi
        neg     ecx
; Here the indent of the current line is in ECX.
        cmp     ecx, ebx
        jle     .error          ; if the current indent is less than or equal to the block
                                ; indent, then the needed element can not be found.
        cmp     ecx, edi
        ja      .SkipIt         ; the current indent is above the current upper limit, so
                                ; the whole line does not have to be processed.
        mov     edi, edx        ; the start of the current path

.key_loop:
        mov     al, [edi]
        inc     edi
        mov     ah, [esi]
        inc     esi

        test    al, al
        jz      .maybe_end_of_path
        cmp     al, '/'
        je      .key_maybe_found
        cmp     al, '\'
        je      .key_maybe_found

        test    ah, ah
        jz      .end_of_file

        cmp     al, ah
        je      .key_loop

; not this, so check the current line whether it is a directory.
.not_this:
        dec     esi
.dir_loop:
        lodsb
        test    al, al
        jz      .end_of_file
        cmp     al, ' '
        jb      .outer_loop
        je      .SkipIt
        cmp     al, '='
        je      .SkipIt
        cmp     al, ':'
        jne     .dir_loop

        mov     edi, ecx        ; it is a subdirectory, that is not in the path,
                                ; so set the upper limit.
        jmp     .SkipIt

.maybe_end_of_path:
        cmp     ah, '='
        je      .found

.key_maybe_found:
        cmp     ah, ':'
        jne     .not_this

        test    al, al
        jz      .found          ; the searched path is subdirectory, not a record.

        mov     edx, edi        ; the subdirectory was found. Set new path and block indent.
        mov     ebx, ecx
        jmp     .SkipIt

.found:
; the whole path was found.
        clc
        return
endp

The procedure supports comments in the configuration file, beginning with ";". Only spaces are allowed for indentation. Actually, every character less than $20 (which includes TAB) is accepted as a new line separator.
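For readers who do not follow the assembly easily, here is a rough Python sketch of the same search strategy. It is a reading of the routine above, not a translation verified against FreshLib: it compares each row's indent with the indent of the directory row currently being searched, skips rows nested under directories that are not on the path, treats every character below $20 as a row separator, and recognizes comments only at the start of a row. The names `search` and `CONFIG` are hypothetical.

```python
import re


def search(text, path):
    """Return (kind, rest_of_row) for `path`, or None if it is absent.

    kind is ':' for a directory and '=' for a data field, mirroring AH.
    """
    parts = [p for p in re.split(r"[/\\]", path) if p]
    if not parts:
        return None
    depth = 0            # index of the path segment being looked for
    block_indent = -1    # indent of the directory row we are inside (EBX)
    skip_above = None    # rows indented deeper than this are skipped (EDI)
    for row in re.split(r"[\x00-\x1f]+", text):
        body = row.lstrip(" ")
        if not body or body.startswith(";"):
            continue  # empty row or comment row
        indent = len(row) - len(body)
        if indent <= block_indent:
            return None  # left the block without finding the segment
        if skip_above is not None:
            if indent > skip_above:
                continue  # still inside an unrelated sub-directory
            skip_above = None
        m = re.match(r"([^=: ]*)([=:])", body)
        if m is None:
            continue  # neither a key nor a directory row
        name, kind = m.groups()
        if name == parts[depth]:
            if depth == len(parts) - 1:
                return kind, body[m.end():]
            if kind == ":":
                depth += 1
                block_indent = indent
                continue
        if kind == ":":
            skip_above = indent  # directory that is not on the path
    return None


CONFIG = """\
; sample settings
key1="some string value"
dir1:
    key1="another key"
    dir2:
        key1=1234
    key3=#010203
key4=1234
"""

print(search(CONFIG, "/dir1/dir2/key1"))
print(search(CONFIG, "/dir1/missing"))
```

As in the routine, both "/" and "\" separate path segments, and the returned text is simply the rest of the row after the "=" or ":" character, so a trailing comment on a value row would be returned with it.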
f0dder 09 Dec 2012, 13:42
Personally I'd go for JSON - it's a pretty standard format, it doesn't depend on indentation (you can indent it just as prettily as YAML/whatever, but you can also compact it for network transfer), there's good syntax support for it in a lot of editors, and many people are familiar with it.
I don't see the point of "yet another configuration format" - at least not when it doesn't add anything the existing formats can't handle.
JohnFound 09 Dec 2012, 15:19
Why not XML then? New formats appear every day. There should be some reason.
f0dder 09 Dec 2012, 16:09
XML is a bit on the heavy side, and not very human-friendly - so alternatives were justified. But IMHO there's enough to choose from now, so I don't see the point in adding yet another format that has the same capabilities and doesn't add anything except a somewhat different look.
Might be worthwhile to look at a binary format with ACID guarantees, that would (probably) be something reasonably new.
JohnFound 09 Dec 2012, 16:57
f0dder wrote: Might be worthwhile to look at a binary format with ACID guarantees, that would (probably) be something reasonably new.
It is already invented: SQLite.
HaHaAnonymous 09 Dec 2012, 17:16
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 22:12; edited 1 time in total
JohnFound 09 Dec 2012, 17:26
HaHaAnonymous, you can create unlimited levels of nesting.
Well, as many as fit in memory, actually. The whole library can be found in the repository, although I am still not sure about the final API.
typedef 09 Dec 2012, 17:40
Maybe you can also implement a base64/hex encoder library, so that you can store binary data in the file. You can use a specific API to do the conversion.
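To make the suggestion concrete, here is a small Python sketch of such a conversion. The "#" prefix followed by hex digits is taken from the example in the first post; the helper names `encode_hex_field` and `decode_hex_field` are hypothetical, and the thread does not define any base64 field syntax, so base64 appears only for a size comparison.

```python
import base64


def encode_hex_field(data: bytes) -> str:
    """Encode bytes as a '#'-prefixed hex field, e.g. b'\\x01\\x02' -> '#0102'."""
    return "#" + data.hex()


def decode_hex_field(field: str) -> bytes:
    """Decode a '#'-prefixed hex field back into bytes."""
    if not field.startswith("#"):
        raise ValueError("binary fields are expected to start with '#'")
    return bytes.fromhex(field[1:])


data = bytes(range(30))
hex_text = encode_hex_field(data)                   # '#' plus 2 characters per byte
b64_text = base64.b64encode(data).decode("ascii")   # 4 characters per 3 bytes

print(len(hex_text), len(b64_text))
```

Hex costs two characters per byte and base64 about 1.33, so base64 is more compact for large fields, while hex is simpler to encode in a size-constrained library and easier to edit by hand.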
f0dder 09 Dec 2012, 17:50
JohnFound wrote:
JohnFound 09 Dec 2012, 18:15
SQLite is a relational database and in some aspects is better than hierarchical databases for storing configuration data. Anyway, every data structure can be represented in RDBS. The only reason I don't want to use it is that it is pretty big for such a simple task.
f0dder 09 Dec 2012, 18:59
JohnFound wrote: Anyway, every data structure can be represented in RDBS.
For a project that already uses SQLite for other things, it could be reasonable enough using it for configuration as well. Heck, {key,value} pairs work fine for a lot of purposes, and you can probably get away with using a delimiter character in the key strings to simulate 'folders'. It's somewhat bruteforce, but with the CPU speed and amount of RAM these days, who cares?
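The delimiter idea can be sketched with Python's standard `sqlite3` module. This is only an illustration under assumed names: the table `settings`, its columns, and the helpers `get` and `list_dir` are hypothetical, and "/" is chosen as the delimiter to match the paths used earlier in the thread.

```python
import sqlite3

# In-memory database for the sketch; a real application would open a file.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
db.executemany(
    "INSERT INTO settings (key, value) VALUES (?, ?)",
    [
        ("/key1", "some string value"),
        ("/dir1/key1", "another key"),
        ("/dir1/dir2/key1", "1234"),
        ("/dir1/key3", "#010203"),
        ("/key4", "1234"),
    ],
)


def get(key):
    """Return the value stored under a full key, or None if it is absent."""
    row = db.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def list_dir(folder):
    """Return every key below a simulated 'folder', at any depth."""
    prefix = folder.rstrip("/") + "/"
    rows = db.execute(
        "SELECT key FROM settings WHERE substr(key, 1, ?) = ? ORDER BY key",
        (len(prefix), prefix),
    )
    return [row[0] for row in rows]


print(get("/dir1/dir2/key1"))
print(list_dir("/dir1"))
```

The prefix comparison scans the whole table, which is the brute-force aspect mentioned in the post; for a settings store with a few hundred keys that is unlikely to matter.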
JohnFound 19 Dec 2012, 08:49
The first version of the new library is finished and tested now. The source is merged into the FreshLib project: "data/uConfig.asm".
There is a test project to play with in "freshlib/test_code/TestConfig.fpr". I didn't manage to make the whole library fit in 512 bytes in the worst case. Instead I chose to make the API more feature rich. So the final size depends on the procedures used:
628 bytes if get and set functions are used.
284 bytes if only get functions are used.
501 bytes if only set functions are used.
Copyright © 1999-2024, Tomasz Grysztar.