flat assembler
Message board for the users of flat assembler.

Another settings file format...

JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
Well, I already created one simple micro database for FreshLib (discussed in this thread).
But today I got some inspiration and invented another simple file format for a hierarchical database (i.e. for storing settings, application preferences, etc.).
Unlike .INI files, which have only two levels of nesting (sections and keys), this one has a truly hierarchical structure. It is inspired by Python's indentation model and seems to be really easy to search and parse. An additional advantage is that it is human-readable and editable.
Here is an example:
Code:
; comment
; keys on root level
key1="some string value"
key2=1234 ; some number value in FASM format.
key3=#0f0a030405060708  ; binary field encoded in hex
dir1:
    key1="another key"   ; the full name is: /dir1/key1
    key2=1234
    dir2:
        key1=1234      ; /dir1/dir2/key1
        dir3:
            key1=1234  ; /dir1/dir2/dir3/key1
    key3=#010203    ; /dir1/key3
key4=1234      ; /key4 - on the root level.
    


The colon symbol opens a new sub-directory. If one of the following rows begins at a position less than or equal to the indent of the current sub-directory, it ends that directory and returns to the previous one, where the same check is applied again.
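To make the rule concrete, here is a minimal sketch in Python rather than FASM. It is not the FreshLib code: the names `strip_comment` and `parse_settings` are hypothetical, values are kept as raw strings, and TAB characters are not handled. It keeps a stack of open directories and closes every directory whose indent is greater than or equal to the indent of the current row.

```python
def strip_comment(line):
    """Cut a ';' comment, ignoring any ';' inside a double-quoted string."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == ';' and not in_string:
            return line[:i]
    return line


def parse_settings(text):
    """Map full paths such as '/dir1/key1' to their raw value strings."""
    values = {}
    open_dirs = []  # stack of (indent, name) for the directories still open
    for raw in text.splitlines():
        line = strip_comment(raw).rstrip()
        body = line.lstrip(' ')
        if not body:
            continue  # blank or comment-only row
        indent = len(line) - len(body)
        # A row at or to the left of a directory's indent closes that directory.
        while open_dirs and indent <= open_dirs[-1][0]:
            open_dirs.pop()
        prefix = ''.join('/' + name for _, name in open_dirs)
        if '=' in body:
            key, value = body.split('=', 1)
            values[prefix + '/' + key.strip()] = value.strip()
        elif body.endswith(':'):
            open_dirs.append((indent, body[:-1].strip()))
        else:
            raise ValueError('cannot parse row: ' + raw)
    return values


SAMPLE = '''\
; comment
key1="some string value"
key2=1234 ; some number value in FASM format.
dir1:
    key1="another key"   ; the full name is: /dir1/key1
    dir2:
        key1=1234      ; /dir1/dir2/key1
        dir3:
            key1=1234  ; /dir1/dir2/dir3/key1
    key3=#010203    ; /dir1/key3
key4=1234      ; /key4 - on the root level.
'''

for path, value in parse_settings(SAMPLE).items():
    print(path, '=', value)
```

With this reading of the rule, "key3" at indent 4 closes both "dir3" and "dir2" and lands in "/dir1", and "key4" at indent 0 returns to the root level.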

Please comment on this format. What am I missing? How should it be implemented? Where are the possible pitfalls?
I would like to have some discussion before implementing it in code.

The goal is to implement the full API in less than 512 bytes.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 01 Dec 2012, 16:35
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4634
Location: Argentina
If you need some extra ideas, you might want to take a look at YAML, which is also indentation-based.
Post 01 Dec 2012, 17:45
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
Yes, YAML seems to be very similar to my idea (I knew it: someone had invented it before me. ;) )
I will read its specification very carefully, but some simplified form will probably be better for assembly programming.
Post 01 Dec 2012, 17:50
SeproMan



Joined: 11 Oct 2009
Posts: 54
Location: Belgium
In your example you used 4 SPACE characters for indentation, didn't you?
How would you deal with the TAB character?
Post 02 Dec 2012, 14:15
AsmGuru62



Joined: 28 Jan 2004
Posts: 1389
Location: Toronto, Canada
What's with the 512-byte dogma?
Every piece of code must fit into 512 bytes -- just for the fun of it?
Post 02 Dec 2012, 14:41
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
SeproMan: the number of indentation spaces is not important. It serves only to make the file more readable. The rule is as follows: every level of nesting has some start indentation. If a row has an indentation less than the current level, it ends the block and belongs to the previous level. For example, see "/dir1/key3" in the example above. "key3" has an indentation less than "/dir3/key1" and less than "dir3". That is why it belongs to the "/dir1" sub-block. "key3" can have an indentation of 4, 5, 6 or 7 and will still belong to the same sub-block. (I am not sure I am being clear, and I still don't have an exact formal definition of this behavior. AFAIK, the Python language has a similar indentation rule. Maybe I have to read it in its manuals. :) )
About TAB characters: they are a problem. The simplest solution is to forbid TAB characters altogether. Or to expand them correctly... if that is possible at all.
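The two TAB options can be sketched in a few lines of Python. The helper name `indent_of` and its `tab_width` parameter are made up for illustration. The catch with expanding is that the file does not record the tab width, so the same row can get a different indent under different editor settings.

```python
def indent_of(line, tab_width=None):
    """Return the indentation of `line` in columns.

    tab_width=None forbids TABs in the indentation (raises ValueError);
    otherwise every TAB advances to the next multiple of tab_width.
    """
    lead = line[:len(line) - len(line.lstrip(' \t'))]
    if '\t' in lead:
        if tab_width is None:
            raise ValueError('TAB character in indentation')
        lead = lead.expandtabs(tab_width)
    return len(lead)


print(indent_of('    key3=#010203'))     # spaces only
print(indent_of('\tkey3=#010203', 4))    # TAB expanded with width 4
print(indent_of('\tkey3=#010203', 8))    # same row, width 8: a different indent
```

Forbidding TABs is the only variant that gives the same result on every machine, which matches the "simplest solution" above.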

AsmGuru62: 512 bytes is not a dogma. It is a desire to make it small and simple.
Post 02 Dec 2012, 20:30
SeproMan



Joined: 11 Oct 2009
Posts: 54
Location: Belgium
You say:
"key3" can have an indentation of 4, 5, 6 or 7 and will still belong to the same sub-block.

This means that you define the current level by the position of the colon character.
Nothing wrong with that!
But it allows for some ugly layout. If key3 is shifted from indent 4 to indent 6, it still legally belongs to Dir1, but visually it seems to belong to Dir2.

Perhaps you could consider defining the current level by the first character of the directory name.
I would even suggest "No TABs allowed and a fixed 1 SPACE indentation (per level)".

Also, see what happens when you rewrite your example with (much) longer directory names, e.g. "ColorSettingsOfSpreadsheetApplication:".

_________________
Real Address Mode.
Post 02 Dec 2012, 21:50
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
Not the colon position. What matters is the indentation of the line that ends with the colon. A fixed indentation of 1 space per level is too strict IMHO. It will raise errors all the time.
Post 03 Dec 2012, 06:06
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
I am working slowly on this library. The results are pretty good so far. The discussed format is really easy to parse. The subroutine that searches the database tree is only 121 bytes long:

Code:
; Arguments:
;   esi - pointer to the text of the database
;   edx - pointer to the path string
; Returns:
;   CF=0 if the path is found.
;     esi - pointer to the value.
;     ah=":" if the path is directory.
;     ah="=" if the path is data field.
;
;   CF=1 the path was not found.
;     esi - points to the first row after the end of the block where the path
;           is supposed to be (but is not).
;     edx - points to the remainder of the path that was not found.
proc __SearchData
begin
        mov     edi, edx
        xor     ebx, ebx        ; current block indent
        dec     ebx

.outer_loop:
; compute the line indent
        mov     ecx, esi

.scan_indent:
        lodsb
        test    al, al
        jz      .end_of_file

        cmp     al, ' '
        je      .scan_indent
        jb      .outer_loop

        cmp     al, ';'
        jne     .indent_ok

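; Skips to the end of the line. A NUL byte is also below ' ', so it must be
; told apart from a line separator and fall through to .end_of_file.
.SkipIt:
        lodsb
        cmp     al, ' '
        jae     .SkipIt
        test    al, al
        jnz     .outer_loop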

.end_of_file:
        dec      esi

.error:
        stc
        return

.indent_ok:
        dec     esi

        sub     ecx, esi
        neg     ecx

; Here the indent of the current line is in ECX.

        cmp     ecx, ebx
        jle     .error          ; if the current indent is less than or equal to the block
                                ; indent, the needed element cannot be found.
        cmp     ecx, edi
        ja      .SkipIt         ; the current indent is above the current upper limit, so
                                ; the whole line need not be processed.

        mov     edi, edx        ; the start of the current path

.key_loop:
        mov     al, [edi]
        inc     edi
        mov     ah, [esi]
        inc     esi

        test    al, al
        jz      .maybe_end_of_path
        cmp     al, '/'
        je      .key_maybe_found
        cmp     al, '\'
        je      .key_maybe_found

        test    ah, ah
        jz      .end_of_file

        cmp     al, ah
        je      .key_loop

; no match, so check whether the current line is a directory.
.not_this:
        dec     esi

.dir_loop:
        lodsb
        test    al, al
        jz      .end_of_file

        cmp     al, ' '
        jb      .outer_loop
        je      .SkipIt
        cmp     al, '='
        je      .SkipIt
        cmp     al, ':'
        jne     .dir_loop

        mov     edi, ecx        ; it is a subdirectory, that is not in the path,
                                ; so set the upper limit.
        jmp     .SkipIt

.maybe_end_of_path:
        cmp     ah, '='
        je      .found

.key_maybe_found:
        cmp     ah, ':'
        jne     .not_this

        test    al, al
        jz      .found          ; the searched path is subdirectory, not a record.

        mov     edx, edi        ; the subdirectory was found. Set new path and block indent.
        mov     ebx, ecx
        jmp     .SkipIt

.found:                         ; the whole path was found.
        clc
        return
endp
    


The procedure supports comments in the configuration file, beginning with ";". Only spaces are allowed for indentation. In fact, every character below $20 (which includes TAB) is accepted as a new line separator.
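For readers who do not want to trace the registers, the same search idea can be modelled in Python. This is only a loose reference sketch under my own assumptions, not a transliteration of `__SearchData`: the names `ROW` and `search` are hypothetical, keys must be followed directly by "=" or ":", and the raw text after the separator is returned as the value. Like the procedure, it walks the rows once, gives up as soon as the current block ends, and skips the rows of any directory that is not on the path.

```python
import re

# key name, then '=' (data field) or ':' (directory), then the rest of the row
ROW = re.compile(r'([^=:\s]+)([=:])(.*)')


def search(text, path):
    """Return (kind, raw_value) for `path`, or None if it is not present.

    kind is ':' for a directory and '=' for a data field.
    """
    parts = [p for p in path.replace('\\', '/').split('/') if p]
    if not parts:
        return None
    block_indent = -1   # indent of the directory row we are currently inside
    skip_above = None   # indent of an unrelated directory whose rows are skipped
    for row in text.splitlines():
        body = row.lstrip(' ')
        if not body or body.startswith(';'):
            continue                    # blank or comment row
        indent = len(row) - len(body)
        if indent <= block_indent:
            return None                 # the block ended without a match
        if skip_above is not None:
            if indent > skip_above:
                continue                # still inside the unrelated directory
            skip_above = None
        match = ROW.match(body)
        if match is None:
            continue
        name, kind, rest = match.groups()
        if name == parts[0]:
            if len(parts) == 1:
                return kind, rest.strip()
            if kind == ':':
                parts = parts[1:]       # descend into the matching directory
                block_indent = indent
                continue
        if kind == ':':
            skip_above = indent         # a directory that is not on the path
    return None
```

Both "/" and "\" are accepted as path separators, as in the assembly version.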
Post 08 Dec 2012, 22:35
f0dder



Joined: 19 Feb 2004
Posts: 3172
Location: Denmark
Personally I'd go for JSON: it's a pretty standard format, it doesn't depend on indentation (you can indent it just as prettily as YAML/whatever, but you can also compact it for network transfer), there's good syntax support for it in a lot of editors, and many people are familiar with it.

I don't see the point of "yet another configuration format" - at least not when it doesn't add anything the existing formats can't handle.
Post 09 Dec 2012, 13:42
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
Why not XML then? New formats appear every day. There should be some reason.
Post 09 Dec 2012, 15:19
f0dder



Joined: 19 Feb 2004
Posts: 3172
Location: Denmark
XML is a bit on the heavy side, and not very human-friendly, so alternatives were justified. But IMHO there's enough to choose from now, so I don't see the point in adding yet another format that has the same capabilities and doesn't add anything except a somewhat different look.

Might be worthwhile to look at a binary format with ACID guarantees, that would (probably :) ) be something reasonably new.
Post 09 Dec 2012, 16:09
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
f0dder wrote:
Might be worthwhile to look at a binary format with ACID guarantees, that would (probably :) ) be something reasonably new.


It is already invented: SQLite.

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 09 Dec 2012, 16:57
HaHaAnonymous



Joined: 02 Dec 2012
Posts: 1174
Location: Unknown
Stupid post removed.


Last edited by HaHaAnonymous on 28 Feb 2015, 22:12; edited 1 time in total
Post 09 Dec 2012, 17:16
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
HaHaAnonymous, you can create unlimited levels of nesting.
Well, as many as fit in memory, actually. :)
Anyway, the whole library can be found in the repository.
I am still not sure about the final API.
Post 09 Dec 2012, 17:26
typedef



Joined: 25 Jul 2010
Posts: 2910
Location: 0x77760000
Maybe you can also implement a base64/hex encoder library, so that you can store binary data in the file. You can use a specific API to do the conversion.
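The format in the first post already marks a hex-encoded binary field with "#", so the conversion itself is small. A rough Python sketch follows. The function names are hypothetical, and the base64 line is there only to compare sizes, since the thread does not define a base64 syntax for the file.

```python
import base64


def encode_hex_field(data):
    """Encode bytes as a '#'-prefixed hex field, e.g. b'\\x01\\x02' -> '#0102'."""
    return '#' + data.hex()


def decode_hex_field(value):
    """Decode a '#'-prefixed hex field back into bytes."""
    if not value.startswith('#'):
        raise ValueError('not a hex field: ' + value)
    return bytes.fromhex(value[1:])


blob = decode_hex_field('#0f0a030405060708')
print(len(blob), 'bytes')
print(encode_hex_field(blob))
# Hex costs 2 characters per byte, base64 costs 4 characters per 3 bytes.
print(len(blob.hex()), 'hex characters,', len(base64.b64encode(blob)), 'base64 characters')
```

Hex is larger but trivial to decode in assembly. Base64 is denser but needs a lookup table and padding rules.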
Post 09 Dec 2012, 17:40
f0dder



Joined: 19 Feb 2004
Posts: 3172
Location: Denmark
JohnFound wrote:
f0dder wrote:
Might be worthwhile to look at a binary format with ACID guarantees, that would (probably :) ) be something reasonably new.
It is already invented: SQLite.

SQLite can be used for settings, yes, but it's a flat database format and not a tree structure. Of course you can coerce a tree into a flat format, but it's not exactly pretty...

_________________
Image - carpe noctem
Post 09 Dec 2012, 17:50
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
SQLite is a relational database and in some aspects it is better than hierarchical databases for storing configuration data. Anyway, every data structure can be represented in an RDBMS. The only reason I don't want to use it is that it is pretty big for such a simple task.
Post 09 Dec 2012, 18:15
f0dder



Joined: 19 Feb 2004
Posts: 3172
Location: Denmark
JohnFound wrote:
Anyway, every data structure can be represented in an RDBMS.

It can be coerced into it, yes, but it's not necessarily a very comfortable fit, which is why the NoSQL movement exists :)

For a project that already uses SQLite for other things, it could be reasonable enough to use it for configuration as well. Heck, {key,value} pairs work fine for a lot of purposes, and you can probably get away with using a delimiter character in the key strings to simulate 'folders'. It's somewhat brute-force, but with the CPU speed and amount of RAM these days, who cares? (rolls eyes)
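A minimal sketch of that {key,value} idea, using Python's standard `sqlite3` module. The table name `settings`, the "/" delimiter and all function names are my assumptions, not anything from an existing project.

```python
import sqlite3


def open_settings(path=':memory:'):
    """Open (or create) a settings database with one flat key/value table."""
    db = sqlite3.connect(path)
    db.execute('CREATE TABLE IF NOT EXISTS settings '
               '(key TEXT PRIMARY KEY, value TEXT)')
    return db


def set_value(db, key, value):
    db.execute('INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)',
               (key, value))


def get_value(db, key):
    row = db.execute('SELECT value FROM settings WHERE key = ?',
                     (key,)).fetchone()
    return row[0] if row else None


def list_folder(db, folder):
    """Return all keys stored anywhere below `folder`, e.g. '/dir1'."""
    prefix = folder.rstrip('/') + '/'
    # Compare the key prefix directly; this avoids LIKE wildcards in key names.
    rows = db.execute(
        'SELECT key FROM settings WHERE substr(key, 1, ?) = ? ORDER BY key',
        (len(prefix), prefix))
    return [row[0] for row in rows]


db = open_settings()
set_value(db, '/dir1/key1', 'another key')
set_value(db, '/dir1/dir2/key1', '1234')
set_value(db, '/key4', '1234')
print(get_value(db, '/dir1/dir2/key1'))
print(list_folder(db, '/dir1'))
```

The "folders" exist only as key prefixes, so listing or deleting a folder is a prefix scan. That is the brute-force part mentioned above.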

_________________
Image - carpe noctem
Post 09 Dec 2012, 18:59
JohnFound



Joined: 16 Jun 2003
Posts: 3476
Location: Bulgaria
The first version of the new library is finished and tested now. The source is merged into the FreshLib project: "data/uConfig.asm"

There is a test project to play with in "freshlib/test_code/TestConfig.fpr".

I didn't manage to make the whole library fit in 512 bytes in the worst case. Instead I chose to make the API more feature-rich. So, the final size depends on the procedures used:

628 bytes if get and set functions are used.
284 bytes if only get functions are used.
501 bytes if only set functions are used.
Post 19 Dec 2012, 08:49

Copyright © 1999-2018, Tomasz Grysztar.

Powered by rwasa.