flat assembler
Message board for the users of flat assembler.

Separating entities fasm assembler & fasm assembly language

ProMiNick



Joined: 24 Mar 2012
Posts: 798
Location: Russian Federation, Sochi
ProMiNick 16 Oct 2018, 22:00
Hello Tomasz and forum members.
The fasm assembly language deserves its own page, not mixed with the fasm assembler.
A couple of days ago I tried to separate the entities "fasm assembler" and "fasm assembly language" in the Russian wiki, where they are mixed into one entity "fasm" (as in any other wiki, however). Both pages are not ready yet, so the fasm page is still present in its current form, but it is planned to become a page for choosing what exactly is meant by fasm: the programming tool or the programming language.

The "Ассемблер Fasm" ("Fasm assembler") page is ready (maybe with some assumptions, but ready). It is a more complete definition of the tool fasm, with the part relating to the fasm assembly language itself cut off (for example, only the Fresh IDE, and of course fasmw and fasmd too, are related to the fasm assembler; other IDEs are related to the syntax of the fasm language). This text reads more clearly. For example, the term "self-hosting" sounds terrible when simply translated, so it needed clarification. Thinking about the usual novice mistake of mixing up stages, I replaced the description of 2 stages with a description of all 4. The Russian text also became closer to the English variant. The term "fasm OS version" was replaced with "fasm OS variation", because the version is the same for the implementations on all OSes.


But with describing the fasm language I am stuck (the muse is gone for a while). I planned to completely cut off any relation to x86 in the fasm language description, concentrating only on preprocessor and assembly directives, because the language is wider: it is applicable to (I guess) any other architecture, and the fasmarm add-on proved this.


I have some questions about the tool and the language:

At what stage are comments removed and lines combined? At the beginning of the preprocessor stage?

Since the stages go in the order preprocessor, parser, assembler, formatter, does it mean that the preprocessor operates on completely untokenized raw text? Or does some tokenization happen (preprocessor syntax)?
What validation is done in the parser stage? Order, existence of meaning?

Assembly syntax:
The language supports only 1-byte characters. Instead of the whole 256-symbol range, only 254 are supported (NUL, ASCII 0, and SUB, ASCII 26, are excluded). Why?
Is the NUL symbol excluded because it marks the end of a text file (and sources are just text files)?
Is the SUB symbol excluded because it is used internally as the marker of a token other than a quoted token or a single-character token that marks itself? Using such a symbol would create a conflict, because on one side it is a single-character token that marks itself, and on the other side it is the marker of another token class.

Any kind of quote inside a token not enclosed in quotes is just a regular symbol. I am surprised.
The following compiles successfully (in fasmg it compiles successfully too, but that is another story: quotes could be added to names in customization includes that are suspected to conflict with the same names defined by the end user, and each of the names will stay readable)
Code:
db _""hello'
_""hello':    


More things that fasm compiles successfully (note that fasmg doesn't):
1 a) applying equ to a true number is bad
Code:
9 equ 2 ; the preprocessor allows it
;9=2 ; the assembler doesn't
db 9 dup 'a'

1 b) applying equ to a numeric token that isn't a valid number is good
Code:
180° equ 1.5707963267948966192313216916398 ; the preprocessor allows it
;180° = 1.5707963267948966192313216916398 ; the assembler doesn't
dq 180°

2) I can't say whether this is a bad or a good feature
Code:
macro 5 {}
5    


Of the whole range of special symbols, only "." is allowed as a macro name (in fasmg it is valid too; the example is only in fasm syntax). Any number of that symbol is allowed as a name
Code:
struc . args {} ; that looks bad: if such a thing is used it can be mixed up with label descending, but if used wisely such syntax has a right to exist
object . method

macro ... {}
...    



P.S. I hope no rocks will be thrown at me because I started this without asking. In any case I am not going to modify the EN wiki; everything stays in the RUS wiki. I am not the man who can describe his own thoughts in scientific form in English.

_________________
I don't like to refer to one person as "you".
My soul requires the word "thou" instead.


Last edited by ProMiNick on 17 Oct 2018, 09:28; edited 3 times in total
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 17 Oct 2018, 08:46
ProMiNick wrote:
At what stage are comments removed and lines combined? At the beginning of the preprocessor stage?

FASM.pdf wrote:
2.3 Preprocessor directives
All preprocessor directives are processed before the main assembly process, and therefore are not affected by the control directives. At this time also comments are stripped out.

I doubt lines are ever combined.

ProMiNick wrote:
Since the stages go in the order preprocessor, parser, assembler, formatter, does it mean that the preprocessor operates on completely untokenized raw text? Or does some tokenization happen (preprocessor syntax)?

The syntax of preprocessor and the way parameters are passed implies that tokenization is done before performing any preprocessing. After all, the preprocessor has to recognize macro, struc, <…, …, …> sequences, etc.

ProMiNick wrote:
Is the NUL symbol excluded because it marks the end of a text file (and sources are just text files)?

There’s no mark for end of file in any modern widely used file system (FAT, NTFS). And text files are not special and have no real differences from other files.
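
This is easy to check. Below is a minimal Python sketch (the directory and the file name `example.asm` are placeholders made up for the illustration) that writes a NUL byte into the middle of a file and reads it back. The length comes from the file system's metadata, so everything after the NUL is still there; whether a particular editor or assembler chooses to stop at that byte is a separate matter.

```python
import os
import tempfile


def roundtrip_with_nul() -> bytes:
    """Write bytes containing a NUL to a temporary file and read them back."""
    data = b"mov eax, 1\x00this part follows the NUL byte\n"
    # The directory and file name are placeholders for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.asm")
        with open(path, "wb") as f:
            f.write(data)
        size = os.path.getsize(path)  # length is stored by the file system
        with open(path, "rb") as f:
            content = f.read()
    # The NUL byte did not shorten the file.
    assert size == len(data)
    return content


if __name__ == "__main__":
    print(roundtrip_with_nul())
```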



ProMiNick 17 Oct 2018, 09:02
If tokenization is done at the start of preprocessing, what is the parser stage for? A different level of validation, or what?



DimonSoft 17 Oct 2018, 21:33
ProMiNick wrote:
If tokenization is done at the start of preprocessing, what is the parser stage for? A different level of validation, or what?

FASM itself has quite complex syntax. You might have seen a “program” I’ve posted recently without a single assembly instruction. I’ve never dived deep into the source but it seems that’s what that file is for.

Well, generally tokenization is considered to be the task for lexical analysis which is the first stage of a translator. Parsing is syntax analysis which is the second stage, so nothing wrong here.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 18 Oct 2018, 14:45
ProMiNick wrote:
The fasm assembly language deserves its own page, not mixed with the fasm assembler.
A couple of days ago I tried to separate the entities "fasm assembler" and "fasm assembly language" in the Russian wiki, where they are mixed into one entity "fasm" (as in any other wiki, however). Both pages are not ready yet, so the fasm page is still present in its current form, but it is planned to become a page for choosing what exactly is meant by fasm: the programming tool or the programming language.

I would say that in case of fasm 1 separation of these concepts is not perfect. As fasm 1 evolved, some features have been influenced by the limitations of the implementation. Also, fasm 1 actually contains two distinct languages, one for the preprocessor and one for the assembler. They are very different, though they also affected each other's implementation.

In case of fasmg the distinction is much more clear, because the language of fasmg was designed before any implementation was even attempted, so it is a fully separate entity. Therefore it might be possible to implement the same language in different ways (I had been considering some other approaches that could in theory be more efficient while implementing the same language in a fully compatible way) and it makes sense to discuss the language of fasmg separately from implementation.

On the other hand, nowadays any new features in fasm 1 tend to be back-ported from fasmg, so the implementation details of fasm 1 architecture also no longer affect them.

ProMiNick wrote:
At what stage are comments removed and lines combined? At the beginning of the preprocessor stage?
The source reader (which is a part of the preprocessor) strips comments and combines lines when it tokenizes them. This is done any time when a source text is read, for example by INCLUDE directive.
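
To make the idea of such a source reader concrete, here is a minimal Python sketch. It is not fasm's code, and the function names are made up. It assumes two rules: a `;` outside a quoted string starts a comment, and a line whose last character (before any comment) is `\` is joined with the next line. It also simplifies quoting: every quote character opens a string, so the case from the first post, where a quote inside a name is an ordinary symbol, is not handled.

```python
def strip_comment(line: str) -> str:
    """Cut a line at the first ';' that is not inside a quoted string."""
    quote = None  # the quote character currently open, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == ";":
            return line[:i]
    return line


def read_logical_lines(text: str) -> list[str]:
    """Strip comments, then join any line ending in a backslash with the next."""
    result = []
    pending = ""  # text collected from lines that ended with a backslash
    for raw in text.splitlines():
        code = strip_comment(raw).rstrip()
        if code.endswith("\\"):
            pending += code[:-1]  # drop the backslash, keep the rest
            continue
        result.append(pending + code)
        pending = ""
    if pending:
        result.append(pending)  # the last line ended with a backslash
    return result
```

In this sketch the comment is removed first and the continuation check comes second, which is why a comment may still follow the backslash.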

ProMiNick wrote:
Since the stages go in the order preprocessor, parser, assembler, formatter, does it mean that the preprocessor operates on completely untokenized raw text? Or does some tokenization happen (preprocessor syntax)?
Tokenization is done by preprocessor and the preprocessed text that goes to the parser is already fully tokenized. See the description of .fas format.
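
As a rough illustration of what "fully tokenized" can look like, here is a Python sketch of the layout hinted at in this thread: a $1A marker in front of name tokens, a quote marker in front of strings, symbol characters standing for themselves, and a zero byte ending the line. Everything beyond that is an assumption to be checked against the .fas description: the one-byte name length, the four-byte string length, the use of the double quote as the string marker, and the exact set in `SYMBOL_CHARS`. The function name is made up.

```python
import struct

# Assumed set of characters that are always single-character tokens.
SYMBOL_CHARS = "+-*/=<>()[]{}:,|&~#`"


def tokenize(line: str) -> bytes:
    """Encode one source line as a byte sequence of tokens ending with 0."""
    out = bytearray()
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in " \t":
            i += 1
        elif ch in SYMBOL_CHARS:
            out += ch.encode("latin-1")  # the character marks itself
            i += 1
        elif ch in "'\"":
            # Quoted string: a doubled quote inside stands for one quote.
            i += 1
            text = ""
            while i < len(line):
                if line[i] == ch:
                    if i + 1 < len(line) and line[i + 1] == ch:
                        text += ch
                        i += 2
                        continue
                    break
                text += line[i]
                i += 1
            else:
                raise ValueError("missing end quote")
            i += 1  # skip the closing quote
            data = text.encode("latin-1")
            out += b'"' + struct.pack("<I", len(data)) + data
        else:
            # Name token: runs until whitespace or a symbol character.
            # Quotes inside a name are ordinary characters here.
            start = i
            while i < len(line) and line[i] not in " \t" and line[i] not in SYMBOL_CHARS:
                i += 1
            name = line[start:i].encode("latin-1")
            out += b"\x1a" + bytes([len(name)]) + name  # fails for names over 255 bytes
    out += b"\x00"  # end of the tokenized line
    return bytes(out)
```

With such a layout it is easy to see why NUL and SUB cannot appear as ordinary characters in the source: both values already have a structural meaning in the tokenized line.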

ProMiNick wrote:
What validation is done in the parser stage? Order, existence of meaning?
This stage parses the syntax of the language used by the assembler, which is a completely separate language from the one used by the preprocessor. The parser converts the tokenized lines into fasm's internal bytecode. This includes recognition of labels, instructions and directives. Numeric expressions are converted into stack-based evaluation sequences of bytecode.
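
For readers unfamiliar with stack-based evaluation, here is a generic Python sketch of the technique. It does not reproduce fasm's actual bytecode. The `("push", n)` and `("op", c)` records, the function names and the precedence table (ordinary arithmetic, integers only, no unary minus) are all assumptions chosen for the illustration.

```python
import operator
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.floordiv}


def compile_expr(text: str) -> list:
    """Convert an infix expression into a postfix (stack-based) sequence."""
    output, ops = [], []
    for tok in re.findall(r"\d+|[-+*/()]", text):
        if tok.isdigit():
            output.append(("push", int(tok)))
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(("op", ops.pop()))
            ops.pop()  # discard the "("
        else:
            # Pop operators of equal or higher precedence (left-associative).
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(("op", ops.pop()))
            ops.append(tok)
    while ops:
        output.append(("op", ops.pop()))
    return output


def evaluate(code: list) -> int:
    """Run the postfix sequence on a value stack and return the result."""
    stack = []
    for kind, arg in code:
        if kind == "push":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[arg](left, right))
    return stack.pop()
```

The point of the split is that the expression is analysed once, while the resulting sequence can be evaluated again cheaply, which suits an assembler that may need several passes before all label values settle.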



ProMiNick 22 Oct 2018, 15:07
There are so many exceptions to the main rules in fasm (just like in the Russian language).
Before today I thought that only label: instruction (or label: macro) could be combined on one line.
But today I found something that completely breaks the paradigm of 1 line - 1 instruction
Code:
macro u[0] { db 1}  macro ... [_""] { dd 5} struc o val { .field dd val } name equ 5
u[0]
u
...[_""]
n o name    
All of it compiled successfully((((((((((((((

Could it be fixed?
What I want: a rule that disallows preprocessor definitions after the symbol "}", so that only the use of something already defined, or an assembler instruction, would be allowed.


Last edited by ProMiNick on 23 Oct 2018, 09:29; edited 3 times in total



Tomasz Grysztar 22 Oct 2018, 15:25
This is the feature of preprocessor's language that was introduced deliberately, it has some uses when combined with FIX definitions.

Note that "label: instruction" is the syntax of the assembly language - that is the language of assembler module. As noted above, preprocessor is a different language on top of that.

In addition to that, ignoring the preprocessor, note that fasm's variant of assembly language (as implemented by fasmg also) actually allows "label1: label2: ... labelN: instruction" syntax.



ProMiNick 23 Oct 2018, 13:32
Let us suppose one could make a multiline comment in the source text. How? One just adds the symbol with code 0 to the source text and writes the comment after it. In the compilation process everything in the file after symbol 0 is stripped out, just as everything between the symbol ; and the CR symbol is stripped out.
Could the fasmw IDE be taught to display the content after the 0 symbol in the editor (treating it as CRLF) and highlight it in the comment color?
Does the source remain a text file? If we resave it with any text editor, it becomes text with the nonessential comment stripped. So it stays text, except for only one binary byte placed in it.
It could be a nice ability, and it was requested some time ago (to place something in the code to mark the code end; the symbol with ASCII code 0 is best for that). It is impossible to enter it from the keyboard, so an additional button would be needed in the IDE.

The second question relates not to the IDE but to fasm internals. If symbol 0 is disallowed, could it be used instead of $1a to mark a symbol that is not a symbol character and not a quoted string?
The end of a tokenized line could be marked by a pair 00 00. Or, since only one of the quote characters is used internally to mark a token as a quoted string, the second remains unused, so it could be used instead of $1a, and just one 0 would be left for marking the end of a tokenized line.

If this were done, all 256 symbols would be allowed for use in fasm source texts.



DimonSoft 23 Oct 2018, 16:48
ProMiNick wrote:
It could be a nice ability, and it was requested some time ago (to place something in the code to mark the code end; the symbol with ASCII code 0 is best for that). It is impossible to enter it from the keyboard, so an additional button would be needed in the IDE.

So, it’s a feature that is difficult to use and that takes C-style null-terminated strings to a new level of craziness. How many people would use it? Do you expect all software to deal with such files without removing the last part? What are the valid use cases for that? Do they make this feature score enough points if it, as any other feature, starts with –100 points? Is it worth future support?

Most non-printable characters are supported badly or not supported at all by most software, except for a few ones like horizontal TAB (#9), New line (#10) and Carriage return (#13). FASM doesn’t exist in its own universe, it has to interoperate with other software. Tricks with #$1A seem to be funny until they’re implemented and people start complaining about other editors being unable to preserve large pieces of code.


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.