TABLES.INC token classifier

bitRAKE 02 Apr 2013, 22:57
Every time FASM changes, tools that manipulate FASM source code need to change as well, for example the syntax highlighters used by editors. Many people have probably written scripts to accomplish this, but I thought it could be done with FASM itself in the following way:
Code:
; General FASM token classifier

SEPARATOR fix " "
GROUP fix 13,10

CLASS equ
LENGTH equ

macro include_variable [A] {}
macro symbol_characters [A] {}

macro preprocessor_directives [A] {
  CLASS equ PREPROCESSOR
  LENGTH equ PREFIX
}
macro macro_directives [A] {
  CLASS equ PREPROCESSOR
  LENGTH equ PREFIX
}

macro operators [A] {
  CLASS equ OPERATORS.CONFLICT
  LENGTH equ PREFIX
}
macro single_operand_operators [A] {
  CLASS equ OPERATORS.CONFLICT
  LENGTH equ PREFIX
}
macro directive_operators [A] {
  CLASS equ OPERATORS
  LENGTH equ PREFIX
}
macro address_sizes [A] {
  CLASS equ OPERATORS
  LENGTH equ PREFIX
}

macro symbols [A] {
  rept 10 n:2 \{
    macro symbols_\#n [B] \\{
      CLASS equ SYMBOL
      LENGTH equ n
    \\}
  \}
  macro symbols_end [B] \{ CLASS equ \}
  CLASS equ
}

macro instructions [A] {
  rept 15 n:2 \{
    macro instructions_\#n [B] \\{
      CLASS equ INSTRUCTION
      LENGTH equ n
    \\}
  \}
  macro instructions_end [B] \{ CLASS equ \}
  CLASS equ
}

macro data_directives [A] {
  rept 3 n:2 \{
    macro data_directives_\#n [B] \\{
      CLASS equ DATAD
      LENGTH equ n
    \\}
  \}
  macro data_directives_end [B] \{ CLASS equ \}
  CLASS equ
}
;#######################################

..__db equ db

macro db [A] { common
  match =PREPROCESSOR,CLASS \{
    match B=,C,A \\{
      ; add string (C) of length (B) to CLASS
    \\}
  \}
  match =OPERATORS,CLASS \{
    match B=,C=,D,A \\{
      ; add string (C) of length (B) to CLASS
    \\}
  \}
  match =SYMBOL,CLASS \{
    match B=,C=,D,A \\{
      match =10h,C \\\{
        CLASS equ REGISTER
      \\\}
      ; add string (B) of LENGTH to CLASS
      match =10h,C \\\{
        restore CLASS
      \\\}
    \\}
  \}
  match =INSTRUCTION,CLASS \{
    DW_DELAY equ A ; delay until class refinement in DW
  \}
  match =DATAD,CLASS \{
    match B=,C,A \\{
      ; add string (B) of LENGTH to CLASS
    \\}
  \}
}
macro dw [A] { common
  ; used to split INSTRUCTION/DIRECTIVE classes
  match =INSTRUCTION,CLASS \{
    match B=-C,A \\{
        virtual
          ..__db \\`B
          load d dword from $-4
        end virtual
      match D=,E,DW_DELAY \\\{
        if d="tive"
          ..__db D,SEPARATOR
        else
        end if
      \\\}
    \\}
  \}
}

include "..\..\FASM\SOURCE\TABLES.INC"

...next I will create some templates for common editors: npp, PSPad, x86lab, etc. There are also some conflicts to resolve (i.e. the reason all highlighters have problems with FASM syntax). Maybe I will add another abstraction layer for CLASS precedence.

If you are working on an IDE then maybe this work will be useful for you.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
hopcode 03 Apr 2013, 05:56
Yep, useful :) Until now I did this with regex + sort (also because I merge in MASM/NASM keywords). I think I will use it to output the class dictionary for GeSHi and Scintilla. The lexer in Scintilla has little awareness of fasm syntax; it is simple to tweak, but I haven't done it because I am a bit lazy. I think the code above could go in the examples section, because it works essentially as a tokenizer for macro languages: fasm is invoked several times on intermediate-stage inputs, until the last stage outputs the executable.
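For readers who have not seen the regex + sort approach mentioned here, this is a minimal Python sketch of it. The sample text is only an assumed, simplified imitation of the TABLES.INC layout (a label, then db length,'name' entries, each followed by a dw handler offset); the handler names and the function extract_keywords are made up for illustration.

```python
import re
from collections import defaultdict

# Illustrative excerpt only. The layout is an assumption based on the macros
# and output in this thread, and the handler names are invented.
SAMPLE = """\
preprocessor_directives:
 db 6,'define'
 dw define_handler-directive_handler
 db 7,'include'
 dw include_handler-directive_handler
 db 0
macro_directives:
 db 7,'forward'
 dw forward_handler-directive_handler
 db 6,'common'
 dw common_handler-directive_handler
 db 0
"""

LABEL_RE = re.compile(r"^(\w+):\s*$")                      # e.g. "operators:"
ENTRY_RE = re.compile(r"^\s*db\s+(\d+)\s*,\s*'([^']+)'")   # e.g. " db 6,'define'"


def extract_keywords(text):
    """Group quoted names by the label they appear under, sorted per group."""
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        label = LABEL_RE.match(line)
        if label:
            current = label.group(1)
            continue
        entry = ENTRY_RE.match(line)
        if entry and current is not None:
            length, name = int(entry.group(1)), entry.group(2)
            if length == len(name):  # sanity check on the length byte
                groups[current].append(name)
    return {section: sorted(names) for section, names in groups.items()}


keywords = extract_keywords(SAMPLE)
print(keywords)
```

The weakness of this route is the one the first post points out: the regular expressions have to be revisited whenever the table layout changes, whereas the macro version lets FASM do the parsing.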

_________________
⠓⠕⠏⠉⠕⠙⠑
hopcode 04 Apr 2013, 16:29
Keep It Simple, Stupid? ;)
Code:
sect equ
line equ 0

macro dw [arg]{}

macro db arg1,arg2,[args]{
 if arg1 eqtype 0
   if arg2 eqtype ''
     display arg2,13,10
   else if arg2 eqtype 0
     if sect eq symbol_characters
       ;---    display_hex arg1
       ;---    display_hex arg2
       match a,args\{
         display \a ;--- hex for 9,0Ah,0Dh,1Ah,20h
       \}
       display 13,10
     end if
   end if
 else if arg1 eqtype ''
   display arg1,13,10
 end if
}

macro @output arg{
 local a
 macro arg a\{
  sect equ arg
  display `arg,13,10
 \}
}

 @output symbol_characters
 @output preprocessor_directives
 @output macro_directives
 @output operators
 @output single_operand_operators
 @output directive_operators
 @output address_sizes
 @output symbols_2
 @output symbols_3
 @output symbols_4
 @output symbols_5
 @output symbols_6
 @output symbols_7
 @output symbols_8
 @output symbols_9
 @output symbols_10
 @output symbols_11
 @output symbols_12             ;--- doesn't exist, test
 @output instructions_2
 @output instructions_3
 @output instructions_4
 @output instructions_5
 @output instructions_6
 @output instructions_7
 @output instructions_8
 @output instructions_9
 @output instructions_10
 @output instructions_11
 @output instructions_12
 @output instructions_13
 @output instructions_14
 @output instructions_15
 @output instructions_16 ;--- doesn't exist
 @output data_directives_2
 @output data_directives_3
 @output data_directives_4

virtual at 0
 include "tables.inc"
end virtual
    

Output:
Code:
flat assembler  version 1.71.09  (1048576 kilobytes memory)
symbol_characters
;<--- here should display hex ------------
+-/*=<>()[]{}:,|&~#`;\
preprocessor_directives
define
include
irp
irps
macro
match
purge
rept
restore
restruc
struc
macro_directives
common
forward
local
reverse
operators
+
-
*
/
and
mod
or
shl
shr
xor
single_operand_operators
+
-
not
plt
rva
directive_operators
align
as
at
;---- follows ....
    

It may count lines too! It's so K.I.S.S. that it deserves the XML format.
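As a rough illustration of the XML idea, the display output shown above could be post-processed by a small script. This Python sketch groups the listing by section name and serializes it with xml.etree.ElementTree. The element names tokens, class and token, and the helper names, are assumptions and not an existing editor format.

```python
import xml.etree.ElementTree as ET

# Section names taken from the @output calls above (subset only).
SECTIONS = {
    "symbol_characters", "preprocessor_directives", "macro_directives",
    "operators", "single_operand_operators", "directive_operators",
    "address_sizes",
}

# A shortened copy of the listing printed by fasm.
LISTING = """\
flat assembler  version 1.71.09  (1048576 kilobytes memory)
macro_directives
common
forward
local
reverse
single_operand_operators
+
-
not
plt
rva
"""


def group_listing(text):
    """Map each section name to the token lines that follow it."""
    groups = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line in SECTIONS:
            current = groups.setdefault(line, [])
        elif current is not None:   # lines before the first section (banner) are skipped
            current.append(line)
    return groups


def to_xml(groups):
    """Serialize the grouped tokens; special characters are escaped by ElementTree."""
    root = ET.Element("tokens")
    for section, names in groups.items():
        node = ET.SubElement(root, "class", name=section)
        for name in names:
            ET.SubElement(node, "token").text = name
    return ET.tostring(root, encoding="unicode")


groups = group_listing(LISTING)
xml_text = to_xml(groups)
print(xml_text)
```

A token that happens to be spelled exactly like a section name would be misread as a section header, so a real tool would need a stricter marker between sections.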
Cheers,
:D

bitRAKE 04 Apr 2013, 20:06
Here are the exceptions/notes I have collected:
Code:
not in TABLES.INC:
       - FIX EQU
       - @@ @B @F @R %T

PREPROCESSOR
  - MACRO, only valid inside of {}
OPERATOR
  - instruction
      - and or shl shr xor not
  - directive
      - align
  - address
SYMBOL
  - REGISTER
  - MODIFIER (type, directive)
INSTRUCTION
  - DIRECTIVE
DATAD    
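
One way a CLASS precedence layer could look, sketched in Python rather than FASM macros. Only the overlaps themselves (and as both operator and instruction, align as both operator and directive) come from the notes above; the precedence order, the table contents and the function classify are purely hypothetical.

```python
# Hypothetical precedence order; the thread does not settle on one.
PRECEDENCE = [
    "PREPROCESSOR", "INSTRUCTION", "DIRECTIVE",
    "OPERATOR", "REGISTER", "SYMBOL", "DATAD",
]

# Each token is mapped to every class it can belong to.
MEMBERSHIP = {
    "and":   {"OPERATOR", "INSTRUCTION"},
    "align": {"OPERATOR", "DIRECTIVE"},
    "mod":   {"OPERATOR"},
    "macro": {"PREPROCESSOR"},
}


def classify(token, membership=MEMBERSHIP, precedence=PRECEDENCE):
    """Return the highest-precedence class of a token, or None if unknown."""
    classes = membership.get(token, set())
    for name in precedence:
        if name in classes:
            return name
    return None


for word in ("and", "align", "mod", "my_label"):
    print(word, classify(word))
```

A fixed order like this cannot tell an and instruction from an and operator by context, so it only decides which colour wins when a highlighter has to pick one.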

Last edited by bitRAKE on 05 Apr 2013, 00:20; edited 1 time in total
baldr 04 Apr 2013, 21:54
bitRAKE,

In a macro language like the one fasm provides, probably the only thing you can be sure about is that in «a fix b» a is a symbol, fix is the directive and b is a replacement for a. Everything else requires preprocessing (except comments, of course) ;)

To stay on topic, equ is not in TABLES.INC either.
bitRAKE 05 Apr 2013, 00:20
Too bad we can't "f fix fix" to save a few characters. Even crazy stuff like:
Code:
\%?$ fix macro

\%?$ a b{db b}    
... is possible! Yet, it is not possible to:
Code:
C fix ;
dnl fix ;

C This type of comment syntax is not valid in FASM.
C Macro for AT&T syntax not possible.

; Of course, macro works (kind of):
macro C d {}
macro dnl d {}    
Yeah, FASM's flexibility is both a source of joy and frustration. :D

Thanks for the reminder about EQU. Having a concise list will speed up future adaptation to syntax changes. We should make an obfuscator that packs source code into the fewest bytes that still compile to the original target, lol.

hopcode 05 Apr 2013, 03:06
An obfuscator is an interesting idea: for on-line compilation over HTTP it could be very useful for reducing bandwidth. The packed source still contains the syntax, so it doesn't need decompression. I estimate the client would send 1/3 to 1/5 of the source to the server. On the server side, diffing would be simpler, which helps keep it updated.

hopcode 05 Apr 2013, 09:02
baldr wrote:
bitRAKE, probably the only thing you can be sure about is that in «a fix b» a is a symbol, fix is the directive and b is a replacement for a. Everything else requires preprocessing (except comments, of course) ;)

I'll show you something. Your comments here are very much appreciated ;)
Code:
da equ db
comm equ ;
da 1
comm fix \;
comm aaaaaaa
    

Output:
Code:
E:\old_tests_x64lab>fasm equ_test.asm
flat assembler  version 1.71.09  (1048576 kilobytes memory)
1 passes, 1 bytes.

E:\old_tests_x64lab>
    

Cheers,
:D

baldr 05 Apr 2013, 10:03
hopcode wrote:
Your comments here are very much appreciated ;)
What exactly do you expect?
  1. da is defined as «db»; nothing special;
  2. comm is defined as empty;
  3. da is replaced with «db»; this yields «db 1», a valid assembler directive;
  4. line 5 is concatenated with the current line (line 4 ends with \, and the ; after it only starts a comment); this yields «comm fix comm aaaaaaa», which works as expected.
Maybe you thought that line 5 would expand to something like «; aaaaaaa»? Add another line, «comm bbbbbbb», and you'll be surprised. ;)
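
The concatenation in step 4 can be modelled with a few lines of Python. This is a simplified sketch of the continuation rule as described in this post (a backslash ending a line, optionally followed by a ; comment, attaches the next line). It ignores quoted strings, the function names are made up, and it is not how fasm itself is implemented.

```python
def strip_comment(line):
    """Drop everything from the first ';' on.

    Simplification: a ';' inside a quoted string is NOT handled.
    """
    return line.split(";", 1)[0].rstrip()


def join_continued_lines(lines):
    """Merge lines ending in a backslash with the line that follows them."""
    logical = []
    pending = ""
    for raw in lines:
        text = strip_comment(raw)
        if text.endswith("\\"):
            # Keep the part before the backslash and wait for the next line.
            pending += text[:-1].rstrip() + " "
            continue
        combined = (pending + text.strip()).strip()
        pending = ""
        if combined:
            logical.append(combined)
    if pending.strip():
        logical.append(pending.strip())
    return logical


source = [
    "da equ db",
    "comm equ ;",
    "da 1",
    "comm fix \\;",
    "comm aaaaaaa",
]
for line in join_continued_lines(source):
    print(line)
```

For the five lines posted above the model yields four logical lines: comm equ ends up with an empty value, and the last line is comm fix comm aaaaaaa, matching steps 2 and 4.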
hopcode 05 Apr 2013, 10:49
Yes, as expected. I was "speculating" about equ above: it should have lower or equal priority compared to fix. Then I was looking for something that can be used repeatedly without braces {}:
Code:
comm fix times 0 \;
comm 1
comm db 10

comm aaaaa
    

The output seems OK. Is it legal? I think yes. My initial point was about «comm equ times 0 \;», and that one works too.
What do you think?

baldr 05 Apr 2013, 16:55
hopcode,

Yes, it works (but probably not as you think): the first line is concatenated with the second (I'm out of guesses as to why you're doing so), giving «comm fix times 0 comm 1», which, while looking somewhat strange, successfully neutralizes anything at the assembler stage, provided it's used as the first token (after optional labels, naturally).

I think a simple ; is better. ;)
hopcode 06 Apr 2013, 02:51
Yes, ; works better in most cases! I am collecting and illustrating the syntax with simple examples and descriptions. Thank you for your huge patience.
Cheers,
:D
