flat assembler
Message board for the users of flat assembler.

[fasmg] Would a change to iterate be better?
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 25 May 2020, 10:29
Proposed change:

iterate should ignore a trailing comma,

iterate A,
end iterate

...would not loop with A="". To do that would require:

iterate A,,
end iterate

This change would follow list definitions in many other languages and parallels whitespace-separated lists, where trailing whitespace is not another item.

The present functionality has no way to iterate zero times without additional branching. Copy-pasting lists always requires editing after the paste whenever the trailing item changes.

The repeat instruction already follows a zero-based pattern, so this change would align the functionality within fasmg.

How would it affect existing code? Only the empty case would need to be changed.
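
The difference between the two rules can be modelled outside fasmg. The following is a minimal Python sketch, not fasmg code: the function name `iterate_items` and the `ignore_trailing_comma` flag are made up for illustration, and the sketch covers a single parameter and plain comma-separated text only (no `<...>` grouping).

```python
def iterate_items(value_text, ignore_trailing_comma):
    """Return the items that `iterate A,<value_text>` would loop over.

    Hypothetical model: value_text is everything after the first comma.
    """
    items = [item.strip() for item in value_text.split(",")]
    if ignore_trailing_comma and items[-1] == "":
        # Proposed rule: nothing after the last comma means no further item.
        items.pop()
    return items


# Current behaviour: always at least one (possibly empty) item.
print(iterate_items("", ignore_trailing_comma=False))      # ['']
print(iterate_items("1,2,", ignore_trailing_comma=False))  # ['1', '2', '']

# Proposed behaviour: `iterate A,` loops zero times, `iterate A,,` once.
print(iterate_items("", ignore_trailing_comma=True))       # []
print(iterate_items(",", ignore_trailing_comma=True))      # ['']
print(iterate_items("1,2,", ignore_trailing_comma=True))   # ['1', '2']
```

Under the proposed rule a pasted list such as `1,2,` needs no clean-up, and the empty list falls out naturally as zero iterations.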

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 25 May 2020, 10:46
This specific trait was inherited from fasm 1, where IRP was supposed to behave similarly to a macro with a repeated argument. This is why it was chosen that it should behave analogously to such a macro and always process at least a single argument, even if it is an empty one. Similarly, macros in fasm 1 always treat an additional comma at the end as adding an extra empty argument.

In general, I'm very reluctant to consider any backward-compatibility breaking changes at this point (otherwise I would be attempting changes to what I consider "not the best choices" in some other areas, which I would consider even more important - but still not important enough to consider breaking existing source bases now).

However, if we could be fairly sure that iteration over an empty item at the end of a list is not really used anywhere (which is quite conceivable), this might be something to consider. Because yes, I believe that what you proposed would be unarguably better. There might be a small problem in the case of multiple iterated parameters, but it should be solvable.
Tomasz Grysztar 25 May 2020, 12:26
I prepared a patch, please test it if you have an opportunity:
Code:
--- source/directives.inc
+++ source/directives.inc
@@ -1843,31 +1843,35 @@
       iterator_parameters_declared:
        cmp     al,','
        jne     invalid_iterator
        mov     eax,[number_of_parameters]
        mov     [number_of_values],eax
-       xor     eax,eax
        mov     [value_index],eax
-       inc     eax
-       mov     [number_of_iterations],eax
+       and     [number_of_iterations],0
       collect_iterator_values:
        inc     esi
+       call    move_to_next_symbol
+       jc      initialize_iterator
        mov     edx,expression_workspace
        mov     ecx,sizeof.LineExcerpt
        call    reserve_workspace
        call    cut_argument_value
        add     edi,sizeof.LineExcerpt
        inc     [number_of_values]
        mov     ecx,[value_index]
-       sub     ecx,[number_of_parameters]
-       jc      iterator_value_collected
-       mov     [value_index],ecx
+       cmp     ecx,[number_of_parameters]
+       jne     iterator_value_collected
        inc     [number_of_iterations]
+       xor     ecx,ecx
       iterator_value_collected:
-       inc     [value_index]
+       inc     ecx
+       mov     [value_index],ecx
        cmp     al,','
        je      collect_iterator_values
+      initialize_iterator:
+       cmp     [number_of_iterations],0
+       je      inactive_iterator_block
        mov     dl,DBLOCK_CONTROL
        mov     ecx,5+sizeof.RepeatData
        call    add_directive_block
        mov     [edi+DirectiveBlock.subtype],CTRL_IRP
        or      [edi+DirectiveBlock.flags],CTRLF_BREAKABLE + CTRLF_HAS_REPEAT_DATA + CTRLF_HAS_WRITABLE_INDEX    
I'm going to continue testing it myself on all fasmg projects that I can find - if it does not break anything, I may apply it to the trunk.
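
For readers who do not want to trace the assembly, the counting logic of the patched loop can be restated in a few lines. This is a Python sketch based on one reading of the diff above, not part of fasmg; the function name `count_iterations` is hypothetical, while the variable names follow the patch.

```python
def count_iterations(values, number_of_parameters):
    """Mirror the counting done in the patched collect_iterator_values loop."""
    # value_index starts "full", so the first collected value opens iteration 1.
    value_index = number_of_parameters
    number_of_iterations = 0
    for _ in values:
        if value_index == number_of_parameters:
            number_of_iterations += 1    # a new group of values begins
            value_index = 0
        value_index += 1
    return number_of_iterations


print(count_iterations([], 1))               # 0
print(count_iterations(["a", "b", "c"], 1))  # 3
print(count_iterations(["a", "b", "c"], 2))  # 2
```

With no values collected the counter stays at zero, which is the case the patch routes to `inactive_iterator_block`.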
Tomasz Grysztar 25 May 2020, 17:28
I tested many projects and it seems that this change is completely safe. I have not found a single source text that would actually rely on this behavior, which is yet another argument for this being a good change.

I'm going to just include this change in the official version then. Thank you for bringing it up! I was myself too stuck in "compatibility" thinking to notice this - nearly free - opportunity.

Also, this brings more symmetry between IRP and IRPV. And to iterate over an empty argument you can still do it like:
Code:
iterate A,<>    
in addition to the double-comma variant you mentioned.
bitRAKE 26 May 2020, 03:06
I don't expect you to entertain any of my crazy ideas, but this one seemed reasonable. Like I think all the legacy data sizes should go away, lol. The quick reply is awesome and works well.

bitRAKE 26 May 2020, 05:18
I'm writing a script to extract all the instructions from your implementation of x86:
Code:
file                    lines   labels

assembler.inc           3288    334
calm.inc                2343    222
conditions.inc           655     85
console.inc              331     44
directives.inc          4912    497
errors.inc               224     20
expressions.inc         4130    489
floats.inc              1165    115
map.inc                  227     27
messages.inc              52      0
output.inc               728     77
reader.inc               442     40
symbols.inc             1511    157
tables.inc               419      2
variables.inc            261      0
version.inc                1      0

80186.inc                317     51
80286.inc                 73      9
80287.inc                 10      0
80386.inc               2337    417
80387.inc                129     14
80486.inc                 52      5
8086.inc                1324    240
8087.inc                 502     76
p5.inc                    30      3
p6.inc                    51      5
x64.inc                 3030    554

adx.inc                   20      0
aes.inc                   28      0
avx2.inc                 448     26
avx512_4vnniw.inc         25      0
avx512_bitalg.inc         21      0
avx512bw.inc             404      0
avx512cd.inc              19      0
avx512dq.inc             390      0
avx512er.inc              30      0
avx512f.inc             2363    116
avx512_ifma.inc            8      0
avx512.inc                 7      0
avx512pf.inc              53      0
avx512_vbmi2.inc          44      0
avx512_vbmi.inc           13      0
avx512vl.inc               4      0
avx512_vnni.inc            8      0
avx512_vpopcntdq.inc       8      0
avx.inc                 1082     40
bmi1.inc                  88      0
bmi2.inc                  95      0
cet_ibt.inc                5      0
cet_ss.inc                88      0
f16c.inc                  26      0
fma.inc                   19      0
fsgsbase.inc              20      0
gfni.inc                  34      0
hle.inc                   11      0
invpcid.inc               13      0
mmx.inc                  132     16
movdir64b.inc             20      2
movdiri.inc               17      0
mpx.inc                  182      2
pclmulqdq.inc              9      0
ptwrite.inc               17      0
rdrand.inc                 9      0
rdseed.inc                 9      0
rdtscp.inc                 3      0
rtm.inc                   35      0
smx.inc                    3      0
sse2.inc                 551     10
sse3.inc                  64      0
sse4.1.inc               189      0
sse4.2.inc                49      0
sse.inc                  389     12
ssse3.inc                 23      0
vaes.inc                  26      0
vmx.inc                   56      0
vpclmulqdq.inc            25      0
xsave.inc                 29      0    
...the relative number of labels kind of indicates the CALM conversion rate.
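
To make the "relative number of labels" concrete, here is a small Python sketch that computes labels per 100 lines for a handful of rows copied from the listing above. The helper name `labels_per_100_lines` and the choice of rows are illustrative only.

```python
# (lines, labels) pairs copied from the listing above.
counts = {
    "80386.inc": (2337, 417),
    "x64.inc": (3030, 554),
    "avx512f.inc": (2363, 116),
    "avx.inc": (1082, 40),
    "avx512bw.inc": (404, 0),
}


def labels_per_100_lines(lines, labels):
    """Label density, rounded to one decimal place."""
    return round(100 * labels / lines, 1)


# Print the files from the highest to the lowest label density.
for name, (lines, labels) in sorted(
        counts.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True):
    print(f"{name:<14} {labels_per_100_lines(lines, labels):5.1f}")
```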
Tomasz Grysztar 26 May 2020, 08:17
bitRAKE wrote:
I don't expect you to entertain any of my crazy ideas, but this one seemed reasonable. Like I think all the legacy data sizes should go away, lol.
This one is not even that crazy, they really should have been just a header file. But then most of the existing projects would need to additionally include such file to work. And, anyway, they can be re-defined and overridden easily (although if they were in a header file, they could be defined with ":=" then).

bitRAKE wrote:
I'm writing a script to extract all the instructions from your implementation of x86:
When I was converting the instructions to CALM, I used this simple trick to gather information on which of the not-yet-converted instructions are used the most:
Code:
macro calminstruction?.display?! any&
end macro    

macro macro?! declaration&
        esc macro declaration
        display `declaration,10
end macro    
It helped me prioritize the conversions.
bitRAKE 26 May 2020, 12:31
Clever. Extrapolating, I came up with:
Code:
macro display? D&
        db `D,13,10
end macro

calminstruction calminstruction?! text&
        local B
        match B,text
        jno done
        arrange B,=display B
        assemble B
done:
        arrange B,=calminstruction text
        assemble B
end calminstruction    
...for the others. Couple things to weed out, but the lists look good and take advantage of the groupings you've made.

bitRAKE 26 May 2020, 14:30
This is the stub.inc I prepended to each file:
Code:
if __source__=__file__
macro display? D&
if __source__=__file__
        db `D,13,10
end if
end macro

macro macro?! all&
if __source__=__file__
        display all
end if
        esc macro all
end macro

macro calminstruction?.asmcmd? pattern&
        local   cmd
        arrange cmd, pattern
        assemble cmd
end macro

calminstruction calminstruction?! text&
        local B
        match B,text
        jno done
        arrange B,=display B
        assemble B
done:   arrange B,=calminstruction text
        assemble B
end calminstruction
end if    
...and the batch file:
Code:
@ECHO OFF
:: dependency chain requires these steps to be
:: performed separately
SET BASEPATH="C:\fasmg\packages\x86\include\cpu\"
:: merge stub to each file
FOR /R %BASEPATH% %%G IN (*.inc) DO (
        copy /b .\stub.inc + %%G .\%%~nG.inc
)
:: then process them
FOR /R %BASEPATH% %%G IN (*.inc) DO (
        fasmg .\%%~nG.inc
)
:: remove temp files
FOR /R %BASEPATH% %%G IN (*.inc) DO (
        del .\%%~nG.inc
)
ECHO ON    
...x64.inc is the black sheep of the bunch as it's the only one needing files in \ext. And then there's mmx.inc needing asmcmd. So many paths lead to it that I've added it to the stub. Too lazy to code an include fix for x64.inc - I did that one manually.
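
For anyone not on Windows, the merge step of the batch file can be sketched in portable form. This Python version is an assumption-laden equivalent, not something from the thread: the function name `merge_stub` and its parameters are hypothetical, and it only covers the "prepend stub to each file" step.

```python
from pathlib import Path


def merge_stub(stub_file, source_dir, output_dir):
    """Write stub + original for every .inc file found under source_dir.

    Returns the list of merged files that were written to output_dir.
    """
    stub = Path(stub_file).read_bytes()
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    # sorted() materialises the file list before anything is written.
    for source in sorted(Path(source_dir).rglob("*.inc")):
        target = output_dir / source.name
        target.write_bytes(stub + source.read_bytes())
        merged.append(target)
    return merged
```

Each returned path would then be handed to fasmg, as the second loop of the batch file does, and deleted afterwards.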

bitRAKE 05 Apr 2021, 20:43
A kind of related change I've been playing with:
Code:
@@ -654,11 +654,11 @@ assembly_line:
        cmp     al,27h
        je      convert_quoted_string
        test    al,al
        jz      file_ended
        cmp     al,0Ah
-       je      line_ended
+       jz      maybe_line_ended
        cmp     al,';'
        jne     preprocess_syntactical_character
        xor     edx,edx
        test    [preprocessing_mode],PMODE_RETAIN_COMMENTS
        jz      skip_comment
@@ -679,13 +679,22 @@ assembly_line:
        cmp     al,';'
        je      concatenation_comment
        cmp     al,20h
        jne     preprocess_line_from_file
        jmp     detect_line_concatenation
+    maybe_line_ended:
+       ; empty lines don't count
+       cmp     edi,[preprocessing_workspace.memory_start]
+       jz      line_ended
+       ; final comma character on line forces concatenation
+       cmp     byte [edi-1],','
+       jnz     line_ended
+       jmp     concatenate_comma_line
     concatenate_line:
        mov     byte [edi-1],20h
        inc     esi
+    concatenate_comma_line:
        inc     [ebx+SourceEntry.number_of_attached_lines]
        jmp     preprocess_line_from_file
     concatenation_comment:
        test    [preprocessing_mode],PMODE_RETAIN_COMMENTS
        jnz     preprocess_line_from_file    
...WARNING: this breaks existing code.

I wanted the comma character to automatically imply line continuation. This is just a cosmetic change to make the code look prettier, imho.

What it breaks is a common pattern of passing an empty value to arrange, etc. An easy work-around is to convert lines like:
Code:
arrange CheckSumBlocks,    
to:
Code:
arrange CheckSumBlocks,;    
I'm thinking of maybe using something like ε or ⊖ to represent no value, and then make comma work more like backslash. Then only space-separated arguments will really need the backslash.
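
The effect of the patch on source lines can be illustrated with a small model. This is a Python sketch of the behaviour described in this post, not of fasmg's preprocessor: the name `join_comma_lines` is hypothetical, comments are detected naively at the first `;`, and trailing whitespace after the comma is not considered.

```python
def join_comma_lines(lines):
    """Join each line that ends with ',' to the line that follows it."""
    joined = []
    pending = ""
    for line in lines:
        # Simplification: treat the first ';' as the start of a comment,
        # ignoring the possibility of ';' inside a quoted string.
        code, semicolon, _comment = line.partition(";")
        pending += code
        if not semicolon and pending.endswith(","):
            continue                      # trailing comma: keep collecting
        joined.append(pending)
        pending = ""
    if pending:
        joined.append(pending)
    return joined


print(join_comma_lines(["db 1,", "   2,", "   3"]))
# ['db 1,   2,   3']
print(join_comma_lines(["arrange CheckSumBlocks,;", "db 0"]))
# ['arrange CheckSumBlocks,', 'db 0']
```

The second call shows why the `,;` work-around helps: the comment ends the line before the continuation rule gets a chance to apply.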

bitRAKE 30 Apr 2021, 01:24
Recently, it was useful to output a VIRTUAL block with a more general extension, 'exe.dd64'. By patching one line, fasmg will just use whatever extension is provided. The current distribution does not allow periods in the extension given to VIRTUAL.
Code:
@@ -1148,11 +1148,14 @@ virtual_block:
        mov     eax,[current_area]
        inc     [eax+ValueDefinition.reference_count]
        push    ecx
        call    put_into_map
        pop     ecx
-       jmp     validate_extension
+; allow any extension
+mov esi,edi
+jmp instruction_assembled
     continue_virtual_block:
        and     [leave_opening_parentheses],0
        mov     edi,[expression_workspace.memory_start]
        call    parse_expression
        mov     edi,[expression_workspace.memory_start]    
I understand the concern with opening this up, but it would be nice to replace the whole name. Maybe something like,
Code:
virtual as "/filename.ext"    
...forcing output files to be in the same location, yet allowing the whole name to be changed. Multiple forward/back-slashes would still be invalid.

bitRAKE 02 May 2021, 11:04
I did advance the feature of VIRTUAL output naming, but I've chosen a scheme in conflict with fasmg's current behavior. This is to prevent any overlap in functionality and erroneous behavior.
Code:
create_output_path:
; in:
;  esi - base path and name (cannot be zero length)
;  ebx - file part to change
;  ecx = length of the part
; out:
;  edx - output path (generated in temporary storage)
        push    ecx
        or      ecx,-1
        mov     edi,esi
        xor     al,al
        repnz   scasb
        mov     ecx,edi
    locate_part:
        dec     edi
        cmp     edi,esi
        jz      no_path
        mov     al,[edi]
        cmp     al,'\'
        jz      no_ext
        cmp     al,'/'
        jz      no_ext
        cmp     al,'.'
        jnz     locate_part

        cmp     dword [esp],0
        jz      truncate_base
        cmp     [ebx],al
        jz      truncate_base
        jmp     locate_part
    no_ext:
        inc     edi
    no_path:
        cmp     dword [esp],0
        jz      copy_base
        cmp     byte [ebx],'.'
        jz      copy_base

        inc     ebx
    truncate_base:
        inc     edi
        mov     ecx,edi
    copy_base:
        sub     ecx,esi

        mov     edi,[preprocessing_workspace.memory_start]
        mov     edx,preprocessing_workspace
        push    ecx
        call    reserve_workspace
        pop     ecx
        dec     ecx
        rep     movsb

        pop     ecx
        jecxz   part_attached
        mov     esi,ebx
        push    ecx
        inc     ecx
        call    reserve_workspace
        pop     ecx
        rep     movsb
    part_attached:
        xchg    eax,ecx
        stosb
        mov     edx,[preprocessing_workspace.memory_start]
        retn    
Code:
format binary as '\taco.exe'
virtual as '/my_data.dat'
virtual as '.exe.dd64'    
So, there are two modalities: '.ext' to change the extension, and '\filename.ext' or '/filename.ext' to change the file name. I've also preserved the zero length ext behavior of the original code. Register usage has been changed and I've propagated those changes through the dependency chain.
Code:
ebx <-> esi

write_output_file:
write_auxiliary_output_area:
        iterate_through_map:
                release_file_data:
                release_auxiliary_output:    
Strings which are valid for this mod are not valid for fasmg, and vice versa.

Updated the extension validation as well.
Code:
    validate_extension:
        jecxz   extension_valid
        lodsb
        cmp     al,'.'
        jz      extension_valid
        cmp     al,'/'
        jz      extension_valid
        cmp     al,'\'
        jz      extension_valid
        jmp     invalid_argument
    extension_valid:
        mov     esi,edi
        jmp     instruction_assembled    
...so, it is possible to do sub-directories in this form (but fasmg doesn't create directories - they must exist). The security-minded would want to block the ".." stuff, too. I'm okay with this relaxed utility.
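
The two modalities and the relaxed validation can be summarised in a short model. This Python sketch follows one reading of `create_output_path` and `validate_extension` above. The names `is_valid_part` and `output_path` are hypothetical, and edge cases such as a dot or separator in the very first character of the base path are not modelled.

```python
def is_valid_part(part):
    """Mirror validate_extension: empty, or starting with '.', '/' or a backslash."""
    return part == "" or part[0] in "./\\"


def output_path(base, part):
    """Combine the base output path with the string given after `as`."""
    if not is_valid_part(part):
        raise ValueError("invalid argument")
    sep = max(base.rfind("/"), base.rfind("\\"))   # -1 when there is no directory
    if part[:1] in ("/", "\\"):
        # File-name mode: keep the directory of base, replace the whole name.
        return base[:sep + 1] + part[1:]
    dot = base.rfind(".")
    stem = base[:dot] if dot > sep else base       # drop the old extension, if any
    return stem + part                             # part is "" or ".ext..."


print(output_path("build/demo.asm", ".exe.dd64"))     # build/demo.exe.dd64
print(output_path("build/demo.asm", "/my_data.dat"))  # build/my_data.dat
print(output_path("build/demo.asm", ""))              # build/demo
```

The base path `build/demo.asm` is an invented example. A part such as `/sub/x.bin` passes this check as well, which is the sub-directory case mentioned above.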

(It also makes building and testing fasmg mods easier. ;-)

Edit: the patch/diff.

bitRAKE 07 Mar 2023, 07:09
With only three lines of code, we can add multi-line quoted strings:
Code:
@@ -179,9 +179,9 @@ tokenize_source:
    copy_string:
        mov     al,[esi]
        cmp     al,0Dh
        je      copy_string_lead_check
        cmp     al,0Ah
        je      copy_string_lead_check
        cmp     al,1Ah
        je      broken_string
        test    al,al
@@ -196,6 +196,10 @@ tokenize_source:
        mov     [edi+ecx],al
        inc     ecx
        jmp     copy_string
    copy_string_lead_check:
        inc     esi
        jecxz   copy_string
        jmp     copy_string_character
    broken_string:
        mov     byte [edi-5],27h
    finish_string_token:    
... this will allow code like:
Code:
display "
┏━━━━━━━━━┳━━━━━━━━━┓
┃  Box A  ┃  Box B  ┃
┣━━━━━━━━━╋━━━━━━━━━┫
┃  Box C  ┃  Box D  ┃
┗━━━━━━━━━┻━━━━━━━━━┛
"    
We should make note of a few things: newlines at the start of the string are ignored, but those at the end are included. This asymmetry is useful. Also, newlines are encoded based on the source encoding - this is no different than other fasmg features.
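
The asymmetry can be stated in one line of Python. This is only a model of the rule described here (line breaks are skipped while no character has been copied yet); the function name `multiline_string_value` is made up for illustration.

```python
def multiline_string_value(raw_between_quotes):
    """Drop line breaks before the first real character, keep all later ones."""
    return raw_between_quotes.lstrip("\r\n")


box = "\n┏━━━┓\n┃ A ┃\n┗━━━┛\n"
print(repr(multiline_string_value(box)))
```

The opening quote can therefore sit on its own line without adding a blank first row, while the final line break before the closing quote stays in the string.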

Tomasz Grysztar 07 Mar 2023, 08:22
I have sometimes considered a change to the tokenizer that could generate two (or possibly more) copies of every tokenized source, using different string parsing rules, to allow switching between the classic assembly quoting rules that fasm uses and C-like strings with "\" as the escape character. An assembly-time command (similar to RETAINCOMMENTS) would then allow switching to a different tokenization (which in practice would mean switching to a different pre-tokenized copy in the source cache). The only conceptual problem I see is the algorithm to decide where to resume assembly in the alternative source.
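
To illustrate what "different string parsing rules" would mean for the same source text, here is a Python sketch of the two rules side by side. It is not fasmg's tokenizer: both function names are hypothetical, and the set of escapes in `ESCAPES` is an assumption chosen for the example.

```python
def parse_fasm_string(text):
    """Classic rule: the quote ends the string; a doubled quote is a literal one."""
    quote = text[0]
    out, i = [], 1
    while i < len(text):
        if text[i] == quote:
            if text[i + 1:i + 2] == quote:   # doubled quote -> one literal quote
                out.append(quote)
                i += 2
                continue
            return "".join(out)
        out.append(text[i])
        i += 1
    raise ValueError("missing end quote")


# Assumed escape set for the C-like variant (illustrative, not exhaustive).
ESCAPES = {"n": "\n", "t": "\t", "0": "\0", "\\": "\\", "'": "'", '"': '"'}


def parse_c_like_string(text):
    """C-like rule: a backslash escapes the next character."""
    quote = text[0]
    out, i = [], 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text) or text[i + 1] not in ESCAPES:
                raise ValueError("unknown escape sequence")
            out.append(ESCAPES[text[i + 1]])
            i += 2
            continue
        if ch == quote:
            return "".join(out)
        out.append(ch)
        i += 1
    raise ValueError("missing end quote")
```

The same characters `"a\n"` give a three-character string under the classic rule and a two-character string under the C-like rule, which is why each rule would need its own pre-tokenized copy.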
bitRAKE 07 Mar 2023, 12:30
The flexibility of the design is certainly one of its greatest features. For example, strings could be prefixed to specify the desired encoding during tokenization. Kind of an internalization of the \include\encoding\ files, though. What would be the benefit over a CALM implementation of C-style strings?

There is also potential to mimic string templates like: db "Some value: ${expression}"... although this seems like just a compact way of expressing existing functionality. It's the same with the multi-line quotes - just wanting the source to look a certain way.

The hidden line ending becoming part of the product is problematic, especially if one wants the same file to work in multiple places.

Edit: I didn't address the line count - need to count and signal the consumption of lines.
Code:
display "
┏━━━━━━━━━┳━━━━━━━━━┓
┃  Box A  ┃  Box B  ┃
┣━━━━━━━━━╋━━━━━━━━━┫
┃  Box C  ┃  Box D  ┃
┗━━━━━━━━━┻━━━━━━━━━┛
"
repeat 1,L:__LINE__
display "__LINE__ = ",`L,9,"; 8?"
end repeat    
... if I understand correctly, it is sufficient to just:
Code:
mov ecx,[consumed_lines]
mov al,0ah
rep stosb    
... after the string.
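
The line-count problem and the suggested fix can be checked with a toy model. This Python sketch is not the fasmg tokenizer: the function name `last_line_number` is hypothetical, only double-quoted strings are recognised, and doubled quotes are ignored.

```python
def last_line_number(source, restore_consumed_lines):
    """Line number an assembler would report for the last line of source."""
    line_breaks = 0      # line breaks the assembler gets to see
    consumed = 0         # line breaks swallowed by the current string
    in_string = False
    for ch in source:
        if ch == '"':
            if in_string and restore_consumed_lines:
                line_breaks += consumed      # re-emit them after the string
            consumed = 0
            in_string = not in_string
        elif ch == "\n":
            if in_string:
                consumed += 1
            else:
                line_breaks += 1
    return line_breaks + 1


# Same shape as the example above: a string spanning lines 1 to 7.
source = 'display "\n' + "box\n" * 5 + '"\n' + "repeat 1,L:__LINE__"
print(last_line_number(source, restore_consumed_lines=False))  # 2
print(last_line_number(source, restore_consumed_lines=True))   # 8
```

In this model the six line breaks inside the string are what separates the reported line 2 from the expected line 8.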
