flat assembler
Message board for the users of flat assembler.
[fasmg] Would a change to iterate be better?
Tomasz Grysztar 25 May 2020, 10:46
This specific trait was inherited from fasm 1, where IRP was supposed to behave similarly to a macro with a repeated argument. This is why it was chosen that it should behave analogously to such a macro and always process at least a single argument, even an empty one. Similarly, macros in fasm 1 always treat an additional comma at the end as adding another empty argument.
In general, I'm very reluctant to consider any changes that break backward compatibility at this point (otherwise I would be attempting changes to what I consider "not the best choices" in some other areas, which I would consider even more important - but still not important enough to justify breaking existing source bases now). However, if we could be fairly sure that iteration over an empty item at the end of a list is not really used anywhere (which is quite conceivable), this might be something to consider. Because yes, I believe that what you proposed would be unarguably better. There might be a small problem in the case where there are multiple iterated parameters, but it should be solvable.
Tomasz Grysztar 25 May 2020, 12:26
I prepared a patch, please test it if you have an opportunity:
Code:
--- source/directives.inc
+++ source/directives.inc
@@ -1843,31 +1843,35 @@
     iterator_parameters_declared:
         cmp al,','
         jne invalid_iterator
         mov eax,[number_of_parameters]
         mov [number_of_values],eax
-        xor eax,eax
         mov [value_index],eax
-        inc eax
-        mov [number_of_iterations],eax
+        and [number_of_iterations],0
     collect_iterator_values:
         inc esi
+        call move_to_next_symbol
+        jc initialize_iterator
         mov edx,expression_workspace
         mov ecx,sizeof.LineExcerpt
         call reserve_workspace
         call cut_argument_value
         add edi,sizeof.LineExcerpt
         inc [number_of_values]
         mov ecx,[value_index]
-        sub ecx,[number_of_parameters]
-        jc iterator_value_collected
-        mov [value_index],ecx
+        cmp ecx,[number_of_parameters]
+        jne iterator_value_collected
         inc [number_of_iterations]
+        xor ecx,ecx
     iterator_value_collected:
-        inc [value_index]
+        inc ecx
+        mov [value_index],ecx
         cmp al,','
         je collect_iterator_values
+    initialize_iterator:
+        cmp [number_of_iterations],0
+        je inactive_iterator_block
         mov dl,DBLOCK_CONTROL
         mov ecx,5+sizeof.RepeatData
         call add_directive_block
         mov [edi+DirectiveBlock.subtype],CTRL_IRP
         or [edi+DirectiveBlock.flags],CTRLF_BREAKABLE + CTRLF_HAS_REPEAT_DATA + CTRLF_HAS_WRITABLE_INDEX
Tomasz Grysztar 25 May 2020, 17:28
I tested many projects and it seems that this change is completely safe, I have not found a single source text that would actually rely on this behavior - which is yet another argument for this being a good change.
I'm going to just include this change in the official version then. Thank you for bringing it up! I was myself too stuck in "compatibility" thinking to notice this - nearly free - opportunity. Also, this brings more symmetry between IRP and IRPV. And to iterate over an empty argument, you can still write:
Code:
iterate A,<>
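To make the difference concrete, here is a small Python model of how the item list of a single-parameter iterate is counted before and after the change. It is only a sketch of the behaviour described in this thread, not fasmg's implementation: the names `split_top_level` and `iterate_items` and the `legacy` flag are made up for illustration, the model only understands top-level commas and a `<...>` wrapper, and it assumes that empty items in the middle of a list are still kept.

```python
def split_top_level(text):
    """Split on commas that are not enclosed in <...> (simplified model)."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">" and depth > 0:
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def iterate_items(text, legacy=False):
    """Return the values an 'iterate A,<text>' block would loop over (model)."""
    raw = [part.strip() for part in split_top_level(text)]
    # New behaviour: nothing after the last comma means no extra item.
    # Legacy behaviour: the trailing empty item is still iterated over.
    if not legacy and raw[-1] == "":
        raw.pop()
    # An explicit <...> wrapper yields its contents, so "<>" is an empty item.
    return [p[1:-1] if len(p) >= 2 and p[0] == "<" and p[-1] == ">" else p
            for p in raw]


if __name__ == "__main__":
    print(iterate_items("a,b,", legacy=True))  # ['a', 'b', '']
    print(iterate_items("a,b,"))               # ['a', 'b']
    print(iterate_items(""))                   # []
    print(iterate_items("<>"))                 # ['']
```

In this model an empty list gives zero iterations after the change, which corresponds to the new `je inactive_iterator_block` branch in the patch.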
bitRAKE 26 May 2020, 03:06
I don't expect you to entertain any of my crazy ideas, but this one seemed reasonable. Like I think all the legacy data sizes should go away, lol. The quick reply is awesome and works well.
_________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
bitRAKE 26 May 2020, 05:18
I'm writing a script to extract all the instructions from your implementation of x86:
Code:
file                   lines  labels
assembler.inc           3288     334
calm.inc                2343     222
conditions.inc           655      85
console.inc              331      44
directives.inc          4912     497
errors.inc               224      20
expressions.inc         4130     489
floats.inc              1165     115
map.inc                  227      27
messages.inc              52       0
output.inc               728      77
reader.inc               442      40
symbols.inc             1511     157
tables.inc               419       2
variables.inc            261       0
version.inc                1       0
80186.inc                317      51
80286.inc                 73       9
80287.inc                 10       0
80386.inc               2337     417
80387.inc                129      14
80486.inc                 52       5
8086.inc                1324     240
8087.inc                 502      76
p5.inc                    30       3
p6.inc                    51       5
x64.inc                 3030     554
adx.inc                   20       0
aes.inc                   28       0
avx2.inc                 448      26
avx512_4vnniw.inc         25       0
avx512_bitalg.inc         21       0
avx512bw.inc             404       0
avx512cd.inc              19       0
avx512dq.inc             390       0
avx512er.inc              30       0
avx512f.inc             2363     116
avx512_ifma.inc            8       0
avx512.inc                 7       0
avx512pf.inc              53       0
avx512_vbmi2.inc          44       0
avx512_vbmi.inc           13       0
avx512vl.inc               4       0
avx512_vnni.inc            8       0
avx512_vpopcntdq.inc       8       0
avx.inc                 1082      40
bmi1.inc                  88       0
bmi2.inc                  95       0
cet_ibt.inc                5       0
cet_ss.inc                88       0
f16c.inc                  26       0
fma.inc                   19       0
fsgsbase.inc              20       0
gfni.inc                  34       0
hle.inc                   11       0
invpcid.inc               13       0
mmx.inc                  132      16
movdir64b.inc             20       2
movdiri.inc               17       0
mpx.inc                  182       2
pclmulqdq.inc              9       0
ptwrite.inc               17       0
rdrand.inc                 9       0
rdseed.inc                 9       0
rdtscp.inc                 3       0
rtm.inc                   35       0
smx.inc                    3       0
sse2.inc                 551      10
sse3.inc                  64       0
sse4.1.inc               189       0
sse4.2.inc                49       0
sse.inc                  389      12
ssse3.inc                 23       0
vaes.inc                  26       0
vmx.inc                   56       0
vpclmulqdq.inc            25       0
xsave.inc                 29       0
Tomasz Grysztar 26 May 2020, 08:17
bitRAKE wrote: I don't expect you to entertain any of my crazy ideas, but this one seemed reasonable. Like I think all the legacy data sizes should go away, lol.
bitRAKE wrote: I'm writing a script to extract all the instructions from your implementation of x86:
Code:
macro calminstruction?.display?! any&
end macro

macro macro?! declaration&
        esc macro declaration
        display `declaration,10
end macro
bitRAKE 26 May 2020, 12:31
Clever. Extrapolating from that, I came up with:
Code:
macro display? D&
        db `D,13,10
end macro

calminstruction calminstruction?! text&
        local B
        match B,text
        jno done
        arrange B,=display B
        assemble B
    done:
        arrange B,=calminstruction text
        assemble B
end calminstruction
bitRAKE 26 May 2020, 14:30
This is the stub.inc that I prepended to each file:
Code:
if __source__=__file__

macro display? D&
        if __source__=__file__
                db `D,13,10
        end if
end macro

macro macro?! all&
        if __source__=__file__
                display all
        end if
        esc macro all
end macro

macro calminstruction?.asmcmd? pattern&
        local cmd
        arrange cmd, pattern
        assemble cmd
end macro

calminstruction calminstruction?! text&
        local B
        match B,text
        jno done
        arrange B,=display B
        assemble B
    done:
        arrange B,=calminstruction text
        assemble B
end calminstruction

end if
Code:
@ECHO OFF
:: dependency chain requires these steps to be
:: performed separately
SET BASEPATH="C:\fasmg\packages\x86\include\cpu\"

:: merge stub to each file
FOR /R %BASEPATH% %%G IN (*.inc) DO (
        copy /b .\stub.inc + %%G .\%%~nG.inc
)
:: then process them
FOR /R %BASEPATH% %%G IN (*.inc) DO (
        fasmg .\%%~nG.inc
)
:: remove temp files (the merged copies carry the .inc extension)
FOR /R %BASEPATH% %%G IN (*.inc) DO (
        del .\%%~nG.inc
)
ECHO ON
bitRAKE 05 Apr 2021, 20:43
A kind of related change I've been playing with:
Code:
@@ -654,11 +654,11 @@ assembly_line:
         cmp al,27h
         je convert_quoted_string
         test al,al
         jz file_ended
         cmp al,0Ah
-        je line_ended
+        jz maybe_line_ended
         cmp al,';'
         jne preprocess_syntactical_character
         xor edx,edx
         test [preprocessing_mode],PMODE_RETAIN_COMMENTS
         jz skip_comment
@@ -679,13 +679,22 @@ assembly_line:
         cmp al,';'
         je concatenation_comment
         cmp al,20h
         jne preprocess_line_from_file
         jmp detect_line_concatenation
+    maybe_line_ended:
+        ; empty lines don't count
+        cmp edi,[preprocessing_workspace.memory_start]
+        jz line_ended
+        ; final comma character on line forces concatenation
+        cmp byte [edi-1],','
+        jnz line_ended
+        jmp concatenate_comma_line
     concatenate_line:
         mov byte [edi-1],20h
         inc esi
+    concatenate_comma_line:
         inc [ebx+SourceEntry.number_of_attached_lines]
         jmp preprocess_line_from_file
     concatenation_comment:
         test [preprocessing_mode],PMODE_RETAIN_COMMENTS
         jnz preprocess_line_from_file
I wanted the comma character to automatically imply line continuation. This is just a cosmetic change to make the code look prettier, imho. What it breaks is the common pattern of passing an empty value to arrange, etc. An easy work-around is to convert lines like:
Code:
arrange CheckSumBlocks,
to:
Code:
arrange CheckSumBlocks,;
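The effect of this patch on source lines can be sketched in Python. This is a rough model, assuming the check looks only at the very last character collected before the line break (as `cmp byte [edi-1],','` suggests); `join_comma_lines` is a hypothetical helper, and quoted strings and all other preprocessing are ignored.

```python
def join_comma_lines(lines):
    """Model: a line whose last character is ',' absorbs the next line."""
    joined = []
    pending = ""
    for line in lines:
        pending += line
        if pending.endswith(","):
            # Trailing comma: keep collecting, the logical line continues.
            continue
        joined.append(pending)
        pending = ""
    if pending:
        # Source ended while a continuation was still open.
        joined.append(pending)
    return joined


if __name__ == "__main__":
    source = [
        "db 1,",
        "   2,",
        "   3",
        "arrange CheckSumBlocks,;",   # the ';' work-around ends the line
        "display 'next'",
    ]
    for logical_line in join_comma_lines(source):
        print(logical_line)
```

In the model, the first three lines collapse into one logical line, while the line ending in `,;` stays separate because its last character is no longer a comma.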
bitRAKE 30 Apr 2021, 01:24
Recently, it was useful to output a VIRTUAL block with a more general extension, 'exe.dd64'. By patching one line, fasmg will just use whatever extension is provided. The current distribution does not allow periods in the extension given to VIRTUAL.
Code:
@@ -1148,11 +1148,14 @@ virtual_block:
         mov eax,[current_area]
         inc [eax+ValueDefinition.reference_count]
         push ecx
         call put_into_map
         pop ecx
-        jmp validate_extension
+; allow any extension
+mov esi,edi
+jmp instruction_assembled
     continue_virtual_block:
         and [leave_opening_parentheses],0
         mov edi,[expression_workspace.memory_start]
         call parse_expression
         mov edi,[expression_workspace.memory_start]
Code:
virtual as "/filename.ext"
bitRAKE 02 May 2021, 11:04
I did advance the feature of VIRTUAL output naming, but I've chosen a scheme in conflict with fasmg's current behavior. This is to prevent any overlap in functionality and erroneous behavior.
Code:
create_output_path:
; in:
;       esi - base path and name (cannot be zero length)
;       ebx - file part to change
;       ecx = length of the part
; out:
;       edx - output path (generated in temporary storage)
        push ecx
        or ecx,-1
        mov edi,esi
        xor al,al
        repnz scasb
        mov ecx,edi
    locate_part:
        dec edi
        cmp edi,esi
        jz no_path
        mov al,[edi]
        cmp al,'\'
        jz no_ext
        cmp al,'/'
        jz no_ext
        cmp al,'.'
        jnz locate_part
        cmp dword [esp],0
        jz truncate_base
        cmp [ebx],al
        jz truncate_base
        jmp locate_part
    no_ext:
        inc edi
    no_path:
        cmp dword [esp],0
        jz copy_base
        cmp byte [ebx],'.'
        jz copy_base
        inc ebx
    truncate_base:
        inc edi
        mov ecx,edi
    copy_base:
        sub ecx,esi
        mov edi,[preprocessing_workspace.memory_start]
        mov edx,preprocessing_workspace
        push ecx
        call reserve_workspace
        pop ecx
        dec ecx
        rep movsb
        pop ecx
        jecxz part_attached
        mov esi,ebx
        push ecx
        inc ecx
        call reserve_workspace
        pop ecx
        rep movsb
    part_attached:
        xchg eax,ecx
        stosb
        mov edx,[preprocessing_workspace.memory_start]
        retn
Code:
format binary as '\taco.exe'
virtual as '/my_data.dat'
virtual as '.exe.dd64'
Code:
ebx <-> esi
write_output_file:
write_auxiliary_output_area:
iterate_through_map:
release_file_data:
release_auxiliary_output:
Updated the extension validation as well.
Code:
validate_extension:
        jecxz extension_valid
        lodsb
        cmp al,'.'
        jz extension_valid
        cmp al,'/'
        jz extension_valid
        cmp al,'\'
        jz extension_valid
        jmp invalid_argument
    extension_valid:
        mov esi,edi
        jmp instruction_assembled
(makes building and testing fasmg mods easier, too)
Edit: the patch/diff.
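The updated validate_extension routine at the end of this post boils down to a check of the first character of the given part. A Python paraphrase of just that check, using the hypothetical name `is_valid_part` (it does not model how create_output_path then combines the part with the base name):

```python
def is_valid_part(part: str) -> bool:
    """Paraphrase of the posted validate_extension check (model only)."""
    # jecxz extension_valid: an empty part is accepted as-is.
    if part == "":
        return True
    # lodsb plus three compares: only the first character is inspected,
    # and it has to be a period or a path separator.
    return part[0] in ("." , "/", "\\")


if __name__ == "__main__":
    for candidate in ["\\taco.exe", "/my_data.dat", ".exe.dd64", "exe"]:
        print(repr(candidate), is_valid_part(candidate))
```

The three forms from the usage example in the post pass the check, while a bare `exe` without a leading period or separator would end up at invalid_argument.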
bitRAKE 07 Mar 2023, 07:09
With only three lines of code, we can add multi-line quoted strings:
Code:
@@ -179,9 +179,9 @@ tokenize_source:
     copy_string:
        mov al,[esi]
        cmp al,0Dh
        je copy_string_lead_check
        cmp al,0Ah
        je copy_string_lead_check
        cmp al,1Ah
        je broken_string
        test al,al
@@ -196,6 +196,10 @@ tokenize_source:
        mov [edi+ecx],al
        inc ecx
        jmp copy_string
     copy_string_lead_check:
        inc esi
        jecxz copy_string
        jmp copy_string_character
     broken_string:
        mov byte [edi-5],27h
     finish_string_token:
Code:
display "
┏━━━━━━━━━┳━━━━━━━━━┓
┃  Box A  ┃  Box B  ┃
┣━━━━━━━━━╋━━━━━━━━━┫
┃  Box C  ┃  Box D  ┃
┗━━━━━━━━━┻━━━━━━━━━┛
"
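One way to read the patch above: a line break that comes before any character of the string has been stored is skipped, and every later line break is stored like an ordinary character. The Python sketch below models only that reading. It assumes ecx counts the characters stored so far (which is what `jecxz copy_string` appears to test), and `multiline_string_value` is a made-up name.

```python
def multiline_string_value(body: str) -> str:
    """Model of the string contents produced for the text between the quotes."""
    stored = []
    for ch in body:
        if ch in "\r\n" and not stored:
            # Nothing stored yet: leading line breaks are dropped.
            continue
        # Everything else, including later line breaks, becomes string data.
        stored.append(ch)
    return "".join(stored)


if __name__ == "__main__":
    text = "\r\nBox A\r\nBox B\r\n"
    print(repr(multiline_string_value(text)))  # 'Box A\r\nBox B\r\n'
```

Under this reading the line endings of the source file (CR LF or LF alone) end up verbatim in the string.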
Tomasz Grysztar 07 Mar 2023, 08:22
I have sometimes considered a change to the tokenizer that could generate two (or possibly more) copies of every tokenized source, using different string parsing rules - to allow switching between the classic assembly quoting rules that fasm uses and C-like strings with "\" as the escape character. Then an assembly-time command (similar to RETAINCOMMENTS) would allow switching to a different tokenization (which in practice would mean switching to a different pre-tokenized copy in the source cache). The only conceptual problem I see is the algorithm to decide where to resume assembly in the alternative source.
|
bitRAKE 07 Mar 2023, 12:30
The flexibility of the design is certainly one of its greatest features. For example, strings could be prefixed to specify the desired encoding during tokenization. That would be a kind of internalization of the \include\encoding\ files, though. What would be the benefit over a calm implementation of C-style strings?
There is also potential to mimic string templates like: db "Some value: ${expression}"... although this seems like just a compact way of expressing existing functionality. It's the same with the multi-line quotes - just wanting the source to look a certain way. The hidden line ending becoming part of the product is problematic, especially if one wants the same file to work in multiple places.
Edit: I didn't address the line count - need to count and signal the consumption of lines.
Code:
display "
┏━━━━━━━━━┳━━━━━━━━━┓
┃  Box A  ┃  Box B  ┃
┣━━━━━━━━━╋━━━━━━━━━┫
┃  Box C  ┃  Box D  ┃
┗━━━━━━━━━┻━━━━━━━━━┛
"
repeat 1,L:__LINE__
        display "__LINE__ = ",`L,9,"; 8?"
end repeat
Code:
mov ecx,[consumed_lines]
mov al,0ah
rep stosb
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.