Message board for the users of flat assembler.
YATF, Yet Another Tokenized Forth
*This post is in constant re-editing process*
last update: August 1, 2003
I want to share with you a project I've been working on
for some time. The following code is a very small compiler
aimed at building small programs, to be used with the bootloader
from my previous topic. I post it here for your enjoyment,
and also to try to seduce you into helping me achieve what I think
could be a truly revolutionary objective in the open-source world:
widespread recognition and adoption of the Forth methodology.
Forth is centered around words and stacks. ColorForth, the current
state of the art, is centered around DWORDs, since they are the
native word of the machine. It packs whole human words into them
using Shannon coding to speed up the search process (scasd). It
basically uses the first nibble as an index into a jump table for token
decoding, hence "colors" naturally appeared in the editor. Chuck said
CF puts information in the blank spaces of source code, which
is a great idea. The other ColorForths implement this concept too.
My goal is to make a more user-friendly Forth by using ASCII
chars and, if possible, to do better than Chuck's own implementation
by making it even simpler and more straightforward. I looked at
CF's colorful code translated into HTML, and with the kernel source
by my side I slowly digested the cleverness embedded within it,
to mutate it into what I think can revolutionize everyday
The UltraTechnology.com website helped me a lot, and I also owe
a lot to Jeff Fox, for the exposure of his AHA system and his HUGE
documentation effort. The code you're about to see embodies what
I found most interesting in the two systems, which are the best
of the Forth world, and again, it couldn't exist without Jeff's website.
This is the compiler part of the kernel; the bootloader is in the topic
"IDE bootloader" of this forum. I'll try to put colorful code here as
soon as I can, along with other explanations. Please bear with me.
"We can make this world a better place to be in..."
CompileCall
  89D9          mov ECX, EBX            ; Save current address
  89FB          mov EBX, EDI            ; for lookback optimizat
  D1E0          shl EAX, 1              ; Read the Index
  D1E0          shl EAX, 1              ; (shorter than lea
  8B00          mov EAX, [EAX]          ; and factorizes well)
  C607E8        mov [EDI], byte 0E8h    ;Call rel32 opcode
  8D7F05        lea EDI, [EDI+5]        ; Advance pointer
  29F8          sub EAX, EDI            ; to calculate rel. add.
  8947FC        mov [EDI-4], EAX        ; and compile it.
  C3            ret
ExecuteWord
  D1E0          shl EAX, 1
  D1E0          shl EAX, 1
  8B00          mov EAX, [EAX]
  FFE0          jmp EAX
BinaryCopy
  8A26          mov ah, [esi]
  46            inc esi
  8827          mov [edi], ah
  47            inc edi
  FEC8          dec al
  79F6          jns BinaryCopy
  C3            ret
Detoken
  D1E8          shr EAX, 1
  72D3          jc CompileCall
  D1E8          shr EAX, 1
  72E5          jc ExecuteWord
  D1E8          shr EAX, 1
  72E9          jc BinaryCopy
  D1E8          shr EAX, 1
  7208          jc SkipComment          ;forward reference
WriteDefinition
  89E3          mov EBX, ESP            ;Break Lookback optimization
  897D00        mov [EBP], EDI          ; Save the current address
  8D6D04        lea EBP, [EBP+4]        ; in the Index
SkipComment
  40            inc EAX
  01C6          add ESI, EAX
  C3            ret
Compiler                                ;Bridge from run to compile time
  56            push ESI                ;Switch to compile time
  89C6          mov ESI, EAX            ;TOS is the source code address
  29ED          sub EBP, EBP            ;Reset Top of Index pointer
CompilerLoop
  0FB606        movzx EAX, byte [ESI]
  46            inc ESI
  68[4E000000]  push CompilerLoop
  EBD4          jmp Detoken
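For readers who don't want to trace the registers, here is a minimal sketch of what CompileCall emits, written in Python rather than assembly. The function name, the `base` parameter and the example addresses are made up for illustration; only the E8 opcode and the "target minus address of the next instruction" arithmetic come from the listing. The index lookup (4* @) is not modeled.

```python
import struct

CALL_REL32 = 0xE8  # x86 opcode: near call with a 32-bit relative offset


def compile_call(code: bytearray, base: int, target: int) -> int:
    """Append `call target` to `code`; return the address of the call opcode.

    `base` is the hypothetical address at which code[0] will live.
    """
    call_addr = base + len(code)       # what EBX holds after `mov EBX, EDI`
    next_addr = call_addr + 5          # EDI after `lea EDI, [EDI+5]`
    rel32 = target - next_addr         # `sub EAX, EDI`
    code.append(CALL_REL32)            # `mov [EDI], byte 0E8h`
    code += struct.pack("<i", rel32)   # `mov [EDI-4], EAX`
    return call_addr


code = bytearray()
addr = compile_call(code, base=0x1000, target=0x0F00)
print(code.hex())  # e8fbfeffff
```

The returned address plays the role of EBX in the listing: it is what the semicolon code later compares against to decide whether the last thing compiled was a call.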
Last edited by AdamMarquis on 01 Aug 2003, 07:21; edited 18 times in total
|20 Jun 2003, 05:59||
*Last update: August 2, 2003*
This example is the compiler compiling a
program with the same functionality.
This code could be rewritten
to structure dependencies in a more
logical way, like putting 4, 2, and 1,
together, with byteoffset nearby
for ; and all the needed conditionals,
since they all must be macros.
About colors & words, etc.
Color: Name, Type (length)
Red: Definition, Word ([1,16] + 1 bytes)
Black: Comment, Word
Olive: JumpTo, Token (1 byte)
Green: CompileCall, Token
Indigo: BinaryString, ([1,32]+ 1 bytes)
*Words are from 1 to 16 bytes long
*Binary strings are of even length and use [0,9] & [A,F] ASCII intervals.
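The lengths in this list fall out of how Detoken peels bits off the token byte in the listing earlier in the thread. A minimal Python sketch of that decoding follows; the function name and the kind labels are mine, and the mapping of labels to colors is my reading of the list, not something the listing states.

```python
def classify_token(byte: int) -> tuple[str, int]:
    """Split one token byte into (kind, operand), testing bits from the bottom."""
    kinds = ("compile_call", "execute_word", "binary_copy", "comment")
    value = byte & 0xFF
    for kind in kinds:
        carry = value & 1   # the bit that `shr EAX, 1` moves into the carry flag
        value >>= 1
        if carry:
            return kind, value
    return "definition", value  # all four flag bits were clear


print(classify_token(0b00000101))  # ('compile_call', 2)
print(classify_token(0b00011100))  # ('binary_copy', 3)
print(classify_token(0b00110000))  # ('definition', 3)
```

The operand width shrinks by one bit per test: 7 bits for a compiled call, 6 bits for an executed word, 5 bits for a binary string and 4 bits for comments and definitions. Since BinaryCopy and SkipComment both handle operand + 1 bytes, that gives the [1,32] and [1,16] ranges above.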
pushopt 8D40FB39C175138038E8750EC6006840FF30FF70058F008F4005 ;;
@ 2, 8B00 ;
2* 2, D1E0 ;
2/ 2, D1E8 ;
4, 8D7F06C747FA8D7F04C766C747FE47FC ;
4* 4, D1E0D1E0 ;
opt 4, 89D989FB ;
token@+ 4, 0FB60646 ;
1, 4, 47C647FF ;
1+ 1, 40 ;
c? 1, 72 cond token@+ 2* @ 4729F883F8807C048847FFC3
-? 1, 79 cond ;
CompileCall opt 4* @ C607E88D7F0529F88947FC ;
Jump 4* @ FFE0
BinaryCopy 8A2646882747FEC8 -? BinaryCopy ;
SkipComment 1+ 01C6 ;
Detoken 2/ c? CompileCall 2/ c? JumpTo 2/ c? BinaryCopy 2/ c? Definition SkipComment ;
CompilerLoop token@+ Detoken CompilerLoop ; GO!
Listing output comes from NASM (only missing feature of fasm)
Explanations following in this format upon request.
GO
  5E            pop ESI                 ;get rid of Compilerloop
  5E            pop ESI                 ; Reenable
  AD            lodsd                   ; the data stack
  FF65FC        jmp dword [EBP-4]       ;jump to last defined word
It gets out of the compiler and then jumps to the last defined word.
Or one could use the end word: push push lodsd ret.
This word must be defined among the first 64 words, since it must
be executed to stop the compiler. A lot of variations are possible.
Return:
  C607C3        mov byte [EDI], 0C3h
  47            inc EDI
  C3            ret
Semicolon:
  8D47FB        lea EAX, [EDI-5]
  39D8          cmp EAX, EBX
  75F4          jnz Return
  8038E8        cmp [EAX], byte 0E8h    ;Call opcode
  75EF          jnz Return
  83780180      cmp dword [EAX+1], byte -128
  7C06          jl JumpNear
JumpShort:
  800002        add byte [EAX], byte 2h ;EBh: Short jump opcode
  83EF03        sub EDI, byte 3h
JumpNear:
  FE00          inc byte [EAX]          ;E9h: jump rel32 opcode
;Pushopt:
  8D40FB        lea EAX, [EAX-5]
  39C1          cmp ECX, EAX
  7513          jnz skip
  8038E8        cmp byte [EAX], 0E8h
  750E          jnz skip
  C60068        mov byte [EAX], 068h    ;68h: push imm32 opcode
  40            inc EAX
  FF30          push dword [EAX]
  FF7005        push dword [EAX+5]
  8F00          pop dword [EAX]
  8F4005        pop dword [EAX+5]
skip:
  C3            ret
Pushopt is a comment, to make a clear distinction:
the call1 jmp2 -> push2 jmp1 optimization is optional.
The bincopy token can be removed safely if the ;; (explicit ret)
macro is left there. It is also a comment to break the long line,
since definitions and comments start on a new line.
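The core idea of the Semicolon listing above (turn a trailing call into a jump instead of emitting call + ret) can be sketched in Python like this. It is a model, not a transcription: the names are mine, offsets are relative to the start of the buffer, and the sketch recomputes the displacement for the 2-byte short jump, which is an assumption on my part since the listing only compares against -128. Pushopt is left out.

```python
import struct

CALL, JMP_NEAR, JMP_SHORT, RET = 0xE8, 0xE9, 0xEB, 0xC3


def compile_semicolon(code: bytearray, last_call: int) -> None:
    """End a definition held in `code` (offsets are relative to code[0]).

    `last_call` is the offset of the most recently compiled call, or -1.
    """
    at = len(code) - 5
    if at < 0 or at != last_call or code[at] != CALL:
        code.append(RET)                        # nothing to optimize
        return
    (rel32,) = struct.unpack_from("<i", code, at + 1)
    target = at + 5 + rel32                     # where the call was going
    rel8 = target - (at + 2)                    # same target, 2-byte jump
    if -128 <= rel8 <= 127:
        code[at:] = bytes([JMP_SHORT]) + struct.pack("<b", rel8)
    else:
        code[at] = JMP_NEAR                     # same length, same rel32


body = bytearray(b"\x90" * 10) + bytes.fromhex("e8f1ffffff")  # call to offset 0
compile_semicolon(body, last_call=10)
print(body[-2:].hex())  # ebf4
```

A definition that ends in a call thus costs 2 or 5 bytes instead of 6, and the called word returns straight to the caller's caller.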
Forward references (jnz skip) are only found inside
binary tokens, since words must be defined before
being referenced. Far simpler to implement backward
cond                                    ;relative offset from CompileCall token
  0FB606        movzx eax, byte [esi]
  46            inc esi
  D1E0          shl eax, 1
  8B00          mov eax, [eax]
  47            inc edi
  29F8          sub eax, edi
  83F880        cmp eax, byte -128
  7C04          jl convertnear
  8847FF        mov byte [edi-1], al
  C3            ret
convertnear
  83E804        sub EAX, byte 4
  8907          mov [EDI], EAX
  66B80F10      mov ax, 100Fh           ; al = 0Fh prefix, ah = 10h
  0267FE        add ah, byte [EDI-2]
  668947FE      mov word [EDI-2], ax
  83C704        add EDI, byte 4
  C3            ret
This versatile macro generates the byte offset for
short and near conditional jumps, provided that
the short version of the opcode is compiled.
The convertnear part (along with the cmp jl)
is optional and is not needed by the thoughtful
programmer who wants tight source at the
expense of jump limits.
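The short/near choice made by cond can be modeled in a few lines of Python. The function name and the buffer-relative offsets are mine. The encodings themselves are standard x86: a short conditional jump is one opcode byte plus rel8, and its near form is 0Fh followed by the short opcode + 10h and a rel32.

```python
import struct


def emit_jcc(code: bytearray, short_opcode: int, target: int) -> None:
    """Append a conditional jump to offset `target` (relative to code[0]).

    `short_opcode` is the one-byte short form, e.g. 0x72 (jc) or 0x79 (jns).
    """
    here = len(code)
    rel8 = target - (here + 2)              # short form is 2 bytes long
    if -128 <= rel8 <= 127:
        code += bytes([short_opcode]) + struct.pack("<b", rel8)
    else:
        rel32 = target - (here + 6)         # near form is 6 bytes long
        code += bytes([0x0F, short_opcode + 0x10]) + struct.pack("<i", rel32)


# The BinaryCopy loop body from the first listing, then `jns BinaryCopy`.
loop = bytearray.fromhex("8A2646882747FEC8")
emit_jcc(loop, 0x79, target=0)
print(loop[-2:].hex())  # 79f6
```

The printed bytes match the 79F6 that NASM produced for jns BinaryCopy in the first listing. Dropping the else branch corresponds to leaving out convertnear: smaller code, but jumps limited to the rel8 range.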
2, 4, and 1,
comma4                                  ;opcode 6 bytes long
  8D7F04          lea EDI, [EDI+4]
  C747FC78563412  mov dword [edi-4], 12345678h
comma2                                  ;opcode 6 bytes long
  47              inc EDI
  47              inc EDI
  66C747FE3412    mov word [EDI-2], 1234h
comma1                                  ;opcode 4 bytes long
  47              inc EDI
  C647FFC3        mov byte [edi-1], 0C3h
comma6                                  ; 4, code
  8D7F06          lea EDI, [EDI+6]
  C747FA8D7F04C7  mov dword [EDI-6], 0c7047f8dh
  66C747FE47FC    mov word [EDI-2], 0FC47h
  C3              ret
I use those primitives as macros only, to make other macros' definitions
more readable and make source code smaller. It should degrade raw
speed a little and make the intermediate code bigger, but not much.
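To make the byte counts concrete, here is a small Python sketch that builds the machine code each comma macro stands for. The function names are mine; the opcode bytes are copied from the listing above, and the literal is simply appended in little-endian order.

```python
def comma4(value: int) -> bytes:
    """Code that appends a 32-bit literal at EDI."""
    # lea EDI, [EDI+4] ; mov dword [EDI-4], imm32
    return bytes.fromhex("8D7F04C747FC") + value.to_bytes(4, "little")


def comma2(value: int) -> bytes:
    """Code that appends a 16-bit literal at EDI."""
    # inc EDI ; inc EDI ; mov word [EDI-2], imm16
    return bytes.fromhex("474766C747FE") + value.to_bytes(2, "little")


def comma1(value: int) -> bytes:
    """Code that appends an 8-bit literal at EDI."""
    # inc EDI ; mov byte [EDI-1], imm8
    return bytes.fromhex("47C647FF") + value.to_bytes(1, "little")


print(comma4(0x12345678).hex())  # 8d7f04c747fc78563412
print(comma2(0x1234).hex())      # 474766c747fe3412
print(comma1(0xC3).hex())        # 47c647ffc3
```

The fixed prefixes are 6, 6 and 4 bytes long, which is where the "opcode N bytes long" notes in the listing come from, and why 1, can itself be written as 4, 47C647FF in the source.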
I will post about ?dup and ?lit, since they're used in the construction
of the primitive stack machine operations. Once all the primitives
are defined, every Forther should feel at home. How exactly the system
will be usable in the future is still to be determined.
I would like to have small template modules built around Forth primitives,
like an editor with a built-in assembler (and disassembler), needed especially
for binary copy tokens. One could get by with a cheat sheet of
common opcodes, but once the primitives are out, there will be no
absolute need for this.
Thanks again for your interest,
Last edited by AdamMarquis on 03 Aug 2003, 12:18; edited 14 times in total
|24 Jul 2003, 02:02||
This sounds really cool and I have enjoyed the ideas at the links you have posted - looking forward to your future developments.
|24 Jul 2003, 21:36||
I would love to be able to delete my posts whenever necessary in my threads.
Last edited by AdamMarquis on 02 Aug 2003, 05:15; edited 3 times in total
|31 Jul 2003, 02:47||
Do you know Retro Forth?
|31 Jul 2003, 09:44||
Yes, I think their native keyboard driver is quite neat; I'm trying
to come up with my own right now. The projects are similar. I have
the latest build here in my "systems" folder, along with ColorForth,
Terry Loveall's 4word, Sean Pringle's State of Flux and others.
|31 Jul 2003, 17:22||
Map : Forth in 3D
|15 Aug 2003, 01:17||
Maybe you could use Lego bricks ;p
2D color blocks could be of great use to show the memory
footprint of the program and/or the number of cycles each
word takes. A cube has 3 dimensions plus color, so it could be feasible.
Did you catch the 3D display hack for LCD screens?
One just has to put cellophane on one half
and wear polarized "goggles" or
fix them to the screen.
|15 Aug 2003, 22:10||
Did you catch the 3d display hack for LCD screens?
Yeah, it could be great! The hack with OS and fasm.
But wasn't this technology around about 15 years ago? Companies won't give the power to us..
bioelectronics applied quantum computer.
Hand-made 500W PSU.
3D cad/cam/cae written in fasm, half-life2(physical engine in graphic).
Last edited by fasm9 on 26 Aug 2003, 21:59; edited 1 time in total
|17 Aug 2003, 03:27||
Mark Slicker, an active ColorForth contributor,
suggested on the CF mailing list that since
fall-throughs between definitions are rare compared
to closed ones (red word terminated by semicolon),
we could embed the ; functionality in every definition.
Forth is cleaner that way IMO, and when one needs
fall-through he/she can use a special word, like
let's say ... to read the red word the old way.
Colorless conversions are also easier to perform
and there are fewer semantic rules involved.
|25 Aug 2003, 12:17||
Mark Slicker, an active ColorForth contributor,
Is it an OS?
|19 Nov 2003, 04:49||
I wrote about the Forth language, which is what this thread is all about.
Those are the best references, if you want to learn more:
I'm still in the process of finishing the project
(very busy with school) and I hope it will be available
by the beginning of 2004 at the latest.
I think of using the A register implicitly for all
Traditional @, taking its address from the stack
would be a! then @, or #@ for literals.
Aside from a! and a@ there would be @+ and !+.
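A toy Python model of the address-register idea may help readers new to it. Everything here is an assumption for illustration: the class and method names are mine, memory is addressed in whole cells, and I take a@ to mean "push the value of A", which the post doesn't spell out. #@ for literals is not modeled.

```python
class AddressMachine:
    """Toy stack machine with a single address register A (cell-addressed)."""

    def __init__(self, cells: int = 16) -> None:
        self.memory = [0] * cells
        self.stack: list[int] = []
        self.a = 0

    def a_store(self) -> None:      # a!  pop the top of stack into A
        self.a = self.stack.pop()

    def a_fetch(self) -> None:      # a@  push the current value of A
        self.stack.append(self.a)

    def fetch(self) -> None:        # @   push the cell that A points at
        self.stack.append(self.memory[self.a])

    def fetch_inc(self) -> None:    # @+  fetch, then advance A by one cell
        self.fetch()
        self.a += 1

    def store_inc(self) -> None:    # !+  pop into the cell at A, then advance
        self.memory[self.a] = self.stack.pop()
        self.a += 1


m = AddressMachine()
m.stack += [10, 20, 4]
m.a_store()       # A = 4
m.store_inc()     # memory[4] = 20
m.store_inc()     # memory[5] = 10
print(m.memory[4:6], m.a)  # [20, 10] 6
```

In this model the traditional "address @" becomes a_store followed by fetch, as described above, and walking an array is a run of @+ or !+ with no address juggling on the stack.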
|19 Nov 2003, 23:39||
Adam, have you made any more progress on this project?
|15 Oct 2004, 12:20||
No, I didn't.
I hope people are willing to help; it would be great
for every newcomer to programming all around the world.
What I'm looking into is networking and USB code,
and replacing the firmware of my cheap 2505N router.
It uses the traditional Samsung chip with an ARM7 core
and doesn't do DHCP right: it doesn't increment IPs.
I also got my hands on a C8051f350 (24-bit ADC) along
with an AVR Butterfly and a 1989 Honda Civic SI ECU
from a friend. Those are my current toys, along maybe
with a GameCube.
I also bought a 256MB USB drive, and at the moment I use
Damn Small Linux. It's great, but very far from what's
actually wanted in my case.
This revived interest could be what's needed to bring
people closer to what a computer truly is. It will be done
one day, it's inevitable, or perhaps already done.
Some great, dense yet full of light source code,
like the piece that started this thread. It was
meant to inspire, hope it worked. Yet it saddens me
to see it going nowhere.
|18 Oct 2004, 02:30||
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.