flat assembler
Message board for the users of flat assembler.

Index > Heap > Let's Build a Compiler

Author
Thread Post new topic Reply to topic
KevinN



Joined: 09 Oct 2012
Posts: 161
KevinN
Post 05 Mar 2013, 22:28
View user's profile Send private message Reply with quote
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Man, that's a trip down memory lane Smile
Post 05 Mar 2013, 22:31
View user's profile Send private message Visit poster's website Reply with quote
TmX



Joined: 02 Mar 2006
Posts: 821
Location: Jakarta, Indonesia
TmX
And here's the Forth version of that tutorial:
http://home.iae.nl/users/mhx/crenshaw/tiny.html

Smile
Post 06 Mar 2013, 06:22
View user's profile Send private message Reply with quote
KevinN



Joined: 09 Oct 2012
Posts: 161
KevinN
TmX wrote:
And here's the Forth version of that tutorial:
http://home.iae.nl/users/mhx/crenshaw/tiny.html

Smile


great Laughing
Post 06 Mar 2013, 08:40
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17271
Location: In your JS exploiting you and your system
revolution
The part about the expression evaluator and interpreter are a good read for anyone wanting to parse text.
Post 07 Mar 2013, 07:45
View user's profile Send private message Visit poster's website Reply with quote
Bob++



Joined: 12 Feb 2013
Posts: 92
Bob++
I think that I will print it
Post 09 Mar 2013, 16:14
View user's profile Send private message Reply with quote
comrade



Joined: 16 Jun 2003
Posts: 1137
Location: Russian Federation
comrade
Nice find
Post 09 Mar 2013, 18:37
View user's profile Send private message Visit poster's website AIM Address Yahoo Messenger MSN Messenger ICQ Number Reply with quote
uart777



Joined: 17 Jan 2012
Posts: 369
uart777
I read this tutorial a long time ago. It uses a recursive descent parser and is designed for load/store architecture. A translation to C/X86 would be something like this (written at age 19, 16 years ago):
Code:
/* Convert expression to ASM. Supports variables,
numbers and operators: &, |, ^, <<, >>, +, -, *, /,
%, (). All operations are i32 signed. ASM code must
be rescanned and optimized */

#include <stdio.h>
#include <stdlib.h>

#define and &&
#define or ||

#define isa(c) ((c>='A' and c<='Z') or \
 (c>='a' and c<='z') or c=='_')
#define isn(c) (c<='9' and c>='0')
enum { NAME, NUMBER, SYMBOL };

char *stream, token[256], token_type;
void get_token(), expression(),
level1(), level2(), level3(), level4(), level5();
#define descend(x) puts("push edx"); get_token(); x()

int main() {
stream=(char *) malloc(256);
for (;Wink {
 puts("Enter expression ('q' to exit): ");
 gets(stream);
 if (*stream=='q')
 break;
 expression();
}
return 0;
}

// display error then exit

void error(char *why) {
printf("Error: %s\n", why);
exit(0);
}

// copy token and advance. skip whitespace first

void get_token() {
int i;
for (; *stream==' ' or *stream=='\t'; stream++);
if (!*stream)
 return;
if (isa(*stream)) {
 for (i=0; isa(*stream); token[i]=*stream++, i++);
 token[i]=0, token_type=NAME;
 return;
}
if (isn(*stream)) {
 for (i=0; isn(*stream); token[i]=*stream++, i++);
 token[i]=0, token_type=NUMBER;
 return;
}
token[0]=*stream++, token[1]=0;
token_type=SYMBOL;
}

// convert expression. result in edx

void expression() {
get_token();
level1();
}

// bitwise AND, OR, XOR

void level1() {
int c;
level2();
while ((c=token[0])=='&' or c =='|' or c=='^') {
 descend(level2);
 if (c=='&')
  puts("pop eax\nand eax, edx\nmov edx, eax");
 else if (c=='|')
  puts("pop eax\nor eax, edx\nmov edx, eax");
 else if (c=='^')
  puts("pop eax\nxor eax, edx\nmov edx, eax");
}
}

// shifts

void level2() {
int c, c2;
level3();
while (((c=token[0])=='<' and (c2=*stream)=='<') or
((c=token[0])=='>' and (c2=*stream)=='>')) {
 stream++; // past <>
 descend(level3);
 if (c=='<' and c2=='<')
  puts("pop eax\nshl eax, edx\nmov edx, eax");
 else if (c=='>' and c2=='>')
  puts("pop eax\nshr eax, edx\nmov edx, eax");
}
}

// add or subtract

void level3() {
int c;
level4();
while ((c=token[0])=='+' or c=='-') {
 descend(level4);
 if (c=='+')
  puts("pop eax\nadd eax, edx\nmov edx, eax");
 else if (c=='-')
  puts("pop eax\nsub eax, edx\nmov edx, eax");
}
}

// multiply or divide

void level4() {
int c;
level5();
while ((c=token[0])=='*' or c=='/' or c=='%') {
 descend(level5);
 if (c=='*')
  puts("pop eax\nimul eax, edx\nmov edx, eax");
 else if (c=='/') // must use ecx...
  puts("pop eax\nmov ecx, edx\ncdq\nidiv ecx\nmov edx, eax");
 else if (c=='%')
  puts("pop eax\nmov ecx, edx\ncdq\nidiv ecx");
}
}

// parentheses or get value

void level5() {
if (token[0]=='(') {
 get_token();
 level1();
 if (token[0]!=')')
  error(") expected");
 get_token();
}
else {
 printf("mov edx, %s\n", token);
 if (token_type==SYMBOL)
  error("Value expected");
 get_token();
}
}    


Then I discovered that it's best to use the "Shunting Yard" algorithm and convert standard infix expressions to RPN ("Reverse Polish Notation") first for easy optimization and conversion to ASM. Thus: Infix>RPN>UASM>ASM (where UASM is universal and ASM is CPU-specific).

Famous "red dragon compiler book": http://www.amazon.com/Compilers-Principles-Techniques-Alfred-Aho/dp/0201100886/ref=sr_1_2?ie=UTF8&qid=1363779650&sr=8-2&keywords=red+dragon+compiler

"Shunting Yard": http://en.wikipedia.org/wiki/Shunting_yard_algorithm

"Reverse Polish Notation": http://en.wikipedia.org/wiki/Reverse_polish_notation
Post 20 Mar 2013, 11:51
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You can attach files in this forum
You can download files in this forum


Copyright © 1999-2020, Tomasz Grysztar.

Powered by rwasa.