flat assembler
Message board for the users of flat assembler.

Index > Programming Language Design > How C stores symbol table? (type information)

Goto page 1, 2  Next
Author
Thread Post new topic Reply to topic
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 05 Apr 2017, 07:40
How C compiler stores symbol table? This link https://www.tutorialspoint.com/compiler_design/compiler_design_symbol_table.htm
explained most of it, but I'm still not sure how they store type data. They can be quite complex, some types are actually combinations of types (array of struct, pointer to pointer to integer), some types reference user defined types (creating a struct creates a new type, which can be referenced in another type (array of structs again)). How compilers and debuggers store that?
Post 05 Apr 2017, 07:40
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 05 Apr 2017, 08:07
I'm pretty sure it depends upon which C compiler you are using. There is no standard AFAICT.
Post 05 Apr 2017, 08:07
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 05 Apr 2017, 08:45
That's kind of oblvious, tell me about whichever compiler you know.

Also why you moved this thread from Programming Language Design to High Level Languages? People here ask how to use C, I'm asking how to make C. I'm not complaining, maybe this subforum is read more often.
Post 05 Apr 2017, 08:45
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 05 Apr 2017, 09:05
You do not appear to the designing a language. Instead you are making a C compiler. A laudable goal in itself but that is not designing a language.
Post 05 Apr 2017, 09:05
View user's profile Send private message Visit poster's website Reply with quote
Trinitek



Joined: 06 Nov 2011
Posts: 257
Trinitek 05 Apr 2017, 09:37
Well that's interesting. I didn't expect a website like TutorialsPoint.com to have articles on compiler design. Laughing
Post 05 Apr 2017, 09:37
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 05 Apr 2017, 10:01
revolution wrote:
You do not appear to the designing a language. Instead you are making a C compiler. A laudable goal in itself but that is not designing a language.
The description of "Programming Language Design" forum is:
Quote:
Discuss the design of programming languages and construction of compilers, interpreters and assemblers.
so this thread likely fits.
Post 05 Apr 2017, 10:01
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 05 Apr 2017, 11:04
Tomasz Grysztar wrote:
so this thread likely fits.
Yeah. My mistake. Thanks.
Post 05 Apr 2017, 11:04
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 05 Apr 2017, 13:42
I'd probably store variable type as a single dword. With one bit of it deciding if it's a normal type or a user defined type.

The problem with it is in those combined types, like *******int. I know you rarely meet more than two pointer stars, but still compiler probably has to support up to infinity pointers, this means this probably requires up to infinity space...

Probably that's a good place to use linked lists, I dunno.

Yeah, I think I'll do something like that. Type information is two dword,
first dword is type
second dword is child type
and when I'll need to "say" that a symbol is a *********int (this looks like a swearing, lol), I'll create a bunch of middle types that aren't used by any symbol directly.
I'll create a type with first dword saying pointer, and second dword saying int.
Then I'll create a type with first dword saying pointer, and second dword saying the previous type.
And continue until I created what I wanted.

The next problem is with languages that have not one, but two child types. Python and go have "dict" and "map" data types, and both of them are maps of something to something. That's two child types, this means at least three dwords, for every type. And C++ allows creation of combined types (they are called templates, i think) that may use even more child types, which means again up to infinity dwords. I'm again not sure how to implement that.

Since I moved to assembly, I started to panic whenever I see a variable length anything. I guess that's my problem right now.
Post 05 Apr 2017, 13:42
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 05 Apr 2017, 14:49
vivik wrote:
Since I moved to assembly, I started to panic whenever I see a variable length anything. I guess that's my problem right now.
Once you have dealt with a problem of memory allocation (and in any modern OS you have a reliable "malloc" API), creation of variable-length structures in assembly is really not that hard, and it often is very rewarding.

For example to make a linked list traversable in both directions you need to add just two 32-bit (or 64-bit) fields to every element - call them like "next" and "previous" - and store pointers to adjacent elements there. To access the list just keep a pointer to a element of your choice (the first one, or the last one, or even one in the middle if that makes sense, it all depends on application). Insertion of a new element at any point may look like:
Code:
        ; ebx - new element (probably freshly malloc'd)
        ; esi - element in list after which to insert new one
        mov     [ebx+Element.previous],esi
        mov     eax,ebx
        xchg    eax,[esi+Element.next]
        mov     [ebx+Element.next],eax
        mov     [eax+Element.previous],ebx    
I think that implementation of growing data structures can be a lot of fun and I highly recommend to experiment with them. The more you do it, the less intimidating and more enjoyable it becomes.

And of course, to implement any kind of compiler - even a relatively simple one like an assembler - you need to be able to process growing data structures en masse. The symbol tree in fasmg contains a thick pattern of interconnected structures, some of the connections create linked lists, some make trees, hash tables, hash trees (a hash-based binary trie is my favorite one, I use it in fasm perhaps partly out of sentiment). And it does not have any complex types at all!
Post 05 Apr 2017, 14:49
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 05 Apr 2017, 16:36
@Tomasz Grysztar
Interesting how in fasm you only seem to use memory allocation like once, at the very beginning, and that's it. If I understood and remember correctly. Was it just more convenient that way, or you wanted fasm to be faster?

Also, can you point out at where the binary trie is used in fasm? I'm curious how it looks like.

Yeah, I should start using malloc already, I'm avoiding it much more than necessary.
Post 05 Apr 2017, 16:36
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 05 Apr 2017, 17:40
vivik wrote:
Interesting how in fasm you only seem to use memory allocation like once, at the very beginning, and that's it. If I understood and remember correctly. Was it just more convenient that way, or you wanted fasm to be faster?
fasm was originally written for DOS where good malloc-style API was not universally available (especially not for extended memory) and so the only real option was to take the entire available memory (not a problem in system with no multitasking) and implement own allocation scheme inside it. But instead of writing some kind of internal "malloc" for fasm I used allocation methods very specific to consecutive stages of preprocessing and assembling which were impossible to untangle from fasm's internals when porting it to more modern systems. And thus pre-allocation of a fixed amount of memory had to stay.

vivik wrote:
Also, can you point out at where the binary trie is used in fasm? I'm curious how it looks like.
In fasm source you can find it in the "get_label_id" routine in PARSER.INC, start looking from "find_label", the tree traversal starts right after the FNV-1a hash is computed. In fasmg source this is the "scan_namespace" function in SYMBOLS.INC.
Post 05 Apr 2017, 17:40
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 05 Apr 2017, 19:15
Glad I finally stopped worrying about default windows memory allocator being slow, I finished designing data structures pretty fast after that.

I wonder what reasons golang developers had for supporting maps (aka dicts, hash tables) on type system level. Can't it just be a class with a bunch of methods? Having maps in type system only makes sense if there is some kind of function that may assept only arguments of this exact type. Can't imagine a function that needs to have maps of strings to integers, and nothing else.

Hash tables are in golang probably because they were in python, and python is pretty lax about type system in general.

If I'll ever need to make sure that a function always receives the hash table of correct type at compile type, there are more ways to make sure of than to just use type system. I'll use something that operates on program source, I guess? Dunno, don't want to think about it now.

So, this means I can ignore maps.
Post 05 Apr 2017, 19:15
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 05 Apr 2017, 19:42
vivik wrote:
Glad I finally stopped worrying about default windows memory allocator being slow, I finished designing data structures pretty fast after that.
Even if you stick to the OS-provided allocator in the beginning, later you always have an option of replacing it with your own one, perhaps better tuned to your application. This is also the approach I chose for fasmg (if you scroll to the end of that thread you may find a very good article on memory allocation strategies), though it turned out that "malloc" has not that much impact on the overall assembly time. The strongest slow-downs were from a re-allocations of overflowing buffers and I managed to keep them in a small number by growing buffers exponentially (but still not too aggressively, see the "grow_stack" routine in fasmg source).
Post 05 Apr 2017, 19:42
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 07 Apr 2017, 09:33
I noticed you use HeapSize right after HeapAlloc in the fasmg source. This means HeapAlloc may allocate more memory than requested? How often is this the case?

I tried googling, and found this thread about somebody asking for a memory with size 0 and getting memory with size 1: https://www.experts-exchange.com/questions/23458459/Why-does-HeapAlloc-allocates-more-memory-than-I-ask-it-to-under-Vista-but-not-under-XP.html

I planned to use HeapSize to keep track of all variable length data structures, but if it will suddenly allocate 12 bytes when I ask for only 8, it will break things pretty badly. But if it only allocates more memory when it's not dividible by 4, or when asked for something weird like zero size memory, and behaves correctly in all other cases, when I will use HeapSize for that. Overwise, I'll have to keep sizes somewhere in my own data, manually. Probably as a first dword inside data chunk itself.
Post 07 Apr 2017, 09:33
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 07 Apr 2017, 09:37
There are no guarantees about what is allocated other than you can use what you ask for; and maybe more if the OS uses some form of minimum allocation or alignment strategies. Also, do not rely on all versions of any particular OS all doing the same things.

TL;DR never assume things, always check.
Post 07 Apr 2017, 09:37
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 07 Apr 2017, 09:45
Note that my "malloc" function is a generic API, it always returns the size allocated, but the code that called it may use this information or not. If you discard such information, you simply do not use the excess memory if such is provided and it should not be a big problem - any "malloc" implementation wastes some memory anyway. But in cases where I allocated buffers that may need to grow later I store the allocated size, so when there was more memory allocated that requested it still may get used before calling "realloc" to grow the buffer.

One change that I consider for inclusion in fasmg source is to differentiate between a few "malloc" variants - "malloc_fixed" for blocks that are not going to be resized or freed before the end of assembly (the basic symbol tree structure buffers are like that), "malloc_growable" for blocks that are likely to be resized with "realloc" in future and plain "malloc" for blocks that may get freed but not resized. I could make them all label the same function in basic implementation, but with a custom memory allocator these hits could be taken into account to make better choices about allocation strategy.
Post 07 Apr 2017, 09:45
View user's profile Send private message Visit poster's website Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 07 Apr 2017, 09:50
Instead of alternative names, why not an extra parameter input telling the allocator the expected future usage of the buffer?
Post 07 Apr 2017, 09:50
View user's profile Send private message Visit poster's website Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 07 Apr 2017, 10:40
revolution wrote:
Instead of alternative names, why not an extra parameter input telling the allocator the expected future usage of the buffer?
Until there is an actual implementation that uses these hints the code may stay unaltered, with all three labels pointing to the same address and hint being merely visual.
Post 07 Apr 2017, 10:40
View user's profile Send private message Visit poster's website Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 12 Apr 2017, 07:43
I like how fasm doesn't even use stack to pass function arguments, and only uses registers and globals. I wonder how much of advantage in speed and size this way of coding has, and if any high level language compilers can produce a code like that.
Post 12 Apr 2017, 07:43
View user's profile Send private message Reply with quote
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 12 Apr 2017, 09:46
vivik wrote:
I like how fasm doesn't even use stack to pass function arguments, and only uses registers and globals. I wonder how much of advantage in speed and size this way of coding has, and if any high level language compilers can produce a code like that.
One should be careful when using globals like fasm does, because it is not thread-safe. For this reason I chose not to use EBP register in the fasmg core (even though it made a register allocation a bit harder), so that when a thread-safe version of the assembler is needed (like a DLL one when you might want to call multiple assemblies in parallel) it should be possible to put all the globals into "virtual at ebp+x" block and have a separate local copy for every call of the assembler function.
Post 12 Apr 2017, 09:46
View user's profile Send private message Visit poster's website Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page 1, 2  Next

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.