flat assembler
Message board for the users of flat assembler.
Index
> Programming Language Design > How C stores symbol table? (type information) Goto page 1, 2 Next |
Author |
|
vivik 05 Apr 2017, 07:40
How C compiler stores symbol table? This link https://www.tutorialspoint.com/compiler_design/compiler_design_symbol_table.htm
explained most of it, but I'm still not sure how they store type data. They can be quite complex, some types are actually combinations of types (array of struct, pointer to pointer to integer), some types reference user defined types (creating a struct creates a new type, which can be referenced in another type (array of structs again)). How compilers and debuggers store that? |
|||
05 Apr 2017, 07:40 |
|
revolution 05 Apr 2017, 08:07
I'm pretty sure it depends upon which C compiler you are using. There is no standard AFAICT.
|
|||
05 Apr 2017, 08:07 |
|
revolution 05 Apr 2017, 09:05
You do not appear to the designing a language. Instead you are making a C compiler. A laudable goal in itself but that is not designing a language.
|
|||
05 Apr 2017, 09:05 |
|
Trinitek 05 Apr 2017, 09:37
Well that's interesting. I didn't expect a website like TutorialsPoint.com to have articles on compiler design.
|
|||
05 Apr 2017, 09:37 |
|
Tomasz Grysztar 05 Apr 2017, 10:01
revolution wrote: You do not appear to the designing a language. Instead you are making a C compiler. A laudable goal in itself but that is not designing a language. Quote: Discuss the design of programming languages and construction of compilers, interpreters and assemblers. |
|||
05 Apr 2017, 10:01 |
|
revolution 05 Apr 2017, 11:04
Tomasz Grysztar wrote: so this thread likely fits. |
|||
05 Apr 2017, 11:04 |
|
vivik 05 Apr 2017, 13:42
I'd probably store variable type as a single dword. With one bit of it deciding if it's a normal type or a user defined type.
The problem with it is in those combined types, like *******int. I know you rarely meet more than two pointer stars, but still compiler probably has to support up to infinity pointers, this means this probably requires up to infinity space... Probably that's a good place to use linked lists, I dunno. Yeah, I think I'll do something like that. Type information is two dword, first dword is type second dword is child type and when I'll need to "say" that a symbol is a *********int (this looks like a swearing, lol), I'll create a bunch of middle types that aren't used by any symbol directly. I'll create a type with first dword saying pointer, and second dword saying int. Then I'll create a type with first dword saying pointer, and second dword saying the previous type. And continue until I created what I wanted. The next problem is with languages that have not one, but two child types. Python and go have "dict" and "map" data types, and both of them are maps of something to something. That's two child types, this means at least three dwords, for every type. And C++ allows creation of combined types (they are called templates, i think) that may use even more child types, which means again up to infinity dwords. I'm again not sure how to implement that. Since I moved to assembly, I started to panic whenever I see a variable length anything. I guess that's my problem right now. |
|||
05 Apr 2017, 13:42 |
|
Tomasz Grysztar 05 Apr 2017, 14:49
vivik wrote: Since I moved to assembly, I started to panic whenever I see a variable length anything. I guess that's my problem right now. For example to make a linked list traversable in both directions you need to add just two 32-bit (or 64-bit) fields to every element - call them like "next" and "previous" - and store pointers to adjacent elements there. To access the list just keep a pointer to a element of your choice (the first one, or the last one, or even one in the middle if that makes sense, it all depends on application). Insertion of a new element at any point may look like: Code: ; ebx - new element (probably freshly malloc'd) ; esi - element in list after which to insert new one mov [ebx+Element.previous],esi mov eax,ebx xchg eax,[esi+Element.next] mov [ebx+Element.next],eax mov [eax+Element.previous],ebx And of course, to implement any kind of compiler - even a relatively simple one like an assembler - you need to be able to process growing data structures en masse. The symbol tree in fasmg contains a thick pattern of interconnected structures, some of the connections create linked lists, some make trees, hash tables, hash trees (a hash-based binary trie is my favorite one, I use it in fasm perhaps partly out of sentiment). And it does not have any complex types at all! |
|||
05 Apr 2017, 14:49 |
|
vivik 05 Apr 2017, 16:36
@Tomasz Grysztar
Interesting how in fasm you only seem to use memory allocation like once, at the very beginning, and that's it. If I understood and remember correctly. Was it just more convenient that way, or you wanted fasm to be faster? Also, can you point out at where the binary trie is used in fasm? I'm curious how it looks like. Yeah, I should start using malloc already, I'm avoiding it much more than necessary. |
|||
05 Apr 2017, 16:36 |
|
Tomasz Grysztar 05 Apr 2017, 17:40
vivik wrote: Interesting how in fasm you only seem to use memory allocation like once, at the very beginning, and that's it. If I understood and remember correctly. Was it just more convenient that way, or you wanted fasm to be faster? vivik wrote: Also, can you point out at where the binary trie is used in fasm? I'm curious how it looks like. |
|||
05 Apr 2017, 17:40 |
|
vivik 05 Apr 2017, 19:15
Glad I finally stopped worrying about default windows memory allocator being slow, I finished designing data structures pretty fast after that.
I wonder what reasons golang developers had for supporting maps (aka dicts, hash tables) on type system level. Can't it just be a class with a bunch of methods? Having maps in type system only makes sense if there is some kind of function that may assept only arguments of this exact type. Can't imagine a function that needs to have maps of strings to integers, and nothing else. Hash tables are in golang probably because they were in python, and python is pretty lax about type system in general. If I'll ever need to make sure that a function always receives the hash table of correct type at compile type, there are more ways to make sure of than to just use type system. I'll use something that operates on program source, I guess? Dunno, don't want to think about it now. So, this means I can ignore maps. |
|||
05 Apr 2017, 19:15 |
|
Tomasz Grysztar 05 Apr 2017, 19:42
vivik wrote: Glad I finally stopped worrying about default windows memory allocator being slow, I finished designing data structures pretty fast after that. |
|||
05 Apr 2017, 19:42 |
|
vivik 07 Apr 2017, 09:33
I noticed you use HeapSize right after HeapAlloc in the fasmg source. This means HeapAlloc may allocate more memory than requested? How often is this the case?
I tried googling, and found this thread about somebody asking for a memory with size 0 and getting memory with size 1: https://www.experts-exchange.com/questions/23458459/Why-does-HeapAlloc-allocates-more-memory-than-I-ask-it-to-under-Vista-but-not-under-XP.html I planned to use HeapSize to keep track of all variable length data structures, but if it will suddenly allocate 12 bytes when I ask for only 8, it will break things pretty badly. But if it only allocates more memory when it's not dividible by 4, or when asked for something weird like zero size memory, and behaves correctly in all other cases, when I will use HeapSize for that. Overwise, I'll have to keep sizes somewhere in my own data, manually. Probably as a first dword inside data chunk itself. |
|||
07 Apr 2017, 09:33 |
|
revolution 07 Apr 2017, 09:37
There are no guarantees about what is allocated other than you can use what you ask for; and maybe more if the OS uses some form of minimum allocation or alignment strategies. Also, do not rely on all versions of any particular OS all doing the same things.
TL;DR never assume things, always check. |
|||
07 Apr 2017, 09:37 |
|
Tomasz Grysztar 07 Apr 2017, 09:45
Note that my "malloc" function is a generic API, it always returns the size allocated, but the code that called it may use this information or not. If you discard such information, you simply do not use the excess memory if such is provided and it should not be a big problem - any "malloc" implementation wastes some memory anyway. But in cases where I allocated buffers that may need to grow later I store the allocated size, so when there was more memory allocated that requested it still may get used before calling "realloc" to grow the buffer.
One change that I consider for inclusion in fasmg source is to differentiate between a few "malloc" variants - "malloc_fixed" for blocks that are not going to be resized or freed before the end of assembly (the basic symbol tree structure buffers are like that), "malloc_growable" for blocks that are likely to be resized with "realloc" in future and plain "malloc" for blocks that may get freed but not resized. I could make them all label the same function in basic implementation, but with a custom memory allocator these hits could be taken into account to make better choices about allocation strategy. |
|||
07 Apr 2017, 09:45 |
|
revolution 07 Apr 2017, 09:50
Instead of alternative names, why not an extra parameter input telling the allocator the expected future usage of the buffer?
|
|||
07 Apr 2017, 09:50 |
|
Tomasz Grysztar 07 Apr 2017, 10:40
revolution wrote: Instead of alternative names, why not an extra parameter input telling the allocator the expected future usage of the buffer? |
|||
07 Apr 2017, 10:40 |
|
vivik 12 Apr 2017, 07:43
I like how fasm doesn't even use stack to pass function arguments, and only uses registers and globals. I wonder how much of advantage in speed and size this way of coding has, and if any high level language compilers can produce a code like that.
|
|||
12 Apr 2017, 07:43 |
|
Tomasz Grysztar 12 Apr 2017, 09:46
vivik wrote: I like how fasm doesn't even use stack to pass function arguments, and only uses registers and globals. I wonder how much of advantage in speed and size this way of coding has, and if any high level language compilers can produce a code like that. |
|||
12 Apr 2017, 09:46 |
|
Goto page 1, 2 Next < Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.