flat assembler
Message board for the users of flat assembler.
some questions about fas file format
Tomasz Grysztar 20 Mar 2009, 15:02
buzzkill wrote:
> - The field "Length of section names table" (last field in the header) has value 8. I think that should be 4 right? (filesize 0x0230 - offset of section names table 0x022D + 1 = 4)

Seems like a bug, I'll look into it later, when I'm back home.

buzzkill wrote:
> - In the preprocessed source area, it looks like an 8th line has been processed (from offset 0x0154 to 0x0164), however there are only 7 lines in the src file. If this is an 8th line, where does it come from, and if it's something else, what is it?

Possibly you have an LF character at the end of the 7th line, which makes an 8th, empty line.

buzzkill wrote:
> - In the assembly dump, in the contents of line 1 (offsets 0x0165 thru 0x0180), at offset 0x017E (Type of code) is the value 0x10, which would indicate 16-bit code. Since (on a 32-bit distro) there is only 32-bit code, I wonder if this value is correct?

The default code setting at the beginning of assembly is 16-bit; the "format ELF" directive switches to 32-bit (or 64-bit for ELF64). So the line containing the "format" directive itself is executed in 16-bit context.

buzzkill wrote:
> - WRT "the tokenized contents of line" (in table 3 of the docs), is there a maximum length for this structure? Since the number of tokens per line is limited, I would expect the size of the resulting structure to have a maximum as well?

No, the number of tokens per line is not limited, and neither is the length of the line structure. Try this test:

Code:
    ; test.asm
    db "dd 1"
    repeat 700000
      db "+1"
    end repeat
    db 10

Assemble with "fasm test.asm test2.asm", and then "fasm test2.asm".

buzzkill wrote:
> BTW Tomasz, in the docs of "Table 2 Symbol structure" there are two tables 3.1 and 3.2 mentioned, but there is only one table 3.

Oh, right, these two tables have been merged into one. The reference in table 2 needs to be corrected.

buzzkill wrote:
> - Could someone point me to some docs about calculating/creating the "extended SIB" fields? In this .fas file, they're all zero, so my hexeditor isn't much help, and I found some docs online, but they seem to use a different SIB format than fasm. Maybe the difference is in the word "extended"

Table 2 already explains this:

Quote:
> the first two bytes are register codes and the second two bytes are corresponding scales

The register codes are listed in table 2.3. So, for example, the sequence of four bytes 43h, 40h, 1, 8 means EBX+EAX*8.
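A minimal Python sketch of decoding such a four-byte extended SIB, assuming the layout quoted above (two register codes followed by their two scales). Only the two register codes given in the example (40h and 43h) are filled in; the full list is in table 2.3 of the docs. The function name `decode_extended_sib` and the treatment of a zero code as "no register" are assumptions made for illustration.

```python
# Partial register-code map: only the codes from the example above.
# The complete list is table 2.3 of the fas docs.
REGISTER_CODES = {
    0x40: "EAX",
    0x43: "EBX",
}

def decode_extended_sib(sib: bytes) -> str:
    """Render a 4-byte extended SIB as text, e.g. 'EBX+EAX*8'."""
    if len(sib) != 4:
        raise ValueError("extended SIB must be exactly 4 bytes")
    terms = []
    # Bytes 0 and 1 are register codes; bytes 2 and 3 are their scales.
    for code, scale in zip(sib[:2], sib[2:]):
        if code == 0:
            continue  # assumption: a zero code means "no register"
        # Codes not in the partial map are shown as a hex placeholder.
        name = REGISTER_CODES.get(code, f"reg_{code:02X}h")
        terms.append(name if scale == 1 else f"{name}*{scale}")
    return "+".join(terms)

print(decode_extended_sib(bytes([0x43, 0x40, 1, 8])))  # EBX+EAX*8
```

An all-zero field, like the ones in the .fas file discussed here, decodes to an empty string under this assumption.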
buzzkill 20 Mar 2009, 15:48
Tomasz Grysztar wrote:
> Possibly you have an LF character at the end of the 7th line, which makes an 8th, empty line.

Yes, there is an LF at the end of the 7th line (as there should be). I take it fasm assumes this to mean that an 8th line will follow. Normally, with text processing programs, if the input file has 7 lines, there will be 7 lines processed and 7 lines output (assuming you generate 1 line of output for every 1 line of input). No big deal though, there will just be an extra "line" then.

Tomasz Grysztar wrote:
> The default code setting at the beginning of assembly is 16-bit; the "format ELF" directive switches to 32-bit (or 64-bit for ELF64). So the line containing the "format" directive itself is executed in 16-bit context.

OK, I thought it would be something like that, and now I'm sure.

Tomasz Grysztar wrote:
> No, the number of tokens per line is not limited, and neither is the length of the line structure. Try this test:

Ah, of course, I had only thought of instruction lines, not data definition lines. How silly of me. (I was thinking about how to process this token structure in a program, and was wondering how much memory to malloc for it. So unfortunately no easy answer there...)

Tomasz Grysztar wrote:
> Table 2 already explains this:

I had seen table 2.3, but didn't really put it together, I guess. Your example makes it clear though (maybe an idea to put an example in the docs?). So byte 3 is the scale for the register in byte 1, and byte 4 is the scale for the register in byte 2. I guess these SIBs only come into play for the second operand of an instruction then? (something like "lea edi,[esp+ecx*4+8]" would generate a SIB for the second operand that's 0x44 0x41 0x04 0x08, and no SIB for the first operand).

BTW, one more thing: I noticed that empty lines or lines that contain only a comment don't get stripped/ignored, ie they wind up in the "preprocessed source" section. Why is this? There can't be tokens to be found in these lines, so why not just ignore them? (I thought it was common for preprocessors to strip such lines).
Tomasz Grysztar 20 Mar 2009, 16:04
buzzkill wrote:
> I had seen table 2.3, but didn't really put it together, I guess. Your example makes it clear though (maybe an idea to put an example in the docs?). So byte 3 is the scale for the register in byte 1, and byte 4 is the scale for the register in byte 2. I guess these SIBs only come into play for the second operand of an instruction then? (something like "lea edi,[esp+ecx*4+8]" would generate a SIB for the second operand that's 0x44 0x41 0x04 0x08, and no SIB for the first operand).

First, what do you mean by "second operand"? You have an instruction like "mov [esp+ecx*4+8],edi". And second, those extended SIBs are used for the address values, since every instruction is assembled at some assumed address (which you can change with the ORG directive). For example:

Code:
    virtual at esp+ecx*8
      nop ; 1 byte
      lea eax,[$]
    end virtual

Here the LEA instruction will get assembled at the assumed address ESP+ECX*8+1. You will rarely see extended SIBs in such a context, because most code is assembled with simple numerical addresses (and that should be obvious). The second place where you can find extended SIBs in a .fas file is the symbols table, where any symbol can have the value of an address, possibly with some registers added. For example:

Code:
    label alpha dword at esi+edi

is going to generate the "alpha" symbol with the ESI+EDI extended SIB.

buzzkill wrote:
> BTW, one more thing: I noticed that empty lines or lines that contain only a comment don't get stripped/ignored, ie they wind up in the "preprocessed source" section. Why is this? There can't be tokens to be found in these lines, so why not just ignore them? (I thought it was common for preprocessors to strip such lines).

Yes, those lines are ignored - but at the next stage, by the parser. The preprocessor keeps all the lines, just for completeness (that is actually useful when reading the preprocessed source dump later). Lines that contain only preprocessor directives, or macro invocations, are also ignored on entry to the next stage.
Tomasz Grysztar 20 Mar 2009, 16:20
The bug with the section list size is fixed with the new release (uploading right now).
buzzkill wrote:
> Normally, with text processing programs, if the input file has 7 lines, there will be 7 lines processed and 7 lines output (assuming you generate 1 line of output for every 1 line of input). No big deal though, there will just be an extra "line" then.

But when you have an LF after each of those 7 lines, you in fact have 8 lines of text - the last one is simply empty. You can even move your caret into this last line in text editors (at least the one I use).
buzzkill 20 Mar 2009, 16:36
Tomasz Grysztar wrote:
> The bug with the section list size is fixed with the new release (uploading right now).

Thanks, I'll be downloading it shortly.

Tomasz Grysztar wrote:
> But you can move your caret into this last one line in text editors.

Not in The One True Editor (vim), you can't. But I also checked it in nano, and there you're allowed to go to line 8 of a 7 line text file. But let's not start an editor war (although maybe it's an idea for a poll: what editor do asm coders use, and do they have useful scripts/macros that we can copy).

And I think I finally get the SIBs: they are related to addresses, and only when an instruction or label is assembled at a "special" address (instead of just "the next" address) do they come into play. (I can't explain it as well as you, but still I think I got it.)
Tomasz Grysztar 20 Mar 2009, 16:45
buzzkill wrote:
> Not in The One True Editor (vim), you can't. But I also checked it in nano, and there you're allowed to go to line 8 of a 7 line text file. But let's not start an editor war (although maybe it's an idea for a poll: what editor do asm coders use, and do they have useful scripts/macros that we can copy).

Well... The One True Editor for fasm is the asmedit (see the SOURCE\IDE subdirectory in the DOS and Windows distributions). Unfortunately, there are only DOS and Windows ports of it (fasmd and fasmw) available right now. I'm planning to make a fasmx (X Windows port), too - but I cannot tell you for sure when I'm going to start this project.

Anyway - that doesn't really need to be related to editors, I just gave them as an example. For me each LF (or CR-LF, or CR, depending on OS) starts a new line. So this:

Code:
    db 31h,0Ah,32h,0Ah,33h,0Ah,34h,0Ah,35h,0Ah,36h,0Ah,37h

is 7 lines of text, while this:

Code:
    db 31h,0Ah,32h,0Ah,33h,0Ah,34h,0Ah,35h,0Ah,36h,0Ah,37h,0Ah

is 8 lines. All my tools are using this interpretation, so please be prepared for it.
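A short Python sketch of this interpretation, in which every line break starts a new (possibly empty) line, so the line count is the number of breaks plus one. Accepting CR-LF, CR and LF all at once is a simplification assumed here for illustration; the helper name `count_lines` is hypothetical.

```python
import re

# Any of CR-LF, CR, or LF counts as a single line break.
# CR-LF is listed first so that it is not counted as two breaks.
LINE_BREAK = re.compile(rb"\r\n|\r|\n")

def count_lines(data: bytes) -> int:
    """Count lines so that every line break starts a new, possibly empty, line."""
    return len(LINE_BREAK.findall(data)) + 1

# The two byte sequences from the db examples above.
without_final_lf = b"1\n2\n3\n4\n5\n6\n7"
with_final_lf = b"1\n2\n3\n4\n5\n6\n7\n"

print(count_lines(without_final_lf))  # 7
print(count_lines(with_final_lf))     # 8
```

A reader of the preprocessed source that expects one entry per "visible" line would therefore need to allow for one extra, empty line when the file ends with a line break.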
buzzkill 20 Mar 2009, 23:42
Two small questions about the strings table (since I'm trying to write a program to read this):
- The input filename and output filename are always the first two entries in the string table, and after those come the sections and external symbols, right?
- Is there any way to tell when the "sections" part of the string table ends and the "external symbols" part begins? (Maybe everything that starts with a '.' is a section, the rest is an external symbol?)

Edit: never mind, I can just use the "Offset/Length of section names table" fields, I should learn to read...
buzzkill 22 Mar 2009, 02:00
Hmm, it seems my above comment was valid after all... It turns out that in the strings table, sections and symbols are intermixed, ie. suppose you have an external symbol 'printf' defined in your '.text' section, then the strings table looks like this:
    input filename
    output filename
    .text
    printf
    .data

and not like I thought, first the sections and then the symbols. So my original question stands: how can you tell sections and symbols apart inside the strings table? Maybe still "if it starts with a dot, it's a section"? BTW, the ELF specs allow you to define sections that don't start with a dot...

Edit: once again I think I'm on the wrong path here, the symbol table should contain all offsets of symbols in the strings table, I think. I really should stop coding/reading specs before 4am...
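A hedged Python sketch of the idea in the edit above: rather than guessing from a string's position or leading dot, look each name up by an offset supplied from elsewhere (header fields, the section names table, or symbol entries). It assumes the strings table is a sequence of zero-terminated strings, which this thread does not state explicitly. The helper `read_asciiz`, the sample file names, and the offsets are made up for illustration.

```python
def read_asciiz(table: bytes, offset: int) -> str:
    """Return the zero-terminated string starting at `offset` in `table`."""
    end = table.index(b"\x00", offset)  # raises ValueError if unterminated
    return table[offset:end].decode("ascii")

# Made-up strings table mirroring the intermixed layout described above.
names = ["test.asm", "test.o", ".text", "printf", ".data"]
strings_table = b"".join(n.encode("ascii") + b"\x00" for n in names)

# Hypothetical offsets, standing in for what a section names table would supply.
section_offsets = [strings_table.index(b".text"), strings_table.index(b".data")]

for offset in section_offsets:
    print(read_asciiz(strings_table, offset))
```

With lookups driven by offsets, it does not matter that sections and external symbols are interleaved, or that a section name might not start with a dot.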
buzzkill 23 Mar 2009, 15:39
In table 2.1 of the fas docs, what exactly is "the prediction" (as in eg "The prediction was needed when checking whether the symbol was used."). Also, what exactly does "The optimization adjustment is applied to the value of this symbol." mean?
Tomasz Grysztar 23 Mar 2009, 16:10
Those flags are related to some internal workings of fasm, they are not really that important from the external point of view. I can give a detailed explanation later, if you really wish to know this.
buzzkill 23 Mar 2009, 17:45
No, that's OK, I was just wondering what those things meant. I'll just limit myself to bits 0-3 for the moment.
buzzkill 23 Mar 2009, 20:50
Yet another question: in table 2, the "Value of symbol.", how is this calculated? Because I have eg a src line that says "number dd 3", and in the symbol table, the symbol 'number' gets value 0x3B, so it's obviously not just the assigned numerical value, ie 3 in this case.
Also, when defining strings, like eg "string db 'my very very long string',10,0", you couldn't fit all that into a qword, so obviously there must be some way of reducing the string to a value that fits into a qword. There's nothing about this in the docs, but I'm sure you can enlighten me, Tomasz.
LocoDelAssembly 23 Mar 2009, 21:15
Tomasz probably will give the full answer later but in the meantime: perhaps in the case of "number dd 3" the "value" is the symbol address and when you do "number = 3" then its "value" will effectively be 3?
buzzkill 23 Mar 2009, 21:37
I think that's it, LocoDelAssembly, thanks! I just counted, and all the offsets of my variables in the .data section match up to the "symbol value". I guess I was thrown off by the name; if it had been called "symbol offset" or "symbol address" or something, I would have thought of this myself, I think. Anyway, thanks for the help.
buzzkill 25 Mar 2009, 15:30
Question about the "Symbol structure" (table 2): what is an anonymous symbol? I haven't been able to find one in my own (simple) programs' fas files unfortunately.
Also, about the second-last field of the Symbol structure, when would the offset be into the strings table (ie high bit set)? In my own programs' fas files, all symbol names are offsets into the preprocessed source (ie high bit cleared).
Tomasz Grysztar 25 Mar 2009, 16:26
buzzkill wrote:
> Question about the "Symbol structure" (table 2): what is an anonymous symbol? I haven't been able to find one in my own (simple) programs' fas files unfortunately.

It's the @@ label. See the last paragraph of section 1.2.3 in the manual.

buzzkill wrote:
> Also, about the second-last field of the Symbol structure, when would the offset be into the strings table (ie high bit set)? In my own programs' fas files, all symbol names are offsets into the preprocessed source (ie high bit cleared).

This happens when the label has a constructed name, which doesn't occur directly in the source. NASM-style locals are an example:

Code:
    some:
    .a: ; this defines label with name "some.a"
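A minimal Python sketch of resolving a symbol name from that offset field. It rests on several assumptions not confirmed in this thread: the field is a 32-bit dword, offsets are relative to the start of the respective area, names in the strings table are zero-terminated, and names in the preprocessed source are pascal-style (a length byte followed by the characters). The function `symbol_name` and the sample buffers are hypothetical.

```python
HIGH_BIT = 0x80000000  # assumption: the name offset field is a 32-bit dword

def symbol_name(name_field: int, preprocessed: bytes, strings: bytes) -> str:
    """Resolve a symbol name from its offset field."""
    if name_field & HIGH_BIT:
        # High bit set: constructed name, stored zero-terminated
        # in the strings table.
        start = name_field & ~HIGH_BIT
        end = strings.index(b"\x00", start)
        return strings[start:end].decode("ascii")
    # High bit clear: pascal-style string in the preprocessed source.
    length = preprocessed[name_field]
    first = name_field + 1
    return preprocessed[first:first + length].decode("ascii")

# Made-up buffers for illustration only.
preprocessed = b"\x00\x00\x04some\x01a"
strings = b"in.asm\x00some.a\x00"

print(symbol_name(2, preprocessed, strings))             # some
print(symbol_name(HIGH_BIT | 7, preprocessed, strings))  # some.a
```

In this made-up data, "some" occurs literally in the source, while "some.a" is the constructed name and so lives only in the strings table.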
buzzkill 25 Mar 2009, 16:53
Thanks for the prompt reply, Tomasz, that clears it up for me. BTW, since the name of a symbol is stored in a pascal-style string in the preprocessed source, I assume that means the maximum identifier (symbol) length is 255? I don't think that's mentioned in the manual anywhere.
Tomasz Grysztar 25 Mar 2009, 17:34
You're right, the manual doesn't mention it. I treat it more like a limitation of the implementation, not of the language. Though with .fas this limitation became fortified.
BTW, you might also want to take a look at the never-finished guide to fasm's internals.
buzzkill 25 Mar 2009, 18:04
Thanks for that link Tomasz, I'm definitely going to read that guide as well.
OT: I'm having an annoying problem with the forum: often I don't get an email notification about a new post, like eg your last one here (but I did receive an email for your previous one before that). When I check the thread, I have to click "watch this topic for replies" again, even though I have "always receive mail notification" on in my profile, and I never click on "stop watching thread". Am I doing something wrong or...?
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.