flat assembler
Message board for the users of flat assembler.

COMDAT support

hg



Joined: 24 Jan 2006
Posts: 10
I realised today how much I hate Masm!
Its inefficiencies are driving me crazy, and the amount of work needed to work around them is not small.

I'm in the process of building a re-assembler, but I have some very big problems which I hope fasm can solve.

What I'm doing is reading a Win32 MS COFF object file and turning it into asm source. This asm source must then assemble again.
Right now I'm doing the following:

1. read the COFF object file
2. write MASM output
3. assemble with MASM
4. link with the remaining objects
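Step 1 of that pipeline starts with the 20-byte COFF file header, which tells a reader how many section headers follow and where the symbol table lives. A minimal C sketch, assuming the whole .obj file is already in memory; the struct and function names are illustrative, while the field layout follows the PE/COFF specification:

```c
#include <stddef.h>
#include <stdint.h>

/* COFF fields are stored little-endian; read them byte by byte. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The 20-byte COFF file header at offset 0 of an object file. */
typedef struct {
    uint16_t machine;                 /* 0x014C for i386 */
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint32_t pointer_to_symbol_table; /* file offset of the symbol table */
    uint32_t number_of_symbols;       /* counts auxiliary records too */
    uint16_t size_of_optional_header; /* normally 0 for .obj files */
    uint16_t characteristics;
} CoffHeader;

/* Returns 0 on success, -1 if the buffer is too short. */
static int parse_coff_header(const uint8_t *buf, size_t len, CoffHeader *h) {
    if (len < 20) return -1;
    h->machine                 = rd16(buf + 0);
    h->number_of_sections      = rd16(buf + 2);
    h->time_date_stamp         = rd32(buf + 4);
    h->pointer_to_symbol_table = rd32(buf + 8);
    h->number_of_symbols       = rd32(buf + 12);
    h->size_of_optional_header = rd16(buf + 16);
    h->characteristics         = rd16(buf + 18);
    return 0;
}
```

The string table, which holds symbol names longer than eight bytes, starts right after the symbol table, at pointer_to_symbol_table + 18 * number_of_symbols.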

The goal is an object file that works identically to the original.

However, with C++ there are LOTS of problems, because MASM doesn't support COMDAT. The problem is that if you have a section ABC in two object files, it must be tagged either as NONDUPLICATE, which means you will get a linker error about duplicate symbols, or as COMDAT, meaning the linker will throw away one of the sections and keep the other.

Since MASM doesn't support this, I tried to hack my way around it. By including both sections and marking them private, I can get the linker to accept both. Naturally this produces duplicate symbols, but I solved that by giving new names to the symbols in question.

However, I got stuck. After a whole day of debugging I ran into a new problem, which I *think* I more or less understand now:
We all know that MS compilers use ecx for the __thiscall calling convention; it holds the 'this' pointer, through which the object's vtable is reached. Whenever I re-assemble two files I get crashes. I realised the vtable is wrong, and the most logical explanation I can find is that I end up with two copies (the COMDAT problem again, plus my hack).

So the question is: can fasm support COMDAT? I don't think it's mentioned in the manual, but maybe it could be added in a future version?
Based on how COMDAT works, it should be really easy to support in an assembler. Basically, you have a section in the object file that is marked with IMAGE_SCN_LNK_COMDAT, plus a valid auxiliary symbol record for that section.
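Finding the sections in question means testing IMAGE_SCN_LNK_COMDAT (0x00001000) in each 40-byte section header, whose Characteristics field sits at offset 36. A sketch, assuming an object file with no optional header (so the section table starts right after the 20-byte file header); the helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define IMAGE_SCN_LNK_COMDAT     0x00001000u
#define COFF_FILE_HEADER_SIZE    20
#define COFF_SECTION_HEADER_SIZE 40

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns how many sections carry IMAGE_SCN_LNK_COMDAT and stores up to
   max_out of their 1-based section numbers (the numbering symbols use). */
static size_t find_comdat_sections(const uint8_t *buf, size_t len,
                                   uint16_t nsections, int *out, size_t max_out) {
    size_t count = 0;
    for (uint16_t i = 0; i < nsections; i++) {
        size_t off = COFF_FILE_HEADER_SIZE + (size_t)i * COFF_SECTION_HEADER_SIZE;
        if (off + COFF_SECTION_HEADER_SIZE > len) break;  /* truncated file */
        uint32_t ch = rd32(buf + off + 36);               /* Characteristics */
        if (ch & IMAGE_SCN_LNK_COMDAT) {
            if (count < max_out) out[count] = i + 1;
            count++;
        }
    }
    return count;
}
```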

Thanks in advance.
Post 24 Jan 2006, 18:32
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
The quick way to add support for this flag:
open the FORMATS.INC file, find the "formatter symbols" at the very end of the file, and add this line:
Code:
 db 6,'comdat',19h,12    

between the lines defining the 'console' and 'data' symbols (the list has to stay alphabetically sorted).
Now reassemble fasm (with itself), and you can use the newly added keyword like this:
Code:
section '.prv' COMDAT data readable writeable    

(the case doesn't matter).
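The last byte of that FORMATS.INC line appears to be a bit index into the section Characteristics field rather than a mask: bit 12 is IMAGE_SCN_LNK_COMDAT (0x1000). That reading is an inference, not something stated in the post; the C snippet below only checks it against the flag values from the PE/COFF specification, including those behind the two undocumented keywords discussed in the next paragraph:

```c
#include <stdint.h>

/* Section flag values from the PE/COFF specification. */
#define IMAGE_SCN_LNK_INFO   0x00000200u  /* fasm keyword "linkinfo" */
#define IMAGE_SCN_LNK_REMOVE 0x00000800u  /* fasm keyword "hidden" */
#define IMAGE_SCN_LNK_COMDAT 0x00001000u  /* proposed keyword "comdat" */

/* Assumption: in an entry such as  db 6,'comdat',19h,12  the final byte
   is the bit number of the flag within the Characteristics field. */
static uint32_t flag_from_bit_index(unsigned bit) {
    return (uint32_t)1u << bit;
}
```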

I can also add it officially in the next release. Are you sure this flag is enough for COMDAT to work properly? I will look into the specification to check for any catch.

BTW, there are already two undocumented keywords in fasm: "hidden" for IMAGE_SCN_LNK_REMOVE and "linkinfo" for IMAGE_SCN_LNK_INFO. The second one seems good to me, but I'm not sure about keeping "hidden"; maybe something like "linkremove" would be better? What's your opinion?
Post 24 Jan 2006, 18:52
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
Thanks for the quick reply, Privalov. I'm interested in COMDAT as well (and I'm the one who directed hg here).
Post 24 Jan 2006, 18:57
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
I just checked this, and simply setting the flag is not enough: the COMDAT section has to have an additional record in the symbol table, which selects one of these COMDAT types:
IMAGE_COMDAT_SELECT_NODUPLICATES
IMAGE_COMDAT_SELECT_ANY
IMAGE_COMDAT_SELECT_SAME_SIZE
IMAGE_COMDAT_SELECT_EXACT_MATCH
IMAGE_COMDAT_SELECT_ASSOCIATIVE
IMAGE_COMDAT_SELECT_LARGEST

This may need some more sophisticated syntax, especially for the IMAGE_COMDAT_SELECT_ASSOCIATIVE type, which has to be tied to some other section.
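The selection values are 1 through 6 in the order listed above. The rules below paraphrase the PE/COFF specification's description of what the linker does when several objects define the same COMDAT symbol. The function is a hypothetical sketch, and treating matching size plus checksum as "identical" for EXACT_MATCH is an assumption, not documented linker behaviour:

```c
#include <stdint.h>

/* Selection values from the PE/COFF specification. */
enum {
    IMAGE_COMDAT_SELECT_NODUPLICATES = 1,
    IMAGE_COMDAT_SELECT_ANY          = 2,
    IMAGE_COMDAT_SELECT_SAME_SIZE    = 3,
    IMAGE_COMDAT_SELECT_EXACT_MATCH  = 4,
    IMAGE_COMDAT_SELECT_ASSOCIATIVE  = 5,
    IMAGE_COMDAT_SELECT_LARGEST      = 6
};

typedef struct { uint32_t size; uint32_t checksum; } ComdatDef;

typedef enum { KEEP_FIRST, KEEP_SECOND, DUPLICATE_ERROR, NOT_DECIDED_HERE } Resolution;

/* Decide between two definitions of the same COMDAT symbol. */
static Resolution resolve_comdat(int selection, ComdatDef a, ComdatDef b) {
    switch (selection) {
    case IMAGE_COMDAT_SELECT_NODUPLICATES:
        return DUPLICATE_ERROR;                 /* multiply defined symbol */
    case IMAGE_COMDAT_SELECT_ANY:
        return KEEP_FIRST;                      /* any copy will do */
    case IMAGE_COMDAT_SELECT_SAME_SIZE:
        return a.size == b.size ? KEEP_FIRST : DUPLICATE_ERROR;
    case IMAGE_COMDAT_SELECT_EXACT_MATCH:
        /* Assumption: size + checksum stands in for a byte comparison. */
        return (a.size == b.size && a.checksum == b.checksum)
                   ? KEEP_FIRST : DUPLICATE_ERROR;
    case IMAGE_COMDAT_SELECT_LARGEST:
        return b.size > a.size ? KEEP_SECOND : KEEP_FIRST;
    default:
        /* ASSOCIATIVE: kept or dropped together with its associated section. */
        return NOT_DECIDED_HERE;
    }
}
```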
Post 24 Jan 2006, 19:01
hg



Joined: 24 Jan 2006
Posts: 10
Yes, exactly! It's very important to support all the COMDAT types. I'm trying to build a re-assembler based on existing COFF files, so limited support won't cut it. I haven't tested your patch yet, but the first thing that came to mind was that it basically only sets the flag, without writing the correct auxiliary symbol.

The record must be 18 bytes, like all the other symbol records. It is stored right after the symbol record that holds the section name, and it must contain a checksum of the COMDAT section in question. It's described on page 40 of pecoff.doc at http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx.
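The 18-byte auxiliary record for a section definition (Auxiliary Format 5 in the PE/COFF specification) has a fixed layout. A C sketch that decodes it; the struct and function names are illustrative:

```c
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Auxiliary Format 5: follows the section symbol in the symbol table. */
typedef struct {
    uint32_t length;                /* size of the section data */
    uint16_t number_of_relocations;
    uint16_t number_of_linenumbers;
    uint32_t checksum;              /* COMDAT checksum */
    uint16_t number;                /* 1-based associated section (selection 5) */
    uint8_t  selection;             /* IMAGE_COMDAT_SELECT_* value */
} AuxSectionDef;

static AuxSectionDef parse_aux_section_def(const uint8_t rec[18]) {
    AuxSectionDef a;
    a.length                = rd32(rec + 0);
    a.number_of_relocations = rd16(rec + 4);
    a.number_of_linenumbers = rd16(rec + 6);
    a.checksum              = rd32(rec + 8);
    a.number                = rd16(rec + 12);
    a.selection             = rec[14];  /* bytes 15..17 are unused */
    return a;
}
```

Since the checksum is just a field in this record, a re-assembler could read it from the original object and write it back unchanged instead of recomputing it.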

I'm not sure how to calculate the checksum yet; I haven't looked into the algorithm. However, I appreciate your fast response and really hope that you will want to support it.

I do hope you realize that no other assembler seems to support this. That makes it extra important.

As for your other question, I recommend "linkremove". You want your keywords to stay as close to the specification's names as possible; inventing your own terms will only mislead users. With "linkremove" it's easy to recall what it means in relation to the documentation.

Thanks so far.

Regards,
Henrik Goldman
Post 24 Jan 2006, 20:30
hg



Joined: 24 Jan 2006
Posts: 10
Oh yeah... one more thing. Since I don't know how the checksum is calculated, my newly calculated checksum might not match the original. I mean, if I take a section from the original COFF file and write it out with fasm, it might not be 100% identical. However, the linker needs the original checksum to make the correct decision: if you have A.obj and B.obj, I might only process one of them, and the resulting exe should still choose the same COMDAT sections no matter whether I processed A.obj, B.obj or both.

Keep in mind that it might be a good idea to allow forcing a specific checksum for a COMDAT section.

Regards,
Henrik Goldman
Post 24 Jan 2006, 20:41
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
I would not consider using some arbitrary checksum algorithm; it has to be the same one (I hope it won't be hard to find out). Also, do you know whether this checksum is needed only for some particular COMDAT type (perhaps IMAGE_COMDAT_SELECT_EXACT_MATCH)? The specification is very sketchy about it.
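The thread never establishes which algorithm produces the COMDAT checksum. One way to investigate is to compute candidates over a section's raw data and compare them with dumped values. The sketch below assumes, without confirmation, that the checksum is some CRC-32 variant over the section bytes. It uses the common reflected polynomial 0xEDB88320 and exposes the initial value and final XOR as parameters, since those are exactly the details that would need to be matched. The standard CRC-32 check value for "123456789" (0xCBF43926) serves as a sanity test:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320). init and xor_out are
   parameters because the COMDAT variant, if it is a CRC at all, is unknown.
   Standard CRC-32 is init = xor_out = 0xFFFFFFFF. */
static uint32_t crc32_variant(const uint8_t *data, size_t len,
                              uint32_t init, uint32_t xor_out) {
    uint32_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial when the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ xor_out;
}
```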

I'm thinking about syntax like:
Code:
section '.prv' COMDAT 3 data readable writeable    

to select the third type (IMAGE_COMDAT_SELECT_SAME_SIZE). I don't know, however, what to do with type 5, as it needs some way to specify its relation to another section. Or maybe allow only types 1-4 and 6 for now?
Post 24 Jan 2006, 20:56
hg



Joined: 24 Jan 2006
Posts: 10
Since I don't like guessing, I wrote a little dumper utility for the tool I'm working on. I dumped a standard VC++ 7.1 object file that contains a bit of STL (mostly just the string and list classes) and some custom C++ classes.

The results are attached here.
I don't think the checksum is very hard to get.
From what I can see, a checksum is present for most of the sections.

What confuses me about type 5 is how to find the relations to the other sections.

Take a look at the log and let's go from there. Most type 5 sections are debug sections, which I throw away anyway.
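For selection 5, the PE/COFF specification puts the relation in the auxiliary record itself: its Number field is the one-based index of the associated section in the section table. A sketch, assuming the reader has already collected each section's selection and Number fields and decided which non-associative sections survive. SectionInfo and section_is_kept are hypothetical names, and following chains of associative sections is a defensive choice, not documented behaviour:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_COMDAT_SELECT_ASSOCIATIVE 5

/* Per-section data an object-file reader would collect (hypothetical). */
typedef struct {
    uint8_t  selection;  /* 0 for non-COMDAT sections */
    uint16_t number;     /* aux Number field: 1-based associated section */
    bool     kept;       /* linker decision for non-associative sections */
} SectionInfo;

/* Is 1-based section `sec` kept? An associative section follows the section
   its Number field names; the hop limit guards against cycles. */
static bool section_is_kept(const SectionInfo *s, size_t nsections, uint16_t sec) {
    for (size_t hops = 0; hops <= nsections; hops++) {
        if (sec == 0 || sec > nsections) return false;  /* invalid index */
        const SectionInfo *info = &s[sec - 1];
        if (info->selection != IMAGE_COMDAT_SELECT_ASSOCIATIVE)
            return info->kept;
        sec = info->number;
    }
    return false;  /* cycle of associative sections */
}
```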

-- Henrik

Edit: since I could not upload the file, here is part of it:

Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xC5B428EF
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0x8F629757
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xAA09C88B
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xB8BC6765
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0x00000000
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xA848E8F7
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xA032AF3E
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0x5019579F
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xC5B428EF
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0x8F629757
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xAA09C88B
Section Name: .rdata Characteristics: 0x40301040 Comdat selection: 2 Comdat Checksum: 0xB8BC6765
Section Name: .text Characteristics: 0x60501020 Comdat selection: 2 Comdat Checksum: 0x27B0B9D7
Section Name: .debug$F Characteristics: 0x42101040 Comdat selection: 5 Comdat Checksum: 0x00000000
Section Name: .text Characteristics: 0x60501020 Comdat selection: 2 Comdat Checksum: 0x026D930A
Section Name: .debug$F Characteristics: 0x42101040 Comdat selection: 5 Comdat Checksum: 0x00000000
Section Name: .text Characteristics: 0x60501020 Comdat selection: 2 Comdat Checksum: 0xC043E03C
Section Name: .debug$F Characteristics: 0x42101040 Comdat selection: 5 Comdat Checksum: 0x00000000
Section Name: .text Characteristics: 0x60501020 Comdat selection: 2 Comdat Checksum: 0xA92228E6
Section Name: .debug$F Characteristics: 0x42101040 Comdat selection: 5 Comdat Checksum: 0x00000000
Section Name: .text Characteristics: 0x60501020 Comdat selection: 2 Comdat Checksum: 0xD8D6B039
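The Characteristics values in that dump decode cleanly with the flag constants from the PE/COFF specification: 0x40301040 is initialized data, COMDAT, 4-byte aligned, readable; 0x60501020 is code, COMDAT, 16-byte aligned, executable and readable; 0x42101040 is initialized data, COMDAT, 1-byte aligned, discardable and readable. A small C helper to check that (names other than the IMAGE_SCN_* constants are illustrative):

```c
#include <stdint.h>

#define IMAGE_SCN_CNT_CODE             0x00000020u
#define IMAGE_SCN_CNT_INITIALIZED_DATA 0x00000040u
#define IMAGE_SCN_LNK_COMDAT           0x00001000u
#define IMAGE_SCN_ALIGN_MASK           0x00F00000u
#define IMAGE_SCN_MEM_DISCARDABLE      0x02000000u
#define IMAGE_SCN_MEM_EXECUTE          0x20000000u
#define IMAGE_SCN_MEM_READ             0x40000000u
#define IMAGE_SCN_MEM_WRITE            0x80000000u

/* Alignment in bytes from bits 20-23: code 1 => 1 byte, 2 => 2 bytes, ...
   0xE => 8192 bytes. Returns 0 when no alignment is encoded. */
static uint32_t section_alignment(uint32_t ch) {
    uint32_t code = (ch & IMAGE_SCN_ALIGN_MASK) >> 20;
    return code ? (uint32_t)1u << (code - 1) : 0;
}
```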
Post 24 Jan 2006, 21:19
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
I noticed that each COMDAT section needs a related symbol name. While the section names may be the same for different COMDAT objects, this external symbol name is what identifies them. This implies that it's the COMDAT symbol that you define, and the section is internally split into separate COMDAT sections, one for each symbol. This would require some syntax like:
Code:
section '.rdata' readable writeable

  comdat TheSymbolName 3 ; the number is the COMDAT selection
    ; definitions
  end comdat

  comdat SomeOtherSymbol 2
    ; ...
  end comdat    

which would internally create two separate '.rdata' COMDAT sections.
Post 25 Jan 2006, 08:48
hg



Joined: 24 Jan 2006
Posts: 10
May I ask where you found this information? I thought the linker used the checksum for comparing, but I might be absolutely wrong. Also, where is this symbol name stored? From what I understood, the only COMDAT information is in auxiliary format 5, which comes right after the section name symbol.

-- Henrik
Post 25 Jan 2006, 09:08
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7797
Location: Kraków, Poland
Microsoft PE/COFF specification wrote:
The first symbol having the section value of the COMDAT section must be the section symbol. This symbol has the name of the section, Value field equal to 0, the section number of the COMDAT section in question, Type field equal to IMAGE_SYM_TYPE_NULL, Class field equal to IMAGE_SYM_CLASS_STATIC, and one auxiliary record. The second symbol is called “the COMDAT symbol” and is used by the linker in conjunction with the Selection field

So there is the auxiliary record for the section symbol, but then also an additional symbol (marked with storage class 2, that is, external), which is "the COMDAT symbol". In the sample object file I saw that many '.rdata' COMDAT sections were created, and each section symbol (with its auxiliary record) was immediately followed by such a COMDAT symbol with some unique name. I guess this name is used to recognize which symbol we are actually defining. I don't know, however, for what purpose the checksum is used; I cannot find it in the COFF specs.
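Putting the quoted rule into bytes, an assembler emitting one COMDAT section would write three consecutive 18-byte symbol-table slots: the section symbol (Value 0, class STATIC, one aux record), the Format 5 auxiliary record, and the COMDAT symbol (class EXTERNAL). A sketch under simplifying assumptions: names fit in eight bytes (longer ones need the string table), relocation and line-number counts are left at zero, Type is always IMAGE_SYM_TYPE_NULL, and the helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define SYM_SIZE                 18
#define IMAGE_SYM_TYPE_NULL      0
#define IMAGE_SYM_CLASS_EXTERNAL 2
#define IMAGE_SYM_CLASS_STATIC   3

static void wr16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void wr32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* One 18-byte symbol record with a short (inline, <= 8 byte) name. */
static void write_symbol(uint8_t *p, const char *name, uint32_t value,
                         uint16_t section, uint8_t storage_class, uint8_t naux) {
    memset(p, 0, SYM_SIZE);
    for (size_t i = 0; i < 8 && name[i] != '\0'; i++) p[i] = (uint8_t)name[i];
    wr32(p + 8, value);
    wr16(p + 12, section);              /* 1-based section number */
    wr16(p + 14, IMAGE_SYM_TYPE_NULL);  /* a real writer may set a function type */
    p[16] = storage_class;
    p[17] = naux;
}

/* Writes section symbol, Format 5 aux record and COMDAT symbol for section
   `sec`; returns the number of symbol-table slots used. */
static int write_comdat_symbols(uint8_t *out, const char *sect_name, uint16_t sec,
                                uint32_t length, uint32_t checksum, uint8_t selection,
                                const char *comdat_name, uint32_t comdat_value) {
    write_symbol(out, sect_name, 0, sec, IMAGE_SYM_CLASS_STATIC, 1);
    uint8_t *aux = out + SYM_SIZE;
    memset(aux, 0, SYM_SIZE);
    wr32(aux + 0, length);
    wr32(aux + 8, checksum);
    aux[14] = selection;
    write_symbol(out + 2 * SYM_SIZE, comdat_name, comdat_value, sec,
                 IMAGE_SYM_CLASS_EXTERNAL, 0);
    return 3;
}
```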
Post 25 Jan 2006, 09:19
hg



Joined: 24 Jan 2006
Posts: 10
Ok thanks for the follow up.

I realized I _might_ not need COMDAT after all to solve my problem.
It turned out the problems I had with MASM were not related to COMDAT after all. However, it would still be a useful addition.

I must admit though that I'm really excited about fasm!
I've spent hours and hours generating code for MASM, and each time I hit a problem I need lots of code to work around it. Since I generate assembler files of at least 5000 lines, debugging is no fun, as the code isn't written by a human.

This morning I did a quick manual test with fasm, and it proved to work great with minimal effort.

Regarding COMDAT, I looked at the specification as well and also found it very hard to find the needed COMDAT checksum information. I hope it's solvable, though. Perhaps it's in imagehlp.dll (or whatever it's called), like the image checksum. It's not something I've researched, though.

-- Henrik
Post 25 Jan 2006, 11:35
cjacobi



Joined: 28 Jan 2013
Posts: 6
Location: Palo Alto
Apologies, this isn't FASM-related (though I am a happy FASM user), but I have been searching Google forever.

My application reads and writes object files like yours. However, when COMDAT symbols are found (or generated), linking my object files with the Microsoft linker crashes. In particular, in files generated by Microsoft VS, I am finding COMDAT sections (of COMDAT type ANY) that do NOT have a COMDAT symbol. My reading of the COFF spec says that should not occur. Do you have an idea how to interpret such COMDAT sections, and when to generate (or avoid regenerating) sections without a COMDAT symbol?
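This doesn't explain why such sections appear, but they can at least be detected. Per the specification text quoted earlier in the thread, the COMDAT symbol is the second symbol whose section number is the COMDAT section, so a section with fewer than two such primary symbols lacks one. A C sketch over an in-memory symbol table; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SYM_SIZE 18

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Counts primary (non-auxiliary) symbols whose SectionNumber equals sec.
   Auxiliary records are skipped using each symbol's NumberOfAuxSymbols. */
static size_t symbols_in_section(const uint8_t *symtab, uint32_t nsyms, uint16_t sec) {
    size_t n = 0;
    for (uint32_t i = 0; i < nsyms; ) {
        const uint8_t *s = symtab + (size_t)i * SYM_SIZE;
        if (rd16(s + 12) == sec) n++;
        i += 1u + s[17];  /* this symbol plus its aux records */
    }
    return n;
}

/* A well-formed COMDAT section has a section symbol and a COMDAT symbol. */
static bool comdat_symbol_missing(const uint8_t *symtab, uint32_t nsyms, uint16_t sec) {
    return symbols_in_section(symtab, nsyms, sec) < 2;
}
```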

Thanks for any explanations, links to better documentation, or linker source code I can understand.
Chris
Post 08 Jan 2014, 22:40


Copyright © 1999-2020, Tomasz Grysztar.
