flat assembler
Message board for the users of flat assembler.

Relocation info for format binary. Old suggestion.

l_inc
Joined: 23 Oct 2009
Posts: 881
What is the conceptual state of the idea? I noticed that a new stable version line was gonna be released, and thought I'd remind the author of a feature planned way back (as it was mentioned here).

The reminder is closely related to this topic. Having said that "the officially invalid alignment values, starting from 4, were going to be achievable only by the manually created PEs", I disregarded the fact that there is no way to generate relocations. This is of particular interest for Windows drivers, because many of them use alignments starting from 0x20, which cannot be achieved with fasm.

_________________
Faith is a superposition of knowledge and fallacy
Post 08 Dec 2013, 11:21
Tomasz Grysztar
Joined: 16 Jun 2003
Posts: 7734
Location: Kraków, Poland
I have been playing with this idea for ages already, but I never invented any specific design that would be good enough for me to enthusiastically start working on it. I cannot promise anything at this moment.
Post 08 Dec 2013, 12:13
l_inc
Tomasz Grysztar
I'm very shortsighted when it comes to making crossplatform decisions, but I'm still gonna ask. Are there any disadvantages in just using the PE fixups format?

Post 08 Dec 2013, 14:48
Tomasz Grysztar
I'm giving this idea a thought once again and I think I have finally come up with a design that could be satisfactory to me. The recent changes to the handling of addressing spaces have partially paved the way for it. I'm going to try a "proof of concept" implementation soon and see whether it works out.

l_inc wrote:
I'm very shortsighted when it comes to making crossplatform decisions, but I'm still gonna ask. Are there any disadvantages in just using the PE fixups format?
I really wanted to allow a more flexible way of handling this, so that you could create files in custom formats not supported internally by fasm without having to change your format specification to handle the PE fixups instead.
Post 28 Jan 2014, 22:41
l_inc
Tomasz Grysztar
I never meant that a custom executable format should always use the PE relocations. I meant that creating a custom relocation table by parsing some_very_generic_relocation_table put into a virtual block does not seem to be affected by the format of that "some_very_generic_relocation_table". Thus absolutely any format would be acceptable as a basis for any other custom relocation format. One just needs to parse it out of the virtual block and store it in the binary in any desired format. With PE relocations as a basis you just don't need to invent something new. Or am I missing something?
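
To make the "any format can serve as a basis" point concrete, here is a rough Python sketch (not fasm code) that packs a generic list of fix-up addresses into PE-style base relocation blocks. The helper name `pack_pe_relocs` and the flat list-of-RVAs input are assumptions made for illustration, and only 32-bit HIGHLOW entries are handled.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, skipped by the loader
IMAGE_REL_BASED_HIGHLOW = 3    # 32-bit absolute address fix-up

def pack_pe_relocs(rvas):
    """Pack fix-up RVAs into PE base relocation blocks (hypothetical helper)."""
    # Group the fix-ups by 4 KB page; each page becomes one block.
    pages = {}
    for rva in sorted(set(rvas)):
        pages.setdefault(rva & ~0xFFF, []).append(rva & 0xFFF)

    out = bytearray()
    for page_rva in sorted(pages):
        # Each entry: relocation type in the top 4 bits, page offset in the low 12.
        entries = [(IMAGE_REL_BASED_HIGHLOW << 12) | off for off in pages[page_rva]]
        if len(entries) % 2:
            # Pad with an ABSOLUTE entry so the block size stays a multiple of 4.
            entries.append(IMAGE_REL_BASED_ABSOLUTE << 12)
        # Block header: page RVA and total block size (header included).
        out += struct.pack("<II", page_rva, 8 + 2 * len(entries))
        out += struct.pack("<%dH" % len(entries), *entries)
    return bytes(out)
```

Going the other way (parsing such blocks back into a plain address list and re-emitting them in some custom layout) is the same loop in reverse, which is why the choice of the intermediate format matters little.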

Post 29 Jan 2014, 09:59
Tomasz Grysztar
l_inc wrote:
I never meant that a custom executable format should always use the PE relocations. I meant that creating a custom relocation table by parsing some_very_generic_relocation_table put into a virtual block does not seem to be affected by the format of that "some_very_generic_relocation_table". Thus absolutely any format would be acceptable as a basis for any other custom relocation format. One just needs to parse it out of the virtual block and store it in the binary in any desired format. With PE relocations as a basis you just don't need to invent something new. Or am I missing something?
fasm already has a custom internal format that gets converted into PE format when generating PE fixups data. And what I would like to do is to allow direct conversion from this internal fasm structure into whatever you want to have in the binary output.
Post 29 Jan 2014, 10:56
l_inc
Tomasz Grysztar
What do you mean by "direct conversion"? Any automatic means? Or the same way of load/store based parsing out of a virtual block I was talking about?

Post 30 Jan 2014, 20:58
Tomasz Grysztar
l_inc wrote:
What do you mean by "direct conversion"? Any automatic means? Or the same way of load/store based parsing out of a virtual block I was talking about?
The "variables in a loop" approach, similar to the way REPEAT works with its % variable.
Post 30 Jan 2014, 22:24
revolution
When all else fails, read the source
Joined: 24 Aug 2004
Posts: 17332
Location: In your JS exploiting you and your system
I know that for ARM specific relocations there are a few different types with differing effects. Like 8-bit offset, 24-bit offset, 32-bit absolute (and probably some more with the introduction of 64-bit ARMv8) so I hope the repeat loop can also provide the extra type information to construct a correct table.

Without this extra type information being recorded and supplied to the relocs writer, ARM code would be forced into using cumbersome springboard stubs everywhere to always provide a 32/64-bit memory location for the loader to populate with an absolute address.
Post 31 Jan 2014, 11:08
l_inc
Tomasz Grysztar
Oh, you'd like to avoid storing the data temporarily. I'm not sure I understand why that's so important, but then I'm also a bit concerned whether it's possible to provide all the necessary information in a couple of iteration-scope variables.

Post 31 Jan 2014, 13:02
Tomasz Grysztar
The information that is necessary includes the type of relocation, a base that the target is relative to (PE fixups lack this kind of field, by the way, since PE uses a different mechanism for the imports from external libraries) and the address in code, which has to be provided as an "addressing space + address within that space" pair. This means at least four variables of the % kind.

Note that variables like % or %t do not store a copy of the value; they serve as special identifiers, and fasm fetches the right value for them every time they are used. Thus in this case the loop will just move the internal pointer through fasm's (already existing) internal relocation information storage, and expressions using % variables would receive the values directly from that structure (with some conversion if really needed).
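
As a rough model of that loop, here is a Python sketch (not fasm syntax) that walks a list of records carrying the four values named above and writes a custom table from them. The `Reloc` record, its field names, the `space_origin` mapping and the 6-byte output entry are all assumptions for illustration; fasm's real internal layout is not described in this thread.

```python
from collections import namedtuple
import struct

# Hypothetical record with the four values mentioned above:
# relocation type, base the target is relative to, addressing space, address in it.
Reloc = namedtuple("Reloc", "kind base space address")

def emit_custom_table(relocs, space_origin):
    """Write one (kind, base, file offset) entry per relocation record."""
    out = bytearray()
    for r in relocs:  # stands in for the loop advancing the internal pointer
        # Turn the "space + address within that space" pair into one file offset.
        file_offset = space_origin[r.space] + r.address
        out += struct.pack("<BBI", r.kind, r.base, file_offset)
    return bytes(out)
```

The records are read in place on each iteration and never copied into a temporary table, which mirrors the point about % variables fetching values directly from the existing structure.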
Post 31 Jan 2014, 13:38
VEG
Joined: 06 Feb 2013
Posts: 81
Location: Minsk, Belarus
Any news about it? Maybe FASM could provide access to the internal relocation structure in an additional named virtual address space, readable with the "load" directive? It would be very useful in some cases.
Post 02 May 2015, 09:21
Tomasz Grysztar
I partially explored a new approach to this problem in the "fasm g" architecture. At the moment, since everything there is implemented in the form of macros, this is not really a complex problem (it is enough to do something like overloading DD or another such instruction with a macro that handles relocatable values). But if I ever wanted to make fasm 2 based on such an architecture, I would also need to implement some additional facilities to allow a non-macro instruction to generate something similar to such a DD macro call. Still, I think that some aspects of the new architecture make it much better suited to dealing with this kind of problem. I don't know, though, whether I will ever start working on a fasm 2 based on it. At the moment I have taken a break.
Post 03 May 2015, 20:54
l_inc
Tomasz Grysztar
But what about fasm 1? Is the loop variable based approach still in question?

Post 03 May 2015, 21:35
Tomasz Grysztar
I'm reluctant to promise or declare anything related to fasm 1 at this moment. I really felt that further pushing the limits of that engine may actually make it worse, that's why last year I started working on a new one. Though it is still possible that experiences I get from the work on projects like "fasm g" may also give me some new ideas for refurbishing the old fasm 1 engine. There are already many features that I originally intended for fasm 2 but then I was able to incorporate them into fasm 1.
Post 06 May 2015, 10:29
l_inc
Tomasz Grysztar
Every time I think of this feature, it reminds me that the PE formatter isn't flexible enough to specify the file and section alignments. That would be particularly useful for drivers, because they typically use alignments you claim to be beyond the scope of the PE specification. Access to the relocation information would at least make it possible to work around this shortcoming by creating appropriate PEs manually.

Post 06 May 2015, 12:31
alexfru
Joined: 23 Mar 2014
Posts: 75
I needed a somewhat similar thing when I was exploring how to create 16-bit DOS executables with my compiler (Smaller C), which generates assembly code for NASM. I can get away without relocations in 32-bit Windows and Linux executables, pretty much creating flat binaries with a proper "org address" directive, but that trick won't work in DOS, since there is a single address space shared between all modules loaded into memory (DOS, drivers, TSRs, etc). It turns out NASM supports 16-bit relocations in ELF, so I can use ELF for both 32-bit and 16-bit code. However, there's no segmentation support in ELF and nothing for 16-bit MZ .EXEs, which is OK for tiny and small memory model .COMs and .EXEs. But I also wanted some kind of huge model, where I'd not be constrained to 64KB of code and 64KB of data/stack in an .EXE. IOW, I wanted to support potentially more than 2-3 full 64KB segments, I needed relocations for that, and I wanted them in the simplest possible way.

I ended up doing the following...
For every reference to an object or subroutine like
Code:
call fxn1
mov eax, fxn1 ; get address of function fxn1
mov eax, var1 ; get address of variable var1
pfxn1 dd fxn1 ; pfxn1 is a variable pointing to function fxn1
pvar1 dd var1 ; pvar1 is a variable pointing to variable var1
    

the compiler would emit a 32-bit record into a dedicated relocation section.
The record contains the address of pfxn1 or pvar1 or the imm32/disp32 field in the call/mov instructions above. You can now probably see where this is going.
The .EXE's startup code would then loop over the entries in this special relocation section and fix up the addresses by adding the base address at which the .EXE is loaded into memory, thus forming proper physical addresses in all code and data.
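
The fix-up loop can be modelled in a few lines of Python. This is only a sketch of the idea, not the actual Smaller C startup code; the name `apply_relocs`, the bytearray standing in for the loaded image, and the list of offsets standing in for the relocation section are assumptions.

```python
import struct

def apply_relocs(image, reloc_offsets, load_base):
    """Add load_base to every 32-bit field listed in reloc_offsets, in place."""
    for off in reloc_offsets:
        # Read the image-relative address stored by the assembler.
        (value,) = struct.unpack_from("<I", image, off)
        # Write back the physical address, wrapping to 32 bits.
        struct.pack_into("<I", image, off, (value + load_base) & 0xFFFFFFFF)
    return image
```

For example, a field holding 0x20 in an image loaded at 0x10000 ends up holding 0x10020, and bytes not listed in the relocation section stay untouched.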

32-bit physical addresses, you may ask? Yep. You may wonder how the above can work for 16-bit code running in real or virtual 8086 mode. The above is in fact a simplification or just a part of what actually happens.

First, the compiler generates code mostly oblivious to segmentation and thinks every address is a 32-bit physical/linear address without any segments in it. But for every memory access the code generator actually emits several instructions that break such a 32-bit address down into a 16-bit segment and a 16-bit offset and only then access memory. Naturally, this leads to a two-fold code size expansion and considerable perf degradation, but things work nonetheless. The C programmer need not deal with the limitations of the segmented memory model explicitly, and perf-critical portions of the code can still be written in assembly if needed (as in the old days). With modern hardware, though, performance of 16-bit/DOS code is less of a problem, since computers are an order of magnitude faster now than at the end of the DOS era.
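
The address breakdown itself is simple arithmetic, since in real mode linear = segment * 16 + offset. The Python sketch below shows one possible split that keeps the offset in the range 0..15; this particular normalization and the function names are assumptions, as the exact instruction sequence the compiler emits is not shown here.

```python
def split_linear(linear):
    """Split a real-mode linear address into (segment, offset), offset in 0..15."""
    if not 0 <= linear <= 0xFFFFF:
        raise ValueError("address outside the 1 MB real-mode range")
    return linear >> 4, linear & 0xF

def join_far(segment, offset):
    """Recombine a segment:offset pair into the linear address it refers to."""
    return (segment << 4) + offset
```

With this split, linear address 0x12345 becomes 1234h:0005h, and an object may cross any 64KB boundary because the segment part is recomputed on every access.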

Second, there actually are two dedicated sections for relocations. One is a regular one and the other one is for calls to subroutines, e.g. call fxn1. The calls are actually far calls and the relocation process not only adds something to the address encoded in the call instruction, but also makes it a proper far address with a 16-bit segment and a 16-bit offset as required by the instruction.
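
A sketch of what such a call fix-up might look like, in Python. It assumes the 4-byte operand initially holds a 32-bit image-relative address and is rewritten as a 16-bit offset followed by a 16-bit segment (the operand order of a far pointer in memory); the name `fix_far_call` and the choice of the smallest possible offset are assumptions, not taken from the actual startup code.

```python
import struct

def fix_far_call(image, field_off, load_base):
    """Turn the 32-bit address at field_off into an offset:segment far pointer."""
    (linear,) = struct.unpack_from("<I", image, field_off)
    linear += load_base
    if linear > 0xFFFFF:
        raise ValueError("target outside the 1 MB real-mode range")
    segment, offset = linear >> 4, linear & 0xF
    # A far pointer operand stores the offset word first, then the segment word.
    struct.pack_into("<HH", image, field_off, offset, segment)
    return image
```

For instance, a far call (opcode 9Ah) whose operand holds 0x234, in an image loaded at 0x10000, ends up targeting 1023h:0004h.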

I guess the assembler (I'm now talking about FASM) could also optionally generate a phantom relocation section containing the locations in the code and data sections where address fix-ups are needed, plus a couple of symbols designating the start of this section and its end or size. In my case, with Smaller C and NASM, I simply made the compiler generate this section explicitly. I can then assemble 16-bit code with 32-bit addresses into ELF object files. There's probably nothing special about this arrangement: NASM simply generates the proper operand size and address size prefixes requested by the "bits 16" directive that the compiler emits. These ELF object files are then linked together in the usual way to make a flat binary, only with a proper MZ .EXE header at the beginning.
Post 16 Jun 2015, 12:41
Copyright © 1999-2020, Tomasz Grysztar. Also on YouTube, Twitter.

Website powered by rwasa.