flat assembler
Message board for the users of flat assembler.

Index > Assembly > Notes copied from Twitter

Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 15 May 2021, 11:13
This is a collection of various notes that I originally posted on Twitter, mostly in 2018, while working on my binary formats tutorial. I copy their contents here to preserve them, but also to make it easier to find and read them all.

MZ stub:
Tomasz Grysztar wrote:
The PE loader in modern Windows does not care about the validity of the MZ stub at all. It only pays attention to the MZ signature and the offset of the PE headers stored at position 0x3C.

You can take a hex editor and replace the first bytes of a PE with a text like "MZ? No, not really." The image is still going to load and work without a hitch.

On a related note, DOS allowed programs to start with a ZM signature in place of MZ. But for newer executable formats like PE it has to be MZ and nothing else.

On the versions of Windows that were able to run DOS programs, changing the first two bytes of a PE file to ZM would force the system to execute the (normally hidden) DOS stub.
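
The header check described in these notes can be sketched in a few lines. This is only an illustration of the idea, not the actual Windows loader logic; the function name find_pe_header and the fake image built below are made up for the example.

```python
import struct

def find_pe_header(image: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    # Only the two-byte signature and the dword at 0x3C are consulted;
    # the rest of the stub area is never inspected.
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature at the offset stored at 0x3C")
    return pe_offset

# Hypothetical image whose "stub" is plain text, as in the hex editor experiment.
stub = bytearray(b"MZ? No, not really.".ljust(0x40, b" "))
struct.pack_into("<I", stub, 0x3C, 0x40)
image = bytes(stub) + b"PE\0\0"
print(hex(find_pe_header(image)))
```

An image starting with ZM is rejected by this sketch, matching the remark that PE requires MZ and nothing else.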
Tomasz Grysztar wrote:
The "Rich" header put into PE stub by modern Microsoft tools is a clever design that makes use of this otherwise ignored area. But wouldn't it be fun if the stub program itself got a bit of a face-lift?

Considering the usual sizes of PE executables nowadays, even putting a complete DOS application into a stub would not make much difference. I imagine a stub that would include Rich header, and when run under DOS would parse said header to present it in some interesting way.

Magic values:
Tomasz Grysztar wrote:
COFF's optional header mimics the old a.out format, which COFF was supposed to replace on Unix systems. This is where the magic value in the optional header comes from.

When PE was designed as a mutation of COFF, it inherited the magic value 0x10b in the optional header, which can be traced back to the ZMAGIC type of a.out.

The a.out magic value was in fact a PDP-11 branch instruction; originally it was there simply to jump over the header.
Different magic numbers corresponded to possible header sizes. The value 0x10b (octal 413) was a jump over 11 (octal 13) words.

Therefore every 32-bit PE in existence has a vestigial PDP-11 instruction hidden inside.
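
The arithmetic can be checked with a short sketch. It relies on the PDP-11 encoding of BR (opcodes 000400 to 000777 octal, with a signed 8-bit word offset in the low byte); the helper name decode_pdp11_br is made up for this example.

```python
def decode_pdp11_br(word: int):
    """Return the branch offset in words if `word` is a PDP-11 BR, else None."""
    # BR has the high byte equal to 1; the low byte is a signed offset in words.
    if (word >> 8) != 1:
        return None
    offset = word & 0xFF
    return offset - 0x100 if offset & 0x80 else offset

magic = 0x10B  # magic value from the PE32 optional header
print(oct(magic), decode_pdp11_br(magic))
```

The same helper applied to octal 407 gives 7, another magic value from the a.out family.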

Tomasz Grysztar wrote:
ELF did not hold onto the legacy. It seems deliberately designed as a fresh start, careful not to copy the limitations of its predecessors.
Its magic value is just text. The non-printable 0x7f was perhaps put there to ensure no text file would be accidentally identified as ELF.


NT subsystems:
Tomasz Grysztar wrote:
In the early days of PE, files usually had "operating system version" set to 1.0 and only "subsystem version" was used for the actual version of Windows (like 3.10 or 4.0).

My guess is that this referred to the native layer of the new Windows NT kernel, as opposed to the API compatible with earlier Windows versions, which was implemented as a subsystem on top of it.

Initially the NT kernel had other subsystems, like one compatible with the OS/2 API. Years later, this feature showed up again with the introduction of the Linux subsystem.


Section alignment:
Tomasz Grysztar wrote:
When a PE image uses a section alignment smaller than the page size, there is a restriction that relative addresses in memory need to exactly match the offsets in the file.

This allows such a file to be loaded into memory as one big blob of data. The attributes of the sections do not matter; everything in the image can be overwritten or executed.
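
A rough way to test a section table against this restriction is sketched below. The page size of 0x1000, the function name and the tuple layout are assumptions made for the example; a real checker would read these fields from the section headers.

```python
PAGE_SIZE = 0x1000  # assumed page size (typical for x86)

def misplaced_sections(section_alignment, sections):
    """Return names of sections whose RVA differs from their file offset.

    `sections` is an iterable of (name, virtual_address, pointer_to_raw_data).
    """
    if section_alignment >= PAGE_SIZE:
        return []  # the restriction only applies to small alignments
    return [name for name, rva, raw in sections if rva != raw]

# Hypothetical section table: .data breaks the rule.
sections = [(".text", 0x200, 0x200), (".data", 0x400, 0x600)]
print(misplaced_sections(0x200, sections))
```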


Import Directory Entry:
Tomasz Grysztar wrote:
An early description of the PE format from 1993 defined the Import Directory Entry with a layout slightly different from the final standard. The first field, which now points to the ILT, was designated as flags. Various headers and documents retain the obsolete name Characteristics for it.

This change was probably the reason why some old linkers produced imports with no ILT. The initial content of the IAT is identical to what the ILT would have, though the loader is then forced to overwrite it with the addresses of functions. To this day Windows keeps accepting this variant.
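
A tolerant parser can mirror this behaviour with a simple fallback, sketched below. The parameter names follow the OriginalFirstThunk and FirstThunk fields of the import descriptor in winnt.h; the function itself is hypothetical.

```python
def lookup_table_rva(original_first_thunk: int, first_thunk: int) -> int:
    """Pick the table that holds the import names/ordinals before binding."""
    # With no ILT (first field zero), the initial contents of the IAT
    # serve as the lookup table.
    return original_first_thunk if original_first_thunk != 0 else first_thunk

print(hex(lookup_table_rva(0, 0x2040)))
```

Note that this only works on a file on disk; once the image is loaded, the IAT has already been overwritten with function addresses.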


Empty sections:
Tomasz Grysztar wrote:
Modern Windows (based on the NT kernel) refuses to load a PE file containing an empty section (one that has its virtual size set to zero). Other implementations, like Windows 9x or Win32s, did not object to such a setup.

On the other hand, Windows 9x interpreted relocation data containing no records as a signal that the image could not be loaded at another address. Therefore a DLL with a .reloc section of zero length could fail both on Windows 9x and NT, but for very different reasons.

Win32s was the least fussy. It was not troubled by zero-length sections and correctly interpreted empty but present relocation data.

Export Ordinal Table:
Tomasz Grysztar wrote:
The PE specification has contradictory statements about the Export Ordinal Table. It first defines the table as an array of indexes into the Export Address Table, then suggests that these are the ordinals.

An ordinal is formed by adding an EAT index to the Ordinal Base. The Export Ordinal Table, contrary to its name, does not contain actual ordinals, just plain zero-based indexes.

The algorithm presented in the PE specification is therefore wrong in suggesting that the Ordinal Base be subtracted from the index read from the Export Ordinal Table.

An early description of PE from 1993 says that the ordinals in EOT "already include the Ordinal Base". Could this be understood the other way around?
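
The relationship between indexes and ordinals can be sketched as follows. The tables are plain Python lists standing in for the name table, the Export Ordinal Table and the EAT; all names and sample values are made up for the example.

```python
def resolve_by_name(name, names, ordinal_table, address_table):
    """Look up an export by name; no Ordinal Base is involved."""
    eat_index = ordinal_table[names.index(name)]  # zero-based index, not an ordinal
    return address_table[eat_index]

def resolve_by_ordinal(ordinal, ordinal_base, address_table):
    """Look up an export by ordinal; the base is subtracted here and only here."""
    return address_table[ordinal - ordinal_base]

def ordinal_of(name, names, ordinal_table, ordinal_base):
    """The actual ordinal is the EAT index plus the Ordinal Base."""
    return ordinal_table[names.index(name)] + ordinal_base

# Hypothetical export directory with Ordinal Base 5.
names = ["alpha", "beta"]
ordinal_table = [1, 0]
address_table = [0x1000, 0x2000]
print(ordinal_of("alpha", names, ordinal_table, 5))
```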


Large ordinals:
Tomasz Grysztar wrote:
The export table in a PE may have an Ordinal Base higher than 65535. In theory it should be possible to import functions with such high numbers, since ILT entries may contain 31-bit ordinals. This worked perfectly on old implementations like Windows 9x.

Nevertheless, Windows 10 sees only 16 bits of an imported ordinal, even though at the same time it uses full numbers from the export table. A high base then effectively prevents this method of importing.

I also tested import by ordinal from a library exporting a large number of addresses. Modern Windows ended up calling function 1 where the old one ran function 65537.

In my tests any old version of Windows seems able to import by an ordinal larger than 65536. When it is also linked with a large index in the EAT, it works only on Windows 9x and Win32s. On Windows 10 an ordinal from the ILT is truncated to 16 bits and such imports are not possible at all.
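
The truncation is easy to reproduce with plain arithmetic. In this sketch the constant marks the high bit of a 32-bit ILT entry (import by ordinal); the function name and the boolean switch are assumptions used only to contrast the two behaviours.

```python
ORDINAL_FLAG32 = 0x80000000  # high bit of a 32-bit ILT entry: import by ordinal

def imported_ordinal(ilt_entry: int, truncate_to_16_bits: bool) -> int:
    """Extract the ordinal from an ILT entry, optionally keeping only 16 bits."""
    if not ilt_entry & ORDINAL_FLAG32:
        raise ValueError("entry is an import by name")
    ordinal = ilt_entry & 0x7FFFFFFF  # 31 bits are available in the entry
    return ordinal & 0xFFFF if truncate_to_16_bits else ordinal

entry = ORDINAL_FLAG32 | 65537
print(imported_ordinal(entry, False), imported_ordinal(entry, True))
```

With truncation enabled, ordinal 65537 turns into 1, which is the function that modern Windows ended up calling in the test described above.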


Manifest loadFrom:
Tomasz Grysztar wrote:
In Windows, an executable can spell out the path to a DLL it needs with a loadFrom attribute in the RT_MANIFEST resource. This is a nice tool for testing and experimentation; too bad it is undocumented.
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
    <file name="library.dll" loadFrom="%APPDATA%\Tutorial\library.dll" />
</assembly>    


Resource tables:
Tomasz Grysztar wrote:
Early PE specifications like http://bytepointer.com/resources/pecoff_v4.0.htm stated that resource tables use RVAs to point to subdirectories or data entries. At the same time they included an example that contradicted this by using offsets relative to the beginning of the table.

The example correctly showed what the actual implementation ended up doing. Later it was removed, though, and the specification was not corrected until version 9 or so, relatively recently.
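
What the implementation does can be sketched like this. The high bit of the offset field marks a subdirectory and the remaining 31 bits are an offset from the start of the resource table, not an RVA; the function and parameter names are hypothetical.

```python
def resolve_resource_entry(offset_field: int, resource_table_offset: int):
    """Return (is_subdirectory, position) for a resource directory entry.

    `resource_table_offset` is wherever the resource table starts in the
    buffer being parsed; the entry's offset is relative to that point.
    """
    is_subdirectory = bool(offset_field & 0x80000000)
    relative = offset_field & 0x7FFFFFFF
    return is_subdirectory, resource_table_offset + relative

print(resolve_resource_entry(0x80000018, 0x3000))
```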


Printable x86 code:
Tomasz Grysztar wrote:
An x86 program that consists only of printable ASCII characters may sound like something intended for shellcode, but it is an art with a long history. I first saw it in MS-DOS when I encountered files made with COM2TXT. http://sac.sk/files.php?d=17&l=C

The beginning of such a program contains lots of PUSH and POP instructions (luckily their opcodes fall in the ASCII range), and with the help of a few other carefully crafted ones it builds secondary code on the stack, able to decode the original program from the remaining text.
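
The lucky coincidence is easy to verify: the one-byte PUSH and POP register opcodes occupy 0x50 to 0x5F, which are the characters from P to _ in ASCII. A small sketch (register names are given in their 16-bit form, and the function name is made up):

```python
REGISTERS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def printable_push_pop():
    """Map each one-byte PUSH/POP register instruction to its ASCII character."""
    table = {}
    for index, reg in enumerate(REGISTERS):
        table[f"push {reg}"] = chr(0x50 + index)  # PUSH r is 50h + register number
        table[f"pop {reg}"] = chr(0x58 + index)   # POP r is 58h + register number
    return table

table = printable_push_pop()
print("".join(table.values()))
```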


Undocumented instructions:
Tomasz Grysztar wrote:
I find it ironic that the only time I saw the SALC instruction described in official CPU documentation was when the manuals for x86-64 (later renamed to AMD64) listed it among the opcodes not promoted to long mode.

The list of x86 opcodes not carried over to long mode did not, however, include FDISI and FENI, though they served no purpose since 80287. They continue to execute as valid instructions in all modes, yet they are just exotic forms of NOP.


Pentium F00F bug and Win9x:
Tomasz Grysztar wrote:
The Pentium F00F bug might have sounded like nothing serious to the users of Windows 95. Getting that system to freeze was nothing extraordinary, even a user-mode program could do it with just a couple of regular instructions. The simplest method I know is CLI followed by JMP $.

In his excellent write-up on the Pentium F00F bug, Robert R. Collins mentioned that it had to be posted on comp.os.linux.advocacy to reach the people who would understand the seriousness of the issue.
http://rcollins.org/ddj/May98/F00FBug.html


Relocatable PE:
Tomasz Grysztar wrote:
As long as a PE has fixups, it may get relocated when the default base is unavailable. Apparently even IMAGE_FILE_RELOCS_STRIPPED does not block this action (it did on Win32s, though).

On the other hand, either setting IMAGE_FILE_RELOCS_STRIPPED or clearing IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE is enough to hinder ASLR even when fixups are present.
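
The two observations can be condensed into a pair of predicates. This is a sketch of the behaviour described in these notes, not of any documented loader rule; the flag values are the ones from winnt.h, while the function names are made up.

```python
IMAGE_FILE_RELOCS_STRIPPED = 0x0001
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040

def may_be_relocated(has_fixups: bool) -> bool:
    """Relocation on an unavailable base depends only on fixups being present."""
    return has_fixups

def aslr_eligible(characteristics: int, dll_characteristics: int, has_fixups: bool) -> bool:
    """ASLR needs fixups, DYNAMIC_BASE set and RELOCS_STRIPPED clear."""
    if not has_fixups:
        return False
    if characteristics & IMAGE_FILE_RELOCS_STRIPPED:
        return False
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)

print(aslr_eligible(0x0102, IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, True))
```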


Win32s:
Tomasz Grysztar wrote:
Win32s, which made it possible to run 32-bit programs on Windows 3.1, was a peculiar early implementation of PE loading. It was not able to load programs at their standard base, so every image had to include relocations.

I tested at what address Win32s was loading my image, and it was 0x8c440000. On modern Windows I would need the IMAGE_FILE_LARGE_ADDRESS_AWARE flag to run at such a high base.

When I prepared a PE with a fixed base of 0x8c440000, Win32s was in fact able to run it without relocations. But only one copy at a time, because all programs shared the same address space.


PT_PHDR:
Tomasz Grysztar wrote:
The ELF specification defines PT_PHDR as optional and I used to think that it is never really needed in Linux.

But now my experiments indicate it is needed in an ET_DYN file that has PT_INTERP/PT_DYNAMIC segments. I get a segfault if I do not include a valid PT_PHDR there.
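
A simple lint for this situation might look like the sketch below. The numeric constants are the standard ELF values; treating "PT_INTERP/PT_DYNAMIC" as "either of the two" is an assumption, and the rule itself comes from the experiments described above rather than from the ELF specification.

```python
ET_DYN = 3
PT_DYNAMIC = 2
PT_INTERP = 3
PT_PHDR = 6

def missing_pt_phdr(e_type: int, segment_types) -> bool:
    """True if the file looks like it needs PT_PHDR but does not have one."""
    types = set(segment_types)
    needs_phdr = e_type == ET_DYN and bool(types & {PT_INTERP, PT_DYNAMIC})
    return needs_phdr and PT_PHDR not in types

# Hypothetical program header types: PT_INTERP, PT_LOAD (1), PT_DYNAMIC.
print(missing_pt_phdr(ET_DYN, [3, 1, 2]))
```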


Origins of executable formats:
Tomasz Grysztar wrote:
While working on my tutorial I've been digging through Usenet archives and I found that, contrary to my earlier impression, Mach-O and ELF appeared almost simultaneously in 1988, and PE might also have been born around that date (the development of NT started then).

The need to replace the original COFF with something more capable and extensible must have been strong. Each of the new formats was made future-proof enough to stay with us for 30 years and more.
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 15 May 2021, 23:43
I like to force ASLR with a base address of zero.
Tomasz Grysztar wrote:
Manifest loadFrom:
I did a little research on this one. It was documented in some DDK releases and later showed up in the leaked Windows source code. So it has been actively removed from the documentation even though it has always been part of the schema.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 16 May 2021, 08:38
bitRAKE wrote:
I like to force ASLR with a base address of zero.
This forces the use of fixups, but it does not by itself force ASLR - that was the point of my note. If you make a PE file with base zero, but either set IMAGE_FILE_RELOCS_STRIPPED or clear IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, your image is going to be just relocated to base 10000h, there is no randomization. Down that thread on Twitter I also added a link to the recording of a live stream where I demonstrated this.
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 18 May 2021, 02:35
Origins of executable formats:
Tomasz Grysztar wrote:
.....Each of the new formats had been made future-proof enough so that they stayed with us for 30 years and more.


Enough time for me to catch up. :P The x86 assembly language and the PE file format have not changed much for decades.

Nice read, Tomasz. As a side note, your Twitter posts earlier than 18 Jul 2018 were not found, but I saw you registered in 2013? Sorry if I am being a busybody.

And I like your Paged Out article. I am proud of you.
wizgogo



Joined: 11 Dec 2020
Posts: 10
Location: Hell, Norway
wizgogo 18 May 2021, 08:52
Any info about Android ELF?
Hrstka



Joined: 05 May 2008
Posts: 61
Location: Czech republic
Hrstka 18 May 2021, 11:13
In PE32+ relative virtual addresses remained only 32-bit. To me it would make more sense if they were 64-bit.
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 24 May 2021, 15:57
Tomasz Grysztar wrote:
bitRAKE wrote:
I like to force ASLR with a base address of zero.
This forces the use of fixups, but it does not by itself force ASLR - that was the point of my note. If you make a PE file with base zero, but either set IMAGE_FILE_RELOCS_STRIPPED or clear IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, your image is going to be just relocated to base 10000h, there is no randomization. Down that thread on Twitter I also added a link to the recording of a live stream where I demonstrated this.
Absolutely, it is quite a stretch to call that ASLR, lol. I must have missed this video during one of my vacations. It and the accompanying thread are quite thorough!

There are many invalid addresses that result in forced relocation. For example, DEFAULT_IMAGE_BASE := -1 shl 16.


A useful configuration for 64-bit is the following:
Code:
        .Characteristics                dw IMAGE_FILE_32BIT_MACHINE \
                + IMAGE_FILE_EXECUTABLE_IMAGE \
                + IMAGE_FILE_LARGE_ADDRESS_AWARE
...
        .DllCharacteristics             dw IMAGE_DLLCHARACTERISTICS_NX_COMPAT \
                + IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE \
                + IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA
This results in minimal mapping within the 32-bit address space. Only two 4k pages are mapped: 0x7FFE0000 (KUSER_SHARED_DATA) and 0x7FFE6000 (?).

Yes, even the stack is high in memory. This is useful from a debugging standpoint for catching address truncation errors when migrating 32-bit code. It's useful from an algorithmic standpoint when working with pointer-heavy structures, as they can be put low in memory, effectively doubling throughput (whilst making the code at minimum equivalent to the 32-bit version). And it's useful from an obfuscation standpoint, as one can allocate the low memory matching the process space and produce some very confusing code. :P

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.