
Instruction encoding verification

Tomasz Grysztar



When implementing new instruction sets in an assembler, it is good to have some kind of automated or semi-automated testing. One option is to use a third-party disassembler (if there is one) and compare its output with the original source. Another option is to keep reference output for a given source and then compare the binary files. But I have been playing with yet another variant: keeping the reference output in the source code itself, like:
Code:
use64
vaddps ymm6{k1}{z},ymm12,ymm24          ; 62 91 1C A9 58 F0
vsubps zmm1,zmm2,dword [rsi] {1to16}    ; 62 F1 6C 58 5C 0E    
I have made a small module for fasmg that automatically verifies the assembled bytes against references defined as in the sample above. The test might look like:
Code:
; CPU headers to test:

        include 'cpu/x64.inc'
        include 'cpu/ext/avx512.inc'

; Verification module:

        include 'iev.alm'

; Reference files to test:

        include 'instructions.ref'    
IEV.ALM:
Code:
; Instruction Encoding Verifier for fasm g

        retaincomments

        calminstruction calminstruction?.initsym? var*, val&
                publish var, val
        end calminstruction

        calminstruction calminstruction?.asm? command&
                local i, const, tmp
                initsym i, 0
                initsym const, const
                compute i, i+1
                arrange tmp, const#i
                publish tmp, command
                arrange tmp, =assemble tmp
                assemble tmp
        end calminstruction

        calminstruction hex_nibble digit*, command: display
                compute digit, 0FFh and '0123456789ABCDEF' shr (digit*8)
                arrange command, command digit
                assemble command
        end calminstruction

        calminstruction hex_dump data*, command: display
                local   digit, i
                compute i, 0
            loop:
                local   digit
                compute digit, (data shr (i*8+4)) and 0Fh
                call    hex_nibble, digit, command
                compute digit, (data shr (i*8)) and 0Fh
                call    hex_nibble, digit, command
                compute i, i + 1
                check   lengthof data > i
                jno     done
                local   separator
                arrange separator, command ' '
                assemble separator
                jump    loop
            done:
        end calminstruction

        calminstruction ? line&
                local comment, i, chunk
                local data, reference

                match ;comment, line
                jyes done

                compute i, $
                match line;comment, line
                assemble line
                jno done

                asm load data:$-i from :i

                arrange reference, =0x
                compute i, 0
            extract:
                match chunk comment?, comment
                jno completed
                compute i, i + 1
                arrange reference, reference#chunk
                jump extract
            completed:
                compute reference, reference bswap i

                check data eq reference
                jyes done

                stringify line
                asm display 'Discrepancy:',13,10,9, line, 13,10, 'Assembled: '
                call hex_dump, data
                asm display 13,10, 'Reference: '
                call hex_dump, reference
                asm display 13,10

            done:
        end calminstruction    
This could of course be used with other instruction set implementations for fasmg, so if you're working on one, you may find it helpful.
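Two fasmg idioms in IEV.ALM may be worth spelling out. A string used in a numeric expression acts as a little-endian number, so shifting '0123456789ABCDEF' right by digit*8 bits selects one character. The reference is built by gluing the hex chunks into a single 0x literal and then byte-swapping it, so that it compares equal to the bytes read with load. Below is a rough Python model of both, not part of the original module. The function names merely mirror the CALM instructions, the explicit length argument is an assumption standing in for lengthof, and each chunk is assumed to be exactly two hex digits.

```python
# Rough Python model of two idioms from IEV.ALM (illustrative only).

# fasmg treats a string in a numeric expression as a little-endian number,
# so character n occupies bits n*8 .. n*8+7.
HEX_DIGITS = int.from_bytes(b"0123456789ABCDEF", "little")


def hex_nibble(digit):
    """Pick the character for one hex digit by shifting the digit string."""
    return chr((HEX_DIGITS >> (digit * 8)) & 0xFF)


def hex_dump(value, length):
    """Render `length` bytes of `value`, lowest byte first, as 'XX XX ...'."""
    parts = []
    for i in range(length):
        high = (value >> (i * 8 + 4)) & 0x0F
        low = (value >> (i * 8)) & 0x0F
        parts.append(hex_nibble(high) + hex_nibble(low))
    return " ".join(parts)


def reference_value(comment):
    """Glue two-digit hex chunks into one number, then reverse its bytes."""
    chunks = comment.split()
    glued = int("".join(chunks), 16)   # e.g. 0x62911CA958F0
    count = len(chunks)                # plays the role of `i` in the CALM code
    return int.from_bytes(glued.to_bytes(count, "big"), "little")


def loaded_value(data):
    """Model of `load`: output bytes read as a little-endian number."""
    return int.from_bytes(data, "little")


if __name__ == "__main__":
    ref = reference_value("62 91 1C A9 58 F0")
    print(ref == loaded_value(bytes.fromhex("62911CA958F0")))
    print(hex_dump(ref, 6))
```

Without the byte swap the glued literal would hold the first encoded byte in its most significant position, which is the opposite of how the loaded data is laid out.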

Processing the same test with fasm 1 is a bit harder, and the simplest route is to convert the reference into something more fasm-friendly, as the following fasmg-based script does.

REF2FASM.CMD:
Code:
@goto equ

        MAX_INSTRUCTION_LENGTH := 40

        retaincomments

        macro processor
                PROCESSED = __FILE__
                macro ?! line&
                        if __FILE__ = PROCESSED
                                match instruction;comment, line
                                        db '@ ',`instruction,MAX_INSTRUCTION_LENGTH-lengthof `instruction dup ' ',' @ ',`comment
                                else
                                        db `line
                                end match
                                db 13,10
                        end if
                end macro
        end macro

        db "include 'iev.inc'",13,10

        include SRCFILE, processor

:equ

@echo off
if not exist "%~1" goto info

fasmg "%~f0" "%~n1.fasm" -i"SRCFILE='%1'"

goto end

:info

echo Please provide a name of file containing x86 test reference.
echo This tool converts the reference into IEV test for fasm.

:end    
This is a source file for fasmg that doubles as a Windows batch file that launches the process automatically (it expects fasmg to be in the PATH environment variable, but it can easily be modified to use a full path instead). It converts the reference into a source that looks like:
Code:
include 'iev.inc'
use64
@ vaddps ymm6{k1}{z},ymm12,ymm24           @ 62 91 1C A9 58 F0
@ vsubps zmm1,zmm2,dword [rsi] {1to16}     @ 62 F1 6C 58 5C 0E    
This is then much easier for fasm 1 to process; a single macro suffices.
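For readers who want to see the conversion step in isolation, here is a minimal Python sketch of what REF2FASM.CMD does to each line. It is an approximation under stated assumptions: the function names are made up for illustration, whitespace is trimmed with strip() rather than by fasmg's tokenizer, and an instruction longer than 40 characters simply gets no padding here instead of being reported as an error.

```python
MAX_INSTRUCTION_LENGTH = 40  # same constant as in REF2FASM.CMD


def convert_line(line):
    """Turn 'instruction ; bytes' into '@ instruction   @ bytes'."""
    instruction, separator, comment = line.partition(";")
    instruction = instruction.strip()
    comment = comment.strip()
    if not separator or not instruction:
        # No reference on this line (or a pure comment): copy it unchanged.
        return line
    padding = " " * max(0, MAX_INSTRUCTION_LENGTH - len(instruction))
    return "@ " + instruction + padding + " @ " + comment


def convert(reference_text):
    """Convert a whole reference, prepending the include of the verifier."""
    lines = ["include 'iev.inc'"]
    lines += [convert_line(line) for line in reference_text.splitlines()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    sample = "use64\nvaddps ymm6{k1}{z},ymm12,ymm24          ; 62 91 1C A9 58 F0\n"
    print(convert(sample), end="")
```

Padding the instruction to a fixed width keeps the second @ aligned, which is only cosmetic; the macro on the fasm 1 side just needs the @ separator to split the line.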

IEV.INC:
Code:
; Instruction Encoding Verifier for fasm 1

virtual at 0
        __hex:: db '0123456789ABCDEF'
end virtual

macro @ line&
{
        local proxy, offset, length, reference, reference_length, a, b, c
        define proxy line
        match instruction =@ comment, proxy
        \{
                offset = $ - $$
                instruction
                length = $ - $$ - offset
                virtual at 0
                        reference::
                        irps chunk, comment
                        \\{
                                db 0x\\#chunk
                        \\}
                        reference_length = $
                end virtual
                c = 0
                if length = reference_length
                        repeat length
                                load a byte from $$+offset+%-1
                                load b byte from reference:%-1
                                if a <> b
                                        c = 1
                                        break
                                end if
                        end repeat
                else
                        c = 1
                end if
                if c
                        display 'Assembled:'
                        repeat length
                                load a byte from $$+offset+%-1
                                load b byte from __hex:a shr 4
                                load c byte from __hex:a and 0Fh
                                display ' ',b,c
                        end repeat
                        display 13,10,'Reference:'
                        repeat reference_length
                                load a byte from reference:%-1
                                load b byte from __hex:a shr 4
                                load c byte from __hex:a and 0Fh
                                display ' ',b,c
                        end repeat
                        display 13,10
                        err
                end if
        \}
}    
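The comparison performed by the macro above (lengths first, then byte by byte, then a report showing both sequences) can be modelled in a few lines of Python. This is a sketch of the logic only, with hypothetical function names. It takes the assembled bytes as an argument because it cannot assemble anything itself, and it returns the report instead of raising an error the way the macro does with err.

```python
def parse_reference(comment):
    """Convert '62 91 1C' into bytes, one chunk per byte."""
    return bytes(int(chunk, 16) for chunk in comment.split())


def format_bytes(data):
    """Format bytes as ' XX XX ...' with a leading space, as the macro does."""
    return "".join(" %02X" % value for value in data)


def verify(assembled, comment):
    """Return None when the bytes match, otherwise a discrepancy report."""
    reference = parse_reference(comment)
    # Different lengths or any differing byte both count as a mismatch.
    if assembled == reference:
        return None
    return ("Assembled:" + format_bytes(assembled) + "\r\n" +
            "Reference:" + format_bytes(reference) + "\r\n")


if __name__ == "__main__":
    print(verify(bytes.fromhex("62911CA958F0"), "62 91 1C A9 58 F0"))
    print(verify(bytes.fromhex("90"), "90 90"))
```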

This fasm 1 variant is probably not very useful, though, unless you plan to tweak the source of fasm 1, implement instructions of some other CPU, etc. Instead of being tested, fasm's output could itself become a reference, and I have made another script that does exactly that.

ASM2REF.CMD:
Code:
@goto equ

        MAX_INSTRUCTION_LENGTH := 40

        virtual at 0
                opcodes:: file BINFILE
                TOP_OFFSET = $
        end virtual

        virtual at 0
                file FASFILE

                load preprocessed_offset:dword from 32
                load dump_offset:dword from 40
                load dump_length:dword from 44
                PREVIOUS_OFFSET = 0
                LINE_NUMBER = 0
                repeat dump_length/28, i:0
                        load CURRENT_OFFSET:dword from dump_offset+i*28
                        if (CURRENT_OFFSET>PREVIOUS_OFFSET)
                                repeat 1, n:LINE_NUMBER
                                        load DATA#n:CURRENT_OFFSET-PREVIOUS_OFFSET from opcodes:PREVIOUS_OFFSET
                                end repeat
                                PREVIOUS_OFFSET = CURRENT_OFFSET
                        end if
                        load line_offset:dword from dump_offset+i*28+4
                        load LINE_NUMBER:dword from preprocessed_offset+line_offset+4
                end repeat
                if (TOP_OFFSET>PREVIOUS_OFFSET)
                        repeat 1, n:LINE_NUMBER
                                load DATA#n:TOP_OFFSET-PREVIOUS_OFFSET from opcodes:PREVIOUS_OFFSET
                        end repeat
                end if
        end virtual

        calminstruction hex_nibble digit*, command: db
                compute digit, 0FFh and '0123456789ABCDEF' shr (digit*8)
                arrange command, command digit
                assemble command
        end calminstruction

        calminstruction hex_dump data*, command: db
                local   digit, i
                compute i, 0
            loop:
                local   digit
                compute digit, (data shr (i*8+4)) and 0Fh
                call    hex_nibble, digit, command
                compute digit, (data shr (i*8)) and 0Fh
                call    hex_nibble, digit, command
                compute i, i + 1
                check   lengthof data > i
                jno     done
                local   separator
                arrange separator, command ' '
                assemble separator
                jump    loop
            done:
        end calminstruction

        macro processor
                PROCESSED = __FILE__
                macro ?! line&
                        if __FILE__ = PROCESSED
                                db `line
                                repeat 1, n:__LINE__
                                        if defined DATA#n
                                                db MAX_INSTRUCTION_LENGTH-lengthof `line dup ' ', '; '
                                                hex_dump DATA#n
                                        end if
                                end repeat
                                db 13,10
                        end if
                end macro
        end macro

        include SRCFILE, processor

:equ

@echo off
if not exist "%~1" goto info

fasm "%~1" "%~n1.bin" -s "%~n1.fas"
fasmg "%~f0" "%~n1.ref" -i"SRCFILE='%1'" -i"BINFILE='%~n1.bin'" -i"FASFILE='%~n1.fas'"
del "%~n1.bin"
del "%~n1.fas"

goto end

:info

echo Please provide a name of file containing x86 instructions.
echo This tool assembles them with fasm and generates a test reference.

:end    
This one was really fun to craft. Again it combines fasmg source with a batch file, but this time it also uses the .FAS file generated by fasm to match the source lines with the bytes extracted from the fasm-generated output. The reference it produces looks exactly like the ones shown at the beginning of this post.
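The .FAS walk in ASM2REF.CMD may be easier to follow when written out imperatively. The Python sketch below mirrors the loads in the script and nothing more. The header offsets (32, 40, 44), the 28-byte row size and the line number at offset 4 of a preprocessed line are taken directly from the script above, not independently checked against the .FAS specification, and the function name is hypothetical.

```python
import struct

ROW_SIZE = 28  # size of one assembly dump row, as used in ASM2REF.CMD


def bytes_per_line(fas, binary):
    """Map source line numbers to the output bytes attributed to them."""
    (preprocessed_offset,) = struct.unpack_from("<I", fas, 32)
    dump_offset, dump_length = struct.unpack_from("<II", fas, 40)

    data = {}
    previous_offset = 0
    line_number = 0
    for i in range(dump_length // ROW_SIZE):
        row = dump_offset + i * ROW_SIZE
        # First dword: offset in the output file; second: offset of the
        # corresponding line within the preprocessed source.
        current_offset, line_offset = struct.unpack_from("<II", fas, row)
        if current_offset > previous_offset:
            # Bytes emitted since the previous row belong to the previous line.
            data[line_number] = binary[previous_offset:current_offset]
            previous_offset = current_offset
        (line_number,) = struct.unpack_from(
            "<I", fas, preprocessed_offset + line_offset + 4)
    if len(binary) > previous_offset:
        # Whatever remains was generated by the last recorded line.
        data[line_number] = binary[previous_offset:]
    return data
```

Each row only records where a line starts in the output, so the bytes of a line are known only once the next row (or the end of the file) is reached. That is why the line number is updated after the bytes are stored.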

On a final note, I feel it's important to add that this kind of testing is usually just a "sanity check" to make sure that nothing is obviously broken in your instruction encoders. In my experience, bugs may be very context-dependent and can easily be missed by such basic testing. Therefore, in fasm development my release process usually included verifying several real projects in addition to a collection of snippets (starting with reassembling itself, which has always been the most obvious sanity check for fasm).
Post 16 Dec 2021, 14:02
revolution
I think it is important to have more than one tool written by different people and combine/compare the results.

The simplest form of that is a disassembler written by one person, and an assembler written by another person. Then generate random, or directed, binary files and run them through the disassembler and reassemble to ensure the files match.

Having different people write each part helps to catch errors caused by a misunderstanding, a bad assumption, or simply misreading the documentation.
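A bare-bones harness for that kind of round trip might look like the Python sketch below. Everything in it is hypothetical: disassemble and assemble are placeholders for whatever two independent tools are being compared, and the sketch assumes the disassembler returns some text for every input, which real tools will not do for arbitrary random bytes.

```python
import random


def random_samples(count, max_length, seed=0):
    """Generate reproducible random byte strings of length 1..max_length."""
    rng = random.Random(seed)
    return [
        bytes(rng.randrange(256) for _ in range(rng.randint(1, max_length)))
        for _ in range(count)
    ]


def round_trip_mismatches(samples, disassemble, assemble):
    """Disassemble and reassemble each sample; collect the ones that differ."""
    mismatches = []
    for blob in samples:
        text = disassemble(blob)
        rebuilt = assemble(text)
        if rebuilt != blob:
            # Keep all three so a person can decide which tool is wrong.
            mismatches.append((blob, text, rebuilt))
    return mismatches
```

A mismatch collected this way is something to review by hand rather than proof of a bug in one particular tool.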
Post 16 Dec 2021, 22:39
Tomasz Grysztar



Yes, cross-checking with third-party tools is the most natural way, especially when implementing something from scratch. And my practice shows that you're likely to end up finding bugs in both your own and the third-party tools. It is also important to keep in mind that sometimes two tools can disagree and yet both be right - an example of that is when assemblers have a unique "footprint" in how they choose to encode instructions that can be represented in multiple equivalent ways.

Such tool-specific choices are why you might want to have your own references, especially when you work on features that may not be supported by other assemblers or are simply specific to the one you are working on. When implementing rare boundary cases of optimization, it should be obvious what tests to craft to verify the specific feature, while random testing would be unlikely to land on the exact boundaries.
Code:
use32
inc dword [ds:ebp+ebp+7Fh]              ; 3E FF 44 2D 7F
inc dword [ds:ebp+ebp+80h]              ; FF 04 6D 80 00 00 00    
This is why having each method in the testing suite is important. Hand-crafted references may be the only way to ensure that some unique features of a given assembler are working as intended.
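To make the boundary in the sample above concrete, here is a small Python model that reproduces both reference encodings. It is only one plausible reading of why the two differ, not fasm's actual encoder, and the function name is made up. While the displacement fits in a signed byte, base+index with disp8 is the short form, and the DS prefix is needed because EBP as a base defaults to SS. Once a 32-bit displacement is unavoidable, the form with no base and EBP*2 is shorter and already defaults to DS.

```python
def encode_inc_ds_ebp_ebp(displacement):
    """Model the encoding of `inc dword [ds:ebp+ebp+displacement]` (use32)."""
    if -128 <= displacement <= 127:
        # 3E = DS override, FF /0 = INC r/m32,
        # ModRM 44 = mod 01 (disp8) + SIB, SIB 2D = EBP + EBP*1.
        return (bytes([0x3E, 0xFF, 0x44, 0x2D]) +
                displacement.to_bytes(1, "little", signed=True))
    # ModRM 04 = mod 00 + SIB, SIB 6D = no base, EBP*2, disp32 follows.
    # 7 bytes, versus 8 for 3E FF 84 2D + disp32 with base and index.
    return (bytes([0xFF, 0x04, 0x6D]) +
            (displacement & 0xFFFFFFFF).to_bytes(4, "little"))


if __name__ == "__main__":
    for value in (0x7F, 0x80):
        print(" ".join("%02X" % b for b in encode_inc_ds_ebp_ebp(value)))
```

A random test generator would have to hit 7Fh and 80h exactly to exercise both sides of this switch, which is the point made above about hand-crafted references.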

Anyway, this thread is mainly just me having fun with my own tools. Parsing a fasm-generated .FAS file with fasmg was a surprisingly enjoyable adventure.
Post 17 Dec 2021, 10:45
