flat assembler
Message board for the users of flat assembler.

chastecmp: hexadecimal comparison of two files

chastitywhiterose



Joined: 13 Oct 2025
Posts: 52
chastitywhiterose 12 Nov 2025, 15:35
# chastecmp

chastecmp by Chastity White Rose

chastecmp file1 file2

Bytes that differ between files are shown in hexadecimal
until the EOF has been reached.

Files should be the same size for best results. As soon as the end of either file is reached, the program ends. This is based on the assumption that it is used on files of a fixed format or size.

Example files: video game save files, executables compiled with minor changes in variables, text files with typos in one version and not another.
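
For readers who think in C first, the behaviour described above can be sketched roughly as follows. This is not the assembly source, only an illustration of the same idea: read one byte from each file, stop when either one ends, and print the offset and both bytes whenever they differ. The function name `compare_streams` and the exact output format (8 hex digits for the offset, 2 for each byte, uppercase) are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of chastecmp itself.
   Compares two open streams byte by byte and writes one line per
   differing byte to `out`. Stops as soon as either stream ends.
   Returns the number of differing bytes that were found. */
long compare_streams(FILE *f1, FILE *f2, FILE *out)
{
    unsigned long offset = 0; /* position of the current byte in both files */
    long differences = 0;
    int c1, c2;

    assert(f1 && f2 && out);

    for (;;) {
        c1 = fgetc(f1);
        c2 = fgetc(f2);

        /* end of either file ends the comparison */
        if (c1 == EOF || c2 == EOF)
            break;

        /* matching bytes are skipped, differing bytes are printed */
        if (c1 != c2) {
            fprintf(out, "%08lX %02X %02X\n",
                    offset, (unsigned)c1, (unsigned)c2);
            differences++;
        }
        offset++;
    }
    return differences;
}
```

Calling it as `compare_streams(file1, file2, stdout)` on two streams opened with `fopen(name, "rb")` would list the differing offsets in the same spirit as the assembly version.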

## Purpose

This program is designed to be a complement to my previous program: chastehex.

https://gitlab.com/chastitywhiterose/chastehex

https://github.com/chastitywhiterose/chastehex

The two programs help each other in two ways.

First, if you have two files that are mostly similar, you can use chastecmp to find which bytes are different.

Second, if you were to make a copy of a file and then use chastehex to modify the bytes of the file, then chastecmp could tell you which bytes you had modified when compared to the original file.

The main source is posted below, and my header files are included as attachments. The good news is that my latest header, chasteio.asm, displays the correct message whether a file was opened successfully or could not be opened. chasteio.asm handles file I/O, whereas chastelib.asm handles number conversion and printing strings and integers in any base I need.

Code:
;Linux 32-bit Assembly Source for chastecmp
format ELF executable
entry main

include 'chastelib32.asm'

main:

;radix will be 16 because this whole program is about hexadecimal
mov [radix],16 ; can choose radix for integer input/output!
mov [int_width],1

pop eax
mov [argc],eax ;save the argument count for later

;first arg is the name of the program. we skip past it
pop eax
dec [argc]
mov eax,[argc]

cmp eax,2
jb help
mov [file_offset],0 ;assume the offset is 0,beginning of file
jmp arg_open_file_1

help:
mov eax,help_message
call putstring
jmp main_end

arg_open_file_1:
pop eax
mov [filename1],eax ; save the name of the file we will open to read

call putstring ;print the name of the file we will try opening

mov ecx,0   ;open file in read mode 
mov ebx,eax ;filename should be in eax before this function was called
mov eax,5   ;invoke SYS_OPEN (kernel opcode 5)
int 80h     ;call the kernel

cmp eax,0
js file_error_display ;end program if the file can't be opened
mov [filedesc1],eax ; save the file descriptor number for later use
mov eax,file_open
call putstr_and_line

arg_open_file_2:
pop eax
mov [filename2],eax ; save the name of the file we will open to read

call putstring ;print the name of the file we will try opening

mov ecx,0   ;open file in read mode 
mov ebx,eax ;filename should be in eax before this function was called
mov eax,5   ;invoke SYS_OPEN (kernel opcode 5)
int 80h     ;call the kernel

cmp eax,0
js file_error_display ;end program if the file can't be opened
mov [filedesc2],eax ; save the file descriptor number for later use
mov eax,file_open
call putstr_and_line

files_compare:

file_1_read_one_byte:
mov edx,1          ;number of bytes to read
mov ecx,byte1 ;address to store the bytes
mov ebx,[filedesc1] ;move the opened file descriptor into EBX
mov eax,3          ;invoke SYS_READ (kernel opcode 3)
int 80h            ;call the kernel

;eax will have the number of bytes read after system call
mov [file_1_bytes_read],eax ;we save the number of bytes read for later
cmp eax,0
jnz file_2_read_one_byte ;unless zero bytes were read, proceed to read from next file

mov eax,[filename1]
call putstring
mov eax,end_of_file_string
call putstr_and_line

;Even if we have reached the end of the first file,
;we still proceed to read a byte from the second file
;to see if it also ends at the same address

file_2_read_one_byte:
mov edx,1          ;number of bytes to read
mov ecx,byte2 ;address to store the bytes
mov ebx,[filedesc2] ;move the opened file descriptor into EBX
mov eax,3          ;invoke SYS_READ (kernel opcode 3)
int 80h            ;call the kernel

;eax will have the number of bytes read after system call
mov [file_2_bytes_read],eax ;we save the number of bytes read for later
cmp eax,0
jnz check_both_bytes ;unless zero bytes were read, proceed to compare bytes from both files

mov eax,[filename2]
call putstring
mov eax,end_of_file_string
call putstr_and_line

jmp main_end ;we have reached the end of a file and should end the program

check_both_bytes:

;we add the number of bytes read from both files
mov eax,[file_1_bytes_read]
add eax,[file_2_bytes_read]
cmp eax,2
jnz main_end

compare_bytes:

mov al,[byte1]
mov bl,[byte2]

;compare the two bytes and skip printing them if they are the same
cmp al,bl
jz bytes_are_same

;print the address and the bytes at that address
mov eax,[file_offset]
mov [int_width],8
call putint_and_space
mov [int_width],2
mov eax,0
mov al,[byte1]
call putint_and_space
mov al,[byte2]
call putint_and_line

bytes_are_same:

inc [file_offset]

jmp files_compare

file_error_display:

mov eax,file_error
call putstr_and_line

main_end:

;this is the end of the program
;we close the open files and then use the exit call

mov ebx,[filedesc1] ;file number to close
mov eax,6   ;invoke SYS_CLOSE (kernel opcode 6)
int 80h     ;call the kernel

mov ebx,[filedesc2] ;file number to close
mov eax,6   ;invoke SYS_CLOSE (kernel opcode 6)
int 80h     ;call the kernel

mov eax, 1  ; invoke SYS_EXIT (kernel opcode 1)
mov ebx, 0  ; return 0 status on exit - 'No Errors'
int 80h

;variables for displaying information

help_message db 'chastecmp by Chastity White Rose',0Ah,0Ah
db 9,'chastecmp file1 file2',0Ah,0Ah
db 'Bytes that differ between files are shown in hexadecimal',0Ah
db 'until the EOF has been reached.',0Ah,0


file_open db ' opened',0
file_error db ' error',0
end_of_file_string db ' EOF',0

;variables for managing arguments and files
argc dd ?
filename1 dd ? ; name of the file to be opened
filename2 dd ? ; name of the file to be opened
filedesc1 dd ? ; file descriptor
filedesc2 dd ? ; file descriptor
byte1 db ?
byte2 db ?
file_1_bytes_read dd ?
file_2_bytes_read dd ?
file_offset dd ?
    


Attachment: chastelib32.asm (7.03 KB)



Last edited by chastitywhiterose on 13 Apr 2026, 17:16; edited 1 time in total
chastitywhiterose 20 Nov 2025, 10:34
I made a small update to the code. It now tells you the name of the file that reached the end first. If the files are the same length, the first file passed as an argument is reported at EOF first, simply because it is read first. In any case, the idea is the same: comparison is only useful for files that are similar. I have been experimenting with this program in my recent analysis of the ELF format, as seen in another forum topic. By assembling slightly different files, I was able to detect which bytes differed.
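
As a rough C sketch of that EOF reporting, based on my reading of the listing in the first post: after each pass, a file that returned zero bytes gets its name printed followed by ` EOF`, with the first file checked before the second. The function `eof_report` and the `REPORT_*` flags are made up for this example and do not exist in the assembly source.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flags saying which EOF lines were printed. */
enum { REPORT_NONE = 0, REPORT_FILE1 = 1, REPORT_FILE2 = 2 };

/* got1 and got2 are the byte counts (0 or 1) returned by reading
   one byte from each file during a single pass of the loop.
   The first file is checked first, so when both files end at the
   same offset its name is printed before the second file's name. */
int eof_report(int got1, int got2,
               const char *name1, const char *name2, FILE *out)
{
    int flags = REPORT_NONE;

    assert(name1 && name2 && out);

    if (got1 == 0) {
        fprintf(out, "%s EOF\n", name1);
        flags |= REPORT_FILE1;
    }
    if (got2 == 0) {
        fprintf(out, "%s EOF\n", name2);
        flags |= REPORT_FILE2;
    }
    return flags;
}
```

Under this reading, two files of equal length produce both EOF lines with the first file's name on top, while a shorter file is the only one reported.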
macomics



Joined: 26 Jan 2021
Posts: 1213
Location: Russia
macomics 20 Nov 2025, 18:00
Just use objdump
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20956
Location: In your JS exploiting you and your system
revolution 21 Nov 2025, 04:15
I think programs like this are great, and encourage people to post more.

Blindly running an already existing program is good to get a particular job done, but teaches the user nothing about how things work, or how to make an improved version, a simpler version, a faster version, a smaller version, or a specialised version, of the same.
chastitywhiterose 21 Nov 2025, 16:53
macomics wrote:
Just use objdump


My chastecmp program is usable for all kinds of files, not just object files understood by objdump. Indeed a hex comparison is usable for formats no longer in use. For example, comparing two save files of games like Castle of the Winds or Cave Story, which are both games I have hacked using my own tools chastehex and chastecmp.

Hex dumps and comparisons are generically useful, not just for object file formats.
chastitywhiterose 21 Nov 2025, 16:58
revolution wrote:
I think programs like this are great, and encourage people to post more.

Blindly running an already existing program is good to get a particular job done, but teaches the user nothing about how things work, or how to make an improved version, a simpler version, a faster version, a smaller version, or a specialised version, of the same.


You get my idea. The whole point of learning assembly is to learn how things work. I can already write these programs in C with a lot less source code, but I felt 15 kilobytes for a compiled C program was bloated and wasteful. Now I am rewriting my C programs so that they fit in less than 2 kilobytes. Most of my programs are toy programs meant to generate integer sequences, but some of them are tools like this that I actually use for multiple purposes. I hope people are inspired by my programs.
Copyright © 1999-2026, Tomasz Grysztar.