flat assembler
Message board for the users of flat assembler.

Index > Linux > chastecmp: hexadecimal comparison of two files

Author
Thread Post new topic Reply to topic
chastitywhiterose



Joined: 13 Oct 2025
Posts: 33
chastitywhiterose 12 Nov 2025, 15:35
# chastecmp

chastecmp: compares two files in hexadecimal
chastecmp file1 file2

Files must be same size for best results. As soon as the end of either files is reached, the program ends. This is based on the assumption that it is used on files of a fixed format or size.

Example files: video game save files, executables compiled with minor changes in variables, text files with typos in one version and not another.

## Purpose

This program is designed to be a complement to my previous program: chastehex.

https://gitlab.com/chastitywhiterose/chastehex

https://github.com/chastitywhiterose/chastehex

The two programs help each other in two ways.

First, if you have two files that are mostly similar, you can use chastecmp to find which bytes are different.

Second, if you were to make a copy of a file and then use chastehex to modify the bytes of the file, then chastecmp could tell you which bytes you had modified when compared to the original file.

The main source is posted below and my header files are included as attachments. The good news is that my latest header, chasteio.asm handles displaying the correct message when a file cannot be opened or if it was opened successfully. chasteio.asm handles file io whereas chastelib.asm handles number conversion and printing strings and integers in any base I need.

Code:
;Linux 32-bit Assembly Source for chastecmp
format ELF executable
entry main

include 'chasteio32.asm'

main:

;radix will be 16 because this whole program is about hexadecimal
mov [radix],16 ; can choose radix for integer input/output!
mov [int_width],1
mov [int_newline],0 ;disable automatic printing of newlines after putint

pop eax
mov [argc],eax ;save the argument count for later

;first arg is the name of the program. we skip past it
pop eax
dec [argc]
mov eax,[argc]

;call putint
;call putspace

cmp eax,2
jb help
mov [file_offset],0 ;assume the offset is 0,beginning of file
jmp arg_open_file_1

help:
mov eax,help_message
call putstring
jmp main_end

arg_open_file_1:
pop eax
mov [filename1],eax ; save the name of the file we will open to read
call open
cmp eax,0
js main_end ;end program if the file can't be opened
mov [filedesc1],eax ; save the file descriptor number for later use

arg_open_file_2:
pop eax
mov [filename2],eax ; save the name of the file we will open to read
call open
cmp eax,0
js main_end ;end program if the file can't be opened
mov [filedesc2],eax ; save the file descriptor number for later use

files_compare:

file_1_read_one_byte:
mov edx,1          ;number of bytes to read
mov ecx,byte1 ;address to store the bytes
mov ebx,[filedesc1] ;move the opened file descriptor into EBX
mov eax,3          ;invoke SYS_READ (kernel opcode 3)
int 80h            ;call the kernel

;eax will have the number of bytes read after system call
cmp eax,0
jnz file_2_read_one_byte ;unless zero bytes were read, proceed to read from next file

mov eax,[filename1]
call putstring
mov eax,end_of_file_string
call putstring

jmp main_end ;we have reach end of one file and should end program

file_2_read_one_byte:
mov edx,1          ;number of bytes to read
mov ecx,byte2 ;address to store the bytes
mov ebx,[filedesc2] ;move the opened file descriptor into EBX
mov eax,3          ;invoke SYS_READ (kernel opcode 3)
int 80h            ;call the kernel

;eax will have the number of bytes read after system call
cmp eax,0
jnz compare_bytes ;unless zero bytes were read, proceed to compare bytes from both files

mov eax,[filename2]
call putstring
mov eax,end_of_file_string
call putstring

jmp main_end ;we have reach end of one file and should end program


compare_bytes:

mov al,[byte1]
mov bl,[byte2]

;compare the two bytes and skip printing them if they are the same
cmp al,bl
jz same
call print_bytes_info
same:

inc [file_offset]

jmp files_compare

main_end:

;this is the end of the program
;we close the open files and then use the exit call

mov eax,[filedesc1] ;file number to close
call close
mov eax,[filedesc2] ;file number to close
call close


mov eax, 1  ; invoke SYS_EXIT (kernel opcode 1)
mov ebx, 0  ; return 0 status on exit - 'No Errors'
int 80h

;print the address and the bytes at that address
print_bytes_info:
mov eax,[file_offset]
mov [int_width],8
call putint
call putspace
mov [int_width],2
mov eax,0
mov al,[byte1]
call putint
call putspace
mov al,[byte2]
call putint
call putspace
call putline
ret

;variables for displaying information

help_message db 'chastecmp: compares two files in hexadecimal',0Ah
db 9,'chastecmp file1 file2',0Ah
db 'The bytes of the files are compared until EOF of either is reached.',0Ah,0

end_of_file_string db ' has reached EOF',0Ah,0

;in this section, only the relevant functions from chastelib32.asm were copied over

stdout dd 1 ; variable for standard output so that it can theoretically be redirected

putstring:

push eax
push ebx
push ecx
push edx

mov ebx,eax ; copy eax to ebx as well. Now both registers have the address of the main_string

putstring_strlen_start: ; this loop finds the lenge of the string as part of the putstring function

cmp [ebx],byte 0 ; compare byte at address ebx with 0
jz putstring_strlen_end ; if comparison was zero, jump to loop end because we have found the length
inc ebx
jmp putstring_strlen_start

putstring_strlen_end:
sub ebx,eax ;ebx will now have correct number of bytes

;write string using Linux Write system call
;https://www.chromium.org/chromium-os/developer-library/reference/linux-constants/syscalls/#x86-32-bit


mov edx,ebx      ;number of bytes to write
mov ecx,eax      ;pointer/address of string to write
mov ebx,[stdout] ;write to the STDOUT file
mov eax, 4       ;invoke SYS_WRITE (kernel opcode 4 on 32 bit systems)
int 80h          ;system call to write the message


pop edx
pop ecx
pop ebx
pop eax

ret ; this is the end of the putstring function return to calling location

;this is the location in memory where digits are written to by the putint function
int_string     db 32 dup '?' ;enough bytes to hold maximum size 32-bit binary integer
; this is the end of the integer string optional line feed and terminating zero
; clever use of this label can change the ending to be a different character when needed 
int_newline db 0Ah,0

radix dd 2 ;radix or base for integer output. 2=binary, 8=octal, 10=decimal, 16=hexadecimal
int_width dd 8

;this function creates a string of the integer in eax
;it uses the above radix variable to determine base from 2 to 36
;it then loads eax with the address of the string
;this means that it can be used with the putstring function

intstr:

mov ebx,int_newline-1 ;find address of lowest digit(just before the newline 0Ah)
mov ecx,1

digits_start:

mov edx,0
div dword [radix]
cmp edx,10
jb decimal_digit
jge hexadecimal_digit

decimal_digit: ;we go here if it is only a digit 0 to 9
add edx,'0'
jmp save_digit

hexadecimal_digit:
sub edx,10
add edx,'A'

save_digit:

mov [ebx],dl
cmp eax,0
jz intstr_end
dec ebx
inc ecx
jmp digits_start

intstr_end:

prefix_zeros:
cmp ecx,[int_width]
jnb end_zeros
dec ebx
mov [ebx],byte '0'
inc ecx
jmp prefix_zeros
end_zeros:

mov eax,ebx ; now that the digits have been written to the string, display it!

ret

; function to print string form of whatever integer is in eax
; The radix determines which number base the string form takes.
; Anything from 2 to 36 is a valid radix
; in practice though, only bases 2,8,10,and 16 will make sense to other programmers
; this function does not process anything by itself but calls the combination of my other
; functions in the order I intended them to be used.

putint: 

push eax
push ebx
push ecx
push edx

call intstr

call putstring

pop edx
pop ecx
pop ebx
pop eax

ret

;the strint function from chastelib32.asm has been excluded for this program because unlike in chastehex, we are not reading strings to get numbers. We are only outputting byte numbers by converting them to strings with intstr

;the next utility functions simply print a space or a newline
;these help me save code when printing lots of things for debugging

space db ' ',0
line db 0Dh,0Ah,0

putspace:
push eax
mov eax,space
call putstring
pop eax
ret

putline:
push eax
mov eax,line
call putstring
pop eax
ret

;variables for managing arguments and files
argc dd ?
filename1 dd ? ; name of the file to be opened
filename2 dd ? ; name of the file to be opened
filedesc1 dd ? ; file descriptor
filedesc2 dd ? ; file descriptor
byte1 db ?
byte2 db ?
bytes_read dd ?
file_offset dd ?
    


Description:
Download
Filename: chasteio32.asm
Filesize: 2.55 KB
Downloaded: 16 Time(s)

Description:
Download
Filename: chastelib32.asm
Filesize: 5.35 KB
Downloaded: 13 Time(s)

Post 12 Nov 2025, 15:35
View user's profile Send private message Send e-mail Reply with quote
chastitywhiterose



Joined: 13 Oct 2025
Posts: 33
chastitywhiterose 20 Nov 2025, 10:34
I made a small update to the code. Now it will tell you the name of which file reached the end first. If the files are the same length, the first file passed as an argument will be the one that is at EOF first just because it is read from first. In any case, the idea is the same that comparison is only useful in files that are similar. I was experimenting using this program in my recent analysis of the ELF format as seen in other forum topic. By assembling slightly different files, I was able to detect which bytes differed.
Post 20 Nov 2025, 10:34
View user's profile Send private message Send e-mail Reply with quote
macomics



Joined: 26 Jan 2021
Posts: 1200
Location: Russia
macomics 20 Nov 2025, 18:00
Just use objdump
Post 20 Nov 2025, 18:00
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20774
Location: In your JS exploiting you and your system
revolution 21 Nov 2025, 04:15
I think programs like this are great, and encourage people to post more.

Blindly running an already existing program is good to get a particular job done, but teaches the user nothing about how things work, or how to make an improved version, a simpler version, a faster version, a smaller version, or a specialised version, of the same.
Post 21 Nov 2025, 04:15
View user's profile Send private message Visit poster's website Reply with quote
chastitywhiterose



Joined: 13 Oct 2025
Posts: 33
chastitywhiterose 21 Nov 2025, 16:53
macomics wrote:
Just use objdump


My chastecmp program is usable for all kinds of files, not just object files understood by objdump. Indeed a hex comparison is usable for formats no longer in use. For example, comparing two save files of games like Castle of the Winds or Cave Story, which are both games I have hacked using my own tools chastehex and chastecmp.

Hex dumps and comparisons are generically useful, not just for object file formats.
Post 21 Nov 2025, 16:53
View user's profile Send private message Send e-mail Reply with quote
chastitywhiterose



Joined: 13 Oct 2025
Posts: 33
chastitywhiterose 21 Nov 2025, 16:58
revolution wrote:
I think programs like this are great, and encourage people to post more.

Blindly running an already existing program is good to get a particular job done, but teaches the user nothing about how things work, or how to make an improved version, a simpler version, a faster version, a smaller version, or a specialised version, of the same.


You get my idea. The whole point of learning assembly is for learning about how things work. I can already write these programs in C using a lot less source code, but I felt 15 kilobytes for a compiled C program was bloated and wasteful. Now I am making rewrites of my C programs that fit in less than 2 kilobytes. Most of my programs are toy programs meant to generate integer sequences but some of them are tools like this that I actually use for multiple purposes. I hope people are inspired by my programs.
Post 21 Nov 2025, 16:58
View user's profile Send private message Send e-mail Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.