flat assembler
Message board for the users of flat assembler.

[Solved] Need help in creating a hexdump utility

FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 01 Oct 2021, 09:56
I have done something like this, generated using a high-level language (HLL):

Code:
00000000    7F  45  4C  46  02  01  01  03  00  00  00  00  00  00  00  00      .ELF............
00000010    02  00  3E  00  01  00  00  00  B0  00  40  00  00  00  00  00      ..>.......@.....
00000020    40  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00      @...............
00000030    00  00  00  00  40  00  38  00  02  00  40  00  00  00  00  00      ....@.8...@.....    


Now I want to create the same hexadecimal dump output using Assembly language.

I have reached the stage where I can open the file, read it, and write the output.
Still figuring out how to convert ASCII character code to printable hexadecimal value....

I am pasting the code here as my backup:

Code:
format ELF64 executable 3

segment readable executable

entry $

      pop     r8
      pop     rsi       ;APP_NAME
      pop     rsi       ;1st command-line argument
      mov     rdi,rsi
      test    rdi,rdi   ;argv[1] is NULL when no argument was given
      jz      _err
      ;lea     rdi,[fn]
      xor     rsi,rsi   ;O_RDONLY
      mov     rax,2     ; sys_open
      syscall
      test    rax,rax   ; on failure the kernel returns -errno, not just -1
      js      _err
      mov     dword [fd],eax

_redo:
      mov     rdx,16
      lea     rsi,[buffer]      
      mov     edi,dword [fd]
      xor     rax,rax  ; sys_read
      syscall
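      cmp     rax,0
      jle     _close   ; 0 = end of file, negative = -errno

      mov     rdx,rax
      lea     rsi,[buffer]
      mov     rdi,1    ; STDOUT
      mov     rax,1    ; sys_write
      syscall
      jmp     _redo

_close:
      push    rax      ; keep the sys_read result for the exit status
      mov     edi,dword [fd]
      mov     rax,3    ; sys_close
      syscall
      pop     rax

_err:
      mov     rdi,rax
      mov     rax,60   ; sys_exit
      syscall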

segment readable writeable

buffer rb      16
fd     dd      ?
;fn     db      'cpubrand.asm',0    


Last edited by FlierMate on 01 Oct 2021, 16:38; edited 1 time in total
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8353
Location: Kraków, Poland
Tomasz Grysztar 01 Oct 2021, 10:22
FlierMate wrote:
Still figuring out how to convert ASCII character code to printable hexadecimal value....
In one of the parts of my video tutorial I show an implementation of a "ShowHex" routine in detail. Even though it is shown under Windows, these instructions and snippets are not really OS-dependent.
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 01 Oct 2021, 13:12
Tomasz Grysztar wrote:
FlierMate wrote:
Still figuring out how to convert ASCII character code to printable hexadecimal value....
In one of the parts of my video tutorial I show an implementation of a "ShowHex" routine in detail. Even though it is shown under Windows, these instructions and snippets are not really OS-dependent.


Oh wow, I did not know you have a YouTube channel too.

I used your code as shown below:

Code:
_redo:
...
...
      mov     edx,[offset]
      mov     ecx,8
      call    ConvertHex
      call    PrintOffset
      call    PrintLongSpace   
...
...
...
      call    PrintLine      
      add     [offset],16
      jmp     _redo
...
...
...

ConvertHex:      ;Nice code snippet by Tomasz Grysztar (flat assembler)
      ;mov      ecx,8
      xor      ebx,ebx
_loop1:
      rol      edx,4
      mov      eax,edx
      and      eax,1111b
      mov      al,[digits+eax]
      mov      [ebx+hexnum],al
      inc      ebx
      dec      ecx
      jnz      _loop1     
      ret     


It works for the offset (the left-most pane in the screenshot); however, the data bytes are not shown as proper hex values (they all come out as 00).

I am a little confused here: "FFh" is converted correctly, but what about the value 0xFF?

The code to convert data is highlighted below:

Code:
      mov     rcx,0
      
_repeat1:      
      xor     rdx,rdx
      mov     dl,byte [buffer + rcx]

      push    rcx
      mov     rcx,2
      call    ConvertHex
....
....
    


Last edited by FlierMate on 05 Feb 2022, 09:09; edited 1 time in total
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 01 Oct 2021, 16:34
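Ahh, I found out I should have customized the code given by Tomasz.

ConvertHex rotates the whole 32-bit EDX, so when only DL holds the data byte, the first two digits it produces come from the empty upper nibbles of EDX and are always 0. For the two-hex-digit (single byte) use case, rotate DL instead: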

Code:
....
....
      rol      dl,4
      mov      al,dl
      and      eax,1111b
      mov      al,[digits+eax]
....
....    


This fixes the error. Cheers!
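
For reference, here is a minimal sketch of that single-byte conversion as a complete program. It is not the attached hexdump.asm: the names ByteToHex, sample and outbuf are made up for illustration, and the digits table is assumed to be the usual '0123456789ABCDEF' string (the thread uses it without showing its definition).

```asm
format ELF64 executable 3

segment readable executable

entry start

start:
        lea     rsi, [sample]           ; hypothetical sample data
        mov     ecx, sample_len
        lea     rdi, [outbuf]
.next:
        mov     dl, [rsi]
        inc     rsi
        call    ByteToHex               ; writes 2 characters, advances rdi
        mov     byte [rdi], ' '
        inc     rdi
        dec     ecx
        jnz     .next
        mov     byte [rdi-1], 10        ; turn the last space into a newline

        lea     rsi, [outbuf]
        mov     rdx, rdi
        sub     rdx, rsi                ; rdx = number of bytes produced
        mov     edi, 1                  ; STDOUT
        mov     eax, 1                  ; sys_write
        syscall

        xor     edi, edi
        mov     eax, 60                 ; sys_exit
        syscall

; in:  dl  = byte to convert
;      rdi = destination for two hex characters
; out: rdi advanced by 2, dl unchanged, rax clobbered
ByteToHex:
        rol     dl, 4                   ; high nibble is now in bits 0..3
        movzx   eax, dl
        and     eax, 1111b
        mov     al, [digits+rax]
        mov     [rdi], al
        rol     dl, 4                   ; second rotate restores dl
        movzx   eax, dl
        and     eax, 1111b              ; low nibble
        mov     al, [digits+rax]
        mov     [rdi+1], al
        add     rdi, 2
        ret

segment readable writeable

digits  db '0123456789ABCDEF'
sample  db 0x7F, 'E', 'L', 'F', 0xFF
sample_len = $ - sample
outbuf  rb 3*sample_len
```

Assembled with fasm and run on x86-64 Linux, this should print "7F 45 4C 46 FF". Rotating DL twice by 4 bits brings the byte back to its original value, so the routine does not need a scratch copy.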


Last edited by FlierMate on 05 Feb 2022, 09:08; edited 1 time in total
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 11 Oct 2021, 20:48
.....


Description: hexdump v0.01
Download
Filename: hexdump.asm
Filesize: 4.07 KB
Downloaded: 461 Time(s)



Last edited by FlierMate on 06 Feb 2022, 02:40; edited 6 times in total
st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 16 Oct 2021, 13:20
What is the difference between topic/tag and search (which gives 23 repository results)?
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 16 Oct 2021, 13:45
st wrote:
What is the difference between topic/tag and search (which gives 23 repository results)?


A topic/tag is applied voluntarily by the repo owner. A search does not depend on that labelling, so it gives more accurate matches.

However, repos could also be using a keyword other than "hexdump", which makes it hard to tell how many hexdump utilities there are on GitHub.

From what I see, most if not all hexdump repositories written in Assembly are not popular and most are not starred; my "hexdump" FASM example has had only 3 git clones so far.
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 01 Feb 2022, 04:27
I succeeded in deriving a utf8count utility from my hexdump utility.

Compare the character count from my utility with an online counter such as :

The character count is broken down into 1-byte, 2-byte, 3-byte and 4-byte characters, each shown in a different color. The totals are printed in hexadecimal (you have to convert them to decimal manually with a calculator).

The logic is rather simple:
Code:
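     cmp     dl, 11110000b
     jae     _4byte          ;11110xxx = first byte of a 4-byte character
     cmp     dl, 11100000b
     jae     _3byte          ;1110xxxx = first byte of a 3-byte character
     cmp     dl, 11000000b
     jae     _2byte          ;110xxxxx = first byte of a 2-byte character
     cmp     dl, 01111111b
     jbe     _1byte          ;0xxxxxxx = ASCII

    ;the others are continuation bytes, don't count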


Bug reports are welcome! Improvements are welcome!
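
One possible tightening of the comparison chain above, as a sketch only (this is not the code in utf8count.asm). It assumes the lead-byte ranges of RFC 3629: 00..7F, C2..DF, E0..EF and F0..F4, so C0, C1 and F5..FF are never counted. LeadLength and sample are hypothetical names, and the result is returned as the process exit status to keep the example short.

```asm
format ELF64 executable 3

segment readable executable

entry start

start:
        lea     rsi, [sample]
        mov     ecx, sample_len
        xor     ebx, ebx                ; ebx = characters counted so far
.next:
        mov     dl, [rsi]
        inc     rsi
        call    LeadLength
        test    eax, eax
        jle     .skip                   ; continuation or invalid byte: don't count
        inc     ebx
.skip:
        dec     ecx
        jnz     .next

        mov     edi, ebx                ; exit status = number of lead bytes
        mov     eax, 60                 ; sys_exit
        syscall

; in:  dl  = byte to classify
; out: eax = 1..4  length announced by a lead byte
;            0     continuation byte (10xxxxxx)
;           -1     byte that cannot start a UTF-8 sequence
LeadLength:
        mov     eax, 1
        cmp     dl, 0x7F
        jbe     .done                   ; 00..7F: ASCII
        xor     eax, eax
        cmp     dl, 0xBF
        jbe     .done                   ; 80..BF: continuation byte
        mov     eax, -1
        cmp     dl, 0xC1
        jbe     .done                   ; C0, C1: only usable for overlong forms
        mov     eax, 2
        cmp     dl, 0xDF
        jbe     .done                   ; C2..DF
        mov     eax, 3
        cmp     dl, 0xEF
        jbe     .done                   ; E0..EF
        mov     eax, 4
        cmp     dl, 0xF4
        jbe     .done                   ; F0..F4
        mov     eax, -1                 ; F5..FF
.done:
        ret

segment readable

; 'a', U+00E9, U+20AC, U+1F600, then bytes that are not valid lead bytes
sample  db 'a', 0xC3,0xA9, 0xE2,0x82,0xAC, 0xF0,0x9F,0x98,0x80
        db 0xFF, 0xC0, 0x80
sample_len = $ - sample
```

Running it and checking `echo $?` should give 4. With the original chain the same data would give 6, because 0xFF falls into the 4-byte bucket and 0xC0 into the 2-byte bucket. The sketch still trusts that each lead byte is followed by the right number of continuation bytes.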


Description: utf8count v0.02 - derived from hexdump.asm
Download
Filename: utf8count.asm
Filesize: 10.45 KB
Downloaded: 451 Time(s)



Last edited by FlierMate on 06 Feb 2022, 02:44; edited 5 times in total
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20335
Location: In your JS exploiting you and your system
revolution 01 Feb 2022, 04:34
What happens for an invalid UTF-8 sequence?

e.g.
Code:
db 0xff, 0xff, 0xff, 0xff, 0xff, 0xff ; 0xff is not valid in UTF-8
db 0xc0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80 ; too many continuation bytes, and overlong encoding of null    
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 01 Feb 2022, 04:43
revolution wrote:
What happens for an invalid UTF-8 sequence?

e.g.
Code:
db 0xff, 0xff, 0xff, 0xff, 0xff, 0xff ; 0xff is not valid in UTF-8
db 0xc0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80 ; too many continuation bytes, and overlong encoding of null    


With the logic from my previous post, the result will be incorrect. This is a good question.

The two sample UTF-8 text files do not have any header signature, so I wonder how to check whether a file is binary or UTF-8. Perhaps @revolution can give me some ideas?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20335
Location: In your JS exploiting you and your system
revolution 01 Feb 2022, 04:59
It is common to scan for byte values 0x00-0x1f to detect a binary file. You might also want to add 0x7f.

Invalid UTF-8 sequences depend upon the application. Some applications just render them as an empty box for each bad byte, or simply ignore them like they don't exist.

Overlong encodings have been used for web exploits to insert rogue JS into pages.
FlierMate



Joined: 21 Jan 2021
Posts: 219
FlierMate 01 Feb 2022, 05:53
revolution wrote:
It is common to scan for byte values 0x00-0x1f to detect a binary file. You might also want to add 0x7f.

Invalid UTF-8 sequences depend upon the application. Some applications just render them as an empty box for each bad byte, or simply ignore them like they don't exist.

Overlong encodings have been used for web exploits to insert rogue JS into pages.


The above is valuable information that had never crossed my mind. Thank you so much for the suggestion.
Although it makes the program more complicated, I think it would be fun to do it this way:
For an invalid UTF-8 sequence,
- Decrement the counter if there are more or fewer continuation bytes than expected
- Calculate the percentage of wrong UTF-8 sequences (e.g. "Test1.txt is 78% a valid UTF-8 file")
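
The two bullets above could be sketched roughly as follows. This is only one interpretation: it reports the percentage of bytes that belong to well-formed sequences, it treats a bad byte by skipping it and resynchronising on the next one, and it returns the percentage as the exit status. It checks the lead-byte range and the 10xxxxxx form of each continuation byte, but it does not reject the remaining overlong forms (such as E0 80 80), surrogates, or values above U+10FFFF. All label names and the sample data are made up.

```asm
format ELF64 executable 3

segment readable executable

entry start

start:
        lea     rsi, [sample]
        lea     rdi, [sample_end]
        xor     ebx, ebx                ; ebx = bytes inside well-formed sequences
.next:
        cmp     rsi, rdi
        jae     .report
        mov     dl, [rsi]
        mov     ecx, 1                  ; ecx = expected sequence length
        cmp     dl, 0x7F
        jbe     .have_len
        cmp     dl, 0xC2
        jb      .bad                    ; 80..C1: stray continuation or overlong lead
        mov     ecx, 2
        cmp     dl, 0xDF
        jbe     .have_len
        mov     ecx, 3
        cmp     dl, 0xEF
        jbe     .have_len
        mov     ecx, 4
        cmp     dl, 0xF4
        ja      .bad                    ; F5..FF never appear in UTF-8
.have_len:
        mov     rax, rdi
        sub     rax, rsi                ; rax = bytes left in the buffer
        cmp     rax, rcx
        jb      .bad                    ; sequence is cut off by the end of the data
        mov     edx, 1                  ; index of the first continuation byte
.check:
        cmp     edx, ecx
        jae     .good
        mov     al, [rsi+rdx]
        and     al, 11000000b
        cmp     al, 10000000b           ; continuation bytes look like 10xxxxxx
        jne     .bad
        inc     edx
        jmp     .check
.good:
        add     ebx, ecx
        add     rsi, rcx
        jmp     .next
.bad:
        inc     rsi                     ; skip one byte and try again
        jmp     .next

.report:
        imul    eax, ebx, 100           ; percentage = valid * 100 / total
        xor     edx, edx
        mov     ecx, sample_len
        div     ecx
        mov     edi, eax                ; exit status = percentage (0..100)
        mov     eax, 60                 ; sys_exit
        syscall

segment readable

; 7 bytes of valid text, then a stray 0xFF and a truncated 3-byte sequence
sample  db 'o', 'k', 0xC3,0xA9, 0xE2,0x82,0xAC
        db 0xFF, 0xE2, 0x82
sample_end:
sample_len = sample_end - sample
```

For this sample, 7 of the 10 bytes are in well-formed sequences, so `echo $?` should show 70.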

As for the overlong encodings, that is really eye-opening; digital crime is everywhere.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20335
Location: In your JS exploiting you and your system
revolution 01 Feb 2022, 06:03
I omitted to say that 0x0d and 0x0a (and maybe 0x0c) are normal ASCII so perhaps shouldn't trigger a binary file detection.
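
A rough sketch of that heuristic, assuming the whole buffer is already in memory: any byte in 0x00-0x1f or equal to 0x7f marks the data as binary, except 0x0a, 0x0d and 0x0c. LooksBinary and sample are hypothetical names, and the verdict is returned as the exit status. Tab (0x09) is not exempted here because it was not mentioned above, but it could be added to the exceptions in the same way.

```asm
format ELF64 executable 3

segment readable executable

entry start

start:
        lea     rsi, [sample]
        mov     ecx, sample_len
        call    LooksBinary
        mov     edi, eax                ; exit status: 1 = binary, 0 = text
        mov     eax, 60                 ; sys_exit
        syscall

; in:  rsi = buffer, ecx = length (must be > 0)
; out: eax = 1 if a disallowed control byte was found, else 0
;      rsi, ecx clobbered
LooksBinary:
.next:
        mov     al, [rsi]
        inc     rsi
        cmp     al, 0x7F
        je      .binary                 ; DEL
        cmp     al, 0x1F
        ja      .ok                     ; 20..7E and 80..FF pass
        cmp     al, 0x0A
        je      .ok                     ; line feed
        cmp     al, 0x0D
        je      .ok                     ; carriage return
        cmp     al, 0x0C
        jne     .binary                 ; anything else below 0x20 except form feed
.ok:
        dec     ecx
        jnz     .next
        xor     eax, eax
        ret
.binary:
        mov     eax, 1
        ret

segment readable

; a line of text followed by the first bytes of an ELF header
sample  db 'plain text', 13, 10, 0x7F, 'E', 'L', 'F', 2, 1, 1
sample_len = $ - sample
```

With this sample the exit status should be 1, because of the 0x7F that starts the ELF signature. Bytes 0x80 and above are let through, so UTF-8 text is not flagged.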
FlierMate1



Joined: 31 May 2022
Posts: 118
FlierMate1 03 Jul 2022, 07:14
Hi folks, I have updated my utf8count.asm to v0.03. It now uses the 'reset' ANSI escape code instead of the black color code, so it is compatible with terminal emulators that have a white background (the previous version only worked with a black background).

However, the colors have no effect in WSL, as tested on my PC.


Description: New version displays colors correctly
Filesize: 135.99 KB
Viewed: 8328 Time(s)

utf8.png


Description: Updated version
Download
Filename: utf8count_v0.03.asm
Filesize: 10.52 KB
Downloaded: 360 Time(s)

FlierMateI



Joined: 12 Sep 2022
Posts: 6
FlierMateI 12 Sep 2022, 06:54
A nice DIY idea for those who have downloaded my hexdump v0.01: you can use coloring for different groups of hexadecimal values.

In the example in my screenshot attached below, null values are highlighted in blue, making them easier to distinguish from the rest.
I am not attaching the files, but you can use ANSI escape codes for the coloring.
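
Since the files are not attached, here is a small sketch of the idea rather than the actual modification. It assumes the standard SGR codes ESC[34m (blue foreground) and ESC[0m (reset); the exact codes used for the screenshot are not given in the post. CopyBytes, blue, reset, sample and outbuf are hypothetical names.

```asm
format ELF64 executable 3

segment readable executable

entry start

start:
        lea     rsi, [sample]
        mov     ecx, sample_len
        lea     rdi, [outbuf]
.next:
        movzx   r8d, byte [rsi]         ; r8d = current data byte
        inc     rsi
        test    r8d, r8d
        jnz     .digits
        lea     rbx, [blue]             ; null byte: switch to blue first
        mov     edx, lenblue
        call    CopyBytes
.digits:
        mov     eax, r8d
        shr     eax, 4                  ; high nibble
        mov     al, [digits+rax]
        mov     [rdi], al
        mov     eax, r8d
        and     eax, 1111b              ; low nibble
        mov     al, [digits+rax]
        mov     [rdi+1], al
        add     rdi, 2
        test    r8d, r8d
        jnz     .space
        lea     rbx, [reset]            ; back to the terminal's default colors
        mov     edx, lenreset
        call    CopyBytes
.space:
        mov     byte [rdi], ' '
        inc     rdi
        dec     ecx
        jnz     .next
        mov     byte [rdi-1], 10        ; turn the last space into a newline

        lea     rsi, [outbuf]
        mov     rdx, rdi
        sub     rdx, rsi                ; rdx = number of bytes produced
        mov     edi, 1                  ; STDOUT
        mov     eax, 1                  ; sys_write
        syscall

        xor     edi, edi
        mov     eax, 60                 ; sys_exit
        syscall

; in:  rbx = source, edx = count (> 0), rdi = destination
; out: rdi advanced by count; rbx, edx, al clobbered
CopyBytes:
.loop:
        mov     al, [rbx]
        mov     [rdi], al
        inc     rbx
        inc     rdi
        dec     edx
        jnz     .loop
        ret

segment readable writeable

digits  db '0123456789ABCDEF'
blue    db 27,'[34m'                    ; ANSI escape code: blue foreground
lenblue = $ - blue
reset   db 27,'[0m'                     ; ANSI escape code: reset attributes
lenreset = $ - reset
sample  db 0x7F, 'E', 'L', 'F', 2, 1, 1, 3, 0, 0
sample_len = $ - sample
outbuf  rb 12*sample_len                ; worst case: 5 + 2 + 4 + 1 bytes per input byte
```

On a terminal that honors ANSI colors, this should print one line of hex bytes with only the two trailing 00 values in blue. Building the whole line in a buffer and writing it once keeps the escape codes and digits together in a single sys_write.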


Description: Null values in hexdump are in blue color
Filesize: 354 KB
Viewed: 7435 Time(s)

222.png


MatQuasar



Joined: 25 Oct 2023
Posts: 105
MatQuasar 25 May 2024, 07:00
I have changed the ANSI escape codes to use semicolons instead of colons as separators, so now it works in WSL as well!

v0.04 is the latest minor update.

If you don't want to download the new file, you can update your existing utf8count.asm by modifying the last few lines:

Code:
colorblue   db   27,'[48;5;33m'   ; ANSI escape code
lenblue = $ - colorblue
colorgreen   db   27,'[48;5;46m'   ; ANSI escape code
lengreen = $ - colorgreen
colororange   db   27,'[48;5;196m'   ; ANSI escape code
lenorange = $ - colororange
coloryellow   db   27,'[48;5;172m'   ; ANSI escape code
lenyellow = $ - coloryellow                                       


Use semicolons as the separators inside "27,'[48....m'".


Description: utf8count now show colors properly in WSL
Filesize: 61.46 KB
Viewed: 1884 Time(s)

Capture.PNG


Description: v0.04, works in WSL
Download
Filename: utf8count.asm
Filesize: 11.63 KB
Downloaded: 103 Time(s)

Copyright © 1999-2024, Tomasz Grysztar.