flat assembler
Message board for the users of flat assembler.
[Solved] Need help in creating a hexdump utility
Tomasz Grysztar 01 Oct 2021, 10:22
FlierMate wrote: Still figuring out how to convert ASCII character code to printable hexadecimal value....
FlierMate 01 Oct 2021, 13:12
Tomasz Grysztar wrote:
Oh wow, I did not know you have a YouTube channel too. I used your code as below:
Code:
_redo:
    ...
    ...
    mov edx,[offset]
    mov ecx,8
    call ConvertHex
    call PrintOffset
    call PrintLongSpace
    ...
    ...
    ...
    call PrintLine
    add [offset],16
    jmp _redo
    ...
    ...
    ...
ConvertHex:                 ; Nice code snippet by Tomasz Grysztar (flat assembler)
    ;mov ecx,8
    xor ebx,ebx             ; ebx = index into the output buffer
_loop1:
    rol edx,4               ; rotate the most significant nibble into bits 0-3
    mov eax,edx
    and eax,1111b           ; keep only that nibble
    mov al,[digits+eax]     ; look up its ASCII hex digit
    mov [ebx+hexnum],al
    inc ebx
    dec ecx
    jnz _loop1
    ret
It works for showing the offset (left-most pane in screenshot); however, it did not show the data as proper hex values (all are 0x00). I am a little confused here: "FFh" is correctly converted, but how about 0xFF (value)? The code to convert data is highlighted below:
Code:
    mov rcx,0
_repeat1:
    xor rdx,rdx
    mov dl,byte [buffer + rcx]
    push rcx
    mov rcx,2
    call ConvertHex
    ....
    ....
Last edited by FlierMate on 05 Feb 2022, 09:09; edited 1 time in total
FlierMate 01 Oct 2021, 16:34
Ahh, I found out I should have customized the code given by Tomasz.
For the two-hex-digit (single byte) use case:
Code:
    ....
    ....
    rol dl,4
    mov al,dl
    and eax,1111b
    mov al,[digits+eax]
    ....
    ....
This fixes the error. Cheers!
Last edited by FlierMate on 05 Feb 2022, 09:08; edited 1 time in total
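A minimal sketch of why the register width matters here, written in Python rather than FASM so that it is easy to run. The names `rol`, `convert_hex`, and `DIGITS` are hypothetical, and the uppercase digit table is an assumption, since the contents of `digits` are not shown in the thread. It models the rotate-by-4 loop for a register of a given width: with a byte held in a 32-bit register, the first two rotations bring in the zero nibbles from the top of the register, which matches the "all are 0x00" symptom, while rotating an 8-bit register yields the expected digits.

```python
DIGITS = "0123456789ABCDEF"  # assumed lookup table; the thread does not show `digits`


def rol(value, count, width):
    """Rotate `value` left by `count` bits inside a `width`-bit register."""
    mask = (1 << width) - 1
    value &= mask
    return ((value << count) | (value >> (width - count))) & mask


def convert_hex(value, ndigits, width):
    """Emit `ndigits` hex digits by rotating a `width`-bit register left by 4 each step."""
    out = []
    for _ in range(ndigits):
        value = rol(value, 4, width)       # like `rol edx,4` (width=32) or `rol dl,4` (width=8)
        out.append(DIGITS[value & 0b1111])  # like `and eax,1111b` plus the table lookup
    return "".join(out)


if __name__ == "__main__":
    print(convert_hex(0x1234ABCD, 8, 32))  # full 32-bit offset, 8 digits
    print(convert_hex(0xFF, 2, 32))        # byte in a 32-bit register, 2 digits
    print(convert_hex(0xFF, 2, 8))         # byte in an 8-bit register, 2 digits
```

In this model the 32-bit rotation only produces the right text when all 8 digits are requested, which is consistent with the offset column working while the data column did not.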
FlierMate 11 Oct 2021, 20:48
.....
Last edited by FlierMate on 06 Feb 2022, 02:40; edited 6 times in total
st 16 Oct 2021, 13:20
What is the difference between topic/tag and search (which gives 23 repository results)?
|
FlierMate 16 Oct 2021, 13:45
st wrote: What is the difference between topic/tag and search (which gives 23 repository results)?
A topic/tag is a label applied voluntarily by the repo owner. Your search works differently and gives more accurate matches. However, repos could also be using keywords other than "hexdump", which makes it hard to tell how many hexdump utilities there are on GitHub. I see that most if not all hexdump repositories written in Assembly are not popular: most are not starred, and my "hexdump" FASM example has had only 3 git clones so far.
FlierMate 01 Feb 2022, 04:27
I succeeded in deriving a utf8count utility from my hexdump utility.
Compare the character count from my utility with that of an online character counter. The character count is broken down into 1-byte, 2-byte, 3-byte, and 4-byte characters, each shown in a different color. The total is printed in hexadecimal (you have to convert it to decimal manually with a calculator).
The logic is rather simple:
Code:
    cmp dl, 011110000b
    jae _4byte
    cmp dl, 011100000b
    jae _3byte
    cmp dl, 011000000b
    jae _2byte
    cmp dl, 001111111b
    jbe _1byte
    ;the others are continuation bytes, don't count
Bug reports and improvements are welcome!
Last edited by FlierMate on 06 Feb 2022, 02:44; edited 5 times in total
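The comparison chain above can be sketched in Python (used here instead of FASM so it can be run and tested directly). The function name `count_by_lead_byte` is hypothetical and is not taken from utf8count.asm. It classifies each byte by the same four thresholds and skips continuation bytes; like the posted logic, it assumes the input is well-formed UTF-8.

```python
def count_by_lead_byte(data):
    """Count characters by encoded length, using only each character's lead byte."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for b in data:
        if b >= 0b11110000:      # cmp dl, 011110000b / jae _4byte
            counts[4] += 1
        elif b >= 0b11100000:    # cmp dl, 011100000b / jae _3byte
            counts[3] += 1
        elif b >= 0b11000000:    # cmp dl, 011000000b / jae _2byte
            counts[2] += 1
        elif b <= 0b01111111:    # cmp dl, 001111111b / jbe _1byte
            counts[1] += 1
        # anything else is 10xxxxxx, a continuation byte: not counted
    return counts


if __name__ == "__main__":
    sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # 1-, 2-, 3- and 4-byte characters
    result = count_by_lead_byte(sample)
    print(result, "total:", sum(result.values()))
```

For well-formed input the total should equal the number of code points, which gives a quick cross-check against any other character counter.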
revolution 01 Feb 2022, 04:34
What happens for an invalid UTF-8 sequence?
e.g.
Code:
    db 0xff, 0xff, 0xff, 0xff, 0xff, 0xff ; 0xff is not valid in UTF-8
    db 0xc0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80 ; too many continuation bytes, and overlong encoding of null
FlierMate 01 Feb 2022, 04:43
revolution wrote: What happens for an invalid UTF-8 sequence?
With the code logic from my previous post, the result will be incorrect. This is a good question. The two sample UTF-8 text files do not have any header signature, so it is intriguing how one would check whether a file is binary or UTF-8. Perhaps @revolution can give some ideas?
revolution 01 Feb 2022, 04:59
It is common to scan for byte values 0x00-0x1f to detect a binary file. You might also want to add 0x7f.
How invalid UTF-8 sequences are handled depends upon the application. Some applications just render them as an empty box for each bad byte, or simply ignore them as if they don't exist. Overlong encodings have been used for web exploits to insert rogue JS into pages.
FlierMate 01 Feb 2022, 05:53
revolution wrote: It is common to scan for byte values 0x00-0x1f to detect a binary file. You might also want to add 0x7f.
This is valuable information that had never crossed my mind. Thank you so much for the suggestion. Although it makes the program more complicated, I think it would be fun to handle invalid UTF-8 sequences this way:
- Decrement the counter if the continuation bytes are longer or shorter than expected
- Calculate the percentage of wrong UTF-8 sequences (e.g. "Test1.txt is 78% a valid UTF-8 file")
About the overlong encodings, it is really eye-opening; digital crime is everywhere.
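One possible reading of the percentage idea, sketched in Python rather than FASM. Everything here is an assumption: the names `utf8_validity` and `valid_percentage` are hypothetical, and counting each rejected byte as one "wrong sequence" is just one way to define the ratio. The sequence length is taken from the lead byte as in the earlier comparison chain, and Python's built-in strict UTF-8 decoder is used to reject overlong encodings, missing continuation bytes, and bytes such as 0xff.

```python
def utf8_validity(data):
    """Return (valid, invalid): valid sequences versus bytes that start no valid sequence."""
    valid = invalid = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead >= 0xF0:
            n = 4
        elif lead >= 0xE0:
            n = 3
        elif lead >= 0xC0:
            n = 2
        elif lead <= 0x7F:
            n = 1
        else:
            n = 0  # stray continuation byte (10xxxxxx)
        chunk = data[i:i + n]
        try:
            # The strict decoder rejects overlong forms, bad continuations, 0xff, etc.
            ok = n > 0 and len(chunk) == n and len(chunk.decode("utf-8")) == 1
        except UnicodeDecodeError:
            ok = False
        if ok:
            valid += 1
            i += n
        else:
            invalid += 1
            i += 1  # skip one bad byte and try again from the next one
    return valid, invalid


def valid_percentage(data):
    """Share of valid sequences among everything counted, as a percentage."""
    valid, invalid = utf8_validity(data)
    total = valid + invalid
    return 100.0 * valid / total if total else 100.0


if __name__ == "__main__":
    print(utf8_validity(bytes([0xFF] * 6)))            # revolution's first example
    print(utf8_validity(bytes([0xC0] + [0x80] * 6)))   # revolution's second example
    print(valid_percentage(b"abc\xff"))
```

Skipping a single byte after a failure mirrors the "empty box for each bad byte" behavior mentioned earlier in the thread; other resynchronization policies would give different percentages.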
revolution 01 Feb 2022, 06:03
I omitted to say that 0x0d and 0x0a (and maybe 0x0c) are normal ASCII so perhaps shouldn't trigger a binary file detection.
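The heuristic from these two posts can be sketched as follows, in Python rather than FASM. The names `looks_binary` and `TEXT_CONTROL_BYTES` are hypothetical. Only the three exceptions named above (0x0a, 0x0c, 0x0d) are allowed; that means a tab (0x09) is still flagged in this sketch, so a real tool would probably want to extend the set.

```python
# Control bytes that the thread says are normal in text files.
TEXT_CONTROL_BYTES = {0x0A, 0x0C, 0x0D}  # line feed, form feed, carriage return


def looks_binary(data):
    """Return True if any byte is a control code (0x00-0x1f or 0x7f) outside the allowed set."""
    for b in data:
        if (b <= 0x1F or b == 0x7F) and b not in TEXT_CONTROL_BYTES:
            return True
    return False


if __name__ == "__main__":
    print(looks_binary(b"hello\r\nworld\n"))
    print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))
```

Bytes of 0x80 and above are deliberately not treated as binary here, since they occur in every multi-byte UTF-8 character.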
|
FlierMate1 03 Jul 2022, 07:14
Hi folks, I have updated my utf8count.asm to v0.03. It now uses the 'reset' ANSI escape code instead of the black color code, so that it is compatible with terminal emulators that have a white background (the previous version only worked with a black background).
However, the colors have no effect in WSL, as tested on my PC.
FlierMateI 12 Sep 2022, 06:54
A nice DIY idea for those who have downloaded my hexdump v0.01: you can use coloring for different groups of hexadecimal values.
In the example in my screenshot attached below, null values are highlighted in blue, making them easier to distinguish from the rest. I am not attaching the files, but you can use ANSI escape codes for the coloring.
MatQuasar 25 May 2024, 07:00
I have changed the set of ANSI escape codes to use semicolons instead of colons, so now it works in WSL too!
v0.04 is the latest minor update. If you don't want to download the new file, you can update your existing utf8count.asm by modifying the last few lines:
Code:
colorblue db 27,'[48;5;33m' ; ANSI escape code
lenblue = $ - colorblue
colorgreen db 27,'[48;5;46m' ; ANSI escape code
lengreen = $ - colorgreen
colororange db 27,'[48;5;196m' ; ANSI escape code
lenorange = $ - colororange
coloryellow db 27,'[48;5;172m' ; ANSI escape code
lenyellow = $ - coloryellow
Use semicolons (not colons) as the separators inside "27,'[48....m'".
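For anyone who wants to try the sequences outside of FASM first, here is a small Python sketch. The helper names `bg256` and `highlight` are hypothetical. It builds the same semicolon-separated 256-color background sequence (ESC, then `[48;5;N` and `m`) and ends the colored text with the standard reset sequence (ESC `[0m`), which I assume is the 'reset' code mentioned for v0.03.

```python
ESC = "\x1b"          # byte 27, the same value as `db 27` in the FASM data
RESET = ESC + "[0m"   # SGR reset: back to the terminal's default colors


def bg256(index):
    """Build the escape sequence selecting background color `index` (0-255)."""
    if not 0 <= index <= 255:
        raise ValueError("256-color index must be in 0..255")
    return f"{ESC}[48;5;{index}m"


def highlight(text, index):
    """Wrap `text` in a background color and reset afterwards."""
    return bg256(index) + text + RESET


if __name__ == "__main__":
    # Color index 33 is the one used for `colorblue` above.
    print(highlight(" 00 ", 33), "highlighted byte")
```

Resetting instead of switching to an explicit black background is what keeps the output readable on both light and dark terminals.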
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.