flat assembler
Message board for the users of flat assembler.

Decommenter / Parser issues: where did those NULs come from?




fasmnewbie 16 May 2014, 00:05
Hi. I wrote a small DOS program to strip the comments from a source file. The idea is:

1. Read the source file.
2. Copy the content to a buffer.
3. Strip the comments while copying the rest to a new buffer.
4. Write the new, compacted buffer to a new file.
5. The new file should be ready for immediate compilation.

The code
Code:
format mz
entry start:main
SIZE = 14000  ;size of test file "m16.asm" in bytes

segment info
fn db "newf.asm",0      ;new file with compacted code
opf db "m16.asm",0      ;file to be strip off comments
buff db SIZE dup(?)     ;buffer read from m16.asm
pack db SIZE dup(?)     ;buffer of compacted file for newf.asm

segment start
main:   push info
        pop ds

        push buff       ;Copy the content to this buffer
        push SIZE       ;in this size
        push opf        ;From this file
        call FOPENR     ;Open for reading

        call START      ;Start stripping off comments

        push fn         ;newf is the new (target) file
        call FNEW       ;Create new file

        push pack       ;The content of stripped source
        push SIZE       ;The same size
        push fn         ;the new file to be written
        call FOPENW

        mov ah,0        ;Pause
        int 16h
        mov ah,4ch      ;Exit
        int 21h

START:  ;-----------------------
        xor si,si
next:   mov al,[ds:buff+si]
        cmp al,';'
            je pck
ok:     mov byte[ds:pack+si],al
        cmp si,SIZE
            je done
        inc si
        jmp next
pck:    inc si
        mov al,[ds:buff+si]
        cmp al,0dh
            je ok
        cmp al,0ah
            je ok
        jmp pck
done:   ret

FNEW:   ;-----------------------
        push bp
        mov bp,sp
        mov dx,[bp+4]
        mov ah,3ch
        mov cl,0
        int 21h
        pop bp
        ret

FOPENR: ;-----------------------
        push bp
        mov bp,sp
        mov dx,[bp+4]
        mov al,0
        mov ah,3dh                            
        int 21h                                      

        mov bx,ax                        
        mov cx,[bp+6]
        mov dx,[bp+8]
        mov ah,3fh                       
        int 21h                   

        mov ah,3eh ;close handle
        int 21h
        pop bp
        ret

FOPENW: ;-----------------------
        push bp
        mov bp,sp
        mov dx,[bp+4]
        mov al,2                       
        mov ah,3dh                            
        int 21h                                      
        mov bx,ax                        
        mov cx,[bp+6]
        mov dx,[bp+8]
        mov ah,40h                     
        int 21h                       

        mov ah,3eh ;close handle
        int 21h
        pop bp
        ret    


This code works. The problems are:

1. FASMW can open the newly created file, but the content shows up blank, so I cannot compile it either from the command line or from FASMW. If compiled, FASMW produces a .bin file of size 0. The size on disk is 13KB.

2. The newly created file (with the comments stripped) is perfectly readable in DOS's Edit, DOS's type, Notepad and WordPad. No problem there.

3. The result is the same whether I use .asm or .txt as the new file's extension.


I used this test file (m16.asm), which contains both side comments and line comments, to test for robustness. A new file should be created once you've compiled and run the program.

Thanks for the help and clarification.


Last edited by fasmnewbie on 17 May 2014, 05:00; edited 1 time in total



fasmnewbie 16 May 2014, 00:17
I don't know if this is the correct section to post in, but I think it is related to FASMW's formatting?


revolution 16 May 2014, 00:19
For those of us that can't run DOS how about you post the compacted file.

But I suspect your problem is that you use SI as both the input and output pointer in START. You will need a separate pointer for your PACK buffer and a separate SIZE value to indicate the new size.

Moving to DOS section


Last edited by revolution on 16 May 2014, 00:29; edited 1 time in total



fasmnewbie 16 May 2014, 00:26
revolution wrote:
For those of us that can't run DOS how about you post the compacted file.

But I suspect your problem is that you use SI as both the input and output pointer in START. You will need a separate pointer for your PACK buffer and a separate SIZE value to indicate the new size.

Moving to DOS section
I think there is no problem with the compacted file. I read it over and over: the comments are gone and the code stays. This is the result; it is readable as plain text, but not in FASMW.


Last edited by fasmnewbie on 17 May 2014, 05:01; edited 1 time in total


revolution 16 May 2014, 00:28
Look at it with a hex editor.



fasmnewbie 16 May 2014, 00:32
Yeah, I am aware of the SI mismatch. A little confusion when I modified this code for posting purposes.


revolution 16 May 2014, 00:35
So your "compacted" output NEWF.asm is 13.67kB and the input m16.asm was only 13.03kB. Even without seeing the files I can tell there is already something wrong with the output file.

What did you see in the hex editor?



fasmnewbie 16 May 2014, 00:37
revolution wrote:
Look at it with a hex editor.
I can't. No exe yet.



fasmnewbie 16 May 2014, 00:45
revolution wrote:
So your "compacted" output NEWF.asm is 13.67kB and the input m16.asm was only 13.03kB. Even without seeing the files I can tell there is already something wrong with the output file.

What did you see in the hex editor?
I declared the file size to be bigger. It should be 14KB, more or less. There could be other non-printable characters hidden in there. I'm not sure myself.

I don't have a hex editor. I will download one shortly.

I was about to "compact" it further by eliminating blank lines, but this problem is holding me back.


revolution 16 May 2014, 01:05
Do you realise that you are not removing the comments, but rather are just skipping them and leaving whatever bytes are in the output buffer in their place? You are advancing SI past the comments and just leaving a gap of unwritten bytes in the output.

I suggest you use DI as an output pointer and only advance it when you are putting a new byte into the output. But still keep SI as an input pointer.
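
A minimal sketch of that suggestion, not code from the thread: SI walks the input, DI advances only when a byte is actually stored, and the distance DI has moved gives the new size. The segment names dseg and cseg, the strip routine and the built-in sample text are all assumptions made up for illustration; the result goes to standard output (handle 1) so it can be checked without touching any files.

```asm
format mz
entry cseg:main

segment dseg
src     db 'mov ax,1 ;load one',13,10   ; hypothetical sample input
        db ';whole-line comment',13,10
        db 'ret',13,10
SRCLEN  = $ - src
dst     db SRCLEN dup(?)                ; output is never longer than input

segment cseg
main:   mov ax,dseg
        mov ds,ax
        mov es,ax               ; stosb stores through ES:DI
        cld                     ; make lodsb/stosb count upwards
        mov si,src              ; input pointer
        mov di,dst              ; output pointer
        mov cx,SRCLEN           ; input bytes still to process
        call strip
        mov cx,di
        sub cx,dst              ; CX = bytes actually produced
        mov dx,dst
        mov bx,1                ; handle 1 = standard output
        mov ah,40h              ; DOS write
        int 21h
        mov ax,4c00h            ; exit
        int 21h

; strip: DS:SI = input, ES:DI = output, CX = input length.
; Returns with DI just past the last byte written.
strip:  jcxz .done
.next:  lodsb                   ; fetch next input byte
        cmp al,';'
        jne .store
.skip:  dec cx                  ; the byte just fetched is dropped
        jz .done
        lodsb
        cmp al,13               ; CR ends the comment and is kept
        je .store
        cmp al,10               ; so does LF
        je .store
        jmp .skip
.store: stosb                   ; DI moves only when a byte is kept
        dec cx
        jnz .next
.done:  ret
```

Because DI moves only on a store, the output has no gaps, and DI minus the start of dst is the byte count to hand to the write call in place of the original SIZE. Like the code in the thread, this sketch treats every semicolon as the start of a comment.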


revolution 16 May 2014, 01:07
Also what happens with this line?
Code:
text: db 'Here is some text; this is NOT a comment.',13,10,0 ;this is a comment    



fasmnewbie 16 May 2014, 01:22
revolution

This is the .txt version (output) from the same code. See, no more comments there, only blank lines where the comments used to be. And that should be readable in FASMW regardless, because it is pure ASCII. But I can't see anything.

I'll post my corrections later. Thanks for spotting the bugs. Catch up with you later.


Last edited by fasmnewbie on 17 May 2014, 05:01; edited 1 time in total


revolution 16 May 2014, 01:28
Here is what I see.


[Attachment: NEWF.TXT.png, screenshot, 16.68 KB]





JohnFound 16 May 2014, 04:49
fasmnewbie, NUL characters are invalid in a text file. Many programs (including FASM) will treat a NUL as an end-of-file marker. The fact that some editors ignore it proves nothing. It is still invalid. As revolution already said, you have to fix your code.

Notice also the example with the quoted semicolon.


revolution 16 May 2014, 05:26
It's worse than just NULs, it is going to be whatever garbage is in the buffer from the last usage. In this case it appears that NTVDM clears the program memory to all zeros before execution, but not all DOSes will do that.
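
One way to keep stale buffer contents out of the output, sketched here under assumptions rather than taken from the thread: remember how many bytes DOS actually read (function 3Fh returns the count in AX) and pass that count, not the buffer capacity, to the write call (function 40h). The sketch only copies m16.asm to newf.asm; BUFSZ, inlen, dseg, cseg and the fail label are illustrative names, and a stripping pass would shrink the count before the write.

```asm
format mz
entry cseg:main

BUFSZ = 16384                   ; assumed capacity, larger than the test file

segment dseg
inname  db 'm16.asm',0
outname db 'newf.asm',0
inlen   dw 0                    ; number of valid bytes in buf
buf     db BUFSZ dup(?)

segment cseg
main:   mov ax,dseg
        mov ds,ax

        mov ax,3d00h            ; open existing file, read-only
        mov dx,inname
        int 21h
        jc fail                 ; carry set = DOS reported an error
        mov bx,ax               ; BX = file handle

        mov ah,3fh              ; read at most BUFSZ bytes
        mov cx,BUFSZ
        mov dx,buf
        int 21h
        jc fail
        mov [inlen],ax          ; AX = bytes actually read

        mov ah,3eh              ; close the input handle
        int 21h

        ; a stripping pass would run here and store the new length in inlen

        mov ah,3ch              ; create or truncate the output file
        xor cx,cx               ; normal attributes (the whole of CX is used)
        mov dx,outname
        int 21h
        jc fail
        mov bx,ax               ; 3Ch already returns an open handle

        mov ah,40h              ; write only the valid bytes
        mov cx,[inlen]
        mov dx,buf
        int 21h
        jc fail

        mov ah,3eh              ; close the output handle
        int 21h
        mov ax,4c00h            ; exit, return code 0
        int 21h

fail:   mov ax,4c01h            ; exit, return code 1
        int 21h
```

With the count tracked this way, whatever happens to sit in the unused tail of the buffer never reaches the file, regardless of whether the DOS in use zeroes program memory.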



fasmnewbie 16 May 2014, 06:28
@revolution, I don't know what those NULs mean. Are they dangerous? I have no idea. I came to know this thing we call ASCII just last night, somewhere between half awake and half asleep. I am a self-learner.

@John, OK. I am completely clueless about how these characters are interpreted. It is all very new to me. But the code is fixed already.

Code:
format mz
entry start:main
SIZE = 14000  ;size of test file "m16.asm" in bytes

segment info
fn db "newf.asm",0      ;new file with compacted code
opf db "m16.asm",0      ;file to be strip off comments
buff db SIZE dup(?)     ;buffer read from m16.asm
pack db SIZE dup(?)     ;buffer of compacted file for newf.asm

segment start
main:   push info
        pop ds

        push buff       ;Copy the content to this buffer
        push SIZE       ;in this size
        push opf        ;From this file
        call FOPENR     ;Open for reading

        call START      ;Start stripping off comments

        push fn         ;newf is the new (target) file
        call FNEW       ;Create new file

        push pack       ;The content of stripped source
        push SIZE       ;The same size
        push fn         ;the new file to be written
        call FOPENW

        mov ah,0        ;Pause
        int 16h
        mov ah,4ch      ;Exit
        int 21h

START:  ;-----------------------
        xor si,si
        xor di,di
next:   mov al,[ds:buff+si]
        cmp al,';'
            je pck
ok:     mov byte[ds:pack+di],al
        cmp si,SIZE
            je done
        inc si
        inc di
        jmp next
pck:    inc si
        mov al,[ds:buff+si]
        cmp al,0dh
            je ok
        cmp al,0ah
            je ok
        jmp pck
done:   ret

FNEW:   ;-----------------------
        push bp
        mov bp,sp
        mov dx,[bp+4]
        mov ah,3ch
        mov cl,0
        int 21h
        pop bp
        ret

FOPENR: ;-----------------------
        push bp
        mov bp,sp
        mov dx,[bp+4]
        mov al,0
        mov ah,3dh                            
        int 21h                                      

        mov bx,ax                        
        mov cx,[bp+6]
        mov dx,[bp+8]
        mov ah,3fh                       
        int 21h                   

        mov ah,3eh ;close handle
        int 21h
        pop bp
        ret

FOPENW: ;-----------------------
        push bp
        mov bp,sp
        mov dx,[bp+4]
        mov al,2                       
        mov ah,3dh                            
        int 21h                                      
        mov bx,ax                        
        mov cx,[bp+6]
        mov dx,[bp+8]
        mov ah,40h                     
        int 21h                       

        mov ah,3eh ;close handle
        int 21h
        pop bp
        ret    


It should now be able to strip all comments from a source file.



fasmnewbie 16 May 2014, 06:39
revolution wrote:
Also what happens with this line?
Code:
text: db 'Here is some text; this is NOT a comment.',13,10,0 ;this is a comment    


Any idea how to fix it? This is my first time engaging this kind of problem and using int 21h's file service. Show me some macro skill :D



fasmnewbie 16 May 2014, 06:48
revolution wrote:
It's worse than just NULs, it is going to be whatever garbage is in the buffer from the last usage. In this case it appears that NTVDM clears the program memory to all zeros before execution, but not all DOSes will do that.
One step at a time, revolution. This is not even completed yet. hihihi :lol:
I am just testing some features of int 21h because windoze won't allow me access to some BIOS disk service - for obvious reasons of course :D


revolution 16 May 2014, 06:50
fasmnewbie wrote:
Any idea how to fix it? This is my first time engaging this kind of problem and using int 21h's file service. Show me some macro skill :D
Yes I know how to fix it. Use a proper line parser. But this is not a task suited to macros.
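
As a rough illustration of what such a parser has to track (a sketch under assumptions, not revolution's code): a semicolon starts a comment only outside a quoted string, so the loop remembers the opening quote character until it meets the same character again. It assumes fasm-style strings delimited by ' or ", treats the end of a line as closing any open string, and uses made-up names (dseg, cseg, strip) plus a built-in test line modelled on the one quoted earlier in the thread.

```asm
format mz
entry cseg:main

segment dseg
; hypothetical test line: one semicolon inside a string, one real comment
src     db "text: db 'a; b',13,10,0 ;real comment",13,10
SRCLEN  = $ - src
dst     db SRCLEN dup(?)

segment cseg
main:   mov ax,dseg
        mov ds,ax
        mov es,ax
        cld
        mov si,src
        mov di,dst
        mov cx,SRCLEN
        call strip
        mov cx,di
        sub cx,dst              ; CX = bytes kept
        mov dx,dst
        mov bx,1                ; standard output
        mov ah,40h
        int 21h
        mov ax,4c00h
        int 21h

; strip: DS:SI = input, ES:DI = output, CX = input length.
; AH = 0 outside a string, otherwise the quote character that opened it.
; BL = 0 normally, 1 while a comment is being discarded.
strip:  xor ah,ah
        xor bl,bl
        jcxz .done
.next:  lodsb
        cmp al,13
        je .eol
        cmp al,10
        je .eol
        test bl,bl
        jnz .drop               ; inside a comment: discard
        test ah,ah
        jnz .instr              ; inside a string: semicolons are text
        cmp al,';'
        je .comment
        cmp al,"'"
        je .open
        cmp al,'"'
        jne .store
.open:  mov ah,al               ; remember which quote opened the string
        jmp .store
.instr: cmp al,ah
        jne .store
        xor ah,ah               ; matching quote closes the string
        jmp .store
.comment: mov bl,1
        jmp .drop
.eol:   xor ah,ah               ; a line end resets both states
        xor bl,bl
.store: stosb
.drop:  loop .next
.done:  ret
```

The doubled-quote escape inside fasm strings needs no special case here, since the first quote closes the string and the second immediately reopens it. Keeping the state in two registers is only one possible design; a fuller line parser would also have to decide what to do with the spaces left in front of a removed comment.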


revolution 16 May 2014, 06:52
fasmnewbie wrote:
I am just testing some features of int 21h because windoze won't allow me access to some BIOS disk service - for obvious reasons of course
If you can access things through NTVDM (the DOS machine) then you can also do it with Windows API calls. NTVDM uses Windows to do its stuff.


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
