flat assembler
Message board for the users of flat assembler.

Opening a file and going through it byte by byte

mmajewski



Joined: 15 Apr 2014
Posts: 10
mmajewski 12 May 2014, 21:09
I am writing a program, and part of it has to go through a file.
I assume the file I open is structured like this:
Code:
RandomWord1 RandomWord2 RandomWord3    

(Random words separated by a space.)
I do not know the size of the file, but I assume no English word is longer than 30 letters, so I declared:
Code:
ToFind0 db 30 dup(0)
ToFind1 db 30 dup(0)
ToFind2 db 30 dup(0)    

I want my program to store 'RandomWord1$' under ToFind0, 'RandomWord2$' under ToFind1, and so on.

The name of my file is declared as FileNameToFind.
So I wrote this code:
Code:
mov dx, FileNameToFind
mov al, 2              ;access mode - read and write 
mov ah, 03Dh          ;function 3Dh -open a file 
int 21h
jc ErrorOpening  ;if carry flag is set jump to ErrorOpening    


I know that this code opens the file. I want to iterate through the file byte by byte and save the words to ToFind0 etc. without knowing the size of the file. I searched the web but did not find a good answer. I would appreciate any links, tips, examples, etc.
freecrac



Joined: 19 Oct 2011
Posts: 117
Location: Germany Hamburg
freecrac 13 May 2014, 07:58
Hello.
With the read function it is possible to load 64 KB bytes into our buffer and with a following read we can load the next 64 KB of the file and so on. In the AX-register we get the number of bytes actually read.

RBIL->inter61b.zip->INTERRUP.F
Quote:

--------D-213F-------------------------------
INT 21 - DOS 2+ - "READ" - READ FROM FILE OR DEVICE
        AH = 3Fh
        BX = file handle
        CX = number of bytes to read
        DS:DX -> buffer for data
Return: CF clear if successful
            AX = number of bytes actually read (0 if at EOF before call)
        CF set on error
            AX = error code (05h,06h) (see #01680 at AH=59h/BX=0000h)
Notes:  data is read beginning at current file position, and the file position
          is updated after a successful read
        the returned AX may be smaller than the request in CX if a partial
          read occurred
        if reading from CON, read stops at first CR
        under the FlashTek X-32 DOS extender, the pointer is in DS:EDX
BUG:    Novell NETX.EXE v3.26 and 3.31 do not set CF if the read fails due to
          a record lock (see AH=5Ch), though it does return AX=0005h; this
          has been documented by Novell
SeeAlso: AH=27h,AH=40h,AH=93h,INT 2F/AX=1108h,INT 2F/AX=1229h

But it is also possible to get the length of the file before loading.
Quote:

--------D-2142-------------------------------
INT 21 - DOS 2+ - "LSEEK" - SET CURRENT FILE POSITION
        AH = 42h
        AL = origin of move
            00h start of file
            01h current file position
            02h end of file
        BX = file handle
        CX:DX = (signed) offset from origin of new file position
Return: CF clear if successful
            DX:AX = new file position in bytes from start of file
        CF set on error
            AX = error code (01h,06h) (see #01680 at AH=59h/BX=0000h)
Notes:  for origins 01h and 02h, the pointer may be positioned before the
          start of the file; no error is returned in that case (except under
          Windows NT), but subsequent attempts at I/O will produce errors
        if the new position is beyond the current end of file, the file will
          be extended by the next write (see AH=40h); for FAT32 drives, the
          file must have been opened with AX=6C00h with the "extended size"
          flag in order to expand the file beyond 2GB
BUG:    using this method to grow a file from zero bytes to a very large size
          can corrupt the FAT in some versions of DOS; the file should first
          be grown from zero to one byte and then to the desired large size
SeeAlso: AH=24h,INT 2F/AX=1228h

Code:
mov    ah, 42h
mov    bx, [HANDLE]
mov    al, 2                  ; from the end of file
mov    cx, 0                  ; high
mov    dx, 0                  ; low
int  21h
mov    [LENGTH_LOW], ax
mov    [LENGTH_HIGH], dx

mov    ah, 42h                ; set the file pointer to the start position
mov    bx, [HANDLE]
mov    al, 0                  ; start of file
mov    cx, 0                  ; high
mov    dx, 0                  ; low
int  21h
    

Dirk
mmajewski



Joined: 15 Apr 2014
Posts: 10
mmajewski 13 May 2014, 08:54
I have written this code (using your code example):
Code:
mov dx, FileNameToFind
mov al, 2              ;access mode - read and write 
mov ah, 03Dh          ;function 3Dh -open a file 
int 21h
jc ErrorOpening  ;if carry flag is set jump to ErrorOpening
mov [FileHandle], ax ;saving file handle

mov ah, 42h
mov bx, [FileHandle]
mov al, 2 ;from the end of file
mov cx, 0 ;high
mov dx, 0 ;low
int 21h

mov [LENGTH_LOW], ax
mov [LENGTH_HIGH], dx

mov ah, 42h ;set the file pointer
mov bx, [FileHandle]
mov al, 0
mov cx, 0
mov dx, 0
int 21h

FileHandle dw 0 
LENGTH_HIGH dw 0
LENGTH_LOW dw 0
    


I do not quite understand this code. Why are you doing this
Code:
mov ah, 42h ;set the file pointer
mov bx, [FileHandle]
mov al, 0
mov cx, 0
mov dx, 0
int 21h    

after the size of the file is already known?

Should the call that reads the file look like this?
Code:
mov ah, 3Fh 
mov bx, [FileHandle]
mov cx, [LENGTH_HIGH]
mov dx, buff
int 21h    

It is not working, and I do not know why. I still have no clue how to access the data from the file, for example how to print it.
Could you elaborate on this, please?
freecrac



Joined: 19 Oct 2011
Posts: 117
Location: Germany Hamburg
freecrac 13 May 2014, 10:39
After we get the length of the file by moving the file pointer to the end of the file, we have to move the file pointer back to the start, because the read function reads beginning at the current file position.

The number of bytes to read in one call can be at most 0FFFFh, and the buffer at DS:DX must be at least as large as the count in CX. If the file is larger, we have to call the read function again. Each read also moves the file pointer forward, so the next read continues at the current file position.

"mov cx, [LENGTH_HIGH]" is wrong because it contains only the high word of the file length, and one call of the read function can read at most 0FFFFh bytes.
For example, if the file size is greater than 0FFFFh, the high word of the file length contains a value greater than 0; otherwise it contains 0. To read, we can put 0FFFFh in CX to load the first part of the file, and then read the next parts by calling the read function again with CX=0FFFFh. The read function returns in AX the number of bytes actually read, or 0 if the file pointer was already at the end of the file before the call.
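
A minimal sketch of that read loop in FASM syntax, written as a small .COM program. The file name TEST.TXT, the 4096-byte chunk size and the labels Handle, TotalLow, TotalHigh and Buffer are assumptions made for illustration; they do not come from the code above. CX holds the size of the buffer rather than the file length, and the loop ends when the read returns AX = 0.

```asm
org 100h
use16

CHUNK = 4096                            ; assumed buffer size, must not exceed 0FFFFh

        mov     ax, 3D00h               ; open an existing file, read-only
        mov     dx, FileName
        int     21h
        jc      exit                    ; CF set: the open failed
        mov     [Handle], ax

read_chunk:
        mov     ah, 3Fh                 ; read from file
        mov     bx, [Handle]
        mov     cx, CHUNK               ; ask for at most one buffer full
        mov     dx, Buffer
        int     21h
        jc      close_file              ; CF set: read error
        test    ax, ax                  ; AX = number of bytes actually read
        jz      close_file              ; 0 means end of file
        add     [TotalLow], ax          ; keep a 32-bit running total
        adc     word [TotalHigh], 0
        ; Buffer[0 .. AX-1] holds valid data here; process it before looping
        jmp     read_chunk

close_file:
        mov     ah, 3Eh                 ; close the file
        mov     bx, [Handle]
        int     21h
exit:
        mov     ax, 4C00h               ; terminate the program
        int     21h

FileName   db 'TEST.TXT', 0             ; hypothetical file name
Handle     dw 0
TotalLow   dw 0
TotalHigh  dw 0
Buffer     rb CHUNK
```

With this pattern the LSEEK call is optional: the file length is not needed to know when to stop, because the read itself reports end of file.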

Accessing the bytes of the file in the buffer is the same as accessing any other data in RAM. To print the (ASCII) bytes to the screen we can use function 0Eh of int 10h.

RBIL->inter61a.zip->INTERRUP.A
Quote:
--------b-100E--CXABCD-----------------------
INT 10 - V20-XT-BIOS - TELETYPE OUTPUT WITH ATTRIBUTE
        AH = 0Eh
        CX = ABCDh
        BP = ABCDh
        AL = character to write
        BH = page number
        BL = foreground color (text modes as well as graphics modes)
Return: nothing
Program: V20-XT-BIOS is a ROM BIOS replacement with extensions by Peter
          Koehlmann / c't magazine
Desc:   display a character on the screen, advancing the cursor and scrolling
          the screen as necessary
Notes:  characters 07h (BEL), 08h (BS), 0Ah (LF), and 0Dh (CR) are interpreted
          and do the expected things
SeeAlso: INT 15/AH=84h"V20-XT-BIOS"
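
A short sketch of printing a buffer with the teletype function, in FASM syntax. Note that the entry quoted above is the V20-XT-BIOS variant; the standard BIOS call needs only AH = 0Eh, AL = character and BH = page number (BL matters only in graphics modes). The labels Message and MessageLen are assumptions for illustration and stand in for a buffer that was filled by the read function.

```asm
org 100h
use16

        cld                             ; make LODSB move forward
        mov     si, Message             ; SI -> first byte to print
        mov     cx, MessageLen          ; CX = number of bytes to print
print_next:
        lodsb                           ; AL = byte at DS:SI, then SI = SI + 1
        mov     ah, 0Eh                 ; BIOS teletype output
        mov     bh, 0                   ; display page 0
        int     10h
        loop    print_next

        mov     ax, 4C00h               ; terminate the program
        int     21h

Message    db 'Bytes from a file buffer', 13, 10
MessageLen = $ - Message
```

After a successful read, the same loop can be entered with SI pointing at the read buffer and CX set to the byte count returned in AX.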

But if a byte is not a printable ASCII character, we can print its numeric value instead. For decimal output we split the value into digits by dividing by 100 and then by 10, starting with the highest digit, and convert each digit to ASCII by adding "0", printing from left to right. For hexadecimal output we separate the two nibbles of the byte, again starting with the high one. For each nibble we check whether it is greater than 9; if it is, we add 7 to reach the letters A-F, and in either case we add "0" to convert it to ASCII.
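
The hexadecimal case can be sketched as follows, in FASM syntax. The routine names PrintHexByte and PrintNibble and the sample value 0A7h are assumptions for illustration. The shift count goes through CL so that the code also assembles to instructions an 8086 accepts.

```asm
org 100h
use16

        mov     al, 0A7h                ; sample byte, displayed as "A7"
        call    PrintHexByte
        mov     ax, 4C00h               ; terminate the program
        int     21h

; Prints AL as two hexadecimal digits. Destroys AX, BH and CL.
PrintHexByte:
        push    ax                      ; keep the byte for the low nibble
        mov     cl, 4
        shr     al, cl                  ; high nibble first
        call    PrintNibble
        pop     ax
        and     al, 0Fh                 ; then the low nibble
        ; fall through: the RET below returns to the caller of PrintHexByte
PrintNibble:
        cmp     al, 9
        jbe     .digit
        add     al, 7                   ; 10..15 must land on 'A'..'F', not ':'..'?'
.digit:
        add     al, '0'                 ; convert the value to its ASCII character
        mov     ah, 0Eh                 ; BIOS teletype output
        mov     bh, 0
        int     10h
        ret
```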

Dirk
mmajewski



Joined: 15 Apr 2014
Posts: 10
mmajewski 15 May 2014, 19:00
Thank you! It helped me a lot!
henry.37



Joined: 28 Mar 2015
Posts: 5
henry.37 28 Mar 2015, 16:03
Hello, I am having a hard time trying to print a file smaller than 128 kB. I am an assembly noob, so it's hard to understand everything. How would you do it?

baldr



Joined: 19 Mar 2008
Posts: 1651
baldr 28 Mar 2015, 18:44
henry.37,

Can you be more specific about the problem you're having trouble with? Open the file with int21/3D, read a chunk of it into a buffer with int21/3F, output the chunk you read with int21/40, and repeat until EOF. Don't forget to close the file with int21/3E afterwards (just to behave nicely). Error checking is a good thing too.
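
Those steps can be sketched in FASM syntax as a small .COM program. The file name TEST.TXT, the 512-byte chunk size and the labels Handle and Buffer are assumptions for illustration. Output goes to handle 1, which DOS predefines as standard output.

```asm
org 100h
use16

CHUNK = 512                             ; assumed chunk size

        mov     ax, 3D00h               ; int21/3D: open, read-only
        mov     dx, FileName
        int     21h
        jc      exit                    ; CF set: the open failed
        mov     [Handle], ax

next_chunk:
        mov     ah, 3Fh                 ; int21/3F: read a chunk
        mov     bx, [Handle]
        mov     cx, CHUNK
        mov     dx, Buffer
        int     21h
        jc      close_file              ; read error
        test    ax, ax
        jz      close_file              ; nothing read: end of file

        mov     cx, ax                  ; write exactly the bytes just read
        mov     ah, 40h                 ; int21/40: write to file or device
        mov     bx, 1                   ; handle 1 = standard output
        mov     dx, Buffer
        int     21h
        jnc     next_chunk              ; repeat unless the write failed

close_file:
        mov     ah, 3Eh                 ; int21/3E: close the file
        mov     bx, [Handle]
        int     21h
exit:
        mov     ax, 4C00h               ; terminate the program
        int     21h

FileName db 'TEST.TXT', 0               ; hypothetical file name
Handle   dw 0
Buffer   rb CHUNK
```

Because int21/40 takes an explicit length in CX, the buffer needs no '$' terminator, and a '$' inside the file does not cut the output short as it would with function 09h.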
henry.37



Joined: 28 Mar 2015
Posts: 5
henry.37 28 Mar 2015, 20:24
I've opened the file, but here comes the problem: I am confused about splitting the file. [LENGTH_LOW] / [LENGTH_HIGH] should be the size of the buffer, so I should create a buffer of size [LENGTH_LOW]/[LENGTH_HIGH], read the file into it, and then simply print it. Opening and closing work fine, I checked. Sorry if the code is terrible, I'm not good at this.
This is my procedure:

read_print_FILE PROC
mov ax,4200h ;read
mov bx,filehandle ;file handle


mov al,0
mov cx,1 ;file pointer to cx
mov dx,1 ;file pointer to dx
int 21h ;execute
jc ERROR

mov [LENGTH_LOW], ax ;size that was read
mov [LENGTH_HIGH], dx


mov cx,LENGTH_LOW ; size to print lower
mov dx,offset buff
int 21h
vypis buff ;macro to print

mov cx,LENGTH_HIGH ; size to print higher
mov dx,offset buff
int 21h
vypis buff ;macro to print

mov ah, 42h ; set the file pointer to the start position
mov bx, filehandle
mov al, 0 ; start of file
mov cx, 0 ; high
mov dx, 0 ; low
int 21h
jc ERROR

; je N_chybaC
jmp N_koniecPC
ERROR: vypis chybstC ;print "problem handling the file"

N_koniecPC: ret
ENDP

;macro to print
vypis MACRO tx
xor ax,ax
mov ah,9 ;function 9 prints the whole string up to the dollar sign, not just one character
mov dx,offset tx ;DX must hold the offset of the text to print
int 21h
ENDM

;aand
bsize EQU 32768

buff DB bsize dup ('$')
filehandle DW 0
AsmGuru62



Joined: 28 Jan 2004
Posts: 1737
Location: Toronto, Canada
AsmGuru62 29 Mar 2015, 01:57
Is this what you want?
Code:
    ; ---------------------------------------------------------------------------
    ; FILE: DosType.Asm
    ; DATE: March 28, 2015
    ; ---------------------------------------------------------------------------
    ;
    ; COM file standard header
    ;
    org 100h
    use16
    ;
    ; MS-DOS Program Entry Point
    ;
    mov       dx, FileName              ; the procedure expects the file name in DS:DX
    call      TeletypeTheFile
    ;
    ; Quit to MS-DOS
    ;
    int       20h
    ;
    ; Procedure to load file by pieces and print it on console
    ;
TeletypeTheFile:
    ; --------------------------------------------------------
    ; INPUT:
    ;   DS:DX = file name
    ; --------------------------------------------------------
    ;
    ; Open file
    ;
    mov       ax, 3D00h
    int       21h
    jc        .no_file
    ;
    ; File opened - store handle into variable
    ;
    mov       [HFile], ax
    ;
    ; Read 511 bytes into 'TextBuffer'
    ;
.read_more:
    mov       bx, [HFile]
    mov       cx, 511
    mov       dx, TextBuffer
    mov       ah, 3Fh
    int       21h
    jc        .read_error
    ;
    ; Check if anything is loaded from file (# of loaded bytes is in AX)
    ;
    test      ax, ax
    jz        .no_more_bytes
    ;
    ; Terminate loaded text with '$'
    ;
    mov       si, dx
    add       si, ax
    mov       byte [si], '$'
    ;
    ; Print piece to console and continue loading until no bytes left
    ;
    mov       ah, 9
    int       21h
    jmp       .read_more

.no_more_bytes:
    ;
    ; Close the file
    ;
    mov       bx, [HFile]
    mov       ah, 3Eh
    int       21h
    ret

.read_error:
    mov       ah, 9
    mov       dx, CantRead
    int       21h
    jmp       .no_more_bytes            ; close the file instead of falling into .no_file

.no_file:
    mov       ah, 9
    mov       dx, NoSuchFile
    int       21h
    ret
    ;
    ; Data section
    ;
    HFile       dw 0
    NoSuchFile  db 0Dh,0Ah,'File Not Found.',0Dh,0Ah,'$'
    CantRead    db 0Dh,0Ah,'Cannot Read This File.',0Dh,0Ah,'$'
    FileName    db 'C:\Tools\FASM\SOURCE\IDE\FASMW\FASMW.ASM',0  ; <-- put here your file name
    TextBuffer  rb 512
    
henry.37



Joined: 28 Mar 2015
Posts: 5
henry.37 29 Mar 2015, 11:19
This gives quite a lot of errors that are hard to fix:
*TextBuffer rb 512 - (TextBuffer undefined; should this be dw ?; db is out of range)
*mov byte [si], '$' - (argument needs type override)
*mov dx, CantRead - (operand types do not match; I guess you forgot offset CantRead)
*mov dx, NoSuchFile - (same as above)

After trying to fix the errors as I suggested above, it still gives me an ERROR at *mov byte [si], '$'
henry.37



Joined: 28 Mar 2015
Posts: 5
henry.37 29 Mar 2015, 11:26
Also, shouldn't there be a call that actually reads the file? There is an open, but shouldn't a "mov ah,3fh" follow it to read?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20753
Location: In your JS exploiting you and your system
revolution 29 Mar 2015, 11:26
henry.37 wrote:
This gives quite a lot of errors that are hard to fix:
*TextBuffer rb 512 - (TextBuffer undefined; should this be dw ?; db is out of range)
*mov byte [si], '$' - (argument needs type override)
*mov dx, CantRead - (operand types do not match; I guess you forgot offset CantRead)
*mov dx, NoSuchFile - (same as above)

After trying to fix the errors as I suggested above, it still gives me an ERROR at *mov byte [si], '$'
The code posted by AsmGuru62 assembles fine. What version of fasm are you using? Did you download your copy from this domain?
henry.37



Joined: 28 Mar 2015
Posts: 5
henry.37 29 Mar 2015, 11:31
No, I am using TASM, for school.
AsmGuru62



Joined: 28 Jan 2004
Posts: 1737
Location: Toronto, Canada
AsmGuru62 29 Mar 2015, 17:46
I see.
There are only a few places where TASM code differs from FASM.
For example, when you need the address of a variable, you use OFFSET in TASM.
Code:
mov si, Buffer          ; FASM
mov si, OFFSET Buffer   ; TASM
    

If you want to load a value from memory, remove brackets:
Code:
mov bx, [HFile]   ; FASM
mov bx, HFile     ; TASM
    

The 'rb' directive I used to reserve the buffer is probably also different - something with DUP...
I have already forgotten TASM, so you'll have to translate the code using some TASM manuals.
You should have those, since you are studying TASM in school.

Also, you can read the logic of the code and do the same in TASM.
I think the code logic is what matters in your question.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20753
Location: In your JS exploiting you and your system
revolution 30 Mar 2015, 09:36
AsmGuru62 wrote:
If you want to load a value from memory, remove brackets:
Code:
mov bx, [HFile]   ; FASM
mov bx, HFile     ; TASM
    
I haven't used TASM in a long time but I seem to remember that this syntax depends upon the assembly mode. In MASM mode: you don't need to have brackets since the context is taken from the definition of the variable. In Ideal mode: the brackets are mandatory.
DOS386



Joined: 08 Dec 2006
Posts: 1904
DOS386 02 Apr 2015, 17:59
henry.37 wrote:
No, I am using TASM, for school.


Damn. Please tell your teacher about FASM.

> With the read function it is possible to load 64 KB bytes into our buffer

NOT true. Unfortunately we can read only up to $FFFF AKA 65'535 AKA (64 Ki - 1) octets. So if we want to read a big file, what block size should we use? $FFFF ? $FFFE ? $FFFC ? $FFF0 ? $FF00 ? $FE00 ? $F000 ? $C000 ? $8000 ? The dilemma is not solvable ($FE00, a multiple of the 512-byte sector size, seems to give good performance).

> In Ideal mode: the brackets are mandatory.

YES, TASM has an "ideal mode", and it's more similar to FASM. And YES, FASM design is partly inspired by TASM ideal mode.

> (Random words separated by "Space")

Be prepared to encounter other junk.

> I do not know the size of file but I assume there are not
> any words in English that consists >30 letters.

This is a dangerous approach.
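
One way to keep fixed 30-byte slots from being overrun is to bound every copy. The following is only a sketch in FASM syntax, under these assumptions: the text is already in memory (the label Text stands in for a buffer filled by the read function), the three slots are laid out back to back under the hypothetical label Slots, and over-long words are simply truncated to 29 characters plus the '$' terminator.

```asm
org 100h
use16

SLOT_SIZE  = 30                         ; bytes per slot, including the '$'
SLOT_COUNT = 3

        cld
        mov     si, Text
        mov     cx, TextLen             ; CX = bytes left to scan
        mov     bx, Slots               ; BX = start of the current slot
        mov     di, bx                  ; DI = next free byte in that slot
next_byte:
        lodsb
        cmp     al, ' '
        jbe     .separator              ; space or control code ends a word
        mov     dx, di
        sub     dx, bx                  ; DX = characters stored so far
        cmp     dx, SLOT_SIZE - 1
        jae     .continue               ; slot full: drop the rest of the word
        mov     [di], al
        inc     di
        jmp     .continue
.separator:
        call    CloseSlot
        cmp     bx, Slots + SLOT_SIZE * SLOT_COUNT
        jae     finished                ; no free slot left
.continue:
        loop    next_byte
        call    CloseSlot               ; the last word may end without a separator
finished:
        mov     bx, Slots
        mov     cx, SLOT_COUNT
print_slot:
        mov     ah, 09h                 ; DOS: print '$'-terminated string
        mov     dx, bx
        int     21h
        mov     ah, 09h
        mov     dx, NewLine
        int     21h
        add     bx, SLOT_SIZE
        loop    print_slot
        mov     ax, 4C00h               ; terminate the program
        int     21h

; Terminates the word in the current slot, if any, and moves to the next slot.
CloseSlot:
        cmp     di, bx
        je      .empty                  ; repeated separators: nothing to close
        mov     byte [di], '$'
        add     bx, SLOT_SIZE
        mov     di, bx
.empty:
        ret

Text     db 'RandomWord1 RandomWord2 RandomWord3'
TextLen  = $ - Text
NewLine  db 13, 10, '$'
Slots:   times SLOT_SIZE * SLOT_COUNT db '$'
```

With the sample text this should print the three words on separate lines. Truncation is just one possible policy; rejecting the file or reporting an error would be equally reasonable.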

> So I declared:
> ToFind0 db 30 dup(0)
> ToFind1 db 30 dup(0)
> ToFind2 db 30 dup(0)
> I want my program to store 'RandomWord1$' under
> ToFind0, 'RandomWord2$' under ToFind1 and so on..

This is desperately inefficient. What are you trying to do? Just count words? This is very simple if your file is < 64 KiO; if it's larger, it's trickier, as your buffer border may fall between 2 words, or inside a word. Hint: try to count words in a file < 64 KiO for a start.
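
A sketch of word counting that also copes with larger files, in FASM syntax. The idea is to keep an "inside a word" flag in memory between reads, so a word split by the buffer border is counted only once. The file name TEST.TXT, the chunk size and the labels InWord, WordCount and PrintDecimal are assumptions for illustration; any byte up to and including the space character is treated as a separator, and the 16-bit counter wraps after 65535 words.

```asm
org 100h
use16

CHUNK = 512                             ; assumed buffer size

        cld
        mov     ax, 3D00h               ; open, read-only
        mov     dx, FileName
        int     21h
        jc      exit
        mov     [Handle], ax

next_chunk:
        mov     ah, 3Fh                 ; read the next chunk
        mov     bx, [Handle]
        mov     cx, CHUNK
        mov     dx, Buffer
        int     21h
        jc      done
        test    ax, ax
        jz      done                    ; end of file
        mov     cx, ax                  ; CX = valid bytes in Buffer
        mov     si, Buffer
scan:
        lodsb
        cmp     al, ' '
        jbe     .separator              ; space, TAB, CR, LF end a word
        cmp     byte [InWord], 0
        jne     .next                   ; still inside the same word
        inc     word [WordCount]        ; first byte of a new word
        mov     byte [InWord], 1
        jmp     .next
.separator:
        mov     byte [InWord], 0
.next:
        loop    scan
        jmp     next_chunk              ; InWord survives into the next chunk

done:
        mov     ah, 3Eh                 ; close the file
        mov     bx, [Handle]
        int     21h
        mov     ax, [WordCount]
        call    PrintDecimal
exit:
        mov     ax, 4C00h               ; terminate the program
        int     21h

; Prints AX as an unsigned decimal number. Destroys AX, BX, CX, DX.
PrintDecimal:
        mov     bx, 10
        xor     cx, cx                  ; CX = number of digits pushed
.divide:
        xor     dx, dx
        div     bx                      ; AX = quotient, DX = remainder
        push    dx
        inc     cx
        test    ax, ax
        jnz     .divide
.emit:
        pop     dx                      ; digits come back highest first
        add     dl, '0'
        mov     ah, 02h                 ; DOS: write the character in DL
        int     21h
        loop    .emit
        ret

FileName  db 'TEST.TXT', 0              ; hypothetical file name
Handle    dw 0
WordCount dw 0
InWord    db 0
Buffer    rb CHUNK
```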

_________________
Bug Nr.: 12345

Title: Hello World program compiles to 100 KB !!!

Status: Closed: NOT a Bug

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
