flat assembler
Message board for the users of flat assembler.

Open file, read by char and count a specific char

Jakubs11 20 Jan 2014, 19:40
Hello everyone!

I'm a basic fasm user and I'm struggling with an easy task:
to make my program count a specific letter in a text file.

I've managed to open the file, display its content and even make the program read the file char by char, but I can't understand why the following code isn't working:

Code:
include 'd:\fasm\include\win32ax.inc'

.data
        FileTitle db 'myfile.txt',0
        hFile dd ?
        nSize dd ?
        lpBytesRead dd ?
        lpBuffer rb 8192

        count dd 0
        letter db 'x',0
        MessageBoxCaption db 'repetitions:',0

.code
        main:

                invoke CreateFile, FileTitle, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0 ; open the file to read
                mov [hFile], eax ;Save the file's handle to hFile
                invoke GetFileSize, [hFile], 0 ; Determine the file size

                ;---------------
                ;if entire file needs to be displayed:

                ; mov [nSize] , eax ; Save the file size given by EAX
                ; invoke MessageBox, NULL, addr lpBuffer , addr MessageBoxCaption, MB_OK
                ; invoke CloseHandle, [hFile]
                ; invoke ExitProcess,0

                ;---------------

                .while 1 ;or another exit parameter
                          invoke ReadFile, [hFile], lpBuffer, 1 , lpBytesRead, 0 ;read one byte from file
                          .if (lpBuffer = letter)
                              add counter, 1
                          .endif

                          .if(????) 
                          ;end while-loop condition? maybe 'GetOverlappedResult [hFile], lpBuffer, 1 , lpBytesRead, 0' would do the magic but I cannot seem to make it work
                                jmp endofwhile
                .endw

                endofwhile:
                invoke CloseHandle, [hFile] ;   Handle should be closed after the file has been read

                add counter, 48 ; is it needed to display an int as ASCII?

                invoke MessageBox, NULL, addr counter , addr MessageBoxCaption, MB_OK
                invoke ExitProcess,0
        .end main  
    


myfile.txt contains:
Code:
text text text    


Could you please tell me what I am doing wrong?
Thank you in advance!
RIxRIpt 20 Jan 2014, 20:24
Quote:
add counter, 48

FASM Docs wrote:
When operand is a data in memory, the address of that data (also any numerical expression, but it may contain registers) should be enclosed in square brackets or preceded by ptr operator. For example instruction mov eax,3 will put the immediate value 3 into the EAX register, instruction mov eax,[7] will put the 32-bit value from the address 7 into EAX and the instruction mov byte [7],3 will put the immediate value 3 into the byte at address 7, it can also be written as mov byte ptr 7,3. To specify which segment register should be used for addressing, segment register name followed by a colon should be put just before the address value (inside the square brackets or after the ptr operator).

http://flatassembler.net/docs.php?article=manual#1.2.1

And here's my implementation of "Open file, read by char and count a specific char" (using Microsoft C Runtime DLL [msvcrt])

Code:
format PE CONSOLE
entry main
include 'win32a.inc'

section '.code' code readable writeable executable
        proc main
                cinvoke fopen, fileName, fileMode
                mov [fp], eax
                test eax, eax
                jz .end
                
                .loop:
                cinvoke fgetc, [fp]
                cmp eax, -1 ;EOF
                je .eof
                cmp al, [letter]
                jne .loop
                inc [count]
                jmp .loop
                
                .eof:
                movzx eax, [letter]
                cinvoke printf, fmt, eax, fileName, [count]
                cinvoke fclose, [fp]
                cinvoke getch
                xor eax, eax
                .end:
                ret
        endp

section '.data' data readable writeable
        fp dd ?
        count dd ?
        fileName db 'file.txt', 0
        fileMode db 'r', 0
        letter db 't'
        fmt db 'Number of `%c` in %s: %i', 13, 10, 0

section '.idata' import data readable writeable
        library msvcrt, 'msvcrt.dll'
        
        import msvcrt,\
                fopen, 'fopen',\
                fclose, 'fclose',\
                fgetc, 'fgetc',\
                printf, 'printf',\
                getch, '_getch' ;for pause
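For comparison, roughly the same loop written in C against the same msvcrt functions is sketched below. It is a translation for illustration, not RIxRIpt's code, and the name count_char_fgetc is made up. It also shows why the assembly compares all of EAX with -1 before looking at AL: fgetc returns an int, so a 0xFF byte (255) and EOF (-1) remain distinguishable.

```c
#include <stdio.h>

/* Counts how many times the byte 'letter' occurs in the stream 'fp',
   reading one character at a time with fgetc(), like the msvcrt version. */
long count_char_fgetc(FILE *fp, unsigned char letter)
{
    long count = 0;
    int c;   /* int, not char: it must hold every byte value AND EOF */

    while ((c = fgetc(fp)) != EOF)   /* EOF is the loop exit condition */
        if (c == letter)
            count++;
    return count;
}
```

Typical use: open the file with `fopen("file.txt", "rb")`, check for NULL, call `count_char_fgetc(fp, 't')`, then `fclose(fp)`.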
    

_________________
Hi =3
Admins, please activate my account "RIscRIpt"


Last edited by RIxRIpt on 20 Jan 2014, 20:39; edited 1 time in total
Roman 20 Jan 2014, 20:29
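Jakubs11,
your example reads 1 byte from the file, and if the buffer equals the searched symbol, the counter is incremented by 1.
The file is read in a loop.

Then, after the whole file has been read, 48 is added to the counter (48 is the code of the character '0'),
and the number stored in memory at counter is printed.
Note that counter is a memory address.

Do you understand Russian?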
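A note on the "add 48" step: 48 is the code of '0', so it only turns a single digit (0 to 9) into a character. Because x86 is little-endian, a dword holding 0..9 plus 48 happens to look like a one-character, zero-terminated string in memory, which is why the trick seems to work for small counts; a count of 10 would display ':' instead. The C sketch below (format_count is a hypothetical name) shows the general conversion by repeated division by 10.

```c
#include <stddef.h>

/* Writes the decimal form of n into buf (at least 21 bytes, enough for a
   64-bit value) and returns buf. Adding '0' (48) converts ONE digit, so a
   multi-digit number needs one division by 10 per digit. */
char *format_count(unsigned long n, char *buf)
{
    char tmp[21];
    size_t len = 0;

    do {
        tmp[len++] = (char)('0' + n % 10);   /* lowest digit first */
        n /= 10;
    } while (n != 0);

    for (size_t i = 0; i < len; i++)         /* reverse into the caller's buffer */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';                         /* MessageBox expects a zero-terminated string */
    return buf;
}
```

In a Win32 program the same job can be handed to a formatting call such as wsprintfA with a "%d" format.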
Jakubs11 20 Jan 2014, 21:00
Roman wrote:

Do you understand Russian?


No, sorry, I don't speak Russian.

RIxRIpt, thank you for your implementation. Helped me understand my errors.
Best regards.
AsmGuru62 21 Jan 2014, 15:10
And here it is made into a function with parameters:
Code:
;
; COUNT BYTES IN FILE
;
format  PE GUI 4.0
entry   start
stack   4000h, 4000h

    include 'Win32W.Inc'

; ---------------------------------------------------------------------------
section '.data' data readable writeable

    glb_FilePath  db 'C:\Temp\MyFile.txt',0    ; <-- put your test file name in here
    glb_BufMsg    rb 80
    glb_FmtMsg    db 'There are %d characters found in file.',0
    glb_Title     db 'Counting Characters...',0

; ---------------------------------------------------------------------------
section '.code' code readable executable

; ---------------------------------------------------------------------------
virtual at 0
loc1:
    .Buffer       db ?  ; Loaded ANSI character from file
    .FindMe       db ?  ; Character to look for
    .Padding      rb 2  ; For alignment (stack must be aligned to DWORD)
    .CharsLoaded  dd ?  ; Can be 0 (if file ended) or 1 (next character loaded)
    .CountChars   dd ?  ; Character counter
    .size = $
end virtual

align 16
TFile_CountChars:
; ---------------------------------------------------------------------------
; INPUT:
;   ESI = ANSI file name
;    AL = character code to count
; OUTPUT:
;   EAX = count of characters in file
; ---------------------------------------------------------------------------
    push      ebx esi edi ebp
    ;
    ; Save AL for now (because CreateFile will destroy it)
    ;
    mov       edi, eax
    ;
    ; Open file for reading
    ;
    invoke    CreateFileA, esi, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0
    cmp       eax, INVALID_HANDLE_VALUE
    je        .no_file
    ;
    ; File opened OK
    ;
    mov       ebx, eax     ; store file handle into EBX
    ;
    ; At this point some local variables needed, so the
    ; small structure 'loc1' is allocated on stack and
    ; EBP is pointed to this structure.
    ;
    sub       esp, loc1.size
    mov       ebp, esp
    ;
    ; Two buffers are needed for ReadFile:
    ; - buffer into which character is loaded (ESI)
    ; - buffer into which  # of bytes is stored (EDI)
    ;
    mov       esi, ebp
    lea       eax, [ebp + loc1.CharsLoaded]
    xchg      eax, edi
    mov       [ebp + loc1.FindMe], al
    and       [ebp + loc1.CountChars], 0   ; Set COUNTER=0
    ;
    ; In the loop read characters from file (1-by-1)
    ;
.read_char:
    invoke    ReadFile, ebx, esi, 1, edi, 0
    ;
    ; See if we got the character
    ;
    mov       ecx, [edi]
    jecxz     .no_more_bytes
    ;
    ; See if the character at ESI is the one we need
    ;
    mov       al, [esi]
    ;
    ; ECX = 1 (if AL is matching the ANSI code parameter) or 0 (if no match)
    ;
    xor       ecx, ecx
    cmp       al, [ebp + loc1.FindMe]
    sete      cl
    ;
    ; Add ECX to the counter
    ;
    add       [ebp + loc1.CountChars], ecx
    jmp       .read_char

.no_more_bytes:
    invoke    CloseHandle, ebx                ; Close file
    mov       eax, [ebp + loc1.CountChars]    ; EAX = return value
    add       esp, loc1.size                  ; 'Forget' local variables
    jmp       .done

.no_file:
    xor       eax, eax      ; Return zero

.done:
    pop       ebp edi esi ebx
    ret

; ---------------------------------------------------------------------------
; PROGRAM ENTRY POINT
; ---------------------------------------------------------------------------
align 16
start:
    ;
    ; A small test
    ;
    mov       al, 't'
    mov       esi, glb_FilePath
    call      TFile_CountChars
    ;
    ; Show the counter in a message box
    ;
    mov       edi, glb_BufMsg
    cinvoke   wsprintfA, edi, glb_FmtMsg, eax
    invoke    MessageBoxA, 0, edi, glb_Title, MB_ICONINFORMATION
    ;
    ; Quit
    ;
    invoke    ExitProcess, 0

; ---------------------------------------------------------------------------
section '.idata' import data readable writeable

    library kernel32,'KERNEL32.DLL',user32,'USER32.DLL',gdi32,'GDI32.DLL'

    include 'API\Kernel32.Inc'
    include 'API\User32.Inc'
    include 'API\Gdi32.Inc'
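For readers who prefer C, the same byte-at-a-time logic can be sketched as follows. This is a hedged translation, not AsmGuru62's code, and count_char_bytewise is a hypothetical name: fread of one byte stands in for ReadFile, a result of 0 bytes ends the loop (the same test as `jecxz .no_more_bytes`), and `count += (buffer == find_me)` mirrors the `sete cl` / `add` pair.

```c
#include <stdio.h>

/* Byte-at-a-time counter modelled on TFile_CountChars. */
long count_char_bytewise(FILE *fp, unsigned char find_me)
{
    long count = 0;
    unsigned char buffer;

    for (;;) {
        size_t chars_loaded = fread(&buffer, 1, 1, fp);   /* 0 or 1 */
        if (chars_loaded == 0)        /* end of file (or read error) */
            break;
        /* (buffer == find_me) evaluates to 0 or 1, like SETE CL, so the
           counter is updated without an if-statement in the source. */
        count += (buffer == find_me);
    }
    return count;
}
```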
    
m3ntal 21 Jan 2014, 16:45
It's best to read the entire file into memory once, then iterate through it. Reading individual bytes is strongly discouraged: it could take seconds to load 4MB worth of files (4,194,304 disk reads versus one; optimized C/C++ compilers will use memory I/O). Pseudo:
Code:
ReadFile(file.h, p, file.size, tmp.rw, 0)    
Note: file.size. Here's how I'd do it:
Code:
function count.file.c, file, c
  locals n
  try load.text file
  get n=text.count.c r0, c
  flush
endf n    
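A minimal C sketch of the read-everything-at-once approach, assuming the file fits in memory, its size fits in a long, and it is opened in binary mode ("rb") so that ftell() returns its size in bytes. The name count_char_in_memory is illustrative only; m3ntal's own macros (load.text, text.count.c) are not reproduced here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Loads the whole stream with one fread() call, then counts 'ch' in memory.
   Returns -1 on error. */
long count_char_in_memory(FILE *fp, unsigned char ch)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);                        /* file size in bytes */
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0)
        return -1;

    unsigned char *data = malloc(size > 0 ? (size_t)size : 1);
    if (data == NULL)
        return -1;
    size_t got = fread(data, 1, (size_t)size, fp); /* one call for the whole file */

    long count = 0;
    for (size_t i = 0; i < got; i++)
        count += (data[i] == ch);
    free(data);
    return count;
}
```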
RIxRIpt 21 Jan 2014, 18:06
m3ntal wrote:
It's best to read the entire file once into memory then iterate through it. Reading individual bytes is strongly discouraged, it could take seconds to load 4MB worth of files (4,194,304 disk reads versus one. Optimized C/C++ compilers will use memory I/O).

I don't think you would read an entire 16GB file at once. I guess you meant reading in blocks (for example, 4KB).
By the way, msvcrt._getch uses its own buffer of 4096 bytes (at least on my system, proof)
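Reading in fixed-size blocks, as suggested here, can be sketched in C like this. The 4096-byte size comes from the post; the name count_char_blocks is hypothetical. Memory use stays at one block regardless of file size, and there is one fread call per 4 KB instead of one per byte.

```c
#include <stdio.h>

#define BLOCK_SIZE 4096   /* 4 KB blocks */

/* Reads the stream block by block and counts 'ch' in each block. */
long count_char_blocks(FILE *fp, unsigned char ch)
{
    unsigned char block[BLOCK_SIZE];
    long count = 0;
    size_t got;

    /* fread returns fewer than BLOCK_SIZE bytes for the last block,
       and 0 at end of file or on error */
    while ((got = fread(block, 1, sizeof block, fp)) > 0)
        for (size_t i = 0; i < got; i++)
            count += (block[i] == ch);
    return count;
}
```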

m3ntal 21 Jan 2014, 20:16
Quote:
I don't think you would read the entire 16GB file at once
16GB? Where did you get that from? Re-read: 4MB (megabytes, 4,194,304 bytes) was the example I used. Millions of physical disk reads = extreme hard drive thrashing.
revolution 21 Jan 2014, 20:31
m3ntal wrote:
4MB (Megabytes, 4,194,304 bytes) was the example I used. Millions of physical disk reads (only need one) = Extreme hard drive thrashing.
For almost all users Windows will have caching enabled, and there will be only a few disk reads even if you read one byte at a time using the standard API calls. I find Windows to be very robust and fast when reading and writing files, and a user would have to go to extreme lengths to force it to perform poorly. Even so, there will still be a performance impact from the numerous API calls, but the cause is not physical disk activity, as you suggest; it is just the normal API overhead amplified a few million times.
m3ntal 21 Jan 2014, 20:59
Quote:
a user would have to go to extreme lengths to force it to perform poorly
On my PCs (laptop+ITX), it takes about 5 seconds to load 4MB worth of images using ReadFile for individual bytes. But if I read the entire file once then process in memory, it will load in the blink of an eye. I guess my old version of Win7 does not have internal caching.
revolution 21 Jan 2014, 21:06
m3ntal wrote:
Quote:
a user would have to go to extreme lengths to force it to perform poorly
On my PCs (laptop+ITX), it takes about 5 seconds to load 4MB worth of images using ReadFile for individual bytes. But if I read the entire file once then process in memory, it will load in the blink of an eye. I guess my old version of Win7 does not have internal caching.
If your HDD can do 4M reads (separate I/O ops) in 5 seconds I would be very surprised. Instead I expect the caching is working as intended and the API overhead for 4M calls is within expectation also.

It would be a good test to disable caching (which isn't actually easy to do BTW) and note just how much extra time it takes to do 4M individual physical HDD read I/O ops. I'd guess it would take a very long time.
upsurt 21 Jan 2014, 22:58
RIxRIpt wrote:
And here's my implementation of "Open file, read by char and count a specific char" (using Microsoft C Runtime DLL [msvcrt])


Great, thank you!
How about searching for a word instead of a char?
neville 22 Jan 2014, 01:49
revolution wrote:
For almost all users Windows will have caching enabled .... I find Windows to be very robust and fast when reading and writing files, and a user would have to go to extreme lengths to force it to perform poorly. ....
Imo it is misleading to suggest that Windows' disk caching is provided as a "feature". Without it Windows' performance would be even more canine, especially since Windows users are actively encouraged to enable virtual memory by maintaining a so-called swap file of non-zero size.
So Windows is more than capable of Extreme Hard Drive Thrashing, which I have personally witnessed on many occasions (thankfully always on other people's machines!)
Meanwhile, enjoy your dance on the head of a pin ;)

_________________
FAMOS - the first memory operating system
Frank 22 Jan 2014, 03:14
neville wrote:
Imo it is misleading to suggest that Windows' disk caching is provided as a "feature". Without it Windows' performance would be even more canine

We heard from "m3ntal" (the former "uart777", if I understood this right) that reading a 4MB file BYTE FOR BYTE takes 5 seconds under Windows 7. You seem to claim that disk caching in Windows serves to hide some kind of deficiency in the operating system, and that reading a 4MB file BYTE FOR BYTE can be done at the same speed (or faster) even without disk caching in other operating systems. Please provide evidence. For example, how long does it take to read 4MB BYTE FOR BYTE in the operating system FAMOS that you advertise in your signature? From your message I understand that hobbyist operating systems such as FAMOS achieve the same performance or better (4MB in 5 seconds) WITHOUT disk caching.
m3ntal 22 Jan 2014, 03:29
Sorry for kind of changing the subject, but it's true that ReadFile can take noticeable time. Consider that one 1366x768x32 image = 4,196,352 bytes! (1366*768*4)

Jakubs11: .while 1 is not valid. Here is a forever macro that creates an infinite loop:
Code:
macro forever {
 local ..start, ..next, ..end
 ?START equ ..start
 ?NEXT equ ..next
 ?END equ ..end
 ?START:
}

macro endfv {
  ?NEXT:
  jmp ?START
 ?END:
 restore ?START, ?NEXT, ?END
}    
C/C++/Java style languages do not have a dedicated keyword for this. It is often written as:
Code:
for (;;) {} /* nasty hacks! */
while (1) {}    
Frank 22 Jan 2014, 03:43
upsurt wrote:

how about searching a word instead of a char?

How about doing your homework assignments yourself, rather than asking others to do them for you?
Melissa 22 Jan 2014, 05:32
On Linux a file can be read directly from disk, but only in multiples of 512
bytes. I guess that is because a disk read operation transfers at least
512 bytes.
Reading 8192 blocks of 512 bytes directly from disk takes about
half a second on my machine.
Frank 22 Jan 2014, 11:48
So your 8192 read accesses to the disk (sector for sector, 512 bytes at a time) take half a second. Reading a 4M file BYTE FOR BYTE (without caching the sectors!) would mean 4M read accesses to the disk. By extension, that would take half a second X 512 = 256 seconds on your machine.
Melissa 22 Jan 2014, 12:42
Yes, 4M disk reads take 4m37s.
AsmGuru62 22 Jan 2014, 14:32
It is obvious that reading file by bytes is much slower than doing it by larger pieces.
I just showed how the original post CAN BE done as a function.
I had no intention of optimizing it in any way.

Also, it is possible that the ReadFile() API makes some internal checks, like
making sure that the file handle parameter is a valid handle. That will
happen ~4M times in this case, so yeah, it will be a slow way to read a file.

As for looking for a word in a file, it is more complex. If you are reading 1 byte at a time as in the original post,
you need to keep incrementing a count of matching characters and reset it on every byte
that is not a match. Here is a sketch:
Code:
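#include <stdio.h>
#include <string.h>

//
// Counts (non-overlapping) occurrences of 'word' in 'fp',
// reading the file 1 byte at a time.
//
long CountWordInFile (FILE *fp, const char *word)
{
        long    nMatches=0, nFound=0, ofsFile=-1, ofsFound;
        long    wordLen = (long) strlen (word);   // e.g. "text" -> 4
        int     c;                                // int, so EOF can be detected

        if (wordLen == 0) return 0;

        while (1)
        {
                c = fgetc (fp);
                if (c == EOF) break;              // no bytes left in file
                //
                // 'c' is the next character from file
                //
                ++ofsFile;
                if (c == (unsigned char) word [nMatches])
                {
                        ++nMatches;
                        if (nMatches == wordLen)
                        {
                                ofsFound = ofsFile - (wordLen-1);
                                //
                                // Word has been found at offset 'ofsFound' in file!
                                //
                                printf ("Found at offset %ld\n", ofsFound);
                                ++nFound;
                                nMatches = 0;     // look for the next occurrence
                        }
                }
                else
                {
                        // Mismatch: this byte may still start a new match
                        nMatches = (c == (unsigned char) word [0]) ? 1 : 0;
                }
        }
        return nFound;
}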
    

This code was not tested; it is basically a brute-force scan of the file.
I may be missing something, but you should get the idea.
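The restart rule above (fall back to 1 if the byte equals the first letter of the word, otherwise to 0) can miss matches whose prefix overlaps itself; for example, it does not find "aab" in "aaab". A standard fix is the Knuth-Morris-Pratt (KMP) failure table, which still reads the file one byte at a time and never re-reads it. Below is a C sketch of that swapped-in technique (not AsmGuru62's code; count_word_kmp is a hypothetical name). It counts overlapping occurrences, so "abab" is found twice in "ababab".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts (possibly overlapping) occurrences of 'word' in 'fp', one byte
   at a time via fgetc. Returns -1 on bad input or allocation failure. */
long count_word_kmp(FILE *fp, const char *word)
{
    size_t m = strlen(word);
    if (fp == NULL || m == 0)
        return -1;

    /* fail[i] = length of the longest proper prefix of word[0..i] that is
       also a suffix of it (the KMP failure function). */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -1;
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && word[i] != word[k])
            k = fail[k - 1];
        if (word[i] == word[k])
            k++;
        fail[i] = k;
    }

    long found = 0;
    size_t matched = 0;   /* how many bytes of 'word' currently match */
    int c;
    while ((c = fgetc(fp)) != EOF) {
        while (matched > 0 && (unsigned char)word[matched] != c)
            matched = fail[matched - 1];   /* fall back, keep the reusable prefix */
        if ((unsigned char)word[matched] == c)
            matched++;
        if (matched == m) {
            found++;
            matched = fail[m - 1];         /* allows overlapping matches */
        }
    }
    free(fail);
    return found;
}
```

Building the table costs O(length of word) and the scan does amortized constant work per byte, so the file is never read more than once.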


Last edited by AsmGuru62 on 23 Jan 2014, 13:41; edited 1 time in total

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.