flat assembler
Message board for the users of flat assembler.

Index > Windows > FindFiles recursive

hopcode
Hello everybody,
I couldn't find the post about nested procs, so I am posting a routine here
that finds files recursively, as an example of how to nest manually built procs. The differences from other similar routines I have seen are:

1) it uses only one 512-byte name buffer for all find jobs (to avoid stack overflow)
2) it is a single nested proc
3) it uses a callback to allow customization

Here it is:
Code:
listfiles:
 push ebp
 push ebx
 push edi
 push esi

 mov esi,[esp+20]  ;_path
 mov edx,[esp+24]     ;_mask
 mov ebp,[esp+28]     ;_callback

 test esi,esi
 jz .err_lfA

 sub esp,512
 mov edi,esp             
 mov ecx,esp
@@:
 lodsb
 stosb
 test al,al
 jnz @b
 sub edi,ecx
 sub esp,sizeof.WIN32_FIND_DATA
 dec edi
 mov esi,esp                
 xchg ecx,edi
 sub esp,4

 test edx,edx
 jnz @f
 mov edx,esp
 mov dword[edx],02A2E2Ah
@@:
 push .err_lf

.listit:
 ;IN EBP callback
 ;IN ECX len of path
 ;IN EDI path
 ;IN ESI w32fd
 ;IN EDX pMask
 push ebx
 sub esp,8
 mov [esp],ecx                      ;store len
 mov [esp+4],edx          ;store mask

 push edi
 mov al,"\"
 add edi,ecx
 push esi
 stosb
 mov esi,edx
@@:
 lodsb
 stosb
 test al,al
 jnz @b
 pop esi
 pop edi

 push esi
 push edi
 invoke FindFirstFile
 or eax,eax
 jle .err_liB
 mov ebx,eax

.next_liA:
 lea edx,[esi+2Ch] ;WIN32_FIND_DATA.cFileName
 mov eax,[edx]
 cmp ax,002Eh
 jz .next_liB
 cmp ax,2E2Eh
 jz .next_liB
 mov ecx,[esp]
 mov eax,edi
 test ebp,ebp
 jz .next_liC
 mov byte [edi+ecx],0

 push ebp
 push ebx
 push edi
 push esi
 call ebp
 pop esi
 pop edi
 pop ebx
 pop ebp

 test eax,eax
 jz .next_liD

.next_liC:
 mov eax,[esi]      ;       +WIN32_FIND_DATA.dwFileAttributes
 mov ecx,[esp]
 test al,FILE_ATTRIBUTE_DIRECTORY
 jz .next_liB

 push edi
 mov al,"\"
 add edi,ecx
 push esi
 stosb
 lea esi,[esi+2Ch]       ;WIN32_FIND_DATA.cFileName
@@:
 lodsb
 stosb
 inc ecx
 test al,al
 jnz @b
 pop esi
 pop edi
 mov edx,[esp+4]
 call .listit
 test eax,eax
 jz .next_liD
      
.next_liB:
 push esi
 push ebx
 invoke FindNextFile
 test eax,eax
 jnz   .next_liA
 inc eax

.next_liD:     
 push eax
 push ebx
 invoke FindClose
 pop eax

.err_liB:
 add esp,8
 pop ebx
 ret 0

.err_lf:        
 add esp,4
 add esp,sizeof.WIN32_FIND_DATA
 add esp,512
.err_lfA:
 pop esi
 pop edi
 pop ebx
 pop ebp
 ret 12
    


You could define a callback function this way:
Code:
callback:
 ;IN EAX=dirname
 ;IN EDX= filename or dir
 ;IN ECX = len of EAX
 ;IN ESI = WIN32_FIND_DATA
 push edx
 push eax
 push szFormat
 cinvoke printf
 add esp,12
 ret 0
    


Usage:
Code:
; szFormat db "EAX=%s EDX=%s",13,10,0
; szFilt       db "*.asm",0
; szPath     db "C:",0

 push callback
 push szFilt
 push szPath
 call listfiles
    


Notes:
- If callback = 0, no callback is called
- If szFilt = 0, the default filter "*.*" is used
- szPath must be a plain path, for example "C:\mydir"
- Returning EAX=0 from the callback stops the search
- There is no need to preserve EBP/EBX/ESI/EDI in the callback

Note also that a filter like "*.asm" is not greedy: it finds only the ".asm" files/dirs in the current directory, because a subdirectory whose name does not match the filter is never returned and so never entered.
To make it greedy in the subdirectories, set szFilt=0 and check the names (via ESI=pWIN32_FIND_DATA) in the callback.
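
To illustrate the non-greedy behaviour described in the notes, here is a minimal Python sketch. It is not the asm routine: list_files, demo and asm_only are hypothetical names, fnmatch stands in for the Windows wildcard matching, and "*" stands in for the default "*.*" filter (fnmatch does not treat "*.*" the way Windows does).

```python
import fnmatch
import os
import tempfile


def list_files(path, mask, callback):
    """Depth-first listing; `mask` is applied to files and directories alike.

    Hypothetical stand-in for the asm routine: returns False as soon as the
    callback returns a false value (the EAX=0 case), True otherwise.
    """
    if mask is None:
        mask = "*"  # stand-in for the default "*.*" filter
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if not fnmatch.fnmatch(entry.name, mask):
            continue  # a non-matching name is never reported, so never entered
        if not callback(path, entry.name):
            return False  # the callback asked to stop the whole search
        if entry.is_dir(follow_symlinks=False):
            if not list_files(entry.path, mask, callback):
                return False
    return True


def demo():
    """Build root/a.asm and root/sub/b.asm, then compare the two strategies."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for rel in ("a.asm", os.path.join("sub", "b.asm")):
            open(os.path.join(root, rel), "w").close()

        masked = []
        list_files(root, "*.asm", lambda d, n: masked.append(n) or True)

        filtered = []

        def asm_only(dirname, name):
            if name.endswith(".asm"):
                filtered.append(name)
            return True  # keep searching

        list_files(root, None, asm_only)
        return masked, filtered
```

With the "*.asm" mask only a.asm is reported, because "sub" itself does not match; with no mask and the test moved into the callback, b.asm is reported as well.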

Cheers,
hopcode


Last edited by hopcode on 27 Jan 2010, 15:53; edited 1 time in total
Post 23 Jan 2010, 09:25

baldr
hopcode,

Several comments:

1. First of all, do you have to pay for each character written in the source, or do you purchase them in bulk at a discount? ;)

2. FindFirstFileA() limits the length of the zstring pointed to by lpFileName to MAX_PATH==260 characters (for strlen(lpFileName)>260 it fails with ERROR_FILENAME_EXCED_RANGE==206; even with strlen(lpFileName)==260 it can't find an existing file (ERROR_PATH_NOT_FOUND==3)).
FindFirstFileW() accepts the "\\?\" file name prefix to overcome that limitation, but your version is apparently ANSI. Hence a 512-byte buffer is overkill.

3. Though Windows seems to handle the "*.*" filename mask specially, so that the results include filenames without a dot, the "*" mask looks better and is less ambiguous (is "*_*" a mask for filenames containing an underscore without an extension, or should those with an extension match too?).

4. It's better to adhere to some standard calling convention (findfiles is stdcall, so why does the callback function use a custom calling convention?); imagine that someone wants to call your function from an HLL. And what is the purpose of calling findfiles without a callback? HDD thrashing?

5. It will probably run on 9x/Me, but NT hates a misaligned stack. WIN32_FIND_DATAA is 318 bytes long; after sub esp,sizeof.WIN32_FIND_DATA, FindFirstFileA() fails with the fuzzy ERROR_NOACCESS==998 "Invalid access to memory location".

Is this a beta version? There is plenty of room for improvement…
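
For point 5, the usual remedy is to round the allocation up to a multiple of 4 before subtracting it from esp. A small Python sketch of that arithmetic follows; align_up is a hypothetical helper name, and 318 is the structure size quoted above.

```python
def align_up(size, alignment=4):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & -alignment


WIN32_FIND_DATAA_SIZE = 318  # figure quoted in point 5 above
MAX_PATH = 260

# bytes to reserve so that esp stays dword-aligned
aligned_find_data = align_up(WIN32_FIND_DATAA_SIZE)
aligned_path = align_up(MAX_PATH)
```

In fasm terms this would be an expression of the form (sizeof.WIN32_FIND_DATA+3) and -4, used for both the sub and the matching add.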
Post 23 Jan 2010, 14:27

hopcode
I didn't understand question No. 1, but it
put me in a good mood :D, and that earns you a point.
Good. Also,
yes..., the nested proc is not as strict as your
list of comments/requirements (I know that you know, but I know it too).
Anyway, a good requirements list. I will try to do my best for it,
even if perhaps I am expecting too much from you (baldr), especially concrete, coded
baldr wrote:
...improvements
to quote you completely. For example, how to avoid the almost identical code
just before/at the ret 4 - ret 12 of the nested procs.
I don't like symmetry.
Symmetry sounds... diabolic to me.
;)

Regards,
hopcode
Post 25 Jan 2010, 00:57

Borsuc
Nice, maybe you should call it "ListFilenames", because at first I thought it returned handles to the files. (I know, bad bad thinking lol :P)
Post 25 Jan 2010, 20:14

baldr
hopcode,

Here is my attempt at a similar program (lightly tested):
Code:
                format  PE console
                include "Win32A.Inc"
                stdcall scan_dir, _allfiles
                ret

scan_dir:       push    ebp
                mov     ebp, esp
define file_mask ebp+8
; start directory scan (subdirectory too)
find_first:     push    ebx                             ; save caller ebx/handle
                invoke  FindFirstFile, [file_mask], wfd
                cmp     eax, INVALID_HANDLE_VALUE
                je      step_back                       ; no files, return to prev. dir
                mov     ebx, eax
output:         cinvoke printf, _fmt, [wfd.dwFileAttributes], wfd.cFileName
                mov     eax, [wfd.dwFileAttributes]
                test    eax, FILE_ATTRIBUTE_DIRECTORY
                jz      find_next                       ; not directory, continue scan
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
                test    eax, FILE_ATTRIBUTE_REPARSE_POINT
                jnz     find_next                       ; symbolic link, continue scan
                mov     eax, dword[wfd.cFileName]
                cmp     ax, "."
                je      find_next                       ; same for dot
                and     eax, 1 shl 24 - 1
                cmp     eax, ".."
                je      find_next                       ; and double dot
; descend into dir and start scan again
                invoke  SetCurrentDirectory, wfd.cFileName
                jmp     find_first
find_next:      invoke  FindNextFile, ebx, wfd
                test    eax, eax
                jnz     output
                invoke  FindClose, ebx
step_back:      pop     ebx                             ; dir scanned, restore handle/caller ebx
                cmp     esp, ebp
                jnb     done                            ; no more ebx on stack, done
                invoke  SetCurrentDirectory, _updir     ; ascend one dir level up
                jmp     find_next
done:           pop     ebp
                retn    4

_allfiles       TCHAR   "*", 0
_updir          TCHAR   "..", 0
_fmt            TCHAR   "%8x %.260s", 10, 0
                align   4
                data import
                library Kernel32, "Kernel32", MSVCRT, "MSVCRT"
                import  Kernel32,\
                        FindFirstFile, "FindFirstFileA",\
                        FindNextFile, "FindNextFileA",\
                        FindClose, "FindClose",\
                        SetCurrentDirectory, "SetCurrentDirectoryA"
                import  MSVCRT, printf, "printf"
                end data
wfd             WIN32_FIND_DATA    
The idea was to eliminate the recursive call and the parameter copying (as we only need a stack of find handles). The find handle is pushed on the stack when the function encounters a good subdir (i.e. not "." or ".." or a symlink) and popped when done with it. The function returns when the last ebx value (that of the caller) is popped.
It works from the current directory; it can be modified to accept a path in the file mask (then the SetCurrentDirectory calls would be unnecessary).
The callback is embedded into the function (printf) for simplicity.
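
The same explicit-stack idea can be sketched in Python, with one open directory iterator per nesting level playing the role of the pushed find handle. This is only an illustration under that assumption; scan_dir, visit and demo are hypothetical names and nothing here calls the Win32 API.

```python
import os
import tempfile


def scan_dir(start, visit):
    """Iterative depth-first scan; the stack holds one open iterator per level."""
    stack = [os.scandir(start)]        # plays the role of the pushed find handles
    while stack:
        entry = next(stack[-1], None)  # like FindNextFile on the innermost handle
        if entry is None:
            stack.pop().close()        # like FindClose, then step back to the parent
            continue
        visit(entry.path)
        # descend only into real directories, skipping symlinks
        if entry.is_dir(follow_symlinks=False):
            stack.append(os.scandir(entry.path))


def demo():
    """Scan root/a/b/f.txt and root/c.txt; return the sorted relative paths seen."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        open(os.path.join(root, "a", "b", "f.txt"), "w").close()
        open(os.path.join(root, "c.txt"), "w").close()
        seen = []
        scan_dir(root, lambda p: seen.append(os.path.relpath(p, root)))
        return sorted(seen)
```

The loop ends when the last iterator is popped, which corresponds to the point where the caller's ebx comes off the stack in the listing above.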
Post 26 Jan 2010, 00:00

hopcode
baldr, I am deeply touched by your effort.
It really surprises me, and that is a good feeling. :D
OK, I will not post anything before tonight or tomorrow
night because I am busy at the moment (so I will use some
hours of other, lighter work to think).

My solution skips the last directory (or more); your solution ends up in
an infinite recursion, perhaps because it is still in the current dir.
What follows is a temporary solution, because it does not distinguish between an error and
any other return value, but it avoids the infinite loop. Note that all matching files are forced through a SetCurDir, and this somewhat slows down the whole proc.
Code:
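; descend into dir and start scan again
  invoke  SetCurrentDirectory, wfd.cFileName
  test eax,eax          ; nonzero = success
  jnz   find_first      ; descend only if the directory was entered;
                        ; otherwise fall through to find_next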
    

Anyway, that is not the point. The point is that
I really like this, and I will work on it:
Code:
step_back:
    pop     ebx       ; dir scanned, restore handle/caller ebx
    cmp     esp, ebp
    jnb     done      ; no more ebx on stack, done
    


- FILE_ATTRIBUTE_REPARSE_POINT, i.e. <JUNCTION>, is a MUST-DO;
I have experienced this http://blogs.msdn.com/oldnewthing/archive/2004/12/27/332704.aspx
on a bad install of Internet Explorer.
- A path in the mask is a really good idea.
Borsuc wrote:
...call it "ListFilenames"...
That is OK, because it is more abstract: listfiles is the proc; find/found this file/that file is the restriction applied by the customizable callback.

Thanks for the help,
Hear from you soon,

hopcode
:D
Post 26 Jan 2010, 11:43

baldr
hopcode wrote:
…your solution ends up in an infinite recursion, perhaps because it is still in the current dir.
Can you explain how it fails and what that "infinite recursion" is? An endless loop? The printf call can be modified to show that subdirs are scanned.
Code:
output:         sub     esp, (MAX_PATH+3) and -4
                invoke  GetCurrentDirectory, MAX_PATH, esp
                cinvoke printf, _fmt, [wfd.dwFileAttributes], wfd.cFileName, esp
                add     esp, (MAX_PATH+3) and -4
...
_fmt            TCHAR   "%8x %.260s", 10, "%.260s", 10, 0    
hopcode wrote:
Note that all matching files are forced through a SetCurDir, and this somewhat slows down the whole proc.
Are they? I thought the jz find_next after test eax, FILE_ATTRIBUTE_DIRECTORY skips all the dir handling stuff altogether… ;)

There is another problem: hardlinks. It's not easy to ensure that the callback will be called only once for each unique file (not name). Probably this is not an issue; let the callback handle it.

My function isn't thread-safe due to the static wfd. This is because I was focused on the iterative approach; wfd can be made automatic.

Perhaps I'll write another version, for something like scandir "C:\WINDOWS\*.DLL".
Post 26 Jan 2010, 13:57

hopcode
The listfiles proc is OK now:
- fixed the dir scan
- proc reduced by a few bytes
I haven't yet found a good solution to avoid repeating some lines.
The other fixes from baldr's requirements list I will do later.
OK.
Shall we have a challenge? :D
If anyone is interested in the challenge, you can propose your version under the following conditions:
    - one process, one thread
    - no UNC files, for simplicity (but you may support them too)

That's all.
In the meantime I will try to extract a timer proc framework from other
threads to use in the challenge.

Cheers,
hopcode
Post 27 Jan 2010, 16:10

Borsuc
The best way to handle junctions is, IMO, to just ignore them. ;)
Post 27 Jan 2010, 17:24

hopcode
Here is a new version, only 308 bytes. It is different in design (I prefer this new one).

EDIT: fixed a bug with "..0" as a file/directory name
EDIT: listfiles returns EAX = number of items found (files/dirs)
- It is useful to know the nesting level (in CX) at each moment.
- It is also useful to have the number of items at each callback notification.
EDIT: added a userparam that is passed to the callback
The callback follows the 32-bit Windows API (stdcall) convention. It must preserve EBP/EBX/ESI/EDI.
It could be improved a lot, but I will first try for a discrete design. 8-)

Usage:
Code:
 push userparam  ;param = 0 /param = value
 push callback ;0 no callback / callback address 
 push FILE_ATTRIBUTE_READONLY or FILE_ATTRIBUTE_DIRECTORY ;filter
 push 1  ;maxlevel
 push 0  ;szMaskFile
 push szPath
 call listfiles
    


    - maxlevel can be -1 = all subdirectories / 0 = current directory only / N = depth
    - filter acts as an OR filter (at least one of the flags must match) that is applied before the file/dir is reported to the callback (i.e. no match -> no notification)
    - userparam is passed on to the callback
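
As a rough model of these three rules, here is a short Python sketch. It reflects one reading of the listfiles listing in this post: .level is taken to be 1 in the start directory, and ECX is taken to carry the path length in its upper 16 bits and the zero-based level in CX. The function names are hypothetical; the attribute values are the standard Win32 ones.

```python
FILE_ATTRIBUTE_READONLY = 0x01
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_ARCHIVE = 0x20


def passes_filter(attributes, filter_mask):
    """OR filter: report the entry if it has at least one of the requested flags."""
    return (attributes & filter_mask) != 0


def may_descend(max_level, level):
    """level is 1 in the start directory, 2 in its subdirectories, and so on."""
    if max_level == -1:
        return True  # -1: no depth limit
    return max_level + 1 > level


def pack_ecx(path_len, level):
    """Path length in the upper 16 bits, zero-based nesting level in CX."""
    return ((path_len & 0xFFFF) << 16) | ((level - 1) & 0xFFFF)


def unpack_ecx(ecx):
    """Return (path length, zero-based level) as a callback would read them."""
    return ecx >> 16, ecx & 0xFFFF
```

With the filter from the usage example (READONLY or DIRECTORY), a plain archive file is not reported, while any directory is.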

Example:
Code:
proc callback \
 _w32fd,\
 _numitems,\
 _userparam

 push ebx
 push edi
 push esi
 mov ebx,[_userparam] ;<------ userparam

 ;IN EAX=dirname
 ;IN EDX= filename or dir
 ;IN CX = level / 0=current dir   ;<------------------
 ;IN SHR ECX,16 = len of EAX    ;<----------------

 ;RET EAX=0 stop execution
 push edx
 push eax
 push szFormat
 cinvoke printf
 add esp,12
 xor eax,eax
 inc eax

 pop esi
 pop edi
 pop ebx
 ret
endp

    

And the proc listfiles:
Code:
listfiles:
    push ebp
    mov ebp,esp
    sub esp,512+320+4+4+4
    label .upath dword at ebp+8
    label .umask dword at ebp+12
    label .ulevel  dword at ebp+16
    label .ufilter dword at ebp+20
    label .ucback dword at ebp+24
    label .uparam dword at ebp+28
    
    label .path dword at ebp-(512+320+12)
    label .w32fd dword at ebp-(320+12)
    label .nitems dword at ebp-12
    label .level dword at ebp-8
    label .mask dword at ebp-4
    
    push ebx
    push edi
    push esi

    mov esi,[.upath]    ;user path
    lea ecx,[.path]
    mov edx,[.umask]
    xor eax,eax
    test esi,esi
    mov [.level],eax
    mov [.nitems],eax
    jz  .err_lf
    mov edi,ecx
@@:
    lodsb
    stosb
    test al,al
    jnz @b
    sub edi,ecx
    dec edi
    xchg ecx,edi        ;len of path
    lea esi,[.w32fd]
    test edx,edx
    jnz @f
    mov edx,.def_mask
@@:
    push .err_lf
    mov [.mask],edx

.listit:    
    ;IN EDI path
    ;IN ESI w32fd
    ;IN ECX len
    push ebx
    sub esp,4
    xor eax,eax
    mov [esp],ecx
    inc [.level]
    mov [esi+2ch],eax

    push esi
    push edi

    push esi
    push edi
    
    mov al,"\"
    add edi,ecx
    stosb
    mov esi,[.mask]

@@:
    lodsb
    stosb
    test al,al
    jnz @b
    invoke FindFirstFile
    pop edi
    pop esi
    or eax,eax
    jle .err_liB
    mov ebx,eax

.next_liA:
    lea edx,[esi+2Ch] ;WIN32_FIND_DATA.cFileName
    mov eax,[edx]
    cmp ax,002Eh
    jz  .next_liB
    cmp eax,2E2Eh
    jz  .next_liB

    ;--------- match filter ------
    mov ecx,[.ufilter]
    and ecx,[esi]
    test ecx,ecx
    jz  .next_liC

    inc [.nitems]
    mov ecx,[esp]       
    mov eax,edi

    push ebx
    push edi
    xor ebx,ebx
    push ebp
    mov dword [edi+ecx],ebx
    push esi

    rol ecx,16
    mov ebx,[.ucback]
    mov cx,word[.level]
    test ebx,ebx
    jz  .next_liE
    dec ecx

    push [.uparam]
    push [.nitems]
    push esi
    call ebx

.next_liE:
    pop esi
    pop ebp
    pop edi
    pop ebx
    test eax,eax
    jz  .next_liD

.next_liC:
    mov eax,[esi]   ;   +WIN32_FIND_DATA.dwFileAttributes
    mov ecx,[esp]
    test al,FILE_ATTRIBUTE_DIRECTORY
    jz  .next_liB

    ;--------- match level -------
    mov eax,[.ulevel]
    inc eax
    jz  @f
    cmp eax,[.level]
    jbe .next_liB

@@:
    push esi
    push edi

    mov al,"\"
    add edi,ecx
    stosb
    lea esi,[esi+2Ch]   ;WIN32_FIND_DATA.cFileName

@@:
    lodsb
    stosb
    inc ecx
    test al,al
    jnz @b

    pop edi
    pop esi
    call .listit
    test eax,eax
    jz  .next_liD
    
.next_liB:
    push esi
    push ebx
    invoke FindNextFile
    test eax,eax
    jnz .next_liA
    inc eax

.next_liD:  
    push eax
    push ebx
    invoke FindClose
    pop eax

.err_liB:
    dec [.level]    ; undo the inc at .listit on every exit path, including a failed FindFirstFile
    add esp,4
    pop ebx
    ret 0

.err_lf:
    pop esi
    pop edi
    pop ebx
    mov eax,[.nitems]
    add esp,4+4+4+320+512
    pop ebp
    ret 24
align 4
.def_mask:
    db "*",0,0,0
    


Cheers, :D
hopcode


Last edited by hopcode on 02 Feb 2010, 00:15; edited 2 times in total
Post 31 Jan 2010, 04:38

baldr
hopcode,

  1. ..0 is a valid name for file/directory.
  2. Symlinks can create cycles in subdirectory graph.
  3. Non-default filename mask can mask out directories (*.asm directories? Probably none).
  4. Different prototype for callback depending on parameter's value? What if that parameter is passed as parameter to calling function? I mean, my function calls listfiles and passes parameter given to my function as userparam. Two different callbacks? One callback and wrapper?
Code:
void my_function(void *param) {
...
  if (param==0)
    listfiles(..., parameterless_callback, param); // this one with ret 0
  else
    listfiles(..., parameterised_callback, param); // this one with ret 4
...
}    
Anyway, parameters are passed to callback in registers… I meant stdcall or cdecl conventions for that.
Post 01 Feb 2010, 12:07

hopcode
baldr wrote:
stdcall or cdecl conventions for that.

Yes, you are completely right about that. RET 0 / RET 4 is somewhat tedious.
I cannot decide :D whether it should be a fixed RET 4/8 (one/two params to the callback) or RET 0.
Your opinion?
Mine is that the more things are fixed, the more stability the user feels and... consequently the more he believes/acts as an intelligent person ;). Perhaps 2 params, the infos in EAX/ECX/EDX and a fixed RET 8 in the callback, and we go the whole hog.
NOTE about symlinks: the new version gives the nesting level in CX.

EDIT
baldr wrote:
..0 is a valid name for file/directory.

Important. Thanks for reporting; a fix is already on the way.
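
To make the "..0" problem concrete, here is a small Python model of the two comparisons. It is only an illustration: first_dword, two_byte_check and masked_check are hypothetical names, and the byte strings stand in for cFileName.

```python
def first_dword(name: bytes) -> int:
    """First four bytes of a NUL-terminated name, read as a little-endian dword."""
    return int.from_bytes((name + b"\0")[:4].ljust(4, b"\0"), "little")


def two_byte_check(name: bytes) -> bool:
    """Models cmp ax,002Eh / cmp ax,2E2Eh: looks at the first two bytes only."""
    ax = first_dword(name) & 0xFFFF
    return ax == 0x002E or ax == 0x2E2E


def masked_check(name: bytes) -> bool:
    """Models a compare that also requires the terminating NUL after the dots."""
    eax = first_dword(name)
    return (eax & 0xFFFF) == 0x002E or (eax & 0xFFFFFF) == 0x002E2E
```

The two-byte check treats "..0" like "..", so such a file or directory would be skipped; the three-byte compare accepts only "." and "..".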
Post 01 Feb 2010, 13:02

baldr
hopcode,

Register parameters are good for a callback function written in assembly language; fastcall differs between x86-32 compilers; probably stdcall will fit the bill? cdecl can be good too, if the parameters are const (the callback can't modify them, so the caller can simply pop them back).

Nesting level can't help to resolve a circular directory structure; it can only limit the recursion (if FindFirstFile doesn't already). Probably another callback (to decide whether to descend into a found directory/symlink)?

The filename mask issue is difficult too: either two passes should be done, or pattern matching. Or the callback should decide itself.

What if we split the problem in two? The first function scans the subdirectory structure, and the callback scans the found subdirectories for matching files? Two passes, again.

There is definitely more to this than it seems at first look.
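
One way to cope with a circular directory structure, other than limiting the depth, is to remember which directories have already been visited. The Python sketch below does that with the (device, inode) pair from os.stat. This is a swapped-in technique for illustration, not something proposed in this thread, and on Windows it depends on the file system reporting usable file IDs. walk_once and demo are hypothetical names.

```python
import os
import tempfile


def walk_once(start):
    """Yield each distinct directory reachable from start once, following links."""
    seen = set()
    pending = [start]
    while pending:
        path = pending.pop()
        info = os.stat(path)              # follows symlinks
        key = (info.st_dev, info.st_ino)
        if key in seen:
            continue                      # already visited under another name
        seen.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir():        # follows symlinks by default
                    pending.append(entry.path)


def demo():
    """root/a/loop is a symlink back to root; count the directories visited."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        os.symlink(root, os.path.join(root, "a", "loop"), target_is_directory=True)
        return len(list(walk_once(root)))
```

Here the link back to the root is followed once, recognised as already seen, and the walk terminates after two directories.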
Post 01 Feb 2010, 14:06

revolution
The MS 'search' box in explorer will search all first level directories first, then move to all second level, then third, fourth, etc. I expect they did some research and found that most searches are found close to the root. Perhaps you can consider incorporating this behaviour into your code to speed up the search.


Last edited by revolution on 01 Feb 2010, 15:37; edited 1 time in total
Post 01 Feb 2010, 14:19

hopcode
revolution wrote:
...I expect they did some research and found that most searches are found close to the root
At first glance that seems statistically correct, whether one has a few or a thousand files in the root dir, especially when using a mask.
What I would really like to have is a scan ordering:
- 1st level, 2nd level, etc. This could be done with, as baldr said, 2 dedicated procs, one for files and one for directories. I tried such a solution, but I could not avoid making it multithreaded and, as you know, that means a TEB for each directory. That is a lot of resources, which is why I would prefer one buffer/one stack.
Or? :?:

btw: updated to handle "..0" files/directories
Post 01 Feb 2010, 15:34

baldr
hopcode,

Single (MT-aware) function probably can do it.

__________
revolution,

Tagged file system, that's what we need. Associative memory. ;)
Post 01 Feb 2010, 22:58

hopcode
baldr wrote:
Single (MT-aware) function probably can do it.

Yes, with less than 1024 bytes of total stack it could turn out to be a real gem of programming. I will be glad to study and use it!

OK to general stdcall (register preserving), but EAX/EDX/ECX
will be used to convey infos to the callback (fixed at RET 12 / always 3 parameters, if implemented).
Also, no mercy for HLL programmers! Sorry :D

One needs 10 minutes to learn to use the function.
If one learns to use a function, one should learn something "deep" in the system instead of something nearer the surface.
In fact, on the user side, errors should come (apart from internal bugs in the function) from a lack of knowledge of the system rather than from
a function that tries to be user-friendly regardless of the intelligence of the programmer.
No mercy!
It will be a good chance to learn inline assembly for the future! :D

Cheers,
hopcode
Post 02 Feb 2010, 00:36

baldr
hopcode,

Got some time, wrote this:
Code:
                proc    scandir uses ebx, mask, callback, param
local .mask[MAX_PATH]:BYTE, .wfd WIN32_FIND_DATA
                mov     ecx, [mask]
                lea     edx, [.mask]
                push    edx             ; assume no path given
.copy_mask:     mov     al, [ecx]
                inc     ecx
                mov     [edx], al
                inc     edx
                cmp     al, '\'
                je      .mark_mask
                cmp     al, '/'
                jne     .test_nul
.mark_mask:     mov     [mask], ecx     ; update mask ptr
                mov     [esp], edx      ; update .mask ptr
                jmp     .copy_mask      ; no need to test
.test_nul:      test    al, al
                jnz     .copy_mask
;;; first pass, expects:
;;;    esp: pointer to pattern in .mask
;;;       N*file find handle (N==0 on entry)
;;;         saved caller's ebx
;;;  .mask: current path + pattern
.1st_pass:      invoke  FindFirstFile, addr .mask, addr .wfd
                cmp     eax, INVALID_HANDLE_VALUE
                jz      .2nd_pass       ; no files matching filename mask, scan subdirs
; found first file, cut off pattern from .mask
                mov     ebx, eax
                mov     edx, [esp]
                mov     byte[edx], 0
.check_file:    test    [.wfd.dwFileAttributes], FILE_ATTRIBUTE_DIRECTORY
                jnz     .next_file      ; skip dir
                invoke  callback, addr .mask, addr .wfd, [param]
.next_file:     invoke  FindNextFile, ebx, addr .wfd
                test    eax, eax
                jnz     .check_file
                invoke  FindClose, ebx
;;; second pass, expects:
;;;   same stack as 1st
;;;   .mask: at least current path
.2nd_pass:      mov     edx, [esp]
                mov     word[edx], '*'  ; append "*" as pattern to .mask
                invoke  FindFirstFile, addr .mask, addr .wfd
                cmp     eax, INVALID_HANDLE_VALUE
                je      .dir_done       ; failed to found any, count as done
                mov     ebx, eax
.check_dir:     mov     eax, [.wfd.dwFileAttributes]
                test    eax, FILE_ATTRIBUTE_DIRECTORY
                jz      .next_dir       ; skip non-dir
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
                test    eax, FILE_ATTRIBUTE_REPARSE_POINT
                jnz     .next_dir       ; skip symlink
                mov     eax, dword[.wfd.cFileName]
                cmp     ax, "."
                je      .next_dir       ; same for dot
                and     eax, 1 shl 24 - 1
                cmp     eax, ".."
                je      .next_dir       ; and double dot
;;; got dir to scan, modify .mask and adjust stack
                pop     edx             ; pattern pointer into .mask
                push    ebx             ; save file find handle
; append subdir name and backslash to .mask instead of "*" pattern
                lea     ecx, [.wfd.cFileName]
.append_subdir: mov     al, byte[ecx]
                inc     ecx
                mov     byte[edx], al
                inc     edx
                test    al, al
                jnz     .append_subdir
                mov     byte[edx-1], '\'
                push    edx             ; save pattern pointer
; append pattern to mask
                mov     ecx, [mask]
.cat_filemask:  mov     al, byte[ecx]
                inc     ecx
                mov     byte[edx], al
                inc     edx
                test    al, al
                jnz     .cat_filemask
                jmp     .1st_pass       ; ready to start over!
.next_dir:      invoke  FindNextFile, ebx, addr .wfd
                test    eax, eax
                jnz     .check_dir
                invoke  FindClose, ebx
;;; done with dir, backtrack
.dir_done:      pop     edx             ; pattern pointer
                pop     ebx             ; file find handle
; check for file find handles stack empty
                lea     eax, [.mask]
                cmp     esp, eax
                jnb     .done           ; that was caller's ebx
; scan .mask backward for backslash
.trim_last_dir: dec     edx
                cmp     edx, eax
                jbe     .trimmed        ; no more backslashes (mask was simply pattern)
                cmp     byte[edx-1], '\'
                jne     .trim_last_dir
.trimmed:       push    edx             ; save pattern pointer
                jmp     .next_dir       ; proceed to search with popped handle
.done:          push    ebx             ; caller's ebx was just popped at .dir_done; re-push it for the epilogue's pop
                ret
                endp    
Same idea as before: use the stack for backtracking (now a local variable's address is used as the sentinel). The function takes three parameters (path+pattern, callback, parameter), and so does the callback (path, pointer to WIN32_FIND_DATA, parameter). Both are stdcall.

Two passes are done: the first invokes the callback for each matching non-directory entry, the second iterates over all subdirectories and adjusts the mask/stack.

Estimated stack usage (data structures only): (318+2)+260+4+4*N+4 == 588+4*N, where N is the maximum nesting level.

Depth-first search is simple (stack size depends on nesting level); breadth-first requires some serious effort (a FIFO for pending subdirs, for example).
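
The two-pass scheme can be sketched in a few lines of Python. This is a simplified model rather than a translation of the listing: it recurses for brevity instead of keeping its own stack, fnmatch stands in for the FindFirstFile pattern matching, and scandir_two_pass and demo are hypothetical names.

```python
import fnmatch
import os
import tempfile


def scandir_two_pass(path, pattern, callback):
    """Pass 1 reports matching files; pass 2 descends into every subdirectory."""
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    # first pass: matching non-directory entries only
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            continue
        if fnmatch.fnmatch(entry.name, pattern):
            callback(path, entry.name)
    # second pass: all real subdirectories, whatever their names are
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):  # False for symlinks, so they are skipped
            scandir_two_pass(entry.path, pattern, callback)


def demo():
    """root/a.asm, root/sub/b.asm and root/sub/c.txt, searched for *.asm."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for rel in ("a.asm", os.path.join("sub", "b.asm"), os.path.join("sub", "c.txt")):
            open(os.path.join(root, rel), "w").close()
        found = []
        scandir_two_pass(root, "*.asm", lambda d, n: found.append(n))
        return found
```

Because the second pass ignores the pattern, b.asm in "sub" is found even though "sub" does not match *.asm.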
Post 02 Feb 2010, 20:38

hopcode
309 bytes, good work; the use of the stack is OK.
I need some time to study something in it that attracts me.

Question:
baldr wrote:
Non-default filename mask can mask out directories
An example of a non-default filename mask to test?

Minor fixes:

    - no dir notification (the callback gets only files, 20h in your old printf format); directories are important items when the function is a means of replicating directories
    - it is important that the proc skips JUNCTIONS (FILE_ATTRIBUTE_REPARSE_POINT), but one should, on the other hand, also be able to find only JUNCTIONS, for example. To do that one needs to know the nesting level (in the callback, I think), so that execution can be stopped in case of an endless loop


OK, back to one point of interest:
Quote:
...breadth-first... FIFO for pending subdirs

That is the point! Pending:
THREADING -> CREATE_SUSPENDED -> resume -> sync -> STACK

But I do not have much to say (apart from my personal experiments).
The point is just to avoid that all-consuming Explorer-like behaviour.

If one uses a control that stores strings of its own, a listbox,
say, it is simple: 1 thread, and then all of the 1st level, all of the 2nd level, etc.

A coded solution, an abstraction, I mean actual code that works and is
flexible enough to be ported etc., would surely deserve a license.

Cheers, :D
hopcode
Post 02 Feb 2010, 23:00

baldr
hopcode wrote:
example of non-default filename mask to test ?
For example, you are using that function to find *.asm files. Do you think directories not matching the *.asm pattern will be found by FindFirst/NextFile?
hopcode wrote:
no dir notification
That is related to the previous point: the file name mask restricts the set of directory names that will be passed to the callback.
hopcode wrote:
important the proc skips JUNCTIONS
That is intentional. I haven't decided yet how to handle hardlinks and symlinks.
hopcode wrote:
The point is just avoid that all-consuming-Explorer-like behaviour.
Nothing too complex about that search order: instead of pushing the current directory on the stack and scanning the found subdirectory, enqueue it. When the current subdirectory is processed, dequeue the next one and repeat. The problem is that a subdirectory reference is slightly bigger than a find file handle. ;)
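
A minimal Python sketch of that enqueue/dequeue order follows, assuming the queue holds full subdirectory paths (which is exactly the "slightly bigger" reference mentioned above). breadth_first and demo are hypothetical names.

```python
from collections import deque
import os
import tempfile


def breadth_first(start):
    """List all entries level by level: first level, then second, and so on."""
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()             # dequeue the next pending directory
        with os.scandir(current) as it:
            for entry in sorted(it, key=lambda e: e.name):
                order.append(entry.path)
                if entry.is_dir(follow_symlinks=False):
                    queue.append(entry.path)  # enqueue instead of descending now
    return order


def demo():
    """root/a/deep.txt and root/b.txt: b.txt must be listed before deep.txt."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        open(os.path.join(root, "a", "deep.txt"), "w").close()
        open(os.path.join(root, "b.txt"), "w").close()
        return [os.path.relpath(p, root) for p in breadth_first(root)]
```

A depth-first scan of the same tree would list a, a/deep.txt and then b.txt; here everything on the first level comes before anything on the second.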
Post 02 Feb 2010, 23:36