flat assembler
Message board for the users of flat assembler.
Windows > FindFiles recursive
baldr 23 Jan 2010, 14:27
hopcode,
Several comments:
1. First of all, do you have to pay for each character written in the source, or do you purchase them in bulk at a discount?
2. FindFirstFileA() limits the length of the zstring pointed to by lpFileName to MAX_PATH==260 characters (for strlen(lpFileName)>260 it fails with ERROR_FILENAME_EXCED_RANGE==206; even with strlen(lpFileName)==260 it can't find an existing file (ERROR_PATH_NOT_FOUND==3)). FindFirstFileW() accepts the "\\?\" file name prefix to overcome that limitation, but your version is apparently ANSI. Hence a 512-byte buffer is overkill.
3. Though Windows seems to handle the "*.*" filename mask specially, so that filenames without a dot are included in the results, the "*" mask looks better and is less ambiguous (is "*_*" a mask for filenames containing an underscore without an extension, or should those with an extension match too?).
4. It's better to adhere to some standard calling convention (findfiles is stdcall, so why does the callback function use a custom calling convention?); imagine that someone wants to call your function from an HLL. And what is the purpose of calling findfiles without a callback? HDD thrashing?
5. It will probably run on 9x/Me, but NT hates a misaligned stack. WIN32_FIND_DATAA is 318 bytes long; after sub esp,sizeof.WIN32_FIND_DATA, FindFirstFileA() fails with the fuzzy ERROR_NOACCESS==998 "Invalid access to memory location".
Is this a beta version? There is plenty of room for improvement…
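Point 5 comes down to rounding the local buffer up to a multiple of 4 before subtracting it from ESP. A minimal C sketch of that arithmetic, assuming a hand-written mirror of the ANSI WIN32_FIND_DATAA layout (the struct name `find_data_a` and the `<stdint.h>` field types are this sketch's assumptions, chosen so it compiles without the Windows headers):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of WIN32_FIND_DATAA; each FILETIME is two 32-bit halves. */
struct find_data_a {
    uint32_t dwFileAttributes;
    uint32_t ftCreationTime[2];
    uint32_t ftLastAccessTime[2];
    uint32_t ftLastWriteTime[2];
    uint32_t nFileSizeHigh;
    uint32_t nFileSizeLow;
    uint32_t dwReserved0;
    uint32_t dwReserved1;
    char     cFileName[260];          /* MAX_PATH */
    char     cAlternateFileName[14];
};

/* Round n up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Bytes actually occupied by the fields: 44 + 260 + 14 = 318. */
static size_t find_data_payload(void)
{
    return offsetof(struct find_data_a, cAlternateFileName) + 14;
}
```

A C compiler pads this struct to 320 bytes on its own; in FASM, where `sizeof.WIN32_FIND_DATA` is the raw 318, the same rounding is written as `(size+3) and -4`.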
hopcode 25 Jan 2010, 00:57
I didn't understand question no. 1, but it put me in a good mood, and that earns you a point. Good; and yes..., the nested proc is not as severe a problem as your list of comments and requirements makes it look (I know that you know, but I know it too). Anyway, it's a good list of requirements. I will try to do my best with it, even if perhaps I am expecting too much from you (baldr), especially concrete code.
baldr wrote: ...improvements
Just before/at the ret 4 - ret 12 of the nested procs. I don't like symmetry. Symmetry sounds... diabolic to me.
Regards, hopcode
Borsuc 25 Jan 2010, 20:14
Nice. Maybe you should call it "ListFilenames", because at first I thought it returned handles to the files. (I know, bad, bad thinking, lol.)
baldr 26 Jan 2010, 00:00
hopcode,
Here goes my attempt at a similar program (slightly tested):
Code:
format PE console
include "Win32A.Inc"
    stdcall scan_dir, _allfiles
    ret
scan_dir:
    push ebp
    mov ebp, esp
define file_mask ebp+8
; start directory scan (subdirectory too)
find_first:
    push ebx ; save caller ebx/handle
    invoke FindFirstFile, [file_mask], wfd
    cmp eax, INVALID_HANDLE_VALUE
    je step_back ; no files, return to prev. dir
    mov ebx, eax
output:
    cinvoke printf, _fmt, [wfd.dwFileAttributes], wfd.cFileName
    mov eax, [wfd.dwFileAttributes]
    test eax, FILE_ATTRIBUTE_DIRECTORY
    jz find_next ; not directory, continue scan
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
    test eax, FILE_ATTRIBUTE_REPARSE_POINT
    jnz find_next ; symbolic link, continue scan
    mov eax, dword[wfd.cFileName]
    cmp ax, "."
    je find_next ; same for dot
    and eax, 1 shl 24 - 1
    cmp eax, ".."
    je find_next ; and double dot
; descend into dir and start scan again
    invoke SetCurrentDirectory, wfd.cFileName
    jmp find_first
find_next:
    invoke FindNextFile, ebx, wfd
    test eax, eax
    jnz output
    invoke FindClose, ebx
step_back:
    pop ebx ; dir scanned, restore handle/caller ebx
    cmp esp, ebp
    jnb done ; no more ebx on stack, done
    invoke SetCurrentDirectory, _updir ; ascend one dir level up
    jmp find_next
done:
    pop ebp
    retn 4
_allfiles TCHAR "*", 0
_updir TCHAR "..", 0
_fmt TCHAR "%8x %.260s", 10, 0
align 4
data import
    library Kernel32, "Kernel32", MSVCRT, "MSVCRT"
    import Kernel32,\
        FindFirstFile, "FindFirstFileA",\
        FindNextFile, "FindNextFileA",\
        FindClose, "FindClose",\
        SetCurrentDirectory, "SetCurrentDirectoryA"
    import MSVCRT, printf, "printf"
end data
wfd WIN32_FIND_DATA
It works from the current directory and can be modified to accept a path in the file mask (then the SetCurrentDirectory calls would be unnecessary). The callback is embedded into the function (printf) for simplicity.
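The `mov eax, dword[wfd.cFileName]` trick reads the first four name bytes as one little-endian integer: "." followed by NUL is 0x002E in the low word, and ".." followed by NUL is 0x002E2E in the low three bytes, which is why EAX is masked with `1 shl 24 - 1` before the second compare. A C sketch of the same test (the function name `is_dot_entry` is hypothetical); it copies at most four bytes and stops at the terminator, so it never reads past a short string:

```c
#include <stdint.h>

/* Returns 1 for the "." and ".." pseudo-entries, 0 for anything else. */
static int is_dot_entry(const char *name)
{
    uint32_t w = 0;
    for (int i = 0; i < 4 && name[i] != '\0'; i++)
        w |= (uint32_t)(unsigned char)name[i] << (8 * i);  /* little-endian pack */

    if ((w & 0xFFFFu) == 0x002Eu)       /* '.' then NUL       -> "."  */
        return 1;
    if ((w & 0xFFFFFFu) == 0x002E2Eu)   /* '.', '.', then NUL -> ".." */
        return 1;
    return 0;
}
```

Comparing all 32 bits without the mask would also depend on the byte after the terminator, so stale buffer contents could stop ".." from being recognized.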
hopcode 26 Jan 2010, 11:43
baldr, I am deeply touched by your effort.
That really surprises me, and it is a good feeling. OK, I will not post anything before tonight or tomorrow night because I am busy at the moment (so I will use some hours of lighter work to think). My solution skips the last directory (or more); yours falls into infinite recursion, perhaps because it is already in the current dir. What follows is a temporary solution: it does not distinguish between an error and other return values, but it avoids the infinite loop. Note that every matching directory is forced through SetCurrentDirectory, and this slows the whole proc down somewhat.
Code:
    ; descend into dir and start scan again
    invoke SetCurrentDirectory, wfd.cFileName
    test eax,eax
    jz find_first
Anyway, that is not the point. The point is that I really like this, and I will work on it:
Code:
step_back:
    pop ebx ; dir scanned, restore handle/caller ebx
    cmp esp, ebp
    jnb done ; no more ebx on stack, done
- FILE_ATTRIBUTE_REPARSE_POINT, i.e. <JUNCTION>, is a MUST-DO; I have experienced this http://blogs.msdn.com/oldnewthing/archive/2004/12/27/332704.aspx on a bad install of IExplorer.
- A path in the mask is a really good idea.
Borsuc wrote: ...call it "ListFilenames"...
Thanks for the help,
Hear from you soon,
hopcode
baldr 26 Jan 2010, 13:57
hopcode wrote: …your solution fails in a recursive infinite, because perhaps already in the current dir.
Code:
output:
    sub esp, (MAX_PATH+3) and -4
    invoke GetCurrentDirectory, MAX_PATH, esp
    cinvoke printf, _fmt, [wfd.dwFileAttributes], wfd.cFileName, esp
    add esp, (MAX_PATH+3) and -4
    ...
_fmt TCHAR "%8x %.260s", 10, "%.260s", 10, 0
hopcode wrote: Note that all matching files are forced in a SetCurDir, and this is slowing somehow the whole proc.
There is another problem: hardlinks. It's not easy to ensure that the callback will be called only once for each unique file (as opposed to each name). Probably this is not an issue; let the callback handle it. My function isn't thread-safe due to the static wfd. This is because I was focused on the iterative approach; wfd can be made automatic. Perhaps I'll write another version, for something like scandir "C:\WINDOWS\*.DLL".
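One way the callback could handle hardlinks, sketched in C under assumptions: each found file is identified by its volume serial number plus 64-bit file index (the pair Windows reports in BY_HANDLE_FILE_INFORMATION via GetFileInformationByHandle, which needs an opened handle), and a small table remembers the pairs already reported. The `file_id` struct and `seen_before` helper are hypothetical names, and a real tool would use a hash set rather than a linear scan:

```c
#include <stddef.h>
#include <stdint.h>

struct file_id {
    uint32_t volume_serial;   /* dwVolumeSerialNumber */
    uint64_t file_index;      /* (nFileIndexHigh << 32) | nFileIndexLow */
};

#define MAX_SEEN 1024

/* Static table: like the static wfd above, this is not thread-safe. */
static struct file_id seen[MAX_SEEN];
static size_t nseen;

/* Returns 1 if this id was already reported; otherwise records it and
   returns 0. When the table is full, the id is not recorded. */
static int seen_before(struct file_id id)
{
    for (size_t i = 0; i < nseen; i++)
        if (seen[i].volume_serial == id.volume_serial &&
            seen[i].file_index == id.file_index)
            return 1;
    if (nseen < MAX_SEEN)
        seen[nseen++] = id;
    return 0;
}
```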
hopcode 27 Jan 2010, 16:10
The listfiles proc is OK now:
- fixed the dir scan
- proc reduced by a few bytes
I haven't found a good way yet to avoid repeating some lines. The other fixes from baldr's requirements list I will do later.
OK, shall we make it a challenge? If anyone is interested, you could propose your version under the following condition:
- no UNC paths, for simplicity (but you may support them too)
That's all. In the meantime I will try to extract a timer proc framework from other threads to use in the challenge.
Cheers, hopcode
Borsuc 27 Jan 2010, 17:24
The best way to handle junctions, IMO, is to just ignore them.
hopcode 31 Jan 2010, 04:38
Here is a new version, only 308 bytes. It is different in design (I prefer this new one).
EDIT: fixed the bug with "..0" as a file/directory name.
EDIT: listfiles returns EAX = number of items found (files/dirs).
- It is useful to know the nesting level (in CX) at each moment.
- It is also useful to have the number of items at each callback notification.
EDIT: added userparam, passed to the callback.
The callback follows the Win32 API (stdcall) convention: it must preserve EBP/EBX/ESI/EDI. It could be improved a lot, but first I will aim for a reasonable design.
Usage:
Code:
    push userparam ;param = 0 /param = value
    push callback ;0 no callback / callback address
    push FILE_ATTRIBUTE_READONLY or FILE_ATTRIBUTE_DIRECTORY ;filter
    push 1 ;maxlevel
    push 0 ;szMaskFile
    push szPath
    call listfiles
- maxlevel can be -1 = all subdirectories / 0 = current directory / N = depth
- filter acts as an OR (at least one of the flags must match) before the file/dir is notified to the callback (i.e. no match -> no notification)
- userparam is passed to the callback
Example:
Code:
proc callback \
     _w32fd,\
     _numitems,\
     _userparam
    push ebx
    push edi
    push esi
    mov ebx,[_userparam] ;<------ userparam
    ;IN EAX=dirname
    ;IN EDX= filename or dir
    ;IN CX = level / 0=current dir ;<------------------
    ;IN SHR ECX,16 = len of EAX ;<----------------
    ;RET EAX=0 stop execution
    push edx
    push eax
    push szFormat
    cinvoke printf
    add esp,12
    xor eax,eax
    inc eax
    pop esi
    pop edi
    pop ebx
    ret
endp
And the proc listfiles:
Code:
listfiles:
    push ebp
    mov ebp,esp
    sub esp,512+320+4+4+4
label .upath dword at ebp+8
label .umask dword at ebp+12
label .ulevel dword at ebp+16
label .ufilter dword at ebp+20
label .ucback dword at ebp+24
label .uparam dword at ebp+28
label .path dword at ebp-(512+320+12)
label .w32fd dword at ebp-(320+12)
label .nitems dword at ebp-12
label .level dword at ebp-8
label .mask dword at ebp-4
    push ebx
    push edi
    push esi
    mov esi,[.upath] ;user path
    lea ecx,[.path]
    mov edx,[.umask]
    xor eax,eax
    test esi,esi
    mov [.level],eax
    mov [.nitems],eax
    jz .err_lf
    mov edi,ecx
@@:
    lodsb
    stosb
    test al,al
    jnz @b
    sub edi,ecx
    dec edi
    xchg ecx,edi ;len of path
    lea esi,[.w32fd]
    test edx,edx
    jnz @f
    mov edx,.def_mask
@@:
    push .err_lf
    mov [.mask],edx
.listit:
    ;IN EDI path
    ;IN ESI w32fd
    ;IN ECX len
    push ebx
    sub esp,4
    xor eax,eax
    mov [esp],ecx
    inc [.level]
    mov [esi+2ch],eax
    push esi
    push edi
    push esi
    push edi
    mov al,"\"
    add edi,ecx
    stosb
    mov esi,[.mask]
@@:
    lodsb
    stosb
    test al,al
    jnz @b
    invoke FindFirstFile
    pop edi
    pop esi
    or eax,eax
    jle .err_liB
    mov ebx,eax
.next_liA:
    lea edx,[esi+2Ch] ;WIN32_FIND_DATA.cFileName
    mov eax,[edx]
    cmp ax,002Eh
    jz .next_liB
    cmp eax,2E2Eh
    jz .next_liB
    ;--------- match filter ------
    mov ecx,[.ufilter]
    and ecx,[esi]
    test ecx,ecx
    jz .next_liC
    inc [.nitems]
    mov ecx,[esp]
    mov eax,edi
    push ebx
    push edi
    xor ebx,ebx
    push ebp
    mov dword [edi+ecx],ebx
    push esi
    rol ecx,16
    mov ebx,[.ucback]
    mov cx,word[.level]
    test ebx,ebx
    jz .next_liE
    dec ecx
    push [.uparam]
    push [.nitems]
    push esi
    call ebx
.next_liE:
    pop esi
    pop ebp
    pop edi
    pop ebx
    test eax,eax
    jz .next_liD
.next_liC:
    mov eax,[esi] ; +WIN32_FIND_DATA.dwFileAttributes
    mov ecx,[esp]
    test al,FILE_ATTRIBUTE_DIRECTORY
    jz .next_liB
    ;--------- match level -------
    mov eax,[.ulevel]
    inc eax
    jz @f
    cmp eax,[.level]
    jbe .next_liB
@@:
    push esi
    push edi
    mov al,"\"
    add edi,ecx
    stosb
    lea esi,[esi+2Ch] ;WIN32_FIND_DATA.cFileName
@@:
    lodsb
    stosb
    inc ecx
    test al,al
    jnz @b
    pop edi
    pop esi
    call .listit
    test eax,eax
    jz .next_liD
.next_liB:
    push esi
    push ebx
    invoke FindNextFile
    test eax,eax
    jnz .next_liA
    inc eax
.next_liD:
    push eax
    push ebx
    invoke FindClose
    dec [.level]
    pop eax
.err_liB:
    add esp,4
    pop ebx
    ret 0
.err_lf:
    pop esi
    pop edi
    pop ebx
    mov eax,[.nitems]
    add esp,4+4+4+320+512
    pop ebp
    ret 24
align 4
.def_mask: db "*",0,0,0
Cheers, hopcode
Last edited by hopcode on 02 Feb 2010, 00:15; edited 2 times in total
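The rules listfiles applies around each callback can be restated in C so they are easy to check. This is a sketch under assumptions: the helper names are hypothetical, depth is 0 for the start directory (the value the callback sees in CX), and the ECX layout (CX = level, high word = length of the directory string in EAX) is taken from the comments in the callback example.

```c
#include <stdint.h>

#define FILE_ATTRIBUTE_READONLY  0x01u   /* values as in the Windows headers */
#define FILE_ATTRIBUTE_DIRECTORY 0x10u
#define FILE_ATTRIBUTE_ARCHIVE   0x20u

/* filter acts as an OR: notify if at least one requested flag is set. */
static int passes_filter(uint32_t filter, uint32_t attrs)
{
    return (filter & attrs) != 0;
}

/* maxlevel: -1 = unlimited, 0 = start dir only, N = descend N levels.
   depth is the level of the directory being listed (0 = start dir). */
static int may_descend(int maxlevel, unsigned depth)
{
    return maxlevel == -1 || depth < (unsigned)maxlevel;
}

/* ECX as seen by the callback: low word = nesting level, high word = path length. */
static uint32_t pack_ecx(uint16_t path_len, uint16_t level)
{
    return ((uint32_t)path_len << 16) | level;
}

static uint16_t ecx_level(uint32_t ecx)    { return (uint16_t)(ecx & 0xFFFFu); }
static uint16_t ecx_path_len(uint32_t ecx) { return (uint16_t)(ecx >> 16); }
```

With the usage example's `push 1 ;maxlevel`, the start directory and its immediate subdirectories are listed (depths 0 and 1), and nothing deeper.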
baldr 01 Feb 2010, 12:07
hopcode,
Code:
void my_function(void *param)
{
    ...
    if (param == 0)
        listfiles(..., parameterless_callback, param); // this one with ret 0
    else
        listfiles(..., parameterised_callback, param); // this one with ret 4
    ...
}
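The C side of this point: if the callback always takes the same parameters, including an opaque user pointer that may simply be NULL, the caller never has to pick between two callbacks with different `ret N`. A minimal sketch under assumptions: `list_cb` and the `fake_listfiles` driver (which just feeds a fixed list of names) are hypothetical stand-ins, not hopcode's listfiles, and the return-0-to-stop rule mirrors the asm callback's "RET EAX=0 stop execution" comment.

```c
#include <stddef.h>

/* One fixed callback signature; userparam may be NULL. Return 0 to stop. */
typedef int (*list_cb)(const char *dir, const char *name,
                       unsigned level, void *userparam);

/* Hypothetical driver: feeds a NULL-terminated list of names to the callback. */
static unsigned fake_listfiles(const char *dir, const char *const *names,
                               list_cb cb, void *userparam)
{
    unsigned n = 0;
    for (; *names; names++) {
        n++;
        if (cb && !cb(dir, *names, 0, userparam))
            break;               /* callback asked to stop */
    }
    return n;                    /* items seen, like listfiles' EAX */
}

/* Works with or without a parameter: counts names if given a counter. */
static int count_cb(const char *dir, const char *name,
                    unsigned level, void *userparam)
{
    (void)dir; (void)name; (void)level;
    if (userparam)
        ++*(unsigned *)userparam;
    return 1;                    /* keep going */
}
```

Under stdcall the callee still pops a fixed 16 bytes here; the point is that the byte count no longer depends on whether a parameter is wanted.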
hopcode 01 Feb 2010, 13:02
baldr wrote: stdcall or cdecl conventions for that.
Yes, you are completely right about that. RET 0 / RET 4 is somewhat tedious. I cannot decide whether it should be a fixed RET 4/8 (one/two params to the callback) or RET 0. Your opinion? Mine is that the more fixed the interface, the more stable it feels to the user, and... consequently the user believes/acts as an intelligent person. Perhaps 2 params, EAX/ECX/EDX for info, and a fixed RET 8 in the callback, and we go whole hog.
NOTE about symlinks: the new version gives the nesting level in CX.
EDIT
baldr wrote: ..0 is a valid name for file/directory.
Important, thanks for reporting. A fix is already on the way.
baldr 01 Feb 2010, 14:06
hopcode,
Register parameters are good for a callback function written in assembly language, but fastcall differs between x86-32 compilers; probably stdcall will fit the bill? cdecl can be good too, if the parameters are const (so the callback can't modify them, and the caller can simply pop them back). The nesting level can't help to detect a circular directory structure, though it can limit recursion (if FindFirstFile doesn't already). Perhaps another callback (to decide whether to descend into a found directory/symlink)? The filename mask issue is difficult too: either two passes should be done, or pattern matching. Or the callback should decide itself. What if we split the problem in two? The first function scans the subdirectory structure, and the callback scans the found subdirectories for matching files? Two passes, again. Definitely there is more to this than it seems at first look.
revolution 01 Feb 2010, 14:19
The MS 'search' box in Explorer searches all first-level directories first, then moves on to all second-level ones, then the third, fourth, etc. I expect they did some research and found that most searches are found close to the root. Perhaps you could consider incorporating this behaviour into your code to speed up the search.
Last edited by revolution on 01 Feb 2010, 15:37; edited 1 time in total
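The level-by-level order described here is a breadth-first traversal: subdirectories found while listing one level wait in a FIFO until every directory of that level is done. A minimal C sketch over a hypothetical in-memory tree (`struct dir` and the fixed queue size are assumptions; on a real disk each queue entry would be a full path string, which is where the memory cost lives):

```c
#include <stddef.h>

struct dir {
    const char *name;
    struct dir **subdirs;        /* NULL-terminated array, or NULL if none */
};

#define QCAP 64                  /* queue slots are never reused, so this
                                    bounds the total number of directories */

/* Writes directories to out[] in breadth-first order.
   Returns the number written, or (size_t)-1 if the FIFO overflows. */
static size_t walk_bfs(struct dir *root, struct dir **out, size_t cap)
{
    struct dir *queue[QCAP];
    size_t head = 0, tail = 0, n = 0;

    queue[tail++] = root;
    while (head < tail && n < cap) {
        struct dir *d = queue[head++];       /* oldest pending directory */
        out[n++] = d;                        /* report it */
        for (struct dir **s = d->subdirs; s && *s; s++) {
            if (tail == QCAP)
                return (size_t)-1;           /* too many pending dirs */
            queue[tail++] = *s;              /* visit after this level */
        }
    }
    return n;
}
```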
hopcode 01 Feb 2010, 15:34
revolution wrote: ...I expect they did some research and found that most searches are found close to the root
What I would really like to have is a scan ordering: 1st level, 2nd level, etc. This could be done, as baldr said, with 2 dedicated procs, one for files and one for directories. I tried such a solution, but I could not avoid making it multithreaded, and, as you know, that means a TEB for each directory. That is a lot of resources. That is the reason why I would prefer one buffer/one stack.
BTW: updated to parse ..0 files/directories.
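One way to get that level ordering with a single buffer and a single stack, at the price of re-reading the upper levels, is iterative deepening: run a depth-limited scan with limits 0, 1, 2, ... and report only entries at exactly the current limit. This technique is not proposed in the thread; it is a sketch over a hypothetical in-memory tree (`struct dir` is an assumption):

```c
#include <stddef.h>

struct dir {
    const char *name;
    struct dir **subdirs;        /* NULL-terminated array, or NULL if none */
};

/* Depth-limited DFS: report only directories exactly at depth `target`. */
static void visit_at(struct dir *d, int depth, int target,
                     struct dir **out, size_t cap, size_t *n)
{
    if (depth == target) {
        if (*n < cap)
            out[(*n)++] = d;
        return;                  /* no need to go deeper on this pass */
    }
    for (struct dir **s = d->subdirs; s && *s; s++)
        visit_at(*s, depth + 1, target, out, cap, n);
}

/* Level order using only the recursion stack: limits 0, 1, 2, ...
   Stops when a pass adds nothing (tree exhausted or out[] full). */
static size_t walk_iddfs(struct dir *root, struct dir **out, size_t cap)
{
    size_t n = 0, before;
    int target = 0;
    do {
        before = n;
        visit_at(root, 0, target++, out, cap, &n);
    } while (n > before);
    return n;
}
```

With listfiles itself, the equivalent would be repeated calls with maxlevel 0, 1, 2, ..., with the callback ignoring entries whose CX is below the current limit and the loop ending when a pass reports nothing new.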
baldr 01 Feb 2010, 22:58
hopcode,
A single (MT-aware) function can probably do it.
__________
revolution,
A tagged file system, that's what we need. Associative memory.
hopcode 02 Feb 2010, 00:36
baldr wrote: Single (MT-aware) function probably can do it.
Yes, with less than 1024 bytes of total stack it could be a real gem of programming. I will be glad to study and use it!
OK about general _stdcall (register preservation), but EAX/EDX/ECX will be used to convey info to the callback (a fixed RET 12, i.e. always 3 parameters, if implemented).
Also, no mercy for HLL programmers! Sorry. One needs 10 minutes to learn to use the function. If one learns to use a function, one should learn something deeper about the system rather than something on the surface. In fact, from the user's side, errors should come (apart from internal bugs in the function) from a lack of knowledge of the system rather than from a function trying to be user-friendly regardless of the programmer's intelligence. No mercy! It will be a good chance to learn inline assembly for the future!
Cheers, hopcode
baldr 02 Feb 2010, 20:38
hopcode,
Got some time and wrote this:
Code:
proc scandir uses ebx, mask, callback, param
    local .mask[MAX_PATH]:BYTE, .wfd WIN32_FIND_DATA
    mov ecx, [mask]
    lea edx, [.mask]
    push edx ; assume no path given
.copy_mask:
    mov al, [ecx]
    inc ecx
    mov [edx], al
    inc edx
    cmp al, '\'
    je .mark_mask
    cmp al, '/'
    jne .test_nul
.mark_mask:
    mov [mask], ecx ; update mask ptr
    mov [esp], edx ; update .mask ptr
    jmp .copy_mask ; no need to test
.test_nul:
    test al, al
    jnz .copy_mask
;;; first pass, expects:
;;; esp: pointer to pattern in .mask
;;;      N*file find handle (N==0 on entry)
;;;      saved caller's ebx
;;; .mask: current path + pattern
.1st_pass:
    invoke FindFirstFile, addr .mask, addr .wfd
    cmp eax, INVALID_HANDLE_VALUE
    jz .2nd_pass ; no files matching filename mask, scan subdirs
; found first file, cut off pattern from .mask
    mov ebx, eax
    mov edx, [esp]
    mov byte[edx], 0
.check_file:
    test [.wfd.dwFileAttributes], FILE_ATTRIBUTE_DIRECTORY
    jnz .next_file ; skip dir
    invoke callback, addr .mask, addr .wfd, [param]
.next_file:
    invoke FindNextFile, ebx, addr .wfd
    test eax, eax
    jnz .check_file
    invoke FindClose, ebx
;;; second pass, expects:
;;; same stack as 1st
;;; .mask: at least current path
.2nd_pass:
    mov edx, [esp]
    mov word[edx], '*' ; append "*" as pattern to .mask
    invoke FindFirstFile, addr .mask, addr .wfd
    cmp eax, INVALID_HANDLE_VALUE
    je .dir_done ; failed to find any, count as done
    mov ebx, eax
.check_dir:
    mov eax, [.wfd.dwFileAttributes]
    test eax, FILE_ATTRIBUTE_DIRECTORY
    jz .next_dir ; skip non-dir
FILE_ATTRIBUTE_REPARSE_POINT = 0x400
    test eax, FILE_ATTRIBUTE_REPARSE_POINT
    jnz .next_dir ; skip symlink
    mov eax, dword[.wfd.cFileName]
    cmp ax, "."
    je .next_dir ; same for dot
    and eax, 1 shl 24 - 1
    cmp eax, ".."
    je .next_dir ; and double dot
;;; got dir to scan, modify .mask and adjust stack
    pop edx ; pattern pointer into .mask
    push ebx ; save file find handle
; append subdir name and backslash to .mask instead of "*" pattern
    lea ecx, [.wfd.cFileName]
.append_subdir:
    mov al, byte[ecx]
    inc ecx
    mov byte[edx], al
    inc edx
    test al, al
    jnz .append_subdir
    mov byte[edx-1], '\'
    push edx ; save pattern pointer
; append pattern to mask
    mov ecx, [mask]
.cat_filemask:
    mov al, byte[ecx]
    inc ecx
    mov byte[edx], al
    inc edx
    test al, al
    jnz .cat_filemask
    jmp .1st_pass ; ready to start over!
.next_dir:
    invoke FindNextFile, ebx, addr .wfd
    test eax, eax
    jnz .check_dir
    invoke FindClose, ebx
;;; done with dir, backtrack
.dir_done:
    pop edx ; pattern pointer
    pop ebx ; file find handle
; check for file find handles stack empty
    lea eax, [.mask]
    cmp esp, eax
    jnb .done ; that was caller's ebx
; scan .mask backward for backslash
.trim_last_dir:
    dec edx
    cmp edx, eax
    jbe .trimmed ; no more backslashes (mask was simply pattern)
    cmp byte[edx-1], '\'
    jne .trim_last_dir
.trimmed:
    push edx ; save pattern pointer
    jmp .next_dir ; proceed to search with popped handle
.done:
    ret
endp
Two passes are done: the first invokes the callback for each matching non-directory entry; the second iterates over all subdirectories and adjusts the mask/stack. Estimated stack usage (data structures only): (318+2)+260+4+4*N+4 == 588+4*N, where N is the maximum nesting level. Depth-first search is simple (stack size depends on the nesting level); breadth-first requires some serious effort (a FIFO for pending subdirs, for example).
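The .mask handling in scandir (overwrite the pattern with "subdir\" plus the pattern, then on the way back scan backwards for the previous backslash) can be sketched in C with offsets instead of pointers. The helper names and the 260-byte capacity are assumptions; only '\' is treated as a separator here, and unlike the asm this sketch checks that the result fits:

```c
#include <stddef.h>
#include <string.h>

#define PATH_CAP 260   /* assumed buffer size, mirroring MAX_PATH */

/* buf holds "<path><pattern>"; pat is the offset where the pattern starts
   (just after the last backslash, or 0 if there is no path part).
   Replace the pattern with "<subdir>\<mask>" and return the new pattern
   offset, or -1 if the result would not fit in PATH_CAP bytes. */
static int descend(char *buf, int pat, const char *subdir, const char *mask)
{
    size_t n = strlen(subdir), m = strlen(mask);
    if ((size_t)pat + n + 1 + m + 1 > PATH_CAP)
        return -1;
    memcpy(buf + pat, subdir, n);
    buf[pat + n] = '\\';
    memcpy(buf + pat + n + 1, mask, m + 1);   /* copies the NUL too */
    return pat + (int)n + 1;
}

/* Undo one descend(): step back from the backslash that ends the current
   directory name to just after the previous backslash (or to offset 0).
   Returns the parent's pattern offset; the caller rewrites the pattern. */
static int ascend(const char *buf, int pat)
{
    int i = pat - 1;
    while (i > 0 && buf[i - 1] != '\\')
        i--;
    return i;
}
```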
hopcode 02 Feb 2010, 23:00
309 bytes, good work; the stack usage is OK.
I need some time to study something in it that attracts me.
Question:
baldr wrote: Non-default filename mask can mask out directories
An example of a non-default filename mask to test?
Minor fixes:
- no dir notification (only files reach the callback, 20h using your old printf format); directories are important items when the function is used to replicate directory trees
- important: the proc skips JUNCTIONS (FILE_ATTRIBUTE_REPARSE_POINT), but one should, on the contrary, be able to find only JUNCTIONS, for example. To do that one needs to know the nesting level (in the callback, I think), so that execution can be stopped in an endless loop.
OK, back to one point of interest:
Quote: ...breadth-first... FIFO for pending subdirs
That is the point! Pending: THREADING -> CREATE_SUSPENDED -> resume -> sync -> STACK. But I don't have much to say (apart from my personal experiments). The point is just to avoid that all-consuming, Explorer-like behaviour. If one uses a control that stores its own strings, a listbox for example, it is simple: 1 thread, then 1st ALL-level, 2nd ALL-levels, etc. A coded solution, an abstraction, I mean actual code that works and is flexible enough to be ported etc., would surely deserve a license.
Cheers, hopcode
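To find only junctions instead of skipping them, the callback needs more than FILE_ATTRIBUTE_REPARSE_POINT, since symbolic links carry that attribute too. When the attribute is set, WIN32_FIND_DATA's dwReserved0 holds the reparse tag, and a junction has tag IO_REPARSE_TAG_MOUNT_POINT (volume mount points share that tag). A C sketch, assuming the constant values below match the Windows headers (they are redefined here so the sketch compiles anywhere; `classify_entry` and the enum are hypothetical names):

```c
#include <stdint.h>

#define FILE_ATTRIBUTE_DIRECTORY     0x00000010u
#define FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u
#define IO_REPARSE_TAG_MOUNT_POINT   0xA0000003u   /* junctions, mount points */
#define IO_REPARSE_TAG_SYMLINK       0xA000000Cu

enum entry_kind { KIND_FILE, KIND_DIR, KIND_JUNCTION, KIND_SYMLINK, KIND_OTHER_REPARSE };

/* attrs = dwFileAttributes; tag = dwReserved0, meaningful only when the
   reparse-point attribute is set. */
static enum entry_kind classify_entry(uint32_t attrs, uint32_t tag)
{
    if (attrs & FILE_ATTRIBUTE_REPARSE_POINT) {
        if (tag == IO_REPARSE_TAG_MOUNT_POINT)
            return KIND_JUNCTION;
        if (tag == IO_REPARSE_TAG_SYMLINK)
            return KIND_SYMLINK;
        return KIND_OTHER_REPARSE;
    }
    return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? KIND_DIR : KIND_FILE;
}
```

A junctions-only scan would notify on KIND_JUNCTION and, if it also follows them, still needs a depth limit to escape loops.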
baldr 02 Feb 2010, 23:36
hopcode wrote: example of non-default filename mask to test ?
hopcode wrote: no dir notification
hopcode wrote: important the proc skips JUNCTIONS
hopcode wrote: The point is just avoid that all-consuming-Explorer-like behaviour.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.