flat assembler
Message board for the users of flat assembler.

STDIN/STDOUT to/from memory buffer; Battle FASM vs C++

JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
I am posting the question here because I need solutions for both Linux and Win32.
I need to read a huge file from STDIN into a buffer in memory.

How can I determine the needed memory size before reading from the file?
AFAIK STDIN does not support GetFileSize. I am not sure about Linux.

Any considerations?

[edit]Since the topic has shifted, I changed its title slightly.[/edit]

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9


Last edited by JohnFound on 09 May 2012, 16:54; edited 1 time in total
Post 01 May 2012, 15:52
LostCoder



Joined: 07 Mar 2012
Posts: 22
LostCoder
Realloc? Allocate some memory, for example 4096 bytes. When it is full, add 4096 bytes to the existing size, realloc the buffer and continue reading.
Post 01 May 2012, 16:22
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Does the buffer need to be 100% contiguous? If not, do a linked list of decently-sized chunks.

If it does, consider mmap'ing a region that will "surely be large enough", but only reserve the region - then commit pages as you go.
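
A minimal C sketch of the reserve-then-commit idea, assuming Linux. The helper names (reserve_region, commit_prefix, release_region) are made up for illustration, and MAP_NORESERVE is a Linux-specific flag; on Windows the rough equivalent would be VirtualAlloc with MEM_RESERVE followed by MEM_COMMIT.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve max_size bytes of address space. Nothing is readable or
   writable yet, so no memory is actually used. Returns NULL on failure. */
static void *reserve_region(size_t max_size)
{
    void *p = mmap(NULL, max_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make the first size bytes of the region usable. Call it again with a
   larger size as the input grows; the base address never changes. */
static int commit_prefix(void *base, size_t size)
{
    assert(base != NULL);
    return mprotect(base, size, PROT_READ | PROT_WRITE);
}

/* Give the whole reservation back to the OS. */
static void release_region(void *base, size_t max_size)
{
    munmap(base, max_size);
}
```

The point of the design is that the buffer stays contiguous and never moves, so there is no copying when it grows. The price is that you have to pick an upper bound up front.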
Post 01 May 2012, 17:52
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
LostCoder wrote:
Realloc? Allocate some memory, for example 4096 bytes. When it is full, add 4096 bytes to the existing size, realloc the buffer and continue reading.


Yes this is how I do it in C. Read and allocate +2 bytes until CR/LF.

in C
Code:
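#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
char * buffer = 0;
int    _char  = 0;   // int, not char, so EOF can be told apart from data
int    _read  = 0;

while( (_char = getchar()) != '\n' && _char != EOF)
{
   buffer = (char*)realloc(buffer,_read+++2);   // parses as (_read++) + 2
   if(!buffer)  // if the first call of realloc fails
      break;

 buffer[_read-1]   = (char)_char;
 buffer[_read]     = '\000';
}

puts("Read: ");
if(buffer)   // buffer is still NULL if nothing was read
{
     puts(buffer);
     free(buffer);
}

getchar();

return 0;
}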
    



FASM

Code:

You do it.

    


Last edited by typedef on 01 May 2012, 18:24; edited 2 times in total
Post 01 May 2012, 17:55
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
typedef wrote:
LostCoder wrote:
Realloc? Allocate some memory, for example 4096 bytes. When it is full, add 4096 bytes to the existing size, realloc the buffer and continue reading.

Yes this is how I do it in C. Read and allocate +2 bytes until CR/LF.

Are you insane?

Realloc in itself is bad enough, you'll potentially end up with an über-fragmented heap... and allocating just 2 bytes extra? Ugh.

PS: on some OSes you might be able to use stat() or similar system calls, but it's by no means portable, and you might get the size of some internal pipe buffer rather than the actual file size. I'd recommend the mmap solution if you really need the buffer to be contiguous, or a linked list of chunks if you can adapt your processing code.
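
A rough C sketch of the linked-list-of-chunks variant, using stdio for brevity. The names (struct chunk, read_chunks, free_chunks) and the 64 KB chunk size are arbitrary choices for the example, not anything from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 65536   /* arbitrary size picked for the sketch */

struct chunk {
    struct chunk *next;
    size_t used;                     /* valid bytes in data[] */
    unsigned char data[CHUNK_SIZE];
};

/* Read in up to EOF into a list of fixed-size chunks. Returns the head
   of the list (NULL for empty input) and stores the byte count in *total.
   If malloc fails, the chunks read so far are returned. */
static struct chunk *read_chunks(FILE *in, size_t *total)
{
    struct chunk *head = NULL, **tail = &head;

    assert(in != NULL && total != NULL);
    *total = 0;
    for (;;) {
        struct chunk *c = malloc(sizeof *c);
        if (!c)
            break;
        c->next = NULL;
        c->used = fread(c->data, 1, CHUNK_SIZE, in);
        if (c->used == 0) {          /* nothing left to read */
            free(c);
            break;
        }
        *tail = c;                   /* append to the list */
        tail = &c->next;
        *total += c->used;
        if (c->used < CHUNK_SIZE)    /* fread comes up short only on EOF or error */
            break;
    }
    return head;
}

static void free_chunks(struct chunk *c)
{
    while (c) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
}
```

Nothing is ever copied or resized here, which is why it avoids the realloc problem. The processing code has to cope with data that crosses chunk boundaries, though.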

_________________
Image - carpe noctem
Post 01 May 2012, 17:56
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
JohnFound wrote:
I am posting the question here because I need solutions for both Linux and Win32.
I need to read a huge file from STDIN into a buffer in memory.

How can I determine the needed memory size before reading from the file?
AFAIK STDIN does not support GetFileSize. I am not sure about Linux.


I think you answered your own question. If it's a file, it's not a stream, so why limit yourself to stream activities? If it's always going to be a file, treat it as a file and ignore the stream stuff.

EDIT: At the very least, you can probably externally list and then pipe the filesize into STDIN first, then start reading ....
Post 01 May 2012, 18:05
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
f0dder wrote:
typedef wrote:
LostCoder wrote:
Realloc? Allocate some memory, for example 4096 bytes. When it is full, add 4096 bytes to the existing size, realloc the buffer and continue reading.

Yes this is how I do it in C. Read and allocate +2 bytes until CR/LF.

Are you insane?

Realloc in itself is bad enough, you'll potentially end up with an über-fragmented heap... and allocating just 2 bytes extra? Ugh.

PS: on some OSes you might be able to use stat() or similar system calls, but it's by no means portable, and you might get the size of some internal pipe buffer rather than the actual file size. I'd recommend the mmap solution if you really need the buffer to be contiguous, or a linked list of chunks if you can adapt your processing code.



It all depends on what you are trying to do with it. For example, fscanf is more vulnerable to buffer overflow attacks, and this way you can sort of bypass that.

Also having a fucked up heap is because you don't use realloc right. You'll need an extra byte so as not to corrupt it.
Post 01 May 2012, 18:27
JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
Well, my current solution is just that: reading chunks into the buffer until the number of bytes read is smaller than the number requested.
Each chunk request is double the previous one, in order to keep the number of reallocations as small as possible.
It works fine, but at the price of an inelegant solution and possibly allocating double the needed memory.

And I really can't avoid STDIN stream here.

The problem is that I need it for a contest for size/speed optimization, so I wanted to use the best solution... Anyway.
Post 01 May 2012, 18:43
LostCoder



Joined: 07 Mar 2012
Posts: 22
LostCoder
typedef, you can leak memory at this point if realloc fails:
Code:
   buffer = (char*)realloc(buffer,_read+++2);
   if(!buffer)  // if the first call of realloc fails
      break;    
The correct version is:
Code:
   char *newbuffer = (char*)realloc(buffer,_read+++2);
   if(!newbuffer)  // if the call of realloc fails
      break;
   buffer = newbuffer;
    
JohnFound, some counter-questions:
1. Do you need seeking?
2. How large can the data be?
3. What kind of data is expected?
Post 01 May 2012, 18:55
JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
LostCoder wrote:
JohnFound, some counter-questions:
1. Do you need seeking?
2. How large can the data be?
3. What kind of data is expected?


1. No, I need everything from STDIN to be read into the memory buffer. Then I will make several passes through the data in order to create the result.
2. Probably 20..30 MBytes. But the limit is the memory I can allocate.
3. The data is UTF-8 characters (an artificially constructed test file that is piped to STDIN).

The code I use now is this:
Code:
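START_SIZE = 1024

; Reads all of STDIN into a buffer that doubles in size whenever it fills up.
; Result: [pSourceBuffer] = buffer address, [SourceSize] = bytes read.
proc ReadTheInput
begin
        mov     [SourceSize], START_SIZE

        stdcall GetMem, [SourceSize]
        mov     edi, eax                ; edi = buffer address

        xor     esi, esi                ; esi = bytes read so far

.readloop:
        mov     ebx, [SourceSize]
        lea     eax, [esi+edi]          ; eax = first free byte in the buffer
        sub     ebx, esi                ; ebx = free space left

        stdcall FileRead, [STDIN], eax, ebx
        cmp     eax, ebx
        jne     .endoffile              ; a short read is treated as end of input

        add     esi, ebx
        shl     [SourceSize], 1         ; buffer is full: double it

        stdcall ResizeMem, edi, [SourceSize]
        mov     edi, eax                ; the buffer may have moved

        jmp     .readloop

.endoffile:
        add     esi, eax
        mov     [SourceSize], esi
        xor     eax, eax

        mov     [pSourceBuffer], edi
        mov     [edi+esi], eax  ; zero terminated... (dword store, needs 4 free bytes)

        return
endp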
    


It is not polished down to the last byte, but that is not really needed.

A 78 MB test file piped to STDIN was loaded in 1100 ms (Intel Atom, 1.6 GHz).
Is it fast or not? Is it possible to load it faster? I don't know.
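
For anyone following along in C, here is a hedged sketch of the same doubling strategy using POSIX read(). The function name read_all_doubling is invented for the example. One deliberate difference from the routine above: it keeps reading until read() returns 0, because on a pipe a read can return fewer bytes than requested before the input has actually ended.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Read everything from fd into a malloc'd, zero-terminated buffer.
   The capacity starts at start_size and doubles each time it fills up.
   Returns NULL on allocation or read failure (EINTR is not retried in
   this sketch); the caller frees the result. */
static char *read_all_doubling(int fd, size_t start_size, size_t *out_len)
{
    size_t cap = start_size ? start_size : 1;
    size_t len = 0;
    char *buf = malloc(cap + 1);          /* +1 for the terminator */

    assert(out_len != NULL);
    if (!buf)
        return NULL;

    for (;;) {
        if (len == cap) {                 /* full: double the capacity */
            char *bigger = realloc(buf, cap * 2 + 1);
            if (!bigger) {
                free(buf);                /* the old block is still ours */
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if (n == 0)                       /* 0 means end of input */
            break;
        len += (size_t)n;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

Calling read_all_doubling(0, 1024, &len) would correspond to START_SIZE = 1024 above; a much larger start size means fewer reallocations.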

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 01 May 2012, 19:11
r22



Joined: 27 Dec 2004
Posts: 805
r22
If you want to optimize for speed, make your START_SIZE something egregious like ~1MB or 512KB. The optimal size would be processor (data cache) & OS (internal buffer size) specific.

The fewer calls you make to the ReadFile and ResizeMem APIs, the faster it will run.
Post 01 May 2012, 19:47
JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
Even a 4 MB start size decreases the load time by 10%.
But if I set the start size large enough to hold all the data (80 MB), the time decreases 3..4 times, to 250..280 ms!
Post 01 May 2012, 20:29
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
Well, if speed is not a problem, you can read STDIN and write it to a temporary file, counting the bytes written. Then rewind the file, allocate the needed memory at once and read the temp file.

WHILE READ_FROM_STDIN
WRITE_TO_TEMP, BYTES, N_BYTES_READ
TOTAL_BYTES += N_BYTES_READ
ENDW

pBuff = ALLOC, TOTAL_BYTES + 1

READ_FROM_TEMP,pBuff,TOTAL_BYTES

......

FREE, pBuff


Overkill ?
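
The pseudocode above could look roughly like this in standard C. This is only a sketch: the function name slurp_via_tempfile and the 64 KB copy block are assumptions made for the example, and tmpfile() is used as the temporary file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Copy in to a temporary file while counting bytes, then allocate
   exactly once and read the data back. Returns a zero-terminated
   malloc'd buffer, or NULL on failure. */
static char *slurp_via_tempfile(FILE *in, size_t *out_len)
{
    char block[65536];
    size_t total = 0, n;
    char *buf;
    FILE *tmp = tmpfile();

    assert(in != NULL && out_len != NULL);
    if (!tmp)
        return NULL;

    /* Pass 1: stream the input to the temp file and count the bytes. */
    while ((n = fread(block, 1, sizeof block, in)) > 0) {
        if (fwrite(block, 1, n, tmp) != n) {
            fclose(tmp);
            return NULL;
        }
        total += n;
    }

    /* Pass 2: the size is known now, so a single allocation is enough. */
    rewind(tmp);
    buf = malloc(total + 1);
    if (buf && fread(buf, 1, total, tmp) != total) {
        free(buf);
        buf = NULL;
    }
    fclose(tmp);                 /* tmpfile() files vanish when closed */

    if (buf) {
        buf[total] = '\0';
        *out_len = total;
    }
    return buf;
}
```

The memory use is exact, but every byte is written and read once more than strictly needed, so it will not win a speed contest.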
Post 01 May 2012, 20:51
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17467
Location: In your JS exploiting you and your system
revolution
In Windows, the STDIN handle does support GetFileSize. When STDIN is redirected from a file you can just get the size, allocate memory and then read. Easy. If it is not a file then GetFileSize returns an error and you can then deal with it in whatever way suits your program.
Post 01 May 2012, 22:21
JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
Hm, it really works in Windows. I thought GetFileSize did not work for STDIN.
But what about Linux?
Post 01 May 2012, 22:47
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
GetFileSize() vs stat()... yay, support two operating systems - or at least two specific kernel versions of two operating systems. Way to go. You're not even supporting two classes of OSes (win32 vs posix), just two very specific versioned APIs. And for very specific use cases.

Please specify your needs a bit more directly. Do you need to optimize just "getting STDIN entirely to a contiguous memory buffer" because of some extremely lame framework needs or synthetic benchmark, or do you have some more reasonable needs?

And please stay away from anything involving realloc - it might work decently on whatever OS you're testing on, but it might also end up with horrible results... take a look at how something like the windows heap APIs have changed over time, for instance.
Post 02 May 2012, 01:03
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
AFAIK, stdin is a stream and bytes will always flow in. It's never a fixed size, so how would stat() work on it?

Have you actually tried it? Wink
Post 02 May 2012, 01:56
JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
revolution, thanks for the hint. It works great in Windows. Now I have to check it in Linux, but I have some trouble with the STAT structure definition.

f0dder, I am not very sure what you mean. I need exactly what I described: to read the whole content redirected through STDIN. I need it in the most optimal and resource-friendly manner. So, I don't want to use realloc at all.

typedef - it seems that STDIN is not exactly a stream, at least in Windows.
Post 02 May 2012, 08:19
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
JohnFound wrote:
f0dder, I am not very sure what you mean. I need exactly what I described: to read the whole content redirected through STDIN. I need it in the most optimal and resource-friendly manner. So, I don't want to use realloc at all.
Surely you need to do something with the content after you've read it all to memory. Does this "something" require that the data is in one big contiguous chunk?

JohnFound wrote:
typedef - it seems that STDIN is not exactly a stream, at least in Windows.
Will your app be run in such a way that stdin is always redirected from a file? I.e., will it always be invoked by some other app, or does it make sense for a user to run the app standalone? Are you guaranteed the app will never have a pipe or socket for stdin?

_________________
Image - carpe noctem
Post 02 May 2012, 09:56
JohnFound



Joined: 16 Jun 2003
Posts: 3502
Location: Bulgaria
JohnFound
f0dder, the processing works better with the file in one block in memory. Of course, if the attempt to determine the exact data size fails, I can always fall back to the block reading algorithm described above.
But allocating memory and reading at once is 4..5 times faster...
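
On the Linux side, the "ask for the size first, fall back if that fails" idea might be sketched in C as below, assuming POSIX fstat(). The names remaining_size and read_sized are hypothetical. The size is only trusted when the descriptor is a regular file; for a pipe, terminal or socket the function returns -1 and the caller would use the chunked reading instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Bytes left to read on fd if it is a regular file, else -1. */
static long long remaining_size(int fd)
{
    struct stat st;
    off_t pos;

    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    pos = lseek(fd, 0, SEEK_CUR);     /* the file may not be at offset 0 */
    if (pos < 0 || pos > st.st_size)
        return -1;
    return (long long)(st.st_size - pos);
}

/* Read exactly size bytes into one allocation, zero-terminated.
   Returns NULL if the allocation fails or the input ends early. */
static char *read_sized(int fd, size_t size)
{
    size_t len = 0;
    char *buf = malloc(size + 1);

    assert(fd >= 0);
    if (!buf)
        return NULL;
    while (len < size) {
        ssize_t n = read(fd, buf + len, size - len);
        if (n <= 0)                   /* error, or shorter than expected */
            break;
        len += (size_t)n;
    }
    if (len != size) {
        free(buf);
        return NULL;
    }
    buf[size] = '\0';
    return buf;
}
```

With stdin, that would be remaining_size(0) followed by read_sized(0, size) when the result is not -1.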
Post 02 May 2012, 10:24
Copyright © 1999-2020, Tomasz Grysztar.