flat assembler
Message board for the users of flat assembler.

STDIN/STDOUT to/from memory buffer; Battle FASM vs C++

f0dder 03 May 2012, 15:48
JohnFound wrote:
But allocating memory and reading at once is 4..5 times faster...
Hm, you shouldn't see much speed difference in the I/O part unless you use ridiculously small buffer sizes. On 64-bit Win7 with an SSD, I found that optimum speed and CPU usage were reached at a 32 KB buffer size; above that, there wasn't much difference. It will of course depend on OS and hardware (and on whether you read from a local disk or the network), but with regard to CPU usage and wall-clock time, there shouldn't be much difference between reading the file in one big chunk and reading it in X chunks of a reasonable buffer size.
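
For reference, reading in chunks of a reasonable buffer size boils down to a loop like the one below. This is a minimal portable C sketch using stdio's fread rather than ReadFile; the 32 KB chunk size is taken from the measurement above, and the function name `read_in_chunks` is hypothetical. It only counts bytes, where real code would process each chunk.

```c
#include <stdio.h>
#include <stddef.h>

/* Chunk size from the measurement above (~32 KB on Win7 + SSD).
   Treat it as a tunable, not a universal constant. */
#define CHUNK_SIZE (32 * 1024)

/* Read a stream to EOF in fixed-size chunks.
   Returns the number of bytes read, or -1 on a read error. */
long long read_in_chunks(FILE *in)
{
    static unsigned char chunk[CHUNK_SIZE];
    long long total = 0;
    size_t n;

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        /* process chunk[0 .. n) here */
        total += (long long)n;
    }
    return ferror(in) ? -1 : total;
}
```

With async I/O, two such buffers could alternate, so one is processed while the other is being filled.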

Processing speed is of course another matter. What kind of processing are you doing? Do you need the entire content in memory before you can process it, or would it be possible to process it in chunks? If you can, async I/O might be worth investigating.

_________________
carpe noctem

JohnFound 03 May 2012, 17:20
On my computer (an Eee PC, 1.6 GHz, with an HDD) I ran two tests with an 80 MB test file:

1. Allocate/reallocate a single block, doubling its size every time.
1.1. Start buffer size 1024 bytes: 1100 ms
1.2. Start buffer size 4 MB: 1000 ms

2. With GetFileSize and a single allocation of a buffer of the proper size, the loading time drops to approximately 250 ms.
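
The two loading strategies from the test above can be sketched in portable C as follows. This is only an illustration, not JohnFound's FASM code: malloc/realloc stand in for the Win32 heap calls, fseek/ftell stand in for GetFileSize, and the function names are hypothetical. Note that strategy 2 needs a seekable input, so it cannot be used when STDIN is a pipe.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Strategy 1: grow a single block, doubling its capacity whenever it fills up.
   Works for pipes (STDIN), where the size is unknown in advance. */
unsigned char *load_doubling(FILE *in, size_t start_cap, size_t *out_len)
{
    size_t cap = start_cap ? start_cap : 1024, len = 0;
    unsigned char *buf = malloc(cap);
    if (!buf) return NULL;

    for (;;) {
        if (len == cap) {                         /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, cap - len, in);
        if (n == 0) break;                        /* EOF or error */
        len += n;
    }
    if (ferror(in)) { free(buf); return NULL; }
    *out_len = len;
    return buf;
}

/* Strategy 2: ask for the size first (the stdio analogue of GetFileSize),
   allocate once and read everything in one call. Seekable input only;
   ftell returns a long, which limits file size on some platforms. */
unsigned char *load_sized(FILE *in, size_t *out_len)
{
    if (fseek(in, 0, SEEK_END) != 0) return NULL;
    long size = ftell(in);
    if (size < 0 || fseek(in, 0, SEEK_SET) != 0) return NULL;

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) return NULL;
    size_t n = fread(buf, 1, (size_t)size, in);
    if (n != (size_t)size) { free(buf); return NULL; }
    *out_len = n;
    return buf;
}
```

One plausible contributor to the timing gap is that every realloc that cannot grow in place has to copy everything read so far.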

The input is a source file with Markdown-formatted text, which has to be converted to HTML. Because of forward definitions, the source must be compiled in two passes. That is why I need the whole source in memory.
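
To illustrate why forward definitions force two passes: in Markdown, a reference-style link such as `[text][id]` may appear before the `[id]: url` line that defines it. Below is a deliberately simplified C sketch, with hypothetical helper names, that is not part of either contestant's code and is far from a full Markdown parser (exact-match ids only, definitions only at line start). Pass 1 collects the definition ids; pass 2 resolves the references against them.

```c
#include <string.h>

#define MAX_DEFS 64   /* arbitrary limits for the sketch */
#define MAX_ID   32

/* Pass 1: collect ids from lines of the form "[id]: url". */
static int collect_definitions(const char *src, char ids[][MAX_ID], int max)
{
    int count = 0;
    for (const char *line = src; *line && count < max; ) {
        const char *eol = strchr(line, '\n');
        const char *close = (line[0] == '[') ? strchr(line, ']') : NULL;
        if (close && (!eol || close < eol) && close[1] == ':') {
            size_t len = (size_t)(close - line - 1);
            if (len > 0 && len < MAX_ID) {
                memcpy(ids[count], line + 1, len);
                ids[count][len] = '\0';
                count++;
            }
        }
        if (!eol) break;
        line = eol + 1;
    }
    return count;
}

static int is_defined(const char *id, size_t len, char ids[][MAX_ID], int n)
{
    for (int i = 0; i < n; i++)
        if (strlen(ids[i]) == len && strncmp(ids[i], id, len) == 0)
            return 1;
    return 0;
}

/* Pass 2: find every "[text][id]" and check the id against the pass-1 table,
   no matter whether the definition came before or after the reference.
   Returns the number of resolved references; *total gets all references. */
int resolve_references(const char *src, int *total)
{
    char ids[MAX_DEFS][MAX_ID];
    int n = collect_definitions(src, ids, MAX_DEFS);   /* pass 1 over the whole text */
    int resolved = 0;

    *total = 0;
    for (const char *p = strstr(src, "]["); p != NULL; p = strstr(p + 2, "][")) {
        const char *id = p + 2;
        const char *end = strchr(id, ']');
        if (!end) break;
        ++*total;
        if (is_defined(id, (size_t)(end - id), ids, n)) resolved++;
    }
    return resolved;
}
```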

I had planned to write a Markdown parser for my own needs, but this one is part of a programming battle on one of the Bulgarian programming forums.
After a holy war on the ASM vs HLL theme, I challenged my opponents to prove their claims about the great HLL compilers with this task, implemented in both HLL and assembly.
It was their demand to use STDIN/STDOUT as the program's input and output.
The final winner will be determined by the independent battle committee, using tests with huge, artificially generated Markdown texts.
This is the whole story. :)

typedef 03 May 2012, 17:42
LOL... I am so proud of you, son.

OK, so you are going to implement both ASM and HLL (which one: C or C++? ... JAVA?)

If you use JAVA just take advantage of regular expressions. I suggest you use C though.

f0dder 03 May 2012, 17:49
JohnFound wrote:
On my computer (eeepc 1.6GHz with HDD) I made two tests with 80Mbytes test file:

1. allocate/reallocate single block with doubling the size every time.
1.1 Start buffer size 1024bytes - 1100ms
1.2. with start buffer size 4Meg - 1000ms

2. With GetFileSize and single allocation of the proper size buffer the loading time decreases approximately to 250ms
OK, that sounds pretty weird! With a 4 MB starting buffer you'd do, what, one alloc and 5 reallocs... That shouldn't spend much time in heap code, even if the reallocs can't expand in place and have to move the block to a new zone.
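
The realloc count is easy to check. A small C helper (hypothetical, just for the arithmetic) that counts how many doublings a buffer needs before it can hold the whole input, assuming the loader stops growing as soon as everything fits:

```c
#include <stddef.h>

/* Number of reallocations for a buffer that starts at `start` bytes and
   doubles each time it fills up, until it can hold `size` bytes. */
unsigned count_doublings(size_t start, size_t size)
{
    unsigned reallocs = 0;
    for (size_t cap = start; cap < size; cap *= 2)
        reallocs++;
    return reallocs;
}
```

For an 80 MB input this gives 5 reallocations from a 4 MB start (4, 8, 16, 32, 64, 128 MB) and 17 from a 1024-byte start. In both cases the total amount that may have to be copied is bounded by roughly the final capacity (about 124 MB and 128 MB respectively), which may be why the two start sizes give similar timings.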

Which OS is this? Granted, I did my tests with a file of several gigabytes, so I could be sure the file wasn't cached between runs... and I only tested read speed with a fixed buffer size. I wonder if something is screwy: whether there's a first-time penalty for using HeapAlloc (or did you use some other allocation method?), or whether the heap functions are simply that slow...

Btw, if you go with VirtualAlloc, using MEM_RESERVE first and then MEM_COMMIT for chunks as necessary: without large pages, reserving 2 gigabytes of address space costs 2 megabytes of "virtual size" and about 100 KB of working set. I haven't tested how fast the VirtualAlloc calls are, but they should have less logic than HeapAlloc... OTOH, they require a ring3->ring0->ring3 round trip, whereas HeapAlloc will likely over-allocate so it can stay in user mode. In other words, you'll want to MEM_COMMIT not-too-small chunks if you go that way. The bonus is that you get a contiguous buffer, so the processing code can possibly be simpler... but you must be sure to MEM_RESERVE more than the largest expected input size.
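
A sketch of the reserve-then-commit scheme in C. The Windows branch uses the real VirtualAlloc/VirtualFree calls with MEM_RESERVE, MEM_COMMIT and MEM_RELEASE; the non-Windows branch swaps in mmap with PROT_NONE plus mprotect as the closest POSIX analogue, so the sketch can also be tried on Linux. The `GrowBuf` type, the function names and the 1 MiB commit chunk are illustrative assumptions, not anything from the thread.

```c
#if !defined(_WIN32)
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Commit granularity: "not too small", so each kernel round trip is amortised.
   1 MiB is an arbitrary choice (a multiple of the page size). */
#define COMMIT_CHUNK ((size_t)1 << 20)

typedef struct {
    unsigned char *base;   /* start of the reserved, contiguous address range */
    size_t reserved;       /* bytes of address space reserved */
    size_t committed;      /* bytes currently usable */
    size_t used;           /* bytes of payload stored so far */
} GrowBuf;

/* Reserve address space only; nothing is usable until committed. */
int growbuf_init(GrowBuf *gb, size_t reserve_bytes)
{
    reserve_bytes = (reserve_bytes + COMMIT_CHUNK - 1) / COMMIT_CHUNK * COMMIT_CHUNK;
#ifdef _WIN32
    void *p = VirtualAlloc(NULL, reserve_bytes, MEM_RESERVE, PAGE_NOACCESS);
    if (p == NULL) return -1;
#else
    void *p = mmap(NULL, reserve_bytes, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
#endif
    gb->base = p;
    gb->reserved = reserve_bytes;
    gb->committed = gb->used = 0;
    return 0;
}

/* Append n bytes, committing more chunks when needed. Fails (-1) if the data
   would exceed the reservation, so reserve more than the largest expected input. */
int growbuf_append(GrowBuf *gb, const void *data, size_t n)
{
    if (n > gb->reserved - gb->used) return -1;
    size_t need = gb->used + n;
    if (need > gb->committed) {
        size_t target = (need + COMMIT_CHUNK - 1) / COMMIT_CHUNK * COMMIT_CHUNK;
        unsigned char *start = gb->base + gb->committed;
        size_t delta = target - gb->committed;
#ifdef _WIN32
        if (VirtualAlloc(start, delta, MEM_COMMIT, PAGE_READWRITE) == NULL) return -1;
#else
        if (mprotect(start, delta, PROT_READ | PROT_WRITE) != 0) return -1;
#endif
        gb->committed = target;
    }
    memcpy(gb->base + gb->used, data, n);
    gb->used = need;
    return 0;
}

void growbuf_free(GrowBuf *gb)
{
#ifdef _WIN32
    VirtualFree(gb->base, 0, MEM_RELEASE);
#else
    munmap(gb->base, gb->reserved);
#endif
    gb->base = NULL;
    gb->reserved = gb->committed = gb->used = 0;
}
```

Because the reservation is contiguous, `gb->base[0 .. gb->used)` can be handed to the parser as one flat buffer, which is the simplification mentioned above.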

JohnFound wrote:
The information read, is source file with markdown formatted text. It should be converted to HTML. Because of forward definitions, the source file must to be compiled in two passes. That is why I need the whole source in the memory.
I'm not familiar enough with Markdown formatting to know exactly what's needed, but it sounds like you could do the reading asynchronously and in chunks, combined with the 1st-pass processing, and then do the 2nd pass once everything is in memory. It's going to complicate your code, but you basically get the 1st pass for free (you process one chunk while waiting for the next I/O to complete).
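
A minimal sketch of the chunked-read-plus-first-pass structure in portable C. To keep it short and runnable it reads synchronously with fread; in the asynchronous version described above, the `first_pass_chunk` call is what would run while the next read is still in flight. Pass 1 here only records line start offsets (the idea raised later in the thread), and pass 2 is a stand-in that counts lines starting with '#'. `Source`, `first_pass_chunk` and the other names, as well as the 32 KB read size, are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (32 * 1024)          /* bytes per read request; an assumption */

typedef struct {
    unsigned char *data;           /* whole input, kept in memory for pass 2 */
    size_t len, cap;
    size_t *lines;                 /* pass-1 output: start offset of every line */
    size_t nlines, lines_cap;
} Source;

static int push_line(Source *s, size_t off)
{
    if (s->nlines == s->lines_cap) {
        size_t ncap = s->lines_cap ? s->lines_cap * 2 : 1024;
        size_t *tmp = realloc(s->lines, ncap * sizeof *tmp);
        if (!tmp) return -1;
        s->lines = tmp;
        s->lines_cap = ncap;
    }
    s->lines[s->nlines++] = off;
    return 0;
}

/* Pass 1 on the bytes that just arrived, [from, s->len): record line starts.
   Every '\n' is examined exactly once, so lines split across chunks are fine. */
static int first_pass_chunk(Source *s, size_t from)
{
    for (size_t i = from; i < s->len; i++)
        if (s->data[i] == '\n' && push_line(s, i + 1) != 0)
            return -1;
    return 0;
}

/* Read everything in CHUNK-sized pieces, running pass 1 after each piece.
   Returns -1 on failure; call source_free() in either case. */
int load_with_first_pass(FILE *in, Source *s)
{
    s->data = NULL; s->len = s->cap = 0;
    s->lines = NULL; s->nlines = s->lines_cap = 0;
    if (push_line(s, 0) != 0) return -1;           /* line 0 starts at offset 0 */

    for (;;) {
        if (s->cap - s->len < CHUNK) {             /* room for one more chunk */
            size_t ncap = s->cap ? s->cap * 2 : 4 * CHUNK;
            unsigned char *tmp = realloc(s->data, ncap);
            if (!tmp) return -1;
            s->data = tmp;
            s->cap = ncap;
        }
        size_t n = fread(s->data + s->len, 1, CHUNK, in);
        if (n == 0) break;
        size_t from = s->len;
        s->len += n;
        if (first_pass_chunk(s, from) != 0) return -1;
    }
    if (ferror(in)) return -1;
    /* A final '\n' records a start offset equal to len; no line begins there. */
    if (s->nlines > 1 && s->lines[s->nlines - 1] == s->len) s->nlines--;
    return 0;
}

/* Pass 2 runs over the complete in-memory text; as a stand-in it counts
   lines beginning with '#' (ATX headings), using the pass-1 offsets. */
size_t count_heading_lines(const Source *s)
{
    size_t count = 0;
    for (size_t i = 0; i < s->nlines; i++)
        if (s->lines[i] < s->len && s->data[s->lines[i]] == '#') count++;
    return count;
}

void source_free(Source *s)
{
    free(s->data);
    free(s->lines);
    s->data = NULL;
    s->lines = NULL;
}
```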

f0dder 03 May 2012, 17:50
typedef wrote:
If you use JAVA just take advantage of regular expressions.
I thought the point was to create something fast?

JohnFound 03 May 2012, 18:07
f0dder, the OS for these results is Windows XP. I tested the code on Linux as well, but it was approximately twice as slow. Possibly this is because my installation uses virtual disks (mint4win, which is the same thing as the Wubi installer).
I will think again about asynchronous processing and loading in several chunks, but I am afraid the processing will become more complex and slower, and that this will eat up the gain from the faster read.

typedef, I don't use HLLs anymore, so I can't write the HLL version.
The HLL will be my opponent's choice.
IMHO, he will use plain C or C++; that is the only way to have any chance in this battle.

f0dder 03 May 2012, 18:27
JohnFound wrote:
f0dder, the OS for these results is Windows XP. I tested the code in Linux as well but it was slower, approximately twice. It is possibly because I use installation with virtual disks (mint4win or what is the same Wubi installer)
That definitely means you can't depend on those benchmark results at all :) Virtualization messes massively with stats of any kind. For instance, it's often faster for me to install OSes in a virtual machine than on physical hardware...

JohnFound wrote:
I will think once more about asynchronous processing and loading in several chunks. But I am afraid the processing will become more complex and slow and this will eat the gain from the faster read.
It definitely does complicate things, so the only way you can win by doing it is if the 1st pass is relatively expensive... or if you can reduce the cost of the 2nd pass by doing more preprocessing in the 1st pass than you normally would. Disk I/O is immensely slow, so you might be able to get your 1st-pass processing for free... but you'd have to make sure your 2nd pass is (at least) as fast with chunks of memory as it is with your current contiguous-buffer implementation :) I guess stuff like determining line offsets (or, if you don't need those, at least forward-reference offsets) during the 1st pass could yield some interesting results.

JohnFound wrote:
IMHO, he will use plain C or C++ - this is the only way to have some chance in this battle.
Java actually isn't too shabby if you do it right, but you kinda have to be a C/C++ programmer to write high-performance Java ;)

JohnFound 05 May 2012, 04:38
If anyone is interested in this programmers' battle, the thread with my article and the flame war is here (it is in Bulgarian).
I have committed the first alpha to the repository and posted compiled versions for testing in the above thread.
The Windows executable is 3584 bytes and the Linux one is 2272 bytes.
My opponent (nickname dvader; it's symbolic, the battle is against the dark side of the Force :D) claims his executable is the same size: 3584 bytes. He will post the source code later today or tomorrow.
Obviously, I should make some size optimizations, and probably speed optimizations as well.
Funny...

typedef 05 May 2012, 05:27
Hack the PE down to 300 bytes and try not to use imports.

Use the kernel hack where you get the kernel base and find all the function pointers. Or you can hard-code them. :D

OK. I used a translator, so correct me if I'm wrong: the guy (dvader) says C/C++ compilers are faster and portable, and that modern compilers can beat most programmers?

Even through a translator I can tell he knows that's not true but won't accept it.

JohnFound 05 May 2012, 05:44
typedef wrote:
Ok. I used translator and correct me if I'm wrong the guy(dvader) says C/C++ compilers are faster, portable and modern compilers can beat most programmers?


Well, what else would an HLL follower say? ;)

_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9

typedef 05 May 2012, 05:53
JohnFound wrote:
typedef wrote:
Ok. I used translator and correct me if I'm wrong the guy(dvader) says C/C++ compilers are faster, portable and modern compilers can beat most programmers?


Well, what else can say HLL follower? ;)



Does he know assembly at all?

JohnFound 05 May 2012, 05:56
Actually, I don't know. He claims a 3k executable with C++ OOP. We'll see what he posts in the end.

typedef 05 May 2012, 05:57
JohnFound wrote:
Actually I don't know. He claims 3k executable with C++ OOP


haha.. OOP my ass. He'll modify it.

revolution 05 May 2012, 06:04
It might be possible for a .NET OOP program to be that small. Of course when it runs it will pull in the 50MB+ .NET runtime, but that doesn't count, right?

JohnFound 05 May 2012, 06:25
Well, I don't know. But if he uses .NET, I will not be able to test it on my computers, so I will claim a foul.

typedef 05 May 2012, 06:26
^^ Hmm... Would you be so kind as to prove me wrong, sir? ...err... ma'am

Inagawa 05 May 2012, 09:09
Sorry to butt in, but I have to ask. The claim that learning assembly is useless because compilers nowadays are so advanced that most of the time they'll do a better job is bullshit, right? Right?

JohnFound 05 May 2012, 09:32
Inagawa, right! But that claim isn't mine. Also, IMO, just "learning" is not enough. Learning assembly language is a must if one wants to be a good HLL programmer.
One has to use assembly language in order to be an "assembly programmer".

JohnFound 05 May 2012, 19:36
Now I have the first version of the dark forces' solution. dvader managed to create a 4k executable.
But I don't have any C++ environment, so could some of you with these skills please check the project for me and post an opinion? Is it fair enough to be accepted? What tricks did he use to make a 4k executable?

P.S. It is very, very slow: 56 s vs 0.61 s.

Attachment: Markdown.rar (12.36 KB)

dancho 06 May 2012, 09:39
Well, the app is C++ with OOP all around :) Nothing unusual about it, so it should be accepted...
As far as optimization goes:
For the compiler:
  Optimization: Full (/Ox)
  Inline Function Expansion: Any Suitable (/Ob2)
  Favor Small Code (/Os)
  He uses the fastcall convention for functions, the /MT runtime library, and defines WIN32_LEAN_AND_MEAN.
For the linker:
  References: Eliminate Unreferenced Data (/OPT:REF)
  Enable COMDAT Folding: Remove Redundant COMDATs (/OPT:ICF)

So a 4k exe with these settings is to be expected...

Copyright © 1999-2024, Tomasz Grysztar.