flat assembler
Message board for the users of flat assembler.

Index > High Level Languages > Human vs. compiler

vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
just a note: Java for mobiles is not j2ee. It is some j2-m-something, and there are 3 of them.
Post 25 Dec 2006, 10:32
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
rugxulo: too bad that most phones are only programmable using Java :(
Post 25 Dec 2006, 15:27
Maverick



Joined: 07 Aug 2006
Posts: 251
Location: Citizen of the Universe
Maverick
Filter wrote:
Maverick wrote:
kohlrak wrote:
The computer can't always beat the human, but the human can always beat the computer.

Sure, especially MIPS-wise. ;)

In 50 years we'll be useless, and assimilated or replaced. :D


Maverick, we will do ROM dumps of our brains and run ourselves emulated on the fastest multi-core light processors ever designed.

Filter, that's what I call a good idea!! Let's not forget we're also made of NVRAM anyway (besides my granny, who suffers from Alzheimer's :^) ;)
Post 25 Dec 2006, 19:23
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Good thing WinCE allows programs native to the processor. More OSes could.
Post 26 Dec 2006, 00:35
Fr3m3n



Joined: 21 Apr 2006
Posts: 5
Fr3m3n
rugxulo wrote:

(BTW, what C++ compiler is the best? I assume GCC, but some would probably say MSVC.)

What about a benchmark? Human vs. compiler code :) . It would also let us see which compiler optimizes best.

This function 'draws' a three-dimensional line by placing the points' coordinates in a buffer.

C version:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int Line3d(double*,double,double,double,double,double,double);

int main(void){
    double bufor[800*3];
    for(int i=0;i<100000;++i)
        Line3d(bufor,0.2,0.2,0.3,1.0,1.0,0.2);
    return 0;
}


int Line3d(double* bufor,double xa,double ya, double za,double xb, double yb, double zb){
    register double wx,wy,wz=0.0;
    register double n=0;
    wx=xb-xa;
    wy=yb-ya;
    wz=zb-za;
    n=sqrt(wx*wx+wy*wy+wz*wz);
    n*=700.0; //size of world - cube 700*700*700
    unsigned int _n=n;
    //save first point's coords
    *bufor++=xa;
    *bufor++=ya;
    *bufor++=za;
    if(!_n)return 0;
    wx=wx/n;
    wy=wy/n;
    wz=wz/n;
    int i=0;
    while(i++<_n){
       xa+=wx;
       *bufor++=xa;
       ya+=wy;
       *bufor++=ya;
       za+=wz;
       *bufor++=za;
    }
    return i;
}
    


Asm version:
Code:
format PE GUI 4.0
entry start

include '%fasminc%\win32ax.inc'

section '.idata' import data readable
library kernel32,'KERNEL32.DLL',user32,'USER32.DLL'
include '%fasminc%\apia\kernel32.inc'
include '%fasminc%\apia\user32.inc'

section '.data' data readable writeable
F_SCALE dq 700.0 ;visible world at the moment
bufor rb 8*800*3 ;792
section '.code' code readable executable
start:
fninit ;set precision to 64, near, zero registers - fastest way
mov edi,100000 ;number of repeats
@@:
stdcall Line3d,bufor,double 0.2,double 0.2,double 0.3,double 1.0,double 1.0,double 0.2
dec edi
jz @f
jmp @b
@@:
ret

invoke MessageBox ; needed for nt loader
invoke ExitProcess

proc Line3d stdcall uses edi, \
buf:DWORD,xa:QWORD,ya:QWORD,za:QWORD,xb:QWORD,yb:QWORD,zb:QWORD ;(a=start point b=end point)
mov edi,[buf] ;address of world address
;wx=xb-xa;
fld [xb]
fsub [xa]
;wy=yb-ya;
fld [yb]
fsub [ya]
;wz=zb-za;
fld [zb]
fsub [za]
;st0=wz
;st1=wy
;st2=wx

;calculate n
fld st0 ;wz
fmul st0,st0
fld st2 ;wy
fmul st0,st0
fld st4 ;wx
fmul st0,st0
;st0=wx^2
;st1=wy^2
;st2=wz^2
;st3=wz
;st4=wy
;st5=wx
faddp st2,st0 ;wz^2+wx^2
xor ecx,ecx
faddp st1,st0 ;wy^2+(wz^2+wx^2)
push eax
fsqrt
;st0=n
fmul qword[F_SCALE] ;multiply n by size of a world
;st1=wz
;st2=wy
;st3=wx
;wx=wx/n
fxch st3
fdiv st0,st3
fxch st3
;wy=wy/n
fxch st2
fdiv st0,st2
fxch st2
;wz=wz/n
fxch st1
fdiv st0,st1
fxch st1

fistp dword[esp]
fld [za]
fst qword[edi+16]
inc ecx
fld [ya]
fst qword[edi+8]
pop eax ;eax=n
fld [xa]
mov edx,8
fst qword[edi]
add edi,3*8
test eax,eax
jz .koniec
;st0=xa
;st1=ya
;st2=za
;st3=wz
;st4=wy
;st5=wx
mov ecx,eax
@@:
;x
       fadd st0,st5
       fst qword[edi]
       add edi,edx
;y
       fincstp
       fadd st0,st3
       fst qword[edi]
       add edi,edx
;z
       fincstp
       fadd st0,st1
       fst qword[edi]
       add edi,edx
       dec eax
       jz .koniec
       fdecstp
       fdecstp
       jmp @b
.koniec:
fninit
mov eax,ecx
ret
ud2
endp
    

(Can someone optimize it better? It's my best.)
(Written using SSE it will beat all HLL code hands down, but the FPU version is fairer to the HLL and works on older platforms.)

On my PC, the asm version is about 8x faster than the C version. But I have a badly optimizing compiler (lcc32), so these results are not very reliable.

It would be nice if someone would post their timings.
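
For anyone who wants to post timings, here is a minimal timing sketch in C. It is an illustration only, not the harness behind the 8x figure: seconds_per_call, workload, and sink are hypothetical names, and workload is a stand-in that you would replace with a call to Line3d. It uses the standard clock() function, which measures CPU time at fairly coarse resolution, so a large repeat count is assumed.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: average CPU seconds per call of fn over `repeats` calls. */
double seconds_per_call(void (*fn)(void), long repeats)
{
    assert(fn != NULL && repeats > 0);
    clock_t start = clock();
    for (long i = 0; i < repeats; ++i)
        fn();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / (double)repeats;
}

/* Results are stored through a volatile so the optimizer cannot drop the work. */
volatile double sink;

/* Stand-in workload (assumption); a real test would call Line3d(...) here. */
void workload(void)
{
    double acc = 0.0;
    for (int i = 1; i <= 1000; ++i)
        acc += 1.0 / i;
    sink = acc;
}
```

Calling seconds_per_call(workload, 100000) returns the average CPU seconds per call. Writing the result to a volatile variable is a hedge against an optimizing compiler deleting a loop whose result is never used, which would make the HLL side look unrealistically fast.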
Post 26 Dec 2006, 04:42
kohlrak



Joined: 21 Jul 2006
Posts: 1421
Location: Uncle Sam's Pad
kohlrak
Quote:
Heh. Again, check quake3 - there's some pretty nizzle stuff there, programmer pride, not just "selling the game". And as for "THINK are optimizations", remember your nit-picking about calling conventions


Optimization is optimization, it's the small things that make or break how much it's optimized in my book.

Quote:
You're bitching about code quality, and at the same time saying "we'll have to use what we can get" - that doesn't compute. Also, considering how much C and C++ code there's around, a lot of it sucks - but have a look at things like the MACH kernel, and code from the *BSD projects - that's generally pretty high quality code.


I'm saying it's a shame that we can't get picky, but we can't get picky cause there's nothing better. It's a shame we can't get picky...

Quote:
Dunno if it's easier to optimize in assembly, but it's certainly where you can gain the largest optimizations.


You can see the code in assembly; if it's a huge stack of code it's gonna take a while, versus C++, where you can simply nest a large calculation in 1 line, which is translated to a lot of lines.

Quote:
Hard enough that they actually produce quality


but when i'm buying something i want perfection... I spent money on it, it better be worth my money.

Quote:
Not all of the Windows API is well designed - especially not things like the "common controls" introduced with IE. But return values are necessary... unless you want to introduce something like 'errno'.


but a little warning on things like ebx, ecx, edx, and other things getting edited would be rather nice.

Quote:
Registers, ho humm. The existing calling conventions makes a lot of sense, actually - scratch and persistance. If it wasn't done this way, either functions would be free to trash all registers (and caller would have to preserve everything, often a waste of time), functions would have to preserve all registers (not really needed, again a waste of time), or you'd have per-function specification (a horrible mess).


Per-function specification wouldn't be that much of a mess, there aren't very many registers, plus you could make a graphical chart. You just have to say they're edited, not why they're edited.

Quote:
It makes sense having to cover the basics before you move on, and be limited to the basics so you learn them well - not everybody knows a lot when they take a computer class. Sucks if there's important topics you won't cover, and from other posts you've written it sounds like your class sucks. Things aren't like that everywhere though. I can only speak from what I know from Denmark, where the quality is generally very high.


but some of the stuff that we can't use is the bare basics that would be useful but will never be taught. If it's good quality there, then power to ya, but not everywhere do you find quality.

Quote:
Where would you use this to enhance execution time so much? It sounds like you're doing a lot of useless micro-optimizations that you don't get much benefit from, instead of writing clear code and doing optimizations where they matter.


Well, take the following bad example for example:

Code:
if (var >= 'a' && var <= 'z')


A nice optimization would be:

Code:
if (var&64)    


I think it was 64... I can't remember, but ASCII has a logic to it... Anyway, imagine a program that must convert everything to capitals or everything to lowercase to run, perhaps a script interpreter or something. We're talking about thousands of executions of this check if it's a large program. Imagine if the interpreter had to run while the program you're interpreting is being executed. Now, this is a bad example of such optimizations, but it is an example nonetheless.
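
A small C sketch of the bit logic being half-remembered here, assuming plain 7-bit ASCII (the helper names are made up for illustration). In ASCII the bit that separates 'a' from 'A' is 0x20 (32); bit 64 is set in both uppercase and lowercase letters, so testing it cannot tell them apart. Bit 0x20 is also set in digits and some punctuation, so a bare bit test does not replace the range check. It only helps once you already know the character is a letter.

```c
#include <assert.h>

/* Range check: true only for 'a'..'z' (assumes an ASCII-compatible charset). */
int is_lower_ascii(int c)
{
    return c >= 'a' && c <= 'z';
}

/* In ASCII, 'a' - 'A' == 0x20, so clearing that bit upper-cases a letter. */
int to_upper_ascii(int c)
{
    assert(c >= 0 && c <= 127);
    return is_lower_ascii(c) ? (c & ~0x20) : c;
}

/* The bare bit test: true for 'a', but also for '5' and '{'. */
int has_case_bit(int c)
{
    return (c & 0x20) != 0;
}
```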

Quote:
(BTW, what C++ compiler is the best? I assume GCC, but some would probably say MSVC.)


MSVC is commonly used to help with GUI and such, but I've used something called Digital Mars (command line), Dev-C++, and some others. I really can't say.

Quote:
I think C being free form makes it much MUCH easier to write ugly code. IOCCC was actually inspired by a sh clone or whatever. Plus, there are just too many C/C++ compilers (although I actually like that ... fun!) and standards (ANSI, DOS, Win, *nix, C99, POSIX, K&R, etc.). That's just my opinion, though. (Maybe if I didn't suck at C, but it's just such a seemingly HUGE language with a billion quirks ... argh).


And some C programmers argue that assembly has too many instructions and registers to worry about.

Quote:
OS/2 had a kernel written in assembly (supposedly), and that was popular. Face it, writing portable HLL (e.g., C) code is cool and all, but it isn't always the best idea. Your replies always seem a bit biased (in a good way, I guess) based on your experience. Your opinion is always valuable to me, though


The cross-platform thing annoys me about HLLs, as I often find it untrue. Yeah, printf is printf on all platforms, but when you get into things like graphics and sound, you're probably learning your operating system. I find it even easier just to make your thing lower-level than the OS rather than adding something on top like they do in Java, which isn't perfectly cross-platform, either.
Post 27 Dec 2006, 06:52
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
kohlrak wrote:

Optimization is optimization, it's the small things that make or break how much it's optimized in my book.

So a small "optimization" (let's say C vs. STDCALL) whose effect you can't even measure... that's a "make or break" thing? IMHO the trick is to focus on areas where you can actually gain something, unless you're writing code for the sake of the code itself.

kohlrak wrote:

You can see the code in assembly; if it's a huge stack of code it's gonna take a while, versus C++, where you can simply nest a large calculation in 1 line, which is translated to a lot of lines.

What's easiest and fastest to make heads or tails of? A few lines of math in C++, or a screenful or two of x87 FPU? :) What's easiest to write, maintain, and verify against the formula?

kohlrak wrote:

but when i'm buying something i want perfection... I spent money on it, it better be worth my money.

"perfection" doesn't exist anywhere, since it doesn't really make any sense - and I'm not just talking about software. And do keep in mind that just because something isn't perfect doesn't mean it's crap.

kohlrak wrote:

Per-function specification wouldn't be that much of a mess, there aren't very many registers, plus you could make a graphical chart. You just have to say they're edited, not why they're edited.

There's a shitload of API functions though, and it would be senseless doing this kind of documentation. You'd get just about zero benefit from the information. And if a function was updated to use a smarter algorithm at the expense of a register, it'd have to do some register preservation to avoid breaking its interface contract. In the long run, the current scratch/precious register split makes the most sense.

kohlrak wrote:

I think it was 64... I can't remember, but ASCII has a logic to it... Anyway, imagine a program that must convert everything to capitals or everything to lowercase to run, perhaps a script interpreter or something. We're talking about thousands of executions of this check if it's a large program. Imagine if the interpreter had to run while the program you're interpreting is being executed. Now, this is a bad example of such optimizations, but it is an example nonetheless.

ASCII is so 1990s :)

Seriously though, even if you don't do unicode and even if you're not doing MBCS either, there's still OEM codepages to consider. If you hardcode toupper/tolower (or any case insensitiveness) that way, you get screwy behaviour for non-english locales.

And then there's code readability - the "range compare" version is self commenting, the bit-test needs a comment.

But yes, compilers often don't see the same opportunities as a human does, and with bit operations, depending on underflow being represented as overflow etc. can lead to some nice optimizations. Much of this is doable in C++ code as well, although some carry tricks aren't easily done in C++.
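
One concrete instance of the wraparound trick, as a hedged C sketch (function names are hypothetical, and an ASCII-compatible character set is assumed): subtracting the lower bound as an unsigned value makes everything below the range wrap around to a very large number, so a single comparison covers both ends.

```c
#include <assert.h>

/* Two comparisons: the self-documenting form. */
int in_range_plain(int c)
{
    return c >= 'a' && c <= 'z';
}

/* One comparison: values below 'a' wrap to huge unsigned values and fail. */
int in_range_wrap(int c)
{
    return (unsigned)c - (unsigned)'a' <= (unsigned)('z' - 'a');
}

/* Checks that both forms agree for every signed or unsigned char value. */
void check_equivalence(void)
{
    for (int c = -128; c <= 255; ++c)
        assert(in_range_plain(c) == in_range_wrap(c));
}
```

Compilers frequently perform this rewrite themselves when optimizing, so it is worth checking the generated code before applying it by hand.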

kohlrak wrote:

MSVC is commonly used to help with GUI and such, but I've used something called Digital Mars (command line), Dev-C++, and some others. I really can't say.

MSVC has pretty good code generation, which is the important thing here. You're mixing up the compiler with the IDE :). Dev-C++ is, again, an IDE, and typically used with the GCC compiler. Digital Mars is pretty nifty for a free product, but its code generation isn't all the way up there, and its use of OMF objects instead of COFF is a bit tedious.
Post 27 Dec 2006, 07:57
kohlrak



Joined: 21 Jul 2006
Posts: 1421
Location: Uncle Sam's Pad
kohlrak
Quote:
So a small "optimization" (let's say C vs. STDCALL) whose effect you can't even measure... that's a "make or break" thing? IMHO the trick is to focus on areas where you can actually gain something, unless you're writing code for the sake of the code itself.


not 1 small optimization, but many small optimizations. All huge optimizations are a collection of many small optimizations.

Quote:
What's easiest and fastest to make heads or tails of? A few lines of math in C++, or a screenful or two of x87 FPU? What's easiest to write, maintain, and verify against the formula?


It's the runtime that counts to the customer. Whatever happened to the customer always being right? And those who don't have customers (freeware programmers) are exempt from this rule, since you can't argue with free software.

Quote:
"perfection" doesn't exist anywhere, since it doesn't really make any sense - and I'm not just talking about software. And do keep in mind that just because something isn't perfect doesn't mean it's crap.


True, but remember that the more imperfect it is, the closer to crap it is. It should at least be as good as it can be for the customer. Often I find loading times for software like "Microsoft Word" and such to be rather long. And to think that these programs take up all that space.

Quote:
There's a shitload of API functions though, and it would be senseless doing this kind of documentation. You'd get just about zero benefit from the information. And if a function was updated to use a smarter algorithm at the expense of a register, it'd have to do some register preservation to avoid breaking its interface contract. In the long run, the current scratch/precious register split makes the most sense.


First off, we shouldn't have all the functions that we have. Some of which, I have noticed, aren't exactly perfect either. I've been screwing around with this one function that I half expected to work like a "gotoxy", but it doesn't seem to do that. Maybe it is my interpretation that is wrong, or maybe it is buggy. I don't know if it's true or not, but I heard a joke that says, "Even Microsoft doesn't know what is in Windows anymore." Then to top it off, we could have things update. How often do you update your OS with all these little updates? A few years apart, and if the software programmers cared enough to go with these updates, they could simply update their program.

Quote:
Seriously though, even if you don't do unicode and even if you're not doing MBCS either, there's still OEM codepages to consider. If you hardcode toupper/tolower (or any case insensitiveness) that way, you get screwy behaviour for non-english locales.


Indeed, but I have had much trouble with Unicode in the console window. The only true Unicode that I've ever managed to get in a console window was with WriteConsoleW, I think the function was called. wcout seemingly doesn't work, and wprintf has given me some trouble as well.

Quote:
And then there's code readability - the "range compare" version is self commenting, the bit-test needs a comment.


Of course, but it'd be wise to comment both lines anyway.

Quote:
But yes, compilers often don't see the same opportunities as a human does, and with bit operations, depending on underflow being represented as overflow etc. can lead to some nice optimizations. Much of this is doable in C++ code as well, although some carry tricks aren't easily done in C++.


Oh, of course. And as I've said in the other posts, sadly these little C++ tricks aren't taught anymore in many classes. With C++, it is easy to not know something and get along fine without knowing it, versus assembly, where it's hard to program if you don't know it all.

Quote:
MSVC has pretty good code generation, which is the important thing here. You're mixing up the compiler with the IDE. Dev-C++ is, again, an IDE, and typically used with the GCC compiler. Digital Mars is pretty nifty for a free product, but its code generation isn't all the way up there, and its use of OMF objects instead of COFF is a bit tedious.


Well, the compiler does come with the IDE. If you take away the IDE, you may as well be coding in assembly, 'cause you don't have to worry about all those possible syntax errors. lol
Post 27 Dec 2006, 08:24
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
kohlrak wrote:

not 1 small optimization, but many small optimizations. All huge optimizations are a collection of many small optimizations.

Not in my experience - the optimizations that have really mattered have, for me, always been focused on a few critical areas... small 'optimizations' scattered all over don't really matter.

Of course you still have to write decent code - choose the proper data structures, algorithms, and program design. Those decisions matter a lot - not details of whether to use C, STDCALL or SYSCALL calling conventions.

kohlrak wrote:

It's the runtime that counts to the customer. Whatever happened to the customer always being right? And for those who don't have customers (free ware programmers) they are expelled from this rule, since you can't argue with free software.

Hm, "the runtime" - what do you mean with this?

kohlrak wrote:

Often I find loading times for software like "Microsoft Word" and such to be rather long.

Compare word2000 and OpenOffice loading speeds... then say that again. Repeat on a pmmx-200/64ram, and then try saying it once again. (And yes, that's with the speedup item from the start menu removed). Haven't looked at later versions though.

kohlrak wrote:

First off, we shouldn't have all the functions that we have. Some of which, i have noticed, aren't exactly perfect either.

The win32 API is a mess, partly because it was carried over from win16, partly because there's too many different teams involved (the "common controls" introduced with IE aren't very comfortable to work with IMHO), et cetera.

But would you rather want a limited system where you can only do things one way? And where you'd have to write things like tree controls, multi-line edit controls etc. yourself from scratch? A system where threading either has to be done by manual scheduling, or by using processes to emulate threads?

kohlrak wrote:

I've been screwing around with this one function that I half expected to work like a "gotoxy", but it doesn't seem to do that. Maybe it is my interpretation that is wrong, or maybe it is buggy.

Most likely your interpretation is wrong - blame yourself before blaming others. Or perhaps you have a register preservation problem, that was recently a cause for a guy on the asmcommunity board having trouble with some line drawing code.

kohlrak wrote:

I don't know if it's true or not, but I heard a joke that says, "Even Microsoft doesn't know what is in Windows anymore."

No single source in Microsoft knows all the details of the entire system - that's generally a good idea, with a system that large. Focus on a few key areas and do that well. Problem is that, yeah, it's become a bit too large, and not all programmers are equally skilled. Plus of course the extremely heavy burden of legacy support.

kohlrak wrote:

Then to top it off, we could have things update. How often do you update your OS with all these little updates? A few years apart, and if the software programmers cared enough to go with these updates, they could simply update their program.

If there's a bug, you need to fix it asap. If you realize you can boost performance, you might want to issue an update.

Would you like having to update all your installed programs everytime there's a Windows Update? No? Didn't think so.

Having register specifications per function is plain lame.

kohlrak wrote:

f0dder wrote:

And then there's code readability - the "range compare" version is self commenting, the bit-test needs a comment.

Of course, but it'd be wise to comment both lines anyway.

Nope. Never comment anything that is truly self-explanatory - that only wastes time and adds line noise. Consider this:
Code:
foo.bar(baz, fifofum); // call bar on foo with baz, fifofum
    

I've seen similar kind of comments in real code (where the various names are of course more sensible). One word: useless.

kohlrak wrote:

Oh, of course. And as I've said in the other posts, sadly these little C++ tricks aren't taught anymore in many classes. With C++, it is easy to not know something and get along fine without knowing it, versus assembly, where it's hard to program if you don't know it all.

In lots of coding, it's irrelevant anyway. It's more important to get program design right, and write solid code. Much of the code written today doesn't need speed optimizations, but it needs to be solid. Too much time would be wasted if people had to do the kind of optimizations you're advocating all over the place.

kohlrak wrote:

Well, the compiler does come with the IDE. If you take away the IDE, you may as well be coding in assembly, 'cause you don't have to worry about all those possible syntax errors. lol

Heh.

I do a lot of my coding outside the IDE, using Notepad++ or VIM, and a shell with a commandline compiler. And if there's a compilation warning or error in HLL code, that's a possible subtle bug in a lower-level language that's avoided :)
Post 27 Dec 2006, 23:21
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
f0dder wrote:
Much of the code written today doesn't need speed optimizations, but it needs to be solid. Too much time would be wasted if people had to do the kind of optimizations you're advocating all over the place.


Bogus. I understand what you mean (not everything is used often enough to need to be lightning-fast), but there are LOTS of programs I wish ran faster (e.g., PAQ8?, which thankfully has an MMX speedup routine). C often adds about 10% to a program's overhead, and C++ is even more (20%?). In certain kinds of programs (compression, compilers, or anything used fairly often, not just games), speed IS important. Seriously, yes, optimization shouldn't necessarily start at the beginning of a program's development. But you can often avoid certain bottlenecks with a bit of forethought. "An ounce of prevention is worth a pound of cure."

P.S. I wrote a trivial .ASM proggie recently that turns out to be 10x slower than the C version. (I'm guessing because the C version is line buffered. So, now I have to go optimize the slower-but-smaller .ASM version. :D)


Last edited by rugxulo on 28 Dec 2006, 20:23; edited 1 time in total
Post 28 Dec 2006, 04:04
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
rugxulo: sure, there are classes of programs that really should be optimized, and de/compressors are one of those; data structures and algorithm tuning, then possibly an assembly implementation. There's been nifty speedups to both WinRAR and 7-zip from time to time, so optimization certainly isn't fruitless - I don't think anybody's trying to argue that, though ;)

Compilers are a tough bunch - pretty complex data structures, and NP-complete problems that have to be approximated.

My point is just that the large majority of applications (and probably most of the code even in a compressor) don't really need it. You don't gain anything from writing GUI fluff in assembly, or optimizing a few cycles away from an API call that ends up incurring a ring transition.

C/C++ overhead, ho humm - depends on what you're comparing it to (naïve assembly? fine-tuned code?), which compiler, which piece of code etc. As for C++ being more overhead, please check out this: http://blog.sc.tri-bit.com/archives/169 . Really depends on what you're doing - write retarded code and you'll suffer, as always :)

rugxulo wrote:

Seriously, yes, optimization shouldn't necessarily start at the beginning of a program's development. But you can often avoid certain bottlenecks with a bit of forethought. "An ounce of prevention is worth a pound of cure."

I agree 100% here. If you don't do your program design properly (ie, make a bitblt that internally uses get/setpixel), if you choose wrong algorithms (bubblesorting huge arrays, anyone?) or wrong data structures (linked list when an array would be better, or the reverse) - you'll be screwed, and no amount of low-level optimization will help you.
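
To put a rough number on the bubblesort remark, here is a C sketch that counts comparisons for a textbook bubble sort and for the standard library's qsort on the same pseudo-random input. The helper names, the array size, and the small LCG used to generate data are all assumptions for illustration; qsort's exact comparison count depends on the C library.

```c
#include <assert.h>
#include <stdlib.h>

static unsigned long compares;   /* comparison counter shared by both sorts */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    ++compares;
    return (x > y) - (x < y);
}

/* Plain bubble sort without early exit: always n*(n-1)/2 comparisons. */
static void bubble_sort(int *v, size_t n)
{
    for (size_t pass = 0; pass + 1 < n; ++pass)
        for (size_t i = 0; i + 1 < n - pass; ++i)
            if (cmp_int(&v[i], &v[i + 1]) > 0) {
                int t = v[i]; v[i] = v[i + 1]; v[i + 1] = t;
            }
}

/* Sorts n pseudo-random ints and returns how many comparisons were made. */
unsigned long count_compares(size_t n, int use_qsort)
{
    unsigned long seed = 12345;          /* fixed seed: same data every run */
    int *v;
    assert(n > 0);
    v = malloc(n * sizeof *v);
    assert(v != NULL);
    for (size_t i = 0; i < n; ++i) {
        seed = seed * 1103515245UL + 12345UL;
        v[i] = (int)((seed >> 16) & 0x7fff);
    }
    compares = 0;
    if (use_qsort)
        qsort(v, n, sizeof *v, cmp_int);
    else
        bubble_sort(v, n);
    for (size_t i = 1; i < n; ++i)       /* sanity check: result is sorted */
        assert(v[i - 1] <= v[i]);
    free(v);
    return compares;
}
```

For 1000 elements this bubble sort always performs 1000*999/2 = 499500 comparisons, while a typical qsort needs on the order of n log n. No amount of instruction-level tuning of the inner loop closes a gap that grows with n.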
Post 28 Dec 2006, 08:14
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
f0dder wrote:
C/C++ overhead, ho humm - depends on what you're comparing it to (naïve assembly? fine-tuned code?), which compiler, which piece of code etc.


That's true (see below).

rugxulo wrote:
P.S. I wrote a trivial .ASM proggie recently that turns out to be 10x slower than the C version. (I'm guessing because the C version is line buffered. So, now I have to go optimize the slower-but-smaller .ASM version. :D)


Well, I wrote a program that prints out hex byte values (like DEBUG) of a binary file, twenty per line. The original .ASM version was like 10x slower than the C version. So, I rewrote it to output only when a full line has been "encoded". Now, it's about 30% faster than it used to be, but it's still a lot slower than several C compilers (GCC 3.4.4/DJGPP 2.03p2, BCC/Dev86 0.16.2 w/ updated LIBDOS.A + CRT0.A from DEV86CLB.ZIP, Turbo C++ 1.01). It does beat out CC386 3.27, though, by a large margin. I'm actually surprised that DJGPP beats out both 16-bit compilers (TC++ and BCC/Dev86), taking 1.5 secs for a 300k file instead of 2.5 secs or so.

P.S. Yes, I didn't really care about speed when writing it, just wanted it to work. So, I almost definitely wrote it incorrectly by using printf("%02X",((unsigned char)*buf)); (or whatever, I forget ...) on every byte. So, yes, I assume C's speedup is due to line buffering, but I still haven't matched it in ASM yet. (Of course, I'm not a very good programmer, so no big surprise there).
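
A hedged C sketch of the "encode a full line, then write it once" approach described above. This is not the original program: hex_line and hex_dump are hypothetical names, and the space-separated output layout is an assumption. The point is only that formatting twenty bytes into a small buffer by table lookup and issuing one write per line avoids a printf call per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define BYTES_PER_LINE 20

/* Formats up to BYTES_PER_LINE bytes as "XX XX ... XX\n" into out.
   out must hold at least 3 * BYTES_PER_LINE + 1 chars (NUL included).
   Returns the number of characters written, not counting the NUL. */
size_t hex_line(const unsigned char *buf, size_t n, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t pos = 0;
    assert(n <= BYTES_PER_LINE);
    for (size_t i = 0; i < n; ++i) {
        out[pos++] = digits[buf[i] >> 4];       /* high nibble */
        out[pos++] = digits[buf[i] & 0x0F];     /* low nibble */
        out[pos++] = (i + 1 < n) ? ' ' : '\n';  /* separator or line end */
    }
    out[pos] = '\0';
    return pos;
}

/* Dumps a stream: one fread per input chunk, one fwrite per output line. */
void hex_dump(FILE *in, FILE *out)
{
    unsigned char buf[BYTES_PER_LINE];
    char line[3 * BYTES_PER_LINE + 1];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        size_t len = hex_line(buf, n, line);
        fwrite(line, 1, len, out);
    }
}
```

hex_dump(stdin, stdout) would dump standard input; stdio's own buffering still applies on top of this.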

P.P.S. Notice that PAQ8? is written in portable C++, and the NASM (MMX speedup) src works with DOS, Win32, etc. That's probably the best way to go, write the program then optimize part of it. (Nobody ever died from a slow program though, but man, are they annoying!)
Post 28 Dec 2006, 20:21
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Yup, libc FILE* stuff traditionally has buffering (some of the minimal libcs like Jibz' WCRT don't though!), but it's still best not to output char-at-a-time - less overhead anyway.

Yup, portable C++ and assembly-where-it-matters is my "recipe for success". Not saying that everybody should work that way, but it's what makes sense when you work on real programs. Yes, a few people are very productive using only assembly, more power to them - that just doesn't work very well for most people, and isn't portable (not that this matters all the time).

CC386 doesn't really optimize iirc, and LCC-win32 has poor optimization (iirc some years ago the text on his website was that it was ~10% (more? less?) slower than VC6... which generates a lot worse code than vc2003 or newer).
Post 28 Dec 2006, 21:53
kohlrak



Joined: 21 Jul 2006
Posts: 1421
Location: Uncle Sam's Pad
kohlrak
Quote:
Not in my experience - the optimizations that have really mattered have, for me, always been focused on a few critical areas... small 'optimizations' scattered all over don't really matter.


Perhaps you have a program where you have one crappy line of code. The more you use it, the more of a problem it is. Think about that...

Quote:
Hm, "the runtime" - what do you mean with this?


While it's running.

Quote:
Compare word2000 and OpenOffice loading speeds... then say that again. Repeat on a pmmx-200/64ram, and then try saying it once again. (And yes, that's with the speedup item from the start menu removed). Haven't looked at later versions though.


Why not just load nothing up front and pull in extra features as they're needed? Check the loading speed of Microsoft Word, then the loading speed of Notepad.

Quote:
The win32 API is a mess, partly because it was carried over from win16, partly because there's too many different teams involved (the "common controls" introduced with IE aren't very comfortable to work with IMHO), et cetera.


That's why they should have 1 guy solving 1 problem at a time and talking with the other members. No point in having 2 answers to the same problem if one is efficient.

Quote:
But would you rather want a limited system where you can only do things one way? And where you'd have to write things like tree controls, multi-line edit controls etc. yourself from scratch? A system where threading either has to be done by manual scheduling, or by using processes to emulate threads?


There should only be 1 way to print messages to a screen, other methods should use that particular method.

Quote:
Most likely your interpretation is wrong - blame yourself before blaming others. Or perhaps you have a register preservation problem, that was recently a cause for a guy on the asmcommunity board having trouble with some line drawing code.


I was using constants... The documentation isn't good enough in my book. It reminds me of the DirectX documentation: multiple definitions for one word, and they don't define these words for you. I remember reading the DirectX documentation and going in circles while looking up words.

Quote:
No single source in Microsoft knows all the details of the entire system - that's generally a good idea, with a system that large. Focus on a few key areas and do that well. Problem is that, yeah, it's become a bit too large, and not all programmers are equally skilled. Plus of course the extremely heavy burden of legacy support.


That's my main issue with Microsoft. As a programmer I like to be able to program whatever I want, yet I end up going around in a loop-de-loop.

Quote:
If there's a bug, you need to fix it asap. If you realize you can boost performance, you might want to issue an update.


If there is a more efficient algorithm to do something, chances are the answer is still the same. Because of this flexibility-of-code issue, I'm a firm believer in functions preserving the registers automatically and only passing back return values or values that are documented. If a function modifies registers, it should either preserve them or document the purpose behind not preserving them. Maybe that's inefficient, but that's the downside of using code other than your own.

Quote:
Would you like having to update all your installed programs everytime there's a Windows Update? No? Didn't think so.


Perhaps a less frequent update. I very seldom update Windows, especially because I don't want to accidentally download Windows' video drivers, which love to crash my computer.

Quote:
Having register specifications per function is plain lame.


So is keeping a recipe book. So is documenting your functions at all.

Quote:
I've seen similar kind of comments in real code (where the various names are of course more sensible). One word: useless.


Perhaps instead of saying what you're calling, say why you are calling it. If that is self-explanatory, then there's no need to comment it. You must pretend that the person reading your code doesn't already know your algorithm for solving the problem.

Quote:
I do a lot of my coding outside the IDE, using Notepad++ or VIM, and a shell with a commandline compiler. And if there's a compilation warning or error in HLL code, that's a possible subtle bug in a lower-level language that's avoided


I did learn C++ using Digital Mars. My IDE: Notepad. When I started using fasm, that's actually what I was using to edit my source files.

Quote:
My point is just that the large majority of applications (and probably most of the code even in a compressor) don't really need it. You don't gain anything from writing GUI fluff in assembly, or optimizing a few cycles away from an API call that ends up incurring a ring transition.


But you forget that those little optimizations could help another program that's running, or help yours when another one is running. People multi-task a lot now.

Quote:
C/C++ overhead, ho humm - depends on what you're comparing it to (naïve assembly? fine-tuned code?), which compiler, which piece of code etc. As for C++ being more overhead, please check out this: http://blog.sc.tri-bit.com/archives/169 . Really depends on what you're doing - write retarded code and you'll suffer, as always


Indeed. It takes a good programmer to write good code to begin with, and a good algorithm in Java beats a crappy one in assembly, but at the same time, a good algorithm in assembly still beats the same one in Java. Less overhead. =p

Quote:
I agree 100% here. If you don't do your program design properly (ie, make a bitblt that internally uses get/setpixel), if you choose wrong algorithms (bubblesorting huge arrays, anyone?) or wrong data structures (linked list when an array would be better, or the reverse) - you'll be screwed, and no amount of low-level optimization will help you.


I consider switching your algorithm to be optimization as well. Perhaps we don't share the same definition of optimization. I see optimization as any change in your code that makes it run faster/smaller but produces the exact same result (minus it being faster and/or smaller). This includes whole new ways of approaching a problem.
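
To make the "wrong algorithm" point concrete, here is a minimal C sketch that is not taken from anyone's code in this thread: a bubble sort next to the C library's qsort. Both leave the array in the same order, so swapping one for the other fits the definition of optimization given above. The names bubble_sort, library_sort and cmp_int are made up for the illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Bubble sort: O(n^2) comparisons, no matter how tightly the inner loop is coded. */
void bubble_sort(int *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        for (size_t j = 0; j + 1 < n - i; j++) {
            if (a[j] > a[j + 1]) {
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}

/* Comparison callback for qsort; avoids the overflow-prone "x - y" trick. */
int cmp_int(const void *pa, const void *pb)
{
    int x = *(const int *)pa;
    int y = *(const int *)pb;
    return (x > y) - (x < y);
}

/* Same result via the C library's qsort, which is typically O(n log n). */
void library_sort(int *a, size_t n)
{
    qsort(a, n, sizeof a[0], cmp_int);
}
```

On a handful of elements the difference is invisible; on huge arrays no amount of cycle shaving inside bubble_sort is likely to catch up.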

Quote:
Yup, libc FILE* stuff traditionally has buffering (some of the minimal libcs like Jibz' WCRT don't though!), but it's still best not to output char-at-a-time - less overhead anyway.


Do as many calculations as you can before printing. Perhaps that's why printf works better than cout, but I've seen that cout no longer prints 1 letter at a time. I hate undocumented changes...
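
A rough C sketch of "do the work first, print once", assuming nothing about any particular libc: the values are formatted into one memory buffer and then handed to stdio in a single fwrite call. format_squares and print_squares are hypothetical names for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Format the squares of 0..count-1, one per line, into buf.
 * Returns the number of characters written (excluding the final '\0').
 * Returns 0 if count is 0 or if buf is too small to hold everything. */
size_t format_squares(char *buf, size_t cap, unsigned count)
{
    size_t used = 0;
    for (unsigned i = 0; i < count; i++) {
        int n = snprintf(buf + used, cap - used, "%u\n", i * i);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;               /* error or truncated: give up */
        used += (size_t)n;
    }
    return used;
}

/* One stdio call for the whole batch instead of one call per character. */
int print_squares(FILE *out, unsigned count)
{
    char buf[256];
    size_t len = format_squares(buf, sizeof buf, count);
    if (len == 0 && count != 0)
        return -1;
    return fwrite(buf, 1, len, out) == len ? 0 : -1;
}
```

Whether this beats a buffered FILE* in practice depends on the libc; the sketch only shows the batching idea.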

Quote:
Yup, portable C++ and assembly-where-it-matters is my "recipe for success". Not saying that everybody should work that way, but it's what makes sense when you work on real programs. Yes, a few people are very productive using only assembly, more power to them - that just doesn't work very well for most people, and isn't portable (not that this matters all the time).


If you take away the STL you take away the portability of C++. Make a cross-platform assembly library and your programs will be portable, just not across processors, but even C++ apps are barely portable across processors. Instead of adding a new layer of indirection to solve a problem, why not attempt to remove a layer? That's my theory behind cross-platform, since many operating systems are now moving to x86, like the Mac.
Post 28 Dec 2006, 22:42
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
kohlrak wrote:

Why not just load nothing up front and bring in extra features as they are needed? Compare the loading speed of Microsoft Word with the loading speed of Notepad.

Word2000 and the other office apps already do this, and can even install features on demand. If you overdo this (with use of DLLs) the net effect, however, will be higher memory & disk usage.

Word2000 loading speed is almost instant anyway - less than 1.5 seconds on my current box, and it wasn't much slower on my old pmmx-200.

And, well, use the right tool for the job. If you fire up Word, it should be because you need its features...

kohlrak wrote:

That's why they should have 1 guy solving 1 problem at a time and talking with the other members. No point in having 2 answers to the same problem if one is efficient.

I presume they have fine communication inside each team - there's multiple teams, though. And not much to do about that, it's unrealistic for one team to know everything about everything.

kohlrak wrote:

There should be only 1 way to print messages to the screen; other methods should be built on that particular method.

So, do you want that to be MessageBox or WriteFile or WriteConsole or WriteConsoleOutput? I'm afraid that one size doesn't fit all. Either you end up with too simple functions (like, not having async I/O, which would make the OS unusable for servers), or you end up with over-extremely-general functions (as if the amount of parameters to CreateFile or CreateWindowEx wasn't bad enough).

kohlrak wrote:

Perhaps a less frequent update. I very seldom update Windows, especially because I don't want to accidentally download Windows' video drivers, which love to crash my computer.

I hope you're behind a NAT'ed router then, and perhaps a firewall as well.

Besides, drivers aren't downloaded automatically, as they're optional updates. And you have control of how automatic updates work (notify only, download, download & install).

kohlrak wrote:

f0dder wrote:

Having register specifications per function is plain lame.

So is keeping a recipe book. So is documenting your functions at all.

Not really - a recipe book helps you remember recipes, function documentation lets other people use it. Having per-function register guidelines doesn't really buy you anything, means you have to check each function's specifics (instead of having one rule that's valid for all functions), and is utterly unimportant for the target audience...

Of course you could also say that each function must preserve all registers, but then you have possible unnecessary push/pops, which conflicts with your own idea of über-optimizing everything.

When you're really optimizing a piece of code and need to use all registers, which is where per-function register specification might make a slight bit of sense, you shouldn't be calling functions anyway. It's more useful knowing that you don't have to preserve {eax,ecx,edx} and that {ebx,esi,edi} must be preserved.

kohlrak wrote:

Perhaps instead of saying what you're calling, say why you are calling it. If that is self-explanatory, then there's no need to comment it. You must pretend that the person reading your code doesn't already know your algorithm for solving the problem.

Unfortunately, that's not how people tend to use those per-line comments.

If you keep your functions relatively short, with a single purpose, and logical names, you often don't need many comments - a single per-function comment describing purpose & pre/post-conditions is often enough.

It takes skill and experience writing good code and good comments.
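
As an illustration of that kind of header comment (a hypothetical helper, not code from anyone in this thread), here is a short C function whose single comment states purpose, preconditions and postcondition, with no per-line comments in the body:

```c
#include <assert.h>
#include <stddef.h>

/*
 * find_max - return the index of the largest element in an array.
 *
 * Preconditions:  values != NULL and count > 0.
 * Postcondition:  the returned index i satisfies i < count and
 *                 values[i] >= values[k] for every k < count;
 *                 on ties, the lowest such index is returned.
 */
size_t find_max(const int *values, size_t count)
{
    assert(values != NULL && count > 0);

    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (values[i] > values[best])
            best = i;
    }
    return best;
}
```

The assert is one possible way to make the precondition checkable in debug builds; it is a design choice for the sketch, not a requirement.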


kohlrak wrote:

f0dder wrote:

My point is just that the large majority of applications (and probably most of the code even in a compressor) don't really need it. You don't gain anything from writing GUI fluff in assembly, or optimizing a few cycles away from an API call that ends up incurring a ring transition.

But you forget that those little optimizations could help another program that's running, or help yours when another one is running. People multi-task a lot now.

Not really - those optimizations are entirely dwarfed by the cost of a ring transition, which is necessary to schedule another thread.

What does help you in multithreading, however, is knowing how and when to change thread priority, how to do efficient resource sharing (mutexes, critical sections, semaphores, shared memory, ...), knowledge of blocking and timing, etc.

kohlrak wrote:

Indeed. It takes a good programmer to write good code to begin with, and a good algorithm in Java beats a crappy one in assembly, but at the same time, a good algorithm in assembly still beats the same one in Java. Less overhead. =p

Sure, but typically it takes longer implementing an algorithm well in assembly, and it's not portable then. And if you spend your time doing micro-optimizations all over the program, you're probably failing in seeing the big picture and applying the optimizations that really matter.

But if you can do all this in assembly in the same time that a proficient programmer could do it in C++, well, I'll tip my hat to you, even if your code isn't portable ;)

kohlrak wrote:

I consider switching your algorithm to be optimization as well. Perhaps we don't share the same definition of optimization. I see optimization as any change in your code that makes it run faster/smaller but produces the exact same result (minus it being faster and/or smaller). This includes whole new ways of approaching a problem.

Sure thing, I agree fully here.

Where I disagree with you, it would seem, is on which level and kind of optimization is worthwhile.

kohlrak wrote:

Do as many calculations as you can before printing. Perhaps that's why printf works better than cout, but I've seen that cout no longer prints 1 letter at a time. I hate undocumented changes...

Do keep in mind that C++ iostreams support locales, don't buffer-overflow, have to be reasonably exception-safe, and are very extensible - you don't get that for free.

Haven't done any speed comparisons since ages ago in DOS, but I'm inclined to believe that if printf vs. iostream speed matters in your app, you aren't doing much of anything interesting :) (not that I would mind libc/libc++ being optimized as much as possible though, since it's used by a lot of apps).

kohlrak wrote:

If you take away the STL you take away the portability of C++. Make a cross-platform assembly library and your programs will be portable, just not across processors, but even C++ apps are barely portable across processors. Instead of adding a new layer of indirection to solve a problem, why not attempt to remove a layer? That's my theory behind cross-platform, since many operating systems are now moving to x86, like the Mac.

You mean take away STL and libc... they are separate entities. And available on just about any C++ and C platform (although STL is generally very limited on embedded devices).

Cross platform assembly library means porting it to every processor - that'd take a shitload of time, if you want it to be available everywhere C/C++ is. You could move to a meta-asm language to alleviate that, but then you might as well be writing it in C and probably end up with better code.

Badly written C/C++ code isn't very portable, well-written code tends to be. You need to be careful when touching memory directly, sending things across the network etc., but it isn't that bad really (well, depending on just how portable you want to be - I don't care for 16bit CPUs).
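
One common way to stay portable when sending integers across the network is to build the byte sequence with shifts instead of copying the in-memory representation. A minimal C sketch, with put_u32_be and get_u32_be as names invented for this example:

```c
#include <stdint.h>

/* Store v into out[0..3] in big-endian ("network") byte order.
 * Shifts operate on values, not on memory, so the result is the
 * same on little- and big-endian hosts. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Inverse of put_u32_be: rebuild the value from four bytes. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

The non-portable alternative would be something like writing the address of a uint32_t straight to a socket, which bakes the host's byte order into the wire format.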

While x86 is a lot, and almost exclusive in the desktop market, it isn't everything - and overall, it's not all that much. There's plenty of interesting 32- and 64bit CPUs around and in use...
Post 29 Dec 2006, 00:13
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
f0dder wrote:
And, well, use the right tool for the job. If you fire up Word, it should be because you need its features...


WordPad is the "super lite" version, no? Not all Windows come with Word (or even Works). But, there's always AbiWord (if OpenOffice is too big for you).

f0dder wrote:
kohlrak wrote:

Indeed. It takes a good programmer to write good code to begin with, and a good algorithm in Java beats a crappy one in assembly, but at the same time, a good algorithm in assembly still beats the same one in Java. Less overhead. =p

Sure, but typically it takes longer implementing an algorithm well in assembly, and it's not portable then. And if you spend your time doing micro-optimizations all over the program, you're probably failing in seeing the big picture and applying the optimizations that really matter.


Well, you obviously mean portable across architectures. Thanks to DOSBox or Bochs (both written in C++, I think 8) ), you ARE able to run assembly programs portably (e.g., I recently tried Invaders on a Mac OS X PPC laptop via DOSBox ... it worked, albeit a drop too slow, no surprise).

f0dder wrote:
Badly written C/C++ code isn't very portable, well-written code tends to be. You need to be careful when touching memory directly, sending things across the network etc., but it isn't that bad really (well, depending on just how portable you want to be - I don't care for 16bit CPUs).


Yeah, I was just gonna whine that a lot of C code assumes 32-bit ints. Or weird non-standard includes (windows.h, errno.h).
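
Code that needs exactly 32 bits can say so instead of assuming it of int. A small C99 sketch using the fixed-width types from stdint.h; rotl32 is just an illustrative function, not something from the thread:

```c
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (r is reduced modulo 32).
 * uint32_t is exactly 32 bits wherever it exists, whereas plain
 * "unsigned int" may be only 16 bits on some compilers. */
uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31u;
    if (r == 0)
        return x;                   /* avoid the undefined shift by 32 */
    return (uint32_t)((x << r) | (x >> (32u - r)));
}
```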

f0dder wrote:
While x86 is a lot, and almost exclusive in the desktop market, it isn't everything - and overall, it's not all that much. There's plenty of interesting 32- and 64bit CPUs around and in use...


Yes. FASMARM wasn't created for nothing! But where do people find access to these things (universities? jobs?)? Oh I forgot! Probably handheld PDAs and old legacy gaming computers (Atari 800, Atari ST, Commodore Amiga, Commodore 64, etc.) and via emulation. Dunno about MIPS, Cray, etc. (I'm a n00b at architectures, heh).
Post 29 Dec 2006, 00:53
MCD



Joined: 21 Aug 2004
Posts: 604
Location: Germany
MCD
Fr3m3n wrote:
rugxulo wrote:

Asm version:
Code:
format PE GUI 4.0
entry start

include '%fasminc%\win32ax.inc'

section '.idata' import data readable
library kernel32,'KERNEL32.DLL',user32,'USER32.DLL'
include '%fasminc%\apia\kernel32.inc'
include '%fasminc%\apia\user32.inc'

section '.data' data readable writeable
F_SCALE dq 700.0 ;visible world at the moment
bufor rb 8*800*3 ;792
section '.code' code readable executable
start:
fninit ;set precision to 64, near, zero registers - fastest way
mov edi,100000 ;number of repeats
@@:
stdcall Line3d,bufor,double 0.2,double 0.2,double 0.3,double 1.0,double 1.0,double 0.2
dec edi
jnz @b ;repeat until the counter reaches zero
@@:
ret

invoke MessageBox ; needed for nt loader
invoke ExitProcess

proc Line3d stdcall uses edi, \
buf:DWORD,xa:QWORD,ya:QWORD,za:QWORD,xb:QWORD,yb:QWORD,zb:QWORD ;(a=start point b=end point)
mov edi,[buf] ;address of the output buffer
;wx=xb-xa;
fld [xb]
fsub [xa]
;wy=yb-ya;
fld [yb]
fsub [ya]
;wz=zb-za;
fld [zb]
fsub [za]
;st0=wz
;st1=wy
;st2=wx

;calculate n
fld st0 ;wz
fmul st0,st0
fld st2 ;wy
fmul st0,st0
fld st4 ;wx
fmul st0,st0
;st0=wx^2
;st1=wy^2
;st2=wz^2
;st3=wz
;st4=wy
;st5=wx
faddp st2,st0 ;wz^2+wx^2
xor ecx,ecx
faddp st1,st0 ;wy^2+(wz^2+wx^2)
push eax
fsqrt
;st0=n
fmul qword[F_SCALE] ;multiply n by size of a world
;st1=wz
;st2=wy
;st3=wx
;wx=wx/n
fxch st3
fdiv st0,st3
fxch st3
;wy=wy/n
fxch st2
fdiv st0,st2
fxch st2
;wz=wz/n
fxch st1
fdiv st0,st1
fxch st1

fistp dword[esp]
fld [za]
fst qword[edi+16]
inc ecx
fld [ya]
fst qword[edi+8]
pop eax ;eax=n
fld [xa]
mov edx,8
fst qword[edi]
add edi,3*8
test eax,eax
jz .koniec
;st0=xa
;st1=ya
;st2=za
;st3=wz
;st4=wy
;st5=wx
mov ecx,eax
@@:
;x
       fadd st0,st5
       fst qword[edi]
       add edi,edx
;y
       fincstp
       fadd st0,st3
       fst qword[edi]
       add edi,edx
;z
       fincstp
       fadd st0,st1
       fst qword[edi]
       add edi,edx
       dec eax
       jz .koniec
       fdecstp
       fdecstp
       jmp @b
.koniec:
fninit
mov eax,ecx
ret
ud2
endp
    

(Can someone optimize it better? It's my best.)
(An SSE version would bring all HLL code to its knees, but the FPU version is fairer to HLL and works on older platforms.)

Ähh, use SSE stuff in the first place for such code ;)
I'm just too lazy to do it myself.
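
For comparison with the HLL side of the benchmark, the stepping loop of Line3d can be sketched in C roughly as follows. This is one reading of the assembly above, not Fr3m3n's original source: the function name line3d_fill is made up, and the per-axis step and the step count n are passed in as parameters rather than derived from the segment length and F_SCALE.

```c
#include <stddef.h>

/* Write the start point a, then n further points, each obtained by
 * adding step to the previous one, into buf as consecutive x,y,z
 * doubles. buf must have room for 3 * (n + 1) doubles.
 * Returns the number of points written. */
size_t line3d_fill(double *buf, const double a[3], const double step[3], size_t n)
{
    double x = a[0], y = a[1], z = a[2];

    buf[0] = x;
    buf[1] = y;
    buf[2] = z;
    buf += 3;

    for (size_t i = 0; i < n; i++) {
        x += step[0];
        y += step[1];
        z += step[2];
        buf[0] = x;
        buf[1] = y;
        buf[2] = z;
        buf += 3;
    }
    return n + 1;
}
```

The loop is a natural candidate for SSE2 packed doubles, and a compiler may or may not vectorize it on its own, so results from a sketch like this say little without measuring.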

_________________
MCD - the inevitable return of the Mad Computer Doggy

-||__/
.|+-~
.|| ||
Post 29 Dec 2006, 06:48
Copyright © 1999-2020, Tomasz Grysztar.