flat assembler
Message board for the users of flat assembler.

spectralnorm bench

Melissa



This time I have implemented the spectral norm benchmark, and I am quite
satisfied, as the first shot was quite successful:
http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=java&id=2

This program is useful as an example of threads implemented with sys_clone
and of synchronization primitives such as a barrier and a mutex implemented
with the futex syscall. There is also some SSE again.
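
For readers who have not seen a futex-based lock, here is a minimal sketch of the idea in fasm syntax. It is not taken from the attached spectralnorm2.asm: the names mutex_lock, mutex_unlock and mutex are made up for illustration, and the three-state scheme (0 free, 1 locked, 2 locked with possible waiters) follows the design described in Ulrich Drepper's "Futexes Are Tricky". It assumes x86-64 Linux, where futex is syscall number 202.

```asm
; Sketch only: a three-state futex mutex, x86-64 Linux, fasm syntax.
; All names here are hypothetical and not from spectralnorm2.asm.
format ELF64 executable 3
entry start

SYS_exit   = 60
SYS_futex  = 202
FUTEX_WAIT = 0
FUTEX_WAKE = 1

segment readable executable

; in: rdi = address of the mutex dword
;     0 = free, 1 = locked, 2 = locked and there may be waiters
mutex_lock:
        xor     eax, eax
        mov     ecx, 1
        lock cmpxchg dword [rdi], ecx   ; fast path: 0 -> 1 if the mutex is free
        je      .done
.contended:
        mov     eax, 2
        xchg    dword [rdi], eax        ; mark as contended (xchg with memory is atomic)
        test    eax, eax
        jz      .done                   ; old value was 0: the lock is ours now
        mov     eax, SYS_futex
        mov     esi, FUTEX_WAIT
        mov     edx, 2                  ; sleep only if the value is still 2
        xor     r10d, r10d              ; no timeout
        syscall                         ; clobbers rcx and r11, keeps rdi
        jmp     .contended              ; woken up (or value changed): try again
.done:
        ret

; in: rdi = address of the mutex dword
mutex_unlock:
        xor     eax, eax
        xchg    dword [rdi], eax        ; release the lock and fetch the old state
        cmp     eax, 2
        jne     .done                   ; nobody was waiting: no syscall needed
        mov     eax, SYS_futex
        mov     esi, FUTEX_WAKE
        mov     edx, 1                  ; wake at most one waiter
        syscall
.done:
        ret

start:
        lea     rdi, [mutex]
        call    mutex_lock              ; uncontended here, so only the fast path runs
        lea     rdi, [mutex]
        call    mutex_unlock
        xor     edi, edi                ; exit status 0
        mov     eax, SYS_exit
        syscall

segment readable writeable

mutex   dd 0
```

The point of the third state is that an uncontended lock and unlock never enter the kernel; the futex syscall is only made when another thread is (or may be) waiting.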

The program runs in about 2 seconds on a Q6600 @ 2.4 GHz and is faster
than the C++ version. I don't have Intel Fortran to test against.
All in all, this is fun to chase ;)


Attachment: spectralnorm2.asm (9.65 KB)

Post 03 May 2012, 00:09
LocoDelAssembly
Perhaps I'm a bit lost, but shouldn't you use AMDPad16 BEFORE the labels?

Example:
Code:
        AMDPad16
.L0:
;       AMDPad16
        dec r8
        xorpd xmm0,xmm0
        xor ebx,ebx
        AMDPad16
.L1:
;       AMDPad16    
That way you'd have the labels aligned to 16 bytes and the padding would be executed less often. But again, maybe I misunderstood your macro. Or perhaps you found that executing the padding often actually makes it run better?
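
A minimal sketch of the "padding before the label" layout, using fasm's built-in align directive in place of AMDPad16. The macro's definition is not shown in this thread, so this assumes it simply pads to a 16-byte boundary; the loop body and the iteration count are made-up placeholders.

```asm
; Sketch only: the padding goes in front of the loop label, not after it.
format ELF64 executable 3
entry start

segment readable executable

start:
        mov     r8d, 1000       ; placeholder iteration count
        align 16                ; NOP padding lands here and is passed through once
.next:                          ; loop head now starts on a 16-byte boundary
        dec     r8
        jnz     .next           ; the back edge jumps past the padding

        xor     edi, edi        ; exit status 0
        mov     eax, 60         ; SYS_exit on x86-64 Linux
        syscall
```

With the padding placed after the label instead, every iteration would have to run through the NOPs, and the first real instruction of the loop would be the aligned one rather than the jump target itself.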

BTW, for some odd reason the link you posted directs me to the n-body problem you posted earlier even though the test web param says otherwise. This link worked for me: http://shootout.alioth.debian.org/u64q/benchmark.php?test=spectralnorm&lang=java
Post 03 May 2012, 00:27
Melissa



I didn't know how to use the macro; I took it from this board.
Thanks for the correction ;)

Greetings!
Post 03 May 2012, 08:57
Madis731



I agree that the padding should go in front of the label. I've seen C compilers generate align 2 and align 4 before labels and I think you only need align 16 in case of SSE. Sometimes aligning too much will hurt performance.

I would rather align call destinations to 16 and leave the other labels alone.

If the NOPs (padding) before the label are never executed (for example, in front of a proc), you don't need a macro; a simple align 16 will suffice.

A tiny optimization hint: I don't think we'll see 65536 CPUs anytime soon, so it's safe to use div word[threadnum]. You're using qword division by default, which (according to Agner's Sandy Bridge listings) is 34-56 clocks and up to 94 clocks of latency. 32-, 16- and 8-bit wide divisions are 10-11 clocks and only up to 24 or 28 clocks of latency.
This means you can start crunching through *real* numbers about 66 clocks earlier :) You can do at least a dozen useful calculations instead.
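
A sketch of the narrower division, assuming both the item count and the thread count fit in 16 bits. The variable n and both values are invented for this example; threadnum is the name used in the post. Note that div with a word operand divides DX:AX, so DX has to be cleared first, and the instruction faults if the quotient does not fit in AX.

```asm
; Sketch only: 16-bit unsigned division instead of the 64-bit form.
format ELF64 executable 3
entry start

segment readable executable

start:
        movzx   eax, word [n]           ; low half of the dividend (AX)
        xor     edx, edx                ; high half of the dividend (DX) = 0
        div     word [threadnum]        ; AX = DX:AX / threadnum, DX = remainder
        movzx   ecx, ax                 ; ECX = items per thread, zero-extended

        xor     edi, edi                ; exit status 0
        mov     eax, 60                 ; SYS_exit on x86-64 Linux
        syscall

segment readable writeable

n         dw 5500                       ; hypothetical amount of work to split
threadnum dw 4                          ; hypothetical thread count
```

If the dividend can exceed 16 bits, the 32-bit form (EDX:EAX divided by a dword) is the safer choice and falls into the same cheaper group in the figures quoted above.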


Is it possible to submit ASM programs up there?
Post 04 May 2012, 14:07
Melissa



Thanks for the suggestions ;)

I asked, and they responded that they don't want assembler programs there.
Perhaps that will change in the future, as I think that running such benchmarks
could popularize asm programming (and they are good exercise).
Post 04 May 2012, 21:01
rugxulo



Melissa wrote:

I asked, and they responded that they don't want assembler programs there.


Maybe they think it's almost "cheating"? (???) Or maybe they're just scared of what will happen. :D
Post 05 May 2012, 12:56
revolution
rugxulo wrote:
Maybe they think it's almost "cheating"? (???) Or maybe they're just scared of what will happen.
More like putting the World Heavy Weight Boxing Champion in the ring against the midgets. It would be no contest and ultimately boring.

Well written assembly will always outperform well written HLL/compiler on any non-trivial task.
Poorly written assembly can be beaten by almost anything, so be careful about general comparisons, they won't always be accurate.
Post 05 May 2012, 13:02

Copyright © 1999-2020, Tomasz Grysztar.
