flat assembler
Message board for the users of flat assembler.

Vector libraries for C++ on FASM in x86-64 Linux

LocoDelAssembly
Perhaps the Pentium 4's trace cache alleviates the unaligned-code issue? If you can prepare a ready-to-run test bench for both aligned and unaligned code, I could test it on a Core i3 and an Athlon 64 here.
Post 03 Sep 2012, 12:47
jackblack



Sorry, I missed the second page of this topic. Now that I have read your message, I have made two binary files, one with aligned code and one with unaligned code.
I uploaded them to my Google Drive.
Here are the links:
https://docs.google.com/open?id=0B_IqYbGFxbzeNm9oNkVncFYwdm8
https://docs.google.com/open?id=0B_IqYbGFxbzeWHpRVjhBb2Zkb3c
If you can test them on several CPUs, that would be great.

On my PC I got the following results:
Unaligned code
Code:
################################################################################
#       Array library speed test                                               #
################################################################################
This test operates on 10000000 elements wide flt64_t arrays in 100 rounds.

Addition:
    Classic scalar code time: 8.469804 sec
    LinAsm vector code time: 8.507637 sec

Subtraction:
    Classic scalar code time: 8.525537 sec
    LinAsm vector code time: 8.505540 sec

Multiplication:
    Classic scalar code time: 8.524257 sec
    LinAsm vector code time: 8.476669 sec

Division:
    Classic scalar code time: 22.848977 sec
    LinAsm vector code time: 17.084971 sec
    


Aligned code
Code:
################################################################################
#       Array library speed test                                               #
################################################################################
This test operates on 10000000 elements wide flt64_t arrays in 100 rounds.

Addition:
    Classic scalar code time: 8.492586 sec
    LinAsm vector code time: 8.521262 sec

Subtraction:
    Classic scalar code time: 8.544034 sec
    LinAsm vector code time: 8.527114 sec

Multiplication:
    Classic scalar code time: 8.546466 sec
    LinAsm vector code time: 8.501122 sec

Division:
    Classic scalar code time: 22.880396 sec
    LinAsm vector code time: 17.094742 sec
    


In these tests the aligned code looks slightly slower, but this is not a real slowdown: my PC was running other tasks at the same time.
Post 08 Sep 2012, 10:10
LocoDelAssembly
Code:
loco@Ubuntu-VAIO:~/Desktop$ ./ArrayTest_aligned && ./ArrayTest_unaligned && ./ArrayTest_aligned && ./ArrayTest_unaligned 
################################################################################
#       Array library speed test                                               #
################################################################################
This test operates on 10000000 elements wide flt64_t arrays in 100 rounds.

Addition:
    Classic scalar code time: 2.037717 sec
    LinAsm vector code time: 1.898958 sec

Subtraction:
    Classic scalar code time: 2.023843 sec
    LinAsm vector code time: 1.887334 sec

Multiplication:
    Classic scalar code time: 2.040209 sec
    LinAsm vector code time: 1.892707 sec

Division:
    Classic scalar code time: 11.264690 sec
    LinAsm vector code time: 5.922402 sec
################################################################################
#       Array library speed test                                               #
################################################################################
This test operates on 10000000 elements wide flt64_t arrays in 100 rounds.

Addition:
    Classic scalar code time: 2.058576 sec
    LinAsm vector code time: 1.871824 sec

Subtraction:
    Classic scalar code time: 2.024328 sec
    LinAsm vector code time: 1.916844 sec

Multiplication:
    Classic scalar code time: 2.023449 sec
    LinAsm vector code time: 1.889113 sec

Division:
    Classic scalar code time: 11.173319 sec
    LinAsm vector code time: 5.934704 sec
################################################################################
#       Array library speed test                                               #
################################################################################
This test operates on 10000000 elements wide flt64_t arrays in 100 rounds.

Addition:
    Classic scalar code time: 2.016459 sec
    LinAsm vector code time: 1.892393 sec

Subtraction:
    Classic scalar code time: 2.026710 sec
    LinAsm vector code time: 2.007047 sec

Multiplication:
    Classic scalar code time: 2.222073 sec
    LinAsm vector code time: 2.026494 sec

Division:
    Classic scalar code time: 11.787808 sec
    LinAsm vector code time: 6.126286 sec
################################################################################
#       Array library speed test                                               #
################################################################################
This test operates on 10000000 elements wide flt64_t arrays in 100 rounds.

Addition:
    Classic scalar code time: 2.278812 sec
    LinAsm vector code time: 2.034858 sec

Subtraction:
    Classic scalar code time: 2.272292 sec
    LinAsm vector code time: 2.026706 sec

Multiplication:
    Classic scalar code time: 2.106645 sec
    LinAsm vector code time: 2.063666 sec

Division:
    Classic scalar code time: 11.977029 sec
    LinAsm vector code time: 6.221352 sec
loco@Ubuntu-VAIO:~/Desktop$ lsb_release -a
No LSB modules are available.
Distributor ID:  Ubuntu
Description:      Ubuntu 12.04.1 LTS
Release:      12.04
Codename:  precise
loco@Ubuntu-VAIO:~/Desktop$ uname -a
Linux Ubuntu-VAIO 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux    
(The first two runs shouldn't be considered, since the processor is not running in performance mode initially.)
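One way a test bench could discard such warm-up runs by itself is sketched below. This is only an assumption about how it might be done, not how the ArrayTest binaries work: `best_time_sec`, `add_arrays` and `example_measurement` are hypothetical names, and the scalar loop merely stands in for the "classic scalar" addition kernel.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: call `work` a few times untimed, then return the
// shortest wall-clock time (in seconds) seen over `timed_runs` timed calls.
template <typename F>
double best_time_sec(F&& work, int warmup_runs, int timed_runs)
{
    for (int i = 0; i < warmup_runs; ++i)
        work();  // untimed: gives the CPU clock time to ramp up

    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < timed_runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}

// Stand-in for a scalar addition kernel (not the library's code).
void add_arrays(double* sum, const double* a, const double* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        sum[i] = a[i] + b[i];
}

// Example use: two warm-up calls, then the best of five timed calls.
double example_measurement()
{
    std::vector<double> a(1000000, 1.5), b(1000000, 2.0), sum(1000000);
    return best_time_sec(
        [&] { add_arrays(sum.data(), a.data(), b.data(), sum.size()); },
        /*warmup_runs=*/2, /*timed_runs=*/5);
}
```

Taking the minimum rather than the average also reduces the effect of other tasks running at the same time.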
This was run under VirtualBox; the host is Windows 7 64-bit. The computer is a Sony VAIO VPCEJ16FX/B.
Short specs:
Code:
Processor

    Processor Type : Intel® Core™ i3-2310M
    Processor Technology : Dual Core
    Processor Speed : 2.10GHz
    Processor Cache : 3MB

Memory

    Installed Memory : 4GB (2GB x 2)
    Memory Type/Speed : DDR3/1333MHz
    Max. Memory : 8GB    


I can't test on the Athlon 64 right now; I'll get back to you with the results later.

PS: If anyone trying to run the binaries gets the message "No such file or directory" even after setting execute permissions (with "chmod +x ArrayTest_*"), check that you have either a file or a symlink at /lib/ld-linux-x86-64.so.2; if not, add a symlink to your interpreter. In my case I solved the problem with "sudo ln -s /lib/x86_64-linux-gnu/ld-2.15.so ld-linux-x86-64.so.2" (run from inside /lib, since the link name is relative).
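The message appears because the kernel cannot find the interpreter (dynamic loader) whose path is recorded in the binary. Here is a minimal sketch that reads that path from a 64-bit ELF file, assuming a Linux system with glibc's <elf.h> and a host with the same byte order as the file. The function names `elf_interpreter` and `file_exists` are made up for this example.

```cpp
#include <elf.h>      // Elf64_Ehdr, Elf64_Phdr, PT_INTERP (Linux/glibc header)
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: return the interpreter path stored in a 64-bit ELF
// file, or an empty string if the file is unreadable or has no PT_INTERP.
std::string elf_interpreter(const std::string& path)
{
    std::ifstream f(path, std::ios::binary);
    Elf64_Ehdr eh{};
    if (!f.read(reinterpret_cast<char*>(&eh), sizeof eh)) return "";
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return "";
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) return "";

    // Walk the program header table looking for the PT_INTERP entry.
    for (unsigned i = 0; i < eh.e_phnum; ++i) {
        Elf64_Phdr ph{};
        f.seekg(static_cast<std::streamoff>(
            eh.e_phoff + std::uint64_t(i) * eh.e_phentsize));
        if (!f.read(reinterpret_cast<char*>(&ph), sizeof ph)) return "";
        if (ph.p_type != PT_INTERP) continue;
        if (ph.p_filesz == 0 || ph.p_filesz > 4096) return "";

        std::string interp(ph.p_filesz, '\0');
        f.seekg(static_cast<std::streamoff>(ph.p_offset));
        if (!f.read(&interp[0], static_cast<std::streamsize>(interp.size())))
            return "";
        return std::string(interp.c_str());  // drop the trailing NUL
    }
    return "";
}

// True if the path can be opened for reading (symlinks are followed).
bool file_exists(const std::string& path)
{
    return std::ifstream(path).good();
}
```

If `elf_interpreter("ArrayTest_aligned")` returns a path for which `file_exists` is false, that missing path is where the symlink is needed.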

PS2: I realized the other day that my CPU reintroduced something like the trace cache (a cache of decoded micro-ops), though smaller than the one the Pentium 4 has, so if misalignment isn't hurting performance, perhaps it is due to this "L0" cache.
Post 08 Sep 2012, 22:45
jackblack



I think I have the answer to why there is no difference between aligned and unaligned code on recent CPUs. The Pentium 4 (NetBurst) and some later processors, such as Intel's Sandy Bridge, have a cache of decoded instructions. Assembly instructions are decoded into internal RISC-like micro-operations only once and are then stored in this cache. So in small loops and repeated function calls, the decoded instructions are read from that cache instead of being fetched from memory and decoded again. That is why code alignment matters little nowadays.

This is just my own interpretation of the results. If anyone has information about why the aligned and unaligned code perform the same, please let me know.
Post 10 Sep 2012, 05:56
Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.