flat assembler
Message board for the users of flat assembler.

for tomasz, please replace the FractalExplorer64 example

tthsqe



Joined: 20 May 2009
Posts: 730
tthsqe
I found some bugs in the example 'FractalExplorer64' on the examples page. Could you remove it or replace it with the attached?
This one is much simpler and works in windowed mode.
The performance gains are close to linear:
SSE: 7.2 giga iterations per second
AVX: 13.7 giga iterations per second
AVX2: 21.5 giga iterations per second

attachment has been deleted now


Last edited by tthsqe on 15 Feb 2014, 22:44; edited 7 times in total
Post 24 Jan 2014, 04:48
tthsqe



Joined: 20 May 2009
Posts: 730
tthsqe
The new version includes a deep explorer that goes up to 4096 bits of precision, though I doubt anyone has the patience to zoom that far. I was comparing two methods of squaring, and it seems that 'mulx' does not give a huge benefit over plain 'mul'. Does anyone see something wrong with the 'mulx' functions? They are in 'Sqrx.inc'. Here are the results (in iterations per second) for various precision levels:
Code:
bits  mul   mulx   mulx/mul
128   293M  302M   1.03
192   190M  201M   1.05
256   128M  144M   1.12
384   77M   87M    1.13
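
For anyone following the comparison, here is a minimal sketch in C (not the code from 'Sqrx.inc') of what a multi-limb square is built from: rows of 64x64->128-bit products with carry propagation. It assumes a compiler that provides `unsigned __int128` (GCC or Clang), and the name `mp_sqr` is made up for illustration. To keep it short it computes the square as a plain schoolbook product and does not exploit the symmetry that a dedicated squaring routine would.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: r[0..2n-1] = a[0..n-1] * a[0..n-1].
   Limbs are 64 bits wide, least significant limb first.
   Assumes unsigned __int128 is available (GCC/Clang extension). */
static void mp_sqr(uint64_t *r, const uint64_t *a, size_t n)
{
    memset(r, 0, 2 * n * sizeof *r);
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            /* One 64x64->128 product plus two 64-bit addends.
               The sum cannot overflow 128 bits. */
            unsigned __int128 t =
                (unsigned __int128)a[i] * a[j] + r[i + j] + carry;
            r[i + j] = (uint64_t)t;          /* low half stays in place  */
            carry    = (uint64_t)(t >> 64);  /* high half moves one limb */
        }
        r[i + n] = carry;  /* this limb has not been written by earlier rows */
    }
}
```

On x86-64 each product in the inner loop corresponds to one 'mul' or 'mulx'. 'mulx' takes its implicit operand from rdx, writes the result to two explicit destination registers, and leaves the flags untouched, so the two differ in how freely the carry chains can be scheduled around the multiply, not in the multiply itself.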
Post 30 Jan 2014, 04:13
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7802
Location: Kraków, Poland
Tomasz Grysztar
The DeepExplorer silently quits on my AVX machine. What are its requirements?

Also, this new version does not have an FPU code path at all. I liked that the previous version demonstrated the usage of so many different instruction sets.
Post 12 Feb 2014, 12:50
tthsqe



Joined: 20 May 2009
Posts: 730
tthsqe
Oops, the problem with the deep explorer is that I forgot to make a code switch for mul/mulx. Now it should detect whether BMI2 is present and use the correct path.
Also, I added an FPU path to the shallow explorer. The reason I don't like the old version is that my register reloader introduced too many complications and made the code cryptic; this new version is very clear, has good performance, and can be followed easily by another person.
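
The mul/mulx switch described above can be sketched in C roughly as follows. This is not the code from the attachment, only an illustration under some assumptions: it uses the GCC/Clang helper `__get_cpuid_count` from `<cpuid.h>`, and the names `cpu_has_bmi2`, `sqr_path_mul`, `sqr_path_mulx` and `select_sqr_path` are hypothetical stand-ins. BMI2 is reported in CPUID leaf 7, subleaf 0, EBX bit 8.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header, x86 only (assumption) */
#endif

/* Returns 1 if CPUID reports BMI2 (leaf 7, subleaf 0, EBX bit 8), else 0.
   On non-x86 targets it simply reports 0. */
static int cpu_has_bmi2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 7 is not available */
    return (int)((ebx >> 8) & 1);
#else
    return 0;
#endif
}

/* Hypothetical stand-ins for the two squaring paths; real ones would
   contain the 'mul' and 'mulx' based routines. */
static const char *sqr_path_mul(void)  { return "mul";  }
static const char *sqr_path_mulx(void) { return "mulx"; }

typedef const char *(*sqr_path_fn)(void);

/* Choose the path once at startup so the hot loop has no feature checks. */
static sqr_path_fn select_sqr_path(int has_bmi2)
{
    return has_bmi2 ? sqr_path_mulx : sqr_path_mul;
}
```

A program would call `select_sqr_path(cpu_has_bmi2())` once during initialization and keep the returned pointer, so a machine without BMI2 never reaches a 'mulx' instruction.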
Post 13 Feb 2014, 01:36
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 7802
Location: Kraków, Poland
Tomasz Grysztar
Thank you for all the improvements; I have replaced the example on the official page.
Post 13 Feb 2014, 16:37


Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.