flat assembler
Message board for the users of flat assembler.
Is FASM taking advantage of Intel's Knights Corner (or MIC)?
kalambong 16 Jul 2012, 11:16
I do not know if this message belongs in the "Main" section; if not, please move it to a more appropriate forum.
Intel's Knights Corner, a variation of its cancelled Larrabee project, is going to be available on the market as "MIC": http://en.wikipedia.org/wiki/Intel_MIC
Reportedly it will have around 50 cores in one package, with 8 GB of GDDR5 memory. Is FASM going to take advantage of Intel's new offering?
Last edited by kalambong on 16 Jul 2012, 23:38; edited 1 time in total
LostCoder 16 Jul 2012, 17:19
kalambong, well, the Wikipedia link says it is still a prototype, so I think it is too early to talk about it.
randall 16 Jul 2012, 21:47
LostCoder wrote: kalambong, well, the Wikipedia link says it is still a prototype, so I think it is too early to talk about it.
"Prototype products, codenamed Knights Ferry were announced and released to developers in 2010. A commercial release, codenamed Knights Corner to be built on a 22nm process is scheduled to go into production in late 2012."
Knights Corner is not a prototype. It will be a commercial product. I think Knights Corner could be a very interesting machine.
kalambong 16 Jul 2012, 23:33
Enko wrote: What is the difference between the normal cpu and this one?
Apparently there may be some new instruction sets for the Knights Corner products, although technically they do fall into the x86/x64 category. As I've been following this closely, perhaps I should share some links. Intel has released the software stack for the MIC products to the Linux community as open source, and is actively adding to it.
http://software.intel.com/en-us/blogs/2012/06/05/knights-corner-open-source-software-stack/
http://www.phoronix.com/scan.php?page=news_item&px=MTExOTE
Allow me to quote:
Quote: GCC for Knights Corner is really only for building the kernel and related tools; it is not for building applications. Using GCC to build an application for Knights Corner will most often result in low performance code due its current inability to vectorize for the new Knights Corner vector instructions. Future changes to give full usage of Knights Corner vector instructions would require work on the GCC vectorizer to utilize those instructions’ masking capabilities.
What the above hints at is that there will be a lot of new features, maybe in the form of new instruction sets, that can take full advantage of the vector capabilities of Intel's MIC products. Perhaps in this regard FASM can shine: if FASM can support the same vector features that are in the MIC family, FASM may be a good alternative to GCC for users who want to tap into the full power of MIC.
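To make the "masking capabilities" remark in the quote a bit more concrete, here is a rough sketch in Python. It is not MIC code and not how GCC or fasm would do it; it only simulates a vector register as a list of lanes and a mask register as a list of booleans. The 16-lane width is an assumption (512-bit registers holding 32-bit floats), and the function names are made up for illustration.

```python
LANES = 16  # assumption: one 512-bit register seen as 16 lanes of 32-bit floats


def masked_mandelbrot_step(zr, zi, cr, ci, count, mask):
    """Apply one z = z*z + c step, but only to lanes whose mask bit is set.

    Real hardware would update all active lanes at once; this loop only
    simulates that behaviour lane by lane.
    """
    for lane in range(len(zr)):
        if not mask[lane]:
            continue  # a masked-off lane keeps its old values untouched
        r, i = zr[lane], zi[lane]
        zr[lane] = r * r - i * i + cr[lane]
        zi[lane] = 2.0 * r * i + ci[lane]
        count[lane] += 1
        # The lane drops out of the mask once |z|^2 > 4 (the point escaped).
        if zr[lane] * zr[lane] + zi[lane] * zi[lane] > 4.0:
            mask[lane] = False


def mandelbrot_lanes(cr, ci, max_iter=50):
    """Iterate every lane until it escapes or max_iter is reached."""
    n = len(cr)
    zr, zi = [0.0] * n, [0.0] * n
    count = [0] * n
    mask = [True] * n
    for _ in range(max_iter):
        if not any(mask):
            break  # every lane has escaped, nothing left to do
        masked_mandelbrot_step(zr, zi, cr, ci, count, mask)
    return count


if __name__ == "__main__":
    # Four sample points on the real axis; the two that stay bounded
    # keep iterating while the other two are masked off early.
    print(mandelbrot_lanes([0.0, 2.0, 1.0, -1.0], [0.0, 0.0, 0.0, 0.0]))
```

The point of the mask is that lanes which finish early do not force a branch: the loop keeps running for the remaining lanes, which is the pattern a vectorizer has to generate for loops with per-element exit conditions.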
Tomasz Grysztar 17 Jul 2012, 09:33
Its instruction set is a new variant of what the Larrabee instruction set was (with 512-bit vector registers). fasm never supported the latter, since it was just an exotic prototype. As for the MIC instructions - we will see, I may implement them if there is need for it, but certainly not soon.
kalambong 17 Jul 2012, 23:57
Tomasz Grysztar wrote: Its instruction set is a new variant of what the Larrabee instruction set was (with 512-bit vector registers). fasm never supported the latter, since it was just an exotic prototype. As for the MIC instructions - we will see, I may implement them if there is need for it, but certainly not soon.
Thank you, Tomasz, for your reply. I have nothing but respect for you, Sir. So whichever path you think is right for FASM with regard to Intel's MIC, I'll of course respect it.
It's just that, IMHO, in this age where many chip manufacturers, such as Nvidia and AMD's ATi, have decided that they won't allow assembly language programmers to program directly to the inner hardware registers of their CPUs/GPUs, Intel's MIC remains a tantalizing possibility.
GCC may one day support the vector registers of Intel's MIC, but as we already know, it may take years, or even decades, for the gigantic GCC machinery to reach that stage.
In the meantime, if people can get to the same vector registers through FASM (that is, if Intel releases sufficient info, and if you decide to implement them in FASM), programmers who want to tap into the full potential of Intel's MIC could use FASM to realize their dream.
But of course, it's all vaporware talk, for now.
tthsqe 18 Jul 2012, 01:06
My only question is: how good is its 64x64->128 bit multiply and 64 bit adc?
i.e. could it beat a 2600K or better in deep Mandelbrot zooms? 32 bit floats only have limited use and it's hard to string them together for multiprecision...
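For readers wondering why those two instructions matter: a schoolbook multiprecision multiply is built almost entirely out of a 64x64->128 multiply and add-with-carry. The sketch below models both in Python on lists of 64-bit limbs (least significant limb first). It only illustrates the arithmetic; the helper names are hypothetical and it says nothing about how fast any particular chip executes the real instructions.

```python
MASK64 = (1 << 64) - 1


def mul_64x64_128(a, b):
    """Return (lo, hi) halves of the 128-bit product, like x86-64 MUL."""
    p = a * b
    return p & MASK64, p >> 64


def adc(a, b, carry):
    """64-bit add with carry-in; returns (sum, carry_out), like ADC."""
    s = a + b + carry
    return s & MASK64, s >> 64


def mul_limbs(x, y):
    """Schoolbook product of two little-endian lists of 64-bit limbs."""
    out = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry_limb = 0
        for j, yj in enumerate(y):
            lo, hi = mul_64x64_128(xi, yj)
            # Accumulate the low half and the incoming carry limb.
            s, c1 = adc(out[i + j], lo, 0)
            s, c2 = adc(s, carry_limb, 0)
            out[i + j] = s
            # hi + c1 + c2 always fits in 64 bits for this algorithm.
            carry_limb = hi + c1 + c2
        out[i + len(y)] = carry_limb
    return out


def to_limbs(n, count):
    """Split a non-negative integer into `count` 64-bit limbs."""
    return [(n >> (64 * k)) & MASK64 for k in range(count)]


def from_limbs(limbs):
    """Rebuild the integer from little-endian 64-bit limbs."""
    return sum(limb << (64 * k) for k, limb in enumerate(limbs))


if __name__ == "__main__":
    a = (1 << 128) - 1
    product = mul_limbs(to_limbs(a, 2), to_limbs(a, 2))
    print(from_limbs(product) == a * a)
```

With n limbs per operand the inner loop runs n*n times, each iteration needing one wide multiply and a couple of carry-propagating adds, which is why the latency and throughput of exactly those instructions dominate deep-zoom arithmetic.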
hopcode 19 Jul 2012, 11:53
tthsqe wrote: My only question is: how good is its 64x64->128 bit multiply and 64 bit adc?
don't expect too much from it. there are no instruction-level improvements "per se". we will see improvements, of course, but i imagine only of the kind we observed in latencies between 65nm and 45nm. those cores are Pentiums glued together, full stop. my investigation, started on this board more than a year ago, led me to describe Larrabee, in a public discussion a year ago, as the alter ego of Transmeta, and as Intel trying to atone for what its engineering has been after the Pentium. (because beyond aliasing MMX onto the FPU they couldn't really improve the FPU, which i consider a masterpiece of engineering!)
now,
1) the presence of FPU instruction back-compatibility on MIC should be read as a hint of what will not change.
2) there is no reason for 300W of power nowadays, unless they intend it for server-side applications. they say they want to simplify access to the cores (to win the competition against CUDA etc.), and this is not bad. but coders use GCC, not assembly nor their own tools; math library included, because it is not open source.
3) the shared cache is difficult to program, and i seriously doubt that coders understand or take advantage of things like those in this paper http://rolfed.com/nehalem/nehalemPaper.pdf and even when they understand it, can they afford such a huge GCC latency time to apply it? in that sense GCC is not to be trusted, because with "shared cache" and "multicore" computing, design turns out to be a MUST, not a toolchain option, as they say.
4) in my personal experience (up to SSE 4.1), a prefetching strategy doesn't give more than a 25-30% performance gain.
5) i haven't heard anything from Microsoft about MIC. and the SYSENTER/SYSEXIT instructions will be dropped from that set!
ergo: i would not hack into that set, nor will i buy one of those machines to test that instruction set. i would rather read reports from users/developers. and this last fact (imho a possible marketing error by Intel) will mark the destiny of their MIC... again. didn't we learn Itanium's lesson too?
ok, my opinion. concretely: don't worry about it too much. but consider that your FMA Mandelbrot experiments (though i generally like what you write) do not run as i expect on my Quad Core. they do not run as smoothly as expected. there must be another, more efficient strategy for them, i mean for my quad core, under SSE 4.1 without VEX, or SSE 4.2. on the other hand, back-buffering +-2,3 zoom factors would be perfectly doable on a newer MIC machine, hence your occasion/temptation.
Cheers,
_________________
⠓⠕⠏⠉⠕⠙⠑
kalambong 22 Jul 2012, 02:04
So what you are essentially saying is that Intel is not to be trusted, and that anyone who wants to program the new MIC chip will have to go through GCC, which will help Intel to obfuscate the juicy intricacy bits from the masses, so that no one gets to code bare metal MIC with assembly languages such as FASM?
hopcode 22 Jul 2012, 08:49
kalambong wrote: ...you saying Intel is not to be trusted
not from the MIC itself, which remains more or less a concept, an acronym, because there's no commercial release at the moment, as reported by Wikipedia. knowing some of the bottlenecks of the prefetch instructions, i was and still am the first enthusiast of the shared cache, even if only on a theoretical basis, because i never had a machine to test or implement it on. but i do assembly; the question should really be addressed to the majority, those using C/C++ toolchains. could someone provide factual examples of designs (open or closed source) that take good advantage of the shared cache using those toolchains, GCC etc.? i would like to discuss them.
kalambong wrote: ...to program the new MIC chip will have to go through GCC
kalambong wrote: ... which will help Intel to obfuscate the juicy intricacy bits from the masses
kalambong wrote: no one get to code bare metal MIC with assembly languages such as FASM?
also, i agree with Tomasz about that instruction set: only
Quote: if there is need for it, but certainly not soon.
_________________
⠓⠕⠏⠉⠕⠙⠑
tthsqe 23 Jul 2012, 05:46
hopcode,
are you suggesting improvements to the Mandelbrot explorer I posted at http://board.flatassembler.net/topic.php?t=12722 ? The only thing I regret was the overly complicated way of reloading the SSE vectors, but I think it does reduce the drawing time.
Anyway, I was talking about computing really deep zooms (like 200+ decimal digits). You can see some of the ones I have rendered on YouTube: http://www.youtube.com/watch?v=v-9siTf8K6c&feature=plcp
This was done with fasm to compute and color everything, and ffmpeg to compress the bitmaps into a video. Also, in my experience multiprecision with CUDA PTX has about the same price/performance ratio as with a 2600K, so I have stuck with the CPU for now.
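For a sense of scale on the "200+ decimal digits" figure, here is a back-of-envelope sketch in Python. It is not taken from the explorer's source; it assumes a plain fixed-point representation, counts only the fraction bits (no sign, integer or guard bits), and assumes schoolbook multiplication. The function names are hypothetical.

```python
import math


def limbs_for_digits(decimal_digits, limb_bits=64):
    """Limbs needed to hold `decimal_digits` digits of fraction."""
    bits = math.ceil(decimal_digits * math.log2(10))
    return math.ceil(bits / limb_bits)


def schoolbook_multiplies(limbs):
    """Number of limb-by-limb multiplies in one full schoolbook product."""
    return limbs * limbs


if __name__ == "__main__":
    n64 = limbs_for_digits(200, 64)
    n32 = limbs_for_digits(200, 32)
    print(n64, schoolbook_multiplies(n64))
    print(n32, schoolbook_multiplies(n32))
```

Under these assumptions, 200 digits need about 665 bits, so roughly 11 limbs of 64 bits against 21 limbs of 32 bits, and the multiply count grows with the square of the limb count. That is the reason a wide integer multiply matters more here than raw 32-bit float throughput.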
hopcode 23 Jul 2012, 07:31
please consider that i am not an expert in gaming/graphics.
what follows is generally my opinion on how the cpu acts. also, i may be wrong in this case. ok
tthsqe wrote: http://www.youtube.com/watch?v=v-9siTf8K6c&feature=plcp
now,
tthsqe wrote: the overly complicated way of reloading the SSE vectors
- switching cyclically from data->mem->stack is the main bottleneck.
- no prefetch strategy
- the FPU should be totally eliminated.
- too many mem-moves in the CUBIC_SSE_Reload
- all memory should be homogeneous; that is, all stack or all virtual alloc in some way.
so at first glance, i estimate that the mem-moves in the CUBIC_SSE_Reload block take at least ~40% of cycles. comment out the rest of the code to time that block of mov/movapd. if that is so, then there is the bottleneck. what is required is to interleave calculations (just as you did in the .loop block), then do prefetch on no more than 1/2 of the required virtual alloced memory. this should be enough.
Cheers,
_________________
⠓⠕⠏⠉⠕⠙⠑
randall 23 Jul 2012, 11:37
tthsqe, this video is beautiful. Great job.
randall 28 Jul 2012, 17:29
Interesting project: http://ispc.github.com/
Basically, an Intel compiler for a C-like language with SPMD extensions that generates vectorized code. It supports the SSE, AVX, AVX2 and Xeon Phi (Knights Corner) instruction sets.
hopcode 29 Jul 2012, 00:22
randall wrote: Interesting project http://ispc.github.com/
thanks for the link. the good thing (at least for Intel) is that it is BSD. ok. it doesn't convince me though. after considering the output code here http://ispc.github.com/mandelbrot.txt there is no effective productivity gain, imho. one can read the MIC specs once, in one day, and continue using one's own toolchain without ispc, producing even better code. also, it makes the code more opaque than using HLL macros does. and those obscure things like barrier() and synchronizations too... too complex, with too verbose descriptions. take this from http://ispc.github.com/ispc.html#uniform-control-flow
Quote: Uniform Control Flow
Cheers,
_________________
⠓⠕⠏⠉⠕⠙⠑
randall 29 Jul 2012, 10:47
Hand-written code will always be better. But for most programmers who don't want to mess with assembly, I think it is a nice tool.
Of course I won't be using it; I prefer fasm.
hopcode 31 Jul 2012, 11:55
hi everybody,
randall wrote: ...programmers who don't want to mess with assembly I think that it is nice tool.
well, i received a notification by email just yesterday and started digging again. i must admit compilers are very open to this not-yet-on-the-market MIC initiative. here is a list of them: http://openmp.org/wp/openmp-compilers/
now, because ScaleMP, ergo OpenMP, will both ease the transition to MIC, please read from the first paragraphs after the image here: http://goparallel.sourceforge.net/virtualized-symmetric-multiprocessing-eases-mic-transition/
please don't neglect the fact that new multicore layouts will raise, 100% guaranteed, some considerable-to-huge problems in the management of the shared cache and access to it. but hey!, there is already much of this in the "guess" of Linus Torvalds in 2009 here: http://multicorenz.wordpress.com/2009/03/26/linus-torvalds-patterson-and-different-views-or-different-worlds/
also, if we define here for the first time a "micset" as a middle layer wrapped into a toolchain in order to develop for MIC, then, relating it to fasm, i think it is more convenient to have it as instruction-encoding macros rather than implementing it as a new instruction set. and just like the one reported by randall in the link, i think we will see a flourishing bunch of those micsets in the near future. it is not accidental, though, that my opinion (which i built myself from crude/raw asm programming of the cache) basically corresponds to that of Linus T. above. in all cases, and before integrating one of those micsets in the toolchain, please consider this kind of programming as a very special/dedicated one, where performance will be 100% dependent on the micset.
final quote from the second document above, ScaleMP on Xeon Phi,
Quote: Emulation, of course, will slow the algorithm down somewhat,
you were warned.
Cheers,
_________________
⠓⠕⠏⠉⠕⠙⠑
Alphonso 03 Sep 2012, 17:34
Tomasz Grysztar wrote: Its instruction set is a new variant
Umm, that link seems invalid, anyone have a new link? Thanks.
hopcode 03 Sep 2012, 18:22
Alphonso wrote:
...nothing can elude hopcode's control
i saved the RAR on my website here https://sites.google.com/site/x64lab/327364001EN.rar because it doesn't fit the quota on the board,
Cheers
_________________
⠓⠕⠏⠉⠕⠙⠑
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.