flat assembler
Message board for the users of flat assembler.
> Main > HLL compilers generate better code than hand written asm?
AsmGuru62 13 Mar 2015, 12:47
Changing the algorithm is the only optimization that gives real results.
revolution 13 Mar 2015, 14:02
JohnFound wrote: The code B might be optimized for size, instead of for speed. Sometimes the size optimizations are also faster, but not always.
HaHaAnonymous 13 Mar 2015, 14:49
Quote:
Calm down, it was just a metaphor. D:

Quote:
FPC has the so-called "whole program optimization"... Not that it is worth much, but it is there.

Quote:
I think not. Because optimization (in my opinion) means making the same algorithm faster. Rewriting it as a completely different one will require further optimization as well (and the chances of getting the maximum possible performance out of it are minimal).

Quote:
That is the #1 piece of bullshit (in my opinion) I hear from many people. Why is it bullshit? My opinion is above and below.

Quote:
Now that makes sense! Because an algorithm can be implemented in an infinite number of ways, you can gain lots of performance just by implementing it another way. I would try optimizing your existing algorithm implementations first (unless you find a clearly more efficient one), and if that is not enough... try creating a better one (if that is possible at all; it is harder than what I said previously).

Quote:
It is not simply a change: what if you change to a less efficient algorithm!? So it is not simply a change, it has to be a "better" change (which is not always possible). Easier said than done.

I guess I have reached my bullshit (said by me) quota for today. D: I apologize for any inconveniences I may have caused.
Tyler 16 Mar 2015, 06:09
I think it really boils down to the fact that asm isn't necessary for the bulk of code that's written. Almost nothing is "hot" anymore; you just churn out thousands of lines of "cold" code that only gets executed once per 100+ ms. At that point, it makes more sense to use a descriptive language, or a language that allows code reuse (generic programming, polymorphism, etc.), to cut down on the amount of code written and time spent.

Of course you can do polymorphism in assembly. I've done it (in C, and I can translate anything in C to asm given some time and a reference book). The problem is that it's so trivial that there's no making it any faster: either way you go, you're stuck having a vtable, and the vtable is the most expensive part of polymorphism (one or two memory reads with every method call). What you can't have in assembly is generic programming. (Maybe you can with some really awesome macros???) And generic programming is literally the coolest thing ever.

More on topic though: if someone wants to pick a sorting algorithm to test, I can probably write it in C/C++. It seems unfair, IMO, to compare assembly with Pascal. Not that I know anything about Pascal (too young for that). It just doesn't have a reputation for being fast, AFAIK.
Tyler 16 Mar 2015, 06:15
revolution wrote:
Code:
for(int i = 0; i < SIZE; i++)
    for(int j = 0; j < SIZE; j++)
        array[j * SIZE + i] += 1;
// Should swap the i and j, to access memory linearly.

GCC has an optimizer that does some pretty cool math to do this sort of optimization: https://gcc.gnu.org/wiki/Graphite
General info: http://en.wikipedia.org/wiki/Polytope_model
HaHaAnonymous 16 Mar 2015, 16:47
Quote:

That is not true at all (or not for most cases). As a test case, I implemented an algorithm in C (I am not a C* user), in Pascal, and in hand-written asm. The C variant performed the worst of them all and I couldn't make it any faster; I do not know if it was because of my lack of experience or something else, but that is what happened. The hand-written asm was a bit faster than Pascal (~1.6 seconds difference for a 250-million input).

I guess it all depends on your experience, as the Pascal compiler (FPC) will do exactly what you wrote, while GCC will often catch your stupid code and avoid the unnecessary steps. Two great examples: FPC will not inline functions unless inlining is explicitly declared and specified on the command line; and for a value known after a loop, FPC will generate the loop while GCC will just "hard code" the final result, as it is known at compile time. Because of this, I think it is harder to write "faster" Pascal code than C* (GCC) code, as the Pascal compiler is more "obedient" (or less smart, as you wish).

But FPC compiles much faster... A giant project written in Pascal will compile from scratch in under 5 minutes, while GCC will take tens of minutes to hours. Also, the performance difference (if any) is not as catastrophic as many C* users claim (I guess that is because many of them have never touched Pascal code in their lives). There are other areas where FPC is better than GCC as well, so performance is just 1/10 of the story.

It doesn't really matter much anyway, as performance-critical routines are VERY OFTEN coded in hand-written assembly (video/image/audio encoders and general file compression are a great example), because their high-level variants are brutally slower. FPC shines at this, as it provides much better ways of incorporating assembly code into your program (without including object files), while in GCC you have to type every line in a very uncomfortable and unproductive way. More info at: http://www.freepascal.org/faq.var

I think I reached my bullshit quota for today. Sorry for wasting your time. D:

Last edited by HaHaAnonymous on 16 Mar 2015, 21:06; edited 4 times in total
Tyler 16 Mar 2015, 20:51
GCC is largely a kludge (and is so on purpose, even), so compile times aren't that fair to compare. Clang is a lot better in terms of proper design, but slightly less powerful in terms of optimizations. It still may be slower than FPC, idk. I wrote (aka translated from Wikipedia) the heapsort algorithm in C++. It sorted 1,000,000 ints in 0.15 s. I can give you a .inc file (in FASM syntax) with the same data, if you're interested. (I understand if you're not; I wouldn't want to write this in assembly at all.) Here's the code:
To compile: clang++ -std=c++11 -march=native -O3 code.cpp
Add "-masm=intel -S" to get assembly. g++ has similar performance, if you have it more readily available.

Code:
#include <iostream>
#include "data.h"

using namespace std;

void siftDown(int* a, int start, int end){
    int root = start;
    while(root * 2 + 1 <= end){
        int child = root * 2 + 1;
        int swap = root;
        if(a[swap] < a[child])
            swap = child;
        if(child+1 <= end && a[swap] < a[child+1])
            swap = child + 1;
        if(swap == root)
            return;
        else
            ::swap(a[root], a[swap]);
        root = swap;
    }
}

void heapify(int* a, int count){
    int start = (count - 2) / 2;
    while(start >= 0){
        siftDown(a, start, count - 1);
        start--;
    }
}

template<size_t N>
void heapsort(int data[N]){
    int* a = data;
    int count = N;
    heapify(a, count);
    int end = count - 1;
    while(end > 0){
        swap(a[end], a[0]);
        end--;
        siftDown(a, 0, end);
    }
}

int main(){
    heapsort<sizeof(data) / sizeof(int)>(data);
    /*for(auto i : data) cout << i << endl;*/
    return 0;
}

ASM:
Code:
        .text
        .file   "heapsort.cpp"
        .globl  _Z8siftDownPiii
        .align  16, 0x90
        .type   _Z8siftDownPiii,@function
_Z8siftDownPiii:                        # @_Z8siftDownPiii
        .cfi_startproc
# BB#0:                                 # kill: ESI<def> ESI<kill> RSI<def>
        lea     ecx, dword ptr [rsi + rsi + 1]
        cmp     ecx, edx
        jg      .LBB0_10
# BB#1:                                 # %.lr.ph
        movsxd  rax, esi
        mov     r8d, dword ptr [rdi + 4*rax]
        .align  16, 0x90
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        movsxd  r9, ecx
        mov     r10d, ecx
        cmp     r8d, dword ptr [rdi + 4*r9]
        jl      .LBB0_4
# BB#3:                                 # %select.mid
                                        # in Loop: Header=BB0_2 Depth=1
        mov     r10d, esi
.LBB0_4:                                # %select.end
                                        # in Loop: Header=BB0_2 Depth=1
        cmp     ecx, edx
        jge     .LBB0_8
# BB#5:                                 # in Loop: Header=BB0_2 Depth=1
        inc     ecx
        movsxd  rax, r10d
        mov     r9d, dword ptr [rdi + 4*rax]
        movsxd  rax, ecx
        cmp     r9d, dword ptr [rdi + 4*rax]
        jl      .LBB0_7
# BB#6:                                 # %select.mid5
                                        # in Loop: Header=BB0_2 Depth=1
        mov     ecx, r10d
.LBB0_7:                                # %select.end4
                                        # in Loop: Header=BB0_2 Depth=1
        mov     r10d, ecx
.LBB0_8:                                # in Loop: Header=BB0_2 Depth=1
        cmp     r10d, esi
        je      .LBB0_10
# BB#9:                                 # in Loop: Header=BB0_2 Depth=1
        movsxd  rax, esi
        movsxd  rcx, r10d
        mov     esi, dword ptr [rdi + 4*rcx]
        mov     dword ptr [rdi + 4*rax], esi
        mov     dword ptr [rdi + 4*rcx], r8d
        lea     ecx, dword ptr [r10 + r10 + 1]
        mov     esi, r10d
        cmp     ecx, edx
        jle     .LBB0_2
.LBB0_10:                               # %._crit_edge
        ret
.Ltmp0:
        .size   _Z8siftDownPiii, .Ltmp0-_Z8siftDownPiii
        .cfi_endproc

        .globl  _Z7heapifyPii
        .align  16, 0x90
        .type   _Z7heapifyPii,@function
_Z7heapifyPii:                          # @_Z7heapifyPii
        .cfi_startproc
# BB#0:                                 # kill: ESI<def> ESI<kill> RSI<def>
        test    esi, esi
        jle     .LBB1_13
# BB#1:                                 # %.lr.ph
        lea     eax, dword ptr [rsi - 2]
        shr     eax, 31
        lea     eax, dword ptr [rsi + rax - 2]
        sar     eax
        dec     esi
        movsxd  r8, eax
        .align  16, 0x90
.LBB1_2:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB1_4 Depth 2
        lea     eax, dword ptr [r8 + r8 + 1]
        cmp     eax, esi
        jg      .LBB1_12
# BB#3:                                 # %.lr.ph.i
                                        # in Loop: Header=BB1_2 Depth=1
        mov     r9d, dword ptr [rdi + 4*r8]
        mov     r11d, r8d
        .align  16, 0x90
.LBB1_4:                                # Parent Loop BB1_2 Depth=1
                                        # => This Inner Loop Header: Depth=2
        movsxd  r10, eax
        mov     ecx, eax
        cmp     r9d, dword ptr [rdi + 4*r10]
        jl      .LBB1_6
# BB#5:                                 # %select.mid
                                        # in Loop: Header=BB1_4 Depth=2
        mov     ecx, r11d
.LBB1_6:                                # %select.end
                                        # in Loop: Header=BB1_4 Depth=2
        cmp     eax, esi
        jge     .LBB1_10
# BB#7:                                 # in Loop: Header=BB1_4 Depth=2
        inc     eax
        movsxd  rdx, ecx
        mov     r10d, dword ptr [rdi + 4*rdx]
        movsxd  rdx, eax
        cmp     r10d, dword ptr [rdi + 4*rdx]
        jl      .LBB1_9
# BB#8:                                 # %select.mid3
                                        # in Loop: Header=BB1_4 Depth=2
        mov     eax, ecx
.LBB1_9:                                # %select.end2
                                        # in Loop: Header=BB1_4 Depth=2
        mov     ecx, eax
.LBB1_10:                               # in Loop: Header=BB1_4 Depth=2
        cmp     ecx, r11d
        je      .LBB1_12
# BB#11:                                # in Loop: Header=BB1_4 Depth=2
        movsxd  r10, r11d
        movsxd  rdx, ecx
        mov     eax, dword ptr [rdi + 4*rdx]
        mov     dword ptr [rdi + 4*r10], eax
        mov     dword ptr [rdi + 4*rdx], r9d
        lea     eax, dword ptr [rcx + rcx + 1]
        mov     r11d, ecx
        cmp     eax, esi
        jle     .LBB1_4
.LBB1_12:                               # %_Z8siftDownPiii.exit
                                        # in Loop: Header=BB1_2 Depth=1
        mov     rax, r8
        dec     rax
        test    r8d, r8d
        mov     r8, rax
        jg      .LBB1_2
.LBB1_13:                               # %._crit_edge
        ret
.Ltmp1:
        .size   _Z7heapifyPii, .Ltmp1-_Z7heapifyPii
        .cfi_endproc

        .globl  main
        .align  16, 0x90
        .type   main,@function
main:                                   # @main
        .cfi_startproc
# BB#0:
        push    rax
.Ltmp2:
        .cfi_def_cfa_offset 16
        mov     edi, data
        call    _Z8heapsortILm1000000EEvPi
        xor     eax, eax
        pop     rdx
        ret
.Ltmp3:
        .size   main, .Ltmp3-main
        .cfi_endproc

        .section        .text._Z8heapsortILm1000000EEvPi,"axG",@progbits,_Z8heapsortILm1000000EEvPi,comdat
        .weak   _Z8heapsortILm1000000EEvPi
        .align  16, 0x90
        .type   _Z8heapsortILm1000000EEvPi,@function
_Z8heapsortILm1000000EEvPi:             # @_Z8heapsortILm1000000EEvPi
        .cfi_startproc
# BB#0:
        mov     r8d, 499999
        mov     r9d, 999999
        .align  16, 0x90
.LBB3_1:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB3_3 Depth 2
        lea     ecx, dword ptr [r8 + r8 + 1]
        cmp     ecx, 999999
        jg      .LBB3_11
# BB#2:                                 # %.lr.ph.i.i
                                        # in Loop: Header=BB3_1 Depth=1
        mov     r10d, dword ptr [rdi + 4*r8]
        mov     esi, r8d
        .align  16, 0x90
.LBB3_3:                                # Parent Loop BB3_1 Depth=1
                                        # => This Inner Loop Header: Depth=2
        movsxd  rdx, ecx
        mov     eax, ecx
        cmp     r10d, dword ptr [rdi + 4*rdx]
        jl      .LBB3_5
# BB#4:                                 # %select.mid
                                        # in Loop: Header=BB3_3 Depth=2
        mov     eax, esi
.LBB3_5:                                # %select.end
                                        # in Loop: Header=BB3_3 Depth=2
        cmp     ecx, 999998
        jg      .LBB3_9
# BB#6:                                 # in Loop: Header=BB3_3 Depth=2
        inc     ecx
        movsxd  rdx, eax
        mov     r11d, dword ptr [rdi + 4*rdx]
        movsxd  rdx, ecx
        cmp     r11d, dword ptr [rdi + 4*rdx]
        jl      .LBB3_8
# BB#7:                                 # %select.mid4
                                        # in Loop: Header=BB3_3 Depth=2
        mov     ecx, eax
.LBB3_8:                                # %select.end3
                                        # in Loop: Header=BB3_3 Depth=2
        mov     eax, ecx
.LBB3_9:                                # in Loop: Header=BB3_3 Depth=2
        cmp     eax, esi
        je      .LBB3_11
# BB#10:                                # in Loop: Header=BB3_3 Depth=2
        movsxd  rcx, esi
        movsxd  rdx, eax
        mov     esi, dword ptr [rdi + 4*rdx]
        mov     dword ptr [rdi + 4*rcx], esi
        mov     dword ptr [rdi + 4*rdx], r10d
        lea     ecx, dword ptr [rax + rax + 1]
        mov     esi, eax
        cmp     ecx, 1000000
        jl      .LBB3_3
.LBB3_11:                               # %_Z8siftDownPiii.exit.i
                                        # in Loop: Header=BB3_1 Depth=1
        mov     rax, r8
        dec     rax
        test    r8d, r8d
        mov     r8, rax
        jg      .LBB3_1
        .align  16, 0x90
.LBB3_12:                               # %_Z7heapifyPii.exit.preheader
                                        # =>This Loop Header: Depth=1
                                        #     Child Loop BB3_14 Depth 2
        mov     r8d, dword ptr [rdi + 4*r9]
        mov     eax, dword ptr [rdi]
        mov     dword ptr [rdi + 4*r9], eax
        mov     dword ptr [rdi], r8d
        mov     eax, r9d
        lea     r9, qword ptr [r9 - 1]
        cmp     eax, 1
        jle     .LBB3_22
# BB#13:                                # in Loop: Header=BB3_12 Depth=1
        mov     esi, 1
        xor     edx, edx
        .align  16, 0x90
.LBB3_14:                               # %.lr.ph.i
                                        # Parent Loop BB3_12 Depth=1
                                        # => This Inner Loop Header: Depth=2
        movsxd  rcx, esi
        mov     eax, esi
        cmp     r8d, dword ptr [rdi + 4*rcx]
        jl      .LBB3_16
# BB#15:                                # %select.mid9
                                        # in Loop: Header=BB3_14 Depth=2
        mov     eax, edx
.LBB3_16:                               # %select.end8
                                        # in Loop: Header=BB3_14 Depth=2
        cmp     esi, r9d
        jge     .LBB3_20
# BB#17:                                # in Loop: Header=BB3_14 Depth=2
        inc     esi
        movsxd  rcx, eax
        mov     r10d, dword ptr [rdi + 4*rcx]
        movsxd  rcx, esi
        cmp     r10d, dword ptr [rdi + 4*rcx]
        jl      .LBB3_19
# BB#18:                                # %select.mid11
                                        # in Loop: Header=BB3_14 Depth=2
        mov     esi, eax
.LBB3_19:                               # %select.end10
                                        # in Loop: Header=BB3_14 Depth=2
        mov     eax, esi
.LBB3_20:                               # in Loop: Header=BB3_14 Depth=2
        cmp     eax, edx
        je      .LBB3_22
# BB#21:                                # in Loop: Header=BB3_14 Depth=2
        movsxd  rcx, edx
        movsxd  rdx, eax
        mov     esi, dword ptr [rdi + 4*rdx]
        mov     dword ptr [rdi + 4*rcx], esi
        mov     dword ptr [rdi + 4*rdx], r8d
        lea     esi, dword ptr [rax + rax + 1]
        mov     edx, eax
        cmp     esi, r9d
        jle     .LBB3_14
.LBB3_22:                               # %_Z7heapifyPii.exit.backedge
                                        # in Loop: Header=BB3_12 Depth=1
        test    r9d, r9d
        jg      .LBB3_12
# BB#23:
        ret
.Ltmp4:
        .size   _Z8heapsortILm1000000EEvPi, .Ltmp4-_Z8heapsortILm1000000EEvPi
        .cfi_endproc

        .section        .text.startup,"ax",@progbits
        .align  16, 0x90
        .type   _GLOBAL__sub_I_heapsort.cpp,@function
_GLOBAL__sub_I_heapsort.cpp:            # @_GLOBAL__sub_I_heapsort.cpp
        .cfi_startproc
# BB#0:
        push    rax
.Ltmp5:
        .cfi_def_cfa_offset 16
        mov     edi, _ZStL8__ioinit
        call    _ZNSt8ios_base4InitC1Ev
        mov     edi, _ZNSt8ios_base4InitD1Ev
        mov     esi, _ZStL8__ioinit
        mov     edx, __dso_handle
        pop     rax
        jmp     __cxa_atexit            # TAILCALL
.Ltmp6:
        .size   _GLOBAL__sub_I_heapsort.cpp, .Ltmp6-_GLOBAL__sub_I_heapsort.cpp
        .cfi_endproc

        .type   _ZStL8__ioinit,@object  # @_ZStL8__ioinit
        .local  _ZStL8__ioinit
        .comm   _ZStL8__ioinit,1,1
        .type   data,@object            # @data
        .data
        .globl  data
        .align  16
data:                                   ; constants
HaHaAnonymous 16 Mar 2015, 21:15
Quote:

I am interested. D:
Tyler 17 Mar 2015, 14:36
It was too big to upload. Here's a link: http://thardin.name/data.inc.
HaHaAnonymous 17 Mar 2015, 16:09
Tyler wrote: It was too big to upload. Here's a link: http://thardin.name/data.inc

Quote:
error: illegal instruction.

Renaming "data" to "data0" worked... If it worked for you, then there's a bug somewhere (in FASM). D:

A few tips: in binary the result was just 4 MB, and compressed it is 3.1 MB (or 3.6 MB for compressed plain text). That could save tens of minutes for dial-up users. D:

Thanks for the input data! :D I apologize for any inconvenience, if any.
l_inc 17 Mar 2015, 23:57
Tyler
Quote:
I wrote (aka translated from wikipedia) the algo for heapsort in C++

After you mentioned how cool generic programming is, I must say I have rarely seen such a perverted way of applying it. Aside from making the sorting even less generic than it ever appeared (meaning it's only usable for arrays whose size is known at compile time), you'll end up with a replicated copy of the binary code for each individual array you sort, provided they have different lengths.

As a side note: by cleverly inlining the helper functions (heapify and siftDown) you could achieve about a 20% speedup on average.

_________________
Faith is a superposition of knowledge and fallacy
Tyler 18 Mar 2015, 04:38
l_inc wrote: Tyler

Now that I look at it, the asm for the heapsort instantiation doesn't contain any calls to the helper functions, so I guess it did do what I was hoping. (Though I didn't check whether it propagated the size to them too, nor whether it actually used the known size to do any useful loop transformations/unrolling. That would take too much effort.) Is it dirty, premature optimization? Yes. But I had an intention in doing it. One could change it to:

Code:
void heapsort(int* a, size_t count){
    heapify(a, count);
    int end = count - 1;
    while(end > 0){
        swap(a[end], a[0]);
        end--;
        siftDown(a, 0, end);
    }
}

template<size_t N>
void heapsort(int data[N]){
    heapsort(data, N);
}

But now that I think about it, I suppose it was a waste of time anyway. At O3, the compiler probably would have inlined the whole heapsort function into main and done all the same constant propagation anyway.
revolution 18 Mar 2015, 06:30
Tyler wrote: ... hoping maybe the compiler would ... I would guess that it did ... I guess it did do ... and probably still get the same ... My guess is that the ... At O3, the compiler probably would have ...
Tyler 18 Mar 2015, 10:48
revolution wrote:
The infected compiler thing is interesting, but I already have my fair share of tin foil hat theories, so I think I'll leave that one for you. I'll trust Apple and GCC to do reasonable code review, just like I trust every other dev team of software I use. It's a calculated risk, of course, but I think I'll take my chances instead of writing my own OS in assembly. Though I don't think it would be that hard to check for, if you assume that the malicious stub calls at least one syscall, since those are traceable.
HaHaAnonymous 19 Mar 2015, 00:31
Quote:

I guess I found out why... D: At least in my case, as described below.

I noticed that when I am coding programs in a "high-level language" I usually write tens to thousands of lines without testing... The result!? A heap of bugs. D: When I code in assembly, I code in "checkpoints", and the code is highly tested at every checkpoint to ensure everything is correct. Sometimes bugs emerge, of course, but when they do, they were (most of the time) not considered in the "checkpoint" test. D:

And I disagree with the following statement (found on the web):

Quote:

o_O' I know from experience this is not a myth at all. It really takes much longer... unless you are using macros. But then that's not "pure" assembly anymore, as you are using macros to write very often repetitive code and, sometimes, complete "statements" that resemble a sophisticated high-level language more than anything else. D:

Again, sorry for any inconvenience (if any).
revolution 19 Mar 2015, 00:44
Tyler wrote: ... rework the HLL into a form the compiler can work with.

Writing-HLL-code-to-match-the-compiler-in-use strikes me as being very similar to writing-assembly-code-to-match-the-CPU-in-use. The concept is the same; only the finer details and execution are changed.
revolution 19 Mar 2015, 01:31
HaHaAnonymous wrote: And I disagree with the following statement (found on the web):
l_inc 19 Mar 2015, 02:26
Tyler

Quote:
hoping maybe the compiler would do some constant propagation

Well, you know this is normally not what one would want from a sorting algorithm, because the array size is in most cases retrieved dynamically. You are obviously not prohibited from doing useless things on purpose, but that doesn't automatically make those things more useful.

Quote:
Now that I look at it, the asm for the heapsort instantiation doesn't contain any calls to the helper functions, so I guess it did do what I was hoping.

I must admit I did the measurements the first time using x86 and VS2010. To make a clean experiment I allowed myself to spend some time configuring clang, and the results at O3 are as follows (for 0x800000 elements):

Code:
HeapSortBad: 2006029 us
HeapSort:    1848703 us

The second result is my inlined implementation. I passed pointers to HeapSortBad and HeapSort to a measuring function with a noinline attribute, so that there is guaranteed to be no inlining into main. I left the array-size propagation in for your implementation only. So you can see the keyword really was "cleverly".

Quote:
My guess is that the constant prop, if it leads to inlining and loop unrolling, will more than cancel out the larger code size (considering the whole program could fit in L2 anyway).

What do you mean by "canceling out"? Inlining and loop unrolling, which is luckily not done here, would make it even worse. Increased program size, whether it fits into L2 or not, correlates significantly with performance degradation because of the increased CPU resource footprint. Even if we forget about L1 (though it matters quite a lot), for such long-running code there will definitely be context switches resulting in cache-line evictions: the larger the cache footprint, the more cache misses will follow, for both scheduling out and scheduling back in. Additionally, other resources are wasted, such as TLB and branch-predictor entries: the larger your code, the more CPU resources you waste, and the more significant the performance degradation you'll experience. For that reason loop unrolling and inlining increase performance to a certain extent when done carefully, but careless code replication is a sign of sloppy coding.

_________________
Faith is a superposition of knowledge and fallacy
HaHaAnonymous 19 Mar 2015, 21:49
Considering compilers are more efficient at code generation than humans, here is a question:

Just because compilers are more efficient at code generation, it does not mean it is easy, or that everyone writes more efficient code using a compiler than assembly language. As an example... me. I recently came to the conclusion that I fail to write high-level code that is "faster" than my assembly-coded variant. And unfortunately, it is not because I am an amazing assembly "coder"; it is because I am a (much) worse high-level coder. D:

Given that fact, is it better to write my programs in assembly or in a high-level language!? I guess I will leave my HLL-coded programs for secondary use (i.e. when not running on x86+Linux)... D:

Defense against possible "bullshits":
1. But it is the whole program performance that matters. Answer: No, it is not. It is the algorithm that finishes first that opens the door faster, and having a slower one that will open it "smoother" will not help much! D: (Note: the comparison above may be a bit misleading depending on how you interpret it.)

That's my thought of the day! I apologize for any inconveniences, if any.
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.