flat assembler
Message board for the users of flat assembler.

Java faster than ASM!?

itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
What's going on here? I tried a simple speed test for fun to see how much faster ASM on Windows is than Java on Windows. What I found was that Java was faster, a lot faster... how is this possible? Maybe it's Windows Vista? I used a simple counting test.

Anyway here's the code I used in FASM:
Code:
include 'win32ax.inc'

.data
    start_time dd 0         ; tick count when the loop starts
    count dd 0              ; loop counter, kept in memory
    _output rb 32           ; buffer for the formatted result
.code

start:
            invoke MessageBox,NULL,"Click Ok to Start","Speed Test",MB_OK
            invoke GetTickCount
            mov [start_time],eax
            place:
            inc [count]                 ; read-modify-write on memory every iteration
            cmp [count],1000000000
            jl place
            invoke GetTickCount
            sub eax,[start_time]        ; elapsed milliseconds
            cinvoke wsprintf,_output,"%d milliseconds",eax   ; wsprintf is cdecl, so use cinvoke
            invoke MessageBox,NULL,_output,"Speed Test Finished",MB_OK
            invoke ExitProcess,0
 .end start
    


And here's the code in Java:
Code:
public class SpeedTest {

    public static void main(String[] args) {

        double start = 0, end = 0, time = 0;
        System.out.print("Starting Speed Test...");
        start = System.currentTimeMillis();
        int count = 0;
        while (count < 1000000000) {
            count++;
        }
        end = System.currentTimeMillis();
        System.out.println("..Done");
        time = (end - start);
        System.out.println(time + " milliseconds");

    }

}
    


The results are on average 980 milliseconds for Java and 5600 milliseconds for ASM on my Vista Notebook

However, when I tried the same test on my slower XP desktop (512 MB of RAM vs. 2 GB for the Vista laptop), I got 2000 milliseconds for Java and 2800 milliseconds for ASM...

But, what's going on here? How is Java faster?
Post 01 Feb 2008, 23:25
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
I expect Java is using a JIT compiler, which would keep the loop counter in a register. Try rewriting the ASM without the memory accesses; just use registers.
Post 01 Feb 2008, 23:32
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
And are you sure that the while loop is actually executed? Today's compilers are a little more intelligent than before and perhaps this code:
Code:
                int count = 0; 
                while (count < 1000000000) { 
                        count++; 
                } 
    

Got compiled as:
Code:
int count = 1000000000;
Since it produces the same result and is much faster.

Try this Java code instead and let's see if there is any difference
Code:
public class SpeedTest { 

        public static void main(String[] args) { 
                 
                double start=0,end=0,time=0; 
                System.out.print("Starting Speed Test..."); 
                start = System.currentTimeMillis(); 
                // I'm not a Java programmer, so if you have better ideas for making sure the compiler will not treat count's initializer as a constant, use them
                int count = (int) (start - System.currentTimeMillis()); // cast needed: double minus long yields a double
                while (count < 1000000000) { 
                        count++; 
                } 
                end = System.currentTimeMillis(); 
                System.out.println("..Done"); 
                time = (end-start); 
                System.out.println(time+" milliseconds"); 
             
        } 

}    

Still, perhaps it manages to rewrite the while loop by placing just "count = 1000000000" if Java supports modular arithmetic (i.e. does not throw an overflow exception but just wraps around).
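
On the wrap-around question: Java's int arithmetic does wrap silently on overflow. The sketch below illustrates that, with the class and method names made up for the example. Math.addExact is the checked alternative, but it only exists from Java 8 on, so it is newer than the JDK 1.6 used in this thread.

```java
// Minimal sketch: Java int arithmetic wraps around silently (two's complement).
// WrapDemo and its method names are hypothetical, chosen for this illustration.
class WrapDemo {
    // Plain addition: an overflow wraps around and no exception is thrown.
    static int wrappingIncrement(int value) {
        return value + 1;
    }

    // Math.addExact (Java 8+) throws ArithmeticException on overflow instead.
    static boolean incrementOverflows(int value) {
        try {
            Math.addExact(value, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Both lines print "true".
        System.out.println(wrappingIncrement(Integer.MAX_VALUE) == Integer.MIN_VALUE);
        System.out.println(incrementOverflows(Integer.MAX_VALUE));
    }
}
```

Because plain int addition never throws, a compiler is in principle free to replace a side-effect-free counting loop with its final value.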

I think that the JDK comes with a Java disassembler; try using it to make sure that the while loop is actually present in the bytecode file.
Post 01 Feb 2008, 23:37
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
LocoDelAssembly: I doubt that even java will take 2000ms just to read the timer twice and set a counter to a fixed value once!
Post 01 Feb 2008, 23:51
OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY
Try doing some math inside the loop like:
(PSEUDO CODE AS I DON'T KNOW JAVA)
x = get_random_number();
y = get_random_number();
z = ((x*x)+(y*x*4))/2;

This will make sure it's not a constant. I suspect LocoDelAssembly is right about the intelligence of today's compilers.
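
The pseudocode above could look roughly like this in Java. This is only a sketch: the class name, the seed handling, and the checksum are assumptions added for illustration, with java.util.Random standing in for get_random_number(). Folding every z into a checksum that gets printed gives the loop an observable result, so it cannot simply be dropped as dead code.

```java
import java.util.Random;

// Hypothetical Java rendering of the pseudocode; names are illustrative only.
class RandomMathLoop {
    // Runs `iterations` rounds of z = ((x*x) + (y*x*4)) / 2 on pseudo-random
    // inputs and adds every z to a checksum so the work stays observable.
    static long run(long seed, int iterations) {
        Random random = new Random(seed);
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            int x = random.nextInt(1000); // inputs kept below 1000 so int math cannot overflow
            int y = random.nextInt(1000);
            int z = ((x * x) + (y * x * 4)) / 2;
            checksum += z;
        }
        return checksum;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long checksum = run(start, 10000000); // seed taken from the clock, unknown at compile time
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("checksum " + checksum + ", " + elapsed + " milliseconds");
    }
}
```

Note that this mostly measures the random number generator rather than the loop itself, so it answers a different question than the original counting test.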
Post 02 Feb 2008, 00:13
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
I was focused on the Vista case actually :P But yes, I think that not using a register in the assembly code is the real problem. However, what I said about compilers still holds: I have had problems in the past due to aggressive optimizations on code much less trivial than the while loop above. Today, if you need the compiler to produce general-case code for a real-life environment, you have to provide such an environment or mimic it as closely as you can. If you don't provide inputs that are unpredictable at compile time, you risk the compiler taking advantage of the fixed behavior, and you end up testing the speed of code that is not what will actually run in real life.
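
One way to follow that advice in the Java test is sketched below. The class name, the volatile sink field, and the warm-up counts are all assumptions made for this example, not anything from the original posts. The loop limit comes from the command line so it is unknown at compile time, the result is stored where it must be observed, and a few warm-up runs give the JIT a chance to compile the loop before the timed run. A JIT may still reduce such a loop to a closed form, so treat the numbers with care.

```java
// Hypothetical timing harness; all names here are illustrative.
class LoopTimer {
    // Written by every timed run so the loop result is observably used.
    static volatile int sink;

    // Counts from 0 up to limit and returns the final count.
    static int countTo(int limit) {
        int count = 0;
        while (count < limit) {
            count++;
        }
        return count;
    }

    // Times a single countTo call and returns the elapsed nanoseconds.
    static long timeNanos(int limit) {
        long start = System.nanoTime();
        sink = countTo(limit);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Limit supplied at run time; falls back to the thread's value.
        int limit = args.length > 0 ? Integer.parseInt(args[0]) : 1000000000;
        // Warm-up runs so the JIT has likely compiled countTo before timing.
        for (int i = 0; i < 20; i++) {
            timeNanos(1000000);
        }
        long nanos = timeNanos(limit);
        System.out.println(sink + " iterations in " + (nanos / 1000000) + " milliseconds");
    }
}
```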
Post 02 Feb 2008, 00:15
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
(I've just downloaded JDK 1.6)

Code:
C:\Archivos de programa\Java\jdk1.6.0_04\bin>javap -c SpeedTest
Compiled from "SpeedTest.java"
public class SpeedTest extends java.lang.Object{
public SpeedTest();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return

public static void main(java.lang.String[]);
  Code:
   0:   dconst_0
   1:   dstore_1
   2:   dconst_0
   3:   dstore_3
   4:   dconst_0
   5:   dstore  5
   7:   getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   10:  ldc     #3; //String Starting Speed Test...
   12:  invokevirtual   #4; //Method java/io/PrintStream.print:(Ljava/lang/String;)V
   15:  invokestatic    #5; //Method java/lang/System.currentTimeMillis:()J
   18:  l2d
   19:  dstore_1
   20:  iconst_0
   21:  istore  7
   23:  iload   7
   25:  ldc     #6; //int 1000000000
   27:  if_icmpge       36
   30:  iinc    7, 1
   33:  goto    23
   36:  invokestatic    #5; //Method java/lang/System.currentTimeMillis:()J
   39:  l2d
   40:  dstore_3
   41:  getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   44:  ldc     #7; //String ..Done
   46:  invokevirtual   #8; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   49:  dload_3
   50:  dload_1
   51:  dsub
   52:  dstore  5
   54:  getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   57:  new     #9; //class java/lang/StringBuilder
   60:  dup
   61:  invokespecial   #10; //Method java/lang/StringBuilder."<init>":()V
   64:  dload   5
   66:  invokevirtual   #11; //Method java/lang/StringBuilder.append:(D)Ljava/lang/StringBuilder;
   69:  ldc     #12; //String  milliseconds
   71:  invokevirtual   #13; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   74:  invokevirtual   #14; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   77:  invokevirtual   #8; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   80:  return

}    


OK, the loop is there :P
Post 02 Feb 2008, 00:29
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
I tried it using registers, replacing [count] with edx. It seems that memory access was causing the huge slowdown.

This time I got 748-800 milliseconds for ASM, but still around 1000 milliseconds for Java. Then I changed count to long instead of int in the Java code and it shot up to around 3000 milliseconds.

Registers are a lot faster, in fact nearly 7 times faster, but Java is surprisingly fast. It just goes to show that bad code in a fast language may be slower.
Post 02 Feb 2008, 00:42
OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY
Using this code:
Code:
include 'win32ax.inc' 

.data 
    start_time dd 0 
    count dd 00 
    _output rb 20 
.code 

start: 
            invoke MessageBox,NULL,"Click Ok to Start","Speed Test",MB_OK 
            invoke GetTickCount 
            mov [start_time],eax 
            place: 
            inc [count] 
            cmp [count],1000000000 
            jl place 
            invoke GetTickCount 
            sub eax,[start_time] 
            cinvoke wsprintf,_output,"%d milliseconds",eax 
            invoke MessageBox,NULL,_output,"Speed Test Finished",MB_OK 
            invoke ExitProcess,0 
 .end start
    


I get 1900 milliseconds on my ultra-slow CPU! :D
When I optimize it to use a register like this:
Code:
include 'win32ax.inc'

.data 
    start_time dd 0
    _output rb 20 
.code 

start: 
            invoke MessageBox,NULL,"Click Ok to Start","Speed Test",MB_OK 
            invoke GetTickCount 
            mov [start_time],eax
            xor ecx,ecx
            place: 
            inc ecx
            cmp ecx,1000000000
            jl place 
            invoke GetTickCount 
            sub eax,[start_time] 
            cinvoke wsprintf,_output,"%d milliseconds",eax 
            invoke MessageBox,NULL,_output,"Speed Test Finished",MB_OK 
            invoke ExitProcess,0 
 .end start 
    


I get just 640 milliseconds! Yes, memory access is slow! One of the optimizations compilers make is to use registers when possible! :D
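
A rough Java analogue of the memory-versus-register comparison is sketched here. It rests on the assumption that a volatile field is a fair stand-in for the [count] memory operand: every read and write of a volatile field has to be performed against memory, while a plain local variable can be kept in a register by the JIT. The class and method names are invented for the example, and the timings will vary by machine and JVM.

```java
// Hypothetical comparison; names and the iteration count are illustrative.
class FieldVsLocal {
    // volatile: each access must actually be performed, not cached in a register.
    static volatile int sharedCount;

    // Counter lives in a volatile field, similar in spirit to inc [count].
    static int countWithVolatileField(int limit) {
        sharedCount = 0;
        while (sharedCount < limit) {
            sharedCount++; // not atomic, but this sketch is single-threaded
        }
        return sharedCount;
    }

    // Counter lives in a local variable, similar in spirit to inc ecx.
    static int countWithLocal(int limit) {
        int count = 0;
        while (count < limit) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int limit = 100000000;
        long t0 = System.currentTimeMillis();
        int viaField = countWithVolatileField(limit);
        long t1 = System.currentTimeMillis();
        int viaLocal = countWithLocal(limit);
        long t2 = System.currentTimeMillis();
        System.out.println("volatile field: " + viaField + " in " + (t1 - t0) + " milliseconds");
        System.out.println("local variable: " + viaLocal + " in " + (t2 - t1) + " milliseconds");
    }
}
```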
Post 02 Feb 2008, 02:10
asmhack



Joined: 01 Feb 2008
Posts: 431
asmhack
itsnobody wrote:

Code:
            place:
            inc [count]
            cmp [count],1000000000
            jl place
    



what is the point of using assembly if not optimizing and using its benefits ?
code in java then...
use the below code:
Code:
mov eax,1000000000
place:
lea eax,[eax-$1]
test eax,eax
jnz place
    
Post 02 Feb 2008, 02:25
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17278
Location: In your JS exploiting you and your system
revolution
itsnobody wrote:
I changed count to long instead of int and it shot up to around 3000 milliseconds
Not surprising; in Java, 'long' is 64-bit.
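
The widths can be checked directly, as in the sketch below (class and method names are made up for the example). Assuming a 32-bit JVM, which was common in 2008, a 64-bit counter does not fit in a single general-purpose register, so each increment and compare costs extra instructions. That is one plausible explanation for the slowdown, not a measured one.

```java
// Hypothetical demo of int versus long counters; names are illustrative.
class IntVsLong {
    // Same counting loop with a 32-bit counter.
    static int countInt(int limit) {
        int count = 0;
        while (count < limit) {
            count++;
        }
        return count;
    }

    // Same counting loop with a 64-bit counter.
    static long countLong(long limit) {
        long count = 0;
        while (count < limit) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("int is " + Integer.SIZE + " bits, long is " + Long.SIZE + " bits");
        long t0 = System.currentTimeMillis();
        int a = countInt(1000000000);
        long t1 = System.currentTimeMillis();
        long b = countLong(1000000000L);
        long t2 = System.currentTimeMillis();
        System.out.println("int:  " + a + " in " + (t1 - t0) + " milliseconds");
        System.out.println("long: " + b + " in " + (t2 - t1) + " milliseconds");
    }
}
```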
Post 02 Feb 2008, 05:03
bitRAKE



Joined: 21 Jul 2003
Posts: 2915
Location: [RSP+8*5]
bitRAKE
mov [count],1000000000

Wow, several orders of magnitude faster! :D
Post 02 Feb 2008, 06:28
asmfan



Joined: 11 Aug 2006
Posts: 392
Location: Russian
asmfan
Use this one
Code:
mov eax,1000000000
place:
sub eax,1
jnz place
    

Conclusion: Badly written assembly code is the worst thing; it is often much worse than compiler-generated code.

_________________
Any offers?
Post 02 Feb 2008, 06:37
madmatt



Joined: 07 Oct 2003
Posts: 1045
Location: Michigan, USA
madmatt
And for the C code, you can do this:
Code:
int count = 1000000000;
while(count-- > 0);
Post 02 Feb 2008, 08:30
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
The fastest I've gotten trying all the suggestions is 702 milliseconds, around 300 milliseconds faster than Java

Memory access must be slower on Vista, because the memory-based version still takes over 5 seconds
Post 02 Feb 2008, 11:55
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
Re: register vs. memory, it's (supposedly) not even true anymore that registers are faster than memory (according to something I read on Intel's site). Also, you didn't tell us what specific cpu you have. BTW, one thing modern compilers usually do correctly is alignment, so beware.
Post 02 Feb 2008, 17:09
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid
Note that the Java JIT compiler may decide to unroll the loop.
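
For anyone unfamiliar with the term, the sketch below shows by hand what unrolling means: several increments per loop test instead of one, with a short remainder loop at the end. This is only an illustration with made-up names; whether and how a given JIT unrolls this particular loop is not something the thread establishes.

```java
// Hypothetical hand-unrolled loop; names are illustrative only.
class UnrollDemo {
    // Straightforward loop: one increment per comparison.
    static int countSimple(int limit) {
        int count = 0;
        while (count < limit) {
            count++;
        }
        return count;
    }

    // Unrolled by four: four increments per comparison, remainder handled after.
    static int countUnrolled(int limit) {
        int count = 0;
        int unrolledEnd = limit - (limit % 4); // largest multiple of 4 not above limit (for limit >= 0)
        while (count < unrolledEnd) {
            count++;
            count++;
            count++;
            count++;
        }
        while (count < limit) { // at most three leftover iterations
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSimple(1000000007) == countUnrolled(1000000007));
    }
}
```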
Post 02 Feb 2008, 17:21
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
rugxulo wrote:
Re: register vs. memory, it's (supposedly) not even true anymore that registers are faster than memory (according to something I read on Intel's site). Also, you didn't tell us what specific cpu you have. BTW, one thing modern compilers usually do correctly is alignment, so beware.


Intel is lying or using specific tests which show memory being just as fast....

Because it's still 700 milliseconds (registers) vs. 5000 milliseconds (memory)...more than 7 times slower (the only change is register vs. memory). This is shown consistently over and over again by the tests

I am on an Intel® Pentium® Dual-Core Mobile Processor T2310
Post 02 Feb 2008, 19:19
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo
itsnobody wrote:
rugxulo wrote:
Re: register vs. memory, it's (supposedly) not even true anymore that registers are faster than memory (according to something I read on Intel's site). Also, you didn't tell us what specific cpu you have. BTW, one thing modern compilers usually do correctly is alignment, so beware.


Intel is lying or using specific tests which show memory being just as fast....


I may have misunderstood what they said, but here's the excerpt:

Quote:

Avoid thinking you know what the performance issues are. Processors today are so complex that performance snags can occur in places that even experienced developers would never consider.

Beyond the places already discussed, there are still many pitfalls. Consider, for example, efforts to recode a C function in assembly language. One temptation might be to make more extensive use of the enlarged register offered by the Intel® NetBurst® microarchitecture. This use of registers must be done with extreme care, however – it is no longer true that keeping many items in registers automatically delivers better performance.

Processor performance can be adversely affected by excess register allocation; using too many registers makes it difficult for the processor to move data around for optimal execution sequencing. This situation, known as register pressure, is nearly impossible to detect, except by the intractable diminished execution speed. (For this reason, most C/C++ compilers today ignore the 'register' keyword, which at one time was a command to compilers to place specific variables inside registers.) The Intel® VTune™ Performance Analyzer is one of the few tools that can diagnose register pressure. [man, they can't ever write anything without hype for that blasted VTune, *sigh*, oh well]

The prevalence and impact of cache-stripe errors, excessively tight loops, register pressure, and many similar performance snags suggest that automated hot-spot location is vital to the developer's enterprise. The VTune Analyzer or other good profiling tools are clearly the best means of locating and fixing hot spots, and for quantifying the improvements. Developers who rely only on their experience are likely to spend fruitless hours tweaking the wrong code.
Post 02 Feb 2008, 19:33
itsnobody



Joined: 01 Feb 2008
Posts: 93
Location: Silver Spring, MD
itsnobody
Right.

But still on everyone's computer registers are faster in this case...have you ever tried the speed test yourself?

I know Java has some type of advanced memory management system; that could also be a reason it's faster. Of course, Intel is right that there are a variety of factors involved.
Post 02 Feb 2008, 19:38
Copyright © 1999-2020, Tomasz Grysztar.
