flat assembler
Message board for the users of flat assembler.

Where is the load barrier for the volatile statement?

salilsurendran



Joined: 02 Jan 2015
Posts: 4
salilsurendran 05 Jan 2015, 15:17
I wrote this simple Java program:
Code:
package com.salil.threads;

public class IncrementClass {

    static volatile int j = 0;
    static int i = 0;

    public static void main(String args[]) {

        for(int a=0;a<1000000;a++);
        i++;
        j++;            
    }       
}
    

This generates the following disassembled code for i++ and j++ (the remaining disassembly is omitted):
Code:
  0x0000000002961a6c: 49ba98e8d0d507000000 mov       r10,7d5d0e898h
                                                ;   {oop(a 'java/lang/Class' = 'com/salil/threads/IncrementClass')}
  0x0000000002961a76: 41ff4274            inc       dword ptr [r10+74h]
                                                ;*if_icmpge
                                                ; - com.salil.threads.IncrementClass::main@5 (line 10)
  0x0000000002961a7a: 458b5a70            mov       r11d,dword ptr [r10+70h]
  0x0000000002961a7e: 41ffc3              inc       r11d
  0x0000000002961a81: 45895a70            mov       dword ptr [r10+70h],r11d
  0x0000000002961a85: f083042400          lock add  dword ptr [rsp],0h
                                                ;*putstatic j
                                                ; - com.salil.threads.IncrementClass::main@27 (line 14)
    

This is what I understand about the above assembly code:

mov r10,7d5d0e898h : moves the pointer to IncrementClass.class into register r10
inc dword ptr [r10+74h] : increments the 4-byte value at address [r10 + 74h] (i.e. i)
mov r11d,dword ptr [r10+70h] : moves the 4-byte value at address [r10 + 70h] into register r11d (i.e. moves the value of j into r11d)
inc r11d : increments r11d
mov dword ptr [r10+70h],r11d : writes the value of r11d to [r10 + 70h] so that it is visible to other threads
lock add dword ptr [rsp],0h : locks the memory address held in the stack pointer rsp and adds 0 to it

The JMM states that before each volatile read there must be a load memory barrier and after every volatile write there must be a store barrier. My questions are:

Why isn't there a load barrier before the read of j into r11d?
How does the lock add on rsp ensure that the value of j in r11d is propagated back to main memory? All I read from the Intel specs is that lock gives the CPU an exclusive lock on the specified memory address for the duration of the operation.
Why is i++ not atomic or thread safe? The instruction incrementing i, "inc dword ptr [r10+74h]", should write directly to memory, and every other thread should be able to see this value.

From what I understand, when the CPU writes to memory as above, the value is held in a cache line and doesn't go all the way to memory, so an explicit instruction is needed to write it to memory. I believe that is the LOCK statement, but how does a LOCK on the stack pointer ensure that the value in the cache gets written to memory?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20300
Location: In your JS exploiting you and your system
revolution 05 Jan 2015, 15:24
Volatile means that the value must be in memory, i.e. not in a register. Volatile provides no guarantee of thread safety. If you need thread safety, use a mutex, a critical section or a lock.
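To illustrate the thread-safety point, here is a minimal Java sketch using a hypothetical class `VolatileCounterDemo` (not from the thread). The assumption is that `volatileCount++` behaves like the `mov`/`inc`/`mov` sequence shown for `j`: a separate load, increment and store, so concurrent increments can be lost. `AtomicInteger.incrementAndGet()` does the whole read-modify-write atomically. Note that in Java, unlike C, volatile also carries visibility and ordering guarantees under the JMM, but it still does not make `++` atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

class VolatileCounterDemo {
    // Visible to all threads, but ++ is still load / increment / store.
    static volatile int volatileCount = 0;
    // Atomic read-modify-write counter for comparison.
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Starts `threads` workers; each increments both counters `perThread` times.
    // Returns { final volatile counter, final atomic counter }.
    static int[] run(int threads, int perThread) throws InterruptedException {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    volatileCount++;               // not atomic: updates can be lost
                    atomicCount.incrementAndGet(); // atomic: no update is lost
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(4, 100_000);
        System.out.println("volatile++ total: " + r[0] + " (may be below 400000)");
        System.out.println("atomic total:     " + r[1]);
    }
}
```

The volatile total varies from run to run and can never exceed the expected total; the atomic total always matches it. A `synchronized` block around the increment would be another way to get the same guarantee.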
revolution 05 Jan 2015, 15:26
About the cache thing: the cache coherency protocol ensures that all other cores will see the new value.
revolution 05 Jan 2015, 15:27
And memory barriers are designed more for transactions than for thread safety.
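On the `lock add dword ptr [rsp],0h` question, a hedged note: on x86 a LOCK-prefixed instruction also acts as a full memory fence, and HotSpot is commonly described as emitting this dummy add after a volatile store as its StoreLoad barrier. x86 does not reorder loads with other loads, which is the usual explanation for why no instruction appears before the volatile read. The sketch below uses a hypothetical class `StoreLoadDemo` (not from the thread) to show what that barrier buys at the Java level. With volatile `x` and `y`, the JMM forbids both tasks reading 0; with plain fields that outcome would be allowed, because each store could still sit in a store buffer when the other load executes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StoreLoadDemo {
    static volatile int x, y;

    // Runs the store-then-load test `rounds` times and returns how many
    // rounds ended with BOTH tasks reading 0 (forbidden for volatile fields).
    static int countBothZero(int rounds) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Integer> taskA = () -> { x = 1; return y; }; // store x, then load y
        Callable<Integer> taskB = () -> { y = 1; return x; }; // store y, then load x
        int bothZero = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                x = 0;
                y = 0;
                Future<Integer> a = pool.submit(taskA);
                Future<Integer> b = pool.submit(taskB);
                if (a.get() == 0 && b.get() == 0) {
                    bothZero++;
                }
            }
        } finally {
            pool.shutdown();
        }
        return bothZero;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rounds with both reads 0: " + countBothZero(10_000));
    }
}
```

The barrier is about ordering this thread's own store before its later loads, not about pushing a cache line out to RAM.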
salilsurendran 05 Jan 2015, 15:27
Yes, I understand that volatile is not thread safe or atomic. I am asking how the above assembly code bypasses the cache and ensures that the volatile variable is written to memory. And why, then, is the value of i not seen by other threads?
salilsurendran 05 Jan 2015, 15:32
If the cache coherence policy makes the value of j visible to other threads, will the value of i, which is incremented by the statement "inc dword ptr [r10+74h]", also be seen by other threads? Also, what is the LOCK statement doing? What is its role in making sure that the volatile write gets propagated to memory?
revolution 05 Jan 2015, 15:50
Volatile just means to always read the value from memory and always write a new value to memory. That is, don't keep it in a register. Originally it was used for I/O-type memory that needs to see all changes (a comms data port, for example).
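A small Java sketch of the "don't keep it in a register" point, using a hypothetical class `VolatileFlagDemo` (not from the thread). The reader spins on a volatile flag, so the flag is re-read on every iteration; if `ready` were a plain field, the JIT would be allowed to hoist the read out of the loop and the reader might never stop. In Java specifically, the volatile write to `ready` also orders the earlier plain write to `data` before it, so the reader is guaranteed to see 42.

```java
class VolatileFlagDemo {
    static int data = 0;                   // plain field, published via the flag
    static volatile boolean ready = false; // volatile flag

    // Writes data, raises the flag, and returns the value the reader saw.
    static int publishAndRead() throws InterruptedException {
        data = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // busy-wait: the volatile read is repeated on each iteration
            }
            seen[0] = data; // ordered after the writer's data = 42
        });
        reader.start();
        data = 42;    // plain write
        ready = true; // volatile write: makes the write above visible to the reader
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader saw: " + publishAndRead());
    }
}
```

This ordering guarantee is part of the Java memory model and goes beyond the C-style meaning of volatile described in this post.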
revolution 05 Jan 2015, 15:59
I reread your questions and perhaps your confusion is because you are conflating caches with memory and idempotency. Volatile tells the compiler that the memory is not idempotent. The involvement of caches in no way affects the results of the program unless the caches have been configured incorrectly by the BIOS/OS on inappropriate memory (unlikely).

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.