flat assembler
Message board for the users of flat assembler.

Index > High Level Languages > Cmpsb multithread?

l4m2 19 Oct 2020, 17:18
Code:
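#include <stdio.h>
#include <thread>

volatile int x;

void p1() {
    while (1) {
        const void *src = (const void *)&x;  // RSI operand of cmpsb
        const void *dst = (const void *)&x;  // RDI operand, same byte
        bool equal;
        // cmpsb compares the byte at [rsi] with the byte at [rdi] and then
        // advances both pointers, so they are read-write operands.
        // "=@ccz" returns ZF directly (GCC 6+ and recent clang), so no
        // pushf/pop is needed; "=A" may also pick rdx instead of rax.
        asm volatile("cmpsb"
                     : "=@ccz"(equal), "+S"(src), "+D"(dst)
                     :
                     : "memory");
        putchar(equal ? ' ' : '*');  // '*': the two reads disagreed
    }
}

int main() {
    std::thread p(p1);
    while (1) {
        x = 0;   // low byte 0x00
        x = -1;  // low byte 0xFF
    }
}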
Running this prints some *s.
1. With the two threads on the same core (different hardware threads) there are lots of *s; on different cores, fewer *s; on the same hardware thread, none. So can a core's cache be updated in the middle of an instruction?
2. The documentation only says
Code:
temp ← SRC1 - SRC2;
SetStatusFlags(temp);    
Is that a promise that SRC1 is read first?
bitRAKE 19 Oct 2020, 19:02
I built it with clang and didn't get a single "*".
What are you running on?
I'm on a Ryzen 2700 at the moment.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
l4m2 19 Oct 2020, 19:22
I'm on an i7-4790.
bitRAKE 19 Oct 2020, 22:59
I didn't think it was possible to split the update but tried to catch a problem with multiple threads. Still nothing.

If this program returns zero on your machine then your results are coming from somewhere else.

Edit: whoops I didn't change anything, lol.

Edit: this one catches it happening - definitely surprised. 17586658 splits out of 15*2^32 comparisons, which is about 0.027%.


Attachment: cmpsb.zip (2.73 KB)


bitRAKE 20 Oct 2020, 01:02
I'm sorry about the subsystem version, but I don't have an old linker on this machine. It doesn't use anything fancy - should probably run fine on XP.
revolution 20 Oct 2020, 03:42
What does the addition of lock do?
bitRAKE 20 Oct 2020, 05:49
I've added the option to toggle a single byte and added some comments about performance: https://github.com/bitRAKE/fasmg_playground/blob/master/win64.coff/cmpsb_split.g

Adding a lock to the data would obviously prevent the data from changing during another thread access. The downside is that I'd have 15 threads lining up for access to that data and the execution time goes up substantially. The main goal here was to show that the two reads of CMPSB can be split. It's not exactly a practical use-case.

If you're inquiring about the LOCK ADD existing in the code - that is to ensure the counts are correct and not to synchronize anything else.
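To illustrate the point about LOCK ADD being there only to keep the counts correct, here is a minimal C++ sketch. It is not the code from the attachment: `count_events` and its parameters are hypothetical names, and it assumes a C++11 compiler with `std::thread`. Several workers bump one shared counter with an atomic read-modify-write, which on x86-64 is typically emitted as a LOCK-prefixed instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` workers records `events_per_thread`
// events in one shared counter; the final total is returned.
std::uint64_t count_events(unsigned threads, std::uint64_t events_per_thread) {
    assert(threads > 0);
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&total, events_per_thread] {
            for (std::uint64_t i = 0; i < events_per_thread; ++i) {
                // Atomic increment. Relaxed ordering is enough because only
                // the count matters, not ordering against any other data.
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto &th : pool) th.join();  // join makes all increments visible
    return total.load();
}
```

With a plain `std::uint64_t` and `++total` instead, increments from different threads could be lost, which is the miscounting the locked add avoids.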

My processor can hide the collisions if there is other bus traffic - that is why the threads need to be high-priority to force collisions. I'm impressed with how well it does.

revolution 20 Oct 2020, 06:05
I think that what is really being tested here is the CPU data access patterns.

I see no reason why the CPU should ensure the two reads by cmpsb are protected from another thread's access. It is always the programmer's responsibility to ensure threads don't stomp on each other's data.
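As a rough sketch of what taking that responsibility can look like in C++ (an assumption for illustration, not something posted in this thread): if the writer and the reader use the same `std::mutex`, the writer cannot run between the reader's two loads, so the pair always agrees. The function name `mismatches_with_mutex` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical demo: counts how often two reads of the same variable differ
// when both reads happen under the mutex the writer also takes.
std::uint64_t mismatches_with_mutex(std::uint64_t iterations) {
    assert(iterations > 0);
    std::mutex m;
    volatile int shared = 0;  // volatile only so both loads are really emitted
    bool stop = false;        // guarded by m as well

    std::thread writer([&] {
        for (;;) {
            std::lock_guard<std::mutex> lock(m);
            if (stop) return;
            shared = ~shared;  // toggle between 0 and -1
        }
    });

    std::uint64_t mismatches = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(m);
        int first = shared;
        int second = shared;   // the writer is locked out between these reads
        if (first != second) ++mismatches;
    }
    {
        std::lock_guard<std::mutex> lock(m);
        stop = true;
    }
    writer.join();
    return mismatches;
}
```

The cost is the one already mentioned in the thread: every access queues on the lock, so throughput drops sharply as threads are added.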
bitRAKE 20 Oct 2020, 06:35
I agree. Could even compare it to two basic instructions and see if it has the same profile. Or we could see if all processors perform the two reads in the same order.
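A comparison against two basic instructions could be sketched like this in C++. This is an assumed setup, not a posted measurement: `count_split_loads` is a hypothetical name, and the two relaxed atomic loads are expected (but not guaranteed by the language) to compile to two separate MOV instructions on x86-64.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical probe: a writer toggles a shared int between 0 and -1 while
// the calling thread loads it twice and counts how often the values differ.
// The returned count depends on timing and on the machine.
std::uint64_t count_split_loads(std::uint64_t iterations, bool run_writer) {
    assert(iterations > 0);
    std::atomic<int> x{0};
    std::atomic<bool> stop{false};

    std::thread writer([&] {
        while (run_writer && !stop.load(std::memory_order_relaxed)) {
            x.store(0, std::memory_order_relaxed);
            x.store(-1, std::memory_order_relaxed);
        }
    });

    std::uint64_t splits = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        int first = x.load(std::memory_order_relaxed);   // first load
        int second = x.load(std::memory_order_relaxed);  // second, separate load
        if (first != second) ++splits;
    }
    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return splits;
}
```

Comparing the split rate from this loop with the rate seen for cmpsb on the same machine would show whether the string instruction behaves any differently from two ordinary loads.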
revolution 20 Oct 2020, 06:45
bitRAKE wrote:
Adding a lock to the data would obviously prevent the data from changing during another thread access.
I meant to try lock cmpsb.
bitRAKE 20 Oct 2020, 07:17
Illegal instruction. :P

l4m2 20 Oct 2020, 16:42
I changed the writer loop to
Code:
lock incl (x)    
With this, the carry flag (CY) is always 0 unless the incl carries.
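The reasoning behind a monotonically increasing writer can be sketched in C++: when the shared value only ever grows, comparing the two values read tells you which read saw the newer state. This is an illustration under assumptions, not a reproduction of the cmpsb test. In C++ the order of the two loads is fixed by program order, the counter is 64-bit so wrap-around is ignored, and `probe_load_order` and `OrderCounts` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

struct OrderCounts {
    std::uint64_t equal = 0;         // both loads saw the same value
    std::uint64_t second_newer = 0;  // second load saw a later increment
    std::uint64_t second_older = 0;  // would mean the loads went backwards
};

// Hypothetical probe: a writer keeps incrementing a counter while the
// calling thread loads it twice per iteration and classifies the pair.
OrderCounts probe_load_order(std::uint64_t iterations) {
    assert(iterations > 0);
    std::atomic<std::uint64_t> counter{0};
    std::atomic<bool> stop{false};

    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed))
            counter.fetch_add(1, std::memory_order_relaxed);  // locked increment
    });

    OrderCounts c;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        std::uint64_t first = counter.load(std::memory_order_relaxed);
        std::uint64_t second = counter.load(std::memory_order_relaxed);
        if (second == first) ++c.equal;
        else if (second > first) ++c.second_newer;
        else ++c.second_older;
    }
    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return c;
}
```

Here `second_older` stays at zero because C++ guarantees coherent reads of a single atomic object. For cmpsb the analogous signal is the borrow out of SRC1 - SRC2, so a carry flag that stays clear except on wrap-around is the kind of evidence this test is looking for.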

PS. Why High Level Languages? The C++ part only serves putchar and the thread.
revolution 20 Oct 2020, 23:13
l4m2 wrote:
PS. Why High Level Languages? The C++ part only serves putchar and the thread.
Because we need an HLL to compile your code. It doesn't involve fasm, so it doesn't belong in Main where you posted it originally.

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.