flat assembler
Message board for the users of flat assembler.

Clear top byte of 64bit GPR?

Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 08:42
Is there a faster way than "shl GPR,8 / shr GPR,8"? I'm used to being able to use "and" for this, but it doesn't work in 64-bit mode, so I need to use something else.


Also, how do I do comparisons in 64-bit mode? I'm used to using cmp, but it doesn't work in 64-bit mode (e.g. cmp rax,"abcdefgh" won't even compile).
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 28 Aug 2009, 08:56
Code:
x dq 00ffffffffffffffh
...
and rax,[x]
...
    

Like this?
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 09:01
Thanks.. it will be much slower when x isn't in L1, though. Is there a way that is faster across the board?


Also, what about for working on a memory location? Can only have 1 memory arg..
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 28 Aug 2009, 09:20
>Thanks.. it will be much slower when x isn't in L1, though. Is there a way that is faster across the board?
Code:
jmp @f
x dq 00ffffffffffffffh
@@: and rax,[x]
    


>Also, what about for working on a memory location? Can only have 1 memory arg..
Code:
x rq 1
...
mov byte[x+7],0 ;and [x],00ffffffffffffffh
    
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 09:24
sinsi wrote:
>Thanks.. it will be much slower when x isn't in L1, though. Is there a way that is faster across the board?
Code:
jmp @f
x dq 00ffffffffffffffh
@@: and rax,[x]
    
A branch and memory operation are faster than two shifts?

sinsi wrote:
>Also, what about for working on a memory location? Can only have 1 memory arg..
Code:
x rq 1
...
mov byte[x+7],0 ;and [x],00ffffffffffffffh
    
Thanks, that works really well for the example I gave.


But.. what about other masks, like 01ffffffffffffffh?
neville



Joined: 13 Jul 2008
Posts: 507
Location: New Zealand
neville 28 Aug 2009, 09:32
Quote:
A branch and memory operation are faster than two shifts?

I guess sinsi is saying that "embedding" the data within the code ensures it will be in L1 cache.

_________________
FAMOS - the first memory operating system
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 28 Aug 2009, 09:39
>I guess sinsi is saying that "embedding" the data within the code ensures it will be in L1 cache.
Yeah, I was trying to be clever...

Shifting might be good if you put another instruction between the two shifts.

>But.. what about other masks, like 01ffffffffffffffh?
and byte[x+7],1 - if it's memory just treat it as 8 bytes.
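
A minimal sketch of the same byte-wise idea, assuming x is a qword in memory. x86 is little-endian, so the most significant byte of the qword sits at x+7. The 6-byte mask on the last line is an added illustration, not something asked above.
Code:
x rq 1
...
mov byte[x+7],0   ; same effect as and [x],00ffffffffffffffh
and byte[x+7],1   ; same effect as and [x],01ffffffffffffffh
mov word[x+6],0   ; same effect as and [x],0000ffffffffffffh

One thing to measure rather than assume: reading the whole qword right after a narrow store like this may cost a store-forwarding stall on many CPUs.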
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 10:09
Thanks guys.


One more question;
Why doesn't "shrd rdx,rax,72" work the same as "mov rdx,rax" followed by "and rdx,qword[mem00FFFFFFFFFFFFFF]"?
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 28 Aug 2009, 11:02
A shift count is limited: with 'shrd rdx,rax,72', the effective shift count is '72 and 63', i.e. 8.
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 11:06
But isn't the imm 8 bits? It should go up to 255 then.. (or 127 if it's signed)
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 28 Aug 2009, 11:13
32-bit mask is 5 bits (00011111), 64-bit is 6 bits (00111111). Even CL is masked (except with an 8088?)
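
A small sketch of the masking, assuming 64-bit mode; the register values are arbitrary examples. With a 64-bit operand only the low 6 bits of the count are used, so a count of 72 behaves like a count of 8.
Code:
mov rdx,0AABBCCDDEEFF0011h
mov rax,1122334455667788h
mov cl,72
shrd rdx,rax,cl   ; effective count = 72 and 63 = 8
; rdx = 88AABBCCDDEEFF00h: rdx shifted right by 8 bits,
; with the vacated top byte filled from the low byte of rax
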
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 11:15
So what are the other bits used for?
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 28 Aug 2009, 11:30
Those masks are applied by the CPU, not a compiler.
If the count weren't masked, 'shrd rdx,rax,64' would behave like 'mov rdx,rax'; think of it that way and you'll get the idea.
What is the point of shifting a register beyond its size?

Maybe you are thinking of a rotation rather than a shift?
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 11:45
Because it should be faster than
Code:
mov  rdx,rax
and  rdx,qword[mem00FFFFFFFFFFFFFF]    






edit: Er.. actually never mind. Even if it worked it wouldn't do the same thing.
I'd need a shl after it.. and then they are both two instructions anyways..
sinsi



Joined: 10 Aug 2007
Posts: 794
Location: Adelaide
sinsi 28 Aug 2009, 11:57
There's probably a real easy way to do it with ss*e**
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8359
Location: Kraków, Poland
Tomasz Grysztar 28 Aug 2009, 12:08
Azu wrote:
Because it should be faster than
Code:
mov  rdx,rax
and  rdx,qword[mem00FFFFFFFFFFFFFF]    

And why not
Code:
mov rdx,00FFFFFFFFFFFFFFh
and rdx,rax    
?
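
For context, a sketch of why the mask has to go through a register, assuming the standard x86-64 encodings: apart from mov reg,imm64, instructions such as and and cmp take at most a 32-bit immediate, which the CPU sign-extends to 64 bits. Masks that survive sign extension can still be written directly; the values below are illustrative only.
Code:
and rax,-256        ; fits: sign-extends to 0FFFFFFFFFFFFFF00h
and rax,7FFFFFFFh   ; fits: clears the upper 33 bits
;and rax,00FFFFFFFFFFFFFFh  ; does not fit a sign-extended imm32,
                            ; so the mask is loaded with mov first
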
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 12:14
Basically I want to compare a bunch of strings of different lengths.


e.g.

Code:
        mov     rax,qword[memory]
        cmp     rax,"abcdefgh"          ; what I want to write (no imm64 form)
        mov     rdx,rax
je      match1
        cmp     rax,"12345678"
je      match2
        and     rdx,$00FFFFFFFFFFFFFF   ; keep the low 7 bytes
        cmp     rdx,"qwertyu"
je      match3
        and     rdx,$0000FFFFFFFFFFFF   ; keep the low 6 bytes
        cmp     rdx,"barfoo"
je      match4
        cmp     rdx,"foobar"
je      match5



I guess I should have just outlined the whole thing from the beginning. Sorry about that.
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4624
Location: Argentina
LocoDelAssembly 28 Aug 2009, 15:18
Code:
        mov     rax,qword[memory]

        cmp     rax, [abcdefgh]
        mov     rdx,rax
je      match1

        cmp     rax, [_12345678]
je      match2

        shl     rdx, 8
        cmp     rdx, [qwertyu]
je      match3

        shl     rdx, 8
        cmp     rdx, [barfoo]
je      match4

        cmp     rdx, [foobar]
je      match5

align 64 ; AMD's cache line size
abcdefgh  dq "abcdefgh"
_12345678 dq "12345678"
qwertyu   dq "qwertyu" shl 8
barfoo    dq "barfoo" shl 16
foobar    dq "foobar" shl 16
    


But this will probably be slower if it is used many times. Try comparing the string-table version against one where the strings are first loaded into a register with mov reg, imm64.
Azu



Joined: 16 Dec 2008
Posts: 1159
Azu 28 Aug 2009, 15:27
Thanks.. is there a way to automatically define constants for that?

Something like
Code:
blah = place
macro autodefine const {
if ~ defined const
        virtual at blah
                label const const.size at $
                if const.size eq byte
                        db `const
                elseif const.size eq word
                        dw `const
                elseif const.size eq dword
                        dd `const
                elseif const.size eq qword
                        dq `const
                elseif const.size eq dqword
                        ddq `const
                end
        end virtual
        blah = blah + const.size
end if
}

cmp rax,autodefine(qword[abcdefgh])

align 64
place:
???

It would save much time, I think.
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 28 Aug 2009, 15:39
No that's bad size-wise and maybe performance-wise (caching), you should use a loop instead and have a "data structure" with strings & offsets where to jump to. (depending how many strings you have -- if you plan to have many, it's certainly NOT a good idea).

Also putting data in between code is kinda nasty (read: slow) for micro-ops and caching.

Example of defining such a data structure from one of my programs that use it. Of course, if the strings are not variable-length (like in your case) you won't even NEED the prefix-size for the string (you can use null-terminators too if you want).

Code:
Strings:
irps arg, string1 string2 blah whatever
{
  local str, strsize
  db strsize
  str db `arg
  strsize = $-str

  dd JumpLabel_#arg
}
db 0  ; terminator (so we know when the data structure ended)    


The format is:
Code:
<Length Of String><String><4-bytes Jump Label Address for string>    
(if the length is 0 it means it's the end)

Then you define labels like:
Code:
JumpLabel_string1:
; string1 code
and such. There's nothing magical about it, I hope you get the point.

Note that this is for 32-bit but you get the idea (I think it should work in 64-bit no problem).

Then you'll need to loop through this data structure. I'm not sure about 64-bit, but in 32-bit you can easily do the comparisons with string instructions like cmpsd or cmpsb (maybe with a rep prefix).
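
A rough sketch of such a loop for the table format above, in 32-bit code. The names Dispatch and NoMatch and the use of edx for the input pointer are assumptions made for this illustration, not part of the original program. It only compares the first <length> bytes of the input, so an input that merely starts with a table entry also counts as a match.
Code:
Dispatch:                       ; in: edx -> text to look up (assumption)
        cld                     ; make cmpsb walk upwards
        mov     ebx,Strings     ; ebx -> first entry
.next:
        movzx   ecx,byte[ebx]   ; ecx = length prefix
        test    ecx,ecx
        jz      NoMatch         ; length 0 marks the end of the table
        lea     edi,[ebx+1]     ; edi -> entry text
        lea     ebx,[edi+ecx]   ; ebx -> 4-byte jump address after the text
        mov     esi,edx         ; esi -> text to look up
        repe    cmpsb           ; compare ecx bytes, stop at first mismatch
        je      .found
        add     ebx,4           ; skip the address, move to the next entry
        jmp     .next
.found:
        jmp     dword[ebx]      ; continue at JumpLabel_<name>

Computing the address of the jump field before the compare keeps the loop short, since lea and mov leave the flags from cmpsb alone only if they run first; here they run before it, so je tests the compare result directly.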

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
