flat assembler
Message board for the users of flat assembler.
  
       
Clear top byte of 64bit GPR?
                  
sinsi 28 Aug 2009, 08:56
Code:
    x dq 00ffffffffffffffh
    ...
    and rax,[x]
    ...
Like this?
                  
Azu 28 Aug 2009, 09:01
Thanks.. it will be much slower when x isn't in L1, though. Is there a way that is faster across the board?

Also, what about for working on a memory location? Can only have 1 memory arg..
                  
sinsi 28 Aug 2009, 09:20
>Thanks.. it will be much slower when x isn't in L1, though. Is there a way that is faster across the board?

Code:
    jmp @f
    x dq 00ffffffffffffffh
    @@:
    and rax,[x]

>Also, what about for working on a memory location? Can only have 1 memory arg..

Code:
    x rq 1
    ...
    mov byte[x+7],0 ;and [x],00ffffffffffffffh
                  
Azu 28 Aug 2009, 09:24
sinsi wrote:
>Thanks.. it will be much slower when x isn't in L1, though. Is there a way that is faster across the board?

sinsi wrote:
>Also, what about for working on a memory location? Can only have 1 memory arg..

But.. what about other masks, like 01ffffffffffffffh?
                  
neville 28 Aug 2009, 09:32
Quote:
A branch and memory operation are faster than two shifts?

I guess sinsi is saying that "embedding" the data within the code ensures it will be in L1 cache.
_________________
FAMOS - the first memory operating system
                  
sinsi 28 Aug 2009, 09:39
>I guess sinsi is saying that "embedding" the data within the code ensures it will be in L1 cache.

Yeah, I was trying to be clever... Shifting might be good if you put another instruction between the two shifts.

>But.. what about other masks, like 01ffffffffffffffh?

'and byte[x+7],1' - if it's memory just treat it as 8 bytes.
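
A rough illustration of the byte trick, written as a C model rather than fasm. The helper names (store_le64, load_le64, and_top_byte) are made up for this sketch. On x86 the byte at offset 7 of a qword is its most significant byte, so touching only that byte should give the same result as ANDing the whole qword with a 64-bit mask.

```c
#include <stdint.h>

/* Write v into p[0..7] in x86 (little-endian) byte order. */
static void store_le64(unsigned char *p, uint64_t v) {
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Read a qword back from p[0..7] in little-endian byte order. */
static uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Model of 'and byte[x+7],m': only the byte at offset 7 is modified.
   With m = 0 this has the same effect as 'mov byte[x+7],0'. */
static uint64_t and_top_byte(uint64_t x, unsigned char m) {
    unsigned char mem[8];
    store_le64(mem, x);
    mem[7] &= m;
    return load_le64(mem);
}
```

With m = 0 the result equals x AND 00ffffffffffffffh, and with m = 1 it equals x AND 01ffffffffffffffh.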
                  
Azu 28 Aug 2009, 10:09
Thanks guys.

One more question: why doesn't 'shrd rdx,rax,72' work the same as 'mov rdx,rax' followed by 'and rdx,qword[mem00FFFFFFFFFFFFFF]'?
                  
sinsi 28 Aug 2009, 11:02
A shift count is limited - with 'shrd rdx,rax,72', the count actually used is '72 and 63', which is 8.
                  
                  
                   Azu 28 Aug 2009, 11:06 
                  But isn't the imm 8 bits? It should go up to 255 then.. (or 127 if it's signed) 
                  
                  
sinsi 28 Aug 2009, 11:13
With 32-bit operands the count is masked to 5 bits (00011111b), with 64-bit operands to 6 bits (00111111b). Even a count in CL is masked (except on an 8086/8088, which does not mask it).
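
To make the masking concrete, here is a small C model of what the CPU does with the count. It is a sketch, not fasm. The function names are invented for illustration, and the zero-count branch is there because C itself leaves a shift by 64 undefined.

```c
#include <stdint.h>

/* Count actually used by a shift on 64-bit operands: low 6 bits only. */
static unsigned effective_count64(unsigned count) { return count & 63u; }

/* Count actually used by a shift on 32-bit operands: low 5 bits only. */
static unsigned effective_count32(unsigned count) { return count & 31u; }

/* Model of 'shrd dst,src,count' with 64-bit operands: shift dst right and
   fill the vacated high bits from the low bits of src. */
static uint64_t shrd64(uint64_t dst, uint64_t src, unsigned count) {
    count = effective_count64(count);
    if (count == 0)
        return dst;  /* a masked count of 0 leaves dst unchanged */
    return (dst >> count) | (src << (64u - count));
}
```

Under this model 'shrd rdx,rax,72' behaves exactly like 'shrd rdx,rax,8', and a count of 64 becomes 0, so the destination is left as it was.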
                  
                  
                   Azu 28 Aug 2009, 11:15 
So what are the other bits used for?
                  
sinsi 28 Aug 2009, 11:30
Those masks are applied by the CPU, not a compiler; the upper bits of the count are simply ignored.

If the count were not masked, 'shrd rdx,rax,64' would just be 'mov rdx,rax', and you'll get the idea. What is the point of shifting a register beyond its size? Maybe you are thinking of a rotation rather than a shift?
                  
Azu 28 Aug 2009, 11:45
Because it should be faster than

Code:
    mov rdx,rax
    and rdx,qword[mem00FFFFFFFFFFFFFF]

edit: Er.. actually never mind. Even if it worked it wouldn't do the same thing. I'd need a shl after it.. and then they are both two instructions anyway..
                  
                   sinsi 28 Aug 2009, 11:57 
                  There's probably a real easy way to do it with ss*e** 
                  
                  
Tomasz Grysztar 28 Aug 2009, 12:08
Azu wrote:
Because it should be faster than

And why not
Code:
    mov rdx,00FFFFFFFFFFFFFFh
    and rdx,rax
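
For comparison, a tiny C sketch (not fasm, with made-up function names) of the two register-only approaches discussed in this thread: the mask loaded into a register and ANDed, and the pair of shifts. Both should give the same value for any input.

```c
#include <stdint.h>

/* Models 'mov rdx,00FFFFFFFFFFFFFFh' followed by 'and rdx,rax'. */
static uint64_t clear_top_byte_mask(uint64_t x) {
    return x & UINT64_C(0x00FFFFFFFFFFFFFF);
}

/* Models 'shl rax,8' followed by 'shr rax,8'. */
static uint64_t clear_top_byte_shifts(uint64_t x) {
    return (x << 8) >> 8;
}
```

Which one is faster depends on the CPU and on the surrounding code, so it is worth measuring rather than assuming.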
                  
Azu 28 Aug 2009, 12:14
Basically I want to compare a bunch of strings of different lengths.

e.g.
Code:
    mov rax,qword[memory]
    cmp rax,"abcdefgh"
    mov rdx,rax
    je match1
    cmp rax,"12345678"
    je match2
    and rdx,$00FFFFFFFFFFFFFF
    cmp rdx,"qwertyu"
    je match3
    and rdx,$0000FFFFFFFFFFFF
    cmp rdx,"barfoo"
    je match4
    cmp rdx,"foobar"
    je match5

I guess I should have just outlined the whole thing from the beginning. Sorry about that.
                  
LocoDelAssembly 28 Aug 2009, 15:18
Code:
    mov rax,qword[memory]
    cmp rax, [abcdefgh]
    mov rdx,rax
    je match1
    cmp rax, [_12345678]
    je match2
    shl rdx, 8
    cmp rdx, [qwertyu]
    je match3
    shl rdx, 8
    cmp rdx, [barfoo]
    je match4
    cmp rdx, [foobar]
    je match5

    align 64 ; AMD's cache line size
    abcdefgh dq "abcdefgh"
    _12345678 dq "12345678"
    qwertyu dq "qwertyu" shl 8
    barfoo dq "barfoo" shl 16
    foobar dq "foobar" shl 16

But probably this will be slower if used many times. Try comparing what happens when a string table is used against what happens when the strings are first moved to a register via mov reg, imm64.
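
The idea behind the shl version can be modelled in C (a sketch only; pack_le, match_prefix and PATTERNS are names invented here). A quoted string used as a number has its first character in the lowest byte, so shifting left by 8 bits for each unwanted trailing character throws away the high bytes. The stored constants are pre-shifted by the same amount, which is what '"qwertyu" shl 8' does in the table above.

```c
#include <stdint.h>
#include <string.h>

/* Pack up to 8 characters with the first character in the lowest byte,
   matching how the qword is laid out in x86 memory. */
static uint64_t pack_le(const char *s) {
    uint64_t v = 0;
    size_t n = strlen(s);
    if (n > 8)
        n = 8;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Example pattern list taken from the thread. */
static const char *const PATTERNS[] = {
    "abcdefgh", "12345678", "qwertyu", "barfoo", "foobar"
};

/* Return the index of the first pattern that matches the start of the
   8 loaded bytes, or -1 if none does. Patterns must be 1..8 chars long. */
static int match_prefix(uint64_t loaded, const char *const *patterns, int count) {
    for (int i = 0; i < count; i++) {
        size_t n = strlen(patterns[i]);
        if (n == 0 || n > 8)
            continue;
        unsigned drop = 8u * (unsigned)(8 - n); /* bits to discard: 0..56 */
        if ((loaded << drop) == (pack_le(patterns[i]) << drop))
            return i;
    }
    return -1;
}
```

For example, match_prefix(pack_le("qwertyuX"), PATTERNS, 5) picks the "qwertyu" entry because the eighth byte is shifted out before the comparison.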
                  
Azu 28 Aug 2009, 15:27
Thanks.. is there a way to automatically define constants for that?

Something like
Code:
    blah = place
    macro autodefine const{
      if ~ defined const
        virtual at blah
          label const
          const.size at $
          if const.size eq byte
            db `const
          elseif const.size eq word
            dw `const
          elseif const.size eq dword
            dd `const
          elseif const.size eq qword
            dq `const
          elseif const.size eq dqword
            ddq `const
          end
        end virtual
        blah = blah + const.size
      end if
    }
    cmp rax,autodefine(qword[abcdefgh])
    align 64
    place:

It would save much time, I think.
                  
Borsuc 28 Aug 2009, 15:39
No, that's bad size-wise and maybe performance-wise (caching). You should use a loop instead and have a "data structure" with strings and the offsets to jump to. (It depends on how many strings you have -- if you plan to have many, unrolled compares are certainly NOT a good idea.)

Also, putting data in between code is kinda nasty (read: slow) for micro-ops and caching. Here is an example of defining such a data structure, from one of my programs that uses it. Of course, if the strings are not variable-length (like in your case) you won't even NEED the prefix-size for the string (you can use null-terminators too if you want).

Code:
    Strings:
    irps arg, string1 string2 blah whatever
    {
      local str, strsize
      db strsize
      str db `arg
      strsize = $-str
      dd JumpLabel_#arg
    }
    db 0 ; terminator (so we know when the data structure ended)

The format is:
Code:
    <Length Of String><String><4-bytes Jump Label Address for string>

Then you define labels like:
Code:
    JumpLabel_string1:
    ; String 1 code

Note that this is for 32-bit but you get the idea (I think it should work in 64-bit no problem). Then you'll need to loop through this structure. I'm not sure about 64-bit, but in 32-bit you can easily do this with string operations like cmpsd or cmpsb (maybe with a rep prefix).
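
A minimal C sketch of walking a table in that format. It is an illustration, not code from the program mentioned above: TABLE and lookup are hypothetical names, and the 4-byte field holds small made-up numbers standing in for the jump label addresses.

```c
#include <stdint.h>
#include <string.h>

/* Records: <length byte><string bytes><4-byte little-endian value>,
   terminated by a length byte of 0. The values 0x10 and 0x20 are
   placeholders for jump label addresses. */
static const unsigned char TABLE[] = {
    7, 's', 't', 'r', 'i', 'n', 'g', '1', 0x10, 0, 0, 0,
    4, 'b', 'l', 'a', 'h',                0x20, 0, 0, 0,
    0
};

/* Return the 4-byte value stored after the matching string, or -1. */
static int64_t lookup(const unsigned char *table, const char *key) {
    size_t key_len = strlen(key);
    while (*table != 0) {
        size_t len = *table++;               /* length prefix */
        if (len == key_len && memcmp(table, key, len) == 0) {
            const unsigned char *p = table + len;
            return (int64_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
        }
        table += len + 4;                    /* skip string and value */
    }
    return -1;
}
```

In assembly the memcmp step is where cmpsb with a repe prefix would go, and the returned value would be the address to jump to.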
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.