flat assembler
Message board for the users of flat assembler.

Processor Description String



TightCoderEx 14 Jul 2013, 02:33
This snippet returns the 48-byte processor description (brand) string on the stack.

Code:
        mov     eax, 0x80000004

   @@:
        push    eax
        cpuid
        xchg    edx, [esp]
        push    ecx
        push    ebx
        push    eax
        cmp      dl, 2
        jz       @F

        mov     eax, edx
        dec      al
        jmp      @B

   @@:
    
The output is padded with spaces, so some method to scan back from the end and insert a NUL terminator would need to be coded here.
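Why pushing EDX, ECX, EBX, EAX for leaves 4, 3, 2 leaves a readable string at [esp] can be checked with a small model. This is only a sketch in Python, not part of the original snippet: `BRAND` is the 48-byte string taken from a dump posted later in the thread, and `push32` and `split_leaf` are hypothetical helper names.

```python
import struct

# 48-byte brand string as shown in a hex dump later in this thread.
BRAND = b" " * 7 + b"Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz\x00"

def split_leaf(chunk):
    """Split 16 brand-string bytes into the (eax, ebx, ecx, edx) values CPUID returns."""
    return struct.unpack("<4I", chunk)

def push32(stack, value):
    """Model a 32-bit PUSH: the stack grows down, so the dword lands at the lowest address."""
    stack[0:0] = struct.pack("<I", value)

stack = bytearray()                      # index 0 plays the role of [esp]
for leaf in (0x80000004, 0x80000003, 0x80000002):
    offset = (leaf - 0x80000002) * 16
    eax, ebx, ecx, edx = split_leaf(BRAND[offset:offset + 16])
    for reg in (edx, ecx, ebx, eax):     # same push order as the snippet
        push32(stack, reg)

print(bytes(stack))
```

Walking the leaves downward and pushing the registers in reverse is what makes the bytes come out in reading order.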



bitRAKE 14 Jul 2013, 11:31
Code:
    use32

    mov eax,$80000005
@@: dec eax
    push eax
    cpuid
    xchg edx,[esp]
    push ecx ebx eax
    xchg eax,edx
    cmp al,2
    jnz @B    

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup



TightCoderEx 14 Jul 2013, 16:40
Original 16 bit version.
Code:
00  66B804000080      mov eax,0x80000004
06  6650              push eax
08  0FA2              cpuid
0A  6766871424        xchg edx,[esp]
0F  6651              push ecx
11  6653              push ebx
13  6650              push eax
15  80FA02            cmp dl,0x2
18  7407              jz 0x21
1A  6689D0            mov eax,edx
1D  FEC8              dec al
1F  EBE5              jmp short 0x6

21 = 33 Bytes
    

Improved 16-bit version. In my case I'll need to use this version, because the boot loader has just passed control here and the CPU is still in real mode.
Code:
00  66B805000080      mov eax,0x80000005
06  6648              dec eax
08  6650              push eax
0A  0FA2              cpuid
0C  6766871424        xchg edx,[esp]
11  6651              push ecx
13  6653              push ebx
15  6650              push eax
17  6692              xchg eax,edx
19  3C02              cmp al,0x2
1B  75E9              jnz 0x6

1D = 29 Bytes
    


32 bit version as per bitRAKE's example.
Code:
00  B805000080        mov eax,0x80000005
05  48                dec eax
06  50                push eax
07  0FA2              cpuid
09  871424            xchg edx,[esp]
0C  51                push ecx
0D  53                push ebx
0E  50                push eax
0F  92                xchg eax,edx
10  3C02              cmp al,0x2
12  75F1              jnz 0x5

14 = 20 Bytes
    


And the 64-bit version.
Code:
00  B805000080        mov eax,0x80000005
05  FFC8              dec eax
07  50                push rax
08  0FA2              cpuid
0A  67871424          xchg edx,[esp]
0E  51                push rcx
0F  53                push rbx
10  50                push rax
11  92                xchg eax,edx
12  3C02              cmp al,0x2
14  75EF              jnz 0x5

16 = 22 Bytes
    



bitRAKE 14 Jul 2013, 23:07
The default address size for use64 is 64-bit, so a byte is saved with "xchg edx,[rsp]". Also, the string is no longer contiguous in memory, because each push now stores 8 bytes. Maybe something like:
Code:
    use64
    mov eax,$80000005
@@: dec eax
    push rax
    cpuid
    xchg ecx,[rsp]
    mov [rsp+4],edx
    push rax
    mov [rsp+4],ebx

    xchg eax,ecx
    cmp al,2
    jnz @B    
(27 bytes) ...yet it depends on what the display algorithm expects. Likewise, the 16/32-bit versions could use PUSHAD to save some code bytes at the cost of double the stack usage (32 bytes per leaf instead of 16), provided the display algorithm is changed to account for this and the stack space is available.
Code:
    use16
    mov ax,(4+1)*2 + 1
@@:
    dec ax
    dec ax ; sub al,2
    push ax
      cwde
      ror eax,1
      cpuid
    pop bp
    pushad
    xchg ax,bp
    cmp al,2
    jnz @B    
21 bytes, and shrinking... :P

Oddly, in 16-bit mode, "dec ax" can be used without affecting the upper word, and is one byte shorter. Also, the addressing mode is 16-bit -- "xchg edx,[sp]" saves a byte.
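The contiguity point made at the start of this post can be illustrated with a rough Python model. It is a sketch under assumptions, not the posted code: `BRAND` is the 48-byte string from a dump elsewhere in the thread, `push64` is a hypothetical helper, `naive` models pushing each 32-bit result as a zero-extended qword, and `packed` models storing two results per qword as the use64 version does.

```python
import struct

BRAND = b" " * 7 + b"Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz\x00"

def push64(stack, value):
    """Model a 64-bit PUSH: 8 bytes land at the new lowest address."""
    stack[0:0] = struct.pack("<Q", value)

naive = bytearray()    # one qword per 32-bit register
packed = bytearray()   # two 32-bit registers per qword
for leaf in (4, 3, 2):
    chunk = BRAND[(leaf - 2) * 16:(leaf - 2) * 16 + 16]
    eax, ebx, ecx, edx = struct.unpack("<4I", chunk)
    for reg in (edx, ecx, ebx, eax):
        push64(naive, reg)                  # upper dword is zero
    push64(packed, (edx << 32) | ecx)       # same layout as ecx at [rsp], edx at [rsp+4]
    push64(packed, (ebx << 32) | eax)

print(len(naive), bytes(naive[:16]))
print(len(packed), bytes(packed))
```

In this model the naive layout takes 96 bytes with four zero bytes after every four characters, while the packed layout reproduces the 48-byte string.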




TightCoderEx 15 Jul 2013, 02:22
Yes, I realized my error in the 64-bit version even before I read your message.

I tried one simple mod that added 4 to RSP after each push, but that failed miserably. I don't know if this is the most efficient way, but it works.

Code:
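        mov     eax, 0x80000005

   @@:  dec     eax              ; next leaf: 0x80000004, then 3, then 2
        push    rax              ; save the leaf number
        cpuid
        shl     rdx, 32
        add     rdx, rcx         ; rdx = edx:ecx
        xchg    [rsp], rdx       ; store edx:ecx, leaf number now in rdx
        shl     rbx, 32
        add     rbx, rax         ; rbx = ebx:eax
        push    rbx
        xchg    rdx, rax         ; leaf number back in rax
        cmp      al, 2
        jnz      @B              ; loop until leaf 0x80000002 is done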
    


Result:

Quote:

6FEF0 - 20 20 20 20 20 20 20 49 6E 74 65 6C 28 52 29 20        Intel(R) 
6FF00 - 43 6F 72 65 28 54 4D 29 20 69 35 2D 33 35 37 30 Core(TM) i5-3570
6FF10 - 4B 20 43 50 55 20 40 20 33 2E 34 30 47 48 7A 00 K CPU @ 3.40GHz.



TightCoderEx 15 Jul 2013, 03:39
Your 64-bit version at 27 bytes worked great.
bitRAKE wrote:
21 bytes, and shrinking... :P

Interesting concept and I particularly like the preamble. Demonstrates how an acute knowledge of architecture can lead to creative ideas. Unfortunately, it didn't work.
Code:
        mov     ax,(4+1)*2 + 1
@@: 
        dec     ax
        dec     ax ; sub al,2
        push    ax
        cwde
        ror     eax, 1
        cpuid
        pop     bp
        pushad
        add     sp, 16          ; Need to omit extraneous registers
        xchg    ax, bp
        cmp     al,5            ; AX saved before ROR.
        jnz     @B
    


Even this version still doesn't produce the desired result, because of the order in which pushad saves the registers.

l(R)e(TM CorInte CPU ) i5 @ GHz2.67 750

when it is supposed to look like this

Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz

This code yields the desired result at 28 bytes, slightly smaller than my best previous 16-bit version.
Code:
        mov     ax,(4+1)*2 + 1
@@: 
        dec     ax
        dec     ax ; sub al,2
        push    ax
        cwde
        ror     eax, 1
        cpuid
        pop     di
        push    edx
        push    ecx
        push    ebx
        push    eax
        push    di
        pop     ax
        cmp     al,5            ; AX saved before ROR.
        jnz     @B
    


NOTE: The brand strings differ between the 16-bit and 64-bit tests because Bochs is used for 16-bit testing and FDBG for 64-bit, hence emulated versus real hardware.
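The scrambled Bochs output quoted in this post is consistent with PUSHAD's push order (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI). A hedged Python sketch follows. The exact spacing inside `BRAND` is an assumption reconstructed from the scrambled line, not a dump from the thread, and `pushad_then_skip` is a hypothetical helper.

```python
import struct

# Assumed 48-byte Bochs brand string; the run of 9 spaces is a reconstruction.
BRAND = b"Intel(R) Core(TM) i5 CPU" + b" " * 9 + b"750  @ 2.67GHz\x00"

def pushad_then_skip(stack, eax, ebx, ecx, edx):
    """Model PUSHAD followed by ADD SP,16 for one CPUID leaf."""
    esp = ebp = esi = edi = 0                 # placeholders; these four dwords are skipped
    frame = bytearray()
    for reg in (eax, ecx, edx, ebx, esp, ebp, esi, edi):   # PUSHAD order
        frame[0:0] = struct.pack("<I", reg)
    stack[0:0] = frame[16:]                   # ADD SP,16 steps over EDI, ESI, EBP, ESP

stack = bytearray()
for leaf in (4, 3, 2):
    chunk = BRAND[(leaf - 2) * 16:(leaf - 2) * 16 + 16]
    eax, ebx, ecx, edx = struct.unpack("<4I", chunk)
    pushad_then_skip(stack, eax, ebx, ecx, edx)

text = bytes(stack).replace(b"\x00", b"").decode("ascii")
collapsed = " ".join(text.split())            # collapse runs of spaces, as the forum page does
print(collapsed)                              # l(R)e(TM CorInte CPU ) i5 @ GHz2.67 750
```

Each 16-byte group comes out as EBX, EDX, ECX, EAX, which is why four explicit pushes in the order EDX, ECX, EBX, EAX are needed for a readable string.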





BAiC 15 Jul 2013, 03:40
You don't need to use the stack for the loop variable. Simply use rsi, rdi, or even rbp as the loop variable (or any of the numbered registers):
Code:
        mov     esi, 0x80000005

   @@:
        dec     si
        mov     eax, esi
        cpuid
        shl     rdx, 32
        shl     rbx, 32
        add     rdx, rcx
        add     rbx, rax
        push    rdx
        push    rbx
        cmp     si, 2
        jnz     @B

_________________
byte me.



bitRAKE 15 Jul 2013, 03:56
TightCoderEx wrote:
Unfortunately, it didn't work.
I was quite explicit that the display routine needs to account for the lack of organization in the data. We haven't defined what the display function is.

XCHG AX,DI is only one byte.




TightCoderEx 15 Jul 2013, 05:14
bitRAKE wrote:
I was quite explicit about the display routine needed to account for the lack of organization in the data.
True, but it seems counterintuitive to construct anything other than an ASCIIZ string. It's much more efficient to modify this code than to construct a specialized display algorithm to compensate.

I guess I should have been explicit, but the failure would have come from the processor throwing an exception when 7FFFFFF was passed to CPUID. This is what would have happened with the three extra iterations caused by comparing AL with 2 rather than 5.

I never tested the exception theory; as valuable a tool as Bochs is, it's still an emulator and may not mimic real hardware, or at least the hardware I have.
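For reference, the counter arithmetic of the 16-bit preamble (two DEC AX, then CWDE and ROR EAX,1) can be traced with a small Python model. This is only a sketch of the register arithmetic; it says nothing about how CPUID itself reacts to a given leaf, and `leaf_from_counter` is a hypothetical name.

```python
def leaf_from_counter(ax):
    """Model CWDE followed by ROR EAX,1 applied to a 16-bit counter."""
    eax = (ax | 0xFFFF0000) if (ax & 0x8000) else ax     # CWDE: sign-extend AX into EAX
    return ((eax >> 1) | (eax << 31)) & 0xFFFFFFFF       # ROR EAX,1

ax = (4 + 1) * 2 + 1              # value loaded by MOV AX
trace = []
for _ in range(6):
    ax = (ax - 2) & 0xFFFF        # the two DEC AX instructions
    trace.append((ax, leaf_from_counter(ax)))

for counter, leaf in trace:
    print(f"ax={counter:#06x}  al={counter & 0xFF:#04x}  leaf={leaf:#010x}")
```

In this model the three wanted passes have AL equal to 9, 7 and 5, giving leaves 0x80000004 down to 0x80000002. The counter stays odd on every pass, so under these assumptions a compare against 2 never matches, while the compare against 5 stops after the third leaf.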



bitRAKE 15 Jul 2013, 05:43
That's what I get for not testing the code. DOSBox is the only thing I have on this machine, atm. Thank you for explaining the error further.



baldr 15 Jul 2013, 23:11
Since there's no clearly defined specification for the snippet, I've made it using esi:
Code:
_cpuid: mov     esi, 0x80000003 ; 5
.prev:  lea     eax, [esi+1]    ; 3
        cpuid                   ; 2
        push    edx             ; 1
        push    ecx             ; 1
        push    ebx             ; 1
        push    eax             ; 1
        dec     esi             ; 1
        jpo     .prev           ; 2    
17 bytes in 32-bit mode. As a bonus, I present a snippet to squeeze extra spaces from that brand string:
Code:
_squeeze:
        mov     esi, esp; 2
        mov     edi, esp; 2
        xor     ecx, ecx; 2; reset flag: don't copy
.next:  lodsb           ; 1
        test    al, al  ; 2
        jz      .done   ; 2; almost done if NUL
        cmp     al, ' ' ; 2
        jecxz   .check  ; 2; don't copy?
.copy:  stosb           ; 1; copy
        jne     .next   ; 2; repeat if not ' '
        not     ecx     ; 2; ' ' copied, reset flag
.check: je      .next   ; 2; either via 'jecxz', then ZF indicates al==' '; continue to skip ' ' if so
                        ;    or ecx was ==-1 (now ==0), then ZF==1; proceed to skip ' '
        not     ecx     ; 2; set flag (copy); we can get here only if ecx was ==0 && al!=' '
        jmp     .copy   ; 2; proceed to copy non-' '; note that ZF==0, 'jne .next' uses it
.done:  cmp     esp, edi; 2; CF==edi>esp; this accounts for brand string of all ' 's
        sbb     edi, ecx; 2
; ecx==0:
;   CF==1: edi>esp => skipping after copying => trailing space, decrement
;   CF==0: edi==esp => string contains only ' 's => no trailing space, no decrement
; ecx==-1: edi>esp since we were copying => CF==1 => no decrement
        stosb           ; 1; store NUL    
It's quite convoluted for 31 bytes, so I've added extensive comments. ;)
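The loop exit in `_cpuid` above relies on the parity flag that DEC sets from the low byte of its result. A minimal Python sketch of that termination condition follows; `parity_flag` is a hypothetical helper, and the model covers only the counter, not CPUID itself.

```python
def parity_flag(value):
    """PF on x86: set when the low byte of the result has an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0

esi = 0x80000003
leaves = []
while True:
    leaves.append((esi + 1) & 0xFFFFFFFF)   # lea eax,[esi+1] / cpuid
    esi = (esi - 1) & 0xFFFFFFFF            # dec esi
    if parity_flag(esi):                    # jpo loops only while parity is odd
        break

print([hex(leaf) for leaf in leaves])       # ['0x80000004', '0x80000003', '0x80000002']
```

The low bytes 0x02 and 0x01 each have one bit set (odd parity, loop again), while 0x00 has none (even parity, fall through), so the counter needs no separate compare.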



TightCoderEx 16 Jul 2013, 02:32
baldr wrote:
Since there's no clearly defined specification for the snippet
Quite true, but I think it's safe to assume that, at minimum, most of us try to get as much computing done as possible with the fewest instructions. This does save space and hopefully time too, though that's not always guaranteed.

Until now I hadn't decided exactly where I wanted to put this, but all things considered, and thanks to everyone's contributions, the specification is as follows:

A 16-bit routine that returns a pointer in ES:DI to the processor's brand string, excluding leading spaces, and its length in CX, not counting the terminator.

Code:
00  66BE03000080      mov esi,0x80000003

06  67668D4601        lea eax,[esi+0x1]
0B  0FA2              cpuid
0D  6652              push edx
0F  6651              push ecx
11  6653              push ebx
13  6650              push eax
15  664E              dec esi
17  7BED              jpo 0x6

19  16                push ss
1A  07                pop es
1B  89E7              mov di,sp
1D  83C9FF            or cx,-1
20  B020              mov al,' '
22  F3AE              repe scasb
24  4F                dec di
25  83C131            add cx, 31H
    


NOTE: As this is going to be used immediately after my boot loader is finished, all registers are volatile, except those being used in the process.

Protected mode is the most efficient space-wise, but as I'm going directly into long mode from real mode, it's really not an option in this case.

In conclusion, more functionality packed into 8 bytes less than my original posting.
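The length arithmetic in the scan (OR CX,-1, REPE SCASB, DEC DI, ADD CX,31h) can be checked with a small Python model. It is a sketch under assumptions: the buffer is the 48-byte string from the dump posted earlier in the thread, it contains at least one non-space byte, and the text fills the buffer up to a NUL in the last byte. `scan_brand` is a hypothetical name.

```python
BRAND = b" " * 7 + b"Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz\x00"

def scan_brand(buf):
    """Return (offset of first non-space byte, value left in CX) for a 48-byte buffer."""
    cx = 0xFFFF                    # or cx,-1
    di = 0                         # mov di,sp (offset 0 stands for the stack top)
    while cx != 0:                 # repe scasb: compare, advance, count down
        matched = buf[di] == 0x20
        di += 1
        cx -= 1
        if not matched:            # stop after the first byte that is not a space
            break
    di -= 1                        # dec di: back up onto that byte
    cx = (cx + 0x31) & 0xFFFF      # add cx,31h
    return di, cx

offset, length = scan_brand(BRAND)
print(offset, length, BRAND[offset:offset + length])
```

With k leading spaces the scan consumes k + 1 bytes, so CX ends at 0xFFFF - (k + 1) + 49 = 47 - k modulo 65536, which is the text length without the terminator only when the NUL sits in byte 47.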



baldr 16 Jul 2013, 04:43
TightCoderEx,

If you're not against self-modifying code, 2 bytes can be shaved off as follows:
Code:
        use16
_cpuid: mov     eax, 0x80000003
label .al byte at $-4
        inc     ax
        cpuid
        push    edx ecx ebx eax
        dec     [.al]
        jnz     _cpuid    
If [_cpuid.al] can be addressed via disp8 (I often set bp to point somewhere in the middle of boot loader, thus making instructions shorter), another byte is ready to be chopped. Also you can store 0x80000003 somewhere (e.g. in GDT entry corresponding to selectors 0…3) to avoid SMC and yield another 2 spare bytes (in fact it's 2 bytes more, but the code itself will be shorter).
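How the self-modifying counter steps through the leaves can be sketched in Python. This models only the patched immediate and the INC AX / DEC byte arithmetic; `code_imm` is a hypothetical stand-in for the four immediate bytes inside the MOV instruction.

```python
# The imm32 of "mov eax, 0x80000003", stored little-endian as it sits in the code.
code_imm = bytearray((0x80000003).to_bytes(4, "little"))

leaves = []
while True:
    eax = int.from_bytes(code_imm, "little")       # mov eax, imm32 (reads the patched bytes)
    ax = ((eax & 0xFFFF) + 1) & 0xFFFF             # inc ax
    eax = (eax & 0xFFFF0000) | ax
    leaves.append(eax)                             # cpuid is executed with this leaf
    code_imm[0] = (code_imm[0] - 1) & 0xFF         # dec [.al]: patch the low immediate byte
    if code_imm[0] == 0:                           # jnz falls through at zero
        break

print([hex(leaf) for leaf in leaves])              # ['0x80000004', '0x80000003', '0x80000002']
```

In this model the immediate is left at 0x80000000 afterwards, so the routine would need the byte restored before it could run a second time.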

By the way, on my netbook cpuid shows this spacing:
Code:
47 65 6E 75 69 6E 65 20-49 6E 74 65 6C 28 52 29   Genuine Intel(R)
20 43 50 55 20 20 20 20-20 20 20 20 20 20 20 55    CPU           U
32 33 30 30 20 20 40 20-31 2E 32 30 47 48 7A 00   2300  @ 1.20GHz.    
That's why I wrote that weird squeeze code.
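For readers who want to follow the squeeze logic without tracing flags, here is a rough Python port of the same state machine, run on the netbook string from the dump above. It is a sketch, not a replacement for the 31-byte routine: `squeeze` operates on a NUL-terminated bytearray in place, and the boolean `copying` stands in for the ECX flag.

```python
def squeeze(buf):
    """Collapse runs of spaces in a NUL-terminated buffer in place; return the new length."""
    src = dst = 0
    copying = False                  # ECX == 0: skipping spaces, ECX == -1: copying
    while buf[src] != 0:             # lodsb / test al,al / jz .done
        ch = buf[src]
        src += 1
        if copying:
            buf[dst] = ch            # .copy: stosb
            dst += 1
            if ch == 0x20:
                copying = False      # one space kept, skip the rest of the run
        elif ch != 0x20:
            copying = True           # first non-space after a run of spaces
            buf[dst] = ch
            dst += 1
    if not copying and dst > 0:      # cmp esp,edi / sbb edi,ecx
        dst -= 1                     # drop the single trailing space
    buf[dst] = 0                     # final stosb stores the NUL
    return dst

netbook = bytearray(b"Genuine Intel(R) CPU" + b" " * 11 + b"U2300  @ 1.20GHz\x00")
n = squeeze(netbook)
print(n, bytes(netbook[:n]))         # 36 b'Genuine Intel(R) CPU U2300 @ 1.20GHz'
```

Leading spaces are dropped, inner runs shrink to one space, and a trailing space left by the last run is removed before the terminator is written.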

P.S. Long/IA-32e mode is only reachable via protected mode, isn't it? So why don't you take opportunity to execute this code somewhere between RM and LM, with shorter encoding? Another 5 bytes.



TightCoderEx 16 Jul 2013, 06:46
On my Asus P8 Z-77V LK, there are 7 leading spaces, and as this is just a one-time thing, I'm just going to design my screen for 48 bytes.

Self-modifying code is something I used a lot on Z80 CP/M machines. I changed your code slightly to my style and to fortify in my mind what "label" does. I notice most programmers initialize segments to known values, and in 16-bit code DS often equals CS. In my boot loader DS points 5 segments below CS, thus the segment override is needed and only one byte is saved.

Code:
label  Value word at $ + 2

   @@:  mov     eax, 0x80000003
        inc     ax
        cpuid
        push    edx
        push    ecx
        push    ebx
        push    eax
        dec     [cs:Value]
        jnz     @B

        push    ss
        pop     es
        mov     di, sp
        or      cx, -1
        mov     al, ' '
        repz    scasb
        dec     di
        add     cx, 49         
    

I used "word" instead of "byte" just to see if there was any change in code.



TightCoderEx 16 Jul 2013, 15:34
baldr wrote:
Long/IA-32e mode is only reachable via protected mode, isn't it?
I think the only reason one sees so many examples of this is that a lot of code is devoted to initializing devices. Just a guess on my part, but no, 64-bit mode is attainable directly from real mode.

Long Mode Directly
Copyright © 1999-2025, Tomasz Grysztar.