flat assembler
Message board for the users of flat assembler.

1 byte integer arithmetic/load with x87 instruction?

Kuemmel 31 Mar 2015, 15:22
Hi there,

I'm trying to do some intro size coding under DOS in 16-bit mode, and I want to do some floating-point arithmetic using 1-byte data from memory as input constants.

As far as I can see, instructions like FIADD or FIMUL require at least 16-bit data, but organizing all my 1-byte data as 16-bit data would double the memory usage.

Is there any other way to do that, or is there an x87 instruction that can "load" from an x86 register rather than from memory that I am missing?

Otherwise I guess I would have to write a small x86 loop that loads the 1-byte data, stores it to a 16-bit data set, and then accesses that with x87.
AsmGuru62 31 Mar 2015, 17:00
If you make a few macros like this:
...
Code:
mov     al, [int8]
cbw
push    ax
mov     si, sp
fiadd   word [si]
    

And, of course, do not forget to restore the stack.
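
A minimal sketch of the complete sequence as a tiny .COM program, assuming the byte is signed and lives at a label named int8 as above (the value -5 and the surrounding program are made up for illustration). [si] is DS-relative, which works here because DS = SS in a .COM program, and pop ax restores the stack in one byte.

```asm
org 100h
use16

start:  fldz                    ; st0 = 0.0, something to add to
        mov     al,[int8]       ; load the signed byte
        cbw                     ; sign-extend AL into AX
        push    ax              ; temporary 16-bit copy on the stack
        mov     si,sp           ; SP itself cannot be a 16-bit base register
        fiadd   word [si]       ; st0 += -5
        pop     ax              ; restore the stack (1 byte)
        fstp    st0             ; discard the result, leave the FPU stack empty
        ret                     ; .COM exit

int8    db -5                   ; hypothetical sample value
```

For unsigned bytes, such as a table with values above 127, the byte would have to be zero-extended instead (for example xor ah,ah in place of cbw).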
HaHaAnonymous 31 Mar 2015, 21:07
Quote:

but organizing all my 1-byte data as 16-bit data would double the memory usage.

But wouldn't the additional code alone require even more memory? Please, explain if possible...

Thank you!
revolution 01 Apr 2015, 10:10
Make it a function, not a macro, if it is used in many places. Otherwise if it is in some loop and only used in that one place then it makes no difference.
Kuemmel 01 Apr 2015, 15:37
@HaHaAnonymous:

I have something like 80 single bytes of necessary data that I need to process with some x87 code. Storing them as words would cost an additional 80 bytes, whereas a small loop with load, CBW, store etc. takes much less than 80 bytes, and reserving the extra memory for the 16-bit copy is "free". I'll see... when I'm ready with the code I'll post it here.
tthsqe 02 Apr 2015, 05:17
Hey Kuemmel!
Have not seen you here for a long time.
How's it going?
Have you tried any 3D fractals yet?
Kuemmel 02 Apr 2015, 15:34
Hi tthsqe,

Indeed, just not really much time for coding. When I'm coding, it's mostly under RISC OS on ARM dev boards for fun (e.g. on a Pandaboard). But I got kind of curious about 256-byte DOS intros, as there are such nice productions out there on pouet.

For RISC OS/ARM I ported a Mandelbulb GL shader language code to assembler, but it's still nowhere near realtime... nowadays everything is GLSL coding regarding graphics and 3D (like what you see on www.shadertoy.com). But I'm not really a GLSL coder, just collecting ideas from there. These graphics cards nowadays are so powerful... you can't keep up even with multi-core assembler code doing those things on the CPU.
bitRAKE 03 Apr 2015, 02:10
Code:
fiadd word [di] ; load constant times 256
scasb ; next byte
fmul [de_scale] ; 1/256    
...and just ignore the partial fraction. Also, you might want to factor de_scale into the calculation. Size coding is fuzzy coding: to reduce size, an approximate environment is sufficient. :)
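
To spell out the arithmetic: fiadd word [di] reads two adjacent bytes as one little-endian word, so the byte at [di+1] counts 256 times and the byte at [di] becomes the fraction that is ignored after scaling. A small sketch with made-up table values (consts is a hypothetical label, and de_scale is assumed here to be a single-precision 1/256):

```asm
org 100h
use16

start:  mov     di,consts
        fldz                    ; st0 = 0.0
        fiadd   word [di]       ; word = 120 + 98*256 = 25208
        scasb                   ; next byte (DI += 1 with DF = 0)
        fmul    [de_scale]      ; 25208/256 = 98.46875, i.e. 98 plus a fraction
        fstp    st0             ; discard, leave the FPU stack empty
        ret

consts   db 120,98              ; hypothetical sample bytes
de_scale dd 0.00390625          ; 1/256
```

Note that fiadd treats the word as signed, so a high byte of 128 or more yields a negative value.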
randall 03 Apr 2015, 08:45
Kuemmel wrote:
These graphics cards nowadays are so powerful... you can't keep up even with multi-core assembler code doing those things on the CPU.


Intel KNL is on the horizon:

http://www.zdnet.com/article/intels-next-big-thing-knights-landing/

According to the article:

"Knights Landing will deliver 3 teraflops double-precision and 6 teraflops single-precision."
HaHaAnonymous 03 Apr 2015, 16:41
Quote:

"Knights Landing will deliver 3 teraflops double-precision and 6 teraflops single-precision."

You can have a GPU that will achieve the same performance for a tiny fraction of the price. But if you are rich enough, dive in... It is all I can say. D:

I apologize for any inconveniences I may have caused.
randall 03 Apr 2015, 18:20
HaHaAnonymous wrote:
Quote:

"Knights Landing will deliver 3 teraflops double-precision and 6 teraflops single-precision."

You can have a GPU that will achieve the same performance for a tiny fraction of the price. But if you are rich enough, dive in... It is all I can say. D:

I apologize for any inconveniences I may have caused.


A 3-teraflops double-precision GPU (NVIDIA K80) is also very expensive.
Kuemmel 04 Apr 2015, 08:23
Code:
fiadd word [di] ; load constant times 256
scasb ; next byte
fmul [de_scale] ; 1/256    

Nice idea! "scasb"... more obscure opcodes whose usage I still have to learn... to my ARM-stuck brain it looks like the art in x86 is to find the "one" weird opcode that fits ;)

How "cheaply" can I get my data address into "di"? Just like lea di,data_label ?
revolution 04 Apr 2015, 08:27
LEA would use an extra byte over MOV IIRC.

Also what is the advantage of "scasb" over "inc di"?
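
For reference, a sketch of the encodings involved in 16-bit code (data_label is a hypothetical label, and the byte counts apply to these exact operand forms):

```asm
org 100h
use16

start:  mov     di,data_label   ; BF lo hi    = 3 bytes
        lea     di,[data_label] ; 8D 3E lo hi = 4 bytes
        inc     di              ; 47          = 1 byte, DI += 1
        scasb                   ; AE          = 1 byte, compares AL with [ES:DI],
                                ;               sets flags, DI += 1 when DF = 0
        ret

data_label db 1,2,3
```

So mov is the shorter way to load a constant address, and lea only pays off when the address is computed from registers, as in lea si,[di+bx+disp].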
bitRAKE 04 Apr 2015, 23:03
revolution wrote:
Also what is the advantage of "scasb" over "inc di"?
None.

It would be nice to see the inner loop code. We can all have some fun pondering alternatives. ;)

My last size coding was just some silliness.
http://www.pouet.net/prod.php?which=61073

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Kuemmel 14 Apr 2015, 20:43
...finally got some code running. It basically draws 12 cubic Bezier curves in the shape of the good old Acorn logo, each represented as many dots (so it displays like an outline; there's no space for a line algo). Of course it's a bit stupid to also draw horizontal or vertical lines as Beziers... but it's just an exercise to get me started on that weird but fun 16-bit DOS coding. I'm down to 236 bytes :) but no animation yet :(

For sure I missed lots of size optimizations. If you spot some, just tell me; any comment is welcome. I could also provide BASIC code for explanation if you need it.

I'm still thinking about those string instructions and rep loops... can they be of use here, for example in my byte-to-word copy routine? Or are "in" and "out" useful here?
Code:
org 100h
use16

max_bez_lines = 12   ;12 bezier lines
max_bez_dots  = 255  ;represented as 255 dots...no space for a line algo

start:   push 0a000h ;vga
         pop es
         mov al,13h  ;mode 13, 320x200
         int 10h

;copy routine to copy all the bezier data and 2 more variables from
;byte to word data for FPU access
mov di,bez_b
mov cl,(max_bez_lines*8+2)  ;is ch always zero ? seems to be...
L0:
  mov al,byte[di]
  mov word[di+bx+(max_bez_lines*8+2)],ax ;ah also seems to be zero...
  inc di
  inc bx                    ;bx is zero at .com start
loop L0

mov di,count_w                ;could be replaced by add offset, but not shorter
mov bl,((max_bez_lines-1)*16) ;bh is zero already

L1:
  mov word[di],cx         ;cx is always zero here
  mov dx,max_bez_dots

  L2:
    fild  word[di]        ;get counter
    fidiv word[di-2]      ;t = counter/max_segments
    fld1                  ;get 1.0
    fsub st0,st1          ;at=1.0-t | t
    fld  st0              ;at   | at | t
    fmul st0,st0          ;at*at    | at | t
    fmul st0,st1          ;at*at*at | at | t
    fld  st2              ;t    | at*at*at | at | t
    fmul st0,st0          ;t*t      | at*at*at | at | t
    fmul st0,st3          ;t*t*t    | at*at*at | at | t
    fild word[di-4]       ;3        | t*t*t    | at*at*at  | at | t
    fmul st0,st3          ;3*at     ...
    fmul st0,st4          ;3*at*t   ...
    fmul st3,st0          ;3*at*t   | t*t*t    | at*at*at  | 3*at*at*t | t
    fmulp st4,st0         ;t*t*t    | at*at*at | 3*at*at*t | 3*at*t*t
                      
    lea si,[di+bx-196]    ;(max_bez_lines*8+2)*2]

    mov cl,4
    fldz
    fldz               ;init nx | ny
    L3:
      fstp st6         ;store latest nx,ny from iteration same position each time
      fstp st6         ;b0     | b1     | b2 | b3     | nx     | ny
      fld  st0         ;b0     | b0     | b1 | b2     | b3     | nx     | ny
      fimul word[si]   ;b0*sx  | b0     | b1 | b2     | b3     | nx     | ny
      faddp st5,st0    ;b0     | b1     | b2 | b3     | nx_new | ny
      fimul word[si+2] ;b0*sy  | b1     | b2 | b3     | nx_new | ny
      faddp st5,st0    ;b1     | b2     | b3 | nx_new | ny_new
      fld st4
      fld st4          ;nx_new | ny_new | b1 | b2     | b3
      add si,4
    loop L3

    fistp word[di+2]   ;get x
    fistp word[di+4]   ;get y

    imul si,word[di+4],320
    add  si,word[di+2]     ;screen address is x + y * 320

    mov byte[es:si],2      ;store pixel with green colour

    inc byte[di]           ;inc counter for FPU
    dec dx
  jnz L2

sub bx,16
jns L1

L5:
  mov ah,01h      ;wait for keyboard
  int 16h
jz L5
  mov ax,03h
  int 10h
ret               ;exit

bez_b     db 199,98, 120, 98,123,  4,196,  4
          db 199,98, 120, 98,120, 98,199, 98
          db 156,143,118,103,118,126,139,140
          db 201,103,118,103,118,103,201,103
          db 163,143,201,103,201,126,180,140
          db 156,147,156,143,156,143,156,147
          db 163,149,163,143,163,143,163,149
          db 117,137,156,147,146,145,133,141
          db 208,166,163,149,179,155,193,160
          db 198,170,116,140,150,150,174,158
          db 117,137,116,140,116,140,117,137
          db 208,166,198,170,198,170,208,166
three_b   db 3
max_seg_b db max_bez_dots
bez_w     dw 12*8 dup ?
three_w   dw 1 dup ?
max_seg_w dw 1 dup ?
count_w   dw 1 dup ?
coord     dw 2 dup ?    
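
On the string-instruction question, one possible sketch of the byte-to-word copy uses lodsb/stosw. It is shown here as a stand-alone .COM program with a made-up 4-entry table (src_b, dst_w and count are hypothetical names). stosw writes through ES, so in the intro above it would have to run before ES is pointed at 0a000h. The explicit xor, cld and full-width mov cx are the safe form; a size-coded version might drop them on the same start-up assumptions the original makes (AH = 0, CH = 0).

```asm
org 100h
use16

count = 4                       ; hypothetical table length

start:  mov     si,src_b        ; DS:SI -> byte table
        mov     di,dst_w        ; ES:DI -> word table (ES = DS at .COM entry)
        mov     cx,count
        xor     ax,ax           ; AH = 0, so AX is the zero-extended byte
        cld                     ; string instructions count upwards
widen:  lodsb                   ; AL = [DS:SI], SI += 1
        stosw                   ; [ES:DI] = AX, DI += 2
        loop    widen
        ret

src_b   db 199,98,120,98
dst_w   rw count                ; uninitialized, takes no space in the file
```

The loop body is 4 bytes (lodsb, stosw, loop) against 9 for the mov/mov/inc/inc/loop version, at the cost of a second 3-byte pointer load, so the net saving would be about 2 bytes.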
Kuemmel 21 May 2015, 18:21
...released my final version and some extra procedural graphics on pouet =>
http://www.pouet.net/prod.php?which=65630

Copyright © 1999-2025, Tomasz Grysztar.