BASELIB: General purpose libs for beginners

fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 27 Sep 2015, 13:52

JohnFound wrote:

fasmnewbie wrote:
I updated the zip file to reflect a recent addition of...

Wouldn't it be better to adopt a version control system? That way all changes can be followed easily and the history is saved as well.

I would suggest using Fossil scm, because it has the same minimalistic spirit as assembly programming itself.

You can host your repositories for free at Chisel or on your own hosting (with fossil it is very easy).

You should have told me about it much earlier, John. I have my sources scattered all over the place on multiple PCs, at work, at home, etc., and at times I forgot which file and/or routine had just been updated. Sometimes I saved newer routines into an older source file on the PC at work, thinking it was the same file I had updated earlier on my home laptop, and vice versa. It was a total mess!


JohnFound

Joined: 16 Jun 2003
Posts: 3499
Location: Bulgaria

JohnFound 27 Sep 2015, 16:27

fasmnewbie wrote:

You should have told me about it much earlier, John. I have my sources scattered all over the place on multiple PCs, at work, at home, etc., and at times I forgot which file and/or routine had just been updated. Sometimes I saved newer routines into an older source file on the PC at work, thinking it was the same file I had updated earlier on my home laptop, and vice versa. It was a total mess!

Well, actually I have told you several times. ;)

For example here and here.


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 28 Sep 2015, 04:48

prtflt is one of the most naive routines. I intend to keep it that way because it shows a bit-by-bit conversion process: slow, but rich in FPU details. Secondly, for historical reasons: it's the first FP conversion routine I wrote ;D

As an alternative, motivated beginners can modify/complete this code to add the E notation implementation, which is missing:

Code:

;----------------------------
;prnfloat(1)
;Simple print float without E
;----------------------------
;EAX    : FP value
;----------------------------
;Ret    : -
;Note   : No E notation
;       : Deals with normals only
;----------------------------
prnfloat:
        jmp     .start
        .strflt rb 16
.start:
        push    rbp
        mov     rbp,rsp
        sub     rsp,512
        and     rsp,-16
        fxsave  [rsp]
        mov     rbx,8           ;Total digits
        cld
        mov     rdi,.strflt
        test    eax,eax
        jns     .nxt
        mov     byte[rdi],'-'
        add     rdi,1
        btr     eax,31
.nxt:
        test    eax,eax
        jnz     .proceed
        mov     eax,'0.0'
        stosd
        sub     rdi,1
        jmp     .done
.proceed:
        mov     edx,eax
        ;shl     eax,1
        ;shr     eax,24
        ;mov     ecx,127
        ;sub     eax,ecx         ;exponent
        call    sse_round.down
;--------------------------
.get_parts:
        movd    xmm0,edx
        movd    xmm1,edx
        cvtss2si eax,xmm0
        cvtsi2ss xmm0,eax
        subss   xmm1,xmm0
        cvtss2si eax,xmm0       ;integer
        movd    ecx,xmm1        ;fraction
;--------------------------
.tmp1:
        mov     r8,10
        xor     esi,esi
;--------------------------
.int:
        xor     rdx,rdx
        div     r8
        add     dl,30h
        push    rdx
        inc     esi
        test    rax,rax
        jnz     .int
.rep1:
        pop     rax
        stosb
        sub     rbx,1
        jz      .err
        dec     esi
        jnz     .rep1
;--------------------------
.tmp2:
        mov     al,'.'
        stosb
        mov     edx,1.0
        movd    xmm3,edx
        mov     edx,10.0
        movd    xmm0,ecx
        movd    xmm1,edx
        movd    xmm2,ecx
;--------------------------
.frac:
        mulss   xmm0,xmm1
        mulss   xmm2,xmm1
        cvtss2si eax,xmm2
        add     al,30h
        stosb
        sub     ebx,1
        jz      .done
        sub     al,30h
        cvtsi2ss xmm2,eax
        subss   xmm0,xmm2
        movq    xmm2,xmm0
        comiss  xmm0,xmm3       ;operands are singles, so compare as singles
        jc      .frac
        jmp     .done
;--------------------------
.err:   ;whatever
        mov     al,'#'
        stosb
;--------------------------
.done:
        xor     al,al
        stosb
        mov     rax,.strflt
        call    prtstrz
        fxrstor [rsp]
        mov     rsp,rbp
        pop     rbp
        ret
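
The commented-out lines in .proceed hint at how the exponent is pulled out of the raw bits. A minimal sketch of splitting a single into its three fields follows; `split_single` is a hypothetical name, not a BASELIB routine, and the unbiased exponent is only meaningful for normal values.

```asm
; In:  EAX = raw bits of a single-precision value
; Out: EDX = sign (0 or 1)
;      ECX = unbiased exponent (meaningful for normals only)
;      EAX = 23-bit fraction field
split_single:                           ; hypothetical helper name
        mov     edx,eax
        shr     edx,31                  ; bit 31 is the sign
        mov     ecx,eax
        shl     ecx,1                   ; drop the sign bit
        shr     ecx,24                  ; biased exponent, 0..255
        sub     ecx,127                 ; remove the bias
        and     eax,007FFFFFh           ; low 23 bits are the fraction
        ret
```

For 1.0 (3F800000h) this gives sign 0, exponent 0 and fraction 0.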


revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 28 Sep 2015, 04:59

Just a comment about float printing functions. If you use the FPU/SSE to do conversion and scaling etc. then you can lose accuracy. This may or may not be important for a particular application, but if you have a requirement for accurate production of decimal values of binary FP numbers then a multi-precision integer code implementation can provide full accuracy.

Also be mindful of non-number values like NaN and infinity.
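
NaN and infinity share one property that is cheap to test on the raw bits: the 8-bit exponent field is all ones. A minimal sketch of such a check, assuming the value arrives in EAX as in prnfloat; `is_special` is a hypothetical name, not part of the library.

```asm
; In:  EAX = raw bits of a single-precision value
; Out: CF = 1 if the value is NaN or infinity, CF = 0 otherwise
;      EAX is preserved
is_special:                             ; hypothetical helper name
        push    rax
        and     eax,7F800000h           ; isolate the 8 exponent bits
        cmp     eax,7F800000h           ; all ones means NaN or infinity
        pop     rax                     ; POP leaves the flags untouched
        je      .special
        clc
        ret
.special:
        stc
        ret
```

A caller could follow `call is_special` with `jc .err` before any conversion work starts.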


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 28 Sep 2015, 05:14

For the sake of completeness, I changed a few lines, but the E notation engine is still missing.

Code:

;----------------------------
;prnfloat(1)
;Simple print float without E
;----------------------------
;EAX    : FP value
;----------------------------
;Ret    : -
;Note   : No E notation
;       : Deals with normals only
;----------------------------
prnfloat:
        jmp     .start
        align   16              ;for EXE only
        .save   rb 512          ;don't turn to object
        .strflt rb 16
.start:
        push    rax rbx rcx rdx
        push    rsi rdi r8
        mov     rdi,.save
        fxsave  [rdi]
        mov     rbx,8           ;Total digits
        cld
        mov     rdi,.strflt
        test    eax,eax
        jns     .nxt
        mov     byte[rdi],'-'
        add     rdi,1
        btr     eax,31
.nxt:
        test    eax,eax
        jnz     .proceed
        mov     eax,'0.0'
        stosd
        sub     rdi,1
        jmp     .done
.proceed:
        mov     edx,eax
        ;shl     eax,1
        ;shr     eax,24
        ;mov     ecx,127
        ;sub     eax,ecx         ;exponent
        call    sse_round.down
;--------------------------
.get_parts:
        movd    xmm0,edx
        movd    xmm1,edx
        cvtss2si eax,xmm0
        cvtsi2ss xmm0,eax
        subss   xmm1,xmm0
        cvtss2si eax,xmm0       ;integer
        movd    ecx,xmm1        ;fraction
;--------------------------
.tmp1:
        mov     r8,10
        xor     esi,esi
;--------------------------
.int:
        xor     rdx,rdx
        div     r8
        add     dl,30h
        push    rdx
        inc     esi
        test    rax,rax
        jnz     .int
.rep1:
        pop     rax
        stosb
        sub     rbx,1
        jz      .err
        dec     esi
        jnz     .rep1
;--------------------------
.tmp2:
        mov     al,'.'
        stosb
        mov     edx,1.0
        movd    xmm3,edx
        mov     edx,10.0
        movd    xmm0,ecx
        movd    xmm1,edx
        movd    xmm2,ecx
;--------------------------
.frac:
        mulss   xmm0,xmm1
        mulss   xmm2,xmm1
        cvtss2si eax,xmm2
        add     al,30h
        stosb
        sub     ebx,1
        jz      .done
        sub     al,30h
        cvtsi2ss xmm2,eax
        subss   xmm0,xmm2
        movq    xmm2,xmm0
        comiss  xmm0,xmm3       ;operands are singles, so compare as singles
        jc      .frac
        jmp     .done
;--------------------------
.err:   ;whatever comes through here
        mov     al,'#'
        stosb
;--------------------------
.done:
        xor     al,al
        stosb
        mov     rax,.strflt
        call    prtstrz
        mov     rdi,.save
        fxrstor [rdi]
        pop     r8 rdi rsi
        pop     rdx rcx rbx rax
        ret

NOTE: prnfloat is not part of the zipped library. It is just for testing purposes, but you can include it if you want.

Revo,

How does scaling result in precision loss? My old prtflt does use scaling, but the final result still complies with IEEE Real4. I did it bit by bit. Can I see your code?

Last edited by fasmnewbie on 28 Sep 2015, 06:08; edited 1 time in total


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 28 Sep 2015, 05:27

A note on all FP routines (thanks to revo): all special values (NaN, infinity, DBZ) are handled with a '#' error message for the sake of simplicity. This library isn't recommended for production anyway (as stated in the documentation).


revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 28 Sep 2015, 05:48

You appear to be using single precision SSE instructions to print single precision numbers in decimal. Accuracy loss will happen for many numbers. For the most part it will only affect the 7th and higher digits but it can also affect the 6th digit for some values. If accuracy is important to an application then knowing the limitations is necessary. Otherwise you could limit the output to 5 digits only, and with correct rounding of the last digit, you could state that it is accurate for all input values within the range of numbers it is designed to handle.

If you needed accurate output for double precision numbers then you would, at a minimum, be required to carefully use extended precision from the FPU. And if you need to print accurate extended precision numbers then you can't use the hardware floating point instructions at all.
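
One way to reduce (not eliminate) the loss described here for single-precision input is to do the digit scaling in double precision, since widening a single to a double is exact and each multiply by 10 then rounds at 53 bits instead of 24. The routine below is only a sketch under that assumption; `frac_digits_dbl` and its register interface are hypothetical and it is not the multi-precision integer approach mentioned earlier.

```asm
; In:  XMM0 = single-precision value with 0 <= value < 1
;      RDI  = output buffer, ECX = number of digits to emit (> 0)
;      DF must be clear (CLD)
; Out: RDI advanced past the digits written
;      XMM0, XMM2 clobbered
frac_digits_dbl:                        ; hypothetical helper name
        push    rax
        push    rcx
        cvtss2sd xmm0,xmm0              ; widening to double is exact
.next:
        mulsd   xmm0,qword [.ten]       ; bring the next digit above the point
        cvttsd2si eax,xmm0              ; truncate: EAX = digit, 0..9
        cvtsi2sd xmm2,eax
        subsd   xmm0,xmm2               ; keep the remaining fraction
        add     al,'0'
        stosb
        dec     ecx
        jnz     .next
        pop     rcx
        pop     rax
        ret
.ten    dq 10.0                         ; read-only constant, double precision
```

Truncating conversion (cvttsd2si) is used so the routine does not depend on the current MXCSR rounding mode.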


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 28 Sep 2015, 06:05

revolution wrote:

You appear to be using single precision SSE instructions to print single precision numbers in decimal. Accuracy loss will happen for many numbers. For the most part it will only affect the 7th and higher digits but it can also affect the 6th digit for some values. If accuracy is important to an application then knowing the limitations is necessary. Otherwise you could limit the output to 5 digits only, and with correct rounding of the last digit, you could state that it is accurate for all input values within the range of numbers it is designed to handle.

If you needed accurate output for double precision numbers then you would, at a minimum, be required to carefully use extended precision from the FPU. And if you need to print accurate extended precision numbers then you can't use the hardware floating point instructions at all.

Oh, I get what you mean. I did try using extended precision, but the result was less accurate than the double version. I don't know why. I had it in the zip file as prtdblx (80-bit version), but I had to pull it because it offered no accuracy advantage over its double sibling. Maybe I need to re-check it.


revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 28 Sep 2015, 06:10

You might need to enable the extended precision mode in the control word for the FPU. IIRC the default is to use only 64-bit precision.
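
The precision-control field lives in bits 8 and 9 of the x87 control word: 10b selects a 53-bit significand (double) and 11b selects the 64-bit significand of extended precision. A minimal sketch of switching it to extended follows; `set_fpu_extended` is a hypothetical name used only for illustration.

```asm
; Set the x87 precision-control field (control word bits 8-9) to 11b.
; All other control word bits are left as they were.
set_fpu_extended:                       ; hypothetical helper name
        sub     rsp,8                   ; scratch slot for the control word
        fstcw   word [rsp]              ; read the current control word
        or      word [rsp],0300h        ; PC = 11b, 64-bit significand
        fldcw   word [rsp]              ; write it back
        add     rsp,8
        ret
```

This affects x87 instructions only; SSE scalar instructions are unaffected by the control word.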


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 28 Sep 2015, 06:23

revolution wrote:

You might need to enable the extended precision mode in the control word for the FPU. IIRC the default is to use only 64-bit precision.

Oh-eM-G!! I completely missed that one. I'll re-check it tonight. Will tell you the result. I need the summoning ritual to call prtdblx back from the grave.

Thanks.


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 28 Sep 2015, 21:09

This is the completed version of prnfloat, which uses SSE instead of the FPU. You can manually include it in the 64-bit source as an alternative to the current FPU-based prtflt routine. It now has the E-notation engine.

Code:

prnfloat:
        jmp     .start
        .val    dd 0.0
        .strflt rb 16
.start:
        push    rbp
        mov     rbp,rsp
        sub     rsp,512
        and     rsp,-16
        fxsave  [rsp]
        push    rax
        push    rbx
        push    rcx
        push    rdx
        push    rsi
        push    rdi
        push    r8
        push    r9
        push    r10
        xor     r10,r10
        xor     r9,r9
        mov     rbx,7           ;8?
        cld
        mov     rdi,.strflt
        mov     edx,eax
        mov     [.val],eax
        xor     eax,eax
        fld     dword[.val]
        fxam
        fstsw   ax
        and     eax,4500h       ;Mask C3, C2 and C0
        cmp     eax,500h        ;Infinity (C2=1, C0=1)
        je      .err
        cmp     eax,100h        ;NaN (C0=1)
        je      .err
        cmp     eax,0           ;Unsupported
        je      .err
        mov     eax,edx
        test    eax,eax
        jns     .nxt
        mov     byte[rdi],'-'
        add     rdi,1
        btr     eax,31
.nxt:
        test    eax,eax
        jnz     .proceed
        mov     eax,'0.0'
        stosd
        sub     rdi,1
        jmp     .done
.proceed:
        mov     edx,eax
        shl     eax,1
        shr     eax,24
        mov     ecx,127
        sub     eax,ecx         ;exponent
        test    eax,eax
        js      .small
        cmp     eax,23
        jae     .big
;--------------------------
.get_parts:
        call    sse_round.down
        movd    xmm0,edx
        movd    xmm1,edx
        cvtss2si eax,xmm0
        cvtsi2ss xmm0,eax
        subss   xmm1,xmm0
        cvtss2si eax,xmm0       ;integer
        movd    ecx,xmm1        ;fraction
        jmp     .tmp1
;---------------------------
.small:
        mov     ecx,edx
        movd    xmm0,edx
        mov     edx,10.0
        movd    xmm1,edx
        mov     edx,1.0
        movd    xmm2,edx
.again:
        mulss   xmm0,xmm1
        inc     r9
        comiss  xmm0,xmm2       ;operands are singles, so compare as singles
        jc      .again
        cmp     r9,7
        ja      .nxt3
        mov     edx,ecx
        jmp     .get_parts
.nxt3:
        mov     r10,1
        movd    edx,xmm0
        jmp     .get_parts
;--------------------------
.big:
        movd    xmm0,edx
        mov     edx,10.0
        movd    xmm1,edx
.again1:
        divss   xmm0,xmm1
        inc     r9
        comiss  xmm0,xmm1       ;operands are singles, so compare as singles
        jnc     .again1
        movd    edx,xmm0
        mov     r10,2
        jmp     .get_parts
;--------------------------
.tmp1:
        mov     r8,10
        xor     esi,esi
;--------------------------
.int:
        xor     rdx,rdx
        div     r8
        add     dl,30h
        push    rdx
        inc     esi
        test    rax,rax
        jnz     .int
.rep1:
        pop     rax
        stosb
        sub     rbx,1
        jz      .err
        dec     esi
        jnz     .rep1
;--------------------------
.tmp2:
        test    r10,r10
        js      .done
        mov     al,'.'
        stosb
        mov     edx,1.0
        movd    xmm3,edx
        mov     edx,10.0
        movd    xmm0,ecx
        movd    xmm1,edx
        movd    xmm2,ecx
;--------------------------
.frac:
        mulss   xmm0,xmm1
        mulss   xmm2,xmm1
        cvtss2si eax,xmm2
        add     al,30h
        stosb
        sub     ebx,1
        jz      .predone
        sub     al,30h
        cvtsi2ss xmm2,eax
        subss   xmm0,xmm2
        movq    xmm2,xmm0
        comiss  xmm0,xmm3       ;operands are singles, so compare as singles
        jc      .frac
;--------------------------
.predone:
        cmp     r10,0
        jle     .done
        mov     ax,'e-'
        cmp     r10,1
        je      .nxt4
        cmp     r10,2
        jne     .done
        mov     ax,'e+'
.nxt4:
        stosw
        mov     eax,r9d
        mov     r10,-1
        jmp     .tmp1
;--------------------------
.err:
        mov     al,'#'
        stosb
;--------------------------
.done:
        xor     al,al
        stosb
        mov     rax,.strflt
        call    prtstrz
        pop     r10
        pop     r9
        pop     r8
        pop     rdi
        pop     rsi
        pop     rdx
        pop     rcx
        pop     rbx
        pop     rax
        fxrstor [rsp]
        mov     rsp,rbp
        pop     rbp
        ret


HaHaAnonymous

Joined: 02 Dec 2012
Posts: 1178
Location: Unknown

HaHaAnonymous 29 Sep 2015, 02:04

Quote:

I would request that you don't remove or destroy your old posts. It makes the board less useful and hard to follow. Some other people have deleted/destroyed old posts, but the reason was not to help improve the site.

Just want to apologize about that. I know I was stupid.


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 04 Oct 2015, 21:41

Since I had more spare time last week, I made lots of changes and modifications to the current library.

Added:
--------
1) pow2 - calculate 2^n (powint does the same but I made this anyway)
2) stackviewf - The ability to view the stack by flat bytes
3) sincos - self-explanatory
4) fpu_precision - set fpu_precision control
5) prtstreg - display short string off RAX. Handy for short string.
6) str_trim - trim a 0-ended string

Improved/Edited/Corrected:
---------
1) pow - now can calculate x^n for precision values
2) powint - introduced a skip to base 2 to improve speed. Corrected for signed bases
3) dumpreg/d/u - now RBP shows surface / caller's RBP instead.
4) stackview - now with options to select multiple formats
5) prtintd - now with signed/unsigned options
6) prtintw - same as above
7) prtintb - same as above
8) prtxmm - code shortened due to the deletion of prtintdu, prtintwu, prtintbu. Changed the selection sequence too.
9) getflt/getdbl - corrected the buffer size to make it more permissive for keyboard kittens entering very large or small values like 0.00000000000000000000045876
10) log10 - corrected a stack-balancing typo (add rax,8 instead of add rsp,8)

Deleted:
---------
1) stackviewd - this option is available in stackview
2) stackviewdu - same as above
3) prtintdu - this option now available in prtintd
4) prtintwu - this option now available in prtintw
5) prtintbu - this option now available in prtintb

Update October 6th, 2015

Last edited by fasmnewbie on 05 Oct 2015, 21:05; edited 1 time in total


Picnic

Joined: 05 May 2007
Posts: 1435
Location: Piraeus, Greece

Picnic 05 Oct 2015, 06:29

fasmnewbie wrote:

I think this library is complete now and has almost every routine required by beginners to start learning assembly programming quite comfortably.

It's hard not to find something useful. I will test the library extensively this week and let you know about my impressions.

Thanks for your effort fasmnewbie.


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 05 Oct 2015, 21:10

Picnic wrote:

It's hard not to find something useful. I will test the library extensively this week and let you know about my impressions.

Thanks for your effort fasmnewbie.

I don't know why I get this guilty feeling every time a senior wants to check out my code. It feels like I'm 10 years old, in the principal's office, waiting to be punished for not paying attention in class xD


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 05 Oct 2015, 21:26

I still have no idea how to develop a half-precision routine, since there's no verification tool available to test the accuracy/correctness of the result. If anybody happens to know of one, please let me know.


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 07 Oct 2015, 17:32

Did a final touch-up here and there. Tested bug-free, but not fully tested for logical errors.
Not perfect, but it should be enough to get beginners through those difficult times.

Happy coding.


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 12 Oct 2015, 16:45

I added 3 more routines (str_wordcnt, dumpseg, ascii). They are not too important, but they give me a sense of completion for this basic library. With these latest additions, a beginner who chooses this utility to accompany his/her learning of basic assembly programming with FASM is now armed to the teeth to face most learning challenges at the basic level: flat style, no C, no linker, no debugger required.

Enjoy it and happy coding.


idle

Joined: 06 Jan 2011
Posts: 440
Location: Ukraine

idle 13 Oct 2015, 16:21

Hi fasmnewbie!
Just wanted to warn you about the rakes I've stepped on along the way:
- are the routines limited to a specific SSE version?
- one of your routines may end up included in a write-protected section; avoid ...

Code:

...
section '' code readable
...
prnfloat:
        jmp     .start
        .strflt rb 16 
...
;... avoid writes to .strflt etc, use stack or user-passed buffer

- debuggers should be your friends :)

Great job!
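
The stack-buffer suggestion can be sketched as follows. This is only a skeleton under stated assumptions: `prnfloat_stackbuf` is a hypothetical name, the conversion itself is replaced by a placeholder character, and prtstrz is assumed to take the string address in RAX, as it does in the prnfloat listings in this thread.

```asm
; The 16-byte scratch string lives on the stack, so the routine never
; writes into its own (possibly read-only) code section.
; RAX is clobbered, as a placeholder for the real routine's interface.
prnfloat_stackbuf:                      ; hypothetical name
        push    rbp
        mov     rbp,rsp
        sub     rsp,16                  ; local buffer replaces ".strflt rb 16"
        push    rdi
        lea     rdi,[rbp-16]            ; RDI -> start of the local buffer
        cld
        mov     al,'#'                  ; placeholder for the real conversion
        stosb
        xor     al,al                   ; zero terminator
        stosb
        lea     rax,[rbp-16]
        call    prtstrz                 ; assumed: string address in RAX
        pop     rdi
        mov     rsp,rbp                 ; releases the local buffer
        pop     rbp
        ret
```

A user-passed buffer would work the same way, with the caller supplying the address instead of the `lea rdi,[rbp-16]`.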


fasmnewbie

Joined: 01 Mar 2011
Posts: 555

fasmnewbie 14 Oct 2015, 11:13

Hi idle. Thanks for stopping by. I agree with your points except for this one:

idle wrote:

- debuggers should be your friends

At the beginner level, IMHO, a debugger is counter-productive. This library is specifically tailored to reproduce debugger output on the fly. It saves time, you don't have to stop coding just to use the debugger, and a user can choose what information he wants at any point of execution.

Let's say a beginner wants to learn about the EXTRACTPS instruction.

Code:

format PE64 console
include 'win64axp.inc'
entry Begin

section '.data' data readable writeable
x dd 5.0,8.0,6.5,-0.6

section '.code' code readable executable
Begin:
        movdqa xmm0,dqword[x]
        extractps eax,xmm0,1       ;learn this
        movd xmm1,eax              ;copy the result to xmm1

        mov eax,8                    ;shows xmm dump with singles output
        call dumpxmm               ;use this routine instead of calling debugger

        call exitp

The output:

Code:

XMM0 : -0.6|6.5|8.0|5.0
XMM1 : 0.0|0.0|0.0|8.0             ;visually proven
XMM2 : 0.0|0.0|0.0|0.0
XMM3 : 0.0|0.0|0.0|0.0
XMM4 : 0.0|0.0|0.0|0.0
XMM5 : 0.0|0.0|0.0|0.0
XMM6 : 0.0|0.0|0.0|0.0
XMM7 : 0.0|0.0|0.0|0.0
XMM8 : 0.0|0.0|0.0|0.0
XMM9 : 0.0|0.0|0.0|0.0
XMM10: 0.0|0.0|0.0|0.0
XMM11: 0.0|0.0|0.0|0.0
XMM12: 0.0|0.0|0.0|0.0
XMM13: 0.0|0.0|0.0|0.0
XMM14: 0.0|0.0|0.0|0.0
XMM15: 0.0|0.0|0.0|0.0

Once satisfied, a user can delete the call to dumpxmm or try another imm8 for EXTRACTPS. This way one learns efficiently, avoids bugs line by line, and saves time. Most importantly, it can be done while coding, interactively and visually.

These routines are specifically designed to mimic debugger output:

dumpreg/d/u
prtreg/d/u
flags
fpu_stack
fpu_sflag
fpu_cflag
prtxmm
dumpxmm
sse_flags
fpu_reg
stackview
stackviewf

and probably more...

But anyway, it's just a matter of personal convenience.
