flat assembler
Message board for the users of flat assembler.
uart777 18 May 2013, 17:32
Minimal version of Z77 for ARM. Includes portable graphics (draw pixel, line, rectangle, scanline, gradient), text operations (copy, compare, etc.) and more.

Assemble with FASMARM (it outputs a .GBA file). Runs in popular GBA emulators: VisualBoyAdvance, No$GBA and GBA Emu for Android.

Download: http://sungod777.zxq.net/z77gba.zip

Thanks to Tomasz for FASM and to revolution for the ARM version. FASMARM is nice: the source is clear and professionally written, and it includes documentation and examples. As a programming environment, I think FASM+ARM+Z77 together is way better than the GNU/GCC ASM/C/C++ package. And I will never use Microsoft's junk bloated compilers, devkitARM (600MB+ download!), ARM's DS-5 package (bloatware) or Eclipse (slow, illogical, cluttered IDE, time-consuming setup, downloads take forever). You guys think drawing a 32BPP gradient in X86/Windoze is hard?

GBA programming references:
* GBAtek: http://nocash.emubase.de/gbatek.htm
* CowBite: http://www.cs.rit.edu/~tjh8300/CowBite/CowBiteSpec.htm

Source Code Preview:
Code:
; $$$$$$$$$$$$$$$$$$ Z77 4 ARM $$$$$$$$$$$$$$$$$$$
; *************** STAR^2 SOFTWARE ****************
; -------------------- Z.INC ---------------------
; ___ __
; / _/_ __/ /___ _________
; / _/ // / __/ // / __/ -_)
; /_/ \_,_/\__/\_,_/_/ \__/_ __
; ___ ____ ___ ___ __ _ / / / /__ ____
; / _ `(_-<(_-</ -_) ' \/ _ \/ / -_) __/
; \_,_/___/___/\__/_/_/_/_.__/_/\__/_/

format binary as 'GBA'
use32

macro use [f] {
 forward include 'use\'#`f#'.inc'
}

use cpu, language, math, system, memory, text, draw

;;;;;;;;;;;;;; EVOLUTION BEGINS NOW ;;;;;;;;;;;;;;

; fast unsigned division by 10. r0=r0/10

macro div10 {
  movri r1, 1999999Ah     ; r1=((2^32)/10)+1
  sub r0, r0, r0, lsr 30  ; r0=r0-(r0>>>30)
  umull r2, r0, r1, r0    ; r0=(r1*r0)>>32 (high word of the product)
}

; with remainder in r1

macro divr10 {
  mov r3, r0              ; dividend
  div10                   ; r0=quotient (uses r1, r2)
  mov r1, r0, lsl 1       ; multiply by 10:
  add r1, r1, lsl 2       ; r1=(r0<<1)+(r1<<2)
  sub r1, r3, r1          ; r1=r3-r1 (remainder)
}

; faster than div10, but only for a limited input range
; (the 32-bit product overflows for large r0)

macro __div10 {
  sub r0, r0, r0, lsr 14  ; r0=r0-(r0>>>14)
  add r1, r0, r0, lsl 1   ; r1=r0+(r0<<1)
  add r0, r0, r1, lsl 2   ; r0=r0+(r1<<2)
  add r0, r0, r1, lsl 6   ; r0=r0+(r1<<6)
  add r0, r0, r1, lsl 10  ; r0=r0+(r1<<10)
  mov r0, r0, lsr 15      ; r0=(r0>>>15)
}

; divide by 255 (256-1)

macro div255 {
  mov r1, r0, lsr 8       ; n=((n>>
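For checking the division macros on a PC, here is a small C model of what div10, divr10 and __div10 compute. It is a sketch, not GBA code: the C function names are hypothetical mirrors of the macros, and the 64-bit multiply stands in for umull, whose high word (r0) holds the quotient. Subtracting n>>30 first appears to be what keeps the reciprocal 0x1999999A exact for large inputs; the test samples the full 32-bit range with a stride rather than checking every value, and the shift-and-add version is only checked for n < 2^14.

```c
#include <assert.h>
#include <stdint.h>

/* Model of div10: quotient = high word of (n - (n >> 30)) * 0x1999999A. */
static uint32_t div10(uint32_t n)
{
    uint32_t adj = n - (n >> 30);              /* sub r0, r0, r0, lsr 30 */
    uint64_t p = (uint64_t)adj * 0x1999999Au;  /* umull r2, r0, r1, r0   */
    return (uint32_t)(p >> 32);                /* keep the high word     */
}

/* Model of divr10: quotient via div10, remainder = n - quotient*10. */
static uint32_t divr10(uint32_t n, uint32_t *rem)
{
    uint32_t q = div10(n);
    uint32_t q10 = (q << 1) + (q << 3);  /* 2q + 8q, like the mov/add pair */
    *rem = n - q10;
    return q;
}

/* Model of __div10 (shift-and-add, 32-bit wraparound like the CPU).
   Only valid for a limited input range; checked here for n < 2^14. */
static uint32_t div10_shift_add(uint32_t n)
{
    uint32_t x = n - (n >> 14);
    uint32_t t = x + (x << 1);  /* 3x    */
    x = x + (t << 2);           /* 13x   */
    x = x + (t << 6);           /* 205x  */
    x = x + (t << 10);          /* 3277x */
    return x >> 15;
}
```

For example, divr10(12345, &r) returns 1234 with r set to 5.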
HaHaAnonymous 18 May 2013, 17:42
[ Post removed by author. ]
Last edited by HaHaAnonymous on 28 Feb 2015, 20:30; edited 1 time in total
edfed 18 May 2013, 18:24
Nice. Can this work on an HTC Explorer, and how would I run it there?
TmX 19 May 2013, 02:36
Nice. Hopefully the next release of Z77 can run on Android natively.
revolution 19 May 2013, 02:50
uart777: I would suggest you use the processor and coprocessor directives to make sure that fasmarm doesn't generate opcodes not supported by the CPU.
uart777 19 May 2013, 09:33
revolution: Which directive(s) are needed for the GBA? It has an ARM7TDMI CPU. What about the Raspberry Pi? It has an ARM1176JZF-S.
MHajduk: ARM assembly is harder than Intel, but in many ways ARM is a more powerful CPU, and it can do more in one instruction.
edfed: You can try running it in any GBA emulator. I'm thinking about ordering an HTC Tilt 2 for Windows Mobile programming if I can find one on eBay for <=$20.
revolution 19 May 2013, 10:05
uart777 wrote: revolution: Which directive/s are needed for GBA? It has a ARM7TDMI CPU.
ReadMe.txt wrote: For ARM7TDMI CPUs:
uart777 wrote: What about Raspberry PI? ARM1176JZF-S.
uart777 19 May 2013, 11:34
Thanks, revolution. Sorry, I overlooked that. I did read your documentation, but I was specifically looking for how to call Windows Mobile/CE functions from coredll.
Let me get this straight... To call a Windows Mobile/CE function, you pass the first 4 parameters in r0-r3 (a1-a4) and the rest on the stack, right? In what order? For example, how exactly would I call CreateFileW from coredll? Is there an example?
The GBA does not support the movw/movt method (ARMv6T2 and later) of constructing immediate values, so I wrote my own "movri" (in CPU.INC), and it does not produce the pointless "orr 0" instructions seen in typical examples.
Last edited by uart777 on 10 Aug 2013, 09:22; edited 2 times in total
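The source of movri isn't shown in this thread, so the following C sketch only illustrates the general idea under a simple assumption: split the constant into its four byte lanes, each of which is a valid ARM data-processing immediate (an 8-bit value rotated by an even amount), emit mov for the first non-zero lane and orr for the rest, and skip zero lanes so no "orr rX, rX, 0" appears. split_imm32 and print_movri are hypothetical names, not movri's actual code. A smarter splitter could sometimes use fewer instructions by picking rotations that straddle byte boundaries or by using mvn.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Split a 32-bit constant into byte-lane chunks. Each chunk is a valid ARM
   immediate (8 bits rotated by an even amount). Zero lanes are skipped.
   Returns the number of chunks written to chunks[] (1 to 4). */
static int split_imm32(uint32_t value, uint32_t chunks[4])
{
    int count = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t part = value & (0xFFu << shift);
        if (part != 0)
            chunks[count++] = part;
    }
    if (count == 0)  /* value == 0: a single "mov rd, 0" */
        chunks[count++] = 0;
    return count;
}

/* Print the mov/orr sequence that builds value in register rd. */
static void print_movri(int rd, uint32_t value)
{
    uint32_t chunks[4];
    int n = split_imm32(value, chunks);
    printf("mov r%d, 0x%X\n", rd, (unsigned)chunks[0]);
    for (int i = 1; i < n; i++)
        printf("orr r%d, r%d, 0x%X\n", rd, rd, (unsigned)chunks[i]);
}
```

For example, print_movri(1, 0x1999999A) prints mov r1, 0x9A followed by orr r1, r1, 0x9900, orr r1, r1, 0x990000 and orr r1, r1, 0x19000000.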
revolution 19 May 2013, 11:44
For WinCE it follows the APS calling convention. This is explained in the file PROCAPS.INC.
PROCAPS.INC wrote: ;High level procedure macros for APS (ARM Procedure Standard) calling
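To make the "in what order?" question concrete, here is a small C model of where word-sized arguments land under the ARM procedure call standard: the first four in r0-r3 (a1-a4), the rest on the stack in ascending order, with the fifth argument at [sp] when the call is made. It assumes every argument is a single 32-bit word (64-bit arguments may have extra alignment rules that are not modeled here); ArgSlot and aps_arg_slot are hypothetical names.

```c
#include <assert.h>

/* Location of one 32-bit argument at the moment of the call. */
typedef struct {
    int in_register;  /* 1: register r<index>, 0: stack slot */
    int index;        /* register number, or byte offset from sp */
} ArgSlot;

/* Argument i (0-based): i < 4 goes in r0-r3, the rest at [sp + 4*(i-4)]. */
static ArgSlot aps_arg_slot(int i)
{
    ArgSlot s;
    assert(i >= 0);
    if (i < 4) {
        s.in_register = 1;
        s.index = i;
    } else {
        s.in_register = 0;
        s.index = (i - 4) * 4;
    }
    return s;
}
```

For CreateFileW's seven parameters, that puts lpFileName, dwDesiredAccess, dwShareMode and lpSecurityAttributes in r0-r3, and dwCreationDisposition, dwFlagsAndAttributes and hTemplateFile at [sp], [sp+4] and [sp+8]. Since stmfd stores the lowest-numbered register at the lowest address, loading the last three into r4-r6 and doing stmfd sp!, {r4-r6} before the call gives that layout; the caller then removes them with add sp, sp, 12 after the call returns.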
uart777 19 May 2013, 12:12
Thanks again. That's what I thought. I have followed the ARM APS standard in my code.
Code:
; standard register names. lowest-level macros
; - CPU+LANGUAGE - shall use the names r0-r12
; while high-level functions - in the library -
; use a1-a4/v1-v8 to make a clear distinction
; between parameters and "scratch registers"

Code:
; create BGR 15BPP (1.5.5.5), 0-31 each...

function rgb, r, g, b
  and a1, a1, 11111b   ; r=(r&1Fh)
  and v2, a2, 11111b   ; g=(g&1Fh)
  and v3, a3, 11111b   ; b=(b&1Fh)
  orr a1, v2, lsl 5    ; c|=(g<<5)
  orr a1, v3, lsl 10   ; c|=(b<<10)
endf

; alpha combination. a1/a2 = a/b. a3/n=0-31

function mix, a, b, n
  mov a3, a3, lsl 3    ; scale n to 0-248 (n*8)
  mov v1, a1, lsr 10   ; db=(c1>>10)&11111b
  mov v2, a2, lsr 10   ; sb=(c2>>10)&11111b
  and v1, v1, 1Fh
  and v2, v2, 1Fh
  sub v2, v2, v1       ; (sb-db)
  mul v2, a3, v2       ; (sb-db)*n (ARMv4: Rd must not equal Rm)
  mov v2, v2, asr 8    ; ((sb-db)*n)>>8, signed shift
  add v3, v2, v1       ; +db
  mov v1, a1, lsr 5    ; dg=(c1>>5)&11111b
  mov v2, a2, lsr 5    ; sg=(c2>>5)&11111b
  and v1, v1, 1Fh
  and v2, v2, 1Fh
  sub v2, v2, v1       ; (sg-dg)
  mul v2, a3, v2       ; (sg-dg)*n
  mov v2, v2, asr 8    ; ((sg-dg)*n)>>8, signed shift
  add v4, v2, v1       ; +dg
  and v1, a1, 1Fh      ; dr=c1&11111b
  and v2, a2, 1Fh      ; sr=c2&11111b
  sub v2, v2, v1       ; (sr-dr)
  mul v2, a3, v2       ; (sr-dr)*n
  mov v2, v2, asr 8    ; ((sr-dr)*n)>>8, signed shift
  add a1, v2, v1       ; +dr
  orr a1, v4, lsl 5    ; c=r|(g<<5)|(b<<10)
  orr a1, v3, lsl 10
endf

; shift, mask, scale, subtract then divide.
; return: a1=delta
; ((((b>>s)&m)<<8)-(((a>>s)&m)<<8))/n

function delta8, a, b, s, n
  alias a=a1, b=a2, s=a3, n=a4, m=v1
  mov m, 11111b
  mov b, b, lsr s      ; ((b>>s)&m)<<8
  and b, b, m
  lsl b, b, 8
  mov a, a, lsr s      ; ((a>>s)&m)<<8
  and a, a, m
  lsl a, a, 8
  sub a, b, a          ; (b-a)/n
  mov b, n
  idiv
endf
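A C reference model of rgb and mix can help when checking the assembly on a PC. This sketch assumes mix is meant to compute d + (((s - d) * n*8) >> 8) per 5-bit channel, as its comments describe, with d taken from the first color and s from the second. Because s - d can be negative, the model uses floor division by 256, which is what an arithmetic shift (asr) gives; a logical shift (lsr) would leave garbage in the upper bits of the result. rgb15, mix_channel and mix15 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 5-bit channels into GBA-style BGR555:
   red in bits 0-4, green in 5-9, blue in 10-14, bit 15 unused. */
static uint16_t rgb15(uint32_t r, uint32_t g, uint32_t b)
{
    return (uint16_t)((r & 0x1F) | ((g & 0x1F) << 5) | ((b & 0x1F) << 10));
}

/* Blend one channel: d + floor((s - d) * w / 256), like asr #8. */
static uint32_t mix_channel(uint32_t d, uint32_t s, int32_t w)
{
    int32_t x = ((int32_t)s - (int32_t)d) * w;
    int32_t step = (x >= 0) ? (x >> 8) : -((-x + 255) >> 8);  /* floor(x/256) */
    return (uint32_t)((int32_t)d + step);
}

/* Model of mix: n in 0..31 is scaled to 0..248 (n*8), as in the assembly. */
static uint16_t mix15(uint16_t a, uint16_t b, uint32_t n)
{
    assert(n <= 31);
    int32_t w = (int32_t)(n << 3);
    uint32_t r  = mix_channel(a & 0x1F, b & 0x1F, w);
    uint32_t g  = mix_channel((a >> 5) & 0x1F, (b >> 5) & 0x1F, w);
    uint32_t bl = mix_channel((a >> 10) & 0x1F, (b >> 10) & 0x1F, w);
    return rgb15(r, g, bl);
}
```

Because n = 31 scales to 248/256, mix15(0x0000, 0x7FFF, 31) gives rgb15(30, 30, 30) rather than full white.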
revolution 19 May 2013, 12:19
uart777 wrote: I have followed the ARM APS standard in my code.
MHajduk 19 May 2013, 12:20
uart777 wrote: Ok, let me shove some hardcore ARM ASM in the face of these newbies
uart777 19 May 2013, 20:16
revolution: Yes, "function" saves registers r4-r12+lr: stmfd sp!, { r4-r12, lr }.
Last edited by uart777 on 10 Aug 2013, 09:23; edited 1 time in total
revolution 20 May 2013, 00:36
uart777 wrote: revolution: Yes, "function" saves registers r4-r12+lr: stmfd sp!, { r4-r12, lr }.
uart777 20 May 2013, 06:52
revolution: I thought about this, too, but the default "function" saves all 'v' registers just to be safe. Safety is more important than optimization: who cares how fast a program is if it doesn't work? I use ASM primarily for knowledge. Drawing takes thousands of times longer than the call overhead, so reducing the overhead won't make a noticeable difference; 97%+ of the time is spent drawing.
Another thing to consider is that "let" should perform ARM-specific optimizations. For example, "let" should detect sequences involving the barrel shifter, such as "let r1<<r2, r0+r1", and replace them with "add r0, r1, lsl r2".
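A peephole pass like the one described for "let" could look roughly like this C sketch. It assumes "let r1<<r2, r0+r1" means r1 = r1 << r2 followed by r0 = r0 + r1; the Op/Insn types and fuse_shift_add are hypothetical, not part of Z77. One caveat the fused form introduces: add r0, r0, r1, lsl r2 leaves r1 unshifted, so the rewrite is only safe when the shifted r1 is not used afterwards (the sketch leaves that liveness check to the caller).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical two-operand ops standing in for "let" steps: dst = dst OP src. */
typedef enum { OP_SHL, OP_ADD } Op;

typedef struct {
    Op op;
    int dst;  /* destination register, also the left operand */
    int src;  /* right operand register */
} Insn;

/* Fuse "rX = rX << rY" then "rZ = rZ + rX" into one barrel-shifter add.
   Returns 1 and writes the instruction text to out on success, else 0.
   Caller must ensure rX's shifted value is dead afterwards. */
static int fuse_shift_add(Insn first, Insn second, char *out, size_t size)
{
    if (first.op != OP_SHL || second.op != OP_ADD)
        return 0;
    if (second.src != first.dst || second.dst == first.dst)
        return 0;
    snprintf(out, size, "add r%d, r%d, r%d, lsl r%d",
             second.dst, second.dst, first.dst, first.src);
    return 1;
}
```

The output uses the full three-operand form; add r0, r1, lsl r2 is the two-operand shorthand used in the post for the same instruction.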
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.