Sorry, my English is not good enough to write proper documentation or to describe every feature of my macros, but I must make some important notes about writing libraries.
Before reading them, please look at some of the examples to understand the general syntax of my macros.

1) First, to specify that you are writing a library, define the symbolic constant e_type as ET_DYN before including "elf32/elf.inc":


e_type equ ET_DYN
include 'elf32/elf.inc'


The default value of e_type is ET_EXEC, i.e. an executable program.
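
To check which type an assembled file actually got, the e_type field can be read straight from the ELF header. The following is a minimal Python sketch, not part of the macro package; the function name read_e_type is made up for illustration. It relies only on the standard ELF32 header layout: e_type is a 16-bit field at offset 16, and byte 5 of the header (EI_DATA) gives the byte order.

```python
import struct

ET_EXEC = 2  # executable file (standard ELF value)
ET_DYN = 3   # shared object / library (standard ELF value)

def read_e_type(header: bytes) -> int:
    """Return e_type from the first bytes of an ELF file (hypothetical helper)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type

# Usage: a fake little-endian identification block (16 bytes) followed by e_type
fake = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<H", ET_DYN)
print(read_e_type(fake) == ET_DYN)
```

For a real file, pass the first 18 bytes or more, for example open(path, "rb").read(18).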


2) When you are writing a library, you must remember that you do not know the actual address of any function or data. If you compile code like:


l2 dw l1

...

l1:


l2 will not contain the real address of l1; it will contain only the offset of l1 from the beginning of the library image in the address space of the running process.
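
The relation between the stored value and the real address can be put in a few lines of Python. This is only an illustration: the base addresses and the offset are made-up numbers, and the loader chooses the real base at run time.

```python
def runtime_address(load_base: int, image_offset: int) -> int:
    """Address of a label once the library image is mapped at load_base."""
    return load_base + image_offset

# What "l2 dw l1" stores: the offset of l1 inside the image (made-up value)
l1_offset = 0x1234

# The same library can land at different bases in different processes
for base in (0x100000, 0x40000000):
    print(hex(runtime_address(base, l1_offset)))
```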

So if you want to get the address of l1 you should use code like:


;ARM

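getl1:
ldr r0,[l2]      ;r0 = l1-(getl1.get+8)
.get:
add r0,pc,r0     ;in ARM state pc reads as the address of this instruction + 8

...

l2 dw l1-getl1.get-8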

...

l1:



or:



;thumb

getl1:
ldr r0,[l2]
.get:
add r0,pc

...

l2 dw l1-getl1.get-4

...

l1:




This way is useful when the difference between the addresses of l1 and getl1 is very large.
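
Why the stored difference works wherever the library is loaded can be checked with a small Python model. It is a sketch with made-up offsets and base addresses; it only assumes that reading pc gives the address of the current instruction plus 4 in Thumb state (as in the Thumb code above) and plus 8 in ARM state.

```python
PC_BIAS = {"arm": 8, "thumb": 4}  # how far ahead of the current instruction pc reads

def stored_word(l1_offset: int, add_offset: int, mode: str) -> int:
    """Assembly-time constant: l1 minus the value pc will have at the add."""
    return l1_offset - (add_offset + PC_BIAS[mode])

def address_at_runtime(load_base: int, add_offset: int, word: int, mode: str) -> int:
    """What adding pc to the loaded word leaves in r0."""
    pc = load_base + add_offset + PC_BIAS[mode]
    return (pc + word) & 0xFFFFFFFF

# Made-up image offsets: the add sits at 0x20, l1 at 0x12340
for mode in ("arm", "thumb"):
    word = stored_word(0x12340, 0x20, mode)
    for base in (0x100000, 0x40000000):
        assert address_at_runtime(base, 0x20, word, mode) == base + 0x12340
```

The load base cancels out of the stored word, which is why no relocation is needed for this value.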

If the difference between the address of l1 and the address of the code that needs it fits in the immediate field of "add" (in the worst case this means it is less than 0x100), you can simply use the "adr" pseudoinstruction:


getl1:adr r0,l1

...

l1:


Or you can combine several "add" instructions:

getl1:
add r0,pc,(l1-getl1-8) and 0xFF          ;pc reads as getl1+8 here
add r0,r0,(l1-getl1-8) and not 0xFF      ;add the rest of the same offset

...

l1:
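
Whether an offset fits in a single add can be checked in advance. The Python sketch below assumes the classic ARM data-processing immediate encoding (an 8-bit value rotated right by an even number of bits); the helper names are made up for illustration. It also shows the split into a low byte and the remainder used by the two-instruction form.

```python
def is_arm_immediate(value: int) -> bool:
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # rotating left by rot undoes a rotate-right by rot
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False

def split_offset(offset: int):
    """Split a non-negative offset into its low byte and everything above it."""
    return offset & 0xFF, offset & ~0xFF

low, high = split_offset(0x1234)
print(hex(low), hex(high))
print(is_arm_immediate(low), is_arm_immediate(high), is_arm_immediate(0x1234))
```

With this split both halves are encodable for any offset below 0x10000; larger offsets need a third add or the load-from-memory approach.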


On x86, pc is not a general-purpose register, and there are only two kinds of pc-relative instructions: jumps and calls. You can use code like this:


getl1:
call .l2
.l2:pop eax              ;eax = run-time address of .l2 (the return address)
add eax,l1-getl1.l2

...

l1:


On MIPS the situation is partially similar to x86 (only branches and branches with link are pc-relative), but according to the MIPS ELF specification register %t9 must hold the callee's address when a function is called.


function_entry:
...
getl1:addiu %v0,%t9,l1-function_entry

...

l1:


or


function_entry:
...
getl1:
li %v0,l1-function_entry
addu %v0,%v0,%t9

...

l1:





3) Suppose you have code like this:


e_type equ ET_EXEC

...

segment PF_R+PF_X

...

textend:

segment PF_R+PF_W

datastart:

...



What does the expression (datastart-textend) equal? Zero, of course.
Now let's change e_type to ET_DYN:


e_type equ ET_DYN

...

segment PF_R+PF_X

...

textend:

segment PF_R+PF_W

datastart:

...


And what does the expression (datastart-textend) equal now?

in ARM, x86: datastart-textend=0x1000
in MIPS:     datastart-textend=0x10000

Why? The reason lies in the organisation of virtual memory and in the UNIX file mapping mechanism.
On these architectures the whole available virtual address space is divided into pages. On ARM and x86 the minimal page size is 4096 bytes (0x1000). On MIPS the minimal page size differs between models, but according to the MIPS ELF specification the largest minimal page size is 64 KB (0x10000).
Each page can be readable, readable-writeable, readable-executable, readable-writeable-executable etc. But a page cannot be subdivided, so no page can be partly readable-writeable and partly readable-executable.
The UNIX mmap function, which the dynamic linker uses to load libraries, can map a file only in blocks aligned to the page size.
For example, suppose your ARM source has been compiled into a file with this structure:

file region   |segment
--------------+---------
0-0x1600      |PF_R+PF_X
0x1600-0x2700 |PF_R+PF_W
--------------+---------



file offset    segment
-----------+______________+
0          |PF_R+PF_X     |
           |         _____|
0x1000     |_________|    |
           |PF_R+PF_W^  __|
0x2000     |____________|
                     ^  ^
0x3000               |  |
                     |  +--0x2700
                     +-----0x1600



and it is loaded into the memory region starting at 0x100000.
The full size of the library file is 0x2700, so in theory three pages are enough for this structure. But, as I wrote above, the linker cannot create the same structure in memory. It can, however, load the library in the following way:

virtual address| page              |file region
---------------+___________________+--------------------
0x100000       |readable executable|from 0
               |___________________|to 0x1000
0x101000       |readable executable|from 0x1000
               |___________________|to 0x2000
0x102000       |readable writeable |from 0x1000
               |___________________|to 0x2000
0x103000       |readable writeable |from 0x2000 to 0x2700
               |___________________|

As you can see, four pages are used for the library image. The second and the third pages are associated with the same file area and initially contain the same data, but they have different access rights. So both pages contain a part of the first segment and a part of the second segment, but the second segment actually begins at offset 0x600 in the third page, and the first segment actually ends at offset 0x600 in the second page.
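
The table above can be reproduced with a short Python sketch. It is only a model of the page arithmetic, not the dynamic linker's actual code; the function name and the tuple layout are made up, and it assumes that each segment's virtual address is congruent to its file offset modulo the page size, which is what pushes the second segment one page further in memory.

```python
PAGE = 0x1000  # ARM / x86 page size used in the example

def page_map(base, segments, file_size, page=PAGE):
    """Yield (virtual address, flags, file_from, file_to) for every mapped page.

    segments: list of (flags, file_start, file_end, vaddr_start) tuples.
    """
    for flags, f_start, f_end, v_start in segments:
        # mmap works on whole pages, so round both starts down to a page boundary
        f_page = f_start - f_start % page
        v_page = v_start - v_start % page
        while f_page < f_end:
            yield base + v_page, flags, f_page, min(f_page + page, file_size)
            f_page += page
            v_page += page

# The example file: text at 0..0x1600, data at 0x1600..0x2700.
# In the image the data segment starts one page later (0x1600 + 0x1000).
segments = [("r-x", 0x0000, 0x1600, 0x0000),
            ("rw-", 0x1600, 0x2700, 0x2600)]

for vaddr, flags, lo, hi in page_map(0x100000, segments, 0x2700):
    print(f"{vaddr:#x} {flags} file {lo:#x}-{hi:#x}")
```

Passing page=0x10000 models the MIPS case, where the gap between the segments grows to 0x10000 accordingly.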
It is a bit strange that executable files can be loaded as they are, because that is possible only if the border pages are allocated with all permissions (readable-writeable-executable). I don't know why the dynamic linker doesn't use the same algorithm to load libraries.
And what does this affect? Even if your code and data segments are very small, an attempt to load the address of a data item with "adr" in ARM or with "add" in Thumb will fail, because the extra page gap usually makes the offset too large for a single immediate (I'm not sure it always fails in ARM, but in the most common situations it does). So in ARM you should use two add instructions instead, and in MIPS you can do the following:


function_entry:

...

li %v0,l1-function_entry
addu %v0,%v0,%t9

...

segment PF_R+PF_W

l1:


Or, if you just want to load a word from the data segment:


function_entry:

...

lui %t0,(l1-function_entry+0x8000)shr 16 ;+0x8000 compensates for the sign extension of the lw offset
addu %t0,%t0,%t9
lw %v0,[%t0+((l1-function_entry)and 0xFFFF)]

...

segment PF_R+PF_W

l1 dd ?
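
Splitting a 32-bit offset between lui and the 16-bit offset of lw needs some care, because the processor sign-extends the 16-bit part. The Python sketch below is an illustration with made-up helper names; it shows the usual carry-adjusted split, where 0x8000 is added before taking the upper half.

```python
def split_hi_lo(offset: int):
    """Return (hi, lo) so that (hi << 16) + sign_extend16(lo) == offset (mod 2**32)."""
    hi = ((offset + 0x8000) >> 16) & 0xFFFF
    lo = offset & 0xFFFF
    return hi, lo

def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field the way lw and addiu do."""
    return value - 0x10000 if value & 0x8000 else value

def rebuild(hi: int, lo: int) -> int:
    """What the lui + lw pair effectively adds to the base register."""
    return ((hi << 16) + sign_extend16(lo)) & 0xFFFFFFFF

# Made-up offsets, including ones whose low half has bit 15 set
for offset in (0x00012345, 0x00018000, 0x0001FFFC):
    hi, lo = split_hi_lo(offset)
    assert rebuild(hi, lo) == offset
```

Without the adjustment, any offset whose low half is 0x8000 or more would come out 0x10000 too small.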


Or you can use GOT:


function_entry:
li %gp,GOT-function_entry
add %gp,%t9

...

lw %v0,[%gp,GOT_l1-GOT]

...

segment PF_R+PF_W

GOT:
.got GOT_l1,l1 ;In MIPS all GOT entries are automatically relocated during library loading, so the GOT_l1 entry contains the real address of l1.

l1:


or maybe this way:


function_entry:
li %gp,GOT-function_entry
add %gp,%t9

...

addiu %v0,%gp,l1-GOT

...

segment PF_R+PF_W

GOT:
.got

l1:



4) This note applies not only to ELF libraries.
Don't forget to add the .dynstr, .dynsym, .hash and .dynamic sections to your file. If you use external symbols or objects you also need the .reloc (not on MIPS) and .got sections. In executable programs and executable libraries you also need the .interpreter section. It is also necessary to specify the name of your library with the DT_SONAME tag.

Good luck!
