Data directive with custom size

Tomasz Grysztar

Joined: 16 Jun 2003
Posts: 8434
Location: Kraków, Poland

Tomasz Grysztar 19 Sep 2016, 21:04

JohnFound wrote:

The name is not very important here. More important is that the suggested syntax does not distinguish between the data and the data size. In order to stay readable enough, it must separate the data size from the data values.

A syntax in which the first of the comma-separated arguments means something different is not unprecedented in fasm: the IRP directive is another example. Still, I think jmg's suggestion of a colon instead of a comma was a good one. The same idea crossed my mind when I was implementing it, and it is a matter of changing a single line in the fasmg source.

Combining D with a number is in fact something I earlier considered as a possible remedy for the problem of redefining DW I mentioned in the initial post - the only difference is that I thought about mnemonics like D16 or D32 - with numbers of bits instead of bytes. However DBX is much more than that. Keep in mind that the first argument may contain any expression, even a value that requires multiple passes to resolve.

Tomasz Grysztar 19 Sep 2016, 21:25

Before I go to sleep today, I have quickly added the ":" syntax as an option to the current version, since this was so effortless. It now works like:

Code:

dbx 4: 4,5,6
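To make the byte layout concrete, here is a rough Python model of what a line like `dbx 4: 4,5,6` produces. It is only a sketch, not fasmg code: the function name `emit` and the accepted value range (anything from the signed minimum to the unsigned maximum of the unit) are assumptions made for illustration.

```python
def emit(size, values):
    """Model of `emit size: v1, v2, ...`: each value becomes `size` little-endian bytes."""
    out = bytearray()
    for v in values:
        # Assumption: a value may lie anywhere between the signed minimum
        # and the unsigned maximum of a `size`-byte unit.
        if not -(1 << (8 * size - 1)) <= v < (1 << (8 * size)):
            raise ValueError(f"{v} does not fit in {size} byte(s)")
        # Reduce modulo 2**(8*size) so negative values get their two's complement form.
        out += (v % (1 << (8 * size))).to_bytes(size, "little")
    return bytes(out)


print(emit(4, [4, 5, 6]).hex(" "))  # 04 00 00 00 05 00 00 00 06 00 00 00
```

In this model the size is an ordinary argument, which mirrors the point that the first argument of the directive can be any expression rather than a fixed mnemonic.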

From all the suggested names I think I liked EMIT the most.

jmg

Joined: 18 Sep 2016
Posts: 62

jmg 19 Sep 2016, 22:39

Tomasz Grysztar wrote:

Before I go to sleep today, I have quickly added the ":" syntax as an option to the current version, since this was so effortless. It now works like:

Code:

dbx 4: 4,5,6

Looking good.

Is this also possible?

Code:

dbx 4: 4,5,6 2:0x33

What about strings with dbx/EMIT?

Tomasz Grysztar wrote:

From all the suggested names I think I liked EMIT the most.

That's ok, the more it expands from a simple db, the less keeping the root matters.

rugxulo

Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)

rugxulo 20 Sep 2016, 03:48

Okay, so I was originally going to ignore this thread (because trivial opinions like this, especially from me, are a dime a dozen).

Just for the record, I was originally thinking of (and discarding) ideas similar to these:

dbyte
dbit

bits
bytes

defbyte
defbits

... which are fairly unoriginal and pointless.

Though, semi-jokingly, I thought of a better alternative:

po

And if you insist on showing that it's little-endian, use the alias "pole".

Tomasz Grysztar 20 Sep 2016, 07:01

I have added EMIT as a synonym to DBX and made it the official name in the documentation. I'm leaving the DBX as an undocumented option for now. It may be good to have two different names, just in case one of them needs to be taken and redefined for other purposes in some macro framework.

jmg wrote:

is this also possible ?
Code:
dbx 4: 4,5,6 2:0x33

No, you still have to use a macro for that.

jmg wrote:

what about strings with dbx/EMIT ?

This is a generalization of other such data directives, and the general rule is the same for all of them:

fasmg manual wrote:

When a string of bytes is provided as the value to any of these instructions, the generated data is extended with zero bytes to the length which is the multiple of data unit.

This rule ensures that the result is unambiguous when the string is short enough to fit in the data unit. For instance, the following two instructions produce the same result:

Code:

dq "abcdefg"
dq 1*"abcdefg"
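The equivalence of those two lines can be checked with a small Python model. This is a sketch under one assumption that is not stated in the post: that a string used in a numeric expression is read as a little-endian number, with the first character as the least significant byte. The helper names are hypothetical.

```python
def data_from_string(unit, text):
    """String operand: pad with zero bytes up to the next multiple of `unit`."""
    raw = text.encode("ascii")
    padding = -len(raw) % unit          # 0 when the length is already a multiple
    return raw + b"\x00" * padding


def data_from_number(unit, value):
    """Numeric operand: store the value as `unit` little-endian bytes."""
    return value.to_bytes(unit, "little")


# dq "abcdefg"   : seven characters padded to one 8-byte unit
as_string = data_from_string(8, "abcdefg")

# dq 1*"abcdefg" : the string is first converted to a number (assumed little-endian)
as_number = data_from_number(8, 1 * int.from_bytes(b"abcdefg", "little"))

assert as_string == as_number == b"abcdefg\x00"
```

Padding at the end with zeros is what makes the two readings agree: the missing high-order bytes of the number are zero as well.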

Tomasz Grysztar 10 Oct 2016, 16:20

I have noticed that EMIT also allows a very simple implementation of a macro that could generate constants for arbitrary-length integer calculations:

Code:

macro dv value
        local length
        if value > 0
                length = bsr (value) shr 3 + 1
                dd length
                emit length: value
        else if value = 0
                dd 0
        else
                err 'negative values not supported'
        end if
end macro

This macro generates data in the form of a 32-bit length followed by that many bytes containing the number. It fits the unsigned number into as few bytes as possible (little-endian, of course).
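The layout produced by the dv macro can be mimicked in Python to see which bytes come out. This is an illustrative sketch only: the Python function `dv` is a stand-in for the macro, and `bit_length() - 1` plays the role of BSR. Note that in the macro `shr` binds tighter than `+`, while in Python `>>` binds looser, hence the explicit parentheses.

```python
import struct


def dv(value):
    """Model of the dv macro: 32-bit length, then the value in that many bytes."""
    if value < 0:
        raise ValueError("negative values not supported")
    if value == 0:
        return struct.pack("<I", 0)            # zero is stored as length 0, no payload
    bsr = value.bit_length() - 1               # index of the highest set bit
    length = (bsr >> 3) + 1                    # number of bytes needed
    return struct.pack("<I", length) + value.to_bytes(length, "little")


print(dv(65535).hex(" "))  # 02 00 00 00 ff ff
print(dv(65536).hex(" "))  # 03 00 00 00 00 00 01
```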

Last edited by Tomasz Grysztar on 10 Oct 2016, 17:08; edited 1 time in total

revolution
When all else fails, read the source

Joined: 24 Aug 2004
Posts: 20700
Location: In your JS exploiting you and your system

revolution 10 Oct 2016, 16:33

Perhaps BSR can have a signed equivalent? SSR = signed scan reverse

Tomasz Grysztar 10 Oct 2016, 17:07

revolution wrote:

Perhaps BSR can have a signed equivalent? SSR = signed scan reverse

I don't think it would be all that useful. Usually if you need an unbounded BSR then you need to treat positive and negative numbers differently, and then you simply have an IF block and BSR(-number) or BSR NOT number. In the above sample the format is unsigned by design (for example it fits 65535 into two bytes).
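One possible reading of the "IF block plus BSR NOT number" idea, written as a Python sketch rather than fasmg: to size a signed two's complement value, count the significant bits of the number itself when it is non-negative and of its bitwise complement when it is negative, then add one bit for the sign. The function name is hypothetical, and `bit_length()` stands in for BSR + 1 (it also returns 0 for zero, a case a real BSR would need to handle separately).

```python
def signed_size_in_bytes(number):
    """Smallest number of bytes that holds `number` in two's complement."""
    if number >= 0:
        magnitude_bits = number.bit_length()      # like BSR number, plus one
    else:
        magnitude_bits = (~number).bit_length()   # like BSR NOT number, plus one
    total_bits = magnitude_bits + 1               # one extra bit for the sign
    return (total_bits + 7) // 8


for n in (127, 128, -128, -129):
    print(n, signed_size_in_bytes(n))
```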

On a side note: fasmg's BSR operator also works with floats, and it ignores the sign of a number there.

Tomasz Grysztar 11 Oct 2016, 12:22

While we are at it, here is how you can use the behavior of BSR and SHL with respect to the floating-point type to generate any custom floating-point format. This example generates a sign bit in a separate byte (for clarity), stores the exponent as a plain two's complement number (in many formats it would be biased to an unsigned number) and a 16-bit mantissa (or a "significand", as some purists might insist) that includes the highest bit, which is always set. It can be easily tweaked to generate any specific format that one may need (and there used to be lots of them). To get a rounded mantissa instead of a simply truncated one, one more bit should be extracted and the value corrected accordingly.

Code:

x = float 1/3

if x < 0
        sign_bit = 1
        x = -x
else
        sign_bit = 0
end if

db sign_bit

if x <> 0

        exponent = bsr x
        mantissa = trunc (x shl (15 - exponent))

        if exponent < -80h | exponent > 7Fh
                err 'exponent out of range'
        end if
        db exponent

        dw mantissa

else

        db 0
        dw 0

end if

You should keep in mind that fasmg has its own internal limit on how many bits of mantissa it is able to maintain. The only guarantee is that it is at least as many bits as are needed for the largest IEEE format that fasmg's data directives can produce (currently this is quad precision).
But if more precision is needed, the source code of fasmg makes it very easy to modify this limit. The MANTISSA_SEGMENTS constant defined in the FLOATS.INC file sets the length of the internally handled mantissa, in 32-bit segments.
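For comparison, the same four-byte layout (sign byte, signed exponent byte, 16-bit truncated mantissa) can be reproduced with a Python sketch. It is an approximation of the fasmg sample, not a port: `math.frexp` is used in place of BSR on a float, `math.ldexp` in place of SHL, and the function name `custom_float` is made up. Python floats carry only 53 bits of mantissa, which is more than enough for the 16 bits kept here.

```python
import math
import struct


def custom_float(x):
    """Encode x as: sign byte, two's complement exponent byte, 16-bit mantissa."""
    sign_bit = 1 if x < 0 else 0
    x = abs(x)
    if x == 0:
        return struct.pack("<BbH", sign_bit, 0, 0)
    # frexp returns (m, e) with x = m * 2**e and 0.5 <= m < 1,
    # so the position of the highest set bit is e - 1.
    exponent = math.frexp(x)[1] - 1
    if not -0x80 <= exponent <= 0x7F:
        raise ValueError("exponent out of range")
    # Scale so the highest bit lands on bit 15, then drop the fraction.
    mantissa = math.trunc(math.ldexp(x, 15 - exponent))
    return struct.pack("<BbH", sign_bit, exponent, mantissa)


print(custom_float(1 / 3).hex(" "))  # 00 fe aa aa
```

For 1/3 the highest set bit is at position -2, so the exponent byte is FEh and the mantissa is the truncated value AAAAh.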

revolution 11 Oct 2016, 13:50

Tomasz Grysztar wrote:

But if more precision is needed, the source code of fasmg makes it very easy to modify this limit. The MANTISSA_SEGMENTS constant defined in the FLOATS.INC file sets the length of the internally handled mantissa, in 32-bit segments.

I would be in favour of having this available to the source code as a directive or setting or something. For code to be posted here, it feels wrong to me to have to tell others to modify their assembler in order to assemble the code. These kinds of extra assembly steps can easily be forgotten or lost. And each time someone downloads a new version of fasmg they have to remember to change it accordingly.

Tomasz Grysztar 11 Oct 2016, 14:06

revolution wrote:

Tomasz Grysztar wrote:
But if more precision is needed, the source code of fasmg makes it very easy to modify this limit. The MANTISSA_SEGMENTS constant defined in the FLOATS.INC file sets the length of the internally handled mantissa, in 32-bit segments.
I would be in favour of having this available to the source code as a directive or setting or something. For code to be posted here, it feels wrong to me to have to tell others to modify their assembler in order to assemble the code. These kinds of extra assembly steps can easily be forgotten or lost. And each time someone downloads a new version of fasmg they have to remember to change it accordingly.

Unfortunately there is no easy way to change this at run time because of how it is implemented. A variable-length implementation would be much more costly and would require a substantial rewrite.

I mentioned the option of changing this value only because it is so simple to do, in case someone needs it for some very specific purpose where using tweaked tools is affordable. But if you plan to perform calculations on ultra-long mantissas and still want to share the result as a regular fasmg source, then you would be much better off simply using fixed-point arithmetic with fasmg's long integers.

VEG

Joined: 06 Feb 2013
Posts: 80

VEG 03 May 2017, 09:49

I understand that it is a bit late and the decision has already been made, but I have some additional ideas; maybe someone will like them.

Code:

; Variant 1
emit {sizevar} 1, 2, 3, 4, 5
dword equ {4}
dd equ emit {4}

; Variant 2
emit <sizevar> 1, 2, 3, 4, 5
dword equ <4>
dd equ emit <4>

; Variant 3
emit ^4 1, 2, 3, 4, 5 ; or: emit ^(sizevar) 1, 2, 3, 4, 5
dword equ ^4
dd equ emit ^4

This would allow macros to check actual sizes rather than keywords like dword. So "mov dword [eax], 1" would be treated as "mov {4} [eax], 1", and the "mov" macro would understand which size is required. It would then be easy to create such "data types" for any platform where x86 terms like dword are not very appropriate.

zhak

Joined: 12 Apr 2005
Posts: 501
Location: Belarus

zhak 03 May 2017, 10:13

You can define your own data types with additional parameters. For example, in my framework I use custom datatypes `byte`, `word`, etc. with additional inner properties like __size (size of the data type) or __length (total length of the initialized data field), and the ability to get the size of a datatype with the sizeof keyword. The emit directive has a simple syntax, shared with other data directives like db, dw, etc., which have been there for ages, and there is no need to overcomplicate things.

VEG 03 May 2017, 10:17

zhak, it's just about "how to make FASM G more general and less x86". x86 has been in FASM 1 for ages, but FASM G removed most of it from the core, which means that such dramatic changes are acceptable for the project. FASM 1 was like NASM or YASM, but FASM G is unique: it lets you assemble anything you want. Just write a bunch of macros.

VEG 03 May 2017, 13:42

About label, load and store: these directives use type/size hinting in the following formats:

Code:

label smth dword at $ - 4
label smth:dword at $ - 4
store smth:dword at addrspace:addr
load smth:dword from addrspace:addr

If {sizevar} were used as the syntax for size hinting, "dword equ {4}" would make these lines look like this:

Code:

label smth {4} at $ - 4
label smth:{4} at $ - 4
store smth:{4} at addrspace:addr
load smth:{4} from addrspace:addr

For compatibility, load/store/label could accept sizes in two formats: as plain numbers and in the {number} form. So when you declare the size of a variable in bytes, it would still be possible to write just ":4".

This would be consistent with everything else. A type/size hint can be placed in two possible positions:

1. Before an expression. In this case only the extended form "{4}" would be accepted. Examples:

Code:

RECORD_SIZE := 128
mov dword [eax], RECORD_SIZE * 10 ; mov {4} [eax], RECORD_SIZE * 10
mov [eax], dword RECORD_SIZE * 10 ; mov [eax], {4} RECORD_SIZE * 10
BASE_ADDR := dword 0x325 ; BASE_ADDR := {4} 0x325
mov [eax], BASE_ADDR + 125 ; size is known from the BASE_ADDR, but we can overwrite if we wish

This example also uses another extension suggested in this topic, which allows declaring labels with a size hint using the ":=" operator.

2. Inside directives where the "labelname:size" form is acceptable (label/load/store/etc.), right after the label name and colon. In this case two forms could be accepted: the extended form "{4}" and the plain number "4", so no additional characters would be required if you prefer to declare the size with a number.

So almost everything would look the same as it does now, but byte/word/dword/qword and db/dw/dd/dq would all be declared in the x86 include files using "equ":

Code:

dword equ {4}
dd equ emit dword

; Example of the dd:
nums dd 1, 2, 3, 4, 5
; will be treated as:
nums emit {4} 1, 2, 3, 4, 5
; which is not hard to parse, it seems

Unfortunately, this approach would require changing the current code that generates relocs (in pe.inc and others). Right now it replaces dd and dq (and it seems that "emit 4: label" does not generate a reloc); after this change it would have to replace emit itself (so that "emit {4} label" generates a reloc).

Tomasz Grysztar 03 May 2017, 19:00

VEG wrote:

I understand that it is a bit late and the decision has already been made, but I have some additional ideas; maybe someone will like them.

These ideas are interesting, but perhaps a bit risky from the point of view of fasm's design. And as I recently noted the braces may not be so safe to use for new syntactical purposes.

VEG wrote:

Unfortunately, this approach would require changing the current code that generates relocs (in pe.inc and others). Right now it replaces dd and dq (and it seems that "emit 4: label" does not generate a reloc); after this change it would have to replace emit itself (so that "emit {4} label" generates a reloc).

As long as the instructions that need to generate relocations still used "dd" and not "emit" directly, it would still be enough to redefine "dd" only. The conversion to "emit" would happen at a deeper nesting level.

VEG 03 May 2017, 19:34

Quote:

As long as the instructions that need to generate relocations still used "dd" and not "emit" directly, it would still be enough to redefine "dd" only

A programmer can write a table of pointers. In this case dd has to be used, and that matters, because emit will not generate relocs. This is not obvious, because at first sight "dd" == "dbx 4:" == "emit 4:". But I don't think anyone will use emit for this purpose with an x86 target, so I'm just mentioning it. It can simply be a "known behavior" thing, like "always use dd if you want your relocs".

BTW, I was curious how generation of relocs works from the macro side, it was not clear how it decides that this is just a number and that is a label which has to be relocatable. And I have to admit that your solution with these "elements" is awesome. A really cool idea.

VEG 03 May 2017, 19:54

Quote:

And as I recently noted the braces may not be so safe to use for new syntactical purposes.

It is possible to use a more elaborate syntax for safety (and it is better to choose something easy to parse), for example {:4}. This syntax would actually be used only inside macros, for internal purposes. A programmer would always use predefined types inside instructions, like mov [eax], dword 1. As for load/store/label, it would still be possible to write just a number after the colon, so that form would be used as well.
