flat assembler
Message board for the users of flat assembler.

question about arrays

geekbasic@gmx.com



Joined: 25 Oct 2022
Posts: 71
Location: Arizona
geekbasic@gmx.com 08 Jan 2023, 20:06
I added arrays to my compiler, which generates COM files.

I noticed that the larger I define the array, the larger the COM file gets.

The COM file is maxed out at 64 KB if I do this:
Code:
list dw 32000 dup(0)    


This is pretty much the largest array I am able to define for COM files.

Why is that? Does it put a buffer in the file or something?

My asm skills are still very poor. It shows. :lol:
macomics



Joined: 26 Jan 2021
Posts: 1197
Location: Russia
macomics 08 Jan 2023, 20:57
Code:
list rw 32000    

Using rw reserves the array without storing zeros in the file. But this does not change the fact that in 16-bit mode segments are limited to 64 KB. This array barely fits in 64 KB, so to access its elements you need to:
1) Align the beginning of the array to the paragraph size (16 bytes);
2) Calculate the segment in which the array starts at offset 0;
Code:
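    call @f
@@:
    pop dx ; dx = ip of @@ (offset within cs)
    mov cx, cs
    add dx, list - @b ; dx = offset of list within cs
    shr dx, 4 ; exact only because list is paragraph aligned (step 1)
    add dx, cx ; dx = segment in which list starts at offset 0
    mov fs, dx ; fs requires a 386 or later    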

3) Use this segment address in one of the segment registers when accessing this array
Code:
    index = 500
    mov di, 10000 * 2
    mov ax, [fs:index * 2] ; load
    mov [fs:di], ax ; store    
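The same segment arithmetic, modelled in Python as a sketch for checking the numbers (this is not DOS code): cs and list_offset stand for the values the assembly finds at run time, and the example values are made up.

```python
def segment_for(cs: int, offset: int) -> int:
    """Segment in which the paragraph-aligned address cs:offset becomes offset 0."""
    if offset % 16:
        raise ValueError("the array must start on a paragraph (16-byte) boundary")
    return (cs + (offset >> 4)) & 0xFFFF  # shr dx, 4 / add dx, cx


def linear(seg: int, off: int) -> int:
    """20-bit real-mode address of seg:off (wraps at 1 MB like an 8086)."""
    return ((seg << 4) + off) & 0xFFFFF


cs = 0x27B8           # hypothetical CS at run time
list_offset = 0x0200  # hypothetical paragraph-aligned offset of list
fs = segment_for(cs, list_offset)

# [fs:500 * 2] and cs:list_offset + 500 * 2 name the same byte,
# but through FS the whole 64 KB window starts at the array itself
assert linear(fs, 500 * 2) == linear(cs, list_offset + 500 * 2)
```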
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20754
Location: In your JS exploiting you and your system
revolution 08 Jan 2023, 23:54
To define blocks of data you can use the question mark or rb, rw, etc.
Code:
list dw 32000 dup(0)  ; define a block of data initialised to all zeros
list dw 32000 dup(?)  ; reserve a block of uninitialised data
list rw 32000 ; reserve a block of uninitialised data    
If you put uninitialised data at the end of your code, it doesn't get stored in the output file. The following produces a zero-byte output file.
Code:
format binary
list dw 32000 dup(?) ; takes up no space in the output    
But the following will be 64001 bytes because there is initialised data after the uninitialised data:
Code:
format binary
list dw 32000 dup(?)
ret ; this extra byte forces all previous data to be output    
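A simplified Python model of that rule. It assumes flat binary output in which the file ends at the last initialised byte, and it ignores everything else fasm does. Uninitialised bytes are dropped only if nothing initialised follows them.

```python
def output_size(items):
    """items: (size_in_bytes, initialised) pairs in source order."""
    position = 0
    end_of_initialised = 0
    for size, initialised in items:
        position += size
        if initialised:
            end_of_initialised = position  # everything before this point must be stored
    return end_of_initialised


print(output_size([(64000, True)]))              # list dw 32000 dup(0)      -> 64000
print(output_size([(64000, False)]))             # list dw 32000 dup(?)      -> 0
print(output_size([(64000, False), (1, True)]))  # dup(?) followed by ret    -> 64001
```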
Post 08 Jan 2023, 23:54
View user's profile Send private message Visit poster's website Reply with quote
Hrstka



Joined: 05 May 2008
Posts: 65
Location: Czech republic
Hrstka 09 Jan 2023, 08:28
From Wikipedia:
Quote:
The COM format is the original binary executable format used in CP/M (including SCP and MSX-DOS) as well as DOS. It is very simple; it has no header (with the exception of CP/M 3 files), and contains no standard metadata, only code and data. This simplicity exacts a price: the binary has a maximum size of 65,280 (FF00h) bytes (256 bytes short of 64 KB) and stores all its code and data in one segment.
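The 65,280 figure is just the 64 KB segment minus the 256-byte PSP in front of the loaded image. Here is a quick Python check of how much room that leaves next to the 32,000-word array from the first post. This is a back-of-the-envelope sketch, and the code sizes are made-up numbers.

```python
SEGMENT = 0x10000             # one real-mode segment: 64 KB
PSP_SIZE = 0x100              # the image is loaded at offset 100h, after the PSP
MAX_COM = SEGMENT - PSP_SIZE  # 0xFF00


def room_left(code_bytes: int, array_bytes: int) -> int:
    """Bytes of the segment still free; the stack has to fit in here too."""
    return MAX_COM - code_bytes - array_bytes


array_bytes = 32000 * 2           # list dw 32000 dup(0)
print(MAX_COM)                    # 65280
print(room_left(0, array_bytes))  # 1280
```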
geekbasic@gmx.com



Joined: 25 Oct 2022
Posts: 71
Location: Arizona
geekbasic@gmx.com 09 Jan 2023, 19:47
Thank you all for the prompt replies as always. It's very helpful.

1. What I gathered from your responses is that declaring a larger array cannot work, because it holds too much data to be addressed with 16-bit offsets.

2. Using ? instead of 0 in the dup doesn't fill up the COM file, but only if the array is defined at the end of the program.

3. All program data must fit in just under 64 KB.



So, it seems I need to make my compiler put array definitions after all other definitions.

What if I have multiple arrays there? Will that cause the data to be put in the file?
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20754
Location: In your JS exploiting you and your system
revolution 09 Jan 2023, 22:37
Allocate space for uninitialised data at runtime.

Learn to love the segment registers. :)
AsmGuru62



Joined: 28 Jan 2004
Posts: 1738
Location: Toronto, Canada
AsmGuru62 10 Jan 2023, 14:06
You can allocate room for additional data in a COM program starting at segment PSP + 1000h (64 KB past the PSP).
Now, if you need to dynamically allocate and free memory blocks of unknown sizes, it is better to use the memory allocation functions of INT 21H:

AH=48H (allocate memory block)
AH=49H (free memory block)
AH=4AH (resize memory block)

And, as revolution mentioned, there is the pain of segment registers.
The LDS and LES instructions load a far pointer for this. However, LDS overwrites DS, so you lose access to the variables declared in your COM program. What I mean is this:
Code:
mov cx, [your COM variable #1]
...
lds si, [far pointer beyond COM itself]
; do some work with DS:SI pointer

mov dx, [your COM variable #2]     ; <-- this will be wrong if you do not return DS back to a COM image.
    

You can use segment overrides, like this:
Code:
mov dx, [cs:your COM variable #2]     ; CS still points to the COM image
    

Or you can save DS before using LDS and restore it afterwards.

At the end of the 80s I coded so much of that stuff -- still recall the 'fun' of segment registers.
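LDS and LES read a 32-bit far pointer from memory. The word at the lower address is the offset, and the word after it is the segment. This is also why dynamic_array.offset comes before dynamic_array.segment later in the thread. The Python sketch below shows that decoding, using 417D:0000 as an example value.

```python
import struct


def load_far_pointer(memory: bytes, address: int) -> tuple[int, int]:
    """Return (segment, offset) the way LDS/LES read them: two little-endian words."""
    offset, segment = struct.unpack_from("<HH", memory, address)
    return segment, offset


far_ptr = bytes([0x00, 0x00, 0x7D, 0x41])  # offset 0000h, segment 417Dh
segment, offset = load_far_pointer(far_ptr, 0)
print(f"{segment:04X}:{offset:04X}")       # 417D:0000
```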
geekbasic@gmx.com



Joined: 25 Oct 2022
Posts: 71
Location: Arizona
geekbasic@gmx.com 10 Jan 2023, 19:04
Wow, that is amazing information that I had never heard. This is why I love this forum!

When I get back home from the city later today, I will study your posts and come back here with my progress.
macomics



Joined: 26 Jan 2021
Posts: 1197
Location: Russia
macomics 11 Jan 2023, 04:37
I want to clarify something else. Do not forget that in a COM file, by default, the stack is configured at the end of the COM file segment. If the array is located at the end, then it can be overlapped by the program stack and erased during its operation.
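Here is a rough Python check for that collision. It assumes DOS started SP at 0FFFEh, as it does when a full 64 KB is available, and that you know the deepest stack use of your program. The offsets below are made up.

```python
def stack_hits_array(array_start: int, array_size: int,
                     max_stack_bytes: int, initial_sp: int = 0xFFFE) -> bool:
    """True if the stack, growing down from initial_sp, reaches the array."""
    lowest_sp = initial_sp - max_stack_bytes
    return lowest_sp < array_start + array_size


# a 64,000-byte array starting at offset 200h (after PSP + code) ends at FC00h
print(stack_hits_array(0x200, 64000, 1024))  # True: 1 KB of stack overwrites it
print(stack_hits_array(0x200, 64000, 512))   # False
```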
geekbasic@gmx.com



Joined: 25 Oct 2022
Posts: 71
Location: Arizona
geekbasic@gmx.com 18 Jan 2023, 01:16
macomics wrote:
I want to clarify something else. Do not forget that in a COM file, by default, the stack is configured at the end of the COM file segment. If the array is located at the end, then it can be overlapped by the program stack and erased during its operation.


So would it be better to place the array definitions at the top of the program code and use jmp to skip past them?

AsmGuru62 wrote:
You can allocate room for additional data in COM program starting at PSP + 64Kb.
Now, if you need to dynamically allocate and free memory blocks with unknown sizes -- it is better to use functions from INT 21


If I am defining arrays with a constant number of elements, would it be enough to define the additional data at the beginning of the COM file?

What would be a reason to allocate the data dynamically? Please excuse me; this is very difficult for me to understand.
AsmGuru62



Joined: 28 Jan 2004
Posts: 1738
Location: Toronto, Canada
AsmGuru62 18 Jan 2023, 03:24
If the array is static, say 256 16-bit words, then you do not need dynamic allocation.
You can place the array inside the COM segment or outside of it.
The memory outside of the COM segment begins at segment DS + 1000h. For example, if DS = 27B8 then your free memory starts at 37B8:0000. That is a far pointer, which means you need to switch segments again or use a segment override to access these bytes beyond the COM image.
You can put the array at the beginning of the COM file, but then you have to JMP over it to your code, and antivirus tools do not like that: all of your work will 'seem' like malicious code. Remember also that an array declared inside the COM image takes room away from code + stack + other data. If your arrays are big, the best bet is to start at DS + 1000h.

Example: you need two arrays, 40,000 bytes and 16,000 16-bit words (32,000 bytes).
The sizes of both arrays in paragraphs (hex), rounded up by adding one:
40,000 / 16 + 1 = 09C5
32,000 / 16 + 1 = 07D1

The 1st array starts at DS + 1000
The 2nd array starts at DS + 1000 + 09C5
Anything after them (a 3rd array, for example) starts at DS + 1000 + 09C5 + 07D1

Those ^^^ are segment values. The full far pointer will have the offset set to 0000.
Say, DS = 27B8 -- then 2nd array full pointer = 417D:0000 and that value should be stored in a COM memory for easy loading into registers.
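The same layout arithmetic in Python can serve as a sketch for checking numbers like these. The "/ 16 + 1" rule from the post always rounds up, but it costs one spare paragraph when the size is already a multiple of 16. Exact rounding up is shown for comparison.

```python
def paragraphs_exact(size: int) -> int:
    return (size + 15) // 16      # smallest number of 16-byte paragraphs


def paragraphs_post(size: int) -> int:
    return size // 16 + 1         # the "/ 16 + 1" rule used above


def layout(ds: int, sizes: list[int]) -> list[int]:
    """Segments of arrays placed back to back from DS + 1000h; the last entry
    is the first free segment after them."""
    segment = ds + 0x1000         # first paragraph past the 64 KB COM segment
    segments = []
    for size in sizes:
        segments.append(segment)
        segment += paragraphs_post(size)
    segments.append(segment)
    return segments


print([f"{s:04X}" for s in layout(0x27B8, [40000, 32000])])
# ['37B8', '417D', '494E']
```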
geekbasic@gmx.com



Joined: 25 Oct 2022
Posts: 71
Location: Arizona
geekbasic@gmx.com 18 Jan 2023, 05:12
Am I to use int 21 ah=41h for defining the data outside of the com space?

Say I am defining static arrays. How do I determine the value for DS?

Is it possible to declare the data starting just after the 64kb?

What you're saying makes sense to me. It's applying the information that I am having trouble with.
At least I am learning :)
macomics



Joined: 26 Jan 2021
Posts: 1197
Location: Russia
macomics 18 Jan 2023, 05:27
geekbasic@gmx.com wrote:

So would it be better to place definitions of arrays at the top of a program code and use jmp to skip past it?
No. I'm hinting that the stack of a COM program should be moved to an area that is described explicitly and does not overlap any other part of the program. Reserve the stack as an uninitialized array at the end of the program and set SS:SP to it explicitly. Then place your other arrays after the stack.

Code:
format binary as "COM"
org 256
use16
    mov ax, cs
    mov ds, ax
    mov es, ax
; I want to note that I redefine the value of the sp and ss registers. Thus, stack overflow will not erase the program code.
    add ax, stack_start shr 4
    mov ss, ax
    mov sp, stack_size

; ...

    mov ax, 0x4C00
    int 33

; Defining an uninitialized array for the stack immediately after the program code
label stack_start at (($ + 15) and (0 - 16))
stack_size = 1024 ; 1 kb of stack

; And after the array in which the program stack will be located, you can declare a certain number of uninitialized arrays necessary for the program.
label my_array word at (stack_start + stack_size)    
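The values this code puts into SS and SP can be checked with a small Python model. cs and end_of_code are hypothetical run-time and assembly-time values, and align_up mirrors (($ + 15) and (0 - 16)).

```python
def align_up(value: int, boundary: int = 16) -> int:
    return (value + boundary - 1) & -boundary  # (($ + 15) and (0 - 16))


def linear(seg: int, off: int) -> int:
    return ((seg << 4) + off) & 0xFFFFF


def stack_setup(cs: int, end_of_code: int, stack_size: int = 1024):
    stack_start = align_up(end_of_code)
    ss = cs + (stack_start >> 4)          # add ax, stack_start shr 4
    sp = stack_size                       # mov sp, stack_size
    my_array = stack_start + stack_size   # label my_array ... at (stack_start + stack_size)
    return ss, sp, my_array


cs = 0x27B8                               # hypothetical CS at run time
ss, sp, my_array = stack_setup(cs, end_of_code=0x123)  # hypothetical end of code
# the initial top of the stack is exactly where my_array begins
assert linear(ss, sp) == linear(cs, my_array)
```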
macomics



Joined: 26 Jan 2021
Posts: 1197
Location: Russia
macomics 18 Jan 2023, 05:57
geekbasic@gmx.com wrote:
Am I to use int 21 ah=41h for defining the data outside of the com space?
By default, DOS allocates all available memory to a COM program. Therefore, you can use any space beyond the COM segment at your discretion. But remember that you have at most 640 KB of conventional memory.

To use function ah=72 (48h) of int 33 (21h), you first need to release the memory your program does not use. To do that, determine the highest address your program uses for code, initialized data, stack and uninitialized data. Round this address up to a paragraph boundary and convert it to a size in paragraphs counted from the PSP (ES = PSP segment). Then shrink the block to that size with function ah=74 (4Ah) of int 33, which frees all the remaining memory.

Code:
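format binary as "COM"
org 256
use16

stack_size = 1024

; at entry cs = ds = es = ss = PSP segment; the PSP is cs:0x0000..0x00FF, the code starts at cs:0x0100
    mov ax, cs
    mov ds, ax
    mov es, ax                       ; es = PSP segment, the block resized by function 74
    add ax, stack_start shr 4        ; stack_start is paragraph aligned
    mov ss, ax
    mov sp, stack_size

    mov bx, end_of_used_memory       ; new block size in paragraphs, counted from the PSP
    mov ah, 74                       ; 74 = 4Ah, resize memory block
    int 33                           ; free everything after end_of_used_memory
    jc exit

; ...

    mov bx, 0x1000                   ; 64 kb in paragraphs
    mov ah, 72                       ; 72 = 48h, allocate memory block
    int 33                           ; ax = segment of the new block
    jc exit
    mov [dynamic_array.segment], ax

; ...

    les di, [dynamic_array]          ; es:di -> dynamic array
    lds si, [dynamic_array]          ; ds:si -> dynamic array; load ds last, it no longer points to the COM data

; ...

exit:
    mov ax, 0x4C00
    int 33

my_data db 16 dup (0)                ; initialized data (size is only an example)
label dynamic_array dword
dynamic_array.offset dw 0
dynamic_array.segment dw 0
align 16 ; paragraph
stack_start rb stack_size            ; stack, not stored in the file
udata rb 4096                        ; static arrays (size is only an example)
align 16 ; paragraph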
label end_of_used_memory at ($ shr 4)    
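The value passed in BX to function 74 (4Ah) is a block size in paragraphs measured from the PSP, because ES holds the PSP segment. No PSP address is added to it. The Python sketch below shows that conversion. It assumes the block starts at the PSP, as it does for a COM program, and the offsets are only examples.

```python
def resize_paragraphs(end_offset: int) -> int:
    """BX for function 4Ah when the program uses offsets 0 .. end_offset - 1
    counted from the start of its PSP."""
    return (end_offset + 15) >> 4   # round up to whole paragraphs


print(hex(resize_paragraphs(0x1534)))   # 0x154
print(hex(resize_paragraphs(0x10000)))  # 0x1000: the whole 64 KB segment
```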


geekbasic@gmx.com wrote:
Say I am defining static arrays. How do I determine the value for DS?
I have already shown an example of their definition in the post above, but you can also define the stack and uninitialized data inside a virtual block:

Code:
format binary as "COM"
org 256
use16

stack_size = 1024

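    mov ax, cs
    mov ds, ax
    mov es, ax                ; es = PSP segment, needed by function 74
    add ax, stack_start shr 4 ; stack_start is paragraph aligned
    mov ss, ax
    mov sp, stack_size
    push 0                    ; a final ret would jump to cs:0000 (int 20h in the PSP)
    mov bx, end_of_used_memory
    mov ah, 74                ; 74 = 4Ah, resize the program's block
    int 33 ; free the rest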

; ...

    mov ax, 0x4C00
    int 33

virtual at ((0 - 16) and ($ + 15))
   stack_start dw (stack_size / 2) dup (0)
   static_array1 dw 100 dup (0)
   static_array2 db 1024 dup (0)
   static_array3 dw 5120 dup (0)
   label end_of_used_memory at (($ + 15) shr 4)
end virtual    
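Here is a Python model of where the virtual block above places things. For illustration only, it assumes the code ends at offset 130h. The sizes follow the dw/db declarations, so dw 100 dup (0) is 200 bytes.

```python
def virtual_layout(end_of_code: int, stack_size: int, arrays):
    """Offsets (from CS) of the stack and the arrays, laid out back to back as in
    the virtual block, plus end_of_used_memory in paragraphs."""
    offset = (end_of_code + 15) & -16           # virtual at ((0 - 16) and ($ + 15))
    offsets = {"stack_start": offset}
    offset += stack_size
    for name, size in arrays:
        offsets[name] = offset
        offset += size
    return offsets, (offset + 15) >> 4          # (($ + 15) shr 4)


offsets, end_of_used_memory = virtual_layout(
    0x130, 1024,                                # hypothetical end of code
    [("static_array1", 100 * 2), ("static_array2", 1024), ("static_array3", 5120 * 2)],
)
print({name: hex(off) for name, off in offsets.items()})
print(hex(end_of_used_memory))                  # 0x320
```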


geekbasic@gmx.com wrote:
Is it possible to declare the data starting just after the 64kb?
It is possible.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20754
Location: In your JS exploiting you and your system
revolution 18 Jan 2023, 08:58
You don't need to use any DOS calls either. For a simple app you can just start using memory. For example this code assembles to 41 bytes and defines arrays of more than 200kB.
Code:
org 0x100

        mov     sp,top_of_stack
        mov     ax,data_1 shr 4
        mov     dx,cs
        add     ax,dx
        mov     ds,ax
        ; use data_1
        mov     ax,data_2 shr 4
        mov     dx,cs
        add     ax,dx
        mov     ds,ax
        ; use data_2
        mov     ax,data_3 shr 4
        mov     dx,cs
        add     ax,dx
        mov     ds,ax
        ; use data_3
        mov     ax,data_4 shr 4
        mov     dx,cs
        add     ax,dx
        mov     ds,ax
        ; use data_4
        int     0x20

        align   2
        rb      1 shl 10
top_of_stack:

        align   0x10
data_1: rb      1 shl 14
data_2: rb      1 shl 15
data_3: rb      1 shl 16
data_4: rb      1 shl 17    
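The addresses in this example can be worked out the same way. The Python sketch below assumes the 41-byte code size stated above. It also shows that data_4, at 128 KB, is larger than one 64 KB window. With DS set to its segment, offsets 0 to FFFFh reach only its first half, and the second half needs DS + 1000h.

```python
def align_up(value: int, boundary: int) -> int:
    return (value + boundary - 1) & -boundary


def revolution_layout(code_bytes: int = 41):
    here = 0x100 + code_bytes                # org 0x100, then the 41 bytes of code
    here = align_up(here, 2) + (1 << 10)     # align 2 / rb 1 shl 10
    top_of_stack = here
    here = align_up(here, 0x10)              # align 0x10
    labels = {}
    for name, shift in (("data_1", 14), ("data_2", 15), ("data_3", 16), ("data_4", 17)):
        labels[name] = here
        here += 1 << shift
    return top_of_stack, labels, here


top_of_stack, labels, end = revolution_layout()
for name, offset in labels.items():
    print(name, hex(offset >> 4))            # value of "data_N shr 4" added to CS
print(end - labels["data_1"])                # 245760 bytes of arrays
```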
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8465
Location: Kraków, Poland
Tomasz Grysztar 18 Jan 2023, 11:27
macomics wrote:
To use the function ah=72/int 33, you first need to free all unused COM memory by the program. To free memory, you first need to determine the most senior address used by your program for code, initialized data, stack, and uninitialized data. Then you align this address by the value of the paragraph, add the PSP address of your program to it and free all the remaining memory by calling the function ah=74/int 33.
There is also another option: convert the program to the MZ .EXE format. The MZ header has a field that specifies the maximum amount of memory "cushion" DOS may allocate in addition to the memory required by the program.

It is very easy to add an MZ header to .COM program in such way, that the program itself does not require any modifications. You can do it like this:
Code:
format MZ
segment program ; label the segment containing program code
PSP = program - 10h ; PSP is the 256 bytes immediately before, which is 16 paragraphs
; set up initial CS:IP the same way .COM program does:
entry PSP:100h
; similarly set up SS:SP
stack PSP:0FFFEh

; now proceed with the text of .COM program, no alterations needed:

org 100h

        int     20h

; finally, reserve the remainder of the segment, otherwise stack could end up in unallocated memory:
rb 10000h - $    
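The equivalence between PSP:100h and the start of the program segment is plain real-mode arithmetic. Here is a Python check, where program stands for whatever segment DOS loads the image at. The value used here is made up.

```python
def linear(seg: int, off: int) -> int:
    return ((seg << 4) + off) & 0xFFFFF   # real-mode address, wraps at 1 MB


program = 0x1234            # hypothetical load segment of the MZ image
psp = program - 0x10        # PSP = program - 10h

# entry PSP:100h is the first byte of the program segment ...
assert linear(psp, 0x100) == linear(program, 0)
# ... and stack PSP:0FFFEh lies inside the 64 KB that rb 10000h - $ reserves
assert linear(psp, 0xFFFE) == linear(program, 0xFEFE)
```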
Once you're using the MZ format, the "heap 0" setting tells DOS to allocate only as much memory as the structures defined by the program need:
Code:
format MZ
segment program
PSP = program - 10h
entry PSP:100h
stack PSP:0FFFEh

heap 0 ; if you remove this line, there will be no memory available to allocate without freeing it first

org 100h

        mov     bx,1000h
        mov     ah,48h
        int     21h
        jc      failure
        mov     dx,_success
        jmp     summary
failure:
        mov     dx,_failure
summary:
        mov     ah,9
        int     21h
        int     20h

_success db "Memory allocation succeeded",13,10,"$"
_failure db "Memory allocation failed",13,10,"$"

rb 10000h - $ ; reserve the stack space    
When using MZ format, you can also define additional segments and reserve more memory directly in the program text. I wouldn't recommend doing that with .COM format, because you don't know how much memory is present after your initial segment - there may be none, especially if your program was loaded in high memory for some reason. But when you reserve more memory in MZ format, DOS is not going to execute it unless there is enough memory for everything declared.

And also, the MZ format allows actual code and data to exceed the 64k limit, although you still need to segment them unless you are using FRM. And remember to switch to function 4Ch of int 21h instead of int 20h when you start making programs spanning more than a single segment. There are a lot of topics we could touch on here; perhaps I should not jump the gun.
geekbasic@gmx.com



Joined: 25 Oct 2022
Posts: 71
Location: Arizona
geekbasic@gmx.com 22 Jan 2023, 01:45
Thank you for the responses. I am doing my best to understand them all individually.

Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.