flat assembler
Message board for the users of flat assembler.

Segments
Teehee 27 May 2010, 18:38
I need to understand the segments.

Memory is split into segments. Segments overlap. Each application has its own segments (CS, DS, ES, etc.). Each segment has its own offsets (seg:off, e.g. cs:0002h).

Questions:
1. Why do I need (or not need) to put code and data (and anything else) in different segments?
2. If segments overlap, can code working in one segment change important (and required) data in another one? If yes, how do I control that? If not, why not?
3. A binary file is an array of bytes on disk, like:
Code:
B8 00 B3 01 C5 00 90 90 ...    

When we put them (the bytes) in different segments, they are loaded into different segments, so there is no longer a linear array of bytes? Like:

Code:
memory: (lets assume CS=0000h, DS=0100h)
(0000:0000h) B8 00 B3 01
;     offset:0h 1h 2h 3h
; jmp 100h segments ahead:
(0100:0000h) C5 00 90 90
;     offset:0h 1h 2h 3h

## Graphical display ##
+----+
| B8 | : 0000:0000h (our CS)
| 00 |
| B3 |
| 01 |
| .. |
| .. |
+----+
| .. | : 0050:0000h (any other segment)
| .. |
| .. | ; etc.
+----+
| C5 | : 0100:0000h (our DS)
| 00 |
| 90 |
| 90 |
| .. |
| .. |
+----+
    

4. If (3) is true, where in the binary file does it say which byte must be loaded into which segment?

Thanks.

baldr 27 May 2010, 19:39
Teehee,

I assume that you're talking about real-address mode (or the virtual-8086 feature of protected mode).

Memory consists of bytes (the byte is the minimal addressable unit). To address a particular byte in real-address/virtual-8086 mode you need a combination of two components, a segment selector and an effective address, which translate to a physical address by simple arithmetic: selector * 16 + effective address (the result is a linear address in virtual-8086 mode with paging enabled). The same physical/linear address can have different logical representations (aliasing). That's all.
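To make the arithmetic and the aliasing concrete, here is a minimal sketch in Python. It is only a model: the helper name physical is made up for this post, and the 20-bit wrap assumes an 8086-style address bus (later CPUs with the A20 line enabled do not wrap).

```python
def physical(segment: int, offset: int, address_bits: int = 20) -> int:
    """Translate a real-mode segment:offset pair to a physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # segment * 16 + offset, wrapped to the assumed width of the address bus
    return ((segment << 4) + offset) & ((1 << address_bits) - 1)


# Aliasing: three different logical addresses, one physical byte.
print(hex(physical(0x0100, 0x0000)))  # 0x1000
print(hex(physical(0x00FF, 0x0010)))  # 0x1000
print(hex(physical(0x0000, 0x1000)))  # 0x1000
```

Since any segment value that differs by 1 moves the window by only 16 bytes, most physical bytes can be reached through many segment:offset pairs.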

1. You can use segment registers as you wish. Just make sure that your logical address (segment selector in corresponding register and effective address) is right.

2. Yes, your program can trash anything addressable. Follow the rule I've mentioned in (1).

3. Memory is a contiguous array of bytes (though some of them may not be physically present); it is real-address mode addressing that causes the confusion with segment registers. When you read a file from disk, you put its contents into particular bytes at particular addresses. If you do it so that adjacent bytes of the file are written to adjacent bytes in memory, they stay that way.

4. The memory layout of a file image depends on how you load it. For example, when you use DOS service 0x4B "EXEC" (int 0x21, ah==0x4B) with al==1 (load but don't execute) on an MZ EXE file, DOS loads the file according to its MZ header (it skips the header and patches the image using the relocation table, for example) at an address that it chooses itself. On the other hand, DOS service 0x3F "READ" puts the file contents into memory as is, starting at the address you specify.
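The relocation step mentioned in (4) can be sketched roughly like this in Python. This is not DOS's actual loader, only a model of the patching: the function name and the five sample bytes are hypothetical, and it assumes the usual MZ convention that each relocation entry is an offset:segment pair, relative to the start of the loaded image, pointing at a 16-bit word that needs the load segment added.

```python
import struct


def apply_relocations(image: bytearray, relocations, load_segment: int) -> None:
    """Add load_segment to every 16-bit word named by the relocation table."""
    for offset, segment in relocations:
        # Position of the word inside the loaded image
        position = segment * 16 + offset
        (value,) = struct.unpack_from("<H", image, position)
        struct.pack_into("<H", image, position, (value + load_segment) & 0xFFFF)


# Hypothetical image: EA 10 00 02 00 is "jmp 0002:0010";
# the segment word of the far pointer sits at image offset 3.
image = bytearray(b"\xEA\x10\x00\x02\x00")
apply_relocations(image, [(3, 0)], load_segment=0x1234)
print(image.hex(" "))  # ea 10 00 36 12
```

After patching, the jump targets segment 0x1236, i.e. the place where segment 2 of the file image actually ended up in memory. A plain READ does none of this, which is why the same bytes land in memory unchanged.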



Teehee 04 Jun 2010, 14:46
When a program starts to execute, do DS, CS and SS already contain their respective values, or must I set them myself?



edfed 04 Jun 2010, 16:56
When a program starts, the segment registers may or may not be set to specific locations, depending on the OS.


One thing: memory is not really split into segments.
Memory is just a contiguous array of PHYSICAL bytes: up to 1 MB in real mode, up to 4 GB in protected mode, up to 64 GB with PAE, and more in long mode, where the limit depends on the CPU (1 TB on the first AMD64 processors).

Segments are there to index the memory.
In real mode, the 16-bit segment value is multiplied by 16 and added to the offset, because 16-bit addresses alone are not enough to reach the full megabyte.

In protected mode, we use segment descriptors to DESCRIBE memory zones: where each one is, how many bytes it has, and some attribute flags.
All modes after protected mode are variations of it.
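As a rough illustration of what a descriptor "describes", here is a Python sketch that splits an 8-byte code/data segment descriptor into base, limit and flags. The function name and the dictionary keys are made up for this example; the bit positions follow the standard descriptor layout, and the sample value is the commonly used flat 4 GB code descriptor.

```python
def decode_descriptor(raw: int) -> dict:
    """Split a 64-bit code/data segment descriptor into its fields."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)        # 20-bit limit
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)  # 32-bit base
    access = (raw >> 40) & 0xFF   # present bit, DPL, type
    flags = (raw >> 52) & 0xF     # G, D/B, L, AVL
    granular = bool(flags & 0x8)  # G=1: limit counts 4 KB pages
    size = (limit + 1) * 4096 if granular else limit + 1
    return {"base": base, "limit": limit, "access": access,
            "flags": flags, "size_bytes": size}


d = decode_descriptor(0x00CF9A000000FFFF)
print(hex(d["base"]), hex(d["limit"]), d["size_bytes"])  # 0x0 0xfffff 4294967296
```

Compare this with real mode, where the segment register holds the (shifted) base itself and there is no limit or access check at all.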



baldr 04 Jun 2010, 17:48
Teehee,

Program startup has one common property: cs:(e)ip is the entry point. The contents of the other registers depend on the OS and the executable file format.

For a DOS .COM program, cs==ds==es==ss==PSP segment selector.
For an MZ .EXE, the header contains values for cs and ss relative to the segment where the image is loaded; ds and es hold the PSP segment selector.
A GUI/console PE .EXE executes under the flat memory model, thus cs contains the selector for the flat code segment, ds==es==ss contain the selector for the flat data segment, and fs contains the selector for a special segment holding the TEB structure of the primary thread (this describes 32-bit Windows).
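For the MZ case, a small Python sketch of how the startup cs:ip and ss:sp could be derived from the header. The e_* names are the conventional ones for the first 14 header words; the function name, the sample header values and the load segment 0x2000 are assumptions made up for the illustration.

```python
import struct

MZ_FIELDS = ("e_magic", "e_cblp", "e_cp", "e_crlc", "e_cparhdr",
             "e_minalloc", "e_maxalloc", "e_ss", "e_sp", "e_csum",
             "e_ip", "e_cs", "e_lfarlc", "e_ovno")


def initial_registers(header: bytes, load_segment: int) -> dict:
    """Compute the startup CS:IP and SS:SP of an MZ executable."""
    fields = dict(zip(MZ_FIELDS, struct.unpack_from("<14H", header)))
    if fields["e_magic"] != 0x5A4D:  # the bytes "MZ" read as a little-endian word
        raise ValueError("not an MZ executable")
    # e_cs and e_ss are relative to the segment where the image was loaded
    return {"cs": (fields["e_cs"] + load_segment) & 0xFFFF, "ip": fields["e_ip"],
            "ss": (fields["e_ss"] + load_segment) & 0xFFFF, "sp": fields["e_sp"]}


# Hypothetical header: e_ss=5, e_sp=0x100, e_ip=0, e_cs=1
header = struct.pack("<14H", 0x5A4D, 0, 0, 0, 2, 0, 0xFFFF,
                     0x0005, 0x0100, 0, 0x0000, 0x0001, 0x1C, 0)
print({k: hex(v) for k, v in initial_registers(header, 0x2000).items()})
# {'cs': '0x2001', 'ip': '0x0', 'ss': '0x2005', 'sp': '0x100'}
```

The header never contains absolute segments, because the file cannot know where DOS will load it.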



Teehee 07 Jun 2010, 16:20
Thanks guys.

Just a little question now. If I change, for example, the CS value, will the processor stop running my code and continue at the place I set in CS? Without a jump being required?

ex:
Code:
org 100h

   xor eax, eax

   mov ax,200h ; any address 
   push ax
   pop cs

   ; CPU says: I'm not here anymore

   xor ebx,ebx  ; ignored line

   ; ignored everything else
   ; .
   ; .
   ; .
    



edfed 07 Jun 2010, 16:40
You cannot load the CS register directly (reading it is fine); only control transfers such as far jmp, retf and int can change it.



baldr 07 Jun 2010, 16:43
Teehee,

Read the manual: pop cs has no encoding on current processors (the byte 0x0F is actually the first byte of many two-byte opcodes). mov cs, r/m16 results in #UD.


revolution 07 Jun 2010, 16:59
Teehee wrote:
Just a little question now. If I change, for example, the CS value, will the processor stop running my code and continue at the place I set in CS? Without a jump being required?
Well, technically changing CS is a jump. Just because the opcode mnemonic isn't jmp doesn't mean it is not a jump. ret is another example of a jump in disguise.



Teehee 07 Jun 2010, 17:19
oh.. ok..

But then why does fasm assemble pop cs?

So I can change any segment except CS?

When building an OS, I will control CS through the GDT/LDT, right?


revolution 07 Jun 2010, 17:55
Teehee wrote:
But then why does fasm assemble pop cs?
8086 compatibility: on the 8086/8088 the opcode 0x0F was pop cs.
Teehee wrote:
So I can change any segment except CS?
You can change all segment registers including CS. RETF, JMP far and CALL far can change CS.
Teehee wrote:
When building an OS, I will control CS through the GDT/LDT, right?
The app can still change CS if your OS is in real mode. And the app can still change CS in protected mode if you set the descriptor rights appropriately to allow that.



bitshifter 08 Jun 2010, 03:02
Teehee wrote:


So I can change any segment except CS?

When building an OS, I will control CS through the GDT/LDT, right?


You are free to do whatever you want with CS, just make sure there is code waiting at the destination.

Let's assume we have loaded some code at 0x0050:0x0000 and want to transfer control to it.

Example using retf
Code:
push 0x0050 ;segment
push 0x0000 ;offset
retf
    


Example using far jmp
Code:
jmp 0x0050:0x0000
    


Example using far call
Code:
call 0x0050:0x0000
    


Once you learn the differences between the three, you will easily choose
the correct one for the given circumstance.

Note: a far call pushes a return address (CS, then IP) onto the stack, where the others do not.
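If it helps to see the stack traffic, here is a toy Python model of the three transfers. It is only a sketch: the Cpu class and its method names are invented for this post, it tracks nothing but CS:IP and a list of 16-bit words, and the starting address 1000:0100 is arbitrary.

```python
class Cpu:
    """Toy model of CS:IP plus a stack of 16-bit words (no memory, no flags)."""

    def __init__(self, cs: int, ip: int):
        self.cs, self.ip = cs, ip
        self.stack = []  # the last element is the top of the stack

    def push(self, word: int) -> None:
        self.stack.append(word & 0xFFFF)

    def far_jmp(self, segment: int, offset: int) -> None:
        self.cs, self.ip = segment, offset  # nothing is saved

    def far_call(self, segment: int, offset: int, return_ip: int) -> None:
        self.push(self.cs)     # return segment goes first
        self.push(return_ip)   # then the offset of the next instruction
        self.far_jmp(segment, offset)

    def retf(self) -> None:
        self.ip = self.stack.pop()  # offset comes off first
        self.cs = self.stack.pop()  # then the segment


# push 0x0050 / push 0x0000 / retf behaves like a far jump
cpu = Cpu(0x1000, 0x0100)
cpu.push(0x0050)
cpu.push(0x0000)
cpu.retf()
print(f"{cpu.cs:04X}:{cpu.ip:04X}")  # 0050:0000

# a far call (5 bytes long) followed by retf in the callee comes back
cpu = Cpu(0x1000, 0x0100)
cpu.far_call(0x0050, 0x0000, return_ip=0x0105)
cpu.retf()
print(f"{cpu.cs:04X}:{cpu.ip:04X}")  # 1000:0105
```

So the retf trick and the far jmp are one-way trips, while the far call leaves enough on the stack for the target to return with its own retf.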
