flat assembler
Message board for the users of flat assembler.

Index > OS Construction > My boot code can't read the drive

MegaBrutal 09 Jul 2009, 17:15
Hi all,

I've started writing my own little OS. I'm right at the beginning, but I'm already stuck.

Using the FAT file system would be too complicated for me, so I designed my own primitive file system. It is easy to read because it doesn't even have a cluster chain, and it can only be used on floppies (it has a fixed size).

The boot loader searches the file system, loads KERNEL.BIN, and then jumps to the kernel. In its current state, the kernel just sets up its INT handler, prints a message to indicate that it executed successfully, waits for a key, and then reboots the machine.

Everything works when I boot the floppy in any of the following ways:
- In a VMware virtual machine.
- In QEMU.
- On a real machine, from the GRUB command line:
rootnoverify (fd0)
chainloader +1
boot

So the kernel gets loaded and invoked correctly in all three of these situations. BUT when I boot the floppy directly on a real machine (when the BIOS loads the boot record after POST), my boot code fails. I've tried everything I can think of. I have no idea what the problem is.

Things I've figured out:
- The boot sector gets loaded correctly. I'm sure of that, because it prints a message on the screen.
- The floppy doesn't have any bad sectors.
- Thanks to the many debug markers I've added, I know that even the very first read attempt fails. (The boot loader prints a dot after reading each sector, but it doesn't print a single dot.)
- I've even added a drive reset (XOR AH, AH - INT 13h) that executes before the first read attempt, although I don't see the point of it.
- I know that INT 13h can occasionally fail to read a sector from the floppy, and that in these cases the read should be retried after a drive reset. If I do that, I get an infinite loop, and I just hear the floppy drive buzzing.
- I've given SP an initial value to make sure that I don't interfere with the stack. I don't know what the initial value of SP is when the boot loader is invoked.
- Meanwhile, I've modified the code to read only one sector at a time, but it still doesn't work.

I suppose I need to do something to the floppy drive before the first read attempt, but a reset doesn't help. I have no idea what's wrong, because I also wrote a boot record a year ago, and it worked, although it read far fewer sectors. I've even taken a look at the FreeDOS boot code for FAT12/16, but I can't see why that works and mine doesn't.

I hope you can point out what I missed, since you are much more experienced than I am. This is only the beginning of my first OS.

Thanks,
MegaBrutal
bitRAKE 09 Jul 2009, 17:45
Post your code.

You can use a far jump to set CS:IP to known values, and then put SS:SP in a safe place. One machine here puts SS:SP at the end of the BIOS save area, and that is a problem if it is not moved.

I'd debug the read parameters.
MegaBrutal 09 Jul 2009, 18:18
OK, I'll post the code. Sorry that it's really messy, especially since I added the debug messages.

There is an LBA to CHS conversion, which my old code from a year ago didn't have. It's possible that it fails, but
a, it ran without problems under VMware, QEMU, and even when I booted it from GRUB.
b, I even debugged this code under DOS DEBUG, and it was successful (of course I had to change the segments).

The code does assume that it is loaded at 7C00h, but afaik boot loaders get loaded to this address.

P.S.: I hope you won't kill me because the code is written for NASM. I realize that this is a FASM forum, but my problem is not related to the assembler.


Description: Messy bootstrap loader.
Download
Filename: primboot.asm
Filesize: 8.27 KB
Downloaded: 364 Time(s)

MegaBrutal 09 Jul 2009, 18:21
I'm also posting the cleaner version of the boot loader. Maybe that's easier to read. I've made many, possibly unnecessary, modifications to the code since this version while trying to find the problem.


Description: Cleaner version of my bootstrap loader.
Download
Filename: primboot.bak.asm
Filesize: 5.76 KB
Downloaded: 359 Time(s)

bitRAKE 09 Jul 2009, 21:30
ORG 0000:7C00 is not a valid assumption - some BIOSes jump to 07C0:0000.
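
To illustrate the point, here is a minimal Python sketch of real-mode address arithmetic. It is only a model, not boot code: the function name `physical_address` is made up for this example, and A20 wrap-around is ignored. Both CS:IP pairs name the same byte in memory, but offsets computed for ORG 7C00h only match when the segment registers hold 0, which is why an early far jump to known values is commonly suggested.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: physical = segment * 16 + offset."""
    return (segment << 4) + offset


# The two entry conventions a BIOS may use for the boot sector.
flat_style = physical_address(0x0000, 0x7C00)
segmented_style = physical_address(0x07C0, 0x0000)

# Same physical byte, different CS:IP pair.
print(hex(flat_style), hex(segmented_style), flat_style == segmented_style)
# prints: 0x7c00 0x7c00 True
```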
egos 09 Jul 2009, 21:39
For a start:
- set up the SS register.
- retry the read after an error is returned; do this several times.
- don't read sectors located on different tracks in a single call.
- ensure that a buffer does not cross a 64 KB boundary.
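
The retry advice in the list above can be sketched roughly as follows. This is a Python model of the control flow only, not boot code. The callables `read_sector` and `reset_drive` are hypothetical stand-ins for INT 13h/02h and INT 13h/00h, and the attempt count of 3 is an assumption. The point it shows is that the attempt budget starts fresh for every sector.

```python
from typing import Callable


def read_with_retry(read_sector: Callable[[int], bool],
                    reset_drive: Callable[[], None],
                    lba: int, max_attempts: int = 3) -> bool:
    """Try one sector up to max_attempts times, resetting the drive after each failure."""
    for _ in range(max_attempts):
        if read_sector(lba):
            return True
        reset_drive()
    return False


def load_sectors(read_sector: Callable[[int], bool],
                 reset_drive: Callable[[], None],
                 first_lba: int, count: int, max_attempts: int = 3) -> bool:
    """Read sectors one at a time; each sector gets its own attempt budget."""
    for lba in range(first_lba, first_lba + count):
        if not read_with_retry(read_sector, reset_drive, lba, max_attempts):
            return False
    return True


# Fake drive for demonstration: every sector fails once, then succeeds.
attempts = {}

def flaky_read(lba: int) -> bool:
    attempts[lba] = attempts.get(lba, 0) + 1
    return attempts[lba] >= 2


print(load_sectors(flaky_read, lambda: None, first_lba=1, count=24))  # prints: True
```

If the counter were shared across sectors instead, a drive that needs one retry per sector would exhaust it after a few sectors even though every individual read eventually succeeds.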
MegaBrutal 09 Jul 2009, 22:37
I've modified my code according to your suggestions.
But the issue is still the same. :S

Now it sets SS, jumps to segment 0 at the beginning, and reads sectors one by one. It still works on virtual machines, and GRUB can still boot it, but the real BIOS still fails to boot it on both of my machines.

P.S.: I'm still a bit suspicious about my LBA to CHS conversion code, but I don't understand what it could break that would only affect direct BIOS boots. Maybe I shouldn't mess with 32-bit registers? But then why does it boot from GRUB? I also don't understand why my old ShitOS boot record from a year ago worked. I can't use that one, since I've come up with a very different concept for the file system. By the way, I'm attaching that as well. Maybe someone will notice what I got right in it.


Description: Old boot sector, written about a year ago - why does this work?
Download
Filename: sector0.asm
Filesize: 2.25 KB
Downloaded: 329 Time(s)

Description: New boot asm
Download
Filename: primboot.asm
Filesize: 8.38 KB
Downloaded: 367 Time(s)

bitRAKE 10 Jul 2009, 00:06
What prints on your screen? You said the first "." doesn't appear, so it looks like this:

Loading ShitOS...
L:FS


...and then nothing, right?
Mac2004 10 Jul 2009, 04:02
MegaBrutal: Here's my example, which performs basic loading of a binary chunk from a floppy disk. I hope this helps you a bit.

http://board.flatassembler.net/topic.php?t=6529

Regards
Mac2004
MegaBrutal 10 Jul 2009, 08:35
bitRAKE: It looks like this, when it works:

Code:
Loading ShitOS... L:FS
........................C:FST
F:KER
L:KER
........................J:KER
HAAAALIHOW!!!    


The last line is printed by the kernel startup code.

When it doesn't work:

Code:
Loading ShitOS... L:FS
Operation failed!    


Before the "Operation failed!" message appears, you can clearly hear the floppy drive's 10 read attempts.
egos 10 Jul 2009, 11:40
Code:
%define FILESYS_SEG     07F0h
%define KERNEL_SEG      0AF0h
    

Align your buffers on 512-byte boundaries; otherwise the following requirement may be violated:
- ensure that a buffer does not cross a 64 KB boundary.

Code:
        DEC     BYTE [_RD_FAILCOUNT]
        CMP     [_RD_FAILCOUNT], BYTE 0
        JNZ     _DO_READ
        JMP     FAIL
_READ_SUCCESS:
        ADD     DI, 200h
    

Try saving the value of DI around the int 13h call. Also, the CMP instruction is not required :)
MegaBrutal 10 Jul 2009, 11:53
egos wrote:
Align your buffers on 512-byte boundaries; otherwise the following requirement may be violated:
- ensure that a buffer does not cross a 64 KB boundary.


I don't really get it. I thought that if I define a new segment, I don't cross 64K boundaries. I thought you were talking about segment boundaries, which I obviously don't cross. The data gets read to 07F0h:0000h, and 24 sectors don't take up 64 kilobytes.

As for the CMP: I didn't think that DEC sets flags. But it obviously does:
http://faydoc.tripod.com/cpu/dec.htm
I wonder how many needless instructions I've put into my code because of this kind of ignorance.
revolution 10 Jul 2009, 12:36
MegaBrutal: Don't rely on random web pages for information about the instructions. Download the definitive manuals from Intel or AMD, or both. Don't settle for anything less than the original source. And it is all free.
egos 10 Jul 2009, 13:55
MegaBrutal wrote:
I don't really get it. I thought that if I define a new segment, I don't cross 64K boundaries. I thought you were talking about segment boundaries, which I obviously don't cross. The data gets read to 07F0h:0000h, and 24 sectors don't take up 64 kilobytes.
I meant the boundaries in the physical address space. However, you don't actually reach the first 64 KB boundary:

0x7F00 + 512*24 < 0x10000
0xAF00 + 512*24 < 0x10000

But don't forget about this in the future.
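
The check above can be written as a small Python sketch. It is a model of the arithmetic, not boot code, and the helper name `crosses_64k` is made up for this example. It converts segment:offset to a physical address and compares the 64 KB page of the first and last byte of the transfer.

```python
SECTOR_SIZE = 512


def crosses_64k(segment: int, offset: int, length: int) -> bool:
    """True if a transfer of `length` bytes straddles a 64 KB physical boundary."""
    start = (segment << 4) + offset      # physical address of the first byte
    end = start + length - 1             # physical address of the last byte
    return (start >> 16) != (end >> 16)  # compare 64 KB page numbers


# The two buffers from this thread, 24 sectors each.
print(crosses_64k(0x07F0, 0x0000, 24 * SECTOR_SIZE))  # prints: False
print(crosses_64k(0x0AF0, 0x0000, 24 * SECTOR_SIZE))  # prints: False

# A counterexample: one sector starting at physical 0xFF00 spills past 0x10000.
print(crosses_64k(0x0FF0, 0x0000, SECTOR_SIZE))       # prints: True
```

Because 65536 is a multiple of 512, a single-sector read into a buffer whose physical address is 512-byte aligned can never straddle a boundary, which is the reason for the alignment advice.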

Reset _RD_FAILCOUNT when you read each sector separately. And what about DI?
MegaBrutal 10 Jul 2009, 14:18
egos wrote:
I meant the boundaries in the physical address space. However, you don't actually reach the first 64 KB boundary:

0x7F00 + 512*24 < 0x10000
0xAF00 + 512*24 < 0x10000

But don't forget about this in the future.
I didn't know about that. :S I thought ES:BX could point anywhere within conventional memory.
egos wrote:
Reset _RD_FAILCOUNT when you read each sector separately. And what about DI?
Thanks, _RD_FAILCOUNT really does need to be initialized on each READ_DRIVE call.

As for DI: afaik, INT 13h/02h does nothing with DI.
http://www.ctyme.com/intr/rb-0607.htm
ES:BX defines the buffer for it, and it doesn't return anything in DI. But even if it did clobber DI, the first read attempt should succeed anyway, and it would also fail under VMware, QEMU, and GRUB. Note that GRUB is the strangest case. I can imagine that the VMware and QEMU BIOSes are quite different from my real BIOS, but in the GRUB case it is my real BIOS doing the work. Or does GRUB install some extra support for INT 13h/02h? I think it's unlikely, but possible.
MegaBrutal 10 Jul 2009, 15:25
I don't know why I didn't do this before, but I modified the code to print the error code it gets from INT 13h. Interesting: it returns error 9, which means "data boundary error (attempted DMA across 64K boundary or >80h sectors)".

So egos, you were really close with the 64K boundary. Though I still don't get it: where would I cross that boundary? As we calculated, I don't cross it. I also don't read more than 128 sectors, since I read them one by one, and I would only read 24 anyway. And if I were somehow accidentally corrupting the registers (BX or AL), that would also cause an error on virtual machines and under GRUB.

Basically, I can't imagine anything that would prevent my BIOS from reading those sectors but would still let it boot via GRUB.

Even more interesting: under FreeDOS DEBUG, this very same code could read without any problems into weird segments like 5000h that even cross the 64K boundary. Or does FreeDOS install some fixes for INT 13h?
egos 10 Jul 2009, 18:10
MegaBrutal wrote:
As for DI - afaik, INT 13h/02h does nothing with DI.
Are you sure? Check this first.

What does this mean:
Code:
MOV     EBX, 002015012h
                ==
    
Your conversion code is too complicated, partly because you are using overly complicated conversion formulas. Here are some simpler ones:

S = N mod SPT + 1
T = N div SPT

H = T mod Heads
C = T div Heads
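
As a worked example of these formulas, here is a short Python sketch. It is a model of the arithmetic only. The function name `lba_to_chs` and the default geometry of 18 sectors per track and 2 heads (a 1.44M floppy) are assumptions for illustration.

```python
def lba_to_chs(lba: int, spt: int = 18, heads: int = 2) -> tuple:
    """Apply S = N mod SPT + 1, T = N div SPT, H = T mod Heads, C = T div Heads."""
    track, sector0 = divmod(lba, spt)       # T and the zero-based sector
    cylinder, head = divmod(track, heads)   # C and H
    return cylinder, head, sector0 + 1      # sectors are numbered from 1


print(lba_to_chs(0))     # prints: (0, 0, 1)   the boot sector
print(lba_to_chs(18))    # prints: (0, 1, 1)   first sector on the second head
print(lba_to_chs(36))    # prints: (1, 0, 1)   first sector of cylinder 1
print(lba_to_chs(2879))  # prints: (79, 1, 18) last sector of a 1.44M floppy
```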

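What is more, for double-sided floppies the conversion code can be simplified even further:
Code:
; ax <- linear sector number
; cl <- SPT
; dl <- floppy disk number (<80h)
; es:bx <- destination buffer (used by int 13h/02h)

  and     dx, 7Fh   ; dh <- 0, dl.7 <- 0

  div     cl        ; al <- track (N div SPT), ah <- N mod SPT
  mov     ch, al    ; ch <- track
  mov     cl, ah    ; cl <- zero-based sector
  shr     ch, 1     ; ch <- cylinder (T div 2), CF <- head (T mod 2)
  adc     dh, dh    ; dh <- head (dh was 0, so dh = CF)
  inc     cx        ; cl <- one-based sector
  mov     ax, 0201h ; ah <- 02h (read sectors), al <- 1 sector
  int     13h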
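
To check that this register sequence agrees with the general formulas for a two-headed drive, the steps can be emulated in Python. This is only a model: the names `register_chs` and `formula_chs` are made up for this example, and the geometry of 18 sectors per track is assumed.

```python
def formula_chs(lba: int, spt: int = 18, heads: int = 2) -> tuple:
    """Reference result from the div/mod formulas: (cylinder, head, sector)."""
    track, sector0 = divmod(lba, spt)
    cylinder, head = divmod(track, heads)
    return cylinder, head, sector0 + 1


def register_chs(ax: int, spt: int, dl: int) -> tuple:
    """Emulate the register sequence; returns (ch, dh, cl, dl)."""
    dl &= 0x7F                               # and dx, 7Fh: dh = 0, bit 7 of dl cleared
    dh = 0
    quotient, remainder = divmod(ax, spt)    # div cl: al = quotient, ah = remainder
    if quotient > 0xFF:
        raise OverflowError("div cl would raise a divide error")
    ch, cl = quotient, remainder             # mov ch, al / mov cl, ah
    carry = ch & 1                           # shr ch, 1: low bit goes to CF
    ch >>= 1
    dh = (dh + dh + carry) & 0xFF            # adc dh, dh
    cx = (((ch << 8) | cl) + 1) & 0xFFFF     # inc cx
    return cx >> 8, dh, cx & 0xFF, dl


# Compare both methods for every sector of a 1.44M floppy (2880 sectors).
mismatches = [n for n in range(2880)
              if register_chs(n, 18, 0)[:3] != formula_chs(n)]
print(len(mismatches))  # prints: 0
```

The trick works because dividing the track number by 2 with a right shift leaves the remainder, which is the head number, in the carry flag.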
    
MegaBrutal 11 Jul 2009, 11:19
EGOS, YOU ARE A GOD!

Your code is obviously much more efficient than the one I used. It is much smaller, so now I have some spare space in my boot record. :) It was already really tight in there. Whenever I put something in, I had to take something out.

I've replaced my READ_DRIVE function with the one you presented, and now my boot loader works fine in every situation.

That wasn't enough for me, because I also wanted to know why my first code didn't work and what caused this paranormal phenomenon. My yucky, overcomplicated LBA to CHS conversion actually worked. If it had miscalculated the CHS address, it would have caused problems on virtual machines as well. But my code calculated the CHS address from the value of ECX. OK, don't make any assumptions about the initial values of the registers. I thought I was obeying this. Actually, I only took care of the 16-bit registers. The fact is that my BIOS leaves some junk in the high 16 bits of ECX, so my code calculated an impossibly high CHS value to read. That's it.
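
The effect can be reproduced with a small Python model. Everything specific here is an assumption for illustration: the junk value 0xF000 is hypothetical (the actual leftover bits are not known), the intended sector number 1 is arbitrary, and ECX is assumed to hold the linear sector number.

```python
def lba_to_chs(lba: int, spt: int = 18, heads: int = 2) -> tuple:
    """Convert a linear sector number to (cylinder, head, sector)."""
    track, sector0 = divmod(lba, spt)
    cylinder, head = divmod(track, heads)
    return cylinder, head, sector0 + 1


intended_cx = 1          # the value the loader meant to use (arbitrary example)
junk_high = 0xF000       # hypothetical leftover in the high 16 bits of ECX
ecx = (junk_high << 16) | intended_cx

# Using all 32 bits: the cylinder is far beyond the 80 cylinders of a floppy.
print(lba_to_chs(ecx))

# Using only the low 16 bits (what zero-extending CX would give).
print(lba_to_chs(ecx & 0xFFFF))  # prints: (0, 0, 2)
```

A BIOS or boot manager that happens to leave the upper half of ECX at zero would hide the bug, which would be consistent with the loader working under the emulators and GRUB.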

Well, last year I started to write the kernel, and at first I wanted to prepare it for reading hard disks. That's why I used ECX: the code was prepared to calculate 32-bit LBA addresses as well. I copied this code into my boot loader, along with a function, GET_DRIVEDATA.
Code:
MOV     EBX, 002015012h    

It was there because my READ_DRIVE expected the drive parameters in this form. Originally, this value was returned by GET_DRIVEDATA, which called INT 13h/08h and transformed the result into the above format. Meanwhile, I deleted GET_DRIVEDATA, because I always test on a floppy drive with these parameters, my file system is sized for a 1.44M floppy, and I also had to make space for the debug messages. So instead of calling GET_DRIVEDATA to get the value for EBX, I moved a constant into it.

Back to ECX. I figured it out by adding a test to the beginning.
Code:
        SHR     ECX, 16
        CMP     ECX, 0
        JZ      WELCOME
        JMP     ERROR

So I only checked the high 16 bits of ECX, because I knew that I initialize CX properly. And as I expected, the boot loader ran under VMware, under QEMU, and when booted from GRUB, but it printed an error message when I tried to boot it directly after POST.

I don't know how I could miss it. I did check whether I made assumptions about initial register values, but I didn't notice my mistake, because I only checked the 16-bit registers.

It wasn't a paranormal phenomenon at all, it was just my carelessness.

Thanks again! I don't know how long it would have taken me to figure it out without your help! :)

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
