flat assembler
Message board for the users of flat assembler.
OS Construction > IDE bootloader (no BIOS)
AdamMarquis 20 Jun 2003, 01:39
IDE Bootloader for OS experiments v1.01
*This post is updated whenever something new happens.* *See the post "IDE Bootloader take two".* A.M. 27/04/2005
Last update: November 17, 2003

Underlying principle
The code takes advantage of two facts: most partitioning software partitions at head boundaries, and the first partition must start at head 1 to keep the MBR intact. Most of the time that leaves the whole first track, 63 sectors (the MBR plus 62 free sectors, about 31.5 KB), to play with, unless some other program uses that space for copy protection or other stealth stuff.

WARNING! Don't forget to keep the header valid if you want your boot media (floppy, USB stick, etc.) to stay readable! Copy the first 64 bytes (FAT volume information) into a file, use your favorite hex editor to patch the first two bytes to EB 3E (a short jump 62 bytes past the end of the 2-byte jump instruction, i.e. to offset 64), and write at the beginning of your code:

Code:
```asm
Beginning:              ;calculations are based on this label
        file "filename"
```

Or keep an unpatched backup and play with the start offset of the "file" directive!

For the Windows user
For those running WinXP/2000, there's a program at www.internalreality.com called "Device Sector Viewer" that lets you see those bytes. There's also a Win2000/XP dd program (raw sector read/write), along with C source, in the SPB Linux distribution at http://www.8ung.at/spblinux/ (usbboot.zip). Unfortunately, the distribution only boots from a USB device acting as a hard drive or Zip drive, so USB FDD sticks won't work with it. Anyway, we're not doing raw assembly programming to waste time hacking Linux ;o)

Thanks for your contributions,
Adam Marquis

References:
*Inspiration
*ATA/ATAPI-6 (Big file)

Code:
```asm
; IDE PC Bootloader for 386+
;Features:
;- Disable all interrupts (NMI too)
;- Enable the (fast) A20 gate
;- Go into flat 32 bit protected mode
;- Loads n sectors from IDE hard drive
;  following the MBR and jumps to it.
;- Can work from any bootable medium,
;  even network!
;- BIOS independent
;
;TODO list
;-Support for PACKET atapi interface
;-Raw, small and efficient ethernet access
;-Suggestions?
;Have Fun!

macro align value { times (value-1)-($+value-1) mod value nop }

use16
org 7C00h
Beginning:
        file "header.bak"       ;optional, for FAT12 filesystems
        jmp 0:$+5               ;far jump to the next instruction: CS = 0
        cli                     ;Disable interrupts
        cld                     ;Clear direction flag
        in AL,070h              ;Turn NMI off
        or AL,080h
        out 070h,AL
        mov al,0Ch              ;floppy motor off
        mov dx,03F2h            ;(optional)
        out dx,al
        out 0e1h,al
        in AL,092h              ;
        or AL,2                 ; Enable the fast A20 gate
        out 80h, AL             ; short I/O delay (about 1 microsecond)
        out 092h,AL             ;
        lgdt [CS:GDT]           ;Load GDT
        mov EAX,CR0             ;
        or AX,1                 ; Set protected mode bit
        mov CR0,EAX             ;
        jmp 8:Protected         ;Flush pipeline

align 8                         ;Faster, optional
GDT:                            ;Flat, minimal GDT table
        dw (8*3)-1              ; No need for the blank descriptor,
        dd GDT                  ; here it holds the special
        dw 0000h                ; six bytes GDT pointer.
        dw 0FFFFh,0000h,9A00h,00CFh ;code
        dw 0FFFFh,0000h,9200h,00CFh ;data

use32
Protected:                      ;Now in 32 bit protected mode
        mov EAX, 10h            ;Data descriptor selector
        mov DS, EAX             ;
        mov ES, EAX             ; Initialize crucial descriptors
        mov SS, EAX             ;
        mov ESP, 7C00h          ;Initialize the Return Stack
        xor EDI, EDI
        mov EAX,3               ; Load N sectors after MBR (0=256 sectors)
        call HD_Read            ; at [EDI]

        xor EDX, EDX            ;Test display routine: dump the first
        mov EDI, 0B8000h        ; 128 dwords (512 bytes) loaded at address 0
@@:     mov EAX, dword [EDX*4]
        inc EDX
        call ShowDword
        cmp EDX, 128
        jnz @b

@@:     in al, 64h              ;wait for a key
        test al,1
        jz @b
@@:     in al,60h               ;read (and discard) the byte
kb:     in al,64h               ;make sure buffer is empty
        test al,1
        jnz @b
        test al,2
        jnz kb
        mov al,0FEh             ;before sending reboot
        out 64h,al
        jmp $

IDE_Idle:
        mov DX, 1F7h
        in al,dx                ;Read Status register
        test al,80h             ; Busy bit
        jnz IDE_Idle            ; must be zero
        test al, 8h             ; Wait until DRQ bit
        jz IDE_Idle             ; is ready
        sub EDX,7h              ;1F0h: data port
        ret

HD_Read:
        mov bl, al              ;BL: loop sector counter
        mov dx, 1f2h            ;Sector count register
        out dx,al
        mov eax,0E0000001h      ;28 bits LBA address: LBA 1 = first sector after the MBR
        inc edx                 ;1F3h LBA Low (0:7)
        out dx,al
        shr EAX, 8
        inc EDX                 ;1F4h LBA Mid (8:15)
        out dx, al
        shr EAX, 8
        inc EDX                 ;1F5h LBA High (16:23)
        out dx, al
        shr EAX, 8
        inc EDX                 ;1F6h LBA & DEV, LBA (24:27): E0h = LBA mode, master
        out dx, al
        inc EDX                 ;1F7h Command register
        mov al,20h              ;"READ SECTOR(S)" command
        out dx,al
IDE_PIO_In:
        call IDE_Idle           ;reloads DX on every sector
        mov ecx,256             ;256 words = 512 bytes
        rep insw                ;16 bit wide bus
        dec bl
        jnz IDE_PIO_In
        ret

ShowDword:                      ;Display routine
        mov CL, 8               ;nibble count
NextNibble:
        rol EAX, 4              ;most significant nibble first
        push EAX
        and EAX, 0Fh
        cmp AL, 0Ah
        jl @f
        add EAX, 7              ;offset to letters
@@:     add EAX, 0F30h          ;white on black attribute
        stosw
        pop EAX
        dec CL
        jnz NextNibble
        ret

SpaceLeft = 510-($-Beginning)
        times SpaceLeft db 0h
        dw 0AA55h               ;Boot sector magic number
;===================================================================
;Number to string conversion directives from FASM manual
; * Used to get free space, as the code will always compile to 512 bytes.
d1 = '0'+ SpaceLeft shr 8 and 0Fh
d2 = '0'+ SpaceLeft shr 4 and 0Fh
d3 = '0'+ SpaceLeft and 0Fh
if d1>'9'
  d1 = d1 + 7
end if
if d2>'9'
  d2 = d2 + 7
end if
if d3>'9'
  d3 = d3 + 7
end if
display 'Space left in MBR image: ',d1,d2,d3,'h',13,10
;===================================================================
```

Last edited by AdamMarquis on 27 Apr 2005, 17:27; edited 54 times in total
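The register writes in HD_Read can be modelled on a host machine to check which byte lands in which port. The C sketch below is only an illustration: the names `AtaTaskFile` and `ata_read_lba28` are hypothetical, and it assumes the primary channel (ports 1F0h to 1F7h) and the 28-bit LBA layout the listing uses. It reproduces the listing's trick of putting E0h in the top byte of EAX, ORing in the LBA, and peeling off one byte per register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the bytes HD_Read writes to the
   primary ATA command-block registers (1F2h..1F7h), 28-bit LBA mode. */
typedef struct {
    uint8_t sector_count; /* 1F2h: 0 encodes 256 sectors           */
    uint8_t lba_low;      /* 1F3h: LBA bits 0..7                   */
    uint8_t lba_mid;      /* 1F4h: LBA bits 8..15                  */
    uint8_t lba_high;     /* 1F5h: LBA bits 16..23                 */
    uint8_t device;       /* 1F6h: 1,LBA,1,DEV + LBA bits 24..27   */
    uint8_t command;      /* 1F7h: 20h = READ SECTOR(S)            */
} AtaTaskFile;

static AtaTaskFile ata_read_lba28(uint32_t lba, unsigned count, int slave)
{
    assert(lba < (1u << 28));          /* only 28 address bits exist */
    assert(count >= 1 && count <= 256);

    /* Same layout as EAX in the listing: E0h (plus DEV) in the top
       byte, the LBA below it, then one byte per register. */
    uint32_t packed = 0xE0000000u | (slave ? 0x10000000u : 0u) | lba;

    AtaTaskFile tf;
    tf.sector_count = (uint8_t)(count & 0xFFu); /* 256 wraps to 0 */
    tf.lba_low  = (uint8_t)packed;
    tf.lba_mid  = (uint8_t)(packed >> 8);
    tf.lba_high = (uint8_t)(packed >> 16);
    tf.device   = (uint8_t)(packed >> 24);      /* E0h | DEV | LBA 24..27 */
    tf.command  = 0x20;
    return tf;
}
```

For `lba = 1, count = 3` on the master drive this yields 03h, 01h, 00h, 00h, E0h and 20h for ports 1F2h to 1F7h.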
comrade 20 Jun 2003, 01:49
Cool, thanks!
AdamMarquis 21 Jul 2003, 23:47
bitRAKE wrote:
Does this work on multiple sectors? I can see how it'd work on one sector, but DX would not be valid on the second loop. I have not tried the code, but will in the future - thank you.

You're right! Someone finally saw it! ;o) Seriously, I didn't see that obvious bug, thanks! I tested it only with one sector before posting, which wasn't the brightest thing to do.
Adam Marquis
bitRAKE 23 Jul 2003, 15:32
I don't know if this is an error in FASM, or if both instructions are valid and different; but FASM adds a size override prefix byte to "mov ds, ax" and doesn't if the instruction is "mov ds, eax" - the latter should work and save some bytes.
Also, there is only one instruction using the ES: selector - should be able to use an override on that as well, but I haven't figured out the syntax. I don't know if assuming CS=0 on entry is a good idea.

Code:
```asm
use16
org 7C00h
Beginning:
        cli                     ; Disable interrupts
        cld                     ; Clear direction flag
        in al, 070h             ; Turn NMI off
        or al, 080h
        out 070h, al
        in al, 092h             ;\
        or al, 2                ; Enable the A20 through FastA20
        out 080h, al            ; short I/O delay (about 1 microsecond)
        out 092h, al            ;/
        lgdt [CS:GDT]           ; Fetch GDT
        mov eax, cr0            ;\
        inc ax                  ; Set protected mode bit 0
        mov cr0, eax            ;/
        mov ax, 10h             ; Data descriptor
        mov dx, 1F2h            ; sector count
        jmp 8:Protected         ; Flush pipeline

align 8
GDT:
        dw 8*3 - 1              ; flat memory GDT
        dd GDT
        dw 0000h
        dw 0FFFFh,0000h,9A00h,00CFh ; code
        dw 0FFFFh,0000h,9200h,00CFh ; data

use32
Protected:                      ; Now in 32 bit protected mode
;       mov ds, eax             ;\
        mov es, eax             ; Initialize crucial descriptors
;       mov ss, eax             ;/
;       mov esp, 7C00h          ; Initialize the Return Stack
        mov al, 1               ; sectors to read
        xor edi, edi            ; destination address
                                ; Loads N sectors after MBR (0=256 sectors)
        mov bl, al              ; BL acts as sector counter
        out dx, al
        mov eax, 0A0000002h     ; Base address in physical CHS format
        inc edx
        out dx, eax
        mov dl, 0F7h            ;\Write to Command Register
        mov al, 020h            ; Read and Retry
        out dx, al              ;/
waitrdy:
        in al, dx               ; Read Status register
        and eax, 8
        jz waitrdy
        imul ecx, eax, 256/8    ;\
        mov dl, 0F0h            ; Sector fetch loop
                                ;CS: override, what syntax?
        rep insw                ;/Respect 0 = 256 sectors
        mov dl, 0F7h            ; Write to Command Register
        dec bl
        jnz waitrdy
        jmp 0                   ; Jump to it, in pmode with no interrupts

SpaceLeft = 510-($-Beginning)
        times SpaceLeft db 0
        dw 0AA55h               ; Boot sector magic number
```
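This version sends the whole CHS address with a single `out dx, eax` to port 1F3h. Assuming the dword reaches the drive as four byte writes in little-endian order (an assumption: some emulators refuse a 32-bit write to that port), the C sketch below decodes what 0A0000002h means to registers 1F3h to 1F6h, and checks the `imul ecx, eax, 256/8` word-count trick. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes seen at ports DX, DX+1, DX+2, DX+3 if a 32-bit OUT is split
   into byte writes in little-endian order (assumed, see above). */
static void split_dword(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

typedef struct {
    unsigned sector;    /* 1F3h, 1-based              */
    unsigned cylinder;  /* 1F5h (high) : 1F4h (low)   */
    unsigned head;      /* 1F6h bits 0..3             */
    int slave;          /* 1F6h bit 4                 */
    int lba_mode;       /* 1F6h bit 6 (0 = CHS)       */
} TaskFileChs;

static TaskFileChs decode_chs(const uint8_t r[4])
{
    TaskFileChs t;
    t.sector   = r[0];
    t.cylinder = r[1] | ((unsigned)r[2] << 8);
    t.head     = r[3] & 0x0Fu;
    t.slave    = (r[3] >> 4) & 1;
    t.lba_mode = (r[3] >> 6) & 1;
    return t;
}

/* `and eax, 8` keeps only the DRQ bit; `imul ecx, eax, 256/8` then
   gives 8 * 32 = 256 words (one 512-byte sector), or 0 if DRQ is clear. */
static uint32_t words_from_status(uint8_t status)
{
    uint32_t eax = status & 8u;
    return eax * (256u / 8u);
}
```

So 0A0000002h asks for sector 2 (the one right after the MBR) on cylinder 0, head 0 of the master drive, in CHS mode.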
AdamMarquis 23 Jul 2003, 18:18
bitRAKE wrote:
I don't know if this is an error in FASM, or if both instructions are valid and different; but FASM adds a size override prefix byte to "mov ds, ax" and doesn't if the instruction is "mov ds, eax" - the latter should work and save some bytes.

I don't know why I used AX instead of EAX in pmode; every other piece of code I have here uses EAX... Anyway, thanks for the hint!

The lgdt [CS:GDT] is the same size as my three-instruction sequence. I don't know for sure which is faster, but the former is certainly more elegant.

I posted the code here in the first place because the first part, up to the descriptor setup, is a great prefix for getting code running in 32 bits. I use it all the time in my experiments. But interrupts should be enabled, since the PC is interrupt driven (think of network and sound cards especially). IMO, it's best to set the descriptors once and for all and never touch them again. I also like that one can reuse the 32-bit code to read the HD later on.

I'm in desperate need of PCI know-how. So if anyone can start from here and get it to talk to a network card, he/she could build a one-sector bridge between two network cards! Go ahead, steal my idea =) Just booting from raw Ethernet would be great (simplified network booting). It makes me think of the GROS project at: http://www.geocities.com/k_r3456/personal.html

I posted this small experiment to fill a gap for anyone who wants to learn about these things from the Internet. I also hope these code snippets will help in the battle against DMCA-type laws and in my holy war ;o) against software bloat.

Thanks for your interest!
Adam
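The two descriptors in the listing pack base, limit, access byte and flags into four words. A small C sketch (hypothetical names) that unpacks `dw 0FFFFh,0000h,9A00h,00CFh` shows why this gives a flat 4 GiB segment: base 0, a 20-bit limit of FFFFFh, and the granularity bit set so the limit counts 4 KiB pages. It also shows that selectors 8 and 10h pick GDT entries 1 and 2, because the low three bits of a selector hold the table indicator and RPL; entry 0 is never loaded as a segment, which is why the listing can keep the six-byte GDT pointer there.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t limit;   /* raw 20-bit limit                        */
    uint8_t  access;  /* P, DPL, S, type (9Ah code, 92h data)    */
    uint8_t  flags;   /* G, D/B, L, AVL: upper nibble of byte 6  */
} SegDesc;

/* Decode a descriptor written as four little-endian words
   (dw w0,w1,w2,w3), as in the listing's GDT. */
static SegDesc decode_descriptor(uint16_t w0, uint16_t w1,
                                 uint16_t w2, uint16_t w3)
{
    SegDesc d;
    d.limit  = w0 | ((uint32_t)(w3 & 0x000Fu) << 16);
    d.base   = w1 | ((uint32_t)(w2 & 0x00FFu) << 16)
                  | ((uint32_t)(w3 >> 8) << 24);
    d.access = (uint8_t)(w2 >> 8);
    d.flags  = (uint8_t)((w3 >> 4) & 0x0Fu);
    return d;
}

/* Highest valid offset: with G = 1 the limit is in 4 KiB units. */
static uint32_t effective_limit(SegDesc d)
{
    int g = (d.flags >> 3) & 1;
    return g ? (d.limit << 12) | 0xFFFu : d.limit;
}

/* GDT index selected by a selector (bits 0..2 are RPL and TI). */
static unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}
```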
bitRAKE 23 Jul 2003, 20:24
AdamMarquis wrote:
The lgdt [CS:GDT] is the same size as my 3 instructions sequence. I don't really know for sure which is faster; the former is truly more elegant though.

Quote:
IMO, it's best to set the descriptors once and for all and never touch them again. I also like the feature that one can reuse the 32 bit code to read the HD later on.

I think that there are enough people interested in OS creation to really get something out of it in the long run. We just need to use a common language (FASM) and develop some solid routines for the standards followed by most hardware. Some hardware doesn't follow any standard (e.g. modern video cards). I'm getting back to my machine today - can't wait...
AdamMarquis 30 Jul 2003, 17:09
Code:
```asm
;Written in FASM 1.48
;Test: put bytes from keyboard on screen
;
;*Posted to write on HD's second sector
; to test the IDE bootloader
;
;PC Memory map at boot time (from OSD)
;000000-0003FF interrupt vector table
;000400-0004FF BIOS data area
;000500-007BFF FREE CONVENTIONAL MEMORY
;007C00-007DFF boot sector
;007E00-09FBFF FREE CONVENTIONAL MEMORY
;09FC00-09FFFF extended BDA (variable length)
;0A0000-0FFFFF video memory and BIOS ROMs
;100000-10FFEF high memory area (HMA)
;10FFF0-       FREE EXTENDED MEMORY

org 0
use32
Start:
        mov EDI, 0B8000h        ;top-left cell of the text screen
Testing:
        cmp EDI, 0B8000h+(80*25*2)
        jge Start               ;screen full: wrap to the top
        xor EAX, EAX
        call KBRead
        call ShowChar
        jmp Testing

KBRead:                         ;wait for a byte from the keyboard controller
        in AL, 64h
        test AL, 1              ;output buffer full?
        jz KBRead
        in AL, 60h
        ret

ShowChar:                       ;print AL as two hex digits
        push EAX
        shr EAX, 4              ;high nibble first
        call ShowHex
        pop EAX                 ;then fall through for the low nibble
ShowHex:
        and EAX, 0000000Fh
        cmp EAX, 0Ah
        jl @f
        add EAX, 07h            ;letter
@@:     add EAX, 0F30h          ;number, white on black
        stosw
        ret
```
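ShowChar and ShowHex turn each nibble into a text-mode screen cell: the character code in the low byte and the attribute (0Fh, bright white on black) in the high byte, stored from 0B8000h on. The C sketch below (hypothetical function names) reproduces that arithmetic, including the `add 7` that skips the seven characters between '9' and 'A', and the end-of-screen address where the loop wraps.

```c
#include <assert.h>
#include <stdint.h>

#define TEXT_BASE 0xB8000u
#define TEXT_END  (TEXT_BASE + 80u * 25u * 2u) /* Testing wraps here */

/* Model of ShowHex: low nibble of v -> attribute + character cell. */
static uint16_t hex_cell(uint32_t v)
{
    uint32_t n = v & 0x0Fu;
    if (n >= 0x0Au)
        n += 7u;                     /* skip ':'..'@' so 0Ah becomes 'A' */
    return (uint16_t)(n + 0x0F30u);  /* 0Fh attribute, 30h = '0' */
}

/* Model of ShowChar: high nibble first, then the low one. */
static void byte_cells(uint8_t b, uint16_t out[2])
{
    out[0] = hex_cell(b >> 4u);
    out[1] = hex_cell(b);
}
```

For example, the byte 1Ch (the set-1 make code of the Enter key) comes out as the cells 0F31h and 0F43h, i.e. "1C" on screen.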
valy 01 Aug 2003, 09:17
Hi
I tried the similar PIO code above, together with RDTSC. With a PIII 650 MHz, a 20 GB HDD and a standard 40-pin cable, reading one sector takes about 105,000 cycles. I tried with LONG mode (22h instead of 20h, halving the loop count, and "in eax,dx"): about... 15,000,000??? It can't be my HDD. Maybe it needs an init?! Or a penalty with jz... or, more probably, my code. I'll investigate further.

THE AIM
I'd like to move some BIG files from a FAT32 partition to my OS's FAT (PM32 & full 32-bit). My idea is to create a *contiguous* file under FAT32: it will be the reserved space for my OS's FAT and disk space. My FAT won't manage gigabytes at the beginning. Basically, if I can load my big files into flat memory and understand FAT32/ATA, it will be OK.

THE PROBLEMS
Now my question: I'm puzzled by the fact that FAT32 sees... 255 heads. I know my HDD is good, but I can't believe it PHYSICALLY has so many heads! So I won't even try to access head 254 with PIO programming. Does that mean that I must:
1/ convert it to LBA
2/ convert it to CHS again?! And what about the H parameter?!
OK, I understand 255 heads is for LOGICAL ATA... did anybody dare to access head 254?! I feel I need more docs; I still have to google. I like understanding what I program. I like BASIC tutorials with a smooth progression (thx A. Frounze, for instance).

Regards
_________________
Easier, faster
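The 255 heads reported for FAT32 belong to the BIOS's logical geometry; the same sector has a different CHS triple under another geometry, and LBA is the common currency between them. Below is a C sketch of the standard conversion formulas (the type and function names are hypothetical, and the 16-head geometry in the test is only an example). If the drive is driven in LBA mode, as the first post's HD_Read does, the task file takes the LBA directly and no CHS conversion is needed at all.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t c, h, s; } Chs;   /* sectors are numbered from 1 */

/* LBA = (C * heads + H) * spt + (S - 1) */
static uint32_t chs_to_lba(Chs a, uint32_t heads, uint32_t spt)
{
    assert(a.h < heads && a.s >= 1 && a.s <= spt);
    return (a.c * heads + a.h) * spt + (a.s - 1);
}

/* Inverse mapping for a given geometry. */
static Chs lba_to_chs(uint32_t lba, uint32_t heads, uint32_t spt)
{
    Chs a;
    a.c = lba / (heads * spt);
    a.h = (lba / spt) % heads;
    a.s = lba % spt + 1;
    return a;
}
```

With 255 heads and 63 sectors per track, CHS 0/1/1 (where the first partition usually starts) is LBA 63, which is why 63 sectors, 32256 bytes, sit in front of it.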
JohnFound 01 Aug 2003, 10:08
If you can read Russian, here is a good site about writing a fast protected-mode ATA driver. Take a look; at least the sources may help you.
http://users.caucasus.net/oska/
valy 01 Aug 2003, 10:16
"Da svidania" (goodbye)!!!
That's about the only expression I remember from my trip to Russia in 1980. I don't even know what to download from his page. I'm currently reading bitRAKE's link, http://home.no.net/tkos/info/hd.html, which looks interesting. Thx anyway for caring, JohnFound
JohnFound 01 Aug 2003, 10:55
OK, here is the source. It has English comments (maybe bad ones). There is still an Int13.txt file with a description in Russian, but you can find a translator; unfortunately I have no free time to translate it right now.
Be careful with these sources. They work at a very low level and you can easily erase your hard disk.
valy 01 Aug 2003, 11:00
Thx! I'll have a look. Bye
crc 01 Aug 2003, 19:09
> I tried the similar PIO code above, in conjunction with RDTSC.
> With a PIII 650 MHz, a 20 Gb HDD, a standard 40-pin cable, one read
> sector : about 105,000 cycles.
> I tried with LONG mode (22h instead of 20h, reducing the loop twice,
> and "in eax,dx") : about...15,000,000 ???
> Cannot be my HDD. Maybe needs an init ?! Or penalty with jz... or
> more probably my code. I'll investigate further on it.

PIO is slow. If you are concerned about speed, you'll need to delve into DMA and use a dedicated IRQ. The advantage of PIO is simplicity: it's a lot simpler to implement (and better supported, in my experience), even if it is slow.

> Now my question : I'm puzzled with the fact that FAT32 sees... 255
> heads. I know my HDD is good but I cannot figure it has PHYSICALLY
> so many heads ! So I won't even try to access head 254 with PIO
> programming. Does that mean that I must :
> 1/ convert it to LBA
> 2/ convert it to CHS again ?! and what about H parameter ?!
> OK, I understand 255 heads is for LOGICAL ata... did anybody dare to
> access head 254 ?!
> I feel I need more docs, still have to google. I like understanding what I
> program. I like BASIC tutorials with smooth progression (thx A.
> Frounze, for instance )

The BIOS INT 13h interface permits a maximum of 1024 cylinders, 255 heads and 63 sectors per track, which works out to about 8.4 GB (7.8 GiB). The well-known 504 MB limit comes from combining that 1024-cylinder limit with the 16-head limit of the ATA CHS registers (1024 × 16 × 63 × 512 bytes). There are ways around this (mainly LBA), and your system can indeed report 255 heads: that is a logical (translated) geometry, not the number of physical heads.

I have written a raw hard drive driver (using CHS and/or LBA). It's a cross between Forth and assembly, though; it wouldn't be difficult to convert to pure assembly. I know for a fact that it works: I've tried it on five different computers without problems. The code is at my web site: http://retro.tunes.org
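The capacity limits above are plain multiplications, so they are easy to check. This C sketch (hypothetical function names, 512-byte sectors assumed) computes the BIOS CHS ceiling, the combined BIOS/ATA CHS ceiling behind the 504 MB figure, and, for comparison, the ceiling of the 28-bit LBA used by the first post's HD_Read.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Bytes addressable with a given CHS geometry. */
static uint64_t chs_capacity(uint64_t cylinders, uint64_t heads,
                             uint64_t sectors_per_track)
{
    return cylinders * heads * sectors_per_track * SECTOR_BYTES;
}

/* Bytes addressable with an LBA that is `bits` wide. */
static uint64_t lba_capacity(unsigned bits)
{
    assert(bits < 64 - 9);   /* keep the shift and multiply in range */
    return ((uint64_t)1 << bits) * SECTOR_BYTES;
}
```

The results are 8,422,686,720 bytes for 1024/255/63, 528,482,304 bytes (exactly 504 MiB) for 1024/16/63, and 137,438,953,472 bytes (128 GiB) for a 28-bit LBA.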
Ralph 04 Oct 2003, 16:49
I just had two quick questions.
1. Why are you using PIO now instead of your original routine?
2. I was testing your original routine in Bochs, and it complained about a non-byte I/O write to 01F3. Is that just due to the Bochs HD BIOS? If so, are there many BIOSes that don't support word reads/writes?

I just want a simple loader for my OS; I never thought I'd have to struggle so much with it. Isn't there some quick, straightforward, mostly compatible way to just get a bunch of sectors into memory without using the BIOS?
AdamMarquis 05 Oct 2003, 21:29
PIO was always used. The code snippet is a sample program to test the bootloader: just put it in your second sector (sector 1) and it's supposed to output raw keyboard bytes. It worked in VMware, and from a USB drive on a Desknote laptop. IDE data transfers are 16 bits wide, but you can rewrite the register writes to use byte transfers. PIO is supposed to be the most compatible way to load a sector into memory without a BIOS.
Ralph 06 Oct 2003, 19:32
Thank you for the reply, but it didn't really answer my question. I'll show you some code to elaborate. This is based on your original post from some time ago:

Code:
```asm
_ReadSectors:
;-T
        MOV EDI,10000h
        MOV AL,1
        MOV CL,2
;-T
        MOV BL,AL               ;counter for loop
        MOV EDX,RegSectorCnt
        OUT DX,AL               ;set num of sectors to transfer
        MOV AL,CL               ;LBA address, starting sector?
        INC EDX                 ;hard disk sector number
        OUT DX,AL               ;is this right?
        MOV DL,RegCmdStatus     ;write command register
        MOV AL,ReadSectors
        OUT DX,AL               ;read specified number of sectors
_Wait:
        IN AL,DX                ;read status register
        AND AL,00001000b
        JZ _Wait                ;seek complete?
        MOV ECX,512/2
        MOV EDX,RegData
        REP INSW                ;this errors in bochs
        MOV EDX,RegCmdStatus    ;read status register
        DEC BL
        JNZ _Wait
        RET
```

That code is supposed to load AL sectors, starting from sector CL, into the address pointed to by EDI. If you look at the comments, there are two areas I'm not too sure of. The first is the starting sector: I believe your post used EAX to transfer the LBA address to DX, but OUT DX, EAX errors out in Bochs. Further down, REP INSW also errors out for the same reason. The second one is not really a problem, since INSB works fine, but the entire routine simply fills 10000h with FFh, which is not the 2nd sector. I don't like asking people to debug my code for me, but I've googled around for too long and don't feel like spending most of my time getting the OS to load rather than actually working on it, so I would really appreciate it if you could help me out.
AdamMarquis 07 Oct 2003, 01:44
Hi!
Just look at the first post's code, try it, then try changing some stuff. The new version still uses insw, but writes to the 1F2h+ registers bytewise. It works everywhere I tried, and should work in Bochs. The trick I learnt is to keep it simple so it's easy to share. I tried to be as clear as I could; just try the new code and tell me your impressions, if any, so I can correct the available information. I would especially like to hear about your project. I think there should be a model and a codebase for an OS based on the principle that there's 31.5 KB of free space at the beginning of almost every hard drive. Right now I'm into my compiler design (I won't use the call instruction), and I'm trying to finish it in my spare time.
Adam
Ralph 07 Oct 2003, 19:19
Thanks, I figured it out. I was trying to shove all of EAX into DX at once. It works now. One change might be to allow more than 256 sectors to be loaded at once. That shouldn't be too hard to do, I'm sure, and I guess for loading a lot of data you should use something other than PIO anyway. I just haven't dared to wrestle with DMA yet; first comes the IDT mess.
What I'm working on is what I thought was the novel idea of making an OS based on Forth, but now that I've done more research I realize that a lot of similar ideas have already been implemented. Still, I believe my OS will have some unique features. I'm just fed up with all this disgusting bloatware floating around. Operating systems are still using 70s technology. Even Linux is pretty pathetic (except maybe Gentoo and a few other distros). There's so much useless crap everywhere. Anyway, I'll spare you my rantings. I could keep going for hours :).
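Loading more than 256 sectors with READ SECTORS comes down to issuing several commands, each with at most 256 sectors (the count register holds 0 for 256), advancing both the LBA and the load address between them. The C sketch below plans such a split; it is a hypothetical helper, not code from this thread, and it only computes the per-command parameters a loop around HD_Read-style code would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One READ SECTORS (20h) command within a larger transfer. */
typedef struct {
    uint32_t lba;        /* first sector of this command            */
    uint32_t dest;       /* physical load address for this command  */
    unsigned sectors;    /* 1..256                                  */
    uint8_t  count_reg;  /* value for port 1F2h (256 is written as 0) */
} ReadChunk;

/* Split `total` sectors starting at `lba` into commands of at most
   256 sectors. Returns the number of chunks stored in `out` (never
   more than `max`); anything beyond that is simply not scheduled. */
static size_t split_read(uint32_t lba, uint32_t total, uint32_t dest,
                         ReadChunk *out, size_t max)
{
    size_t n = 0;
    while (total > 0 && n < max) {
        unsigned chunk = total > 256u ? 256u : (unsigned)total;
        out[n].lba = lba;
        out[n].dest = dest;
        out[n].sectors = chunk;
        out[n].count_reg = (uint8_t)(chunk & 0xFFu);
        lba += chunk;
        dest += chunk * 512u;    /* 512 bytes per sector */
        total -= chunk;
        n++;
    }
    return n;
}
```

Reading 600 sectors from LBA 1 to 10000h, for example, becomes three commands: 256 sectors at LBA 1, 256 at LBA 257 (loaded at 30000h), and 88 at LBA 513.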
AdamMarquis 08 Oct 2003, 02:14
Hi!
I think we agree on many things about the state of information technology. I would add that Forth clearly demonstrates that current hardware is also full of crap. Forth reduces the operand count at the assembler level, so caches and pipelines of more than 2 stages are not worth the resources spent: smaller opcodes mean no need for a cache, a simpler instruction dispatch unit, etc. A chip the size of a Pentium 4 using the same advanced process technology could yield a couple of supercomputers on a single chip. I'm sure it's already been done.

I'm trying to implement a new kind of compiler inspired by the colorForth/aha effort:
-Small number of simple function tokens (colors)
-No call instructions (push push jmp)
-No search at compile time (instantaneous)
-Definitions implicitly close previous definitions (no ;, just a ... primitive needed for fallthrough, if any)

I'll also use aligned primitives, so I can do lookback optimization without using a pointer (a list in c.f.). The only issue is literal handling. I have a magenta word with an argument from 0 to 31; presently it's used to copy n bytes from the source directly into the compiled code, so I can easily define macros. I previously intended to use # as a literal-compiling macro (as in "# 78563412"), but I'm not sure at all. My project has changed shape many times! For good conditional handling I might even change it once more. Anyway, I'm now at the stage of IDT handling myself, before thinking about network or fast IDE I/O. Menuet O/S is a great code base, clearer to read than Linux code to me.
Adam

BTW, screw the DMCA: I paid for the machine, I want to use it. Gladly I don't live in the US =)
http://www.dreamsongs.com/ Great material
Copyright © 1999-2024, Tomasz Grysztar.