flat assembler
Message board for the users of flat assembler.

Index > OS Construction > Help reading FAT32

Goto page 1, 2  Next
Roberto Waldteufel
Joined: 11 Feb 2006
Posts: 12
Location: Isle of Jura, Scotland
Hello everybody,

I am writing a small custom OS dedicated to running a single 32-bit protected mode application which cannot run under Windows because it needs to use 4 GB of RAM (yes, I have 4 GB on my main computer) and Windows only allows applications access to half the address space (I could shoot Bill Gates!). Although I have a very basic version of it up and running now, there is not yet any support for file I/O. The application needs to fill most of the 4 GB of RAM with data contained in several large files, so it has to be able to access the file system.

Can anyone point me in the direction of some good docs describing how to read the FAT32 file system, or some assembly source code I could study to get the hang of it? I know how to do it in real mode DOS using Int 21h, but so far I have written everything to avoid the use of any DOS or BIOS calls at all, so it can run in pmode without any switching back to real mode, and could even be made bootable. I would like to keep it that way, hence my need to learn the innards of the file system organisation. Any help or advice from those who have been here before me would be most welcome.

Best Wishes,

Roberto
Post 11 Feb 2006, 12:03
bogdanontanu
Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
There is a FAT32 explorer application in the SolarOS source code:
http://www.oby.ro/os/index.html

In the source code you can find a sample File_Read() function, invoked when you press Enter on a selected file.
It is in system32\app_user\disk_explore2.asm

You can find the FAT32 filesystem specification on the internet. Study it carefully.

Basically, to read a file you will have to:
1) Get the file's starting cluster from the directory entry.
2) Parse the FAT chain starting from that cluster.
3) Read from the HDD each cluster you find in the FAT chain.

Pay attention to the Sectors_Per_Cluster value and the root folder location from the FAT32 BPB.

You will also need working HDD ATA read routines.
Again, you can find such routines in SolarOS.
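Those three steps can be sketched in C (rather than assembly, for brevity). This is a minimal illustration, not SolarOS code: the FAT is assumed to be already loaded into memory, and the parameter names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* FAT32 end-of-chain: any entry value >= 0x0FFFFFF8 after masking. */
#define FAT32_EOC 0x0FFFFFF8u

/* Convert a cluster number to its first LBA sector, per the FAT32 BPB:
   the data area starts after the reserved sectors and the FATs, and
   cluster numbering starts at 2. */
static uint32_t cluster_to_lba(uint32_t cluster,
                               uint32_t reserved_sectors,
                               uint32_t num_fats,
                               uint32_t sectors_per_fat,
                               uint32_t sectors_per_cluster)
{
    uint32_t data_start = reserved_sectors + num_fats * sectors_per_fat;
    return data_start + (cluster - 2) * sectors_per_cluster;
}

/* Walk the FAT chain from a file's starting cluster, collecting each
   cluster number into out[].  `fat` here is the whole FAT already in
   memory; a real driver would instead read the needed FAT sector from
   disk as it goes.  Returns the number of clusters in the chain. */
static int walk_chain(const uint32_t *fat, uint32_t start,
                      uint32_t *out, int max)
{
    uint32_t c = start;
    int n = 0;
    while (c < FAT32_EOC && n < max) {
        out[n++] = c;
        c = fat[c] & 0x0FFFFFFF;  /* only the low 28 bits are significant */
    }
    return n;
}
```

A real read loop would call cluster_to_lba() on each collected cluster and read sectors_per_cluster sectors from that LBA into the destination buffer.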
Post 11 Feb 2006, 13:10
Octavio
Joined: 21 Jun 2003
Posts: 366
Location: Spain
Roberto Waldteufel wrote:
Hello everybody,

Can anyone point me in the direction of some good docs describing how to read the FAT32 file system, or some assembly source code I could study to get the hang of it?
Roberto

For documentation:
http://www.osdever.net/cottontail/
For sources, my own OS and many other hobby OSes have source code.
For reading FAT32, here are a few of them:
http://j__martinez.tripod.com/informatica/sistopindex.html
But first I would try to reduce the memory requirements of this program, or use a 64-bit OS.
Post 11 Feb 2006, 13:57
bogdanontanu
Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
BTW, I guess Win2003 Server edition has an option that allows applications to use 3 GB of RAM and reserves only 1 GB for the OS... exactly for this kind of memory-hungry application.
Post 11 Feb 2006, 17:20
Roberto Waldteufel
Joined: 11 Feb 2006
Posts: 12
Location: Isle of Jura, Scotland
Thank you both for the replies - I have had a look and I can see this is going to be more complicated than I thought. Which file in SolarOS contains the ATA read functions? I understand the sequence of operations, but I don't know what values I have to output to which hardware ports in order to tell the drive controller to fetch a specific sector to a memory buffer, nor which hardware interrupt to set up to catch the controller's response when it has completed the task. I assume this is what is in the ATA read functions, and that is what I need to study.

I know about the 3 GB limit in XP Pro as opposed to the 2 GB limit in XP Home (what I have), but I feel Microsoft has sold me a product unfit for purpose, so the last thing I am about to do is throw good money after bad buying another piece of bloatware from them which would still be unfit for purpose, albeit only half as much so as what I already have. 1 GB is still an excessive amount of address space for the operating system to deny the application, in my opinion.

If I can find no better solution I will just set up a real mode callback and run under DOS, copying blocks of sectors into conventional RAM below 1 MB and then copying the data to high memory in protected mode - this would require an awful lot of mode switching for the amount of data I need to move, though, so a 32-bit protected mode solution would clearly be much better.

Best Wishes

Roberto
Post 12 Feb 2006, 12:50
bogdanontanu
Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
In that release, the functions for reading a sector from an ATA HDD are in:
\system32\hardware\hdd\ata_pio.asm

You can see the port definitions and equates at the top of that file. You can ignore the part after the Macro_RDTSC macro in the HDD_LBA_Read function, since it is for performance counting only.

Basically you only need HDD_LBA_Read() in order to read a sector (or many).
You can disable IRQs in the command sent to the HDD, and then you do not need IRQ 14/15 handlers for the HDD.

However, this simple function uses PIO mode only, and that will be slower than UDMA modes. So the BIOS could be faster if it is using UDMA...
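For flavor, here is a hedged C sketch of the register values a PIO-mode LBA28 read programs into the task file before issuing READ SECTORS. The struct and function names are invented for illustration; the actual port I/O (outb to each register, polling status, then insw from the data port) is only described in comments, since it cannot run outside ring 0.

```c
#include <assert.h>
#include <stdint.h>

/* ATA primary channel task-file registers (PIO mode). */
enum {
    ATA_REG_SECCOUNT = 0x1F2,
    ATA_REG_LBA_LOW  = 0x1F3,
    ATA_REG_LBA_MID  = 0x1F4,
    ATA_REG_LBA_HIGH = 0x1F5,
    ATA_REG_DRIVE    = 0x1F6,
    ATA_REG_COMMAND  = 0x1F7
};
#define ATA_CMD_READ_SECTORS 0x20

struct ata_taskfile {
    uint8_t seccount, lba_low, lba_mid, lba_high, drive, command;
};

/* Decompose a 28-bit LBA into the byte values a PIO driver would write
   to the task-file registers.  drive_sel is 0 for master, 1 for slave;
   0xE0 selects LBA addressing mode. */
static void ata_prepare_lba28(uint32_t lba, uint8_t count, int drive_sel,
                              struct ata_taskfile *tf)
{
    tf->seccount = count;
    tf->lba_low  = lba & 0xFF;
    tf->lba_mid  = (lba >> 8) & 0xFF;
    tf->lba_high = (lba >> 16) & 0xFF;
    tf->drive    = 0xE0 | ((drive_sel & 1) << 4) | ((lba >> 24) & 0x0F);
    tf->command  = ATA_CMD_READ_SECTORS;
}
/* A real driver then outb()s each field to the register listed above,
   polls the status port (0x1F7) until BSY clears and DRQ sets, and
   finally insw()s 256 words per sector from the data port (0x1F0). */
```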

Also, reading a file is a little more complicated than reading a single sector.

_________________
"Any intelligent fool can make things bigger,
more complex, and more violent.
It takes a touch of genius -- and a lot of courage --
to move in the opposite direction."
Post 12 Feb 2006, 13:40
f0dder
Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
With win2k and more recent, you *can* access more than 4GB of memory... it has to be done in "views" though, because of the x86's limited address space. Look up "Address Windowing Extensions" on MSDN. Or you could of course switch to a 64-bit CPU and OS.
Post 02 Mar 2006, 14:18
Madis731
Joined: 25 Sep 2003
Posts: 2140
Location: Estonia
Could you reveal the secrets behind your project?

Is it a DNA analyzer? Maybe some NN application? FFT calculations on radio signals from "out there"?
Post 02 Mar 2006, 18:33
Dex4u
Joined: 08 Feb 2005
Posts: 1601
Location: web
Or maybe a game.
Post 02 Mar 2006, 19:22
Roberto Waldteufel
Joined: 11 Feb 2006
Posts: 12
Location: Isle of Jura, Scotland
Hi again,

To f0dder - I was completely unaware of this and will look into it. I have come quite a long way now building a program in pure assembly launching from DOS and will try to see it through, but it is good to have another avenue to explore if all my attempts should prove fruitless.

To Madis - certainly - it's not a secret, just a bit off-topic for an OS development forum, which is why I didn't mention it before. I am the author of a high-performance draughts-playing program (checkers if you are American) which about 3 years ago played a marathon 68-game match under proper match conditions against the (human) World Champion, Alex Moiseyev, and emerged victorious by a greater winning margin than has ever occurred in any man-machine draughts match (yes, there have been others!) either before or since. All previous versions of this program have been written for Windows using the PowerBasic compiler, but with a lot of inline assembler for the speed-critical parts.

Since it is basically doing an alpha-beta search (well, actually a modern enhancement of alpha-beta called MTDF, but I won't go into that now), speed is very important, but it also uses an endgame database which contains the win/loss/draw theoretical game value, assuming best play on both sides, for every possible position with a total of 8 or fewer pieces on the board. All top draughts-playing programs use such a database, but typically it is about 5 and a half gigabytes in size, so only some of it can be held in RAM while the rest is loaded on the fly into buffers using a "least recently used" (LRU) replacement strategy. The disk accesses cause a significant speed hit, but the database is of such value that it still improves play in most cases.

Obviously hardware becomes important, with more memory always a bonus, so when my main PC died and I had to get a replacement I got the maximum memory addressable on 32-bit architecture. I am hoping to devise a compression method to reduce the size of the loaded data to 4 GB by only storing data for positions where the stronger side is to move in certain very large slices of data, thereby reducing those slices by about 50% in size, and then run with no disk accesses at all during the searches.

I am now rewriting the whole program from the ground up - a huge task, since it took years to put together even using a high-level language. Although I am well used to inline assembler coding in my favoured compiler, I am not so familiar with accomplishing many of the mundane tasks like file handling, for which I have always relied on my compiler's built-in functions up until now. I have found that loading 32 KB chunks in real mode using DOS Int 21h functions and switching in and out of pmode to copy the buffer to high memory, which I expected to be very slow, in fact reads about 10 megabytes a second, which is a lot faster than I had thought it would be. Nonetheless, it would still be cleaner to load it all in pmode.

Since DOS cannot read NTFS (which is what the new box's hard drive has), I am accessing the database from the hard drive that was in the old box that died. I have a rather strange setup, because the new box uses RAID connections that are incompatible with my old hard drive, so I have the old drive housed in an external drive unit which is actually supposed to be for an external CD drive - but it works OK; it just plugs into one of my USB ports. DOS sees this drive with no problems, as it has FAT32, since my old box was running WinME. I haven't a clue what port numbers would access it - according to my BIOS it is listed as primary IDE on PATA channel 1 (parallel ATA - I have 4 serial ATA and 2 parallel ATA channels listed).

I hope all this isn't too off-topic or too boring!

Roberto
Post 02 Mar 2006, 20:07
gunblade
Joined: 19 Feb 2004
Posts: 209
Hey,
Sounds like a really interesting project, and from a local Scot!
I don't have much info on FAT32, but I did want to mention that it's possible to access up to 64 GB of RAM using Linux. I know porting the code might be a pain, but it would save you having to make your own OS. Not to put you off making your own OS - it's a nice idea, and would probably also be much more efficient, since it only has one purpose - but I just wanted to put it out there as something to fall back on.

Good luck
Post 02 Mar 2006, 21:13
Dex4u
Joined: 08 Feb 2005
Posts: 1601
Location: web
@Roberto Waldteufel, I see a bit of a problem with your test setup (if I understand it right?): your PC's BIOS must have a BIOS USB driver (DOS does not have one as standard, so it must be the BIOS, or remapping). This will work fine if you go back and forth to real mode, but will not work if you make a pmode HDD driver, as you will also need a USB driver.

In Dex4uOS I have two drivers for floppy and two drivers for HDD: one is a pmode driver, the other goes back and forth to real mode to use BIOS Int 13h.

This lets people boot and read/write to USB key fobs from Dex4uOS (if their BIOS can boot from a USB device).
Post 02 Mar 2006, 21:48
Roberto Waldteufel
Joined: 11 Feb 2006
Posts: 12
Location: Isle of Jura, Scotland
Thanks Dex and Gunblade,

I had a suspicion my unusual set-up would complicate things - yes, my BIOS does have a USB device driver with several channels (6 if I remember correctly). For the time being, at any rate, I am satisfied with the workaround I am using now - perhaps I should set up a DOS partition on the RAID drive that came with the PC; that would avoid going through the USB device driver. I have often considered Linux, but I am completely unfamiliar with its syntax and so have never taken the plunge, though it is good to know that it doesn't restrict RAM in the way Windows does.

It has been many years since I last did any serious DOS programming and I am rather enjoying it, like meeting a long-lost friend, and although I am having to learn some new tricks for pmode, most of that is reusable code which I may well find useful in future for other projects (generating a 10-piece database is something I might attempt one day, for example - at least two other developers have already achieved this incredible task, one in America and one in Canada, but neither has any plans to release their work as far as I am aware, and both used a network of many computers to do the job). I may put the hardware-accessing code into a library and put it up on my website for others to download if I get enough working to make it worthwhile - it might be useful to pmode OS developers to have a bunch of such code snippets all together in the same place instead of spread out across the internet. Thanks again for your comments and interest.

Roberto
Post 02 Mar 2006, 23:42
Dex4u
Joined: 08 Feb 2005
Posts: 1601
Location: web
Sounds like a supercomputer! Here's a DOS one: http://dosbeowulf.tripod.com/index.html
Post 03 Mar 2006, 01:42
bogdanontanu
Joined: 07 Jan 2004
Posts: 403
Location: Sol. Earth. Europe. Romania. Bucuresti
BTW: the speed of reading from the HDD will be much lower using PIO modes in protected mode when compared to Windows or Linux UDMA drivers...

So even if you do get to read the 5 GB of files from a FAT32 partition using pmode drivers... it may take a while longer than you expected!

BTW, standard FAT32 does have a file-size limit of 4 GB per file, does it not?
Post 03 Mar 2006, 07:31
Roberto Waldteufel
Joined: 11 Feb 2006
Posts: 12
Location: Isle of Jura, Scotland
That looks like a cool setup at dosbeowulf - I wish I had known about this a year ago, when I was offered a bunch of old 486s with no operating systems for free but had no real use for them - or so I thought! Although the entire database is 5.5 GB, it is split into many separate files, with no one file bigger than about 1 GB. I have never tried to use individual files bigger than 1 GB, but I noticed that on the largest files DOS seems to choke towards the end of reading and the machine hangs - yet it can cope perfectly well with files of about 700-800 MB. As I have said, I am hoping to trim down these largest slices in order to squeeze everything into 4 GB and avoid any disk access during the searches.

It stands to reason that 4 GB could be a theoretical limit for file size in a 32-bit file system, because you would need more than 32 bits to address any position beyond 4 GB in a larger file - although I suppose you could address more by using a 64-bit offset in edx:eax or in an MMX register, or even by using the Pentium's native segmentation to extend to 64 GB. In any event, I have not needed such large file sizes, because the database is split into more manageable chunks.
Post 03 Mar 2006, 08:34
tom tobias
Joined: 09 Sep 2003
Posts: 1320
Location: usa
Roberto Waldteufel wrote:
.... Although the entire database is 5.5 GB, it is split into many separate files with no one file bigger than about 1 GB .... As I have said, I am hoping to trim down these largest slices in order to squeeze everything into 4 GB and avoid any disk access during the searches. ..... In any event, I have not needed such large file sizes becaude the database is split into more manageable chunks.
Great post!
Wow, lots to discuss here:
http://www.acfcheckers.com/origin.html
For you homegamers, we know that checkers/draughts is thought to be the forerunner of chess, and several thousand years old, having been played in both Egypt and India and every country in between, for thousands of years. What about the equally ancient game of WeiQi (in PuTongHua) or GO (Nippongo), which however is played on a 19x19 board (unlike draughts which is played on either an 8x8 or 10x10 board--Hey Roberto, does your 5+gb data base represent the range of possible moves for the 64 square game or the 100 square game?)
Here's my main question: has this rather large database been split into approximately 1 GB pieces because of computer architecture/operating system limitations? In other words, what do you mean by "more manageable"? Why is a database of n gigabytes "more manageable" when deconstructed into several pieces? Isn't there some considerable overhead in managing which component of the database is currently in memory? What about a fresh look at the database itself, rather than exclusively focusing on a dedicated operating system to support the game (an excellent decision, in my opinion)? I am not suggesting that the STRATEGY of the participants needs to be reexamined, but rather the method of representing the search - I think you mentioned a variant of alpha-beta - IN ASSEMBLY LANGUAGE. In other words, how certain are you that a game of draughts played on a 10x10 board with rules abcd requires 5 gigabytes of storage? Thanks for a great topic!
Post 03 Mar 2006, 12:40
Roberto Waldteufel
Joined: 11 Feb 2006
Posts: 12
Location: Isle of Jura, Scotland
Hi Tom - this program plays exclusively 8x8 Anglo-American style draughts, not 10x10 International or any of the other variants such as Brazilian, Czech, Russian or Italian draughts, each of which has different rules, and some of which are played on even larger boards than 10x10. However, the 8x8 board does not mean 64 squares are in play, as they would be in chess for example. In draughts only half the squares are used, which makes it a most amenable game for programming on 32-bit architecture, because a bit-map of 32 bits can hold information about every square - for example, all the squares with Black pieces can be represented by a single dword, similarly for the squares with White pieces, the squares with kings, etc. By careful choice of which bit represents which square, the moves can then be generated in parallel. Most brute-force search algorithms store the position in memory and update it as they traverse the search tree, but by writing in assembler I have succeeded in storing the position entirely in registers while the search progresses, which has yielded a significant speed-up.
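The 32-bit bitboard idea above can be sketched in C. The square numbering and starting masks here are hypothetical (the real program chooses the mapping to suit its shift-based move generator), but they show how one dword per side captures the whole position:

```c
#include <assert.h>
#include <stdint.h>

/* One bit per playable square, 0..31 (an illustrative numbering). */
typedef uint32_t board_t;

#define BLACK_START 0x00000FFFu  /* 12 pieces on squares 0..11  */
#define WHITE_START 0xFFF00000u  /* 12 pieces on squares 20..31 */

/* Population count: number of pieces on a bitboard. */
static int piece_count(board_t b)
{
    int n = 0;
    while (b) { b &= b - 1; n++; }  /* clear lowest set bit each pass */
    return n;
}

/* Squares not occupied by either side. */
static board_t empty_squares(board_t black, board_t white)
{
    return ~(black | white);
}

/* Flavor of parallel move generation: all black pieces with an empty
   square "4 bits ahead" at once.  A real generator also uses shifts of
   3 and 5 depending on row parity, and handles kings and captures. */
static board_t black_movers_by4(board_t black, board_t empty)
{
    return black & (empty >> 4);
}
```

The point of the exercise is that one AND/shift processes all twelve pieces simultaneously, which is what makes the representation so fast in registers.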

The search algorithm MTDF (memory-enhanced test driver) was devised and published by Professor Aske Plaat, and a Google search will provide much research material by him on the subject.
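For readers unfamiliar with MTD(f), here is a toy C sketch of the idea: the root value is pinned down by repeated zero-window alpha-beta calls, with a table of per-node bounds serving as the "memory". This is not Plaat's code, and the game here is a stand-in (a complete binary tree with given leaf values) purely so the driver loop has something to search:

```c
#include <assert.h>
#include <limits.h>

/* Complete binary tree: node 0 is the root, children of n are 2n+1
   and 2n+2, leaf values come from leaves[].  Even plies maximize. */
#define MAXNODES 64
static int lo_bound[MAXNODES], hi_bound[MAXNODES];  /* bound memory */

static int ab(const int *leaves, int node, int depth,
              int alpha, int beta, int maxing)
{
    if (lo_bound[node] >= beta)  return lo_bound[node];  /* memory cutoff */
    if (hi_bound[node] <= alpha) return hi_bound[node];
    int g;
    if (depth == 0) {
        g = leaves[node];
    } else if (maxing) {
        g = INT_MIN;
        int a = alpha;
        for (int i = 1; i <= 2 && g < beta; i++) {
            int v = ab(leaves, 2*node + i, depth - 1, a, beta, 0);
            if (v > g) g = v;
            if (g > a) a = g;
        }
    } else {
        g = INT_MAX;
        int b = beta;
        for (int i = 1; i <= 2 && g > alpha; i++) {
            int v = ab(leaves, 2*node + i, depth - 1, alpha, b, 1);
            if (v < g) g = v;
            if (g < b) b = g;
        }
    }
    if (g <= alpha) hi_bound[node] = g;   /* fail low: new upper bound */
    if (g >= beta)  lo_bound[node] = g;   /* fail high: new lower bound */
    return g;
}

/* MTD(f) driver: converge on the minimax value from a first guess. */
static int mtdf(const int *leaves, int depth, int first_guess)
{
    for (int n = 0; n < MAXNODES; n++) {
        lo_bound[n] = INT_MIN;
        hi_bound[n] = INT_MAX;
    }
    int g = first_guess, lo = INT_MIN, hi = INT_MAX;
    while (lo < hi) {
        int beta = (g == lo) ? g + 1 : g;
        g = ab(leaves, 0, depth, beta - 1, beta, 1);  /* zero-width window */
        if (g < beta) hi = g; else lo = g;
    }
    return g;
}
```

The better the first guess, the fewer zero-window passes are needed, which is why MTD(f) pairs naturally with iterative deepening.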

The database I am using was not generated by me - it is the same database that Chinook, the program written by a team led by Professor Jonathan Schaeffer, uses, and in fact it is available for download (if you have a fast connection) from the Chinook web site. The database is heavily compressed using run-length encoding, and is organised into several files based on the precise material balance. I have not tried to alter the way it is split up, because then the indexing scheme would need to be recalculated. Because of the compression, actually probing the database is a very complicated programming task and will be a real challenge to do in assembler, but I am sure I will be able to do it eventually, and I can then test it against my existing implementation to check that the results agree consistently. Schaeffer's compression method is not necessarily the best, but it is hard to come up with anything that offers better compression without compromising the speed of probing. I am certain that any program that does not use such a database will be weaker than one that does, although the precise size of the database might be reduced by a clever compression scheme. However, several others have generated a similar database independently, and as far as I am aware none is below 5 gigabytes in size. At present I am still putting all the bits together and debugging as I go - whenever possible on a big project like this I try to add code in small amounts and test after each addition, so as to make it easier to track down bugs. At the present time I am loading the database but not using it yet, until I am satisfied that no bugs remain in the search engine itself. As for alpha-beta in assembler - why not? Alpha-beta, MTDF, PVS, NegaScout and all similar search algorithms ultimately depend on speed, and the best way I know to produce super-fast code is to program in assembler.
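The probing problem described above can be illustrated with a deliberately simplified scheme. Chinook's real format is far more elaborate, so the (count, value) byte pairs below are purely an assumption for illustration; the point is that a probe skips runs to reach one position's value without expanding the whole stream:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-length layout: the stream is pairs of bytes,
   (run length, value), where value might encode win/loss/draw.
   rle_probe() returns the value stored at logical position `pos`,
   or -1 if pos lies past the end of the encoded data. */
static int rle_probe(const uint8_t *runs, size_t nruns, uint32_t pos)
{
    for (size_t i = 0; i < nruns; i++) {
        uint8_t count = runs[2*i];
        uint8_t value = runs[2*i + 1];
        if (pos < count)
            return value;   /* position falls inside this run */
        pos -= count;       /* otherwise skip the whole run */
    }
    return -1;
}
```

A real probe also has to locate the right compressed block from an index first; only then does a scan like this run over one block.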
An old saying much beloved of C programmers is "Boys program in Basic, men program in C, and madmen program in assembler." Well, if that is the case then I went straight from boy to madman! I still use the Basic language for HLL Windows apps, and PowerBasic compilers produce code just as good as C, but unfortunately they are all designed to produce Windows apps, except for their DOS compiler, and that one only compiles 16-bit code. For many years the folks at PowerBasic have been talking about a possible Linux version, but nothing seems to have come of it. Moving to pure assembler might seem a big step, but I am really enjoying programming again in an environment where the OS is not designed to stop me doing things I want to do, and I think I will probably take on other projects in this way in future.

Roberto
Post 03 Mar 2006, 16:47
Dex4u
Joined: 08 Feb 2005
Posts: 1601
Location: web
Just a side note on file systems:
1. From my understanding, you can have a file up to 4 GB in size on a FAT32 file system.
2. FAT32 uses only 28 bits of each table entry; the others are reserved. But then cluster sizes are variable.
3. Found this link, which may help: http://www.microsoft.com/windowsxp/using/games/expert/durham_fs.mspx
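Points 1 and 2 above can be made concrete in a short C sketch (the enum and constant names are my own, not from any particular driver):

```c
#include <assert.h>
#include <stdint.h>

/* A FAT32 table entry is 32 bits on disk, but only the low 28 bits
   hold the cluster number; the top 4 bits are reserved and must be
   masked off.  The directory entry's file-size field is a plain
   32-bit byte count, hence the per-file ceiling just under 4 GB. */
#define FAT32_ENTRY_MASK   0x0FFFFFFFu
#define FAT32_MAX_FILESIZE 0xFFFFFFFFu   /* 2^32 - 1 bytes */

enum fat32_kind { FAT_FREE, FAT_NEXT, FAT_BAD, FAT_EOC };

/* Classify a raw 32-bit FAT entry after masking the reserved bits. */
static enum fat32_kind fat32_classify(uint32_t raw)
{
    uint32_t e = raw & FAT32_ENTRY_MASK;
    if (e == 0)           return FAT_FREE;  /* unallocated cluster   */
    if (e == 0x0FFFFFF7u) return FAT_BAD;   /* marked bad cluster    */
    if (e >= 0x0FFFFFF8u) return FAT_EOC;   /* end of cluster chain  */
    return FAT_NEXT;                        /* next cluster in chain */
}
```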
Quote:
Moving to pure assembler might seem a big step, but I am really enjoying programming again in an environment where the OS is not designed to stop me doing things I want to do, and I think I will probably take on other projects in this way in future.
I agree 100% with the above quote. I could not go back to programming for another OS; I find it easier to make something myself than to try to understand how an OS works, only to find that the next version is totally different.
The only problem I have found with this method is that people using your programs do not share your enthusiasm, and only want to use your program if it runs on, say, XP.
Post 03 Mar 2006, 17:30
tom tobias
Joined: 09 Sep 2003
Posts: 1320
Location: usa
Roberto: Plaat's mtdf (MIT) http://minilop.net/amazons/doc/SearchStrategy.html
appears to be focused on chess, rather than checkers/draughts. I would argue that the two games have a similar origin, but with regard to developing a modern computer program to tackle the two games, checkers/draughts is EASIER by at least an order of magnitude, simply because of the smaller quantity of combinations which must be searched. Hence, I don't think you are wise to follow Plaat.

http://pages.prodigy.net/eyg/Checkers/10-pieceBuild.htm

"most of the 10-piece database can be built efficiently using only 1gb ram, but at least one machine needs to have 2gb or the largest database subdivisions will build much more slowly...."
I have not uncovered the secrets of this database deconstruction, but I believe that the real GAIN or improvement in your computer programming adventure will NOT come from your efforts to create a new operating system, much as I applaud your determination to do so, but rather from investigating this 1990 based search algorithm called Chinook.
<http://www.cs.ualberta.ca/~jonathan/Papers/Papers/preview.ps>

As I read today about Jonathan Schaeffer's Chinook program, I realized that he used the 1980s-style (LISP-based) so-called "artificial intelligence" terminology and techniques to create this database, which can probably be whittled down to something at least ten times smaller, with a concomitant improvement in search speed, by repudiating this much-overhyped technology (alpha-beta!) and commencing anew from the fundamental rules of checkers/draughts to create a compendium of every possible move (old-fashioned British Museum!), instead of blindly incorporating his gargantuan behemoth of nonsense (further complicated by subdivision!) into your brand new operating system.
Let me digress for a moment. In the era when Chinook was being developed, the second half of the 1980s, I taught AI for a large corporation. I was shocked to discover, as I prepared one evening for my lectures, that the fast Fourier algorithm I had been using for my own research was unwieldy, taking up HUGE amounts of memory unnecessarily. In fact, a plain vanilla DFT, for a relatively small quantity of points, took as much time to execute as the "FAST" implementation, which is faster by eliminating redundant computations. The solution lay NOT in revising the Burrus modification of the split-radix implementation of the FFT I was using, but rather in the method of programming it, using assembly language. Burrus, of course, was one of the principals, back in those days, at Texas Instruments (TI), and his team used TI digital signal processors very effectively: their assembly language was quite satisfactory for the TI device, but not when applied to the Intel CPU.
In those days, memory was precious. The Chinook checkers program was developed in that era, so I can imagine what kind of data base this is: unwieldy comes to mind. I would not be surprised to learn that the entire program was conceived, developed, and implemented using LISP, one of the least useful languages ever foisted on the academic world, sponsored by two of the 1980's era most prestigious North American universities: Stanford, and MIT. I won't be surprised to learn, eventually, that Schaeffer developed the .5-1 gbyte subdivisions of the entire data base to accommodate some technical problem associated with Digital Equipment Corporation PDP computers.
I can remember long debates, at Stanford, with the guy who developed Mycin, another AI debacle, written in LISP in that era. Lots of unpleasant memories.....
Roberto, please, step back for a day or two. Take a look at the fundamental attributes of this data base. Ask why it must be 5 gigabytes in size, and why it must be decomposed into smaller portions. Please treat the fact that checkers/draughts uses only 32 squares, a number corresponding to the quantity of data bits on the older models of the Intel cpu architecture, as only a coincidence, rather than a starting point for design of a new computer program to evaluate moves on a game board. Checkers/Draughts is a mathematical puzzle, which deserves a mathematical approach to its solution, not a haphazard gathering of components from various sources, incorporating a "data base" of potential moves developed in an era of thinking about computer architecture that frankly was incorrect, even at that time, but to employ such obsolete thinking today, is to guarantee an inauspicious outcome for your venture, despite, what I am sure will be a LOT of hard work.
Everything I have learned today reinforces my belief that you will make a much greater improvement by throwing out this Chinook mess, and starting over.
Post 03 Mar 2006, 22:07

Copyright © 1999-2020, Tomasz Grysztar. Also on GitHub, YouTube, Twitter.

Website powered by rwasa.