flat assembler
Message board for the users of flat assembler.
MenuetOS > Software interrupts for widgets, etc. Are they bad?
VitalOne 16 Jul 2004, 05:17
Yeah, I agree. Interrupts slow a lot of things down. Maybe there should be an interrupt function for loading DLLs or something, like how spideros's MenuetOS COFF DLLs load. Or maybe interrupts should be eliminated (or left for backwards compatibility)? With DLLs, writing drivers, widgets and things becomes a lot easier. But of course people argue that if you eliminate the need for interrupts, then MenuetOS won't be an ASM OS anymore.
Ville 16 Jul 2004, 08:02
Wishing wrote: If every application in MeOS calls the int 0x40 software interrupt, then the kernel cannot process each request concurrently. Basically, everything that is done in MeOS that uses int 0x40 must wait in line.
Wrong. When the process runs at kernel side executing a command, it is multitasked just like it is multitasked at the application side.
Wishing wrote:
Wrong again. It's the scheduler's job to switch processes, not the kernel functions'. Kernel side is part of the process. All the functions at kernel side are multitasked also.
Wishing wrote:
Menuet uses ring-3 level protection, now for interrupts also. pre-4.8b
Last edited by Ville on 17 Jul 2004, 13:47; edited 1 time in total
f0dder 16 Jul 2004, 13:58
interrupts are slow, though, especially when ring transitions are involved. Graphics and GUI are a hard thing to get right - you could move a lot of code to userspace to avoid ring transitions, but if that usermode code must call a lot of kernelmode functions you'll end up with even more transitions. Or you could move a _lot_ of code to kernelmode, and have each int call do a lot of work - but this means you have a lot of code running at a highly privileged level, which you have to make sure is bugfree.
Consider this little thought I had recently: split the graphics primitives into two categories, "unbuffered" and "buffered". "Unbuffered" (or "direct") would be as it is now, while buffered would require you to do a "StartGraphics" call, then all your buffered graphics requests, and finally an "EndGraphics" call (sorta like glBegin/glEnd when doing OpenGL coding). All of this would run in usermode, and the "EndGraphics" call would then finally do an INT call and switch to kernelmode, where the buffered commands would be executed. Shouldn't be too hard to implement - I wonder what kind of speed gain you could achieve with it.
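The buffered idea above can be sketched in C as a user-mode command queue that crosses into the kernel only once per batch. This is a rough illustration under stated assumptions, not MenuetOS code: the names start_graphics, queue_cmd and end_graphics, the command layout, and the batch size are all hypothetical, and the kernel transition is simulated by an ordinary function that just counts how often it is entered.

```c
#include <stddef.h>

/* Hypothetical command record; the fields are assumptions for illustration. */
enum gfx_op { GFX_LINE, GFX_RECT };

struct gfx_cmd {
    enum gfx_op op;
    int x0, y0, x1, y1;
};

#define GFX_BATCH_MAX 64

struct gfx_batch {
    struct gfx_cmd cmds[GFX_BATCH_MAX];
    size_t count;   /* number of queued commands */
    int open;       /* non-zero between start_graphics and end_graphics */
};

unsigned long kernel_transitions;   /* simulated user->kernel->user switches */
unsigned long primitives_drawn;     /* primitives the simulated kernel handled */

/* Stand-in for the single INT call: in a real system this would trap
   into kernel mode and execute every queued command there. */
static void kernel_execute_batch(const struct gfx_cmd *cmds, size_t n)
{
    kernel_transitions++;
    for (size_t i = 0; i < n; i++) {
        (void)cmds[i];              /* a real kernel would draw here */
        primitives_drawn++;
    }
}

void start_graphics(struct gfx_batch *b)
{
    b->count = 0;
    b->open = 1;
}

/* Queue one command in user mode; returns -1 if no batch is open.
   A full buffer is flushed early so no command is ever dropped. */
int queue_cmd(struct gfx_batch *b, struct gfx_cmd c)
{
    if (!b->open)
        return -1;
    if (b->count == GFX_BATCH_MAX) {
        kernel_execute_batch(b->cmds, b->count);
        b->count = 0;
    }
    b->cmds[b->count++] = c;
    return 0;
}

void end_graphics(struct gfx_batch *b)
{
    if (b->count > 0)
        kernel_execute_batch(b->cmds, b->count);
    b->count = 0;
    b->open = 0;
}
```

With this shape, ten queued primitives cost one simulated transition instead of ten. How much that saves on real hardware depends on the actual cost of the ring switch, which the sketch does not measure.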
Gomer73 17 Jul 2004, 04:44
Few things to comment on.
For Wishing:
Multi-tasking does not run like you think it does. There is no such thing as being able to draw two buttons concurrently (with the exception of dual processors). The computer can only run one set of instructions at a time. Multi-tasking means that every so often it pauses the program currently running and jumps to the next program to run. So as for multi-tasking, all you have to worry about is whether or not the function is re-entrant (can be called again before an earlier call has returned). I assume Menuet supports this.
Speed really doesn't have much to do with interrupts. Task switches are where a bit of time is taken up, but not interrupts. Protection is where the slowdown is. There is really little difference in speed between a function call and an interrupt if they are doing the same thing (i.e. staying at the same protection level or changing protection levels). If you want the video controller access routines at a different level than ring 3, there is no way to speed this up. Kernel interrupts really don't have to be at ring 0; most are put there because they access hardware. I really don't know how Menuet works, but for video routines, you could create a segment for the linear frame buffer in ring 3. This way all the video access routines wouldn't require a protection switch.
The bug theory doesn't quite fly either. If you have a bug in your scroll bar routine, it doesn't matter whether the scroll bar routine is in the kernel or not, every application that uses that routine will be affected.
Why Windows scrolls things quickly and Menuet has a lag has very little to do with using interrupts. It doesn't even have to do with hardware acceleration. It basically has to do with how the routines are programmed. On even my slow computer I can completely redraw a 1024x768 screen with 256 colors 256 times a second. This is without hardware acceleration. Probably these routines will speed up over time, but they have nothing to do with interrupts.
You can easily tell which areas can be improved. For example, on any of the video demos (like the tunnel thing), you can see the mouse flickering. This is not due to using interrupts, but because of the way the draw routine is designed. Programming the graphics routines in a specific order would allow the mouse to never flicker.
The other thing is when it does a screen refresh. You can see exactly how it draws the screen: it draws the background, then the icons, and then the windows. Again this isn't because interrupts are used, it is because of the way the graphics routines are programmed. A way to avoid this and speed everything up is to draw everything to a separate memory location (which is much faster than drawing to video RAM), then when everything is drawn, copy the whole screen to video memory in one memory transfer. At 256 screens per second on a slow computer, this would make the redraw look instantaneous.
For VitalOne:
This is a misconception that has kept people away from ASM programming. You can use DLLs and have an ASM OS. My OS, though just starting out, does this very well; everything is a DLL in it (there is basically no kernel). It is just that the structure you need to have in place is a little more complicated. What using interrupts does, though, is encourage the OS designer to be lazy. Interrupts are number based and not dynamic by nature (although you could make them dynamic). The more programs an OS designer creates based solely on interrupt numbers, the harder it will be for them to switch to something dynamic. It all comes down to your compiler.
For example the compiler could read a line like:
    drawWindow,100,200,300,400
and convert it to:
    mov eax, 10     ; draw window function (don't know what it is in menuet)
    mov ebx, 100
    mov ecx, 200
    mov edx, 300
    mov edi, 400
    int 40h
Same thing for dynamic: you could load the DLL into memory and then do something like so:
    mov eax, [Media_Player]
    add eax, Play_CD
    mov ebx, start_location
    int 40h
Where Media_Player would be the code that the DLL gave you after it loaded for the start of its functions.
This is very tough for Menuet to do right now because it is so static:
1.) The OS is basically a 1.44 MB image. Reliance on this is contrary to a dynamic nature. The whole kernel is compiled into one file (so I believe); this in itself is rigid and wasteful and doesn't allow for a variety of drivers: all drivers are loaded or none. Compiling everything into one file discourages dynamicness.
2.) The format for the program is static. It is really easy for the assembler to compile this, but because no dynamic thought had to be put into the design of the program format, it is hard to determine how to define a DLL standard. Like I say, in my OS there are no application files, there are only DLLs. I have something like 10 pointers at the beginning of the file in order to deal with this as well as language support. Menuet has 1, start of program.
3.) The program is the program. What this means is that because there are no DLLs, the whole program needs to be loaded, not just the parts required. For example, if a drawing program has routines for getting data from a scanner, and you are not using a scanner, they are loaded anyway because the full program has to be loaded.
4.) Interrupts are used with no provision for making it easier for the programmer. Sure, you can use macros, but it would still be nice to have an assembler that would make it easier to call functions (whether they be interrupts or function calls).
Once the compiler allows you to call functions instead of numbers, the need for DLLs will present itself and then be programmed in.
I see the steps for MenuetOS progress to be as follows:
1.) Dump the image thing. If network drivers can be programmed, drivers to load the individual files off of a hard drive or floppy should be a piece of cake.
2.) Determine a DLL standard.
3.) Make an assembler or convert fasm so that functions can be called rather than using interrupt numbers.
I think the speed of Menuet should be the last thing worked on right now. Graphics functions can be improved, but I think it is far more important to get the top two points I mentioned working first. We can already see what happens when this isn't done: people start importing C libraries from other OSes.
To me this is what C means: C/C++ is so that the programs will run on the .01% (exaggeration, but not by much) of computers that aren't Intel, and the only downside is that they take up 5 times the space (not an exaggeration) and run 5 times as slow (5 times the space would imply running 5 times as slow).
DosBox is a good example. Can't really see the use of it. MenuetOS can only read FAT, so you need a FAT partition anyway. The only reason I can think of for DosBox is to play games. All the other stuff should be accomplished natively in the OS much faster and better (long file names should be supported). If you are going to do this, why not just boot into DOS and run the game natively at full speed? The Windows DOS prompt kicks butt because the routines are native to the OS and not designed to be "compatible" with every other possible OS out there.
Keep up the good work Ville. Hopefully once more and more routines become asm, people will see how your OS will blow away C based OSes like Linux.
...Gomer73
pelaillo 17 Jul 2004, 05:00
Gomer73 wrote: 3.) Make an assembler or convert fasm so that functions can be called rather than using interrupt numbers.
No need to convert fasm. Simply use macros.
Fully agree on your 3 points.
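The macro idea can be illustrated with the C preprocessor, since the principle is the same as a fasm macro: a readable name expands to the numbered call. This is only a sketch under assumptions. The function number 10 is the placeholder from Gomer73's example rather than a confirmed MenuetOS number, the register layout mirrors his listing, and int40 here is a plain function standing in for the real trap.

```c
#include <stdint.h>

/* Register image handed to the simulated int 0x40 gate. */
struct sys_regs {
    uint32_t eax, ebx, ecx, edx, edi;
};

struct sys_regs last_call;   /* what the simulated kernel last received */

/* Stand-in for the real software interrupt: just records its arguments. */
static void int40(struct sys_regs r)
{
    last_call = r;
}

/* Hypothetical function number, taken from the placeholder in the post. */
#define FN_DRAW_WINDOW 10u

/* The named wrapper: callers write draw_window(...) and never see the number. */
#define draw_window(a, b, c, d)                 \
    int40((struct sys_regs){                    \
        .eax = FN_DRAW_WINDOW,                  \
        .ebx = (a), .ecx = (b),                 \
        .edx = (c), .edi = (d) })
```

A call such as draw_window(100, 200, 300, 400) then loads the same values Gomer73's listing puts into eax, ebx, ecx, edx and edi. If the function number ever changes, only the #define has to be updated.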
tom tobias 17 Jul 2004, 10:06
great post gomer. tom
f0dder 17 Jul 2004, 14:07
Quote:
Well, IMO it does - one thing is that exceptions are usually handled a bit differently in user/kernel mode. But consider a wrong pointer or a bug that makes the code zero 4 MB instead of 4 KB - in usermode the worst thing that can happen is that your program crashes. From kernelmode, you could trash quite a lot.
Quote:
Does sound like an exaggeration - but sure, if you think that all programmers write as bad code as many of the GPL coders, or that all windows HLL programmers use MFC, I can see where you would get that idea (in reality it depends on how good the programmer is and how well he knows his tools, and which tools/libraries are used).
Quote:
Really? How come things like loop unrolling speed code up rather than slow it down?
Anyway, what about my "batching" idea? It is from the assumption that the INT routines involve user->kernel->user switching (ie, that there's an amount of overhead involved). Also, offscreen drawing could be automated this way, and blitted at the end of a batch.
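The offscreen drawing mentioned here (and in Gomer73's post above) can be sketched in C as a back buffer in ordinary memory that is copied to the screen in one pass. Everything in this sketch is an assumption for illustration: the 320x200 8-bit screen, the function names, and the video_mem array, which merely stands in for real video RAM.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

uint8_t video_mem[SCREEN_W * SCREEN_H];            /* stand-in for video RAM */
static uint8_t back_buffer[SCREEN_W * SCREEN_H];   /* offscreen copy in system RAM */

/* Fill a rectangle in the given buffer. The caller is assumed to keep
   the rectangle inside the screen; no clipping is done in this sketch. */
static void fill_rect(uint8_t *buf, int x, int y, int w, int h, uint8_t color)
{
    for (int row = y; row < y + h; row++)
        memset(buf + (size_t)row * SCREEN_W + x, color, (size_t)w);
}

/* Copy the finished frame to the screen in a single transfer. */
static void present(void)
{
    memcpy(video_mem, back_buffer, sizeof video_mem);
}

/* Draw background, icon and window offscreen, then show the result at once,
   so the intermediate steps never become visible. */
void redraw_screen(void)
{
    fill_rect(back_buffer, 0, 0, SCREEN_W, SCREEN_H, 1);   /* background */
    fill_rect(back_buffer, 10, 10, 32, 32, 2);             /* icon */
    fill_rect(back_buffer, 60, 40, 200, 120, 3);           /* window */
    present();
}
```

The cost is one extra full-screen copy per frame and a second buffer's worth of memory. In exchange, slow video memory is only written once per frame and never read.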
Ville 17 Jul 2004, 14:29
Quote: So as for multi-tasking, all you have to worry about is whether or not the function is re-entrant(can be called more than once without returning from the function). I assume Menuet supports this.
Yes. Each system call allocates its own 4096 byte stack to the kernel side. So processes can have simultaneous access to the same system call.
Quote: I really don't know how menuet works, but for video routines, you could create a segment for the linear frame buffer in ring 3.
GS segment [gs:0] points to the beginning of linear frame buffer, for every process. So you can directly write to 24/32 bit screen memory from applications, and not use a system call.
Quote: 2.) Determine a dll standard.
There is a simple reason I've stayed away from DLLs. It's dependencies. Linux is an example of what using loadable modules might lead to. And a 90 KB kernel isn't that large after all. What I do encourage is writing libraries as assembly .INC files, which can be used by every application.
Last edited by Ville on 17 Jul 2004, 14:49; edited 1 time in total
ASHLEY4 17 Jul 2004, 14:34
Gomer73, very well put, I agree with all you wrote! As it is at the moment it's just a demo OS (a very good demo), but still just a demo OS, not a usable ASM OS.
ASHLEY4.
bloglite 17 Jul 2004, 18:23
Ashley4,
Gotta disagree with you. MenuetOS is very much A USABLE OS. All the RT DATA we transfer for equipment monitoring and most of our data served is via Menuet. We've stopped using DOS & Windows for these tasks as Menuet does it better. Much Better. BTW MenuetOS does DEMO very well too!
G'day, Mark
ASHLEY4 17 Jul 2004, 18:46
I may have an old copy of MenuetOS, so when I have Menuet loaded in memory, can I put a CD in the CD drive and load programs and run them in Menuet? Can I do the same with a floppy full of programs? Can I store programs I have made in Menuet on a floppy?
If so I'm wrong, if not I'm right. ASHLEY4.
Gomer73 17 Jul 2004, 19:22
For F0dder:
An easy way to show you how different C and ASM are is as follows: write a DOS program to count to 3 in both C and asm.
The C code is:
    for(i=1;i<=3;i++) ;
ASM is:
        mov ax,1
    count:
        inc ax
        cmp ax,3
        jbe count
Although the asm code might look like it takes up more space, go into debug and use the t command to see when the program ends. The C code will take up way more space than the ASM program. For some reason whenever I talk about this, C coders always think I am talking about source code size. It doesn't matter how big/small the source code is, it matters how big the compiled code is. The more instructions, the slower the speed (what else do you think takes up clock cycles?). It doesn't matter how efficient the C programmer is, it is the nature of C. I don't know what loop unrolling refers to. It doesn't matter if it is MFC or not; for the above code I used no libraries. Libraries compound this, especially when you have libraries calling libraries.
The batching is probably a good idea (after all, the Windows API is pretty complex and this is what it uses). I think anything that writes to video memory only at the end is a good idea. Video RAM access is significantly slower than normal RAM access. It looks like the user/kernel/user switch is kind of unavoidable since all int 40h calls will do that. This could be fixed by using a different int for routines that don't need a ring transition, since the video memory does not need any privileged access as GS is open to everyone. I guess it is all up to the OS designer as to which routines they want in ring 0. I am assuming now that all int 40h routines (which are all kernel routines) are in ring 0. From my view, only the routines that need to access IO ports/memory that is inaccessible from ring 3 need to be put in ring 0. But I am unfamiliar with the setup of Menuet and the reasons for the setup.
For Ville:
The .inc is kind of OK, it just wastes a lot of space. I know what you are saying about Linux: libraries calling libraries aren't efficient. Maybe this is the superior way (the way you mentioned). It is just that the same code will be loaded a whole bunch of times. For instance, suppose you have something big like a video player. You would need this huge code in both your internet viewer and your media player. It definitely presents a direct way of doing stuff. Maybe it is the fastest way of doing stuff, and as long as everything is open source, it won't be a problem. This is actually a good way of looking at things, looking forward to seeing how it turns out. The only issue I see is upgrades/patches. If, say, an image viewer is upgraded, every program that used that code would have to be updated separately. We'll see what the future holds.
spideros1 17 Jul 2004, 19:56
Gomer73 wrote:
GCC generates such code:
        mov eax,2
    .L6:
        dec eax
        jns .L6
It depends on the algorithm that you use. But most of the time GCC can produce better code than a human would, because on request it optimizes the code to take advantage of pipelining and other CPU features.
f0dder 17 Jul 2004, 20:17
Gomer, if you're comparing the same executable formats (in my case, win32/PE) there's no reason the C code should produce a larger executable than the asm code. By default you'll get C runtime included, but that can be turned off. Have a look at the attached zip to see what I mean. (I compiled with small code rather than fast code switches, I had to add the "x=i" so the loop isn't optimized away, and the code could be even shorter if you reverse the loop to count downwards).
Quote:
You have to take into account instruction latency and throughput; the number of instructions is not enough. Also, since there are SIMD instructions, you also have to take into account the number of data items processed per instruction.
Quote:
Instead of running 4096 times through a loop processing 4 bytes, run through the loop 512 times processing 32 bytes.
Quote:
Well, MFC adds a *lot* of code, which might not matter for large projects where you use a lot of it, but for smaller stuff it's wasteful. Standard C runtime is also a library, etc.
Yup, you can still beat compilers a lot of the time, but they (well, MSVC2003+ and intel C compiler) generate pretty good code these days. I'm in no way saying assembly is obsolete though, otherwise I probably wouldn't be hanging out here... especially when you need MMX, SSE and the like, compilers *suck*. And even with regular x87 or ALU code, you can do tricks in assembly that compilers don't use, since you know your code better. But 5x larger/slower on average code? Nah, that's quite a wild exaggeration.
I still care about speed/size (otherwise I would probably be coding .NET, as it seems nice and easy), but for the majority of my code I use some HLL - simply because I write that faster, debug it faster, and the code produced is good enough. That's _my_ personal opinion, which is subjective, and arguing about it is pointless. I'm not saying you should give up full-asm coding if that's what you like, just that it's not my path. No fighting necessary.
Quote:
Yup, I think sysmem buffering is a good idea - especially if you need to read back the surface/bitmap memory (ie, to do blending effects), as video memory is very slow to read. Of course hardware accel would be better, but that's a bit unrealistic considering the closedness of most hardware vendors. The batching method should still allow for hardware acceleration to be done later on though.
Quote:
Probably right - but you also have to be careful about making ring0 routines "too small". Perhaps it's possible to implement some routine mostly in ring3 code, but if it needs to call a lot of ring0 routines - it might be smarter to move the entire routine to ring0, to avoid the switching. It's all about finding the point of balance.
Btw, I would suggest not using INTs, but CALL some fixed memory location - this location would then use either INT to do the transition, or SYSCALL on CPUs that support it, since it's a bit faster. This is what WinXP does, btw.
As for DLLs vs. static linkage - DLLs are pretty good, but it's easy to end up with the linux/windows problem of "DLL hell". I still think DLLs are a good idea, especially if the OS ever needs to support "larger" programs. And there's a bunch of DLLs that are mostly "static" now (ie, won't be updated, at least not in ways that will cause incompatibilities), like libjpeg.
PS: I'm not in any way trying to dictate how MenuetOS should evolve, I'm just voicing thoughts, and considerations for my own toy kernel. Feel free to ignore
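The suggestion of calling a fixed location that picks the transition mechanism can be sketched in C with a function pointer chosen once at startup. This is a simulation under assumptions, not how MenuetOS or Windows implements it: the two entry functions are ordinary stand-ins for an INT-based and a SYSCALL-based path, the CPU feature flag is passed in by hand, and the return value is a dummy so the call can be observed.

```c
#include <stdint.h>

typedef uint32_t (*syscall_entry_fn)(uint32_t fn, uint32_t arg);

int used_fast_path;   /* records which path the last call took */

/* Stand-in for the INT-based transition; the return value is a dummy. */
static uint32_t enter_via_int(uint32_t fn, uint32_t arg)
{
    used_fast_path = 0;
    return fn + arg;
}

/* Stand-in for the SYSCALL-based transition on CPUs that support it. */
static uint32_t enter_via_syscall(uint32_t fn, uint32_t arg)
{
    used_fast_path = 1;
    return fn + arg;
}

/* The "fixed location": applications always go through this pointer. */
static syscall_entry_fn system_call_stub = enter_via_int;

/* Chosen once at startup; a real OS would derive the flag from CPUID. */
void init_system_call_stub(int cpu_has_fast_syscall)
{
    system_call_stub = cpu_has_fast_syscall ? enter_via_syscall
                                            : enter_via_int;
}

uint32_t system_call(uint32_t fn, uint32_t arg)
{
    return system_call_stub(fn, arg);
}
```

Applications only ever call system_call, so the mechanism behind it can change per CPU without rebuilding them. The price is one extra indirect call per system call.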
Gomer73 17 Jul 2004, 21:20
Interesting, looks like the code is basically equivalent. Probably my example was too simple. I know when I was debugging Hale Landis' ide drivers it took forever to do simple stuff.
Something like string searches can definitely be optimized in asm with repne cmpsb (however it is spelled). I am assuming something like that would be significantly longer in C. I don't know the exact numbers; maybe it was because I compiled Hale's stuff with Tasm (which I thought was fairly good). But it would be interesting to know what the numbers are for how much space C/asm take up. The problem is, the more complicated the program, the harder it is to analyze it to see a space comparison.
bloglite 17 Jul 2004, 21:29
ASHLEY4 wrote: I may have an old copy of MenuetOS, so when I have Menuet loaded in memory, can I put a CD in the CD drive and load programs and run them in Menuet? Can I do the same with a floppy full of programs? Can I store programs I have made in Menuet on a floppy?
Try this on. Power on. Menuet boots from CD .... HTTPS is started ... CD Player starts providing "EZ" listening for our office environment while the entry system is connected to Lpt2 and the server is providing our archives and MP3's to all the machines on our P2P Network. Floppy & HD work well and the CD functionality is progressing well.
I did not say you were "wrong"; I stated that I disagreed with you on the usability of MenuetOS. I find it VERY usable. (many other examples are available) (Anyone else want to share how YOU use MenuetOS?)
MenuetOS / Data in ... Process Data .. Data out...Things happen.. WoW ...
Gomer73 17 Jul 2004, 21:39
For F0dder:
Interesting how the compiler picks to use up less space but execute slower (for the xor eax,eax, inc eax - I assume this would be slower than mov eax,1 but take up less space). Not a big deal, but just interesting. Wonder what number it would go up to before using the mov instruction.
f0dder 17 Jul 2004, 21:40
Well, quality of code produced by a compiler depends both on the compiler and how the code is written. Some compilers (old ones, and some "hobbyist" compilers) translate line-by-line and generate pretty bad code. Delphi generates bloated code. PowerBASIC generates *horrible* code. Also, as I said, code generation quality will also depend on the programmer and how he writes code - how FOR loops are constructed, whether you use array or pointer indexing, etc etc etc.
I'm not familiar with Hale Landis' IDE drivers, but often I've seen drivers compiled with very few optimization switches, perhaps because of developer paranoia. This is especially true with GNU compiled stuff, I guess GCC might sometimes have been shipped with new optimizations that weren't tested properly?
Quote:
That would actually be pretty slow - well, at least for longer string comparisons. String compare, substring search, and string length determination are pretty good examples where more code is usually faster than less code. Compare, say, agner fog's strlen to a more simple solution. More code = faster speed.
I think it's more or less pointless to compare HLL vs ASM anyway - there are just too many factors to consider, and you'll only end up with holy wars. Smarter to use the tool(s) suitable for the job, and get the most out of it. Compilers *are* good today, but they still can't replace humans. For me they're good enough for the "bulk" of my applications, but that's just me.
[edit] Gomer: I compiled with the "optimize for size" switch. If I compile with "optimize for speed", the compiler chooses "mov eax, 1". VC has a lot of cute tricks it does for you automatically - like moving a proc address into a register and calling that, if you use a function enough.
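The "more code = faster speed" point, and the loop unrolling f0dder describes earlier in the thread, can be illustrated with a small C sketch. The function names and the unroll factor of four are arbitrary choices for illustration. Whether the unrolled version is actually faster depends on the compiler and CPU, and optimizing compilers may already do this on their own.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward version: one byte per loop iteration. */
uint32_t sum_bytes_simple(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Unrolled by four: more code, but a quarter of the loop-control
   overhead and four independent accumulators the CPU can overlap. */
uint32_t sum_bytes_unrolled(const uint8_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; i++)      /* leftover bytes when n is not a multiple of 4 */
        s0 += p[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result for any input. The second is simply longer, which is the trade being discussed: instruction count alone does not determine speed.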
Gomer73 17 Jul 2004, 22:16
Yep, probably C isn't that bad.
What I should have said is the C that is usually out there is poor. Like you say, the GNU stuff is kind of horrid and that is what I see most of. It all kind of comes down to the programmer in the end. I don't quite know what it is, but Linux is slow for both compiling and running stuff. Anytime it takes several hours to compile something is kind of a bad thing. I think it probably has to do with too much openness. People just take open source code and plop it in without trying to optimize it, therefore it can run on several platforms (slow, kind of buggy, but it runs). Even Windows bogs down; I know it is a pain to load Internet Explorer. Don't know why this should be if it is already resident in memory. I guess, like I say, we'll see if we can hope for the best. Maybe Ville's technique is the way to go.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.