First, "int 0x10, ah = 0x0C" is completely unusable. To understand why, here's a detailed breakdown of what it actually does:
- It starts with a software interrupt, which is relatively expensive all by itself because it involves microcode and typically flushes the CPU's pipeline. This is pure pointless bloat.
- Then the BIOS has a whole bunch of tests to figure out which function you actually wanted, which is typically an insanely poor sequence of comparisons and branches (each with potential branch misprediction). This is pure pointless bloat.
- Once you reach the code you actually wanted, it has to figure out which video mode is active and what its pixel format is (so it knows how to write a pixel in that mode). This is pure pointless bloat.
- Then it has to calculate an address in the frame buffer from your coordinates. This is almost pure pointless bloat (more on that later).
- Then it does a write to the frame buffer. This is the only part that actually matters, and it is probably faster than every single step of the pure pointless bloat that comes before and after it.
- Then it has to unwind all the crud it had to spew all over the stack from earlier. This is pure pointless bloat.
- Finally, it returns ("iret"), which is relatively expensive all by itself because it involves microcode and typically flushes the CPU's pipeline. This is pure pointless bloat.
Overall, there's roughly 100 times more pure pointless bloat than actual useful work.
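For contrast, the useful part of that breakdown (calculate an address, then write to the frame buffer) fits in a couple of lines. The following C sketch assumes a linear frame buffer described by a hypothetical `fb_info` struct with `base`, `bytes_per_line` and `bytes_per_pixel` fields, and a little-endian CPU such as x86; it's an illustration, not a driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a linear frame buffer; in a real OS these
   values would come from the video mode information. */
struct fb_info {
    uint8_t *base;          /* start of the linear frame buffer */
    size_t bytes_per_line;  /* may be larger than width * bytes_per_pixel */
    size_t bytes_per_pixel; /* 1, 2, 3 or 4 */
};

/* The only useful work hidden inside the BIOS call: compute an address and
   do one store. The low bytes_per_pixel bytes of `colour` are written
   (assumes a little-endian CPU, as on x86). */
static void put_pixel(const struct fb_info *fb, size_t x, size_t y, uint32_t colour)
{
    assert(fb->bytes_per_pixel >= 1 && fb->bytes_per_pixel <= 4);
    uint8_t *p = fb->base + y * fb->bytes_per_line + x * fb->bytes_per_pixel;
    memcpy(p, &colour, fb->bytes_per_pixel);
}
```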
Second, "putpixel()" is almost never sane. The problem is that you end up doing an "address = x * bytes_per_pixel + y * bytes_per_line" calculation for every single pixel, and there's almost always a way to avoid that. For a simple example, to draw a horizontal line you only need to calculate the starting address; after that you know that each pixel is exactly bytes_per_pixel bytes after the previous one. More specifically, to draw a horizontal line you can typically calculate the address once and then do a "rep stosb" (in an 8-bpp mode), a "rep stosw" (in a 15-bpp or 16-bpp mode) or a "rep stosd" (in a 32-bpp mode). The same applies to rectangles: you draw one horizontal line (as already described), add the number of bytes from the end of one line to the start of the next line, draw the next line, and so on, so the "address = x * bytes_per_pixel + y * bytes_per_line" calculation is only done once for the entire rectangle.
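A minimal C sketch of the horizontal-line and rectangle approach described above, assuming a hypothetical `fb_info` description of a linear frame buffer and a little-endian CPU. The full address calculation happens once per rectangle; within a line the pointer just advances by `bytes_per_pixel`, and between lines it advances by `bytes_per_line`. Keeping a pointer to the start of each row and adding `bytes_per_line` is equivalent to adding the gap from the end of one line to the start of the next.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a linear frame buffer. */
struct fb_info {
    uint8_t *base;
    size_t bytes_per_line;
    size_t bytes_per_pixel;
};

/* Fill `count` consecutive pixels starting at `p` (a horizontal line).
   No per-pixel address calculation: each pixel follows the previous one. */
static void fill_span(uint8_t *p, size_t count, size_t bpp, uint32_t colour)
{
    assert(bpp >= 1 && bpp <= 4);
    if (bpp == 1) {                        /* the "rep stosb" case */
        memset(p, (int)(colour & 0xFF), count);
        return;
    }
    for (size_t i = 0; i < count; i++) {   /* the "rep stosw"/"rep stosd" cases */
        memcpy(p, &colour, bpp);           /* low bpp bytes (little-endian) */
        p += bpp;                          /* next pixel is bpp bytes further on */
    }
}

/* Fill a w*h rectangle with its top-left corner at (x, y). */
static void fill_rect(const struct fb_info *fb, size_t x, size_t y,
                      size_t w, size_t h, uint32_t colour)
{
    /* The only full address calculation for the whole rectangle. */
    uint8_t *row = fb->base + y * fb->bytes_per_line + x * fb->bytes_per_pixel;
    for (size_t j = 0; j < h; j++) {
        fill_span(row, w, fb->bytes_per_pixel, colour);
        row += fb->bytes_per_line;         /* same column, next line */
    }
}
```

A horizontal line is just `fill_rect()` with `h == 1`. A real implementation would probably specialise the loop for each pixel size rather than branching on `bpp`.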
Third, for any video mode that a user won't mind looking at (which excludes ancient "320*200" nonsense) you can't use the legacy/deprecated "VGA area" without bank switching, because that area only gives you a 64 KiB window at 0xA0000, while even 640*480 at 8 bpp needs 307,200 bytes. Bank switching makes everything slow: not just the bank switching itself, but also the checks to determine whether or not you need to switch banks, which ruin most other optimisations. For this reason any OS that isn't worthless trash will use a "linear frame buffer" (and therefore must use protected mode or long mode, because the linear frame buffer is normally mapped far above the 1 MiB that real mode can address).
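To illustrate why bank switching hurts, here is a C simulation (not real hardware access) of an 8-bpp mode seen through a 64 KiB window. The `banked_fb` struct and `set_bank()` helper are hypothetical; on real hardware a bank switch would go through VBE function 0x4F05 or card-specific registers, and the window granularity is assumed here to be 64 KiB. Even a single horizontal span has to be checked against the window boundary and may have to be split, which is exactly the kind of test that gets in the way of simple "calculate once, then fill" code.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define WINDOW_SIZE 0x10000u  /* 64 KiB window (assumed granularity) */

/* Hypothetical simulation: only a 64 KiB window of `vram` is "visible". */
struct banked_fb {
    uint8_t *vram;           /* stands in for the whole of video memory */
    size_t bytes_per_line;
    unsigned current_bank;
    unsigned bank_switches;  /* counts how often a switch was needed */
};

/* Select a bank and return the base of the visible window. On real
   hardware the switch itself would be a slow BIOS call or port I/O. */
static uint8_t *set_bank(struct banked_fb *fb, unsigned bank)
{
    if (bank != fb->current_bank) {  /* the check every write has to pay for */
        fb->current_bank = bank;
        fb->bank_switches++;
    }
    return fb->vram + (size_t)bank * WINDOW_SIZE;
}

/* 8-bpp horizontal span: a run of pixels may straddle a bank boundary,
   so it has to be split instead of being one simple fill. */
static void banked_fill_span(struct banked_fb *fb, size_t x, size_t y,
                             size_t count, uint8_t colour)
{
    assert(x + count <= fb->bytes_per_line);
    size_t offset = y * fb->bytes_per_line + x;
    while (count > 0) {
        unsigned bank = (unsigned)(offset / WINDOW_SIZE);
        size_t in_window = offset % WINDOW_SIZE;
        size_t room = WINDOW_SIZE - in_window;  /* bytes before the window ends */
        size_t n = count < room ? count : room;
        uint8_t *window = set_bank(fb, bank);
        memset(window + in_window, colour, n);
        offset += n;
        count -= n;
    }
}
```

In a 640*480 mode, line 102 starts at offset 65,280, so its 640 pixels straddle banks 0 and 1 and the span has to be written in two pieces.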