OSDev.org

The Place to Start for Operating System Developers
All times are UTC - 6 hours




[ 23 posts ]  Page 1 of 2
 Post subject: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 3:24 am 
Joined: Wed Jan 25, 2017 3:37 pm
Posts: 71
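My shell runs too slowly.
Filling a rectangle is very slow (about 10 seconds).

Here is the code:

Code:
{ Fill a rectangle; assumes w and h are its width and height in pixels. }
for i := x to x + w - 1 do
begin
  { The upper bound must be computed from y, not from the loop variable j. }
  for j := y to y + h - 1 do
  begin
    PutPixel(i, j, Color);
  end;
end;


How can I make this code faster?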


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 3:29 am 
Joined: Sun Dec 25, 2016 1:54 am
Posts: 204
Hmmmm.... this is a hard one...

1) Don't use Pascal
2) get rid of the p-code interpreter
3) use assembly
4) specifically 512 bit VMX instructions
5) in parallel on multiple cores
6) with prisma-chromatic fiber interconnects between your VRAM and the combustinator

Cheers and good luck!

_________________
Plagiarize. Plagiarize. Let not one line escape thine eyes...


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 3:33 am 
Joined: Wed Jan 25, 2017 3:37 pm
Posts: 71
dchapiesky wrote:
Hmmmm.... this is a hard one...

1) Don't use Pascal
2) get rid of the p-code interpreter
3) use assembly
4) specifically 512 bit VMX instructions
5) in parallel on multiple cores
6) with prisma-chromatic fiber interconnects between your VRAM and the combustinator

Cheers and good luck!


TP generates small, fast code.

TP compiles to native code; it is not an interpreter.

Yes, I'm using inline assembler for interrupts.

Multiple cores on a 486? :D


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 3:34 am 
Joined: Wed Jan 25, 2017 3:37 pm
Posts: 71
monobogdan wrote:
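My shell runs too slowly.
Filling a rectangle is very slow (about 10 seconds).

Here is the code:

Code:
{ Fill a rectangle; assumes w and h are its width and height in pixels. }
for i := x to x + w - 1 do
begin
  { The upper bound must be computed from y, not from the loop variable j. }
  for j := y to y + h - 1 do
  begin
    PutPixel(i, j, Color);
  end;
end;


How can I make this code faster?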


PutPixel is not the Graph unit's function. I implemented it myself.


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 3:45 am 
Joined: Sun Dec 25, 2016 1:54 am
Posts: 204
please explain what TP means...

 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 3:46 am 
Joined: Tue Mar 04, 2014 5:27 am
Posts: 1108
dchapiesky wrote:
please explain what TP means...


Borland Turbo Pascal.


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 3:49 am 
Joined: Sun Dec 25, 2016 1:54 am
Posts: 204
Well then, from Wikipedia:

Quote:
Several versions of Turbo Pascal, including the latest version 7, include a CRT unit used by many fullscreen text mode applications. This unit contains code in its initialization section to determine the CPU speed and calibrate delay loops. This code fails on processors with a speed greater than about 200 MHz and aborts immediately with a "Runtime error 200" message.[25] (the error code 200 had nothing to do with the CPU speed 200 MHz). This is caused because a loop runs to count the number of times it can iterate in a fixed time, as measured by the real-time clock. When Turbo Pascal was developed it ran on machines with CPUs running at 1 to 8 MHz, and little thought was given to the possibility of vastly higher speeds, so from about 200 MHz enough iterations can be run to overflow the 16-bit counter.[26] A patch was produced when machines became too fast for the original method, but failed as processor speeds increased yet further, and was superseded by others.

 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 4:07 am 
Joined: Sun Dec 25, 2016 1:54 am
Posts: 204
Are you using integers rather than floating-point variables?

Turbo Pascal had slow floating point code...

 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 4:19 am 
Joined: Tue Mar 04, 2014 5:27 am
Posts: 1108
monobogdan wrote:
TP generates small, fast code.


Sorry, it does not. It compiles fast (which is why I mainly used Turbo Pascal instead of Turbo C++ in the '90s, when my computers weren't fast enough), but it doesn't produce fast code. I had to write quite a bit of assembly code (with various hacks) to make rendering fast on my machines.

You either need to do the same (rendering individual pixels in a loop is a bad idea when your compiler doesn't do a good job of optimizing) or use a better compiler. I hear Free Pascal is good. Btw, no compiler will be able to optimize this kind of loop for 16-color modes where you have to switch between the four pixel planes.

I should also add that fast rendering needs more than optimized rendering code: you also need to avoid rendering the same pixel more than once. Learning and implementing the algorithms for this is a lot of fun (both for 2D and 3D).


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 4:24 am 
Joined: Tue Mar 04, 2014 5:27 am
Posts: 1108
dchapiesky wrote:
Turbo Pascal had slow floating point code...

The 6-byte Real type is not supported by the x87 FPU directly, so quite a bit of 16-bit code from the system library would be involved.


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 5:29 am 
Joined: Sat Mar 31, 2012 3:07 am
Posts: 4591
Location: Chichester, UK
Making a BIOS interrupt call for every pixel that you plot is horrendously inefficient. What should be a simple "mov" instruction is translated into hundreds of instructions.

You need to write a routine to address the display directly. This should be pretty simple in real mode.
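
As a rough sketch of what "addressing the display directly" can look like, assume VGA mode 13h (320x200, one byte per pixel); the thread does not say which mode is in use, so that is an assumption. In that mode the pixel at (x, y) sits at offset y * 320 + x, so plotting is one multiply-add and one store. The sketch is in C rather than Turbo Pascal, the name put_pixel_direct is made up, and the vram array is only a stand-in for video memory so the code runs anywhere; on real hardware the store would go to A000:0000 instead.

```c
#include <assert.h>
#include <stdint.h>

enum { SCREEN_W = 320, SCREEN_H = 200 };   /* assumed: VGA mode 13h geometry */

/* Stand-in for video memory; on real hardware this would be A000:0000. */
static uint8_t vram[SCREEN_W * SCREEN_H];

/* Plot one pixel by writing straight into the frame buffer (no BIOS call). */
void put_pixel_direct(int x, int y, uint8_t color)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    vram[y * SCREEN_W + x] = color;
}
```

In Turbo Pascal the same store can be written with the predeclared Mem array, along the lines of Mem[$A000:y * 320 + x] := Color, with the offset computed as a Word so that y * 320 does not overflow a 16-bit Integer.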


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 6:00 am 
Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

First, "int 0x10, ah = 0x0C" is completely unusable. To understand why, here's a detailed break-down of what it actually does:
  • It starts with a software interrupt, which is relatively expensive all by itself because it involves micro-code and typically flushes the CPU's pipeline. This is pure pointless bloat.
  • Then the BIOS has a whole bunch of tests to figure out which function you actually wanted, which is typically an insanely poor sequence of comparisons and branches (each with potential branch misprediction). This is pure pointless bloat.
  • Once you reach the code you actually wanted, it has to figure out which video mode and what the pixel format is (to figure out how to write a pixel for the current video mode). This is pure pointless bloat.
  • Then it has to calculate an address in the frame buffer from your coordinates. This is almost pure pointless bloat (more on that later).
  • Then it does a write to the frame buffer. This is the only part that actually matters, and is probably faster than every single step of pure pointless bloat that occurred before and after.
  • Then it has to unwind all the crud it had to spew all over the stack from earlier. This is pure pointless bloat.
  • Finally, it returns ("iret"), which is relatively expensive all by itself because it involves micro-code and typically flushes the CPU's pipeline. This is pure pointless bloat.

Overall, there's about 100 times more pure pointless bloat than there is actual useful work.

Second, "putpixel()" is almost never sane. The problem is that you end up doing an "address = x * bytes_per_pixel + y * bytes_per_line" calculation for every single pixel; and there's almost always a way to avoid that. For a simple example, to draw a horizontal line you only need to calculate the "starting address", and after that you know that the next pixel will be at the next highest address after the previous pixel. More specifically, to draw a line you can typically do something like calculate the address once then do a "rep stosb" (if it's an 8-bpp mode) or "rep stosw" (if it's a 15-bpp mode or 16-bpp mode) or "rep stosd" (if it's a 32-bpp mode). The same happens for rectangles; where you can do one horizontal line (as already described) and then add "bytes between end of one line to start of next line" and do the next line; and only calculate that "address = x * bytes_per_pixel + y * bytes_per_line" once for the entire rectangle.
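
The rectangle case described above can be sketched roughly as follows. This is an illustration under assumptions, not anyone's actual code: it assumes an 8-bpp mode with 320 bytes per line, uses a plain array (frame_buffer) in place of video memory, and uses memset where hand-written assembly would use "rep stosb". The name fill_rect is made up. Because memset does not advance the pointer the way "rep stosb" advances DI, the sketch steps by a full line instead of by the gap between lines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BYTES_PER_LINE = 320, LINES = 200 };  /* assumed: 8-bpp mode, 1 byte per pixel */

static uint8_t frame_buffer[BYTES_PER_LINE * LINES];  /* stand-in for video memory */

/* Fill a w-by-h rectangle whose top-left corner is (x, y). */
void fill_rect(int x, int y, int w, int h, uint8_t color)
{
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= BYTES_PER_LINE && y + h <= LINES);

    /* The only address calculation for the whole rectangle. */
    uint8_t *row = frame_buffer + (size_t)y * BYTES_PER_LINE + x;

    for (int line = 0; line < h; line++) {
        memset(row, color, (size_t)w);   /* one run of pixels, like "rep stosb" */
        row += BYTES_PER_LINE;           /* same column, next line down */
    }
}
```

A call such as fill_rect(10, 5, 4, 3, 7) touches exactly 12 bytes and performs one multiplication in total, instead of one per pixel.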

Third, for any video mode that a user won't mind looking at (which excludes ancient "320*200" nonsense) you can't use the legacy/deprecated "VGA area" without bank switching, and bank switching makes everything slow (not just the bank switching itself, but the checking to determine if you do/don't need to switch banks ruins most other optimisations). For this reason any OS that isn't worthless trash will use a "linear frame buffer" (and therefore must use protected mode or long mode).
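
To make the cost of the bank check concrete, here is a small simulation. Everything in it is an assumption for illustration: a 640x480 8-bpp mode, a single 64 KiB window with 64 KiB granularity, and a made-up set_bank function that only counts calls (on real hardware the switch would typically be a VBE window-control call, and the granularity may differ). The point is the compare-and-branch that has to sit in front of every single write.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PITCH       = 640,      /* assumed: 640x480 at 8 bpp */
    HEIGHT      = 480,
    WINDOW_SIZE = 65536     /* assumed: 64 KiB window, 64 KiB granularity */
};

static uint8_t video_memory[PITCH * HEIGHT]; /* everything the card holds */
static int current_bank = -1;                /* bank currently mapped into the window */
static long bank_switches = 0;

/* Hypothetical stand-in for the real (slow) bank-switch request. */
void set_bank(int bank)
{
    current_bank = bank;
    bank_switches++;
}

/* Write one pixel through the 64 KiB window. */
void put_pixel_banked(int x, int y, uint8_t color)
{
    assert(x >= 0 && x < PITCH && y >= 0 && y < HEIGHT);
    long offset = (long)y * PITCH + x;
    int bank = (int)(offset / WINDOW_SIZE);

    if (bank != current_bank)       /* this test runs for every pixel */
        set_bank(bank);

    /* Window base (bank * WINDOW_SIZE) plus the offset inside the window. */
    video_memory[(long)current_bank * WINDOW_SIZE + offset % WINDOW_SIZE] = color;
}
```

With these numbers a bank boundary falls in the middle of line 102 (offset 65536 is x = 256 on that line), so even a plain horizontal run cannot be handed to a single "rep stos" without first checking whether it crosses a boundary. With a linear frame buffer the check disappears entirely.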


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 6:32 am 
Joined: Tue Mar 04, 2014 5:27 am
Posts: 1108
Brendan wrote:
* Then it does a write to the frame buffer. This is the only part that actually matters, and is probably faster than every single step of pure pointless bloat that occurred before and after.


You may find that in planar 16-color modes (e.g. the VGA 640x480x16 mode) the entire screen can be updated no faster than ~30 times per second, even with the most heavily optimized code, on a 1GHz+ CPU. That works out to roughly 100 CPU clocks per pixel (10^9 / (30 * 640 * 480) is about 108). Ouch. I'm not sure which part of the video hardware is to blame (non-planar VGA and SVGA modes don't have this problem). I never looked into it. Slow port I/O for switching planes and setting pixel masks? Some weird compatibility feature?


 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 7:11 am 
Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

alexfru wrote:
Brendan wrote:
* Then it does a write to the frame buffer. This is the only part that actually matters, and is probably faster than every single step of pure pointless bloat that occurred before and after.


You may find that in planar 16-color modes (e.g. the VGA 640x480x16 mode) the entire screen can be updated no faster than ~30 times per second, even with the most heavily optimized code, on a 1GHz+ CPU. That works out to roughly 100 CPU clocks per pixel (10^9 / (30 * 640 * 480) is about 108). Ouch. I'm not sure which part of the video hardware is to blame (non-planar VGA and SVGA modes don't have this problem). I never looked into it. Slow port I/O for switching planes and setting pixel masks? Some weird compatibility feature?


For this case I pre-arrange it all as "buffer per plane" in RAM; then do "switch to plane 0; blit everything for plane 0; switch to plane 1; blit everything for plane 1; ..." (and I set the pixel mask and write mode once when setting the mode). I've never had any kind of performance problem.
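
A minimal simulation of that "buffer per plane" arrangement might look like this. It assumes the 640x480x16 mode (four planes of 80 * 480 bytes each); the arrays shadow and vga and the functions select_plane and blit_frame are made-up names, and select_plane only records the choice, where real code would program the VGA's plane-select register. It is a sketch of the idea, not the code described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { PLANES = 4, PLANE_BYTES = 80 * 480 };  /* assumed: 640x480x16, 1 bit per pixel per plane */

static uint8_t shadow[PLANES][PLANE_BYTES];   /* "buffer per plane" kept in ordinary RAM */
static uint8_t vga[PLANES][PLANE_BYTES];      /* stand-in for the card's four planes */
static int write_plane = 0;                   /* plane that writes currently reach */
static int plane_switches = 0;

/* Hypothetical stand-in for selecting the plane that receives writes. */
void select_plane(int plane)
{
    assert(plane >= 0 && plane < PLANES);
    write_plane = plane;
    plane_switches++;
}

/* Copy the whole frame to the card, switching planes only four times. */
void blit_frame(void)
{
    for (int plane = 0; plane < PLANES; plane++) {
        select_plane(plane);
        memcpy(vga[write_plane], shadow[plane], PLANE_BYTES);
    }
}
```

Drawing code then only ever touches shadow, and each frame costs four plane selections plus four large copies, however many pixels changed.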


Cheers,

Brendan

 Post subject: Re: More faster putpixel than 0ch?
PostPosted: Sat Jan 28, 2017 7:24 am 
Joined: Tue Mar 04, 2014 5:27 am
Posts: 1108
Brendan wrote:
alexfru wrote:
Brendan wrote:
* Then it does a write to the frame buffer. This is the only part that actually matters, and is probably faster than every single step of pure pointless bloat that occurred before and after.


You may find that in planar 16-color modes (e.g. the VGA 640x480x16 mode) the entire screen can be updated no faster than ~30 times per second, even with the most heavily optimized code, on a 1GHz+ CPU. That works out to roughly 100 CPU clocks per pixel (10^9 / (30 * 640 * 480) is about 108). Ouch. I'm not sure which part of the video hardware is to blame (non-planar VGA and SVGA modes don't have this problem). I never looked into it. Slow port I/O for switching planes and setting pixel masks? Some weird compatibility feature?


For this case I pre-arrange it all as "buffer per plane" in RAM; then do "switch to plane 0; blit everything for plane 0; switch to plane 1; blit everything for plane 1; ..." (and I set the pixel mask and write mode once when setting the mode). I've never had any kind of performance problem.


I did the same, except I probably switched the planes for every scanline, not just four times per frame. It would be interesting to test this on different machines.

