VGA hardware double buffering

NickJohnson · **Posted:** Mon Jun 07, 2010 3:44 pm

I heard somewhere that there is a way to use the VGA hardware to accelerate double buffering. Does such a thing exist, and is it in fact faster than double buffering in normal memory? If so, how do you do it (I suspect it has something to do with the memory map select)? I have a VGA driver that supports both 320x200x256 linear and 320x240x256 "mode x"/planar modes.

Combuster · **Posted:** Mon Jun 07, 2010 3:52 pm

The only "acceleration" you'll find in the VGA is latch-accelerated blits. These were very useful when buses were 8 bits wide, because a single write cycle could store 4 bytes instead of taking four cycles. You could use that to double buffer VRAM to VRAM, but in that case the more obvious solution would be page flipping.

The idea here is that you should only use the VGA read and write pipelines when you have little overdraw.

I hope the rumours you heard make a bit more sense now.

NickJohnson · **Posted:** Mon Jun 07, 2010 4:10 pm

So you're saying that the data latches can be used to copy from one place in video memory to another without using the system data bus? Is that actually faster now that the data bus is at least 32 bits?

Brendan · **Posted:** Mon Jun 07, 2010 8:33 pm

Hi,

NickJohnson wrote:

So you're saying that the data latches can be used to copy from one place in video memory to another without using the system data bus? Is that actually faster now that the data bus is at least 32 bits?

The data latches can be used to copy from one place in video memory to another, but it's "triggered" by writes (or reads? I can't remember) on the system data bus. It's meant to be slightly faster because accessing a small amount of data causes a larger amount of data to be shifted. Mostly it's a messy waste of time though.
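To make the latch mechanism concrete, here is a small software model of it in C. It is only a sketch, not driver code: `struct vga_model`, `vga_read`, `vga_write_mode1` and `latch_copy` are made-up names, and the model assumes the usual VGA behaviour that a CPU read loads all four plane latches, and that a CPU write in write mode 1 stores the latches to the planes enabled in the Map Mask while ignoring the CPU data. So both halves matter: the read fills the latches, and the write empties them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLANES     4
#define PLANE_SIZE 65536u   /* 64 KiB per plane on a standard VGA */

/* Hypothetical software model of the parts of the VGA that matter here. */
struct vga_model {
    uint8_t plane[PLANES][PLANE_SIZE]; /* the four planes of display memory */
    uint8_t latch[PLANES];             /* one 8-bit latch per plane */
    uint8_t map_mask;                  /* planes enabled for writing (bits 0-3) */
};

/* A CPU read loads every latch from the addressed byte of its plane.
 * The CPU itself only sees the byte of the plane selected for reading. */
static uint8_t vga_read(struct vga_model *v, uint16_t addr, unsigned read_plane)
{
    assert(read_plane < PLANES);
    for (int p = 0; p < PLANES; p++)
        v->latch[p] = v->plane[p][addr];
    return v->latch[read_plane];
}

/* In write mode 1 the CPU data is ignored: each enabled plane receives
 * the content of its own latch. */
static void vga_write_mode1(struct vga_model *v, uint16_t addr, uint8_t cpu_data)
{
    (void)cpu_data;
    for (int p = 0; p < PLANES; p++)
        if (v->map_mask & (1u << p))
            v->plane[p][addr] = v->latch[p];
}

/* Copy `count` addresses inside display memory. Each iteration is one
 * 8-bit read plus one 8-bit write on the bus, but moves 4 bytes
 * (4 pixels in mode X) inside the card. */
static void latch_copy(struct vga_model *v, uint16_t dst, uint16_t src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        (void)vga_read(v, (uint16_t)(src + i), 0);
        vga_write_mode1(v, (uint16_t)(dst + i), 0);
    }
}
```

On real hardware the read and write would be ordinary accesses to the 0xA0000 window after selecting write mode 1 in the Graphics Controller; the model only shows why one byte-sized bus access moves four bytes.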

To improve video update times, the best method (and IMHO the only sane method) is to minimise all writes to display memory while ensuring that all writes are 32-bit or 64-bit and are aligned. For VGA, doing the drawing in display memory (with or without page flipping) is insane - you can't ensure writes are aligned, and pixels may be overwritten many times. You should/must do the drawing in a buffer in RAM, and then blit this buffer to display memory. In this case you could use page flipping (blit the buffer from RAM into off-screen display memory, then tell the video card to switch); but this doesn't reduce the number of writes to display memory. In theory page flipping would reduce the chance of "tearing"; but in practice (for VGA) it's easy to blit faster than the pixels are displayed and page flipping doesn't make sense unless you're doing it in a vertical retrace IRQ (and the vertical retrace IRQ isn't supported/standard for "plain VGA").
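For reference, the "tell the video card to switch" step of page flipping comes down to reprogramming the CRTC start address. The following is a minimal sketch, assuming the colour-mode CRTC ports (0x3D4/0x3D5) and the standard Start Address High/Low registers (indices 0x0C and 0x0D). The `outb_fn` callback, `vga_set_start_address` and the logging stub `log_outb` are names invented for this example; the stub only exists so the code can run outside a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_CRTC_INDEX  0x3D4   /* CRTC index port (colour mode) */
#define VGA_CRTC_DATA   0x3D5   /* CRTC data port (colour mode) */
#define CRTC_START_HIGH 0x0C    /* Start Address High register */
#define CRTC_START_LOW  0x0D    /* Start Address Low register */

/* Port-write callback; in a kernel this would wrap the OUT instruction. */
typedef void (*outb_fn)(uint16_t port, uint8_t value);

/* Select which display memory offset the CRTC starts scanning from. */
static void vga_set_start_address(outb_fn outb, uint16_t offset)
{
    assert(outb != NULL);
    outb(VGA_CRTC_INDEX, CRTC_START_HIGH);
    outb(VGA_CRTC_DATA, (uint8_t)(offset >> 8));
    outb(VGA_CRTC_INDEX, CRTC_START_LOW);
    outb(VGA_CRTC_DATA, (uint8_t)(offset & 0xFF));
}

/* Stand-in for real port I/O: records the writes instead of doing them. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write port_log[16];
static size_t port_log_len;

static void log_outb(uint16_t port, uint8_t value)
{
    if (port_log_len < 16)
        port_log[port_log_len++] = (struct port_write){ port, value };
}
```

In 320x240 mode X one page takes 320 * 240 / 4 = 19200 addresses, so a second page would start at offset 19200 and `vga_set_start_address(log_outb, 19200)` would select it. The new address normally only takes effect at the next vertical retrace, which is why the retrace timing discussed above matters.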

Basically, page flipping makes a lot of sense if you're using the video card to do drawing (e.g. using the GPU or 2D or 3D acceleration to draw the graphics in display memory). This isn't the case for VGA.

What I do is keep 2 buffers in RAM. The first is the "work buffer" where everything is drawn. The second is a "change buffer", which contains the same data as display memory. When you blit from the "work buffer" to display memory you compare each dword with the dword in the "change buffer", and if the dword is different you write the dword to the change buffer and also write it to display memory. If most pixels remain the same, then most pixels are never sent to display memory. In the past I've also used an additional "modified/unmodified" flag for each horizontal line, so the entire line is skipped if the drawing code didn't touch it (and got very good results with this technique, especially for menus and things where only a small part of the screen changes between frames).
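A rough sketch of that blit in C, to show the idea rather than the actual code: `blit_changed` and all of its parameter names are invented for this example, and it assumes every line is a whole number of dwords and that the three buffers have the same layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copy only the dwords that differ from what display memory already holds.
 *   work       - buffer in RAM that the drawing code renders into
 *   change     - RAM copy of the current content of display memory
 *   display    - display memory (or an off-screen page)
 *   line_dirty - one flag per line, set by the drawing code when it
 *                touches that line; cleared here once the line is handled
 * Returns the number of dwords actually written to display memory. */
static size_t blit_changed(const uint32_t *work, uint32_t *change,
                           volatile uint32_t *display, bool *line_dirty,
                           size_t dwords_per_line, size_t lines)
{
    size_t written = 0;

    assert(work && change && display && line_dirty);

    for (size_t y = 0; y < lines; y++) {
        if (!line_dirty[y])
            continue;               /* untouched line: skip the compare too */
        line_dirty[y] = false;

        size_t base = y * dwords_per_line;
        for (size_t x = 0; x < dwords_per_line; x++) {
            uint32_t v = work[base + x];
            if (v != change[base + x]) {
                change[base + x] = v;   /* keep the shadow copy in sync */
                display[base + x] = v;  /* the only slow write */
                written++;
            }
        }
    }
    return written;
}
```

The compare only reads normal RAM, which is cheap compared with a write to display memory, so a frame where little changed costs very few display memory writes.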

Cheers,

Brendan

Selenic · **Joined:** Sat Jan 23, 2010 2:56 pm **Posts:** 123

Brendan wrote:

In the past I've also used an additional "modified/unmodified" flag for each horizontal line, so the entire line is skipped if the drawing code didn't touch it (and got very good results with this technique, especially for menus and things where only a small part of the screen changes between frames).

Couldn't you optimise this even more? You could include an 'update this rectangle' function, which would allow the comparison to scan only the relevant areas of video memory, potentially improving performance hugely (especially when using VBE-based high resolutions, where you have lots of memory to scan but no acceleration).

Brendan · **Posted:** Wed Jun 09, 2010 3:08 am

Hi,

Selenic wrote:

Brendan wrote:

In the past I've also used an additional "modified/unmodified" flag for each horizontal line, so the entire line is skipped if the drawing code didn't touch it (and got very good results with this technique, especially for menus and things where only a small part of the screen changes between frames).

Couldn't you optimise this even more? You could include an 'update this rectangle' function, which would allow the comparison to scan only the relevant areas of video memory, potentially improving performance hugely (especially when using VBE-based high resolutions, where you have lots of memory to scan but no acceleration).

Using "dirty rectangles" can work (and can be faster) but it's complex (and can be slower).

For example, imagine you've got 12 dirty rectangles that all overlap in some way or another. If you use naive code to update all pixels in each dirty rectangle, then pixels in areas that overlap would be updated multiple times for no reason, which is bad.

A better (but more complex) method would be: for each horizontal line of pixels, find the intersections with all dirty rectangles and only update pieces of the horizontal line once. For example, if the horizontal line intersects with the left edge of the first dirty rectangle, the left edge of the second dirty rectangle, the right edge of the first dirty rectangle and then the right edge of the second dirty rectangle; then you know the dirty rectangles overlap and you can update pixels from the left edge of the first dirty rectangle to the right edge of the second dirty rectangle. In this case you'd minimise drawing, but checking for intersections and overlaps could be more work than it saves (if you're using a "change buffer" too).
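The per-line merging described above could look something like this in C. It is a sketch under a few assumptions: rectangles use half-open coordinates, `struct rect`, `struct span` and `spans_for_line` are names made up for the example, and the caller provides an output array with room for one span per rectangle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rect { int x0, y0, x1, y1; };  /* covers x0 <= x < x1, y0 <= y < y1 */
struct span { int x0, x1; };          /* covers x0 <= x < x1 on one line */

/* qsort comparator: order spans by their left edge. */
static int cmp_span(const void *a, const void *b)
{
    const struct span *sa = a, *sb = b;
    return (sa->x0 > sb->x0) - (sa->x0 < sb->x0);
}

/* Collect the dirty rectangles that cross line y and merge overlapping
 * (or touching) pieces, so each pixel on the line is updated only once.
 * `out` must have room for n spans. Returns the number of merged spans. */
static size_t spans_for_line(const struct rect *rects, size_t n, int y,
                             struct span *out)
{
    size_t count = 0;

    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct rect *r = &rects[i];
        if (y >= r->y0 && y < r->y1 && r->x0 < r->x1)
            out[count++] = (struct span){ r->x0, r->x1 };
    }
    if (count == 0)
        return 0;

    qsort(out, count, sizeof out[0], cmp_span);

    size_t last = 0;
    for (size_t i = 1; i < count; i++) {
        if (out[i].x0 <= out[last].x1) {
            /* Overlaps or touches the previous span: extend it. */
            if (out[i].x1 > out[last].x1)
                out[last].x1 = out[i].x1;
        } else {
            out[++last] = out[i];
        }
    }
    return last + 1;
}
```

Note that this does a sort for every line that is blitted, which is exactly the kind of extra work that may end up costing more than the writes it avoids.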

The other thing to consider is whether or not the extra complexity is justified. With "reasonable" blitting code, the majority of the overhead ends up being in the drawing code. For example, in my current code, drawing the screen costs about 4 times as much as dithering, and blitting to screen (using a "change buffer" but without any "modified/unmodified" flag for each horizontal line) is much faster than the dithering. In this case improving the blitting would make an insignificant difference to performance (but would also make a massive difference to code maintainability, and make the drawing slower).

Cheers,

Brendan

Creature · **Posted:** Wed Jun 09, 2010 4:51 am

Brendan wrote:

For example, imagine you've got 12 dirty rectangles that all overlap in some way or another. If you use naive code to update all pixels in each dirty rectangle, then pixels in areas that overlap would be updated multiple times for no reason, which is bad.

A better (but more complex) method would be: for each horizontal line of pixels, find the intersections with all dirty rectangles and only update pieces of the horizontal line once. For example, if the horizontal line intersects with the left edge of the first dirty rectangle, the left edge of the second dirty rectangle, the right edge of the first dirty rectangle and then the right edge of the second dirty rectangle; then you know the dirty rectangles overlap and you can update pixels from the left edge of the first dirty rectangle to the right edge of the second dirty rectangle. In this case you'd minimise drawing, but checking for intersections and overlaps could be more work than it saves (if you're using a "change buffer" too).

I'm using a similar approach myself. The only problem is that you need to save the dirty rectangles somewhere (which either requires a static buffer, which may get full, in which case you might need to immediately process, or dynamic memory allocation, with lots of overhead). But in cases such as a dirty rectangle with points (20, 0), (780, 0) and (780, 60), where the screen width is 800 pixels, I keep wondering if it wouldn't be more efficient to simply clamp it from 0 to 800 (so in packed pixel modes, where all pixels and lines follow each other in memory, you can process it as one MOVS* instead of 60 MOVSL's, one for every line).
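As a sketch of that widening idea in C: the function below copies a dirty rectangle from a back buffer to the screen, and stretches it to the full line width when it is already "almost" that wide, so the whole thing becomes one contiguous copy. It assumes a packed 8-bit mode with no padding between lines; `copy_rect` and the `widen_threshold` parameter (how many extra bytes per line you are willing to copy) are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the rectangle x0 <= x < x1, y0 <= y < y1 from src to dst.
 * `pitch` is the line length in bytes (equal to the width here).
 * Returns the number of separate copies that were issued. */
static size_t copy_rect(uint8_t *dst, const uint8_t *src, size_t pitch,
                        size_t x0, size_t y0, size_t x1, size_t y1,
                        size_t widen_threshold)
{
    assert(x0 < x1 && x1 <= pitch && y0 < y1);

    /* If fewer than widen_threshold bytes per line fall outside the
     * rectangle, treat it as a full-width rectangle. */
    if ((x1 - x0) + widen_threshold >= pitch) {
        x0 = 0;
        x1 = pitch;
    }

    if (x0 == 0 && x1 == pitch) {
        /* Lines follow each other in memory: one big copy. */
        memcpy(dst + y0 * pitch, src + y0 * pitch, (y1 - y0) * pitch);
        return 1;
    }

    /* Otherwise copy line by line. */
    for (size_t y = y0; y < y1; y++)
        memcpy(dst + y * pitch + x0, src + y * pitch + x0, x1 - x0);
    return y1 - y0;
}
```

For the rectangle from (20, 0) to (780, 60) on an 800 pixel wide screen, a threshold of 64 turns 60 copies of 760 bytes into one copy of 48000 bytes, at the cost of 40 extra bytes per line. Whether that is a win depends on how expensive the extra writes to display memory are compared with the per-line setup.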
