No, i'm trying to find way to draw anything on screen pixel per pixel without X
That's... exactly the opposite of OpenGL. Seriously, it is. That's so far from what you are asking about, and so far from what you probably actually want to be doing, that it indicates a serious confusion of ideas.
Mind you, this is a confusing subject. I think we all need to take a step back and reconsider this entire discussion.
Let's start over from scratch with: what are you trying to display? What kinds of images, widgets, and 'content' are you working with? What is the purpose in displaying it?
My impression is that your goal is something more along the lines of a window manager (comparable to the Windows USER subsystem, or the X Window System when run for local use - its original purpose was remote graphics, which is why it has always been a bit odd compared to other display managers meant for local use) but with a different display management model. Is this an accurate statement, and if not, how would you describe what you want?
I think we also need to review the terminology, and how most systems decompose the different aspects of this (i.e., the "graphics stack").
At the lowest level we have the device drivers, which communicate with the actual hardware. These need to be able to work with either the specific display devices - the video memory, the GPU if any, the video signal generators, and even the monitor - or some common subset of features shared across disparate adapters. However, this does not mean that the driver must do all the work alone. The VESA VBE Core standard defines a minimal standard interface to the hardware as a BIOS extension, which a compliant video adapter should provide as a way of interfacing with the hardware without needing any proprietary details of the adapter.
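Since the original question was about drawing pixel by pixel, here is a minimal sketch of what that looks like once a driver (or a VBE mode set) has handed you a linear framebuffer. The struct and function names are hypothetical, and the sketch assumes a 32-bits-per-pixel mode with the common 0x00RRGGBB layout; a real mode may use a different depth or channel order, which you would read from the mode information.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a linear framebuffer. In a real system these
 * values would come from the mode information reported by the firmware or
 * driver; here they are just plain fields. */
struct framebuffer {
    uint8_t *base;    /* start of the mapped video memory */
    uint32_t width;   /* visible pixels per row */
    uint32_t height;  /* visible rows */
    uint32_t pitch;   /* bytes per scanline; may exceed width * 4 */
};

/* Pack 8-bit channels into a 32-bit pixel (assumed 0x00RRGGBB layout). */
static inline uint32_t fb_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Write one pixel, silently ignoring coordinates outside the screen. */
static inline void fb_put_pixel(struct framebuffer *fb,
                                uint32_t x, uint32_t y, uint32_t color)
{
    if (x >= fb->width || y >= fb->height)
        return;
    /* Rows are addressed by pitch, not width, because of possible padding. */
    uint32_t *row = (uint32_t *)(fb->base + (size_t)y * fb->pitch);
    row[x] = color;
}
```

Everything higher in the stack eventually reduces to writes like this one, or to asking the hardware to perform them on your behalf.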
Somewhere here you would find things like the Mesa driver framework and the Direct Rendering Manager (DRM) used by X and Mesa on Linux. As far as I know, this level doesn't have a formal name in most systems. It is the first abstraction layer, which some software system (not necessarily the operating system itself) provides to give a uniform model for drawing pixels on the screen while still exposing the underlying hardware. The split between 2-D and 3-D often starts around here, as a 3-D renderer generally needs a lot more direct hardware access than a 2-D one.
The next level is the renderer. This is where you really see 3-D becoming a separate thing, as most systems prior to, say, 2007 would have used strictly 2-D rendering for everything that didn't specifically require 3-D, because practical real-time 3-D rendering needed hardware acceleration at the time. As Brendan has pointed out before, the Wheel of Reincarnation for graphics rendering has been swinging towards CPU-driven rendering since 2012 or so, though dedicated rendering hardware is still dominant at the moment. Note, however, that this wheel has been rolling since the very first days of computer graphics in the early 1960s, so it is a good guess that this won't be the last word on the subject.
Anyway, Mesa proper started out in the 1990s as a software 3-D renderer, but is currently used to abstract the rendering in such a way that software rendering is more of a fallback mode.
This is where you need to decide how you are going to handle the differences between rendering 2-D images such as basic windows and widgets, and the more impressive but also more processing-intensive 3-D rendering. Since you can treat 2-D as a special case of 3-D, it is tempting to use 3-D for everything, but that approach has some significant downsides, especially on older hardware; you may need to consider where you can use less general 2-D rendering to avoid a lot of hardware crunching.
You also need to look at how you separate different renderable elements such as glyphs (letters, digits, text symbols, etc.), widgets (window borders, menus, icons, the mouse pointer), 2-D images such as drawings and pictures, 3-D manipulatable objects, etc. This relates to, and raises issues for, the next layer of the stack, the compositor. However, before that I need to mention another part of this layer, the widget toolkit.
The widget toolkit is the set of primitive widgets - window frames, menus, drawing spaces, textboxes, text areas, radio buttons, checkboxes, etc. - that a window manager uses. This is not a separate layer from the renderer, but side-by-side with it, and the widgets have to work together with the compositor.
The compositor is the part that combines the individual elements being rendered into the instantaneous display state, that is, the screen as it is at a given moment. In a 2-D design, this is usually done by the renderer directly, but 3-D UIs almost always have a separate compositor.
OK, quick history lesson. Early 2-D windowing systems generally composited in situ, that is, directly into the display. However, while this was feasible with the stroke-vector displays of the 1960s, or on raster displays that used fixed cells drawn from tables of glyphs such as PLATO and the majority of text-oriented terminals, it was problematic for bitmapped video systems even from the outset. It meant that a large block of memory - often as much as 30% of system memory in the days of the Alto and 128K Macintosh - had to be set aside for the video, and the timing of drawing had to be synced with the vertical refresh in order to avoid flicker.
While double buffering was part of the answer, it ran into issues with time - copying that much data would take longer than the vertical refresh interval, so a workable double buffer needed to be done by hardware. You would have to dedicate two buffers' worth of memory in hardware (one to drive the video, and the other to draw to), and the display would need even more hardware to let it switch which of the video buffers was driving it in order to make it work. Pretty much every video system today supports this as a matter of course. However, this did nothing for when you have to copy a bitmapped image from general memory - something loaded from a file, say - into the drawing video buffer.
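The buffer-switching idea can be sketched in a few lines. This is only an illustration under assumptions: the `flip_chain` struct and the `set_scanout` hook are hypothetical stand-ins for whatever your driver uses to tell the display hardware which buffer to scan out, and waiting for the vertical retrace before flipping is left out entirely.

```c
#include <stdint.h>

/* Two buffers of equal size: one is being displayed (front), the other is
 * being drawn into (back). All names here are hypothetical. */
struct flip_chain {
    uint32_t *buffers[2];
    int front;                              /* index of the displayed buffer */
    void (*set_scanout)(uint32_t *buffer);  /* assumed driver hook, may be NULL */
};

/* The buffer it is safe to draw into right now. */
static inline uint32_t *flip_back_buffer(const struct flip_chain *fc)
{
    return fc->buffers[fc->front ^ 1];
}

/* Make the finished back buffer visible. No pixel data is copied; only the
 * roles of the two buffers are exchanged. */
static inline void flip_present(struct flip_chain *fc)
{
    fc->front ^= 1;
    if (fc->set_scanout)
        fc->set_scanout(fc->buffers[fc->front]);
}
```

The point of the design is that presenting a frame costs one pointer change rather than a copy of the whole screen.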
In order to cut the time further, they developed BitBLT (bit block transfer), a method in which a rectangular block of an image is copied into the video buffer in a single operation, typically with a mask selecting which pixels actually get written so that only the relevant part is drawn. Other techniques, such as hardware sprites (which were drawn directly to the screen, bypassing the video buffer entirely), were also developed, but were mostly used in dedicated gaming and video editing systems.
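As a rough illustration, a software version of a masked block transfer might look like the sketch below. The `surface` struct and `blit_masked` are hypothetical names, the mask is assumed to hold one byte per source pixel (nonzero meaning "copy this pixel"), and the loop is written for clarity rather than speed; real implementations work on whole rows or words at a time, or hand the job to hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* A rectangular block of 32-bit pixels (hypothetical layout). */
struct surface {
    uint32_t *pixels;
    int width, height;
    int stride;  /* pixels per row, at least width */
};

/* Copy src onto dst with its top-left corner at (dx, dy). Pixels that fall
 * outside dst are clipped. If mask is non-NULL it must hold
 * src->width * src->height bytes, and only pixels with a nonzero mask byte
 * are written. */
static inline void blit_masked(struct surface *dst, int dx, int dy,
                               const struct surface *src, const uint8_t *mask)
{
    for (int y = 0; y < src->height; y++) {
        int ty = dy + y;
        if (ty < 0 || ty >= dst->height)
            continue;
        for (int x = 0; x < src->width; x++) {
            int tx = dx + x;
            if (tx < 0 || tx >= dst->width)
                continue;
            if (mask && !mask[y * src->width + x])
                continue;
            dst->pixels[(size_t)ty * dst->stride + tx] =
                src->pixels[(size_t)y * src->stride + x];
        }
    }
}
```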
I mention all this to get to compositing. Up until 2006 or so, the act of compositing for a window manager was done mainly as a 2-D action, and generally was focused on a) determining what parts of the display had changed, b) determining which parts of the screen were observable, and c) blitting the observable sections of a window that were getting changed to the draw buffer. This was generally easier for a tiling window manager, as there was no z-ordering - no windows overlapped, so everything could be drawn, and you could divide the windows into those which had changed and those which hadn't. Layering window managers were a little more complicated because some windows might obscure parts of others, but generally it wasn't too difficult. Even so, 2-D hardware acceleration was still very useful for this, even if it wasn't absolutely necessary.
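To make those steps concrete, here is a small sketch of repainting a changed ("dirty") rectangle for a set of overlapping windows. It deliberately swaps in the simpler painter's approach, drawing the windows back to front and letting nearer ones overwrite farther ones, rather than computing the exact visible region of each window first. All names are hypothetical, each window is reduced to a solid colour, and the dirty rectangle is assumed to lie within the screen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open rectangle: covers x0 <= x < x1 and y0 <= y < y1. */
struct rect { int x0, y0, x1, y1; };

/* Store the overlap of a and b in *out; return false if they are disjoint. */
static inline bool rect_intersect(struct rect a, struct rect b, struct rect *out)
{
    struct rect r;
    r.x0 = a.x0 > b.x0 ? a.x0 : b.x0;
    r.y0 = a.y0 > b.y0 ? a.y0 : b.y0;
    r.x1 = a.x1 < b.x1 ? a.x1 : b.x1;
    r.y1 = a.y1 < b.y1 ? a.y1 : b.y1;
    if (r.x0 >= r.x1 || r.y0 >= r.y1)
        return false;
    *out = r;
    return true;
}

/* A window reduced to its bounds and a fill colour for illustration. */
struct window { struct rect bounds; uint32_t color; };

/* Repaint only the dirty rectangle. wins[] is ordered back to front, so a
 * later window overwrites an earlier one wherever they overlap. */
static inline void repaint_dirty(uint32_t *screen, int stride,
                                 const struct window *wins, int count,
                                 struct rect dirty)
{
    for (int i = 0; i < count; i++) {
        struct rect r;
        if (!rect_intersect(wins[i].bounds, dirty, &r))
            continue;  /* this window does not touch the changed area */
        for (int y = r.y0; y < r.y1; y++)
            for (int x = r.x0; x < r.x1; x++)
                screen[y * stride + x] = wins[i].color;
    }
}
```

For a tiling layout no two bounds overlap, so the back-to-front order stops mattering and each window is touched only where the dirty rectangle crosses it.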
With the introduction of 3-D layered UIs such as Aqua and Aero, the issue of combining things became much more complex, leading to the need for a separate compositor layer. Most major window managers today have a 3-D compositor, and for a time it was almost impossible to get good performance from one without a dedicated GPU, meaning software rendering was out of the question even for the basic GUI, leading to issues that previously were mostly seen in gaming.
Getting back on track, we now get to the window manager itself, which is the part that actually decides where to put each rendered component, sets things related to the way widgets interact, and just generally, well, manages the windows. This is what X Window System was from the outset, and it acts as the glue between the lower level aspects of the GUI and the more abstract parts such as the desktop manager.
The next layer is the desktop manager, and this is what most people are actually thinking of when they talk of a GUI, and of the differences between Windows, Mac, and the various Linux desktops such as KDE, GNOME, Unity, Xfce, Cinnamon, MATE, and so forth.
Not all systems follow quite this pattern, and not all layers are found in all of them (or in this order), but that will at least give us a common language for discussing this.