Commits · 7b07f25b95e3d605695ab8fcf7efdce8092baaf9 · adam.huang / Xf86 Video Fbturbo

29 Jul, 2013 1 commit

Explicitly include "gcstruct.h" for GCOps · 7b07f25b

Siarhei Siamashka authored Jul 29, 2013

Should fix https://github.com/ssvb/xf86-video-sunxifb/issues/14
and prevent FTBFS on some systems.
Reported-by: Fred Chien <cfsghost@gmail.com>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


12 Jun, 2013 1 commit

Add CPU optimization for PutImage · 06f5aec6

Harm Hanemaaijer authored Jun 07, 2013

Benchmark tests reveal that xorg's fb layer PutImage implementation
does not follow an optimal code path for requests without special
raster operations, because it uses a slower general blit function
instead of the pixman library. This affects Xlib PutImage
requests and some ShmPutImage requests. In the case of ShmPutImage,
xorg directs ShmPutImage requests to PutImage only if the width of
the part of the image to be copied is equal to the full width of
the image, resulting in relatively poor performance. If the width
of the part of the image that is copied is smaller than the full
image, then xorg uses CopyArea which results in the use of the
already optimal pixman blit functions. The sub-optimal path is
commonly triggered by applications such as window managers and web
browsers.
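
The routing described above can be sketched as a small decision
function. This is a simplified model for illustration only, not xorg's
actual code; the enum and the function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical labels for the two server-side paths described above. */
enum shm_put_path {
    PATH_PUT_IMAGE, /* slower general blit before this change */
    PATH_COPY_AREA  /* already uses the pixman blit functions */
};

/*
 * Simplified model of the decision: a ShmPutImage request is directed
 * to PutImage only when the copied part is as wide as the whole image.
 * Anything narrower goes through CopyArea.
 */
static enum shm_put_path
choose_shm_put_path(int part_width, int image_width)
{
    assert(part_width > 0 && part_width <= image_width);
    return (part_width == image_width) ? PATH_PUT_IMAGE : PATH_COPY_AREA;
}
```

In this model a full-width 549 x 549 upload takes the PutImage path,
while a 100-pixel-wide slice of the same image takes the CopyArea path.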

To fix this unnecessary performance flaw, PutImage is replaced with
a version that uses pixman for the common case of GXcopy with all
plane mask bits set. This change is device-independent and only uses
pixman CPU blit functions that are already present in the xorg server.
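
A minimal sketch of the fast-path test, assuming the check boils down
to "raster operation is GXcopy and every plane bit for the drawable
depth is set". The helper names are hypothetical, and copy_rect() is
only a memcpy-based stand-in for a pixman blit, not the pixman API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GXcopy has the value 0x3 in the X11 protocol headers (X.h). */
#define SKETCH_GXCOPY 0x3

/* Returns nonzero when every plane bit for the given depth is set. */
static int
planemask_all_set(uint32_t planemask, int depth)
{
    uint32_t full;

    assert(depth >= 1 && depth <= 32);
    full = (depth == 32) ? 0xFFFFFFFFu : ((1u << depth) - 1u);
    return (planemask & full) == full;
}

/* Hypothetical helper: decides whether the plain-copy fast path applies. */
static int
can_use_fast_put_image(int alu, uint32_t planemask, int depth)
{
    return alu == SKETCH_GXCOPY && planemask_all_set(planemask, depth);
}

/*
 * Stand-in for a pixman blit: copies a w x h rectangle row by row.
 * Strides and bytes_per_pixel are given in bytes.
 */
static void
copy_rect(uint8_t *dst, int dst_stride, const uint8_t *src, int src_stride,
          int bytes_per_pixel, int w, int h)
{
    int y;

    for (y = 0; y < h; y++)
        memcpy(dst + (size_t)y * dst_stride,
               src + (size_t)y * src_stride,
               (size_t)w * bytes_per_pixel);
}
```

Requests that fail the check (for example GXxor, or a partial plane
mask) would keep using the existing general code path.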

Using the low-level benchmark program benchx
(https://github.com/hglm/benchx.git), the following speed-ups were
measured (1920x1080x32bpp) on an Allwinner A10 device:

ShmPutImageFullWidth (5 x 5): Speed up 9%
ShmPutImageFullWidth (7 x 7): Slow down 5%
ShmPutImageFullWidth (22 x 22): Speed up 8%
ShmPutImageFullWidth (49 x 49): Speed up 19%
ShmPutImageFullWidth (73 x 73): Speed up 55%
ShmPutImageFullWidth (109 x 109): Speed up 50%
ShmPutImageFullWidth (163 x 163): Speed up 37%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 77%
ShmPutImageFullWidth (549 x 549): Speed up 92%
AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
AlignedShmPutImageFullWidth (549 x 549): Speed up 89%

At 16bpp the speed-up is even greater:

ShmPutImageFullWidth (5 x 5): Slow down 8%
ShmPutImageFullWidth (7 x 7): Slow down 8%
ShmPutImageFullWidth (10 x 10): Slow down 6%
ShmPutImageFullWidth (22 x 22): Speed up 9%
ShmPutImageFullWidth (33 x 33): Speed up 20%
ShmPutImageFullWidth (49 x 49): Speed up 27%
ShmPutImageFullWidth (73 x 73): Speed up 69%
ShmPutImageFullWidth (109 x 109): Speed up 74%
ShmPutImageFullWidth (163 x 163): Speed up 100%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 133%
ShmPutImageFullWidth (549 x 549): Speed up 123%
AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
AlignedShmPutImageFullWidth (549 x 549): Speed up 110%
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>


26 Mar, 2013 1 commit

G2D: Now sunxi_x_g2d.c code does not require sunxi disp anymore · 1cd5f084

Siarhei Siamashka authored Mar 26, 2013



The sunxi_x_g2d.c file contains the midlayer code for hooking the
G2D optimized blit into xserver. But in fact it does not strictly
need to depend on anything sunxi specific.

So now we introduce a simple "blt2d_i" interface struct which
specifically provides a pointer to the accelerated blit function,
and use this interface struct instead of the whole "sunxi_disp_t".
This makes it easy to reuse the same code for other non-G2D or even
non-sunxi blit implementations in the future.
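
To illustrate the idea, here is a sketch of what such an interface
struct could look like. The struct name, field names and function
signature below are assumptions made for this example and are not taken
from the driver; the CPU backend only shows that any blit implementation
can sit behind the same pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative blit interface; field names and signature are assumptions. */
typedef struct blt2d_sketch {
    void *self; /* backend-private state, opaque to the midlayer */
    /* Returns nonzero on success, zero if the backend can't do the blit. */
    int (*blt)(void *self, uint32_t *src, uint32_t *dst,
               int src_stride, int dst_stride, /* strides in pixels */
               int src_x, int src_y, int dst_x, int dst_y, int w, int h);
} blt2d_sketch;

/* A CPU backend filling the same interface (32bpp, non-overlapping only). */
static int
cpu_blt(void *self, uint32_t *src, uint32_t *dst,
        int src_stride, int dst_stride,
        int src_x, int src_y, int dst_x, int dst_y, int w, int h)
{
    int y;

    (void)self;
    for (y = 0; y < h; y++)
        memcpy(dst + (size_t)(dst_y + y) * dst_stride + dst_x,
               src + (size_t)(src_y + y) * src_stride + src_x,
               (size_t)w * sizeof(uint32_t));
    return 1;
}

/* The midlayer only sees the interface, not the hardware behind it. */
static int
midlayer_copy(const blt2d_sketch *blt2d, uint32_t *src, uint32_t *dst,
              int stride, int src_x, int src_y, int dst_x, int dst_y,
              int w, int h)
{
    assert(blt2d != NULL && blt2d->blt != NULL);
    return blt2d->blt(blt2d->self, src, dst, stride, stride,
                      src_x, src_y, dst_x, dst_y, w, h);
}
```

Swapping in a different backend then means filling in a different
function pointer, with no change to the midlayer code.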
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


22 Mar, 2013 1 commit

G2D: enable accelerated blits for 16bpp color depth · 60291865

Siarhei Siamashka authored Mar 22, 2013

This is still not perfect, because G2D can't saturate memory bandwidth
for this color depth (it is fillrate limited). We should emulate 16bpp blits
with 32bpp blits whenever it is possible.
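
One way to read "whenever it is possible": two adjacent 16bpp pixels
can be moved as a single 32bpp pixel if the horizontal coordinates and
the width are all even and the rows stay 4-byte aligned. The sketch
below checks those conditions. The struct and function names are
hypothetical, and it additionally assumes that both buffer base
addresses are 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical horizontal extent of a blit, in pixels. */
typedef struct {
    int src_x;
    int dst_x;
    int w;
} blit_span;

/*
 * Tries to re-express a 16bpp blit as a 32bpp blit.
 * Strides are in bytes. Returns 1 and fills *out (in 32bpp pixels) on
 * success, or 0 if the blit has to stay at 16bpp.
 * Vertical coordinates and height are unaffected, so they are omitted.
 */
static int
blit16_as_blit32(const blit_span *in, int src_stride, int dst_stride,
                 blit_span *out)
{
    assert(in != NULL && out != NULL);
    assert(in->src_x >= 0 && in->dst_x >= 0 && in->w >= 0);

    /* Every horizontal quantity must cover whole pairs of 16bpp pixels. */
    if ((in->src_x | in->dst_x | in->w) & 1)
        return 0;
    /* Each row must start on a 4-byte boundary. */
    if ((src_stride | dst_stride) & 3)
        return 0;

    out->src_x = in->src_x / 2;
    out->dst_x = in->dst_x / 2;
    out->w = in->w / 2;
    return 1;
}
```

When the check fails, the blit would simply be issued at 16bpp as
before.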
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


21 Mar, 2013 1 commit

G2D: accelerate CopyArea between different pixmaps in framebuffer · cc1b1410

Siarhei Siamashka authored Mar 21, 2013



Now source and destination pixmaps don't need to be the same for
using G2D acceleration (as long as both of them are allocated in
the framebuffer). This allows using G2D to copy pixels from DRI2
buffers to the framebuffer on the fallback path (when the window
of an OpenGL ES application is partially overlapped by some other
windows). However, this only works when the Composite extension is
disabled, for example by adding the following to xorg.conf:

    Section "Extensions"
        Option "Composite" "Disable"
    EndSection

If the Composite extension is enabled, windows have backing pixmaps,
and we have a longer chain of copies:

   DRI2 buffer -> backing pixmap -> framebuffer

Because the backing pixmap is not allocated in physically contiguous
memory, it can't be copied using G2D yet.
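
The "both pixmaps are allocated in the framebuffer" condition can be
pictured as an address-range test against the mapped framebuffer. This
is only a sketch under that assumption; the fb_region struct and the
helper names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the mapped framebuffer memory. */
typedef struct {
    uint8_t *base;
    size_t size;
} fb_region;

/* Returns nonzero if [buf, buf + len) lies entirely inside the region. */
static int
buffer_in_framebuffer(const fb_region *fb, const uint8_t *buf, size_t len)
{
    uintptr_t start, p;
    size_t off;

    assert(fb != NULL && fb->base != NULL);
    start = (uintptr_t)fb->base;
    p = (uintptr_t)buf;
    if (p < start)
        return 0;
    off = (size_t)(p - start);
    return off <= fb->size && len <= fb->size - off;
}

/* G2D is only an option when source and destination both qualify. */
static int
can_use_g2d_copy(const fb_region *fb,
                 const uint8_t *src, size_t src_len,
                 const uint8_t *dst, size_t dst_len)
{
    return buffer_in_framebuffer(fb, src, src_len) &&
           buffer_in_framebuffer(fb, dst, dst_len);
}
```

A backing pixmap in ordinary system memory fails this test, which
matches the longer copy chain described above falling back to the CPU.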
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


18 Mar, 2013 1 commit

G2D: Hardware acceleration for XCopyArea (initially 32bpp only) · ecfeb4aa

Siarhei Siamashka authored Mar 18, 2013



Wrap the CreateGC function to add a hook for the CopyArea operation,
which can be accelerated using G2D for buffers inside the visible
part of the framebuffer. In the future we may also try to ensure
that DRI2 buffers are copied using G2D instead of the CPU when we
hit the fallback path and can't avoid this copy.
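
The wrapping pattern can be sketched with stand-in types. Nothing below
uses the real xorg structures: sketch_gc, sketch_gc_ops and all function
names are hypothetical, and the "G2D" branch only returns a marker value
where a real driver would start the hardware blit.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the server's GC structures (not the xorg types). */
typedef struct sketch_gc sketch_gc;
typedef int (*copy_area_fn)(sketch_gc *gc, int w, int h);

typedef struct {
    copy_area_fn copy_area;
} sketch_gc_ops;

struct sketch_gc {
    const sketch_gc_ops *ops;       /* active operations table */
    const sketch_gc_ops *saved_ops; /* original table kept for fallback */
    int in_framebuffer;             /* nonzero if the buffers qualify */
};

enum { COPIED_BY_CPU = 1, COPIED_BY_G2D = 2 };

/* Stand-in for the software CopyArea. */
static int
cpu_copy_area(sketch_gc *gc, int w, int h)
{
    (void)gc; (void)w; (void)h;
    return COPIED_BY_CPU;
}

/* Hooked CopyArea: use the accelerated path if possible, else fall back. */
static int
hooked_copy_area(sketch_gc *gc, int w, int h)
{
    if (gc->in_framebuffer && w > 0 && h > 0)
        return COPIED_BY_G2D; /* a real driver would call the G2D blit here */
    return gc->saved_ops->copy_area(gc, w, h);
}

static const sketch_gc_ops default_ops = { cpu_copy_area };
static const sketch_gc_ops hooked_ops = { hooked_copy_area };

/* Stand-in for the original CreateGC. */
static int
orig_create_gc(sketch_gc *gc)
{
    gc->ops = &default_ops;
    return 1;
}

/* Wrapped CreateGC: run the original, then swap in the hooked table. */
static int
wrapped_create_gc(sketch_gc *gc)
{
    assert(gc != NULL);
    if (!orig_create_gc(gc))
        return 0;
    gc->saved_ops = gc->ops;
    gc->ops = &hooked_ops;
    return 1;
}
```

Keeping the original table around means every request the hook can't
accelerate still reaches the unmodified software implementation.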

Benchmark using "x11perf -scroll500 -copywinwin500":

=== ShadowFB (software rendering) ===

   3000 reps @   2.0308 msec (   492.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9741 msec (   507.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9826 msec (   504.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9830 msec (   504.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9965 msec (   501.0/sec): Scroll 500x500 pixels
  15000 trep @   1.9934 msec (   502.0/sec): Scroll 500x500 pixels

   1600 reps @   3.3054 msec (   303.0/sec): Copy 500x500 from window to window
   1600 reps @   3.3179 msec (   301.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2263 msec (   310.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2491 msec (   308.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2357 msec (   309.0/sec): Copy 500x500 from window to window
   8000 trep @   3.2669 msec (   306.0/sec): Copy 500x500 from window to window

=== G2D (hardware acceleration) ===

   3000 reps @   2.1949 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1929 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1923 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1889 msec (   457.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1941 msec (   456.0/sec): Scroll 500x500 pixels
  15000 trep @   2.1926 msec (   456.0/sec): Scroll 500x500 pixels

   2800 reps @   1.8114 msec (   552.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8103 msec (   552.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8160 msec (   551.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8099 msec (   553.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8126 msec (   552.0/sec): Copy 500x500 from window to window
  14000 trep @   1.8120 msec (   552.0/sec): Copy 500x500 from window to window

CPU usage remains low when running this test with G2D acceleration enabled.
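
For reading the numbers above, the relative change between two x11perf
rates can be computed with a small helper. The function name is
hypothetical; the inputs in the note that follows are the summary
("trep") rates listed above.

```c
#include <assert.h>

/* Relative change in throughput, in percent; positive means faster. */
static double
percent_change(double before_per_sec, double after_per_sec)
{
    assert(before_per_sec > 0.0);
    return (after_per_sec - before_per_sec) / before_per_sec * 100.0;
}
```

With the summary lines, percent_change(306.0, 552.0) is roughly +80%
for the window-to-window copy, while percent_change(502.0, 456.0) is
roughly -9% for the scroll test.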
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


14 Mar, 2013 1 commit

Introduce experimental G2D acceleration · ea2fc3e4

Siarhei Siamashka authored Mar 14, 2013



This initial G2D support code can speed up moving windows in XFCE. It is
currently disabled by default, but can be enabled by editing
/etc/X11/xorg.conf and adding the following line to the "Device" section:

        Option          "AccelMethod" "G2D"
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
