Commits · rpi-fb-copyarea-20130617 · adam.huang / Xf86 Video Fbturbo

17 Jun, 2013 1 commit
- HACK: test for DMA optimized fb_copyarea in the Raspberry Pi kernel · 4995422b
  Siarhei Siamashka authored Jun 17, 2013
  
12 Jun, 2013 4 commits

Add CPU optimization for PutImage · 06f5aec6

Harm Hanemaaijer authored Jun 07, 2013

Benchmark tests reveal that xorg's fb layer PutImage implementation
does not follow an optimal code path for requests without special
raster operations, which is due to the use of a slower general blit
function instead of the pixman library. This affects Xlib PutImage
requests and some ShmPutImage requests. In the case of ShmPutImage,
xorg directs ShmPutImage requests to PutImage only if the width of
the part of the image to be copied is equal to the full width of
the image, resulting in relatively poor performance. If the width
of the part of the image that is copied is smaller than the full
image, then xorg uses CopyArea which results in the use of the
already optimal pixman blit functions. The sub-optimal path is
commonly triggered by applications such as window managers and web
browsers.

To fix this unnecessary performance flaw, PutImage is replaced with
a version that uses pixman for the common case of GXcopy and all
plane mask bits set. This change is device-independent and only uses
pixman CPU blit functions that are already present in the xorg server.

Using the low-level benchmark program benchx
(https://github.com/hglm/benchx.git), the following speed-ups were
measured (1920x1080x32bpp) on an Allwinner A10 device:

ShmPutImageFullWidth (5 x 5): Speed up 9%
ShmPutImageFullWidth (7 x 7): Slow down 5%
ShmPutImageFullWidth (22 x 22): Speed up 8%
ShmPutImageFullWidth (49 x 49): Speed up 19%
ShmPutImageFullWidth (73 x 73): Speed up 55%
ShmPutImageFullWidth (109 x 109): Speed up 50%
ShmPutImageFullWidth (163 x 163): Speed up 37%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 77%
ShmPutImageFullWidth (549 x 549): Speed up 92%
AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
AlignedShmPutImageFullWidth (549 x 549): Speed up 89%

At 16bpp the speed-up is even greater:

ShmPutImageFullWidth (5 x 5): Slow down 8%
ShmPutImageFullWidth (7 x 7): Slow down 8%
ShmPutImageFullWidth (10 x 10): Slow down 6%
ShmPutImageFullWidth (22 x 22): Speed up 9%
ShmPutImageFullWidth (33 x 33): Speed up 20%
ShmPutImageFullWidth (49 x 49): Speed up 27%
ShmPutImageFullWidth (73 x 73): Speed up 69%
ShmPutImageFullWidth (109 x 109): Speed up 74%
ShmPutImageFullWidth (163 x 163): Speed up 100%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 133%
ShmPutImageFullWidth (549 x 549): Speed up 123%
AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
AlignedShmPutImageFullWidth (549 x 549): Speed up 110%
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
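The fast-path condition described in this commit can be sketched as a small C predicate. This is a minimal sketch, not the patch itself: the helper name `putimage_can_use_pixman` and the macro `SKETCH_GXCOPY` are hypothetical, the value 0x3 is the one `GXcopy` has in `<X11/X.h>`, and "all plane mask bits set" is assumed to mean every bit within the drawable depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as GXcopy in <X11/X.h>; redefined so the sketch is self-contained. */
#define SKETCH_GXCOPY 0x3

/* Mask with the lowest 'depth' bits set (depth 32 is special-cased to
 * avoid an undefined 32-bit shift). */
static uint32_t depth_mask(int depth)
{
    assert(depth > 0 && depth <= 32);
    return depth == 32 ? UINT32_MAX : ((UINT32_C(1) << depth) - 1);
}

/* Hypothetical helper: true when a PutImage request can take the plain
 * pixman copy path instead of the generic fb blit, i.e. the raster op is
 * GXcopy and no plane within the depth is masked out. */
static bool putimage_can_use_pixman(int alu, uint32_t planemask, int depth)
{
    uint32_t mask = depth_mask(depth);
    return alu == SKETCH_GXCOPY && (planemask & mask) == mask;
}
```

When the predicate fails (another raster op, or a partial plane mask), the request would keep going through the generic fb code path.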


CPU: use VFP overlapped blit on VFP-capable hardware by default · 3ad74420

Siarhei Siamashka authored Jun 12, 2013



This should be useful for Raspberry Pi. When reading uncached source buffers,
the VFP optimized overlapped two-pass blit is roughly 2-3 times slower than
memcpy in cached memory, which makes it reasonably competitive with
ShadowFB (considering that ShadowFB allocates an extra buffer and does extra
memory copies, which take time and thrash the L2 cache, etc.). It even provides
a slight performance advantage in a more or less realistic use case
(scrolling in xterm), which requires reads from the framebuffer:

==== Before (xf86-video-fbdev with ShadowFB) ====

$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt

real    1m50.245s
user    0m1.750s
sys     0m0.800s

==== After (xf86-video-sunxifb without ShadowFB) ====

$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt

real    1m27.709s
user    0m1.690s
sys     0m0.920s

We get decent results even when reading from the framebuffer. However,
in many typical workloads (excluding scrolling and dragging windows)
the framebuffer is used primarily as write-only memory. In write-only use
cases ShadowFB is pure overhead, so getting rid of it is a
very good idea, as this improves overall graphics performance.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


Fix segfault on exit (introduced by the new backing store code) · 3676a495

Siarhei Siamashka authored Jun 12, 2013



A small typo in a function argument and a C compiler happily accepting
a void pointer in place of some other pointer type make a dangerous
combination. Need to be more careful.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


Backing store heuristics for improving windows dragging performance · f5501ff1

Siarhei Siamashka authored Jun 12, 2013

This patch implements a heuristic that enables backing store for some
windows. When backing store is enabled for a window, the window gets a
backing pixmap (via automatic redirection provided by the Composite extension).
It acts somewhat like ShadowFB, but for individual windows.

The advantage of backing store is that we can avoid the animated "expose
event -> redraw" trail in the exposed area when dragging another window on
top of it. Dragging windows becomes much smoother and faster.

But backing store has the same disadvantages as ShadowFB: loss of precious
RAM, an extra buffer copy whenever somebody updates the window content, and
potentially skipped frames in fast animations (they just never reach the
screen). Also, hardware accelerated scrolling does not currently work for
windows with backing store enabled.

We try to make the best use of backing store by enabling it for all the
windows that are direct children of root, except the one which has
keyboard focus (either directly or via one of its children). In practice this
heuristic seems to provide nearly perfect results:
1) dragging windows is fast and smooth.
2) the top level window with the keyboard focus (typically the application
that a user is working with) is G2D accelerated and does not suffer from
any intermediate buffer copy overhead.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
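The window selection rule in this commit can be modelled in a few lines of C. The `win_t` tree and `update_backing_store` are hypothetical stand-ins, not the X server's `WindowRec` or the driver's code; the sketch only shows the rule "every direct child of root gets backing store, except the one that holds the keyboard focus itself or through a descendant".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal window tree; not the X server's WindowRec. */
typedef struct win {
    struct win *parent;   /* NULL for the root window */
    bool backing_store;   /* whether a backing pixmap is requested */
} win_t;

/* True if 'w' is 'ancestor' itself or one of its descendants. */
static bool is_self_or_descendant(const win_t *w, const win_t *ancestor)
{
    for (; w != NULL; w = w->parent)
        if (w == ancestor)
            return true;
    return false;
}

/* Enable backing store for every direct child of root except the one
 * that contains the focused window. 'focus' may be NULL (no focus). */
static void update_backing_store(win_t *const *toplevels, size_t n,
                                 const win_t *focus)
{
    for (size_t i = 0; i < n; i++) {
        win_t *w = toplevels[i];
        assert(w->parent != NULL && w->parent->parent == NULL);
        w->backing_store = !is_self_or_descendant(focus, w);
    }
}
```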


10 Jun, 2013 1 commit
- DRI2: Move DebugMsg macro to a common header · 1bbeff2f
  Siarhei Siamashka authored Jun 10, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
07 Jun, 2013 2 commits

Enable G2D acceleration by default on sun4i hardware · 3ea99e51

Siarhei Siamashka authored Jun 08, 2013

With the fallback to the CPU backend for unsupported blits and the
threshold for avoiding small blits, G2D should now always provide the
best overall performance.

The users of recent versions of xf86-video-sunxifb are supposed
to also have a reasonably recent version of the linux-sunxi kernel,
which includes the following fix:
  https://github.com/linux-sunxi/linux-sunxi/commit/3d49345343a1535b

The users of old kernels are going to see screen corruption when
dragging windows and scrolling. They should just upgrade :)
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


G2D: Fallback to NEON optimized CPU backend for unsupported blits · cc8e2c79

Siarhei Siamashka authored Jun 08, 2013



The G2D driver only supports framebuffer->framebuffer blits and
also can't be used to accelerate dragging windows to the right
(without hacking the kernel driver to do a two-pass blit there).
This patch adds a fallback to the NEON optimized CPU backend instead
of resorting to the poorly performing fbBlt in these cases.

Note: we assume that ioctls normally do not fail (even if they
      do, the slow old style fallback to fbBlt is not the worst
      thing to worry about).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
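The backend choice described in this commit can be sketched as a predicate. The types and names are hypothetical, and the overlap rule is our reading of the message: it conservatively sends every overlapping blit whose destination lies to the right of its source to the CPU, while the driver's real condition may be narrower.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BLT_G2D, BLT_CPU_NEON } blt_backend_t;

/* Hypothetical description of one CopyArea request. */
typedef struct {
    bool src_in_fb, dst_in_fb;  /* both pixmaps live in the framebuffer */
    bool same_buffer;           /* src and dst are the same pixmap */
    int src_x, src_y, dst_x, dst_y, w, h;
} blt_request_t;

/* Do [x, x+w) x [y, y+h) source and destination rectangles intersect? */
static bool rects_overlap(const blt_request_t *r)
{
    return r->src_x < r->dst_x + r->w && r->dst_x < r->src_x + r->w &&
           r->src_y < r->dst_y + r->h && r->dst_y < r->src_y + r->h;
}

static blt_backend_t choose_backend(const blt_request_t *r)
{
    assert(r->w > 0 && r->h > 0);
    if (!r->src_in_fb || !r->dst_in_fb)
        return BLT_CPU_NEON;    /* G2D: framebuffer->framebuffer only */
    if (r->same_buffer && rects_overlap(r) && r->dst_x > r->src_x)
        return BLT_CPU_NEON;    /* would need a right-to-left pass */
    return BLT_G2D;
}
```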


05 Jun, 2013 1 commit

CPU: Added ARM VFP two-pass overlapped blit implementation · b93dab5c

Siarhei Siamashka authored Jun 05, 2013



Using VFP, we can load up to 128 bytes with a single VLDM instruction.
But before this patch, only a NEON implementation was available, simply
because it showed better results than VFP on Allwinner A10, and this
DDX driver used to target primarily just sunxi hardware.

But it looks like it makes sense to also target other devices (at least
ODROID-X, which has the same Mali400 GPU and can use the same DRI2
integration for EGL and GLESv2 support). And on other ARM devices,
VFP aligned reads generally work better than NEON. The benchmark
results are listed below:

            1280x720, 32bpp, testing "x11perf -scroll500"

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement disabled ==

NEON : 10000 trep @   3.7101 msec (   270.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   2.6678 msec (   375.0/sec): Scroll 500x500 pixels

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement enabled ==

NEON : 15000 trep @   2.2568 msec (   443.0/sec): Scroll 500x500 pixels
VFP  : 15000 trep @   2.3016 msec (   434.0/sec): Scroll 500x500 pixels

== Exynos 4412, Cortex-A9 ==

NEON : 10000 trep @   4.5125 msec (   222.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   2.7015 msec (   370.0/sec): Scroll 500x500 pixels

== TI DM3730, Cortex-A8 ==

NEON : 15000 trep @   2.2303 msec (   448.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   3.0670 msec (   326.0/sec): Scroll 500x500 pixels

== Allwinner A10, Cortex-A8 ==

NEON : 10000 trep @   2.5559 msec (   391.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   3.0580 msec (   327.0/sec): Scroll 500x500 pixels

== Raspberry Pi, BCM2708, ARM1176 ==

VFP  :  3000 trep @   8.7699 msec (   114.0/sec): Scroll 500x500 pixels

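The benchmark numbers in this particular test setup roughly represent
memory copy bandwidth measured in MB/s (when doing overlapped blits
inside of a writecombine mapped framebuffer): each 500x500 scroll at
32bpp moves 500 * 500 * 4 = 1,000,000 bytes, so N scrolls per second
correspond to roughly N MB/s.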

-----------------------------------------------------------------------

Note: the use of VFP two-pass overlapped copy instead of ShadowFB is
      still not enabled by default when running on Raspberry Pi
      because the performance results are not so great.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
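The two-pass idea can be sketched in portable C: each row is copied in chunks through a small temporary buffer (pass one reads from the framebuffer into the temporary buffer, pass two writes it to the destination), and rows and chunks are walked in an order that never overwrites source data before it has been read. This is a sketch under stated assumptions: `memcpy` stands in for the hand-written VLDM/VSTM loops, and the 256-byte chunk size and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TMP_CHUNK 256  /* hypothetical size of the cached bounce buffer */

/* Copy a w_bytes x h rectangle inside one buffer with the given stride,
 * from (src_xb, src_y) to (dst_xb, dst_y); x offsets are in bytes.
 * Source and destination may overlap. */
static void overlapped_blt_two_pass(uint8_t *base, size_t stride,
                                    size_t src_xb, size_t src_y,
                                    size_t dst_xb, size_t dst_y,
                                    size_t w_bytes, size_t h)
{
    uint8_t tmp[TMP_CHUNK];
    assert(src_xb + w_bytes <= stride && dst_xb + w_bytes <= stride);

    /* Moving down: walk rows bottom-up so unread source rows survive. */
    const bool bottom_up = dst_y > src_y;
    /* Moving right within the same rows: walk chunks right-to-left. */
    const bool right_to_left = dst_y == src_y && dst_xb > src_xb;

    for (size_t i = 0; i < h; i++) {
        size_t row = bottom_up ? h - 1 - i : i;
        const uint8_t *s = base + (src_y + row) * stride + src_xb;
        uint8_t *d = base + (dst_y + row) * stride + dst_xb;

        for (size_t done = 0; done < w_bytes; ) {
            size_t n = w_bytes - done < TMP_CHUNK ? w_bytes - done : TMP_CHUNK;
            size_t off = right_to_left ? w_bytes - done - n : done;
            memcpy(tmp, s + off, n);  /* pass 1: read source into tmp */
            memcpy(d + off, tmp, n);  /* pass 2: write tmp to destination */
            done += n;
        }
    }
}
```

Because each chunk is fully read before it is written, the same loop handles moves in every direction; only the walk order changes.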


03 Jun, 2013 1 commit

CPU: add ARM memcpy assembly function · ae976fe9

Siarhei Siamashka authored Jun 03, 2013

This is my old ARM9E/ARM11 memcpy code from
    https://garage.maemo.org/projects/mplayer/
with some tuning for Raspberry Pi (aligned prefetch added).

It will be used by the VFP optimized overlapped blt function.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


02 Jun, 2013 2 commits

G2D: Implement "double speed" 16bpp blits · 98f1b119

Harm Hanemaaijer authored May 24, 2013



When source and destination coordinates allow it, a 16bpp screen-to-screen
blit is divided into up to three segments: two optional one-pixel-wide
edges and an aligned middle segment that is copied in 32-bit mode.

This patch adds the low-level function sunxi_g2d_blit_r5g6b5_in_three
and adds logic to the general blit function to use it for 16bpp to
16bpp blits if the source and destination coordinates allow it. This
patch automatically enables the use of this optimization in the
sunxi G2D X driver. The area threshold for using G2D for
16bpp-to-16bpp blits was introduced in a previous patch.

Benchmarks:

1920x1080x16bpp@60Hz, ShadowFB disabled:

x11perf -scroll100
Before:
 350000 trep @   0.0881 msec ( 11400.0/sec): Scroll 100x100 pixels
After:
 350000 trep @   0.0819 msec ( 12200.0/sec): Scroll 100x100 pixels

x11perf -scroll500
Before:
  20000 trep @   1.3547 msec (   738.0/sec): Scroll 500x500 pixels
After:
  35000 trep @   0.8005 msec (  1250.0/sec): Scroll 500x500 pixels
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
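The segment split from this commit can be sketched as follows. This assumes both buffers start on a 4-byte boundary and use 4-byte aligned strides, so an even x coordinate at 16bpp is also 32-bit aligned; `split_r5g6b5_in_three` and `r5g6b5_split_t` are hypothetical names loosely echoing `sunxi_g2d_blit_r5g6b5_in_three`, not its actual signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of how a 16bpp row of 'w' pixels is split. */
typedef struct {
    int left_px;    /* 0 or 1 pixel copied at 16bpp on the left edge */
    int middle_px;  /* even pixel count copied as 32-bit words */
    int right_px;   /* 0 or 1 pixel copied at 16bpp on the right edge */
} r5g6b5_split_t;

/* Returns false when source and destination x have different parity:
 * then the middle part can't be 32-bit aligned on both sides and a plain
 * 16bpp blit has to be used instead. */
static bool split_r5g6b5_in_three(int src_x, int dst_x, int w,
                                  r5g6b5_split_t *out)
{
    assert(src_x >= 0 && dst_x >= 0 && w > 0);
    if ((src_x & 1) != (dst_x & 1))
        return false;
    out->left_px = src_x & 1;          /* odd x: one pixel to reach alignment */
    int rest = w - out->left_px;
    out->middle_px = rest & ~1;        /* largest even part */
    out->right_px = rest & 1;          /* leftover odd pixel */
    return true;
}
```

In 32-bit mode the middle segment would then be blitted as `middle_px / 2` pixels starting at x coordinate `(x + left_px) / 2`, which is where the "double speed" comes from.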


G2D: Implement an area threshold for using G2D blits. · b3c2fd2c

Harm Hanemaaijer authored May 24, 2013



Due to the overhead of G2D for small screen-to-screen blits, CPU blits
are faster for small areas. This patch introduces a threshold below
which CPU blits are used instead. It is currently set to 1000 for 32bpp
and 2500 for 16bpp, based on test results.

Some benchmarks:

1920x1080x16bppx60Hz, ShadowFB disabled:

x11perf -scroll10

Before:
1500000 trep @   0.0239 msec ( 41800.0/sec): Scroll 10x10 pixels
After:
2500000 trep @   0.0110 msec ( 90900.0/sec): Scroll 10x10 pixels

x11perf -copywinwin10

Before:
1200000 trep @   0.0247 msec ( 40500.0/sec): Copy 10x10 from window to window
After:
1800000 trep @   0.0146 msec ( 68600.0/sec): Copy 10x10 from window to window
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
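A sketch of the threshold test in C. The macro and function names are hypothetical, and reading the thresholds as pixel counts (width * height) is an assumption; the message only gives the numbers. Under that reading the 10x10 benchmark cases (100 pixels) fall well below both thresholds and go to the CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted in the commit message, interpreted as pixel counts. */
#define G2D_MIN_AREA_32BPP 1000
#define G2D_MIN_AREA_16BPP 2500

/* True if a screen-to-screen blit is large enough for G2D to be expected
 * to beat the CPU; smaller blits are sent to the CPU path. */
static bool g2d_worth_it(int w, int h, int bpp)
{
    assert(bpp == 16 || bpp == 32);
    long area = (long)w * h;
    return area >= (bpp == 16 ? G2D_MIN_AREA_16BPP : G2D_MIN_AREA_32BPP);
}
```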


22 Apr, 2013 1 commit

test: race condition on DRI2 buffers allocation when going fullscreen · d147e25f

Siarhei Siamashka authored Apr 22, 2013

This test program exposes a problem related to window resizing
(or going fullscreen), which may happen exactly between the "back"
and "front" DRI2 buffer allocations.

The xtrace log with some annotations:

000:<:004c: 8: DRI2-Request(151,3): CreateDrawable drawable=0x02200001
000:<:004d: 16: DRI2-Request(151,5): GetBuffers drawable=0x02200001 attachments={attachment=BackLeft(0x00000001)};
000:>:004d:52: Reply to GetBuffers: width=480 height=480 buffers={attachment=BackLeft(0x00000001)
name=0x00000157 pitch=1920 cpp=4 flags=0x00000000};

Get the BackLeft buffer.

000:<:004e: 4: Request(43): GetInputFocus
000:>:004e:32: Reply to GetInputFocus: revert-to=PointerRoot(0x01) focus=0x02200001
000:<:004f: 24: Request(16): InternAtom only-if-exists=false(0x00) name='_NET_WM_STATE'
000:>:004f:32: Reply to InternAtom: atom=0xff("_NET_WM_STATE")
000:<:0050: 32: Request(16): InternAtom only-if-exists=false(0x00) name='_NET_WM_STATE_FULLSCREEN'
000:>:0050:32: Reply to InternAtom: atom=0x102("_NET_WM_STATE_FULLSCREEN")
000:<:0051: 44: Request(25): SendEvent propagate=false(0x00) destination=0x00000170 event-mask=SubstructureNotify,SubstructureRedirect
ClientMessage(33) format=0x20 window=0x02200001 type=0xff("_NET_WM_STATE")
data=0x01,0x00,0x00,0x00,0x02,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00;
000:<:0052: 4: Request(43): GetInputFocus
000:>:0052: Event DRI2-InvalidateBuffers(102) drawable=0x02200001

Here the X server attempts to notify the client side DRI2 code in the Mali blob
that the DRI2 buffer must be requested again. But this event gets happily ignored.

000:>:0052: Event Expose(12) window=0x02200001 x=0 y=0 width=1920 height=1080 count=0x0000
000:>:0052:32: Reply to GetInputFocus: revert-to=PointerRoot(0x01) focus=0x02200001
000:<:0053: 8: Request(3): GetWindowAttributes window=0x02200001
000:<:0054: 8: Request(14): GetGeometry drawable=0x02200001
000:>:0053:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:0054:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0
001:<:000c: 12: Request(98): QueryExtension name='DRI2'
001:>:000c:32: Reply to QueryExtension: present=true(0x01) major-opcode=151 first-event=101 first-error=0
001:<:000d: 32: DRI2-Request(151,8): SwapBuffers drawable=0x02200001 target_msc_hi=0 target_msc_lo=0
divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:>:000d: Event DRI2-BufferSwapComplete(101) drawable=0x00000002 ust_hi=35651585 ust_lo=0 msc_hi=0 msc_lo=0 sbc_hi=0 sbc_lo=1

Here the DRI2 code from the Mali blob tries to swap buffers (in the
hope that the allocated BackLeft would go to front).

001:>:000d:32: Reply to SwapBuffers: swap_hi=0 swap_lo=4096
000:<:0055: 8: DRI2-Request(151,3): CreateDrawable drawable=0x02200001
000:<:0056: 16: DRI2-Request(151,5): GetBuffers drawable=0x02200001 attachments={attachment=BackLeft(0x00000001)};
000:>:0056:52: Reply to GetBuffers: width=1920 height=1080 buffers={attachment=BackLeft(0x00000001)
name=0x00000159 pitch=7680 cpp=4 flags=0x00000000};

And then it requests a new BackLeft DRI2 buffer.

000:<:0057: 4: Request(43): GetInputFocus
000:>:0057:32: Reply to GetInputFocus: revert-to=PointerRoot(0x01) focus=0x02200001
000:<:0058: 8: Request(3): GetWindowAttributes window=0x02200001
000:<:0059: 8: Request(14): GetGeometry drawable=0x02200001
000:>:0058:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:0059:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0
000:<:005a: 8: Request(3): GetWindowAttributes window=0x02200001
000:<:005b: 8: Request(14): GetGeometry drawable=0x02200001
000:>:005a:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:005b:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0
001:<:000e: 32: DRI2-Request(151,8): SwapBuffers drawable=0x02200001 target_msc_hi=0 target_msc_lo=0
divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:>:000e: Event DRI2-BufferSwapComplete(101) drawable=0x00000002 ust_hi=35651585 ust_lo=0 msc_hi=0 msc_lo=0 sbc_hi=0 sbc_lo=2

And here it is simply swapping the buffers.

001:>:000e:32: Reply to SwapBuffers: swap_hi=0 swap_lo=4096
000:<:005c: 8: Request(3): GetWindowAttributes window=0x02200001
000:<:005d: 8: Request(14): GetGeometry drawable=0x02200001
000:>:005c:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:005d:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0

And now it is polling for the change of window geometry. The same
"SwapBuffers -> GetGeometry -> SwapBuffers" pattern keeps repeating.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


30 Mar, 2013 1 commit

CPU: Added ARM NEON optimized CopyWindow/CopyArea implementation · 24d05b1d

Siarhei Siamashka authored Mar 30, 2013



Should be useful for better performance when moving windows
and scrolling on devices without a dedicated 2D hardware
accelerator (Allwinner A13).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


28 Mar, 2013 1 commit
- sunxi: Fix segfault when there is no "fbdev" option in xorg.conf · 000398d1
  Siarhei Siamashka authored Mar 27, 2013
```
Just use "/dev/fb0" by default.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
26 Mar, 2013 3 commits

G2D: Now sunxi_x_g2d.c code does not require sunxi disp anymore · 1cd5f084

Siarhei Siamashka authored Mar 26, 2013



The sunxi_x_g2d.c file contains the midlayer code for hooking the
G2D optimized blit into xserver. But in fact it does not strictly
need to depend on anything sunxi specific.

So now we introduce a simple "blt2d_i" interface struct which
specifically provides a pointer to the accelerated blit function.
And just use this interface struct instead of the whole "sunxi_disp_t".
This allows to easily reuse the same code for other non-G2D or even
non-sunxi blit implementations in the future.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
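A sketch of what such an interface struct and its use from midlayer code might look like. Everything here is hypothetical: the field names, the argument list (modelled loosely on a generic strided blit) and the trivial `memmove` based backend only illustrate how midlayer code can call a blit through a function pointer without knowing about `sunxi_disp_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shape of a blt2d_i-style interface: an opaque backend
 * context plus a pointer to its accelerated blit function. */
typedef struct {
    void *self;
    int (*overlapped_blt)(void *self,
                          uint32_t *src_bits, uint32_t *dst_bits,
                          int src_stride, int dst_stride, int bpp,
                          int src_x, int src_y, int dst_x, int dst_y,
                          int w, int h);
} blt2d_i;

/* A trivial 32bpp CPU backend (strides in uint32_t units). Rows are walked
 * bottom-up when moving down so overlapping copies stay correct, and
 * memmove handles overlap within a row. Returns 0 if unsupported. */
static int cpu_overlapped_blt(void *self, uint32_t *src_bits, uint32_t *dst_bits,
                              int src_stride, int dst_stride, int bpp,
                              int src_x, int src_y, int dst_x, int dst_y,
                              int w, int h)
{
    (void)self;
    if (bpp != 32)
        return 0;
    for (int i = 0; i < h; i++) {
        int row = dst_y > src_y ? h - 1 - i : i;
        memmove(dst_bits + (size_t)(dst_y + row) * dst_stride + dst_x,
                src_bits + (size_t)(src_y + row) * src_stride + src_x,
                (size_t)w * sizeof(uint32_t));
    }
    return 1;
}

/* Midlayer code only sees the interface, not any sunxi-specific struct. */
static int midlayer_copy_area(const blt2d_i *blt, uint32_t *bits, int stride,
                              int sx, int sy, int dx, int dy, int w, int h)
{
    assert(blt != NULL && blt->overlapped_blt != NULL);
    return blt->overlapped_blt(blt->self, bits, bits, stride, stride, 32,
                               sx, sy, dx, dy, w, h);
}
```

Swapping in a G2D or any other backend would then only mean filling the struct with a different `self` and function pointer.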


CPU: Remove unneeded test program bundled with runtime CPU detection · 66f3e5cc

Siarhei Siamashka authored Mar 26, 2013



The 'main' function got there by accident and was not spotted
earlier because the driver itself is a shared library.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


CPU: Added code for runtime CPU features detection · e6b1e48b
Siarhei Siamashka authored Mar 26, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
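One common way to detect VFP and NEON at runtime on ARM Linux is to read the "Features" line of /proc/cpuinfo (reading the auxiliary vector with `getauxval(AT_HWCAP)`, available since glibc 2.16, is another). Whether this driver uses either approach is an assumption; the sketch below only parses already-read cpuinfo text, and `cpu_features_t` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool has_vfp;
    bool has_neon;
} cpu_features_t;

/* True if 'word' is a whole space/tab separated token in [p, end). */
static bool has_token(const char *p, const char *end, const char *word)
{
    size_t len = strlen(word);
    while (p < end) {
        while (p < end && (*p == ' ' || *p == '\t'))
            p++;
        const char *tok = p;
        while (p < end && *p != ' ' && *p != '\t')
            p++;
        if ((size_t)(p - tok) == len && strncmp(tok, word, len) == 0)
            return true;
    }
    return false;
}

/* Scan /proc/cpuinfo-style text for the ARM "Features" line. Whole-token
 * matching keeps e.g. "vfpv3" from being mistaken for "vfp". */
static cpu_features_t parse_cpuinfo_features(const char *cpuinfo)
{
    cpu_features_t f = { false, false };
    assert(cpuinfo != NULL);
    for (const char *line = cpuinfo; *line; ) {
        const char *end = strchr(line, '\n');
        if (end == NULL)
            end = line + strlen(line);
        if (strncmp(line, "Features", 8) == 0) {
            const char *colon = memchr(line, ':', (size_t)(end - line));
            if (colon != NULL) {
                f.has_vfp = has_token(colon + 1, end, "vfp");
                f.has_neon = has_token(colon + 1, end, "neon");
            }
        }
        line = (*end == '\n') ? end + 1 : end;
    }
    return f;
}
```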

22 Mar, 2013 2 commits

G2D: enable accelerated blits for 16bpp color depth · 60291865

Siarhei Siamashka authored Mar 22, 2013

This is still not perfect, because G2D can't saturate memory bandwidth
for this color depth (it is fillrate limited). We should emulate 16bpp blits
with 32bpp blits whenever it is possible.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


G2D: attempt loading 'g2d_23' kernel module · 5f964213

Siarhei Siamashka authored Mar 22, 2013



It might not be statically compiled into the kernel (for example in Fedora),
so we should try to load it explicitly.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


21 Mar, 2013 1 commit

G2D: accelerate CopyArea between different pixmaps in framebuffer · cc1b1410

Siarhei Siamashka authored Mar 21, 2013



Now source and destination pixmaps don't need to be the same for
using G2D acceleration (as long as both of them are allocated in
the framebuffer). This allows using G2D to copy pixels from DRI2
buffers to the framebuffer on the fallback path (when the window
of an OpenGL ES application is partially overlapped by some other
windows). However, this only works when the Composite extension is
disabled, for example by adding the following to xorg.conf:

    Section "Extensions"
        Option "Composite" "Disable"
    EndSection

If the Composite extension is enabled, windows have backing pixmaps, and
we have a longer chain of copies:

   DRI2 buffer -> backing pixmap -> framebuffer

Because backing pixmaps are not allocated in physically contiguous
memory, they can't be copied using G2D yet.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


19 Mar, 2013 1 commit
- Suppress "[DISP] not supported scaler input pixel format:0" dmesg spam · d6fb7388
  Siarhei Siamashka authored Mar 20, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
18 Mar, 2013 1 commit

G2D: Hardware acceleration for XCopyArea (initially 32bpp only) · ecfeb4aa

Siarhei Siamashka authored Mar 18, 2013



Wrap the CreateGC function to add a hook for the CopyArea operation, which
can be accelerated using G2D for the buffers inside of the visible
part of the framebuffer. In the future we may try to also ensure
that DRI2 buffers are copied using G2D instead of the CPU in case
we hit the fallback path and can't avoid this copy.

Benchmark using "x11perf -scroll500 -copywinwin500":

=== ShadowFB (software rendering) ===

   3000 reps @   2.0308 msec (   492.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9741 msec (   507.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9826 msec (   504.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9830 msec (   504.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9965 msec (   501.0/sec): Scroll 500x500 pixels
  15000 trep @   1.9934 msec (   502.0/sec): Scroll 500x500 pixels

   1600 reps @   3.3054 msec (   303.0/sec): Copy 500x500 from window to window
   1600 reps @   3.3179 msec (   301.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2263 msec (   310.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2491 msec (   308.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2357 msec (   309.0/sec): Copy 500x500 from window to window
   8000 trep @   3.2669 msec (   306.0/sec): Copy 500x500 from window to window

=== G2D (hardware acceleration) ===

   3000 reps @   2.1949 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1929 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1923 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1889 msec (   457.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1941 msec (   456.0/sec): Scroll 500x500 pixels
  15000 trep @   2.1926 msec (   456.0/sec): Scroll 500x500 pixels

   2800 reps @   1.8114 msec (   552.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8103 msec (   552.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8160 msec (   551.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8099 msec (   553.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8126 msec (   552.0/sec): Copy 500x500 from window to window
  14000 trep @   1.8120 msec (   552.0/sec): Copy 500x500 from window to window

CPU usage remains low when running this test with G2D acceleration enabled.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
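The "wrap CreateGC, then hook CopyArea" pattern from this commit can be illustrated with self-contained stand-in types. These are hypothetical and heavily simplified, not the X server's `ScreenRec`, `GCRec` or `GCOps`; the `dst_in_fb` flag stands in for whatever check decides that G2D applies. Only the wrapping idiom itself (save the original function, call it, then patch the resulting ops table) is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for screen, GC and GC ops. */
enum copy_path { PATH_FB, PATH_G2D };

typedef struct gc gc_t;
typedef struct {
    enum copy_path (*copy_area)(gc_t *gc, int w, int h);
} gc_ops_t;

struct gc {
    const gc_ops_t *ops;          /* ops table used for drawing */
    const gc_ops_t *wrapped_ops;  /* stand-in for a per-GC private */
    bool dst_in_fb;               /* hypothetical: target is visible fb */
};

typedef struct {
    bool (*create_gc)(gc_t *gc);
} screen_t;

/* "fb layer" defaults. */
static enum copy_path fb_copy_area(gc_t *gc, int w, int h)
{
    (void)gc; (void)w; (void)h;
    return PATH_FB;
}

static bool fb_create_gc(gc_t *gc)
{
    static const gc_ops_t fb_ops = { fb_copy_area };
    gc->ops = &fb_ops;
    return true;
}

/* Driver side: the original CreateGC, saved when wrapping. */
static bool (*saved_create_gc)(gc_t *gc);

static enum copy_path accel_copy_area(gc_t *gc, int w, int h)
{
    if (gc->dst_in_fb)
        return PATH_G2D;                          /* G2D blit would go here */
    return gc->wrapped_ops->copy_area(gc, w, h);  /* fall back to fb */
}

static bool accel_create_gc(gc_t *gc)
{
    /* Shared table; assumes every GC starts from the same fb ops. */
    static gc_ops_t accel_ops;
    if (!saved_create_gc(gc))              /* let the fb layer set up the GC */
        return false;
    gc->wrapped_ops = gc->ops;             /* remember the original ops, */
    accel_ops = *gc->ops;                  /* copy them, */
    accel_ops.copy_area = accel_copy_area; /* and hook CopyArea only */
    gc->ops = &accel_ops;
    return true;
}

static void wrap_create_gc(screen_t *screen)
{
    assert(saved_create_gc == NULL);
    saved_create_gc = screen->create_gc;
    screen->create_gc = accel_create_gc;
}
```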


17 Mar, 2013 2 commits

DRI2: fix build problem introduced by the previous commit (stray line) · 8e6dd003
Siarhei Siamashka authored Mar 17, 2013
```
Reported-by: Maurice de la Ferté <kadava@gmx.de>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```

DRI2: more informative messages for /var/log/Xorg.0.log · 97f0c976

Siarhei Siamashka authored Mar 17, 2013



Explain that AIGLX is normally expected to fail and the users should
not really worry about it. Also provide a warning in case the
driver has been compiled without libUMP support (it could be that
the user actually wanted 3D acceleration, but just has not installed
all the needed dependencies).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


16 Mar, 2013 1 commit
- DRI2: Typo fixes (need to return NULL instead of FALSE) · c0620e17
  Siarhei Siamashka authored Mar 17, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
15 Mar, 2013 2 commits

test: Added missing sunxi_disp_close() to sunxi_g2d_bench · 7203a304
Siarhei Siamashka authored Mar 16, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```

test: Added a simple synthetic benchmark for G2D · 696b3d7e

Siarhei Siamashka authored Mar 15, 2013



It measures MPix/s numbers for blit and fill operations done
by G2D, and also for comparison tests the performance of the
same operations done by pixman (software rendering).

G2D has clock frequency configured to be half of the RAM clock
frequency. So for 480 MHz RAM, we have G2D clocked at 240 MHz,
which means that no more than 240 MPix can be processed per
second. Unfortunately this limits the performance of a simple
operation such as solid fill.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
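Spelling out the limit described in this commit, assuming (as the message implies) at most one pixel per G2D clock, and 1 MB = 10^6 bytes:

```latex
\begin{aligned}
f_{\mathrm{G2D}} &= f_{\mathrm{RAM}} / 2 = 480\ \mathrm{MHz} / 2 = 240\ \mathrm{MHz} \\
\text{pixel rate} &\le 240\ \mathrm{MPix/s} \\
\text{32bpp fill} &\le 240 \times 4\ \mathrm{B} = 960\ \mathrm{MB/s} \\
\text{16bpp fill} &\le 240 \times 2\ \mathrm{B} = 480\ \mathrm{MB/s}
\end{aligned}
```

At 16bpp the same pixel rate moves only half as many bytes, which is consistent with the remark in 60291865 that G2D is fillrate limited at that color depth.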


14 Mar, 2013 2 commits

Introduce experimental G2D acceleration · ea2fc3e4

Siarhei Siamashka authored Mar 14, 2013



This initial G2D support code can speed up moving windows in XFCE. Currently
disabled by default, but can be enabled by editing /etc/X11/xorg.conf and
adding the following line to the "Device" section:

        Option          "AccelMethod" "G2D"
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


Reuse the already existing xserver framebuffer mapping for sunxi_disp_t · ba548ffb

Siarhei Siamashka authored Mar 14, 2013

Avoid creating a new mapping because that's a waste of the virtual address
space. Also we are going to use this xserver framebuffer mapping address
for testing whether window backing pixmaps are allocated in the framebuffer
and can be accelerated by G2D.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
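A minimal sketch of the kind of test this mapping enables: checking whether a pixmap's pixel data lies entirely inside the framebuffer mapping. The function name is hypothetical; for a pixmap the range would start at its pixel data and span stride times height bytes. The comparison is done on `uintptr_t` because relational comparisons between pointers into different objects are undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does [ptr, ptr + size) lie entirely inside [fb_base, fb_base + fb_size)?
 * Written to avoid overflow in the end-of-range computation. */
static bool range_in_framebuffer(const void *ptr, size_t size,
                                 const void *fb_base, size_t fb_size)
{
    uintptr_t p = (uintptr_t)ptr, b = (uintptr_t)fb_base;
    assert(fb_base != NULL);
    return p >= b && size <= fb_size && p - b <= fb_size - size;
}
```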


13 Mar, 2013 3 commits

test: use G2D acceleration in sunxi_disp_vsync_demo · 5d9c791d
Siarhei Siamashka authored Mar 13, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```

Added ioctl wrappers for simple G2D fill and blit operations · df534b22

Siarhei Siamashka authored Mar 13, 2013



The existing kernel driver from Allwinner for G2D accelerator
is quite bad because ioctls are synchronous and block the caller
thread, compromise security (basically it is a backdoor
for copying data in memory between any arbitrary physical
addresses) and have high overhead (each individual fill or
blit operation needs an ioctl). But we need to start with
something, so use this stuff as a placeholder.

The g2d_driver.h header file is taken from linux-sunxi-3.4
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


Added 'test' directory and a demo program for testing tear-free animation · c09455c3

Siarhei Siamashka authored Mar 13, 2013



It is basically the first test program for the sunxi disp ioctls wrapper
code from "src/sunxi_disp.c".
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


12 Mar, 2013 1 commit
- Free sunxi_disp_t struct directly from sunxi_disp_close() · e939cc3c
  Siarhei Siamashka authored Mar 12, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
11 Mar, 2013 1 commit
- DRI2: Try to explicitly load 'mali' and 'mali_drm' kernel modules · 19ac8389
  Siarhei Siamashka authored Mar 12, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
07 Mar, 2013 1 commit

Provide xorg.conf needed by the instructions from http://linux-sunxi.org · 6f708d4b

Siarhei Siamashka authored Mar 07, 2013

The installation instructions from http://linux-sunxi.org/Binary_drivers wiki
page currently ask the users to run the following command after compiling
and installing the DDX driver (either xf86-video-mali or xf86-video-sunxifb):

cp xorg.conf /usr/share/X11/xorg.conf.d/99-mali400.conf

Regardless of whether it is a good idea to touch /usr/share/X11/xorg.conf.d
directory in the first place, providing a sample xorg.conf file may save
some users from having unnecessary troubles.

Reported-by: Michal Suchanek via https://github.com/ssvb/xf86-video-sunxifb/pull/1

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


23 Feb, 2013 1 commit
- More detailed debug messages related to DRI2 support · 950bf7e2
  Siarhei Siamashka authored Feb 20, 2013
```
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
```
  950bf7e2
18 Feb, 2013 2 commits

Add support for hardware ARGB cursors up to 32x32 size · 06ae9b68

Siarhei Siamashka authored Feb 18, 2013

Actually they are converted to 32x32 with a 256-color palette. If
there are more than 256 unique colors, the color components
of the pixels are reduced from 8-bit to 7-bit, then to 6-bit if
necessary and so on (until the number of unique colors is small
enough to fit the palette). In the worst case we may
theoretically end up with just 2 bits per A, R, G and B channels,
but in practice 7 or 6 bits seem to be enough.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
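The color reduction loop from this commit can be sketched in C. Two points are assumptions: reducing a channel to N bits is modelled here as keeping its top N bits (truncation; the driver may round instead), and the names are hypothetical. The loop is guaranteed to finish because 2 bits for each of A, R, G and B give at most 2^8 = 256 colors.

```c
#include <assert.h>
#include <stdint.h>

#define CURSOR_PIXELS (32 * 32)

/* Keep the top 'bits' bits of each 8-bit A, R, G, B channel. */
static uint32_t quantize_argb(uint32_t c, int bits)
{
    assert(bits >= 1 && bits <= 8);
    uint32_t m = (0xFFu << (8 - bits)) & 0xFFu;  /* per-channel mask */
    return c & (m * 0x01010101u);                /* replicate to 4 bytes */
}

/* Count distinct colors after quantization, stopping early once the
 * count exceeds 'limit' (then limit + 1 is returned). */
static int count_unique(const uint32_t *px, int n, int bits, int limit)
{
    uint32_t seen[256];
    int count = 0;
    assert(limit > 0 && limit <= 256);
    for (int i = 0; i < n; i++) {
        uint32_t c = quantize_argb(px[i], bits);
        int j = 0;
        while (j < count && seen[j] != c)
            j++;
        if (j == count) {
            if (count == limit)
                return limit + 1;  /* too many colors: stop early */
            seen[count++] = c;
        }
    }
    return count;
}

/* Largest per-channel bit depth (8 down to 2) whose quantized image
 * fits a 256-entry palette. */
static int cursor_palette_bits(const uint32_t px[CURSOR_PIXELS])
{
    for (int bits = 8; bits > 2; bits--)
        if (count_unique(px, CURSOR_PIXELS, bits, 256) <= 256)
            return bits;
    return 2;
}
```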


Disable hardware layers when software ARGB cursor is used · 0ab0a9e8

Siarhei Siamashka authored Feb 18, 2013



Modern desktops may use ARGB cursors. As the current
sunxi display controller support code can't handle this
type of cursor yet, the X server falls back to a software
cursor, which is not visible under hardware layers and ruins the user
experience.

This patch adds empty implementations for the "UseHWCursorARGB"
and "LoadCursorARGB" functions, which just return an error for
now (so that the X server still falls back to the software cursor).
However, we also introduce callback functions responsible for
notifying the DRI2 code about enabling/disabling the use of
the hardware cursor. Now hardware overlays are disabled
when switching to the software cursor and re-enabled again when
switching back to the hardware cursor.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>


12 Feb, 2013 1 commit

Fix the creation of DRI2 buffers for pixmaps · 8507947e

Siarhei Siamashka authored Feb 12, 2013

Testing with gnome-shell revealed a problem. We need to migrate
pixmaps into UMP buffers in order to allow the GLESv2 based
compositing manager to actually access the content of redirected
windows, rendered by X11 applications into offscreen pixmaps.

Just to make sure that we don't add any unneeded overhead for 2D
(neither extra CPU cycles nor an increased memory footprint for
unrelated pixmaps), a hash table (currently uthash [1]) is used
for connecting DRI2-enabled pixmaps with UMP buffers. The lookups
are only performed on DRI2 buffer creation and pixmap destruction.

1. http://troydhanson.github.com/uthash/

Reported-by: Michal Suchanek <hramrach@gmail.com>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
