Commits · cbd5b2b6439308b293c84cce5e7923ef072c8110 · adam.huang / Xf86 Video Fbturbo

09 Sep, 2013 2 commits

mali: /var/log/Xorg.0.log warning about insufficient framebuffer size · cbd5b2b6

Siarhei Siamashka authored Sep 10, 2013

If the framebuffer reservation is too small for efficient use of
the hardware overlays and zero-copy buffer flipping, log a hint
about fixing this problem in /var/log/Xorg.0.log.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

cbd5b2b6

mali: added a sanity check for the UMP framebuffer wrappers size · b48269ab

Siarhei Siamashka authored Sep 10, 2013



Even though we are primarily using the UMP buffer obtained by the
GET_UMP_SECURE_ID_SUNXI_FB ioctl, another UMP buffer obtained by
the GET_UMP_SECURE_ID_BUF1 ioctl should also span the whole
framebuffer. Otherwise we may run into trouble with the window
resize bug recovery and buffer flipping.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

b48269ab

07 Sep, 2013 3 commits

A big README update · 4ae7f6cf

Siarhei Siamashka authored Sep 08, 2013



The instructions, links, etc.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

4ae7f6cf

sunxi: workaround a negative YUV overlay position bug · 8e6659b8

Siarhei Siamashka authored Sep 07, 2013

The Allwinner A10/A13 display controller hardware is expected to
support negative coordinates of the top left corners of the layers.
But there is some bug either in the kernel driver or in the hardware,
which messes up the picture on screen when the Y coordinate is negative
for a YUV layer. Negative X coordinates are not affected. RGB formats
are not affected either (no matter whether the RGB layer is scaled or not).

We fix this by recalculating which part of the buffer in memory
corresponds to Y=0 on screen and adjusting the input buffer settings.

Fixes https://github.com/ssvb/xf86-video-sunxifb/issues/16

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
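
A minimal sketch of the recalculation described in this commit, not the
driver's actual code. The overlay_rect struct and clip_negative_y helper
are hypothetical names, and the proportional mapping between screen lines
and source lines is an assumption for a scaled layer. A real YUV 4:2:0
layer would probably also need the source line kept even.

```c
#include <stdbool.h>

/* Hypothetical description of a YUV layer: which lines of the source
 * buffer are shown, and where they land on screen. */
struct overlay_rect {
    int src_y, src_h;   /* first source line and number of source lines */
    int dst_y, dst_h;   /* on-screen top coordinate and height */
};

/* Move a layer with negative on-screen Y to Y=0 by skipping the source
 * lines that would have been hidden. Returns false when nothing of the
 * layer remains visible. */
static bool clip_negative_y(struct overlay_rect *r)
{
    if (r->dst_h <= 0 || r->src_h <= 0)
        return false;
    if (r->dst_y >= 0)
        return true;                 /* already on screen, nothing to do */

    int hidden = -r->dst_y;          /* screen lines above Y=0 */
    if (hidden >= r->dst_h)
        return false;                /* entirely above the screen */

    /* Source lines that correspond to the hidden screen lines. */
    int skip = (int)((long long)hidden * r->src_h / r->dst_h);

    r->src_y += skip;
    r->src_h -= skip;
    r->dst_h -= hidden;
    r->dst_y  = 0;
    return true;
}
```

For example, a 100-line source scaled to 200 screen lines at Y=-20 ends up
starting at source line 10 with 90 source lines shown over 180 screen lines.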

8e6659b8

mali: support sunxi hardware overlay also with r5g6b5 format · 37d5e05d

Siarhei Siamashka authored Sep 07, 2013



Zero-copy and tear-free buffer swapping is now also supported
for a 16bpp desktop.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

37d5e05d

06 Sep, 2013 1 commit

sunxi: Only enable scaler for the layer when it is really necessary · 64a0d642

Siarhei Siamashka authored Sep 07, 2013

Now the scaler is enabled for the sunxi disp layer only when we want
to use it for YUV format with XV. Whenever the layer is configured
for RGB format or deactivated, the scaler gets disabled.

This should make the driver more friendly to the other potential
scaled layer users. The total number of available scalers is only
2 for Allwinner A10 and only 1 for Allwinner A13.

The potential drawback is that now we may get an error when trying
to enable the scaler (if somebody else has used up all the available
scalers) instead of always having it reserved and ready for use.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

64a0d642

13 Aug, 2013 1 commit

DRI2: Fix the kernel oops regression when DRI2HWOverlay=false · 6eb2defc

Siarhei Siamashka authored Aug 13, 2013



Recent changes broke the configuration when "DRI2HWOverlay" option
is set to "false". This patch adds the missing UMP secure ids
initialization and resolves the problem.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

6eb2defc

04 Aug, 2013 3 commits

DRI2: Rename all SunxiMaliDRI2 instances to 'mali' for clarity · ca05b0c0

Siarhei Siamashka authored Aug 04, 2013



Do this to keep the variable naming style consistent across the
source file (earlier these variables had different names like
'self', 'drvpriv', 'private').
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

ca05b0c0

DRI2: Ensure correct ordering of frames after window resize · 30b4ca27

Siarhei Siamashka authored Aug 04, 2013

In double buffer mode, explicitly mark the buffers as designated
for odd or even frame position when putting them into queue. And
when swapping the buffers, use these flags to re-synchronize if
it is necessary. This prevents problems after window resize (when
gles-rgb-cycle-demo could expose a mismatch between the color
name in the window title and the actual window color).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

30b4ca27

test: use spacebar as a slow motion hotkey for gles-rgb-cycle-demo · 84ee17d9

Siarhei Siamashka authored Aug 04, 2013

Whenever something goes wrong in high fps mode, it may be interesting
to slow down the demo to check whether the actual background color
matches the expected color (shown in the window title).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

84ee17d9

03 Aug, 2013 1 commit

DRI2: Debugging code for testing the frames order correctness · 67d2e229

Siarhei Siamashka authored Aug 03, 2013



If DEBUG_WITH_RGB_PATTERN is defined, then we check that the
frames colors are changed as "R -> G -> B -> R -> G -> ..."
pattern and print debugging messages when this is not the
case. Such color change pattern can be generated by the
"test/gles-rgb-cycle-demo.c" program.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
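
A minimal sketch of what such an order check could look like. It assumes
32bpp XRGB pixels and that the demo clears the window to pure red, green
or blue. The names classify_pixel and is_next_in_cycle are hypothetical
and are not taken from the driver.

```c
#include <stdint.h>

enum rgb_color { COLOR_UNKNOWN = -1, COLOR_R = 0, COLOR_G = 1, COLOR_B = 2 };

/* Classify a 32bpp XRGB pixel as pure red, green or blue
 * (the top byte is ignored). */
static enum rgb_color classify_pixel(uint32_t xrgb)
{
    switch (xrgb & 0x00FFFFFFu) {
    case 0x00FF0000u: return COLOR_R;
    case 0x0000FF00u: return COLOR_G;
    case 0x000000FFu: return COLOR_B;
    default:          return COLOR_UNKNOWN;
    }
}

/* Returns 1 if `cur` is the legal successor of `prev` in the
 * R -> G -> B -> R cycle, 0 otherwise. */
static int is_next_in_cycle(enum rgb_color prev, enum rgb_color cur)
{
    if (prev == COLOR_UNKNOWN || cur == COLOR_UNKNOWN)
        return 0;
    return (int)cur == ((int)prev + 1) % 3;
}
```

A debugging build would sample one pixel of each frame and print a
message whenever is_next_in_cycle() returns 0.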

67d2e229

31 Jul, 2013 3 commits

DRI2: erase the offscreen framebuffer part on first buffer allocation · e30ea496

Siarhei Siamashka authored Aug 01, 2013

Do this mostly for security reasons. We don't want any application
to see whatever was last rendered by the previous GLES application
by just peeking into a freshly allocated DRI2 buffer.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
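
A minimal sketch of the idea, assuming the offscreen part simply follows
the visible screen inside one contiguous framebuffer mapping. The helper
name and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero everything after the visible screen so that a freshly allocated
 * DRI2 buffer never exposes pixels left behind by a previous client. */
static void erase_offscreen(uint8_t *fb, size_t fb_size, size_t visible_size)
{
    if (fb_size > visible_size)
        memset(fb + visible_size, 0, fb_size - visible_size);
}
```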

e30ea496

DRI2: Don't waste overlay on a strange 1x1 window created by gnome-shell · 4a99dcef

Siarhei Siamashka authored Aug 01, 2013

We manage only a single hardware overlay. That's a precious shared
resource, which we want to use for zero-copy fullscreen compositing
in gnome-shell. The strange 1x1 window does not really need it.
Fixes https://github.com/ssvb/xf86-video-sunxifb/issues/2

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

4a99dcef

DRI2: Added new "SwapbuffersWait" option for xorg.conf · 789460c1

Siarhei Siamashka authored Jul 31, 2013



When enabled, it tries to avoid tearing in OpenGL ES applications.
It works on sunxi hardware if the hardware overlay (sunxi disp
layer) is used for a DRI2 window. The name of this option and
the description in the man page have been borrowed from the intel
and radeon drivers.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

789460c1

30 Jul, 2013 1 commit

DRI2: Implemented double buffering when using the hardware overlay · a60b0238

Siarhei Siamashka authored Jul 31, 2013

That's the right thing to do and fixes issues such as
    https://github.com/ssvb/xf86-video-sunxifb/issues/6



As a result, the framebuffer size may now need to be larger in
order to accommodate two DRI2 buffers in the offscreen part of
the framebuffer. The users of sunxi hardware are advised to
increase the value of the fb0_framebuffer_num variable in the fex
file to 3 for 32bpp mode and to 5 for 16bpp mode.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
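
The recommended values can be reproduced with simple arithmetic under one
assumption that is not stated in the commit message: each DRI2 buffer is
32bpp regardless of the desktop depth. The helper below is a hypothetical
illustration, not driver code.

```c
/* Number of screen-sized framebuffer units (at the desktop depth) needed
 * for one visible screen plus `dri2_buffers` offscreen DRI2 buffers.
 * Assumption: every DRI2 buffer covers the full screen at `dri2_bpp`. */
static unsigned required_framebuffer_num(unsigned fb_bpp,
                                         unsigned dri2_bpp,
                                         unsigned dri2_buffers)
{
    /* Offscreen space, rounded up to whole screens at fb_bpp. */
    unsigned offscreen = (dri2_buffers * dri2_bpp + fb_bpp - 1) / fb_bpp;

    return 1 + offscreen;
}
```

With two 32bpp DRI2 buffers this gives 1 + 2 = 3 for a 32bpp desktop and
1 + 4 = 5 for a 16bpp desktop, matching the advice above.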

a60b0238

29 Jul, 2013 2 commits

Explicitly include "gcstruct.h" for GCOps · 7b07f25b

Siarhei Siamashka authored Jul 29, 2013

Should fix https://github.com/ssvb/xf86-video-sunxifb/issues/14
and prevent FTBFS on some systems.
Reported-by: Fred Chien <cfsghost@gmail.com>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

7b07f25b

DRI2: Rely less on the information from DRI2BufferRec · a1e66a91

Siarhei Siamashka authored Jul 29, 2013



When moving further to our own DRI2 buffers bookkeeping, we can't
really trust the information from DRI2BufferRec anymore. So just add
a copy of all the missing bits of information to UMPBufferInfoRec
and use it instead.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

a1e66a91

28 Jul, 2013 1 commit

test: configurable delay between frames in gles-rgb-cycle-demo · 92b4c2cb

Siarhei Siamashka authored Jul 27, 2013



By allowing the delay between frames to be set with millisecond
precision on the command line, we can use it to test vsync.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

92b4c2cb

26 Jul, 2013 2 commits

DRI2: CPU copy fallback path does not drop half of the frames anymore · 0fd7d5de

Siarhei Siamashka authored Jul 27, 2013

The recent commit 9e0a8731 (its part that suppressed buffer reuse
in the Xorg DRI2 framework) introduced a regression. Half of the
frames stopped reaching the screen on the CPU copy fallback path
because the Mali blob now ended up rendering them to the "wrong"
buffer.

It just confirms that we need to completely move from the standard
DRI2 framework in the Xorg server to our own buffers bookkeeping
logic. This patch fixes the regression by introducing a single UMP
buffer per window, which is shared between back and front DRI2
buffers. We can do this because double buffering does not make much
sense on the fallback path at the moment (we can't set scanout from
this buffer and anyway have to copy this data elsewhere immediately
after we get it from Mali).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

0fd7d5de

DRI2: only pay attention to back buffers requests · 7994a0f3

Siarhei Siamashka authored Jul 26, 2013



Bail out earlier for the uninteresting types of DRI2 buffer
requests (by just returning a dummy null UMP buffer). This makes
the code a bit simpler on the common path.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

7994a0f3

24 Jul, 2013 3 commits

test: new gles-rgb-cycle-demo for testing the correctness of DRI2 · 1f89628c

Siarhei Siamashka authored Jul 25, 2013



The test program cycles through 3 colors (red, green, blue), so
it is easier to see if we get the color change pattern wrong.
Also the X11 window title is updated to indicate the current
color information. If we have any problems with window
decorations handling, they are likely to be exposed.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

1f89628c

DRI2: Refine the workaround for Mali r3p0 window resizing issue · d59ae8a7

Siarhei Siamashka authored Jul 25, 2013

Using the secure id 1 (framebuffer) to trick the Mali blob into
requesting DRI2 buffers again was not a very good idea. The problem
is that the blob still writes something there and corrupts the
framebuffer. So instead we try to assign secure id 2 to a dummy
4KiB UMP buffer allocated in memory and use it for the same purpose.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

d59ae8a7

DRI2: Workaround window resize bug in Mali r3p0 blob · 9e0a8731

Siarhei Siamashka authored Jul 24, 2013

The Mali blob is doing something like this:

 1. Request BackLeft DRI2 buffer (buffer A) and render to it
 2. Swap buffers
 3. Request BackLeft DRI2 buffer (buffer B)
 4. Check window size, and if it has changed - go back to step 1.
 5. Render to the current back buffer (either buffer A or B)
 6. Swap buffers
 7. Go back to step 4

A very serious show stopper problem is that the Mali blob ignores
DRI2-InvalidateBuffers events and just uses GetGeometry polling
to check whether the window size has changed. Unfortunately this
is racy and we may end up with a size mismatch between buffer A
and buffer B. This is particularly easy to trigger when the window
size changes exactly between steps 1 and 3.

See test/gles-yellow-blue-flip.c program which demonstrates this.
Qt5 applications also trigger this bug.

We work around the issue by explicitly tracking the requests for
BackLeft buffers and checking whether the sizes of these buffers
match at step 1 and step 3. However the real challenge here is
notifying the client application that these buffers are no good,
so that it can request them again. As DRI2-InvalidateBuffers
events are ignored, we are in a pretty difficult situation.
Fortunately I remembered a weird behaviour observed earlier:

    https://groups.google.com/forum/#!msg/linux-sunxi/qnxpVaqp1Ys/aVTq09DVih0J



Actually if we return UMP secure ID value 1 for the second DRI2
buffer request, the blob responds to this by spitting out the
following error message:

    [EGL-X11] [2274] DETECTED ONLY ONE FRAMEBUFFER - FORCING A RESIZE
    [EGL-X11] [2274] DRI2 UMP ID 0x3 retrieved
    [EGL-X11] [2274] DRI2 WINDOW UMP SECURE ID CHANGED (0x3 -> 0x3)

And then it proceeds by re-trying to request a pair of DRI2 buffers.
But that's exactly the behaviour we want! As a downside, some ugly
flashing can be seen on screen at the time when this workaround kicks
in, but then everything normalizes. And unfortunately, the race
condition is still not totally eliminated because the blob is
apparently getting DRI2 buffer sizes from the separate GetGeometry
requests instead of using the information provided by DRI2GetBuffers.
But now the problem is at least very hard to trigger.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
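
The size check between step 1 and step 3 can be sketched roughly as
follows. This is a simplified model that assumes BackLeft requests always
arrive in pairs (buffer A, then buffer B). The struct and function names
are hypothetical, and the real driver's bookkeeping is likely more
involved.

```c
#include <stdbool.h>

/* Per-window state for pairing up BackLeft buffer requests. */
struct backleft_tracker {
    int  width, height;  /* size seen at the first request of the pair */
    bool have_first;     /* true once buffer A has been handed out */
};

/* Call on every BackLeft request. Returns true when this is the second
 * request of a pair and its size differs from the first one, i.e. when
 * the client has to be tricked into requesting both buffers again. */
static bool backleft_size_mismatch(struct backleft_tracker *t,
                                   int width, int height)
{
    if (!t->have_first) {
        /* Step 1: remember the size of buffer A. */
        t->width = width;
        t->height = height;
        t->have_first = true;
        return false;
    }

    /* Step 3: the pair is complete, so the next request starts a new one. */
    t->have_first = false;
    return t->width != width || t->height != height;
}
```

When the function returns true, the driver would answer the request with
the special UMP secure ID described above instead of a real buffer.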

9e0a8731

20 Jul, 2013 1 commit

Fix XV border artifacts when using gstreamer 1.0 · 0a3dbfba

Harm Hanemaaijer authored Jul 20, 2013

Since version 1.0, gstreamer (when using xvimagesink) often
allocates a larger XV image for the video with padding on all
four sides and then calls XvPutImage() to render a part of this
image. With the current XV implementation this results in
artifacts on the borders of the image, with a green bar at the
bottom.

I am observing this when playing a 1280x720 video on a 1920x1080
screen at 32bpp; the size of the video window doesn't matter.

This problem seems to be an exaggerated form of the one described in
https://bugzilla.gnome.org/show_bug.cgi?id=685305.

The solution appears to be to use the source area dimensions as
requested in the XvPutImage() call, as opposed to the dimensions
of the originally allocated image, and to honour the offsets
(src_x, src_y) when setting the source region on the display
controller. With this relatively simple change, the problem seems
to go away, and gstreamer 1.0 (which is faster than gstreamer 0.10
due to a zero-copy strategy) provides an acceptable solution for
video playback.
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
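
A generic illustration of honouring the XvPutImage() source rectangle for
a planar YUV 4:2:0 image. It assumes non-negative src_x and src_y and
expresses the region as byte offsets into the planes. The display
controller interface used by the driver may take coordinates instead, so
the struct and function here are hypothetical.

```c
#include <stddef.h>

/* Part of a planar YUV 4:2:0 image that should actually be shown. */
struct yuv420_region {
    size_t y_offset;    /* byte offset of the first visible luma sample */
    size_t c_offset;    /* byte offset inside each chroma plane */
    int    width;       /* visible width in pixels (src_w) */
    int    height;      /* visible height in pixels (src_h) */
};

/* Use the rectangle requested in XvPutImage() rather than the size of
 * the allocated image, which may include padding on all four sides. */
static struct yuv420_region
source_region(int src_x, int src_y, int src_w, int src_h,
              int y_pitch, int c_pitch)
{
    struct yuv420_region r;

    r.y_offset = (size_t)src_y * (size_t)y_pitch + (size_t)src_x;
    /* Chroma is subsampled by two in both directions. */
    r.c_offset = (size_t)(src_y / 2) * (size_t)c_pitch + (size_t)(src_x / 2);
    r.width    = src_w;
    r.height   = src_h;
    return r;
}
```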

0a3dbfba

19 Jul, 2013 1 commit

Don't initialize XV if we can't reserve a scalable sunxi-disp layer · febafa2b

Siarhei Siamashka authored Jul 19, 2013



If an attempt to reserve a scalable sunxi-disp layer fails,
don't initialize XV at all. Otherwise any attempt to use the
XV overlay is not going to work correctly and just results in
the following dmesg spam:

[  728.280000] [DISP] not supported yuv channel format:18 in img_sw_para_to_reg

This may happen on Allwinner A13 if scaler mode is enabled in
.fex file (A13 only has one DEFE scaler). Allwinner A10 also
can have similar troubles in dual-head configuration if scaler
mode is enabled for one or both screens (A10 has two DEFE scalers).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

febafa2b

18 Jul, 2013 2 commits

Update man page and README to reflect diverse platform support · 15a30609

Harm Hanemaaijer authored Jul 19, 2013



Update the man page and bring it up-to-date, reflecting the fact
that the driver also supports non-sunxi platforms. Add description
of the "XVHWOverlay" option.

Also a small update to the README for similar reasons.
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>

15a30609

Add option to disable XV hardware overlay · d39ccbfe

Harm Hanemaaijer authored Jul 18, 2013



Add the "XVHWOverlay" boolean xorg.conf option to make it possible
to disable the XV acceleration feature using display layers on
sunxi hardware.
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>

d39ccbfe

17 Jul, 2013 2 commits

configure.ac: workaround libump/pthreads build issue · 85772fe3

Siarhei Siamashka authored Jul 17, 2013

On some systems the libump library is built without an explicit pthreads
dependency. As the issue has already been confirmed to affect both
sunxi and odroid users (and maybe the users of other mali400
based hardware), it is easier to just work around the problem locally.
Otherwise we would need to hunt down all the libump packagers and
beg for the fix.

More details are at https://github.com/ssvb/xf86-video-sunxifb/issues/11



Reported-by: Patrick Wood
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

85772fe3

Define ARRAY_SIZE macro if it is not provided by Xorg headers · de4c24e0

Siarhei Siamashka authored Jul 17, 2013

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

de4c24e0

16 Jul, 2013 1 commit

Added initial XV extension support for sunxi hardware · f99da9c5

Siarhei Siamashka authored Jul 16, 2013



Proper layer sharing between XV and DRI2 still needs to be implemented.
Additionally we still need NEON and/or G2D "textured overlay" as a
fallback solution for the composited desktop (NEON optimized XV is going
to be useful for a wide range of ARM devices). A bit of performance
tuning is also necessary.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

f99da9c5

11 Jul, 2013 1 commit

sunxi: disp ioctl wrappers for YUV overlay and color key support · dc478c9d

Siarhei Siamashka authored Jul 12, 2013



They are needed for a basic XV extension implementation.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

dc478c9d

12 Jun, 2013 4 commits

Add CPU optimization for PutImage · 06f5aec6

Harm Hanemaaijer authored Jun 07, 2013

Benchmark tests reveal that xorg's fb layer PutImage implementation
does not follow an optimal code path for requests without special
raster operations, which is due to the use of a slower general blit
function instead of the pixman library. This affects Xlib PutImage
requests and some ShmPutImage requests. In the case of ShmPutImage,
xorg directs ShmPutImage requests to PutImage only if the width of
the part of the image to be copied is equal to the full width of
the image, resulting in relatively poor performance. If the width
of the part of the image that is copied is smaller than the full
image, then xorg uses CopyArea which results in the use of the
already optimal pixman blit functions. The sub-optimal path is
commonly triggered by applications such as window managers and web
browsers.

To fix this unnecessary performance flaw, PutImage is replaced with
a version that uses pixman for the common case of GXcopy with all
plane mask bits set. This change is device-independent and only uses
pixman CPU blit functions that are already present in the xorg server.
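
The condition for taking the fast path can be sketched as below. This is
only an illustration of the "GXcopy with all plane mask bits set" test.
The function name is hypothetical, and SKETCH_GXCOPY is defined locally
with the same numeric value that GXcopy has in <X11/X.h> so that the
example compiles without the X headers.

```c
#include <stdbool.h>

enum { SKETCH_GXCOPY = 0x3 };   /* same value as GXcopy in <X11/X.h> */

/* True when a PutImage request needs no special raster operation and
 * writes every plane of a drawable with the given depth, so a plain
 * copy (for example a pixman blit) produces the correct result. */
static bool putimage_is_plain_copy(int alu, unsigned long planemask, int depth)
{
    unsigned long all_planes =
        depth >= 32 ? 0xFFFFFFFFul : ((1ul << depth) - 1ul);

    return alu == SKETCH_GXCOPY && (planemask & all_planes) == all_planes;
}
```

Requests that fail this test would keep using the generic fb code path.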

Using the low-level benchmark program benchx
(https://github.com/hglm/benchx.git), the following speed-ups were
measured (1920x1080x32bpp) on an Allwinner A10 device:

ShmPutImageFullWidth (5 x 5): Speed up 9%
ShmPutImageFullWidth (7 x 7): Slow down 5%
ShmPutImageFullWidth (22 x 22): Speed up 8%
ShmPutImageFullWidth (49 x 49): Speed up 19%
ShmPutImageFullWidth (73 x 73): Speed up 55%
ShmPutImageFullWidth (109 x 109): Speed up 50%
ShmPutImageFullWidth (163 x 163): Speed up 37%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 77%
ShmPutImageFullWidth (549 x 549): Speed up 92%
AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
AlignedShmPutImageFullWidth (549 x 549): Speed up 89%

At 16bpp the speed-up is even greater:

ShmPutImageFullWidth (5 x 5): Slow down 8%
ShmPutImageFullWidth (7 x 7): Slow down 8%
ShmPutImageFullWidth (10 x 10): Slow down 6%
ShmPutImageFullWidth (22 x 22): Speed up 9%
ShmPutImageFullWidth (33 x 33): Speed up 20%
ShmPutImageFullWidth (49 x 49): Speed up 27%
ShmPutImageFullWidth (73 x 73): Speed up 69%
ShmPutImageFullWidth (109 x 109): Speed up 74%
ShmPutImageFullWidth (163 x 163): Speed up 100%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 133%
ShmPutImageFullWidth (549 x 549): Speed up 123%
AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
AlignedShmPutImageFullWidth (549 x 549): Speed up 110%
Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>

06f5aec6

CPU: use VFP overlapped blit on VFP-capable hardware by default · 3ad74420

Siarhei Siamashka authored Jun 12, 2013



This should be useful for Raspberry Pi. When reading uncached source buffers,
the VFP optimized overlapped two-pass blit is roughly 2-3 times slower than
memcpy in cached memory. Which makes it reasonably competitive compared to
ShadowFB (considering that ShadowFB allocates an extra buffer, does extra
memory copies which take time and thrash L2 cache, etc.). It even provides
a slight performance advantage in a more or less realistic use case
(scrolling in xterm), which needs reads from the framebuffer:

==== Before (xf86-video-fbdev with ShadowFB) ====

$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt

real    1m50.245s
user    0m1.750s
sys     0m0.800s

==== After (xf86-video-sunxifb without ShadowFB) ====

$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt

real    1m27.709s
user    0m1.690s
sys     0m0.920s

We get decent results even when reading from the framebuffer. However
in many typical workloads (excluding scrolling and dragging windows)
the framebuffer is primarily used as write-only. In write-only use
cases ShadowFB is just pure overhead. So getting rid of it is a
very good idea as this improves overall graphics performance.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

3ad74420

Fix segfault on exit (introduced by the new backing store code) · 3676a495

Siarhei Siamashka authored Jun 12, 2013



A small typo in a function argument and C compiler happily accepting
void pointers instead of something else is a dangerous combo. Need to
be more careful.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

3676a495

Backing store heuristics for improving windows dragging performance · f5501ff1

Siarhei Siamashka authored Jun 12, 2013

This patch implements a heuristic which enables backing store for some
windows. When backing store is enabled for a window, the window gets a
backing pixmap (via automatic redirection provided by composite extension).
It acts a bit similar to ShadowFB, but for individual windows.

The advantage of backing store is that we can avoid "expose event -> redraw"
animated trail in the exposed area when dragging another window on top of it.
Dragging windows becomes much smoother and faster.

But the disadvantage of backing store is the same as for ShadowFB: a
loss of precious RAM, an extra buffer copy when somebody tries to update
window content, and potentially skipped frames in fast animation (they
just do not reach the screen). Also hardware accelerated scrolling does
not currently work for the windows with backing store enabled.

We try to make the best use of backing store by enabling backing store for
all the windows that are direct children of root, except the one which has
keyboard focus (either directly or via one of its children). In practice this
heuristic seems to provide nearly perfect results:
1) dragging windows is fast and smooth.
2) the top level window with the keyboard focus (typically the application
that a user is working with) is G2D accelerated and does not suffer from
any intermediate buffer copy overhead.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
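
The selection rule can be sketched on a toy window tree. The struct win
type and both helpers are hypothetical stand-ins for the X server's
window structures, not the code added by this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy window: just a parent link and the backing store decision. */
struct win {
    struct win *parent;
    bool        backing_store;
};

/* Walk up from `w` to the ancestor that is a direct child of `root`.
 * Returns NULL if `w` is NULL, is the root itself, or is not below it. */
static struct win *toplevel_of(struct win *root, struct win *w)
{
    while (w != NULL && w != root && w->parent != root)
        w = w->parent;
    return (w != NULL && w != root) ? w : NULL;
}

/* Enable backing store for every direct child of root except the one
 * that holds keyboard focus, directly or through one of its children. */
static void update_backing_store(struct win *root,
                                 struct win **toplevels, size_t count,
                                 struct win *focus)
{
    struct win *focused_top = toplevel_of(root, focus);

    for (size_t i = 0; i < count; i++)
        toplevels[i]->backing_store = (toplevels[i] != focused_top);
}
```

The function would be re-run whenever keyboard focus moves to a different
top-level window.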

f5501ff1

10 Jun, 2013 1 commit

DRI2: Move DebugMsg macro to a common header · 1bbeff2f

Siarhei Siamashka authored Jun 10, 2013

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

1bbeff2f

07 Jun, 2013 2 commits

Enable G2D acceleration by default on sun4i hardware · 3ea99e51

Siarhei Siamashka authored Jun 08, 2013

With the fallback to the CPU backend for unsupported blits and the
threshold for avoiding small blits, G2D should now always provide
the best overall performance.

Users of recent versions of xf86-video-sunxifb are expected to
also have a reasonably recent version of the linux-sunxi kernel,
which includes the following fix:
  https://github.com/linux-sunxi/linux-sunxi/commit/3d49345343a1535b



The users of old kernels are going to see screen corruption on
dragging windows and scrolling. They just should upgrade :)
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

3ea99e51

G2D: Fallback to NEON optimized CPU backend for unsupported blits · cc8e2c79

Siarhei Siamashka authored Jun 08, 2013



The G2D driver only supports framebuffer->framebuffer blits and
also can't be used to accelerate dragging windows to the right
(without hacking the kernel driver to do two-pass blit there).
This patch adds fallback to NEON optimized CPU backend instead
of resorting to use poorly performing fbBlt in these cases.

Note: we assume that ioctls normally do not fail (even if they
      do, the slow old style fallback to fbBlt is not the worst
      thing to worry about).
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
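
A rough sketch of the backend choice described in this commit. The struct,
the enum and the exact overlap test are assumptions made for illustration:
the sketch conservatively treats any overlapping copy whose destination
lies to the right of its source as unsupported by G2D.

```c
#include <stdbool.h>
#include <stdlib.h>

enum blit_backend { BACKEND_G2D, BACKEND_CPU };

/* Hypothetical description of one copy request. */
struct blit_request {
    bool src_in_fb, dst_in_fb;  /* does each side live in the framebuffer? */
    int  src_x, src_y;
    int  dst_x, dst_y;
    int  w, h;
};

static enum blit_backend choose_backend(const struct blit_request *b)
{
    /* G2D only handles framebuffer -> framebuffer blits. */
    if (!b->src_in_fb || !b->dst_in_fb)
        return BACKEND_CPU;

    /* Overlapping copy that moves pixels to the right, as happens when
     * a window is dragged to the right. */
    bool overlap = abs(b->dst_x - b->src_x) < b->w &&
                   abs(b->dst_y - b->src_y) < b->h;
    if (overlap && b->dst_x > b->src_x)
        return BACKEND_CPU;

    return BACKEND_G2D;
}
```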

cc8e2c79

05 Jun, 2013 1 commit

CPU: Added ARM VFP two-pass overlapped blit implementation · b93dab5c

Siarhei Siamashka authored Jun 05, 2013



Using VFP, we can load up to 128 bytes with a single VLDM instruction.
But before this patch, only a NEON implementation was available,
simply because it showed better results on Allwinner A10 compared
to VFP, and this DDX driver used to primarily target just sunxi hardware.

But it looks like it makes sense to also target other devices (at least
ODROID-X, which has the same Mali400 GPU and can use the same DRI2
integration for EGL and GLESv2 support). And on the other ARM devices,
VFP aligned reads generally work better than NEON. The benchmark
results are listed below:

            1280x720, 32bpp, testing "x11perf -scroll500"

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement disabled ==

NEON : 10000 trep @   3.7101 msec (   270.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   2.6678 msec (   375.0/sec): Scroll 500x500 pixels

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement enabled ==

NEON : 15000 trep @   2.2568 msec (   443.0/sec): Scroll 500x500 pixels
VFP  : 15000 trep @   2.3016 msec (   434.0/sec): Scroll 500x500 pixels

== Exynos 4412, Cortex-A9 ==

NEON : 10000 trep @   4.5125 msec (   222.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   2.7015 msec (   370.0/sec): Scroll 500x500 pixels

== TI DM3730, Cortex-A8 ==

NEON : 15000 trep @   2.2303 msec (   448.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   3.0670 msec (   326.0/sec): Scroll 500x500 pixels

== Allwinner A10, Cortex-A8 ==

NEON : 10000 trep @   2.5559 msec (   391.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   3.0580 msec (   327.0/sec): Scroll 500x500 pixels

== Raspberry Pi, BCM2708, ARM1176 ==

VFP  :  3000 trep @   8.7699 msec (   114.0/sec): Scroll 500x500 pixels

The benchmark numbers in this particular test setup roughly represent
memory copy bandwidth measured in MB/s (when doing overlapped blits
inside of a writecombine mapped framebuffer).

-----------------------------------------------------------------------

Note: the use of VFP two-pass overlapped copy instead of ShadowFB is
      still not enabled by default when running on Raspberry Pi
      because the performance results are not so great.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
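
The general shape of a two-pass overlapped copy can be sketched in
portable C. This is not the VFP assembly from the patch: both passes use
plain memcpy here, and the scratch buffer size and function name are
hypothetical. Each chunk is first read into a small scratch buffer (pass
one) and then written to its destination (pass two), with chunks processed
in an order that keeps overlapping regions intact.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SCRATCH_BYTES = 2048 };  /* hypothetical scratch buffer size */

/* Copy n bytes from src to dst, where the two ranges may overlap. */
static void two_pass_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    uint8_t scratch[SCRATCH_BYTES];

    if ((uintptr_t)dst <= (uintptr_t)src) {
        /* Destination is below the source: go front to back. */
        for (size_t off = 0; off < n; off += SCRATCH_BYTES) {
            size_t len = n - off < SCRATCH_BYTES ? n - off : SCRATCH_BYTES;
            memcpy(scratch, src + off, len);   /* pass 1: read */
            memcpy(dst + off, scratch, len);   /* pass 2: write */
        }
    } else {
        /* Destination is above the source: go back to front. */
        size_t off = n;
        while (off > 0) {
            size_t len = off < SCRATCH_BYTES ? off : SCRATCH_BYTES;
            off -= len;
            memcpy(scratch, src + off, len);   /* pass 1: read */
            memcpy(dst + off, scratch, len);   /* pass 2: write */
        }
    }
}
```

Because each chunk goes through the scratch buffer, the two memcpy calls
never see overlapping arguments even when src and dst overlap.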

b93dab5c

03 Jun, 2013 1 commit

CPU: add ARM memcpy assembly function · ae976fe9

Siarhei Siamashka authored Jun 03, 2013

This is my old ARM9E/ARM11 memcpy code from
    https://garage.maemo.org/projects/mplayer/
with some tuning for Raspberry Pi (aligned prefetch added).

Will be used by VFP optimized overlapped blt function.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

ae976fe9