CPU: use VFP overlapped blit on VFP-capable hardware by default
This should be useful for Raspberry Pi. When reading uncached source buffers,
the VFP-optimized overlapped two-pass blit is roughly 2-3 times slower than
memcpy in cached memory, which makes it reasonably competitive with
ShadowFB (considering that ShadowFB allocates an extra buffer, does extra
memory copies which take time and thrash the L2 cache, etc.). It even provides
a slight performance advantage in a more or less realistic use case
(scrolling in xterm), which needs reads from the framebuffer:
==== Before (xf86-video-fbdev with ShadowFB) ====
$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt
real 1m50.245s
user 0m1.750s
sys 0m0.800s
==== After (xf86-video-sunxifb without ShadowFB) ====
$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt
real 1m27.709s
user 0m1.690s
sys 0m0.920s
We get decent results even when reading from the framebuffer: the xterm
scrolling test above finishes in about 20% less wall-clock time. However,
in many typical workloads (excluding scrolling and dragging windows)
the framebuffer is used primarily as write-only. In write-only use
cases ShadowFB is pure overhead, so getting rid of it is a
very good idea, as this improves overall graphics performance.
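For readers unfamiliar with the term, the idea behind a two-pass overlapped
copy can be sketched in portable C roughly as follows. This is only an
illustration under assumptions, not the code from this commit: the name
twopass_memmove and the SCRATCH_SIZE value are hypothetical, and plain memcpy
stands in for the VFP-optimized fetch from uncached memory. Each chunk is
first read into a small cached scratch buffer and then written to the
destination, walking in the direction that is safe for overlapping regions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk size, chosen small enough to stay in the L1 cache. */
#define SCRATCH_SIZE 2048

/*
 * Illustrative two-pass copy for possibly overlapping regions.
 * Pass 1 fetches a chunk from the source into a cached scratch buffer,
 * pass 2 stores that chunk to the destination. A real implementation
 * would replace the first memcpy with an optimized uncached read.
 */
void twopass_memmove(uint8_t *dst, const uint8_t *src, size_t size)
{
    uint8_t scratch[SCRATCH_SIZE];

    if (dst == src || size == 0)
        return;

    if (dst < src) {
        /* Destination is below the source: copy from low to high. */
        while (size > 0) {
            size_t n = size < SCRATCH_SIZE ? size : SCRATCH_SIZE;
            memcpy(scratch, src, n);   /* pass 1: fetch */
            memcpy(dst, scratch, n);   /* pass 2: store */
            src += n;
            dst += n;
            size -= n;
        }
    } else {
        /* Destination is above the source: copy from high to low. */
        while (size > 0) {
            size_t n = size < SCRATCH_SIZE ? size : SCRATCH_SIZE;
            memcpy(scratch, src + size - n, n);   /* pass 1: fetch */
            memcpy(dst + size - n, scratch, n);   /* pass 2: store */
            size -= n;
        }
    }
}
```

Because every chunk is fully read before any of it is written, and chunks
are processed in the overlap-safe direction, the result matches what
memmove would produce for the same arguments.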
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>