Commit 102957f9 authored by Siarhei Siamashka

RPi: implement threshold for deciding between CPU and DMA blits



Benchmarking was done with x11perf, modified to support a wider range
of sizes for the scroll operation. Tests were run at the stock 700MHz
CPU clock frequency with a 1280x720 32bpp desktop.

$ DISPLAY=:0 ./x11perf -scroll5 -scroll10 -scroll15 -scroll20 \
                       -scroll30 -scroll50 -scroll100

== CPU ==

1000000 trep @   0.0289 msec ( 34600.0/sec): Scroll 5x5 pixels
1000000 trep @   0.0387 msec ( 25800.0/sec): Scroll 10x10 pixels
1000000 trep @   0.0459 msec ( 21800.0/sec): Scroll 15x15 pixels
 450000 trep @   0.0576 msec ( 17300.0/sec): Scroll 20x20 pixels
 350000 trep @   0.0817 msec ( 12200.0/sec): Scroll 30x30 pixels
 200000 trep @   0.1564 msec (  6390.0/sec): Scroll 50x50 pixels
 100000 trep @   0.4446 msec (  2250.0/sec): Scroll 100x100 pixels

== fb_copyarea (DMA) acceleration ==

1000000 trep @   0.0307 msec ( 32500.0/sec): Scroll 5x5 pixels
1000000 trep @   0.0353 msec ( 28300.0/sec): Scroll 10x10 pixels
1000000 trep @   0.0397 msec ( 25200.0/sec): Scroll 15x15 pixels
1000000 trep @   0.0464 msec ( 21600.0/sec): Scroll 20x20 pixels
 400000 trep @   0.0645 msec ( 15500.0/sec): Scroll 30x30 pixels
 250000 trep @   0.1177 msec (  8500.0/sec): Scroll 50x50 pixels
 100000 trep @   0.2783 msec (  3590.0/sec): Scroll 100x100 pixels

This shows that the ioctl overhead and the DMA setup cost are not very
significant on the Raspberry Pi. DMA already becomes slightly faster
than the CPU at a blit size of 10x10 pixels.

Even though there is no significant difference between CPU and DMA
for extremely small operation sizes (the other overhead is clearly
dominating), setting a threshold does no harm:

== mixed CPU / fb_copyarea (DMA) with 90 pixels threshold ==

1000000 trep @   0.0291 msec ( 34300.0/sec): Scroll 5x5 pixels
1000000 trep @   0.0345 msec ( 29000.0/sec): Scroll 10x10 pixels
1000000 trep @   0.0395 msec ( 25300.0/sec): Scroll 15x15 pixels
1000000 trep @   0.0466 msec ( 21400.0/sec): Scroll 20x20 pixels
 400000 trep @   0.0650 msec ( 15400.0/sec): Scroll 30x30 pixels
 250000 trep @   0.1181 msec (  8470.0/sec): Scroll 50x50 pixels
 100000 trep @   0.2784 msec (  3590.0/sec): Scroll 100x100 pixels

If other ARM devices also implement a Raspberry Pi compatible
accelerated fb_copyarea ioctl, then the threshold selection may
need to be reconsidered.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
parent a446b3bb
@@ -39,6 +39,9 @@
  */
 #define FBIOCOPYAREA _IOW('z', 0x21, struct fb_copyarea)
 
+/* Fallback to CPU when handling less than COPYAREA_BLT_SIZE_THRESHOLD pixels */
+#define COPYAREA_BLT_SIZE_THRESHOLD 90
+
 fb_copyarea_t *fb_copyarea_init(const char *device, void *xserver_fbmem)
 {
     fb_copyarea_t *ctx = calloc(sizeof(fb_copyarea_t), 1);
@@ -197,6 +200,9 @@ int fb_copyarea_blt(void *self,
         return FALLBACK_BLT();
     }
 
+    if (w * h < COPYAREA_BLT_SIZE_THRESHOLD)
+        return FALLBACK_BLT();
+
     copyarea.sx = src_x;
     copyarea.sy = src_y;
     copyarea.dx = dst_x;