    RPi: implement threshold for deciding between CPU and DMA blits · 102957f9
    Siarhei Siamashka authored
    
    
    Benchmarked with x11perf, modified to support a wider range of sizes
    for the scroll operation. Tests were run at the stock 700MHz CPU
    clock frequency with a 1280x720 32bpp desktop.
    
    $ DISPLAY=:0 ./x11perf -scroll5 -scroll10 -scroll15 -scroll20 \
                           -scroll30 -scroll50 -scroll100
    
    == CPU ==
    
    1000000 trep @   0.0289 msec ( 34600.0/sec): Scroll 5x5 pixels
    1000000 trep @   0.0387 msec ( 25800.0/sec): Scroll 10x10 pixels
    1000000 trep @   0.0459 msec ( 21800.0/sec): Scroll 15x15 pixels
     450000 trep @   0.0576 msec ( 17300.0/sec): Scroll 20x20 pixels
     350000 trep @   0.0817 msec ( 12200.0/sec): Scroll 30x30 pixels
     200000 trep @   0.1564 msec (  6390.0/sec): Scroll 50x50 pixels
     100000 trep @   0.4446 msec (  2250.0/sec): Scroll 100x100 pixels
    
    == fb_copyarea (DMA) acceleration ==
    
    1000000 trep @   0.0307 msec ( 32500.0/sec): Scroll 5x5 pixels
    1000000 trep @   0.0353 msec ( 28300.0/sec): Scroll 10x10 pixels
    1000000 trep @   0.0397 msec ( 25200.0/sec): Scroll 15x15 pixels
    1000000 trep @   0.0464 msec ( 21600.0/sec): Scroll 20x20 pixels
     400000 trep @   0.0645 msec ( 15500.0/sec): Scroll 30x30 pixels
     250000 trep @   0.1177 msec (  8500.0/sec): Scroll 50x50 pixels
     100000 trep @   0.2783 msec (  3590.0/sec): Scroll 100x100 pixels
    
    This shows that the ioctl overhead and the DMA setup cost are not very
    significant on the Raspberry Pi. DMA already becomes slightly faster
    than the CPU at a blit size of 10x10.
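
The crossover can be checked directly from the two result tables above. The following is a minimal sketch that only redoes the arithmetic on the published timings. The names `scroll_sample`, `dma_speedup` and `dma_crossover_size` are hypothetical helpers introduced for illustration and are not part of the driver.

```c
#include <stddef.h>

/* Per-operation times in msec, copied from the x11perf results above. */
struct scroll_sample {
    int    size;    /* side of the square scrolled area, in pixels */
    double cpu_ms;  /* CPU blit */
    double dma_ms;  /* fb_copyarea (DMA) blit */
};

static const struct scroll_sample samples[] = {
    {   5, 0.0289, 0.0307 },
    {  10, 0.0387, 0.0353 },
    {  15, 0.0459, 0.0397 },
    {  20, 0.0576, 0.0464 },
    {  30, 0.0817, 0.0645 },
    {  50, 0.1564, 0.1177 },
    { 100, 0.4446, 0.2783 },
};

#define NUM_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Speedup of DMA over CPU; values above 1.0 mean DMA is faster. */
static double dma_speedup(const struct scroll_sample *s)
{
    return s->cpu_ms / s->dma_ms;
}

/* Smallest measured size at which DMA beats the CPU, or -1 if none. */
static int dma_crossover_size(void)
{
    size_t i;

    for (i = 0; i < NUM_SAMPLES; i++)
        if (dma_speedup(&samples[i]) > 1.0)
            return samples[i].size;
    return -1;
}
```

With these numbers, `dma_crossover_size()` returns 10, and the speedup at 100x100 is roughly 1.6x (0.4446 / 0.2783).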
    
    Even though there is no significant difference between CPU and DMA
    for extremely small operations (the other overhead clearly
    dominates), setting a threshold does no harm:
    
    == mixed CPU / fb_copyarea (DMA) with a 90-pixel threshold ==
    
    1000000 trep @   0.0291 msec ( 34300.0/sec): Scroll 5x5 pixels
    1000000 trep @   0.0345 msec ( 29000.0/sec): Scroll 10x10 pixels
    1000000 trep @   0.0395 msec ( 25300.0/sec): Scroll 15x15 pixels
    1000000 trep @   0.0466 msec ( 21400.0/sec): Scroll 20x20 pixels
     400000 trep @   0.0650 msec ( 15400.0/sec): Scroll 30x30 pixels
     250000 trep @   0.1181 msec (  8470.0/sec): Scroll 50x50 pixels
     100000 trep @   0.2784 msec (  3590.0/sec): Scroll 100x100 pixels
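
The decision itself can be sketched as below. This is an illustration under stated assumptions, not the code from fb_copyarea.c: it assumes the 90-pixel threshold is compared against the blit area (width * height) and that a blit of exactly 90 pixels goes to DMA. The names `COPYAREA_DMA_THRESHOLD`, `blit_path` and `choose_blit_path` are hypothetical. The assumption is at least consistent with the table above, where the 5x5 result (25 pixels) matches the CPU-only run and 10x10 (100 pixels) and larger track the DMA run.

```c
/* Assumed meaning of the threshold: blit area in pixels (width * height). */
#define COPYAREA_DMA_THRESHOLD 90

enum blit_path { BLIT_CPU, BLIT_DMA };

/* Pick the blit path for a w x h copy (hypothetical helper, not the driver's API). */
static enum blit_path choose_blit_path(int w, int h)
{
    /* Degenerate sizes: nothing worth handing to the DMA engine. */
    if (w <= 0 || h <= 0)
        return BLIT_CPU;

    /* Widen before multiplying so large dimensions cannot overflow int. */
    if ((long long)w * h < COPYAREA_DMA_THRESHOLD)
        return BLIT_CPU;

    return BLIT_DMA;
}
```

Under this sketch a 5x5 scroll stays on the CPU, while a 10x10 scroll is passed to the accelerated fb_copyarea path.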
    
    If other ARM devices also implement a Raspberry Pi compatible
    accelerated fb_copyarea ioctl, then the threshold selection may
    be reconsidered.
    Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
fb_copyarea.c 7.28 KB