1. 08 Oct, 2013 1 commit
      RPi: implement threshold for deciding between CPU and DMA blits · 102957f9
      Siarhei Siamashka authored
      
      
      Benchmarked with x11perf, modified to support a wider range of sizes
      for the scroll operation. The tests were run at the stock 700MHz CPU
      clock frequency on a 1280x720 32bpp desktop.
      
      $ DISPLAY=:0 ./x11perf -scroll5 -scroll10 -scroll15 -scroll20 \
                             -scroll30 -scroll50 -scroll100
      
      == CPU ==
      
      1000000 trep @   0.0289 msec ( 34600.0/sec): Scroll 5x5 pixels
      1000000 trep @   0.0387 msec ( 25800.0/sec): Scroll 10x10 pixels
      1000000 trep @   0.0459 msec ( 21800.0/sec): Scroll 15x15 pixels
       450000 trep @   0.0576 msec ( 17300.0/sec): Scroll 20x20 pixels
       350000 trep @   0.0817 msec ( 12200.0/sec): Scroll 30x30 pixels
       200000 trep @   0.1564 msec (  6390.0/sec): Scroll 50x50 pixels
       100000 trep @   0.4446 msec (  2250.0/sec): Scroll 100x100 pixels
      
      == fb_copyarea (DMA) acceleration ==
      
      1000000 trep @   0.0307 msec ( 32500.0/sec): Scroll 5x5 pixels
      1000000 trep @   0.0353 msec ( 28300.0/sec): Scroll 10x10 pixels
      1000000 trep @   0.0397 msec ( 25200.0/sec): Scroll 15x15 pixels
      1000000 trep @   0.0464 msec ( 21600.0/sec): Scroll 20x20 pixels
       400000 trep @   0.0645 msec ( 15500.0/sec): Scroll 30x30 pixels
       250000 trep @   0.1177 msec (  8500.0/sec): Scroll 50x50 pixels
       100000 trep @   0.2783 msec (  3590.0/sec): Scroll 100x100 pixels
      
      This shows that the ioctl overhead and the DMA setup cost are not very
      significant on the Raspberry Pi. DMA already becomes slightly faster
      than the CPU at a blit size of 10x10.
      
      Even though there is no significant difference between CPU and DMA
      for extremely small operations (other overhead clearly dominates),
      setting a threshold does no harm:
      
      == mixed CPU / fb_copyarea (DMA) with a 90 pixel threshold ==
      
      1000000 trep @   0.0291 msec ( 34300.0/sec): Scroll 5x5 pixels
      1000000 trep @   0.0345 msec ( 29000.0/sec): Scroll 10x10 pixels
      1000000 trep @   0.0395 msec ( 25300.0/sec): Scroll 15x15 pixels
      1000000 trep @   0.0466 msec ( 21400.0/sec): Scroll 20x20 pixels
       400000 trep @   0.0650 msec ( 15400.0/sec): Scroll 30x30 pixels
       250000 trep @   0.1181 msec (  8470.0/sec): Scroll 50x50 pixels
       100000 trep @   0.2784 msec (  3590.0/sec): Scroll 100x100 pixels
      
      If other ARM devices also implement a Raspberry Pi compatible
      accelerated fb_copyarea ioctl, the threshold selection may need
      to be reconsidered.

      Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
  2. 03 Oct, 2013 1 commit