    Framebuffer readback assembly code for AArch64 · a5e698f1
    Siarhei Siamashka authored
    On a PINE64 board (ARM Cortex-A53), this provides ~180 MB/s
    speed for the framebuffer readback. For comparison, the normal
    memcpy operation in cached buffers runs at ~1200 MB/s.
    
    Such readback speed is not very fast and is only borderline
    usable. With a 1920x1080 32bpp screen resolution, this results in
    roughly 20 FPS scrolling.
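
    The ~20 FPS figure can be reproduced with a back-of-the-envelope
    calculation. The sketch below is only an illustration and is not
    part of the driver: the function names are hypothetical, 1 MB is
    assumed to mean 10^6 bytes, and the estimate assumes that every
    scrolled frame reads the whole screen back once and that the
    readback is the only bottleneck.

```python
# Rough estimate: how many full-screen readbacks per second does a given
# framebuffer readback bandwidth allow? (Illustrative helper names.)

def frame_bytes(width, height, bits_per_pixel):
    """Size of one full frame in bytes."""
    return width * height * bits_per_pixel // 8


def max_fps(readback_bytes_per_sec, width, height, bits_per_pixel):
    """Upper bound on full-screen scrolls per second, limited by readback."""
    return readback_bytes_per_sec / frame_bytes(width, height, bits_per_pixel)


# ~180 MB/s is the readback speed quoted for the PINE64 in this message.
# Assumption: 1 MB = 1,000,000 bytes.
READBACK_BYTES_PER_SEC = 180 * 1_000_000

fps = max_fps(READBACK_BYTES_PER_SEC, 1920, 1080, 32)
print(f"frame size: {frame_bytes(1920, 1080, 32)} bytes")
print(f"readback-limited rate: {fps:.1f} FPS")
```

    One 1920x1080 32bpp frame is about 8.3 MB, so ~180 MB/s allows a
    little over 20 full-screen readbacks per second.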
    
    Benchmark vs. shadow framebuffer (1920x1080 32bpp):
    
      == Shadow framebuffer in xf86-video-fbdev ==
    
         $ wget http://mirror.its.dal.ca/gutenberg/3/2/0/3/32032/32032.txt
         $ time DISPLAY=:0 xterm +j -maximized -e cat 32032.txt
    
         real 0m43.909s
         user 0m0.820s
         sys  0m0.300s
    
         $ DISPLAY=:0 x11perf -scroll500 -copywinwin500 -copypixwin500 -copywinpix500
    
         15000 trep @   1.8460 msec (   542.0/sec): Scroll 500x500 pixels
         12000 trep @   2.2629 msec (   442.0/sec): Copy 500x500 from window to window
         12000 trep @   2.2096 msec (   453.0/sec): Copy 500x500 from pixmap to window
         14000 trep @   1.9740 msec (   507.0/sec): Copy 500x500 from window to pixmap
    
      == Direct framebuffer readback in xf86-video-fbturbo ==
    
         $ wget http://mirror.its.dal.ca/gutenberg/3/2/0/3/32032/32032.txt
         $ time DISPLAY=:0 xterm +j -maximized -e cat 32032.txt
    
         real 2m5.741s
         user 0m0.390s
         sys  0m0.190s
    
         $ DISPLAY=:0 x11perf -scroll500 -copywinwin500 -copypixwin500 -copywinpix500
    
          4500 trep @   5.9201 msec (   169.0/sec): Scroll 500x500 pixels
          6000 trep @   5.9211 msec (   169.0/sec): Copy 500x500 from window to window
         18000 trep @   1.5341 msec (   652.0/sec): Copy 500x500 from pixmap to window
          4000 trep @   6.4657 msec (   155.0/sec): Copy 500x500 from window to pixmap
    
      ==
    
    Direct framebuffer access without the shadow framebuffer layer
    makes scrolling and moving windows roughly 3x slower, but copying
    from pixmaps to windows becomes about 1.4x faster. In the real
    world, copying from offscreen pixmaps to windows is much more
    important, because it is one of the performance bottlenecks for
    almost every X11 application, whereas reading back from the
    framebuffer is only used for a few very specialized tasks
    (scrolling/moving windows and making screenshots).
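
    The trade-off can be quantified from the x11perf rates listed in
    the benchmark. The sketch below only does arithmetic on those
    numbers. The dictionary keys are made-up labels, and the bandwidth
    estimate assumes that each 500x500 operation on the 32bpp screen
    touches 500 * 500 * 4 bytes exactly once.

```python
# x11perf rates (operations per second) copied from the benchmark in this
# commit message. The keys are illustrative labels, not x11perf names.
shadow = {"scroll": 542.0, "win->win": 442.0, "pix->win": 453.0, "win->pix": 507.0}
direct = {"scroll": 169.0, "win->win": 169.0, "pix->win": 652.0, "win->pix": 155.0}

# Ratio > 1 means direct framebuffer access is faster than the shadow
# framebuffer; ratio < 1 means it is slower.
ratios = {name: direct[name] / shadow[name] for name in shadow}

for name, ratio in ratios.items():
    print(f"{name:>8}: direct/shadow = {ratio:.2f}")

# Assumption: a 500x500 operation at 32bpp moves 1,000,000 bytes, so the
# operations-per-second figure can be read as an approximate MB/s value.
BYTES_PER_OP = 500 * 500 * 4
scroll_mb_per_sec = direct["scroll"] * BYTES_PER_OP / 1_000_000
print(f"implied readback rate while scrolling: {scroll_mb_per_sec:.0f} MB/s")
```

    Under these assumptions, the scrolling rate with direct access
    corresponds to about 169 MB/s, which is in line with the ~180 MB/s
    raw readback speed. Only the pixmap-to-window copy, which never
    reads from the framebuffer, ends up faster than with the shadow
    framebuffer.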
    
    On 32-bit ARM systems, the uncached framebuffer readback used to
    perform better. Even the Cortex-A53 running in 32-bit mode can
    do framebuffer readback at more than 300 MB/s:
        https://github.com/ssvb/tinymembench/wiki/PINE64-(Allwinner-A64)
    
    Scrolling/moving windows can still be accelerated by the kernel
    (via DMA, a dedicated 2D accelerator or some other method) and
    hooked into xf86-video-fbturbo.
    
    Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
fbdev.c 38.9 KB