1. 29 Jul, 2013 1 commit
  2. 12 Jun, 2013 1 commit
    • Add CPU optimization for PutImage · 06f5aec6
      Harm Hanemaaijer authored
      Benchmark tests reveal that xorg's fb layer PutImage implementation
      does not follow an optimal code path for requests without special
      raster operations, because it uses a slower general blit function
      instead of the pixman library. This affects Xlib PutImage requests
      and some ShmPutImage requests. In the case of ShmPutImage, xorg
      directs requests to PutImage only if the width of the part of the
      image to be copied is equal to the full width of the image,
      resulting in relatively poor performance. If the part of the image
      that is copied is narrower than the full image, xorg uses CopyArea,
      which already goes through the optimal pixman blit functions. The
      sub-optimal path is commonly triggered by applications such as
      window managers and web browsers.
      
      To fix this unnecessary performance flaw, PutImage is replaced with
      a version that uses pixman for the common case of GXcopy with all
      plane mask bits set. This change is device-independent and only uses
      pixman CPU blit functions that are already present in the xorg server.
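
      The fast-path condition described above (GXcopy with every plane
      mask bit set) can be illustrated with a minimal, self-contained
      sketch. The names sketch_full_mask and sketch_can_use_pixman are
      hypothetical and do not come from the commit; only the numeric value
      of GXcopy (0x3, as defined in X11's X.h) is taken from the X headers.
      The real change works on the server's GC and drawable structures,
      which are not reproduced here.

```c
#include <stdint.h>

/* Value of GXcopy in X11's X.h, redefined locally so the sketch stands alone. */
#define SKETCH_GXCOPY 0x3

/* Hypothetical helper: mask with the low `depth` bits set. */
static uint32_t sketch_full_mask(int depth)
{
    if (depth <= 0)
        return 0;
    return depth >= 32 ? 0xFFFFFFFFu : ((1u << depth) - 1u);
}

/*
 * Hypothetical check: returns 1 when a request uses a plain copy raster
 * operation and a plane mask covering every bit of the drawable depth,
 * i.e. the case that can be handed to a straight CPU blit.
 */
static int sketch_can_use_pixman(int alu, uint32_t planemask, int depth)
{
    uint32_t full = sketch_full_mask(depth);
    return alu == SKETCH_GXCOPY && (planemask & full) == full;
}
```

      Any other raster operation, or a plane mask with cleared bits inside
      the drawable depth, would fail this check and stay on the general
      blit path.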
      
      Using the low-level benchmark program benchx
      (https://github.com/hglm/benchx.git), the following speed-ups were
      measured (1920x1080x32bpp) on an Allwinner A10 device:
      
      ShmPutImageFullWidth (5 x 5): Speed up 9%
      ShmPutImageFullWidth (7 x 7): Slow down 5%
      ShmPutImageFullWidth (22 x 22): Speed up 8%
      ShmPutImageFullWidth (49 x 49): Speed up 19%
      ShmPutImageFullWidth (73 x 73): Speed up 55%
      ShmPutImageFullWidth (109 x 109): Speed up 50%
      ShmPutImageFullWidth (163 x 163): Speed up 37%
      ShmPutImageFullWidth (244 x 244): Speed up 111%
      ShmPutImageFullWidth (366 x 366): Speed up 77%
      ShmPutImageFullWidth (549 x 549): Speed up 92%
      AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
      AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
      AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
      AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
      AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
      AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
      AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
      AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
      AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
      AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
      AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
      AlignedShmPutImageFullWidth (549 x 549): Speed up 89%
      
      At 16bpp the speed-up is even greater:
      
      ShmPutImageFullWidth (5 x 5): Slow down 8%
      ShmPutImageFullWidth (7 x 7): Slow down 8%
      ShmPutImageFullWidth (10 x 10): Slow down 6%
      ShmPutImageFullWidth (22 x 22): Speed up 9%
      ShmPutImageFullWidth (33 x 33): Speed up 20%
      ShmPutImageFullWidth (49 x 49): Speed up 27%
      ShmPutImageFullWidth (73 x 73): Speed up 69%
      ShmPutImageFullWidth (109 x 109): Speed up 74%
      ShmPutImageFullWidth (163 x 163): Speed up 100%
      ShmPutImageFullWidth (244 x 244): Speed up 111%
      ShmPutImageFullWidth (366 x 366): Speed up 133%
      ShmPutImageFullWidth (549 x 549): Speed up 123%
      AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
      AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
      AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
      AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
      AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
      AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
      AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
      AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
      AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
      AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
      AlignedShmPutImageFullWidth (549 x 549): Speed up 110%
      Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
  3. 26 Mar, 2013 1 commit
    • G2D: Now sunxi_x_g2d.c code does not require sunxi disp anymore · 1cd5f084
      Siarhei Siamashka authored
      
      
      The sunxi_x_g2d.c file contains the midlayer code for hooking the
      G2D optimized blit into xserver. But in fact it does not strictly
      need to depend on anything sunxi-specific.
      
      So now we introduce a simple "blt2d_i" interface struct which
      specifically provides a pointer to the accelerated blit function,
      and use this interface struct instead of the whole "sunxi_disp_t".
      This makes it easy to reuse the same code for other non-G2D or even
      non-sunxi blit implementations in the future.
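
      As an illustration of the idea, here is a minimal sketch of what such
      an interface struct could look like, with a plain CPU backend plugged
      in. Only the name "blt2d_i" comes from the commit message; the field
      names, the function signature, and the cpu_blt32 backend are
      assumptions made for this example and may differ from the real
      header. Strides are counted in 32-bit pixels.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: backend private data plus one blit entry point. */
typedef struct blt2d_i {
    void *self;
    int (*overlapped_blt)(void *self, uint32_t *src_bits, uint32_t *dst_bits,
                          int src_stride, int dst_stride,
                          int src_x, int src_y, int dst_x, int dst_y,
                          int w, int h);
} blt2d_i;

/*
 * Hypothetical CPU backend for 32bpp buffers. When source and destination
 * share a buffer and the copy moves downwards, rows are processed bottom-up
 * so that no source row is overwritten before it has been read.
 * Returns 1 on success, 0 if nothing was copied.
 */
static int cpu_blt32(void *self, uint32_t *src_bits, uint32_t *dst_bits,
                     int src_stride, int dst_stride,
                     int src_x, int src_y, int dst_x, int dst_y,
                     int w, int h)
{
    int bottom_up = (src_bits == dst_bits && dst_y > src_y);
    (void)self;
    if (w <= 0 || h <= 0)
        return 0;
    for (int i = 0; i < h; i++) {
        int row = bottom_up ? h - 1 - i : i;
        /* memmove handles horizontal overlap within a single row. */
        memmove(dst_bits + (size_t)(dst_y + row) * dst_stride + dst_x,
                src_bits + (size_t)(src_y + row) * src_stride + src_x,
                (size_t)w * sizeof(uint32_t));
    }
    return 1;
}
```

      The midlayer would then only hold a "blt2d_i" pointer and call
      overlapped_blt through it, without knowing whether G2D, the CPU, or
      some other engine does the copy.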
      Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
  4. 22 Mar, 2013 1 commit
  5. 21 Mar, 2013 1 commit
    • G2D: accelerate CopyArea between different pixmaps in framebuffer · cc1b1410
      Siarhei Siamashka authored
      
      
      Now the source and destination pixmaps don't need to be the same
      for using G2D acceleration (as long as both of them are allocated
      in the framebuffer). This allows using G2D to copy pixels from DRI2
      buffers to the framebuffer on the fallback path (when the window
      of an OpenGL ES application is partially overlapped by some other
      windows). However, this only works when the composite extension is
      disabled, for example by adding the following to xorg.conf:
      
          Section "Extensions"
              Option "Composite" "Disable"
          EndSection
      
      If the composite extension is enabled, windows have backing pixmaps,
      and we have a longer chain of copies:
      
         DRI2 buffer -> backing pixmap -> framebuffer
      
      Because the backing pixmap is not allocated in physically contiguous
      memory, it can't be copied using G2D yet.
      Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
  6. 18 Mar, 2013 1 commit
    • G2D: Hardware acceleration for XCopyArea (initially 32bpp only) · ecfeb4aa
      Siarhei Siamashka authored
      
      
      Wrap the CreateGC function to add a hook for the CopyArea operation,
      which can be accelerated using G2D for buffers inside the visible
      part of the framebuffer. In the future we may also try to ensure
      that DRI2 buffers are copied using G2D instead of the CPU when we
      hit the fallback path and can't avoid this copy.
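
      The wrapping pattern can be sketched with mock types. Everything
      below (mock_screen, mock_gc, mock_gc_ops, the in-framebuffer flags,
      and the returned path codes) is hypothetical and stands in for the
      real xserver structures, which are not reproduced here. The sketch
      only shows the mechanism: save the original CreateGC, let it run,
      then swap in an ops table whose CopyArea tries the accelerated path
      and otherwise falls back to the saved original.

```c
#include <stddef.h>

enum { PATH_CPU = 0, PATH_ACCEL = 1 };   /* which path handled the copy */

typedef struct mock_gc mock_gc;

typedef struct mock_gc_ops {
    int (*CopyArea)(mock_gc *gc, int src_in_fb, int dst_in_fb);
} mock_gc_ops;

struct mock_gc {
    const mock_gc_ops *ops;       /* ops table currently in use */
    const mock_gc_ops *orig_ops;  /* table installed by the original CreateGC */
};

typedef struct mock_screen {
    int (*CreateGC)(struct mock_screen *scr, mock_gc *gc);
    int (*saved_CreateGC)(struct mock_screen *scr, mock_gc *gc);
    mock_gc_ops wrapped_ops;      /* copy of the original ops with CopyArea hooked */
} mock_screen;

/* Stand-in for the software CopyArea. */
static int sw_copy_area(mock_gc *gc, int src_in_fb, int dst_in_fb)
{
    (void)gc; (void)src_in_fb; (void)dst_in_fb;
    return PATH_CPU;
}

static const mock_gc_ops default_ops = { sw_copy_area };

/* Stand-in for the screen's original CreateGC. */
static int default_create_gc(mock_screen *scr, mock_gc *gc)
{
    (void)scr;
    gc->ops = &default_ops;
    return 1;
}

/* Hooked CopyArea: accelerate only when both buffers are in the framebuffer. */
static int accel_copy_area(mock_gc *gc, int src_in_fb, int dst_in_fb)
{
    if (src_in_fb && dst_in_fb)
        return PATH_ACCEL;
    return gc->orig_ops->CopyArea(gc, src_in_fb, dst_in_fb);
}

/* Wrapper: run the saved CreateGC, then replace the GC's ops table. */
static int wrapped_create_gc(mock_screen *scr, mock_gc *gc)
{
    int ok = scr->saved_CreateGC(scr, gc);
    if (ok) {
        gc->orig_ops = gc->ops;
        scr->wrapped_ops = *gc->ops;
        scr->wrapped_ops.CopyArea = accel_copy_area;
        gc->ops = &scr->wrapped_ops;
    }
    return ok;
}

/* Install the wrapper on a screen. */
static void install_wrapper(mock_screen *scr)
{
    scr->saved_CreateGC = scr->CreateGC;
    scr->CreateGC = wrapped_create_gc;
}
```

      After install_wrapper() runs, every GC created through the screen
      picks up the hooked CopyArea, while all other operations keep
      whatever the original CreateGC installed.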
      
      Benchmark using "x11perf -scroll500 -copywinwin500":
      
      === ShadowFB (software rendering) ===
      
         3000 reps @   2.0308 msec (   492.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9741 msec (   507.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9826 msec (   504.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9830 msec (   504.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9965 msec (   501.0/sec): Scroll 500x500 pixels
        15000 trep @   1.9934 msec (   502.0/sec): Scroll 500x500 pixels
      
         1600 reps @   3.3054 msec (   303.0/sec): Copy 500x500 from window to window
         1600 reps @   3.3179 msec (   301.0/sec): Copy 500x500 from window to window
         1600 reps @   3.2263 msec (   310.0/sec): Copy 500x500 from window to window
         1600 reps @   3.2491 msec (   308.0/sec): Copy 500x500 from window to window
         1600 reps @   3.2357 msec (   309.0/sec): Copy 500x500 from window to window
         8000 trep @   3.2669 msec (   306.0/sec): Copy 500x500 from window to window
      
      === G2D (hardware acceleration) ===
      
         3000 reps @   2.1949 msec (   456.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1929 msec (   456.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1923 msec (   456.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1889 msec (   457.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1941 msec (   456.0/sec): Scroll 500x500 pixels
        15000 trep @   2.1926 msec (   456.0/sec): Scroll 500x500 pixels
      
         2800 reps @   1.8114 msec (   552.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8103 msec (   552.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8160 msec (   551.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8099 msec (   553.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8126 msec (   552.0/sec): Copy 500x500 from window to window
        14000 trep @   1.8120 msec (   552.0/sec): Copy 500x500 from window to window
      
      CPU usage remains low when running this test with G2D acceleration enabled.
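
      For reading the numbers above, the relative change between the two
      configurations can be computed from the aggregate "trep" rates
      (ShadowFB: 502.0/sec scroll and 306.0/sec copy; G2D: 456.0/sec scroll
      and 552.0/sec copy). The helper below is a small illustrative sketch;
      the name percent_change is made up for this example.

```c
/* Relative change of new_rate versus base_rate, in percent. */
static double percent_change(double base_rate, double new_rate)
{
    return (new_rate - base_rate) / base_rate * 100.0;
}
```

      With these inputs, percent_change(306.0, 552.0) comes out at roughly
      +80% for the window-to-window copy, and percent_change(502.0, 456.0)
      at roughly -9% for the scroll test.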
      Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
  7. 14 Mar, 2013 1 commit