1. 05 Jun, 2013 1 commit
    • Siarhei Siamashka's avatar
      CPU: Added ARM VFP two-pass overlapped blit implementation · b93dab5c
      Siarhei Siamashka authored
      
      
      Using VFP, we can load up to 128 bytes with a single VLDM instruction.
      But before this patch, only NEON implementation was available. Just
      because it showed better results on Allwinner A10 compared to VFP.
      And this DDX driver used to primarily target just sunxi hardware.
      
      But looks like it makes sense to also target other devices (at least
      ODROID-X, which has the same Mali400 GPU and can use the same DRI2
      integration for EGL and GLESv2 support). And on the other ARM devices,
      VFP aligned reads generally work better than NEON. The benchmark
      results are listed below:
      
                  1280x720, 32bpp, testing "x11perf -scroll500"
      
      == Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement disabled ==
      
      NEON : 10000 trep @   3.7101 msec (   270.0/sec): Scroll 500x500 pixels
      VFP  : 10000 trep @   2.6678 msec (   375.0/sec): Scroll 500x500 pixels
      
      == Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement enabled ==
      
      NEON : 15000 trep @   2.2568 msec (   443.0/sec): Scroll 500x500 pixels
      VFP  : 15000 trep @   2.3016 msec (   434.0/sec): Scroll 500x500 pixels
      
      == Exynos 4412, Cortex-A9 ==
      
      NEON : 10000 trep @   4.5125 msec (   222.0/sec): Scroll 500x500 pixels
      VFP  : 10000 trep @   2.7015 msec (   370.0/sec): Scroll 500x500 pixels
      
      == TI DM3730, Cortex-A8 ==
      
      NEON : 15000 trep @   2.2303 msec (   448.0/sec): Scroll 500x500 pixels
      VFP  : 10000 trep @   3.0670 msec (   326.0/sec): Scroll 500x500 pixels
      
      == Allwinner A10, Cortex-A8 ==
      
      NEON : 10000 trep @   2.5559 msec (   391.0/sec): Scroll 500x500 pixels
      VFP  : 10000 trep @   3.0580 msec (   327.0/sec): Scroll 500x500 pixels
      
      == Raspberry Pi, BCM2708, ARM1176 ==
      
      VFP  :  3000 trep @   8.7699 msec (   114.0/sec): Scroll 500x500 pixels
      
      The benchmark numbers in this particular test setup roughly represent
      memory copy bandwidth measured in MB/s (when doing overlapped blits
      inside of a writecombine mapped framebuffer).
      
      -----------------------------------------------------------------------
      
      Note: the use of VFP two-pass overlapped copy instead of ShadowFB is
            still not enabled by default when running on Raspberry Pi
            because the performance results are not so great.
      Signed-off-by: default avatarSiarhei Siamashka <siarhei.siamashka@gmail.com>
      b93dab5c
  2. 03 Jun, 2013 1 commit
  3. 02 Jun, 2013 2 commits
    • Harm Hanemaaijer's avatar
      G2D: Implement "double speed" 16bpp blits · 98f1b119
      Harm Hanemaaijer authored
      
      
      When source and destination coordinates allow it, a 16bpp screen-to-
      screen blit is divided into up to three segments: two optional one
      pixel wide edges and an  aligned middle segment that is copied in
      32-bit mode.
      
      This patch adds the low-level function sunxi_g2d_blit_r5g6b5_in_three
      and adds logic to the general blit function to use it for 16bpp to
      16bpp blits if the source and destination coordinates allow it. This
      patch automatically enables the use of this optimization in the
      sunxi G2D X driver. The area threshold for using G2D for
      16bpp-to-16bpp blits was introduced in a previous patch.
      
      Benchmarks:
      
      1920x1080x16bpp@60Hz, ShadowFB disabled:
      
      x11perf -scroll100
      Before:
       350000 trep @   0.0881 msec ( 11400.0/sec): Scroll 100x100 pixels
      After:
       350000 trep @   0.0819 msec ( 12200.0/sec): Scroll 100x100 pixels
      
      x11perf -scroll500
      Before:
        20000 trep @   1.3547 msec (   738.0/sec): Scroll 500x500 pixels
      After:
        35000 trep @   0.8005 msec (  1250.0/sec): Scroll 500x500 pixels
      Signed-off-by: default avatarHarm Hanemaaijer <fgenfb@yahoo.com>
      98f1b119
    • Harm Hanemaaijer's avatar
      G2D: Implement an area threshold for using G2D blits. · b3c2fd2c
      Harm Hanemaaijer authored
      
      
      Due to the overhead of G2D for small screen-to-screen blits, CPU blits
      are faster for small areas. This patch introduces are threshold below
      which CPU blits are triggered. It is currently set to 1000 for 32bpp
      and 2500 for 16bpp based on test results.
      
      Some benchmarks:
      
      1920x1080x16bppx60Hz, ShadowFB disabled:
      
      x11perf -scroll10
      
      Before:
      1500000 trep @   0.0239 msec ( 41800.0/sec): Scroll 10x10 pixels
      After:
      2500000 trep @   0.0110 msec ( 90900.0/sec): Scroll 10x10 pixels
      
      x11perf -copywinwin10
      
      Before:
      1200000 trep @   0.0247 msec ( 40500.0/sec): Copy 10x10 from window to window
      After:
      1800000 trep @   0.0146 msec ( 68600.0/sec): Copy 10x10 from window to window
      Signed-off-by: default avatarHarm Hanemaaijer <fgenfb@yahoo.com>
      b3c2fd2c
  4. 30 Mar, 2013 1 commit
  5. 28 Mar, 2013 1 commit
  6. 26 Mar, 2013 3 commits
  7. 22 Mar, 2013 2 commits
  8. 21 Mar, 2013 1 commit
    • Siarhei Siamashka's avatar
      G2D: accelerate CopyArea between different pixmaps in framebuffer · cc1b1410
      Siarhei Siamashka authored
      
      
      Now source and destination pixmaps don't need to be the same for
      using G2D acceleration (as long as both of them are allocated in
      the framebuffer). This allows using G2D to copy pixels from DRI2
      buffers to the framebuffer on the fallback path (when the window
      of an OpenGL ES application is partially overlapped by some other
      windows). Though it only works when composite extension is
      disabled, for example by adding the following to xorg.conf:
      
          Section "Extensions"
              Option "Composite" "Disable"
          EndSection
      
      If composite extension is enabled, windows have backing pixmaps, and
      we have a longer chain of copies:
      
         DRI2 buffer -> backing pixmap -> framebuffer
      
      Because backing pixmap is not allocated in a physically contiguous
      memory, it can't be copied using G2D yet.
      Signed-off-by: default avatarSiarhei Siamashka <siarhei.siamashka@gmail.com>
      cc1b1410
  9. 19 Mar, 2013 1 commit
  10. 18 Mar, 2013 1 commit
    • Siarhei Siamashka's avatar
      G2D: Hardware acceleration for XCopyArea (initially 32bpp only) · ecfeb4aa
      Siarhei Siamashka authored
      
      
      Wrap CreateGC function to add a hook for CopyArea operation, which
      can be accelerated using G2D for the buffers inside of the visible
      part of the framebuffer. In the future we may try to also ensure
      that DRI2 buffers are copied using G2D instead of CPU in the case
      if we hit the fallback path and can't avoid this copy.
      
      Benchmark using "x11perf -scroll500 -copywinwin500":
      
      === ShadowFB (software rendering) ===
      
         3000 reps @   2.0308 msec (   492.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9741 msec (   507.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9826 msec (   504.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9830 msec (   504.0/sec): Scroll 500x500 pixels
         3000 reps @   1.9965 msec (   501.0/sec): Scroll 500x500 pixels
        15000 trep @   1.9934 msec (   502.0/sec): Scroll 500x500 pixels
      
         1600 reps @   3.3054 msec (   303.0/sec): Copy 500x500 from window to window
         1600 reps @   3.3179 msec (   301.0/sec): Copy 500x500 from window to window
         1600 reps @   3.2263 msec (   310.0/sec): Copy 500x500 from window to window
         1600 reps @   3.2491 msec (   308.0/sec): Copy 500x500 from window to window
         1600 reps @   3.2357 msec (   309.0/sec): Copy 500x500 from window to window
         8000 trep @   3.2669 msec (   306.0/sec): Copy 500x500 from window to window
      
      === G2D (hardware acceleration) ===
      
         3000 reps @   2.1949 msec (   456.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1929 msec (   456.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1923 msec (   456.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1889 msec (   457.0/sec): Scroll 500x500 pixels
         3000 reps @   2.1941 msec (   456.0/sec): Scroll 500x500 pixels
        15000 trep @   2.1926 msec (   456.0/sec): Scroll 500x500 pixels
      
         2800 reps @   1.8114 msec (   552.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8103 msec (   552.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8160 msec (   551.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8099 msec (   553.0/sec): Copy 500x500 from window to window
         2800 reps @   1.8126 msec (   552.0/sec): Copy 500x500 from window to window
        14000 trep @   1.8120 msec (   552.0/sec): Copy 500x500 from window to window
      
      CPU usage remains low when running this test with G2D acceleration enabled.
      Signed-off-by: default avatarSiarhei Siamashka <siarhei.siamashka@gmail.com>
      ecfeb4aa
  11. 17 Mar, 2013 2 commits
  12. 16 Mar, 2013 1 commit
  13. 14 Mar, 2013 2 commits
  14. 13 Mar, 2013 1 commit
    • Siarhei Siamashka's avatar
      Added ioctl wrappers for simple G2D fill and blit operations · df534b22
      Siarhei Siamashka authored
      
      
      The existing kernel driver from Allwinner for G2D accelerator
      is quite bad because ioctls are synchronous and blocking the
      caller thread, compromise security (basically it is a backdoor
      for copying data in memory between any arbitrary physical
      addresses) and have high overhead (each individual fill or
      blit operation needs an ioctl). But we need to start with
      something, so use this stuff as a placeholder.
      
      The g2d_driver.h header file is taken from linux-sunxi-3.4
      Signed-off-by: default avatarSiarhei Siamashka <siarhei.siamashka@gmail.com>
      df534b22
  15. 12 Mar, 2013 1 commit
  16. 11 Mar, 2013 1 commit
  17. 23 Feb, 2013 1 commit
  18. 18 Feb, 2013 2 commits
    • Siarhei Siamashka's avatar
      Add support for hardware ARGB cursors up to 32x32 size · 06ae9b68
      Siarhei Siamashka authored
      
      
      Actually they are converted to 32x32 with 256 color palette. In the
      case if we have more than 256 unique colors, the color components
      of the pixels are reduced from 8-bit to 7-bit, then to 6-bit if
      necessary and so on (until we reduce the number of unique colors
      so that they can fit the palette). In the worst case we may
      theoretically end up with just 2 bits per A, R, G and B channels,
      but in practice 7 or 6 bits seem to be enough.
      Signed-off-by: default avatarSiarhei Siamashka <siarhei.siamashka@gmail.com>
      06ae9b68
    • Siarhei Siamashka's avatar
      Disable hardware layers when software ARGB cursor is used · 0ab0a9e8
      Siarhei Siamashka authored
      
      
      The modern desktops may use ARGB cursors. As the current
      sunxi display controller support code can't handle this
      type of cursor yet, the X server fallbacks to a software
      cursor which is not visible under layers and ruining user
      experience.
      
      This patch adds empty implementations for "UseHWCursorARGB"
      and "LoadCursorARGB" functions which just return error for
      now (so that the X server still fallbacks to software cursor).
      However we also introduce callback functions responsible for
      notifying the DRI2 code about enabling/disabling the use of
      hardware cursor. So that now hardware overlays are disabled
      when switching to software cursor and re-enabled again when
      switching back to hardware cursor.
      Signed-off-by: default avatarSiarhei Siamashka <siarhei.siamashka@gmail.com>
      0ab0a9e8
  19. 12 Feb, 2013 1 commit
  20. 24 Jan, 2013 1 commit
  21. 20 Jan, 2013 4 commits
    • Siarhei Siamashka's avatar
      Support for sunxi hardware cursor · 64d51c0f
      Siarhei Siamashka authored
      Hardware cursor is necessary because it is also visible on
      top of sunxi disp layers, while software cursor is not.
      
      FIXME: there is one minor problem with negative cursor
      positions. The hardware does not support them, so such
      positions are just set to 0 for now. In the future this
      can be solved better by changing the cursor picture and
      showing only the parts which are visible on screen.
      64d51c0f
    • Siarhei Siamashka's avatar
      Added supplementary wrapper functions for sunxi display controller ioctls · bf7c7a6c
      Siarhei Siamashka authored
      Note: the header file "sunxi_disp_ioctl.h" is GPL licensed. So until
      it is gets a MIT/X11 replacement, the DDX driver is GPL licensed as
      a whole. The individual source files still have their own license.
      Also in order to avoid any possible confusion, the MIT/X11 license
      header from COPYING has been added to "fbdev.c" and "fbdev_priv.h".
      bf7c7a6c
    • Siarhei Siamashka's avatar
      Move driver private data into a separate header file · b0813ddf
      Siarhei Siamashka authored
      It is going to be included by multiple different source files.
      b0813ddf
    • Siarhei Siamashka's avatar
      Rename "fbdev" -> "sunxifb" and update man page · 703aea28
      Siarhei Siamashka authored
      As there is no way for the hardware specific bits to be accepted
      in xf86-video-fbdev, we need a new driver with its own name.
      703aea28
  22. 25 Sep, 2012 1 commit
  23. 05 Jun, 2012 3 commits
  24. 17 Dec, 2010 1 commit
  25. 10 Nov, 2010 3 commits
  26. 08 Feb, 2010 1 commit