Commit 06f5aec6 authored by Harm Hanemaaijer, committed by Siarhei Siamashka

Add CPU optimization for PutImage

Benchmark tests reveal that xorg's fb layer PutImage implementation
does not follow an optimal code path for requests without special
raster operations, because it uses a slower general blit function
instead of the pixman library. This affects Xlib PutImage requests
and some ShmPutImage requests. xorg directs a ShmPutImage request to
PutImage only if the width of the part of the image to be copied is
equal to the full width of the image, and such requests perform
relatively poorly. If the part of the image that is copied is
narrower than the full image, xorg uses CopyArea, which already uses
the optimal pixman blit functions. The sub-optimal path is commonly
triggered by applications such as window managers and web browsers.
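
The routing rule described above can be sketched as a small decision function. This is only an illustration of the width condition stated in this message, not the server's actual dispatch code; the names `shm_path`, `choose_shm_path`, `copy_width` and `image_width` are hypothetical.

```c
#include <assert.h>

/* Hypothetical labels for the two server paths named in the commit message. */
enum shm_path { PATH_PUT_IMAGE, PATH_COPY_AREA };

/*
 * copy_width:  width of the part of the image to be copied
 * image_width: full width of the shared-memory image
 *
 * Models only the width test described above; any other conditions the
 * server may apply are deliberately left out.
 */
static enum shm_path choose_shm_path(unsigned copy_width, unsigned image_width)
{
    assert(copy_width <= image_width);
    return copy_width == image_width ? PATH_PUT_IMAGE : PATH_COPY_AREA;
}
```

Under this model, a full-width 549 x 549 request takes the PutImage path, while a request that copies a narrower part of the same image goes through CopyArea.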

To fix this unnecessary performance flaw, PutImage is replaced with
a version that uses pixman for the common case of GXcopy with all
plane mask bits set. This change is device-independent and only uses
pixman CPU blit functions that are already present in the xorg server.
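
As a rough sketch of the "common case" test, the predicate below returns true only for a Z-format image drawn with GXcopy and a plane mask of all ones. The constants are local stand-ins that mirror the X11 values, `ALL_PLANES` stands in for `FB_ALLONES`, and `can_use_pixman_path` is a hypothetical name; the bits-per-pixel check that the patch also performs is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the X11 image formats and raster ops. */
enum { XYBitmap = 0, XYPixmap = 1, ZPixmap = 2 };
enum { GXcopy = 0x3, GXxor = 0x6 };

/* Stand-in for FB_ALLONES: every plane enabled (assumes 32-bit FbBits). */
#define ALL_PLANES UINT32_MAX

/* True when the request qualifies for the pixman path; otherwise the
 * caller is expected to fall back to the generic fb implementation. */
static bool can_use_pixman_path(int format, int alu, uint32_t planemask)
{
    assert(format == XYBitmap || format == XYPixmap || format == ZPixmap);
    if (format == XYBitmap || format == XYPixmap)
        return false;
    return alu == GXcopy && planemask == ALL_PLANES;
}
```

Any request that fails the test, for example one drawn with GXxor or with a partial plane mask, would keep using the existing general path.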

Using the low-level benchmark program benchx
(https://github.com/hglm/benchx.git), the following speed-ups were
measured (1920x1080x32bpp) on an Allwinner A10 device:

ShmPutImageFullWidth (5 x 5): Speed up 9%
ShmPutImageFullWidth (7 x 7): Slow down 5%
ShmPutImageFullWidth (22 x 22): Speed up 8%
ShmPutImageFullWidth (49 x 49): Speed up 19%
ShmPutImageFullWidth (73 x 73): Speed up 55%
ShmPutImageFullWidth (109 x 109): Speed up 50%
ShmPutImageFullWidth (163 x 163): Speed up 37%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 77%
ShmPutImageFullWidth (549 x 549): Speed up 92%
AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
AlignedShmPutImageFullWidth (549 x 549): Speed up 89%

At 16bpp the speed-up is even greater:

ShmPutImageFullWidth (5 x 5): Slow down 8%
ShmPutImageFullWidth (7 x 7): Slow down 8%
ShmPutImageFullWidth (10 x 10): Slow down 6%
ShmPutImageFullWidth (22 x 22): Speed up 9%
ShmPutImageFullWidth (33 x 33): Speed up 20%
ShmPutImageFullWidth (49 x 49): Speed up 27%
ShmPutImageFullWidth (73 x 73): Speed up 69%
ShmPutImageFullWidth (109 x 109): Speed up 74%
ShmPutImageFullWidth (163 x 163): Speed up 100%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 133%
ShmPutImageFullWidth (549 x 549): Speed up 123%
AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
AlignedShmPutImageFullWidth (549 x 549): Speed up 110%

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
parent 3ad74420
@@ -211,6 +211,95 @@ xCopyArea(DrawablePtr pSrcDrawable,
                 xIn, yIn, widthSrc, heightSrc, xOut, yOut);
 }
 
+/*
+ * The following function is adapted from xserver/fb/fbPutImage.c.
+ */
+
+static void xPutImage(DrawablePtr pDrawable,
+           GCPtr pGC,
+           int depth,
+           int x, int y, int w, int h, int leftPad, int format, char *pImage)
+{
+    FbGCPrivPtr pPriv;
+    FbStride srcStride;
+    FbStip *src;
+    RegionPtr pClip;
+    FbStip *dst;
+    FbStride dstStride;
+    int dstBpp;
+    int dstXoff, dstYoff;
+    int nbox;
+    BoxPtr pbox;
+    int x1, y1, x2, y2;
+
+    if (format == XYBitmap || format == XYPixmap ||
+        pDrawable->bitsPerPixel != BitsPerPixel(pDrawable->depth)) {
+        fbPutImage(pDrawable, pGC, depth, x, y, w, h, leftPad, format, pImage);
+        return;
+    }
+
+    pPriv = fbGetGCPrivate(pGC);
+    if (pPriv->pm != FB_ALLONES || pGC->alu != GXcopy) {
+        fbPutImage(pDrawable, pGC, depth, x, y, w, h, leftPad, format, pImage);
+        return;
+    }
+
+    ScreenPtr pScreen = pDrawable->pScreen;
+    ScrnInfoPtr pScrn = xf86Screens[pScreen->myNum];
+    SunxiG2D *private = SUNXI_G2D(pScrn);
+
+    src = (FbStip *)pImage;
+    x += pDrawable->x;
+    y += pDrawable->y;
+    srcStride = PixmapBytePad(w, pDrawable->depth) / sizeof(FbStip);
+    pClip = fbGetCompositeClip(pGC);
+    fbGetStipDrawable(pDrawable, dst, dstStride, dstBpp, dstXoff, dstYoff);
+
+    for (nbox = RegionNumRects(pClip),
+         pbox = RegionRects(pClip); nbox--; pbox++) {
+        x1 = x;
+        y1 = y;
+        x2 = x + w;
+        y2 = y + h;
+        if (x1 < pbox->x1)
+            x1 = pbox->x1;
+        if (y1 < pbox->y1)
+            y1 = pbox->y1;
+        if (x2 > pbox->x2)
+            x2 = pbox->x2;
+        if (y2 > pbox->y2)
+            y2 = pbox->y2;
+        if (x1 >= x2 || y1 >= y2)
+            continue;
+
+        Bool done = FALSE;
+        int w = x2 - x1;
+        int h = y2 - y1;
+
+        /* first try pixman (NEON) */
+        if (!done) {
+            done = pixman_blt((uint32_t *)src, (uint32_t *)dst, srcStride, dstStride,
+                              dstBpp, dstBpp, x1 - x,
+                              y1 - y, x1 + dstXoff,
+                              y1 + dstYoff, w,
+                              h);
+        }
+
+        /* otherwise fall back to fb */
+        if (!done)
+            fbBlt(src + (y1 - y) * srcStride,
+                  srcStride,
+                  (x1 - x) * dstBpp,
+                  dst + (y1 + dstYoff) * dstStride,
+                  dstStride,
+                  (x1 + dstXoff) * dstBpp,
+                  w * dstBpp,
+                  h, GXcopy, FB_ALLONES, dstBpp, FALSE, FALSE);
+    }
+
+    fbFinishAccess(pDrawable);
+}
+
 static Bool
 xCreateGC(GCPtr pGC)
 {
@@ -228,6 +317,8 @@ xCreateGC(GCPtr pGC)
         /* Add our own hook for CopyArea function */
         self->pGCOps->CopyArea = xCopyArea;
+        /* Add our own hook for PutImage */
+        self->pGCOps->PutImage = xPutImage;
     }
     pGC->ops = self->pGCOps;
...
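
The per-box clipping that xPutImage performs before each blit can be illustrated in isolation. The sketch below intersects a requested rectangle with one clip box and reports whether anything is left to copy. It assumes half-open boxes, as the `x1 >= x2` test in the patch suggests; the `Box` struct and `clip_to_box` are hypothetical stand-ins, not the server's `BoxPtr` API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a clip box: covers [x1, x2) x [y1, y2). */
typedef struct { int x1, y1, x2, y2; } Box;

/*
 * Intersect the rectangle at (x, y) of size w x h with one clip box.
 * Writes the intersection to *out and returns false when it is empty,
 * which corresponds to the "continue" in the loop of the patch.
 */
static bool clip_to_box(int x, int y, int w, int h, const Box *clip, Box *out)
{
    assert(w >= 0 && h >= 0);
    out->x1 = x > clip->x1 ? x : clip->x1;
    out->y1 = y > clip->y1 ? y : clip->y1;
    out->x2 = x + w < clip->x2 ? x + w : clip->x2;
    out->y2 = y + h < clip->y2 ? y + h : clip->y2;
    return out->x1 < out->x2 && out->y1 < out->y2;
}
```

For a non-empty result, the blit width and height are `out->x2 - out->x1` and `out->y2 - out->y1`, and the source offset is `(out->x1 - x, out->y1 - y)`, matching the arguments passed to pixman_blt in the patch.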