Add CPU optimization for PutImage
Benchmark tests reveal that xorg's fb layer PutImage implementation does not follow an optimal code path for requests without special raster operations, because it uses a slower general blit function instead of the pixman library. This affects Xlib PutImage requests and some ShmPutImage requests. xorg directs a ShmPutImage request to PutImage only if the width of the part of the image to be copied equals the full width of the image, and those requests perform relatively poorly. If the copied part is narrower than the full image, xorg uses CopyArea, which already uses the optimal pixman blit functions. The sub-optimal path is commonly triggered by applications such as window managers and web browsers.

To fix this unnecessary performance flaw, PutImage is replaced with a version that uses pixman for the common case of GXcopy with all plane mask bits set. This change is device-independent and only uses pixman CPU blit functions that are already present in the xorg server.
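The routing and the fast-path condition described above can be sketched roughly as follows. This is a simplified illustration, not the code from this patch: every `sketch_` name is hypothetical, the routing check mirrors only the width rule stated above, and a plain memcpy row loop stands in for the pixman blit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value of GXcopy in the X11 protocol headers (X.h). */
#define SKETCH_GXCOPY 0x3

/* Hypothetical labels for the two paths a ShmPutImage request can take. */
enum sketch_path { SKETCH_PATH_PUTIMAGE, SKETCH_PATH_COPYAREA };

/* Simplified routing rule from the description: full-width copies go to
 * PutImage, narrower copies go to CopyArea. */
static enum sketch_path sketch_shm_route(int copy_width, int image_width)
{
    return copy_width == image_width ? SKETCH_PATH_PUTIMAGE
                                     : SKETCH_PATH_COPYAREA;
}

/* Hypothetical helper: a mask with the low `depth` bits set. */
static uint32_t sketch_full_mask(int depth)
{
    assert(depth > 0 && depth <= 32);
    return depth == 32 ? 0xFFFFFFFFu : ((1u << depth) - 1u);
}

/* Returns 1 when a plain copy is enough: the raster operation is GXcopy
 * and every plane within the drawable depth is enabled. */
static int sketch_can_fast_copy(int alu, uint32_t planemask, int depth)
{
    uint32_t full = sketch_full_mask(depth);
    return alu == SKETCH_GXCOPY && (planemask & full) == full;
}

/* Stand-in for an optimized blit: copies `height` rows of `width` pixels
 * from the top-left of `src` to (dst_x, dst_y) in `dst`.
 * Strides are in bytes. No clipping is performed. */
static void sketch_copy_rows(const uint8_t *src, int src_stride,
                             uint8_t *dst, int dst_stride,
                             int dst_x, int dst_y,
                             int width, int height, int bytes_per_pixel)
{
    assert(width >= 0 && height >= 0 && bytes_per_pixel > 0);
    for (int y = 0; y < height; y++) {
        memcpy(dst + (size_t)(dst_y + y) * dst_stride
                   + (size_t)dst_x * bytes_per_pixel,
               src + (size_t)y * src_stride,
               (size_t)width * bytes_per_pixel);
    }
}
```

A caller would test `sketch_can_fast_copy()` first and fall back to the general blit whenever it returns 0, so requests with other raster operations or partial plane masks keep their existing behaviour.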
Using the low-level benchmark program benchx (https://github.com/hglm/benchx.git), the following speed-ups were measured (1920x1080x32bpp) on an Allwinner A10 device:

ShmPutImageFullWidth (5 x 5): Speed up 9%
ShmPutImageFullWidth (7 x 7): Slow down 5%
ShmPutImageFullWidth (22 x 22): Speed up 8%
ShmPutImageFullWidth (49 x 49): Speed up 19%
ShmPutImageFullWidth (73 x 73): Speed up 55%
ShmPutImageFullWidth (109 x 109): Speed up 50%
ShmPutImageFullWidth (163 x 163): Speed up 37%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 77%
ShmPutImageFullWidth (549 x 549): Speed up 92%

AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
AlignedShmPutImageFullWidth (549 x 549): Speed up 89%

At 16bpp the speed-up is even greater:

ShmPutImageFullWidth (5 x 5): Slow down 8%
ShmPutImageFullWidth (7 x 7): Slow down 8%
ShmPutImageFullWidth (10 x 10): Slow down 6%
ShmPutImageFullWidth (22 x 22): Speed up 9%
ShmPutImageFullWidth (33 x 33): Speed up 20%
ShmPutImageFullWidth (49 x 49): Speed up 27%
ShmPutImageFullWidth (73 x 73): Speed up 69%
ShmPutImageFullWidth (109 x 109): Speed up 74%
ShmPutImageFullWidth (163 x 163): Speed up 100%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 133%
ShmPutImageFullWidth (549 x 549): Speed up 123%

AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
AlignedShmPutImageFullWidth (549 x 549): Speed up 110%

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>