- 09 Sep, 2013 2 commits
-
-
Siarhei Siamashka authored
If the framebuffer reservation is too small for efficient use of the hardware overlays and zero-copy buffer flipping, log a hint about fixing this problem to /var/log/Xorg.0.log. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
Even though we are primarily using the UMP buffer obtained by the GET_UMP_SECURE_ID_SUNXI_FB ioctl, another UMP buffer obtained by the GET_UMP_SECURE_ID_BUF1 ioctl should also span over the whole framebuffer. Otherwise we may have troubles with the window resize bug recovery and buffer flipping. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 07 Sep, 2013 3 commits
-
-
Siarhei Siamashka authored
The instructions, links, etc. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
The Allwinner A10/A13 display controller hardware is expected to support negative coordinates for the top left corners of the layers. But there is a bug, either in the kernel driver or in the hardware, which messes up the picture on screen when the Y coordinate is negative for a YUV layer. Negative X coordinates are not affected. RGB formats are not affected either (no matter whether the RGB layer is scaled or not). We fix this by recalculating which part of the buffer in memory corresponds to Y=0 on screen and adjusting the input buffer settings accordingly. Fixes https://github.com/ssvb/xf86-video-sunxifb/issues/16 Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
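The recalculation described above can be sketched as follows. This is only an illustration of the arithmetic, not the driver's code: the function name `clip_negative_y` and its parameters are hypothetical, and any alignment the hardware may need for subsampled chroma rows is ignored.

```python
def clip_negative_y(dst_y, dst_h, src_y, src_h):
    """Clamp the on-screen top edge of a layer to Y=0.

    Returns (dst_y, dst_h, src_y, src_h). Rows that would land above the
    screen are dropped from the destination, and the matching share of
    source rows is skipped so that the row shown at Y=0 stays the same.
    """
    if dst_y >= 0 or dst_h <= 0:
        return dst_y, dst_h, src_y, src_h
    # Destination rows hidden above the top edge of the screen.
    hidden_dst = min(-dst_y, dst_h)
    # The layer may be scaled, so convert hidden rows back to source rows.
    hidden_src = hidden_dst * src_h // dst_h
    return 0, dst_h - hidden_dst, src_y + hidden_src, src_h - hidden_src
```

For example, a 200-row source scaled to 400 rows and placed at Y=-100 ends up at Y=0 with 300 visible rows, reading from source row 50 onwards.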
-
Siarhei Siamashka authored
Now zero copy and tear free buffer swapping is also supported for 16bpp desktop. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 06 Sep, 2013 1 commit
-
-
Siarhei Siamashka authored
Now the scaler is enabled for the sunxi disp layer only when we want to use it for YUV format with XV. Whenever the layer is configured for RGB format or deactivated, the scaler gets disabled. This should make the driver more friendly to the other potential scaled layer users. The total number of available scalers is only 2 for Allwinner A10 and only 1 for Allwinner A13. The potential drawback is that now we may get an error when trying to enable the scaler (if somebody else has used up all the available scalers) instead of always having it reserved and ready for use. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 13 Aug, 2013 1 commit
-
-
Siarhei Siamashka authored
Recent changes broke the configuration when "DRI2HWOverlay" option is set to "false". This patch adds the missing UMP secure ids initialization and resolves the problem. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 04 Aug, 2013 3 commits
-
-
Siarhei Siamashka authored
Do this to keep the variables naming style consistent across the source file (earlier these variables had different names like 'self', 'drvpriv', 'private'). Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
In double buffer mode, explicitly mark the buffers as designated for odd or even frame position when putting them into queue. And when swapping the buffers, use these flags to re-synchronize if it is necessary. This prevents problems after window resize (when gles-rgb-cycle-demo could expose a mismatch between the color name in the window title and the actual window color). Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
Whenever something goes wrong in high fps mode, it may be interesting to slow down the demo to check whether the actual background color matches the expected color (shown in the window title). Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 03 Aug, 2013 1 commit
-
-
Siarhei Siamashka authored
If DEBUG_WITH_RGB_PATTERN is defined, then we check that the frame colors change in the "R -> G -> B -> R -> G -> ..." pattern and print debugging messages when this is not the case. Such a color change pattern can be generated by the "test/gles-rgb-cycle-demo.c" program. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
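A minimal sketch of such a pattern check, assuming each frame has already been classified as "R", "G" or "B" (how the driver samples the frame color is not described here, and `check_rgb_cycle` is a hypothetical name):

```python
def check_rgb_cycle(frames):
    """Return a list of (index, expected, actual) for frames that break
    the R -> G -> B -> R -> ... cycle, judged against the previous frame."""
    order = "RGB"
    problems = []
    for i in range(1, len(frames)):
        # The color that should follow the previous frame in the cycle.
        expected = order[(order.index(frames[i - 1]) + 1) % 3]
        if frames[i] != expected:
            problems.append((i, expected, frames[i]))
    return problems
```

A repeated frame such as "RGGB" is reported once, at the position where the second "G" appears instead of "B".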
-
- 31 Jul, 2013 3 commits
-
-
Siarhei Siamashka authored
Do this mostly for security reasons. We don't want any application to see whatever was last rendered by the previous GLES application by just peeking into a freshly allocated DRI2 buffer. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
We manage only a single hardware overlay. That's a precious shared resource, which we want to use for zero-copy fullscreen compositing in gnome-shell. The strange 1x1 window does not really need it. Fixes https://github.com/ssvb/xf86-video-sunxifb/issues/2 Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
When enabled, it tries to avoid tearing in OpenGL ES applications. Works on sunxi hardware if the hardware overlay (sunxi disp layer) is used for a DRI2 window. The name of this option and its description in the man page have been borrowed from the intel and radeon drivers. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 30 Jul, 2013 1 commit
-
-
Siarhei Siamashka authored
That's the right thing to do and fixes issues such as https://github.com/ssvb/xf86-video-sunxifb/issues/6 As a result, the framebuffer may now need to be larger in order to accommodate two DRI2 buffers in the offscreen part of the framebuffer. The users of sunxi hardware are advised to increase the value of the fb0_framebuffer_num variable in the fex file to 3 for 32bpp mode and to 5 for 16bpp mode. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
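One way to arrive at the recommended values 3 and 5 is to count screen-sized units at the desktop color depth. The sketch below assumes that both DRI2 buffers are screen-sized and always 32bpp, which the commit message does not state; `required_framebuffer_num` and its parameters are hypothetical names used only for this arithmetic.

```python
import math

def required_framebuffer_num(desktop_bpp, dri2_bpp=32, dri2_buffers=2):
    """Screen-sized units (at desktop depth) needed in the reservation.

    One unit holds the visible desktop. Each offscreen DRI2 buffer costs
    dri2_bpp / desktop_bpp units (assumption: 32bpp, screen-sized buffers).
    """
    return 1 + math.ceil(dri2_buffers * dri2_bpp / desktop_bpp)
```

Under these assumptions a 32bpp desktop needs 1 + 2 = 3 units and a 16bpp desktop needs 1 + 4 = 5 units.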
-
- 29 Jul, 2013 2 commits
-
-
Siarhei Siamashka authored
Should fix https://github.com/ssvb/xf86-video-sunxifb/issues/14 and prevent FTBFS on some systems. Reported-by: Fred Chien <cfsghost@gmail.com> Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
When moving further to our own DRI2 buffers bookkeeping, we can't really trust the information from DRI2BufferRec anymore. So just add a copy of all the missing bits of information to UMPBufferInfoRec and use it instead. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 28 Jul, 2013 1 commit
-
-
Siarhei Siamashka authored
The delay between frames can now be set on the command line with millisecond precision, so the demo can be used to test vsync. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 26 Jul, 2013 2 commits
-
-
Siarhei Siamashka authored
The recent commit 9e0a8731 (the part of it that suppressed buffer reuse in the Xorg DRI2 framework) introduced a regression. Half of the frames stopped reaching the screen on the CPU copy fallback path because the Mali blob now ended up rendering them to the "wrong" buffer. It just confirms that we need to completely move from the standard DRI2 framework in the Xorg server to our own buffer bookkeeping logic. This patch fixes the regression by introducing a single UMP buffer per window, which is shared between the back and front DRI2 buffers. We can do this because double buffering does not make much sense on the fallback path at the moment (we can't set scanout from this buffer and anyway have to copy this data elsewhere immediately after we get it from Mali). Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
Bail out earlier for the uninteresting types of DRI2 buffer requests (by just returning a dummy null UMP buffer). This makes the code a bit simpler on the common path. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 24 Jul, 2013 3 commits
-
-
Siarhei Siamashka authored
The test program cycles through 3 colors (red, green, blue), so it is easier to see if we get the color change pattern wrong. Also the X11 window title is updated to indicate the current color information. If we have any problems with window decorations handling, they are likely to be exposed. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
Using the secure id 1 (framebuffer) to trick the Mali blob into requesting DRI2 buffers again was not a very good idea. The problem is that the blob still writes something there and corrupts the framebuffer. So instead we try to assign secure id 2 to a dummy 4KiB UMP buffer allocated in memory and use it for the same purpose. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
The Mali blob is doing something like this:

1. Request BackLeft DRI2 buffer (buffer A) and render to it
2. Swap buffers
3. Request BackLeft DRI2 buffer (buffer B)
4. Check window size, and if it has changed - go back to step 1.
5. Render to the current back buffer (either buffer A or B)
6. Swap buffers
7. Go back to step 4

A very serious show stopper problem is that the Mali blob ignores DRI2-InvalidateBuffers events and just uses GetGeometry polling to check whether the window size has changed. Unfortunately this is racy and we may end up with a size mismatch between buffer A and buffer B. This is particularly easy to trigger when the window size changes exactly between steps 1 and 3. See the test/gles-yellow-blue-flip.c program, which demonstrates this. Qt5 applications also trigger this bug.

We work around the issue by explicitly tracking the requests for BackLeft buffers and checking whether the sizes of these buffers match at step 1 and step 3. However the real challenge here is notifying the client application that these buffers are no good, so that it can request them again. As DRI2-InvalidateBuffers events are ignored, we are in a pretty difficult situation.

Fortunately I remembered a weird behaviour observed earlier: https://groups.google.com/forum/#!msg/linux-sunxi/qnxpVaqp1Ys/aVTq09DVih0J

Actually if we return UMP secure ID value 1 for the second DRI2 buffer request, the blob responds to this by spitting out the following error message:

    [EGL-X11] [2274] DETECTED ONLY ONE FRAMEBUFFER - FORCING A RESIZE
    [EGL-X11] [2274] DRI2 UMP ID 0x3 retrieved
    [EGL-X11] [2274] DRI2 WINDOW UMP SECURE ID CHANGED (0x3 -> 0x3)

And then it proceeds by re-trying to request a pair of DRI2 buffers. But that's exactly the behaviour we want! As a down side, some ugly flashing can be seen on screen at the time when this workaround kicks in, but then everything normalizes.

And unfortunately, the race condition is still not totally eliminated because the blob is apparently getting DRI2 buffer sizes from the separate GetGeometry requests instead of using the information provided by DRI2GetBuffers. But now the problem is at least very hard to trigger.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
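The bookkeeping behind this workaround can be sketched roughly as below. It is a simplified model, not the driver's code: `BackBufferTracker` and its method are hypothetical names, and it assumes BackLeft requests arrive strictly in pairs (step 1, then step 3) per window.

```python
FORCE_RESIZE_SECURE_ID = 1  # the secure ID value named in the commit message

class BackBufferTracker:
    """Pairs up BackLeft requests per window and compares their sizes."""

    def __init__(self):
        # window id -> size of the first buffer of the current pair
        self._first_size = {}

    def on_back_left_request(self, window, size, real_secure_id):
        """Return the secure ID to hand back to the client."""
        first = self._first_size.pop(window, None)
        if first is None:
            # First request of a pair (step 1): remember its size.
            self._first_size[window] = size
            return real_secure_id
        if first != size:
            # Second request (step 3) with a different size: answer with
            # the special ID so that the blob re-requests both buffers.
            return FORCE_RESIZE_SECURE_ID
        return real_secure_id
```

After a mismatch the stored state is already cleared, so the blob's retry is treated as a fresh pair.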
-
- 20 Jul, 2013 1 commit
-
-
Harm Hanemaaijer authored
Since version 1.0, gstreamer (when using xvimagesink) often allocates a larger XV image for the video with padding on all four sides and then calls XvPutImage() to render a part of this image. With the current XV implementation this results in artifacts on the borders of the image, with a green bar at the bottom. I am observing this when playing a 1280x720 video on a 1920x1080 screen at 32bpp, the size of the video window doesn't matter. This problem seems to be an exaggeration of the one described in https://bugzilla.gnome.org/show_bug.cgi?id=685305 . The solution appears to be to use the source area dimensions as requested in the XvPutImage() call, as opposed to the dimensions of the originally allocated image, and to honour the offsets (src_x, src_y) when setting the source region on the display controller. With this relatively simple change, the problem seems to go away, and gstreamer 1.0 (which is faster than gstreamer 0.10 due to a zero-copy strategy) provides an acceptable solution for video playback. Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
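A small sketch of the source region selection described above: take the rectangle requested in the XvPutImage() call and clamp it to the allocated image, rather than using the full image size. The function name and the example dimensions are hypothetical; the actual padding gstreamer uses is not given in the commit message.

```python
def xv_source_region(img_w, img_h, src_x, src_y, src_w, src_h):
    """Return (x, y, w, h) of the requested source area, clamped so that
    it never extends outside the allocated img_w x img_h image."""
    x0 = max(0, min(src_x, img_w))
    y0 = max(0, min(src_y, img_h))
    x1 = max(x0, min(src_x + src_w, img_w))
    y1 = max(y0, min(src_y + src_h, img_h))
    return x0, y0, x1 - x0, y1 - y0
```

With an image padded by 8 pixels on every side, for instance, a 1280x720 video would be read from offset (8, 8) instead of (0, 0), which keeps the padding out of the displayed picture.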
-
- 19 Jul, 2013 1 commit
-
-
Siarhei Siamashka authored
If an attempt to reserve a scalable sunxi-disp layer fails, don't initialize XV at all. Otherwise any attempt to use the XV overlay is not going to work correctly and just results in the following dmesg spam:

    [ 728.280000] [DISP] not supported yuv channel format:18 in img_sw_para_to_reg

This may happen on Allwinner A13 if scaler mode is enabled in the .fex file (A13 only has one DEFE scaler). Allwinner A10 can also have similar troubles in a dual-head configuration if scaler mode is enabled for one or both screens (A10 has two DEFE scalers).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 18 Jul, 2013 2 commits
-
-
Harm Hanemaaijer authored
Update the man page and bring it up-to-date, reflecting the fact that the driver also supports non-sunxi platforms. Add description of the "XVHWOverlay" option. Also a small update to the README for similar reasons. Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
-
Harm Hanemaaijer authored
Add the "XVHWOverlay" boolean xorg.conf option to make it possible to disable the XV acceleration feature using display layers on sunxi hardware. Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
-
- 17 Jul, 2013 2 commits
-
-
Siarhei Siamashka authored
On some systems the libump library is built without an explicit pthreads dependency. As the issue has already been confirmed to affect both sunxi and odroid users (and maybe the users of other mali400 based hardware), it is easier to just work around the problem locally. Otherwise we would need to hunt down all the libump packagers and beg for the fix. More details are at https://github.com/ssvb/xf86-video-sunxifb/issues/11 Reported-by: Patrick Wood Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 16 Jul, 2013 1 commit
-
-
Siarhei Siamashka authored
Proper layer sharing between XV and DRI2 still needs to be implemented. Additionally we still need NEON and/or G2D "textured overlay" as a fallback solution for the composited desktop (NEON optimized XV is going to be useful for a wide range of ARM devices). A bit of performance tuning is also necessary. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 11 Jul, 2013 1 commit
-
-
Siarhei Siamashka authored
They are needed for a basic XV extension implementation. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 12 Jun, 2013 4 commits
-
-
Harm Hanemaaijer authored
Benchmark tests reveal that xorg's fb layer PutImage implementation does not follow an optimal code path for requests without special raster operations, because it uses a slower general blit function instead of the pixman library. This affects Xlib PutImage requests and some ShmPutImage requests. In the case of ShmPutImage, xorg directs ShmPutImage requests to PutImage only if the width of the part of the image to be copied is equal to the full width of the image, resulting in relatively poor performance. If the part of the image that is copied is narrower than the full image, then xorg uses CopyArea, which already uses the optimal pixman blit functions. The sub-optimal path is commonly triggered by applications such as window managers and web browsers. To fix this unnecessary performance flaw, PutImage is replaced with a version that uses pixman for the common case of GXcopy with all plane mask bits set. This change is device-independent and only uses pixman CPU blit functions that are already present in the xorg server.
Using the low-level benchmark program benchx (https://github.com/hglm/benchx.git), the following speed-ups were measured (1920x1080x32bpp) on an Allwinner A10 device:

ShmPutImageFullWidth (5 x 5): Speed up 9%
ShmPutImageFullWidth (7 x 7): Slow down 5%
ShmPutImageFullWidth (22 x 22): Speed up 8%
ShmPutImageFullWidth (49 x 49): Speed up 19%
ShmPutImageFullWidth (73 x 73): Speed up 55%
ShmPutImageFullWidth (109 x 109): Speed up 50%
ShmPutImageFullWidth (163 x 163): Speed up 37%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 77%
ShmPutImageFullWidth (549 x 549): Speed up 92%
AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
AlignedShmPutImageFullWidth (549 x 549): Speed up 89%

At 16bpp the speed-up is even greater:

ShmPutImageFullWidth (5 x 5): Slow down 8%
ShmPutImageFullWidth (7 x 7): Slow down 8%
ShmPutImageFullWidth (10 x 10): Slow down 6%
ShmPutImageFullWidth (22 x 22): Speed up 9%
ShmPutImageFullWidth (33 x 33): Speed up 20%
ShmPutImageFullWidth (49 x 49): Speed up 27%
ShmPutImageFullWidth (73 x 73): Speed up 69%
ShmPutImageFullWidth (109 x 109): Speed up 74%
ShmPutImageFullWidth (163 x 163): Speed up 100%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 133%
ShmPutImageFullWidth (549 x 549): Speed up 123%
AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
AlignedShmPutImageFullWidth (549 x 549): Speed up 110%

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
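The condition for taking the pixman path in this PutImage change (GXcopy raster operation, plane mask with all bits set) can be sketched as a simple predicate. The helper name is hypothetical, and any further conditions the real patch may check (for example the image format) are not covered.

```python
GXCOPY = 0x3  # value of GXcopy in X11's X.h

def can_use_pixman_blit(alu, planemask, depth):
    """True when a PutImage request is a plain copy: the raster operation
    is GXcopy and every plane of the drawable's depth is enabled."""
    full_mask = (1 << depth) - 1
    return alu == GXCOPY and (planemask & full_mask) == full_mask
```

Requests that fail this test would keep going through the general (slower) blit path.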
-
Siarhei Siamashka authored
This should be useful for Raspberry Pi. When reading uncached source buffers, the VFP optimized overlapped two-pass blit is roughly 2-3 times slower than memcpy in cached memory. Which makes it reasonably competitive compared to ShadowFB (considering that ShadowFB allocates an extra buffer, does extra memory copies which take time and thrash L2 cache, etc.). It even provides a slight performance advantage in a more or less realistic use case (scrolling in xterm), which needs reads from the framebuffer:

==== Before (xf86-video-fbdev with ShadowFB) ====
$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt
real 1m50.245s
user 0m1.750s
sys 0m0.800s

==== After (xf86-video-sunxifb without ShadowFB) ====
$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt
real 1m27.709s
user 0m1.690s
sys 0m0.920s

We get decent results even when reading from the framebuffer. However in many typical workloads (excluding scrolling and dragging windows) the framebuffer is primarily used as write-only. In write-only use cases ShadowFB is just pure overhead. So getting rid of it is a very good idea as this improves overall graphics performance.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
A small typo in a function argument and C compiler happily accepting void pointers instead of something else is a dangerous combo. Need to be more careful. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
This patch implements a heuristic which enables backing store for some windows. When backing store is enabled for a window, the window gets a backing pixmap (via automatic redirection provided by the composite extension). It acts a bit like ShadowFB, but for individual windows. The advantage of backing store is that we can avoid the "expose event -> redraw" animated trail in the exposed area when dragging another window on top of it. Dragging windows becomes much smoother and faster. But the disadvantages of backing store are the same as for ShadowFB: a loss of precious RAM, an extra buffer copy when somebody tries to update the window content, and potentially skipped frames on fast animation (they just do not reach the screen). Also hardware accelerated scrolling does not currently work for windows with backing store enabled. We try to make the best use of backing store by enabling it for all the windows that are direct children of root, except the one which has keyboard focus (either directly or via one of its children). In practice this heuristic seems to provide nearly perfect results: 1) dragging windows is fast and smooth. 2) the top level window with the keyboard focus (typically the application that a user is working with) is G2D accelerated and does not suffer from any intermediate buffer copy overhead. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
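The selection rule of this backing store patch can be modelled in a few lines. This is a sketch over a toy window tree, not X server code: windows are plain identifiers, `parent_of` is a hypothetical child-to-parent mapping, and `windows_with_backing_store` is an invented name.

```python
def windows_with_backing_store(top_level, parent_of, focus):
    """Return the set of top-level windows that get backing store.

    top_level: the direct children of root
    parent_of: dict mapping a window to its parent (missing for top level)
    focus:     the window holding keyboard focus, or None
    """
    top = set(top_level)
    # Walk up from the focused window to its top-level ancestor.
    w = focus
    while w is not None and w not in top:
        w = parent_of.get(w)
    # Every top-level window except the focused one gets backing store.
    return top - {w}
```

When the focus sits in a child widget, its top-level ancestor is the one left without backing store, matching the "either directly or via one of its children" wording.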
-
- 10 Jun, 2013 1 commit
-
-
Siarhei Siamashka authored
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
- 07 Jun, 2013 2 commits
-
-
Siarhei Siamashka authored
With the fallback to the CPU backend for unsupported blits and also a threshold for avoiding small blits, G2D should now always provide the best overall performance. The users of recent versions of xf86-video-sunxifb are supposed to also have a reasonably recent version of the linux-sunxi kernel, which includes the following fix: https://github.com/linux-sunxi/linux-sunxi/commit/3d49345343a1535b The users of old kernels are going to see screen corruption when dragging windows and scrolling. They just should upgrade :) Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-
Siarhei Siamashka authored
The G2D driver only supports framebuffer->framebuffer blits and also can't be used to accelerate dragging windows to the right (without hacking the kernel driver to do two-pass blit there). This patch adds fallback to NEON optimized CPU backend instead of resorting to use poorly performing fbBlt in these cases. Note: we assume that ioctls normally do not fail (even if they do, the slow old style fallback to fbBlt is not the worst thing to worry about). Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
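The backend choice described in this message can be sketched as a small decision function. The names are hypothetical, and the sketch assumes that "dragging to the right" means an overlapping copy whose destination lies to the right of its source; the real driver's checks may differ.

```python
def choose_blit_backend(src_in_fb, dst_in_fb, src_x, dst_x, overlaps):
    """Return "g2d" or "cpu" (the NEON optimized CPU backend)."""
    # G2D can only copy from the framebuffer to the framebuffer.
    if not (src_in_fb and dst_in_fb):
        return "cpu"
    # Assumption: an overlapping copy that moves pixels to the right is
    # the case G2D cannot handle without a two-pass blit in the kernel.
    if overlaps and dst_x > src_x:
        return "cpu"
    return "g2d"
```

Either way the slow fbBlt path is avoided, which is the point of the patch.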
-
- 05 Jun, 2013 1 commit
-
-
Siarhei Siamashka authored
Using VFP, we can load up to 128 bytes with a single VLDM instruction. But before this patch, only NEON implementation was available. Just because it showed better results on Allwinner A10 compared to VFP. And this DDX driver used to primarily target just sunxi hardware. But looks like it makes sense to also target other devices (at least ODROID-X, which has the same Mali400 GPU and can use the same DRI2 integration for EGL and GLESv2 support). And on the other ARM devices, VFP aligned reads generally work better than NEON. The benchmark results are listed below:

1280x720, 32bpp, testing "x11perf -scroll500"

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement disabled ==
NEON : 10000 trep @ 3.7101 msec ( 270.0/sec): Scroll 500x500 pixels
VFP : 10000 trep @ 2.6678 msec ( 375.0/sec): Scroll 500x500 pixels

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement enabled ==
NEON : 15000 trep @ 2.2568 msec ( 443.0/sec): Scroll 500x500 pixels
VFP : 15000 trep @ 2.3016 msec ( 434.0/sec): Scroll 500x500 pixels

== Exynos 4412, Cortex-A9 ==
NEON : 10000 trep @ 4.5125 msec ( 222.0/sec): Scroll 500x500 pixels
VFP : 10000 trep @ 2.7015 msec ( 370.0/sec): Scroll 500x500 pixels

== TI DM3730, Cortex-A8 ==
NEON : 15000 trep @ 2.2303 msec ( 448.0/sec): Scroll 500x500 pixels
VFP : 10000 trep @ 3.0670 msec ( 326.0/sec): Scroll 500x500 pixels

== Allwinner A10, Cortex-A8 ==
NEON : 10000 trep @ 2.5559 msec ( 391.0/sec): Scroll 500x500 pixels
VFP : 10000 trep @ 3.0580 msec ( 327.0/sec): Scroll 500x500 pixels

== Raspberry Pi, BCM2708, ARM1176 ==
VFP : 3000 trep @ 8.7699 msec ( 114.0/sec): Scroll 500x500 pixels

The benchmark numbers in this particular test setup roughly represent memory copy bandwidth measured in MB/s (when doing overlapped blits inside of a writecombine mapped framebuffer).

-----------------------------------------------------------------------

Note: the use of VFP two-pass overlapped copy instead of ShadowFB is still not enabled by default when running on Raspberry Pi because the performance results are not so great.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
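The remark that the x11perf rates roughly equal MB/s follows from the size of the scrolled area: 500 x 500 pixels at 32bpp is 1,000,000 bytes per operation. A quick check of that arithmetic (the function name is hypothetical, and the sketch counts only the bytes copied once per scroll):

```python
def scroll_bandwidth_mb_per_s(ops_per_sec, width=500, height=500, bpp=32):
    """Approximate copy bandwidth in MB/s for an x11perf scroll rate."""
    bytes_per_op = width * height * bpp // 8   # 1,000,000 for the defaults
    return ops_per_sec * bytes_per_op / 1e6
```

So the 270.0/sec NEON result on Exynos 5250 corresponds to roughly 270 MB/s of copied data.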
-
- 03 Jun, 2013 1 commit
-
-
Siarhei Siamashka authored
This is my old ARM9E/ARM11 memcpy code from https://garage.maemo.org/projects/mplayer/ with some tuning for Raspberry Pi (aligned prefetch added). Will be used by VFP optimized overlapped blt function. Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
-