Journey Through Freedreno
budding
Published: 2023-02-28
Last Edited: 2023-02-28
As part of my training at Igalia I’ve been attempting to write a new backend for Freedreno that targets the proprietary “KGSL” kernel mode driver. For those unaware there are two “main” kernel mode drivers on Qualcomm SOCs for the GPU, there is the “MSM”, and “KGSL”. “MSM” is DRM compliant, and Freedreno already able to run on this driver. “KGSL” is the proprietary KMD that Qualcomm’s proprietary userspace driver targets. Now why would you want to run freedreno against KGSL, when MSM exists? Well there are a few ones, first MSM only really works on an up-streamed kernel, so if you have to run a down-streamed kernel you can continue using the version of KGSL that the manufacturer shipped with your device. Second this allows you to run both the proprietary adreno driver and the open source freedreno driver on the same device just by swapping libraries, which can be very nice for quickly testing something against both drivers.
When “DRM” isn’t just “DRM”
When working on a new backend, one of the critical things to do is to make use of as much “common code” as possible. This has a number of benefits, least of all reducing the amount of code you have to write. It also allows reduces the number of bugs that will likely exist as you are relying on well tested code, and it ensures that the backend is mostly likely going to continue to work with new driver updates.
When I started the work for a new backend I looked inside mesa’s
src/freedreno/drm
folder. This has the current backend code
for Freedreno, and its already modularized to support multiple backends.
It currently has support for the above mentioned MSM kernel mode driver
as well as virtio (a backend that allows Freedreno to be used from
within in a virtualized environment). From the name of this path, you
would think that the code in this module would only work with kernel
mode drivers that implement DRM, but actually there is only a handful of
places in this module where DRM support is assumed. This made it a good
starting point to introduce the KGSL backend and piggy back off the
common code.
For example the drm
module has a lot of code to deal
with the management of synchronization primitives, buffer objects, and
command submit lists. All managed at a abstraction above “DRM” and to
re-implement this code would be a bad idea.
How to get Android to behave
One of this big struggles with getting the KGSL backend working was
figuring out how I could get Android to load mesa instead of Qualcomm
blob driver that is shipped with the device image. Thankfully a good
chunk of this work has already been figured out when the Turnip
developers (Turnip is the open source Vulkan implementation for Adreno
GPUs) figured out how to get Turnip running on android with KGSL.
Thankfully one of my coworkers Danylo is one of those
Turnip developers, and he gave me a lot of guidance on getting Android
setup. One thing to watch out for is the outdated instructions here. These instructions
almost work, but require some modifications. First if you’re
using a more modern version of the Android NDK, the compiler has been
replaced with LLVM/Clang, so you need to change which compiler is being
used. Second flags like system
in the cross compiler script
incorrectly set the system as linux
instead of
android
. I had success using the below cross compiler
script. Take note that the compiler paths need to be updated to match
where you extracted the android NDK on your system.
[binaries]
ar = '/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-ar'
c = ['ccache', '/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang']
cpp = ['ccache', '/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang++', '-fno-exceptions', '-fno-unwind-tables', '-fno-asynchronous-unwind-tables', '-static-libstdc++']
c_ld = 'lld'
cpp_ld = 'lld'
strip = '/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-strip'
# Android doesn't come with a pkg-config, but we need one for Meson to be happy not
# finding all the optional deps it looks for. Use system pkg-config pointing at a
# directory we get to populate with any .pc files we want to add for Android
pkgconfig = ['env', 'PKG_CONFIG_LIBDIR=/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/pkgconfig:/home/lfryzek/Documents/projects/igalia/freedreno/install-android/lib/pkgconfig', '/usr/bin/pkg-config']
[host_machine]
system = 'android'
cpu_family = 'arm'
cpu = 'armv8'
endian = 'little'
Another thing I had to figure out with Android, that was different
with these instructions, was how I would get Android to load mesa
versions of mesa libraries. That’s when my colleague Mark pointed out to me that
Android is open source and I could just check the source code myself.
Sure enough you have find the OpenGL driver loader in Android’s
source code. From this code we can that Android will try to load a
few different files based on some settings, and in my case it would try
to load 3 different shaded libraries in the
/vendor/lib64/egl
folder, libEGL_adreno.so
,libGLESv1_CM_adreno.so
, and libGLESv2.so
. I
could just replace these libraries with the version built from mesa and
voilà, you’re now loading a custom driver! This realization that I could
just “read the code” was very powerful in debugging some more android
specific issues I ran into, like dealing with gralloc.
Something cool that the opensource Freedreno & Turnip driver
developers figured out was getting android to run test OpenGL
applications from the adb shell without building android APKs. If you
check out the freedreno
repo, they have an ndk-build.sh
script that can build
tests in the tests-*
folder. The nice benefit of this is
that it provides an easy way to run simple test cases without worrying
about the android window system integration. Another nifty feature about
this repo is the libwrap
tool that lets trace the commands
being submitted to the GPU.
What even is Gralloc?
Gralloc is the graphics memory allocated in Android, and the OS will
use it to allocate the surface for “windows”. This means that the memory
we want to render the display to is managed by gralloc and not our KGSL
backend. This means we have to get all the information about this
surface from gralloc, and if you look in
src/egl/driver/dri2/platform_android.c
you will see
existing code for handing gralloc. You would think “Hey there is no work
for me here then”, but you would be wrong. The handle gralloc provides
is hardware specific, and the code in platform_android.c
assumes a DRM gralloc implementation. Thankfully the turnip developers
had already gone through this struggle and if you look in
src/freedreno/vulkan/tu_android.c
you can see they have
implemented a separate path when a Qualcomm msm implementation of
gralloc is detected. I could copy this detection logic and add a
separate path to platform_android.c
.
Working with the Freedreno community
When working on any project (open-source or otherwise), it’s nice to
know that you aren’t working alone. Thankfully the
#freedreno
channel on irc.oftc.net
is very
active and full of helpful people to answer any questions you may have.
While working on the backend, one area I wasn’t really sure how to
address was the synchronization code for buffer objects. The backend
exposed a function called cpu_prep
, This function was just
there to call the DRM implementation of cpu_prep
on the
buffer object. I wasn’t exactly sure how to implement this functionality
with KGSL since it doesn’t use DRM buffer objects.
I ended up reaching out to the IRC channel and Rob Clark on the
channel explained to me that he was actually working on moving a lot of
the code for cpu_prep
into common code so that a non-drm
driver (like the KGSL backend I was working on) would just need to
implement that operation as NOP (no operation).
Dealing with bugs & reverse engineering the blob
I encountered a few different bugs when implementing the KGSL backend, but most of them consisted of me calling KGSL wrong, or handing synchronization incorrectly. Thankfully since Turnip is already running on KGSL, I could just more carefully compare my code to what Turnip is doing and figure out my logical mistake.
Some of the bugs I encountered required the backend interface in
Freedreno to be modified to expose per a new per driver implementation
of that backend function, instead of just using a common implementation.
For example the existing function to map a buffer object into userspace
assumed that the same fd
for the device could be used for
the buffer object in the mmap
call. This worked fine for
any buffer objects we created through KGSL but would not work for buffer
objects created from gralloc (remember the above section on surface
memory for windows comming from gralloc). To resolve this issue I
exposed a new per backend implementation of “map” where I could take a
different path if the buffer object came from gralloc.
While testing the KGSL backend I did encounter a new bug that seems
to effect both my new KGSL backend and the Turnip KGSL backend. The bug
is an iommu fault
that occurs when the surface allocated by
gralloc does not have a height that is aligned to 4. The blitting engine
on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned by 4
the GPU will try to write to pixels that exists outside the allocated
memory. This issue only happens with KGSL backends since we import
memory from gralloc, and gralloc allocates exactly enough memory for the
surface, with no alignment on the height. If running on any other
platform, the fdl
(Freedreno Layout) code would be called
to compute the minimum required size for a surface which would take into
account the alignment requirement for the height. The blob driver
Qualcomm didn’t seem to have this problem, even though its getting the
exact same buffer from gralloc. So it must be doing something different
to handle the none aligned height.
Because this issue relied on gralloc, the application needed to running as an Android APK to get a surface from gralloc. The best way to fix this issue would be to figure out what the blob driver is doing and try to replicate this behavior in Freedreno (assuming it isn’t doing something silly like switch to sysmem rendering). Unfortunately it didn’t look like the libwrap library worked to trace an APK.
The libwrap library relied on a linux feature known as
LD_PRELOAD
to load libwrap.so
when the
application starts and replace the system functions like
open
and ioctl
with their own implementation
that traces what is being submitted to the KGSL kernel mode driver.
Thankfully android exposes this LD_PRELOAD
mechanism
through its “wrap” interface where you create a propety called
wrap.<app-name>
with a value
LD_PRELOAD=<path to libwrap.so>
. Android will then
load your library like would be done in a normal linux shell. If you
tried to do this with libwrap though you find very quickly that you
would get corrupted traces. When android launches your APK, it doesn’t
only launch your application, there are different threads for different
android system related functions and some of them can also use OpenGL.
The libwrap library is not designed to handle multiple threads using
KGSL at the same time. After discovering this issue I created a MR
that would store the tracing file handles as TLS (thread local storage)
preventing the clobbering of the trace file, and also allowing you to
view the traces generated by different threads separately from each
other.
With this is in hand one could begin investing what the blob driver is doing to handle this unaligned surfaces.
What’s next?
Well the next obvious thing to fix is the aligned height issue which is still open. I’ve also worked on upstreaming my changes with this WIP MR.