GSoC 2024 final report: Expanding ROCm support to Debian and its derivatives
This is the final report of my GSoC ‘24 project. In this report, I’ll briefly introduce my contributions to the Debian ROCm Team, which is responsible for maintaining the open source ROCm software stack in Debian. I’ll also share the challenges, experiences and thoughts from my GSoC journey.
Introduction
ROCm, AMD’s open source compute stack for its GPUs, is the most competitive rival to NVIDIA’s proprietary CUDA. Debian already includes the majority of the components in the stack (runtime, compilers, tools and math libraries), but lacks support for high-level end-user applications (e.g., PyTorch and Blender). In this GSoC project, I collaborated with the Debian ROCm Team to bring new ROCm components into Debian’s official archive and to update existing ones, in preparation for updating PyTorch. One can view the overall contributions from my QA page.
Contributions
Getting familiar with Debian packaging
Before starting substantial work on PyTorch, the first thing was to get myself familiar with how to organize, build and upload upstream software into Debian. In this stage, I brought the following packages into Debian:
- jupyter-cache: a utility used to build the ROCm documentation
- myst-nb: a rendering tool to show documents in Jupyter Notebooks
I picked up packaging skills gradually through practice, including but not limited to project structure, Git integration, automation tools and upload procedures. These skills were then applied to, and expanded by, the much more challenging ROCm packaging. A more detailed account of them is given in “Understanding […]” below.
Prepare the dependency
- miopen: DL primitive library for AMD GPUs
Although most of the packages that PyTorch depends on have already been packaged in Debian by the team members, one essential package was missing: MIOpen. PyTorch uses it to accelerate deep learning primitive operations (e.g., convolution, pooling, activation and layer-norm) on AMD GPUs (its counterpart is cuDNN). To package PyTorch, we must have MIOpen available in Debian first.
By typical Debian standards, MIOpen is a relatively complicated package, as it contains several components including offline kernel compilation, JIT, a binary kernel database and a kernel performance database.
For example, upstream MIOpen lacks the --gpu-architecture option for hipRTC, which results in CI failures, so we had to track down the problem and patch in the missing option.
Another case is that the upstream repository ships hand-crafted assembly kernels, a perf DB and a binary kernel cache, which are incompatible with the DFSG for now.
So we had to exclude these files from the source package and modify the runtime library settings accordingly to make MIOpen work without them.
The patching and modifying drove me to dive into MIOpen and gain a deeper understanding of how a DL primitive library works (maybe cuDNN works in a similar way).
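The exclusion step described above can be pictured with a minimal Python sketch. It only shows the idea of filtering a source tree by glob patterns; the patterns and paths are hypothetical and are not taken from the actual MIOpen packaging.

```python
from fnmatch import fnmatch

# Hypothetical patterns for illustration; the real list depends on the upstream tree.
EXCLUDED_PATTERNS = [
    "src/kernels/*.s",       # hand-crafted assembly kernels (assumed location)
    "src/kernels/*.kdb*",    # binary kernel cache (assumed naming)
    "src/kernels/*.db.txt",  # performance database (assumed naming)
]


def split_sources(paths, patterns=EXCLUDED_PATTERNS):
    """Return (kept, excluded) lists for the given relative file paths."""
    kept, excluded = [], []
    for path in paths:
        # A file is excluded as soon as it matches any one pattern.
        if any(fnmatch(path, pattern) for pattern in patterns):
            excluded.append(path)
        else:
            kept.append(path)
    return kept, excluded
```

In Debian packaging, a list like this is commonly recorded in the Files-Excluded field of debian/copyright so that the upstream tarball is repacked without those files; whether every excluded file in MIOpen is handled that way is not something this report states.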
Work in progress: building PyTorch
Currently Debian’s PyTorch is at version 2.1.2, whereas upstream has released 2.4.x with official support for Python 3.12 and with some breaking changes at the CPython level. As PyTorch may be one of the most complicated packages in Debian, we’ve agreed to first update the existing CPU and CUDA versions, and to add ROCm in a second step.
Miscellaneous
Meanwhile, I also packaged some utilities and updated some libraries to the latest upstream version:
- amdsmi: system monitoring and controlling tools
- roct-thunk-interface: a thunk interface to the AMD GPU driver
- rocsparse: a math library for sparse operations
There were also attempts to package composable kernel (CK), but we ran into some compiler issues (maybe bugs) that prevented us from generating FP16 kernels, so we temporarily suspended this work.
Understanding of Debian documents, policies and tools
When packaging upstream software for Debian, the first step is to understand the many documents and tools involved, so as to follow the established conventions and best practices. Although this is challenging, it is mandatory, as these are essential to ensuring package quality in Debian.
Check the license: DFSG
Before doing any substantial work, packagers must check whether the upstream software complies with the Debian Free Software Guidelines (DFSG).
As Debian is one of the most popular Linux distros, we have to ensure all the included packages are free and legal to distribute around the world.
It may be the most “boring” part of packaging a piece of software, because it requires us to check all source files: individual files often contain third-party code that is not expressly listed in the top-level LICENSE or COPYING file.
But there are automation tools like scan-copyrights and licensecheck which allow packagers to efficiently identify outstanding files.
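To make the idea of a per-file license scan concrete, here is a toy Python sketch. It is not how scan-copyrights or licensecheck are implemented; the marker table is a tiny, assumed subset and the function names are made up for illustration.

```python
import os
import re

# A tiny, incomplete marker table (an assumption for this sketch);
# real tools recognise far more licenses and variants.
LICENSE_MARKERS = {
    "MIT": re.compile(r"Permission is hereby granted, free of charge"),
    "Apache-2.0": re.compile(r"Apache License,? Version 2\.0"),
    "GPL": re.compile(r"GNU General Public License"),
}


def guess_license(text):
    """Return the first license whose marker appears in text, else UNKNOWN."""
    for name, pattern in LICENSE_MARKERS.items():
        if pattern.search(text):
            return name
    return "UNKNOWN"


def scan_tree(root, header_bytes=4096):
    """Map each file under root to a license guessed from its first bytes."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                head = handle.read(header_bytes).decode("utf-8", errors="replace")
            results[os.path.relpath(path, root)] = guess_license(head)
    return results
```

Files whose guess differs from the top-level license, or comes back as UNKNOWN, are the ones worth inspecting by hand.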
As a relaxation, Debian allows some DFSG-incompatible packages into its archive, but only in special areas: non-free, non-free-firmware and contrib. Packages in non-free and non-free-firmware are permitted to contain proprietary content or to use incompatible licenses. Packages that are themselves consistent with the DFSG but rely on packages from those areas can be placed in contrib. This design makes Debian flexible enough to offer more software to end users, such as NVIDIA’s GPU driver.
Reproducible build
Debian strives for reproducible builds of its packages: for safety and quality reasons, building the same source code is expected to produce identical artifacts. This ensures that software built by Debian’s official build daemons is identical to what you build yourself when following the same instructions. From the maintainer’s perspective, one should avoid introducing machine-dependent details into the build rules.
There are many tools designed for this goal, and I selected one of them: sbuild.
Based on unshare chroot, it provides a rootless, isolated and reproducible environment in which packages are built.
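The “identical artifacts” requirement can be checked in a very simple way: build twice and compare checksums. The following Python sketch shows only that comparison; the function names are made up for illustration, and it is not part of sbuild or any Debian tool.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(first_artifact, second_artifact):
    """Two builds count as reproducible here if the artifacts are bit-identical."""
    return sha256_of(first_artifact) == sha256_of(second_artifact)
```

A typical use would be to pass the paths of the same .deb file produced by two independent builds. When the digests differ, the usual culprits are machine-dependent details such as embedded timestamps or build paths.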
Package with Git: gbp
Nowadays, almost all software development relies on version control tools, of which Git is the most widely used.
Debian integrates Git into its packaging workflow with gbp (git-buildpackage), a versatile tool that allows developers to check out, update, patch and export packages using native Git branches and commits.
With gbp, one can easily leverage existing Git concepts and approaches to manage Debian packages, in the same way as upstream developers do.
Declarative build: debhelper
There are various build systems used to build a package, e.g. CMake, GNU autotools and plain Makefiles for C/C++ projects.
This forces developers to handle build frontends differently.
Meanwhile, some steps are common to all package builds: symbol export, installation, etc.
To ease the burden of bridging the various frontends and the common backends, one can use the debhelper framework to define the command sequence that will be executed during the build.
It defines the commands declaratively in a Makefile format: developers just give the tool a series of declarative options, and special handling is only needed for corner cases.
Avoid static errors: lintian
All packages in Debian should follow the same requirements (e.g., installation layout, metadata file format).
It is hard to remember all the details.
To ensure consistency, developers rely on the static checking tool lintian to verify that the package to be uploaded meets Debian’s requirements.
lintian also helps developers to better understand the design and philosophy behind these Debian policies, so that they can avoid the same errors next time.
Runtime tests: autopkgtest and its extension for GPU
To meet the quality goal of Debian, packagers usually run the built-in tests during/after building to ensure the package works well.
Debian offers a tool named autopkgtest to automatically run one or more tests in isolated test environments.
Since ROCm serves GPU computing workloads, some extra infrastructure work is needed to enable testing on multiple GPU architectures.
Isolation is needed because we don’t want the host environment to be polluted by the package, and because we want to test in multiple environments, specifically OS releases.
So the Debian ROCm Team maintains a forked version that supports either VM-level or container-level (docker or podman) isolation, providing a virtualized environment in which the test process has exclusive access to the AMD GPU.
Thanks to this, I could run the built-in tests for MIOpen to validate that it works.
Exposed symbols for shared libraries
By default, shared libraries compiled from C++ expose all their symbols (functions, enums, global variables) to linked programs or libraries.
Without dedicated management, the exported symbols may lead to unexpected behaviors for the downstream packages.
Debian uses the dpkg-gensymbols tool to generate a list of the public symbols exported from an ELF file, in order to track backward compatibility between different versions.
However, for packages without symbol management (e.g., MIOpen), it is painful for developers to maintain such a list, as these packages expose all their symbols and produce an extremely long one.
Thus, we have to tag the internal symbols as optional.
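As an illustration of the tagging step, the Python sketch below adds the optional tag to selected entries of a symbols file. It assumes the usual layout in which symbol entries start with a single space and tags are written in parentheses before the symbol name, and it only handles unquoted (mangled) names. The helper name, the package name, the version and the “internal” rule in the usage note are all hypothetical, not the team’s actual tooling.

```python
def tag_optional(symbols_text, is_internal):
    """Add the 'optional' tag to symbol entries for which is_internal(name) is true."""
    out = []
    for line in symbols_text.splitlines():
        # Symbol entries start with one space; header and metadata lines do not.
        if not line.startswith(" ") or not line.strip():
            out.append(line)
            continue
        entry = line[1:]
        if entry.startswith("("):
            # Existing tags look like "(tag1|tag2)symbol@Base version".
            tags, _, rest = entry[1:].partition(")")
        else:
            tags, rest = "", entry
        # Strip the "@Base" style suffix to get the bare symbol name.
        name = rest.split(" ", 1)[0].split("@")[0]
        tag_list = [tag for tag in tags.split("|") if tag]
        if is_internal(name) and "optional" not in tag_list:
            tag_list.insert(0, "optional")
        prefix = "(" + "|".join(tag_list) + ")" if tag_list else ""
        out.append(" " + prefix + rest)
    return "\n".join(out) + "\n"
```

For example, one could treat every symbol that does not start with the `miopen` prefix as internal by passing `lambda name: not name.startswith("miopen")`; that rule is an assumption made for this sketch. Entries that are already tagged are left alone, so the function can be re-run safely.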
Bridge upstream developers and users
Although AMD declares ROCm to be an open source compute stack, it is currently just developed in public. Besides, the upstream ROCm components are not distro-friendly enough to accommodate the requirements of different Linux distros. Therefore, developers have to devote effort to adapting the software with dedicated modifications, e.g., controlling the exposed symbols and excluding DFSG-incompatible files. Furthermore, there is no standard workflow for this step: one has to cope with the inconsistencies and potential bugs package by package, which requires developers to become familiar with each package itself.
The GSoC journey
Thanks to my mentors
First and foremost, I would like to convey my greatest thanks to my mentors: Christian Kastner (@ckk), Cordell Bloor (@cgmb) and Mo Zhou (@lumin). Without them, every one of the aforementioned stages of my GSoC journey would have cost much more time. Christian in particular offered me a lot of help. When I first got in touch with the community, he introduced me to the current status and todos of the Debian ROCm Team in detail. When I prepared my first upload, he taught me the best practices through step-by-step guidelines with comprehensive materials. He also helped check my packages to make sure they met the Debian QA requirements. In summary, I could not have reached this point without my mentors.
Own your own
Open source depends on coordination between people. When working with others, one should act as a qualified collaborator. Meanwhile, open source is volunteer work, i.e., no one else is responsible for your job. You should be the owner of your job, i.e., try as hard as you can to overcome the obstacles you face. If you need help, you should explain the situation to your team clearly, describing what you’ve attempted and what the possible directions are. One should not do half the job and just drop it on the list, waiting for others to continue.
Coordination among distros
ROCm is currently included in many Linux distros like Debian, Fedora and Arch Linux, and developers across these communities run into some common issues and obstacles. So it would be nice to have some cross-distro discussions, so that duplicated work and exploration can be avoided.
Sustainable contributions
This was my first experience contributing to an open source community, and I’m willing to continuously contribute to the debian-ai team and other teams in Debian. From my perspective, doing open source is a kind of enjoyment: it not only makes the infrastructure easier to use for people around the world, but also gives me a sense of self-fulfilment. Besides, it is also a branding activity that shows my expertise to the heterogeneous high-performance computing community.
Summary
To sum up, the last few months have been a wonderful journey for me, during which I made some contributions to the Debian ROCm community with the help of my mentors. I’ll continue to contribute to Debian in the future.