---
layout: post
title: "GSoC 2025 Report: Enhancing Debian packages with ROCm GPU acceleration"
date: 2025-09-01
categories: gsoc debian ROCm debian-packaging
author: Spaarsh Thakkar
---

GitLab Salsa: @Spaarsh

GitHub: Spaarsh

Introduction

I am Spaarsh Thakkar, a final-year Computer Science Engineering undergrad from India. My interests lie in research and systems. My recent work has been in and around Graphics Processing Units and I also hold a keen interest in Computer Networks. At the time of writing, I have been an open-source contributor for almost a year.

Proposal Description (as shown on the GSoC Project Profile [1])

Due to Debian’s open-source nature, no Debian package in main can have a proprietary GPU package listed as a dependency. While AI and HPC workloads increasingly rely on GPU acceleration, many Debian packages still focus solely on CUDA, which is proprietary.

With the advent of ROCm, an open-source GPU computing platform, we can now integrate full-fledged AMD GPU support into Debian packages. This will improve the experience of developers working in AI/ML and HPC while positioning Debian as a strong OS choice for GPU-driven workloads. The proposal aims to help solve the aforementioned problem by packaging several ROCm packages for Debian and adding ROCm support to some existing Debian packages.

The deliverables are as follows:

  1. New Debian packages with GPU support
  2. Enhanced GPU support within existing Debian packages
  3. More autopkgtests running on the Debian ROCm CI

Key Objectives

Enable ROCm in:

  1. dbcsr
  2. gloo
  3. cp2k

Publish the following packages to the Debian apt archive:

  1. hipblas-common
  2. hipBLASlt

Work Report

1. Publishing hipblas-common to apt

This objective was successfully completed, resulting in hipblas-common being published in the apt repository [2].

The process involved the following steps:

  1. Filing an Intent To Package (ITP) bug [3]
  2. Pulling the upstream source code repository from GitHub
  3. Adding the debian/ packaging files
  4. Testing the package locally
  5. Creating the corresponding project under rocm-team [4]
  6. Applying the necessary changes
  7. Building the package
  8. Testing it using sbuild
  9. Signing the package files
  10. Uploading the package to the mentors.debian.net archive (the package is now in the official archive) [5]
  11. Addressing review feedback and making changes
  12. Requesting sponsorship [6]
  13. Securing sponsorship, which led to the package being accepted into the experimental branch of apt

Since the beginning of GSoC, the package has also been promoted to the unstable branch [2].


2. DBCSR ROCm and Multi-Arch Support

During my GSoC project, I worked on extending the DBCSR (Distributed Block Compressed Sparse Row) [7] package to improve its ROCm/HIP support and to handle multi-architecture GPU kernels in a way that is practical for both upstream maintainers and Debian package developers.

The code changes can be found in my dbcsr fork [8].

ROCm/HIP Enablement

  • Enabled ROCm backend support in DBCSR through HIP-based builds, allowing GPU acceleration beyond CUDA.
  • Investigated and resolved build issues specific to HIP kernels within DBCSR.

Multi-Architecture GPU Kernel Handling

(The following content was also presented in greater detail at DebConf’25. The presentation video can be found at [9] and the presentation slides at [10].)

  • DBCSR contains GPU kernels that are heavily optimized for specific architectures. By default, these are built for a single target architecture, which poses challenges for packaging where binaries need to support multiple possible GPU targets.
  • Explored different strategies for solving the multi-arch GPU kernel distribution problem, including:

    • Option 1: Fat binaries (embedding multiple GPU architectures into a single binary, with runtime dispatch). This is ideal for end-users but requires deeper changes upstream and is not straightforward with HIP/ROCm.
    • Option 2: Arch-specific libraries (e.g., libdbcsr.gfxXXX.a), where the alternatives system or explicit user selection would determine which one is used. This solves the problem but pushes complexity downstream into packaging and user configuration.
    • Option 3: Prefixed functions inside a single file, where kernels are compiled separately per architecture, functions are renamed with an arch prefix, and runtime logic in DBCSR decides which kernel to invoke. This shifts complexity upstream but could give a clean downstream experience.
  • I critically analyzed these options in the context of Debian packaging and upstream maintainability. Arch-specific .a files introduce exponential dependency complexity. The prefixed-function approach seemed like a plausible way forward, though it requires upstream buy-in.
  • After consulting with my mentor, these concerns were raised as a discussion in the dbcsr repository [11].
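Option 3 above can be illustrated with a minimal C++ sketch. It is not DBCSR code: the kernel signature, the function names (`gfx906_multiply`, `select_kernel`, and so on) and the use of a plain CPU loop in place of real device kernels are all assumptions made for illustration. The only point is the shape of the approach: one symbol per architecture, distinguished by a prefix, with a runtime lookup choosing between them.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical kernel signature: accumulate C += A * B for n x n row-major blocks.
using KernelFn = void (*)(const double* a, const double* b, double* c, std::size_t n);

// Plain CPU loop standing in for a tuned device kernel (assumption for illustration).
static void reference_multiply(const double* a, const double* b, double* c, std::size_t n) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// In a real build, each of these would be compiled separately for its target
// and renamed with the architecture prefix. Here they all share one body.
void gfx906_multiply(const double* a, const double* b, double* c, std::size_t n) {
    reference_multiply(a, b, c, n);
}
void gfx90a_multiply(const double* a, const double* b, double* c, std::size_t n) {
    reference_multiply(a, b, c, n);
}
void gfx1100_multiply(const double* a, const double* b, double* c, std::size_t n) {
    reference_multiply(a, b, c, n);
}

// Runtime dispatch: map the detected architecture name to its prefixed kernel.
KernelFn select_kernel(const std::string& arch) {
    static const std::unordered_map<std::string, KernelFn> kernels = {
        {"gfx906", &gfx906_multiply},
        {"gfx90a", &gfx90a_multiply},
        {"gfx1100", &gfx1100_multiply},
    };
    const auto it = kernels.find(arch);
    if (it == kernels.end()) {
        throw std::runtime_error("no kernel built for architecture " + arch);
    }
    return it->second;
}
```

With this layout a single library can carry kernels for every supported target, and a caller would only need something like `select_kernel(detected_arch)(a, b, c, n)`. How the architecture string is detected at runtime is left out of the sketch.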

Summary

My work involved:

  • Enabling HIP/ROCm support in DBCSR.

  • Prototyping strategies for handling GPU multi-arch builds.

  • Evaluating the trade-offs between upstream maintainability and downstream packaging complexity.


3. gloo, hipification and source code issues

Another package that was targeted was gloo [12]. It is a collective communications library that implements various communication algorithms used in machine learning.

The code changes can be found in my gloo fork [13] (some changes have not been committed at the time of writing).

HIP/ROCm Enablement

  1. Fixing old ROCm CMake functions: The upstream Gloo codebase still used old ROCm CMake functions that began with the hip_ prefix (for example, hip_add_executable). These functions have since been deprecated/removed. I updated the build system to use the modern ROCm CMake equivalents so that the package can build properly in a current ROCm environment.

  2. Debian packaging changes: I modified debian/control to add a new package, libgloo-rocm, in addition to the existing packages. This allows proper separation and handling of ROCm-enabled builds in Debian.

  3. First successful library build: After these changes, I was able to successfully build the library. However, I ran into issues when trying to produce the shared library: there were undefined symbol errors at link time.

Source Code Issue

On investigating the undefined symbol errors, I identified that they came from a lack of explicit template instantiation for some Gloo classes. Since the compiler only emits code for a C++ template where it is used or explicitly instantiated, template members that were never instantiated in the library’s own sources were missing from the shared library.
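As a minimal sketch of what explicit template instantiation looks like in C++ (this is not Gloo code, and `DeviceBuffer` is a hypothetical class invented for illustration), the two `template class` lines at the end force the compiler to emit every member of the template for the listed types, so that a library built from this file exports those symbols even if nothing inside the library uses them:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical templated class, declared the way a library header might declare it.
template <typename T>
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t count);
    T sum() const;

private:
    std::vector<T> data_;
};

// Member definitions. In a real library these would typically sit in a source
// file rather than the header, which is when explicit instantiation matters.
template <typename T>
DeviceBuffer<T>::DeviceBuffer(std::size_t count) : data_(count, T(1)) {
    assert(count > 0);
}

template <typename T>
T DeviceBuffer<T>::sum() const {
    T total = T(0);
    for (const T& value : data_) {
        total += value;
    }
    return total;
}

// Explicit instantiation definitions: emit all members for these types in
// this translation unit. Without them, a user of DeviceBuffer<float> who only
// sees the declarations would hit undefined symbols at link time.
template class DeviceBuffer<float>;
template class DeviceBuffer<int>;
```

Which Gloo classes and element types would need such lines is not something this sketch tries to answer; it only shows the mechanism.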

To solve this, I explored the source code and noticed that the HIP backend code was not written natively. Instead, it was generated from the CUDA backend using a custom hipification script maintained in the repository.

  • I experimented with modifying the HIPification process itself, trying out hipify-perl [14] instead of the repository’s custom Python script.
  • I also tried tweaking the source code in places where template instantiations were missing, so that the ROCm build would correctly export the needed symbols.

Summary

The issue is still unresolved. The core problem lies in how the source code is structured: the HIP backend is almost entirely auto-generated from CUDA code, and the process does not handle template instantiations correctly. Because of this, the Debian package for Gloo with ROCm support is not yet ready for release, and further source-level fixes are required to make the ROCm build reliable.


4. cp2k

CP2K [15] is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems.

HIP/ROCm Enablement

cp2k depends on dbcsr, so HIP/ROCm enablement in cp2k first requires the ROCm-enabled dbcsr [16] package to be completed.

Even though dbcsr isn’t ready yet, it was worthwhile to plan how cp2k will be built with HIP/ROCm once dbcsr is in place. This planning showed that per-architecture libraries provided by the dbcsr package would result in a complicated build process for cp2k.

No changes have been made to this package yet and more concrete steps shall be taken once the dbcsr package work is completed.

Summary

The multi-arch build process for cp2k may be complicated by the one-static-library-per-architecture method used in its dependency, dbcsr.


Auxiliary Work & Activities

While working on the aforementioned GSoC goals, I also worked on a few other things.

  1. libamdhip64-dev bug report [17]

    While trying to enable HIP/ROCm in dbcsr, CMakeDetermineHIPCompiler.cmake was unable to find the HIP runtime CMake package. After going through some similar issues faced by other developers earlier, I decided to file a bug report against the libamdhip64-dev package.

    After discussing the problem with Cory (my mentor) on the bug report and trying the changes he suggested, the issue was resolved.

    It turned out that I was using the wrong compiler: I was using hipcc where gcc was supposed to be used. The bug was closed since it was not due to an issue with the package.

    Cory suggested that I add this info under the ROCm wiki page. It is yet to be done and hopefully I get it done soon.

  2. DebConf25 Talk

    After facing the multi-arch build dilemma with dbcsr (and also getting to know about the issues faced by other fellow package developers), I came to realise that this was more than a packaging, build or programming issue. GPU-packaging was facing a policy issue.

    Hence, I decided to cover this problem in greater detail at my DebConf25 Virtual Presentation under the Outreach Session.

    Shoutout to Cory for his support and Lucas Kanashiro for encouraging me to present my work!

  3. Bi-Weekly AMD ROCm Meetings

    Shortly after the coding period started, Cory began the initiative of Bi-Weekly AMD ROCm Meetings [18]. Being a part of the meetings (I participated in all but one!), seeing the work the other folks are doing and being able to discuss my own problems was a delight.

  4. (Upcoming) IndiaFOSS 2025 Talk

    After understanding the nuances and beauty of the Debian packaging ecosystem in these months, I decided to spread the word about Debian packaging and packaging software in general. My talk [19] on this topic got accepted at the upcoming IndiaFOSS 2025 [20] conference!

    I hope this brings more people towards the packaging ecosystem and to the Debian developer ecosystem.

Conclusion

My GSoC time was fantastic! I plan to complete the work that I have started during my GSoC and beyond. Working with Cory [21] and Utkarsh [22] (a fellow GSoC’25 contributor under Cory) has been a very positive experience.

HIP/ROCm GPU-packaging is in a nascent stage. It is an exciting time to be in this space right now. The problems are new and have not been encountered before (CPU packaging isn’t architecture specific!). The problems we shall face in the coming time, and our solutions to them, will set a precedent for the future.

References

1 : https://summerofcode.withgoogle.com/programs/2025/projects/9s4jUjV0

2 : https://tracker.debian.org/pkg/hipblas-common

3 : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1105114

4 : https://salsa.debian.org/rocm-team

5 : https://packages.debian.org/source/sid/hipblas-common

6 : https://lists.debian.org/debian-ai/2025/05/msg00088.html

7 : https://www.cp2k.org/dbcsr

8 : https://salsa.debian.org/Spaarsh/dbcsr/

9 : https://drive.google.com/file/d/14WQuTMcI-L0lbi3zkUc9pT6RGwwVY0j1/view?usp=sharing

10 : https://docs.google.com/presentation/d/1p-nkHPgg5C5jKGy7ySZ8rts5G2vNFQpQJQ8UySOWgVE

11 : https://github.com/cp2k/dbcsr/discussions/933

12 : https://github.com/pytorch/gloo

13 : https://salsa.debian.org/Spaarsh/gloo

14 : https://tracker.debian.org/pkg/hipify

15 : https://www.cp2k.org/

16 : https://tracker.debian.org/pkg/dbcsr

17 : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1108159

18 : https://lists.debian.org/debian-ai/2025/05/msg00113.html

19 : https://fossunited.org/c/indiafoss/2025/cfp/dpq0b26ece

20 : https://fossunited.org/indiafoss/2025

21 : https://salsa.debian.org/cgmb

22 : https://salsa.debian.org/utk4r-sh