ninethehacker.xyz Accelerated Tensorflow on AMD and Pop!_OS


the saga continues


The first step to enabling TensorFlow on AMD is determining whether your GPU is recognised and functioning. Here is a minimal “Hello, GPU” in Python:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means no accelerated backend
cards = tf.config.experimental.list_physical_devices('GPU')
print(f"Found {len(cards)} devices")
```
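Listing a device only shows that it is recognised. A minimal sketch of a “functioning” check, assuming a TensorFlow 2.x install with eager execution: it places a tiny matrix multiply on the first GPU (falling back to the CPU if none is listed) and returns the GPU count alongside the result. The function name `gpu_smoke_test` is just illustrative.

```python
import importlib.util


def gpu_smoke_test():
    """Run a tiny matmul on the first GPU, if any.

    Returns (number of GPUs found, product as nested lists),
    or None when TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices('GPU')
    # Pin the op to the GPU when one is visible, otherwise use the CPU
    device = '/GPU:0' if gpus else '/CPU:0'
    with tf.device(device):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        product = tf.matmul(a, a)
    return len(gpus), product.numpy().tolist()


if __name__ == "__main__":
    print(gpu_smoke_test())
```

Calling `tf.debugging.set_log_device_placement(True)` before the test makes TensorFlow log which device the matmul actually ran on, which is a firmer answer than the device list alone.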

Unfortunately, stock TensorFlow’s GPU acceleration is built on nVidia’s CUDA API: its Eigen GPU kernels and its BLAS calls (cuBLAS) target CUDA, so out of the box it requires nVidia hardware.

This is not the end of the world. There is a Python package called tensorflow-rocm (here) that contains a community port of TensorFlow to ROCm, the Radeon Open eCosystem. It should no longer be necessary, since the patches were upstreamed in September 2019. That suggests you should have accelerated deep learning as soon as the ROCm drivers are successfully installed on your system.
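Because tensorflow-rocm and the stock tensorflow package both provide the same `tensorflow` module, it helps to know which one is actually installed. A small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the list of distribution names it checks is my assumption of the usual suspects, not an exhaustive list.

```python
from importlib import metadata

# Distribution names to look for (an assumed, non-exhaustive list)
CANDIDATES = ("tensorflow", "tensorflow-gpu", "tensorflow-rocm")


def installed_tensorflow_builds(candidates=CANDIDATES):
    """Map each installed TensorFlow distribution name to its version string."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


if __name__ == "__main__":
    builds = installed_tensorflow_builds()
    print(builds or "no TensorFlow distribution found")
    if len(builds) > 1:
        print("warning: several builds share the same 'tensorflow' module")
```

Seeing more than one of these listed usually means one build has overwritten the other’s files; uninstalling all of them and reinstalling just one is the simple way out.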

docker

AMD provides a Docker image here. This is probably the sane, stable way to set up compute on AMD, and the one I would advise anyone with a budget and deadlines to follow. I am hoping to avoid the container overhead (which includes carrying a full Ubuntu 18.04 userland) because I am not currently sane.
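For reference, the container route mostly comes down to one `docker run` invocation that hands the GPU’s device nodes to the container. A sketch that only builds and prints that command; the image name `rocm/tensorflow:latest` and the flags below follow my reading of AMD’s ROCm Docker instructions and are assumptions to check against the current docs.

```python
import shlex


def rocm_docker_command(image="rocm/tensorflow:latest"):
    """Build a `docker run` argument list that exposes an AMD GPU to a container."""
    return [
        "docker", "run", "-it", "--rm",
        "--device=/dev/kfd",     # amdkfd compute interface used by ROCm
        "--device=/dev/dri",     # DRM render nodes for the GPU itself
        "--group-add", "video",  # the device nodes are typically group-owned by video
        image,
    ]


if __name__ == "__main__":
    # Print rather than execute, so the command can be reviewed first
    print(shlex.join(rocm_docker_command()))
```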

native packages

AMD provides official packages in a Debian repo, along with detailed installation instructions (https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html). The packages are marked for xenial (Ubuntu 16.04), but this appears to be an oversight, as the binaries support all versions. However, ROCm currently supports kernels only up to 5.3. Pop!_OS ships kernel 5.4, and support for it is an open issue at AMD.
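The kernel constraint is easy to check before touching APT. A minimal sketch, assuming the 5.3 ceiling quoted above (it will change as AMD updates ROCm); the release strings in the usage note are only illustrative of the Ubuntu/Pop!_OS format.

```python
import platform
import re

# Newest kernel series ROCm supported at the time of writing; update as ROCm moves on
ROCM_MAX_KERNEL = (5, 3)


def kernel_series(release=None):
    """Return (major, minor) from a release string like '5.4.0-7634-generic'."""
    release = release or platform.release()
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def rocm_supports_kernel(release=None):
    """True if the given (or running) kernel is no newer than ROCM_MAX_KERNEL."""
    return kernel_series(release) <= ROCM_MAX_KERNEL


if __name__ == "__main__":
    release = platform.release()
    verdict = "supported" if rocm_supports_kernel(release) else "too new for ROCm"
    print(f"kernel {release}: {verdict}")
```

For example, `rocm_supports_kernel("5.4.0-7634-generic")` is `False`, while `rocm_supports_kernel("5.3.0-46-generic")` is `True`. Comparing `(major, minor)` tuples avoids the classic string-comparison trap where `"5.10"` sorts before `"5.4"`.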

There are still a few options to consider:

  • downgrading the kernel to Canonical’s LTS version (this may have been updated to 5.4 at this point)
  • waiting until AMD patches their drivers for 5.4 and putting a hold on kernel updates in the meantime (not cool from a security standpoint)
  • digging into the rabbit hole of TensorFlow and ROCm docs, pulling the packages from the official repos, and gingerly avoiding the DKMS package that breaks APT (still quite likely to burn my OS to the ground)
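For the third option, a cheap safeguard is to list what actually got installed and flag anything DKMS-shaped before it tries to rebuild modules against the new kernel. A rough sketch: it assumes a Debian-family system with `dpkg-query` and treats any package whose name ends in `-dkms` as suspect; the `rock-dkms` name in the test data is my assumption about what ROCm’s DKMS package is called.

```python
import shutil
import subprocess


def dkms_packages(package_names):
    """Return, sorted, the names that look like DKMS kernel-module packages."""
    return sorted(name for name in package_names if name.endswith("-dkms"))


def installed_packages():
    """List package names known to dpkg (empty if dpkg-query is unavailable)."""
    if shutil.which("dpkg-query") is None:
        return []
    result = subprocess.run(
        ["dpkg-query", "--show", "--showformat=${Package}\n"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    suspects = dkms_packages(installed_packages())
    print(suspects or "no DKMS packages installed")
```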

The Arch User Repository contains a community package that installs ROCm on Arch (and so, presumably, on bleeding-edge kernels). Their efforts and patches could potentially be ported to Debian/Ubuntu/Pop, but it does raise the question: why haven’t I migrated to Manjaro yet?

At this time, the developers of the Arch package have not achieved accelerated machine learning in TensorFlow or PyTorch (see the issue).

LXC?

It’s faster and lighter than Docker, but since containers share the host kernel, I’m not sure how a 5.4 host kernel will interact with AMD’s software. It would still isolate my workstation from broken packages, and I have not noticed significant performance problems running Jupyter in Linux Containers on my notebook. Something to consider.
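If I do try containers, this is roughly what exposing the GPU would look like, assuming LXD (whose command-line client is also called `lxc`) rather than hand-written LXC config files. A sketch that only prints the commands; the container name `tf-rocm` is hypothetical, and whether ROCm inside the container copes with a 5.4 host kernel is still the open question.

```python
import shlex


def lxd_gpu_commands(container="tf-rocm"):
    """Build LXD commands that pass the AMD GPU's device nodes into a container."""
    base = ["lxc", "config", "device", "add", container]
    return [
        # 'gpu' device type: exposes the host's /dev/dri GPU nodes
        base + ["gpu", "gpu"],
        # ROCm also needs /dev/kfd, passed through as a plain character device
        base + ["kfd", "unix-char", "path=/dev/kfd"],
    ]


if __name__ == "__main__":
    for command in lxd_gpu_commands():
        print(shlex.join(command))
```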

future

David Airlie of Red Hat (fellow Aussie!) is currently working on an open, hardware-agnostic version of TensorFlow built against the Khronos SYCL API. Once it is completed and filters through Debian and Canonical (best guess: 2022 or 2024), this should provide reasonably performant, hardware-accelerated TensorFlow out of the box on all platforms with GPUs. Check out his talks in the meantime; they were very informative.


