Fedora 40 Looks To Ship AMD ROCm 6 For End-To-End Open-Source GPU Acceleration

ylai@lemmy.ml · 11 months ago

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 11 months ago

HIP is amazing. For everyone saying “nah it can’t be the same, CUDA rulez”, just try it; it works on NVidia GPUs too (there are basically macros and stuff that remap everything to CUDA API calls), so if you code for HIP you’re basically targeting at least two GPU vendors. ROCm is the only framework that allows me to do GPGPU programming in CUDA style on a thin laptop sporting an AMD APU while still enjoying 6 to 8 hours of battery life when I don’t do GPU stuff. With CUDA, in terms of mobility, the only choices you get are a beefy and expensive gaming laptop with a pathetic battery life and heating issues, or a light laptop + SSHing into a server with an NVidia GPU.
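To make the "remap everything to CUDA API calls" idea concrete, here is a toy model in Python. It is only an illustration: the real remapping happens in HIP's C++ headers at compile time, not through text substitution, and the table below is a small hand-picked subset of name pairs. The helper name `remap_to_cuda` is hypothetical.

```python
import re

# Hand-picked HIP -> CUDA runtime name pairs (a tiny, illustrative subset).
HIP_TO_CUDA = {
    "hipMalloc": "cudaMalloc",
    "hipMemcpy": "cudaMemcpy",
    "hipMemcpyHostToDevice": "cudaMemcpyHostToDevice",
    "hipMemcpyDeviceToHost": "cudaMemcpyDeviceToHost",
    "hipFree": "cudaFree",
    "hipDeviceSynchronize": "cudaDeviceSynchronize",
}

# Whole-identifier matches only, longest names listed first.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(HIP_TO_CUDA, key=len, reverse=True)) + r")\b"
)


def remap_to_cuda(source: str) -> str:
    """Rename known HIP runtime identifiers to their CUDA counterparts.

    Hypothetical helper: a toy stand-in for what the HIP headers do when
    the code is built for an NVIDIA target.
    """
    return _PATTERN.sub(lambda m: HIP_TO_CUDA[m.group(1)], source)


hip_snippet = (
    "hipMalloc(&d_x, n); "
    "hipMemcpy(d_x, x, n, hipMemcpyHostToDevice); "
    "hipFree(d_x);"
)
print(remap_to_cuda(hip_snippet))
```

The point is only that the two runtime APIs line up almost name for name, which is why a thin translation layer is enough for a large part of the runtime surface.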

Molecular0079@lemmy.world · 11 months ago

The problem with ROCm is that it’s very unstable and a ton of applications break on it. Darktable only renders half an image on my Radeon 680M laptop. HIP in Blender is also much slower than Optix. We’re still waiting on HIP-RT.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · 11 months ago

“…ROCm is that it’s very unstable”

That’s true, but ROCm does get better very quickly. Before last summer it was impossible for me to compile and run HIP code on my laptop, and then after one magic update everything worked. I can’t speak for rendering as that’s not my field, but I’ve done plenty of computational code with HIP and the performance was really good.

But my point was more about coding in HIP, not really about using stuff other people made with HIP. If you write your code with HIP in mind from the start, the results are usually good and you get good intuition about the hardware differences (warps for instance are of size 32 on NVidia but can be 32 or 64 on AMD and that makes a difference if your code makes use of warp intrinsics). If however you just use AMD’s CUDA-to-HIP porting tool, then yeah chances are things won’t work on the first run and you need to refine by hand, starting with all the implicit assumptions you made about how the NVidia hardware works.
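The warp-size point can be sketched without a GPU. Below is a small Python simulation of a shuffle-down tree reduction inside one warp/wavefront. It is a model, not real HIP code: the function name `warp_reduce_sum` is made up for the example, and out-of-range shuffle reads are modelled as returning the lane's own value.

```python
def warp_reduce_sum(lanes, start_offset):
    """Simulate a shuffle-down tree reduction within one warp/wavefront.

    lanes: per-lane values; len(lanes) plays the role of the hardware warp size.
    start_offset: first shuffle distance. Portable code derives it from the
    warp size (len(lanes) // 2); code written with NVIDIA in mind often
    hardcodes 16, i.e. assumes 32 lanes.
    Returns the value that ends up in lane 0.
    """
    vals = list(lanes)
    width = len(vals)
    offset = start_offset
    while offset > 0:
        # Every lane adds the value held by the lane `offset` positions above it.
        # Reads past the end are modelled as returning the lane's own value.
        vals = [
            vals[i] + (vals[i + offset] if i + offset < width else vals[i])
            for i in range(width)
        ]
        offset //= 2
    return vals[0]


ones_32 = [1] * 32
ones_64 = [1] * 64

print(warp_reduce_sum(ones_32, 16))  # 32 lanes, offset matches the warp size
print(warp_reduce_sum(ones_64, 32))  # 64 lanes, offset derived from the width
print(warp_reduce_sum(ones_64, 16))  # 64 lanes, hardcoded 32-lane assumption
```

In this model the last call only sums the lower 32 lanes, so half of the wavefront's data silently goes missing. That is the kind of implicit assumption that has to be hunted down by hand after an automatic port.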

filister@lemmy.world · 10 months ago

How is the situation with ROCm using consumer GPUs for AI/DL and pytorch? Is it usable or should I stick to NVIDIA? I am planning to buy a GPU in the next 2-3 months and so far I am thinking of getting either 7900XTX or the 4070 Ti Super, and wait to see how the reviews and the AMD pricing will progress.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 10 months ago

Works out of the box on my laptop (the export below is to force ROCm to accept my APU since it’s not officially supported yet, but the 7900XTX should have official support):

Last year only compiling and running your own kernels with hipcc worked on this same laptop, the AMD devs are really doing god’s work here.
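For anyone wanting to run a similar sanity check, here is a minimal sketch. It is not the snippet from the comment above. It assumes the override in question is the `HSA_OVERRIDE_GFX_VERSION` environment variable, which it only reads and never sets, since the right value depends on the GPU. The function name `rocm_report` is hypothetical.

```python
import os


def rocm_report():
    """Collect basic facts about the local PyTorch/ROCm setup without failing.

    Hypothetical helper for illustration; it never modifies the environment.
    """
    report = {
        # Assumption: this is the override variable used for GPUs that ROCm
        # does not officially support. None means it is not set.
        "gfx_override": os.environ.get("HSA_OVERRIDE_GFX_VERSION"),
        "torch_installed": False,
        "hip_version": None,
        "gpu_available": False,
    }
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or its native libraries failed to load.
        return report

    report["torch_installed"] = True
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    report["hip_version"] = getattr(torch.version, "hip", None)
    # ROCm builds of PyTorch expose the GPU through the torch.cuda namespace.
    report["gpu_available"] = bool(torch.cuda.is_available())
    return report


for key, value in rocm_report().items():
    print(f"{key}: {value}")
```

On a working ROCm setup you would expect `hip_version` to be a version string and `gpu_available` to be `True`; on a machine without PyTorch the script just reports that and exits cleanly.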

filister@lemmy.world · edit-2 10 months ago

Anything that is still broken or works better on CUDA? It is really hard to get the whole picture on how things are on ROCm as the majority of people are not using it and in the past I did some tests and it wasn’t working well.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 10 months ago

Hard to tell as it’s really dependent on your use. I’m mostly writing my own kernels (so, as if you’re doing CUDA basically), and doing “scientific ML” (SciML) stuff that doesn’t need anything beyond doing backprop on stuff with matrix multiplications and elementwise nonlinearities and some convolutions, and so far everything works. If you want some specific simple examples from computer vision: ResNet18 and VGG19 work fine.

deafboy@lemmy.world · edit-2 11 months ago

If it means I won’t have to do a ritual dance under the full moon, facing towards Finland, just to get it installed correctly, I welcome my new gentleman overlords.

woelkchen@lemmy.world · edit-2 11 months ago

I never understood why AMD themselves don’t work on integration in Debian and Fedora. That way Ubuntu and RHEL would automatically inherit it. At worst it would be in Universe/EPEL.

LinusWorks4Mo@kbin.social · 10 months ago

Either docker pull rocm/pytorch:latest or yay -S rocm-hip-runtime works reliably for me.

Secret300@sh.itjust.works · 11 months ago

What is “end-to-end GPU Acceleration”? Like for playing back video? Or for rendering stuff like in Blender?

IHeartBadCode@kbin.social · 11 months ago

Data science term. Means everything runs inside the GPU entirely. No CPU or system RAM outside of the (usually Python) interface that started, monitors, and collects the result of the job.

ROCm is AMD’s answer to CUDA, which is the equivalent for nVidia.

Codilingus@sh.itjust.works · 10 months ago

For years I’ve wondered what ROCm was, too lazy to figure it out. Thank you for this!

IHeartBadCode@kbin.social · 10 months ago

Both are vendor specific implementations of processing on GPUs. This is in opposition to open standards like OpenCL, which a lot of the exascale big boys out there mostly use.

nVidia spent a lot of cash on “outreach” to get CUDA into a lot of various packages in R, python, and what not. That did a lot of displacement from OpenCL stuff. These libraries are what a lot of folks spin up on as most of the leg work is done for them in the library. With the exascale rigs, you literally have a team that does nothing but code very specific things on the machine in front of them, so yeah, they go with the thing that is the most portable, but doesn’t exactly yield libraries for us mere mortals to use.

AMD has only recently had the cash to start paying folks to write libs for their stuff. So we’re starting to see it come to python libs and what not. Likely, once it becomes a fight of CUDA v ROCm, people will start heading back over to OpenCL. The “worth it” for vendor lock-in for CUDA and ROCm will diminish more and more over time. But as it stands, with CUDA you do get a good bit of “squeezing that extra bit of steam out of your GPU” by selling your soul to nVidia.

That last part also plays into the “why” of CUDA and ROCm. If you happen to NOT have a rig with 10,000 GPUs, then the difference between getting 98% of your GPU and 99.999% of your GPU means a lot to you. If you do have 10,000 GPUs, having like a 1% inefficiency is okay; you’ve got 10,000 GPUs, so the 1% loss is barely noticeable, and recovering it is not worth losing the portability you get with OpenCL.

Secret300@sh.itjust.works · 10 months ago

Ah okay dope

subtext@lemmy.world · 11 months ago

I think end-to-end refers to the “open source”, not the GPU acceleration. I know getting GPUs to work has always been black magic, and so you often have to use proprietary, closed-source blobs from the manufacturer to get them to work.

The revolution that this is bringing seems to be that all that black magic can now be implemented in open-source software.

Could be wrong though, that’s just how I interpreted the article.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · 11 months ago

Yup, it’s definitely about the “open-source” part. That’s in contrast with Nvidia’s ecosystem: CUDA and the drivers are proprietary, and the drivers’ EULA prohibits you from using your gaming GPU for datacenter uses.

woelkchen@lemmy.world · 11 months ago

Any sort of computing done on the GPU. Not sure what they mean by “end-to-end”. Perhaps that users don’t have to mess with installers.

AutoTL;DR@lemmings.world · 11 months ago

This is the best summary I could come up with:

Fedora 40 is looking at shipping the AMD ROCm 6.x GPU compute stack to offer “end-to-end open-source GPU acceleration” with ease for this Red Hat funded Linux distribution.

Fedora has been among the Linux distributions already working on packaging up AMD’s ROCm to make it easier to deploy this GPU compute solution on their platform.

This has often been a headache for those wanting to use AMD ROCm outside of the few officially supported enterprise Linux distributions.

This change proposal is being pursued by Red Hat’s Tom Rix.

To address this feedback several packages are in the process of being added to Fedora, including rocFFT, rocSolver, hipBLASLt, and MiOpen.

… Fedora has finally end-to-end open source GPU acceleration.

The original article contains 362 words, the summary contains 117 words. Saved 68%. I’m a bot and I’m open source!