@AlmightySnoo@lemmy.world

AlmightySnoo

@AlmightySnoo@lemmy.world

Yoko, Shinobu ni, eto… 🤔

עַם יִשְׂרָאֵל חַי Slava Ukraini 🇺🇦 ❤️ 🇮🇱

This profile is from a federated server and may be incomplete. Browse more on the original instance.

AlmightySnoo, (edited )

HIP is amazing. For everyone saying “nah it can’t be the same, CUDA rulez”, just try it: it works on NVidia GPUs too (there are basically macros and stuff that remap everything to CUDA API calls), so if you code for HIP you’re basically targeting at least two GPU vendors. ROCm is the only framework that allows me to do GPGPU programming in CUDA style on a thin laptop sporting an AMD APU while still enjoying 6 to 8 hours of battery life when I don’t do GPU stuff. With CUDA, in terms of mobility, the only choices you get are a beefy and expensive gaming laptop with a pathetic battery life and heating issues, or a light laptop + SSHing into a server with an NVidia GPU.

AlmightySnoo,

ROCm is that its very unstable

That’s true, but ROCm does get better very quickly. Before last summer it was impossible for me to compile and run HIP code on my laptop, and then after one magic update everything worked. I can’t speak for rendering as that’s not my field, but I’ve done plenty of computational code with HIP and the performance was really good.

But my point was more about coding in HIP, not really about using stuff other people made with HIP. If you write your code with HIP in mind from the start, the results are usually good and you get good intuition about the hardware differences (warps for instance are of size 32 on NVidia but can be 32 or 64 on AMD and that makes a difference if your code makes use of warp intrinsics). If however you just use AMD’s CUDA-to-HIP porting tool, then yeah chances are things won’t work on the first run and you need to refine by hand, starting with all the implicit assumptions you made about how the NVidia hardware works.
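To make the warp-size point concrete, here is a small CPU-side simulation in Python. It is not HIP code: shfl_down and warp_reduce_sum are hypothetical stand-ins for a shuffle-down intrinsic and the usual tree reduction built on it, and the lane counts are plain lists. It only sketches what happens when the reduction assumes 32 lanes but the hardware runs 64.

```python
def shfl_down(lanes, offset):
    """Simulated shuffle-down: lane i reads the value held by lane i + offset.
    Lanes whose source would be out of range keep their own value."""
    n = len(lanes)
    return [lanes[i + offset] if i + offset < n else lanes[i] for i in range(n)]


def warp_reduce_sum(lanes, assumed_width):
    """Tree reduction over one warp. Lane 0 ends up holding the sum of the
    first `assumed_width` lanes, which is the full sum only if that width
    matches the real number of lanes."""
    vals = list(lanes)
    offset = assumed_width // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


ones_32 = [1] * 32   # a 32-wide warp
ones_64 = [1] * 64   # a 64-wide warp (possible on AMD)

print(warp_reduce_sum(ones_32, 32))  # width matches the hardware
print(warp_reduce_sum(ones_64, 64))  # width matches the hardware
print(warp_reduce_sum(ones_64, 32))  # hard-coded 32 on a 64-wide warp: half the lanes are missed
```

In real kernels, both CUDA and HIP expose a built-in warpSize variable, so the starting offset can be derived from it instead of being hard-coded.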

AlmightySnoo,

Yup, it’s definitely about the “open-source” part. That’s in contrast with Nvidia’s ecosystem: CUDA and the drivers are proprietary, and the drivers’ EULA prohibits you from using your gaming GPU for datacenter uses.

AlmightySnoo, (edited )

Works out of the box on my laptop (the export below is to force ROCm to accept my APU since it’s not officially supported yet, but the 7900XTX should have official support):

https://lemmy.world/pictrs/image/18fc2c67-2486-4205-bfa1-bcc3df638bfd.png

Last year, only compiling and running your own kernels with hipcc worked on this same laptop. The AMD devs are really doing god’s work here.

AlmightySnoo, (edited )

Hard to tell as it’s really dependent on your use. I’m mostly writing my own kernels (so, as if you’re doing CUDA basically), and doing “scientific ML” (SciML) stuff that doesn’t need anything beyond doing backprop on stuff with matrix multiplications and elementwise nonlinearities and some convolutions, and so far everything works. If you want some specific simple examples from computer vision: ResNet18 and VGG19 work fine.

AlmightySnoo,

They’re worse than us Arch users (btw)

AlmightySnoo, (edited )

It’s a lifelong learning nerding process

AlmightySnoo,

It depends. I’m working in the quant department of a bank and we work on pricing libraries that the traders then use. Since traders often use Excel and expect add-ins, we have a mostly Windows environment. Our head of CI, a huge Windows and PowerShell fan, once decided to add a few Linux (RHEL) servers to run automated Valgrind checks and gcc/clang builds, so that we could continuously test our builds for warnings, undefined behavior (gcc with -O3 does catch a few of them) and stuff.

I thought cool, at least Linux is making it into this department. Then I logged into one of those servers.

The fucker didn’t like the default file system hierarchy, so he created directories like /Applications and /Temp and is installing programs by manually downloading binaries and extracting them there.

AlmightySnoo,

Ubuntu is just Windows in Tux’ clothing

AlmightySnoo,

Bad track record with their privacy invasion via their Amazon shenanigans (which Richard Stallman called the Ubuntu Spyware), the shilling of Ubuntu One cloud and now Ubuntu Pro subscriptions that are reminiscent of Microsoft’s shilling of Microsoft accounts and OneDrive, Snap telemetry…

AlmightySnoo,

Me when someone’s Ubuntu install reaches EOL: just install Arch

AlmightySnoo,

That repo is just pure trolling, read the “Improved performance” section and open some source files and you’ll understand why.

AlmightySnoo, (edited )

Double and triple buffering are techniques in GPU rendering (also used in computing, up to double buffering only though as triple buffering is pointless when headless).

Without them, if you want to do some number crunching on your GPU and have your data on the host (“CPU”) memory, then you’d basically transfer a chunk of that data from the host to a buffer on the device (GPU) memory and then run your GPU algorithm on it. There’s one big issue here: during the memory transfer, your GPU is idle because you’re waiting for the copy to finish, so you’re wasting precious GPU compute.

So GPU programmers came up with a trick to try to reduce or even hide that latency: double buffering. As the name suggests, the idea is to have not just one but two buffers of the same size allocated on your GPU. Let’s call them buffer_0 and buffer_1. The idea is that if your algorithm is iterative, and you have a bunch of chunks on your host memory on which you want to apply that same GPU code, then you could for example at the first iteration take a chunk from host memory and send it to buffer_0, then run your GPU code asynchronously on that buffer. While it’s running, your CPU has the control back and it can do something else. Here you prepare immediately for the next iteration, you pick another chunk and send it asynchronously to buffer_1. When the previous asynchronous kernel run is finished, you rerun the same kernel but this time on buffer_1, again asynchronously. Then you copy, asynchronously again, another chunk from the host to buffer_0 this time and you keep swapping the buffers like this for the rest of your loop.
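The buffer swapping described above can be sketched on the CPU in plain Python. This is only an illustration of the scheduling, assuming equally sized chunks: copy_to_device and kernel are hypothetical stand-ins for an asynchronous host-to-device copy and a GPU kernel launch, and a thread pool plays the role of the asynchronous queue. Python threads won’t give a real speedup here; the point is the order of operations.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_to_device(chunk, buffer):
    """Stand-in for an asynchronous host->device copy: fill `buffer` in place."""
    buffer[:] = chunk


def kernel(buffer):
    """Stand-in for the GPU kernel: here, just the sum of squares."""
    return sum(x * x for x in buffer)


def process_double_buffered(chunks, chunk_size):
    """Run `kernel` on every chunk while the next chunk is being copied."""
    buffers = [[0] * chunk_size, [0] * chunk_size]  # buffer_0 and buffer_1
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=2) as pool:
        # First iteration: send the first chunk to buffer_0.
        copy = pool.submit(copy_to_device, chunks[0], buffers[0])
        for i in range(len(chunks)):
            current = i % 2
            copy.result()  # the current buffer now holds chunk i
            run = pool.submit(kernel, buffers[current])
            if i + 1 < len(chunks):
                # While the kernel runs, copy the next chunk into the other buffer.
                copy = pool.submit(copy_to_device, chunks[i + 1], buffers[1 - current])
            # Wait for the kernel before its buffer gets reused two steps later.
            results.append(run.result())
    return results


print(process_double_buffered([[1, 2], [3, 4], [5, 6]], chunk_size=2))
```

Each buffer is only overwritten after the kernel that was reading it has finished, which is what makes the swap safe.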

Now some GPU programmers don’t want to just compute stuff, they also might want to render stuff on the screen. So what happens when they try to copy from one of those buffers to the screen? It depends: if they copy in a synchronous way, we get the initial latency problem back. If they copy asynchronously, the host->GPU copy and/or the GPU kernel will keep overwriting buffers before they finish rendering on the screen, which will cause tearing.

So those programmers pushed the double buffering idea a bit further: just add an additional buffer to hide the latency from sending stuff to the screen, and that gives us triple buffering. You can guess how this one will work because it’s exactly the same principle.
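One common way to arrange the three buffers, sketched in Python under my own assumptions (the TripleBuffer class and its publish/present methods are hypothetical names, not any real graphics API): one buffer is being shown on screen, one holds the latest finished frame, and one is being written to. The writer never touches the buffer currently on screen, so there is no tearing and nobody has to wait.

```python
class TripleBuffer:
    """Minimal triple-buffer bookkeeping: three slots with rotating roles."""

    def __init__(self):
        self.buffers = [None, None, None]
        self.front = 0   # currently being shown on screen
        self.ready = 1   # latest finished frame, waiting to be shown
        self.back = 2    # currently being written by the producer
        self.has_new_frame = False

    def publish(self, frame):
        """Producer finished a frame: store it, then swap `back` and `ready`."""
        self.buffers[self.back] = frame
        self.back, self.ready = self.ready, self.back
        self.has_new_frame = True

    def present(self):
        """Screen refresh: show the newest finished frame if there is one."""
        if self.has_new_frame:
            self.front, self.ready = self.ready, self.front
            self.has_new_frame = False
        return self.buffers[self.front]


tb = TripleBuffer()
tb.publish("frame 1")
tb.publish("frame 2")        # overwrites nothing the screen is using
print(tb.present())          # the screen picks up the newest finished frame
tb.publish("frame 3")        # written while "frame 2" stays on screen
print(tb.present())
```

If the producer is faster than the screen, older finished frames are simply dropped, as with "frame 1" in the usage above.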

AlmightySnoo, (edited )

Biased opinion here as I haven’t used GNOME since they made the switch to version 3 and I dislike it a lot: the animations are so slow that they demand a good GPU with high vRAM speed to hide that and thus they need to borrow techniques from game/GPU programming to make GNOME more fluid for users with less beefy cards.

AlmightySnoo,

It’s as if you are in an isekai

AlmightySnoo,

You just need more EXP to unlock the Appraisal skill

AlmightySnoo,

they obviously have upscalers in their brains

AlmightySnoo,

but some of these processes involve going through Excel files which can take these bots 10s of minutes, which can be done instantly in any scripting language

The key is being proactive. Have you tried suggesting that to them? Do a small POC with say a Python script and show them the difference on one of the Excel files, they’re likely to like your alternative. They’re likely to have poor data warehousing too and it could be an opportunity for you to shine and at the same time get to learn to do that for them from scratch.

AlmightySnoo,

If he’s a contractor it’s unlikely he’ll stay there for too long. I’d bring up the improvements and potential gains (faster processing, ideally no more UiPath license costs) directly with your boss. If they’re still not open to that then yeah I’d look elsewhere, because even as an IT automation job it just screams laziness.

AlmightySnoo, (edited )

“these bots”: Yeah, you are being an asshole

I’m pretty sure he didn’t mean his colleagues and is rather talking about the UiPath bots, it’s an IT automation tool… 🤖

AlmightySnoo, (edited )

“With the new Desktop Cube, you can switch between workspaces in 3D. Your app windows float off the desktop surface with a parallax effect, so you can see behind them,” said the Zorin OS team. “There’s also the new Spatial Window Switcher, which replaces the standard flat Alt+Tab and Super+Tab dialog with a 3D window switcher.”

Compiz Fusion is an idea and ideas never die
