It depends. They’re simply the most annoying drives out there because Seagate, in their wisdom, decided to remove half of the SMART data from reports, and they won’t let you change the power settings like other drives. Those drives will never spin down; they’ll even report to the system that they’re spun down while in fact they’re still running at a lower speed. They also make a LOT of noise.
Aren’t they meant to go in data centers? You wouldn’t want a drive in a data center to spin down. That introduces latency in getting the data off of them.
That should be a choice of the OS / controller card, not of the drive itself. Also, what datacenter wants to run drives that don’t report half of the SMART data just because they felt like it?
We used to use SMART data to predict when to order new drives and, on really bad-looking days, increase our redundancy. Nothing like getting a bad series of drives for PB of data to make you paranoid, I guess.
What kind of attributes did you find relevant? I imagine the 19x codes…
I’ve read the Backblaze statistics and I’m using a tool (Scrutiny) that takes those stats into account for computing failure probability, but at the end of the day the most reliable tell is when a drive gets kicked out of an array (and/or can’t pass the long SMART test anymore).
Meanwhile, I have drives with “lesser” attributes sitting on warning values (like command timeout) and ofc I monitor them and have good drives on standby, but they still seem to chug along fine for now.
I have three 14TB Exos drives in a Rosewill 4U hot-swap chassis, running Unraid.
It’s nearly inaudible over the very reasonable case fans. No grinding noises. I can hear the heads moving a bit but it’s quite subtle. Not sure why people have such different experiences with these.
I noticed when they first spin up on boot they do some subroutine and they’re pretty loud and chatty. First time I heard it I was spooked, but it worked fine and I just use it for backup, so I moved on. Once it’s on and in normal operation it’s like any other disk I’ve used over the decades. Nothing as loud as an old SCSI disk or a Quantum Fireball.
Relevant documentation for others about -S / spindown_time:
Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes.
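For anyone who wants to sanity-check a value before setting it, here is a minimal Python sketch of the encoding quoted above. The function name is made up for illustration, and it only covers the ranges in that excerpt (1 to 252); anything else raises an error instead of guessing.

```python
def spindown_seconds(value: int) -> int:
    """Decode a -S / spindown_time value into seconds.

    Covers only the ranges quoted above; other values are rejected.
    """
    if 1 <= value <= 240:
        # Multiples of 5 seconds: 5 s up to 20 minutes.
        return value * 5
    if 241 <= value <= 251:
        # 1 to 11 units of 30 minutes: 30 minutes up to 5.5 hours.
        return (value - 240) * 30 * 60
    if value == 252:
        # Special case: 21 minutes.
        return 21 * 60
    raise ValueError(f"value {value} is outside the quoted ranges")


if __name__ == "__main__":
    for v in (1, 120, 240, 241, 251, 252):
        print(v, "->", spindown_seconds(v), "seconds")
```

So a setting of 120 would mean 10 minutes, and 242 would mean one hour.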
If you are doing high bandwidth GPU work, then PCIe lanes of consumer CPUs are going to be the bottleneck, as they generally only support 16 lanes.
Then there are the Threadrippers, Xeons and all the server/professional class CPUs that will do 40+ lanes of PCIe.
A lane of PCIe 3.0 is about 1 GB/s (bytes, not bits).
So, if you know your workload and bandwidth requirements, then you can work from that.
If you don’t need the full 16 lanes per GPU, then a motherboard that supports bifurcation will allow you to run 4 GPUs with 4 lanes each from a CPU that has 16 lanes of PCIe. That’s 4 GB/s per GPU, or 32 Gb/s.
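The arithmetic is easy to redo for other setups. A rough Python sketch, assuming the round figure of about 1 GB/s per PCIe 3.0 lane from above and an even split of lanes (the function name is just for illustration, and real boards only offer certain bifurcation layouts):

```python
def per_gpu_bandwidth(total_lanes: int, gpus: int, gb_per_lane: float = 1.0):
    """Return (lanes per GPU, GB/s per GPU, Gb/s per GPU) for an even split.

    gb_per_lane defaults to the ~1 GB/s PCIe 3.0 approximation.
    """
    lanes_each = total_lanes // gpus          # whole lanes only
    gbytes_per_s = lanes_each * gb_per_lane   # gigabytes per second
    gbits_per_s = gbytes_per_s * 8            # gigabits per second
    return lanes_each, gbytes_per_s, gbits_per_s


if __name__ == "__main__":
    print(per_gpu_bandwidth(16, 4))  # 16 CPU lanes shared by 4 GPUs
    print(per_gpu_bandwidth(16, 2))  # same CPU, 2 GPUs
```

With 16 lanes and 4 GPUs that gives 4 lanes, 4 GB/s and 32 Gb/s per card. Pass a different gb_per_lane for other PCIe generations.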
If it’s just for transcoding, and you are running into limitations of consumer GPUs (which I think are limited to 3 simultaneous streams), you could get a pro/server GPU like the Nvidia Quadros, which have a fixed amount of resources but no limit on the number of streams they can process (so, one might be able to do 300 FPS of 1080p; if your content is 1080p 30fps, that’s 10 streams). From that, you can work out bandwidth requirements, and see if you need more than 4 lanes per GPU.
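The stream estimate is just the card's total frame budget divided by the frame rate of your content. A tiny Python sketch, where the 300 FPS figure is the hypothetical number from above and not a spec for any real card:

```python
def max_streams(card_fps_budget: int, content_fps: int) -> int:
    """Whole simultaneous streams a card can sustain at a given frame rate."""
    return card_fps_budget // content_fps


if __name__ == "__main__":
    print(max_streams(300, 30))  # 1080p30 content
    print(max_streams(300, 60))  # 1080p60 content
```

At 30fps that works out to 10 streams; the same budget at 60fps would only cover 5.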
I’m not sure what’s required for AI. I feel like it is similar to crypto mining, massive compute but relatively small amounts of data.
Ultimately, if you think your workload can consume more than 4 lanes per GPU, then you have to think about where that data is coming from. If it’s coming from disk, then you are going to need raid0 NVMe storage which will take up additional PCIe lanes.
5? Holy heck, that’s amazing. I remember helping people that had built streaming rigs to use during the pandemic, and wondering why their production was stuttering and having issues with a bunch of remote callers. Some of that work ended up being CPU bound.
Although, looks like that patch is for Linux? Not much use if you’re running vMix or some other Windows-only software.
In OP’s case, however, that’s not a problem.
Without specific experience, my assumption would be no. Much like when plugging into a desktop computer’s motherboard HDMI port instead of the GPU HDMI port.
I just did a quick bing chat search (“does DRI_PRIME work on systems without a cpu with integrated graphics?”) and it says it will work. I can’t check for you because my CPUs all have graphics.
I CAN tell you that some motherboards will support it (my ASUS does) and some don’t (my MSI).
BTW, I’m talking about Linux. If you’re using Windows, there’s a whole series of hoops you have to jump through. LTT did a video a while back.
While it might work in the OS, setting the OS up may be a pain (the installer may or may not work like that) and I strongly suspect that the BIOS can’t handle it.
I suspect that an easier route would be to use a cheap, maybe older, low-end graphics card for the video output and then using DRI_PRIME with that.
It’s probably a pain to set up in Windows. In Linux, it just works, there’s nothing to set up. I’m using it right now.
OP really should have mentioned their OS.
Edit: Actually, never mind both my posts. I know DRI_PRIME works by using my APU for regular desktop activity, and routing discrete GPU output in whenever a game is being played. But I don’t know if it’s possible to make it use the dGPU all the time.
Even if it did, it would only work inside the OS, so if you had to boot into the BIOS for anything, you wouldn’t have a display. So for all intents and purposes, it wouldn’t really work.
No. The video card is only wired to send video out through its ports (which don’t exist) and the ports on the motherboard are wired to go to the nonexistent iGPU on the CPU.
In Windows you’re not sending the signal directly through another port. You’re sending the dGPU’s signal through the iGPU to get to the port.
On a laptop with NVIDIA Optimus or AMD’s equivalent you can see the increased iGPU usage even though the dGPU is doing the heavy lifting. It’s about 30% usage on my 11th gen i9’s iGPU routing the 3080’s video out to my 4K display.
So next I’d be checking logs for SATA errors, PCIe errors and ZFS kernel module errors. Anything that could shed light on what’s happening. If the system is locking up, could it be some other part of the server with a hardware error: bad RAM, out of memory, a bad or full boot disk, etc.?
19 has federation bugs. Mainly outgoing but I’ve also seen incoming federation gradually fail. Restart the docker container routinely (cron job) until fixes come out.
I don’t have practical experience with ZFS, but my understanding is that it uses RAM a lot… if that’s new, it might be worth checking the RAM by booting up memtest (for example) and just ruling that out.
Maybe also worth watching the system with nmon or htop (running in another tmux / screen pane) at the beginning of the next session, then when you think it’s jammed up, see what looks different…
Awesome, thanks for giving some clues. It’s a new build, but I didn’t focus hugely on RAM, I think it’s only 32GB. I’ll try this out.
Edit: I did some reading about L2ARC, so pending some of these tests, I’m planning to get up to 64GB of RAM and then extend with an L2ARC SSD, assuming no other hardware errors.
Just another thought… Maybe just format the drives as a massive EXT4 JBOD (just for a temp test) and copy the data again - just to see if ZFS is the problem… maybe it’s something else altogether? Maybe - and I hope not - the USB source drive is failing after long reads?
I believe there’s another issue. ZFS has been using nearly all RAM (which is fine, I only need RAM for system and ZFS anyway, there’s nothing else running on this box), but I was pretty convinced while I was looking that I don’t have dedup turned on. Thanks for your suggestions and links!
If you’re running TrueNAS, the replication feature was the smoothest and easiest way to move large amounts of data when I did it 18 months back. Once the destination location was accessible from the sending host, it was as simple as kicking off a snapshot, resulting in a fully usable replica on the receiving host. IIRC, iXsystems staff told me rsync can be problematic compared to the replication/snapshot system, as permissions and other metadata can be lost.
Most of the subscribed communities seem to be working on your instance. When did you subscribe to permacomputing@lemmy.sdf.org? Was it after the upgrade to 0.19.x?
Just to make sure: are you copying to your ZFS pool directory or a dataset? Check to make sure your paths are correct.
Push vs pull shouldn’t matter but I’ve always done push.
If your zpool is not accessible anymore after a transfer then there is a low-level problem here as it shouldn’t just disappear.
I would install tmux on your ZFS system and keep a window open with htop, dmesg, and zpool status running, so you can check your system while you copy files. Something that severe should become self-evident pretty quickly.
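One way to set that up without clicking around is to script the panes. This is only a sketch in Python: the session name and the 5 second refresh are made-up choices, and it just prints the tmux commands so you can read them before pasting them into a shell. It assumes tmux, htop, watch and the ZFS tools are installed, and dmesg may need root on some systems.

```python
import shlex

SESSION = "zfs-watch"  # hypothetical session name, pick your own


def monitoring_commands(session: str = SESSION) -> list[list[str]]:
    """Build tmux invocations for a three-pane monitor; nothing is executed."""
    return [
        # Detached session whose first pane runs htop.
        ["tmux", "new-session", "-d", "-s", session, "htop"],
        # Second pane: follow the kernel log as new messages arrive.
        ["tmux", "split-window", "-t", session, "dmesg --follow"],
        # Third pane: refresh the pool status every 5 seconds.
        ["tmux", "split-window", "-t", session, "watch -n 5 zpool status"],
        # Arrange the panes so all three stay visible.
        ["tmux", "select-layout", "-t", session, "tiled"],
    ]


if __name__ == "__main__":
    for cmd in monitoring_commands():
        print(shlex.join(cmd))
```

After running the printed commands, `tmux attach -t zfs-watch` brings the panes up, and you can leave them open while the copy runs.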