selfhosted


Cyber, in Question - ZFS and rsync

I don't have practical experience with ZFS, but my understanding is that it uses a lot of RAM… if the RAM is new, it might be worth checking it by booting memtest (for example) and just ruling that out.

Maybe also worth watching the system with nmon or htop (running in another tmux / screen pane) at the beginning of the next session, then when you think it’s jammed up, see what looks different…

isles, (edited )

Awesome, thanks for giving some clues. It’s a new build, but I didn’t focus hugely on RAM, I think it’s only 32GB. I’ll try this out.

Edit: I did some reading about L2ARC, so pending some of these tests, I'm planning to get up to 64GB of RAM and then extend with an L2ARC SSD, assuming no other hardware errors.

sonstwas, (edited )

Based on this thread it’s the deduplication that requires a lot of RAM.

See also: wiki.freebsd.org/ZFSTuningGuide

Edit: from my understanding the pool shouldn't become inaccessible though, only slow. So there might be another issue.

Edit2: here’s a guide to check whether your system is limited by zfs’ memory consumption: github.com/openzfs/zfs/issues/10251

Cyber,

Just another thought… Maybe just format the drives as a massive EXT4 JBOD (just for a temp test) and copy the data again - just to see if ZFS is the problem… maybe it’s something else altogether? Maybe - and I hope not - the USB source drive is failing after long reads?

isles,

I believe there’s another issue. ZFS has been using nearly all RAM (which is fine, I only need RAM for system and ZFS anyway, there’s nothing else running on this box), but I was pretty convinced while I was looking that I don’t have dedup turned on. Thanks for your suggestions and links!
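
For anyone else wanting to double-check the same things, this is roughly what I'd run ("tank" is a placeholder pool name; arc_summary ships with the OpenZFS userland tools on most distros):

    zpool list tank          # the DEDUP column reads 1.00x when deduplication isn't in use
    zfs get dedup tank       # shows whether dedup is enabled on the dataset
    arc_summary | head -40   # current ARC size and target, i.e. how much RAM ZFS is actually holding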

key, in No posts when surfing through my instance
@key@lemmy.keychat.org avatar

0.19 has federation bugs. Mainly outgoing, but I've also seen incoming federation gradually fail. Restart the Docker container routinely (cron job) until fixes come out.
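
A line like this in root's crontab works as a stopgap (the container name lemmy is just a guess, use whatever your compose stack calls it):

    0 4 * * * /usr/bin/docker restart lemmy >/dev/null 2>&1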

Valmond, (edited )

Ouch, thank you 🥲!

How often do you restart it / what's it doing / any idea what's no longer working or why?

Good luck to the developers!

And thank you obviously!

Valmond, (edited ) in No posts when surfing through my instance

On the support community on lemmy.world I get:

Socket timeout has expired [the url link, socket_timeout=10000]

Maybe I should just reboot or something, but I'd rather understand any underlying problem first…

martini1992, in Question - ZFS and rsync
@martini1992@lemmy.ml avatar

The drives in the zpool, are they SMR drives? Slow write speeds and disks dropping out are a symptom of that, if I remember correctly.

isles,

They’re Seagate Exos, www.seagate.com/products/cmr-smr-list/ and appear to be CMR

martini1992,
@martini1992@lemmy.ml avatar

So next I'd be checking logs for SATA errors, PCIe errors and ZFS kernel module errors. Anything that could shed light on what's happening. If the system is locking up, could it be some other part of the server with a hardware error: bad RAM, out of memory, a bad or full boot disk, etc.?
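
Roughly, something like this is where I'd start (run as root; pool and device names are placeholders):

    dmesg -T | grep -iE 'ata|nvme|pci|error'   # kernel messages: link resets, SATA/PCIe errors
    journalctl -k -p err -b                    # kernel errors from the current boot
    zpool status -v tank                       # ZFS's own view: degraded vdevs, read/write/cksum errors
    smartctl -a /dev/sdX                       # SMART health of each member disk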

bdonvr,

I don’t think they make SMR drives that big

fuckwit_mcbumcrumble, in Hardware question

No. The video card is only wired to send video out through its ports (which don't exist), and the ports on the motherboard are wired to go to the nonexistent iGPU on the CPU.

Appoxo,
@Appoxo@lemmy.dbzer0.com avatar

Depends. You can send the signal in Windows through another port.
But if it works without an iGPU…

fuckwit_mcbumcrumble,

In Windows you're not sending the signal directly through another port. You're sending the dGPU's signal through the iGPU to get to the port.

On a laptop with Nvidia Optimus or AMD's equivalent you can see the increased iGPU usage even though the dGPU is doing the heavy lifting. It's about 30% usage on my 11th gen i9's iGPU routing the 3080's video out to my 4K display.

Appoxo,
@Appoxo@lemmy.dbzer0.com avatar

In that case nevermind.
Carry on.

OpticalMoose, in Hardware question
@OpticalMoose@discuss.tchncs.de avatar

I just did a quick Bing Chat search (“does DRI_PRIME work on systems without a CPU with integrated graphics?”) and it says it will work. I can't check for you because my CPUs all have graphics.

I CAN tell you that some motherboards will support it (my ASUS does) and some don’t (my MSI).

BTW, I’m talking about Linux. If you’re using Windows, there’s a whole series of hoops you have to jump through. LTT did a video a while back.
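
For reference, basic DRI_PRIME usage on Linux looks like this (a rough sketch; the reported renderer names will differ per system, and steam is just an example program):

    glxinfo -B | grep 'OpenGL renderer'               # default GPU
    DRI_PRIME=1 glxinfo -B | grep 'OpenGL renderer'   # should now report the discrete GPU
    DRI_PRIME=1 steam                                 # run a single program on the dGPU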

tal,
@tal@lemmy.today avatar

While it might work in the OS, setting the OS up may be a pain (the installer may or may not work like that) and I strongly suspect that the BIOS can’t handle it.

I suspect that an easier route would be to use a cheap, maybe older, low-end graphics card for the video output and then use DRI_PRIME with that.

OpticalMoose, (edited )
@OpticalMoose@discuss.tchncs.de avatar

It’s probably a pain to set up in Windows. In Linux, it just works, there’s nothing to set up. I’m using it right now.

OP really should have mentioned their OS.

Edit: Actually, nevermind both my posts. I know DRI_PRIME works by using my APU for regular desktop activity, and routing discrete GPU output in whenever a game is being played. But I don’t know if it’s possible to make it use the dGPU all the time.

Even if it did, it would only work inside the OS, so if you had to boot into the BIOS for anything, you wouldn’t have a display. So for all intents and purposes, it wouldn’t really work.

subtext, in Hardware question

Without specific experience, my assumption would be no. Much like when plugging into a desktop computer’s motherboard HDMI port instead of the GPU HDMI port.

shnizmuffin, in Hardware question
@shnizmuffin@lemmy.inbutts.lol avatar

Probably not!

What models of GPU and Motherboard are you using?

AimlessNameless,

I’ve got an Nvidia Tesla P40 and haven’t purchased a motherboard yet. It’s currently sitting and doing nothing in my DL380.

bigredgiraffe,

Do you want to stop using your DL380? If not, it might make a good Moonlight host!

AimlessNameless,

My DL380 draws about 200W idle so I’m trying to downscale

towerful, (edited ) in How would you build a GPU-heavy node?

If you are doing high-bandwidth GPU work, then the PCIe lanes of consumer CPUs are going to be the bottleneck, as they generally only support 16 lanes.
Then there are the Threadrippers, Xeons and all the server/professional-class CPUs that will do 40+ lanes of PCIe.

A lane of PCIe 3.0 is about 1 GB/s (bytes, not bits).
So, if you know your workload and bandwidth requirements, then you can work from that.
If you don't need the full 16 lanes per GPU, then a motherboard that supports bifurcation will allow you to run 4 GPUs with 4 lanes each from a CPU that has 16 lanes of PCIe. That's 4 GB/s per GPU, or 32 Gb/s.
If it's just for transcoding, and you are running into the limitations of consumer GPUs (which I think are limited to 3 simultaneous streams), you could get a pro/server GPU like the Nvidia Quadros, which have a certain amount of resources but are unlimited in the number of streams they can process (so, one might be able to do 300 FPS of 1080p; if your content is 1080p 30fps, that's 10 streams). From that, you can work out bandwidth requirements, and see if you need more than 4 lanes per GPU.

I’m not sure what’s required for AI. I feel like it is similar to crypto mining, massive compute but relatively small amounts of data.

Ultimately, if you think your workload can consume more than 4 lanes per GPU, then you have to think about where that data is coming from. If it's coming from disk, then you are going to need RAID0 NVMe storage, which will take up additional PCIe lanes.
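
If you want to sanity-check an existing box, the negotiated link width/speed and the live PCIe throughput are easy to read out (the PCI address below is a placeholder, check yours with lspci; the nvidia-smi metric group is from memory):

    sudo lspci -vv -s 01:00.0 | grep LnkSta   # negotiated PCIe speed and width for the GPU
    nvidia-smi dmon -s t                      # live PCIe rx/tx throughput per GPU, in MB/s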

ielisa,

The Nvidia transcode limit is 5 for consumer GPUs these days, and it's very easy to lift that limit if you need to with github.com/keylase/nvidia-patch
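
If I remember the repo layout right, it's just a matter of cloning it and running the patch script as root, something like:

    git clone https://github.com/keylase/nvidia-patch
    cd nvidia-patch
    sudo bash ./patch.sh   # patches the installed driver's NVENC session limit

but check the README against your driver version first.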

towerful,

5? Holy heck, that's amazing. I remember helping people that had built streaming rigs to use during the pandemic, and wondering why their production was stuttering and having issues with a bunch of remote callers. Some of that work ended up being CPU bound.
Although, it looks like that patch is for Linux? Not much use if you're running vMix or some other Windows-only software.
In OP's case, however, that's not a problem.

ielisa,

I think you can get it to work with Windows somehow, but I've never needed to try: github.com/keylase/nvidia-patch/issues/520

grue,

I’m not sure what’s required for AI. I feel like it is similar to crypto mining, massive compute but relatively small amounts of data.

If you’re talking about training models, I think it requires both massive compute and massive amounts of data.

Dark_Arc, in Suggestions for Short Rack Mount Case
@Dark_Arc@social.packetloss.gg avatar

www.amazon.com/gp/aw/d/B09227RQV2?psc=1&ref=p…

This is my favorite rack mount chassis I’ve worked with … and it coincidentally is in that ballpark.

Lettuceeatlettuce,
@Lettuceeatlettuce@lemmy.ml avatar

Interesting design, I’ll look at it, thanks!

TCB13, in Is this Seagate Exos drive too good to be true?
@TCB13@lemmy.world avatar

It depends. They're simply the most annoying drives out there, because Seagate in their wisdom decided to remove half of the SMART data from reports, and they won't let you change the power settings like other drives do. Those drives will never spin down; they'll even report to the system that they're spun down while in fact they're still running at a lower speed. They also make a LOT of noise.

hperrin,

Aren’t they meant to go in data centers? You wouldn’t want a drive in a data center to spin down. That introduces latency in getting the data off of them.

TCB13,
@TCB13@lemmy.world avatar

That should be a choice of the OS / controller card, not of the drive itself. Also, what datacenter wants to run drives that don't report half of the SMART data just because they felt like it?

lemmyvore,

Data centers replace drives when they fail and that’s about it. They don’t care much about SMART data.

fruitycoder,

We used to use smart data to predict when to order new drives and on really bad looking days increase our redundancy. Nothing like getting a bad series of drives for PB of data to make you paranoid I guess.

lemmyvore,

What kind of attributes did you find relevant? I imagine the 19x codes…

I've read the Backblaze statistics and I'm using a tool (Scrutiny) that takes those stats into account for computing failure probability, but at the end of the day the most reliable tell is when a drive gets kicked out of an array (and/or can't pass the long SMART test anymore).

Meanwhile, I have drives with “lesser” attributes sitting on warning values (like command timeout) and ofc I monitor them and have good drives on standby, but they still seem to chug along fine for now.

ScreaminOctopus,

I got a set off ebay, Jesus christ they’re loud. I ended up returning them cause I could hear the grinding through my whole house

Lem453,

I have 3 14TB Exos drives. I have them in a Rosewill 4U hot-swap chassis, running Unraid.

It’s nearly inaudible over the very reasonable case fans. No grinding noises. I can hear the heads moving a bit but it’s quite subtle. Not sure why people have such different experiences with these

TCB13,
@TCB13@lemmy.world avatar

I’m questioning your auditory acuity :P

czardestructo,
@czardestructo@lemmy.world avatar

I noticed that when they first spin up on boot they run some subroutine and they're pretty loud and chatty. The first time I heard it I was spooked, but it worked fine and I just use it for backup, so I moved on. Once it's on and in normal operation it's like any other disk I've used over the decades. Nothing as loud as an old SCSI disk or a Quantum Fireball.

TCB13,
@TCB13@lemmy.world avatar

Ahaha that’s about what they do.

czardestructo, (edited )
@czardestructo@lemmy.world avatar

I have an Exos X16 and an X18 drive and they both spin down fine in Debian using hdparm. I use them for cold storage and they're perfectly adequate.

TCB13,
@TCB13@lemmy.world avatar

Care to share your hdparm config then?

czardestructo,
@czardestructo@lemmy.world avatar

It's really boring, Debian 12:

/dev/disk/by-uuid/8f041da5-6f7a-4ff5-befa-2d3cc61a382c {
    spindown_time = 241
    write_cache = off
}

TCB13,
@TCB13@lemmy.world avatar

Tried that and doesn’t seem to work. :(

Relevant documentation for others about -S / spindown_time:

Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes.
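
For a one-off test (without touching /etc/hdparm.conf), these should show whether the drive actually honours the timeout; sdX is a placeholder:

    sudo hdparm -S 241 /dev/sdX   # spin down after 30 minutes of idle
    sudo hdparm -y /dev/sdX       # force standby right now, to see if the drive obeys at all
    sudo hdparm -C /dev/sdX       # report the current power state (active/idle vs standby)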

redcalcium, in Help needed setting up NGINX reverse Proxy / HA / Vaultwarden using Duckdns

What happened when you tried to open it on incognito mode / private browsing mode?

Btw, if you're using Chrome, you can type thisisunsafe to bypass the HSTS warning if nothing else works.

Lobotomie,

If I close the 8123 port and clear my cache, Firefox will warn me; if I click forward anyway it gets forwarded to a page from my router for some reason, saying that DNS rebind protection has blocked my attempt and that there is some issue with the host header.

redcalcium,

Instead of forwarding ha.yourdomain.com to 192.168.178.214 (which I assume is the LAN IP address for your machine), you should forward it to a hostname called homeassistant (which is the hostname for the Home Assistant instance inside your Docker Compose network).
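
For context, a minimal nginx server block for Home Assistant behind a reverse proxy usually looks something like the sketch below (certificate paths omitted; the upstream name homeassistant depends on your compose file, and HA itself also needs the proxy listed under trusted_proxies in configuration.yaml):

    server {
        listen 443 ssl;
        server_name ha.yourdomain.com;

        location / {
            proxy_pass http://homeassistant:8123;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            # websocket support, which the HA frontend needs
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }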

Lobotomie, (edited )

Now I get the error SEC_ERROR_UNKNOWN_ISSUER, and if I continue it again goes to my router with the DNS rebind / host header issue.

Codilingus, in Suggestions for Short Rack Mount Case

Chenbro makes quite a few ATX shallow rack mount cases. I have one and have no complaints.

Lettuceeatlettuce,
@Lettuceeatlettuce@lemmy.ml avatar

I’ll check them, Ty!

ninjan, in Is this Seagate Exos drive too good to be true?

It's just the cheapest type of drive there is. The use case is in large-scale RAIDs where one disk failing isn't a big issue. They tend to have a decent warranty, but under heavy load they're not expected to last multiple years. Personally I use drives like this, but I make sure to have them in a RAID and with backups; anything else would be foolish. Do also note that expensive NAS drives aren't guaranteed to last either, so a RAID is always recommended.

rosa666parks,

Ok cool, I plan on using them in RAID Z1

RunningInRVA,

Make that RAID Z2 my friend. One disk of redundancy is simply not enough. If a disk fails while resilvering, which can and does happen, then your entire array is lost.
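
For what it's worth, the only difference at pool-creation time is the raidz level keyword. Device names below are placeholders (using /dev/disk/by-id paths is the better practice):

    zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd            # one disk of redundancy
    zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde   # two disks of redundancy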

SexyVetra,

Hard agree. Regret only using Z1 for my own NAS. Nothing's gone wrong yet 🤞 but we've had to replace all the drives once so far, which has led to some buttock clenching.

When I upgrade, I will not be making the same mistake. (Instead I’ll find shiny new mistakes to make)

Archer,

Instead I’ll find shiny new mistakes to make

This should be the community slogan

Atemu,
@Atemu@lemmy.ml avatar

You must be running an incredible HA software stack for uptime increases so far behind the decimal point to matter.

RunningInRVA,

That was uncalled for.

Randelung,

To support this: Backblaze consistently reports much higher failure rates for Seagate drives than all others. I personally don't trust them. All my failed drives are Seagate, but that's anecdotal. See www.backblaze.com/…/hard-drive-test-data and backblaze.com/…/backblaze-drive-stats-for-2022/ (the by-manufacturer graph).

vithigar, (edited )

That tracks with my experience as well. Literally every single Seagate drive I’ve owned has died, while I have decade old WDs that are still trucking along with zero errors. I decided a while back that I was never touching Seagate again.

Passerby6497,

I actually had my first WD failure this past month, a 10tb drive I shucked from an easystore years ago (and a couple moves ago). My Synology dropped the disk and I’ve replaced it, and the other 3 in the NAS bought around the same time are chugging away like champs.

ninjan,

For sure higher, but still not high; we're talking single-digit percentages of failed drives per year with a massive sample size. TCO (total cost of ownership) might still come out ahead for Seagate, given that they are often quite a bit cheaper. Still, drive failures are part of the bargain when you're running your own NAS, so plan for it no matter what drive you end up buying. Which means have cash on hand to buy a new one so you can get back to full integrity as fast as possible. (Best is of course to always have a spare on hand, but that isn't feasible for a lot of us.)

redcalcium, (edited ) in Problem while trying to setup an instance

If you're not familiar with Ansible, I recommend installing your Lemmy instance using Docker Compose or Lemmy Easy Deploy.

If you still want to use Ansible, make sure to use version >= 2.11.0. From the screenshot, chances are you accidentally installed v2.10.
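
Quick way to check, and one way to upgrade (assuming pip is acceptable on your machine; distro packages are often older):

    ansible --version                # the reported core version should be 2.11.0 or newer
    pip3 install --upgrade ansible   # then re-run the playbook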

Infinitus,

Thank you! Will try.
