What’s the best way to make an offsite backup of 42 TB at this point, with 20 Mbps of upload bandwidth? It would take over 6 months to upload even while maxing out my connection.
Maybe I could sneakernet an initial backup then incrementally replicate?
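(For reference, the back-of-envelope math behind that estimate, assuming decimal terabytes and a fully saturated uplink:)

```python
# Rough transfer-time check: 42 TB over a 20 Mbit/s uplink.
data_bits  = 42e12 * 8        # 42 TB (decimal) in bits
uplink_bps = 20e6             # 20 Mbit/s
days = data_bits / uplink_bps / 86_400
print(f"{days:.0f} days, ~{days / 30:.1f} months at full line rate")
# -> ~194 days, i.e. roughly 6.5 months before any protocol overhead
```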
Outside my depth, but I'll give it a stab. Identify what data is important (is the full 42 TB needed?), and whether the data can be split into easier-to-handle chunks.
If it can, then I'd personally do an initial sneakernet run to get the first set of data over, then mirror the differences on a regular basis.
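Rough sketch of what the "seed by sneakernet, then ship only the differences" part could look like, wrapping rsync from Python; the host and path names here are made up, and plain `rsync -a --delete src/ host:dst/` on a schedule does the same job:

```python
import subprocess

# Hypothetical paths/host, for illustration only.
SRC = "/mnt/pool/"                    # local dataset (trailing slash = copy contents)
DST = "backupbox:/mnt/offsite/pool/"  # remote copy that was seeded by sneakernet

# Once the initial drive has been carried offsite and restored, each run only
# transfers files that changed since the last sync; --partial lets interrupted
# large files resume instead of restarting.
subprocess.run(["rsync", "-a", "--delete", "--partial", SRC, DST], check=True)
```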
First thing to do is check the SMART data to see if there are any failures. Then look at power-on hours, spin-ups, and the pre-fail / old-age attributes to get a general idea of how worn the drive is and how long you could keep using it, depending on your risk tolerance.
If there are already several sectors reallocated and multiple spin-up failures, I'd probably return the drive.
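If it helps, here's a minimal sketch of pulling those numbers with smartmontools. It assumes smartctl 7+ (for JSON output) and an ATA/SATA drive, and /dev/sdX is a placeholder; NVMe drives report under different keys, so adjust, or just eyeball `smartctl -a`.

```python
import json
import subprocess

DEVICE = "/dev/sdX"   # placeholder; point at the actual drive

# Dump the SMART attribute table as JSON (smartmontools 7+).
out = subprocess.run(["smartctl", "-j", "-A", DEVICE],
                     capture_output=True, text=True).stdout
table = json.loads(out)["ata_smart_attributes"]["table"]

# The usual wear / health indicators for a used drive.
watch = {5: "Reallocated_Sector_Ct", 9: "Power_On_Hours",
         12: "Power_Cycle_Count", 197: "Current_Pending_Sector"}
for attr in table:
    if attr["id"] in watch:
        print(attr["id"], attr["name"], attr["raw"]["value"])
```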
Apart from all the reliability stuff: I'd check the content of the drive (with a safe machine). If it wasn't wiped, you might want to notify the previous owner so she can change her passwords or notify customers about the leak (in compliance with local regulations), etc. Even if you don't exploit that data, the merchants/dealers in the chain might already have.
My guess would be that it's stored in some kind of non-volatile memory, e.g. an EEPROM. Not sure if anyone has ever tried that, but with the dedication of some hardware hackers it seems at least feasible. Reverse engineering / overriding the HDD's firmware to return fake or manipulated values would be another approach.
I haven't seen anything like that in the wild so far. What I have seen are manipulated USB sticks, though: advertising the wrong size (which can be tested with h2testw) or worse.
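Not h2testw itself, but the same idea fits in a few lines of Python if you ever want to script it: fill the stick with deterministic files, then read them back and compare. The mount point is a placeholder and everything on the stick should be disposable.

```python
import random

MOUNT = "/media/usb_test"        # hypothetical mount point of the stick under test
CHUNK = 64 * 1024 * 1024         # 64 MiB per test file

def pattern(i):                  # deterministic, position-dependent content
    return random.Random(i).randbytes(CHUNK)

count = 0
try:                             # write until the stick reports it is full
    while True:
        with open(f"{MOUNT}/fill_{count:04}.bin", "wb") as f:
            f.write(pattern(count))
        count += 1
except OSError:
    pass                         # disk full (the partial last file is ignored)

bad = 0
for i in range(count):           # a fake-size stick silently wraps around,
    with open(f"{MOUNT}/fill_{i:04}.bin", "rb") as f:  # so earlier files come back corrupted
        if f.read() != pattern(i):
            bad += 1
print(f"{count} files written, {bad} failed verification")
```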
I've bought used / refurbished drives (not sure which) with erased SMART data. It being all zeros was a clear sign of erased / tampered info. After running badblocks, some reallocated sectors showed up.
Just some other suggestions: there's also Syncthing (for backups and syncing devices) and PhotoPrism (like the user who suggested Immich, for a gallery view).
Check out Immich for the photo backups. You can have multiple users with their own personal libraries. My family has Android and iOS backing up to my server right now, and it's super nice to have it all consolidated.
Other than that, I second the Nextcloud option. You can set the Nextcloud app (which is available for all major OSes) to auto-upload pictures. Les Pas is a great way to view and manage a Nextcloud photo library from Android.
Once upon a time I might've suggested FreeNAS/TrueNAS, but now that Unraid supports ZFS there isn't really a compelling reason, except if you can't afford the license fee (which is very reasonable: $59–$129 for a lifetime license depending on the tier you want, and they also sometimes do Black Friday discounts where you can buy a basic license and fully upgrade it). If you don't want ZFS you can run a standard XFS file system that can be accessed by basically anything. Parity, expansion, and all the good stuff.
Look at videos by spaceinvaderone for some examples of what it can do. I have two Pro licenses and it's easily the best computer purchase I've made in the last two decades.
Just saw your additional comment: you can run Docker containers for Plex and more, or full VMs.
Take a look at hosting your own Nextcloud instance. It'll replace Google Drive, Photos, Docs, everything; there are phone apps for iPhone and Android. If you want to store your PC backups on it, that's probably fine too. It might even work OK on the Pi 4 (though some parts it integrates with may have trouble, like Nextcloud Office, since they may not have ARM binaries in their distribution).
It should work great on your local network and still be acceptable when uploading while out and about (photos can auto-sync if you turn that on in the Nextcloud phone app).
If 4 TB is enough for your needs, I'd suggest getting another 4 TB drive and making them a RAID1 pair using mdadm, and then probably a third 4 TB drive to back up Nextcloud and its data onto, kept offsite. You can never have too many copies of your data.
I’m not sure what to do about the variety of smaller drives. I can say I wouldn’t recommend consolidating them onto a single drive, because I did that once (many drives ranging from 60 gigglebytes to 300, onto one 1.5 TB drive) and then formatted or got rid of the smaller ones…and then dropped the 1.5 TB drive on the floor while it was running. Rip. But just like the above, a RAID1 array composed of two big drives would probably be fine.
Just make sure to set up some alerts for when a drive fails.
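For the alerts part: mdadm already has a built-in `--monitor` mode that can email you, so use that if you can. Just to show the idea, here's a tiny cron-able sketch that watches /proc/mdstat for a degraded array; the hostname, addresses, and local mail server are all placeholders.

```python
import re
import smtplib
from email.message import EmailMessage

with open("/proc/mdstat") as f:
    mdstat = f.read()

# A healthy two-disk RAID1 member list looks like "[UU]"; an underscore
# ("[U_]") or an "(F)" next to a device means a member is missing or failed.
if re.search(r"\[U*_+U*\]", mdstat) or "(F)" in mdstat:
    msg = EmailMessage()
    msg["Subject"] = "RAID degraded on homeserver"   # hypothetical hostname
    msg["From"] = "alerts@example.com"               # placeholder addresses
    msg["To"] = "you@example.com"
    msg.set_content(mdstat)
    with smtplib.SMTP("localhost") as smtp:          # assumes a local MTA
        smtp.send_message(msg)
```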
Sorry for the double post, I don’t see how to edit a post with Memmy, but at some point I’d like to use Jellyfin or Plex. Is that something that needs to be separate, or can it be combined with the rest of this?
One easy solution might be to look into a self-hosted search engine. I've used mnogosearch in the past, which worked well for spidering a single domain, but it only created the database and didn't have a web page front end. Still, if you let it go crazy across your Nextcloud pages and add a search bar to your website, it could provide what you're missing. They provided enough examples at the time for me to write my own search page pretty easily.
Thank you for this! I have sent this suggestion off to our web wizard; it looks extremely promising. We had wanted to attempt something like this but couldn't find a foothold to get started!
Good luck! And don't get stuck on the software I use; you may find something else that is better suited to your type of data. For example, if your content is wrapped up in PDFs or some kind of zipped files, then the best solution is one that can peer into those files to give search hits on the contained text. Of course, if your content is already fairly plain text, then pretty much any solution would work.
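If anyone wants a feel for what the "write your own search page" step involves, here's a toy sketch. It has nothing to do with mnogosearch's actual schema; it just uses SQLite's FTS5 as a stand-in index that a crawler would populate, assuming your SQLite build includes FTS5 (most do).

```python
import sqlite3

# Stand-in full-text index: one FTS5 table of (url, title, body).
db = sqlite3.connect("site_index.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, title, body)")
db.execute("INSERT INTO pages VALUES (?, ?, ?)",   # toy row so the query returns something
           ("https://example.org/backups", "Backup notes", "How we do offsite backups"))

def search(query, limit=10):
    # bm25() ranks matches (smaller = better); snippet() pulls a highlighted excerpt.
    return db.execute(
        "SELECT url, title, snippet(pages, 2, '<b>', '</b>', '…', 12) "
        "FROM pages WHERE pages MATCH ? ORDER BY bm25(pages) LIMIT ?",
        (query, limit)).fetchall()

for url, title, excerpt in search("backup"):
    print(url, title, excerpt, sep=" | ")
```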
They still happily exist on YouTube, for now. So no point in re-hosting; they'll get squirreled away into the Giant Hard Drive of Doom.
If something happens to the actual archive project in the near future, I'll likely section them up into 20 GB pieces and post them as a torrent someplace.
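In case it's useful to anyone: chopping a big archive into ~20 GB pieces is a one-liner with `split -b 20G archive.tar`, or a few lines of Python if you want it scripted (the file name below is just a placeholder):

```python
import os

SRC   = "archive.tar"         # placeholder name for the big file
PIECE = 20 * 1000**3          # ~20 GB per piece
BUF   = 64 * 1024 * 1024      # copy in 64 MiB buffers to keep memory use low

with open(SRC, "rb") as src:
    part = 0
    while True:
        written = 0
        name = f"{SRC}.part{part:03}"
        with open(name, "wb") as out:
            while written < PIECE:
                buf = src.read(min(BUF, PIECE - written))
                if not buf:
                    break
                out.write(buf)
                written += len(buf)
        if written == 0:      # nothing left; drop the empty trailing piece
            os.remove(name)
            break
        part += 1
```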
Crucial and WD have a much higher failure rate on average across all their models.
The 800% figure is only because they had a single drive of a certain model, and it failed within 2 months. They have a lot of other Seagate models that are much older on average without any failures.
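That's just how Backblaze's annualized failure rate works: failures divided by drive-days, scaled to a year, so one early failure in a tiny population explodes. Back-of-envelope (the 45 drive-days is my guess to reproduce the headline number, not their exact count):

```python
# AFR = failures / drive-days * 365, expressed as a percentage.
failures   = 1
drive_days = 45               # a single drive that only ran about a month and a half
afr = failures / drive_days * 365 * 100
print(f"{afr:.0f}%")          # ≈ 811%, i.e. the eye-catching "800%" figure
```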
Interesting, from that data it seems SSD reliability so far isn’t too far off from HDDs (at least for Backblaze’s intensive cloud storage workload) despite having no moving parts…
It will be interesting to see how the reliability plays out as they build up their SSD inventory over the coming years.
I agree. Consumer use cases of SSDs see a tremendous benefit, if only for accidental-damage reasons, but for enterprise data center use I would not have expected the same overall failure rates.