They get blocked by some sites, and some sites have proactively opted out. archive.org respects the opt-outs. AFAICT, archive.org gets away w/archiving non-optout cases where their bot was permitted.
Archive.org is more than The Wayback Machine. You’re just talking about The Wayback Machine, not archive.org as a whole. Nothing I’ve said in this thread is about The Wayback Machine specifically.
My point is that archive.org does things that bend, skirt, and run afoul of copyright law (and good on them because fuck the system) and they spend more money, time, and resources fighting copyright suits than I’d imagine all Lemmy instance owners could afford even if they pooled their resources. And that’s if they even cared enough to risk dying on that hill.
You might need to explain why 12ft.io gets away with sharing google’s cache, as Lemmy could theoretically operate the same way.
Not sure how this bit is relevant. I was speaking only about your “stage 4 (onsite archive)” item. (I thought that was pretty clear, but apparently not?) I don’t know if 12ft.io is playing with (legal) fire or not, but I’m not sure why it matters to the conversation. Nothing 12ft.io does is comparable to Lemmy users copying articles into comments.
When you say “twisted”, do you mean commentary is not a standard accepted and well-known fair use scenario?
So, I’m only going to be talking about U.S. “fair use” here because as little as I know about that, I know far far less about copyright law in other countries. That said:
First, whether fair use applies is a fairly complex matter which depends, among other things, on how much of the original work is copied. While maybe not technically determinative of the validity of a fair use defense, “the whole damn article” definitely won’t help your case when you’re trying to argue a fair use defense in federal court.
Second, I think for a fair use argument to work the way you seem to be suggesting, the quoted portions of(!) the article would have to appear in the same “work” as the commentary, but I’d imagine typically all comments in a Lemmy thread would be distinct “works,” particularly given that each comment is independently authored and mostly by distinct authors. (Copying an entire article into a comment and following it with some perfunctory “commentary” would be a pretty transparent, ham-fisted attempt at a loophole. Again, a very bad look when you’re arguing your defense in federal court.) I don’t know about your Lemmy instance, but mine doesn’t seem to say anything in the legal page that could provide any argument that a thread is a single “work.” (It does say “no illegal content, including sharing copyrighted material without the explicit permission of the owner(s).”)