I think 0.19 is reverting that behaviour, because it was indeed a certified bad idea.
I think the idea was to bulletproof potentially crappy clients, especially after the XSS incident, but the problem is that the content simply isn't always rendered in a web context, which makes the processing kind of a pain.
Wouldn't surprise me if it becomes double and triple encoded at times too because of federation. Do you encode again, or trust that the remote already sent you encoded data?
The best format is the original format, transformed as late as possible, ideally in clients where there's awareness of which characters are special. They are special on the web, not so much in an Android or terminal app.
I don't think the Lemmy devs are particularly experienced web developers in general. There have been a fair number of dubious API design decisions, like passing auth as a GET parameter… Thankfully they also fixed that one in 0.19.
Because then you need to take care everywhere to decode it as needed and also make sure you never double-encode it.
For example, do other servers receive it pre-encoded? What if the remote instance doesn't do that? How do you ensure what other instances send you is already encoded correctly? Do you just encode whatever you receive, at the risk of double encoding it? And generally, what about use cases where you don't need it, like mobile apps?
Data should be transformed where it needs it, otherwise you always add risks of messing it up, which is exactly what we're seeing. That encoding is reversible, but then it's hard to know how many times it may have been encoded. For example, if I type &amp; which is already an entity, do you detect that and decode it even though I never intended that, because I'm posting an HTML snippet?
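To make the "how many times was it encoded" problem concrete, here is a minimal sketch using Python's standard `html` module. The `store` function is a hypothetical stand-in for a server that escapes on save, not Lemmy's actual code.

```python
import html

def store(text: str) -> str:
    """Hypothetical server-side save that HTML-escapes before storing."""
    return html.escape(text)

typed_ampersand = "&"      # one user typed a bare ampersand
typed_entity = "&amp;"     # another typed the literal entity in an HTML snippet

once = store(typed_entity)                # escaped exactly one time
twice = store(store(typed_ampersand))     # escaped two times by mistake

# Both paths end up as the same stored string, so a reader of the stored
# data cannot tell how many decode passes are correct.
print(once, twice, once == twice)
```

Decoding once is right for the first user and wrong for the second, and nothing in the stored string says which case you are in.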
Right now it's so broken that if you edit a post, you get an editor… with escaped HTML entities. What happens if you save your post after that? It's double encoded! Now everyone and every app has to make sure to decode HTML entities, and it leads to more bugs.
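The edit round trip can be sketched like this. `save_post` and `open_editor` are made-up names for illustration, assuming the server escapes on every save and the editor shows the stored text unchanged.

```python
import html

def save_post(body: str) -> str:
    """Hypothetical save path: escape, then store."""
    return html.escape(body)

def open_editor(stored: str) -> str:
    """Hypothetical editor that loads the stored text without decoding it."""
    return stored

stored = save_post("Tom & Jerry")
history = [stored]

# Open the post and save it again twice without changing anything.
for _ in range(2):
    stored = save_post(open_editor(stored))
    history.append(stored)

for version in history:
    print(version)
```

Every save adds another layer, even though the user never touched the text.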
There is exactly one place where it needs to be encoded, and that's in web clients, more precisely, when it's being displayed as HTML. That's where it should be encoded. Mobile apps don't care; they don't even render HTML to begin with. Bots and most things using the API don't care. They shouldn't have to care just because it may be rendered as HTML somewhere. It just creates more bugs and more work for pretty much everyone involved. It sucks.
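As a rough sketch of "encode only where it's displayed as HTML": the title stays raw in storage and over the API, and each client applies whatever its own output needs. Both render functions are hypothetical examples, not real client code.

```python
import html

# Raw user input, stored and transported exactly as typed.
raw_title = "Q&A: <script>alert(1)</script>"

def render_html(title: str) -> str:
    """Web client: escape at the moment the text goes into HTML."""
    return f"<h1>{html.escape(title)}</h1>"

def render_terminal(title: str) -> str:
    """Terminal or native client: no HTML involved, so nothing to escape."""
    return title

print(render_html(raw_title))
print(render_terminal(raw_title))
```

The web client is safe, and the terminal client shows the ampersand and angle brackets exactly as the author typed them, with no decoding step to get wrong.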
Now we have an even worse problem: we don't know which post is encoded which way, so once 0.19 rolls out and there are version mismatches it's going to be a shitshow and may very well lead to another XSS incident.
It still leads to unsolvable problems, like: what is expected when two instances federate content with each other? What if you use a web app to access a third-party instance and it spits out unsanitized data?
If you assume it's part of the API contract, then an evil instance can send you unescaped content and you've got an exploit. If you escape it, you'll double escape content from well-behaved instances. This applies to apps too: what if Voyager, for example, starts expecting pre-sanitized data from the API, and it makes an API call to an evil instance that doesn't sanitize? Bam, you've got yourself potential XSS. There's nothing they can do to prevent it. Either it's inherently unsafe, or safe but will double-escape.
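Here is that dilemma as a small sketch. The two payloads and both render functions are invented for illustration; the point is only that a client receiving "pre-escaped" data has to pick one of the two.

```python
import html

# A well-behaved remote that escapes before sending, as the contract promises.
honest = html.escape("Tom & Jerry")
# A hostile remote that ignores the contract and sends raw markup.
evil = "<img src=x onerror=alert(1)>"

def render_trusting(payload: str) -> str:
    """Assumes the API already escaped the payload."""
    return f"<p>{payload}</p>"

def render_defensive(payload: str) -> str:
    """Assumes nothing and escapes again."""
    return f"<p>{html.escape(payload)}</p>"

print(render_trusting(evil))      # live markup reaches the page
print(render_defensive(honest))   # safe, but the ampersand is escaped twice
```

With raw transport the defensive version is simply correct for both remotes, because there is only ever one layer to add.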
You end up creating more vulnerabilities through edge cases than you solve by doing that. Now all an attacker needs to do is find a way to trick you into thinking data is sanitized when it's not.
The only safe transport for user data is raw. You can never assume any user/remote input is pre-sanitized. Apps, even web ones, shouldn't assume the data is sanitized; they should sanitize it themselves, because only then can you guarantee that it will come out correctly and safely.
This would only work if you own both the server and the UI that serves it. It immediately falls apart when you don't control the entire pipeline from submission to display, and on the fediverse, with third-party clients and apps and instances, you inherently can't trust anything.
what is your phone journey?
optionally: what was the main browser you used on your phones?
why do & ampersands never display properly in titles?
but work in body text &