At its simplest, it takes in a vector of floating-point numbers, multiplies it element by element with other vectors of the same length (the “weights”), sums each of those products, applies a RELU* to the result, and then uses those values as the input vector for another layer with its own weights (or gives them as output). The magic is in the weights.
This operation is a simple matrix-by-vector product followed by element-wise RELU, if you know what that means.
Where modelWeights is [[[Float]]], and so layer has type [Float] -> [[Float]] -> [Float].
RELU: if i>0 then i else 0. It could also be another nonlinear function, but RELU is obviously fast and works about as well as anything else. There’s interesting theoretical work on certain really weird functions, though.
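For anyone who doesn't read Haskell, here is a rough Python sketch of the same idea. It is not the original line, just one way to write it; the names `relu`, `layer` and `run_model` are made up for illustration, and it assumes every layer (including the last) goes through RELU.

```python
from functools import reduce


def relu(x):
    # RELU as described above: keep positive values, clamp the rest to 0.
    return x if x > 0 else 0.0


def layer(inputs, weights):
    # `weights` holds one weight vector per output value (a matrix as a
    # list of rows). Each output is relu(dot(inputs, row)).
    return [relu(sum(i * w for i, w in zip(inputs, row))) for row in weights]


def run_model(model_weights, inputs):
    # `model_weights` is a list of weight matrices, one per layer.
    # Feed the output of each layer into the next, starting from `inputs`.
    return reduce(layer, model_weights, inputs)


# Tiny hypothetical model: 2 inputs -> 2 hidden values -> 1 output.
example_weights = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[1.0, 1.0]],
]
print(run_model(example_weights, [2.0, 3.0]))  # prints [2.5]
```

The `reduce` call is the loop over layers: it plays the role of a left fold over the list of weight matrices, which is why `layer` takes the input vector first and the weights second.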
Less simple, it might have a set pattern of zero weights which can be ignored, allowing a fast implementation with a bunch of smaller vectors, or have pairwise multiplication steps, like in the Transformer. Aaand that’s about it; all the rest is stuff that was figured out by trial and error, like encoding, plus the math behind how to train the weights. Now you know.
Assuming you use hex values for 32-bit weights, you could write a line with 4 no problem:
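As a rough illustration of what such a line could look like, here is a small Python sketch. It assumes "hex values for 32-bit weights" means IEEE 754 single-precision floats written as 8 hex digits each, big-endian; the helper names and the four example weights are made up.

```python
import struct


def float_to_hex(x):
    # Pack as a big-endian 32-bit float, then show the 4 bytes as 8 hex digits.
    return struct.pack(">f", x).hex()


def hex_to_float(h):
    # Inverse of float_to_hex: 8 hex digits back to a Python float.
    return struct.unpack(">f", bytes.fromhex(h))[0]


def encode_line(weights):
    # One line of text holding several weights, separated by spaces.
    return " ".join(float_to_hex(w) for w in weights)


line = encode_line([1.0, -2.0, 0.5, 0.0])
print(line)  # prints 3f800000 c0000000 3f000000 00000000
```

Four weights come out at 32 hex digits plus separators, so they fit on one short line.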
That’s cool, though honestly I haven’t fully understood it. That’s probably because I don’t know Haskell; that line looked like complete gibberish to me lol. At least I think I got the gist of things on a high level. I’m always curious to understand but never dare to dive deep (holds self from making deep learning joke). Much appreciated btw!
Yeah, maybe somebody can translate for you. I considered using something else, but it was already long and I didn’t feel like writing out multiple loops.
No worries. It’s neat how much such a comparatively simple concept can do, with enough data to work from. Circa 2010 I thought it would never work, lol.
Recently switched jobs from maintaining a 15 year old Windows Forms .NET Framework legacy codebase.
At the new job we stick to Clean Architecture, use unit and integration tests, have a code generation tool, actually make nice use of generics and use dependency injection. Also agile processes, automatic build tools, whatever. The difference is night and day and I’m so glad my ex boss fired me because I told him he’s an asshole and his codebase is shit.
At my first job out of college, I have been able to see a steady improvement in the codebase. A little while ago I had to go back to an old tag and was horrified by what it used to be and impressed by how much it has improved.
Does anyone remember when something like this actually happened? Maybe it’s the Mandela effect, but I swear at one stage a whole heap of sites were using black/dark mode to save the planet.