Piggybacking onto this to mention my go-to online RegEx editor: RegExr. It lets you test the regex as you type, explains the particular symbols used, and has a sidebar where you can browse pattern types by category. I’ve been using it for almost 2 years now and haven’t had any reason to use much else since I discovered it.
Wait. Are there flavors of regex? Every time I have to use regex it hurts my brain, and I never need to do it enough to actually sit down and learn it properly like OP is doing. Just knowing there are different ways of doing the same things in an already mind-baffling language blows me away even more.
Yes. Most things use PCRE, or Perl Compatible Regular Expressions, but there are other flavors. Usually they lack features or have slightly different syntax.
Yeah. The only one you really need to care about (especially under Linux) is PCRE, the good ol' Perl Compatible Regular Expressions. For the most part, every other flavor is a derivative of that. Microsoft had a weird version for a while, but that may be completely dead now, thankfully.
Learning the syntax of regex is fairly easy. Hell, I have to use this cheat sheet more often now that my Perl skills are no longer needed or even relevant.
Regex isn’t that hard. The challenge is identifying and understanding patterns in the data that you are filtering. Here is a brain hack: if you have pages and pages of logs that you need to filter, open up one of the log files, stare at the screen and hold the Page Down key for several dozen pages. Patterns can be easily seen in the blur of text that is quickly scrolling across the screen. (Our brains love to find patterns in noise, btw.) The patterns that you see will give you focus points for developing regular expressions to match, i.e., you start breaking strings into chunks, and seeing the ebb and flow of data streaming across a screen helps. Anomalies in the data “stream” are easy to spot as well.
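To make the "break strings into chunks" idea concrete, here is a minimal Python sketch. The log format, the field names and the `parse_line` helper are all made up for illustration; the point is only that each visually repeating chunk gets its own named group, and any line that does not fit the pattern is flagged as an anomaly.

```python
import re

# Hypothetical log lines; the format is an assumption for illustration.
LOG_LINES = [
    "2024-03-01 12:00:01 INFO  worker-3 job=1842 status=ok",
    "2024-03-01 12:00:02 ERROR worker-1 job=1843 status=failed",
    "2024-03-01 12:00:02 INFO  worker-3 job=1844 status=ok",
]

# One named group per visually repeating "chunk" of the line.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<worker>worker-\d+) "
    r"job=(?P<job>\d+) "
    r"status=(?P<status>\w+)$"
)

def parse_line(line):
    """Return a dict of chunks, or None if the line breaks the pattern."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

parsed = [parse_line(line) for line in LOG_LINES]
# Lines that do not fit the expected shape are the anomalies in the stream.
anomalies = [line for line, p in zip(LOG_LINES, parsed) if p is None]
```

On real logs you would likely start with fewer groups and add them as you recognise more of the line.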
From a security and efficiency standpoint, you should also understand where the most processing takes place so you don’t kill whatever platform you are working on.
Sorry for the rambling, but I am getting older and feel the need to pass on a ton of tips and tricks whenever I can for these “archaic” languages.
Thanks for the comprehensive reply! I have only used it for quite simple things like getting the IDs out of log lines where this and that keyword exist. Great tip about pattern searching!
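For anyone who wants to see that kind of simple use spelled out, here is a rough Python sketch. The log lines, the `id=` field and the keywords are invented for the example, and `ids_with_keywords` is a hypothetical helper, not anything from the thread.

```python
import re

# Hypothetical log lines and keywords, made up for illustration.
LOG_LINES = [
    "payment accepted id=a1f3 user=alice",
    "payment declined id=b2e4 user=bob",
    "login accepted id=c3d5 user=carol",
    "payment accepted id=d4c6 user=dave",
]

# Assumes the ID appears as "id=<word characters>".
ID_RE = re.compile(r"\bid=(\w+)")

def ids_with_keywords(lines, keywords):
    """Collect the id from every line that contains all keywords."""
    found = []
    for line in lines:
        # Plain substring checks are cheap; the regex only runs on survivors.
        if all(word in line for word in keywords):
            m = ID_RE.search(line)
            if m:
                found.append(m.group(1))
    return found

ids = ids_with_keywords(LOG_LINES, ["payment", "accepted"])
```

Doing the keyword check with plain `in` before touching the regex keeps most of the work out of the regex engine, which also fits the efficiency point made above.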
The only one you really need to care about (especially under Linux) is PCRE,
Well, no. sed, grep, awk, vi, etc. use POSIX regexes. GNU implementations also provide a Perl-compatible mode via a non-portable option. In modern programming languages like Go and Rust, the standard regex engines are compatible with RE2, a relatively new dialect developed at Google that is not described in Friedl’s book (you may think of it as an extension of the extended POSIX dialect). Even Raku has its own dialect, incompatible with Perl’s as well as the others.
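As a small illustration of what "flavor" means in practice, here is a Python sketch using two Perl-style constructs. Python's `re` is one of the Perl-like engines, so both work there; RE2 syntax (as used by Go's `regexp` package and Rust's `regex` crate) leaves out backreferences and lookaround, so the same patterns would be rejected there. The sample strings are made up.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured.
# RE2-style engines (Go's regexp, for instance) reject this syntax.
doubled = re.compile(r"\b(\w+) \1\b")

# Lookahead: assert that "px" follows without consuming it.
# Also a Perl-style construct that RE2 syntax leaves out.
pixels = re.compile(r"\d+(?=px)")

dup = doubled.search("this is is a test")
sizes = pixels.findall("10px 3em 25px")

print(dup.group(0))  # the doubled word pair that was found
print(sizes)         # only the numbers directly followed by "px"
```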
Nowadays it is common to move away from Perl-like engines; however, they are still widely used in PCRE-based software and in software written in Python, JS, etc.
Perl introduced powerful backtracking regexes that were widely adopted. However, they can be damn slow in some cases, which is why RE2 rejected backtracking while keeping some Perl-like elements. Both basic and extended POSIX regexes are also non-backtracking because they are older than Perl.
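The "damn slow in some cases" part can be seen with a small experiment. This is a sketch assuming Python's `re` module as the backtracking engine; the pattern `^(a+)+$` is a textbook pathological case chosen for illustration, and `time_match` is a helper written just for this example. Input sizes are kept small on purpose, because the risky pattern's running time grows very quickly with each extra character.

```python
import re
import time

# Nested quantifiers: the engine can split a run of "a" between the inner
# and outer "+" in exponentially many ways before giving up.
risky = re.compile(r"^(a+)+$")
# Same set of matching strings, one quantifier: no nested choices to retry.
safe = re.compile(r"^a+$")

def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

timings = []
for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing "b" forces the match to fail
    risky_result, risky_secs = time_match(risky, text)
    safe_result, safe_secs = time_match(safe, text)
    timings.append((n, risky_result, safe_result, risky_secs, safe_secs))
    print(f"n={n:2d}  risky={risky_secs:.6f}s  safe={safe_secs:.6f}s")
```

Both patterns reject every input here; the difference is only in how much work the engine does before it gives up. Exact timings depend on the machine and Python version, so none are quoted.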
Regex101 is amazing. It tends to balk at backtracking, which we rely on a lot for work, but it’s such a good visual.
ChatGPT can also save a lot of time writing regex, but it tends to write very unreadable regex because it thinks it’s being clever when it really isn’t.
Regex is an art form, and writing readable regex is another step above that.
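One way to get closer to readable regex, sketched here in Python with the `re.VERBOSE` flag: the pattern is spread over several lines, whitespace inside it is ignored, and `#` starts a comment. The date format is just an example picked for illustration, and neither version checks month lengths.

```python
import re

# Dense version: correct, but the reader has to decode it.
DENSE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

# Verbose version: whitespace and "#" comments inside the pattern are ignored.
READABLE = re.compile(
    r"""
    ^
    (?P<year>  \d{4} )                      # four-digit year
    -
    (?P<month> 0[1-9] | 1[0-2] )            # 01 to 12
    -
    (?P<day>   0[1-9] | [12]\d | 3[01] )    # 01 to 31
    $
    """,
    re.VERBOSE,
)

samples = ["2024-02-29", "2024-13-01", "2024-1-05"]
# Both versions should accept and reject exactly the same samples.
agree = all(bool(DENSE.match(s)) == bool(READABLE.match(s)) for s in samples)
m = READABLE.match("2024-02-29")
```

Named groups help too: `m.group("month")` says more than `m.group(2)`.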
That’s really cool! I know some regex and I tried to learn vim regex, only to find out it’s a rabbit hole so deep I’m afraid to look into it. The feeling when you press enter and your carefully crafted regex does exactly what it’s supposed to do is awesome though. Good luck!
Vim is on my list of things to learn. I didn’t even know vim had its own regex, but I suppose that makes sense. I’ve messed with vim a bit, but have stuck to nano so far.
Give a man a regular expression and he’ll match a string… teach him to make his own regular expressions and you’ve got a man with problems. – yakugo in regex.info/blog/2006-09-15/247#comment-3022 (and yes, it is http:// never https:// for this domain)
I can also recommend the book the TS mentioned; it is very good, and after reading it you will understand regular expressions. It’s fine to use a cheat sheet if you want, because if you don’t do it regularly the knowledge can fade, but the understanding is what matters. Also, depending on the context, different implementations can have slightly different syntax or modifiers to be aware of.
I lent the book to my brother once and he somehow lost it, so I never got it back. Don’t lend out books, guys.
And remember not everything can be solved using a regular expression: xkcd.com/1171/