The true power of regular expressions (2012)
The article discusses the capabilities of regular expressions in parsing HTML and other data formats. It explains the concept of regular grammars and how regular expressions can represent them more concisely. Additionally, it highlights the differences between formal definitions of regular languages and the practical implementations of regular expressions in programming languages.
- Regular expressions can be powerful tools for parsing data, contrary to the common belief that they cannot parse HTML.
- The article explains the formal definition of regular grammars and how they can be represented using regular expressions.
- Modern regex implementations can match more than just regular languages, expanding their utility in programming.
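To make the last point concrete, here is a minimal sketch using Python's standard `re` module. It is not taken from the article; the pattern and the `is_square` helper are hypothetical names chosen for illustration. The language of "doubled" strings (some non-empty string followed by an exact copy of itself) is a textbook example of a language that is not regular, yet a backreference matches it directly:

```python
import re

# \1 must repeat exactly the text captured by group 1, so a full match
# succeeds only for strings of the form ww (w non-empty, no newlines).
# No finite automaton can recognize this language, so it is not regular
# in the formal sense.
SQUARE = re.compile(r"(.+)\1")


def is_square(text: str) -> bool:
    """Return True if text is some non-empty string written twice in a row."""
    return SQUARE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["abcabc", "abcabd", "aa", "a"]:
        print(candidate, is_square(candidate))
```

Backreferences are only one such extension. Some engines offer others, but the sketch sticks to what Python's `re` supports.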
Opening excerpt (first ~120 words)
As someone who frequents the PHP tag on StackOverflow I pretty often see questions about how to parse some particular aspect of HTML using regular expressions. A common reply to such a question is: "You cannot parse HTML with regular expressions, because HTML isn’t regular. Use an XML parser instead." This statement - in the context of the question - is somewhere between very misleading and outright wrong. What I’ll try to demonstrate in this article is how powerful modern regular expressions really are.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Npopov.