Regular Expression Matching: The Virtual Machine Approach (2009)
The article discusses the implementation of regular expression matching using a virtual machine approach. It outlines how regular expressions can be compiled into bytecode and executed by a virtual machine, allowing for features like submatching. The author presents various strategies for executing regular expressions, emphasizing the flexibility of adding new instructions to enhance functionality.
- ▪Henry Spencer's regular expression library is one of the most widely used bytecode interpreters.
- ▪The article presents two strategies for implementing regular expression matching through a virtual machine.
- ▪Submatching instructions can be added to the virtual machine to enhance its capabilities.
Opening excerpt (first ~120 words) tap to expand
Regular Expression Matching: the Virtual Machine Approach Russ Cox [email protected] December 2009 Introduction Name the most widely used bytecode interpreter or virtual machine. Sun's JVM? Adobe's Flash? .NET and Mono? Perl? Python? PHP? These are all certainly popular, but there's one more widely used than all those combined. That bytecode interpreter is Henry Spencer's regular expression library and its many descendants. The first article in this series described the two main strategies for implementing regular expression matching: the worst-case linear-time NFA- and DFA-based strategies used in awk and egrep (and now most greps), and the worst-case exponential-time backtracking strategy used almost everywhere else, including ed, sed, Perl, PCRE, and Python.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Swtch.