Regular Expressions Cookbook
- Media: Book (Paperback, 512 pages)
- ISBN: 0596520689
- Publisher: O'Reilly Media
- Release Date: May 22, 2009
Product Description
Whether you're a novice or an experienced user, Regular Expressions Cookbook will help deepen your understanding of the tool. You'll learn powerful new tricks, avoid language-specific gotchas, and save valuable time with this huge library of proven solutions to difficult, real-world problems.
Searching and Replacing with Regular Expressions
Search-and-replace is a common job for regular expressions. A search-and-replace function takes a subject string, a regular expression, and a replacement string as input. The output is the subject string with all matches of the regular expression replaced with the replacement text. Although the replacement text is not a regular expression at all, you can use certain special syntax to build dynamic replacement texts. All flavors let you reinsert the text matched by the regular expression or a capturing group into the replacement. Recipes 2.20 and 2.21 explain this. Some flavors also support inserting matched context into the replacement text, as Recipe 2.22 shows. In Chapter 3, Recipe 3.16 teaches you how to generate a different replacement text for each match in code.
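As a minimal sketch of the ideas above, the following uses Python's re module (one of the flavors covered in this book). The sample subject strings and the helper name double_number are made up for illustration and are not taken from the book's recipes. It reinserts the whole match, reinserts capturing groups in a different order, and builds a different replacement for each match in code.

```python
import re

subject = "Release date: 2009-05-22"
date_regex = r"(\d{4})-(\d{2})-(\d{2})"

# \g<0> in the replacement text reinserts the entire regex match.
bracketed = re.sub(date_regex, r"[\g<0>]", subject)

# \1, \2, \3 reinsert the text matched by each capturing group.
reordered = re.sub(date_regex, r"\2/\3/\1", subject)

# A function (hypothetical name) can compute the replacement per match.
def double_number(match):
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double_number, "512 pages, 8 chapters")

print(bracketed)  # Release date: [2009-05-22]
print(reordered)  # Release date: 05/22/2009
print(doubled)    # 1024 pages, 16 chapters
```

The replacement strings are written as raw strings so that Python itself does not interpret the backslashes before re.sub sees them.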
Many Flavors of Replacement Text
Different ideas by different regular expression software developers have led to a wide range of regular expression flavors, each with different syntax and feature sets. The story for the replacement text is no different. In fact, there are even more replacement text flavors than regular expression flavors. Building a regular expression engine is difficult. Most programmers prefer to reuse an existing one, and bolting a search-and-replace function onto an existing regular expression engine is quite easy. The result is that there are many replacement text flavors for regular expression libraries that do not have built-in search-and-replace features.
Fortunately, all the regular expression flavors in this book have corresponding replacement text flavors, except PCRE. This gap in PCRE complicates life for programmers who use flavors based on it. The open source PCRE library does not include any functions to make replacements. Thus, all applications and programming languages that are based on PCRE need to provide their own search-and-replace function. Most programmers try to copy existing syntax, but never do so in exactly the same way.
This book covers the following replacement text flavors. Refer to “Many Flavors of Regular Expressions” on page 2 for more details on the regular expression flavors that correspond with the replacement text flavors:
Perl
Perl has built-in support for regular expression substitution via the s/regex/replace/ operator. The Perl replacement text flavor corresponds with the Perl regular expression flavor. This book covers Perl 5.6 to Perl 5.10. The latter version adds support for named backreferences in the replacement text, as it adds named capture to the regular expression syntax.
PHP
In this book, the PHP replacement text flavor refers to the preg_replace function in PHP. This function uses the PCRE regular expression flavor and the PHP replacement text flavor.
Other programming languages that use PCRE do not use the same replacement text flavor as PHP. Depending on where the designers of your programming language got their inspiration, the replacement text syntax may be similar to PHP or any of the other replacement text flavors in this book. PHP also has an ereg_replace function. This function uses a different regular expression flavor (POSIX ERE), and a different replacement text flavor, too. PHP’s ereg functions are not discussed in this book.
.NET
The System.Text.RegularExpressions package provides various search-and-replace functions. The .NET replacement text flavor corresponds with the .NET regular expression flavor. All versions of .NET use the same replacement text flavor. The new regular expression features in .NET 2.0 do not affect the replacement text syntax.
Java
The java.util.regex package has built-in search-and-replace functions. This book covers Java 4, 5, and 6. All use the same replacement text syntax.
JavaScript
In this book, we use the term JavaScript to indicate both the replacement text flavor and the regular expression flavor defined in Edition 3 of the ECMA-262 standard.
Python
Python’s re module provides a sub function for search-and-replace. The Python replacement text flavor corresponds with the Python regular expression flavor. This book covers Python 2.4 and 2.5. Python’s regex support has been stable for many years.
Ruby
Ruby’s regular expression support is part of the Ruby language itself, including the search-and-replace function. This book covers Ruby 1.8 and 1.9. A default compilation of Ruby 1.8 uses the regular expression flavor provided directly by the Ruby source code, whereas a default compilation of Ruby 1.9 uses the Oniguruma regular expression library. Ruby 1.8 can be compiled to use Oniguruma, and Ruby 1.9 can be compiled to use the older Ruby regex flavor. In this book, we denote the native Ruby flavor as Ruby 1.8, and the Oniguruma flavor as Ruby 1.9. The replacement text syntax for Ruby 1.8 and 1.9 is the same, except that Ruby 1.9 adds support for named backreferences in the replacement text. Named capture is a new feature in Ruby 1.9 regular expressions.
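The Perl and Ruby entries above mention named backreferences in the replacement text. For comparison, here is a hedged sketch of how the same idea looks in the Python flavor, where named capture is written (?P<name>...) and the replacement text refers to it as \g<name>. The group names and the sample input are chosen for illustration only.

```python
import re

# Named capturing groups "first" and "last" (names chosen for this sketch).
name_regex = re.compile(r"(?P<first>\w+) (?P<last>\w+)")

# \g<name> in the replacement text reinserts the named group's match.
swapped = name_regex.sub(r"\g<last>, \g<first>", "Jan Goyvaerts")

print(swapped)  # Goyvaerts, Jan
```

Other flavors spell named backreferences differently, which is the kind of difference the list above is pointing at.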
Very useful and clear
What I was looking for was a book that would teach regular expressions while giving concrete examples of real life use cases that I could immediately put to work. This book is filled with them.
Chapters one and two lay the foundation by covering the basics of what regular expressions are, using them to search and replace, match text, and other basic skills. This is good, but where the book really sets itself apart is in chapters three through eight, which are overflowing with useful recipes for things like validating ISBNs, finding URLs within text, stripping leading zeros or matching IP addresses (IPv4 and IPv6). The book has an obvious organization scheme, a ton of useful recipes, and a useful index. Finding what you want or need is very easy to do, and unless your needs are especially unique or esoteric, you will probably discover exactly what you require in the book.
The best part of the book is that every example uses a clear format that sets the stage for an easy discovery of needed information.
First, a problem is stated, such as in chapter four's item, 4.1 Validate Email Addresses, which says, "You have a form on your website or a dialog box in your application that asks the user for an email address. You want to use a regular expression to validate this email address before trying to send email to it. This reduces the number of emails returned to you as undeliverable."
Next, a solution is defined, with code examples, accompanied by a description of the particular details that are vital to comprehend when implementing the solution. Next, each recipe has a section for further discussion that leads to a deeper understanding of the regular expression being used and the context in which it is being used.
Especially wonderful is that every recipe has very specific and clear code examples for use with Perl, PCRE (the "Perl Compatible Regular Expressions" library for C, which isn't identical to Perl's use of regular expressions, even though it tries), .NET, Java, JavaScript, Python, PHP, and Ruby with notes on which specific release versions or variations of each are covered. When differences exist in the implementation in these environments, those differences are clearly noted and discussed. This feature will make life much easier for people who need to use regular expressions in more than one language context and is a feature of the book I appreciate greatly.
The other regex book on my shelf will remain there until that mystical moment "when I have time to study it." This book will be used regularly as a reference.
Regular Expressions Can be complicated
Now that I have the bad part out of the way, the good part is that it gives you a tremendous number of examples. So if you are looking for a book that gives you the answer for a specific expression, this is the book for you.
Learn how to format input, manage lines, find solutions for common markups and paths, and more
For Only $30 It'll Pay For Itself In One Use
From what I've seen the examples and explanations are clearly written, and the fact that they show - and explain - solutions for Perl, .NET, Java, JavaScript, Python and Ruby makes this book too good to pass up.
Thank you and bravo to Steven Levithan and Jan Goyvaerts
I don't have enough stars to accurately give toward this book.