Lecture #9
Chapter 7 & 8 - Regular Expressions


A regular expression in Perl is a pattern used for pattern matching. For example, you can use a regular expression to test whether a string contains a certain pattern. In Perl, a regular expression is written between a pair of forward slashes, as in /pattern/, and a bare /pattern/ is matched against the special variable $_ by default.

Example:
$_ = "How are\tyou";  # \t puts a tab character between "are" and "you"
if (/are\t/) {        # a bare /.../ match tests the string in $_
  print "We have a match!\n";
}

Metacharacters and Quantifiers
.   Any single character except newline

Example: /Al.n/ matches Alan, but doesn't match Allan, because in Allan there are two characters (la) between the first l and the n, and . matches exactly one. To match a literal period, escape it as \.; for example, /3\.14/ matches "3.14159".
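
A minimal Perl sketch of the two patterns above, assuming a few made-up sample strings. It uses grep, which sets $_ to each element in turn, to collect the strings each pattern matches:

```perl
use strict;
use warnings;

# Hypothetical sample strings for /Al.n/ and /3\.14/
my @samples = ("Alan", "Allan", "Aln", "3.14159", "3514");

# . stands for exactly one character between "l" and "n"
my @dot_matches = grep { /Al.n/ } @samples;

# \. matches only a literal period
my @period_matches = grep { /3\.14/ } @samples;

print "/Al.n/ matched: @dot_matches\n";
print "/3\\.14/ matched: @period_matches\n";
```

This should report Alan for the first pattern and 3.14159 for the second. Dropping the backslash (/3.14/) would also accept "3514", because the unescaped . matches the 5.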

*   Matches the previous item 0 or more times
Example: /Al*n/ doesn't match "Alan", but it does match "An" and "Aln" and "Alllllln"
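
A small sketch that checks the example above, assuming the listed strings as input. It loops over them with $_ and records whether /Al*n/ matches each one; the hash is only a convenience for this illustration:

```perl
use strict;
use warnings;

my @samples = ("Alan", "An", "Aln", "Alllllln");

my %matched;
for (@samples) {                    # for without a loop variable aliases $_
  $matched{$_} = /Al*n/ ? 1 : 0;    # zero or more l's, then an n
  print "$_: ", ($matched{$_} ? "match" : "no match"), "\n";
}
```

Only "Alan" should fail: after the l's, the pattern needs an n, and "Alan" has an a there.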

+   Matches the previous item 1 or more times
Example: /Al+an/ would match Alan, Allan, Alllan, but not Aan or Allln
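
A minimal sketch splitting the example strings above into matches and non-matches for /Al+an/; the strings are just the ones named in the example:

```perl
use strict;
use warnings;

my @samples = ("Alan", "Allan", "Alllan", "Aan", "Allln");

# l+ needs at least one l, and "an" must follow the l's
my @hits   = grep {  /Al+an/ } @samples;
my @misses = grep { !/Al+an/ } @samples;

print "match:    @hits\n";
print "no match: @misses\n";
```

"Aan" fails because it has no l at all, and "Allln" fails because no "an" follows the l's.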

?   Preceding item is optional (placed after another quantifier, as in *? or +?, it makes that quantifier non-greedy)
Example: /Ala?n/ would match Alan or Aln only
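
A sketch of both uses of ?, on made-up strings. The second half relies on a match in list context, which returns the text captured by the parentheses (see Memory Parentheses below); the HTML-like string is only a convenient illustration:

```perl
use strict;
use warnings;

# a? makes the "a" optional: Alan and Aln match, Alaan does not
my @samples  = ("Alan", "Aln", "Alaan");
my @optional = grep { /Ala?n/ } @samples;
print "/Ala?n/ matched: @optional\n";

# Greedy .* versus non-greedy .*? on a hypothetical string
$_ = "<b>bold</b> and <i>italic</i>";
my ($greedy) = /<(.*)>/;     # .* runs to the LAST ">" it can reach
my ($lazy)   = /<(.*?)>/;    # .*? stops at the FIRST ">"
print "greedy:     $greedy\n";
print "non-greedy: $lazy\n";
```

The greedy version should capture everything from the first "<" to the last ">", while the non-greedy version captures just "b".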

( )   Grouping
Example: /(Alan)+/ matches Alan, AlanAlan, AlanAlanAlan, and so on; the + applies to the whole group
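
A sketch contrasting a grouped and an ungrouped quantifier on hypothetical strings. The captured text (returned by a list-context match) shows how much each pattern consumed:

```perl
use strict;
use warnings;

# (Alan)+ repeats the whole group "Alan"
$_ = "AlanAlanAlan!";
my ($grouped) = /((Alan)+)/;    # outer parens capture the full repeated run
print "(Alan)+ matched: $grouped\n";

# Without parentheses, + repeats only the preceding character, the "n"
$_ = "Alannnn!";
my ($ungrouped) = /(Alan+)/;
print "Alan+ matched:   $ungrouped\n";
```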

|   Alternatives
Example: /Alan (and|or) Bob/ matches "Alan and Bob" or "Alan or Bob"
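
A minimal sketch running the alternation example over a few made-up strings, two that should match and two that should not:

```perl
use strict;
use warnings;

my @samples = ("Alan and Bob", "Alan or Bob", "Alan nor Bob", "Alan Bob");

# The group limits the alternatives to "and" or "or"
my @hits = grep { /Alan (and|or) Bob/ } @samples;
print "$_\n" for @hits;
```

The parentheses matter here: without them, /Alan and|or Bob/ would mean "Alan and" or "or Bob", because | has the lowest precedence.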

Character Classes
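A character class is a list of possible characters inside square brackets []; it matches any single one of them
Example: /[0-9]+/ matches any string that contains a run of one or more digits, such as "abc123"; to require that the whole string be digits, anchor it: /^[0-9]+$/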

^ as the first character inside the brackets negates the class
Example: /[^abc]/ matches any single character except a, b, or c
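
A sketch of the character-class examples on hypothetical strings. It contrasts the unanchored /[0-9]+/ with the anchored form (^ and $ are covered under Anchors) and shows that /[^abc]/ succeeds as soon as one character falls outside the set:

```perl
use strict;
use warnings;

my @samples = ("2024", "abc123", "hello");

# Unanchored: any run of one or more digits somewhere in the string
my @has_digits = grep { /[0-9]+/ } @samples;

# Anchored with ^ and $: the whole string must be digits
my @all_digits = grep { /^[0-9]+$/ } @samples;

# [^abc] needs at least one character that is not a, b, or c
my @not_abc = grep { /[^abc]/ } ("abc", "cab", "abd");

print "has digits: @has_digits\n";
print "all digits: @all_digits\n";
print "not only a/b/c: @not_abc\n";
```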

Some character classes are used so often that they have shortcuts
\d     digit [0-9]
\w     'word' character [A-Za-z0-9_]
\s     whitespace [\f\t\n\r ]
\D, \W, \S - Negated shortcuts, equivalent to [^\d], [^\w], [^\s]

Example: Come up with a pattern that matches 2 lowercase words separated by 3 digits
Answer: /[a-z]+\d\d\d[a-z]+/
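
A sketch testing the answer above against a few made-up candidates. It also tries the equivalent \d{3} spelling (see General Quantifiers) and a \s check:

```perl
use strict;
use warnings;

# Hypothetical inputs for "two lowercase words separated by three digits"
my @samples = ("abc123def", "abc12def", "ABC123DEF", "abc 123 def");

my @answer = grep { /[a-z]+\d\d\d[a-z]+/ } @samples;
my @braces = grep { /[a-z]+\d{3}[a-z]+/ } @samples;   # same pattern, {3} form
my @spaced = grep { /\s/ } @samples;                  # contains whitespace

print "answer: @answer\n";
print "\\d{3}: @braces\n";
print "has \\s: @spaced\n";
```

Only "abc123def" should satisfy the answer: "abc12def" has two digits, "ABC123DEF" is uppercase, and "abc 123 def" has spaces around the digits.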

General Quantifiers
/a{3,5}/ - Matches aaa, aaaa, or aaaaa
/a{3,}/ - Matches 3 or more a's, equivalent to /aaa+/ and /aaaa*/
/a{3}/ - Matches aaa
/(alan){3}/ - Matches alanalanalan
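
A sketch of the general quantifiers on hypothetical strings. The patterns are anchored with ^ and $ so the count covers the whole string; the loop checks that the three spellings of "3 or more a's" listed above agree:

```perl
use strict;
use warnings;

my @samples = ("aa", "aaa", "aaaaa", "alanalanalan");

# ^ and $ anchor the pattern so the count covers the whole string
my @three_to_five = grep { /^a{3,5}$/ } @samples;
my @exactly_three = grep { /^a{3}$/ } @samples;
my @alan_thrice   = grep { /^(alan){3}$/ } @samples;

# Unanchored, /a{3}/ also matches "aaaaa", which contains "aaa"
my @contains_three = grep { /a{3}/ } @samples;

print "a{3,5}: @three_to_five\n";
print "a{3}:   @exactly_three\n";

# The three spellings of "3 or more a's" should agree on every string
my %agree;
for ("a", "aa", "aaa", "aaaaaa", "baaab") {
  my @r = (/a{3,}/ ? 1 : 0, /aaa+/ ? 1 : 0, /aaaa*/ ? 1 : 0);
  $agree{$_} = "@r";
  print "$_: @r\n";
}
```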

Anchors
^ - Match at the beginning of a string
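$ - Match at the end of a string (or just before a newline at the very end)
\b - Word boundary (where a \w character meets a \W character or the start or end of the string)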
\B - Not a word boundary

/^Alan/ - Matches Alan was here, but not Here was Alan
/Alan$/ - Matches Here was Alan, but not Alan was here
/\bbob\b/ - Matches bob, not bobby or joebob
/\Bbob\B/ - Matches bob in the middle of a word only, like joebobbill
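
A sketch running the four anchor examples over the strings they mention:

```perl
use strict;
use warnings;

my @samples = ("Alan was here", "Here was Alan",
               "bob", "bobby", "joebob", "joebobbill");

my @starts_with = grep { /^Alan/ } @samples;
my @ends_with   = grep { /Alan$/ } @samples;
my @whole_word  = grep { /\bbob\b/ } @samples;   # bob as a separate word
my @mid_word    = grep { /\Bbob\B/ } @samples;   # word characters on both sides

print "^Alan:    @starts_with\n";
print "Alan\$:    @ends_with\n";
print "\\bbob\\b: @whole_word\n";
print "\\Bbob\\B: @mid_word\n";
```

Note that "joebob" matches neither \b pattern: the b at its end sits at a word boundary, but the one before it does not.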

Memory Parentheses
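Parentheses also act as memory parentheses: the text matched by each group is remembered automatically. For example:

/(Alan)/ - Matches strings containing Alan and remembers the matched text in \1, which can be reused later in the same pattern (after the match, the same text is available as $1). So we can use it like this: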
/(TAG1|TAG2).*\1/ - Matches TAG1...sometext...TAG1 and TAG2...sometext...TAG2
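
A sketch of the back-reference example on made-up strings, plus the $1 variable. The mixed "TAG1 ... TAG2" string should be rejected, because \1 must repeat exactly the text the group matched:

```perl
use strict;
use warnings;

my @samples = ("TAG1 some text TAG1",
               "TAG2 other text TAG2",
               "TAG1 mixed text TAG2");

# \1 must repeat whatever (TAG1|TAG2) actually matched
my @paired = grep { /(TAG1|TAG2).*\1/ } @samples;
print "paired: $_\n" for @paired;

# After a successful match, the same captured text is available as $1
my $captured;
$_ = "Here was Alan";
if (/(Alan)/) {
  $captured = $1;
  print "captured: $captured\n";
}
```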

Precedence (highest to lowest)
Parentheses - ( )
Quantifiers - *, +, ?, {}
Anchors, Sequence - ^, $, \b, \B
Alternatives - |
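
A sketch of why the precedence order matters, using hypothetical strings. Because | binds most loosely, /^Alan|Bob$/ splits into ^Alan or Bob$, while adding parentheses makes both anchors apply to each alternative:

```perl
use strict;
use warnings;

my @samples = ("Alan", "Bob", "Alan was here", "Here was Bob", "Bob was here");

# | has the lowest precedence, so this means (^Alan) or (Bob$)
my @loose = grep { /^Alan|Bob$/ } @samples;

# Parentheses bind tightest, so both anchors apply to each alternative
my @tight = grep { /^(Alan|Bob)$/ } @samples;

print "loose: $_\n" for @loose;
print "tight: $_\n" for @tight;
```

Only "Bob was here" fails the loose pattern, while the tight one accepts only the exact strings "Alan" and "Bob".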


CSC255 - Alan Watkins - North Carolina State University