Basic Perl
Quotes from Larry Wall (the creator of Perl): http://en.wikiquote.org/wiki/Larry_Wall
(It's sorta like sed, but not. It's sorta like awk, but not. etc.) Guilty as charged. Perl is happily ugly, and happily derivative. -- Larry Wall
Documentation: http://perldoc.perl.org/
Everyone needs to know how to do this in a language. The square box represents the contents of a file; let's pretend it's called myprog.pl.
In perl we use the "." to concatenate strings. We use ".=" to concatenate in place. It's like +=, or *= in C. Oh, and perl also supports +=, *=, etc.
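A minimal sketch of both operators, assuming a greeting built up in a variable called $msg (the name is only for illustration):

```perl
# Build "Hello, world" piece by piece.
my $msg = "Hello" . ", ";   # "." joins two strings into a new one
$msg .= "world";            # ".=" appends to the variable in place
$msg .= "\n";               # double quotes turn \n into a real newline
print $msg;
```

Saved as myprog.pl, this would be run with `perl myprog.pl`.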
Use the single quote if you want a literal string. Probably you didn't want it in this case because now you see the \n literally and get no newline.
10 x Hello, world
So here's a little more elaboration. Notice that we have implicitly created variables, and that variable names begin with $. We also have a basic for loop.
Unlike in C, the curly braces are required here, even around a single statement. Notice that the variable $i is expanded even though it's inside the string.
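A sketch of a loop matching that description, assuming a C-style for loop counting to ten (the exact message is made up):

```perl
# No "use strict" here, so $i springs into existence on first use.
for ($i = 0; $i < 10; $i++) {
    print "$i: Hello, world\n";   # $i is expanded inside the double quotes
}
```

Leaving out the braces around the print statement would be a syntax error.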
In this variation we turn off implicit variable declaration by means of "use strict". We also use a comma delimited list of args for print. We don't have to expand inside a string.
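A sketch of the stricter variation, assuming a loop over the range 0..5; the $count variable is added here only to show a declared variable in use:

```perl
use strict;      # every variable must now be declared, usually with "my"
use warnings;

my $count = 0;
for my $i (0..5) {
    print $i, ": Hello, world\n";   # print takes a comma-separated list
    $count++;
}
print "printed ", $count, " lines\n";
```

With "use strict" in effect, a misspelled variable name becomes a compile-time error rather than a silent new variable.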
The 0..5 creates a list of the integers 0 through 5 (think of it as an array for now). More on that in a bit.
Hello, world with Arrays
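A sketch of an array version consistent with the description that follows. The array name @i comes from the text; the loop bounds and strings are assumptions:

```perl
use strict;
use warnings;

my @i;                              # array variables start with @
for my $n (0..5) {
    $i[$n] = "$n: Hello, world";    # one element is a scalar, hence $i[...]
}
print join("\n", @i), "\n";         # one element per line
```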
Array variables in perl are prefixed with @ rather than $. Individual elements of @i can be accessed using $i[ ... ].
Hello, world with Arrays, v2
So this only assigns to the even elements of the array. When you print this out, you have blank lines between each element because the odd elements, such as $i[1], were never assigned and print as empty strings. Notice that the "if" statement goes at the end of the line.
Hello, world with arrays, v3
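A sketch of the append idiom described next, assuming the same @i array and the even-numbers filter from v2:

```perl
use strict;
use warnings;

my @i;
print "last index before any assignment: ", $#i, "\n";   # -1 for an empty array
for my $n (0..5) {
    # Assigning one past the last index appends, so no gaps appear.
    $i[$#i + 1] = "$n: Hello, world" if $n % 2 == 0;
}
print join("\n", @i), "\n";
```

The built-in push function does the same job; the index form is shown because it is the one the text discusses.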
This version gets rid of the blank lines in the output. Instead of using $i as the index of the array when assigning, we use $#i+1. In perl, $#i is a magic variable which evaluates to the last index of the array @i. Thus, assigning to $#i+1 appends to the array. Before you assign anything to an array, $#i has a value of -1.
Hello, world with arrays, v4
This produces exactly the same output as the last example, except that join is not used. Instead we introduce a new flow control, the "foreach." Notice also we've changed the way the "if" works. If you want to use curly brackets with your "if", then the "if" goes first and not last.
Remember the 0..5 trick and how I said it created an array? Here we print out the array using join.
Hello, world with arrays, v5
Here we've introduced the "elsif" control flow element. It works just like you'd expect. We've also introduced an "else" control flow element. Its block contains nothing but a comment.
Other Flow Control
Here we introduce the three basic loop control mechanisms.
Oh, yeah, and comments are started by # and go to the end of the line. This should be familiar to shell programmers.
Up until now, our if statements have always used curly braces. You can leave the curly braces off if you want to, but then the order of things gets reversed. This is supposed to make more sense for an English major, e.g. "Go to the next line if $i is equal to 3."
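A small sketch of the brace-free form, assuming "next" and "last" are among the loop controls in question (the loop bounds are made up):

```perl
use strict;
use warnings;

my @seen;
for my $i (0..9) {
    next if $i == 3;    # "go to the next iteration if $i is equal to 3"
    last if $i > 5;     # leave the loop entirely
    push @seen, $i;     # reached only for 0, 1, 2, 4 and 5
}
print "@seen\n";        # prints "0 1 2 4 5"
```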
Parsing text
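A minimal sketch of a first pattern match, consistent with the description that follows; the contents of $txt are an assumption:

```perl
use strict;
use warnings;

my $txt     = "Hello, world";
my $matched = "";
if ($txt =~ /Hello/) {       # =~ applies the pattern to $txt
    $matched = $&;           # $& holds the part of the string that matched
    print "Found: $matched\n";
}
```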
So here are the basics of pattern matching. We want to find out if the string "Hello" occurs in our text. So we use the matching operator "=~". If it matches, then the if block executes and $& is filled in with the part of the string that matched. The pattern is enclosed between the /'s, and it is called a regular expression. There is a lot to learn about pattern matching, but the first thing is that unescaped letters and numbers are literal, as are escaped special characters. "ABC\*\(" is a literal pattern.
Parsing text, v2
I've changed a bunch of things on you here. I used the "default variable" ($_) to hold the text string rather than $txt. If we do this, then use of "=~" is not required. If we don't specify, perl knows we are using the default. I also introduced the ignore case flag on the pattern match (notice the lower case "i" after the pattern). I also changed my pattern from "Hello" to "hello" just to let the ignore case flag do something.
Parsing text, v3
The new stuff is coming really fast now....
In case you hadn't figured out what this code does, it prints: "Hellowold"
Parsing text, v4
The only real thing I changed was adding the "+" pattern. The "+" pattern means, match 1 or more of the preceding pattern. There are a couple of other similar patterns that can be used to describe the number of times the preceding element needs to match.
Parsing text, v5
The star pattern matches 0 or more. There are quite a few cases above where it matches zero times.
Parsing text, v6
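One pattern that reproduces the output discussed below. It is offered as an assumption, since other patterns could give the same result: the first group accepts any letter except "r", the second accepts any letter.

```perl
use strict;
use warnings;

my $txt = "Hello, world";
my $out = "";
while ($txt =~ /([a-qs-z]+)([a-z]+)/gi) {   # two capture groups, both greedy
    $out .= " [$1,$2]";                      # $1 and $2 hold what each group matched
}
print "$out\n";                              # prints " [Hell,o] [wo,rld]"
```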
When this code runs it produces the output " [Hell,o] [wo,rld]". This means we found two matches. You should notice that I have added ()'s to the patterns. These create what are called "capture groups" or "backreferences." These are logical subsets of patterns which can be retrieved separately after a successful pattern match. The variables $1 and $2 in the above example accomplish this.
Another thing that you should notice is the way in which the match happened. Why did we find $1 = "Hell" and $2 = "o"? Why didn't we find $1 = "H" and $2 = "ello"? The answer is that the "+" expression is greedy (as are its cousins in the above table). Each "+" operator grabs as many of its pattern elements as it can, and they grab them in order. Thus, $1 takes as many as it can while still leaving $2 the one character it needs, and then $2 takes whatever it can of what's left over.
Parsing text, v7
This pattern results in the following output: " [Hello,, ] [world,
The "^" at the start of the interior of the square bracket negates it. Thus, the pattern "[^a-z]" matches any character that is not a letter (strictly, not a lowercase letter unless the ignore case flag is on). This includes punctuation and whitespace.
Parsing text, v8
The above code produces the output "[Hell,o] [worl,d]". The only thing I've changed is that I've used "\w", which is a shorthand for "[a-zA-Z0-9_]".
Other shorthands...
You can actually mix and match these: "[\d\s]" matches any digit or whitespace character, and "[\da-f]" matches a hex digit.
Parsing text, v9
Really, there's nothing new here -- except that I've nested the backreferences. I wanted you to see what happened.
Parsing text, v10
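A sketch of a non-greedy match consistent with the description that follows, assuming the "\w" pattern from v8 with a "?" added after the first "+":

```perl
use strict;
use warnings;

my $txt = "Hello, world";
my @pairs;
while ($txt =~ /(\w+?)(\w+)/g) {   # +? takes as few characters as it can
    push @pairs, "[$1,$2]";
}
print "@pairs\n";                  # prints "[H,ello] [w,orld]"
```

Compare this with the greedy result "[Hell,o] [worl,d]" from v8: the split point moves from the end of each word to the start.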
Notice the ? after the +. This turns off the greediness.
Parsing text, v11
The \b matches on a word boundary.
Parsing text, v12
The . matches anything but the \n character -- unless you turn on the s flag. Then it matches anything.
Parsing text, v13
The $ matches the end of a string. However, if you turn on the m flag then it can also match at the end of a line.
Parsing text, v14
The ^ matches the start of a string. However, if you turn on the m flag then it can also match at the start of a line.
Other Regex's
Zero-width what? How about an example? OK.
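A sketch of one zero-width assertion, the positive lookahead (?=...). The sample text is made up. The lookahead checks what comes next without making it part of the match:

```perl
use strict;
use warnings;

my $txt = "100 dollars, 200 euros, 300 dollars";

# Capture only the numbers that are followed by " dollars".
# The lookahead must succeed, but it consumes no characters.
my @amounts = $txt =~ /(\d+)(?= dollars)/g;
print "@amounts\n";   # prints "100 300"
```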
Hashes
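A sketch of parsing name=value pairs into a hash, consistent with the description that follows. The hash name %h comes from the text; the input string is an assumption:

```perl
use strict;
use warnings;

my $txt = "color=red size=large shape=round";
my %h;
while ($txt =~ /(\w+)=(\w+)/g) {   # $1 is the name, $2 is the value
    $h{$1} = $2;
}
for my $k (sort keys %h) {          # keys returns the names in no fixed order
    print "$k is $h{$k}\n";         # $h{...} expands inside double quotes
}
```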
We've put several things together. We parse out name=value pairs using backreferences. We take these name/value pairs and store them in what is called a hash. This object is like an array, except that it is indexed by strings rather than numbers. The builtin function "keys" returns a list containing the keys of the hash. The notation $h{...} can be used for accessing an element of a hash, and it will expand inside a double-quoted string, just like scalar variables or members of an array.
More Hashes
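A sketch of hash initialization and "each", consistent with the description that follows; the hash contents are made up:

```perl
use strict;
use warnings;

my %age = (alice => 30, bob => 25);       # => also quotes the bare word on its left
my $total = 0;
while (my ($name, $years) = each %age) {  # each returns one key/value pair per call
    print "$name is $years\n";
    $total += $years;
}
print "total: $total\n";                  # prints "total: 55"
```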
Here we've introduced "each" to show you how you can extract name/value pairs from a hash in a nifty way. This also shows you how you can initialize a hash. Actually, the thing in parens is really a list, and this would work just the same if we put the keys in quotes and replaced the =>'s with ,'s.
More Hashes, v2
See what I mean?
More Lists
Since I showed you how to initialize a hash, I thought you'd want to see how to initialize a list.
Testing and deleting from hashes
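A minimal sketch of testing for and deleting a key, using the built-in "exists" and "delete" functions; the hash contents are made up:

```perl
use strict;
use warnings;

my %h = (apple => 1, pear => 2);
my $before = exists $h{pear} ? "yes" : "no";   # exists tests for a key without creating it
delete $h{pear};                                # delete removes the key and its value
my $after  = exists $h{pear} ? "yes" : "no";
print "pear before: $before, after: $after\n";  # prints "pear before: yes, after: no"
```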
Subroutines
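A sketch of a subroutine consistent with the description that follows; the name "describe" and its arguments are hypothetical:

```perl
use strict;
use warnings;

sub describe {
    my $name    = shift;     # shift takes the first argument off @_
    my ($count) = @_;        # whatever is left is still in @_
    return sprintf("%s has %d items", $name, $count);
}

my $line = describe("Larry", 3);
printf("%s\n", $line);       # prints "Larry has 3 items"
```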
Here I've demonstrated how to write a subroutine. The arguments to the subroutine come packaged up in the @_ array. We can pull off the first arg by using the shift operator, but we don't need to. You may also notice that I've used the "printf" function that is familiar from the C programming language. Everyone loves printf. Even Java has it now. :)
Reading files
That funny thing that looks like a variable but has no $, @, or % in front of it is a filehandle. It's kind of an ugly thing because if you run perl with the -w flag (print warnings), it warns that any filehandle name you pick might conflict with a future keyword. Fortunately, there's an alternative.
Reading files, v2
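A sketch of one such alternative, a lexical filehandle stored in an ordinary scalar. This is an assumption; the FileHandle module mentioned later is another option. The sketch writes a small temporary file first so that it runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway input file (stand-in for a real file on disk).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first line\nsecond line\n";
close($tmp) or die "Cannot close $path: $!";

open(my $fh, '<', $path) or die "Cannot open $path: $!";
my @lines;
while (my $line = <$fh>) {   # read one line per iteration
    chomp $line;             # strip the trailing newline
    push @lines, $line;
}
close($fh);
print "$_\n" for @lines;
```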
This makes perl -w happy :)
Reading files, v3
The perl slogan is: There's more than one way to do it
Reading files, v4
Chomp removes the trailing line terminator (whatever $/ is set to, a newline by default) from the end of the line. To get the same output, we have to supply the newline explicitly when printing.
Reading files, v5
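Now we just print the user names from /etc/group. Split creates an array from a string, and it takes a regular expression as a delimiter. Watch out for putting ()'s in your regex tho! If the pattern contains capturing parentheses, split also returns the captured delimiter text as extra fields.
Slurp!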
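A sketch of slurping, which relies on undefining the $/ variable. An in-memory filehandle stands in for a real file here so the example runs on its own:

```perl
use strict;
use warnings;

my $data = "line one\nline two\nline three\n";   # stand-in for a file's contents
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $contents = do {
    local $/;    # undefine the line delimiter inside this block only
    <$fh>;       # a single read now returns the whole file
};
close($fh);

my @lines = split /\n/, $contents;
print scalar(@lines), " lines slurped\n";        # prints "3 lines slurped"
```

Using "local" restores $/ at the end of the block, so later reads go back to being line by line.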
Context advantage
The variable $/ is a magic variable that tells perl what the line delimiter is for reading files. If you set it to the undefined value, then you slurp in the entire file at once.
Writing a file
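A sketch of writing a file, assuming the three-argument form of open with a ">" mode. The file name out.txt is hypothetical and lives in a temporary directory so nothing real is overwritten:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, "out.txt");   # hypothetical file name

# '>' creates or truncates the file; '>>' would append instead.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} "Hello, file\n";
close($out) or die "Cannot close $path: $!";

my $size = -s $path;                               # file size in bytes
print "wrote $size bytes to $path\n";
```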
So to write, you put a ">" in front of the file name, much like output redirection in a unix shell.
Reading from a pipe
Just what you'd expect!
Reading a directory
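A sketch of reading a directory with the built-in opendir and readdir functions. The directory and file names are made up, and a temporary directory is populated first so the example runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Populate a throwaway directory with three empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.log)) {
    open(my $fh, '>', File::Spec->catfile($dir, $name)) or die "Cannot create $name: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @txt = sort grep { /\.txt$/ } readdir($dh);   # readdir also returns "." and ".."
closedir($dh);
print "@txt\n";                                   # prints "a.txt b.txt"
```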
What could be easier? I'll show you...
Reading a directory, v2
That does the same thing. The part in the angle brackets is called a glob, and that's a shell-like file-matching pattern (similar in spirit to a regular expression, but with the shell's wildcard syntax).
Counting words
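A sketch of a word counter consistent with the description that follows. The hash name %count and the sample text are assumptions, and @ARGV is set by hand to a temporary file so the example runs without command line arguments:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a file named on the command line.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "The cat and the hat\nthe end\n";
close($tmp) or die "Cannot close $path: $!";
@ARGV = ($path);

my %count;
while (<>) {                   # next line from the files in @ARGV, stored in $_
    while (/\w+/g) {           # each word on the line
        $count{"\L$&"}++;      # \L lowercases the match; ++ starts from nothing
    }
}
for my $word (sort keys %count) {
    print "$word: $count{$word}\n";
}
```

In normal use the @ARGV assignment is dropped and the script is invoked as `perl myprog.pl somefile.txt`.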
Up until now we have not really talked much about the command line. It hasn't mattered. In this example we are going to need it, because we are going to count the words in a file. Notice the <> operator inside the outer while loop. This fetches a line of text from the files listed on the command line, in sequence, and stores it in the default variable. Notice the string "\L$&". The "\L" is a special control character that affects the evaluation of a double-quoted string, turning everything that follows to lower case. There are a few other controls like this: "\U" for upper case, "\l" to lowercase the next single character, "\u" to uppercase the next single character, and "\E" to end any automatic upper/lower casing. The ++ operator will provide a value of "1" for the hash if no prior value existed. We could have just assigned one here, but using ++ is convenient.
Counting words, v2
This example is almost identical to the previous one, except that we are reading standard input, not the files supplied on the command line. So we have to invoke it a little differently.
Counting words, v3
This example is almost identical to the previous one, except that we slurp the whole file into an array before parsing it. Part of the magic of perl is that it does things based on context. Because "
Context trap
Perl figures out contexts, and it doesn't always require parens for functions -- but as the above example shows this can occasionally cause confusion. The first print statement prints an 8 with no newline.
Counting words, v4
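A sketch of a custom sort consistent with the description that follows. The counts are hard-coded here, and the subroutine name by_count_desc and the cutoff of two words are hypothetical:

```perl
use strict;
use warnings;

my %count = (the => 3, and => 2, cat => 1, hat => 1);

# Largest count first; ties fall back to alphabetical order.
sub by_count_desc { $count{$b} <=> $count{$a} or $a cmp $b }

my @top;
for my $word (sort by_count_desc keys %count) {
    # Keep a word only if it occurs more than once and we have room left.
    push @top, $word if $count{$word} > 1 and @top < 2;
}
print "@top\n";   # prints "the and"
```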
In this example we identify the most common words in our input text. The key step is that we need to invoke the sort function supplying a sorting method of our own creation. When writing a sort, the variables "$a" and "$b" are magical. They are the two values which we must compare. Because we want the largest counts first, our subroutine should return a positive integer if $b > $a, a negative integer if $a > $b, and zero if they are equal. We have also included the "and" operator inside our for loop.
Counting words, v5
This example is exactly like the previous one, but we don't allow perl to help us with all its magic. We use the special variable @ARGV to look at the command line arguments. We use the "-r" operator to test if a given argument corresponds to a readable file. This operator is familiar to shell programmers, and all the same variants exist. You can use -w to test for a writable file, -x to test for an executable file, -d to test for a directory, and -e to test for existence.
In general, if you think something isn't in Perl, try it out, because it usually is. :-) -- Larry Wall
We use FileHandle to explicitly open, read, and close a file.
Substituting text -- fixing capitalization
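A sketch of the capitalization fix described next, assuming a word-by-word substitution; the sample text is made up:

```perl
use strict;
use warnings;

my $txt = "hELLO, wORLD";
# \L lowercases the captured word, \u then uppercases its first letter.
$txt =~ s/(\w+)/\u\L$1/g;
print "$txt\n";   # prints "Hello, World"
```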
The only new thing here is the substitution operation on the second line of the program. It looks similar to the match operation we have seen previously, except it starts with an "s" and has a second argument of sorts, the value to be substituted. The substituted value is essentially the same as the value that is matched, except for capitalization. We use \u and \L to transform the substituted string to a word whose first letter is a capital, and whose remaining letters are lower case.
Substituting text -- doing math
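A sketch of a substitution that does arithmetic, consistent with the description that follows; the sample text and the choice of addition are assumptions:

```perl
use strict;
use warnings;

my $txt = "2 + 3 apples and 10 + 20 pears";
# With the e flag, the replacement side is Perl code, not a string.
$txt =~ s/(\d+)\s*\+\s*(\d+)/$1 + $2/ge;
print "$txt\n";   # prints "5 apples and 30 pears"
```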
The "e" flag to the text substitution operation tells perl that the substituted value is not a string, but actual code to be evaluated. We could have just evaluated the whole string, but I wasn't really showing you how to use the eval function. I was showing you the "e" flag :)
Just Plain Magical Weirdness
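A sketch of the two operators described next, using the "p" to "q" example from the text; the second variable $t is an extra illustration:

```perl
use strict;
use warnings;

my $s = "p";
$s++;                  # string increment: "p" becomes "q"
my $line = $s x 5;     # x repeats the string on its left
print "$line\n";       # prints "qqqqq"

my $t = "az";
$t++;                  # the increment carries, like an odometer
print "$t\n";          # prints "ba"
```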
So the ++ operator does something interesting when applied to text: it gives you the next letter. Isn't that fun? Also, there's this neat string operator called "x" that allows you to repeat something x times. So in our example, ++ changes the string "p" to "q" and then prints it 5 times on a line.
Evaluate just one line
perl -e 'print "Hello!\n";'
Transforming files in-place is as easy as pie
perl -p -i -e 's/foo/bar/g' *.c
This would replace foo with bar in all C source files. It would not, however, make a backup copy. To keep a backup of each original file with .bak appended to its name, do this instead:
perl -p -i.bak -e 's/foo/bar/g' *.c
You can use perl like sed:
echo foo and foo | perl -p -e 's/foo/bar/g'