Basic Perl

Quotes from Larry Wall (the creator of Perl): http://en.wikiquote.org/wiki/Larry_Wall

(It's sorta like sed, but not. It's sorta like awk, but not. etc.) Guilty as charged. Perl is happily ugly, and happily derivative. -- Larry Wall

Documentation: http://perldoc.perl.org/

Everyone needs to know how to do this in any language. Each example below gives the name of a script file, then the contents of that file, then the command that runs it followed by the output it prints.

hello1.pl
print "Hello, world\n";
$ perl ./hello1.pl
Hello, world

All the examples should be assumed to work this same way unless you are told otherwise.

hello2.pl
print "Hello" . "world" . "\n";
$ perl ./hello2.pl
Helloworld

In Perl we use "." to concatenate strings and ".=" to concatenate in place, just as += and *= work in place in C. Perl supports +=, *=, and the rest of that family too.
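A minimal sketch of the in-place operators side by side. The variable names are just for illustration:

```perl
use strict;
use warnings;

my $greeting = "Hello";
$greeting .= ", " . "world";   # append in place: same as $greeting = $greeting . ", " . "world"

my $n = 2;
$n += 3;                       # in-place add, as in C: $n is now 5
$n *= 4;                       # in-place multiply: $n is now 20

print $greeting, "\n";         # Hello, world
print $n, "\n";                # 20
```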

hello3.pl
print 'Hello, world\n';
$ perl ./hello3.pl
Hello, world\n

Use single quotes when you want a literal string: variables and backslash escapes are not expanded inside them. You probably didn't want that here, because now the \n is printed literally and no newline is output.

5 x Hello, world

So here's a little more elaboration. Notice that we have implicitly created a variable, and that scalar variable names begin with $. We also have a basic C-style for loop.

hello4.pl
for($i=0;$i<5;$i++) {
    print "$i) Hello, world\n";
}
$ perl ./hello4.pl
0) Hello, world
1) Hello, world
2) Hello, world
3) Hello, world
4) Hello, world

Unlike C, Perl requires the curly braces here, even around a single statement. Notice also that the variable $i is expanded even though it's inside a double-quoted string.

hello5.pl
use strict;

for(my $i=0;$i<5;$i++) {
    print $i,") Hello, world\n";
}
$ perl ./hello5.pl
0) Hello, world
1) Hello, world
2) Hello, world
3) Hello, world
4) Hello, world

In this variation "use strict" turns off implicit variable creation. Every variable must now be declared, which is why $i is declared with "my". We also pass print a comma-separated list of arguments, so we don't have to expand $i inside a string.

hello6.pl
for $i (0..5) {
    print $i,") Hello, world\n";
}
$ perl ./hello6.pl
0) Hello, world
1) Hello, world
2) Hello, world
3) Hello, world
4) Hello, world
5) Hello, world

The 0..5 (the range operator) creates a list of the numbers 0 through 5. More on that in a bit.

Hello, world with Arrays

hello7.pl
for($i=0;$i<5;$i++) {
    $i[$i] = "($i) Hello, world";
}
print join("\n",@i),"\n";
$ perl ./hello7.pl
(0) Hello, world
(1) Hello, world
(2) Hello, world
(3) Hello, world
(4) Hello, world

Array variables in Perl are prefixed with @ rather than $. Individual elements of @i are accessed using $i[ ... ]. Note that @i is a completely separate variable from the scalar $i.

Hello, world with Arrays, v2

hello8.pl
use strict;
my @i;
for(my $i=0;$i<5;$i++) {
    $i[$i] = "($i) Hello, world" if($i % 2 == 0);
}
print join("\n",@i),"\n";
$ perl ./hello8.pl
(0) Hello, world

(2) Hello, world

(4) Hello, world

So this only assigns to the even elements of the array. When you print it out, you get blank lines between the entries. The odd elements ($i[1] and $i[3]) were never assigned, so they are undef, and join treats undef as an empty string.

Notice that the "if" goes at the end of the statement. Perl calls this a statement modifier.

Hello, world with arrays, v3

hello9.pl
use strict;
my @i;
for(my $i=0;$i<10;$i++) {
    $i[$#i+1] = "($i) Hello, world" if($i % 2 == 0);
}
print join("\n",@i),"\n";
$ perl ./hello9.pl
(0) Hello, world
(2) Hello, world
(4) Hello, world
(6) Hello, world
(8) Hello, world

This version gets rid of the blank lines in the output. Instead of using $i as the array index when assigning, we use $#i+1. In Perl, $#i is a magic value that gives the index of the last element of @i, which is one less than the number of elements. Assigning to $i[$#i+1] therefore appends to the array. Before you assign anything to an array, $#i is -1.
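Perl's built-in push does the same append without any index arithmetic. Below is a sketch of hello9.pl rewritten that way. It also records $#i before and after the loop so you can see the -1. The extra variable names are just for illustration:

```perl
use strict;
use warnings;

my @i;
my $before = $#i;                                  # -1: the array is still empty
for (my $i = 0; $i < 10; $i++) {
    push @i, "($i) Hello, world" if $i % 2 == 0;   # same effect as $i[$#i+1] = ...
}
my $count = scalar(@i);                            # number of elements in @i

print "last index before: ", $before, ", after: ", $#i, ", elements: ", $count, "\n";
print join("\n", @i), "\n";
```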

Hello, world with arrays, v4

hello9_1.pl
use strict;
my @i;
for(my $i=0;$i<10;$i++) {
    if($i % 2 == 0) {
        $i[$#i+1] = "($i) Hello, world";
    }
}
foreach my $i (@i) {
    print $i,"\n";
}
$ perl ./hello9_1.pl
(0) Hello, world
(2) Hello, world
(4) Hello, world
(6) Hello, world
(8) Hello, world

This produces exactly the same output as the last example, except that join is not used. Instead we introduce a new flow control, the "foreach."

Notice also we've changed the way the "if" works. If you want to use curly brackets with your "if", then the "if" goes first and not last.

numarr.pl
print join(", ",0..5),"\n";
$ perl ./numarr.pl
0, 1, 2, 3, 4, 5

Remember the 0..5 trick and how I said it created a list? Here we print that list using join.

Hello, world with arrays, v5

hello9_2.pl
use strict;
my @i;
for(my $i=0;$i<10;$i++) {
    if($i % 2 == 0) {
        $i[$#i+1] = "($i) Hello, world";
    } elsif($i > 5) {
        $i[$#i+1] = "[$i] Hello, world";
    } else {
        ; # do nothing
    }
}
foreach my $i (@i) {
    print $i,"\n";
}
$ perl ./hello9_2.pl
(0) Hello, world
(2) Hello, world
(4) Hello, world
(6) Hello, world
[7] Hello, world
(8) Hello, world
[9] Hello, world

Here we've introduced the "elsif" control flow element. It works just like you'd expect.

We've also introduced an "else" control flow element. Its block contains nothing but an empty statement and a comment.

Other Flow Control

control.pl
$j=0;
for($i=0;$i<10;$i++) {
    if($i == 3) {
        next; # next is like C's continue
    }
    if($i == 6) {
        last; # last is like C's break
    }
    if($j < $i) {
        $j++;
        redo; # redoes this iteration of the loop!
    }
    print "$i, $j\n";
}
$ perl ./control.pl
0, 0
1, 1
2, 2
4, 4
5, 5

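Here we introduce the three basic loop control mechanisms.

We use "last" to break out of a loop. It means, make this the last iteration.
We use "next" to skip to the next iteration of a loop.
We can use "redo" to do this iteration over. It restarts the loop body without re-testing the condition or running $i++. That is why $i stays put while $j catches up to it.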

Oh, yeah, and comments are started by # and go to the end of the line. This should be familiar to shell programmers.

control2.pl
my $j=0;
for(my $i=0;$i<10;$i++) {
    next if($i == 3);
    last if($i == 6);
    if($j < $i) {
        $j++;
        redo; # redoes this iteration of the loop!
    }
    print "$i, $j\n";
}
$ perl ./control2.pl
0, 0
1, 1
2, 2
4, 4
5, 5

Up until now, our if statements have always used curly braces. You can leave the curly braces off if you want to, but then the order of things gets reversed. This is supposed to make more sense for an English major, e.g. "Go to the next line if $i is equal to 3."

while.pl
my $i=0;
while($i < 3) {
    print $i,"\n" unless($i == 2);
    $i++;
}
until($i == 0) {
    print $i,"\n";
    $i--;
}
$ perl ./while.pl
0
1
3
2
1
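"unless (COND)" means "if (!COND)", and "until (COND)" means "while (!COND)". Below is a sketch of while.pl rewritten with only if/while and "!", to show that equivalence. It collects the values in a hypothetical @out array instead of printing them one by one:

```perl
use strict;
use warnings;

my @out;
my $i = 0;
while ($i < 3) {
    push @out, $i if !($i == 2);   # same as: ... unless ($i == 2)
    $i++;
}
while (!($i == 0)) {               # same as: until ($i == 0)
    push @out, $i;
    $i--;
}
print join("\n", @out), "\n";      # 0 1 3 2 1, one per line, like while.pl
```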

Parsing text

parse1.pl
$txt = "Hello, world\n";
if($txt =~ /Hello/) {
    print $&,"\n";
}
$ perl ./parse1.pl
Hello

So here are the basics of pattern matching. We want to find out whether the string "Hello" occurs in our text, so we use the binding operator "=~" to apply a match to $txt. If the match succeeds, the if block executes and $& is filled in with the part of the string that matched.

The pattern is enclosed between the /'s, and it is called a regular expression.

There is a lot to learn about pattern matching, but the first thing is that unescaped letters and digits match themselves literally. So do special characters that are escaped with a backslash. "ABC\*\(" is a literal pattern that matches the text "ABC*(".
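Escaping by hand gets tedious when the literal text is in a variable. Perl's quotemeta function, and the equivalent \Q...\E inside a pattern, add the backslashes for you. A small sketch with made-up strings:

```perl
use strict;
use warnings;

my $text    = "xxABC*(yy";
my $needle  = "ABC*(";                               # contains the special characters * and (
my $escaped = quotemeta($needle);                    # ABC\*\(
my $by_hand = ($text =~ /ABC\*\(/)     ? 1 : 0;      # escaped by hand
my $by_q    = ($text =~ /\Q$needle\E/) ? 1 : 0;      # \Q...\E escapes the interpolated text
print "$escaped $by_hand $by_q\n";                   # ABC\*\( 1 1
```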

Parsing text, v2

parse2.pl
$_ = "Hello, world\n";
if(/hello/i) {
    print $&,"\n";
}
$ perl ./parse2.pl
Hello

I've changed a bunch of things on you here. I used the "default variable" ($_) to hold the text string rather than $txt. If we do this, then "=~" is not required: a match with nothing bound to it applies to $_.

I also introduced the ignore case flag on the pattern match (notice the lower case "i" after the pattern). I also changed my pattern from "Hello" to "hello" just to let the ignore case flag do something.

Parsing text, v3

parse3.pl
$_ = "Hello, world\n";
while(/[a-ow]/ig) {
    print $&;
}
print "\n";
$ perl ./parse3.pl
Hellowold

The new stuff is coming really fast now....

The first change is the pattern "[a-ow]". This is a character class that matches any one character in the range a-o, as well as the letter w (and, because of the i flag, their uppercase versions). I also put the match inside a while loop and applied the g flag to the pattern. Without the g flag, this loop would never end, because every test would succeed by matching the first letter of the string again. With the g flag, each match starts where the previous one left off. The loop walks through the whole string and stops when no more matches remain.

In case you hadn't figured out what this code does, it prints: "Hellowold"

Parsing text, v4

parse4.pl
$_ = "Hello, world\n";
while(/[a-ow]+/ig) {
    print "+",$&;
}
print "\n";
$ perl ./parse4.pl
+Hello+wo+ld

The only real thing I changed was adding "+" after the character class. The "+" quantifier means: match 1 or more of the preceding element. A few other quantifiers describe how many times the preceding element needs to match.

pattern	min	max
*	0	infinity
+	1	infinity
?	0	1
{3,6}	3	6
{2,}	2	infinity

Parsing text, v5

parse5.pl
$_ = "Hello, world\n";
while(/[a-ow]*/ig) {
    print "+",$&;
}
print "\n";
$ perl ./parse5.pl
+Hello+++wo++ld++

The star matches 0 or more times. In quite a few places above it matched an empty string, and each of those shows up as an extra "+".

Parsing text, v6

parse6.pl
$_ = "Hello, world\n";
while(/([a-ow]+)([a-z]+)/ig) {
    print " [",$1,",",$2,"]";
}
print "\n";
$ perl ./parse6.pl
 [Hell,o] [wo,rld]

When this code runs it produces the output " [Hell,o] [wo,rld]". This means we found two matches.

You should notice that I have added ()'s to the pattern. These create what are called "capture groups". A capture group is a part of a pattern whose matched text can be retrieved separately after a successful match. In the example above, the variables $1 and $2 hold that text.

Another thing you should notice is the way in which the match happened. Why did we find $1 = "Hell" and $2 = "o"? Why didn't we find $1 = "H" and $2 = "ello"? The answer is that the "+" quantifier is greedy, as are its cousins in the table above. Each "+" grabs as many characters as it can, working from left to right, while still letting the rest of the pattern match. So the first group grabs all of "Hello", then gives back just the "o" so that the second group has something to match.

Parsing text, v7

parse7.pl
$_ = "Hello, world\n";
while(/([a-z]+)([^a-z]+)/ig) {
    print " [",$1,",",$2,"]";
}
print "\n";
$ perl ./parse7.pl
 [Hello,, ] [world,
]

Notice that the second $2 is the newline at the end of the string. That is why the output breaks before the final "]".

A "^" at the start of a square-bracket class negates it. Thus, the pattern "[^a-z]" matches any character that is not a letter (with the i flag, not an uppercase letter either). This includes punctuation and whitespace.

Parsing text, v8

parse8.pl
$_ = "Hello, world\n";
while(/(\w+)(\W+)/ig) {
    print " [",$1,",",$2,"]";
}
print "\n";
$ perl ./parse8.pl
 [Hello,, ] [world,
]

The above code produces exactly the same output as the previous example.

The only thing I've changed is that I've used "\w", which is shorthand for "[a-zA-Z0-9_]", and its negation "\W".

Other shorthands...

Pattern	Synonym
\w	[a-zA-Z0-9_]
\W	[^a-zA-Z0-9_]
\d	[0-9]
\D	[^0-9]
\s	[ \t\n\r\f]
\S	[^ \t\n\r\f]

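You can mix and match these inside square brackets: "[\d\s]" matches any digit or whitespace character, and "[\da-f]" matches a (lowercase) hex digit. The synonyms above are the approximate ASCII meanings. On Unicode strings, \w, \d, and \s can also match other letters, digits, and spaces.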

Parsing text, v9

parse9.pl
$_ = "Hello, world\n";
while(/(\w+(\W+))+/ig) {
    print " &=($&)\n1=($1)\n2=($2)\n";
}
print "\n";
$ perl ./parse9.pl
 &=(Hello, world
)
1=(world
)
2=(
)

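Really, there's nothing new here, except that I've nested the capture groups and repeated the outer one with "+". I wanted you to see what happens. When a group is repeated, $1 holds only what it matched on its last repetition ("world\n"), while $& holds the entire match.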

Parsing text, v10

parse10.pl
$_ = "Hello, world\n";
while(/(\w+?)(\w+?)/ig) {
    print "1=$1, 2=$2\n";
}
$ perl ./parse10.pl
1=H, 2=e
1=l, 2=l
1=w, 2=o
1=r, 2=l

Notice the ? after the +. It turns off the greediness: "+?" matches as few characters as possible, so each group here takes exactly one letter.
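The difference is easiest to see when the same pattern is run both ways on one string. Here is a sketch using a made-up HTML-like string:

```perl
use strict;
use warnings;

my $s = "<b>bold</b>";
my ($greedy)     = $s =~ /<(.+)>/;    # .+ grabs as much as it can, up to the LAST '>'
my ($non_greedy) = $s =~ /<(.+?)>/;   # .+? stops at the FIRST '>'
print "greedy:     $greedy\n";        # b>bold</b
print "non-greedy: $non_greedy\n";    # b
```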

Parsing text, v11

parse11.pl
$_ = "Hello, world\n";
/\w\w\b/;
print $&,"\n";
/\b\w\w/;
print $&,"\n";
$ perl ./parse11.pl
lo
He

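The \b matches a word boundary. A word boundary is a zero-width position between a word character (\w) and a non-word character (\W), or at the start or end of the string next to a word character.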
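A common use of \b is finding a word only when it stands on its own. This sketch counts matches with and without it. The text is made up, and the "() =" trick forces list context so the assignment yields a count:

```perl
use strict;
use warnings;

my $text = "cat concatenate cat";
my $anywhere = () = $text =~ /cat/g;       # every "cat", even inside a longer word
my $as_word  = () = $text =~ /\bcat\b/g;   # only "cat" standing alone as a word
print "$anywhere $as_word\n";              # 3 2
```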

Parsing text, v12

parse12.pl
$x = "a line\nanother line\n";
$x =~ /.*/;
print $&,"\n";
$x =~ /.*/s;
print $&;
$ perl ./parse12.pl
a line
a line
another line

The . matches anything but the \n character -- unless you turn on the s flag. Then it matches anything.

Parsing text, v13

parse13.pl
$x = "a line\nanother line\n";
$x =~ /.*$/;
print $&,"\n";
$x =~ /.*$/m;
print $&;
$ perl ./parse13.pl
another line
a line

The $ matches at the end of the string, or just before a newline at the very end. If you turn on the m flag, it can also match at the end of any line, that is, just before any \n.

Parsing text, v14

parse14.pl
$x = "a line\nanother line\n";
print "not found\n" unless($x =~ /^an.*/);
$x =~ /^an.*/m;
print $&;
$ perl ./parse14.pl
not found
another line

The ^ matches the start of a string. However, if you turn on the m flag then it can also match on the start of a line.

Other Regexes

(?: ... )	Like ( ... ), but does not fill in $1, etc.
(?= ... )	Zero-width lookahead assertion: what follows must match, but it is not consumed
(?! ... )	Negative zero-width lookahead assertion: what follows must not match

Zero-width what? How about an example? OK.

parse15.pl
$x = "foobar";
print "found 0: $1\n" if($x =~ /(?:f)(o)/);
print "found 1: $&\n" if($x =~ /foo/);
print "found 2: $&\n" if($x =~ /foo(?=bar)/);
print "found 3: $&\n" if($x =~ /foo(?!bar)/);
print "found 4: $&\n" if($x =~ /foo(?=baz)/);
print "found 5: $&\n" if($x =~ /foo(?!baz)/);
$ perl ./parse15.pl
found 0: o
found 1: foo
found 2: foo
found 5: foo
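"Zero-width" means the lookahead checks what comes next without including it in the match, so in "found 2" $& is "foo", not "foobar". The sketch below makes that explicit. It then uses a lookahead in a substitution to put commas into a number; the number is made up for the example:

```perl
use strict;
use warnings;

my $x = "foobar";
my $ahead = ($x =~ /foo(?=bar)/) ? $& : "";   # "bar" must follow, but isn't part of $&
my $plain = ($x =~ /foobar/)     ? $& : "";   # an ordinary pattern consumes "bar" too

my $num = "1234567";
# Insert a comma after any digit that is followed by a multiple of 3 digits up to the end.
# The lookahead consumes nothing, so /g can keep checking every following digit.
$num =~ s/(\d)(?=(?:\d{3})+\z)/$1,/g;
print "$ahead $plain $num\n";                 # foo foobar 1,234,567
```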

Hashes

hash1.pl
use strict;
my %h;
$_ = "height=5, girth=27, mass=1946, age=88";
while(/(\w+)=(\d+)/ig) {
    $h{$1}=$2;
}
foreach my $k (keys %h) {
    print "$k -> $h{$k}\n";
}
$ perl ./hash1.pl
mass -> 1946
girth -> 27
age -> 88
height -> 5

We've put several things together. We parse out name=value pairs using capture groups, and we store those pairs in what is called a hash. A hash is like an array, except that it is indexed by strings rather than numbers.

The builtin function "keys" returns a list of the hash's keys in no particular order. That is why the output above doesn't follow the order of the input, and the order may differ on your machine. The notation $h{...} accesses an element of a hash, and it expands inside a double-quoted string, just like scalar variables or array elements.
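If you want the same order on every run, sort the keys first. Here is a sketch of hash1.pl with that one change; @keys is a name introduced for the example:

```perl
use strict;
use warnings;

my %h;
$_ = "height=5, girth=27, mass=1946, age=88";
while (/(\w+)=(\d+)/g) {
    $h{$1} = $2;
}

# sort puts the keys in alphabetical order, so the output is repeatable.
my @keys = sort keys %h;
foreach my $k (@keys) {
    print "$k -> $h{$k}\n";
}
```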

More Hashes

hash2.pl
my %x = (a => "b", c => "d", e => "f" );
while(($k,$v) = each %x) {
    print $k,"=",$v,"\n";
}
$ perl ./hash2.pl
e=f
c=d
a=b

Here we've introduced the "each" function. It is a nifty way to extract key/value pairs from a hash. Like keys, each returns the pairs in no particular order.

This also shows you how to initialize a hash. The thing in parens is really just a list. It would work exactly the same if we put the keys in quotes and replaced the =>'s with ,'s.

More Hashes, v2

hash3.pl
my %x = ("a", "b", "c", "d", "e", "f" );
while(($k,$v) = each %x) {
    print $k,"=",$v,"\n";
}
$ perl ./hash3.pl
e=f
c=d
a=b

See what I mean?

More Lists

list1.pl
my @x = ("a","b","c","d",e => "f");
foreach $x (@x) {
    print $x,"\n";
}
$ perl ./list1.pl
a
b
c
d
e
f

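Since I showed you how to initialize a hash, I thought you'd want to see how to initialize a list. Notice that => is just a fancy comma that also quotes a bare word on its left, so e => "f" simply adds two more elements.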

Testing and deleting from hashes

delhash.pl
$h{a}++; # this is the same as $h{"a"}++
print "found\n" if(defined($h{a}));
delete($h{a});
print "not found\n" unless(defined($h{a}));
$ perl ./delhash.pl
found
not found
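defined tests whether a value is defined. Perl also has exists, which tests whether a key is in the hash at all. The two differ when a key is present but its value is undef. A small sketch with made-up data:

```perl
use strict;
use warnings;

my %h = (a => 1, b => undef);               # b is in the hash, but its value is undef

my $b_exists  = exists $h{b}  ? 1 : 0;      # 1: the key is present
my $b_defined = defined $h{b} ? 1 : 0;      # 0: the value is undef
delete $h{a};
my $a_exists  = exists $h{a}  ? 1 : 0;      # 0: delete removed the key itself

print "exists b: $b_exists, defined b: $b_defined, exists a after delete: $a_exists\n";
```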

Subroutines

sub.pl
foo("a",1,3.2);

sub foo {
    my $first = shift;   # removes and returns the first element of @_
    my @rest = @_;       # whatever arguments are left
    printf("%s, %d, %f\n",$first,$rest[0],$rest[1]);
}
$ perl ./sub.pl
a, 1, 3.200000

Here I've demonstrated how to write a subroutine. The arguments to the subroutine come packaged up in the @_ array. Inside a subroutine, shift with no argument removes and returns the first element of @_. We don't have to use it, though; we could index @_ directly.

You may also notice that I've used the "printf" function that is familiar from the C programming language. Everyone loves printf. Even Java has it now. :)
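A subroutine can also hand back a value with return, and a common idiom unpacks all of @_ in one list assignment. Here is a sketch; the subroutine name describe is hypothetical:

```perl
use strict;
use warnings;

sub describe {
    my ($name, @numbers) = @_;          # unpack every argument at once
    my $total = 0;
    $total += $_ foreach @numbers;      # add up the remaining arguments
    return sprintf("%s: %d values, total %.1f", $name, scalar(@numbers), $total);
}

my $line = describe("a", 1, 3.2);
print "$line\n";                        # a: 2 values, total 4.2
```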

Reading files

read.pl
open(fd,"/etc/group");
while(<fd>) {
    print;
    last if($n++ > 5);
}
close(fd);
$ perl ./read.pl
root:x:0:root
bin:x:1:root,bin,daemon
daemon:x:2:root,bin,daemon
sys:x:3:root,bin,adm
adm:x:4:root,adm,daemon
tty:x:5:
disk:x:6:root

That funny thing that looks like a variable but has no $, @, or % in front of it is a bareword filehandle. It's kind of an ugly thing. If you run perl with the -w flag (print warnings), it warns that a lowercase filehandle name like fd may clash with a future reserved word. Fortunately, there's an alternative.

Reading files, v2

read2.pl
use FileHandle;
my $fd = new FileHandle;
open($fd,"/etc/group");
while(<$fd>) {
    print;
    last if($n++ > 5);
}
close($fd);
$ perl ./read2.pl
root:x:0:root
bin:x:1:root,bin,daemon
daemon:x:2:root,bin,daemon
sys:x:3:root,bin,adm
adm:x:4:root,adm,daemon
tty:x:5:
disk:x:6:root

This avoids the bareword filehandle warning from perl -w :)

Reading files, v3

read3.pl
use FileHandle;
my $fd = new FileHandle("/etc/group","r");
while(<$fd>) {
    print;
    last if($n++ > 5);
}
close($fd);
$ perl ./read3.pl
root:x:0:root
bin:x:1:root,bin,daemon
daemon:x:2:root,bin,daemon
sys:x:3:root,bin,adm
adm:x:4:root,adm,daemon
tty:x:5:
disk:x:6:root

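The Perl slogan is: There's more than one way to do it. Here the FileHandle constructor opens the file itself, given the file name and a mode ("r" for reading).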
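One more way, which needs no module at all, is a three-argument open into a lexical (my) filehandle. The sketch below assumes File::Temp is available (it ships with Perl). It writes a small temporary two-line file in place of /etc/group so it runs anywhere, then prints the first field of each line:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for /etc/group.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "root:x:0:root\nbin:x:1:root,bin,daemon\n";
close($tmp);

# Three-argument open: a lexical filehandle, an explicit mode ('<' = read), then the path.
open(my $fd, '<', $path) or die "Can't open $path: $!";
my @names;
while (my $line = <$fd>) {
    chomp $line;
    push @names, (split /:/, $line)[0];   # keep the first colon-separated field
}
close($fd);
print join("\n", @names), "\n";
```

Checking the result of open with "or die" is worth the habit; the earlier examples would silently read nothing if the file were missing.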

Reading files, v4

read4.pl
use FileHandle;
my $fd = new FileHandle("/etc/group","r");
while(<$fd>) {
    chomp;
    print $_,"\n";
    last if($n++ > 5);
}
close($fd);
$ perl ./read4.pl
root:x:0:root
bin:x:1:root,bin,daemon
daemon:x:2:root,bin,daemon
sys:x:3:root,bin,adm
adm:x:4:root,adm,daemon
tty:x:5:
disk:x:6:root

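chomp removes the trailing newline from the end of the line. More precisely, it removes whatever string $/ holds, normally "\n". It does not strip a carriage return left over from a Windows-style "\r\n" line ending. Since the newline is gone, we have to supply it explicitly to get the same output.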
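A quick sketch of what chomp does and doesn't remove, using made-up lines with Unix and Windows line endings:

```perl
use strict;
use warnings;

my $unix = "root:x:0:root\n";
my $dos  = "root:x:0:root\r\n";

my $removed = chomp($unix);     # chomp returns how many characters it removed (1)
chomp($dos);                    # removes only the "\n"; the "\r" is still there
my $dos_left = length($dos) - length("root:x:0:root");   # 1 leftover character
$dos =~ s/\r\z//;               # strip the carriage return explicitly

print "$unix|$removed|$dos_left|$dos\n";
```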

Reading files, v5

read5.pl
use FileHandle;
my $fd = new FileHandle("/etc/group","r");
while(<$fd>) {
    chomp;
    my @a = split(/:/,$_);
    print $a[0],"\n";
    last if($n++ > 5);
}
close($fd);
$ perl ./read5.pl
root
bin
daemon
sys
adm
tty
disk

Now we print just the group names from /etc/group. split breaks a string into a list of fields, and it takes a regular expression as the delimiter. Watch out for putting ()'s in that regex, though: whatever a capture group matches is returned as an extra element of the list.
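Here is what that warning about ()'s looks like in practice, on a shortened, made-up /etc/group line:

```perl
use strict;
use warnings;

my $line   = "root:x:0";
my @fields = split /:/,   $line;    # root, x, 0
my @with   = split /(:)/, $line;    # the captured ":" comes back too: root, :, x, :, 0
print scalar(@fields), " ", scalar(@with), "\n";   # 3 5
```

If you need parentheses just for grouping, use (?: ... ), which doesn't capture.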

Slurp!

read7.pl
use FileHandle;
$/ = undef;
my $fd = new FileHandle("/etc/group","r");
my $c = <$fd>;
print length($c),"\n";
$ perl ./read7.pl
868

Context advantage

read8.pl
use FileHandle;
my $fd = new FileHandle("/etc/group","r");
my @c = <$fd>; # read all the lines in the file!
print $#c,"\n";
$ perl ./read8.pl
61

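The variable $/ in read7.pl is a magic variable that tells perl what the line (record) delimiter is when reading files; normally it is "\n". If you set it to the undefined value, reading from a filehandle slurps in the entire file at once. read8.pl gets every line a different way. Because <$fd> is assigned to an array, it is evaluated in list context and returns all the lines of the file. $#c is the last index, so this file has 62 lines.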
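Setting $/ = undef as read7.pl does changes it for the rest of the program, which can surprise later code that reads line by line. A common idiom confines the change with local. The sketch below assumes File::Temp (bundled with Perl) for a small temporary file that stands in for /etc/group:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A small temporary file stands in for /etc/group.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "line one\nline two\n";
close($tmp);

my $content = do {
    local $/;                                   # $/ is undef only until this block ends
    open(my $fd, '<', $path) or die "Can't open $path: $!";
    <$fd>;                                      # the whole file as one string
};

print length($content), "\n";                   # 18, and $/ is back to "\n" here
```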

Writing a file

write.pl
use FileHandle;
my $fw = new FileHandle;
open($fw,">/tmp/myfile.txt");
print $fw "Hello, world\n";
close($fw);
$ perl ./write.pl

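So you do something like a Unix shell if you want to write: a leading ">" opens the file for writing (truncating it first), and ">>" opens it for appending. Notice also that there is no comma between the filehandle and the text in the print statement.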
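The same modes work with three-argument open, where the mode is a separate argument. The sketch below uses a temporary file from File::Temp (bundled with Perl) in place of /tmp/myfile.txt. It writes, appends, and then reads the file back to check:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);    # stand-in for /tmp/myfile.txt

open(my $fw, '>', $path) or die "Can't write $path: $!";      # '>' truncates, like shell >
print $fw "Hello, world\n";                                    # no comma after the filehandle
close($fw);

open(my $fa, '>>', $path) or die "Can't append to $path: $!"; # '>>' appends, like shell >>
print $fa "Goodbye, world\n";
close($fa);

open(my $fr, '<', $path) or die "Can't read $path: $!";
my @lines = <$fr>;
close($fr);
print scalar(@lines), " lines\n";                              # 2 lines
```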

Reading from a pipe

pipe.pl
use FileHandle;
my $p = new FileHandle;
open($p,"ps -e |");
while(<$p>) {
    print;
    last if($n++ > 5);
}
close($p);
$ perl ./pipe.pl
  PID TTY          TIME CMD
    1 ?        00:00:00 init
    2 ?        00:00:00 kthreadd
    3 ?        00:00:00 migration/0
    4 ?        00:00:00 ksoftirqd/0
    5 ?        00:00:00 watchdog/0
    6 ?        00:00:00 migration/1

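Just what you'd expect! A "|" at the end of the string tells open to run the command and let us read its output as if it were a file.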

Reading a directory

readdir.pl
use FileHandle;
my $fd = new FileHandle;
opendir($fd,"/etc");
while($d = readdir($fd)) {
    print $d,"\n";
    last if($n++ > 5);
}
closedir($fd);
$ perl ./readdir.pl
.
..
dnsroots.global
statetab
logwatch
filesystems
texmf

What could be easier? I'll show you...

Reading a directory, v2

readdir2.pl

print join("\n",</etc/security/*>),"\n";

$ perl ./readdir2.pl
/etc/security/access.conf
/etc/security/chroot.conf
/etc/security/console.apps
/etc/security/console.handlers
/etc/security/console.perms
/etc/security/console.perms.d
/etc/security/group.conf
/etc/security/limits.conf
/etc/security/limits.d
/etc/security/namespace.conf
/etc/security/namespace.d
/etc/security/namespace.init
/etc/security/opasswd
/etc/security/pam_env.conf
/etc/security/pam_winbind.conf
/etc/security/sepermit.conf
/etc/security/time.conf

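That does a similar job. The part in the angle brackets is called a glob. A glob is a shell-style wildcard pattern (not a regular expression) in which * matches any run of characters. Unlike readdir, a glob returns full path names, sorted, and skips names beginning with a dot.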
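The same thing is available as the glob function, which is easier to read when the pattern is built from a variable. In this sketch, a scratch directory made with File::Temp stands in for /etc/security, and File::Basename trims the paths for display; both modules ship with Perl, and the file names are made up:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# A scratch directory with a few empty files stands in for /etc/security.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(b.conf a.conf notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close($fh);
}

my @conf  = glob("$dir/*.conf");       # like <$dir/*.conf>; results come back sorted
my @names = map { basename($_) } @conf;
print join("\n", @names), "\n";       # a.conf, then b.conf
```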

Counting words

WordCount.pl
use strict;
my %h;
while(<>) {
    while(/\w+/g) {
        $h{"\L$&"}++;
    }
}
my @h = keys %h;
print "Distinct words: ",scalar(@h),"\n"; # scalar(@h) is the element count; $#h is one less
$ perl ./WordCount.pl /etc/group
Distinct words: 125

Up until now we haven't talked much about the command line, because it hasn't mattered. In this example we need it: we are going to count the distinct words in a file named on the command line.

Notice the <> operator in the outer while loop. It fetches the next line of text from the files listed on the command line, one file after another (or from standard input if there are none). It stores that line in the default variable $_.

Notice the string "\L$&". "\L" is an escape sequence that affects the evaluation of a double-quoted string, turning everything that follows into lower case. There are a few others like it: "\U" for upper case, "\l" to lowercase just the next character, "\u" to uppercase just the next character, and "\E" to end any \L or \U.

The ++ operator treats a missing hash value as 0, so the first time a word is seen its count becomes 1. We could have tested for the key and assigned 1 ourselves, but ++ is convenient.

Counting words, v2

WordCount2.pl
use strict;
my %h;
my $inp;
while($inp = <STDIN>) {
    while($inp =~ /\w+/g) {
        $h{"\L$&"}++;
    }
}
my @h = keys %h;
print "Distinct words: ",scalar(@h),"\n";
$ perl ./WordCount2.pl < /etc/group
Distinct words: 125

This example is almost identical to the previous one, except that we read from standard input rather than from the files named on the command line. So we have to invoke it a little differently.

Counting words, v3

WordCount3.pl
use strict;
my %h;
my @inp = <STDIN>;
foreach my $inp (@inp) {
    while($inp =~ /\w+/g) {
        $h{"\L$&"}++;
    }
}
my @h = keys %h;
print "Distinct words: ",scalar(@h),"\n";
$ perl ./WordCount3.pl < /etc/group
Distinct words: 125

This example is almost identical to the previous one, except that we slurp the whole file into an array before parsing it.

Part of the magic of Perl is that it does things based on context. Because <STDIN> is being assigned to an array, it returns all the remaining lines of input at once. Assigned to a scalar, it would return just the next line.

Context trap

CtxTrap.pl
print (3+5)/2,"\n";
print "\n";
print 1*(3+5)/2,"\n";
print "\n";
print ((3+5)/2,"\n");
print "\n";
$ perl ./CtxTrap.pl
8
4

4

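Perl figures out context, and it doesn't always require parentheses around function arguments. As the above example shows, this can occasionally cause confusion. The first print statement prints an 8 with no newline. Because the parenthesis follows the word print directly, perl treats print (3+5) as the complete function call. It then divides print's return value by 2 and throws the result away, along with the "\n". Running with -w warns about this ("print (...) interpreted as function").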

Counting words, v4

WordCount4.pl
use strict;
my %h;
while(<>) {
    while(/\w+/g) {
        $h{"\L$&"}++;
    }
}
my @h = sort mysort keys %h;
print "Distinct words: ",scalar(@h),"\n";

print "\nMost common words:\n";
for(my $i=0;$i<5 and $i<=$#h;$i++) {
    print $h[$i],": ",$h{$h[$i]},"\n";
}

sub mysort {
    return $h{$b} - $h{$a};
}
$ perl ./WordCount4.pl /etc/group
Distinct words: 125

Most common words:
x: 62
root: 8
daemon: 5
bin: 4
pulse: 3

In this example we identify the most common words in our input text. The key step is that we need to invoke the sort function supplying a sorting method of our own creation.

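When writing a sort routine, the variables "$a" and "$b" are magical: they hold the two values being compared. The routine should return a negative number if $a should come before $b, a positive number if $a should come after $b, and zero if they are equal. Returning $h{$b} - $h{$a} gives a negative number when $a's word is more frequent, so the most common words come first.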
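The more usual way to write this is an inline sort block with the numeric comparison operator <=>. You can add "or $a cmp $b" so that words with equal counts come out in alphabetical order instead of an arbitrary one. In this sketch the counts are made up, loosely following the output above; adm is invented to create a tie:

```perl
use strict;
use warnings;

# Made-up word counts; adm is invented so that it ties with pulse.
my %h = (x => 62, root => 8, daemon => 5, bin => 4, pulse => 3, adm => 3);

# <=> compares numerically; "or" falls through to cmp (string order) only on a tie.
my @h = sort { $h{$b} <=> $h{$a} or $a cmp $b } keys %h;

print "$_: $h{$_}\n" foreach @h[0 .. 4];
```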

We have also used the "and" operator in the for loop's condition. The loop stops after five words or at the end of the list, whichever comes first.

Counting words, v5

WordCount5.pl
use strict;
use FileHandle;
my %h;
foreach my $file (@ARGV) {
    if(-r $file) {
        my $fd = new FileHandle($file,"r");
        while(<$fd>) {
            while(/\w+/g) {
                $h{"\L$&"}++;
            }
        }
        close($fd);
    }
}
my @h = sort mysort keys %h;
print "Distinct words: ",scalar(@h),"\n";

print "\nMost common words:\n";
for(my $i=0;$i<5 and $i<=$#h;$i++) {
    print $h[$i],": ",$h{$h[$i]},"\n";
}

sub mysort {
    return $h{$b} - $h{$a};
}
$ perl ./WordCount5.pl /etc/group
Distinct words: 125

Most common words:
x: 62
root: 8
daemon: 5
bin: 4
pulse: 3

This example is exactly like the previous one, but we don't allow perl to help us with all its magic.

We use the special array @ARGV to look at the command line arguments. We use the "-r" operator to test whether a given argument names a readable file. This operator is familiar to shell programmers, and most of the same tests exist: -w tests for a writable file, -x for an executable file, -d for a directory, -f for a plain file, and -e for existence.

In general, if you think something isn't in Perl, try it out, because it usually is. :-) -- Larry Wall

We use FileHandle to explicitly open, read, and close a file.

Substituting text -- fixing capitalization

Replace.pl
$txt = "helLo, wORld.";
$txt =~ s/\w+/\u\L$&/g;
print $txt,"\n";
$ perl ./Replace.pl
Hello, World.

The only new thing here is the substitution operation on the second line of the program. It looks similar to the match operation we have seen previously, except that it starts with an "s" and has a second part: the value to be substituted for each match.

The substituted value is essentially the same as the value that is matched, except for capitalization. We use \u and \L to transform the substituted string to a word whose first letter is a capital, and whose remaining letters are lower case.

Substituting text -- doing math

Replace2.pl
$txt = "10 * 3 + 2 - 4";
while($txt =~ s/(\d+)\s*[\+\-\*\/]\s*(\d+)/eval $&/e) {
    print "work: $txt\n";
}
print $txt,"\n";
$ perl ./Replace2.pl
work: 30 + 2 - 4
work: 32 - 4
work: 28
28

The "e" flag on the substitution tells perl that the replacement is not a string but Perl code. That code is run for each match, and its result is substituted.

We could have just evaluated the whole string, but I wasn't really showing you how to use the eval function. I was showing you the "e" flag :)
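One thing to keep in mind: this loop always evaluates the leftmost "number operator number", so it works strictly left to right and ignores operator precedence. Replace2.pl gets the right answer only because its multiplication comes first. Here is a sketch of the same loop on a different, made-up expression; it records each step in a hypothetical @steps array:

```perl
use strict;
use warnings;

my $txt = "10 + 3 * 2";
my @steps;
while ($txt =~ s/(\d+)\s*[-+*\/]\s*(\d+)/eval $&/e) {
    push @steps, $txt;            # record each intermediate result
}
print join("\n", @steps), "\n";   # "13 * 2", then "26" (Perl itself would say 16)
```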

Just Plain Magical Weirdness

weird.pl
$counter = "p";
$counter++;
print $counter x 5,"\n";
$ perl ./weird.pl
qqqqq

So the ++ operator does something interesting when applied to a string of letters: it gives you the next string in sequence, so "p" becomes "q" and "az" becomes "ba". Isn't that fun?

Also, there's a neat string operator called "x" that repeats a string a given number of times. So in our example, ++ changes the string "p" to "q", and "x 5" prints it 5 times on one line.
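The string increment carries just like a number does: each character stays within its own range (lowercase, uppercase, or digit). Here is a sketch with a few made-up starting values:

```perl
use strict;
use warnings;

my %next;
for my $start ("p", "Az", "zz", "a9") {
    my $s = $start;
    $s++;                            # string increment, with carry
    $next{$start} = $s;
}
my $row = $next{p} x 5;              # "x" repeats a string

print "$_ => $next{$_}\n" for sort keys %next;   # Az => Ba, a9 => b0, p => q, zz => aaa
print "$row\n";                                  # qqqqq
```

The magic only applies to strings that look like letters followed by digits and haven't been used as numbers. Anything else is incremented numerically.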

Evaluate just one line

    perl -e 'print "Hello!\n";'

Transforming files in-place is as easy as pie

    perl -p -i -e 's/foo/bar/g' *.c

This would replace foo with bar in all C source files in the current directory, editing them in place. It would not, however, make a backup copy. To keep each original in a file with a .bak suffix, give -i an extension:

    perl -p -i.bak -e 's/foo/bar/g' *.c

You can use perl like sed:

    echo foo and foo | perl -p -e 's/foo/bar/g'
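Under the hood, -p wraps your -e code in a loop that reads each input line into $_, runs your code, and prints $_. perlrun describes the exact wrapper. The sketch below is an approximation of that loop. It uses in-memory filehandles on made-up text in place of real files, so it can be run and checked on its own:

```perl
use strict;
use warnings;

my $text = "foo and foo\nnothing here\n";            # made-up input
open(my $in, '<', \$text) or die "Can't open in-memory input: $!";
my $result = '';
open(my $out, '>', \$result) or die "Can't open in-memory output: $!";

# Roughly the loop that -p builds around the -e code.
while (<$in>) {
    s/foo/bar/g;           # the code you passed with -e
} continue {
    print $out $_;         # -p prints every line, changed or not
}
close($out);
print $result;             # bar and bar / nothing here
```

Use -n instead of -p if you want the same loop without the automatic print.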