Introduction to Perl A brief introduction to getting started with Perl. This is not aimed at getting you proficient in writing CGI scripts, but to help you decide on when and how (not) to use Perl. Working knowledge of any programming language is assumed. 1. Introduction - 1.1 Short History
- 1.2 Evolution
- 1.3 Relevance
- 1.4 Installation
- 1.5 Similarities to common languages/tools
2. Tutorial - 2.1 First Step
- 2.2 Running Perl
- 2.3 Scalars
- 2.4 Lists (Arrays)
- 2.5 Hashes (Associative Arrays)
- 2.6 Control Structures
- 2.7 File operations
- 2.8 String Processing
- 2.9 Subroutines
- 2.10 More information
3. Useful Examples - 3.1 Processing colon delimited Excel files
- 3.2 Processing fixed format text files
- 3.3 Report Generation and Formatting
- 3.4 DBM Databases
- 3.5 Exercise
4. Modules - 4.1 Where to get them?
- 4.2 Modules vs. coding
- 4.3 Well known modules
5. Flat Databases in Perl 6. IRC Bot In Perl7. Escape Quotes In Perl Shell
1. Introduction Perl was invented by Larry Wall. He called it Practical Extraction and Reporting Language (he also calls it Pathologically Eclectic Rubbish Lister). What started as an exercise in unifying multiple tools used to write scripts to make routine tasks of a system adminstrator evolved into a powerful scripting language with lots of followers. In all fairness, Perl (written always as Perl and not as PERL), is now treated as a generic programming language, though its early beginnings as a melting pot of multiple computing paradigms still make it possible to write undecipherable programs! We will try to get an introduction to Perl and its prowess as a text manipulation language without trying to write cryptic programs. According to Larry Wall, the parents of Perl are COMPUTER SCIENCE
LINGUISTICS Perl COMMON SENSE
ART
So, Perl is a computer language that helps to implement some common sense with help from the principles of computer science in an artistic way using common linguistic constructs. 1.1 Short History Like mentioned before, Perl was born as a tool to aid system administrators. In 1986, Larry Wall was asked to build a bi-coastal CM system in a very short time. That, he did and then his manager asked to produce reports from the system. Awk in those days did not have the capability to manage multiple input files and hence the new language was born. Primarily, the new language was aimed at getting things done quickly with data from and to multiple files. 1.2 Evolution From a quick hack by one system administrator, Perl has grown into a full-fledged language. It is being developed and enhanced continuously by hundreds of programmers around the world. One big step in earning recognition was the addition of regular expression engine. Now, the regular expression capabilities of Perl are so well known (especially since version 5.0), that it is being used in other languages like Python as Perl5 regex'es. The growth of Internet also complemented Perl. The initial attempt at providing dynamic content was through CGI (even now CGI is used extensively), and Perl's remarkable text handling features made it a quick fit. CGI programming is now synonymous with Perl programming. CPAN - Comprehensive Perl Archive Network, was set up to share Perl code. Perl supports modules and chances are that for 99% of the programming requirements, there is already a tested module in CPAN (for the remaining 1%, write modules and contribute to CPAN!). Using modules really mask the complexities of adhering to pre-defined standards and frees you to concentrate on your tasks - no point in re-inventing the wheel. Now, you have modules which handles graphics, CGI etc... You can also embed Perl code in your C/C++ programs. A very popular embedded Perl architecture is mod_perl for Apache web server. JAPH is a project to get Java and Perl working together. 1.3 Relevance Data manipulation Perl can handle strings, dates, binary data, database connectivity, streams, sockets and many more. This ability to manipulate multiple data types help immensely in data conversion (and by the way, it is much faster than PL/SQL!). Perl also has provision for lists (or arrays) and for hashes (associative arrays). Perl also supports references, which are similar to the pointers in C. Lists, hashes and references together make it possible to define and manipulate powerful custom-defined data-types. Glue language Perl does not differentiate between files and pipes. So, it makes it very easy to use Perl as a glue language. Suppose you have a sed script, the output of which is to be given to a Perl script. You can do this the UNIX way, sedscript | perlscript
or the perl way perlscript open(FH,"sedscript|") or die "could not open sedscript\n"l ...
This really helps when people want to migrate from traditional UNIX tools like Awk, sed, grep etc... You can use these tools straightaway instead of worrying on how to do the same thing entirely in Perl. In this aspect, Perl is just like shell. However, we must consider other features of Perl, which shell simply cannot provide easily. CGI CGI.pm. Period. Almost all CGI programs written today are using the CGI.pm module from CPAN. Even before this was written, people used to use Perl extensively for CGI programming. CGI.pm made the process streamlined and easy, even for beginners. The graphics library GD is used extensively in producing dynamic web charts. Quick coding The ease with which Perl can be employed to write programs quickly cannot be overstressed. A disturbing fact about this is that such quick code can tend to be dirty and quickly get out of hand if you keep extending it! Most of the time, you must control your urges to over-extend short programs! But, as a prototyping tool, or as a fast reporting/text-processing tool, Perl is immensely helpful. Two very good tools worth mentioning in this context are s2p and a2p tools which come with the Perl distribution. s2p converts a sed script to Perl script and a2p converts from Awk scripts. These two help a lot in extending sed and awk scripts. Portability Most of the Perl code will run without any change in Unix or Windows or Macintosh. Typical changes you might have to make include specifying file paths and use of low-level OS specific functions. 1.4 Installation Just go to http://www.perl.com and download the source or pre-compiled binaries. Installation typically includes extracting the binary and then changing your PATH variable to reflect where Perl executable resides. Even when you want to compile Perl from scratch, it is a simple job. 1.5 Similarities to common languages/tools Perl has a remarkable resemblance to the syntax of C, AWK, SED and SHELL. C 99% of code looks like C code. So, it is very easy for C programmers to switch to Perl. And believe me, the code as you go philosophy of Perl really makes C programmers happy - especially for small programs. All C functions that are available through standard libraries are available with little or no change at all in Perl. AWK & SED The string processing strategy of Perl is very similar to that of Awk and sed, making it easy to migrate. Shell Again, the commenting scheme, variable naming scheme etc of Perl look similar to that of Shell. Many shell utilities like grep, tr etc are available as functions within Perl. {mospagebreak title=Tutorial} 2. Tutorial Majority of the contents of this tutorial section were written by Nik Silver, at the School of Computer Studies, University of Leeds, UK. Assuming working knowledge of any programming language, we will now try to see what Perl programs look like. 2.1 First Step Ever since Kernighan and Ritchie came out with C programming language, people have started learning almost any programming language with the obligatory "Hello World" program. Let us do the same! Hello World! Here is the basic perl program that we'll use to get started. #! /usr/local/bin/perl # # prints a greeting. # print 'Hello world.'; # Print a message
Comments A common Perl-pitfall is to write cryptic code. In that context, Perl do provide for comments, albeit not very flexible. Perl treats any thing from a hash # to the end of line as a comment. Block comments are not possible. So, if you want to have a block of comments, you must ensure that each line starts with #. Statements Everything other than comments are Perl statements, which must end with a semicolon, like the last line above. Unlike C, you need not put a wrapping character \ for long statements. A Perl statement always ends with a semicolon. 2.2 Running Perl Type in the example program using a text editor, and save it. The first line of the program is a typical shell construct, which will make the shell start the interpreter and feed the remaining lines of the file as an input to the interpreter. After you've entered and saved the program make sure the file is executable by using the command chmod u+x progname
at the UNIX prompt, where progname is the filename of the program. Now, to run the program, just type any of the following at the prompt. perl progname ./progname progname
If something goes wrong then you may get error messages, or you may get nothing. You can always run the program with warnings using the command perl -w progname
at the prompt. This will display warnings and other (hopefully) helpful messages before it tries to execute the program. To run the program with a debugger use the command perl -d progname
When the file is executed Perl first compiles it and then executes that compiled version. Unlike many other interpreted languages, Perl scripts are compiled first, helping you to catch most of errors before program actually starts executing. In this context, the -w switch is very helpful. It will warn you about unused variables, suspicious statements etc. 2.3 Scalars Perl supports 3 basic types of variables, viz., scalars, lists and hashes. We will explore each of these little more. The most basic kind of variable in Perl is the scalar variable. Scalar variables hold both strings and numbers, and are remarkable in that strings and numbers are completely interchangeable. For example, the statement $age = 27;
sets the scalar variable $age to 27, but you can also assign a string to exactly the same variable: $age = 'Twenty Seven';
Perl also accepts numbers as strings, like this: $priority = '9'; $default = '0009';
and can still cope with arithmetic and other operations quite happily. However, please note that the following code is a bit too much to ask for! $age = 'Twenty Seven'; $age = $age + 10;
For the curious, the above code will set $age to 10. Think why. In general variable names consists of numbers, letters and underscores, but they should not start with a number and the variable $_ is special, as we'll see later. Also, Perl is case sensitive, so $a and $A are different. Operations and Assignment Perl uses all the usual C arithmetic operators: $a = 1 + 2; # Add 1 and 2 and store in $a $a = 3 - 4; # Subtract 4 from 3 and store in $a $a = 5 * 6; # Multiply 5 and 6 $a = 7 / 8; # Divide 7 by 8 to give 0.875 $a = 9 ** 10; # Nine to the power of 10 $a = 5 % 2; # Remainder of 5 divided by 2 ++$a; # Increment $a and then return it $a++; # Return $a and then increment it --$a; # Decrement $a and then return it $a--; # Return $a and then decrement it
and for strings Perl has the following among others: $a = $b . $c; # Concatenate $b and $c $a = $b x $c; # $b repeated $c times
To assign values Perl includes $a = $b; # Assign $b to $a $a += $b; # Add $b to $a $a -= $b; # Subtract $b from $a $a .= $b; # Append $b onto $a
Note that when Perl assigns a value with $a = $b it makes a copy of $b and then assigns that to $a. Therefore the next time you change $b it will not alter $a. Other operators can be found on the perlop manual page. Type man perlop at the prompt. Interpolation The following code prints apples and pears using concatenation: $a = 'apples'; $b = 'pears'; print $a.' and '.$b;
It would be nicer to include only one string in the final print statement, but the line print '$a and $b';
prints literally $a and $b which isn't very helpful. Instead we can use the double quotes in place of the single quotes: print "$a and $b";
The double quotes force interpolation of any codes, including interpreting variables. This is a much nicer than our original statement. Other codes that are interpolated include special characters such as newline and tab. The code \n is a newline and \t is a tab. Exercise This exercise is to rewrite the Hello world program so that (a) the string is assigned to a variable and (b) this variable is then printed with a newline character. Use the double quotes and don't use the concatenation operator. 2.4 Lists (Arrays) A slightly more interesting kind of variable is the list variable which is an array of scalars (i.e. numbers and strings). From now on, we will use the terms list and array interchangeably. Array variables have the same format as scalar variables except that they are prefixed by an @ symbol. The statement @food = ("apples", "pears", "eels"); @music = ("whistle", "flute");
assigns a three element list to the array variable @food and a two element list to the array variable @music. The array is accessed by using indices starting from 0, and square brackets are used to specify the index. The expression $food[2]
returns eels. Notice that the @ has changed to a $ because eels is a scalar. Array assignments As in all of Perl, the same expression in a different context can produce a different result. The first assignment below explodes the @music variable so that it is equivalent to the second assignment. @moremusic = ("organ", @music, "harp"); @moremusic = ("organ", "whistle", "flute", "harp");
This should suggest a way of adding elements to an array. A neater way of adding elements is to use the statement push(@food, "eggs");
which pushes eggs onto the end of the array @food. To push two or more items onto the array use one of the following forms: push(@food, "eggs", "lard"); push(@food, ("eggs", "lard")); push(@food, @morefood);
The push function returns the length of the new list. So does $#food ! To remove the last item from a list and return it use the pop function. From our original list the pop function returns eels and @food now has two elements: $grub = pop(@food); # Now $grub = "eels"
It is also possible to assign an array to a scalar variable. As usual context is important. The line $f = @food;
assigns the length of @food, but $f = "@food";
turns the list into a string with a space between each element. This space can be replaced by any other string by changing the value of the special $" variable. This variable is just one of Perl's many special variables, most of which have odd names. When you get overloaded with oddity, use the English module which lets you name these variables in more user-friendly (i.e. to English-speaking people) way. Arrays can also be used to make multiple assignments to scalar variables: ($a, $b) = ($c, $d); # Same as $a=$c; $b=$d; ($a, $b) = @food; # $a and $b are the first two # items of @food. ($a, @somefood) = @food; # $a is the first item of @food # @somefood is a list of the # others. (@somefood, $a) = @food; # @somefood is @food and # $a is undefined.
The last assignment occurs because arrays are greedy, and @somefood will swallow up as much of @food as it can. Therefore that form is best avoided. Finally, you may want to find the index of the last element of a list. To do this for the @food array use the expression $#food
Displaying arrays Since context is important, it shouldn't be too surprising that the following all produce different results: print @food; # By itself print "@food"; # Embedded in double quotes print @food.""; # In a scalar context
2.5 Hashes (Associative Arrays) Ordinary list arrays allow us to access their element by number. The first element of array @food is $food[0]. The second element is $food[1], and so on. But Perl also allows us to create arrays which are accessed by string. These are called associative arrays or hashes. To define an associative array we use the usual parenthesis notation, but the array itself is prefixed by a % sign. Suppose we want to create an array of people and their ages. It would look like this: %ages = ("Michael Caine", 39, "Dirty Den", 34, "Angie", 27, "Willy", "21 in dog years", "The Queen Mother", 108);
Now we can find the age of people with the following expressions $ages{"Michael Caine"}; # Returns 39 $ages{"Dirty Den"}; # Returns 34 $ages{"Angie"}; # Returns 27 $ages{"Willy"}; # Returns "21 in dog years" $ages{"The Queen Mother"}; # Returns 108
Notice that like list arrays each % sign has changed to a $ to access an individual element because that element is a scalar. Unlike list arrays the index (in this case the person's name) is enclosed in curly braces, the idea being that associative arrays are fancier than list arrays. An associative array can be converted back into a list array just by assigning it to a list array variable. A list array can be converted into an associative array by assigning it to an associative array variable. Ideally the list array will have an even number of elements: @info = %ages; # @info is a list array. It # now has 10 elements $info[5]; # Returns the value 27 from # the list array @info %moreages = @info; # %moreages is an associative # array. It is the same as %ages
Operators Associative arrays do not have any order to their elements (they are just like hash tables) but is it possible to access all the elements in turn using the keys function and the values function: foreach $person (keys %ages) { print "I know the age of $person\n"; } foreach $age (values %ages) { print "Somebody is $age\n"; }
When keys is called it returns a list of the keys (indices) of the associative array. When values is called it returns a list of the values of the array. These functions return their lists in the same order, but this order has nothing to do with the order in which the elements have been entered. When keys and values are called in a scalar context they return the number of key/value pairs in the associative array. There is also a function each which returns a two element list of a key and its value. Every time each is called it returns another key/value pair: while (($person, $age) = each(%ages)) { print "$person is $age\n"; }
Environment variables When you run a perl program, or any script in UNIX, there will be certain environment variables set. These will be things like USER which contains your username and DISPLAY which specifies which screen your graphics will go to. When you run a perl CGI script on the World Wide Web there are environment variables which hold other useful information. All these variables and their values are stored in the associative %ENV array in which the keys are the variable names. Try the following in a perl program: print "You are called $ENV{'USER'} and you are "; print "using display $ENV{'DISPLAY'}\n";
2.6 Control Structures More interesting possibilities arise when we introduce control structures and looping. Perl supports lots of different kinds of control structures which tend to be like those in C, but are very similar to Pascal, too. Here we discuss a few of them. foreach To go through each line of an array or other list-like structure (such as lines in a file) Perl uses the foreach structure. This has the form foreach $morsel (@food) # Visit each item in turn # and call it $morsel { print "$morsel\n"; # Print the item print "Yum yum\n"; # That was nice }
The actions to be performed each time are enclosed in a block of curly braces. The first time through the block $morsel is assigned the value of the first item in the array @food. Next time it is assigned the value of the second item, and so until the end. If @food is empty to start with then the block of statements is never executed. Testing The next few structures rely on a test being true or false. In Perl any non-zero number and non-empty string is counted as true. The number zero, zero by itself in a string, and the empty string are counted as false. Here are some tests on numbers and strings. $a == $b # Is $a numerically equal to $b? # Beware: Don't use the = operator. $a != $b # Is $a numerically unequal to $b? $a eq $b # Is $a string-equal to $b? $a ne $b # Is $a string-unequal to $b?
You can also use logical and, or and not: ($a && $b) # Is $a and $b true? ($a || $b) # Is either $a or $b true? !($a) # is $a false?
for Perl has a for structure that mimics that of C. It has the form for (initialise; test; inc) { first_action; second_action; etc }
First of all the statement initialise is executed. Then while test is true the block of actions is executed. After each time the block is executed inc takes place. Here is an example for loop to print out the numbers 0 to 9. for ($i = 0; $i < 10; ++$i) # Start with $i = 1 # Do it while $i < 10 # Increment $i before repeating { print "$i\n"; }
while and until Here is a program that reads some input from the keyboard and won't continue until it is the correct password #!/usr/local/bin/perl print "Password? "; # Ask for input $a = ; # Get input chop $a; # Remove the newline at end while ($a ne "fred") # While input is wrong... { print "sorry. Again? "; # Ask again $a = ; # Get input again chop $a; # Chop off newline again }
The curly-braced block of code is executed while the input does not equal the password. The while structure should be fairly clear, but this is the opportunity to notice several things. First, we can we read from the standard input (the keyboard) without opening the file first. Second, when the password is entered $a is given that value including the newline character at the end. The chop function removes the last character of a string which in this case is the newline. To test the opposite thing we can use the until statement in just the same way. This executes the block repeatedly until the expression is true, not while it is true. Another useful technique is putting the while or until check at the end of the statement block rather than at the beginning. This will require the presence of the do operator to mark the beginning of the block and the test at the end. If we forgo the sorry. Again message in the above password program then it could be written like this. #!/usr/local/bin/perl do { print "Password? "; # Ask for input $a = ; # Get input chop $a; # Chop off newline } while ($a ne "fred") # Redo while wrong input
Exercise Modify the program from the previous exercise so that each line of the file is read in one by one and is output with a line number at the beginning. You should get something like: 1 root:oYpYXm/qRO6N2:0:0:Super-User:/:/bin/csh 2 sysadm:*:0:0:System V Administration:/usr/admin:/bin/sh 3 diag:*:0:996:Hardware Diagnostics:/usr/diags:/bin/csh etc
You may find it useful to use the structure while ($line = ) { ... }
When you have done this see if you can alter it so that line numbers are printed as 001, 002, ..., 009, 010, 011, 012, etc. To do this you should only need to change one line by inserting an extra four characters. Perl's clever like that. if-else Of course Perl also allows if/then/else statements. These are of the following form: if ($a) { print "The string is not empty\n"; } else { print "The string is empty\n"; }
For this, remember that an empty string is considered to be false. It will also give an "empty" result if $a is the string 0. It is also possible to include more alternatives in a conditional statement: if (!$a) # The ! is the not operator { print "The string is empty\n"; } elsif (length($a) == 1) # If above fails, try this { print "The string has one character\n"; } elsif (length($a) == 2) # If that fails, try this { print "The string has two characters\n"; } else # Now, everything has failed { print "The string has lots of characters\n"; }
In this, it is important to notice that the elsif statement really does have an "e" missing. Sometimes, it is more readable to use unless instead of if (!...) . The switch-case statement familiar to C programmers are not available in Perl. You can simulate it in other ways. See the manual pages. Exercise From the previous exercise you should have a program which prints out the password file with line numbers. Change it so that works with the text file. Now alter the program so that line numbers aren't printed or counted with blank lines, but every line is still printed, including the blank ones. Remember that when a line of the file is read in it will still include its newline character at the end. 2.7 File operations Here is the basic perl program which does the same as the UNIX cat command on a certain file. #!/usr/local/bin/perl # # Program to open the password file, read it in, # print it, and close it again.
$file = '/etc/passwd'; # Name the file open(INFO, $file); # Open the file @lines = ; # Read it into an array close(INFO); # Close the file print @lines; # Print the array
The open function opens a file for input (i.e. for reading). The first parameter is the filehandle which allows Perl to refer to the file in future. The second parameter is an expression denoting the filename. If the filename was given in quotes then it is taken literally without shell expansion. So the expression '~/notes/todolist' will not be interpreted successfully. If you want to force shell expansion then use angled brackets: that is, use instead. The close function tells Perl to finish with that file. There are a few useful points to add to this discussion on file-handling. First, the open statement can also specify a file for output and for appending as well as for input. To do this, prefix the filename with a > for output and a >> for appending: open(INFO, $file); # Open for input open(INFO, ">$file"); # Open for output open(INFO, ">>$file"); # Open for appending open(INFO, "<$file"); # Also open for input
Second, if you want to print something to a file you've already opened for output then you can use the print statement with an extra parameter. To print a string to the file with the INFO filehandle use print INFO "This line goes to the file.\n";
Third, you can use the following to open the standard input (usually the keyboard) and standard output (usually the screen) respectively: open(INFO, '-'); # Open standard input open(INFO, '>-'); # Open standard output
In the above program the information is read from a file. The file is the INFO file and to read from it Perl uses angled brackets. So the statement @lines = ;
reads the file denoted by the filehandle into the array @lines. Note that the expression reads in the file entirely in one go. This is because the reading takes place in the context of an array variable. If @lines is replaced by the scalar $lines then only the next one line would be read in. In either case each line is stored complete with its newline character at the end. Exercise Modify the above program so that the entire file is printed with a # symbol at the beginning of each line. You should only have to add one line and modify another. Use the $" variable. Unexpected things can happen with files, so you may find it helpful to use the -w option. Extending pipes You can very easily substitute reading a file to reading a pipe. The following example shows reading the ouput of the ps command. open(PS,"ps -aef|") or die "Cannot open ps \n"; while(){ print ; } close(PS);
2.8 String Processing One of the most useful features of Perl (if not the most useful feature) is its powerful string manipulation facilities. At the heart of this is the regular expression (RE) which is shared by many other UNIX utilities. Regular expressions A regular expression is contained in slashes, and matching occurs with the =~ operator. The following expression is true if the string the appears in variable $sentence. $sentence =~ /the/
The RE is case sensitive, so if $sentence = "The quick brown fox";
then the above match will be false. The operator !~ is used for spotting a non-match. In the above example $sentence !~ /the/
is true because the string the does not appear in $sentence. The $_ special variable We could use a conditional as if ($sentence =~ /under/) { print "We're talking about rugby\n"; }
which would print out a message if we had either of the following $sentence = "Up and under"; $sentence = "Best winkles in Sunderland";
But it's often much easier if we assign the sentence to the special variable $_ which is of course a scalar. If we do this then we can avoid using the match and non-match operators and the above can be written simply as if (/under/) { print "We're talking about rugby\n"; }
The $_ variable is the default for many Perl operations and tends to be used very heavily. More on REs In an RE there are plenty of special characters, and it is these that both give them their power and make them appear very complicated. It's best to build up your use of REs slowly; their creation can be something of an art form. Here are some special RE characters and their meaning . # Any single character except a newline ^ # The beginning of the line or string $ # The end of the line or string * # Zero or more of the last character + # One or more of the last character ? # Zero or one of the last character
and here are some example matches. Remember that should be enclosed in /.../ slashes to be used. t.e # t followed by anthing followed by e # This will match the # tre # tle # but not te # tale ^f # f at the beginning of a line ^ftp # ftp at the beginning of a line e$ # e at the end of a line tle$ # tle at the end of a line und* # un followed by zero or more d characters # This will match un # und # undd # unddd (etc) .* # Any string without a newline. This is because # the . matches anything except a newline and # the * means zero or more of these. ^$ # A line with nothing in it.
There are even more options. Square brackets are used to match any one of the characters inside them. Inside square brackets a - indicates "between" and a ^ at the beginning means "not": [qjk] # Either q or j or k [^qjk] # Neither q nor j nor k [a-z] # Anything from a to z inclusive [^a-z] # No lower case letters [a-zA-Z] # Any letter [a-z]+ # Any non-zero sequence of lower case letters
At this point you can probably skip to the end and do at least most of the exercise. The rest is mostly just for reference. A vertical bar | represents an "or" and parentheses (...) can be used to group things together: jelly|cream # Either jelly or cream (eg|le)gs # Either eggs or legs (da)+ # Either da or dada or dadada or...
Here are some more special characters: \n # A newline \t # A tab \w # Any alphanumeric (word) character. # The same as [a-zA-Z0-9_] \W # Any non-word character. # The same as [^a-zA-Z0-9_] \d # Any digit. The same as [0-9] \D # Any non-digit. The same as [^0-9] \s # Any whitespace character: space, # tab, newline, etc \S # Any non-whitespace character \b # A word boundary, outside [] only \B # No word boundary
Clearly characters like $, |, [, ), \, / and so on are peculiar cases in regular expressions. If you want to match for one of those then you have to preceed it by a backslash. So: \| # Vertical bar \[ # An open square bracket \) # A closing parenthesis \* # An asterisk \^ # A carat symbol \/ # A slash \\ # A backslash
and so on. Some example REs As was mentioned earlier, it's probably best to build up your use of regular expressions slowly. Here are a few examples. Remember that to use them for matching they should be put in /.../ slashes [01] # Either "0" or "1" \/0 # A division by zero: "/0" \/ 0 # A division by zero with a space: "/ 0" \/\s0 # A division by zero with a whitespace: # "/ 0" where the space may be a tab etc. \/ *0 # A division by zero with possibly some # spaces: "/0" or "/ 0" or "/ 0" etc. \/\s*0 # A division by zero with possibly some # whitespace. \/\s*0\.0* # As the previous one, but with decimal # point and maybe some 0s after it. Accepts # "/0." and "/0.0" and "/0.00" etc and # "/ 0." and "/ 0.0" and "/ 0.00" etc. # Check for valid currency value ^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$ # Check for valid email address ^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
Exercise Previously your program counted non-empty lines. Alter it so that instead of counting non-empty lines it counts only lines with - the letter x
- the string the
- the string the which may or may not have a capital t
- the word the with or without a capital. Use \b to detect word boundaries.
In each case the program should print out every line, but it should only number those specified. Try to use the $_ variable to avoid using the =~ match operator explicitly. Substitution & Translation Just like the sed and tr utilities in Unix, you have s/// and tr/// in Perl. The former is for substitution and the later is for translation. $bar =~ s/this/that/g; # change this to that in $bar $path =~ s|/usr/bin|/usr/local/bin|;
s/\bgreen\b/mauve/g; # don't change wintergreen
s/Login: $foo/Login: $bar/; # run-time pattern $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
$program =~ s { /\* # Match the opening delimiter. .*? # Match a minimal number of characters. \*/ # Match the closing delimiter. } []gsx; # Delete (most) C comments.
s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively for ($variable) { # trim white space in $variable, cheap s/^\s+//; s/\s+$//; }
s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
#Note the use of $ instead of \ in the last example. Unlike sed, #we use the \ form in only the left hand side. #Anywhere else it's $.
$myname = "BABU"; $myname =~ tr/[A-Z]/[a-z]/ ; # yields babu
Splitting Perl provides a split function to split strings, based on REs. The syntax is split /PATTERN/,EXPR,LIMIT split /PATTERN/,EXPR split /PATTERN/ split
If EXPR is omitted, $_ is used. If PATTERN is also omitted, splits on whitespaces, after skipping leading whitespaces. LIMIT sets the maximum fields returned - so this can be used to split partially. Some examples are given below: # process the password file open(PASSWD, '/etc/passwd'); while () { ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(/:/); # note that $shell still has a new line. # use chop or chomp to remove the newline #... ($login, $passwd, $remainder) = split(/:/, $_, 3); # here we use LIMIT to set the number of fields
}
We also have join which is the opposite of split. For fixed length strings, we have unpack and pack functions. 2.9 Subroutines Like any good programming language Perl allows the user to define their own functions, called subroutines. They may be placed anywhere in your program but it's probably best to put them all at the beginning or all at the end. A subroutine has the form sub mysubroutine { print "Not a very interesting routine\n"; print "This does the same thing every time\n"; }
regardless of any parameters that we may want to pass to it. All of the following will work to call this subroutine. Notice that a subroutine is called with an & character in front of the name: &mysubroutine; # Call the subroutine &mysubroutine($_); # Call it with a parameter &mysubroutine(1+2, $_); # Call it with two parameters
Parameters In the above case the parameters are acceptable but ignored. When the subroutine is called any parameters are passed as a list in the special @_ list array variable. This variable has absolutely nothing to do with the $_ scalar variable. The following subroutine merely prints out the list that it was called with. It is followed by a couple of examples of its use. sub printargs { print "@_\n"; }
&printargs("perly", "king"); # Example prints "perly king" &printargs("frog", "and", "toad"); # Prints "frog and toad"
Just like any other list array the individual elements of @_ can be accessed with the square bracket notation: sub printfirsttwo { print "Your first argument was $_[0]\n"; print "and $_[1] was your second\n"; }
Again it should be stressed that the indexed scalars $_[0] and $_[1] and so on have nothing to with the scalar $_ which can also be used without fear of a clash. Returning values Result of a subroutine is always the last thing evaluated. This subroutine returns the maximum of two input parameters. An example of its use follows. sub maximum { if ($_[0] > $_[1]) { $_[0]; } else { $_[1]; } }
$biggest = &maximise(37, 24); # Now $biggest is 37
The &printfirsttwo subroutine above also returns a value, in this case 1. This is because the last thing that subroutine did was a print statement and the result of a successful print statement is always 1. Local variables The @_ variable is local to the current subroutine, and so of course are $_[0], $_[1], $_[2], and so on. Other variables can be made local too, and this is useful if we want to start altering the input parameters. The following subroutine tests to see if one string is inside another, spaces not withstanding. An example follows. sub inside { local($a, $b); # Make local variables ($a, $b) = ($_[0], $_[1]); # Assign values $a =~ s/ //g; # Strip spaces from $b =~ s/ //g; # local variables ($a =~ /$b/ || $b =~ /$a/); # Is $b inside $a # or $a inside $b? }
&inside("lemon", "dole money"); # true
In fact, it can even be tidied up by replacing the first two lines with local($a, $b) = ($_[0], $_[1]);
2.10 More information Only a very brief of Perl is covered in this tutorial. The easiest way to lern Perl is to look at existing code. The Perl manual pages and FAQ's are really superb and will help you a lot. Unless until you are sure, run Perl with the -w switch! {mospagebreak title=Useful Examples} 3. Useful Examples 3.1 Processing colon delimited Excel files This example takes in a text file as parameter and creates a set of SQL statements which will create Oracle tables. #! /usr/local/bin/perl
# this is the normal perl formatting mechanism # we don't've any standard headers, so we make # STDOUT_TOP as blank. format STDOUT_TOP = . # now we format and print the column types nicely # the @<<<... is the picture and for each picture line, we # must have the next line made of the variables. # during printing, picture is substituted by the actual # values in the corresponding variables # @ for right align, @| for center-align format STDOUT = @<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<< $column_name, $data_type, $nullable .
# you can have multiple table definitions in one input file # so keep a count $tableno=$colno=0;
# similarly, we will store the comments in hashes %colcmnts=%tblcmnts=(); LINE: while(){ chomp; # remove the new line character next LINE if (/^$/); # ignore null lines next LINE if (/^\s*#/); # ignore comment lines ($column_name, $data_type, $precision, $nullable, $comment) = split /:/; $column_name =~ tr/[A-Z]/[a-z]/ ; #lowercase column name if ($data_type eq ""){ #no data-type ? then this is a table $table_name = $column_name; $colno=0; print ");\n\n" unless ($tableno==0); print "CREATE TABLE $table_name AS (\n"; $tableno++; #store table comment if there is a comment $tblcmnts{$table_name} = $comment unless ($comment eq ""); next LINE; } #column comment $colcmnts{"$table_name.$column_name"} = $comment unless ($comment eq ""); $colno++; #if precision is specified, we need to put it inside parantheses $data_type .="($precision)" unless ($precision eq ""); if ($colno==1) { $column_name = " $column_name"; } else { $column_name = ",$column_name"; } $data_type =~ tr/[a-z]/[A-Z]/ ; # uppercase datatype $nullable =~ tr/[a-z]/[A-Z]/ ; # upper case "not null" # print to the format defined before write ; } print ");\n\n" ;
# print table comments foreach my $key (sort keys(%tblcmnts)) { $comment = "'$tblcmnts{$key}'"; print "COMMENT ON TABLE $key IS $comment;\n"; } print "\n\n" ;
# print column comments foreach my $key (sort keys(%colcmnts)) { $comment = "'$colcmnts{$key}'"; print "COMMENT ON COLUMN $key IS $comment;\n"; } print "\n\n" ;
__END__ # from here is the sample input file # columns are # column(table):data type:precision:nullable:comment # for table name, data type is null BARS_BATCH::::Batch Header Table batch_number:number:5:not null:The batch number(sequential) deposit_date:date::not null:Deposit date of the batch payments:number:3:not null:Number of payments in the batch payment_amount:number:11,2:not null:Dollar amount of the batch pieces:number:3:not null:Number of cheques in the batch payment_method:varchar2:2:not null:Method of payment(Cheque,Cash...) clerk:varchar2:10:not null:Who entered the batch origin:varchar2:2:not null:Where the batch originated (Field, HO...) dirty:varchar2:1::Is the batch marked as dirty(1,0) actual_payments:number:3::Number of payments actually entered actual_amount:number:11,2::Amount actually entered creadt:date::not null:Date batch was created modidt:date:::Date batch was last modified sts:varchar2:1:not null:Status of the batch errcode:varchar2:16::Errors in the batch
BARS_GIFTS::::Batch Detail Table batch_number:number:5:not null:The batch number(sequential) doc_number:number:2:not null:The gift doc number(sequential within batch) page:number:2:not null:Page number account_id:number:8:not null:Active/new account id for the member source:varchar2:14:not null:Active source fund:varchar2:16::Active fund gift_type:varchar2:2::Active gift type credit_account:varchar2:4:not null:Active credit account (check the size!) handle_flag:varchar2:1::Special handling flag payment_amount:number:11,2:not null:Gift amount total_payment_amount:number:11,2:not null:Cheque amount creadt:date::not null:Date gift was created modidt:date:::Date gift was last modified errcode:varchar2:16::Errors in the gift
BARS_ACCOUNTS::::New members account_id:number:8:not null:New account id for the member title:varchar2:8::Title for the new member first_name:varchar2:20::First name middle_name:varchar2:20::Middle name last_name:varchar2:40:not null:Last name(In TA, can be null) suffix:varchar2:8::Suffix phone_number:varchar2:15::Phone number street_number:varchar2:8::Street number street_name:varchar2:30::Street name apt_no:varchar2:8::Apartment number zipcode:varchar2:5::Zipcode zipcode_ext:varchar2:5::Zipcode extension city:varchar2:30::City state:varchar2:2::State freeline:varchar2:50::Free comments extraline:varchar2:50::Free comments 2
BARS_CODES::::Default codes for LOVs code_type:varchar2:2:not null:Code type code:varchar2:20:not null:Code codelb:varchar2:40:not null:Description
3.2 Processing fixed format text files Following is an extract from one of our SQL*Loader control files. Note that the ocr_gift_fate and ocr_deposit_date columns are same and that there is a filler between positions 21 and 27. INTO TABLE ACQUIRED_DATA WHEN record_type = 'T' ( record_type POSITION(001:001) CHAR "DECODE (:record_type, 'T', 'BT', 'X', 'FT', 'D', 'D', 'O')", ocr_batch_number POSITION(002:010) CHAR, ocr_gift_date POSITION(011:021) CHAR, ocr_deposit_date POSITION(011:021) CHAR, target_payment_num POSITION(027:029) INTEGER EXTERNAL, target_payment_amt POSITION(030:040) DECIMAL EXTERNAL )
Here is our Perl code to read all the T records, split the record into corresponding variables and then print the batch number, payment number and the amount. #! /usr/local/bin/perl -w
# read standard input LINE : while() { #ignore records other than batch headers next LINE unless /^T/;
# remove the new line character chomp;
# split the record! ($rec_type, $ocr_batch_number, $ocr_gift_date, $filler, $target_payment_num, $target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);
#convert the number fields from scalar string to scalar number! $target_payment_num += 0; $target_payment_amount += 0;
#voila! print it print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n"; }
3.3 Report Generation and Formatting This is same as the previous code except that we now format the output nicely (well, it is debatable whether this is nice!). We have also added a subroutine commify that will format numbers by adding commas. #! /usr/local/bin/perl
# use the POSIX module to access only the function # strftime - to format a date nicely use POSIX(strftime); # and use it right-away to format current time # as MM/DD/YY HH:MI $today=strftime("%m/%d/%y %H:%M",localtime());
# this subroutine adds commas to a number # sub commify { my $input = shift; $input = reverse $input; $input =~ sg; return reverse $input; }
#define the page header ## $% is the page number format STDOUT_TOP= THE NATURE CONSERVANCY @<<<<<<<<<< $today,$% ------------------------------------------------------------------ Batch Number Gift Date Payments Amount ------------------------------------------------------------------ .
# define the page format STDOUT= @> $ocr_batch_number,$ocr_gift_date,$target_payment_num,$target_payment_amount .
# $= is the lines per page . Normal printers have this as 59 $= = 59;
# initialize the variables that hold report totals $sum_num = $sum_amount = 0;
# read standard input LINE : while() { #ignore records other than batch headers next LINE unless /^T/;
# remove the new line character chomp;
# split the record! ($rec_type, $ocr_batch_number, $ocr_gift_date, $filler, $target_payment_num, $target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);
#convert the number fields from scalar string to scalar number! $target_payment_num += 0; $target_payment_amount += 0;
# add to the totals $sum_num += $target_payment_num; $sum_amount += $target_payment_amount;
#add commas to the number $target_payment_num = &commify($target_payment_num); # dollar amount should have 2 decimal places $target_payment_amount = "\$".&commify(sprintf("%.2f",$target_payment_amount));
#voila! print it #print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n"; write; }
## # print a line before printing totals # $ocr_batch_number = "---------"; $ocr_gift_date = "-----------"; $target_payment_num = "------"; $target_payment_amount = "----------------------"; write;
## # print totals # $ocr_batch_number = "TOTAL"; $ocr_gift_date = ""; $target_payment_num = &commify($sum_num); $target_payment_amount = "\$".&commify(sprintf("%.2f",$sum_amount)); write;
3.4 DBM Databases DBM is a standard UNIX database, which store data as key-value pairs. In this example, we will read in the batch records, check against a DBM database whether the batch exists in that. If it exists, we will print an error and if not we will insert the batch and values. This code helps in checking for duplicate uploading of batches. #! /usr/local/bin/perl
# set the name of your DBM file $DBM_FILE = "batches.db";
# this will create two files, one for data and one for index dbmopen %HASH, $DBM_FILE, 0666 or die "Can't open $DBM_FILE: $!\n";
# read standard input LINE : while() { #ignore records other than batch headers next LINE unless /^T/;
# remove the new line character chomp;
# split the record! ($rec_type, $ocr_batch_number, $ocr_gift_date, $filler, $target_payment_num, $target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);
#convert the number fields from scalar string to scalar number! $target_payment_num += 0; $target_payment_amount += 0;
# key is the batch number # value is batch date + payments + amount # all joined by : ($Key,$Value) = ($ocr_batch_number,"$ocr_gift_date:$target_payment_num:$target_payment_amount"); ## # check whether this batch is already loaded if ( defined($HASH{$Key}) ) { # if so, print an error ($b_date,$b_payments,$b_amount)=split(/:/,$HASH{$Key}); print "Error: The batch $Key ($b_payments for \$$b_amount) is already uploaded\n"; } else { # else, add to the batch database $HASH{$Key} = $Value; } } dbmclose %HASH;
3.5 Exercise Using the examples above, write a program to read all batch records from an input file, verify against a DBM database and print a formatted report. Duplicate batches should also be indicated in the report. Try to split the tasks (verifying against the database, reporting etc) into individual subroutines. Also add another routine to generate an Excel CSV file report, in addition to the normal report. {mospagebreak title=Modules} 4. Modules (The following section is borrowed directly from Tim Bunce's modules file, available at your nearest CPAN site.) Perl implements a class using a package, but the presence of a package doesn't imply the presence of a class. A package is just a namespace. A class is a package that provides subroutines that can be used as methods. A method is just a subroutine that expects, as its first argument, either the name of a package (for ``static'' methods), or a reference to something (for ``virtual'' methods). A module is a file that (by convention) provides a class of the same name (sans the .pm), plus an import method in that class that can be called to fetch exported symbols. This module may implement some of its methods by loading dynamic C or C++ objects, but that should be totally transparent to the user of the module. Likewise, the module might set up an AUTOLOAD function to slurp in subroutine definitions on demand, but this is also transparent. Only the .pm file is required to exist. 4.1 Where to get them? CPAN - Comprehensive Perl Archive Network is the one-stop shop for Perl archives, modules, scripts and even documentation. The URL is http://www.cpan.org. 4.2 Modules vs. coding The advantage of modules is that, you really don't need to know about how the software works, but how to use it. Modules can drastically reduce development time. Most modules have good documentation and are well tested. These also tend to thoroughly follow standards. As a programmer, you might not be interested in reading hundreds of pages of standards' documentation, just to code something quickly! Modules do have an apparent disadvantage. Speed. Since modules tend to be generic in nature, the code contained within tend to be large. And most of the time, you might be using just 10% of the features modules provide. In such cases, and if a second or two in the overall run-time makes a difference, you probably want to code manually. Even in such situations, the easiest way is to download the module, see what you don't require and delete it. An easier and more elegant solution is to selectively import functions from the module. CPAN provides guidelines on writing modules. So, if you think you have some code that nobody else has written (rare chance!) and can be modularized, do so by all means and submit to CPAN. 4.3 Well known modules CGI - web programming CGI.pm is the most used module for CGI scripts. These days, page design and content management are mostly done independently and it is rare to find a good CGI programmer with a good aesthetic sense! CGI.pm makes programmer's life easy, without making him bother much about the HTML tags (and also avoiding heated discussions with the page designer!). DBI - common database interface To quote Tim Bunce, the architect and author of DBI: DBI is a database access Application Programming Interface (API) for the Perl Language. The DBI API Specification defines a set of functions, variables and conventions that provide a consistent database interface independant of the actual database being used.
In simple language, the DBI interface allows users to access multiple database types transparently. So, if you are connecting to an Oracle, Informix, mSQL, Sybase or whatever database, you don't need to know the underlying mechanics of the 3GL layer. The API defined by DBI will work on all these database types. A similar benefit is gained by the ability to connect to two different databases by different vendors within the one perl script, i.e., I want to read data from an Oracle database and insert it back into an Informix database all within one program. The DBI layer allows you to do this simply and powerfully. The DBI requires one or more driver modules to talk to databases. Oracle, Access and ODBC drivers might be of interest to us. Please note that DBI only standardize the database interaction process. If you use Oracle driver and write SQL specific to Oracle, don't expect to port your project smoothly to Informix, just by changing the driver! GUI - interfaces to GUI toolkits Perl also provides modules for various GUI toolkits on Unix like Gtk, Tk, XForms etc. A beta version for Win32 GUI is also available. {mospagebreak title=Flat Database In Perl} Flat Databases in Perl Flat databases in Perl Aim: I aim to give a good overview, and insight, into making good flat database structures in perl without too much hassle. I will be using a telephone database as an example. The following fields will be used: - ID
- Name
- Address
- Telephone number
Note: You will require an intermediate perl skill level. Tip: the correct database terminology for a 'data section is a field. 'Rows' in a database are considered 'records'. What are flat databases? A flat database is a database within a single file. A database is a set of data within a structure (in memory or a file, for example). All data stored in a database, is stored within validity constraints - that, in simple terms, means the data stored, is validated, and only stored if it passes a validation check. Against popular belief, flat databases can be linked to another database, or joined; but for this tutorial, I will cover flat databases, without joining, in perl. Validation: Data must be checked before it can be stored in a 'good' database. If you have an input string, and you require this input string to be numerical, like a telephone number for example, you would use the following expression: | print "the data is not a number" unless (/^\d+$/); |
|
Perl: \d is the numerical regex operator in perl. If this returns false, then the unless condition will 'run' the print statement. This can be extended further, to include spaces, and brackets, as most telephone numbers require an international dialing code. | print "the data is not a valid telephone number" if (/[^\d|\+| |\(|\)]/); |
|
Perl: in regular expressions, [ ], delimit individual comparisons, and the pipe character (|) separates these. The caret means 'not'. Here you could use the hex equivalent of 'space' if wanted. To put this 'data validation' into action, the following script my help you: &amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#!/usr/bin/perl&amp;amp;amp;lt;br&amp;amp;amp;gt;# fdb_validation1.pl&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Simple data validation in perl&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;print "Simple data validation\n", '-' x 22 . "\n", "Hit CTRL-C to exit\n\n";&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;while(&amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;) {&amp;amp;amp;lt;br&amp;amp;amp;gt; unless (/[^\d|\+| |\(|\)]/) {&amp;amp;amp;lt;br&amp;amp;amp;gt; print "The input is not a valid telephone number\n\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; } else {&amp;amp;amp;lt;br&amp;amp;amp;gt; chomp;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Great! $_, is a proper telephone number!\n\n" &amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
To sum up: validation checks to see if data entered is of the correct type. Proper databases should run a validation check on all input. Storing databases: I'm going to cover the theory to storing a flat database first, because it's the most important to understand, and it introduces you to storage concepts. We know that validation checks to see if data is ok, but we don't know how to store information in a flat database. A structured (aka 'active' or 'online') database will store data in memory, and to disk. We do not have the option of storing to memory, because we are using a flat, offline, database. Therefore, the entire database state, should be stored to file after being modified. When storing to file, we need a method of storing the data so it can be read again. If the data cannot be read correctly, then there is no reason in storing it. There are two used methods of storing data. - Chunks: Chunks of data are often the fastest method of saving data, because they do not require any conditional formatting. A section of a 'chunk database' may look like the following: 29 data data more data even more data Each field is on it's own line, and each line has a carriage return directly after the end of the data. If there is no data, where data should be stored, a carriage return is still used to 'hold' the database structure. The records are stored in lines of 5. In code, a data chunk is often referenced as an array, for example, | my @current_chunk = @db_lines[10..15]; |
|
Here, 5 lines from 10 to 15 are stored in the array, @current_chunk. Obviously you can then access fields (when knowing the format), with $current_chunk[3]; You may wish to use chunks in your database if: 1) you want speed, and 2) you wish to use less regular expressions when retreiving from the database. - Rows: This is the most popular method of storing data. A line from a flat database might look like: 29:data:data:more data:even more data This method is popular because, 1) it is easier to handle with loops, and 2) it is easier to read by sight. Here, the fields are separated by colons (:). Records are separated by carriage returns. For this tutorial, we're going to use the latter - sounds a little more complicated? Well, maybe it sounds so, but it's not really :) Reading a database in code: There are now two ways you can go about reading a database, you can: a) read into an array and loop to use, or b) read into a hash, loop to use, or require a key to use. We're going to cover the latter, as I believe hashes are such a magnificent thing not to waste. Perl: check perldoc's hashref before proceeding, as it's very important one knows the difference between a hashref and a hash. Even I slip up sometimes. Opening, storing, and closing a file: &amp;amp;amp;lt;br&amp;amp;amp;gt;open(DATA, "filename") or die "Sorry, I could not open the specified file because: $!";&amp;amp;amp;lt;br&amp;amp;amp;gt; my @filelines = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;DATA&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt;close(DATA);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;Ok, thats the easy bit to opening a file, but, lets make it a little more fool proof :)&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;my @db_lines;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;if (-e $db_file) {&amp;amp;amp;lt;br&amp;amp;amp;gt; open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $";&amp;amp;amp;lt;br&amp;amp;amp;gt; @db_lines = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;hDB&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; close(hDB);&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
The advantage of this, is that the script will not die on first use, because of the '-e'xist condition. Should you wish, you can add your own else statement and touch the file, but you'll not need to do this as the write function later offers an alike process when it unconditionally appends. - Splitting delimited lines: In the examples above, I used, :, as a line delimited. This is not a good choice, as some input may use a colon and end up 'shifting the data'. So for posterity, and 'because I can', I'm going to use the tab character, \t. Lets have a look at one of those lines again. | 29:data:data:more data:even more data |
|
would now be, | 29 data data more data even more data |
|
- Parsing: | ($data1, $data2, $data3, $data4, $data5) = split("\t", "29 data data more data even more data"); |
|
That was easy wasn't it? Lets add the last to sections of code together and see what we get :) &amp;amp;amp;lt;br&amp;amp;amp;gt;my @db_lines;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;if (-e $db_file) {&amp;amp;amp;lt;br&amp;amp;amp;gt; open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $";&amp;amp;amp;lt;br&amp;amp;amp;gt; @db_lines = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;hDB&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; close(hDB);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $db_line (@db_lines) {&amp;amp;amp;lt;br&amp;amp;amp;gt; chomp;&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($data1, $data2, $data3, $data4, $data5) = split("\t", $db_line);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; # We deal with the values from the file here :)&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
- Hash'ed :) I mentioned that we were going to use hashes earlier. Just to recap on them, as I assume no one read the hashref perldoc, a hash is an associative array. This means, a hash can old arrays within itself access them by association. hashref talks about the simple issue you need to understand: hashes can only hold scalars arrays! But this isn't to limit our work in anyway, as a reference is also a scalar. A reference, for those that don't understand pointer concepts is as follows: $ascalar = "this is a scalar"; # set $another = $ascalar; # copied $yetanother = \$ascalar; # referenced
print $ascalar; # ok print $another; # ok print $yetanother; # NNOOOOO!
print ${$yetanother}; # great! print $$yetanother; # simplified
|
|
A reference is a pointer, much alike those used in C. (but references in C start with an amperstand, [&] and pointers with an asterisk [*], just to confuse you :) It points, or references, a memory location that the compiler must be told to 'use' correctly. If it is not told that the place is an area of memory, it won't be treated like one, and end up printing the 'location' it points to, rather than the locations content. When learning BBC basic, I remember variables and pointers much like a mail sorting office, where they have rows and rows of boxes, each of these boxes with a number - this is the 'key' to locating them. Within the box is the data stored (an envelope). When we use, ${ scalar }, we tell the interpreter that the data is a reference, and it must then get the data from that reference in memory, and use that when called on. It is also possible to to use @$scalar, if $scalar is a reference to a memory space where an array is stored. That's a lot of recapping on hashes, now, how to use them. my %hash; # this is a hash.
my $hashref = \%hash; # this is a reference to that hash
$hashref{'key'} = "value"; # this is setting a key, and a value pair.
print $hashref{'key'}; # this prints the value of key (which would be "value")
my @array = ('29', 'data', 'data', 'more data', 'even more data'); # makes an array called array
$hashref{'array'} = \@array; # sets the key 'array', to a REFERENCE of an array with our data
foreach $value (@{$hashref{'array'}}) { print "$value\n"; } # loops and prints each element of this new array stored in the hash.
|
|
That sounds easy enough, so, using a unique value in our database as a key, we should be able to create a hash holding our details... lets see! &amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;my @db_lines;&amp;amp;amp;lt;br&amp;amp;amp;gt;my %database;&amp;amp;amp;lt;br&amp;amp;amp;gt;my @validation = ('NUM', 'TEXT', 'TEXT', 'TEXT', 'TELEPHONE');&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;my $db_file = "database.txt";&amp;amp;amp;lt;br&amp;amp;amp;gt;my $sep = '|';&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;if (-e $db_file) {&amp;amp;amp;lt;br&amp;amp;amp;gt; open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $_";&amp;amp;amp;lt;br&amp;amp;amp;gt; @db_lines = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;hDB&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; close(hDB);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $db_line (@db_lines) {&amp;amp;amp;lt;br&amp;amp;amp;gt; chomp $db_line;&amp;amp;amp;lt;br&amp;amp;amp;gt; my @record = split($sep, $db_line);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; if (&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;chk_validation(@record) == 1) {&amp;amp;amp;lt;br&amp;amp;amp;gt; die "Sorry, there was an error parsing the database input :(";&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; $database{$record[0]} = [@record[1..$#record]]; &amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;sub chk_validation {&amp;amp;amp;lt;br&amp;amp;amp;gt; my @arraytocheck = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $pos = 0;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $value (@arraytocheck) {&amp;amp;amp;lt;br&amp;amp;amp;gt; if ($validation[$pos] eq 'NUM') {&amp;amp;amp;lt;br&amp;amp;amp;gt; return 1 if ($value !~ /^\d+$/); # return a non 0&amp;amp;amp;lt;br&amp;amp;amp;gt; } &amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;lt;br&amp;amp;amp;gt; if ($validation[$pos] eq 'TELEPHONE') {&amp;amp;amp;lt;br&amp;amp;amp;gt; return 1 if ($value =~ /[^\d|\+| |\(|\)]/);&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; # We don't care about text :) &amp;amp;amp;lt;br&amp;amp;amp;gt; $pos++;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; return 0;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
Here, I have added the subroutine, chk_validation. chk_validation will check to see if the elements in the parsed line fit with the conditions in the array @validation. Perl does not have a switch statement, but it would be handy right now. If you don't understand it, and know your biology, think of it as a ribosome translating rDNA, but if it finds something that doesn't match, it throws an error. Writing to a flat database We know how information is stored and read, so lets write ours in the same format. This is rather easy, compared to most of the validation... here we go. &amp;amp;amp;lt;br&amp;amp;amp;gt;sub write_entry {&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($forename, $surname, $city, $telephone) = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $cid = scalar keys %database;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; open(OUT, '&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;' . $db_file) or die "Sorry, we could not open the database for writing, $!";&amp;amp;amp;lt;br&amp;amp;amp;gt; print OUT $cid . $sep . $forename . $sep . $surname . $sep . $city . $sep . $telephone . "\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; close(OUT); &amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
That was easy, but lets add some taint checking and replacing - I shall explain. If I want to use either the tab character (\t) or the pipe character, or even the colon character to delimit fields in the database, I need to replace them in user input. Should someone use that character in a text field, where it is allowed, then the data when parsed will be split at the wrong point. To demonstrate this using spaces, try the following: &amp;amp;amp;lt;br&amp;amp;amp;gt;#!/usr/bin/perl&amp;amp;amp;lt;br&amp;amp;amp;gt;# fdb_taint1.pl&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Breaking 'data' in perl&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;lt;br&amp;amp;amp;gt;my @result;&amp;amp;amp;lt;br&amp;amp;amp;gt;my $input = '3 '; # note the extra space after the 3&amp;amp;amp;lt;br&amp;amp;amp;gt;my $line = "1, 2, $input, 4, 5"; # note no 3, rather $input&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;@result = split(' ', $line);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;foreach (@result) { print "= $_\n"; }&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
In this example, the script should print = 1, .. = 5, but fails to do so because there is an extra space after 3 on the 'input' variable. If this was the telephone directory, and we tried to refer to $result[3], then you'd find the variable would not be there - instead, that extra space would have nudged all sequencial variables up a place... probably causing havoc, and even data loss - especially if we used: | my ($data1, $data2, $data3, $data4, $data5) = @array; # there is no $data6, therefore it's been nudged, and lost |
|
So how do I fix this kind of problem? By replacing! Since we know we don't want extra septerators, we could can a hidden character no one knows about, or we can use a 'word' replacement, and fix it back a little later. Using a 'word' is preferred, as it leaves no room for accidental errors... Lets have a look: &amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;sub write_entry {&amp;amp;amp;lt;br&amp;amp;amp;gt; # my apologies if there is an easier way of doing this?&amp;amp;amp;lt;br&amp;amp;amp;gt; my $i = 0;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $value;&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $value (@_) {&amp;amp;amp;lt;br&amp;amp;amp;gt; @_[$i] = $value if (s/$sep/__BAR__/ig);&amp;amp;amp;lt;br&amp;amp;amp;gt; $i++;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($forename, $surname, $city, $telephone) = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $cid = scalar keys %database;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; open(OUT, '&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;' . $db_file) or die "Sorry, we could not open the database for writing, $!";&amp;amp;amp;lt;br&amp;amp;amp;gt; print OUT $cid . $sep . $forename . $sep . $surname . $sep . $city . $sep . $telephone . "\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; close(OUT); &amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
We'll also need a modification to the read function to reverse the process: if (-e $db_file) { open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $_"; @db_lines = ; close(hDB);
foreach $db_line (@db_lines) { chomp $db_line; my @record = split($sep, $db_line);
my $i = 0; my $value; foreach $value (@record) { @record[$i] = $value if (s/__BAR__/$sep/ig); $i++; }
if (&chk_validation(@record) == 1) { die "Sorry, there was an error parsing the database input :("; } $database{$record[0]} = [@record[1..$#record]]; } }
|
|
Searching through your database This is where our hash information comes into play, but not as evidently as I demonstrated. To search our hash, we need to exact the key and value pair. Because Perl is so wonderful, it has a lovely operator called 'each' - lets have a look: &amp;amp;amp;lt;br&amp;amp;amp;gt;sub search {&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($term) = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt; while (my ($key, $value) = each(%database)) {&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach (@{$value}) {&amp;amp;amp;lt;br&amp;amp;amp;gt; return $key if (/$term/i);&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; return undef;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
|
This function will return undef if no match is found. It's also case insensitive. Also, this is where our hash tuition comes into play - if you remember, I said a hash holds an array of scalars (or pointers!). Since the value of the key is $value, $value at any point would be exactly the same as $database{$key}, therefore, we simply reference it as an array with @{$value}, and do away with @{$database{$key}}. Finishing off We've now covered everything we need to make the database backend to an interactive phonebook database system. Some differences: - I have used the tab character to delimit the database
- I have added a menu system
- I have added an addition system
So, without further ado, lets produce the final script... &amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#!/usr/bin/perl&amp;amp;amp;lt;br&amp;amp;amp;gt;# fdb_phonebook.pl&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# DESC: A flatfile database phone book by Matt 'QX' Melton&amp;amp;amp;lt;br&amp;amp;amp;gt;# HTTP: http://blacksun.box.sk&amp;amp;amp;lt;br&amp;amp;amp;gt;# DATE: 25/10/01&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;# LNCE: You may not use this on your own site without pior permission&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;@validation = ('NUM', 'TEXT', 'TEXT', 'TEXT', 'TELEPHONE');&amp;amp;amp;lt;br&amp;amp;amp;gt;$db_file = "database.txt";&amp;amp;amp;lt;br&amp;amp;amp;gt;$sep = "\t";&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;load_database;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;main_menu;&amp;amp;amp;lt;br&amp;amp;amp;gt;exit;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;# ---------------------------------------------------------------------&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;##&amp;amp;amp;lt;br&amp;amp;amp;gt;## Display the main menu, and prompts for input&amp;amp;amp;lt;br&amp;amp;amp;gt;##&amp;amp;amp;lt;br&amp;amp;amp;gt;sub main_menu {&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;my $main_screen = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;&amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;END;&amp;amp;amp;lt;br&amp;amp;amp;gt; The Phone book - by Matt&amp;amp;amp;lt;br&amp;amp;amp;gt;--------------------------&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;What you you like to do:&amp;amp;amp;lt;br&amp;amp;amp;gt; 1) Add a new entry&amp;amp;amp;lt;br&amp;amp;amp;gt; 2) Display an entry&amp;amp;amp;lt;br&amp;amp;amp;gt; 3) Search for an entry&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; x) Exit&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;END&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; # 1st timers...&amp;amp;amp;lt;br&amp;amp;amp;gt; print $main_screen;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\t=";&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;lt;br&amp;amp;amp;gt; while ($choice = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;) {&amp;amp;amp;lt;br&amp;amp;amp;gt; chomp $choice;&amp;amp;amp;lt;br&amp;amp;amp;gt; exit if ($choice eq 'x');&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;add_entry if ($choice eq '1');&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;show_entry if ($choice eq '2');&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;search_entry if ($choice eq '3');&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; # Returns WIN32 usually, but you can never be&amp;amp;amp;lt;br&amp;amp;amp;gt; # too sure with NT and 2K :)&amp;amp;amp;lt;br&amp;amp;amp;gt; if ($^O =~ /WIN/i) {&amp;amp;amp;lt;br&amp;amp;amp;gt; system('cls'); &amp;amp;amp;lt;br&amp;amp;amp;gt; } else {&amp;amp;amp;lt;br&amp;amp;amp;gt; system('clear');&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; print $main_screen;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\t=";&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;# &amp;amp;amp;lt;br&amp;amp;amp;gt;# Prompts for new data input and validates, then runs write_entry&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub add_entry {&amp;amp;amp;lt;br&amp;amp;amp;gt; my @newrecord;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Forename: ";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $forename = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;; chomp $forename; &amp;amp;amp;lt;br&amp;amp;amp;gt; print "Surname: ";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $surname = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;; chomp $surname; &amp;amp;amp;lt;br&amp;amp;amp;gt; print "City: ";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $city = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;; chomp $city;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Telephone number: ";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $telephone = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;; chomp $telephone;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; if (&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;chk_validation(0, $forename, $surname, $city, $telephone) == 1) {&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Data entered was not valid. Please try again\n\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\n Entry NOT added.\n\nHit any key to continue...\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $null = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; return;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;write_entry($forename, $surname, $city, $telephone);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\n Added entry.\n\nHit any key to continue...\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $null = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; return;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Prompts for key, then runs display_entry&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub show_entry {&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Entry key number: ";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $key = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;; chomp $key;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\n";&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;display_entry($key);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\nHit any key to continue\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $null = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Retrieves records, checks for existance, then displays&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub display_entry {&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($key) = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $record = $database{$key};&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; if ($record == undef) {&amp;amp;amp;lt;br&amp;amp;amp;gt; print "That record does not exist\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; return;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "ID........... $key\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Name......... $$record[0]\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Surname...... $$record[1]\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; print "City......... $$record[2]\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Telephone.... $$record[3]\n";&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Prompts for search term, runs the search sub, display entry if only 1, or displays&amp;amp;amp;lt;br&amp;amp;amp;gt;# entry keys if more&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub search_entry {&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Please type the search phrase [Name, partial number]: ";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $term = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; chomp $term;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\n";&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($matches) = &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;search($term);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; if (@$matches == undef) {&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Sorry, no matches found\n\nHit any key to continue...\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $null = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; return;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; if ($#$matches == 1) {&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Found one matching entry:\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;display_entry($$matches[1]);&amp;amp;amp;lt;br&amp;amp;amp;gt; print "\nHit any key to continue.\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $null = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; return;&amp;amp;amp;lt;br&amp;amp;amp;gt; } &amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; print "Found " . $#$matches . " matching entries: " . substr(join(', ', @$matches), 2) . "\n\nHit any key to continue...\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; my $null = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;STDIN&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# If the db file exists, it will read it and split the lines into records, and then&amp;amp;amp;lt;br&amp;amp;amp;gt;# fields. The adds to $database hash&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub load_database {&amp;amp;amp;lt;br&amp;amp;amp;gt; if (-e $db_file) {&amp;amp;amp;lt;br&amp;amp;amp;gt; open(hDB, $db_file) or die "Sorry, we couldn't open the file specified: $_";&amp;amp;amp;lt;br&amp;amp;amp;gt; @db_lines = &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;hDB&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;;&amp;amp;amp;lt;br&amp;amp;amp;gt; close(hDB);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $db_line (@db_lines) {&amp;amp;amp;lt;br&amp;amp;amp;gt; chomp $db_line;&amp;amp;amp;lt;br&amp;amp;amp;gt; next if $db_line eq "";&amp;amp;amp;lt;br&amp;amp;amp;gt; my @record = split(/$sep/, $db_line);&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $i = 0;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $value;&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $value (@record) {&amp;amp;amp;lt;br&amp;amp;amp;gt; @record[$i] = $value if (s/__BAR__/$sep/ig);&amp;amp;amp;lt;br&amp;amp;amp;gt; $i++;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; if (&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;chk_validation(@record) == 1) {&amp;amp;amp;lt;br&amp;amp;amp;gt; die "Sorry, there was an error parsing the database input :(";&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; $database{$record[0]} = [@record[1..$#record]]; &amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Concurrently parses the records with the array @validation&amp;amp;amp;lt;br&amp;amp;amp;gt;# returns 1 if there is a validation error&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub chk_validation {&amp;amp;amp;lt;br&amp;amp;amp;gt; my @arraytocheck = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $pos = 0;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $value (@arraytocheck) {&amp;amp;amp;lt;br&amp;amp;amp;gt; if ($validation[$pos] eq 'NUM') {&amp;amp;amp;lt;br&amp;amp;amp;gt; return 1 if ($value !~ /^\d+$/); # returns the value 1&amp;amp;amp;lt;br&amp;amp;amp;gt; } &amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;lt;br&amp;amp;amp;gt; if ($validation[$pos] eq 'TELEPHONE') {&amp;amp;amp;lt;br&amp;amp;amp;gt; return 1 if ($value =~ /[^\d|\+| |\(|\)]/);&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; # We don't care about text :) &amp;amp;amp;lt;br&amp;amp;amp;gt; $pos++;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; return 0;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Parses, replaces, the $sep character in a string and prints to the end of the db file&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub write_entry {&amp;amp;amp;lt;br&amp;amp;amp;gt; my $i = 0;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $value; &amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; # my apologies if there is an easier way of doing this&amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $value (@_) {&amp;amp;amp;lt;br&amp;amp;amp;gt; @_[$i] = $value if (s/$sep/__BAR__/ig);&amp;amp;amp;lt;br&amp;amp;amp;gt; $i++;&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($forename, $surname, $city, $telephone) = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; my $cid = (scalar keys %database) + 1;&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; open(OUT, '&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;' . $db_file) or die "Sorry, we could not open the database for writing, $!";&amp;amp;amp;lt;br&amp;amp;amp;gt; print OUT $cid . $sep . $forename . $sep . $surname . $sep . $city . $sep . $telephone . "\n";&amp;amp;amp;lt;br&amp;amp;amp;gt; close(OUT); &amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt; &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;load_database; # reload the db, but we could do it straight to the array, but it'd be&amp;amp;amp;lt;br&amp;amp;amp;gt; # an active memory db and not a flat file one :)&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;# Cycles each key/value pair and sees if they match the term, if so, adds to array and&amp;amp;amp;lt;br&amp;amp;amp;gt;# returns list of matches&amp;amp;amp;lt;br&amp;amp;amp;gt;#&amp;amp;amp;lt;br&amp;amp;amp;gt;sub search {&amp;amp;amp;lt;br&amp;amp;amp;gt; my ($term) = @_;&amp;amp;amp;lt;br&amp;amp;amp;gt; my @found = undef;&amp;amp;amp;lt;br&amp;amp;amp;gt; while (($key, $value) = each(%database)) { &amp;amp;amp;lt;br&amp;amp;amp;gt; foreach $field (@{$value}) {&amp;amp;amp;lt;br&amp;amp;amp;gt; push (@found, $key) if ($field =~ /$term/i);&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; }&amp;amp;amp;lt;br&amp;amp;amp;gt; return undef if ($#found == 0);&amp;amp;amp;lt;br&amp;amp;amp;gt; # else&amp;amp;amp;lt;br&amp;amp;amp;gt; return \@found;&amp;amp;amp;lt;br&amp;amp;amp;gt;}&amp;amp;amp;lt;br&amp;amp;amp;gt;
|
| {mospagebreak title=IRC Bot In Perl} IRC Bot In Perl This is a short guide to creating your own perl bot which will work on irc. I will not cover all the different modules and ways to connect to irc and issue commands. This will only cover connecting with IO::Socket and using raw irc commands. I feel you learn the most this way and have alot of control over what is happening.
IRC experience is helpful, but I'll take things slow enough so that an absolute beginner can understand what is taking place. This will also help those with alittle knowledge fully understand the irc protocol. Although I am no irc expert, after creating this bot I did learn a few tricks.
We start off by getting a connection underway:
#!/usr/bin/perl use IO::Socket;
$sock = IO::Socket::INET->new( PeerAddr => 'irc.undernet.org', PeerPort => 6667, Proto => 'tcp' ) or die "could not make the connection";
You can use any irc server and any port (commonly used ports are 6667-7000), so long as they are valid. If you have problems try to find a different server on that network. To make things easier you can make the PeerAddr a variable which is specified by an argument from the command line. Or purhaps map out all the servers on the network and make an arry from them, connecting to random ones and using the best connection. There are many possibilities, each work best for certain situations. We'll stick to the simple hard coded address and port.
Now we have a connection to the server. We still need to get connected/logged in to the ircd. Anything we send to or recieve from the server will go through $sock. So lets see what the server is sending us after we make a connection.
while($line = ){ print "$line\n"; }
We will see that the server prints out some lines. Each line will have a number representation to it. This will really help to tell the bot when to start and end routines. The key here is the line with 'NOTICE AUTH' in it. This is when we need to login to the irc server. To do this we send
NICK bots_nick USER bots_ident 0 0 :bots name
With a line break after the bots_nick and a line break at the end. So in the while loop we will add something like this:
while($line = ){ print $line; if($line =~ /(NOTICE AUTH).*(checking ident)/i){ print $sock "NICK b0ilersbot\nUSER bot 0 0 :just a bot\n"; last; } }
Now we are done with the login process. If you are having any problems try to read up on the irc protocol and how to login to it with telnet. Raven from www.securitywriters.org has wrote a decent tutorial on the subject, look for it.
Some servers will ask for a ping to make sure the client is active. This is only done on some servers and is a common pitfall to many bots which don't support this kind of login proceedure. To handle this we will check if the server wants us to ping it. The server will ask for a ping before it asks about nickserv registration/identification, so we will stop this loop after it mentions nickserv. This is what those numbers in the last if statment are for, the 376|422. The way to identify to nickserv is like this
NICKSERV :identify nick_password
this is just a simple irc command. The command is 'NICKSERV' and the arguments are 'identify nick_password' where nick_password is the actual password for this nick. The line ends in a line break and all irc commands are in upper case. When there is a : before something it means it is a multiple word argument (has spaces in it). This is how we will handle the possible ping and the nickserv identification.
while($line = ){ print $line; #use next line if the server asks for a ping if($line =~ /^PING/){ print $sock "PONG :" . (split(/ :/, $line))[1]; } if($line =~ /(376|422)/i){ print $sock "NICKSERV :identify nick_password\n"; last; } }
If you want to have a registration code you can find this out on your own.. or do what I do and register the nick with a normal irc client. This way we only need the bot to identify.
When you create your bot you can customize it however you want. Most of my bots have alittle bit more AI then this tutorial shows. This bot will be pretty strait forword and doesn't make many decisions. It just connects and does something.
I like to make the bot sleep for a few seconds just to get the connection cought up. I am on a 56k and things can go slow sometimes. A few times without the sleep the bot has joined channels before the nickserv identification is complete, this can be a pain in the neck if the bot needs a usermode or other circumstances which require the nick to be identified (such as other bots, +R channel mode, or trust issues with users).
After it sleeps it will join the channel. You will see that the server prints out alot of information about the channel when you join. You can save this information in variables to allow the bot to make many decisions. Again, this is a simple bot and won't be aware of it's environment or be dynamic in anyway. But you could for example turn on/off colors by what channel modes are set or who is in the channel (some people really hate colors). This is the last bit of the login proccess, after this the bot can actually do something.
sleep 3; print $sock "JOIN #channel\n";
Notice there is no : before #channel. This is because it does not have any spaces in it. And the JOIN command is in all caps. For a full list of commands try reading a tutorial on the IRC protocol. I don't even cover the basics here, there are tons of useful to know commands.
Now we are joining the channel. There is nothing else to do besides read the messages users send to the channel and respond to them. But inorder to read the messages we need to parse them so they make sense. The format of a priv_msg is as follows:
:nick!
This e-mail address is being protected from spambots. You need JavaScript enabled to view it
PRIVMSG #channel :the line of text
I like to seperate them into the following variables to make things easier to keep track of.
:$nick!$hostname $type $channel :$text
in this example here is the values of the variables:
$nick = nick $hostname = ident $type = priv_msg $channel = #channel $text = the line of text
So we are going to need to parse what is send from the server into useable data. This is how we'll do it. There is only one twist here, and that is incase the server sends a ping. They do this quite often to check and see if you are still connected. If we don't reply the the pings then we will get disconnected. When the server sends a ping you must reply with a PONG and the same characters the ping had. So this is how we will send it
while ($line = ) { ($command, $text) = split(/ :/, $line); #$text is the stuff from the ping or the text from the server if ($command eq 'PING'){ #while there is a line break - many different ways to do this while ( (index($text,"\r") >= 0) || (index($text,"\n") >= 0) ){ chop($text); } print $sock "PONG $text\n"; next; } #done with ping handling ($nick,$type,$channel) = split(/ /, $line); #split by spaces ($nick,$hostname) = split(/!/, $nick); #split by ! to get nick and hostname seperate $nick =~ s/://; #remove :'s $text =~ s/://; #get rid of all line breaks. Again, many different way of doing this. $/ = "\r\n"; while($text =~ m#$/$#){ chomp($text); } #end of parsing, now for actions }
ok. That was a rather large chunk of code and some parts were rather confusing. Most of it is just getting rid of what we don't want and seperating what we do want into variables. The next bit is just for looks. We print out what is said as if this is a normal irc client.
if($channel eq '#channel'){ print " $text"; }
The $channel check is needed incase people priv_msg or notice you things. This can be a problem when dealing with bots which need to be secure or can cause large headaches when things go wrong. I'll leave dealing with multiple channels to you. But to send Notices you simply do: print $sock "NOTICE nick :the line of text here\n"; and to send a priv_msg you do: print $sock "PRIVMSG nick :the line of text here\n";
Now the bot structure is done. Everything required is done, the only thing left to do is custimize your bot to have it do what you want it to do. This can be almost any sort of task imaginable. Simply parse the $text $nick and other variables we created to have the bot make decisions on what to do.
Here is the final bot in whole. I added one bit just to prove that the bot works:
#!/usr/bin/perl use IO::Socket;
$sock = IO::Socket::INET->new( PeerAddr => 'irc.undernet.org', PeerPort => 6667, Proto => 'tcp' ) or die "could not make the connection"; while($line = ){ print $line; if($line =~ /(NOTICE AUTH).*(checking ident)/i){ print $sock "NICK b0ilersbot\nUSER bot 0 0 :just a bot\n"; last; } }
while($line = ){ print $line; #use next line if the server asks for a ping if($line =~ /^PING/){ print $sock "PONG :" . (split(/ :/, $line))[1]; } if($line =~ /(376|422)/i){ print $sock "NICKSERV :identify nick_password\n"; last; } }
sleep 3; print $sock "JOIN #channel\n";
while ($line = ) { ($command, $text) = split(/ :/, $line); #$text is the stuff from the ping or the text from the server if ($command eq 'PING'){ #while there is a line break - many different ways to do this while ( (index($text,"\r") >= 0) || (index($text,"\n") >= 0) ){ chop($text); } print $sock "PONG $text\n"; next; } #done with ping handling ($nick,$type,$channel) = split(/ /, $line); #split by spaces ($nick,$hostname) = split(/!/, $nick); #split by ! to get nick and hostname seperate $nick =~ s/://; #remove :'s $text =~ s/://; #get rid of all line breaks. Again, many different way of doing this. $/ = "\r\n"; while($text =~ m#$/$#){ chomp($text); } if($channel eq '#channel'){ print " $text"; if($text =~ /hi b0ilerbot/gi){ print $sock "PRIVMSG #channel :hi $nick\n"; } } }
Not very complicated once you look at each part of it. But finding out things for yourself is the real fun of creating a bot. Much trial and error is involved in perfecting the bot, adding security and function can be alot of fun. I would like to stress the security of irc bots. They are in the most hostile environment known to the net and one security mistake and your bot could be used to execute commands on your box. I have found 4 irc perl bots vulnerable to remote command execution, don't let me find yours vulnerable aswell! Read all of the perl security related tutorials Don't let this discurage you from coding your own bot, it's a great learning experience and as long as you are careful you should be fairly safe. I would love to hear what kind of bots you come up with. The bots I have created include:
quote bot - a bot which has many features that deal with irc quotes. it reads off funny/witty things people have said while chatting in my channels. It also has some more advanced features such as listing off all the users in the channel who have a quote and an admin feature which allows me to add quotes while the bot is running.
quiz bot - A bot which quizes the channel users. I used this while studying for networking. This bot is great when the channel is dead or to start up a conversation with others. I learned alot from this bot.
poker bot - A bot which plays poker. I started to make a ucker (sp?) bot, but I lost motivation when the other people who wantted to play quit going on irc.
channel bot - A bot which enforces the channel rules. it warns, kicks, and kick bans users for breaking the rules. it voices,half ops, and ops identified users and keeps stats of channel activity. Good for preventing channel takeovers.
The reason for creating this text was because I remember the stress I had finding info on this subject when I first created the bot. I have since read a few crappy papers on irc bots, but nothing which would be very helpful.
{mospagebreak title=Escape Quotes In Shell} Escape Quotes In Shell. (requires bash,zsh,other shells with these features.. although even ran under nobody with noshell in /etc/passwd this is still possible.)
This one is fairly useful to have. Not the most used trick, but a good one to know. I started researching this after I found a post by zen-parse on an exploit for x-chat : http://online.securityfocus.com/archive/1/76874
What is so cool about this is the use of $IFS (Internal Field Separators) to add a break (space,tab,newline) inbetween commands. What the $IFS does is determine what the shell uses to seperate arguments. If $IFS is a space, then a space can be used to seperate arguments, if it is 46s03 then 46s03 can be used to seperate arguments.
This is useful for scripts which filter spaces, or where spaces are not allowed. zen-parse was forced to use this technique because spaces are not allowed in urls. (they need to be url encoded into %20). It is also useful when scripts filter spaces from user input that is printed to the shell. example (using perl -e so you can test at the command line):
perl -e ' chomp($userinput=); $userinput =~ s/\s//g; $userinput =~ s/\n//g; print `echo "$userinput"|cat`; '
We are just pretending here, actual circumstances will vary. vary so much that I will just talk about general situations and let you come up with the exploits by combining the info.
In this case we can execute commands by "breaking out" of the quotes with a single ", now we are into normal shell place (no quotes keeping this an argument of echo). So we can use a metacharacter to stop this command and issue another. A few avaliable are | ; && but there are many other characters which we can do tricky things at the shell with.
After the ; we will now put the next command we want executed, for this demonstration I'll do ls. Then lets clean things up with another ". To go along with the old one we broke out of. So the final value for $userinput is:
";ls"
And if we need to use a space, we simply put a $IFS in there:
";ls$IFS/etc/"
There is more than just $IFS. Lets just cut to the chase here. The shell interprets what is sent, so lets just send some `cmd`. This will execute the command and return the output right there. try this for $userinput:
blah`ls`bleh
Simple huh. But many times the arguments sent to the shell are filtered for the ` character. Luckily there is the lesser used version of `cmd`, which is $(cmd). This does the same thing, but gives you different characters which may or maynot be filtered. So if ` is filtered, and $() is not you can use:
blah$(ls)bleh
What if the classic meta tag filter from various older cgi security papers is implemented to escape dangerous characters? What then??
perl -e ' chomp($userinput=); $userinput =~ s/\s//g; $userinput =~ s/\n//g; $userinput =~ s/([;&;t>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g; #or any regex that forgets to filter \ print `echo "$userinput"|cat`; '
Now we can do what rfp did, and escape the escape this regex puts in.. or we can look for other methods of exploitation. Infact after reading the man pages for bash I found that bash also support \nnn encoding, where nnn is the octal value for the ascii character. This will work perfect to evade these filters and still print any character. do a man ascii to find out the octal value of the characters you want. I think this should work for $useriput:
blah\140ls\140bleh
You should have guessed it.. 140 is the octal representation of ` (which as we know executes the command). So this is the same thing to bash as: blah`ls`bleh.
The reason why it is not getting filtered by the s/([;\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g; is because the regex is looking for `, \140 is not `, it is just the characters \ 1 4 0 in a string. But when the shell goes to interpret the string sent, it sees this and interprets it into `. How nice.
Again, just to make it clear. In perl the string \140 will get by ` filters, because it is not a `. But once it is interpreted by the shell into a ` it becomes useable. Perl does have the same \nnn encoding, but "\140" and '\140' are two different things to perl (all languages). Remember that "$var" gets interpreted, '$var' does not. user input does not get interpreted automaticly, it is a string (ie. '\140). Hopefully no one got too confused. Simple idea once you grasp it.
This same theory allows things like the string '\n' to get sent to the shell and get interpreted into a newline. Look for other places where you can use this type of stuff.
One pitfall you may have already thought to this is the inability to break out of '' in the shell. Since nothing is interpreted inside '$userinput' on the bash end none of our tricks really work. The only one is to hope the script does not filter for ' so you can get out and do some magic. Or hope that you can escape the escape in something like s/([badchars])/\\$1/g;
To escape you would simply put something like \' in the $userinput of the second example. This will make the \' sent in to the regex to turn out like \\', thus escaping the escape the regex tries putting on.
That is pretty much all the tricks I can share. Other things are common sense or need to be figured out on a situation to situation basis. I would like to point out that whenever an external program is called from perl with any form of user input spechial attention needs to be provided to how that program handles data and any spechial features that program may have.
For example a root exploit in sperl was released in aug of 2000 [1] because user suppied data was sent to the /bin/mail command, which has a feature were if you have ~! on line it will call apon the shell and issue commands. This is the same principale here, only difference is we are looking at the shell itself and not a "regular" program. Since the shell is used so often it is a good target for a discussion.
The same logic here can also be applied to shell scripts, both local and running as CGI.
[1]http://www.securitybugware.org/mUNIXes/4609.html
|