Originally published in the April 1995 issue of Advanced Systems.

Software Tools

Passionate about Perl

For information processing, the tool of choice is Perl. Here's why.

By Ben Smith

Perl is a programming language I'm passionate about. I am not the only one. Thousands of programmers have turned to Perl because it gives us an efficient language for solving everything from the simplest to the most complex information processing problems without the restrictions and peculiarities of other languages.

Larry Wall of NetLabs (Los Altos, CA) developed Perl in the 1980s to be an information search and report tool (hence the name Pathologically Eclectic Rubbish Lister, a.k.a. Practical Extraction and Report Language). But it soon became obvious that it had far greater applications. At first, it served to supplement Unix shell scripts; then it began to supplant them. Now it is replacing C in many instances, as well.

That's because Perl synthesizes the best of both language worlds. Unix shell scripts are easy to write and understand, but they string together otherwise disparate Unix utilities, each with its own rules and syntax, and each utility a script calls runs as a separate, performance-killing process, even for a trivial task like character substitution. C programs, on the other hand, run as efficient single processes (unless specifically parallelized) under a fairly consistent set of rules and syntax, but they are laborious to code and difficult to understand. Perl programs also run as single processes, and because Perl's consistent rules and syntax are derived from common Unix utilities and shells (command interpreters), Perl is easy to learn and apply.

Flexible and extensible
Although Perl's single structure runs contrary to the Unix philosophy of many small tools, it has a simple and flexible design that is attractive to even the hardest of hard-core Unix script and C programmers. You can impose your own style on the code you write, even more so than with C, and the language manages much of the minutiae that plague C authors. For one thing, as in Basic, variables never need to be declared before they are used. And unlike Basic and most other high-level languages, Perl's data is typeless, so programmers needn't worry about conversion to and from characters, strings, integers, floats, and doubles. The conversions are all automatic.

Here is an example of the simplicity of Perl programming. It totals a sequence of numbers in a file:
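The listing printed with the article is missing from this copy. What follows is a minimal sketch of what such a program could look like, not the author's original. The subroutine name total_files is made up for illustration, and the code uses Perl 5's "my" declarations.

```perl
#!/usr/bin/perl -w
# Sketch only: add up the numbers, one per line, in the named files.
# total_files is a hypothetical name, not part of Perl.
sub total_files {
    local @ARGV = @_;        # <> reads each file listed in @ARGV in turn
    my $total = 0;           # no datatype to choose: int, long, or float
    $total += $_ while <>;   # each line is converted from string to number
    return $total;
}

print total_files(@ARGV), "\n" if @ARGV;
```

Saved as total.pl (an assumed file name), it would be run as "perl total.pl numbers.txt".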

Now think about the C language equivalent. Specifically, you would have to include the standard input and output libraries, declare a datatype for the accumulator (maybe int, maybe long?), evaluate the command line for the name of the file to read, open the file, read each line, convert the input strings into the proper integer datatype (there are different functions for int versus long), accumulate the value in the accumulator, test for the end of the number file, and when you have reached it, close the file and format the result back into a string for output. (If you think that sentence was long, write the program.)

The compiled C program may run in 75 percent of the time the interpreted Perl program takes, but the C program will take 1,000 percent longer to develop than the Perl version, to be sure. If you really need the performance (and you probably don't), optimize your Perl programs by rewriting the slow parts in C and making them an extension to the Perl interpreter.

Today, computing is no longer just data processing; it has become information processing. Most digital information comes as strings of characters and blocks of text. This evolution makes Perl the information processor's tool of choice because it is extremely efficient at handling strings and lists. It also has an optimized associative-array structure from which you can build more complex structures. It packs and unpacks any other datatype that you might find in a file or coming in through a data stream. And it is good at managing processes and network communications. Perl also allows for recursive subroutines. Arguments arrive in the @_ array as aliases to the caller's values; most routines copy them into local variables first, which gives the effect of C's pass-by-value, and you can pass by reference when a routine needs to change its caller's data. Subroutine names can be treated as data elements, so they can be passed as arguments to other routines.
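To make those features concrete, here is a short sketch that touches an associative array, a subroutine passed as data, recursion, and pack/unpack. It is written in Perl 5 syntax (a code reference rather than a subroutine name), and count_words, apply_to_each, and factorial are hypothetical names chosen for this illustration.

```perl
#!/usr/bin/perl -w
# Sketch only: count_words, apply_to_each, and factorial are hypothetical names.

# Associative array: count how often each word occurs in a string.
sub count_words {
    my ($text) = @_;                 # copy the argument out of @_
    my %count;
    $count{lc $_}++ for split ' ', $text;
    return %count;
}

# A subroutine passed as data: call it on every remaining argument.
sub apply_to_each {
    my ($code, @items) = @_;
    return map { &$code($_) } @items;
}

# Recursion works much as it does in C.
sub factorial {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * factorial($n - 1);
}

my %count = count_words("the cat and the hat");
print "the: $count{the}\n";                                   # the: 2
print join(",", apply_to_each(\&factorial, 3, 4, 5)), "\n";   # 6,24,120

# pack and unpack convert between Perl values and binary records.
my $record = pack("n a4", 1995, "Perl");    # 16-bit integer, then 4 bytes of text
my ($year, $name) = unpack("n a4", $record);
print "$year $name\n";                      # 1995 Perl
```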

Because it is interpreted, Perl has a decided advantage over compiled languages: It can evaluate strings of its own executing code. In other words, a Perl application can modify itself, generating code on the fly and immediately executing that code with eval within the context of the application. While I haven't seen a Genetic Programming engine written in Perl, I have used the Perl debugger, an application that is only possible through Perl's eval function. (By the way, the debugger has a complete set of operations, including tracebacks, stepping, breakpoints, and symbolic referencing.)
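A small sketch of the idea: the program below writes the text of a subroutine at run time and hands it to eval to compile. The name make_scaler is hypothetical, and the example uses a Perl 5 anonymous subroutine.

```perl
#!/usr/bin/perl -w
# Sketch only: make_scaler is a hypothetical name.
# Build the text of a subroutine at run time, then compile it with eval.
sub make_scaler {
    my ($factor) = @_;
    my $source = "sub { return \$_[0] * $factor; }";   # generated Perl source
    my $code = eval $source;                           # compile it now
    die "generated code failed to compile: $@" if $@;
    return $code;
}

my $triple = make_scaler(3);
print &$triple(14), "\n";    # prints 42
```

Checking $@ after eval matters here, because a mistake in generated code shows up only when the string is compiled.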

One of the dangers of a freestyle language like Perl is that you are not protected from the common mistake of using a variable before you have given it a value. Larry Wall anticipated that error by giving Perl a warning option (perl -w), somewhat like having a runtime version of lint. By default, all Perl variables have global scope, even across scripts that are composed of more than one file. However, you can and should declare local variables for particular blocks or subroutines and further control the scope of variables and subroutines by declaring them within a "package." Perl packages divide the code into public and private segments, a sort of half step toward object-oriented programming.
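The scoping rules are easier to see in a few lines of code. This sketch shows a global, a "local" value that lasts for one call, and a package with its own names. The Counter package and all subroutine names are hypothetical.

```perl
#!/usr/bin/perl -w
# Sketch only: the Counter package and the names below are hypothetical.

$limit = 10;                 # no declaration: a global in package main

sub show_limit { return $limit; }

sub with_temporary_limit {
    local $limit = 99;       # new value for this call and anything it calls
    return show_limit();     # sees 99
}

package Counter;             # names below live in Counter, not main
$count = 0;                  # this is $Counter::count
sub bump { return ++$count; }

package main;
print with_temporary_limit(), " ", show_limit(), "\n";   # 99 10
print Counter::bump(), " ", Counter::bump(), "\n";       # 1 2
```

Run with -w, as the first line requests, Perl also warns on standard error when a script uses a value that was never set.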

The next generation
The most widely distributed version of Perl, first released in 1993, is 4.036, also known as Perl 4 patch-level 36. It is a stable, bug-free release of the fourth generation of the language. But developers always want more.

Fortunately for us, Larry Wall realized the best way to bring about the desired evolution was to actually rewrite Perl. Perl 5 is the result of his two-year effort (as of this writing, it is still in beta testing, although it is very nearly finished). All Perl users will profit from the internal optimization that is a result of the rewrite, but developers working with complex problems have the most to gain in moving to Perl 5.

Perl 5 fully supports Perl 4 scripts, but there are many improvements to the general syntax of the language. There are two-thirds fewer reserved words, while the number of available functions has increased. Perl variables may now be declared within a lexical scope, which not only makes more efficient use of the name table but also avoids many problems associated with recursive functions. Through a new feature, references to named things (variables, hash tables, subroutines, and so on) that can be stored and dereferenced like any other value, the limitations on data structures are gone. With Perl 5, you can make as complex a compound data structure as you wish. Arrays of associative arrays, associative arrays of associative arrays, and arrays of functions are now possible without the use of eval.
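Here is a sketch of the three structures just named, along with a lexically scoped ("my") variable inside a recursive routine. The data, the %host table, and the run_steps subroutine are made up for illustration.

```perl
#!/usr/bin/perl -w
# Sketch only: the data and names are made up for illustration.

# Array of associative arrays.
my @books = (
    { title => "Programming Perl", animal => "camel" },
    { title => "Learning Perl",    animal => "llama" },
);

# Associative array of associative arrays.
my %host = (
    alpha => { os => "SunOS",   users => 12 },
    beta  => { os => "Solaris", users => 40 },
);

# Array of functions.
my @steps = (
    sub { return $_[0] + 1 },
    sub { return $_[0] * 2 },
);

# Lexical scope: each call gets its own $value and @todo,
# so the recursive calls cannot trample one another.
sub run_steps {
    my ($value, @todo) = @_;
    return $value unless @todo;
    my $step = shift @todo;
    return run_steps(&$step($value), @todo);
}

print "$books[1]{animal}\n";          # llama
print "$host{beta}{users}\n";         # 40
print run_steps(20, @steps), "\n";    # (20 + 1) * 2 = 42
```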

Perl 5 is modular: Extensions can be dynamically loaded as needed rather than linked into the Perl interpreter. Perl 5 also comes with dozens of modules that have been developed by leading Perl programmers. One important extension supports object-oriented programming by letting a Perl package function as a class, with dynamic, multiple inheritance and virtual methods. There are even constructors and destructors.
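A sketch of a package acting as a class may help. The Animal and Camel classes below are hypothetical; the pieces that belong to Perl 5 itself are bless, the @ISA array, the -> method call, and the DESTROY method.

```perl
#!/usr/bin/perl -w
# Sketch only: the Animal and Camel classes are hypothetical.

package Animal;
$destroyed = 0;                      # counts destructor calls, for demonstration
sub new {                            # constructor: returns a blessed reference
    my ($class, $name) = @_;
    my $self = { name => $name };
    return bless $self, $class;
}
sub name  { return $_[0]{name}; }
sub sound { return "..."; }
sub speak {                          # sound() is looked up at run time (a virtual method)
    my ($self) = @_;
    return $self->name . " says " . $self->sound;
}
sub DESTROY { $destroyed++; }        # destructor: runs when the last reference goes away

package Camel;
@ISA = ("Animal");                   # inheritance; listing several classes gives multiple inheritance
sub sound { return "grunt"; }

package main;
my $pet = Camel->new("Amelia");
print $pet->speak, "\n";             # Amelia says grunt
undef $pet;                          # DESTROY is called here
```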

The POSIX module for Perl 5 will, no doubt, catch the interest of many cross-platform developers. Information engineers working with databases, for example, will be glad to see that they no longer need separate versions of Perl to work with each version of DBM. The included module accesses DBM, NDBM, SDBM, GDBM, and Berkeley DB from the same script.
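As a rough sketch of how a script reaches a DBM file in Perl 5, the code below ties an associative array to an SDBM file. It shows only SDBM_File, which ships with Perl; the other DBM flavors are tied the same way through their own modules, assuming they are installed. The subroutine names remember and recall are hypothetical.

```perl
#!/usr/bin/perl -w
# Sketch only: remember and recall are hypothetical names.
use Fcntl;          # supplies O_RDWR, O_CREAT, and O_RDONLY
use SDBM_File;

# Store one key/value pair in the DBM file, creating the file if needed.
sub remember {
    my ($dbfile, $key, $value) = @_;
    my %db;
    tie(%db, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0644)
        or die "cannot open $dbfile: $!";
    $db{$key} = $value;       # an ordinary hash assignment writes to disk
    untie %db;
}

# Look up one key; returns undef if the key is absent.
sub recall {
    my ($dbfile, $key) = @_;
    my %db;
    tie(%db, 'SDBM_File', $dbfile, O_RDONLY, 0644)
        or die "cannot open $dbfile: $!";
    my $value = $db{$key};
    untie %db;
    return $value;
}
```

After remember("books", "camel", "Programming Perl"), a later run of the script could call recall("books", "camel") to get the title back. The file name "books" is only an example.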

Perl 5 is not only extensible, but is also embeddable. You can call Perl from within C and C++, as well as dynamically link C libraries into your Perl. And there are many more enhancements to Perl. Some, such as those to Perl's regular-expression handling, are very important even for the most trivial program.

Perl is a pearl
Obviously, Perl has gone well beyond its original purpose as a data extraction language or even a prototyper. Today, spurred on and nurtured by a fervent following of Perl programmers, the language does everything but the GUI, and you can find extension modules for even that. And it comes in more flavors than ice cream: Versions exist for nearly every variety of Unix, OS/2, Windows and Windows NT, Macintosh, and more.

About the Author
Ben Smith is a Unix and IP networking engineer and software developer in New England and former editor with BYTE magazine. He can be reached at ben.smith@advanced.com.


Perl resources

Although Perl archives exist seemingly everywhere, the definitive site is via anonymous ftp at ftp.netcom.com. Comp.lang.perl is a highly active Usenet newsgroup devoted to the language, and O'Reilly and Associates (Sebastopol, CA; ora.com) publishes the definitive textbooks: Programming Perl by Larry Wall and Randal L. Schwartz (the "camel" book) and Learning Perl by Randal L. Schwartz (the "llama" book).


[Copyright 1995 Web Publishing Inc.]

URL: http://www.sunworld.com/asm-04-1995/asm-04-swtools.html
Last updated: 1 April 1995