Putting Perl together

Learn to borrow what you need and write what you must

By Cameron Laird and Kathryn Soraiz

May 1998

Abstract

Even if you use Perl in your daily life, you may not be aware of all the different ways this versatile language can save you time. Here we give you the pros and cons of the 10 most useful ways you can use Perl to put applications together. Cameron and Kathryn cover everything from CGI scripting to compiled extensions to large applications, explaining why and how you'd want to use Perl in these different architectures. (3,600 words)


What do you keep in your garage? Do you have a tool for every occasion that arises? That's handy until you consider the cost of managing and maintaining all those implements. You might accomplish even more if you hone and lubricate only the ones you use most, learn new ways to use them together, and rent or hire outside resources in the rare instances when these tools can't do the job.

The same principle applies to Perl programming. Perl is a marvelous language, with an amazing wealth of contributed solutions available from resources like the Comprehensive Perl Archive Network (CPAN). But many developers don't realize how many different ways they can use Perl, and how good Perl really is at team play. Sometimes the best Perl solution is one that reuses pieces from other technologies: standalone utilities you happen to have, special-purpose libraries in other languages, and so on.

We collected 41 models for building applications with Perl. This article presents the top 10 architectures chosen from our original list. These are the different ways to glue together solutions that are most likely to be useful to Perl programmers. The article also demonstrates the more self-contained structures, with small working examples of what you can do.

Several of the examples build on a dozen-line program that counts the distinct words in a document. This calculation has applications in indexing, data compression, and throughout linguistics. Scholars employ this sort of data, for example, to assess readability or judge the authenticity of manuscripts. Perl's flexible text processing makes it nearly ideal for writing programs centered around such analysis. The article shows a few ways to connect the word-frequency program to other tools or functions.

A catalogue of architectures
Let's first recall what Perl is. Larry Wall developed Perl in 1987 to simplify administrative reporting on a newsgroup conferencing system. Today a legion of programmers ("a million sounds about right to me," says Jon Orwant, publisher of The Perl Journal) use it to write Web applications, prepare one-shot administrative reports, massage file formats, script client/server jobs, automate maintenance tasks, and solve other typical information-processing problems.

Perl is all about creating solutions.

If you already know enough Perl to manage one of these roles -- programming CGI actions, for example -- you've learned a valuable skill. This article shows you how to apply that same knowledge in other roles. Reusing your knowledge multiplies the return you receive from your investment in a Perl education.

We use the industry jargon, architecture, to describe the way a software solution is put together. A client/server application, for example, can be considered in three parts: a client piece, a server piece, and the network that connects them. The architectures we present here are:

An isolated script
Pipeline
Wrapper
More sophisticated wrapper
Compiled extension
Large problems
Small problems
CGI
Template micro-scripting
In-process scripting

You can review the pros and cons of using Perl with these architectures in our sidebar.

1. An isolated script

Advantages: Simple, portable, well-documented
Disadvantages: Often tedious to write

The simplest Perl architecture is the "isolated" script. Here's what it takes to count the frequency of different words in standard input.

#!/usr/local/bin/perl

# This is the contents of the file, "word_frequency.pl".
while (<>) {
        # Punctuation divides words.  Words
        #     are alpha-numerics, plus quote.
        # With just a little more care,
        #     we could handle hyphenation,
        #     ...  See "Pattern Matching"
        #     in *Programming Perl*.
        # We maintain case distinctions.
        s/[^\w']/ /g;
        s/ +/ /g;
        s/^ //;

        for $word (split / /) {
                $frequency{$word}++;
        }
}

# Print the most frequently-used words first.
foreach $word (sort {$frequency{$b} <=> $frequency{$a}}
                        keys %frequency) {
        print "$frequency{$word}:       $word\n";
}

Running this program against an early draft of this article yielded these results:

46:     the
43:     Perl
35:     a
35:     of
35:     to
25:     and
20:     in
19:     for
17:     is
12:     that
12:     this
12:     you
11:     perl            
...

How do you feed a text into this program? The answer is more subtle than many Perl developers may understand. There are at least a couple of distinct ways to invoke Perl scripts for each operating system. With Solaris, for example, this program may be run as either:

word_frequency.pl <source_file

perl word_frequency.pl <source_file

While the former is obviously briefer, it demands more attention to the initial #!... line in the source. Heterogeneous workplaces with weak standards for system configuration find it more convenient to rely on launches of the form perl script ...

The range of different ways to deploy Perl scripts is a topic for a full article. It's certainly beyond the scope of this one. What you should know, though, is that there's more than one way to launch Perl, and where to go to find more.

We learned Perl entirely from online documents. Don't do the same. Although we didn't pay in cash, the cost in time was too great. Accelerate your progress by buying at least one of the fine printed descriptions of Perl.

Programming Perl is our long-time favorite reference for the language. Its sixth chapter, Social Engineering, expands many of the points this article mentions. "Social Engineering" here describes what's involved in teaching Perl to be a good citizen in the larger society of other processes and systems. The section on command processing gives advice on different ways to launch Perl. It explains how you can use such apparent line noise as

eval 'exec perl -S $0 ${1+"$@"}'

to make your scripts more portable between users and machines.
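
For orientation, here is a minimal sketch of how that idiom usually sits at the top of a script. It assumes a Bourne-compatible shell and a perl somewhere on the PATH. The greeting subroutine is hypothetical and is there only so that the script has something to do.

```perl
#!/usr/local/bin/perl
# If a shell, rather than Perl, ends up reading this file, the next
# line makes the shell re-launch the script under whatever perl is on
# the PATH, passing along all command-line arguments.  Perl itself
# parses it as "eval '...' if 0;" and therefore never executes it.
eval 'exec perl -S $0 ${1+"$@"}'
        if 0;

# Hypothetical payload: report how many arguments arrived.
sub greeting {
        return "launched with " . scalar(@_) . " argument(s)";
}

print greeting(@ARGV), "\n";
```

The -S switch asks perl to search the PATH for the script named by $0, which is why it has to come before the script name.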

Now we have a simple Perl program, and we have found at least one way to execute it on our desktop. What more can we do?


2. Pipeline

Advantages: Can solve involved problems in seconds
Disadvantages: Often not portable, certainly requires multiple (computationally expensive) processes

Command-line "pipelines" often yield powerful results. Our word_frequency.pl example can be bolted unchanged into a pipeline with the portable, freely available, character-based browser, Lynx, to combine their functions (see Figure 1).

lynx -dump $URL | word_frequency.pl

This compound command retrieves the page to which $URL refers, renders it into text, and passes the text to word_frequency.pl. The result, then, is a table of the word frequencies of the document at $URL. Notice that Lynx and word_frequency.pl don't have to know anything about each other. They simply operate on whatever input they see and pass the result through to their outputs. Unix programmers are notorious for solving many interesting problems by simply connecting the right inputs and outputs of elementary building blocks.

Figure 1

One of Perl's great benefits is CPAN. The rich collection of work there has led some programmers astray, though. They aim to code only in Perl, and, in some cases, make excess work for themselves. To retrieve a URL and perform a word-frequency count on its contents, for example, they

Rewrite word_frequency.pl to recognize a URL
Retrieve the URL using the LWP::Simple module
Render the result into a text document with HTML::Parse or HTML::Parser
Refit the existing code in word_frequency.pl to take its input from that rendering

They've lost track of how easy it is to reuse other tools, such as Lynx, in pipelines or other architectures.
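
Extending a pipeline is just as cheap as starting one. As a sketch, suppose we only care about words that occur at least some number of times. The filter below is a hypothetical at_least.pl that assumes its input looks like the output of word_frequency.pl; it reads standard input only when something is actually piped into it.

```perl
#!/usr/local/bin/perl
# at_least.pl (hypothetical): pass through only those lines of
# word_frequency.pl output whose count reaches a threshold.

# Return the count at the start of a line such as "46:     the",
# or undef if the line does not have that form.
sub count_of {
        my ($line) = @_;
        return $line =~ /^(\d+):/ ? $1 : undef;
}

# The threshold is the first command-line argument; default to 10.
$threshold = @ARGV ? shift @ARGV : 10;

if (-t STDIN) {
        # Nothing is piped in, so explain the intended use instead.
        print STDERR "usage: word_frequency.pl <file | at_least.pl [N]\n";
} else {
        while (<STDIN>) {
                $count = count_of($_);
                print if defined $count && $count >= $threshold;
        }
}
```

It slots in as one more stage, and neither Lynx nor word_frequency.pl needs to change:

lynx -dump $URL | word_frequency.pl | at_least.pl 20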

One of Perl's treasures is the wealth of fine aphorisms its inventor, Larry Wall, has coined. He frequently counsels laziness as a "great virtue for programmers." Taking advantage of Lynx, rather than rewriting (a piece of) it in Perl, is definitely lazy. Moreover, this fits with Wall's most common characterization of Perl: "There's more than one way to do it" (TMTOWTDI). You have a choice. Writing entirely in Perl can be great for performance and portability. If you bring in outside help through pipelines or other architectures, though, your work might go even quicker.

Just remember: TMTOWTDI.

3. Wrapper

Advantages: Quick way to make functionality available in Perl
Disadvantages: Cost of process spawning, probable low portability, inflexible data communications

One of the most useful and under-appreciated architectures is the wrapper. A Perl wrapper packages an existing piece of functionality -- typically implemented in a language other than Perl -- and makes it look like Perl.

If you have a display table of an ideal size and height, made from wood that clashes with your decor, you know you don't have to scrap the table or redo the whole room. Just drape a piece of complementary fabric over the table, hide its color, and enjoy its physical dimensions.

Programming is much the same. Suppose you rely on a bond option valuation program. The program was written 15 years ago in Fortran. It gives correct results -- when fed an esoteric combination of configuration files and command-line options. What should you do? Don't stop trading bonds, and don't think you have to rewrite the entire program; just wrap it up in a bit of Perl and make its interface look exactly the way you want it.

Beginners often ask questions of the form, "I have a (C, awk, Java, shell, Fortran) program that performs a certain function; how can I combine it with a Perl program so that my (C, awk, etc.) program's results are available to Perl?" Asking the question this way is a symptom of not understanding how easy it is to wrap other applications with Perl.

Look at a small, concrete example: In the previous section, we mentioned that you can retrieve World Wide Web documents and parse them in Perl with the use of just a few modules from CPAN. Retrieving and installing these takes less than 20 minutes, even with low-end equipment. If you already have a solution at hand, though, you can wrap it in Perl, and have it ready immediately. In this case, you might write

sub simplest_render {
        $URL = shift;
        return `lynx -dump $URL`;
}

With this approach, you have a mixed-language solution with a fraction of the source code size and development time of a pure Perl approach. Lynx does the work, and all the results are available by a simple function call in Perl:

simplest_render($my_URL);

Figure 2 makes the point that a wrapped program is under the control of Perl. Its results are entirely available for any Perl programming you might choose to do.

Figure 2

This kind of wrapping is often a convenient way to reuse legacy applications with unfriendly interfaces. There's often no need to rewrite old programs written in awk, Fortran, or even Perl; just wrap them up inside a new Perl interface. This is a particularly cheap way to extend Perl; that is, to create a new Perl with more capabilities.
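
A wrapper can also take a little more care than backticks do. The following is a sketch under stated assumptions, not the one true way: it relies on the list form of a piped open, which needs a Perl newer than the one current when this article was written (5.8 or later), and the names capture_lines and careful_render are ours. It starts the external program without a shell and complains if the program fails.

```perl
use strict;
use warnings;

# Run an external command without involving a shell and return its
# output as a list of lines.  Dies if the command cannot be started
# or exits with a non-zero status.
sub capture_lines {
        my @command = @_;
        open(my $output, '-|', @command)
                or die "Cannot run $command[0]: $!\n";
        my @lines = <$output>;
        close($output)
                or die "$command[0] failed (wait status $?)\n";
        return @lines;
}

# The Lynx wrapper, rebuilt on capture_lines().
sub careful_render {
        my ($url) = @_;
        return capture_lines('lynx', '-dump', $url);
}
```

Callers still see an ordinary Perl function: careful_render($my_URL) returns the rendered page as a list of lines. Because no shell is involved, odd characters in the URL are passed to Lynx untouched.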

4. More sophisticated wrapper

Advantages: Like a wrapper, but more flexible
Disadvantages: Requires more care, potential operating system headaches in buffering

Perl and the programs it backticks ("`", as in section 3) cooperate only loosely. Perl hands over control to the wrapped program, waits until the latter completes processing, then receives all of its output at once.

Sometimes it's useful for the two programs to work together more closely. Suppose our bond calculation program should be reset when prices enter a particular range. This calls for more teamwork between Perl and the wrapped program. They need to take turns in their processing and share partial results with each other.

Hardware control is one domain where you can use this technique. Many pieces of hardware (data acquisition devices, controllers, and so on) connect to serial ports and include simple monitor software. If you don't like the interface to the monitor program, there's no need to feel stuck with it. You can use pipes and other forms of inter-process communication (IPC) to wrap up all the monitor's functionality and give it a Perl programming interface.

This is another instance where the details depend somewhat on the operating system. Chapter 6 of Programming Perl is mildly discouraging on the subject because the external program might buffer its input/output in a way that's inconvenient for Perl. Our experience, though, is that simple wrapping with the open, IPC::Open2, or Comm.pl facilities is good enough for the majority of problems. Experimenting with such a wrapper certainly takes far less time than rewriting the external program.
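
To make the turn-taking concrete, here is a minimal sketch with IPC::Open2. We have no real device monitor to hand, so a one-line child Perl process stands in for it (an assumption of this sketch): it answers each request with the same text in upper case and flushes after every line, which is exactly the buffering behavior a real monitor may or may not give you. The names start_monitor and ask are ours.

```perl
use strict;
use warnings;
use IPC::Open2;

# Stand-in for a device monitor (an assumption for this sketch):
# a child Perl process that answers each request line with the same
# text in upper case, flushing its output after every line.
my @monitor = ($^X, '-e',
        '$| = 1; while (<STDIN>) { print uc($_) }');

# Start the wrapped program; return its process ID and two handles.
sub start_monitor {
        my $pid = open2(my $from_monitor, my $to_monitor, @monitor);
        return ($pid, $from_monitor, $to_monitor);
}

# Send one request line and wait for the one-line reply.
sub ask {
        my ($from_monitor, $to_monitor, $request) = @_;
        print $to_monitor "$request\n";
        my $reply = <$from_monitor>;
        chomp $reply if defined $reply;
        return $reply;
}

my ($pid, $from, $to) = start_monitor();
print ask($from, $to, 'read channel 1'), "\n";
print ask($from, $to, 'reset'), "\n";
close $to;              # end of input tells the child to exit
waitpid($pid, 0);
```

Perl and the wrapped program now alternate: each call to ask can depend on the reply to the one before it. If the external program does not flush its output line by line, ask will block, which is the headache the Disadvantages line warns about.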

5. Compiled extension

Advantages: Performance comparable to C, with programming power of Perl
Disadvantages: Requires more programming experience and care in construction

Figure 3's process diagram for a compiled extension sits somewhere between the isolated script and the wrapper. It executes within a single process, like the former, and includes extended functionality, like the latter.

Figure 3

The other IPC mechanisms mentioned in chapter 6, like system and open, aren't the only ways to alloy the strengths of Perl and another programming language. You might, for instance, create a new Perl processor, myperl, which is like Perl, except that it has a few extra functions that aren't in the standard Perl processor. New functions can be written in C, or, with a bit of care, other languages including C++ and Java. If you have a C library that already solves some of your problems, or if a Perl program you've written doesn't perform to your standards, don't give up on Perl. Just use C code for what C offers best -- processing speed, existing functionality -- and bind it into a customized myperl.

We've chosen not to explain extension writing in this article because several of these details are specific to a particular operating system. Programming Perl explains some of what you'll need in its MakeMaker section. Also, see the SWIG site (in Resources below) for a particularly efficient way to manage interfaces between Perl and lower level languages.

6. Large problems

Advantages: Power of Perl
Disadvantages: Little industrial consciousness of large-scale Perl use results in managerial suspicion, poorly defined development practices

Perl can solve big problems. Many users know it as a scripting language and assume this somehow means that it's only fit for small project work. This isn't true at all. Perl should definitely be considered for implementing even the largest and most mission-critical applications. In our experience, it's roughly as reliable as commercial compilers for C++ and Java. Moreover, since the introduction of Perl 5 in 1994, Perl supports object-oriented methodologies and efficient modularization as well as or better than C++ and C. Many serious, large-scale applications are written principally in Perl, including such prominent Web sites as Amazon, the Internet Movie Database, and Intershop.

There's a bit of a problem documenting Perl's mission-critical use. Companies often don't give permission to publicize their reliance on Perl. Why not? Competitive advantage is one reason -- companies generally don't want rivals to know how they develop results so quickly. Also, Perl is an embarrassment for some. There's still a stigma about scripting large-scale projects. Don't let such prejudices hold you back. If Perl is the best vehicle for your next project, have confidence in its ability to tackle big jobs.
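
The object-oriented support mentioned above is what keeps large Perl programs manageable. As a small illustration, here is the word counter recast as a class. The package name WordFrequency and its methods are our own invention for this sketch; in a real project the package would live in its own module file.

```perl
use strict;
use warnings;

package WordFrequency;

# Construct an empty counter.
sub new {
        my ($class) = @_;
        return bless { count => {} }, $class;
}

# Add the words of a piece of text, using the same rules as
# word_frequency.pl: alphanumerics plus quote, case preserved.
sub add_text {
        my ($self, $text) = @_;
        $self->{count}{$_}++ for grep { length } split /[^\w']+/, $text;
        return $self;
}

# Return how often a word has been seen so far.
sub frequency {
        my ($self, $word) = @_;
        return $self->{count}{$word} || 0;
}

# Return the words in order of decreasing frequency; ties are
# broken alphabetically so that the order is predictable.
sub ranked {
        my ($self) = @_;
        my $count = $self->{count};
        return sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
                keys %$count;
}

package main;

my $counter = WordFrequency->new;
$counter->add_text("Perl is all about creating solutions.\n");
$counter->add_text("Perl can solve big problems.\n");
print "$_: ", $counter->frequency($_), "\n"
        for ($counter->ranked)[0 .. 2];
```

Each counter carries its own data, so a large application can keep one per document, per author, or per site without any of them colliding.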

7. Small problems

Advantages: Power of Perl
Disadvantages: Cost to learn Perl

With the explosion of the Web, many programmers identify Perl as a CGI language and don't realize how much more it can do. The brief routine work for which Wall originally designed Perl remains a fit domain for Perl automation.

System administrators, especially those who work with personal computers, quickly profit from knowledge of Perl or the other scripting languages in its class (Python, Rexx, Tcl, and so on). If you have users who complain to you that their e-mail isn't working, this little probe will help you diagnose their situation:

#!/usr/local/bin/perl

# Invoke as "pop3probe HOST USER PASSWORD".

use Mail::POP3Client;

($host, $user, $password) = @ARGV;
$pop = new Mail::POP3Client($user, $password, $host);
if ($pop->State() eq "TRANSACTION") {
        for ($i = 1; $i <= $pop->Count; $i++) {
                print $pop->Head($i), "\n\n";
        }     
} else {
        print $pop->Alive(), "\n";
        print $pop->State(), "\n";
}

Perl's portability means you can run this tool from any platform with Perl and the Mail::POP3Client module installed. It takes only seconds to do a quick pop3probe, scan the resulting message headers, and estimate whether the problem is with the mail server, the client, or the user. What's the alternative? If you reconfigure an end-user mail client to reproduce the reported symptoms, you'll spend at least a couple of minutes pointing-and-clicking. Perl will save you those minutes.

Perl's File::Find module is so useful that some administrators learn just enough of the language to exploit it. If you need to find all doc and rtf files that are at least a month old, are owned by a particular user, and are at least a megabyte in size, you can quickly run:

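use File::Find;

sub wanted {
        # Only .doc and .rtf files interest us.
        if (!/\.doc$/ && !/\.rtf$/) {
                return;
        }
        # Do the lstat first, so that "-M _" below reuses its results.
        ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size)
                                = lstat($_);
        # Skip anything modified within the last 30 days.
        if (int(-M _) < 30) {
                return;
        }
        if ($size < 1000000) {
                return;
        }
        # 233 is the numeric ID of the user in question.
        if ($uid != 233) {
                return;
        }
        print "$File::Find::name\n";
}

find(\&wanted, "/");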

Judge for yourself how long it would take to generate the same report with the administrative tools available on most personal computers.
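
Once a report like this proves useful, it tends to get reused with different limits. One way to prepare for that, sketched here with a name of our own choosing (old_large_files), is to pass the criteria as arguments and collect the matches in a list rather than printing them. The owner test is left out for brevity; it could be added by comparing element 4 of the lstat result, the numeric user ID.

```perl
use strict;
use warnings;
use File::Find;

# Return the names of plain files under $top that match $pattern,
# are at least $min_days old, and are at least $min_bytes in size.
sub old_large_files {
        my ($top, $pattern, $min_days, $min_bytes) = @_;
        my @found;
        find(sub {
                return unless /$pattern/;
                my @info = lstat($_) or return;
                return unless -f _;
                return if int(-M _) < $min_days;
                return if $info[7] < $min_bytes;   # element 7 is the size
                push @found, $File::Find::name;
        }, $top);
        return @found;
}
```

A call such as old_large_files("/", qr/\.(?:doc|rtf)$/, 30, 1000000) reproduces the search above, minus the owner test, and returns a list that the rest of a program can sort, mail, or archive.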

8. Common Gateway Interface (CGI)

Advantages: Industry-standard way to make interactive Web applications
Disadvantages: Mediocre performance

Though Perl is widely known for its CGI capabilities, we still run into plenty of Perl programmers who don't realize how easy it is to get started with Web programming. Let's use the work we've already done to create a small application that reports on word frequencies for documents we find on the Web. First, we need a simple form, frequency.html, to specify a URL:

<HTML>
<form METHOD="GET" ACTION="frequency.cgi">

Give a URL:

<input TYPE="text" NAME="URL" size=50 maxlength=50>

<center>
<INPUT type=submit ALIGN="center" VALUE=Submit BORDER=0>
</center>

</form>
</HTML>

Use frequency.html with this frequency.cgi executable:

        
#!/usr/local/bin/perl

print "Content-type: text/plain", "\n\n";

        # This is the standard overhead for receiving values
        #     from forms.
@pairs=split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $contents{$name} = $value;
}

$result =
`/usr/local/bin/lynx -dump $contents{URL} | word_frequency.pl`;
print "$result\n";
exit (0);

These two small sources constitute a Web application that accepts URLs, examines the documents found there, and reports back on their word frequencies.

CPAN includes several modules that simplify common CGI operations. CGI.pm is particularly popular. Once you have it installed, you can let it handle parsing in frequency.cgi:

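#!/usr/local/bin/perl

use CGI qw(:standard);

print "Content-type: text/plain", "\n\n";

        # A function call is not interpolated inside backticks,
        #     so fetch the form value into a variable first.
$URL = param('URL');
$result =
`/usr/local/bin/lynx -dump $URL | word_frequency.pl`;
print "$result\n";
exit (0);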

Do you want to move existing applications to the Web or make your static Web pages more interactive? These examples demonstrate how little it takes to start on your CGI career.
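
One caution before you put frequency.cgi on a public server: both versions hand whatever the visitor typed straight to a shell. A minimal sketch of a safer approach follows, assuming that a conservative list of URL characters is acceptable for your site. The names checked_url and frequency_command are hypothetical. The check rejects anything that isn't a plain http, https, or ftp URL, and the command builder puts the URL inside single quotes, which is safe only because the check never lets a single quote through.

```perl
use strict;
use warnings;

# Return the URL unchanged if it is an http, https, or ftp URL made
# only of characters from a conservative list; otherwise return undef.
sub checked_url {
        my ($candidate) = @_;
        return undef unless defined $candidate;
        return $candidate
                if $candidate =~ m{\A(?:https?|ftp)://[A-Za-z0-9._~:/?#%&=+,-]+\z};
        return undef;
}

# Build the shell command for an already-checked URL.  The single
# quotes keep the shell from interpreting characters such as & and ?.
sub frequency_command {
        my ($url) = @_;
        return "/usr/local/bin/lynx -dump '$url' | word_frequency.pl";
}

# Possible use inside frequency.cgi:
#     my $url = checked_url(param('URL'));
#     if (defined $url) {
#             my $command = frequency_command($url);
#             print `$command`;
#     } else {
#             print "That does not look like a URL we can fetch.\n";
#     }
```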

9. Template micro-scripting

Advantages: Even easier than CGI
Disadvantages: Less standard and flexible than full-blown Perl

Do you manage a situation where the full power of Perl overwhelms your Web site workers? Perhaps their understanding of CGI is still so rudimentary that they make errors when trying to code with Perl. You can simplify CGI to its most immediate advantages with a package called WPP (see Resources below).

WPP provides idioms for inclusion of templates, variable expansion, and conditional generation. This constitutes most of the work CGI accomplishes for the majority of generated pages. Moreover, WPP provides a gentle learning curve for those moving from static HTML pages to fancier server-side scripting.

WPP simplifies Web page maintenance because it allows you to factor out redundancies. You can write

@HEAD@
@TAIL@
$Date$
@TITLE=My First Page@
<HTML>
<HEAD>
<TITLE>@TITLE@</TITLE>
</HEAD>
<BODY>
<H1>@TITLE@</H1>
</BODY>
</HTML>

and automate evaluation of @TITLE@, @HEAD@, and so on.
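
To give a feel for what such a preprocessor does, here is a toy expansion routine in Perl. It is emphatically not WPP's implementation or its full syntax, only a sketch that handles two of the forms shown above: a definition written as @NAME=value@ and a reference written as @NAME@. The name expand_template is ours.

```perl
use strict;
use warnings;

# Expand a template: remember definitions written as @NAME=value@,
# then substitute every @NAME@ that has a known value.
sub expand_template {
        my ($text, %value) = @_;

        # Collect definitions and delete their lines from the output.
        $text =~ s/\@(\w+)=([^\@\n]*)\@\n?/$value{$1} = $2; ''/ge;

        # Substitute known names; leave unknown ones as they were.
        $text =~ s/\@(\w+)\@/exists $value{$1} ? $value{$1} : "\@$1\@"/ge;

        return $text;
}

print expand_template(
        "\@TITLE=My First Page\@\n<TITLE>\@TITLE\@</TITLE>\n<H1>\@TITLE\@</H1>\n");
```

The payoff is the same as with the real tool: the title is written once, and every page element that needs it picks it up automatically.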

10. In-process scripting

Advantages: Much better performance than CGI
Disadvantages: Reduced portability

Suppose you're an experienced CGI programmer and word comes to you that one of your applications isn't performing adequately. What to do? Do you need to recode all your work to the NSAPI or ISAPI C interfaces for the Netscape and Microsoft Web servers?

Not at all. Perl interfaces are now available for most Web servers, including VelociGen for Perl from Binary Evolution, and mod_perl for Apache. These mature implementations often boost processing speed by an order of magnitude.

Summary
Perl is a big subject. If you've already written your first Perl application, you've taken the hardest step. Now multiply your productivity by learning a few new architectures that allow you to apply your Perl knowledge. You'll do particularly well to think about ways to tie together Perl and other technologies you know.

If you've never used Perl, keep an open mind. It's a flexible scripting language and probably has a useful role to play in your next project. Ask yourself, Does one of the top 10 architectures match my needs? You may be pleasantly surprised.

Resources

Programming Republic of Perl http://www.perl.com
The Perl Journal http://www.tpj.com/
Welcome to CPAN http://www.perl.com/CPAN-local/CPAN.html
SWIG home page http://www.cs.utah.edu/~beazley/SWIG
About Lynx http://www.crl.com/~subir/lynx.html
Binary Evolution's VelociGen for Perl/TCL http://www.binevolve.com/
Why we use eval 'exec perl -S $0 ${1+"$@"}' http://www.perl.com/CPAN-local/doc/FMTEYEWTK/sh_dollar_at
WPP - Web Pre-Processor http://www.geocities.com/Tokyo/1474/wpp.html
Entropy in the Humanities and Social Sciences (application of word frequencies) http://www.math.washington.edu/~hillman/Entropy/soc.html
Amplifications and updates to this article http://starbase.neosoft.com/~claird/comp.lang.perl.misc/perl_architectures.html

About the authors
Cameron Laird and Kathryn Soraiz manage their own software consultancy, Network Engineered Solutions, from just outside Houston, Texas. They invite you to e-mail them for notice of upcoming articles. Reach Kathryn at kathryn.soraiz@sunworld.com. Reach Cameron at cameron.laird@sunworld.com.


URL: http://www.sunworld.com/swol-05-1998/swol-05-perl.html

Advantages and disadvantages of 10 Perl architectures

Architecture	Advantages	Disadvantages
An isolated script	Simple, portable, well-documented	Often tedious to write
Pipeline	Can solve involved problems in seconds	Often not portable, requires multiple processes
Wrapper	Quick way to make functionality available in Perl	Cost of process spawning, probable low portability, inflexible data communications
More sophisticated wrapper	More flexible than a wrapper	Requires more care, potential operating-system headaches in buffering
Compiled extension	Performance comparable to C, with programming power of Perl	Requires more programming experience and care in construction
Large problems	Power of Perl	Managerial suspicion, poorly defined development practices
Small problems	Power of Perl	Cost to learn Perl
CGI	Industry-standard way to make interactive Web applications	Mediocre performance
Template micro-scripting	Even easier than CGI	Less standard and flexible than full-blown Perl
In-process scripting	Much better performance than CGI	Reduced portability
