Choosing a scripting language
Perl, Tcl, and Python: they're not your father's scripting languages
Perl, Tcl, Python. The odds are that you have used or at least heard of one of these popular scripting languages. We introduce you to the basic concepts in scripting and tell you how the "big three" languages compare. If you see a scripting language somewhere in your future or if you're looking for an overview of what's out there, you should read this article. Also, be sure to check out the sidebar for quick definitions of such lesser-known languages as Rexx, Scheme, Guile, Forth, S-lang, Yorick, and Icon (3,000 words)
Assume
    set seconds 0.0; set stopped 1
    label .stopwatch_display -text $seconds
    button .start -text Start -command {
        if $stopped {
            set stopped 0
            tick
        }
    }
    button .stop -text Stop -command {set stopped 1}
    pack .stopwatch_display -side bottom -fill both
    pack .start .stop -side left
    proc tick {} {
        global seconds stopped
        if $stopped return
        after 100 tick
        set seconds [expr $seconds + .1]
        .stopwatch_display config -text [format "%.1f" $seconds]
    }

Questions:
The correct answers are:
This on-screen stopwatch illustrates several of the attractive qualities of today's scripting languages. They're easy to learn and portable, and they accomplish remarkable results in only a few lines. What a code sample doesn't show, though, is that most scripting languages also are:
It's a seductive combination. In fact, the most important lesson of this article is that it's time, today, to learn a scripting language. Read the descriptions below and decide which one best fits you; keep in mind that any of these will be a treat to use.
Languages or frameworks?
What are the characteristics of scripting languages? First, they're
productive. You should be skeptical when told things like, "COBOL
programmers become 18.4 times as efficient when they adopt Rexx,"
because the craft of programming is simply too diverse for that to
be meaningful. Although anecdotal, the most persuasive evidence of
all is the reality of how individual engineers behave. Many working
programmers offer testimonials along the lines of, "We re-implemented
an old C application in Python, and the line count
went down by 90 percent while we added functionality." Once they know Icon,
Tcl, or another scripting language, they simply choose to use
it. It's not like high-fiber diets, object-oriented analysis, or
even timely performance reviews; people rarely say, "Now that I've
learned this language, I ought to do more scripting." They just do
more scripting.
There's an oversimplification in the title. The decision is not really about choosing a scripting language. You'll go much farther when you understand that you're after a scripting framework. The best way to understand this is through a real-life example:
You start learning the scripting language of your choice. In a short time, you've written a five- or fifteen-line script that does something you've always wanted to do, although you never had the time, confidence, or knowledge to tackle it. "Five (or fifteen) lines," you think. "This isn't bad at all."
You show it to a few friends who ignore you because they have more important matters on their minds. Then a few days later, someone comes up to you, and asks, "Know that little gadget you built? That was slick; in fact, I could use something like that, if only I could:
In fact, this is how many scripters spend their time. While modern scripting languages are remarkably expressive and capable, one characteristic that almost all share is that they're written to be extended. The big returns on your investment come when you wed the strength of the language itself to the legacy algorithms, components, and objects already in use in your organization.
It gets better. After a couple of months of demonstrating that your chosen scripting technology easily absorbs the quirky interfaces they've tried to fold into it, your colleagues worry less about where the pound signs and backslashes go; now, they just want to get jobs done, and the particular syntax of the language you choose doesn't seem like such a problem. A manager asks, "Can we do this under VMS/MacOS?" "Sure," you say, because you know that modern scripting languages have been coded with great care given to portability. "What will it take to put a copy of this stuff inside our existing product, so that users can write their own macros?" "Not a lot," you answer with confidence. "Can we Web-enable it, communicate through SNMP, administer it remotely, perform calculations with arbitrary-precision integers, and cryptographically secure the transactions?" "Well," you think, "other developers have already written packages to support precisely those operations, so it will only take a bit of reconnaissance to pull it all together."
Then one day, you're at the water cooler, and you hear yourself proclaim, "If I had to give up X (your chosen scripting language), golly, I think I'd just resign first." You've surprised yourself -- yet it sounds right to say it.
This is what has happened: When you think of your language, it's not just grammatical structure with peculiar rules for writing array elements. For you, the language opens up an entire framework which gives you extensibility, embeddability, and portability.
That's what happens with scripting languages. As languages, they diverge dramatically in appearance, philosophy, and style; as frameworks, though, almost all have the features that make their use and reuse compelling.
Before you begin: Some words of advice
In understanding the scripting world, it's important to remember
that it's better defined existentially than essentially. We don't
say that Perl is a scripting language and C isn't because of the
formal definitions of their syntaxes or binary implementations.
Rather, when we use Perl, we usually do so in interpretive contexts
with emphasis on things like "gluing" external applications,
development so rapid as to invite discard, and a plain-text
deliverable. C as a language can support the same qualities -- yet
there's no question about the typical roles played by C and Perl.
We recognize scripting languages by their "social" uses, not by
counting a checklist of features.
There's one last objection to consider before looking at the languages themselves. "You get what you pay for," you think, and these languages are free. Real applications demand real languages, not this hobbyist stuff.
True, these languages are available at no charge. Most have generous licensing that allows everything short of claiming that you yourself invented them. This does not mean that they are toys. The languages have been implemented carefully and are at least as reliable as comparable commercial products. Scripting languages are very, very serious business to the people who use them for such mission-critical roles as controlling scientific equipment, supervising television networks and railroad operations, delivering millions of Web hits daily, and monitoring satellite communications, to list a few. The FAQs we list in the Resources section below give more details on the reliability of these languages, along with lists of firms that provide commercial support for each. These are real languages.
The big three: Perl, Tcl, and Python
Odds are that your own adventures in scripting will start with one
of these. The single most important criterion in choosing a
scripting language is that you have a friend who can help you with
the language you choose. Perl is the most widely used scripting
language on Unix machines, apart from sh, and it's on a trajectory
to pass the latter, so it's an easy choice. Even if you get stuck
with an aspect of Perl, there's probably help nearby to talk you
through it.
Perl is implemented with great care. It has a stunningly low incidence of faults, far fewer than the average commercial development product. It's more portable than Java, in the sense that high-quality ports exist for more environments than Java can now boast. It has the strongest community infrastructure of any scripting language: its annual conference is the biggest; there are more books written about it than any of the others to follow; the Usenet newsgroups comp.lang.perl.* carry the largest signal; and it's easiest for employers to recruit Perl programmers. The Perl community sponsors archives of re-usable code and documentation, called CPAN (Comprehensive Perl Archive Network), that are the model for other languages.
Through an historical fluke, Perl has come to be identified with CGI work in the minds of many people. The formal connection between them is weak; certainly the other languages here are roughly comparable to Perl in their string-handling, input-output, and database capabilities -- the usual strengths cited for Perl by CGIers. There is also no doubt that Perl has a vigorous life beyond its CGI role, although it's possible that a majority of current Perl users don't realize this. Why the identification? Apart from serendipity and the Web explosion, much credit probably should go to Jon Udell, whose column for Byte consistently delivers solid technical content wrapped with an inspiring vision of what programming leadership can achieve. Perl has been his cross-platform workhorse of choice.
Perl performs well; it's generally quite zippy. It allows object-oriented programming, although it doesn't require it.
Are there problems with Perl? A few. It's not simple, in an academic sense; it's deliberately a highly evolved language, and as with other products of evolution, students puzzle over Perl's earlobes, appendixes, wisdom teeth, and so on. Perl has become rather heavy. While it's efficient at providing the features it offers, it is too rococo and "tricky" for those with austere taste. "There's More Than One Way To Do It ... [i]s the Perl Slogan," according to the central troika (Larry Wall, Tom Christiansen, and Randal Schwartz) of the Perl world. It's a good slogan because both fans and opponents of Perl base their arguments on it.
Consider this example for system administrators, edited from a script Randal L. Schwartz published in a recent column for Unix Review:
    # Usage: "perl this_script_name.pl DIR1 DIR2 ... DIRN"
    # Output: a sorted list of the twenty largest files found across
    #     the file systems under DIR1 ... DIRN, collectively.
    use File::Find;

    $depth_limit = 20;
    @directory_list = @ARGV;

    # "-s" abbreviates "the size of", and "-f" "... is a file".
    find (sub {$size{$File::Find::name} = -s if -f;}, @directory_list);
    @sorted = sort {$size{$b} <=> $size{$a}} keys %size;
    # Toss out anything not in the top $depth_limit.
    splice @sorted, $depth_limit if @sorted > $depth_limit;
    foreach (@sorted) {
        # Tabulate the results in nice columns.
        printf "%10d: %s\n", $size{$_}, $_;
    }
To the uninitiated, the density of @_% characters brings line noise to mind. Perl fans prefer to think of this as "economy of expression." Other strengths of Perl illustrated here are its wealth of freely available and well-integrated special-purpose routines, libraries, and packages. In this case, File works with file systems, and File::Find specifically implements directory walks. As Schwartz writes, "Perl excels at those little routine tasks that do not have an off-the-shelf solution."
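For comparison, here is a rough sketch of the same administrative task in Python, one of the other "big three" languages. It is not taken from Schwartz's column; the function name `largest_files` and its `limit` parameter are invented for this illustration, and it relies only on the standard `os` module.

```python
import os
import sys


def largest_files(directories, limit=20):
    """Return (size, path) pairs for the `limit` largest plain files.

    `largest_files` and `limit` are illustrative names, not part of
    any library.
    """
    sizes = {}
    for top in directories:
        # os.walk plays the role of Perl's File::Find directory walk.
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Like Perl's "-f" test, skip anything that is not a plain file.
                if os.path.isfile(path):
                    sizes[path] = os.path.getsize(path)
    # Sort by size, largest first, and keep only the top `limit` entries.
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [(size, path) for path, size in ranked[:limit]]


def main(argv):
    # Tabulate the results in the same columns as the Perl version.
    for size, path in largest_files(argv[1:]):
        print("%10d: %s" % (size, path))


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a name of your choosing, it would be run much like the Perl script, with the directories to search given as command-line arguments. The Python version is longer and less dense, which is the trade-off both camps argue about.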
Tcl: simple extensibility
Tcl is the single most inviting of these languages. At least that's
how it feels to those of us who choose to concentrate on Tcl. The
first steps with Tcl are very easy. This same simplicity reappears
at a higher level, where Tcl users enjoy the most continuity between
the different modes of interactive experimental programming,
embedding facilities in larger applications, and repackaging work
into productized versions. Tcl is at least as portable as Perl. Tcl
is easy to learn; its syntax is so simple as to invite ridicule.
Tcl has historically been the best choice for those who want to
design "little languages" for end-user deployment. For example,
several commercial CAD products expose a Tcl interface to define a
general-purpose language for controlling the product itself.
Part of Tcl's simplicity comes from the fact that, as the mantra goes, "everything is a string." This reduction encourages recognition of the duality between code and data, making it easy to write language processors in Tcl. For example, Tcl has the built-in looping constructs while, for, and foreach. Creating new control structures to supplement these three is quick:
    # This implements "repeat ... until ..."
    #
    # It's definitely not in the spirit of Tcl to be so syntax-
    # heavy as to require a keyword, "until"; what is
    # characteristic of Tcl is that it's flexible enough to
    # implement it without a hitch.
    # Note that this implementation is only illustrative. A
    # production-ready version requires another dozen lines
    # of boilerplate exception-handling, to accommodate rigor-
    # ously the defined error conditions which "block" and
    # "condition" might throw.
    # An example usage:
    #     set i 0
    #     repeat {puts -nonewline $i} until {[incr i] == 10}
    # emits
    #     0123456789
    proc repeat {block until condition} {
        if {"until" != $until} {
            error "Syntax: 'repeat {...} until {...}'."
        }
        while 1 {
            # Notice that the "uplevel"-s ensure local
            # variables are evaluated in the correct
            # context, that is, the context of the
            # "repeat ..."-ing procedure.
            set result [uplevel 1 $block]
            if [uplevel 1 [list expr $condition]] {
                return $result
            }
        }
    }

Historically, the "everything is a string" approach has hamstrung Tcl's performance for common arithmetic and control operations. The most recent Tcl release, 8.0, dramatically improves performance. Still, Tcl's speed remains an issue for users with stringent requirements and also for those new to the language who don't know how to make best use of its performance profile.
Along with its simplicity -- perhaps because of it -- the strongest argument for Tcl is the success of two of its extensions, Expect and Tk. Expect automates character-based interaction with a power and scope that is hard to communicate briefly. Expect is the most irreplaceable of all the languages we describe here because it uniquely abstracts the terminal control involved in responding to programs that "take over" when requesting passwords or configuration information. Everyone involved in Unix system administration needs to learn Expect without delay.
All these languages have a primary orientation to command-line operation. Fundamentally, they process characters, character streams, files of characters, and so on. However, most of these languages also provide at least one way to build graphical user interfaces. Remarkably, the best of these is the same for most of the languages. Tk is an extension to Tcl that knows how to build graphical user interfaces with native look-and-feel under Windows, X, and MacOS. It's likely that a majority of Tcl users started with the language just for its access to Tk. Recall that this article began with an example of Tk's capability. It has been so successful that it has inspired interface tools like TkPerl, TkPython, TkScheme, and others. Although each of these has the potential for independent development, in fact they are all maintained as derivatives of Tk, a Tcl extension. This is a powerful argument for the "rightness" of Tcl as an extension language.
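To make the point about Tk's reach concrete, here is a minimal sketch of the opening stopwatch driven from Python rather than Tcl. It assumes Python's standard Tk binding, the `tkinter` module, is installed along with Tk itself. The helper names `advance`, `format_seconds`, and `run_stopwatch` are invented for this illustration.

```python
def advance(seconds, step=0.1):
    """Return the elapsed time after one more tick (illustrative helper)."""
    return seconds + step


def format_seconds(seconds):
    """Format elapsed time with one decimal place, as the Tcl script does."""
    return "%.1f" % seconds


def run_stopwatch():
    """Build and run the stopwatch window; needs tkinter and a display."""
    import tkinter as tk

    root = tk.Tk()
    state = {"seconds": 0.0, "stopped": True}
    display = tk.Label(root, text=format_seconds(state["seconds"]))

    def tick():
        # Same logic as the Tcl "tick" procedure: reschedule, then update.
        if state["stopped"]:
            return
        root.after(100, tick)
        state["seconds"] = advance(state["seconds"])
        display.config(text=format_seconds(state["seconds"]))

    def start():
        if state["stopped"]:
            state["stopped"] = False
            tick()

    def stop():
        state["stopped"] = True

    start_button = tk.Button(root, text="Start", command=start)
    stop_button = tk.Button(root, text="Stop", command=stop)
    display.pack(side="bottom", fill="both")
    start_button.pack(side="left")
    stop_button.pack(side="left")
    root.mainloop()
```

Calling `run_stopwatch()` on a machine with a display should bring up the same label and two buttons. The widget names, options, and packing arguments carry over from the Tcl version almost word for word, because the Python layer hands the work to Tk.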
Scripting scalably with object-oriented Python
Python is smaller than the other two languages, in the sense that
there are fewer Python books; its annual conference is a bit more
intimate; and it's generally not quite as well known. Python is
much like Perl and Tcl in its portability, universality, and quality
of implementation. Python has a strong model of object-oriented
programming. This intimidates some beginners. At the same time,
Python is most suitable for "programming in the large." Python has
a reputation for being understandable even for projects that involve
multiple authors and hundreds of thousands of lines of source.
Tcl and Perl have bitter critics; there are people who strongly dislike aspects of each language. Apparently no one reacts to Python that way. This is no lack of character; rather, it testifies to the quality of the design and implementation of Python. It appeals on subtler levels than do Perl and Tcl and is difficult to show at its best in a tiny example. Perl typically is more terse, and Tcl is always easy to understand "in the small." Python's compelling virtue is that it operates at the state of the art as a clean, modern, portable, and extensible language. Aaron Watters, lead author of the most important Python book, Internet Programming with Python, gives in his Web pages the example of a matrix multiplication implementation,
    def mmult(m1,m2):
        m2rows,m2cols = len(m2),len(m2[0])
        m1rows,m1cols = len(m1),len(m1[0])
        if m1cols != m2rows:
            raise IndexError, "matrices don't match"
        result = [ None ] * m1rows
        for i in range( m1rows ):
            result[i] = [0] * m2cols
            for j in range( m2cols ):
                for k in range( m1cols ):
                    result[i][j] = result[i][j] + m1[i][k] * m2[k][j]
        return result

and argues that Python simply has better aesthetics than any competitor.
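Readers who want to try the routine should know that the listing uses the older `raise IndexError, "..."` statement form, which Python 3 no longer accepts. What follows is a lightly adapted sketch, not Watters's own text, that runs on current interpreters, with a small worked example; the matrices `a` and `b` are made up for the illustration.

```python
def mmult(m1, m2):
    """Multiply two matrices given as lists of rows."""
    m1rows, m1cols = len(m1), len(m1[0])
    m2rows, m2cols = len(m2), len(m2[0])
    if m1cols != m2rows:
        # Call form of raise; the comma form above is rejected by Python 3.
        raise IndexError("matrices don't match")
    # Start from an m1rows-by-m2cols matrix of zeros.
    result = [[0] * m2cols for _ in range(m1rows)]
    for i in range(m1rows):
        for j in range(m2cols):
            for k in range(m1cols):
                result[i][j] += m1[i][k] * m2[k][j]
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(mmult(a, b))  # [[19, 22], [43, 50]]
```

The top-left entry, for instance, is 1*5 + 2*7 = 19. The algorithm is the same triple loop as in the original.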
SWIGging down your work
With a good feel for Perl, Tcl, and Python, you now know most of
what you'll ever need for your scripting career. The sidebar
(below) points to the most prominent special-purpose alternatives to
these languages.
There's a final piece to the state of the scripting art. You've already learned how scripting often "glues" together components written in different languages. The interfaces between the components are easy to manage -- though why manage them at all? That's the question a remarkable processor called SWIG poses. It automates cross-platform generation of interfaces to such scripting languages as Perl, Tcl, Python, and Guile from C and C++ source. It also automatically generates documentation formatted for HTML, plain text, and LaTeX. David Beazley originally developed SWIG to construct interfaces to large-scale simulation codes at Los Alamos National Laboratory. He plans to generalize its operation so that it will be easier to accommodate other languages including FORTRAN, Java, and more. As it stands now, SWIG is a lightweight, robust solution that delivers a significant fraction of the benefits many projects want from such complex machinery as CORBA, DCOM, and ILU. Make sure you read Beazley's description, listed in the Resources below, of how he develops C-coded libraries driven by handy scripting shells.
Closing advice
This article has focused on freely available, cross-platform,
general-purpose languages -- that is, those with the qualities our
readers have said interest them most. Eventually, you also might
want to learn about:
Finally, be good to yourself. Learn a scripting language. If you already know one, learn a second. Teaching legacy components to play together nicely will become easier than you imagined, and you'll also prototype serious standalone applications in less time than it used to take just to figure out how to describe them. Once you start scripting, you won't stop.
Resources
About the author
Cameron Laird and Kathryn Soraiz manage their own software consultancy, Network Engineered Solutions, from just outside Houston, Texas.
Reach Cameron at cameron.laird@sunworld.com.
URL: http://www.sunworld.com/swol-10-1997/swol-10-scripting.html
Rexx's strength is its historical base in IBM commercial systems. Rexx is an excellent choice for those who move between big-iron and end-user level client machines. Rexx feels less modern and polished than other languages, but it gives good control over its underlying operating systems.
Scheme and Guile are in the LISP family. Scheme is a small, clean, efficient language with wonderful power and expressiveness. For both good and ill, it doesn't look like mainstream languages such as Java, C, BASIC, Perl, or Pascal. Its user base appears to be smaller than that of Perl, Tcl, and Python, but it inspires zealotry (or perhaps attracts those with a predisposition to zealotry). Guile is a Scheme that the GNU project has force-fed a collection of miscellaneous facilities. It exhibits the typical advantages (think of being able to do anything from within Emacs!) and disadvantages (think of what it takes to do anything in Emacs!) of GNU projects. Anyone already comfortable with LISP might prefer to start scripting with Scheme rather than Perl or Tcl, and will miss little with that choice.
Special-purpose choices
Forth is a relic from a time of severe constraints on memory, clock
cycles, and connectivity. At the same time, it's had a couple of
decades to build on the wisdom of its simplicity. Excellent
implementations are available for a variety of platforms, and it
deserves to be in the toolkit at least of those who program for
real-time and embedded systems.
S-lang and Yorick are particularly good for scientific or engineering calculations. They are easier to use than FORTRAN, perform reasonably well, and exhibit good engineering that can support projects that grow beyond initial expectations.
Icon is a lovely language for text processing. Perl's regular expressions come as a revelation to programmers accustomed to writing and re-writing their own parsers. Icon's power exceeds Perl's by as much again.