The safety of scripting
Cross-language trends in scripting
It will take several paragraphs to explain that, however. Scripts have the reputation in some circles of being amateurish approximations to correct algorithms. But a deeper look shows almost the opposite.
To give some background, we'll first answer two often-asked questions: Is this column an advocacy piece? And, Do we favor one language over another? Yes, and yes. We advocate whatever language helps developers swiftly and accurately meet users' wants and needs. Fulfilling those human requirements sometimes is best done with Eiffel or Prolog, and sometimes with Assembler. Our biggest bias is that, in any situation, we start by looking at the people involved.
When language theoreticians and proponents write about "safety," their first thought is usually about the kind of semantic questions Java's exception-handling addresses. Suppose you're coding an application, and you mistakenly ask for the thirteenth element of an array or vector with 10 members. In C, the result is undefined (sometimes spectacularly so -- this "loophole" is a traditional weakness that crackers attack when invading a host). Java not only explicitly specifies how its applications behave in this circumstance, but it also gives developers programmatic control over that behavior. Both these aspects are important, and both contribute to the safety advantages Java has over C and C++.
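Most scripting languages share this property. The following is a minimal sketch in Python, chosen only for illustration; `element_or_default` is a hypothetical helper, not a library function. It shows both halves of the point: the language defines what an out-of-range access does (it raises `IndexError`), and the program decides how to respond.

```python
def element_or_default(items, index, default=None):
    """Return items[index], or `default` when the index is out of range.

    Hypothetical helper written for this illustration.
    """
    try:
        return items[index]
    except IndexError:
        # The failure is defined by the language; the response is ours to choose.
        return default


members = list(range(10))  # a sequence with 10 members, indexes 0 through 9

# Index 12 is the thirteenth element, which does not exist.
print(element_or_default(members, 12, default="no such element"))
```

Without the `try` block, the same access stops the program with a traceback that names the offending line, rather than silently reading whatever lies beyond the array.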
Practical benefits of readability
Scripting's safety has more to do with organizational process than formal language theory. Several scripting languages have interesting theoretical properties, but for now we want to look at the safety of high-level coding and large-chunk architecture.
Scripting languages are generally high-level languages, which are much more concise and expressive than C and other "systems" programming languages. This is an advantage in several ways.
Less coding yields fewer errors in coding. The incidence of errors in programming seems remarkably constant, at a few per hundred lines of finished source. Projects written in 500 lines of Scheme generally have only one-tenth as many errors as the equivalent 5000-line C applications.
More subtly but perhaps more radically as well, the expressiveness of scripting encourages source studies. Peer review is one of the surest paths to quality in implementation, and so anything that encourages colleagues to read source code is likely to yield large returns. Scripting languages are designed to be easy to use. The consequence is that they're also relatively easy to read, which makes it practical to recruit many reviewers for the same piece of code.
The ease of interpreting scripts benefits code reviews in a way we've yet to see recognized in print. Code reviews of C and Java components are rather abstract; reviewers often ask, "What does that do?" Because scripts are designed for immediate execution, reviewers are much more likely to arrive at a review session having tested pieces of the code. Exercising a C function often requires an elaborate test harness. Scripted procedures or methods, in contrast, are generally easy to load into an interpreter that gives quick results.
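As a small illustration of how little ceremony this takes, here is a sketch in Python. The function `normalize_hostname` is a hypothetical example of the kind of routine a reviewer might be handed; nothing here comes from a particular project.

```python
def normalize_hostname(name):
    """Trim whitespace, lowercase, and drop a trailing dot (hypothetical example)."""
    return name.strip().lower().rstrip(".")


# A reviewer can load the file and try awkward inputs directly,
# with no separate test harness to build first.
for candidate in ["  WWW.Example.COM. ", "localhost", ""]:
    print(repr(candidate), "->", repr(normalize_hostname(candidate)))
```

Starting the interpreter with `python -i` on a file like this leaves the reviewer at a prompt with the function already defined, ready for further experiments.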
Comprehensibility contributes to safety in a related way. Suppose you want to use a program that monitors system function. Prudent system administrators will only let you install an application with security privileges after they thoroughly understand it. If the program is written in C, they may never make the time to verify that it's what it says it is, and that it hides no Trojan horses. Understanding a script is generally much less intimidating. This is why some workers deliberately create security-related work in scripting languages -- users more readily adopt technology they understand.
Focus on algorithms
A final security payoff of high-level languages' expressiveness is analogous to their performance advantages. Although scripting languages have the inherent performance bottlenecks predictable with late binding, the speed of scripted applications often surprises users. What happens is that scriptors can give more of their attention to algorithmic improvements, rather than slogging through details of implementation. As algorithms often differ by multiple orders of magnitude in speed, the overall win goes to the high-level languages that promote algorithmic refinement.
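To make the point about algorithms concrete, here is a sketch in Python of two ways to answer the same question, whether a list contains a repeated item. Both function names are hypothetical. The first compares every pair, so its work grows with the square of the list length; the second remembers what it has seen in a set, so its work grows roughly in proportion to the length.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of items: roughly n*n/2 comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Track items already seen: roughly n set lookups (items must be hashable)."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


sample = list(range(2000)) + [7]  # one repeated value at the very end
print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```

The second version is no longer than the first, which is the argument in miniature: when a better algorithm costs only a few lines, it tends to get written.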
This expressiveness pays off for security concerns also. Programmers relieved of the burden of low-level resource management can better focus attention on substantive security issues.
Scripting's encouragement of "glueing" gives an under-appreciated safety benefit. Traditional wisdom of software development says that prototypes and even first implementations should be discarded, and final deliverables completely rewritten, because designs otherwise accumulate too much "dirt" and rough edges.
Scripting is quite different. While C mostly reuses individual functions, scripting languages emphasize encapsulation of whole processes at a time. Most scripting languages present a variety of "glues": hooks to connect processes, CORBA or DCOM components, client/server pieces, and so on, all the way down to individual C functions. This makes it a straightforward matter to "move the boundaries" of an architecture, even after a prototype has been accepted. Stewart Brand's How Buildings Learn: What Happens After They're Built is a marvelous essay ("the best book I've ever done," Brand says) in which he argues that the best buildings are those that match their occupants' lives. It's good to be able to tear down walls, reroute plumbing, or build other walls. Scripting is in the spirit of Brand's human-scale building. Scripted solutions adapt through their life cycle with relatively little trauma.
In Brand's language, Java is good for building monuments -- architectures destined to endure without change. Scripting languages tend to yield something more like "temporary school buildings" put up in emergencies that are still in use a half-century later.
So, which philosophy is a safer choice for your next project? With scripting languages, the nice thing is that you don't have to choose. You can use scripting's glueing capability to rely on as much or as little of a scripting language as is best for your situation. Scripting languages cooperate well with a formal approach (several have interfaces to Ada), and yet remain easy to use for beginners. Pick the mix of characteristics you need, and you're likely to find a scripting language that fits the bill.
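The process-level glue described in this section can be sketched briefly. The Python example below is an illustration under stated assumptions: the child program is a stand-in written inline so that the example is self-contained, and `run_child` is a hypothetical name. In practice the child could be any command-line tool, written in any language.

```python
import subprocess
import sys

# Stand-in for an external program: it reads integers on standard input,
# one per line, and prints their sum. (Assumption made for this illustration.)
CHILD_SOURCE = "import sys; print(sum(int(line) for line in sys.stdin))"


def run_child(numbers):
    """Feed numbers to the child process and return the sum it reports."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input="\n".join(str(n) for n in numbers),
        capture_output=True,
        text=True,
        check=True,  # raise an exception if the child exits with an error
    )
    return int(completed.stdout.strip())


print(run_child([1, 2, 3]))
```

Because the boundary between the two pieces is just text on a pipe, either side can be rewritten or replaced later without disturbing the other.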
Questions and acknowledgement
Several readers have asked about the Tcl/Tk CD-ROM. We have plans for a fuller review later this fall. Until then, we'll simply point to the Tcl/Tk Consortium's Web page where you can order a copy for yourself. Coincidentally, our thanks for this column go to Jean-Claude Wippler of Equi4 Software, the project manager for the CD-ROM. He is an independent consultant who works across several languages, and discussions with him raised many of the "risk management" topics that appear here. Credit also goes to Stephen Klimaszewski of Pantheon Construction, who has brought architectural concepts in general, and How Buildings Learn in particular, to life for us.
There's nothing fancy about the requirements for a strong scripting language: It must be "easy to use" or "easy to learn." Once enough people have learned it, they want it to expand beyond its original capabilities, and it consequently bulks up and loses its youthful charm. This is a natural progression, which few languages (Lua and PHP are two potential exceptions) escape.
A couple of readers of our September column, "Plenty of headroom left for Perl," took us to task for describing XML as "(roughly) an extension to the HyperText Markup Language (HTML) poised to be even more indispensable than the latter." While we welcome suggestions for explaining computing concepts more precisely, it certainly surprised us that the characterization above would mislead anyone. Tim Bray, co-author of the official XML specification, sometimes goes to an even higher level of abstraction when he calls XML a "logical" extension of Electronic Data Interchange (EDI).
Rather than split terminological hairs, we'll simply urge readers unclear about the relations between XML, HTML, and SGML (the Standard Generalized Markup Language) to read Ed Tittel's excellent feature in February's SunWorld. (See Resources below.)
We do have a policy comment on XML, though: XML is overhyped on the client side and underhyped on the server side, just as Java was during its early days. It'll be years before browsers display XML robustly. With any of the popular scripting languages, though, you can build a server-side XML application today that improves the functionality and maintainability of your document handling. We've seen examples of this already: Early adopters of XML for document management on the server side report great results with simple scripted filter ensembles.
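A scripted filter of this kind can be very small. The sketch below uses Python's standard `xml.etree.ElementTree` module; the document structure (`catalog`, `document`, `status`, `title`) and the function name are invented for the illustration and do not come from any real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical document inventory, invented for this illustration.
SAMPLE = """<catalog>
  <document status="draft"><title>Release notes</title></document>
  <document status="final"><title>User guide</title></document>
  <document status="final"><title>Style manual</title></document>
</catalog>"""


def titles_with_status(xml_text, status):
    """Return the titles of all documents whose status attribute matches."""
    root = ET.fromstring(xml_text)
    return [
        doc.findtext("title", default="")
        for doc in root.iter("document")
        if doc.get("status") == status
    ]


print(titles_with_status(SAMPLE, "final"))
```

Several filters like this, each doing one job, can be chained on the server without any help from the browser.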
There'll be much more to say about Python next month, when Houston (its League City suburb, more accurately) hosts the Seventh International Python Conference. There's already a crescendo in activity, though. Python founder and conference chair Guido van Rossum, for instance, has gone public with plans for a Python Consortium.
The Python Consortium is only one of several current projects looking to benefit from the experience of other language communities:
Perhaps most intriguing is Interscript, an ambitious experiment in "generic programming" by John Skaller. This is a particular form of Donald Knuth's "literate programming."
As the official comp.programming.literate FAQ puts it, "Literate programming is the combination of documentation and source together in a fashion suited for reading by human beings... In general, literate programs combine source and documentation in a single file. Literate programming tools then parse the file to produce either readable documentation or compilable source. The WEB style of literate programming was created by D. E. Knuth during the development of his TeX typesetting software." Note that WEB is the computer language Knuth defined to implement his ideas about literate programming.
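The two outputs the FAQ mentions are traditionally called "tangling" (extracting the source) and "weaving" (producing the documentation). The toy sketch below, in Python, shows the idea only. Its input format, in which code lines begin with `> `, is invented for this illustration and is not the syntax of WEB, Interscript, or any other real tool.

```python
CODE_PREFIX = "> "  # hypothetical marker for code lines; not WEB or Interscript syntax

SOURCE = """\
Greet the reader by name.
> def greet(name):
>     return "Hello, " + name
Then try it once.
> message = greet("Knuth")
"""


def tangle(literate_text):
    """Keep only the code lines, producing executable source."""
    return "\n".join(
        line[len(CODE_PREFIX):]
        for line in literate_text.splitlines()
        if line.startswith(CODE_PREFIX)
    )


def weave(literate_text):
    """Produce readable documentation, indenting code so that it stands out."""
    woven = []
    for line in literate_text.splitlines():
        if line.startswith(CODE_PREFIX):
            woven.append("    " + line[len(CODE_PREFIX):])
        else:
            woven.append(line)
    return "\n".join(woven)


namespace = {}
exec(tangle(SOURCE), namespace)  # run the extracted program
print(namespace["message"])
print(weave(SOURCE))
```

Real literate programming tools add cross-references, named code chunks, and typeset output, but the single-file, two-output structure is the same.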
Skaller's aim with Interscript is to build a collection of tools that facilitate literate work across several computer languages. Among its documentation formats, for example, is the POD (plain old documentation) style of Perl. Skaller has implemented all of Interscript in Interscript itself, except for a bootstrap version written in Python (and soon JPython). Literate programming is one of the truly brilliant ideas in software development, and it's gratifying to see how well it fits with scripting.
Skaller's goals with Interscript are actually much grander than this. He ultimately wants to surpass class-oriented, object-oriented methodologies with "generic programming," which he bases on mathematical category theory. His Web site explains more of this assault on "the main problem of programming system design, [which] is the representation of abstraction." Interscript's first steps have yielded useful applications, and this bodes well for the ultimate success of the project.
...to Larry Wall, recipient of the first annual Free Software Foundation Award for the Advancement of Free Software. The most notable of his many contributions is, of course, his creation of Perl. Wall has received a lot of press lately, including profiles in Forbes and Salon magazines. And in case you missed it, SunWorld did an extensive interview with Wall last August.
About the author
Cameron Laird and Kathryn Soraiz manage their own software consultancy, Network Engineered Solutions, from just outside Houston, TX. Reach Cameron at firstname.lastname@example.org. Reach Kathryn at email@example.com.
If you have technical problems with this magazine, contact firstname.lastname@example.org