Regular Expressions by Cameron Laird

Batteries included

Python Conference showcases scripting capabilities

SunWorld
December  1998

Abstract
December 1, 1998: Batteries included
Cameron and Kathryn report on the Seventh International Python Conference: presenters relate their Python success stories and explain the meaning behind the phrase "batteries included." (1,700 words)
December 15, 1998: Why Eiffel?
In this installment, Cameron and Kathryn justify why they devoted space to Eiffel when most folks don't consider it to be a scripting language. Plus, Python's "dark side" revealed at the conference. (1,400 words)


Were we serious in our October column when we wrote that scripting is a safe choice for high-performance, mission-critical applications? Do scripting languages really do as well as conventional system programming languages?

No. They do better.

More precisely, the presentations and conversations at last month's Seventh International Python Conference outside Houston, TX confirmed our message: the best results come from wisely scripting pieces of functionality built with other technologies.

World-class supercomputing
At the Supercomputing Conference '98, which ran the same week in Orlando, FL, a scripted application won second place in the Price/Performance category of the annual Gordon Bell Prize for supercomputing performance. Meanwhile, Python Conference attendees got an insider perspective on both the prize and the more general topic of commodity supercomputing with Python in a keynote speech by Assistant Professor David Beazley of the Computer Science Department of the University of Chicago.

Beazley entertained the audience with tales of a physics supercomputing group that worked hard at supercomputing problems for years without producing the physics that was its nominal goal. Then the Python scripting language was introduced into the mix.

Within a few months, the scripting system allowed the team to perform physics simulations much more efficiently than was previously thought possible. In the years following the introduction of scripting, simulations have led to articles in Science, Physical Review Letters, and elsewhere.

The situation was roughly this: a group in the Theoretical Physics Division at Los Alamos National Laboratory had very powerful and highly parallelized molecular simulation routines coded for performance. These worked fast and generated gigabytes of data, which were then downloaded to workstations where it might take dozens of hours to render useful visualizations of the results.

Beazley explained how the team progressed from that situation:

Python was used to wrap up the molecular dynamics simulation code and analysis tools into a unified environment where physicists could experiment with them through interactive Python sessions. This allowed the physicists to concentrate on physics rather than writing additional functionality in C, figuring out how to move data between different machines and fighting with decoupled analysis and visualization tools. While typical simulations still run for hundreds of hours, the unified simulation, data analysis, and visualization environment built using Python reduced almost all of the typical post-processing steps from several hours or days to only a few seconds or minutes.

As Beazley told us, "the key point is that Python serves as an excellent 'steering' language. Using the Python interpreter, we were able to glue different components together and steer them in a manner that was much more flexible than what was traditionally available in separate monolithic packages."
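
A toy sketch of the "steering" idea may make this concrete. It assumes nothing about the actual SPaSM code: `run_simulation` and `kinetic_energy` are hypothetical stand-ins for compiled components (in a real system they would be C routines wrapped for Python, for example with SWIG), and the Python layer only decides what to run next and what to inspect in between.

```python
# Hypothetical stand-ins for compiled components. In a real steering setup
# these would be wrapped C routines, not pure Python functions.
def run_simulation(positions, velocities, dt, steps):
    """Advance free particles by simple Euler steps and return new positions."""
    for _ in range(steps):
        positions = [x + v * dt for x, v in zip(positions, velocities)]
    return positions


def kinetic_energy(velocities, mass=1.0):
    """Total kinetic energy of the particles."""
    return sum(0.5 * mass * v * v for v in velocities)


def steer(positions, velocities, dt, chunks, steps_per_chunk):
    """Run the simulation in chunks, analysing the state between chunks."""
    history = []
    for _ in range(chunks):
        positions = run_simulation(positions, velocities, dt, steps_per_chunk)
        # Analysis happens in the same session, with no file transfer step.
        history.append((max(positions), kinetic_energy(velocities)))
    return positions, history


final, history = steer([0.0, 1.0], [1.0, -0.5], dt=0.5, chunks=3, steps_per_chunk=2)
```

The same `steer` loop could just as easily be typed line by line at an interactive prompt, which is the style of work the physicists are described as adopting.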


Snapshot of the fracture simulation

It's true that Python, along with Beazley's SWIG software for linking together heterogeneous components, does raw calculations a couple of orders of magnitude more slowly than the components themselves. But using scripting's expressivity only at the high level for which it is most appropriate renders its computational load negligible in comparison with the overall cost of the application. Moreover, "componentization" of monoliths permitted the physicists, numerical analysts, and software engineers on the team to concentrate on their specialties. A few extra cycles of Python interpretation are a small price to pay for the enormous benefit of enhanced project manageability.
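
The arithmetic behind "negligible" can be sketched as follows. The 100-fold penalty reflects the "couple of orders of magnitude" mentioned above; the 0.01 percent share of work handled by the script is purely an illustrative assumption, not a figure from the conference.

```python
def overall_slowdown(script_fraction, interpreter_penalty):
    """Ratio of scripted run time to all-compiled run time.

    script_fraction: share of the work (measured in compiled-code time)
        that is moved into the scripting layer.
    interpreter_penalty: how many times slower that share runs when interpreted.
    """
    compiled_part = 1.0 - script_fraction
    scripted_part = script_fraction * interpreter_penalty
    return compiled_part + scripted_part


# Assumed numbers: the script handles 0.01% of the work at a 100x penalty.
steering_only = overall_slowdown(script_fraction=0.0001, interpreter_penalty=100)

# For contrast: the whole calculation written in the scripting language.
everything = overall_slowdown(script_fraction=1.0, interpreter_penalty=100)
```

Under these assumptions the steered application runs about 1 percent slower than a fully compiled one, while an all-script version would run 100 times slower.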

The result: A recent world-record calculation -- two weeks of sustained 9.4 gigaflops computation of shock-induced plasticity -- was a scripted application!

There's actually quite a bit more to the story. Supercomputing fans will recognize that 9.4 gigaflops is, in fact, no record in a year when leaders in this area have begun to measure themselves in teraflops. That high end is hit, though, only on multimillion-dollar, highly specialized machines, while this group's Avalon was assembled in a short time as a cluster of off-the-shelf Linux boxes. As Beazley explained it:

The real impact was that for $150,000, it was possible for researchers to build their own "supercomputer" using commodity components. Python's extremely good portability is a benefit since it is relatively easy to maintain our application on a wide range of architectures including special purpose supercomputing systems and commodity PCs.

Why Python?
One reaction we heard from several conferencees was, "This is a smart group of people." How did all this brain power choose Python as its preferred vehicle?

Frank Stajano, researcher at the Olivetti and Oracle Research Laboratory in Cambridge, England, launched the most successful meme of the conference when he projected a slide with the phrase, "batteries included" during the very first application presentation of the week. Numerous other speakers adopted those words to introduce their own thoughts on the wealth of facilities immediately available to Python application programmers. In Stajano's words: "the single most important reason that has me now working in Python ...[is] a library issue. I prefer Python because its standard library is a gold mine."
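
As a small illustration of what "batteries included" means in practice, the sketch below parses a mail message and picks apart a URL using only the standard library. The module names are those of today's Python distribution, which may differ from what shipped in 1998, and the message text and addresses are invented for the example.

```python
from email import message_from_string
from urllib.parse import urlparse

# An invented message; the addresses and URL are placeholders.
raw = """From: alice@example.org
To: list@example.org
Subject: Conference notes

See http://www.example.org/notes/ for details.
"""

msg = message_from_string(raw)          # header and body parsing
subject = msg["Subject"]

# Pull the first link out of the body and split it into components.
links = [word for word in msg.get_payload().split() if word.startswith("http://")]
host = urlparse(links[0]).netloc
```

Nothing here had to be downloaded, built, or installed separately, which is the point Stajano was making.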

There certainly are fewer people now working in Python than in Visual Basic, Java, Perl, and several other languages. How has this small group manufactured a toolkit that achieves such striking results? This is a deep question, and we can answer it only in part. There should be no doubt, however, about the reality of the achievement. Python's object-based specification, sound implementation, and enthusiastic adopters have brought the language to an unusual level: Python programmers expect their projects to be successful. Every language has its advocates. Python is unique in our experience, though, for the quiet confidence of its users that it will reliably meet their needs.

Python at work
The two talks that immediately followed Stajano's presentation exemplified mission-critical, text-managing roles for Python. A small team from CNRI and Reliable Software Technologies presented a highly scalable, highly extensible replacement for the popular Majordomo mailing list manager. Next, Sean McGrath of Digitome Electronic Publishing detailed Python's role in capturing The Official Record of the Proceedings of the Irish Parliament for network and CD-ROM access. This showcased Python's virtues, for the project had strict performance, quality, and scale requirements. As McGrath concluded:

No software aspect of this project took more than one man-week to prototype and a team of three programmers -- sometimes working continents apart -- could pick up code, understand it and be moving forward making changes to it very quickly.

Surely it is not fast. By writing it in, say C++ we could probably get the build time for each day's debate to a matter of seconds. However, it would have taken us many man-weeks to write the code in C++.

McGrath also illustrated Python's "batteries included" character when he stated that the success of his project depended on Python facilities to manage XML: "Java probably has the most comprehensive support for XML but CPython is running a close second."
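
For readers who have not handled XML from Python, here is a minimal sketch using `xml.etree.ElementTree` from the current standard library. That module postdates McGrath's project, so this is not the tooling he used, and the `debate` and `speech` element names and their contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment of a parliamentary record.
record = """<debate date="1998-11-10">
  <speech speaker="Member A">First contribution.</speech>
  <speech speaker="Member B">A reply.</speech>
  <speech speaker="Member A">A follow-up.</speech>
</debate>"""

root = ET.fromstring(record)

# Group the text of each speech under the member who delivered it.
speeches_by_speaker = {}
for speech in root.findall("speech"):
    speeches_by_speaker.setdefault(speech.get("speaker"), []).append(speech.text)
```

A per-speaker index like this is the kind of derived view a publishing pipeline might build before generating pages for network or CD-ROM access.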

Open in all directions
McGrath's use of CPython rather than Python deserves explanation. Perhaps the clearest manifestation of Python's desirability as a development platform is its portability. It isn't just that Python is available for Unix, Windows, MacOS, and several other specialty platforms, although this is important. And it isn't just that Python appears on most new operating systems as quickly as Pythoneers get their hands on them (although it was provocative to see how many conferencees bought WinCE machines during the week to try out the latest port). The point is more that Python is simply open to all other technologies: operating systems, protocols, formats, and standards. Examples abound:

Finally, Python is the smartest way to script Java components. Jim Hugunin of CNRI has reimplemented the Python language specification in a form called JPython that both runs inside a Java virtual machine and provides full access to Java facilities. While the abstract functionality this makes available is evident, it might sound a bit academic -- interpret a language in the virtual machine for a different language? Such an exercise can't be intended for serious work.

The shocking result is that it is. Not only is JPython already surprisingly complete in its scope, but Hugunin presented remarkable performance results in his address. With the advanced compilation techniques he is now working on, he aims to make JPython even faster than the conventional CPython still used for most projects.

McGrath worked at a time when the conventional CPython was the natural choice. CPython has XML abilities close to those of Java. JPython, of course, is Java's equal in this regard, simply by reuse of Java components -- except that it also accesses Python's capabilities.

Safe with Python
Python's a safe choice for almost all projects. Even if requirements change -- if you need to port to a different environment, incorporate a new networking protocol, or change algorithms -- Python's likely to already have those "batteries included."

This month, the obvious site to represent scripting in the real world is SPaSM, where you can see animations of the supercomputed simulations of crystalline materials. (See Resources below.)

Our next installment, on December 15, will discuss the wisdom of having mentioned Eiffel in a column devoted to scripting and touch on what the Python Conference revealed about Python's dark side.

Resources



December 15, 1998: Why Eiffel?
Did Eiffel belong in our November 15 installment?

We weren't exactly surprised when several readers asked this question. Lars Marius Garshol, in particular, suspects:

"Many people will come away from this article thinking that Eiffel is a scripting language, instead of 'C++ done right', which [in my humble opinion] would be a better description."

We have to abbreviate our answer. Our continuing challenge, in fact, is to figure out how to balance precision and depth with the flood of news pertinent to scripters. There's always so much to cover, and we're disappointed that we haven't had room yet to feature:

And we've fallen behind on supplying the detailed examples of working code that several readers have requested. We generally concentrate on "conceptual" topics in this column, and fill the Resources section of each installment with pointers to places where you can learn how to implement specific solutions yourself. Still, we've come up with a few apt comparisons that deserve to be shared as soon as we make space for them.

Why then did we devote a column to Eiffel -- and without a "not really a scripting language" warning label?

First, Regular Expressions always focuses on scripting technology only as a means to fulfilling application requirements. Even if Eiffel doesn't match a particular definition of a "scripting language," the Eiffel story is important for programmers who rely on scripting to achieve reliable results quickly.

A thought experiment
Imagine a parallel universe. In this alternate reality, an academic named Bertrand Meyer originates such seminal concepts as "Design by Contract" [TM], writes an award-winning book on the subject of Object-Oriented Software Construction, and elaborates an imaginary language called Eiffel to illustrate his ideas. Over and over, he and his students demonstrate the power of late binding and the expressivity of a language that automates manipulation of type information -- two of the key ideas behind modern scripting. Practitioners with real working languages glean from his writings ideas fit to be adapted to other idioms. Scripters, in particular, recognize "one of their own" in his emphasis on the "selfishness principle," the "fallacy of defensive programming," and the conditions for rapid program development.

Return to planet Earth. Here, we choose not to penalize Dr. Meyer for making Eiffel real and adding 15 years of industrial success to its record. Despite its practicality, it still has a lot to teach scripters. It's not a liability that it can actually be used, instead of being merely a topic of debate.

In any case, one of this column's open secrets is that its chosen topic of scripting languages is a bogus category. "Script" is more a marketing slogan -- used both for and against particular languages -- than a technical distinction. It's true that when we began the column, we didn't expect ever to mention Eiffel in it. However, modern Eiffel has demonstrated the richness of its reusable modules, its power and portability, and its ability to connect to other technologies and resources. While it's not an interpreted language in the way most scripting languages are, it otherwise competes -- and cooperates -- quite handily with them.

Mr. Garshol is right. When you write your first "Hello, world!" in Eiffel, it will probably feel more like C++ than like awk or VBScript. Over the long haul, though, your productivity with Eiffel will likely remind you more of a high-level scripting language. This is particularly true now, after a year which has seen public release of many new Eiffel facilities, including class libraries, compilers, and development frameworks. Give them a try.

Python's dark side
In our December 1 installment, we promised a look at the dark side of Python. More precisely, we want to round out our coverage of the Seventh International Python Conference with a glance at the growing pains on the minds of the attendees there.

Our earlier headline was the "batteries included" characterization many at the conference cheerfully adopted.

But Greg Ward, in his final-day developers' session called "Extension Building Considered Painful," posed this question: "What happens, though, when you have a job for which Python's batteries (its standard library) aren't quite enough?"

Ward is a recent hire at the Corporation for National Research Initiatives (CNRI), the same non-profit that currently employs Python's creator, Guido van Rossum. He introduced the theme with a comparison between Python, which he began to use only a few months ago, and Perl.

Programmers familiar with Perl know what to do whenever their needs go beyond the standard language distribution: Go to the Comprehensive Perl Archive Network (CPAN). CPAN is a collection of about 100 software sites around the world that mirror scores of utilities, extensions, modules, and documents beyond Perl's standard distribution. Dive into CPAN, and chances are you'll find the specialty piece that does what you need. Even better, it'll fit with your other Perl work -- the modules are packaged in a standard way, with a standard installation method documented in a standard fashion.

Life in Python land isn't quite so comfortable. It's a bit harder to locate contributed pieces. Worse, once you get them, "you'll have to solder them in yourself" because they're unlikely to snap-fit, in Ward's vivid characterization.

Ward and the other participants in his session recognize that the value of CPAN is not merely that it's an accessible Web site, or that it has a convenient search tool, although those technical commodities are necessary for its success. CPAN's achievement is most dramatically a social one: It crystallizes an entire matrix of expectations and technologies. Enthusiastic contributors know there's a standard way to wrap up their work and submit it to CPAN. Templates make it easy for Webmasters to mirror CPAN onto a new site. Perl programmers have to learn only a few stereotyped steps to download and incorporate any of CPAN's resources.

The Python community is ready for its contributed extensions to have their own smoothly run CPAN-like site. The consensus at Ward's session was that all technical issues are resolvable, and that the distribution utilities logistics and social engineering special interest group (distutils-SIG) is ready "to make building Python modules, packaging them for distribution, and installing them on a target system painless and standardized."

In fact, Pythoneers aspire to go beyond CPAN. Open source advocate Eric Raymond, on hand at the conference to deliver the keynote address on "Homesteading the Noosphere," also spoke informally on Trove. Trove is a technical project that uses Python to build on Raymond's experience running the Sunsite archive "to create an open source distributed archiving system for use at large software archive sites." It promises to be richer and more scalable than CPAN; one ambition, for example, is to automate dependency checking so that modules know to download other modules on which they rely.
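
Automated dependency checking of this kind comes down to ordering modules so that each one is fetched after everything it relies on. The sketch below shows that idea with `graphlib` from the current standard library. It is not Trove's design, and the module names and the `install_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def install_order(dependencies, target):
    """Return the modules needed for `target`, dependencies first.

    dependencies maps each module name to the names it relies on.
    """
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        # Record this module's direct dependencies and visit them in turn.
        needed[name] = set(dependencies.get(name, ()))
        stack.extend(needed[name])
    # TopologicalSorter raises CycleError if modules depend on each other.
    return list(TopologicalSorter(needed).static_order())


# A hypothetical archive index.
archive = {
    "plotting": ["numeric", "imaging"],
    "imaging": ["numeric"],
    "numeric": [],
    "unrelated": ["numeric"],
}

order = install_order(archive, "plotting")
```

Only the modules reachable from the requested one are considered, so `unrelated` is never downloaded when `plotting` is installed.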

Python's other challenges also appear to be more or less equally amenable to hard work and bright ideas, which suits the technical crowd perfectly. In contrast with Java, no one in this group seems disposed to file a lawsuit, or even harbor ill will. Creation of the Python Consortium represents enough of a change to make someone itchy, but all the programmers who expressed an opinion appear confident that CNRI Vice President Albert Vezza will manage it as productively as he does the World Wide Web Consortium and Internet Engineering Task Force. People agree they want more books on and publicity about Python -- and those are coming. The biggest controversies of the week had to do with technical subtleties of typing systems. (Example: What disadvantages are there to allowing, but not requiring, type declarations in a context where they've been forbidden until now?) Combatants generally agreed to go off and experiment more.

Python's dark side turns out to be only a little grey. This is one language still in its expansion phase.




About the author
Cameron Laird and Kathryn Soraiz manage their own software consultancy, Network Engineered Solutions, from just outside Houston, TX. Reach Cameron at cameron.laird@sunworld.com. Reach Kathryn at kathryn.soraiz@sunworld.com.


[(c) Copyright  Web Publishing Inc., and IDG Communication company]


URL: http://www.sunworld.com/swol-12-1998/swol-12-regex.html