Speakers at Eighth World Wide Web Conference demonstrate solutions
June 15, 1999: Unraveling threads
New releases of Perl and Tcl are making threads programming easier. (1,300 words)
There's still mindshare to win.
That was evident at last month's Eighth World Wide Web Conference in Toronto. Even among the technically sophisticated audience of WWW8, not everyone gets scripting's achievement yet. We'll look this month at a couple of underappreciated languages that made appearances at WWW8.
PHP: Productive hidden plumbing
PHP is one technology with a particularly lofty ratio of accomplishment to recognition. Its creator, Rasmus Lerdorf, reported that the latest survey results from Netcraft show that more than one out of every 10 Web servers appears to be equipped to process PHP. Such prominent sites as XOOM.com, Apache.org, Red Hat Software, Freedows OS, First USA Bank, FreshMeat, Miss USA 1999, SegFault, and Volvo rely on PHP.
But how many of you know of PHP? It certainly doesn't seem to garner as much attention as Active Server Pages (ASP) or servlets. This low profile reflects the personal modesty of the creative team behind PHP. Lerdorf insisted on calling it just a "macro tool" for its first two years. He's now willing to acknowledge it as a full-fledged language, and his focus continues to be on end results and the hundreds of thousands of developers who rely on PHP. It sounds entirely natural when he explains, "The software is irrelevant; the user community is what's important."
Lerdorf and the five other "core developers" have nurtured that community with several wise decisions. The PHP home page is a model of engineering virtues. It loads so quickly that PHP programmers commonly use it as an online programming reference manual. Automatic mechanisms are in place to incorporate reader comments and updates to the documentation -- in real time. The core development team makes a point of conducting most of its business on a public mailing list that is conveniently archived at the site.
PHP stays close to users' needs
All these actions keep PHP close to its users and their needs. It's a simple language ("a lot like C, with the dangerous parts taken out," Lerdorf said once) that is easy for nontechnical site developers to learn. It's lightweight -- low-end hardware sustains million-hit-per-day processing with PHP. Its well-structured extensibility has spawned a rich variety of supported protocols, including the widest range of database interfaces we know. The majority of PHP users we've met adopted it because they say they can connect to every database they'll need, even such peripheral ones as Adabas, MS-SQL without ODBC intermediation, and ndbm. PHP is available for all the leading Web servers. Finally, PHP is relatively safe; built-in guards protect host servers against runaway recursions, memory leaks, or similar end user coding errors.
At the WWW8 conference, PHP core codeveloper Stig Bakken presented the advantages of PHP4, which is now available in a limited release. He lauded the parsing rewrite by core team members Andi Gutmans and Zeev Suraski, citing performance increases that range up to a factor of ten. Their new parser, accessible as a portable module called Zend, exposes enough internals to permit construction of after-market optimizers (including a cacher for compiled code) and debuggers. PHP4 also rectifies a few mistakes in PHP's syntax and provides a mechanism for object overloading. This facilitates properly polymorphic interfaces with such external backends as COM and CORBA.
PHP4's improved modularization manifests on a couple of levels. Specific rewrites of the Web server and database interfaces make them more portable than ever. Moreover, the scheme is strong enough to "make it very easy for third-party developers to maintain and distribute their own PHP extensions, independently of the PHP distribution," according to Bakken.
PHP's principal stumbling block seems to be its relative market invisibility, not any particular technical deficiency. A typical comment made to the mailing lists is, "PHP is great! I just discovered it today, and I already did x. Now, how do I do y?" The first PHP book, Leon Atkinson's Core PHP Programming, which made it to bookstore shelves just a few weeks ago, should help with many of those questions.
The core developers have also been working on a book. Lerdorf gravely reported, "A year ago, we were about 90 percent done. We've been programming more than writing, though, so now we're about 65 percent finished."
An organization favoring open source, though, will find that PHP's useful and fresh Web site and enthusiastic user community make up for these deficiencies. If you're looking for an easy way to bring disparate computing resources to the Web, give PHP a chance.
A little background first: XML was probably the most motivating technology at this conference. XML -- Extensible Markup Language -- is roughly the successor of HTML (HyperText Markup Language). While most discussions focused on its client-side potential to manage interesting display renderings, nearly all useful deployments of XML thus far have been on the server side. Pessimists speculate that browser manufacturers might never get their houses sufficiently in order to support meaningful client-side work.
This is a stunning achievement. First, Mozquito is written with sufficient care to be portable. Stack Overflow tests it even on Internet Explorer for the Macintosh, which is notoriously nonconformant.
Mozquito is also useful. The high point of Schnitzenbaumer's talk was an illustration of how XML processing makes it feasible for a nonprogramming page designer to put together and maintain an application as complex as an appointment book calendar. Mozquito introduces appropriate general-purpose persistence and rendering constructs that bring quite sophisticated projects within the grasp of design specialists. Moreover, it's all expressed in XML, so that Mozquito gains the leverage of XML development and service environments. Stack Overflow itself is a member of the World Wide Web Consortium, and Schnitzenbaumer presented examples in terms of a prototype for the emerging Extensible HyperText Markup Language (XHTML) standard which his company has integrated in a project called OpenForms.
Brent Welch, author of Practical Programming in Tcl and Tk, and Guido van Rossum, the inventor of Python whom Dr. Dobb's Journal recently recognized with an Excellence in Programming Award, quizzed Schnitzenbaumer skeptically. While they established limits to Mozquito's role -- it can't overcome browsers' inherent security defects, for example -- Van Rossum finally concluded, "That's cool."
Scripting is so Amusing
This month's "Scripting in the Real World" example centers on American Used (Amused) Clothing of West Lafayette, IN, a storefront for a small retail trade catering to the college-town crowd. Frugal budgets don't lock Amused out of clever data processing, however.
All of Amused's physical point-of-sale (POS) and inventory transactions originate with bar code readings. Inexpensive hardware translates scans into keypress equivalents on the one low-end PC currently supporting operations. The data appears as entries in forms within a simple application.
Results have been rewarding enough that Amused plans to expand to a back-room server hosting several new applications fed by thin clients at multiple cash registers. What's interesting is that the application's architecture is already fully distributed, although it currently lives on a single, disconnected node. Here's why:
What's the advantage of this approach over a Visual Basic (VB) program, which most Win* workers probably regard as the natural solution on this platform? Palaver's Web-based coding is inherently portable, distributable, and scalable, and it simplifies deployment. Moreover, the next generation of Palaver's deliveries rests on an interesting metadata architecture for all visual elements. Rather than relying on "manual" layout of graphical user interfaces (GUIs), as VB prescribes, applications compute their GUIs at run-time from metadata kept in the same SQL datastore with the application data. This architectural insight, the same one behind much of XML's promise, yields payoffs in rapid generation of reliable GUI-based programs.
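To make the metadata idea concrete, here is a minimal sketch in present-day Python of a form whose layout is computed at run-time from rows in a SQL table. It is not Palaver's code: the `form_fields` table, its columns, and the `render_form` helper are all hypothetical names invented for illustration, and an in-memory SQLite database stands in for whatever datastore the real application uses.

```python
import sqlite3
from html import escape

# Hypothetical metadata table: one row per visual element of a form.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE form_fields "
    "(form TEXT, position INTEGER, name TEXT, label TEXT, kind TEXT)"
)
db.executemany(
    "INSERT INTO form_fields VALUES (?, ?, ?, ?, ?)",
    [
        ("sale", 2, "price", "Price", "text"),
        ("sale", 1, "barcode", "Bar code", "text"),
    ],
)

def render_form(conn, form):
    """Build an HTML form from the metadata rows stored for `form`."""
    rows = conn.execute(
        "SELECT name, label, kind FROM form_fields "
        "WHERE form = ? ORDER BY position",
        (form,),
    )
    lines = [f'<form name="{escape(form)}">']
    for name, label, kind in rows:
        lines.append(
            f'  <label>{escape(label)} '
            f'<input type="{escape(kind)}" name="{escape(name)}"></label>'
        )
    lines.append("</form>")
    return "\n".join(lines)

html_text = render_form(db, "sale")
print(html_text)
```

Adding a field or reordering the form then becomes a change to the data, with no layout code to touch.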
Brian Capouch, owner of Palaver, says the success of the application means "freedom from the welter of paper-based or 'item description-based' ways in which this sort of thing has been done in the past." Even though Amused is "too small to gain access to corporate databases of UPC [universal product code] information ... this is a practical way for them to gain competitive equity with the 'big boys' in terms of inventory and POS management."
A lot of help goes into each installment of Regular Expressions. This time, we especially want to recognize the contributions of Palaver coders Bob Dusek and Eric McKeown for explaining their work, Ian Graham, who organized the Developer Day component of WWW8, Paul Prescod, who sparked the Scripting session of WWW8, and many generous PHP users, including Frank Balak, Mark Roedel, Shane Caraveo, Bjoern Borud, Jim Wimstead, and Jens Ellegiers.
June 15: Unraveling threads
Threading is getting easier.
Threading has traditionally been a hard subject. Programming threads is difficult and error prone. Here are the essentials you need to know:
Let's look at each of these points in more detail.
A thread is a lightweight "execution unit." A process is a not-so-lightweight execution unit. Every thread operates within a process; some processes have more than one thread.
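A small sketch can make the relationship visible. The example below, written in present-day Python with its standard `threading` module, starts three threads and has each record the ID of the process it runs in. The names `worker`, `results`, and `results_lock` are ours, chosen for illustration.

```python
import os
import threading

results = []                       # one list, visible to every thread
results_lock = threading.Lock()    # guards appends to the shared list

def worker(label):
    # Each thread records its label and the ID of its enclosing process.
    with results_lock:
        results.append((label, os.getpid()))

threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for _, pid in results}
print(len(results), "threads ran inside", len(pids), "process")
```

All three threads report the same process ID, and all three write to the same list, which is exactly why a lock is needed.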
While even these basics aren't strictly true for a few unusual operating systems, they're reasonably safe for the leading desktop and server OSs.
The details of what a thread is and how it works vary considerably. Although it's generally an OS-level concept fundamental to the operation of all system services, specifics can differ even from one release of an OS to another. On Solaris, you're likely to see references to "green" threads, pthreads, and Java threads, if not others.
Along with understanding the essence of different threading implementations, you'll also encounter a lot of questions about how to use them. Threads appear in a programming facility (a language or library, for instance) in at least a couple of distinct ways. The facility might support threads -- that is, manage creation, operation, and destruction of multiple threads. A separate matter is whether the facility can safely be embedded within other threads. Examples of each of these show up below.
There's also a notion of threading in language theory, as it arises in modeling Forth and its relatives. This is not directly related to the kinds of threads making the news most recently.
Threading is good and getting better. A decade ago, it was recognized as an important but touchy concept. Since that time, enough books have been published, pertinent Usenet discussions floated, classes given, and illuminating application experience accumulated that we collectively have far more confidence about how threads can usefully be handled.
One of the major advances is this: Higher-level threading is now portably available through more expressive languages. Hundreds of thousands of programmers have learned to work with Java's threads.
Scripting languages bring their usual advantages to threaded programming, including excellent portability and minimal need for bookkeeping.
Even with all these advances, thread programming remains rife with mythology and delusion. The two worst misconceptions are that programming with threads boosts performance, and that multithreading is necessary to achieve multiprogramming.
Threading is a programming model, but it's certainly not the only one. Under Unix, for example, it's common to design architectures with cooperating processes, and several other languages support a third (and sadly neglected) alternative, called co-routines. However, several commercially important operating systems, including OpenVMS and Win*, implement kernels in which process creation and/or process context switches are relatively expensive. This performance hit has tarnished the wider reputation of process management and influenced programmers to use threads for multiprogramming. Keep in mind, though, that this is a contingent consequence of implementation design, not a fundamental constraint.
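For readers who haven't met co-routines, here is a rough sketch of the idea using generators in present-day Python. Two tasks take turns under a tiny round-robin scheduler, with no threads and no extra processes involved. The `task` and `run_round_robin` names are hypothetical, and a real co-routine facility would offer more than this.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: do one step, then hand control back."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # voluntary switch point

def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
            ready.append(current)  # still alive: back of the line
        except StopIteration:
            pass                   # task finished; drop it

log = []
run_round_robin([task("a", 2, log), task("b", 3, log)])
print(log)
```

The two tasks interleave their steps even though only one flow of control ever exists, which is the sense in which multiprogramming does not require multithreading.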
Even with such operating systems as Windows NT, threaded programming isn't always the right approach. Many classes of program design are not sensitive to the differences in performance between threads and processes. In others, threaded designs perform worse than monolithic ones. Performance responds to threading most often on multiprocessor hosts -- and even then the facts are generally more complex than any simple, "threading-is-fast" equation.
It's most advantageous to write threads when you're working within a programming model that supports them well. This is the case for such market-leading environments as Java and the Win32 API. Let's see now how well the leading scripting languages do.
Threading has been in the news lately. Scriptics Corporation announced at the end of May that:
Tcl/Tk 8.1 is a major new release of the Tcl/Tk open source scripting language that establishes Tcl/Tk as the industry's first scripting language capable of handling enterprise-scale integration tasks.
...Thread-safety now allows developers to use Tcl and to get the benefit of rapid scripting in high-performance multithreaded applications.
Keep in mind that Scriptics' founder and president, Dr. John Ousterhout, wrote a celebrated paper in 1996 called "Why Threads Are a Bad Idea (for most purposes)." In fact, Tcl continues to support an alternative multiprogramming model -- one based on event handling -- along with its newly acquired thread awareness.
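The event-handling style is easy to sketch. The toy loop below is written in present-day Python and is only loosely modeled on the way Tcl scripts schedule work with the `after` command. It is not Tcl's implementation: the `EventLoop` class, its simulated clock, and the `tick` handler are all assumptions made for illustration.

```python
import heapq

class EventLoop:
    """A toy event loop driven by a simulated clock."""

    def __init__(self):
        self._pending = []   # heap of (due time, sequence, callback, args)
        self._seq = 0        # tie-breaker that keeps same-time events in order
        self.now = 0

    def after(self, delay, callback, *args):
        """Schedule callback(*args) to run `delay` ticks from now."""
        heapq.heappush(
            self._pending, (self.now + delay, self._seq, callback, args)
        )
        self._seq += 1

    def run(self):
        """Dispatch events in time order until none remain."""
        while self._pending:
            when, _, callback, args = heapq.heappop(self._pending)
            self.now = when
            callback(*args)

log = []
loop = EventLoop()

def tick(name, period, remaining):
    # A handler that records the time and reschedules itself.
    log.append((loop.now, name))
    if remaining > 1:
        loop.after(period, tick, name, period, remaining - 1)

loop.after(0, tick, "fast", 1, 3)
loop.after(0, tick, "slow", 2, 2)
loop.run()
print(log)
```

Two independent activities make progress here, yet each handler runs to completion before the next begins, so no locking is needed.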
The April release of Perl 5.005 improved on Perl's OS-level multithreading capabilities. It's typical of the core development team's engineering rigor to continue to label them "experimental."
Both Tcl and Perl emphasized "programming in the small" early in their histories. Threads, along with object orientation, packaging methods, and consistent portability, mark the growth of these languages' use on larger and longer-lived applications.
Since its first release, Python has had all these: object orientation, threads, cross-platform capabilities, and modules. Python's threads are unusual, though. While Perl threads don't correspond exactly to any other common threading model, they can be regarded as pthreads for most purposes. Python's threads don't match OS facilities so directly, but act on the Python interpreter's execution stream. Python's threading gives a smooth feel and pleasant load balancing when managing I/O-bound applications. However, the official release of Python does not yet have special threading optimization for purely computational problems. A consequence is that compute-bound Python applications in need of performance boosts are often better designed with separate processes, rather than separate threads.
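The I/O-bound case can be sketched as follows. The example uses present-day Python's `concurrent.futures` module, which is far newer than the Python release discussed here, and it lets `time.sleep` stand in for a blocking network or disk wait. The `fake_io` function and the delay values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    time.sleep(seconds)    # stands in for a blocking network or disk wait
    return seconds

delays = [0.2, 0.2, 0.2, 0.2]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(delays)) as pool:
    # The four waits overlap, because a sleeping thread lets the others run.
    results = list(pool.map(fake_io, delays))
elapsed = time.perf_counter() - start

print(f"{sum(delays):.1f}s of waiting finished in {elapsed:.2f}s wall-clock")
```

For compute-bound work, the same module's `ProcessPoolExecutor` is the process-based counterpart, in line with the advice above to prefer separate processes there.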
Rexx originally specified no thread awareness. Some implementations, even early ones, were thread-safe. Object Rexx and NetRexx have supported threads (Java threads, to be precise, for the latter) since they first appeared.
Lua is one global variable away from being thread-safe. A thread-safe but slightly incompatible version is available, and is in fact the basis for CGILua. The Lua development team has plans for Lua 4.0 to be thread-safe.
PHP3 is not thread-safe; PHP4 is.
Even when the languages are implemented in thread-safe ways, challenges remain. Modern scripting languages are characterized by the fact that much of their value lies in the modules or extensions or libraries written outside the core. This approach presents a particular puzzle with threading: many of the most widely used extensions for Perl, PHP, and Tcl, at least, are not thread-safe. Is it important that all extensions be modernized? Should this be done with programmatic coarse mutual exclusion locks (mutexes) on invocation, through detailed engineering source reviews, or with some other mechanism? Who will do all the work? Discussions like this continue behind the scenes for many of the languages.
Threading isn't even unique in raising these thorny issues about how modules should be managed. In the second half of 1999, we'll look more deeply at enterprise-class applications with a scripting component. Unicode support, on which we've commented in previous columns, is like threading in that:
Also, threading remains a complex programming skill in the sense that the semantics vary from language to language. The great compliment paid Brian Kernighan and Dennis Ritchie when they first issued their classic book, The C Programming Language (see Resources), was that if you weren't sure how something worked, it was safe to guess the sensible alternative. With threads, we're all still learning what the "sensible" or "natural" choices are. It's not just Python threads that will surprise programmers coming from C or Java. For example, Tcl constructs in different threads are relatively "insulated," in the apt characterization of Alexandre Ferrieux, an engineer for France Telecom. In contrast with typical C practice, where threads look "bare" and communicate rather promiscuously, Tcl emphasizes safe programming, and its threads operate in that same spirit.
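To illustrate the "insulated" style, here is a sketch in present-day Python in which a worker thread shares no variables with its caller and communicates only through message queues. It is an analogy, not Tcl's threading interface, and the `worker`, `inbox`, and `outbox` names are ours.

```python
import queue
import threading

def worker(inbox, outbox):
    """Square each number received; stop when None arrives."""
    while True:
        message = inbox.get()
        if message is None:       # sentinel: no more work
            break
        outbox.put(message * message)

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in (1, 2, 3):
    inbox.put(n)                  # the only way in
inbox.put(None)
t.join()

squares = [outbox.get() for _ in range(3)]   # the only way out
print(squares)
```

Because the two sides touch nothing but the queues, there is no shared state for them to corrupt, at the cost of copying every value that crosses the boundary.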
Where to go from here?
Use threads because they fit your application architecture and programming methodology. Analyze performance as a separate matter.
Even if you reviewed the various scripting languages' threading capabilities a year ago, it's time to give them a look again. Considerable progress has been made in this area.
As we're sure you're now aware, understanding threading is no simple task. If you want to explore this area further, start with the books by David Butenhof, Steve Kleiman, or Bill Lewis (see Resources). When you settle on a particular language, learn in detail how that language exposes threading concepts.
Special thanks to Tom Christiansen, Mike Cowlishaw, Sumner Hayes, Roberto Ierusalimschy, Andreas Kupries, Rasmus Lerdorf, Steven Majewski, Gordon McMillan, Tim Peters, and Larry Virden for their help understanding threads.