Documentum: Flagship of enterprise document management
Complete document management for the enterprise
Documents are emerging as the dominant metaphor for organizing and presenting knowledge. The requirements for managing documents are a superset of the requirements for managing data. Documentum, Inc.'s Enterprise Document Management System (EDMS) meets the most important document management requirements -- and then some. (2,800 words)
More and more organizations are coming to understand that they must manage their knowledge, as opposed to their data. As a result, they are looking beyond traditional database management to document management. As we pointed out in the August column ("Enterprise Document Management: from Data to Documents," SunWorld Online, August 1995), the document is emerging as the dominant metaphor for knowledge organization and presentation, and the requirements for managing documents are a superset of the requirements for managing data. The Enterprise Document Management System (EDMS) from Documentum, Inc. (Pleasanton, CA), meets the most important document management requirements -- and then some.
Big 6 bonanza
You might decide to build a document management system if your organization has lots of knowledge to manage and has the necessary computing infrastructure -- primarily, computers on virtually every desktop and a LAN. Given this starting point, you could identify a client/server architecture and a key set of components, and integrate them yourself (or have an outside firm do the work). For example, you might start with a large server machine and load it with an industrial-strength database system -- relational or object-oriented -- and a full-text index and search engine like those from Verity, Fulcrum, Conquest (acquired by Excalibur Technologies), or PLS. Then, assuming you have the right connectivity software (e.g., TCP/IP stacks) available for your client machines, you have the following system integration tasks at hand:
That's a lot of work. Yet it all needs to be done to create a document-management system that adds value to content and that people will want to use. It's no wonder many large system integration firms have found lucrative practices in document management: The bill for this type of effort could easily run to millions of dollars, not counting the cost of hardware or commercial software.
Documentum is one of a growing number of vendors that have recognized the value of doing much of this integration in a generic way and packaging it as off-the-shelf software. Documentum was founded by Howard Shao and John Newton, two refugees from the database vendor Ingres. Ironically, the major source of the company's funding has been Xerox. (I say "ironically" mainly because, rather than strengthen its relationship with Documentum, Xerox's XSoft division has released its own document-management system called Astoria, a product that, while oriented toward the SGML markup language, has only a subset of Documentum's features.)
I want a new drug
The company found its initial source of revenue in a rather unlikely place. When you think of document-intensive industries, what comes to mind -- law firms? Publishing companies? Pentagon agencies? Try pharmaceuticals. In the drug business, companies bet fortunes on their ability to invent blockbuster drugs and get them to market (and under patent protection) faster than their competitors. The major bottleneck in getting a drug to market is the Food and Drug Administration's approval process, which takes forever and requires -- that's right -- mountains of paperwork. A New Drug Application (NDA) can easily total over 100,000 pages.
As you can imagine, assembling such documents is a nightmare. Entire departments exist at drug companies for the purpose of putting NDAs together. Yet drug companies simply want to get them done in as little time as possible; furthermore, they aren't particularly picky about style, design, or editorial techniques -- unlike, say, a magazine publisher. Therefore, drug companies are eager to embrace anything that will cut down on NDA cycle time. In other words, they are ripe candidates for document management. Documentum, as a result of this marketing insight, has installations in virtually all of the world's major pharmaceutical concerns.
Documentum has a client/server architecture based on Unix servers (most major flavors), a relational database (Oracle or Sybase), and the Verity TOPIC text engine. Major components on the client side include a run-time library for Documentum's API and a user interface called Workspace. Supported clients include Windows 3.1, Macintosh, and X Window System on Unix.
Documentum has built most of the system components listed above and added others as well. The system contains several major technical achievements. One of the most interesting of these is its integration of the relational database with the Verity text-search engine. Ideally, database vendors would already have added full-text indexing capabilities to standard relational or object-oriented models and to SQL. Oracle and others have proposed text-oriented SQL extensions, but they are not yet implemented. As database vendors begin to implement their content-management strategies, they will surely move toward inclusion of full-text indexing into their core products.
In the meantime, vendors like Documentum have to integrate database management with full-text indexing at a higher level. Documentum has done this with a query language called DQL (Documentum Query Language), which is essentially an SQL superset. The DQL parser figures out which part of a query is relational and which is text, and farms each part out to the right query processor on the server. Documentum's Workspace includes a GUI-style query interface that allows users to intermix text and metadata queries; when a user executes a query, it creates a DQL statement and passes it to the server for execution. Workspace also includes a DQL window for power users.
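To make the divide-and-dispatch idea concrete, here is a minimal Python sketch of a mixed query. It is not DQL and does not use Documentum's API; the `Doc` record, the sample data, and the functions `relational_part`, `text_part`, and `mixed_query` are all invented for illustration. It assumes the simplest possible combination rule: each half of the query goes to its own processor, and the two result sets are intersected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: int
    author: str   # structured metadata, the "relational" side
    body: str     # free text, the "full-text" side


# Hypothetical sample data, not taken from any real DocBase.
DOCS = [
    Doc(1, "shao", "stability study for the new compound"),
    Doc(2, "newton", "clinical trial summary for the new compound"),
    Doc(3, "shao", "travel expense policy"),
]


def relational_part(docs, **attrs):
    """Stand-in for the database: match metadata attributes exactly."""
    return {d.doc_id for d in docs
            if all(getattr(d, k) == v for k, v in attrs.items())}


def text_part(docs, *terms):
    """Stand-in for the text engine: every term must occur in the body."""
    return {d.doc_id for d in docs
            if all(t.lower() in d.body.lower().split() for t in terms)}


def mixed_query(docs, attrs, terms):
    """Send each half of the query to its own processor, then intersect."""
    return sorted(relational_part(docs, **attrs) & text_part(docs, *terms))


print(mixed_query(DOCS, {"author": "shao"}, ["compound"]))  # prints [1]
```

A real server would presumably be smarter about ordering the two halves (running the more selective one first, for example), but the user-visible effect is the same: one query, two engines, one answer.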
The render vendor
Another major feature of Documentum is that it does more with documents than simply store them as files that are surrounded by some structured data attributes. Documentum's primary unit of data is the DocObject, which is a file with metadata attributes as well as two other characteristics: versions and renditions, as shown in the figure below. Documentum contains a version-management facility similar to that of a software-configuration-management tool like Intersolv PVCS. It lets you check out objects, make changes, and check them in again as new versions or sub-versions, so that entire version trees can be created. By default, of course, you get the latest version of an object when you check it out.
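For readers who have not used a configuration-management tool, the following Python sketch shows what a version tree amounts to. Everything in it is an assumption made for illustration: the class names, the `check_out` and `check_in` signatures, and especially the numbering scheme (1.0, 1.1, 1.0.2, and so on) are invented here and are not Documentum's.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    label: str
    content: str
    parent: Optional["Version"] = None
    children: list = field(default_factory=list)


class VersionedObject:
    """A document whose check-ins form a tree rather than a single line."""

    def __init__(self, content):
        self.root = Version("1.0", content)
        self.latest = self.root            # tip of the main line
        self.by_label = {"1.0": self.root}

    def check_out(self, label=None):
        """Return the requested version, or the latest one by default."""
        return self.by_label[label] if label else self.latest

    def check_in(self, base, content, branch=False):
        """Store new content as a successor (or sub-version) of `base`."""
        if branch:
            # Sub-version: hangs off `base` without moving the main line.
            label = f"{base.label}.{len(base.children) + 1}"
        else:
            major, minor = self.latest.label.split(".")[:2]
            label = f"{major}.{int(minor) + 1}"
        new = Version(label, content, parent=base)
        base.children.append(new)
        self.by_label[label] = new
        if not branch:
            self.latest = new
        return new


doc = VersionedObject("first draft")
v1 = doc.check_out()
v2 = doc.check_in(v1, "second draft")
side = doc.check_in(v1, "legal review copy", branch=True)
print(doc.check_out().label)  # prints 1.1
```

The sub-version hangs off version 1.0 without disturbing the main line, so a default check-out still returns the second draft.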
Renditions, on the other hand, are a more innovative (and recently added) feature. Documentum recognizes that information in today's world often needs to be delivered in multiple formats and over more than one medium (in the media biz, we call this repurposing or multipurposing). Therefore, it's handy to be able to distinguish between logical content and its presentation. For example, your company might have a product catalog that is developed in Microsoft Word, but you would like to make it available online internally to people who may not have Word on their computers and externally over the World Wide Web. The solution is to also maintain a view-only version, using a free-browser viewing format like PDF (Adobe Acrobat), and an HTML version for the Web. Documentum lets you store all three as renditions of the same DocObject.
Of course, you need to create these alternative renditions; that brings us to Documentum's API. The Workspace client is an application that uses the API, yet it only exercises a small portion of the API's functionality. One of the most interesting aspects of the API is the capability of attaching code to operations such as DocObject check-in or check-out. The implications of this are enormous. Consider the multi-rendition scenario above. Documentum includes a set of format conversion filters that it licenses from Mastersoft (now owned by Frame, which was recently acquired by the ever-voracious Adobe). It also allows you to install any other programs you may have that do format conversion. In the above example, you could write a small amount of API code that causes Documentum to create a PDF rendition using Adobe PDF Writer and an HTML version using Microsoft Internet Assistant for Word. This would happen totally automatically when the user checks in a Word document.
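The pattern behind this is a simple callback registry. The Python sketch below illustrates it under loud assumptions: `Repository`, `DocObject`, `on`, and `add_rendition` are hypothetical names, not Documentum's API, and the two converters are placeholders that merely wrap text rather than calling any real Adobe or Microsoft filter.

```python
from collections import defaultdict


class DocObject:
    def __init__(self, name, fmt, content):
        self.name = name
        self.renditions = {fmt: content}   # format name -> stored content


class Repository:
    def __init__(self):
        self.objects = {}
        self.hooks = defaultdict(list)     # operation name -> callbacks

    def on(self, operation, callback):
        """Attach code to an operation such as 'check_in'."""
        self.hooks[operation].append(callback)

    def check_in(self, obj):
        """Store the object, then run every callback attached to check-in."""
        self.objects[obj.name] = obj
        for callback in self.hooks["check_in"]:
            callback(obj)
        return obj


def add_rendition(source_fmt, target_fmt, convert):
    """Build a hook that derives one rendition from another on check-in."""
    def hook(obj):
        if source_fmt in obj.renditions:
            obj.renditions[target_fmt] = convert(obj.renditions[source_fmt])
    return hook


# Placeholder converters; a real installation would call external filters.
def to_pdf(text):
    return f"%PDF-placeholder\n{text}"


def to_html(text):
    return f"<html><body><p>{text}</p></body></html>"


repo = Repository()
repo.on("check_in", add_rendition("word", "pdf", to_pdf))
repo.on("check_in", add_rendition("word", "html", to_html))

catalog = repo.check_in(DocObject("catalog", "word", "Spring product list"))
print(sorted(catalog.renditions))  # prints ['html', 'pdf', 'word']
```

The user's only action is the check-in; the two extra renditions appear as a side effect of the attached code.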
PDF renditions are so universally useful that Documentum created an add-on product called AutoRender that creates PDFs automatically without API programming. AutoRender has additional features, such as the ability to do term-hit highlighting within the Acrobat viewer -- in other words, to cause the Acrobat viewer to highlight words in a document that match terms in the user's text query.
Work-flow that works
Another strong implication of the code-attaching feature in Documentum's API is for work-flow engineering. Documentum has some work-flow capability based on the somewhat familiar "in-box" metaphor and a rather rudimentary work-flow definition tool. With the API, you can attach code to document check-out operations in work-flows that cause certain tasks to be performed automatically.
To see how useful this is, consider the canonical work-flow example of expense report processing. A typical work-flow system allows you to define roles for the various participants in a work-flow: in this case, roles might be traveler, approver, and accounting (to process the expenses and cut a check). You could use Documentum for this purpose by designing an electronic form, using a spreadsheet with good macro capability such as Lotus 1-2-3 or Excel. You would then design a work-flow with the three roles mentioned above -- except that accounting would be a role with a dummy user; we'll see why momentarily. A user who has traveled on business would fill out the electronic form and, using Documentum, send it to his or her supervisor, who is the approver. If the latter approves the report, it's sent to the accounting step in the work-flow. Many work-flow systems would, at this point, simply print out a report that gets sent "over the wall" to accounting -- and that would be the end of the system's ability to process the report and track its progress.
However, a Documentum solution could include code that attaches to the accounting step of the work-flow and does much of the report processing. Instead of merely sending the report to another user, this step would run the attached code, which would extract information from the spreadsheet and build a transaction that could be sent over to the accounting system for processing -- instantaneously and automatically. Reports would be processed faster, and it would be easy to check the status of any report without looking for physical paperwork in the accounting department.
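Here is a rough Python sketch of that automated accounting step. It assumes a report can be reduced to a dictionary of expense lines and that the accounting system can be stood in for by a plain list; `ExpenseReport`, `post_to_accounting`, and `run_workflow` are hypothetical names, and a real solution would read the spreadsheet and call the accounting system's own interface.

```python
from dataclasses import dataclass, field


@dataclass
class ExpenseReport:
    traveler: str
    lines: dict                      # expense category -> amount in dollars
    status: str = "submitted"
    history: list = field(default_factory=list)


def post_to_accounting(report, ledger):
    """Code attached to the accounting step: build and post a transaction."""
    transaction = {
        "payee": report.traveler,
        "amount": round(sum(report.lines.values()), 2),
    }
    ledger.append(transaction)       # stand-in for the accounting system
    return transaction


def run_workflow(report, approved, ledger):
    """Route a report: traveler -> approver -> automated accounting step."""
    report.history.append("traveler")
    report.history.append("approver")
    if not approved:
        report.status = "rejected"
        return report
    # The "dummy user" step: no person acts here, the attached code does.
    report.history.append("accounting")
    post_to_accounting(report, ledger)
    report.status = "posted"
    return report


ledger = []
report = run_workflow(
    ExpenseReport("pat", {"airfare": 412.50, "hotel": 289.00}),
    approved=True,
    ledger=ledger,
)
print(report.status, ledger)
```

Because the report never leaves the system, its status and history remain available for anyone who wants to know where it stands.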
The final major area of interest in Documentum is how it organizes documents. The outermost organizational unit in the system is the DocBase, the database of documents that resides on a server. Users begin Documentum sessions by logging in to a DocBase. Within DocBases are cabinets; below them are hierarchical folders that should be instantly comprehensible to any user of Windows 3.x File Manager, Windows 95, or the Macintosh. Folders, of course, contain DocObjects.
So far, so familiar. Here's the wrinkle: DocObjects can also be virtual documents, which are collections of DocObjects that act like single documents. That is, the virtual document has one name, one set of metadata attributes, one set of permissions information, and so on. Furthermore, the components of a virtual document can be components of other virtual documents as well -- or DocObjects in their own right, stored in some other folder. In other words, virtual documents are like abstract objects in object hierarchies.
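Programmers will recognize this as the composite pattern. A minimal Python sketch follows, with the caveat that the class names and the `leaves` method are invented for illustration and say nothing about how Documentum actually stores virtual documents.

```python
class DocObject:
    def __init__(self, name, content=""):
        self.name = name
        self.content = content

    def leaves(self):
        """A plain object is its own only component."""
        return [self]


class VirtualDocument(DocObject):
    """Has its own name and attributes but no content of its own."""

    def __init__(self, name, components=()):
        super().__init__(name)
        self.components = list(components)

    def leaves(self):
        """Walk the hierarchy and return the underlying plain objects."""
        found = []
        for part in self.components:
            found.extend(part.leaves())
        return found


logo = DocObject("logo.eps", "vector art")
intro = DocObject("intro.doc", "Welcome")
specs = DocObject("specs.doc", "Dimensions")

# The same logo object is a component of two virtual documents, and one
# virtual document is itself a component of the other.
brochure = VirtualDocument("brochure", [logo, intro])
manual = VirtualDocument("manual", [logo, specs, brochure])

print([d.name for d in manual.leaves()])
# prints ['logo.eps', 'specs.doc', 'logo.eps', 'intro.doc']
```

Note that the logo is stored once and shared by reference, so a change to it shows up in both the brochure and the manual.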
Virtual documents are ideal for many purposes. Consider multimedia content production. A typical multimedia product has many components of different types, yet it has a single author, editor, or producer. If you were a multimedia producer, you could create a virtual document for your product that has no content itself but contains all of the components of your multimedia production. Documentum can also take hierarchies of OLE-linked objects on Windows platforms and automatically create virtual documents from them.
Documentum is at its most powerful when you combine several of the features we've discussed. The sidebar gives such an example from the wonderful world of magazine publishing. Yet we haven't discussed all of Documentum's features. It supports just about anything you can imagine doing to or with a document. It does not include absolutely all of the system components listed above; for example, it does not come with good backup and restore utilities, presumably because such things do not make for flashy bullet-lists in marketing literature. But it goes considerably further than any other document-management system in providing an architecture that can serve as the foundation for all of an organization's knowledge-management needs in the years to come. Documentum is the flagship product in the emerging document-management system industry -- the standard against which others will be judged.
Bill Rosenblatt is director of Publishing Systems in the New York City office of the Times Mirror Co. He can be reached at email@example.com.
Magazine publishing provides ways of using several key Documentum features at once. If this were a print magazine, its production editors would use a page-layout program like QuarkXPress to design the magazine's pages and fill them with articles, illustrations, ads, and so on. Quark is not really the same thing as a fancy word processor -- it's more like a page-template builder. You use it to create boxes for text, pictures, ads, and such, on the assumption that these components "flow" through their boxes. The text items could come from standard word processors like Word or WordPerfect, while other elements could be scanned photos or graphics created in Adobe Illustrator or CorelDRAW.
Most magazine houses compose their magazines' pages and then send all of the components to prepress, an in-house department or external service bureau that does final output to film, which is then used to do the actual printing. Publishers usually send the material out on high-capacity removable storage media, like SyQuest cartridges. The Documentum virtual document is the ideal metaphor for organizing all of the components of a magazine layout, as well as the Quark templates themselves, for tracking through the editorial cycle and easy transportability to prepress.
But magazine production has another dimension. Color images are huge -- typically in the tens of megabytes -- so you don't want to move them around your network or manipulate them on your desktop machine if at all possible. Therefore, specialized image systems exist that create low-resolution versions of the images that take up much less space -- on the order of 100 KB, small enough to move over an Ethernet network quickly. Editors use these low-resolution versions to place images on the pages they are composing and to size and crop them appropriately. Quark and other layout programs include commands for doing all of this -- importing images, sizing, cropping, and so on. More importantly, they capture the user's image-manipulation commands using a standard language called OPI (Open Prepress Interface) and store them in the layout file.
The publisher uses the layout program to create a printable PostScript file, then sends it to prepress along with the full-resolution image files instead of the low-res versions. Prepress then uses a special PostScript output processor to bring in the full-resolution images, position them on the page, and apply the stored editing commands to them. The result is final output with appropriately edited full-resolution graphics, even though no one touched the huge hi-res image file between its acquisition (e.g., scanning) and final output.
Where does Documentum fit into all of this? In two ways. Virtual documents are the ideal metaphor for organizing all of the components of a layout -- including the text items, the imported graphics, and the layout file itself. As for the graphics files, multiple renditions can be used to store the various resolutions necessary: full high resolution, low resolution for previewing, positioning, and editing, and perhaps a small "thumbnail" for quick browsing. Similarly, the layout can have multiple output renditions, such as PostScript and PDF.
Documentum's virtual documents and renditions can help organize all of this information: Renditions can be created automatically by background processes; interrelationships among components are implicitly tracked without extra manual work; and components are easy to find -- according to content and product attributes, not file names -- and group according to purpose.