Cost recovery in the Unix world
Production isn't cheap. Who pays the bills? Here's how to determine (and deal with) your cost model
If you skipped those business courses to spend more time hacking, you may be in big trouble. Put on your green eyeshade and grab a pile of beans: it's time for accounting, Unix-style. (2,500 words)
If you want to stir the pot between mainframe and Unix cultures, broach the topic of cost recovery. Because of their huge initial cost and high maintenance, mainframes have long accounted, and charged, for the services they provide. Every job is tracked; every byte of disk space is accounted for; and reliable, mature billing systems exist to cut monthly invoices for every user on the system. Users can track their historical usage, budget for future use, and plan their computing expenses accordingly.
Unix, by and large, was built by people who believe that computing should be free to everyone. Because Unix systems were much cheaper, and users were often researchers instead of business people, cost recovery was often ignored. All the expenses for a system were often written off as a cost of doing business, or as part of a research project.
Things have changed, and now some high-end Unix systems like the Sun Enterprise 10000 carry starting price tags in the low seven figures. That's just for the processor, of course; add a terabyte or more of redundant disk storage and you've dropped another cool million or two, depending on the vendor. Don't forget backups: a nice DLT robotic tape array is a steal for $250,000. In short, Unix may be less costly than a mainframe, but it certainly isn't cheap.
If you've invested several million in a new Unix box, you'll need to think hard about recovering all that money. Because these big systems are often built to serve hundreds and thousands of users, it seems fair to divide the cost among the user base. People who use more pay more, just like in the old mainframe days. You only have two problems to solve: how much to charge, and how to keep track of usage.
Constructing a cost model
To determine your computing rates, you first need to figure out how much it costs to run your system. This includes not just the obvious things like the cost of hardware and annual service fees, but also the cost of all your administrators (including yourself), and more mundane things such as electricity, air conditioning, and network connectivity. When you add up all of these things for every system on your floor, you'll wind up with a cost model that determines how much it costs to run a system in your data center.
A cost model is best built using a divide-and-conquer strategy. Let's start with something easy, such as all the hardware. Suppose we've just bought a fabulous new Enterprise 10000, and to make things simple, we'll pay a nice round number for it: $2,000,000. That E10000 has a one-year warranty, so we won't pay any support fees for the first year. It'll get pricey after that, so let's assume we'll pay another $75,000 a year for support after the first year. Every company is different, but all equipment depreciates over time. Let's assume the server is a five-year asset, so we'll write off the capital investment at $400,000 a year for five years. As a baseline, then, that system will cost us $475,000 a year just to own.
What about software? Big machines like that usually run databases, which carry hefty licensing fees. Throw in some backup software, a job scheduler, some security tools, and a compiler or two, and you'll easily top $150,000 a year in licensing and support fees.
Now let's add the people. How many administrators do you need to keep that thing running? To make it simple, suppose you have one full-time admin (at $60,000 a year) and a full-time DBA, for another $60,000. Of course, they won't get anything done without a leader, so you'll need some fraction of your time to manage them. If you have 10 people reporting to you, one-tenth of your salary, say $8,000, gets added in for a total of $128,000 in salary. We all need insurance, vacation, and other niceties, so add 30 percent to that for fringe benefits. All told, your people costs come to $166,400 a year.
We're up to $791,400 per year for our sample system, and we haven't even touched power and air, square footage in the data center, network connectivity, systems-management tools, or help-desk support. We also haven't added any disk space to our system, but we'll cover that later.
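The cost model above is easy to collect into a few lines of code. This is a minimal sketch in Python; the dollar figures are the sample numbers from this article, and the category names are my own invention, not any standard chart of accounts.

```python
# Annual cost model for the sample E10000, using this article's figures.
# Category names are illustrative only.

def annual_cost(items):
    """Sum a dictionary of annual cost line items."""
    return sum(items.values())

costs = {
    "depreciation": 2_000_000 / 5,   # $2M server written off over 5 years
    "hardware_support": 75_000,      # vendor support after the warranty year
    "software": 150_000,             # database, backup, scheduler, tools
    "salaries": (60_000 + 60_000 + 8_000) * 1.30,  # admin + DBA + mgmt share, +30% fringe
}

total = annual_cost(costs)
print(f"Annual cost to own and run: ${total:,.0f}")  # $791,400
```

Keeping each line item separate makes it painless to re-run the model when a support contract or a salary changes.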
Charging for CPU
We've spent a lot of money, and haven't recovered a cent. Let's start by charging for CPU usage. What's a good rate?
A 24x7 operation has 31,536,000 seconds of available processing time each year, and all of those seconds are available for each processor in the system. Because the minimal E10000 is a 16-way machine, our fictional system actually has 504,576,000 seconds of available processing time. If your users used the system all the time, every day, your rate would be simple: $791,400/504,576,000 = $0.001568, or about sixteen-hundredths of a cent per CPU second.
Unfortunately, users can't use all of a system every day, because some of the cycles are used by the system itself. This varies from shop to shop, so you'll need to take some measurements on your system to see how many cycles are consumed by root, daemon, and other special users. For our sample system, we'll assume that 10 percent of the cycles are consumed by system overhead. This means that only 454,118,400 seconds are available for the users, and the rate should be closer to $0.001743 per second.
The last catch is that users don't use everything you make available, but you still need to pay the bills. To make this work out, you must set a rate that recovers your costs based upon what's used. If the average duty cycle of the system is 50 percent (meaning that only half the available cycles are used), you'll need to double that rate to $0.003486 to break even. If usage falls below 50 percent, you'll lose money; if usage goes up, you'll turn a profit.
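The whole rate derivation boils down to one short calculation. Here's a sketch using the article's numbers (16 CPUs, 10 percent system overhead, 50 percent duty cycle); the last digit of the result differs slightly from the figure in the text because the text rounds an intermediate value.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000
ANNUAL_COST = 791_400                    # from the cost model above
CPUS = 16
OVERHEAD = 0.10                          # fraction consumed by the system itself
DUTY_CYCLE = 0.50                        # fraction of user cycles actually sold

available = SECONDS_PER_YEAR * CPUS      # 504,576,000 raw CPU-seconds
usable = available * (1 - OVERHEAD)      # about 454,118,400 after overhead
billable = usable * DUTY_CYCLE           # what users will actually consume

rate = ANNUAL_COST / billable            # break-even price per CPU-second
print(f"${rate:.6f} per CPU-second")     # just under $0.0035
```

Parameterizing the duty cycle is the important part: it's the number you'll be adjusting, and arguing about, all year long.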
As fun as it sounds, profits are often a bad thing for data centers. The goal is usually to break even, because you're almost always paid with internal funds being transferred from other departments to yours. If you do run in the black, you'll need to reinvest those funds in system improvements, extra staffing, or better tools. This is where data-center financial management becomes more challenging.
For example, suppose usage runs at 60 percent instead of 50 percent. At our rate of $0.003486 per second, those extra 45,411,840 seconds will net you $158,305.67! Wow! If nothing else, time for bonuses all around!
Guess again. If you can't justify those extra funds, you'll be asked to reduce your rates, rebate the overage to your customers, and run the shop at the cheaper rate. On the flip side, suppose you want to add an additional processor to cover the increase in usage. Adding a processor reduces the duty cycle to 56 percent and costs about $50,000. With 17 processors at 56 percent usage, your new CPU rate now drops to $0.002965 (including the capitalized cost of the new processor). In the end, you get an extra processor, you return $108,000 to your customers, and you reduce your CPU rate by 15 percent! What a hero!
Before you hang that plaque on your wall, watch what happens when one of your customers loses that big contract and your duty cycle falls from 56 percent to 45 percent. At your new, low rate, you'll only recover $643,776 and your department will come up $157,623 in the hole at the end of the year. Let's see: who should you lay off to cover those costs?
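The swing from hero to goat is pure duty-cycle sensitivity, and it's worth modeling before you commit to a rate. A sketch of the 17-processor scenario ($801,400 a year once the new CPU is capitalized); the exact dollar figures here differ from the ones in the text by small rounding amounts.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
ANNUAL_COST = 801_400        # $791,400 plus $10,000/yr for the capitalized CPU
CPUS = 17
OVERHEAD = 0.10

usable = SECONDS_PER_YEAR * CPUS * (1 - OVERHEAD)
rate = ANNUAL_COST / (usable * 0.56)   # rate set assuming a 56% duty cycle

for duty in (0.56, 0.45):
    recovered = usable * duty * rate
    print(f"duty {duty:.0%}: recover ${recovered:,.0f}, "
          f"surplus ${recovered - ANNUAL_COST:,.0f}")
```

Running the loop over a whole range of duty cycles shows you exactly how much padding a rate needs to survive a lost contract.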
All told, setting prices for your systems is analytical to a point and quickly turns into an art. You'll need to stay in close touch with your users, anticipating shifts in business; spend capital only when you can prove a positive return; and set rates that hedge against future shifts in demand, technology, and pricing from your vendors. In short, you have to stop being a Unix geek and start being (gasp) a businessman.
If setting rates isn't harrowing enough, wait until you try to keep track of CPU usage on your system. Although Unix provides basic system-accounting support, the devil is in the details.
Most people don't realize that Unix has an acceptable system-accounting package. When enabled, the system cuts a record for every process as it terminates, recording the system resources used by the process, including processor time in seconds, I/O counts, and physical memory consumed. These raw records are consolidated nightly, generating summary files that list activity totals for every user who logged in on a given day. Other tools print the reports, and it's easy enough to take these numbers, whip them through awk or Perl, and generate a simple invoice.
If all your users were running simple processes, compiling code, or using a word processor, this system would almost work. The one big problem with Unix accounting is that processing records are only cut when a process terminates. If the system crashes, no accounting records are created for any processes running at the time of the crash. This differs from mainframe accounting, which usually writes partial accounting records every so often for long-running processes.
If your system doesn't crash often (and it shouldn't, if you've been paying attention to all my other feature stories on systems management -- see the Resources section below), this may not be a problem. But if you tend to have users with a few long-running processes, you may be ripe for disaster if the system takes a hit after running for a while.
More importantly, user psychology begins to play a role in your billing system. If a long-running process starts in one month and runs into the next, the owner is billed nothing for the current month, and could be billed for two months of time the next month, when the process finally ends. Try explaining that series of bills to a customer whose overall budget came in fine last month but was blown out this month by those accrued CPU-usage charges.
If you have the misfortune to recover costs for a system running a database, you ain't seen nothing yet. Consider an Oracle system. The main Oracle processes all run as the user oracle. When a user logs in and connects to the database, the connecting process runs as the user, but much of the user's work is performed by the database processes running as oracle. How do you figure out which portion of the oracle user's time was consumed by each connecting user? To make matters worse, when a user connects via the network (and the majority of your users will connect this way), the connecting process is owned by oracle, too. Differentiating costs becomes nearly impossible.
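The best you can do without help from the database itself is to apportion the oracle user's CPU time among the connected users by some per-session weight. This sketch uses each session's own CPU consumption as the weight; it's a naive heuristic of my own for illustration, not how any particular product works.

```python
def apportion(oracle_cpu_secs, session_cpu):
    """Split the oracle user's CPU time among sessions, proportional to
    each session's own CPU consumption. session_cpu: {user: seconds}."""
    total = sum(session_cpu.values())
    if total == 0:
        return {user: 0.0 for user in session_cpu}
    return {user: oracle_cpu_secs * secs / total
            for user, secs in session_cpu.items()}

# 10,000 CPU-seconds accrued by the oracle user, split across three users:
shares = apportion(10_000, {"alice": 300, "bob": 100, "carol": 100})
print(shares)   # alice gets 6,000 seconds; bob and carol, 2,000 each
```

The obvious flaw is the assumption that a session's own CPU time tracks the work it pushed into the database, which is exactly why the heuristic breaks down in practice.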
At this point, you'll need to take advantage of a third-party product that can probe the accounting data within Oracle to extract billing information. We'll cover these products in more detail next month, but rest assured that they don't provide a total solution. In particular, it's difficult to integrate Oracle-generated billing data with Unix-generated data, because there's often overlap between the two. Even if you do come up with a scheme, getting your customers to agree to it is usually quite a battle.
Databases are also susceptible to the "long-process" billing problems. Most database processes start when the system boots, and run until it's shut down or crashes. Of course, crashing systems don't write accounting records for your Oracle processes, so all of the accrued time for the database is lost on a crash.
Your DBAs will also drive you crazy. If a DBA shuts down a database and later restarts it, he may inadvertently start the database as himself instead of as the oracle user. All that database processing suddenly starts being charged to the DBA, not to the users of the database! The best way to avoid this problem is to run scripts that periodically check the ownership of the database processes.
Finally, you'll need to figure out how not to charge people. Invariably, you will get phone calls from users who've had a process running for days in a tight loop. Users want the process killed and the charges backed out of their bill, because they shouldn't have to pay for "mistakes." Try as you may to enforce a policy of "you ran it, you pay for it," the reality of gross overcharging because of mistakes and runaway processes will force you to implement some way to provide customer credit.
Don't be scared
While I've presented much of the ugly side of cost recovery here, it's a tried-and-true practice that's served the mainframe community well for decades. If you really want to get into the production computing world, you'll need to be able to charge your users reasonable fees for the resources they've consumed. You may want to use simpler cost models and less strenuous accounting tools, but you'll still need to cover all the points I've touched on here: determining costs and rates, tracking usage, and dealing with customers.
With the rising costs of large-scale Unix servers and the desire to consolidate Unix computing into traditional data centers, it's no longer reasonable to write off your Unix investment as a cost of doing business. Customers in your company who don't use your systems won't tolerate being forced to pay a share of the costs, and it's only fair that the true users bear the burden of the systems. That means donning the accounting eyeshade every so often and running a much tighter, business-savvy operation, but the long-term benefit to your company is well worth it. After all, efficient, cost-effective computing is the goal, regardless of the platform you're running.
It may seem like a lot, but we've only seen the tip of the iceberg. Next month, we'll continue this review of billing schemes by looking at several ways to bill for disk usage and network utilization, as well as ways to bill for large clusters of systems. We'll also look at some third-party accounting tools that may make life easier for those looking to implement cost recovery schemes for their Unix environment.
About the author
Chuck Musciano has been running various Web sites, including the HTML Guru Home Page, since early 1994. He serves up HTML tips and tricks to hundreds of thousands of visitors each month. He's been a beta tester and contributor to the NCSA httpd project and speaks regularly on the Internet, World Wide Web, and related topics. Chuck writes SunWorld's Webmaster column and is currently CIO at the American Kennel Club. Reach Chuck at firstname.lastname@example.org.