Monday, December 27, 2010

A Christmas Carol

December brought overtime. Lots and lots of overtime. As usual, old man Scrooge (a large software company in the Northwest) wheeled and dealed with Paula to head off the threat that she might actually use her vacation rather than lose it in the usual year-end accounting-based theft of employee time. "Trust me," the miserable old miser said, "It'll be off the books, but you can just take some extra weeks later! And here, have an extra lump of coal to heat your office!" Bastard.

Meanwhile, at home, there was little Christmas spirit. No time for a tree. No time for lights. Well, no space clean enough to put up a tree. And many lights were still sitting in the yard from last year, but plant life had overgrown them in a tight weave and we could not pull them free to plug them in. Besides, there was Asian Neighbor. We live in classic suburban isolation where we don't really know our neighbors, but we know of them: Asian Neighbor, Canadian Neighbor, Microsoft Neighbor, Plumber Neighbor, Rental Neighbor, and so on. Asian Neighbor, after years of declining effort in the face of Paula's escalating light war, had sprung into action the day after Thanksgiving, with a full-yard extravaganza of lights everywhere, bright candy canes, and something that was either meant to be a small deer or a large coyote. This further lowered our decorating morale.

In the end, all we managed were two red/white fur collars with jingle bells for the dogs. Every day, I would remove the elastic jingle-collars from the doorknob, pop them onto the dogs' necks, and parade down the street festively. So long as we kept the shades down, the neighbors might regard our gaily jingling excursions and imagine that our house was full of Christmas trees, presents, and lights. Clever, I thought.

At the beginning of December, the inquiries began. First, circuitous: "I don't know what to get you for Christmas." Then direct: "What do you want for Christmas?" I began early, too. "I don't want nuthin' for Christmas!" I recited, while stomping around the house. To which I eventually added "We have everything we need, and the rest are things that can't be bought." This sounded sage to me, but it was seen for what it was: sandbagging.

The armistice held until about the middle of December. Then, one day, I looked up and saw a wrapped box on my piano keyboard. "What. Is. That?" I punctuated with stabby points of my finger. "Just a little present for you," Paula replied without looking up from old man Scrooge's laptop-for-home-work-because-we-don't-need-no-union. I stared at her, but she wouldn't look up. "It begins," I said grimly. "Capone" was playing in my head. "He pulls a knife, you pull a gun," Sean Connery was burring thickly.

Three days later, two presents appeared. Five days later, Paula noticed them. "What's this?" she said. "Just a couple of presents," I said. We danced the verbal minuet. "I just have one or two more for you." "I just have the one big present, and maybe one or two small things." "I meant to get something for you, but it's out of stock." "I ordered something, but I don't think it will get here in time." Sandbagging. When you find a strategy that works, stick with it.

The endgame commenced in the final week before Christmas. Paula was on "vacation", which of course meant monitoring work via old man Scrooge's laptop to keep higher-paid people from making too big a mess before the new year. It was time to show a few cards without revealing how many were left in my hand. More presents appeared in Paula's stack. "You said you had hardly any presents for me!" Paula accused. I shrugged and stated the obvious: "He who gives the most presents wins." The Christmas Game was past the point of pretending anymore.

There were other fronts to the battle, of course. Boxes to ship to distant friends and relatives, with only the faintest clue of how big a response was necessary to ensure victory. It was a lot of wrapping. A lot of boxes. A lot of things to think about and keep track of. We were having trouble remembering what we had gotten people last year, in order to avoid duplicates. The strain was showing. We were beginning to make mistakes.

After the boxes went out, I was shoving Christmas debris aside to make room for a plate to eat dinner when I unearthed something buried and screamed "What Is THIS?" It was a wrapped and labeled Yankee Candle for friend-Helen in Florida, whose box had already shipped. We stared at each other accusingly, each thinking "If we lose by a single candle, I'll blame you."

The final days were most intense. It was too late to get anything shipped without giving an undeserved Christmas present to UPS. Paula hopped in her car for a day of shopping. I gathered more cards into my hand and bided my time, laying them down carefully and slowly. The day before Christmas Eve, UPS arrived. "My last present for you!" I exclaimed. "You said that wasn't coming until after Christmas!" Paula accused. "I know. UPS is great, ain't they?"

On Christmas Eve, I remembered I had long meant to reprint and laminate that favorite recipe as a gift. What could be more personalized and heartwarming than that, I thought, Christmas Game points cha-chinging in my head. As I put my coat on, Paula said suspiciously "Where do you think you're going?" "Just thought of a few last-minute items I've been meaning to get," I replied, running to the door with alacrity. "I can buy more presents, too!" Paula yelled, as I got my hand on the doorknob. "My presents are made with love; yours are just revenge-presents," I called out. "Two can play that game!" she shrieked. "Two can play, but only one can win!" I yelled as I slammed the door.

Christmas was anti-climactic. We celebrated with the relatives, firing our fusillade of presents at them, absorbing their barrage in return. Not every skirmish fell our way, but mostly we won. As with every war, it was the children who took the brunt of it, spinning high on sugar, their little attention-deficited brains slammed by one present after another as though they were in some kind of cruel CIA experiment.

At long last, we got home on Christmas Day and could relax, enjoying the peace that would hold for another 11 months. Or so I thought. Paula leapt from the couch and shouted "Don and Helen!" How could I have forgotten? A distant aunt and uncle had stealthily fired a cruise missile of presents at us by shipping them to the in-laws. We had no warning before Christmas Day and hadn't gotten them anything. We had to form a response immediately. We began ransacking the house and the depleted "Miscellaneous Xmas Gifts" box trying to come up with anything.

"I found something for Uncle Don," Paula said, "but what are we going to do for Aunt Helen?" Something in the back of my mind stirred, and I sat down to think. Suddenly I leapt up and grabbed her. "Is there not a Yankee Candle sitting here, already wrapped, that already says From: Ron&Paula To: Helen?" I shouted. Paula quickly remembered what I meant and shouted back "Yes. There. Is!"

We looked at each other and began to tear up. It was a true Christmas Miracle. In the distance, we heard the faint sound of bells jingling. The dog collars had fallen off the doorknob.

Saturday, December 18, 2010

Cash Cow Disease Revisited

Since the original Cash Cow Disease post got ycombinatored and a bit daringfireballed, I thought I would post a few collective thoughts about the feedback.

You're an Idiot

Indeed, a hidden recorder could have picked up my voice saying "I'm an idiot" scant hours ago, as I realized I had just taken a load of dirty dishes out of the dishwasher and carefully put them all back in the cupboards. Of course, saying something others feel is idiotic does not really make me an idiot, but my book isn't out yet, so I can't fault programmers for not understanding the Fundamental Attribution Error and why it drives us relentlessly to such conclusions. But the interesting thing is the number of responses in the "you're an idiot" vein. Having been a magazine editor for a decade, I know that if you get no vociferous responses you probably haven't said anything worthwhile. But I think this particular piece touches the same hot wire that Nicholas Carr's "Does IT Matter" did: the implication that a bunch of programming going on in the world is a big waste of time. As a programmer, I can understand the "reject it first and think about it never" response. I am perfectly capable of sectioning off that portion of my brain that, were I to listen, would tell me I should be working on my book instead of writing yet another LL(1) parser generator that the world doesn't need. But on I code, happy as a meth-head discovering an empty building full of copper wiring.

But What About Product X?

Moving up a level in thoughtfulness are responses that point out that, for example, GMail was a Google side project, and Android might make a lot of sense. The blithe response that comes to mind is: "Well, testicular cancer clearly played a role in making Lance Armstrong a champion -- do you advocate testicular cancer for aspiring athletes?" For example, if it were true that GMail was a "typical" Google 20% project, whatever that is, would that be evidence that this is a really good way of doing business when so many others are dead ends?

Now consider Google Go. Does the world need yet another programming language? Hey, I think it's got some cool ideas and some useful things in it. But is it a good investment of a number of high-powered and (I'm guessing) high-priced brains that the stockholders have paid for?  I don't know. I do know that if Google were a startup that had to justify every dollar it spent, that faced real financial penalties for not translating programmer hours into customer benefit, then Google Go would, well, go.

To reach for a more subtle differentiation, I think that Google is clearly some years behind Microsoft in their cognitive decline, and since they haven't hit their peak growth yet, the damage done to stockholders is less obvious and more defensible. And I'm also sympathetic to the Black Swan idea that Google should deal with the unpredictability of the future by throwing lots of projects against the wall to see what sticks.

My point is, they aren't throwing these projects against the wall, they're throwing them inside a Nerf room where nobody can get hurt, and it's very hard to tell the difference between what could stick and what's just plain fun (at stockholder expense). Is Gmail actually a success for the stockholders? I bet it is, but I really can't tell because cash cow disease drives (public!) companies to obscure the numbers that might ferret out wastes of time (try to figure out what Amazon's profit really is on a book!).


But Apple...

It's hard to quantify, but it sure seems to me that Apple is significantly more willing to cannibalize their existing product lines than either Google or Microsoft. For example, Microsoft could have split off some O/S team people to aggressively make a slimmer, cheaper Windows for netbooks, giving them the simple commission to sell as many copies as they could. Instead, Microsoft cut a deal to sell (soon to be deprecated) Windows XP for the Eee PC, so long as the manufacturer agreed to not sell machines with more than 1GB of RAM. Here, of course, we've entered the realm of the Innovator's Dilemma, as Microsoft's goal was clearly to defend the cash cow against a threat rather than view netbooks as an opportunity to explore a new product direction.

Apple does not seem to me to be so clearly a cash cow-dominated company. I suspect that if I asked 10 programmers what Apple's cash cow is, I would get at least 3 different answers. As another example, though iTunes clearly both sucks and blows on Windows, it's hard for me to tell whether that's really Apple defending their cash cow by making nothing look good on Windows, or merely incompetence at creating a Windows product, of which there is certainly no shortage among other software companies.


Miscellany I Found Interesting

marypcb (http://marypcb.livejournal.com/): Gmail, Picasa, translation, using Google Maps on phones to gather locations for Google Maps – they all get more data for Google to crunch. Their business model is transforming the information of the world into a source of targeted ads. Is it still cross-subsidy and over-diversification when it’s a company strategy?
My point was: what can't Google justify as a business endeavor? That question should keep somebody awake at night at Google Corporate.

Anonymous: You pick one famous unpopular 20% project which failed, but discount the hundreds or thousands of 20% projects which contribute to Google's revenue stream.
Let me just point out that it tends to be fairly crucial in companies without a cash cow to identify precisely what the contribution of each project to the revenue stream is -- can you show me in Google's financials how much Google Groups contributed this year? And it is a common occurrence for companies to find that they can increase profits by getting rid of activities that contributed to revenue. We've entered the land of Peter Drucker here, which I cannot think how to summarize pithily.

Anonymous: Or consider that Wave matches the daily needs of a Google programmer almost perfectly. Even if it was never released, it could have been a huge time-saver. Should Costco not have forklifts to unload trucks, simply because the forklifts themselves aren't a cash cow?
I'll have to agree with Nicholas Carr by replying that Costco should not spend stockholder money developing its own custom forklifts.



Tom Bolton: [...]how one can tell the difference between undisciplined forays into new territory and real innovation before (or after) the fact (success or failure) from an outside perspective. 
That seems hard to me, and made harder by the companies having no internal accountability and no motivation to publish whatever accountability they do have (how much was really spent on Google Wave? I can't see how a stockholder could possibly make a good estimate). But this is just a cog in the great wheel of dysfunction that the stock market has become. For many Google stockholders, it doesn't matter whether Google is investing in data mining or just hookers and cocaine -- the stock price is all that matters. This great disconnect is how you get a GM that just blunders on indefinitely, its own mass so great that any velocity results in enough momentum to give the appearance of life even after death.

Anonymous: Is the premise of this section that we should enact legislation to force the return of these individuals to the market? If not, it looks like a pretty simple bet with high chance of small downside in exchange for a low chance of large upside (and the reduction of a small chance of high downside—if the Next Big Thing comes from your side projects division, you won't be trying to compete with it in the market when it shows up).
Two premises are relevant here: a) the actual cost is bigger than you think because you are (deliberately) kept from being able to measure it, and b) you can place bets on side projects better if you have the discipline that having a cash cow tends to erode. Why doesn't Y Combinator take the Google approach to placing these bets and just hire as many programmers as their budget allows and let them work (100% instead of 20% -- 5 times as effective, right?) as long as they want on whatever their little hearts desire? Do you think Y Combinator would be more successful with that approach? Presumably they have concluded otherwise.

Jens Alfke: People have a ridiculous misunderstanding of 20% projects. The vast majority are either contributions to other team's existing projects, internal tools, or very small-scale experiments.
For better or worse, the "20% project" provides a convenient hook for hanging various hats on. I'm hanging the "Geez, Google has more projects than Molly Hatchet has guitars" hat on it.

Anonymous: "waste"? Seriously?? Have you any idea how much learning must have come from the effort put into Google Wave, or from other 20% projects that haven't make it as far? And that doesn't even count the increased morale from working on a pet project.

I love learning. I learned how to play the guitar. I learned a lot about the SAT this summer while tutoring my nephew. For various reasons, I learned a great deal about two particular hormones in humans, melatonin and cholecalciferol. The point is, would a financially disciplined software company want to pay me for all this learning? Or would they be more interested in paying for things that have a more measurable return? (And certainly, wouldn't they try to measure the return in at least some vague-but-better-than-nothing way?) And if you want to talk about increased morale, I think the opposite argument is good: they are actually destroying morale compared to someone working on their "pet project" in a startup that has to get good or die. Compare, for example, the morale of the Danger team before and after Microsoft acquired them. Morale is not what you want to pin your argument on, I think.


Thursday, August 12, 2010

Cash Cow Disease: The Cognitive Decline of Microsoft and Google

Watching the recent product retraction of Google Wave convinced me that Google is fully infected with the same protracted, end-stage wasting disease that has consumed Microsoft for years: cash cow disease.

Cash cow disease arises when a public company has a small number of products that generate the lion's share of profits, but lacks the discipline to return those profits to the shareholders. The disease can progress for years or even decades, simply because the cash cow products produce revenues massive enough to distract shareholders from the smaller (but still massive) amounts of waste.

For example, with Microsoft, Windows and Office carry the company, roughly speaking, allowing the company to lose billions (that's with a 'b') on failed projects without incurring any serious backlash from stockholders. Without cash cows, Microsoft could not have launched a new cellphone, then canceled it a few weeks later, all while pouring more money into yet another generation of cellphone.

Cash cow disease costs stockholders untold (sometimes actively buried in accounting maneuvers) dollars. Consider Xbox, which consumed billions (that's with a 'b') before eventually turning a profit of millions (that's with an 'm'). If Xbox had been spun into a separate company, then Microsoft stockholders could have kept those billions (with a 'b') and let someone else decide to invest billions in trying to jump into the game console business.

Meanwhile, at Google, the cash cow is search-driven advertising. That allows the company to encourage engineers to waste 20% of their time on "projects", like Google Wave. Just like Microsoft stockholders, Google stockholders are expected to feel happy about the overall company profit margin and not inquire too closely into the massive amount of wastage.

Making Economic Sense

But wait, didn't Xbox eventually turn a profit? Doesn't Microsoft have to search for new sources of revenue? Isn't Google encouraging innovation that could pay off big someday? Am I mislabeling "investment" as "waste"? That's where the "cognitive decline" from the title comes in.

The problem with Microsoft's forays into phones and search, and with Google's forays into phones and operating systems (see a pattern here?) is lack of discipline. When you have a cash cow, you lose the discipline of having to make a good product and pay attention to your customers. Sure, Google and Microsoft can hire the smartest minds in the business -- but cash cow disease keeps that brainpower derailed into projects that don't have to stand the test of the marketplace (new programming language, anyone?).

How did Microsoft manage to acquire a relatively hip and happening company like Danger and turn it into a complete flop of a product launch with the Kin? To oversimplify: by having all the money in the world. When your development decisions affect your ability to meet payroll quite directly, you see them in a very different light than when they affect nothing more than perhaps your next annual review or your status in the latest internecine company struggle. The economic discipline of the marketplace is lost for those afflicted with cash cow disease. A CEO can embark on a cellphone project for little better reason than that some obnoxious guy in a black turtleneck is doing well with his own cellphone.

Google offers their own unique twist on cash cow disease. Since their core competency is turning data mining into advertising dollars, they can actually claim that negative profits are the route to success. Thus, they can pay cellphone makers to adopt Android, and pay customers (in the past, quite enormous sums) to use their shopping cart option. Like pixie dust, potential future advertising revenues can be sprinkled on any revenue-negative scheme to make it look brilliant.

But you shelter yourself from the economic discipline of the marketplace at your own peril. If Google Wave had been a small company that had to actually convince people to pay for the product in order to meet payroll, the revelation that there's no "there" there could have been had in a fraction of the time -- and without costing Google shareholders a dime.

Stifling Innovation

Both Google and Microsoft proclaim themselves innovative companies that love competition. But the net result of cash cow disease is a waste of brainpower, and a decrease in useful innovation. A mere expression of interest by one of these giants in some particular burgeoning market is enough to dry up investment funds for any small company interested in the same market. For every failed Kin, there are multiple Dangers that could have thrived if Microsoft had had the discipline to stick to their Windows/Office knitting and restrict their other ventures to simply investing (rather than ingesting). For every magnificent Google Wave flop, there are multiple innovative new apps that could have been created (by the same people working in smaller companies) if Google had the discipline to focus on its core competency rather than creating an endless parade of "beta" apps.

The brain drain is also significant. Both Microsoft and Google would be significantly more profitable if they cut all the staff currently assigned to non-cash cow projects, but there would also be significantly more developers in the small-company milieu of software. Although lip service is paid in the U.S. to the importance of small companies, small companies are actually discriminated against in important ways. Google and Microsoft can afford H1B lobbyists, patent suits to use as weapons, and high-priced legal guns; they can negotiate tax breaks with local governments, demand infrastructure changes, and do a great many other things that are impossible for small companies. The smallest companies (sole proprietors, where much true innovation begins) cannot even fully deduct their health care costs as business expenses, the most obvious example of the true (lack of) value the government places on small business.

But cash cow disease even significantly warps the ability of the rest of the market to innovate. Thus, the dream of many small software companies is completely divorced from any thought of actually staying in business and providing a good product at a good price to customers for years. Instead, the dream is to build something as quickly as possible that one of the cash cow companies will be interested in buying, so the founders can "cash out", leaving yet another half-assed product bringing down the property values in the software ghettos of Windows Live or Google Labs.

How long does cash cow disease linger? Just about as long as cash cow revenues sufficiently overshadow the enormous wastage. I can't see any cash cow company ever being motivated to give up their favorite drug any time soon. Neither Google nor Microsoft is close to being called to account, except perhaps in specific instances such as when Ballmer simultaneously plotted both employee layoffs (or was it merely a clever shifting of employees to contractor status to avoid paying them benefits?) and an inexplicable (except possibly as an ego booster) Yahoo! takeover.

The only encouraging note is that cash cow disease amplifies the chaotic churn of quick-and-dirty software (soon, we'll all have our own "app stores"!) in the marketplace, leaving swathes of opportunities untouched. But these swathes are in areas that require slow and careful development, not quick and dirty. Few dare to tread there; we live in a world where long-term software development is overwhelmingly rejected.

Friday, April 09, 2010

Why Johnny Journalist Can't Spell

Maybe it's just me (it's not), but it seems like spelling words correctly is no longer viewed as a requirement for writing. Of course, with average citizens publishing their every thought and deed, nobody could be surprised that incorrect spelling and grammar would be the least of the problems with the daily content spew. This is just a blog, for example. I have no copyeditor to check my work before it goes online, only a couple of re-reads by yours truly. But shouldn't we still be able to muster just a small bit of concern that major news organizations can no longer spell?

Of course, The Daily Show makes regular sport of the absurdities that appear in "the crawl" beneath the never-ending news channels. But that's like shooting fish in a barrel. As yet another comedy show (30 Rock) correctly pointed out: "It's a 24-hour news channel; we don't have time to do it right anymore."

But what the hell happened to print journalists? I'm no longer surprised to see any sort of boner even in the online presence of The New York Times. But why? Did they fire all the copyeditors when they started putting their copy on the Web? Is it an insidious attempt to fit in with the thumb-banging generation that u no is ROFL? It's a puzzle.

It's a puzzle, but today I got a hint from The Atlantic. During their stalwart coverage of important issues, I came across a new (non-)word: maritial. This is a big clue.

Of course, confusion of "marital" and "martial" is an ancient source of humor. But this Spooneristic spelling is something a little different, not just the transposing of one word for another, but the inventing of a new word. Do you see the clue? This word would never appear in any dictionary, whether paper or electronic. The conclusion is inescapable: The Atlantic does not even run online copy through a spelling checker before publishing!

I'm from the Government... er, a Piece of Software, I'm Here to Help

OK, so dumb old news media can't even punch a button to get their copy spell-checked. So what? Here comes the blog-worthy twist: I blame us. We the programmers who automate tasks with our software, who put human copyeditors out of business -- this is all our fault.

I write in the book about how automated tools can make us less competent. There's a psych study that shows that experts given grammar and spelling checking tools in their word processor begin to lose expertise. But what's happened here is one meta-level removed from that. I give you Burk's Law of Automation:

To Automate a Task is to Devalue It

Consider the day of the copyeditor. Mistakes could still be made, but they besmirched someone's reputation. But to pay a copyeditor and then utterly fail to have them review copy... Well, that might result in some high-level meetings and reprimands.

Now consider the day of the automated copyedit, the spelling and grammar checking software delivered by us programmers. Failing to review the copy is now just failure to press a button. Anybody could forget to press a button. It's not like there's a separate employee sitting there whose sole job is to press the button.

We the programmers are the real source of the decline of journalistic standards. We automated spelling and grammar checks (in that shoddy, works-good-enough-to-sell kind of way we automate things) and psychology did the rest. If the computer can do it (never mind that it can't do it that well), then it's not that important. When you print the non-word maritial in your magazine, it's not a failure of a trained professional to do their job, it's a failure to push a button, something a monkey could do. The resulting flaw is exactly the same, but the use of software makes the flaw less important in our minds.

This problem is intertwined with the problem of surrendering authority to machines, of giving them undeserved agency. The computer becomes, not just an additional tool for checking grammar and spelling, but responsible for checking grammar and spelling. Computers can do many things, but they cannot be responsible. They cannot feel shame, be punished, be found legally liable, be rewarded, or take pride in their work.

Transparent Limitations

The problem here is not as specific as a few typos in print. It is a general and growing problem that people increasingly surrender authority to software as they collectively suffer ever more ignorance about software's limitations. It behooves programmers to do something they've invested little effort in up to now: make the limitations of software transparent.

Is it our fault that people don't push the button to perform grammar and spelling checks before publishing? Actually, it is. If you know that people are going to reliably fail to perform a check, to merely claim that they "should" behave in ways that psychology guarantees they won't is to simply be complicit in the problem, one step away from the cities that tweak yellow-light durations to raise more money from traffic tickets. As programmers, we are all too often offloading responsibility onto the future, distant, removed user. Since they will likewise end up offloading responsibility onto our software, small wonder that we engender situations where no one believes themselves responsible. That's a small thing when the result is a misspelled word, not so small when the result is a misdiagnosed X-ray.

But reminding the user to push a button is not the real meat of the issue (though it is entirely neglected: does your email client display in red the number of typos when you go to press the "Send" button?). The real issue is transparency of limitations. Does your grammar checker give you an estimate of the number of grammar errors it may have overlooked based on the size and complexity of the text? Has any programmer ever even considered tackling that problem? Likewise, automated software that helps doctors read X-rays needs to continually remind its users of its own false-positive and false-negative statistics. And, of course, a voting machine system that offers a "recount" button that merely reprints the same number from memory is so opaque about its limitations as to be fraudulent.

The Future Ain't Bright

Alas, being transparent about your limitations conflicts with the goal of selling software. The number of known bugs (heaven forbid we would attempt to estimate the number of unknown bugs!) in our software is generally treated as a secret or, in open source software, as another means of avoiding responsibility ("you have the source -- you could fix those bugs yourself!"). Who will buy the word-processing system that estimates the number of flaws it may have missed when the competition simply says nothing and hopes you'll infer it is flawless? As far as I can see, it is in everyone's short-term interests to use software as a general tool for avoiding responsibility. Short of declaring maritial law, I can't see this ever changing.

Saturday, March 13, 2010

Software Tools For Buggier Bloated Software

One of the first programming books I ever bought was Software Tools, by Kernighan and Plauger (which cost me a hefty $11.95 back in 1980). Part design instruction, part anthropology, it was an amazing exercise -- expert programmers presenting actual code to do something useful, and walking you through their thought processes. Software tools let us bootstrap ourselves up to solving more complex problems with less effort. What could be better?

So what do a few decades of technological acceleration do to the realm of software tools? The developments are not all happy ones, I would say. At least not in the area I want to talk about: parser generator tools. (Warning: basic understanding of yacc and language theory required.)

The Bad Old Days

John Aycock has written about the stodgy old tools we used to have to rely on to implement languages, epitomized by yacc and lex and thereafter mimicked in many similar descendants. He skewers yacc with a short example. Imagine you want to implement an assembly language whose syntax looks something like this:

loop:  lda   foo
       shift bar
       halt
He then shows how one might naively capture the syntax with a grammar (I've used yacc syntax here):
%token IDENT
%%
statement
    : label opcode operand
    ;
label
    : IDENT ':'
    | /* empty */
    ;
opcode
    : IDENT
    ;
operand
    : IDENT
    | /* empty */
    ;
As Aycock points out, this plausible first crack at capturing the syntax with a yacc grammar will get you the less than helpful message:
example.y contains 1 shift/reduce conflict

The Good New Days?

All is not lost, though. As an academic interested in parsers, Aycock reports that brand new modern parser generators can blow dear old yacc and lex into the weeds. See, yacc can't handle all context-free grammars, just a subset of them. In Aycock's narrative, the reason such inferior algorithms were used in the bad old days was merely a lack of memory, CPU speed, and state of the art. Nowadays, he says, we've got better algorithms, and memory and CPU are cheap, so why are you living in the 1970's with these old parser generators when you can use a tool that will take just about any context-free grammar you can throw at it? He offers multiple tools that can perform much more generally than yacc, and asks why we aren't all using them.

It is certainly true that we experienced programmers need to keep up with the times, or risk being betrayed by our own experience. For example, cache memory speed has continued to outpace advances in main memory speed to such a degree that, if you're an old programmer who just assumes main memory is fast, your attempts at speed optimization may go awry. Likewise, the balance has tilted towards machines with an embarrassment of riches of both CPU speed and memory for many applications. Not so many years ago, I would have always thought of streaming or paging schemes if working on an application related to documents. Today, I would start with the assumption that 99% of all documents fit in main memory with no problem. But having spent some time using and writing parser generator tools, I'm just a wee bit suspicious of the utopia Aycock (and others) paint.

And of course, it is a very old truism in computers that making CPU fast and memory cheap doesn't make all hardware fast and memory-rich -- it also makes slow CPUs with limited memory cheap enough to use in new situations. It's a good rule of thumb that CPU is fast and memory cheap, so long as you don't forget there will always be common cases where that is not the case. For example, if I'm building a standalone application, I probably really don't care if it uses a generated parser that's a few times slower than what yacc can generate. On the other hand, if I turn that same application into a web service that I plan to put into the cloud (where I'm renting CPU/memory by the minute from Microsoft/Google/Amazon), then suddenly I may begin to care quite a bit about what were previously too-cheap-to-care inefficiencies.

The Bad New Days

Let's go back to that embarrassing inability of yacc to handle this natural-looking grammar:

%token IDENT
%%
statement
    : label opcode operand
    ;
label
    : IDENT ':'
    | /* empty */
    ;
opcode
    : IDENT
    ;
operand
    : IDENT
    | /* empty */
    ;
What went wrong here? Well, when yacc first sees an IDENT, it doesn't know whether it is seeing a label or an opcode. Aycock would say the heck with yacc, then, and use a cool new tool that will look further ahead, find that ':', and then decide whether that initial identifier is a label or an opcode. Stupid ol' yacc.

But wait just a minute. This is the kind of toy grammar that shows up a lot when academics talk about parsing. Let's move to the real world. One real world problem is that you kinda want to reserve opcode identifiers and not let the user accidentally use a label name that is the same as an opcode identifier. Otherwise, the user is one goofed colon away from not being able to figure out what they did wrong (But that's an opcode! Oh... I guess not -- where did that colon come from?). In fact, you might say that yacc's cryptic error message could be interpreted as I'm a little confused because I can't tell a 'label' from an 'opcode'! So somebody has to keep a table of all the strings that represent opcode names, and be able to give us some unique convenient integer ID for each one, since we don't want to be doing string compares all the time. Oh wait, in the bad old days, we have a tool that does all that: lex. Let's suppose we've typed our opcodes into lex and it's identifying them for us. That might lead us to this slightly more realistic grammar:

%token IDENT
%token OPCODE
%%
statement
    : label OPCODE operand
    ;
label
    : IDENT ':'
    | /* empty */
    ;
operand
    : IDENT
    | /* empty */
    ;
With this tiny move towards realism, the most amazing thing happens: yacc can now handle the grammar without the slightest problem! Now remember, Aycock picked the grammar, I didn't. But still, if you're thinking you can't really say the constraints of yacc are related in a sensible way to the constraints of real-world parsing problems based on a single example, I'll agree. I think there's a fairly obvious general principle at work here.
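
Just to make the lex half of that concrete, here is a minimal sketch of the kind of thing I mean (mine, not Aycock's). The opcode names are just the three from the toy example, and "y.tab.h" is whatever header your yacc flavor generates for the token values:
%{
#include "y.tab.h"    /* token values (IDENT, OPCODE) generated by yacc */
%}
%%
lda|shift|halt            { return OPCODE;  /* reserved opcode names */ }
[a-zA-Z_][a-zA-Z0-9_]*    { return IDENT;   /* anything else is a plain identifier */ }
":"                       { return ':'; }
[ \t\n]+                  { /* skip whitespace */ }
%%
In real life you would look the identifier up in an opcode table rather than hard-coding the alternation, but the effect on the grammar is the same: the parser sees OPCODE where an opcode belongs and IDENT everywhere else.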

Why Modern Parser Generators Are Awful

Aycock was dead-on in mocking the horrifically bad error messages and general user-unfriendliness of yacc and its descendants. But the brand of modern parser generator he recommends doesn't really address those problems -- instead, it addresses the problem of accepting grammars with fewer constraints. Are fewer constraints in language design an unabashed Good Thing? Let me quote from the decades-old yacc manual:

moreover, the constructions which are difficult for Yacc to handle are also frequently difficult for human beings to handle. Some users have reported that the discipline of formulating valid Yacc specifications for their input revealed errors of conception or design early in the program development.
So even 30 years ago, folks were quite aware that constraints on grammar were often a Good Idea. Indeed, FORTRAN is often offered as the poster child for what kind of ambiguous ad-hoc monstrosity could arise before the mild constraints of Backus-Naur Form became widely accepted.

But Aycock puts it forth as a Good Thing that the average programmer can grab a Cool Modern Parser Generator and have it accept any old context-free grammar -- even an ambiguous grammar. Let's think about that last statement. Ambiguous grammar means... well, could mean this, could mean that. The intent of the grammar is not clear. One thing that hasn't improved one whit in 30 years is that software isn't getting any less buggy. But now we have tools that will generate a parser for you based on, well, from most perspectives in most situations, what I would call a buggy grammar.
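
To put a concrete face on that (my example, not Aycock's), here is the textbook ambiguous expression grammar, in the same yacc syntax as before:
%token NUM
%%
expr
    : expr '+' expr
    | expr '*' expr
    | NUM
    ;
Given 1 + 2 * 3, nothing in that grammar says whether you get (1 + 2) * 3 or 1 + (2 * 3). yacc at least complains about the resulting shift/reduce conflicts and makes you state your intent (with precedence declarations or by rewriting the rules); a tool that cheerfully accepts the grammar as-is has quietly made that decision for you, or deferred it to runtime.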

Now wait just a minute. If a parser generator accepts an ambiguous grammar, then how the heck do you ever test the thing? Well, if one gigantic modern tool causes you a problem, you just need another gigantic modern tool! So, for example, the Accent compiler-compiler comes with Amber, a separate tool that

allows the user to annote grammars to resolve ambiguities. It also offers a default strategy. Unless an ambiguity is resolved in this way, it is detected at parsing time whether a given text is ambiguous.
So, with yacc, you have to tweak your grammar to make it acceptable. With the Cool Modern Tools, you have to tweak your grammar to make it acceptable, or I guess you can just emit a runtime error for the user to try to deal with (what would it say? Hey, I couldn't decide what the grammar should do at this point -- email me your vote!). This is progress? Well, in some sense, it's the opposite of progress. yacc will flat-out tell you (in a fraction of a second) if your grammar won't work. But with the Cool Modern Tool (the separate one you have to remember to use to check for problems in your grammar):
if the grammar is unambiguous the algorithm may not terminate. [...] one has a good chance to detect a problem
Cool, so you have a good chance of being able to check your grammar. As you can see, the new generation of parser generator tools places a very low priority on a variety of aspects of software quality.

Now, Aycock's not an idiot. There are particular uses for ambiguous grammars and, for those (narrow, out-of-the-ordinary, specialized) applications, I say God bless the tool that can generate you a solution. But there are no such qualifications in his endorsement of using these tools. In fact, what Aycock encourages is exactly what's happening. Programmers with little or no education about computer languages and the parsing thereof are grabbing all sorts of parser generator tools that give them more power than yacc -- more power to implement lousy languages, more power to generate parsers for grammars that are full of bugs, and more power to create software whose behavior they really can't specify at all.

In Aycock's narrative, programmers who can't deal with those confusing yacc error messages will just get a Cool Modern Tool, which will accept any grammar they throw at it, so no error messages, and no problems! But if you hang out with the people actually using these tools, you see a very different narrative. First, the tools are still darn complicated and documentation is, well, about what you would expect from programmers, so people are still struggling to use the tool. But more disturbingly, lots of folks can't figure out what the tool is actually doing. Why does it accept this input and not that one? The purpose of a tool that generates a parser from a grammar specification is precisely to increase the ease of specifying and understanding exactly what syntax will be accepted. The direction of modern tools of this ilk is contradictory to the fundamental original purpose of this category of tools.

The Bigger Picture

I'm picking on parser generators because one of my back burner projects I'll never get finished is a little parser generator (welcome to Google Code -- the programmer equivalent of 1950's puttering around in the garage with the ol' Chevy!). But the problem of software tools (sometimes in the form of frameworks) in the modern age is a general one.

Consider the lowly and well-understood Remote Procedure Call. Writing networking applications is complicated, what with having to shove packets around asynchronously and whatnot. Let's simplify by making network I/O look like a simple procedure call. Is this an advance? Absolutely. Right up until programmers use it without understanding how it works beneath the covers. In my own state (Washington), we got to pay many millions of dollars to cancel a networked computer system that (when last audited, before they decided it was cheaper to just throw it all away) had a response time an order of magnitude too slow to be usable. Was that because programmers were using software tools for networking without understanding them? I don't know, but that's certainly the first place I would look.

We want tools to hide complexity from us. Unfortunately, hiding complexity means the tool is responsible for making complex choices on our behalf, and we're not getting better at creating tools that don't regularly screw up those choices. We also want tools to save us from having to understand the complexity the tool is designed to hide from us. And we're not getting better at creating tools that make their limitations obvious to us, that inform us when we've asked them to do something that a human had better take a look at first.

The new generation of parser generator tools epitomizes the wrong direction for software tool development. Programmers naturally want to focus on new algorithms and taking advantage of new hardware. But the problems that need solving lie more in the direction of human factors. If programmers make poor interfaces, documentation, and error handling for end users, we make spectacularly bad interfaces, documentation, and error handling for ourselves.

The acceleration of technology means that the software world is incredibly more complex now than it was 30 years ago. Back then, it only took 4 years to get a CS degree. Now... it still only takes 4 years -- good luck out there, new graduates! We are too focused on creating software tools that put more and more power into the hands of (from the perspective of any particular narrow tool) less and less well-educated practitioners. Instead, we need much more focus on making tools that make their own limitations transparent, that educate the tool user on how to make good choices, and that work tirelessly to keep the tool user from creating bad code through ignorance. These are largely not problems that require new algorithms and breakthrough ideas, they mostly just require hard work and continual refinement.

Wednesday, March 10, 2010

Let Mozart Be Dead Already

As part of reading the book Technopoly, I'm listening to the old author interview from Booknotes on C-SPAN. Because Neil Postman (before his death in 2003) provided a convenient antithesis to Ray Kurzweil's Snoopy-like dance of technological elation, I am sympathetic, looking for things to agree with, stretching to find common ground. But then he had to go and bring up Mozart.

Poor dead Mozart gets dragged into more cultural battles than just about anybody except Jesus and the Founding Fathers. In 1998, the governor of Georgia would send out copies of Mozart for pregnant mothers to listen to (and hopefully take their minds off of Georgia's third-world level health care for neonates). Postman argued no less than that when children hear Mozart, they can't help but feel that there is order in the universe, whereas when they listen to rock'n'roll they feel that life is just one damn thing after another. OK, it's a 1992 interview, but I really thought all this musical snobbery by intellectuals was reasonably passé by the time the Beatles broke up.

If structure makes you feel there's order in the universe, surely a simpler structure makes more people feel more order more... um, better? How about "My Sharona"? Plenty of variation (guitar break in the middle, pregnant pauses, time changes, etc.), but plenty of structure and order. Light that up around some kids the right age (before others teach them to be snobs about what they are willing to listen to) and soon you'll see them all shout "Wooo!" right on cue -- if that's not evidence they hear "order", then what is?

We've been through all this rock-doesn't-have-this-or-that sniping decades ago, right? Yeah, the Beatles used three chords a lot on the Red/Blue albums, so what? Modern musical historians aren't that impressed with young Mozart (and more than a few think his Dad was helping ol' Amadeus out with his homework!). The Beatles learned a lot more chords and a lot more techniques, just like the Beach Boys with their journey from barbershop to complex tonal progressions. And do you think you can tap out all the time changes in Zeppelin's "Black Dog"? OK, in all fairness, Zeppelin couldn't really do "Black Dog" that well themselves live, but still, they got it on tape at least once.

Of course, since Postman rejected the notion that psychology is a "science", it would have been no use pointing out that psychology supports what common experience tells us -- your view of music is strongly influenced by what you listen to in your youth. Fortunately, I was blessed with a large palette of music in my youth: Gospel, country (old-time country, before it became just rock with a twang), bluegrass, rock, classical, swing, jazz -- and that was well before technology made it easy to tune into any type of music you wanted at any time. I presume Postman was not so blessed, and his formal education did nothing to make up for it. I recall that my overtly liberal public education included instruction on defending ourselves against media manipulation, and one class in particular showed how the same song could be recast as rock, country, or R&B to sell the same product to different audiences. I suppose that sort of useful relativism has since been outlawed by the Texas Board of Education.

I do agree with Postman that you can't appreciate Mozart if you don't have the necessary training. It's unfortunate he didn't see that the same is true for most any music of any complexity -- and all genres offer complexity. What you hear if you try to listen to Led Zeppelin without ever having heard Delta blues or read Tolkien, I have no idea, but it will be something short of what is actually present in the music. It's just plain sad that someone who decried "the surrender of culture to technology" didn't have the necessary training to appreciate the large body of culture represented by modern music.

No doubt the deeper underlying myth here is the familiar pining for a "simpler time". It is undeniable that there were some simpler and happier times (with more "order") in this country -- for some. But when Lead Belly sang "The Midnight Special",  he was not celebrating the special kind of "order" African Americans could expect in Houston. It's always hard to separate this brand of nostalgia from one implicit -ism (racism, sexism, rankism) or another. And technology is making that even harder.

As technology accelerates, the conservative desire for "simpler times" is made both more acute and more contradictory. For people are rarely willing to give up anything to get back to simpler times -- they only want undesirable things taken away. Note that at its peak, conservatism in America did absolutely nothing to swell the ranks of the Amish. The folks most deeply concerned about retaining "traditional" gun rights are still pretty much eating corn-fed meat that was killed in a pen with a steel slug coming out of a compressed air hose, friendo.

Here, then, is my common ground with Neil Postman. For he was clear that technology always gets used, and it always brings both advantages and drawbacks. His call was for a discussion on how to minimize those drawbacks before a new technology (like electronic voting machines) takes over and has its way. Unfortunately, such a discussion is virtually impossible. America has a highly optimized machine for taking any national topic of discussion and turning it immediately into a red-vs.-blue search for advantage or slander. There just ain't nearly enough of us watching C-SPAN, I'm afraid, instead of the daily news frenzy shows.

Thursday, January 07, 2010

More Cloudy Thinking

Since my last post was about the cloud, why not another? I was reading the blog of Nicholas Carr (whose "Does IT Matter" currently holds a place of honor in the introduction to my book-in-progress), as he noted approvingly the attempt by Amazon to create a spot market for computing cycles. After failing to get my comments on that thought posted there (ain't technology grand?), I thought "Hey, I have one of those blog things myself." So, one cut, one paste, and here we are.

The attempt to analogize computing services with electrical utilities breaks down at some interesting points. The one relevant to this article is the point about the service provider attempting to avoid unused capacity. For Chicago Edison, customers were buying its services for themselves. For Amazon, customers are buying its services primarily for their own customers. This implies that one set of customers has a greater ability to regulate their usage in the face of cost fluctuations than the other set of customers (especially given the modern web business model in which the customers of AWS customers expect free and fast web services, rather than expecting to have to pay more for better response times).

Most of AWS's "juice" is sold to companies who are doing things like running websites, or some form of web-based service for their own customers. These are applications that use both bandwidth and CPU in a notoriously bursty and uneven fashion. It is not uncommon to plan for a factor of 10 in overcapacity to account for the difference between normal and peak usage. Indeed, a big part of what companies hope(!) they are buying from AWS is exactly that: overcapacity -- the guarantee that one can add an order of magnitude of resources in a matter of minutes, without having to pay for that enormous difference in capacity when it is not needed.

This places AWS in the conflicted position of joyously promising customers that they will indeed have overcapacity without paying for its idle time, while not really wanting to (or being able to, in the long run) pay for enough overcapacity to deliver on that promise. Trying to create a spot market (note that this is specifically aimed at customers who do NOT have customer-facing, uncontrollably bursty needs) is an attempt to lessen the cost of this fundamental flaw in their business model. It would be interesting to know whether Amazon adjusts their own internal demands on the fly to lessen that cost as well; Chicago Edison was not using massive amounts of its own electricity to directly compete with some of the businesses of its customers -- another breakpoint in this analogy.

I think the label of "the cloud", and to a lesser extent, attempts to analogize computing to an electric utility, are distracting from basic facts. The "cloud" is really only an incremental alteration of the situation with the boring old issues of running an ISP: how much overbooking can you get away with before you piss too many customers off? How many outages can you have before customers gain a more realistic grasp of your quality of service? How many DDOS attacks and RBL listings before customers realize centralization means incurring difficult-to-estimate risks instigated by the behavior of other centralized customers? I believe the more relevant historical situation to examine is the mainframe, and the lessons learned (or often not learned) about the pros and cons of centralization versus decentralization of computing. Computing time was commoditized quite heavily nearly 50 years ago, just about the right amount of time for the lessons learned then to have been forgotten.