Wednesday, December 02, 2009

Fallacious Cloud Arguments

note: this entry reprinted with permission as a guest post at TechFlash.

I have nothing against "the cloud". Hey, when I was in college, the mainframe was "the cloud" to me. Since I never went off campus to speak of, my data was available "everywhere", no matter what CRT console I logged in from. Cool.

Well, not always cool, of course. The lessons of centralized computing versus decentralized computing are too tedious to recount here, so if you haven't lived through any of this repeating cycle, you could go read about it. Might have to actually go to a library, though, not much of even computer history is completely digitized yet.

Really Don't Know Clouds At All

But what got me thinking about clouds was the Peter Wilson talk on "Google vs. Microsoft: An Insider's Guide" at the Dec. 1, 2009 Ignite Seattle. Wilson said the following: "[...] why cloud computing's going to win. So kinda take my word that it will. The short explanation is: when the bandwidth of your network connection and reliability equals that of your monitor cable, there's no financial reason to have a PC on your desktop any more." My bogosity meter immediately pegged and I sat through the rest of his talk enumerating just how wrong that line of thinking was. That's why I can't learn in a lecture environment; I stub my mental toe once and by the time I finish rubbing it, the lecture is over and I missed most of it.

First, just because something is "going to happen" doesn't mean I care. If it doesn't happen in the next 50 years, well, odds are likely I'll be sitting on a cloud myself (best-case scenario, of course) by then, and not too concerned with earthly matters. You want bandwidth to every house as good as my video monitor has to my PC? Get ready to solve the "last mile" problem again (not that it ever got solved that well the first time around -- try watching Hulu via your cable modem on Super Bowl Sunday). My monitor can actually display really high-definition stuff, not the friggin' compressed, glitchy HD that stutters across the Internet these days. Will Wilson live long enough to see the infrastructure (in particular, that "last mile" that nobody wants to pay for) get upgraded to that kind of quality? Maybe, but I wouldn't bet on it without first getting a look at his medical records. My monitor was also really, really reliable during the last year. Zero seconds of downtime. There's exactly one point of failure between my monitor and my PC: the cable. There's umpteen points of failure between my monitor and Amazon, or Google, or Microsoft -- and they don't even control many of them. So, I'm pretty sure the reliability won't be there in my lifetime, just arguing from the simple statistical rule that the odds that all umpteen failure points will be error-free for a year are the product (not sum) of their individual odds (.99^x shrinks pretty fast as x increases). My neighbor took "the cloud" out with a backhoe in his driveway just a few weeks ago; my monitor cable was never in any danger.
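That compounding is easy to see numerically. Here's a minimal sketch (the 0.99 per-link annual reliability is a hypothetical round number, same as in the text, not a measurement of anything):

```python
def chain_reliability(per_link: float, links: int) -> float:
    """Probability that a chain of independent failure points all
    stay error-free, given each one's individual reliability."""
    return per_link ** links

# One point of failure (the monitor cable) vs. many (the path to a cloud vendor):
print(chain_reliability(0.99, 1))   # 0.99
print(chain_reliability(0.99, 20))  # ~0.818 -- roughly a one-in-five chance of trouble
```

The product shrinks geometrically: even twenty quite-reliable links in series already give noticeably worse odds than any single link.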

Second, my monitor has very little latency. In fact, I believe the latency is about... oh, let's just call it zero, because it's close enough. Getting the same latency from the cloud is going to require... changing the laws of physics. Sure, you can do lots and lots of useful stuff with Internet-level latencies, but keep in mind, much of that stuff is getting a lot of help from that local PC. I'm typing this post "into the cloud", but in fact it's being buffered locally. If 'twere not so, then I would be reliving ye olde telnet days, and the Internet would be full of HTTP packets containing exactly one byte. Let's just say that would increase general unpleasantness without going for a detailed analysis (don't dare me, I've been looking for a reason to use the phrase "Shannon entropy" in a sentence!). I don't actually recall any of the prospective cloud purveyors proclaiming you would no longer need a PC on your desktop, but since Wilson has worked at two of them, I have to suppose it's possible someone at Google/Microsoft really is thinking that way. Seems bizarre to me.

Third, there's a little problem of psychology. Kindle owners got a jolt when they realized that Amazon really could delete any of "their" books they wanted to. The sellers of cloud voodoo think we want them to hold all our data for us. They clearly have not taken a poll on this. I want to be able to pick up my data in my hand and hold it, and I absolutely, positively do not want Microsoft, or Amazon, or Google to have the power to deny it to me, whether by court order, or hacker nonsense, or maliciousness, or even an act of God. My data, my hand, mine, mine, mine. Sometimes I find it hard to believe these cloud sellers ever lived through the PC "revolution" (don't remind me some of them didn't -- I'm thinking young thoughts today).

My view of the cloud seems to be pretty inverted compared to, say, Ray Ozzie's. I don't want the cloud to be the authoritative copy of my data, waiting to be downloaded to any machine I care to use (well, any machine that passes Microsoft DRM muster). I want the authoritative copy of my data to be on my USB drive in my pocket, and I want the cloud to just make fairly dumb backups (encrypted of course, stored across more than one vendor) as I work -- just in case my USB disk (my data, my hand, mine, mine, mine) and any other local copies I have happen to fail. The portable app movement is on my side, but of course Microsoft wants control, control, control over their apps, so none of them are portable.

I'm no good at predicting the future (though I did call the collapse of Itanium pretty well in one WDJ editorial), but I'll predict this: there will be a market for software that tries to commoditize the various cloud offerings. Got a cloud backup program? Build it so it lets me choose whether to use Amazon S3 or some other cloud vendor for storage. Got some cool picture/movie app? Build it so it lets me choose whether my JPEGs go onto Flickr or Smugmug or whatever (and let me change my mind and move them later). And so on. One can imagine a world in which a "virtual cloud" application automatically moves both data and apps around to whomever has the cheapest rates. That would make all the cloud vendors unhappy, which pretty much guarantees it's good for consumers.

Tuesday, December 01, 2009

Mailing List for PPoP Book

Finally got off my butt and set up a Google Group to function as an announce-only mailing list for my upcoming book "The Pop Psychology of Programming". You can go sign up here (to my amusement, cutting and pasting a Google Groups signup box here in Blogger just produces glitchy non-functionality). I've been working with renewed vigor since viewing the movie "2012" since, obviously, I want to get this thing done before the world ends. Who even knew that neutrinos could mutate? The ancient Mayans, apparently.

This belief that someone else knows/knew a whole lot more than us is, I think, sometimes a way of relieving cognitive dissonance that arises when we ponder miraculous-looking accomplishments we can't understand. (Hmmm, either those folks know a lot more than me, or else -- maybe aliens did it!) The latter, of course, is exactly what some people believe firmly: there's no way humans could have come up with transistors and computers, it is more plausible that we simply gleaned that technology from downed alien spacecraft that now reside in Area 51.

Having grimly ground out my EE degree in college, I could tell these folks that there is extremely detailed and well-dated documentation on how all those technologies have evolved. There are no remarkably enormous leaps of intellect in this history, just determined grinding away at one problem at a time, with innumerable contributions from the serendipity that occurs in all our lives with some frequency.

Sadly, folks who believe we aren't smart enough to have invented computers have no interest in looking at this history. We live in a time where the very accessibility of information makes information more ignored than ever. All manner of detailed technological information is online, but most people would rather debate talking points hatched by political operatives than go spend hours studying, say, the chemistry of climate change or the mechanics of cancer that ensure earlier detection increases the harm done via increased unnecessary treatments.

On gloomy days, I suspect this is the limiting factor that keeps the human race from surviving en masse too much longer: genetic evolution can't move fast enough to help us survive the problems we are creating for ourselves. Darwin was emphatic in his belief that our hope lay in a moral evolution that would operate via education, religion, culture, etc. To be charitable, that viewpoint continues to look a bit iffy. Despite my natural pessimism, however, the book will argue that programmers may have a role in tipping the balance in favor of human survival.

Friday, October 30, 2009

Ignite Seattle Returns!

The nice folks at Ignite Seattle! are going to let me speak again! If you're in the Seattle area December 1 (a Tuesday), come on down to the King Cat Theatre. It's free, it's fun, and drinks will be served (but drinks aren't free)!

This will be an interesting experiment for me. My last talk video "went viral", and I want to see if that's reproducible or mostly luck. That talk had the advantage of both containing humor and being topical, since I was able to touch on the Microsoft layoffs that were in the news. This time around, my talk won't be the slightest bit topical or humorous; this time, it's nothing but ideas.

This talk is drawn from the Introduction to my book, which I've been working on for about a year. That sounds nuts, even to me, but the explanation is that the Introduction is about the meta-question of "why should anyone care about computer programming?" Answering that question requires me to read about... well, just about everything under the sun. The result has made me think about computer programming in completely new ways, and I want to convey a taste of that in this talk. Here's the synopsis:

Three Strange Definitions of Computer Programming

Legendary computer scientist Edsger Dijkstra once said: “Computer Science is no more about computers than astronomy is about telescopes.” But if programming is not about the computers, what IS it about? I want to give you three strange definitions of computer programming that will forever change how you think about software. Exploring the true nature of programming requires tracing its connections with philosophy, psychology, evolution, and physics, and following these threads leads to a startling conclusion: computer programming is not a product of the human mind – it's a product of the mind of the universe!

Saturday, October 03, 2009

Privacy is a Funny Thing

Privacy resurged to the forefront of public debate after 9/11, and one of the more chilling examples was reports of the FBI trying to strong-arm libraries into handing over people's library records without a warrant. This was kinda dumb, since warrants are handed out like candy, and the Patriot Act forbids anyone getting such a warrant from making that fact public. It was even more dumb if you know anything about librarians; these are folks who've spent a lot of time thinking about why they're in the library business -- they don't go into it to get rich.

With that backdrop, you would think it's nuts to create a startup that relies on people being willing to give up privacy about their reading habits. But that's just what Library Thing does. Library Thing is one of the most successful book cataloging sites. You create an account for yourself, and start entering the books you have on your shelf. You don't have to expose your collection to others, but most people do. That's interesting. Why is that?

There are "social" features built in to Library Thing to encourage you to abandon the same privacy you would shriek to see your librarian surrendering on your behalf. You can see who-all owns the same book you do, or even who has "similar" collections (fat chance for me; since Paula and I are sharing an account, we are only "similar" to people who are experts in psychology, programming, and astrophysics).

But I think the basic motivation to abandon privacy is simpler. If you're going to go to the trouble to enter all your books in some digital list, you're probably a bibliophile, or at least suffering the early stages of the disease. In other words, Library Thing implicitly selects people who are invested in and proud of their book collections. Not to mention, people who have so many books that they have experienced the embarrassment of bringing home a new purchase only to find they already own it -- a disappointment Library Thing can help you avoid by letting you use your cell phone to check to see if you already own a particular book.

Library Thing lets you control your own privacy, but in some sense, simply makes it "cool" to abandon your privacy. There certainly is a Tom Sawyer component here: c'mon, tediously enter all your books, put them on display to others, and I'll let you pay me for the privilege! I'm bought in at this point because the interface is tolerable, export/import seems to work (not going to enter data I can't move elsewhere), $25 for lifetime service is cheap, and I really, really hate staring at a book at Half-Price Books and wondering if I already own it or not.

You can watch me helping Library Thing whitewash their fence by checking periodically here. By entering a shelf per week, I predict we'll have the entire collection entered before the end of 2011.

Monday, September 28, 2009

Commas Depend on Linebreaks

Comma usage drifts over time. Comma cutbacks were in effect by Dickens's time, but his usage would be deemed liberal by today's standards. It seems to me that even in the span of my reading lifetime (five decades), comma usage has become even more sparing.

Something bothers me about this development that I've never nailed down, like a movement flickering at the edge of vision. Today, while reading Nevada Barr's Borderline, I finally caught hold of what it was: the more sparingly one uses commas, the more one is left at the mercy of line breaks.

Commas, or at least the ones most likely to be judged optional, indicate pause. The game becomes "will you see the natural pause implicitly, or must I point it out to you?". As more and more optional commas are left out, what's been subliminally annoying me is that the subtle physical eye-movement pause that line breaks impose becomes more likely to arrive at the most perfectly wrong moment.

Consider Barr's sentence that finally woke me up:
Maybe she was blackmailing
him for money even he couldn't afford or marriage
and a place in society.

I don't know where that line will break on your screen as you read this, but in the book, it unfortunately broke right after "marriage". This had the unpleasant effect of emphasizing "afford or marriage", a syntactically valid construct that is semantically nonsensical. As a result, I had to back up and re-read the entire sentence to make sense of it. In this sentence, it would have been perfectly grammatical to have placed a comma after "afford", and that pause would have made it crystal clear that "or marriage" was only the beginning of the alternative.

In this particular sentence, one could argue that the missing comma is an error because it's needed to show whether the "or" or the "and" binds more tightly. I claim, however, that I could read the sentence without any hiccup if the line break had not landed exactly in the place a pause made syntactic (but not semantic) sense. In any case, this was just the example that woke the camel up -- I'm certain I've seen other situations with little to favor the missing comma apart from how its absence interacted with an unfortunate line break.

The dastardly thing about this is that authors generally have no defense against this. Final fine-tuning of linebreaks is an activity that logically takes place last, after all other corrections are in. In theory, apart from hyphenation gaffes, fixing up line breaks (and widows and orphans, at the page level) should not introduce errors. And, of course, this isn't really an "error", it's an irritant. But even if recognized as an irritant, it would be fairly hard to check for, by either human or machine means.

This seems to me to put a weak bound on the trend towards fewer commas. The odds of this irritant go up the more optional commas you omit, and unless there's no typesetter in your book's future (unlikely unless it's a fairly amateur production), you'll probably have no means to catch such things. On the bright side, possibly I'm the only person who notices such things!

Monday, June 15, 2009

The Limits of Melancholy

Of all the programming books you've never read that I've read more than once, The Limits of Software by Robert Britcher is my favorite. I say you've never read it with some confidence because a) few programmers read books and b) this book was doomed to a narrow audience by its very nature.

The first audience-limiting feature of the book is its style. As Robert Glass says in the Foreword, "It's part storytelling, part history, part art, part science, part philosophy, part logic--all entwined around the subject of computing and software." At the hub of it all is Britcher's experience with one of the largest engineering projects in history: the computer systems of the FAA. But that is only the hub, and the spokes are long and not straight or predictable. If you just want a straight story on his experience with the FAA Advanced Automation System, you can read Britcher's more ordinary recitation that appeared in the book Software Runaways. But this is more like sitting down to talk to someone who was in World War II--you don't get facts, figures, or dates like a textbook would give you, but you get stories that make you think, that you can find morals in, if you care to look for them.

The second audience-limiting feature of this book is that you kind of have to be an old programmer to appreciate it. For example, Ada is mostly just a name rather than a saga to a great many programmers who never knew a PC-less world. I remember when some distant cousin, being a military contractor, notified my parents that I should be learning Ada because it would soon be required of all programmers. I worried briefly, noticed that nobody else writing C code was worried, then forgot about it. Later, a small consulting company I was in got the job of helping port an Ada compiler to some Unix platforms. We marveled at the test suites it had to pass and wondered no more where our income tax money went. Much later, I briefly plunged into a real, honest-to-God Ada disaster as a sub-sub-sub-contractor on an anti-submarine aircraft software project for the military. The game, as it was explained to me by my elders on the project, was that everyone knew the project would fail, but we had better make damn sure that we were not the first sub-contractor to miss a milestone and get the blame for the true state of things. As I stared at hundreds of lines of paper Ada (there were not enough terminals for me to actually edit anything) and found that not a single identifier could be understood without first looking through reams of definitions and declarations elsewhere, I decided that this was the game Ada was born to play. I think it's hard to grasp lines like "Ada the language was eclipsed by Ada the cottage industry" if you never brushed up against it in your life.

More Than a Feeling

The thing that's unique to me about Britcher's book is that reading it leaves me with a powerful feeling. I get lots of things from programming books, but usually not an emotion. Oh, I did get a giddy feeling when I read Peopleware that the state of programming management was really going to improve. How embarrassing it is now to confess that I actually thought a book could make a dent in management behavior!

It took me a long time to put my finger on exactly what it was that The Limits of Software made me feel. Then I realized: it was melancholy. Melancholy is a hazard of age I conclude, now that my dotage is upon me. When you see email and instant messaging being invented again for the umpteenth time as though it were something new, or watch seven rounds of layoffs each be followed by hire-backs as management could never quite grasp that laying off N people reduces the head count by N+M (where M is the number of people deserting a sinking ship), melancholy is the natural response.

But the melancholy that Britcher instills is grander than I can convey here. The subject, after all, is about limitations, not abilities and accomplishments. If Isaac Newton saw further by standing on the shoulders of giants, Britcher sees further by standing on the piled corpses of our dreams. Dreams of bug-free code. Dreams that formalism and better processes will prevent software disasters. Dreams of "the right way" to develop software. Dreams of a better programming language that will remove our current dissatisfactions. Dreams that reusability will drastically lower development costs. Dreams that technology is the answer to human problems, and all shall be well, and all shall be well, and all manner of things shall be well.

Wednesday, June 10, 2009

The Old Old Thing

The latest installment of the sad, long-running serial "Programmer, Meet the Book Business" is supplied by Raymond Chen. He writes The Old New Thing, a blog full of highly useful technical information for those dinosaurs (like me) who still interact directly with the Win32 API (rather than relying solely on one of Microsoft's 15 bloated, incompatible frameworks designed to make programmers' lives easier). The blog post title of Whew, I'm not doing *that* again! sort of gives away the crux of his experience as a book author, but in case it left you with doubts, you can read on to find him say "I make barely any money from the book at all."

One part of me says that stories like his are doomed to keep repeating at least in part due to the same mentality that causes Cartesian Programming. Notice how, long after agreeing to do the book, writing the book, and waiting around to see it flop, he finally gets around to looking to see what other programming book authors' experiences are. My prejudice is to believe that this is characteristic programmer arrogance, but maybe it's just human. After all, there are people who make a living walking up to people, telling them they want to make them famous models, and then extracting fees from them. They couldn't make a living if most people exercised the commonsense strategy of stopping to do a little research on how professional modeling actually works.

What Went Wrong?

Without knowing the slightest detail of Chen's book publishing experience, I can probably paint a picture that describes what happened here that's in the ballpark. Acquisitions editor has a quota to fill. Publishers aren't like Pixar, they don't work as hard as it takes to make every title a hit. Quite the opposite. They work hard to keep their costs down so they can throw a bunch of titles against the wall every year and see if any of them stick.

Raymond was an attractive target, since he had already cranked out a ton of technical information in his blog. Yeah, that has the disadvantage that the content was already out there for free, but it has the advantage of raising the odds that the author can actually deliver a manuscript in a reasonable amount of time. You can take those risks by keeping costs low enough.

Part of keeping costs low is that the author gets to market the book. Even though there is often some mention of this fact to the author, usually publishers fail to convey how much the author's efforts can make the difference between success and failure.

Finally, there's the small matter of making the author any money. Almost all publishers who will offer you a book deal are publishing the way God intended: with 45 middlemen taking a cut before the author gets a dime. After all, you want your book to appear in a real bookstore right along with... the latest teen romance/vampire novel, don't you? In this tried and true (and in recent years, truly failing) publishing model, almost none of the people in the chain know or care the slightest thing about the content (remember that thing we somehow expect people to pay actual cash for?) of your programming book. And, of course, the final act of not caring is when the chain bookstore that never made any effort to sell your book decides they can't sell it, so they tear the covers off and mail them in and mulch the books (the covers prove they aren't just lying about how stinking bad your sales were). It really is a sad soap opera, isn't it?

What Could Have Gone Right?

If I had made it my life's duty to make Raymond's book successful, could I have done any better? Who knows, but I think so. First off, forget about putting this book in every book and grocery store. This book should have been sold direct via the web. Do you really think that even 0.0001% of Borders customers just happened to stumble over this highly targeted, highly technical book and decided to buy it on impulse? Of course not. Just about everybody in the target demographic for this book spends lots of time on the web and is comfortable buying books off the web. Selling direct makes it possible to spend a lot less money on middlemen. You think I jest when talking about 45 middlemen? Take one example. When selling books in a bookstore, suddenly the cover and spine of your book take on majestic importance to people at the publisher who will never, ever, actually read your book. Someone has the job of being obsessed with how your spine will stand out from its neighbors in the bookstore. And, being a first-time author, you have some longstanding dream about putting something idiotic on the cover like your Grandma's favorite dead cat -- it's their job to talk you out of that crap while not pissing you off. That person gets paid a salary. Selling direct removes the need for most of that dithering (well, except for the part about convincing the author to give up their nutty cover demands).

Of course, the real pain of the middleman is much bigger than any of the small potatoes like the cover designer. The distributor wants a big percentage of your price just for the privilege of getting a copy or two into the big chains. Once you've got your book distributed, what do all those various bookstores do with it? Why, they start competing on price. And boy, can they afford to compete when their contract says they can just return/mulch any copies they can't sell. For example, right now Amazon claims that Raymond's book costs $40, but they have discounted it to $26. If you sell direct, nobody else gets to decide how much to discount new copies of your book. As you can see, Raymond's book could have been sold direct at a no-discount price of $36 and still left $10 more profit per book to go around.

Or, suppose I really do want the customer to get the book for $26. If I'm a small publisher, the distributor might only want to pay me $10 for each of Raymond's $40 books. If I sell it direct for $26 (same as Amazon's discounted price), my costs for running the credit card and doing fulfillment would have to be $16/book to make life as bad as using a distributor. I can pay others to do the processing and fulfillment for much less than that.
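The arithmetic in the last two paragraphs can be laid out explicitly. A toy sketch (the dollar figures are the ones quoted above; the variable names are mine, and real publishing contracts are of course messier than this):

```python
list_price = 40         # publisher's list price for the book
amazon_price = 26       # Amazon's discounted price
direct_price = 36       # hypothetical no-discount direct-sale price
distributor_payout = 10 # what a distributor might pay a small publisher per copy

# Selling direct at $36 instead of letting the channel discount to $26
# leaves this much more per copy to go around:
extra_per_copy = direct_price - amazon_price
print(extra_per_copy)  # 10

# Or match Amazon's $26 direct: card processing plus fulfillment would
# have to cost this much per copy before direct sale did worse than
# taking the distributor's $10:
break_even_cost = amazon_price - distributor_payout
print(break_even_cost)  # 16
```

Since fulfillment can be outsourced for far less than $16 a copy, the direct model wins on either comparison.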

I have no idea what-all marketing took place for this book, but I suspect it could have been better. Like I keep saying, this book has a pretty narrow audience, and you have to turn that to your advantage by knowing (publishers usually don't) where that exact demographic hangs out. For example, the Association of Shareware Programmers has quite a few hard-core API programmers, and when Raymond's book was mentioned there (not by Raymond or any marketing folk from his publisher), most people hadn't heard of the book or him. For the price of a couple of review copies to selected ASP members, I would have stimulated some early easy sales and good blog reviews.

Publishers (and often authors) don't really have the savvy to know things like the fact that the ASP would have been a good (essentially free) place to market this book. They are operating on thin margins, and can't really afford that kind of cross-domain expertise (if they even knew how to find it).

Will This Change?

Publishing changes pretty slowly, but there's no reason tech publishing can't change faster. Most publishers are focused on the threat/opportunity of electronic publishing. I think they've missed a big boat. Good old paper tech books still have a lot of life, IMHO, but the model has to change so that it can actually afford to pay the writer and (even more important and expensive than in other genres) pay a damn good technical editor.

Where's that money going to come from? I think it can come without any cost increase to the customer by being a direct-sale publisher and ceasing to pay all those people who can't actually read your book but currently get a bigger chunk of the book cover price than the author. But that's just me; I could be completely nuts.

Monday, June 01, 2009

Ignite Videos on YouTube

All the videos from talks at the 4/29/2009 Ignite Seattle! have made it up to YouTube now. Some links:

I actually only heard about the videos arriving from a friend of a friend who saw it go by on some RSS feed. By that time, there were already 47,000 views, so apparently it got posted on one or more popular sites. I tried submitting it as a SlashDot story, but was rejected again (0 for 4 over the years, even though SlashDot likes me enough to let me turn off their advertising).

Alas, the domain I slapped up for the video to link to was inaccessible, proving I'm an incompetent sysadmin. Actually, I do sysadmin-ing to learn, not professionally. My ISP's DNS servers are slaved to my "stealth" server, so it looked to me like it was up and running, but I forgot that I had never told my ISP to add that to the list of domains they slave to me for, so nobody outside my network could see it. Oh well, I'll get that little one-page wonder up today for sure.

I was surprised by the 47,000 views. I don't know if that implies something about the talk itself, or just luck of the draw having the right random people post links to it on the right forum. In any case, I'll definitely make an effort to present at Ignite Seattle! again when the book is done. They seem a little fuzzy on scheduling (whaddya want for a free event?), but this last Ignite was by all accounts pretty successful, so I'm hopeful it will become a more regular occurrence.

The popularity of this video has made me rethink the format. I had vowed that if I ever did another Ignite, I would not do a memorized speech, as it's too nerve-wracking. But it's really hard to get the maximal value out of 5 minutes if you don't choose your words carefully, and that pushes you towards using a script. Maybe it won't be as nerve-wracking if I have more than a few days to prepare.

Or maybe one gets better at memorizing with practice. Few of us can recite something the length of the Iliad from memory, but there used to be rather more people on the planet who could do that sort of thing before cheap paper and pens greatly decreased the value of memorization as a skill.

Wednesday, April 29, 2009

Ignite Seattle! Tonight

Tonight is Ignite Seattle 6, the latest in a series of collections of short talks by varied speakers on topics at least vaguely of interest to geeks. I am slated to give a talk on "The Psychology of Incompetence". This has turned out to be basically an extended humorous rant on unrecognized incompetence embedded in the software industry. It'll be interesting to see how well I can perform it and how well it plays to the audience.

The Ignite format was an interesting experience to work with. The rule is that you turn in 20 slides, which will advance automatically every 15 seconds, giving you time for exactly a 5-minute talk with no opportunity to slow down, speed up, or start over. It seems to be a relatively new rule that "no notes" are allowed on stage. Maybe that's just to weed out people who weren't going to rehearse at all.

Unfortunately, I didn't hear about and apply for this Ignite until the last possible moment, so by the time I was notified that my talk was accepted, that left only a few days before the slides were due. My luck running true to form, that was the exact moment I started coming down with the stupid influenza. If Paula hadn't jumped in to help make the slides, I would have failed to deliver.

It was an interesting iterative process, and probably my process was different from what many would use. First, I had to write a script that I could verifiably perform within 5 minutes. I was surprised to realize that 5 minutes of talking was little more than one sheet of paper. Then I had to recite the script and mark the slide boundaries. Then slides had to be made that at least roughly corresponded to what was being said at the time.

By that time, slides were due, so they were essentially set in stone even though I had not started serious rehearsing. Rehearsing showed problems. Too much material was crammed into 5 minutes, requiring a rapid pace that allowed no space for audience laughter and, worse, no recovery time if I made even the slightest stumble. Unable to change the slides at that point, I just had to mercilessly edit the script to create a reasonable amount of slack.

The final, tuned script wasn't complete until yesterday. Unfortunately, I'm prone to losing my voice and have entered the coughing phase of the influenza, so I had to stop rehearsing, except for a couple of final run-throughs just before going to sleep (to give my brain the hint that that's what it should be moving to longer-term storage while I was asleep).

Today, I'll do a small amount of shuffle rehearsing: pick a random slide and start the talk at that point. The goal is that even if I have the complete disaster of a coughing fit or being unable to remember entire lines, I can hopefully get going again. Then I'll gird myself with dextromethorphan and a plastic bag (rebreathing quickly opens the carotid, moving more oxygen to the brain) and head out to give it the old college try. What could possibly go wrong?

Sunday, March 01, 2009

The Evolutionary View of Software

Evolution, as Daniel Dennett explains, is at heart a simple algorithm: 1) replicate new copies with variations; 2) select according to some criteria; 3) repeat. As a pure algorithm, evolution applies not just to biology, but to any entities that can provide the required replication-with-variation and selection steps.

The non-intuitive thing about the evolutionary algorithm is that such a simple set of rules can create such incredible complexity. Thus, when the evolutionary algorithm is used by software to design solutions to problems, it is often difficult for humans to understand the design of the resulting solution -- just as it is often difficult for us to understand the design of complex biological entities that evolution has created.
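For the curious, the whole algorithm really does fit in a few lines. Here's a minimal sketch in Python (the bit-string genome, the fitness criterion, and all the parameters are toy assumptions of mine, not anything from Dennett): it evolves bit-strings toward whatever a supplied fitness function rewards.

```python
import random

def evolve(fitness, genome_len=20, pop_size=30, generations=60, mutation_rate=0.05):
    """Dennett's loop: 1) replicate with variation, 2) select, 3) repeat."""
    population = [[random.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Select: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        # Replicate with variation: copy each parent, occasionally flipping bits.
        children = [[bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in parent]
                    for parent in parents]
        population = parents + children  # ...and repeat.
    return max(population, key=fitness)

random.seed(0)              # deterministic run, for illustration only
best = evolve(fitness=sum)  # toy criterion: fitness is just the count of 1-bits
```

Nothing here is clever; the point is how little machinery the algorithm needs before it starts producing solutions nobody explicitly wrote down. Swap in a fitness function you can't see through, and the winning genome quickly becomes as inscrutable as the biological designs evolution hands us.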

Software as Evolutionary Candidate

The most obvious application of evolution to software is to write software that replicates itself with variations and let it evolve. This gives rise to the scary Kurzweilian future in which software reaches a sentient state and evolves so quickly there's virtually no problem it cannot solve (watch out if it decides humans are a "problem" that needs solving!).

But one reason this doesn't seem to be happening very fast is that software is a lousy replicator. The most obvious proof is that when, on a daily basis across the planet, someone stands up and shrieks because they just lost an important piece of software because of hardware failure, this is never followed by "Ah, never mind, I found another copy growing on the windowsill." Software still needs a directed human hand to replicate, and certainly a directed human hand to select for "fitness".

But what if software is embedded in sensor-ridden boxes so that we can just drop it into the real world and let all the complex rules of reality perform the selection? Then, something closer to real evolution takes place -- but at a pace that is bounded by the pace at which reality happens. Thus, as Dennett reports, it takes such machines many days to "realize" and react to the fact that ambient light on this planet comes in 24-hour cycles. And, when Kurzweil's software goes sentient, all its exponential power will not make it that much better at solving the problem of divorce than humans, since it will have to experience many marriages in real time to begin to evolve improved performance. It is Bonini's Paradox all over again: the better a model approaches reality, the closer it gets to being just as difficult to use and understand as reality. No matter how exponentially increasing our computing power is, reality plods along at the same speed, and many of our most interesting problems have to do with reality.

Units of Selection

There may be more interesting ways to apply evolution to software. Leo Buss, in "The Evolution of Individuality", says that "The history of life is a history of different units of selection. Novel selective scenarios dominate at times of transition between units of selection." He is talking about a hierarchical view of evolution where, for example, cells transition to multi-celled organisms. Long ago, our ancestor cells had to battle natural selection to survive. Now, my cells play a multi-part game of selection, in which the pressure on their individual survival is less, but they have pooled their fate into the survival of a conglomerate of fellow cells (me!), and that conglomerate faces more directly the selection pressures of the environment.

Of course, where to draw the line between "conglomerate of entities" and "new entity of more complexity" is a little fuzzy. I'm pretty sure I'm a single entity, but if you really get down to the level of one of my cells, tirelessly performing hundreds of different tasks, trying to fight off free radicals, perhaps reproducing a few times before it can no longer stave off death -- from that cell's perspective, I look a lot more like a big conglomeration that happens to have some emergent behavior it, endearingly, likes to call "consciousness".

But there is room for kinds of hierarchy in evolution other than just conglomeration. For example, it is now nearly an orthodox view that the reason each of our cells has 2 sets of DNA is that, long ago, the single-DNA ancestor of our cells captured and enslaved another type of cell -- what we now call the mitochondria. Whereas the DNA in the nucleus of your cells came from a combination of your mother and father, the DNA in your mitochondria comes only from your mother -- the slave gets rather less of a shot at the evolutionary algorithm. I wonder if, instead of thinking that software might one day become a better replicator and evolve on its own, it might instead already participate in a master-slave relationship with an existing evolutionary entity: us.

Mitochondria don't really get to reproduce except under the direction of their master cell, and this is true of software as well. Mitochondria are also somewhat insulated from selection pressures except as dictated by the enslaving cell, and this is also true of software. Mitochondria get to live because they serve a purpose in the evolutionary survival of the enslaving cell. Does software serve any purpose in our evolutionary survival?

It does, and that purpose is: information flow. It is information flow that makes a conglomerate of entities survive, and lack of information that destroys the conglomerate. At the level of human society, it is information flow that leads to non-zero-sumness and positive growth. Lack of information flow from code breakers to island defenders allowed the devastating attack at Pearl Harbor that put America in a losing starting position for World War II. Lack of information flow allowed investment banks to take on increasingly untenable amounts of risk without being reined in by their investors -- until it was too late. When the President announces a website to provide transparency on how his gigantic gamble of a stimulus package is spent, it may be part PR, but it is conceptually functional -- if you want to decrease corruption, you have to increase information flow.

Software as Evolutionary Advantage

Software, of course, is pure information. It offers opportunities for increasing both the speed and accuracy of information transmitted between entities in the conglomerate of human society.

In modern society, we have pooled our evolutionary fate into a conglomerate. Although I'm very busy on a daily basis with concerns related to my own survival, it turns out that these require constant interaction with the other "cells" in the conglomerate entity. Much of my food comes from specialists in distant places, and much of the work I do to survive is, rather than being directly related to my nutritional needs, sent out to satisfy other needs of the conglomerate.

In this model, software is a new form of information exchange. Software lets us exchange, not just ideas, but ideas in a precise, executable form. Just as cells had to evolve complex machinery for correcting errors in DNA replication, software represents an evolutionary advance in our ability to accurately replicate ideas.

And this brings me back to the Leo Buss quote. As many authors have noted, human history is grinding towards a turning point, where we either have to evolve an ability to face the kind of global, long-term problems that evolution never prepared us for, or fall back to the "nasty, brutish, and short" lifestyle of simpler times. This may be one of the "times of transition between units of selection" that Buss was talking about, and software may be key to the "novel selective scenarios" that decide the outcome.

Now I have to go write some code.

Saturday, February 21, 2009

Me, The Jury

In everyone's life, there comes that day we all dread but must accept. I'm talking, of course, about the day you can no longer dodge jury duty. On the one hand, my civic duty meme says it's good to pitch in and play my part for society. On the other hand, my cynical gene says there's no way any prosecutor is going to put me on a jury, and every day I sit in district court is another day the lovely profits from this book I'm writing are postponed. On the third hand, my stay-out-of-jail meme says that the federal court system is a lot less tolerant of jury dodging than the state, so that pretty much seals the deal!

  • + It's a federal jury, raising the odds we'll be stickin' it to the man rather than trampling the downtrodden.
  • - O.J. is off the streets, so it's unlikely to be the trial of the century.
  • + The courthouse provides free internet access.
  • - Have to get up at the butt-crack of dawn to catch a bus and be there by 8:00 a.m.
  • + Serving will allow some hard-working employed person to continue to be productive.
  • - There are plenty of freshly-unemployed people who could do it so that I can continue to be productive.

Of course, anyone can get out of jury duty, using well-documented means, but my civic duty meme has won out and I'm resigned to going and making the best of it. In fact, given that $40/day represents a serious bump up in my income, my answer to the question "Is there any reason you cannot serve?" becomes "No, in fact, if there's any way I can serve on three or four juries at the same time, I would like to sign up for that!" Sure hope I don't have to pay income tax on this windfall.

Technically, of course, I shouldn't get seated on a jury. Once the prosecution sees I listen to NPR, oppose the death penalty, graduated college, suspect televised wrestling might have a predetermined outcome, question the existence of free will, and so on, s/he would have to be nuts to accept me. But it's a human, and therefore chaotic, process, and each side has a limited number of get-out-of-seating-that-nutjob-free cards, so as Billy Joel says, "Sooner or later it comes down to fate," and getting to sit for the whole 2-3 week trial cannot be ruled out. I might as well be the one. It's a matter of trust.

But probably what really keeps me from trying to dodge my duty is that I've been immersed in psychology for years now. The courtroom is jam-packed with psychology. That is, of course, where Dr. Phil made his bucks, leading to his meeting Oprah and his opportunity to start making megabucks. If a defendant has enough money, there's going to be a psychologist on their side studying how to sway the jury.

Once upon a time, I was an expert witness in a Microsoft trial, and psychology was key to my testimony. I was there as a magazine editor to testify merely on some peripheral point about reverse engineering. But at some point during the preparation, it came to the attention of the attorney that there was a connection between a columnist of mine who happened to work for Microsoft, and the actual case. The attorney asked if he could ask about this, I said I would ask the Microsoftie, the Microsoftie said "please, please don't", so I relayed the negative back to the attorney. No big deal, he said, not that important anyway. Well and good, except just before I walked into the courtroom, the attorney said he wanted to walk me quickly through the questions he would ask, and for me to pretend I was on the stand and under oath. He stepped through the questions and then suddenly interjected with "Is it true that [insert the question we had previously agreed he wouldn't ask]?" I stared back at him and, without missing a beat, said "No."

He was using psychology, the pressure of the situation, the idea that I was pretending to be under oath, to see if he could get what we had already agreed I would not offer on the stand. I saw exactly what he was doing and would not go along. Would I have actually gone ahead and lied under oath? Just as in poker, those were cards in my hand that the attorney would have to risk something to see. Since it was, after all, a minor point, he had the good sense not to ask a question whose answer he couldn't be certain of. It's got to be more fun sitting in the jury than sitting on the witness stand--you can snooze a little.

So if you're in U.S. District Court in Seattle during the 3 weeks starting March 23, stop by and see if I'm sitting in a jury somewhere. But please, no lighter salutes, it's just not safe with all that wood paneling.

Tuesday, February 17, 2009

They Died, Died, Died

Of all things, the influenza pandemic of 1918 has provided a number of interesting little psychological examples in various parts of my book -- everything from demonstrating that we manage "information workers" the same as ditch diggers to showing that much of the personality testing field is little better than astrology. Events of massive death are always going to produce lots of psychological effects we don't see elsewhere, I suppose.

On my local/government access cable channel, a little show periodically goes by consisting of local medical/government officials sitting in a room talking about planning for the next influenza pandemic. While most of their constituents are watching "ER" or "Desperate Housewives", these people they've never heard of are discussing how to decide who will live and who will die (due to rationing of ventilators), what military options there will be to enforce quarantines, what businesses they may have to commandeer to have space to separate the just-waiting-to-die crowd from the still-might-survive folks, how to handle the potential number of corpses that will exceed current mortuary capacities, and so on. It's surreal, but surreal like a tsunami -- you can talk about it and plan for it, but most folks won't take it seriously until it hits and it's too late.

Cheery Little Musical Memes

Hearing me talk about the Pandemic periodically over the months, my wife recently started humming a little ditty she claimed was about the Pandemic, where the lines tended to end in "and they died, died, died". I had never heard of such a thing, and couldn't believe that such a song could span the period from those with strong memories of 1918 to today. But sure enough, she finally dug it up, and here is a YouTube rendition. This tickles my curiosity, because I'm currently mining the literature of "memes", and of course I'm aware that some believe the little song named "Ring Around the Rosie" is about bubonic plague, though others argue it cannot be. Is it possible that worldwide plagues generate memes in the form of music to be passed down?

As with much thinking about memes, this can quickly lead to mushy thinking. But there may be some merit. This meme's survival advantage is easy to allege: those "infected" with the meme are more likely to remember the seriousness of the last plague, take news of a new plague more seriously, and therefore take steps to survive and be in a position to pass the meme on. It's interesting that "The Flu Pandemic Song" contains substantive information about the plague, including its virulence and modes of transmission.

In Gregory Benford's "Deep Time", he ponders the problem of leaving a message ("Stay away! We dumped our nuclear waste here!") that can span thousands of years successfully. It turns out to be a difficult task, for which we have few successful examples. The "Ring Around the Rosie" example suggests memes just can't usefully span that length of time, since we can't agree on what it means. However, the Christianity meme has made it 2,000 years, and though Jesus would surely be shocked at the difference between Evangelical American Christianity and his own teachings ("Let me get this straight -- thou thinkest I would support war and the death penalty?"), clearly some of his original memes have survived in at least a vaguely recognizable form.

Software and Memes

Of course, the reason I'm studying memes is to see whether I can say anything useful about what they have to do with software and psychology. Mostly, I see roads I don't care to go down. Yes, viruses and computer viruses exhibit some shared behavior, as Dawkins recites. Yes, we can simulate evolutionary algorithms with software, just as we can simulate most anything with software. Yes, the right software could be viewed as a "replicator" in the evolutionary sense, and maybe it will take off when the Singularity gets here and we can all download our minds into machines (though surely some unlucky souls will be assigned toaster duty!). Yes, this could all turn into scary stuff to think about (and it is presumed that Skynet will kill Steve Talbott first).

But none of that interests me. I'm an engineer and what I'm looking for is whether the hackneyed idea of memes and the simple evolutionary algorithm that underpins it has something practical to tell me about creating better software.

Friday, February 13, 2009

What is Life?

That's the title of a book based on some lectures physicist Erwin Schrödinger gave back in February of 1943. Imagine this. Fermi had just got his atomic reactor going in Chicago a couple of months earlier, Britain's been devastated by bombing, America has entered the war but it still looks like Hitler just might end up ruling the world, and in the midst of this chaos, Erwin Schrödinger is pondering the nature of life by thinking about how cells work.

Remember that this is well before Watson and Crick, and though funny-looking things called chromosomes had been located inside cells, nobody (let alone a physicist) really knew where the genetics were, where the cell was hiding those traits that could almost magically be passed down from one generation to another. It was a perfect opportunity for Schrödinger to make a bunch of prognostications that would soon prove foolish. But he really didn't -- he was amazingly prescient in his analysis of the nature of life.

Hands Off My Gene Splicer!

Physicists are almost irresistibly attracted to biology. Leo Szilárd, co-patenter of the nuclear reactor and the guy who got the Manhattan Project going by warning Roosevelt about nuclear fission, eventually switched to biology, and designed the radiation treatment he used to successfully treat his own bladder cancer. Richard Feynman dabbled briefly in biology just for fun. Roger Penrose tries to find quantum spookiness in the brain that will keep us from just being meat machines.

I think physicists are attracted to biology because there is a great race going on, much like the building of the Transcontinental Railroad. Physicists are exploring reality from the bottom (particle physics) up, while biologists are driving from the top (living tissue) down. Sooner or later, they are going to meet somewhere in the middle, and physicists really would rather not see biologists be the ones to drive that golden spike in to nail down our understanding of how life works.

Like the Second Coming of Christ, it's hard to prove this momentous event might not be just around the corner, so periodically physicists will make a little run at the problem just to keep their team in the game. And the 1940's were heady times for physicists -- they were bustin' atoms apart for the first time in human history! So it's entirely understandable that Schrödinger, sensing we were close to big progress in understanding cells, would want to take a hard look to see what physics could say about the situation. (His little essay would end up influencing both Watson and Crick in their search for the genetic code less than a decade later.)

Enter Psychology

How did writing a book on psychology and programming lead me to reading 60-year-old physics lectures? It's because my book starts by trying to understand the fundamental nature of what computer programming is, and how it fits into human history. Martin Seligman, founder of the Positive Psychology movement, wrote a comment about a book by Robert Wright, called "Nonzero: The Logic of Human Destiny", that piqued my interest. In that book, as he considers human history from the perspective of energy usage, Wright refers back to Schrödinger's essay. And since I have the luxury of writing a book with no deadline, I cannot resist hopping the bus to the University of Washington to read Schrödinger's own words. What do I find there? Descartes once more.

I was just writing about Descartes and his pesky question: is the mind something separate from the body? After pondering the nature of cellular life, Schrödinger eventually cannot avoid making his own pronouncement on Descartes' question. To understand why anyone would care, you have to remember who this particular physicist was.

Schrödinger almost single-handedly put the "spooky" into quantum physics. With one relatively simple equation, he both explained observed results and, like Descartes, raised philosophical questions that remain unanswered today. What Schrödinger offered was an equation that both explained available data stunningly well, but did it by describing matter as a wave. What does it mean that matter can be described by a "wave"? That is still being argued today, but it certainly means that spooky stuff we can't really grasp happens when you get down to the tiny world of sub-atomic particles. One extreme extrapolation of Schrödinger's useful, accurate, but spooky wave equation is the idea that every little thing that can happen, does -- and causes yet another split into an infinite number of parallel universes. Some grown men believe this could be true. Honest.

For the holdouts who still hope that the brain isn't just a meat machine, that there is something special about "consciousness" (as though anyone agrees on what that word actually means!) that will make it impossible to create machines that are "alive", quantum spookiness is one of their last, best hopes. Schrödinger's physics offers a spookiness so rich, and full of bizarre possibilities, that it's hard to absolutely rule out (though most physicists think it bunkum) the possibility that "consciousness" (whatever that is!) is some special phenomenon woven into the very nature of reality, and therefore not something we will be able to recreate by simply reverse-engineering the neurons of the brain. What would Schrödinger have said about this? Fortunately, we don't have to wonder, because he had already pondered the question more than 60 years ago. Here's exactly what he said:

According to the evidence put forward in the preceding pages the space-time events in the body of a living being which correspond to the activity of its mind, to its self-conscious or any other actions, are (considering also their complex structure and the accepted statistical explanation of physico-chemistry) if not strictly deterministic at any rate statistico-deterministic. To the physicist I wish to emphasize that in my opinion, and contrary to the opinion upheld in some quarters, quantum indeterminacy plays no biologically relevant role in them, except perhaps by enhancing their purely accidental character in such events as meiosis, natural and X-ray-induced mutation and so on -- and this is in any case obvious and well recognized.

So the very father of quantum spookiness got his vote in early: there is no quantum spookiness involved in consciousness, we are just deterministic machines and we would admit it, in his words, "if there were not the well-known, unpleasant feeling about 'declaring oneself to be a pure mechanism'."

And yet, if you read the epilogue of "What is Life?", which is titled "On Determinism and Free Will", you'll see that, just as his famous equation encompasses the contradiction of matter being both a particle and a wave, Schrödinger's personal philosophy embraced the contradiction of being a purely mechanical mechanism but still having the powerful feeling of personal free will. Those who accuse Schrödinger of turning to mysticism are, I think, correct. But we all have to do something with the big, blank page labelled Currently Unknowable, and it's not clear to me that carrying it in a bag marked Mysticism is any worse than carrying it anywhere else.

The Hook

Little to none of this discussion is in my book. The real reason Robert Wright refers to Schrödinger's essay is his observation that the nature of life is to create a temporary island of decreasing entropy, though the 2nd law of thermodynamics is preserved because life emits a waste stream of increased entropy. Therein lies a key to understanding the fundamental nature of computer programming. But you'll have to wait for the book to read about that.

Wednesday, February 11, 2009

Cartesian Programming

I long ago decided that my (not yet finished) book ("The Pop Psychology of Programming", if you're paying attention) had to include a brief history of psychology. The reason is, people have lots of stereotypes and misconceptions of psychology, but what those might be depends a bit upon when you learnt anything about psychology. If most of your psych-ed came from watching 60's TV, then you're still imagining a couch-bound "talking cure". If you took a Psych 101 in the 70's, then you might imagine the field is stuck back in behaviorism. So, a Brief History of Shrinks seems like a plausible way to help get disparate readers more or less on the same page.

But what I only recently decided was where to start my History of Psychology. People who write a History of Anything seem to vie with each other to start off at the earliest historical date possible. When it comes to psychology, I believe the winner is the guy who claims there was a pharaoh in ancient Egypt doing psych experiments. Well, I sure ain't gonna start back that far. For a long time, my draft of that chapter started with Freud (having found a cartoon that nicely captures the deconstruction of Freud), but now I've found what the truly best starting point is: Descartes.

Why Descartes?

Descartes is the "I think, therefore I am" dude, and also the man Cartesian coordinates are named after, one of those inventions so taken for granted that it's hard to envision the tedious chaos that preceded it (like automobile cupholders). He does not have that much to do with modern nuts-and-bolts psychology, except for framing a key question that still absorbs a great many great minds today: is the mind something separate from the body?

In that simple-sounding summation is an enormous amount of baggage that can still get even mild-mannered (pot-smoking) philosophy majors red-faced and frothing. Tied up in there are questions of free will and (highly relevant for a synthesis of programming and psychology) whether or not machines can truly "think".

If you're not careful in your reading list, you might skate through a study of psychology and think Descartes' question is no question at all -- lots of Smart Folk only exhibit humorous indulgence towards those who still hope to find a Ghost in the Machine (a phrase invented specifically to make fun of Descartes' conclusion that mind was separate from brain and body). But even though reductionism has chipped away at the spaces where there might be any room for an ethereal mind/spirit to still be hiding, the folks rooting for the Ghost are in deadly earnest, and not lacking in brain power themselves. Those pinning their hopes on quantum spookiness have no lesser light than physicist Roger Penrose on their side, even though some of them are wildly extrapolating his nubs of true science into flights of fancy. If they are dwindling in number and persuasiveness, well, Kurzweil's "singularity" of machine sentience continues to be in no great hurry to appear and prove them wrong.

Descartes As Programmer

Having finally settled on Descartes as my start, I am pleased to recognize him as having a true, stereotypical programmer personality. A prickly fellow, not inclined to suffer the mental deficiencies of others in silence, I think he almost certainly would have been a programmer today (although, Wolfram-like, he most likely would have insisted on inventing his own computer language rather than deigning to use one invented by his lessers).

But most of my newfound fondness for Descartes comes from the realization that he just wanted to solve everything himself, and not have to pay attention to other folks' solutions. "I think, therefore I am" was his insistence on building on absolutely nothing gotten from no one else. How can one look at Richard Stallman insisting on reinventing Unix from scratch, or Steve Gibson insisting on writing massive applications entirely in assembly language, and not see the spirit (others might choose another word) of Descartes?

As I will write about extensively in my book, the programming industry holds some deeply mistaken views about talent and the brain, and those mistakes push it to hire and encourage just the sort of folks who are not so interested in learning from the works of others. All this mishmash comes together in a golden opportunity to coin a new term: Cartesian Programming.

Cartesian Programming

"Cartesian Programming" (so sez I) is the practice of coding a solution to a problem without making the slightest effort to examine any prior work done on that problem by others. One might phrase it as: "I code, therefore I am (not interested in reading your code)".

There is a fly in my ointment. It turns out that someone else has already coined the term "Cartesian Programming" to refer to some academe-doomed programming language construct whose practical value could not fill a teaspoon, even if you spit into it to help. (Perhaps a sense of phrase-ownership has made me harsh!) But these things are best settled by gentlemanly edit-war on Wikipedia, where opinion goes to be (nearly) made fact. I trust I can generate enough enthusiasts for my definition to mold Wiki-reality my way.

Tuesday, February 10, 2009

Names Can Kill

In my book, I point out that one problem with writing clear code is variable name rot (inspired by Ward Cunningham's discussion of variable names in his 2004 OOPSLA talk). You named a variable something like AccountBalance, but then later had to change the code so that the contents of the variable are rounded to the nearest dollar. Someone else, reading your code as they modify it, fails to realize that AccountBalance really should have been renamed RoundedAccountBalance, so they perform penny-wise calculations that they should not.
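Here's a minimal sketch of that kind of name rot (the function and names are invented for illustration, not from anyone's real code):

```python
# Hypothetical example of variable name rot. The variable originally
# held an exact balance; a later change rounds it, but nobody renames it.

def monthly_statement(account_balance: float) -> float:
    # Later change: round to the nearest dollar for the statement.
    # The name should now be rounded_account_balance, but it wasn't updated.
    account_balance = round(account_balance)

    # A still-later maintainer, trusting the name, applies a 37-cent
    # service fee -- penny-wise arithmetic on a value that no longer
    # carries pennies.
    return account_balance - 0.37

print(monthly_statement(100.49))
```

Note that monthly_statement(100.00) and monthly_statement(100.49) now return the same answer; the pennies the second maintainer thought they were handling were already destroyed upstream, and only the stale name hid that fact.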

When is "Carcinoma" Not Cancer?

As I point out in the book in a footnote, the problem of naming things correctly is not isolated to computer programming. In particular, medicine has naming mistakes that cause enormous problems.

For example, every year, women are diagnosed with Ductal Carcinoma In Situ, or DCIS. Well, we all know that "carcinoma" means "cancer", so it's no wonder that many of these women elect to have their breasts amputated even though the treatment of lumpectomy followed by radiation is just as effective in most cases. There's just one little problem with that: DCIS is not cancer.

People often imagine that cancer is some foreign invader that has horns and a tail when you look at it under a microscope. It's not. Cancer is something going wrong with your own cells, and that "wrong" happens in many stages, so deciding whether or not you have cancer is a judgment call made by a guy you'll never meet who spent a little time (not much -- he has lots of others to process) staring at some of your cells under a microscope.

What DCIS technically is, is a "pre-cancer". The guy with the microscope looked at your cells, said, "well, that's not cancer, but it's definitely funny looking and I think it's real likely to turn into cancer". You certainly should either get it treated, or monitor very frequently for the appearance of cancer.

But the medical community does a lousy job of grappling with this simple question: How many women would avoid breast amputation and elect for lumpectomy followed by radiation instead if DCIS were presented to them as "not cancerous yet" instead of being presented as "Stage Zero breast cancer"? Many doctors do at least take the time to feel a little bad about the astounding percentage of women who elect amputation even though they don't technically have cancer, but the only real result is periodic handwringing in a newspaper article somewhere. Don't get me wrong: electing mastectomy for DCIS can be a rational choice; it's just that the numbers strongly imply it often is an irrational choice, and calling a non-cancer "carcinoma" surely contributes to that irrationality.

The Misnamed "Tumor Cell"

DCIS is a more or less accidental bad naming choice as far as I can tell, but there are other bad naming choices that I suspect arise from a profit motive. One of those is in the news today, and it's called a circulating tumor cell.

Doctors have known for years that people with cancer are real likely to have cancerous cells floating by in the bloodstream. But only recently have people been creating technology that lets someone quantify how many cancer cells you have in your bloodstream in a repeatable way. Remember, deciding whether or not cells are cancerous is a judgment call. A little trick is required to achieve a highly repeatable count of circulating tumor cells.

Here's the trick: there's a particular kind of cell, the epithelial cell, that really shouldn't be floating around in the bloodstream. But if you have a tumor, there's a good chance it will shed these epithelial cells. Aha! Medical science has been getting really good at tagging particular types of cells, so in recent years we've seen the introduction of lab tests that semi-automatically locate and mark those epithelial cells that shouldn't be floating around if you don't have a tumor. Now, a guy still has to look through a microscope, but it's pretty much reduced to a job of counting how many green dots he sees, since the cells of interest have been chemically marked for him.

But wait, it gets even better. An American version of this technology, named CellSearch, managed to get FDA approval, because they showed that (in a very particular situation), if your circulating tumor cell (CTC) count didn't go down shortly after you started chemotherapy, you weren't likely to live as long as the folks whose CTC counts did go down after starting chemotherapy. Cool beans, because this implied a doctor might choose a chemo drug, see that the CTC was not going down, and get at least one more shot at quickly switching to another chemo drug in the hopes it might be more effective.

What's the Naming Problem?

Where does the naming problem come in? Well, as you can see, I've lapsed into saying the phrase "circulating tumor cell" (or "CTC"), which is exactly what the folks who make the CellSearch test want. But is that name really accurate? Remember, this test works by assuming that an epithelial cell wouldn't be floating around out there unless it came from a tumor. So, the more accurate name (which the CellSearch folks fastidiously avoid) would be "circulating epithelial cell test", because no human actually goes in there and confirms that each of those epithelial cells really is cancerous.

But if that assumption (which is swept under the carpet by using the CTC name) were faulty, how did CellSearch get approved by the FDA? Well, they did a quite impressive study, that's how. Tested a bunch (about 400) of women who came in for a breast biopsy, followed them to see which actually turned out to have cancer, and found that almost nobody who didn't have cancer had a non-zero CTC. Cool beans, because a sample of 400 is about what you need to prove statistical significance... if your sample is representative of the general population. Is it possible that women who are getting a breast biopsy are just not quite representative of the general population of women? I claim it is.
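To see why a sample of roughly 400 feels adequate, there's a standard back-of-the-envelope tool, the statisticians' "rule of three": if an event is observed zero times in n independent trials, an approximate 95% upper confidence bound on its true rate is 3/n. The numbers below are illustrative, not the study's actual data:

```python
# The "rule of three": after observing 0 events in n independent
# trials, an approximate 95% upper confidence bound on the event's
# true rate is 3/n.

def rule_of_three_upper_bound(n: int) -> float:
    """95% upper bound on an event's rate after 0 events in n trials."""
    return 3.0 / n

# With ~400 biopsy patients and essentially no false positives seen,
# the false-positive rate looks bounded below about 0.75%...
print(rule_of_three_upper_bound(400))
```

...but that bound only holds if those 400 patients are representative. If the sample systematically excludes, say, women who recently exercised hard, the bound says nothing about that group, which is exactly the loophole I'm about to walk through.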

See, we used the CellSearch test on my wife after her breast cancer treatment. Comfortingly, her CTC descended eventually to 0 or 1. Hooray! Until one year, the test came back 12 (very roughly speaking, a "bad" CTC count is 5 or bigger). We were using the test well outside its FDA-mandated application at that point (remember, it was only approved for use trying to guess if your first chemo choice was working), but there was no way around the fact that it looked like bad news. Hoping for the best, I wondered: was there any way those 12 circulating "tumor" cells were actually just circulating epithelial cells that weren't cancerous at all? In fact, even though the CellSearch folks had a great study that said the answer was "no", the answer was "yes".

As I re-educated myself on the whole CTC literature, I discovered that the Europeans have a competing technology, called MAINTRAC. But they had done a fascinating study that the CellSearch folks seem to have no interest in replicating. They wanted to look at the (scary, sorry) idea that cancer surgery itself can send tumor cells out into the blood, possibly raising the odds (but probably not hugely) of the patient getting a deadly metastasis. They tracked the CTC count in the hours after breast cancer surgery and, not to their surprise, found that the counts went up after surgery. But the part of the study that interested me was the control: they did the same CTC tracking on a benign case, a patient who got surgery but didn't have breast cancer. The CTC number peaked at over 50,000 in that patient. And now I'm back at the naming problem, because if your "circulating tumor cell" test reports a number of 50,000 in a patient who has no tumor, then maybe that's not quite the right name for it. Their solution? If you read that paper, you'll see that they are careful to say "circulating epithelial cell" and not "circulating tumor cell". Even though they have a test much like CellSearch, the Europeans avoid the "tumor cell" name, because they know what the test actually counts is epithelial cells, and they know of at least one situation where those epithelial cells don't point to cancer at all.

But back to the cliffhanger! My wife gets the scary CTC of 12. Did she have a deep cut that could mimic the epithelial cell mobilization seen in surgery? No, but it dawned on me. She exercises. Hard. In fact, she runs a lot of half-marathons, and those come with blisters and who-knows-what kinds of internal damage. So I proposed a little test. No exercise for 2 weeks straight, followed by another CellSearch "CTC" test. The result? Tada! A count of zero. Now what are the odds that you can pick 400 women who just got a breast biopsy, and none of them had run a half-marathon within the previous week? I'm guessing the odds are better than you think, because a) women getting a breast biopsy tend to be older and b) they had time to get real worried about having cancer and cancel any big events (like a half-marathon) between when the doctor found something suspicious and when the biopsy actually got scheduled and performed.

So my hypothesis (backed by one data point) is that, contrary to the assertion baked into the name "circulating tumor cell", it is easy to generate false positives on the CellSearch test.

Your Turn, Men

CellSearch is in the news today because it was used in a study of prostate cancer patients. I was dismayed to see that all the news reporting I could find faithfully repeated the name "circulating tumor cell", and absolutely, completely reinforced the idea that this test has no false positives -- even though the Europeans have proven that false. Why does it matter? If the test gets widespread use, some cancer patient is going to do something to get a false positive (exercise too hard? cut themselves shaving? who knows?). And their doctor is going to make a potentially devastating medical decision based on that false positive, a decision they might be much slower to make if the name of the test were not so inaccurate.

Names can kill.

Friday, February 06, 2009

Back Pain and Programming

Old age has made me tend to see the big picture in everything, a world where everything's related to everything else. Today's headline is the failure of imaging (using X-rays, CT, and/or MRI) to improve outcomes in patients with back pain. Which, to me, has much in common with programming methodologies. Have patience; I can connect them.

Back pain has been a snake pit of medicine for years now because a) back pain often gets better even if you do nothing and b) symptoms are highly subjective, so most any treatment can be shown to possibly help a little and c) people will pay big bucks to get rid of the suffering of back pain. It hurts! So, all those factors add up to a field where medical professionals are making big bucks and, truth be told, they are in no hurry at all to conduct a rigorous study of their favorite surgery because it would be a kick in the reimbursement pocketbook if the study showed results no better than placebo.

Now here's a little study about back pain that your local back pain surgeon will never tell you about. They tried to figure out what factors predict who will benefit from back pain surgery. After all, no point in cutting everybody open if you can predict in advance which patients are unlikely to feel any better afterwards (besides, they might sue if they feel no better). There were a number of candidate factors they looked at, the obvious medical stuff like exactly which vertebrae looked bad on a scan.

Here comes the study punch line, as it always must. The #1 predictor of whether or not you will benefit from surgery for back pain has nothing to do with your back: it's your psycho-social state. In other words, if you're going through a divorce, your boss hates you, you have few friends or opportunities to enjoy life, etc., then the odds that surgery can fix your back pain are not good.

Now you can try to spin that result in different ways, but just suppose the simplest and most obvious interpretation is true: the root of much back pain is in your mental and social condition. That does not mean your back pain is "all in your head", but rather that back pain is just one of those bad things that can result when you walk around all day clenching your fists, pumping stress hormones, and generally feeling pretty bad about life.

Now to connect the dots. For many businesses, software development is a pain -- in the butt, if not the lower back. However, there are many folks out there waiting to sell your business some solution (new tool, new web service, new methodology, etc.) that promises to reduce your programming pain. Suppose for a moment that, as with lower back pain, a major source of software development pain comes from psycho-social conditions. Maybe the company uses zero-sum bonuses so that if you help your co-worker with her programming problem, you may just be taking bonus money out of your pocket and giving it to her. Maybe the company has an incompetent, top-down review system, so you experience the stress of knowing that you may get a great review for a quarter when you were goofing off, and your worst review when you were working really hard and doing your best work. There are innumerable psycho-social conditions that can help make software development unlikely to go well.

Just like bad back surgeons, that consultant/salesman is happy to sell you something that maybe, kinda, works sometimes for some folks, but in no hurry at all to chip in some funds for a rigorous study that might show their solution is no better than placebo. And make no mistake, there is a placebo effect in business experiments: it's called the Hawthorne effect.

But even though, just like a bad back surgeon, that salesman/consultant can point to some stunning successes, maybe the odds that the proposed cure will help you have little to do with the cure itself, and more to do with what kind of psycho-social shape your organization is in to start with.

But just as with back pain, understanding this may not help. After all, it's much, much easier to buy a new software tool or hire a consultant than it is to make a serious structural change like moving to non-zero-sum incentives, or a review process that identifies poor managers rather than just poor performers. As I will argue in my book, the fact that few programmers or programming organizations can make serious psycho-social changes is itself a source of competitive advantage. Those few who can grasp and adapt to the psycho-social forces that make programming harder than it needs to be end up with a business advantage every bit as solid as a patent or trade secret. They won't have to defend it in court; they can simply rely on the general difficulty humans have with change.

Thursday, February 05, 2009

The Halo Effect

The book on top of my desk recently is The Halo Effect, by Philip Rosenzweig. After you've been doing heavy research for some years on a book, you start to see that all the threads are cross-connected in ever more complex ways. So, Nassim Taleb provides a cover blurb for The Halo Effect, and I already lean on Taleb's The Black Swan in my introduction (a grand sweeping attempt to re-view programming in the context of psychology, philosophy, physics(!), and human history), and also used Taleb's Fooled by Randomness in the Psychology of Incompetence chapter.

What does The Halo Effect have to do with the psychology of programming? Actually, the name goes back to a 1920 study by Edward Thorndike, in which he found that commanding officers tended to rate their soldiers as either good at everything or lousy at everything. No nuances in between, no people who had both significant strengths and significant weaknesses.

One thing psychology is not so good at is re-integrating its own findings over time. Psychology researchers go off building up their own particular ideas, and they have little incentive to note how their idea (invariably with its own cute coined terminology) overlaps with others, or with much older research ideas. Starbuck (see below) has insights on this problem for both psychology and social science research. To me, the Halo Effect is pretty much the Fundamental Attribution Error examined in a context of group dynamics, and that is how I will use it.

The first nitty-gritty psychology chapter in my book is Attribution Errors, because I think the Fundamental Attribution Error is really the most simple and influential psychology concept you can learn, especially as applied to others (most of the book is focused on the individual mind, so it's good to get something group-related out at the beginning). Rosenzweig's book gives me a slightly different slant on the FAE and, being at least modestly academic, provides some relevant references into the psych research literature, which I always appreciate for my next research trip to the University of Washington.

Rosenzweig relentlessly dismantles the validity of Jim Collins' uber-popular business books, based on fundamental flaws in his research model. This makes me look to see if Rosenzweig is connected to another thread: William Starbuck. But no, "Starbuck" is not in the index (though "Starbucks" is). Did Rosenzweig really not read Starbuck's astounding The Production of Knowledge, in which he dismantles, not just the research methodology of some popular business books, but the entire field of social science research? OK, well Starbuck was recent, 2006. But Jerome Kagan is not in the index either, and he was pointing to problems with research that relies on questionnaires at least as far back as 1989, in his book "Unstable Ideas". Kagan never lets himself forget that he long ago mistakenly believed (and taught) that autism was caused by the behavior of mothers; he uses the memory of that mistake to maintain a high degree of skepticism about the limits of research.

This is the curse of modern academic life. The sheer volume of ideas produced and published each year guarantees that you will overlook some useful ideas that are highly relevant to your own. All you can do is push your ideas out there, and hope the connections you missed don't completely invalidate your work, and that they will be spotted and put together by someone else.

The flip side of this curse is that it makes possible the modern Renaissance man (or woman) of science. When Richard Feynman made his brief foray into biology, he quipped that he was able to quickly make a contribution because he hadn't wasted all that time learning the names of every little body part, like a "real" biologist has to do. In this world of information overload, the amateur who is willing to plunge into hours of reading just looking for connections can sometimes make a contribution in a field that nominally requires deep expertise.

Thus, just yesterday I find myself, sans medical degree, writing to the author of a medical study appearing in the headlines this week to point out what the experts in his field have overlooked. The headlines were about the discovery that kidney failure patients on dialysis who live at high altitudes do better than those at low altitudes. The renal specialists, of course, imagine that this must be somehow connected to the hypoxia of altitude stimulating more red blood cell production. What I know, that they don't, is that a) it takes more altitude than they imagine to stimulate red blood cell production (that literature lies in sports medicine, which nephrologists do not read) and b) there is a recently discovered and rather amazing effect: oxygen breathing can stimulate EPO, the natural hormone that tells the bone marrow to make more red blood cells.

The trick for me is knowing that nurses will put a dialysis patient on oxygen if their oximeter shows blood oxygen saturation is low. Thus, the most likely way that altitude influences dialysis patient outcome is by virtue of the fact that those patients are getting more oxygen, and their caregivers are unaware that this can stimulate red blood cell production, just like giving them a shot of Procrit.

Of course, as my mother-in-law likes to exclaim "But you don't get paid for any of that!". Which is true, and makes me realize it's time to get back to writing my damn book.