November 17, 2008
We Are Typists First, Programmers Second
Remember last week when I said coding was just writing?
I was wrong. As one commenter noted, it's even simpler than that.
[This] reminds me of a true "Dilbert moment" a few years ago, when my (obviously non-technical) boss commented that he never understood why it took months to develop software. "After all", he said, "it's just typing."
Like broken clocks, even pointy-haired managers are right once a day. Coding is just typing.
So if you want to become a great programmer, start by becoming a great typist. Just ask Steve Yegge.
I can't understand why professional programmers out there allow themselves to have a career without teaching themselves to type. It doesn't make any sense. It's like being, I dunno, an actor without knowing how to put your clothes on. It's showing up to the game unprepared. It's coming to a meeting without your slides. Going to class without your homework. Swimming in the Olympics wearing a pair of Eddie Bauer Adventurer Shorts.
Let's face it: it's lazy.
There's just no excuse for it. There are no excuses. I have a friend, John, who can only use one of his hands. He types 70 wpm. He invented his own technique for it. He's not making excuses; he's typing circles around people who are making excuses.
I had a brief email exchange with Steve back in March 2007, after I wrote Put Down The Mouse, where he laid that very same Reservoir Dogs quote on me. Steve's followup blog post was a very long time in coming. I hope Steve doesn't mind, but I'd like to pull two choice quotes directly from his email responses:
I was trying to figure out which is the most important computer science course a CS student could ever take, and eventually realized it's Typing 101.
The really great engineers I know, the ones who build great things, they can type.
Strong statements indeed. I concur. We are typists first, and programmers second. It's very difficult for me to take another programmer seriously when I see them using the hunt and peck typing techniques. Like Steve, I've seen this far too often.
First, a bit of honesty is in order. Unlike Steve, I am a completely self-taught typist. I didn't take any typing classes in high school. Before I wrote this blog post, I realized I should check to make sure I'm not a total hypocrite. So I went to the first search result for typing test and gave it a shot.
I am by no means the world's fastest typist, though I do play a mean game of Typing of the Dead. Let me emphasize that this isn't a typing contest. I just wanted to make sure I wasn't full of crap before I posted this. I know, there's a first time for everything. Maybe this'll be the start of a trend. Doubtful, but you never know.
Steve and I believe there is nothing more fundamental in programming than the ability to efficiently express yourself through typing. Note that I said "efficiently" not "perfectly". This is about reasonable competency at a core programming discipline.
Maybe you're not convinced that typing is a core programming discipline. I don't blame you, although I do reserve the right to wonder how you manage to program without using your keyboard.
Instead of answering directly, let me share one of my (many) personal foibles with you. At least four times a day, I walk into a room having no idea why I entered that room. I mean no idea whatsoever. It's as if I have somehow been teleported into that room by an alien civilization. Sadly, the truth is much less thrilling. Here's what happened: in the brief time it took for me to get up and move from point A to point B, I have totally forgotten whatever it was that motivated me to get up at all. Oh sure, I'll rack my brain for a bit, trying to remember what I needed to do in that room. Sometimes I remember, sometimes I don't. In the end, I usually end up making multiple trips back and forth, remembering something else I should have done while I was in that room after I've already left it.
It's all quite sad. Hopefully your brain has a more efficient task stack than mine. But I don't fault my brain -- I fault my body. It can't keep up. If I had arrived faster, I wouldn't have had time to forget.
What I'm trying to say is this: speed matters. When you're a fast, efficient typist, you spend less time between thinking that thought and expressing it in code. Which means, if you're me at least, that you might actually get some of your ideas committed to screen before you completely lose your train of thought. Again.
Yes, you should think about what you're doing, obviously. Don't just type random gibberish as fast as you can on the screen, unless you're a Perl programmer. But all other things being equal -- and they never are -- the touch typist will have an advantage. The best way to become a touch typist is through typing, and lots of it. A little research and structured practice couldn't hurt either. Here are some links that might be of interest to the aspiring touch typist:
- Type Racer
- Typer Shark
- Dvorak, Keyboard Layout of Champions
- Colemak keyboard layout
- TyperA
- Das Keyboard with blank keys
- The Typing of the Dead (for PC)
- Put Down That Mouse. Seriously. It's a crutch.
- Typingmania (warning, Japanophiles only)
(But this is a meager and incomplete list. What tools do you recommend for becoming a better typist?)
There's precious little a programmer can do without touching the keyboard; it is the primary tool of our trade. I believe in practicing the fundamentals, and typing skills are as fundamental as it gets for programmers.
Hail to the typists!
November 15, 2008
Your Favorite NP-Complete Cheat
Have you ever heard a software engineer refer to a problem as "NP-complete"? That's fancy computer science jargon shorthand for "incredibly hard":
The most notable characteristic of NP-complete problems is that no fast solution to them is known; that is, the time required to solve the problem using any currently known algorithm increases very quickly as the size of the problem grows. As a result, the time required to solve even moderately large versions of many of these problems easily reaches into the billions or trillions of years, using any amount of computing power available today. As a consequence, determining whether or not it is possible to solve these problems quickly is one of the principal unsolved problems in Computer Science today.
While a method for computing the solutions to NP-complete problems using a reasonable amount of time remains undiscovered, computer scientists and programmers still frequently encounter NP-complete problems. An expert programmer should be able to recognize an NP-complete problem so that he or she does not unknowingly waste time trying to solve a problem which so far has eluded generations of computer scientists.
You do want to be an expert programmer, don't you? Of course you do!
NP-complete problems are like hardcore pornography. Nobody can define what makes a problem NP-complete, exactly, but you'll know it when you see it. Just this once, I'll refrain from my usual practice of inserting images to illustrate my point.
(Update: I was shooting for a poetic allusion to the P=NP problem here but based on the comments this is confusing and arguably incorrect. So I'll redact this sentence. Instead, I point you to this P=NP poll (pdf); read the comments from CS professors (including Knuth) to get an idea of how realistic this might be.)
Instead, I'll recommend a book Anthony Scian recommended to me: Computers and Intractability: A Guide to the Theory of NP-Completeness.
Like all the software engineering books I recommend, this book has a timeless quality. It was originally published in 1979, a shining testament to smart people attacking truly difficult problems in computer science: "I can't find an efficient algorithm, but neither can all these famous people."
So how many problems are NP-complete? Lots.
Even if you're a layman, you might have experienced NP-completeness in the form of Minesweeper, as Ian Stewart explains. But for programmers, I'd argue the best-known NP-complete problem is the travelling salesman problem.
Given a number of cities and the costs of travelling from any city to any other city, what is the least-cost round-trip route that visits each city exactly once and then returns to the starting city?
The brute-force solution -- trying every possible permutation between the cities -- might work for a very small network of cities, but this quickly becomes untenable. Even if we were to use theoretical CPUs our children might own, or our children's children. What's worse, every other algorithm we come up with to find an optimal path for the salesman has the same problem. That's the common characteristic of NP-complete problems: they are exercises in heuristics and approximation, as illustrated by this xkcd cartoon:
What do expert programmers do when faced by an intractable problem? They cheat. And so should you! Indeed, some of the modern approximations for the Travelling Salesman Problem are remarkably effective.
Various approximation algorithms, which quickly yield good solutions with high probability, have been devised. Modern methods can find solutions for extremely large problems (millions of cities) within a reasonable time, with a high probability of being just 2-3% away from the optimal solution.
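To make the gap between brute force and a cheat concrete, here is a minimal sketch in Python. It compares the try-every-permutation approach with the nearest-neighbour heuristic, which is about the simplest cheat there is. This is not one of the modern methods quoted above, and nearest neighbour carries no 2-3% guarantee. The function names and city coordinates are invented for illustration.

```python
from itertools import permutations
from math import dist


def tour_length(tour, cities):
    """Total length of the round trip that visits cities in `tour` order."""
    return sum(dist(cities[a], cities[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))


def brute_force_tour(cities):
    """Optimal tour by trying every permutation; only viable for a handful of cities."""
    rest = range(1, len(cities))
    candidates = ([0, *p] for p in permutations(rest))  # fix city 0 as the start
    return min(candidates, key=lambda t: tour_length(t, cities))


def nearest_neighbour_tour(cities):
    """Greedy cheat: from each city, go to the closest city not yet visited."""
    unvisited = set(range(1, len(cities)))
    tour = [0]
    while unvisited:
        here = cities[tour[-1]]
        # Break distance ties by city index so the result is deterministic.
        nxt = min(unvisited, key=lambda c: (dist(here, cities[c]), c))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


# Made-up coordinates, purely for illustration.
cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 2)]
exact = brute_force_tour(cities)
greedy = nearest_neighbour_tour(cities)
print(f"optimal: {tour_length(exact, cities):.2f}")
print(f"greedy:  {tour_length(greedy, cities):.2f}")
```

With six cities the brute-force search only has to look at 120 orderings. At twenty cities it would be roughly 1.2 x 10^17, while the greedy version still finishes instantly, just with no promise about how far from optimal it lands.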
Unfortunately, not all NP-complete problems have good approximations. But for those that do, I have to wonder: if we can get so close to an optimal solution by cheating, does it really matter if there's no known algorithm to produce the optimal solution? If I've learned nothing else from NP-complete problems, I've learned this: sometimes coming up with clever cheats can be more interesting than searching in vain for the perfect solution.
Consider the First Fit Decreasing algorithm for the NP-complete Bin Packing problem. It's not perfect, but it's incredibly simple and fast. The algorithm is so simple, in fact, it is regularly demonstrated at time management seminars. Oh, and it guarantees that you will get within 22% of the perfect solution every time. Not bad for a lousy cheat.
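Just how simple is it? Here is a rough sketch of First Fit Decreasing in Python: sort the items from largest to smallest, then drop each one into the first bin that still has room. The function name, item sizes, and bin capacity are all made up for illustration.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into bins of the given capacity using First Fit Decreasing."""
    bins = []  # each bin is a list of the item sizes placed in it
    for size in sorted(sizes, reverse=True):  # biggest items first
        if size > capacity:
            raise ValueError(f"item of size {size} cannot fit in any bin")
        for contents in bins:
            if sum(contents) + size <= capacity:  # first bin with enough room
                contents.append(size)
                break
        else:
            bins.append([size])  # no existing bin had room, so open a new one
    return bins


# Hypothetical items and bin capacity, for illustration only.
items = [4, 8, 1, 4, 2, 1]
packed = first_fit_decreasing(items, capacity=10)
print(packed)  # [[8, 2], [4, 4, 1, 1]]
```

Here the items total 20 and each bin holds 10, so two bins is the best anyone could do, and the cheat happens to hit it. It won't always, but it never strays far.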
So what's your favorite NP-complete cheat?
November 12, 2008
Stop Me If You Think You've Seen This Word Before
If you've ever searched for anything, you've probably run into stop words. Stop words are words so common they are typically ignored for search purposes. That is, if you type in a stop word as one of your search terms, the search engine will ignore that word (if it can). If you attempt to search using nothing but stop words, the search engine will throw up its hands and tell you to try again.
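As a rough sketch of that behavior in Python: drop the stop words from the query, and complain if nothing is left. The stop word list here is a tiny invented one, not any real engine's default, and the function names and message are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop word list; real defaults are far longer.
STOP_WORDS = frozenset(
    {"a", "an", "and", "be", "is", "it", "not", "of", "or", "the", "to", "what"}
)


def search_terms(query, stop_words=STOP_WORDS):
    """Lowercase the query, split it into words, and drop any stop words."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    return [w for w in words if w not in stop_words]


def check_query(query):
    """Return what would be searched for, or a complaint if nothing is left."""
    terms = search_terms(query)
    if not terms:
        return "Your query contains only stop words; please try again."
    return "Searching for: " + " ".join(terms)


print(check_query("What is the meaning of life"))  # Searching for: meaning life
print(check_query("It is what it is"))             # the only-stop-words complaint
```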
Seems straightforward enough. But there can be issues with stop words. Imagine, for example, you wanted to search for information on this band.
"The" is one of the most common words in the English language, so a naive search for "The The" rarely ends well.
Let's consider some typical English stopword lists.
[Table: SQL Server stop words | Oracle stop words]
You'd think a pure count of frequency, how often the word occurs, would be enough to make a common group of words "stop words", but apparently not everyone agrees. The default SQL Server stop word list is much larger than the Oracle stop word list. What makes "many" a stop word to Microsoft, but not to Oracle? Who knows. And I'm not even going to show the MySQL full text search stop word list here, because it's enormous, easily double the size of the SQL Server stop word list.
These are just the default stop word lists; that doesn't mean you're stuck with them. You can edit the stop word list for any of these databases. Depending on what you're searching, you might decide to have different stop words entirely, or maybe no stop words at all.
Way back in 2004, I ran a little experiment with Google -- over a period of a week, I searched for an entire dictionary of ~110k individual English words and recorded how many hits Google returned for each.
Yes, this is probably a massive violation of the Google terms of service, but I tried to keep it polite and low impact -- I used Gzip compressed HTTP requests, specified only 10 search results should be returned per query (as all I needed was the count of hits), and I added a healthy delay between queries so I wasn't querying too rapidly. I'm not sure this kind of experiment would fly against today's Google, but it worked in 2004. At any rate, I ended up with a MySQL database of 110,000 English words and their frequency in Google as of late summer 2004. Here are the top results:
Again, a very different list than what we saw from SQL Server or Oracle. I'm not sure why the results are so strikingly different. Also, the web (or at least Google's index of the web) is much bigger now than it was in 2004; a search for "the" returns 13.4 billion results -- that's 25 times larger than my 2004 result of 522 million.
On Stack Overflow, we warn users via an AJAX callback when they enter a title composed entirely of stop words. It's hard to imagine a good title consisting solely of stopwords, but maybe that's just because our technology stack isn't sufficiently advanced yet.
Google doesn't seem to use stop words any more, as you can see from this search for "to be or not to be".
Indeed, I wonder if classic search stop words are relevant in modern computing; perhaps they're a relic of early 90's computing that we haven't quite left behind yet. We have server farms and computers perfectly capable of handling the extremely large result sets from querying common English words. A Google patent filed in 2004 and granted in 2008 seems to argue against the use of stop words.
Sometimes words and phrases that might be considered stopwords or stop-phrases may actually be meaningful or important. For example, the word "the" in the phrase "the matrix" could be considered a stopword, but someone searching for the term may be looking for information about the movie "The Matrix" instead of trying to find information about mathematical information contained in a table of rows and columns (a matrix).
A search for "show me the money" might be looking for a movie where the phrase was an important line, repeated a few times in the movie. Or a search for "show me the way" might be a request to find songs using that phrase as a title from Peter Frampton or from the band Styx.
A Google patent granted this week explores how a search engine might look at queries that contain stopwords or stop-phrases, and determine whether or not the stopword or stop-phrase is meaningful enough to include in search results shown to a searcher.
Apparently, at least to Google, stop word warnings are a thing of the past.
November 10, 2008
Feeding My Graphics Card Addiction
Hello, my name is Jeff Atwood, and I'm an addict.
I'm addicted... to video cards.
In fact, I've been addicted since 1996. Well, maybe a few years earlier than that if you count some of the classic 2D accelerators. But the true fascination didn't start until 1996, when the first consumer hardware 3D accelerators came to market. I followed their development avidly in newsgroups, and tried desperately to be the first kid on my block to own the first one. And boy did I ever succeed. Here's a partial list of what I remember owning in those early days:
- Rendition Verite V1000
- 3dfx Voodoo
- 3dfx Voodoo 2
- ATI Rage Pro
- NVIDIA Riva 128
- Matrox G400
- NVIDIA Riva TNT
- NVIDIA GeForce 256
(This is only a partial list, ranging from about 1996 to 2001 -- I don't want to bore you. And believe me, I could. I mean more than I already am.)
These were heady times indeed for 3D graphics enthusiasts (read: PC gamers). I distinctly remember playing the first DOS-based Tomb Raider on my 3dfx Voodoo using the proprietary GLIDE API. Sure, it's pathetic by today's standards, but the leap from software 3D to fast hardware 3D was quite dramatic from the trenches -- and far more graphically powerful than any console available.
This was a time when you could post a thread on a usenet newsgroup about a brand new 3D card, and one of the creators of the hardware would respond to you, as Gary Tarolli did to me:
I first want to say how rewarding it is to read all your reviews after having worked on the design of Voodoo Graphics (the chipset on the Orchid Righteous 3D board) for over two years. I am one of the founders of 3Dfx and one of our goals was to deliver the highest quality graphics possible to the PC gamer. It was and still is a very risky proposition because of the cost sensitivity of the marketplace. But your reviews help convince me that we did the right thing.
I thought I would share with you a little bit about what is inside the 3Dfx Voodoo Graphics chipset. There are 2 chips on the graphics board. Each is a custom designed ASIC containing approximately 1 million transistors. Although this number of transistors is on the order of a 486, it is a lot more powerful. Why? Because the logic is dedicated to graphics and there's a lot of logic to boot. For example, bilinear filtering of texture maps requires reading four 16-bit texels per pixel (that's 400 Mbytes/sec at 50 Mpixels/sec) and then computing the equation
red_result = r0*w0+r1*w1+r2*w2+r3*w3
where r0:3 are the four red values and w0:3 are the four weights based on the where the pixel center lies with respect to the four texels. This is performed for each color channel (red, green, blue, alpha) resulting in 16 multiples and 12 additions or 28 operations per pixel. At 50 Mpixels per second that is 1,400 Mops/sec. The way this is designed in hardware is you literally place 16 multipliers and 12 adders on the chip and hook them together. And this is only a small part of one chip. There are literally dozens of multipliers and dozens of adders on each of the two chips dedicated only to graphics. Each chip performs around 4,000 million actual operations per second, of which around one third are integer multiplies. These are real operations performed - if you were to try to do these on a CPU (or a DSP) you must also do things like load/store instructions and conditions. In my estimation it would take about a 10,000 Mip computer (peak) to do the same thing that one of our chips does. This is about 20 of the fastest P5-200 or P6-200 chips per one of our chips. Not exactly cost-effective. So if you want to brag, you can say your graphics card has approximately the same compute power as 40 P5-200 chips. Of course, these numbers are more fun than they are meaningful. What is meaningful in graphics is what you see on the screen.
Now of course, if you were writing a software renderer for a game, you wouldn't attempt to perform the same calculations we perform on our chip on a general purpose CPU. You would take shortcuts, like using 8-bit color with lookup tables for blending, or performing perspective correction every (n) pixels. The image quality will depend on how many shortcuts you take and how clever you are. Voodoo Graphics takes no shortcuts and was designed to give you the highest quality image possible within the constraint of 2 chips.
As your reviews have shown, it is evident that you can see the difference in quality and performance.
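The arithmetic in Gary's note is easy to check in a few lines of Python. This is only a sketch: the weight formula is the standard textbook one for bilinear interpolation, which I'm assuming matches what the hardware computes, and the function and constant names are my own.

```python
def bilinear_weights(fx, fy):
    """Weights w0..w3 for the four texels around a sample point.

    fx and fy are the fractional position (0..1) of the pixel center between
    the texels, in the order top-left, top-right, bottom-left, bottom-right.
    Standard textbook formulation, assumed rather than taken from the letter.
    """
    return ((1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy)


def filter_channel(values, weights):
    """One color channel: result = v0*w0 + v1*w1 + v2*w2 + v3*w3."""
    v0, v1, v2, v3 = values
    w0, w1, w2, w3 = weights
    return v0 * w0 + v1 * w1 + v2 * w2 + v3 * w3  # 4 multiplies, 3 additions


# A sample point dead center between four texels weights each of them 1/4.
red_result = filter_channel((0, 100, 200, 100), bilinear_weights(0.5, 0.5))

# The per-pixel cost quoted in the letter.
CHANNELS = 4                    # red, green, blue, alpha
MULTIPLIES = 4 * CHANNELS       # 16
ADDITIONS = 3 * CHANNELS        # 12
OPS_PER_PIXEL = MULTIPLIES + ADDITIONS          # 28
MPIXELS_PER_SEC = 50
MOPS_PER_SEC = OPS_PER_PIXEL * MPIXELS_PER_SEC  # 1,400 Mops/sec
MBYTES_PER_SEC = 4 * 2 * MPIXELS_PER_SEC        # four 16-bit texels: 400 Mbytes/sec

print(red_result, OPS_PER_PIXEL, MOPS_PER_SEC, MBYTES_PER_SEC)
```

The numbers line up with the letter: 28 operations per pixel, 1,400 Mops/sec, and 400 Mbytes/sec of texel reads, all for one small corner of one chip.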
There's nothing quite like having a little chat on usenet with the founder of the company who created the 3D accelerator you just bought. Like I said, it was a simpler time.
Just imagine something with the power of forty Pentium-200 chips! Well, you don't have to. There's probably a CPU more powerful than that in your PC right now. But the relative scale of difference in computational power between the CPU and a GPU hasn't changed -- special purpose GPUs really are that much more powerful than general purpose CPUs.
After that first taste of hot, sweet GPU power, I was hooked. Every year since then I've made a regular pilgrimage to the temple of the GPU Gods, paying my tithe and bringing home whatever the latest, greatest, state-of-the-art in 3D accelerators happens to be. What's amazing is how often, even now, performance doubles yearly.
This year, I chose the NVIDIA GTX 280. Specifically, the MSI NVIDIA GTX 280 OC, with 1 GB of memory, overclocked out of the box. I hate myself for succumbing to mail-in rebates, but they get me every time -- this card was $375 after rebate.
$375 is expensive, but this is still the fastest single card configuration available at the moment. It's also one heck of a lot cheaper than the comically expensive $650 MSRP these cards were introduced at in June. Pity the poor rubes who bought these cards at launch! Hey, wait a second -- I've been one of those rubes for 10 years now. Never mind.
This is the perfect time to buy a new video card -- before Thanksgiving and running up to Christmas is prime game release season. All the biggest games hit right about now. Courtesy of my new video card and the outstanding Fallout 3, my productivity last week hit an all-time low. But oh, was it ever worth it. I'm a long time Fallout fan, even to the point that our wedding pre-invites had secret geek Fallout art on them. Yes, that was approved by my wife, because she is awesome.
I must say that experiencing the wasteland at 60 frames per second, 1920 x 1200, in high dynamic range lighting, with every single bit of eye candy set to maximum, was so worth it. I dreamt of the wastelands.
In fact, even after reaching the end of the game, I'm still dreaming of them. I've heard some claim Fallout 3 is just Oblivion with guns. To those people, I say this: you say that like it's a bad thing. The game is incredibly true to the Fallout mythos. It's harsh, gritty, almost oppressive in its presentation of the unforgiving post-apocalyptic wasteland -- and yet there's always an undercurrent of dark humor. There are legitimate good and evil paths to every quest, and an entirely open-ended world to discover.
No need to take my word for it, though. I later found some hardware benchmark roundups that confirmed my experience: the GTX 280 is crazy fast in Fallout 3.
Of course, we wouldn't be responsible PC owners if we didn't like to mod our hardware a bit. That's what separates us from those knuckle-dragging Mac users: skill. (I kid, I kid!) First, you'll want to download a copy of the amazing little GPU-Z application, which will show you in real time what your video card is doing.
A little load testing is always a good idea, particularly since I got a bum card with my first order -- it would immediately shoot up to 105 C and throttle within a minute or two of doing anything remotely stressful in 3D. It worked, but the resulting stuttering was intolerable, and the fan noise was unpleasant as the card worked overtime to cool itself down. I'm not sure how I would have figured that out without the real time data and graphs that GPU-Z provides. I returned it for a replacement, and the replacement's behavior is much more sane; compare GPU-Z results at idle (left) and under RTHDRIBL load (right):
Fortunately, there's not much we need to do to improve things. The Nvidia 8800 and GTX series are equipped with outstanding integrated coolers which directly exhaust the GPU heat from the back of the PC. I'd much rather these high powered GPUs exhaust their heat outward instead of blowing it around inside the PC, so this is the preferred configuration out of the box. However, the default exhaust grille is incredibly restrictive. I cut half of the rear plate away with a dremel, which immediately reduced fan speeds 20% (and thus, noise 20%) due to the improvement in airflow.
Just whip out your trusty dremel (you do own a dremel, right?) and cut along the red line. It's easy. If you're a completionist, you can apply better thermal paste to the rest of the card to eke out a few more points of efficiency with the cooler.
Extreme? Maybe. But I like my PCs powerful and quiet. That's another thing that attracted me to the GTX 280 -- for a top of the line video card, it's amazingly efficient at idle. And despite my gaming proclivities, it will be idle 98% of the time.
I do love this new video card, but I say that every year. I try not to grow too attached. I'm sure this video card will be replaced in a year with something even better.
What else would you expect from an addict?
November 08, 2008
Coding: It's Just Writing
In The Programming Aphorisms of Strunk and White, James Devlin does a typically excellent job of examining something I've been noticing myself over the last five years:
The unexpected relationship between writing code and writing.
There is perhaps no greater single reference on the topic of writing than Strunk and White's The Elements of Style. It's one of those essential books you discover in high school or college, and then spend the rest of your life wondering why other textbooks waste your time with all those unnecessary words to get their point across. Like all truly great books, it permanently changes the way you view the world, just a little.
Wikipedia provides a bit of history and context for this timeless book:
[The Elements of Style] was originally written in 1918 and privately published by Cornell University professor William Strunk, Jr., and was first revised with the help of Edward A. Tenney in 1935. In 1957, it came to the attention of E. B. White at The New Yorker. White had studied under Strunk in 1919 but had since forgotten the "little book" which he called a "forty-three-page summation of the case for cleanliness, accuracy, and brevity in the use of English."
A few weeks later, White wrote a piece for The New Yorker lauding Professor Strunk and his devotion to "lucid" English prose. The book's author having died in 1946, Macmillan and Company commissioned White to recast a new edition of The Elements of Style, published in 1959. In this revision, White independently expanded and modernized the 1918 work, creating the handbook now known to millions of writers and students as, simply, "Strunk and White". White's first edition sold some two million copies, with total sales of three editions surpassing ten million copies over a span of four decades.
This is all well and good if you plan to become a writer, but what's the connection between this timeless little book and writing a computer program?
Writing programs that the computer can understand is challenging, to be sure. That's why so few people, in the big scheme of things, become competent programmers. But writing paragraphs and sentences that your fellow humans can understand -- well, that's even more difficult. The longer you write programs and the older you get, eventually you come to realize that in order to truly succeed, you have to write programs that can be understood by both the computer and your fellow programmers.
Of all the cruel tricks in software engineering, this has to be the cruelest. Most of us entered this field because the machines are so much more logical than people. And yet, even when you're writing code explicitly intended for the machine, you're still writing. For other people. Fallible, flawed, distracted human beings just like you. And that's the truly difficult part.
I think that's what Knuth was getting at with his concept of Literate Programming (pdf).
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.
This is, of course, much easier said than done. Most of us spend our entire lives learning how to write effectively. A book like The Elements of Style can provide helpful guideposts that translate almost wholesale to the process of coding. I want to highlight the one rule from Elements of Style that I keep coming back to, over and over, since originally discovering the book so many years ago.
13. Omit needless words.
Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell.
What does this say to you about your writing? About your code?
Coding, after all, is just writing. How hard can it be?









