Monday, March 1, 2010

It's not magic

A while back, I had a small epiphany. I'd been asked to create a web form that could send emails with attachments. I already had forms that sent email, but attachments? What were they? In my mind, attachments were the little icons above the email. I had no idea how they were created or sent.

At the same time, I was reading The Code Book, an entertaining and fascinating look at cryptography - the art of sending scrambled messages, to be unscrambled only by the intended recipient.

The simplest, most brain-dead kind of encryption is a Caesar cipher, where you take all the letters in a message and shift them by the same amount. For example, with a Caesar cipher of 1, this:


becomes this:


A modern cryptanalyst would pee his pants laughing if you used this method for anything serious. But one thing about it works: you can encode, and you can decode, as long as you know the rules.

Now, the Code Book traces the development of encryption methods so complex they'll make your head spin, but all of them are systematic: whatever is encoded can be decoded. You just need to know how.

This applies to any method of encoding information, even if it's not cryptography. For example, Morse Code encodes letters as electrical pulses - not to hide the message, but just to transmit it. How cool - you can encode actual language as beeps!

Which shows us another important idea: you can encode anything as anything else. You can encode the weather forecast with colored socks. You can encode the Constitution with duck calls. As long as there are consistent rules for encoding and decoding, it will work.

Going back to my email attachments, I soon discovered how attachments work: the file data is encoded as text in your email. It ends up looking like gibberish, but fortunately, no human has to read it, because the email program does that for you.

How does it know which parts of the text are text and which parts are files? You tell it. Say you want to attach an image. First, you choose some arbitrary string of characters which will probably never occur in an actual email text. Let's say it's "Woo_hamburgers_for_mayor_in_2050_yeeeeeha." Whenever you use that phrase, it means "I'm about to put in some different content." Each section of content also gets a label about what "MIME type" it is, like "image/jpg" or "application/pdf". Your email text goes in one section, and your image data goes in another, after being encoded as text.

(In PHP, you can encode the data as text like this: base64_encode($fileContents).**)

On the other end, when someone opens your email, their program is smart enough not to show all the scrambled-looking letters, but instead to say, 'hey, he said this part was an image - let's show it like that.' And it gets decoded. The little icons show up. It works!

The main thing is, it's not magic. For me, this turned on a light in my head. I stood next to a fax machine, and pointed my finger at it. "I know what you're doing with your crazy noises," I said. "You're encoding image data as sound!"

And that's how computers work, all the way down. Image information is encoded as characters, which are encoded as ones and zeros, which are encoded as magnetic charges on a disk platter, which are transmitted as electrical pulses in a circuit board.

It's hard to understand. It's hard to actually believe sometimes that everything I'm doing on screen can be represented as ones and zeros. But there's no magic. And if there's no magic, there's nothing to be scared of. If I work hard enough, I can understand a little piece of it - enough to get something done.

**For a detailed walkthrough of my attachment code, see this Stackoverflow post.)