Saturday, August 7, 2010

Ruby is to Esperanto as PHP is to English

In 1887, L.L. Zamenhof published a book called "Unua Libro," describing a brand-new language that he hoped would change the world. A language anyone could learn to speak. One without political affiliation, without a messy history. And, I have to imagine he thought: finally, one that made sense.

English speakers know that our language is a mess; comedy routines have been built around it.

Here's Brian Regan on learning plurals in school.
So she asked this kid who knew everything. Irwin. “Irwin, what’s the plural for ox?”

“Ox. Oxen. The farmer used his oxen.”

“Brian?”

“What?”

“Brian, what’s the plural for box?”

“Boxen. I bought 2 boxen of doughnuts.”

“No, Brian, no. Let’s try another one. Irwin, what’s the plural for goose?”

“Geese. I saw a flock of geese.”

“Brian?”

[Exasperated laughing] “Wha-a-at?”

“What’s the plural for moose?”

“Moosen! I saw a flock of MOOSEN! There were many of ‘em. Many much moosen. Out in the woods…in the wood-es…in the woodsen. The meese want the food in the woodesen…food is the eatenesen…the meese want the food in the woodesenes…food in the woodesenes.”

We learn English rules, like "to make a past tense, add 'ed' to a verb: sailed, repeated, succeeded, constructed, cleaned." But many common verbs don't follow this. I ate, not eated; I ran, not runned; I went, not goed; I slept, not sleeped.

Other languages are the same. Spanish, for example, has tons of irregular verbs. Like many languages, it also has genders for all its nouns. Mountains are feminine. Trees are masculine. Why? Nobody knows. Mark Twain had hilarious complaints about German, which adds another gender called "neuter," among other complications.

Every noun has a gender, and there is no sense or system in the distribution; so the gender of each must be learned separately and by heart. There is no other way. To do this one has to have a memory like a memorandum-book. In German, a young lady has no sex, while a turnip has. Think what overwrought reverence that shows for the turnip, and what callous disrespect for the girl. See how it looks in print -- I translate this from a conversation in one of the best of the German Sunday-school books:

Gretchen: "Wilhelm, where is the turnip?"
Wilhelm: "She has gone to the kitchen."
Gretchen: "Where is the accomplished and beautiful English maiden?"
Wilhelm: "It has gone to the opera."


The reason for all this messiness is simple: nobody planned this. Languages are organic, evolving things. We forget words, make up new ones, pronounce things lazily, and the mistakes become normal. The rules change, and the exceptions pile up.

But not in Esperanto. Esperanto was designed to make sense.

Esperanto is much easier to learn than other languages because:

  • The different letters are always pronounced in the same way and every letter in a word is pronounced. Therefore there are no difficulties with spelling and pronunciation, as one knows that the penultimate syllable is always stressed.
  • The grammar is simple, logical and without exceptions. The grammatical exceptions are often what make it so difficult to learn a new language.
  • Most of the words in Esperanto are international and are found in languages around the world.
  • It is easy to make new words with prefixes or suffixes. Thus, if one learns one word, ten or more usually come as part of the package.

And while I'm sure it has its peculiarities, it's fundamentally different from English or Russian or Chinese or Swahili, which basically became what they are by chance.

PHP is like English


PHP is a very useful language for web development. But it's a bit haphazard, like English. Just look at some of its string functions:
  • count_chars() should have a counterpart called count_words(), right? But it doesn't. The counterpart is str_word_count().
  • strip_tags() is named with words separated by underscores. stripslashes() has no underscores.
  • hebrev() will "convert logical Hebrew text to visual text." I don't know what that means, but really: a whole function for this, in the global namespace? Could something similar be done with Korean or Arabic or Portuguese or Tagalog? If so, would we create korev() and arabiv() and portugv()? Wouldn't it make more sense to have something like langv('languageName') and be done with it? For that matter, why include such a specific function in the base language, when 99.9% of users won't need it and those who do could use a library?

This haphazard design reflects PHP's history: once called "Personal Home Page/Forms Interpreter," it was made with one vision and has been amended and revised and rewritten over and over. I get the impression that somebody said, "hey, I want to add a function for messing with Hebrew." And the PHP team said, "sure, knock yourself out. We'll put it in there."

Now don't get me wrong. I'm not smart enough to design a language myself. And these days, you can write solid, test-driven, object-oriented code in PHP. But fundamentally, it feels like a language that happened, not one that was made.

Ruby is like Esperanto


Ruby, on the other hand, was designed by Yukihiro "Matz" Matsumoto, a Japanese computer scientist who wanted to draw on the strengths of Perl and Python, and above all, to write a language for human beings. As Matz wrote in the foreword to Hal Fulton's The Ruby Way:

...Programming languages are ways to express human thought... Machines do not care whether programs are structured well; they just execute them bit by bit. Structured programming is not for machines, but for humans... So to design a human-oriented language, Ruby, I followed the Principle of Least Surprise. I consider that everything that surprises me less is good. As a result I feel a natural feeling, even a kind of joy, when programming in Ruby.

The language feels almost philosophical. It starts with principles: everything is an object. This has deep implications: classes are objects. True and false are objects. "Class" itself is an object, of the class "Class." Odd, yes, but not haphazard. Where PHP seems sloppy, Ruby seems mysterious, like real life. How can light be both a particle and a wave? How can "nil" be an object? We don't know, but we suspect there are satisfying reasons if we dig deep enough.
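A quick way to see the "everything is an object" principle is to ask a few values what class they belong to. This is only a sketch; the helper name describe is my own invention, not part of Ruby.

```ruby
# Hypothetical helper: report a value alongside the class it belongs to.
def describe(value)
  "#{value.inspect} is a #{value.class}"
end

puts describe(true)    # → true is a TrueClass
puts describe(nil)     # → nil is a NilClass
puts describe(String)  # → String is a Class
puts describe(Class)   # → Class is a Class
```

Even nil answers the question, which is what you would expect if it really is an object like everything else.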

This logical consistency generally extends to the particulars of the language. Want to know the number of items in an array, or the number of characters in a string? .length is your method, in either case. (PHP employs count() and strlen(), respectively.)
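Here is a small sketch of that consistency. The helper size_report is a made-up name for illustration; the only real Ruby method it relies on is .length.

```ruby
# Hypothetical helper: works on anything that responds to .length.
def size_report(collection)
  "#{collection.class} of length #{collection.length}"
end

puts size_report([1, 2, 3])  # → Array of length 3
puts size_report("hello")    # → String of length 5
```

The same method call serves both cases, so the helper never has to check what kind of thing it was given.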

If anything can be converted to a string, you can rely on .to_s to do it; .to_a and .to_i will likewise convert to an array or integer, if possible. There are some unexpected or duplicate methods: str.to_str is the same as str.to_s, but Array doesn't have .to_str. But on the whole, it seems easier to guess what a method will be called than in PHP.
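A few of these conversions, collected in one place. This is a sketch rather than a complete tour; the method name conversion_examples and the hash keys are my own labels.

```ruby
# Hypothetical helper: gather a handful of to_* conversions in one hash.
def conversion_examples
  {
    integer_to_string: 42.to_s,                     # "42"
    string_to_integer: "42".to_i,                   # 42
    range_to_array:    (1..3).to_a,                 # [1, 2, 3]
    nil_to_string:     nil.to_s,                    # ""
    string_to_str:     "abc".to_str,                # "abc", same as to_s here
    array_has_to_str:  [1, 2].respond_to?(:to_str)  # false
  }
end

conversion_examples.each do |label, result|
  puts "#{label}: #{result.inspect}"
end
```

The last entry is the oddity mentioned above: strings respond to .to_str, but arrays do not.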

Ruby, like Esperanto, is quirky. It considers 0 to be true, for example, unlike any other language I know about. But when I encounter its quirks, I assume that they're by design, a necessary condition for some beautiful aspect of the language. In PHP, I think, "somebody didn't think that through."
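The rule behind the 0 quirk can be checked directly: in a Ruby conditional, only false and nil count as false. A minimal sketch, with truthy? as a made-up helper name:

```ruby
# Hypothetical helper: report how Ruby treats a value in a conditional.
def truthy?(value)
  value ? true : false
end

puts truthy?(0)      # → true
puts truthy?("")     # → true
puts truthy?(nil)    # → false
puts truthy?(false)  # → false
```

Seen this way, 0 is not a special case at all; it is just one more object that is neither false nor nil.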

But perhaps I should stop there. Who am I to judge? I'm still a beginner in this field, with plenty to learn in both PHP and Ruby. Judgment can wait.

For now, I should go back to my studies. I should open my copy of "The Well-Grounded Rubyist" with a curious and open mind. And with the enjoyable expectation that, if I read and think and tinker and practice and ask questions, it will all make sense in the end.