Technical: Pro Git

Recently I started to use the “git” source/version control system for my own projects and at work. The book “Pro Git” is an excellent source of information about git: how to use it and how it works. Not only does the book clearly explain the basic principles of git so you can start using it, it also covers a number of advanced topics, such as setting up git servers.

 

The best part is that the electronic version of the book (in PDF, Kindle and other formats) is totally free. If you need to learn git, this is the book.

If you are not a software developer, then you are free to ignore this post.

More on Education

Last week I finished reading Seth Godin’s educational manifesto titled “Stop Stealing the Dream”. The manifesto is really a short book that starts by asking the question “What are schools for?”

Godin’s main thesis is that our current schools were designed to produce obedient workers for the industrial revolution. Back then we needed competent but compliant people to work on assembly lines. Think of scientific management.

However, the world has moved into a post-industrial age. Things that can be mindlessly done on assembly lines can now be done by machines. We live in the age of instantly available information, the age of Google. Yet our schools still seem to be stuck in the previous age. Godin explores some ideas of how this could/should change.

Although I don’t necessarily agree with all of Godin’s observations, he makes some valid points. His book is also available for free in many different electronic formats. Check it out!

Generating Random Words

There is a science fiction book by Stanislaw Lem called “The Futurological Congress” in which the author describes how the futurologists of his imagined future work. Instead of studying science, technology and history, they use computers to generate random words and then try to assign meanings to them.

Just for reference, think how meaningless this sentence would have been in 1980: “To listen to a song, just google the title and then download the mp3.”

Anyway, I thought that generating random words that look like words is an interesting computer problem, so I wrote some code to try it.

My first attempt was a total failure. I generated random sequences of letters and then tried to pick out the ones that could be words. This resulted in mostly useless stuff.

More recently I have been reading a book called “Think Stats”, and I decided to try to apply some of its statistical methods to generating new words. The basic idea is to analyze some existing text (the more text the better) to see how letters are distributed within words. Once the probabilities of letter occurrences are computed, words can be generated.

My algorithm is very simple. It determines the probability of a word of a given length occurring and then, for each length, the probability of each letter appearing at each position. Given these, it’s easy to generate random text.

I compute the Probability Mass Function (PMF) for the distribution of word lengths and for the letter distributions (see “Think Stats” for the gory details). I used the text of “Moby Dick” and “War and Peace” as my input (you can get lots of such text from http://www.gutenberg.org/).
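For readers who want to experiment, here is a minimal sketch of the approach described above. It is not my original program: the function names (build_model, sample, generate_word, generate_text) are made up for this illustration, and it assumes words are plain runs of the letters a to z. It counts word lengths, counts the letters seen at each position for each length, and then samples from those counts as empirical PMFs.

```python
import random
import re
from collections import Counter, defaultdict


def build_model(text):
    """Count word lengths and, per length, the letters seen at each position."""
    # Assumption: a "word" is any run of ASCII letters, lowercased.
    words = re.findall(r"[a-z]+", text.lower())
    length_counts = Counter(len(w) for w in words)
    # letter_counts[length][position] is a Counter of letters
    letter_counts = defaultdict(lambda: defaultdict(Counter))
    for w in words:
        for pos, ch in enumerate(w):
            letter_counts[len(w)][pos][ch] += 1
    return length_counts, letter_counts


def sample(counter, rng):
    """Draw one key with probability proportional to its count."""
    keys = list(counter)
    weights = [counter[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def generate_word(model, rng):
    """Pick a length first, then one letter for each position of that length."""
    length_counts, letter_counts = model
    length = sample(length_counts, rng)
    return "".join(sample(letter_counts[length][pos], rng) for pos in range(length))


def generate_text(model, n_words, seed=None):
    """Join n_words independently generated words with spaces."""
    rng = random.Random(seed)
    return " ".join(generate_word(model, rng) for _ in range(n_words))


if __name__ == "__main__":
    # Tiny stand-in corpus; a real run would read a long text file instead.
    model = build_model("the cat sat on the mat and the dog ran to the hut")
    print(generate_text(model, 10, seed=1))
```

Because each letter is drawn independently of its neighbours, the sketch only captures which letters are likely at each position, not which letters tend to follow one another.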

Here is a snippet of such random text generated by my program:

ruloee losy nre letely we wad whlg bhmt oititrs tarr tid hhe tneos daot sacameen bas saeitmcd a ihatminatnes phvvdnn waind see iaise ty klenk st prirpcnry hy taa why yod naahtny shd kple ae waat chne rny torlld bar tet tamd s hil it hre thn hooed ot gnky litsanad sxhlrrnki thd dhte cirdne fvae tle on lrnmcoes wmohity tae a tit i morvh fsth teirk aoae  liciig bt wodhen ie peidgdg thd paoornnw ane fwat of i hiey os aae af yne ty ceviree fawtnad daut held yne to whs wht on thn tne cfurs weee anl herooerkg

As you can see, it almost looks like a language. Here are some words that could be new things in the future:

  • litsanad
  • wodhen
  • sacameen
You can try and assign some meanings to them!
(BTW, all the code is in Python, so if you are interested you can have a copy.)