Reducing Rousseau - Dictionary making with the Shell

by Paul Hopton, 4 March 2010, in MMS Software

Jean-Jacques Rousseau

This evening I needed a list of unique words for a captcha tool. I decided to start with one of my favourite books of all time: 'Confessions' by Jean-Jacques Rousseau. I grabbed the text courtesy of Project Gutenberg and then had to figure out how to extract the words. Rousseau was not only an eloquent writer but also possessed a wide vocabulary (even when translated into English).

I googled for some scripts, but then just decided to pipe stuff together on the command line. Here's how it looked:

 

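# -w keeps only whole words, so words over 12 letters are skipped, not split
grep -ow '[a-zA-Z]\{4,12\}' rousseau.txt |   # words of 4-12 letters, one per line
    tr 'A-Z' 'a-z' |                           # lowercase everything
    sort -u |                                  # sort and drop duplicates
    tr '\n' ';' > rousseauWords.txt            # join into one ';'-separated line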

 

This processed the whole book in under two seconds and gave me a list of all the unique words longer than three characters and shorter than thirteen, stripped of punctuation, converted to lowercase and separated by semicolons.
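Without grep's `-w` option, the pattern `[a-zA-Z]\{4,12\}` matches runs of 4 to 12 letters rather than whole words, so a word longer than 12 letters comes out as fragments instead of being left out. The sketch below runs the same stages on a made-up sentence (not taken from the book) to show the difference; the `words` function and the `sample` text are hypothetical, for illustration only. Both GNU and BSD grep accept `-o` and `-w`, although neither option is part of POSIX.

```shell
# Made-up sample text (not from the book); "incomprehensible" has 16 letters.
sample='I was born at Geneva; my father, an honest man, found it incomprehensible.'

# Hypothetical helper: the post's pipeline stages, reading stdin.
# $1 is the grep option set to use ("-o" or "-ow").
words() {
    grep "$1" '[a-zA-Z]\{4,12\}' |
        tr 'A-Z' 'a-z' |
        sort -u |
        tr '\n' ';'
    echo    # end the joined list with a newline for readability
}

# -o alone: the 16-letter word is split into "incomprehens" + "ible"
printf '%s\n' "$sample" | words -o
# prints: born;father;found;geneva;honest;ible;incomprehens;

# -ow: each match must be a whole word, so the long word is skipped entirely
printf '%s\n' "$sample" | words -ow
# prints: born;father;found;geneva;honest;
```

Note that `tr '\n' ';'` also leaves a `;` after the last word; if that matters, `paste -sd ';' -` joins the lines without a trailing separator.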

Not particularly eloquent, and I'm sure there is some repetition, but it does the job nicely.
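The post doesn't show how the captcha tool reads the list. As a hypothetical sketch, assuming the tool just needs one random word per challenge, the semicolon-separated file can be split back into lines and sampled with `shuf` (part of GNU coreutils, so not installed by default everywhere). A tiny stand-in list is written to a temporary file so the example runs on its own; in practice you would read `rousseauWords.txt` instead.

```shell
# Hypothetical stand-in for rousseauWords.txt, in the same ';'-separated format
list=$(mktemp)
printf 'born;father;found;geneva;honest;' > "$list"

# Turn each ';' back into a newline (the trailing ';' simply becomes the
# final newline, so no empty entry appears), then pick one line at random.
word=$(tr ';' '\n' < "$list" | shuf -n 1)
echo "captcha word: $word"

rm -f "$list"
```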
