Richard Möhn's Old Home Page: Exercise: Removing Repeated Words from a Text

This is not a difficult exercise, and I publish it only for its funny effects: write a program that takes a text and returns a text containing the same words as the original, but each word only once. The order of the words should be preserved.

Variation

The obvious policy is to keep the first occurrence of every word and remove all others. Changing this could have interesting effects on the output.
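
One possible change of policy is to keep the last occurrence of every word instead of the first. The following is only a minimal Python sketch of that idea. The function name is made up for illustration, and it assumes that a word is anything between whitespace and that the output may be joined with single spaces.

```python
def unique_words_keep_last(text: str) -> str:
    """Keep only the LAST occurrence of every whitespace-separated word."""
    seen = set()
    kept_reversed = []
    # Walk the words back to front, so the first time a word is seen
    # here is its last occurrence in the original text.
    for word in reversed(text.split()):
        if word not in seen:
            seen.add(word)
            kept_reversed.append(word)
    # Restore the original reading direction.
    return " ".join(reversed(kept_reversed))


if __name__ == "__main__":
    print(unique_words_keep_last("to be or not to be that is"))
    # prints: or not to be that is
```

With the first-occurrence policy the same input would give "to be or not that is", so the two policies already differ on a single line of text.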

Solutions

My original solution is the trivial Perl script 04-01b-hashfuncs/unique-words in my ALP III Git repository. It regards everything between whitespace as a word. I applied it to the concatenation of Unterm Birnbaum and Effi Briest by Theodor Fontane and The Tragedie of Hamlet by William Shakespeare.
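
The approach described above can be sketched in a few lines. This is not the Perl script from the repository but a Python illustration under the same assumption: everything between whitespace counts as a word. The function name is hypothetical, and joining the result with single spaces is a simplification that discards the original line breaks.

```python
def unique_words(text: str) -> str:
    """Keep only the FIRST occurrence of every whitespace-separated word."""
    seen = set()
    kept = []
    for word in text.split():  # split() with no argument splits on any whitespace
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    print(unique_words("to be or not to be that is"))
    # prints: to be or not that is
```

Because punctuation is not stripped, "cat," and "cat" count as different words, so unique_words("the cat, the cat") returns "the cat, cat".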

Origin

Our task was to write different hash functions for strings and assess their quality. I chose the above texts as sample input, but couldn't use them as is, because the different word frequencies would influence the frequencies of the hash values. Therefore I had to remove the repetitions.
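
To see why repetitions get in the way, consider a toy example. The hash function below (sum of the UTF-8 bytes modulo the number of buckets) is a made-up stand-in, not one of the functions from the assignment, and the names and the bucket count of 8 are assumptions for illustration only.

```python
from collections import Counter


def toy_hash(word: str, buckets: int = 8) -> int:
    """Hypothetical toy hash: sum of the UTF-8 bytes modulo the bucket count."""
    return sum(word.encode("utf-8")) % buckets


def bucket_counts(words, buckets: int = 8) -> Counter:
    """Count how many of the given words land in each bucket."""
    return Counter(toy_hash(word, buckets) for word in words)


if __name__ == "__main__":
    with_repeats = ["the"] * 5 + ["a", "b"]
    without_repeats = ["the", "a", "b"]
    print(bucket_counts(with_repeats))
    print(bucket_counts(without_repeats))
```

A frequent word such as "the" adds to the same bucket every time it appears, so with repetitions the bucket counts reflect word frequencies in the text rather than how well the hash function spreads distinct words.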
