Exercise: Removing Repeated Words from a Text
This is not a difficult exercise; I publish it only for the funny output it produces. Write a program that takes a text and returns a text containing the same words as the original, but each word only once. The order of the words should be preserved.
Variation
The obvious policy is to keep the first occurrence of every word and remove all others. A different policy, such as keeping the last occurrence instead, could have interesting effects on the output.
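One possible variation is to keep the last occurrence of each word instead of the first. The following is a minimal Python sketch of that idea, not part of the original solution. The function name `unique_words_keep_last` is made up for illustration, and a word is assumed to be anything between whitespace.

```python
def unique_words_keep_last(text: str) -> str:
    """Keep only the last occurrence of each whitespace-separated word."""
    seen = set()
    kept_reversed = []
    # Walk backwards: the first time a word is met is its last occurrence.
    for word in reversed(text.split()):
        if word not in seen:
            seen.add(word)
            kept_reversed.append(word)
    # Restore the original reading direction.
    return " ".join(reversed(kept_reversed))


print(unique_words_keep_last("to be or not to be"))  # prints: or not to be
```

With the keep-first policy the same input would become "to be or not", so the choice of policy visibly changes the result.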
Solutions
- My original solution is the trivial Perl script
04-01b-hashfuncs/unique-words
in my ALP III Git repository. It regards everything between whitespace as a word. I applied it to the concatenation of Unterm Birnbaum and Effi Briest by Theodor Fontane and The Tragedie of Hamlet by William Shakespeare.
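For readers who do not want to look up the Perl script, the same approach might look roughly like this in Python. This is a sketch under the assumptions stated above (everything between whitespace is a word, the first occurrence is kept), not a transcription of the script. The name `unique_words` is hypothetical.

```python
def unique_words(text: str) -> str:
    """Keep the first occurrence of each whitespace-separated word."""
    seen = set()
    kept = []
    for word in text.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    # Simplification: all whitespace, including line breaks,
    # is collapsed to single spaces in the output.
    return " ".join(kept)


print(unique_words("the cat, the cat"))  # prints: the cat, cat
```

Because punctuation stays attached to its word, "cat," and "cat" count as different words here, which is a consequence of the whitespace-only definition.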
Origin
Our task was to write different hash functions for strings and assess their quality. I chose the texts above as sample input but could not use them as is, because the differing word frequencies would skew the frequencies of the hash values. Therefore, I had to remove the repetitions.
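The effect of repeated words on the hash value frequencies can be illustrated with a small Python sketch. The hash function `toy_hash`, the helper `bucket_counts`, the bucket count of 8 and the sample sentence are all made up for illustration and are not the functions or data from the actual assignment.

```python
from collections import Counter


def toy_hash(word: str, buckets: int) -> int:
    """Hypothetical polynomial string hash, for illustration only."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % buckets
    return h


def bucket_counts(words, buckets: int) -> Counter:
    """Count how many of the given words land in each bucket."""
    return Counter(toy_hash(w, buckets) for w in words)


words = "the king the queen the prince".split()

# With repetitions, the bucket of "the" is hit three times by one word.
with_repeats = bucket_counts(words, 8)

# dict.fromkeys keeps the first occurrence of each word, in order.
without_repeats = bucket_counts(dict.fromkeys(words), 8)

print(with_repeats)
print(without_repeats)
```

In the first count, a frequent word inflates its bucket regardless of how well the hash function spreads distinct strings. Only the second count reflects the distribution of distinct words.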