Spam Negatively Affecting Linguistic Research

Spam. eBay has policies against it, search engines and email clients are constantly upgraded to filter it out, and most people just hate it. Now it turns out that it might be hampering the work of linguists who have turned to Google to directly observe the evolution of language.

One popular technique used by spammers is “keyword spamming.” If everyone is currently talking about the exploits of a popular celebrity, why not add her name to your auction listing, website, or email message so that you get more visitors? Spammers have been known to load up their material with what seems to be a dictionary’s worth of words. In response, email clients and search engines have added filters that ignore lists of words that have no relationship to the true content of the email message or website. However, spammers are now getting around these filters by working their lists of words into grammatically correct but nonsensical sentences.

How are linguists being affected? Traditional data sources such as controlled studies, eavesdropping, and collections of written or spoken words all have their disadvantages, so some linguists are using Google to monitor the evolution of language. However, other linguists warn that the Internet might not be the best source either, because keyword spamming practices might be contaminating the data.

Published by

Richard Leis

Richard Leis is a writer and poet living in Tucson, Arizona. His poetry has been published in Impossible Archetype. His essays about fairy tales and technology have been published on Tiny Donkey and Fairy Tale Review’s “Fairy-Tale Files.”