April 23, 2004 4:00 AM PDT
Google's chastity belt too tight
What's new:
Despite claims of "advanced proprietary technology," Google's opt-in porn filter proves no better than the tools of the last decade, blocking many harmless sites, a CNET News.com investigation shows.
Bottom line:
The indiscriminate nature of the tool is bad news for affected businesses. Google is the most widely used search engine, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets.
By an accident of spelling, the domain name of Ohio electronics retailer PartsExpress.com includes an unfortunate string of letters, "sex," which is enough to block the Web site from Google's filtered results.
PartsExpress.com is not alone. A CNET News.com investigation shows that Google's SafeSearch filter technology incorrectly blocks many innocuous Web sites based solely on strings of letters such as "sex," "girls" or "porn" embedded in their domain names.
Google's SafeSearch flaws are more than academic--they can have serious consequences for innocent Web site operators blocked out by them. Google is the most widely used search engine on the Web, and failure to appear in its listings can have a direct impact on sales for some companies, particularly smaller enterprises with limited marketing budgets.
Research company WebSideStory reported last month that Google claimed an
"Traffic from Google can make or break a business," said Maria Medina, whose family-run clothing business at ALittleGirlsBoutique.com doesn't pass the SafeSearch censor. "Here I am, a mom of four children, creating an at-home business that sells little girl dresses and accessories, in order to spend more time with my children, and I have been filtered out as not being family friendly. Ridiculous."
Matt Cutts, the Google engineer who designed SafeSearch four years ago, said his algorithm looks for a "relatively small" number of trigger words in a Web page's address. If one of those words appears, the SafeSearch algorithm puts the address on a block list and does not take the next step of evaluating the content of the site. "We try to find the best trade-off of precision, recall and safety," Cutts said. "People who opt in to SafeSearch are mostly OK with us being on the conservative side."
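The behavior Cutts describes can be illustrated with a minimal sketch. It is not Google's code: the trigger list below contains only the three strings named in this article, and the function names are hypothetical. It assumes the check is a plain substring test on the host name, with no look at page content.

```python
from urllib.parse import urlparse

# Hypothetical trigger list: only the strings named in the article.
# Google's actual list is not public.
TRIGGER_STRINGS = ("sex", "girls", "porn")


def hostname_of(address: str) -> str:
    """Return the lowercase host part of a URL or a bare domain name."""
    if not address.startswith(("http://", "https://", "//")):
        address = "//" + address  # lets urlparse treat a bare domain as a host
    return (urlparse(address).hostname or "").lower()


def is_blocked(address: str) -> bool:
    """Block on any trigger substring in the host name.

    The page content is never examined, which is the source of the
    false positives described in the article.
    """
    host = hostname_of(address)
    return any(trigger in host for trigger in TRIGGER_STRINGS)


for site in (
    "PartsExpress.com",
    "ALittleGirlsBoutique.com",
    "Pornichet.org",
    "Parts-Express.com",
):
    print(site, "blocked" if is_blocked(site) else "allowed")
```

Under these assumptions the first three names are blocked, while the hyphenated Parts-Express.com passes because the hyphen breaks up the offending string.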
Cutts would not disclose how many Web searches are done with SafeSearch enabled, saying only that it's a small percentage of the millions of queries handled by Google each day. But the sloppy filter stands out as a rare black eye for a company that prides itself on superior search technology and boasts on its payroll one of the world's highest concentrations of computer science doctoral degrees. Google
"That's not very bright," said Karen Schneider, a librarian who runs the
The Scunthorpe problem
For years, Web content filters have drawn criticism for inaccuracies. In a famously embarrassing incident in 1996, America Online's errant dirty-word filter prevented residents of the British town Scunthorpe from signing up as new customers. Google's SafeSearch makes the same mistake, blocking local news sites like ThisIsScunthorpe.co.uk and ScunthorpeDistrictCatsProtection.co.uk, a housecat-adoption site.
SafeSearch is "evocative of the very primitive CyberSitter-type tools of the mid-1990s--not a tool of fairly sophisticated development."
SafeSearch also marked as unsafe for children JewishSussex.com, a religious Web site; EssexCountyBeeKeepers.org of Topsfield, Mass.; BluesExcuse.SouthBurnett.com.au, an Australian blues band's site; BassExpert.com; and the Anglo-Saxon history site RomansInSussex.co.uk.
Gareth Roelofse, the Web designer of RomansInSussex.co.uk, said his filtering complaints are broader than just Google. "We also found many library Net stations, school networks and Internet cafes block sites with the word 'sex' in" the domain name, Roelofse said. "This was a challenge for RomansInSussex.co.uk because its target audience is school children."
"I think it would be nice if Google would have a 'white list' for sites like ours, but this would involve human man-hours, I guess," said Roelofse, who designed the site on behalf of the Sussex Archaeological Society and local museums.
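Roelofse's "white list" idea could look something like the following sketch. Everything here is an assumption made for illustration: the trigger strings are the ones named in this article, the allow list is a hypothetical hand-maintained set, and nothing suggests Google operates such a list.

```python
# Hypothetical trigger strings (those named in the article) and a
# hypothetical allow list of domains that a person has reviewed.
TRIGGER_STRINGS = ("sex", "girls", "porn")
ALLOW_LIST = {"romansinsussex.co.uk", "jewishsussex.com"}


def normalize(domain: str) -> str:
    """Lowercase a domain and drop a leading 'www.' so lookups are consistent."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def is_blocked(domain: str) -> bool:
    """Consult the human-reviewed allow list before the substring test."""
    host = normalize(domain)
    if host in ALLOW_LIST:
        return False  # a reviewer has vouched for this site
    return any(trigger in host for trigger in TRIGGER_STRINGS)


print(is_blocked("www.RomansInSussex.co.uk"))  # reviewed, so not blocked
print(is_blocked("BassExpert.com"))            # not reviewed, still blocked
```

The cost Roelofse anticipates shows up in `ALLOW_LIST`: every entry has to be added by hand, so sites nobody has reviewed remain blocked.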
Cutts, the Google software engineer, noted that the SafeSearch
Google is not alone in seeking to lure searchers worried about encountering online raunch and ribaldry: Yahoo offers a "
An April 2003
David Drummond, Google's vice president for business development, said that at the time of its development, SafeSearch was designed to be overly cautious. "The thinking was that SafeSearch was an opt-in feature," Drummond said. "People who turn it on care a lot more about something sneaking through than they do about something getting filtered out."
"Plainly silly" blocking
CNET News.com evaluated SafeSearch by testing tens of thousands of random Web pages and identifying which ones were incorrectly listed as pornographic. The results showed that Google encountered many of the same problems that have plagued Internet filters for almost a decade. One 1996 analysis, for instance, showed that CyberPatrol blocked National Rifle Association and gay and lesbian Web sites, and CyberSitter cordoned off Usenet newsgroups such as alt.feminism and soc.support.fat-acceptance.
The ACLU, which has warned against buggy filters
"In the end, the lists are proprietary," Steinhardt said. "Without access to the lists, you don't know precisely what's being blocked. You have to rely on the authors of the lists to have the right judgment."
The word "girls" also tends to lead SafeSearch astray. It incorrectly blocks the Web sites of the private school GirlsSchoolOfAustin.org; the bridesmaid dress shop DressyGirls.com; TatuGirls.com, a Russian band's site; and TheCalicoGirls.com, a Web site devoted to cat poetry.
"Porn" in a domain name can confuse SafeSearch just as thoroughly. It won't display Pornichet.org, devoted to improving tourism for the French seaside town of Pornichet; SpornGroup.com, a New York-based business consultancy; Sporn.com, which sells dog leashes; PornkRocks.com, a site devoted to the band Pornk; and Anti-Kinderporno.de, a German effort to oppose child pornography.
Aaron Wolfe, information systems director for SafeSearch-banned PartsExpress.com, said the company is planning to excise that unfortunate string of letters from its domain name. "We are going to modify our domain name to Parts-Express.com," Wolfe said, adding that the renaming will also help "get around spam filters on e-mail servers."

Helllllllo??!!! Yeah, they're sooooo sweet and innocent in their
"Russian band" schoolgirl outfits, and won't have ANY influence
on my grade-school kids!
Apparently nobody at CNET watches MTV... this was NOT the
strongest example to use to make the point!
Can anyone recommend a *good* search site?
And, yes, I snickered at the reference to Tatu (in this context) too . . .
As to the "Scunthorpe incident", Scunthorpe people seem to have found ways around the blocking of the town's name in URLs: the two "blocked" sites mentioned in the article can be reached with one extra click from pages that don't have the town's name in their URLs. For instance, http://www.rhatcliffe.freeserve.co.uk/scun_cats_page.htm is unblocked and has a link to http://www.scunthorpedistrictcatsprotection.co.uk/. Was the former created to counter SafeSearch's blocking of the latter, or because the town's name got blocked by filters in general? It would have been interesting to know, but there is no info on this in Mr McCullagh's article.
Another puzzling thing: at the end of his "chastity belt" article, Mr McCullagh gives a link to his own article on Ben Edelman's empirical analysis of SafeSearch (1), but that's all. It would have been great if Mr McCullagh's article had contributed to the further research suggested in the conclusions of Ben Edelman's Empirical Analysis. Unfortunately, it doesn't.
(1) http://news.com.com/2100-1032-996417.html For Edelman's research itself: http://cyber.law.harvard.edu/people/edelman/google-safesearch.
Cordially
Claude Almansi
http://www.adisi.ch
http://www.entersearchterm.org
Many site operators who feel their sites are being blocked by SafeSearch may simply not have been crawled when the current index was built. They may only need a few more links to their sites to ensure that Google crawls them. They should also make sure their internal linking is set up correctly (including the use of HTML site maps if they have more than a few pages).
Hala Chaoui