Google, MSN, Yahoo Treat HTML Character Entities Differently
For last week’s SEO Quiz, I spent some time researching HTML character entities so that the page would stay valid XHTML.
For instance, the encoding for » is &raquo; (or the numeric form &#187;). While researching HTML entities, I noticed that the search engines differ in how they convert query input.
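If you would rather generate these encodings than look them up, here is a minimal Python sketch. The helper names (`numeric_entity`, `named_entity`, `to_ascii_xhtml`) are my own hypothetical ones. `html.entities.codepoint2name` is the standard library's HTML 4 name table, so characters that have no HTML 4 name, such as “$”, come back as None and need the numeric form.

```python
from html.entities import codepoint2name
from typing import Optional


def numeric_entity(ch: str) -> str:
    """Decimal numeric character reference for one character, e.g. '»' -> '&#187;'."""
    return f"&#{ord(ch)};"


def named_entity(ch: str) -> Optional[str]:
    """HTML 4 named entity for a character, or None if HTML 4 defines no name for it."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else None


def to_ascii_xhtml(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference so the markup stays plain ASCII."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


for ch in ["»", "Õ", "$", "A"]:
    print(ch, numeric_entity(ch), named_entity(ch))

print(to_ascii_xhtml("Next »"))
```

Both forms are valid in XHTML; the numeric one works for any character, while named entities only exist for a fixed list.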
Google converts encoded HTML into the character it represents before processing the query. I’ll use Õ as an example: the character Õ is encoded as &#213;.
A search for “&#213;” in Google returns results for both “O” and “Õ”. I’m guessing that Google knows “Õ” is a variation of the letter “O”. In MSN, however, a search for “&#213;” returns the same results as a search for “213”. This suggests that MSN strips the escaping characters (&, # and ;) before processing the search. Yahoo handles a search for “&#213;” the same way MSN does. Try searching for “&#65;” and “A” in each engine to get a better feel for the differences.
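To make the difference concrete, here is a rough Python model of the two behaviors. It is only a guess that reproduces the results above, not how any engine is actually built: `decode_first`, `strip_escapes` and `fold_accents` are hypothetical names, and the accent folding (Unicode NFD decomposition, then dropping combining marks) is just one way an engine could treat “Õ” as a variant of “O”.

```python
import html
import re
import unicodedata


def decode_first(query: str) -> str:
    """Google-like (assumed): turn character references into the characters they stand for."""
    return html.unescape(query)


def strip_escapes(query: str) -> str:
    """MSN/Yahoo-like (assumed): throw away the &, # and ; characters and keep the digits."""
    return re.sub(r"[&#;]", "", query)


def fold_accents(text: str) -> str:
    """Hypothetical: decompose 'Õ' into 'O' + combining tilde, then drop the combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


query = "&#213;"
print(decode_first(query))
print(fold_accents(decode_first(query)))
print(strip_escapes(query))
```

Run on “&#213;”, the first path yields “Õ” (and “O” after folding), while the second yields “213”, which matches what each engine appeared to search for.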
Google also handles these queries differently when nothing matches. A search for “&#36;” (the encoding of “$”) returns nothing in Google: no error message, no results, nothing. A search for a plain “$” behaves the same way.
Since MSN and Yahoo strip the escaping characters, they still process the query and return results for a search of “36”. A search for a plain “$”, on the other hand, returns an error message explaining that there are no results containing “$”.
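The “$” case fits the same model if we assume the engines only index runs of letters and digits. In this sketch, `terms` is a hypothetical stand-in for whatever tokenizer the engines really use: decoding leaves no searchable term at all, while stripping leaves “36”.

```python
import html
import re
from typing import List


def terms(query: str) -> List[str]:
    """Hypothetical tokenizer: keep only runs of word characters as searchable terms."""
    return re.findall(r"\w+", query)


decoded = html.unescape("&#36;")          # decode first (Google-like): "$"
stripped = re.sub(r"[&#;]", "", "&#36;")  # strip escapes (MSN/Yahoo-like): "36"

print(terms(decoded))
print(terms(stripped))
print(terms("$"))
```

Under this assumption, what to show for an empty term list is a presentation choice: Google appears to show nothing at all, while MSN and Yahoo report that no results contain “$”.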
It’s interesting to see how the engines differ in converting queries before processing them. I’m surprised that Google doesn’t return any error message for these searches, but it appears to be the only one of the big three that converts HTML entities before processing.