Google, MSN, Yahoo Treat HTML Character Entities Differently

For last week’s SEO Quiz, I spent some time researching HTML character entities in order to maintain valid XHTML code.

For instance, the encoding for » is &#187;. During my research on HTML entities, I noticed that the search engines differ in the way they convert query input.

Google converts encoded HTML into the character it represents before processing the query. I’ll use Õ as an example. The character Õ is encoded as &#213;.

A search for “&#213;” in Google returns results for the letter “O” and “Õ”. I’m guessing that Google knows that “Õ” is a variation of the letter “O”. In MSN, however, a search for “&#213;” returns the same results as a search for “213”. This means MSN strips all escaping characters before processing the search. Yahoo processes a search for “&#213;” the same way MSN does. Try a search for “&#65;” and “A” in all the search engines to get a better feel for the differences.
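The two behaviors described above can be modeled in a few lines of Python. This is only a sketch of what the results suggest, not how any engine is actually implemented: the function names `decode_entities`, `strip_escapes`, and `fold_accents` are made up for illustration, and the accent folding is my guess at why Google matches “O” as well as “Õ”.

```python
import html
import re
import unicodedata


def decode_entities(query: str) -> str:
    """Google-like (assumed): turn entities into the characters they represent."""
    return html.unescape(query)


def strip_escapes(query: str) -> str:
    """MSN/Yahoo-like (assumed): drop the escaping characters & # ; and keep the rest."""
    return re.sub(r"[&#;]", "", query)


def fold_accents(text: str) -> str:
    """Reduce accented letters to their base letter, e.g. for matching variants."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


query = "&#213;"
decoded = decode_entities(query)   # the single character Õ
stripped = strip_escapes(query)    # the digits 213
base = fold_accents(decoded)       # the plain letter O
print(decoded, stripped, base)
```

Under this model, the decoding engine searches for the character itself (plus its base letter), while the stripping engines end up searching for the bare number.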

Additionally, Google responds differently from MSN and Yahoo when a query has no results. A search for “&#36;” (the encoding of “$”) returns nothing in Google. There is no error message, no results returned, nothing. A search for the literal “$” returns the same thing.

Since MSN and Yahoo strip the escaping characters, the query is still processed and results are returned for a search of “36”. On those engines, a search for the literal “$” returns an error message explaining that there are no results containing “$”.
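One plausible explanation for the “$” results is tokenization: once the entity is decoded, the query contains no word characters at all, so there is nothing left to search for. The sketch below illustrates that idea only. The helper `query_terms` and its `decode_first` flag are hypothetical, and the real engines may tokenize queries quite differently.

```python
import html
import re
from typing import List


def query_terms(raw: str, decode_first: bool) -> List[str]:
    """Split a raw query into searchable word tokens.

    decode_first=True models an engine that converts entities before
    processing; decode_first=False models one that ignores the escaping
    characters and keeps whatever word characters remain.
    """
    text = html.unescape(raw) if decode_first else raw
    # \w+ keeps letters and digits and discards punctuation such as & # ; $
    return re.findall(r"\w+", text)


print(query_terms("&#36;", decode_first=True))   # decoded to "$": no terms left
print(query_terms("&#36;", decode_first=False))  # escaping ignored: the term 36
print(query_terms("$", decode_first=False))      # literal "$": no terms left
```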

It’s interesting to see the differences between the engines and how they convert queries before processing. I’m surprised that Google doesn’t return any error message for these searches, but it looks like they are the only engine out of the big three that converts HTML entities before processing.

2 thoughts on “Google, MSN, Yahoo Treat HTML Character Entities Differently”

  1. I find your results quite interesting and I think Google has it right for the most part.

    Although there are region-specific search engines for each of the ‘big boys’, Google processing the HTML entities shows that they take into consideration that not every language follows the English alphabet and that there are variations in spelling and terms.

    Just because a person is in the United States (or another English speaking nation) doesn’t mean they write in English or would necessarily search in English.

    Hope that makes some sense :)
