Defeating The Sandbox

Ian McAnerin has been posting about how he’s been able to get around the sandbox with some careful planning and preparation.

First off, some background information about Ian’s little experiment. Back in October, Ian registered http://www.mcanerin.us and created a page on that domain aiming to rank for the term “dotus” (I’m not linking to the pages because I don’t want to influence the experiment).

A month later, the experiment page is ranking #4 for “dotus”.

Ian isn’t giving away the details behind his findings, but we can take a look at his experiment page and see what he is doing.

Redirects

The first thing I noticed when visiting the page is that there is a 301 redirect from the mcanerin.us domain to his mcanerin.com domain. After a quick whois lookup, I can tell that his mcanerin.com domain has been registered since 2001. A site: command at Google shows only one page indexed for the mcanerin.us domain – the experiment page. A query for “dotus” brings up “www.mcanerin.us/us/dotus.htm” at rank #4. What’s interesting is that the page no longer resolves correctly and now returns a 404. So what happened?

A couple of site/link/cache checks at the other search engines might help. Yahoo is showing about 18 pages indexed for the mcanerin.us domain and MSN is showing about 100. Obviously the mcanerin.us domain (or at least the experiment page) had to have been indexed for it to rank. But with a 301 redirect, Google should have dropped the mcanerin.us domain in favor of the mcanerin.com domain. Loading up anything on the mcanerin.us domain brings you to mcanerin.com/EN/us/. So why did all the search engines index the mcanerin.us domain?

I can think of two reasons:
1. Ian started off using a 302 redirect.
2. There is a delay from when the search engines first pick up the domain and when they resolve the 301 in their indexes.

I think the answer is number one. If you take a look at the pages that were indexed, all of them follow the exact same URL structure as the mcanerin.com domain, except the .com is replaced with .us. Although many of the mcanerin.us pages no longer resolve (check MSN for the list), if you replace the .us with .com, all of them are still there. From a developer’s standpoint, it would be very easy to write a 302 that swaps out the .us with .com.
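
To make that concrete, here is a minimal Python sketch (my own illustration, not Ian’s actual setup) of a redirect that answers any request on the .us host with the same path on the .com domain. The only difference between a temporary and a permanent redirect is the status code.

```python
# Minimal sketch (not Ian's actual code): send every request on the .us
# host to the same path on the .com domain. A 302 is a temporary move,
# which keeps the original URL indexed; change STATUS to 301 to signal
# a permanent move instead.
from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_HOST = "www.mcanerin.com"  # target domain
STATUS = 302                   # temporary redirect; use 301 for permanent


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The path (e.g. /us/dotus.htm) stays identical; only the host changes.
        self.send_response(STATUS)
        self.send_header("Location", f"http://{NEW_HOST}{self.path}")
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectHandler).serve_forever()
```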

A 302 redirect will tell the search engines to keep the mcanerin.us domain indexed along with the content that comes from the mcanerin.com domain. This is what we are seeing.

But why is there a 301 redirect on the site now? Was there ever a 302? I can only guess, but I’d say Ian started off with a 302 redirect and then switched to a 301 after his experiment was over. Again, I may be totally off on this one.

Linking

A link check over at Yahoo shows that Ian’s sites are the only sites linking to his experiment page. Let’s look at what links to Ian’s sites – seobook, sej, seobythesea, sew, seomoz – all relevant sites and all authority sites. Ian links to other authority sites like high rankings, sew, sempo, sma-na, etc., so I’d consider Ian’s site part of an authority hub too.

Content

The experiment page is cleanly coded – XHTML and CSS for layout. The title of the page is “DotUS Domain Registration”, which emphasizes “dotus”, and “domain registration” is a good semantically related keyword.

There are a couple of mentions of “dotus” early on and in different header tags (h1/h2). What really stands out is the content and the use of related keywords. Sprinkled throughout the content, Ian uses related keywords like “dot us”, “domain registration”, “mcanerin.us”, and “dotus ccTLD”. Even the alt tags for images contain related keywords. There is no excessive use of any one keyword, since each use is not a copy but a related keyword.

So how does all this information relate to the sandbox?

Ian’s experiment page is a good example of how to approach the sandbox. In my opinion, the sandbox has always been about your link patterns – specifically TrustRank. The sandbox is not a filter for new sites but a filter for untrusted sites. Great links from on-topic, trusted authority sites will pass “trust” on to a site and help it avoid the sandbox. For example, this site (http://www.socialpatterns.com/) never hit the sandbox. After a couple of weeks, this site had already started to rank. This site typically picks up links from established SEO sites at a fairly natural rate. If I write something worthwhile, I’ll gain a couple of links. Overall, my trusted links outweigh my untrusted links.

Overview of Ian’s experiment page

TrustRank – incoming links from Ian’s main page, which is an on-topic authority.
Page Optimization – headers/titles/semantically related keywords.
Link History – slow growth of links, with trusted links outweighing untrusted links.
Redirects – I may be totally off on this one, but I think Ian started off with a 302 in order to get the domain indexed with the same content. No real duplicate penalty because it’s obvious that the domain is a geographical copy.
Clean – clean code and small size. XHTML/CSS allows for less code (compared to tables) and still lets designers craft a good-looking page.
Semantic Code – important terms are placed higher in the document and in a higher heading tag. Less important but still related terms are placed in a lower heading tag.
Content – good amount of related content without overstuffing keywords.

Update: I was totally off about the 302. Ian’s posted some more info about his experiment.

WebmasterWorld Dropped from Google Due to Robots.txt

It didn’t take long for Google and MSN to drop WebmasterWorld completely from their indexes after Brett Tabke changed WebmasterWorld’s robots.txt to disallow all spiders. WebmasterWorld is not showing up in Google or MSN and I’m sure Yahoo is soon to follow.
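
For reference, a disallow-all robots.txt is just two directives. The sketch below is my own illustration (the directives are the standard disallow-all pattern; the forum URL is just an example) using Python’s built-in robots.txt parser to show that every compliant spider gets blocked.

```python
# Illustration only: a disallow-all robots.txt and what it means to
# compliant crawlers. The forum URL below is just an example path.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("Googlebot", "msnbot", "Slurp"):
    allowed = parser.can_fetch(bot, "http://www.webmasterworld.com/forum/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")  # all blocked
```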

I know Brett is making a point here about the resources consumed by search engine spiders, but banning them entirely without providing an alternative site search is a horrible decision from a user standpoint.

Danny Sullivan has more in-depth coverage of the changes at WebmasterWorld and the recent delisting from the search engines.

Google Base Officially Launches

Looks like Google Base is officially live. Check out the blog post and then play around with Google Base here.

Right now, there are two ways to submit data items to Google Base. Individuals and small website owners can use an interactive user interface; larger organizations and sites can use the bulk uploads option to send us content using standard XML formats.

Rather than impose specific schemas and structures on the world, Google Base suggests attributes and item types based on popularity, which you can use to define and attach your own labels and attributes to each data item. Then searchers can find information more quickly and effectively by using these labels and attributes to refine their queries on the experimental version of Google Base search.

Optimizing for Personalization

This is a repost from my response over at the SEW forums.

I think personalization represents more business for SEMs who understand how people search.

For instance, let’s say you have a web page about blackberry – the fruit. Right now it would be impossible for you to rank for the term “blackberry”, even if you have a great on-topic page, because blackberry (the device) is generating so many more searches/pages/links/updates. Currently all the SERPs are dominated by blackberry (the device) sites. Anyone searching for your page about the blackberry fruit using the keyword “blackberry” will have some trouble.

With personalization, you can place your site in front of the people that want to visit it.

So how do you optimize for personalization?

Do what Google is doing:

Track the URLs of your ideal visitor. What sites is he/she visiting regularly? In this case your ideal visitor is someone interested in the blackberry fruit. Perhaps he/she frequents foodtv.com, recipes.com, a food forum, etc.

Figure out what categories your ideal visitor is most interested in. Foods? Cooking? Horticulture? Take some time to visit sites that belong to these categories. What topics do they cover and what terms are they using frequently?

Study the search behavior of your ideal visitor. What terms are they using on a regular basis?

Now that you have a good idea of what your ideal visitor does online – apply that to your SEO campaign. Grab links from sites that your visitor would visit frequently. Cover topics that they are interested in. Target terms that are related to articles on some of your visitor’s bookmarked sites.
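
As a rough sketch of the “what terms are they using frequently” step, here is a small Python example (the page text is invented purely for illustration) that tallies the most common terms across a few pages an ideal visitor might read.

```python
# Rough sketch: tally frequent terms across pages an ideal visitor reads.
# The sample text is invented for the example.
import re
from collections import Counter

pages = [
    "Blackberry jam recipe: fresh blackberries, sugar, and lemon juice.",
    "Growing blackberries in your garden takes full sun and rich soil.",
    "Forum thread: best blackberry cobbler recipe for summer desserts.",
]

stopwords = {"the", "and", "in", "for", "your", "a", "of", "to", "best"}

terms = Counter()
for page in pages:
    for word in re.findall(r"[a-z]+", page.lower()):
        if word not in stopwords:
            terms[word] += 1

# The top terms suggest topics to cover and keywords to target.
print(terms.most_common(5))
```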

Remember that Google is using user patterns to personalize rankings. If you can tailor your site/campaign to those patterns, you have a better chance of performing well with personalization.

What’s the benefit here? You’ll get even more targeted traffic, and you may be able to rank for terms you never would have normally. If you take my blackberry example, even if the site is linked to by great authority sites about blackberry fruits and recipes, it won’t matter right now. But with personalization it will.

Update: Most of these tips are basic things you can do to improve your current site’s ranking. On-topic links, relevant content, and placing your site in a good network all help to improve rankings. The key idea to take from this is that even for extremely high-competition terms, you will be able to compete with personalized rankings. Your job is to optimize your site so that the people you are marketing to can find your site easily.

Google Patent: Personalization of Placed Content Ordering in Search Results

This is a summary of Google’s recent patent on personalized search results. It’s in a similar format to Rand’s Historical Data report for two reasons. The first is consistency: I’m assuming everyone is familiar with his report and would therefore be familiar with this format. The second is that I can’t think of a better format.

Everything here is my own interpretation of the patent.

Overview of Important Concepts

These are the concepts I believe are most important for search engine optimizers and marketers to understand in order to benefit from this report.

Google’s Goal to “Personalize” Search

Google understands that currently not all search results are relevant for everyone. For instance, if someone searches for “blackberry”, how does Google know whether they are searching for blackberry devices or blackberries for cooking? Depending on the searcher, one topic will be more relevant than the other. In order to present better search results, personalized results will take a user’s profile (more on that later) into consideration. So not only will Google be ranking websites on typical factors (linkage, textual analysis, click-through rate), but they will now be incorporating user historical information.

User Profile

The user profile is based upon user history. The patent specifically outlines: user search query history, documents returned in the search results, documents visited in the search results, anchor text of the documents, topics of the documents, outbound links of the documents, click through rate, format of documents, time spent looking at document, time spent scrolling a document, whether a document is printed/bookmarked/saved, repeat visits, browsing pattern, groups of individuals with similar profile, and user submitted information. All this data can be mined by programs Google has already released – Google Desktop Search, Personalized Search History, and Google Toolbar.
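
A loose way to picture the profile is as a record of those signals. The sketch below is my own simplification for illustration; the field names and sample values are not Google’s actual data model.

```python
# A loose sketch of a user profile built from the kinds of signals the
# patent lists. Field names are my own simplification, not Google's schema.
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    search_queries: list = field(default_factory=list)  # past queries
    visited_urls: list = field(default_factory=list)    # results clicked
    topic_weights: dict = field(default_factory=dict)   # inferred interests
    bookmarked: set = field(default_factory=set)         # saved/printed/bookmarked docs
    repeat_visits: dict = field(default_factory=dict)    # url -> visit count


profile = UserProfile(
    search_queries=["blackberry jam recipe", "growing blackberries"],
    visited_urls=["http://www.foodtv.com/", "http://www.recipes.com/"],
    topic_weights={"cooking": 0.7, "gardening": 0.3},
)
```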

Search Query History

One of the biggest concepts of the patent is Google’s use of search query history. Google is building a profile based on past searches, but they are tracking many more things than previously thought. They are using past search queries to form user term profiles and then comparing the term profiles to profiles of placed content (advertisements). They are also performing analysis on search history documents to figure out what types of documents interest you. Linkage data will play a big role here.

Content Profile

Content profiles will be generated for advertisements. The content profiles consist of categories (sets of terms) mapped to a specific weight. Basically a value system – for example, a category of sports may consist of the terms “basketball”, “football”, and “soccer” – with each category weighted according to its relevance to the content. Time to read up on term-weight vectors. These content profiles are then compared to the user profile in order to generate a similarity score for ranking.
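
As a minimal sketch of the idea, assume both profiles are term-to-weight mappings; a standard way to turn them into a single similarity score is cosine similarity. The patent describes weighted term profiles and a similarity comparison, not this exact formula, and the weights here are invented examples.

```python
# Minimal sketch: compare a content (ad) term-weight vector against a user
# term-weight vector with cosine similarity. Weights are invented examples.
import math


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Content profile for an ad in a "sports" category (term -> weight).
ad_profile = {"basketball": 0.6, "football": 0.3, "soccer": 0.1}

# User profile derived from search and browsing history.
user_profile = {"basketball": 0.8, "recipes": 0.2}

print(f"similarity score: {cosine_similarity(ad_profile, user_profile):.3f}")
```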


Yahoo Maps Beta

Yesterday, Yahoo launched a new beta version of Yahoo! Maps. This new version is a significant improvement over their previous version – the interface has been upgraded and plenty of new features have been added.

The new interface is very application-centric, and the default text form has been replaced with a huge map with a search form on the left side.

Drop down menus combined with easy to find search boxes improve usability. The navigation can be collapsed for a bigger view of the map. The visual map is draggable.

In the upper right corner of the visual map there is a smaller inset map that shows the surrounding area. To the right of the inset map are the zoom controls.

One feature I really like is the live traffic overlay. With this feature, you can tell how fast traffic is moving and which roads have accidents.

Yahoo has placed default categories on the left hand side to help users find what they want faster. There are links to services (atms/gas stations/groceries), shopping, entertainment, restaurants, and travel (tourist spots, transportation, hotels).

You can choose to save locations for future searches. One of the more useful features is the new multi-point driving directions. Now you can map from point A to B to C on one map. The directions are color-coded, so they are very easy to follow.

Yahoo! Local has been integrated into the map search, so in addition to directions/address information, you can access ratings/reviews/events.

Some interface features have been added that make Yahoo! Maps really easy to use. For instance, you can drag and drop a location’s address from the map popup into the address form, making it quick and easy to get directions – no need to retype the info into the form. Map locations can be toggled on or off, so you can clean up the clutter if you want to print a map.

To coincide with the new beta, Yahoo has released a new Maps API. The new API has support for both Flash and AJAX presentation models. Several new APIs have been added, including geocoding, traffic information, map images, and local search. The Simple Maps API still allows developers to plot locations on Yahoo Maps with no rate limits.

Yahoo Update Tonight

Tim Mayer is once again giving us a heads up about the update over at Yahoo.

We will be making changes to the ranking of our index tonight. I would expect that this update will be mild and quick compared to recent ones but will impact the ranking of some sites.

If you have any feedback for us about the new index please email: ystfeedback@yahoo.com.

Drop by this thread at WebmasterWorld to discuss what you are seeing on Yahoo’s search engine results pages. A couple of people are upset over Yahoo’s delay in indexing new sites and the increased number of spam sites showing up in the rankings.

Finding Success In Search

This is a repost of an article I wrote for the company newsletter. If you haven’t subscribed yet, you should – we send out new information all the time. This mini-article is a very brief overview of some basic steps in search engine optimization and was written as an introduction to the subject.

Finding Success in Search

Search represents a new and effective method of marketing to your customers. With search, you don’t hunt for customers – they hunt for you. If someone needs water purification services in Los Angeles, they load up Google and type in “water purification in Los Angeles”. Instantly, that person is presented with a list of web sites, all relevant to the original search. The amazing part is that the businesses never went out to find the customer; instead, the customer found them. Additionally, this customer is not just any regular customer – this customer is already interested in your product and your service.

Search engine marketing is one of the most effective means of attracting interested customers to your product and service. If you have a web site and you aren’t actively engaging in search engine marketing, you are neglecting a huge share of online traffic.
