Google Updates Information for Webmasters

Google sneaked a couple of new lines into the Webmaster Guidelines. Make sure you check out all the new pages over at the Webmaster Guidelines – great information straight from Google.

Have other relevant sites link to yours.
Submit a sitemap as part of our Google Sitemaps (Beta) project. Google Sitemaps uses your sitemap to learn about the structure of your site and to increase our coverage of your webpages.
Submit your site to relevant directories such as the Open Directory Project and Yahoo!, as well as to other industry-specific expert sites.

PageRank is a measure of global popularity, since it measures how many pages in the entire index are linking back to you. PageRank by itself does not consider any other information, like the relevancy of anchor text or the topic of the linking page.
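To make the "global popularity" idea concrete, here is a minimal sketch assuming the commonly published, simplified PageRank iteration. The function name, the toy link graph, the damping factor of 0.85 and the iteration count are illustrative assumptions, not anything Google has confirmed about its production system. Note that the score is computed from link structure alone; anchor text and page topic never enter into it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    All names and default values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the score spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(toy_web)
most_popular = max(scores, key=scores.get)
```

In this toy graph the page with the most inbound links ends up with the highest score, regardless of what any of the pages are about.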

But here, Google is telling us that they are focusing on the value of relevant topical links, which sounds like local popularity to me. Bye Bye PageRank, Hello LocalRank?

Update: Forgot to mention that Google has always been using some sort of local popularity analysis for its rankings. I’m thinking they are placing a bigger public emphasis on it now.

Update #2: Gary Price at SEW notices a couple more additions. Google explains what to do when moving a web site to a new IP and what the differences are between the supplemental index and the main index. Make sure you read the section on returning pages for a specific country. Great information.

One thought on “Google Updates Information for Webmasters”

  1. Yesterday (March 31st 2005), Google was awarded a patent for a system of identifying and scoring documents in relation to historical data. The patent is quite complicated, but here are my initial and personal thoughts on what it entails.

    The patent illustrates that Google no longer relies simply on its PageRank method (also patented) of scoring pages but now, and increasingly in the future, will assign scores to websites and individual web pages by analysing various historical data associated with the page and site, and pages and sites that link to it, since its creation.

    Google’s specific definition of document is “any machine-readable and machine-storable work product”. It gives examples such as newsgroup postings, web advertisements – as well as, interestingly, emails and files.

    In the context of its web search engine, however, the term document usually refers simply to a web page or website.

    Within the framework of this patent, a document’s score can be affected (positively or negatively – Google does not always specify which), by any or all of the following:

    Factors Affecting Document (Website/Web Page) Score

    Frequency of document change, e.g. how often a web page or site is updated. Google specifically states that “updated” documents (how regularly, it doesn’t say) are given a boost in score.
    The magnitude (“amount”) of change to a document over time. This applies to both changes within individual pages (e.g. updating of content on a Homepage), and changes to the overall document (e.g. pages added to a site) over time.
    Note: The amount of change over time score is further affected by the perceived importance of the sections that change. For (a speculative) example, changes to a Homepage may be regarded by Google as more – or less – significant than changes to a Contact Details page.
    The manner in which the content of a document changes over time. To give an example that, again, is purely speculative: the content of a website may change if the domain name has been bought out (and the domain registration details have changed accordingly). In this case, Google might deem the manner of change more significant (with negative consequences to its score) than a simple content update.
    How often the document is selected when the document is included in a set of search results. E.g. if a web page is number 10 in the results for query x, but is usually selected more often than the first nine results, the web page is likely to gain a higher score (and subsequently move to a higher position in the results).
    Note: Google in this instance states clearly that frequent document selection results in a higher score, as we would expect.
    Whether the document history is associated with frequent search queries. By “is associated with”, I assume that Google mainly means that a page includes the words used in a popular search query, and/or that they appear in the anchor text of inbound links to the page or site.
    Note: “frequent search queries” refers to what are known in the SEO world as “money keyphrases”, which typically (but not by definition) have a higher number of results (several million or more).
    Whether documents are outdated or “stale”. Staleness is determined, “at least in part,” by users not selecting documents as much as other results alongside which they appear, for a given query. Google specifically states that stale documents are penalized.
    Age of links and associated documents, i.e. the dates on which the links were first created, and the age of the documents on which they were created. Google specifies that the system aims to penalize a document’s ranking if the links and their associated documents are short-lived, and vice-versa.
    Freshness of links. Link scores are weighted according to their “freshness”. Freshness is determined by the dates of any changes to the links themselves (particularly the anchor text), and of the documents that contain them.
    “Authority” and trustworthiness of the document containing the link. For some time now, Google-watchers have known that it is better to get links from reputable, quality sites, particularly those known in the SEO world as “hubs, authorities and expert” sites (e.g. university sites).
    Differences in documents and anchor text associated with links. The anchor text of a link is still important, but so is the relevancy of the anchor text to the linked site; the relevancy of the anchor text to the linking site; and the difference in anchor text among inbound links. (SEO practitioners have known since Florida that these patterns should look “natural”.)
    Historical information about the “behaviour” of (inbound) links to documents. Google explains that its system tries to determine “whether there is a trend toward appearance of new links … versus disappearance of existing links”. Presumably, the former is rewarded while the latter is not. Google specifically mentions penalizing a document’s ranking if the “link churn” (rate of change of inbound links) is above a certain threshold.
    Characteristics of, and changes to, visitor traffic patterns in relation to the documents. By “characteristics” of traffic, Google may mean the web source and/or the geographical source of the traffic, as well as more granular features, or demographic features.
    Note: How Google obtains site traffic information is open to debate, but here are some suggestions:
    By analysing traffic that passes through its search engine results;
    By analysing Google Toolbar data;
    In the future, by providing visitor analytics – see Google’s recent acquisition of the web statistics firm Urchin.
    Changes in the visitor traffic patterns over time. Was the site once popular but not now? Or vice-versa? These factors could help determine the relevancy of a site.
    User behaviour relating to the document, such as (Google’s example) how much time they spend looking at a website or page.
    Historical domain-related information, including:
    The “legitimacy” of the domain
    The expiration date of the domain
    The domain server/name server records (presumably to check for servers known to be associated with spam)
    Prior ranking history of the document, i.e. past performance in Google.
    The rate at which the document moves in the rankings over its history, presumably to flag a document that rises unusually quickly – “spikes” – in search results.
    The rate at which the document is selected as a search result over time. Again, this has to do with determining a page or site’s current popularity with searchers.
    User-maintained or user-generated data. Google says such data includes favourites lists, bookmarks, temporary (internet) files and cache files. However, it is unclear as yet how Google would access this information. Google specifically mentions the identification of trends where users attempt to add or remove the document from their own generated or maintained data.
    Information relating to document topic(s), as extracted from the document, and changes to document topic over time. It is well known that Google’s programs attempt to surmise the theme of any given document – this is how it matches relevant ads to third-party web pages.
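    As an illustration of the “link churn” factor listed above, here is a rough sketch. The formula, the function names and the threshold value are all my own assumptions for the sake of example; nothing summarised above specifies how churn is calculated or where the threshold sits.

```python
def link_churn(previous_links, current_links):
    """Fraction of inbound links that appeared or disappeared between
    two snapshots. This formula is an assumption made for illustration."""
    previous, current = set(previous_links), set(current_links)
    if not previous and not current:
        return 0.0
    added = current - previous
    removed = previous - current
    return (len(added) + len(removed)) / len(previous | current)


def link_trend(previous_links, current_links):
    """Net change in inbound links: positive means new links are appearing
    faster than existing ones are disappearing."""
    previous, current = set(previous_links), set(current_links)
    return len(current - previous) - len(previous - current)


# Hypothetical threshold; the real value, if one exists, is not public.
CHURN_THRESHOLD = 0.5

# Two hypothetical snapshots of the sites linking to a page.
january = {"siteA", "siteB", "siteC", "siteD"}
february = {"siteA", "siteB", "siteE"}

churn = link_churn(january, february)
trend = link_trend(january, february)
flagged = churn > CHURN_THRESHOLD
```

    In this made-up example two links vanish and one appears, so the trend is negative and three of the five links seen across both snapshots have changed, which would put the page over the assumed threshold.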
    I have not yet fully dissected Google’s own examples and interpretations that follow the patent description, and my thoughts are likely to change on so doing.

    However, it is clear from reading this patent that Google has (as we suspected) moved far away from its once cutting-edge PageRank algorithm, and is evolving to an algorithm (itself a set of algorithms) that is much more sophisticated.

    This explains recent phenomena such as the Google Sandbox effect and the notorious Florida update, when much of the above, I suspect, was first implemented. (Note that the patent was originally filed on Dec 31, 2003 – a month after the Florida update, when the Sandbox effect also first took hold.)

    I believe that these changes are geared much more towards countering the effect of search engine spammers, by deterring “instant gratification” on the SERPs. The convenient side benefit for Google, of course, is that new sites that find it difficult to get listed are likely to spend money on Google AdWords.

    However, the evolution of search engines is inevitable, and the pace of such evolution is impressive. The symbiotic relationship between Google and search engine optimisers remains. While the changes here are clearly designed to benefit aged websites that are regularly updated and have gained their links “naturally”, the complexity of the algorithm is such that businesses are more likely to seek the services of skilled optimisers, rather than less.
