Webmasters and SEO professionals know that getting into Google’s good graces is hard work. Most sites work continuously to improve their chances of ranking well in search results, even though there is no way to know for sure whether those efforts will translate into high organic positions in SERPs (search engine result pages). This article explains how STAT technology lets webmasters apply data mining techniques at scale, using Hadoop cluster computing and years of historical data about Google rankings.
First, let’s review the basics:
Every time someone searches for anything on Google, millions of pages are evaluated by algorithms looking for relevant content. The most popular web pages then rise to page 1 of the SERP. The results for each query are ranked using multiple signals, including:
- Relevance (how closely the page matches the search term)
- Authority (how trusted the site is deemed to be)
- An unknown measure of “quality”
These are important factors over which webmasters have limited control. But there are other aspects of a website that can directly affect how it ranks, even if Google itself has not yet made adjustments. The best example of this is Penguin, Google’s recent algorithm change that specifically penalizes sites with low-quality links pointing to them.
This article gives readers a better understanding of how they should approach measuring their SEO efforts. We discuss how Big Data technologies are being used to automatically mine data and identify website ranking opportunities.
STAT technology can be applied with any SEO-related dataset, but this article focuses on the tools necessary to use STAT for one of the most popular search engine optimization tasks: getting a website into Google’s top 10 results.
The algorithm powering STAT is known as “Rank Tracker”. It retrieves accurate SERP rankings so that webmasters and SEO professionals do not have to track keyword rankings by hand. Manual tracking would be almost impossible, since every result fluctuates regularly and a single site might have thousands of keywords with different rankings based on location, language, date, etc. Without automation, there is no practical way to keep up with all ranking changes in real time.
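To make the scale concrete, here is a minimal Python sketch of how the number of rankings to track multiplies. The names (`RankKey`, `build_rank_keys`) and the sample figures are hypothetical and are not part of STAT's API; the sketch only illustrates why one keyword can correspond to many distinct rankings.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import product


@dataclass(frozen=True)
class RankKey:
    """One trackable ranking: a keyword seen from one place, in one language, on one day."""
    keyword: str
    location: str
    language: str
    day: date


def build_rank_keys(keywords, locations, languages, days):
    """Enumerate every (keyword, location, language, day) combination to track."""
    return [
        RankKey(kw, loc, lang, day)
        for kw, loc, lang, day in product(keywords, locations, languages, days)
    ]


# Hypothetical inputs, chosen only to show how quickly the total grows.
keywords = [f"keyword-{i}" for i in range(1000)]
locations = ["US", "CA", "UK"]
languages = ["en", "fr"]
start = date(2013, 1, 1)
days = [start + timedelta(days=n) for n in range(30)]

keys = build_rank_keys(keywords, locations, languages, days)
print(len(keys))  # 1000 keywords * 3 locations * 2 languages * 30 days = 180000
```

Even with these modest assumed numbers, a single month produces 180,000 separate rankings, which is why collection has to be automated.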
So how does STAT work?
STAT’s core technology is based on a statistical analysis of the ranking and backlink data it has collected and organized through its public API. The system can be thought of as a search engine in its own right, where each query represents a website and the SERPs are the result set. Each website has many keywords associated with it, which we refer to as “features”. Using Rank Tracker, webmasters can identify opportunities by applying their own set of ranking features against any given keyword dataset. To demonstrate how this works, let’s walk through an example:
For our examples, we will use two sites that should theoretically have very different rankings for similar keywords: [Wikipedia](http://en.wikipedia.org/) and [collegehumor.com](http://www.collegehumor.com/). For the sake of simplicity, we will focus on two keywords in particular: “wiki”, which is associated with Wikipedia, and “watch”, which should be much more popular for CollegeHumor given its content. The goal is to use STAT Rank Tracker to determine whether Wikipedia or CollegeHumor ranks well for these two words.
We’ll assume our keyword results are already imported into STAT’s platform for analysis (which can be done by using the rank tracker API directly or via a third-party tool like MozRank Monitor). Let’s start simply by retrieving all queries containing the term “watch”:
This tells us that CollegeHumor does not currently have a ranking for the “watch” keyword (at least not in this sample dataset). It also says Wikipedia is ranked [#4](http://moz.com/blog/how-to-use-stat-rank-tracker) for this term, which is in line with what we’d expect based on their Web and Social metrics:
What does this mean?
There are a few things we can infer from this data:
- CollegeHumor has very few pages that contain the word “watch” (31) compared to Wikipedia’s 313. This suggests that collegehumor.com should be able to rank well for the query, but does not because it has too little content focused on it.
- The results are shown in an order that correlates with Moz’s current Page Authority metric. The higher the number, the better the rank for this keyword.
- The results also show different SERP providers (e.g., AOL, Ask, etc.), which can provide SEOs with insight into how well their pages are ranking for various search engines across various regions/languages/etc.
With this data we could make some straightforward recommendations to CollegeHumor:
- Increase page count by optimizing existing content or creating new pages targeted at “watch” queries.
- Optimize external link signals by building more authoritative links from other sites related to watch topics (similar to what CollegeHumor has done successfully with [this page](http://www.collegehumor.com/articles/651543/watch) for the query “wiki”).
Another useful feature of STAT’s results is the estimated click-through rate (CTR), which indicates the average probability that users will click on any given SERP result. CTR is determined by dividing a specific ranking position by the total number of results that appear in a given dataset. For example, there are nine websites listed at rank 3 in our “watch” SERPs, resulting in an overall CTR of [33%](http://moz.com/blog/how-to-use-stat-rank-tracker#remaining):
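The arithmetic behind that figure can be sketched in Python as follows. The function name `estimated_ctr` is hypothetical, and the formula simply mirrors the calculation stated in this article (ranking position divided by total results); it is not a confirmed STAT formula or a general click-through model.

```python
def estimated_ctr(position: int, total_results: int) -> float:
    """Return position / total_results, the ratio described in the text.

    This is an assumption based on the article's wording, not STAT's documented method.
    """
    if total_results <= 0:
        raise ValueError("total_results must be positive")
    return position / total_results


# The example from the text: rank 3, nine results in the dataset.
ctr = estimated_ctr(position=3, total_results=9)
print(f"{ctr:.0%}")  # 33%
```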
STAT also shows us keyword volume data, which indicates how many searches a given term receives each month. In the case of “watch”, we can see that it has a relatively high volume (about 2.8 million) compared to other possible query terms (such as “wiki”) and many others within the top 10:
By exploring SERPs for these two keywords, we’ve been able to determine how Wikipedia is currently ranking as well as what CollegeHumor could do to improve its position. This same methodology can be applied to any given website or keyword set by taking advantage of STAT’s Rank Tracker features.
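As a rough illustration of the workflow in this walkthrough, the Python sketch below filters a small in-memory dataset for queries containing a term and reports each site's best position. The record layout, the function `best_positions`, and the sample rows are assumptions made for illustration; the rows only mirror what the text reports (Wikipedia at #4 for “watch”, no ranking for CollegeHumor). This is not STAT's API or its data.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Ranking(NamedTuple):
    keyword: str
    site: str
    position: Optional[int]  # None means the site has no ranking for this keyword


# Hypothetical sample rows that mirror the findings described in the article.
SAMPLE = [
    Ranking("watch", "en.wikipedia.org", 4),
    Ranking("watch", "collegehumor.com", None),
]


def best_positions(rows: Iterable[Ranking], term: str) -> Dict[str, Optional[int]]:
    """Map each site to its best (lowest) position among keywords containing `term`."""
    best: Dict[str, Optional[int]] = {}
    for row in rows:
        if term not in row.keyword:
            continue  # skip keywords that do not contain the search term
        current = best.get(row.site)
        if row.position is not None and (current is None or row.position < current):
            best[row.site] = row.position
        else:
            # Record the site even when it is unranked or not improved.
            best.setdefault(row.site, None)
    return best


result = best_positions(SAMPLE, "watch")
for site, position in result.items():
    print(site, position if position is not None else "not ranked")
```

Keeping unranked sites in the result (as `None`) is a deliberate choice here: the absence of a ranking is itself the signal that points to an opportunity.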