AJAX indexing from the user’s and the programmer’s points of view

22/10/2014

AJAX is an approach to building interactive web interfaces. It relies on background data exchange between the browser and the web server, so the page is not reloaded completely; it is updated partially.

This article is the first in a small series dedicated to indexing AJAX websites. In this series we will consider AJAX from three different points of view:

  • as a user;
  • as a programmer;
  • as a web crawler.

From the user’s perspective, I see AJAX as a technology (or approach) that brings new content to a web page without a complete reload. I just click a button or link and get new data within a second.

Let’s consider a simple example: a form submitted to the server. The user fills in the form and clicks the “Save” button. Normally the page is then reloaded, which takes some time (sometimes only a couple of seconds). If the website uses AJAX to communicate with the web server, there is no page reload. The page stays open, and as soon as the server finishes processing the query, the user receives a message or new data.

AJAX-based applications send queries to the web server that fetch or submit only the data needed at the moment.

As a programmer, I see AJAX as a means of building a modern, user-friendly and fast website interface. This approach reduces server load, because there is no need to transfer the whole page to the browser on each query. When data is saved, the query is relatively small and the server response contains only the changed data.

The simplest variant looks like this:

  • http://example.com – the main page, which contains links “About” and “Contacts”: each of them changes the URL to the following:
    • http://example.com#about – description;
    • http://example.com#contacts – contacts for feedback.

It looks like an ordinary fragment identifier, but AJAX websites use this type of URL as well. As soon as the hash tag changes, a query is built and sent to fetch the corresponding data block. Usually the corresponding hash tag (#example) is appended to the page address, but on some sites the updated page differs from what you get when you enter the resulting URL in a new window or tab. Done properly, the changed page (with its current content) should be immediately accessible at the changed URL, so that nobody has to hunt for the right AJAX link or button on the page again.

A more detailed technical description can be found on Wikipedia.

AJAX is used on websites such as Gmail, Google Maps, Facebook, Twitter and VK. These are the most notable examples.

There is a third point of view on AJAX: the search engine’s perspective.

AJAX indexing from the web crawler’s perspective

Let’s consider a concrete example: the “wall” of a VK user or group. The page shows only a few of the most recent posts. As you scroll down, more posts are loaded. To find month-old information without a direct link to it, you have to scroll for a long time. The data is loaded using JavaScript, and web crawlers don’t support JavaScript. Unlike web browsers, they cannot load additional data. As a rule, a web crawler doesn’t click anything on the page; it just records links to check later. This means that only the initially loaded page gets indexed.
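To illustrate, here is a minimal sketch of what a crawler without JavaScript support gets from such a page. The sample HTML, the `loadOlderPosts` call inside it and the `collect_links` helper are hypothetical and exist only for this example; real crawlers are far more elaborate.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags, the way a simple crawler records links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical initial HTML of a "wall": two posts are in the markup,
# older ones would be fetched by JavaScript when the user scrolls.
INITIAL_HTML = """
<html><body>
<a href="/post/3">Post 3</a>
<a href="/post/2">Post 2</a>
<script>
  // Runs only in a browser; the parser below never executes it.
  loadOlderPosts("/posts?before=2");
</script>
</body></html>
"""


def collect_links(html):
    """Return the links present in the static markup only."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


print(collect_links(INITIAL_HTML))
```

Only the two links present in the initial markup are collected; whatever the script would load later never reaches the crawler.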

A web crawler indexes only the part of a URL before the hash tag and discards the rest.
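A short sketch of this behaviour, using the example.com addresses from the list earlier in the article. The `indexable_url` helper is a name made up for the illustration; `urldefrag` is part of the Python standard library.

```python
from urllib.parse import urldefrag


def indexable_url(url):
    """Return the URL without its fragment, as a fragment-ignoring crawler sees it."""
    return urldefrag(url).url


urls = [
    "http://example.com",
    "http://example.com#about",
    "http://example.com#contacts",
]

# All three addresses collapse into a single indexable URL.
unique = {indexable_url(u) for u in urls}
print(unique)
```

From the crawler’s point of view, the three “pages” of the site are one and the same address.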

As you can see, AJAX is convenient but has its disadvantages. Search engines would prefer web pages without AJAX, because it hides useful content, and sometimes whole pages, from indexing. Previously, web crawlers simply could not index such sites. Users got a more dynamic and animated interface, while SEO specialists were at a loss.

Today “single-page” sites are common. The main page is loaded first, and the needed content is then loaded in response to the user’s actions by means of background AJAX queries.

In 2009, a post suggesting a way to make AJAX websites indexable appeared on the Google blog.

Yandex publicly announced the ability to index dynamic content in May 2012 (http://webmaster.ya.ru/replies.xml?item_no=13754).

Fortunately for web developers, the two suggested solutions were the same.

Here is Yandex’s description from the post “AJAX site indexing”:

  1. Replace the # symbol with #! in page URLs. This tells the crawler that it can request an HTML version of the page content.
  2. Place the HTML version of the page content at the URL where #! is replaced with ?_escaped_fragment_=

One might think of manually replacing the hash tag with _escaped_fragment_ in URLs, but Google states the following on its support site, answering the question “When should I use _escaped_fragment_ and when should I use #! in my AJAX URLs?”:

“Your site should use the #! syntax in all URLs that have adopted the AJAX crawling scheme. Googlebot will not follow hyperlinks in the _escaped_fragment_ format.”

Google also has a page with a more detailed technical description of the interaction between web servers and web crawlers.

The #! part is needed to bypass direct indexing of AJAX content. When a web crawler comes across these two symbols, it understands that it is useless to check this link as is, because the content behind it will be partially hidden. Instead, it queries a special URL in which #! is replaced with ?_escaped_fragment_=. Technically, a direct query to the new URL replaces the AJAX query for retrieving the page content. Hash tags themselves work only in a browser with JavaScript support.
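The substitution a crawler performs can be sketched as follows. The function name `to_crawler_url` is made up for this example, and the escaping is a simplification: it percent-encodes characters such as `&`, `#`, `%`, `+` and spaces in the fragment, which only approximates the exact character set described in Google’s scheme.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit


def to_crawler_url(url: str) -> Optional[str]:
    """Map a #! URL to its ?_escaped_fragment_= form.

    Returns None for URLs without a hashbang, because the scheme
    does not apply to them.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return None

    # Everything after "#!" becomes the value of the special parameter.
    # Assumption: a simplified escaping rule, see the note above.
    escaped = quote(parts.fragment[1:], safe="=/:,;@!$'()*?")
    extra = "_escaped_fragment_=" + escaped

    # Keep an existing query string and append the parameter to it.
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_crawler_url("http://example.com/#!about"))
print(to_crawler_url("http://example.com/page?lang=en#!key=value"))
print(to_crawler_url("http://example.com/#about"))
```

An ordinary fragment such as #about yields no special URL at all, which is why the first step of the scheme is switching from # to #!.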

In fact, search engines didn’t implement native indexing of AJAX pages and links that would be independent of the site code. But they suggested a method that works. The solution is not elegant, and many still find it unacceptable. Webmasters have to resort to extra tricks to look professional in the eyes of both users and search engines. Pages requested with the _escaped_fragment_ parameter often have to be created separately, which complicates the internal structure of the website. Many (perhaps most) sites don’t implement this adaptation at all, leaving dynamic content out of search.

Here is one more brief write-up, which uses Twitter as an example to show how AJAX makes developers’ lives more difficult:
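On the server side, the extra work might look roughly like this. It is a sketch under stated assumptions, not a recipe: `SNAPSHOTS`, `APP_SHELL` and `respond` are hypothetical names, the snapshots are hard-coded strings, and a real site would generate them from the same data the AJAX interface displays.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical pre-rendered HTML for each AJAX state of the site.
SNAPSHOTS = {
    "about": "<html><body><h1>About</h1><p>Description.</p></body></html>",
    "contacts": "<html><body><h1>Contacts</h1><p>Feedback.</p></body></html>",
}

# Hypothetical page served to browsers; its content is filled in by JavaScript.
APP_SHELL = (
    '<html><body><div id="content"></div>'
    '<script src="/app.js"></script></body></html>'
)


def respond(request_url: str) -> Tuple[int, str]:
    """Return (status, body) for a request path with an optional query string."""
    query = parse_qs(urlsplit(request_url).query, keep_blank_values=True)

    if "_escaped_fragment_" not in query:
        # Ordinary visitor: serve the JavaScript-driven page.
        return 200, APP_SHELL

    # Crawler request: serve the HTML version of the requested state.
    state = query["_escaped_fragment_"][0]
    if state in SNAPSHOTS:
        return 200, SNAPSHOTS[state]
    return 404, "<html><body><h1>Not found</h1></body></html>"


print(respond("/?_escaped_fragment_=about"))
print(respond("/")[0])
```

The snapshot has to carry the same content that users see after the AJAX query completes; this duplication is exactly the extra arrangement described above.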

http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch.

pushState – part of the HTML5 History API

(link to the specification).

Besides URL replacement by standard JavaScript means, the HTML5 specification introduced the pushState method, which makes AJAX usage friendlier. In practice, the method simply changes the current URL displayed in the browser’s address bar without reloading the page.

Not long ago, Matt Cutts’s YouTube channel published a video answer to the following question:

“Should I use pushState to update my URLs instead of using the hashbang (#!) style to manage AJAX navigation?”

The answer was that Google prefers navigation implemented with pushState. In this case the search engine doesn’t need to do the extra work of catching the #! part and rebuilding the query with the substitution to get a usable link. This is very good news, but I could not find a detailed description of how navigation based on pushState is processed in the official Google documentation.

Example of a site using pushState:

http://html5.gingerhost.com/new-york.

Navigation still works, and the links don’t look as horrible as they do with hash tags.

The old variant is still OK for search engines, but many sites are choosing to get rid of hash tags in their URLs.

In particular, Twitter declared war on hash tags (http://engineering.twitter.com/2012/05/improving-performance-on-twittercom.html) and switched to pushState (http://engineering.twitter.com/2012/12/implementing-pushstate-for-twittercom_7.html).

Tools for SEO specialists working with AJAX sites

There is no ideal tool that would tell you at a glance that all links are being indexed correctly. But some are worth noting.

The simplest way to check which pages are currently indexed is to enter the query “site:domain” in the relevant search engine. You will see static pages and pages using AJAX, if you have any.

“Fetch as Googlebot” shows your website as Googlebot sees it. If you have added your site to Google and verified it, you can not only make sure that a link is in the index and submit it manually, but also compare the real page with its snapshot in the page preview. E.g., if a standard “not found” page comes up instead of the needed content, you can fix it before the link is indexed.

We recommend using Fetch as Google to check the crawlability of your site, along with other tools such as:

  • HTML suggestions: See recommendations for improving your title tags, meta descriptions, and other HTML elements that can affect your site’s performance in search.
  • Crawl errors: See which pages Google had trouble crawling.

I mentioned before that webmasters have to make a copy of a page, or resort to a programming trick, to get it into the index via the _escaped_fragment_ construction. This is exactly where “black hat” search optimization, or “cloaking”, can be applied. Its essence is content substitution for the search engine: instead of the real content, the web crawler sees and indexes completely different text. In this way careless SEO specialists try to boost a site’s position in search results. Sometimes it is even possible to get sites into the index that would otherwise be filtered out on inspection, or to index them under other, more popular keywords. But it is important to understand that the search engine will remove you from its database if it detects cloaking.

Modern ways of using AJAX call for solutions that are not that simple for search engines. But AJAX content is one more source of information, and the ability of search engines to work with it makes information on the Internet more accessible.

Author: Alexander Denisenko
