Tuesday, June 7, 2011
HTML5, Hashtags, SEO and the next 1000 years
We are the future, that's the past
We are the moment built to last
Oldschool Baby - Westbam feat. Nena
I already wrote briefly about SEO hell in the HTML hashtag world [1]. In short: hashtags (URL fragments such as “#!/page”) are widely used in HTML5 and JS-driven apps. The downside: everything after the “#” is never sent to the server, which makes indexing, SEO and searching such “apps” difficult. If your app does not need to be searchable (say, a game), that is fine. But if you want to build a new CMS using HTML5, JS and hashtags, you are doomed, because your site will not be indexed easily.
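A tiny illustration of why the fragment is invisible to the server: when a browser requests a page, the HTTP request line carries only the path and the query string. The sketch below models this with the standard `URL` class. The function name `requestTarget` is made up for this example, and it ignores details such as proxies that receive absolute URLs.

```typescript
// Returns the part of an address that ends up in the HTTP request line
// (path + query). The fragment, i.e. everything after "#", stays in the browser.
function requestTarget(address: string): string {
  const url = new URL(address);
  return url.pathname + url.search; // url.hash is never sent to the server
}
```

So `requestTarget("http://example.com/#!/articles/html5-seo")` is just `"/"`: for the server (and a plain crawler) every hashtag “page” of the app is the same page.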
There is a little relief right now: Google developed an approach that involves a special protocol (URLs with “#!” that the crawler rewrites into a “?_escaped_fragment_=” query), a server extension and a headless browser on your web server to make “#” URLs crawlable [2]. It is quite awful and feels a bit strange. The big downsides of this approach:
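To make the protocol part of [2] a bit more concrete, here is a minimal sketch of its URL mapping, roughly as the scheme describes it: a “pretty” URL containing “#!” is fetched by the crawler as an “ugly” URL in which the fragment has moved into an `_escaped_fragment_` query parameter, and the server uses that parameter to return an HTML snapshot (produced, for instance, by the headless browser). The function names are hypothetical, and `encodeURIComponent` escapes more characters than the scheme strictly requires; the server decodes them either way.

```typescript
// Crawler side: http://example.com/app#!state -> http://example.com/app?_escaped_fragment_=state
// Simplification: encodeURIComponent escapes more than the scheme demands.
function toEscapedFragmentUrl(prettyUrl: string): string {
  const bang = prettyUrl.indexOf("#!");
  if (bang === -1) {
    return prettyUrl; // no hashbang: the URL is not part of the scheme
  }
  const base = prettyUrl.slice(0, bang);
  const fragment = prettyUrl.slice(bang + 2);
  const separator = base.includes("?") ? "&" : "?"; // keep an existing query string
  return `${base}${separator}_escaped_fragment_=${encodeURIComponent(fragment)}`;
}

// Server side: recover the app state the crawler asks for,
// or null for an ordinary (non-crawler) request.
function fragmentFromRequest(uglyUrl: string): string | null {
  // URLSearchParams decodes the %XX escapes again.
  return new URL(uglyUrl).searchParams.get("_escaped_fragment_");
}
```

Only the server-side half is something you would write yourself; the point is that every hashbang state now needs a server-rendered counterpart anyway.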
1) It is only supported by Google, not by Bing, Ask or any other search engine to date.
2) It destroys the structure of the web. URLs are truly great [3]. Why are we breaking them with hashtags, be it “#” or “#!” or whatever?
[answer: because hashtags are great for Web 2.0 apps ;)]
There is a possible “solution” to both problems: GitHub built its tree slider using the HTML5 History API and pushState() [4]. The user never sees a hashtag in the URL. Therefore, if a user copies and pastes the URL, or if a crawler scans your page, the content is accessible under a regular URL. Of course, this only works if you tune your web server to handle these URLs accordingly, i.e. to serve the full page for each of them. It makes most sense if you enhance your website progressively (as GitHub does...).
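A minimal sketch of how pushState() navigation with a progressive-enhancement fallback might look. This is not GitHub's code; the class and callback names are hypothetical, and it assumes that every path handed to `navigate()` is also a real URL the web server can render as a complete page. `window.history` is injected through a small interface so the logic can run outside a browser.

```typescript
// The only part of window.history this sketch needs.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
}

class PushStateNavigator {
  private readonly history: HistoryLike | undefined;
  private readonly render: (path: string) => void;
  private readonly fullLoad: (path: string) => void;

  constructor(
    history: HistoryLike | undefined, // undefined: browser without the History API
    render: (path: string) => void,   // hypothetical: swaps in the content for a path
    fullLoad: (path: string) => void, // fallback: an ordinary page request
  ) {
    this.history = history;
    this.render = render;
    this.fullLoad = fullLoad;
  }

  // Called instead of following a link; the address bar shows a plain URL, no "#".
  navigate(path: string): void {
    if (this.history === undefined) {
      this.fullLoad(path); // old browser: behave like a normal link
      return;
    }
    this.render(path);
    this.history.pushState({ path }, "", path);
  }

  // Called from a "popstate" listener on Back/Forward. The browser has
  // already restored the URL, so only the content needs redrawing.
  onPopState(state: unknown): void {
    if (typeof state === "object" && state !== null && "path" in state) {
      const path = (state as { path: unknown }).path;
      if (typeof path === "string") {
        this.render(path);
      }
    }
  }
}
```

In a browser this could be wired up as `const nav = new PushStateNavigator(window.history, render, (p) => { window.location.href = p; })` plus `window.addEventListener("popstate", (e) => nav.onPopState(e.state))`, with `render` being your own function. Because old browsers fall back to a full page load of the very same URL, crawlers and copy-pasted links see exactly what the server renders.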
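But pushState() possibly has severe security implications. A malicious script can now change the URL in the address bar without loading a new page. pushState() only accepts URLs with the same origin as the current page, so it cannot fake a foreign domain, but a script smuggled into a site (e.g. via XSS) can set any path on that domain. Skin a website, smuggle your code into the real site, forward to the skinned page, pushState() the correct URL and you get lots of passwords. Security hell, in short.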
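Where exactly the limit lies can be sketched in a few lines. Browsers resolve the URL passed to pushState() against the current page and throw a SecurityError if the result has a different origin (scheme, host, port). The function below is a simplified, hypothetical model of that rule, not the browser's implementation (the HTML spec has extra cases, e.g. for `file:` URLs). It shows that the address bar cannot be pointed at another domain, while any path on the attacked domain is fair game.

```typescript
// Simplified model of the same-origin check browsers apply to history.pushState().
function pushStateWouldBeAllowed(currentUrl: string, newUrl: string): boolean {
  const current = new URL(currentUrl);
  const target = new URL(newUrl, current); // relative URLs resolve against the page
  return target.origin === current.origin; // origin = scheme + host + port
}
```

For a page on `https://bank.example`, `"/login"` is allowed, whereas `"https://evil.example/login"` or even `"http://bank.example/login"` (different scheme) would be rejected.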
Summary:
The ugly hashtag URL problem, a.k.a. the SEO problem, might be solved elegantly with a combined approach of pushState() and an enhanced web server. This makes sense especially in a progressive enhancement scenario. But it is not for all projects.
[1] http://ars-machina.raphaelbauer.com/2010/10/gwt-aka-ajax-and-crawler-seo-hell.html
[2] http://code.google.com/intl/de-DE/web/ajaxcrawling/docs/getting-started.html
[3] http://www.w3.org/Provider/Style/URI.html
[4] https://github.com/blog/760-the-tree-slider
[5] http://danwebb.net/2011/5/28/it-is-about-the-hashbangs