Boaz Sasson | Head of Performance | sasson@similarweb.com
Server-side SEO: The Art of Making Love to Spiders
Every Website
Traffic Trends
Desktop & Mobile
Referring Traffic
Keywords
Advertising Analysis
Popular Pages
Every App
Current User Installs
Active Users
Engagement per app
Retention Analysis
App store optimization
Every Country
We Reveal the Secrets of Online Success
Introduce SimilarWeb and what I do there - ToFu (SEO, PPC, social, distribution)
Not Even a Spider: Googlebot is a headless browser, not a simple link crawler
Can render pages visually
Traverses the DOM
Executes AJAX, JS & forms
*Spotted in the wild as early as 2010 (http://searchengineland.com/googles-proposal-for-crawling-ajax-may-be-live-34411)
Meet the Cookie Monster
Googlebot DOES seem able to accept cookies
Cookie acceptance is no longer a reliable way to segregate/sniff bot traffic
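If you still need to identify Googlebot reliably, Google's documented approach is a reverse-then-forward DNS check rather than cookie sniffing. A minimal Python sketch (the sample IP is only an illustration):

    import socket

    def is_verified_googlebot(ip: str) -> bool:
        """Verify a claimed Googlebot IP via reverse, then forward, DNS lookup."""
        try:
            # Reverse lookup: genuine Googlebot IPs resolve under these domains
            host, _, _ = socket.gethostbyaddr(ip)
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            # Forward-confirm: the hostname must resolve back to the same IP
            return ip in socket.gethostbyname_ex(host)[2]
        except (socket.herror, socket.gaierror):
            return False

    # Hypothetical IP pulled from a server log:
    print(is_verified_googlebot("66.249.66.1"))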
Mr. Greedybot Requires Access to All Your Files, and Quickly
Do NOT block JS, CSS, scripts or images from Googlebot if they are needed to render a page (see the check after this list)
Avoid setting crawl rate limits, if at all possible
Search bot traffic can eat your bandwidth; let it
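One way to catch accidental blocking of render-critical files is to test them against your robots.txt with Python's standard-library parser. A small sketch, with example.com URLs standing in for real ones:

    from urllib import robotparser

    # Hypothetical site; point this at your own robots.txt
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()

    # Assets Googlebot needs in order to render the page
    assets = [
        "https://www.example.com/static/app.js",
        "https://www.example.com/static/site.css",
        "https://www.example.com/images/hero.png",
    ]
    for url in assets:
        if not rp.can_fetch("Googlebot", url):
            print("BLOCKED for Googlebot:", url)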
Impact on PR/Link Juice?
Not clear how it flows on links in JS, forms, code, etc.
Consider all the links/filepaths in code, as well as text links, as part of the page’s link graph
Internal links can promote indexation, pass juice, or do both
What is a crawl budget and how does it affect me?
More Pages Crawled = More Pages Indexed = More Traffic (*If site is healthy)
Crawl depth is probably based on:
Amount of incoming links/buzz
Content creation rate/amounts
Trust – more of it results in wider and deeper crawling
Think in terms of both crawling a site (laterally) and crawling a page (depth)
Bad/low quality content getting crawled is a waste of your crawl budget
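To see where crawl budget actually goes, you can count Googlebot hits per path in the server access log. A rough sketch assuming a combined-format access.log; the file name and field positions may differ on your setup:

    import re
    from collections import Counter

    # Request path and user agent from a combined-format log entry
    LINE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[^"]*".*"([^"]*)"\s*$')

    hits = Counter()
    with open("access.log") as f:  # hypothetical log path
        for line in f:
            m = LINE.search(line)
            if m and "Googlebot" in m.group(2):
                hits[m.group(1)] += 1

    # Paths eating the most crawl budget; many parameterized or
    # low-value URLs at the top means budget is being wasted
    for path, count in hits.most_common(20):
        print(count, path)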
Indicator of Site's Health
Crawl stats can be used as a quick indicator of a site's general SEO health
How many pages indexed?
Trends?
What errors/parameters are indexed?
Life is Hard
Not easy to get many pages indexed quickly on a new site
What Should I Block?
Golden rule: “One filepath per specific content piece”
Low quality/trust pages
Duplicate (many forms), sorting, multi-category, non-existent, framed content
How to Block Content & Some Misconceptions
Better to delete crap than to block it
Assume that anything in the DOM is technically accessible to modern search bots, even though it may not pass juice
Robots.txt only works internally; pages blocked there can still be indexed via external links
Don't block with both robots.txt and meta robots together (if robots.txt blocks the page, the crawler never sees the meta tag)
Best to block with meta robots & delete via GWT (*renew every 6 months)
The X-Robots-Tag is sent as an HTTP response header, so it's useful for PDFs, XML, etc. (see the header sketch after this list)
Play around with blocking elements via frames, tabs, forms, animations, lazyloading
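As a concrete illustration of the X-Robots-Tag header, here is a minimal sketch using Python's standard http.server; the PDF file and port are hypothetical:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.endswith(".pdf"):
                with open("report.pdf", "rb") as f:  # hypothetical file
                    body = f.read()
                self.send_response(200)
                self.send_header("Content-Type", "application/pdf")
                # Header-level equivalent of meta robots, for non-HTML files
                self.send_header("X-Robots-Tag", "noindex, nofollow")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    HTTPServer(("", 8000), Handler).serve_forever()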
Links lose juice with each hop
Catch as many instances as possible in single redirect rules
Default redirect should be 301
If there are no other options, use a meta refresh set to zero seconds in place of a 301, and five seconds in place of a 302
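A minimal sketch of both techniques, a real 301 plus the zero-second meta refresh fallback, again using Python's standard http.server; the paths and mapping are hypothetical:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    REDIRECTS = {"/old-page": "/new-page"}  # hypothetical mapping

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = REDIRECTS.get(self.path)
            if target:
                # Preferred: a real 301 permanent redirect
                self.send_response(301)
                self.send_header("Location", target)
                self.end_headers()
            elif self.path == "/meta-fallback":
                # Last resort: a zero-second meta refresh behaves like a 301
                html = ('<html><head><meta http-equiv="refresh" '
                        'content="0; url=/new-page"></head></html>')
                body = html.encode()
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    HTTPServer(("", 8000), Handler).serve_forever()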
Google Really Dislikes Broken Links
Check using a scheduled 404 report or spider
Scan on a regular basis
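A scheduled scan can be as simple as a script that HEAD-requests known URLs and reports anything that isn't a 200. A sketch assuming a hypothetical urls.txt file with one URL per line (e.g. exported from your sitemap):

    import urllib.request
    import urllib.error

    with open("urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    for url in urls:
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                status = resp.status
        except urllib.error.HTTPError as e:
            status = e.code  # e.g. 404
        except urllib.error.URLError as e:
            status = f"ERROR {e.reason}"
        if status != 200:
            print(status, url)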
Sessions, parameters and cookies
Do NOT:
Print session IDs/parameters on filepaths in code
Pass session IDs via filepaths
Be mindful of parameters used; each is considered to be a unique page
Use cookies to pass session info instead (see the URL-cleaning sketch after this list)
If no other alternative, block parameters via GWT
Supercookies (Flash, browser cache, fingerprinting, ETags, etc.)
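If session IDs or junk parameters have already leaked into filepaths, one mitigation is to canonicalize URLs by stripping those parameters before they are printed in code or linked. A sketch; the parameter names are hypothetical:

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}  # hypothetical names

    def canonicalize(url: str) -> str:
        """Drop session parameters so each content piece keeps one filepath."""
        parts = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query)
                if k.lower() not in SESSION_PARAMS]
        return urlunsplit(parts._replace(query=urlencode(kept)))

    print(canonicalize("https://www.example.com/shoes?color=red&sid=abc123"))
    # -> https://www.example.com/shoes?color=red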
Filepaths can be flat, deep, or with several parameters if needed; all seem to work fine
Have a clear hierarchy in terms of directory structure; use internal links to emphasize relationships
Be consistent: all lower case, hyphens not underscores, avoid empty spaces (see the slug sketch after this list)
Think of clickability of the filepath when seen by a human
Avoid foreign-language (non-ASCII) encoding in URLs
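Those consistency rules can be enforced with a small slug helper. A sketch:

    import re
    import unicodedata

    def slugify(text: str) -> str:
        """Lower-case, hyphen-separated, ASCII-only path segment."""
        # Transliterate accented characters rather than URL-encoding them
        text = unicodedata.normalize("NFKD", text)
        text = text.encode("ascii", "ignore").decode()
        text = text.lower().replace("_", "-")
        # Collapse spaces and leftover punctuation into single hyphens
        return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

    print(slugify("Crème Brûlée Recipes 2014"))  # -> creme-brulee-recipes-2014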
Two Great & Free Tools for Crawling
IIS SEO Toolkit: http://www.iis.net/downloads/microsoft/search-engine-optimization-toolkit
Xenu: http://home.snafu.de/tilman/xenulink.html
Quick Tests:
Technical, Penalty, or Market?
Thank You
Boaz Sasson
sasson@similarweb.com