developing apps for humans & robots

Posted on 11-Jul-2015


Developing Web Applications for Humans and Robots

Nagaraju Sangam

Humans:

Humans have:

Feelings

Habits

Languages

• Character encoding

• Left-to-right vs. right-to-left scripts

Cultures

Time zones

User roles: admin, end user

Impairments: visual, hearing, motor, cognitive

Humans:

• alt and title attributes for images

• Keep alt empty (alt="") for unimportant, decorative images

• role attributes for page sections

• for attribute on each <label>, pointing at the id of its field

• Titles for frames

• Allow keyboard navigation
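A few of the checks above are mechanical enough to automate. The sketch below uses Python's standard html.parser to flag images with no alt attribute (an empty alt="" is accepted, per the decorative-image rule), frames without a title, and form fields with no matching <label for>. The class and function names are hypothetical, and it only understands explicit for/id labelling, not labels that wrap their field; keyboard navigation and roles still need manual testing.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Flags a few of the accessibility problems listed above in an HTML string."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.label_targets = set()   # ids named by <label for="...">
        self.field_ids = []          # ids of form fields that need a label

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)          # tag and attribute names arrive lower-cased
        if tag == "img" and "alt" not in attrs:
            # alt="" is fine for decorative images; a missing alt is not.
            self.problems.append(f"<img src={attrs.get('src')!r}> has no alt attribute")
        elif tag in ("frame", "iframe") and not attrs.get("title"):
            self.problems.append(f"<{tag} src={attrs.get('src')!r}> has no title")
        elif tag == "label" and attrs.get("for"):
            self.label_targets.add(attrs["for"])
        elif tag in ("input", "select", "textarea") and attrs.get("type") not in (
            "hidden", "submit", "button"
        ):
            self.field_ids.append(attrs.get("id"))

    def report(self):
        # Labels may appear before or after their field, so match at the end.
        unlabeled = [i for i in self.field_ids if i not in self.label_targets]
        return self.problems + [f"form field id={i!r} has no <label for>" for i in unlabeled]


def check_accessibility(html):
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    return checker.report()
```

For example, a page with `<img src="chart.png">` and an unlabeled `<input id="phone">` yields one warning for each.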

Web Robots:

Web robots are programs that traverse the Web automatically. They are also known as:

Web Wanderers

Crawlers

Spiders

Good Robots:

Crawl pages for search-engine indexing

E.g.:

• Googlebot

• Bingbot

• Msnbot

Bad Robots:

Spambots: try to read confidential information from pages and access private folders, harvesting email IDs, phone numbers, etc.

Problems with Good Robots:

They crawl everything, including:

Scripts

CSS

Resources

Images

Multiple versions of the pages

Unrelated pages

Private folders, etc.

Problems with Good Robots: Solution

Add a robots.txt file (lowercase name) to the root folder of your site.

You should be able to browse the file at this URL:

http://yourdomain/robots.txt

Put the following rules in robots.txt to ask all bots not to crawl any part of your site:

User-agent: *
Disallow: /
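To see what those two lines mean to a crawler that honours robots.txt, Python's standard urllib.robotparser module can evaluate them. This is a minimal sketch: `can_crawl` is a hypothetical helper, and `yourdomain` is the slide's placeholder. Note that urllib.robotparser implements the original prefix-matching rules only, without the `*` and `$` wildcards used later in this deck.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" robots.txt from the slide, one directive per line.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, agent, url):
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for bot in ("Googlebot", "Bingbot", "Msnbot"):
    print(bot, can_crawl(BLOCK_ALL, bot, "http://yourdomain/index.html"))
```

Every bot is refused, because `Disallow: /` is a prefix of every path on the site.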

Robots.txt

Problems with Good Robots: Solution


User-agent: Googlebot
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: Bingbot
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: Yandex
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: *
Disallow: /
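The file above relies on the `*` (any characters) and `$` (end of URL) wildcards that Google and Bing support and that RFC 9309 later standardized. Python's urllib.robotparser does not understand them, so this simplified sketch matches rules by hand: a crawler uses the group naming it (case-insensitively) or else the `*` group, the longest matching rule wins, and Allow beats Disallow on a tie. The function names are hypothetical; real crawlers handle more edge cases.

```python
import re

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: Bingbot
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: Yandex
Disallow: /scripts
Disallow: /styles
Disallow: /*.PDF$

User-agent: *
Disallow: /
"""


def parse_groups(text):
    """Map each lower-cased user-agent to its list of (directive, path) rules."""
    groups = {}
    agents = []              # user-agents of the group currently being read
    reading_agents = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()                     # field names are case-insensitive
        if field == "user-agent":
            if not reading_agents:                # a new group starts
                agents = []
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            reading_agents = True
        elif field in ("allow", "disallow"):
            for agent in agents:
                groups[agent].append((field, value))
            reading_agents = False
    return groups


def pattern_to_regex(path):
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(piece) for piece in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))   # case-sensitive


def is_allowed(groups, agent, url_path):
    """Longest matching rule wins; on a tie, Allow beats Disallow."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    best = None
    for directive, path in rules:
        if not path:                     # an empty Disallow matches nothing
            continue
        if pattern_to_regex(path).match(url_path):
            candidate = (len(path), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return best is None or best[1]


groups = parse_groups(ROBOTS_TXT)
for agent, path in [("Googlebot", "/index.html"),
                    ("Googlebot", "/scripts/app.js"),
                    ("Googlebot", "/docs/report.PDF"),
                    ("Googlebot", "/docs/report.pdf"),
                    ("SomeOtherBot", "/index.html")]:
    print(agent, path, is_allowed(groups, agent, path))
```

Because path matching is case-sensitive, `Disallow: /*.PDF$` blocks /docs/report.PDF but not /docs/report.pdf, which is exactly the kind of slip the typo warning below is about.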


Dealing with Bad Robots:

Robots.txt is not a real security feature.

It doesn’t prevent bad robots from crawling your content.

It’s just a guideline for robots; it’s up to them whether to follow it.

To stop bad robots, set up rules in your firewall to block them.
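Firewall configuration depends on the product in use, so as a stand-in this sketch blocks at the application level instead: a Python WSGI middleware that answers 403 Forbidden when the User-Agent header contains a listed substring. The blocked names and function names are hypothetical. Bad robots can fake their User-Agent, so this complements IP-based firewall rules rather than replacing them.

```python
# Hypothetical substrings; a real site would maintain its own list.
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "emailcollector")


def block_bad_robots(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so that requests from listed user agents get 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(word in agent for word in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """A trivial WSGI app standing in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_bad_robots(hello_app)
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the wrapped app.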

Typos in robots.txt:

Paths in robots.txt are case-sensitive (/Scripts and /scripts are different paths).

Typos are easy to make and silently change what gets blocked.

So it’s advisable to use a tool to generate the file.
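As a minimal illustration of generating the file instead of typing it, this hypothetical Python helper builds robots.txt from (user-agents, disallowed paths) pairs and rejects two common typos: an empty user-agent name and a non-empty path that does not start with "/". It is a sketch, not a full validator.

```python
def generate_robots_txt(groups):
    """Build robots.txt text from (user_agents, disallowed_paths) pairs.

    Raises ValueError on an empty User-agent name or on a non-empty
    path that does not start with "/".
    """
    blocks = []
    for agents, disallowed in groups:
        lines = []
        for agent in agents:
            if not agent.strip():
                raise ValueError("empty User-agent name")
            lines.append(f"User-agent: {agent}")
        for path in disallowed:
            # An empty Disallow value is legal and means "allow everything".
            if path and not path.startswith("/"):
                raise ValueError(f"path {path!r} should start with '/'")
            lines.append(f"Disallow: {path}")
        blocks.append("\n".join(lines))
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(blocks) + "\n"


PRIVATE = ["/scripts", "/styles", "/*.PDF$"]
print(generate_robots_txt([
    (["Googlebot"], PRIVATE),
    (["Bingbot"], PRIVATE),
    (["Yandex"], PRIVATE),
    (["*"], ["/"]),
]))
```

Several User-agent lines may also share one set of rules, so passing `["Googlebot", "Bingbot", "Yandex"]` as a single group produces an equivalent, shorter file.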

Samples:

https://www.facebook.com/robots.txt

http://www.yahoo.com/robots.txt

http://www.google.co.in/robots.txt

Meta tags for Robots:

We can set up rules for robots at the individual page level, via HTML meta tags or HTTP headers:

Meta tags

<META name="robots" content= "NOINDEX, NOFOLLOW">

<Meta name="googlebot" content="noindex" />

<Meta name="googlebot-news" content="nosnippet">

HTTP Headers

X-Robots-Tag: noindex

If a site has both robots.txt and robots meta tags, search engines check robots.txt first and then the meta tags in the page. If robots.txt disallows a page, the crawler never fetches it and so never sees its meta tags.

Meta tag attribute values are case-insensitive; paths in robots.txt are case-sensitive.
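To make the page-level rules concrete, this sketch collects the directives a crawler would see for one page: those in <meta name="robots">, those in a meta tag named after the crawler, and those in X-Robots-Tag response headers, lower-casing everything because the values are case-insensitive. It assumes a crawler identified as googlebot by default, uses hypothetical function names, and does not handle header values prefixed with a user agent (such as "googlebot: noindex").

```python
from html.parser import HTMLParser


def split_directives(value):
    """'NOINDEX, NOFOLLOW' -> {'noindex', 'nofollow'} (values are case-insensitive)."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and a bot-specific meta tag."""

    def __init__(self, bot):
        super().__init__()
        self.names = {"robots", bot.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in self.names:
            self.directives.update(split_directives(attrs.get("content") or ""))


def robots_directives(html, headers, bot="googlebot"):
    """Merge meta-tag directives with X-Robots-Tag response headers."""
    parser = RobotsMetaParser(bot)
    parser.feed(html)
    parser.close()
    directives = set(parser.directives)
    for name, value in headers:
        if name.lower() == "x-robots-tag":   # header names are case-insensitive
            directives |= split_directives(value)
    return directives
```

With the slide's `<META name="robots" content="NOINDEX, NOFOLLOW">` plus an `X-Robots-Tag: noarchive` header, googlebot would see noindex, nofollow and noarchive; the googlebot-news tag only applies when `bot="googlebot-news"`.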

Meta tag values for search engines:

Other HTML tags used by web robots:

<Title>

<META NAME="DESCRIPTION" CONTENT="Nagaraju Sangam">

<META NAME="AUTHOR" CONTENT="Nagaraju Sangam">

<META HTTP-EQUIV="CONTENT-LANGUAGE" CONTENT="en-US,fr">

<META HTTP-EQUIV="EXPIRES" CONTENT="Thu, 30 May 2013 12:00:00 GMT">

<META NAME="KEYWORDS" CONTENT="music,news,entertainment">

Title & Description in search results:

Title: comes from the <Title> tag in the head section of the page. If no title is found, the search engine uses its own heuristics to generate one and displays that.

Description: comes from the description meta tag in the head section of the page. If no description is found, the search engine uses its own heuristics to generate one, which may not describe the page well.

<meta name="description" content="description goes here..">

It’s a best practice to add a title and a description to each page of the site. The title should be unique for each page.
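The best practice above can be checked across a whole site. This sketch parses each page's <title> and description meta tag with Python's html.parser and reports pages missing either one, plus titles shared by more than one page. The `{url: html}` input format and the function names are hypothetical; fetching the pages is left out.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Extracts the <title> text and the description meta tag of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "description":
                self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_pages(pages):
    """pages: {url: html}. Returns a list of warning strings."""
    warnings = []
    urls_by_title = defaultdict(list)
    for url, html in pages.items():
        info = HeadInfo()
        info.feed(html)
        info.close()
        title = info.title.strip()
        if not title:
            warnings.append(f"{url}: missing <title>")
        else:
            urls_by_title[title].append(url)
        if not info.description:
            warnings.append(f"{url}: missing meta description")
    for title, urls in urls_by_title.items():
        if len(urls) > 1:   # titles should be unique per page
            warnings.append(f"duplicate title {title!r} on {', '.join(urls)}")
    return warnings
```

Running it over a crawl of your own site before publishing catches pages where the search engine would otherwise have to guess.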

QA

No questions

please…???

References

Google is the best place to search; use the terms below:

• Web SEO

• Web Accessibility

Thank you…!!!

Next Session…!!!

Today we covered robots only…

We will discuss “Humans…” in the next session.
