Better Sitemap (Mozilla Drumbeat)
Project proposal on how Sitemap 0.90 can be improved.
Better Sitemap
U-Zyn [email protected]
December 12, 2009
Mozilla Drumbeat Challenge
Singapore
This work is licensed under a Creative Commons Attribution 3.0 License. All other trademarks, logos and copyrights are the property of their respective owners.
• XML
• List of URLs
• For URL discovery
• Robot-friendly
• Max of 10MB/50k URLs per file
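The per-file limits above can be illustrated with a short Python sketch. It assumes the limits quoted on the slide (50,000 URLs and 10MB per file, read here as 10 × 1024 × 1024 bytes); the function name `split_into_sitemaps` and the way bytes are counted are illustrative assumptions, not part of the proposal.

```python
from xml.sax.saxutils import escape

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
)
FOOTER = "</urlset>\n"
BASE_SIZE = len((HEADER + FOOTER).encode("utf-8"))


def split_into_sitemaps(urls, max_urls=50_000, max_bytes=10 * 1024 * 1024):
    """Group URLs into sitemap documents that respect both limits.

    Hypothetical helper: a new file is started whenever adding the next
    entry would exceed the URL count or the byte size.
    """
    files, entries, size = [], [], BASE_SIZE
    for url in urls:
        entry = f"  <url><loc>{escape(url)}</loc></url>\n"
        entry_size = len(entry.encode("utf-8"))
        full = len(entries) >= max_urls or size + entry_size > max_bytes
        if entries and full:
            # Close the current file and start a fresh one.
            files.append(HEADER + "".join(entries) + FOOTER)
            entries, size = [], BASE_SIZE
        entries.append(entry)
        size += entry_size
    if entries:
        files.append(HEADER + "".join(entries) + FOOTER)
    return files
```

A site with more URLs than one file allows ends up with several such files, which is part of why a robot that wants everything has a lot to download.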
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.google.com/</loc><priority>1.000</priority></url>
  <url><loc>http://www.google.com/3dwh_dmca.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/cpanel/domain</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/edu/</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/new.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/overview.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/privacy.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/program_policies.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/seminars.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/terms.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/testimonials.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/admins/tour.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/administration.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/benefits.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/calendar.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/customers/asu.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/customers/pdfs/asu_success_story.pdf</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/details.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/features.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/gmail.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/pagecreator.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/seminars.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/startpage.html</loc><priority>0.5000</priority></url>
  <url><loc>http://www.google.com/a/help/intl/en/edu/talk.html</loc><priority>0.5000</priority></url>
  <!-- remaining entries omitted -->
</urlset>
• Messy
• Huge (google.com's is 3.9MB)
• Useless (for humans)
Aims
• For robots:
  – Faster
  – More efficient
• For humans:
  – More useful
  – At least readable by the human's web client, the browser.
  – A browser uses about 5KB of bandwidth to download favicons. Why not use that bandwidth to download more useful material?
Sitemap
Hierarchical
• Parent page
• Sibling pages
• Children pages
• Parsable by web browsers
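A minimal Python sketch of the parent/sibling/children idea follows. It assumes the hierarchy can be derived from URL path segments, which the slides do not specify; the function names `local_sitemap`, `norm` and `parent_of` are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def norm(url):
    """Reduce a URL to its path without a trailing slash ('/' for the root)."""
    return urlsplit(url).path.rstrip("/") or "/"


def parent_of(path):
    """Return the parent path, or None for the site root."""
    return None if path == "/" else posixpath.dirname(path)


def local_sitemap(page_url, all_urls):
    """Return only the parent, siblings and children of one page.

    Assumption: a page's place in the hierarchy is given by its URL path.
    """
    page = norm(page_url)
    paths = sorted({norm(u) for u in all_urls})
    parent = parent_of(page)
    return {
        "parent": parent,
        "siblings": [p for p in paths if p != page and parent_of(p) == parent],
        "children": [p for p in paths if parent_of(p) == page],
    }
```

For the page `http://www.google.com/a` and the first few URLs from the sample sitemap, this returns `/` as the parent, `/3dwh_dmca.html` as a sibling and `/a/edu` as a child, which is a far smaller payload than the full list.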
Chronological
• <lastmod> is in Sitemap 0.90
• But entries are not sorted by it
• Present the sitemap in chronological order
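As a rough illustration, an existing Sitemap 0.90 file could be reordered by `<lastmod>` as sketched below in Python. The sketch assumes date-only `<lastmod>` values (YYYY-MM-DD) and puts undated entries last; both choices, and the function name `chronological`, are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def chronological(sitemap_xml):
    """Return the <loc> values of a sitemap, most recently modified first."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # Assumption: entries without <lastmod> are treated as oldest.
        when = date.fromisoformat(lastmod) if lastmod else date.min
        entries.append((when, loc))
    entries.sort(key=lambda e: e[0], reverse=True)
    return [loc for _, loc in entries]
```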
Chronological
• Robots:
  – Do not have to download huge sitemap files every time
  – Only download the first few chunks
• Browsers:
  – Easily tell surfers where the newly updated content is located
  – (Unlike RSS) not limited to blogs and blog-like sites.
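The robot-side benefit can be sketched in Python as an early-stopping read. The sketch assumes the sitemap is served newest first in chunks of (URL, last-modified date) pairs; the chunk format and the function name `entries_since` are hypothetical, since the slides do not define them.

```python
from datetime import date


def entries_since(chunks, last_crawl):
    """Yield URLs modified after last_crawl, fetching chunks lazily.

    `chunks` is any iterable of lists of (loc, lastmod) pairs, newest first.
    Iteration stops at the first entry that is not newer than last_crawl,
    so later chunks are never requested.
    """
    for chunk in chunks:
        for loc, lastmod in chunk:
            if lastmod <= last_crawl:
                return
            yield loc
```

If `chunks` is a generator that downloads one chunk per step, a robot that crawled recently only ever pulls the first chunk or two.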
More Efficient (Draft)
• Multiple versions
  – Chronological
    • Robots do not have to download the whole sitemap for each crawl
  – Hierarchical
• Seekable
  – With header index
  – Only download needed portions
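One way to picture "seekable with a header index" is the Python sketch below. The layout is entirely hypothetical (a single JSON index line mapping section names to byte offsets and lengths, followed by the sections themselves); the draft does not define a format, and over HTTP the seek would correspond to a byte-range request.

```python
import io
import json


def build_seekable(sections):
    """Serialize {name: [urls]} as one JSON index line followed by the bodies."""
    bodies, index, offset = [], {}, 0
    for name, urls in sections.items():
        blob = ("\n".join(urls) + "\n").encode("utf-8")
        index[name] = [offset, len(blob)]  # position relative to end of header
        bodies.append(blob)
        offset += len(blob)
    header = json.dumps(index).encode("utf-8") + b"\n"
    return header + b"".join(bodies)


def read_section(stream, name):
    """Read only the header line and the requested section from a binary stream."""
    header = stream.readline()
    offset, length = json.loads(header)[name]
    stream.seek(len(header) + offset)  # jump straight to the wanted portion
    return stream.read(length).decode("utf-8").splitlines()


demo = build_seekable({
    "chronological": ["/news/3", "/news/2"],
    "hierarchical": ["/a", "/a/edu"],
})
print(read_section(io.BytesIO(demo), "hierarchical"))
```

The client reads the small index first and then fetches just the section it needs, rather than the whole file.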
More Efficient (Draft)
• Smarter
  – Each page serves a sitemap based on where the client/user is.
  – Do not have to download the whole sitemap.
  – Do not have to parse the whole sitemap.
  – Able to keep the file size small (approx. 5KB) for browsers to load quickly.
• Switch away from XML?
Better Sitemap
U-Zyn Chua
Project Summary
• For robots and humans alike
• Chronological
• Hierarchical
• Seekable
• Smarter