Web Scraping using Diazo!

Web Scraping @alvaro_aguirre Saturday, November 5, 2011

Upload: pythonchile

Posted on 19 May 2015

DESCRIPTION

Web Scraping using Diazo! Talk given at StarTechConf 2011, Santiago, Chile. www.startechconf.com

TRANSCRIPT

Page 1: Web Scraping using Diazo!

Web Scraping

@alvaro_aguirre

Saturday, November 5, 2011

Page 2

In search of our cosmic origins...

Page 3

Page 4

Page 5

Page 6

Page 7

Page 8

Page 9

Data Scraping vs. Web Scraping

Page 10

<html>
  <head></head>
  <body>
    .....
  </body>
</html>

Data Scraping
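For contrast, here is a minimal sketch of data scraping in the sense assumed here: pulling values out of a page's markup rather than reusing the page itself. It uses only Python's standard-library `html.parser`; the names `HeadingScraper` and `scrape_headings` are hypothetical, and this is not how Diazo works.

```python
from html.parser import HTMLParser


class HeadingScraper(HTMLParser):
    """Hypothetical helper: collects the text of every <h1> element."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":  # html.parser reports tag names in lowercase
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings[-1] += data


def scrape_headings(html):
    parser = HeadingScraper()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.headings]


page = "<html><head></head><body><h1>News</h1><p>...</p><h1> Events </h1></body></html>"
print(scrape_headings(page))  # ['News', 'Events']
```

The result is plain data (a list of strings); the page's layout is thrown away, which is the opposite of what Diazo does.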

Page 11

Web Scraping

Page 12

Page 13

Page 14

Deliverance

XDV

Diazo

Page 15

Diazo

Page 16

Page 17

<replace css:content="h1" css:theme="#main" />
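The rule above tells Diazo to replace the theme element with id `main` by the `h1` element(s) from the content. Diazo compiles rules into XSLT; what follows is only a rough standard-library emulation of this one rule, assuming well-formed XHTML input and supporting just `#id` (theme) and tag-name (content) selectors. `apply_replace` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET


def apply_replace(theme_xml, content_xml, theme_id, content_tag):
    """Hypothetical sketch of <replace css:content="TAG" css:theme="#ID" />.

    The theme element with id=theme_id is swapped for copies of every
    content element named content_tag. Not Diazo's implementation.
    """
    theme = ET.fromstring(theme_xml)
    content = ET.fromstring(content_xml)
    matches = list(content.iter(content_tag))
    # ElementTree elements do not know their parent, so map child -> parent.
    parents = {child: parent for parent in theme.iter() for child in parent}
    targets = [el for el in theme.iter() if el.get("id") == theme_id]
    for target in targets:
        parent = parents[target]
        index = list(parent).index(target)
        parent.remove(target)
        for offset, el in enumerate(matches):
            clone = copy.deepcopy(el)
            clone.tail = None  # drop text that followed it in the content
            parent.insert(index + offset, clone)
    return ET.tostring(theme, encoding="unicode")


theme = '<html><body><div id="main">placeholder</div><p>footer</p></body></html>'
content = "<html><body><h1>Hello</h1><p>ignored</p></body></html>"
print(apply_replace(theme, content, "main", "h1"))
# <html><body><h1>Hello</h1><p>footer</p></body></html>
```

The theme supplies the page structure and the content supplies only the pieces the rule names; everything else in the content is ignored.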

Page 18

<drop css:content="h1" />

<drop css:theme="breadcrumbs" />
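A `drop` rule removes the matched elements: with `css:content` they never reach the themed page, with `css:theme` they disappear from the theme. In the minimal standard-library sketch below, a Python predicate stands in for the CSS selector (an assumption made for illustration); `drop_matching` is a hypothetical helper, not part of Diazo.

```python
import xml.etree.ElementTree as ET


def drop_matching(xml, match):
    """Hypothetical sketch of a <drop/> rule: delete every element where match(el) is true."""
    root = ET.fromstring(xml)
    # Collect (parent, child) pairs first so the tree is not mutated mid-iteration.
    doomed = [(parent, child) for parent in root.iter() for child in parent if match(child)]
    for parent, child in doomed:
        parent.remove(child)
    return ET.tostring(root, encoding="unicode")


content = "<html><body><h1>Title</h1><p>Text</p></body></html>"
# Emulates <drop css:content="h1" />
print(drop_matching(content, lambda el: el.tag == "h1"))
# <html><body><p>Text</p></body></html>
```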

Page 19

<replace css:theme="#header" css:content="#header-element" if-content="" />
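An empty `if-content` makes the rule conditional on its own content selector: the theme's `#header` is replaced only when the content actually contains `#header-element`, and is left alone otherwise. A hypothetical standard-library sketch of that behaviour, assuming well-formed XHTML and id selectors only (`replace_if_content` is not a Diazo function):

```python
import copy
import xml.etree.ElementTree as ET


def replace_if_content(theme_xml, content_xml, theme_id, content_id):
    """Hypothetical sketch of <replace css:theme="#A" css:content="#B" if-content="" />."""
    theme = ET.fromstring(theme_xml)
    content = ET.fromstring(content_xml)
    source = next((el for el in content.iter() if el.get("id") == content_id), None)
    if source is None:
        # Condition failed: return the theme untouched.
        return ET.tostring(theme, encoding="unicode")
    parents = {child: parent for parent in theme.iter() for child in parent}
    for target in [el for el in theme.iter() if el.get("id") == theme_id]:
        parent = parents[target]
        index = list(parent).index(target)
        parent.remove(target)
        clone = copy.deepcopy(source)
        clone.tail = target.tail  # keep whatever text followed the theme element
        parent.insert(index, clone)
    return ET.tostring(theme, encoding="unicode")


theme = '<html><body><div id="header">Default</div></body></html>'
with_header = '<html><body><div id="header-element">Custom</div></body></html>'
without_header = "<html><body><p>No header here</p></body></html>"
print(replace_if_content(theme, with_header, "header", "header-element"))
print(replace_if_content(theme, without_header, "header", "header-element"))
```

Without the condition, a page lacking `#header-element` would lose the theme's header; with it, the theme's default header survives.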

Page 20

<drop css:theme="#info-box" if-path="/news"/>
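Here `if-path` restricts the rule to requests under `/news`. The exact matching rules are Diazo's; the sketch below assumes that a leading slash anchors the match at the start of the path and that comparison happens on whole path segments, and it models only that form. `path_matches` is a hypothetical helper.

```python
def path_matches(rule_path, request_path):
    """Assumed if-path test for a rule path with a leading slash.

    Both paths get a trailing "/" so the comparison is on whole segments:
    "/news" matches /news and /news/2011 but not /newsletter.
    Other if-path forms supported by Diazo are not modelled here.
    """

    def with_slash(path):
        return path if path.endswith("/") else path + "/"

    return with_slash(request_path).startswith(with_slash(rule_path))


# <drop css:theme="#info-box" if-path="/news"/> would fire only where this prints True
for path in ["/news", "/news/2011/diazo", "/newsletter", "/about"]:
    print(path, path_matches("/news", path))
```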

Page 21

<theme/> <notheme/> <replace/> <before/> <after/> <drop/> <strip/> <merge/> <copy/>

Page 22

<replace css:theme="#details">
  <dl id="details">
    <xsl:for-each css:select="table#details > tr">
      <dt><xsl:copy-of select="td[1]/text()"/></dt>
      <dd><xsl:copy-of select="td[2]/node()"/></dd>
    </xsl:for-each>
  </dl>
</replace>

<table id="details">
  <tr>
    <td>One</td>
    <td>1</td>
  </tr>
  <tr>
    <td>Two</td>
    <td>2</td>
  </tr>
</table>

<dl id="details">
  <dt>One</dt>
  <dd>1</dd>
  <dt>Two</dt>
  <dd>2</dd>
</dl>
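The inline XSL reads as: for every row of the content table, emit a term from the first cell and a definition from the second. Diazo runs this as XSLT, not Python; purely as a hedged illustration, here is a standard-library equivalent for the sample table, assuming well-formed input. `table_to_dl` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET


def table_to_dl(table_xml):
    """Hypothetical Python counterpart of the inline XSL rule.

    Each <tr> becomes a <dt> with the first cell's text and a <dd>
    with the second cell's contents.
    """
    table = ET.fromstring(table_xml)
    dl = ET.Element("dl", id="details")  # same id the rule writes out
    for row in table.findall("tr"):  # like css:select="table#details > tr"
        first, second = row.findall("td")[:2]
        ET.SubElement(dl, "dt").text = first.text  # approximates td[1]/text()
        dd = ET.SubElement(dl, "dd")
        dd.text = second.text  # td[2]/node(): leading text...
        for child in second:  # ...plus any child elements
            dd.append(copy.deepcopy(child))
    return ET.tostring(dl, encoding="unicode")


content = ('<table id="details"> <tr> <td>One</td> <td>1</td> </tr>'
           ' <tr> <td>Two</td> <td>2</td> </tr></table>')
print(table_to_dl(content))
# <dl id="details"><dt>One</dt><dd>1</dd><dt>Two</dt><dd>2</dd></dl>
```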

Page 23

Page 24

Page 25

Page 26

Tools

Page 27

External Content

Page 28

Page 29

• development of web & mobile interfaces

• legacy app integration

• prototypes

• low coupling

Page 30

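from lxml import etree
from diazo.compiler import compile_theme

# Relative URLs in the theme (CSS, images, scripts) are prefixed with this
absolute_prefix = "/static"

rules = "rules.xml"
theme = "theme.html"

# Compile the rules and the theme into an XSLT stylesheet
compiled_theme = compile_theme(rules, theme, absolute_prefix=absolute_prefix)
transform = etree.XSLT(compiled_theme)

# some_content: the page to theme (file name, URL, or file-like object);
# scraped pages are rarely valid XML, so parse them with the HTML parser
content = etree.parse(some_content, etree.HTMLParser())
transformed = transform(content)

output = etree.tostring(transformed, method="html")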

Page 31

github/aaguirre

Page 32

diazo.org

Page 33

plone.org

Page 34

Thank you!
