Web Scraping using Diazo!

Post on 19-May-2015

DESCRIPTION

Web Scraping using Diazo! Talk given at StarTechConf 2011, Santiago, Chile. www.startechconf.com

TRANSCRIPT

Web Scraping

@alvaro_aguirre

Saturday, November 5, 2011

In search of our cosmic origins...

Data Scraping vs

Web Scraping

<html>
<head></head>
<body>
.....
</body>
</html>

Data Scraping

Web Scraping

Deliverance

XDV

Diazo

Diazo

<replace css:content="h1" css:theme="#main" />

<drop css:content="h1" />

<drop css:theme="breadcrumbs" />
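
What the replace and drop rules above do can be sketched in plain Python. This is only a conceptual illustration built on the standard library's ElementTree, not how Diazo works internally (Diazo compiles rules to XSLT). The sample markup, the helper names `find_with_parent`, `replace` and `drop`, and the reading of "breadcrumbs" as an element id are all assumptions made for the sketch.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical theme and content documents, invented for this sketch.
THEME = (
    '<html><body>'
    '<div id="breadcrumbs">Home / News</div>'
    '<div id="main">Theme placeholder</div>'
    '</body></html>'
)
CONTENT = '<html><body><h1>Scraped headline</h1><p>Body text</p></body></html>'


def find_with_parent(root, element_id):
    """Return (parent, element) for the first element with the given id."""
    for parent in root.iter():
        for child in parent:
            if child.get("id") == element_id:
                return parent, child
    return None, None


def replace(theme, content, theme_id, content_tag):
    """Roughly what <replace css:content="TAG" css:theme="#ID" /> expresses."""
    parent, target = find_with_parent(theme, theme_id)
    source = content.find(".//" + content_tag)
    if parent is None or source is None:
        # Simplification of this sketch: leave the theme alone when
        # either selector matches nothing.
        return
    index = list(parent).index(target)
    replacement = copy.deepcopy(source)
    replacement.tail = target.tail
    parent.remove(target)
    parent.insert(index, replacement)


def drop(tree, element_id):
    """Roughly what <drop css:theme="#ID" /> expresses."""
    parent, target = find_with_parent(tree, element_id)
    if parent is not None:
        parent.remove(target)


theme = ET.fromstring(THEME)
content = ET.fromstring(CONTENT)
replace(theme, content, "main", "h1")
drop(theme, "breadcrumbs")
themed_html = ET.tostring(theme, encoding="unicode")
print(themed_html)
```

The theme supplies the page skeleton and the content supplies the data, so the scraped h1 ends up where the theme's #main placeholder was and the breadcrumbs element disappears.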

<replace css:theme="#header" css:content="#header-element" if-content="" />

<drop css:theme="#info-box" if-path="/news"/>
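
The if-path condition can be read as a test on the request path. The sketch below approximates that test in plain Python, based on my reading of the Diazo documentation: a leading slash anchors the match at the start of the path, a trailing slash anchors it at the end, both together require an exact match, and neither means the path merely has to contain the value. The function name `path_matches` is hypothetical, and matching on whole path segments is an assumption of this sketch.

```python
def path_matches(if_path: str, request_path: str) -> bool:
    """Approximate an if-path test on whole path segments (a sketch)."""
    wanted = [segment for segment in if_path.split("/") if segment]
    actual = [segment for segment in request_path.split("/") if segment]
    anchored_start = if_path.startswith("/")
    anchored_end = if_path.endswith("/")
    n = len(wanted)

    if anchored_start and anchored_end:
        return actual == wanted                      # exact match
    if anchored_start:
        return actual[:n] == wanted                  # path begins with value
    if anchored_end:
        return n <= len(actual) and actual[len(actual) - n:] == wanted
    # Unanchored: the value may appear anywhere in the path.
    return any(actual[i:i + n] == wanted for i in range(len(actual) - n + 1))


# The rule above would apply to /news and everything beneath it.
print(path_matches("/news", "/news/2011/item"))
print(path_matches("/news", "/about/news"))
```

Under these assumptions the #info-box element is dropped from the theme for /news/2011/item but kept for /about/news.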

<theme/> <notheme/> <replace/> <before/> <after/> <drop/> <strip/> <merge/> <copy/>

<replace css:theme="#details">
    <dl id="details">
        <xsl:for-each css:select="table#details > tr">
            <dt><xsl:copy-of select="td[1]/text()" /></dt>
            <dd><xsl:copy-of select="td[2]/node()" /></dd>
        </xsl:for-each>
    </dl>
</replace>

<table id="details">
    <tr> <td>One</td> <td>1</td> </tr>
    <tr> <td>Two</td> <td>2</td> </tr>
</table>

<dl id="details">
    <dt>One</dt> <dd>1</dd>
    <dt>Two</dt> <dd>2</dd>
</dl>
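
The same table-to-definition-list conversion can be sketched in plain Python to make the inline XSL easier to follow. This is an illustration using the standard library's ElementTree, not part of Diazo; the function name `table_to_dl` is hypothetical, and the sketch assumes each row holds at least two td cells.

```python
import copy
import xml.etree.ElementTree as ET

TABLE = (
    '<table id="details">'
    '<tr><td>One</td><td>1</td></tr>'
    '<tr><td>Two</td><td>2</td></tr>'
    '</table>'
)


def table_to_dl(table):
    """Build a <dl> from a two-column table: first cell -> dt, second -> dd."""
    dl = ET.Element("dl", {"id": table.get("id", "")})
    for row in table.findall("tr"):
        cells = row.findall("td")
        if len(cells) < 2:
            continue  # skip rows that do not fit the two-column shape
        dt = ET.SubElement(dl, "dt")
        dt.text = cells[0].text               # like td[1]/text()
        dd = ET.SubElement(dl, "dd")
        dd.text = cells[1].text               # like td[2]/node(): text ...
        dd.extend(copy.deepcopy(child) for child in cells[1])  # ... and child elements
    return dl


dl_html = ET.tostring(table_to_dl(ET.fromstring(TABLE)), encoding="unicode")
print(dl_html)
```

In the Diazo rule the loop lives in the rules file, so the restructuring happens during theming and the content source itself is never modified.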

Tools

External Content

• development of web & mobile interfaces

• legacy app integration

• prototypes

• low coupling

from lxml import etree
from diazo.compiler import compile_theme

# URL prefix prepended to relative links found in the theme
absolute_prefix = "/static"

rules = "rules.xml"
theme = "theme.html"

# Compile the rules and the theme into a single XSLT document
compiled_theme = compile_theme(rules, theme, absolute_prefix=absolute_prefix)

# some_content is the scraped page to be themed (a file name or file-like object)
transform = etree.XSLT(compiled_theme)
content = etree.parse(some_content)
transformed = transform(content)

output = etree.tostring(transformed)

github/aaguirre

diazo.org

plone.org

Thank you!
