Transformations – WTF’s going on? Andy.hunt@alfresco.com


Uploaded by cody-stone

Posted on 29-Dec-2015


Page 1: Transformations – WTF’s going on? Andy.hunt@alfresco.com


Page 2

Basics…

What’s a Transformation?

• Indexing

• Doclib Thumbnails

• Previews

• Rules

• …

Page 3

What’s the problem?

• Lots of transformers

• Lots of mimetypes

• Lots of permutations of the above

• Inconsistent results / Non-deterministic

• Transformations not working

• Lack of visibility

Page 4

How does Alfresco choose?

• Active Transformers

• “Explicit” takes precedence

• Any Limits

• Speed
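As a rough illustration, the order on this slide could be modelled like this in Java (16+). This is a simplified sketch, not Alfresco's code: the `Candidate` record, `TransformerChooser.choose` and the field names are hypothetical, "Any Limits" is assumed to mean a maximum source size with -1 as unlimited, and "Speed" is assumed to mean that the lowest average time wins.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical view of one transformer for a single source/target mimetype pair.
record Candidate(String name, boolean active, boolean explicit,
                 long maxSourceSizeKBytes, long averageTimeMs) {}

public class TransformerChooser {

    // Simplified selection: active only, explicit first, then limits, then speed.
    static Optional<Candidate> choose(List<Candidate> candidates, long sourceSizeKBytes) {
        List<Candidate> active = candidates.stream()
                .filter(Candidate::active)
                .toList();

        // If any active transformer is explicit for this pair, the others are ignored
        // (the listings later in this deck show them as "disabled not explicit").
        List<Candidate> pool = active.stream().anyMatch(Candidate::explicit)
                ? active.stream().filter(Candidate::explicit).toList()
                : active;

        return pool.stream()
                // A negative limit is treated as "unlimited" (assumption).
                .filter(c -> c.maxSourceSizeKBytes() < 0
                        || sourceSizeKBytes <= c.maxSourceSizeKBytes())
                // Among the survivors, the lowest average time wins.
                .min(Comparator.comparingLong(Candidate::averageTimeMs));
    }

    public static void main(String[] args) {
        // Timings and the 5 MB limit come from Example 1; the -1 (unlimited)
        // limits on the other two transformers are assumed.
        List<Candidate> candidates = List.of(
                new Candidate("transformer.complex.OpenOffice.PdfBox", true, false, 5 * 1024, 204),
                new Candidate("transformer.OpenOffice", true, false, -1, 1918),
                new Candidate("transformer.TikaAuto", true, false, -1, 6724));

        // Tiny source (5 bytes, rounded up to 1 KB here): fastest candidate within its limit.
        System.out.println(choose(candidates, 1).map(Candidate::name).orElse("none"));
        // prints transformer.complex.OpenOffice.PdfBox

        // ~16.5 MB source: the complex transformer is over its limit and drops out.
        System.out.println(choose(candidates, 16_896).map(Candidate::name).orElse("none"));
        // prints transformer.OpenOffice
    }
}
```

Because explicit transformers are filtered in before limits and speed are looked at, a slow explicit transformer still beats a faster non-explicit one in this model.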

Page 5

Make it transparent

• log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG

• debugTransformers.txt

• Exactly 18 bytes

Page 6

Example 1 – txt to html

] 193 text/plain text/html
] 193 txt html 24.txt 5 bytes ContentService.transform(...)
] 193 **a) transformer.complex.OpenOffice.PdfBox<<Complex>> < 5 MB 204 ms
] 193 b) transformer.OpenOffice<<Proxy>> 1,918 ms
] 193 c) transformer.TikaAuto 6,724 ms
] 193.1 text/plain text/html
] 193.1 txt html 24.txt 5 bytes transformer.complex.OpenOffice.PdfBox<<Complex>>
] 193.1.1 text/plain application/pdf
] 193.1.1 txt pdf 24.txt 5 bytes transformer.OpenOffice<<Proxy>>
] 193.1.1 Finished in 43 ms
] 193.1.2 store:///installs/3411e/tomcat/temp/Alfresco/ComplextTransformer_intermediate_txt_5927671274426616985.pdf
] 193.1.2 application/pdf text/html
] 193.1.2 pdf html <<TemporaryFile>> 6.2 KB transformer.PdfBox
] 193.1.2 Finished in 7 ms
] 193.1 Finished in 50 ms
] 193 Finished in 56 ms
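The numbers at the start of each line nest: 193.1 is a step inside 193, and 193.1.1 and 193.1.2 are the two stages of the complex transformer (txt to pdf, then pdf to html), so 43 ms + 7 ms gives the 50 ms reported for 193.1. A small helper that re-indents such lines by their dotted id makes that structure easier to see. `TransformerDebugIndenter` is hypothetical (not part of Alfresco) and assumes every line has the "] <id> <message>" shape shown above.

```java
import java.util.List;

public class TransformerDebugIndenter {

    // Nesting depth taken from the dotted id: "193" -> 0, "193.1" -> 1, "193.1.2" -> 2.
    static int depth(String id) {
        return (int) id.chars().filter(ch -> ch == '.').count();
    }

    // Strips the leading "] " and indents the line two spaces per nesting level.
    static String indent(String line) {
        String body = line.startsWith("] ") ? line.substring(2) : line;
        int space = body.indexOf(' ');
        String id = space < 0 ? body : body.substring(0, space);
        return "  ".repeat(depth(id)) + body;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "] 193 text/plain text/html",
                "] 193.1 text/plain text/html",
                "] 193.1.1 Finished in 43 ms",
                "] 193.1.2 Finished in 7 ms",
                "] 193.1 Finished in 50 ms",
                "] 193 Finished in 56 ms");
        lines.forEach(l -> System.out.println(indent(l)));
    }
}
```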

Page 7

Example 2 – large txt to html

] 204 txt html alfresco.biggerlog.txt 16.5 MB ContentService.transform(...)
] 204 **a) transformer.TikaAuto 526 ms
] 204 b) transformer.OpenOffice<<Proxy>> 853 ms
] 204 --c) transformer.complex.OpenOffice.PdfBox<<Complex>> > 5 MB

Page 8

Example lists

] 13.2 transformer.StringExtracter 0 ms

] 13.2 1) txt txt unlimited

] 13.2 2) csv txt unlimited

] 13.2 3) html txt unlimited disabled not explicit

] 14.1243 txt jp2 a) transformer.complex.OpenOffice.Image<<Complex>> 1,171 ms 5 MB

] 14.1249 txt txt a) transformer.StringExtracter 0 ms unlimited

] 14.1249 b) transformer.TikaAuto 0 ms unlimited

] 14.1249 c) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms 0 bytes disabled

Page 9

What can we do?

• Available transformers

• content-services-context.xml

<!-- This one does excel only -->
<bean id="transformer.Poi"
      class="org.alfresco.repo.content.transform.PoiHssfContentTransformer"
      parent="baseContentTransformer" />

Page 10

What can we do?

• Explicit transformers

html txt a) transformer.StringExtracter 0 ms unlimited disabled not explicit
         b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
         c) transformer.TikaAuto 0 ms unlimited disabled not explicit
         d) transformer.HtmlParser 0 ms unlimited EXPLICIT
         e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit

<property name="explicitTransformations">
    <list>
        <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
            <property name="sourceMimetype"><value>text/html</value></property>
            <property name="targetMimetype"><value>text/plain</value></property>
        </bean>
    </list>
</property>

Page 11

What can we do?

• Explicit transformers - 2

html txt a) transformer.StringExtracter 0 ms unlimited disabled not explicit
         b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
         c) transformer.TikaAuto 0 ms unlimited disabled not explicit
         d) transformer.HtmlParser 0 ms unlimited EXPLICIT
         e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit

<property name="supportedTransformations">
    <list>
        <bean class="org.alfresco.repo.content.transform.SupportedTransformation" >
            <property name="sourceMimetype"><value>text/html</value></property>
            <property name="targetMimetype"><value>text/csv</value></property>
        </bean>
    </list>
</property>

Page 12

What can we do?

• Any Limits

• maxSourceSizeKBytes

• content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes

• Listed in repository.properties

• content.transformer.default.maxSourceSizeKBytes=-1
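A sketch of how these properties might be resolved for one transformer, assuming the transformer-specific value overrides `content.transformer.default.maxSourceSizeKBytes` and that -1 means no limit. `TransformerLimits` and its methods are hypothetical; only `java.util.Properties` is a real API here, and the 5120 KB override is an invented example value.

```java
import java.util.Properties;

public class TransformerLimits {

    static final String PREFIX = "content.transformer.";
    static final String SUFFIX = ".maxSourceSizeKBytes";

    // Transformer-specific limit if present, else the default, else -1 (assumed fallback order).
    static long resolveMaxSourceSizeKBytes(Properties props, String transformerName) {
        String specific = props.getProperty(PREFIX + transformerName + SUFFIX);
        String value = specific != null
                ? specific
                : props.getProperty(PREFIX + "default" + SUFFIX, "-1");
        return Long.parseLong(value.trim());
    }

    // A negative limit is treated as "unlimited".
    static boolean isWithinLimit(long sourceSizeBytes, long maxKBytes) {
        return maxKBytes < 0 || sourceSizeBytes <= maxKBytes * 1024;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("content.transformer.default.maxSourceSizeKBytes", "-1");
        // Hypothetical 5 MB override, for illustration only.
        props.setProperty("content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes", "5120");

        long limit = resolveMaxSourceSizeKBytes(props, "PdfBox.TextToPdf");
        System.out.println(limit);                                          // 5120
        System.out.println(isWithinLimit(16_500_000L, limit));              // false
        System.out.println(resolveMaxSourceSizeKBytes(props, "TikaAuto"));  // -1
    }
}
```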

Page 13

What can we do?

• Speed - Startup Averages

• transformer.OpenOffice.time=123456

• transformer.PdfBox.TextToPdf.time=50000

• transformer.complex.Text.Image.time=10000

• transformer.complex.Text.Image.count=10000
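One plausible reading of these startup values, sketched in Java: `.time` seeds an average transformation time in milliseconds and `.count` says how many transformations that seed is worth, so a high count keeps real timings from moving the average much. This weighting is an assumption, and `TransformerAverage` is a hypothetical class, not Alfresco's.

```java
public class TransformerAverage {

    // Assumed model: averageMs starts at the configured ".time",
    // count starts at the configured ".count".
    private double averageMs;
    private long count;

    TransformerAverage(double initialAverageMs, long initialCount) {
        this.averageMs = initialAverageMs;
        this.count = initialCount;
    }

    // Folds one measured transformation time into the running average.
    void record(long elapsedMs) {
        averageMs = (averageMs * count + elapsedMs) / (count + 1);
        count++;
    }

    double averageMs() {
        return averageMs;
    }

    public static void main(String[] args) {
        // Low count: a single real timing moves the average a lot.
        TransformerAverage light = new TransformerAverage(50_000, 1);
        light.record(200);
        System.out.println(light.averageMs()); // 25100.0

        // High count: the configured value stays dominant.
        TransformerAverage heavy = new TransformerAverage(10_000, 10_000);
        heavy.record(200);
        System.out.println(heavy.averageMs());
    }
}
```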

Page 14

Thank you

Andy.hunt@alfresco.com