Transformations – WTF’s going on?
Andy.hunt@alfresco.com
Basics…
What’s a Transformation?
• Indexing
• Doclib Thumbnails
• Previews
• Rules
• …
What’s the problem?
• Lots of transformers
• Lots of mimetypes
• Lots of permutations of the above
• Inconsistent results / Non-deterministic
• Transformations not working
• Lack of visibility
How does Alfresco choose?
• Active Transformers
• “Explicit” takes precedence
• Any Limits
• Speed
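The four bullets above can be read as a filter-then-rank procedure. The following is a minimal sketch of that reading, not Alfresco's actual implementation: the `Candidate` class, the `choose` function, and the order in which the rules are applied are assumptions made for illustration. The transformer names and timings are taken from Example 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical model of one transformer for a given source/target pair."""
    name: str
    avg_ms: int                      # average transformation time seen so far
    max_source_bytes: Optional[int]  # None means no size limit
    active: bool = True
    explicit: bool = False


def choose(candidates: List[Candidate], source_bytes: int) -> Optional[Candidate]:
    """Pick a transformer following the four bullets (illustrative only)."""
    # 1. Only active transformers are considered.
    pool = [c for c in candidates if c.active]
    # 2. If any transformer is explicit for this pair, only explicit ones remain.
    explicit = [c for c in pool if c.explicit]
    if explicit:
        pool = explicit
    # 3. Drop transformers whose size limit is smaller than the source.
    pool = [c for c in pool
            if c.max_source_bytes is None or source_bytes <= c.max_source_bytes]
    # 4. The lowest average time wins; None if nothing is left.
    return min(pool, key=lambda c: c.avg_ms, default=None)


MB = 1024 * 1024
candidates = [
    Candidate("transformer.complex.OpenOffice.PdfBox", 204, 5 * MB),
    Candidate("transformer.OpenOffice", 1918, None),
    Candidate("transformer.TikaAuto", 6724, None),
]

small = choose(candidates, 5)                # 5-byte file
large = choose(candidates, int(16.5 * MB))   # over the 5 MB limit

print(small.name)
print(large.name)
```

With these numbers the small file goes to the fastest transformer, while the large file skips it because of the 5 MB limit and falls through to the next fastest.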
Make it transparent
• log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
• debugTransformers.txt
• Exactly 18 bytes
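The slide only gives the file name and the size of the trigger file. A small sketch for producing such a file follows; it assumes that the content itself is irrelevant and that only the name `debugTransformers.txt` and the 18-byte length matter. The `write_trigger_file` helper and the chosen content are hypothetical.

```python
from pathlib import Path

# Assumption: any 18 bytes will do; this string is 18 ASCII characters long.
CONTENT = b"debug transformers"


def write_trigger_file(directory: Path) -> Path:
    """Write an 18-byte debugTransformers.txt into the given directory."""
    path = directory / "debugTransformers.txt"
    path.write_bytes(CONTENT)
    return path
```

Avoid a trailing newline when creating the file by hand, since that would change the size.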
Example 1 – txt to html
] 193 text/plain text/html
] 193 txt html 24.txt 5 bytes ContentService.transform(...)
] 193 **a) transformer.complex.OpenOffice.PdfBox<<Complex>> < 5 MB 204 ms
] 193 b) transformer.OpenOffice<<Proxy>> 1,918 ms
] 193 c) transformer.TikaAuto 6,724 ms
] 193.1 text/plain text/html
] 193.1 txt html 24.txt 5 bytes transformer.complex.OpenOffice.PdfBox<<Complex>>
] 193.1.1 text/plain application/pdf
] 193.1.1 txt pdf 24.txt 5 bytes transformer.OpenOffice<<Proxy>>
] 193.1.1 Finished in 43 ms
] 193.1.2 store:///installs/3411e/tomcat/temp/Alfresco/ComplextTransformer_intermediate_txt_5927671274426616985.pdf
] 193.1.2 application/pdf text/html
] 193.1.2 pdf html <<TemporaryFile>> 6.2 KB transformer.PdfBox
] 193.1.2 Finished in 7 ms
] 193.1 Finished in 50 ms
] 193 Finished in 56 ms
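The dotted request ids in this output (193, 193.1, 193.1.1, ...) show how a complex transformation nests its steps. As a rough sketch, the "Finished in" lines can be pulled out and indented by nesting depth. The regular expression assumes the line shape shown on the slide, including the leading `]` left over from the truncated log prefix; `finished_times` and `depth` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable

# Matches e.g. "] 193.1.2 Finished in 7 ms" (slide format, assumed stable).
FINISHED = re.compile(r"^\]\s*(\d+(?:\.\d+)*)\s+Finished in ([\d,]+) ms$")


def finished_times(lines: Iterable[str]) -> Dict[str, int]:
    """Map each request id to its elapsed time in milliseconds."""
    times = {}
    for line in lines:
        match = FINISHED.match(line.strip())
        if match:
            times[match.group(1)] = int(match.group(2).replace(",", ""))
    return times


def depth(request_id: str) -> int:
    """Nesting level: 193 is 0, 193.1 is 1, 193.1.2 is 2."""
    return request_id.count(".")


log = """\
] 193.1.1 Finished in 43 ms
] 193.1.2 Finished in 7 ms
] 193.1 Finished in 50 ms
] 193 Finished in 56 ms
""".splitlines()

times = finished_times(log)
for request_id, ms in times.items():
    print("  " * depth(request_id) + f"{request_id}: {ms} ms")
```

Here the two inner steps (43 ms and 7 ms) add up to the 50 ms reported for the complex transformer 193.1, and the remaining 6 ms at the top level is overhead outside that transformer.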
Example 2 – large txt to html
] 204 txt html alfresco.biggerlog.txt 16.5 MB ContentService.transform(...)
] 204 **a) transformer.TikaAuto 526 ms
] 204 b) transformer.OpenOffice<<Proxy>> 853 ms
] 204 --c) transformer.complex.OpenOffice.PdfBox<<Complex>> > 5 MB
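To compare candidate lists across many requests, the lettered lines can be parsed into records. This is a sketch under stated assumptions: the line shape is taken from the slide, and the reading of the markers (`**` for the transformer that was picked, `--` for one that was ruled out, here by the 5 MB limit) is inferred from the two examples rather than from documentation. `parse_candidates` and the status labels are hypothetical.

```python
import re
from typing import Dict, List, Optional

CANDIDATE = re.compile(
    r"^\]\s*[\d.]+\s+"          # truncated log prefix and request id
    r"(\*\*|--)?([a-z])\)\s+"   # optional marker, then the letter index
    r"(transformer\.[\w.]+)"    # transformer name
    r"(?:<<(\w+)>>)?\s*"        # optional <<Proxy>> or <<Complex>> tag
    r"(.*)$"                    # remaining columns (time, size limit)
)

# Assumed meaning of the markers, inferred from the examples.
STATUS = {"**": "selected", "--": "excluded", None: "available"}


def parse_candidates(lines: List[str]) -> List[Dict[str, Optional[str]]]:
    """Return one record per lettered candidate line; other lines are skipped."""
    rows = []
    for line in lines:
        match = CANDIDATE.match(line.strip())
        if match:
            marker, letter, name, kind, detail = match.groups()
            rows.append({"letter": letter, "name": name, "kind": kind,
                         "status": STATUS[marker], "detail": detail})
    return rows


example = [
    "] 204 txt html alfresco.biggerlog.txt 16.5 MB ContentService.transform(...)",
    "] 204 **a) transformer.TikaAuto 526 ms",
    "] 204 b) transformer.OpenOffice<<Proxy>> 853 ms",
    "] 204 --c) transformer.complex.OpenOffice.PdfBox<<Complex>> > 5 MB",
]
rows = parse_candidates(example)
for row in rows:
    print(row["letter"], row["status"], row["name"], row["detail"])
```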
Example lists
] 13.2 transformer.StringExtracter 0 ms
] 13.2 1) txt txt unlimited
] 13.2 2) csv txt unlimited
] 13.2 3) html txt unlimited disabled not explicit
] 14.1243 txt jp2 a) transformer.complex.OpenOffice.Image<<Complex>> 1,171 ms 5 MB
] 14.1249 txt txt a) transformer.StringExtracter 0 ms unlimited
] 14.1249 b) transformer.TikaAuto 0 ms unlimited
] 14.1249 c) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms 0 bytes disabled
What can we do?
• Available transformers
• content-services-context.xml
<!-- This one does excel only -->
<bean id="transformer.Poi"
      class="org.alfresco.repo.content.transform.PoiHssfContentTransformer"
      parent="baseContentTransformer" />
What can we do?
• Explicit transformers
html txt a) transformer.StringExtracter 0 ms unlimited disabled not explicit
         b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
         c) transformer.TikaAuto 0 ms unlimited disabled not explicit
         d) transformer.HtmlParser 0 ms unlimited EXPLICIT
         e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit
<property name="explicitTransformations">
<list>
<bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
<property name="sourceMimetype"><value>text/html</value></property>
<property name="targetMimetype"><value>text/plain</value></property>
</bean>
</list>
</property>
What can we do?
• Explicit transformers - 2
html txt a) transformer.StringExtracter 0 ms unlimited disabled not explicit
         b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
         c) transformer.TikaAuto 0 ms unlimited disabled not explicit
         d) transformer.HtmlParser 0 ms unlimited EXPLICIT
         e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit
<property name="supportedTransformations">
<list>
<bean class="org.alfresco.repo.content.transform.SupportedTransformation" >
<property name="sourceMimetype"><value>text/html</value></property>
<property name="targetMimetype"><value>text/csv</value></property>
</bean>
</list>
</property>
What can we do?
• Any Limits
• maxSourceSizeKBytes
• content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes
• Listed in repository.properties
• content.transformer.default.maxSourceSizeKBytes=-1
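A minimal sketch of how such a limit could be evaluated follows. It assumes that a transformer-specific property overrides the `default` one and that a negative value means "unlimited"; neither rule is stated on the slide, and the 5120 KB value is a made-up example. `max_source_kbytes` and `within_limit` are hypothetical helpers, not Alfresco functions.

```python
from typing import Dict

DEFAULT_KEY = "content.transformer.default.maxSourceSizeKBytes"


def max_source_kbytes(props: Dict[str, str], transformer: str) -> int:
    """Look up the transformer's limit, falling back to the default property."""
    specific = f"content.transformer.{transformer}.maxSourceSizeKBytes"
    return int(props.get(specific, props.get(DEFAULT_KEY, "-1")))


def within_limit(props: Dict[str, str], transformer: str, source_bytes: int) -> bool:
    """True if the source is small enough for this transformer."""
    limit_kb = max_source_kbytes(props, transformer)
    if limit_kb < 0:  # assumption: a negative value means no limit
        return True
    return source_bytes <= limit_kb * 1024


props = {
    DEFAULT_KEY: "-1",
    # Hypothetical value, chosen to mirror the 5 MB limit seen in the examples.
    "content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes": "5120",
}

print(within_limit(props, "PdfBox.TextToPdf", 5))
print(within_limit(props, "PdfBox.TextToPdf", int(16.5 * 1024 * 1024)))
print(within_limit(props, "TikaAuto", int(16.5 * 1024 * 1024)))
```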
What can we do?
• Speed - Startup Averages
• transformer.OpenOffice.time=123456
• transformer.PdfBox.TextToPdf.time=50000
• transformer.complex.Text.Image.time=10000
• transformer.complex.Text.Image.count=10000
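One way to read the `time` and `count` pair is as the seed of a running average: `time` is the starting average in milliseconds and `count` is the number of samples it pretends to represent. That interpretation is an assumption, and the `AverageTime` class below is a hypothetical illustration of why a large `count` makes the seeded value hard to shift.

```python
from dataclasses import dataclass


@dataclass
class AverageTime:
    """Running average seeded from startup properties (illustrative model)."""
    avg_ms: float
    count: int

    def record(self, elapsed_ms: float) -> float:
        """Fold one new measurement into the average and return it."""
        self.avg_ms = (self.avg_ms * self.count + elapsed_ms) / (self.count + 1)
        self.count += 1
        return self.avg_ms


# Seeded with time=10000 and count=10000: one fast run barely moves it.
pinned = AverageTime(avg_ms=10000, count=10000)
pinned.record(0)

# Seeded with the same time but a count of 1: one fast run halves it.
loose = AverageTime(avg_ms=10000, count=1)
loose.record(0)

print(round(pinned.avg_ms, 2), loose.avg_ms)
```

Under this reading, a high `time` combined with a high `count` keeps a transformer near the bottom of the speed ranking for a long time, whereas `time` alone is quickly corrected by real measurements.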
Thank you.
Andy.hunt@alfresco.com