model asset exchange: path to ubiquitous deep learning ... · ables developers to use dl models...

Model Asset eXchange: Path to Ubiquitous Deep LearningDeployment

Alex BozarthCenter for Open-Source Data & AI

Technologies (CODAIT), IBMSan Francisco, California, USA

[email protected]

Brendan DwyerCenter for Open-Source Data & AI

Technologies (CODAIT), IBMSan Francisco, California, [email protected]

Fei HuCenter for Open-Source Data & AI


[email protected]

Daniel JalovaCenter for Open-Source Data & AI


[email protected]

Karthik MuthuramanCenter for Open-Source Data & AI

Technologies (CODAIT), IBMSan Francisco, California, [email protected]

Nick PentreathCenter for Open-Source Data & AI


[email protected]

Simon PlovytCenter for Open-Source Data & AI


[email protected]

Gabriela de QueirozCenter for Open-Source Data & AI


[email protected]

Saishruthi SwaminathanCenter for Open-Source Data & AI


[email protected]

Patrick TitzlerCenter for Open-Source Data & AI


[email protected]

Xin WuCenter for Open-Source Data & AI


[email protected]

Hong Xu∗Center for Open-Source Data & AI


[email protected]

Frederick R Reiss†Center for Open-Source Data & AI


[email protected]

Vijay BommireddipalliCenter for Open-Source Data & AI


[email protected]

ABSTRACTA recent trend observed in traditionally challenging fields such ascomputer vision and natural language processing has been the sig-nificant performance gains shown by deep learning (DL). In manydifferent research fields, DL models have been evolving rapidly andbecome ubiquitous. Despite researchers’ excitement, unfortunately,most software developers are not DL experts and oftentimes havea difficult time following the booming DL research outputs. As a∗All authors contributed equally. Please contact this author or visit http://ibm.biz/max-slack for questions regarding this paper.†Also with IBM Research.

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior specific permission and/or afee. Request permissions from [email protected] ’19, November 3–7, 2019, Beijing, China© 2019 Association for Computing Machinery.ACM ISBN 978-1-4503-6976-3/19/11. . . $15.00https://doi.org/10.1145/3357384.3357860

result, it usually takes a significant amount of time for the latestsuperior DL models to prevail in industry. This issue is furtherexacerbated by the common use of sundry incompatible DL pro-gramming frameworks, such as Tensorflow, PyTorch, Theano, etc.To address this issue, we propose a system, called Model AssetExchange (MAX), that avails developers of easy access to state-of-the-art DL models. Regardless of the underlying DL programmingframeworks, it provides an open source Python library (called theMAX framework) that wraps DL models and unifies programminginterfaces with our standardized RESTful APIs. These RESTful APIsenable developers to exploit the wrapped DL models for inferencetasks without the need to fully understand different DL program-ming frameworks. Using MAX, we have wrapped and open-sourcedmore than 30 state-of-the-art DL models from various researchfields, including computer vision, natural language processing andsignal processing, etc. In the end, we selectively demonstrate twoweb applications that are built on top of MAX, as well as the processof adding a DL model to MAX.

arX

iv:1

909.

0160

6v1

[cs

.LG

] 4

Sep

201

9

http://ibm.biz/max-slack

http://ibm.biz/max-slack

https://doi.org/10.1145/3357384.3357860

CCS CONCEPTS• Computing methodologies→Machine learning; • Appliedcomputing → Enterprise architecture frameworks; Service-oriented architectures; • Software and its engineering→ Soft-ware architectures.

ACM Reference Format:Alex Bozarth, Brendan Dwyer, Fei Hu, Daniel Jalova, Karthik Muthuraman,Nick Pentreath, Simon Plovyt, Gabriela de Queiroz, Saishruthi Swaminathan,Patrick Titzler, Xin Wu, Hong Xu, Frederick R Reiss, and Vijay Bommired-dipalli. 2019. Model Asset eXchange: Path to Ubiquitous Deep LearningDeployment. In The 28th ACM International Conference on Information andKnowledge Management (CIKM ’19), November 3–7, 2019, Beijing, China.ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3357384.3357860

1 INTRODUCTIONIn the past decade, research fields of artificial intelligence (AI) haveadvanced drastically. Most noticeably, deep learning (DL) has gaineddramatic performance improvements and revolutionized the wholeresearch field of AI. On the one hand, for many tasks that wereconsidered “impossible” for AI to achieve a performance comparableto human, such as playing Go [8] and image recognition [2], DL hasdemonstrated its power by surpassing human performance with alarge margin. On the other hand, DL has also become ubiquitous inmany areas, especially in Internet of things. Therefore, it is essentialto systematize DL models.

DL researchers have been celebrating their remarkable and laud-able achievements: Some DL models that arose from applied DLresearch fields such as computer vision and natural language pro-cessing, have been revolutionizing these fields. However, the in-dustry has been struggling to make use of them for the followingreasons:

Booming research results Every year, thousands of new DLresearch results are published. The state of the art is pushed forwardrapidly in many DL research fields. For a non-DL expert, it usuallytakes a significant amount of time to implement necessary blocksderived from research papers.

Mathematical complexity Sometimes DL models require de-cent knowledge on statistics and linear algebra to grasp. This math-ematical barriers are often insurmountable for most software de-velopers due to the diversity in their backgrounds.

Confusing terminologiesNew terminologies have been emerg-ing due to DL’s rapid evolution and wide adoption in a variety ofresearch fields. Unfortunately, these new terminologies are ofteninconsistent and thus frequently confuse non-DL experts.

These issues have been severely impeding the adoption of DLin industry. Therefore, to fill this gap, it is imperative to develop asoftware system that reduces the required minimum knowledge toget started with DL models.

In addition, the difficulty to harness DL in industry is furtherexacerbated by the common use of sundry DL programming frame-works, including Tensorflow [1], PyTorch [7], Theano [9], etc. TheseDL programming frameworks are incompatible with each otherand their software structures are usually fundamentally different.Hence, it usually requires strenuous effort to port programming

code based on one DL programming framework to another. There-fore, a software system that unifies these programming interfacesand blocks is critical to the success of DL in industry.

In this demo, we present a software system, called Model AsseteXchange (MAX)1, that addresses aforementioned difficulties. Thepaper is organized as follows. In Section 2, we present the architec-ture and software components of MAX. In Section 3, we describeour demonstration.

2 ARCHITECTURE DESIGN AND SOFTWARECOMPONENTS OF MAX

In this section, we describe our architecture design and varioussoftware components of MAX.

2.1 Architecture DesignMAX is a software system that employs an extensible and dis-tributive architecture and makes use of state-of-the-art containertechnology and cloud infrastructures. Figure 1 illustrates its archi-tecture. MAX is hosted on a cloud infrastructure, such as IBM cloud,and communicates with web applications via standardized RESTfulAPIs. It is undergirded by a powerful abstract component, namedthe MAX framework. The MAX framework wraps DL models im-plemented in different DL programming frameworks and providesprogramming interfaces in an uniform style, which effectively en-ables developers to use DL models without the need to dive intoDL programming frameworks. Each implementation of DL modelruns in isolated Docker containers, which promotes security andeffectively turns the architecture to be easily distributive and ex-tensible. Additionally, we build MAX exclusively on top of opensource technologies, which promotes the open and collaborativeculture that academia generally embraces.

2.2 Software ComponentsIn this subsection, we describeMAX’s various software componentsas shown in Figure 1.

2.2.1 The MAX Framework. The MAX framework is a Python li-brary that wraps DL models to unify the programming interface.To wrap a model, it simply requires implementing functions thatprocess input and output. This simplicity is key to the MAX frame-work’s agnosticism to DL programming frameworks.

2.2.2 DL Models. MAX can accommodate DL models written indifferent DL programming frameworks. The MAX framework com-municates with DL models via standardized Python programminginterfaces. To use a DL model in MAX, we only need to adapt itsPython programming interface, i.e., wrap the DL model. Once theDL model is wrapped, it is available throughout the whole MAXsystem and does not require further adaptation in the future. ThisPython programming interface is objected-oriented: Wrapping onlyrequires inheriting specific classes and implementing some prede-fined class functions by converting input and output of DL modelsto data structures acceptable to the MAX framework.

The wrapped DL models and their programming interface withthe MAX framework are hosted in Docker containers2. A container1https://developer.ibm.com/exchanges/models/2https://www.docker.com

https://doi.org/10.1145/3357384.3357860

https://developer.ibm.com/exchanges/models/

https://www.docker.com

DL ModelsApplicationsUsers

Cloud Infrastructure (IBM Cloud)

MAX Framework

···

···

Audio Classifier

Object Detector

Scene Classifier

Text Summarizer

Inte

rnal

Int

erfa

ces

Restful A

PIs

Insert JSON images here

Docker Container

···Web App

MobileApp

Figure 1: Design of the MAX architecture. The software components to the right of the vertical blue dashed line are on IBMCloud. Software components inside the round-cornered green rectangles run in Docker containers.Source of some subcomponents of the figure: https://upload.wikimedia.org/wikipedia/commons/3/38/Ethernet_Port.svg , https://commons.wikimedia.org/wiki/File:Gears.png , https://commons.wikimedia.org/wiki/File:Icons8_flat_phone_android.svg

is an isolated instance of environment that hosts software of interestand its runtime. This isolation in general promotes extensibility,distributability, and security.

For example, without the help of containers, it is usually difficultto deploy two DL models depending on conflicting runtime envi-ronments (such as different versions of TensorFlow) on the samehosting OS in the same physical computer. Containers solve thisissue by creating an isolated virtual runtime environment for eachDL model. For another example, if multiple DL models are deployedon the same hosting OS directly, a security vulnerability in one DLmodel would also normally risk other DL models.

Additionally, MAX uses Docker containers instead of traditionalisolation/container technologies such as physical isolation (multi-ple physical computers) or virtual machines (emulated hardwareenvironments running on host OSes). The reason is that traditionalisolation/container technologies are in general computationallycostly, as each isolated node runs a complete operating system (OS)instance, either physical or virtual.

To mitigate this issue, Docker containers only cost moderatecomputational resources while retaining a high degree of isolation:Based on Linux container technologies, Docker containers shareone single OS kernel instance that is the same as the one thatsteers the host OS. Therefore, by employing Docker containers,MAX is able to operate under low computational cost with verylittle compromise in extensibility, distributability, or security thatisolation/container technologies provide.

2.2.3 RESTful APIs: Between Applications and the MAX Frame-work. MAX provides a standardized DL programming framework-agnostic programming interface as RESTful APIs, which effectivelyavails developers of DL models without requiring them to dive intothe various DL programming frameworks.

For each DL model, MAX’s output is in the JSON format fol-lowing a standardized specification. This standardization enablesdevelopers to quickly adapt their applications by replacing the un-derlying DL model with very little and often zero modification tothe code that interacts with the DL model. This is in sharp contrastwith the current common practice: Due to the non-standardizedprogramming interfaces, when replacing underlying DL models,developers usually have to drastically modify their code and fre-quently find themselves mired in figuring out the correct usage ofoften abstrusely defined APIs. MAX also integrates Swagger3 tomake a graphical user interface (GUI) automatically available to allwrapped DL models. An example is shown below4:{"status": "ok","predictions":[[{"positive": 0.9977352619171143, "negative": 0.002264695707708597}],[{"positive": 0.001138084102421999, "negative": 0.9988619089126587}]

]}

3 DEMONSTRATIONIn this section, we describe our demonstration.

3.1 Web Applications Built upon MAXIn this subsection, we describe our demonstration of the architec-ture of MAX using two web applications, an object detector [3–6]and an image caption generator [10], that are built on top of MAX.We describe in turn the demonstration of software components (asshown in Figure 1) and interfaces between them.

• WebUI: They interact with users via GUIs and communicatewith the MAX framework via a RESTful JSON interface.Figures 2a and 2b preview our demonstration.

3https://swagger.io/4Taken from https://github.com/IBM/MAX-Text-Sentiment-Classifier .

https://upload.wikimedia.org/wikipedia/commons/3/38/Ethernet_Port.svg

https://swagger.io/

https://github.com/IBM/MAX-Text-Sentiment-Classifier

(a) Image caption generator (web UI screenshot)

(b) Object detector (web UI screenshot)

Figure 2: Web application demonstration.

• JSON Interface Between Web UI and the MAX Frame-work: This interface is RESTful and based on the JSON for-mat. Upon users’ requests, the Web UI accordingly sends theMAX framework a JSON-formatted string following a pre-defined specification. Figure 3 previews our demonstration(using Swagger) of this JSON interface.

• Python Programming Interface Between DL Modelsand theMAXFramework:Wewill demonstrate the Pythoninterface that bridges these two components.

3.2 Adding a DL Model to MAXWe will also interactively demonstrate the process to add a DLmodel to MAX and thus make it available to MAX users. Thisincludes three steps: (1) Wrapping a DL model using the MAXframework, (2) building a Docker image that hosts the wrapped DLmodel, and (3) optionally uploading to IBM Cloud. Additionally, forthis process, we have also created a skeleton called MAX-Skeleton5as a convenient starting point for typical use cases.

4 CONCLUSIONIn this demo paper, to address the difficulties in the industry’s adop-tion of DL models from DL research fields, we presented MAX, asoftware system that employs an extensible and distributive archi-tecture and makes use of state-of-the-art container technology andcloud infrastructures. In particular, we described the architectureand software components of MAX as well as the interaction be-tween them in detail. Finally, we proposed our demonstration of

5https://github.com/IBM/MAX-Skeleton

Figure 3: Example JSON output from the image caption gen-erator.

two web applications that are built on top of MAX, as well as theprocess of adding a DL model to MAX.

REFERENCES[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen,

Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, San-jay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard,Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg,Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, MikeSchuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, PaulTucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals,Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng.2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.https://www.tensorflow.org/ Software available from tensorflow.org.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deepinto Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016–1034.

[3] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, WeijunWang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets:Efficient Convolutional Neural Networks for Mobile Vision Applications. arXivabs/1704.04861 (2017).

[4] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara,Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, andKevin Murphy. 2017. Speed/Accuracy Trade-Offs for Modern ConvolutionalObject Detectors. In Proceedings of the IEEE Conference on Computer Vision andPattern Recognition. https://doi.org/10.1109/CVPR.2017.351

[5] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Gir-shick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. LawrenceZitnick. 2014. Microsoft COCO: CommonObjects in Context. arXiv abs/1405.0312(2014).

[6] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single Shot MultiBox Detector.In Proceedings of the European Conference on Computer Vision. 21–37. https://doi.org/10.1007/978-3-319-46448-0_2

[7] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang,Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer.2017. Automatic Differentiation in PyTorch. In NIPS Autodiff Workshop.

[8] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, Georgevan den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershel-vam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalch-brenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu,Thore Graepel, and Demis Hassabis. 2016. Mastering the Game of Go with DeepNeural Networks and Tree Search. Nature 529 (2016), 484–489.

[9] Theano Development Team. 2016. Theano: A Python Framework for Fast Com-putation of Mathematical Expressions. arXiv abs/1605.02688 (2016).

[10] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2017. Showand Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 4 (2017), 652–663. https://doi.org/10.1109/TPAMI.2016.2587640

https://github.com/IBM/MAX-Skeleton

https://www.tensorflow.org/

https://doi.org/10.1109/CVPR.2017.351

https://doi.org/10.1007/978-3-319-46448-0_2

https://doi.org/10.1007/978-3-319-46448-0_2

https://doi.org/10.1109/TPAMI.2016.2587640

model asset exchange: path to ubiquitous deep learning ... · ables developers to use dl models...

Documents