



DFSeer: A Visual Analytics Approach to Facilitate Model Selection for Demand Forecasting

Dong Sun1, Zezheng Feng1, Yuanzhe Chen2, Yong Wang1, Jia Zeng2, Mingxuan Yuan2, Ting-Chuen Pong1, Huamin Qu1

1The Hong Kong University of Science and Technology
2Noah's Ark Lab, Huawei Technologies Co. Ltd.

1{dsunae, zfengak, ywangct, tcpong, huamin}@cse.ust.hk; 2{chenyuanzhe, zeng.jia, yuan.mingxuan}@huawei.com

ABSTRACT
Selecting an appropriate model to forecast product demand is critical to the manufacturing industry. However, due to the data complexity, market uncertainty and users' demanding requirements for the model, it is challenging for demand analysts to select a proper model. Although existing model selection methods can reduce the manual burden to some extent, they often fail to present model performance details on individual products and reveal the potential risk of the selected model. This paper presents DFSeer, an interactive visualization system to conduct reliable model selection for demand forecasting based on the products with similar historical demand. It supports model comparison and selection with different levels of details. Besides, it shows the difference in model performance on similar products to reveal the risk of model selection and increase users' confidence in choosing a forecasting model. Two case studies and interviews with domain experts demonstrate the effectiveness and usability of DFSeer.

Author Keywords
Interactive visualization; model selection; product demand forecasting; time series.

CCS Concepts
• Human-centered computing → Visualization; User interface design; Human computer interaction (HCI);

INTRODUCTION
In a manufacturing company, the forecast of product demand based on the historical demand data is critical to production planning and supply strategy development [29]. Since there is often no model that is suitable for forecasting all the product demand [1, 22], it has become increasingly important to select appropriate models for demand forecasting. An appropriate model for demand forecasting should be accurate, stable, and suitable for multiple products, which can optimize the allocation of production resources and reduce the risk of both order delay and the overstock of products [29].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI'20, April 25–30, 2020, Honolulu, HI, USA
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-6708-0/20/04 ... $15.00
DOI: https://doi.org/10.1145/3313831.3376866

However, it is challenging to select a proper model for demand forecasting. First, it is not easy to evaluate and compare different demand forecasting models due to the data complexity. The demand forecasting in a manufacturing company usually involves thousands of products and dozens of models. The domain experts often find it difficult to quickly evaluate and compare different demand forecasting models when the model performance is inconsistent on different products or in different time periods for the same product. Second, the market uncertainty, which is common in the manufacturing industry, makes it even more difficult to find a reliable demand forecasting model [39]. For example, a sudden decline in the price of substitute products may lead to a reduction in the demand, which can result in a large difference between the forecasted and actual demand. Third, it is challenging to find a model that fulfills the high requirements of product demand forecasting. The demand analyst is seeking a model that is accurate for most of the products, stable in the accuracy when forecasting the demand in different months, and applicable to as many products as possible.

Some prior research has been conducted on the selection of forecasting models, which mainly focuses on parameter tuning [6], feature selection for modeling [36, 42], and the comparison of different types of models [23, 30, 45, 52, 56, 40]. These studies provide helpful guidance for model selection and reduce lots of manual effort. However, they often only provide the overall performance of models on the dataset and fail to reveal the performance details on individual products, which is important for decision making in demand forecasting model selection. For example, how accurate is the product demand forecast by different models? How different is the demand forecasting performance of a certain model on different types of products (or different products of the same type)? Are there some models that are suitable for the demand forecasting of most products? In addition, these studies do not reveal the risk of using a specific demand forecasting model. Instead of only getting a ranked list of models, users are also curious about the reliability of the models in forecasting the demand of the products with similar historical demand data, which is crucial to their choice of an appropriate demand forecasting model [10, 4]. Employing a high-risk model can lead to a reduction in revenue, loss of customers, and users' distrust of the model.

In this paper, we propose DFSeer, an interactive visual analytics system to support the comparison and selection of demand forecasting models in the manufacturing industry. Considering that a model with a good average performance on all the products may produce a poor forecast for some products, DFSeer supports model selection according to the forecast of the products with similar historical demand data [10, 4, 21]. Our approach allows users to compare and select demand forecasting models through three levels of details, i.e., an overview, a product view, and a detail view. It can help users easily identify the models whose forecasting results are inaccurate or unstable for some products, as well as the models that are only suitable for a small number of products. Besides, the system combines the model performance with the historical demand data to reveal the risk of selecting a specific demand forecasting model. It allows users to remove dissimilar products identified by integrating their domain knowledge with the properties of the time series extracted by automated algorithms. By showing the difference in model performance on the products with similar historical demand, DFSeer can help users understand the possibility of an accurate forecast by the selected model, which can increase their confidence in their choice of the model. To the best of our knowledge, DFSeer is the first visual analytics system to support the interactive selection of different types of time series forecasting models.

We summarize the major contributions of the work as follows:

• A new workflow that supports demand forecasting model selection based on the products with similar historical demand and reveals the risk of selecting a model to increase users' confidence in adopting the model.
• The implementation of a visual analytics system that enables users to compare and select demand forecasting models with different levels of details.
• Detailed evaluations of the visualization system, including two case studies with real-world product demand data and interviews with four domain experts from a manufacturing company.

RELATED WORK
Previous studies related to our work can be divided into three parts: techniques for time series forecasting, model selection for forecasting analysis, and visualization of discrete time series.

Techniques for Time Series Forecasting
Recently a number of techniques have been proposed and applied to the forecast of time series, which can be categorized into three groups according to previous studies [20, 49]: classical techniques for time series modeling, machine learning techniques, and deep learning techniques.

Classical techniques for time series modeling mainly include the autoregressive integrated moving average (ARIMA) model [8] and the exponential smoothing (ES) model [33]. The ARIMA model [8] computes the difference between two adjacent values in a time series to convert non-stationary data to stationary ones and utilizes the accumulated error terms to reduce the random fluctuation in time series forecasting. The ES model [33] emphasizes the effect of the recent time series on the forecast, without ignoring the effect of the time series from a long time ago. Machine learning techniques have also been applied to time series forecasting. For example, the k-nearest neighbor regression (KNN) model [3] forecasts the future value of a time series based on its neighboring time series. Other machine learning techniques include the support vector regression (SVR) model [28], gradient boosting [25], Lasso [53], ridge regression [31] and so on. In addition, inspired by the excellent performance of deep learning techniques [38], researchers have also introduced deep learning techniques to time series forecasting. For instance, Model-Agnostic Meta-Learning (MAML) [24] can be trained quickly even though there are only a small number of data points. When forecasting the future values of a time series, the recurrent neural network (RNN) [43] takes both the current value and past ones into consideration.
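To make the first two families concrete, the sketch below fits one classical model and one machine learning model of each kind mentioned above to a single product's monthly demand series. It is an illustration only, not the models used in DFSeer; the libraries (statsmodels, scikit-learn), the synthetic data, and all parameter choices are our own assumptions.

```python
# Minimal sketch: ARIMA, exponential smoothing, and KNN regression on one
# synthetic monthly demand series. All settings are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.neighbors import KNeighborsRegressor

# Synthetic monthly demand, Jan 2015 - Nov 2018 (47 points), for illustration.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=47, freq="MS")
demand = pd.Series(100 + 0.5 * np.arange(47)
                   + 10 * np.sin(2 * np.pi * np.arange(47) / 12)
                   + rng.normal(0, 3, 47), index=idx)

# Classical models: ARIMA differences the series to handle non-stationarity;
# exponential smoothing weights recent observations more heavily.
arima_fc = ARIMA(demand, order=(1, 1, 1)).fit().forecast(steps=1)
es_fc = ExponentialSmoothing(demand, trend="add", seasonal="add",
                             seasonal_periods=12).fit().forecast(1)

# A machine learning model: KNN regression on the previous 12 demand values.
lags = 12
X = np.column_stack([demand.shift(i).values for i in range(1, lags + 1)])[lags:]
y = demand.values[lags:]
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
knn_fc = knn.predict(demand.values[-lags:][::-1].reshape(1, -1))

print(float(arima_fc.iloc[0]), float(es_fc.iloc[0]), float(knn_fc[0]))
```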

Each of these techniques performs well on certain forecasting tasks, but their performance often declines on other forecasting tasks [1]. In this paper, we build an interactive visual analytics system to facilitate the fast selection of demand forecasting models and help users easily identify the risk of a model.

Model Selection for Forecasting Analysis
Various studies have been conducted to facilitate the selection of forecasting analysis models. These studies mainly consist of two types: visualization of forecasting models and automated approaches for model selection.

Visual analytics techniques play an important role in helping users understand forecasting models. For example, Krause et al. [37] and Hohman et al. [32] studied the influence of a feature on the forecast by varying the value of the feature. Visual analytics approaches have also been employed to explain the mechanism of deep learning models [41, 44, 55]. Some other studies focus on using visualization techniques to facilitate the comparison and selection of forecasting models. For instance, TiMoVA [6] provides a detailed comparison of ARIMA models with different parameter settings to seek the best model. INFUSE [36] visualizes the significance of various features in forecasting analysis with a glyph design, where users can select different features to train a model and compare the generated models. ATMSeer [56] exploits visualization techniques to support the interactive exploration of a model selection process. Techniques in the statistics literature, such as the residual plot [16] and the quantile-quantile (Q-Q) plot [59], have also been employed to evaluate and select models. The worm plot [11] supports parameter tuning and the selection of LMS models [15]. Gabry et al. [26] introduced visualization techniques to the building of Bayesian models to help model developers evaluate and compare models. Apart from these visualization methods, some automated approaches for model selection have also been proposed. For example, Auto-WEKA 2.0 [35] and H2O AutoML [30] are developed to help data scientists tune the parameters of different algorithms, compare the explored models, and find the model that performs the best on the dataset specified by the user.

Although these techniques provide useful guidance for model selection and reduce the workload to some extent, they are not applicable to model selection for demand forecasting for two reasons. First, these approaches aim at optimizing performance on the whole dataset, which may result in an inaccurate forecast of the demand for some products. Second, they lack the performance details of models on individual products, which is important to decision making in demand forecasting model selection. Our work is proposed to solve the aforementioned problems, which supports demand forecasting model selection with different levels of details based on the forecast of the products with similar historical demand. Besides, it reveals the risk of the selected model by showing the performance difference on similar products.

Visualization of Discrete Time Series
Visual analytics approaches are widely used to explore time series [46, 5]. According to the survey by Aigner et al. [2], the studies on time series data visualization mainly include three parts: ordinal, discrete, and continuous time series visualization. We will only discuss the prior work on visualizing discrete time series, which is more related to our approach.

One of the most popular visualization techniques for discrete time series is the line chart [47], which describes how data change over time by connected line segments. An alternative of the line chart is the horizon graph [48], a space-saving technique that encodes the data by both the color and the height. Besides, the spiral [58] and calendar-based visualization [54] can reveal the periodic features of time series. To support the exploration of large scale time series, plenty of visual analytics tools have been developed. For example, PlanningVis [51] is a multi-level visualization system that juxtaposes heatmaps, line charts and bar charts to assist factory planners in optimizing a production plan.

Inspired by the prior studies, we propose novel visualization techniques to show the discrete time series data and support the selection of demand forecasting models.

SYSTEM REQUIREMENTS AND DESIGN
In this section, we interviewed four domain experts to understand the problem of model selection for product demand forecasting. Based on the comments and suggestions collected from the interviews, we summarized five requirements to guide the design of the visualization system.

Interview Study
In order to understand the workflow of model selection for product demand forecasting, identify existing challenges in the approach currently used, and seek opportunities to improve the process of model selection, we conducted interviews with four domain experts from a manufacturing company. Two of the domain experts are algorithm developers who apply and train various models to forecast the demand of thousands of products, while the other two are demand analysts who are required to compare the performance of the models provided by algorithm developers and select the most suitable model.

The interview for each domain expert lasted for one hour, which can be divided into three parts: the introduction to demand forecasting, the description of the workflow of model selection, and the discussion of challenges and user requirements. In the first part, the domain expert introduced the demand forecasting problem, the demand data used, and the algorithms to forecast product demand. In the second part, we asked the domain expert questions about the workflow of model selection for demand forecasting, including what features will be used to evaluate a model, what weights will be assigned to these features to rank the models, what information will be browsed by the domain experts, and so on. In the third part, we discussed with the domain expert the problems in the current approach, the existing challenges, their requirements on a new approach for model selection, and the opportunities that visual analytics techniques can bring about.

During the interviews, the domain experts complained about the difficulty in model selection due to the large amount of data they needed to explore. The current approach presents the forecast accuracy of each model for all the products in a tabular form, which is often overwhelming and requires a long scroll to explore the data. Also, the domain experts commented that they were not sure whether the selected model could produce an accurate forecast or not. In addition, they agreed with us on supporting model selection based on the products with similar historical demand. They commented that it follows the rule of product lifecycle [50], and hoped that the new approach can make full use of previous forecasting experience.

Design Requirements
Based on the feedback collected from the interview study, we summarized the following five design requirements.

R1. Allow interactive model ranking. The system should support the interactions that allow users to adjust the weights of different features for ranking demand forecasting models. In addition, it should also enable users to manually remove dissimilar products identified based on the features of the historical demand data extracted by automated algorithms.

R2. Enable model selection based on the forecast for similar products. The system should allow users to interactively select the products with similar historical demand and help them find the models with the best performance on these products. This can take advantage of previous forecasting experience and alleviate the problem that a model selected according to the average performance on all the products may produce an inaccurate forecast for some products.

R3. Present the performance of models on different products. Besides the overall performance of models on the selected products, the system should also show the performance of models on individual products, which is helpful for users to identify the drawback of a model and seek alternative models.

R4. Reveal the risk of an inaccurate forecast. When applying a demand forecasting model, users have no idea whether the forecast is reliable or not. Revealing the risk of an inaccurate forecast can prevent the adoption of an unreliable model and increase the confidence in model selection.


R5. Support multi-level exploration. Displaying the large quantity of data related to demand forecasting on a limited screen is overwhelming and can result in visual clutter. The system should support the exploration of demand forecasting data with levels of details to reduce the manual effort required.

Figure 1. An overview of the DFSeer system, which mainly consists of three modules: data generation, data processing and data visualization.

DFSEER
In this section, we introduce DFSeer, an interactive visual analytics system that supports model selection for demand forecasting and reveals the risk of the selected model. This section first gives an overview of DFSeer and then describes each component of the visualization system in detail.

System Overview
Figure 1 presents an overview of the DFSeer system, which can be divided into three modules: data generation, data processing, and data visualization.

The data generation module forecasts the demand for products in the next month with various kinds of forecasting algorithms. Three analytical tasks are then performed in the data processing module. First, the Euclidean Distance [21] is utilized to measure the similarity between the z-normalized [27] historical demand data. Second, we compute the performance indicators to evaluate and rank different demand forecasting models. These performance indicators include the accuracy of the forecast in the next month, the variance of the forecast accuracy in the recent ten months, and the number of products to which a demand forecasting model is applicable. To be more specific, a model is applicable to a product when it is among the most accurate k models, where k is defined by users. Third, automated algorithms are introduced to extract the properties of the historical demand data, including the trend, the seasonality, the autocorrelation, and the stationarity. The demand data and the information generated by the data processing module are then sent to the visualization module.
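As an illustration of how these three performance indicators could be computed from per-product forecasts, consider the sketch below. The paper does not specify the exact accuracy metric, so the 1 − absolute-percentage-error formulation (clipped at zero), the helper names, and the data layout are all assumptions rather than DFSeer's actual implementation.

```python
# Minimal sketch of the three performance indicators (assumed formulations).
import numpy as np

def accuracy(actual, forecast):
    """Per-point forecast accuracy in [0, 1]; 1 - APE clipped at 0 (assumed metric)."""
    return max(0.0, 1.0 - abs(forecast - actual) / max(abs(actual), 1e-9))

def model_indicators(actual, forecasts, k=5, recent=10):
    """
    actual:    dict product -> list of actual monthly demand (oldest .. latest)
    forecasts: dict model -> dict product -> list of forecasts aligned with `actual`
               (each series is assumed to have at least `recent` months)
    Returns:   dict model -> (mean accuracy in the latest month,
                              variance of the monthly mean accuracy over `recent` months,
                              number of products to which the model is applicable)
    """
    models = list(forecasts)
    # Accuracy of every model on every product in the latest month.
    latest = {m: {p: accuracy(actual[p][-1], forecasts[m][p][-1]) for p in actual}
              for m in models}
    indicators = {}
    for m in models:
        # 1) Average forecast accuracy in the latest (next) month.
        acc = float(np.mean(list(latest[m].values())))
        # 2) Variance of the monthly average accuracy over the recent months.
        monthly = [np.mean([accuracy(actual[p][t], forecasts[m][p][t]) for p in actual])
                   for t in range(-recent, 0)]
        var = float(np.var(monthly))
        # 3) A model is applicable to a product when it is among the k most
        #    accurate models for that product.
        applicable = 0
        for p in actual:
            kth_best = sorted((latest[mm][p] for mm in models),
                              reverse=True)[min(k, len(models)) - 1]
            applicable += latest[m][p] >= kth_best
        indicators[m] = (acc, var, applicable)
    return indicators
```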

There are five components in the data visualization module: (a) an overview which allows rerunning the algorithm to forecast the demand in another month, supports interactively adjusting the weight for ranking demand forecasting models, and summarizes the performance of different models on the products selected by users, (b) a similarity view which encodes the similarity between time series with the distance between the points in a scatter plot and allows users to select clusters for detailed analysis, (c) a product view which displays the detailed performance of models on individual products with a novel glyph design, (d) a detail view which presents the historical demand data of a user-specified product and the accuracy of the forecasts in past months by the top k models, and (e) a risk identification view which shows the features of similar historical demand data, and assists users in evaluating the risk of a model. The visualization module also supports several interactions such as linked-filtering and the tooltip.

With DFSeer, the user can compare and select different types of demand forecasting models with levels of details. Besides, the methodology applied in model selection (i.e., model selection is based on the performance of models on the products with similar historical demand data) can increase the confidence of users in the selected model.

Ranking Models Interactively
DFSeer enables users to adjust the weight of the performance indicators for ranking demand forecasting models (R1). The models are re-ranked according to the adjustment, with the summarized performance indicators of each model presented in a ranked list. This presents the macroscopic information to support model selection for demand forecasting (R2, R5).

As shown in the upper part of the overview (Figure 2a1), three sliders are provided for users to adjust the weight of the forecast accuracy, the variance of the forecast accuracy in recent months, and the number of products to which a demand forecasting model is applicable. There is another slider for parameter "top k", which indicates that among the models forecasting the demand of a product, only the most accurate k models are considered applicable to that product. The ranked list of the demand forecasting models is shown below the sliders (Figure 2a2), which summarizes the model performance on the products selected by users. For each model, the leftmost and rightmost rightward bars display the accuracy of the forecast and the number of the products to which a demand forecasting model is applicable respectively. The circle in the middle encodes the variance of the forecast accuracy in recent months with the color saturation, which is consistent with other views. Only the top k models in the ranked list are depicted by different colors other than gray.
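A minimal sketch of how the slider weights might be combined into a single ranking score is given below. The min-max normalization, the sign convention (lower variance is better), and the example indicator values are illustrative assumptions, not the exact scoring rule of DFSeer.

```python
# Minimal sketch: weighted ranking of models from three normalized indicators.
import numpy as np

def rank_models(indicators, w_acc=1.0, w_var=1.0, w_app=1.0):
    """indicators: dict model -> (accuracy, variance, applicable_count)."""
    models = list(indicators)
    vals = np.array([indicators[m] for m in models], dtype=float)  # (n_models, 3)
    acc, var, app = vals[:, 0], vals[:, 1], vals[:, 2]
    norm = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-9)
    # Variance is a "lower is better" indicator, so it enters with a negative sign.
    score = w_acc * norm(acc) - w_var * norm(var) + w_app * norm(app)
    return sorted(zip(models, score), key=lambda t: -t[1])

# Example (hypothetical values): re-rank after the user raises the stability weight.
ranked = rank_models({"myarimax": (0.82, 0.010, 37),
                      "cat": (0.80, 0.004, 35),
                      "knn": (0.79, 0.005, 36)}, w_var=3.0)
print(ranked)
```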

Selecting Similar Products
The similarity view (Figure 2b) allows users to select the products with similar historical demand data in a scatter plot (R2). We employ the Euclidean Distance [21] to measure the similarity between the z-normalized [27] time series. The Euclidean Distance is adopted because it is efficient [18] and can handle noisy data. Then, multidimensional scaling (MDS) [7] is applied to map the time series to a scatter plot, where the relative distance between two points indicates the similarity between two time series. In the scatter plot, the red points represent the time series to be forecasted, while the light purple points encode the time series forecasted before. Users can select a group of adjacent points by drawing a lasso to explore the performance of models on individual products.
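The following sketch illustrates this similarity pipeline: z-normalize each historical demand series, compute pairwise Euclidean distances, and project the distance matrix to 2-D with MDS for the scatter plot. The use of scikit-learn's MDS and the synthetic input are assumptions; DFSeer's actual implementation may differ.

```python
# Minimal sketch: z-normalization, Euclidean distances, and an MDS layout.
import numpy as np
from sklearn.manifold import MDS

def z_normalize(series):
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / (series.std() + 1e-9)

def similarity_layout(demand_matrix, random_state=0):
    """demand_matrix: (n_products, n_months) array of historical demand."""
    z = np.array([z_normalize(row) for row in demand_matrix])
    # Pairwise Euclidean distances between z-normalized series.
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # 2-D layout in which nearby points correspond to similar demand histories.
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=random_state).fit_transform(dist)
    return dist, coords

rng = np.random.default_rng(1)
dist, coords = similarity_layout(rng.random((20, 36)))  # 20 products, 36 months
print(coords.shape)  # (20, 2)
```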

Figure 2. DFSeer supports interactive model selection for demand forecasting and reveals the risk of the selected model. (a) The overview enables users to adjust the parameters for ranking demand forecasting models (a1) and presents the ranked list of models (a2). (b) The similarity view allows users to select the products with similar historical demand data in a scatter plot. (c) The product view encodes the model performance on individual products with bar charts (c1) and carefully-designed glyphs (c2). (d) The detail view shows the monthly forecast accuracy of the top k models (d1), and the real and forecasted demand of a user-selected product (d2). The right part of d2 is enlarged and shown in d3. (e) The risk identification view presents the extracted properties of the historical demand (e1, e2) and the model performance difference on similar products to reveal the risk of the selected model.

Exploring Model Performance on Products
After selecting a cluster of time series of interest, users can view the performance of models on individual products and further explore the forecast details of a user-specified product in recent months (R3). This enables users to select demand forecasting models by comparing their mesoscopic and microscopic information (R2, R5).

For the time series to be forecasted, the product view (Figure 2c) employs a bar chart (Figure 2c1) to display the forecasted value of the top k models in the overview. The order of the bars is the same as that in the ranked list. For the time series forecasted before, we exploit a well-crafted glyph design (Figure 2c2) to describe the performance of the forecasts by different models. The current visual design is inspired by prior studies on multi-dimensional data visualization [62, 61, 12, 57] and we worked closely with domain experts to collect their feedback on the visual design in an iterative manner. The glyph takes a circular design, which encodes the most accurate k models by different sectors. The sectors are arranged clockwise according to the forecast accuracy. In each sector, the inner arc depicts the variance of the forecast accuracy in recent months with the color saturation, while the outer arc represents the accuracy of the forecast by the radius. The color of the outer arc encodes the model type. In the left part of the glyph, a violin plot shows the accuracy distribution of other models. However, the circular glyph design is not very convenient for comparing model performance on different products. DFSeer also supports a bar chart-based glyph design (Figure 3b), where the upper bars depict the forecast accuracy of the top k models and the lower rectangles encode the variance of the forecast accuracy with the color saturation. A marginal density plot on the right shows the accuracy distribution of other models. Since the circular glyph design can help users distinguish the performance indicators on the products forecasted before (Figure 2c2) from the forecasting results of future product demand (the bar chart shown in Figure 2c1), and it is also visually attractive, DFSeer uses the circular glyph design as the default choice. But users can interactively switch to the bar chart-based glyph design.

Figure 3. The glyph designs to describe the model performance on each product: (a) the circular glyph design used in DFSeer, (b) a bar chart-based glyph design, (c) a matrix view-based glyph design, and (d) a star plot-based glyph design.

Two alternatives of the glyph were considered in our iterative design process, as shown in Figure 3. Our earliest attempt was to use a matrix view (Figure 3c) to describe the performance of models on individual products. Each row of the matrix represents a product, while each column represents a model. However, much manual effort is required to find the most suitable models for each product. Next, a star plot (Figure 3d) was adopted to show the performance of different models, where each star denotes a product, and each spoke denotes the performance of a model. However, showing dozens of spokes is overwhelming and it is also not easy to identify the most accurate k models for each product in the star plot.

After users click a bar chart (i.e., the time series to be forecasted) in the product view, the detailed performance of the top k models, the actual demand data and the latest forecast in the past months will be shown in the detail view (Figure 2d). Similar to the overview, a bar chart (Figure 2d1) is used to express the accuracy of the forecasts in the past months. The horizontal axis of the bar chart encodes the month, while the height encodes the accuracy. The bar charts are vertically aligned, whose order and color are consistent with those in the overview. In addition, users can also inspect the raw data (i.e., the actual and forecasted demand) in a multi-line chart below the bar charts (Figure 2d2).

Identifying the Risk of the Selected Model
When users select a model to forecast the demand, it is critical to reveal the risk of the selected model. The risk identification view (Figure 2e) is proposed to extract the properties of the time series (i.e., the historical demand of products) that are similar to the one that users specify in the product view and present the performance differences of the top k models on these time series. It can help users identify the probability that the selected model will produce an accurate forecast (R4).

To find the properties of a time series that are important for the forecast, we study how classical time series models make a forecast [34, 17] and discuss with domain experts how they judge whether two time series are similar. We then identify four key properties of a time series, i.e., the trend, the seasonality, the autocorrelation, and the stationarity. To extract the properties of a time series, we first apply an additive model [14] to decompose a time series to get the trend and seasonality components. Later, the Autocorrelation Function (ACF) plot [9] is utilized to analyze the correlation between the demand in different months. Additionally, the augmented Dickey-Fuller (ADF) test [13] can reveal the stationarity of a time series.
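A compact sketch of extracting these four properties with statsmodels is shown below. The 12-month period, the 0.05 significance threshold on the ADF p-value, and the synthetic series are illustrative assumptions, not necessarily the exact settings used in DFSeer.

```python
# Minimal sketch: trend, seasonality, autocorrelation, and stationarity.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import acf, adfuller

def extract_properties(demand, period=12):
    """demand: pd.Series of monthly demand with a DatetimeIndex."""
    # Trend and seasonality from an additive decomposition.
    decomp = seasonal_decompose(demand, model="additive", period=period)
    # Autocorrelation at monthly lags, with 95% confidence intervals.
    autocorr, confint = acf(demand, nlags=period, alpha=0.05)
    # Stationarity via the augmented Dickey-Fuller test.
    p_value = adfuller(demand.dropna())[1]
    return {"trend": decomp.trend, "seasonality": decomp.seasonal,
            "autocorrelation": autocorr, "acf_confint": confint,
            "stationary": p_value < 0.05, "adf_p_value": p_value}

idx = pd.date_range("2015-01-01", periods=47, freq="MS")
demand = pd.Series(100 + 10 * np.sin(2 * np.pi * np.arange(47) / 12)
                   + np.random.default_rng(2).normal(0, 3, 47), index=idx)
print(extract_properties(demand)["stationary"])
```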

In the risk identification view, the first row presents the properties of the time series to be forecasted, which is specified by users. The following rows show the properties of the time series forecasted before and the performance of the top k models, which are ranked by the similarity between the time series and the user-specified one. In the left part of the risk identification view (Figure 2e1), a line chart shows the historical demand, overlaid with an area chart to display the trend of the time series. Figure 2e2 utilizes a black line to exhibit the seasonality of a time series, which is overlaid with a bar chart to present the autocorrelation function. The two light blue lines indicate the 95% confidence interval of the autocorrelation. Besides, a rectangle is placed in the right part of Figure 2e2 to encode the p-value output by the ADF test. The rectangle is green if the historical demand data is stationary, or white if the data is not stationary. A vertically arranged parallel coordinates plot in the right part of the risk identification view (Figure 2e3) shows the performance of the top k models in forecasting the demand of the similar products. In the parallel coordinates plot, each horizontal axis represents a time series forecasted before, while each of the top k models is denoted as a polyline. The intersection of the polyline and the axis is displayed as a circle, whose horizontal position encodes the forecast accuracy and color saturation encodes the variance of the forecast accuracy in recent months. We encode the variance with the color saturation instead of the size of a circle or a bar to avoid possible occlusion. When exploring the time series forecasted before and the performance of the top k models, users are allowed to remove a time series if they think it is dissimilar to the time series to be forecasted.

Figure 4. (a) item1757 gains a significantly high demand forecasting value by myarimax in August 2018. (b) An abrupt sharp increase in demand in June 2018 results in low forecast accuracy. (c) myarimax achieves a much better demand forecasting accuracy in July 2018 than other top models, indicating its excellent sensitivity in learning an abrupt demand increase.

CASE STUDY
In this section, we conducted two case studies to demonstrate the effectiveness of DFSeer. The users involved in the case studies are the domain experts from the manufacturing company we collaborate with, who also attended the interview study in the next section. The dataset used consists of monthly demand of 1,096 products from January 2015 to November 2018. The actual names and types of the products are removed due to privacy issues.

Inspecting Top Models: Best Model Does Not Work for All
In the first case study, a user was asked to select a suitable model to forecast the demand in August 2018 for any cluster of products of his interest. After loading the data to DFSeer and setting the forecast period, he immediately found a cluster from the similarity view (Figure 2b). The user first set the parameter top k to 5 in the overview and filtered the data to gain a rough idea of the model performance (R5). From the ranked list of models, the user quickly noticed that the forecast accuracy of different models is a bit low. Therefore, he further increased the weights of accuracy and applicability to check which model can lead to higher average accuracy (R1).


Figure 5. Identify the difference between "similar" models by using DFSeer: The user found that the top 2 models (i.e., cat and knn) have very similar average performance (a2) and an overall similar distribution of ranking among the top 5 models for the selected products (c2). He then clicked an example product of type76 (i.e., item1612) and found the two models have similar variance of the monthly accuracy (d1) and a close trend of the forecast accuracy for the similar products (e). But the forecasted demand by cat is consistently lower than that by knn (d2 or d3). To mitigate the shortage of this type of key products, he selected knn to forecast the demand for the products of type76.

The ranked list (Figure 2a2) shows that the model myarimax (i.e., a manually tuned ARIMAX model [60]) achieves the highest average accuracy. Meanwhile, the circular glyphs in the product view (Figure 2c2) also indicate that myarimax consistently ranks as one of the top 5 models when forecasting the demand of the majority of the similar products shown in the product view (R3). By quickly clicking the similar products to explore the performance details of the top 5 models, the user observed that myarimax is more accurate than other top models when forecasting the demand in different months (Figure 2d1) and the forecasted demand by the model is closer to the real one (Figure 2d2 or 2d3). He also compared the performance of the top 5 models on similar products in the risk identification view and found that myarimax is more accurate than other top models for most products (Figure 2e3). Thus, the user became more confident in myarimax and decided to consider it as a candidate model for forecasting the demand of most of the selected products (R2).

However, the user also quickly identified some abnormal forecasts from the product view. The red bars of two products (i.e., item1757 and item1830) are surrounded by a black border, as shown in Figure 2c1. It means that the product demand forecasted by myarimax exceeds a threshold predefined based on domain knowledge. So the user chose one of the two products (i.e., item1757) for further analysis. By looking at the detail view, the user found that item1757 and other products of the same type (i.e., type53) usually have small demand values, but only item1757 gains a significantly high demand forecasting value by myarimax (Figure 4a). Such an abnormal observation gives the user the impression that the forecasting result of item1757 by myarimax may be problematic. By further checking the historical demand of item1757, the user easily observed that item1757 has an abrupt sharp increase in demand in June (Figure 4b) and July (Figure 4c). myarimax achieves a much better demand forecasting accuracy in July than other top models, indicating its excellent sensitivity in learning an abrupt demand increase. However, it overrates this trend and forecasts an abnormally high product demand of item1757 in August. For the other product (i.e., item1830), the user also observed similar issues.

These observations from DFSeer informed the user of the potential risk of using myarimax to forecast product demand (R4). Although it is the "best" model in terms of the overall performance, it can also overrate a sudden increase in product demand, resulting in demand forecasting failures.

Breaking the Tie: "Similar" Models Can Be Different
In this case study, another user was asked to finish exactly the same task as the first case study. But he selected a different group of similar products by brushing in the similarity view, as is shown in Figure 5b.

By browsing the product view (Figure 5c1 and 5c2), the user easily noticed that many products belong to type76. Since the products of type76 are the sub-components for producing other products and play a critical role in the whole supply chain of the factory, a demand forecasting model that is both accurate and stable will be preferred. Therefore, the user significantly increased the weights of accuracy and variance (Figure 5a1) to favor the demand forecasting models that have good average forecast accuracy and also perform consistently well along the time (R1). The ranked list of models (Figure 5a2) shows that the model cat (i.e., CatBoost [19]) and the model knn (i.e., the KNN model [3]) rank the first and second, respectively. But the two models have very similar performance in terms of the average accuracy and variance, making it difficult to distinguish which model is actually more appropriate for the selected products. Therefore, he further checked the performance distribution of the top five models on the selected products whose demand was forecasted before (R3, R5). But the user still found an overall similar distribution of how often cat and knn rank within the top five models for the selected products, as shown in Figure 5c2.

However, when the user continued to closely explore the product view, he quickly observed that cat (the red bars) tends to always forecast lower future demand for the selected products (especially the products of type76) than knn (the blue bars), as shown in Figure 5c1. This phenomenon is beyond the expectation of the user, since both models achieved similar accuracy and variance for different products. Therefore, he selected a product of type76 (i.e., item1612) from the product view to inspect more details. The detail view shows that cat and knn have a very similar demand forecasting accuracy along the time for item1612 (Figure 5d1), and their demand forecasting results for other products with similar historical demand share a close trend (Figure 5e). However, when the user carefully checked the line charts of the forecasting results in the detail view (Figure 5d2 or 5d3), he clearly observed that the demand forecasting result by cat is consistently lower than that by knn along the time regardless of the drop and rise of the actual demand (R4). Such an observation was also found for other products of type76.

As mentioned above, the products of type76 are critical in the whole supply chain of the factory. The shortage of these products can bring serious trouble to the manufacturing of the factory. Therefore, despite the similarity between cat and knn, the user finally decided to choose knn as the demand forecasting model for the products of type76, since it often has higher demand forecasts than cat (R2). Though such a choice can increase the risk of overstock, it can better mitigate the shortage of these key products.

EXPERT INTERVIEW
We conducted semi-structured interviews with four domain experts working on model selection for demand forecasting to demonstrate the effectiveness and usability of DFSeer. This section first introduces the interview procedures. We then summarize the feedback collected from the experts.

Participants and Apparatus
The participants include four domain experts from the manufacturing company we work with. The domain experts are different from the experts attending the requirement analysis interview and haven't participated in the design of the DFSeer system. Two of the domain experts (E1 and E2) are algorithm developers who are working on developing demand forecasting algorithms, while the other two (E3 and E4) are demand analysts who are responsible for selecting the most suitable demand forecasting models for different products from dozens of algorithms provided by algorithm developers. The interviews were conducted with a 23.8-inch 1920 x 1080 monitor showing the visual analytics system.

Table 1. The questions in T1 are related to model performance in different levels of details: the overview (Q1 - Q3), the product view (Q4 - Q5), the detail view (Q6 - Q7), and the risk identification view (Q8 - Q9).
Q1. Find the most accurate demand forecasting model.
Q2. Find the most stable demand forecasting model.
Q3. Find the demand forecasting model which has the best applicability.
Q4. Find the model whose forecast for the target month is different from other top k models.
Q5. Find the product for whom the forecast by the top k models is the most accurate.
Q6. For a given product, find the most unstable model among the top k models.
Q7. For a given product, find the model whose forecast is the closest to the real demand.
Q8. Find the product whose historical demand is the most dissimilar to the selected one.
Q9. Given the products which are similar to a selected one, find the product whose forecasted demand is the most accurate.

Dataset and Tasks
The dataset used in the expert interviews is the same as that in the case studies, which contains the monthly demand data of 1,096 products from January 2015 to November 2018. During the interviews, we asked the domain experts to fulfill two carefully-designed tasks to assess the usability of the visualization system and understand how the domain expert selects a demand forecasting model.

• T1. Analyze the result of model selection for a group of similar products.
• T2. Select suitable models to forecast the demand of pre-selected products.

In T1, we plan to select a model to forecast the demand of products in August 2018 and pre-select a group of similar products. The corresponding model performance on these products is displayed in the visualization system. The domain experts are then asked to answer nine questions related to the model performance in different levels of details, as shown in Table 1. In T2, the domain experts are required to find the best model for two types of products, i.e., the general products which emphasize more on the accuracy of a demand forecasting model (e.g., switches and indicator lights), and the products that are important to the supply chain of the factory and need a more stable model (e.g., high-speed optical modules and digital signal controllers).

Procedure
During the interview, we first introduced the methodology of demand forecasting model selection, the visual design and interactions, and the functionality of each part of the DFSeer system to the domain experts. After that, an example usage scenario was given to help them understand how to use the visualization system to select demand forecasting models. Then, the participants were allowed to explore the system on their own. The aforementioned tutorial lasted for about 20 minutes. After the tutorial, the domain experts were invited to discuss how they select a demand forecasting model and what information is used to support their decision in their daily work. Then, the participants were guided to carry out two tasks. Both tasks employed a think-aloud protocol, and the comments and suggestions of the domain experts were all recorded. After accomplishing the two tasks, the participants were asked to describe the workflow of using the DFSeer system to select demand forecasting models. They were also encouraged to compare DFSeer with the approach currently used in their daily work. Then, we invited the participants to rate the visual analytics system on a 7-point Likert scale (1 - strongly disagree, 7 - strongly agree) from three aspects. The detailed questions are shown in Table 2. Finally, the domain experts attended a post-study interview where they gave comments and suggestions on the workflow and trust, the visual design and interactions, and the usability of the DFSeer system. The study for each domain expert lasted for approximately one hour and 20 minutes.

Table 2. The questionnaire consists of three parts: the workflow and trust (Q1 - Q4), the visual design and interactions (Q5 - Q8), and the usability (Q9 - Q12).
Q1. The model selection based on the products with similar historical demand is reasonable.
Q2. The workflow provides useful guidance for me to select suitable demand forecasting models.
Q3. The workflow can reduce my workload in model selection for product demand forecasting.
Q4. I have confidence in the model selected by using the visualization system.
Q5. The visual design is intuitive.
Q6. The visual design is easy to understand.
Q7. The visual design provides enough information for me to select a demand forecasting model.
Q8. The visual design and interactions help me compare and select demand forecasting models.
Q9. The visual analytics system is easy to use.
Q10. The visual analytics system is helpful for demand forecasting model selection.
Q11. I would like to use the visual analytics system to select demand forecasting models in the future.
Q12. I will recommend the visual analytics system to my colleagues working on forecasting model selection.

Results
We summarize the feedback of the interviews and the result of the questionnaire into three parts:

Workflow and trust. All the domain experts agreed that the workflow for demand forecasting model selection in our work is helpful and increases their trust in the selected model. E3 said "In our daily work, model selection is based on the average accuracy of a model. It is difficult for me to trust that the model would give an accurate forecast for a certain product." He commented that the proposed workflow, which supports the selection of models based on their performance on similar products, is reasonable and trustworthy, since the algorithm will often produce similar results for similar time series data [10, 4, 21] and the change of demand over months follows the rule of product lifecycle [50]. E2 mentioned "Having seen the detailed model performance on similar products, I have a better understanding of the models and trust the model I selected." E4 reported it was difficult to manually identify the products with dissimilar historical demand. With the properties extracted and presented in the risk identification view, he was able to find dissimilar time series without much effort. When performing T2, E1 found the model he selected is different from that by others. After checking the risk identification view, he realized many dissimilar products were included. He then shrunk the scope of selecting similar products and got a suitable model. E1 commented "The risk identification view helps me check whether the scope of selecting similar products is suitable."

The domain experts also gave suggestions on the selection of products with similar historical demand data. E2 suggested involving the relationship between different types of products in the computation of similarity. E4 mentioned there are too many points in the similarity view and recommended that the system can further filter products by the type of products and the relationship between them.

Visual design and interactions. The domain experts reported that the visual design is informative and the interactions are user-friendly. E3 compared DFSeer with the approach currently employed in their work and commented "DFSeer provides an intuitive and easy way for me to explore the performance of models. Without it, I need to scroll a large tabular form to search for the information I need." E4 liked the glyph design in the product view and mentioned "The bar chart can reveal which model produces a different forecast, while the circular glyph shows which model is more accurate or stable." He also stated that the glyph design is in line with their daily work, where the domain experts only focused on several top-ranked models. The domain experts were surprised by the capability of the risk identification view. E1 found there were several products with few historical demand records and removed them from the risk identification view. He commented "the tool combines the power of automated algorithms and our knowledge to judge whether a product is similar or not". E2 mentioned the interactions are easy and powerful. He said "Through the interactions with the visualization system, including adjusting the weights for ranking models, selecting similar products, and removing dissimilar historical demand data, I get highly involved in how the models are ranked and gain a better understanding of the models."

Despite the positive feedback, the domain experts commented that the product view and the risk identification view had a steep learning curve, especially for the users who have little knowledge of visualization.

Usability. The domain experts confirmed that the DFSeer system is easy to use and can facilitate their work in model selection for demand forecasting. It took them about 25 minutes to finish the tasks (including the task introduction), where no task was skipped. Only one participant made an error when answering the objective questions in T1 (i.e., Q5), which is because there are two products with very similar model performance. This shows that DFSeer helps analyze the model performance on similar products. E4 said "Compared with our daily work where the similarity between products is manually determined based on the type of products, DFSeer provides a more reasonable and flexible manner for finding similar products." The domain experts agreed that DFSeer provides helpful guidance for them to find an appropriate demand forecasting model and lightens their workload a lot. E3 mentioned the visual analytics system was easy to learn and use. He appreciated the nice experience of using the system, which was better than using the tedious tabular form. The domain experts planned to deploy the DFSeer system in the manufacturing factory to facilitate their daily work of model selection. They agreed it is worth spending around 20 minutes learning to use the system.

Besides the positive comments on the usability of the DFSeer system, the domain experts also put forward several constructive suggestions. E4 expressed that the system could involve other advanced similarity calculation methods and allow users to choose which method to use. E1 suggested a need for post-processing that can provide a feasible demand value when the selected model offers an abnormal forecast.

DISCUSSION
Although DFSeer has demonstrated its ability to facilitate model selection for demand forecasting, limitations still exist. In this section, we discuss visual design scalability, the lessons learned, and the generalizability and extensions of DFSeer.

Visual Design Scalability
The similarity view of the DFSeer system displays 3,288 time series for users to select similar products. To present a larger number of time series, measures should be taken to reduce the overlap of data points. For example, we can adopt a hierarchical strategy which first asks users to assign the type of products and specify the range of distance for searching similar products, and then shows only the products inside this range. The visual analytics system only presents the detailed performance of fewer than ten models. This is in accordance with the workflow of selecting demand forecasting models in the manufacturing factory, where domain experts only focus on several models with the best performance. Showing the detailed performance of all the models could increase the visual burden on users.
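
As an illustration of the hierarchical strategy above, a two-stage filter could first restrict candidates to the chosen product type and then keep only the products whose demand history lies within a user-specified distance of the query. The sketch below is a simplified assumption of how such a filter might look; the field names and the pluggable distance function are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

def filter_candidates(products: List[Dict],
                      query_history: Sequence[float],
                      product_type: str,
                      max_distance: float,
                      distance_fn: Callable[[Sequence[float], Sequence[float]], float]) -> List[Dict]:
    """Stage 1: keep products of the requested type.
    Stage 2: keep those whose history is within max_distance of the query."""
    same_type = (p for p in products if p["type"] == product_type)
    return [p for p in same_type
            if distance_fn(p["history"], query_history) <= max_distance]
```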

Lessons Learned
By working with domain experts, we realize that showing the internal structure of models to demand analysts may not be that useful. In our study, the demand analysts often do not have a strong background in machine learning. Hence, instead of showing the internal structure of models, DFSeer visualizes the patterns of the input (i.e., time series), the output (i.e., forecasted demand and performance indicators), and the correlations between them. Besides, we find that the model ranked first by the algorithm may not always be the best. For example, in our second case study, the domain experts selected the model ranked second because, compared with the model ranked first, it could relieve the shortage of the products that are critical to the supply chain. In addition, our work shows that model transparency is highly appreciated and needed in model selection for demand forecasting. Instead of only being told to use a model, users also want to know why some time series are considered similar and how well the models perform on these similar products.
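
The algorithmic ranking mentioned above, which the experts sometimes overrode, can be read as a weighted combination of per-model indicators such as accuracy, stability, and product coverage, with the weights being the ones users adjust interactively. The following sketch is a hypothetical formulation, assuming each indicator has already been normalized to [0, 1] with larger values being better, and the model names are purely illustrative; it is not the exact scoring used in DFSeer.

```python
def weighted_rank(indicators, weights):
    """indicators: {model_name: {"accuracy": ..., "stability": ..., "coverage": ...}}
    weights:    {"accuracy": ..., "stability": ..., "coverage": ...}
    Returns model names sorted by weighted score, best first."""
    scores = {model: sum(weights[key] * value for key, value in metrics.items())
              for model, metrics in indicators.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Example: emphasizing accuracy over stability and coverage.
ranking = weighted_rank(
    {"model_a": {"accuracy": 0.82, "stability": 0.60, "coverage": 0.90},
     "model_b": {"accuracy": 0.78, "stability": 0.75, "coverage": 0.95}},
    {"accuracy": 0.5, "stability": 0.3, "coverage": 0.2})
```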

Generalizability and Extensions
Although DFSeer is used in our study to support model selection for demand forecasting, it can be adapted to other application scenarios, such as stock market prediction, power usage prediction, traffic flow forecast, and medical diagnosis. For example, in the scenario of power usage prediction, DFSeer can group residents based on their residence locations and incomes, and help power managers find suitable prediction models for different groups of residents and identify potential power shortages. In the scenario of medical diagnosis, DFSeer can be adapted to diagnosing an illness according to the cases with similar symptoms, which can help doctors make a diagnosis by summarizing the predictions of multiple models. DFSeer can also be extended to increase its power in model selection for demand forecasting. First, other competitive time series forecasting models can be involved in our system. Second, the relationship between different types of products can be considered when computing the similarity between time series. Third, the system can provide more distance calculation strategies for users to choose from.
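
One way to support the third extension, multiple distance calculation strategies, would be a small registry of interchangeable distance functions, for instance Euclidean distance and dynamic time warping (DTW). The snippet below is a generic sketch of that idea using the standard textbook DTW recurrence; it does not reflect the distance measure currently used in DFSeer.

```python
import numpy as np

def euclidean(a, b):
    """Pointwise Euclidean distance between two equally long series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# A registry like this lets the interface expose the strategies for users to pick from.
DISTANCE_STRATEGIES = {"euclidean": euclidean, "dtw": dtw}
```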

CONCLUSION AND FUTURE WORK
We have presented DFSeer, a visual analytics system to help data analysts select demand forecasting models based on the products with similar historical demand. The visualization system provides different levels of details for users to explore and compare the performance of demand forecasting models. Besides, the DFSeer system can reveal the risk of a model by showing the performance difference of the model on similar products. We evaluated the visual analytics system through two case studies with product demand data and semi-structured interviews with four domain experts working on model selection for demand forecasting in a manufacturing factory, which shows that DFSeer is helpful and effective in facilitating model selection for demand forecasting.

In future work, we plan to include more time series forecasting algorithms in the visualization system. Besides, we would like to consider the relationship between different types of products when computing the distance between the historical demand data. We also want to enable users to filter products by type in the similarity view. Finally, we intend to provide other advanced approaches to compute the distance between time series and allow users to select the approach on their own.

ACKNOWLEDGMENTS
We thank all the domain experts participating in the study and the reviewers for their constructive suggestions. This work is partially funded by RGC GRF 16241916.


REFERENCES

[1] Nesreen K Ahmed, Amir F Atiya, Neamat El Gayar, and Hisham El-Shishiny. 2010. An empirical comparison of machine learning models for time series forecasting. Econometric Reviews 29, 5-6 (2010), 594–621.

[2] Wolfgang Aigner, Silvia Miksch, Heidrun Schumann, and Christian Tominski. 2011. Visualization of time-oriented data. Springer Science & Business Media.

[3] Naomi S Altman. 1992. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46, 3 (1992), 175–185.

[4] Francisco Martinez Alvarez, Alicia Troncoso, Jose C Riquelme, and Jesus S Aguilar Ruiz. 2010. Energy time series forecasting based on pattern sequence similarity. IEEE Transactions on Knowledge and Data Engineering 23, 8 (2010), 1230–1243.

[5] Benjamin Bach, Pierre Dragicevic, Daniel Archambault, Christophe Hurter, and Sheelagh Carpendale. 2014. A review of temporal data visualizations based on space-time cube operations.

[6] Markus Bögl, Wolfgang Aigner, Peter Filzmoser, Tim Lammarsch, Silvia Miksch, and Alexander Rind. 2013. Visual analytics for model selection in time series analysis. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 2237–2246.

[7] Ingwer Borg and Patrick Groenen. 2003. Modern multidimensional scaling: Theory and applications. Journal of Educational Measurement 40, 3 (2003), 277–280.

[8] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. 2015. Time Series Analysis: Forecasting and Control. John Wiley & Sons.

[9] Peter J Brockwell and Richard A Davis. 2016. Introduction to Time Series and Forecasting. Springer.

[10] Paolo Buono, Catherine Plaisant, Adalberto Simeone, Aleks Aris, Ben Shneiderman, Galit Shmueli, and Wolfgang Jank. 2007. Similarity-based forecasting with simultaneous previews: A river plot interface for time series forecasting. In 2007 11th International Conference Information Visualization (IV’07). IEEE, 191–196.

[11] Stef van Buuren and Miranda Fredriks. 2001. Worm plot: a simple diagnostic device for modelling growth reference curves. Statistics in Medicine 20, 8 (2001), 1259–1277.

[12] Yuanzhe Chen, Qing Chen, Mingqian Zhao, Sebastien Boyer, Kalyan Veeramachaneni, and Huamin Qu. 2016. DropoutSeer: Visualizing learning patterns in Massive Open Online Courses for dropout reasoning and prediction. In 2016 IEEE Conference on Visual Analytics Science and Technology (VAST). IEEE, 111–120.

[13] Yin-Wong Cheung and Kon S Lai. 1995. Lag order and critical values of the augmented Dickey–Fuller test. Journal of Business & Economic Statistics 13, 3 (1995), 277–280.

[14] William P Cleveland and George C Tiao. 1976. Decomposition of seasonal time series: A model for the Census X-11 program. J. Amer. Statist. Assoc. 71, 355 (1976), 581–587.

[15] Timothy J Cole and Pamela J Green. 1992. Smoothing reference centile curves: the LMS method and penalized likelihood. Statistics in Medicine 11, 10 (1992), 1305–1319.

[16] R Dennis Cook and Sanford Weisberg. 1982. Residuals and influence in regression. New York: Chapman and Hall.

[17] Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. 2011. Forecasting time series with complex seasonal patterns using exponential smoothing. J. Amer. Statist. Assoc. 106, 496 (2011), 1513–1527.

[18] Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment 1, 2 (2008), 1542–1552.

[19] Anna Veronika Dorogush, Vasily Ershov, and Andrey Gulin. 2018. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 (2018).

[20] Christos Faloutsos, Jan Gasthaus, Tim Januschowski, and Yuyang Wang. 2018. Forecasting big time series: old and new. Proceedings of the VLDB Endowment 11, 12 (2018), 2102–2105.

[21] Christos Faloutsos, Mudumbai Ranganathan, and Yannis Manolopoulos. 1994. Fast Subsequence Matching in Time-series Databases. Vol. 23. ACM.

[22] Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research 15, 1 (2014), 3133–3181.

[23] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems. 2962–2970.

[24] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 1126–1135.

[25] Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics (2001), 1189–1232.


[26] Jonah Gabry, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman. 2019. Visualization in Bayesian workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society) 182, 2 (2019), 389–402.

[27] Dina Q Goldin and Paris C Kanellakis. 1995. On similarity queries for time-series data: constraint specification and implementation. In International Conference on Principles and Practice of Constraint Programming. Springer, 137–153.

[28] Steve R Gunn and others. 1998. Support vector machines for classification and regression. ISIS Technical Report 14, 1 (1998), 5–16.

[29] Anshuman Gupta and Costas D Maranas. 2003. Managing demand uncertainty in supply chain planning. Computers & Chemical Engineering 27, 8-9 (2003), 1219–1227.

[30] H2O.ai. 2017. AutoML. http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html. (2017). Accessed: 2019-08-11.

[31] Arthur E Hoerl and Robert W Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 1 (1970), 55–67.

[32] Fred Hohman, Andrew Head, Rich Caruana, Robert DeLine, and Steven M Drucker. 2019. Gamut: A design probe to understand how data scientists understand machine learning models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 579.

[33] Rob Hyndman, Anne B Koehler, J Keith Ord, and Ralph D Snyder. 2008. Forecasting with Exponential Smoothing: the State Space Approach. Springer Science & Business Media.

[34] Rob J Hyndman, Yeasmin Khandakar, and others. 2007. Automatic Time Series for Forecasting: the Forecast Package for R. Number 6/07. Monash University, Department of Econometrics and Business Statistics.

[35] Lars Kotthoff, Chris Thornton, Holger H Hoos, Frank Hutter, and Kevin Leyton-Brown. 2017. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. The Journal of Machine Learning Research 18, 1 (2017), 826–830.

[36] Josua Krause, Adam Perer, and Enrico Bertini. 2014. INFUSE: interactive feature selection for predictive modeling of high dimensional data. IEEE Transactions on Visualization and Computer Graphics 20, 12 (2014), 1614–1623.

[37] Josua Krause, Adam Perer, and Kenney Ng. 2016. Interacting with predictions: Visual inspection of black-box machine learning models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 5686–5697.

[38] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.

[39] Hau L Lee. 2002. Aligning supply chain strategies with product uncertainties. California Management Review 44, 3 (2002), 105–119.

[40] Lezhi Li, Yunfeng Bai, and Yang Wang. 2019. Manifold: A Model-Agnostic Visual Debugging Tool for Machine Learning at Uber. (2019).

[41] Mengchen Liu, Jiaxin Shi, Zhen Li, Chongxuan Li, Jun Zhu, and Shixia Liu. 2016. Towards better analysis of deep convolutional neural networks. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2016), 91–100.

[42] Yafeng Lu, Robert Krüger, Dennis Thom, Feng Wang, Steffen Koch, Thomas Ertl, and Ross Maciejewski. 2014. Integrating predictive analytics and social media. In 2014 IEEE Conference on Visual Analytics Science and Technology (VAST). IEEE, 193–202.

[43] Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Cernocky, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.

[44] Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu. 2017. Understanding hidden memories of recurrent neural networks. In 2017 IEEE Conference on Visual Analytics Science and Technology (VAST). IEEE, 13–24.

[45] mljar. 2018. mljar. https://mljar.com/. (2018). Accessed: 2019-08-11.

[46] Wolfgang Müller and Heidrun Schumann. 2003. Visualization for modeling and simulation: visualization methods for time-dependent data - an overview. In Proceedings of the 35th Conference on Winter Simulation: Driving Innovation. Winter Simulation Conference, 737–745.

[47] William Playfair. 1801. The Commercial and Political Atlas: Representing, by Means of Stained Copper-plate Charts, the Progress of the Commerce, Revenues, Expenditure and Debts of England during the whole of the Eighteenth Century. T. Burton.

[48] Takafumi Saito, Hiroko Nakamura Miyamura, Mitsuyoshi Yamamoto, Hiroki Saito, Yuka Hoshiya, and Takumi Kaseda. 2005. Two-tone pseudo coloring: Compact visualization for one-dimensional data. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. IEEE, 173–180.

[49] Xingjian Shi and Dit-Yan Yeung. 2018. Machine learning for spatiotemporal sequence forecasting: A survey. arXiv preprint arXiv:1808.06865 (2018).

[50] John Stark. 2015. Product lifecycle management. In Product Lifecycle Management (Volume 1). Springer, 1–29.


[51] Dong Sun, Renfei Huang, Yuanzhe Chen, Yong Wang, Jia Zeng, Mingxuan Yuan, Ting-Chuen Pong, and Huamin Qu. 2019. PlanningVis: A Visual Analytics Approach to Production Planning in Smart Factories. IEEE Transactions on Visualization and Computer Graphics 26, 1 (2019), 579–589.

[52] Thomas Swearingen, Will Drevo, Bennett Cyphers, Alfredo Cuesta-Infante, Arun Ross, and Kalyan Veeramachaneni. 2017. ATM: A distributed, collaborative, scalable system for automated machine learning. In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 151–162.

[53] Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267–288.

[54] Jarke J Van Wijk and Edward R Van Selow. 1999. Cluster and calendar based visualization of time series data. In Proceedings 1999 IEEE Symposium on Information Visualization (InfoVis ’99). IEEE, 4–9.

[55] Junpeng Wang, Liang Gou, Hao Yang, and Han-Wei Shen. 2018a. Ganviz: A visual analytics approach to understand the adversarial game. IEEE Transactions on Visualization and Computer Graphics 24, 6 (2018), 1905–1917.

[56] Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J Smith, Kalyan Veeramachaneni, and Huamin Qu. 2019. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 681.

[57] Yong Wang, Hammad Haleem, Conglei Shi, Yanhong Wu, Xun Zhao, Siwei Fu, and Huamin Qu. 2018b. Towards easy comparison of local businesses using online reviews. In Computer Graphics Forum, Vol. 37. Wiley Online Library, 63–74.

[58] Marc Weber, Marc Alexa, and Wolfgang Müller. 2001. Visualizing time-series on spirals. In Infovis, Vol. 1. 7–14.

[59] M.B. Wilk and R. Gnanadesikan. 1968. Probability plotting methods for the analysis of data. Biometrika 55 (1968), 1–17.

[60] Billy M Williams. 2001. Multivariate vehicular traffic flow prediction: evaluation of ARIMAX modeling. Transportation Research Record 1776, 1 (2001), 194–200.

[61] Jian Zhao, Fanny Chevalier, and Ravin Balakrishnan. 2011. KronoMiner: using multi-foci navigation for the visual exploration of time-series data. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1737–1746.

[62] Xun Zhao, Yanhong Wu, Weiwei Cui, Xinnan Du, Yuan Chen, Yong Wang, Dik Lun Lee, and Huamin Qu. 2017. Skylens: Visual analysis of skyline on multi-dimensional data. IEEE Transactions on Visualization and Computer Graphics 24, 1 (2017), 246–255.