
Post on 26-Jul-2020


The Power of Data Visualization

by Vanban L. Wu, Ph.D., PMP, PSM

Indices: Data Science, Data Analysis and Visualization, Tableau, Python, Candlesticks, MACD.

Abstract:

The goal of this paper is to illustrate the power of data visualization in supporting data analysis. Visualization is an essential part of data science because it can surface findings that are hard to see in raw numbers. To demonstrate this, we use an application to stock price movement to show how data visualization can be employed to reveal interesting results and form meaningful observations.

1. Introduction:

In data science disciplines, data visualization plays an essential role in understanding the dynamics and correlations of a data set. Observations derived from this phase are usually followed by more robust analysis based on statistical models to evaluate the findings objectively. In general, data visualization is both a science and an art, skewing more toward the latter.

Syntactically, the presentation of a data visualization is constructed on a seven-layer graphic grammar, introduced in the next section. Semantically, it merges creativity and domain expertise to form convincing stories as part of the whole data science study.

To illustrate the power of data visualization, we choose a practical application that leads to some valuable and interesting observations. The application of choice is the analysis of the market movement of a stock traded on two different stock exchanges.

Readers who are not familiar with stock trading terminology, e.g. candlestick, Moving Average Convergence Divergence (MACD), and Simple Moving Average (SMA), may refer to a previously published paper by the same author [1].

2. Fundamentals of Graphic Grammar:

The grammar of graphics is a framework that helps organize and present the elements of a visual data design (Figure 2); see [2] for detailed explanations. This framework consists of seven layers:

• Data: A set of variables to be visualized

• Aesthetics: A layer of scales (e.g. x- and y-axis) onto which the data are mapped

• Geometry: The shape used to represent each variable, e.g. bars, lines, and points, see [5]

• Facets: Sub-plots that split variables for a clearer view

• Statistics: Aggregated data (summaries) or trends derived from statistical models

• Coordinates: The plotting space and its descriptions, a.k.a. legend

• Themes: Non-data ink such as colors, font sizes, and other design elements that enhance visual identity, see [6] on color selection

Figure 2

Many programming languages and tools support data visualization. One of the most common is Tableau, a user-friendly tool that was chosen for this report.

3. Candlestick Representation:

In the study of stock movement, the candlestick is a common way of representing the fluctuation of stock prices. For a predetermined interval (minutes, hours, days, or longer), a candlestick is composed of a body with two wicks, one at the top and the other at the bottom of the body, see Figure 3. In US stock trading, green represents an upward, or bullish, movement from the Open price to the Close price, and red represents a downward, or bearish, movement. Prices that traded outside the Open-Close range are represented by the wicks at either end of the body.
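The anatomy described above can be sketched in a few lines of Python. This is only an illustration: the class name Candle and the sample prices are hypothetical and do not come from the acquisition module or the downloaded data.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """One candlestick for a single trading interval (hypothetical helper)."""
    open: float
    high: float
    low: float
    close: float

    @property
    def is_bullish(self) -> bool:
        # Green candle: the close is above the open
        return self.close > self.open

    @property
    def body(self) -> float:
        # Height of the body, always non-negative
        return abs(self.close - self.open)

    @property
    def upper_wick(self) -> float:
        # Distance from the top of the body to the interval high
        return self.high - max(self.open, self.close)

    @property
    def lower_wick(self) -> float:
        # Distance from the interval low to the bottom of the body
        return min(self.open, self.close) - self.low


# Made-up prices, for illustration only
candle = Candle(open=100.0, high=106.0, low=98.0, close=104.0)
print(candle.is_bullish, candle.body, candle.upper_wick, candle.lower_wick)
```

With these sample values the candle is green, with a body of 4 and a wick of 2 on each side.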

A set of terms is used to describe specific candlestick forms, which help identify and predict trading patterns over a given interval. Interested readers can refer either to literature on the Internet or to the paper in reference [1].


Figure 3 – General Candlestick

4. Data Acquisition:

This section addresses the key points of acquiring the stock information, together with some essential stock measures, to illustrate the data science discipline in the study of stock trading. The stock module is written in Python and uses the “pandas_datareader” and “datetime” libraries. The stock information is accessed from the daily Yahoo Finance database. When the module runs, the user is asked to specify the stock “Symbol” to be investigated and the start and end dates of the acquisition period. In this report, we use two stock symbols, BABA and 9988.HK, for a purpose explained at the end of this section. Please refer to Attachment 1 for a simplified Python version of the module. The downloaded data contains the following variables:

• Date: Date of the entry

• High: The highest price of the stock on that day

• Low: The lowest price of the stock on that day

• Open: The opening price of the stock on that day

• Close: The closing price of the stock on that day

• Volume: The total volume traded on that day

• Adjusted Close: The closing price adjusted for corporate actions such as dividends

Along with the collected data, the acquisition module also generates the following computed data for the analysis phase. Please refer either to literature on the Internet or to [1] for detailed explanations of the terms:

• MACD: Moving Average Convergence Divergence

• SMA: Simple Moving Average

A “Symbol Stock.csv” file is generated as an output from the acquisition module to be used by the Data Visualization phase.

In this study, we intend to compare the stock of one company, Alibaba Group Holding Ltd., traded on two different stock exchanges: the New York Stock Exchange (NYSE) and the Hong Kong Stock Exchange (HKEX). The goal is to assess how data visualization can help reveal characteristics of their performance and correlation. Alibaba is listed on the NYSE under the symbol BABA and on the HKEX under the symbol 9988.HK. Because the stock was first offered at a different time on each exchange (late 2014 for BABA and late 2019 for 9988.HK), the comparison can only be conducted over the overlapping interval starting in late 2019, as depicted in the charts that follow.

At the end of this phase, we generated two CSV files – “BABA Stock.csv” and “9988.HK Stock.csv”.

5. Data Pre-Processing:

Usually this process entails data wrangling to ensure the validity of the whole data set. In this particular case, the data downloaded from the source is clean (no missing entries or invalid values), so it requires no further cleansing.
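A check of this kind can be sketched as follows. The function name find_invalid_rows, the specific rules (positive prices, High and Low bounding Open and Close), and the sample rows are assumptions made for illustration; the paper does not state which checks were actually performed.

```python
import math

PRICE_FIELDS = ("High", "Low", "Open", "Close")


def find_invalid_rows(rows):
    """Return the indices of rows with missing or inconsistent prices."""
    bad = []
    for i, row in enumerate(rows):
        values = [row.get(name) for name in PRICE_FIELDS]
        # Missing entries: absent, None, or NaN
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            bad.append(i)
            continue
        high, low, open_, close = values
        # Invalid values: non-positive prices, or High/Low not bounding Open/Close
        if min(values) <= 0 or high < max(open_, close) or low > min(open_, close):
            bad.append(i)
    return bad


# Made-up rows, for illustration only
sample_rows = [
    {"High": 106.0, "Low": 98.0, "Open": 100.0, "Close": 104.0},   # consistent
    {"High": 106.0, "Low": 98.0, "Open": 100.0, "Close": None},    # missing Close
    {"High": 101.0, "Low": 98.0, "Open": 100.0, "Close": 104.0},   # Close above High
]
print(find_invalid_rows(sample_rows))
```

An empty result corresponds to the situation described above, where no further cleansing is needed.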

6. Data Visualization:

The goal of this study is to see how data visualization can reveal correlations in the behavior of traders on two different stock exchanges. In this comparison, we study the trading behavior without taking into account the currency difference between the NYSE, quoted in US dollars, and the HKEX, quoted in HK dollars; i.e., we simply compare the absolute prices of the two stocks:

• Import the first stock file, 9988.HK Stock.csv, created in the Acquisition phase above into the Tableau tool

• Perform an inner join between 9988.HK Stock.csv and BABA Stock.csv using the “Date” variable as the join key. The resulting data set contains only the dates present in both files, i.e. all records from late November 2019 onward.
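For readers who prefer to see the join outside Tableau, the inner join on “Date” can be sketched in plain Python. The function name inner_join_on_date and the sample rows are hypothetical; the suffix mimics the way the joined BABA fields are named in the formulas later in this paper, e.g. Close (BABA Stock.csv).

```python
def inner_join_on_date(left, right, suffix=" (BABA Stock.csv)"):
    """Keep only dates present in both inputs; right-hand fields get a suffix."""
    right_by_date = {row["Date"]: row for row in right}
    joined = []
    for row in left:
        match = right_by_date.get(row["Date"])
        if match is None:
            continue  # date traded on one exchange only, e.g. a local holiday
        merged = dict(row)
        for key, value in match.items():
            if key != "Date":
                merged[key + suffix] = value
        joined.append(merged)
    return joined


# Made-up rows, for illustration only
hk_rows = [
    {"Date": "day 1", "Close": 187.5},
    {"Date": "day 2", "Close": 193.0},
    {"Date": "day 4", "Close": 200.0},
]
baba_rows = [
    {"Date": "day 2", "Close": 195.0},
    {"Date": "day 3", "Close": 197.0},
    {"Date": "day 4", "Close": 201.5},
]
for row in inner_join_on_date(hk_rows, baba_rows):
    print(row)
```

Only the two dates present in both inputs survive, which is why the joined data starts at the later of the two listing dates.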

The following procedures are conducted on 9988.HK entries first followed by BABA:

• Manually create two new measures from the four existing measures (Close, Open, High, and Low): Spread Close-Open and Spread High-Low

• In the worksheet tab, bring the measures Low and Open to the Rows shelf. Click on SUM(Open) and select Dual Axis

• Go to Edit Axis for both measures and uncheck the Include Zero check box

• Change the default aggregation from Sum to Average for both measures (the two items in Rows)

• Under Marks, select All and then change the default chart type to Gantt Bar

• Click on AVG(Low) under Marks and place the measure Spread High-Low on Size. Click on Size and reduce it to the minimum

• Click on AVG(Open) under Marks and place the measure Spread Close-Open on Size. Click on Size and adjust it to be thicker than the minimum

• Drop Date into the Filters box. Select Relative Date > Range of dates > OK

• Show the quick filter for Date and apply it to all worksheets using this data source

• Bring the dimension Date to Columns and then select Day from the dropdown

• Create a calculated field from Spread Close-Open and name it Positive Daily Growth (Spread Close-Open > 0)

• Under Marks, select AVG(Open) and drop Positive Daily Growth onto Color. Change the color of False to red and True to green

• Adjust the fonts on both the X and Y axes and the background color if needed, and synchronize the left and right Y axes for consistency
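The three derived fields used in these steps can be sketched in Python as follows. It is assumed here that the two spreads are plain differences, which is what the Gantt bar construction suggests (a bar starting at Low with length High minus Low, and a bar starting at Open with length Close minus Open); the function name add_candle_measures and the sample row are hypothetical.

```python
def add_candle_measures(row):
    """Return a copy of the row with the three derived candlestick fields."""
    out = dict(row)
    # Signed body length: positive on an up day, negative on a down day
    out["Spread Close-Open"] = row["Close"] - row["Open"]
    # Full trading range of the day, drawn as the thin bar (the wicks)
    out["Spread High-Low"] = row["High"] - row["Low"]
    # Boolean used for the color: True is green, False is red
    out["Positive Daily Growth"] = out["Spread Close-Open"] > 0
    return out


# Made-up row, for illustration only
day = {"Open": 100.0, "High": 106.0, "Low": 98.0, "Close": 104.0}
print(add_candle_measures(day))
```

Because the body spread is signed, a down day yields a bar that extends downward from Open, which is what makes the Gantt bar behave like a candlestick body.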

Figure 6.1 depicts the intermediate picture generated by the process so far.

Figure 6.1

The second part of the process creates a MACD chart for 9988.HK movement:

• On a new worksheet, move “MACD” onto the Rows shelf and drag “Signal” onto the left axis of the chart. This overlays the MACD curve with the Signal curve. Adjust the color of both curves if needed.

• Drag “Histogram” onto the right side of the window, creating a dual axis for the presentation. Change the presentation of “Histogram” to a bar instead of a curve. Remember to synchronize the left and right axes.

• If needed, move “Histogram” in front of “Measure Values”. This moves the image of the Histogram behind both the MACD and Signal lines (see Figure 6.2).

Figures 6.1 and 6.2 are the two charts most commonly used to represent stock movement. The same process can be repeated to create the stock movement charts for BABA; this is left to the reader.


By the same method, we can generate the Simple Moving Average charts (used in a different type of trading strategy) by dropping the two variables SMA_60 and SMA_180 onto the Rows shelf. The generation of this chart is left to the reader.

Figure 6.2

Next, we want to compare the performance of the stock on each exchange to see how the two listings correlate. This can be done by generating two charts, one with the 9988.HK candlesticks and the other with an area chart of BABA:

• On a new worksheet, repeat the procedures of the first part to generate the candlesticks of 9988.HK

• Drag and drop Close (BABA Stock.csv) onto Rows. Convert the chart type to Area

• Adjust the axis range of both charts to run from 160 to 235 for easy comparison

Figure 6.3 depicts the result of the above operations. From the chart, one can easily make the following observation for the period from Nov. 2019 to Jun. 2020:

Observation 1: Both charts follow similar patterns in daily movements. Thus, the results show that trading in both stocks behaves quite homogeneously regardless of the stock exchange.


Figure 6.3

Next, we want to see whether the closing value of one stock always lags behind the other, comparing absolute values, i.e. without taking the currency exchange rate into account. The difference is computed with the following formula: [Close] - [Close (BABA Stock.csv)].

In a new worksheet, assigning Date to Columns (as before) and the newly computed variable to Rows produces the chart in Figure 6.4:

Figure 6.4


Out of 131 entries, there are only 8 (6%) where the 9988.HK closing value is ahead of BABA's. In all remaining cases (94%), the BABA closing value takes the lead. Since the trading hours of the two time zones are 12 hours apart, the 9988.HK closing value can serve as a floor value for BABA stock on the same day. Thus, we can form the following observation:

Observation 2: The closing values of 9988.HK can serve as floor values of BABA stock on the same dates.
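The counting behind this observation can be sketched as follows. The function name lead_share and the sample closing values are hypothetical; only the totals 8 and 131 in the final lines come from the paper.

```python
def lead_share(hk_closes, baba_closes):
    """Count the days on which the 9988.HK close is above the BABA close."""
    diffs = [hk - baba for hk, baba in zip(hk_closes, baba_closes)]
    hk_ahead = sum(1 for d in diffs if d > 0)
    return hk_ahead, len(diffs), hk_ahead / len(diffs)


# Made-up closing values, for illustration only
hk = [198.0, 199.0, 205.0, 201.0]
baba = [200.0, 204.0, 203.0, 206.0]
print(lead_share(hk, baba))

# Shares reported in the paper: 8 of 131 entries
share = 8 / 131
print(round(share * 100), round((1 - share) * 100))
```

The last line reproduces the 6% and 94% split quoted above.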

It should be noted that there is a major difference between a floor value and a support level: the latter applies to an interval, while the former is a single value that can fluctuate daily. Please refer to [3] for the definition of support level if needed.

Conversely, we can compare the closing value of BABA stock on the previous day with that of 9988.HK on the current day using the following formula:

lookup(sum([Close (BABA Stock.csv)]),-1) - sum([Close])

The new chart (Figure 6.5) reveals that the previous-day closing values of BABA have a similar relationship: they lead in 95.5% of cases and lag in 4.5%.
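In Tableau, lookup(expression, -1) returns the value of the expression in the previous row and is null for the first row. A rough Python equivalent of the formula above is sketched below; the function name and the sample values are hypothetical.

```python
def previous_baba_minus_hk(baba_closes, hk_closes):
    """Previous-day BABA close minus current-day 9988.HK close."""
    # The first row has no previous day, mirroring the null from lookup(..., -1)
    result = [None]
    for i in range(1, len(hk_closes)):
        result.append(baba_closes[i - 1] - hk_closes[i])
    return result


# Made-up closing values, for illustration only
baba_closes = [200.0, 204.0, 202.0]
hk_closes = [198.0, 199.0, 203.0]
print(previous_baba_minus_hk(baba_closes, hk_closes))
```

A positive entry means the previous BABA close was above the current 9988.HK close, which is the case counted as a lead above.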

Figure 6.5

Thus, we can form the following observation:

Observation 3: The closing value of BABA from a previous timestamp can be used as a ceiling value of 9988.HK stock for a current timestamp.

Similarly, the ceiling value is different from the resistance level of a stock, by the same argument used for floor values vs. support levels. Please refer to [3] for the definition of resistance level if needed.


Next, we investigate whether there is any correlation between the two stocks' close-to-close changes over consecutive trading intervals. This is calculated with the following three formulas:

[Close Change] - [Close BABA Change], where

Close Change = sum([Close]) - LOOKUP(sum([Close]),-1)

Close BABA Change = SUM([Close (BABA Stock.csv)]) - lookup(sum([Close (BABA Stock.csv)]),-1)

The resulting chart shows no such correlation: the changes appear random and follow no specific pattern, see Figure 6.6:
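The three formulas can be mirrored in Python as a sketch. The function names daily_change and change_gap, and the sample values, are hypothetical; None stands in for the null that Tableau produces on the first row.

```python
def daily_change(closes):
    """Close minus previous close; None for the first row (no previous day)."""
    return [None] + [closes[i] - closes[i - 1] for i in range(1, len(closes))]


def change_gap(hk_closes, baba_closes):
    """[Close Change] - [Close BABA Change] for each row."""
    hk_change = daily_change(hk_closes)
    baba_change = daily_change(baba_closes)
    return [
        None if a is None or b is None else a - b
        for a, b in zip(hk_change, baba_change)
    ]


# Made-up closing values, for illustration only
hk_series = [100.0, 103.0, 101.0]
baba_series = [110.0, 111.0, 112.5]
print(daily_change(hk_series))
print(change_gap(hk_series, baba_series))
```

A gap near zero on most days would indicate that the two stocks move by similar absolute amounts; the chart in Figure 6.6 shows no such regularity.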

Figure 6.6

Next, we explore whether the trading trend of 9988.HK influences the subsequent trading session of BABA, using the following formula:

IF ((sum(Close - Open) > 0.0 AND sum([Open (BABA Stock.csv)]) - lookup(sum([Close (BABA Stock.csv)]),-1) > 0.0) OR

(sum(Close - Open) < 0.0 AND sum([Open (BABA Stock.csv)]) - lookup(sum([Close (BABA Stock.csv)]),-1) < 0.0)) THEN 1 ELSE -1 END

The IF statement returns “1” if 9988.HK rises (or falls) in one session and BABA opens in the same direction in the same timeframe (remember the 12-hour time zone difference). Otherwise the result is “-1”, meaning the trend is reversed. The result of the computation is depicted in Figure 6.7.
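The logic of the IF statement can be written as a small Python function for a single day. The function and parameter names are hypothetical; the comparisons follow the Tableau formula above, including the detail that a zero move on either side is marked “-1”.

```python
def same_trend(hk_open, hk_close, baba_open, baba_prev_close):
    """Return 1 if BABA opens in the direction of the 9988.HK session, else -1."""
    hk_move = hk_close - hk_open            # sum(Close - Open)
    baba_gap = baba_open - baba_prev_close  # Open minus the previous Close
    both_up = hk_move > 0 and baba_gap > 0
    both_down = hk_move < 0 and baba_gap < 0
    return 1 if both_up or both_down else -1


# Made-up prices, for illustration only
print(same_trend(100.0, 104.0, 210.0, 208.0))  # both up
print(same_trend(100.0, 104.0, 208.0, 210.0))  # opposite directions
```

Applying the function to every joined row and counting the results gives the tallies of “1” and “-1” discussed with Figure 6.7.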


Figure 6.7

In this figure, the first chart shows the change of 9988.HK stock (Close – Open) in a daily session. The second chart shows the opening move of BABA in the same timeframe (Open – Close from the last session). The third chart shows the 1/-1 marking, where 1 stands for the same trading trend and -1 for the reverse trend. In this chart, there are 88 markings of “1” (67%) and 43 markings of “-1” (33%).

Using the same rationale, we compute how the trading session of BABA affects the following trading session of 9988.HK. Please refer to the formula below and to Figure 6.8 for the results of the computation:

IF ((SUM([Close (BABA Stock.csv)]- [Open (BABA Stock.csv)]) > 0.0 AND ((LOOKUP(sum(Open),1) - sum(Close)) > 0.0)) OR

(SUM([Close (BABA Stock.csv)]- [Open (BABA Stock.csv)]) < 0.0 AND ((LOOKUP(sum(Open),1) - sum(Close)) < 0.0))) THEN 1 ELSE -1 END

The third chart shows 93 markings of “1” (71%) and 38 markings of “-1” (29%). Although the closing trend of each stock tends to carry over to the opening of the other stock in the subsequent session, these two computations raise a question: does the rate of impact differ significantly between the two experiments? Calculating the p-value for the two proportions (see [4] for the calculation method) gives approximately 0.5, far above the 0.05 threshold that corresponds to a 95% confidence level; i.e., there is no significant difference between the two impacts. Thus, we can form the following observation:

Observation 4: Over a period of more than six months, there is no significant difference between the influence of BABA's trading trend on the following session of 9988.HK and the influence in the opposite direction.
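The p-value can be checked with a short calculation. The sketch below assumes a Pearson chi-squared test on the 2x2 table of counts without continuity correction; whether the tool in [4] uses exactly this variant is an assumption. The function name chi_squared_2x2 is hypothetical, and the counts 88 of 131 and 93 of 131 are the ones reported above.

```python
import math


def chi_squared_2x2(success_a, total_a, success_b, total_b):
    """Pearson chi-squared test (no continuity correction) for two proportions."""
    observed = [
        [success_a, total_a - success_a],
        [success_b, total_b - success_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [
        observed[0][0] + observed[1][0],
        observed[0][1] + observed[1][1],
    ]
    grand_total = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_squared_2x2(88, 131, 93, 131)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

Under these assumptions the statistic is about 0.45 and the p-value about 0.50, consistent with the figure quoted above and well above 0.05.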


Figure 6.8

7. Concluding Remarks:

In conclusion, Observation 1 shows that the trend of a stock follows similar patterns regardless of the stock exchange, i.e. trading behavior is much the same when reacting to external events that impact the market. However, the daily trading of each stock fluctuates independently: candlestick body lengths are not consistent with one another, and the close-to-close changes follow no common pattern (Figure 6.6). Observations 2 and 3 give good floor and ceiling price references for trading. Observation 4 gives a reasonable prediction of the opening trend of the other stock in the following trading session, but neither stock carries more weight in influencing the other, according to the p-value calculation.

In general, one can utilize data visualization to explore and evaluate conjectures. Many interesting ideas and observations can certainly be derived from this process. Without doubt, this attempt is only achievable by combining data science skillsets with specific domain know-how.


References

[1] V. Wu & K. Pegues, “How to Improve the Effectiveness of Machine Learning Techniques”, https://globaliotek.com/2019/11/16/how-to-improve-effectiveness-of-machine-learning-techniques/

[2] Thomas de Beus, “Think About the Grammar of Graphics When Improving Your Graphs”, https://medium.com/tdebeus/think-about-the-grammar-of-graphics-when-improving-your-graphs-18e3744d8d18

[3] Support and Resistance, https://en.wikipedia.org/wiki/Support_and_resistance

[4] Evan’s Awesome A/B Tools, http://www.evanmiller.org/ab-testing/chi-squared.html

[5] Jorge Castanon, “10 Visualizations Every Data Scientist Should Know”, https://www.datasciencecentral.com/profiles/blogs/10-visualizations-every-data-scientist-should-know

[6] SuperDataScience Team, “The Art of Color in Data Visualization”, https://aov-platform.s3.us-east-2.amazonaws.com/uploads/artofviz-the-color-theory.pdf?utm_source=ONTRAPORT-email-broadcast&utm_medium=ONTRAPORT-email-broadcast&utm_term=&utm_content=%5BCHEATSHEET%5D+The+Art+of+Color+in+Data+Visualization&utm_campaign=29072020.


Attachment 1

# Stock data reader module
# Remember to install the latest pandas-datareader package in the terminal
# $ pip install pandas-datareader
from pandas_datareader import data
from datetime import datetime


# User is requested to input the symbol of the stock, the acquisition start and end dates
# The function will return the symbol and a dataframe containing all records of the stock
def get_stock_data():
    symbol = input('Stock symbol: ')
    start_date_str = input('Start date (mm/dd/yyyy): ')
    end_date_str = input('End date (mm/dd/yyyy): ')
    start_date = datetime.strptime(start_date_str, '%m/%d/%Y')
    end_date = datetime.strptime(end_date_str, '%m/%d/%Y')
    df = data.DataReader(symbol, 'yahoo', start_date, end_date)
    return symbol, df


# Execute the function
Symbol, Stock = get_stock_data()

# Create MACD line, Signal line and Histogram
exp1 = Stock.Close.ewm(span=12, adjust=False).mean()
exp2 = Stock.Close.ewm(span=26, adjust=False).mean()
Stock['MACD'] = exp1 - exp2
Stock['Signal'] = Stock.MACD.ewm(span=9, adjust=False).mean()
Stock['Histogram'] = Stock.MACD - Stock.Signal

# Create SMA 60 and SMA 180 lines
Stock['SMA_60'] = Stock.Close.rolling(window=60).mean()
Stock['SMA_180'] = Stock.Close.rolling(window=180).mean()

# Print all relevant info for verification
print(Stock.info())
print(Stock.head(5))
print(Stock.tail(5))

# Create a csv filename
Filename = Symbol + ' Stock.csv'

# Export the retrieved stock to an external csv file specified by the filename
# Note: the file is written with tab separators despite the .csv extension
Stock.to_csv(Filename, sep='\t')