REAL-TIME PORTFOLIO AND RISK MANAGEMENT FOR FINANCIAL INSTITUTIONS BY TRAN (RYAN) LE CISC 525 – BIG DATA ARCHITECTURE

Upload: tran-ryan-le

Post on 13-Jan-2017



Introduction

Project Objective: Leverage the Hadoop ecosystem to give mutual fund and hedge fund managers real-time information and insights about the current positions of their portfolios, helping them react quickly to changes in the market.

Description: The financial markets are complex and interconnected. Fluctuations in U.S. stock prices are caused by many different factors, including investors' expectations about the future outlook of the economy, the employment rate, oil prices, the actual earnings of companies compared to investors' expectations, and the performance of major indexes in other countries, especially in China and Europe. Constantly tracking all these events requires tremendous time and effort from traders and investors, which distracts them from devising appropriate trading strategies to take advantage of changes in the market. However, the Hadoop ecosystem provides many tools that can consolidate structured and unstructured data from many different sources, analyze them, and deliver the results in a timely manner, which allows traders and fund managers to make decisions more quickly and effectively.

Data sets

Machine Learning based ZZAlpha Stock Recommendations:
  The data are recommendations made for various U.S.-traded stock portfolios on the morning of each day during the three-year period from Jan 1, 2012 to Dec 31, 2014.
  They were deposited in txt form for easy accessibility.
  The deposited data include calculated returns for each recommended transaction. Returns were calculated several days after sale day.
  The date inside each file reflects the date for which the morning trading recommendation was made.
  The evaluation of the recommendations compared the opening price on the day of recommendation to the opening price five market days later.

Stock prices and news from Yahoo

Enterprise Data Warehouse (EDW) of financial institutions

http://archive.ics.uci.edu/ml/datasets/Machine+Learning+based+ZZAlpha+Ltd.+Stock+Recommendations+2012-2014
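The open-to-open comparison used to evaluate the recommendations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the list-of-opens input, and the prices are hypothetical, and the recommendation is treated as a simple long position. The data set's own return calculation may differ in detail.

```python
from typing import Optional, Sequence


def open_to_open_return(
    opens: Sequence[float], rec_index: int, horizon: int = 5
) -> Optional[float]:
    """Fractional change from the open on the recommendation day to the
    open `horizon` market days later; None if the data does not cover it."""
    exit_index = rec_index + horizon
    if rec_index < 0 or exit_index >= len(opens):
        return None  # not enough market days after the recommendation
    entry_price = opens[rec_index]
    if entry_price <= 0:
        return None  # guard against bad price data
    return opens[exit_index] / entry_price - 1.0


# Hypothetical opening prices for consecutive market days (not real quotes).
opens = [100.0, 101.0, 99.5, 102.0, 103.0, 105.0, 104.0]
five_day = open_to_open_return(opens, rec_index=0)   # 105.0 / 100.0 - 1
too_late = open_to_open_return(opens, rec_index=3)   # None: day 8 is missing
```

Indexing by market day rather than calendar day keeps weekends and holidays out of the five-day horizon.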

Processes

Use Sqoop to move structured data from the enterprise data warehouse into HDFS (Hadoop Distributed File System).

Use Flume to feed in real-time data from unstructured sources such as economic and financial news, ZZAlpha stock recommendations from text files, and stock prices from Yahoo.

Once the data are in HDFS, each data set goes through its own process to be cleaned and restructured into the right format for analysis.
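The cleaning and restructuring step could look roughly like the Python sketch below. It assumes a hypothetical four-column CSV layout (date, symbol, open, close), which is not the actual format of any of the sources named above, and all sample values are made up. The point is only to show typed records coming out and malformed lines being dropped.

```python
import csv
import io
import math
from datetime import date, datetime
from typing import Iterator, NamedTuple


class PriceRow(NamedTuple):
    day: date
    symbol: str
    open_price: float
    close_price: float


def clean_price_rows(raw_text: str) -> Iterator[PriceRow]:
    """Yield typed rows from raw CSV text, skipping malformed lines."""
    for fields in csv.reader(io.StringIO(raw_text)):
        if len(fields) != 4:
            continue  # wrong number of columns
        raw_day, raw_symbol, raw_open, raw_close = (f.strip() for f in fields)
        try:
            day = datetime.strptime(raw_day, "%Y-%m-%d").date()
            open_price = float(raw_open)
            close_price = float(raw_close)
        except ValueError:
            continue  # unparseable date or price (this also drops a header row)
        prices_ok = all(math.isfinite(p) and p > 0 for p in (open_price, close_price))
        if not raw_symbol or not prices_ok:
            continue
        yield PriceRow(day, raw_symbol.upper(), open_price, close_price)


# Made-up sample input: one header, two good rows, two bad rows.
raw = """date,symbol,open,close
2014-12-30,nflx,344.5,341.9
2014-12-31,NFLX,n/a,341.6
bad line
2014-12-31,AAPL,112.8,110.4
"""
cleaned = list(clean_price_rows(raw))
```

In a real pipeline the same logic would more likely live in a Pig UDF or a similar job that runs over files already in HDFS; the standalone function just makes the rules easy to see.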

Data processing

Economic and financial news will flow through NLP processes, built with Oozie, Pig scripts, and Pig UDFs, that categorize each item as positive or negative information likely to affect the movement of stock prices in the financial markets.

Stock price and option price data will be processed by mathematical algorithms to produce financial indicators such as delta, gamma, theta, implied volatility, and historical volatility. The results will then be combined with data from the EDW to estimate the probability that a candidate stock will rise or fall, construct optimal portfolios, and determine the risk level of each portfolio along with recommended strategies to hedge that risk.
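As an illustration of the indicators named above, the sketch below computes delta and gamma for a European call under the Black-Scholes model, plus an annualized historical volatility from closing prices. The slides do not say which pricing model the project uses, so Black-Scholes, the no-dividend assumption, the 252-trading-day year, and all function names and input values here are assumptions.

```python
import math
import statistics
from typing import Sequence, Tuple


def _norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_delta_gamma(
    spot: float, strike: float, rate: float, vol: float, years: float
) -> Tuple[float, float]:
    """Black-Scholes delta and gamma of a European call, no dividends."""
    if min(spot, strike, vol, years) <= 0:
        raise ValueError("spot, strike, vol and years must be positive")
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    delta = _norm_cdf(d1)
    gamma = _norm_pdf(d1) / (spot * vol * sqrt_t)
    return delta, gamma


def historical_volatility(closes: Sequence[float], trading_days: int = 252) -> float:
    """Annualized sample standard deviation of daily log returns."""
    if len(closes) < 3 or any(price <= 0 for price in closes):
        raise ValueError("need at least three positive closing prices")
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return statistics.stdev(log_returns) * math.sqrt(trading_days)


# Hypothetical inputs: at-the-money call, 20% volatility, one year to expiry.
example_delta, example_gamma = call_delta_gamma(100.0, 100.0, 0.0, 0.2, 1.0)
example_hv = historical_volatility([100.0, 102.0, 101.0, 105.0])
```

Theta and implied volatility follow from the same model. Implied volatility needs a numerical root-finder, so it is left out to keep the sketch short.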

Visualization

Once the data are in a ready-to-analyze format, they will be imported or streamed into different applications to help financial analysts, traders, and investors make trading decisions more quickly and effectively.

Zeppelin: interactive browser-based notebooks that make data engineers, data analysts, and data scientists more productive by letting them develop, organize, execute, and share data code and visualize results without using the command line or needing to know the cluster details.

Excel PowerView: a common tool for financial investors.

Web applications, which allow traders and investors to access the information from anywhere in the world.

Security: authentication and authorization

Authentication: To secure communication among the various components in the system, Kerberos can be used as the authentication mechanism. With Kerberos, users and the services they wish to access rely on a trusted third party to authenticate each to the other. The mechanism can also be used to encrypt all traffic between the user and the service, although this has a significant performance impact.

Authorization: Accumulo provides fast access to data in massive tables while also controlling access to its billions of rows and millions of columns down to the individual cell (known as fine-grained data access control). Cell-level access control is important for financial institutions with complex policies governing who is allowed to see which data. It allows different data sets to be intermingled while access control policies still protect the sensitive elements within them (such as personal information about investors). Without Accumulo, those policies are difficult to enforce systematically; Accumulo encodes the rules on each individual data cell and enforces them on every read.

Kerberos: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.2/bk_installing_manually_book/content/rpm-chap14-1-1.html
Accumulo: http://hortonworks.com/hadoop/accumulo/
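The idea behind cell-level access control can be shown with a toy model in Python. This is not Accumulo's API. It only mimics the general shape of a visibility label, where tokens are combined with `&` and `|` and grouped with parentheses, and checks a label against the set of authorizations a user holds. The label names (`pii`, `trader`, `compliance`) are hypothetical.

```python
import re
from typing import AbstractSet, List

_TOKEN = re.compile(r"[A-Za-z0-9_]+|[&|()]")


def _tokenize(expression: str) -> List[str]:
    tokens = _TOKEN.findall(expression)
    if "".join(tokens) != expression.replace(" ", ""):
        raise ValueError("invalid character in label: %r" % expression)
    return tokens


def is_visible(expression: str, authorizations: AbstractSet[str]) -> bool:
    """Return True if a user holding `authorizations` may read a cell
    carrying the label `expression`. An empty label is readable by anyone."""
    if not expression.strip():
        return True
    tokens = _tokenize(expression)
    pos = 0

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError("unexpected %r in label" % token)
        return token in authorizations

    def parse_expr() -> bool:
        nonlocal pos
        value = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()  # always parse, so syntax errors surface
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens in label")
    return result


# Hypothetical label on a cell holding an investor's personal information.
label = "pii&(trader|compliance)"
trader_with_pii = is_visible(label, {"pii", "trader"})  # True
trader_only = is_visible(label, {"trader"})             # False
```

Because the label travels with each cell, the same table can hold both public prices and restricted investor details, and a scan returns only the cells whose label the reader's authorizations satisfy.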

Zeppelin, one of the Hadoop ecosystem components, allows users to query data and create different types of charts and graphs interactively. This shortens the data analysis cycle, which in turn allows analysts and traders to detect patterns and insights much faster.

Excel PowerView

Real-time financial data will be fed into PowerView to help traders, investors, and analysts see how stock prices have been changing over time (click the Play button for a demonstration).


Excel PowerView

Investors or analysts can also analyze the changing pattern of a specific stock. In this case, we see that Netflix's stock price significantly outperforms the others, so we separate this stock from the rest and investigate its behavior over time (click the Play button for a demonstration).

Tableau can be used to create a Dashboard or Story Board. Because Tableau can connect directly to live data, the Dashboard and Story Board update automatically as more data are fed into HDFS. This in turn gives traders, investors, and analysts the information they need to react quickly to changes in the markets.

Web Application

Financial data will be fed directly into the application to give traders and investors real-time information about changes in stock prices, the risk levels of their portfolios, and the financial instruments available to hedge those risks. (I am currently building this application with the goal of eventually streaming data from Hadoop into it, to give users useful insights about companies, stock prices, and the overall market through both technical and fundamental analyses.)


We can also stream live data from Twitter into Hadoop to conduct analyses, or take advantage of the many packages available in Revolution R to read data from Twitter and create word clouds for trending information about a particular company. In the picture above, I used Revolution R to conduct sentiment analyses for Apple (AAPL), Microsoft (MSFT), Netflix (NFLX), ONEOK (OKE), Williams Companies (WMB), and General Electric (GE). This information may provide useful insights into public opinion about those companies.
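A very small version of this kind of sentiment analysis can be sketched in Python with a word-list approach. This is a stand-in for illustration, not the R-based analysis described above. The word lists are tiny and hypothetical, the sample posts are invented, and a real system would use a proper sentiment lexicon or a trained model.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = frozenset({"beat", "growth", "strong", "upgrade", "record", "gain"})
NEGATIVE = frozenset({"miss", "loss", "weak", "downgrade", "lawsuit", "decline"})


def score_text(text: str) -> int:
    """Count positive words minus negative words in one post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label_text(text: str) -> str:
    score = score_text(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally_sentiment(posts_by_symbol: Dict[str, Iterable[str]]) -> Dict[str, Counter]:
    """Per ticker symbol, count how many posts fall under each label."""
    return {
        symbol: Counter(label_text(post) for post in posts)
        for symbol, posts in posts_by_symbol.items()
    }


# Invented sample posts, keyed by ticker symbol.
sample_posts = {
    "AAPL": [
        "Record quarter and strong growth",
        "Analyst downgrade after weak guidance",
    ],
    "GE": ["Nothing new today"],
}
sentiment = tally_sentiment(sample_posts)
```

The same per-post label is what the news-categorization step needs, so one scoring function could serve both the Twitter feed and the financial news feed.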

Conclusion

In a world with a lot of uncertainty, the ability to react quickly to change is critical to the survival of a business, especially in the financial industry. By building a system that harnesses the power of Big Data, financial institutions can extract useful information and insights about their businesses and their industry, which in turn allows them to make better and faster decisions. This gives them a significant edge over competitors who fail to take advantage of Big Data.

With the growth of the many components of the Hadoop ecosystem over the past decade, moving and manipulating massive amounts of data has never been easier. Once data are properly imported into HDFS, traders, investors, and analysts can start exploring them with the many data analysis and visualization tools that Hadoop supports. Because Hadoop takes care of the complexity of massively parallel processing behind the scenes, users can concentrate on detecting patterns and insights, which can contribute to the growth of their businesses.