# Machine Learning, Stock Market and Chaos

Posted on 21-Apr-2017


TRANSCRIPT

## Machine Learning of Chaotic Systems

Solving Complex and Insoluble Problems via Artificial Intelligence

By Lipa Roitman, PhD, November 1st, 2015

## Contents

- Chaos vs. Randomness
- Chaotic Processes
- Modeling Chaos: Statistics Approach
- Modeling Chaos: Artificial Intelligence and Machine Learning Approach
- Steps in Machine Learning
- Financial Markets as Chaotic Processes

## Chaos and Randomness

Random noise has no known cause, no regularity, no rationality, no repeatability and no pattern. It is impossible to predict.

Message: randomness is unpredictable; chaos can be predictable.

## Chaos vs. Randomness

Examples of random processes:

- Previous coin flips do not predict the next one.
- Brownian motion (random walk).
- Gaussian and non-Gaussian random (white) noise with a frequency-independent power spectrum.
- Other modes of random processes.

None of these can be predicted: the future does not depend on the past. The task is to tell chaos from randomness in time series data.

A stationary process is one whose statistical properties (mean, variance, higher moments and probability distribution) do not change over time. A stationary ergodic process has constant statistical properties over time, AND its global statistical properties can be reliably derived from a long enough sample of the process.

Even when the probability distribution is known, what comes next cannot be predicted. It is completely random: the future does not depend on the past.

Real-life chaotic processes are neither stationary nor ergodic! Their statistics drift with time, so they have to be monitored constantly. A nonparametric analysis is needed when the probability distribution of the system is not normal.

Chaotic time series don't have fixed statistical properties.
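
One simple way to monitor drifting statistics is a sliding window. The sketch below is a minimal illustration under that assumption (the slides do not say how the monitoring is done): `rolling_stats` computes the mean and sample standard deviation of every window, and `drift_flags` marks windows whose mean has moved more than `tolerance` reference standard deviations away from the first window. The function names, the window length and the tolerance are hypothetical choices, and the test series is synthetic.

```python
import statistics
from typing import List, Tuple


def rolling_stats(series: List[float], window: int) -> List[Tuple[float, float]]:
    """(mean, sample standard deviation) for every full sliding window."""
    if window < 2 or window > len(series):
        raise ValueError("window must be between 2 and len(series)")
    stats = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        stats.append((statistics.fmean(chunk), statistics.stdev(chunk)))
    return stats


def drift_flags(series: List[float], window: int, tolerance: float) -> List[bool]:
    """True for each window whose mean lies more than `tolerance` reference
    standard deviations away from the mean of the first (reference) window."""
    stats = rolling_stats(series, window)
    ref_mean, ref_std = stats[0]
    return [abs(mean - ref_mean) > tolerance * ref_std for mean, _ in stats]


# Hypothetical series: a small oscillation around 0, then an abrupt jump to about 5.
series = [0.1 * (-1) ** i for i in range(50)] + [5 + 0.1 * (-1) ** i for i in range(50)]
flags = drift_flags(series, window=10, tolerance=3.0)
print(flags[0], flags[-1])  # False True
print("first drifting window starts at index", flags.index(True))
```

A fixed reference window is the simplest possible design; since the point of the slide is that no single reference stays valid for a chaotic series, a practical monitor might instead compare each window with its predecessor or refresh the reference periodically.
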
The statistics of a chaotic time series change with time, sometimes abruptly.

## Chaos in Natural Processes

- Astronomy: the three-body problem, sunspots
- Geology: earthquakes
- Oceanology: El Niño (Pacific Ocean temperature), tides
- Meteorology: weather
- Fluid flow: laminar vs. turbulent; candle flame
- Quantum chaos
- Biology: population growth
- Physiology: arrhythmia, epilepsy, diabetes
- DNA code
- Epidemiology: diseases
- Social: fashion trends, wars
- Music and speech
- Stock markets, etc.

## Chaotic Processes

Chaotic processes involve three competing paradigms:

- stability;
- instability;
- sudden and dramatic change.

## Chaotic Systems Properties

What is the pattern?

- Stability: persistent trends. Memory: what happens next depends on prior history. Predictable: one can predict while the pattern continues.
- Instability: a tired trend, marked by an accumulation of small random imbalances, or of slow systematic imbalances, that precedes a large change (the sand-pile avalanche model). Predictability is lower.
- Change: the paradigm changes suddenly, seemingly without warning, often with a reversal of the trend.
- Fat tail: the change can be much stronger than a normal (Gaussian) distribution would predict. These are Black Swan events.

Chaotic systems show cycles of varying lengths: periods of quiet followed by big jumps. Chaotic patterns are predictable, but only in terms of probabilities.

## Measuring Chaos Statistically

Mathematical modeling of chaotic systems is difficult: tiny changes in parameters can sometimes lead to extreme changes in the outcome. There is no certainty, only probability.

The ubiquity of gradual trends and the rarity of extreme events resemble the spectral density of a stochastic process of the form

$$S(f) = \frac{1}{f^{\alpha}},$$

where the exponent α typically lies between 0 and 2. In this 1/f noise model (α ≈ 1), the power of the signal (event) is inversely proportional to its frequency. Although 1/f noise is widely present in natural and social time series, its source is not well understood.
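
To make the power law concrete, here is a sketch that builds a synthetic signal whose spectrum follows S(f) ∝ 1/f^α (writing α for the exponent) by spectral synthesis, then recovers α as minus the least-squares slope of log S(f) against log f. The function names and the naive O(n²) DFT are illustrative assumptions chosen to keep the example dependency-free. Because the synthetic signal sits exactly on the Fourier frequencies, its periodogram lies on a straight log-log line; a periodogram of real data is noisy and usually needs averaging (for example Welch's method) before fitting.

```python
import cmath
import math
import random
from typing import List


def synth_power_law_noise(n: int, alpha: float, seed: int = 0) -> List[float]:
    """Sum of cosines at the Fourier frequencies k/n with amplitude k**(-alpha/2)
    and random phases, so that spectral power falls off as 1/f**alpha."""
    rng = random.Random(seed)
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(n // 2)]
    series = []
    for t in range(n):
        value = 0.0
        for k in range(1, n // 2):
            value += k ** (-alpha / 2) * math.cos(2 * math.pi * k * t / n + phases[k])
        series.append(value)
    return series


def periodogram(series: List[float]) -> List[float]:
    """Naive DFT periodogram |X_k|^2 / n for k = 1 .. n/2 - 1."""
    n = len(series)
    power = []
    for k in range(1, n // 2):
        xk = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        power.append(abs(xk) ** 2 / n)
    return power


def spectral_exponent(series: List[float]) -> float:
    """Least-squares slope of log S(f) versus log f; returns alpha = -slope."""
    power = periodogram(series)
    xs = [math.log(k) for k in range(1, len(power) + 1)]  # log f up to a constant
    ys = [math.log(p) for p in power]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return -num / den


pink = synth_power_law_noise(256, alpha=1.0, seed=42)
print(f"estimated alpha = {spectral_exponent(pink):.3f}")  # estimated alpha = 1.000
```

Setting `alpha=0.0` gives a flat (white) spectrum and `alpha=2.0` a Brownian-like one, the two limits between which 1/f noise sits.
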

1/f noise is intermediate between white noise, which has no correlation in time, and random-walk (Brownian motion) noise, whose increments are uncorrelated. In most real chaotic processes, random (white) frequency-independent noise overlaps the 1/f noise.

In a random autoregressive process the autocorrelation function decays exponentially. In a chaotic process it leaves a small persistent residue: long memory.

If one looks at a chaotic process at different degrees of magnification, the pictures look similar. This self-similarity brings us to the subject of fractals: self-similarity, power laws, scale invariance and fractals (Mandelbrot) are connected, and the Hurst exponent characterizes them.

## Scale Invariance: the Chaos and Fractals Connection

Given a power-law relation

$$f(x) = a\,x^{-k},$$

scaling the argument x by a constant factor c causes only a proportionate scaling of the function itself:

$$f(cx) = a\,(cx)^{-k} = c^{-k} f(x) \propto f(x).$$

In other words, scaling by a constant c simply multiplies the original power-law relation by the constant $c^{-k}$: hence self-similarity.

Power-law signature: the logarithms of f(x) and x have a linear relationship, $\log f(x) = \log a - k \log x$, i.e. a straight line on a log-log plot. In rescaled-range (R/S) analysis, the statistic grows as a power of the window length n, $R/S \propto n^{H}$, so the slope of log(R/S) against log n gives the Hurst exponent H.

## Hurst Exponent H

The Hurst exponent can distinguish a fractal from a random time series, or find long-memory cycles.

- $H = 1/2$: random walk, Brownian motion, normal distribution.
- $H < 1/2$: mean-reverting process; negative feedback; high noise; high fractal dimension.
- $1/2 < H < 1$: chaotic trending process; positive feedback; less noise; smaller fractal dimension; fractional Brownian motion, or 1/f noise.

## Maximal Lyapunov Exponent

The maximal Lyapunov exponent (MLE) is a measure of sensitivity to initial conditions, i.e. of unpredictability.

- Positive MLE: chaos.
- The inverse of the exponent, 1/MLE, sets the horizon of predictability.
- Large MLE: a shorter half-life of the signal and a faster loss of predictive power.
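
Both exponents can be estimated with short routines. The sketch below is an illustration under stated assumptions, not the author's code: `hurst_rs` is the classical (uncorrected) rescaled-range estimator, fitting log(R/S) against log(window size) over non-overlapping windows, and `logistic_lyapunov` averages ln|f′(x)| along an orbit of the logistic map x → r·x·(1 − x), a standard chaotic test system that the slides do not mention. For r = 4 the exact MLE is ln 2 ≈ 0.693, while the stable period-2 regime at r = 3.2 gives a negative exponent. Differencing white noise produces an anti-persistent (mean-reverting) series, which should give H well below 1/2. The classical R/S estimator is known to be biased upward on short windows, so white noise typically yields H somewhat above 1/2; the window sizes and seeds are arbitrary choices.

```python
import math
import random
import statistics
from typing import List


def rescaled_range(chunk: List[float]) -> float:
    """R/S of one window: range of the cumulative deviations from the mean,
    divided by the window's (population) standard deviation."""
    mean = statistics.fmean(chunk)
    cumulative, running = [], 0.0
    for x in chunk:
        running += x - mean
        cumulative.append(running)
    return (max(cumulative) - min(cumulative)) / statistics.pstdev(chunk)


def hurst_rs(series: List[float], window_sizes: List[int]) -> float:
    """Classical R/S estimate of H: least-squares slope of log(mean R/S)
    against log(window size), using non-overlapping windows."""
    xs, ys = [], []
    for n in window_sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        xs.append(math.log(n))
        ys.append(math.log(statistics.fmean([rescaled_range(c) for c in chunks])))
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def logistic_lyapunov(r: float, x0: float = 0.3, transient: int = 1_000,
                      steps: int = 100_000) -> float:
    """Maximal Lyapunov exponent of x -> r*x*(1-x), estimated as the orbit
    average of ln|f'(x)| = ln|r*(1-2x)| after discarding a transient."""
    x = x0
    for _ in range(transient):
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(steps):
        # Tiny floor avoids log(0) if the orbit ever lands exactly on x = 0.5.
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))
        x = r * x * (1 - x)
    return total / steps


rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(4096)]            # uncorrelated noise
anti = [white[i] - white[i - 1] for i in range(1, len(white))]  # anti-persistent series
sizes = [16, 32, 64, 128, 256]

h_white = hurst_rs(white, sizes)
h_anti = hurst_rs(anti, sizes)
print(f"H(white noise) = {h_white:.2f}, H(differenced noise) = {h_anti:.2f}")
print(f"MLE r=4.0: {logistic_lyapunov(4.0):.3f} (ln 2 = {math.log(2):.3f})")
print(f"MLE r=3.2: {logistic_lyapunov(3.2):.3f}")
```

R/S should be applied to increments (returns), not to price levels: feeding it a non-stationary path inflates H towards 1, which is one reason the data-preparation step converts the data to a more-or-less stationary form first.
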

In short, the maximal Lyapunov exponent measures sensitivity to initial conditions, a property of chaos, while the Hurst exponent H measures persistence.

## Modeling Chaos with Fractals

Fractal time series are good approximations of chaotic processes: they are complex systems with similar properties.

- Fat-tailed probability distribution.
- Memory effect: a slowly decaying autocorrelation function.
- Power spectrum of the 1/f type.
- Modeled with the fractal dimension and the Hurst parameter.
- Global or local self-similarity.

## Fractal Dimension and Hurst Exponent

The fractal dimension D characterizes local irregularity, and the Hurst exponent H characterizes global persistence. Thus D and H are the fractal analogues of variance and mean, which are not constant in a chaotic time series.

For self-affine processes, the local properties are reflected in the global ones. For a self-affine surface in n-dimensional space,

$$D + H = n + 1,$$

where D is the fractal dimension and H is the Hurst exponent. For the graph of a time series (n = 1) this gives D = 2 − H.

## Chaos and Fractals Connection

Fractals have self-similar patterns at different scales, characterized by the fractal dimension. A multifractal system has a continuous spectrum of exponents, the singularity spectrum.

## Two Reasons for 1/f Noise

- Random shocks to the process, such as news events.
  The shocks can have both temporary and lasting effects.
- A combination of interdependent autoregressive processes, each with its own statistical properties.

## Modeling Chaos: Artificial Intelligence and Machine Learning Approach

The purpose of machine learning is generalization:

- find the laws within the data;
- predict change.

Number crunching allows finding hidden laws that are not obvious to the human eye.

## Artificial Intelligence Types

- Rule-based AI: humans create the rules (expert systems). The rule-based approach is time-consuming and not very accurate.
- Supervised learning from examples: the examples must be representative of the entire data set.
- Unsupervised learning: classification by clustering.
- Deep learning: models high-level abstractions in data by using multiple processing layers with complex structures. Deep learning can select the features automatically: in simple machine learning, a human has to tell the algorithm which combinations of features to consider, whereas deep learning finds the relationships on its own, with no human involvement.
- Ultra deep learning: the machine has learned so much that it can not only derive the rules but also detect when the rules change, i.e. detect a change of paradigm. It combines supervised, unsupervised and rule-based machine learning into a more intelligent system.

## Steps in Machine Learning

- Provide a framework: mathematical and programming tools.
- Data preparation.
- Parameter estimation.
- Give examples to learn from: the input (and, in some methods, the output).
- Create a model (or models).
- Define the fitness function: what to optimize? Example: make more good predictions than bad ones.
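
One possible reading of the example fitness function, "make more good predictions than bad ones", is sketched below. It assumes that a prediction is "good" when its sign matches the sign of the actual move, and scores a model as (good − bad) / scored, so any positive value means more good predictions than bad ones. The names `directional_fitness`, `model_a` and `model_b`, and the data, are hypothetical; this is not the fitness function used by the author.

```python
from typing import Dict, List, Sequence


def sign(x: float) -> int:
    """-1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_fitness(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """(good - bad) / scored, where a prediction is good when its sign matches
    the actual move; pairs in which either value is exactly zero are skipped."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    good = bad = 0
    for p, a in zip(predicted, actual):
        if sign(p) == 0 or sign(a) == 0:
            continue
        if sign(p) == sign(a):
            good += 1
        else:
            bad += 1
    scored = good + bad
    return (good - bad) / scored if scored else 0.0


# Hypothetical realized moves and two candidate models' predictions.
actual = [0.5, -0.2, 0.1, -0.4, 0.3]
models: Dict[str, List[float]] = {
    "model_a": [0.2, -0.1, -0.3, -0.2, 0.1],   # 4 right, 1 wrong
    "model_b": [-0.1, 0.2, 0.3, 0.1, -0.2],    # 1 right, 4 wrong
}
scores = {name: directional_fitness(pred, actual) for name, pred in models.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Sign agreement ignores the size of the moves; a fitness function that weights hits by the magnitude of the move, or subtracts an error term, would be equally consistent with the slide.
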

## Data Preparation

- Convert the generally non-stationary data into more-or-less stationary data.
- Remove the cycles and trends to reduce the uniqueness of each data point.

## Parameter Estimation

Parametric or nonparametric?

- Parametric model: a fixed number of parameters.
- Nonparametric model: no assumptions about the probability distributions of the variables. In a nonparametric model, the number of parameters increases with the amount of training data.

## Creating a Model

"All models are wrong, but some are useful." (George E. P. Box)

## Multivariate Time Series

Multivariate time series modeling is required when the outcome of one process depends on other processes. Examples are systems of interdependent global and local processes: asset prices, exchange rates, interest rates and other variables.

To create a model, one can use the available knowledge about the interrelationships of the processes and combine it with unknowns in one or more linear or nonlinear models. A fitness (or error) function is then created that compares the model with the data.

## Machine Learning

The fitness function is improved through machine learning by varying the parameters of the model. The goal is to maximize the fitness of the model to the data presented for learning (i.e. to minimize the error). Different models are screened. Part of the data is held out of the learning cycle and used for testing: a successful model should perform adequately on this test data.

## Dimensionality Reduction

Dimensionality reduction:

- speeds up algorithm execution;
- improves performance;
- improves generality: the fewer the variables, the better the generality.

Principal Component Analysis (PCA) is one method of dimensionality reduction. It orthogonally transforms the original data set into a new set of principal components.

Other dimensionality reduction methods:

- Low variance filter.
- High correlation filter.
- Pruning the network.
- Adding and replacing inputs.
- Other methods.

## Clustering

The many examples in the data can be compressed into clusters according to their similarity, by fitting to one or more criteria. Each data member that belongs to a cluster is associated with a number from 0 to 1 that shows its degree of belonging, and a data member can belong to multiple clusters, each with its own degree of belonging. Clustering can be a goal in itself, or part of a general model that includes the behavior of clusters as a whole.

## Time Constraint

"A LISP programmer knows the value of everything, but the cost of nothing." (Alan J. Perlis)

Some problems are insoluble or too complex to be solved completely in reasonable time, so compromises are necessary, e.g. between speed, precision and generality.

## Time Complexity (Big O Notation)

The time complexity of an algorithm quantifies the amount of time the algorithm takes to run as a function of the length of the string representing the input.

## Choice of Algorithm

Which algorithm? It depends on:

- the task;
- the time available;
- the precision required.

Figure: local and global minimum, uphill searching vs. downhill gradient searching (source: accp1.org/pharmacometrics/theory.htm).

## Local Search Algorithms

Local search methods include steepest descent or a best-first criterion, stochastic search, simulated annealing, genetic selection and others. Each iteration follows the same scheme:

- Make a random move altering the state.
- Assess the fitness of the new state.
- Compare the fitness to that of the previous state.
- Decide whether to accept the new solution or reject it.
- Repeat until you have converged on an acceptable answer.

This loop is the basis of simulated annealing.

## Global Search Algorithms

Stochastic optimization: uphill searching and basin hopping.

Figure: local and global minimum (source: accp1.org/pharmacometrics/theory.htm).

## Basin Hopping

The algorithm is iterative, with each cycle composed of:

- a random perturbation of the coordinates;
- local minimization;
- acceptance or rejection of the new coordinates based on the minimized function value.

## Genetic Algorithms

Many solutions are in the pool, some good, some not so good. Each solution is analogous to a chromosome in genetics.

Ways to improve the gene pool:

- Combination: combine two or more solutions in the hope of producing a better solution.
- Mutation: modify a solution in random places in the hope of producing a better solution.
- Crossover: import a solution from a similar problem.
- Selection: survival of the fittest.

Figure: genetic algorithm cycle (gene pool, reproduce, mutate, crossover, select or reject).

## I Know First Predictive Algorithm

Most financial time series exhibit classical chaotic behavior. Chaos theory and the classification and predictive capabilities of machine learning have been applied to forecasting such time series. This artificial intelligence approach is at the root of the I Know First predictive algorithm. The following slides present the method and the results of applying the algorithm to learn from a database of historical time series data.
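
The local search loop listed earlier (random move, assess fitness, compare, accept or reject, repeat) can be sketched as simulated annealing on a one-dimensional toy problem. Everything specific here is an assumption for illustration: the objective `tilted_double_well` (a local minimum near x = 2 and the global minimum near x = −2.03), the Metropolis acceptance rule exp(−Δ/T), the geometric cooling schedule and the step, temperature and iteration settings are common textbook choices, not taken from the slides.

```python
import math
import random
from typing import Callable, Tuple


def simulated_annealing(
    objective: Callable[[float], float],
    x0: float,
    step: float = 5.0,
    t0: float = 10.0,
    cooling: float = 0.995,
    iterations: int = 5_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Minimize a 1-D objective; return the best (x, objective(x)) visited."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)   # random move altering the state
        fc = objective(candidate)                   # assess the fitness of the new state
        delta = fc - fx                             # compare with the previous state
        # Accept or reject: always accept improvements; accept a worse state
        # with the Metropolis probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                                # lower the temperature, then repeat
    return best_x, best_f


def tilted_double_well(x: float) -> float:
    """Toy objective: local minimum near x = 2, global minimum near x = -2.03."""
    return (x * x - 4) ** 2 + x


# Start inside the local (worse) minimum; early uphill moves let the search escape it.
best_x, best_f = simulated_annealing(tilted_double_well, x0=2.0)
print(f"best x = {best_x:.3f}, f(best x) = {best_f:.3f}")
```

A pure downhill search started at x = 2 would stay in the local minimum; accepting some worse states while the temperature is high is what lets annealing cross the barrier. SciPy also ships ready-made global optimizers of both kinds mentioned on these slides, `scipy.optimize.dual_annealing` and `scipy.optimize.basinhopping`.
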

## The I Know First Algorithm

The system is a predictive model based on artificial intelligence and machine learning, and it incorporates elements of artificial neural networks and genetic algorithms. It tracks and predicts the flow of money from one market or investment channel to another. The results are constantly improving as the algorithm learns from its successes and failures.

I Know First predicts 2000 markets every day.

## Synopsis of the Algorithm

Two indicators:

- Signal: the predicted movement of the asset.
- Predictability indicator: the historical correlation between the prediction and the actual market movement.

Figure: daily market heat map.

Figure: forecast vs. actual. XOMA returned 61.45% in 1 month from this forecast.

## I Know First Sample Portfolio

A simple way to invest would be to buy all of I Know First's 3-month Top 10 stock predictions in equal weights on the first day of each quarter, as we did in our sample portfolio. However, we advise checking the forecasts daily to identify trends in the algorithm.

I Know First beats the S&P 500 by 96.4%.

## I Know First Live Portfolio 2015 Performance

I Know First beats the S&P 500 by 20.8%.

## Main Features of the Algorithm

- Identifies the best market opportunities daily
- 6 time frames
- Tracks over 3,000 markets
- Self-learning, adaptable, always learning new patterns
- Scalable
- A decision support system (DSS)
- Predictability indicator
- Strong historical performance: 60.66% gain in 2013

The algorithm becomes more and more accurate with every prediction, as it constantly tests multiple models in different market circumstances.

## More Applications of the I Know First Algorithm

- Time series forecasting of multidimensional chaotic systems.
- "What if?" scenario-based forecasting.
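
The slides define the predictability indicator only as the historical correlation between the prediction and the actual market movement. A minimal sketch of that definition, assuming the plain Pearson correlation coefficient over past pairs of (signal, realized move), is shown below; the function name and data are hypothetical, and the source does not describe how I Know First actually computes its indicator.

```python
import math
from typing import Sequence


def predictability_indicator(signals: Sequence[float], moves: Sequence[float]) -> float:
    """Pearson correlation between past predicted signals and realized moves
    (an assumed reading of 'historical correlation'); 0.0 if either is constant."""
    n = len(signals)
    if n != len(moves) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_s = sum(signals) / n
    mean_m = sum(moves) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(signals, moves))
    var_s = sum((s - mean_s) ** 2 for s in signals)
    var_m = sum((m - mean_m) ** 2 for m in moves)
    if var_s == 0 or var_m == 0:
        return 0.0
    return cov / math.sqrt(var_s * var_m)


# Hypothetical past signals and the asset moves that followed them.
signals = [1.2, -0.5, 0.8, -1.0, 0.3]
moves = [2.0, -1.0, 1.5, -2.5, 0.2]
score = predictability_indicator(signals, moves)
print(f"predictability indicator = {score:.2f}")
```

Because chaotic statistics drift, such an indicator would normally be recomputed over a recent window rather than over the full history.
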
