a stock prediction system using open-source software

40

Upload: fred-melo

Post on 28-Jul-2015

329 views

Category:

Software


3 download

TRANSCRIPT

A Stock Prediction System using open-source software

Fred Melo [email protected] @fredmelo_br

William Markito [email protected]

@william_markito

It's all about DATA

Data SourcesLook for patterns

Prediction

Machine Learning is the answer

Neural Networks

Clustering Genetic Algorithms

Train with historical dataset

Apply model to the new input

Applying Machine Learning

Hard to add new data sources

Why?

Hard to scale

Why so hard?

Hard to make it real-time

HDFS

Data Lake

Store Analytics

Hard to change Labor intensive

Inefficient

No real-time information ETL based Data-source specific

Traditional models are reactive and static

HDFSData LakeExpert System /

Machine Learning

In-Memory Real-Time Data

Continuous Learning Continuous Improvement

Continuous Adapting

Data Stream Pipeline

Multiple Data Sources Real-Time Processing Store Everything

Stream-based, real-time closed-loop analytics are needed

Info

Analysis

Look at past trends (for similar input)

Evaluate current input

Score / Predict

Neural Network

How can it be addressed?

Info

Analysis

Filter

[ json ]

Neural Network

How can it be addressed?

Info

Analysis

Filter EnrichNeural Network

How can it be addressed?

Info

Analysis

Neural NetworkFilter Enrich Transform

How can it be addressed?

Info

Analysis

Filter Enrich Transform

Neural Network

How can it be addressed?

Info

Analysis

Filter Enrich Transform

Transform

Neural Network

How can it be addressed?

Neural Network

In-Memory Data GridReal-time scoring

How can it be addressed?

Train

Neural Network

In-Memory Data Grid

Front-end

Update Push

How can it be addressed?

Ingest Transform SinkSpringXD

Store / Analyze

Fast Data

Distributed Computing

Predict / Machine Learning

Other Sources and Destinations

JMS

Streaming real-time analytics architecture

Transform Sink

SpringXD

Extensible Open-Source Fault-Tolerant Horizontally Scalable

HTTP

Machine Learning

Fast DataFilter

Predict SinkHTTP

Split

Dashboard

Push

Demo Architecture

SpringXD

shell - R

Transformer

geode-json client

geode-json client

http-client

http-server

obj-to-json

splitter

splitter

Simulator

tap

SpringXD

INGEST / SINK PROCESS ANALYZE

• Little or no coding required • Dozens of built-in connectors • Seamless integration with Kafka,

Sqoop • Create new connectors easily

using Spring

• Call Spark, Reactor or RxJava • Built-in configurable filtering,

splitting and transformation • Out-of-box configurable jobs for

batch processing

• Import and invoke PMML jobs easily

• Call Python, R, Madlib and other tools

• Built-in configurable counters and gauges

Data Stream Pipelining

SpringXD

XD NodesXD NodesXD NodesXD Nodes

Ingest

SpringXD

Split Filter Transform Sink

XD admin

XD Nodes

Ingest Split Filter Transform Sink

Stream Deployment

Messaging

Scale-Out and HA Architecture

Transform Sink

SpringXD

Extensible Open-Source Fault-Tolerant Horizontally Scalable

HTTP

Machine Learning

Fast DataFilter

Predict SinkHTTP

Split

Dashboard

Push

Demo Architecture

Geode client-server architecture

Partitioned Regions

Event handling

Transform Sink

SpringXD

Extensible Open-Source Fault-Tolerant Horizontally Scalable

HTTP

Machine Learning

Fast DataFilter

Predict SinkHTTP

Split

Dashboard

Push

Demo Architecture

Neural Networks

Neural Networks

medium avg (x+1)

relative strength (x)

medium avg (x)

price(x)

Neural Network

Neural Network

Transform Sink

SpringXD

Extensible Open-Source Fault-Tolerant Horizontally Scalable

HTTP

Machine Learning

Fast DataFilter

Predict SinkHTTP

Split

Dashboard

Push

Demo Architecture

Demo Time

SpringXD

shell - R

Transformer

geode-json client

geode-json client

http-client

http-server

obj-to-json

splitter

splitter

Simulator

tap

SpringXD

http://projectgeode.org

http://projects.spring.io/spring-xd

http://www.r-project.org

Follow-up: In-Memory Unconference

"A place for all things in-memory: projects, people, ideas, roadmaps, discussions."Location: Hill Country A/B”

Weds 4:15pm - 6pm. (after this talk)

The demo code is on GitHub!

@fredmelo_br

@william_markito