MEMD Framework for Big Data

INTRUSION DETECTION IN BIG DATA

Proposed Framework

ANOMALY VS INTRUSION

• Considering a large sample space of big data, an anomaly is categorised as an extreme value when compared against the whole set of data.

• An intrusion is a piece of data that might look normal when compared against the whole set but has completely different meta-data associated with it, indicating an attempt at manual interference.

Fig. A: Anomalous data points.

Fig. B: Intrusion case (data looks like it follows the function flow but has a different time stamp or other type of meta-data).
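The distinction between the two cases can be made concrete with a small sketch. It is only an illustration under stated assumptions: the record layout (`timestamp`, `value`), the fixed sampling `period`, the z-score limit and the time-stamp tolerance are all hypothetical choices that do not come from the slides.

```python
from statistics import mean, stdev


def label_records(records, period=1.0, z_limit=3.0, ts_tolerance=0.25):
    """Label each record as 'anomaly', 'intrusion' or 'normal'.

    Assumptions (hypothetical): records arrive on a regular grid with spacing
    `period`, and the only meta-data checked is the time stamp.
    """
    values = [r["value"] for r in records]
    mu, sigma = mean(values), stdev(values)
    start = records[0]["timestamp"]
    labels = []
    for i, r in enumerate(records):
        z = abs(r["value"] - mu) / sigma if sigma > 0 else 0.0
        expected_ts = start + i * period
        if z > z_limit:
            # Extreme value compared with the whole set.
            labels.append("anomaly")
        elif abs(r["timestamp"] - expected_ts) > ts_tolerance:
            # Value looks normal, but the meta-data does not fit.
            labels.append("intrusion")
        else:
            labels.append("normal")
    return labels
```

A record with an ordinary value but an off-grid time stamp is reported as an intrusion, while a record with an extreme value is reported as an anomaly regardless of its meta-data.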

PRELIMINARY MODEL FOR DETECTION

Process:
1. Train the prediction algorithm on a sample of the dataset.
2. Perform live prediction of incoming data.
3. Compare the incoming data against the prediction.
4. Classify the data.
5. If the prediction algorithm underperforms, rerun steps 1 - 4 until the weights are consistent.
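The five steps can be sketched as a small streaming loop. This is a minimal sketch, not the authors' implementation: `MeanPredictor`, the error `threshold` and the `max_misses` retraining rule are hypothetical stand-ins, since the slides do not name a prediction algorithm or define "underperforms".

```python
from collections import deque


class MeanPredictor:
    """Stand-in predictor (assumption): forecasts the mean of its training sample."""

    def fit(self, sample):
        self.level = sum(sample) / len(sample)

    def predict(self):
        return self.level


def detect(stream, train_size=20, threshold=5.0, max_misses=5):
    """Return one label per value that arrives after the training sample."""
    stream = list(stream)
    model = MeanPredictor()
    model.fit(stream[:train_size])                      # step 1: train on a sample
    recent = deque(stream[:train_size], maxlen=train_size)
    labels, misses = [], 0
    for x in stream[train_size:]:
        error = abs(x - model.predict())                # steps 2-3: predict and compare
        is_anomaly = error > threshold                  # step 4: classify
        labels.append("anomaly" if is_anomaly else "normal")
        recent.append(x)
        misses = misses + 1 if is_anomaly else 0
        if misses >= max_misses:                        # step 5: sustained misses, retrain
            model.fit(list(recent))
            misses = 0
    return labels
```

With these settings a single spike is flagged once and then forgotten, while a permanent level shift is flagged until repeated retraining on the most recent window has moved the predictor to the new level.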

Diagram blocks: Data Stream, Training, Prediction Algorithm, Checking & Detection, Classifying

Side Note:

This framework takes no account of the meta-data and can only detect anomalies that exceed the threshold. If the system continuously underperforms, this can indicate a change in the influencing factors or a faulty streaming system.

This method is good for detecting one-time anomalies, and it regulates itself if permanent changes happen to the data stream.

“MEMD” ALGORITHM*

• Multivariate empirical mode decomposition is an attempt at addressing the problems of real-valued EMD for multivariate signals.

• The proposed method is to generate n-dimensional envelopes by using signal projections in n-dimensional spaces along different directions and averaging them to produce a local mean.

• In order to choose directional vectors for projections, a sampling based on low-discrepancy pointsets is used (e.g. Halton sequence, Hammersley sequence).

• This algorithm has been used on synthetic signals as well as real-world inertial body motion recordings.

* N. REHMAN AND D. P. MANDIC, Proc. R. Soc. A (2010) 466, 1291–1302

http://www.commsp.ee.ic.ac.uk/~mandic/research/Rehman_Mandic_Multivariate_EMD_Pro_Roy_Soc_A_2010.pdf
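The projection-and-averaging idea from the bullets above can be sketched for a trivariate signal as follows. This is a simplified illustration, not the published algorithm: the mapping from a 2-D Hammersley point set to the sphere, the pinning of both signal ends as spline knots, and the function names are assumptions made here. It computes a single local mean and does not include the sifting iterations or stopping criteria of a full decomposition.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def radical_inverse(i, base=2):
    """Van der Corput radical inverse of the integer i in the given base."""
    result, fraction = 0.0, 1.0 / base
    while i > 0:
        result += (i % base) * fraction
        i //= base
        fraction /= base
    return result


def hammersley_directions(k):
    """Return k unit vectors in R^3 built from a 2-D Hammersley point set."""
    u = np.arange(k) / k
    v = np.array([radical_inverse(i) for i in range(k)])
    z = 1.0 - 2.0 * u                     # height on the sphere
    r = np.sqrt(1.0 - z ** 2)             # radius of the circle at that height
    phi = 2.0 * np.pi * v                 # angle around the vertical axis
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), z])


def local_mean(signal, t, directions):
    """Average the envelope curves obtained along each direction.

    signal has shape (T, 3); t has shape (T,) and is strictly increasing.
    """
    envelopes = []
    for d in directions:
        projection = signal @ d
        peaks = argrelextrema(projection, np.greater)[0]
        if len(peaks) < 2:
            continue                      # too few maxima along this direction
        # Assumption: pin both ends of the signal to keep the spline bounded.
        knots = np.concatenate(([0], peaks, [len(t) - 1]))
        envelopes.append(CubicSpline(t[knots], signal[knots])(t))
    if not envelopes:
        raise ValueError("no direction produced enough maxima")
    return np.mean(envelopes, axis=0)
```

Subtracting the result from the input, `signal - local_mean(signal, t, directions)`, gives a candidate detail signal in the sense used by the bullets above.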

“MEMD” USE CASES

Fig. A: Decomposition of a synthetic multivariate signal.

Fig. B: Real-world signal decomposition (Tai Chi movement)

PROPOSED DETECTION FRAMEWORK USING “MEMD” ALGORITHM

Diagram blocks: Data stream, MEMD, Sampling, Projection, Interpolation, Detail Extraction, Training, Prediction Algorithm, Checking & Detection, Classifying

Process:
1. Break down the data stream into n component vectors.
2. Feed each individual vector into the MEMD algorithm and extract the envelope.
3. Calculate the mean of the envelopes and extract the detail.
4. Use the combination of the data stream envelope mean and the individual envelopes to train an n-dimensional predictor.
5. Use the prediction to check the component vectors of the data.
6. Classify based on the outcome.
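The shape of this pipeline can be sketched per channel as below. Note the substitution: a centred rolling mean stands in for the MEMD local mean, and a per-channel mean and standard deviation of the detail stand in for the n-dimensional predictor. The class name `ChannelDetector`, the window `width` and the limit `k` are hypothetical choices, not values from the slides.

```python
import numpy as np


def rolling_mean(x, width):
    """Centred moving average per column (stand-in for the MEMD local mean)."""
    half = width // 2
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    padded = np.pad(x, ((half, half), (0, 0)), mode="edge")
    return np.column_stack([np.convolve(c, kernel, mode="valid") for c in padded.T])


class ChannelDetector:
    """Flags samples whose per-channel detail deviates from the trained range."""

    def __init__(self, width=25, k=4.0):
        self.width, self.k = width, k

    def _detail(self, x):
        # Steps 2-3: local mean per component, then the detail around it.
        return x - rolling_mean(x, self.width)

    def fit(self, train):
        # Step 4: learn the typical detail of every component (columns of train).
        detail = self._detail(train)
        self.mu = detail.mean(axis=0)
        self.sigma = detail.std(axis=0) + 1e-12   # guard against constant channels
        return self

    def check(self, live):
        # Steps 5-6: compare live detail with the trained range, per component.
        z = np.abs(self._detail(live) - self.mu) / self.sigma
        return z > self.k                          # boolean mask of shape (T, n)
```

The returned mask shows which component deviated and when, which is the information the classification step needs. Samples near the ends of the window are less reliable because of the edge padding.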

ADVANTAGES OF THIS METHOD

• By training on and checking the n-dimensional vector components of a data stream, this method gives a better way of classifying the type of input and of acting on criteria that rely on the modification of a subcomponent rather than of the whole data stream.

• E.g.: Consider a stream of SQL transactions. If the number of transactions as a function of time is the only prediction factor, the only conclusion that can be drawn is whether or not an influx of transactions has happened. This alone can be classified as an anomaly, but it does not provide contextual information.

• Now if we were to split the SQL data stream into multiple sub-functions of time, such as type of SQL query (INSERTION, DELETION), amount of catalogue data requested (number of records), etc., then that provides sub-channels on which to build prediction algorithms in order to detect the type of anomaly and classify it as an intrusion or just as a change in the external factors that govern the system.
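The SQL example can be sketched by bucketing a transaction log into one count series per query type and checking each series separately. The event format `(timestamp_seconds, query_type)`, the 60-second buckets, the z-score limit and the floor of one count on the standard deviation are assumptions made for this illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def channel_counts(events, bucket_seconds=60):
    """Group (timestamp_seconds, query_type) events into per-type bucket counts."""
    counts = defaultdict(Counter)
    for ts, qtype in events:
        counts[qtype][int(ts // bucket_seconds)] += 1
    return counts


def deviating_channels(counts, baseline_buckets, live_bucket, z_limit=3.0):
    """Return the query types whose count in live_bucket departs from the baseline."""
    flagged = []
    for qtype, per_bucket in counts.items():
        history = [per_bucket.get(b, 0) for b in baseline_buckets]
        mu, sigma = mean(history), stdev(history)
        # Assumption: floor sigma at one count so a flat baseline cannot divide by zero.
        z = (per_bucket.get(live_bucket, 0) - mu) / max(sigma, 1.0)
        if abs(z) > z_limit:
            flagged.append(qtype)
    return sorted(flagged)
```

If only one sub-channel deviates, such as a burst of DELETION queries while INSERTION volume stays flat, the result points at the kind of change, which a single total-transactions series cannot do.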
