UNCLASSIFIED – DISTRIBUTION STATEMENT A – Reference Number 18-S-1486; May 9, 2018 TRMC Big Data Analysis / Knowledge Management Initiative Ryan Norman Big Data and Knowledge Management Initiative Lead Test Resource Management Center [email protected]


Page 1: TRMC Big Data Analysis / Knowledge Management Initiative · TRMC Big Data Analysis / Knowledge Management Initiative Ryan Norman Big Data and Knowledge Management Initiative Lead

UNCLASSIFIED – DISTRIBUTION STATEMENT A – Reference Number 18-S-1486; May 9, 2018

TRMC Big Data Analysis / Knowledge Management Initiative

Ryan Norman

Big Data and Knowledge Management Initiative Lead

Test Resource Management Center

[email protected]

Page 2

What is Big Data Analytics?

• The use of advanced statistical analytic techniques in a parallel processing high-performance computing environment against very large diverse data sets that include different types of data

• Allows analysts to make better and faster decisions using data that was previously inaccessible or unusable

• Previously under-utilized data sources can be analyzed to gain new insights resulting in significantly better and faster decisions

• Instead of analyzing small chunks of data, Big Data Analytics can give the analyst a broad view of the system, allowing the discovery of “unknown unknowns.”

• Most important (and relevant to T&E) big data analytics techniques:
– Anomaly Detection – Did something go wrong?
– Causality Detection – What contributed to it?
– Trend Analysis – What’s happening over time?
– Predicting Equipment Function and Failure – When will something go wrong?
– Regression Analysis – How is today’s data different than the past?
– Data Set Comparison – Is test repeatable? Is the simulation the same as the test? Is the perceived truth the same as the ground truth?
– Pattern Recognition – Are there hidden relationships in the data set?
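As a rough illustration of the first technique in the list above, anomaly detection on a single telemetry channel can be sketched with a robust z-score. This is a minimal sketch and not the TRMC toolset: the function name `flag_anomalies`, the 3.5 threshold, and the sample readings are all hypothetical.

```python
from statistics import median

def flag_anomalies(samples, threshold=3.5):
    """Return indices of samples whose robust z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by the outliers being searched for than mean/stdev.
    """
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 scales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [i for i, x in enumerate(samples)
            if abs(0.6745 * (x - med) / mad) > threshold]

# Hypothetical sensor readings with one obvious excursion at index 5.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0, 10.1, 10.0]
print(flag_anomalies(readings))
```

A median-based score is used here because a single large excursion inflates the mean and standard deviation enough to hide itself in short windows.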


Better tools and techniques so analysts can do their jobs

Page 3

Example Big Data Analytics Return on Investment

Analysis: Brief (~300 ms) false on-ground event for sensor during flight

Big Data Analytics enables faster & more comprehensive analysis across the lifecycle of a program

Result: JSF-KM project discovered unknown problem with ground sensor
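The kind of check behind this example can be sketched as a search for short-lived state changes in a discrete signal. The sketch below is an assumption-laden illustration, not the JSF-KM method: it assumes a boolean on-ground signal with millisecond timestamps, and the function name, the 500 ms cutoff, and the sample trace are hypothetical.

```python
def brief_true_runs(samples, max_duration_ms=500):
    """Find short runs where a boolean signal reads True.

    samples: list of (timestamp_ms, on_ground) pairs in time order.
    Returns (start_ms, end_ms) for each True run that is followed by a
    False sample and lasted no longer than max_duration_ms.
    """
    events = []
    run_start = None
    for t, on_ground in samples:
        if on_ground and run_start is None:
            run_start = t                      # run begins
        elif not on_ground and run_start is not None:
            if t - run_start <= max_duration_ms:
                events.append((run_start, t))  # brief glitch
            run_start = None                   # run ends
    return events

# Hypothetical in-flight trace sampled every 100 ms: the signal reads
# "on ground" for about 300 ms starting at t=1000.
trace = [(t, 1000 <= t < 1300) for t in range(0, 2000, 100)]
print(brief_true_runs(trace))
```

Run across every recorded flight rather than one test at a time, a scan like this is how a transient that no analyst was looking for could surface.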

Page 4

Need: An Evaluation Revolution

• Most T&E investments have been focused on the “T” rather than the “E”
– Our analysis & evaluation capabilities are not keeping up with the complexity and speed required by today’s acquisition systems
– The next generation of acquisition systems will be exponentially more complex than today’s

• Impact: T&E quality is inadequate for our needs
– More data is being collected than can be properly analyzed
– Only a tiny fraction of data is looked at
– Analysis occurs on a small fraction of data
– Focus is on a single test, rather than data collected across the system lifecycle
– No systematic anomaly detection, trend analysis, regression analysis, causality analysis, pattern recognition, simulation/test comparison, or perceived truth/ground truth comparison is being done

• Impact: T&E timeliness is inadequate for our needs
– Analyst retrieval of test data in many cases takes days/weeks rather than seconds/minutes
– Sometimes it’s easier (though not cheaper) to just re-run a test rather than find old data that may answer the question
– Long data ingest times prevent proper debriefing of test participants after a test is over, since their statements cannot be correlated with data in real time

• Impact: T&E dollars are being spent unnecessarily
– More tests than necessary are being done, sometimes at enormous expense
– Cross-program lessons learned only occur anecdotally


A systematic approach to Big Data Analytics and Knowledge Management is required to address these three serious issues

Page 5

Long-term DoD T&E Big Data and Knowledge Management Vision


Result: T&E data used more effectively & efficiently during acquisition

• The primary product of T&E is data & knowledge

• Embrace KM & Big Data Analytics to efficiently handle & securely share T&E data

• Organize T&E data to build knowledge across all DoD acquisitions

• Federate distributed data repositories to enable execution & automated search scenarios that cannot occur today

• Use modern mechanisms to enable collaboration between SMEs in government and industry

Fundamental Functions Performed by KM and BDA

Page 6

Realizing Big Data and Improved DoD T&E Knowledge Management

1. Understand and Document T&E challenges & needs
– (FY12) Completed Data Management for Distributed Testing (DM-DT) Study
− Result: Developed functional requirements for T&E enterprise distributed Data Management
– (FY13) Comprehensive Review of T&E Infrastructure report published
− Key Recommendation: Use DoD cloud solution for T&E data
− Key Recommendation: USD(AT&L) establish a DoD-wide KM capability for T&E to help achieve better acquisition outcomes and reduce costs

2. Execute proofs of concept that inform an enterprise approach to T&E Knowledge Management
– (FY15-18) Joint Strike Fighter Knowledge Management (JSF-KM) project
− Goal: Assess KM technologies and methodologies in support of an existing acquisition program
– (FY15-17) Collected Operational Data Analytics for Continuous Test & Evaluation (CODAC-TE) project
− Goal: Apply KM technologies and methodologies across the lifecycle

3. Develop investment plan that achieves strategic objectives:
– Integrate T&E infrastructure into cohesive Knowledge Management enterprise
– Modernize T&E practices & processes to leverage Big Data analytics techniques
– Apply Big Data analytics tools & techniques to the T&E mission space

Page 7

Investments Path Forward: It Starts with Architecture

• The Big Data and Knowledge Management Architecture Reference Document (ARD) identifies:
– Deficiencies in current T&E data analysis and knowledge management practices
– Government, commercial and open source software and hardware that could address these deficiencies
– The end state we are looking to achieve

• TRMC has released the ARD for feedback in preparation for making it a JMETC community standard
– Reviewers should request access to BDKM User Group on TRMC website
– Standardization scheduled for August JMETC Configuration Review Board

• Once a reference architecture is standardized, we can build it
– Goal: Synergize evaluation investments across DoD T&E


https://www.tena-sda.org/display/BDKM/Documentation

Page 8

What do we need?

1. Integrated Local Data

2. Cloud Analytics Capability

3. Big Data Tools

4. Trained Data Science Workforce

• Integrated
• Scalable
• Cost-Effective
• State-of-the-Art

[Figure: Cloud-Based Big Data Analytics and Knowledge Management System. Labeled components: Individual Range, made up of Current Range Infrastructure (Existing Tools, Existing Storage, Existing Ingest Capabilities) and Range Augmentation (Virtualized Big Data Tools, Some Processing, Some Tiered Storage, MLS Security, Enhance Ingest); Regional Analytics Capability (Virtualized Big Data Tools, Processing, Tiered Storage, MLS Security, Data Scientists), shown several times. Data items shown: Working Files, Quick-Look, Schedule Info, Application Repository, Reports, Data, Video, Audio, Imagery. Legend: New, Existing.]

Page 9

Big Data and TENA Relationship: The Big Data Analytics Architecture is an Extension of TENA Into the Analytic World – Seamless Integration

Event Data Is Ingested into Big Data Enterprise System

[Figure: Individual Range. Labeled components: Working Files; Quick-Look; Current Range Infrastructure (Existing Tools, Existing Storage, Existing Ingest Capabilities); Range Augmentation (Virtualized Big Data Tools, Some Processing, Some Tiered Storage, MLS Security, Enhance Ingest).]

Page 10

Big Data Software Architecture Overview

[Figure: layered architecture diagram. Main labeled blocks, with a selection of the component labels shown in each area:
– Data Sources: Existing Range Databases, Flat Files, Raw Files, Streams; Structured, Unstructured, Audio/Video
– Extract-Transform-Load: Micro-batch, Mega-batch, Parallel; Verify, Transform, Add Metadata, Index, Warehouse
– Structured Database and Unstructured/Semi-Structured Database (Hadoop); Structured Data Engine and Unstructured Data Engine; Key-Value Store, Distributed File System, Graph-Based, New Databases
– Query Engine – Federated access for both Structured and Unstructured Data: Build Queries, Filter, Sort, Summarize, Parallelize, Optimize, SQL Services
– Analytic Services: Data Analysis Packages, User-Defined Analytic Plugins, Machine Learning, Data Mining, Statistics, AI Tools, Simulation, Audio/Video Analysis
– T&E Specific Custom BDA Services: Anomaly Detection, Trend Analysis, Causality Detection, Regression Analysis, Ground Truth Comparison, Pattern Recognition
– Big Data Visualization and User Interface: Quick-Look, Real-Time, Continuous, 2D/3D/Anim, Display Reports, Design Reports, Customized Displays, Display Alerts, Customized UIs
– Data Services: Store, Retrieve, Versioning, Tagging, Publish/Subscribe, Crawl/Index, Transfer, Transform, Catalog, Search, Verify, Messaging, Metadata, Sync Data/Video, Spatio-temporal, Ontologies, C/R/U/D Consistency
– Security (UC, S, TS, SAP, SAR): Authenticate, Authorize, Access Control, Enforce Policies, Enforce Workflow, Threat Detection, Intrusion Detection, Active Defenses, Encryption, Audit, Alerts
– Setup, Configure, and Manage: Policies, Security, Define Metadata, Prioritization, Configuration, Metadata Replication, Remote Data Replication, Load Balancing, Fault/Recovery, Administrative, COO/DR, Archive Tools, DB Admin, Config Mgmt
– Infrastructure: Massively Parallel Tiered Computing, Storage, and Network Infrastructure At Multiple Independent Levels of Security; MILS Secure Cloud; Abstraction Layer (Virtualization), Hypervisor, VM Library, Provisioning; Infrastructure as a Service, Platform as a Service, Software as a Service, Simulation as a Service; Virtualized Legacy Tools, Virtualized New Tools; Existing Range Computing and Storage; MPP Programming and Execution Engine
– Other labels: TENA Data Lifecycle, Pipeline, Workflow, Range Protocols, IDE, SDK, Streaming, Scripting
– Legend: COTS/GOTS Software, New Hardware/Network, TRMC-Developed Software, Existing Range HW/SW]
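The ingest stages named in the architecture figure (Verify, Transform, Add Metadata, Index, Warehouse) can be illustrated as a chain of small functions. This is a toy sketch under stated assumptions, not the ARD design: in-memory Python structures stand in for the warehouse and index, and the record fields (`channel`, `value`), the function names, and the source file name are hypothetical.

```python
from datetime import datetime, timezone

def verify(record):
    """Reject records missing the fields later stages rely on."""
    return "channel" in record and "value" in record

def transform(record):
    """Normalize types so downstream queries see consistent data."""
    return {"channel": record["channel"].lower(),
            "value": float(record["value"])}

def add_metadata(record, source):
    """Attach provenance so the record can be found and traced later."""
    return {**record, "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat()}

def ingest(raw_records, source):
    """Run verify -> transform -> add metadata -> index -> warehouse."""
    warehouse, index = [], {}
    for raw in raw_records:
        if not verify(raw):
            continue  # skip records that fail verification
        rec = add_metadata(transform(raw), source)
        # Index: map each channel to its row positions in the warehouse.
        index.setdefault(rec["channel"], []).append(len(warehouse))
        warehouse.append(rec)
    return warehouse, index

# Hypothetical raw records; the second one fails verification.
raw = [{"channel": "ALT", "value": "1200"}, {"value": "3"},
       {"channel": "alt", "value": 1250}]
warehouse, index = ingest(raw, source="flight_042.csv")
print(index)
```

Keeping each stage a separate function mirrors the diagram's separation of concerns and makes it straightforward to run the same stages in micro-batch or parallel form.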

Page 11

Security Architecture: Notional MILS and CDS

[Figure: Regional Analytics Capability. Labeled components: Classification A, Classification B, Classification C, and an MLS Database, each with Long-Term Storage, Med-Speed, and High-Speed tiers; MILS-CDS; Enterprise Big Data Analysis.]

Page 12

Help Wanted: DoD Data Scientists

[Figure: Venn diagram. Labels: Data Science; Computer Science; Machine Learning; Math and Statistics; Traditional Software; Traditional Research; Subject Matter Expertise; Big Data Analytics.]

• Unique DoD data challenges require an interdisciplinary approach, with skills & analytical techniques drawn from 3 broad areas:

• Statistics – Especially Bayesian statistics with multivariate analysis
– Knowledge of probability, distributions, hypothesis testing, and multivariate analysis

• Computer Science
– Databases, SQL, data structures, algorithms, parallel computing, distributed computing, etc.

• Subject Matter Expertise
– Ability to assess which models are feasible, desirable, and practical in different settings
– Clear idea of the distinction between correlation and causality

“Proliferation of sensors and large data sets are overwhelming analysts, as they lack the tools to efficiently process, store, analyze, and retrieve vast amounts of data”

ASD(R&E) Department of Defense Research & Engineering website
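To make the Bayesian statistics skill listed above concrete, the sketch below updates a belief about a failure probability as test results arrive. It is a textbook Beta-Binomial update and not anything taken from the briefing: the function names, the uniform prior, and the trial counts are hypothetical.

```python
def beta_binomial_update(prior_alpha, prior_beta, failures, trials):
    """Update a Beta prior on a failure probability with new test results.

    With a Beta(alpha, beta) prior and binomial data, the posterior is
    Beta(alpha + failures, beta + successes).
    """
    successes = trials - failures
    return prior_alpha + failures, prior_beta + successes

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical numbers: a uniform Beta(1, 1) prior, then 2 failures
# observed in 18 trials.
alpha, beta = beta_binomial_update(1, 1, failures=2, trials=18)
print(alpha, beta, beta_mean(alpha, beta))
```

The posterior from one test event can serve as the prior for the next, which is one way evidence could accumulate across a system lifecycle instead of being evaluated one test at a time.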

Page 13

TRMC Investments Support Joint Strike Fighter Needs

[Figure labels: Realistic Distributed Mission Environments; On-Board Instrumentation; CRIIS; Miniaturized Data Capture; QRIP On Board JSF; Model Validation & Improvement; Interoperable With JSE; TENA; LVC Integration; Next-Generation Threats; NCR; Cyber T&E; EWIIP; EW; Analysts, Evaluators, & Decision-Makers; Data Ingest & Validation; RAPIDS; Big Data Analytics & Evaluation; JSF-KM; KM Data Archive; MLS-JCNE; Cross Domain Solutions; MILS Network; JMETC.]

Page 14

JSF T&E KM & Data Needs Addressed by TRMC

1. Data Capture: DART Pod is too large, requires significant jet modifications, and is not certified to support F-35 full operational profile

2. Data Warehousing: Flight test data should be stored in a government facility to expedite data access & discovery

3. Data Ingest: Current DART Pod test data ingest is too slow to meet multi-ship quick-look and quick-turn requirements– examples: 2 on 2; 4 turn 2; 4 on 4 turn 4 on 4

4. Data Access: Test data should be available for quick-look analysis during mission debrief to inform decision making

5. Video: DART Pod video should be available for quick-look analysis during mission debrief to inform decision making

6. Big Data Analytics: Analysis capabilities need to proactively identify “unknown unknowns” and other anomalies impossible for a human to discern

7. Remote Operations: Analysts need a rapid reaction capability to harvest data and conduct quick-look analyses in situations / locations where a network connection is not possible

Page 15

JSF-KM Improvements to Existing T&E Capabilities

Note: Numbers reflect single 2 hour flight mission

[Figure: timeline comparison with three rows labeled DT Today, OT Today, and With JSF-KM. Stages shown: Data Ingest (Parallel Data Ingest in one row), Raw Data Available, Video/Data at Post-Mission Debrief, Govt. Analyst Data Request, Analysis, Big Data Analytics, Data Ready for Use @ (Govt), Data Ready for Use @ LM. Values recovered from the figure, in extraction order:
– Parallel Data Ingest: 30 minutes (multiple aircraft)
– Data Ingest, Govt. Analyst Data Request, Analysis: 2 hours (per aircraft), 1 day, 1 week
– 30 seconds
– Data Ingest, Govt. Analyst Data Request, Analysis: 1-2 hours (per aircraft), 10 minutes, 4-5 hours
– Data Ready for Use @ (Govt): 30 seconds; 90 minutes
– Data Ready for Use @ LM: >20 weeks of data available online
– Data Ready for Use @ (Govt): 3 weeks of data available online]
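The difference between per-aircraft ingest and the "Parallel Data Ingest" stage in the figure above can be sketched with a thread pool, where the wall-clock time is bounded by the slowest aircraft instead of the sum of all of them. This is only an illustration of the idea: the function names and aircraft identifiers are hypothetical, and the ingest body is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def ingest_aircraft(tail_id):
    """Placeholder for reading one aircraft's recorded data."""
    # A real ingest would read and validate recorder files here.
    return tail_id, f"ingested {tail_id}"

def ingest_mission(tail_ids, max_workers=4):
    """Ingest all aircraft from one mission concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and yields (id, status) pairs.
        return dict(pool.map(ingest_aircraft, tail_ids))

# Hypothetical four-ship mission.
status = ingest_mission(["AF-01", "AF-02", "AF-03", "AF-04"])
print(status)
```

Threads suit I/O-bound ingest; a CPU-bound decode step would more likely call for separate processes or a distributed engine.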

Page 16

Sample JSF-KM Success Stories

• Identified flights which experienced propulsion component failure
– During a blind analysis of 1,392 flights of propulsion data, a JSF-KM data scientist identified 7 of the 10 flights with engine issues already known to JSF analysts
– Led to creation of a predictive model* for identifying future failures (*model validation pending)
– Without JSF-KM this predictive model may not have been generated

• Video available during post-mission debrief due to JSF-KM data ingest improvements from DART Pod
– Existing tools could not process video in time to support post-mission debrief
– Without JSF-KM, there would be no flight video during post-mission debrief

• Discovery of avionics box issue during first night mission
– Pilot and analyst discovered problem from video data available 30 minutes after landing
– Avionics box was replaced before another mission was flown
– Without JSF-KM, problem would not have been discovered for several days

• Reduced data profile time from 5+ hours to 47 seconds per query
– Big Data tool enabled massive improvement to data profile generation
– Without JSF-KM it would still take 5+ hours to perform data profile runs

• 9 hour routine analysis process reduced to 23 milliseconds
– Patuxent River system drastically reduced routine MATLAB analysis process from 9 hours to 23 milliseconds prior to KM system even being fully deployed
– Patuxent River leadership already identifying other airframes which could use the system
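The figures quoted in these success stories can be turned into ratios with a few lines of arithmetic. The sketch below only restates numbers from the slide; the helper name `speedup` is hypothetical, and "5+ hours" is treated as exactly 5 hours, so that ratio is a lower bound.

```python
def speedup(before_seconds, after_seconds):
    """Ratio of the old duration to the new one."""
    return before_seconds / after_seconds

HOUR = 3600.0

# "5+ hours" is a lower bound, so this is a minimum speedup.
profile_speedup = speedup(5 * HOUR, 47)
# 9 hours down to 23 milliseconds.
matlab_speedup = speedup(9 * HOUR, 0.023)
# 7 of the 10 flights with known engine issues were found.
recall = 7 / 10

print(f"data profile: at least {profile_speedup:.0f}x faster")
print(f"routine analysis: about {matlab_speedup:,.0f}x faster")
print(f"known-issue flights found: {recall:.0%}")
```

On these numbers the data profile query is at least roughly 383 times faster, the routine analysis about 1.4 million times faster, and the blind analysis recovered 70% of the flights with known issues.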


Page 17

Big Data Initiative Summary

• TRMC is acting upon recommendations from the Comprehensive Review of T&E Infrastructure. Strategic Goals:
– Integrate T&E infrastructure into cohesive Knowledge Management enterprise
– Modernize T&E practices & processes to leverage Big Data analytics techniques
– Apply Big Data analytics tools & techniques to the T&E mission space

• TRMC-funded proofs of concept are delivering proven capabilities
– Enabling Big Data analytics for JSF T&E
– Improving transfer of knowledge between fielded and next-gen systems
– Informing an investment roadmap that advises future infrastructure, process, and workforce decision-making

• Big Data Architecture Reference Document (ARD) will ensure interoperability and efficiencies in next-generation range knowledge management
– Big Data ARD will be standardized through JMETC Configuration Review Board (JCRB)
– https://www.tena-sda.org/display/BDKM/Documentation


TRMC will consider additional pilots that continue to expand big data analytics in acquisition

Page 18

JMETC Points of Contact (POCs)

JMETC Program Manager: George Rumford, (571) 372-2724, [email protected]

TENA Software Development Activity Director: Ryan Norman, (571) [email protected]

JMETC Information Assurance Lead: Robin Deiulio, (540) [email protected]

Event Scheduling / Event Questions
– Interoperability Events: Keith Poch, (850) [email protected]
– Cyber Events: Lizann Messerschmidt, (571) [email protected]

Connectivity / Network Questions
– JMETC MILS Network (JMN): Ben Wilson, (757) [email protected]
– JMETC Secret Network (JSN): Jeff Braget, (850) [email protected]

NCRC Expansion / Site Questions
– National Cyber Range Complex Director: AJ Pathmanathan, (571) [email protected]
– NCRC, Deputy Director: Rob Tamburello, (501) [email protected]

TENA Products / Software Repository
– TENA Software Development Manager: Steve Bachinsky, (703) [email protected]

Range Support and Training
– TENA User Support Manager: Gene Hudgins, (850) [email protected]

Help Desk
– Action Items, Questions, Tasks, Software Needs, Bug Reports: https://www.tena-sda.org/helpdesk

Miscellaneous Questions
– For JMETC questions: [email protected]
– For TENA questions: [email protected]

Websites
– Unclassified, FOUO, DoD-Restricted (CAC required): https://www.trmc.osd.mil
– Distribution A, Industry, non-DoD (username/password required): https://www.tena-sda.org

JTEX-03: August 21-23, 2018; Orlando, FL