Federal Big Data Working Group Meetup

Federal Big Data Working Group Meetup. Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community. http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup September 22, 2014


Page 1: Federal Big Data Working Group  Meetup

Federal Big Data Working Group Meetup

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

September 22, 2014

Page 2: Federal Big Data Working Group  Meetup

There Was Some Confusion

• Meetup for September 8th: NITRD FASTER CoP Leadership was not ready.
  – So I said we would come back to it and instead support the new Meetup on September 10th featuring Professor Borne.
• New Meetup on September 10th: Data Science and Analytics in Government with Professor Kirk Borne.
  – See comments.
• New Meetup on September 22nd: NITRD FASTER CoP Meetings & Inter-American Development Bank Open Data Portal:
  – See NITRD FASTER CoP Meetings on August 19th and September 16th.
  – Professor Borne suggested Annette Hester, Developer of the Open Data Portal for the Inter-American Development Bank.
• Meetups on October 6th and October 9th:
  – Wolfram Data Science Platform and Michael Daconta, Build a Knowledge Base (with my experimental software EzKb).
  – Digital Government Institute’s First Annual Big Data Conference, Ronald Reagan Building and International Trade Center.

Page 3: Federal Big Data Working Group  Meetup

NITRD FASTER

• Faster Administration of Science and Technology Education and Research (FASTER) Community of Practice (CoP):
  – FASTER’s goal is to enhance collaboration and accelerate agencies’ adoption of advanced IT capabilities developed by Government-sponsored IT research. FASTER hosts Expedition and Emerging Technology workshops as well as monthly meetings with invited guest speakers to achieve this goal.
  – NITRD created FASTER for Federal agency CIOs and/or their advanced technology specialists. FASTER seeks to accelerate deployment of promising research technologies; share protocol information, standards, and best practices; and coordinate and disseminate technology assessment and testbed results. The Federal CIO Council under the leadership of the Office of Management and Budget (OMB) coordinates the use of IT systems. NITRD coordinates federally supported IT research under the leadership of OSTP (with OMB participation). FASTER, supported by the NITRD NCO, communicates with OMB and the Federal CIO Council concerning IT R&D matters that are of general interest to Federal agencies.
  – FASTER is responding to the Open Government Directive by using the technologies of the Social Data Web (e.g., Linked Open Data and the Semantic Web).

Web Site

Page 4: Federal Big Data Working Group  Meetup

Agenda

• Joint Meetup for NSF Data Scientists, Data Infrastructure, and Data Publication
• Agenda:
  – 6:30 p.m. Welcome and Introduction, FASTER Co-chairs, Robert Chadduck (NSF), and Dr. Robert Bohn (NIST)
  – 6:45 p.m. Big Data and the NITRD: NSF Strategic Plan for Big Data and Open Research Data Publications, Dr. George Strawn, NITRD Director Slides
  – 7:10 p.m. Brief Member Introductions
  – 7:15 p.m. NSF Strategic Plan Knowledge Base, Dr. Brand Niemann, Federal Big Data Working Group Meetup Slides
  – 7:45 p.m. Finding Funding for Research Topics on NSF Website (a simple example of “finding”), 04-AUG-2014, Dr. Chuck Rehberg, CTO, Semantic Insights™ a Division of Trigent Software, Inc. Slides
  – 8:30 p.m. Open Discussion
  – 8:45 p.m. Networking
  – 9:00 p.m. Depart

Page 5: Federal Big Data Working Group  Meetup


NSF Strategic Plan Knowledge Base

http://semanticommunity.info/Data_Science/NSF_Strategic_Plan

The FBDWG Meetup is doing the NSF Strategic Plan with Linked Open Data and the Semantic Web!

Page 6: Federal Big Data Working Group  Meetup


Data Science and Analytics in Government with Professor Kirk Borne

• Leigh Carden: Hi everyone! Just wanted to say thanks again to everyone for coming, and to Dr. Borne for a fantastic presentation, as always. I will send a follow up email out to everyone with the slides.

• Kirk Borne: Thank you everyone for your very positive feedback and comments. Thanks for coming, and thanks for your great questions and interactions.

http://www.meetup.com/Data-Science-Analytics-in-Government/events/202086162/

Page 7: Federal Big Data Working Group  Meetup

Data Science and Analytics in Government with Professor Kirk Borne

• Brand Niemann: Some asked for more information on my comments as follows:
  – State-of-the-art in cognitive computing: Wolfram Alpha, Data Science Platform, Discovery Platform, Language, and Data Summit 2014 - Our October 6th Meetup
  – Best Big Data Ontology Application for Federal Government: Professor Jens Pohl - Our July 28th Meetup
  – Why the current "elephants" are good at nothing - Our June 30th Meetup

Page 8: Federal Big Data Working Group  Meetup

Data Science and Analytics in Government with Professor Kirk Borne

• Brand Niemann: I also thought it would be helpful to show an example of content, network, and data analytics and the data ecosystem for big astronomy data (77 TB) we did recently for our October 9th Meetup based on the NITRD FASTER CoP Meeting, August 19, 2014:
  – From SkyServer to SciServer: The JHU DIBBs Project

• Update on the Johns Hopkins University Data Infrastructure Building Blocks (DIBBs) Project latest developments

• An informative presentation and discussion with Dr. Alexander Szalay who shared his perspectives regarding the Johns Hopkins University Data Infrastructure Building Blocks (DIBBs) Project. Dr. Szalay is the Alumni Centennial Professor of Astronomy at the Johns Hopkins University, and also a professor in the Department of Computer Science.

• Webcast: http://youtu.be/Q35Xeh3KDTk

Page 11: Federal Big Data Working Group  Meetup

Data Science for JHU DIBBs Project: Conclusions

• Science is increasingly driven by data (big and small)
• New instruments: “microscopes” & “telescopes” for data
• A major challenge on the “long tail”
• A new, Fourth Paradigm of Science is emerging…
• SDSS has been at the cusp of this transition
• Now the SciServer is continuing the legacy

Gray's Law of Data Engineering:
  – Scientific computing is revolving around data
  – Need scale-out solution for analysis
  – Take the analysis to the data!
  – Start with “20 queries”
  – Go from “working to working”

Source: Professor Szalay, August 19, 2014

Page 12: Federal Big Data Working Group  Meetup


NITRD FASTER CoP Meeting, September 16, 2014: TACC Wrangler

• Dr. Stanzione is the executive director of the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. He shared his perspectives regarding Wrangler, a transformational data intensive resource for the open science community.

• The Texas Advanced Computing Center (TACC) at The University of Texas at Austin and its partners are designing, building and deploying Wrangler, a groundbreaking data analysis and management system for the national open science community. Supported by a grant from the National Science Foundation (NSF), the new system is scheduled for production in January 2015.

• Wrangler is designed from the ground up for emerging and existing applications in data intensive science. Wrangler will be one of the highest performance data analysis systems ever deployed, and will be the most replicated, secure storage for the national open science community.

• Wrangler features a novel primary storage tier based on NAND Flash memory, which will enable reading and writing data at up to one terabyte per second and executing up to 275 million IOPS (input/output operations per second). In addition, the 10 petabyte disk storage system of Wrangler will be fully replicated to Indiana University, a partner in the project, providing data access reliability and security. Wrangler will support the popular Hadoop software framework and a full ecosystem of analytics methods and technologies for Big Data.
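To put the quoted figures in perspective, here is a back-of-the-envelope sketch of how long it would take to stream a data volume equal to the 10 petabyte store at the quoted peak rate of one terabyte per second. Two assumptions are mine and not from the slides: decimal units (1 PB = 1000 TB), and that the peak rate could be sustained for the whole transfer. The function name `scan_seconds` is hypothetical.

```python
# Rough estimate only; the unit convention and sustained-rate premise are assumptions.
TB_PER_PB = 1000  # assumption: decimal (SI) units


def scan_seconds(size_pb: float, rate_tb_per_s: float) -> float:
    """Return the seconds needed to stream size_pb petabytes at rate_tb_per_s."""
    if rate_tb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_pb * TB_PER_PB / rate_tb_per_s


seconds = scan_seconds(10, 1.0)   # 10 PB at the quoted 1 TB/s peak
hours = seconds / 3600
print(f"{seconds:.0f} s (about {hours:.1f} h)")  # prints "10000 s (about 2.8 h)"
```

Because the one terabyte per second figure is quoted for the flash tier rather than the disk system, this should be read as a lower bound on the time, not a prediction of actual performance.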

Page 13: Federal Big Data Working Group  Meetup

NITRD FASTER CoP Meeting, September 16, 2014: My Comments

• Interesting evolution of a major university supercomputer center to data science support services that could support the university’s data science program and other universities’ data science programs.
  – Free or cheaper than Amazon and FedRAMP-compliant. Plan to have catalog of data sources and curation capabilities.
• Like MIT Data Science Program Michael Stonebraker’s SciDB.org across multiple universities.
  – But no spin-off companies like Tamr.
• Suggested GMU’s evolution to Learning from Data (Undergraduate), Data Science MS (with Hadoop) and Data Science Ph.D. (with Cray Urika Graph Computing).
  – Agreed this is a good model going forward.

Page 14: Federal Big Data Working Group  Meetup

Current Activities

• Predictive Analytics World-Government Conference, September 15-16th:
  – Some Highlights: Harlan Harris (Data Science DC), Elizabeth Handley (CMS Center for Program Integrity), Damon Davis (HHS IDEA Lab), Elaine Ayo (Data Science/Data Journalist Graduate Student, Georgetown University), and Dr. Jennifer Bachner (Director, MS in Government Analytics, JHU). Web Site
• Symposium on Big Data Analytics and Applications for Defense and National Security, September 23-24, 2014.
  – Free to government and military. I will provide story. Web Site
• 2014 IEEE International Conference on Big Data, October 27-30, Washington, DC:
  – Submitted Paper (pending-Brand Niemann) and NIST Workshop (accepted-Joan Aron) Proposals. Web Site
• Symposium on Predictive Analytics For Defense and Government, November 18-19, Washington, DC.
  – Invited Presentation: Federal Big Data Initiative: Content, Network, and Data Analytics for NITRD/NSF Data Science, Data Infrastructure, and Data Publications.
• Ongoing analytics with OpenFDA data for Dr. Taha Kass-Hout, FDA’s first Chief Health Informatics Officer (CHIO):
  – Interest in our Meetup on OpenFDA and Keynote at AFCEA Bethesda’s Health IT Day, December 2, Bethesda North Marriott Hotel and Conference Center. Web Site

Page 15: Federal Big Data Working Group  Meetup

DGI’s Annual Big Data Conference, October 9, Washington, DC Reagan Building

• Session title: Challenges and Solutions for Big Data in the Public Sector

• Moderator: Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, and Co-organizer, Federal Big Data Working Group Meetup

• Panelists:
  – Dr. Kirk Borne, Professor of Astrophysics and Computational Science, George Mason University
  – Dr. Tom Rindflesch, Information Research Specialist at Cognitive Science Branch, National Institutes of Health (NIH)

http://www.digitalgovernment.com/Events/Conferences/Government-Big-Data-Conference--Expo.shtml

Free to government and military and discount for FBDWG members.

Page 16: Federal Big Data Working Group  Meetup

Symposium on Predictive Analytics For Defense and Government, November 18-19, Washington, DC

[Diagram labels: Big Data, Analytics, Data Science; All Content - Structured and Unstructured; Results and Decisions; Mining and Discovery; Data Ecosystem (Data Set 1 ... Data Set N); Content, Network, Data; Descriptive, Prescriptive; Microscope and Telescope; Data FAIRport, Data Commons; Semantic Community]

Federal Big Data Initiative: Content, Network, and Data Analytics for NITRD/NSF Data Science, Data Infrastructure, and Data Publications

• Challenges and Solutions for Big Data in the Public Sector
• Fourth Paradigm and Fourth Question
• Federal Big Data Working Group Meetup: Mission Statement, What Are We Doing (NIH Data Commons); and How Are We Doing It?
• Data Science for JHU/NITRD/NSF DIBBs Project: Knowledge Bases for Data Publications
• Content, Network, and Data Analytics for the JHU/NITRD/NSF DIBBs Project: Visualizations

Page 17: Federal Big Data Working Group  Meetup


DRAFT Future Meetups

• October 6: Wolfram Data Science Platform (Invited) and Michael Daconta, Build a Knowledge Base with my (experimental) software EzKb

• November 3: Georgetown Massive Data Institute (Invited-Mary Galvin Organizing)

• December 1: NSF GEO/EarthCube and ICER (Integrative and Collaborative Education & Research) or HealthData.gov Data Science Data Publications for Damon Davis.

Page 18: Federal Big Data Working Group  Meetup

Agenda

• NITRD FASTER CoP Meetings & Inter-American Development Bank Open Data Portal
• Agenda:
  – 6:30 p.m. Welcome and Introduction - New Tutorial and Mentoring on Recent NITRD FASTER CoP Meetings, Data Science and Analytics in Government Meetup, Current and Upcoming Activities, and Big Data, Analytics, and Data Science. Slides. (See Story and Slides for September 8th Meetup Not Held)
  – 7:10 p.m. Brief Member Introductions
  – 7:15 p.m. Finding Funding for Research Topics on NSF Website (a simple example of “finding”), 04-AUG-2014, Chuck Rehberg, CTO, Semantic Insights™ a Division of Trigent Software, Inc. Slides
  – 7:45 p.m. Annette Hester, Project Coordinator, Energy Innovation Center, Infrastructure and Environment Sector, http://www.iadb.org/energy
    • Annette Hester has developed the Open Data portal for the Inter-American Development Bank. Her portal allows international development users to export numerous categories of data in JSON, RDF, and CSV formats. She is exploring more use cases and opportunities for Linked Data to inform decisions in government and business. Slides
  – 8:30 p.m. Open Discussion
  – 8:45 p.m. Networking
  – 9:00 p.m. Depart

http://www.meetup.com/Federal-Big-Data-Working-Group/events/206366842/
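As an illustration of what exporting the same data in JSON, CSV, and RDF can look like, here is a minimal sketch using only the Python standard library. It is not the portal's code or API: the sample record, the `http://example.org/` URIs, and the function names are all hypothetical, and the RDF output is a simplified N-Triples serialization that treats every value as a plain string literal.

```python
import csv
import io
import json

# Hypothetical sample record; not taken from the actual portal.
records = [
    {"country": "Exampleland", "year": 2013,
     "indicator": "electricity_access_pct", "value": 91.5},
]

BASE = "http://example.org/"  # placeholder namespace (assumption)


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV with a header row taken from the first record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_ntriples(rows):
    """Emit one triple per field, with every value as a plain string literal.

    json.dumps is used only to quote and escape the literal; this is a
    simplification that works for simple values such as those above.
    """
    lines = []
    for i, row in enumerate(rows):
        subject = f"<{BASE}observation/{i}>"
        for key, value in row.items():
            literal = json.dumps(str(value))
            lines.append(f"{subject} <{BASE}property/{key}> {literal} .")
    return "\n".join(lines) + "\n"


print(to_json(records))
print(to_csv(records))
print(to_ntriples(records))
```

The point of the sketch is that one set of records can feed all three formats; a real Linked Data export would use shared vocabularies and typed literals rather than placeholder URIs.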