geospatial intelligence middle east 2013_big data_steven ramage
Post on 10-May-2015
Steven Ramage
Head of Ordnance Survey International
Big Data Considerations
Geospatial Intelligence Middle East,
May 2013
Geospatial Intelligence Middle East 2013
Recently the Military GIS and Intelligence communities have
gained a better understanding of the incredible increase of “Cloud”
empowered applications, the challenges and opportunities of Big
Data, the importance of social media, the availability of improved
applications, and the dramatic improvement in quality and
availability of remote sensing data. This, and the increased speed
of GIS applications and the integration of a full-motion video
analysis product, empowers military forces and national security
agencies to exploit and analyze full motion video from UAVs and
other airborne vehicles.
http://tinyurl.com/cd8z6y5
http://www.computerweekly.com/feature/Ordnance-Survey-gets-to-grips-with-geospatial-big-data
“Ordnance Survey has all but completed a five-year IT
improvement programme to enhance its operations. That
programme – with Oracle as the main IT partner – has already
transformed those operations into an enterprise grid computing
system that pulls 17 databases into one Oracle spatial
database management platform. The platform supports all
geospatial data types and models. The system combines open
source Linux with Oracle’s grid computing architecture, which
makes it possible to coordinate large numbers of low-cost servers
and corresponding storage so they operate like one large
computer.”
Big Data challenges
• Big thinking (value)
• Big strategy (necessary)
• Big governance (stewardship)
• Big access (sharing)
• Big cooperation (supply chain)
• Big privacy (security)
• Big quality (QA/QC)
• Big people (skills training)
What is big data?
http://hortonworks.com/blog/big-data-defined/ April 4th, 2013 Russell Jurney
• Wikipedia defines it as the problems posed by the awkwardness of
legacy tools in supporting massive datasets. But what is a massive
dataset? Anything from megabytes to yottabytes.
• Collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools
or traditional data processing applications.
• There is a ‘Big Data’ opportunity: transformative economics.
Big Data is the opportunity space created by new open source,
distributed systems from the consumer internet space.
The big data environment
• Volume
• Data at rest; levels increasing
• Velocity
• Data in motion; speed at which it transits enterprises and entire industries is faster than ever
• Variety
• Data in many forms; hundreds of millions of web pages, emails and unstructured data, such as Word documents and PDFs, as well as a nearly infinite number of events and information from every enterprise data centre
• Value
• Do you need it?
Facebook now has 50 billion photographs
• It uses local storage to be fast but inexpensive
• It uses clusters of commodity hardware to be inexpensive
• It uses free software to be inexpensive
• It is open source to build from community learning
• Cheap storage means logging enormous volumes of data to
many disks is easy. Processing this data is less so. Distributed
systems which have the above four properties are disruptive
because they are approximately 100 times cheaper than
other systems for processing large volumes of data, and
because they deliver high I/O performance.
The big data environment
• Apache Hadoop is one such system. Hadoop ties together a
cluster of commodity machines with local storage using free and
open source software to store and process vast amounts of data
at a fraction of the cost of other systems [Example:
Esri/spatial-framework-for-hadoop on GitHub, a social network for programmers]
• SAN storage: $2-10/GB. Local storage: $0.05/GB
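As a rough worked example of the per-gigabyte prices above, the sketch below compares the raw cost of holding one dataset on SAN and on local storage. The 100 TB dataset size and the function name are illustrative assumptions; only the prices come from the slide.

```python
# Per-gigabyte prices quoted on the slide (US dollars).
SAN_COST_PER_GB = (2.00, 10.00)   # low and high end of the quoted range
LOCAL_COST_PER_GB = 0.05

def storage_cost(size_gb, cost_per_gb):
    """Return the raw storage cost in dollars for a dataset of size_gb."""
    return size_gb * cost_per_gb

# Hypothetical dataset size, chosen only for illustration: 100 TB.
size_gb = 100 * 1000

local = storage_cost(size_gb, LOCAL_COST_PER_GB)
san_low = storage_cost(size_gb, SAN_COST_PER_GB[0])
san_high = storage_cost(size_gb, SAN_COST_PER_GB[1])

print(f"Local storage: ${local:,.0f}")
print(f"SAN storage:   ${san_low:,.0f} to ${san_high:,.0f}")
print(f"SAN costs {san_low / local:.0f}x to {san_high / local:.0f}x as much")
```

On these prices SAN storage costs between 40 and 200 times as much as local storage, which is the same order of magnitude as the "approximately 100 times cheaper" figure quoted for processing, although that figure covers whole systems rather than storage alone.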
The big data environment
• Capture every shred of data in the cheapest place possible
• Provide access to this data across the organization
• Mine the data for value
• “To undergo the transformative processes that unabridged
access to data provides, enabling bigger, better, faster more
profound insight than ever before”. Blogger
The big data environment
• How many of us need to undertake operations that rank every
web page that exists?
• What processing tasks cannot be handled on a single computer
or even a laptop? [Megabyte to Gigabyte range]
• Weren’t you doing data analysis before data became big?
• Do you have the requirement or capability to check
correlations or patterns that you can act on if you have
even more data?
• False positives. Vincent Granville wrote in ‘The curse of big data’
that even if a dataset includes only 1000 items there are many millions
of possible correlations, and a few will be extremely high just by chance.
• Getting more into the field of data science (stats, quality, etc.)
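The false-positive point can be illustrated with a small simulation. This is a minimal sketch, not Granville's own method: it generates series of pure random noise and counts how many pairs appear strongly correlated anyway. The number of variables, the number of observations, the 0.8 threshold and the seed are all arbitrary assumptions.

```python
import itertools
import math
import random

def standardise(values):
    """Rescale values to zero mean and unit length, so that the dot
    product of two standardised series is their Pearson correlation."""
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred]

random.seed(42)  # fixed seed so the run is repeatable

N_VARIABLES = 200     # illustrative; far fewer than a real dataset
N_OBSERVATIONS = 10   # few observations per variable

# Every series is independent random noise: no real relationship exists.
series = [
    standardise([random.gauss(0, 1) for _ in range(N_OBSERVATIONS)])
    for _ in range(N_VARIABLES)
]

# Pearson correlation for every pair of series.
correlations = [
    sum(a * b for a, b in zip(x, y))
    for x, y in itertools.combinations(series, 2)
]

n_pairs = len(correlations)
strongest = max(abs(r) for r in correlations)
n_high = sum(1 for r in correlations if abs(r) > 0.8)

print(f"{n_pairs} pairs tested")
print(f"strongest correlation found: {strongest:.2f}")
print(f"pairs with |r| > 0.8: {n_high}")
```

Because the number of pairs grows roughly with the square of the number of variables, adding more data columns makes such chance findings more likely, not less.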
Most data isn’t big and businesses are wasting
money pretending it is:
www.qz.com/81661/most-data
Mapping the global Twitter heartbeat:
the geography of Twitter
http://firstmonday.org/ojs/index.php/fm/article/view/4366/3654
• In 2012, supercomputing manufacturer Silicon Graphics
International (SGI), the University of Illinois and social media
data vendor GNIP collaborated to create the “Global Twitter
Heartbeat” project (http://www.sgi.com/go/twitter) in order to
map global emotion expressed on Twitter in real-time.
• GNIP provided access to the Twitter Decahose, which consists
of 10 percent of all tweets sent globally each day.
• SGI provided access to one of its new UV2000 supercomputers
with 256 processors and 4TB of RAM running the Linux
operating system.
From 12:01 AM on 23 October 2012 through to 11:59 PM on 30 November
2012, the Twitter Decahose from GNIP streamed 1,535,929,521 tweets from
71,273,997 unique users, averaging 38 million tweets from 13.7 million
users each day.
Use the location of social media posts for emergency warning,
real-time local situation reporting, etc.
Big data perspective on mapping the
geography of Twitter
• iPhones and Blackberries yield an additional 1% of all tweets
being georeferenced
• However, they’ve been missed by previous studies because they
store their geographic information in the textual Location field
rather than the machine-readable Geo metadata field
• In the big data era we need to look at the data itself, not just
assume it follows the manual.
Kalev Leetaru, University of Illinois on CrisisMappers
http://www.CrisisMappers.net
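The Location-field point above can be sketched in code. This is a minimal illustration, not the method used in the study: it reads the machine-readable Geo field first and falls back to parsing a coordinate pair out of the free-text Location field. The record layout (a dict with "geo" and "location" keys), the sample strings and the function name are all assumptions made for the example.

```python
import re

# Matches a "latitude,longitude" pair of decimal numbers anywhere in a string.
COORD_PATTERN = re.compile(r"(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")

def extract_coordinates(tweet):
    """Return (lat, lon) from the Geo field if present, otherwise try to
    parse coordinates out of the free-text Location field."""
    geo = tweet.get("geo")
    if geo is not None:
        return tuple(geo)
    match = COORD_PATTERN.search(tweet.get("location") or "")
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Discard numbers that cannot be valid coordinates.
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return (lat, lon)
    return None

# Hypothetical records; the field names and values are illustrative only.
tweets = [
    {"geo": (25.2048, 55.2708), "location": "Dubai"},
    {"geo": None, "location": "iPhone: 51.5074,-0.1278"},
    {"geo": None, "location": "ÜT: 40.7128,-74.0060"},
    {"geo": None, "location": "somewhere nice"},
]

for t in tweets:
    print(t["location"], "->", extract_coordinates(t))
```

A study that only read the Geo field would count one georeferenced record out of these four; checking the text as well finds three.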
Why do we need big data?
Analytics plus geospatial data is changing the
way we get insights (hidden patterns)
• Geospatial analytics gives you the ability to ask “where”
questions of business data
Where did it happen?
Where will it happen?
Where is it happening?
Source: Teradata
Analytics plus geospatial data is changing the
way we get insights
• Where are my customers?
• Where are my competitors?
• How far will customers travel to a branch or store?
• Which of my competitor’s customers can I draw to a branch or store?
• Which customers live close to a branch or store?
• Where can I increase profitability?
• How can I mitigate financial risk from flooding?
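One of the questions above, "Which customers live close to a branch or store?", can be sketched with a great-circle distance calculation. This is a minimal example under stated assumptions: the coordinates, customer labels and the 25 km threshold are invented for illustration, and a production system would normally use a spatial database or index rather than a linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def customers_near(store, customers, max_km):
    """Return the ids of customers within max_km of the store."""
    return [
        c["id"] for c in customers
        if haversine_km(store["lat"], store["lon"], c["lat"], c["lon"]) <= max_km
    ]

# Hypothetical store and customer locations (decimal degrees).
store = {"lat": 50.9097, "lon": -1.4044}
customers = [
    {"id": "A", "lat": 50.9200, "lon": -1.4000},  # about 1 km away
    {"id": "B", "lat": 51.0632, "lon": -1.3080},  # about 18 km away
    {"id": "C", "lat": 51.5074, "lon": -0.1278},  # about 110 km away
]

nearby = customers_near(store, customers, max_km=25)
print(nearby)
```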
Is there a ‘problem with crowdsourcing intelligence’?
DefenceIQ, May 2013 Thomas Chappelow
http://www.defenceiq.com/defence-technology/articles/the-problem-with-crowdsourcing-intelligence-in-syr/
• blogging, tweeting, mapping and photographing every single
detail…creating an unprecedented mountain of information that
can be farmed for actionable intelligence
• lack of traditional sources to rely on, the global intelligence
community has to look elsewhere for information…
crowdsourcing appears a juicy prospect – until it goes wrong
• Provenance, verification and trust
• Just as important for HUMINT as GEOINT
Big data cycle
Some experience gained
• Ordnance Survey is 222 years old
• Civilian organisation since 1983; 1100 staff
• Independent Government Department and Executive Agency reporting directly to a Government Minister
• Trading Fund since April 1999
• Annual Report for 2011/12: Revenue of £141.8m, profit before exceptional items of £31.9m, dividend £17.2m
• Southampton headquarters with 26 field offices in Great Britain
Ordnance Survey today
The size of the task
Topographic Layer (approximate volumes)
• 1:1250 scale = 17 000 km²
• 1:2500 scale = 158 000 km²
• 1:10 000 scale = 66 000 km²
• Over one million units of change per year
Address Layer
• 27.5 million geocoded postal addresses, with 500 000 changes per year
Transport Network Layer
• 5.37 million km of roads, 3.97 million links, 885 881 route instructions, with over 20 000 changes per month
Updating the Ordnance Survey database
Wide Range of Customers and Markets
A database to connect via real world information
• Every object represented in OS MasterMap has a unique
reference identifier called a TOID. These TOIDs can be used to
connect other information and are linked to other core references.
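The idea of connecting other information through a shared identifier can be sketched as a simple keyed join. This is an illustration only: the identifiers, feature attributes, the inspection dataset and the function name are all made up, and the sketch does not use any Ordnance Survey product or API.

```python
# Hypothetical map features keyed by TOID; identifiers are invented.
features = {
    "osgb1000000000001": {"type": "Building", "name": "Library"},
    "osgb1000000000002": {"type": "Building", "name": "School"},
    "osgb1000000000003": {"type": "Road", "name": "High Street"},
}

# A separate business dataset that records the TOID of the object it describes.
inspections = [
    {"toid": "osgb1000000000002", "last_inspected": "2013-03-14"},
    {"toid": "osgb1000000000001", "last_inspected": "2012-11-02"},
    {"toid": "osgb9999999999999", "last_inspected": "2013-01-20"},  # no such feature
]

def link_by_toid(features, records):
    """Attach feature attributes to each record that shares a TOID.

    Returns (linked, unmatched) so that broken references are visible
    rather than silently dropped."""
    linked, unmatched = [], []
    for record in records:
        feature = features.get(record["toid"])
        if feature is None:
            unmatched.append(record)
        else:
            linked.append({**feature, **record})
    return linked, unmatched

linked, unmatched = link_by_toid(features, inspections)
for row in linked:
    print(row["toid"], row["name"], row["last_inspected"])
print("unmatched:", [r["toid"] for r in unmatched])
```

Because the join relies only on the identifier, the business dataset does not need to carry any geometry of its own.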
OS MasterMap current layers
Ordnance Survey and IBM Netezza
Stress Testing our Data
Data Queries
New Insights
Storytelling with Location Data
Using IBM Netezza for high performance
geospatial analytics
Netezza and geospatial analytics
• In-database geospatial analytic functions
• Native understanding of geospatial data
• High performance out of the box
• Scales to terabytes of data
• No indexes or aggregates to manage
• Open, standards-based interface and data model
Analyse all data in a single appliance
Stress Testing our Data
Stress testing our data – Volume of data
Data Queries
Data queries – Volume of data
We analysed 41 million records in 19 hours.
We could not run this query in the past.
New Insights
New insights – Volume and variety of data
Storytelling with Location Data
Storytelling with location data
Big Data – Linked Data
• As Ordnance Survey approaches the end of the transformation of its
operations, it is preparing its data to exploit the myriad
interconnections that can exist between physical entities in what has
been described as the “Internet of Things”. This web of
interconnections between disparate objects and ideas is made
possible through linked data technology.
• Linked data assigns a unique tag, a uniform resource identifier
(URI), to each thing of interest, and records facts about it as
three-part statements known as triples. For example, population data
can be linked to socio-economic statistics for a given town.
• The Linked Data Web is currently estimated to include more than
30 billion triples, with some 20% of those having geographic content.
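The town example above can be sketched with plain tuples. This is a toy illustration of the triple idea, assuming a subject, predicate, object layout; every URI and value below is a placeholder, and a real system would use an RDF store and a query language such as SPARQL.

```python
# Each triple is (subject, predicate, object). All URIs and values are placeholders.
EX = "http://example.org/"

triples = [
    (EX + "town/Exampleton", EX + "prop/population", 100000),
    (EX + "town/Exampleton", EX + "prop/withinCounty", EX + "county/Exampleshire"),
    (EX + "town/Sampleford", EX + "prop/withinCounty", EX + "county/Exampleshire"),
    (EX + "county/Exampleshire", EX + "prop/medianIncome", 25000),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Follow the links: town -> county -> socio-economic statistic.
town = EX + "town/Exampleton"
population = match(triples, subject=town, predicate=EX + "prop/population")[0][2]
county = match(triples, subject=town, predicate=EX + "prop/withinCounty")[0][2]
income = match(triples, subject=county, predicate=EX + "prop/medianIncome")[0][2]

print(f"{town}: population {population}, county median income {income}")
```

The population and the income figure could come from different publishers; they join up because both refer to the same URIs.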
Joining up Government
‘Find me all GPs in my ward, bus stops within a 500 metre radius of those GPs, but exclude bus stops in areas of high crime’.
Environment
Transport
Health
Education
Business
Weather
Crime
Council
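The joined-up query quoted above (GPs in a ward, bus stops within 500 metres, excluding high-crime areas) can be sketched over three small in-memory datasets. Everything here is hypothetical: the records, field names, ward and area labels, and the function name. Coordinates are assumed to be eastings and northings in metres on a projected grid, so straight-line distance can be used directly.

```python
import math

# Hypothetical health, transport and crime datasets.
gps = [
    {"name": "GP North", "ward": "Ward A", "x": 1000, "y": 1000},
    {"name": "GP South", "ward": "Ward A", "x": 3000, "y": 1000},
    {"name": "GP Other", "ward": "Ward B", "x": 9000, "y": 9000},
]
bus_stops = [
    {"name": "Stop 1", "x": 1200, "y": 1100, "area": "Area 1"},  # near GP North
    {"name": "Stop 2", "x": 1300, "y": 1400, "area": "Area 2"},  # near GP North, high crime
    {"name": "Stop 3", "x": 3400, "y": 1000, "area": "Area 1"},  # near GP South
    {"name": "Stop 4", "x": 5000, "y": 5000, "area": "Area 1"},  # far from every GP
    {"name": "Stop 5", "x": 9100, "y": 9000, "area": "Area 1"},  # near a GP in another ward
]
crime_level = {"Area 1": "low", "Area 2": "high"}

def stops_near_gps(ward, gps, bus_stops, crime_level, radius_m=500):
    """For each GP in the ward, list bus stops within radius_m metres
    that are not in an area marked as high crime."""
    results = {}
    for gp in gps:
        if gp["ward"] != ward:
            continue
        results[gp["name"]] = [
            stop["name"] for stop in bus_stops
            if math.hypot(stop["x"] - gp["x"], stop["y"] - gp["y"]) <= radius_m
            and crime_level[stop["area"]] != "high"
        ]
    return results

answer = stops_near_gps("Ward A", gps, bus_stops, crime_level)
print(answer)
```

The query only works because the three datasets share a common spatial reference; that shared reference is what the "joining up" on this slide depends on.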
Hyperlocal example
Big Data challenges
• Big thinking (value)
• Big strategy (necessary)
• Big governance (stewardship)
• Big access (sharing)
• Big cooperation (supply chain)
• Big privacy (security)
• Big quality (QA/QC)
• Big people (skills training)
Ordnance Survey International: advisory services
• Strategic review and assessment
• Capacity and capability building
• Knowledge transfer and training
• Value of geographic information
• Technology direction – 3D, quality,
open standards and much more
• National authoritative mapping
• National address infrastructure
• National geodetic infrastructure
• National spatial data infrastructure
Thank you for your attention. For further information contact:
Steven Ramage, Head of Ordnance Survey International
steven.ramage@ordnancesurvey.co.uk
Ordnance Survey International