Big Data on Crossrail John Brzeski, CH2M Hill 15 th June 2016, BGA Annual Conference


TRANSCRIPT

Big Data on Crossrail

John Brzeski, CH2M Hill

15th June 2016, BGA Annual Conference

Good afternoon everyone.

I'm here to talk to you about Big Data on Crossrail.

Project/What data we have

What is Big Data?

Examples and Challenges

How managed and the Future

First I'd like to start by telling you about the project and the data we have. Then I'll attempt to explain to you what I think big data is. We can then see if it is possible to call our data big data.

I'll show you some examples and explain some of the challenges we have faced along the way.

Finally I'd like to share with you what I think the future of this data is, something which I believe has the potential to be very interesting.

1

Crossrail

I'm sure everyone in the room knows about Crossrail

Scope is - Data in Central section

TBM, SCL, Shafts & Portals

My role to manage data from all

Brief

UCIMS

I'm sure everyone in the room knows all about Crossrail so I won't go into too much detail on the project itself. The scope of this presentation is to explain how we have handled the data relating to ground movements caused by the Crossrail construction activities in the central section, which consists of 42km of twin bore TBM tunnels, 5 new underground stations excavated using SCL methods, and various associated portals and access shafts.

My role on the project was to manage the data being generated by the main works contractors with regard to construction progress and instrumentation and monitoring. The brief for this part of the project was to provide the client with a centralised database of monitoring and construction information in order for interested parties to view and download data. This was to be known as the Underground Construction Information Management System, or more colloquially as UCIMS.

2

The data

I&M

100,000 sensors

Up to sub-hourly readings

InSAR

850,000 Reflectors

6 readings per quarter

Construction Data

TBM, SCL, grouting, excavations, piling, dwalls

200,000 separate events

Asset Information

Utilities, buildings, structures

16,000 assets

Approx. 1 billion records

Mapping

Aerial Images

Ok, so what data do we have?

Our data

Cool representation of our data points

Blue sensors

Yellow construction

Grey InSAR

Others: assets and mapping

Total

A hell of a lot. All types, from many sources. Europe's biggest civil engineering project is reflected in the scale of the data project.

3

Is this Big Data?

1 billion readings

Sensors, satellite monitoring, building information, construction events

Generated at max frequencies of 1 reading every 15 minutes

Erroneous values common in I&M

Decision making tool for construction process control

5 Vs

Volume

Variety

Velocity

Veracity

Value

It's a hell of a lot, but can we call it Big Data?

Have to say what BD is first.

This nice chap is a bit of a guru

Uses the 5 Vs to describe the nature of BD

Not really quantitative

Not about the data, it's about making use of it.

For us

Is this big data? Yes (has the potential)

Are we managing to get the most VALUE out of the huge VOLUME of data given its VARIETY, VELOCITY and VERACITY?

4

A word (or two) on data format

So, if we're saying we have 1 billion records, what about variety?

This refers to the format

Every system needs to be flexible

Need to be prescriptive

AGS 3.1 was chosen. Gave us the framework to conform with the BS for geotechnical data management.

Needed some tweaks

Innovation was the construction groups (see animation)

Specs being reviewed

Used for the last 4 years.

Mixed reception but raised the bar on CRL

In terms of the five Vs, this is really the Variety.

It's important for any system to be able to cope with various data formats, and there are so many out there for different data types. When dealing with multiple contractors it is important to establish a format and, more importantly, a structure that all can adhere to in order to use the data effectively.

Crossrail adopted AGS 3.1 for this format. Whilst AGS 3.1 does include the monitoring spec, it did not encompass everything we needed so we added instrument types and readings fields.

The construction data groups were devised in consultation with the main works contractors and with reference to Crossrail specifications. We made an AGS group for each construction type as explained on the previous slide.

The CRL AGS spec is currently under review by the AGS, but we have used it over the last 4 years to standardise the type of data and metadata we wanted.
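To give a flavour of what that standardisation means in practice, AGS 3.1 is a plain-text, comma-separated format in which group markers, heading rows and units rows are distinguished by their leading characters. This is only a rough sketch of a parser for that shape of file, and the group and field names in the sample are illustrative, not the actual Crossrail additions:

```python
import csv
import io

def parse_ags3(text):
    """Parse a minimal subset of an AGS 3.1-style file: '**' lines open a
    group, '*' lines carry headings, '<UNITS>' rows carry units, and all
    other rows are data. Returns {group: {headings, units, rows}}."""
    groups = {}
    current = None
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue
        first = fields[0]
        if first.startswith("**"):
            # '**?GROUP' is the convention for non-standard groups; strip both
            current = first.lstrip("*?")
            groups[current] = {"headings": [], "units": [], "rows": []}
        elif current is None:
            continue
        elif first.startswith("*"):
            groups[current]["headings"] = [f.lstrip("*") for f in fields]
        elif first == "<UNITS>":
            groups[current]["units"] = fields[1:]
        else:
            groups[current]["rows"].append(fields)
    return groups

# Illustrative monitoring fragment (field names are examples only)
sample = '''"**MONG"
"*MONG_ID","*MONG_TYPE","*MONG_READ"
"<UNITS>","","mm"
"INST001","Settlement point","-2.31"
'''
data = parse_ags3(sample)
print(data["MONG"]["rows"][0])
```

The point of the structure is that every contractor's submission, whatever the instrument type, lands in the same predictable group/heading/units/data shape, which is what makes a single central database feasible.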

5

Data veracity

Knocks !

Spikes !

Gaps/Noise !

Too much !

Now on to the trustworthiness of our data

The veracity

Some issues in I&M: spikes, knocks, gaps, noise, delays, too much!

Example

Eliminating in live system

Interesting thing is I'm often asked to sanitise

Understand they exist and develop strategies to see past.

Others in the industry working on this.

What I will say is that we have gone through a continual QA process with the data. Everything is present and correct and in the right position.

Eliminating these errors in a live monitoring system is not a realistic objective, it is more about understanding that they can exist and developing tools to see past them. One of the biggest challenges I faced on Crossrail was getting consumers to approach the analysis of the data from a bigger data perspective. As we all know, monitoring is all about identifying trends and comparing observed behaviours to events. This is not generally easily possible by viewing series of individual graphs and then looking up events elsewhere, then comparing this to other environmental factors. Everything needs to be in one place so that relationships can be explored and data should be aggregated to see trends. ## In terms of stakeholder assurance, the most successful contracts were those who could take all the data being generated and distil it down to the pertinent points without missing information.

The challenge is to have a system that can still perform the required analysis with the spurious data included without it skewing results. I have often been asked to remove spurious data from the database but it is sometimes valuable to keep it as it shows what type of errors you can expect to see in the data and you can then work on algorithms to identify and discount them. Something I know that others in the industry are also working on.
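I can't reproduce the actual algorithms here, so this is only a generic sketch of the kind of tool I mean: a rolling median filter that flags, rather than deletes, readings which deviate too far from their neighbours, so the raw record (knocks and all) survives for later study:

```python
from statistics import median

def flag_spikes(readings, window=5, threshold=3.0):
    """Flag readings that deviate from the local rolling median by more than
    `threshold` times the local median absolute deviation (MAD). Suspect
    values are flagged, not removed, so downstream analysis can simply
    skip them while the raw data is preserved."""
    flags = [False] * len(readings)
    half = window // 2
    for i in range(len(readings)):
        local = readings[max(0, i - half):i + half + 1]
        med = median(local)
        mad = median(abs(x - med) for x in local)
        if mad == 0:
            continue  # locally flat: no robust scale to judge against
        if abs(readings[i] - med) / mad > threshold:
            flags[i] = True
    return flags

# A settlement-style series with one "knock" at index 4
series = [0.0, 0.1, 0.1, 0.2, 9.8, 0.2, 0.3, 0.3, 0.4]
print(flag_spikes(series))  # only index 4 is flagged
```

A median-based filter is deliberately robust: a single knock barely moves the local median, so genuine trends pass through while isolated spikes stand out.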

6


Initial implementation

13000 files per day

2000 downloads a day

500 plots a day

So how did we manage all of this on Crossrail?

The contract for doing this was let along with the Route Wide I&M contract

This is what was delivered. Some screenshots

Lancaster Gate

TBM

Sensor

Profile

Stats

Ramping down

Served construction phase and fulfilled brief

Use evolving

Damage claims

Single source of truth

Not scalable; value not realised

7

What next?

Value: Design, Academic Research, Future Projects, Construction Control, Stakeholder Assurance, Data Management, Damage Assessments

So, given the scaling issue, what have we done?

Rebuilding of DB from source AGS

Focus has been on connectivity & scalability

Scalability goes both ways

APIs for dashboard

See TCR example

API for connection via Esri ArcMap and other GIS
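To give a flavour of what that connectivity means for a consumer, here is a minimal sketch of how a dashboard or GIS client might request readings over such an API. The base URL, endpoint and JSON field names are hypothetical, not the real UCIMS interface:

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://example.org/ucims/api"  # hypothetical endpoint

def readings_url(instrument_id, start, end):
    """Build a query URL for one instrument's readings over a date range."""
    query = urlencode({"instrument": instrument_id, "from": start, "to": end})
    return f"{BASE_URL}/readings?{query}"

def latest_value(payload):
    """Return the most recent reading from a JSON payload of the form
    {"readings": [{"timestamp": ..., "value": ...}, ...]} (illustrative)."""
    readings = json.loads(payload)["readings"]
    return max(readings, key=lambda r: r["timestamp"])["value"]

# Canned response standing in for a live server
sample = ('{"readings": [{"timestamp": "2016-06-01T10:00", "value": -1.2},'
          ' {"timestamp": "2016-06-01T10:15", "value": -1.3}]}')
print(readings_url("INST001", "2016-06-01", "2016-06-02"))
print(latest_value(sample))  # -1.3
```

The attraction of an API layer over the database is exactly the scalability point: the same endpoint can feed a dashboard, an ArcMap layer, or a researcher's script without each consumer needing its own export.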

Conclusion

Conclusion: the most important V, Value

Clear that a lot of value has been delivered

Always the case that data is forgotten

Aim is to make data open

Data.gov.uk: BGS, OS, TfL

Challenges

Making the data available to all would ensure the full value is extracted

Some of the values to be gained shown on slide

Love to hear from you if you have experience in this.

8

Thank you.

Questions?

[email protected]

9