
Page 1: Machine Learning Applications for Production Prediction

University of Calgary

PRISM: University of Calgary's Digital Repository

Graduate Studies The Vault: Electronic Theses and Dissertations

2020-12-09

Machine Learning Applications for Production

Prediction and Optimization in Multistage

Hydraulically Fractured Wells

Chaikine, Ilia

Chaikine, I. (2020). Machine Learning Applications for Production Prediction and Optimization in

Multistage Hydraulically Fractured Wells (Unpublished doctoral thesis). University of Calgary,

Calgary, AB.

http://hdl.handle.net/1880/112817

doctoral thesis

University of Calgary graduate students retain copyright ownership and moral rights for their

thesis. You may use this material in any way that is permitted by the Copyright Act or through

licensing that has been assigned to the document. For uses that are not allowable under

copyright legislation or licensing, you are required to seek permission.

Downloaded from PRISM: https://prism.ucalgary.ca


UNIVERSITY OF CALGARY

Machine Learning Applications for Production Prediction and Optimization in Multistage

Hydraulically Fractured Wells

By

Ilia Chaikine

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

GRADUATE PROGRAM IN CHEMICAL AND PETROLEUM ENGINEERING

CALGARY, ALBERTA

DECEMBER, 2020

© Ilia Chaikine 2020


Abstract

Due to improvements in horizontal drilling and completion technologies over the past several decades, multistage hydraulic fracturing has become very popular and has led to explosive growth of shale and tight oil and gas production worldwide. Even though the completion techniques are well known and relatively simple, the dynamics of fracture formation and hydrocarbon flow within the reservoir are extremely complex. Even with recent developments, little is known about how rock mechanical properties, completion design, and well spacing affect the morphology of fracture networks and the production of hydrocarbons at the wellhead. Because of this lack of understanding, there are as yet no models capable of forecasting production performance with good accuracy. The focus of this thesis is the Montney Formation in Alberta. The research presented in this thesis describes a method that uses a convolutional-recurrent neural network (c-RNN) to generate synthetic shear sonic logs with high accuracy and to link a broad range of input parameters, both geological and stimulation-related, at every stage along a horizontal wellbore to the production performance at the wellhead. The results show that production performance is driven more by the rock mechanical properties surrounding the perforation clusters than by the design of the hydraulic fracture. The results also show that well spacing has an effect on production performance. The outcomes of the research provide tools for improving the accuracy of rock mechanical models, optimizing hydraulic fracturing operations with respect to water usage, and placing future wells in the reservoir to maximize gas production.
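The c-RNN pipeline summarized above (stage-by-stage geological and completion inputs convolved along the wellbore, then passed through a recurrent layer that emits a production time series) can be caricatured numerically. The sketch below is an illustrative assumption in plain NumPy — the shapes, feature counts, and random weights are hypothetical and do not reproduce the architecture actually developed in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 15 stages per well, 6 features per stage
# (rock mechanical + completion variables), forecasting 36 monthly rates.
n_stages, n_features, n_months, hidden = 15, 6, 36, 8

x = rng.normal(size=(n_stages, n_features))  # one well's stage-by-stage inputs

# 1-D convolution along the stage axis: captures local patterns across
# neighbouring perforation clusters (kernel width 3, 4 filters).
kernel = rng.normal(size=(3, n_features, 4)) * 0.1
conv = np.stack([
    np.tanh(np.einsum('kf,kfc->c', x[i:i + 3], kernel))
    for i in range(n_stages - 2)
])  # shape: (13, 4)

# Simple recurrent pass over the convolved sequence, then a linear
# readout producing one normalized gas rate per month.
Wx = rng.normal(size=(4, hidden)) * 0.1
Wh = rng.normal(size=(hidden, hidden)) * 0.1
h = np.zeros(hidden)
for step in conv:
    h = np.tanh(step @ Wx + h @ Wh)

Wo = rng.normal(size=(hidden, n_months)) * 0.1
forecast = h @ Wo  # monthly production forecast, shape (36,)
```

In a trained model the kernel and weight matrices would be fit to historical well data; here they are random, so only the data flow (per-stage inputs in, monthly rate series out) is meaningful.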


Acknowledgements

I would like to thank my supervisor Dr. Ian Gates for taking me in as his student. His knowledge, encouragement, and support have made me passionate about research and its limitless potential.

I would also like to thank Sproule Associates Limited for providing me the financial support needed to pursue my goals. Special thanks also go to my colleagues at Sproule: firstly to my mentor and boss Scott Pennell, for always making time to listen and discuss ideas and for always supporting my goals — without your help, this research project would not have happened. I would also like to thank Irina Baisitova for mentoring me since the start of my engineering career and helping me get through some of those tough graduate courses; Alexey Romanov for always making time to show me how to use Petrel and build geomodels and for teaching me all about statistics; Surya Karri for teaching me all I needed to know about petrophysics and sonic logs; Victor Verkhogliad for teaching me all about Montney geology and geology in general; and Richard Holst for your enthusiasm about my project and for always being there to answer any question I had.

I give my greatest thanks to my wife and best friend Kaitlyn Johnson: you have made this graduate school experience wonderful, and your unfaltering love, faith, and support have made my life the only life I want to live.


Dedication

To my wife Kaitlyn, thank you for being there for me through thick and thin and supporting me in

all my goals and dreams.


Table of Contents

Abstract ........................................................................................................................................... ii

Acknowledgements ........................................................................................................................ iii

Dedication ...................................................................................................................................... iv

Table of Contents ............................................................................................................................ v

List of Tables ................................................................................................................................. ix

List of Figures ................................................................................................................................ xi

Chapter 1: Introduction ................................................................................................................... 1

1.1 Background ......................................................................................................................................... 1

1.2 Montney Formation and Study Area .................................................................................................. 4

1.3 Problem Statement ............................................................................................................................. 6

1.4 Organization of Thesis ......................................................................................................................... 8

Chapter 2: Literature Review ........................................................................................................ 10

2.1 Multi-Stage Hydraulic Fracturing ...................................................................................................... 10

2.2 Variables Affecting Post Fracture Production Performance ............................................................. 11

2.2.1 Geological properties surrounding a wellbore location ............................................................ 12

2.2.2 Completion design ..................................................................................................................... 13

2.2.3 Well spacing and completion order ........................................................................................... 14

2.3 Hydrocarbon Production Forecasting ............................................................................................... 17

2.3.1 Forecasting ................................................................................................................................. 17

2.3.2 Forecasting Hydrocarbon Production ........................................................................................ 18

2.4 Time Series Forecasting and Machine Learning Algorithms ............................................................. 21

2.4.1 Parametric Models ..................................................................................................................... 23

2.4.2 Supervised Machine Learning and Artificial Neural Networks .................................................. 24

2.5 History of Machine Learning in Production Forecasting and Optimization ...................................... 34

2.6 What is Missing in Literature? .......................................................................................................... 35


Chapter 3: A New Neural Network Procedure to Generate Highly Accurate Synthetic Shear

Sonic Logs in Unconventional Reservoirs .................................................................................... 37

3.1 Preface .............................................................................................................................................. 37

3.2 Abstract ............................................................................................................................................. 38

3.3 Introduction ...................................................................................................................................... 39

3.4 Study Area and Data Processing ....................................................................................................... 42

3.4.1 Study Area and Structure Model ............................................................................................... 42

3.4.2 Data Preparation ........................................................................................................................ 44

3.5 Experimental Setup and Types of Neural Network Algorithms ........................................................ 47

3.5.1 Experimental Setup .................................................................................................................... 47

3.5.2 Comparisons .............................................................................................................................. 49

3.5.3 Networks used in the experiments ............................................................................................ 50

3.5.4 Overfitting .................................................................................................................................. 51

3.5.5 Stopping procedure ................................................................................................................... 53

3.5.6 Summary of Procedure .............................................................................................................. 57

3.6 Results and Discussion ...................................................................................................................... 59

3.7 Conclusions ....................................................................................................................................... 72

Chapter 4: A Convolutional-Recurrent Neural Network Model for Predicting Multi-Stage

Horizontal Well Production .......................................................................................................... 74

4.1 Preface .............................................................................................................................................. 74

4.2 Abstract ............................................................................................................................................. 75

4.3 Introduction ...................................................................................................................................... 76

4.4 Input Data Preparation ..................................................................................................................... 78

4.4.1 Geological Properties ................................................................................................................. 80

4.4.2 Completion Variables ................................................................................................................. 81

4.4.3 Well spacing and completion order ........................................................................................... 83

4.4.4 Production Data ......................................................................................................................... 84

4.5 Experimental Setup ........................................................................................................................... 88

4.5.1 Networks used in the experiments ............................................................................................ 88

4.5.2 Input shape and normalization .................................................................................................. 89


4.5.3 Experimental Setup .................................................................................................................... 91

4.5.4 Hyperparameter Tuning ............................................................................................................. 95

4.6 Results and Discussion ...................................................................................................................... 97

4.7 Conclusions ..................................................................................................................................... 105

Chapter 5: Optimizing Water Usage during Multi-Stage Hydraulic Fracturing with a

Convolutional-Recurrent Neural Network .................................................................................. 107

5.1 Preface ............................................................................................................................................ 107

5.2 Abstract ........................................................................................................................................... 108

5.3 Introduction .................................................................................................................................... 109

5.4 Study Area and Proposed Wells ...................................................................................................... 114

5.4.1 Well spacing and completion order ......................................................................................... 115

5.4.2 Rock mechanical properties ..................................................................................................... 116

5.4.3 Completion Parameters ........................................................................................................... 116

5.4.4 Input shape and normalization ................................................................................................ 117

5.5 Neural Network Algorithms and Experimental Setup..................................................................... 118

5.6 Results and Discussion .................................................................................................................... 120

5.7 Conclusions ..................................................................................................................................... 134

Chapter 6: Using a Convolutional-Recurrent Neural Network Forecasting Model to Optimize the

Positioning of New Wells in a Partially Developed Field .......................................................... 136

6.1 Preface ............................................................................................................................................ 136

6.2 Abstract ........................................................................................................................................... 137

6.3 Introduction .................................................................................................................................... 138

6.4 Study Area and Completion Scenarios ............................................................................................ 141

6.4.1 Well spacing and completion order ......................................................................................... 141

6.4.2 Rock mechanical properties ..................................................................................................... 142

6.4.3 Completion Parameters ........................................................................................................... 143

6.5 Neural Network Algorithm for Gas Production Forecasting ........................................................... 143

6.6 Well Combinations and Procedure ................................................................................................. 145

6.7 Results and Discussion .................................................................................................................... 150

6.8 Conclusions ..................................................................................................................................... 159


Chapter 7: Conclusions and Recommendations ......................................................................... 161

7.1 Conclusions ..................................................................................................................................... 161

7.2 Recommendations .......................................................................................................................... 163

References ................................................................................................................................... 166

Appendix A: Temperature Transient Analysis of the Steam Chamber during a SAGD Shutdown

Event ........................................................................................................................................... 175

A.1 Abstract ........................................................................................................................................... 175

A.2 Introduction .................................................................................................................................... 176

A.3 Literature Review ............................................................................................................................ 178

A.4 Temperature Transient Analysis ..................................................................................................... 179

A.4.1 Non-Condensing Model for SAGD Start up ............................................................................. 180

A.4.2 Condensing Model for SAGD Ramp up .................................................................................... 184

A.4.3 Variable Thermal Diffusivity .................................................................................................... 189

A.5 Results and Discussion .................................................................................................................... 192

A.5.1 Constant Thermal Diffusivity ................................................................................................... 192

A.5.2 Variable Thermal Diffusivity .................................................................................................... 196

A.6 Conclusions ..................................................................................................................................... 198


List of Tables

Table 2.1: Completion variables. .................................................................................................. 14

Table 2.2: Parametric and non-parametric algorithms for analyzing time series data. ................. 22

Table 3.1: Total number of data points in each well used for the study. ...................................... 47

Table 3.2: Starting hyperparameters for the c-RNN developed in this study. .............................. 57

Table 3.3: Results of the study comparing the performance of the various methods of generating

synthetic DTS curves. ................................................................................................................... 60

Table 4.1: The stage variables that were used as inputs in the experiments. ................................ 80

Table 4.2: Completion variables ................................................................................................... 81

Table 4.3: Starting hyperparameters for the c-RNN developed in this study ............................... 96

Table 4.4: Results of the leave-one-out experiments performed on only the geological variables ........ 97

Table 4.5: Results of the leave-one-out experiments performed on only the completion variables ........ 97

Table 4.6: Results of the leave-one-out experiments performed on geological, completion and spacing variables ........ 97

Table 5.1: The stage variables that were used as inputs in the sensitivity experiment. .............. 115

Table 5.2: The ranges of all the variable input parameters and the increments used for the

sensitivity experiment. ................................................................................................................ 117

Table 5.3: Results from the sensitivity analysis using the crosslinked gel as the fracture fluid . 121

Table 5.4: Results from the sensitivity analysis using the slickwater as the fracture fluid. ....... 123

Table 6.1: Results from the sensitivity analysis using crosslinked gel as the fracture fluid. ...... 151


Table 6.2: Results from the sensitivity analysis using slickwater as the fracture fluid. ............. 154


List of Figures

Figure 1.1: Resource triangle: the top region of the triangle represents conventional resources

available on the planet whereas the larger bottom region represents unconventional resources.

(Holditch, 2013). ............................................................................................................................. 1

Figure 1.2: Areal extent of the Montney Formation (Canadian Energy Regulator, 2018). ............ 5

Figure 1.3: Location of Study Area (Canadian Energy Regulator, 2018). ..................................... 6

Figure 2.1: Diagram of a hydraulic fracturing operation ( US Environmental Protection Agency,

2013). ............................................................................................................................................ 10

Figure 2.2: Example of multiple horizontal wells drilled from a pad (Energy Essentials, 2015). 15

Figure 2.3: Bounded and unbounded wells: well B is bounded from both sides, wells A and C are bounded from one side, and well D is unbounded (Belyadi et al., 2017). ............................ 17

Figure 2.4: The accuracy of forecasts increases with the amount of system understanding. ....... 18

Figure 2.5: A schematic diagram of two bipolar neurons. (Mohaghegh, 2000). .......................... 25

Figure 2.6: Schematic diagram of a typical artificial neuron (Mohaghegh, 2000). ...................... 26

Figure 2.7: Structure of a three-layer ANN (Neville et al. 2004). ................................................ 27

Figure 2.8: ANN training model using PSO (adapted from Panja et al. 2017). ........................... 28

Figure 2.9: Visualization of overfitting (Bhande, 2018). .............................................................. 32

Figure 2.10: A plot showing how training and validation error evolve with the number of epochs. .......................................................................................................................................... 32

Figure 2.11: Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right:

An example of a thinned net produced by applying dropout to the network on the left. Crossed

units have been dropped (Srivatava et al., 2014). ......................................................................... 33


Figure 3.1: Plot of shear slowness (DTS) versus compressional slowness (DTP) measurements

for 14 wells in the Montney Formation. ....................................................................................... 41

Figure 3.2: A 3D visualization of all wells in the study area. Red = horizontal, blue = deviated,

and black = vertical. ...................................................................................................................... 43

Figure 3.3: 3D generated Montney and Belloy surfaces in Petrel. The vertical axis is exaggerated 25× relative to the X and Y scales and represents subsea depth. ............................ 44

Figure 3.4: Areal view of the study area generated in Petrel. Black points with no labels are wells without DTS logs; labeled wells have DTS logs..................................................... 45

Figure 3.5: Architecture of the ANN (A) and c-RNN (B) used in the study. ............................... 50

Figure 3.6: Format of input and output data of the c-RNN. ......................................................... 51

Figure 3.7: Effect of batch size on the evolution of the validation error for blind well VT_42. .. 52

Figure 3.8: Training and validation error for two blind well tests. ............................................... 54

Figure 3.9: Validation error for all blind well tests; results are split into two graphs for clarity. 55

Figure 3.10: 20,000 iterations of the blind well DV_28 experiment. ........................................... 55

Figure 3.11: A zoomed-in plot showing the training and blind well error tremor. ....................... 56

Figure 3.12: General procedure for training the synthetic DTS tool that can be applied to any

formation. ...................................................................................................................................... 58

Figure 3.13: Process for using the training tool to generate synthetic DTS curves. ..................... 59

Figure 3.14: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN

(right) runs compared to the true logs for one of the worst blind wells – VT_10. ....................... 64

Figure 3.15: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN

(right) runs compared to the true logs for the best blind well – VT_18. ...................................... 65


Figure 3.16: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN

(right) runs compared to the true logs for the most important well - VT_42. .............................. 66

Figure 3.17: Synthetic logs generated by the c-RNN for wells VT_26 and VT_29 compared to

the true logs. .................................................................................................................................. 68

Figure 3.18: Synthetic logs generated by the c-RNN for wells VT_69 and VT_108 compared to

the true logs. .................................................................................................................................. 69

Figure 3.19: Sequence of synthetic well logs generated for well VT_18 at different numbers of epochs. The y-axis is DTS (µs/ft) and the x-axis is the thickness step from the top of the formation. ... 72

Figure 4.1: Areal extent of the 74 horizontal wells used in the study. ......................................... 79

Figure 4.2: An example of the long-range rock mechanical profiles surrounding the horizontal wells.

Figure 4.3: A visual representation of the Arps decline overlaying the true production, shown on the rate-cumulative production plot for one of the wells. The red line represents the gas rate assuming 100% on-time every month; the orange line represents the gas rate scaled down to account for the actual on-time during the month. ......................................................... 85

Figure 4.4: A plot showing the differences in rate restrictions between the actual production and

that of the Arps decline curve. ...................................................................................................... 85

Figure 4.5: The three common types of plots used to describe the production performance of a

well. Cumulative versus time (top), rate versus cumulative (middle), rate versus time (bottom).

These plots were generated using one of the wells in the study. .................................................. 87

Figure 4.6: c-RNN structure that was used in the study. .............................................................. 89

Figure 4.7: Format of input and output data of the c-RNN model ............................................... 90


Figure 4.8: Plot showing the numerous predictions that a model trained on the same dataset could

make. ............................................................................................................................................. 93

Figure 4.9: Plots showing how the average prediction changes based on the number of runs ..... 94

Figure 4.10: Distribution of individual well MAPE from the best case. .................................... 100

Figure 4.11: Plots of the worst (left) and best (right) well production profiles created by the best-case model. The red line is the average of the 30 runs; the green line is the true profile. ...... 100

Figure 4.12: Plot of the best-case aggregate production profile of all 74 wells versus time. ...... 102

Figure 4.13: Plot showing how aggregating wells together affects the mean absolute percentage error (MAPE). ............................................................................................................. 103

Figure 5.1: 40 proposed well locations (blue) added to the existing 74 wells (red) in the study

area generated in Petrel. .............................................................................................................. 114

Figure 5.2: Aggregated cumulative 5-year production vs proppant amount and fluid amount per

stage using 15 stages per well and the crosslinked gel (top) and the slickwater (bottom) as the

fracture fluid................................................................................................................................ 127

Figure 5.3: Effect of stage count per well on the aggregated cumulative 5-year production using

crosslinked gel (top) and slickwater (bottom) as the fracture fluid. ........................................... 128

Figure 5.4: crosslinked gel vs slickwater results for 15 stage count. .......................................... 129

Figure 5.5: Aggregated cumulative production versus the total water injected for various stage

counts of crosslinked gel and slickwater wells using 120 tonnes of proppant per stage. ........... 130

Figure 5.6: Injection efficiency for different stage counts and fluid types versus total fluid injected

..................................................................................................................................................... 131

Figure 5.7: 5-year cumulative production (top) and injection efficiency (bottom) for the 74

existing crosslinked gel and slickwater wells versus total fluid injected. ................... 133


Figure 6.1: One random combination of 20 wells (top), another random combination of 20 wells

(bottom)....................................................................................................................................... 147

Figure 6.2: Best and worst 20 well positions found in the 40 well prediction using the slickwater

(water intense) stimulation. ......................................................................................................... 149

Figure 6.3: Best and worst 20 well positions found in the 40 well prediction using the crosslinked

gel (water conservative) stimulation. .......................................................................................... 150

Figure 6.4: Results of the 104 combinations presented in a histogram using both crosslinked

(top) and slickwater (bottom) as the stimulation fluid. ............................................................... 157

Figure A.1: Cross sectional schematic of a typical SAGD process (Gotawala and Gates, 2010).

..................................................................................................................................................... 177

Figure A.2: Schematic diagram of a three zone non-condensing model used for SAGD start

up (modified from Zhu et al., 2012). ........................................................................................ 181

Figure A.3: Discretization domain for one-dimensional radial heat transfer - temperature grid

layout. ......................................................................................................................................... 183

Figure A.4: Schematic diagram of a three-zone condensation model used for SAGD ramp up

(modified from Zhu and Zeng, 2014). ...................................................................................... 185

Figure A.5: Decision tree for steam quality grid calculation. .................................................... 187

Figure A.6: Decision tree for steam temperature grid calculation. ............................................. 187

Figure A.7: Discretization domain for one-dimensional radial phase change - steam quality

grid layout. ................................................................................................................................. 188

Figure A.8: Non-condensing model evolution of radial temperature profile. ............................ 193

Figure A.9: Condensing model evolution of radial temperature profile. .................................... 194


Figure A.10: Condensing model evolution of radial steam quality profile. ............................... 194

Figure A.11: Temperature change with time at the center of the well bore during start up and

ramp up. ...................................................................................................................................... 195

Figure A.12: Slope of temperature plotted against time at the center of the well bore during start

up and ramp up............................................................................................................................ 196

Figure A.13: Non-condensing radial temperature profile after 2 weeks for variable and constant

kh cases. ....................................................................................................................................... 197

Figure A.14: Non-condensing temperature evolution at the center of the wellbore for variable

and constant kh cases. .................................................................................................................. 197


Chapter 1: Introduction

1.1 Background

Unconventional hydrocarbon resources differ from conventional ones in that they reside in tight,

low quality formations and are more difficult to extract (Cander, 2012). Unconventional

resources are also much more abundant on the planet than conventional ones as shown in Figure

1.1.

Figure 1.1: Resource triangle: the top region of the triangle represents conventional resources

available on the planet whereas the larger bottom region represents unconventional resources.

(Holditch, 2013).

Many types of geological formations are classified as bearing unconventional resources

including tight gas sands, gas shales, heavy oil sands, coalbed methane, oil shales, and gas

hydrates. The focus of this research is on tight sands and shales, which have low permeability; the methods and technology required to extract hydrocarbons from these reservoirs differ from those used in conventional reservoirs. For production, tight sand and shale reservoirs require higher well density. Furthermore, the wells drilled will not produce economically until they are stimulated using methods such as hydraulic fracturing (Ma et al., 2016).



Hydraulic fracturing has become the most widely used technique of producing hydrocarbons

from low permeability reservoirs. The process includes pumping massive amounts of sand, water

and chemicals at a high rate and pressure into the formation to induce and prop open fractures

that increase the surface area of the reservoir that is connected to the wellbore (Belyadi et al.,

2017). Modern hydraulic fracturing is typically performed in multiple stages along the entire

horizontal wellbore. Advancements in hydraulic fracturing and horizontal drilling techniques

over the past several decades have made producing hydrocarbons from unconventional reservoirs

economically feasible – this has led to the shale boom in North America and now throughout the

world (Morton, 2013; Trembath et al., 2012).

The geological heterogeneity and flow mechanisms in unconventional reservoirs are not well

understood especially in the context of the geomechanical dynamics during the fracturing

process and during production of fluids from the reservoir. Furthermore, stimulation adds

complexity stemming from interactions between pre-existing natural fracture networks and

fractures that are formed during the stimulation. Fracture dynamics are extremely complex, and the exact fracture morphology before and after stimulation cannot be observed directly, although passive seismic and microseismic monitoring provide options to visualize aspects of the fractured reservoir rock.

The injection of proppant during stimulation only adds to these complexities. One of the

challenges that the industry is facing today is the lack of understanding of unconventional

reservoir stimulation mechanisms and their effect on production, as well as the impact of pre-existing hydraulic fractures within the formation. The development of robust and reliable tools would make it possible to forecast production from these types of reservoirs with reasonable certainty


(Cohen et al., 2012). Accurate forecasts would allow for better economic evaluations of assets,

implementation of optimization strategies, minimizing expenditures, and lowering environmental

impact of oil and gas operations.

Over the past 15 years, the drastic increase in hydraulic fracturing activity has been paralleled by significant advances in digital data capture and storage (Canvanillas et al., 2016). The combination has generated a massive, ever-expanding dataset of information on hydraulic fracturing. Over the same period, computing power has increased exponentially, and innovations in machine learning models have made them well positioned to find hidden patterns in data and make accurate predictions (Goel et al., 2019).

Machine learning is a subset of artificial intelligence that uses computer algorithms capable of improving their performance automatically through experience (Mitchell, 1997). There are three broad categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, both the inputs and outputs are presented, with the goal of finding their general correlations and being able to make predictions when only inputs are shown. Supervised learning is used for classification and regression problems. In unsupervised learning, no outputs are presented and the algorithm is left to find similarities in the inputs. Unsupervised learning is most commonly used for clustering, association, anomaly detection, and dimensionality reduction (Leung and Leckie, 2005). In reinforcement learning, the algorithm interacts with a dynamic environment and learns to achieve goals through rewards (Chandramouli et al., 2018). The research presented in this thesis deals with forecasting and regression, so the supervised learning approach was used.
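The supervised learning workflow described above (learn from paired inputs and outputs, then predict outputs for unseen inputs) can be sketched with a toy least-squares regression. This is purely illustrative: the data are invented, and the thesis itself uses a c-RNN rather than a linear model.

```python
# Toy supervised learning example (hypothetical data, not from the thesis):
# fit a linear model y = a*x + b to example input/output pairs, then use it
# to predict the output for an input the model has never seen.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Training data: inputs paired with known outputs (here, exactly y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a, b = fit_linear(xs, ys)
prediction = a * 4.0 + b  # predict for the unseen input x = 4
print(prediction)  # → 9.0
```

The same input-output pairing idea carries over to the thesis, where the "inputs" are per-stage completion, geological, and spacing parameters and the "output" is the production profile.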


The motivation for this research was driven by three factors:

1. the lack of current understanding of unconventional reservoirs,

2. the very large and largely unexplored digital data set generated from hydraulic

fracturing operations in unconventional reservoirs, and

3. recent advances in machine learning models and computing power.

By utilizing new machine learning techniques, we may be able to increase the understanding of

unconventional reservoirs and how hydraulic fracturing affects production by exposing patterns

and input-output relationships that have not been identified before in physics-based modelling.

The application of machine learning algorithms could lead to significant opportunities for

increasing production, lowering costs and lowering the environmental impact of oil and gas

operations.

1.2 Montney Formation and Study Area

The Montney Formation, shown in Figure 1.2, is a tight siltstone/shale oil and gas play that spans

over 500 km from Alberta in the southeast to British Columbia in the northwest and covers over

130,000 km2. The quality of the reservoir varies significantly over the formation, with the general

trend of conventional sandstone in the southeast to pure shale in the northwest. The thickness and

depth also vary greatly from less than 1 m thickness and 500 m depth in the southeast to over

350 m thickness and 4,000 m depth in the northwest. The in-place resource volumes for the play

are estimated at over 2,000 trillion cubic feet of gas and over 150 billion barrels of oil and

condensate (Reynolds et al., 2014; Wang and Chen, 2016).


Figure 1.2: Areal extent of the Montney Formation (Canadian Energy Regulator, 2018).

Due to the enormous volumes of in-place hydrocarbons and the major developments in

horizontal drilling and multi-stage hydraulic fracturing technologies, the Montney Formation has

become a major gas and natural gas condensate producer. With thousands of multistage

horizontal wells drilled to date, it accounts for over 38% of total Canadian natural gas production

(Canadian Energy Regulator, 2018). Due to its size and expected production growth, the

Montney Formation was chosen for the research documented in this thesis.

The study area is a small section of the Montney Formation and is shown in Figure 1.3. It is

located near Dawson Creek in the province of British Columbia, Canada, covering an area of

around 30 km by 30 km with an average thickness of 300 m. In this area, the Montney Formation


underlies the Doig Formation and overlays the Belloy Formation. This particular study area was

chosen because it contains the highest density of vertical and horizontal wells – this creates a

large data set. The study area contains 145 vertical wells, 35 deviated wells, and 255 horizontal wells

that penetrate the Montney Formation with only 60 of these wells penetrating all the way to the

Belloy Formation. Most of the horizontal and vertical wells are drilled in the middle of the study

area with some horizontal wells located in the northwest. All the horizontal wells in the study

target the Montney Formation with the majority drilled in the top 20% of its thickness.

Figure 1.3: Location of Study Area (Canadian Energy Regulator, 2018).

1.3 Problem Statement

The overall objective of the research presented in this thesis is to explore the use of machine

learning algorithms for understanding relationships between geology, geomechanical properties,

completion design, and fluid injection properties and the production performance of the well.


From this analysis, models can be devised to predict the production performance of multistage

hydraulically fractured horizontal wells. The approach makes minimal assumptions about the

exact structure of the reservoir and fluid flow mechanisms within the reservoir but instead

attempts to link the horizontal well input parameters and surrounding geology and geomechanical

properties to its production profile. As there are a myriad of variables driving production

behavior, machine learning was applied in the research documented in this thesis. More

specifically, the convolutional recurrent neural network (c-RNN) was chosen for its ability to

process large amounts of data as well as find patterns in sequential data. This machine learning

method also needs to have the ability to aid in optimization problems such as optimizing

completion strategies and well placements. In general, for the research conducted, the approach

was developed following these steps:

1. With a machine learning model, create synthetic shear sonic logs to get a better

understanding of rock mechanical properties within a formation.

2. Create a three-dimensional (3D) rock mechanical model of the study area by using the

density, compressional sonic and shear sonic (real and synthetic) logs of vertical and

deviated wells in the study area.

3. Use the rock mechanical properties, completion design, well spacing and timing

parameters as inputs to the machine learning algorithm to predict the five-year

cumulative gas production profiles of horizontal wells.

4. Use the machine learning model to minimize water usage while maximizing

production during hydraulic fracturing in future wells.


5. Apply the machine learning model to find optimal positioning of future drills to

maximize production.

1.4 Organization of Thesis

Chapter 2 is a literature review of multistage hydraulic fracturing, the variables that affect

post-fracture production performance, and machine learning tools, as well as forecasting and optimizing post-fracture production performance.

In Chapter 3, a procedure is developed that can generate highly accurate synthetic shear sonic

logs. This procedure utilizes a c-RNN as it is capable of learning patterns in sequential data and

links the deviation survey along with the compressional and bulk density logs to the shear sonic

log. This procedure is a cost effective and fast alternative to running shear sonic (DTS) logs and

provides a greater insight into the rock mechanics of a formation.

In Chapter 4, a model is proposed that utilizes a c-RNN to predict the five-year cumulative gas

production profiles in multistage hydraulically fractured wells. The model was trained by linking

the cumulative production to a combination of completion parameters, rock mechanical

properties, and well spacing for every stage of each of the 74 wells in the Montney Formation. The

accuracy of the model’s predictions was found to increase exponentially as the production of

multiple wells was aggregated.
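The error metric behind this result, MAPE (mean absolute percentage error), and the effect of aggregation can be illustrated with a small sketch. The numbers below are invented for illustration, not results from the study: the point is only that over- and under-predictions on individual wells partially cancel when their production is summed, so the aggregate error can be much lower than the per-well error.

```python
# Hypothetical illustration of why aggregating wells can lower MAPE:
# individual over- and under-predictions partially cancel in the sum.

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 80.0, 120.0]      # invented true cumulative production per well
predicted = [90.0, 95.0, 110.0]    # invented model predictions

per_well_error = mape(actual, predicted)                 # average error well by well
aggregate_error = mape([sum(actual)], [sum(predicted)])  # error on the summed profile

print(round(per_well_error, 1), round(aggregate_error, 1))  # → 12.4 1.7
```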


In Chapter 5, we use the model developed in Chapter 4 to forecast the production of 40

proposed wells to be drilled alongside existing producers. 1,080 sensitivities were run to explore

how different combinations of stage count, fluid type, water amount, and proppant amount affect

the aggregated 5-year cumulative gas production of the 40 proposed wells. The study focuses on

reducing water usage while maximizing production and shows that it is possible to achieve a

cumulative production that is 76% of the maximum by injecting only 3% of the total water

volume required to achieve this maximum.

In Chapter 6, we use the model developed in Chapter 4 to forecast the production of 20

proposed wells. These 20 wells can only be drilled in the 40 fixed well locations identified in

Chapter 5, and the objective is to find the best placement for these wells. This resembles a

situation where budget constraints have forced a partial field development to be the only option.

Optimal combinations were found using two types of completion strategies: the water-intense 30-stage slickwater treatment and the water-conservative 15-stage crosslinked gel treatment.

Chapter 7 highlights the major findings and conclusions of the thesis as well as

recommendations for future work.

Appendix A is a separate study, completed over the course of the degree, that is not directly related to the main research. Here, temperature fall-off from steam chambers where injection has

stopped is examined and a new three-zone model is designed with heat losses taking into account

temperature-dependent thermal conductivity. The model solves the transient heat conduction

and steam condensing equations.


Chapter 2: Literature Review

2.1 Multi-Stage Hydraulic Fracturing

First used in 1947, hydraulic fracturing has become the most widely used technique of producing

hydrocarbons from low permeability reservoirs. The process includes pumping massive amounts

of sand, water and chemicals at a high rate and pressure into the formation in order to induce and

prop open fractures that increase the surface area of the reservoir that is connected to the

wellbore. This in turn, increases the volumes that can be recovered (Belyadi et al., 2017). Figure

2.1 shows a visualization of the hydraulic fracturing process.

Figure 2.1: Diagram of a hydraulic fracturing operation (US Environmental Protection Agency,

2013).


In the field, the two most popular methods to hydraulically fracture a formation are the plug and

perf and the sliding sleeve. Although there is no overall recommendation on which method is

better, typically the choice between the two is driven by economics as well as an operator’s

success in a particular formation. The plug and perf method is typically used for cased holes and

offers more flexibility in adjusting the position of the perforations. The sliding sleeve is used in

open hole wells and contains pre-perforated sleeves, so that a perforating gun is not needed

(Belyadi et al., 2017). Most hydraulic fracturing fluids are water-based and contain 90 to 97% water by

volume (U.S. Environmental Protection Agency, 2016). Two of the most common water-based

fluids used today are slickwater and crosslinked gel (Montgomery, 2013). Modern hydraulic

fracturing is typically performed in multiple stages along the entire horizontal wellbore.

Advancements in hydraulic fracturing and horizontal drilling techniques over the past several

decades have made producing hydrocarbons from unconventional reservoirs economically

feasible, which led to an unconventional boom in North America and throughout the world

(Morton, 2013).

2.2 Variables Affecting Post Fracture Production Performance

The production performance of a horizontal well that has been hydraulically fractured is

influenced by a multitude of variables; in this study, we identify three major categories:

1. geological properties of the formation surrounding the perforations,

2. completion design of each stage, and

3. well spacing and completion order.


2.2.1 Geological properties surrounding a wellbore location

The geological properties surrounding the perforations in a horizontal well have arguably the

greatest influence on the production performance. Geological properties include the volume of

hydrocarbons in place, the permeability, the natural fracture geology, in situ stresses, and rock

mechanical properties. The selection of the right place to drill a well is critical to ensure an

ongoing expanding commercial operation.

The geological properties of a formation can be described as a mixture of both the properties of

the matrix and the morphology of the fracture network. The matrix properties include the gas and

fluid saturation, porosity, and matrix permeability; these properties cannot be altered to any meaningful extent. Fractures, on the other hand, whether natural or induced, are not static and evolve with changing conditions (Holland et al., 2009), and they have a significant effect on fluid flow within a reservoir, either in the form of increased reservoir permeability or increased permeability anisotropy (Nelson, 2001). The fracture morphology, distribution and

connectivity to a wellbore will significantly impact the production that is measured at the

wellhead. Mechanical rock properties are a major driver of how the fracture network evolves and

are especially important during a hydraulic fracture treatment, when the natural fracture network

gets drastically altered.

The horizontal wells in this study area are drilled within a small areal extent of the Montney

Formation and are all located at a similar depth near the top of the formation. The Montney

Formation was deposited in a shallow marine shelf environment which makes the geological


properties of the matrix fairly constant through the study area (Moslow, 2000). The rock

mechanical properties, which govern fracture networks, differ significantly both areally and vertically. This

gives each horizontal well a unique rock mechanical profile. Mechanical properties can be

estimated using three sets of log measurements: shear sonic travel time (DTS), compressional

sonic travel time (DTP), and bulk density (RHOB) (Fjar et al., 2008). Horizontal wells do not usually have these logs, and the only reasonable way to estimate them along a horizontal wellbore is by constructing a 3D rock mechanical model upscaled and interpolated using logs from nearby vertical wells.
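The standard dynamic-moduli relations behind this log-based estimation (of the kind covered by Fjar et al., 2008) can be sketched as follows. The input values are illustrative only, not measurements from the study area.

```python
# Dynamic elastic properties from sonic and density logs (standard relations).
# The input values at the bottom are illustrative, not data from the study area.

def dynamic_moduli(dtp, dts, rhob):
    """Dynamic Poisson's ratio and Young's modulus from log measurements.

    dtp, dts: compressional/shear sonic travel times in microseconds per metre.
    rhob: bulk density in kg/m^3. Returns (poisson_ratio, young_modulus_pa).
    """
    vp = 1e6 / dtp                 # compressional velocity, m/s
    vs = 1e6 / dts                 # shear velocity, m/s
    nu = (vp**2 - 2.0 * vs**2) / (2.0 * (vp**2 - vs**2))  # Poisson's ratio
    g = rhob * vs**2               # shear modulus, Pa
    e = 2.0 * g * (1.0 + nu)       # Young's modulus, Pa
    return nu, e

# Illustrative siltstone-like values: vp ~ 3500 m/s, vs = 2000 m/s, rhob = 2500 kg/m3.
nu, e = dynamic_moduli(286.0, 500.0, 2500.0)
print(f"Poisson's ratio = {nu:.2f}, Young's modulus = {e / 1e9:.1f} GPa")
```

Because DTS appears in every one of these relations, a missing shear sonic log blocks the whole calculation, which is the motivation for the synthetic DTS procedure of Chapter 3.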

2.2.2 Completion design

After a well has been drilled, the completion design is the only thing the operator can control.

Completion design is a complicated process with many variables. A typical completion is

divided into multiple stimulation stages along the wellbore that are isolated from each other;

each stage can contain one or more perforation clusters. Since each stage is stimulated

separately, the stimulation process may differ between stages.

The stage count and spacing are among the first decisions to be made during a completion design;

the idea is to contact as much surface area as possible while minimizing fracture interference

within clusters (Belyadi et al., 2017). The optimal spacing is difficult to determine and is tied to

both economics and the geology of the formation. The amount of proppant and water pumped

per stage are also very important variables as they dictate how large the fracture network will get

and how efficiently the fractures will stay propped open once the well starts to flowback. There


are many other variables in the design of a completion that influence production, some of the

common variables are described in Table 2.1.

Table 2.1: Completion variables.

• Service company performing the stimulation

• Sliding sleeve or plug and perf

• Gun diameter

• Charge size

• Number of perforations per cluster

• Distance of the cluster from the well head

• Amount of acid used in spearhead stage

• Fluid type (slickwater or gelled)

• Size of pad stage

• Amount of energizer (CO2 or N2)

• Amount of foam

• Proppant concentration

• Proppant size and type

• Size of flush stage

• Amount of crosslinked gel

• Flowrate of each stage

• Pressure of each stage

• Length of time of each stage

• Flowback design

2.2.3 Well spacing and completion order

Whether a well is bounded or unbounded, the distance between producing wells, and the length

of time one well has been on production before an offsetting well is drilled all influence

production performance (Belyadi et al., 2017). During the early stages of field development,

many single wells are drilled. These primary wells usually produce unbounded, meaning that

they have no offsetting wells to interact with. As production data is gathered from the primary

wells it makes more economical sense to drill infill wells beside them, this type of drilling is

done from a pad which is shown in Figure 2.2.


Figure 2.2: Example of multiple horizontal wells drilled from a pad (Energy Essentials, 2015).

The production performance from primary wells is known to be better than that of the secondary

infill wells. This is because primary wells produce from the original reservoir, which has been unaltered for millions of years. Production from a primary well depletes the original reservoir pressure over a region, known as the drainage area, that propagates away from the well. As the pressure in the drainage area drops, so too does the flow rate to the wellbore. The reservoir stresses along

with the geological properties such as porosity and permeability also get altered. If drilled close

enough, the primary and secondary well will share a drainage area leading to lower production

rates in the secondary well. Lindsay et al. (2018) analyzed 564 wells in the Wolfcamp Formation

in the Permian’s Delaware Basin and found that 66% of the time the primary wells outperformed


secondary wells that were drilled within 300 m. When the results were adjusted for the more

modern, larger completions of the secondary wells, 79% of the primary wells performed better.

It is not just the distance between wells that has an effect on production, but also the length of

unbounded depletion time and consequently the amount of volume that one well had already

produced before a nearby secondary well was drilled. Defeu et al. (2018) showed both of these

effects in an optimization study on wells from the Wolfcamp Formation. Wells with unbounded

depletion time of 6 months had the same minimal impact on the production performance of

secondary wells drilled at 180, 230 and 275 m away from the primary. Wells with unbounded

depletion time of 36 months or more showed a 50% reduction in production performance of

secondary wells drilled 180 m away versus a 15% reduction for wells drilled 275 m away.

A primary well produces unbounded until a secondary infill well is drilled nearby, making both wells bounded. Wells can be bounded from one or both sides; as more wells are drilled in a field, more wells become bounded from both sides. The spacing and completion order impact the

overall performance of a field and needs to be carefully selected before a field is developed.

Figure 2.3 depicts typical bounded and unbounded configurations. Accounting for the distance

between neighboring wells and the length of time and volume that each neighbor produced is an

indirect way to account for changes in the geological properties and reservoir stresses which are

caused by well production.


Figure 2.3: Bounded and unbounded wells. Here, well B is bounded from both sides, wells A and C are bounded from one side, and well D is unbounded (Belyadi et al., 2017).

2.3 Hydrocarbon Production Forecasting

2.3.1 Forecasting

Forecasting is a useful tool in many situations that require decision making: a city decides on the width of highways based on forecasts of population growth and vehicle usage; stores decide on how much stock to buy based on forecasts of customer purchases. The length of a forecast depends on what it is used for: telecommunication routing only requires forecasting several minutes ahead, while large capital investments require multiyear forecasts (Brockwell and Davis, 1996).


Certain things are easy to predict while others are extremely difficult. The trajectory of a comet

can be forecasted thousands of years into the future with extreme accuracy; however, forecasting

even one day of a stock market price with any degree of certainty is nearly impossible. This is

because the accuracy of a prediction depends on our understanding of the system, as shown in Figure 2.4.

Figure 2.4: The accuracy of forecasts increases with the amount of system understanding.

2.3.2 Forecasting Hydrocarbon Production

Forecasting the production of oil and gas has been a major part of petroleum engineering for

most of the industry’s history. This is because having a forecast that can reasonably predict

future oil production grants the ability to estimate an asset’s remaining value and to optimize the

operation of existing wells and future drilling programs.


Forecasting production is possible because reservoirs behave in a manner that is driven by the

underlying physics and geology. The oil rate at the well head is a function of drawdown pressure,

quality of the wellbore, porosity, permeability, net pay, oil saturations and several other

variables. Further to that, many reservoirs around the world have been on production for several decades and have produced a large dataset of reservoir geology, drilling, production, and pressure data. Even though reservoir physics is well understood and there are large data sets available, the error in many forecasts remains high. This is because:

• reservoirs are very large,

• reservoir geology is very heterogenous,

• reservoirs contain vast complex natural fracture networks,

• reservoirs are deep underground and are not visible,

• the only direct measurement of the reservoir comes from vertical well bores which only

cover a tiny fraction of the entire reservoir, and

• reservoir geology can change as hydrocarbons are produced and pressures drop.

For a long time, forecasting oil production for existing producers was done empirically by

decline curve analysis (DCA) or material balance (Adelman and Jacoby, 1979). Forecasting

production for undeveloped (not yet drilled) wells was only done by analogy and by volumetric

calculations taking recovery efficiencies into account. The problem with traditional methods is

that they are based on subjective interpretation of the data, for example picking the proper slope

for DCA. Traditional methods suffer from human cognitive biases and because of this, long-term

production forecasts have been inaccurate and have to be revised through time as the reservoir is


produced. This is the reason why estimates of the volumetric accumulation in petroleum reservoirs (e.g. the original oil or gas in place) have tended to be updated every few years as the petroleum resource

is produced.

Numerical simulation is another popular method to forecast production, based on historical data

and physics-based models. Numerical simulation develops a dynamic model using available

static and dynamic data. The dynamic model is based on a static geological model which is built

by upscaling interpreted data from vertical well logs. The flow model is developed using the

well-known engineering fluid flow principles followed by the history matching process. Despite

handling complex physics and chemistry, phase behavior, and approximations to the

heterogeneity of the geological system, numerical methods tend to do even worse than empirical

methods at long-term forecasting (Mohaghegh, 2017).

Numerical reservoir simulators build a model from the bottom-up by combining historical data

measurements (flowing bottom hole pressure, production rates, etc.) and geological

interpretations (static geological model of faults, porosity, permeability, etc.) with functional

relationships (heat and mass transfer, multiphase flow in porous media with fluid constitutive

equations and thermodynamics). In most cases, reservoir simulation modelers assume that field

measurements and geological models are uncertain and the functional relationships are certain

and fundamentally true. To match historical data the geological model is tuned until a good

historical match is achieved. It is then assumed that the model provides a reasonable

representation of the reservoir system and the model is suitable for forecast. The problem is that

the geological model is highly uncertain and the functional relationships that are assumed to be


true are nowhere close to being able to model the entire complexity of nature (Mohaghegh,

2017).

Empirical and numerical methods have proven to be poor at long-term forecasting, yet today they

are still the most widely used methods for forecasting oil production, reserves, and resources.

Governments, operators and investors rely heavily on these forecasts to make major decisions,

and that is why an improvement in the forecast accuracy may be of high practical value. Since oil

and gas play such a large role with respect to global energy supply and economics, these

decisions not only affect the industry but the entire global economy, the environment, and overall

quality of human life on this planet. To reduce the number of bad decisions made, there is a need for a more accurate, data-driven way of long-term forecasting of reserves and resources.

2.4 Time Series Forecasting and Machine Learning Algorithms

Time series (TS) modeling began with the work of Yule (1927). A time series is a set

of data recorded over a period of time, most commonly taken at regular intervals to ensure

uniformity. Hourly power consumption, monthly product sales or daily oil production are all

examples of a time series. The importance of time series is twofold: firstly, it can be used to

identify patterns or behaviors in the past and secondly it can be used to make predictions about

the future. The latter has gained much popularity because of its vast applications in nearly every

field from forecasting sales to predicting the weather. Time series analysis is important since it

has the ability to identify causal factors, and these factors can then be manipulated to optimize

the future. Today, it is heavily used by investors, statisticians, meteorologists, engineers,


governments and has countless applications in any field that requires decision making. Due to its

usefulness and high popularity in industry, it has also become a highly active field of research (Kumura et al., 2013).

Most time series analytics in the past were done manually, but due to the recent trend of data digitization and the ease of storage and transmission, the amount of raw data available has made manual processing an impossible task (De Gooijer and Hyndman, 2006). Today most data processing is

done automatically making use of modern computing power.

As described in detail by Chen et al. (1997), time series algorithms fall mainly into two classes:

parametric and non-parametric. A summary of these is listed in Table 2.2. Early methods were

mostly parametric approaches with the goal of fitting a statistical model to time series data.

Parametric approaches make certain assumptions and tend to suffer from severe limitations. Due

to this, most models used today are non-parametric which make no assumptions about the

underlying structure of the process (Meir, 2000).

Table 2.2: Parametric and non-parametric algorithms for analyzing time series data.

Time Series Forecasting

Parametric:
• Exponential Smoothing
• Autoregressive Moving Average
• Autoregressive Integrated Moving Average (ARIMA)

Non-parametric:
• Multivariate Local Polynomial Regression
• Functional Coefficient Autoregressive
• Adaptive Function Coefficient Autoregressive
• Additive Autoregressive
• Artificial Neural Networks (ANN)
• Support Vector Machines (SVM)
• Genetic and Evolutionary Approaches (GA)
• Particle Swarm Optimization (PSO)
• Simulated Annealing (SA)


2.4.1 Parametric Models

Exponential smoothing was first proposed in the late 1950s (Brown 1959; Holt 1957; Winters

1960) and has been the foundation of most successful forecasting methods. Forecasts produced

using exponential smoothing are weighted averages of past observations, with exponentially decreasing weights assigned to older observations. Exponential smoothing models are based on a description of the trend and seasonality

in the data and can quickly generate fairly reliable forecasts for a wide variety of time series. The

AutoRegressive Integrated Moving Average (ARIMA) model is the most popular and frequently

used stochastic time series model (Box and Jenkins, 1970). ARIMA models aim to describe the

autocorrelations in the data and provide a complementary approach to exponential smoothing.

Some other examples of parametric models include the autoregressive conditional

heteroscedasticity (ARCH) method introduced by Engle (1982) which captures the time-varying

conditional variance or volatility. The generalized autoregressive conditional heteroskedasticity

(GARCH) (Bollerslev, 1986) represents the variance of the error term as a function of its

autoregressive terms, thereby allowing a more parsimonious representation of the time series.

Krishnamurthy and Yin (2002) combined a hidden Markov model and AR models under a

Markov regime where AR parameters switch in time according to the realization of a finite-state

Markov chain, for nonlinear time series forecasting. These parametric models tend to be limited

when modeling nonlinear and non-stationary time series by assuming local linearity with

an AR-type structure.
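As an illustration of the exponentially decaying weights described above, simple exponential smoothing can be sketched in a few lines of Python (the function name and smoothing parameter alpha are illustrative, not taken from any specific library):

```python
def simple_exp_smoothing(series, alpha):
    """One-step-ahead forecasts: each new level is a weighted average of the
    latest observation and the previous level, so the weight on an observation
    k steps in the past decays as alpha * (1 - alpha)**k."""
    level = series[0]          # initialize the level with the first observation
    forecasts = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        forecasts.append(level)
    return forecasts
```

Expanding the recursion shows that the weight placed on an observation k steps in the past is alpha(1 − alpha)^k, i.e. it decreases exponentially with the age of the observation.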


2.4.2 Supervised Machine Learning and Artificial Neural Networks

Most non-parametric time series approaches utilize supervised machine learning. The goal in supervised machine learning is to learn how the outputs are connected to the inputs. During supervised learning the algorithm is presented with training data that contains both the inputs and outputs; the samples are presented multiple times and gradually the machine learns the patterns and reduces its forecast error. During training, a validation dataset is also

monitored to see how accurately the algorithm performs on data it has not seen and to detect overfitting (Chandramouli et al., 2018).

Out of the non-parametric methods of forecasting, Artificial Neural Networks (ANNs) have

become the most popular (Agrawal and Adhikari, 2013). ANNs are information processing

systems inspired by biological neural networks found in the brains of animals and allow for

complex nonlinear relationships between the input and response variables. ANNs have the ability

to both memorize and reason and are a method in which a computer can learn or train itself to

solve a problem it was not programmed to solve. ANNs are trained by reading a large number of

input patterns and have been applied to forecasting problems in various disciplines including

finance, sociology, medicine, engineering and many others (Cartwright, 2015). To understand

how an ANN works a basic understanding of the workings of a biological neural network is

required.


2.4.2.1 Biological Neural Networks

A very simple explanation of biological neurons is as follows. Typical biological neurons contain

three main parts: a cell body in which the nucleus is contained, dendrites, and an axon. Figure 2.5

depicts two bipolar neurons. Information enters the cell body through the dendrites in the form of

electrical pulses or signals. Depending on the nature of the input signal the cell body will activate

in either an excitatory or inhibitory way, the cell body will then send an output signal through the

axon to the surrounding neurons. The electrical signal travels from one neuron and activates a

train of signals in another. A neural pathway is the connection between two neurons, and a synapse is the point where the ends of an axon of one neuron come into close contact with the dendrites of another. Neurons are the basic building blocks of neural networks.

Figure 2.5: A schematic diagram of two bipolar neurons. (Mohaghegh, 2000).


Biological neural networks can contain up to 100,000,000,000 neurons and are extremely

complex. In a neural network, one neuron is usually connected to thousands of other neurons.

Although individual electrical pulses in a neuron travel much slower than a signal in a computer,

the parallel structure of the brain allows for extremely fast processing; hundreds of times faster

than the capabilities of household computers (Mohaghegh, 2000).

2.4.2.2 Basic Structure of ANNs

Like biological neural networks, ANNs are composed of basic processing elements known as

perceptrons or artificial neurons (Mohaghegh, 2000). ANNs are composed of multiple neurons

that pass signals between one another; each neuron has connections to multiple other neurons.

Each input signal carries an associated weight which multiplies the signal being transmitted. The

neuron applies an activation function to the weighted sum of all the inputs to determine the

output signal. The output of one neuron then becomes an input to another. The neurons have

multiple inputs but only one output. Figure 2.6 is a schematic of a typical artificial neuron.

Figure 2.6: Schematic diagram of a typical artificial neuron (Mohaghegh, 2000).
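The weighted-sum-and-activation behavior of an artificial neuron can be sketched as follows (a minimal illustration using a sigmoid activation; the names are illustrative):

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of the inputs plus a bias,
    passed through a sigmoid activation to produce one output signal."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation
```

The output of such a neuron then serves as one of the inputs to neurons in the next layer.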


Individual neurons in an ANN are arranged in rows or layers known as a multilayer feed-forward

network (Hyndman and Athanasopoulos, 2017). As depicted in Figure 2.7 the basic feed-forward

ANN has three layers: an input layer, hidden layer and an output layer. The hidden layer enables

the network to model nonlinear and complex functions and is responsible for feature extraction

and provides increased dimensionality. ANNs can have multiple hidden layers, and the more

hidden layers that are added, the more complex and nonlinear the model becomes (Mohaghegh,

2000). The number of neurons in the input layer corresponds to the number of parameters

presented to the network.

Figure 2.7: Structure of a three-layer ANN (Neville et al. 2004).


2.4.2.3 ANN training methods

Training is an optimization process where the weights of the inputs between neurons along with

the biases are calibrated until a desired output is reached. Once the network is trained it can

predict with greater accuracy (Ahmadi et al. 2015). A widely used training technique is the

backward propagation of errors (backpropagation). In backpropagation the difference between the desired and actual output (the error) is propagated from the output layer backwards to the input layer, making small incremental adjustments to the input weights (Cheng et al. 2012).

Backpropagation is an iterative procedure that takes time until the weights have been calibrated. Other methods to adjust the weights include genetic algorithms (GA), particle swarm optimization (PSO), unified particle swarm optimization and imperialist competitive algorithms. Their ability to optimize makes them valuable not only for training but also for optimizing an objective function of a field. Figure 2.8 shows how PSO can be used to

train an ANN.

Figure 2.8: ANN training model using PSO (adapted from Panja et al. 2017).
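As a minimal illustration of how backpropagation makes small incremental weight adjustments, consider a single sigmoid neuron trained against a squared-error loss (an illustrative sketch only, not the exact procedure used in this thesis):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(inputs, weights, bias, target, lr=0.1):
    """One backpropagation step for a single sigmoid neuron with a squared-error
    loss: compute the output, propagate the error gradient back, and make a
    small incremental adjustment to each weight and the bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    out = sigmoid(z)
    # gradient of 0.5 * (out - target)**2 with respect to z
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, out
```

Repeating train_step drives the output toward the target; as the error shrinks, so does the gradient, and the weight updates become smaller.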


2.4.2.4 Deep Neural Networks

Deep neural networks (networks with many successive layers) were introduced in the early

1990s and, due to their impressive performance, have become one of the most popular machine learning tools today. With each successive layer in the network, more complex representations are developed; through this they largely automate the task of feature engineering, which in the past was a very time-consuming and laborious process. There are many types of network layers

with more discoveries being made year over year. The studies in this thesis make use of three

types of layers (Chollet, 2017):

1. Densely connected – this is the simplest type of layer: every node in one layer is fully connected to every node in the next. The traditional ANN uses only densely connected

layers as shown in Figure 2.7. The main problem with a network that uses only densely

connected layers is that it cannot capture sequential information in the input data. ANNs

suffer from the vanishing and exploding gradient problem which is associated with the

backpropagation algorithm. This problem appears if a network has too many hidden

layers. If the derivatives of the error in the output layer are too large, the gradient will increase exponentially with each layer and eventually explode; if the derivatives of the output layer error are too small, they will decrease exponentially with each layer and eventually vanish (Pykes, 2020).

2. Recurrent – this type of layer is used for sequence-based data such as time series or logs.

It iterates through a sequence of a sample while maintaining a state of memory relative to

what has already been seen before. A network that uses recurrent layers is known as a


recurrent neural network (RNN). The two most popular types of recurrent layers are Long

Short-Term Memory (LSTM) and the gated recurrent unit (GRU). Both units have

similar performance and are related in that they gate information to prevent the vanishing

gradient problem. The GRU was chosen for this thesis because of its simpler structure and higher computational efficiency compared to the LSTM (Chung et al., 2014). RNNs also

suffer from the vanishing and exploding gradient problem.

3. Convolutional – this is a layer that learns spatial hierarchies of patterns. When stacked together, these layers make a convolutional neural network (CNN), which is able to capture both local and global patterns. Filters (or kernels) are the building blocks of CNNs and are

used to extract relevant features from the input using the convolutional operation. Due to

this, they are mostly used for image identification tasks; however, they have also found

success in processing sequence data. Their performance is competitive with recurrent layers while usually being less computationally intensive. As with the other types of networks,

the CNN suffers from the vanishing and exploding gradient problem.

Deep networks are able to combine many different types of layers, which opens nearly

unlimited potential for experimenting with configurations to find an optimal structure for a

certain type of problem. There is no universally optimal network structure since it depends largely on the

problem at hand. Some rules of thumb exist that can be used as a starting guide. However,

the only way to find the optimal configuration is by manually tuning hyperparameters and

running many experiments. Hyperparameters are parameters whose value is set before the


training occurs. Important hyperparameters are the type and number of layers, number of

nodes in each layer, dropout (which introduces randomness), batch size and optimizer.

2.4.2.5 Convolutional Recurrent Neural Network

For the experiments presented in this thesis a convolutional recurrent hybrid (c-RNN) network

was chosen. This is because it is able to combine the speed and ability to process large amounts

of data of a convolutional network (CNN) with the sequence processing ability of a recurrent

network (RNN). The study presented in Chapter 3 of this thesis required forecasting a sequence

4,000 steps long, but standalone RNNs are able to memorize patterns in only a few hundred steps

and would not be able to predict this long a sequence with great accuracy. Convolutional

networks are able to convert long sequences into shorter ones of higher-level features, which

gives a c-RNN hybrid the ability to process sequences with thousands of steps (Chollet, 2017).
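The way a convolutional stage shortens a long sequence before a recurrent stage processes it can be sketched in NumPy (a toy illustration of the principle only; the kernel, stride, and weights are arbitrary, and this is not the actual c-RNN used in this thesis):

```python
import numpy as np

def conv1d_downsample(seq, kernel, stride):
    """Convolutional stage: slide a kernel along the sequence with a stride,
    turning a long sequence into a shorter one of higher-level features."""
    k = len(kernel)
    features = []
    for start in range(0, len(seq) - k + 1, stride):
        window = seq[start:start + k]
        features.append(max(0.0, float(np.dot(window, kernel))))  # ReLU
    return np.array(features)

def recurrent_summary(seq, w_in=0.5, w_rec=0.5):
    """Recurrent stage: iterate through the (shortened) sequence while
    maintaining a state that carries a memory of earlier steps."""
    state = 0.0
    for x in seq:
        state = np.tanh(w_in * x + w_rec * state)
    return state

# A 4,000-step sequence is shortened by the convolutional stage before
# the recurrent stage processes it.
long_seq = np.sin(np.linspace(0.0, 50.0, 4000))
features = conv1d_downsample(long_seq, kernel=np.full(4, 0.25), stride=4)
summary = recurrent_summary(features)
```

Here a 4,000-step input is reduced to 1,000 higher-level features before the recurrent pass, mirroring how the convolutional front end makes thousand-step sequences tractable for an RNN.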

2.4.2.6 Overfitting

Deep neural networks are good at learning complicated relationships between inputs and outputs;

however, if trained too long the networks start to confuse noise with signals and start to overfit to

the training set, which results in lower generalization capabilities (Srivastava et al., 2014). This is

especially problematic when the amount of training data is limited as in the studies presented in

this thesis. A visual of overfitting is shown below in Figure 2.9.


Figure 2.9: Visualization of overfitting (Bhande, 2018).

Most machine learning experiments are run with three sets of data: training, validation, and test. The model is trained on the training set, the validation set is used to monitor overfitting, and the test set is run after the network has finished training. The typical approach to limit overfitting is to stop the training at the point where the validation error is at a minimum, as shown in Figure 2.10.

Figure 2.10: A plot showing how the training and validation error evolve over the number of epochs.
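Stopping at the validation-error minimum can be sketched as a simple patience-based rule (an illustrative sketch of the typical approach, not the strategy ultimately used in this thesis):

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return (epoch, error) of the validation minimum, stopping the scan once
    no improvement has been seen for `patience` consecutive epochs."""
    best_error = float("inf")
    best_epoch = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break                      # validation error stopped improving
    return best_epoch, best_error
```

Training would be halted shortly after the returned epoch, and the weights from that epoch restored.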


This approach is good when there are at least a few hundred samples available. The studies in

this thesis had a maximum of 74 samples and running multiple experiments showed that the

point at which the validation error was at a minimum was inconsistent with different samples in

the training data. After numerous experiments, we found that the best way to limit

overfitting with the small sample sizes was to maximize the batch size and apply dropout.

Dropout is one of the most effective and commonly used methods to reduce overfitting (Chollet,

2017). The term “dropout” refers to the temporary removal (setting to zero) of a fraction of

nodes from a network along with their incoming and outgoing connections during training. The

units to drop are chosen at random with a probability p, known as the dropout rate, which is usually set between 0.2 and 0.5 (Chollet, 2017). At test time, no units are dropped; instead, layer output values are scaled down by the retention probability (1 − p). At its core, dropout adds randomness which

breaks up patterns that are not significant (Srivastava et al., 2014). A visualization of dropout is

shown in Figure 2.11.

Figure 2.11: Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right:

An example of a thinned net produced by applying dropout to the network on the left. Crossed

units have been dropped (Srivastava et al., 2014).
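The train-time masking and test-time scaling described above can be sketched in NumPy (a minimal illustration in which p denotes the dropout rate):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(activations, p):
    """Training: temporarily remove (zero) each unit with probability p."""
    mask = rng.random(activations.shape) >= p
    return activations * mask

def dropout_test(activations, p):
    """Test time: keep every unit but scale the outputs by the retention
    probability (1 - p) so expected activations match training."""
    return activations * (1.0 - p)
```

A different random mask is drawn on every training pass, which is what breaks up insignificant co-adapted patterns.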


Batch size refers to the number of training examples that are seen before a network update is

made. Batch size comes in three options (Brownlee, 2017):

• Batch mode: batch size is equal to the total number of samples in the training set,

• Mini-batch mode: batch size is smaller than that in batch mode but greater than one, and

• Stochastic mode: batch size is equal to one.
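The three modes differ only in how the training set is partitioned into batches, which can be sketched as:

```python
def iterate_batches(samples, batch_size):
    """Yield successive batches: batch_size == len(samples) is batch mode,
    batch_size == 1 is stochastic mode, anything in between is mini-batch."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```

One network update is made after each yielded batch, so the batch size controls how often the weights are adjusted per pass through the training set.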

2.5 History of Machine Learning in Production Forecasting and Optimization

Time series forecasting in petroleum engineering is relatively new, with mostly parametric

models being used. Ayeni and Pilat (1992) used the ARIMA technique to forecast crude oil

reserves in South Louisiana. Ediger and Akar (2007) used ARIMA and S-ARIMA models to

forecast total primary energy demand over time in Turkey.

Machine learning-based, non-parametric forecasting has also been used: He et al. (2001) used

ANN forecasting to predict existing and infill oil well production using only production data.

The experiments included two data sets (for existing and infill wells prediction) of 9 wells each.

One and a half years of production history was used to train the ANN; the authors concluded good predictive capacity for short-term (1-2 year) forecasts. Chakra et al. (2013) applied a higher

order neural network (HONN) using limited reservoir data to predict the production of oil, gas

and water. Qiao et al (2017) used least square-SVM coupled with PSO to predict the production

of oil and gas. These studies, although showing promising results in short-term forecasting, have had significant errors in forecasts of two or more years.


Aizenberg et al. (2016) applied a complex multilayer network with multi-valued neurons (MLMVN) to forecast oil production 12 years into the future for 14 long-producing wells in an oil field in Mexico, using their own production history. The forecasts were then compared with actual production; the results had an average symmetric mean absolute percentage error (SMAPE) of 17%, with one 12-year forecast having an SMAPE of only 6.14%.

In terms of optimization, algorithms such as GA or PSO have been used mostly as a

complementary tool to speed up the training process in ANNs and SVMs (Ahmadi et al., 2015). However, there have been numerous studies that employed these algorithms for optimizing field

development and operations. GAs were used to maximize NPV by optimizing the schedule and

location of horizontal wells with fixed orientations (Abukhamsin, 2009); GAs were also

employed in large-scale field development optimization involving the placement of several

hundred wells (Tanaka et al., 2018). PSO has been applied to geophysical inverse problems

(Fernández-Martínez et al., 2010) as well as determining the optimum well location and types

(Onwunalu and Durlofsky, 2010).

2.6 What is Missing in Literature?

Many models have been built to forecast production from horizontal oil and gas wells, many of which employ new and complex machine learning models. The overall

trend in these studies is that they do not sufficiently cover the broad spectrum of inputs that

affect production especially for the case of hydraulically fractured reservoirs. Many studies on


hydraulically fractured reservoirs train a network to link production to completion parameters

and top hole x,y coordinates; however, they miss the fundamental fact that geology is heterogeneous and that the proper inputs, namely the entire geological profile surrounding a 2 km long horizontal wellbore, need to be taken into account. Also, horizontal wells are completed in

multiple stages (sometimes up to 100) and many studies take the average completion parameters

of the entire well bore. Taking the average is a huge oversimplification because each stage is

unique. Some stages are completed quickly, some take longer. Some stages only place partial

amounts of the proppant or fail to break the rock at all. The stages are also spaced far apart (60

meters or more) which means each stage has a unique surrounding geology leading to every

stage having extremely different fracture networks. Furthermore, many studies do not account

for the distance between producing wells and the interaction effects that occur if wells are

sharing a common drainage area.

To the author’s knowledge there are currently no studies that have attempted to link the

completion and geological parameters at a stage level (instead of averaging to a well level) along

with well spacing information to the production performance of horizontal wells. The research

documented in this thesis attempts to fill this gap.


Chapter 3: A New Neural Network Procedure to Generate Highly Accurate Synthetic Shear

Sonic Logs in Unconventional Reservoirs

3.1 Preface

This chapter has been published as SPE paper 201453-MS after being presented at the SPE Annual

Technical Conference and Exhibition, October 26-29, 2020. https://doi.org/10.2118/201453-MS.

This manuscript is co-authored by Ian D. Gates.


3.2 Abstract

Shear sonic travel time (DTS), along with compressional sonic travel time and bulk density, is required to estimate the rock mechanical properties which play an important role in fracture

propagation and the success of hydraulic fracture treatments in horizontal wells. DTS logs are

often missing from the log suite due to their costs and time to process. The following study

presents a machine learning procedure capable of generating highly accurate synthetic DTS

curves. A hybrid convolutional-recurrent neural network (c-RNN) was chosen in the

development of this procedure as it can learn sequential data which a traditional neural network

(ANN) cannot. The accuracy of the c-RNN was superior when compared to that of the ANN,

simple baselines and empirical correlations. This procedure is a cost effective and fast alternative

to running DTS logs and with further development, has the potential to be used for predicting

production performance from unconventional reservoirs.


3.3 Introduction

Mechanical rock properties such as the Young’s modulus and Poisson’s ratio are important to

understand the way fractures propagate within reservoir rocks during hydraulic fracturing.

Furthermore, the mechanical properties of the rocks in an unconventional reservoir are

anisotropic and heterogeneous, which makes the production performance of a horizontal well

highly dependent upon its location and orientation within the reservoir. It remains unclear how

fractures propagate within these tight reservoirs especially in the context of the state-of-stress but

a better understanding of the mechanical properties can yield insights that ultimately enable prediction of the production performance of each stage and, cumulatively, of the well.

In typical practice, measuring mechanical properties is done through direct and indirect methods

(Maleki et al., 2014). Direct methods involve taking core samples from the reservoirs and

performing laboratory tests, for example, triaxial stress tests, to understand the stress-strain

behavior of the rock sample as well as the stress at which the rock fails. Due to the high costs

and difficulty of rock sample extraction, typically only a few core samples are retrieved from the

formation of interest. Also, laboratory tests are expensive and time consuming (Maleki et al.,

2014). Even when obtained, core data only provides point-value (discrete) information on the

formation of interest – the fraction of the reservoir sampled is typically under 1%. Discrete data

is not very useful when describing the entire formation since the mechanical properties of rock

vary greatly with depth and the loading conditions of the surrounding formations which can vary

areally due to variable thickness and densities of the formations and unconformities. A

continuous mechanical property measurement along the thickness of the formation is desired.


Fortunately, mechanical rock properties can be derived indirectly from a combination of three

sets of log measurements, specifically: bulk density (RHOB), compressional slowness (DTP),

and shear slowness (DTS) (Fjar et al., 2008). The dynamic Young’s modulus and dynamic

Poisson’s ratio can be derived from the compression and shear wave velocity (which are the

inverse of the DTP and DTS log measurements) along with the bulk density using the following

formulas:

\[ E_{MDYN} = \frac{\rho V_s^2 \left(3V_p^2 - 4V_s^2\right)}{V_p^2 - V_s^2} \]

\[ \nu_{DYN} = \frac{V_p^2 - 2V_s^2}{2\left(V_p^2 - V_s^2\right)} \]

where EMDYN is the dynamic Young’s modulus (pascals); νDYN is the dynamic Poisson’s ratio;

Vp is the compression wave velocity (m/s); Vs is the shear wave velocity (m/s); ρ is bulk density

(kg/m3).
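Given these formulas, the dynamic moduli can be computed from the three log measurements in a few lines of Python (a sketch; the unit handling assumes slowness is given in microseconds per metre, which may differ from a particular log suite's units):

```python
def dynamic_moduli(rhob, dtp, dts):
    """Dynamic Young's modulus (Pa) and Poisson's ratio from bulk density
    (kg/m3) and sonic slowness logs (assumed here in microseconds per metre).
    Velocities are the inverse of the slowness measurements."""
    vp = 1.0e6 / dtp   # compressional wave velocity, m/s
    vs = 1.0e6 / dts   # shear wave velocity, m/s
    e_dyn = rhob * vs**2 * (3.0 * vp**2 - 4.0 * vs**2) / (vp**2 - vs**2)
    nu_dyn = (vp**2 - 2.0 * vs**2) / (2.0 * (vp**2 - vs**2))
    return e_dyn, nu_dyn
```

For example, Vp = 4000 m/s and Vs = 2000 m/s (a Vp/Vs ratio of 2) give a dynamic Poisson's ratio of 1/3.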

The problem is that shear sonic data is often missing from a well log suite due to its high cost

and length of time to acquire. Without log or core data, the only way to estimate the shear sonic

value is either through empirical correlations or statistical methods (Hadi and Nygaard, 2018).

Most empirical correlations attempt to express the relationship between the DTP and DTS

measurements obtained from the wells with DTS logs. Empirical correlations were the main way

to estimate DTS for many years with the more familiar works from Carroll et al. (1969), Castagna

et al. (1985), and Han et al. (1986). The difficulty is that although DTS is highly correlated to

DTP, multiple variables influence the DTS including pore pressure, fluid saturation, clay and

shale content, stress profiles and other factors (Barre, 2009). Empirical correlations are also


highly dependent upon the location and size of the study area, and result in poor estimates if

taken outside that area.

Figure 3.1: Plot of shear slowness (DTS) versus compressional slowness (DTP) measurements

for 14 wells in the Montney Formation.

It is easy to see the highly nonlinear, seemingly uncorrelated nature of the relationship which

makes predicting DTS difficult. Since it is impossible to know all of the causal factors

contributing to DTS, a better approach is to apply statistical methods. Machine learning

algorithms can integrate subtleties between data sets that simple regression

models cannot. These algorithms work by learning patterns between input and output variables that are

too complex and subtle for regular statistics to capture. There have been multiple studies

applying various machine learning techniques to DTS prediction. Eskandari et al. (2004) used

artificial neural networks (ANN) and showed that they performed better than empirical

correlations. Rajabi et al. (2010) used genetic algorithms (GA) and fuzzy logic (FL) which were

able to capture the general shape of the log but did not perform well on localized variations.

Maleki et al. (2014) used both support vector regression (SVR) and ANN and found that

although SVR was a better predictor its accuracy was limited and was not a good replacement to


the true log. Al-Anazi and Gates (2015) used support vector regression (SVR) to predict

Poisson’s ratio and Young’s modulus with a fuzzy-based ranking algorithm to select the most

significant input variables and filter out dependency. Their results demonstrated that the learning

and predictive capabilities of SVR was similar or superior to that of a backpropagation neural

network depending on the control parameters of the SVR. There is no single algorithm that is

superior to others and several types should be used in a particular study; however, ANNs have

recently become the popular choice (Parapuram et al. 2018; Hadi and Nygaard, 2018). Recurrent

neural networks (RNNs) are of particular interest as they are useful for sequence data such as

logs because they retain a memory of previous patterns. RNNs have been shown to outperform ANNs

(Zhang et al. 2018). Studies to date have been able to capture the general trend of the

relationship; however, they do not adequately capture smaller localized variations. These

localized variations are important since they can play a major role in the mechanical properties

which in turn can affect the way fractures propagate.

In the following, we present a procedure that uses a hybrid variation of the RNN to generate

synthetic DTS curves from DTP and RHOB logs and well deviation surveys. The procedure is

applied to a set of wells in the Montney Formation in Alberta, Canada.

3.4 Study Area and Data Processing

3.4.1 Study Area and Structure Model

The study area is the same for all research chapters and is described in the introduction (Chapter 1) of

this thesis and shown in Figures 1.2 and 1.3. A three-dimensional (3D) structural model of the


study area was constructed to get a better visual understanding of the reservoir and to serve as a

placeholder for data. Figure 3.2 shows the model along with the Montney and Belloy tops.

Figure 3.2: A 3D visualization of all wells in the study area. Red = horizontal, blue = deviated,

and black = vertical.

The geological model was constructed in the Petrel geological modelling package

(Schlumberger, 2018) using well information, deviation surveys, formation tops as well as logs

from public data sources. Public data requires quality checks which are performed on deviation

surveys, surface coordinates, surface elevation and well tops. It was found that most publicly

reported well top depths needed correction. This was done by using the gamma ray (GR) and

resistivity logs. The Montney Formation top surface was generated by using ~400 well tops. Due

to the high number and density of penetrations, the Montney surface is believed to have

relatively low uncertainty. The Belloy Formation on the other hand only had 60 penetrations.

The Montney has a much lower variation in thickness as compared to the variation in true

vertical depth, so a thickness map should have more certainty than a Belloy surface generated

using only 60 penetrations. The thickness map was generated using Montney thickness data from


the 60 wells that penetrate the entire formation and the Belloy surface was generated by adding

the thickness map to the Montney surface map. In this way, the uncertainty of the Belloy surface

is low. The 3D map of the generated Montney and Belloy tops is shown in Figure 3.3.

Figure 3.3: 3D generated Montney and Belloy surfaces in Petrel. The vertical axis is 25x

exaggerated compared to the X and Y scales and represents the subsea depth.

Having a 3D model not only helps with visualization of the study area but also provides a means

of estimating the reservoir thickness for the 85% of wells that do not penetrate the entire

formation. The Montney Formation has a dip and varies in its true vertical depth, so a geological

layer or feature at location x1, y1 and a given depth z would not necessarily be present at the

same depth z at location x2, y2. The same layer is likely however to be present along the same

thickness interval, so this type of measurement is a better baseline for comparison than true

vertical depth.

3.4.2 Data Preparation

Estimating rock mechanical properties requires three sets of logs: RHOB, DTP, and DTS. The

majority of the 180 vertical and deviated wells have RHOB and DTP logs, but only 14 (8% of the


wells) have all three sets of logs that covered >90% of the reservoir thickness. Figure 3.4 shows

all the vertical and deviated wells without DTS marked by black points and the wells with DTS

marked by a label. If accurate synthetic DTS logs could be generated for the remaining 166

vertical and deviated wells with no DTS, the certainty of a rock mechanical model could

potentially increase significantly.

Figure 3.4: Areal view of the study area generated in Petrel. Black points with no labels are wells

with no DTS logs, labeled wells have the DTS logs.


The following data was extracted for the 14 wells in the Montney Formation: deviation survey,

RHOB, DTP, and DTS logs. Initially, GR logs were also extracted; however, extensive

preliminary testing found that GR did not improve model accuracy and, in some cases,

even worsened performance. Even after the GR log was converted to a volume of shale (Vsh)

log, it was still found to be unhelpful since the GR response for similar layers of shale varied

greatly from well to well.

Log measurements for each well have a specific resolution or measurement frequency that

ranges from 1 to 6 inches (2.5-15 cm). This caused some wells to have 6 times more

measurements than others for an interval of equal thickness. This is a problem since all wells

need to carry the same weight to have an unbiased analysis. To combat this discrepancy all logs

were converted to a resolution of 4,000 steps per 100% reservoir thickness, which is 0.025% per

step. The 4,000 step scale was chosen since it represented the average number of measurements

in the well logs and did not lower the resolution for the majority of the wells. Wells that did not

penetrate the entire Montney Formation had an estimated formation thickness which was

calculated using the top and bottom surfaces. As a result, every well had 4,000 measurements of

RHOB, DTP, and DTS from top to bottom. Each step also contained x and y coordinates

(constant for vertical wells). Deviated well coordinates for every step were calculated from the

deviation survey. Finally, the data were checked for quality and consistency; sections of the

formation with missing values for any log were removed. Although infrequent, anomalies in

measurement (such as a sudden 40% spike in density) were also removed. This cleaning of data

reduced the total number of points per well; however, the step change of 0.025% thickness


remained the same, leaving the experiment unbiased towards any particular well. Table 3.1

summarizes the number of data points for each of the 14 wells.

Table 3.1: Total number of data points in each well used for the study.

Well       # of data points   % thickness coverage
VT_02      3,913              98%
VT_10      3,543              89%
VT_18      3,999              100%
VT_26      3,574              89%
VT_29      3,999              100%
VT_42      3,520              88%
VT_49      3,988              100%
VT_69      3,845              96%
VT_108     3,904              98%
VT_132     4,000              100%
VT_137     3,946              99%
DV_16      3,952              99%
DV_19      3,372              84%
DV_28      3,999              100%
Total      53,554             96%
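The conversion of every log to a common resolution of 4,000 steps per 100% formation thickness can be sketched as follows. This is an illustration only, not the thesis code; the use of linear interpolation onto a fractional-thickness grid is an assumption.

```python
import numpy as np

def resample_log(depth, values, top, bottom, n_steps=4000):
    """Resample a log curve to a fixed number of steps over the formation
    interval so every well carries equal weight, regardless of the tool's
    native 1-6 inch sampling.

    depth/values: raw measured-depth samples of one log curve.
    top/bottom:   formation top and bottom depths for this well.
    """
    # fractional thickness grid: 0% at the top, 100% at the base,
    # i.e. 0.025% of thickness per step for 4,000 steps
    frac = np.linspace(0.0, 1.0, n_steps)
    target_depth = top + frac * (bottom - top)
    # linear interpolation onto the common grid
    return np.interp(target_depth, depth, values)
```

Applied to the RHOB, DTP, and DTS curves of each well, this yields the 4,000 measurements per curve described above.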

3.5 Experimental Setup and Types of Neural Network Algorithms

3.5.1 Experimental Setup

The synthetic DTS curve was generated by using the following five inputs: x coordinate, y

coordinate, depth measured in true vertical depth subsea (TVD subsea), RHOB and DTP for

every available point along the Montney thickness. Since there are only 14 wells in the data set,

the only suitable method to determine accuracy is the “leave-one-out” cross-validation method.

In this method, 13 wells are used as training data with one well to validate (blind), the


experiment is then run a total of 14 times (one for each blind well) and the results of all 14

experiments are examined. This “blind” well is not truly blind as its performance can be

measured. However, it is important that an individual blind well does not change the way the

tests are performed. This means the network structure and training procedure must be fixed

throughout the evaluation. The network can be tuned after all 14 experiments are completed to

see if an improvement is made. As the network makes 4,000-point sequence predictions, multiple

metrics are required to understand the accuracy of the prediction. For this study we compare four

types of accuracy metrics:

1. MAPE – Mean Absolute Percentage Error,

2. MaxPE – Maximum Percentage Error,

3. percentage of points with >3% error, and

4. percentage of points with >5% error.
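A sketch of the leave-one-out driver and the four metrics follows. This is illustrative only; `train_and_predict` is a hypothetical stand-in for a full network training run, not a function from the thesis.

```python
import numpy as np

def sequence_metrics(true_dts, pred_dts):
    """MAPE, MaxPE, and the percentage of points above 3% and 5% error."""
    pe = np.abs(pred_dts - true_dts) / np.abs(true_dts) * 100.0
    return {"MAPE": pe.mean(), "MaxPE": pe.max(),
            "pct_gt_3": np.mean(pe > 3.0) * 100.0,
            "pct_gt_5": np.mean(pe > 5.0) * 100.0}

def leave_one_out(wells, train_and_predict):
    """Hold out each well in turn, train on the rest, score the blind well.

    wells: dict of well name -> (inputs, measured_dts) arrays.
    train_and_predict: callable(train_wells, blind_inputs) -> predicted DTS.
    """
    results = {}
    for blind in wells:
        train = {w: d for w, d in wells.items() if w != blind}
        blind_x, blind_dts = wells[blind]
        pred = train_and_predict(train, blind_x)
        results[blind] = sequence_metrics(blind_dts, pred)
    return results
```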

MAPE is an indicator of the network performance on all points in the sequence. It does not

reveal how much of the sequence has poor accuracy, or what the maximum error is. MaxPE

shows the maximum error in the entire predicted sequence, but does not give insight into the

average performance (Zhang et al. 2015). The percentage of points with error above 3% (or 5%) is a metric

between MAPE and MaxPE and gives insight into what fraction of the predicted sequence is

reliable. These metrics, although helpful, are not very intuitive; the best way to compare

algorithm performance is by plotting the entire synthetic log generated by the algorithm over

the measured log. Visually comparing synthetic logs generated by different algorithms can more


easily point to how well each algorithm predicts small variations and which log sections are

predicted better than others.

Consistency is also important: if network performance differs too much for the same blind well

experiment, there is a problem and the network cannot be trusted for true blind wells. To

measure consistency, each of the 14 blind well experiments were reset and trained three times,

without modifying any network parameters or training procedures. Metrics including the

maximum minus minimum error and the standard deviation of errors of the three runs were

measured. Running multiple experiments on each blind well is also a good way to average the

results, damping the influence of exceptionally good or bad runs.
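The consistency metrics reduce to a range and a standard deviation across the repeat runs. A trivial sketch, with one stated assumption: the thesis does not say whether the population or sample standard deviation was used, so the population form (ddof=0) is assumed here.

```python
import numpy as np

def run_consistency(errors_across_runs):
    """Max-minus-min and standard deviation of an error metric across
    repeat runs of the same blind well (population std, ddof=0, assumed)."""
    e = np.asarray(errors_across_runs, dtype=float)
    return {"max_minus_min": e.max() - e.min(), "std": e.std()}
```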

3.5.2 Comparisons

To test whether an algorithm offers a statistically meaningful improvement, it must be compared to baselines such as:

1. Taking the average DTS/DTP ratio of training wells and applying it to the blind well, or

2. Taking the average DTS minus DTP value of training wells and applying it to the blind

well.

The networks are also compared to the DTS estimates of two popular empirical correlations:

1. Castagna et al. (1985) for shales:

Vs (km/s) = 0.77 Vp – 0.8674

2. Han et al. (1986) for shaly sandstones:

Vs (km/s) = 0.85 Vp – 1.14
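The two simple baselines and the two published correlations can be sketched together. This is an illustration, not the thesis code; the µs/ft slowness unit and the unit conversions are assumptions about the log conventions used.

```python
import numpy as np

def baseline_predictions(train_dts, train_dtp, blind_dtp):
    """Two simple baselines plus two empirical correlations for DTS.

    Slowness inputs assumed in us/ft; the correlations of Castagna et al.
    (1985) and Han et al. (1986) operate on velocities in km/s.
    """
    # baseline 1: average DTS/DTP ratio of the training wells
    ratio = np.mean(train_dts / train_dtp)
    # baseline 2: average DTS - DTP difference of the training wells
    diff = np.mean(train_dts - train_dtp)

    # velocity (km/s) from slowness (us/ft); 1 ft = 0.3048 m
    vp = 0.3048 / (blind_dtp * 1e-3)
    vs_castagna = 0.77 * vp - 0.8674   # Castagna et al. (1985), shales
    vs_han = 0.85 * vp - 1.14          # Han et al. (1986), shaly sandstones

    return {
        "ratio": ratio * blind_dtp,
        "difference": blind_dtp + diff,
        "castagna": 0.3048 / vs_castagna * 1e3,  # km/s back to us/ft
        "han": 0.3048 / vs_han * 1e3,
    }
```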


3.5.3 Networks used in the experiments

Two types of networks were chosen to run experiments: a simple feedforward network (ANN)

and a c-RNN. The networks were programmed in Python using the Keras library (Chollet, 2015).

The ANN was chosen as it is the simplest type of network and a good baseline to compare to the

more complicated c-RNN. The c-RNN hybrid network was chosen because it combines the

speed and ability to process large amounts of data of a convolutional network with the sequence

processing ability of a recurrent network. Traditional RNNs are only able to hold memory for a few

hundred steps and since each of the wells has 4,000 steps in the input sequence, a standalone

RNN would not be able to predict with great accuracy. Convolutional networks are able to

convert long sequences into shorter ones of higher-level features, which gives a c-RNN hybrid

the ability to process sequences with thousands of steps (Chollet, 2017). For both the ANN and

the c-RNN hybrid, the loss function was chosen to be the mean absolute error (MAE). This

loss function was found through trial and error and was chosen because it trained

the network faster than other loss functions such as the root mean square error (RMSE) or the

MAPE. The architecture of the networks is shown in Figure 3.5, and Figure 3.6 is a visual

representation of the input and output data format of the c-RNN.

Figure 3.5: Architecture of the ANN (A) and c-RNN (B) used in the study.


Figure 3.6: Format of input and output data of the c-RNN.

3.5.4 Overfitting

Deep neural networks are good at learning complicated relationships between inputs and outputs;

however, if trained too long the networks start to confuse noise with signals and start to overfit to

the training set, which results in lower generalization capabilities (Srivastava et al. 2014). This is

especially problematic when the amount of training data is limited, as is the case in this study.

Since the aim of this study is to generate synthetic DTS for wells with no actual DTS there

should be a way to determine the stopping point of the training without looking at blind well

performance. Most machine learning experiments are run with three sets of data: training,


validation and test. The model is trained on the training set, the validation set is used to monitor

overfitting, and the test set is then run after the network has finished training. The typical approach is

to stop the training at the point where validation error is at a minimum. This approach is good

when there are at least a few hundred samples available; this study only has 14 wells in total.

Running multiple experiments showed that the point at which the validation error was at a

minimum was inconsistent between different blind wells and in repeat trials of the same blind

well; this implied that a way to limit overfitting was needed. To limit overfitting, we maximized

the batch size and applied dropout.

Figure 3.7 plots how the validation error evolves with different batch sizes (as a percent of the

maximum batch size) for the VT_42 blind well test. The plot reveals that fluctuations in blind

error become less aggressive and overfitting becomes less of an issue with larger batch sizes. For

these reasons it was decided to run all experiments in full batch mode.

Figure 3.7: Effect of batch size on the evolution of the validation error for blind well VT_42.


3.5.5 Stopping procedure

3.5.5.1 ANN

The ANN is meant to serve as a baseline to compare the c-RNN to. If the c-RNN performs

better than the best possible ANN results, then the complexity and longer run time of the c-RNN

are justifiable. To reach the best possible result of the ANN, a cheat run was performed for every

blind well. In the cheat run, the blind well error was observed during training and the network

was forced to stop when the blind error value was at its lowest. The point at which the blind error

is minimum was unique for every blind well and the only way to find it was to observe the blind

error rate at every epoch. Each blind well test was reset and run multiple times and the run with

the lowest blind error was used to generate a synthetic DTS log. This type of procedure is only

meant to test the maximum potential of the network and is not applicable when generating

synthetic DTS for wells with no DTS since the error on these wells cannot be observed.

3.5.5.2 c-RNN


Figure 3.8: Training and validation error for two blind well tests.

Figure 3.8 shows how the training and validation error evolves over 1,000 training epochs for

two of the blind well experiments. “Tremors” in both the training set and blind well error start to

occur at fairly regular intervals after ~400 epochs and the error rate seems to stabilize at a very

low constant error between them. These tremors were found to be universal across all wells, as

can be seen in Figure 3.9.


Figure 3.9: Validation error for all blind well tests, results are split into two graphs for clarity.

A long run was performed on one of the wells to see if this trend continued. Figure 3.10 shows

the train and test error for 20,000 iterations of data.

Figure 3.10: 20,000 iterations of the blind well DV_28 experiment.


Overfitting seemed to have stopped and the blind well error is more or less constant between

tremors. After about 2,000 iterations the tremors in test data get very large but still converge

back to a low and stable error rate. The tremors in the blind well point to a possibility for

improvement in accuracy if a way to stop the algorithm at the low point of the blind tremor is

found.

Since the validation error does not exist when generating synthetic DTS curves for wells with no

DTS, one can only use the training error to determine when to stop the network. From the figures

it is seen that the tremors in both the training set and blind well occur nearly at the same time

(validation tremor starts a bit earlier and ends a bit later than the training tremor). Figure 3.11

presents a zoom into one of the tremors.

Figure 3.11: A zoomed in plot showing the training and blind well error tremor.


Since the training and validation tremors were found to coincide with one another, a simple

empirical procedure was developed to determine the stopping criteria for the c-RNN:

A. Run network for 1,000 epochs (when network has stopped improving results).

B. If training error is stable and far enough away from the previous tremor, stop training.

C. If training error is experiencing a tremor or is very close to the previous tremor, continue

running the network for 50-100 more epochs (stopping before the next tremor occurs) until

the state in step B is reached, then stop training.

D. If another tremor occurs, repeat step C until the state in step B is reached.
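A rough sketch of how steps A-D could be automated from the training-loss history alone. The window sizes and thresholds below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def should_stop(train_loss, min_epochs=1000, window=50, spike_factor=2.0):
    """Heuristic stopping check using only the training-loss history:
    train at least min_epochs, then stop only when the recent loss is
    stable and no tremor (spike) falls inside the trailing window."""
    loss = np.asarray(train_loss, dtype=float)
    if len(loss) < min_epochs:
        return False  # step A: run the network for ~1,000 epochs first
    recent = loss[-window:]
    baseline = np.median(loss[-5 * window:])
    # a "tremor" = a recent epoch spiking well above the baseline loss
    tremor_nearby = np.any(recent > spike_factor * baseline)
    stable = recent.std() < 0.05 * baseline
    # step B if stable and tremor-free; otherwise keep training (steps C/D)
    return bool(stable and not tremor_nearby)
```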

Finally, to see how close this procedure comes to the maximum potential of this network, a cheat

run was performed for every blind well. Like the ANN cheat run, the blind well error was

observed during training and the network was stopped at the lowest blind value.

3.5.6 Summary of Procedure

Table 3.2 presents the architecture and hyperparameters used for the c-RNN network used in this

study. Figure 3.12 shows the procedure required to generate a trained version of the synthetic

DTS tool presented in this study.

Table 3.2: Starting hyperparameters for the c-RNN developed in this study.

Layer               Input   Conv   Conv   Max Pooling   Conv   GRU    GRU    Output
Number of nodes     5       32     32     -             32     100    100    1
Activation function -       ReLU   ReLU   -             ReLU   ReLU   ReLU   -
Recurrent dropout   -       -      -      -             -      0.5    0.5    -

(ReLU = rectified linear unit.)
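A Keras sketch consistent with the layer stack in Table 3.2 follows. The kernel sizes, pooling size, padding, and optimizer are not reported in the text and are assumptions here; this is an illustration, not the thesis implementation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_crnn(n_features=5):
    """Conv -> Conv -> MaxPooling -> Conv -> GRU -> GRU -> Dense(1),
    trained with an MAE loss, per Table 3.2. Kernel/pool sizes assumed."""
    model = keras.Sequential([
        keras.Input(shape=(None, n_features)),            # variable-length sequence
        layers.Conv1D(32, 5, padding="same", activation="relu"),
        layers.Conv1D(32, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),                           # shorten the sequence for the GRUs
        layers.Conv1D(32, 5, padding="same", activation="relu"),
        layers.GRU(100, activation="relu", recurrent_dropout=0.5,
                   return_sequences=True),
        layers.GRU(100, activation="relu", recurrent_dropout=0.5,
                   return_sequences=True),
        layers.Dense(1),                                  # one DTS value per remaining step
    ])
    model.compile(optimizer="adam", loss="mae")
    return model
```

Note that the max-pooling halves the sequence, so a 4,000-step input yields a 2,000-step output here; how the pooled sequence was mapped back to the full 4,000 points is a detail not recoverable from the text.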


Figure 3.12: General procedure for training the synthetic DTS tool that can be applied to any

formation.

1. Generate top and bottom surfaces for the formation of interest.

2. Load all pertinent data into Petrel (well tops, logs and deviation surveys).

3. Extract RHOB, DTP and DTS logs as well as deviation surveys in the formation of interest for all wells with DTS logs.

4. Convert well logs to a set number of points per 100% formation thickness.

5. Select 1 DTS well as the validation set and the remaining wells as the training set.

6. Train the c-RNN network on the training set using the recommended starting architecture and hyperparameters listed in Table 3.2.

7. Retrain the network with the previous hyperparameters on different train/validation sets to ensure the network generalizes to other wells (if generalization is poor, re-tune hyperparameters).

8. Save the network hyperparameters.

9. Run the network for 10,000 epochs on several validation wells to confirm overfitting has been removed.

10. Tune network hyperparameters until the best validation accuracy is achieved.

11. Take note of the epoch at which validation error is stable for most validation wells; this is the stopping point (a stopping point of 1,000 epochs was chosen for this study).


Figure 3.13 shows how to use the trained tool to generate synthetic DTS curves. The training and

generating procedures are not limited to the Montney and can be applied to any formation of

interest.

Figure 3.13: Process for using the training tool to generate synthetic DTS curves.

3.6 Results and Discussion

Tables 3.3a to 3.3f summarize the results of the leave-one-out experiments. Tables 3.3a to 3.3d

compare the four different metrics for the two empirical correlations, the two simple baselines,

the best possible results achieved with the ANN cheat run, the average of the three c-RNN runs

along with the best possible result achieved with the c-RNN cheat run. Tables 3.3e and 3.3f show

the consistency of the three c-RNN runs. These metrics are the difference between maximum and

minimum error values and the standard deviation of the three runs for each blind well. The

MAPE and MaxPE (Tables 3.3a and 3.3b) are not the best metrics as they do not describe the

extent of localized errors, but they do provide additional information about the results. The more

revealing metric is the fraction of points >3% error (Table 3.3c) and the visual comparison.


Table 3.3: Results of the study comparing the performance of the various methods of generating

synthetic DTS curves.

Table 3.3a - MAPE

well      Castagna et al.   Han et al.   DTS/DTP   DTS-DTP   ANN Cheat   Ave of 3 c-RNN Runs   c-RNN Cheat

VT_02 2.80% 2.19% 1.24% 1.03% 0.91% 1.00% 0.83%

VT_10 2.31% 3.17% 1.61% 1.53% 1.88% 1.76% 1.27%

VT_18 1.79% 3.47% 1.35% 0.93% 0.62% 0.66% 0.50%

VT_26 3.39% 2.55% 1.80% 2.29% 1.12% 1.02% 0.83%

VT_29 3.80% 1.89% 1.87% 2.15% 1.16% 1.27% 1.09%

VT_42 7.05% 3.27% 4.07% 3.11% 1.17% 1.02% 0.85%

VT_49 1.53% 3.95% 1.66% 1.01% 1.81% 1.40% 0.66%

VT_69 6.41% 2.84% 3.29% 2.11% 1.28% 1.20% 0.91%

VT_108 2.05% 3.01% 1.43% 1.31% 0.92% 0.99% 0.82%

VT_132 2.73% 2.73% 1.36% 1.37% 1.01% 0.95% 0.75%

VT_137 4.59% 2.93% 2.17% 1.37% 1.26% 1.99% 1.31%

DV_16 1.60% 4.11% 2.17% 1.75% 0.74% 1.01% 0.75%

DV_19 2.57% 4.92% 2.95% 2.32% 1.59% 1.27% 1.18%

DV_28 2.67% 3.12% 1.78% 2.15% 1.62% 1.52% 0.85%

Total 3.21% 3.14% 2.03% 1.73% 1.23% 1.22% 0.89%

Table 3.3b - MaxPE

well      Castagna et al.   Han et al.   DTS/DTP   DTS-DTP   ANN Cheat   Ave of 3 c-RNN Runs   c-RNN Cheat

VT_02 8.37% 9.50% 4.96% 4.06% 4.36% 4.18% 3.52%

VT_10 10.44% 14.00% 12.90% 13.32% 14.63% 14.00% 12.98%

VT_18 6.85% 10.93% 5.87% 5.71% 4.72% 4.44% 3.82%

VT_26 14.76% 11.14% 8.45% 6.15% 5.09% 5.32% 5.52%

VT_29 11.42% 7.70% 7.03% 6.59% 5.11% 4.87% 4.12%

VT_42 16.84% 13.59% 9.34% 7.61% 5.66% 6.07% 5.18%

VT_49 6.02% 10.54% 6.31% 5.50% 4.79% 7.18% 4.32%

VT_69 15.84% 12.57% 8.54% 6.31% 3.87% 4.43% 3.73%

VT_108 6.79% 11.06% 7.10% 5.93% 3.97% 4.03% 3.77%

VT_132 12.15% 11.89% 6.65% 8.88% 6.07% 7.08% 6.75%

VT_137 14.48% 11.36% 7.13% 7.47% 6.15% 5.98% 6.74%

DV_16 8.73% 13.20% 6.73% 5.88% 5.71% 5.05% 5.51%

DV_19 10.59% 14.82% 10.53% 7.53% 7.32% 6.32% 5.67%

DV_28 11.49% 14.23% 8.64% 9.65% 10.07% 10.36% 11.32%

Total 16.84% 14.82% 12.90% 13.32% 14.63% 14.00% 12.98%


Table 3.3c – Percentage of points above 3% error

well      Castagna et al.   Han et al.   DTS/DTP   DTS-DTP   ANN Cheat   Ave of 3 c-RNN Runs   c-RNN Cheat

VT_02 41.53% 27.17% 5.39% 2.12% 1.97% 1.92% 0.43%

VT_10 32.30% 50.65% 10.33% 9.37% 19.00% 15.50% 8.58%

VT_18 14.60% 53.84% 4.25% 1.18% 0.15% 0.18% 0.13%

VT_26 47.01% 34.42% 18.33% 25.63% 3.19% 2.63% 1.15%

VT_29 61.24% 19.55% 19.08% 25.81% 4.80% 7.87% 4.20%

VT_42 94.29% 42.67% 69.03% 56.65% 3.55% 4.01% 1.99%

VT_49 13.59% 65.92% 11.69% 1.71% 10.88% 2.96% 0.68%

VT_69 92.09% 32.35% 49.67% 18.99% 3.07% 3.34% 0.60%

VT_108 23.95% 44.54% 8.97% 5.84% 2.38% 1.71% 0.54%

VT_132 35.08% 43.78% 6.70% 7.08% 0.78% 0.97% 1.20%

VT_137 62.19% 38.55% 28.08% 5.22% 3.57% 24.84% 5.25%

DV_16 13.36% 68.09% 21.18% 16.37% 0.89% 3.13% 1.01%

DV_19 35.29% 67.53% 42.41% 28.32% 13.14% 5.13% 5.46%

DV_28 31.48% 48.66% 13.05% 17.75% 8.35% 5.50% 2.10%

Total 42.30% 45.42% 21.45% 15.37% 5.26% 5.66% 2.31%

Table 3.3d – Percentage of points above 5% error

well      Castagna et al.   Han et al.   DTS/DTP   DTS-DTP   ANN Cheat   Ave of 3 c-RNN Runs   c-RNN Cheat

VT_02 14.16% 5.49% 0.00% 0.00% 0.00% 0.00% 0.00%

VT_10 8.98% 16.23% 3.81% 3.13% 5.51% 4.64% 3.11%

VT_18 2.13% 21.71% 0.15% 0.10% 0.00% 0.01% 0.00%

VT_26 20.03% 9.26% 5.65% 1.71% 0.03% 0.09% 0.08%

VT_29 25.81% 3.85% 4.95% 4.93% 0.08% 0.00% 0.00%

VT_42 75.28% 21.73% 26.90% 5.40% 0.17% 0.49% 0.06%

VT_49 0.85% 25.53% 0.53% 0.13% 0.00% 0.05% 0.00%

VT_69 65.80% 15.14% 16.33% 0.52% 0.00% 0.00% 0.00%

VT_108 2.87% 19.19% 0.38% 0.41% 0.00% 0.00% 0.00%

VT_132 14.03% 10.55% 0.25% 0.68% 0.20% 0.30% 0.60%

VT_137 39.38% 17.82% 6.06% 0.23% 0.10% 0.74% 0.15%

DV_16 1.77% 32.82% 0.63% 0.51% 0.25% 0.03% 0.10%

DV_19 12.51% 42.26% 16.96% 4.51% 1.99% 0.34% 0.33%

DV_28 16.35% 16.33% 0.90% 3.20% 1.15% 0.52% 0.53%

Total 21.09% 18.22% 5.67% 1.76% 0.63% 0.49% 0.34%


Table 3.3e – Maximum minus minimum error of the three c-RNN runs

well      MAPE      MaxPE      Percentage of points > 3% error      Percentage of points > 5% error

VT_02 0.29% 1.24% 2.58% 0.00%

VT_10 0.78% 1.36% 11.94% 0.62%

VT_18 0.18% 1.07% 0.08% 0.03%

VT_26 0.14% 0.70% 0.39% 0.11%

VT_29 0.07% 0.32% 0.95% 0.00%

VT_42 0.42% 2.14% 4.89% 1.25%

VT_49 0.57% 4.67% 3.06% 0.05%

VT_69 0.50% 1.28% 5.28% 0.00%

VT_108 0.03% 0.55% 0.56% 0.00%

VT_132 0.52% 0.89% 0.08% 0.03%

VT_137 0.29% 0.21% 8.64% 0.30%

DV_16 0.32% 0.76% 1.49% 0.05%

DV_19 0.06% 0.84% 0.50% 0.06%

DV_28 0.16% 0.49% 1.20% 0.03%

Average 0.31% 1.18% 2.97% 0.18%

Table 3.3f – Standard deviation of errors of the three c-RNN runs

well      MAPE      MaxPE      Percentage of points > 3% error      Percentage of points > 5% error

VT_02 0.14% 0.68% 1.32% 0.00%

VT_10 0.41% 0.76% 6.39% 0.36%

VT_18 0.10% 0.54% 0.04% 0.01%

VT_26 0.07% 0.35% 0.22% 0.06%

VT_29 0.04% 0.18% 0.50% 0.00%

VT_42 0.22% 1.11% 2.55% 0.72%

VT_49 0.33% 2.34% 1.53% 0.03%

VT_69 0.25% 0.67% 2.65% 0.00%

VT_108 0.01% 0.30% 0.30% 0.00%

VT_132 0.26% 0.46% 0.04% 0.01%

VT_137 0.15% 0.11% 4.42% 0.18%

DV_16 0.16% 0.38% 0.78% 0.03%

DV_19 0.03% 0.48% 0.26% 0.03%

DV_28 0.08% 0.28% 0.61% 0.01%

Average 0.16% 0.62% 1.54% 0.10%


The results show that the empirical correlations did not work for this study. The likely reason for

this is that the empirical correlations were developed from a very broad (global) set of

reservoirs and do not translate well to specific (local) reservoirs. The simple baselines

performed about twice as well as the empirical formulas. This is no surprise as they were derived

from the average of 13 training wells, making them very local and only applicable to this

particular study area. Both neural networks show statistical significance as they outperformed the

simple baselines and the empirical correlations.

The metrics show that the c-RNN performance is similar to that of the ANN cheat run; however, a

visual comparison reveals that the c-RNN has much better predictive capabilities. Figures 3.14,

3.15 and 3.16 compare the ANN cheat and average c-RNN synthetic logs for one of the worst

wells (VT_10), the best well (VT_18), and the most central well (VT_42), respectively. The visual

comparison shows how the average c-RNN run can predict the small variations much better than

the best result achieved by the cheat ANN run.


Figure 3.14: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN

(right) runs compared to the true logs for one of the worst blind wells – VT_10.

MAPE = 1.76% MAPE = 1.88%


Figure 3.15: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN

(right) runs compared to the true logs for the best blind well – VT_18.

MAPE = 0.62% MAPE = 0.66%


Figure 3.16: Synthetic logs generated by the cheat ANN (left) and the average of three c-RNN

(right) runs compared to the true logs for the most important well - VT_42.

Depth of majority of nearby horizontal wells

MAPE = 1.17% MAPE = 1.02%


Wells VT_10 and VT_137 had the highest error rate in the c-RNN runs. As shown in Figure 3.4

these wells are located near the edge of the study area where not many wells have been drilled,

so their accuracy is not of great importance for the purpose of this study. Most of the vertical and

horizontal wells are drilled near the center of the study area closest to well VT_42 and four other

nearby wells: VT_26, VT_29, VT_69 and VT_108. It is important to focus on these wells as they

dictate the success of the network to generate synthetic curves for the majority of wells with no

DTS. From the results, VT_42 had a MAPE of only 1% and only ~7% of the points had an error

>3%; the other four wells also performed fairly well with a MAPE close to 1%. Figures

3.17 and 3.18 show the synthetic curves generated by the c-RNN compared to the true DTS

curves for the more centrally located wells VT_26, VT_29, VT_69 and VT_108. The cheat run

of the c-RNN had the best performance. This suggests potential for improvement in the stopping

criteria and design of the c-RNN network.


Figure 3.17: Synthetic logs generated by the c-RNN for wells VT_26 and VT_29 compared to

the true logs.

MAPE = 1.27% MAPE = 1.02%


Figure 3.18: Synthetic logs generated by the c-RNN for wells VT_69 and VT_108 compared to

the true logs.

MAPE = 1.20% MAPE = 0.99%


Figure 3.19 shows how the synthetic curves generated for blind well VT_18 improve in accuracy

with the number of training iterations. The results show that the broad trend of the DTS log is

captured early in the network training and more detailed variations are captured with more

training iterations.

Figure 3.19: Sequence of synthetic well logs generated for well VT_18 after 1, 10, 40, 70, 120, and 380 training epochs. The y-axis is DTS (µs/ft) and the x-axis is the thickness steps from the top of the formation.

3.7 Conclusions

A procedure to generate highly accurate synthetic DTS curves has been developed. This

procedure can act as a cost-effective and fast alternative to running actual DTS logs. This does

come with the condition that the area of interest is small enough and that there are enough wells

with true DTS measurements available. Generation of a synthetic DTS measurement at a given

point along the reservoir thickness only required five inputs: x and y coordinates, depth subsea,

RHOB and DTP. The GR log was tested and was found to not improve the performance of the

network. As shown by the cheat run, the procedure does still have room for improvement. Some

potential reasons for the error in the synthetic DTS curves are:

• 13 training wells are not enough to learn every possible signal.

• Some signals are caused by variables not captured in the study (e.g., fractures, stress profiles, pore fluids, high porosity, shale, thin beds, gas presence, or high total organic content (TOC)).

• DTS measurement error is caused by the tool itself or poor borehole conditions.

Although the network did perform poorly on two of the blind well tests (VT_137 and VT_10), it is more important to look at the performance of wells located closer to the majority of vertical and horizontal wells. As there is no real way to measure synthetic DTS curve accuracy for wells with no DTS, one can only rely on the results of the network for nearby wells. Most wells are

located in the center of the study area, most of the horizontal wells reside in the top 25% of the thickness, and the network has been shown to generate fairly accurate curves for that region of the study. The c-RNN has also been shown to give fairly consistent results between runs, but it

is still suggested to take the average of three runs when generating synthetic curves for wells

with no DTS. This procedure is not limited to the Montney Formation. As long as the area is

small enough and contains enough wells with DTS logs it could be applied to any other

formation. Even in this small window of the Montney, the 250 existing horizontal wells only

cover a small fraction of the thick reservoir and there is remaining opportunity for further

development.

Having a predictive procedure like this could potentially give the user a way to optimize where

to drill and how to complete a horizontal well before it is actually drilled. If the number of wells

per area is large enough, the procedure could likewise be applied to other formations.

Chapter 4: A Convolutional-Recurrent Neural Network Model for Predicting Multi-Stage

Horizontal Well Production

4.1 Preface

This Chapter has been published as a manuscript in the Journal of Petroleum Science &

Engineering on November 21, 2020. This article is co-authored by Ian D. Gates.

4.2 Abstract

In this study, a hybrid convolutional-recurrent neural network (c-RNN) is evaluated for making

predictions of the five-year cumulative production profiles in multistage hydraulically fractured

wells. The model was trained by using a combination of completion parameters, rock mechanical

properties, and well spacing and completion order for each stage of 74 wells in the Montney

Formation in British Columbia. The prediction accuracy of the various combinations was

measured by using the mean absolute percentage error (MAPE) generated through the leave-one-out

method. The best combination of inputs was found to be the rock mechanical properties

surrounding each perforation cluster, the proppant amount used for every stage, and the spacing

and completion order of neighboring wells. The novelty of this study is that the input variables

used are at the stage level rather than the average of the entire well. The accuracy of the model

was found to increase exponentially as the production of multiple wells was aggregated. The

approach yields insights for planning new well drills in fields with existing development since it

provides the ability to run multiple field development scenarios without having to spend capital.

4.3 Introduction

The post hydraulic fracture production performance of a horizontal well is influenced by a

multitude of variables; in this study, we identify three major categories:

1. Geological properties of the formation surrounding the perforations,

2. Completion design of each stage, and

3. Well spacing and completion order.

The geological properties surrounding the perforations in a horizontal well have arguably the

greatest influence on the production performance. Geological properties include the volume of

hydrocarbons in place, the permeability, the natural fracture geology, in situ stresses, and rock

mechanical properties. The selection of the right place to drill a well is critical to ensure an

ongoing expanding commercial operation.

A model that predicts the production profile of a well before it is drilled with reasonable

accuracy would help to ensure a commercial operation by allowing an operator to run multiple scenarios of well placements and completion types, giving the ability to optimize the

development of the field to their needs. Such a model could only predict the production

performance given a set of input parameters at each perforation cluster along the wellbore; it

would be difficult to predict how much each individual cluster parameter contributes to the

output. This is partly because the production in a horizontal wellbore is measured as a sum of the

production coming into each perforation cluster. The wells in our study had between 5 and 47

clusters that were spaced from 58 to 300 m apart and the completion design of each stage in the

same well was not identical. This results in each cluster having a unique post-stimulation fracture network, making each cluster contribute unequal amounts to the overall production.

Furthermore, fracture stimulation is far from a perfect process: sometimes the reservoir never

reaches breakdown pressure and/or a plug breaks down and the fluid meant for one stage goes

into another. In other cases, the injected fluid may only flow into one thick natural fracture

thereby not really increasing the contact surface area. If there are multiple clusters in one stage,

there is no guarantee that the total stage fluid would divide itself evenly into each cluster and

create identical copies of fracture networks. This non-uniformity in production is well known:

He et al. (2017) and Al-Shamma et al. (2014) showed that up to 30% of fracture clusters in a

horizontal wellbore are non-productive.

The typical way to predict production from hydraulically fractured wells is to use numerical

reservoir simulators. Numerical simulation develops a dynamic model using available static and

dynamic data. The dynamic model is based on a static geological model which is built by

upscaling interpreted data from vertical well logs. The flow model is developed using well-known engineering fluid-flow principles, followed by the history-matching process. Fracture

propagation is modeled in various 2D or 3D geometries: some common types include the

Kristianovich-Geertsma-de Klerk (KGD) and the Perkins-Kern-Nordgren (PKN) geometry as

well as the pseudo-three-dimensional (P3D) model. Traditional numerical simulation to predict

production is known as a bottom-up approach (Mohaghegh, 2017). This is because these models

assume that all of the reservoir's complexities and fracture networks are known. Typically, the

geological models are upscaled from single wells and have large uncertainty (Mohaghegh, 2011).

Furthermore, the models that describe the morphology of fractures are grossly oversimplified:

real hydraulic fracture networks rarely resemble perfect ellipsoids and are impossible to perfectly

model. Due to these assumptions, numerical simulators are usually poor estimators of actual

production performance.

An alternative approach is to use an empirical, top-down approach using machine learning

algorithms. In this approach, minimal assumptions are made about the structure of the reservoir

or how fluid flows through it. Rather, the input variables of each stage are presented to a machine

learning algorithm that attempts to link them to the production. The purpose of this study is to develop a model that can predict, with reasonable accuracy, the post-hydraulic-fracture five-year production performance of horizontal wells in a field with existing

development. The model utilizes a convolutional-recurrent neural network (c-RNN) hybrid to

link the input variables of each stage to a well’s production performance. The input variables are

highly detailed and include the rock mechanical properties surrounding each perforation interval,

the type and size of stimulation of each stage along with the spacing, and completion order of

neighboring wells. Different combinations of inputs are tested to find which combination leads to

the best predictions. The study focuses on 74 horizontal wells located in the Montney Formation

in Alberta, Canada.

4.4 Input Data Preparation

The study area is the same for all research chapters; it is described in Chapter 1 (Introduction) of this thesis and shown in Figures 1.2 and 1.3. To make sure that matrix properties are similar across the wells, the study focuses only on the top layer of the Montney. Figure 4.1 shows the areal extent of the 74 wells in the study area.

Figure 4.1: Areal extent of the 74 horizontal wells used in the study.

For this study we gathered the geological properties and completion variables of each stage and

perforation interval for all 74 wells. We also gathered the well spacing and completion order.

Table 4.1 shows the stage input variables that were used in the experiments. The five-year

production profile of the wells was used to train the network.

Table 4.1: The stage variables that were used as inputs in the experiments.

Rock mechanical properties: RHOBnear, DTSnear, DTPnear, RHOBfar, DTSfar, DTPfar

Completion: type of fluid, total proppant placed, total fluid injected, total CO2 injected, distance between perforation clusters

Well spacing and completion order: length of unbounded time, unbounded gas production, length of time each offset well produced before the well is drilled, volume of gas each offset well produced before the well is drilled, percentage of length that each offset well covers this well, average perpendicular distance from each offset well to this well

The number of clusters per stage was typically only one. However, some of the newer wells had

up to three clusters per stage. The clusters in a stage share the total proppant and fluid amounts

of the stage. This makes it difficult to allocate the amount of proppant and fluid entering each

cluster. To account for this, the rock mechanical variables of those stages were taken as the

average of all the clusters in that stage.

4.4.1 Geological Properties

Both the matrix and the natural fracture network contribute to the overall geological properties.

As previously mentioned, the matrix properties are expected to be similar between wells.

Fracture networks, which are governed by the rock mechanics, would be expected to be vastly different not only between wells but between perforation clusters along the wellbore. The rock mechanical

properties were used to describe the geology surrounding each perforation cluster. Rock

mechanics are a function of the DTS, DTP and RHOB logs. As shown in Figure 3.4, only 14/180

of the vertical and deviated wells in the study area had DTS measurements. In Chapter 3 a

procedure was described that generates accurate synthetic shear sonic logs (with an average

MAPE of 1.2%). For this study, this procedure was used to generate DTS logs for 166 of the

vertical and deviated wells that were missing one. The Petrel software (Schlumberger, 2019) was

used to upscale and interpolate each of the three logs from all 180 wells throughout the

formation, generating a 3D model of each of the three logs, enabling an estimation of the DTS,

DTP and RHOB log profiles along the horizontal wellbores. Two types of profiles were built: the

near and the far. The near profiles were used to estimate the rock mechanical properties at the

wellbore; they would be expected to directly influence how the fractures start to propagate. The

far profiles were built using 50 m3 cubes that follow the wellbore profile. Each cube represented the average of the properties within it. The cubes were an attempt to describe how hydraulic fractures would propagate through the cube after they were initiated. The

measured depth (MD) of each cluster was used to get the corresponding rock mechanics

surrounding it. Each perforation cluster in each well had 6 rock mechanical values: (DTS, DTP,

RHOB)near and (DTS, DTP, RHOB)far.

4.4.2 Completion Variables

Table 4.2: Completion variables

Completion Variable: Range per stage

Service company performing the stimulation: numerous companies
Type of completion: sliding sleeve or plug-and-perf
Gun diameter: 73 mm – 86 mm
Charge size: 15 – 31
Number of perforations per cluster: 11 – 42
Distance between successful stages: 27 m – 875 m
Amount of acid used in spearhead stage: 0 m3 – 9 m3
Fluid type: slickwater or crosslinked gel
Size of pad stage: 1 m3 – 138 m3
Amount of energizer (CO2): 20 m3 – 560 m3
Proppant concentration: 25 – 1,920 kg/m3
Proppant size and type: 50/40, 30/50, 20/40 mesh
Total proppant placed: 0 tonnes – 300 tonnes
Total proppant injected: 10 m3 – 1,400 m3
Size of flush stage: 0.5 m3 – 98 m3
Amount of crosslinked gel: 0 m3 – 8 m3
Flowrate of each stage: 1 m3/min – 13 m3/min
Pressure of each stage: 5 – 75 MPa

All of the completion variables that are listed in Table 4.2 were gathered for each stage of the 74

horizontal wells in the study using publicly available completion reports. Not every well had all

these variables publicly available. From the various stage variables gathered, and to limit the number of inputs, the experiments were conducted with the following: the type of fluid, total proppant placed, total fluid injected, total CO2 injected, and the distance between perforation clusters. These variables were chosen because they were available for every well.

The 74 horizontal wells in the study were completed between 2006 and 2017, and the number of stages per well ranged from 5 to 31. In total there were 943 attempted stages across all 74 wells.

Some of the stages failed due to pump problems, screen-out, or the breakdown pressure never being reached. There were a total of 23 (about 2%) failed stages, and all of them resulted in minimal amounts of proppant entering the formation (a maximum of 3 tonnes). For the purposes of this study the failed stages were ignored, so the total number of stages used was

920. Each successful stage has completion variables as well as rock mechanical properties linked

to it.

4.4.3 Well spacing and completion order

Well spacing and completion order were considered at a well level not a stage level. Because of

this, the input values representing spacing and completion order for a well are equal for every stage in that well. The following inputs were gathered for each well:

a) The length of time the well produced unbounded;

b) The volume of gas the well produced unbounded;

c) The length of time that each offset well produced before this well began production

(could be positive or negative depending on the order);

d) The volume of gas that each offset well produced before this well began production;

e) The percentage of length that each offset well covers this well;

f) The average perpendicular distance from each offset well to this well.

Each well in the study could have up to three offsetting wells (one on one side and two wells

sharing a portion of the length of another). The reason that (c) above could be positive or

negative is that the number of offsets each well has during field development is not constant; a negative value in (c) means that this well was drilled before the offset was drilled. This is

important to account for as future offsetting wells may negatively impact the production of the

current well. If we use the model to predict the production performance of a new well,

accounting for when a new offset would be drilled beside this well would require some

forecasting; however, this decision is fully controlled by the operator and for all of the wells in

this study, the timing is already known. Having the ability to tell the network when a future

offset well would be drilled would enable the model to run various sensitivities.

4.4.4 Production Data

Predicting the gas production profile is the focus of this study. Gas production data, along with

the operational hours were gathered from the public database on a monthly basis for each well

from the start of production to January 2020. The 74 wells in this study produced mainly dry gas; condensate production was rare and only reported in a few wells for a small portion of their production history. Condensate made up only 0.01% of the total barrel of oil equivalent (BOE) production of the 74 wells, so condensate production was ignored for this study and all wells were assumed to produce dry gas.

The amount of gas that a well produces in a month is controlled by the reservoir and operating

conditions. Reservoir conditions behave in a somewhat predictable manner, but well operation is human-driven, which makes it difficult to predict. For example, most of the wells in the study had a rate restriction between 5,000 and 8,000 Mscf/day, and most of the wells did not produce 100% of the time in every single month. In order to have a model that predicts

reservoir behavior, the operating conditions needed to be constant. To do this, we created an

Arps decline curve that mimicked the true production data in the rate cumulative plot. The rate

restriction for all wells was set at 5,000 Mscf/day and the production time was set to 100%. An

example of the Arps overlaying process for one of the wells is shown in Figure 4.2. Figure 4.3

shows how the Arps overlay rate restriction is lower than the actual for some of the wells. The

Arps decline is a smooth curve and represents what the production trend would look like under

perfect conditions.
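The Arps overlay can be illustrated with the standard hyperbolic Arps rate equation. The sketch below is a minimal pure-Python illustration, not the author's actual implementation; the function names and the example parameter values are assumptions.

```python
def arps_rate(qi, di, b, t):
    """Hyperbolic Arps decline rate: q(t) = qi / (1 + b*di*t)^(1/b).

    qi: initial rate (e.g., Mscf/day); di: initial decline (1/month);
    b: hyperbolic exponent (0 < b <= 1); t: time in months.
    """
    return qi / (1.0 + b * di * t) ** (1.0 / b)


def capped_rate(qi, di, b, t, cap=5000.0):
    """Apply a flat rate restriction, like the 5,000 Mscf/day cap used in the study."""
    return min(cap, arps_rate(qi, di, b, t))
```

With these illustrative inputs, the capped rate stays at the restriction until the Arps rate declines below it, and then declines monotonically, giving the smooth "perfect conditions" trend described above.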

Figure 4.2: A visual representation of the Arps decline overlaying the true production. This is

shown on the rate-cumulative production plot for one of the wells. The red line represents the gas

rate assuming 100% on time every month, the orange line represents gas rate scaled down to

account for the actual on time during the month.

Figure 4.3: A plot showing the differences in rate restrictions between the actual production and

that of the Arps decline curve.

Production trends can be described by volume produced, time produced or the production rate

which is a derivative of the two. Figure 4.4 shows the various ways to visualize a production

trend. The most commonly used are the cumulative production versus time plot, the rate versus cumulative production plot, and the rate versus time plot. This study focuses on the cumulative

versus time plot, specifically a five-year cumulative production curve at one-year increments.

The cumulative versus time plot was chosen because it provides a simple, high level view of

production performance that can aid an operator when running sensitivities on where to drill a

new well and how to complete it. Although the rate-based plots provide much more detail such

as the initial rate, decline rate, decline exponents, shut down time and flush production, these

would add unnecessary complexity and cause the network to make less accurate predictions.

Figure 4.4: The three common types of plots used to describe the production performance of a

well. Cumulative versus time (top), rate versus cumulative (middle), rate versus time (bottom).

These plots were generated using one of the wells in the study.

Some wells in the study have not yet reached five years of production, which creates a problem

when training the network. For these wells, the Arps decline curve was used to forecast out

production beyond the end of historical data. Only 16 of the 74 wells required their production to be forecasted, by between 1 and 30 months beyond January 2020. Although this inevitably introduced some uncertainty to the analysis, it allowed the experiment to be run on all 74 wells. The experiment was also run using only the 58 wells with no forecasted production to see if forecasting production for the 16 wells led to better prediction accuracy.

4.5 Experimental Setup

4.5.1 Networks used in the experiments

Two types of networks were used in this study: a simple feedforward network (ANN) and a multi-headed convolutional-recurrent hybrid (c-RNN). The networks were programmed in Python using the Keras library (Chollet, 2015).

The ANN was chosen as it is the simplest type of network and serves as a good baseline against which to compare the performance of the more complicated c-RNN. The c-RNN hybrid network was

chosen because it combines the speed and ability to process large amounts of data of a

convolutional network with the sequence processing ability of a recurrent network. The c-RNN

also uses a 3D input, which allows it to use stage-level inputs. In Chapter 3, the c-RNN was

found to have superior performance to that of the ANN.

In multi-headed architectures, each input variable is handled by a separate convolutional network (head), and the outputs of these heads are merged and fed into a recurrent network before a prediction is made; these types of models are known to offer better

performance in some instances (Bagnall, 2015). The architecture of the multi-headed c-RNN is

shown in Figure 4.5.
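As a toy pure-Python illustration of the multi-headed idea (not the actual Keras implementation used in the thesis), each variable's stage sequence can pass through its own small convolutional "head" before the head outputs are merged; the kernel here is an arbitrary two-tap moving average chosen only for demonstration.

```python
def conv_head(series, kernel):
    """A toy 1-D convolution 'head' applied to one variable's stage sequence."""
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]


def multi_head(variables, kernel=(0.5, 0.5)):
    """Run each input variable through its own head, then merge (concatenate)
    the head outputs into one vector, mimicking the merge step in Figure 4.5."""
    merged = []
    for series in variables:
        merged.extend(conv_head(series, kernel))
    return merged
```

In the real network, the merged vector is then passed to the recurrent (GRU) layers; in this sketch, the merge simply concatenates the per-variable convolution outputs.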

Figure 4.5: c-RNN structure that was used in the study.

4.5.2 Input shape and normalization

An ANN requires a 2D input (sample, variable). Since the output of the networks (production

profile) is forecasted on a well level, the input into an ANN must be on a well level as well. The

problem is that the input variables are different for every stage in a well, so the only way to use

an ANN for these experiments is to average each input variable across all stages in a well. The

expectation then is that this would lead to higher error rates because certain variables such as

rock mechanical properties can change drastically from stage to stage over a 2 km horizontal

wellbore.

The c-RNN on the other hand uses a 3D input (step, variable, sample). The c-RNN input shape

works much better for our study because our input data is on a stage level; each stage goes into

the step axis. The input shape is depicted in Figure 4.6 as a series of tables, where the columns

represent the input variables and the rows represent the stages and each table is a different well.

Figure 4.6: Format of input and output data of the c-RNN model

Each value in the tables represents a particular input variable at a particular stage. The maximum

number of successful stages per well in this study was 31. The input shape cannot change during

an experiment so wells with less than 31 successful stages simply had a “0” value associated

with the stages it did not have. This made the total size of the training input = (n, 31, 73) and the

total size of the test = (n, 31, 1), where n is the number of input parameters chosen for the

experiment. Because the number of stages in a well is in a different axis than the rest of the

variables, it can be thought of as a baseline parameter that is always present no matter what

combination of input parameters are shown to the network. The c-RNN was used as the main

network in the experiment because of its 3D input shape. The ANN performance was compared

to the c-RNN in the last experiment which used the best configuration of input parameters.
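The zero-padding of wells to the fixed 31-stage input shape described above can be sketched as follows; this is a simplified pure-Python illustration, and the helper name is hypothetical.

```python
def pad_stages(stage_rows, max_stages=31):
    """Zero-pad a well's per-stage variable rows to a fixed number of stages.

    stage_rows: list of equal-length lists, one row per successful stage.
    Wells with fewer than max_stages stages get all-zero rows appended,
    mirroring the fixed stage axis of the c-RNN input described above.
    """
    if not stage_rows:
        raise ValueError("well has no successful stages")
    width = len(stage_rows[0])
    padded = [list(row) for row in stage_rows[:max_stages]]
    padded += [[0.0] * width for _ in range(max_stages - len(padded))]
    return padded
```

For example, a well with only 2 successful stages and 2 input variables per stage would be padded to 31 rows, the last 29 of which are all zeros.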

All the input variables along with the 5-year predictions were normalized to values between 0

and 1, where 0 represents the minimal value in that parameter and 1 represents the maximum.

Normalizing the inputs and outputs is a very common step in pre-processing of data, as it

removes the differences in the scales across variables that may decrease model performance. For

example, the RHOB values in this study ranged between 2.45 g/cc and 2.66 g/cc, but the total fluid placed in a stage ranged from 66 m3 to 1,805 m3. If the data were presented in its raw form, the network would put more weight on the fluid placed per stage, as its numerical values are orders of magnitude larger than RHOB.
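A minimal sketch of the min-max normalization described above (illustrative only; the real pre-processing operates column-by-column over the full dataset):

```python
def min_max_normalize(values):
    """Scale a list of raw values to [0, 1]: 0 maps to the minimum, 1 to the maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no signal; map it all to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, the RHOB samples 2.45, 2.55, and 2.66 g/cc map to 0.0, roughly 0.48, and 1.0, putting them on the same scale as the (much larger) fluid volumes.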

4.5.3 Experimental Setup

The study only contained 74 wells. In machine learning this is considered to be a small sample

size. A suitable method to determine model accuracy for limited sample sizes is the “leave-one-out” cross-validation method. In this method, 73 wells are used to train the network and one well, known as the blind well, is used to evaluate it. The network is trained 74 times using 74 different blind wells, and the average accuracy over all 74 wells is examined.
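The leave-one-out procedure can be sketched as a simple loop; here `train_fn` and `predict_fn` are hypothetical stand-ins for the network's training and prediction steps, not the thesis' actual code.

```python
def leave_one_out(wells, train_fn, predict_fn):
    """Leave-one-out cross-validation over a small well set.

    wells: list of (features, actual_production) pairs.
    train_fn(training_pairs) -> model; predict_fn(model, features) -> forecast.
    Returns per-well (actual, forecast) pairs for computing error metrics.
    """
    results = []
    for i, (features, actual) in enumerate(wells):
        training = wells[:i] + wells[i + 1:]  # e.g., 73 of the 74 wells
        model = train_fn(training)
        results.append((actual, predict_fn(model, features)))
    return results
```

A toy "model" that simply predicts the mean production of the training wells already demonstrates the mechanics: each well's forecast is built without ever seeing that well during training.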

The metrics used to measure the accuracy of the model were the mean absolute percentage error (MAPE) and the mean absolute error (MAE):

MAPE = (1/n) Σ_{t=1..n} |At − Ft| / At          MAE = (1/n) Σ_{t=1..n} |At − Ft|

Where At is the actual value, Ft is the forecasted value and n is the number of samples. MAPE is

the most common accuracy metric and works best if there are no extremes in data. The MAPE

was used as the main metric in this study because it scales the error to the data and gives the

ability to compare the results between wells with different cumulative productions. Due to the

scaling of errors, each individual well MAPE contributes an equal amount to the entire field

MAPE, which removes bias towards wells with larger production volumes. For this study the

individual well MAPE was calculated by averaging the volumetric error of every year in the 5-

year cumulative production trend. The field MAPE (average of all 74 wells) was then used as the

ultimate metric when comparing the results of various combinations of input variables and

network hyperparameters. The MAE was used as a secondary measure of accuracy to make sure

that the MAPE is not affected by extreme outliers; the MAE is a volumetric measurement and

represents the average error of all 74 wells in all 5 years of production forecast. If there are no

extremes in data, the total field MAE and MAPE are expected to follow similar trends.
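The two metrics, as defined above, can be written directly in Python (here MAPE is returned as a fraction rather than a percentage; the function names are illustrative):

```python
def mape(actual, forecast):
    """Mean absolute percentage error: (1/n) * sum(|A_t - F_t| / A_t)."""
    n = len(actual)
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n


def mae(actual, forecast):
    """Mean absolute error in the same volumetric units as the inputs."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n
```

Because MAPE divides each error by the actual value, a 10-unit miss on a small well counts for more than the same miss on a large well, which is exactly the scaling behavior the study relies on to weight all 74 wells equally.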

The outcomes of a neural network are dependent not only on the training data, but also on the starting values of the weight parameters. In machine learning, a model is typically initialized with random weights, so the model can produce a range of predictions while using the same training data; an example of this is shown in Figure 4.7. Having a range of outcomes is problematic as it is not

consistent and makes the model less trustworthy. The only way to have consistent predictions is

by taking the average of multiple runs. The more times a model is restarted and run, the less the average prediction will vary; Figure 4.8 shows how the average prediction changes with the number of runs for a well. We chose to use 30 runs per blind well because the average did not change much with additional runs, which only required more computing time. Each run had a total of

300 training epochs; this value was chosen since the blind test error for all 74 wells no longer

decreased at that point.
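Averaging the forecasts of repeated, randomly initialized runs can be sketched as follows; `run_once` is a hypothetical stand-in for one full training-and-prediction run, and the noise model in the usage example is purely illustrative.

```python
import random


def average_prediction(run_once, n_runs=30, seed=0):
    """Average the forecasts of several randomly initialized runs.

    run_once(rng) -> a single run's forecast (a float here, for simplicity).
    n_runs=30 matches the number of restarts per blind well used in this study.
    """
    rng = random.Random(seed)
    runs = [run_once(rng) for _ in range(n_runs)]
    return sum(runs) / n_runs
```

Simulating a run as a true value plus random initialization noise shows the effect in Figure 4.8: a single run can land far from the true value, while the average of many runs settles close to it.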

Figure 4.7: Plot showing the numerous predictions that a model trained on the same dataset could

make.

Figure 4.8: Plots showing how the average prediction changes based on the number of runs (panels for 1, 2, 5, 10, 30, 100, and 500 runs).

4.5.4 Hyperparameter Tuning

Network hyperparameters such as the number of nodes and layers, and number of epochs can be

tuned after all 74 experiments are completed to see if an improvement is made. The network

hyperparameters used for the experiments were chosen by a trial and error procedure that

includes the following steps:

1. Choose which hyperparameters can be modified – in our case this was the number of filters

and pooling layers in the convolutional heads as well as the number of nodes and layers in

the RNN part.

2. Define a range for these hyperparameters (e.g., maximum and minimum number of nodes and

layers)

3. Run sensitivities using the same experimental setup but changing the hyperparameters. First

change one hyperparameter at a time, then run different combinations of each to see which

configuration yields the best result.

The number of possible combinations is effectively unlimited, so finding the truly optimal hyperparameters is impractical; however, this approach does help tune the network to produce fairly accurate results. The tuned

hyperparameters used for the network are presented in Table 4.3.
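The trial-and-error search over hyperparameter ranges described in steps 1-3 can be sketched as a small grid search; the grid contents and the `evaluate` function below are illustrative assumptions, not the thesis' actual tuning code.

```python
from itertools import product


def tune(grid, evaluate):
    """Exhaustively try hyperparameter combinations from a small grid.

    grid: dict of name -> list of candidate values (the ranges from step 2).
    evaluate(config) -> error metric (lower is better), e.g., field MAPE.
    Returns (best_config, best_error).
    """
    names = list(grid)
    best_config, best_error = None, float("inf")
    for combo in product(*(grid[n] for n in names)):
        config = dict(zip(names, combo))
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error
```

In practice the grid must stay small, since each candidate configuration implies rerunning the full leave-one-out experiment; this is why only a handful of hyperparameters were varied.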

Table 4.3: Starting hyperparameters for the c-RNN developed in this study (the layers up to and including Max Pooling are the same for every head).

Layer:               Input | Conv | Conv | Max Pooling | Merge | Conv | GRU  | GRU  | Output
Number of nodes:     1     | 64   | 64   | -           | -     | 32   | 50   | 50   | 1
Activation function: -     | ReLU | ReLU | -           | -     | ReLU | ReLU | ReLU | -
Recurrent dropout:   -     | -    | -    | -           | -     | -    | 0.5  | 0.5  | -

Once the network hyperparameters were chosen, multiple experiments were run on different

combinations of stage input parameters to see how they affected the error metrics. The first

experiment was a base case prediction which was run using zero stage variables. In this case the

input shape of the training data was (1, 31, 73) where the x-axis was either a 1 or a 0 representing

a stage being present or not. By doing this, the only information presented to the network was the

number of stages in a well. Next, the experiments were run on individual factors, followed by a

combinations of factors from the geological and completion categories. Finally, once the best combination of parameters was found, the well spacing and completion order information was added to see if it had any effect on the outcome.
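The base-case stage-indicator encoding can be sketched as follows (the stage count is hypothetical; numpy is assumed to be available):

```python
import numpy as np

MAX_STAGES = 31  # maximum successful stage count across the 74 wells

def base_case_input(n_stages):
    """Encode a well as a 0/1 stage-indicator vector: 1 where a stage
    exists, 0 as padding up to the 31-stage maximum."""
    x = np.zeros((MAX_STAGES, 1))
    x[:n_stages, 0] = 1.0
    return x

well = base_case_input(24)  # a hypothetical 24-stage well
```

Stacking these vectors for the 73 training wells yields the (1, 31, 73)-shaped base-case input described above, in which only the stage count varies between wells.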

4.6 Results and Discussion

Table 4.4: Results of the leave-one-out experiments performed on only the geological variables

Variables Used Average MAPE Maximum MAPE Minimum MAPE MAE (MMcf)

No variables (base case) 21.18% 112.31% 1.73% 685

RHOBnear 19.60% 84.49% 1.82% 648

DTSnear 19.56% 83.44% 2.08% 663

DTSfar, DTPfar and RHOBfar 19.08% 89.79% 1.63% 634

DTPnear and DTSnear 18.74% 91.22% 0.90% 629

DTPnear 18.39% 91.71% 1.72% 612

RHOBnear and DTSnear 17.91% 68.86% 1.56% 606

RHOBnear and DTPnear 16.73% 85.17% 1.74% 565

(DTS, DTP, RHOB)far and (DTS, DTP, RHOB)near 16.60% 84.91% 1.64% 563

DTSnear, DTPnear and RHOBnear 16.57% 84.42% 1.78% 562

Table 4.5: Results of the leave-one-out experiments performed on only the completion variables

Variables Used Average MAPE Maximum MAPE Minimum MAPE MAE (MMcf)

No variables (base case) 21.18% 112.31% 1.73% 685

Distance between successful stages (m) 23.07% 100.84% 2.00% 757

Total CO2 injected 21.96% 124.02% 1.94% 737

Total proppant injected and total fluid injected 21.14% 105.50% 1.72% 687

Total fluid injected 21.08% 124.23% 1.17% 682

Type of fluid used 20.68% 118.72% 1.36% 658

Total proppant injected 20.43% 99.04% 1.88% 662

Total proppant, fluid and CO2 injected 19.38% 100.55% 0.95% 636

Total proppant, fluid, CO2 injected, and type of fluid used 19.21% 100.30% 0.80% 633

Table 4.6: Results of the leave-one-out experiments performed on geological, completion and spacing variables

Variables Used Average MAPE Maximum MAPE Minimum MAPE MAE (MMcf)

No variables (base case) 21.18% 112.31% 1.73% 685

(DTS, DTP, RHOB)near, total proppant, fluid, CO2

injected, and type of fluid used 17.34% 79.88% 1.41% 583

(DTS, DTP, RHOB)near, total proppant and fluid injected 16.37% 84.95% 1.85% 563

(DTS, DTP, RHOB)near and total proppant injected 16.27% 89.99% 0.82% 556

(DTS, DTP, RHOB)near, total proppant injected, well

spacing and completion order 14.90% 83.12% 0.97% 534

Tables 4.4 through 4.6 summarize the results of the leave-one-out experiments using the c-RNN

network. Each table has the base case prediction included for comparison; the MAPE and MAE

of the base case came out to be 21.18% and 685 MMcf respectively. The MAE was used as a

secondary check to make sure the MAPE is not affected by very large or very small

measurements. From the tables it is clear that the MAE and MAPE follow similar trends; because of this, we will only comment on the MAPE for the remainder of the study.
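For reference, the two error metrics can be computed as follows (the well volumes here are hypothetical, not values from the tables):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error, in the units of the data (here, MMcf)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical 5-year cumulative volumes (MMcf) for three wells
actual = [2000.0, 4000.0, 8000.0]
predicted = [2200.0, 3600.0, 8400.0]
```

Because MAPE divides each error by the well's own volume, a small well with a modest absolute miss can dominate the average, which is why the MAE is kept as a secondary check.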

Table 4.4 shows the results using combinations of only rock mechanical variables. Using all three near profile logs resulted in an average prediction error of 16.57%, which is lower than the 19.08% error resulting from the far profile logs. Also, using both far and near profile logs as inputs did not improve prediction performance over using just the near profile logs. This is likely because the far profile logs were averages of 50 m3 blocks that ended up being poor representations of how fractures propagate through a formation, whereas the near profile logs were not averaged and describe the conditions at fracture initiation. Predictions were also made using individual near profile logs; of these, the DTP log led to a better prediction than the DTS or RHOB. Using all three rock mechanical logs as inputs resulted in better performance than just one or two, which suggests that all three logs are important when describing the rock mechanics.

Table 4.5 shows the results using only completion variables. Runs were done on individual

variables as well as combinations of variables. Out of the individual variables, the total proppant

injected had the lowest average error of 20.43%. The combination of total proppant, fluid, CO2

injected, and the type of fluid used had the lowest error of 19.21%. Using the distance between successful stages as the only input made the model predict with an error of 23.07%, which was worse than that of the base case. The prediction accuracy of the model was greater using only the

rock mechanical properties versus only the stimulation variables. This suggests that the rock

mechanics surrounding a well have a greater effect on production performance than the

stimulation design.

Table 4.6 shows the results from the combination of rock mechanical properties and completion

variables as well as the addition of the well spacing and completion order. The best prediction

with an average error rate of 16.27% was made by using the three near profile logs in

combination with the total proppant injected. Adding more stimulation variables to the input

such as total fluid/CO2 injected, or type of fluid used did not improve the accuracy of the model.

Finally, when information about the well spacing and completion timing was added to the input,

the network was able to make even better predictions with an average error rate of 14.90%; this was the best case in this study. This points to the fact that this information plays an important role in production performance. The difference between the best case and the base case was only 6.28%, which suggests that the number of stimulation stages in a well (the sole input of the base case) has a large effect on production.

Figure 4.9 shows the distribution of the individual well MAPE values for the best case: 47 out of the 74 wells had a MAPE <15%, and almost all the wells had a MAPE <50%; only one well had an error rate of 83%. Figure 4.10 shows the production profiles of the best and worst wells.

Figure 4.9: Distribution of individual well MAPE from the best case.

Figure 4.10: Plot of the worst (left) and best (right) well production profiles created by the best-case model. The red line is the average of the 30 runs; the green line is the true profile.

There were 16 wells used in the training set that have not yet reached five years of on-production time, and their trends had to be forecast out to five years using Arps decline. To see if this forecasting of production had an impact on the results, the best case was also run with the 16 wells removed. The average error rate for the 58 non-forecasted wells was 15.2% when the 16 forecasted wells were

included in the training set. When the 16 forecasted wells were removed from the training set,

the average error rate for the 58 non-forecasted wells turned out to be 18.0%; this suggests that using more wells in the training set is beneficial even if production from these wells needed to be partially forecasted.
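A minimal sketch of the kind of Arps hyperbolic forecast involved, q(t) = q_i / (1 + b·D_i·t)^(1/b), follows; the initial rate, decline constant, and b-factor below are hypothetical, not values from the study:

```python
def arps_rate(qi, di, b, t):
    """Arps hyperbolic decline rate q(t) = qi / (1 + b*Di*t)^(1/b),
    with t and Di in consistent time units (years here)."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)

def cumulative(qi, di, b, months):
    """Crude monthly-step cumulative volume from the month-start rates."""
    return sum(arps_rate(qi, di, b, m / 12.0) for m in range(months))

# Hypothetical gas well: qi = 100 MMcf/month, Di = 0.8 /yr, b = 1.2
cum_5yr = cumulative(100.0, 0.8, 1.2, 60)
```

Fitting qi, Di and b to each well's observed decline and extending the curve to 60 months is what fills in the missing tail of the 16 partially produced wells.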

The input parameters for the best-case experiment were also used to test ANN accuracy. Since

the ANN can only take a 2D input, the input variables had to be averaged along the stages. The

MAPE from the ANN turned out to be 20.6%. This demonstrates that the c-RNN is the better

network to use.
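The stage-averaging step can be sketched as follows (the array sizes are illustrative: a 73-well leave-one-out training set with 9 hypothetical stage variables; numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((73, 31, 9))  # hypothetical: 73 wells x 31 stages x 9 variables

# Collapse the stage axis by averaging, giving the 2D (wells, variables)
# input shape that a plain feed-forward ANN expects
X_ann = X.mean(axis=1)
```

This collapse is exactly what discards the stage-by-stage detail the c-RNN can exploit, which is one plausible reason the ANN's 20.6% MAPE trails the c-RNN's result.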

Predicting individual well production before it is drilled is difficult, and it comes as no surprise that the best-case MAPE of the model was 14.9%. There are several reasons for this. Firstly, the information about the geology surrounding a horizontal well comes with great uncertainty. Secondly, the process of how hydraulically induced fractures propagate through a formation is extremely complex and depends on a myriad of factors that cannot be measured; the possibilities of network structures are endless. Also, the inability to measure production coming out of each perforation interval makes it impossible to link the rock mechanics and stage parameters directly to the stage production. Finally, the interaction effects between wells are difficult to quantify and add more complexity to the problem.

An interesting observation was that the accuracy of the model increased if the individual well

predictions were added together and compared to the actual totals. Adding up the best-case production profiles of all 74 wells resulted in an average MAPE of only 0.8%; the aggregated production profile of the 74 wells is depicted in Figure 4.11.

Figure 4.11: Plot of the best-case aggregate production profile of all 74 wells vs time.

This reduction in MAPE is due to the network underpredicting the performance of some wells and overpredicting the performance of others. When these differences are added, they begin to cancel each other out. To see how the number of wells in an aggregated group affects the MAPE, we chose aggregate sizes of 2, 3, 5, 10, 30 and 50 wells, ran 500 different combinations of randomly chosen wells for each aggregate size, and recorded each combination's MAPE. The aggregate MAPE was taken as the average of the 500 runs. Doing 500 runs on each aggregate size produces a statistical representation of the average of all possible combinations of wells in that aggregate. The results of this analysis are shown in Figure 4.12.
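The sampling procedure can be sketched as follows; the per-well actual and predicted cumulatives are hypothetical placeholders (each well deliberately over- or under-predicted by exactly 10%) rather than model outputs:

```python
import random

def aggregate_mape(actual, predicted, group_size, n_trials=500, seed=42):
    """Average MAPE of the summed predictions over randomly drawn well groups."""
    rng = random.Random(seed)
    wells = range(len(actual))
    errors = []
    for _ in range(n_trials):
        group = rng.sample(wells, group_size)
        a = sum(actual[w] for w in group)
        p = sum(predicted[w] for w in group)
        errors.append(abs(a - p) / a * 100.0)
    return sum(errors) / n_trials

# Hypothetical 5-year cumulatives (MMcf): each well missed by +/-10%
actual = [3000.0, 2500.0, 4000.0, 1800.0, 3600.0, 2200.0, 2900.0, 3100.0]
predicted = [3300.0, 2250.0, 4400.0, 1620.0, 3960.0, 1980.0, 3190.0, 2790.0]
```

Because a random group almost always mixes over- and under-predicted wells, the summed errors partially cancel, so the aggregate MAPE falls below the individual-well MAPE as the group size grows.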

Figure 4.12: Plot of how aggregating wells together affects the mean absolute percentage error (MAPE).

The MAPE drops exponentially with the increase in the number of aggregated wells. When predicting the summed production profile of 10 wells, the prediction error drops to 4.5%, which is a significant improvement over the 15% error rate of individual wells. Because of this, the model developed in this study becomes more useful for larger development plans. The aggregated production profile assumes all wells come on production at the same time, which is not the case in real-life development, but it would still be of use to provide an overall picture of the total production expected from the new wells.

Since the DTP was an important input, it is important to understand how the DTP at the perforation intervals affects the cumulative production at the wellhead. The DTP affects the brittleness of a rock, which dictates the way fractures form within a reservoir. A more brittle

reservoir is expected to produce more hydrocarbons than a ductile one since brittle rock forms more complex fracture networks, which increase the contact area between reservoir and wellbore. Brittleness is related to the mechanical properties: rocks with a large Young's modulus and small Poisson's ratio are more brittle than those with a small Young's modulus and large Poisson's ratio (Wang and Gale, 2009). When the DTP is increased and all other variables, including the DTS, are held constant, both the Poisson's ratio and the Young's modulus decrease; however, the Poisson's ratio decreases more drastically. Conversely, when the DTP is lowered, both the Poisson's ratio and the Young's modulus increase, but the Poisson's ratio increases more drastically. Because of this, brittleness increases with increasing DTP, and since brittleness is related to production, an increase in DTP is expected to increase production as well. To test this, a sensitivity was run to see what effect the DTP input has on cumulative production. In this sensitivity, all input variables for all 74 wells were held constant and only the DTP stage inputs were multiplied by a factor ranging between 0.5 and 1.5 to see how this affected the aggregated 74-well 5-year cumulative production. Figure 4.13 shows the results of the sensitivity. The results show a linear correlation between DTP and cumulative production, and the cumulative production increases with increased DTP as expected.
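This behaviour can be checked numerically with the standard dynamic elastic relations ν = (Vp² − 2Vs²) / (2(Vp² − Vs²)) and E = ρVs²(3Vp² − 4Vs²) / (Vp² − Vs²), where Vp = 1/DTP and Vs = 1/DTS; the velocities and density below are illustrative values, not study data:

```python
def dynamic_moduli(vp, vs, rho):
    """Dynamic Poisson's ratio and Young's modulus (Pa) from compressional
    and shear velocities (m/s) and bulk density (kg/m3)."""
    nu = (vp**2 - 2.0 * vs**2) / (2.0 * (vp**2 - vs**2))
    E = rho * vs**2 * (3.0 * vp**2 - 4.0 * vs**2) / (vp**2 - vs**2)
    return nu, E

# Illustrative rock: Vp = 4000 m/s, Vs = 2400 m/s, rho = 2500 kg/m3
nu1, E1 = dynamic_moduli(4000.0, 2400.0, 2500.0)

# Increase the DTP by 10% (Vp = 1/DTP drops to 4000/1.1); DTS, density fixed
nu2, E2 = dynamic_moduli(4000.0 / 1.1, 2400.0, 2500.0)
```

For these illustrative values both moduli decrease, but Poisson's ratio falls by roughly 48% while Young's modulus falls by under 10%, consistent with the argument above.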

Figure 4.13: Results of a sensitivity showing how the 5-year cumulative production from all 74 wells is affected by changes in the DTP.

4.7 Conclusions

In this study we developed a machine learning model that predicts the five-year cumulative production profiles of multistage hydraulically fractured wells. The novelty of this study is that it

uses not just the completion variables of a well but also the rock mechanical properties

surrounding perforation clusters along with well spacing and completion timing of wells. The

model was able to predict individual well production profiles with an average error rate of

14.9%, with the best well having an error of only 1% and the worst well having an error of 83%.

The average error rate decreases exponentially as the production of multiple wells is aggregated.

There are many ways to improve this model's performance; arguably the biggest would be to have a flow rate measuring tool installed at every perforation cluster. Having this tool would

enable linking the completion variables and the rock mechanics of each stage directly to the

amount of production coming from that stage. Having more vertical wells with actual DTS logs

would lessen the need to generate synthetics which would improve the accuracy of the 3D

models of the DTS logs. Also, a 3D seismic survey of the formation would be useful since it can identify faults, which could then serve as additional inputs into the network.

Finally, it is unlikely that the neural network used in this study has the optimal configuration.

Finding this optimal configuration would improve the accuracy but would require many more tests. Installing this many measuring devices and having DTS logs on every well would most likely be uneconomic and would probably never occur in a profit-driven industry.

Unfortunately, even if economics were not a factor, there is no way to build a model capable of predicting the production performance of each well with 100% certainty. This is because we are not able to see with full clarity what goes on under the ground. Geological formations are very heterogeneous, and any computer model will fall short of describing their true structure. In addition, fracture dynamics are extremely complex, and the exact fracture configurations before and after stimulation are not possible to observe.

That being said, this model would be of great use to anyone planning on drilling multiple wells

in fields with existing development. This is because the model accuracy was shown to increase

exponentially if it is used for predicting production of more than one well. The lower error makes it possible to run multiple field development scenarios without having to spend capital, which would allow operators to optimize well placement and completions, lowering overall costs and potentially reducing the environmental impact of hydraulic fracture operations.

Chapter 5: Optimizing Water Usage during Multi-Stage Hydraulic Fracturing with a

Convolutional-Recurrent Neural Network

5.1 Preface

This Chapter has been submitted as a manuscript to the Journal of Petroleum Science &

Engineering in 2020. This article is co-authored by Ian D. Gates.

5.2 Abstract

A machine learning model was developed and trained on 74 existing wells in the Montney

formation in Canada, to run sensitivities on 40 proposed wells that are to be drilled alongside the

existing producers with the objective of minimizing water usage. A total of 1,080 sensitivities were run to

explore how different combinations of stage count, fluid type, water amount, and proppant

amount affect the aggregated 5-year cumulative gas production of the 40 proposed wells. The

results from the machine learning analysis find that the injection efficiency (cumulative gas/total fluid injected) drops exponentially as more water is injected. The results show that it is possible to achieve a cumulative production that is 76% of the maximum by using only 3% of the water required to achieve this maximum. This represents a significant reduction in water usage, which supports cleaner and more efficient drilling operations, lower costs associated with water treatment and disposal, and a reduced potential for induced seismicity.

5.3 Introduction

Hydraulic fracturing is currently the most common method for producing

hydrocarbons from ultra-low permeability reservoirs. Advancements in hydraulic fracturing and

horizontal drilling techniques over the past several decades have made producing hydrocarbons

from unconventional reservoirs economically feasible, which led to a shale boom in North

America and throughout the world (Morton, 2013). In hydraulic fracturing, large amounts of

water and sand are pumped down a wellbore and injected into a target formation at high rates

and pressures (Montgomery and Smith, 2010). The injected fluid creates new fracture networks

starting at the well within the formation that are propped open by the sand, effectively increasing

the surface area of the reservoir that is connected to the wellbore (Yew, 1997). Modern hydraulic

fracturing is typically performed in multiple stages along the entire horizontal wellbore. After

hydraulic fracturing, the well is put on production with relatively short peak production period

with a rapid decline to a plateau value that can be maintained for an extended period of time (Tan

et al., 2018).

Most hydraulic fluids are water-based and contain 90 to 97% water by volume (U.S.

Environmental Protection Agency, 2016). Two of the most common water-based fluid used

today are slickwater and crosslinked gel (Montgomery, 2013). Crosslinked fluids were

introduced in the 1960s after it was found that fractures heal unless a propping agent is co-injected with the water (Palisch et al., 2010). Crosslinked gel is a relatively viscous fluid

(compared to that of water) typically used in ductile formations with higher permeability.

Commonly, guar gum is used as the thickening agent and borate ions as the crosslinking agents.

The advantage of using crosslinked gels is that they exhibit low fluid leak-off, low pump rates, low

water usage and ability to pump high sand concentrations (Fink, 2015). The biggest concern with

this type of fluid is that the gel residue that remains in the formation may block the newly created

fractures (Belyadi et al., 2017). Slickwater fluid is typically used in low-permeability, high net

pay reservoirs. Slickwater is water that has had its viscosity reduced by the addition of polymers

such as polyacrylamides (Montgomery, 2013). The lower viscosity of slickwater leads to a

reduction of viscous effects (friction). From a pumping point of view, this lowers the energy

required to move the fluid but on the other hand, this reduces the ability of the liquid to suspend

and transport the proppant from the surface to the reservoir (Palisch et al., 2010). This also leads

to narrower fracture aperture which in turn results in lower hydraulic conductivity of the fracture

(Barati and Liang 2014). To offset this, slickwater is pumped at relatively higher rates (100+

bbl/min) and relatively low concentrations of proppant resulting in massive water requirements

(Barati and Liang 2014). The primary technical advantage of slickwater is the reduced damage

within fractures and reduced fracture height growth (Fink, 2015). Depending on water

availability and costs, low concentration slickwater treatments may also be lower cost than

crosslinked gel since they require fewer chemicals (Ba et al., 2019). Given the relatively large

volume of water for slickwater stimulation, there is a pressing need to reduce the amount of

water injected into the formation. Furthermore, there might be advantages of low volume

hydraulic fracturing due to the lower chance of induced seismicity (Schultz et al., 2018).

Although there has been a lot of research on which fluid is better for a particular type of tight

formation, for the most part, the choice of fluid is driven by the actual production data and the

operator’s success in a particular formation (Belyadi et al., 2017). Lately, slickwater treatments

have become the more popular choice for hydraulic fracturing stimulation. The three main

reasons for this are as follows (Palisch et al., 2010):

1. industry’s need to cut costs (via use of less chemicals),

2. reservoirs being fractured today have lower permeability and gel clean-up from the

induced fractures have become a stimulation challenge, and

3. fractures created by crosslinked gel were not performing as well as expected due to

formation damage.

Based on a data survey of around 40,000 wells in the United States, the average volume of water

used to create a hydraulic fracture is about 9,500 m3 (Freyman, 2014) with around 30,000 wells

being fractured per year between 2011 and 2014 (U.S. Environmental Protection Agency, 2016).

The fracture intensity, well length and stage count of multi-stage horizontal wells have also

increased over time which has led to an increased water usage. Kondash et al. (2018) found that

between 2011 to 2016, the average water injected along with the average flowback per well has

increased in all six of the major shale producing regions in the Unites States. The large amounts

of water injected during hydraulic fracturing impacts surrounding water resources and requires

more wastewater management (U.S. Environmental Protection Agency, 2016). The greater the

water used, the larger the impacts (Schultz et al., 2018).

The water that is used in hydraulic fracture fluid is typically sourced from groundwater and

surface water resources (U.S. Environmental Protection Agency, 2016). The water may also be

sourced from reused wastewater from previous fracture jobs; however, this is not widely practiced, and only ~5% of fracture jobs use recycled wastewater (U.S. Environmental Protection Agency,

2016). Groundwater and surface water resources are also the main source of water for drinking, household use, irrigation, livestock and industrial processes. Water used for a hydraulic

fracture on a single well does not usually impact local water resources; however, if multiple

treatments for many wells are being performed in a single area, the total volume of fluid needed

to fracture the wells may take up a significant portion of locally available water resources

(Scanlon et al., 2014). Areas that are prone to drought or hot weather are significantly impacted

since high withdrawals of local surface and ground water may reduce drinking water availability

(U.S. EPA, 2015). For example, in 2011, water wells overlying the Haynesville Shale used for

extracting drinking water were excessively drained due to local hydraulic fracture operations and

drought (Louisiana Ground Water Resources Commission, 2012). Hydraulic fracturing may also

impact groundwater levels: a study conducted by Scanlon et al. (2014) in Texas showed that

groundwater levels dropped by 31 meters to 61 meters after hydraulic fracturing activity

increased in 2009. High local water withdrawal may lead to erosion, sedimentation, and habitat

fragmentation (Lin et al., 2018).

Large volumes of injected water result in large volumes of flowback wastewater which needs to

be treated, handled, transported, and disposed of (Veil, 2015). Wastewater is typically reinjected

or disposed of above ground. Reinjecting can either be done via injection wells or in other

hydraulic fracture operations. To handle the wastewater above ground, it can either be processed

in wastewater treatment facilities and released back into rivers or evaporated in evaporation

ponds (U.S. Environmental Protection Agency, 2016).

Wastewater handling carries risks and reinjecting water can potentially contaminate freshwater

aquifers (Faruque and Goldowitz, 2017) or induce seismicity (Eaton et al., 2018). The

choice to reuse wastewater in hydraulic fracturing is rarely practiced since it depends on the

quality, quantity and cost associated with reusing the wastewater. Inadequate wastewater

treatment has been known to impact drinking water resources. For example, in Pennsylvania,

wastewater from Marcellus Shale gas wells was treated and released to surface waters. The

wastewater treatment facilities were unable to properly remove the high levels of total dissolved

solids and the discharged wastewater contributed to elevated levels of total dissolved solids

(particularly bromide) in the Monongahela River Basin (Pennsylvania Department of

Environmental Protection, 2015).

The literature suggests that there is a compelling desire to reduce the amount of water used in

hydraulic fracturing jobs. In this study, we use the model developed and trained in Chapter 4 to

run sensitivities on a proposed future field development scenario. In this scenario, 40 additional

wells are to be drilled adjacent to the existing 74 wells and are scheduled to all come on

production at the same time. 1,080 sensitivities are run to explore how different combinations of

stage count, fluid type, water amount, and proppant amount affect the aggregated 5-year

cumulative gas production of the 40 proposed wells. The ultimate purpose of the study is to see if

injected water can be minimized without having a large negative impact on the 5-year

cumulative production.

5.4 Study Area and Proposed Wells

The study area is the same for all research chapters; it is described in Chapter 1 (Introduction) of this thesis and shown in Figures 1.2 and 1.3. Figure 4.1 shows the areal extent of the 74

producing wells that currently exist in the area. The 40 proposed wells chosen to be drilled are

shown in Figure 5.1.

Figure 5.1: 40 proposed well locations (blue) added to the existing 74 wells (red) in the study

area generated in Petrel.

The positioning of these 40 locations represents a full-field infill drilling plan that would take

place in a field with existing development. The length of the proposed wells was fixed at 2 km as

this was the representative length of existing wells. Variable stage counts of 10, 15 and 30 per

well were used for the sensitivities. This range was chosen as it represents the minimum, average

and maximum number of completion stages that were done in the existing 74 producers.

The model used in this study uses three groups of input parameters at every stage along the

wellbore to make a prediction about the gas production. These three groups are:

• well spacing and completion order,

• rock mechanical properties, and

• completion parameters

The individual parameters of each group are listed in Table 5.1.

Table 5.1: The stage variables that were used as inputs in the sensitivity experiment.

Rock mechanical properties:
• Density (RHOB)
• Compressional sonic (DTP)
• Shear sonic (DTS)

Completion:
• Type of fluid
• Total fluid injected
• Total proppant placed

Well spacing and completion order:
• Length of unbounded time
• Unbounded gas production
• Length of time each offset well produced before well is drilled
• Volume of gas each offset well produced before well is drilled
• Percentage of length that each offset well covers this well
• Average perpendicular distance from each offset well to this well

5.4.1 Well spacing and completion order

Existing producers in this study are drilled close to each other so the production of gas from one

well affects the production of its neighbors. This was shown in Chapter 4 where the individual

well error rate of the model decreased from 16.3 to 14.9% when well spacing and completion

order parameters (time and gas volume that a well produced in an unbounded state, time and gas

volume that offset wells produced before a well was drilled, percentage of length that each offset

well covers a well, average perpendicular distance between offsetting wells) were taken into

account. Because the well spacing and completion order were important, they were also used for the 40 proposed wells in this study. The spacing parameters were based on the distance from a proposed well location to existing producers as well as the distance between the proposed wells

themselves. Well spacing and completion order make more sense on a well level than on a stage level, so the input values representing spacing and completion order for a well would be

constant at every stage in that well. This group of input parameters was also constant for every

sensitivity run as the trajectory of the wells was fixed.

5.4.2 Rock mechanical properties

The rock mechanical properties at each stage of the proposed wellbore trajectory were extracted

using the rock mechanical model built in Chapter 4. Briefly, this model was developed by

upscaling and 3D interpolating the density (RHOB), compressional sonic (DTP) and shear sonic

(DTS) logs of 180 vertical and deviated wells located in the study area. The DTS log was present

in only 14 of these wells and had to be synthetically generated for the other 166 using the neural

network developed in Chapter 3. Because the stage count per well can be 10, 15 or 30 depending on the scenario, the rock mechanical properties differ with the stage count. For each proposed well we extracted three sets of rock inputs, one for each stage count.

5.4.3 Completion Parameters

For this study the fluid type, fluid amount and proppant amount are the completion parameters.

Just like the stage count, the completion parameters had a range of values in the sensitivity

analysis. The range of fluid and proppant amount as well as the proppant density was limited to

within the values of the existing 74 wells. This was done because the model was trained on the

input parameters of the 74 existing wells and extrapolating values outside the ranges that already

exist may lead to large errors. Table 5.2 shows the ranges of all the variable input parameters

used for the sensitivities as well as the increments used. On average, the 26 existing slickwater

wells used 650 m3 of liquid per stage while the 40 existing crosslinked gel wells used 145 m3.

Due to the much higher stage water volumes of the slickwater wells we split the input water

amount into two ranges, one for each fluid type.

Table 5.2: The ranges of all the variable input parameters and the increments used for the

sensitivity experiment.

Parameter                  Range
Stage count                10, 15 or 30
Fluid type                 Slickwater or crosslinked gel
Proppant density           85-1,050 kg/m3
Proppant amount per stage  40-300 tonnes (20 tonne increments)
Fluid amount per stage     Crosslinked: 90-260 m3 (10 m3 increments)
                           Slickwater: 250-1,000 m3 (50 m3 increments)

5.4.4 Input shape and normalization

As depicted in Figure 4.7, the input to the neural network is a series of tables where the columns

represent the stage variables and rows represent the stages. Each value in this table represents a

particular input variable at a particular stage. The maximum number of successful stages per well

in the existing 74 wells was 31. Since the network was trained on existing wells the input shape

cannot change, so wells with fewer than 31 successful stages simply had a “0” value associated with the

stages they did not have. All the input variables were normalized to values between 0 and 1,

where 0 represents the minimal value in that parameter and 1 represents the maximum from the

existing 74 wells.
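The zero-padding and min-max scaling described above can be sketched as follows. This is an illustrative reconstruction rather than the thesis code; the toy array values and helper names are assumptions:

```python
import numpy as np

def pad_stages(stage_table, max_stages=31):
    """Zero-pad a (n_stages, n_vars) stage table to the fixed input
    shape; stages a well does not have simply carry a value of 0."""
    missing = max_stages - stage_table.shape[0]
    return np.pad(stage_table, ((0, missing), (0, 0)))

def minmax_scale(x, lo, hi):
    """Scale each variable to [0, 1] using the minimum and maximum
    observed across the existing (training) wells."""
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (x - lo) / span

# Toy example: a well with only 2 successful stages and 3 stage variables
well = np.array([[650.0, 120.0, 400.0],
                 [145.0,  40.0, 950.0]])
lo, hi = well.min(axis=0), well.max(axis=0)
scaled = minmax_scale(well, lo, hi)   # every column now spans [0, 1]
padded = pad_stages(scaled)           # shape (31, 3); rows 2..30 are zeros
```

In the thesis the minima and maxima come from the 74 existing wells, not from the single well being scaled as in this toy example.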


5.5 Neural Network Algorithms and Experimental Setup

The neural network used in this study was developed in Chapter 4. In this work, the network was

trained using the 74 existing producers in the area with the supervised learning approach. During

supervised learning both the inputs and the outputs are provided to the network. The inputs are

the well spacing and completion order, rock mechanical properties and the completion

parameters. The output was the 5-year cumulative production profile in one-year increments. The

network predicts an output given the inputs; this predicted output is compared to the actual output and

the error is backpropagated through the system, adjusting the weight parameters. The weight

parameters are initialized with random values and are adjusted as more of the well inputs and outputs

are shown to the network. Once the network has seen the entire dataset hundreds of times, it

begins to find general trends and patterns within the study area, and the overall prediction error

rate drops (Reed and Marks, 1998). Once the network is trained to a sufficiently low error rate it

can be used to make predictions.
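The predict-compare-backpropagate cycle described above can be illustrated with a deliberately simplified stand-in. The sketch below fits a toy linear model by gradient descent, not the actual c-RNN; all data, dimensions, and the learning rate are invented purely to show the loop:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((74, 5))                  # 74 wells x 5 normalized inputs (toy)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w                           # toy "cumulative production" target

w = rng.standard_normal(5)               # weights start with random values
lr = 0.5
for epoch in range(500):                 # the data is shown many times over
    pred = X @ w                         # forward pass: predict from inputs
    err = pred - y                       # compare prediction to actual
    w -= lr * X.T @ err / len(y)         # backpropagate error, adjust weights
mse = float(np.mean((X @ w - y) ** 2))   # prediction error drops with training
```

After enough passes over the data the mean squared error becomes negligible, which is the behaviour the thesis relies on before using the trained network for prediction.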

The trained network cannot distinguish between existing or proposed wells, as long as the input

data is in the correct format the network will make a prediction. The trained network can be used

to see how the 5-year cumulative production of a proposed well is affected when the inputs are

changed, i.e. it can be used to run sensitivities. As long as the proposed wells are located within

the study area (the geology is similar to what the network has been trained on), of similar length

and of similar completion designs the aggregate error rate should be similar to that of the existing

producers. Figure 4.13 shows how the error of the model drops as more wells are aggregated; at

40 wells the error rate should be around 2%.


The model used in this study is a multi-headed convolutional-recurrent hybrid network (c-

RNN). The c-RNN hybrid network was chosen because it combines both the speed and ability to

process large amounts of data of a convolutional network with the sequence processing ability of

a recurrent network. The c-RNN hybrid has been shown to outperform traditional neural

networks in Chapter 3. In multi-headed architectures each input variable is handled by a separate

convolutional network (head), and the outputs of these networks (heads) are merged and

inputted into a recurrent network before a prediction is made; these types of models offer better

performance (Bagnall, 2015). The network was programmed in Python using the keras library

(Chollet, 2015). The architecture of the network is shown in Figure 4.6 and the network

hyperparameters are shown in Table 4.3.
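The multi-headed idea can be sketched schematically in NumPy. The real model is the Keras network of Figure 4.6; the layer sizes, random weights, and helper names below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N_STAGES, N_VARS, K, F, H = 31, 3, 3, 4, 8

def conv1d_head(x, kernels):
    """One 'head': valid 1-D convolution of a single stage variable
    (length N_STAGES) with F filters of width K -> (N_STAGES-K+1, F)."""
    length = len(x) - K + 1
    out = np.empty((length, F))
    for f in range(F):
        for i in range(length):
            out[i, f] = x[i:i + K] @ kernels[f]
    return np.maximum(out, 0.0)               # ReLU activation

def rnn_forward(seq, Wx, Wh):
    """Simple recurrent pass over the merged feature sequence."""
    h = np.zeros(H)
    for t in range(seq.shape[0]):
        h = np.tanh(seq[t] @ Wx + h @ Wh)
    return h

well = rng.random((N_STAGES, N_VARS))          # one well: 31 stages x 3 vars
heads = [conv1d_head(well[:, v], rng.standard_normal((F, K)))
         for v in range(N_VARS)]               # one conv head per input variable
merged = np.concatenate(heads, axis=1)         # merge head outputs: (29, 12)
state = rnn_forward(merged,
                    rng.standard_normal((N_VARS * F, H)) * 0.1,
                    rng.standard_normal((H, H)) * 0.1)
# `state` would feed a dense layer producing the cumulative-production outputs
```

The point of the sketch is the data flow: each variable gets its own convolutional head, the heads are merged, and the merged sequence is processed recurrently before the prediction layer.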


The network was used to run 1,080 sensitivities to explore how different combinations of stage

count, fluid type, water amount, and proppant amount affect the aggregated 5-year cumulative

gas production of the 40 proposed wells.
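The sensitivity grid can be reconstructed from Table 5.2 and the layouts of Tables 5.3-5.4. The sketch below makes two assumptions read from the tables rather than stated outright in the text: a combination is excluded when its sand-to-fluid ratio exceeds the 1,050 kg/m3 proppant density ceiling, and crosslinked gel proppant tops out at 260 tonnes per stage. Under this reading the grid reproduces the 1,080 runs:

```python
from itertools import product

stage_counts = (10, 15, 30)
# Per fluid type: (fluid per stage in m3, proppant per stage in tonnes)
fluids = {
    "crosslinked": (range(90, 261, 10), range(40, 261, 20)),
    "slickwater":  (range(250, 1001, 50), range(40, 301, 20)),
}

runs = [
    (stages, fluid, f, p)
    for stages in stage_counts
    for fluid, (fluid_range, prop_range) in fluids.items()
    for f, p in product(fluid_range, prop_range)
    # Drop combinations whose tonnes-per-m3 ratio exceeds the existing
    # 1,050 kg/m3 proppant density limit (the blank cells in Tables 5.3-5.4)
    if p * 1000 / f <= 1050
]
```

Counting the populated cells in Tables 5.3a-c and 5.4a-c gives the same total, which is a useful consistency check on the reconstruction.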


5.6 Results and Discussion

Tables 5.3 and 5.4 show the results from the 1,080 sensitivity runs; the tables depict how the

aggregated 5-year cumulative production (referred to hereafter as “the output”) from the proposed

40 wells is affected by changes in the input. Tables 5.3a, 5.3b and 5.3c depict the crosslinked gel

results using 10, 15 and 30 stages per well respectively. Table 5.4a, 5.4b and 5.4c depict the

slickwater results using 10, 15 and 30 stages per well respectively.

Of the 74 existing wells, 48 were completed using crosslinked gel and 26 were

completed using slickwater. Crosslinked gel was the original fluid choice and was used from

2006 to 2014 after which point all wells were fractured using slickwater. Since the slickwater

stimulations used newer technology, they tended to have more stages. The stage count for the

crosslinked gel wells had a range of 5 to 17 with an average value of 10. The stage count for the

slickwater wells had a range of 9 to 31 with an average value of 18. For this study, the

cumulative production is forecasted using stage counts of 10, 15 and 30 for both fluids. Since the

largest crosslinked gel stage count was 17, forecasting using 30 stages per crosslinked gel well

would be an extrapolation. Only two of the existing slickwater wells have stage counts lower than

14, so forecasting using 10 stages per slickwater well is also an extrapolation. Fifteen

stages per well should be used when comparing the results of the two fluid types, as these cases are

not extrapolations and have the best basis for comparison.


Table 5.3: Results from the sensitivity analysis using the crosslinked gel as the fracture fluid

Table 5.3a – 10 Stages Crosslinked Gel (Results in Bcf)

Rows: proppant amount per stage (tonnes) | well proppant amount (tonnes)

Columns, row 1 - Fluid injected per stage (m3): 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260

Columns, row 2 - Fluid injected per well (m3): 900 1,000 1,100 1,200 1,300 1,400 1,500 1,600 1,700 1,800 1,900 2,000 2,100 2,200 2,300 2,400 2,500 2,600

40 400 193.0 193.1 193.2 193.3 193.5 193.6 193.8 194.1 194.4 194.7 195.0 195.3 195.5 195.7 195.9 196.1 196.3 196.5

60 600 194.0 194.1 194.2 194.3 194.5 194.6 194.8 195.1 195.4 195.7 196.0 196.3 196.5 196.8 197.0 197.2 197.4 197.6

80 800 197.0 197.1 197.2 197.3 197.4 197.6 197.8 198.1 198.4 198.7 199.0 199.3 199.5 199.7 200.0 200.2 200.4 200.6

100 1,000 203.8 203.9 204.0 204.1 204.2 204.4 204.8 205.1 205.4 205.7 205.9 206.2 206.4 206.6 206.8 207.1 207.3

120 1,200 204.6 204.7 204.8 205.0 205.3 205.7 206.0 206.3 206.5 206.8 207.0 207.2 207.4 207.6 207.8

140 1,400 202.4 202.6 202.9 203.2 203.5 203.8 204.1 204.3 204.5 204.8 205.0 205.2 205.4

160 1,600 200.4 200.7 201.0 201.3 201.5 201.7 202.0 202.2 202.4 202.6 202.8

180 1,800 198.6 198.9 199.1 199.4 199.6 199.8 200.0 200.2 200.4

200 2,000 197.0 197.3 197.5 197.7 197.9 198.1 198.3

220 2,200 195.4 195.6 195.8 196.0 196.2 196.4

240 2,400 194.0 194.2 194.4 194.6

260 2,600 192.8 193.0

Table 5.3b – 15 Stages Crosslinked Gel (Results in Bcf)

Rows: proppant amount per stage (tonnes) | well proppant amount (tonnes)

Columns, row 1 - Fluid injected per stage (m3): 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260

Columns, row 2 - Fluid injected per well (m3): 1,350 1,500 1,650 1,800 1,950 2,100 2,250 2,400 2,550 2,700 2,850 3,000 3,150 3,300 3,450 3,600 3,750 3,900

40 600 195.2 195.4 195.5 195.6 195.8 195.9 196.2 196.5 196.9 197.3 197.6 197.9 198.2 198.4 198.7 198.9 199.1 199.4

60 900 198.9 199.1 199.2 199.3 199.5 199.6 199.8 200.2 200.6 201.0 201.3 201.6 201.8 202.1 202.4 202.6 202.8 203.1

80 1,200 204.8 204.9 205.1 205.2 205.3 205.5 205.7 206.1 206.5 206.8 207.1 207.4 207.7 208.0 208.2 208.4 208.7 208.9

100 1,500 214.8 214.9 215.0 215.2 215.3 215.5 215.9 216.3 216.6 216.9 217.2 217.4 217.7 217.9 218.1 218.4 218.6

120 1,800 217.5 217.6 217.8 218.0 218.3 218.7 219.0 219.3 219.6 219.8 220.1 220.3 220.5 220.7 221.0

140 2,100 216.8 217.1 217.4 217.7 218.1 218.4 218.6 218.9 219.1 219.3 219.6 219.8 220.0

160 2,400 216.2 216.6 216.9 217.2 217.5 217.7 218.0 218.2 218.4 218.6 218.8

180 2,700 215.8 216.0 216.3 216.5 216.8 217.0 217.2 217.4 217.6

200 3,000 215.2 215.5 215.7 215.9 216.1 216.3 216.5

220 3,300 214.5 214.7 214.9 215.1 215.3 215.5

240 3,600 213.9 214.1 214.3 214.5

260 3,900 213.4 213.6

Blank cells: out of existing proppant density range.


Table 5.3c – 30 Stages Crosslinked Gel (Results in Bcf)

Rows: proppant amount per stage (tonnes) | well proppant amount (tonnes)

Columns, row 1 - Fluid injected per stage (m3): 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260

Columns, row 2 - Fluid injected per well (m3): 2,700 3,000 3,300 3,600 3,900 4,200 4,500 4,800 5,100 5,400 5,700 6,000 6,300 6,600 6,900 7,200 7,500 7,800

40 1,200 222.7 222.9 223.0 223.2 223.3 223.5 223.8 224.2 224.7 225.1 225.5 225.9 226.3 226.7 227.0 227.4 227.7 228.1

60 1,800 226.8 226.9 227.0 227.2 227.4 227.5 227.8 228.2 228.6 229.0 229.4 229.8 230.1 230.5 230.8 231.2 231.5 231.8

80 2,400 232.5 232.6 232.7 232.9 233.0 233.1 233.4 233.7 234.1 234.5 234.9 235.2 235.5 235.8 236.1 236.4 236.7 237.0

100 3,000 241.3 241.4 241.5 241.6 241.7 241.9 242.1 242.5 242.7 243.0 243.3 243.5 243.8 244.0 244.2 244.5 244.7

120 3,600 243.9 244.0 244.1 244.3 244.5 244.8 245.1 245.3 245.5 245.8 246.0 246.2 246.4 246.6 246.8

140 4,200 245.1 245.3 245.5 245.8 246.1 246.3 246.6 246.8 247.0 247.2 247.5 247.7 247.9

160 4,800 246.5 246.8 247.0 247.3 247.5 247.7 248.0 248.2 248.4 248.6 248.9

180 5,400 247.7 248.0 248.2 248.4 248.7 248.9 249.1 249.3 249.5

200 6,000 248.9 249.2 249.4 249.6 249.8 250.0 250.2

220 6,600 249.8 250.1 250.3 250.5 250.7 250.9

240 7,200 250.9 251.1 251.3 251.5

260 7,800 251.8 252.0

Blank cells: out of existing proppant density range.


Table 5.4: Results from the sensitivity analysis using the slickwater as the fracture fluid.

Table 5.4a – 10 Stages Slickwater (Results in Bcf)

Rows: proppant amount per stage (tonnes) | well proppant amount (tonnes)

Columns, row 1 - Fluid injected per stage (m3): 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1,000

Columns, row 2 - Fluid injected per well (m3): 2,500 3,000 3,500 4,000 4,500 5,000 5,500 6,000 6,500 7,000 7,500 8,000 8,500 9,000 9,500 10,000

40 400 225.2 226.1 226.7 226.7 226.5 226.1 225.6 225.0 224.3 223.7 223.0 222.3 221.6 221.0 220.3 219.7

60 600 225.5 226.4 227.0 227.1 226.9 226.5 226.1 225.5 224.8 224.2 223.5 222.9 222.2 221.6 220.9 220.3

80 800 227.5 228.4 229.0 229.1 228.9 228.6 228.2 227.6 227.0 226.4 225.7 225.1 224.5 223.9 223.3 222.7

100 1,000 232.9 233.7 234.3 234.5 234.3 234.0 233.6 233.1 232.5 231.9 231.4 230.8 230.2 229.6 229.1 228.5

120 1,200 232.7 233.5 234.1 234.2 234.1 233.8 233.4 232.9 232.4 231.8 231.2 230.7 230.1 229.6 229.1 228.5

140 1,400 229.6 230.4 231.0 231.2 231.0 230.7 230.4 229.9 229.3 228.8 228.2 227.6 227.1 226.6 226.0 225.5

160 1,600 226.3 227.1 227.7 227.8 227.7 227.4 227.1 226.5 226.0 225.5 224.9 224.4 223.8 223.3 222.8 222.3

180 1,800 223.0 223.8 224.4 224.5 224.4 224.1 223.8 223.3 222.8 222.2 221.7 221.2 220.7 220.2 219.7 219.2

200 2,000 219.8 220.6 221.2 221.4 221.2 221.0 220.7 220.2 219.7 219.2 218.7 218.2 217.7 217.3 216.8 216.4

220 2,200 216.9 217.7 218.3 218.4 218.3 218.1 217.8 217.4 216.9 216.4 216.0 215.5 215.1 214.7 214.2 213.8

240 2,400 214.1 214.9 215.5 215.7 215.6 215.4 215.1 214.7 214.3 213.9 213.4 213.0 212.6 212.2 211.8 211.5

260 2,600 211.5 212.3 213.0 213.1 213.1 212.9 212.6 212.3 211.9 211.5 211.1 210.7 210.3 210.0 209.6 209.3

280 2,800 209.9 210.6 210.8 210.7 210.6 210.3 210.0 209.6 209.2 208.9 208.5 208.2 207.8 207.5 207.2

300 3,000 207.7 208.3 208.5 208.5 208.4 208.1 207.8 207.5 207.1 206.8 206.5 206.2 205.9 205.6 205.4


Table 5.4b – 15 Stages Slickwater (Results in Bcf)

Rows: proppant amount per stage (tonnes) | well proppant amount (tonnes)

Columns, row 1 - Fluid injected per stage (m3): 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1,000

Columns, row 2 - Fluid injected per well (m3): 3,750 4,500 5,250 6,000 6,750 7,500 8,250 9,000 9,750 10,500 11,250 12,000 12,750 13,500 14,250 15,000

40 600 199.6 200.7 201.6 202.0 202.1 202.2 202.2 202.1 202.0 201.9 201.8 201.7 201.6 201.5 201.4 201.3

60 900 203.2 204.3 205.2 205.5 205.7 205.8 205.8 205.7 205.6 205.5 205.4 205.3 205.3 205.2 205.1 205.0

80 1,200 208.8 209.9 210.7 211.1 211.3 211.4 211.4 211.3 211.3 211.2 211.1 211.0 210.9 210.8 210.8 210.7

100 1,500 218.1 219.1 219.9 220.3 220.4 220.5 220.6 220.5 220.4 220.3 220.3 220.2 220.1 220.0 220.0 219.9

120 1,800 220.4 221.4 222.2 222.6 222.7 222.8 222.9 222.8 222.7 222.7 222.6 222.5 222.4 222.4 222.3 222.2

140 2,100 219.6 220.6 221.4 221.7 221.9 222.0 222.0 222.0 221.9 221.8 221.7 221.7 221.6 221.5 221.4 221.4

160 2,400 218.5 219.5 220.3 220.7 220.9 220.9 221.0 220.9 220.9 220.8 220.7 220.6 220.5 220.5 220.4 220.3

180 2,700 217.5 218.5 219.3 219.6 219.8 219.9 219.9 219.9 219.8 219.7 219.6 219.6 219.5 219.4 219.4 219.3

200 3,000 216.5 217.5 218.3 218.6 218.8 218.9 218.9 218.9 218.8 218.7 218.7 218.6 218.5 218.5 218.4 218.4

220 3,300 215.6 216.6 217.4 217.7 217.9 218.0 218.0 218.0 217.9 217.9 217.8 217.7 217.7 217.7 217.6 217.6

240 3,600 214.7 215.7 216.5 216.8 217.0 217.1 217.2 217.1 217.1 217.0 217.0 216.9 216.9 216.8 216.8 216.8

260 3,900 213.9 214.8 215.6 216.0 216.2 216.3 216.3 216.3 216.3 216.2 216.2 216.1 216.1 216.1 216.0 216.0

280 4,200 214.0 214.8 215.2 215.3 215.4 215.5 215.5 215.4 215.4 215.4 215.3 215.3 215.3 215.3 215.3

300 4,500 213.2 214.0 214.4 214.5 214.6 214.7 214.7 214.7 214.7 214.6 214.6 214.6 214.6 214.6 214.6


Table 5.4c – 30 Stages Slickwater (Results in Bcf)

Rows: proppant amount per stage (tonnes) | well proppant amount (tonnes)

Columns, row 1 - Fluid injected per stage (m3): 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1,000

Columns, row 2 - Fluid injected per well (m3): 7,500 9,000 10,500 12,000 13,500 15,000 16,500 18,000 19,500 21,000 22,500 24,000 25,500 27,000 28,500 30,000

40 1,200 233.2 234.7 236.0 236.8 237.4 237.9 238.3 238.7 239.0 239.3 239.6 239.9 240.2 240.4 240.7 240.9

60 1,800 236.5 237.9 239.1 239.8 240.4 240.9 241.3 241.6 241.9 242.2 242.5 242.8 243.1 243.3 243.6 243.8

80 2,400 241.0 242.2 243.2 243.9 244.4 244.9 245.2 245.5 245.8 246.1 246.4 246.6 246.9 247.1 247.4 247.6

100 3,000 247.6 248.4 249.1 249.6 250.0 250.3 250.6 250.9 251.1 251.3 251.6 251.8 252.0 252.2 252.4 252.6

120 3,600 249.4 250.1 250.7 251.2 251.5 251.8 252.1 252.3 252.5 252.7 252.9 253.1 253.3 253.5 253.7 253.8

140 4,200 250.5 251.3 251.9 252.3 252.7 253.0 253.3 253.5 253.7 253.9 254.0 254.2 254.4 254.5 254.7 254.8

160 4,800 251.5 252.3 253.0 253.4 253.8 254.1 254.3 254.5 254.7 254.9 255.0 255.2 255.3 255.5 255.6 255.7

180 5,400 252.4 253.2 253.8 254.3 254.6 254.9 255.1 255.3 255.5 255.6 255.8 255.9 256.1 256.2 256.3 256.4

200 6,000 253.0 253.8 254.5 255.0 255.3 255.6 255.8 255.9 256.1 256.3 256.4 256.6 256.7 256.9 257.0 257.1

220 6,600 253.7 254.5 255.2 255.6 256.0 256.2 256.4 256.6 256.8 257.0 257.1 257.3 257.4 257.5 257.6 257.8

240 7,200 254.4 255.2 255.8 256.3 256.6 256.9 257.1 257.3 257.4 257.6 257.7 257.9 258.0 258.1 258.2 258.3

260 7,800 254.9 255.8 256.4 256.9 257.2 257.4 257.6 257.8 257.9 258.1 258.2 258.4 258.5 258.6 258.7 258.8

280 8,400 256.2 256.9 257.3 257.6 257.9 258.1 258.2 258.3 258.5 258.6 258.7 258.8 259.0 259.1 259.2

300 9,000 256.6 257.2 257.7 257.9 258.2 258.4 258.5 258.6 258.8 258.9 259.0 259.1 259.2 259.3 259.4


Figure 5.2 compares how the 15-stage output for both the crosslinked gel and slickwater is

affected by changes in proppant and fluid amounts. Crosslinked gel does see a small

improvement in the output when the fluid per stage is increased. The slickwater case peaks at

400 m3/stage with no further improvements to the output. The output for both the slickwater and

crosslinked gel peaks at 120 tonnes per stage; when more proppant is added the output begins to

drop.

Figure 5.3 shows how the stage count affects the output. Here each stage count has several lines that

represent varying fluid amounts. The output using crosslinked gel increases with stage count.

The 10-stage slickwater output outperformed the 15-stage slickwater output. As mentioned

previously, the 10-stage slickwater results are extrapolated and may be unreliable.

The 30-stage output for the crosslinked gel is also extrapolated. However, it does follow a similar

trend to that of the 30-stage slickwater results.

Figure 5.4 compares the 15-stage output of crosslinked and slickwater wells. The results are

close between the two types of fluids, especially when the fluid amount per stage is similar

(for example, 260 m3/stage for crosslinked gel and 250 m3/stage for slickwater).


Figure 5.2: Aggregated cumulative 5-year production vs proppant amount and fluid amount per

stage using 15 stages per well and the crosslinked gel (top) and the slickwater (bottom) as the

fracture fluid.

Slickwater - 15 stages

Crosslinked gel - 15 stages


Figure 5.3: Effect of stage count per well on the aggregated cumulative 5-year production using

crosslinked gel (top) and slickwater (bottom) as the fracture fluid.

Crosslinked gel

Slickwater


Figure 5.4: Crosslinked gel vs slickwater results for 15 stage count.

To effectively observe how the injected water amounts affect the output, the cases with 120

tonnes of proppant per stage were selected, since this is the amount at which most of the cases hit

either the maximum or an inflection point. Figure 5.5 shows the output versus the total water

injected for various stage counts for both crosslinked gel and slickwater wells using 120 tonnes

of proppant per stage. From this result, it can be seen that output increases with stage count (with

the exception of the 10-stage slickwater treatment). The output is only minimally improved when

more fluid is injected per stage, and beyond a certain amount additional fluid no longer improves output.


Figure 5.5: Aggregated cumulative production versus the total water injected for various stage

counts of crosslinked gel and slickwater wells using 120 tonnes of proppant per stage.

The maximum output from the sensitivities was 259 Bcf. If maximum output was the ultimate

goal, then the best way to complete the 40 wells would be with slickwater, 30 stages, 1,000

m3 of fluid per stage and 300 tonnes of proppant per stage. This would result in a total injected

fluid of 1,200,000 m3 and 360,000 tonnes of proppant. If, however, the goal is to minimize the

fluid input, the best strategy would be to complete with crosslinked gel, 10 stages, 120 m3 of

fluid per stage and 120 tonnes of proppant per stage. This would result in a total injected fluid of

48,000 m3 and 48,000 tonnes of proppant. This type of completion would result in an output of

205 Bcf, which is 79% of the maximum possible output, while the water usage is only 4% and the

proppant usage is only 13% of the maximum case.
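These percentages can be cross-checked directly from the per-stage figures; all numbers below are taken from the text:

```python
wells = 40

# Maximum-output case: slickwater, 30 stages, 1,000 m3 and 300 t per stage
max_fluid = 30 * 1_000 * wells       # 1,200,000 m3 of total injected fluid
max_prop = 30 * 300 * wells          # 360,000 tonnes of proppant

# Water-minimizing case: crosslinked gel, 10 stages, 120 m3 and 120 t per stage
min_fluid = 10 * 120 * wells         # 48,000 m3
min_prop = 10 * 120 * wells          # 48,000 tonnes

output_share = 205 / 259             # ~0.79 of the maximum output (Bcf ratio)
fluid_share = min_fluid / max_fluid  # 0.04 of the water
prop_share = min_prop / max_prop     # ~0.13 of the proppant
```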


Since the goal of this study is to minimize water injection, a new parameter can be defined,

known as the injection efficiency:

Injection Efficiency (Mcf/m3) = Aggregated 5-year cumulative production (Mcf) / Total fluid injected in all wells (m3)
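As a quick numeric check of this definition, a hypothetical helper applied to the output and fluid totals quoted in this section:

```python
def injection_efficiency(cum_production_mcf, total_fluid_m3):
    """Injection efficiency (Mcf/m3): aggregated 5-year cumulative
    production divided by the total fluid injected in all wells."""
    return cum_production_mcf / total_fluid_m3

# Water-minimizing crosslinked case: ~205 Bcf (= 205e6 Mcf) for 48,000 m3
eff_gel = injection_efficiency(205e6, 48_000)       # ~4,271 Mcf/m3
# Maximum-output slickwater case: 259 Bcf for 1,200,000 m3
eff_slick = injection_efficiency(259e6, 1_200_000)  # ~216 Mcf/m3
```

The two-order-of-magnitude gap between these values is the quantitative basis for the claim that crosslinked gel has a much better injection efficiency than slickwater.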

Figure 5.6 shows how the injection efficiency for different stage counts and fluid types changes as

more fluid is injected. The results show that the crosslinked gel has a better injection efficiency

than slickwater and that the injection efficiency decreases as more stages are added to wells. This

makes sense since the crosslinked gel uses far less water than slickwater, and the lower

the stage count, the less water is used.

Figure 5.6: Injection efficiency for different stage counts and fluid types versus total fluid injected.


If the goal was to maximize injection efficiency, the best strategy would be to complete with

crosslinked gel, 10 stages, 90 m3 of fluid per stage and 80 tonnes of proppant per stage. This

would result in total injected fluid of only 36,000 m3 and 32,000 tonnes of proppant. This type of

completion would result in an output of 197 Bcf, which is 76% of the maximum possible output,

while the water usage is only 3% and the proppant usage is only 9% of the maximum case.

With all proposed well forecasts, it is worth crosschecking the outputs with the existing wells.

Figure 5.7 shows the individual well 5-year cumulative production and injection efficiency for

the 74 existing crosslinked gel and slickwater wells versus total fluid injected. The injection

efficiency of the crosslinked gel is much higher than that of the slickwater, and the results are similar to

the injection efficiency curves generated from the sensitivity runs in Figures 5.5 and 5.6.


Figure 5.7: 5-year cumulative production (top) and injection efficiency (bottom) for the 74

existing crosslinked gel and slickwater wells versus total fluid injected.


5.7 Conclusions

In this study we used a machine learning model developed and trained on 74 existing wells in the

Montney Formation in Canada to run sensitivities on 40 proposed wells to be drilled adjacent to

the existing producers. 1,080 sensitivities were run to explore how different combinations of

stage count, fluid type, water amount, and proppant amount affect the aggregated 5-year

cumulative gas production of the 40 proposed wells.

The results of the sensitivities show that injecting greater amounts of water during a hydraulic

fracture does result in slightly more cumulative production in both slickwater and crosslinked gel

completions. The injection efficiency, on the other hand, drops exponentially with greater water

injection. The results show that it is possible to achieve a cumulative production that is 76% of

the maximum by using only 3% of the water required to achieve this maximum. The results also

show that cumulative production peaks at about 120 tonnes of proppant per stage for wells

with 10 or 15 stages, and only marginally higher for wells with 30 stages.

The model in this study can be used to find the optimal stage count, fluid type, water and

proppant usage when adding new wells to a field with existing development. The model can be

used for any formation as long as there are sufficient vertical wells with rock mechanical logs in

the area. The accuracy of the model increases when the field has more existing development.

Over the last decade the industry has shifted away from crosslinked gel in favor of slickwater; at

the same time, the stage count per well along with the water injected per stage has also increased. This

has led to a dramatic increase in water usage which has impacted local water resources and

increased the need for wastewater management. This study has shown that bigger is not


necessarily better and that using slickwater and many stages should not be the automatic choice

for completion design even though it is the most popular. Conservation of water should be a top

consideration together with cost and profit; every aspect of completion design should be

carefully considered for every type of play. The choice should be driven by resulting production

history from the field and should evolve as more data is generated. Several types of completion

design should be experimented with to see which have the highest injection efficiency at the

lowest cost.


Chapter 6: Using a Convolutional-Recurrent Neural Network Forecasting Model to Optimize the

Positioning of New Wells in a Partially Developed Field

6.1 Preface

This Chapter has been submitted as a manuscript to the Journal of Petroleum Science &

Engineering in 2020. This article is co-authored by Ian D. Gates.


6.2 Abstract

The ultimate recoverable hydrocarbon volumes and economic value of a partially developed field

are controlled by the positioning of future wells. The placement of wells involves aspects of the

understanding of the geology and heterogeneity of the resource as well as the costs of surface

pads and facilities. In this study, a convolutional-recurrent neural network (c-RNN) is developed

and trained to forecast shale gas production given well positions, spacing parameters, and

geomechanical properties within the reservoir, as well as the completion strategy: either a

water-intense 30-stage slickwater treatment or a water-conservative 15-stage crosslinked gel treatment.

The forecasting model is used to guide the optimal positioning of 20 new wells selected from 40

possible well positions; there are exactly 137,846,528,820 ways to position 20 wells in 40

predetermined positions; this defines the well placement space. To run this many positioning

combinations is computationally infeasible. Here, an approach is described to optimize well

placement using a subspace of the well placement space. The results show that with only 100

random combinations a normal distribution describing the entire well placement space begins to

form. The approach described permits a simple and intuitive method that can be used to find a

well placement combination with high aggregated 5-year cumulative gas production volume and

demonstrates a practical application of the c-RNN forecasting model.


6.3 Introduction

Maximizing hydrocarbon recovery while minimizing costs has always been a priority of the oil

and gas industry, and with greenfields becoming increasingly rare, optimization of developed

fields has become of great interest to organizations. If completion design is held constant in a

particular area, two factors drive the production performance of a future well: 1. geology and 2.

well placement (location, orientation, and depth or trajectory). The geological properties of a

reservoir, such as permeability, porosity, fluid saturation, rock mechanical properties, in-situ

stresses, etc., have a large effect on how fracture networks form and interact during stimulation

and how fluids flow through the formation during production. Geological properties are also

heterogeneous and can vary dramatically even over distances on the order of one hundred meters.

Thus, no two wells in a formation will have the exact same geological profile along the wellbore.

Well placement – which gives rise to well spacing – is another important factor because wells

drilled too near each other can share drainage areas, which can negatively impact the production of

both wells. This is especially true in multi-stage hydraulically fractured horizontal wells since the

fracture networks from neighboring wells can easily link up. Well spacing depends ultimately on

geology: in less permeable rock, wells can be drilled closer together than in reservoirs with high

permeability. Because of geological heterogeneity, the position of a wellbore path within a

reservoir plays an important role in the future production performance of that well. Most fields

are only partially developed which implies that the amount of hydrocarbon volume left in the

reservoir to produce is driven by the position of future wells. Therefore, a method that can

optimize future well placement would be of great use in field development.


Historically, the most common way for industry to find the optimal well placements has been to

use exhaustive physics-based reservoir simulation methods together with a set of candidate well

locations which are typically defined by the user. In that approach, a geological model and a

numerical reservoir simulator are used to forecast production from all possible well placements

in the set of candidate well locations to see which combination of well placements result in the

highest cumulative field production or economic value or both (Jang et al., 2018). This approach,

however, is time consuming as full field simulation models typically contain tens of millions of

grid blocks requiring large compute time for evaluating multiple development options. To

overcome this limitation, various optimisation techniques have been developed including genetic

algorithms (GA), simulated annealing (SA), neural networks (ANN) and particle swarm

optimisation (PSO) (Bittencourt and Horne, 1997; Centilmen et al., 1999; Yeten et al., 2002;

Emerick et al., 2009; Salmachi et al., 2013; Onwunalu and Durlofsky, 2010). Despite these

optimization methods, many of which are automated, the required computation time is still significant.

An optimized well placement strategy is only as good as the geological, reservoir, and

forecasting model used to produce it. If the geological or reservoir model is poor or the

forecasting model is inaccurate or uncertain, then it is likely that any well placements that are

chosen by the optimization algorithm will not produce optimal volumes. Numerical reservoir

simulators are a bottom up approach (Mohaghegh, 2017) where the geological model or an

ensemble of equiprobable models is defined and the system is optimized against an objective

function taking into account the set of candidate well locations. In most cases, a single geological

model is defined and the model is constructed from upscaled data from wells that are kilometers

away from each other yielding an oversimplification of a reservoir, its geological properties and


the complexities of its fracture networks. Thus, many operators have little confidence in the

ability to use numerical reservoir simulation optimization for forecasting and the placement of

wells to maximize future production.

An alternative approach that has recently been getting attention is the use of machine learning

algorithms. These algorithms make minimal assumptions about the structure or geology of the

reservoir or how fluid flows through it. Rather, they are trained using existing data to find links

between the inputs (such as geology, completion type and spacing between wells) and the output

(such as cumulative production). The greater the richness of the input data, the better the

accuracy of the forecasting model, and the more capable the machine learning methods are

of optimizing well placements.

In this study, we use a machine learning algorithm to determine where best to locate 20 wells

amongst 40 potential locations. In the context of a company, this could be interpreted as having

40 well location options for a full field development but having a budget for a partial field

development that requires selecting a subset of only 20 of them. The objective is to maximize the aggregated 5-year cumulative gas production using only 20 of the well locations, taking into account the factors described above: geology, secondary wells drilled after a first well, spacing from adjacent older wells, and how these impact cumulative production.

If done manually, there are 137,846,528,820 different combinations (the well placement space)

to position 20 wells in 40 predetermined positions. If done with a physics-based reservoir

simulator and if each simulation took 10 minutes to run, it would take over 2,000,000 years to


run all the cases. Even if only 1% of the well placement space was evaluated, this would take

20,000 years of simulations. Even with parallel processing capabilities, for example, with 1,000

cores, a 1% subset evaluation would take 20 years. Thus, an approach that works in a meaningful time, say on the order of days to weeks, is desired; machine learning offers such an option.
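The size of the well placement space and the simulation-time estimates above can be checked directly. A quick sketch in Python (the implementation language used in this thesis); the 10-minute run time and 1,000-core figures are the assumptions stated in the text:

```python
from math import comb

# Number of ways to choose 20 well locations from 40 candidates
n_combinations = comb(40, 20)
print(n_combinations)  # 137846528820

# Time to simulate every combination at an assumed 10 minutes per run
years_all = n_combinations * 10 / (60 * 24 * 365)
years_1pct = 0.01 * years_all            # evaluating only 1% of the space
years_1pct_parallel = years_1pct / 1000  # with 1,000 cores in parallel

assert years_all > 2_000_000             # "over 2,000,000 years"
```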

6.4 Study Area and Completion Scenarios

The study area is the same for all research chapters; it is described in the introduction (Chapter 1) of this thesis and shown in Figures 1.2 and 1.3. The 40 possible future well positions are shown in Figure 5.1 and are the same 40 positions that were used to run sensitivities on completion designs in Chapter 5.

The forecasting model developed in this study uses three groups of input parameters at every stage along the wellbore to make a prediction about the gas production. These three groups are:

• well spacing and completion order,

• rock mechanical properties, and

• completion parameters.

The individual parameters of each group are listed in Table 5.1.

6.4.1 Well spacing and completion order

The existing producers in this study are drilled close to each other, so gas production from one well affects the production of its neighbors. This was shown in Chapter 4, where the


individual well error rate of the forecasting model decreased from 16.3% to 14.9% when well spacing and completion order parameters (the time and gas volume that a well produced in an unbounded state, the time and gas volume that offset wells produced before a well was drilled, the percentage of a well's length covered by each offset well, and the average perpendicular distance between offsetting wells) were taken into account. Because well spacing and completion order proved important, these parameters were also used for the proposed wells in this study. The spacing parameters were based on the distance from a proposed well location to existing producers as well as the distances between the proposed wells themselves. Well spacing and completion order are defined at the well level rather than the stage level, so the input values representing them are constant across every stage in a given well.

6.4.2 Rock mechanical properties

The rock mechanical properties at each stage of the proposed wellbore trajectory were extracted

using the rock mechanical model built in Chapter 4. Briefly, this model was developed by

upscaling and 3D interpolating the density (RHOB), compressional sonic (DTP) and shear sonic

(DTS) logs of 180 vertical and deviated wells located in the study area. The DTS log was present

in only 14 of these wells and had to be synthetically generated for the other 166 using the neural

network developed in Chapter 3. Because the stage count per proposed well can be either 15 or 30 depending on the scenario, the rock mechanical properties extracted along the wellbore differ with stage count; for each proposed well, two sets of rock inputs were therefore extracted, one for each stage count.


6.4.3 Completion Parameters

For this study, the fluid type, fluid amount, and proppant amount are used as the completion parameters. Two completion cases are considered for each well: (1) water intense and (2) water conservative. This makes the optimization problem more challenging. The completion parameters for the two cases are as follows:

• Water intense:

o Fracture fluid - slickwater

o 30 stages per well

o 150 tonnes of proppant per stage

o 650 m3 of fluid per stage

• Water conservative:

o Fracture fluid - crosslinked gel

o 15 stages per well

o 100 tonnes of proppant per stage

o 150 m3 of fluid per stage

The stage count, proppant amount, and fluid amount for each of these cases were chosen based

on the average of the existing 26 slickwater wells and 40 crosslinked gel wells in the area of

interest.

6.5 Neural Network Algorithm for Gas Production Forecasting

The neural network used in this study was developed in Chapter 4. In this work, the network was

trained using the 74 existing producers in the area with the supervised learning approach. During

supervised learning both the inputs and the outputs are provided to the network. The inputs are


the well spacing and completion order, rock mechanical properties and the completion

parameters. The output was the 5-year cumulative production profile in one-year increments. The network predicts an output given the inputs; this predicted output is compared to the actual output, and the error is backpropagated through the system, adjusting the weight parameters. The weights start with random values and are adjusted as more well inputs and outputs are shown to the network. Once the network has seen the entire dataset hundreds of times over, it begins to find general trends and patterns within the study area, and the overall prediction error rate drops (Reed and Marks, 1998). Once the network is trained to a sufficiently low error rate, it can be used to make predictions.
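The supervised-learning loop described above (random initial weights, a forward prediction, backpropagation of the error, repeated passes over the data) can be illustrated with a deliberately tiny stand-in. This is not the thesis's c-RNN; the single linear neuron, toy data, and learning rate are all made up for illustration:

```python
import random

random.seed(0)

# Toy data standing in for (inputs -> cumulative production): y = 2x
data = [(x, 2.0 * x) for x in [0.1, 0.3, 0.5, 0.7, 0.9]]

w = random.random()   # weight starts at a random value
lr = 0.5              # learning rate

def mse(w):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

err_start = mse(w)
for epoch in range(200):          # show the data many times over
    for x, y in data:
        pred = w * x                  # forward pass
        grad = 2 * (pred - y) * x     # backpropagated error gradient
        w -= lr * grad                # adjust the weight
err_end = mse(w)

# the prediction error drops as training proceeds
assert err_end < err_start
```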

The trained network cannot distinguish between existing and proposed wells; as long as the input data is in the correct format, the network will make a prediction. The trained network can therefore be used to see how the 5-year cumulative production of a proposed well is affected when the inputs are changed, i.e., it can be used to run sensitivities. As long as the proposed wells are located within the study area (so the geology is similar to what the network has been trained on) and are of similar length and completion design, the aggregate error rate should be similar to that of the existing producers. Figure 4.13 shows how the error of the model drops as more wells are aggregated; at 20 wells, the error rate is expected to be around 2.8%.

The model used in this study is a multi-headed convolutional-recurrent hybrid network (c-RNN). The c-RNN hybrid was chosen because it combines the speed and large-data processing capacity of a convolutional network with the sequence-processing ability of a recurrent network. The c-RNN hybrid has been shown to outperform traditional neural


networks in Chapter 3. In multi-headed architectures, each input variable is handled by a separate convolutional network (head), and the outputs of these heads are merged and fed into a recurrent network before a prediction is made; these types of models offer better performance (Bagnall, 2015). The network was programmed in Python using the Keras library (Chollet, 2015). The architecture of the network is shown in Figure 4.6, and the network hyperparameters are listed in Table 4.3.

As depicted in Figure 4.7, the inputs to the neural network are a series of tables where the

columns represent the stage variables and rows represent the stages. Each value in this table

represents a particular input parameter at a particular stage. The maximum number of successful

stages per well in the existing 74 wells was 31. Since the network was trained on existing wells, the input shape cannot change, so wells with fewer than 31 successful stages were simply assigned a value of 0 for the stages they did not have. All input variables were normalized to values between 0 and 1, where 0 represents the minimum and 1 the maximum of that parameter across the existing 74 wells.
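The table preparation described above (per-stage rows, zero-padding to 31 stages, min-max normalization against the ranges of the existing 74 wells) can be sketched as follows. The variable names, ranges, and example well are hypothetical stand-ins, not the study's actual inputs:

```python
# Sketch of preparing one well's input table: rows are stages, columns
# are stage variables, padded with zeros up to the 31-stage maximum.
MAX_STAGES = 31

def build_input_table(stages, var_min, var_max):
    """stages: list of per-stage dicts; var_min/var_max: field-wide ranges."""
    names = sorted(var_min)
    table = []
    for s in stages:
        # min-max normalize each variable to [0, 1]
        row = [(s[n] - var_min[n]) / (var_max[n] - var_min[n]) for n in names]
        table.append(row)
    while len(table) < MAX_STAGES:       # pad missing stages with zeros
        table.append([0.0] * len(names))
    return table

# Hypothetical field-wide ranges and a 30-stage (water-intense) well
var_min = {"proppant_t": 0.0, "fluid_m3": 0.0}
var_max = {"proppant_t": 200.0, "fluid_m3": 800.0}
well = [{"proppant_t": 150.0, "fluid_m3": 650.0}] * 30

table = build_input_table(well, var_min, var_max)
assert len(table) == 31 and table[-1] == [0.0, 0.0]
assert table[0] == [650.0 / 800.0, 150.0 / 200.0]  # fluid_m3, proppant_t
```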

6.6 Well Combinations and Procedure

Although there are 137,846,528,820 ways to position 20 wells in 40 predetermined positions, the approach used here makes predictions with the forecasting model for only a very small number of them. Two steps were conducted to create the sample space for evaluating well placement, as follows:


Step 1: 100 combinations of the well placements were chosen at random and the gas volumes

were forecasted using the c-RNN model. Different positioning combinations of the 20 wells lead

to different well spacing and completion order parameters for each of the wells which must be

recalculated for each arrangement of the 20 wells relative to themselves as well as to the existing

wells. Recalculation of these parameters is important because the produced gas volume depends

on spacing. As an example, Figure 6.1 shows two random positioning combinations of the 20

wells, illustrating how different arrangements lead to different spacings between all wells. To generate the results, the spacing input parameters were updated for every well in each combination. For every combination, the c-RNN generated a unique 5-year cumulative gas production profile for every well; the production of all 20 wells was then aggregated. The 100

well location combinations were run with each of the slickwater (water intense) and crosslinked

gel (water conservative) stimulation methods. This gave rise to 200 total well location

combinations and stimulation designs. The purpose of the random combinations was to roughly

approximate what the true normal distribution curve of all possible combinations would look

like. This approximate distribution is meant to serve as a background on which to compare the

performance of other well combinations. It is also used to validate or falsify claims of optimal

well combinations.
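Step 1's random draw can be sketched in a few lines. The well indices and the random seed are placeholders; in the study, the spacing and completion-order inputs are recalculated for each combination before the c-RNN forecast is run:

```python
import random

# Draw 100 random placements of 20 wells from the 40 candidate positions.
random.seed(42)
CANDIDATE_POSITIONS = list(range(40))

combinations = [tuple(sorted(random.sample(CANDIDATE_POSITIONS, 20)))
                for _ in range(100)]

assert len(combinations) == 100
assert all(len(set(c)) == 20 for c in combinations)  # no repeated wells
```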


Figure 6.1: One random combination of 20 wells (top), another random combination of 20 wells

(bottom).


Step 2: Given that there are over 100 billion combinations, it is unlikely that 100 random combinations would reveal which combination yields the highest or the lowest cumulative gas. For this reason, a second set of combinations is examined, using the following procedure to generate 4 additional combinations:

1. Run a full field scenario using all 40 possible well positions with slickwater (water intense)

stimulation; denote this set of cases as SW40.

2. Identify the subset of 20 wells with the highest production volume from the SW40 set; denote this subset as SW20H. The remaining 20 wells become subset SW20L (the lowest production volume wells). Figure 6.2 displays the SW20H and SW20L subsets.

3. Run a full field scenario using all 40 possible well positions with crosslinked gel (water

conservative) stimulation; denote this set of cases as CG40.

4. Identify the subset of 20 wells with the highest production volume from the CG40 set; denote this subset as CG20H. The remaining 20 wells become subset CG20L (the lowest production volume wells). Figure 6.3 shows the CG20H and CG20L subsets.

This procedure generated 4 well combinations that were added to the 100 random combinations generated in Step 1, giving a total of 104 positioning combinations for each stimulation method. In total, 208 different aggregated 5-year cumulative gas production values were determined (104 per stimulation method). The 4 additional well combinations act as a shortcut for finding the approximate lowest- and highest-producing combinations when only 20 wells are used. The shortcut is only approximate, however, because it is derived from the 40-well runs, in which the spacing parameters of all 40 wells enter the calculation. The four additional cases from Step 2 are


then re-run using only the 20 wells with the slickwater and crosslinked gel stimulations using the

spacing parameters for the 20 wells.
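Steps 2 and 4 above amount to ranking the per-well 5-year forecasts from a full 40-well run and splitting them into a best-20 and a worst-20 subset (e.g. SW20H / SW20L). A minimal sketch, with purely illustrative forecast values in place of the c-RNN outputs:

```python
# Split a full-field forecast into the 20 highest- and 20 lowest-producing wells.
def split_high_low(per_well_forecast, k=20):
    ranked = sorted(per_well_forecast, key=per_well_forecast.get, reverse=True)
    return set(ranked[:k]), set(ranked[k:])

# Hypothetical 5-year totals (Bcf) for 40 candidate wells
forecast = {f"W{i:02d}": 5.0 + 0.1 * i for i in range(40)}

high, low = split_high_low(forecast)
assert len(high) == 20 and len(low) == 20
# every well in the high subset out-produces every well in the low subset
assert min(forecast[w] for w in high) >= max(forecast[w] for w in low)
```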

Figure 6.2: Best and worst 20 well positions found in the 40 well prediction using the slickwater

(water intense) stimulation.



Figure 6.3: Best and worst 20 well positions found in the 40 well prediction using the crosslinked

gel (water conservative) stimulation.

6.7 Results and Discussion

Tables 6.1 and 6.2 show the results of the 100 random positioning combinations along with the

four additional combinations found from forecasting the drilling of all 40 wells in scenarios

SW40 and CG40. The results in Table 6.1 are from the crosslinked gel (water conservative)

stimulation and the results in Table 6.2 are from the slickwater (water intense) stimulation.



Table 6.1: Results from the sensitivity analysis using crosslinked gel as the fracture fluid.

20 Well Cumulative Gas Production (Bcf)

Combination Year 1 Year 2 Year 3 Year 4 Year 5

001 33 57 75 87 96

002 32 55 71 83 91

003 33 57 74 86 95

004 33 57 75 88 98

005 34 58 76 88 97

006 33 56 73 85 94

007 34 58 76 89 98

008 33 57 74 87 95

009 32 54 71 82 90

010 34 60 78 92 102

011 33 57 74 86 95

012 34 60 78 91 101

013 34 59 77 90 99

014 34 60 78 91 100

015 34 58 76 89 99

016 34 58 75 87 96

017 33 56 73 85 93

018 34 60 79 93 103

019 32 55 71 83 91

020 33 57 74 86 95

021 33 56 72 84 92

022 33 57 74 86 95

023 34 59 78 91 101

024 33 58 76 88 97

025 34 60 79 93 103

026 33 57 75 87 96

027 35 62 81 95 106

028 34 60 79 92 102

029 32 54 70 81 89

030 34 58 76 89 98

031 35 61 80 93 103

032 35 60 78 91 100

033 33 57 74 86 96

034 34 59 77 90 100

035 35 59 77 89 98

036 33 56 72 84 92

037 34 58 75 87 96

038 33 58 76 89 99


039 33 56 73 85 93

040 32 56 72 84 93

041 34 59 77 90 100

042 32 55 72 84 92

043 33 58 77 91 100

044 34 59 77 89 99

045 34 59 77 91 100

046 35 60 79 92 101

047 34 58 74 86 95

048 33 56 72 84 92

049 35 60 78 92 101

050 32 55 71 83 91

051 34 60 78 92 101

052 33 58 76 89 99

053 34 59 77 90 99

054 33 58 76 90 100

055 35 61 81 94 104

056 32 55 72 84 92

057 34 59 77 90 99

058 33 56 73 85 94

059 34 59 77 90 100

060 33 58 76 89 98

061 34 59 77 91 100

062 33 57 74 86 95

063 33 57 75 87 96

064 35 62 82 96 106

065 34 60 79 93 103

066 35 60 78 91 100

067 34 58 76 89 98

068 33 56 73 84 93

069 32 55 71 82 91

070 34 59 78 91 100

071 34 58 76 89 98

072 34 59 77 90 100

073 34 58 76 88 98

074 34 59 77 90 99

075 33 58 77 90 100

076 33 56 73 85 93

077 1 59 78 91 101


078 33 58 77 90 100

079 34 58 76 89 98

080 33 58 76 88 98

081 34 59 77 90 100

082 33 57 75 88 98

083 32 56 72 84 93

084 33 55 72 83 92

085 34 58 76 89 99

086 33 57 74 86 94

087 33 57 74 85 94

088 33 57 74 86 95

089 34 57 74 85 94

090 34 59 77 91 100

091 33 58 76 88 98

092 33 57 74 86 95

093 34 59 77 89 98

094 33 58 76 90 100

095 34 60 79 93 104

096 33 57 75 88 97

097 34 59 78 91 101

098 34 60 79 93 103

099 33 58 77 90 99

100 33 57 73 85 94

CG20H well combination 37 66 88 104 116

CG20L well combination 32 53 68 79 87

SW20H well combination 36 64 84 99 110

SW20L well combination 34 58 75 88 97


Table 6.2: Results from the sensitivity analysis using slickwater as the fracture fluid.

20 Well Cumulative Gas Production (Bcf)

Combination Year 1 Year 2 Year 3 Year 4 Year 5

001 34 62 84 102 115

002 34 60 80 96 109

003 33 59 81 98 111

004 33 59 80 97 110

005 35 64 86 104 118

006 34 60 81 97 109

007 33 60 82 99 113

008 34 61 82 99 111

009 32 57 77 92 104

010 34 63 86 105 119

011 34 61 82 100 113

012 34 62 85 103 117

013 34 61 83 101 114

014 34 61 84 102 116

015 35 63 86 105 119

016 35 62 85 103 117

017 33 60 82 99 112

018 35 63 87 106 120

019 34 60 81 97 109

020 35 62 84 101 115

021 33 59 80 97 110

022 34 61 82 99 112

023 33 60 83 100 114

024 33 59 81 98 111

025 35 64 87 106 120

026 33 59 81 97 110

027 35 64 87 106 121

028 34 61 83 101 115

029 32 58 79 95 107

030 35 63 86 104 118

031 35 64 87 106 120

032 36 65 89 108 123

033 32 58 79 96 109

034 33 60 81 98 111

035 35 62 84 101 115

036 34 61 83 99 112

037 35 61 82 99 112

038 33 60 82 100 114


039 35 61 82 99 111

040 33 58 79 95 107

041 34 62 84 102 116

042 34 61 82 99 112

043 33 60 82 100 114

044 34 61 83 101 114

045 33 60 83 100 114

046 35 64 88 107 122

047 35 63 85 102 115

048 34 61 82 99 112

049 34 63 86 105 119

050 33 60 81 98 110

051 36 64 87 105 120

052 34 62 85 103 118

053 35 62 85 102 116

054 33 59 81 98 111

055 35 64 87 106 121

056 32 57 77 93 105

057 35 62 85 102 116

058 33 60 81 97 110

059 34 62 84 102 116

060 33 60 82 99 112

061 34 61 83 101 115

062 36 64 86 104 117

063 34 61 83 100 113

064 35 64 89 108 124

065 34 63 86 105 120

066 35 64 87 105 119

067 34 62 84 102 115

068 35 62 84 101 113

069 33 59 79 95 107

070 34 63 86 104 119

071 35 63 86 104 118

072 33 60 82 99 112

073 35 62 84 102 115

074 34 62 84 101 115

075 33 60 82 99 113

076 35 61 83 99 112


077 34 61 83 101 114

078 33 59 80 97 110

079 34 62 84 101 115

080 32 58 79 96 110

081 35 63 86 104 119

082 33 59 80 97 111

083 32 58 79 96 108

084 33 59 79 94 106

085 34 61 84 102 117

086 34 60 81 98 111

087 34 61 83 100 113

088 34 60 82 99 112

089 34 61 83 99 112

090 34 61 83 101 114

091 33 59 81 98 111

092 34 61 82 99 113

093 35 63 86 104 118

094 33 60 83 100 114

095 34 61 84 102 116

096 33 60 81 98 111

097 35 63 87 105 119

098 34 62 85 103 117

099 34 62 85 103 117

100 35 62 84 101 114

SW20H well combination 36 66 91 111 127

SW20L well combination 34 61 83 100 114

CG20H well combination 36 66 91 112 129

CG20L well combination 34 60 80 96 108

Figure 6.4 shows two histograms (one for each stimulation case) which depict the distribution of the aggregated 5-year cumulative gas production of the 100 random positioning combinations along with the 4 best/worst combinations generated by running the full field development

scenario in Step 2 of the procedure. The 100 random combinations in the histograms are rough

approximations of what the true normal distribution curve of all possible combinations would

look like.
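The binning behind a histogram like Figure 6.4 can be sketched in a few lines. The totals below are randomly generated stand-ins, not the study's values; the study bins real aggregated forecasts into 1-Bcf-wide bins:

```python
import random

# Fake aggregated 5-year totals (Bcf) standing in for the 100 random combinations
random.seed(1)
totals = [random.gauss(97.5, 3.5) for _ in range(100)]

# Count combinations per 1-Bcf-wide bin
bins = {}
for t in totals:
    lo = int(t)                      # bin lower edge, e.g. the (97, 98] bin
    bins[lo] = bins.get(lo, 0) + 1

assert sum(bins.values()) == 100     # every combination lands in some bin
```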


Figure 6.4: Results of the 104 combinations presented in a histogram using both crosslinked

(top) and slickwater (bottom) as the stimulation fluid.

[Figure 6.4 data: x-axis, bins of aggregated 5-year cumulative production of 20 wells (Bcf); y-axis, number of combinations per bin. The crosslinked gel (water conservative) panel spans bins from [87, 88] to (115, 116] and the slickwater (water intense) panel spans bins from [104, 105] to (128, 129]; the CG20H, SW20H, CG20L, and SW20L combinations are marked on both panels.]


The average 5-year aggregated cumulative gas production of the random combinations was approximately 97 Bcf for the water conservative (crosslinked gel) case and 114 Bcf for the water intense (slickwater) case. The histograms also show just how much better the best combination was than the average. The combination that yielded the best results in both the water intense and water conservative cases was the CG20H combination. The CG20H combination performed 19% higher than the average and 34% higher than the worst combination in the water conservative case. In the water intense case, the CG20H combination performed 13% higher than the average and 24% higher than the worst combination. The cost to drill and complete 20 wells should be the same no matter where they are drilled in the area, so by running these scenarios before drilling, it is possible for the same price to achieve a roughly 19% higher recoverable volume than average.
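These percentages can be checked against the Year-5 totals in Tables 6.1 and 6.2. Taking the mean of the 100 random Year-5 totals gives roughly 97.5 Bcf in Table 6.1 and roughly 114 Bcf in Table 6.2; the worst random total in Table 6.2 is 104 Bcf (combination 009):

```python
# Percent improvements of the CG20H combination over the average and
# worst cases, from the Year-5 totals in Tables 6.1 and 6.2.
def pct_above(value, base):
    return (value - base) / base * 100

avg_cg, worst_cg = 97.5, 87     # Table 6.1: random-combination mean, CG20L
avg_sw, worst_sw = 114.0, 104   # Table 6.2: random-combination mean, worst random
cg20h_cg, cg20h_sw = 116, 129   # CG20H total in each stimulation case

assert round(pct_above(cg20h_cg, avg_cg)) == 19    # ~19% above average
assert 33 <= pct_above(cg20h_cg, worst_cg) <= 34   # ~34% above worst
assert round(pct_above(cg20h_sw, avg_sw)) == 13    # ~13% above average
assert round(pct_above(cg20h_sw, worst_sw)) == 24  # ~24% above worst
```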

The CG20H combination subset outperformed the SW20H combination subset in terms of

cumulative gas produced in both the water conservative and water intense stimulation scenarios.

This is an interesting result because when slickwater was used as the stimulation fluid, the best

20 wells in the full field development (SW20H) were not the best 20 wells in a partial field

development but when crosslinked gel was used, the best 20 wells in the full field development

(CG20H) were the best 20 wells in a partial field development. Another interesting observation

is that the CG20L subset was the worst positioning combination in the water conservative case, but neither CG20L nor SW20L was the worst case in the water intense case (in fact, they were closer to the average).

The approximate best and worst combinations generated by running the full 40 well development

were not the best/worst for slickwater but were the best/worst for crosslinked gel. The only


difference between the 20 well scenario and the 40 well scenario is the well spacing. Since the

approximate best/worst combination ranking was affected between the 20 and 40 well scenarios

only when slickwater was used, this suggests that well spacing plays a larger role when

completing with slickwater. When crosslinked gel was used, the best/worst combination ranking

was not affected between the 20 and 40 well scenarios. These results are driven by the larger

well spacing of the 20 well field development versus the full field 40 well development. The

results suggest that the distance between wells is a large factor in production performance when

stimulating with slickwater. This makes intuitive sense as the amount of fluid injected during a

slickwater stimulation is much larger than a crosslinked stimulation and thus would be expected

to create fracture networks that extend further away from the wellbore then if crosslinked gel is

used. Also, the fact that the CG20H well combination performed the best in both water intense

and water conservative stimulation cases suggests that the rock mechanical profile near the

wellbore plays a greater role than completion type when it comes to choosing the best places to

drill locations.

On average, completing the same 20 wells with slickwater led to a cumulative gas production 10% higher than when those wells were completed with crosslinked gel. However, for that 10% more gas, the slickwater completions had to use 300% more proppant and 867% more fluid than the crosslinked gel completions.
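These multipliers follow directly from the per-stage designs in Section 6.4.3 (30 stages at 150 t and 650 m³ per stage for slickwater versus 15 stages at 100 t and 150 m³ per stage for crosslinked gel), with "300% more" read as three times as much:

```python
# Per-well proppant and fluid totals implied by the two completion designs
sw_proppant, cg_proppant = 30 * 150, 15 * 100   # tonnes per well
sw_fluid, cg_fluid = 30 * 650, 15 * 150         # m3 per well

assert sw_proppant / cg_proppant == 3.0          # "300% more" proppant
assert round(sw_fluid / cg_fluid, 2) == 8.67     # "867% more" fluid
```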

6.8 Conclusions

In this study, a convolutional-recurrent neural network previously developed and trained on 74 existing wells was used to find the aggregated 5-year cumulative gas production of various


positioning combinations of 20 new wells. The positioning of the 20 new wells was selected from 40 predetermined positions chosen to have similar length, depth, and well spacing to the existing wells in the field. The results show that with only 100 random combinations, an approximately normal distribution describing the entire sample space of 137,846,528,820 combinations begins to form. This distribution is useful because it shows how a particular positioning combination compares to the average of all possible locations. The best combination in the study was found to be the crosslinked gel well combination (shown in green in Figure 6.3). The results also suggest that neither the well spacing nor the completion fluid affected the best well position combination, which suggests that the rock mechanical profile near the wellbore plays the greatest role when it comes to choosing the best drilling locations.

The method did not sample the complete space (consisting of 137,846,528,820 well combinations) but rather demonstrated a simple and intuitive way to use the c-RNN gas production model to guide the choice of the best 20 locations. The tool developed in this study does not rely on the bottom-up approach of conventional simulators and instead links well inputs to well performance using a machine learning algorithm. Because the algorithm was trained and evaluated on existing wells in the study area and had a 5-year, 20-well aggregate forecast error of only 2.8%, it is arguably more useful than forecasts produced by numerical simulators. Therefore, the optimized well positioning combination should also be more trustworthy.

The approach developed in this study is useful to anyone planning to optimize the positioning of new wells in a partially developed field. It allows the user to test how various well placements would perform and provides a guideline for finding the best possible well placement.


Chapter 7: Conclusions and Recommendations

The main objectives of this thesis were to improve production forecasting as well as optimize

completion and well placement of multi-stage horizontal wells by using machine learning

methods. The research in this thesis focuses on a 30 km by 30 km section of the unconventional

part of the Montney Formation in the province of Alberta, Canada. This area contains a large

number of vertical and horizontal wells thus providing a rich data set for analysis.

7.1 Conclusions

1. A procedure was designed that can generate highly accurate shear synthetic logs using a

convolutional recurrent network (c-RNN). This procedure can act as a cost-effective and

fast alternative to running actual DTS logs in a field with prior development. The rock

mechanical properties surrounding a wellbore can be calculated using the density,

compressional and shear sonic logs of wells penetrating the formation. The rock

mechanical properties of the wells can also be upscaled and interpolated to construct a

three-dimensional (3D) model. Since shear sonic logs are missing from the majority of

wells, generating accurate shear synthetic logs helps increase the accuracy of the rock

mechanical model.

2. A 3D rock mechanical model of the study area was generated using the density and

compressional sonic logs along with the shear sonic (14 actual and 166 synthetic) logs of


the 180 vertical and deviated wells in the study area. This 3D model allowed rock mechanical profiles to be generated along the horizontal wellbores.

3. The c-RNN was employed to generate five-year cumulative gas production profiles for

74 horizontal multistage hydraulically fractured wells in the study area. The model was

trained using a combination of completion parameters, rock mechanical properties,

and well spacing and completion order for each stage in the wells. The best combination

of inputs was found to be the rock mechanical properties surrounding each perforation

cluster, the proppant amount used for every stage, and the spacing and completion order

of neighboring wells. The novelty of this study is that the input variables used are at the

stage level rather than the average of the entire well. The accuracy of the model was

found to increase exponentially as the production of multiple wells was aggregated.

4. The model trained to predict the production performance of the 74 horizontal wells in the

area was used to predict the production performance of 40 new wells planned to be

drilled alongside the existing producers. A total of 1,080 sensitivities were run to explore how

different combinations of stage count, fluid type, water amount, and proppant amount

affect the aggregated 5-year cumulative gas production. The study finds that the injection

efficiency (cumulative gas/total fluid injected) drops exponentially when more water is

injected. The results show that it is possible to achieve a cumulative production that is

76% of the maximum by using only 3% of the water required to achieve this maximum.


5. The model trained to predict the production performance of the 74 horizontal wells in the

area was used to predict the production performance of 20 new wells that can be drilled

in 40 pre-determined well positions. The results show that with only 100 random

combinations a crude normal distribution describing the entire sample space of

137,846,528,820 begins to form which reveals the average of all possible combinations.

The best combination was found by taking the top 20 wells from the crosslinked gel full

field scenario. This combination performed better than all other combinations in the 20

well scenarios for both the water intense and water conservative cases. It was also found

that the rock mechanical profile near the wellbore plays the greatest role when it comes to choosing the best drilling locations.

7.2 Recommendations

1. To investigate if the accuracy of the model used to generate synthetic shear logs can be improved by training it on a field with more than 14 wells that contain actual shear sonic logs, and to test if the accuracy can be improved by incorporating other logs into the training set.

2. There are many ways to improve the accuracy of the production forecasting model. Arguably the biggest improvement would come from installing a flowrate-measuring tool at every perforation cluster; such a tool would enable linking the completion variables and the rock mechanics of each stage directly to the amount of production coming from that stage. Another way would be to add microseismic measurements at each stage of the


well as these would aid in the understanding of the fracture morphology at each stage

along the wellbore. Also, it is unlikely that the neural network used in this study has the

optimal configuration; finding this optimal configuration would improve the accuracy but would

require many more tests. Of course, more installed devices require additional cost, and

since the industry is driven more by profit than scientific curiosity, it is likely this will not

happen.

3. Construct the study area in a simulation model to see if the simulation model prediction

is better or worse than the machine learning model. Perhaps also see if the machine

learning model accuracy can be enhanced by incorporating the geology, fracture network

modeling and flow mechanics of the simulation model.

4. Run an economic analysis on water injection efficiency to see if using the water

conservative crosslinked gel fluid during a hydraulic fracture leads to a better or worse

net present value than fracturing with the water intense slickwater fluid.

5. Conduct a more in-depth investigation to see which inputs drive the choice of the

top 20 well positions in Figures 6.2 and 6.3, which lie mostly in the bottom-right corner

of the study area.

6. Another recommendation is to test the predictor models on different study areas both

conventional and unconventional, to see how the prediction accuracy improves or

decreases.


References

A. Y. Abukhamsin. Optimization of well design and location in a real field. 2009. Master’s

thesis, Department of Energy Resources Engineering, Stanford University.

Abdullah Faruque and Joshua Goldowitz (December 20th 2017). Effect of Hydrofracking on

Aquifers, Aquifers - Matrix and Fluids, Muhammad Salik Javaid and Shaukat Ali Khan,

IntechOpen, DOI: 10.5772/intechopen.72327. Available from:

https://www.intechopen.com/books/aquifers-matrix-and-fluids/effect-of-hydrofracking-

on-aquifers

Adelman, M.A. and H.D. Jacoby, 1979, Alternative methods of oil supply forecasting, in: R.S.

Pindyck, ed., The production and pricing of energy resources (JAI Press, Greenwich,

CT).

Aizenberg I, Sheremetov L, Villa-Vargas L, Martínez-Muñoz J (2016) Multilayer neural

network with multi-valued neurons in time series forecasting of oil production.

Neurocomputing 175:980–989

Al-Anazi, A.F. and Gates, I.D. On Support Vector Regression to Predict Poisson’s Ratio and

Young’s Modulus of Reservoir Rock, Chapter 5 in Cranganu et al. (eds.), Artificial

Intelligent Approaches in Petroleum Geosciences. 2015, Springer International

Publishing, DOI 10.1007/978-3-319-16531-8_5.

Alberta Energy Regulator ST98-2014: Alberta’s Energy Reserves 2013 and Supply/Demand

Outlook 2014–2023, ISSN 1910-4235, Alberta Energy Regulator, Suite 1000, 250-5th St.

SW, Calgary, Alberta, Canada, T2P 0R4.

Andrew J. Kondash, Nancy E. Lauer, and Avner Vengosh. The intensification of the water

footprint of hydraulic fracturing. Science Advances, 4(8), 2018.

Ayeni, B. and Pilat, R. (1992) Crude Oil Reserve Estimation: An Application of the

Autoregressive Integrated Moving Average (ARIMA) Model.

B. Al-Shamma, H. Nicole, P. R. Nurafza,W. C. Feng, et al., Evaluation of multi-fractured

horizontal well performance: Babbage field case study, in: SPE Hydraulic Fracturing

Technology Conference, Society of Petroleum Engineers, 2014.

B. Zivkovic, I. Fujii, An analysis of isothermal phase change of phase change material within

rectangular and cylindrical containers, Sol. Energy 70 (2001) 51–61.

Ba Geri, M., Imqam, A., & Flori, R. (2019, April 8). A Critical Review of Using High Viscosity

Friction Reducers as Fracturing Fluids for Hydraulic Fracturing Applications. Society of

Petroleum Engineers. doi:10.2118/195191-MS

Barati, R. and J.-T. Liang (2014). "A Review of Fracturing Fluid Systems Used For Hydraulic

Fracturing of Oil and Gas Wells." Journal of Applied Polymer Science 131(16).

Barre, D. (2008). Mechanical Properties Log Processing and Calibration (accessed 29 May 2019)


Belyadi, H., Fathi, E., Belyadi, F., 2017. Hydraulic Fracturing in Unconventional Reservoirs,

first edition. USA. Gulf Professional Publishing.

Bhande, A., 2018, What is underfitting and overfitting in machine learning and how to deal with

it, https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-

learning-and-how-to-deal-with-it-6803a989c76, (Accessed June 1, 2019)

Bittencourt AC and Horne RN (1997) Reservoir development and design optimization. In: SPE

annual technical conference and exhibition, San Antonio, Texas, 5–8 October 1997.

Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of

Econometrics, 31(3), 307–327.

Brockwell, P.J. and Davis, R.A. (1996). Introduction to Time Series and Forecasting. New York:

Springer.

Brown, Robert Goodell. 1959. Statistical Forecasting for Inventory Control. McGraw-Hill.

Brownlee, J., 2017, A Gentle Introduction to Mini-Batch Gradient Descent and How to

Configure Batch Size, https://machinelearningmastery.com/gentle-introduction-mini-

batch-gradient-descent-configure-batch-size/, (Accessed April 16, 2020)

Butler, R.M. Thermal Recovery of Oil and Bitumen. Prentice-Hall, Englewood Cliffs, New

Jersey, 1991.

Canada Energy Regulator, 2018, Market Snapshot: Evolving technology is a key driver of

performance in modern gas wells: a look at the Montney Formation, one of North

America’s biggest gas resources, https://www.cer-

rec.gc.ca/nrg/ntgrtd/mrkt/snpsht/2018/04-04-1vlvngtchnlg-eng.html

Cander, H., 2012. What are unconventional resources? A simple definition using viscosity and

permeability. In: Geologist, A.A.o.P. (Ed.), AAPG Annual Convention and Exhibition.

American Association of Petroleum Geologists and Society for Sedimentary Geology,

Tulsa, OK, USA.

Cartwright, Hugh. Artificial Neural Networks. 2nd ed., Springer-Verlag New York, 2015.

Castagna, J.P., 1985, Shear-wave time-average equation for sandstones: Presented at the 55th

Ann. Internat. Mtg., Soc. Expl. Geophys.

Cavanillas, J.M., Curry, E., Wahlster, W. New Horizons for a Data-Driven Economy – Springer.

2016; 152-154.

Centilmen, A., Ertekin, T., Grader, A.S., 1999. Applications of neural networks in multiwell

field development. In: Presented at the SPE Annual Technical Conference and

Exhibition, Houston, Texas, USA, 3–6 October. https://doi.org/10.2118/56433-MS.

Chaikine, I., and Gates, I. 2020b “A Machine Learning Model to Predict Multi-Stage Horizontal

Well Production”, PETROL21107, Journal of Applied Petroleum Science & Engineering

(number).

Chaikine, I., and Gates, I., 2020a “A New Machine Learning Procedure To Generate Highly

Accurate Synthetic Shear Sonic Logs In Unconventional Reservoirs”, SPE-201453-MS,


presented at the SPE Annual Technical Conference and Exhibition, Denver CO, October

5-7, 2020

Chen, Gemai, Bovas Abraham, and Greg W. Bennett. "Parametric and

Nonparametric Modelling of Time Series—An Empirical Study",

Environmetrics 8.1 (1997): 63-74.

Chollet, F. (2015) keras, GitHub. https://github.com/fchollet/keras

Chollet, F. Deep Learning with Python, Manning Publications Co., Greenwich, CT, 2017

Cohen CE, Xu W, Weng X, Tardy P. Production Forecast After Hydraulic Fracturing in

Naturally Fractured Reservoir: Coupling a Complex Fracturing Simulator and a Semi-

Analytical Production Model. SPE paper 152541 presented at the SPE Hydraulic

Fracturing Technology Conference and Exhibition held in The Woodlands, Texas, USA,

6-8 February 2012.

De Gooijer, J. G., & Hyndman, R. J. (2006). 25 years of time series forecasting. International

journal of forecasting, 22(3), 443-473.

Defeu, C., Garcia, G., Ejofodmi, E. et al. 2018. Time Dependent Depletion of Parent Well and

Impact on Well Spacing in the Wolfcamp Delaware Basin. Presented at the SPE Liquids-

Rich Basins Conference-North America held in Midland, TX, USA, 05-06 September.

SPE-191799-MS

Douglas Bagnall. 2015. Author identification using multi-headed recurrent neural networks. In

Working Notes Papers of the CLEF 2015 Evaluation Labs, volume 1391.

Ediger, V.S. and Akar, S. (2007) ARIMA Forecasting of Primary Energy Demand by Fuel in

Turkey. Energy Policy, 35, 1701-1708. http://dx.doi.org/10.1016/j.enpol.2006.05.009

Edmunds, N.R. and Gittins, S.D. 1993. Effective Application of Steam Assisted Gravity

Drainage to Long Horizontal Well Pairs. JCPT 32 (6): 49-55.

Emerick AA, Silva E, Messer B, et al. (2009) Well placement optimization using a genetic

algorithm with nonlinear constraints. In: SPE reservoir simulation symposium, The

Woodlands, Texas, USA, 2–4 February 2009.

Energy Essentials, 2015, A guide to shale gas,

https://knowledge.energyinst.org/__data/assets/pdf_file/0020/124544/Energy-Essentials-

Shale-Gas-Guide.pdf, (Accessed April 16, 2020)

Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance

of United Kingdom inflation. Econometrica, 50(4), 987–1007.

Eskandari H, Rezaee MR, Mohammadnia M (2004) Application of multiple regression and

artificial neural network techniques to predict shear wave velocity from well log data for

a carbonate reservoir, south-west Iran. In: CSEG RECORDER, pp 42–48

Fernández-Martínez, J. L., E. García-Gonzalo, J. P. Fernández Álvarez, H. A. Kuzma, and C. O.

Menéndez-Pérez, 2010a, PSO: A powerful algorithm to solve geophysical inverse

problems. Application to a 1D-DC resistivity case: Journal of Applied Geophysics, 71,

no. 1, 13–25, doi: 10.1016/j.jappgeo.2010.02.001.


Fink, J. Petroleum Engineer’s Guide to Oil Field Chemicals and Fluids (Gulf Professional,

Oxford, 2015).

Fjar, E.; Holt, R.M.; Raaen, A.M.; Risnes, R. Petroleum Related Rock Mechanics, 2nd ed.;

Elsevier: Amsterdam, The Netherlands, 2008; pp. 309–339.

Freyman, M. Hydraulic Fracturing & Water Stress: Water Demand by the Numbers; Technical

Report for Ceres, 2014, February, p 85.

G.E.P. Box, G. Jenkins, “Time Series Analysis, Forecasting and Control”, Holden-Day, San

Francisco, CA, 1970.

Gotawala, D.R. and Gates, I.D. 2010. On the Impact of Permeability Heterogeneity on SAGD

Steam Chamber Growth. Natural Resources Research 19(2): 151-164.

Hadi, F., and Nygaard, R. 2018. Shear Wave Prediction in Carbonate Reservoirs: Can Artificial

Neural Network Outperform Regression Analysis? ARMA Paper 18-905 presented at the

52nd US Rock Mechanics / Geomechanics Symposium. Seattle, Washington, US, 17-20

June 2018.

Han, De-hua, Nur, A., and Morgan, D., 1986, Effects of porosity and clay content on wave

velocities in sandstones: Geophysics, 51, 2093-2107.

Holditch, S.A., 2013. Unconventional oil and gas resource development Let’s do it right. Journal

of Unconventional Oil and Gas Resources 1–2, 2–8.

Holland, M., J. L. Urai, P. Muchez, and J. M. Willemse (2009), Evolution of fractures in a highly

dynamic thermal, hydraulic, and mechanical system –(I) Field observations in Mesozoic

carbonates, Jabal Shams, Oman Mountains, GeoArabia [Manama], 14(1), 57–110.

Holt, Charles E. 1957. “Forecasting Seasonals and Trends by Exponentially Weighted

Averages.” O.N.R. Memorandum 52. Carnegie Institute of Technology, Pittsburgh USA.

Hyndman & Athanasopoulos (2017) Forecasting: principles and practice, 2nd edition, OTexts:

Melbourne.

I. Jang, S. Oh, Y. Kim, C. Park, and H. Kang, “Well-placement optimisation using sequential

artificial neural networks,” Energy Exploration & Exploitation, vol. 36, no. 3, pp. 433–

449, 2017.

I.D. Gates, 2013, Basic Reservoir Engineering. ISBN: 978-1-4652-3684-5.

Irani, M., and Cokar, M., 2014. Understanding the impact of temperature-dependent thermal

conductivity on the steam-assisted gravity-drainage (SAGD) process. Part 1: Temperature

front prediction. Paper SPE 170064 presented at the SPE Heavy Oil Conference-Canada,

Calgary, 10-12 June. doi: 10.2118/170064-MS.

J. Li, J. H. Cheng, J. Y. Shi, and F. Huang, "Brief introduction of back propagation (BP) neural

network algorithm and its improvement,” in Advances in Computer Science and

Information Engineering—Volume 2, D. Jin and S. Lin, Eds., vol. 169 of Advances in

Intelligent and Soft Computing, pp. 553–558, Springer, Berlin, Germany, 2012.


Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical Evaluation

of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs],

December 2014. URL http://arxiv.org/abs/1412.3555.

K. Leung and C. Leckie, “Unsupervised anomaly detection in network intrusion detection using

clusters,” Proceedings of the Twenty-eighth Australasian conference on Computer

Science-Volume 38, pp. 333–342, 2005

Krishnamurthy, V. and Yin, G.G. (2002) Recursive algorithms for estimation of hidden Markov

models and autoregressive models with Markov regime. IEEE Transactions on

Information Theory, 48(2), 458–476.

Kumara M.P.T.R., Fernando W.M.S., Perera J.M.C.U., Philips C.H.C, Time Series Prediction

Algorithms – Literature Review, Department of Computer Science and Engineering,

University of Moratuwa, Sri Lanka, 2013.

Lin, W.; Bergquist, A. M.; Mohanty, K.; Werth, C. J., Environmental Impacts of Replacing

Slickwater with Low/No-Water Fracturing Fluids for Shale Gas Recovery. ACS

Sustainable Chemistry & Engineering 2018, 6 (6), 7515-7524.

Lindsay, G., Miller, T., Xu, T.. 2018. Production Performance of Infill Horizontal Wells vs. Pre-

Existing Wells in the Major US Unconventional Basins. Presented at the SPE Hydraulic

Fracturing Technology Conference, The Woodlands, Texas, USA, 23-25 January. SPE-

189875-MS. https://doi-org.ezproxy.lib.ucalgary.ca/10.2118/189875-MS

Louisiana Ground Water Resources Commission. (2012). Managing Louisiana’s groundwater

resources: An interim report to the Louisiana Legislature. Baton Rouge, LA: Louisiana

Department of Natural Resources.

http://dnr.louisiana.gov/index.cfm?md=pagebuilder&tmp=home&pid=907.

M. Ahmadi, R. Soleimani, M. Lee, T. Kashiwao, and A. Bahadori, Determination of oil well

production performance using artificial neural network (ANN) linked to the particle

swarm optimization (PSO) tool, Petroleum, Volume 1, Issue 2, pp. 118–132, June 2015.

Ma, Y. Zee; Holditch, Stephen A. (2016). Unconventional Oil and Gas Resources Handbook -

Evaluation and Development - 1.1 Introduction. Elsevier. Retrieved from

https://app.knovel.com/hotlink/pdf/id:kt010QJP51/unconventional-oil-gas/introduction

Maleki, S., A. Moradzadeh, R. Ghavami, and R. Gholami. 2014. Prediction of shear wave

velocity using empirical correlations and artificial intelligence methods. NRIAG Journal

of Astronomy and Geophysics, 3(1), 70–81. https://doi.org/10.1016/j.nrjag.2014.05.001

Mitchell, Tom (1997). Machine Learning. New York: McGraw Hill. ISBN 0-07-042807-7.

OCLC 36417892.

Mohaghegh, S. D. (2011, January). Reservoir Simulation and Modeling Based on Pattern

Recognition. In SPE Digital Energy Conference and Exhibition. Society of Petroleum

Engineers.

Mohaghegh, S. D. (2017). Data-Driven Reservoir Modeling. Society of Petroleum Engineers.


Mohanty S (2005) Estimation of vapour liquid equilibria of binary systems, carbon dioxide-ethyl

caproate, ethyl caprylate and ethyl caprate using artificial neural networks. Fluid Phase

Equilib 235(1):92–98

Montgomery, C. T., & Smith, M. B. (2010, December 1). Hydraulic Fracturing: History of an

Enduring Technology. Society of Petroleum Engineers. doi:10.2118/1210-0026-JPT

Montgomery, C., Fracturing Fluid Components. InTech 2013,

http://dx.doi.org/10.5772/56422;.10.5772/56422

Morton M Q, 2013, “Unlocking the earth – a short history of hydraulic fracturing” GEOExPro,

10(6) 86-90

Moslow, T.F., 2000. Reservoir architecture of a fine-grained turbidite system: Lower Triassic

Montney Formation, Western Canada Sedimentary Basin, in: Deep Water Reservoirs of

the World. pp. 686–713.

National Energy Board, 2018, Market Snapshot: Evolving technology is a key driver of

performance in modern gas wells: a look at the Montney Formation, one of North

America’s biggest gas resources,

http://www.neb-one.gc.ca/nrg/ntgrtd/mrkt/snpsht/2018/04-04-1vlvngtchnlg-eng.html

(Accessed April 16, 2020)

Nelson, Ronald A. (2001). Geologic Analysis of Naturally Fractured Reservoirs (2nd Edition).

Elsevier.

Neville, Suzanna & Hjellvik, Vidar & Mackinson, Steven & van der Kooij, Jeroen. (2020).

Using artificial neural networks to combine acoustic and trawl data in the Barents and

North Seas. ICES CM 2004/R:05

Onwunalu JE and Durlofsky LJ (2010) Application of a particle swarm optimization algorithm

for determining optimum well location and type. Computers & Geosciences 14: 183–198.

Palisch, T. T., Vincent, M., & Handren, P. J. (2010, August 1). Slickwater Fracturing: Food for

Thought. Society of Petroleum Engineers. doi:10.2118/115766-PA

Parapuram, G.; Mokhtari, M.; Ben Hmida, J. An Artificially Intelligent Technique to Generate

Synthetic Geomechanical Well Logs for the Bakken Formation. Energies 2018, 11, 680.

Pennsylvania Department of Environmental Protection. (2015). Oil and gas reporting website,

statewide data downloads by reporting period.

https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/DataExports/DataEx

ports.aspx.

Pykes, K., 2020, The Vanishing/Exploding Gradient Problem in Deep Neural Networks,

https://towardsdatascience.com/the-vanishing-exploding-gradient-problem-in-deep-

neural-networks-191358470c11

R. K. Agrawal and R. Adhikari, “An Introductory Study on Time Series Modeling and

Forecasting” CoRR, 2013.

R.M. Butler. 1997. Thermal Recovery of Oil and Bitumen. GravDrain Inc., Calgary, Alberta,

Canada ISBN: 0-9682563-0-9.


Rajabi M, Bohloli B, Ahangar EG (2010) Intelligent approaches for prediction of compressional,

shear and Stoneley wave velocities from conventional well log data: a case study from

the Sarvak carbonate reservoir in the Abadan Plain (Southwestern Iran). Comput Geosci

36(5):647–664

Reynolds, M., Bachman, R., Peters, W., 2014: “A Comparison of the Effectiveness of Various

Fracture Fluid Systems Used in Multi-Stage Fractured Horizontal Wells: Montney

Formation, Unconventional Gas”; SPE 168632 presented at the Hydraulic Fracturing

Technology Conference, The Woodlands TX, Feb. 4 – 6.

Ron Meir, Nonparametric Time Series Prediction Through Adaptive Model Selection, Machine

Learning, Kluwer Academic Publishers, 39, 5-34, 2000. [online] Available:

http://webee.technion.ac.il/Sites/People/rmeir/Publications/MeirTimeSeries00.pdf

Russell D. Reed and Robert J. Marks. 1998. Neural Smithing: Supervised Learning in

Feedforward Artificial Neural Networks. MIT Press, Cambridge, MA, USA.

S. Goel, I. Khan and T. Garg, "An Applicative Study of Advances in Machine Learning," 2019

6th International Conference on Computing for Sustainable Global Development

(INDIACom), New Delhi, India, 2019, pp. 255-259.

S. Mohaghegh, Virtual-intelligence applications in petroleum engineering: part 1—artificial

neural networks, J. Pet. Technol. 52 (2000) 64–73.

Salmachi A, Sayyafzadeh M and Haghighi M (2013) Infill well placement optimization in coal

bed methane reservoirs using genetic algorithm. Fuel 111: 248–258.

Sarker, N. N. and Ketkar, M. A., “Developing Excel Macros for Solving Heat Diffusion

Problems,” ASEE- 2004-1520, Proceedings of the 2004 American Society for

Engineering Education Annual Conferences & Exposition, Salt Lake City, Utah.

Scanlon, BR; Reedy, RC; Nicot, JP. (2014). Will water scarcity in semiarid regions limit

hydraulic fracturing of shale plays? Environmental Research Letters 9.

http://dx.doi.org/10.1088/1748-9326/9/12/124011.

Schlumberger 2019, Petrel E&P Software Platform.

Schultz, R., Atkinson, G., Eaton, D. W., Gu, Y. J., & Kao, H. (2018). Hydraulic fracturing

volume is associated with induced earthquake productivity in the Duvernay play.

Science, 359(6373), 304–308. https://doi.org/10.1126/science.aao0159

Sharma, J. and Gates, I.D. Multiphase Flow at the Edge of a Steam Chamber. Canadian Journal

of Chemical Engineering, 88(3): 312-332, 2010.

Shen, C. 2011. Evaluation of Wellbore Effect on SAGD Start-up. Paper presented at the

Canadian Unconventional Resources Conference held in Calgary, Alberta, Canada, 15-17

November 2011.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple

way to prevent neural networks from overfitting. J. Machine Learning Res. 15, 1929–

1958 (2014)


Subramanian Chandramouli; Amit Kumar Das; Saikat Dutt. (2018) Machine Learning. Pearson

Education India

Tanaka, S., Wang, Z., Dehghani, K., Jincong He, H., Velusamy, B., Wen, X.-H., Large Scale

Field Development Optimization Using High Performance Parallel Simulation and Cloud

Computing Technology, SPE 191728, 2018 SPE Annual Technical Conference and

Exhibition held in Dallas, Texas, 24-26 September 2018.

Temizel, C., Purwar, S., Abdullayev, A., Urrutia, K., Tiwari, A., Erdogan, S.S., 2015 Efficient

Use of Data Analytics in optimization of hydraulic fracturing in unconventional gas

reservoirs. SPE-177549-MS

Trembath, Alex, Jesse Jenkins, Ted Nordhaus, and Michael Shellenberger. 2012. “Where the

Shale Gas Revolution Came From: Government’s Role in the Development of Hydraulic

Fracturing in Shale.” Breakthrough Institute.

U.S. EPA. Hydraulic Fracturing For Oil and Gas: Impacts From the Hydraulic Fracturing Water

Cycle on Drinking Water Resources In the United States (Final Report). U.S.

Environmental Protection Agency, Washington, DC, EPA/600/R-16/236F, 2016.

United States Environmental Protection Agency, 2013, The Hydraulic Fracturing Water Cycle,

https://web.archive.org/web/20130525032531/http://www2.epa.gov/hfstudy/hydraulic-

fracturing-water-cycle

Veil, J. (2015). U.S. produced water volumes and management practices in 2012. Oklahoma,

City, OK: Ground Water Protection Council.

http://www.gwpc.org/sites/default/files/Produced%20Water%20Report%202014-

GWPC_0.pdf.

Wang, S., and Chen, S. 2016. Evaluation and Prediction of Hydraulic Fractured Well

Performance in Montney Formations Using a Data-Driven Approach. Presented at the

SPE Western Regional Meeting, Anchorage, Alaska, 23–26 May. SPE-180416-MS.

http://dx.doi.org/10.2118/180416-MS.

Wang, S.; Chen, S. Insights to fracture stimulation design in unconventional reservoirs based on

machine learning modeling. J. Pet. Sci. Eng. 2019a, 174, 682–695. [CrossRef]

Wang F.P., Gale J.F.W., Screening criteria for shale-gas systems. Gulf Coast Association of

Geological Societies Transactions, 2009, 59, 779-793

Winters, P R. 1960. “Forecasting Sales by Exponentially Weighted Moving Averages.”

Management Science 6:324–42.

Y. He, S. Cheng, J. Qin, J. Chen, Y. Wang, N. Feng, H. Yu, et al., Successful application of well

testing and electrical resistance tomography to determine production contribution of

individual fracture and waterbreakthrough locations of multifractured horizontal well in

changqing oil field, china, in: SPE Annual Technical Conference and Exhibition, Society

of Petroleum Engineers, 2017.

Yeten B, Durlofsky LJ and Aziz K (2002) Optimization of nonconventional well type, location,

and trajectory. In: SPE annual technical conference and exhibition, San Antonio, TX,

USA, 29 September–2 October 2002.


Yuan, J. Y., and Mcfarlane, R. 2009. Evaluation of steam circulation strategies for SAGD start-

up. JCPT 50(1): 20-32. SPE143655-PA.

Yudeng Qiao, Jun Peng, Lan Ge, Hongjin Wang, "Application of PSO LS-SVM forecasting

model in oil and gas production forecast", 2017 IEEE 16th International Conference on

Cognitive Informatics & Cognitive Computing (ICCI*CC), pp. 470-474,

2017, doi:10.1109/ICCI-CC.2017.8109791

Yule, G.U. 1927. On a method of investigating periodicities in disturbed series with special

reference to Wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society

of London Series A, 226, 267–98.

Z. He, L. Yang, J. Yen, C. Wu, Neural-Network Approach to Predict Well Performance Using

Available Field Data, in: Proceedings of SPE Western Regional Meeting, Bakersfield,

California, 26–30 March, SPE68801, 2001.

Zhang, D., Yuntian, C., Jin, M., 2018. Synthetic well logs generation via Recurrent Neural

Networks. Pet. Explor. Dev. 45, 629–639.

Zhang, J., Florita, A., Hodge, B. M., Lu, S., Hamann, H. F., Banunarayanan, V., & Brockway,

A.M. (2015). A suite of metrics for assessing the performance of solar power forecasting.

Solar Energy, 111, 157-175. http://dx.doi.org/10.1016/j.solener.2014.10.016

Zhu L, Zeng F, Huang Y. A correlation of steam chamber size and temperature falloff in the

early-period of the SAGD Process. Fuel, 2015 148: 168–177.

Zhu, L., Zeng, F. A Condensation Heating Model for Evaluating Early-Period SAGD

Performance. Transport in Porous Media, 104(2), 363-383 (2014)

Zhu, L., Zeng, F., Zhao, G., Duong, A.: Using transient temperature analysis to evaluate steam

circulation in SAGD start-up processes. J. Petrol. Sci. Eng. 100, 131–145 (2012)


Appendix A: Temperature Transient Analysis of the Steam Chamber during a SAGD

Shutdown Event

A.1 Abstract

In 2015, major forest fires in northeastern Alberta, Canada, shut down nearly all major oil sands

extraction and mining operations leading to a loss of 47 million barrels of bitumen within four

weeks; at a $20 netback, this is equivalent to nearly $1 billion in lost value. For in-situ

Steam-Assisted Gravity Drainage operations, the shutdown of operations meant that steam

injection stopped and as a consequence, with continued heat losses from the steam chambers to

the surrounding rock formations, the temperature of the steam chambers fell. This implies that

when the steam injection resumes, some fraction of the injected energy is consumed bringing the

steam chamber back to the temperature and pressure it was at prior to the shutdown. Here, we

examine temperature fall-off from steam chambers where injection has stopped by using a new

three-zone model with heat losses taking into account temperature-dependent thermal

conductivity. The model solves the transient heat conduction and steam condensing equations.

The results show that while the steam chamber collapses it heats the surrounding reservoir

through latent heat loss. A warmer reservoir leads to more mobile oil and less heat loss once

steam injection restarts.


A.2 Introduction

The performance and efficiency of oil sands thermal recovery processes such as Steam-Assisted

Gravity Drainage (SAGD) is tied to heat transfer within the reservoir. As illustrated in cross-

section in Figure A.1, when the steam enters the oil sands formation through a horizontal steam

injection well, it flows to the outer edge of the steam chamber and releases its latent heat which

in turn is conducted and convected into the colder oil sands at the chamber’s edge (Butler, 1991).

The bitumen is heated to the steam’s temperature and as a consequence, its viscosity drops by

over five orders of magnitude until it is less than 10 cP. The mobilized bitumen then drains

under gravity to a horizontal production well at the base of the steam chamber (Butler, 1991).

The key to successful performance of SAGD is the conformance of steam along the horizontal

well pair. Providing the well pairs are parallel to each other and reasonably horizontal, steam

conformance is controlled in turn by the geological properties and heterogeneity of the oil sands

formation. In particular, since bitumen mobilization provides pore space for steam to invade,

steam conformance is controlled by the ability of the bitumen to be both mobilized and drained

from the reservoir. Thus, the key underlying mechanism of steam conformance is heat transfer

from steam to bitumen and bitumen drainage as illustrated by Austin-Adigio and Gates (2018).

This also implies that a continuous supply of steam must be provided to the reservoir to enable

continuous mobilization, drainage, and production of bitumen.


Figure A.1: Cross sectional schematic of a typical SAGD process (Gotawala and Gates, 2010).

If steam injection stops, then the pressure of the steam chamber declines. If production were to

continue while steam injection was halted, the pressure would decline in the chamber simply

through the production of fluids from the chamber. The other consequence of a shutdown of

steam injection is that with the pressure declining, the saturation temperature of the steam would

also fall. Furthermore, with continued heat losses from the steam chamber to the surrounding

formation as well as to the overburden and understrata, together with the pressure decline, the

temperature of the steam chamber would drop. With sufficient time, the temperature of the

chamber would fall to the point that oil mobilization suffers and drainage declines lead to

potentially non-economic production. If long enough, to re-pressurize the chamber and get it

back to steam temperature may take an extended period of time of steam injection after the

shutdown period was done. A key question that arises is what is the evolution of the steam

chamber temperature after steam injection is stopped in the case with no production from the

chamber? Here, we examine this question.


A.3 Literature Review

On temperature decline analysis, there are very few papers that describe methods, applications,

or both. Duong (2008a) was the first to attempt using

temperature to model steam chamber development. Duong’s study introduced two novel

concepts of cooling time and heating ring and proposed a new technique that uses temperature

measurements taken during a shut in to estimate the effectiveness of conductive heating. The

mathematical model developed by Duong represents the horizontal well as a one-dimensional

radial-cylindrical heat flux equation for an infinite-acting formation. The model is considered

analogous to pressure transient analysis of a pressure build-up test. Temperature fall-off data

during shut-ins can be used to estimate the cooling time and thermal diffusivity of the target

formation.
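
Duong's radial heat-flux model has a standard thermal line-source analogue. A minimal sketch, assuming the usual late-time log approximation of the exponential integral and purely hypothetical property values (not data from Duong's study), shows how a semilog slope of temperature versus time recovers the formation conductivity, in direct analogy with pressure transient analysis:

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def line_source_temp(r, t, T_i, q_l, k, alpha):
    """Infinite-acting line-source solution of radial heat conduction,
    the thermal analogue of the pressure-transient line-source solution:
        T(r, t) = T_i + q_l/(4*pi*k) * E1(r^2 / (4*alpha*t)),
    using the late-time approximation E1(x) ~ -gamma - ln(x) for x << 1.
    q_l [W/m], k [W/m/K], alpha [m^2/s] are hypothetical values here."""
    x = r * r / (4.0 * alpha * t)
    e1 = -GAMMA - math.log(x)
    return T_i + q_l / (4.0 * math.pi * k) * e1

# The semilog slope is m = q_l / (4*pi*k), so two temperature readings
# at the same radius recover the conductivity:
T1 = line_source_temp(0.1, 1e5, 12.0, 80.0, 1.7, 8e-7)
T2 = line_source_temp(0.1, 1e6, 12.0, 80.0, 1.7, 8e-7)
m = (T2 - T1) / math.log(1e6 / 1e5)
k_est = 80.0 / (4.0 * math.pi * m)  # recovers k = 1.7 W/m/K
```

This mirrors how a build-up slope yields permeability in pressure transient analysis; fall-off (rather than heating) data require superposition of such solutions.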

Zhu et al. (2012) developed semi-analytical models that predict the change in steam chamber

temperature during a shut-in for a two-dimensional system in the cross-well plane. The models

were developed for both the circulation (startup) and early steam injection (ramp up) phases of

SAGD. The study also developed an inverse model which interprets actual temperature falloff

data at a given location along the wellbore to back calculate the steam chamber size. The inverse

model works by generating hundreds of temperature profiles corresponding to different steam chamber sizes and iterating until a generated temperature profile matches the input

profile. In 2015, Zhu et al. showed that plotting the temperature falloff derivative curve against

time resulted in a straight line for the early stage of SAGD. Using numerical simulation, they

developed a correlation between steam chamber size and temperature falloff data, and also


developed a practical method to estimate the steam chamber size variation along the horizontal

well bore. The models developed by Zhu et al. (2012) solve partial differential equations

analytically by using Laplace transforms. This approach, although accurate, limits the analysis to

simple linear systems where properties are constant. To deal with non-linear systems arising

from temperature-dependent properties, numerical methods such as the finite difference method

are required (Ketkar and Reddy 2003; Sarker and Ketkar 2013).

A.4 Temperature Transient Analysis

Temperature transient analysis uses the evolution of temperature to evaluate properties of a

system. For the case of SAGD, the decline of the temperature can be used to determine heat loss

rates or to determine the size of the steam chamber. Here, we derive a new transient temperature

numerical model that takes temperature-dependent thermal conductivity into account. During

the early life of a SAGD operation there are two types of stages which correspond to two types

of heating scenarios (Edmunds and Gittins, 1991). The first stage is the startup stage during

which steam is circulated in both well pairs to heat the surrounding formation via thermal

conduction. Circulation works by injecting steam into the tubing and circulating it out the

annulus. Once the bitumen around the wells has reached a sufficient temperature to be mobile

and thermal communication between the wells is established, the operation moves to Stage 2, known as ramp up. During ramp up, steam is injected into the formation and the steam

chamber expands upwards until it hits the overburden formation. While the steam chamber is

growing vertically the rate of oil production increases. After the steam chamber begins to expand

horizontally the production rate levels off, and conventional SAGD begins.


If a shut in occurs during the circulation phase the dominant form of heat loss will be via

conduction with no latent heat transfer occurring. If the shut in occurs during ramp up the steam

chamber will lose energy to the surrounding reservoir both through sensible and latent heat

transfer. Although convection will exist when the steam chamber collapses, it can be assumed that the dominant form is still conduction.

The mathematical model developed here follows the model first proposed by Zhu et al. (2012).

The models were built on the following assumptions:

1. Beyond the edge of the steam chamber, heat transfer via conduction is the dominant

mode of heat transfer.

2. All temperature zones are modelled as circles with radial temperature distributions. If no

steam is present, then the center of the circle is the highest temperature. If steam is

present, the shape of the steam chamber is assumed to be a circle with constant

temperature until the radius where the steam reaches the saturation curve with zero

quality, beyond which the temperature drops with increasing radius.

3. The steam chamber is modelled as a radial system in the cross-well plane.

4. The reservoir has uniform thermal properties.

A.4.1 Non-Condensing Model for SAGD Start up

During the startup phase of a SAGD operation, steam is circulated through the well to establish

thermal communication between the injector and producer and to mobilize the nearby oil.


In effect, the wells act as line heat sources. Since the steam is circulated into and out of the well,

no actual steam is injected into the reservoir so the SAGD process can be modeled as simple

conduction from the well to the reservoir. To model the start-up process of SAGD we employed

the non-condensing three-zone model first proposed by Zhu et al. (2012), depicted in Figure A.2.

Figure A.2: Schematic diagram of a three zone non-condensing model used for SAGD start

up (modified from Zhu et al., 2012).

The three temperature zones are bounded by three circles of increasing radius. The innermost

circle is the steam injection well which is at steam temperature. Zone 1 is the region

immediately surrounding the well and is also at the steam temperature. Zone 2 is saturated liquid

water; Zone 3, beyond its outer limit, is at reservoir temperature. The three-region model is also useful when modeling a steam chamber, since the hot inner zone contains wet steam and is therefore at saturation conditions, with its temperature equal to the steam temperature.


A.4.1.1 Mathematical form of non-condensing model

For heat transfer in a cylindrical geometry, the dimensionless heat transfer equation is given by

(Zhu et al., 2012):

$$\frac{\partial^2 T_D}{\partial r_D^2} + \frac{1}{r_D}\frac{\partial T_D}{\partial r_D} = \frac{\partial T_D}{\partial t_D}$$

where $T_D$, $t_D$, and $r_D$ are dimensionless temperature, time, and radius, defined by:

$$T_D = \frac{T - T_r}{T_s - T_r}, \qquad r_D = \frac{r}{r_w}, \qquad t_D = \frac{\alpha t}{r_w^2}$$

and Ts is the steam temperature, Tr is the initial reservoir temperature, rw is the wellbore radius, and α is the thermal diffusivity of the reservoir. The three zones have the following initial

conditions:

Zone 1 (hot zone): $T_D(r_D, t_D = 0) = 1$ for $0 < r_D < R_{D1}$

Zone 2 (intermediate zone): $T_D(r_D, t_D = 0) = \dfrac{\ln(r_D/R_{D2})}{\ln(R_{D1}/R_{D2})}$ for $R_{D1} < r_D < R_{D2}$

Zone 3 (cold zone): $T_D(r_D, t_D = 0) = 0$ for $R_{D2} < r_D$

The boundary conditions are as follows:

• Since all zones are in contact with each other the heat transfer rate is equal at the boundaries,

and

• The temperature at infinite radius is assumed to always be at Tr.
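As an illustration, the three-zone initial condition above can be populated on a discrete radial grid. The following Python sketch is not part of the thesis (which implemented the model in Excel); the function and variable names are illustrative assumptions:

```python
import numpy as np

def initial_profile(r_d, rd1, rd2):
    """Dimensionless initial temperature T_D for the three-zone model.

    r_d: array of dimensionless radii; rd1, rd2: zone boundary radii
    (R_D1 and R_D2). Illustrative sketch, not the thesis code.
    """
    t_d = np.zeros_like(r_d, dtype=float)
    hot = r_d <= rd1
    mid = (r_d > rd1) & (r_d < rd2)
    t_d[hot] = 1.0  # Zone 1: at steam temperature (T_D = 1)
    # Zone 2: logarithmic profile between R_D1 and R_D2
    t_d[mid] = np.log(r_d[mid] / rd2) / np.log(rd1 / rd2)
    # Zone 3 (r_d >= rd2) stays at 0, i.e. initial reservoir temperature
    return t_d

# Example grid: R_D1 = 2, R_D2 = 6 (assumed values for illustration)
r = np.linspace(0.1, 20.0, 200)
T0 = initial_profile(r, rd1=2.0, rd2=6.0)
```

The logarithmic Zone 2 profile equals 1 at $R_{D1}$ and 0 at $R_{D2}$, so the three zones join continuously.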


A.4.1.2 Numerical solution of the non-condensing model

The finite difference method is used to solve the radial heat transfer equation. The

discretization grid is shown in Figure A.3.

Figure A.3: Discretization domain for one-dimensional radial heat transfer - temperature grid

layout.

The finite difference method can be solved either explicitly or implicitly. The explicit method

uses a forward difference approach and solves for temperature at a new time step using the

temperature at the previous time step. The implicit method uses both the previous time step and

the current time step to solve for temperature at the current time step using an iterative

procedure. The model designed in this study uses the implicit Crank-Nicolson method to solve for temperature at the jth time step and ith radial node (Sarker and Ketkar, 2004). Since the non-condensing model assumes no latent heat transfer (condensation), the only form of heat transfer

is sensible heat transfer. The grid is built by first specifying RD1 and RD2 and the number of

nodes needed in the radial direction. Since RD1 and RD2 represent the size of the steam chamber

they are selected based on the length of steam injection prior to shut in and operator experience.


It was found that the node representing the “infinite” boundary condition is far enough away that results are invariant when the boundary radius is placed at r > 5R2.

The first radial node (i = 0) represents the center of the SAGD injector wellbore, and the grid is divided to contain the three zones described above. Populating the grid involves three steps:

1. Filling in the temperature for all nodes in the top row at the initial conditions (tD = 0): TD = 1 at the hot zone nodes and TD = 0 at the cold zone nodes.

2. Filling in the temperature for all nodes in the first (rD = 0) and last (rD → ∞) columns. The TD at the infinite boundary node is always 0, since it is assumed that no heat reaches it.

The finite difference equation for Node 0 is then (Sarker and Ketkar 2004):

$$T_0^{\,j} = \frac{(1 - 2P)\,T_0^{\,j-1} + 2P\,T_1^{\,j-1} + 2P\,T_1^{\,j}}{1 + 2P}$$

where $P = \Delta t / \Delta r^2$.

3. Solving for the temperature at the other nodes is given by (Sarker and Ketkar 2004):

$$T_i^{\,j} = \frac{T_i^{\,j-1}(1 - P) + \frac{P}{2}\left[\left(1 + \frac{1}{2i}\right)T_{i+1}^{\,j-1} + \left(1 + \frac{1}{2i}\right)T_{i+1}^{\,j} + \left(1 - \frac{1}{2i}\right)T_{i-1}^{\,j-1} + \left(1 - \frac{1}{2i}\right)T_{i-1}^{\,j}\right]}{1 + P}$$
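The two node formulas above can be sketched as a single implicit time step in Python, with the implicit (current-step) terms resolved by repeated Gauss-Seidel-style sweeps, as in the iterative procedure described earlier. This is an illustrative reimplementation, not the thesis spreadsheet code; names and the sweep count are assumptions:

```python
import numpy as np

def step_crank_nicolson(T_old, P, n_sweeps=50):
    """One time step of the radial Crank-Nicolson scheme, solved by
    fixed-point sweeps. P = dt / dr**2 on the dimensionless grid.
    Illustrative sketch under assumed names, not the thesis code.
    """
    T_new = T_old.copy()
    n = len(T_old)
    for _ in range(n_sweeps):
        # Node 0 (wellbore centre) uses the symmetric two-point formula
        T_new[0] = ((1 - 2*P)*T_old[0] + 2*P*T_old[1] + 2*P*T_new[1]) / (1 + 2*P)
        for i in range(1, n - 1):
            a = 1 + 1/(2*i)   # geometric weight toward larger radius
            b = 1 - 1/(2*i)   # geometric weight toward smaller radius
            T_new[i] = (T_old[i]*(1 - P)
                        + (P/2)*(a*(T_old[i+1] + T_new[i+1])
                                 + b*(T_old[i-1] + T_new[i-1]))) / (1 + P)
        T_new[-1] = 0.0       # "infinite" boundary held at reservoir temperature
    return T_new

# Example: a hot plug (dimensionless T = 1) cooling into cold surroundings
T = np.zeros(30)
T[:10] = 1.0
T = step_crank_nicolson(T, P=0.25)
```

Calling `step_crank_nicolson` repeatedly advances the dimensionless temperature field; the outer node pins the solution to reservoir temperature, as in the boundary condition above.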

A.4.2 Condensing Model for SAGD Ramp up

During the ramp up stage of SAGD, steam is injected into the reservoir and a steam chamber

grows within the reservoir. Since it loses its latent heat to the surrounding formation, steam


needs to be continuously injected to maintain growth. If a shut in occurs, heat losses cause the steam within the chamber to condense into liquid water. During condensation, the temperature of the steam stays constant and only the overall quality of the steam decreases. To model a shut in during the ramp up phase we used the three-zone condensing SAGD model first proposed by Zhu and Zeng (2014), depicted in Figure A.4. Since the dominant form of heat transfer beyond the chamber is conduction, the steam chamber can be modeled as a shrinking circle.

Figure A.4: Schematic diagram of a three-zone condensation model used for SAGD ramp up

(modified from Zhu and Zeng, 2014).

A.4.2.1 Mathematical form of condensing model

The governing equation for transient heat transfer, including the effect of latent heat, in 1D radial

coordinates is given by:


$$\frac{\partial h}{\partial t} = \frac{k_{TH}}{\rho}\left(\frac{\partial^2 T}{\partial r^2} + \frac{1}{r}\frac{\partial T}{\partial r}\right) - \lambda\,\frac{\partial f_{st}}{\partial t}$$

where h is the specific enthalpy, kTH is the thermal conductivity, ρ is density, λ is the specific

latent heat of vaporization, and fst is the steam quality. The three zones have the following

initial conditions:

Zone 1 (hot zone): $f_{st}(r_D, t_D = 0) = 1$ for $0 < r_D < R_{D1}$

Zones 2 (intermediate zone) and 3 (cold zone): $f_{st}(r_D, t_D = 0) = 0$ for $R_{D1} < r_D$

The boundary condition is as follows: the steam quality at infinite radius is equal to 0.

A.4.2.2 Numerical solution of the condensing model

The finite difference solution to the latent heat transfer problem is more complex and requires

two mirror grids. The first grid calculates the steam quality (fst) and the second grid calculates T.

Both grids communicate with each other and are considered “mirrors” since the steam quality at

the radial node i and time node j is directly linked to the enthalpy and temperature at the same

node. Latent heat transfer is done before sensible heat transfer so if the steam quality > 0 at a

particular node, then the temperature at that node is equal to the injected steam temperature Ts.

To test if there is still steam left at each new time step and to solve for the steam quality at the

new time step, two decision trees are required: one tree for the steam quality grid and another

for the temperature grid, as depicted in Figures A.5 and A.6.


Figure A.5: Decision tree for steam quality grid calculation.

Figure A.6: Decision tree for steam temperature grid calculation.

A thorough search of the literature revealed that an implicit finite difference formula has not

been developed to solve for the transient radial steam quality. For this study, we developed a Crank-Nicolson scheme by adapting the finite difference method for melting ice derived by


Zivkovic and Fujii (2001). The discretization domain for steam quality is shown in Figure

A.7.

Figure A.7: Discretization domain for one-dimensional radial phase change - steam quality

grid layout.

To populate both the temperature and steam quality grids, the following steps are taken:

1. Filling in the temperature for all nodes at initial conditions tD = 0. TD in the hot zone

nodes is equal to 1, TD in the cold zones is equal to 0.

2. Filling in the steam quality for all nodes in the top rows at initial conditions tD = 0.

3. Filling in the temperature for all nodes in the first (rD = 0) and last (rD → ∞) columns. The TD at the infinite boundary node is always equal to 0. The temperature at Node 0 is

calculated by using (Sarker and Ketkar 2004):

$$T_0^{\,j} = \frac{(1 - 2P)\,T_0^{\,j-1} + 2P\,T_1^{\,j-1} + 2P\,T_1^{\,j}}{1 + 2P}$$

4. The steam quality at the infinite boundary node is equal to 0. The steam quality at node 0

is calculated by using:


$$f_{st,0}^{\,j} = f_{st,0}^{\,j-1} + \frac{cP}{L}\left(T_1^{\,j-1} + T_1^{\,j} - T_0^{\,j-1} - T_0^{\,j}\right)$$

5. Using the decision trees in Figures A.5 and A.6, the steam quality and temperature at all other nodes are calculated. The “Latent Heat Calc” referenced in the decision trees is given

by:

$$f_{st,i}^{\,j} = f_{st,i}^{\,j-1} + \frac{cP}{2L}\left[\left(1 + \frac{1}{2i}\right)\left(T_{i+1}^{\,j-1} - T_i^{\,j-1} + T_{i+1}^{\,j} - T_i^{\,j}\right) + \left(1 - \frac{1}{2i}\right)\left(T_i^{\,j-1} - T_{i-1}^{\,j-1} + T_i^{\,j} - T_{i-1}^{\,j}\right)\right]$$

The finite difference model was encoded in Microsoft Excel. The code reads the user inputs

from the spreadsheet and simultaneously builds two finite difference grids, one for the

temperature and one for the steam quality. The grids are built by using three FOR loops: one for the spatial nodes, one for the time steps, and one for the iterations used in the implicit

method. The mirror grids communicate with each other using the decision trees in Figures A.5

and A.6.
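The mirror-grid coupling can be sketched in Python as follows. This is a simplified, single-sweep illustration of the decision-tree logic (latent heat consumed first, then sensible heat), not the thesis spreadsheet code; Node 0 and the implicit iterations are omitted for brevity, and `cP_over_L` stands for the group cP/L appearing in the quality formula:

```python
def condensing_step(T_old, f_old, P, cP_over_L):
    """One simplified time step of the mirror temperature/steam-quality
    grids. While a node still holds steam (f_st > 0) its dimensionless
    temperature stays pinned at 1 (steam temperature); once its quality
    reaches zero it switches to the sensible-heat (conduction) update.
    Illustrative sketch of Figures A.5 and A.6, not the thesis code.
    """
    n = len(T_old)
    T_new = list(T_old)
    f_new = list(f_old)
    for i in range(1, n - 1):
        a, b = 1 + 1/(2*i), 1 - 1/(2*i)
        if f_old[i] > 0.0:
            # "Latent Heat Calc": quality falls as heat conducts outward
            df = (cP_over_L/2) * (a*(T_old[i+1] - T_old[i] + T_new[i+1] - T_new[i])
                                  + b*(T_old[i] - T_old[i-1] + T_new[i] - T_new[i-1]))
            f_new[i] = max(f_old[i] + df, 0.0)
            T_new[i] = 1.0 if f_new[i] > 0.0 else T_old[i]
        else:
            # No steam left: ordinary sensible-heat update
            f_new[i] = 0.0
            T_new[i] = (T_old[i]*(1 - P)
                        + (P/2)*(a*(T_old[i+1] + T_new[i+1])
                                 + b*(T_old[i-1] + T_new[i-1]))) / (1 + P)
    T_new[-1], f_new[-1] = 0.0, 0.0   # infinite boundary: reservoir T, no steam
    return T_new, f_new
```

On a hot steam plug next to cold reservoir, quality at the chamber edge declines first while the edge temperature stays at steam temperature, mirroring the behaviour described above.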

A.4.3 Variable Thermal Diffusivity

One advantage of having a numerical solution over an analytical one is that non-linear complexity

can be added to the problem. The model developed by Zhu et al. (2012) assumes a constant

thermal diffusivity as derived in classic SAGD models such as those developed by Butler (1997)

and Gates and Sharma (2010). In reality, however, the thermal diffusivity of an oil sands

reservoir changes with temperature and does play a role in the temperature profile. Thermal


diffusivity α as defined below is the quotient of the thermal conductivity kH (W/mC) and

volumetric heat capacity Ms (J/m3C):

$$\alpha = \frac{k_H}{M_s}$$

Irani and Cokar (2014) explain that both density and heat capacity change negligibly with temperature, and that the change in thermal diffusivity with temperature is dominated by the change in thermal conductivity. Oil sands are a mixture of various components such as methane, steam, water, bitumen, and rock (quartz), which all have different thermal conductivities. Irani and Cokar (2014) describe that the thermal conductivities of gas and steam rise as temperature increases, that the thermal conductivities of bitumen and water change little, and that the thermal conductivity of reservoir rock drops as temperature increases.

The bulk thermal conductivity of an oil sand is driven primarily by the rock and secondly by the

water and oil saturation. The thermal conductivity of gas and steam are very small and can be

ignored. Irani and Cokar (2014) explain that oil sands thermal conductivity is essentially a linear

function of temperature that can be written in the following form:

$$k_H = a + b\phi + c\sqrt{S_w} + d\sqrt{S_o} - eT$$

where ϕ is the porosity, Sw is the water saturation, So is the oil saturation, T is temperature and a,

b, c, d, and e are constants. Since the first four terms of the equation remain essentially constant, the

equation can be simplified to:

$$k_H = k_{Href} - B\,(T - T_{ref})$$

where kHref is the formation thermal conductivity at a reference temperature Tref and B is a

constant. From Irani and Cokar, the overall thermal conductivity of oil sands decreases linearly

with temperature with a slope (B) that varies between 0.001-0.002 K-1 and a changing y-intercept


of 1.7-2.7 W/mC. For our study, the reference temperature was taken to be a typical SAGD steam temperature of 250C, and kHref was chosen as 1.8 W/mC. As suggested by Irani and Cokar (2014), the coefficient B was set at 0.002 K-1. The formula for thermal diffusivity is then given

by:

$$\alpha = \frac{k_H}{M_s} = \frac{k_{Href} - B\,(T - T_{ref})}{M_s} = \frac{1.8 - 0.002\,\big(T(\mathrm{C}) - 250\big)}{M_s}$$

A.4.3.1 Incorporation into Numerical Model

To incorporate variable thermal diffusivity into the numerical model the equations must be

modified. For the non-condensing model, the temperature of grid block i and j is a function of

the temperature in the surrounding grid blocks and the parameter P (a function of $\Delta t_D$ and $\Delta r_D^2$). Since $\Delta t_D$ is proportional to α, and α is now variable, P can be re-written as:

$$P = \frac{\Delta t}{\Delta r^2}\,\alpha = \frac{\Delta t}{\Delta r^2}\cdot\frac{1.8 - 0.002\,(T - 250)}{M_s}$$

$\Delta t$ and $\Delta r^2$ are both constant. Since the thermal conductivity changes little over small temperature differences, the P value used to calculate the temperature in grid block (i, j) is based on the temperature of the same grid block at the previous time step, (i, j−1):

$$P_i^{\,j} = \frac{\Delta t}{\Delta r^2}\cdot\frac{1.8 - 0.002\,\big(T_i^{\,j-1} - 250\big)}{M_s}$$
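As a sketch, the temperature-dependent P defined above can be computed with a small helper. This Python function is illustrative (the thesis implementation lives in Excel); the default constants follow the values quoted above, and keeping the units consistent is left to the caller:

```python
def p_variable(dt, dr, T_prev, Ms=3000.0, k_ref=1.8, B=0.002, T_ref=250.0):
    """Grid parameter P_i^j with temperature-dependent conductivity.

    T_prev is the same grid block's temperature (in C) at the previous
    time step, as in the equation above. Defaults follow the thesis
    values (k_ref = 1.8 at T_ref = 250 C, B = 0.002, Ms = 3000); the
    function and argument names are illustrative assumptions.
    """
    k_h = k_ref - B * (T_prev - T_ref)   # linear conductivity decline with T
    alpha = k_h / Ms                     # thermal diffusivity
    return dt / dr**2 * alpha
```

For example, a block that has cooled from 250C to 10C sees its conductivity rise from 1.8 to 2.28, so P grows and heat dissipates faster, consistent with the behaviour discussed for the variable-conductivity case.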

Consequently, to solve for the temperature in the i and j block, the following formulas are used:

For temperature:

$$T_i^{\,j} = \frac{T_i^{\,j-1}\left(1 - P_i^{\,j}\right) + \frac{P_i^{\,j}}{2}\left[\left(1 + \frac{1}{2i}\right)T_{i+1}^{\,j-1} + \left(1 + \frac{1}{2i}\right)T_{i+1}^{\,j} + \left(1 - \frac{1}{2i}\right)T_{i-1}^{\,j-1} + \left(1 - \frac{1}{2i}\right)T_{i-1}^{\,j}\right]}{1 + P_i^{\,j}}$$

For steam quality:


$$f_{st,i}^{\,j} = f_{st,i}^{\,j-1} + \frac{c\,P_i^{\,j}}{2L}\left[\left(1 + \frac{1}{2i}\right)\left(T_{i+1}^{\,j-1} - T_i^{\,j-1} + T_{i+1}^{\,j} - T_i^{\,j}\right) + \left(1 - \frac{1}{2i}\right)\left(T_i^{\,j-1} - T_{i-1}^{\,j-1} + T_i^{\,j} - T_{i-1}^{\,j}\right)\right]$$

A.5 Results and Discussion

A.5.1 Constant Thermal Diffusivity

Figure A.8 depicts how the radial temperature profile (in dimensional values) evolves during a

shut in that occurs while the SAGD operation is in the startup period. The hot zone radius R1 is

selected as 1 m and the intermediate zone radius R2 is taken to be 3 m. The well, steam and

reservoir properties used were reflective of typical parameters found for Athabasca oil sands

reservoirs where SAGD operations are practiced (rw = 0.084 m, Ts = 250C, Tr = 10C, kH =

1.8x10-3 kW/mC, Ms = 3,000 kJ/m3C). The results show that the temperature drops rapidly

after steam is stopped during the startup period. Within one month, the temperature at the well

has dropped to less than half its original value and after six months, the temperature is lower than

15% of the original temperature. This implies that if a SAGD well is in the startup period and if

steam injection is stopped, a substantial amount of heat will have to be injected to get the

interwell pair temperature distribution back to the target temperature (typically about 80C)

sufficient to mobilize bitumen from this part of the reservoir.


Figure A.8: Non-condensing model evolution of radial temperature profile.

Figure A.9 depicts how the radial temperature profile evolves during a shut-in event that

occurs while the SAGD operation is in the ramp up period. In this period, the hot inner zone is at

steam temperature. Here, the hot zone radius R1 is taken to be 1 m and intermediate zone radius

R2 is equal to 3 m. The steam quality fst is equal to 0.9 which is reflective of typical SAGD

operations. The well, steam and reservoir properties used were the same as the non-condensing

model listed above. Figure A.10 shows how the radial steam quality evolves over time. The

results show that after steam injection stops, the latent heat within the system is largely consumed first, and the sensible heat in the water is lost thereafter. After 6 months of cooling, the steam quality has dropped to a maximum of 0.3, with live steam occupying only a very small radius around the well. The results show that even though the hot and intermediate zone radii are the same for both the condensing and non-condensing models, the steam chamber takes longer to cool due to the extra energy stored as latent heat.



Figure A.9: Condensing model evolution of radial temperature profile.

Figure A.10: Condensing model evolution of radial steam quality profile.



Figure A.11 depicts how the temperature evolves with shut in time in the center of the wellbore

during the start-up and ramp up phases. The temperature during start up is always less than during ramp up since there is no latent heat transfer; however, its change is much more gradual since the heat has not had time to distribute throughout the reservoir. The temperature during the ramp up phase remained constant for about 6 weeks until the steam chamber fully condensed, and then dropped very suddenly; this is because the heat had over 6 weeks to distribute. This sudden drop is better visualized in Figure A.12, which shows the negative slopes of the temperature profiles versus time. The startup phase reached a maximum value of about 0.34C/hour, whereas the ramp up phase reached values of over 19C/hour the moment the steam chamber collapsed.

Figure A.11: Temperature change with time at the center of the well bore during start up and

ramp up.



Figure A.12: Slope of temperature plotted against time at the center of the well bore during start up and ramp up.

The temperature at the center of the wellbore during the ramp up phase leveled off around 67C, whereas the temperature during the start up phase kept decreasing to close to the initial reservoir temperature of 10C. This is because the latent heat lost from the steam had warmed the surrounding reservoir much more than sensible heat transfer did during start up. The temperature will continue to drop until it reaches 10C, but this will take orders of magnitude longer than sensible heat transfer.

A.5.2 Variable Thermal Diffusivity

Figure A.13 shows the results of incorporating the variable heat conductivity into the model.

From the results, it is clear that variable heat conduction plays a relatively small role in the

overall result, however it demonstrates the model’s ability to incorporate complexities. Figure



A.14 shows how a variable kh speeds up the cooling: the formation kh increases as temperature drops, causing heat to dissipate faster.

Figure A.13: Non-condensing radial temperature profile after 2 weeks for variable and constant

kh cases.

Figure A.14: Non-condensing temperature evolution at the center of the wellbore for variable

and constant kh cases.



A.6 Conclusions

In this study, a heat transfer model was developed to analyze how SAGD steam chambers

respond when steam injection is stopped. The model was solved numerically by using the finite

difference method. The results have shown that once the steam chamber has fully condensed the

temperature drops rapidly, but levels off at a higher value than the initial reservoir temperature.

While the steam chamber shrinks, it heats the surrounding formation through latent heat loss; this causes the temperature at the middle of the well bore to decline very gradually. Having a

higher reservoir temperature is helpful during SAGD restart since the reservoir takes less energy

to heat and the oil is more mobile. The effect of temperature-dependent thermal diffusivity is

relatively small and a constant thermal diffusivity provides a good estimate of the temperature

transients.