Expert Systems With Applications 88 (2017) 435–447
Data mining and machine learning for identifying sweet spots in shale
reservoirs
Pejman Tahmasebi a,∗, Farzam Javadpour b, Muhammad Sahimi c
a Department of Petroleum Engineering, University of Wyoming, Laramie, Wyoming 82071, USA
b Bureau of Economic Geology, Jackson School of Geosciences, The University of Texas at Austin, Austin, TX 78713, USA
c Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, CA 90089-1211, USA
Article info
Article history:
Received 12 February 2017
Revised 20 June 2017
Accepted 11 July 2017
Available online 14 July 2017
Keywords:
Neural networks
Genetic algorithm
Big data
Shale formation
Fracable zones
Brittleness index
Abstract
Due to its complex structure, production from a shale-gas formation requires more drilling than a traditional reservoir does. Modeling of such reservoirs and making predictions for their production also require highly extensive datasets. Both are very costly. In-situ measurements, such as well logging, are among the most indispensable tools for providing a considerable amount of information and data for such unconventional reservoirs. Production from shale reservoirs involves the so-called fracking, i.e. injection of water and chemicals into the formation in order to open up flow paths for the hydrocarbons. The measurements and any other types of data are utilized for making critical decisions regarding the development of a potential shale reservoir, as it requires an initial investment of hundreds of millions of dollars. The questions that must be addressed include: can the region under study be used economically for producing hydrocarbons? If the response to the first question is affirmative, then where are the best places to carry out hydro-fracking? Through the answers to such questions one identifies the sweet spots of shale reservoirs, which are the regions that contain high total organic carbon (TOC) and brittle rocks that can be fractured. In this paper, two methods from data mining and machine learning are used to aid in identifying such regions. The first method is based on a stepwise algorithm that determines the best combination of the variables (well-log data) to predict the target parameters. In order to incorporate more training, and to use the available datasets efficiently, a hybrid machine-learning algorithm is also presented that models more accurately the complex spatial correlations between the input and target parameters. Then, statistical comparisons between the estimated variables and the available data are made, which indicate very good agreement between the two. The proposed model can be used effectively to estimate the probability of targeting the sweet spots. In light of its automatic input and parameter selection, the algorithm does not require any further adjustment and can continuously evaluate the target parameters as more data become available. Furthermore, the method is able to optimally identify the necessary logs that must be run, which significantly reduces data acquisition operations.
© 2017 Elsevier Ltd. All rights reserved.
∗ Corresponding author.
E-mail addresses: [email protected] (P. Tahmasebi), [email protected] (F. Javadpour), [email protected] (M. Sahimi).
http://dx.doi.org/10.1016/j.eswa.2017.07.015

1. Introduction

Due to the recent progress in multistage hydraulic fracturing, horizontal drilling and advanced recovery methods, shales, recognized as unconventional reservoirs, have become a promising source of energy. They are formed by fine-grained organic-rich matters and were previously considered as source and seal rocks that, due to gas production and high pressures in the conventional reservoirs, were traditionally called the trouble zones. Such regions were usually ignored, which explains why no comprehensive data are available for them. Thus, their characterization is still a massive task (Tahmasebi, Javadpour, & Sahimi, 2015a,b; Tahmasebi, Javadpour, Sahimi, & Piri, 2016; Tahmasebi, 2017; Tahmasebi & Sahimi, 2015). Furthermore, shales exhibit highly variable structures and complexities from basin to basin, and even in small fields. They host very small pores, and have low matrix permeability and heterogeneity, both at the laboratory and field scales. Due to such difficulties, and given the fact that new methods for characterization of shale reservoirs are still being developed, application of the characterization and modeling methods for traditional reservoirs to shales is of great importance. In particular, accurate characterization of such reservoirs entails integrating various information, including petrophysical, geochemical, geomechanical, and reservoir data (Tahmasebi & Sahimi, 2016a,b;
Tahmasebi, Javadpour, & Sahimi, 2016; Tahmasebi, Sahimi, & An-
drade, 2017; Tahmasebi, Sahimi, Kohanpur, & Valocchi, 2016 ).
Apart from its type, efficient drilling may be thought of as tar-
geting the most productive zones of a reservoir with maximum
exposure. For shale reservoirs, this concept is equivalent to ar-
eas with high total organic carbon (TOC) and high fracability, i.e.
brittleness, which calls for comprehensive characterization of such
complex formations. High TOC and fracable index (FI) reflect high
quality of shale-gas reservoirs. The role of the TOC is clear, as it is
one of the main factors for identifying an economical shale reser-
voir. Higher TOC, ranging from 2% to 10%, represents richer organic
contents and, consequently, higher potential for gas production.
Since natural gas is trapped in both the organic and inorganic matter, the TOC denotes the total organic carbon content, and is a direct measure of the volume and maturity of the reservoir.
The FI influences the flow of hydrocarbons in a shale reservoir
and any future fracking in it. Thus, identifying the layers in a reser-
voir with high FI is of great importance. The FI controls a shale
reservoir’s production since it strongly influences the wells’ pro-
duction. It also provides very useful insight into where and how
new wells should be placed and spaced. In fact, unlike conven-
tional reservoirs that depend on long-range connectivity of the
permeable zones, optimal well locations and spacing control the
performance of shale reservoirs and future fracking operations in
them. Thus, separating the brittle and ductile zones of rock is a key
aspect of successful characterization of shale-gas reservoirs. More-
over, brittle shale has high potentials for being naturally fractured
and, consequently, exhibits good response to fracking treatments.
Past successful experience indicated that characterization of
shale reservoirs needs accurate identification of the so-called sweet
spots, i.e. the zones that present the best production or the poten-
tial for high production, and the potential fracable zones, which
are critical to maximizing the production and future recovery. The
placement of most of the wells is closely linked with the sweet
spots, as well as the fracable zones for hydraulic fracturing. For ex-
ample, the TOC represents the ability of a shale reservoir in stor-
ing and producing hydrocarbons. Fracability is controlled mainly
by mineralogy and elastic properties, such as the Young’s and bulk
moduli and the Poisson’s ratio ( Sullivan Glaser et al., 2013 ). There-
fore, identification of the sweet spots is of great importance to
shale reservoirs. Such spots are characterized through high kero-
gen content, low water saturation, high permeability, high Young’s
modulus and low Poisson’s ratio.
One of the primary, as well as most affordable, methods for
characterizing complex reservoirs is coring and collecting petro-
physical data, as well as well logs. The latter can be integrated
with the former in order to develop a more reliable model. In prin-
ciple, well-log data can be provided continuously and, thus, they
represent a real-time resource. Because of the huge number of wells
in a typical shale reservoir, we refer to such datasets as big data.
Clearly, such information is very useful when it is coupled with
some techniques that help better identify the sweet spots and fra-
cable zones. Eventually, the questions that must be addressed are: where should one, or should one not, drill new wells? Where are the zones with a high/low fracability index?
Aside from such critical questions, another issue regarding the
available big data is the fact that new data are continuously ob-
tained as the production proceeds. Thus, any algorithm for the
analysis of big data should be flexible enough for rapid adaptation
to new data. Furthermore, another important feature of the
algorithm should be its ability to use the available information to
create a “training platform” for forecasting the important parame-
ters.
In this paper, a very large database consisting of well logs, x-ray diffraction (XRD) data, and experimental core analysis is used to develop a model that reduces the cost and increases the probability of identifying the sweet spots. First, we describe a method of data mining called stepwise regression (Efromyson, 1960; Montgomery, Peck, & Vining, 2012) for identifying the correlations between the target (i.e., dependent) parameters, the TOC and FI, and the available well-log data (the independent variables). Then, a hybrid method borrowed from machine learning and artificial intelligence is proposed for accurate predictions of the parameters. Both methods can be tuned rapidly, and can use the older database to accurately characterize shale reservoirs.
2. Methodology

As mentioned earlier, two very different methods are used in this paper. The first is borrowed from the data-mining field, by which the correlation between an independent variable and a series of dependent variables is constructed and used for future forecasting. Next, a method of machine learning for developing a more robust model is introduced that recognizes the complex relations between the variables.
2.1. Linear regression models based on statistical techniques

Since we deal with a multivariate dataset, multiple linear regression (MLR) is selected to model the relationship between the independent and response variables, the TOC and FI. Suppose that the response variable y is related to k predictor variables. Then, as is well known,

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon ,   (1)

represents a MLR model that contains k predictor variables. Here, ε is the model error and the β_j with j = 0, 1, ..., k are the partial regression coefficients. In practice, the β_j and the variance of the model error (σ²) are not known a priori. The parameters may be estimated using the sample data.
In the k-dimensional space of the predictor variables, Eq. (1) represents a hyperplane. The expected change in the response y as a result of a change in x_j is contained in the coefficient β_j. It should be noted that the variation of y is subject to holding all the remaining variables x_i (i ≠ j) constant. More complex functional forms can still be included and modeled with a slight modification of Eq. (1). For example, consider the following equation:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2^3 + \beta_3 e^{x_3} + \beta_4 x_1 x_2 + \varepsilon .   (2)

If we let \Phi_1 = x_1, \Phi_2 = x_2^3, \Phi_3 = e^{x_3} and \Phi_4 = x_1 x_2, then Eq. (2) is written as

y = \beta_0 + \beta_1 \Phi_1 + \beta_2 \Phi_2 + \beta_3 \Phi_3 + \beta_4 \Phi_4 + \varepsilon ,   (3)

which is again a MLR model. Thus, Eq. (3) may be used to predict the response variable using the new predictor variables. The coefficients of Eq. (1) are estimated using a least-squares curve fit.
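To make the transformation above concrete, here is a minimal sketch (ours, not the authors' code) that builds the transformed predictors of Eq. (2) and estimates the coefficients of the resulting MLR model of Eq. (3) by least squares; the data, the "true" coefficients and the noise level are synthetic assumptions.

```python
import numpy as np

# Synthetic illustration of Eqs. (2)-(3): transform the raw predictors,
# then estimate the partial regression coefficients by least squares.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0.0, 2.0, n)
x2 = rng.uniform(0.0, 2.0, n)
x3 = rng.uniform(0.0, 1.0, n)

# Phi1 = x1, Phi2 = x2^3, Phi3 = exp(x3), Phi4 = x1*x2 (plus an intercept)
Phi = np.column_stack([np.ones(n), x1, x2**3, np.exp(x3), x1 * x2])

beta_true = np.array([0.5, 1.0, -0.3, 0.8, 0.2])   # assumed "true" betas
y = Phi @ beta_true + rng.normal(0.0, 0.01, n)     # model error eps

beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None) # least-squares fit
print(np.round(beta_hat, 2))
```

Because the model is linear in the β's, the nonlinear terms x₂³, e^{x₃} and x₁x₂ pose no difficulty for the least-squares fit.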
2.1.1. Evaluating sub-model significance

Clearly, in the presence of multiple variables, selecting the most important ones, so as to reduce the complexity of a predictive model without degrading its performance, is of great importance. In other words, we are interested in identifying the best subset of parameters that maintains the variability while minimizing the complexity. Thus, various subsets must be compared and, then, one must decide which one is more representative and accurate than the others. One common criterion for model adequacy is the coefficient of multiple determination, R². Thus, for a model with p variables, R² is calculated by

R_p^2 = \frac{SS_R(p)}{SS_T} = 1 - \frac{SS_{Res}(p)}{SS_T} ,   (4)
where SS_{Res}(p) and SS_R(p) indicate, respectively, the residual sum of squares and the regression sum of squares. A higher p results in a higher R_p^2. However, an adjusted coefficient of multiple determination that is more transparent is defined by

R_{Adj,p}^2 = 1 - \left( \frac{n-1}{n-p} \right) (1 - R_p^2) ,   (5)

where n is the number of observations. It should be noted that R_{Adj,p}^2 does not necessarily increase by adding more variables: R_{Adj,p+s}^2 exceeds R_{Adj,p}^2 if and only if the partial F statistic exceeds 1 (Edwards, 1969; Haitovsky, 1968; Seber, George, & Lee, 2003). The partial F statistic is used for evaluating the significance of adding a new variable s. Thus, one can select the sub-model that maximizes R_{Adj,p}^2.
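A direct transcription of Eqs. (4) and (5) may be sketched as follows; this is our own illustration, with the adjusted coefficient implemented exactly as Eq. (5) is written above, and the sample values invented for demonstration.

```python
import numpy as np

# Sketch of Eqs. (4)-(5): R^2 and adjusted R^2 for a p-variable sub-model.
def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)       # SS_Res(p)
    ss_t = np.sum((y - y.mean()) ** 2)      # SS_T (total sum of squares)
    return 1.0 - ss_res / ss_t              # Eq. (4)

def adjusted_r_squared(r2, n, p):
    # Eq. (5): penalizes sub-models that add variables with little benefit
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # illustrative observations
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.0])  # illustrative fitted values
r2 = r_squared(y, y_hat)
print(round(adjusted_r_squared(r2, n=len(y), p=2), 3))
```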
2.1.2. Variable selection

Variable selection can be computationally expensive, as one must determine the result of adding/eliminating each variable one by one. There are, however, some efficient methods for doing so that are briefly described below.

- Forward selection

This method begins by adding the first variable that has the maximum correlation with the response variable. Then, if the statistic F exceeds F_in, the variable is added to the model. Note that F_in is used as a threshold, and F is calculated by

F = \frac{(SS_{Res_1} - SS_{Res_2})/(p_2 - p_1)}{SS_{Res_2}/(n - p_2)} .

In a similar fashion, the next variable is added to the model if it carries the maximum correlation after the previously added variable. Once again, the statistic F is compared with F_in, and the variable is added to the model if F exceeds the threshold. The process continues until F does not exceed the predefined threshold, or no variable is left.

- Backward selection

Backward variable elimination acts exactly as the opposite of forward selection. It first incorporates all the n variables in the model and, then, removes them one by one according to some criterion. The elimination is carried out using the elimination statistic F_out: the minimum statistic among the variables is compared with F_out, and the corresponding variable is eliminated if its statistic is less than F_out. The procedure is then repeated for the remaining n − 1 variables, until the smallest statistic F is not less than F_out.

- Stepwise selection

The above algorithms are computationally expensive when one deals with a large dataset, as is the case for most unconventional reservoirs. Instead, one may use stepwise selection, which is an efficient method to identify the best combination of the variables (Efromyson, 1960). Similar to the backward algorithm, in stepwise selection the independent variables are incorporated into the model one at a time and are re-evaluated using a pre-defined F. After adding a new variable, the statistic F_out is compared with the pre-defined criterion to decide whether some of the existing variables can be eliminated without losing accuracy. That this is possible is due to one of the previously incorporated variables losing its significance when one or more new variables are added to the model. The algorithm terminates when either the defined statistic F is maximized, or the improvement from adding any new variable does not exceed the pre-defined F. It also should be noted that both F_in and F_out are used in this method. Generally, one can set F_out < F_in in order to allow a variable to be added more easily. It should be noted that one may also benefit from the recently-developed algorithms for forward selection (Cernuda et al., 2013; Cernuda, Lughofer, Märzinger, & Kasberger, 2011; Serdio et al., 2017).
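The forward-selection loop described above can be sketched as follows; this is a simplified illustration under our own assumptions (OLS with an intercept, a fixed F_in threshold, synthetic data), not the authors' implementation.

```python
import numpy as np

def ss_res(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def forward_select(X, y, f_in=4.0):
    """Add variables one at a time while their partial F exceeds f_in."""
    n, k = X.shape
    chosen = []
    while len(chosen) < k:
        best = None
        ss1, p1 = ss_res(X[:, chosen], y), len(chosen) + 1
        for j in range(k):
            if j in chosen:
                continue
            ss2, p2 = ss_res(X[:, chosen + [j]], y), p1 + 1
            # Partial F statistic for adding variable j (cf. the formula above)
            F = ((ss1 - ss2) / (p2 - p1)) / (ss2 / (n - p2))
            if best is None or F > best[0]:
                best = (F, j)
        if best is None or best[0] < f_in:
            break                       # no candidate clears the threshold
        chosen.append(best[1])
    return chosen

# Synthetic data: only columns 0 and 2 carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0.0, 0.1, 100)
print(forward_select(X, y))
```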
2.2. Non-linear models based on soft computing architectures

The method described above can be used when one expects to extract linear relationships between the available variables. Such methods are, however, incapable of discovering the non-linear relationships that we encounter frequently in the characterization of complex shale reservoirs. Thus, in this section we demonstrate the ability of non-linear models based on soft computing for extracting such nonlinear relationships.

The main purpose of machine-learning algorithms is to learn from the existing data and evidence, so as to be able to make predictions. Thus, most of such approaches rely on the input data, which is why they are classified as data-driven techniques. Such methods have been used widely in earth science problems, including permeability/porosity estimation (Karimpouli & Malehmir, 2015), grade estimation (Tahmasebi & Hezarkhani, 2011), contaminant modeling (Rogers & Dowla, 1994), and predicting the temperature distribution and methane output in large landfills, which are essentially large-scale porous media in which biodegradation occurs (Li et al., 2011; Li, Qin, Tsotsis, & Sahimi, 2012; Li, Tsotsis, Sahimi, & Qin, 2014). Due to the considerable complexity and unpredictable spatial correlations between the available data for shale reservoirs, in this paper we use the fuzzy logic approach, first proposed by Zadeh (1965), and combine it with two more machine-learning algorithms, namely, neural networks (NNs) and the genetic algorithm (GA), in order to increase the model's predictive power. What follows is a brief description of the techniques.
2.2.1. Fuzzy inference system

Fuzzy set theory, or fuzzy logic (FL), was first introduced by Zadeh (1965). Unlike Boolean logic, which has only two outcomes, namely, true (1) or false (0), the truth value of a variable may lie anywhere in [0, 1]. In other words, the FL allows one to define partial set membership (see below). Thus, even if a set of vague, ambiguous, noisy, imprecise, and incomplete input information is made available, the FL can still produce a valid response.

Briefly, the FL tries to identify a connection between the input and output through defining some if-then rules, and is composed of some elements that are described here. Typically, the input and output data are presented using a membership function, represented by a curve. The curve attributes some weights to the input and defines how it can be mapped onto the output. The rules are the instructions that define the conditional statements. They, along with the membership functions, map the input onto the output. Finally, the interdependence of the input and output is defined through some fuzzy propositions, or the so-called FL operators. A recent comprehensive review of the FL is given by Gomide (2016).

The aforementioned elements of the FL are typically difficult to set. For example, the membership functions, their distributions and the composition of the fuzzy rules strictly control the FL performance, but are difficult to set. Thus, as a first solution, such parameters may be defined through exhaustive trial and error. In addition, prior knowledge and experience of the problem under study can also help in defining more accurate if-then rules. In fact, ultra-tight shale gas reservoirs are currently under extensive study and, thus, some of their complexities are not even discovered or understood yet. Thus, defining such elements for shales may be very misleading. To this aim, the NNs are employed to alleviate such shortcomings and define the necessary parameters in a more meaningful way. Another solution can be the use of the GA in the FL (Cordón, Gomide, Herrera, Hoffmann, & Magdalena, 2004). It should be noted that there are many other techniques that deal with fuzzy system extraction and learning from data (Abonyi, 2003; Lughofer, 2011; Pedrycz & Gomide, 2007), but they are not discussed in this paper.
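As a minimal illustration of partial set membership, the sketch below evaluates a triangular membership function; the function shape, the "high GR" label and all breakpoints are our illustrative assumptions, not elements of the authors' system.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Degree to which a gamma-ray reading of 95 API units belongs to the
# (hypothetical) fuzzy set "high GR" with support [60, 180] and peak 120.
mu_high_gr = triangular(95.0, 60.0, 120.0, 180.0)
print(round(mu_high_gr, 3))
```

A reading thus belongs to "high GR" to a degree between 0 and 1, rather than being classified as strictly high or not high, which is exactly the partial membership described above.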
2.2.2. Integrated neural networks and fuzzy logic

The NNs have been inspired by the human brain, albeit with significantly less complexity, and can be used for modeling complex and highly nonlinear problems whose solutions by the common statistical or analytical methods are not straightforward (Bishop, 1995) or impossible to derive. Briefly, a NN is composed of some hierarchical nodes called neurons, which are trained using some data. The inputs are mapped onto the output space through some weights (or functions) that are initially generated randomly. Then, the weights are updated using the error produced in the output. The process, called training, continues until the error is smaller than a threshold. Next, the resulting NN, or more precisely the mathematical functions derived by it, is used for making predictions. Back propagation is one of the most popular learning algorithms for the NN. It is used in conjunction with gradient descent or the Levenberg–Marquardt method, two optimization algorithms by which the weights are changed iteratively to minimize the following error, in order to determine their optimal values:
E = \frac{1}{2} \sum_i (outp_i - prd_i)^2 ,   (6)

where outp_i is the actual output and prd_i is the predicted value. Minimizing E entails setting to zero its derivatives with respect to the parameters w_{ij}, in order to decide how to vary w_{ij} for two connected nodes (i, j). Thus,

w_{ij} := w_{ij} + \Delta w_{ij} ,   (7)

where

\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} ,   (8)

in which η is a learning rate. In order to prevent the optimization method from being trapped in a local minimum of E, as opposed to its true global minimum, one can use the following approach. Suppose that

\varphi_j = -\frac{\partial E}{\partial NN_j}   (9)

and

\xi_j = -\frac{\partial NN_j}{\partial w_j} ,   (10)

then Δw_{ij} is evaluated by

\frac{\partial E}{\partial w_{ij}} = \frac{\partial NN_j}{\partial w_j} \times \frac{\partial E}{\partial NN_j} = \varphi_j \xi_j .   (11)

We then have

\Delta w_{ij} = \eta \varphi_j \xi_j .   (12)

Since

\frac{\partial E}{\partial \xi_j} = -(outp_i - prd_i)   (13)

and

\frac{\partial \xi_j}{\partial NN_j} = f'(NN_j) ,   (14)

φ_j is also evaluated by

\varphi_j = \frac{\partial E}{\partial NN_j} = \frac{\partial E}{\partial \xi_j} \times \frac{\partial \xi_j}{\partial NN_j} = (outp_i - prd_i) f'(NN_j) .   (15)
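The weight update of Eqs. (6)-(8) can be illustrated with a toy single-neuron example; the sigmoid activation, input values, target and learning rate below are our own assumptions, not the network used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=3)                 # weights, initially random
x = np.array([0.2, -0.1, 0.4])         # one input sample
target = 0.7                           # desired output (outp)
eta = 0.5                              # learning rate (eta)

errors = []
for _ in range(2000):
    prd = sigmoid(w @ x)               # predicted value (prd)
    E = 0.5 * (target - prd) ** 2      # Eq. (6) for a single sample
    # dE/dw = -(outp - prd) * f'(NN) * x, with f'(NN) = prd * (1 - prd)
    grad = -(target - prd) * prd * (1.0 - prd) * x
    w = w - eta * grad                 # Eqs. (7)-(8): w := w + dw, dw = -eta*dE/dw
    errors.append(float(E))

print(errors[0], errors[-1])
```

Repeating the update drives the error of Eq. (6) toward zero, which is the behavior the training loop above describes.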
Thus, an approach based on integrating the NN and FL can address the aforementioned issues in defining the latter's parameters (Jang, 1992, 1993). The integrated system benefits from the training ability of the NN for minimizing the generated error in the output. Thus, through the optimization loop in the NN, the FL parameters are tuned. In this fashion, the strengths of the FL qualification and the NN quantification are efficiently integrated.

Suppose that n and 1 variables are available as the input (inpt) and output (out), respectively, as is the case in this paper. Then, the first-order Sugeno fuzzy system (Tanaka & Sugeno, 1992) is defined by

r_j: If inpt_1 is \Phi_1^1, inpt_2 is \Phi_2^1, \dots, and inpt_j is \Phi_i^j, then

f_r = \sum_{i=1}^{r} \sum_{j=1}^{n} \beta_i^j \, inpt_j + \varepsilon_i ,   (16)

where \Phi_i^j and \beta_i^j are, respectively, the ith fuzzy membership set and the consequent parameter for the jth input. f_r indicates the consequence part of the rth rule. In this study, the Sugeno model is used, in which the membership functions and the weight adjustment are denoted by the premise and consequent parts, respectively; see Fig. 1. The internal operations for the input mapping, weight adjustment and consequent parameter integration are demonstrated as well.
As already mentioned, the NN is integrated with the fuzzy system shown in Fig. 1. The new integrated architecture is shown in Fig. 2 (Jang, 1993). The resulting hybrid machine learning (HML) contains various elements in each layer that are as follows. The inputs are denoted by L_1, which in this study are various well logs. L_2 represents the layer that contains the membership functions \Phi_i^j, and measures the degree to which inpt_j satisfies the quantifier:

outp_i^j = \Phi_i^j(inpt_j) .   (17)

L_3, the third layer, multiplies the input data and produces the firing weights for each rule r, as follows:

\omega_i = t(\xi_{\Phi_1^1}, \dots, \xi_{\Phi_i^j}) .   (18)

L_4, the fourth layer, normalizes each firing rule based on the sum of all the previously calculated weights:

\bar{\omega}_r = \frac{\omega_r}{\sum_{i=1}^{r} \omega_i} .   (19)

Next, the calculated normalized firing strengths are used in the next layer, L_5, to generate the adaptive function by

\bar{\omega}_r f_r = \bar{\omega}_r (\beta_r^1 \, inpt_1 + \beta_r^2 \, inpt_2 + \dots + \beta_r^n \, inpt_n + \varepsilon_r) .   (20)

The overall output is then calculated in the sixth layer, L_6, as

\sum_{i=1}^{r} \bar{\omega}_i f_i = \frac{\sum_{i=1}^{r} \omega_i f_i}{\sum_{i=1}^{r} \omega_i} .   (21)

Finally, the response, calculated from the previous layer, appears in the last layer, L_7.

Clearly, some of the aforementioned parameters, such as the membership functions, are vital for an optimal hybrid network. A hybrid learning procedure is applied to the architecture of Fig. 2 that estimates the premise and the consequent parameters and, consequently, allows capturing the complexity and variability between the input and output data. Thus, based on the back-propagation error, the parameters in the premise part are first kept fixed in order to compute the error. Next, by holding fixed the parameters in the consequent part, the premise's parameters are updated using the gradient-descent method. Finally, the membership functions and their consequent weights are optimized.

Although the NN assists the FL in optimal parameter selection, there are still some parameters that need to be optimized. For example, the NN cannot decide how many membership functions are needed. In addition, due to the use of two user-dependent parameters, the learning rate and the momentum coefficient, the NN itself may be trapped in a local minimum. It is worth mentioning that momentum adds a small friction to the previous weights
Fig. 1. Demonstration of first-order Sugeno fuzzy model along with various membership functions and inputs.
Fig. 2. The integrated neural-fuzzy architecture.
and updates to the current state. This parameter prevents the network from converging to a local minimum. A high value can overshoot the minimum and, likewise, a lower value cannot avoid local minima and causes the computations to take too long. To address such issues, a third method, namely, the GA, is integrated with the neural-fuzzy structure to help determine the parameters. Briefly, due to the presence of various user-dependent and time-demanding parameters in the NN, such as its number of layers, the number of membership functions in the FL and the learning parameters (e.g., the momentum and rate), the GA can be used to accelerate the above trial-and-error procedure. In other words, defining the aforementioned parameters requires a considerable amount of time, whereas the GA can efficiently and systematically identify the optimal values.
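A compact sketch of the forward pass through layers L_2 to L_6 (Eqs. (17)-(21)) for two rules and two inputs may look as follows; the Gaussian membership functions, the product t-norm and all parameter values are our illustrative assumptions, not the trained system of the paper.

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership function with center c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def sugeno_forward(inpt, centers, sigmas, betas, eps):
    mu = gauss(inpt, centers, sigmas)        # L2: membership degrees, Eq. (17)
    w = mu.prod(axis=1)                      # L3: firing weights (product t-norm), Eq. (18)
    w_bar = w / w.sum()                      # L4: normalized strengths, Eq. (19)
    f = betas @ inpt + eps                   # L5: consequents f_r, Eq. (20)
    return float(np.sum(w_bar * f))          # L6: weighted overall output, Eq. (21)

inpt = np.array([0.5, 1.0])                  # two inputs (e.g., two well logs)
centers = np.array([[0.0, 0.0], [1.0, 1.0]]) # one row of centers per rule
sigmas = np.ones((2, 2))
betas = np.array([[1.0, 0.0], [0.0, 1.0]])   # consequent parameters beta
eps = np.array([0.0, 0.0])
out = sugeno_forward(inpt, centers, sigmas, betas, eps)
print(round(out, 3))
```

The output is a firing-strength-weighted blend of the two rule consequents, which is precisely the averaging performed in layers L_4 and L_6 above.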
2.2.3. The genetic algorithm

The GA is one of the most successful optimization algorithms and can deal with a variety of problems that are not feasible, both computationally and in terms of accuracy, to be addressed using many other optimization techniques. For example, the GA can be used when the objective function (the energy E to be minimized) is discontinuous, non-differentiable, highly nonlinear, and even stochastic (Goldberg & Holland, 1988). The technique is built on an initial population of candidates and is inspired by natural selection. Each candidate has some specific properties, i.e., chromosomes, which can be modified, e.g., mutated, based on the existing evidence. Thus, the aim is to modify the initial population so as to reach an optimal solution that minimizes E globally. At each step, the chromosomes (the parameters to be estimated in the present work) are randomly selected from the current parents, i.e., the population. Then, "children" are generated for the next population. After some initial trial and error, the GA identifies the potentially useful chromosomes, so that the children will produce smaller errors.

The GA uses three types of rules to create the next generation from the current population, which are as follows (Deb & Agrawal, 1999): (i) mutation, an operation that changes the value of chromosomes; (ii) selection, which chooses the parents from the current children in order to produce the next generation; and (iii) crossover, an operation that combines the chromosomes (the parameters) to increase the variability and prevent the energy E from getting trapped in a local minimum.

In the first step, all the variables are represented by binary strings of 0 and 1. Each chromosome contains several genes, by which various possible values for each free parameter that needs to be optimized are examined. Thus, an initial population is generated that contains various parameters of the FL and NN, such as the number of inputs (i.e., well logs), the number of membership functions, the learning rate, and the momentum coefficient. The candidates are tested and updated progressively by the GA, until the optimal values are obtained.
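The selection-crossover-mutation loop can be sketched on a toy problem as follows; the 8-bit encoding, the operator rates and the quadratic energy are our illustrative assumptions, not the configuration used in the paper.

```python
import random

# Toy GA: minimize the energy E(x) = (x - 21)^2 over 8-bit chromosomes.
random.seed(0)
BITS = 8

def decode(chrom):
    """Map a binary string to the integer parameter it encodes."""
    return int(chrom, 2)

def energy(chrom):
    """Objective E to be minimized (global minimum at x = 21)."""
    return (decode(chrom) - 21) ** 2

def mutate(chrom, rate=0.05):
    """Flip each gene with a small probability."""
    return "".join(b if random.random() > rate else "10"[int(b)] for b in chrom)

def crossover(a, b):
    """Single-point crossover of two parent chromosomes."""
    cut = random.randrange(1, BITS)
    return a[:cut] + b[cut:]

pop = ["".join(random.choice("01") for _ in range(BITS)) for _ in range(30)]
for _ in range(60):
    pop.sort(key=energy)
    parents = pop[:10]                 # selection: keep the fittest candidates
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    pop = parents + children           # next generation

best = min(pop, key=energy)
print(decode(best), energy(best))
```

In the paper's setting, the decoded integer would instead index a candidate value of an FL/NN hyperparameter (e.g., the number of membership functions), and E would be the network's prediction error.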
Fig. 3. (a) The wireline well-log data used in this study, and (b) mineral compositions from the XRD analysis.
3. Results and discussion

As discussed, due to the extensive variability of shale reservoirs, an extensive amount of information is required for their characterization, which in turn helps to reduce the uncertainty and improve real-time recovery operations. Thus, since well logs provide useful information, and at the same time are widely available, the objective of this study is to use such data to predict two important properties of shales, namely, the TOC and FI. Clearly, no single well log can by itself predict the two indices, as there are always very complex correlations between the well logs (Dashtian, Jafari, Koohi Lai, Masihi, & Sahimi, 2011; Karimpouli & Tahmasebi, 2016), which in some cases are mixed with noise. Another important goal of this study is to develop a methodology for deciding on the necessary logs that must be run. In other words, a significant reduction in data acquisition efforts can be achieved, once the correlation between the two indices and the available data is identified. Together,
Fig. 4. Mineralogical composition of the samples used in the analysis.
he steps described below quantify the two important properties of
hale reservoirs in real time.
There are several ways of defining and measuring brittleness
Zoback, 2007 ). One of the most common methods is based on us-
ng the elastic properties, such as the Young’s modulus and the
oisson’s ratio. The latter can be thought of as the rock ability to
ail under stress, whereas the ability of rock to resist fracturing
an be quantify using the Young’s modulus. Rickman, Mullen, Pe-
re, Grieser, and Kundert (2008) proposed equations for estimating
he brittleness index using well logs.
In this paper, the FI is defined using mineralogical contents. This is motivated by the fact that fracability is mostly controlled by the hard mineral contents of the rock, such as quartz, calcite and pyrite, and, thus, the XRD data for such minerals are used as follows (Jarvie, Hill, Ruble, & Pollastro, 2007):

FI = Qz / (Qz + Cal + Cly)    (22)

where Qz, Cal and Cly are the quartz, calcite and clay contents, respectively. The FI defined by Eq. (22) and the TOC were used as the dependent (i.e., target) parameters.
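Eq. (22) is straightforward to apply to XRD data; a minimal sketch (the mineral fractions in the example are made up for illustration, not taken from the paper's samples):

```python
def fracability_index(qz, cal, cly):
    """FI from XRD mineral contents, Eq. (22): quartz over quartz + calcite + clay."""
    return qz / (qz + cal + cly)

# e.g. a quartz-rich sample (illustrative weight fractions)
print(fracability_index(qz=0.45, cal=0.30, cly=0.25))  # 0.45 for this sample
```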
3.1. Linear regression models based on statistical techniques

The mineralogical analysis and geochemical data for defining the FI were used; their distribution is depicted in Fig. 4, where the sum of the quartz, clay and carbonate minerals is displayed on each vertex. The plot indicates that the samples have a higher carbonate content.

On the other hand, well-log data are also available. The spatial relationships between the mineralogy, the FI and the gamma-ray (GR) log are presented in Fig. 5. As expected, the GR log and the FI positively correlate with the Qz content.
Before applying the MLR analysis, one might also check the linear dependence (i.e., pairwise correlation) between the dependent variables FI and TOC, on the one hand, and the available well logs, on the other. Such relationships were studied, and the results are presented in Table 1. The correlations between the FI and the sonic (DT), density (PE), resistivity (RT10) and Ur logs are stronger than those with the other logs, whereas the overall correlation between the TOC and the well logs is strong. For example, there is considerable correlation between the TOC, the GR and the DT. Strong correlation between the TOC and the GR is expected, since most of the gas in the samples is trapped in the organic matter.
One can also investigate more complex correlations between three variables. Such correlations are shown in Figs. 6 and 7 for the FI and TOC evaluations, respectively. A weak correlation between the first and second variables and the FI can be seen. Such weak correlations are not sufficient when fitting a 3D surface. For example, a considerable number of mismatches between the fitted surfaces and the selected variables are visible in both cases. Clearly, the correlations with one or two variables do not reveal significant information, and cannot be relied upon. Thus, in order to develop a more robust and accurate model, the proposed MLR is used, which optimally selects the informative variables. The stepwise algorithm is applied to select the appropriate variables among the initial well-log data for estimating the TOC and FI. For this aim, the threshold F_in for entering any new variable was set to 0.1 with a confidence interval of 90%, while the threshold F_out for removing a variable was set to 0.15. Thus, the following model for estimating the FI was obtained:

FI = 0.004 PE + 0.001 Thor + 0.032    (23)

The correlation coefficient for Eq. (23) is 0.44, which is very low and indicates that the MLR is not able to accurately predict the FI.

On the other hand, the equation for the TOC has a correlation coefficient of 0.88, hence providing reliable predictions for the TOC based on the well-log data, and is given by

TOC = 0.016 GR + 0.087 DT − 1.380 K − 4.020    (24)
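The two stepwise models can be written directly as functions; a sketch for illustration only (the inputs must be on the same scale as the paper's log data for the coefficients to be meaningful):

```python
# Stepwise-selected MLR models reported in the text (Eqs. 23 and 24).
def fi_mlr(pe, thor):
    """FI estimate from the PE and Thor logs, Eq. (23); R = 0.44 in the paper."""
    return 0.004 * pe + 0.001 * thor + 0.032

def toc_mlr(gr, dt, k):
    """TOC estimate from the GR, DT and K logs, Eq. (24); R = 0.88 in the paper."""
    return 0.016 * gr + 0.087 * dt - 1.380 * k - 4.020

# Illustrative (invented) log readings, just to show the call signature:
print(toc_mlr(gr=100.0, dt=70.0, k=0.1))
```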
3.2. Non-linear models based on soft computing architectures

Before feeding the input data to the proposed HML method, the data are normalized to prevent any superficial scaling effect. Thus, all the data were normalized between zero and one. The well logs are used as the input, while the TOC and FI represent the output. To test the performance of the proposed method, the data were randomly divided into two groups: training (80%) and testing (20%). The training data should be selected carefully to cover the upper and lower possible boundaries of the input space.
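The preprocessing described above, min-max scaling to [0, 1] followed by a random 80/20 split, can be sketched as:

```python
import random

def minmax_normalize(values):
    """Scale a log to [0, 1], as done before feeding the network."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(samples, train_frac=0.8, seed=42):
    """Random 80/20 split into training and testing groups."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

gr = [12, 60, 154, 97, 9]                # illustrative GR readings
print(minmax_normalize(gr))              # min maps to 0.0, max to 1.0
```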
Due to its flexibility for controlling the shape, the Bell membership function,

f(x; a, b, c) = 1 / (1 + |(x − c)/a|^(2b))    (25)

was used instead of the Gaussian distribution, because it can be adjusted easily when the proposed hybrid network needs to tune the variability space between the well logs and the TOC/FI output. In Eq. (25), a, b, and c are three parameters that can be varied in order to change the shape of the function. In this paper, the momentum and the pure-line axon were used as the learning algorithm and the transfer function, respectively. Clearly, the output of the pure-line function is the same as its input. Some other parameters that need to be set include the step size and the momentum rate, as the learning variables, and the input processing elements, as the topological parameter. Such parameters can be obtained using trial and error, which is computationally expensive. The GA can, however, provide optimal solutions, so as to avoid long and cumbersome steps.
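Eq. (25) translates directly to code; a small sketch showing how a, b and c shape the function:

```python
def bell_mf(x, a, b, c):
    """Generalized bell membership function, Eq. (25)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

# Centred at c with half-width a; b controls the steepness of the shoulders.
print(bell_mf(0.5, a=0.2, b=2, c=0.5))  # 1.0 at the centre
```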
An important feature of the GA is its chromosomes (parameters in the present study), which are composed of genes. The chromosomes are each composed of four genes, namely the number of inputs, the membership functions for each input, the momentum, and the learning rate, matching the number of variables that should be optimized. First, some random values are generated and assigned to each chromosome. Then, the network
Fig. 5. Spatial distribution of (a) the FI, and (b) the GR, based on the mineralogical information.
Table 1
Correlation between the FI, TOC and well-log data.
Well-log/Dep. Var GR Rhob Nphi DT PE RT10 SP Ur Thor K
FI −0.014 0.245 −0.34 0.458 0.443 −0.413 −0.260 0.443 0.366 0.179
TOC 0.736 −0.563 0.703 0.745 −0.675 −0.266 0.428 0.538 0.597 0.410
Fig. 6. (a) and (b) represent the scatter plots with the third variable, the FI, while (c) and (d) show the 3D fitted surfaces using the available data.
Fig. 7. Same as Fig. 6 , but for the TOC.
Table 2
Comparison of MSEs for the HML and MLR methods for estimating the TOC and FI.

             TOC            FI
Method    HML    MLR     HML    MLR
MSE       0.08   0.43    0.12   1.12
is evaluated and the fitness function, represented by the following mean-squared error (MSE), is computed for the chromosomes:

MSE = (1/n) Σ_{i=1}^{n} (T_i − P_i)^2    (26)

where T_i and P_i are the target and predicted values, respectively, and n is the number of the training data. The MSE can take on any positive value, of course.
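Eq. (26) as a fitness function, in a minimal sketch:

```python
def mse(targets, predictions):
    """Fitness of a chromosome, Eq. (26): mean squared error over the training data."""
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1) / 3
```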
Next, the chromosomes (parameters) are updated based on the magnitude of the last computed MSE to obtain a better fit. Thus, the proposed MSE, Eq. (26), is evaluated in each successive run of the training. The individuals are selected, combined, and mutated in such a way that they minimize the error. Then, the new parents replace the previous, erroneous ones, which helps produce a more accurate generation. The steps that result in the final optimal network need to be repeated several times. The algorithm is summarized in Fig. 8.
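The select-combine-mutate-replace cycle summarized in Fig. 8 can be sketched as a minimal real-valued GA; the operators, rates and the toy fitness below are illustrative placeholders, not the paper's tuned configuration:

```python
import random

def evolve(fitness, n_genes=4, pop_size=20, generations=30,
           crossover_rate=0.7, mutation_rate=0.1, seed=0):
    """Minimal GA: truncation selection, blend crossover, Gaussian mutation.
    `fitness` returns the MSE of a chromosome; lower is better."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                      # best (lowest MSE) first
        parents = pop[: pop_size // 2]             # keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < crossover_rate:      # combine two parents
                child = [(a + b) / 2 for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            for i in range(n_genes):               # mutate some genes
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.1)
            children.append(child)
        pop = parents + children                   # replace the worse half
    return min(pop, key=fitness)

# Toy fitness: distance to a known optimum at 0.5 for every gene.
best = evolve(lambda ch: sum((g - 0.5) ** 2 for g in ch))
print(best)  # all four genes end up near 0.5
```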
The GA parameters were set in variable ranges. For example, after some preliminary calculations the range of the crossover rate was set between 0.05 and 0.95, while the mutation rate was between 0.01 and 0.3. Each of the probabilities was tested in a round of 54 iterations, and the optimal values were used in the GA. The population size was also set to be between 12 and 60. Using the algorithm, several networks with various configurations of the GA operators were tested in order to set the final optimal parameters. For example, as demonstrated in Fig. 9, the optimal networks for the TOC and FI were obtained in the 25th and 17th generations, respectively. Unlike the MLR, the GA recognized that not all the input parameters have to be incorporated in the model. Therefore, the final models are based on the contributions of only a few of the well-log data. The resulting networks contain the optimal parameters, since various configurations were tested using the GA. Such models must achieve the lowest estimation error. Thus, the following parameters were determined to provide the optimal solution:
(i) For the FI evaluation: a mutation rate of 0.15, a network with five nodes in the MFs layer, a learning rate of 0.55, and a momentum of 0.64.

(ii) For the TOC evaluation: a mutation rate of 0.12, a network with four nodes in the MFs layer, a learning rate of 0.62, and a momentum of 0.71.
The results of applying the MLR and HML algorithms are shown in Fig. 10. Both provide acceptable results for the TOC estimation, but the HML delivers predictions that are more accurate. To fit the TOC curve, the MLR tries to mimic the overall distribution of the selected variables such as, for example, the GR and DT logs, whereas the HML produces a different form, and its predictions are in very good agreement with the GR log, indicating that the proposed TOC evaluation is physically valid. The results of the FI estimation, however, show very different variations, which are mainly due to the very complex correlation between the FI and the well logs (see also Table 1). For example, the MLR method produces a strongly biased FI curve. On the other hand, the HML method determines a more accurate curve for the FI evaluation. The results are compared numerically, using the MSE, in Table 2.

The distribution of the GR log versus the FI is presented in Fig. 11(a), which indicates a direct relationship between the two
Fig. 8. Flowchart of the implemented HML algorithm. The data are first analyzed and then fed to the FL. The NN and GA are integrated to train and optimize the new hybrid
system.
Fig. 9. The MSE variation versus generation for evaluation of (a) the TOC, and (b) the FI.
variables and the TOC. Indeed, as can be seen, the TOC increases when both the GR and the FI do. It also indicates significant positive correlations between the GR log and the FI. Furthermore, since the MLR method recognized strong correlations between the TOC and the GR, DT and K (permeability) logs, a more informative multidimensional representation of such relationships is depicted in Fig. 11(b). The axes represent the GR, DT and K logs. The other two variables, namely, the TOC and FI, are shown using the size and color, respectively. Simultaneous variation of the size and color indicates the spatial dependence of the selected variables.

For the purpose of better identification of the fracable zones, the same practice was implemented on the GR log and the TOC against the FI. Then, the FI was clustered into four groups using the k-means method (MacQueen, 1967). The classification was performed using the information on the GR log, the FI and the TOC. Thus, one may use the well-log data and instantly determine whether the area under development is fracable or not. The accuracy of the method can also be verified in Fig. 13, as the probability curves represent very different distributions of the FI data.
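The clustering step can be sketched with a plain k-means on (GR, FI, TOC) triples; the data values below are invented for illustration:

```python
import random

def kmeans(points, k=4, iters=100, seed=0):
    """Plain k-means (MacQueen, 1967): assign to the nearest centroid, recompute."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the closest centroid by squared Euclidean distance
            j = min(range(k),
                    key=lambda m: sum((a - b) ** 2 for a, b in zip(p, centroids[m])))
            clusters[j].append(p)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters

# Each point is a (GR, FI, TOC) triple; the values are made up for illustration.
pts = [(20, 0.10, 0.5), (25, 0.15, 0.6), (90, 0.60, 3.0), (95, 0.65, 3.2),
       (50, 0.30, 1.5), (55, 0.35, 1.6), (120, 0.90, 4.0), (125, 0.95, 4.2)]
centroids, clusters = kmeans(pts, k=4)
print(len(centroids))  # four groups, e.g. brittle ... ductile
```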
4. Summary and conclusions

Due to their highly complex structures, shale-gas reservoirs require very accurate modeling. Vertical and lateral variability necessitate more drilling, which consequently leads to a significant increase in the cost of the operations. Wireline well logging is one of the most accessible and affordable approaches to continuously monitor such complexities. Shale-gas repositories are associated with extensive/big data. Obviously, analysis of the uncertainty and risk assessment for future development would benefit from big data. However, processing and analyzing such data require advanced techniques.

Aside from acquiring more informative data using various techniques, real-time data integration and modeling is still a significant unsolved problem. Together, both the tools and the data-analysis techniques aim at identifying the sweet spots that are potentially favorable for successful and economic development of a shale-gas reservoir. In shale-gas reservoirs, 'sweet spots' are mostly defined as the zones that contain high total organic carbon (TOC) and are easily fracable, i.e., have a high fracable index (FI). In this paper, two methods
Fig. 10. Evaluation of the TOC and FI using the HML and MLR methods. The core analysis data for each dataset are shown as dots.
Fig. 11. (a) Spatial correlation between the GR (the X axis), FI ( Y axis) and TOC (the third axis represented by color). (b) Spatial multivariate correlation between the GR, Dt,
K, TOC (size of the spheres) and FI (color of the spheres). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of
this article.)
were presented for simultaneously dealing with such big data and identifying the sweet spots.

The first method was based on data-mining techniques that try to determine the spatial correlations between the variables using an efficient stepwise algorithm. The method fits a surface based on the well-log data and predicts both the TOC and FI. The final surface is based only on switching the input parameters and testing various coefficients. Clearly, however, the method is not able to first learn and then adaptively change the coefficients to fit a more complex and higher-order surface.
As an alternative, fuzzy logic, a useful method among machine-learning techniques, was selected, since it can better process and analyze the complex spatial relationships in the shale data. The technique is, however, based on some user-dependent rules that need a vast dataset for shales, their geological settings, wireline well-log data interpretation, and so on, which gives rise to another obstacle. From the same school of machine learning, neural networks, due to their excellent performance in providing a learning basis, were used to assist in defining the knowledge-based rules. The new technique works very well when the input data are not very complex and of high dimensions. Furthermore, it also has some parameters that must be set carefully, such as the number of the input data, the number of MFs, and the learning-related parameters. The data used in this study are, however, highly complex and multidimensional, and require extensive trial-and-error calculations if "brute-force computations" are to be used. Thus, the method was integrated with the genetic algorithm in order to avoid such time-demanding and cumbersome parameter adjustment. The new hybrid machine learning (HML) technique was developed, and shown to provide much more accurate predictions for the TOC and FI when compared with those of the MLR method; the predictions are in very good agreement with the available experimental data for both properties.
The proposed methods in this study still cannot capture the whole complexity of shale reservoirs, as they manifest highly nonlinear behavior. For example, the HML can hardly follow the trends for the TOC and the fracable index. The soft computing method, on the other hand, requires knowledge of various aspects of artificial intelligence. Aside from all such weaknesses, the soft computing method was designed in such a way as to minimize the required knowledge. These techniques can be used when one requires use of a minimum amount of information in order to identify the sweet spots. The HML method requires the least possible effort and user knowledge for shale-reservoir characterization, and can be rapidly updated using new data. The other techniques available for the application of this study, however, are mostly based on trial and error. They require a vast amount of knowledge in geology and well-log interpretation, whereas the proposed methods in this paper minimize such necessities.
The implication of the present work is clear. Methods that are based on trial and error are not useful for characterization of shale reservoirs, and increasing the amount of data, which can be costly, will not improve their performance. On the other hand, advanced computational and analysis techniques of the type that we have discussed and utilized in this paper offer at least two advantages. One is that one does not blindly, and based on ad-hoc approaches,
Fig. 12. (a) Spatial correlation between the GR ( X axis), TOC ( Y axis) and FI (the third axis represented by color). (b) Graphical representation of the k-means clustering. (c)
3D view of clustering of the well-log data into four groups of brittle, medium brittle, ductile and medium ductile. (For interpretation of the references to colour in this figure
legend, the reader is referred to the web version of this article.)
Fig. 13. Density distribution of the clustered well-logs based on the FI data. Three
distinct probabilities indicate various distributions of the FI data.
try to develop correlations between the various variables. The second advantage of such methods is that they identify the most pertinent variables, so that if one is to invest more resources in order to collect more data, one can concentrate on measuring and gathering information about the most relevant variables, hence reducing the financial burden of such operations.

As for future work, one might consider bringing in more geological information and defining more informative inputs. There are some geological signatures that are not recorded in the logging, but can be extracted from other studies, such as geophysical surveys. Developing a more robust definition of fracability can also be studied further. As another future avenue, one can use the elastic properties, instead of the XRD data, which are more feasible and versatile.
Acknowledgements

PT thanks the financial support from the University of Wyoming for this research. FJ would like to thank the support from the Nano Geosciences lab and the Mudrock Systems Research Laboratory (MSRL) consortium at the Bureau of Economic Geology, The University of Texas at Austin. MSRL member companies are Anadarko, BP, Cenovus, Centrica, Chesapeake, Cima, Cimarex, Chevron, Concho, ConocoPhillips, Cypress, Devon, Encana, Eni, EOG, EXCO, ExxonMobil, Hess, Husky, Kerogen, Marathon, Murphy, Newfield, Penn Virginia, Penn West, Pioneer, Samson, Shell, Statoil, Talisman, Texas American Resources, The Unconventionals, U.S. Intercrop, Valence, and YPF. XRD data were provided by Dr. H. Rowe and wireline well-log data by R. Eastwood.
References

Abonyi, J. (2003). Fuzzy model identification. In Fuzzy model identification for control (pp. 87–164). Boston, MA: Birkhäuser Boston. https://doi.org/10.1007/978-1-4612-0027-7_4

Bishop, C. M. (1995). Neural networks for pattern recognition. Clarendon Press.

Cernuda, C., Lughofer, E., Hintenaus, P., Märzinger, W., Reischer, T., Pawliczek, M., et al. (2013). Hybrid adaptive calibration methods and ensemble strategy for prediction of cloud point in melamine resin production. Chemometrics and Intelligent Laboratory Systems, 126, 60–75. https://doi.org/10.1016/j.chemolab.2013.05.001
Cernuda, C., Lughofer, E., Märzinger, W., & Kasberger, J. (2011). NIR-based quantification of process parameters in polyetheracrylat (PEA) production using flexible non-linear fuzzy systems. Chemometrics and Intelligent Laboratory Systems, 109(1), 22–33. https://doi.org/10.1016/j.chemolab.2011.07.004

Cordón, O., Gomide, F., Herrera, F., Hoffmann, F., & Magdalena, L. (2004). Ten years of genetic fuzzy systems: Current framework and new trends. Fuzzy Sets and Systems, 141(1), 5–31. https://doi.org/10.1016/S0165-0114(03)00111-8

Dashtian, H., Jafari, G. R., Koohi Lai, Z., Masihi, M., & Sahimi, M. (2011). Analysis of cross correlations between well logs of hydrocarbon reservoirs. Transport in Porous Media, 90(2), 445–464. https://doi.org/10.1007/s11242-011-9794-x

Deb, K., & Agrawal, S. (1999). Understanding interactions among genetic algorithm parameters. Foundations of Genetic Algorithms, 5(5), 265–286.

Edwards, A. W. F. (1969). Statistical methods in scientific inference. Nature, 222(5200), 1233–1237. https://doi.org/10.1038/2221233a0

Efroymson, M. A. (1960). Multiple regression analysis. In Mathematical methods for digital computers. New York: John Wiley and Sons, Inc.

Goldberg, D. E., & Holland, J. H. (1988). Genetic algorithms and machine learning. Machine Learning, 3(2/3), 95–99. https://doi.org/10.1023/A:1022602019183

Gomide, F. (2016). Fundamentals of fuzzy set theory. In Handbook on computational intelligence (pp. 5–42). World Scientific. https://doi.org/10.1142/9789814675017_0001

Haitovsky, Y. (1968). Missing data in regression analysis. Journal of the Royal Statistical Society. Series B (Methodological), 30(1), 67–82. Retrieved from http://www.jstor.org/stable/2984459

Jang, J.-S. R. (1992). Self-learning fuzzy controllers based on temporal backpropagation. IEEE Transactions on Neural Networks, 3(5), 714–723. https://doi.org/10.1109/72.159060

Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665–685. https://doi.org/10.1109/21.256541

Jarvie, D. M., Hill, R. J., Ruble, T. E., & Pollastro, R. M. (2007). Unconventional shale-gas systems: The Mississippian Barnett Shale of north-central Texas as one model for thermogenic shale-gas assessment. AAPG Bulletin, 91(4), 475–499. https://doi.org/10.1306/12190606068

Karimpouli, S., & Malehmir, A. (2015). Neuro-Bayesian facies inversion of prestack seismic data from a carbonate reservoir in Iran. Journal of Petroleum Science and Engineering, 131, 11–17. https://doi.org/10.1016/j.petrol.2015.04.024

Karimpouli, S., & Tahmasebi, P. (2016). Conditional reconstruction: An alternative strategy in digital rock physics. Geophysics, 81(4), D465–D477. https://doi.org/10.1190/geo2015

Li, H., Qin, S. J., Tsotsis, T. T., & Sahimi, M. (2012). Computer simulation of gas generation and transport in landfills: VI - Dynamic updating of the model using the ensemble Kalman filter. Chemical Engineering Science, 74, 69–78. https://doi.org/10.1016/j.ces.2012.01.054

Li, H., Sanchez, R., Joe Qin, S., Kavak, H. I., Webster, I. A., Tsotsis, T. T., et al. (2011). Computer simulation of gas generation and transport in landfills. V: Use of artificial neural network and the genetic algorithm for short- and long-term forecasting and planning. Chemical Engineering Science, 66(12), 2646–2659. https://doi.org/10.1016/j.ces.2011.03.013

Li, H., Tsotsis, T. T., Sahimi, M., & Qin, S. J. (2014). Ensembles-based and GA-based optimization for landfill gas production. AIChE Journal, 60(6), 2063–2071. https://doi.org/10.1002/aic.14396

Lughofer, E. (2011). Evolving fuzzy systems - Methodologies, advanced concepts and applications: 266. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-18087-3

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. The Regents of the University of California.

Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to linear regression analysis. Wiley.

Pedrycz, W., & Gomide, F. (2007). Fuzzy systems engineering: Toward human-centric computing. John Wiley.

Rickman, R., Mullen, M. J., Petre, J. E., Grieser, W. V., & Kundert, D. (2008). A practical use of shale petrophysics for stimulation design optimization: All shale plays are not clones of the Barnett Shale. SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers. https://doi.org/10.2118/115258-MS

Rogers, L. L., & Dowla, F. U. (1994). Optimization of groundwater remediation using artificial neural networks with parallel solute transport modeling. Water Resources Research, 30(2), 457–481. https://doi.org/10.1029/93WR01494

Seber, G. A. F., & Lee, A. J. (2003). Linear regression analysis. Wiley-Interscience.

Serdio, F., Lughofer, E., Zavoianu, A.-C., Pichler, K., Pichler, M., Buchegger, T., et al. (2017). Improved fault detection employing hybrid memetic fuzzy modeling and adaptive filters. Applied Soft Computing, 51, 60–82. https://doi.org/10.1016/j.asoc.2016.11.038

Sullivan Glaser, K. K., Johnson, Brian Toelle, G. M., Kleinberg, R. L., Miller Kuala Lumpur, P., Wayne Pennington, M. D., Lee Brown, A., et al. (2013). Seeking the sweet spot: Reservoir and completion quality in organic shales. Oilfield Review, 25, 16–29.

Tahmasebi, P. (2017). Structural adjustment for accurate conditioning in large-scale subsurface systems. Advances in Water Resources, 101. https://doi.org/10.1016/j.advwatres.2017.01.009

Tahmasebi, P., & Hezarkhani, A. (2011). Application of a modular feedforward neural network for grade estimation. Natural Resources Research, 20(1), 25–32. https://doi.org/10.1007/s11053-011-9135-3

Tahmasebi, P., & Sahimi, M. (2015). Reconstruction of nonstationary disordered materials and media: Watershed transform and cross-correlation function. Physical Review E, 91(3), 32401. https://doi.org/10.1103/PhysRevE.91.03240

Tahmasebi, P., Javadpour, F., & Sahimi, M. (2015a). Multiscale and multiresolution modeling of shales and their flow and morphological properties. Scientific Reports, 5, 16373. https://doi.org/10.1038/srep16373

Tahmasebi, P., Javadpour, F., & Sahimi, M. (2015b). Three-dimensional stochastic characterization of shale SEM images. Transport in Porous Media, 110(3), 521–531. https://doi.org/10.1007/s11242-015-0570-1

Tahmasebi, P., Javadpour, F., & Sahimi, M. (2016a). Stochastic shale permeability matching: Three-dimensional characterization and modeling. International Journal of Coal Geology, 165, 231–242. https://doi.org/10.1016/j.coal.2016.08.024

Tahmasebi, P., Javadpour, F., Sahimi, M., & Piri, M. (2016b). Multiscale study for stochastic characterization of shale samples. Advances in Water Resources, 89, 91–103. https://doi.org/10.1016/j.advwatres.2016.01.008

Tahmasebi, P., & Sahimi, M. (2016a). Enhancing multiple-point geostatistical modeling: 1. Graph theory and pattern adjustment. Water Resources Research, 52(3), 2074–2098. https://doi.org/10.1002/2015WR017806

Tahmasebi, P., & Sahimi, M. (2016b). Enhancing multiple-point geostatistical modeling: 2. Iterative simulation and multiple distance function. Water Resources Research, 52(3), 2099–2122. https://doi.org/10.1002/2015WR017807

Tahmasebi, P., Sahimi, M., & Andrade, J. E. (2017). Image-based modeling of granular porous media. Geophysical Research Letters. https://doi.org/10.1002/2017GL073938

Tahmasebi, P., Sahimi, M., Kohanpur, A. H., & Valocchi, A. (2016c). Pore-scale simulation of flow of CO2 and brine in reconstructed and actual 3D rock cores. Journal of Petroleum Science and Engineering. https://doi.org/10.1016/j.petrol.2016.12.031

Tanaka, K., & Sugeno, M. (1992). Stability analysis and design of fuzzy control systems. Fuzzy Sets and Systems, 45(2), 135–156. https://doi.org/10.1016/0165-0114(92)90113-I

Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353. https://doi.org/10.1016/S0019-9958(65)90241-X

Zoback, M. D. (2007). Reservoir geomechanics. Cambridge University Press.