cumulative probability p90 p50 p10_2

Cooper Energy Investor Series

Cumulative Probability – P90, P50, P10

The terms P90, P50 and P10 are occasionally used when people discuss volumes of hydrocarbons. But what do these terms actually mean?

Contrary to what you may have heard or understood, P90 does NOT mean that the volume estimate under discussion has a 90% chance of occurring. Similarly, P50 and P10 do NOT mean that those volume estimates have a 50% or 10% chance of occurring respectively. So what do they mean?

To understand what these terms mean you have to understand statistical theory and how hydrocarbon volumetric estimates are prepared.

Most high school graduates will be familiar with a normal frequency distribution. This is shown diagrammatically below:

Figure 1: Normal Distribution f(x,0,1)

[Figure: bell-shaped frequency curve (red line). Marked points: -1.28σ (P90), where f(x) ~ 0.18, and 0σ (P50), where f(x) ~ 0.4.]

Figure 1 is very off-putting because it looks very mathematical.

The easiest way to understand a frequency distribution is this: imagine a tree in your garden; if you had the time, and the inclination, you could go out and measure the length of every leaf on the tree. Some leaves will be long, some leaves will be short and some leaves will be medium. As you are measuring the leaves you put them into five cups depending on the size of the leaf. The first cup is for the smallest leaves; the last cup is for the largest leaves. Once you have pulled all the leaves off the tree the cups (or bins to use the correct statistical term) will look like the picture below:

Figure 2: Leaves on a tree example

[Figure: bar chart (green boxes). Vertical axis: number of leaves of that size. Five bins from smallest to largest leaves: 50, 200, 500, 150, 50.]

As you can see, the medium sized leaves are most common while the very small and very large leaves are least common. In nature things tend to group around a central common size or point.

What you have done is describe the uncertainty of the leaf sizes by a 5-bin frequency distribution. If you look at Figure 2, you should be able to see that the shape at the top of the green boxes is similar to the shape of the red line in Figure 1. Figure 1 is known as a continuous distribution (the line flows continuously) – think of it as a distribution with a very large number of bins. Figure 2 is a discrete distribution (it has a discrete number of bins that capture the number of leaves that fall within a certain size range).
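For readers who like to see the idea worked through, here is a minimal sketch of sorting measurements into five bins. The leaf lengths, the bin edges and the helper name `bin_counts` are all made up for illustration; they are not taken from the figures.

```python
def bin_counts(lengths, edges):
    """Count how many values fall in each bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in lengths:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical leaf lengths (cm) and hypothetical bin edges (cm)
leaf_lengths_cm = [1.5, 2.1, 4.5, 5.0, 5.2, 5.9, 6.3, 7.8, 9.4]
bin_edges_cm = [0, 2, 4, 6, 8, 10]

counts = bin_counts(leaf_lengths_cm, bin_edges_cm)
print(counts)  # [1, 1, 4, 2, 1]
```

Even with only nine made-up leaves the middle bin is the fullest, which is the same grouping around a central size described above.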

We can do this exercise for every measurable thing and create a frequency distribution.

There are other (non-normal) frequency distribution shapes possible but these do not need to be discussed here. If you understand a normal frequency distribution then that’s all you need to know for the time being.

Now that you understand frequency distributions, what’s P90 etc?

With the frequency distribution that we just created we can add up all the numbers from one end and create a CUMULATIVE frequency distribution. It’s just another way of showing the data. Using the leaf example, if we start adding up the leaves from the biggest end and work our way to the smallest end we end up with the following:

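Figure 3: Leaves on a tree example

[Figure: bar chart (green boxes). Vertical axis: cumulative leaves from biggest to smallest. Bars, from the smallest-leaf bin to the largest-leaf bin: 950, 900, 700, 200, 50.]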

So how does that help us? Well we can say things like: there are 900 leaves bigger than the smallest leaves or there are 200 leaves bigger than the medium size leaves.
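The adding-up can be sketched in a few lines. The bin counts are the ones from the leaf example (50, 200, 500, 150, 50, smallest bin first); the variable names are only illustrative.

```python
from itertools import accumulate

# Leaf counts per bin from the leaf example, smallest bin first
counts = [50, 200, 500, 150, 50]

# Running total starting at the biggest bin and working toward the smallest
from_biggest = list(accumulate(reversed(counts)))

# Put the totals back in smallest-bin-first order
cumulative = from_biggest[::-1]
print(cumulative)  # [950, 900, 700, 200, 50]

# Leaves strictly bigger than each bin: the cumulative total of the bins above it
bigger_than = cumulative[1:] + [0]
print(bigger_than)  # [900, 700, 200, 50, 0]
```

The first entry of `bigger_than` is the 900 leaves bigger than the smallest leaves, and the third entry is the 200 leaves bigger than the medium leaves.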

We can do the same exercise with the continuous frequency distribution in Figure 1 and we end up with the following continuous cumulative frequency distribution:

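Figure 4: Upper Cumulative Distribution Q(x,0,1)

[Figure: smooth cumulative curve (red line) falling from 100% to 0%. Marked points: -1.28σ (P90), where Q(x) = 0.9, and 0σ (P50), where Q(x) = 0.5.]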

Although this looks terribly mathematical, it’s similar to the graph you have just produced with the leaf example. The main difference is that the numbers on the Y-axis (or vertical axis) have been divided by the biggest number at the end thereby normalising the axis to 100%. You should be able to see that the shape described by the top of the green boxes in Figure 3 looks very similar to the shape of the red line in Figure 4. Figure 4 looks smoother than Figure 3 because Figure 4 was created from the smooth continuous distribution in Figure 1.
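As a small illustration of that normalising step, the cumulative leaf counts can be divided by the biggest number to turn them into percentages. The counts are those of the leaf example; the variable names are assumptions made for this sketch.

```python
# Cumulative leaf counts from the leaf example, smallest bin first
cumulative = [950, 900, 700, 200, 50]

# The biggest number is the total: every leaf on the tree
total_leaves = cumulative[0]

# Divide by the total so the vertical axis runs from 100% downward
percent = [100.0 * c / total_leaves for c in cumulative]
print([round(p, 1) for p in percent])  # [100.0, 94.7, 73.7, 21.1, 5.3]
```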

Since the Y-axis in Figure 4 has been normalised to 100% we can read off the estimates that correspond to the 90%, 50% and 10% cumulative frequency. These estimates are usually termed the P90, P50 and P10 confidence levels.

Using Figure 4, the estimate at the P90 confidence level is -1.28 and the estimate at the P50 confidence level is 0. It’s just the way the scale is presented – it has been normalised to zero at the middle.
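If you would like to check the -1.28 yourself, the sketch below reads the P90, P50 and P10 values off a standard normal distribution using Python's standard library. The helper name `exceedance_value` is made up for this illustration; it simply converts "exceeded by 90% of outcomes" into the 10% point of the ordinary (lower) cumulative curve.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def exceedance_value(dist, prob_exceeded):
    """Value that is exceeded with probability prob_exceeded."""
    return dist.inv_cdf(1.0 - prob_exceeded)

p90 = exceedance_value(standard_normal, 0.90)
p50 = exceedance_value(standard_normal, 0.50)
p10 = exceedance_value(standard_normal, 0.10)

print(round(p90, 2), round(p50, 2), round(p10, 2))
```

P90 comes out at about -1.28 and P50 at 0, matching the figure; P10 is the mirror image of P90 at about +1.28.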

As per the leaves example, P90 means that 90% of the estimates (or outcomes) are expected to be bigger than this estimate. P50 means that 50% of the estimates (or outcomes) are expected to be bigger than this estimate. This is NOT the same as the chance of that estimate occurring.

The relative chance of a single estimate occurring can be read off Figure 1. Put the question a different way: which occurs more frequently, P90 or P50? You can’t actually answer that question from the cumulative frequency distribution (Figure 4); you need to jump from the cumulative frequency curve (Figure 4) back to the frequency distribution (Figure 1).

An easier way to understand the question would be to use the leaf example, assume P90 is the same as the small leaves and P50 is the same as the medium leaves. So the question becomes: what is more likely to occur – the small leaves or the medium leaves? The medium leaves are more likely to occur of course. So in a normal distribution, the P50 value is more likely to occur than the P90 value.
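The same comparison can be made on the smooth curve. This sketch evaluates the height of the standard normal frequency curve at the P90 point (-1.28) and at the P50 point (0); the variable names are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Height of the frequency curve at the P90 and P50 points
height_at_p90 = standard_normal.pdf(-1.28)
height_at_p50 = standard_normal.pdf(0.0)

print(round(height_at_p90, 2), round(height_at_p50, 2))
```

The curve is roughly 0.18 high at P90 and roughly 0.4 high at P50, so values near P50 turn up more than twice as often as values near P90.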

In simple general terms, that is why P50 is sometimes also known as the best estimate because it’s the estimate that occurs more frequently.

So how does this help you to understand oil and gas estimates?

An oil or gas estimate is calculated by multiplying together a number of parameters, for example:

Oil in place equals rock volume of the reservoir multiplied by porosity multiplied by oil saturation (there are actually a lot more input variables but let us keep it simple for now).

Rock volume, porosity and oil saturation are measurable things. There is however uncertainty surrounding the measurement of those parameters. To cater for this uncertainty we describe the input parameters by continuous frequency distributions. If we then multiply all the input frequency distributions together (a computer does this for us), the output, oil in place, ends up as a frequency distribution. We can then take this oil in place frequency distribution and create an oil in place cumulative frequency distribution. This is shown diagrammatically as follows:

Figure 5: Oil in place calculation and estimation

Oil in place = Rock Volume x Porosity x Oil Saturation

[Figure: the three input frequency distributions are multiplied together to obtain an oil in place frequency distribution, which is then turned into a cumulative frequency distribution.]

From the cumulative frequency distribution we can read off the P90, P50 and P10 confidence levels.

In summary, to create an oil volume distribution:

Step 1: create continuous frequency distributions for each input parameter (rock volume, porosity, oil saturation).

Step 2: multiply the input parameters together (using a computer) and create an oil in place continuous frequency distribution.

Step 3: take the oil in place continuous frequency distribution and create an oil in place continuous cumulative frequency distribution.

Step 4: From the oil in place continuous cumulative frequency distribution read off the estimate sizes that correspond to the P90, P50 and P10 confidence levels.
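The four steps above can be sketched with a simple random-sampling (Monte Carlo) calculation. This is only an illustration under stated assumptions: the text does not say which method the computer uses, and the triangular input shapes, their ranges, the number of trials, the seed and the helper names `simulate_oil_in_place` and `exceedance_estimate` are all hypothetical. The inputs are also assumed to be independent of each other.

```python
import random

def simulate_oil_in_place(n_trials, seed=0):
    """Steps 1 and 2: sample each input and multiply the samples together."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # Hypothetical input distributions: triangular(low, high, most likely)
        rock_volume = rng.triangular(80.0, 120.0, 100.0)  # arbitrary volume units
        porosity = rng.triangular(0.15, 0.25, 0.20)       # fraction
        oil_saturation = rng.triangular(0.60, 0.80, 0.70)  # fraction
        results.append(rock_volume * porosity * oil_saturation)
    return results

def exceedance_estimate(samples, prob_exceeded):
    """Steps 3 and 4: count down from the biggest outcome and read off the
    value that roughly prob_exceeded of the outcomes are bigger than."""
    ordered = sorted(samples, reverse=True)  # biggest first
    index = min(int(prob_exceeded * len(ordered)), len(ordered) - 1)
    return ordered[index]

samples = simulate_oil_in_place(10_000)
p90 = exceedance_estimate(samples, 0.90)
p50 = exceedance_estimate(samples, 0.50)
p10 = exceedance_estimate(samples, 0.10)

# Share of simulated outcomes that are bigger than the P90 estimate
share_above_p90 = sum(1 for s in samples if s > p90) / len(samples)
print(p90 < p50 < p10, round(share_above_p90, 2))
```

Note the ordering: P90 is the smallest of the three numbers and P10 the largest, because P90 is the value that 90% of the simulated outcomes exceed.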

So now that you understand frequency distributions, cumulative frequency distributions and how we use them to create volumetric estimates you should be able to answer a few questions:

Question 1: What does P90 mean?

Answer: It means that 90% of the calculated estimates are bigger than the P90 estimate.

Question 2: Is the P90 estimate or the P50 estimate more likely to occur?

Answer: P50 is more likely to occur because the estimate is expected to occur more often than the P90 estimate in the frequency distribution.

Question 3: What’s the most important number – P90, P50 or P10?

Answer: P50 is the most important number because it’s the best estimate. P90 and P10 just show the range in the uncertainty of the estimate.

Question 4: Am I more confident in the P90 estimate or the P50 estimate?

Answer: You are more confident in the P90 estimate. As 90% of the estimates are greater than the P90 estimate you would be more confident that the final actual outcome will be greater than the P90 estimate than greater than the P50 estimate. Recall that only 50% of the estimates are greater than the P50 estimate. This doesn’t mean that the P90 estimate has a higher chance of occurring, as explained above; all it means is that you have a higher confidence in that estimate being exceeded by the actual outcome. This can be a difficult concept to grasp. An easier way to think about it may be to say “I’m confident that the actual outcome will be greater than my P90 estimate but overall I expect that the final outcome will be closest to my P50 estimate”.

Question 5: Does everybody do frequency distribution (or probabilistic) calculations?

Answer: No. Some people just multiply single values (deterministic best estimates) together to calculate a single output estimate, not a frequency distribution.

Question 5a: So how can you tell the confidence of that single output estimate?

Answer: You can’t, it’s a single best estimate. You just have to use it as it is calculated.

Note that in the example above we only calculated the oil in place. We can go one step further and calculate the recoverable oil. Recoverable oil equals oil in place multiplied by a recovery factor. For the recovery factor we can create a frequency distribution like all other input parameters. Multiplying the oil in place frequency distribution by the recovery factor frequency distribution we end up with a recoverable oil frequency distribution and then we can convert this to a cumulative frequency distribution and read off the P90, P50 and P10 estimates. All the same concepts as discussed above apply.
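As a rough sketch of that extra step, the snippet below multiplies a sampled oil in place by a sampled recovery factor and reads the P90, P50 and P10 estimates off the deciles of the result. Every number here is hypothetical (the triangular shapes, ranges, trial count and seed), the two inputs are assumed to be independent, and random sampling is just one possible way of doing the multiplication.

```python
import random
from statistics import quantiles

rng = random.Random(1)
n_trials = 20_000

recoverable = []
for _ in range(n_trials):
    # Hypothetical distributions: triangular(low, high, most likely)
    oil_in_place = rng.triangular(10.0, 20.0, 14.0)     # arbitrary volume units
    recovery_factor = rng.triangular(0.20, 0.40, 0.30)  # fraction recovered
    recoverable.append(oil_in_place * recovery_factor)

# Nine cut points splitting the outcomes into ten equal groups, smallest first
deciles = quantiles(recoverable, n=10)

# 90% of outcomes lie above the first cut point, 10% above the last
p90, p50, p10 = deciles[0], deciles[4], deciles[8]
print(p90 < p50 < p10)
```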

Question 6: How do you create frequency distributions for all the variables?

Answer: You go to university for 4-5 years and become a geophysicist, geologist, petrophysicist or reservoir engineer, you get a good job with an oil and gas company, you get trained over 5-10 years, you gain a lot of experience and knowledge in earth sciences and the physics of oil and gas moving through rocks and then you get to work on interesting things like estimating recoverable oil and gas. But seriously, that question is taking us outside the scope of this document as it involves knowledge, experience and the measurement and analysis of the data that make up each of the individual input parameters.

For further detailed reading investors should consult the Recoverable Hydrocarbon Guidelines on Cooper Energy’s website – policies section.

© Cooper Energy Limited

For further information contact Cooper Energy via the website.
