Searching for Productive Causes in Big Data: The Information-Transmission Account

Billy Wheeler

Abstract

It has been argued that the use of Big Data in scientific research casts doubt on the need for causal knowledge in making sound predictions (Mayer-Schonberger & Cukier, 2013). In this article I argue that it is possible to search for productive causes in Big Data if one adopts the 'information-transfer account of causation' (Illari, 2011; Illari & Russo, 2014), a version of the causal process theory. As it stands, the current formulation is inadequate as it does not specify how information is to be measured. I consider three concepts of information: (i) information as knowledge update, (ii) information as entropy and (iii) information as algorithmic complexity, and argue that the last of these provides the best way to achieve this with respect to Big Data. How this can be used to search for causal connections among Big Data is then illustrated with respect to exposomics research.

1. Introduction

1.1. The End of Causation?

David Hume is famous for putting forward the view that causation in the world amounts to nothing more than constant conjunction. Event A causes event B, so the analysis goes, if all A-events are routinely followed by B-events. Whether Hume actually held this position is contentious. Less controversial is Hume's opinion about the role our idea of causation plays in our inferential practices. According to Hume:

‘Tis evident that all reasonings concerning matters of fact are founded on the relation of cause and effect, and that we can never infer the existence of one object from another, unless they be connected together. (1739, p. 649)

This picture concerning the role of causation has, for a long time, seemed to be supported by the practice of natural science. Scientists start with a hypothesis about which events cause other events. This hypothesis is then subject to experiment and, once confirmed, forms the basis for a causal law which can be used to predict and manipulate the world around us. Despite its widespread appeal, this picture has recently been called into question by the arrival of Big Data as a source of scientific knowledge. In his famous article (2008), Anderson claims that the features of Big Data that make it different from traditional evidence rule out the need for top-down 'theory-driven' science. Instead, he recommends a 'data-driven' science where the correlations speak for themselves. A similar sentiment is presented in Viktor Mayer-Schonberger & Kenneth Cukier (2013), although they are far more explicit about the consequences this has for causation:

The ideal of identifying causal mechanisms is a self-congratulatory illusion; big data overturns this. Yet again we are at a historical impasse where "god is dead." (2013, p. 18)

Turning traditional slogans on their head, enthusiasts of Big Data proclaim that 'correlation supersedes causation'. With the arrival of Big Data, establishing causal connections is no longer needed to successfully predict and intervene in the world. We can get by just as well with correlations, and searching for causation becomes an expensive and time-consuming diversion.

Enthusiasts of Big Data point to numerous success stories to support their claim that it heralds a new age of 'doing science', free from theory and free from causal inference. An oft-cited example is ‘Google Flu Trends’ (Ginsberg et al., 2009). Using existing data from influenza outbreaks in the United States, Google looked for correlations between infected areas and common search terms. Eventually they found a correlation which was able to predict the spread of flu much faster than official routes (such as doctors' records). Another well-known case was the surprising correlation found by Wal-Mart
between hurricane reports and the sale of Pop-Tarts: every time a hurricane was announced, sales increased. Wal-Mart were able to exploit this by placing Pop-Tarts closer to the store front during stormy weather, thereby increasing profits.

In these two cases causation was nowhere to be found. Correlations went in and predictions came out; there was no need to formulate a causal hypothesis at any stage. But to hold that this applies to the use of Big Data in science, as some Big Data enthusiasts suggest, is to overlook one crucial fact. In the worlds of finance and health science there is added pressure to get predictions, especially when lives and money are at stake. In some cases it might well be worth making predictions on far from sturdy evidence, especially if (i) the consequences of doing so and getting it wrong are small and (ii) the consequences of doing so and getting it right are big. The chances of getting it wrong might be worth it if the payoff is high enough.

This cost-benefit appraisal is enhanced by Big Data, especially given the speeds at which it can be utilised. But it doesn't exclude cognitive understanding. A recent study by Canali (2015) shows that scientists using Big Data in the field of exposomics employ practices that are indicative of the search for causal knowledge. Even more explicitly, the Causality Workbench Project (Guyon et al., 2011) aims to find computer programs that can search and discover causal links (and not merely correlations) among Big Data. What this suggests is that the arrival of Big Data calls for a rethink of the role of causation, and how it is inferred, rather than its relegation.

1.2. Rethinking Causation for Data-Intensive Science

In this paper I will investigate whether there is a role to play for productive intuitions about causation in the use of Big Data. In particular, I will focus on the question: given the nature of Big Data, is it possible to infer productive causes as opposed to only difference-making causes?

Metaphysical theories of causation are frequently separated into two groups (Hall, 2004). Firstly, there are those who take after Hume in thinking that causation in the world amounts to nothing more than regularity or constant conjunction. Sophisticated versions of this intuition might appeal to probabilities (Reichenbach, 1956), counterfactuals (Lewis, 1973) and interventions (Woodward, 2003). The crucial point is that the occurrence of the cause makes a difference to the occurrence of the effect, but there is no 'tie' or 'link' between them. For this reason these theories are often clustered under the title of 'difference-making' accounts. Secondly, there are those who think that to say C causes E means that there is a connection, usually understood as a mechanism or process, linking C to E. These views are called 'productive' because they suggest the cause in some way produces or 'brings about’ the effect.

There is evidence that scientists are interested in finding both difference-making and productive causes (Russo & Williamson, 2007 & 2011; Clarke et al., 2013 & 2014) and that neither has an exclusive hold on the practice of science. The nature of Big Data challenges both intuitions. A number of aspects of Big Data make it interesting to study from a methodological standpoint. Here I will focus on just two: (i) the observational nature of the data collected and (ii) its automated analysis by technology. Whilst not all examples of Big Data exhibit these two features (some data may be circumscribed and some will be analysed manually by human scientists), they cover a sufficiently large number of cases to capture what's different about it.1

1 For example, the data-intensive studies carried out by the exposomics project (Vineis et al., 2009) and the GWAS project (Ratti, 2015) both exhibit these features.

These two features prove problematic for traditional ways of thinking about causation and causal inference in science. Turning to difference-making accounts, the study of Big Data takes place after the data has been produced. This means it is not possible to intervene at any stage and see what might have
happened had the circumstances been different. This rules out certain interventionist approaches as well as counterfactual approaches to causation.

Production notions of causation seem equally problematic. The most well-known process theory is the conserved quantities view (Salmon, 1998; Dowe, 2000), which identifies causal interaction as the exchange of a conserved quantity, e.g. mass-energy, charge and momentum. Yet few Big Data sets are about these kinds of properties, and so this approach has limited use in inferring causes outside the physical sciences. Identifying evidence of complex mechanisms in Big Data appears an even more formidable task: mechanisms require more than just the cause and effect events; they also need activities and entities to connect the two (Machamer, Darden & Craver, 2000; Glennan, 2002). Whilst human scientists may be able to draw on past evidence to hypothesise a mechanism connecting two correlated variables, it is hard to see how this inference could be automated, at least given the current computational resources available.

Wolfgang Pietsch (2015) recently took up the task of providing a difference-making account of causation compatible with data-intensive science. He gives a sophisticated version of the regularity view based on eliminative induction. In brief: a factor or variable C is causally relevant for E if, and only if, in a fixed context B one can find instances of both 'C&E' and 'not-C & not-E' (2015, p. 12). But we have seen that scientists need evidence of both difference-making and production in order to infer the existence of causal connections. So here I will investigate whether a notion of productive cause exists which is suitable for use in Big Data analysis. I will do this by exploring a recent alternative version of the process view known as the ‘information-transmission account’ (Collier, 1999, 2010; Illari, 2011; Illari & Russo, 2014).

In section 2 I will outline the information-transmission account in detail and explain how it arose out of an attempt to overcome problems with Dowe’s
conserved quantities view. The main difficulty with the information-transmission account as it currently stands is that it does not tell us how to measure information, yet without this we cannot track its flow and therefore identify causal connections among data. To remedy this, I consider three well-known concepts of information: knowledge updates, entropy and algorithmic complexity. Section 3 will be largely theoretical as each of the concepts is used to formulate a different version of the transmission account. Out of all three, I argue that interpreting information as algorithmic complexity provides the best concept for a theory of causation as well as being practical in the search for causal links. Finally, in section 4 I illustrate how this concept of information could be used to search for evidence of productive causes in data-intensive science.

2. The Information-Transmission Account

2.1. The Conserved Quantities View (CQV)

The CQV can be seen as combining two earlier notions for understanding productive causes: those of 'transfer' and 'process'. Transfer approaches to causality had been advocated by Aronson (1971) and Fair (1979). According to Aronson, two events are causally connected if an amount of a physical quantity, such as velocity, momentum, heat etc., is transferred from the cause to the effect. The collision of two billiard-balls provides a standard example. When one ball strikes another there is an exchange of momentum between the two balls. This suffices to make the connection causal, according to Aronson.2 To rule out possible accidental exchanges, both Aronson and Fair demand that the transferred quantity retain the same identity between the exchanges.

2 Fair's view is slightly different in that he identifies the transferred property exclusively as 'energy'.

Philip Dowe criticises these early transfer views over the nature of the identity that is required. Convincingly, Dowe argues that it is impossible to trace
the identity of the exchanged quantities that Aronson and Fair require (2000, pp. 55-59). Energy, velocity, momentum etc. lose their numerical identity when exchanged. If two ball bearings are swung on a Newton's cradle, passing their momentum along the stationary balls, it is not possible to say from which striking ball the momentum gained at the other end came. Although the quantities might not retain their numerical identity, they do however keep their numerical equality: the total amount of momentum is constant throughout the process. Dowe therefore identifies the exchanged properties as physical conserved quantities, that is, those quantities described by a conservation law. The relation between cause and effect is then one of 'exchange' of a conserved quantity; principally either mass-energy, momentum or charge.

Another criticism Dowe has of Aronson and Fair's transfer views is that they fail to capture what he calls 'immanent causation' (2000, pp. 52-55). A wrench spinning in space can have its movement explained by its own inertia, yet it is clear there is no transfer of momentum from the wrench to anything else. To handle these cases Dowe makes use of the idea of a 'causal process'. The wrench is a causal process which transmits momentum from one point in space-time to another. The idea of a causal process previously played an important role in Wesley Salmon's (1984) mark-transmission theory. According to Salmon, causal processes can be separated from 'pseudo processes' by the ability of the former, but not the latter, to transmit a mark. Since this approach makes use of processes having 'abilities', Salmon ultimately rejected it as depending on counterfactuals (Salmon, 1994).

Dowe was able to resurrect the 'process intuition' by demanding of causal processes, not that they have the ability to transmit a mark, but that they actually transmit a conserved quantity. Causal interaction, what Dowe calls 'transient causation', can then be explained as the exchange of a conserved quantity between two (or more) causal processes.

CQ1. A causal process is a world line of an object that possesses a conserved quantity.

CQ2. A causal interaction is an intersection of world lines that involves exchange of a conserved quantity.

The CQV falls down as a suitable notion for data-intensive science because of its limited applicability (Illari, 2011; Illari & Russo, 2014). By its very nature, the CQV only predicts causation where there is the exchange of a fundamental physical property. But most data-intensive science takes place in the biological and social sciences, where the data is unlikely to concern such properties. To remedy this, Boniolo et al. (2011) propose replacing Dowe's conserved quantities with what they call 'extensive quantities'. An extensive quantity is a quantity of an object such that the quantity for the total volume of the object is equal to the sum total for its parts. This makes mass an extensive quantity, since the mass of an object is the sum of the masses of its parts, whereas it excludes velocity, since the velocity of an object is the same as the velocity of its individual parts. Other examples of extensive quantities include number of moles, entropy and angular momentum; other examples of intensive quantities include temperature, colour and shape.

Yet even extensive quantities may not be enough to cover all the cases we are interested in. Whilst Boniolo et al. have identified a greater range of potentially exchanged quantities, these are still limited to properties in the physical sciences. But as we have seen, causal inference occurs in many fields, and it is unlikely the scientist would have access to this kind of data in all of the cases that may use Big Data as evidence.

2.2. Information as a Conserved Quantity

The previous discussion shows that neither Dowe nor Boniolo et al. has provided a version of the CQV that is suitably general for analysing Big Data for causal connections. The lack of applicability was one motivating factor for
Illari (2011) and Illari & Russo (2014) to propose that the transferred quantity along and between causal processes should be understood as information. A similar view had also been advocated by John Collier (1999) who writes 'The basic idea is that causation is the transfer of a particular token of a quantity of information from one state of a system to another' (1999, p. 215).

Thinking of the transferred quantity as information does appear to give us what we need. Any data we have can be considered informative, whether that data is about the charge of an electric plate or the age of an individual. Since data is the primary carrier of information, it is a more suitable concept for automated analysis, and computer scientists have a good deal of experience of quantifying data and making comparative claims about its 'size'.

Coming back to the CQV, we could think of the causal processes as channels which transmit data and the intersection of two causal processes (where causal interaction occurs) as the transfer of information between two such channels. Illari and Russo (2014) argue that this synthesises the established evidence that uncovering mechanisms is important in finding productive causes. They propose that the channel through which the information flows is the mechanism in question. Here I will not speculate on what precisely the nature of physical channels is, as my concern will be more with the nature of the information which flows along them. For that reason I will use 'causal process' and 'channel' interchangeably, whilst noting that at some point the advocate of information-transmission owes us an account of what channels are and how they permit the flow of information.

Both Collier (1999 & 2010) and Illari & Russo (2014) envision that the transferred information along and between channels must be the same information, understood in terms of numerical identity. As we have seen, some quantities do not possess identity over time, and information appears to work like this as well. If I give somebody the information that 'Tomorrow is
Sunday', I do not lose this information. I still retain it even though I have shared it. Yet if information tracked numerical identity, surely I would lose possession? This suggests that if we are to use 'information' as a transferred quantity then we should adopt Dowe's position of numerical equality rather than identity.

This raises the question of whether or not information is a conserved quantity. In physics it has been a standard assumption that in a closed system information cannot be created or destroyed. The one exception to this is black holes (see John Preskill (1992) for an overview of the problem), yet even the latest evidence here suggests black holes may not have the ability to destroy information after all (Hawking, 2015). Information, therefore, appears to be a safe proposal for a quantity that retains numerical equality when transmitted and transferred.

We can adopt as a working theory a version of the CQV which we might call the Informational Conserved Quantities View or i-CQV for short:

i-CQV 1: A causal process is a world line of an object that conserves information.

i-CQV 2: A causal interaction is an intersection of causal processes whose sum total information is conserved.

These definitions are meant to preserve as much as possible of Dowe's original insight but replace the multitude of physical conserved quantities with the single conserved quantity of 'information'.

Thinking of causation as the transfer of information seems like a promising idea, but as it stands, it is more of an analogy than a fully-fledged view. As Illari and Russo themselves acknowledge:

The challenge is precisely to find a concept that covers the many diverse kinds of causal linking in the world, that nevertheless says
something substantive about causality (2014, p. 148).

The generality of the concept of information is its strength and its weakness. As has been remarked many times, the concept of information is multifaceted and difficult to pin down.3 Whilst the many different applications of the term 'information' might be useful for understanding causation in each respective domain, it will be important to find a singular concept if we are to have a philosophically significant and general account of causation that covers a variety of cases. What's more, we need a concept of information that fulfils much the same role as the conserved quantities in the CQV. Three aspects of conserved quantities are essential for their role:

First, whether or not a quantity is transferred between two objects is an objective matter, not relative to an individual observer.

Secondly, the amount of quantity transferred must be measurable. Whilst being 'quantifiable' will do for causation in itself, the quantities need to be actually measurable if the account is to provide any grounding for causal inference.

Thirdly, each of the quantities (q) must be additive such that q(A) + q(B) = q(A+B). This is needed to ensure the amount of a quantity is conserved across causal interactions.

Whilst this list is a good starting point it still leaves a range of possible concepts of information to choose from. I will now turn to assess three of these concepts: (i) information as knowledge update, (ii) information as entropy, and (iii) information as algorithmic complexity. I have chosen these because each is relatively well known, influential in its own domain of application and general enough to cover all fields in which causal inference might take place.

3 See Floridi (2010) for an extensive survey of the different kinds of information that have been recognised.

3. Three Concepts of Information

3.1. Information as 'Knowledge Update'

The first concept I will consider comes from the field of epistemic logic. This branch of logic concerns itself with modelling the knowledge states of agents and how they change when receiving new information. It has been influential not just in philosophy, but has had applications in the fields of computer science, robotics and network security. The relevant concept here is that of the 'epistemic state' of an agent and how that state changes. The basic idea is that when an agent receives some new piece of information, their epistemic state changes. The epistemic state of an agent is modelled using Kripke semantics: each possible world accessible to the agent is a possible way the world could be.

To illustrate, suppose an agent does not know which day of the week it is. This means there are seven possible worlds accessible to her. She is told ‘it is a weekend day’. This reduces the number of possibilities to two. Furthermore, she is told ‘it is not Saturday’. She updates her state and there is only one possibility remaining: it must be Sunday. Every time the agent receives a new piece of information, their state changes, and we can measure 'how informative' that information is by their change of state. This means that this particular concept of information is semantic and qualitative, since it depends upon prior assumptions about what possibilities are available to the agent.

It might be argued that the semantic and qualitative nature of this concept rules out its usefulness in the i-CQV. But this is not obviously the case. Take our three aspects of conserved quantities needed to model causation: objectivity, measurability and additivity. Although how informative a piece of information is depends on the current state of the agent, this can be modelled objectively, and once the current state is given, the informativeness of a message is the same for everyone. It is not a matter of personal opinion or subjective value and so cannot change from person to person. Its
semantic character does not exclude its measurability either: provided one has a model for the agent and the possibilities, the amount of information can be calculated. Lastly, this concept is additive: the update provided by the messages 'weekend day' and 'not-Saturday' is equal to the single message 'weekend day and not-Saturday'.
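To make the proposal concrete, here is a minimal Python sketch of the knowledge-update model, assuming a toy agent whose epistemic state is just a set of possible worlds (the days of the week); the function names and the bits-based measure of informativeness are illustrative choices of mine, not part of the epistemic logic literature itself.

```python
import math

# A toy epistemic state: the set of possible worlds the agent cannot rule out.
DAYS = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}

def update(state, message):
    """An update keeps only the worlds compatible with the message."""
    return state & message

def bits(before, after):
    """Informativeness of an update, as the log-reduction in possibilities."""
    return math.log2(len(before) / len(after))

state = set(DAYS)
weekend = {"Sat", "Sun"}            # 'it is a weekend day'
not_saturday = DAYS - {"Sat"}       # 'it is not Saturday'

s1 = update(state, weekend)         # two worlds remain
s2 = update(s1, not_saturday)       # one world remains: Sunday

# Objectivity: given the state, everyone computes the same value.
# Additivity: two successive updates equal the single combined message.
combined = update(state, weekend & not_saturday)
assert s2 == combined == {"Sun"}
assert math.isclose(bits(state, s1) + bits(s1, s2), bits(state, combined))
```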

How would the i-CQV look if information is interpreted as knowledge update? Let us imagine that the world line of an object provides a knowledge update to an agent, which we may take to be some kind of measurement or observation on a particular occasion.

Causal Process: A world line is a causal process if the epistemic update received by an agent at time t1 excludes the same possibilities as an epistemic update received by an agent at time t2, where t1 and t2 are different points along the world line.

Causal Interaction: There is a causal interaction between two causal processes A and B if the total epistemic update received by an agent observing A and B at time t1 excludes the same possibilities as the total epistemic update received by an agent observing A and B at time t2, where t1 is a point prior to intersection and t2 is a point after intersection.

In the case of a single causal process, what Dowe calls 'immanent causation', how much information an agent receives by observing that object should remain the same, no matter which time they observe it. This would be opposed to a pseudo-process, which can become more or less informative at different times. This seems highly plausible in the case of knowledge updates. Let's go back to our example of the wrench spinning alone in space. It appears that whatever the current state of the agent and no matter what time they observe the wrench, they will receive the same knowledge update, and therefore the same amount of information.

Unfortunately, there are a number of issues with this proposal that make it highly problematic as an interpretation of the i-CQV in the long run.

Firstly, it is clear the agent in the definitions above needs to be an 'ideal agent' in a specific epistemic context. What an agent already knows affects the informativeness of a given piece of information. How detrimental is the inclusion of an ideal agent to this proposal? In terms of causal inference the use of an ideal agent is not that troubling. This is a common device in many approaches to scientific reasoning. Scientists do not reason 'in a vacuum', and provided we are explicit about what knowledge they have during the reasoning process, we can lay down rules for good and bad inferential practices. The inclusion of an ideal agent is more problematic in terms of giving a general philosophical analysis of causation 'as it is in the world'. This is because it makes the fact about whether 'A causes B' relative to the state of the agent; yet intuitively we feel that whether or not causation occurs is independent of what an agent does or does not know.

A second worry arises from the fact that an agent can only update their knowledge once when receiving a given piece of information. Once an agent has been told it's a weekend day, this message cannot provide any further information if they are told a second time. Yet it is precisely this conservation of information that we need from the causal process. One way to get around this problem would be to treat the conserved quantity as an ability of the world line, so that it becomes a causal process if it possesses the ability to update an agent's epistemic state by the same degree. This is very similar to Salmon's original mark-transmission theory. As we have seen already (section 2.1), Salmon rejected this on the grounds that it involves a counterfactual in characterising the ability: in his case 'to be marked' and in our case 'to update an epistemic state'. This would make the resulting view a version of the difference-making approach to causation. Since our goal is to provide a production account, this solution is not one we can appeal to.

So far I have focused on the case of a single causal process, but explicating causal interaction proves to be even more challenging with this concept of information.

Imagine a situation in which particles A and B are passing through space with different values for momentum, energy and charge. At a point along their world lines they collide and transfer some of these quantities. According to the CQV the sum total of quantities remains the same through the collision, but the sum total of knowledge updates received before and after does not. To explain this, let t1 be a time along their world lines prior to collision and t2 a time along their world lines after collision. An agent observing the particles at t1 knows something about the particles at that time, namely their energy, momentum and charge at t1. Like all updates this excludes a number of possibilities at t1 about the world and gives them a particular amount of information. But an agent who observes the particles at t2 cannot exclude these possibilities, as they have no access to the properties of A and B before t2. Given their current observations there are more possibilities available for the energy, momentum and charge at t1, since a number of different configurations are compatible with their current observation.
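A toy sketch may help fix the idea. Assuming invented discrete momentum values, it shows that an agent at t2 who can only measure the conserved total is left with several pre-collision configurations that an agent at t1 could have excluded:

```python
from itertools import product

# Hypothetical discrete momentum values each particle might have had at t1.
MOMENTA = range(-3, 4)

# Suppose the agent at t2 measures only the conserved total momentum.
observed_total = 1

# Pre-collision configurations (pA, pB) compatible with the t2 observation:
compatible = [(pa, pb) for pa, pb in product(MOMENTA, repeat=2)
              if pa + pb == observed_total]
print(compatible)   # six configurations remain open at t2

# An agent at t1 who measured (pA, pB) = (-1, 2) directly would have
# excluded the other five; the agent at t2 cannot, so the two updates
# do not exclude the same possibilities.
```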

This demonstrates that it is possible to learn something at t1 which cannot be learnt at t2, and so for that reason information as 'knowledge update' is not always conserved during interaction. What we need is a numerical measure of information, so that we can say 'the same amount of information is conserved' rather than ‘exactly the same information’. As the concept of information coming from epistemic logic is semantic and qualitative it cannot provide this role. However, a related concept, that of 'entropy', might give us what we need.

3.2. Information as 'Entropy'

The next concept of information has been hugely influential, especially in electrical telecommunication, where it has provided rigorous, mathematical
definitions of optimal coding, noise and channel capacity. The basic idea is that the more likely a message is (out of all possible messages), the less informative it is and vice versa. For any given message or symbol produced by an information source, there is an assumed probability distribution. The 'entropy' (H) contained within a message x is given by its probability p(x) according to the following equation (Shannon & Weaver, 1949):

Entropy: H(x) = -p(x) log2 p(x)

How useful is this idea for thinking about productive causes and causal inference? The first thing to say is that the origin of the concept itself nicely models causation understood as the flow or transfer of information. The original application was for copper wires transmitting messages via electrical wave or impulse (Pierce, 1961). If we think of causal processes as taking place along a channel, then we can readily appreciate the relationship. Secondly, by taking the negative log of the probability, the resulting quantity of information is additive: H(x) + H(y) = H(x+y). This makes the concept suitable for use in the CQV, which requires sum totals to be conserved after interaction. Lastly, by defining the information of a message quantitatively via its likelihood of occurring, we do not need to worry about the meaning of the message. Its semantic content or value is irrelevant on this model. This allows us to avoid the main worry from section 3.1, namely that semantic content is not conserved during causal interaction.
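As a small illustration of the additivity point, here is a hedged Python sketch, assuming independent messages and using the self-information -log2 p(x) of an individual message (the quantity at issue when a channel carries a solitary message):

```python
import math

def self_information(p):
    """Bits carried by a message that occurs with probability p."""
    return -math.log2(p)

p_x, p_y = 0.25, 0.5    # illustrative message probabilities

# For independent messages, p(x and y) = p(x) * p(y), so the logs add:
assert math.isclose(self_information(p_x) + self_information(p_y),
                    self_information(p_x * p_y))
print(self_information(p_x * p_y))   # 3.0 bits
```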

At the moment I have not been precise about how we measure the entropy of a channel, which could mean one of two things. It could mean the 'entropy rate', which is an average of the information carried by a message per second, or it could mean the 'self-information' of a given message received by the receiver at the end of the channel. It's not obvious how either of these relates to causal processes. The entropy rate is an average: by definition this will remain constant. This makes it difficult to separate genuinely causal processes from pseudo-processes on the basis of conserved entropy rate:
trivially all channels will have a conserved entropy rate. Likewise, thinking of the entropy as the property of a channel with respect to a particular message is problematic if a channel produces messages with different likelihoods and therefore different entropies.

Fortunately, the nature of single causal processes suggests we can model them as channels transmitting a solitary message through space-time. When an agent 'receives' that message by observation, they do not wait for another. Providing the causal process is not interacting, it will transmit just one message, with a single amount of self-information:

Causal Process: A causal process is a channel which transmits a message with constant entropy value.

This has prima facie plausibility with respect to the wrench in space. The lost object could only be one of the objects in the toolbox, and therefore it has a predetermined probability distribution. The entropy it carries remains constant and would inform the receiver equally no matter when they intersected it. The step to causal interaction is straightforward:

Causal Interaction: A causal interaction occurs between two channels A and B if the sum total of entropies before interaction equals the sum total of entropies after interaction.

Notice that this view also says causation depends on probabilities, yet it is quite different in nature from other difference-making probabilistic approaches such as those of Reichenbach (1956). Here we are not defining causation in terms of 'chance-raising' events. Instead we are saying that the chance of a message occurring remains constant through interaction, and so the total information, vis-à-vis entropy, remains constant.

Although it characterises causation differently, this version of the i-CQV inherits concerns typically raised against probabilistic difference-making, the most important being how we explain where the values in the probability
distribution come from. How we do this depends on which interpretation of probability we take. For obvious reasons I am keen to avoid a protracted discussion of the pros and cons of various interpretations of probability. I refer the reader to Gillies (2000b) for an overview of the existing literature. I think it is worth, however, looking at three of the most relevant interpretations in order to highlight the difficulties the entropy view faces.

(i) The relative frequency interpretation gives an objective value of probability for an outcome based on the number of positive instances out of the total number of instances. This view faces problems especially concerning causal inference. Firstly, there is an issue regarding how we ascertain the values: sampling is our best option here, which is already consistent with scientific practice, but our sampling method may fall short through sampling bias or statistical irregularity. The case of Big Data does provide some reprieve: if our sample contains all available data (or a high percentage of it), then these biases and chance irregularities can be smoothed over. There is a second issue though: as scientists are not omniscient, they cannot appeal to the 'frequency in the limit' as the number of cases approaches infinity. The probability for any outcome is then contingent on past occurrences. As this is constantly changing, so too is the entropy. A single causal process which transmits a solitary message cannot be expected to have constant entropy, since presumably instances of the event-type are happening elsewhere in the universe and thus changing its probability distribution. Interpreting probability as relative frequency has the undesirable consequence that entropy cannot be a conserved quantity.

(ii) The aforementioned problem can be overcome if we adopt a physical propensity interpretation (Popper, 1959). This view is also objective but claims the probability is given by a propensity of a mechanism or system to produce a certain outcome. For example, flipping a fair coin has a propensity (as a real tendency or dispositional property of the system) to produce heads 1/2 of the time. The trouble with propensity interpretations, as has been
discussed before (Gillies, 2000a, p. 825), is that their value is underdetermined by the evidence. If an event occurs once, its relative frequency is 1, but its physical propensity may be different. Naturally, this raises questions about our ability ever to know the propensities, and therefore the entropies and causal connections. There are also metaphysical worries: philosophers attracted to causal process theories are usually attracted on empirical or Humean grounds. But to borrow an expression from John Earman (1984), propensities fail the 'empiricist loyalty test', since two worlds could agree on all occurrent/observable facts but differ over the chances for physical systems.

(iii) The last option to consider equates probability with subjective degrees of belief (Ramsey, 1926). This interpretation has the virtue of having already been extensively discussed in Bayesian confirmation theory (Howson & Urbach, 1993). However, this interpretation is also problematic for thinking about causation as conservation of entropy. Like the epistemic update view, this notion would depend on an agent and their background beliefs. It is quite possible that subjective degrees of belief are not conserved in causal interaction at all, especially when the outcome of that interaction is surprising to the agent. Alexander Fleming's combined degrees of belief in there being penicillin mould and bacteria in his petri dish may have been far higher than his degree of belief that one would eradicate the other. It is hard to see how conservation could be guaranteed in such cases.

This section has shown that the main sticking point for entropy versions of the i-CQV is their dependence on probability. Whilst this provides a quantitative theory, it requires some explanation of the origin of the probability distribution. Well-known accounts all seem problematic, and these problems will have to be dealt with before this becomes a viable option.

3.3. Information as Algorithmic Complexity

The final notion of information I will consider here originates from algorithmic
information theory (AIT), which was developed independently by Ray Solomonoff (1964), Andrei Kolmogorov (1965) and Gregory Chaitin (1966). Like Shannon's entropy concept of information, AIT also provides a quantitative measure. The basic idea is that informativeness is connected to complexity: the more complex an object, the more information is required to describe it. The size of the information is measured formally as the length of a program running on a universal computing device that can produce a description of the object. An example best illustrates this idea. Compare the following two strings:

(a) 00011010011101001011

(b) 01010101010101010101

Entropy considerations alone suggest that (a) and (b) contain the same amount of information (assuming they are produced by an ergodic source). On closer inspection, string (b) clearly exhibits greater structure than (a), which at first glance seems random in nature. The structure in (b) can be described by an algorithm. This makes string (b) computationally less complex than (a). In order for a universal computer to output (a) it would need to repeat the entire message, whereas for string (b) it need only execute the operation 'print 01 ten times'.

AIT defines the amount of information in a message or string S as the length of the shortest program which, when executed, outputs S and halts. This quantity is known as algorithmic or Kolmogorov complexity (K) after one of its co-discoverers.
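Since K itself cannot be computed directly, one rough way to see the contrast between strings like (a) and (b) is to use an off-the-shelf compressor as a stand-in. The sketch below, which applies Python's zlib to scaled-up analogues of the two strings, is only a loose proxy for algorithmic complexity, not a measurement of K:

```python
import random
import zlib

random.seed(0)
# Scaled-up analogues of strings (a) and (b):
irregular = "".join(random.choice("01") for _ in range(10_000)).encode()
periodic = ("01" * 5_000).encode()

# The periodic string compresses to a tiny description ('print 01 n times');
# the irregular string leaves the compressor far less structure to exploit.
print(len(zlib.compress(irregular)))   # on the order of a thousand bytes
print(len(zlib.compress(periodic)))    # a few dozen bytes
```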

Algorithmic complexity looks like a suitable concept for the i-CQV. It is objective: once the operating language of the universal computer is given, the value of K is the same for everyone. It is measurable: the size of a string can be given simply by counting the number of bits (in the case of
binary). It is additive: K(S1) + K(S2) = K(S1+S2).4

4 It is true that for finite strings the additivity rule may not be met; this is because short strings with structure may not be compressible if the size of their algorithm is large. This difference dissipates as their size increases. So assuming S1 and S2 are relatively large strings, we can assume that additivity is met.

A version of the i-CQV that adopts algorithmic complexity as a measure of information would therefore look something like the following:

Causal Process: A causal process is the world line of an object that conserves algorithmic complexity.

Causal Interaction: There is a causal interaction between causal processes A and B if the sum total algorithmic complexity of A and B before intersection is the same as the sum total algorithmic complexity after intersection.

In the case of the lone wrench in space, the first definition looks plausible. Regardless of which time we describe the wrench, the total amount of resources required to describe it fully will remain the same. Likewise for interaction. Two particles A and B which collide and transfer physical quantities at time t will require the same amount of resources to describe before t as they will after t.

Algorithmic complexity appears to be a promising concept of information for the i-CQV then: it satisfies the three requirements on a conserved quantity and there is intuitive reason to think that it is indeed conserved across causal interaction.

One potential worry is that the value of K is language-dependent. As we are measuring K as the number of symbols in the string, evidently its length will depend on the vocabulary of our encoding. This could be used to show that complexity is not, after all, a conserved quantity. Imagine that I use one language L1 to describe all the properties of the wrench before some particular time t. However, after t, I describe it using a different language,
that of L2. Since K is language-dependent this means that its complexity will not be conserved and that therefore the amount of information carried by the causal process is also not conserved.

There is a solution to this problem that the advocate of complexity could appeal to here. They can exploit a result in AIT known as the 'invariance theorem' (Li & Vitányi, 1993):

Invariance Theorem: for all strings S, |KU1(S) − KU2(S)| ≤ c

This states that for any string the difference between its complexities relative to two universal machines U1 and U2 is bounded by a constant c, whose value depends only on the computational resources required to translate from one coding language to the other. If the strings are themselves long relative to a translation program, then the difference becomes minimal. In the limit, as the size of S tends towards infinity, it is irrelevant.

In reality we are not dealing with strings of infinite size, and so the choice of encoding will have some effect. This could be problematic when it comes to finding evidence of causal connections. One way to avoid this would be to set as a requirement that all descriptions of the world be carried out in a particular language L. Provided scientists continue to use L to describe the world, causal processes will conserve complexity. This raises the question of who decides L and on what basis. The worry is that our choice will always be somewhat arbitrary. We could try appealing to a 'natural language' based on natural kind terms, but as van Fraassen has pointed out (1989, p. 53), regardless of the success of our laws and theories, we will never be able to know whether or not our language is one comprised of such terms.

An altogether better solution is to use the value for 'c' to place a restriction on conservation. Hence the definitions above of causal process and causal interaction hold for a given computing language. When we are using different encodings to describe the world over the course of an object's world
line, then conservation will be maintained within a range of values less than c. Provided it is clear which scenario is present, I do not see any difficulties arising from the language-dependent character of K.
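This proposal can be stated as a simple check. In the minimal sketch below, the function name and the sample description lengths are invented for illustration; lengths count as conserved if they agree to within the translation constant c:

```python
def conserved(len_before, len_after, c=0):
    """Complexity is conserved if the description lengths agree within c.

    c = 0 is strict conservation under a single fixed language;
    c > 0 allows a change of encoding along the world line.
    """
    return abs(len_before - len_after) <= c

# Hypothetical description lengths for the wrench at two times, the second
# produced under a different encoding:
print(conserved(4120, 4131, c=64))    # True: within the translation bound
print(conserved(4120, 9950, c=64))    # False: not conserved
```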

3.4. Summary of the Concepts

The above discussion shows that out of the three concepts, measuring the amount of information in terms of algorithmic complexity seems the least problematic. To be sure, each notion has its own problems, and I don't say here that a version of the i-CQV modified to incorporate ‘knowledge update’ or ‘entropy’ could not be made to work. Nevertheless, information as algorithmic complexity has fewer internal problems as a theory of causation in its own right and therefore offers the most potential for making reliable causal inferences.

4. Searching for Causes in Big Data: The Case of Exposomics

How might the i-CQV interpreted in terms of algorithmic complexity be used to find evidence of causal connections in practice? To specifically highlight its possible role in data-intensive science, I will attempt to answer this question against the backdrop of exposomics research. The reasons for choosing this field are threefold: (1) exposomics research is currently one of the largest scientific studies incorporating Big Data, and as it is contemporary and ongoing it provides fresh information about the methodology of data studies not tainted by historical reconstructions of the process; (2) it has already been discussed at length, particularly in Russo and Williamson (2012), that this field utilises evidence of both difference-making and production when asserting causal connections; and (3) scientists engaged in this project have expressed their interest in finding processes that run from exposure conditions to the onset of a disease (Canali, 2015).

Exposomics research focuses on the search for ‘biomarkers’: these are factors both internal and external to the agent which might be connected to
the onset of a disease. In this respect, it can be seen as a combination of traditional epidemiology (which studies external factors) and genomics (which studies the interplay between internal factors such as gene expression and protein activities in the cell). In exposomics, data is collected from many different sources that might include the lifestyle of an individual, location, age, pollution exposure, family history, genetic composition, ongoing illnesses etc. Programs can then be used to analyse this data for correlations, which are then investigated further by individuals, looking for intermediate biomarkers that suggest evidence of a causal process for the disease.

This discipline provides an ideal case to articulate how productive causes might be searched for in Big Data. Recall that the i-CQV interpreted via algorithmic complexity says there is a causal interaction between two processes if the sum total of their values of K is the same before and after intersection. This suggests the following rule of causal inference:

K-Rule: Track values for biomarkers along the routes of two (or more) causal processes A and B. Encode the description of each process and compute its algorithmic complexity K. If the value of K1 + K2 before interaction equals the value of K1 + K2 after interaction, then there is evidence of a productive cause linking A and B.

The K-Rule only uses the data given and does not require any interventions. For that reason it seems suited to data-intensive science that only uses observational data. Likewise, comparing the size of data is a routine task for computers, and so searching for causes in this way appears to be something that could be automated.

In reality, however, the K-Rule can never be followed exactly. The reason for this is that for a given data structure such as a string of symbols S, K is non-computable. One cannot define a program which, when given S, outputs
its value for K. Given any string we will never know if our best compression of it actually is the best compression possible. What this shows is that scientists using this method can only at best approximate the K-Rule, given the algorithms they currently have for compressing their data. This suggests the formulation of a weaker version of the K-Rule that is more attainable in practice:

Best Compression-Rule: Track values for biomarkers along the routes of two (or more) causal processes A and B. Encode the description of each process using the best compression algorithms available, C1 and C2. If the length of the best compression of A and B before interaction (using C1 and C2) is the same as the length of the compression achieved by applying C1 and C2 to A and B after interaction, then there is evidence of a productive cause linking A and B.

Although this inference-rule has a rather cumbersome formula, its underlying logic is simple. It assumes that the amount of compression achieved by using a particular algorithm is the same before as well as after interaction. The compressibility of the data (with respect to our current algorithms) is invariant along the causal process. In the situation where K is not known, this is the second-best option.
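As a rough indication of how this rule might be automated, here is a hedged Python sketch; it assumes each process is described as a byte string of biomarker readings, lets zlib stand in for the 'best compression algorithm available', and its function names and tolerance parameter are illustrative only:

```python
import zlib

def compressed_length(description: bytes) -> int:
    """Length of our best available compression of a process description."""
    return len(zlib.compress(description, level=9))

def compression_invariant(a_before, b_before, a_after, b_after, tolerance=0):
    """Compare summed compressed lengths before and after interaction."""
    before = compressed_length(a_before) + compressed_length(b_before)
    after = compressed_length(a_after) + compressed_length(b_after)
    return abs(before - after) <= tolerance

# Invariance of the summed lengths counts as (defeasible) evidence of a
# productive cause linking the two processes; a mismatch counts against it.
```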

Again, this inference-rule is compatible with our understanding of data-intensive science as involving observational data and automated analysis. Human scientists will need to supply the necessary algorithm, as this is a creative activity. However, once known, it is a routine computational procedure to apply that algorithm in compressing sets of data and measuring the resulting length. If this kind of 'compression invariance' is found between the data, what can the scientist conclude? At most that there is evidence of a causal process. Just as with difference-making conclusions, one needs to be cautious: the equivalence may just come down to a coincidence.
Nonetheless, when this result is added alongside difference-making evidence such as statistical dependency, it adds further support to the claim that causality is really present.

Although this shows how, in principle, evidence of productive causes can be found in Big Data, it would be foolish to conclude that traditional methods in science are now obsolete. For a start, I agree with Russo and Williamson (2012) that evidence of both kinds of productive cause, processes as well as mechanisms, is needed to establish causal claims. Indeed, it is doubtful whether causal processes established as the conservation of information, as envisioned by the i-CQV, could ever be explanatory. One might find there is an informational equivalence between certain biomarkers and the outcome of a disease, and yet this falls far short of explaining why those conditions give rise to the disease. Rather, what it suggests is a program of further, non-automated research by the scientist to look for complex mechanisms connecting the two. This depends on the scientist’s background knowledge of similar systems and the creative design of experiments to test hypotheses about which mechanisms might be responsible.

References

Anderson, C., 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired Magazine, 23 June.

Aronson, J., 1971. On the Grammar of Cause. Synthese, Volume 22, pp. 414-30.

Boniolo, G., Faraldo, R. & Saggion, A., 2011. Explicating the notion of Causation: The role of extensive quantities. In: P. Illari, F. Russo & J. Williamson, eds. Causality in the Sciences. Oxford: Oxford University Press, pp. 503-525.

Canali, S., 2015. Big Data, Epistemology and Causality: Knowledge in and Knowledge out in EXPOsOMICS. Under Review.

Chaitin, G., 1966. On the Length of Programs for Computing Finite Binary Sequences. Journal of the ACM, 13(4), pp. 547-569.

Clarke, B. et al., 2013. The Evidence that Evidence-Based Medicine Omits. Preventive Medicine, 57(6), pp. 745-747.

Clarke, B. et al., 2014. Mechanisms and the Evidence Hierarchy. Topoi, 33(2), pp. 339-360.

Collier, J., 1999. Causation is the Transfer of Information. Australasian Studies in History and Philosophy of Science, Volume 14, pp. 215-245.

Collier, J., 2010. Information, Causation and Computation. In: G. Crnkovic & M. Burgin, eds. Information and Computation: Essays on Scientific and Philosophical Understanding of Foundations of Information and Computation. London: World Scientific, pp. 89-106.

Dowe, P., 2000. Physical Causation. Cambridge: Cambridge University Press.

Earman, J., 1984. Laws of Nature: The Empiricist Challenge. In: R. J. Bogdan, ed. D. M. Armstrong. Dordrecht: D. Reidel Publishing Company, pp. 191-223.

Fair, D., 1979. Causation and the Flow of Energy. Erkenntnis, Volume 14, pp. 219-250.

Floridi, L., 2010. Information: A Very Short Introduction. Oxford: Oxford University Press.

Gillies, D., 2000a. Varieties of Propensity. British Journal for the Philosophy of Science, 51(4), pp. 807-835.

Gillies, D., 2000b. Philosophical Theories of Probability. London: Routledge.

Ginsberg, J. et al., 2009. Detecting influenza epidemics using search engine query data. Nature, Volume 457, pp. 1012-1014.

Glennan, S., 2002. Rethinking Mechanistic Explanation. Philosophy of Science, 69(1), pp. 342-353.

Guyon, I. et al., 2011. Causality Workbench. In: P. Illari, F. Russo & J. Williamson, eds. Causality in the Sciences. Oxford: Oxford University Press, pp. 543-561.

Hall, N., 2004. Two Concepts of Causation. In: J. Collins, N. Hall & L. A. Paul, eds. Causation and Counterfactuals. Cambridge: MIT Press, pp. 198-222.

Hawking, S., 2015. Stephen Hawking says he's solved a black hole mystery, but physicists await the proof. [Online] Available at: http://phys.org/news/2015-08-stephen-hawking-black-hole-mystery.html [Accessed 4 October 2015].

Howson, C. & Urbach, P., 1993. Scientific Reasoning. Chicago: Open Court.

Illari, P., 2011. Why theories of causality need production: an information-transmission account. Philosophy & Technology, 24(2), pp. 95-114.

Illari, P. & Russo, F., 2014. Causality: Philosophical Theory Meets Scientific Practice. Oxford: Oxford University Press.

Illari, P., Russo, F. & Williamson, J., 2011. Causality in the Sciences. Oxford: Oxford University Press.

Kitchin, R., 2014. The Data Revolution. London: Sage.

Kolmogorov, A., 1965. Three Approaches to the Definition of the Quantity of Information. Problems of Information Transmission, 1(1), pp. 1-7.

Lewis, D., 1973. Causation. Journal of Philosophy, Volume 70, pp. 556-567.

Li, M. & Vitányi, P., 1993. An Introduction to Kolmogorov Complexity and its Applications. New York: Springer-Verlag.

Machamer, P., Darden, L. & Craver, C., 2000. Thinking about Mechanisms. Philosophy of Science, 67(1), pp. 1-21.

Mayer-Schonberger, V. & Cukier, K., 2013. Big Data: A Revolution that will Transform how we Live, Work and Think. London: John Murray.

Pierce, J., 1961. An Introduction to Information Theory: Symbols, Signals and Noise. 1980 ed. New York: Dover.

Pietsch, W., 2015. The Causal Nature of Modeling with Big Data. Philosophy & Technology, doi: 10.1007/s13347-015-2202-2, pp. 1-35.

Popper, K., 1959. The Logic of Scientific Discovery. New York: Basic Books.

Preskill, J., 1992. Do Black Holes Destroy Information? arXiv: hep-th/9209058.

Ramsey, F., 1990. Philosophical Papers. Cambridge: Cambridge University Press.

Ratti, E., 2015. Big Data Biology: Between Eliminative Inferences and Exploratory Experiments. Philosophy of Science, 82(2), pp. 198-218.

Reichenbach, H., 1956. The Direction of Time. Chicago: University of Chicago Press.

Russo, F. & Williamson, J., 2007. Interpreting Causality in the Health Sciences. International Studies in the Philosophy of Science, 21(2), pp. 157-170.

Russo, F. & Williamson, J., 2012. EnviroGenomarkers: The Interplay Between Mechanisms and Difference Making in Establishing Causal Claims. Medicine Studies, 3(4), pp. 249-262.

Salmon, W., 1984. Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press.

Salmon, W., 1994. Causality without Counterfactuals. Philosophy of Science, 61(2), pp. 297-312.

Salmon, W., 1998. Causation and Explanation. Oxford: Oxford University Press.

Shannon, C. & Weaver, W., 1949. The Mathematical Theory of Communication. Urbana: University of Illinois Press.

Solomonoff, R., 1964a. A Formal Theory of Inductive Inference: Part I. Information and Control, 7(1), pp. 1-22.

Solomonoff, R., 1964b. A Formal Theory of Inductive Inference: Part II. Information and Control, 7(2), pp. 224-254.

van Fraassen, B., 1989. Laws and Symmetry. Oxford: Clarenden Press.

Vineis, P., Khan, A., Vlaanderen, J. & Vermeulen, R., 2009. The Impact of New Research Technologies on Our Understanding of Environmental Causes of Disease: The Concept of Clinical Vulnerability. Environmental Health, 8(54).
