Content analysis and grounded theory
Dr Ayaz Afsar


Introduction

This topic addresses two main forms of qualitative data analysis: content analysis and grounded theory. Many qualitative data analysts undertake forms of content analysis. One of the enduring problems of qualitative data analysis is the reduction of copious amounts of written data to manageable and comprehensible proportions. Data reduction is a key element of qualitative analysis, performed in a way that attempts to respect the quality of the qualitative data. One common procedure for achieving this is content analysis, a process by which the many words of texts are classified into far fewer categories. The goal is to reduce the material in different ways. Categories are usually derived from theoretical constructs or areas of interest devised in advance of the analysis (pre-ordinate categorization) rather than developed from the material itself, though these may, of course, be modified by reference to the empirical data.

What is content analysis?

The term content analysis is often used sloppily. In effect, it simply denotes the process of summarizing and reporting written data, that is, the main contents of data and their messages. More strictly speaking, it denotes a strict and systematic set of procedures for the rigorous analysis, examination and verification of the contents of written data. Krippendorff (2004: 18) defines it as 'a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use'. Texts are defined as any written communicative materials which are intended to be read, interpreted and understood by people other than the analysts.

Originally deriving from the analysis of mass media and public speeches, the use of content analysis has spread to the examination of any form of communicative material, both structured and unstructured. It may be applied to substantive problems at the intersection of culture, social structure and social interaction; used to generate dependent variables in experimental designs; and used to study groups as microcosms of society. Content analysis can be undertaken with any written material, from documents to interview transcriptions, from media products to personal interviews. It is often used to analyse large quantities of text, facilitated by the systematic, rule-governed nature of content analysis, not least because this enables computer-assisted analysis to be undertaken.

Content analysis has several attractions. It is an unobtrusive technique, in that one can observe without being observed. It focuses on language and linguistic features and on meaning in context; it is systematic and verifiable (e.g. in its use of codes and categories), as the rules for analysis are explicit, transparent and public. Further, as the data are in a permanent form (texts), verification through reanalysis and replication is possible.

Weber (1990: 9) sees the purposes of content analysis as including the coding of open-ended questions in surveys, the revealing of the focus of individual, group, institutional and societal matters, and the description of patterns and trends in communicative content. The latter suggestion indicates the role of statistical techniques in content analysis; indeed Weber suggests that the highest quality content-analytic studies use both quantitative and qualitative analysis of texts (texts being defined as any form of written communication). Content analysis takes texts and analyses, reduces and interrogates them into summary form through the use of both pre-existing categories and emergent themes in order to generate or test a theory. It uses systematic, replicable, observable and rule-governed forms of analysis in a theory-dependent system for the application of those categories.

Krippendorff (2004: 224) suggests that there are several features of texts that relate to a definition of content analysis, including the fact that texts have no objective, reader-independent qualities; rather they have multiple meanings and can sustain multiple readings and interpretations. There is no one meaning waiting to be discovered or described in them. Indeed, the meanings in texts may be personal and are located in specific contexts, discourses and purposes, and, hence, meanings have to be drawn in context. Content analysis, then, describes the manifest characteristics of communication (asking who is saying what to whom, and how), infers the antecedents of the communication (the reasons for, and purposes behind, the communication, and the context of the communication), and infers the consequences of the communication (its effects). Krippendorff suggests that content analysis is at its most successful when it can break down linguistically constituted facts into four classes: attributions, social relationships, public behaviours and institutional realities.

How does content analysis work?

Ezzy (2002: 83) suggests that content analysis starts with a sample of texts (the units), defines the units of analysis (e.g. words, sentences) and the categories to be used for analysis, reviews the texts in order to code them and place them into categories, and then counts and logs the occurrences of words, codes and categories. From here statistical analysis and quantitative methods are applied, leading to an interpretation of the results. Put simply, content analysis involves coding, categorizing (creating meaningful categories into which the units of analysis, such as words, phrases and sentences, can be placed), comparing (relating categories and making links between them) and concluding (drawing theoretical conclusions from the text).
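To make the counting stage concrete, here is a minimal sketch in Python. It assumes a hypothetical coding scheme in which each unit of analysis (here, a sentence) has already been assigned a code; the sentences and code names are invented for illustration.

from collections import Counter

# Hypothetical coded units: each unit of analysis (a sentence) has
# already been assigned a code by the researcher.
coded_units = [
    ("The students will undertake problem-solving in science", "PROB"),
    ("I prefer to teach mixed ability classes", "MIXABIL"),
    ("Problem-solving builds confidence in learners", "PROB"),
]

# Count and log the occurrences of each code (the counting stage).
code_counts = Counter(code for _, code in coded_units)
for code, count in code_counts.most_common():
    print(f"{code}: {count}")  # PROB: 2, then MIXABIL: 1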

Content analysis involves counting concepts, words or occurrences in documents and reporting them in tabular form. This indicates essential features of the process of content analysis:
- breaking down text into units of analysis;
- undertaking statistical analysis of the units;
- presenting the analysis in as economical a form as possible.
This masks some other important features of content analysis, including, for example, examination of the interconnectedness of units of analysis (categories), the emergent nature of themes and the testing, development and generation of theory. The whole process of content analysis can follow eleven steps.

Steps

Step 1: Define the research questions to be addressed by the content analysis
This will also include what one wants from the texts to be content-analysed. The research questions will be informed by, indeed may be derived from, the theory to be tested.

Step 2: Define the population from which units of text are to be sampled
The population here refers not only to people but also, and mainly, to text: the domains of the analysis. For example, is it to be newspapers, programmes, interview transcripts, textbooks, conversations, public domain documents, examination scripts, emails, online conversations and so on?

Step 3: Define the sample to be included
Here the rules for sampling people can apply equally well to documents. One has to decide whether to opt for a probability or non-probability sample of documents, a stratified sample (and, if so, the kind of strata to be used), random sampling, convenience sampling, domain sampling, cluster sampling, purposive sampling, systematic sampling, time sampling, snowball sampling and so on.

Robson (1993: 275-9) indicates the need for careful delineation of the sampling strategy here, for example such-and-such a set of documents, such-and-such a time frame (e.g. of newspapers), such-and-such a number of television programmes or interviews. The key issues of sampling apply to the sampling of texts: representativeness, access, size of the sample and generalizability of the results. Krippendorff (2004: 145) indicates that there may be nested recording units, where one unit is nested within another; for example, with regard to newspapers that have been sampled it may be thus:

- the issues of a newspaper sampled;
- the articles in an issue of a newspaper sampled;
- the paragraphs in an article in an issue of a newspaper sampled;
- the propositions constituting a paragraph in an article in an issue of a newspaper sampled.

Step 4: Define the context of the generation of the document
This will examine, for example: how the material was generated; who was involved; who was present; where the documents come from; how the material was recorded and/or edited; whether the person was willing to, able to, and did tell the truth; whether the data are accurately reported; whether the data are corroborated; the authenticity and credibility of the documents; the context of the generation of the document; and the selection and evaluation of the evidence contained in the document.

Step 5: Define the units of analysis
This can be at very many levels, for example a word, phrase, sentence, paragraph, whole text, people and themes. Robson (1993: 276) includes here, for newspaper analysis, the number of stories on a topic, column inches, size of headline, number of stories on a page, position of stories within a newspaper, and the number and type of pictures. His suggestions indicate the careful thought that needs to go into the selection of the units of analysis. Different levels of analysis will raise different issues of reliability, and these are discussed later. It is assumed that the units of analysis will be classifiable into the same category, i.e. text with the same or similar meaning in the context of the text itself (semantic validity), although this can be problematic (discussed later). The description of units of analysis will also include the units of measurement and enumeration.

The coding unit defines the smallest element of material that can be analysed, while the contextual unit defines the largest textual unit that may appear in a single category. Krippendorff distinguishes three kinds of units. Sampling units are those units that are included in, or excluded from, an analysis; they are units of selection. Recording/coding units are units that are contained within sampling units and are smaller than sampling units, thereby avoiding the complexity that characterizes sampling units; they are units of description. Context units are units of textual matter that set limits on the information to be considered in the description of recording units; they are units that delineate the scope of information that coders need to consult in characterizing the recording units.

Krippendorff (2004) continues by suggesting a further five kinds of sampling units: physical (e.g. time, place, size); syntactical (words, grammar, sentences, paragraphs, chapters, series etc.); categorical (members of a category have something in common); propositional (delineating particular constructions or propositions); and thematic (putting texts into themes and combinations of categories). The issue of categories signals the next step. The criterion here is that each unit of analysis (category: conceptual, actual, classification element, cluster, issue) should be as discrete as possible while retaining fidelity to the integrity of the whole, i.e. each unit must be a fair rather than a distorted representation of the context and other data. The creation of units of analysis can be done by ascribing codes to the data.

Step 6: Decide the codes to be used in the analysis
Codes can be at different levels of specificity and generality when defining content and concepts. There may be some codes which subsume others, thereby creating a hierarchy of subsumption (subordination and superordination), in effect creating a tree diagram of codes. Some codes are very general; others are more specific. They keep words as words; they maintain context specificity. Codes may be descriptive and might include: situation codes; perspectives held by subjects; ways of thinking about people and objects; process codes; activity codes; event codes; strategy codes; relationship and social structure codes; and methods codes. However, to be faithful to the data, the codes themselves should derive from the data responsively rather than being created pre-ordinately. Hence the researcher will go through the data ascribing codes to each piece of datum.

A code is a word or abbreviation sufficiently close to that which it is describing for the researcher to see at a glance what it means (in this respect it is unlike a number). For example, the code 'trust' might refer to a person's trustworthiness; the code 'power' might refer to the status or power of the person in the group. Miles and Huberman (1984) advise that codes should be kept as discrete as possible and that coding should start earlier rather than later, as late coding enfeebles the analysis, although there is a risk that early coding might influence too strongly any later codes. It is possible, they suggest, for as many as ninety codes to be held in the working memory while going through data, although clearly there is a process of iteration and reiteration whereby some codes that are used in the early stages of coding might be modified subsequently and vice versa, requiring the researcher to go through a data set more than once to ensure consistency, refinement, modification and exhaustiveness of coding (some codes might become redundant, others might need to be broken down into finer codes). By coding up the data the researcher is able to detect frequencies (which codes are occurring most commonly) and patterns (which codes occur together).
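As an illustration of detecting frequencies and patterns, here is a small sketch which assumes that codes have already been ascribed to segments of data; the segments and codes are invented, and the pairing logic simply counts which codes appear together in the same segment.

from collections import Counter
from itertools import combinations

# Hypothetical segments, each carrying the set of codes ascribed to it.
segments = [
    {"TRUST", "POWER"},
    {"TRUST"},
    {"POWER", "MIXABIL"},
    {"TRUST", "POWER"},
]

# Frequencies: which codes occur most commonly.
frequencies = Counter(code for seg in segments for code in seg)

# Patterns: which pairs of codes occur together in the same segment.
co_occurrence = Counter(
    pair for seg in segments for pair in combinations(sorted(seg), 2)
)

print(frequencies.most_common())     # e.g. [('TRUST', 3), ('POWER', 3), ('MIXABIL', 1)]
print(co_occurrence.most_common(1))  # [(('POWER', 'TRUST'), 2)]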

Hammersley and Atkinson propose that the first activity here is to read and reread the data to become thoroughly familiar with them, noting also any interesting patterns, any surprising, puzzling or unexpected features, and any apparent inconsistencies or contradictions (e.g. between groups, within and between individuals and groups, or between what people say and what they do).

Step 7: Construct the categories for analysis
Categories are the main groupings of constructs or key features of the text, showing links between units of analysis. For example, a text concerning teacher stress could have groupings such as causes of teacher stress, the nature of teacher stress, ways of coping with stress and the effects of stress. Categories are inferred by the researcher, whereas specific words or units of analysis are less inferential; the more one moves towards inference, the more reliability may be compromised, and the more the researcher's agenda may impose itself on the data. Categories will need to be exhaustive in order to address content validity; indeed Robson (1993: 277) argues that a content analysis is no better than its system of categories, and that these can include: subject matter; direction (how a matter is treated, positively or negatively); values; goals; methods used to achieve goals; traits (characteristics used to describe people); actors (who is being discussed); authority (in whose name the statements are being made); location; conflict (sources and levels); and endings (how conflicts are resolved).

This stage (i.e. constructing the categories) is sometimes termed the creation of a domain analysis. This involves grouping the units into domains, clusters, groups, patterns, themes and coherent sets to form domains. A domain is any symbolic category that includes other categories. At this stage it might be useful for the researcher to recode the data into domain codes, or to review the codes used to see how they naturally fall into clusters, perhaps creating overarching codes for each cluster. Unitization is the process of putting data into meaning units for analysis: examining data and identifying what those units are. A meaning unit is simply a piece of datum which the researcher considers to be important; it may be as small as a word or phrase, or as large as a paragraph, a group of paragraphs or, indeed, a whole text, provided that it has meaning in itself. Spradley (1979) suggests that establishing domains can be achieved by four analytic tasks:
- selecting a sample of verbatim interview and field notes;
- looking for the names of things;
- identifying possible terms from the sample;
- searching through additional notes for other items to include.
He identifies six steps to achieve these tasks:

- select a single semantic relationship;
- prepare a domain analysis sheet;
- select a sample of statements from respondents;
- search for possible cover terms and include those that fit the semantic relationship identified;
- formulate structural questions for each domain identified;
- list all the hypothesized domains.

Domain analysis, then, strives to discover relationships between symbols. Like codes, categories can be at different levels of specificity and generality. Some categories are general and overarching; others are less so. Typically codes are much more specific than categories. This indicates the difference between nodes and codes. A code is a label for a piece of text; a node is a category into which different codes fall or are collected. A node can be a concept, idea, process, group of people, place or, indeed, any other grouping that the researcher wishes it to be; it is an organizing category. Whereas codes describe specific textual moments, nodes draw together codes into a categorical framework, making connections between coded segments and concepts.
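A node can be represented straightforwardly as a grouping of codes. A minimal sketch, with invented node and code names, drawing on the teacher stress example above:

# A node is an organizing category; codes are labels for pieces of text.
nodes = {
    "causes of teacher stress": ["WORKLOAD", "INSPECTION", "MIXABIL"],
    "ways of coping with stress": ["PEERSUPPORT", "EXERCISE"],
}

# Reverse lookup: find the node under which a given code is collected.
code_to_node = {code: node for node, codes in nodes.items() for code in codes}
print(code_to_node["MIXABIL"])  # causes of teacher stress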

The distinction is rather like saying that a text can be regarded as a book, with the chapters being the nodes and the paragraphs being the codes, or the contents pages being the nodes and the index being the codes. Nodes can be related in several ways, for example: one concept can define another; they can be logically related; and they can be empirically related.

Step 8: Conduct the coding and categorizing of the data
Once the codes and categories have been decided, the analysis can be undertaken. This concerns the actual ascription of codes and categories to the text. Coding has been defined by Kerlinger (1970) as the translation of question responses and respondent information to specific categories for the purpose of analysis. Many questions are precoded, that is, each response can be immediately and directly converted into a score in an objective way. Rating scales and checklists are examples of precoded questions. Coding is the ascription of a category label to a piece of data, either decided in advance or in response to the data that have been collected.

Mayring suggests that summarizing content analysis reduces the material to manageable proportions while maintaining fidelity to essential contents, and that inductive category formation proceeds through summarizing content analysis by inductively generating categories from the text material. This is in contrast to explicit content analysis, the opposite of summarizing content analysis, which seeks to add in further information in the search for intelligible text analysis and category location. The former reduces contextual detail, the latter retains it. Structuring content analysis filters out parts of the text in order to construct a cross-section of the material using specified pre-ordinate criteria.

It is important to decide whether to code simply for the existence or for the incidence of the concept. This matters because, in the case of the former (existence), the frequency of a concept would be lost, and frequency may give an indication of the significance of a concept in the text. Further, the coder will need to decide whether to code only the exact words or also those with a similar meaning. The former will probably result in significant data loss, as words are not often repeated in comparison to the concepts that they signify; the latter may risk losing the nuanced sensitivity of particular words and phrases. Indeed some speechmakers may deliberately use ambiguous words or those with more than one meaning. In coding a piece of transcription the researcher goes through the data systematically, typically line by line, and writes a descriptive code by the side of each piece of datum, for example:

Text                                                      Code
The students will undertake problem-solving in science    PROB
I prefer to teach mixed ability classes                   MIXABIL

One can see that the codes here are abbreviations, enabling the researcher to understand immediately the issue that they denote because they resemble that issue (rather than, for example, ascribing a number as a code for each piece of datum, where the number provides no clue as to what the datum or category concerns). Where they are not abbreviations, Miles and Huberman (1994) suggest that the coding label should bear sufficient resemblance to the original data so that the researcher can know, by looking at the code, what the original piece of datum concerned.
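A minimal sketch of this line-by-line ascription, assuming a hypothetical keyword-to-code lookup (in practice the researcher assigns codes by judgement, not by mechanical keyword matching):

# Hypothetical mapping from salient keywords to mnemonic codes.
keyword_codes = {
    "problem-solving": "PROB",
    "mixed ability": "MIXABIL",
}

transcript = [
    "The students will undertake problem-solving in science",
    "I prefer to teach mixed ability classes",
]

# Go through the data line by line, writing a code beside each datum.
for line in transcript:
    codes = [c for kw, c in keyword_codes.items() if kw in line.lower()]
    print(f"{line!r:60} -> {', '.join(codes) or '(uncoded)'}")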

Step 9: Conduct the data analysis
Once the data have been coded and categorized, the researcher can count the frequency of each code or word in the text, and the number of words in each category. This is the process of retrieval, which may be in multiple modes, for example words, codes, nodes and categories. Some words may be in more than one category, for example where one category is an overarching category and another is a subcategory. To ensure reliability, Weber suggests that it is advisable at first to work on small samples of text rather than the whole text, to test out the coding and categorization and to make amendments where necessary. Thereafter the complete texts should be analysed, as this preserves their semantic coherence.

Words and single codes on their own have limited power, and so it is important to move to associations between words and codes, i.e. to look at categories and the relationships between categories. Establishing relationships and linkages between the domains ensures that the data, their richness and their context-groundedness are retained. Linkages can be found by identifying confirming cases and by seeking underlying associations and connections between data subsets. Weber suggests that it is preferable to retrieve text based on categories rather than single words, as categories tend to retrieve more than single words, drawing on synonyms and conceptually close meanings. One can make category counts as well as word counts. Indeed, one can specify at what level the counting is to be conducted, for example words, phrases, codes, categories and themes.
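A minimal sketch of counting at both the word and the category level, where a category retrieves more than a single word because it draws on synonyms and conceptually close terms (the text and category lists are invented):

from collections import Counter

text = ("Teachers report stress and anxiety; workload pressure "
        "adds strain, though peer support relieves pressure")

word_counts = Counter(text.lower().replace(";", " ").replace(",", " ").split())

# A category retrieves more than a single word: it draws on synonyms
# and conceptually close meanings.
categories = {
    "stress": {"stress", "anxiety", "pressure", "strain"},
    "support": {"support"},
}
category_counts = {
    cat: sum(word_counts[w] for w in members)
    for cat, members in categories.items()
}

print(word_counts["pressure"])    # 2
print(category_counts["stress"])  # 5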

The implication here is that the frequency of words, codes, nodes and categories provides an indication of their significance. This may or may not be true, since repeated mentions of a word or category may be difficult in certain texts (e.g. speeches). Frequency does not equal importance, and not saying something (withholding comment) may be as important as saying something. Content analysis analyses only what is present, rather than what is missing or unsaid. Further, as Weber (1990) notes:
- pronouns may replace nouns the further one goes through a passage;
- continued raising of the issue may cause redundancy, as it may be counter-productive repetition;
- constraints on text length may inhibit reference to the theme;
- some topics may require much more effort to raise than others.

The researcher can summarize the inferences from the text, look for patterns, regularities and relationships between segments of the text, and test hypotheses. The summarizing of categories and data is an explicit aim of statistical techniques, for these permit trends, frequencies, priorities and relationships to be calculated. At the stage of data analysis there are several approaches and methods that can be used. Krippendorff suggests that these can include:
- extrapolations: trends, patterns and differences;
- standards: evaluations and judgements;
- indices: e.g. of relationships, frequencies of occurrence and co-occurrence, number of favourable and unfavourable items;
- linguistic re-presentations.

Once frequencies have been calculated, statistical analysis can proceed, using, for example:
- factor analysis: to group the kinds of response;
- tabulation: of frequencies and percentages;
- cross-tabulation: presenting a matrix where the words or codes are the column headings and the nominal variables (e.g. the newspaper, the year, the gender) are the row headings (a minimal sketch appears below);
- correlation: to identify the strength and direction of association between words, between codes and between categories;
- graphical representation: for example, to report the incidence of particular words, concepts or categories over time or over texts;
- regression: to determine the value of one variable/word/code/category in relation to another, a form of association that gives exact values and the gradient or slope of the line of best fit (the regression line);
- multiple regression: to calculate the weighting of independent variables on dependent variables;
- structural equation modelling and LISREL analysis: to determine the multiple directions of causality and the weightings of different associations in a pathway analysis of causal relations;
- dendrograms: tree diagrams to show the relationship and connection between categories and codes, and between codes and nodes.

While conducting qualitative data analysis using numerical approaches or paradigms may be criticized for being positivistic, one should note that one of the founders of grounded theory (Glaser 1996) is on record as saying not only that grounded theory developed out of a desire to apply a quantitative paradigm to qualitative data, but also that paradigmatic purity was unacceptable in the real world of qualitative data analysis, in which fitness for purpose should be the guide. Further, one can note that Miles and Huberman (1984) strongly advocate the graphic display of data as an economical means of reducing qualitative data. Such graphics might serve both to indicate causal relationships and to summarize data.
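As an illustration of cross-tabulation, here is a minimal sketch using pandas, with invented newspaper names and codes:

import pandas as pd

# Hypothetical coded units: each row records the source (a nominal
# variable) and the code ascribed to one unit of analysis.
units = pd.DataFrame({
    "newspaper": ["Gazette", "Gazette", "Herald", "Herald", "Herald"],
    "code":      ["PROB",    "MIXABIL", "PROB",   "PROB",   "MIXABIL"],
})

# Cross-tabulation: codes as column headings, the nominal variable as rows.
print(pd.crosstab(units["newspaper"], units["code"]))
# code       MIXABIL  PROB
# newspaper
# Gazette          1     1
# Herald           1     2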

Step 10: Summarizing
By this stage the investigator will be in a position to write a summary of the main features of the situation that have been researched so far. The summary will identify key factors, key issues, key concepts and key areas for subsequent investigation. It is a watershed stage during the data collection, as it pinpoints major themes, issues and problems that have arisen, so far, from the data (responsively) and suggests avenues for further investigation. The concepts used will be a combination of those derived from the data themselves and those inferred by the researcher. At this point the researcher will have gone through the preliminary stages of theory generation. Patton (1980) sets these out for qualitative data:

- finding a focus for the research and analysis;
- organizing, processing, ordering and checking data;
- writing a qualitative description or analysis;
- inductively developing categories, typologies and labels;
- analysing the categories to identify where further clarification and cross-clarification are needed;
- expressing and typifying these categories through metaphors;
- making inferences and speculations about relationships, causes and effects.

Bogdan and Biklen (1992: 154-63) identify several important factors that researchers need to address at this stage, including: forcing oneself to take decisions that will focus and narrow the study and decide what kind of study it will be; developing analytical questions; using previous observational data to inform subsequent data collection; writing reflexive notes and memos about observations, ideas and what is being learned; trying out ideas with subjects; analysing relevant literature while conducting the field research; and generating concepts, metaphors, analogies and visual devices to clarify the research.

Step 11: Making speculative inferences
This is an important stage, for it moves the research from description to inference. It requires the researcher, on the basis of the evidence, to posit some explanations for the situation, some key elements and possibly even their causes. It is the process of hypothesis generation or the setting of working hypotheses that feeds into theory generation.

The stage of theory generation is linked to grounded theory, and I will turn to this later in the lecture. Here I will provide an example of content analysis that does not use statistical analysis but which nevertheless demonstrates the systematic approach to analysing data that is at the heart of content analysis.

At a wider level, the limits of content analysis are suggested by Ezzy, who argues that, owing to the pre-ordinate nature of coding and categorizing, content analysis is useful for testing or confirming a pre-existing theory rather than for building a new one, though this perhaps understates the ways in which content analysis can be used to generate new theory, not least through a grounded theory approach (discussed later). In many cases content analysts know in advance what they are looking for in text, and perhaps what the categories for analysis will be. Ezzy (2002: 85) suggests that this restricts the extent to which the analytical categories can be responsive to the data, thereby confining the data analysis to the agenda of the researcher rather than the other way round. In this way it enables pre-existing theory to be tested. Indeed Mayring (2004: 269) argues that if the research question is very open, or if the study is exploratory, then more open procedures than content analysis, e.g. grounded theory, may be preferable.

However, although inductive approaches may be ruled out of the early stages of a content analysis, this does not keep them out of the later stages, as themes and interpretations may emerge inductively from the data and the researcher, rather than only or necessarily from the categories or pre-existing theories themselves. Hence to suggest that content analysis denies induction, or is confined to the testing of pre-existing theory, is uncharitable; it is to misrepresent the flexibility of content analysis. Indeed Flick (1998) suggests that pre-existing categories may need to be modified if they do not fit the data.

Grounded theory

Theory generation in qualitative data analysis can be emergent, and grounded theory is an important method of theory generation. It is more inductive than content analysis, as the theories emerge from, rather than exist before, the data. Strauss and Corbin (1994: 273) remark: 'grounded theory is a general methodology for developing theory that is grounded in data systematically gathered and analysed'. There are several features of this definition:
- Theory is emergent rather than predefined and tested.
- Theory emerges from the data rather than vice versa.
- Theory generation is a consequence of, and partner to, systematic data collection and analysis.
- Patterns and theories are implicit in data, waiting to be discovered.


Glaser (1996) suggests that grounded theory is the systematic generation of a theory from data; it is an inductive process in which everything is integrated and in which data pattern themselves rather than having the researcher pattern them, as actions are integrated and interrelated with other actions. Glaser and Strauss's (1967) seminal work rejects simple linear causality and the decontextualization of data, and argues that the world which participants inhabit is multivalent, multivariate and connected. As Glaser (1996) says, 'the world doesn't occur in a vacuum', and the researcher has to take account of the interconnectedness of actions. In everyday life actions are interconnected and people make connections naturally; it is part of everyday living, and hence grounded theory catches the naturalistic element of research and formulates it into a systematic methodology. Grounded theory is faithful to how people act; it takes account of apparent inconsistencies, contradictions, discontinuities and relatedness in actions.

Grounded theory is a systematic methodology, using systematized methods (discussed below) of theoretical sampling, coding, constant comparison, the identification of a core variable, and saturation. Grounded theory is not averse to quantitative methods; indeed it arose out of them (Glaser 1996), in terms of trying to bring to qualitative data some of the analytic methods applied in statistical techniques (e.g. multivariate analysis). In grounded theory the researcher discovers what is relevant; indeed Glaser and Strauss's (1967) work is entitled The Discovery of Grounded Theory. However, where it parts company with much quantitative, positivist research is in its view of theory. In positivist research the theory pre-exists its testing and the researcher deduces from the data whether the theory is robust and can be confirmed; the data are forced into a fit with the theory. Grounded theory, on the other hand, does not force data to fit a predetermined theory. Indeed the difference between inductive and deductive research is less clear than it appears at first sight: for example, before one can deduce, one has to generate theory and categories inductively.

Grounded theory starts with data, which are then analysed and reviewed to enable the theory to be generated from them; it is rooted in the data and little else. Here the theory derives from the data: it is grounded in the data and emerges from them. As Lincoln and Guba (1985: 205) argue, grounded theory must fit the situation that is being researched. Glaser (1996) writes that 'forcing' methodologies were too ascendant, not least in positivist research, and that grounded theory had to reject the forcing or constraining of the nature of a research investigation by pre-existing theories. As grounded theory sets aside any preconceived ideas, letting the data themselves give rise to the theory, certain abilities are required of the researcher, for example:

- tolerance and openness to data and what is emerging;
- tolerance of confusion and regression (feeling stupid when the theory does not become immediately obvious);
- resistance to premature formulation of theory;
- ability to pay close attention to data;
- willingness to engage in the process of theory generation rather than theory testing (it is an experiential methodology);
- ability to work with emergent categories rather than preconceived or received categories.

As theory is not predetermined, the role of targeted pre-reading is not as strong as in other kinds of research (e.g. using literature reviews to generate issues for the research); indeed it may be dangerous, as it may prematurely close off or determine what one sees in data; it may cause one to read data through given lenses rather than anew. As one does not know what one will find, one cannot be sure what one should read before undertaking grounded theory. One should read widely, both within and outside the field, rather than narrowly and in too focused a direction. There are several elements of grounded theory that contribute to its systematic nature, and it is to these that I now turn.

Theoretical sampling

In theoretical sampling, data are collected on an ongoing, iterative basis, and the researcher keeps adding to the sample until there are enough data to describe what is going on in the context or situation under study and until theoretical saturation is reached (discussed below). As one cannot know in advance when this point will be reached, one cannot determine the sample size or representativeness until one is actually doing the research. In theoretical sampling, data collection continues until sufficient data have been gathered to create a theoretical explanation of what is happening and what constitutes its key features. It is not a question of representativeness but, rather, a question of allowing the theory to emerge.
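A minimal sketch of this iterative logic, treating 'no new codes emerging across successive cases' as a crude proxy for theoretical saturation (the cases and codes are invented, and real saturation is a substantive judgement, not a counting rule):

# Hypothetical stream of cases; each case yields a set of codes.
cases = iter([
    {"TRUST", "POWER"},
    {"TRUST", "WORKLOAD"},
    {"POWER"},
    {"TRUST"},
    {"POWER", "TRUST"},
])

theory_codes = set()
rounds_without_new = 0

# Keep adding to the sample until no new codes emerge; the sample
# size is not fixed in advance.
while rounds_without_new < 2:
    case = next(cases, None)
    if case is None:
        break
    new_codes = case - theory_codes
    theory_codes |= new_codes
    rounds_without_new = 0 if new_codes else rounds_without_new + 1

print(sorted(theory_codes))  # ['POWER', 'TRUST', 'WORKLOAD']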


Theoretical sampling is the process of data collection for generating theory whereby the analyst jointly collects, codes and analyses his data and decides what data to collect next, and where to find them, in order to develop his theory as it emerges. This process of data collection is controlled by the emerging theory. The basic criterion governing the selection of comparison groups for discovering theory is their theoretical relevance for furthering the development of emerging categories, rather than, for example, conventional sampling strategies.

Coding

Coding is the process of disassembling and reassembling the data. Data are disassembled when they are broken apart into lines, paragraphs or sections. These fragments are then rearranged, through coding, to produce a new understanding that explores similarities and differences across a number of different cases. The early part of coding may well be confusing, with a mass of apparently unrelated material; however, as coding progresses and themes emerge, the analysis becomes more organized and structured.

In grounded theory there are three types of coding: open, axial and selective coding, the intention of which is to deconstruct the data into manageable chunks in order to facilitate an understanding of the phenomenon in question. Open coding involves exploring the data and identifying units of analysis to code for meanings, feelings, actions, events and so on. The researcher codes up the data, creating new codes, categories and subcategories where necessary, and integrating codes where relevant, until the coding is complete. Axial coding seeks to make links between categories and codes, to integrate codes around the axes of central categories; the essence of axial coding is the interconnectedness of categories (Creswell 1998: 57). Hence codes are explored, their interrelationships are examined, and codes and categories are compared to existing theory. Selective coding involves identifying a core code; the relationship between that core code and other codes is made clear, and the coding scheme is compared with pre-existing theory. Creswell (1998: 57) writes that in selective coding the researcher identifies a 'story line' and writes a story that integrates the categories in the axial coding model.
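One crude way to picture the three passes is as successive regroupings of the same material; a minimal sketch with invented codes and categories (real coding is interpretive, not mechanical):

# Open coding: fragments of data labelled with codes.
open_codes = {
    "WORKLOAD": ["datum 1", "datum 4"],
    "INSPECTION": ["datum 2"],
    "PEERSUPPORT": ["datum 3"],
}

# Axial coding: codes integrated around the axes of central categories.
axial = {
    "causes of stress": ["WORKLOAD", "INSPECTION"],
    "coping with stress": ["PEERSUPPORT"],
}

# Selective coding: a core code around which the story line is built.
core = "teacher stress"
print(f"The core category '{core}' integrates: {', '.join(axial)}")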

As coding proceeds, the researcher develops concepts and makes connections between them. Flick et al. (2004: 19) argue that repeated coding of data leads to denser concept-based relationships and hence to a theory, i.e. the richness of the data is included in the theoretical formulation.

Constant comparison

The application of open, axial and selective coding adopts the method of constant comparison. In constant comparison the researcher compares the new data with existing data and categories, so that the categories achieve a perfect fit with the data. If there is a poor fit between data and categories, or indeed between theory and data, then the categories and theories have to be modified until all the data are accounted for. New and emergent categories are developed in order to be able to incorporate and accommodate data in a good fit, with no discrepant cases. Glaser and Strauss (1967: 102) write that the purpose of the constant comparative method of joint coding and analysis is 'to generate theory ... by using explicit coding and analytic procedures'. That theory is not intended to ascertain universality or the proof of suggested causes or other properties; since no proof is involved, the constant comparative method 'requires only saturation of data, not consideration of all available data'.
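A minimal sketch of this fit-and-modify logic, assuming a hypothetical fits() judgement; it shows only that the category system is extended until every incoming datum is accounted for, with no discrepant cases:

# Hypothetical fit test: a datum fits a category if it carries that
# category's label (real fit is a substantive interpretive judgement).
def fits(datum: str, category: str) -> bool:
    return datum.startswith(category + ":")

categories = ["stress"]
incoming = ["stress:workload", "stress:inspection", "coping:exercise"]

for datum in incoming:
    if not any(fits(datum, c) for c in categories):
        # Poor fit: modify the categories until all data are
        # accounted for (crudely, by adding a new category).
        categories.append(datum.split(":")[0])

print(categories)  # ['stress', 'coping']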


In constant comparison, then, discrepant, negative and disconfirming cases are important in assisting the categories and the emergent (grounded) theory to fit all the data. Constant comparison is the process by which the properties and categories across the data are compared continuously until no more variation occurs (Glaser 1996), i.e. saturation is reached. In constant comparison, data are compared across a range of situations, times, groups of people, and through a range of methods. The process resonates with the methodological notion of triangulation. Glaser and Strauss (1967: 105-13) suggest that the constant comparison method involves four stages: comparing incidents and data that are applicable to each category; integrating these categories and their properties; bounding the theory; and setting out the theory. The first stage involves the coding of incidents and comparing them with previous incidents in the same and different groups, and with other data that are in the same category.

The second stage involves memoing and further coding. Here the constant comparative units change 'from comparison of incident with incident to comparison of incident with properties of the category that resulted from initial comparisons of incidents' (Glaser and Strauss 1967: 108). The third stage, delimitation, occurs at the levels of the theory and the categories; here the major modifications reduce as underlying uniformities and properties are discovered, and theoretical saturation takes place. The final stage, writing theory, occurs when the researcher has gathered and generated coded data, memos and a theory; this is then written in full.

The End