restructuring longitudinal data


Upload: vsuarezf2732

Post on 16-Nov-2015


Restructuring longitudinal data: A guide to the unknown

What is longitudinal data?

A dataset is longitudinal if it tracks the same type of information on the same subjects at multiple points in time or space. For example, part of a longitudinal dataset could contain specific students and their standardized test scores in six successive years.

One type of longitudinal data, also known as panel data, consists of a (usually small) number of observations over time on a (usually large) number of cross-sectional units such as individuals, households, firms, or governments.

Examples

Longitudinal data are a subset of hierarchical data: observations that are correlated because they are tied to the same unit. For example, in educational studies we observe student $i$ in school $u$, and presumably there is some tie between the observations in the same school. In such data we observe $y_{j,u}$, where $u$ indicates a unit and $j$ indicates the $j$th observation drawn from that unit. Thus there is no relationship between $y_{j,u}$ and $y_{j,u'}$ for two different units $u$ and $u'$, even though they have the same first subscript. In true longitudinal data, $y_{t,u}$, the first subscript $t$ represents comparable time.

Why Restructure?

One approach to working with longitudinal data sets is to restructure the data set, going either from one observation per subject to several or vice versa. For example, you may have several diagnosis codes in a single observation (visit) and want to compute frequencies of each possible diagnosis code. To do this, you will find it more convenient to have one observation for each diagnosis code, resulting in possibly several observations per subject.
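The diagnosis-code example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the visit records, the column names dx1 to dx3, and the helper to_long are all hypothetical and do not come from the original slides.

```python
from collections import Counter

# Hypothetical wide-format data: one row per visit, up to three diagnosis codes.
visits = [
    {"subject": 1, "dx1": "A10", "dx2": "B20", "dx3": None},
    {"subject": 2, "dx1": "B20", "dx2": None, "dx3": None},
    {"subject": 3, "dx1": "A10", "dx2": "C30", "dx3": "B20"},
]
DX_COLUMNS = ["dx1", "dx2", "dx3"]


def to_long(rows, columns):
    """Return one (subject, code) record per non-missing diagnosis code."""
    long_rows = []
    for row in rows:
        for col in columns:
            code = row[col]
            if code is not None:
                long_rows.append({"subject": row["subject"], "code": code})
    return long_rows


# Several observations per subject, one per diagnosis code.
long_visits = to_long(visits, DX_COLUMNS)

# With one code per row, the frequency table is a simple count.
frequencies = Counter(row["code"] for row in long_visits)
print(frequencies.most_common())
```

Once the data are in the long layout, counting codes no longer depends on how many diagnosis columns the wide layout happened to have.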

Restructuring the Data

Data structure analysis includes making sure that all the components of each data structure are closely related, that closely related data are not split across separate structures, and that the best type of data structure is being used. Data are often much easier to manage and understand when their representation groups them by their relevant similarities.

In data warehouses, data restructuring often involves changing some aspect of the way the database is logically or physically arranged.

Restructuring the Data Cont.

There are generally four types of data restructuring operations: trimming, flattening, stretching, and grafting.

In trimming, the data extracted from the input is placed in the output without any change in the hierarchical relationships, but with some unwanted components of the data removed.

In flattening, the operation produces a flat form from a branch of the input structure by extracting all information at the level of the values of the branch's basic attributes.

The stretching operation produces an output data structure that has more hierarchical levels than the input.

Finally, a grafting operation combines two hierarchies horizontally to form a wider hierarchy by matching common values.
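The four operations can be illustrated on a small nested structure. The sketch below is one possible reading of the definitions above, not a standard implementation: the schools data, the budgets table, and the four function names are hypothetical.

```python
# Hypothetical hierarchy: schools contain students, students contain scores.
schools = [
    {"school": "North", "principal": "Lee",
     "students": [{"name": "Ana", "scores": [71, 80]},
                  {"name": "Ben", "scores": [65]}]},
    {"school": "South", "principal": "Kim",
     "students": [{"name": "Cy", "scores": [90, 88]}]},
]
# A second, separate hierarchy that shares the "school" value.
budgets = [{"school": "North", "budget": 120},
           {"school": "South", "budget": 95}]


def trim(rows):
    """Trimming: keep the hierarchy, drop an unwanted component (principal)."""
    return [{"school": s["school"], "students": s["students"]} for s in rows]


def flatten(rows):
    """Flattening: one flat record per basic value (score) in the branch."""
    return [{"school": s["school"], "name": st["name"], "score": sc}
            for s in rows for st in s["students"] for sc in st["scores"]]


def stretch(flat_rows):
    """Stretching: rebuild flat records into more levels (school, name, scores)."""
    out = {}
    for r in flat_rows:
        out.setdefault(r["school"], {}).setdefault(r["name"], []).append(r["score"])
    return out


def graft(rows, other):
    """Grafting: join two hierarchies side by side on the shared school value."""
    by_school = {o["school"]: o for o in other}
    return [{**s, **by_school[s["school"]]}
            for s in rows if s["school"] in by_school]


trimmed = trim(schools)
flat = flatten(schools)
stretched = stretch(flat)
grafted = graft(schools, budgets)
print(flat[0])
print(stretched["North"])
```

Here flattening and stretching are rough inverses of each other: the flat records have a single level, while the stretched result nests scores under students under schools.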

Restructuring longitudinal data in SPSS

In SPSS, go to Data > Restructure. This lets you restructure your data from multiple variables (columns) in a single case into groups of related cases (rows), or vice versa, or you can choose to transpose your data.

SPSS SYNTAX:

VARSTOCASES
  /ID=id
  /MAKE trans1 FROM VAR00001 VAR00002 VAR00003 VAR00004
  /INDEX=Index1(4)
  /KEEP=
  /NULL=KEEP.
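To make the effect of the restructure concrete, here is a rough Python emulation of the variables-to-cases step and its reverse. It is a sketch under assumptions, not a description of how SPSS works internally: the sample rows are made up, id is treated as an existing identifier for simplicity, and the functions vars_to_cases and cases_to_vars are hypothetical helpers.

```python
SOURCE_VARS = ["VAR00001", "VAR00002", "VAR00003", "VAR00004"]

# Hypothetical wide data: one case per id, four repeated measures.
wide = [
    {"id": 1, "VAR00001": 5, "VAR00002": 7, "VAR00003": None, "VAR00004": 9},
    {"id": 2, "VAR00001": 1, "VAR00002": 2, "VAR00003": 3, "VAR00004": 4},
]


def vars_to_cases(rows, id_var, source_vars, target, index, keep_null=True):
    """Turn each source variable into its own case, numbered 1..n by index."""
    long_rows = []
    for row in rows:
        for position, var in enumerate(source_vars, start=1):
            value = row[var]
            if value is None and not keep_null:
                continue  # analogous to dropping cases with missing values
            long_rows.append({id_var: row[id_var], index: position, target: value})
    return long_rows


def cases_to_vars(long_rows, id_var, source_vars, target, index):
    """Reverse step: spread the cases of each id back over the source variables."""
    wide_by_id = {}
    for row in long_rows:
        if row[id_var] not in wide_by_id:
            empty = {var: None for var in source_vars}
            wide_by_id[row[id_var]] = {id_var: row[id_var], **empty}
        wide_by_id[row[id_var]][source_vars[row[index] - 1]] = row[target]
    return list(wide_by_id.values())


long_keep = vars_to_cases(wide, "id", SOURCE_VARS, "trans1", "Index1")
long_drop = vars_to_cases(wide, "id", SOURCE_VARS, "trans1", "Index1", keep_null=False)
round_trip = cases_to_vars(long_keep, "id", SOURCE_VARS, "trans1", "Index1")
print(len(long_keep), len(long_drop))
```

The keep_null flag mirrors the choice between keeping and dropping missing values; with it switched off, the long data simply has no row for the missing measure.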

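Restructuring longitudinal data in SAS

You can create observations using an ARRAY statement and a DO loop, or you can simply transpose the existing data with PROC TRANSPOSE:

/* Read the tab-delimited file: one row per location, one column per year. */
data neonatal;
  infile 'F:\Thesis Docs\Data\neonatal.txt' delimiter='09'x dsd truncover obs=104;
  input location $ _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_
        _1999_ _2000_ _2001_ _2002_ _2003_ _2004_ _2005_ _2006_ _2007_;
run;

/* BY-group processing requires the data to be sorted by location. */
proc sort data=neonatal;
  by location;
run;

/* Turn the year columns into rows: year holds the old column name, */
/* and the values land in columns named neonatal1, neonatal2, ...   */
proc transpose data=neonatal out=neonatal2 name=year prefix=neonatal;
  by location;
  var _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_
      _1999_ _2000_ _2001_ _2002_ _2003_ _2004_ _2005_ _2006_ _2007_;
run;

/* Drop the variable neonatal2, the second transposed column; it exists */
/* only if a location appears on more than one input row.               */
data neonatal3 (drop=neonatal2);
  set neonatal2;
run;

proc print data=neonatal3 noobs;
run;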
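For readers without SAS, the same read, sort, and transpose pipeline can be approximated in plain Python. This is a sketch only: the sample text stands in for the neonatal file with made-up values and just three year columns, the function names are hypothetical, and converting the column name to an integer year is an extra cleanup step that the SAS code does not perform.

```python
import csv
import io

# Hypothetical stand-in for the tab-delimited file (made-up values, no header row).
SAMPLE = "B\t30\t28\t27\nA\t12\t\t10\n"
YEAR_VARS = ["_1990_", "_1991_", "_1992_"]


def read_wide(text):
    """Read tab-delimited rows into (location, [values]) pairs."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        location, values = fields[0], fields[1:]
        # Pad short lines with missing values instead of reading the next line.
        values += [""] * (len(YEAR_VARS) - len(values))
        rows.append((location, [float(v) if v else None for v in values]))
    return rows


def transpose_by_location(rows):
    """Emit one row per location and year, sorted by location."""
    long_rows = []
    for location, values in sorted(rows, key=lambda r: r[0]):
        for var, value in zip(YEAR_VARS, values):
            long_rows.append({
                "location": location,
                "year": int(var.strip("_")),  # "_1990_" becomes 1990
                "neonatal1": value,
            })
    return long_rows


long_rows = transpose_by_location(read_wide(SAMPLE))
for row in long_rows:
    print(row["location"], row["year"], row["neonatal1"])
```

The inner loop over YEAR_VARS plays the role that an ARRAY statement with a DO loop would play in a SAS data step: it visits each year column in turn and outputs one observation per column.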

Conclusion

Restructuring is fun!