
Upload: tim-mills-groninger

Post on 09-Jul-2015


DESCRIPTION

Explanation and examples of sampling SQL tables via random or systematic methods with emphasis on NCQA/HEDIS methodology

TRANSCRIPT

T-SQL Sampling Page 1 of 18 [email protected]

Sampling Data in T-SQL

One of the great benefits of databases is that numerical analysis can be done against the entire population. Analysis of trends and behaviors can be performed just as easily against ten million records as against one hundred. That is, assuming that all of the necessary information is contained within the data set. Unfortunately, that isn't always the case; sometimes it is necessary to conduct manual research for additional information and report against the results.

The goal of sampling is to select the smallest number of records that significantly represents the characteristics of the whole. The statistical process of determining sample size N will not be addressed here; it is assumed that you know N and are only curious about how to select that many records in a way that guarantees a reasonable level of randomness.

One of the inspirations for documenting the process of sampling SQL data was the NCQA/HEDIS Systematic Sample Methodology. NCQA is the National Committee for Quality Assurance, which runs a research program called the Healthcare Effectiveness Data and Information Set (HEDIS). The majority of HEDIS measures involve analyzing an entire population within a health plan (for example, women between specific ages) and counting the number who have received a specific procedure (for example, a breast cancer screening). However, some measures require data not currently in the medical or claims data, and these use a hybrid approach: identifying the eligible population, then sampling that population to identify specific members whose medical records will be examined manually to determine how each person's results will be evaluated.

But before jumping into the NCQA/HEDIS sampling methodology, let's examine some simpler alternatives. Appendix A contains a T-SQL script to create and populate a sample table called Individuals within a database named Demonstration. You are welcome to create the same table and run the example queries to make this more of a hands-on experience. Likewise, the techniques shown here can be used in any number of situations beyond HEDIS measures. On a side note: the names used are drawn from the staff list of the NPR radio show Car Talk.

Example 1: Selecting all records

The SQL query:

Select
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I

will produce a result set of 137 rows, typically in data entry order.


Note that some records are duplicated and that the results are shown in data entry order. Selecting records in data entry (natural) order is not sufficient for sampling; there are too many biases built in. Thus, assuming that we want the sample size N to be 25, the Top 25 constraint in the following query will return just 25 records, but the wrong ones from a statistical standpoint.

Select Top 25
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I

The records selected can be randomized by using the NewId() function to put the returned records in random order. By adding the clause Order By NewId() to the statement, you will see a different set of records returned every time you run the query.

Select Top 25 percent
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
Order By
    NewId()

Note that adding percent to the Top 25 clause returns a percentage of the total table; in this case, 35 rows out of the 137 total.
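If you want exactly N rows rather than a percentage, the same query can simply drop the percent keyword. The sketch below assumes the Individuals table from Appendix A. The second statement only checks the arithmetic behind the 35-row result: Top ... percent rounds a fractional row count up to the next whole row.

```sql
-- Exactly 25 rows in random order (no percent keyword).
Select Top (25)
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
Order By
    NewId();

-- 25 percent of 137 rows is 34.25, which rounds up to 35.
Select Ceiling(137 * 25 / 100.0) As RowsReturned;
```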


This is the simplest way to extract a random sample of a specific size from a data set. Unfortunately, not all statisticians will accept a purely random sample. Whether this bias is the result of pre-computer era difficulties in generating random numbers and applying them to a population, or it represents a legitimate concern about the distribution of the data, many situations will require a periodic or systematic sample of the data. Until the abacus and quill pen generation that creates these specifications finally dies off, systematic sampling will remain a mandatory skill.

The simplest and most common systematic sampling method (though not the one used in HEDIS) is to select a starting record and then every nth record from that record on. The interval n is found by dividing the total population or record count (137 in our example) by the sample size N (25), which gives 5.48 and is rounded down to 5. A simple way to skip a set number of records is with modulus. The mathematical function modulus (also just mod) gives the remainder of one number divided by another. For example, 6 mod 5 is 1, while 15 mod 5 is 0. In T-SQL the modulus operator is "%". The Individuals table has a sequential, system-assigned field called IdNo, so selecting every 5th record can be achieved by adding the constraint IdNo % 5 = 0; to start after a particular record (say 11), add the constraint IdNo > 11. To change the offset, add some starting value to IdNo.
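The arithmetic in the paragraph above can be checked directly in a query window. This small sketch needs no table; the column aliases are made up for illustration.

```sql
Select
    6 % 5    As SixModFive,      -- remainder 1
    15 % 5   As FifteenModFive,  -- remainder 0
    137 / 25 As IntervalN;       -- integer division drops the .48, leaving 5
```

Because both operands of 137 / 25 are integers, T-SQL performs integer division, which is the same rounding down described above.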

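The query

Select
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
Where
    (I.IdNo + 2) % 5 = 0
    And I.IdNo > 11

will return exactly 25 records (IdNo 13, 18, 23 and so on through 133, provided IdNo runs from 1 to 137 without gaps), that is, every 5th record at a three-position offset from mod 5.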

Omitting the IdNo > 11 condition will return 27 rows, but you can apply a Top 25 constraint to return the exact N desired. This method does a very good job of returning every nth record as determined by the IdNo field. However, if the table has significant deletions, and the deleted records follow any kind of pattern, the exact periodicity of the sequence can be jeopardized.

The Row_Number() function in T-SQL allows you to assign a new sequential number to each row when the query is executed. The same mod function can be used with the new row number, with a much more regular segmentation. To jump ahead a bit, the NCQA/HEDIS systematic sampling methodology requires that the population be ordered by last name, first name, and birth date (in descending order one year, and ascending the next). Since Row_Number() requires a sort, the following query orders and numbers the full data set per that requirement.

The query

Select
    ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc) RowNum,
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I

returns every row of the table, numbered in last name, first name, and birth date order.

From here it is trivial to apply the previous mod example to the RowNum field and return a reasonable periodic sample. Unfortunately, the NCQA/HEDIS systematic sampling methodology does not use a set interval. Instead, each ith member (in this example, the second through 25th records returned) has a specific calculation.
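Before turning to the HEDIS calculation, here is one way the fixed-interval sample on RowNum might look. It is only a sketch that assumes the Individuals table from Appendix A and the interval of 5 worked out earlier; the CTE name is arbitrary.

```sql
;With Row_List_CTE As (
    Select
        Row_Number() Over (Order By I.LName, I.FName, I.BirthDate Asc) As RowNum,
        I.IdNo,
        I.LName,
        I.FName,
        I.BirthDate
    From
        Individuals I
)
Select Top (25)
    RowNum,
    IdNo,
    LName,
    FName,
    BirthDate
From
    Row_List_CTE
Where
    RowNum % 5 = 0      -- every 5th row of the sorted list
Order By
    RowNum;
```

Because RowNum is generated at query time, gaps left by deleted IdNo values no longer disturb the interval.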

The first record is called START and is determined by multiplying a random number supplied by NCQA by the number of eligible members (EM) divided by the final sample size (FSS). In the calculations EM/FSS is referred to as N. Note that this N is the sampling interval, not the sample size N used earlier. For this exercise assume that START is equal to 2.

The calculation for each member of the sample is

ith member = START + [(i-1) x N]

In T-SQL a common table expression (CTE) can be used to determine the rows to be selected, as follows.

;With Row_List_CTE(RowNum)
As (
    Select
        ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
    From
        Individuals I
)
Select Top 25
    RowNum
    , RowNum - 1 SelRec
    , 137.0/25.0 NR
    , (RowNum-1)*(137.0/25.0) RN1
    , CAST(ROUND(2+(RowNum-1)*(137.0/25.0),0) As Int) Final
From
    Row_List_CTE
Order By
    RowNum   -- without this, Top 25 is not guaranteed to return rows 1 through 25

The first line defines the CTE Row_List_CTE with a single column called RowNum. The Select clause following the As populates the CTE with rows 1 through 137.

The next Select statement uses the CTE to demonstrate each step in the calculation.

RowNum is the original row number and plays the role of i. SelRec is the (i-1) part of the calculation. NR is N in the calculation, the result of EM/FSS (137/25), or 5.48. RN1 shows the core of the calculation, [(i-1) x N]. The Final field adds the START value of 2 to RN1 and rounds the result to the nearest integer. This is the row number of the record to be used in the sample. This approach leaves an interval of 5 or 6 rows between each selected record, which does, grudgingly, improve the quality of the sample by varying the interval.

To show the basic elements of name, birth date, and age in the final query, the same CTE can be used to select the demographics and determine the sample.

Declare @EvalDt Date
Declare @N Decimal(9,3)
Set @EvalDt = '12-31-2012'
Set @N = 137.0/25.0

;With Ind_List_CTE(RowNum, IdNo, LName, FName, BirthDate)
As (
    Select
        ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc) RowNum,
        I.IdNo,
        I.LName,
        I.FName,
        I.BirthDate
    From
        Individuals I
)
Select
    RowNum
    , IdNo
    , FName
    , LName
    , BirthDate
    , DATEDIFF(Hour, BirthDate, @EvalDt)/8766 As Age
From
    Ind_List_CTE
Where RowNum In
(
    Select Top 25
        --CAST(ROUND(2+(RowNum-1) * @N,0) As Int)
        Floor(ROUND(2+(RowNum-1) * @N,0))
    From
        Ind_List_CTE
    Order By
        RowNum   -- makes sure rows 1 through 25 (i = 1..25) drive the calculation
)

In order to make the query more usable in the future, key variables are declared in the first section. Because the typical HEDIS measure wants the age of the member at the conclusion of the evaluation year, the parameter @EvalDt is defined and set to 12-31-2012. Subsequently, age is determined by calculating the difference in hours between the birth date and the evaluation date and dividing that by 8766, the average number of hours in a year.
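To see the age calculation in isolation, the sketch below applies it to a single made-up birth date. The second column is not from the original queries; it is a common alternative that counts year boundaries and subtracts one when the birthday has not yet occurred. It is included only for comparison, since the hours-based figure can be off by a year for dates that fall right around a birthday.

```sql
Declare @EvalDt Date = '2012-12-31';
Declare @BirthDate Date = '1960-04-04';   -- hypothetical birth date for illustration

Select
    -- The approach used in the text: elapsed hours over average hours per year.
    DateDiff(Hour, @BirthDate, @EvalDt) / 8766 As AgeByHours,
    -- Alternative (an assumption, not the author's method): birthday-based age.
    DateDiff(Year, @BirthDate, @EvalDt)
      - Case
            When DateAdd(Year, DateDiff(Year, @BirthDate, @EvalDt), @BirthDate) > @EvalDt
            Then 1
            Else 0
        End As AgeByBirthday;
```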

Rather than use the Cast function from the previous example, the Floor function is used to calculate the row number to be selected for the sample. This calculation is then used in the Where clause to filter the records for the sample.


The resulting data set follows the NCQA/HEDIS technical specifications exactly and will return the same results from the same source data every time.

One concern resulting from this approach is that identically named parents and children are unlikely to appear together in the same sample. However, this will likely exclude duplicates as well, so it might be considered an advantage. Feel free to experiment with the sample data to determine which records are duplicates and which may just represent multiple generations.

Likewise, feel free to adapt the examples presented here to your circumstances, either for a random sample or a truly systematic sample, for a HEDIS measure or some other application.
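As one possible adaptation, the sketch below replaces the hard-coded 137, 25, and 2 with variables so the same query can be pointed at a different population or sample size. The variable names @Start, @FSS, and @EM are made up here, START is still assumed to be 2, and adding IdNo as a final tie-breaker in the sort is my own assumption to keep the numbering of duplicate rows stable; check it against the current specification before relying on it.

```sql
-- Hypothetical parameters; adjust to the measure at hand.
Declare @Start Int = 2;     -- START, assumed to be 2 as in the text
Declare @FSS   Int = 25;    -- final sample size
Declare @EM    Int;         -- eligible members, counted from the table
Declare @N     Decimal(9,3);

Select @EM = Count(*) From Individuals;
Set @N = 1.0 * @EM / @FSS;  -- 137 / 25 = 5.480 for the sample data

;With Ind_List_CTE As (
    Select
        Row_Number() Over (Order By I.LName, I.FName, I.BirthDate Asc, I.IdNo) As RowNum,
        I.IdNo,
        I.LName,
        I.FName,
        I.BirthDate
    From
        Individuals I
)
Select
    S.RowNum,
    S.IdNo,
    S.FName,
    S.LName,
    S.BirthDate
From
    Ind_List_CTE S
Where S.RowNum In
(
    -- One target row number per sample position i = 1..@FSS
    Select Cast(Round(@Start + (P.RowNum - 1) * @N, 0) As Int)
    From Ind_List_CTE P
    Where P.RowNum <= @FSS
)
Order By
    S.RowNum;
```

Filtering on RowNum <= @FSS instead of using Top keeps the sample size parameterized without relying on row order inside the subquery.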

To work through these examples, first copy and paste Appendix A into SQL Server Management Studio and run the script to create and populate the table. Use Appendix B to create a comprehensive set of queries. Move the comment markers to activate or deactivate query sections.

Appendix A

Creating the sample data.

/*
Create the sample table Individuals and populate it
*/
USE Demonstration
GO

/****** Object: Table [dbo].[Individuals] Script Date: 02/26/2013 10:34:23 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Individuals]') AND type in (N'U'))
DROP TABLE [dbo].[Individuals]
GO

/****** Object: Table [dbo].[Individuals] Script Date: 02/26/2013 10:34:24 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Individuals](
    [IdNo] [int] IDENTITY(1,1) NOT NULL,
    [FName] [varchar](50) NULL,
    [LName] [varchar](50) NULL,
    [BirthDate] [date] NULL,
    [Sex] [varchar](10) NULL,
    [Address] [varchar](100) NULL,
    [City] [varchar](50) NULL,
    [St] [varchar](10) NULL,
    [Zip] [varchar](10) NULL,
    [Phone] [varchar](25) NULL,
    [Fee] [money] NULL
) ON [PRIMARY]
GO

SET ANSI_PADDING OFF
GO

Set Nocount on
Go

Insert into dbo.Individuals values ('Imelda','Czechs', '8/20/2013','M','121 Lasting Light Way','Buck County Village','PA','13432','(345) 148-4523 x 123',38.63 )
Insert into dbo.Individuals values ('Imelda','Czechs', '8/20/2013','M','121 Lasting Light Way','Buck County Village','PA','13432','(345) 148-4523 x 123',38.63 )
Insert into dbo.Individuals values ('Douse Anne','Burnham','12/15/1935','Sex','2345 West Maine','Anytown','IL','60604','808/445-5934', 9.25)
Insert into dbo.Individuals values ('Sue','Flockey','8/4/1981','M','2012 S Michigan Ave','Chicago','IL','60600','312/668-5531', 71.25)
Insert into dbo.Individuals values ('Dasha','Chekhov','9/24/1984','F','2132 S Michigan','Chicago','IL','60601','312/134-7467', 63.04)
Insert into dbo.Individuals values ('Vishnu','Payup','4/4/1960','M','4022 N Damen','Chicago','IL','60612','708/205-1234x123', 74.67)
Insert into dbo.Individuals values ('Bjorn A.','Payne Diaz','7/16/1960','F','4515 N Damen','Chicago','IL','60612','(312) 321-5678 ', 62.37)
Insert into dbo.Individuals values ('Wilma','Butfit','11/28/1988','F','4523 N Paulina','Chicago','IL','60611','312/819-3891', 43.03)
Insert into dbo.Individuals values ('Carmine','Dioxide','9/13/1981','M','4533 N Paulina','Chicago','IL','60606','312/222-9266', 73.02)
Insert into dbo.Individuals values ('Ulanda Hugh','Lucky','6/20/1976','F','5433 West Ave','Chicago','IL','60601','(900) 851-3471 ', 7.05)
Insert into dbo.Individuals values ('Will','Price Randomly','10/2/1970','F','695 N. Clinton','Chicago','IL','60601','(312) 390-6886 x1212 ', 35.29)
Insert into dbo.Individuals values ('Rush','Inuit','8/23/1957','F','7979 W. Fullerton','Chicago','IL','60607','312/677-6019', 29.48)
Insert into dbo.Individuals values ('Lou','Segusi','4/19/1957','F','7981 W. Fullerton','Chicago','IL','60607','312/244-4610', 9.03)
Insert into dbo.Individuals values ('Turner','Luce','1/12/1988','M','77 Sunset Strip','Hollywood','CA','90211','114/219-4103', 6.47)
Insert into dbo.Individuals values ('Everett','Possum','6/30/1994','M','123 Sesame St','Lansing','IL','60645','514/196-4755', 65.03)
Insert into dbo.Individuals values ('Bud','Uronner','1/24/1963','M','640 Kay Drive','Palo Alto','CA','90909','537/178-3081', 23.48)
Insert into dbo.Individuals values ('Stu','Earley','1/4/1942','F','234 South Willintonm Apt 3C','Townton','NJ','04323','174/697-1209', 15.25)
Insert into dbo.Individuals values ('Amadeus O.','Early','9/6/2001','M','1131 N. Devon Av','Chicago','IL','60630','174/697-1209', 50.40)
Insert into dbo.Individuals values ('Viola','Fuss','10/25/2009','M','4200 Peake Lane','Portsmouth','RI','23703','(312)222-4343', 82.83)
Insert into dbo.Individuals values ('Phyllis','Steen','2/21/1959','M','611 N. Devon','Chicago','IL','60630','(312)239-4343', 1.81)
Insert into dbo.Individuals values ('Dot','Snice','10/4/2000','F','7311 Quick Avenue','River Forest','IL','60630','(312)222-4343', 46.58)
Insert into dbo.Individuals values ('Luciano','Pavearoadi','12/4/1960','','414 Linden Avenue','Chicago','IL','60630','(312)239-4343', 60.43)
Insert into dbo.Individuals values ('Lois','Steem','9/6/1981','M','629 S. Ridgeland Ave.','Chicago','IL','60630','(312)222-4343', 30.75)
Insert into dbo.Individuals values ('Kurt','Reply','12/4/1965','F','827 N. Marion Street','Chicago','IL','60630','(312)222-4343', 23.63)
Insert into dbo.Individuals values ('Hugo','Gurll','8/30/1952','F','1123 Fair Oaks','Chicago','IL','60630','(312)222-4343', 80.40)
Insert into dbo.Individuals values ('Gladys','Radio','2/11/1962','M','100 N. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 98.55)
Insert into dbo.Individuals values ('Kent C.','Detrees','4/30/2000','M','7708 Monroe','Forest Park','IL','60130','(708)692-4343', 5.50)
Insert into dbo.Individuals values ('Joaquin','de Planque','1/20/1975','M','825 Forest Avenue','Chicago','IL','60630','(312)222-4343', 27.52)
Insert into dbo.Individuals values ('Lisa','Carr','3/20/1953','F','401 Linden Avenue','Chicago','IL','60630','(312)222-4343', 77.95)
Insert into dbo.Individuals values ('Orson','Buggy','8/22/1970','F','165 N. Kenilworth, #6G','Chicago','IL','60630','(312)222-4343', 50.55)
Insert into dbo.Individuals values ('Nomar','Wheaton','9/5/1940','M','606 S. Scoville','Chicago','IL','60630','(312)222-4343', 24.81)
Insert into dbo.Individuals values ('Janet','Torino','9/11/1936','F','115 S. Harvey','Chicago','IL','60630','(312)222-4343', 28.03)
Insert into dbo.Individuals values ('Hubert H.','Humvee II','11/14/1949','M','1025 Randolph','Chicago','IL','60630','(312)222-4343', 73.87)

Insert into dbo.Individuals values ('Hugh','Wake','1/24/2008','M','112 N. Kenilworth','Chicago','IL','60630','(312)222-4343', 33.37)
Insert into dbo.Individuals values ('Adam','Illion','6/12/1967','M','110 N. Taylor','Chicago','IL','60630','(773) 232-1212 ', 63.02)
Insert into dbo.Individuals values ('Luke','Warm','2/26/1966','F','4917 W. Midway Park, Apt C','Chicago','IL','60644','(312)222-4343', 76.40)
Insert into dbo.Individuals values ('Joaquin','Matilda','8/17/1946','M','1001 S. Devon Ave','Chicago','IL','60630','(312)222-4343', 4.12)
Insert into dbo.Individuals values ('James','Bondo','8/30/1984','M','447 N. Kenilworth Ave.','Chicago','IL','60630','(312)222-4343', 59.37)
Insert into dbo.Individuals values ('Rusty','Steele','10/7/1980','F','603 Edgewood Place','River Forest','IL','60540','(312)222-4343', 52.02)
Insert into dbo.Individuals values ('Megan','Model','12/13/1974','F','116 S. Devon','Chicago','IL','60630','(312)222-4343', 19.08)
Insert into dbo.Individuals values ('Fitz','Matush','5/6/1951','M','1032 N. Devon Ave','Chicago','IL','60630','(312)222-4343', 61.14)
Insert into dbo.Individuals values ('Mischa','Turnov','7/7/1975','M','1838 Woodland Ave.','Western Springs','IL','60559','(312)222-4343', 15.07)
Insert into dbo.Individuals values ('Nadia','Geddit','4/28/1947','M','1000 Home Ave.','Chicago','IL','60630','(312)222-4343', 84.60)
Insert into dbo.Individuals values ('Freida','Gogh','3/14/2007','M','7301 Ibsen','Chicago','IL','60631','(312)222-4343', 27.92)
Insert into dbo.Individuals values ('Frieda','Wander','11/14/1971','M','15 E. Jackson','Chicago','IL','60604','(312)222-4343', 51.54)
Insert into dbo.Individuals values ('Sasha','Noyes','7/14/1963','F','428 S. East Ave.','Chicago','IL','60630','(312)222-4343', 80.72)
Insert into dbo.Individuals values ('Ed','Amame','2/24/1954','F','746 N. Lombard Ave.','Chicago','IL','60630','(312)222-4343', 44.23)
Insert into dbo.Individuals values ('Vera Lee','Isay','10/10/1954','M','1028 Gunderson Ave.','Chicago','IL','60630','(312)222-4343', 24.85)
Insert into dbo.Individuals values ('Juan','Anatou','3/2/2005','F','1028 Gunderson Ave.','Chicago','IL','60630','(312)222-4343', 13.79)
Insert into dbo.Individuals values ('I.','Shelby Released','12/4/1952','M','1126 Hayes','Chicago','IL','60630','', 25.58)
Insert into dbo.Individuals values ('Tilda','Plierslip','3/2/1981','M','108 Bishop Quarter Lane','Chicago','IL','60630','(312)222-4343', 76.78)
Insert into dbo.Individuals values ('Odessa','Paige Turner','10/27/1982','F','152 N. Scoville','Chicago','IL','60630','(312)222-4343', 85.12)
Insert into dbo.Individuals values ('Hadley','Newham','12/22/2012','F','433 S. Ridgeland Ave.','Chicago','IL','60630','(312)222-4343', 31.11)
Insert into dbo.Individuals values ('Menachem','Down','7/30/1967','M','633 N. Marion','Chicago','IL','60630','(312)222-4343', 16.88)
Insert into dbo.Individuals values ('Eureka','Garlic','10/8/1943','M','1031 S. Gunderson','Chicago','IL','60630','(312)222-4343', 62.59)
Insert into dbo.Individuals values ('Isaiah','Olchap','10/9/1950','M','743 S. Gunderson','Chicago','IL','60630','(312)222-4343', 34.92)
Insert into dbo.Individuals values ('Laura','Biden','5/11/2012','F','647 Woodbine','Chicago','IL','60630','(312)222-4343', 99.88)
Insert into dbo.Individuals values ('Hugo','First','1/3/2005','F','100 Forest Place','Chicago','IL','60630','(312)222-4343', 81.48)
Insert into dbo.Individuals values ('Angus','MacCoatup','8/18/2001','F','425 Washington Blvd. #1','Chicago','IL','60630','(312)222-4343', 97.90)
Insert into dbo.Individuals values ('Phillip','Airtime','10/26/1955','M','831 N. Grove','Chicago','IL','60630','(312)222-4343', 67.01)
Insert into dbo.Individuals values ('Bruno','Moore','1/9/1981','M','13 E. Lake Street','Northlake','IL','60164','(312)222-4343', 73.63)
Insert into dbo.Individuals values ('Carlos','Antenna','7/15/1991','M','151 Lemoyne Parkway','Chicago','IL','60630','(312)222-4343', 76.83)
Insert into dbo.Individuals values ('Euripedes','Ibreakayourface','2/24/1975','F','127 S. Home Ave.','Chicago','IL','60630','(312)222-4343', 56.94)
Insert into dbo.Individuals values ('Sam','Boney','6/13/1955','F','911 Lathrop','River Forest','IL','60630','(312)222-4343', 3.92)
Insert into dbo.Individuals values ('Barbara','Seville','9/21/1945','F','128 S. Austin','Chicago','IL','60630','(312)222-4343', 40.32)
Insert into dbo.Individuals values ('Horatio','Algebra','2/23/1958','M','641 N. Marion St.','Chicago','IL','60630','(312)222-4343', 38.21)
Insert into dbo.Individuals values ('Amos','Reid','6/22/1943','M','416 Harrison St.','Chicago','IL','60630','(312)222-4343', 95.53)
Insert into dbo.Individuals values ('Ira','Caull','10/10/1956','M','721 Ontario #204','Chicago','IL','60630','(312)222-4343', 27.43)

Insert into dbo.Individuals values ('Victor','Analysis','11/5/2001','F','1656 W. Estes Ave.','Chicago','IL','60645','(312)222-4343', 6.05)
Insert into dbo.Individuals values ('Art','Majors','10/7/1968','M','746 Clinton Place','River Forest','IL','60630','(312)222-4343', 24.95)
Insert into dbo.Individuals values ('Bernadette','Bridge','7/14/1993','M','426 S. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 96.87)
Insert into dbo.Individuals values ('Wayne','Back','9/9/1993','M','1136 S. Scoville Ave.','Chicago','IL','60630','(312)222-4343', 59.72)
Insert into dbo.Individuals values ('Juan','Menudo','4/15/1993','M','117 S. Euclid Ave.','Chicago','IL','60630','(312)222-4343', 44.15)
Insert into dbo.Individuals values ('Jacques','Hughes','10/1/1966','F','1021 N. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 25.14)
Insert into dbo.Individuals values ('Yessir','Itsaflat','5/16/1955','F','11050 Westminster','Westchester','IL','60154','(312)222-4343', 39.31)
Insert into dbo.Individuals values ('Al','Lowetta','6/27/1941','M','936 Chicago Ave.','Chicago','IL','60630','(312)222-4343', 15.16)
Insert into dbo.Individuals values ('Saul','Wellingood','9/21/1984','M','124 S. Elmwood','Chicago','IL','60630','(312)222-4343', 42.79)
Insert into dbo.Individuals values ('Jillian','Here','2/13/1947','M','124 S. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 14.97)
Insert into dbo.Individuals values ('Colette','ODay','12/28/1971','M','1125 Linden','Chicago','IL','60630','(312)222-4343', 81.40)
Insert into dbo.Individuals values ('Hugh','Jass','4/13/1992','F','141 S. Taylor Ave.','Chicago','IL','60630','(312)222-4343', 35.32)
Insert into dbo.Individuals values ('Gladys','Overwith','10/6/1942','F','1000 N. Harvey','Chicago','IL','60630','(312)222-4343', 50.18)
Insert into dbo.Individuals values ('George','Stayontopothis','4/19/1988','F','1500 Monroe','River Forest','IL','60630','(312)222-4343', 97.75)
Insert into dbo.Individuals values ('Ophelia','Paine','9/9/1997','M','111 N. Elmwood','Chicago','IL','60630','(312)222-4343', 44.48)
Insert into dbo.Individuals values ('Xavier','Breath','12/2/2002','F','119 S. Harvey Ave.','Chicago','IL','60630','(312)222-4343', 22.64)
Insert into dbo.Individuals values ('Levon','Hold','1/18/1980','F','147 Harrison','Chicago','IL','60630','(312)222-4343', 7.16)
Insert into dbo.Individuals values ('Billy','Aiken','3/15/1965','F','1200 Linden Ave.','Chicago','IL','60630','(312)222-4343', 19.58)
Insert into dbo.Individuals values ('C.','Boynton Glick','6/25/1942','M','114 Lake','Chicago','IL','60630','(312)222-4343', 41.97)
Insert into dbo.Individuals values ('Philip','Harmonic','1/12/1985','M','134 Gale Ave.','River Forest','IL','60630','(312)222-4343', 61.82)
Insert into dbo.Individuals values ('Yvonne','Apeesamey','5/31/1957','M','1047 Wenonah','Chicago','IL','60630','(312)222-4343', 98.38)
Insert into dbo.Individuals values ('Eileen','Tudor-Wright','6/24/2012','M','415 N. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 82.41)
Insert into dbo.Individuals values ('Nadia','Belimi','10/24/1993','M','129 S. Ridgeland','Chicago','IL','60630','(312)222-4343', 62.28)
Insert into dbo.Individuals values ('Dustin','Dubree','6/18/1977','F','15255 4th Ave.','Phoenix','IL','60426','(312)222-4343', 13.63)
Insert into dbo.Individuals values ('Evan','Elpus','7/8/1956','F','122 N. Ridgeland','Chicago','IL','60630','(312)222-4343', 2.75)
Insert into dbo.Individuals values ('Cody','Pendant','8/8/2013','F','120 S. Taylor Ave.','Chicago','IL','60630','(312)222-4343', 92.84)
Insert into dbo.Individuals values ('Pat','Pending','4/10/2010','M','125 S. Elmwood','Chicago','IL','60630','(312)222-4343', 6.47)
Insert into dbo.Individuals values ('Hugh','Lyon Sack','12/1/1974','F','636 Linden Ave.','Chicago','Il','60630','(312)222-4343', 69.50)
Insert into dbo.Individuals values ('Drew A.','Blank','2/26/2000','M','116 S. Scoville Ave.','Chicago','IL','60630','(312)222-4343', 15.34)
Insert into dbo.Individuals values ('Lauren','Order','7/31/1936','M','167 N. Ridgeland','Our Fair City','MA','10101','(312)222-4343', 71.99)
Insert into dbo.Individuals values ('Rex','Galore','4/20/1965','M','623 N. Euclid','Chicago','IL','60630','(312)222-4343', 95.65)
Insert into dbo.Individuals values ('Haywood','Jabuzoff','3/18/2006','F','720 S. Harvey','Chicago','IL','60630','(312)222-4343', 49.34)
Insert into dbo.Individuals values ('Justin','Volk V','10/2/1979','F','938 Norht Blvd., #205','Chicago','IL','60630','(312)222-4343', 96.82)
Insert into dbo.Individuals values ('Heronimus B.','Blind','8/28/1973','M','1126 Edmer Ave.','Chicago','IL','60630','(312)222-4343', 32.02)
Insert into dbo.Individuals values ('Donnatella','DiCoppas','1/5/1998','M','635 Fairoaks','Chicago','IL','60630','(312)222-4343', 92.51)

Insert into dbo.Individuals values ('Gil T.','Azell','4/29/1950','M','412 Randolph St.','Chicago','IL','60630','(312)222-4343', 28.18)
Insert into dbo.Individuals values ('Major','Error','9/19/1991','M','124 S. Devon','Chicago','IL','60630','(312)222-4343', 83.65)
Insert into dbo.Individuals values ('Ginger','Vitis','8/5/1964','F','904 Forest Ave.','Chicago','IL','60630','(312)222-4343', 99.68)
Insert into dbo.Individuals values ('Don','Pickett','1/29/1993','M','1020 Clinton Ave.','Chicago','IL','60630','(312)222-4343', 91.54)
Insert into dbo.Individuals values ('Ike','Arumba','5/16/1956','M','1112 N. Elmwood Ave','Chicago','IL','60630','(312)222-4343', 40.02)
Insert into dbo.Individuals values ('Tyra','Meesu','7/23/1973','F','P.O. BOX 770','Chicago','IL','60630','(312)222-4343', 21.45)
Insert into dbo.Individuals values ('Bill','Shredder','12/22/1995','F','110 W. Madison Ave. #2F','Chicago','IL','60630','(312)222-4343', 24.35)
Insert into dbo.Individuals values ('Dot','Matrix','4/19/1969','F','933 Jackson Ave.','River Forest','IL','60630','(312)222-4343', 31.08)
Insert into dbo.Individuals values ('Fred','Knott','5/7/1989','M','121 Home Ave.','Chicago','IL','60630','(312)222-4343', 23.69)
Insert into dbo.Individuals values ('Marianna','Trench','12/27/1965','M','141 S. Scoville Ave.','Chicago','IL','60630','(312)222-4343', 4.01)
Insert into dbo.Individuals values ('Anita','Hammer','6/23/1980','M','1231 Belleforte','Chicago','IL','60630','(312)222-4343', 26.66)
Insert into dbo.Individuals values ('Upton','Leftus','9/23/1987','F','126 N. Ridgeland','Chicago','IL','60630','(312)222-4343', 73.82)
Insert into dbo.Individuals values ('Amanda B.','Reckondwyth','8/25/1936','F','1132 N. Ridgeland','Chicago','IL','60630','(312)222-4343', 20.11)
Insert into dbo.Individuals values ('Nomar','Winter','6/24/1948','F','800 Gunderson Ave.','Chicago','IL','60630','(312)222-4343', 74.19)
Insert into dbo.Individuals values ('Iona','Heap','9/14/1999','M','424 S. Austin Blvd. #3','Chicago','IL','60630','(312)222-4343', 26.92)
Insert into dbo.Individuals values ('Lucinda','Boltz','7/31/2007','F','170 N. Cuyler','Chicago','IL','60630','(312)222-4343', 93.56)
Insert into dbo.Individuals values ('Kay','Sera','7/29/1976','M','283 Pleasent Valley Rd','Westville','OH','34534','', 1.72)
Insert into dbo.Individuals values ('Juan','Moorehouse','8/13/1967','F','234 Coldwater','Minneapolis','MN','57564','', 68.11)
Insert into dbo.Individuals values ('Rose','Hips','8/15/1983','F','121 Temona Dr','Pleasent Hills','PA','50143','', 96.28)
Insert into dbo.Individuals values ('Isabelle','Ringing','9/2/1936','M','350 N. Orleans, #892','Chicago','IL','60654- ','', 99.67)
Insert into dbo.Individuals values ('Maury','Missions','2/6/1984','M','5411 W Fullerton Ave','Chicago','IL','60639-1482','', 40.95)
Insert into dbo.Individuals values ('Oscar','Ruitt','4/11/1940','M','2141 South Tan Court','Chicago','IL','60616- ','', 23.96)
Insert into dbo.Individuals values ('Lois','Bidder','10/27/2012','F','1400 W Augusta Blvd','Chicago','IL','60622-3939','', 52.40)
Insert into dbo.Individuals values ('Donatella','Debois','6/1/1982','M','25 E Washington St Fl 16','Chicago','IL','60602-1708','', 2.93)
Insert into dbo.Individuals values ('Eamon','Lowe','7/21/1990','M','1515 W Monroe St','Chicago','IL','60607-2497','', 17.38)
Insert into dbo.Individuals values ('Linus','Scrimmage','1/19/2012','F','923 N. Robinson, Suite 400','Oklahoma City','OK','73102-2203','', 49.70)
Insert into dbo.Individuals values ('Holly','Unlikely','10/16/1961','F','3 First National Plaz','Chicago','IL','60602- ','', 67.60)
Insert into dbo.Individuals values ('Eileen','Yorway','6/30/1992','M','2448 W Grace St','Chicago','IL','60618-4719','', 8.05)
Insert into dbo.Individuals values ('Lee','Eyeapoka','12/29/1997','F','100 W Randolph St','Chicago','IL','60601-3108','', 84.23)
Insert into dbo.Individuals values ('Donatello','Nobatti','7/19/1999','M','401 S Clinton St','Chicago','IL','60607- ','', 3.02)
Insert into dbo.Individuals values ('Ewell','Rudy Day','9/27/1980','F','500 N Peshtigo Ct','Chicago','IL','60611-4309','', 37.42)
Insert into dbo.Individuals values ('Sumner','Reruns','8/14/1990','M','401 S Clinton St','Chicago','IL','60607- ','', 33.97)
Insert into dbo.Individuals values ('Holly','Unlikely','4/2/1986','M','77 W Jackson Blvd','Chicago','IL','60604-3511','', 9.13)
Insert into dbo.Individuals values ('Ophelia','Self','1/13/1945','F','4554 N. Broadway, St. 301','Chicago','IL','60640- ','', 81.84)

-- EOF Add_Individuals.sql

Appendix B

Queries

-- Samples.sql

/*

Sample methods to select a sample set from a larger table

*/

Use Demonstration

Go

-- View the entire data set

/*
-- Shows entire table contents in data entry order
Select
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
*/

/*
Shows the first 25 records in the order that they were entered.
Not a good approach to retrieving a trustworthy sample.
Includes duplicate rows 1 and 2.

Select Top 25
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
*/

/*
Shows 25 randomly selected records.
This is the preferred method for a truly random sample.
Each time this query runs it will select a different set of records.
Add "Percent" after "Top 25" to select a percentage of the full data set.

Select Top 25
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
Order By
    NewId()
*/

/*
Use the modulus to select every nth record based on IdNo. Avoid any
early record bias by starting at a higher IdNo.

Select
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
Where
    (I.IdNo + 2) % 5 = 0
    And I.IdNo > 11
*/

/*
The NCQA Systematic Selection standard calls for the dataset to be sorted by
last name, first name and birth date, and for this set to be numbered
consecutively. This query will be used in the following examples as a
common table expression; it is shown here to demonstrate the sorting
functionality of Row_Number directly.

Select
    ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc) RowNum,
    I.IdNo,
    I.LName,
    I.FName,
    I.BirthDate
From
    Individuals I
*/

/*
Systematic selection of records per NCQA only requires the row number
for the actual selection. Once completed, the results can be joined
back to the original dataset for additional columns.

;With Ind_List_CTE (RowNum, IdNo, LName, FName, BirthDate)
As (
    Select
        ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc) RowNum,
        I.IdNo,
        I.LName,
        I.FName,
        I.BirthDate
    From
        Individuals I
)
Select * From Ind_List_CTE
*/

/*
For the systematic sample per HEDIS/NCQA Technical Specifications.
Assume a starting value of 2.

;With Row_List_CTE (RowNum)
As (
    Select
        ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
    From
        Individuals I
)
Select Top 25
    RowNum
    , RowNum - 1 SelRec
    , 137.0/25.0 NR
    , (RowNum-1) * (137.0/25.0) RN1
    , CAST(ROUND(2+(RowNum-1) * (137.0/25.0),0) As Int) Final
From
    Row_List_CTE
Order By
    RowNum    -- Top without Order By does not guarantee rows 1 through 25
*/

/*
Final Query
*/

Declare @EvalDt Date
Declare @N Decimal(9,3)

Set @EvalDt = '12-31-2012'
Set @N = 137.0/25.0

;With Ind_List_CTE (RowNum, IdNo, LName, FName, BirthDate)
As (
    Select
        ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc) RowNum,
        I.IdNo,
        I.LName,
        I.FName,
        I.BirthDate
    From
        Individuals I
)
Select
    RowNum
    , IdNo
    , FName
    , LName
    , BirthDate
    , DATEDIFF(Hour, BirthDate, @EvalDt)/8766 As Age
From
    Ind_List_CTE
Where RowNum In
(
    Select Top 25
        --CAST(ROUND(2+(RowNum-1) * @N,0) As Int)
        Floor(ROUND(2+(RowNum-1) * @N,0))
    From
        Ind_List_CTE
    Order By
        RowNum    -- ensures the Top 25 are rows 1 through 25
)

-- EOF Samples.sql
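
The final query above hard-codes the population size (137) and the sample size (25). As a rough sketch only, the same selection could be written so that the interval is derived from the table itself. The variable names @SampleSize, @Start and @PopSize are illustrative assumptions and do not appear in the original script; inline variable initialization assumes SQL Server 2008 or later.

```sql
-- Sketch: systematic sample with the interval computed from the data.
-- @SampleSize, @Start and @PopSize are hypothetical names for illustration.
Use Demonstration
Go

Declare @SampleSize Int = 25      -- desired number of records
Declare @Start      Int = 2       -- starting value, as assumed in Samples.sql
Declare @PopSize    Int           -- number of rows in the population
Declare @N          Decimal(9,3)  -- sampling interval (population / sample)

Select @PopSize = Count(*) From dbo.Individuals
Set @N = Cast(@PopSize As Decimal(9,3)) / @SampleSize

;With Ind_List_CTE (RowNum, IdNo, LName, FName, BirthDate)
As (
    Select
        ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc),
        I.IdNo,
        I.LName,
        I.FName,
        I.BirthDate
    From
        dbo.Individuals I
)
Select
    S.RowNum
    , S.IdNo
    , S.FName
    , S.LName
    , S.BirthDate
From
    Ind_List_CTE S
Where S.RowNum In
(
    -- One target row number per sample slot 1..@SampleSize
    Select
        Cast(Round(@Start + (K.RowNum - 1) * @N, 0) As Int)
    From
        Ind_List_CTE K
    Where
        K.RowNum <= @SampleSize
)
Order By
    S.RowNum
```

Filtering on RowNum <= @SampleSize instead of using Top avoids relying on the order in which rows happen to be returned. Because the sample table contains duplicate rows, the sort on last name, first name and birth date can tie; adding IdNo as a final sort key would make the numbering repeatable, at the cost of departing slightly from the sort order described above.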