sampling data in t-sql
DESCRIPTION
Explanation and examples of sampling SQL tables via random or systematic methods with emphasis on NCQA/HEDIS methodology
TRANSCRIPT
T-SQL Sampling Page 1 of 18 [email protected]
Sampling Data in T-SQL
One of the great benefits of databases is that numerical analysis can be done against the entire population.
Trends and behaviors can be analyzed just as easily against ten million records as against one hundred.
That is, assuming that all of the information necessary is contained within the data set. Unfortunately,
that isn’t always the case; sometimes it is necessary to conduct manual research for additional information
and report against the results.
The goal of sampling is to select the smallest number of records that significantly represent the
characteristics of the whole. The statistical process of determining sample size N will not be addressed
here; it is assumed that you know N and are only curious about how to select that many records in a way
that guarantees a reasonable level of randomness.
One of the inspirations to document the process of sampling SQL data was the NCQA/HEDIS Systematic
Sample Methodology. NCQA is the National Committee for Quality Assurance, which runs a research
program called the Healthcare Effectiveness Data and Information Set (HEDIS). The majority of HEDIS
measures involve analyzing the entire population within a health plan: for example, identifying women
between specific ages and counting the number who have received a specific procedure, such as a breast
cancer screening. However, some measures require data not currently in the medical or claims data, and
these use a hybrid approach: identify the eligible population, then sample that population to select specific
members whose medical records will be examined manually to determine how each person's results will
be evaluated.
But before jumping into the NCQA/HEDIS sampling methodology, let’s examine some simpler
alternatives. Appendix A contains a T-SQL script to create and populate a sample table called Individuals
within a database named Demonstration. You are welcome to create the same table and run the example
queries to make this more of a hands-on experience. Likewise, the techniques shown here can be used in
any number of situations beyond HEDIS measures. On a side note: the names used are drawn from the
staff list of the NPR radio show Car Talk.
Example 1: Selecting all records
The SQL query:
Select
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
will produce a result set of 137 rows in data entry order.
Note that some records are duplicated and the results are shown in data entry order. Selecting records in
data entry or natural order is not sufficient for sampling; there are too many biases built in. Thus,
assuming that we want the sample size N to be 25, the Top 25 constraint in the query will return just
25 records, but they are the wrong ones from a statistical standpoint.
Select Top 25
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
The need to randomize the records selected can be achieved by using the NewId() function to put the
returned records in random order. By adding the clause Order by NewId() to the statement you will see a
different set of records returned every time you run the query.
Select Top 25 percent
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
Order By
NewId()
Note that by adding percent to the Top 25 clause you will return a percentage of the total table – in this
case, 35 rows out of the 137 total.
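The 35-row figure follows from how Top with percent handles fractions: 25 percent of 137 is 34.25, and SQL Server rounds a fractional row count up to the next whole row. A minimal sketch of that arithmetic follows; the variable names @Population and @Pct are hypothetical and not part of the original script.

```sql
-- Hypothetical variables, for illustration only
Declare @Population Int
Declare @Pct Decimal(5,2)
Set @Population = 137
Set @Pct = 25

Select
    @Population * @Pct / 100.0 As ExactRows                 -- 34.25 before rounding
    , Ceiling(@Population * @Pct / 100.0) As RowsReturned   -- rounded up to 35
```

If an exact sample size matters more than a proportion, Top 25 without percent avoids the rounding question entirely.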
This is the simplest way to extract a specific sized random sample from a data set. Unfortunately, not all
statisticians will accept a purely random sample. Whether this bias is the result of pre-computer era
difficulties in generating random numbers and applying them to a population or it represents a legitimate
concern about the distribution of the data, many situations will require a periodic or systematic sample of
the data. Until the abacus and quill pen generation that creates these specifications finally dies off,
systematic sampling will remain a mandatory skill.
The simplest, and most common (but not in HEDIS), systematic sampling method is to select a starting
record and then every nth record from that record on. The interval n is found by dividing the total
population/record count (137 in our example) by the sample size N (25), which gives 5.48 and is
rounded down to 5. A simple way to skip a set number of records is with modulus. The
mathematical function modulus (also just mod) gives the remainder of one number divided by another.
For example, 6 mod 5 is 1, while 15 mod 5 is 0. In T-SQL the modulus operator is "%". The
Individuals table has a sequential, system-assigned field called IdNo, so selecting every 5th record can be
achieved by adding the constraint IdNo % 5 = 0; to start at a number greater than 1 (say 11), add the
constraint IdNo > 11. To change the offset, add some starting value to IdNo before taking the modulus.
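The arithmetic in this paragraph can be checked directly in T-SQL. The following is a small sketch only; the column aliases are made up for illustration.

```sql
Select
    6 % 5 As SixModFive                  -- remainder of 6 divided by 5 is 1
    , 15 % 5 As FifteenModFive           -- 15 divides evenly by 5, remainder 0
    , 137.0 / 25.0 As ExactInterval      -- population / sample size = 5.48
    , Floor(137.0 / 25.0) As Interval    -- rounded down to 5
```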
The query
Select
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
Where
(IdNo + 2)%5 = 0
And I.IdNo > 11
will return 25 records exactly, with a three position offset from mod 5.
Omitting the IdNo > 11 condition will return 27 rows, but you can apply a Top 25 constraint to return the
exact N desired. This method does a very good job of returning every nth record as determined by the
IdNo field. However, if the table has significant deletions, and the deleted records follow any kind of
pattern, the exact periodicity of the sequence can be jeopardized.
The Row_Number() function in T-SQL allows you to assign a new sequential number to each row when the
query is executed. The same mod function can be used with the new row number with a much more
regular segmentation. To jump ahead a bit, the NCQA/HEDIS systematic sampling methodology
requires that the population be ordered by last name, first name, and birth date (in descending order one
year, and ascending the next) and since Row_Number() requires a sort, the following query orders and
numbers the full data set per that requirement.
The query
Select
ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc) RowNum,
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
returns all 137 rows, numbered consecutively in last name, first name, and birth date order.
From here it is trivial to apply the previous mod example to the RowNum field and return a reasonable
periodic sample. Unfortunately, the NCQA/HEDIS systematic sampling methodology does not use a set
interval. Instead, each ith member (in this example, the second through 25th records returned) has a
specific calculation.
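Before moving on to the HEDIS calculation, the fixed-interval version mentioned above might look like the following sketch. It assumes the Individuals table from Appendix A has been created, and it simply applies the earlier (value + 2) % 5 filter to RowNum instead of IdNo; it is not part of the NCQA/HEDIS method.

```sql
-- Sketch: periodic sample on RowNum rather than IdNo.
-- Assumes the Individuals table from Appendix A exists.
;With Ind_List_CTE (RowNum, IdNo, LName, FName, BirthDate)
As (
    Select
        ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc),
        I.IdNo,
        I.LName,
        I.FName,
        I.BirthDate
    From
        Individuals I
)
Select Top 25
    RowNum
    , IdNo
    , LName
    , FName
    , BirthDate
From
    Ind_List_CTE
Where
    (RowNum + 2) % 5 = 0    -- rows 3, 8, 13, ...
Order By
    RowNum                  -- makes Top 25 keep the first 25 matches
```

Because RowNum has no gaps, deleted IdNo values no longer disturb the interval. With 137 rows the filter matches 27 rows, and Top 25 with the Order By trims that to the first 25.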
The first record is called START and is determined by multiplying a random number supplied by NCQA
by the eligible members (EM) divided by the final sample size (FSS). In the calculations EM/FSS is
referred to as N. For this exercise assume that START is equal to 2.
The calculation for each member of the sample is
ith member = START + [(i-1) x N]
In T-SQL a common table expression (CTE) can be used to determine the row to be selected as follows.
;With Row_List_CTE(RowNum)
As (
Select
ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
From
Individuals I
)
Select Top 25
RowNum
, RowNum - 1 SelRec
, 137.0/25.0 NR
, (RowNum-1)*(137.0/25.0) RN1
, CAST(ROUND(2+(RowNum-1)*(137.0/25.0),0) As Int) Final
From
Row_List_CTE
Order By
RowNum
The first line defines the CTE Row_List_CTE with a single column called RowNum. The Select clause
following the As populates the CTE with rows 1 through 137.
The next Select statement uses the CTE to demonstrate each step in the calculation.
RowNum is the original row number and stands in for i. SelRec is the (i-1) part of the
calculation. NR is N in the calculation, the result of EM/FSS (137/25) or 5.48. RN1 shows the core of the
calculation, [(i-1) x N]. The Final field adds the START value of 2 to RN1 and rounds the result to the
nearest integer. This is the row number of the record to be used in the sample. This
approach leaves an interval of 5 or 6 rows between selected records which does, grudgingly, improve
the quality of the sample by varying the interval.
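To see the alternating 5 and 6 row intervals without touching the Individuals table, the same formula can be run over a generated list of i values. This is a sketch under the same assumptions as above (START = 2, N = 137/25); the names Seq_CTE, SelectedRow, and GapFromPrevious are hypothetical.

```sql
Declare @Start Int
Declare @N Decimal(9,3)
Set @Start = 2
Set @N = 137.0/25.0

-- Recursive CTE that generates i = 1 through 25
;With Seq_CTE (i)
As (
    Select 1
    Union All
    Select i + 1 From Seq_CTE Where i < 25
)
Select
    i
    , CAST(ROUND(@Start + (i - 1) * @N, 0) As Int) SelectedRow
    , Case
        When i = 1 Then Null    -- the first member has no predecessor
        Else CAST(ROUND(@Start + (i - 1) * @N, 0) As Int)
           - CAST(ROUND(@Start + (i - 2) * @N, 0) As Int)
      End GapFromPrevious
From
    Seq_CTE
Order By
    i
```

The first few selected rows are 2, 7, 13, 18, and 24, so the gap alternates between 5 and 6 as the fractional part of N accumulates.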
To show the basic elements of name, birth date, and age in the final query the same CTE can be used to
select the demographics and determine the sample.
Declare @EvalDt Date
Declare @N Decimal(9,3)
Set @EvalDt = '12-31-2012'
Set @N = 137.0/25.0
;With Ind_List_CTE(RowNum, IdNo, LName, FName, BirthDate)
As (
Select
ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
RowNum,
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
)
Select
RowNum
, IdNo
, FName
, LName
, BirthDate
, DATEDIFF(Hour, BirthDate, @EvalDt)/8766 As Age
From
Ind_List_CTE
Where RowNum In
(
Select Top 25
--CAST(ROUND(2+(RowNum-1) * @N,0) As Int)
Floor(ROUND(2+(RowNum-1) * @N,0))
From
Ind_List_CTE
Order By
RowNum
)
In order to make the query more usable in the future, key variables are declared in the first section.
Because the typical HEDIS measure wants the age of the member at the conclusion of the evaluation year,
the parameter @EvalDt is defined and set to 12-31-2012. Age is then determined by
calculating the difference in hours between the birth date and the evaluation date and dividing that by
8766, the average number of hours in a year (365.25 days x 24 hours).
Rather than use the Cast function from the previous example, the Floor function is used to calculate the
row number to be selected for the sample. This calculation is then used in the Where clause to filter the
records for the sample.
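The hour-based age calculation is an approximation, since 8766 is an average. The sketch below shows it beside a birthday-based alternative for a single hypothetical member; the alternative is offered only for comparison and is not part of the original query or of any HEDIS specification. The variable @BirthDate and the column aliases are made up for illustration.

```sql
Declare @EvalDt Date
Declare @BirthDate Date
Set @EvalDt = '2012-12-31'
Set @BirthDate = '1960-04-04'   -- hypothetical member

Select
    -- Approach used in the article: whole hours divided by average hours per year
    DATEDIFF(Hour, @BirthDate, @EvalDt) / 8766 As AgeByHours
    -- Alternative: count year boundaries, then subtract 1 if the
    -- birthday has not yet occurred in the evaluation year
    , DATEDIFF(Year, @BirthDate, @EvalDt)
      - Case
          When DATEADD(Year, DATEDIFF(Year, @BirthDate, @EvalDt), @BirthDate) > @EvalDt
          Then 1
          Else 0
        End As AgeByBirthday
```

The two columns usually agree, but they can differ for birth dates that fall very close to the evaluation date, so it may be worth comparing them on your own data before choosing one.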
The resulting data set follows the NCQA/HEDIS technical specifications exactly and will return the same
results from the same source data every time.
One concern resulting from this approach is that identically named parents and children are unlikely to
appear together in the same sample. However, this will likely exclude duplicates as well, so might be
considered an advantage. Feel free to experiment with the sample data to determine which records are
duplicates and which may just represent multiple generations.
Likewise, feel free to adapt the examples presented here to your circumstances, either for a random
sample or a truly systematic sample, for a HEDIS measure or some other application.
To work through these examples, first copy and paste Appendix A into SQL Server Management Studio
and run the script to create and populate the table. Use Appendix B to create a comprehensive set of
queries. Move the comment marker to activate/deactivate query sections.
Appendix A
Creating the sample data.
/*
Create the sample table Individuals and populate it
*/
USE
Demonstration
GO
/****** Object: Table [dbo].[Individuals] Script Date: 02/26/2013
10:34:23 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Individuals]') AND type in (N'U'))
DROP TABLE [dbo].[Individuals]
GO
/****** Object: Table [dbo].[Individuals] Script Date: 02/26/2013
10:34:24 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Individuals](
[IdNo] [int] IDENTITY(1,1) NOT NULL,
[FName] [varchar](50) NULL,
[LName] [varchar](50) NULL,
[BirthDate] [date] NULL,
[Sex] [varchar](10) NULL,
[Address] [varchar](100) NULL,
[City] [varchar](50) NULL,
[St] [varchar](10) NULL,
[Zip] [varchar](10) NULL,
[Phone] [varchar](25) NULL,
[Fee] [money] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Set Nocount on
Go
Insert into dbo.Individuals values ('Imelda','Czechs', '8/20/2013','M','121 Lasting Light Way','Buck County Village','PA','13432','(345) 148-4523 x 123',38.63 )
Insert into dbo.Individuals values ('Imelda','Czechs', '8/20/2013','M','121 Lasting Light Way','Buck County Village','PA','13432','(345) 148-4523 x 123',38.63 )
Insert into dbo.Individuals values ('Douse Anne','Burnham','12/15/1935','Sex','2345 West Maine','Anytown','IL','60604','808/445-5934', 9.25)
Insert into dbo.Individuals values ('Sue','Flockey','8/4/1981','M','2012 S Michigan Ave','Chicago','IL','60600','312/668-5531', 71.25)
Insert into dbo.Individuals values ('Dasha','Chekhov','9/24/1984','F','2132 S Michigan','Chicago','IL','60601','312/134-7467', 63.04)
Insert into dbo.Individuals values ('Vishnu','Payup','4/4/1960','M','4022 N Damen','Chicago','IL','60612','708/205-1234x123', 74.67)
Insert into dbo.Individuals values ('Bjorn A.','Payne Diaz','7/16/1960','F','4515 N Damen','Chicago','IL','60612','(312) 321-5678 ', 62.37)
Insert into dbo.Individuals values ('Wilma','Butfit','11/28/1988','F','4523 N Paulina','Chicago','IL','60611','312/819-3891', 43.03)
Insert into dbo.Individuals values ('Carmine','Dioxide','9/13/1981','M','4533 N Paulina','Chicago','IL','60606','312/222-9266', 73.02)
Insert into dbo.Individuals values ('Ulanda Hugh','Lucky','6/20/1976','F','5433 West Ave','Chicago','IL','60601','(900) 851-3471 ', 7.05)
Insert into dbo.Individuals values ('Will','Price Randomly','10/2/1970','F','695 N. Clinton','Chicago','IL','60601','(312) 390-6886 x1212 ', 35.29)
Insert into dbo.Individuals values ('Rush','Inuit','8/23/1957','F','7979 W. Fullerton','Chicago','IL','60607','312/677-6019', 29.48)
Insert into dbo.Individuals values ('Lou','Segusi','4/19/1957','F','7981 W. Fullerton','Chicago','IL','60607','312/244-4610', 9.03)
Insert into dbo.Individuals values ('Turner','Luce','1/12/1988','M','77 Sunset Strip','Hollywood','CA','90211','114/219-4103', 6.47)
Insert into dbo.Individuals values ('Everett','Possum','6/30/1994','M','123 Sesame St','Lansing','IL','60645','514/196-4755', 65.03)
Insert into dbo.Individuals values ('Bud','Uronner','1/24/1963','M','640 Kay Drive','Palo Alto','CA','90909','537/178-3081', 23.48)
Insert into dbo.Individuals values ('Stu','Earley','1/4/1942','F','234 South Willintonm Apt 3C','Townton','NJ','04323','174/697-1209', 15.25)
Insert into dbo.Individuals values ('Amadeus O.','Early','9/6/2001','M','1131 N. Devon Av','Chicago','IL','60630','174/697-1209', 50.40)
Insert into dbo.Individuals values ('Viola','Fuss','10/25/2009','M','4200 Peake Lane','Portsmouth','RI','23703','(312)222-4343', 82.83)
Insert into dbo.Individuals values ('Phyllis','Steen','2/21/1959','M','611 N. Devon','Chicago','IL','60630','(312)239-4343', 1.81)
Insert into dbo.Individuals values ('Dot','Snice','10/4/2000','F','7311 Quick Avenue','River Forest','IL','60630','(312)222-4343', 46.58)
Insert into dbo.Individuals values ('Luciano','Pavearoadi','12/4/1960','','414 Linden Avenue','Chicago','IL','60630','(312)239-4343', 60.43)
Insert into dbo.Individuals values ('Lois','Steem','9/6/1981','M','629 S. Ridgeland Ave.','Chicago','IL','60630','(312)222-4343', 30.75)
Insert into dbo.Individuals values ('Kurt','Reply','12/4/1965','F','827 N. Marion Street','Chicago','IL','60630','(312)222-4343', 23.63)
Insert into dbo.Individuals values ('Hugo','Gurll','8/30/1952','F','1123 Fair Oaks','Chicago','IL','60630','(312)222-4343', 80.40)
Insert into dbo.Individuals values ('Gladys','Radio','2/11/1962','M','100 N. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 98.55)
Insert into dbo.Individuals values ('Kent C.','Detrees','4/30/2000','M','7708 Monroe','Forest Park','IL','60130','(708)692-4343', 5.50)
Insert into dbo.Individuals values ('Joaquin','de Planque','1/20/1975','M','825 Forest Avenue','Chicago','IL','60630','(312)222-4343', 27.52)
Insert into dbo.Individuals values ('Lisa','Carr','3/20/1953','F','401 Linden Avenue','Chicago','IL','60630','(312)222-4343', 77.95)
Insert into dbo.Individuals values ('Orson','Buggy','8/22/1970','F','165 N. Kenilworth, #6G','Chicago','IL','60630','(312)222-4343', 50.55)
Insert into dbo.Individuals values ('Nomar','Wheaton','9/5/1940','M','606 S. Scoville','Chicago','IL','60630','(312)222-4343', 24.81)
Insert into dbo.Individuals values ('Janet','Torino','9/11/1936','F','115 S. Harvey','Chicago','IL','60630','(312)222-4343', 28.03)
Insert into dbo.Individuals values ('Hubert H.','Humvee II','11/14/1949','M','1025 Randolph','Chicago','IL','60630','(312)222-4343', 73.87)
Insert into dbo.Individuals values ('Hugh','Wake','1/24/2008','M','112 N. Kenilworth','Chicago','IL','60630','(312)222-4343', 33.37)
Insert into dbo.Individuals values ('Adam','Illion','6/12/1967','M','110 N. Taylor','Chicago','IL','60630','(773) 232-1212 ', 63.02)
Insert into dbo.Individuals values ('Luke','Warm','2/26/1966','F','4917 W. Midway Park, Apt C','Chicago','IL','60644','(312)222-4343', 76.40)
Insert into dbo.Individuals values ('Joaquin','Matilda','8/17/1946','M','1001 S. Devon Ave','Chicago','IL','60630','(312)222-4343', 4.12)
Insert into dbo.Individuals values ('James','Bondo','8/30/1984','M','447 N. Kenilworth Ave.','Chicago','IL','60630','(312)222-4343', 59.37)
Insert into dbo.Individuals values ('Rusty','Steele','10/7/1980','F','603 Edgewood Place','River Forest','IL','60540','(312)222-4343', 52.02)
Insert into dbo.Individuals values ('Megan','Model','12/13/1974','F','116 S. Devon','Chicago','IL','60630','(312)222-4343', 19.08)
Insert into dbo.Individuals values ('Fitz','Matush','5/6/1951','M','1032 N. Devon Ave','Chicago','IL','60630','(312)222-4343', 61.14)
Insert into dbo.Individuals values ('Mischa','Turnov','7/7/1975','M','1838 Woodland Ave.','Western Springs','IL','60559','(312)222-4343', 15.07)
Insert into dbo.Individuals values ('Nadia','Geddit','4/28/1947','M','1000 Home Ave.','Chicago','IL','60630','(312)222-4343', 84.60)
Insert into dbo.Individuals values ('Freida','Gogh','3/14/2007','M','7301 Ibsen','Chicago','IL','60631','(312)222-4343', 27.92)
Insert into dbo.Individuals values ('Frieda','Wander','11/14/1971','M','15 E. Jackson','Chicago','IL','60604','(312)222-4343', 51.54)
Insert into dbo.Individuals values ('Sasha','Noyes','7/14/1963','F','428 S. East Ave.','Chicago','IL','60630','(312)222-4343', 80.72)
Insert into dbo.Individuals values ('Ed','Amame','2/24/1954','F','746 N. Lombard Ave.','Chicago','IL','60630','(312)222-4343', 44.23)
Insert into dbo.Individuals values ('Vera Lee','Isay','10/10/1954','M','1028 Gunderson Ave.','Chicago','IL','60630','(312)222-4343', 24.85)
Insert into dbo.Individuals values ('Juan','Anatou','3/2/2005','F','1028 Gunderson Ave.','Chicago','IL','60630','(312)222-4343', 13.79)
Insert into dbo.Individuals values ('I.','Shelby Released','12/4/1952','M','1126 Hayes','Chicago','IL','60630','', 25.58)
Insert into dbo.Individuals values ('Tilda','Plierslip','3/2/1981','M','108 Bishop Quarter Lane','Chicago','IL','60630','(312)222-4343', 76.78)
Insert into dbo.Individuals values ('Odessa','Paige Turner','10/27/1982','F','152 N. Scoville','Chicago','IL','60630','(312)222-4343', 85.12)
Insert into dbo.Individuals values ('Hadley','Newham','12/22/2012','F','433 S. Ridgeland Ave.','Chicago','IL','60630','(312)222-4343', 31.11)
Insert into dbo.Individuals values ('Menachem','Down','7/30/1967','M','633 N. Marion','Chicago','IL','60630','(312)222-4343', 16.88)
Insert into dbo.Individuals values ('Eureka','Garlic','10/8/1943','M','1031 S. Gunderson','Chicago','IL','60630','(312)222-4343', 62.59)
Insert into dbo.Individuals values ('Isaiah','Olchap','10/9/1950','M','743 S. Gunderson','Chicago','IL','60630','(312)222-4343', 34.92)
Insert into dbo.Individuals values ('Laura','Biden','5/11/2012','F','647 Woodbine','Chicago','IL','60630','(312)222-4343', 99.88)
Insert into dbo.Individuals values ('Hugo','First','1/3/2005','F','100 Forest Place','Chicago','IL','60630','(312)222-4343', 81.48)
Insert into dbo.Individuals values ('Angus','MacCoatup','8/18/2001','F','425 Washington Blvd. #1','Chicago','IL','60630','(312)222-4343', 97.90)
Insert into dbo.Individuals values ('Phillip','Airtime','10/26/1955','M','831 N. Grove','Chicago','IL','60630','(312)222-4343', 67.01)
Insert into dbo.Individuals values ('Bruno','Moore','1/9/1981','M','13 E. Lake Street','Northlake','IL','60164','(312)222-4343', 73.63)
Insert into dbo.Individuals values ('Carlos','Antenna','7/15/1991','M','151 Lemoyne Parkway','Chicago','IL','60630','(312)222-4343', 76.83)
Insert into dbo.Individuals values ('Euripedes','Ibreakayourface','2/24/1975','F','127 S. Home Ave.','Chicago','IL','60630','(312)222-4343', 56.94)
Insert into dbo.Individuals values ('Sam','Boney','6/13/1955','F','911 Lathrop','River Forest','IL','60630','(312)222-4343', 3.92)
Insert into dbo.Individuals values ('Barbara','Seville','9/21/1945','F','128 S. Austin','Chicago','IL','60630','(312)222-4343', 40.32)
Insert into dbo.Individuals values ('Horatio','Algebra','2/23/1958','M','641 N. Marion St.','Chicago','IL','60630','(312)222-4343', 38.21)
Insert into dbo.Individuals values ('Amos','Reid','6/22/1943','M','416 Harrison St.','Chicago','IL','60630','(312)222-4343', 95.53)
Insert into dbo.Individuals values ('Ira','Caull','10/10/1956','M','721 Ontario #204','Chicago','IL','60630','(312)222-4343', 27.43)
Insert into dbo.Individuals values ('Victor','Analysis','11/5/2001','F','1656 W. Estes Ave.','Chicago','IL','60645','(312)222-4343', 6.05)
Insert into dbo.Individuals values ('Art','Majors','10/7/1968','M','746 Clinton Place','River Forest','IL','60630','(312)222-4343', 24.95)
Insert into dbo.Individuals values ('Bernadette','Bridge','7/14/1993','M','426 S. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 96.87)
Insert into dbo.Individuals values ('Wayne','Back','9/9/1993','M','1136 S. Scoville Ave.','Chicago','IL','60630','(312)222-4343', 59.72)
Insert into dbo.Individuals values ('Juan','Menudo','4/15/1993','M','117 S. Euclid Ave.','Chicago','IL','60630','(312)222-4343', 44.15)
Insert into dbo.Individuals values ('Jacques','Hughes','10/1/1966','F','1021 N. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 25.14)
Insert into dbo.Individuals values ('Yessir','Itsaflat','5/16/1955','F','11050 Westminster','Westchester','IL','60154','(312)222-4343', 39.31)
Insert into dbo.Individuals values ('Al','Lowetta','6/27/1941','M','936 Chicago Ave.','Chicago','IL','60630','(312)222-4343', 15.16)
Insert into dbo.Individuals values ('Saul','Wellingood','9/21/1984','M','124 S. Elmwood','Chicago','IL','60630','(312)222-4343', 42.79)
Insert into dbo.Individuals values ('Jillian','Here','2/13/1947','M','124 S. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 14.97)
Insert into dbo.Individuals values ('Colette','ODay','12/28/1971','M','1125 Linden','Chicago','IL','60630','(312)222-4343', 81.40)
Insert into dbo.Individuals values ('Hugh','Jass','4/13/1992','F','141 S. Taylor Ave.','Chicago','IL','60630','(312)222-4343', 35.32)
Insert into dbo.Individuals values ('Gladys','Overwith','10/6/1942','F','1000 N. Harvey','Chicago','IL','60630','(312)222-4343', 50.18)
Insert into dbo.Individuals values ('George','Stayontopothis','4/19/1988','F','1500 Monroe','River Forest','IL','60630','(312)222-4343', 97.75)
Insert into dbo.Individuals values ('Ophelia','Paine','9/9/1997','M','111 N. Elmwood','Chicago','IL','60630','(312)222-4343', 44.48)
Insert into dbo.Individuals values ('Xavier','Breath','12/2/2002','F','119 S. Harvey Ave.','Chicago','IL','60630','(312)222-4343', 22.64)
Insert into dbo.Individuals values ('Levon','Hold','1/18/1980','F','147 Harrison','Chicago','IL','60630','(312)222-4343', 7.16)
Insert into dbo.Individuals values ('Billy','Aiken','3/15/1965','F','1200 Linden Ave.','Chicago','IL','60630','(312)222-4343', 19.58)
Insert into dbo.Individuals values ('C.','Boynton Glick','6/25/1942','M','114 Lake','Chicago','IL','60630','(312)222-4343', 41.97)
Insert into dbo.Individuals values ('Philip','Harmonic','1/12/1985','M','134 Gale Ave.','River Forest','IL','60630','(312)222-4343', 61.82)
Insert into dbo.Individuals values ('Yvonne','Apeesamey','5/31/1957','M','1047 Wenonah','Chicago','IL','60630','(312)222-4343', 98.38)
Insert into dbo.Individuals values ('Eileen','Tudor-Wright','6/24/2012','M','415 N. Elmwood Ave.','Chicago','IL','60630','(312)222-4343', 82.41)
Insert into dbo.Individuals values ('Nadia','Belimi','10/24/1993','M','129 S. Ridgeland','Chicago','IL','60630','(312)222-4343', 62.28)
Insert into dbo.Individuals values ('Dustin','Dubree','6/18/1977','F','15255 4th Ave.','Phoenix','IL','60426','(312)222-4343', 13.63)
Insert into dbo.Individuals values ('Evan','Elpus','7/8/1956','F','122 N. Ridgeland','Chicago','IL','60630','(312)222-4343', 2.75)
Insert into dbo.Individuals values ('Cody','Pendant','8/8/2013','F','120 S. Taylor Ave.','Chicago','IL','60630','(312)222-4343', 92.84)
Insert into dbo.Individuals values ('Pat','Pending','4/10/2010','M','125 S. Elmwood','Chicago','IL','60630','(312)222-4343', 6.47)
Insert into dbo.Individuals values ('Hugh','Lyon Sack','12/1/1974','F','636 Linden Ave.','Chicago','Il','60630','(312)222-4343', 69.50)
Insert into dbo.Individuals values ('Drew A.','Blank','2/26/2000','M','116 S. Scoville Ave.','Chicago','IL','60630','(312)222-4343', 15.34)
Insert into dbo.Individuals values ('Lauren','Order','7/31/1936','M','167 N. Ridgeland','Our Fair City','MA','10101','(312)222-4343', 71.99)
Insert into dbo.Individuals values ('Rex','Galore','4/20/1965','M','623 N. Euclid','Chicago','IL','60630','(312)222-4343', 95.65)
Insert into dbo.Individuals values ('Haywood','Jabuzoff','3/18/2006','F','720 S. Harvey','Chicago','IL','60630','(312)222-4343', 49.34)
Insert into dbo.Individuals values ('Justin','Volk V','10/2/1979','F','938 Norht Blvd., #205','Chicago','IL','60630','(312)222-4343', 96.82)
Insert into dbo.Individuals values ('Heronimus B.','Blind','8/28/1973','M','1126 Edmer Ave.','Chicago','IL','60630','(312)222-4343', 32.02)
Insert into dbo.Individuals values ('Donnatella','DiCoppas','1/5/1998','M','635 Fairoaks','Chicago','IL','60630','(312)222-4343', 92.51)
Insert into dbo.Individuals values ('Gil T.','Azell','4/29/1950','M','412 Randolph St.','Chicago','IL','60630','(312)222-4343', 28.18)
Insert into dbo.Individuals values ('Major','Error','9/19/1991','M','124 S. Devon','Chicago','IL','60630','(312)222-4343', 83.65)
Insert into dbo.Individuals values ('Ginger','Vitis','8/5/1964','F','904 Forest Ave.','Chicago','IL','60630','(312)222-4343', 99.68)
Insert into dbo.Individuals values ('Don','Pickett','1/29/1993','M','1020 Clinton Ave.','Chicago','IL','60630','(312)222-4343', 91.54)
Insert into dbo.Individuals values ('Ike','Arumba','5/16/1956','M','1112 N. Elmwood Ave','Chicago','IL','60630','(312)222-4343', 40.02)
Insert into dbo.Individuals values ('Tyra','Meesu','7/23/1973','F','P.O. BOX 770','Chicago','IL','60630','(312)222-4343', 21.45)
Insert into dbo.Individuals values ('Bill','Shredder','12/22/1995','F','110 W. Madison Ave. #2F','Chicago','IL','60630','(312)222-4343', 24.35)
Insert into dbo.Individuals values ('Dot','Matrix','4/19/1969','F','933 Jackson Ave.','River Forest','IL','60630','(312)222-4343', 31.08)
Insert into dbo.Individuals values ('Fred','Knott','5/7/1989','M','121 Home Ave.','Chicago','IL','60630','(312)222-4343', 23.69)
Insert into dbo.Individuals values ('Marianna','Trench','12/27/1965','M','141 S. Scoville Ave.','Chicago','IL','60630','(312)222-4343', 4.01)
Insert into dbo.Individuals values ('Anita','Hammer','6/23/1980','M','1231 Belleforte','Chicago','IL','60630','(312)222-4343', 26.66)
Insert into dbo.Individuals values ('Upton','Leftus','9/23/1987','F','126 N. Ridgeland','Chicago','IL','60630','(312)222-4343', 73.82)
Insert into dbo.Individuals values ('Amanda B.','Reckondwyth','8/25/1936','F','1132 N. Ridgeland','Chicago','IL','60630','(312)222-4343', 20.11)
Insert into dbo.Individuals values ('Nomar','Winter','6/24/1948','F','800 Gunderson Ave.','Chicago','IL','60630','(312)222-4343', 74.19)
Insert into dbo.Individuals values ('Iona','Heap','9/14/1999','M','424 S. Austin Blvd. #3','Chicago','IL','60630','(312)222-4343', 26.92)
Insert into dbo.Individuals values ('Lucinda','Boltz','7/31/2007','F','170 N. Cuyler','Chicago','IL','60630','(312)222-4343', 93.56)
Insert into dbo.Individuals values ('Kay','Sera','7/29/1976','M','283 Pleasent Valley Rd','Westville','OH','34534','', 1.72)
Insert into dbo.Individuals values ('Juan','Moorehouse','8/13/1967','F','234 Coldwater','Minneapolis','MN','57564','', 68.11)
Insert into dbo.Individuals values ('Rose','Hips','8/15/1983','F','121 Temona Dr','Pleasent Hills','PA','50143','', 96.28)
Insert into dbo.Individuals values ('Isabelle','Ringing','9/2/1936','M','350 N. Orleans, #892','Chicago','IL','60654- ','', 99.67)
Insert into dbo.Individuals values ('Maury','Missions','2/6/1984','M','5411 W Fullerton Ave','Chicago','IL','60639-1482','', 40.95)
Insert into dbo.Individuals values ('Oscar','Ruitt','4/11/1940','M','2141 South Tan Court','Chicago','IL','60616- ','', 23.96)
Insert into dbo.Individuals values ('Lois','Bidder','10/27/2012','F','1400 W Augusta Blvd','Chicago','IL','60622-3939','', 52.40)
Insert into dbo.Individuals values ('Donatella','Debois','6/1/1982','M','25 E Washington St Fl 16','Chicago','IL','60602-1708','', 2.93)
Insert into dbo.Individuals values ('Eamon','Lowe','7/21/1990','M','1515 W Monroe St','Chicago','IL','60607-2497','', 17.38)
Insert into dbo.Individuals values ('Linus','Scrimmage','1/19/2012','F','923 N. Robinson, Suite 400','Oklahoma City','OK','73102-2203','', 49.70)
Insert into dbo.Individuals values ('Holly','Unlikely','10/16/1961','F','3 First National Plaz','Chicago','IL','60602- ','', 67.60)
Insert into dbo.Individuals values ('Eileen','Yorway','6/30/1992','M','2448 W Grace St','Chicago','IL','60618-4719','', 8.05)
Insert into dbo.Individuals values ('Lee','Eyeapoka','12/29/1997','F','100 W Randolph St','Chicago','IL','60601-3108','', 84.23)
Insert into dbo.Individuals values ('Donatello','Nobatti','7/19/1999','M','401 S Clinton St','Chicago','IL','60607- ','', 3.02)
Insert into dbo.Individuals values ('Ewell','Rudy Day','9/27/1980','F','500 N Peshtigo Ct','Chicago','IL','60611-4309','', 37.42)
Insert into dbo.Individuals values ('Sumner','Reruns','8/14/1990','M','401 S Clinton St','Chicago','IL','60607- ','', 33.97)
Insert into dbo.Individuals values ('Holly','Unlikely','4/2/1986','M','77 W Jackson Blvd','Chicago','IL','60604-3511','', 9.13)
Insert into dbo.Individuals values ('Ophelia','Self','1/13/1945','F','4554 N. Broadway, St. 301','Chicago','IL','60640- ','', 81.84)
-- EOF Add_Individuals.sql
Appendix B
Queries
-- Samples.sql
/*
Sample methods to select a sample set from a larger table
*/
Use
Demonstration
Go
-- View the entire data set
/*
-- Shows entire table contents in data entry order
Select
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
*/
/*
Shows the first 25 records in the order that they were entered.
Not a good approach to retrieving a trustworthy sample.
Includes duplicate rows 1 and 2
Select Top 25
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
*/
/*
Shows randomly selected records.
This is the preferred method for a truly random sample.
Each time this query runs it will select a different set of records.
As written, "Top 25 percent" returns 25 percent of the full data set;
remove "percent" to return exactly 25 records
Select Top 25 percent
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
Order By
NewId()
*/
/*
Use the modulus to select every nth record based on IdNo. Avoid any
early record bias by starting at a higher IdNo
Select
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
Where
(IdNo + 2) %5 = 0
And I.IdNo > 11
*/
/*
The NCQA Systematic Selection standard calls for the dataset to be sorted by
last name, first name and birthday and for this set to be numbered
consecutively. This query will be used
in the following examples as a common table expression. It is shown
here to demonstrate the sorting functionality of Row_Number directly.
Select
ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
RowNum,
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
*/
/*
Systematic selection of records per NCQA only requires the row number
for the actual selection. Once completed, the results can be joined
back to the original dataset for additional records
;With Ind_List_CTE (RowNum, IdNo, LName, FName, BirthDate)
As (
Select
ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
RowNum,
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
)
Select * from Ind_List_CTE
*/
/*
For the systematic sample per HEDIS/NCQA Technical Specifications.
Assume a starting value of 2
;With Row_List_CTE (RowNum)
As (
Select
ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
From
Individuals I
)
Select Top 25
RowNum
, RowNum - 1 SelRec
, 137.0/25.0 NR
, (RowNum-1) * (137.0/25.0) RN1
, CAST(ROUND(2+(RowNum-1) * (137.0/25.0),0) As Int) Final
From
Row_List_CTE
Order By
RowNum
*/
/*
Final Query
*/
Declare @EvalDt Date
Declare @N Decimal(9,3)
Set @EvalDt = '12-31-2012'
Set @N = 137.0/25.0
;With Ind_List_CTE(RowNum, IdNo, LName, FName, BirthDate)
As (
Select
ROW_NUMBER() Over (Order By I.LName, I.FName, I.BirthDate Asc)
RowNum,
I.IdNo,
I.LName,
I.FName,
I.BirthDate
From
Individuals I
)
Select
RowNum
, IdNo
, FName
, LName