

External Data Sources

2008 CAS Ratemaking Seminar, March 17-18, 2008

John Stenmark, Consulting Actuary

Actuarial Data Management Services (ADMS)

Overview

Why use (Zip Code based) Demographic Data

Zip Code vs. Zip Code Tabulation Area Issues

Some Possible Methodologies to Address those Issues

Census.gov Data Guide

Cartographic Boundary File Guide

Why Use (Zip Code Based) Demographic Data

Predictive Modeling allows/encourages the use of data outside of the rating variables and, in fact, outside of the company.

The first external data that a company is likely to use is Demographic Data, usually by Zip Code.

Predictive Modeling has two phases: the Modeling itself (usually frequency and severity based) and the derivation of rates and relativities using the modeled data (frequency and severity combined into modeled pure premium).

Demographic data is used in the Modeling Phase.

It is especially useful in a multi-state database.
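As a small illustration of the second phase, modeled frequency and modeled severity can be combined into a modeled pure premium by multiplying them. This is only a sketch; the function name and the figures are hypothetical and not taken from the presentation.

```python
def modeled_pure_premium(frequency: float, severity: float) -> float:
    """Combine modeled claim frequency (claims per exposure) and
    modeled severity (average cost per claim) into a pure premium."""
    return frequency * severity


# Hypothetical modeled values for one risk segment.
pp = modeled_pure_premium(frequency=0.05, severity=4000.0)
print(pp)
```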

Zip Code vs. Zip Code Tabulation Area Issues

The Problem

Postal Zip Codes are not defined by boundaries, but by postal routes.

ZCTAs (as used in this presentation) were defined as boundaries by the U.S. Census Bureau during the 2000 census, based at least partially on the U.S. Postal Zip Codes at that time, and have not been changed.

The Postal Zip Codes in your data do change quite often.

Over time your insured postal zip codes (and the territory boundaries defined from those codes) will increasingly diverge from the ZCTAs.

Therefore the Zip Code for a particular policy or claim may not have demographic data associated with it.
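A quick way to size the problem is to list the Zip Codes in the experience data that have no matching ZCTA. The sketch below assumes both sets of codes are already available as five-character strings; the codes and the function name are hypothetical.

```python
def unmatched_zips(policy_zips, zcta_codes):
    """Return the sorted policy Zip Codes that have no matching ZCTA."""
    return sorted(set(policy_zips) - set(zcta_codes))


# Hypothetical codes for illustration only.
policy_zips = ["00001", "00002", "00099", "00001"]
zcta_codes = {"00001", "00002", "00003"}

missing = unmatched_zips(policy_zips, zcta_codes)
print(missing)
```

Any code in the resulting list is a policy or claim Zip Code that would receive no demographic data from a direct ZCTA match.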

Zip Code vs. Zip Code Tabulation Area Issues

The Solution(s)

Assign Derived Demographic Data elements by County (filling entire database)

Then assign data elements by ZCTA where there is a match

Disadvantage: Slight inaccuracy problem where Postal Zip is in a different geographic area from ZCTA

Disadvantage: Precision is inconsistent (county demographics for some insureds and Zip for others)

Advantage: Easy to apply
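The two-step assignment above (county first, then ZCTA where there is a match) can be sketched as follows. The field names, codes and income figures are hypothetical. The `source` flag is an added convenience, not part of the method as presented; it records which level each value came from, which makes the inconsistent precision visible.

```python
def assign_demographics(policies, county_data, zcta_data):
    """Fill each policy from its county, then override with ZCTA data on a match."""
    result = []
    for policy in policies:
        record = dict(policy)
        # Step 1: the county-level value fills the entire database.
        record["median_income"] = county_data.get(policy["county"])
        record["source"] = "county"
        # Step 2: overwrite with the ZCTA-level value where the Zip matches a ZCTA.
        if policy["zip"] in zcta_data:
            record["median_income"] = zcta_data[policy["zip"]]
            record["source"] = "zcta"
        result.append(record)
    return result


# Hypothetical inputs for illustration only.
policies = [
    {"policy_id": 1, "zip": "00001", "county": "C001"},
    {"policy_id": 2, "zip": "00099", "county": "C001"},
]
county_data = {"C001": 40000}
zcta_data = {"00001": 35000}

assigned = assign_demographics(policies, county_data, zcta_data)
for row in assigned:
    print(row["policy_id"], row["median_income"], row["source"])
```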

Zip Code vs. Zip Code Tabulation Area Issues

The Solution(s)

Geocode company data

Assign each policy/claim to a ZCTA using U.S. Census Bureau boundary files.

Assign ZCTA demographic data to the policy/claim

Disadvantage: More complex and time consuming (resource intensive)

Advantage: Far more accurate
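Once a policy or claim has been geocoded to a longitude and latitude, assigning it to a ZCTA is a point-in-polygon test against the boundary files. The sketch below uses a plain ray-casting test on simple polygons. The square boundaries are hypothetical stand-ins for real ZCTA shapes, and holes or multi-part boundaries are not handled; in practice a GIS package would normally do this step.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_zcta(lon, lat, zcta_polygons):
    """Return the first ZCTA whose boundary contains the point, else None."""
    for zcta, polygon in zcta_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return zcta
    return None


# Hypothetical square boundaries standing in for real ZCTA polygons.
zcta_polygons = {
    "00001": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "00002": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(assign_zcta(1.5, 0.5, zcta_polygons))
print(assign_zcta(5.0, 5.0, zcta_polygons))
```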

Census Data

There are numerous sources of demographic data, but the source for most of these data is the US Census Bureau at http://www.census.gov/ . Many variables can be derived from this data. Some possible variables appear below:

Average Education Years

Population Density

Mean Age

Percent Rural

Percent Farm

Travel Time

Median Income

Median year owner-occupied structure built

Median year householder moved into unit

Median value for all owner-occupied housing units

Median price asked

Median selected monthly owner costs
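As an example of a derived variable, Population Density on this slide can be computed from a population count and a land area for each ZCTA. The sketch below assumes the land area is given in square meters, which is my understanding of the AREALAND column in SF3GEO and should be checked against the Census documentation. The row layout, the `population` field and the values are hypothetical.

```python
SQ_METERS_PER_SQ_MILE = 2_589_988.11


def population_density(population, land_area_sq_m):
    """People per square mile; None when land area is zero or missing."""
    if not land_area_sq_m:
        return None
    return population / (land_area_sq_m / SQ_METERS_PER_SQ_MILE)


# Hypothetical rows: one ZCTA of about ten square miles, one with no land area.
rows = [
    {"zcta": "00001", "population": 5000, "arealand": 25_899_881.1},
    {"zcta": "00002", "population": 120, "arealand": 0},
]
density = {r["zcta"]: population_density(r["population"], r["arealand"]) for r in rows}
print(density)
```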


Census Data

So how do you use the data from these Zip Files?

Very cryptic text files.

An Access database is available (referenced in the Readme document). The text below is from: ftp://ftp2.census.gov/census_2000/datasets/Summary_File_3/Arkansas/0README_SF3.doc

■ For step-by-step instructions for moving the data and the structure into a data base format (including screen shots), please see www.census.gov/support/SF3ASCII.html .

■ Structure files in Access97 and other formats are available at http://www.census.gov/support/2000/SF3/ .

■ We are unable to provide one-on-one support for applications of the data to specific spreadsheets or data base software.

Census Data

To download SF3.mdb, click here:

Census Data

After downloading SF3.mdb, open the Access database.

It contains seventy-six tables corresponding to the seventy-six zipped ftp files.

In addition, there is an SF3GEO Table, an SF3GEO Dictionary Table and a Tables Table.

These define the structure of the database.

Census Data

Data is imported into the tables using the File – Get External Data – Import Command.

You will need to change the file extensions from .uf3 to .txt for this to work.

The geo files are fixed width; the others are comma delimited.

The database has specs for each table and these can (should) be accessed using the Advanced button on the import wizard.
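For readers working outside Access, the two file layouts can be read with a few lines of Python. This is only a sketch: the fixed-width offsets in `GEO_LAYOUT` and the sample lines are hypothetical, and the real positions would come from the SF3GEO Dictionary Table or the import specs in the database.

```python
import csv
import io

# Hypothetical fixed-width layout: (field name, start, end) as 0-based slices.
# The real offsets come from the SF3GEO Dictionary Table.
GEO_LAYOUT = [("LOGRECNO", 0, 7), ("ZCTA5", 7, 12), ("NAME", 12, 32)]


def parse_geo_line(line, layout=GEO_LAYOUT):
    """Slice one fixed-width geo record into named fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_data_file(handle):
    """Read a comma-delimited data file into a list of rows."""
    return [row for row in csv.reader(handle)]


# Hypothetical sample content standing in for the renamed .uf3 files.
geo_line = "0000001" + "00001" + "Sample ZCTA".ljust(20)
data_text = "SF3,XX,000,03,0000001,150,70\n"

geo = parse_geo_line(geo_line)
data = parse_data_file(io.StringIO(data_text))
print(geo)
print(data)
```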

Census Data

Use the “Tables” Table to select the columns that you want, then determine which files you need to import.

Remember that the tables contain all geographic areas: State, County, Zip, Block, County/Zip, etc. You will need to work with one of those at a time. Summing the entire file will overstate your results, because the same people are counted once at each geographic level.
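The effect of mixing geographic levels is easy to see on a toy table. In the sketch below the `level` labels, names and counts are hypothetical; the real files identify the level through columns in the geo table.

```python
# Hypothetical rows: the same 1,000 people reported at three geographic levels.
rows = [
    {"level": "state", "name": "State total", "population": 1000},
    {"level": "county", "name": "County A", "population": 600},
    {"level": "county", "name": "County B", "population": 400},
    {"level": "zcta", "name": "00001", "population": 700},
    {"level": "zcta", "name": "00002", "population": 300},
]


def total_population(rows, level=None):
    """Sum population, optionally restricted to one geographic level."""
    return sum(r["population"] for r in rows if level is None or r["level"] == level)


all_levels = total_population(rows)          # every level added together
zcta_only = total_population(rows, "zcta")   # one level only
print(all_levels, zcta_only)
```

Summing every row triples the true count here, while restricting to a single level gives the correct total.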

Census Data

The Tables Table

The TEXT column provides the description of each data element.

The TABLE column provides the Table name (remember the data must be loaded into each needed table).

The FIELDNUM column provides the Column Name that will contain the data element.

In this case, to get one stat (e.g. Average Education) a weighted average is required.

So to get Average Education, the table (P037) tells us we must load Table SF30003 from the file named la00003 and select P037001 thru P037035.
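The weighted average mentioned above can be sketched as follows, assuming each selected cell is a count of people in one educational attainment category and the analyst assigns a number of years to each category. The category names, counts and year values below are hypothetical. If the selected range includes total or subtotal cells, those should be left out of the weights.

```python
def weighted_average_years(counts, years):
    """Weighted average of years of education.

    counts: people in each attainment category (for example, cells from P037)
    years:  analyst-assigned years of education for each category
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(counts[cat] * years[cat] for cat in counts) / total


# Hypothetical categories, counts and year values for one ZCTA.
counts = {"some_high_school": 100, "high_school_graduate": 300, "bachelors_degree": 100}
years = {"some_high_school": 10, "high_school_graduate": 12, "bachelors_degree": 16}

avg = weighted_average_years(counts, years)
print(avg)
```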

Census Data

The SF3GEO Table

Indexed on the LOGRECNO column

The NAME column provides the Description for each row

The ZCTA5 Column provides the five digit Zip for the row

Notice that there are partial Zips (split between Parishes/Counties)

Census Data

By joining SF3GEO and the selected table on LOGRECNO, the demographic data is arranged by columns and the geographic data by rows.

Note: Make sure the query selects only the geographic data desired, i.e. give it the smell test.

The following query:

SELECT SF3GEO.ZCTA5, SF3GEO.AREALAND, SF3GEO.NAME, SF30003.P037001
FROM SF3GEO INNER JOIN SF30003 ON SF3GEO.LOGRECNO = SF30003.LOGRECNO
WHERE (((SF3GEO.COUNTY) Is Null) AND ((SF3GEO.ZCTA5) Not Like "###XX" And (SF3GEO.ZCTA5) Not Like "###HH"))
ORDER BY SF3GEO.ZCTA5;

Yields: [query result shown as a screenshot on the original slide]
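The join-and-filter query on this slide can also be tried outside Access. The sketch below uses Python's built-in sqlite3 module with a few hypothetical rows. Access's `Like "###XX"` pattern (where `#` matches one digit) has no direct SQLite equivalent, so it is translated here to `GLOB '[0-9][0-9][0-9]XX'`; that translation, the column types and all of the data are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SF3GEO (LOGRECNO TEXT, ZCTA5 TEXT, COUNTY TEXT, AREALAND INTEGER, NAME TEXT);
CREATE TABLE SF30003 (LOGRECNO TEXT, P037001 INTEGER);
-- Hypothetical rows: whole ZCTAs, a county part, and XX/HH codes.
INSERT INTO SF3GEO VALUES
  ('0000001', '00001', NULL,  1000, 'ZCTA 00001'),
  ('0000002', '00001', '001',  600, 'ZCTA 00001 (part)'),
  ('0000003', '000XX', NULL,    50, 'ZCTA 000XX'),
  ('0000004', '000HH', NULL,    20, 'ZCTA 000HH'),
  ('0000005', '00002', NULL,   800, 'ZCTA 00002');
INSERT INTO SF30003 VALUES
  ('0000001', 150), ('0000002', 90), ('0000003', 5),
  ('0000004', 0), ('0000005', 210);
""")

query = """
SELECT g.ZCTA5, g.AREALAND, g.NAME, d.P037001
FROM SF3GEO AS g
INNER JOIN SF30003 AS d ON g.LOGRECNO = d.LOGRECNO
WHERE g.COUNTY IS NULL
  AND g.ZCTA5 NOT GLOB '[0-9][0-9][0-9]XX'
  AND g.ZCTA5 NOT GLOB '[0-9][0-9][0-9]HH'
ORDER BY g.ZCTA5
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only the two whole five-digit ZCTAs survive the filter; the county part and the XX and HH codes are dropped, which is the kind of check the smell test calls for.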

Cartographic Boundary Files

So how do you use the data from these Zip Files?

Very cryptic text files.

An Access database is available (referenced in the Readme document). The text below is from:

ftp://ftp2.census.gov/census_2000/datasets/Summary_File_3/Arkansas/0README_SF3.doc

(but any of the 0readme.doc will do)

■ For step-by-step instructions for moving the data and the structure into a data base format (including screen shots), please see www.census.gov/support/SF3ASCII.html .

■ Structure files in Access97 and other formats are available at http://www.census.gov/support/2000/SF3/ .

■ We are unable to provide one-on-one support for applications of the data to specific spreadsheets or data base software.

Cartographic Boundary Files

In addition to demographic data, the Census Bureau publishes boundary files for each of its boundaries.

Remember: since Postal Zip Codes and their definitions change over time, and the Census Bureau redefined ZCTAs somewhat for the census, there will be a mismatch between the boundary files and the Zip Codes in your experience database.

First go to: http://www.census.gov/geo/www/cob/bdy_files.html

Cartographic Boundary Files

From there:

For 5-Digit ZIP Code Tabulation Areas (ZCTAs) go to: http://www.census.gov/geo/www/cob/z52000.html

For County and County Equivalent Areas go to: http://www.census.gov/geo/www/cob/co2000.html

Cartographic Boundary Files

Three types of files on each page. For ZCTAs they are:

Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ARC/INFO Export (.e00) format

Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ArcView Shapefile (.shp) format

Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ARC/INFO Ungenerate (ASCII) format

Most mapping software will read the Shapefile format.
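Before loading a downloaded .shp file into mapping software, it can be useful to confirm what it contains. The sketch below reads the fixed 100-byte header defined in the ESRI Shapefile specification using only the standard library. It is demonstrated on a synthetic header with hypothetical bounding-box values rather than a real Census file, and it does not read the shapes themselves, for which a GIS library would normally be used.

```python
import struct

# Shape type codes for the common 2-D shapes in the shapefile specification.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(data: bytes) -> dict:
    """Decode the 100-byte main-file header of a shapefile."""
    if len(data) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    (file_code,) = struct.unpack(">i", data[0:4])        # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (length_words,) = struct.unpack(">i", data[24:28])   # length in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "file_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, str(shape_type)),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header for illustration: a polygon file with a hypothetical extent.
header = (
    struct.pack(">i", 9994) + b"\x00" * 20 + struct.pack(">i", 50)
    + struct.pack("<ii", 1000, 5)
    + struct.pack("<4d", -91.7, 30.1, -88.1, 35.0)
    + b"\x00" * 32
)
info = read_shp_header(header)
print(info)
```

For a real file, the same function could be applied to the first 100 bytes read in binary mode.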

John Stenmark, Consulting Actuary

Actuarial Data Management Services (ADMS)

(601) 955-3022

[email protected]
