Improving the Quality of Tax Statistics: Recent Innovations in Editing and Imputation Techniques at the Statistics of Income Division of the U.S. Internal Revenue Service

Scott Hollenbeck, Barry Johnson, Melissa Ludlum

Post on 30-Dec-2015

Page 1: Scott Hollenbeck –  Scott.M.Hollenbeck@irs Barry Johnson –  Barry.W.Johnson@irs

Improving the Quality of Tax Statistics: Recent Innovations in Editing and Imputation Techniques at the Statistics of Income Division of the U.S. Internal Revenue Service

Scott Hollenbeck

Barry Johnson

Melissa Ludlum

Page 2

Today’s Presentation

Overview of Statistics of Income (SOI)

Dealing with Missing Data

Recent Innovations

Future Plans

Page 3

What Does SOI Do?

Primary source of U.S. tax data

Data from 110 tax returns and information documents

Test and correct data collected during administrative processing (IRS Masterfile)

Collect extensive additional data from forms, schedules and attachments

Most projects collect data from samples

Products

  Micro data files for U.S. Treasury Department & Congress

  Public-use files

  Tables and analysis (www.irs.gov/taxstats)

Page 4

SOI Data Collection Systems

Maintains computer network separate from main IRS processing

Data collection takes place in IRS Submissions Processing Centers

Graphical User Interface (GUI) systems based in ORACLE

Data tested for internal consistency

Post-edit processing overseen by headquarters’ staff
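
As an illustration of what an internal consistency test can look like, the following Python sketch checks that reported components add up to a reported total. The field names (`wages`, `interest`, `total_income`), the one-dollar tolerance, and the `check_total` helper are assumptions made for this example; they are not taken from SOI's ORACLE-based systems.

```python
def check_total(record, total_field, component_fields, tolerance=1.0):
    """Return a description of the mismatch, or None if the record is consistent.

    Hypothetical rule: the reported total must equal the sum of its
    components within `tolerance`. Missing components count as zero.
    """
    computed = sum(record.get(name) or 0 for name in component_fields)
    reported = record.get(total_field)
    if reported is None:
        return f"{total_field} is missing (components sum to {computed})"
    if abs(reported - computed) > tolerance:
        return f"{total_field} reported {reported}, components sum to {computed}"
    return None


# Illustrative records with made-up values.
record_ok = {"wages": 50_000, "interest": 1_200, "total_income": 51_200}
record_bad = {"wages": 50_000, "interest": 1_200, "total_income": 52_000}

components = ["wages", "interest"]
print(check_total(record_ok, "total_income", components))
print(check_total(record_bad, "total_income", components))
```

Returning a message rather than raising an error lets a record with several failed tests be flagged once and routed to a reviewer with the full list of problems.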

Page 5

Three Major SOI Programs

Individual Income Tax

  Filed by individuals and married couples to report most forms of personal income

  133 million returns filed in 2006

Corporation Income Tax

  Filed by incorporated businesses to report income from parent corporation and subsidiaries

  2.5 million returns filed in 2006

Tax-exempt Organizations

  Annual information returns report assets, income, expenses

  833,000 returns filed in 2006

Page 6

Missing Data – Unit Nonresponse

Causes

  Extensions/late-filed returns

  Tax evasion

Strategies

  Update values from prior year using survey responses

  Utilize records for recent prior years filed during the selection period

Page 7

Missing Data – Item Nonresponse

Causes

  Taxpayer neglects to provide attachments

  Paper return is being used by another IRS function

Strategies

  Use IRS Masterfile data for key values

  Impute values based on existing data and information provided on prior and/or subsequent return

  Surveys and direct contact with preparers
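
The second strategy, imputing a missing item from the same filer's prior and/or subsequent return, can be sketched in Python as follows. The averaging rule, the field names, and the helpers `impute_item` and `fill_missing` are assumptions chosen for illustration; the slides do not say which imputation rule SOI applies.

```python
from typing import Optional


def impute_item(prior: Optional[float], subsequent: Optional[float]) -> Optional[float]:
    """Impute one missing item from adjacent-year returns.

    Assumed rule (not from the presentation): average the prior and
    subsequent values when both exist, otherwise use whichever exists.
    """
    available = [value for value in (prior, subsequent) if value is not None]
    if not available:
        return None  # nothing to impute from; leave the item missing
    return sum(available) / len(available)


def fill_missing(current: dict, prior: dict, subsequent: dict) -> dict:
    """Return a copy of `current` with None items imputed where possible."""
    filled = dict(current)
    for field, value in current.items():
        if value is None:
            filled[field] = impute_item(prior.get(field), subsequent.get(field))
    return filled


# Made-up example: the attachment supporting one item was not provided.
current = {"wages": 48_000.0, "charitable_contributions": None}
prior = {"wages": 46_000.0, "charitable_contributions": 1_000.0}
subsequent = {"wages": 50_000.0, "charitable_contributions": 1_400.0}

filled = fill_missing(current, prior, subsequent)
print(filled)
```

Only items that are actually missing are touched; values the taxpayer reported are carried through unchanged.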

Page 8

What’s New?

Digital images of tax returns

Electronic filing

Automated error correction/imputation routines

Page 9

Digital Return Images

In 1998 SOI began scanning operations

Images stored in Tagged Image File Format (TIFF)

In 2006, imaged more than 71.5 million pages from 30 different tax and information returns

Many users:

  SOI headquarters staff

  SOI edit operations

  IRS Functions

  General Public (tax-exempt organizations only)

Page 10

Split-Screen Edit Systems

Combines scanned image and GUI edit system on a single 24-inch wide-aspect monitor

Image displayed using Adobe Acrobat or specially adapted ORACLE programs

Image and edit systems are synchronized

Online access to instructions, dictionaries, other tools

Page 11

Page 12

Split-Screen Edit Systems

Positive feedback from editors

Slight overall improvement in productivity and quality

Images available to geographically dispersed work force

Reduced storage of paper documents

Reduced impact on other IRS functions

Page 13

Electronic Filing of Tax Returns

2004 – Modernized electronic filing (MeF) began

  Uses Extensible Markup Language (XML) to capture:

    Numeric and character strings supplied by taxpayer

    Information tags

2005 – Mandatory e-file for large business and tax-exempt organizations

  20.5% SOI sample of corporate income taxes

  13.5% SOI sample of tax-exempt organizations
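
To show why tagged XML is convenient for statistical processing, here is a small Python sketch that pairs each taxpayer-supplied value with the tag that identifies it. The element names (`Return`, `Filer`, `TotalAssets`, `GrossReceipts`), the `taxYear` attribute, and the values are invented for this example and do not reproduce the actual MeF schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical return document; real MeF schemas are far larger.
SAMPLE_RETURN = """
<Return taxYear="2005">
  <Filer>
    <Name>Example Corporation</Name>
  </Filer>
  <TotalAssets>1500000</TotalAssets>
  <GrossReceipts>820000</GrossReceipts>
</Return>
""".strip()


def flatten(element, prefix=""):
    """Yield (tag path, text) pairs for every leaf element under `element`."""
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from flatten(child, path)


root = ET.fromstring(SAMPLE_RETURN)
fields = dict(flatten(root))

for tag_path, value in fields.items():
    print(f"{tag_path}: {value}")
```

Because each value arrives with its tag, no positional layout has to be known in advance; a numeric item such as `Return/TotalAssets` can be converted with `int(...)` once its tag has been matched to a data item.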

Page 14

SOI Use of MeF Data

In 2006, SOI developed programs to render digital images from XML data

Edit returns using split-screen applications

In 2007, will populate ORACLE data tables directly with XML data

Editors will validate data, supply codes and allocate certain data items

Page 15

Electronic Filing of Tax Returns

Individual income tax returns

  1986 – E-file through paid preparers

  1992 – E-file from home computers allowed

  1994 – 98% of all filers eligible to e-file

  2006 – 73 million returns, or 54%, e-filed

  Data stored in Tax Return Database (TRDB)

    ASCII data, not tagged XML

  2010 – Scheduled for conversion to MeF

Page 16

SOI Individual Income Tax Program

Sample of returns processed differently depending on certain criteria

Edited returns

“Missing returns”

Forced closed returns

Page 17

Individual Processing Programs

Online editing system – editors transcribe, code and review any potential data discrepancies

Post Edit Reconciliation Process (PERP) – automated computer program which validates and adjusts data

Page 18

Edited Returns

Edited returns are processed through the online editing system by an editor, then reviewed using the PERP program

Prior to Tax Year 2004, all sampled returns that were not “missing” were manually edited

Currently, only paper returns and electronically filed returns with specific characteristics are edited through the online system

Page 19

“Missing Returns”

Each year, approximately 250 paper returns selected for the sample are not located

Limited IRS Masterfile data available

PERP program used to impute missing details of forms and schedules

Page 20

Forced Closed Returns

Automated processing of certain E-filed returns in the SOI sample

These returns bypass the online editing system and are processed through the PERP program

Returns with possible discrepancies are reviewed by a National Office analyst

Returns that pass all tests are considered “forced closed” and added to final data file
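
The routing described on this slide can be sketched as a small decision function in Python. The eligibility criterion, the two tests, and all field names below are hypothetical stand-ins; the slides do not list the actual criteria or the tests that PERP applies.

```python
def route_return(ret, eligible, tests):
    """Decide how a sampled return is handled.

    Returns a (disposition, failed_test_names) pair:
      "online_edit"    - paper or ineligible e-filed return, edited by an editor
      "analyst_review" - eligible e-filed return with a possible discrepancy
      "forced_closed"  - eligible e-filed return that passed every test
    """
    if not (ret.get("efiled") and eligible(ret)):
        return "online_edit", []
    failed = [name for name, test in tests.items() if not test(ret)]
    if failed:
        return "analyst_review", failed
    return "forced_closed", []


# Hypothetical eligibility rule and tests, for illustration only.
def eligible(ret):
    return ret["schedule_count"] == 0


tests = {
    "income_adds_up": lambda r: abs(r["wages"] + r["interest"] - r["total_income"]) <= 1,
    "no_negative_wages": lambda r: r["wages"] >= 0,
}

clean = {"efiled": True, "schedule_count": 0, "wages": 40_000, "interest": 500, "total_income": 40_500}
mismatch = dict(clean, total_income=41_000)
paper = dict(clean, efiled=False)

for label, ret in [("clean", clean), ("mismatch", mismatch), ("paper", paper)]:
    print(label, route_return(ret, eligible, tests))
```

Keeping the tests in a named dictionary means the analyst who reviews a return sees which checks failed rather than a bare rejection.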

Page 21

Results from Forced Closing Returns

Tax Year 2004 – First year using automated closing of selected electronically filed returns

Total sample size – 200,295 returns

Electronically filed – 64,670 returns

“Forced Closed” – 18,193 returns

Editing hours saved – 1,400 hours

Page 22

Results from Forced Closing Returns

Tax Year 2005 – Second year of program, expanded criteria for returns eligible to be “forced closed”

Total sample size – 292,837 returns

Electronically filed – 114,897 returns

“Forced Closed” – 47,753 returns

Editing hours saved – 4,100 hours
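
The figures reported for Tax Years 2004 and 2005 can be compared with a few ratios. The Python sketch below uses only the counts stated on the two results slides; the derived ratios (share of e-filed returns forced closed, share of the whole sample, and minutes saved per forced-closed return) are simple divisions added here for illustration and are not figures quoted in the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForcedCloseResults:
    tax_year: int
    sample_size: int     # total sample size
    efiled: int          # electronically filed returns in the sample
    forced_closed: int   # returns that were "forced closed"
    hours_saved: int     # editing hours saved

    @property
    def share_of_efiled(self) -> float:
        return self.forced_closed / self.efiled

    @property
    def share_of_sample(self) -> float:
        return self.forced_closed / self.sample_size

    @property
    def minutes_saved_per_return(self) -> float:
        return self.hours_saved * 60 / self.forced_closed


RESULTS = [
    ForcedCloseResults(2004, 200_295, 64_670, 18_193, 1_400),
    ForcedCloseResults(2005, 292_837, 114_897, 47_753, 4_100),
]

for r in RESULTS:
    print(
        f"TY{r.tax_year}: {r.share_of_efiled:.1%} of e-filed returns, "
        f"{r.share_of_sample:.1%} of the sample, "
        f"{r.minutes_saved_per_return:.1f} minutes saved per forced-closed return"
    )
```

On these counts, the forced-closed share of e-filed sample returns rises from about 28% in Tax Year 2004 to about 42% in Tax Year 2005, consistent with the expanded eligibility criteria noted for the second year.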

Page 23

The Future - Data

More returns and information documents will be filed electronically

Optical Character Recognition or Intelligent Character Recognition will be used to capture data from paper-filed returns

Data will be available in real time

Enable larger sample sizes and increased use of population files

Page 24

The Future – Field Operations

Increased resources dedicated to resolving data inconsistencies as opposed to data transcription

Paperless environment – use of electronic data or digital images created from paper returns

Increased use of prior year data to identify and correct data anomalies

Page 25

The Future - Products

Improvements in technology and increased use of electronic filing will allow SOI to produce more data, more quickly and more efficiently

Increased sample sizes will allow small area estimates

Population files will allow for creation of ad hoc panels, linkage of data items across tax form types and research on infrequent data items