IDI Data Dictionary:

IR tax data

September 2015 edition

Crown copyright ©

This work is licensed under the Creative Commons Attribution 3.0 New Zealand licence.

You are free to copy, distribute, and adapt the work, as long as you attribute the work to

Statistics NZ and abide by the other licence terms. Please note you may not use any

departmental or governmental emblem, logo, or coat of arms in any way that infringes any

provision of the Flags, Emblems, and Names Protection Act 1981. Use the wording

‘Statistics New Zealand’ in your attribution, not the Statistics NZ logo.

Liability

While all care and diligence has been used in processing, analysing, and extracting data

and information in this publication, Statistics New Zealand gives no warranty it is error free

and will not be liable for any loss or damage suffered by the use directly, or indirectly, of the

information in this publication.

Citation

Statistics New Zealand (2015). IDI Data Dictionary: IR tax data (September 2015 edition).

Available from www.stats.govt.nz.

ISSN 2463-3615 (online)

Published in September 2015 by

Statistics New Zealand

Tatauranga Aotearoa

Wellington, New Zealand

Contact

Statistics New Zealand Information Centre: [email protected]

Phone toll-free 0508 525 525

Phone international +64 4 931 4600

www.stats.govt.nz

Contents

1 Purpose of this data dictionary

2 About the tax data
Coverage
Methodology
Privacy, security, or confidentiality issues
List of datasets

3 Data dictionary for ird_ems
Dataset description
Summary table
Detailed information

4 Data dictionary for ird_addresses
Dataset description
Summary table
Detailed information

5 Data dictionary for ird_customers
Dataset description
Summary table
Detailed information

6 Data dictionary for ird_client_names
Dataset description
Summary table
Detailed information

7 Data dictionary for ird_tax_registrations
Dataset description
Summary table
Detailed information

8 Data dictionary for ird_cross_reference
Dataset description
Summary table
Detailed information

9 Data dictionary for ird_rtns_keypoints_ir3
Dataset description
Summary table
Detailed information

10 Data dictionary for ird_attachments_ir20
Dataset description
Summary table
Detailed information

11 Data dictionary for ird_attachments_ir4s
Dataset description
Summary table
Detailed information

12 Data dictionary for ird_old_systems_numbers
Dataset description
Summary table
Detailed information

13 Glossary

1 Purpose of this data dictionary

IDI Data Dictionary: IR tax data (September 2015 edition) documents the content of the datasets that Inland Revenue (IR) provides to Statistics New Zealand for use in the Integrated Data Infrastructure (IDI). This document pulls together a number of existing documents about the IR tax data to create a ‘formalised’ central reference point for users.

This dictionary gives information on the variables contained in the IR tax datasets from April 1999 – including technical information and descriptions.

Use this data dictionary if you are interested in understanding and accessing the IR tax data in the IDI for your research.

2 About the tax data

Coverage

Reference period start: 1 April 1999

Reference period end: ongoing

Geographic coverage: all New Zealand

Methodology

Type of data: administrative data capture

Data collector: Inland Revenue

Frequency of data collection: supplied monthly to the IDI

Privacy, security, or confidentiality issues

In addition to the confidentiality clauses pertaining to all data held by Statistics New Zealand, the use of IR tax data is governed by the conditions specified in the Memorandum of Understanding between Statistics NZ and Inland Revenue, as well as the conditions set out in the Tax Administration Act 1994.

The IR tax datasets that are accessible to researchers do not contain any name or address information to identify an individual. All researchers who have access to the tax data have had their research proposals assessed using Statistics NZ’s microdata access protocols. Only approved researchers who have been granted access by Statistics NZ and Inland Revenue may view the tax data.

Read Statistics NZ’s microdata access protocols.

All outputs produced from tax data must be aggregated, and counts must be suppressed if the underlying unrounded count is fewer than 6.
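A minimal Python sketch of the count suppression rule above, assuming aggregated counts are held in a plain dictionary. The function name `suppress_small_counts` and the use of `None` as the suppression marker are illustrative assumptions; any other output-checking rules that may apply are not covered.

```python
from typing import Dict, Optional

# Counts fewer than 6 must be suppressed (threshold taken from the rule above).
SUPPRESSION_THRESHOLD = 6


def suppress_small_counts(counts: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Replace any unrounded count below the threshold with None (hypothetical marker)."""
    return {
        group: (n if n >= SUPPRESSION_THRESHOLD else None)
        for group, n in counts.items()
    }


# Made-up counts for illustration only.
example = suppress_small_counts({"group_a": 12, "group_b": 5, "group_c": 6})
```

A count of exactly 6 is kept, because the rule applies only to counts fewer than 6.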

List of datasets

ird_ems

ird_addresses

ird_customers

ird_client_names

ird_tax_registrations

ird_cross_reference

ird_rtns_keypoints_ir3

ird_attachments_ir20

ird_attachments_ir4s

ird_old_systems_numbers

3 Data dictionary for ird_ems

Dataset description

Contents of dataset: The employee-level data from the EMS return for period dates from 1 April 1999.

Conditions: Active records only, ie ir_ems_return_line_item_code = 'A'.

Exclude records with gross earnings equal to 0.

Note: Employers are able to file late returns and/or amend EMS returns relating to prior periods. This means that in a given period, data may be updated with:

(a) New active data for the latest period – however, do not include records that were created and made inactive within the same period.

(b) New data relating to prior periods – include new data that has been submitted to Inland Revenue, but relates to prior periods.

(c) Revisions relating to prior periods – include changes/revisions to data already supplied.
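As a rough illustration of the two dataset conditions above, the following Python sketch keeps only active records with non-zero gross earnings. It assumes each record is a dictionary keyed by the IDI variable names in this section. The function name `select_ems_records` is hypothetical, and excluding records with a missing earnings amount is an assumption rather than a documented rule.

```python
from decimal import Decimal
from typing import Any, Dict, Iterable, List


def select_ems_records(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep active records ('A') whose gross earnings are present and non-zero."""
    selected = []
    for row in rows:
        if row.get("ir_ems_return_line_item_code") != "A":
            continue  # inactive or unknown status
        amount = row.get("ir_ems_gross_earnings_amt")
        if amount is None:
            continue  # assumption: a missing amount is treated as excluded
        if Decimal(str(amount)) == 0:
            continue  # gross earnings equal to 0 are excluded
        selected.append(row)
    return selected


# Made-up records for illustration only.
sample = [
    {"ir_ems_return_line_item_code": "A", "ir_ems_gross_earnings_amt": "1520.75"},
    {"ir_ems_return_line_item_code": "I", "ir_ems_gross_earnings_amt": "980.00"},
    {"ir_ems_return_line_item_code": "A", "ir_ems_gross_earnings_amt": "0.00"},
]
kept = select_ems_records(sample)
```

Amounts are compared as `Decimal` values so that a stored value such as 0.00 is recognised as zero whatever its text representation.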

Summary table

| IDI variable name | Primary key | Mandatory | Format | Classification name | Source variable name |
|---|---|---|---|---|---|
| snz_uid | Y | Y | N | | |
| snz_ird_uid | | N | N | | employee_ird_number |
| snz_employer_ird_uid | Y | Y | N | | employer_ird_number |
| ir_ems_employer_location_nbr | Y | Y | 4N | | employer_location_number |
| ir_ems_return_period_date | Y | Y | Datetime | | return_period_date |
| ir_ems_line_nbr | Y | Y | 6N | | line_number |
| ir_ems_snz_unique_nbr | Y | Y | N | | |
| ir_ems_version_nbr | Y | Y | 6N | | version_number |
| ir_ems_doc_lodge_prefix_nbr | Y | Y | 1N | | doc_lodge_nbr_prefix |
| ir_ems_doc_lodge_nbr | Y | Y | 9N | | doc_lodge_nbr |
| ir_ems_doc_lodge_suffix_nbr | Y | Y | 2N | | doc_lodge_nbr_suffix |
| ir_ems_gross_earnings_amt | | N | 13.2N | | gross_earnings_amount |
| ir_ems_gross_earnings_imp_code | | Y | 1A | | gross_earnings_imp_code |
| ir_ems_paye_deductions_amt | | N | 13.2N | | paye_deductions_amount |
| ir_ems_paye_imp_ind | | Y | 1A | | paye_imp_ind |
| ir_ems_earnings_not_liable_amt | | N | 13.2N | | earnings_not_liable_amount |
| ir_ems_earnings_not_liab_imp_ind | | Y | 1A | | earnings_not_liab_imp_ind |
| ir_ems_fstc_amt | | N | 13.2N | | ftsc_amount |
| ir_ems_sl_amt | | N | 13.2N | | sl_amount |
| ir_ems_withholding_type_code | | Y | 1A | | withholding_type_code |
| ir_ems_income_source_code | | Y | 3A | | income_source_code |
| ir_ems_employee_start_date | | N | Datetime | | date_employee_started |
| ir_ems_employee_end_date | | N | Datetime | | date_employee_finished |
| ir_ems_lump_sum_ind | | N | 1A | | lump_sum_indicator |
| ir_ems_tax_code | | Y | 6A | tax_codes | tax_code |
| ir_ems_return_line_item_code | | Y | 1A | | return_line_item_status_code |
| ir_ems_processed_date | | Y | Datetime | | date_processed |
| ir_ems_ird_timestamp_date | | Y | Datetime | | timestamp |
| ir_ems_enterprise_nbr | | N | 10A | | |
| ir_ems_pbn_nbr | | N | 10A | | |

Detailed information

______________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: snz_employer_ird_uid

Definition: A local unique identifier (for an employer) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: ir_ems_employer_location_nbr

Definition:

A location number is a sequence number that distinguishes between the associated locations a customer may have that carry return filing obligations.

Format: Numeric, 9N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_return_period_date

Definition: Period covered by the return.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_line_nbr

Definition: A line item number is a sequence number used to identify the different line items on a return attachment. It starts at 1 and is incremented by 1 for each line item.

Format: Numeric, 6N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_version_nbr

Definition: A version number is a means of distinguishing one version of a return attachment line item from another. The version number is initialised at zero, then incremented by 1 each time the record is changed.

Format: Numeric, 6N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_doc_lodge_prefix_nbr

Definition: The prefix of the document lodgement number under which this schedule (or EMS) was filed. A prefix of 3 indicates a manual return, a prefix of 8 indicates an e-filed return.

Format: Numeric, 1N

Name of classification:

Notes:
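The two prefix values named in the definition above can be held in a small lookup, sketched here in Python. The names `FILING_METHOD_BY_PREFIX` and `filing_method` are hypothetical, and returning `None` for any other prefix is an assumption, since no other values are described here.

```python
from typing import Optional

# Prefix meanings as given in the definition above; other prefixes are not documented here.
FILING_METHOD_BY_PREFIX = {3: "manual", 8: "e-filed"}


def filing_method(prefix: int) -> Optional[str]:
    """Return the filing method for a document lodgement prefix, or None if not documented."""
    return FILING_METHOD_BY_PREFIX.get(prefix)
```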

_______________________________________

Variable name: ir_ems_doc_lodge_nbr

Definition: The document lodgement number (DLN) is a unique number assigned to documents or returns lodged.

Format: Numeric, 9N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_doc_lodge_suffix_nbr

Definition: Suffix to document lodgement number.

Format: Numeric, 2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_gross_earnings_amt

Definition: Total earnings before tax deducted. The gross earnings paid to the employee. The EMS may include more than one line item entry.

Format: Numeric, 13.2

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_gross_earnings_imp_code

Definition:

Format: 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_paye_deductions_amt

Definition: Total income tax deductions.

Format: Numeric, 13.2

Name of classification:

Notes: This includes withholding payments

_______________________________________

Variable name: ir_ems_paye_imp_ind

Definition:

Format: 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_earnings_not_liable_amt

Definition: Income not liable for ACC earner premium.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_earnings_not_liab_imp_ind

Definition:

Format: 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_fstc_amt

Definition: Family Support Tax Credit – the amount of family support paid to each WINZ beneficiary for the line item. This column only applies to NZISS customers. FSTC is on DWI (WINZ) EMS schedules only as DWI are the only (beneficiary) ‘employer’ to fill in this column, so it does not appear on the standard EMS form.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_sl_amt

Definition: Student loan repayments – student loan deduction amount for the line item. The amount is always displayed as a negative number. The student loan amount is then subtracted from the total student loan.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_withholding_type_code

Definition: P for PAYE deductions, W for withholding tax deductions.

Format: Character, 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_income_source_code

Definition: Code representing the source of income.

Format: Character, 3A

Name of classification: W&S – wages and salary, WHP – withholding payment, BEN – benefits, STU – Student Allowance, PPL – Paid Parental Leave, PEN – Pensions (superannuation), CLM – Claimants Compensation.

Notes:
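For reference, the income source codes listed above can be expressed as a lookup table. This is a sketch only: the names `INCOME_SOURCE_DESCRIPTIONS` and `describe_income_source` are hypothetical, and the fallback text for unlisted codes is an assumption.

```python
# Codes and descriptions copied from the list above.
INCOME_SOURCE_DESCRIPTIONS = {
    "W&S": "wages and salary",
    "WHP": "withholding payment",
    "BEN": "benefits",
    "STU": "Student Allowance",
    "PPL": "Paid Parental Leave",
    "PEN": "Pensions (superannuation)",
    "CLM": "Claimants Compensation",
}


def describe_income_source(code: str) -> str:
    """Return the description for an income source code, ignoring case and padding."""
    return INCOME_SOURCE_DESCRIPTIONS.get(code.strip().upper(), "unrecognised code")
```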

_______________________________________

Variable name: ir_ems_employee_start_date

Definition: Start date of the employee, as entered by the employer on the EMS.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_employee_end_date

Definition: End date of the employee, as entered by the employer on the EMS.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_lump_sum_ind

Definition: Flag to indicate a lump sum payment.

Format: Character, 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_tax_code

Definition: Tax code of employee. The tax code at which deductions have been made for the employee for this line item number eg 'M' main source of income. Only one job can have this code at any one time.

Format: Character, 6A

Name of classification: tax_codes

Notes:

_______________________________________

Variable name: ir_ems_return_line_item_code

Definition: Status code. A code is an abbreviation for a return line item status. Status values are 'A' active or 'I' inactive.

Format: Character, 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_processed_date

Definition: Process date.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_______________________________________

Variable name: ir_ems_enterprise_nbr

Definition: A unique identifier generated by Statistics NZ for an enterprise. An enterprise is an institutional unit and generally corresponds to legal entities operating in New Zealand. It can be a company, partnership, trust, estate, incorporated society, producer board, local or central government organisation, voluntary organisation, or self-employed individual.

Format: 10A

Name of classification:

Notes:

_______________________________________

Variable name: ir_ems_pbn_nbr

Definition: Permanent Business Number. 10-character code, consisting of 'PB' prefix, followed by a unique 8-digit number. This is a Statistics NZ generated construct for a geographically located business unit.

Format: 10A

Name of classification:

Notes:
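The Permanent Business Number layout described above ('PB' followed by 8 digits) can be checked with a simple pattern. The sketch below is illustrative: the function name `is_valid_pbn` is hypothetical, and it checks the shape of the value only, not whether the number has actually been issued.

```python
import re

# 'PB' prefix followed by exactly 8 ASCII digits, 10 characters in total.
PBN_PATTERN = re.compile(r"PB[0-9]{8}")


def is_valid_pbn(value: str) -> bool:
    """Return True if the value has the documented PBN shape."""
    return PBN_PATTERN.fullmatch(value) is not None
```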

4 Data dictionary for ird_addresses

Dataset description

Contents of dataset: This table contains geocoded address information for an individual.

Summary table

| IDI variable name | Primary key | Mandatory | Format | Classification name | Source variable name |
|---|---|---|---|---|---|
| snz_uid | | Y | N | | |
| snz_ird_uid | Y | Y | N | | ird_number |
| ir_apc_location_nbr | Y | Y | 4N | | location_number |
| ir_apc_address_type_code | Y | Y | 1A | address_types | address_type |
| ir_apc_snz_unique_nbr | | N | N | | |
| ir_apc_applied_date | | Y | Datetime | | date_applied |
| ir_apc_tax_type_code | Y | Y | 3A | tax_types | tax_type |
| ir_apc_main_address_ind | Y | Y | 1A | | main_address_indicator |
| ir_apc_post_code | | N | 6A | | post_code |
| ir_apc_address_status_code | | N | 1A | address_status | address_status |
| ir_apc_ceased_date | | N | Datetime | | date_ceased |
| ir_apc_ird_timestamp_date | | Y | Datetime | | timestamp |
| ir_apc_region_code | | N | 2A | | |
| ir_apc_ta_code | | N | 3A | | |
| ir_apc_meshblock_code | | N | 7A | | |
| ir_apc_meshblock_imputed_ind | | N | 1A | | |
| snz_idi_address_register_uid | | N | N | | |

Detailed information

_________________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: 7N

Name of classification:

Notes:

_______________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_location_nbr

Definition: Location number of the EMS filer (payroll system)

Format: Numeric, 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_address_type_code

Definition: Type of address a client may have, eg 'L' - Physical Location Address, 'P' - Postal address, 'R' - Registered Office, 'S' - Specific address, etc.

Format: Character, 1A

Name of classification: address_types

Notes:

_______________________________________

Variable name: ir_apc_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_applied_date

Definition: Date from which record became valid

Format: Datetime, dd/mm/yy

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_tax_type_code

Definition: Tax code.

Format: Character, 3A

Name of classification: tax_types

Notes:

_______________________________________

Variable name: ir_apc_main_address_ind

Definition: Y/N indicator that denotes whether the address is the client's main address. A client may have more than one main address.

Format: Character, 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_post_code

Definition: A numeric code assigned by NZ Post to an area within New Zealand and used for the delivery of mail.

Format: Character, 6A

Name of classification:

Notes: Approximately 90 percent of records have a value in the post code field.

_______________________________________

Variable name: ir_apc_address_status_code

Definition: Current address status of customer, eg 'D' return to district office, 'I' invalid address, 'O' overseas address, 'V' valid address etc

Format: Character, 1A

Name of classification: address_status

Notes:

_______________________________________

Variable name: ir_apc_ceased_date

Definition: Date from which record ceased to be valid

Format: Datetime, dd/mm/yy

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_______________________________________

Variable name: ir_apc_region_code

Definition:

Format: 2A

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_ta_code

Definition:

Format: 3A

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_meshblock_code

Definition: A seven digit mesh block number which is the lowest level of a customer's geographic location.

Format: 7A

Name of classification:

Notes:

_______________________________________

Variable name: ir_apc_meshblock_imputed_ind

Definition:

Format: 1A

Name of classification:

Notes:

_______________________________________

Variable name: snz_idi_address_register_uid

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

5 Data dictionary for ird_customers

Dataset description

Contents of dataset: This table holds birth_month, birth_year, and entity_type.

Summary table

| IDI variable name | Primary key | Mandatory | Format | Classification name | Source variable name |
|---|---|---|---|---|---|
| snz_uid | Y | Y | N | | |
| snz_ird_uid | Y | Y | N | | ird_number |
| ir_cus_snz_unique_nbr | Y | Y | N | | |
| ir_cus_location_nbr | Y | Y | 4N | | location_number |
| ir_cus_entity_type_code | | Y | 1A | entity_types | entity_type |
| ir_cus_entity_class_code | | Y | 2A | entity_classes | entity_class |
| ir_cus_client_status_code | | Y | 1A | client_status | client_status |
| ir_cus_applied_date | | N | Datetime | | date_applied |
| ir_cus_ceased_date | | N | Datetime | | date_ceased |
| ir_cus_birth_year_nbr | | N | 4N | | date_of_birth |
| ir_cus_birth_month_nbr | | N | 2N | | date_of_birth |
| ir_cus_org_commencement_date | | N | Datetime | | org_commencement_date |
| ir_cus_loan_indicator_code | | N | 1A | | loan_indicator |
| ir_cus_resident_indicator_code | | N | 1A | | resident_indicator |
| ir_cus_sic_code | | N | 8A | sic_codes | sic_code |

Detailed information

_______________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_location_nbr

Definition: Location number of taxpayer.

Format: Numeric, 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_entity_type_code

Definition: Type of entity eg C = company, M = Māori authority, P = partnership, I= individual etc.

Format: Character, 1A

Name of classification: entity_types

Notes:

_______________________________________

Variable name: ir_cus_entity_class_code

Definition: Class of entity eg BS = Building Society, UT = unit trust, SW = salary or wages etc.

Format: Character, 2A

Name of classification: entity_classes

Notes:

_______________________________________

Variable name: ir_cus_client_status_code

Definition: Status of the client eg C = ceased, B = bankrupt, A = active, L = liquidation, R = receivership, M = amalgamated company, S = struck off, U = undischarged bankrupt.

Format: Character, 1A

Name of classification: client_status

Notes: e.g. active/bankrupt/ceased active

_______________________________________

Variable name: ir_cus_applied_date

Definition: Date from which the record became active.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_ceased_date

Definition: Date from which the record became inactive.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_birth_year_nbr

Definition:

Format: 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_birth_month_nbr

Definition:

Format: 2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_org_commencement_date

Definition: Commencement date for any entity other than an individual, ie company, partnership, trust etc. Loan transfer date.

Format: Datetime, yyyymmdd

Name of classification:

Notes: May be set to 1/1/1970 if unknown.
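Because 1/1/1970 may stand in for an unknown commencement date, it can be useful to map that value to a missing value when reading the field. The Python sketch below assumes the date arrives as a yyyymmdd string; the function name `parse_commencement_date` is hypothetical. A genuine commencement on 1 January 1970 cannot be told apart from the placeholder and would also be returned as missing.

```python
from datetime import date, datetime
from typing import Optional

# Placeholder value noted above for unknown commencement dates.
UNKNOWN_COMMENCEMENT = date(1970, 1, 1)


def parse_commencement_date(value: str) -> Optional[date]:
    """Parse a yyyymmdd string, returning None for the 1/1/1970 placeholder."""
    parsed = datetime.strptime(value, "%Y%m%d").date()
    return None if parsed == UNKNOWN_COMMENCEMENT else parsed
```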

_______________________________________

Variable name: ir_cus_loan_indicator_code

Definition: ‘Y’ indicates presence of student loan for individuals.

Format: Character, 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_resident_indicator_code

Definition: NZ resident / non-resident for tax purposes (R/N)

Format: Character, 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_cus_sic_code

Definition: Industry Code, eg 511010 = supermarkets, 523100 = furniture retailing.

Format: Character, 8A

Name of classification: sic_codes

Notes:

_______________________________________

6 Data dictionary for ird_client_names

Dataset description

Contents of dataset: This table holds sex and client status information.

Summary table

| IDI variable name | Primary key | Mandatory | Format | Classification name | Source variable name |
|---|---|---|---|---|---|
| snz_uid | Y | Y | N | | |
| snz_ird_uid | Y | Y | N | | ird_number |
| ir_cli_snz_unique_nbr | | Y | N | | |
| ir_cli_location_nbr | Y | N | 4N | | location_number |
| ir_cli_client_name_type_code | Y | N | 2A | client_name_type | client_name_type |
| ir_cli_sequence_nbr | Y | N | 3N | | sequence_number |
| ir_cli_applied_date | Y | N | Datetime | | date_applied |
| ir_cli_sex_snz_code | | N | 1A | | |
| ir_cli_sex_imp_code | | Y | 1A | | |
| ir_cli_ceased_date | | N | Datetime | | date_ceased |
| ir_cli_ird_timestamp_date | | N | Datetime | | timestamp |

Detailed information

_________________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: ir_cli_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_cli_location_nbr

Definition: Location number of the EMS filer (payroll system)

Format: Numeric, 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_cli_client_name_type_code

Definition: A code denoting the client name type eg P = preferred name, S = secondary name etc.

Format: Character, 2A

Name of classification: client_name_type

Notes:

_______________________________________

Variable name: ir_cli_sequence_nbr

Definition: The sequence number is the numeric code given to each of a client's names within the combination of IRD number, location number, and client name type. It is not a serial number; it duplicates the code in the client name type entity. There is a 1:1 relationship between client name number and client name type code (number = code): 10 = P, 20 = S, 30 = A, 40 = C, 50 = T.

Format: Numeric, 3N

Name of classification:

Notes:
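The 1:1 relationship described in the definition can be expressed as a small lookup. The sketch below is only an illustration: the names `SEQUENCE_TO_NAME_TYPE` and `is_consistent` are hypothetical, and stripping padding from the 2A code is an assumption about how the value is stored.

```python
# Pairs listed in the definition of ir_cli_sequence_nbr.
SEQUENCE_TO_NAME_TYPE = {10: "P", 20: "S", 30: "A", 40: "C", 50: "T"}


def is_consistent(sequence_nbr: int, name_type_code: str) -> bool:
    """True when the sequence number and the name type code agree.

    Assumption: the 2A code may carry padding, so it is stripped first.
    """
    return SEQUENCE_TO_NAME_TYPE.get(sequence_nbr) == name_type_code.strip()


print(is_consistent(10, "P "))
print(is_consistent(20, "P"))
```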

_______________________________________

Variable name: ir_cli_applied_date

Definition: Date from which the record became valid.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_cli_sex_snz_code

Definition:

Format: 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_cli_sex_imp_code

Definition:

Format: 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_cli_ceased_date

Definition: Date from which the record became invalid.

Format: Datetime, yyyymmdd

Name of classification:

Notes: Reasons for ceasing include a new name, death, etc.

_______________________________________

Variable name: ir_cli_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_________________________________________

7 Data dictionary for ird_tax_registrations

Dataset description

Contents of dataset: This table holds information about tax types.

Summary table

IDI variable name  Primary key  Mandatory  Format  Classification name  Source variable name

snz_uid Y Y N

snz_ird_uid Y Y N ird_number

ir_treg_location_nbr Y Y 4N location_number

ir_treg_tax_type_code Y Y 3A tax_types tax_type

ir_treg_applied_date Y Y Datetime date_applied

ir_treg_snz_unique_nbr Y Y N ir_treg_snz_unique_nbr

ir_treg_treg_start_date Y Y Datetime treg_date_start

ir_treg_treg_end_date Y Datetime treg_date_end

ir_treg_filing_frequency_code N 2A tax_filing_frequency filing_frequency

ir_treg_treg_status_code N 1A tax_reg_status treg_status

ir_treg_ceased_date N Datetime date_ceased

ir_treg_posting_ind_code N 1A posting_indicators posting_ind

ir_treg_electronic_filing_ind N 1A electronic_filing_ind

ir_treg_corporate_filing_ind N 1A corporate_filing_ind

ir_treg_has_agent_ind Y 1A has_agent_ind

ir_treg_ird_timestamp_date Y Datetime timestamp

Detailed information

_________________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_treg_location_nbr

Definition: Location number of the EMS filer (payroll system)

Format: Numeric, 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_treg_tax_type_code

Definition: Tax type

Format: Character, 3A

Name of classification: tax_types

_______________________________________

Variable name: ir_treg_applied_date

Definition: Date from which the record became active.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_treg_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_treg_treg_start_date

Definition: Date the client first registered for a particular tax type

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_treg_treg_end_date

Definition: Date the client deregistered for a particular tax type.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_treg_filing_frequency_code

Definition: Filing frequency, eg D = twice monthly, Q = quarterly, I = irregularly.

Format: Character, 2A

Name of classification: tax_filing_frequency

Notes:

_______________________________________

Variable name: ir_treg_treg_status_code

Definition: Registration status, ie active or ceased; 'X' = unknown.

Format: Character, 1A

Name of classification: tax_reg_status

Notes:

_______________________________________

Variable name: ir_treg_ceased_date

Definition: Date from which the record became invalid.

Format: Datetime, yyyymmdd

Name of classification:

Notes: Reasons for ceasing include a new name, death, etc.

_______________________________________

Variable name: ir_treg_posting_ind_code

Definition: Distinguishes the type of address eg P = postal, Q = liquidator, A = agent etc.

Format: Character, 1A

Name of classification: posting_indicators

Notes:

_______________________________________

Variable name: ir_treg_electronic_filing_ind

Definition: 'Y' if an electronic filer, 'N' if paper filer

Format: Character, 1A

Name of classification:

Notes:

_______________________________________

Variable name: ir_treg_corporate_filing_ind

Definition: Indicates whether the customer is part of a corporate filing group.

Format: Character, 1A

Name of classification:

Notes: 'N' = not part of a group, 'P' = parent, 'S' = subsidiary

_______________________________________

Variable name: ir_treg_has_agent_ind

Definition: Indicates whether a tax agent acts on behalf of the customer.

Format: Character, 1A

Name of classification:

Notes: 'Y' = yes, 'N' = no.
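The three filing indicators above each take a small set of single-character codes. A hedged Python sketch of decoding them together follows; the function name `describe_filing_flags` and the output keys are hypothetical, and raising an error for unlisted codes is a design assumption rather than documented behaviour.

```python
# Code meanings taken from the definitions and notes above.
CORPORATE_FILING = {"N": "not part of a group", "P": "parent", "S": "subsidiary"}
YES_NO = {"Y": True, "N": False}


def describe_filing_flags(electronic: str, corporate: str, has_agent: str) -> dict:
    """Decode the three 1A indicators; raise ValueError on an unlisted code."""
    try:
        return {
            "electronic_filer": YES_NO[electronic],
            "corporate_group_role": CORPORATE_FILING[corporate],
            "has_tax_agent": YES_NO[has_agent],
        }
    except KeyError as exc:
        raise ValueError(f"unlisted indicator code: {exc.args[0]!r}") from exc


print(describe_filing_flags("Y", "P", "N"))
```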

_______________________________________

Variable name: ir_treg_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_________________________________________

8 Data dictionary for ird_cross_reference

Dataset description

Contents of dataset: This table is maintained by IR and holds information about the set of relationships between two IRD numbers. Most of the information in this table is captured when the annual returns are processed.

Because most of the information comes from annual returns, it can be validated only when those returns are processed, and validation does not always occur.

Summary table

IDI variable name  Primary key  Mandatory  Format  Classification name  Source variable name

snz_uid Y Y N

ir_xrf_from_snz_ird_uid Y Y N ird_number_from

ir_xrf_to_snz_ird_uid Y Y N ird_number_to

ir_xrf_applied_date Y Y Datetime date_applied

ir_xrf_ceased_date N Datetime date_ceased

ir_xrf_reference_type_code Y Y 3A cross_reference_types reference_type

ir_xrf_first_year_nbr N 4N first_year

ir_xrf_latest_year_nbr Y Y 4N latest_year

ir_xrf_ird_timestamp_date Y Datetime timestamp

Detailed information

_________________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: ir_xrf_from_snz_ird_uid

Definition:

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: ir_xrf_to_snz_ird_uid

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_xrf_applied_date

Definition: Date from which record is valid.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_xrf_ceased_date

Definition: Date from which record is invalid.

Format: Datetime, yyyymmdd

Name of classification:

Notes:
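The applied and ceased dates above bound the period in which a cross-reference record is valid. A minimal Python sketch of that check, assuming the dates have already been parsed into `date` objects and that a missing ceased date means the record is still current (both are assumptions, as is the function name `is_valid_on`):

```python
from datetime import date
from typing import Optional


def is_valid_on(applied: date, ceased: Optional[date], on: date) -> bool:
    """True if a record is valid on the date `on`.

    The record is valid from `applied` and invalid from `ceased` onwards.
    Assumption: a missing ceased date (None) means the record is still valid.
    """
    if on < applied:
        return False
    return ceased is None or on < ceased


print(is_valid_on(date(2010, 4, 1), date(2012, 3, 31), date(2011, 6, 30)))
```

The ceased date is treated as exclusive because the definition describes it as the date from which the record is invalid.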

_______________________________________

Variable name: ir_xrf_reference_type_code

Definition:

AAC Amalgd/Amalging Co

ASS Associated Person

BAN Bankrupt

BEN Beneficiary

DEC Deceased

DEP Dependent

DIR Director

DUP Duplicate IRD No

EOH Exec Office Holder

GPR GENERAL PARTNER

IGN NOMINATED ICA CO

JVT Joint Venture

LPR LIMITED PARTNER

LQR LIQUIDATOR

LTI LOOK-THROUGH INT

LTO LOOK THROUGH OWNER

NOM Nominated Company

NOP NOMINEE

NOR NOMINATOR

NRC NON RES CHLD SUPPT

PTR Partner

SHR Shareholder

SPO Spouse/Defacto

SUB Subsidiary Company

TEE Trustee

TRA TRANSITIONAL CLIEN

VAD VOLUNTARY ADMINIST

Format: Character, 3A

Name of classification: cross_reference_types

Notes: eg shareholder/partner/bankrupt
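As an illustration of how the code list above might be used, the Python sketch below groups relationship rows by reference type. The labels are copied as printed, including the abbreviated ones; the function name `group_by_type`, the row layout, and the "unlisted code" fallback are hypothetical.

```python
from collections import defaultdict

# Labels copied from the code list above, abbreviations left as printed.
REFERENCE_TYPES = {
    "AAC": "Amalgd/Amalging Co", "ASS": "Associated Person", "BAN": "Bankrupt",
    "BEN": "Beneficiary", "DEC": "Deceased", "DEP": "Dependent",
    "DIR": "Director", "DUP": "Duplicate IRD No", "EOH": "Exec Office Holder",
    "GPR": "GENERAL PARTNER", "IGN": "NOMINATED ICA CO", "JVT": "Joint Venture",
    "LPR": "LIMITED PARTNER", "LQR": "LIQUIDATOR", "LTI": "LOOK-THROUGH INT",
    "LTO": "LOOK THROUGH OWNER", "NOM": "Nominated Company", "NOP": "NOMINEE",
    "NOR": "NOMINATOR", "NRC": "NON RES CHLD SUPPT", "PTR": "Partner",
    "SHR": "Shareholder", "SPO": "Spouse/Defacto", "SUB": "Subsidiary Company",
    "TEE": "Trustee", "TRA": "TRANSITIONAL CLIEN", "VAD": "VOLUNTARY ADMINIST",
}


def group_by_type(rows):
    """Group (from_uid, to_uid, type_code) rows by reference type label."""
    grouped = defaultdict(list)
    for from_uid, to_uid, code in rows:
        # Codes missing from the list are kept under a fallback label.
        label = REFERENCE_TYPES.get(code, "unlisted code")
        grouped[label].append((from_uid, to_uid))
    return dict(grouped)


# Made-up identifiers, for illustration only.
print(group_by_type([(1, 2, "SHR"), (3, 4, "DIR")]))
```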

_______________________________________

Variable name: ir_xrf_first_year_nbr

Definition: First year of the cross reference relationship.

Format: Numeric, 4N

Name of classification:

Notes: eg shareholder/partner/bankrupt

_______________________________________

Variable name: ir_xrf_latest_year_nbr

Definition: Latest year of the cross reference relationship.

Format: Numeric, 4N

Name of classification:

Notes: eg shareholder/partner/bankrupt

_______________________________________

Variable name: ir_xrf_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_______________________________________

9 Data dictionary for ird_rtns_keypoints_ir3

Dataset description

Contents of dataset: This table contains information for the active items which have non-zero partnership, self-employment, or shareholder salary income.

Summary table

IDI variable name  Primary key  Mandatory  Format  Classification name  Variable name

snz_uid Y Y N

snz_ird_uid Y Y N ird_number

ir_ir3_location_nbr Y Y 4N location_number

ir_ir3_return_period_date Y Datetime return_period_date

ir_ir3_snz_unique_nbr Y Y N

ir_ir3_tot_pship_income_amt N 13.2N total_partnership_income_808

ir_ir3_tot_sholder_salary_amt N 13.2N total_shareholder_salary_809

ir_ir3_net_profit_amt N 13.2N net_profit_702

ir_ir3_income_imp_ind Y 1A

ir_ir3_net_rents_826_amt N 13.2N net_rents_826

ir_ir3_tot_wholding_paymnts_amt N 13.2N tot_w_holding_payments_100514

ir_ir3_tot_expenses_claimed_amt N 13.2N total_expenses_claimed_1512

ir_ir3_gross_earnings_407_amt N 13.2N gross_earnings_407

ir_ir3_ird_timestamp_date N Datetime timestamp

Detailed information

_________________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir3_location_nbr

Definition: Location number of the EMS filer (payroll system).

Format: Numeric, 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir3_return_period_date

Definition: Period covered by return.

Format: Datetime, dd/mm/yy

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir3_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir3_tot_pship_income_amt

Definition: Partnership income.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir3_tot_sholder_salary_amt

Definition: Shareholder salary income.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir3_net_profit_amt

Definition: Self-employment income.

Format: Numeric, 13.2N

Name of classification:

Notes:
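The dataset description says this table covers items with non-zero partnership, self-employment, or shareholder salary income, which correspond to the three amounts defined above. A minimal Python sketch of that condition, assuming each row is a dict keyed by IDI variable name and that a missing amount counts as zero (the function name `has_nonzero_income` is hypothetical):

```python
from decimal import Decimal

INCOME_FIELDS = (
    "ir_ir3_tot_pship_income_amt",    # partnership income
    "ir_ir3_net_profit_amt",          # self-employment income
    "ir_ir3_tot_sholder_salary_amt",  # shareholder salary income
)


def has_nonzero_income(row: dict) -> bool:
    """True if any of the three income amounts is present and non-zero.

    Assumption: a missing or empty amount is treated as zero.
    """
    return any(Decimal(str(row.get(f) or 0)) != 0 for f in INCOME_FIELDS)


print(has_nonzero_income({"ir_ir3_net_profit_amt": "1500.00"}))
```

`Decimal` is used so that amounts with two decimal places compare exactly instead of going through binary floating point.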

________________________________________

Variable name: ir_ir3_income_imp_ind

Definition:

Format: 1A

Name of classification:

Notes:

________________________________________

Variable name: ir_ir3_net_rents_826_amt

Definition: Net rental income

Format: Numeric, 13.2N

Name of classification:

Notes:

________________________________________

Variable name: ir_ir3_tot_wholding_paymnts_amt

Definition: Total gross earnings (with withholding tax deducted at source).

Format: Numeric, 13.2N

Name of classification:

Notes:

________________________________________

Variable name: ir_ir3_tot_expenses_claimed_amt

Definition: Total expenses claimed.

Format: Numeric, 13.2N

Name of classification:

Notes:

_________________________________________

Variable name: ir_ir3_gross_earnings_407_amt

Definition: Gross earnings with PAYE deducted at source.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir3_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_______________________________________

10 Data dictionary for ird_attachments_ir20

Dataset description

Contents of dataset: This table contains information for active items which have non-zero partnership income.

Summary table

IDI variable name  Primary key  Mandatory  Format  Classification name  Variable name

snz_uid Y Y N

snz_ird_uid Y Y N ird_number

snz_employer_ird_uid Y N employer_ird_number

ir_ir20_location_nbr Y Y 4N location_number

ir_ir20_return_period_date Y Y Datetime return_period_date

ir_ir20_snz_unique_nbr Y N

ir_ir20_tot_share_of_inc_865_amt N 13.2N tot_share_of_inc_865_amt

ir_ir20_income_imp_ind Y 1A income_imp_ind

ir_ir20_ird_timestamp_date N Datetime timestamp

Detailed information

_______________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: snz_employer_ird_uid

Definition: A local unique identifier (for an employer) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir20_location_nbr

Definition: Location number of the payer.

Format: Numeric, 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir20_return_period_date

Definition: The return period.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_________________________________________

Variable name: ir_ir20_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: ir_ir20_tot_share_of_inc_865_amt

Definition: Value of partnership income.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir20_income_imp_ind

Definition:

Format: 1A

Name of classification:

Notes:

_________________________________________

Variable name: ir_ir20_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_______________________________________

11 Data dictionary for ird_attachments_ir4s

Dataset description

Contents of dataset: This table holds information about the active items which have non-zero shareholder income.

Summary table

IDI variable name  Primary key  Mandatory  Format  Classification name  Source variable name

snz_uid Y Y N

snz_ird_uid Y N ird_number

snz_employer_ird_uid Y Y N employer_ird_number

ir_ir4_location_nbr Y Y 4N location_number

ir_ir4_return_period_date Y Y Datetime return_period_date

ir_ir4_snz_unique_nbr Y Y N

ir_ir4_tot_sholder_sal_809_amt N 13.2N total_shareholder_salary_809

ir_ir4_income_imp_ind Y 1A income_imp_ind

ir_ir4_ird_timestamp_date Y Datetime timestamp

Detailed information

_______________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: snz_employer_ird_uid

Definition: A local unique identifier (for an employer) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir4_location_nbr

Definition: Location number of the payer.

Format: Numeric, 4N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir4_return_period_date

Definition: The return period.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_________________________________________

Variable name: ir_ir4_snz_unique_nbr

Definition:

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: ir_ir4_tot_sholder_sal_809_amt

Definition: Value of shareholder salary.

Format: Numeric, 13.2N

Name of classification:

Notes:

_______________________________________

Variable name: ir_ir4_income_imp_ind

Definition:

Format: 1A

Name of classification:

Notes:

_________________________________________

Variable name: ir_ir4_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_______________________________________

12 Data dictionary for ird_old_systems_numbers

Dataset description

Contents of dataset: This table contains the mapping of IRD numbers from the old system to the new system.

Summary table

IDI variable name  Primary key  Mandatory  Format  Classification name  Source variable name

snz_uid Y Y N

ir_osn_old_snz_ird_uid Y Y N old_system_number

snz_ird_uid Y N ird_number

ir_osn_location_nbr Y N location_number

ir_osn_applied_date Y Y Datetime date_applied

ir_osn_ceased_date N Datetime date_ceased

ir_osn_ird_timestamp_date Y Datetime timestamp

Detailed information

_________________________________________

Variable name: snz_uid

Definition: A global unique identifier created by Statistics NZ. There is a snz_uid for each distinct identity in the IDI. This identifier is changed and reassigned each refresh.

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: ir_osn_old_snz_ird_uid

Definition:

Format: N

Name of classification:

Notes:

_________________________________________

Variable name: snz_ird_uid

Definition: A local unique identifier (for an employee) derived by Statistics NZ from an IR unique identifier (IRD number). This identifier will remain the same for an identity across refreshes. Where we receive more information during a subsequent refresh that indicates that two or more identities represent the same identity, the identifier may change.

Format: N

Name of classification:

Notes:

_______________________________________

Variable name: ir_osn_location_nbr

Definition: Location number of the payer.

Format: Numeric

Name of classification:

Notes:

_________________________________________

Variable name: ir_osn_applied_date

Definition: Date from which record is valid.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_______________________________________

Variable name: ir_osn_ceased_date

Definition: Date from which record is invalid.

Format: Datetime, yyyymmdd

Name of classification:

Notes:

_________________________________________________

Variable name: ir_osn_ird_timestamp_date

Definition: Indicates when data was extracted into Inland Revenue’s data warehouse.

Format: Datetime, yyyymmdd

Name of classification:

_________________________________________________
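As a rough illustration of how an old-to-new mapping like this might be applied, the Python sketch below builds a lookup from pairs of (ir_osn_old_snz_ird_uid, snz_ird_uid). The function names and sample values are hypothetical, and the sketch ignores the applied and ceased dates, so it assumes each old identifier maps to a single current one.

```python
from typing import Dict, Iterable, Tuple


def build_lookup(rows: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each old-system identifier to its current snz_ird_uid."""
    return {old_uid: new_uid for old_uid, new_uid in rows}


def resolve(uid: int, lookup: Dict[int, int]) -> int:
    """Return the current identifier, or `uid` unchanged if it is not mapped."""
    return lookup.get(uid, uid)


# Made-up identifiers, for illustration only.
lookup = build_lookup([(101, 9001), (102, 9002)])
print(resolve(101, lookup))
print(resolve(555, lookup))
```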

13 Glossary

Term Definition

IDI name (Stats NZ) The variable names in the IDI SQL database.

Mandatory field (IR) Indicates a field which cannot be “null”.

Primary key (Stats NZ) An identifier for a unique database item (may consist of a single item or multiple items in combination).
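Because a primary key may combine several items, uniqueness has to be checked on the combination rather than on any single field. A minimal Python sketch under that reading; the function name `duplicate_keys` and the sample rows are hypothetical:

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur more than once in `rows`."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Made-up rows: each field repeats on its own, but the combination
# is only repeated for the first and third rows.
rows = [
    {"snz_uid": 1, "ir_osn_applied_date": "20100401"},
    {"snz_uid": 1, "ir_osn_applied_date": "20120401"},
    {"snz_uid": 1, "ir_osn_applied_date": "20100401"},
]
print(duplicate_keys(rows, ("snz_uid", "ir_osn_applied_date")))
```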