International Collaboration on Industrialization of Editing: Business Case (Part 1, WP38)
Li-Chun Zhang
Statistics Norway
Industrialization of Editing: Some issues to be dealt with
• Overall objective, principles and guidelines (e.g. the “new” paradigm of editing)
• Conceptual reference framework with regard to GSBPM
• Conceptual reference framework with regard to GSIM to-be
• Design of generic functionality
• Minimum set of standard methods
• IT tools and platforms
Objectives & principles
• Example: Objectives (the “new” paradigm)
– Error-source identification and error prevention
– Collect information about quality
– Identification and adjustment of critical errors in data
• Example: Objectives (SNZ proposal)
– Efficiency as quality against cost
– Continuous quality improvement
– Provide quality information
• Example: Principles
– Preserve original data as much as possible (“old” Fellegi-Holt paradigm)
– Maximum automated processing
– Analysis of (editing) process efficiency
– Training, documentation
– …
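The Fellegi-Holt principle listed above (change as little of the original data as possible) can be illustrated with a small error-localization sketch. The field names, domains, and edit rules below are hypothetical and are not taken from the presentation. The brute-force search is only for illustration; production systems typically use set-covering or optimization methods instead.

```python
from itertools import combinations, product

# Hypothetical categorical domains (assumptions for illustration only).
DOMAINS = {
    "age_group": ["child", "adult"],
    "marital_status": ["single", "married"],
    "employment": ["none", "employed"],
}

# Hypothetical edit rules: each returns True when the record is acceptable.
EDITS = [
    lambda r: not (r["age_group"] == "child" and r["marital_status"] == "married"),
    lambda r: not (r["age_group"] == "child" and r["employment"] == "employed"),
]


def passes_all(record):
    """True when the record satisfies every edit rule."""
    return all(edit(record) for edit in EDITS)


def localize_errors(record):
    """Return a smallest set of fields whose values can be changed so that
    the record passes all edits (brute force, smallest subsets first)."""
    fields = list(DOMAINS)
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if passes_all(candidate):
                    return set(subset)
    return None


record = {"age_group": "child", "marital_status": "married", "employment": "employed"}
fields_to_change = localize_errors(record)
print(fields_to_change)
```

In this toy record both edits fail, yet changing the single field `age_group` is enough to satisfy them, so the other two reported values can be kept.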
Generic Statistical Data Editing Process (GSDEP)
• GSBPM ≠ Flow Chart
• An example from EDIMBUS
• Mapping GSDEP with GSBPM
– Micro vs. macro editing
– “Editing & Imputation” (E&I) vs. “Editing & Estimation” (E&E)
• Connections to GSIM to-be
Common Statistical Data Reference (CSDR): Interface between SDE and GSIM to-be
• Statistical production as transformations of data => steady / major states of data
• Common Micro Data Format for database management
• Common Functional Data Format for method library
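The slide does not define either format. Purely as an assumption, the sketch below treats micro data as long rows of (unit, variable, value) and functional data as one rectangular record per unit, which is the shape most method libraries expect. All names and values are hypothetical.

```python
from collections import defaultdict

# Assumed micro-data layout: one row per (unit, variable, value). Hypothetical data.
micro_rows = [
    ("u1", "turnover", 120.0),
    ("u1", "employees", 4),
    ("u2", "turnover", 95.0),
]


def build_functional(rows, variables):
    """Pivot long rows into one record per unit; missing cells become None."""
    by_unit = defaultdict(dict)
    for unit, variable, value in rows:
        by_unit[unit][variable] = value
    return {
        unit: {v: values.get(v) for v in variables}
        for unit, values in sorted(by_unit.items())
    }


functional = build_functional(micro_rows, ["turnover", "employees"])
print(functional)
```

Keeping the transformation in one function means the same micro data can feed several functional views, each with its own variable list.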
Design of generic functionality
• Databases
– Micro database of CMDF data files (M-Base)
– Functional database of functional data files and alignment tables (F-Base)
– Function library (F-Lib) contains all available standardized generic (program) tools.
• Builders
– Functional data builder (D-Build) transforms relevant CMDF data files into the required functional data files, and updates the relevant alignment tables.
– Function builder (F-Build) takes functional data files as the input data and tools from the F-Lib, and configures the necessary parameters according to a given specification for machine-based or automated data processing.
– Screen builder (S-Build) takes functional and/or CMDF data files as the input data, and configures an environment for manual inspection/editing of individual records/questionnaires according to a given specification.
• Runners
– Batch processor is the environment for executing automated/machine-based SDE processes, chiefly relying on functions that are configured in the F-Build.
– Manual processor is the environment for manually executing SDE processes, chiefly relying on the interface provided through the S-Build.
– Selection and Drilling are the dedicated environments for carrying out selective editing and drilling up and down among hierarchically structured aggregations.
– Data processor supports the necessary administration of data and metadata.
• Managers
– ANOPE is the environment for quality assessment of the editing processes.
– Response manager provides the interface for re-contact with the data providers, and other generally related production processes (such as Process 4, Collect).
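One way to picture how the function library, the function builder, and the batch processor fit together is sketched below. This is a minimal illustration under stated assumptions, not the project's design: the tool names (`range_check`, `ratio_check`), the specification layout, and the record structure are all hypothetical.

```python
from functools import partial


# Hypothetical F-Lib tools: each returns True when the record should be flagged.
def range_check(record, field, low, high):
    """Flag the record when the field is missing or outside [low, high]."""
    value = record.get(field)
    return value is None or not (low <= value <= high)


def ratio_check(record, numerator, denominator, max_ratio):
    """Flag the record when the ratio cannot be computed or exceeds max_ratio."""
    num, den = record.get(numerator), record.get(denominator)
    if num is None or not den:
        return True
    return num / den > max_ratio


F_LIB = {"range_check": range_check, "ratio_check": ratio_check}


def f_build(specification):
    """Bind F-Lib tools to the parameters given in a specification."""
    return [partial(F_LIB[step["tool"]], **step["params"]) for step in specification]


def batch_process(records, functions):
    """Run every configured function on every record and return, per unit,
    the indices of the functions that flagged it."""
    return {
        unit: [i for i, fn in enumerate(functions) if fn(rec)]
        for unit, rec in records.items()
    }


spec = [
    {"tool": "range_check", "params": {"field": "employees", "low": 0, "high": 500}},
    {"tool": "ratio_check",
     "params": {"numerator": "turnover", "denominator": "employees", "max_ratio": 100.0}},
]
records = {
    "u1": {"turnover": 120.0, "employees": 4},
    "u2": {"turnover": 9500.0, "employees": 2},
    "u3": {"turnover": 50.0, "employees": None},
}
flags = batch_process(records, f_build(spec))
print(flags)
```

Separating the generic tools from their parameters means a new survey only needs a new specification, which is the kind of reuse the builder/runner split appears to aim for.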
Claude Poirier
Statistics Canada
Next steps
• Objectives, guidelines and principles
– Finalize user requirements
– Identify existing methods
– React to functional gaps
– Set up the framework
– Develop the toolset
– Deliver training
Finalizing user requirements
• Prioritizing edit and imputation requirements
– Micro-editing methods: Automated E&I on numerical and categorical data
– Macro-editing methods: Selective editing; Macro editing; Editing of macro data
– On-line editing: Collection edits and self-administered edits
– Data confrontation and certification: Methods using multiple data sources
– Standardized platform: Common architecture
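Selective editing, listed among the requirements above, is often based on a score that estimates how much a suspicious value could affect a published total. The sketch below uses one common form of local score, the design weight times the absolute difference between reported and anticipated values, divided by the estimated total. The data, threshold, and field names are hypothetical.

```python
def local_score(weight, reported, anticipated, estimated_total):
    """Potential impact of one unit on the estimated total."""
    return weight * abs(reported - anticipated) / estimated_total


def select_for_review(units, estimated_total, threshold):
    """Return (unit id, score) pairs above the threshold, highest score first."""
    scored = [
        (u["id"], local_score(u["weight"], u["reported"], u["anticipated"], estimated_total))
        for u in units
    ]
    flagged = [(uid, score) for uid, score in scored if score > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical units; "anticipated" could come from a previous period or a model.
units = [
    {"id": "u1", "weight": 10.0, "reported": 120.0, "anticipated": 115.0},
    {"id": "u2", "weight": 2.0, "reported": 9500.0, "anticipated": 950.0},
    {"id": "u3", "weight": 50.0, "reported": 40.0, "anticipated": 42.0},
]
review_queue = select_for_review(units, estimated_total=10000.0, threshold=0.05)
print(review_queue)
```

Only the unit whose deviation matters for the total is routed to manual review; the others can be left to automated processing.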
Existing tools and Platforms
• Identifying and analysing existing products
– SigEE (Australia)
– BANFF, CANCEIS (Canada)
– BEST, POSS (New Zealand)
– ISEE, DYNAREV (Norway)
– TRITON, SELEKT (Sweden)
Reacting to functional gaps
• Not all requirements will be satisfied
• Brainstorming sessions are being organised
• Development priorities will be discussed
Developing the tool set
• Consolidate preferred tools
– Adapt existing tools to the environment
– Develop pre/post processors to fit the environment
• Develop missing functions
Delivering training material
• User guide
• Methodology documentation
• System documentation
Comments / Questions
• It’s your turn
Frequently asked questions (FAQ)
Q1: What governance model drives the project?
Q2: When do we expect the suite of editing functions to be delivered?
Q3: As a member of the collaboration network, will my agency have to pay any fees for accessing and using released functions?
Q4: My statistical agency is not part of the network. Are any fees planned for using the products?
Q5: My agency would like to join the network. Is this possible? How?
Q6: I understand from your presentation that a common environment is being planned. Would I be able to use the functions in another environment?
Q7: My agency is willing to share a system but its foundation software is not compliant with the proposed environment. What will happen?
Q8: My agency is willing to offer a system or a module for the network. Who will own the module?
Q9: Will the resulting products become open-source?