
Project funded under the Fifth Framework Programme (FP5) on the Transfer of Technology and Know-how (TTK)

within the European Statistical System

AMRADS / TTK 2 CONFERENCE

TRANSFER OF TECHNOLOGY AND KNOW-HOW

24 – 27 NOVEMBER 2003

Organised By

INFORMER S.A.

In Cooperation with

ISTAT

The venue is:

ISTAT

VIA CESARE BALBO, 16

00184 ROMA – ITALY

Room “AULA MAGNA”


ADC, EDR, TTK: VIEWS AND EXPERIENCES OF THE E-QUEST TEAM

GÜNTHER ZETTL, FREDERICK RENNERT AND WOLFGANG KOLLER

By

Guenther Zettl

Statistics Austria, Hintere Zollamtsstr. 2b, A-1035

Abstract: e-Quest, Statistics Austria’s comprehensive, metadata-driven system for statistical raw data collection by means of self interviews with electronic questionnaires, has raised plenty of interest within and outside of the European Statistical System. Some experiences, views and opinions on matters of collecting statistical raw data are presented in this paper by the persons responsible for the architecture and development of e-Quest. A special focus is put on issues of transfer of technology and know-how in the area of electronic data reporting.

Keywords: e-Quest, automated data collection, electronic data reporting, electronic questionnaire, transfer of technology and know-how.

Introduction

Two and a half years ago, at the ETK-NTTS¹ conference in Crete, we presented the software “e-Quest Questionnaire Manager”, which had been deployed to Austrian enterprises for the first time just a few weeks earlier in order to assist them in answering, managing and transmitting statistical questionnaires. This program is the core part of Statistics Austria’s comprehensive, metadata-driven system for statistical raw data collection by means of self interviews with electronic questionnaires (CASI²). Other important components are the “e-Quest Metadata Manager” (which provides a graphical user interface for the definition of metadata describing questionnaires and objects such as surveys, survey versions, types of observation units and so on) and the “Receive” subsystem, including the “e-Quest Package Manager”, for the handling of incoming response data.³

Preparation and development of e-Quest were launched in summer 1998. After version 1 of all programs was finished at the end of 2001, work continued on enhancing the software, e.g. by increasing performance, correcting bugs, adapting it for Windows XP and adding new functions. One example is a plug-in component capable of displaying large multilingual code lists, which can be either flat or hierarchically structured (an XML⁴ schema was elaborated as the standard format for the provision of such classifications).

1 ETK: Exchange of Technology and Know-how. NTTS: New Techniques and Technologies for Statistics.
2 CASI: Computer Assisted/Aided/Administered Self Interviewing
3 A detailed description of the e-Quest system can be found in [1].
4 XML: Extensible Markup Language


Currently the “e-Quest/Web” project deals with an important extension of e-Quest with regard to web forms. After completion of this project it will be possible to re-use metadata created with the “e-Quest Metadata Manager” to generate HTML⁵ and JSP⁶ files, JavaScript code, Java classes (e.g. for server-side validation and data storage), web services and SQL⁷ statements, thus facilitating and speeding up the development of web questionnaires, which will be integrated into a “questionnaire portal”.

The e-Quest system has raised plenty of interest within and outside of the ESS⁸. In a discussion at an EDR⁹ workshop it was described as “the most flexible and complete solution currently available in the area of primary EDI”, and it was also called a “strong candidate for best practice imprimatur” [2]. We are convinced that this assessment will be even more accurate in the near future, when “e-Quest/Web” goes into production.

Based on a total of 15 years of work in the field of EDR, in this paper we, i.e. the persons responsible for the architecture and development of e-Quest, present some experiences, views and opinions on matters of collecting statistical raw data. A special focus is put on issues of transferring technology and know-how (TTK).

The growing importance of ADC

The application of information and communication technology (ICT) to tasks carried out during the collection phase of the statistical production process is by no means a new phenomenon. In many NSIs¹⁰, tools like CAPI/CATI¹¹ software or OMR/OCR¹² programs have been in successful use, in some cases for decades.

In recent years, however, the interest in automated data collection (ADC; also called automated data capture) has been growing rapidly, as witnessed by an increasing number of conferences, R&D projects and software development endeavours dealing exclusively or in part with this subject. Among the reasons for this are technological innovations, the need to alleviate the respondents’ burden, user expectations and pressure to rationalise. Let us look at these topics in more detail:

Technological innovations

In connection with this item two keywords have to be given pride of place: computerisation and global networks. The former has created the foundation for most ADC methods, as it has led to an increased availability of structured electronic data, not only in enterprises but also for other types of respondents such as municipalities, schools, etc.¹³ These data are either the answers to the questions asked by statistical institutes, or they form the raw material which can be transformed into statistical raw data by means of calculations and re-classification.

The global network of the Internet provides an infrastructure for the fast transfer of data between providers and collectors of statistical information (PSIs and CSIs), e.g. by using e-mail or FTP¹⁴. Furthermore, the World Wide Web and new technologies such as web services have offered new options for the development of EDR tools.

In the mid-1990s Statistics Austria provided modems to enterprises free of charge in order to promote the usage of the IDEP/CN8 software for Intrastat declarations. Such a step would no longer be necessary, as nowadays most PCs sold (even in supermarkets such as Aldi or Lidl) already have built-in modems and network cards.

5 HTML: Hypertext Markup Language
6 JSP: Java Server Pages
7 SQL: Structured Query Language
8 ESS: European Statistical System
9 EDR: Electronic Data Reporting
10 NSI: National Statistical Institute
11 CAPI: Computer Assisted Personal Interviewing. CATI: Computer Assisted Telephone Interviewing
12 OMR: Optical Mark Recognition. OCR: Optical Character Recognition.
13 Of course there are differences in the rate of computerisation between various countries, in particular if we look across the borders of the EU. This has to be taken into account when TTK is considered. A tool like e-Quest certainly would not be the right choice for NSIs in countries where computerisation is less advanced.
14 FTP: File Transfer Protocol


Need for alleviation of the respondents’ burden

A very simple model would describe a statistical institute as a data processing and information producing system with two interfaces:

a) At the input interface, raw data are entered into the “NSI black box” by the data suppliers (respondents, existing data registers).

b) At the output interface, statistical results (object data and metadata in different forms of presentation) are passed on to “data users”.

Regarding the output end of this system, external persons usually play an active role, looking for and (hopefully) finding information from which they can gain some benefit. On the input side, however, the providers of statistical raw data often are not acting voluntarily, and in many cases they do not understand the usefulness of the work they are forced to carry out. Therefore in most countries the NSIs have experienced massive pressure to lower the burden placed upon respondents, in particular those of the business sector (because enterprises are often obliged to answer several voluminous questionnaires).

In Austria the tangible reduction of the load on citizens and enterprises was one of the central political aims of the Federal Statistics Act 2000. This law, which forms the legal basis of the work conducted by Statistics Austria, includes the following regulations:

o Before any surveys are conducted, it has to be investigated whether existing administrative data can be used instead (§ 6).

o Whenever possible, surveys should be conducted on the basis of voluntary cooperation (§ 6).

o Sample surveys must be preferred to censuses (§ 7).

o If surveys are conducted regularly, rotation of the respondents should be undertaken (§ 7).

o The design of questionnaires must take into consideration the specific characteristics of the respondents, such as branch of industry, size of the enterprise, etc. (§ 14).

o Statistics Austria must make provisions for electronic data reporting (§ 28).

o On demand, the relevant supporting material for electronic responses (mostly software for the preparation, control and transmission of the necessary information) must be placed at the disposal of the respondents free of charge, as long as this is useful as well as technically justifiable (§ 28).

§ 28 of the Federal Statistics Act 2000 makes clear that – when the requested data are not available in existing sources and therefore primary data collection is necessary – Austrian legislation sees EDR as an effective way to alleviate the respondents’ burden.

User expectations

Within only a few years, the World Wide Web has evolved into a powerful means of communication. For an ever growing number of people the web is the first place where they look for information of all kinds. Taking into account that the main task of a statistical institute is the production of information, it is clear that an NSI must not ignore this trend, or else it would disappoint the expectations of its customers.

Most NSIs have quickly adapted to the new dissemination channels. Statistics Austria’s web site, for example, has been growing continuously since its launch in 1997. The statistical database ISIS with thousands of multi-dimensional data cubes can now be queried in a user-friendly way with client software which can be downloaded as a Java applet and executed within a web browser. Another important project in this context is the development of a “publication object database” which is intended to store all documents and to make them available on the intranet and the Internet (the latter only if a file is released for an external audience, of course). Another goal of this project is the integration of workflows covering the processes of writing and approving different types of publications (e.g. press releases). Content management software developed by Stellent Inc. has been acquired and forms the cornerstone of this solution.

What has been said above about the dissemination of statistical information also applies to the collection of raw data. More and more, the respondents of any survey will take for granted that electronic means of data reporting are offered to them in addition to the usual paper questionnaires. An NSI will have to accommodate these expectations if it does not want to risk serious damage to its image.


Pressure to rationalise

In most countries statistical institutes are facing the same problem: although manpower resources and funding are being frozen or cut back, the tasks they have to perform are increasing in complexity and scope, and the demand for timely statistical information of high quality is greater than ever. Dealing with this problem and aiming for the (partly) conflicting goals of producing more and better statistics with less money sometimes seems like trying to square the circle. Among other measures, it will probably be necessary to adapt the organisation of the statistical production processes and to intensify the efforts to build an integrated ICT infrastructure supporting these processes.

On the data collection side the utilisation of ADC can help to save time and money¹⁵ and to increase the quality of the collected raw data (e.g. through validation checks immediately after the answers are entered into an electronic questionnaire and by avoiding potential error sources like the keying in of paper questionnaires), thus being of benefit not only to the respondents, but to the statistical institute as well.
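
The idea of validating answers immediately after entry can be illustrated with a minimal Python sketch. It is not taken from e-Quest: the field names (`employees`, `turnover`, `exports`), the `Rule` class and the `validate` function are hypothetical and merely show how rules can be checked on the respondent's side before any data reach the statistical institute.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A single validation rule for one questionnaire field (hypothetical)."""
    field: str
    check: Callable[[Dict[str, float]], bool]
    message: str


# Illustrative rules for an imaginary business survey.
RULES: List[Rule] = [
    Rule("employees", lambda a: a["employees"] >= 0,
         "Number of employees must not be negative."),
    Rule("exports", lambda a: a["exports"] <= a["turnover"],
         "Exports cannot exceed total turnover."),
]


def validate(answers: Dict[str, float]) -> List[str]:
    """Return the messages of all rules violated by the given answers."""
    return [rule.message for rule in RULES if not rule.check(answers)]


if __name__ == "__main__":
    # A respondent enters implausible values; both rules fire at once.
    for message in validate({"employees": -1, "turnover": 100.0, "exports": 200.0}):
        print(message)
```

Because the rules are plain data attached to fields, such checks could in principle be shipped together with the questionnaire definition rather than being hard-coded into the program.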

Data collection forms an essential part of the assignment of any NSI. Therefore modernisation and streamlining of the steps pertaining to this task are a challenge for all of them. As many statistical institutes are working on similar projects (e.g. the development of web questionnaire applications), the transfer of know-how – in the simplest case by learning from the successes and failures of others – and perhaps of complete tools can accelerate the creation of the respective infrastructures and lower the associated costs.

Definitions

In the previous chapters the terms “electronic data reporting” (EDR) and “automated data collection (respectively capture)” (ADC) were mentioned several times, perhaps giving the impression that they are synonyms. Before we continue, let us define these terms in order to avoid misunderstandings.

In [3] Uwe Kunzler described EDR, in the strictest sense, as a method of primary (or direct) data collection which is based on electronic questionnaires for self interviewing (CSAQ¹⁶). In a wider sense, EDR would also include methods like TDE¹⁷, CAPI and CATI.

Following this definition, we regard EDR as a specialisation or subset of the more general term ADC. In this understanding ADC comprises all methods of enhancing and automating the process of statistical raw data collection by means of ICT. Some of the most important techniques in this regard are:

Scanning of paper questionnaires

The scanned images allow quick retrieval of questionnaires whenever a statistician has to look at the original data filled in by the respondent. More advanced solutions include the use of OMR and OCR to avoid the manual keying in of paper questionnaires, and of programs for automatic and/or computer assisted coding of textual answers.

As paper forms are likely to continue to play an important role in primary data collection, OCR is a software category worth investing in for many NSIs.

CAPI and CATI

In the field of personal and telephone interviews, several software products such as Blaise and Interview Technology are available. These tools usually present one question at a time to the interviewer, who keys in the answer and is then routed to the next question. Advanced features like telephone number administration, automatic dialling, etc. are often included as standard features or can be added by supplementary program modules. Some packages can also be used for self interviews over the web, again in a one-question-per-page mode, or allow the presentation of multimedia files (a functionality which is more important for market research institutes than for NSIs).

CASI

Computer assisted self interviewing as an alternative to the traditional paper questionnaires can be implemented in several variants:

15 In the case of electronic questionnaires on two conditions: a reasonable percentage of respondents can be convinced to use them, and an economical way of developing new questionnaires must be found.
16 CSAQ: Computerised Self Administered Questionnaire
17 TDE: Touch-tone (or Telephone) Data Entry


o E-mail questionnaires: The respondent answers the e-mail containing the questions and writes his data directly into the reply mail.

o Questionnaires in the form of Word, Excel, PDF etc. files: The documents (which are sent to the respondents by e-mail or downloaded by them from a web address) are either printed and sent back on paper or filled in on the PC and returned electronically, e.g. as an e-mail attachment. Implementing such a solution can be very simple, but also quite complex (specifically, Excel in connection with VBA¹⁸ macros can be used as a development platform for sophisticated applications).

o CSAQ software which has to be installed on the PCs of the respondents: These programs – which are tailor-made for a specific survey – are used offline and finally transmit the response data to a server of the data collector. A great variety of features is conceivable, depending on the requirements of the respective survey (for example: Should the application also be useable by third party declarants who fill in questionnaires on behalf of their clients? Should it be possible to search for specific codes in huge classifications? Should data import be supported?). A well-known example of this type of CASI software is IDEP/CN8.

o Multi-questionnaire CSAQ systems: In contrast to the above mentioned CSAQ programs, multi-questionnaire systems such as Statistics Austria’s e-Quest are designed to be survey-independent, thus avoiding some of the disadvantages of the former category. New questionnaires can be created in a cost-efficient way by defining them in the form of metadata. The respondents have to install the questionnaire management application only once and can then add new questionnaires simply by loading the metadata into the system.

o Web forms: These questionnaires are presented as HTML pages in a web browser, thus making it unnecessary to install software on the respondents’ PCs. The answers are filled in online and are transmitted to the web server of the CSI. Similar to other CASI solutions, the effort necessary for the development of web questionnaires can vary widely, depending on the required features and security measures.

o Crosses between different CASI variants: In some cases it is not possible to assign a certain software to only one of the above mentioned categories. The Questionnaire Presentation Tool of the IQML¹⁹ project, for instance, mainly uses web technology, but a few ActiveX components must still be installed. Another example is e-Quest: after finishing the current “e-Quest/Web” project, it will not only be a multi-questionnaire CSAQ system that requires installation on the PCs of the respondents, but will also allow the creation of web forms accessible by a standard browser at a “questionnaire portal”.
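
The principle behind multi-questionnaire CSAQ systems, namely that a questionnaire is a piece of metadata loaded into a survey-independent program, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `QuestionnaireManager` class, the JSON layout and the survey identifier are invented for this example and do not reflect the actual e-Quest metadata model.

```python
import json
from typing import Dict, List

# Hypothetical questionnaire definition; JSON is used here only for brevity.
SAMPLE_METADATA = """
{
  "id": "prices-2003",
  "questions": [
    {"name": "item", "text": "Which item was priced?"},
    {"name": "price", "text": "What was the price in EUR?"}
  ]
}
"""


class QuestionnaireManager:
    """Survey-independent client: questionnaires are data, not program code."""

    def __init__(self) -> None:
        self._questionnaires: Dict[str, dict] = {}

    def load_metadata(self, metadata_json: str) -> str:
        """Register a questionnaire described by a metadata document."""
        meta = json.loads(metadata_json)
        self._questionnaires[meta["id"]] = meta
        return meta["id"]

    def questions(self, questionnaire_id: str) -> List[str]:
        """Return the question texts of a previously loaded questionnaire."""
        meta = self._questionnaires[questionnaire_id]
        return [question["text"] for question in meta["questions"]]


if __name__ == "__main__":
    manager = QuestionnaireManager()
    qid = manager.load_metadata(SAMPLE_METADATA)
    for text in manager.questions(qid):
        print(text)
```

Adding a further survey then means calling `load_metadata` with another document; the installed program itself stays unchanged.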

Specification of standardised data formats

When an NSI defines and publishes a certain format for the response data of a survey, the data providers are able to write programs which produce corresponding files, and business software companies can integrate a respective export function into their products. An example of this ADC technique is the specification of the CUSDEC/INSTAT Edifact message for Intrastat declarations.

18 VBA: Visual Basic for Applications
19 IQML: Intelligent Questionnaire Markup Language – a 5th framework R&D project


Automatic extraction of business data

What is regarded as “raw” data by an NSI may in fact be the result of expensive and time-consuming processing steps carried out by a data provider. Therefore the automatic extraction of data from the EDP systems of the respondents offers great promise with regard to the goal of lowering the burden. Within the TELER²⁰ and IQML projects, such software has been developed and tested. Statistics Austria is cooperating in the Eurostat project STIPES²¹, which aims to implement a flexible and modular program for raw data extraction and conversion. It is planned to couple the “STIPES Transformation Program” (its working title) with the “e-Quest Questionnaire Manager”.
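
What such an extraction and conversion step might look like in its simplest form is sketched below in Python. The sketch is not based on the TELER, IQML or STIPES software; the account codes, the category names and the function `extract_raw_data` are assumptions made purely for illustration. It re-classifies ledger entries from a company's bookkeeping into the categories a questionnaire asks for and sums them up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from a company's internal account codes to the
# categories requested in a statistical questionnaire.
ACCOUNT_TO_CATEGORY: Dict[str, str] = {
    "4000": "turnover_domestic",
    "4100": "turnover_export",
    "6000": "personnel_costs",
    "6100": "personnel_costs",
}


def extract_raw_data(ledger: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate (account, amount) entries into questionnaire categories.

    Entries on accounts without a mapping are skipped, because they are
    not requested by the (imaginary) survey.
    """
    totals: Dict[str, float] = defaultdict(float)
    for account, amount in ledger:
        category = ACCOUNT_TO_CATEGORY.get(account)
        if category is not None:
            totals[category] += amount
    return dict(totals)


if __name__ == "__main__":
    ledger = [("4000", 100.0), ("4100", 50.0), ("6000", 30.0),
              ("6100", 20.0), ("9999", 5.0)]
    print(extract_raw_data(ledger))
```

In practice the mapping table is the expensive part: it has to be maintained per data provider or per business software product, which is why a flexible, modular design matters for this kind of tool.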

Other methods

Among other ADC techniques are the following:

o Acceptance of raw data in a format specific to the respective data provider (as the NSI has to develop programs to convert these files into a format it can process, this procedure is only economical in the case of particularly important PSIs).

o Provision of software for encrypted electronic transfer of any file from the respondent to the data collector, as implemented by Statistics Canada in its “Data Return Facility”.

o Automatic retrieval of information available in the World Wide Web, for example prices from web shops.

o Use of laptop and handheld computers for data entry, for example for the registration of prices in shops.

o Touch-tone Data Entry.

o Interactive Voice Recognition.

o etc.

All these methods for automated data collection have their own specific advantages and disadvantages. When a decision has to be made which of them should be applied to a certain survey, several criteria have to be taken into account (to name a few: type, complexity, scope and periodicity of the survey; availability of data in electronic form; technical equipment – hardware and software – of the data providers; expectations of the respondents and their willingness to accept and use a certain technique; resources and know-how available at the statistical institute). Without consideration of the costs, a multi-mode approach – combining paper questionnaires with one or more ADC methods – will probably often turn out to be the optimal solution.

e-Quest and TTK

One of the first questions that must be answered by an NSI starting to implement ADC for one or more surveys is: “To buy or to make?”

For some techniques of automated data collection the answer is fairly simple. If there exist practically proven tools on the market which implement a large percentage of the needed features (including all “must” criteria) and for which sufficient support by the manufacturer is guaranteed, as it probably is the case in the areas of scanning, OCR and CAPI/CATI, then the most economical solution would certainly be the purchase of one of these products. Still, the selection of the best-fitting tool – not only in terms of functionality, but also with regard to hardware and software requirements and the expected costs of its integration into existing statistical processing systems – and the acquisition of know-how for using it may prove to be quite demanding.

20 TELER: Telematics for Enterprise Reporting – a 4th framework R&D project
21 STIPES: Statistical Inquiries from Popular European Software


In the field of CASI, however, the situation is not so straightforward. Electronic questionnaire software which has been tailor-made for a certain survey cannot be re-used for other surveys or in other NSIs (except when a survey, like Intrastat, is identical or very similar in different countries). When we started developing an integrated EDR solution for the Austrian short-term, structural business and production input statistics in summer 1998, it quickly became clear that buying was not an available option. Meanwhile the circumstances have changed insofar as some survey-independent CASI solutions have been developed (often by a statistical institute or as an outcome of an EU R&D project) and used successfully, so a transfer of technology (in the sense of selling/buying a tool) would now be feasible. Of course, the prospective buyer cannot avoid carrying out a thorough requirements analysis and, based on its results, a detailed evaluation of the obtainable products.

Let us have a look at some important evaluation criteria (and associated difficulties regarding the technology transfer between NSIs), illustrated by the example of e-Quest:

Is the software adequate for the intended usage scenario?

If

o an NSI only plans to implement EDR for a handful of surveys and does not expect any exceptional demand for electronic questionnaires,

o the questionnaire forms are quite simple,

o the NSI does not want a solution which requires deployment of the software on CD-ROM and installation on the PCs of the respondents,

o features which differentiate the “e-Quest Questionnaire Manager” from other tools are not required (for example: very high security of the response data because the answers are stored locally in a fully-featured relational database server; plug-in modules for displaying extensive classifications; very flexible specification of event handling within questionnaires by writing short VBS²² subroutines which are embedded into the metadata describing the questionnaire form; the possibility of implementing very complex validation rules in the form of VBS code; an auto-import function for easily automating the provision of answers which do not change often in a periodical survey; an export/import interface in XML format; etc.),

then it is likely that e-Quest will not be the optimal product. To be precise: in such a scenario the classical e-Quest would be overkill, as it was developed in particular 1) to suit the requirements of very complex surveys and 2) to build a standardised infrastructure for electronic data collection that can be re-used and adapted for many surveys.

The picture changes, however, if e-Quest/Web (currently under development) is taken into consideration. Part of the results of this project will be a portal where a PSI authenticates with user ID and password and selects a questionnaire, which can be filled in online (in one or more sessions, because saving partially completed forms will be supported) with a standard web browser, without the need to activate JavaScript or cookies. These web forms will be created by statisticians²³ with the “e-Quest Metadata Manager”. Based on the metadata defined with this user-friendly software, all components of the web application (as already listed in a previous chapter: HTML and JSP files, optionally JavaScript code, Java classes, web services and SQL statements) will be generated.
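
How several components of a web questionnaire can be derived from one metadata definition is sketched below. The sketch is written in Python for brevity, whereas e-Quest/Web itself targets JSP and Java; the metadata layout, the type mapping `SQL_TYPES`, the form URL and both generator functions are hypothetical and only demonstrate the general principle of generating an HTML form and a matching SQL statement from the same source.

```python
from html import escape
from typing import Dict

# Hypothetical questionnaire metadata; the real e-Quest model is not shown here.
METADATA: Dict = {
    "id": "sbs2003",
    "fields": [
        {"name": "employees", "label": "Number of employees", "type": "integer"},
        {"name": "turnover", "label": "Turnover (EUR)", "type": "decimal"},
    ],
}

# Assumed mapping from metadata types to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "decimal": "DECIMAL(15,2)", "text": "VARCHAR(255)"}


def generate_html(meta: Dict) -> str:
    """Render a plain HTML form that works without JavaScript or cookies."""
    rows = []
    for field in meta["fields"]:
        label = escape(field["label"])
        name = escape(field["name"])
        rows.append(f'<label>{label} <input name="{name}"></label><br>')
    form_id = escape(meta["id"])
    body = "\n".join(rows)
    return (f'<form method="post" action="/submit/{form_id}">\n'
            f'{body}\n<input type="submit">\n</form>')


def generate_sql(meta: Dict) -> str:
    """Create the table definition for storing responses.

    Identifiers are taken from the metadata unchecked; this assumes the
    metadata come from a trusted source such as an in-house editor.
    """
    columns = ",\n".join(
        f"  {field['name']} {SQL_TYPES[field['type']]}" for field in meta["fields"]
    )
    return f"CREATE TABLE {meta['id']} (\n{columns}\n);"


if __name__ == "__main__":
    print(generate_html(METADATA))
    print(generate_sql(METADATA))
```

The point of the design is that form and storage cannot drift apart: a field added to the metadata appears in both generated artefacts at the next generation run.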

22 VBS: Visual Basic Script
23 as long as there is no need for advanced functions, which will require some work by a Java programmer


Is further development and maintenance of the software guaranteed?

Although the classical e-Quest was created in cooperation with CSC Austria, the software, including its source code, is entirely the property of Statistics Austria. After the project was finished at the end of 2001, maintenance and development have continued (partly by Statistics Austria, partly by CSC); the current version of the “e-Quest Questionnaire Manager”, for example, is 1.9. As can be seen from the “e-Quest/Web” project²⁴, Statistics Austria is committed to continuing and further improving the system.

Are training and technical assistance available?

Whether sufficient support will be provided is a crucial criterion whenever the transfer of technology from one statistical institute to another is considered, especially when the tool in question is complex. A lot of programs are produced by NSIs, but mostly for their local needs. Selling and supporting software do not belong to their core business.

This also holds true in the case of e-Quest. In the first place e-Quest was developed to fulfil our own requirements, but because of its generic nature it might be employed by other CSIs as well. As Statistics Austria has neither the necessary know-how nor the resources for selling and supporting software, it was decided to cooperate with CSC Austria in this respect. If another statistical institute is interested in using e-Quest, CSC will be its contract partner and will also offer additional services, for example training or writing software components for the integration with legacy systems. We think this is a good solution for any prospective buyer, as CSC is an acclaimed ICT company active in several countries.

Regarding e-Quest/Web, a similar arrangement with Software AG and CFC will probably be considered.

What steps of the data collection process are supported?

e-Quest is much more than just an electronic CSAQ program. It is a generic, metadata-driven system for statistical raw data collection by means of self-interviews with electronic questionnaires. It provides a standardised, integrated and survey-independent infrastructure for the development (with the “e-Quest Metadata Manager”) and distribution of function-rich electronic questionnaires and for the management and initial processing of the incoming response data (the so-called “Receive” subsystem, consisting of several programs). The “e-Quest Questionnaire Manager” is used not only by the respondents, but also by statisticians for viewing and, if necessary, correcting the data.

Starting early in 2004, the e-Quest system will also comprise a web portal and will allow, to a great extent, the automatic generation of all components of web questionnaires.

The administration of respondents and their associated observation units is not part of e-Quest. This information must be provided by external programs in a standardised XML format.

Can the software be used out-of-the-box?

In theory it is possible to install the e-Quest programs, create database tables, define some configuration parameters and immediately start to define simple questionnaires, deploy them (and the “e-Quest Questionnaire Manager”) to respondents and process the incoming files with the “Receive” subsystem. In real-life scenarios, however, some effort will be required to integrate the system into the existing environment and to learn to use it (but those tasks most likely have to be carried out for any generic CASI solution). At the very least, a plug-in module must be written which transforms the response data from XML into another format for follow-up processing or which loads them into survey-specific database tables. For surveys in which respondents and observation units are pre-defined by the CSI, the respective respondent-specific metadata in XML format have to be prepared too.
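
A transformation module of the kind described above could, in its simplest form, look like the following Python sketch. It is not an e-Quest plug-in and does not use the e-Quest plug-in interface: the XML layout of `SAMPLE_RESPONSE`, its element and attribute names, and the function `response_to_csv` are assumptions chosen for illustration. The function flattens one XML response into CSV rows for follow-up processing.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical response file; the actual e-Quest XML schema is not given here.
SAMPLE_RESPONSE = """<response survey="sbs2003" respondent="R001">
  <answer field="employees">12</answer>
  <answer field="turnover">500000.00</answer>
</response>"""


def response_to_csv(xml_text: str) -> str:
    """Flatten one XML response into CSV: survey, respondent, field, value."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["survey", "respondent", "field", "value"])
    for answer in root.findall("answer"):
        writer.writerow([
            root.get("survey"),
            root.get("respondent"),
            answer.get("field"),
            (answer.text or "").strip(),  # empty answers become empty cells
        ])
    return out.getvalue()


if __name__ == "__main__":
    print(response_to_csv(SAMPLE_RESPONSE), end="")
```

Loading the same rows into survey-specific database tables instead of a CSV file would only change the output side of the function; the XML parsing stays the same.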

24 where the software companies Software AG and CFC (not to be confused with CSC) are our partners


How much does it cost?

e-Quest is not a cheap product. But when its flexibility and manifold features are considered, it quickly becomes obvious that implementing a comparable EDR system from scratch would cost a lot more. One should also take into account that the requested price is a one-time payment which includes all future versions. And finally, it should not be forgotten that e-Quest was not financed by the R&D budget of the EU, although its development surely had a lot in common with R&D work.

What hardware and software are required?

Whether a tool that is being considered for TTK runs in the existing EDP environment of the prospective customer is without doubt another important evaluation criterion. Often NSIs are not willing to buy additional hardware and/or software (for example a certain operating or database management system) as it may lead to a substantial price increase of the solution they have their eye on (apart from licence fees, know-how about these products also has to be built up or obtained from an external service provider).

The classical e-Quest is based on Microsoft Windows (practically all versions from Windows 95 upwards), thus it is possible to use the software on probably 90 % of all PCs. The database systems Microsoft “SQL Server” (including MSDE25), IBM DB2 and MIMER SQL are supported. Statistics Austria’s in-house installation for the processing of incoming response data uses DB2 for z/OS, running on an IBM mainframe computer.

The Windows program “e-Quest Metadata Manager” is not only part of the classical e-Quest system, but also of e-Quest/Web. At Statistics Austria, the e-Quest/Web portal and the generated web questionnaires will run on Linux in connection with the IBM Websphere HTTP26 and Application Servers and DB2 for z/OS. The generation of HTML and JSP files, Java classes etc. will be conducted in Websphere Application Developer. But as the system is very flexible, we do not expect that adapting it to other databases and Java application servers (conforming to the J2EE27 standard) will cause many problems – as a matter of fact, our partners Software AG and CFC are using an Oracle database and the free open source tools Eclipse and Tomcat for development instead of the IBM products.28

Which languages are supported?

On the one hand the various languages spoken in Europe contribute to the cultural variety within the EU, on the other hand they are often a cause of additional costs as compared to – for example – the situation in the United States. Not surprisingly, TTK is influenced by the “language problem” too.

Unfortunately, as e-Quest has been developed in a German-speaking country, the user interface of all programs and the lion’s share of the documentation are only available in German – which unquestionably makes it more difficult for most NSIs to evaluate the software. But the source code has been prepared for multiple languages, so a quick adaptation would be possible.

If the evaluation of existing tools leads to a negative result (or if there are no solutions available which fulfil the specific requirements and the “must” criteria defined by a statistical institute), the question “To buy or to make?” has to be answered with the second option. When an NSI chooses to develop CASI software itself, the transfer of know-how becomes even more important than in the case of purchasing a ready-made product.

25 MSDE: Microsoft Data Engine, a version of “SQL Server” which can be deployed free of charge
26 The Websphere HTTP Server practically is the same as the open source web server Apache.
27 J2EE: Java 2 Enterprise Edition
28 Adding support for Microsoft’s Internet Information Server and the .NET framework would require more work by Software AG and CFC, but would be feasible too.



Know-how transfer often is a two-way process. When the e-Quest project was launched in 1998 (at that time we just spoke about the “electronic questionnaire project”, later changing the name to SDSE29 and e-Collect, before its final name crystallised at the end of 2000), one of our first tasks was to investigate existing CASI tools and EDR projects and to “inhale” as much information about EDR as possible. To give an example: from the TELER project (in that year still in progress) we learnt that electronic questionnaires are not the final goal, but just the first step. What really should be aimed at, at least for voluminous monthly or quarterly surveys, is automating the provision of statistical raw data as much as possible. This is the reason why we consider the import interface of the “e-Quest Questionnaire Manager” as one of its most important features. It is planned to go the next step in this direction by integrating the “STIPES Transformation Program”.

Right from the beginning, we tried not only to obtain knowledge from others but also to return some expertise, e.g. in the form of status reports. For the first time this happened in May 2000 at the UN/ECE30 Seminar on Integrated Statistical Information Systems (ISIS 2000) in Riga. A room paper written two days before the flight to Latvia and distributed on the first day of the conference was regarded as being so interesting by the program committee that it was promoted to the status of “invited paper” overnight. In the following years we contributed papers and posters to several other meetings, for example UN/ECE conferences in Washington and Geneva, the ETK-NTTS conference in Crete, the International Conference on Electronic Commerce (ICEC) in Vienna, the International Conference on Questionnaire Development, Evaluation and Testing Methods (QDET) in Charleston, South Carolina31, the Informatics 2003 conference in Bratislava and now the AMRADS32 / TTK 2 conference in Rome. Presentations were given on several other occasions, e.g. at meetings of the CORD33 task force (where Statistics Austria has become one of the core members) and of the EEG634, at the AMRADS training course in Tirana, at an EDR course held at the TES institute in Luxembourg and at work sessions of the CODACMOS35 project.

We have also become a member of the EEG6 working group 4 (WG4), which is responsible for developing a standardised XML format for raw data collection, called XML4DR36. Our XML schema for the provision of classifications was accepted by WG4 and will be incorporated in the XML4DR standard.

Apart from papers and presentations, personal contacts with people who are working in the same field are probably the most rewarding aspect of participating in these meetings. Informal talks can be a very significant source of information, allowing a flow of know-how in both directions. Visiting experts at other NSIs in order to exchange experiences is also worth mentioning in this respect. In recent years the e-Quest team welcomed delegations from Hungary and Lithuania, and just a few weeks ago we had a telephone conference with colleagues from the Swiss Federal Statistical Office.

When an NSI takes the decision to develop an EDR software – no matter if it wants to implement a web application, a stand-alone CASI program, a generic, metadata-driven CSAQ system or whatever else – the question must be answered whether it has a sufficient number of EDP experts with adequate ICT skills at its disposal.

29 SDSE: System zur Durchführung statistischer Erhebungen (system for conducting statistical surveys)
30 UN/ECE: United Nations Economic Commission for Europe
31 On the QDET web site (http://www.jpsm.umd.edu/qdet/qdet-set.html, link “Pictures of the Conference”) a photo of Wolfgang Koller (manager and “godfather” of the e-Quest and e-Quest/Web projects) in front of the e-Quest poster can be seen.
32 AMRADS: Accompanying Measure for Research and Development in Statistics – a 5th framework R&D project
33 CORD: Collection of Raw Data
34 EEG6: EDI Expert Group 6
35 CODACMOS: Cluster of Data Collection Integration & Metadata Systems for Official Statistics – a 5th framework R&D project
36 XML4DR: XML for Data Reporting



In general, writing software is not an unusual activity for a statistical institute, but most applications are intended for internal use only. An electronic questionnaire, however, targets an audience beyond the borders of this well-known environment, making high demands with regard to software quality. It’s “enemy territory” out there, as we like to say. Different web browser versions which do not conform completely to W3C37 standards, different operating systems, different versions of the Java or .NET runtime environment, previously installed software which interferes with one’s own programs (“DLL hell”, as it is called by Microsoft itself) – they all contribute to the complexity of an EDR project and can be the cause of many troubles.

In the good old days of mainframe computing, the scope of EDP experts’ professional competence was quite manageable: one operating system, one or two programming languages such as Assembler, COBOL or PL/I, JCL (Job Control Language), a few utility programs and some other IBM products such as CICS or ISPF, some software analysis methods, some basic algorithms and data structures, in later years additional relational database system knowledge, SQL and data modelling techniques such as entity-relationship diagrams – that was it, more or less.

Nowadays, however, the amount of know-how that has to be mastered by any statistical institute’s ICT division has multiplied (just to name a few key technologies, concepts and products that might be of relevance in the context of EDR projects, without explaining the acronyms: object oriented analysis and design, programming languages such as C, C++, Visual Basic, Delphi, Java, C#, Javascript, VB Script, Perl, PHP, extensive class libraries, UML, XML, XML schemas, DOM, SAX, HTML, CSS, XSLT, Windows APIs, COM, ODBC, JDBC, LDAP, JSP, ASP, .NET, ADO, 3- and n-tier architecture, model-view-controller paradigm, EJB, servlets, applets, web services, SOAP, JNI, J2EE, model-driven architecture, web deployment, EAR, JAR, OLEDB, CGI, ActiveX, data encryption, Installshield) – way more than a single person could ever hope to grasp in all their detail.

If an NSI does not have enough manpower or ICT knowledge to carry out an EDR project by itself, outsourcing could be a solution which also allows the transfer of know-how from the software company to the statistical institute38 (unfortunately, it may also lead to new problems, such as increased dependence on external service providers). The classical e-Quest, as already mentioned, was implemented by CSC Austria – but throughout the project there was a very close cooperation with Statistics Austria’s project team, because we were convinced that this was the only way such an ambitious undertaking could succeed.

One big problem in connection with outsourcing is to find the right partner. In the case of e-Quest, we conducted an international call for tenders which was carried out as a two-phase negotiation procedure. In a first step we were looking for interested software companies, based on a short description of the task at hand and the required skills. Six of 15 bidders were then selected for the second phase. In addition to receiving a requirements document of almost 30 pages, they were invited to an obligatory meeting that we organised in order to supply background information and to give them an opportunity to pose further questions.

Cooperation between NSIs could be another way to cope with the problem of lacking know-how – provided that this is feasible in the concrete circumstances. The e-Quest system had to be realised on such a tight schedule that any additional coordination effort would have been counterproductive. Sometimes a small but dedicated team can achieve more, and in a shorter time, than a large consortium is able to do.

37 W3C: World Wide Web Consortium
38 By the way: Frederick Rennert, originally the technical project leader for the e-Quest system at CSC Austria, has changed his affiliation and is now working for Statistics Austria – which might be viewed as a special form of TTK.



Occasionally a statistical survey is conducted similarly in several countries, allowing for a centralised development of EDR software, as happened with IDEP/CN8. Unfortunately, Eurostat has ceased its support for this tool. In our view this was a regrettable decision which endangers the existence of a software that in some countries was very successful (in Austria it is used by about 50 % of enterprises which have to fill in Intrastat declarations; Statistics Austria receives 95 % of all Intrastat data records electronically39). Recently, some NSIs have developed web forms for Intrastat, some others may be planning such an endeavour – in this context the question has to be asked, how often is the wheel being re-invented in these projects?

In order to accelerate the implementation of ADC infrastructures and to support the transfer of technology and know-how (not only in the area of automated data collection), the concept of open source software could be an idea that is worth investigation. Open source has gained substantial momentum in recent years and – considering lively discussions at CORD and STNE40 meetings – seems to be of interest to many NSIs. Eurostat plans to deal with this subject at the beginning of 2004.

In our view the most interesting question in this respect is: Will it be possible to establish an active community within the ESS which not only uses software written by other NSIs and donated as open source, but which also carries on the maintenance and further development of these programs so that the original creator also gains some benefit from the donation? If Eurostat decides to release it as open source, the “STIPES Transformation Program” could be a good test case because of its modular design and well-defined functionality. Last but not least, instead of a …

39 not all of them by IDEP/CN8 users.
40 STNE: Statistics, Telematic Networks and EDI



Conclusion

… let us mention one final experience that we had (again) during the e-Quest project. Internally we call it “Koller’s house-building law” or just “Koller’s law”, but it holds true for any kind of project: “A complex activity always takes twice as long as expected, plus 14 days – even if you already took Koller’s law into consideration.”

References

Wolfgang Koller, Frederick Rennert, Günther Zettl: e-Quest: A metadata-based system for electronic raw data collection. In: “Statistical Journal of the United Nations Economic Commission for Europe”, Volume 19 Number 3 (2002), p. 187 - 200

Mauro Masselli: Report on Automated Data Collection and Capture. Available at the AMRADS web site (http://amrads.jrc.cec.eu.int/downld/wg/doc/35_REPORT_Automated_Data_Capture.doc)

Uwe Kunzler: Electronic data reporting (EDR), metadata, standards and the European statistical system (ESS). In: “Statistical Journal of the United Nations Economic Commission for Europe”, Volume 19 Number 3 (2002), p. 119 - 130
