
Workflow management for ETL development

Aaron W. Smith (a), Nayem Rahman (b)* and Jacob J. Schmitt (b)

(a) Intel Corporation, Folsom, CA, USA; (b) Intel Corporation, Hillsboro, OR, USA

(Received 30 May 2013; accepted 26 July 2013)

The management of data, including collection, control, movement, transformation and storage, has moved from a background consideration to the forefront of maintaining and increasing a business's competitive edge. Accordingly, the processes, technologies, activities, stakeholders and governance around developing Extract, Transform and Load (ETL) applications have increased in number and complexity. To deliver ETL solutions at the speed of business while maintaining the quality, availability and stability of the live environment, these constraints must be dealt with as efficiently as possible. Simply removing steps is often not possible because of separation-of-duty, legal, or business/project/information technology (IT) process best-practice requirements. Therefore, improving and then managing the process touch-points, workflows and inputs/outputs becomes critical to success. This article proposes a Project Lifecycle Integration Tool (PLIT) which centralizes processes and information, manages and automates workflow, and logs the information necessary to monitor and improve the quality of processes and output in support of an ETL development environment.

Keywords: workflow; workflow management; ETL; data warehouse; project lifecycle

1. Introduction

The difficulties and costs of coordinating the people, processes and projects active in building, maintaining, growing and evolving data warehouse environments will grow at least as fast as the environments they are meant to support. The focus of information technology (IT) is usually on the technology challenges and opportunities needed simply to keep pace, let alone to succeed at delivering return on investment (ROI) on decision support service (DSS) and business intelligence (BI) solutions (Chaudhuri, Dayal, & Narasayya, 2011; Shanks & Bekmamedova, 2012). At the same time, IT is expected to deliver this growth within budgets that are frequently flat or declining, particularly in the current economic climate. Given these facts and a general lack of training within IT on process design, measurement or improvement, it is not surprising that process quality suffers. Asked whether its processes meet its needs, a data warehouse environment is likely to answer 'good enough'.

With little additional prying, a long list of issues, problems and constraints would likely be supplied by the people using the processes available to them for managing their work in the data warehouse. As the saying goes, 'admitting you have a problem is the first step'. But in the technologically focused data warehouse, that first step is often the only one taken.

*Corresponding author. Email: [email protected]

Journal of Decision Systems, 2013, Vol. 22, No. 4, 319–331, http://dx.doi.org/10.1080/12460125.2013.829961

© 2013 Taylor & Francis

In this article, we argue that data warehouse environments need to recognize the significant impact their processes have on their success and on the speed, efficiency and cost of maintaining and growing data warehousing solutions. 'Good enough' will consistently fail to deliver the DSS and BI value the business expects, nor will it keep costs at the level needed for IT investments to remain sustainable. With a small investment of time and resources to improve processes, ROI in the form of human resource productivity can be realized. If that is followed by further investment in automation to manage the improved processes (Sen, Sinha, & Ramamurthy, 2006), significantly more ROI can be delivered.

We describe one such investment, which resulted in the Project Lifecycle Integration Tool (PLIT). PLIT centralizes and automates workflows for the improved processes used to support Extract, Transform and Load (ETL) development in a data warehouse environment. Four key problem areas were defined during the initial environment scan (Figure 1): (1) processes are decentralized: a lack of clear purpose or ownership of governing processes has led to fragmentation and stove-pipes; (2) tools are inconsistent: efficiency is lost moving between tools and formats, and users are confused and frustrated; (3) developers question the value of the processes: how process inputs generate outputs of value to personnel or customers is unclear or lacking; and (4) governing processes are perceived as distractions from the developer's mission: the connection between governing the development of code and generating business value is not explained, measured or reported.

When redesigning the processes and developing the PLIT application, we focused on three resolution areas: (1) centralize processes and information: eliminate fragmentation and create a single user experience; (2) focus on just-in-time information: ask only what the process needs, and only at the time it will be used; and (3) measure the quality of the process and the quality of the output: use the information entered by users and the logs generated by the tool to form a data repository that supports operational monitoring and analysis to drive further improvements. The PLIT solution complements other interfacing tools and processes by using a common language and by intentionally not stepping into the areas controlled by traditional Project Management and Resource Management solutions.

Figure 1. Problems, resolution areas and goal.


2. Research approach

In data warehouse projects there are many stakeholders and many dependencies between pieces of work. In the past, we managed that work with different tools, including object migration tools (Rahman, Burkhardt, & Hibray, 2010), solution manager tools, source control tools and a few other utilities. However, this posed great challenges and risks in synchronizing the work and achieving a flawless production release. In this study, we attempt to manage ETL development, testing and production-release work using a workflow management tool called the Project Lifecycle Integration Tool (PLIT). We thoroughly studied the previously used tools and the proposed workflow management tool, and then conducted several experiments to identify the differences, benefits and cost avoidance resulting from the proposed tool.

We also conducted an extensive literature review to identify the applications and benefits of workflow management in other areas of business. Today, BI is key to an organization's data analysis (Chen, Chiang, & Storey, 2012) and to its strategic and tactical business decision-making (Rutz, Nelakanti, & Rahman, 2012). Most medium to large business organizations (Wixom & Watson, 2001) use the data warehouse as the underlying data layer for their BI tools. In fact, the data warehouse is considered a prominent part of the IT infrastructure (Weill, Subramani, & Broadbent, 2002) of business organizations. Once a data warehouse is successful in an organization, numerous projects and applications land in it over the years, because the organization finds value in applications such as general reporting, BI (Meredith, Remington, O'Donnell, & Sharma, 2012) and data mining. Hence, a significant amount of new work, enhancement and maintenance happens as part of daily business operations.

Implementing projects in a data warehouse involves many stakeholders (Oberweis, 1994), including management, ETL architects/developers, data analysts, database administrators (DBAs), systems analysts and the release management team. These stakeholders need information about project milestones; they need to assign or complete actions required (ARs) and provide deliverables within deadlines. A workflow management tool for ETL development in data warehouses allows all the stakeholders to obtain project information and do their work within the deadlines. The information captured in this workflow tool can also be retrieved for audit purposes. Hence, a workflow management tool is critical for data warehouse implementation and maintenance.

Academics and researchers have done a significant amount of research in the field of data warehousing (Gelbard & Spiegler, 2005; Rahman, Marz, & Akhter, 2012; Rao & Osei-Bryson, 2008; Steiger, 2010). Most recent work on data warehousing has focused on faster development and deployment (Brobst, McIntire, & Rado, 2008; Gorla, 2003; Junior, Mendonca, & Rodrigues, 2009; Rahman, Rutz, & Akhter, 2011), data warehousing strategies in connection with BI (Cooper, Watson, Wixom, & Goodhue, 2000; Lonnqvist & Pirttimaki, 2006; Ramakrishnan, Jones, & Sidorova, 2012; Watson, 2009) and implementation issues (Ballou & Tayi, 1999; Chenoweth, Corral, & Demirkan, 2006; Eden & Padmanabhan, 2006; Hwang & Xu, 2007). Research has also addressed data warehouse refreshes using various tools and utilities (Chaudhuri & Dayal, 1997; White, 2005). In contrast, our work focuses on implementing projects in data warehouses faster using a workflow management approach. This includes doing ETL work, unit testing and functional testing in the development environment, migrating ETL objects to the testing environment, performing systems and user acceptance testing and, finally, deploying to the production data warehouse environment. Starting ETL work in development and then deploying to production via the testing environment, following the project milestones, is therefore very important for successful project implementation in a data warehouse, and workflow management is crucial for implementing data warehousing projects efficiently and in a timely manner.
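The development-to-production path described above can be pictured as a small state machine. The sketch below is a hypothetical illustration, not PLIT's implementation: the environment names, the check names and the `promote` function are assumptions. It encodes only one rule: an object moves forward one environment at a time, and only after the tests listed in the text have passed.

```python
from enum import Enum


class Env(Enum):
    """Environments an ETL object passes through (names are illustrative)."""
    DEVELOPMENT = 1
    TEST = 2
    PRODUCTION = 3


# Checks that must pass before leaving each environment, as listed in the text.
REQUIRED_CHECKS = {
    Env.DEVELOPMENT: {"unit_test", "functional_test"},
    Env.TEST: {"system_test", "user_acceptance_test"},
}

# Only forward moves to the next environment are allowed.
NEXT_ENV = {Env.DEVELOPMENT: Env.TEST, Env.TEST: Env.PRODUCTION}


def promote(current: Env, passed_checks: set[str]) -> Env:
    """Return the next environment, or raise if required checks are missing."""
    if current not in NEXT_ENV:
        raise ValueError(f"{current.name} is the final environment")
    missing = REQUIRED_CHECKS[current] - passed_checks
    if missing:
        raise ValueError(f"cannot leave {current.name}; missing {sorted(missing)}")
    return NEXT_ENV[current]
```

Because skipping an environment is impossible by construction, a missed test surfaces as an explicit error rather than a silent jump to production.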

Business organizations need to respond quickly to change and to provide new services faster without compromising productivity, quality or costs (Sheth, 1995; Xu & Ramesh, 2002) when implementing data warehousing projects. Here, a workflow management tool is highly beneficial: it improves efficiency through automation, provides better process control through improved management, improves customer service, adds flexibility (software control over the process) and supports business process improvement (Oberweis, 1994; Xu & Ramesh, 2002).

Extensive research has been done on workflow management (Alonso & Schek, 1996; Aversano, Canfora, De Lucia, & Gallucci, 2002; Cugola, Di Nitto, Fuggetta, & Ghezzi, 1996; Georgakopoulos & Hornick, 1994; Leymann & Roller, 1997; Schmidt, 1999; Sheth, 1995; Weske, 2001) from different perspectives. Adams, ter Hofstede, and La Rosa (2011) presented open-source software for workflow management; their effort was directed towards the standardization of workflow management. Weske, Goesmann, Holten, and Striemer (1999, 2001) designed a development methodology for workflow applications that enables project managers and application developers to navigate a complex structure of processes. Georgakopoulos and Hornick (1995) present an overview of workflow management, from process modeling to workflow automation infrastructure, and discuss infrastructure technologies that address the limitations of existing workflow technology. Fetaji, Fetaji, and Ebibi (2010) addressed the design, development and evaluation of sustainable workflow and document management processes. Grambow, Oberhauser, and Reichert (2011) examined a workflow language for describing software engineering processes. Yu and Buyya (2005) presented a taxonomical overview of different approaches to building and executing workflows on grids, studying the design and engineering similarities and differences among grid workflow systems. Aversano, Betti, De Lucia, and Stefanucci (2001) undertook a case study introducing workflow technology in a large software company, attempting to manage the ordinary maintenance process through a workflow-based approach. However, none of this work studied how to manage the processes relating to ETL development in a data warehousing project, which involves many stakeholders and individual tasks (Figure 2). In this study, we attempt to manage ETL development and a host of related work using a workflow management tool called the Project Lifecycle Integration Tool (PLIT).

The PLIT tool is designed to sit between, and complement, other tools and systems such as Project Management, Resource Management, Change and Release Management, and Live Environment Support. PLIT acts as a 'one-stop shop' and automates the inputs and outputs passed between these necessary but often disconnected Project Lifecycle Management steps. In doing so, it can produce significant ROI by reducing wait time between steps and by keeping ETL developers productive, spending their time developing, testing and documenting their code (Figure 3).

3. PLIT high-level design

PLIT is project-centric, but it is not a Project Management tool; it is best thought of as a Workflow Management tool. Once the data warehouse organization has used its Engagement Processes to accept incoming demand and to prioritize and resource that demand, it organizes that work under a program or project. When a project involves ETL development work, the Project Manager opens PLIT and enters the high-level details of the project, including name, description, Project Manager, Sponsoring Organization, and numbers relating the project to financial tracking (Figure 4). The user also specifies the ETL components involved in the project (these trigger specific workflows) and enters the estimated dates for the project's Quality Assurance (QA) and production migrations. When the project is created, this information is saved to the backend database and a sequentially unique Project ID is assigned to the entry. When this confirmation is returned to the user's GUI (graphical user interface), the next tab (Resources) becomes visible and available for use. Also as part of the automated processes, project creation triggers AR (Action-Required) tasks, which are captured in the AR system and emailed to contacts preconfigured in the Administrative Console.

Figure 2. Extract, Transform and Load (ETL) project workflow prior to improvements.

Figure 3. Extract, Transform and Load (ETL) project workflow after improvements.
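The following is a minimal sketch of the project-creation step, assuming an in-memory store in place of PLIT's backend database. A counter hands out sequential Project IDs, each declared ETL component is looked up in a table of preconfigured contacts, and one AR task per contact is queued for email. The class name `ProjectRegistry`, the component keys and the `outbox` list (standing in for a mail server) are all hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ARTask:
    ar_id: int
    project_id: int
    description: str
    assigned_to: str  # a person or distribution list


class ProjectRegistry:
    """Hypothetical in-memory stand-in for PLIT's backend database."""

    def __init__(self, contacts_by_component: dict[str, list[str]]):
        # Contacts preconfigured per ETL component (an Administrative Console setting in the text).
        self.contacts_by_component = contacts_by_component
        self._project_ids = itertools.count(1)  # sequential, unique Project IDs
        self._ar_ids = itertools.count(1)
        self.projects: dict[int, dict] = {}
        self.outbox: list[tuple[str, str]] = []  # (recipient, message) instead of real email

    def create_project(self, name: str, components: list[str],
                       qa_date: str, prod_date: str) -> tuple[int, list[ARTask]]:
        project_id = next(self._project_ids)
        self.projects[project_id] = {"name": name, "components": components,
                                     "qa_date": qa_date, "prod_date": prod_date}
        tasks = []
        # Each declared component triggers its own workflow: one AR per preconfigured contact.
        for component in components:
            for contact in self.contacts_by_component.get(component, []):
                task = ARTask(next(self._ar_ids), project_id,
                              f"Project {project_id} ({name}) needs {component} work", contact)
                tasks.append(task)
                self.outbox.append((contact, task.description))
        return project_id, tasks
```

For example, with `ProjectRegistry({"new_table": ["dba-team"]})`, creating a first project that declares `["new_table"]` would return Project ID 1 together with a single AR addressed to `dba-team`.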

Following the same 'requirements achieved, next step available' approach, the rest of the tool is made available to the user step by step. This keeps the user experience tidy and guides the user through the process steps. Once a project is created, the user can access a visual, interactive workflow representation of the project that is colour-coded based on the status of critical-path tasks within the open project. Each coloured process step can be clicked to drill into further detail, and clicked again to take the user directly to the section of the tool related to that step in the workflow.
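The 'requirements achieved, next step available' rule amounts to a visibility function over the project's state. In this sketch, the gating conditions come from the text: the Resources and AR tabs appear once the project is saved, and the Support and Release tabs appear once at least one ETL object is saved. Treating the ETL Tab as available as soon as the project is saved is an assumption. `ProjectState` and `visible_tabs` are hypothetical names, and the Reporting Tab is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectState:
    saved: bool = False  # project details saved and a Project ID assigned
    etl_objects: list[str] = field(default_factory=list)  # ETL objects added and saved


def visible_tabs(state: ProjectState) -> list[str]:
    """Return the tabs a user can see, in the order they become available."""
    tabs = ["Project"]
    if state.saved:
        # Assumption: the ETL tab unlocks together with Resources and AR.
        tabs += ["AR", "Resources", "ETL"]
    if state.saved and state.etl_objects:
        tabs += ["Support", "Release"]
    return tabs
```

Keeping the rule in one pure function makes the progressive disclosure easy to test and to reason about, independent of the GUI.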

The AR Tab (Figure 5) is visible in the tool from project creation. It gives all interested parties a single place to track and respond to tasks, whether those tasks are generated by key milestones (such as project creation, described above), triggered by work that needs to be performed (described below under the ETL Tab), triggered by time-based milestones (described later) or created ad hoc by stakeholders of the project. All tasks receive unique numbers for easy tracking and are flexible thanks to their simple format. Creating a task requires only a descriptive name, a due date, an assigned-by value and an assigned-to value, after which the tool alerts the recipient by email.
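A minimal sketch of AR creation under the four-field rule described above. The numbering source, the `origin` labels and the `notify` callback (standing in for the email alert) are hypothetical. A real system would presumably draw numbers from a database sequence rather than an in-process counter.

```python
import itertools
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical source of unique AR numbers (a real system would use a DB sequence).
_ar_numbers = itertools.count(1000)


@dataclass
class ActionRequired:
    number: int
    name: str
    due: date
    assigned_by: str
    assigned_to: str
    origin: str  # "milestone", "etl_request", "time_based" or "ad_hoc"


def create_ar(name: str, due: date, assigned_by: str, assigned_to: str,
              origin: str = "ad_hoc",
              notify: Callable[[str], None] = print) -> ActionRequired:
    """Create an AR from the four required fields and alert the recipient."""
    if not (name and assigned_by and assigned_to):
        raise ValueError("name, assigned_by and assigned_to are required")
    task = ActionRequired(next(_ar_numbers), name, due, assigned_by, assigned_to, origin)
    # notify stands in for sending an email to the assignee.
    notify(f"To {assigned_to}: AR-{task.number} '{name}' due {due.isoformat()} "
           f"(assigned by {assigned_by})")
    return task
```

Because every origin (milestone, ETL request, time-based or ad hoc) goes through the same function, all tasks share one numbering scheme and one notification path. This matches the single tracking place the AR Tab provides.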

After project creation, the user is led to name key resources on the Resources tab by choosing from a drop-down list of roles. The roles included are limited to those who will perform milestone tasks or will be assigned required actions from critical tasks during the project workflow. The assignment of some roles is restricted both by who may add or modify the role assignment and by the list of available names. When resources come from a limited pool of known people, a drop-down list is populated with the individuals who have been granted the matching security role in the backend database. Similarly, security roles are used to limit who may edit some roles when the process dictates that a resourcing or scheduling decision must be made outside the tool. Roles therefore serve two purposes. They grant functional access to the tool, and, for roles such as Database Administrator, ETL Architect and Migration Analyst, they control the names available in the related drop-down lists and who may use them. This dual use lowers the maintenance of the tool to almost zero and makes the management of resources a self-service function. When a resource could be any company employee, a search function lets the user look them up by name or ID, using publicly available identifying information. This information is also used when the system generates emails to resources named to the project.

Figure 4. Details section of project tab.
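The dual use of roles can be sketched as two lookups over the same grant table: one populates a role's drop-down, and the other decides whether the current user may assign it. The grant data, the user names and the `ASSIGNER_ROLE` mapping are hypothetical. The "Resource Manager" role is an assumption, since the text says only that some role assignments are restricted.

```python
# Hypothetical security-role grants, standing in for the backend database.
ROLE_GRANTS: dict[str, set[str]] = {
    "alice": {"ETL Architect"},
    "bob": {"Database Administrator", "Admin"},
    "carol": {"Migration Analyst"},
    "dave": {"ETL Architect", "Resource Manager"},
}

# Restricted project roles and the role required to assign them (assumed mapping).
ASSIGNER_ROLE: dict[str, str] = {"Migration Analyst": "Resource Manager"}


def candidates_for(project_role: str) -> list[str]:
    """Names offered in the drop-down: users granted the matching security role."""
    return sorted(user for user, roles in ROLE_GRANTS.items() if project_role in roles)


def can_assign(user: str, project_role: str) -> bool:
    """Unrestricted roles may be assigned by anyone; restricted ones need the assigner role."""
    required = ASSIGNER_ROLE.get(project_role)
    return required is None or required in ROLE_GRANTS.get(user, set())
```

Granting someone a role in the grant table is then the only maintenance step: the drop-down and the permission check both pick it up automatically.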

The next step is the ETL Tab (Figures 6 and 7), where responsibility for populating information typically shifts from the Project Manager to the lead ETL Developer (ETL Tech Lead), because only they know the required details. The ETL Tab is the most functionally complex section of PLIT. It allows users to pull in existing ETL objects, request new ETL objects, define complex relationships between ETL objects, enter all the required configuration details for each object and trigger AR tasks to those responsible for the work. A version control system maintains the data integrity of the ETL object configuration details: it protects against competing updates, saves time by avoiding data re-entry and maintains a change log for historical and reporting purposes. Nested tab sections and context-sensitive buttons keep the interface standard and simple despite the large amount of functionality in this section. This area corresponds to the design and development phases of a software development project, and these functions are expanded on in later sections of this document.
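The version control behaviour described above (protection against competing updates, no re-entry, a change log) can be approximated with a check-out lock over a list of saved versions. This is a sketch under stated assumptions: `VersionedObject` and its methods are hypothetical, configurations are assumed to be flat key/value dicts (the copies are shallow), and `force_unlock` models the administrative unlock mentioned for objects that were never checked back in.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedObject:
    """Hypothetical versioned ETL object configuration with a check-out lock."""
    name: str
    versions: list[dict] = field(default_factory=list)  # full config per version
    change_log: list[tuple[int, str, str]] = field(default_factory=list)  # (version, user, comment)
    locked_by: Optional[str] = None

    def check_out(self, user: str) -> dict:
        # Refuse competing edits while another user holds the lock.
        if self.locked_by not in (None, user):
            raise PermissionError(f"{self.name} is checked out by {self.locked_by}")
        self.locked_by = user
        # Start from the latest saved config so nothing has to be re-entered.
        return dict(self.versions[-1]) if self.versions else {}

    def check_in(self, user: str, config: dict, comment: str) -> int:
        if self.locked_by != user:
            raise PermissionError(f"{user} does not hold the lock on {self.name}")
        self.versions.append(dict(config))
        version = len(self.versions)
        self.change_log.append((version, user, comment))
        self.locked_by = None
        return version

    def force_unlock(self) -> None:
        """Administrative override for objects that were never checked back in."""
        self.locked_by = None
```

A pessimistic lock fits here because configuration edits are long-lived form sessions. Optimistic merging would risk silently combining two developers' partial changes.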

At least one ETL object, existing or new, must be entered before the next set of tabs becomes visible. After one or more objects have been added to the project and saved, the Support Tab and Release Tab become available. The Support Tab uses the same version control capability to display existing details, or to allow information to be updated or added, about the individuals and teams who will support a given object during its development, stabilization and eventual sustaining period in the live environment. Details captured include the type, level, schedule, geography and key contact information for the supporting resource. These details are saved in relational tables so that new support relationships can be defined with a few clicks rather than by re-entering information.

Figure 5. Action required (AR) tab.

Figure 6. Extract, Transform and Load (ETL) tab in add new subject area/project folder model.
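One way to support "a few clicks rather than re-entering information" is to store each supporting team once and link it to objects through a join table, so that reuse becomes a single insert. This sqlite3 sketch is an assumption about the schema, not PLIT's actual tables. The column names follow the details listed in the text, and the team, object and email values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE support_contact (
    contact_id   INTEGER PRIMARY KEY,
    team         TEXT NOT NULL,
    support_type TEXT,   -- e.g. development, stabilization, sustaining
    level        TEXT,   -- e.g. L1, L2
    schedule     TEXT,
    geography    TEXT,
    email        TEXT
);
CREATE TABLE object_support (
    object_name TEXT NOT NULL,
    contact_id  INTEGER NOT NULL REFERENCES support_contact(contact_id),
    PRIMARY KEY (object_name, contact_id)
);
""")


def add_contact(team: str, support_type: str, level: str,
                schedule: str, geography: str, email: str) -> int:
    """Enter a supporting team's details once; returns its id for reuse."""
    cur = conn.execute(
        "INSERT INTO support_contact (team, support_type, level, schedule, geography, email) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (team, support_type, level, schedule, geography, email))
    return cur.lastrowid


def link(object_name: str, contact_id: int) -> None:
    """Attach an existing contact to an ETL object (the 'few clicks' step)."""
    conn.execute("INSERT INTO object_support VALUES (?, ?)", (object_name, contact_id))


def supporters(object_name: str) -> list[tuple[str, str]]:
    """List (team, level) pairs supporting an object."""
    return conn.execute(
        "SELECT c.team, c.level FROM object_support o "
        "JOIN support_contact c USING (contact_id) "
        "WHERE o.object_name = ? ORDER BY c.level", (object_name,)).fetchall()
```

The composite primary key on `object_support` also prevents the same team from being linked to an object twice.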

The Release Tab is where the PLIT workflow interfaces with the environment's Change and Release processes. Although robust enough to stand as the sole Change tracking mechanism, PLIT was designed to integrate with, rather than replace, the existing System of Record for Change request and Release tracking. To this end, the focus was again on avoiding duplication by asking only for the minimum data necessary to facilitate the remaining development workflow. PLIT accepts the numbers provided by the other Change Management system rather than generating new ones and, since they are unique, uses them as primary keys where appropriate.

The Release Tab allows the project's original estimated release dates for QA and Production to be updated. If a date is updated more than once, the tool prompts the user for a reason for the change, so that process quality data can be gathered for analysis. A Change and Release Management (CRM) sub-tab, restricted to users with the CRM role, is used to enter and maintain the status of the migrations and the interfacing system numbers. Digital signatures for the key approvals required before QA and Production migrations are held on an Authorizations sub-tab. PLIT uses the identity of the active user to determine read-only versus signature authority, based on the roles entered in the project's Resources and CRM tabs. The final sub-tab manages the Capacity Management process. Here the named ETL Tech Lead can either self-certify the project's impact on the environment or provide details and submit a request for an independent review by the environment's Capacity Manager (Figure 8).
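The date-change rule can be sketched as follows: the first update is accepted as-is, and every later update must carry a reason, which is kept with the old and new dates as process-quality data. `ReleaseDate` is a hypothetical name, and reading "updated more than once" as "from the second update on" is an interpretation of the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ReleaseDate:
    """Estimated migration date whose change history doubles as process-quality data."""
    milestone: str  # "QA" or "Production"
    current: date
    history: list[tuple[date, date, Optional[str]]] = field(default_factory=list)

    def update(self, new_date: date, reason: Optional[str] = None) -> None:
        # The first change needs no reason; any later change must explain itself.
        if self.history and not reason:
            raise ValueError("a reason is required when the date changes more than once")
        self.history.append((self.current, new_date, reason))
        self.current = new_date
```

The `history` list is exactly the kind of record that later analysis of slipped dates would query.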

The last area of PLIT is the Reporting Tab. Three predefined reports are provided directly in the tool because of their immediate relevance to the information and tasks of the project currently being viewed. Two of the reports consolidate information from multiple tabs within the tool and present it in a single text window. This output is formatted for copy/paste to support frequently repeated processes that Administrators in the ETL environment perform outside the scope of PLIT. The third report compares the multiple versions of any ETL object held in the version control system. This makes it possible to follow the genealogy of a given object, and it lets administrators easily see what has changed in what can be very complex sets of developer-provided configuration information. Additional reporting is handled by direct read access to the database view or through an independent reporting application outside the scope of this article.

Figure 7. Extract, Transform and Load (ETL) tab – ready for new details.
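A version-comparison report of this kind can be approximated with Python's standard `difflib.unified_diff`. The sketch assumes that each version is a flat dict of settings and renders each one as sorted `key = value` lines so that the diff is stable. The function name and labels are hypothetical; PLIT's actual report format is not described.

```python
import difflib


def compare_versions(old: dict, new: dict,
                     old_label: str = "v1", new_label: str = "v2") -> list[str]:
    """Unified diff of two flat configuration dicts, one 'key = value' line per setting."""
    def as_lines(cfg: dict) -> list[str]:
        # Sorting keys keeps unchanged settings aligned between versions.
        return [f"{key} = {cfg[key]}" for key in sorted(cfg)]

    return list(difflib.unified_diff(as_lines(old), as_lines(new),
                                     fromfile=old_label, tofile=new_label, lineterm=""))
```

For two versions that differ only in `batch_size`, the result contains a `-batch_size = 500` line and a `+batch_size = 1000` line surrounded by unchanged context. For identical versions, it returns an empty list.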

Finally, PLIT provides an Administrative interface that is available only to users with the Admin role. Through this interface, administrators can maintain the tool, for example by updating the email addresses of distribution lists used by PLIT's workflow logic. The Service Level Agreements (SLAs) used by AR tasks generated by the tool are also controlled here. So is the ability to unlock ETL objects that were not checked back into the version control system, and to unlock PLIT projects when the last user is unavailable to be asked to log out of a project. Lastly, administrators can control the status messages displayed to users logging in to PLIT as well as the tool's availability. Custom information can thus be displayed for a fixed set of dates, and users can be locked out for maintenance or other needs.
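The text does not say how SLAs are expressed. As one plausible reading, the following sketch assumes an administrator-maintained table of business days per AR type and derives a due date by skipping weekends. The AR type names and day counts are hypothetical, and holidays are ignored.

```python
from datetime import date, timedelta

# Hypothetical SLA table an administrator might maintain: business days per AR type.
SLA_BUSINESS_DAYS = {"dba_review": 3, "migration": 2, "capacity_review": 5}


def due_date(ar_type: str, created: date) -> date:
    """Add the SLA's business days to the creation date, skipping Saturdays and Sundays."""
    remaining = SLA_BUSINESS_DAYS[ar_type]
    day = created
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return day
```

For example, a `dba_review` AR created on Friday 26 July 2013 would fall due on Wednesday 31 July 2013. Changing the SLA then means editing one table entry rather than any workflow logic.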

4. Benefits

The stated goals of developing a solution like PLIT, namely workflow centralization and automation, were expected to reduce wait time, remove unneeded data collection and automate the hand-offs between process connection points in the ETL development processes.

Baseline data was used to estimate productivity savings in the developer community of more than 3200 hours annually, equivalent to 2.2 full-time equivalent (FTE) head count. ETL changes implemented into the production environment were classified as simple changes (less than a month to build) or complex changes (more than a month to build). The goal was to reduce non-value-add time (mostly time spent waiting on others for tasks or approvals) by 50% for both categories. All of the waste-reduction and process-improvement estimates were exceeded.

Figure 8. Capacity tab (under release tab).

In the first quarter of use, simple changes (approximately 75% of projects performed) showed a calculated 73% reduction in non-value-add time. Complex changes (approximately 25% of projects performed) showed a calculated 67% reduction. The combined weighted reduction in developers' non-value-add time was therefore 71.5%. The calculated productivity savings in the developer community were more than 4690 hours, equivalent to returning 3.2 FTE head count. The calculation had to rely on aggregated numbers because no tracking data was available on exactly how many developers worked on specific changes. Developer hours are not billable, so work is not tracked directly.
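The 71.5% figure is the share-weighted average of the two category reductions. With $w_s, w_c$ denoting the shares of simple and complex changes and $r_s, r_c$ their reductions in non-value-add time, the figures above give:

```latex
\[
s = w_s\,r_s + w_c\,r_c
  = 0.75 \times 73\% + 0.25 \times 67\%
  = 54.75\% + 16.75\%
  = 71.5\%
\]
```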

Productivity calculations were performed using the following steps: (1) count the number of planned ETL releases (excluding emergency changes) implemented into the production environment by month; (2) count the number of developers active in the environment in the target timeframe; (3) average the developer head count across the releases in a month and multiply it by a 32-hour workweek (accounting for meetings and other non-development time), using a 4-4-5 workweek calendar; (4) apply the 80/20 rule and discount 20% of the hours from Step 3, to account for the reality that not all developers work on release-related activity each week; and (5) further discount the results from Step 4, keeping only 71.5% of the hours, based on the combined actual savings in non-value-add time.
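Steps 3 to 5 can be sketched for a single month as follows. This is an interpretation, not the authors' spreadsheet. It assumes that Steps 1 and 2 yield a list of developer head counts for the month's planned releases, and that the 4-4-5 calendar assigns 4, 4 and 5 weeks to the three months of each quarter. The input values in the example are hypothetical.

```python
from statistics import mean

HOURS_PER_WEEK = 32    # productive hours per developer-week (Step 3)
ACTIVE_SHARE = 0.80    # 80/20 discount (Step 4)
NVA_SAVINGS = 0.715    # combined reduction in non-value-add time (Step 5)
WEEKS_445 = [4, 4, 5]  # 4-4-5 calendar: weeks in months 1, 2 and 3 of a quarter


def monthly_savings(devs_per_release: list[int], month_index: int) -> float:
    """Saved developer hours for one month; month_index runs 0-11 in the fiscal year."""
    if not devs_per_release:  # no planned releases that month (Step 1)
        return 0.0
    weeks = WEEKS_445[month_index % 3]
    hours = mean(devs_per_release) * HOURS_PER_WEEK * weeks  # Step 3
    return hours * ACTIVE_SHARE * NVA_SAVINGS                # Steps 4 and 5
```

For a first fiscal month whose releases involved 10, 12 and 14 developers, this gives 12 × 32 × 4 × 0.8 × 0.715 ≈ 878.6 hours. Summing the twelve months would give the annual figure.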

For the first full year of use, the calculated actual developer productivity (Tambe & Hitt, 2012) savings were 10,924 hours, or 7.26 FTE head count returned. For the second full year of use, the Step 5 results were further discounted for two reasons: no improvements had been made to the tool, and system latency and process mismatch had degraded the benefits. The discount was 1% in Q1, 4% in Q2, 8% in Q3 and 15% in Q4. Even with this increasing discount, the calculated annual productivity savings over the original processes were 9265 hours, or 6.16 FTE head count.
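The second-year adjustment applies each quarter's discount to that quarter's Step 5 hours before summing. A minimal sketch follows; the discount rates come from the text, while the quarterly input hours are hypothetical, since the paper reports only annual totals.

```python
# Second-year degradation discounts per quarter, from the text.
YEAR2_DISCOUNT = {"Q1": 0.01, "Q2": 0.04, "Q3": 0.08, "Q4": 0.15}


def discounted_savings(step5_hours_by_quarter: dict[str, float]) -> float:
    """Apply each quarter's discount to its Step 5 hours and return the annual total."""
    return sum(hours * (1 - YEAR2_DISCOUNT[quarter])
               for quarter, hours in step5_hours_by_quarter.items())
```

With a hypothetical 1000 Step 5 hours in every quarter, the annual total would be 990 + 960 + 920 + 850 = 3720 hours. This shows how the back-loaded discounts erode the benefit as the year progresses.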

In the 2.5 years PLIT has been in use, the annual number of scheduled releases has nearly doubled, from just over 300 to more than 600. In the same period, the monthly number of scheduled, project-driven changes implemented into the production live environment increased from an average of 34 (peak 51) to an average of 45 (peak 81) (Figure 9). Before PLIT was implemented, the average number of weekly project reviews was estimated at 13, with a peak of 28. During the lifespan of the tool, the average has grown to 23 and the peak to 55. Many environmental drivers of this significant growth in ETL development activity lie outside PLIT. Even so, we are confident that the original processes in place before PLIT's implementation could not have sustained this volume of activity. Without PLIT, the success of the data warehouse environment, and of the business processes and value that depend on the ETL solutions being developed, would have been significantly hampered.

5. Concluding remarks and future work

In the ever-constrained world of IT resourcing, working 'smarter, not harder' has never been more necessary. With a relatively small upfront investment to holistically re-evaluate, streamline and automate workflow processes, significant benefits can be gained. By retaining only essential processes, scheduling them just in time, and then centralizing and automating them, organizations will see productivity gains: less wasted time, fewer steps and increased confidence in the value of governance processes.

Technology (Ramiller & Swanson, 2009) will continue to evolve, but to increase the value IT can deliver to the business, the processes through which IT builds and lands services must also evolve. The proposed Project Lifecycle Integration Tool (PLIT) illustrates one method by which thousands of hours of productivity can be recovered. Constraints blocking the efficient implementation of IT solutions are more often due to process and policy than to technology. IT has, unsurprisingly, focused singularly on technology as the best way to keep up with the accelerating pace of business (Roberts & Grover, 2012). However, the benefits of expanding that focus to include process are compelling. In this case, doing so allowed a doubling of the volume of scheduled releases in the data warehouse development pipeline.

Further research could examine how to replicate the success of these workflow improvements on more enterprise-capable platforms, rather than requiring custom-built solutions. Activity in the cloud and virtualization communities hints at other highly configurable, open-ended and crowd-sourced attempts to reach higher levels of sustainable throughput.

Acknowledgments

The authors wish to thank the anonymous referees for their useful comments, which have led to this improved version of the article.

References

Adams, M., ter Hofstede, A. H. M., & La Rosa, M. (2011). Open source software for workflow management: The case of YAWL. IEEE Software, 28, 16–19.

Alonso, G., & Schek, H. J. (1996). Research issues in large workflow management systems. In Proceedings of the NSF Workshop on Workflow and Process Automation in Information Systems (pp. 126–132). Athens, GA.

Aversano, L., Betti, S., De Lucia, A., & Stefanucci, S. (2001). Introducing workflow management in software maintenance processes. In Proceedings of IEEE International Conference on Software Maintenance (pp. 441–450). Florence, Italy.

Figure 9. Nearly double velocity after improvements.


Aversano, L., Canfora, G., De Lucia, A., & Gallucci, P. (2002). Business process reengineering and workflow automation: A technology transfer experience. Journal of Systems and Software, 63(1), 29–44.

Ballou, D. P., & Tayi, G. K. (1999). Enhancing data quality in data warehouse environments. Communications of the ACM, 42, 73–78.

Brobst, S., McIntire, M., & Rado, E. (2008). Agile data warehousing with integrated sandboxing. Business Intelligence Journal, 13, 1–10.

Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. SIGMOD Record, 26. doi:10.1145/248603.248616

Chaudhuri, S., Dayal, U., & Narasayya, V. (2011). An overview of business intelligence technology. Communications of the ACM, 54, 88–98. doi:10.1145/1978542.1978562

Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36, 1165–1188.

Chenoweth, T., Corral, K., & Demirkan, H. (2006). Seven key interventions for data warehouse success. Communications of the ACM, 49. doi:10.1145/1107458.1107464

Cooper, B. L., Watson, H. J., Wixom, B., & Goodhue, D. L. (2000). Data warehousing supports corporate strategy at First American Corporation. MIS Quarterly, 24, 547–567.

Cugola, G., Di Nitto, E., Fuggetta, A., & Ghezzi, C. (1996). A framework for formalizing inconsistencies in human-centered systems. ACM Transactions on Software Engineering and Methodology, 5, 191–230.

Eden, C., & Padmanabhan, V. (2006). Building an enterprise data warehouse and business intelli-gence solution [online]. Intel IT Whitepaper. Retrieved from http://www.intel.com/IT

Fetaji, B., Fetaji, M., & Ebibi, M. (2010). Software engineering interoperable environment for university process workflow and document management. World Academy of Science, Engineering and Technology, 40, 123–128.

Gelbard, R., & Spiegler, I. (2005). Living with database conflicts: A temporal branching technique. Distributed and Parallel Databases, 12, 251–265.

Georgakopoulos, D., & Hornick, M. (1994). A framework for enforceable specification of extended transaction models and transactional workflow. International Journal of Intelligent and Cooperative Information Systems (World Scientific). doi:10.1142/S0218215794000144

Georgakopoulos, D., & Hornick, M. (1995). An overview of workflow management: From process modeling to workflow automation infrastructure. Distributed and Parallel Databases, 3, 119–153.

Gorla, N. (2003). Features to consider in a data warehousing system. Communications of the ACM, 46(2), 111–115.

Grambow, G., Oberhauser, R., & Reichert, M. (2011). Towards a workflow language for software engineering [online]. Retrieved from http://dbis.eprints.uni-ulm.de/700/1/SE11_SEWL_Gram-bow.pdf

Hwang, M. I., & Xu, H. (2007). The effect of implementation factors on data warehousing success: An exploratory study. Journal of Information, Information Technology, and Organizations, 2, 1–14.

Junior, M. C., Mendonca, M., & Rodrigues, F. (2009). Data warehousing in an industrial software development environment. In Proceedings of the 33rd Annual IEEE Software Engineering Workshop (pp. 131–135). IEEE Computer Society. doi:10.1109/SEW.2009.7

Leymann, F., & Roller, D. (1997). Workflow-based applications. IBM Systems Journal, 36, 102–123.

Lonnqvist, A., & Pirttimaki, V. (2006). The measurement of business intelligence. Information Systems Management, 23, 32–40.

Meredith, R., Remington, S., O’Donnell, & Sharma, N. (2012). Organisational transformation through business intelligence: Theory, the vendor perspective and a research agenda. Journal of Decision Systems, 21, 187–201.

Oberweis, A. (1994). Workflow management in software engineering projects. In Proceedings of the 2nd International Conference on Concurrent Engineering and Electronic Design Automation (pp. 55–60). Bournemouth, United Kingdom.

Rahman, N., Burkhardt, P. W., & Hibray, K. W. (2010). Object migration tool for data warehouses. International Journal of Strategic Information Technology and Applications (IJSITA), 1, 55–73.



Rahman, N., Marz, J., & Akhter, S. (2012). An ETL metadata model for data warehousing. Journal of Computing and Information Technology (CIT), 20, 95–111.

Rahman, N., Rutz, D., & Akhter, S. (2011). Agile development in data warehousing. International Journal of Business Intelligence Research (IJBIR), 2, 64–77.

Ramakrishnan, T., Jones, M. C., & Sidorova, A. (2012). Factors influencing business intelligence (BI) data collection strategies: An empirical investigation. Decision Support Systems, 52, 486–496.

Ramiller, N. C., & Swanson, E. B. (2009). Mindfulness routines for innovating with information technology. Journal of Decision Systems, 18, 13–26.

Rao, L., & Osei-Bryson, K. (2008). An approach for incorporating quality-based cost-benefit analysis in data warehousing design. Information Systems Frontiers, 10, 361–373.

Roberts, N., & Grover, V. (2012). Leveraging information technology infrastructure to facilitate a firm’s customer agility and competitive activity: An empirical investigation. Journal of Management Information Systems, 28, 231–269.

Rutz, D., Nelakanti, T. K., & Rahman, N. (2012). Practical implications of real time business intelligence. Journal of Computing and Information Technology (CIT), 20, 257–264.

Schmidt, M. T. (1999). The evolution of workflow standards. IEEE Concurrency, 3, 44–52.

Sen, A., Sinha, A. P., & Ramamurthy, K. (2006). Data warehousing process maturity: An exploratory study of factors influencing user perceptions. IEEE Transactions on Engineering Management, 53, 440–455.

Shanks, G., & Bekmamedova, N. (2012). Achieving benefits with business analytics systems: An evolutionary process perspective. Journal of Decision Systems, 21, 231–244.

Sheth, A. (1995). Workflow automation: Applications, technology and research. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 469–69). San Jose, CA, USA. ACM 0-89791-731-6/95/0005

Steiger, D. M. (2010). Decision support as knowledge creation: A business intelligence design theory. International Journal of Business Intelligence Research (IJBIR), 1, 29–47.

Tambe, P., & Hitt, L. M. (2012). The productivity of information technology investments: New evidence from IT labor data. Information Systems Research, 23, 599–617.

Watson, H. J. (2009). Tutorial: Business intelligence – Past, present, and future. Communications of the Association for Information Systems, 25, 487–510.

Weill, W., Subramani, M., & Broadbent, M. (2002). Building IT infrastructure for strategic agility. MIT Sloan Management Review, Fall, 2002, 57–65.

Weske, M. (2001). Formal foundation and conceptual design of dynamic adaption in a workflow management system. In Proceedings of the 34th Hawaii International Conference on System Sciences, Hawaii, USA, January 3–6, 2001.

Weske, M., Goesmann, T., Holten, R., & Striemer, R. (1999). A reference model for workflow application development processes. In Proceedings of the International Joint Conference on Work Activities Coordination and Collaboration (WACC’99) (pp. 1–10), San Francisco, CA, USA, February 22–25, 1999.

Weske, M., Goesmann, T., Holten, R., & Striemer, R. (2001). Analysing, modelling, and improving workflow application development processes. Software Process: Improvement and Practice, 6, 35–46. doi:10.1002/spip.134

White, C. (2005). Data integration: Using ETL, EAI and EII tools to create an integrated enterprise. The Data Warehousing Institute. Retrieved from http://tdwi.org/research/2005/10/bpr-3t-data-integration.aspx

Wixom, B. H., & Watson, H. J. (2001). An empirical investigation of the factors affecting data warehousing success. MIS Quarterly, 25, 17–41.

Xu, P., & Ramesh, B. (2002). Supporting workflow management with traceability. In Proceedings of the 35th Hawaii International Conference on System Sciences (pp. 1–10), Hawaii, USA.

Yu, J., & Buyya, R. (2005). A taxonomy of workflow management systems for grid computing. Journal of Grid Computing, 3, 171–200.
