Talend Inc. 800 Bridge Parkway, Suite 200, Redwood City, California 94065 US Tel: +1 (650) 539 3200
An Inside Look at How Top Companies Tackle Data Integration with Talend
Table of Contents
Introduction
NEW INTEGRATION PROJECTS
Groupon
CHALLENGES WITH LEGACY TECHNOLOGY
Epsilon
IGEPA Group/HRI ITS
COST CONCERNS
Newcastle University
Children’s Hospital & Medical Center of Omaha
TIMING & URGENCY
National Oceanic and Atmospheric Administration (NOAA)
Buffalo Studios
MAKING THE RIGHT CHOICE
AOL
CSIA (Air Information Systems Center)
Conclusion
Introduction
Anyone considering a new data integration project has faced challenges. There are
challenges that lead you to kick off the project, challenges involved when you start from
scratch, and challenges involved when you switch from a legacy system to new software.
But Talend customers know that the key to overcoming obstacles is having the right
solution in place.
In this paper, real businesses address some of the most common issues surrounding data
integration projects today and demonstrate how Talend has helped them emerge from
the process more agile and empowered—without breaking the bank.
NEW INTEGRATION PROJECTS
Beginning a new integration project can be simultaneously exhilarating and overwhelming.
On the one hand, you’re getting ready to handle all of that significant data. You’ll be able
to make use of it and realize new value. But on the other hand, where do you start? The
volume can be daunting.
The answer, as these Talend customers learned, is to implement easy‐to‐use technology
that makes the business more agile and responsive to change. We asked both companies
how they tackled their new integration projects with Talend, and what results they
achieved.
Groupon
Our exceptional growth—transitioning from a startup to a publicly‐
traded Internet giant in just a few years—placed our IT infrastructure
under considerable pressure. Every day we have to process and analyze more than 1
terabyte of raw data in real time, store this information in various database systems, and
use it to identify developments and trends as they emerge.
Every day, we run around 1,000 different data integration jobs involving Extract,
Transform and Load (ETL) processes. Some of these jobs run once a day, whereas others
run every hour or even more frequently. Talend’s integration solution loads data in
parallel to several databases. In addition to the main Teradata warehouse, Groupon also
deploys PostgreSQL and Exasol databases and a Salesforce.com Customer Relationship
Management (CRM) solution. The e‐mail marketing and Online Transaction Processing
(OLTP) systems are the most important data sources.
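As a rough illustration of loading the same data into several targets in parallel, the sketch below runs one load per target on its own thread. It is not Groupon's or Talend's implementation: the target names, the sample rows, and the use of in-memory SQLite databases as stand-ins for real warehouse systems are all assumptions made for the example.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows; a real job would read these from a source system.
ROWS = [(1, "deal_viewed"), (2, "deal_purchased"), (3, "email_opened")]

# Hypothetical target names; each one is represented by its own in-memory SQLite database.
TARGETS = ["warehouse", "reporting", "analytics"]


def load_target(name, rows):
    """Load the rows into one target and return (target name, rows loaded)."""
    conn = sqlite3.connect(":memory:")  # created and used inside this worker thread only
    try:
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        conn.commit()
        count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        return name, count
    finally:
        conn.close()


def load_all(targets, rows):
    """Run one load per target concurrently and collect the per-target row counts."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        results = pool.map(lambda target: load_target(target, rows), targets)
        return dict(results)
```

Calling `load_all(TARGETS, ROWS)` returns a dictionary that maps each target name to the number of rows it received, which makes it easy to check that every target got the full batch.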
Our Talend integration solution means that our data warehouse is always updated with
the latest information, giving us as precise an image of the current situation as possible.
The system is cost‐effective, easy to use, readily adaptable and extremely versatile. With
the help of the graphical user interface, we can easily and quickly link up a large number
of source systems using the standard connectors. To connect the Teradata database, we
had to develop an interface component ourselves, but that was an easy task for our
internal developers.
At the end of the day, information‐driven decisions can only ever be as good as the
underlying information, and our information is always very good—thanks in no small part
to Talend.
Virgin Mobile France
As a Mobile Virtual Network Operator (MVNO), we are in a very
competitive sector with more and more players, and we are all facing
important challenges, particularly in terms of accounting and marketing. Our customer
base reached one million subscribers in 2008, and their calls generate close to 5 million
invoice tickets per day. These volumes are continually increasing, so we must carefully
manage our rapid growth while still dealing with profitability and trying to know our
customers better.
We use Talend Enterprise Data Integration to extract data from our production databases,
transform this data, and integrate it in the data warehouse database. Organized according
to a star schema, this database allows efficient and fast selection and aggregation of
selected data to guarantee analytical accuracy.
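To make the star schema idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is an illustration only, not Virgin Mobile France's actual model: the table names (`dim_customer`, `dim_date`, `fact_calls`), the columns, and the sample values are all hypothetical. A central fact table holds the measures, and small dimension tables describe them, so an aggregation needs only a simple join and a `GROUP BY`.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and fill it with sample rows."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, plan TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
        CREATE TABLE fact_calls (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            minutes REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        [(1, "prepaid"), (2, "contract"), (3, "prepaid")],
    )
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, "day-1"), (2, "day-2")])
    conn.executemany(
        "INSERT INTO fact_calls VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 2.5), (1, 2, 7.5)],
    )
    conn.commit()


def minutes_by_plan(conn):
    """Aggregate the fact table by one dimension attribute (the customer's plan)."""
    return conn.execute("""
        SELECT c.plan, SUM(f.minutes)
        FROM fact_calls AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.plan
        ORDER BY c.plan
    """).fetchall()
```

After `build_star_schema(conn)` on a fresh connection, `minutes_by_plan(conn)` returns one row per plan with its total call minutes.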
Talend Enterprise Data Integration also helps us go beyond the limits of transactional
systems to offer our users high performance reporting services. With hindsight, we can
affirm that Talend offers the only open source data integration solution that is really
enterprise‐ready and which provides all the necessary features—a professional solution,
quality technical support, and a high degree of integration with MySQL and the rest of our
environment. It is quick and easy to adopt, and it offers many customization options.
Senior management now has a better sense of the business, and this allows them to
respond quickly to changing market conditions and to define priorities. And the marketing
team now has the information it needs to gain market share and grow customer loyalty.
The system makes the company more agile and is a real market differentiator.
CHALLENGES WITH LEGACY TECHNOLOGY
Just because you’re ready to begin a new integration project doesn’t mean you can simply
eliminate any existing technology in use. Managing legacy systems during this change can
be a project unto itself.
The good news is that the right data integration solution can greatly ease this challenge.
We asked these customers how Talend empowered them to tackle outdated processes and
ultimately build more cutting‐edge infrastructure—while boosting productivity and
collaboration.
Epsilon
As the leading marketing services firm in the United States, Epsilon
deals daily with extensive volumes of consumer and business data (around 430 million
records in all). Data typically comes to us with more than 800 attributes in over a dozen
different formats, so we used to perform a lot of tedious hand coding in order to
aggregate it. Talend helps us streamline this integration process.
We had prior experience in‐house with proprietary tools, but as we moved forward, we no
longer wanted to be tied to a closed solution’s restrictions. We wanted something that
was more formally focused on data integration. Talend answered that need, and it was
one of the easiest‐to‐use solutions we tested. Because in‐house programs used previously
were written in Java or Perl, our developers were already at home with the technology.
We quickly noted that Talend outperformed some of the other products we were testing.
However, the determining factor was that the project involved legacy code that we
needed to integrate into our build solutions. Because Talend makes it easy to use external
applications through the system or through the Java drivers, we can readily interface with
external processes. Basically, we’ve replaced or overlaid a lot of legacy technology with
Talend and it’s much easier to maintain.
As we scaled up our use of Talend, we opted to subscribe to Talend Enterprise Data
Integration, and it was well worth the investment. Beyond value added features for larger
projects, the Talend Enterprise Data Integration subscription also includes technical
support and IP indemnification.
If you work on many different systems, even for testing, the product is very efficient.
Instead of manually exporting your code over to many different systems, Talend
Enterprise Data Integration lets you launch that code and test it on different systems from
a single repository. It also facilitates reusability and makes teamwork pretty seamless.
IGEPA Group/HRI ITS
HRI ITS is the IT service provider for IGEPA Group, one of Europe’s leading paper
wholesalers, with more than 50,000 customers, 3,500 employees, and a product catalog
featuring 7,000 items. The Group essentially consists of independent companies
united by a common marketing strategy. At HRI ITS, we run a business intelligence (BI)
database for IGEPA Group that they use to support key executive decision making. We use
Talend to populate this database with key operational data.
Our challenges with legacy technology centered around scalability and speed. Before we
implemented Talend, we used Ascential software for the ETL process of extracting the
data, transforming it into a suitable format and loading it from the operational system to
the BI database. But Ascential was clearly starting to reach the limits of its capabilities.
Loading took far too long. On occasion, there simply were not enough hours in the night,
and we often had to contend with database errors and aborted jobs. Another downside
was that Ascential could not be virtualized, so it often took several hours to load a
customer database. I have actually had to abort jobs because the load rate was not rising
above 25 data records per second, which was unacceptably slow.
The slow performance combined with an ERP system that managed products at different
warehouses made it difficult for us to gain an accurate overview of product availability.
The only solution was to re‐upload the entire database every time—and that was simply
not possible because of time constraints. Talend now solves this problem by loading data
at a crazy speed. Uploading the whole database now takes as little as seven minutes.
Talend can also be fully virtualized and can run on all platforms. An ETL job is a Java
program in Talend, so it is also possible to distribute different jobs across multiple servers.
This allows my team to make optimum use of the available hardware resources at all
times. Meanwhile, the Talend Administration Center (TAC) helps us plan and allocate the
jobs efficiently.
What makes Talend so cool—and what truly helped ease the migration from our legacy
solution—was that it was both powerful and familiar. Talend resembled our old Ascential
solution in structure, which meant that we were able to find our way around the new
environment very quickly. And since Talend is based on Eclipse, we were even able to fall
back on in‐house Java programmers in some instances. This meant that it only took one
person six months to migrate 440 ETL jobs for IGEPA.
But IGEPA didn’t stop on completion of the planned 440 jobs. With further analysis, we
saw potential for another 538 Talend jobs. Our old solution lacked the power to handle
these, given the frequent open/close commands for the ODBC drivers. Talend not only
significantly accelerates these processes but also virtually eliminates the burden on the
databases with intelligent open/close control functionality.
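The cost of frequent open/close commands can be sketched with a small comparison. This is a generic illustration, not a description of how Talend's open/close control works internally; the `items` table, the row data, and the use of a temporary SQLite file in place of an ODBC data source are assumptions made for the example. The first function opens and closes a connection for every record, while the second opens one connection and reuses it for the whole batch.

```python
import os
import sqlite3
import tempfile


def load_per_record(db_path, rows):
    """Naive pattern: open and close a connection for every single record."""
    opens = 0
    for row in rows:
        conn = sqlite3.connect(db_path)
        opens += 1
        conn.execute("INSERT INTO items VALUES (?, ?)", row)
        conn.commit()
        conn.close()
    return opens


def load_with_reuse(db_path, rows):
    """Reuse pattern: open once, insert the whole batch, commit, close once."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()
    return 1


def demo():
    """Load the same 100 hypothetical rows both ways and report the connection counts."""
    rows = [(i, f"item-{i}") for i in range(100)]
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE items (id INTEGER, name TEXT)")
        setup.commit()
        setup.close()
        naive_opens = load_per_record(db_path, rows)
        reuse_opens = load_with_reuse(db_path, rows)
        check = sqlite3.connect(db_path)
        total = check.execute("SELECT COUNT(*) FROM items").fetchone()[0]
        check.close()
    return naive_opens, reuse_opens, total
```

Both functions load the same rows, but the naive version pays the connection setup and teardown cost once per record, which is the kind of overhead the paragraph above describes.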
Our entire team is already looking forward to seeing the end of the old system, which
caused us considerable problems. For me, Talend is a home run.
COST CONCERNS
Naturally, cost is another substantial issue that arises when companies look to implement
a new data integration solution. Budgets and resources can vary tremendously based on
the type or size of business—and all organizations want to be as efficient as possible
without compromising on performance or features.
We asked these Talend customers how they felt restricted by cost concerns during their
search for a data integration platform, and how they believed Talend lived up to their
expectations for a more affordable infrastructure.
Newcastle University
Newcastle University is one of the UK’s leading institutions of higher
education. Our major challenge was being able to input data relating to our staff
members, students, the library and the university as a whole, while ensuring data
consistency and accuracy across the board. It had become apparent that this was an
impossible job for the university to carry out alone and a decision was made to seek a cost
effective, yet reliable and scalable tool.
After we found it ticked all our boxes for scalability, real‐time capabilities, and
compatibility with all our legacy systems, we decided to try Talend Enterprise Data
Integration. Because it is open source, costs can be kept extremely low, which is essential
for the university. In addition, being open source means the product can be fully
implemented within days or weeks rather than the months that proprietary software
often requires, and that helps save money, too.
Data quality has been an evident benefit, and we are now confident that all of our data is
accurate and consistent due to Talend’s scheduling capability, meaning that jobs can be
run automatically at any time. Not only does this mean that we now have a round‐the‐
clock system processing data instantly, we have also eliminated the chance of human
error and can allow staff members to focus on their core teaching.
Working with Talend opened up a host of new opportunities for the university, as we can
continuously utilize the open source community in order to rectify issues such as bugs and
make strategic improvements to our system. Effectively, this helps us be at the forefront
of education, providing our students and staff with the best resources.
Children’s Hospital & Medical Center of Omaha
Our facility is the only full‐service pediatric specialty health care
center in Nebraska. We have cared for children since 1948, and no
child in need of medical care is ever turned away because of an inability to pay.
Our IT department is responsible for the integrity of data extracted from multiple sources
across the organization, including registration, patient billing, financials, and electronic
medical record (EMR) elements. We need to aggregate that data and load it into our data
warehouse for business intelligence purposes—but first we have to normalize data,
without actually consolidating it in a secondary data store. This has to happen in a timely
manner, with any possible error notifications received immediately, so people can act on
information quickly. For example, our ambulatory EMR produces documents that need to
be added to our legal medical record. On occasion, some of those documents error out
before being filed in the patient’s chart in Chartmaxx. Since this is clinical information that
may impact clinician decision‐making, it is very important to make sure the documents are
filed to the appropriate patient’s record and in a timely manner to support quality patient
care.
Obviously, we needed a tool with enough flexibility and robust support features to allow
us to manage data across the organization more efficiently and accurately. As a non‐profit
children’s hospital, we don’t have a large budget for our integration tools. The price point
was a major factor in our consideration of Talend. And fortunately, the platform’s
affordability did not come at the expense of the functionality we need.
Talend was leading edge as far as development, maintenance, extracts, and tools for
integrating different systems. We switched to Talend a few years ago, and since then have
rewritten all our ETL jobs. We get error notifications right away and can discern where
processes have failed. Talend’s support for heterogeneous systems allows us to identify
uniqueness in the data between different systems and transform it into normalized data,
virtually, with data validation occurring inside the data stream. This makes data processing
much faster and more efficient while reducing errors.
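The idea of validating data inside the stream, with immediate error notification and no secondary data store, can be sketched as a generator. This is only an assumption about the general pattern, not the hospital's or Talend's actual job design; the field names (`patient_id`, `source`) and the `notify` callback are hypothetical.

```python
def normalize_stream(records, notify):
    """Yield normalized records one at a time; report invalid ones without stopping."""
    for record in records:
        source = record.get("source", "unknown")
        raw_id = record.get("patient_id")
        patient_id = "" if raw_id is None else str(raw_id).strip()
        if not patient_id:
            # Notify right away so someone can act on the failed record.
            notify(f"missing patient_id in record from {source}")
            continue
        # Normalize the identifier so records from different systems can be matched.
        yield {"patient_id": patient_id.upper(), "source": source}
```

Because the function is a generator, each record is validated and normalized as it passes through, and nothing is staged in an intermediate table. Passing `errors.append` as `notify` collects the messages in a list; a real job would more likely send an alert.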
TIMING & URGENCY
Change can be tough. On the one hand, you have systems or processes in place that
simply aren’t working and need to be corrected. At some point, the urgency of the
situation demands action—but how do you know when to take the leap and begin your
integration project?
These Talend customers understood that timing was critical to the success of their projects.
We asked them how Talend helped answer the “Why now?” question.
National Oceanic and Atmospheric Administration (NOAA)
Dating back to 1807, NOAA is a division of the Department of
Commerce and reaches from the surface of the sun to the depths of
the ocean floor to keep citizens informed of the changing
environment around them. We gather data from our scientists and satellites that provide
detailed information on water levels and weather, so we can inform our consumers, the
Department of Commerce, fisheries and other related parties. The information we deliver
is used to help citizens, planners, and emergency managers make critical decisions fast—
so urgency is always there.
Our problem was with two existing Sybase databases that, despite having some
overlapping functionality, were not integrated. Data feeds from our satellites and
scientists are automatically fed into these databases, along with water level
measurements and water currents. While the data size wasn’t massive, the feeds from
satellites are unusually complex due to different data models that are difficult to
transform and load. And the lack of integration between the databases was producing
duplicate data that scientists couldn’t trust. This was unacceptable for our organization,
and we knew we needed to integrate the databases quickly to ensure accuracy.
We brought in Project Performance Corporation, a technology service group that works
with an international energy and environmental consultancy, to help us find the right
solution for our needs. We knew we would be replacing a legacy solution, so we not only
needed the right features but we needed a smooth transition and a short learning curve.
Talend delivered.
With Talend, NOAA has not only integrated data between the two databases, but we have
dramatically reduced maintenance costs and now have automated data consistency
across both Sybase systems. We needed to act quickly and effectively for the sake of our
constituents, and we succeeded.
Buffalo Studios
Buffalo Studios, a subsidiary of Caesars Interactive Entertainment
(CIE), creates fun and accessible social casino games like Bingo Blitz,
the world’s largest online free‐to‐play bingo game. We have millions
of monthly active users. This popularity can provide huge business opportunities—but
only to those that understand how to use data effectively.
In the case of Bingo Blitz, we track things like cards played per game, whether or not bingo
is achieved, how many credits are earned and spent, and so on. Every bit of data we
collect is vital. But before this data can be of value to us, we have to turn our data into
actionable insight. That’s why it’s critical that our data is accurate and that it gets into our
data warehouse promptly.
This used to be a big problem for us as we were dealing with tedious, manual processes,
where the data team ended up becoming a bottleneck. New, important data would get
stuck in the pipeline or never even make it past the raw logs. This eventually led the
Business Intelligence team to think that the data just wasn’t there even though we were
collecting it! It just never made it to the warehouse because the flow was so cumbersome.
Events never got added.
We knew it was time to find a solution to fix this situation fast.
When we started looking at vendors, we immediately saw that Talend offered a more
mature and full‐featured development environment than any other solution we were
evaluating. We liked the fact that Talend’s data integration solution was Java‐based
because that let us leverage existing skill sets in our team. And there was so much more
flexibility. We could use a lot of the Talend solution right out of the box, in conjunction
with writing our own custom Java code.
Implementation was quick and painless. With the help of Artha Data Solutions, a Talend
Consulting Partner, we went from concept to production in a few short months. And,
because no one in the company had any previous experience using Talend, we also opted
to utilize online training. That definitely helped bump our engineers up in the learning
curve.
Talend helped us achieve the flexibility we previously lacked. We can write our custom
Java components, where we have specific parsing needs, and just wrap them inside
Talend. We are no longer a bottleneck for adding new data instrumentation points. Our
developers can simply work with the tools and not worry about having to make manual
changes under the hood.
These benefits are huge for us. If we cannot get critical data to the Business Intelligence
group, people are essentially flying blind. We need to measure effectiveness of our
features—whether positive or negative—in near real time. The easier it is for us to
instrument the data, the better the BI team can actually use the data. Talend is helping us
streamline the process, achieve more agility, and get more value from our data. We can
focus on building innovative new technology again. That’s what drives customer growth
and loyalty.
MAKING THE RIGHT CHOICE
When you have at last decided to move forward with a new data integration
solution, you are still tasked with choosing the right product for your
organization. This means not only understanding which features you need to solve current
challenges, but recognizing which vendor provides those features in the best possible
package. There are a lot of factors that come into play—everything from licensing costs to
the need for open source tools.
We asked these Talend customers why Talend stood apart from the crowd for them, and
what the primary drivers were behind their decisions.
AOL
AOL is a global ad‐supported Web company, with a comprehensive
display advertising network, a suite of popular Web brands and products, and a leading
social media network.
A few years ago, we decided that by standardizing technology, we would reduce the need
for different integration tools and maximize efficiency among our different development
teams. We began by standardizing our front‐end technologies on Apache/Tomcat and our
back‐end technologies on MySQL. Next, we tackled data integration solutions and ETL
tools.
We had been using a combination of Kettle (Pentaho), eMule, and proprietary tools. We
also had custom development which created problems related to reuse and consistency.
After an evaluation, we decided we needed a new solution that would meet a number of
criteria. And we were sure that we wanted an open source product. As a company we're
moving away from commercial licensing whenever possible as it’s not a scalable business
model.
Our selection came down to a number of key requirements. First, performance was
critical: we needed a fast process to handle small files as well as large files, both simple
and complex. Ease‐of‐use was also important, and we liked Talend's graphical
environment. It's really a tool that doesn't need a developer to operate. Extensibility was
also important. We wanted to be able to go under the hood and develop additional
functionalities if necessary. And finally, robustness was very important as data integration
processes are critical to our business and we needed a reliable product. Talend met these
requirements and was the right choice for AOL.
CSIA (Air Information Systems Center)
With origins dating back to World War II, the CSIA is the French Air
Force’s center of expertise for information systems. Like a
software and computer services company, we cover the entire lifecycle of a software
product: user assistance during IT projects (customer support), software development
(project management) and application hosting.
One of our departments also manages business intelligence (BI) activities, developing
hierarchical dashboards covering a vast range of business data. Part of this process is the
“REPAIR” (Air Management Repository) project, a BI data warehouse for air activity
monitoring, aircraft technical availability monitoring, maintenance activity monitoring, and
Finance and HR monitoring. This warehouse makes use of an Oracle database and was
modeled in accordance with good BI practices. Initially, each application was connected to
an information center, but the system’s performance was not satisfactory.
We started by evaluating open source tools, including Talend Open Studio, for data
integration purposes. We compared its performance with Oracle Data Integrator and
ultimately chose Talend Open Studio for Data Integration to complete testing for a minor
project—because the ability to download the tool for free allowed us to do so quickly,
without impacting our budget. Not only was its performance extremely satisfactory, but it
also was very easy to learn how to use. In addition, the tool provides a comprehensive
view of our jobs, while the Oracle solution only offered a single view of flows. Finally, the
Java orientation of the tool is a better match to our developers’ skills than Oracle’s
proprietary language.
Following the success of this first project, we then decided to implement Talend
Enterprise Data Integration, so we could further industrialize our data integration
approach and benefit from the solution’s collaborative development and centralized
control features. The implementation and use of Talend Enterprise Data Integration
proved to be more economical than Oracle Data Integrator, because Talend’s invoicing and
licensing are based on the number of developers, whereas Oracle’s are based on the
number of sources and targets.
In a nutshell, we chose Talend because it provided, at a lower cost, a similar or even
superior technical performance to other major vendors. Talend proved itself to us. We no
longer have to worry about the sustainability of our system since we have the skills
necessary for its administration and free access to the solution source code for any
developments.
Conclusion
Real businesses face real challenges every day when it comes to managing and effectively
using data. Deciphering the choices that are available is often not an easy task, especially
when you’re up against tight budgets or deadlines. The companies that have shared their
stories here provide genuine insight into the issues that affect how to choose the best
data integration platform—and when to implement it. Clearly, Talend has assisted—and
continues to assist—these businesses and organizations with tackling their data
integration challenges in the most empowering, cutting‐edge, and economical ways
possible.
For more information on how Talend can help your company solve its data integration
issues, please call your Talend Representative or visit us online at talend.com/contact.
WP193‐EN