nov 2010 hug: business intelligence for big data
© 2010, Pentaho. All Rights Reserved. www.pentaho.com.
Business Intelligence for Big Data
James Dixon, Chief Geek
August, 2010
© 2010, Pentaho. All Rights Reserved. www.pentaho.com. US and Worldwide: +1 (866) 660-7555 | Slide
Business Intelligence = reports, dashboards, analysis,
visualization, alerts, auditing
It might be a self-selecting audience since we are a Business Intelligence company, but upwards of 90% of the companies we talk to are using, or plan to use Hadoop to transform structured or semi-structured data - with the aim of then analyzing, investigating and reporting on the data.
Hadoop and BI
Example Hadoop Cases Today
Transactional
• Fraud detection
• Financial services/stock markets
Sub-Transactional
• Weblogs
• Social/online media
• Telecoms events
* Not many companies have transactional data that qualifies as Big Data. Credit card companies and financial services companies are about it.
* With stock market data we are talking about every stock trade and the bid and ask prices between the transactions, for every stock on multiple markets for a significant time period.
For many other companies the Big Data is sub-transactional: it is the events that lead up to transactions.
* Weblogs are semi/badly structured. Consider the number of weblog entries created as you look for a book online - researching 5-10 books, reading reviews and comments. You might generate 1000 entries and may or may not buy a book - potentially lots of entries for no transaction. We also want to enrich this data with metadata about the URLs and information about the location of the user.
* In an online game or world every interaction between participants and the system and between each other is logged. An individual participant might generate > 1 million events for their 1 monthly transaction
* A single phone call or text message generates many events within a telecoms company
Example Hadoop Cases Today
Non-Transactional
• Web pages, blogs etc.
• Documents
• Physical events
• Application events
• Machine events
In most cases structured or semi-structured
* In addition to transactional and sub-transactional data there is also non-transactional data. Some of this data is human-generated and some of it is machine-generated.
* People generate lots of content that companies are interested in - web pages, blogs, and comments
* Physical events include data such as weather data. If you take the combined output of the weather-sensing instruments deployed today you get Big Data
* Many software applications log events as they execute, as do machines such as production line machinery
TRANSITION
In the majority of these cases the data is structured or semi-structured.
LEAD-IN
What do we have in common between these use cases? How can we describe these Big Data scenarios?
Data Lake
• Single source
• Large volume
• Not distilled
• Can be treated
In most of these cases we are dealing with a single source of data.
We are, we know, dealing with a large volume of data.
We are also dealing with data that is not aggregated or summarized. It’s not ‘distilled’ in any way.
It is a large body of data. The data can be raw data or might be treated in some way, treated within the lake or on its way into the lake. For example weblog entries might be geocoded and enriched with metadata.
So we are calling these things Data Lakes.
Data Lakes
• 0-2 lakes per company
• Known and unknown questions
• Multiple user communities
• $1-10k questions, not $1m ones
• Don’t fit in a traditional RDBMS at a reasonable cost
There are some other interesting attributes of these data lakes
* If there is a data lake at all in a company there is usually only one. Some companies, such as those in financial services, might have two, but any more than this is very rare.
* In most cases we have some questions of this data that are known ahead of time. But we also have questions of the data that cannot be anticipated.
* We also frequently have different user communities that want access to the data. In the example of weblogs we have sales and marketing departments that want to know about the behavior of visitors and the volume of traffic on the site, maybe for different geographies. We also have the IT department that wants information about throughput and load on the server for capacity planning.
* In general most of the questions about the data are not million dollar questions, they are $1k to $10k questions. Because no one user or group has a million dollar question, no-one has a million dollar budget to solve the problem.
* Additionally, this amount of data does not fit into a database, either because the data physically will not fit or because the cost of making it fit is out of reach economically.
Data Lake Requirements
• Store all the data
• Satisfy routine reporting and analysis
• Satisfy ad-hoc query / analysis / reporting
• Balance performance and cost
If we look at the requirements of these data lakes we also see common ground:
* We want to store all of the data because we don’t know all the questions we have of the data. If we did know, we’d only have to keep a subset of the data.
* We still want to satisfy all of the traditional BI reporting and analysis needs.
* We need to provide the ability to dip into the lake at any time to ask any question of the data:
- In some cases we want to extract a slice of data from the lake for detailed analysis. Let’s say I’m in charge of pricing and promotions for a company and this week I’m looking at a particular region or a particular product. I want to select a subset of the data from the lake, summarized to some level, with attributes that I want to analyze. I want to slice and dice this data for a few hours or days, and then move on to my next region or product. In this case we are creating a short-lived data mart from the data lake.
- In other cases we know exactly the data we are looking for and don’t need to explore it. In this case we define the attributes of the data that we want and we get query results back.
* We also want to balance cost and performance. Big Data solutions are cheaper per terabyte than other solutions, but do not have the same level of performance. We want a system where we can selectively improve the performance of the data that we care the most about, and still have access to the entire data set any time we need it.
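As a rough illustration of the short-lived data mart described above, the following Java sketch selects one region from a small in-memory stand-in for the lake and summarizes it to product level. The `Sale` record shape, its field names, and the `sliceByRegion` helper are assumptions made for this example only; a real lake would be read from Hadoop, not from a Java list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LakeSlice {
    // One raw, un-aggregated record in the lake (hypothetical fields).
    static final class Sale {
        final String region;
        final String product;
        final double amount;

        Sale(String region, String product, double amount) {
            this.region = region;
            this.product = product;
            this.amount = amount;
        }
    }

    // Selects the rows for one region and summarizes them to product level,
    // which is the "slice" that would populate a temporary data mart.
    static Map<String, Double> sliceByRegion(List<Sale> lake, String region) {
        Map<String, Double> mart = new TreeMap<>();
        for (Sale s : lake) {
            if (s.region.equals(region)) {
                mart.merge(s.product, s.amount, Double::sum);
            }
        }
        return mart;
    }

    public static void main(String[] args) {
        List<Sale> lake = new ArrayList<>();
        lake.add(new Sale("west", "widget", 10.0));
        lake.add(new Sale("west", "widget", 5.0));
        lake.add(new Sale("west", "gadget", 2.5));
        lake.add(new Sale("east", "widget", 7.0));
        // Prints {gadget=2.5, widget=15.0}
        System.out.println(sliceByRegion(lake, "west"));
    }
}
```

The lake itself is left untouched, so the same raw records remain available for the next region or product.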
LEAD-IN
Since we are introducing a new term, ‘Data Lake’, we need to explain how it is different from a traditional BI system.
Traditional BI
[Diagram: Data Source, Data Mart(s), Tape/Trash, with question marks for the questions that cannot be answered]
In a traditional BI system where we have not been able to store all of the raw data, we have solved the problem by being selective.
Firstly we selected the attributes of the data that we know we have questions about. Then we cleansed it and aggregated it to transaction levels or higher, and packaged it up in a form that is easy to consume. Then we put it into an expensive system that we could not scale, whether technically or financially. The rest of the data was thrown away or archived on tape, which, for the purposes of analysis, is the same as throwing it away.
TRANSITION
The problem is we don’t know what is in the data that we are throwing away or archiving. We can only answer the questions that we could predict ahead of time.
What if...
[Diagram: Data Source, Data Lake(s), Data Mart(s), Data Warehouse, Ad-Hoc, Tape/Trash]
But what if, instead of sampling the data and throwing the rest away
TRANSITION
We pour all of the data into a Data Lake
TRANSITION
And then create whatever data marts we need from the Data Lake
TRANSITION
And also provide the ability to extract data from the Data Lake on an ad-hoc basis
TRANSITION
And also provide the ability to extract data from the Data Lake to feed into a data warehouse
Big Data Architecture
[Diagram: Data Source, Data Lake(s), Data Mart(s), Data Warehouse, Ad-Hoc]
This, then, is our Big Data architecture.
As well as pouring data from the source into the Data Lake, we can also take our archive tapes and pour them into the lake, giving us a huge amount of historical data.
Does this meet our requirements?
TRANSITION
We are storing all of the data, so we can answer both known and unknown questions.
TRANSITION
We are satisfying our standard reporting and analysis requirements by putting the most commonly requested data into data marts.
TRANSITION
We are satisfying ad-hoc needs by providing the ability to dip into the lake at any time to extract data. This extracted data might be used to populate a temporary data mart, it might be used as the input for a specialized visualization tool, or it might be used by an analytical application.
TRANSITION
We are meeting the need to balance performance and cost by allowing you to choose how much data is staged in high-performance databases for fast access, and how much data is available from the Data Lake only.
Does Big Data Replace Data Marts?
• If it is a database
• If it has low latency
Hadoop (to date)
• Databases are immature
• Databases are no-SQL
Why Hadoop and BI?
• Distributed processing
• Distributed file system
• Commodity hardware
• Platform independent (in theory)
• Scales out beyond technology and/or economy of an RDBMS
In many cases it’s the only viable solution
* For the purposes of BI, the parallel processing and distributed storage of Hadoop, along with its scale-out architecture using commodity hardware, is attractive.
* Since Hadoop is written in Java it is, theoretically, platform-independent. At this point, due to some dependencies, it is only recommended for Linux/Unix.
* And because these factors allow it to scale with better price/performance characteristics than databases...
TRANSITION
... in many cases it’s the only viable solution
LEAD-IN
So are there any downsides to Hadoop for BI use cases?
Hadoop and BI?
90% of new Hadoop use cases are transformation of semi/structured data*
* of those companies we’ve talked to...
It might be a self-selecting audience since we are a Business Intelligence company, but upwards of 90% of the companies we talk to are using, or plan to use Hadoop to transform structured or semi-structured data - with the aim of then analyzing, investigating and reporting on the data.
“The working conditions within Hadoop are shocking”
ETL Developer
Hadoop and BI?
Unfortunately for developers who are used to working with data transformation tools, the productivity within the Hadoop environment is not what they are used to.
Hadoop and BI?
Instead of this...
Instead of a graphical UI with palettes of data transformation operations to string together in a way that is easy to understand, easy to trace, and easy to explain...
Hadoop and BI?
public void map( Text key, Text value, OutputCollector output, Reporter reporter)
public void reduce( Text key, Iterator values, OutputCollector output, Reporter reporter)
You have to do this...
In Hadoop we have two Java functions - Map and Reduce - that need to be implemented. These functions are part of the MapReduce processing engine mentioned earlier.
Mapping and reducing are important functions in a data transformation engine. Unfortunately, there are many other operations that we need to perform on our data.
Hadoop does not include a comprehensive suite of data transformation operations.
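To make the two functions concrete, here is a minimal sketch of the map and reduce pattern in plain Java, counting weblog hits per URL. It deliberately avoids the Hadoop classes so that it runs on its own: the `WeblogHitCount` class, the `run` driver that stands in for the framework's grouping step, and the space-separated log format are all assumptions made for this illustration, not Hadoop's API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WeblogHitCount {
    // "Map" step: turn one raw weblog line into a (url, 1) pair.
    // Assumes a hypothetical format: "<ip> <url> <status>".
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        return new AbstractMap.SimpleEntry<>(fields[1], 1);
    }

    // "Reduce" step: sum all of the counts emitted for a single key.
    static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Stand-in for the framework: group map output by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "10.0.0.1 /books/1 200",
            "10.0.0.2 /books/2 200",
            "10.0.0.1 /books/1 200");
        // Prints {/books/1=2, /books/2=1}
        System.out.println(run(log));
    }
}
```

Anything beyond counting, such as a lookup, a join, or a dedupe, has to be hand-coded inside these same two functions, which is the productivity gap the slide is pointing at.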
To understand how we ended up in this situation we need to take a brief look at the history of Hadoop
MapReduce Limitations
Doing everything with MapReduce is like doing everything with recursion.
You can, but that doesn’t mean it’s the best solution
MapReduce Limitations
Not a scalable name...
What’s next?
MapReduceLookupJoinDenormalizeUpdateDedupeFilterCalcMergeAppend
Google’s Use Case
• Needed to index the internet
• Huge set of unstructured data
• Predetermined input
• Predetermined output (the index)
• Predetermined questions
• Single user community
• Needed parallel processing and storage
Their answer was MapReduce (MR)
The trail starts with Google. Google wanted to index the internet.
* This is clearly a big data set, and also an unstructured data set.
* Before they set out, Google knew what their data set was.
* They knew how they wanted to process the data: to create an index.
* They knew the questions they wanted to ask of the data: given some keywords, what are the most relevant web pages?
* They had a single user community: the set of people trying to search the internet.
* In order to solve this problem they needed a scalable architecture with distributed storage and parallel processing.
TRANSITION
Their answer was to use MapReduce.
Yahoo’s Use Case
Their answer was Hadoop (w/ MapReduce)
• Needed to index the internet
• Huge set of unstructured data
• Predetermined input
• Predetermined output (the index)
• Predetermined questions
• Single user community
• Needed parallel processing and storage
Next along the trail is Yahoo. Yahoo’s requirements were very similar, in fact almost identical, to Google’s.
* The exact same data set
* The same input format
* The same output
* The same questions
* From the same population
* With the same scalability requirements
TRANSITION
Yahoo’s answer was Hadoop, which includes a MapReduce engine.
LEAD-IN
So how do these requirements compare with the current, BI-specific use cases?
Current Use Cases
• Not indexing the internet ✗
• Huge set of semi/structured data ✗
• Different input source and format ✗
• Different outputs ✗
• Different questions ✗
• Multiple user communities ✗
• Need parallel processing and storage ✓
* No-one is indexing the internet - that is not a BI use case
* In most cases we have structured or semi-structured data, not unstructured
* In each use case the data source is different, so the format of the data is different
* In each case the output is not an index, it is a variety of data sets, data feeds, and reports
* In each case the questions of the data are different, and the questions cannot all be predicted
* In most cases we have multiple user communities with different needs and questions
* In each case the volume of the data is such that we need a scalable architecture with distributed storage and parallel processing
When we compare these scenarios with the purpose for which Hadoop was created we see that
TRANSITION
There is not much overlap between the Big Data needs of BI, and the original intent of Hadoop
LEAD-IN
The realization here is that...
Unfortunately Hadoop wasn’t designed for most BI requirements
Hadoop’s Strengths and Weaknesses
• Distributed processing
• Distributed file system
• Commodity hardware
• Platform independent (in theory)
• Scales out beyond technology and/or economy of an RDBMS
But...
• Not designed for BI
No-SQL and BI
BI Tools Need...
Structured Query Language
BI Tools Don’t Need
• CREATE / INSERT
• UPDATE
• DELETE
• (only Read needed)
• No ACID transactions
Mondrian (OLAP) Needs
Required:
• SELECT
• FROM
• WHERE
• GROUP BY
• ORDER BY
Nice to have:
• HAVING
• ORDER BY ... NULLS COLLATE
• COUNT(DISTINCT x,y)
• COUNT(DISTINCT x), COUNT(DISTINCT y)
• VALUES (1,’a’), (2,’b’)
Why not add to Hadoop the things it’s missing...
... until it can do what we need it to?
If only we had a Java, embeddable,
data transformation engine...
Hadoop Architecture
[Diagram: Clients (Java / Python) and Hadoop Common. Map/Reduce: a Job Tracker with three Task Trackers. Filesystem (HDFS, S3...): a Name Node with three Data Nodes.]
Pentaho Data Integration
[Diagram: Pentaho Data Integration (Design, Deploy, Orchestrate) alongside Hadoop, feeding Data Marts, Data Warehouse, Analytical Applications]
Fortunately we have an embeddable data integration engine, written in Java.
We have taken our Data Integration engine, PDI, and integrated it with Hadoop in a number of different areas:
* We have the ability to move files between Hadoop and external locations
* We have the ability to read and write to HDFS files during data transformations
* We have the ability to execute data transformations within the MapReduce engine
* We have the ability to extract information from Hadoop and load it into external databases and applications
* And we have the ability to orchestrate all of this so you can integrate Hadoop into the rest of your data architecture with scheduling, monitoring, logging, etc.
[Diagram: Big Data pyramid. Layers, bottom to top: Hadoop (Files / HDFS, Hive), RDBMS (DM & DW), Web Tier (Applications & Systems, Reporting / Dashboards / Analysis). Labels: Load, Optimize, Visualize.]
If we put this into diagram form, so that we can indicate the different layers in the architecture and also show the scale of the data, we get this Big Data pyramid.
* At the bottom of the pyramid we have Hadoop, containing our complete set of data.
* Higher up we have our data mart layer. This layer has less data in it, but has better performance.
* At the top we have application-level data caches.
* Looking down from the top, from the perspective of our users, they can see the whole pyramid - they have access to the whole structure. The only thing that varies is the query time, depending on what data they want.
* Here we see that the RDBMS layer lets us optimize access to the data. We can decide how much data we want to stage in this layer. If we add more storage in this layer, we can increase performance for a larger subset of the data lake, but it costs more money.
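The "whole pyramid is visible, only the query time varies" idea can be sketched as a lookup that tries the fastest layer first and falls through to the lake. The `TieredStore` and `Tier` classes, the tier names, and the string keys below are hypothetical names invented for this example; the sketch only shows which layer would answer a request, not how any product implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TieredStore {
    // One layer of the pyramid: a name plus the data currently staged in it.
    static final class Tier {
        final String name;
        final Map<String, Long> data = new HashMap<>();

        Tier(String name) {
            this.name = name;
        }
    }

    // Ordered fastest first: application cache, then RDBMS, then Hadoop.
    private final List<Tier> tiers = new ArrayList<>();

    TieredStore(Tier... fastestFirst) {
        tiers.addAll(Arrays.asList(fastestFirst));
    }

    // Returns the name of the fastest tier that holds the key, or "none".
    String answeredBy(String key) {
        for (Tier t : tiers) {
            if (t.data.containsKey(key)) {
                return t.name;
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        Tier cache = new Tier("cache");
        Tier rdbms = new Tier("rdbms");
        Tier hadoop = new Tier("hadoop");
        // The lake holds everything; only the hot month is staged in the RDBMS.
        hadoop.data.put("hits:2010-07", 900L);
        hadoop.data.put("hits:2010-08", 1000L);
        rdbms.data.put("hits:2010-08", 1000L);

        TieredStore store = new TieredStore(cache, rdbms, hadoop);
        System.out.println(store.answeredBy("hits:2010-08")); // prints rdbms
        System.out.println(store.answeredBy("hits:2010-07")); // prints hadoop
    }
}
```

Staging more keys in the `rdbms` tier is the code-level equivalent of buying more storage in that layer: more requests are answered quickly, at a higher cost.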
[Diagram: the pyramid again (Hadoop: Files / HDFS, Hive; RDBMS: DM & DW; Web Tier: Applications & Systems, Reporting / Dashboards / Analysis), with Metadata alongside the layers and PDI marked between them]
We are able to provide this data architecture because we have metadata about every layer in the architecture.
We used Pentaho Data Integration to move data into Hadoop, and to process data within Hadoop, and as a result we have metadata about the data within Hadoop.
We also use PDI to create the data marts and extracts from Hadoop, so we have metadata about those as well.
[Diagram: the pyramid (Hadoop, RDBMS, Web Tier, Applications & Systems, Reporting / Dashboards / Analysis) with the Data Lake shown inside Hadoop]
If we compare this diagram to our other Big Data diagram we see how it fits together.
TRANSITION
Our Data Lake sits within Hadoop
TRANSITION
Our neatly packaged data mart and DW extracts feed into the database layer. Data from here can get to users very quickly.
TRANSITION
Our ad-hoc queries and ad-hoc data-marts come directly from the Data Lake
[Diagram: Big Data pyramid. Layers, bottom to top: Hadoop (Files / HDFS, Hive), RDBMS (DM & DW), Web Tier (Applications & Systems, Reporting / Dashboards / Analysis). Labels: Load, Optimize, Visualize.]
This, then, is our big data architecture.
It’s a hybrid architecture that enables you to blend Hadoop with other elements of your data architecture, and with whatever amount of database storage you think necessary.
The blend of Hadoop and other technologies is flexible and easy to tweak over time.
[Diagram: demo flow across the pyramid: Hadoop (HDFS, Hive), RDBMS (DM), Web Tier, Reporting / Dashboards / Analysis]
In this demo we will show how easy it is to execute a series of Hadoop and non-Hadoop tasks. We are going to:
TRANSITION 1 Get a weblog file from an FTP server
TRANSITION 2 Make sure the source file does not already exist within the Hadoop file system
TRANSITION 3 Copy the weblog file into Hadoop
TRANSITION 4 Read the weblog and process it - add metadata about the URLs, add geocoding, and enrich the operating system and browser attributes
TRANSITION 5 Write the results of the data transformation to a new, improved, data file
TRANSITION 6 Load the data into Hive
TRANSITION 7 Read an aggregated data set from Hadoop
TRANSITION 8 And write it into a database
TRANSITION 9 Slice and dice the data with the database
TRANSITION 10 And execute an ad-hoc query into Hadoop
Demo
FAQ
1. Will Pentaho contribute to Apache’s Hadoop projects? Yes
2. Will Pentaho distribute Hadoop as part of their product? Unlikely
3. What version of Hadoop will be supported? Initially 20.2
4. Will Pentaho’s APIs allow existing open source APIs to be used in parallel? Yes
1. Any changes Pentaho makes to the Apache code will be contributed to Apache.
2. Pentaho does not plan to provide its own distribution of Hadoop or to provide anyone else’s distribution as part of our products. If we need to provide binary patches while we wait for our contributions to be accepted by the Hive developers, we will do so, but this will be a temporary situation only.
3. We are looking into support for version 20.0 as well.
4. We are not modifying or disabling any Hadoop APIs so any existing MapReduce tasks will work as they did before
FAQ
5. Will Pentaho provide support or services to help set up Hadoop? Yes, no, maybe
6. What are the requirements to be in the Pentaho Hadoop beta program? Requirements, be serious, have started already, etc.
5. Hadoop is a data source for Pentaho, just as any filesystem, FTP, web service or database is. We don’t directly provide support for these third-party services. We recognize that companies want support and services for Hadoop, so we will work with partners to provide these.
6. For the ongoing beta program we are looking for Hadoop sites that have data, have Hadoop installed, and have requirements.
Can I Use ‘Big Data’ as a Data Warehouse?
Yes, probably
Should I Use ‘Big Data’ as a Data Warehouse?
No, probably not
What is a Data Warehouse?
Data Mart
• Data structured for query and reporting
Data Warehouse
• What you get if you create data marts for every system, then combine them together
Data Warehouse
• Multiple sources
• Cleansed and processed
• Organized
• Summarized
By definition a data warehouse has content from many different sources - every operational system within your organization. This data has been cleansed, processed, structured and aggregated to the transaction level.
TRANSITION
If we compare the data warehouse to the Data Lake the differences between them become obvious.
Big Data Architecture
[Diagram: Data Source, Data Lake(s), Data Mart(s), Data Warehouse, Ad-Hoc]
So our recommendation is the Data Lake architecture, where data marts and a data warehouse are fed from a data lake.
But what if I really, really want to . . .
Data Water-Garden
• Lake(s)
• Pools and ponds
• Organized
• Cleansed
• Linkages
Instead of a single Data Lake, create a series of data pools. Each pool will be populated from a different data source. The data in the pools should be cleansed and structured.
Create links between the pools using attributes that exist in both.
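A link between two pools amounts to a join on a shared attribute. The sketch below assumes, purely for illustration, that each pool is a map keyed by a common customer id; the `PoolLink` class, the `link` helper, and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class PoolLink {
    // Links two pools on a shared attribute (here a hypothetical customer id).
    // Each pool maps that id to one value drawn from its own data source.
    // Only ids present in both pools appear in the result.
    static Map<String, String[]> link(Map<String, String> poolA, Map<String, String> poolB) {
        Map<String, String[]> linked = new TreeMap<>();
        for (Map.Entry<String, String> e : poolA.entrySet()) {
            String other = poolB.get(e.getKey());
            if (other != null) {
                linked.put(e.getKey(), new String[] { e.getValue(), other });
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        Map<String, String> weblogPool = new HashMap<>();
        weblogPool.put("c1", "/books/1");
        weblogPool.put("c2", "/books/2");

        Map<String, String> ordersPool = new HashMap<>();
        ordersPool.put("c1", "order-77");

        for (Map.Entry<String, String[]> e : link(weblogPool, ordersPool).entrySet()) {
            // Prints: c1 /books/1 order-77
            System.out.println(e.getKey() + " " + e.getValue()[0] + " " + e.getValue()[1]);
        }
    }
}
```

This only works if the shared attribute is cleansed consistently in both pools, which is why the pools are expected to be organized and cleansed first.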
Water-Garden Architecture
[Diagram: Data Sources, Water-Garden, Data Mart(s), Ad-Hoc]
Then optimize your system by creating data marts for different domains or user populations.
More information
www.pentaho.com/hadoop
contact: [email protected]