Open Source Solutions: Managing, Analyzing and Delivering Business Information
DESCRIPTION
These slides on the use of open source solutions within the business intelligence and data warehousing market accompany a webcast and research report. The webcast is archived at http://ow.ly/KLz0 along with a PDF of the report. This presentation describes what open source software is being deployed and presents the benefits, challenges and practices for organizations adopting open source technologies.
TRANSCRIPT
Leveraging Open Source Business Intelligence Across Your Organization
Mark R. Madsen, February 2009
www.ThirdNature.net
Open Source Solutions: Managing, Analyzing and Delivering Business Information
Mark R. Madsen, November 2009
www.ThirdNature.net
Slide 2, February 2009, Mark Madsen
The First Recorded Patent
Slide 3
The First Monopoly
Slide 4
The Origin of Copyright
• 1556: The Worshipful Company of Stationers and Newspaper Makers is granted a Royal Charter, giving it a monopoly over the publishing industry until …
• 1710: “An Act for the Encouragement of Learning, by vesting the Copies of Printed Books in the Authors or purchasers of such Copies, during the Times therein mentioned”, otherwise known as the Statute of Anne, put the rights into the hands of authors
Slide 5
After Each Revolution, the Old Pirates Become the New Establishment
Pirate
Establishment
Slide 6
What is Commercial Software, Really?
Slide 7
What Makes Software Open Source?
More freedom
Academic Licenses
Reciprocal Licenses
“Freeware” Licenses
Commercial Licenses
Less freedom
The fuzzy dividing line between open and closed source
Slide 8
Some Quick Definitions
Proprietary Software
Software under a license that provides limited usage rights only, provided in binary format.
Open Source Software (OSS)
Software under a license that allows acquisition, modification and redistribution.
Freeware
Software that does not have licensing limitations, generally distributed in binary format. Not the same as open source.
Slide 9
Fauxpen Source
Something appearing with greater frequency as open source becomes more popular and lower-tier proprietary vendors seek a differentiator.
Slide 10
Evolution of the Software Market 1987
Source: John Prendergast (data: Bloomberg, Factset)
Slide 11
Evolution of the Software Market 1997
Source: John Prendergast (data: Bloomberg, Factset)
Slide 12
Evolution of the Software Market 2007
Source: John Prendergast (data: Bloomberg, Factset)
Slide 13
The DW & BI Software Market Today
According to IDC, the analytics and data warehouse software market is growing at 10.3% CAGR.
2005: 17,386
2006: 19,342
2007: 21,408
2008: 23,601
2009: 26,001
2010: 28,682
2011: 31,595
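As a rough cross-check of the growth figure, the compound annual growth rate can be recomputed from the first and last values on the chart. This is a minimal sketch, not IDC's method: it assumes the seven values map to the years 2005 through 2011 in order, the names `market_size` and `cagr` are made up for illustration, and the slide does not state the units. The endpoint-based result need not match the quoted 10.3% exactly, since IDC may have used a different period or method.

```python
# Market size by year as read from the slide (units are not stated there).
# Assumption: the seven values correspond to 2005..2011 in order.
market_size = {
    2005: 17_386,
    2006: 19_342,
    2007: 21_408,
    2008: 23_601,
    2009: 26_001,
    2010: 28_682,
    2011: 31_595,
}


def cagr(first_value: float, last_value: float, periods: int) -> float:
    """Compound annual growth rate over `periods` yearly steps."""
    return (last_value / first_value) ** (1 / periods) - 1


first_year, last_year = min(market_size), max(market_size)
rate = cagr(
    market_size[first_year],
    market_size[last_year],
    last_year - first_year,  # six yearly steps between 2005 and 2011
)
print(f"{first_year}-{last_year} CAGR: {rate:.1%}")
```

The number of periods is the number of yearly steps (six here), not the number of data points (seven), which is a common source of small discrepancies in CAGR figures.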
Slide 14
Any Industry This Big is Maturing
[Chart: annual US software sales, 1970 to 2000. Source: US Dept. of Commerce]
“If the automobile had followed the same development as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year killing everyone inside.”
Robert Cringely
Time
Anything
Reality
Slide 16
Software Revenue = Corporate IT Cost
[Chart: IT costs as a percent of equipment investment, 1968 to 2004. Source: US Dept. of Commerce]
Slide 17
Open Source is an Inevitable Consequence
If the means of production is widely distributed at commodity cost
And the internet connects all those means of production
And the supply of any software program is infinite
Then we need to rethink some things.
“The era of high capital industrial production is giving way to a different model.”
Peter Drucker
Slide 18
A Perfect Commodity Changes Things
Open source is a means of production and distribution of software, and is driving change in the market.
But the fact that the internet is a massive copying machine for the perfect commodity is the real change in conditions.
The basis of open source is economics, not ideology.
Slide 19
The Real State of Enterprise Software?
Slide 20
Enterprise Software Economics
The enterprise software model is breaking down. Some facts:
• 70% to 80% of sales & marketing is for new sales
• 76% of new license revenue goes to sales & marketing
• Maintenance makes up 45% of revenues and this number is increasing
• 75% of R&D for mature products is for updates, bug fixing, and non-revenue enhancements
• Maintenance and support is becoming the biggest factor in software company profitability
Sources: Goldman Sachs, Tech Strategy Partners, Forrester
Slide 21
Open Source Disruption
“Which sector of the industry is most vulnerable to disruption by open source in the next five years?”
1. Web publishing and content management
2. Social software
3. Business Intelligence
Source: North Bridge Venture Partners
Slide 22
BI is Entering Mainstream Adoption
The BI market has lots of segments, most new, some mature, some being rejuvenated.
Platforms
Databases
Reporting & Analysis
Data Integration
Predictive analytics
Slide 23
Maturity for OSS Components of the Stack
Information delivery
Dashboards & Scorecards
Analytics / OLAP clients
Interactive Reporting
Standard Reporting
Visualization
GIS & location
Predictive Analytics
Search/Discovery
Modeling
Portal Workflow
Infrastructure
Operating Systems
Servers
Integration Management
ETL EII EAI EDR
Information Management
DW/Mart/ODS OLAP servers MDM* Data Quality
Databases
Metadata
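Slide 24
Interest in and Use of Open Source
Advanced analytics: 5% in production, 8% prototype or pilot, 18% evaluating, 43% considering, 26% no plans
Business intelligence: 14% in production, 8% prototype or pilot, 22% evaluating, 37% considering, 19% no plans
Data integration and ETL: 18% in production, 12% prototype or pilot, 17% evaluating, 31% considering, 22% no plans
Database: 18% in production, 13% prototype or pilot, 18% evaluating, 29% considering, 22% no plans
Source: Third Nature Open Source BI/DW adoption survey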
Slide 25
Database Use
Bizgres: 2%
Kickfire: 2%
LucidDB: 2%
MonetDB: 3%
SQLite: 3%
CouchDB: 3%
Palo: 3%
Firebird: 7%
Ingres: 7%
BerkeleyDB: 8%
EnterpriseDB: 10%
Infobright: 11%
Postgres: 44%
MySQL: 75%
Source: Third Nature Open Source BI/DW adoption survey
Slide 26
Data Integration Tool Use
What’s popular
Clover: 2%
Open Data Quality: 2%
OSDQ: 2%
Apatar: 5%
Red Hat Teiid: 5%
DataCleaner: 8%
Jitterbit: 13%
Talend: 33%
Pentaho DI / Kettle: 42%
What it’s being used for
Low-latency ETL for a data warehouse or mart: 8%
Master data management efforts: 10%
Data quality efforts: 15%
Data migration efforts: 15%
Operational integration: 21%
Batch ETL for a data warehouse or mart: 30%
Source: Third Nature Open Source BI/DW adoption survey
Slide 27
BI Tool Use
What’s popular
OpenReports: 2%
Palo: 2%
MarvelIT: 5%
OpenI: 5%
SpagoBI: 9%
Jfree: 14%
BIRT: 19%
Mondrian: 26%
Jaspersoft: 28%
Pentaho: 47%
What it’s being used for
OLAP: 14.6%
Reports embedded in an application or website: 15.2%
Reporting against an application database: 15.9%
End user or interactive reporting: 16.5%
Dashboards or scorecards: 17.1%
Static reports: 20.7%
Source: Third Nature Open Source BI/DW adoption survey
Slide 28
Advanced Analytics Use
Cytoscape: 2%
Taverna: 3%
Axiis: 4%
Processing: 4%
Orange: 7%
Graphviz: 8%
Knime: 8%
RapidMiner: 23%
Weka: 42%
R: 46%
Source: Third Nature Open Source BI/DW adoption survey
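Slide 29
Usage of the tools
Database: 18% replacing proprietary software, 10% replacing internally developed software, 10% supplementing a system with similar features, 13% adding new functionality to an existing system, 50% using as part of a new system or project
Data Integration: 14% replacing proprietary software, 15% replacing internally developed software, 10% supplementing a system with similar features, 8% adding new functionality to an existing system, 53% using as part of a new system or project
BI: 16% replacing proprietary software, 11% replacing internally developed software, 14% supplementing a system with similar features, 18% adding new functionality to an existing system, 41% using as part of a new system or project
Adv. Analytics: 25% replacing proprietary software, 18% replacing internally developed software, 14% supplementing a system with similar features, 7% adding new functionality to an existing system, 36% using as part of a new system or project
Source: Third Nature Open Source BI/DW adoption survey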
Slide 30
Who’s Adopting Open Source for BI/DW?
1. The under-budgeted
2. ISVs
3. The under-served
4. The over-served
5. Developers who never had it before
More co-existence and use in edge cases than straight replacements, and often competing with lack of use.
Slide 31
Adoption by Organization Size
Slide 32
Adoption by Size of Organization
Medium and large are the two biggest evaluators, with small using the most in production.
[Chart: share of small, medium and large organizations evaluating vs. using open source. Values as extracted, without a recoverable mapping to series: 38%, 23%, 41%, 23%, 37%, 32%]
Source: Third Nature Open Source BI/DW adoption survey
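Slide 33
Scope of System Deployment
[Chart: deployment scope (Department or Division vs. Corporate-wide) for small, medium and large organizations. Values as extracted, without a recoverable mapping to series: 27%, 38%, 32%, 35%, 40%, 27%]
Source: Third Nature Open Source BI/DW adoption survey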
Slide 34
Open Source Purchasing
[Chart: what organizations purchased, by organization size (Large, Medium, Small). Categories: No purchase; Maintenance or support contract; Training; Consulting or installation services; Phone, email or on-site support from the vendor; Commercial license; Phone, email or on-site support from a third party; Subscription to value-added, enterprise features. Values as extracted, without a recoverable mapping to categories: 21%, 31%, 13%, 6%, 9%, 14%, 33%, 31%, 23%, 24%, 22%, 29%, 33%, 28%, 30%, 52%, 28%, 36%, 45%, 38%, 38%, 58%, 53%, 54%]
Source: Third Nature Open Source BI/DW adoption survey
Slide 35
Where Are People Getting Information?
Internet relay chat (IRC): 7%
Support from a third party: 14%
Classroom training: 14%
Pre-bundled software (e.g. a database packaged with a BI tool): 16%
Software features in a paid "professional" version of the software: 17%
Outside consultant or systems integrator: 19%
Vendor support, paid or as part of a subscription: 20%
Third party books or documentation: 27%
Web-based training: 28%
Print articles: 29%
Vendor evaluation / trial support (free): 32%
Blogs: 37%
Web seminars or screencasts: 37%
Community forums: 47%
Online demos: 47%
White papers: 48%
Online documentation / wikis: 53%
Online articles: 53%
Source: Third Nature Open Source BI/DW adoption survey
Slide 36
Why Consider Open Source?
IT is after one of three things:
Slide 37
Rationale When Evaluating OSS
Lower cost and reducing vendor risk are the two big reasons.
Access to the source code: 28%
Extensibility, customizability of software: 28%
Open development process and road …: 32%
Easier to evaluate or procure: 32%
Speed of innovation of the software: 32%
Flexibility in deployment: 33%
Lower maintenance costs: 43%
Reduced dependence on a vendor: 44%
Open standards: 48%
Lower acquisition costs: 66%
Source: Third Nature Open Source BI/DW adoption survey
Slide 38
Good News: It Works
The benefits are largely being realized.
Better performance: 12%
Quicker turnaround on bug fixes: 22%
Speed of innovation of the software: 30%
Extensibility / customizability of software: 32%
Access to the source code: 33%
Freedom from vendor lock-in: 34%
Flexibility in deployment: 36%
Reduced dependence on vendor: 40%
Ease of integration / open standards: 43%
Lower costs: 69%
Source: Third Nature Open Source BI/DW adoption survey
Slide 39
Reduced Vendor Dependence
Avoid vendor-imposed upgrade cycles
Slide 41
Why did the software evaluations fail?
The biggest reason is maturity of the software.
Lack of vendor service or support: 16%
Higher costs than anticipated: 18%
Interoperability problems: 19%
Lack of available consulting: 21%
Reliability problems: 25%
Difficulty finding available solutions: 28%
Difficulty integrating into current environment: 29%
Required more internal expertise than expected: 32%
Scalability problems: 34%
Missing or incomplete features: 72%
Source: Third Nature Open Source BI/DW adoption survey
Slide 42
Data Size, All Database Types
Less than 50GB: 24%
50 to <100GB: 14%
100 to <500GB: 14%
500GB to <1TB: 15%
1 to <5TB: 13%
5 to <20TB: 4%
20TB to 50TB: 1%
More than 50TB: 3%
67% of the sample < 1TB
Source: Third Nature Open Source BI/DW adoption survey
Slide 43
Performance problems
Poor batch reporting performance: 33%
Poor ETL or data integration performance: 33%
Poor performance loading data: 37%
Poor interactive BI or analytics performance: 69%
Source: Third Nature Open Source BI/DW adoption survey
Slide 44
Solving Performance Problems
Replace every single thing before the database?
Migrating to an analytic database is twice as likely as to another row-store database.
Migrate to a different traditional database: 4%
Buy a specialized accelerator: 8%
Migrate to an analytic database: 10%
Limit the number of users accessing the system: 18%
Change ETL or data integration tools: 18%
Rewrite the BI application or reports: 26%
Limit the amount of data stored in the system: 30%
Redesign the ETL or data integration: 32%
Change BI or analytics tools: 32%
Buy more powerful hardware: 34%
Database or application tuning: 38%
Source: Third Nature Open Source BI/DW adoption survey
Slide 45
Discontinuity Drives Open Source BI Use
The situations most appropriate to open source BI tools often involve discontinuous change.
• New interface requirements
• New integration requirements
• Platform change
• Schema change
• Data latency / real-time requirements
• Segmenting the user population
The data warehouse is becoming much more diverse: one BI vendor can no longer be expected to provide tools for all needs.
Slide 46
First Thought is Often “Replace”
Slide 47
Coexist is More Likely Than Replace
Slide 48
Augment is Also More Likely
Slide 49
Recommendations
1. Don't focus solely on cost savings. Many of the benefits people discovered later were not among their up-front reasons.
2. Plan to augment, not replace, existing software with open source. Rather than trying to save money by replacing software, look at gaps in the BI portfolio or data warehouse stack and use open source to supplement your systems.
Slide 50
Recommendations
3. Consider developing open source policies. Most organizations are adopting open source in an ad-hoc fashion, project by project.
4. Evaluate open source like any other software. It doesn't matter if the software is free if it takes longer to build, manage and deploy solutions to end users, if it is unstable, or if it is missing a key feature.
5. Make open source the default option. When there are no internal tools, open source should be the first alternative.
Slide 51
Questions?
“When a new technology rolls over you, you're either part of the steamroller or part of the road.”
Stewart Brand
Slide 52
Creative Commons
Thanks to the people who made their images available via creative commons:
glassblower - http://flickr.com/photos/cazasco/261229878/
canal - http://flickr.com/photos/mcsixth/150749007/
rc toy truck.jpg - http://flickr.com/photos/texas_hillsurfer/2683650363/
asymmetry_building_tokyo.jpg - http://flickr.com/photos/fukagawa/2004102417/
beer_free_beer2.jpg - http://flickr.com/photos/fzero/173386050
beer_free_beer3.jpg - http://flickr.com/photos/henrikmoltke/142750871/
condiments_salsa.jpg - http://flickr.com/photos/uberculture/2462506722/
london modern and ancient together.jpg - http://www.flickr.com/photos/cc_chapman/299509390/
firemen not noticing fire.jpg - http://flickr.com/photos/oldonliner/1485881035/
acapluco_cliff_divers_cc.jpg - http://flickr.com/photos/raveller/
highway storm.jpg - http://flickr.com/photos/areyoumyrik/235230688
Tenessee chicken - http://www.flickr.com/photos/mayhem/2495739721/
About the Presenter
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark has received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker, a contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.