Database Trends, Baltimore/Washington DB2 Users Group (trends.pdf)
Database Trends
The State of Data, DBMS and DBA Circa 2005
Craig S. Mullins, Director, Product Strategy
Embarcadero Technologies, 100 California Street, San Francisco, CA 94111-4517
http://www.embarcadero.com
• Complexity
  – Heterogeneity
  – Rapid change
  – Architecture
• Lack of Resources
  – Skilled technicians
  – Time (already overworked)
  – Budget
General IT Industry Trends
Open Source
Speed-to-Market
Buy versus Build
Regulatory Compliance
Outsourcing
Follow the Leader
DBMS Industry Trends
• Rapid DBMS Versioning
• Increasing Complexity
• Enabling for the Internet
  – Online, real-time …
  – Java and .Net
  – XML
• Multimedia
• Procedural logic
• ERP and CRM
• Data growth (VLDB)
• Open Source
• Regulatory Compliance
The DBMS Market
[Chart: DBMS market by vendor (Microsoft, Oracle, IBM), 2000 to 2005; vertical axis 0 to 6000.]
Sources: Gartner Dataquest, IDC and BMC Market Analysis
(IBM figures include Informix)
• Mainframe and Distributed RDBMS market size of $7.1B in 2003
• Mainframe and Distributed DBMS market size of $14B in 2002
• Forecasted to grow 13% through 2005
• Data growth at approximately 125% per annum
• DBA growth 3-5% per year
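The last two bullets compound against each other. The Python sketch below shows how fast the gap opens. It assumes "125% per annum" means volume grows by 125% (a factor of 2.25) each year and uses 4%, the midpoint of the quoted 3-5% range, for DBA headcount; both readings are assumptions, and the function names are illustrative.

```python
"""Sketch: compound data growth versus DBA headcount growth."""

# Assumption: "125% per annum" is read as +125%, i.e. volume x2.25 per year.
DATA_GROWTH_RATE = 1.25
# Assumption: midpoint of the quoted 3-5% annual DBA growth.
DBA_GROWTH_RATE = 0.04


def growth_factor(rate: float, years: int) -> float:
    """Compound multiplier after `years` at a constant annual `rate`."""
    return (1.0 + rate) ** years


def data_per_dba(years: int) -> float:
    """Data managed per DBA after `years`, relative to today's ratio."""
    return growth_factor(DATA_GROWTH_RATE, years) / growth_factor(DBA_GROWTH_RATE, years)


if __name__ == "__main__":
    for years in range(1, 4):
        print(
            f"year {years}: data x{growth_factor(DATA_GROWTH_RATE, years):.2f}, "
            f"DBAs x{growth_factor(DBA_GROWTH_RATE, years):.2f}, "
            f"data per DBA x{data_per_dba(years):.2f}"
        )
```

Under these assumptions, each DBA is responsible for roughly ten times as much data after three years.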
The Database Environment
• A lot of choices!
  – Vendor, platform, and architecture of DBMS
  – Platforms: MVS, OS/390, z/OS; Windows NT / 2000 / XP; Unix (AIX, Sun Solaris, HP-UX, others?); Linux; others (VSE, VMS, MPE, OS/400, etc.)
  – Desktop OS: Windows 98 / ME / XP; Linux; Mac?
  – Architecture: Enterprise (Parallel Edition), Departmental, Personal, Mobile (PDA)
  – DBMS products: Adabas, Teradata, PostgreSQL, Supra, Compaq Non-Stop SQL, Ingres, IMS, IDMS, Datacom, others...
Open Source
• Open source does not mean “free,” but it does mean access to the source code, that any modifications must be distributed free, and that there are no restrictions on use.
  – Like Linux in the world of operating systems, open source DBMS software is growing:
    – MySQL (InnoDB engine acquired by Oracle)
    – PostgreSQL
    – Ingres (open sourced by CA)
    – Berkeley DB (Sleepycat Software)
    – SAP DB (acquired by MySQL, renamed MaxDB)
    – Apache Derby (IBM Cloudscape)
Frequent DBMS Versioning
• Analysis of New Features
  – Check all requirements
    – Hardware and software
• Planning the Upgrade
  – Impact on systems and applications
  – Scheduling
• Fallback Strategy
• Migration Verification
DBMS Subsumes Functionality
• XML
• ETL and Propagation
• OLAP
  – Analytical features
• Multimedia
  – LOBs, BLOBs, CLOBs
• Objects
• Logic/Code
  – Triggers
  – UDFs
  – Stored procedures
More, More, More Features
• Materialized Views and MQTs
• Online Schema Changes
  – DB2: change database structures
• Flashback Database
  – Oracle: roll back to a point in time (PiT)
• Real Time Statistics
• Wizards and Advisors
  – Autonomic and self-managing features
Database Systems and the Internet
• From DBA to eDBA: Internet-age DBA skills
  – Availability
  – New skillsets (Java, .Net, XML, WebSphere, etc.)
  – Challenging development timelines
Internet Infrastructure Weaknesses
Problem | Symptom | Effect
Unreliable | Sporadic crashes for no apparent reason | Unplanned outages
Complex | Operators do not understand how to resolve problems | Simple problems result in long outages
Fragile | IT managers must debug innocuous changes | Long debugging cycles for new releases
Vulnerable | Viruses and bugs attack all systems at once | Systems must be rolled back to clean backups
Source: Forrester Research
[Chart: Application Availability. Planned outages: 70% of outages; unplanned outages: 30% of outages.]
Application Downtime
• Minimizing downtime is a requirement on the web
  – How much availability is enough?
  – Five 9s (99.999%)?
• What is the cause of downtime?
  – Fewer outages are caused by hardware failures
• Planned vs. unplanned
  – Planned outages represent 70% of application downtime.
  – Just 30% is due to unplanned outages, and 50% of the unplanned downtime is due to problems during planned downtime.
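The availability questions on this slide reduce to simple arithmetic. The Python sketch below converts an availability target into a yearly downtime budget, assuming a 365-day year, and splits total downtime using the 70%, 30%, and 50% figures quoted above. The function names are illustrative.

```python
"""Sketch: yearly downtime budgets and the planned/unplanned split."""

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability: float) -> float:
    """Maximum downtime per year, in minutes, for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def downtime_breakdown(planned: float = 0.70,
                       unplanned: float = 0.30,
                       unplanned_caused_by_planned: float = 0.50) -> dict:
    """Split total downtime using the percentages quoted on the slide."""
    spillover = unplanned * unplanned_caused_by_planned
    return {
        "planned": planned,
        "unplanned, caused by planned work": spillover,
        "unplanned, other causes": unplanned - spillover,
    }


if __name__ == "__main__":
    for nines in range(2, 6):
        availability = 1 - 10 ** -nines
        print(f"{nines} nines ({availability:.5%}): "
              f"{allowed_downtime_minutes(availability):,.1f} min/year")
    for cause, share in downtime_breakdown().items():
        print(f"{cause}: {share:.0%}")
```

Combining the slide's figures, planned outages plus the unplanned downtime they trigger account for about 85% of all application downtime (70% + 50% × 30%). Five 9s leaves a budget of roughly 5.3 minutes per year.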
Online and Real-time Requirements
• The need for more and more availability drives online and real-time maintenance
  – The DBMS begins to allow more changes to be made during normal operations
  – The DBMS begins to gather statistics and performance metrics during normal operations
  – ISVs deliver more online, real-time features and functionality that the DBMS does not yet deliver
  – Less manual intervention required
Impact on the DBA
Where is the performance problem?
• Most experts agree that 75% to 80% of performance problems in relational applications are caused by poor SQL or application code, but on the web . . .
[Diagram: possible sources of a web performance problem: database schema, DNS, SQL, application code, Java, operating system, network software, ISP, bridge/router/hub, FTP, network cabling, hardware, CGI, connection, ZPARMs, 3GL, ASP, XML, HTML, gateway, Java applet, DB2 Connect, init.ora, SQL*Net, HTTP.]
How Fragile is Your Infrastructure?
Database Design and Web Time
• When the Web is involved, everything becomes “rush-rush”: do it now!
• Don’t let database design suffer; take your time and do it right.
• Apps are temporary but data is forever!
  – If you do not believe this, then consider: “How often has your organization re-entered or re-keyed data into a new database when the data already exists elsewhere?”
Java and Database Systems
• JDBC
  – Enables dynamic SQL from Java
  – Uses a call-level interface (CLI) API
• SQLJ
  – Enables static SQL for Java
  – Uses embedded SQL
• J2EE (Java 2 Enterprise Edition)
  – Standard services and specifications for making Java highly available, secure, reliable, and scalable for enterprise adoption
• EJB (Enterprise JavaBeans)
  – Components that contain the business logic for a J2EE application
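JDBC's model hands SQL text to a call-level API at run time and binds parameter markers separately. Python's DB-API (PEP 249) follows the same model. The sketch below uses the standard-library sqlite3 module as a stand-in: it is an analogy, not JDBC itself. The customer table and function names are hypothetical. Static SQL in the SQLJ sense, with statements checked and bound before run time, has no counterpart here.

```python
"""Sketch: dynamic SQL through a call-level API (Python DB-API via sqlite3)."""
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Acme", "MD"), (2, "Globex", "VA"), (3, "Initech", "MD")],
    )
    return conn


def customers_in_state(conn: sqlite3.Connection, state: str) -> list:
    """Prepare and run SQL at run time; '?' is a parameter marker, as in JDBC."""
    sql = "SELECT cust_id, name FROM customer WHERE state = ? ORDER BY cust_id"
    # Binding the value separately, rather than concatenating it into the SQL
    # string, lets the driver handle quoting and guards against SQL injection.
    return conn.execute(sql, (state,)).fetchall()


if __name__ == "__main__":
    conn = open_demo_db()
    print(customers_in_state(conn, "MD"))
    conn.close()
```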
Java versus .Net
• Java: designed to enable applications to be deployed on any platform, as long as they are written in Java
• .Net: designed to enable development in multiple languages, as long as the application is deployed on Windows
The Rush to XML
• XML stands for eXtensible Markup Language.
  – XML is used for exchanging and sharing data
    – Inter- and intra-organization
• XML and Database Systems
  – Integrating XML into relational DBMSs
    – e.g. next version of DB2 (Viper)
  – Extender capabilities (like the IBM DB2 extenders for video, image, audio, and other multimedia data types)
  – XML document stored in a column, or components stored as parts of multiple columns in multiple tables
  – Formulate XML documents from existing tables
  – Search XML document text and sections
  – XQuery capabilities
• New XML DBMS products?
• Over-enthusiasm!
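Two of the items above can be illustrated with a client-side Python sketch that uses sqlite3 and xml.etree.ElementTree: formulating an XML document from existing tables, and storing a whole document in a single column. Products such as DB2 Viper do this inside the DBMS with a native XML type and XQuery. This sketch simply stores the document as TEXT and searches it with ElementTree's limited path support. The table, element, and function names are hypothetical.

```python
"""Sketch: publish relational rows as XML, store the document, query it back."""
import sqlite3
import xml.etree.ElementTree as ET


def build_orders_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL);
        INSERT INTO orders VALUES (1, 'Acme', 19.99), (2, 'Globex', 250.00);
        -- Whole XML documents stored in a single column.
        CREATE TABLE order_docs (doc_id INTEGER PRIMARY KEY, doc TEXT);
        """
    )
    return conn


def orders_to_xml(conn: sqlite3.Connection) -> str:
    """Formulate one XML document from the rows of the orders table."""
    root = ET.Element("orders")
    rows = conn.execute("SELECT order_id, customer, total FROM orders ORDER BY order_id")
    for order_id, customer, total in rows:
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "customer").text = customer
        ET.SubElement(order, "total").text = f"{total:.2f}"
    return ET.tostring(root, encoding="unicode")


def customers_over(doc: str, threshold: float) -> list:
    """Search a stored document by section: customers whose total exceeds threshold."""
    root = ET.fromstring(doc)
    return [o.findtext("customer") for o in root.findall("order")
            if float(o.findtext("total")) > threshold]


if __name__ == "__main__":
    conn = build_orders_db()
    xml_doc = orders_to_xml(conn)
    conn.execute("INSERT INTO order_docs (doc) VALUES (?)", (xml_doc,))
    stored = conn.execute("SELECT doc FROM order_docs").fetchone()[0]
    print(stored)
    print(customers_over(stored, 100))
```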
Logic and the DBMS
[Diagram: Logic in the DBMS: code run by triggers on UPDATE, INSERT, and DELETE, by UDFs, and by stored procedures, alongside SQL.]
UDFs:
    function( )
        if this then that
        else do this stuff
        return x
    end
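The Python sketch below shows what code inside the DBMS looks like, using SQLite through the sqlite3 module. A UDF registered with `create_function` follows the slide's if/else/return shape, and triggers fire on INSERT, UPDATE, and DELETE. SQLite has no stored procedures, so that piece is not shown. The account and audit tables and the `risk_band` function are hypothetical. Note also that a SQLite UDF runs in the client process rather than on a database server.

```python
"""Sketch: a user-defined function and triggers in SQLite."""
import sqlite3


def risk_band(balance):
    """UDF body in the slide's shape: if this then that, else ..., return x."""
    if balance is None:
        return None
    if balance < 0:
        return "OVERDRAWN"
    return "LOW" if balance < 1000 else "NORMAL"


def build_bank_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("risk_band", 1, risk_band)  # callable from SQL as risk_band(x)
    conn.executescript(
        """
        CREATE TABLE account (acct_id INTEGER PRIMARY KEY, balance REAL);
        CREATE TABLE audit (audit_id INTEGER PRIMARY KEY, op TEXT, acct_id INTEGER);
        -- One trigger per data-modifying statement type.
        CREATE TRIGGER account_ins AFTER INSERT ON account
            BEGIN INSERT INTO audit (op, acct_id) VALUES ('INSERT', NEW.acct_id); END;
        CREATE TRIGGER account_upd AFTER UPDATE ON account
            BEGIN INSERT INTO audit (op, acct_id) VALUES ('UPDATE', NEW.acct_id); END;
        CREATE TRIGGER account_del AFTER DELETE ON account
            BEGIN INSERT INTO audit (op, acct_id) VALUES ('DELETE', OLD.acct_id); END;
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_bank_db()
    conn.execute("INSERT INTO account VALUES (1, 500.0)")
    conn.execute("INSERT INTO account VALUES (2, 5000.0)")
    conn.execute("UPDATE account SET balance = -20.0 WHERE acct_id = 1")
    conn.execute("DELETE FROM account WHERE acct_id = 2")
    print(conn.execute("SELECT acct_id, risk_band(balance) FROM account").fetchall())
    print(conn.execute("SELECT op, acct_id FROM audit ORDER BY audit_id").fetchall())
```

Because the audit rows are written by triggers, every program that changes the account table produces them, whether or not its own code remembers to.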
Role of the Procedural DBA
• DBCO administration (trigger firing order, proc set)
• Design reviews
• EXPLAIN analysis
• Debugging SQL
• Coding complex queries
• Tuning SQL
• On call for DBCO abends
• DBCO implementation (COMMIT in proc, write or guide)
• Ensuring reuse
• Schema resolution
Non-Traditional Data
• DBMSs are adapting to “handle” more types of non-traditional data
  – Spreadsheets
  – Word documents
  – Text
  – Presentations
  – Seismic data
  – Design data
  – Graphics
  – Photos, video, images
  – Compound documents
  – Audio, MP3
  – Temporal data
  – GIS, spatial data
• How?
  – Integrate the data into the DBMS
  – Federate and manage the data “where it lies”
Data Growth
• Technology enables larger databases
• Web, multimedia, data warehousing, and data mining drive up database size
  – And it will continue (e.g. RFID)
• Disk drives increase in capacity, but speed of access does not keep up with capacity increases
• Cost of storage is decreasing, so why not store more data? But...
  – What data do users need to store?
  – How long must it be maintained?
  – What are they willing to pay?
[Chart: data volume (Pb), 2004 to 2007; vertical axis 0 to 450.]
Data Collected, Not Necessarily Leveraged
• Enterprise databases are growing 125% annually; up to 80% of the information is not used (inactive)
  – Analysts expect data to continue to grow at prodigious rates
[Chart: Enterprise Data Growth¹. Inactive data: 80%; active data: 20%.]
¹Source: IDC, Gartner, Embarcadero Technologies Analysis
Database Manageability?
Source: Gartner Group
Interesting Quotes on Data Growth
• “Global 2000 companies double the amount of data they own every year, while the average dot-com’s data doubles every 90 days.”
– Mike Ruettgers, CEO of EMC Corp.
• “Inside IBM we talk about 10 times more connected people, 100 times more network speed, a 1000 times more devices and a million times more data.”
– Lou Gerstner, Former CEO of IBM Corp.
• A Giga Information Group research paper by analyst Lou Agosta estimated that there are about 201,000 TB, or 197 petabytes, of data on the planet.
  – Of course, this is just an estimate that Giga deemed to be accurate within an order of magnitude (that is, within a factor of 10).
  – And this estimate was made in September 2000, so there should have been a lot more data generated by now…
DBMS versus Data Complexity
Today’s Data Challenges
• Data consistency and integration problems cost organizations
  – Poor quality data costs the typical organization 20% of revenue (Thomas C. Redman, Ph.D.)
  – 70% of an average IT budget targets data integration projects (IDC)
  – 50% of knowledge workers’ time is spent researching data
• Managing complex data infrastructures and data growth hampers efficiency
  – Total cost to meet data storage requirements may, in just a few years, account for as much as 70% of the IT budget (Vacca, 2002)
  – Explosive growth threatens to overwhelm systems
  – Slow response time and downtime have significant business impact
• Risk of security breach and focus on compliance exacerbate the problem
  – CEOs and CIOs are accountable for data availability and change, but security is a difficult problem to solve
    – 58% of respondents to an InformationWeek survey named “managing the complexity” the biggest challenge to security
  – Internal threats are top of mind for IT organizations
    – 72% of CIOs rank careless or risky employee behavior as one of their top three security concerns (CIO Insight).
Additional Data Challenges
• According to a recent (September 2005) survey of IT managers:
  – 77% said that information was not shared efficiently across organizations
  – 68% believed too much time and money was spent on managing and searching for information
  – 59% said they worried information was not up-to-date or accurate
  – Managers voiced very serious concerns about potential damage caused by poor information management.
Source: ContentManagement365
http://www.contentmanagement365.com/Information_Architecture_Analysis/Article406686.aspx?
How Bad is Data Quality?
• How good is your data quality?
  – Estimates show that, on average, data quality is suspect:
    – Payroll record changes have a 1% error rate;
    – Billing records have a 2-7% error rate; and
    – The error rate for credit records is as high as 30%.
Source: T.C. Redman, Data Quality: Management and Technology (New York, Bantam Books).
• Similar studies in Computerworld and the Wall Street Journal back up the notion of overall poor data quality.
  – W.M. Bulkeley, "Databases Are Plagued by Reign of Error," The Wall Street Journal, 26 May 1992, B2.
  – B. Knight, "The Data Pollution Problem," Computerworld, 28 September 1992, 81-84.
• Even bigger problem: “Does anyone care?”
  – Fast Company interviewed Tom Peters on the twentieth anniversary of his seminal book, In Search of Excellence. In the interview, Peters casually mentioned that he'd faked his data. http://www.fastcompany.com/magazine/53/peters.html
  – A recent popular blog comes out “against” data integrity! http://jooto.com/blog/index.php/2005/11/01/the-myth-of-data-integrity/
Data Modeling as a Solution: “Why Model Data?”
• Capital: chart of accounts
• Facilities: blueprints
• Human Resources: org chart
• Materials: bill of materials
• Data: ?
Database Design
• Many data quality issues can be addressed through better database design:
  – Logical data modeling
    – Designed for business needs
    – Metadata capture and definition
    – Normalization
  – Logical to physical mapping and translation
    – Proper data types and lengths
  – Non-bypassable data integrity mechanisms
    – Check constraints
    – Unique constraints
    – Referential constraints
    – Triggers
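The integrity mechanisms listed above are declared once in the schema and enforced by the DBMS for every program that touches the data, which is what makes them non-bypassable. The Python sketch below demonstrates this with SQLite via the sqlite3 module; the table and column names are hypothetical. SQLite enforces referential constraints only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
"""Sketch: declarative integrity constraints that every application must obey."""
import sqlite3

# Hypothetical schema: every rule lives in the DDL, not in application code.
DDL = """
CREATE TABLE dept (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE,                       -- unique constraint
    salary  NUMERIC NOT NULL CHECK (salary > 0),        -- check constraint
    dept_id INTEGER NOT NULL REFERENCES dept (dept_id)  -- referential constraint
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    conn.execute("INSERT INTO dept VALUES (1, 'Finance')")
    conn.execute("INSERT INTO emp VALUES (1, 'ann@example.com', 50000, 1)")
    return conn


def try_insert(conn: sqlite3.Connection, sql: str):
    """Return None if the DBMS accepts the row, else the constraint error text."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


if __name__ == "__main__":
    conn = make_db()
    attempts = [
        "INSERT INTO emp VALUES (2, 'bob@example.com', 42000, 1)",  # valid row
        "INSERT INTO emp VALUES (3, 'ann@example.com', 42000, 1)",  # duplicate email
        "INSERT INTO emp VALUES (4, 'cy@example.com', -5, 1)",      # negative salary
        "INSERT INTO emp VALUES (5, 'di@example.com', 42000, 99)",  # no such dept
    ]
    for sql in attempts:
        print(try_insert(conn, sql) or "accepted")
```

The exact error text varies by SQLite version, but every rejected row fails with an `IntegrityError` raised by the database engine, not by application logic.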
Enter the DBA
• The job of database administration is getting increasingly more difficult as database technology rapidly advances, adding new functionality, more options, and more complex and complicated capabilities...
Most Challenging DBA Tasks
[Chart: Database activities that are most challenging. Categories: Perf/Troubleshooting, Patch/Upgrade, Change management, Planning, Rep/Sync, Backup/Rec, Security issues, Resource issues. Shares shown: 26%, 21%, 14%, 11%, 8%, 8%, 6%, 6%.]
Source: Forrester Research
The DBA is a “Jack of all Trades”
[Diagram: technologies surrounding the DBA: DB2, Oracle, Sybase, SQL Server, database schema, DNS, SQL, application code, Java, operating system, network software, ISP, bridge/router/hub, HTTP, network cabling, hardware, CGI, connection, ZPARMs, 3GL, ASP, XML, HTML, gateway, Java applet, DB2 Connect, Unix, Windows, z/OS, Linux, SQL*Net, TCP/IP, V$ tables, COBOL, VB, C++, JCL, CICS, MQ, VTAM.]
SDM: A Key IT Discipline
Strategic data management helps data managers unleash the value of their corporate information by maximizing the usefulness, security, and availability of their data.
SDM: Architecture
SDM: Availability
SDM: Security
Benefits of SDM
[Diagram: CRM, ERP, SCM, and custom applications; DBA, data architect, application developer; compliance.]
More responsive, reliable IT
• Accurate, consistent information
• Visibility for rapid, distributed development
High performing, efficient data systems
• Ability to implement standard operating, administration
• Performance tuning, monitoring, enhancement to fully leverage systems
Protected data to minimize exposure
• Security policy consistent, enforceable
Contact Information
• Craig S. Mullins
• Director, Product Strategy
• Embarcadero Technologies, 100 California Street, San Francisco, CA 94111-4517
• http://www.embarcadero.com
http://www.craigsmullins.com/cm-book.htm
http://www.craigsmullins.com/dba_book.htm