Upload: carol-mcdonald

Post on 16-Apr-2017


MySQL: A Java Developer Perspective

MySQL for Developers

Carol McDonald, Java Architect

Outline

Storage engines

Schema Normalization

Data types

Indexes

Know your SQL: using EXPLAIN

Partitions

JPA lazy loading

Resources

Why is it significant for a developer to know MySQL?

Generic, database-agnostic code is inefficient

take advantage of MySQL's strengths

understanding the database helps you develop better-performing applications; it's better to design a well-performing database-driven application from the start

...than try to fix a slow one after the fact!

To get the most from MySQL, you need to understand its design. MySQL's architecture is very different from that of other database servers, which makes it useful for a wide range of purposes. The point about understanding the database is especially important for Java developers, who typically use an abstraction or ORM layer such as Hibernate, which hides the SQL implementation (and often the schema itself).

ORMs tend to obscure the database schema from the developer, which leads to poorly performing index and schema strategies, one-engine designs that are not optimal, and queries that use inefficient SQL constructs such as correlated subqueries.

MySQL Pluggable Storage Engine Architecture

MySQL supports several storage engines that act as handlers for different table types

No other database vendor offers this capability

Query parsing, analysis, optimization, caching, all the built-in functions, stored procedures, triggers, and views are provided across storage engines. A storage engine is responsible for storing and retrieving all the data stored in a table. The storage engines have different functionality, capabilities, and performance characteristics. A key difference between MySQL and other database platforms is MySQL's pluggable storage engine architecture, which allows you to select a specialized storage engine for a particular application need such as data warehousing, transaction processing, or high availability; in many applications, choosing the right storage engine can greatly improve performance. IMPORTANT: there is no single best storage engine. Each one is good for specific data and application characteristics. The query cache is a MySQL-specific result-set cache that can be excellent for read-intensive applications but must be guarded against for mixed read/write applications.

What makes engines different?

Concurrency: table lock vs. row lock; the right locking strategy can improve performance.

Storage: how the data is stored on disk; size for tables and indexes

Indexes: improve search operations

Memory usage: different caching strategies

Transaction support: not every application needs transactions

Each pluggable storage engine is designed to offer a selective set of benefits for a particular application. Some of the key differentiators include:

Concurrency: some applications have more granular lock requirements (such as row-level locks) than others. Choosing the right locking strategy can reduce overhead and therefore help overall performance. This area also includes support for capabilities such as multi-version concurrency control or 'snapshot' reads.

Transaction support: not every application needs transactions, but for those that do, there are very well-defined requirements such as ACID compliance.

Referential integrity: the need to have the server enforce relational database referential integrity through DDL-defined foreign keys.

Physical storage: everything from the overall page size for tables and indexes to the format used for storing data to physical disk.

Index support: different application scenarios benefit from different index strategies, so each storage engine generally has its own indexing methods, although some (like B-tree indexes) are common to nearly all engines.

Memory caches: different applications respond better to some memory caching strategies than others; while some memory caches are common to all storage engines (like those used for user connections, or MySQL's high-speed query cache), others are uniquely defined only when a particular storage engine is in play.

Performance aids: things like multiple I/O threads for parallel operations, thread concurrency, database checkpointing, and bulk insert handling.

Miscellaneous target features: support for geospatial operations, security restrictions for certain data manipulation operations, and other similar items.

So...

As a developer, what do I need to know about storage engines, without being a MySQL expert?

Keep in mind the following questions: What type of data will you be storing?

Is the data constantly changing?

Is the data mostly logs (INSERTs)?

Are there requirements for reports?

Are there requirements for transactions?

The MySQL storage engines provide flexibility to database designers and also allow the server to take advantage of different types of storage media. Database designers can choose the appropriate storage engine based on their application's needs; each one comes with a distinct set of benefits and drawbacks. As we discuss each of the available storage engines in depth, keep in mind the following questions: What type of data will you eventually be storing in your MySQL databases? Is the data constantly changing? Is the data mostly logs (INSERTs)? Are your end users constantly making requests for aggregated data and other reports? For mission-critical data, will there be a need for foreign key constraints or multiple-statement transaction control? The answers to these questions will affect the storage engine and data types most appropriate for your particular application.

MyISAM Pluggable Storage engine

Default MySQL engine

high-speed query and insert capability; inserts use a shared read lock

updates and deletes use table-level locking, which is slower

full-text indexing

Non-transactional

good choice for: read-mostly applications that don't require transactions

Web, data warehousing, logging, auditing

MyISAM excels at high-speed operations that don't require the integrity guarantees (and associated overhead) of transactions. MyISAM locks entire tables, not rows. Readers obtain shared (read) locks on all tables they need to read; writers obtain exclusive (write) locks. However, you can insert new rows into the table while SELECT queries are running against it (concurrent inserts), which is a very important and useful feature. Read-only or read-mostly tables that contain data used to construct a catalog or listing of some sort (jobs, auctions, real estate, etc.) are usually read from far more often than they are written to, which makes them good candidates for MyISAM. It is a great engine for data warehouses because of that environment's high read-to-write ratio and the need to fit large amounts of data in a small amount of space. MyISAM doesn't support transactions or row-level locks, so it is not a good general-purpose storage engine for any application that has (a) high concurrency or (b) lots of UPDATEs or DELETEs (INSERTs and SELECTs are fine).
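As a minimal sketch of a read-mostly MyISAM use case, the listing table below (the table and column names are purely illustrative) combines fast inserts with MyISAM's full-text indexing:

```sql
-- Hypothetical read-mostly listings table on MyISAM,
-- using its native full-text indexing.
CREATE TABLE job_listing (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title       VARCHAR(100) NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_listing (title, description)
) ENGINE=MyISAM;

-- Full-text search, which MyISAM supports natively:
SELECT id, title
FROM job_listing
WHERE MATCH(title, description) AGAINST ('java developer');
```

A catalog like this is read far more often than it is written, so MyISAM's table-level locking is rarely a bottleneck.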

InnoDB Storage engine in MySQL

Transaction-safe and ACID compliant

good query performance, depending on indexes

row-level locking, MultiVersion Concurrency Control (MVCC): allows fewer row locks by keeping data snapshots; no locking for SELECT (depending on isolation level)

high concurrency possible

uses more disk space and memory than MyISAM

Good for online transaction processing (OLTP). Lots of users: Slashdot, Google, Yahoo!, Facebook, etc.

InnoDB supports ACID transactions, multi-versioning, row-level locking, foreign key constraints, crash recovery, and good query performance depending on indexes. InnoDB uses row-level locking with multiversion concurrency control (MVCC). MVCC can allow fewer row locks by keeping data snapshots. Depending on the isolation level, InnoDB does not require any locking for a SELECT. This makes high concurrency possible, with some trade-offs: InnoDB requires more disk space compared to MyISAM, and for the best performance, lots of memory is required for the InnoDB buffer pool. InnoDB is a good choice for any order-processing application, or any application where transactions are required. InnoDB was designed for transaction processing; its performance and automatic crash recovery make it popular for non-transactional storage needs, too.

When you deal with any sort of order processing, transactions are all but required. Another important consideration is whether the engine needs to support foreign key constraints.
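As a sketch of the order-processing case (table and column names are hypothetical), InnoDB enforces the foreign key and makes the multiple-statement change atomic:

```sql
-- Hypothetical order-processing tables on InnoDB.
CREATE TABLE customer (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(80) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE orders (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    total       DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id),
    -- InnoDB enforces this; MyISAM would silently ignore it
    FOREIGN KEY (customer_id) REFERENCES customer (id)
) ENGINE=InnoDB;

-- Multiple-statement transaction: both changes commit together
-- or, on ROLLBACK/crash, neither does.
START TRANSACTION;
INSERT INTO orders (customer_id, total) VALUES (1, 19.99);
UPDATE customer SET name = 'Renamed Inc.' WHERE id = 1;
COMMIT;
```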

Memory Engine

Entirely in-memory engine: stores all data in RAM for extremely fast access

Hash index used by default

Good for Summary and transient data

"lookup" or "mapping" tables,

calculated table counts,

for caching session data or temporary tables

Memory - stores all data in RAM for extremely fast access. Useful when you need fast access to data that doesn't change or doesn't need to persist after a restart. Good for "lookup" or "mapping" tables, for caching the results of periodically aggregated data, for intermediate results when analyzing data.

Memory tables are useful when you need fast access to data that either never changes or doesn't need to persist after a restart. Memory tables are generally faster: all of their data is stored in memory, so queries don't have to wait for disk I/O. The table structure of a Memory table persists across a server restart, but no data survives. Good uses for Memory tables: "lookup" or "mapping" tables, such as a table that maps postal codes to state names; caching the results of periodically aggregated data; intermediate results when analyzing data. Memory tables support HASH indexes, which are very fast for lookup queries. They use table-level locking, which gives low write concurrency, and they do not support TEXT or BLOB column types. They also support only fixed-size rows, so they really store VARCHARs as CHARs, which can waste memory.
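A minimal sketch of the postal-code lookup table mentioned above (names are illustrative):

```sql
-- Hypothetical lookup table in the MEMORY engine. HASH is the
-- default index type for MEMORY tables and is very fast for
-- exact-match lookups (but useless for range scans).
CREATE TABLE zip_to_state (
    zip   CHAR(5) NOT NULL,  -- fixed-size rows anyway, so CHAR fits
    state CHAR(2) NOT NULL,
    PRIMARY KEY USING HASH (zip)
) ENGINE=MEMORY;

SELECT state FROM zip_to_state WHERE zip = '94025';
```

Remember that the data (not the table definition) disappears on server restart, so a table like this must be repopulated at startup.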

Archive engine

Incredible insert speeds

Great compression rates

No UPDATEs

Ideal for storing and retrieving large amounts of historical data: audit data, log files, Web traffic records

Data that can never be updated

Archive tables are ideal for logging and data acquisition, where analysis tends to scan an entire table, or where you want fast INSERT queries on a replication master.
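A sketch of such a logging table (names are illustrative): the ARCHIVE engine compresses rows and accepts fast INSERTs, but does not support UPDATE or DELETE, and reads generally scan the table.

```sql
-- Hypothetical web-traffic log in the ARCHIVE engine.
CREATE TABLE access_log (
    logged_at DATETIME     NOT NULL,
    client_ip VARCHAR(45)  NOT NULL,
    url       VARCHAR(255) NOT NULL
) ENGINE=ARCHIVE;

-- Fast, compressed appends:
INSERT INTO access_log VALUES (NOW(), '10.0.0.1', '/index.html');

-- Analysis queries scan the whole table:
SELECT url, COUNT(*) FROM access_log GROUP BY url;
```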

Archive provides for storing and retrieving large amounts of seldom-referenced historical, archived, or security audit information.

More specialized engines:

FEDERATED: kind of like linked tables in MS SQL Server or MS Access; allows a remote server's tables to be used as if they were local. Not good performance, but can be useful at times.

NDB Cluster: highly available clustered storage engine. Very specialized and much harder to administer than regular MySQL storage engines.

CSV: stores data in tab-delimited format. Useful for large bulk imports or exports.

Blackhole: the /dev/null storage engine. Useful for benchmarking and some replication scenarios.

Merge: allows you to logically group together a series of identical MyISAM tables and reference them as one object. Good for very large databases, like data warehousing.

Feature                              Memory  InnoDB  Archive  NDB    Falcon  MyISAM
Storage limits                       Yes     64TB    No       Yes    110TB   No
Transactions                         No      Yes     No       Yes    Yes     No
Locking granularity                  Table   Row     Row      Row    MVCC    Table
MVCC snapshot read                   No      Yes     No       No     Yes     No
Geospatial support                   No      Yes     Yes      No     Yes     Yes
Data caches                          NA      Yes     No       Yes    Yes     No
Index caches                         NA      Yes     No       Yes    Yes     Yes
Compressed data                      No      No      Yes      No     No      Yes
Storage cost (relative)              NA      Med     Small    Med    Med     Small
Foreign key support                  No      Yes     No       No     Yes     No
Built-in cluster/HA support          No      No      No       Yes    No      No
Replication support                  Yes     Yes     Yes      Yes    Yes     Yes
Bulk insert speed                    High    Med     Highest  High   Med     High
Memory cost (relative)               High    High    Low      High   High    Low

Storage Engines

Dynamically add and remove storage engines. Change the storage engine on a table with ALTER TABLE.

Does the storage engine really make a difference?

User Load   MyISAM (inserts/sec)   InnoDB (inserts/sec)   Archive (inserts/sec)
1           3,203.00               2,670.00               3,576.00
4           9,123.00               5,280.00               11,038.00
8           9,361.00               5,044.00               13,202.00
16          8,957.00               4,424.00               13,066.00
32          8,470.00               3,934.00               12,921.00
64          8,382.00               3,541.00               12,571.00

Using mysqlslap against MySQL 5.1.23-rc, the Archive engine has 50% more INSERT throughput than MyISAM, and 255% more than InnoDB.

Pluggable storage engines offer Flexibility

You can use multiple storage engines in a single application. A storage engine for the same table on a slave can be different than that of the master

can greatly improve performance

[Diagram: a master running InnoDB replicating to a slave running MyISAM]

You can use multiple storage engines in a single application; you are not limited to using only one storage engine in a particular database. So, you can easily mix and match storage engines for the given application need. This is often the best way to achieve optimal performance for truly demanding applications: use the right storage engine for the right job. Mixing engines is particularly useful in a replication setup, where a master copy of a database on one server is used to supply copies, called slaves, to other servers. A storage engine for a table on a slave can be different than the storage engine for that table on the master. In this way, you can take advantage of each engine's abilities. For instance, assume a master with two slaves. We can have InnoDB tables on the master, for referential integrity and transactional safety. One slave can also be set up with InnoDB or the ARCHIVE engine in order to do backups in a consistent state. Another can be set up with MyISAM and MEMORY tables in order to take advantage of FULLTEXT (MyISAM) or HASH-based indexing (MEMORY).

Inside MySQL Replication

Writes & Reads

[Diagram: a web/app server sends writes to the MySQL master (mysqld, data, index & binlogs); replication ships the binlog to the slave, where the I/O thread writes it to a relay log and the SQL thread applies it (mysqld, data, binlog).]

A storage engine for the same table on a slave can be different than that of the master
Using different engines

Creating a table with a specified engine: CREATE TABLE t1 (...) ENGINE=InnoDB;

Changing existing tables: ALTER TABLE t1 ENGINE=MyISAM;

Finding all your available engines: SHOW STORAGE ENGINES;
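Putting the three commands together as a short session sketch (the table name t1 is illustrative):

```sql
-- Create a table with an explicit engine:
CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB;

-- Convert the existing table to another engine
-- (this rebuilds the table and can be slow on large tables):
ALTER TABLE t1 ENGINE=MyISAM;

-- List the engines this server supports:
SHOW STORAGE ENGINES;

-- Check which engine a given table currently uses:
SHOW TABLE STATUS LIKE 't1';
```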

The schema

Basic foundation of performance: normalization

Data types: smaller, smaller, smaller. Smaller tables use less disk and less memory, and can give better performance

Indexing: speeds up retrieval

In a normalized database, each fact is represented once and only once. Conversely, in a denormalized database, information is duplicated, or stored in multiple places. People who ask for help with performance issues are frequently advised to normalize their schemas, especially if the workload is write-heavy. This is often good advice. It works well for the following reasons: normalized updates are usually faster than denormalized updates; when the data is well normalized, there's little or no duplicated data, so there's less data to change; normalized tables are usually smaller, so they fit better in memory and perform better; and the lack of redundant data means there's less need for DISTINCT or GROUP BY queries when retrieving lists of values. Consider a department example: it's impossible to get a distinct list of departments from a denormalized schema without DISTINCT or GROUP BY, but if DEPARTMENT is a separate table, it's a trivial query. The drawbacks of a normalized schema usually have to do with retrieval. Any nontrivial query on a well-normalized schema will probably require at least one join, and perhaps several. This is not only expensive, but it can make some indexing strategies impossible. For example, normalizing may place columns in different tables that would benefit from belonging to the same index.
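The department example can be sketched as follows (table and column names are hypothetical):

```sql
-- Denormalized: the department name is repeated on every row.
CREATE TABLE employee_denorm (
    id         INT PRIMARY KEY,
    name       VARCHAR(80),
    department VARCHAR(80)      -- duplicated per employee
);
-- Getting the department list requires de-duplication:
SELECT DISTINCT department FROM employee_denorm;

-- Normalized: each department is stored once.
CREATE TABLE department (
    id   INT PRIMARY KEY,
    name VARCHAR(80)
);
CREATE TABLE employee (
    id            INT PRIMARY KEY,
    name          VARCHAR(80),
    department_id INT           -- references department(id)
);
-- Now the department list is a trivial query...
SELECT name FROM department;
-- ...but listing employees with their department needs a join:
SELECT e.name, d.name
FROM employee e JOIN department d ON d.id = e.department_id;
```

This shows both sides of the trade-off discussed above: the normalized form removes duplication, at the cost of a join on retrieval.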

Goal of Normalization

Eliminate redundant data: don't store the same data in more than one table

Only store related data in a table

reduces database size and errors

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored. Database normalization minimizes duplication of information; this makes updates simpler and faster because the same information doesn't have to be updated in multiple tables. With a normalized database: updates are usually faster; there's less data to change; tables are usually smaller and use less memory, which can give better performance; and DISTINCT or GROUP BY queries perform better.

Normalization

updates are usually faster; there's less data to change.

tables are usually smaller, use less memory, which can give better performance.

better performance for distinct or group by queries


taking normalization way too far

http://thedailywtf.com/forums/thread/75982.aspx

However, a normalized database causes joins for queries

An excessively normalized database: queries take more time to complete, as data has to be retrieved from more tables.

Normalized is better for writes (OLTP)

Denormalized is better for reads and reporting

Real-world mixture: normalized schema

Cache selected columns in a Memory table

Normalize first, denormalize later

In a denormalized database, information is duplicated, or stored in multiple places. The disadvantages of a normalized schema are that queries typically involve more tables and require more joins, which can reduce performance. Also, normalizing may place columns in different tables that would benefit from belonging to the same index, which can also reduce query performance. More normalized schemas are better for applications involving many transactions; less normalized schemas are better for reporting types of applications. You should normalize your schema first, then denormalize later. Applications often need to mix the approaches, for example using a partially normalized schema and duplicating, or caching, selected columns from one table in another table. A denormalized schema works well because everything is in the same table, which avoids joins.

If you don't need to join tables, the worst case for most queries (even the ones that don't use indexes) is a full table scan. This can be much faster than a join when the data doesn't fit in memory, because it avoids random I/O.

A single table can also allow more efficient indexing strategies.

In the real world, you often need to mix the approaches, possibly using a partially normalized schema, cache tables, and other techniques. The most common way to denormalize data is to duplicate, or cache, selected columns from one table in another table.

Data Types: Smaller, smaller, smaller

Use the smallest data type possible

The smaller your data types, the more index (and data) records can fit into a block of memory, and the faster your queries will be. Period.

Especially for indexed fields

Smaller = less disk=less memory= better performance

In general, try to use the smallest data type that you can. Small and simple data types usually give better performance because it means fewer disk accesses (less I/O), more data in memory, and less CPU to process operations.

Choose your Numeric Data Type

MySQL has 9 numeric data types, compared to Oracle's 1

Integer: TINYINT , SMALLINT, MEDIUMINT, INT, BIGINT

They require 8, 16, 24, 32, and 64 bits of space, respectively.

Use UNSIGNED when you don't need negative numbers: one more level of data integrity

BIGINT is NOT needed for AUTO_INCREMENT; INT UNSIGNED stores 4.3 billion values!

If you're storing whole numbers, use one of the integer types: TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT. These require 8, 16, 24, 32, and 64 bits of storage space, respectively. They can store values from -2^(N-1) to 2^(N-1)-1, where N is the number of bits of storage space they use.

FLOAT, DOUBLE: support approximate calculations with standard floating-point math. DECIMAL: use DECIMAL when you need exact results; always use it for monetary/currency fields. Floating-point types typically use less space than DECIMAL to store the same range of values, so use DECIMAL only when you need exact results for fractional numbers. BIT: stores 0/1 values.

Choose your Numeric Data Type

Floating point: FLOAT, DOUBLE. Approximate calculations

Fixed point: DECIMAL. Always use DECIMAL for monetary/currency fields, never FLOAT or DOUBLE!

Other: BIT. Stores 0/1 values

INT(1) does not mean 1 digit! The number in parentheses is the display width (used with the ZEROFILL option), and specifies the number of characters some tools reserve for display purposes. For storage and computational purposes, INT(1) is identical to INT(20).

Integer data types work best for primary key data types.

Use UNSIGNED when you don't need negative numbers; this doubles the positive range for the same storage. BIGINT is not needed for AUTO_INCREMENT: INT UNSIGNED stores 4.3 billion values!

Always use DECIMAL for monetary/currency fields, never use FLOAT or DOUBLE!
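As an illustrative sketch (the table and column names are hypothetical), these guidelines translate into column definitions like:

```sql
CREATE TABLE product (
    -- INT UNSIGNED AUTO_INCREMENT: ~4.3 billion ids, no BIGINT needed
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    -- TINYINT UNSIGNED: 0-255 is plenty for a small status code
    status    TINYINT UNSIGNED NOT NULL,
    -- DECIMAL for money: exact, no floating-point rounding
    price     DECIMAL(10,2) NOT NULL,
    -- DOUBLE is fine for an approximate physical measurement
    weight_kg DOUBLE,
    PRIMARY KEY (id)
);
```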

Character Data Types

VARCHAR(n) variable length: uses only the space it needs. Can save disk space = better performance

Use when: the max column length is much larger than the average

and when updates are rare (updates fragment variable-length rows)

CHAR(n) fixed length. Use for: short strings that are mostly the same length, or that are changed frequently

The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters to store. VARCHAR(n) stores variable-length character strings. VARCHAR uses only as much space as it needs, which helps performance because it saves disk space. However, because the rows are variable-length, they can grow when you update them, which can cause extra work. Use VARCHAR when the maximum column length is much larger than the average length, and when updates to the field are rare, so fragmentation is not a problem. CHAR(n) is fixed-length: MySQL allocates enough space for the specified number of characters. It is useful for very short strings and when all the values are nearly the same length. CHAR is also better than VARCHAR for data that's changed frequently. (Note that changing an ENUM or SET field's definition requires an entire rebuild of the table.) When VARCHAR is bad: VARCHAR(255) everywhere is poor design and shows no understanding of the underlying data; disk usage may be efficient, but MySQL's internal memory usage is not.
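A short sketch of these choices (names are illustrative):

```sql
CREATE TABLE person (
    -- always exactly 2 characters: CHAR is the natural fit
    state_code   CHAR(2)      NOT NULL,
    -- hex digest of fixed length: CHAR again
    password_md5 CHAR(32)     NOT NULL,
    -- highly variable length, rarely updated: VARCHAR saves space
    email        VARCHAR(120) NOT NULL
);
```

Sizing the VARCHAR to the data (120 here, not a reflexive 255) keeps in-memory row buffers small.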

Appropriate Data Types

Always define columns as NOT NULL unless there is a good reason not to

Can save a byte per column

Nullable columns make indexes, index statistics, and value comparisons more complicated.

Use the same data types for columns that will be compared in JOINs; otherwise values are converted for comparison

Use NOT NULL unless you want or really expect NULL values; you should define fields as NOT NULL whenever you can. It's harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. If you're planning to index columns, avoid making them nullable if possible. NOT NULL saves up to a byte per column per row of data, with a double benefit for indexed columns. NOT NULL DEFAULT '' is bad design.
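Combining both guidelines above in a sketch (names are hypothetical): columns are NOT NULL, and the join columns share the same type so no conversion is needed.

```sql
CREATE TABLE account (
    id    INT UNSIGNED NOT NULL,
    email VARCHAR(120) NOT NULL,   -- NOT NULL unless NULL is meaningful
    PRIMARY KEY (id)
);

CREATE TABLE login_event (
    account_id INT UNSIGNED NOT NULL,  -- same type as account.id,
                                       -- so the join needs no conversion
    logged_at  DATETIME NOT NULL
);

SELECT a.email, l.logged_at
FROM account a JOIN login_event l ON l.account_id = a.id;
```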

smaller, smaller, smaller

The pygmy marmoset: the world's smallest monkey

The more records you can fit into a single page of memory/disk, the faster your seeks and scans will be. Use appropriate data types

Keep primary keys small

Use TEXT sparingly; consider separate tables

Use BLOBs very sparingly; use the filesystem for what it was intended

Smaller is usually better: in general, try to use the smallest data type that can correctly store and represent your data. Simple is good: fewer CPU cycles are typically required to process operations on simpler data types. Disk = memory = performance. Every single byte counts: fewer disk accesses, and more data in memory.

Indexes

Indexes speed up queries: SELECT ... WHERE name = 'carol'

but only if there is good selectivity: the percentage of distinct values in a column

But... each index will slow down INSERT, UPDATE, and DELETE operations

Indexes are data structures that help retrieve row data with specific column values faster. Indexes can especially improve performance for larger databases, but they do have some downsides: index information needs to be updated every time there are changes made to the table. This means that if you are constantly updating, inserting, and removing entries in your table, indexes can have a negative impact on performance. You can add an index to a table with CREATE INDEX.

Missing Indexes

Always have an index on join conditions

Look to add indexes on columns used in WHERE and GROUP BY expressions

PRIMARY KEY, UNIQUE, and foreign key constraint columns are automatically indexed; other columns can be indexed with CREATE INDEX
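The bullets above can be sketched as follows (table and index names are illustrative):

```sql
-- Index the join condition:
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Index a column used in WHERE:
CREATE INDEX idx_orders_status ON orders (status);

-- A composite index can serve WHERE + GROUP BY together,
-- e.g. WHERE status = ? GROUP BY order_date:
CREATE INDEX idx_orders_status_date ON orders (status, order_date);
```

Each added index speeds the matching reads but adds maintenance cost to every INSERT, UPDATE, and DELETE, so index the columns your queries actually filter, join, and group on.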

MyISAM index structure
Non-clustered organisation

[Diagram: a B-tree whose root covers keys 1-100, with child nodes covering 1-33, 34-66, and 67-100. Non-leaf nodes store keys along with pointers to child nodes; leaf nodes store index keys with pointers to row data in an unordered data file.]

Most MySQL storage engines support B-tree indexes. A B-tree is a tree data structure that sorts data values; tree nodes define the upper and lower bounds of the values in the child nodes. B-trees are kept balanced by requiring that all leaf nodes are at the same depth. MyISAM leaf nodes have pointers to the row data corresponding to the index key.

Clustered organisation (InnoDB)

[Diagram: the same key ranges (root 1-100; children 1-33, 34-66, 67-100), but arranged in a clustered layout where the leaf nodes hold the row data itself.]

So, bottom line:

When looking up a record by a primary key, for a clustered layout/organisation, the lookup operation (following the pointer from the leaf node to the data file) is not needed.

Non-leaf nodes store keys along with pointers to child nodes; leaf nodes actually contain all the row data for the record. In a clustered layout, the leaf nodes contain all the data for the record (not just the index key, as in the non-clustered layout), so when looking up a record by primary key, the extra lookup (following the pointer from the leaf node to the data file) involved in a non-clustered layout is not needed. InnoDB secondary indexes refer to the row by its primary key values.

InnoDB's clustered indexes store the row data in the leaf nodes; the layout is called clustered because rows with close primary key values are stored close to each other. This can make retrieving indexed data fast, since the data is in the index, but it can be slower for updates, secondary indexes, and full table scans.

InnoDB (Clustered) indexes

[Diagram: side-by-side comparison of the MyISAM (non-clustered) and InnoDB (clustered) index layouts]

InnoDB: very important to have as small a primary key as possible. Why? The primary key value is appended to every record in a secondary index

If you don't pick a primary key (bad idea!), one will be created for you

Ref: High Performance MySQL

B-tree indexes

B-tree indexes work well for: match on a key value

Match on a range of values; avoid NULLs in the WHERE clause (NULLs aren't indexed)

Match on a leftmost prefix; avoid LIKE patterns beginning with %

Covering indexes are indexes that contain all the data values needed for a query; such queries can be faster because the row itself does not have to be read.

Covering indexes: when MySQL can locate every field needed for a specific table within an index (as opposed to the full table records), the index is known as a covering index. Covering indexes are critically important for the performance of certain queries and joins. When a covering index is located and used by the optimizer, you will see "Using index" show up in the Extra column of the EXPLAIN output.
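A minimal covering-index sketch (table, column, and index names are hypothetical):

```sql
-- The index contains every column the query below needs,
-- so MySQL can answer it from the index alone.
CREATE INDEX idx_cover ON orders (customer_id, total);

-- Both selected columns come straight from the index;
-- EXPLAIN should report "Using index" in the Extra column.
EXPLAIN SELECT customer_id, total
FROM orders
WHERE customer_id = 42;
```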

Know how your Queries are executed by MySQL

Harness the MySQL slow query log and use EXPLAIN

Prefix your SELECT statement with EXPLAIN: it shows how the MySQL optimizer has chosen to execute the query

You want to make your queries access less data: are queries accessing too many rows or columns?

Use it to see where you should add indexes; consider adding an index for slow queries; this helps find missing indexes early in the development process

You need to understand the SQL queries your application makes and evaluate their performance. To know how your query is executed by MySQL, you can harness the MySQL slow query log and use EXPLAIN. Basically, you want to make your queries access less data: is your application retrieving more data than it needs? Are queries accessing too many rows or columns? Is MySQL analyzing more rows than it needs? Indexes are a good way to reduce data access. When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan; that is, MySQL explains how it would process the SELECT, including information about how tables are joined and in which order. With the help of EXPLAIN, you can see where you should add indexes to tables to get a faster SELECT that uses indexes to find rows. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order. Developers should run EXPLAIN on all SELECT statements that their code executes against the database. This ensures that missing indexes are picked up early in the development process and gives developers insight into how the MySQL optimizer has chosen to execute the query.

MySQL Query Analyser

Find and fix problem SQL: how long a query took

how the optimizer handled it; drill-downs into the results of EXPLAIN statements

Historical and real-time analysis: query execution counts, run time

It's not just slow-running queries that are a problem; sometimes it's SQL that executes a lot that kills your system

The MySQL Query Analyzer is designed to save time and effort in finding and fixing problem queries. It gives DBAs a convenient window, with instant updates and easy-to-read graphics. The analyzer can do simple things, such as tell you how long a recent query took and how the optimizer handled it (the results of EXPLAIN statements), but it can also give historical information, such as how the current runs of a query compare to earlier runs. Most of all, the analyzer speeds up development and deployment, because sites can use it in conjunction with performance testing and the emulation of user activity to find out where the choke points are in the application and how they can expect it to perform after deployment. The MySQL Query Analyzer provides:

An aggregated view into query execution counts, run time, and result sets across all MySQL servers, with no dependence on MySQL logs or SHOW PROCESSLIST

Sortable views by all monitored statistics

Queries searchable and sortable by query type, content, server, database, date/time, interval range, and "when first seen"

Historical and real-time analysis of all queries across all servers

Drill-downs into sampled query execution statistics, fully qualified with variable substitutions, and EXPLAIN results

The Query Analyzer was added to the MySQL Enterprise Monitor, and it packs a lot of punch for those wanting to ensure their systems are free of badly running SQL. Two things stand out from a DBA perspective: 1. It's global: if you have a number of servers, Query Analyzer bubbles the worst SQL across all your servers to the top, which is a much more efficient way to work than the single-server views that Oracle and other database vendors provide. No more wondering which servers you need to spend your time on or which have the worst code. 2. It's smart: believe it or not, sometimes it's not slow-running SQL that kills your system, it's SQL that executes way more times than you think it does. You really couldn't see this well before Query Analyzer, but now you can. One customer shaved double digits off their response time by finding queries that were running far more often than they should have been. And that's just one area Query Analyzer looks at; there's much more intelligence there, along with other stats you can't get from the general server utilities.

Understanding EXPLAIN

Just prepend EXPLAIN to your SELECT statement

Provides the execution plan chosen by the MySQL optimizer for a specific SELECT statement; gives insight into how the optimizer has chosen to execute the query

Use it to see where you should add indexes; this ensures that missing indexes are picked up early in the development process

When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan. That is, MySQL explains how it would process the SELECT, including information about how tables are joined and in which order. With the help of EXPLAIN, you can see where you should add indexes to tables to get a faster SELECT that uses indexes to find rows. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order.EXPLAIN returns a row of information for each "table" used in the SELECT statement, which shows each part and the order of the execution plan. The "table" can mean a real schema table, a derived or temporary table, a subquery, a union result...Developers should run EXPLAIN on all SELECT statements that their code is executing against the database. This ensures that missing indexes are picked up early in the development process and gives developers insight into how the MySQL optimizer has chosen to execute the query.

EXPLAIN: the execution plan

[Diagram: film JOIN film_category JOIN category]

EXPLAIN returns a row of information for each "table" used in the SELECT statement. The "table" can mean a real table, a temporary table, a subquery, or a union result.


With the help of EXPLAIN, you can see where you should add indexes to tables to get a faster SELECT that uses indexes to find rows. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order. EXPLAIN returns a row of information for each "table" used in the SELECT statement, showing each part and the order of the execution plan. The "table" can mean a real schema table, a derived or temporary table, a subquery, or a union result.

The key columns:
- type: the access strategy used to grab the data in this set
- possible_keys: the keys available to the optimizer
- key: the key chosen by the optimizer
- rows: the number of rows MySQL estimates it must examine to execute the query
- Extra: additional information about how MySQL resolves the query

Watch out for Extra values of Using filesort and Using temporary. Using index means information is retrieved from the table using only information in the index tree, without an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index (a covering index).

EXPLAIN example

How MySQL will access the rows to find results. Each row represents information used in the SELECT.



Full Table Scan

EXPLAIN SELECT * FROM customer\G
           id: 1
  select_type: SIMPLE
        table: customer
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
        Extra: Using where

full table scan

Avoid: ensure indexes are on columns that are used in the WHERE, ON, and GROUP BY clauses.

type: shows the "access strategy"

BAD: Using SELECT * FROM

No WHERE condition

How do you know if a scan is used?

In the EXPLAIN output, the type for the table/set will be ALL or index. ALL means a full table data record scan is performed. index means a full index record scan. Avoid them by ensuring indexes are on columns that are used in the WHERE, ON, and GROUP BY clauses.
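As a sketch of the fix (the index name and column here are illustrative, not from the slides): add an index on a column your WHERE clauses actually filter on, then re-run EXPLAIN to confirm the access type is no longer ALL.

```sql
-- Hypothetical example: index a column used in WHERE clauses,
-- then verify with EXPLAIN that the optimizer no longer chooses type: ALL
ALTER TABLE rental ADD INDEX idx_customer_id (customer_id);

EXPLAIN SELECT * FROM rental WHERE customer_id = 42\G
-- type should now show ref (index access) instead of ALL
```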

Understanding EXPLAIN

EXPLAIN SELECT * FROM customer WHERE custid = 1\G
           id: 1
  select_type: SIMPLE
        table: customer
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra:

primary key lookup

Primary key used in the WHERE: very fast because the table has at most one matching row

constant

system, or const: very fast because the table has at most one matching row (For example a primary key used in the WHERE)

The const access strategy is just about as good as you can get from the optimizer. It means the WHERE clause of the SELECT statement used an equality operator on a field indexed with a unique, non-nullable key, and a constant value was supplied. The system access strategy is related to const and refers to a table with only a single row being referenced in the SELECT.

Range Access type

EXPLAIN SELECT * FROM rental
WHERE rental_date BETWEEN '2005-06-14' AND '2005-06-16'\G
           id: 1
  select_type: SIMPLE
        table: rental
         type: range
possible_keys: rental_date
          key: rental_date
      key_len: 8
          ref: NULL
         rows: 364
        Extra: Using where

rental_date must be Indexed

Let's assume we need to find all rentals that were made between the 14th and 16th of June, 2005. We'll need to change our original SELECT statement to use a BETWEEN operator:

SELECT * FROM rental
WHERE rental_date BETWEEN '2005-06-14' AND '2005-06-16'\G

As you can see, the access strategy chosen by the optimizer is the range type. This makes perfect sense, since we are using a BETWEEN operator in the WHERE clause. The BETWEEN operator deals with ranges, as do the comparison operators. The MySQL optimizer is highly optimized to deal with range optimizations. Generally, range operations are very quick, but here are some things you may not be aware of regarding the range access strategy:
- An index must be available on the field operated upon by a range operator
- If too many records are estimated to be returned by the condition, the range operator won't be used; a full index or table scan will be preferred instead
- The field must not be operated on by a function call

Full Table Scan

EXPLAIN SELECT * FROM rental
WHERE rental_date BETWEEN '2005-05-14' AND '2005-06-16'\G
           id: 1
  select_type: SIMPLE
        table: rental
         type: ALL
possible_keys: rental_date
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16000
        Extra: Using where

When a range matches a lot of rows (> ~20% of the table), it forces a scan

If too many rows are estimated to be returned, a scan will be used instead

To demonstrate this scan-versus-seek choice, the range query has been modified to include a larger range of rental dates. The optimizer is no longer using the range access strategy, because the number of rows estimated to be matched by the range condition exceeds a certain percentage of the total rows in the table, which the optimizer uses to decide whether to perform a single scan or a seek operation for each matched record. In this case, the optimizer chose to perform a full table scan, which corresponds to the ALL access strategy you see in the type column of the EXPLAIN output.

Scans and seeks

A seek jumps to a place (on disk or in memory) to fetch row data. This is repeated for each row of data needed.

A scan will jump to the start of the data, and sequentially read (from either disk or memory) until the end of the data

Large amounts of data? Scan operations are usually better than many seek operations.

When the optimizer sees that a condition will return more than ~20% of the rows in a table, it will prefer a scan over many seeks

The scan vs. seek dilemma: behind the scenes, the MySQL optimizer has to decide which access strategy to use in order to retrieve information from the storage engine. One of the decisions it must make is whether to do a seek operation or a scan operation. A seek operation, generally speaking, jumps into a random place -- either on disk or in memory -- to fetch the data needed, and the operation is repeated for each piece of data needed from disk or memory. A scan operation, on the other hand, jumps to the start of a chunk of data and sequentially reads -- either from disk or from memory -- until the end of the chunk. With large amounts of data, sequentially scanning through contiguous data on disk or in memory is faster than performing many random seek operations. MySQL keeps statistics about the uniqueness of values in an index in order to estimate the rows returned (the rows column in the EXPLAIN output). If the estimated number of matched rows is greater than a certain percentage of the total rows in the table, then MySQL will do a scan.
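The row estimates that drive this decision come from index statistics, which you can inspect yourself. A sketch against the rental table used above (the Cardinality column is the optimizer's estimate of distinct values in the index):

```sql
-- Inspect the index statistics the optimizer uses for its seek-vs-scan decision
SHOW INDEX FROM rental;

-- If the statistics look stale, rebuild them
ANALYZE TABLE rental;
```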

When do you get a full table scan?

No WHERE condition (duh.)

No index on any field in WHERE condition

Poor selectivity on an indexed field

Too many records meet WHERE condition

scans can be a sign of poor indexing

The ALL access strategy (full table scan) is definitely something you want to watch out for, particularly if:
- You are not running a data warehouse scenario
- You are supplying a WHERE clause to the SELECT
- You have very large data sets

Sometimes, full table scans cannot be avoided -- and sometimes they can perform better than other access strategies -- but generally they are a sign of a lack of proper indexing on your schema. If you don't have an appropriate index, there is no range optimization.

Covering indexes

When all columns needed from a single table for a SELECT are available in the index

No need to grab the rest of the columns from the data (file or page). Shows up in the Extra column of EXPLAIN as Using index.

Important to know the data index organisation of the storage engine!

Covering Indexes are indexes that contain all the data values needed for a query, these queries can improve performance because the row does not have to be read.

Covering indexes: when MySQL can locate every field needed for a specific table within an index (as opposed to the full table records), the index is known as a covering index. Covering indexes are critically important for the performance of certain queries and joins. When a covering index is located and used by the optimizer, you will see Using index show up in the Extra column of the EXPLAIN output.

Understanding EXPLAIN

There is a huge difference between index in the type column and Using index in the Extra column.

type column ("access strategy"):

const: primary key lookup = good

ref: index access = good

index: the full index tree is scanned = bad

ALL: a full table scan = bad

Extra column (additional information):

Using index: a covering index was found (good!)

Using filesort or Using temporary = bad

Remember that index in the type column means a full index scan. Using index in the Extra column means a covering index is being used. The benefit of a covering index is that MySQL can grab the data directly from the index records and does not need to do a lookup operation into the data file or memory to get additional fields from the main table records. One of the reasons that using SELECT * is not a recommended practice is because by specifying columns instead of *, you have a better chance of hitting a covering index.
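As a sketch (the index and column names here are illustrative): if a query only ever needs two columns from a table, a composite index on exactly those columns can serve it entirely from the index tree.

```sql
-- Composite index containing every column the query touches
ALTER TABLE rental ADD INDEX idx_cust_date (customer_id, rental_date);

-- The SELECT lists only indexed columns, so MySQL can answer from the
-- index alone; EXPLAIN should show "Using index" in the Extra column
EXPLAIN SELECT customer_id, rental_date
FROM rental
WHERE customer_id = 42\G
```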

EXPLAIN example

Covering indexes are useful. Why? The query executes fully from the index, without having to read the data row!



Operating on indexed column with a function

Indexes speed up SELECTs on a column, but...

An indexed column within a function cannot be used: SELECT ... WHERE SUBSTR(name, 3)

Most of the time, there are ways to rewrite the query to isolate the indexed column on the left side of the comparison

Indexes can quickly find the rows that match a WHERE clause, however this works only if the index is NOT used in a function or expression in the WHERE clause.

Indexed columns and functions don't mix

mysql> EXPLAIN SELECT * FROM film WHERE title LIKE 'Tr%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
         type: range
possible_keys: idx_title
          key: idx_title
      key_len: 767
          ref: NULL
         rows: 15
        Extra: Using where

mysql> EXPLAIN SELECT * FROM film WHERE LEFT(title, 2) = 'Tr'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 951
        Extra: Using where

Nice. In the top query, we have a fast range access on the indexed field title.

Oops. Here we have a slower full table scan, because of the function operating on the indexed field (the LEFT() function). The indexed column should be alone on the left side of the comparison.

In the first example, a fast range access strategy is chosen by the optimizer, and the index on title is used to winnow the query results down. In the second example, a slow full table scan (the ALL access strategy) is used, because a function (LEFT) is operating on the title column. Operating on an indexed column with a function (in this case LEFT()) means the optimizer cannot use the index to satisfy the query. Typically, you can rewrite queries so they do not operate on an indexed column with a function.

Partitioning

Vertical partitioning: split tables with many columns into multiple tables, limiting the number of columns per table

Horizontal partitioning: split a table by rows into partitions

Both are important for different reasons

Partitioning in MySQL 5.1 is horizontal partitioning for data warehousing

Niccolò Machiavelli, The Art of War (1519-1520): divide the forces of the enemy and they become weaker.

The main goal of partitioning is to reduce the amount of data read for particular SQL operations, so that overall response time is reduced.

Vertical partitioning: this scheme is traditionally used to reduce the width of a target table by splitting it vertically, so that only certain columns are included in a particular dataset, with each partition including all rows. An example of vertical partitioning might be a table that contains a number of very wide text or BLOB columns that aren't addressed often, broken into two tables: the most-referenced columns in one table and the seldom-referenced text or BLOB data in another.

Horizontal partitioning: this form of partitioning segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one to all partitions). All columns defined on the table are found in each partition, so no table attributes are missing. An example of horizontal partitioning might be a table that contains historical data being partitioned by date.

vertical partitioning

Mixing frequently and infrequently accessed attributes in a single table?

Space in the buffer pool at a premium? Splitting the table allows main records to consume the buffer pages without the extra data taking up space in memory.

Need FULLTEXT on your text columns?

CREATE TABLE Users (
  user_id INT NOT NULL AUTO_INCREMENT,
  email VARCHAR(80) NOT NULL,
  display_name VARCHAR(50) NOT NULL,
  password CHAR(41) NOT NULL,
  first_name VARCHAR(25) NOT NULL,
  last_name VARCHAR(25) NOT NULL,
  address VARCHAR(80) NOT NULL,
  city VARCHAR(30) NOT NULL,
  province CHAR(2) NOT NULL,
  postcode CHAR(7) NOT NULL,
  interests TEXT NULL,
  bio TEXT NULL,
  signature TEXT NULL,
  skills TEXT NULL,
  PRIMARY KEY (user_id),
  UNIQUE INDEX (email)
) ENGINE=InnoDB;

Less frequently referenced, TEXT data

Frequently referenced

CREATE TABLE Users (
  user_id INT NOT NULL AUTO_INCREMENT,
  email VARCHAR(80) NOT NULL,
  display_name VARCHAR(50) NOT NULL,
  password CHAR(41) NOT NULL,
  PRIMARY KEY (user_id),
  UNIQUE INDEX (email)
) ENGINE=InnoDB;

CREATE TABLE UserExtra (
  user_id INT NOT NULL,
  first_name VARCHAR(25) NOT NULL,
  last_name VARCHAR(25) NOT NULL,
  address VARCHAR(80) NOT NULL,
  city VARCHAR(30) NOT NULL,
  province CHAR(2) NOT NULL,
  postcode CHAR(7) NOT NULL,
  interests TEXT NULL,
  bio TEXT NULL,
  signature TEXT NULL,
  skills TEXT NULL,
  PRIMARY KEY (user_id),
  FULLTEXT KEY (interests, skills)
) ENGINE=MyISAM;

An example of vertical partitioning might be a table that contains a number of very wide text or BLOB columns that aren't addressed often, broken into two tables that have the most-referenced columns in one and the seldom-referenced text or BLOB data in the other. Limit the number of columns per table; split large, infrequently used columns into a separate one-to-one table. By removing the wide columns from the design, you actually get a reduction in query response time. Beyond partitioning, this speaks to the effect wide tables can have on queries, and why you should always ensure that all columns defined on a table are actually needed.

Understanding the Query Cache

[Architecture diagram: Clients → Connection Handling & Net I/O → Parser → Optimizer → Query Cache → Pluggable Storage Engine API → MyISAM, InnoDB, MEMORY, Falcon, Archive, PBXT, SolidDB, Cluster (NDB). Callout: the query cache caches the complete query.]

Query cache

Caches the complete query

Coarse invalidation: any modification to any table in the SELECT invalidates any cache entry which uses that table

Good for read-mostly tables. Fast queries when no table changes.

Remedy with vertical table partitioning
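You can watch the cache's behavior with the server's built-in variables and counters; a sketch:

```sql
-- Is the query cache on, and how big is it?
SHOW VARIABLES LIKE 'query_cache%';

-- Hit/insert/prune counters; a low ratio of Qcache_hits to Qcache_inserts
-- suggests entries are being invalidated faster than they are reused
SHOW STATUS LIKE 'Qcache%';
```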

vertical partitioning ... continued

Mixing static attributes with frequently updated fields in a single table? Each time an update occurs, queries referencing the table are invalidated in the query cache.

Doing COUNT(*) with no WHERE on an indexed field of an InnoDB table? Full table counts are slow on InnoDB.

CREATE TABLE Products (
  product_id INT NOT NULL,
  name VARCHAR(80) NOT NULL,
  unit_cost DECIMAL(7,2) NOT NULL,
  description TEXT NULL,
  image_path TEXT NULL,
  num_views INT UNSIGNED NOT NULL,
  num_in_stock INT UNSIGNED NOT NULL,
  num_on_order INT UNSIGNED NOT NULL,
  PRIMARY KEY (product_id),
  INDEX (name(20))
) ENGINE=InnoDB;

-- Getting a simple COUNT of products:
-- easy on MyISAM, terrible on InnoDB
SELECT COUNT(*) FROM Products;

frequently updated fields

CREATE TABLE Products (
  product_id INT NOT NULL,
  name VARCHAR(80) NOT NULL,
  unit_cost DECIMAL(7,2) NOT NULL,
  description TEXT NULL,
  image_path TEXT NULL,
  PRIMARY KEY (product_id),
  INDEX (name(20))
) ENGINE=InnoDB;

CREATE TABLE ProductCounts (
  product_id INT NOT NULL,
  num_views INT UNSIGNED NOT NULL,
  num_in_stock INT UNSIGNED NOT NULL,
  num_on_order INT UNSIGNED NOT NULL,
  PRIMARY KEY (product_id)
) ENGINE=InnoDB;

CREATE TABLE TableCounts (
  total_products INT UNSIGNED NOT NULL
) ENGINE=MEMORY;

static attributes

count
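The slides don't show how the counter tables are kept up to date; one hedged approach is a trigger (the trigger name and body below are illustrative, not from the original deck):

```sql
-- Keep ProductCounts and TableCounts in step with Products,
-- so full COUNT(*) scans of the InnoDB table become unnecessary
DELIMITER //
CREATE TRIGGER products_after_insert
AFTER INSERT ON Products
FOR EACH ROW
BEGIN
  INSERT INTO ProductCounts (product_id, num_views, num_in_stock, num_on_order)
  VALUES (NEW.product_id, 0, 0, 0);
  UPDATE TableCounts SET total_products = total_products + 1;
END//
DELIMITER ;
```

A matching AFTER DELETE trigger would decrement the count; the MEMORY counter table would need to be reseeded at server start.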

Solving multiple problems in one query

-- original: function on the indexed column, and non-deterministic
SELECT * FROM Orders
WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) <= 7;

-- rewritten: constant date, index-friendly
SELECT * FROM Orders
WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY;

We replaced the function CURRENT_DATE() with a constant. However, we are specifying SELECT * instead of the actual fields we need from the table.

What if there are fields we don't need? Could cause large result set which may not fit into the query cache and may force a disk-based temporary table

SELECT order_id, customer_id, order_total, order_created FROM Orders WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY;

Although we rewrote the WHERE expression to remove the function on the index, we still have a non-deterministic function CURRENT_DATE() in the statement, which eliminates this query from being placed in the query cache. Any time a non-deterministic function is used in a SELECT statement, the query cache ignores the query. In read-intensive applications, this can be a significant performance problem. let's fix that:SELECT * FROM Orders WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY;We replaced the function with a constant (probably using our application programming language). However, we are specifying SELECT * instead of the actual fields we need from the table. What if there is a TEXT field in Orders called order_memo that we don't need to see? Well, having it included in the result means a larger result set which may not fit into the query cache and may force a disk-based temporary table. let's fix that:

SELECT order_id, customer_id, order_total, order_createdFROM Orders WHERE order_created >= '2008-01-11' - INTERVAL 7 DAY;

Scalability: MySQL 5.1 Horizontal Partitioning

[Diagram: Web/App servers writing to table partitions: cust_id 1-999, 1000-1999, 2000-2999]

MySQL Partitioning

Split table with many rows into partitions by range, key

Logical splitting of tables. No need to create separate tables.

Transparent to user

Why? To make range selects faster. Good for data warehouses.

Archival and Date based partitioning

CREATE TABLE cust (id INT) ENGINE=MyISAM
PARTITION BY RANGE (id) (
  PARTITION P1 VALUES LESS THAN (10),
  PARTITION P2 VALUES LESS THAN (20)
);

An important new 5.1 feature is horizontal partitioning. It increases performance during scan operations: the MySQL optimizer knows which partitions contain the data that will satisfy a particular query and accesses only those partitions during query execution. Partitioning is best suited to VLDBs with a lot of query activity that targets specific portions/ranges of one or more tables, though other situations lend themselves to partitioning as well (e.g. data archiving). Good for data warehousing; not designed for OLTP environments.
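In MySQL 5.1 you can see this pruning directly with EXPLAIN PARTITIONS. For the cust table above, a query on a single id should touch only one partition:

```sql
EXPLAIN PARTITIONS SELECT * FROM cust WHERE id = 15\G
-- the partitions column should list only P2, showing that
-- partition pruning skipped P1 entirely
```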

Scalability: Sharding - Application Partitioning

[Diagram: Web/App servers routing to separate shards: cust_id 1-999, 1000-1999, 2000-2999]

Sharding Architecture

Lazy loading and JPA

Default FetchType is LAZY for 1:m and m:n relationships. This benefits large objects and relationships.

However, for use cases where the related data is needed, it can cause n+1 selects

Capture the generated SQL via the persistence.xml file:
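The property that was on this slide depends on the JPA provider; as a sketch, for TopLink Essentials (GlassFish's default provider at the time) it would look like this, with the persistence-unit name being illustrative:

```xml
<!-- Log the generated SQL. TopLink Essentials property shown; for
     EclipseLink use eclipselink.logging.level.sql, for Hibernate
     use hibernate.show_sql -->
<persistence-unit name="myPU">
  <properties>
    <property name="toplink.logging.level" value="FINE"/>
  </properties>
</persistence-unit>
```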

Examine the SQL statements; optimize the number of SQL statements executed!

only retrieve the data your application needs!

public class Employee {
    @OneToMany(mappedBy = "employee")
    private Collection<Address> addresses;
    ...
}

Lazy loading and JPA: with JPA, one-to-many and many-to-many relationships lazy load by default, meaning they will be loaded when the entity in the relationship is accessed. Lazy loading is usually good, but if you need to access all of the "many" objects in a relationship, it will cause n+1 selects, where n is the number of "many" objects. You can change the relationship to be loaded eagerly with fetch = FetchType.EAGER on the mapping annotation. However, you should be careful with eager loading, which can cause SELECT statements that fetch too much data; it can cause a Cartesian product if you eagerly load entities with several related collections. If you want to temporarily override the LAZY fetch type, you can use a fetch join in a query, which eagerly loads the relationship for that query only.

Lazy loading and JPA

Relationships can be loaded eagerly. But if you have several related relationships, this could load too much!

ORTemporarily override the LAZY fetch type, use Fetch Join in a query:

public class Employee {
    @OneToMany(mappedBy = "employee", fetch = FetchType.EAGER)
    private Collection<Address> addresses;
    ...
}

@NamedQueries({
    @NamedQuery(name = "getItEarly",
                query = "SELECT e FROM Employee e JOIN FETCH e.addresses")
})
public class Employee { ... }


Scalability improvements - more CPUs / cores than before.

MySQL/InnoDB scales up to 16-way x86 servers and 64-way CMT servers

Subquery optimizations decrease response times (in some cases > 99%)

New join methods improve speed of queries

And more (DTrace probes, replication heartbeat)

GA Target: December 2009

MySQL Server 5.4


Solaris x86 sysbench benchmark MySQL 5.4 vs. 5.1

Scalability improvements

Requirements            | MySQL Replication | Replication + Heartbeat | Heartbeat + DRBD | MySQL Cluster
Availability:
Automated IP Fail Over  | No  | Yes | Yes | No
Automated DB Fail Over  | No  | No  | Yes | Yes
Typical Fail Over Time  | Varies | Varies | < 30 secs | < 3 secs
Auto Resynch of Data    | No  | No  | Yes | Yes
Geographic Redundancy   | Yes | Yes | If configured correctly | MySQL Replication
Scalability:
# of Nodes per Cluster  | Master/Slave(s) | Master/Slave(s) | Active/Passive | 255
# of Slaves             | Dozens for Reads | Dozens for Reads | Dozens for Reads | Dozens for Reads
Read Intensive          | Yes | Yes | MySQL Replication | Yes
Write Intensive         | No  | No  | Yes | Yes
Built-in Load Balancing | No  | No  | No  | Yes

MySQL: #3 Most Deployed Database

Source: Gartner, 2006

63% Are Deploying MySQL or Are Planning To Deploy

Subscription:

MySQL Enterprise

License (OEM):

Embedded Server

Support

MySQL Cluster Carrier-Grade

Training

Consulting

NRE

Server / Monitor / Support

MySQL Enterprise Server

Monthly Rapid Updates

Quarterly Service Packs

Hot Fix Program

Extended End-of-Life

Global Monitoring of All Servers

Web-Based Central Console

Built-in Advisors

Expert Advice

Specialized Scale-Out Help

24 x 7 x 365 Production Support

Web-Based Knowledge Base

Consultative Help

Bug Escalation Program

MySQL Enterprise

Added Value of MySQL Enterprise

Comprehensive offering of production support, monitoring tools, and MySQL database software

Optimal performance, reliability, security, and uptime

Open-source server
with pluggable APIs

Monitoring

Enterprise manager

Query analysis

Hot fixes

Service packs

Best practices rules

Knowledge base

24x7 support

Advanced backup

Load balancer

Single, consolidated view into entire MySQL environment

Auto-discovery of MySQL servers, replication topologies

Customizable rules-based monitoring and alerts

Identifies problems before they occur

Reduces risk of downtime

Makes it easier to scale out without requiring more DBAs

MySQL Enterprise Monitor

A Virtual MySQL DBA Assistant!

facebook

Application: Facebook is a social networking site

Key Business Benefit: MySQL has enabled Facebook to grow to 70 million users.

Why MySQL? "We are one of the largest MySQL web sites in production. MySQL has been a revolution for young entrepreneurs."

Owen Van Natta, Chief Operating Officer, Facebook


Facebook is an excellent example of a company that started using MySQL in its infancy and has scaled MySQL to become one of the top 10 most trafficked web sites in the world.

Facebook deploys hundreds of MySQL servers with replication in multiple data centers to manage:
- 175M active users
- 26 billion photos
- Serving 250,000 photos every second
Facebook is also a heavy user of memcached, an open-source caching layer, to improve performance and scalability: memcached handles 50,000-100,000 requests/second, alleviating the database burden.

MySQL also helps Facebook manage their Facebook applications

20,000 applications which are helping other web properties grow exponentially.

iLike (music sharing) added 20,000 users/hour after launching their Facebook application

eBay

Application: real-time personalization for advertising. Key Business Benefits: handles eBay's personalization needs; manages 4 billion requests per day. Why MySQL Enterprise? Cost-effective

Performance: 13,000 TPS on Sun Fire x4100

Scalability: Designed for 10x future growth

Monitoring: MySQL Enterprise Monitor

Chris Kasten, Kernel Framework Group, eBay

- eBay is a heavy Oracle user, but Oracle had become too expensive, and it was cost-prohibitive to deploy new applications.- MySQL is used to run eBay's Personalization Platform, which serves advertisements based on user interest.- A business-critical system running on MySQL Enterprise for one of the largest-scale websites in the world- A highly scalable, low-cost system that handles all of eBay's personalization and session data needs- Ability to handle 4 billion requests per day of 50/50 read/write operations, for approximately 40KB of data per user/session- Approx. 25 Sun 4100s running 100% of eBay's personalization and session data service (2-CPU dual-core Opteron, 16 GB RAM, Solaris 10 x86)

- Highly manageable system for entire operational life cycle - Leveraging MySQL Enterprise Dashboard as a critical tool in providing insight into system performance, trending, and identifying issues - Adding new applications to ebay.com domain that previously would have been in a different domain because of cookie constraints- Creating several new business opportunities that would not have been possible without this new low cost personalization platform- Leveraging MySQL Memory Engine for other types of caching tiers that are enabling new business opportunities

Zappos

Application: $800 million online retailer of shoes. Zappos stocks over 3 million items. Key Business Benefit: Zappos selected MySQL because it was the most robust, affordable database software available at the time. Why MySQL? "MySQL provides the perfect blend of an enterprise-level database and a cost-effective technology solution. In my opinion, MySQL is the only database we would ever trust to power the Zappos.com website." Kris Ongbongan, IT Manager

Zappos is one of the world's largest online retailers, with over $1 billion in annual sales. They focus on selling shoes, handbags, eyewear, and other apparel; however, their primary focus is delivering superior customer service, which they believe is key to a successful online shopping experience. MySQL plays a critical role in delivering that customer service by providing Zappos with:

High performance and scalability enabling millions of customers to shop on Zappos.com every day.

99.99% database availability, so that Zappos' customers don't experience service interruptions that impact revenue.

A cost-effective solution saving Zappos over $1 million per year, allowing them to spend more money on customer service and less on technical infrastructure.

Since Zappos was founded in 1999, they have used MySQL as their primary database to power their web site, internal tools, and reporting tasks. In the early days, Zappos could not afford a proprietary enterprise database; as the company has grown, MySQL has scaled with the business, making it a perfect solution even at their current sales volume. It has been an important piece of infrastructure that they have scaled as the company has grown to $1 billion in sales.
Compared to proprietary enterprise systems, Zappos estimates they are saving about $1 million per year in licensing fees and in the salaries of dedicated DBAs who can only manage individual systems. Over the lifetime of the company, they estimate they have saved millions of dollars using MySQL.

Glassfish and MySQL Part 2

[Diagram: Registration application — JSF components → Item managed bean → Catalog session bean → Item entity class → database]

Catalog Sample Java EE Application

Glassfish and MySQL Part 3

[Diagram: Registration application — JSF components → Item managed bean → Catalog EJB web service → Item entity class → database; a SOAP web service client consumes the service]

Catalog Sample JAX-WS Application

Glassfish and MySQL Part 4

[Diagram: RESTful Catalog application — JavaFX RIA app → HTTP → REST web services (JAX-RS ItemsResource, JAXB ItemsConverter) → persistence tier (Item entity class) → database]

In Conclusion

Understand the storage engines

Keep data types small: data size = disk I/O = memory = performance

Know your SQL: use EXPLAIN and the Query Analyzer

Understand the query optimizer

Use good indexing
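To make these closing points concrete, here is a small sketch (the table, column, and index names are invented for illustration): keep each column as small as its data allows, index the columns your queries filter on, and then verify with EXPLAIN that the optimizer actually uses the index.

```sql
-- Compact data types: UNSIGNED INT instead of BIGINT, a short
-- VARCHAR instead of TEXT, and a 1-byte TINYINT status flag.
CREATE TABLE item (
  item_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name    VARCHAR(80) NOT NULL,
  price   DECIMAL(8,2) NOT NULL,
  status  TINYINT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (item_id)
) ENGINE=InnoDB;

-- Index the column the query filters on.
CREATE INDEX idx_item_status ON item (status);

-- EXPLAIN shows whether the index is used: check the "key"
-- column (it should name idx_item_status) and the "rows" estimate.
EXPLAIN SELECT item_id, name FROM item WHERE status = 1;
```

If EXPLAIN reports a full table scan (type ALL, key NULL) instead of the index, that is the cue to revisit the indexing or rewrite the query.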

Resources

MySQL Forge and the Forge Wiki: http://forge.mysql.com/

Planet MySQL: http://planetmysql.org/

MySQL DevZone: http://dev.mysql.com/

High Performance MySQL book

http://java.sun.com/developer/technicalArticles/glassfish/GFandMySQL_Part2.html

http://java.sun.com/developer/technicalArticles/glassfish/GFandMySQL_Part4Intro.html
