Migrating from PostgreSQL to MySQL at Cocolog

Uploaded by hollye, posted 10-Jan-2016

DESCRIPTION

Migrating from PostgreSQL to MySQL at Cocolog. Naoto Yokoyama, NIFTY Corporation; Garth Webb, Six Apart; Lisa Phillips, Six Apart. Credits: Kenji Hirohama, Sumisho Computer Systems Corp. Agenda: 1. What is Cocolog; 2. History of Cocolog; 3. DBP: Database Partitioning; 4. Migration from PostgreSQL to MySQL.

TRANSCRIPT

  • Migrating from PostgreSQL to MySQL at Cocolog

    Naoto Yokoyama, NIFTY Corporation
    Garth Webb, Six Apart
    Lisa Phillips, Six Apart

    Credits: Kenji Hirohama, Sumisho Computer Systems Corp.

  • Agenda
    1. What is Cocolog
    2. History of Cocolog
    3. DBP: Database Partitioning
    4. Migration from PostgreSQL to MySQL

  • 1. What is Cocolog

  • What is Cocolog
    NIFTY Corporation
    - Established in 1986; a Fujitsu Group company
    - NIFTY-Serve (licensed and interconnected with CompuServe)
    - One of the largest ISPs in Japan
    Cocolog
    - First blog community at a Japanese ISP
    - Based on TypePad technology by Six Apart
    - Several hundred million page views/month
    History
    - Dec 2, 2003: Cocolog for ISP users launched
    - Nov 24, 2005: Cocolog Free launched
    - Apr 5, 2007: Cocolog for Mobile Phone launched

  • Cocolog (screenshot of home page): 2008/04, 700 thousand users

  • Cocolog (screenshot of home page): TypePad and Cocolog

  • Cocolog template sets

  • Cocolog growth (users): chart of Cocolog and Cocolog Free across phases 1-4

  • Cocolog growth (entries): chart of Cocolog and Cocolog Free across phases 1-4

  • Technology at Cocolog
    Core system: Linux 2.4/2.6; Apache 1.3/2.0/2.2 with mod_perl; Perl 5.8+ with CPAN; PostgreSQL 8.1; MySQL 5.0; memcached/TheSchwartz/cfengine
    Eco system: LAMP, LAPP, Ruby + ActiveRecord, Capistrano, etc.

  • Monitoring
    Management tool: proprietary in-house development with PostgreSQL, PHP, and Perl
    Monitoring points (in order of priority):
    - response time of each post
    - number of spam comments/trackbacks
    - number of comments/trackbacks
    - source IP address of spam
    - number of entries
    - number of comments via mobile devices
    - page views via mobile devices
    - time of batch completion
    - amount of API usage
    - bandwidth usage
    DB: disk I/O, memory and CPU usage, time of VACUUM ANALYZE
    App: number of active processes, CPU usage, memory usage

  • Tips for migration
    Troubles with PostgreSQL 7.4-8.1 and Linux 2.4/2.6: VACUUM, data size, character set, cleaning data
    Troubles with MySQL: CONVERT_TZ function, sort order

  • 2. History of Cocolog

  • Phase 1, 2003/12 (entries: 0.04 million)
    Register -> PostgreSQL; static contents published to NAS and web servers
    Before DBP: 10 servers

  • Phase 2, 2004/12 (entries: 7 million)
    Added: podcast, portal, profile, rich templates, publish book, telephone operator support, etc.
    Register -> TypePad -> PostgreSQL; static contents published to NAS and web servers
    Before DBP: 50 servers

  • Phase 2 problems
    - The system is tightly coupled.
    - The database server receives queries from multiple points.
    - It is difficult to change the system design and database schema.

  • Phase 3, 2006/3 (entries: 12 million)
    Added: Web API, memcached
    Register -> TypePad -> PostgreSQL; static contents published to NAS and web servers
    Before DBP: 200 servers

  • Phase 4, 2007/4 (entries: 16 million)
    Added: Atom, mobile web
    Register -> TypePad -> PostgreSQL; static contents published to NAS and web servers
    Before DBP: 300 servers

  • Now, 2008/4
    Register -> TypePad -> multiple MySQL partitions; Web API, memcached, Atom, mobile web; static contents published to NAS and web servers
    After DBP: 150 servers

  • 3. TypePad Database Partitioning

  • Steps for transitioning
    1. Server Preparation: hardware and software setup
    2. Global Write: write user information to the global DB
    3. Global Read: read/write user information on the global DB
    4. Move Sequence: table sequences served by the global DB
    5. User Data Move: move user data to user partitions
    6. New User Partition: all new users saved directly to user partition 1
    7. New User Strategy: decide on a strategy for the new user partitions
    8. Non-User Data Move: move all non-user-owned data

  • TypePad overview (pre-DBP)
    - Storage: database (Postgres, port 5432) and static content (HTML, images, etc.) over NFS (port 2049)
    - Web server (http 80 / https 443) and application server
    - TypeCast server: dedicated server for TypeCast (via Atom API, http 80)
    - memcached (port 11211): data-caching servers to reduce DB load
    - Mail server (smtp 25 / pop 110)
    - Admin (cron) server for periodic asynchronous tasks
    - Clients: blog readers, blog owners, mobile blog readers

  • Why partition?
    Current setup: all inquiries (access) go to one DB (Postgres).
    After DBP: inquiries are divided among several DBs (MySQL) by role: Global, Non-User, User1, User2, User3.

  • Server Preparation
    New expanded setup: MySQL DBs for partitioned data alongside the current PostgreSQL setup.
    - Global role: maintains user mapping and primary-key generation
    - Non-user role: information that does not need to be partitioned (such as session information)
    - User roles (User1-User3): user information is partitioned
    - Job server (TypePad + Schwartz) with a Schwartz DB: stores job details; server for executing jobs asynchronously
    Grey areas of the diagram are not used in the current step.

  • Global Write: creating the user map
    For new registrations only, uniquely identifying user data is written to the global DB. The same data continues to be written to the existing DB (PostgreSQL).

  • Global Read: use the user map to find the user partition
    - Migrate existing user data to the global DB.
    - At the start of a request, the application queries the global DB for the location of the user's data, then talks to that DB for all queries about this user.
    - At this stage the global DB points to the user0 partition in all cases.
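The Global Read step can be sketched as a thin lookup layer. This is a hedged illustration, not TypePad's actual code (which is Perl): the dict stands in for the global DB's user map, and the DSN strings are made up.

```python
# Sketch of the Global Read lookup. The dict stands in for the global DB's
# user map; DSN strings are illustrative. At this migration stage every
# existing user still maps to the legacy "user0" partition.
GLOBAL_USER_MAP = {
    101: "user0",
    102: "user0",
}

PARTITION_DSNS = {
    "user0": "dbi:Pg:dbname=typepad",     # legacy PostgreSQL partition
    "user1": "dbi:mysql:database=user1",  # new MySQL user partitions
    "user2": "dbi:mysql:database=user2",
    "user3": "dbi:mysql:database=user3",
}

def partition_for(user_id):
    """Ask the global role where this user's data lives and return the
    connection string for that partition (default: the legacy DB)."""
    role = GLOBAL_USER_MAP.get(user_id, "user0")
    return PARTITION_DSNS[role]
```

Once user data starts moving, only the map entries change; callers keep using the same lookup.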

  • Move Sequence: migrating primary-key generation
    - Postgres sequences (for generating unique primary keys) are migrated to tables on the global DB that act as pseudo-sequences.
    - The application requests new primary keys from the global DB rather than from the user partition.
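The pseudo-sequence idea (described in more detail in the speaker notes: a table with a single auto-increment ID column, one per user table) can be sketched as follows. sqlite3 stands in for MySQL here, and the table name is hypothetical; in MySQL the app would read LAST_INSERT_ID() after the INSERT.

```python
import sqlite3

# A pseudo-sequence: a table whose only column is an auto-increment id.
# One such table exists per user table on the global DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry_seq (id INTEGER PRIMARY KEY AUTOINCREMENT)")

def next_entry_id(conn):
    """Insert a row into the pseudo-sequence and use the generated id
    as the next primary key for the partitioned table."""
    cur = conn.execute("INSERT INTO entry_seq DEFAULT VALUES")
    return cur.lastrowid
```

Because the global DB hands out the IDs, rows created on any user partition stay globally unique.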

  • User Data Move: moving user data to the new user-role partitions
    - Existing users to be migrated are submitted as new Schwartz jobs; their data is then migrated asynchronously by the job server.
    - If a comment arrives while the user is being migrated, it is saved in the Schwartz DB to be published later.
    - After migration, all user data lives on the user-role DB partitions; once every user is migrated, only non-user data remains on Postgres.

  • New User Partition: new registrations are created on one user-role partition
    When new users register, their data is written to a user-role partition. Non-user data continues to be served off Postgres.

  • New User Strategy: pick a scheme for distributing new users
    When new users register, user data is written to one of the user-role partitions, depending on a set distribution method (round robin, random, etc.). Non-user data continues to be served off Postgres.
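The default distribution method (round robin, per the speaker notes) can be sketched in a few lines; partition names are illustrative.

```python
import itertools

# Sketch of the new-user distribution strategy: round robin over the
# user-role partitions (the slide also allows random, etc.).
PARTITIONS = ["user1", "user2", "user3"]
_next_partition = itertools.cycle(PARTITIONS)

def assign_partition():
    """Pick the partition that will own a newly registered user."""
    return next(_next_partition)
```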

  • Non-User Data Move: migrate data that cannot be partitioned by user
    Migrate the non-user-role data left on PostgreSQL (such as session information) to the MySQL side.

  • Data migration done
    All data access is now done through MySQL. Continue to use TheSchwartz for asynchronous jobs.

  • The new TypePad configuration
    - Storage: database (MySQL, port 3306) and static content (HTML, images, etc.) over NFS (port 2049)
    - Web server (http 80 / https 443) and application server
    - TypeCast server: dedicated server for TypeCast (via Atom API, http 80)
    - memcached (port 11211): data-caching servers to reduce DB load
    - Mail server (smtp 25 / pop 110)
    - Admin (cron) server for periodic asynchronous tasks
    - Job server: TheSchwartz server for running ad-hoc jobs asynchronously
    - Clients: blog readers, blog owners (management interface), mobile blog readers

  • 4. Migration from PostgreSQL to MySQL

  • DB node spec history: history of scaling up the PostgreSQL server, before DBP

  • DB disk-array spec (FUJITSU ETERNUS8000)

    - Best I/O transaction performance in the world
    - 146 GB (15 krpm) x 32 disks with RAID 10
    - Multipath 4 Gbps Fibre Channel
    - QuickOPC (One Point Copy): OPC copy functions let you create a duplicate copy of any data from the original at any chosen time.
    http://www.computers.us.fujitsu.com/www/products_storage.shtml?products/storage/fujitsu/e8000/e8000

  • Scale out MySQL servers, after DBP
    Role configuration: each role is configured as an HA cluster with shared storage.
    HA software: NEC ClusterPro

  • Scale out MySQL servers, after DBP
    Diagram: TypePad application -> PostgreSQL on a Fibre Channel SAN disk array, with heartbeat between cluster nodes.

  • Scale out MySQL servers, after DBP
    Backup: replication with hot backup

  • Scale out MySQL servers, after DBP
    Diagram: TypePad application -> mysqld instances replicating to a MySQL backup role, alongside PostgreSQL, with heartbeat, Fibre Channel SAN, disk array, and OPC copy for backup.

  • Troubles with PostgreSQL 7.4-8.1
    Data size
    - over 100 GB, of which 40% is index
    - severe data fragmentation
    VACUUM
    - VACUUM ANALYZE causes performance problems and takes too long on large amounts of data
    - dump/restore is the only solution for de-fragmentation
    - we don't use autovacuum, since we are worried about latent response time

  • Troubles with PostgreSQL 7.4-8.1
    Character set: PostgreSQL accepted out-of-range UTF-8 Japanese extended character sets and multi-byte character sets that should normally have been rejected with an error.

  • Cleaning data
    Removing character sequences that are outside the boundaries of UTF-8.
    Steps:
    1. Dump all data from PostgreSQL
    2. Split the dump for piconv
    3. Convert UTF8 -> UCS2 -> UTF8 and merge
    4. Restore into PostgreSQL
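The piconv round-trip can be sketched in Python. This is an illustration of the idea only; the actual job ran piconv over split dump files, and the replace-with-U+FFFD policy is an assumption.

```python
# Sketch of the cleaning pass: round-trip text through UCS-2, as the
# slides do with piconv (UTF8 -> UCS2 -> UTF8). Invalid UTF-8 byte
# sequences, and code points UCS-2 cannot represent, are replaced
# with U+FFFD instead of breaking the restore.
def clean_utf8(raw: bytes) -> bytes:
    text = raw.decode("utf-8", errors="replace")
    # UCS-2 holds only the Basic Multilingual Plane (<= U+FFFF).
    ucs2_safe = "".join(ch if ord(ch) <= 0xFFFF else "\ufffd" for ch in text)
    return ucs2_safe.encode("utf-8")
```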

  • Migration from PostgreSQL to MySQL using a TypePad script
    Steps:
    1. PostgreSQL -> Perl objects & tmp publish
    2. MySQL -> Perl objects & last publish
    3. diff tmp last: object data check
    4. diff tmp last: published-file check
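The two checks above (object data diff, published-file diff) can be sketched as follows. The object shape and the render() callback are hypothetical stand-ins for TypePad's Perl objects and its publishing step.

```python
# Sketch of the verification step: compare the object data loaded from
# each database, then compare the published output rendered from it.
def verify_migration(pg_objects, my_objects, render):
    """Return ids whose data or published output differ between the two
    databases; an empty list means the migration checks out."""
    mismatches = []
    for obj_id, pg_obj in pg_objects.items():
        my_obj = my_objects.get(obj_id)
        if my_obj != pg_obj:                    # data check (diff tmp last)
            mismatches.append(obj_id)
        elif render(pg_obj) != render(my_obj):  # published-file check
            mismatches.append(obj_id)
    return mismatches
```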

  • Troubles with MySQL
    - CONVERT_TZ function: doesn't support input values outside the scope of Unix time
    - sort order: different sort order when no ORDER BY clause is given
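One way to cope with the CONVERT_TZ limitation is to guard timestamps in the application before relying on it. The 1970..2038 window used below is the classic 32-bit Unix-time range and is an assumption about the affected bounds, not MySQL's documented limits.

```python
from datetime import datetime, timezone

# Guard for CONVERT_TZ: MySQL cannot convert values outside the
# Unix-time range, so check before trusting the converted result.
# Bounds are the assumed 32-bit Unix-time window.
UNIX_MIN = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIX_MAX = datetime(2038, 1, 19, tzinfo=timezone.utc)

def convert_tz_safe(ts: datetime) -> bool:
    """True if the timestamp is inside the range where CONVERT_TZ works."""
    return UNIX_MIN <= ts <= UNIX_MAX
```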

  • Cocolog future plans
    - Dynamic
    - Job queue

  • Consulting by Sumisho Computer Systems Corp.
    - System integrator; first and best partner of MySQL in Japan since 2003
    - Provides MySQL consulting, support, and training services
    - HA, maintenance, online backup, Japanese character support

  • Questions


    Speaker notes

    Lisa:
    - This is the general overview of our system before moving to DBP. Everything has multiple servers handling different work, except for the DB.
    - The only way to grow is by buying bigger machines, and moving to a new machine is risky: new, untested hardware; new configurations for OS and DB; required downtime. Buying big machines is expensive, and the DB remains a single point of failure. Replication helps but still requires big machines.
    - Partitioned data can run on commodity hardware. Upgrades can be done by moving users slowly to the new machine while the app is running. Backups take less time, and crons can run on the passive sides of the DB.

    Garth:
    - A lot of new hardware and infrastructure changes. Briefly, the roles:
      - Global: hub for user data; maps uniquely identifying user information to the rest of their data on a user partition, and generates unique IDs for user partitions. It holds data duplicated from the user partitions, not a replacement for them.
      - User: all data directly related to a user (blogs, entries, comments, etc.).
      - Non-User: data not specific to a certain user, or data that needs to be globally available but is not accessed frequently: tags (shared by all users), guest-author mapping, support, billing.
      - Schwartz: tool for managing bulk operations such as moving users from one user partition to another.
    - All updates to uniquely identifying user information are written to the global DB in addition to their original location in the user record. No data is read from the global DB yet (the application defaults to user0). Backfill all existing user data into the global role so that it is up to date.
    - The app is updated to refer to data location by role (user, global, support, billing). This allows new configurations to be implemented easily and further partitioning of non-user data.
    - The application reads from the global role to determine where to find user data. All records should still point to user0, so no change yet.
    - Move from Postgres sequences to MySQL pseudo-sequences. A pseudo-sequence is a table with a single auto-increment ID field; when the app needs to insert new user-partition rows, it inserts a row into the pseudo-sequence table and uses the last ID as the primary key. One pseudo-sequence table per user table.

    Lisa:
    - Even/odd increments. There is a danger in cleaning up all data in these sequence tables: on restart the sequence will start from 0.
    - Use the Schwartz service to move users from the user0 partition to user1, 2, and 3. The global DB marks a user read-only; while read-only the app does not allow any updates, and users who attempt to write data (blog posts, etc.) are shown a status screen. Data is submitted automatically when the move is done, and is checked at the row level and at the file level (comparing published files).
    - New users are added to a single user partition. Depending on business needs, new users can be directed to any of the user partitions; the default is round robin.
    - The non-user data move requires scheduled downtime, but it is a small amount of data compared to user data: migrate it all at once, update the config to point at the new servers, and bring the service up. At this point the app is completely configured on MySQL and PostgreSQL can be decommissioned.
