Ceph Object Storage at Spreadshirt (July 2015, Ceph Berlin Meetup)
TRANSCRIPT
Ceph Object Storage at Spreadshirt
How we started
July 2015
Jens Hadlich (Chief Architect), Ansgar Jazdzewski (System Engineer)
Ceph Berlin Meetup
About Spreadshirt
Spread it with Spreadshirt
A global e-commerce platform for everyone to create, sell and buy ideas on clothing and accessories across many points of sale.
• 12 languages, 11 currencies
• 19 markets
• 150+ shipping regions
• Community of >70,000 active sellers
• €72M revenue (2014)
• >3.3M items shipped (2014)
Object Storage at Spreadshirt
• Our main use case: store and read primarily user-generated content, mostly images
• Some tens of terabytes (TB) of data
• Two typical object sizes:
  – a few dozen KB
  – a few MB
• Up to 50,000 uploads per day
• Read > Write
Object Storage at Spreadshirt
• „Never change a running system“?
  – Current solution (from our early days):
    • Big storage, well-known vendor
    • Lots of files / directories / sharding
  – Problems:
    • Regular UNIX tools are unusable in practice
    • Not designed for „the cloud“ (e.g. replication is an issue)
    • Performance bottlenecks
  – Challenges:
    • Growing number of users → more content
    • Build a truly global platform (multiple regions and data centers)
Ceph
• Why Ceph?
  – Vendor independent
  – Open source
  – Runs on commodity hardware
  – Local installation for minimal latency
  – Existing knowledge and experience
  – S3 API
    • Simple bucket-to-bucket replication
  – A good fit also for < 1 petabyte
  – Easy to add more storage
  – (Can later be used for block storage as well)
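Because RadosGW exposes the standard S3 protocol, any stock S3 client works once it is pointed at the gateway's endpoint. As a minimal illustration of that compatibility, here is how an S3 request is signed with AWS Signature Version 2 (the scheme RadosGW accepted at the time), using only the Python standard library. The credentials, bucket and object names are made up:

```python
# Sketch: signing an S3-style request for RadosGW by hand, using only the
# Python standard library. RadosGW (as of 2015) authenticates requests with
# AWS Signature Version 2. All credentials and names below are fabricated.
import base64
import hashlib
import hmac

def sign_s3_v2(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Build the AWS Signature V2 signature for a request."""
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1)
    return base64.b64encode(digest.digest()).decode()

# Hypothetical credentials, e.g. as created with `radosgw-admin user create`:
access_key = "EXAMPLEACCESSKEY"
secret_key = "examplesecretkey"

date = "Tue, 07 Jul 2015 12:00:00 GMT"
signature = sign_s3_v2(secret_key, "GET", "/images/motive-42.png", date)

# This header would accompany the GET sent to the RadosGW endpoint:
auth_header = "AWS %s:%s" % (access_key, signature)
print(auth_header)
```

In practice a library such as boto or a tool like s3cmd does this signing transparently; only the endpoint has to be changed from Amazon's to the local gateway.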
Ceph Object Storage Architecture

Overview

[Diagram: clients talk HTTP (S3 or Swift API) to the Ceph Object Gateway; behind it, RADOS (reliable autonomic distributed object store) spans three Monitors and a lot of OSD nodes and disks, connected through a public network and a separate cluster network.]
Ceph Object Storage Architecture

A little more detailed

[Diagram: the client talks HTTP (S3 or Swift API) to RadosGW (the Ceph Object Gateway), which accesses RADOS via librados. An odd number of Monitors (for quorum) and many OSD nodes share the public network (1G) and a separate cluster network (10G, the more the better). Each OSD node combines some SSDs (for journals) with more HDDs as JBOD (no RAID).]
Ceph Object Storage at Spreadshirt

Initial Setup

[Diagram: clients reach HAProxy over HTTP (S3 or Swift API); HAProxy balances across a RadosGW instance running on each cluster node. Each node has 3 x SSD (journal / index) and 9 x HDD (data, xfs); 3 of the nodes also run Monitors. Public network: 2 x 1G; cluster network (OSD replication): 2 x 10G.]
Ceph Object Storage at Spreadshirt

Initial Setup
• Hardware configuration: 5 x Dell PowerEdge R730xd
  – Intel Xeon E5-2630v3, 2.4 GHz, 8C/16T
  – 64 GB RAM
  – 9 x 4 TB NL-SAS HDD, 7.2K
  – 3 x 200 GB SSD, mixed use
  – 2 x 120 GB SSD for boot & Ceph Monitors (LevelDB)
  – 2 x 1 Gbit + 4 x 10 Gbit network
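A quick back-of-envelope on what this configuration yields. The replication factor is not stated on the slide; 3x (the usual default for replicated pools) is assumed here:

```python
# Back-of-envelope capacity for the initial setup. The 3x replication
# factor is an assumption -- the slides do not state the pool size.
nodes = 5
hdds_per_node = 9
hdd_tb = 4
journal_ssds_per_node = 3

raw_tb = nodes * hdds_per_node * hdd_tb        # 5 * 9 * 4 = 180 TB raw
usable_tb = raw_tb / 3                         # ~60 TB with 3x replication
osds_per_journal_ssd = hdds_per_node // journal_ssds_per_node  # 3 journals per SSD

print(raw_tb, usable_tb, osds_per_journal_ssd)
```

Comfortably above the "some 10s of TB" from slide 3, with headroom to grow.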
Performance – First smoke tests
Ceph Object Storage Performance
First smoke tests
• How fast is RadosGW?
  – Response times (read / write): average? Percentiles (P99)?
  – Throughput?
  – Compared to AWS S3?
• A first (very minimalistic) test setup
  – 3 VMs (KVM), each running RadosGW, a Monitor and 1 OSD
  – 2 cores, 4 GB RAM, 1 OSD each (15 GB + 5 GB), SSD, 10G network between nodes, HAProxy (round-robin), LAN, HTTP
  – No further optimizations
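The average and P99 figures quoted on the next slides can be derived from raw per-request latencies roughly like this (a sketch with fabricated sample values; the slides do not name the benchmark tool used):

```python
# Sketch: computing the two metrics reported on the following slides
# (average and 99th-percentile latency) from per-request response times.
# The sample values are fabricated for illustration.
def percentile(samples, p):
    """Approximate nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100.0 * len(ordered))) - 1)
    return ordered[rank]

latencies_ms = [9, 7, 12, 8, 49, 10, 9, 11, 8, 7]  # made-up measurements

avg = sum(latencies_ms) / len(latencies_ms)
p99 = percentile(latencies_ms, 99)
print(avg, p99)  # prints 13.0 49
```

P99 matters more than the average for a user-facing image store: it bounds what the slowest 1% of requests experience, which averages hide.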
Ceph Object Storage Performance
First smoke tests
• How fast is RadosGW?
  – Random read and write
  – Object size: 4 KB
• Results: pretty promising! E.g. 16 parallel threads, read:
  – Avg 9 ms
  – P99 49 ms
  – >1,300 requests/s
Ceph Object Storage Performance
First smoke tests
• Compared to Amazon S3?
  – Comparing apples and oranges (unfair, but interesting)
  – http vs. https, LAN vs. WAN etc.
• Response times – random read, object size: 4 KB, 4 parallel threads, client location: Leipzig

                 Ceph S3 (test)   AWS S3 (eu-central-1)   AWS S3 (eu-west-1)
  Location       Leipzig          Frankfurt               Ireland
  Avg            6 ms             25 ms                   56 ms
  P99            47 ms            128 ms                  374 ms
  Requests/s     405              143                     62
Performance – Now with the final hardware
Ceph Object Storage Performance
Now with the final hardware
• How fast is RadosGW?
  – Random read and write
  – Object size: 4 KB
• Results – e.g. 16 parallel threads, read:
  – Avg 4 ms
  – P99 43 ms
  – >2,800 requests/s
Ceph Object Storage Performance

Now with the final hardware

[Chart: average read and write response times in ms (4k object size) for 1–32 client threads.]
Ceph Object Storage Performance

Now with the final hardware

[Chart: average and P99 read response times in ms (4k object size) for 1–32 client threads, plus a 32+32 run with two clients.]
Ceph Object Storage Performance

Now with the final hardware

[Chart: read requests/s for 4k and 128k object sizes, 1–32 client threads plus a 32+32 run with two clients.]
• 1 client / 8 threads: 1G network almost saturated at ~115 MB/s
• 2 clients: 1G network saturated again, but scale-out works :)
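The saturation note above is plain arithmetic: a 1 Gbit/s link carries at most 125 MB/s, and after Ethernet/IP/TCP framing overhead roughly 115 MB/s of payload remain, which caps a single client regardless of how fast the cluster is:

```python
# Why ~115 MB/s means the client's 1G NIC, not the cluster, is the limit.
link_mbps = 1000 / 8.0      # 1 Gbit/s = 125 MB/s raw
effective_mbps = 115.0      # observed payload rate after protocol overhead
object_kb = 128

# Upper bound on requests/s a single 1G client can drive at 128 KB objects:
max_requests_per_s = effective_mbps * 1024 / object_kb
print(round(max_requests_per_s))  # prints 920
```

That ceiling of roughly 900 requests/s per client explains why adding a second client (the 32+32 run) raises throughput again: the cluster itself still had headroom.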
Monitoring
Monitoring
Grafana rulez :)
Global availability
Global Availability
• 1 Ceph cluster per data center
• S3 bucket-to-bucket replication
• Multiple regions, local delivery
Currently open issues / operational tasks
Open issues / operational tasks
• Backup
  – s3fs-fuse too slow
  – Set up another Ceph cluster?
• Security
  – Users
  – ACLs
• Migration of old data
  – Upload all existing files via script
  – Use the old system as fallback / in parallel
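The "upload all existing files via script" step could look roughly like the sketch below. The uploader is injected as a callable so the same directory walk works with any S3 client; all names and structure here are illustrative, not the actual migration script:

```python
# Sketch of a bulk-migration script: walk the old storage tree and upload
# every file, with the S3 key mirroring the relative path. The `upload`
# callable stands in for a real S3 client (boto, s3cmd, ...); this is an
# illustration, not Spreadshirt's actual tool.
import os

def migrate(root_dir, upload):
    """Upload every file under root_dir; returns the number of files sent."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = os.path.relpath(path, root_dir).replace(os.sep, "/")
            with open(path, "rb") as f:
                upload(key, f.read())
            count += 1
    return count

# Example with a stub uploader that only records the keys it would send:
uploaded = []
# migrate("/srv/old-storage", lambda key, data: uploaded.append(key))
```

Running the old system in parallel as a fallback (as the slide suggests) means this script can be re-run idempotently until the content is fully mirrored.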
Open issues / operational tasks
• Replication
  – Test-drive radosgw-agent
  – s3cmd? A custom tool?
  – Metadata (users)
  – Data
  – Performance?
• Bucket notification
  – Currently unsupported by RadosGW
  – Build a custom solution?
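One possible shape for such a custom solution is an application-side wrapper that emits an event after each successful upload, leaving RadosGW itself unchanged. Everything below is hypothetical; in practice the event sink could be a message queue:

```python
# Sketch of a client-side workaround for RadosGW's missing bucket
# notifications: wrap the S3 put in the application and publish an event
# after every successful upload. Store and event sink are injected; both
# are stand-ins (a real sink might enqueue to RabbitMQ or similar).
class NotifyingBucket:
    def __init__(self, put_object, publish_event):
        self._put = put_object          # e.g. the S3 client's put operation
        self._publish = publish_event   # e.g. enqueue to a message broker

    def put(self, key, data):
        self._put(key, data)            # only notify if the put succeeded
        self._publish({"event": "ObjectCreated", "key": key, "size": len(data)})

# Example with in-memory stand-ins for the store and the event queue:
store, events = {}, []
bucket = NotifyingBucket(store.__setitem__, events.append)
bucket.put("images/motive-42.png", b"\x89PNG...")
```

The obvious limitation is that uploads bypassing the wrapper (e.g. direct S3 calls) produce no event, which is why a gateway-level feature would be preferable.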
Open issues / operational tasks
• Scrubbing • Rebuild
To be continued ...
Thank You!
[email protected]
[email protected]