picnic software - developing a flexible and scalable application
Developing a flexible and scalable application
Presenting Today
Andrew Browne, Dave Churchill, Basarat Syed,
Nick Josevski, Matt Walkenhorst
Disclaimer
The views expressed here are solely those of the authors in their private capacity and do not in any way represent the views of Picnic Software Pty Ltd, or any other associated entity or shareholder.
Picnic Software Pty Ltd has not approved, endorsed, embraced, friended, liked, tweeted, google-plused, pinterested, dugg, reddited, hacker-newsed, sanctioned or authorized this presentation.
Agenda
• App & Tech
• Infrastructure & Data Flow
• Deployment & Scalability
• Permissions with Neo4j
• Client Side Technologies
• Development & Testing Workflow
What we do
• We are an ISV (Independent Software Vendor)
– Building and running a workflow/collaboration application
• Partnerships with large businesses in the Advertising/Marketing sector
• Our customers are primarily large retailers
Our App
• Media Library
– High resolution files; PSDs, Video
• Collaborative workflows
– Coordinating inputs
• Photography, Illustrations, Graphic Design
– Producing advertising outputs
• Over multiple media channels – Catalogues, Billboards/Print, TV, Radio, Web
Our Tech Stack
• F#, C#, ASP.NET MVC, ServiceStack
• EventStore, Eventful, RavenDB, Neo4j
• Angular, TypeScript, Mocha, Node, Sass
• SignalR, AutoMapper, RabbitMQ, LINQ, NSubstitute, NUnit, FsUnit, FParsec, FsCheck
• AWS, Docker, Riemann, Logstash, PowerShell, PSake, NodaTime
• GitHub, TeamCity, Octopus Deploy, Slack, YouTrack
Flexibility
• Architecture choices to support
– Changes in requirements
– Future customers working in same domain
• When we started building we had
– Known customer workflows
– Known unknown customer workflows
– Unknown unknown customer workflows
Questions
Our Infrastructure
Andrew
[Infrastructure diagrams. Labels shown: Sydney region; CloudFront; S3; Elastic Load Balancing; EC2; Availability Zone x 2; Web; Core; Neo4j; RavenDB; RabbitMQ; EventStore (writes and reads); Media Processing; MySQL; HAProxy; Disposable.]
Questions
Deployment & Scalability
Matt
Deployment
• Server Infrastructure – AWS
– Starting new instances largely a manual affair at this point
• Configuration Management – Chef
• Application deployment – TeamCity, Octopus Deploy
Chef
“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
Chef
• Windows / Linux deployment nodes
• Each node is in an Environment
• Each node performs one or more Roles
• Each Role requires the running of one or more Recipes
• Recipes are stored in cookbooks
• Keep configuration in Git
• Keep Chef Server configuration in Git
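
The node / role / recipe relationship above can be sketched in a few lines. This is a plain Python model of the idea, not Chef's own API; the role names, recipe names and the `run_list` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical role -> recipes mapping (illustrative names only).
ROLES = {
    "base": ["monitoring"],
    "web": ["iis", "picnic_web"],
    "worker": ["picnic_media_processing"],
}


@dataclass
class Node:
    name: str
    environment: str                      # each node is in one environment
    roles: list = field(default_factory=list)  # each node performs one or more roles


def run_list(node, roles=ROLES):
    """Expand a node's roles into an ordered, de-duplicated list of recipes."""
    recipes = []
    for role in node.roles:
        for recipe in roles[role]:
            if recipe not in recipes:
                recipes.append(recipe)
    return recipes


web_node = Node("web-01", "production", ["base", "web"])
print(web_node.name, "runs", run_list(web_node))
```

Because the mapping is plain data, it is the kind of configuration that can be kept in Git alongside the cookbooks.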
Octopus Deploy
• Windows code deployments only – Linux coming soon.
• Environments, Roles, Apps, Releases.
• Deployment Process – steps executed on Roles to “Tentacles”
– NuGet packages retrieved from TeamCity
– Store configuration as variables
– Variable snapshot + NuGet packages = release
– IIS, Windows Service, PowerShell steps available
– Partial and Rolling deploys
– Easy to roll back – just re-deploy last working release.
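
The "variable snapshot + packages = release" idea, and why it makes rollback a plain re-deploy, can be sketched as follows. This is a conceptual Python model only and does not use the Octopus Deploy API; `Release`, `create_release` and `Environment` are hypothetical names.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Release:
    version: str
    packages: tuple               # (package id, package version) pairs
    variables: MappingProxyType   # read-only snapshot of configuration


def create_release(version, packages, variables):
    """Freeze package versions and a copy of the variables at creation time."""
    return Release(version,
                   tuple(sorted(packages.items())),
                   MappingProxyType(dict(variables)))


class Environment:
    def __init__(self, name):
        self.name = name
        self.current = None
        self.last_working = None

    def deploy(self, release, healthy):
        self.current = release
        if healthy:
            self.last_working = release

    def roll_back(self):
        # Rolling back is just deploying the last working release again.
        self.current = self.last_working


live_variables = {"connection": "server-a"}
r1 = create_release("1.0", {"Web": "1.0"}, live_variables)
live_variables["connection"] = "server-b"   # later edits do not alter r1
print(r1.variables["connection"])
```

Since a release carries its own configuration snapshot, re-deploying an old release restores both the code and the settings it was tested with.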
SOON: Blue / Green Deployments
• Asgard brings up a NEW (GREEN) copy of production AWS infrastructure.
• Automatically Bootstrap instances against Chef and Octopus.
• Smoke test GREEN environment
• Add GREEN web servers to load balancer
• Remove OLD (BLUE) web servers from load balancer
• Asgard tears down BLUE production AWS infrastructure.
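
The ordering of the switch-over steps matters, so here is a minimal sketch of it. The load balancer is modelled as a plain set of server names, and `blue_green_switch` and `smoke_test` are hypothetical helpers; nothing here calls Asgard, Chef, Octopus or AWS.

```python
def blue_green_switch(load_balancer, blue, green, smoke_test):
    """Move traffic from BLUE to GREEN servers.

    load_balancer is a set of server names currently in rotation.
    Returns True if the switch happened, False if GREEN failed its smoke test.
    """
    if not all(smoke_test(server) for server in green):
        return False                       # BLUE keeps serving traffic untouched
    load_balancer.update(green)            # add GREEN first so capacity never drops
    load_balancer.difference_update(blue)  # then take BLUE out of rotation
    return True                            # BLUE can now be torn down


lb = {"blue-1", "blue-2"}
switched = blue_green_switch(lb, ["blue-1", "blue-2"],
                             ["green-1", "green-2"], lambda server: True)
print(switched, sorted(lb))
```

If the smoke test fails, the old environment is never touched, which is the main attraction of the approach.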
Message delivery between processes
• Requirements
– Reliable.
– Easy to manage.
– Easy to use.
– Low latency.
Things we looked at (2+ years ago)
• NServiceBus
– Tied to MSMQ at the time.
• MassTransit
– Lacked documentation.
– Not ready for prime-time at that point.
• RabbitMQ + EasyNetQ
– Simple, best fit for us.
– Wrote our own client – bad idea.
RabbitMQ
• Written in Erlang, maintained by Pivotal.
• Linux / Windows.
• Easy administration (Web, command line, JSON).
• Supports
– Clustering and failover.
– Durable and HA queues.
– ‘At least once’ delivery guaranteed.
– Direct, Fan-out and Topic exchanges.
– Partitioning (vhosts), Federation & Shovelling.
How Picnic use RabbitMQ
• Setup
– Cluster of RabbitMQ servers behind ELB in multiple AZs.
• You can use HAProxy instead of ELB.
– EasyNetQ library by Mike Hadlow.
• Handles subscription, publish and reconnection logic.
– So solid now we hardly think about it.
Use case: Scaling of long-running CPU and IO intensive operations
• File format conversion, Zip bundling, PDF & InDesign creation etc.
• Uses Topic exchange. Currently just one topic!
• Subscribers are round-robined by the broker.
• Subscribers are isolated – no clustering.
• Scaling - just launch new instances.
• Redundancy – launch in multiple AZs
• This has worked really well for us.
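
Why "just launch new instances" scales this workload can be seen in a small simulation of round-robin dispatch. This is a sketch in plain Python, not RabbitMQ or EasyNetQ code; the `dispatch` helper, job names and worker names are hypothetical, and a real broker's distribution also depends on settings such as prefetch.

```python
from collections import defaultdict
from itertools import cycle


def dispatch(messages, workers):
    """Hand each message to the next worker in turn, as a broker
    round-robins deliveries between subscribers of one queue."""
    assignments = defaultdict(list)
    for message, worker in zip(messages, cycle(workers)):
        assignments[worker].append(message)
    return dict(assignments)


jobs = ["convert-1", "zip-2", "pdf-3", "convert-4", "indesign-5", "zip-6"]
print(dispatch(jobs, ["worker-a", "worker-b"]))
print(dispatch(jobs, ["worker-a", "worker-b", "worker-c"]))  # one more instance
```

Workers never talk to each other, so adding an instance only changes how many ways the same stream of jobs is divided.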
Use case: Distribution to SignalR clients
• In-app notifications, long running task progress etc. to the browser.
• Each web server receives all messages (Fan-out exchange)
• Messages delivered to users / groups via SignalR
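
The fan-out pattern above can be sketched as follows: every web server receives every message and forwards it only to the users connected to that server. This is a plain Python stand-in; it does not use RabbitMQ or SignalR, and `FanoutExchange`, `WebServer` and the message shape are hypothetical.

```python
class WebServer:
    def __init__(self, name, connected_users):
        self.name = name
        self.connected_users = set(connected_users)
        self.delivered = []   # (user, text) pairs pushed to browsers

    def on_message(self, message):
        # Stand-in for a SignalR push to locally connected clients.
        for user in sorted(self.connected_users & set(message["recipients"])):
            self.delivered.append((user, message["text"]))


class FanoutExchange:
    def __init__(self):
        self.subscribers = []

    def bind(self, subscriber):
        self.subscribers.append(subscriber)

    def publish(self, message):
        for subscriber in self.subscribers:   # every subscriber gets a copy
            subscriber.on_message(message)


exchange = FanoutExchange()
web1 = WebServer("web-1", ["andrew"])
web2 = WebServer("web-2", ["dave", "nick"])
exchange.bind(web1)
exchange.bind(web2)
exchange.publish({"recipients": ["dave"], "text": "Zip bundle ready"})
print(web1.delivered, web2.delivered)
```

The publisher does not need to know which server a user is connected to; each server filters for itself.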
Questions
Scalable Permissions
Dave
Permissions Model
<approval-1-guid> Owner
<footwear-folder-guid> Write
• All entities in the system are identified by GUIDs
• Each permission applies to a specific entity
Permissions Model
<approval-1-guid> Owner
<footwear-folder-guid> Write
• Permissions also have a role
<footwear-folder-guid> Read
Permissions Model
<approval-1-guid> Owner
<footwear-folder-guid> Write
• Each user has a corresponding "Me" permission
<footwear-folder-guid> Read
<user-dave-guid> Me
Permissions Model
<approval-1-guid> Owner
<footwear-folder-guid> Write
• As events arrive, relationships are built up between permissions
– e.g. a JobCreated event might give Owner permission to the creator
<footwear-folder-guid> Read
<user-dave-guid> Me
Permissions Model
<approval-1-guid> Owner
<footwear-folder-guid> Write
• Relationships don't have to stem from the Me permissions
– e.g. having Write permission on a folder could mean you can also Read
<footwear-folder-guid> Read
<user-dave-guid> Me
Permissions Model
<approval-1-guid> Owner
<footwear-folder-guid> Write
• A user has a permission if there's a path from the user's Me node
• This user doesn't have Read on the folder
<footwear-folder-guid> Read
<user-dave-guid> Me
Permissions Model
<approval-1-guid> Owner
<footwear-folder-guid> Write
• A user has a permission if there's a path from the user's Me node
• Giving Write on the footwear folder also gives the user Read
<footwear-folder-guid> Read
<user-dave-guid> Me
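
The "path from the user's Me node" rule is a graph reachability check. Below is a minimal sketch using breadth-first search over an in-memory dictionary; the `has_permission` helper and the edge direction (from a permission to the permissions it grants) are assumptions made for illustration, and the GUID strings are the placeholders from the slides.

```python
from collections import deque

ME = ("Me", "<user-dave-guid>")
OWNER = ("Owner", "<approval-1-guid>")
WRITE = ("Write", "<footwear-folder-guid>")
READ = ("Read", "<footwear-folder-guid>")


def has_permission(grants, user_me, target):
    """True if target is reachable from the user's Me node."""
    seen = {user_me}
    queue = deque([user_me])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for granted in grants.get(current, ()):
            if granted not in seen:
                seen.add(granted)
                queue.append(granted)
    return False


# Write implies Read, but the user only owns the approval so far.
grants = {ME: [OWNER], WRITE: [READ]}
print(has_permission(grants, ME, READ))

# Giving the user Write on the folder also gives Read.
grants[ME].append(WRITE)
print(has_permission(grants, ME, READ))
```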
Original Implementation
Write
Read
Me
ApprovalOwner
• RavenDB document for each permission
• Records which permissions directly inherit this permission
Read <footwear-folder-guid>:
[Write <footwear-folder-guid>]
Write <footwear-folder-guid>:
[Me <user-dave-guid>]
Owner <approval-1-guid>:
[Me <user-dave-guid>]
Me <user-dave-guid>:
[]
Original Implementation
Write
Read
Me
ApprovalOwner
• Worker task builds a second state document for each permission
• Records all permissions which inherit this permission
Read <footwear-folder-guid>:
[Write <footwear-folder-guid>,
Me <user-dave-guid>]
Write <footwear-folder-guid>:
[Me <user-dave-guid>]
Owner <approval-1-guid>:
[Me <user-dave-guid>]
Me <user-dave-guid>:
[]
Original Implementation
Write
Read
Me
ApprovalOwner
• A user has a permission if their Me appears in the permission's state document
Read <footwear-folder-guid>:
[Write <footwear-folder-guid>,
Me <user-dave-guid>]
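
The worker task described above turns the "direct inheritors" documents into "all inheritors" state documents. A minimal sketch of that expansion, using plain dictionaries in place of RavenDB documents (the `build_state` helper is hypothetical and is not the team's actual worker code):

```python
ME = "Me <user-dave-guid>"
OWNER = "Owner <approval-1-guid>"
WRITE = "Write <footwear-folder-guid>"
READ = "Read <footwear-folder-guid>"

# One document per permission: which permissions directly inherit it.
direct = {
    READ: [WRITE],
    WRITE: [ME],
    OWNER: [ME],
    ME: [],
}


def build_state(direct):
    """For each permission, collect every permission that inherits it,
    directly or through a chain of other permissions."""
    state = {}
    for permission, inheritors in direct.items():
        found = []
        stack = list(inheritors)
        while stack:
            candidate = stack.pop()
            if candidate not in found:       # also guards against cycles
                found.append(candidate)
                stack.extend(direct.get(candidate, []))
        state[permission] = found
    return state


state = build_state(direct)
print(state[READ])
print(ME in state[READ])   # the membership test used to answer "can Dave read?"
```

The lookup is cheap, but every change to `direct` means recomputing the affected state documents, which is where the issues on the next slide come from.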
Original Implementation - Issues
• State documents can get large
– Introduce intermediate groups
• Takes time for state documents to be updated
– Cache and update permission graph in process
• Other processes can still sometimes see out-of-date permissions
– Use a graph database!
New Implementation
• In process of switching to Neo4j
• Transactional updates
• No need to calculate intermediate state
• Faster
• Simpler
• Still need to send some state data across to RavenDB for permissions when searching
Questions
Client Side Architecture
Bas
http://slides.com/basarat/picnic-frontend
Questions
Development Workflow
Nick
Development Workflow
• GitHub
– Feature branches
– Pull Requests
Pull Requests
• Just over 2 months now using the PR based workflow
• Approx 120 closed pull requests so far
• How
– Feature branches, GitHub Tagging, TeamCity build process comments against PR
– Asynchronous task for a team member
• “Here’s a PR please review when you can”
TeamCity & Tagging
Development Workflow
• Why
– It was recommended to us.
– Supports consistent and frequent code reviews.
• Improves code quality
• Shares knowledge amongst the team
– Lets us catch some bugs much earlier.
Development Workflow
• The wins
– Build server is more often in a green state.
• Can push to your PR branch to rely on CI to give feedback
– Knowledge sharing
• “Oh, that’s how you solved that”
• Reducing silo effects
– Offer constructive feedback to others
• “This could be made better by…”
– Bugs / issues caught
• Typos, debug code left in, incomplete/missed features
Development Workflow
• Testing
– Each PR builds as if it was already merged
– Unit/Integration tests run against PR in TeamCity
– YouTrack bugs marked with build numbers to track deployment
Development Workflow
• Agility
– Tracking feature changes as they evolve alongside code
• As with all documentation - trying our best to keep up to date
• PR feedback can flag “code change not reflected in docs”
– Testing team
• Can review these changes and be more up to date with
• Review PRs for an idea of scope of changes and where to look for issues
Development Workflow
Questions
Thanks
Other ALT.NET Presentations by us
Event Sourcing with F# - Andrew Browne
Thinking in a document centric world with RavenDB - Nick Josevski