Full-Stack Web Development with MongoDB, Node.js and AWS
DESCRIPTION
Akira Technologies will share its experience of building a universal, scalable, high-performance platform for conducting surveys. Using MongoDB allowed us to replace dozens of unique survey systems with a single flexible solution, improve data and questionnaire reusability, and simplify data analysis. We will also cover full-stack development and integration with Node.js and Hadoop, deployment to the AWS cloud, offline caching, and stress-testing the entire system with Tsung. A working prototype will be demonstrated, including multiple surveys, a dynamically rebuilding interface, geolocation, data analysis and visualization.
TRANSCRIPT
Andrey Mikhalchuk
Chief Architect
Akira Technologies
FULL-STACK WEB DEVELOPMENT WITH
MONGODB, NODE.JS AND AWS
ABOUT AKIRA
Founded in 2003
Technology and management consulting company
80+ employees
HUBZone, SBA 8(a), SDB
HITSP and NIEM Contributing Member
ISO 9001, CMMI Level 3
Top Secret Facility Clearance
More on http://www.akira-tech.com
ABOUT THIS PRESENTATION
Relational databases -> MongoDB
Integrating with MongoDB
Stress-testing
Complications and solutions
ABOUT THE CLIENT
US Census Bureau
“Leading source of quality data about the nation’s people and economy”
PROBLEM TO SOLVE
WHERE DOES THE QUALITY DATA COME FROM?
PROBLEM
Surveys:
Decennial Survey
American Community Survey
Survey of Income and Program Participation
Current Population Survey
Consumer Expenditure Survey
National Health Interview Survey
American Housing Survey
American Time Use Survey
Beginning Teacher Longitudinal Study
Consumer Expenditure Survey
Current Population Survey
Housing Vacancy Survey
Identity Theft Supplement
National Ambulatory Medical Care Survey
Most of these surveys use unique software to collect and process the data
PROBLEM
Different data stores:
Text files of different formats
Binary data files
Relational databases
Incompatible data structures
Dozens of programming languages
Incompatible technologies
Closed architectures
AKIRA GOAL
Single solution for all surveys
Scalable
Inexpensive
Reusable cross-survey data
Works on all devices
Flat learning curve for developers
© Akira Technologies, 2013
SURVEY DSL
How to define a new survey
HOW TO DEFINE SURVEY
Akira has developed a prototype of an engine that takes a JSON specification and turns it into a feature-rich survey
JSON is a simple language that analysts can use to define questions
Developers can later enrich the user experience with custom bells and whistles
Both analysts and developers can reuse existing functionality in future surveys
DEFINE SURVEY
    survey = {
        name: "decennial",
        type: "survey",
        title: "Decennial Survey",
        stylesheet: "decennial.css",
        header: "MongoDB PoC Demo",
        footer: "Measuring America …",
        intro: "The Census must count every …",
        questions: []
    }
QUESTIONS
    {
        name: "num_people_1",
        question: "How many people were living or staying …",
        details: "<B>INCLUDE</B> in this number:<UL><LI>…",
        type: "int",
        required: true,
    }, {
        name: "additional_people_2",
        question: "Were there any additional people staying …",
        details: "Mark all that apply",
        type: "checkbox",
        options: [
            "Children, such as newborn babies or foster children",
            "Relatives, such as adult children, cousins …",
            "Nonrelatives",
            "People staying here temporarily",
            "No additional people"
        ]
    }
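To give a feel for how an engine might turn such question definitions into a form, here is a minimal sketch. It is not the PoC engine: the function names (escapeHtml, renderQuestion, renderSurvey) and the generated markup are assumptions. The only convention taken from the slides is that a question named X lives in div#question_X and an int question uses input#X.

```javascript
// Hypothetical renderer: function names and markup are assumptions,
// not the actual Akira engine.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turns one question definition into an HTML fragment.
function renderQuestion(q) {
  var html = '<div id="question_' + q.name + '">';
  html += "<p>" + escapeHtml(q.question) + "</p>";
  if (q.type === "int") {
    html += '<input id="' + q.name + '" name="' + q.name + '" type="number"' +
      (q.required ? " required" : "") + ">";
  } else if (q.type === "checkbox") {
    (q.options || []).forEach(function (option, index) {
      html += '<label><input type="checkbox" name="' + q.name +
        '" value="' + index + '"> ' + escapeHtml(option) + "</label>";
    });
  } else {
    throw new Error("Unsupported question type: " + q.type);
  }
  return html + "</div>";
}

// Renders the survey title followed by all of its questions.
function renderSurvey(survey) {
  return "<h1>" + escapeHtml(survey.title) + "</h1>" +
    survey.questions.map(renderQuestion).join("");
}
```

Supporting a new question type in such a design means adding one more branch (or a lookup table of renderers), which is the kind of reuse across surveys the DSL is aiming for.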
BELLS AND WHISTLES
    validator: function (s) {
        var n = parseInt(s, 10);
        return n > 0 && n < 5 ? null : "Enter value in [1,4] range";
    },
    decorator: function (q) {
        // Show one "person" block per household member and hide the rest
        var count = parseInt($("input#num_people_1").val(), 10);
        for (var i = 1; i <= 4; i++) {
            if (i > count) {
                $("div#question_person_" + i).hide();
            } else {
                $("div#question_person_" + i).show();
            }
        }
    }
ACTION!
Let’s see how this code works:
http://census.akira-tech.com/survey/8
Contains all Decennial questions
Written in <1 hour
Validates input in real time
Updates interface dynamically
ACTION!
Take a look at another example:
http://census.akira-tech.com/survey/9
Contains most ACS questions
Written in <1 hour
Supports geolocation
MORE CUSTOMIZATIONS
We didn’t have to create a new database schema
We didn’t have to create a new web interface
We can customize all aspects of the survey process
Here is another great example:
http://census.akira-tech.com/test/10000001
http://census.akira-tech.com/test_dashboard/sat_math
MOBILE VERSION
Speaking of geolocation …
Let’s take a look at the iPad version. Android/Windows versions will look similar
It looks basically the same, not surprisingly
Surprisingly, the iPad is not even connected to the internet!
Let’s complete the survey anyway and submit the results
Enjoy the result!
OFFLINE MODE
Let’s shut down the browser; it’s not in RAM anymore
Let’s connect the iPad to the internet
Get back to the page
Isn’t what you see awesome?!
This is complete offline mode out of the box
The interface is the same ->
No re-learning
Single interface for CAPI, WAPI, CATI and more
Less code to maintain
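The slides do not show how offline mode is implemented, so the following is only one possible approach, sketched under our own assumptions: completed responses are queued in browser storage and sent once a connection is available. The names createResponseQueue and createMemoryStorage, the storage key and the send callback are all hypothetical.

```javascript
// Hypothetical offline queue; names, storage key and the send callback
// are assumptions, not the PoC's actual code.
function createResponseQueue(storage, send) {
  var KEY = "pending_responses";

  function load() {
    return JSON.parse(storage.getItem(KEY) || "[]");
  }

  function save(items) {
    storage.setItem(KEY, JSON.stringify(items));
  }

  return {
    // Always store locally first, so nothing is lost without a network.
    submit: function (response) {
      var items = load();
      items.push(response);
      save(items);
    },
    pending: function () {
      return load().length;
    },
    // Sends queued responses in order; stops at the first failure and
    // keeps the unsent ones for the next attempt.
    flush: function (isOnline) {
      if (!isOnline) return 0;
      var items = load();
      var sent = 0;
      while (items.length > 0 && send(items[0])) {
        items.shift();
        sent++;
      }
      save(items);
      return sent;
    }
  };
}

// In-memory stand-in for window.localStorage so the sketch runs in Node.js too.
function createMemoryStorage() {
  var data = {};
  return {
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    setItem: function (key, value) {
      data[key] = String(value);
    }
  };
}
```

In a browser one would pass window.localStorage as the storage and call flush(navigator.onLine) on page load and from an "online" event listener; a real implementation would also need an asynchronous send and server-side de-duplication.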
ANALYTICS
After some time this system could collect billions of records. How do we process them?
DATA PROCESSING
Hadoop allows distributed data processing on thousands of servers
Cloudera Manager and AWS CLI allow deploying hundreds of servers in minutes
We have deployed a cluster of 3 nodes in AWS. Let’s see how it can be reconfigured
LET’S PROCESS SOME DATA
There are a lot of options for processing data in Hadoop:
Hive – data warehouse infrastructure for data query and analysis
MapReduce – programming model for large-scale data processing
Pig – high-level platform for creating MapReduce programs
In this demo we chose Pig as the simplest way to demonstrate the power of Hadoop
This script loads data from MongoDB, calculates statistics and pushes them back to MongoDB
SOME PIG LATIN HERE
    REGISTER /home/ec2-user/Distr/mongo-2.10.1.jar;
    REGISTER /usr/lib/hadoop-0.20/lib/mongo-hadoop-core_0.20.205.0-1.1.0.jar;
    REGISTER /usr/lib/hadoop-0.20/lib/mongo-hadoop-pig_0.20.205.0-1.1.0.jar;
    -- Load the invites collection from MongoDB
    raw = LOAD 'mongodb://master:27017/mongodb_poc.invites'
        USING com.mongodb.hadoop.pig.MongoLoader('u__id:chararray, token:chararray, email:chararray, survey:chararray, first_name:chararray, last_name:chararray, address:chararray, interview_mode:chararray, fr_id:int, processed:chararray, state:chararray')
        AS (id, token, email, survey, first, last, address, mode, fr, processed, state);
    raw_limited = LIMIT raw 3000;
    DUMP raw_limited;
    raw_filtered = FILTER raw BY processed == 'true';
    -- COUNT operates on a bag, so group all filtered rows first
    all_processed = GROUP raw_filtered ALL;
    total_processed = FOREACH all_processed GENERATE COUNT(raw_filtered);
    total_by_state = GROUP raw_filtered BY state;
    DUMP total_by_state;
    DUMP total_processed;
    -- Push the per-state groups back to MongoDB
    STORE total_by_state INTO 'mongodb://master:27017/mongodb_poc.statistics'
        USING com.mongodb.hadoop.pig.MongoStorage();
STATISTICS
Both MongoDB and Hadoop provide aggregation frameworks
Hadoop works best for slow crunching of humongous quantities of data
MongoDB is good for quick calculations on reasonably large (tens of millions of records) scopes
Processed data is pushed back to Mongo for storage and future visualization
We used Highcharts (a JavaScript charting library with openly available source) for data visualization
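As an illustration of a "quick calculation" on the MongoDB side, here is a sketch of an aggregation pipeline that counts processed invites per state, using the field names from the invites collection in the Pig script. The pipeline is an example of ours, not taken from the PoC; the helper countProcessedByState is a plain JavaScript equivalent included only to show what the pipeline computes.

```javascript
// Pipeline that could be passed to the aggregate() method of the invites
// collection. Field names come from the Pig script; the pipeline itself
// is an illustrative assumption.
var processedByState = [
  { $match: { processed: "true" } },
  { $group: { _id: "$state", total: { $sum: 1 } } },
  { $sort: { total: -1 } }
];

// Plain JavaScript equivalent of the pipeline, for in-memory documents.
function countProcessedByState(invites) {
  var totals = {};
  invites.forEach(function (doc) {
    if (doc.processed !== "true") return; // same condition as $match
    totals[doc.state] = (totals[doc.state] || 0) + 1; // same as $sum: 1
  });
  return Object.keys(totals)
    .map(function (state) {
      return { _id: state, total: totals[state] };
    })
    .sort(function (a, b) {
      return b.total - a.total; // same as $sort: { total: -1 }
    });
}
```

The result documents have the shape { _id: <state>, total: <count> }, which is convenient to feed into a charting library.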
ARCHITECTURE
HOW ARE MULTIPLE SURVEYS POSSIBLE? MONGODB!
We use MongoDB
Instead of creating a complex database, we store both surveys and responses as documents
All surveys are stored in the same collection. If survey questions change, you can still query all versions of the survey in a single query. You can even query totally different survey results on core parameters, like DOB
The Mongo cluster is deployed in the cloud
We use Amazon Web Services (AWS) as the cloud platform, but can use any other solution as well or build our own
Let’s take a look at the deployment
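To make the idea of querying different surveys on a core parameter more concrete, here is a small sketch. The document shape is an assumption of ours: we suppose every response carries a survey name, a version and, where collected, a dob field stored as an ISO date string. The other field names are hypothetical as well.

```javascript
// Assumed response documents from different surveys and survey versions.
// All field names here are hypothetical.
var responses = [
  { survey: "decennial", version: 1, dob: "1980-05-17", num_people_1: 3 },
  { survey: "decennial", version: 2, dob: "1992-11-02", num_people_1: 1, tenure: "rented" },
  { survey: "acs", version: 1, dob: "1975-01-30", commute_minutes: 25 }
];

// Filter that could be passed to the find() method of the collection.
// ISO date strings sort chronologically, so a string comparison is enough.
var bornBefore1990 = { dob: { $lt: "1990-01-01" } };

// In-memory equivalent of the filter above.
function findBornBefore(docs, isoDate) {
  return docs.filter(function (doc) {
    return typeof doc.dob === "string" && doc.dob < isoDate;
  });
}
```

Because documents in one collection do not have to share a schema, the same filter matches a version 1 decennial response and an ACS response, even though their other fields differ.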
THE POC HARDWARE
In the PoC we use very basic computers:
Realtime data collection: 0.613 GB RAM, 30 GB HDD, 1 core
(for comparison) iPhone 5: 1 GB RAM, 32 GB SSD, dual-core
Data processing: 1.7 GB RAM, 6 GB storage, 1 core
(for comparison) Samsung Galaxy S4: 2 GB RAM, 16 GB storage, quad-core
WEBSERVER(S)
We use Node.js as the web server
All code is written in JavaScript
Node.js + MongoDB + Akira Survey DSL = everything is written in JavaScript. The learning curve is almost flat
Validators in the survey definition can be used on both the client side and the server side, so there is no need to duplicate code
Node.js is extremely fast, but Nginx is faster for static content. We use Nginx as a surrogate Content Delivery Network (CDN)
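The point about reusing validators on both sides can be sketched as follows. The validator is the one from the survey definition shown earlier; the function validateAnswers, the "required" message and the export guard are assumptions of ours, not the PoC code.

```javascript
// Sketch only: validateAnswers and its messages are assumptions.
// Runs every question's validator and collects error messages by question name.
function validateAnswers(questions, answers) {
  var errors = {};
  questions.forEach(function (q) {
    var value = answers[q.name];
    var missing = value === undefined || value === null || value === "";
    if (missing) {
      if (q.required) errors[q.name] = "This question is required";
      return;
    }
    if (typeof q.validator === "function") {
      var message = q.validator(String(value));
      if (message) errors[q.name] = message;
    }
  });
  return errors;
}

// Question definition with the validator from the survey DSL example.
var sharedQuestions = [{
  name: "num_people_1",
  type: "int",
  required: true,
  validator: function (s) {
    var n = parseInt(s, 10);
    return n > 0 && n < 5 ? null : "Enter value in [1,4] range";
  }
}];

// Export for Node.js; in a browser the function is simply a global.
if (typeof module !== "undefined" && module.exports) {
  module.exports = { validateAnswers: validateAnswers };
}
```

The browser can call validateAnswers while the respondent types, and the server can call the same function again before saving, so a rule is written once and a tampered client cannot bypass it.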
ACTION!
Let’s go to the AWS console:
https://console.aws.amazon.com/ec2/v2/home?region=us-west-2#Instances
Master, Node1, Node2 all run the same configuration: Mongo Config, Mongo Router, MongoDB + Node.js
Node3 is identical to the other nodes except it typically doesn’t run a Mongo config server
All nodes are load-balanced with Elastic Load Balancer (ELB)
If we turn on Node3 it will be automatically included in load balancing
MongoDB content is distributed across all nodes; this is called sharding
PRODUCTION CONFIGURATION
PoC is different from the production configuration:
Slow/cheap servers
Only a few nodes in clusters
Runtime and data warehouse in the same cloud
Simplified security
No real CDN
No Mongo replicas
Let’s take a look at how this could work in production
MONITORING
How can we make sure the system works properly, predict failures and avoid them?
AWS CONSOLE
Provides the status of all your servers
Allows shutting them down when you don’t need them and bringing them back in minutes
You can take software running on a server and move it to a more powerful computer in minutes
Also provides vital statistics about all your servers
MONGODB
Many free open source and commercial tools are available
Here is just one example
It provides comprehensive statistics on all aspects of MongoDB cluster performance
HADOOP
Multiple commercial and open source solutions for monitoring and managing Hadoop clusters
Cloudera Manager – deploys nodes in EC2 in bulk
Here is just the standard out-of-the-box Hadoop web interface for monitoring cluster health
PERFORMANCE
How can we guarantee the system will withstand real-life load?
STRESS TESTING
We use Tsung to stress-test the system, creating a load of up to 40,000 simultaneous users.
Even on a single laptop the system was serving 250+ responses per second with an average response time of < 1/3 sec
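A quick sanity check on these figures can be done with Little's law: the average number of requests in flight equals throughput multiplied by average response time. The function name is ours, and the inputs are the rounded figures quoted above, so the result is only a rough estimate.

```javascript
// Little's law: average requests in flight = throughput x average response time.
function requestsInFlight(throughputPerSecond, avgResponseSeconds) {
  return throughputPerSecond * avgResponseSeconds;
}

// Rounded figures from the laptop test: 250 responses/s, 1/3 s per response.
var inFlight = requestsInFlight(250, 1 / 3);
console.log(inFlight.toFixed(1)); // prints "83.3"
```

So at these rates the server is working on roughly 80 requests at any moment. A much larger number of simultaneous users is consistent with this if most of them are reading or typing between requests rather than waiting for a response.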
SCALABILITY
What if you need even better performance?
Scale vertically. Every node can be shut down and restarted on more powerful hardware, up to 32 cores, 117 GB RAM, 2 TB SSD
Scale horizontally. Hundreds of copies can be deployed in <1 hr. Akira has already built the infrastructure that supports plugging in hundreds of nodes
All scaling operations are either automated or could be automated with AWS CLI scripts
Add reliability by adding replicas to MongoDB
[Chart with axes labeled "Nodes" and "CPU Cores"]
INTEGRATION
Even the best system has limited use if we can’t integrate it with other systems.
INTEGRATION
Our PoC code provides a complete REST API for manipulating surveys and responses
As we have demonstrated before, we can easily integrate it with Oracle Service Bus and SOAP-based services
Data can be extracted into an Oracle database from both MongoDB and Hadoop
SAS and R can be used to process data; both integrate with Hadoop and MongoDB
OPA:
Hadoop can write data into .csv files for OPA batch processing
OPA batch processor can output to .csv for Hadoop or MongoDB consumption
OPA Determinations engine can be queried from Hadoop MR tasks
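To illustrate what a REST API for surveys can look like in Node.js, here is a minimal sketch. It is not the PoC's API: the /api/surveys/:id paths, the handle and createServer functions and the in-memory store standing in for MongoDB are all assumptions.

```javascript
var http = require("http");

// In-memory stand-in for the MongoDB collection (assumption for this sketch).
var surveys = {
  "8": { name: "decennial", title: "Decennial Survey", questions: [] }
};

// Pure routing function, so it can be exercised without opening a socket.
function handle(method, path, body) {
  var match = /^\/api\/surveys\/([^\/]+)$/.exec(path);
  if (!match) return { status: 404, body: { error: "Not found" } };
  var id = match[1];
  if (method === "GET") {
    return surveys[id]
      ? { status: 200, body: surveys[id] }
      : { status: 404, body: { error: "No such survey" } };
  }
  if (method === "PUT") {
    surveys[id] = body;
    return { status: 200, body: body };
  }
  if (method === "DELETE") {
    delete surveys[id];
    return { status: 204, body: null };
  }
  return { status: 405, body: { error: "Method not allowed" } };
}

// Wraps the router in an HTTP server; the caller decides when to listen.
function createServer() {
  return http.createServer(function (req, res) {
    var chunks = [];
    req.on("data", function (chunk) { chunks.push(chunk); });
    req.on("end", function () {
      var text = Buffer.concat(chunks).toString();
      var result;
      try {
        result = handle(req.method, req.url, text ? JSON.parse(text) : null);
      } catch (e) {
        result = { status: 400, body: { error: "Invalid JSON" } };
      }
      res.writeHead(result.status, { "Content-Type": "application/json" });
      res.end(result.body === null ? "" : JSON.stringify(result.body));
    });
  });
}
```

Calling createServer().listen(3000) would expose the sketch on port 3000 (an arbitrary choice). Because surveys are plain JSON documents, the same resource shape can be stored in MongoDB, returned over REST and consumed by other systems without translation.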
COST
How much does this solution cost?
SOFTWARE
Only open source software is used in this PoC
Most software can be used absolutely for free
Some have a nominal fee, typically within the $1,000 range
Commercial licenses that include support are available for all of the software.
CLOUD SERVICES
If you don’t use it, you don’t pay for it
Hundreds of nodes can be deployed from the images we have prepared
Hot-plug nodes can be preconfigured and shut down for future use
This is how much we paid for our servers for 3 weeks:
https://portal.aws.amazon.com/gp/aws/developer/account?ie=UTF8&action=activity-summary
SUMMARY
Effortlessly supports multiple surveys
Extremely scalable, leverages cloud
Low-cost open source
Easily integrates with Oracle DB, Services, OPA
Tested to handle stress loads
Supports online and offline interview modes
CONTACT
Akira Technologies, Inc
1747 Penn Ave NW #600
Washington, DC 20006
P: 202.517.7187
F: 800.589.3129
E: [email protected]
W: www.akira-tech.com