(GAM406) Glu Mobile: Real-time analytics processing of 10 MM+ devices
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jordan Young – Manager, Analytics Engineering, Glu Mobile
October 2015
GAM406
Glu Mobile
An Amazon Kinesis-centric data platform to process real-time
gaming events for 10+ million user devices
What to expect from the session
• Glu Mobile: Data requirements and challenges
• Architecture overview and decisions
• Amazon Kinesis: Producers, Streams, and the Amazon
Kinesis Connector Library
• Real Time: Storm and Amazon Kinesis Storm Spout
• Other challenges and insights
Glu mobile basics
• A mobile gaming leader across genres
• 4 titles in top 100 grossing (US) (9/24/15)
• 4–6 million daily active users (DAU) (typical, 2015)
• 1 billion+ global installs (2010-)
What we collect
High, variable volume
• 700 million to 2+ billion
events per day
• 600 bytes per event
• Up to 1.2 TB per day
• Could scale up further with
successful game launch
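The volume figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming the 600-byte figure is an average and that 1 TB means 10^12 bytes; the class and method names are hypothetical and not from the talk.

```java
// Back-of-the-envelope check of the event volume figures (illustrative only).
final class EventVolume {
    static final long SECONDS_PER_DAY = 86_400L;

    // Total bytes per day for a given event count and average event size.
    static long bytesPerDay(long eventsPerDay, long bytesPerEvent) {
        return eventsPerDay * bytesPerEvent;
    }

    // Average events per second if the daily volume were spread evenly.
    static long averageEventsPerSecond(long eventsPerDay) {
        return eventsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long low = 700_000_000L;     // quiet day
        long high = 2_000_000_000L;  // busy day
        long size = 600L;            // bytes per event
        System.out.printf("Low day:  %.2f TB, ~%d events/sec on average%n",
                bytesPerDay(low, size) / 1e12, averageEventsPerSecond(low));
        System.out.printf("High day: %.2f TB, ~%d events/sec on average%n",
                bytesPerDay(high, size) / 1e12, averageEventsPerSecond(high));
    }
}
```

At 2 billion events per day this gives 1.2 TB, which matches the upper bound quoted above. The per-second values are daily averages, so peak rates will be higher.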
Multiple sources
• Client side SDKs
• Game servers, central
services servers
• Attribution partners
• Ad networks
• Third parties
Basic requirements
• Near zero data loss
• High levels of uptime
• Flexible data format — JSON with arbitrary fields
• Real-time aggregations
• Reasonably low latency for ad-hoc queries
(hourly batching OK)
Other requirements
• Not expensive
• Can be implemented with minimal engineering effort
• Requires minimal changes to existing games
Architecture: Past, Present,
and Why
First redesign
Next Step: Bring data collection in house
• Build our own analytics SDK
• Need a framework for collecting data from SDK
• Options:
• Build our own streaming and collection (Apache Kafka)
• Use a hosted service (Amazon Kinesis)
Amazon Kinesis: Producers,
Streams, and the Amazon
Kinesis Connector Library
What is Amazon Kinesis?
Why Amazon Kinesis?
• Minimal setup time
• Prebuilt applications (Amazon Kinesis Connector Library
[KCL], Amazon Kinesis Storm Spout)
• Extremely minimal maintenance
• Minimal hardware
• No significant price advantages either way (vs Kafka)
Producers
• Custom built client SDKs
• Native Android (Java), Native iOS (Obj-C) plug-ins
• Unity wrapper for Unity titles
• Built on top of AWS SDKs (for each platform)
• Implements our internal analytical schema / standards
• Additional server-side implementations
Producers (continued)
• Vanilla KinesisRecorder.submitAllRecords()
• No record batching
• No compression
• Records flushed every 30 seconds, or on certain events
• Client authentication using Amazon Cognito
• Server authentication using AWS Identity and Access
Management (IAM) profiles
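The flush policy described above (send every 30 seconds, or immediately on certain events) can be sketched as follows. This is not Glu's SDK code and does not call the AWS SDK: `FlushingEventBuffer`, its `sender` callback, and the `forceFlush` flag are hypothetical names, and the time check runs when a record arrives rather than on a background timer, purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side buffer: holds events locally and hands them to a
// sender callback when the flush interval has elapsed or a flush is forced.
final class FlushingEventBuffer {
    private final long flushIntervalMillis;
    private final Consumer<List<String>> sender;
    private final List<String> pending = new ArrayList<>();
    private long lastFlushMillis;

    FlushingEventBuffer(long flushIntervalMillis, long nowMillis,
                        Consumer<List<String>> sender) {
        this.flushIntervalMillis = flushIntervalMillis;
        this.lastFlushMillis = nowMillis;
        this.sender = sender;
    }

    // Queue an event, then flush if forced or if the interval has elapsed.
    void record(String eventJson, boolean forceFlush, long nowMillis) {
        pending.add(eventJson);
        if (forceFlush || nowMillis - lastFlushMillis >= flushIntervalMillis) {
            flush(nowMillis);
        }
    }

    // Hand all pending events to the sender and reset the flush clock.
    void flush(long nowMillis) {
        if (!pending.isEmpty()) {
            sender.accept(new ArrayList<>(pending));
            pending.clear();
        }
        lastFlushMillis = nowMillis;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In a real client the `sender` would wrap the platform SDK's submit call, and the timestamps would come from the system clock. Passing the time in explicitly simply makes the behavior easy to test.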
How many shards?
• Shard limits:
• 1,000 records per second
• 1 MB per sec writes
• 2 MB per sec read
• Our situation: 20,000 RPS, 600 bytes per message
• Need at least 20 shards to handle message count
• Only need 12 MB per sec write capacity
• 20 shards = 40 MB per sec read capacity
• Up to 3 apps OK (36 MB < 40 MB)
• Other considerations (peak load, excess capacity)
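The shard arithmetic above can be written out as a small calculation. This is a sketch using only the per-shard limits listed on this slide and treating 1 MB as 10^6 bytes, as the slide's own numbers do; the class and method names are hypothetical.

```java
// Shard sizing from the per-shard limits quoted on the slide (illustrative only).
final class ShardMath {
    static final long RECORDS_PER_SEC_PER_SHARD = 1_000L;
    static final long WRITE_BYTES_PER_SEC_PER_SHARD = 1_000_000L; // 1 MB per sec
    static final long READ_BYTES_PER_SEC_PER_SHARD = 2_000_000L;  // 2 MB per sec

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Minimum shards needed to satisfy both the record-count and write-byte limits.
    static long minShards(long recordsPerSec, long bytesPerRecord) {
        long byCount = ceilDiv(recordsPerSec, RECORDS_PER_SEC_PER_SHARD);
        long byBytes = ceilDiv(recordsPerSec * bytesPerRecord,
                               WRITE_BYTES_PER_SEC_PER_SHARD);
        return Math.max(byCount, byBytes);
    }

    // How many consumer apps can each read the full stream within read capacity.
    static long maxConsumerApps(long shards, long recordsPerSec, long bytesPerRecord) {
        long readCapacity = shards * READ_BYTES_PER_SEC_PER_SHARD;
        return readCapacity / (recordsPerSec * bytesPerRecord);
    }

    public static void main(String[] args) {
        long shards = minShards(20_000L, 600L);
        System.out.println("Minimum shards: " + shards);
        System.out.println("Consumer apps supported: "
                + maxConsumerApps(shards, 20_000L, 600L));
    }
}
```

With 20,000 records per second at 600 bytes each, the record-count limit (20 shards) dominates the write-byte limit (12 shards), and 40 MB per sec of read capacity covers three full readers at 12 MB per sec each. Peak load and headroom would push the real shard count higher.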
Consumers: Amazon Kinesis Connector Library
Consumers: KCL pipeline
public class S3Pipeline implements IKinesisConnectorPipeline<String, byte[]> {
    @Override
    public ITransformer<String, byte[]> getTransformer(KinesisConnectorConfiguration configuration) {
        return new GluJsonTransformer();
    }

    @Override
    public IFilter<String> getFilter(KinesisConnectorConfiguration configuration) {
        return new GluMessageFilter<String>();
    }

    @Override
    public IBuffer<String> getBuffer(KinesisConnectorConfiguration configuration) {
        return new BasicMemoryBuffer<String>(configuration);
    }

    @Override
    public IEmitter<byte[]> getEmitter(KinesisConnectorConfiguration configuration) {
        return new GluS3Emitter(configuration);
    }
}
Consumers: Transformer implementation ex.
@Override
public String toClass(Record record) {
    String json_str = "";
    try {
        json_str = new String(record.getData().array());
    } catch (Exception e) {
        return null;
    }
    if (json_str != null && !json_str.isEmpty()) {
        if (json_str.startsWith("{") && json_str.endsWith("}")) {
            json_str = json_str.substring(0, json_str.length() - 1);
            if (json_str.length() > 3) {
                json_str += ",";
                json_str = json_str + "\"kin_seq_num\":\"" + record.getSequenceNumber() + "\",";
                json_str = json_str + "\"server_ts\":" + System.currentTimeMillis() + "}";
            }
        }
    }
    return json_str;
}

@Override
public byte[] fromClass(String record) {
    return record.getBytes();
}
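To show what the transformer does to a record, here is a standalone sketch of the same enrichment step with no Kinesis dependency. It is a variation rather than the slide's code: the sequence number and timestamp are passed in as plain arguments, an empty object `{}` is handled explicitly, and the name `JsonEnricher.enrich` is hypothetical.

```java
// Standalone sketch of the enrichment step: append kin_seq_num and server_ts
// to a JSON object given as a string. Not the Connector Library API.
final class JsonEnricher {
    static String enrich(String json, String sequenceNumber, long serverTsMillis) {
        if (json == null) {
            return null;
        }
        String trimmed = json.trim();
        // Only touch strings that look like a JSON object; pass others through.
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            return json;
        }
        // Drop the closing brace so the extra fields can be appended.
        String body = trimmed.substring(0, trimmed.length() - 1);
        // An empty object needs no comma before the first appended field.
        String separator = body.trim().equals("{") ? "" : ",";
        return body + separator
                + "\"kin_seq_num\":\"" + sequenceNumber + "\","
                + "\"server_ts\":" + serverTsMillis + "}";
    }

    public static void main(String[] args) {
        System.out.println(enrich("{\"event\":\"login\"}", "123", 1000L));
    }
}
```

For the input `{"event":"login"}` with sequence number `123` and timestamp `1000`, this yields `{"event":"login","kin_seq_num":"123","server_ts":1000}`. String concatenation keeps the step cheap, at the cost of not validating the JSON.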
Real Time: Storm and the
Amazon Kinesis Storm Spout
Storm and real-time data
• Distributed, fault-tolerant
• Processes real-time data
• Views records as “tuples” which are passed through an
arbitrary DAG of nodes (a topology)
• Spouts: Emit tuples into the topology
• Bolts: Process tuples in the topology
• Read from anywhere, write to anywhere
Storm cluster architecture
Storm: Real-time aggregation
Implementing the Amazon Kinesis Storm Spout
// Define configuration parameters
final KinesisSpoutConfig config =
    new KinesisSpoutConfig(streamName, zookeeperEndpoint)
        .withZookeeperPrefix(zookeeperPrefix)
        .withKinesisRecordScheme(new DefaultKinesisRecordScheme())
        .withRecordRetryLimit(recordRetryLimit)
        .withInitialPositionInStream(initialPositionInStream) // LATEST or TRIM_HORIZON
        .withEmptyRecordListBackoffMillis(emptyRecordListBackoffMillis);

// Create Spout
final KinesisSpout spout = new KinesisSpout(
    config,
    new CustomCredentialsProviderChain(awsAccessKey, awsSecretKey),
    new ClientConfiguration());

// Set Spout in Topology and define parallelism
builder.setSpout("kinesis_spout", spout, num_spout_executors);
Storm: Lessons
• Only extract necessary fields when deserializing
• Big instances with few workers (JVMs)
• Too many workers can reduce speeds
• Balance flexibility vs. speed
• Final state (throughput and hardware)
• Can handle up to ~42K RPS on (4) c4.2xlarge instances
• Using (2) m3.large for ZooKeeper, m3.xlarge for Nimbus
Glu’s new architecture
Challenges and Insights
Challenge: Shards, buffers, and file size
• Each shard is handled by its own buffer
• A KCL instance needs a buffer for each shard
• More shards per machine = more memory
• Avoid memory bottleneck by reducing buffer size
• But this creates more, smaller files
• Hadoop does not like this!
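The trade-off above can be put into numbers. This is a rough sketch, assuming a shard running at its full 1,000 records per second of 600 bytes each and a buffer that flushes one file whenever it fills; the buffer sizes in `main` are hypothetical, chosen only to illustrate the effect.

```java
// Buffer size versus memory use and file count (illustrative numbers only).
final class BufferTradeoff {
    // Memory needed on one consumer machine: one buffer per shard it handles.
    static long bufferMemoryBytes(int shardsOnMachine, long bufferBytesPerShard) {
        return shardsOnMachine * bufferBytesPerShard;
    }

    // Files emitted per hour per shard if a file is written each time the buffer fills.
    static double filesPerHourPerShard(long shardBytesPerSec, long bufferBytesPerShard) {
        return shardBytesPerSec * 3600.0 / bufferBytesPerShard;
    }

    public static void main(String[] args) {
        long shardBytesPerSec = 1_000L * 600L; // 1,000 records/sec at 600 bytes
        long[] bufferSizes = {64_000_000L, 8_000_000L}; // hypothetical buffer sizes
        for (long size : bufferSizes) {
            System.out.printf("buffer=%d MB: %d MB memory for 20 shards, %.2f files/hour/shard%n",
                    size / 1_000_000L,
                    bufferMemoryBytes(20, size) / 1_000_000L,
                    filesPerHourPerShard(shardBytesPerSec, size));
        }
    }
}
```

Shrinking the buffer by a factor of eight cuts memory by the same factor but multiplies the number of output files by eight, which is the small-files problem for Hadoop.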
Solution: CombineFileInputFormat
Solution: CombineFileInputFormat (continued)
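The idea behind CombineFileInputFormat is to group many small files into fewer, larger input splits so that each map task has a reasonable amount of work. The sketch below simulates only that grouping step with a simple greedy packer. It does not use the Hadoop API, it ignores the node and rack locality that Hadoop also considers, and `SplitPacker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Greedy simulation of combining small files into larger splits.
final class SplitPacker {
    // Groups file sizes, in order, into splits of at most maxSplitBytes each.
    // A single file larger than the limit gets a split of its own.
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Start a new split when adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Long> files = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            files.add(8_000_000L); // ten hypothetical 8 MB files
        }
        System.out.println("Splits: " + pack(files, 32_000_000L).size());
    }
}
```

Ten 8 MB files with a 32 MB split limit become three splits instead of ten, so the job launches three map tasks rather than one per file.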
Challenge: No IP address on record
• Record sent to stream without IP
• Device doesn’t know its own IP
• Amazon Kinesis does not provide client IP
• But we rely on IP address for geo lookup
• No geographic splits
• Big problem
Solution: Geo lookup service (v1)
Solution: Geo lookup service (v2)
Challenge: Scaling
• Can our system scale with minimal effort and impact?
• Stream
• Can scale up / down with Amazon Kinesis Scaling Utils
• Consumers
• Can add more machines, pegged to shard count rather
than records if memory bottlenecked
• Hadoop
• Can add more nodes, but already have enough extra room
• Storm?
Kinesis Scaling Utils
$ java -cp KinesisScalingUtils.jar-complete.jar \
-Dstream-name=<stream name> -Dscaling-action=resize \
-Dcount=<new shard count> ScalingClient
GitHub: Kinesis Scaling Utils
Scaling and Amazon Kinesis Storm Spout
• Assigning tasks to shards
• Previously required a topology restart so that ZooKeeper could refresh the
shard list
• Solved in 1.1.1; now only requires “storm rebalance”
• Need to be sure that withEmptyRecordListBackoffMillis
setting is adequately low (defaults to 5 post 1.1.0)
• Loss of state
• Restarting / rebalancing causes tasks to lose their state
• Breaks topology operations that require state, such as
unique counts and joins
Redis to the rescue
• Redis is a scalable, in-memory key-value store
• Solution: Store long running state to Redis
• Count unique values using “sets”
• Perform joins using key-value hashes
• Easy deployment / management using Amazon ElastiCache
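Sets work well for unique counts because adding the same member twice has no effect, so overlapping or replayed batches do not inflate the count. The sketch below imitates that behavior in memory with plain Java collections. It is a stand-in to illustrate the semantics, not a Redis client, and `UniqueCountSketch` is a hypothetical name; its `sadd` and `scard` methods only mirror the Redis commands of the same names.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for Redis sets, used only to show why set semantics
// make unique counting safe across batches and restarts.
final class UniqueCountSketch {
    private final Map<String, Set<String>> store = new HashMap<>();

    // Like SADD: add members to the set at key; duplicates are ignored.
    void sadd(String key, Collection<String> members) {
        store.computeIfAbsent(key, k -> new HashSet<>()).addAll(members);
    }

    // Like SCARD: number of distinct members in the set at key.
    long scard(String key) {
        Set<String> members = store.get(key);
        return members == null ? 0 : members.size();
    }

    public static void main(String[] args) {
        UniqueCountSketch redis = new UniqueCountSketch();
        String key = "dau:day:2015-10-08"; // example key in the "dau:day:" style
        redis.sadd(key, Set.of("u1", "u2", "u3")); // batch from one local aggregator
        redis.sadd(key, Set.of("u2", "u3", "u4")); // overlapping batch from another
        System.out.println("Unique users: " + redis.scard(key));
    }
}
```

Because the long-running state lives in the shared store rather than in a Storm task, a restarted or rebalanced task can resend a batch and the count stays correct.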
Redis unique counter: Local aggregator
@Override
public void execute(Tuple tuple) {
    if (!(TupleHelpers.isTickTuple(tuple))) {
        // Field name "id" is an assumption; use whichever field carries the user or device ID
        addItemToSet(listOfIds, tuple.getStringByField("id"));
    } else {
        // On each tick tuple, flush the locally collected IDs to Redis
        emit(listOfIds);
    }
    collector.ack(tuple);
}

private void addItemToSet(HashSet<String> listOfIds, String id) {
    listOfIds.add(id);
}

private void emit(HashSet<String> listOfIds) {
    Jedis jedis = redisPool.getResource();
    Pipeline redisPipeline = jedis.pipelined();
    String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
    int ttlSec = 60 * 60 * 24;
    for (String id : listOfIds) {
        redisPipeline.sadd(redisKeyName, id);
    }
    redisPipeline.sync();
    // A TTL of -1 means the key has no expiry yet, so set one
    if (jedis.ttl(redisKeyName) == -1) {
        jedis.expire(redisKeyName, ttlSec);
    }
    jedis.close();
    listOfIds.clear();
}
Redis unique counter: Global aggregator
@Override
public void execute(Tuple tuple) {
    Jedis jedis = redisPool.getResource();
    String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
    Double unique_count = jedis.scard(redisKeyName).doubleValue();
    jedis.close();
    emitToWherever(redisKeyName, unique_count);
    collector.ack(tuple);
}

@Override
public Map<String, Object> getComponentConfiguration() {
    Map<String, Object> conf = new HashMap<String, Object>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
    return conf;
}
In closing
• Amazon Kinesis Connector Library makes basic
consumer applications simple
• Amazon Kinesis Storm Spout enables real-time
processing
• Optimize Hadoop file size with CombineFileInputFormat
• Geo Lookup service in lieu of Amazon Kinesis API
• Scale with Amazon Kinesis scaling utils and Storm
Spout 1.1.1
Thank you!
Remember to complete
your evaluations!