(GAM406) Glu Mobile: Real-time Analytics Processing of 10 MM+ Devices




Page 1:

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Jordan Young – Manager, Analytics Engineering, Glu Mobile

October 2015

GAM406

Glu Mobile

An Amazon Kinesis-centric data platform to process real-time gaming events for 10+ million user devices

Page 2:

What to expect from the session

• Glu Mobile: Data requirements and challenges

• Architecture overview and decisions

• Amazon Kinesis: Producers, Streams, and the Amazon Kinesis Connector Library

• Real Time: Storm and Amazon Kinesis Storm Spout

• Other challenges and insights

Page 3:

Glu mobile basics

• A mobile gaming leader across genres

• 4 titles in top 100 grossing (US) (9/24/15)

• 4–6 million daily active users (DAU) (typical, 2015)

• 1 billion+ global installs (2010-)

Page 4:

What we collect

High, variable volume

• 700 million to 2+ billion events per day

• 600 bytes per event

• Up to 1.2 TB per day

• Could scale up further with a successful game launch
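The volume figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over 24 hours; the class and method names are made up for illustration.

```java
final class EventVolume {
    /** Total bytes per day for a given event count and average event size. */
    static long bytesPerDay(long eventsPerDay, long bytesPerEvent) {
        return eventsPerDay * bytesPerEvent;
    }

    /** Average events per second, assuming an even spread over 86,400 seconds. */
    static double averageEventsPerSecond(long eventsPerDay) {
        return eventsPerDay / 86_400.0;
    }

    public static void main(String[] args) {
        long bytesPerEvent = 600L;
        long[] dailyEvents = {700_000_000L, 2_000_000_000L};
        for (long events : dailyEvents) {
            System.out.printf("%d events/day: %.2f TB/day, %.0f events/s on average%n",
                    events,
                    bytesPerDay(events, bytesPerEvent) / 1e12,
                    averageEventsPerSecond(events));
        }
    }
}
```

At the upper end this gives 1.2 TB per day and an average of roughly 23,000 events per second, which is in line with the 20,000 RPS figure used later for shard sizing.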

Multiple sources

• Client side SDKs

• Game servers, central services servers

• Attribution partners

• Ad networks

• Third parties

Page 5:

Basic requirements

• Near zero data loss

• High levels of uptime

• Flexible data format — JSON with arbitrary fields

• Real-time aggregations

• Reasonably low latency for ad-hoc queries (hourly batching OK)

Page 6:

Other requirements

• Not expensive

• Can be implemented with minimal engineering effort

• Requires minimal changes to existing games

Page 7:

Architecture: Past, Present, and Why

Page 8:

First redesign

Page 9:

First redesign

Page 10:

Next Step: Bring data collection in house

• Build our own analytics SDK

• Need a framework for collecting data from SDK

• Options:

• Build our own streaming and collection (Apache Kafka)

• Use a hosted service (Amazon Kinesis)

Page 11:

Amazon Kinesis: Producers, Streams, and the Amazon Kinesis Connector Library

Page 12:

What is Amazon Kinesis?

Page 13:

Why Amazon Kinesis?

• Minimal setup time

• Prebuilt applications (Amazon Kinesis Connector Library [KCL], Amazon Kinesis Storm Spout)

• Extremely minimal maintenance

• Minimal hardware

• No significant price advantages either way (vs Kafka)

Page 14:

Producers

• Custom built client SDKs

• Native Android (Java), Native iOS (Obj-C) plug-ins

• Unity wrapper for Unity titles

• Built on top of AWS SDKs (for each platform)

• Implements our internal analytical schema / standards

• Additional server-side implementations

Page 15:

Producers (continued)

• Vanilla KinesisRecorder.submitAllRecords()

• No record batching

• No compression

• Records flushed every 30 seconds, or on certain events

• Client authentication using Amazon Cognito

• Server authentication using AWS Identity and Access Management (IAM) profiles
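The "flush every 30 seconds, or on certain events" behavior could be sketched as follows. This is only an illustration under stated assumptions: EventRecorder is a hypothetical interface that mirrors the save/submit shape of the SDK recorder, InMemoryRecorder is a stand-in used for the demo, and none of these names come from the Glu SDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class FlushDemo {
    public static void main(String[] args) {
        InMemoryRecorder recorder = new InMemoryRecorder();
        try (PeriodicFlusher flusher = new PeriodicFlusher(recorder, 30, TimeUnit.SECONDS)) {
            flusher.track("{\"event\":\"session_start\"}");
            flusher.flushNow(); // e.g. the app is moving to the background
        }
        System.out.println("submitted=" + recorder.submitted);
    }
}

/** Hypothetical stand-in for the SDK recorder; not the AWS class itself. */
interface EventRecorder {
    void saveRecord(byte[] data);
    void submitAllRecords();
}

/** In-memory recorder used only to demonstrate the flow. */
final class InMemoryRecorder implements EventRecorder {
    private final List<byte[]> pending = new ArrayList<>();
    int submitted = 0;

    @Override
    public synchronized void saveRecord(byte[] data) {
        pending.add(data);
    }

    @Override
    public synchronized void submitAllRecords() {
        submitted += pending.size();
        pending.clear();
    }
}

/** Queues events and submits them on a fixed schedule or on demand. */
final class PeriodicFlusher implements AutoCloseable {
    private final EventRecorder recorder;
    private final ScheduledExecutorService scheduler;

    PeriodicFlusher(EventRecorder recorder, long interval, TimeUnit unit) {
        this.recorder = recorder;
        this.scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flushNow, interval, interval, unit);
    }

    void track(String json) {
        recorder.saveRecord(json.getBytes(StandardCharsets.UTF_8));
    }

    synchronized void flushNow() {
        try {
            recorder.submitAllRecords();
        } catch (RuntimeException e) {
            // Swallow the failure: an exception escaping the task would cancel the schedule.
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
        flushNow(); // final flush on shutdown
    }
}
```

Routing both the timer and the event-driven path through one flushNow() method keeps a single place to add retry or backoff logic later.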

Page 16:

How many shards?

• Shard limits:

• 1,000 records per second

• 1 MB per sec writes

• 2 MB per sec read

• Our situation: 20,000 RPS, 600 bytes per message

• Need at least 20 shards to handle message count

• Only need 12 MB per sec write capacity

• 20 shards = 40 MB per sec read capacity

• Up to 3 apps OK (36 MB < 40 MB)

• Other considerations (peak load, excess capacity)
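The sizing above can be reproduced with a small calculator. This is a sketch only: it treats 1 MB as 1,000,000 bytes, as the slide's arithmetic does, it assumes every consumer application reads the full stream, and the class and method names are illustrative.

```java
final class ShardEstimate {
    static final long RECORDS_PER_SHARD_PER_SEC = 1_000L;
    static final long WRITE_BYTES_PER_SHARD_PER_SEC = 1_000_000L;
    static final long READ_BYTES_PER_SHARD_PER_SEC = 2_000_000L;

    /** Shards needed to satisfy both the record-count and the write-bandwidth limit. */
    static long shardsForWrites(long recordsPerSec, long bytesPerRecord) {
        long byCount = ceilDiv(recordsPerSec, RECORDS_PER_SHARD_PER_SEC);
        long byBytes = ceilDiv(recordsPerSec * bytesPerRecord, WRITE_BYTES_PER_SHARD_PER_SEC);
        return Math.max(byCount, byBytes);
    }

    /** Number of consumer applications that can each read the full stream. */
    static long maxConsumerApps(long shards, long recordsPerSec, long bytesPerRecord) {
        long readCapacity = shards * READ_BYTES_PER_SHARD_PER_SEC;
        long bytesPerApp = recordsPerSec * bytesPerRecord;
        return readCapacity / bytesPerApp;
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long shards = shardsForWrites(20_000L, 600L);
        System.out.println("shards needed: " + shards);
        System.out.println("consumer apps supported: " + maxConsumerApps(shards, 20_000L, 600L));
    }
}
```

With 20,000 RPS at 600 bytes the record-count limit dominates (20 shards versus 12 for bandwidth), and the resulting 40 MB per second of read capacity covers three full-stream consumers.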

Page 17:

Consumers: Amazon Kinesis Connector Library

Page 18:

Consumers: KCL pipeline

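import com.amazonaws.services.kinesis.connectors.KinesisConnectorConfiguration;
import com.amazonaws.services.kinesis.connectors.impl.BasicMemoryBuffer;
import com.amazonaws.services.kinesis.connectors.interfaces.IBuffer;
import com.amazonaws.services.kinesis.connectors.interfaces.IEmitter;
import com.amazonaws.services.kinesis.connectors.interfaces.IFilter;
import com.amazonaws.services.kinesis.connectors.interfaces.IKinesisConnectorPipeline;
import com.amazonaws.services.kinesis.connectors.interfaces.ITransformer;

public class S3Pipeline implements IKinesisConnectorPipeline<String, byte[]> {

    // Converts each Kinesis record to a JSON string and back to bytes for output.
    @Override
    public ITransformer<String, byte[]> getTransformer(KinesisConnectorConfiguration configuration) {
        return new GluJsonTransformer();
    }

    // Decides which transformed records are kept.
    @Override
    public IFilter<String> getFilter(KinesisConnectorConfiguration configuration) {
        return new GluMessageFilter<String>();
    }

    // Holds records in memory until a flush threshold is reached.
    @Override
    public IBuffer<String> getBuffer(KinesisConnectorConfiguration configuration) {
        return new BasicMemoryBuffer<String>(configuration);
    }

    // Writes each flushed buffer to Amazon S3.
    @Override
    public IEmitter<byte[]> getEmitter(KinesisConnectorConfiguration configuration) {
        return new GluS3Emitter(configuration);
    }
}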

Page 19:

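Consumers: Transformer implementation example

// Requires java.nio.charset.StandardCharsets and com.amazonaws.services.kinesis.model.Record
@Override
public String toClass(Record record) {
    String jsonStr;
    try {
        // Decode explicitly as UTF-8 instead of relying on the platform default.
        jsonStr = new String(record.getData().array(), StandardCharsets.UTF_8);
    } catch (Exception e) {
        return null;
    }
    // Only decorate payloads that look like a JSON object.
    if (jsonStr.startsWith("{") && jsonStr.endsWith("}")) {
        // Drop the closing brace so that the extra fields can be appended.
        jsonStr = jsonStr.substring(0, jsonStr.length() - 1);
        // A non-empty object needs a comma before the appended fields.
        if (jsonStr.length() > 3) {
            jsonStr += ",";
        }
        // Append unconditionally so that the closing brace is always restored.
        jsonStr += "\"kin_seq_num\":\"" + record.getSequenceNumber() + "\",";
        jsonStr += "\"server_ts\":" + System.currentTimeMillis() + "}";
    }
    return jsonStr;
}

@Override
public byte[] fromClass(String record) {
    return record.getBytes(StandardCharsets.UTF_8);
}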
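The enrichment step can also be exercised outside the Connector Library. The sketch below is a standalone variant, not the Glu transformer itself: the JsonEnricher class is hypothetical, the timestamp is passed in so the result is deterministic, and an empty object is detected by inspecting its content rather than by a length threshold.

```java
final class JsonEnricher {
    /**
     * Appends kin_seq_num and server_ts to a JSON object string.
     * Inputs that do not look like a JSON object are returned unchanged.
     */
    static String enrich(String json, String sequenceNumber, long serverTsMillis) {
        if (json == null) {
            return null;
        }
        String trimmed = json.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            return json;
        }
        // Drop the closing brace so that fields can be appended.
        String body = trimmed.substring(0, trimmed.length() - 1);
        // Anything after the opening brace means the object already has fields.
        boolean hasFields = !body.substring(1).trim().isEmpty();
        return body + (hasFields ? "," : "")
                + "\"kin_seq_num\":\"" + sequenceNumber + "\","
                + "\"server_ts\":" + serverTsMillis + "}";
    }

    public static void main(String[] args) {
        System.out.println(enrich("{\"event\":\"level_up\"}", "42", 1000L));
        System.out.println(enrich("{}", "42", 1000L));
    }
}
```

String concatenation keeps the per-record cost low, at the price of trusting the producer to send well-formed JSON; a full parse would be safer but slower.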

Page 20:

Real Time: Storm and the Amazon Kinesis Storm Spout

Page 21:

Storm and real-time data

• Distributed, fault-tolerant

• Processes real-time data

• Views records as “tuples” which are passed through an arbitrary DAG of nodes (a topology)

• Spouts: Emit tuples into the topology

• Bolts: Process tuples in the topology

• Read from anywhere, write to anywhere

Page 22:

Storm cluster architecture

Page 23:

Storm: Real-time aggregation

Page 24:

Implementing the Amazon Kinesis Storm Spout

// Define configuration parameters
final KinesisSpoutConfig config =
    new KinesisSpoutConfig(streamName, zookeeperEndpoint)
        .withZookeeperPrefix(zookeeperPrefix)
        .withKinesisRecordScheme(new DefaultKinesisRecordScheme())
        .withRecordRetryLimit(recordRetryLimit)
        .withInitialPositionInStream(initialPositionInStream) // LATEST or TRIM_HORIZON
        .withEmptyRecordListBackoffMillis(emptyRecordListBackoffMillis);

// Create the spout
final KinesisSpout spout = new KinesisSpout(
    config,
    new CustomCredentialsProviderChain(awsAccessKey, awsSecretKey),
    new ClientConfiguration());

// Set the spout in the topology and define its parallelism
builder.setSpout("kinesis_spout", spout, num_spout_executors);

Page 25:

Storm: Lessons

• Only extract necessary fields when deserializing

• Big instances with few workers (JVMs)

• Too many workers can reduce speeds

• Balance flexibility vs. speed

• Final state (throughput and hardware)

• Can handle up to ~42K RPS on (4) c4.2xlarge instances

• Using (2) m3.large for ZooKeeper, m3.xlarge for Nimbus

Page 26:

Glu’s new architecture

Page 27:

Challenges and Insights

Page 28:

Challenge: Shards, buffers, and file size

• Each shard is handled by its own buffer

• A KCL instance needs a buffer for each shard

• More shards per machine = more memory

• Avoid memory bottleneck by reducing buffer size

• Creates more, smaller files

• Hadoop does not like this!
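The trade-off can be made concrete with rough numbers. The sketch below is illustrative only: the buffer sizes and the shards-per-instance figure are assumptions rather than values from the talk, buffers are assumed to flush only when their byte limit is reached, and 1 MB is taken as 1,000,000 bytes.

```java
final class BufferTradeoff {
    /** Worst-case heap held by buffers on one instance (one buffer per shard). */
    static long worstCaseBufferBytes(int shardsPerInstance, long bufferByteLimit) {
        return shardsPerInstance * bufferByteLimit;
    }

    /** Files written per hour across the stream if each flush produces one file. */
    static long filesPerHour(long streamBytesPerSecond, long bufferByteLimit) {
        return streamBytesPerSecond * 3_600L / bufferByteLimit;
    }

    public static void main(String[] args) {
        long streamBytesPerSecond = 12_000_000L; // 20,000 RPS x 600 bytes
        int shardsPerInstance = 10;              // assumed for illustration
        long[] bufferLimits = {64_000_000L, 8_000_000L};
        for (long limit : bufferLimits) {
            System.out.printf("buffer=%d MB: up to %d MB heap per instance, %d files/hour%n",
                    limit / 1_000_000L,
                    worstCaseBufferBytes(shardsPerInstance, limit) / 1_000_000L,
                    filesPerHour(streamBytesPerSecond, limit));
        }
    }
}
```

Under these assumptions, shrinking the buffer from 64 MB to 8 MB cuts worst-case buffer memory eightfold and multiplies the number of files by the same factor.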

Page 29:

Solution: CombineFileInputFormat

Page 30:

Solution: CombineFileInputFormat (continued)

Page 31:

Challenge: No IP address on record

• Record sent to stream without IP

• Device doesn’t know its own IP

• Amazon Kinesis does not provide client IP

• But we rely on the IP address for geo lookup

• No geographic splits

• Big problem

Page 32:

Solution: Geo lookup service (v1)

Page 33:

Solution: Geo lookup service (v2)

Page 34:

Challenge: Scaling

• Can our system scale with minimal effort and impact?

• Stream

• Can scale up / down with Amazon Kinesis Scaling Utils

• Consumers

• Can add more machines; if memory is the bottleneck, machine count is pegged to shard count rather than record volume

• Hadoop

• Can add more nodes, but there is enough extra room for now

• Storm?

Page 35:

Kinesis Scaling Utils

$ java -cp KinesisScalingUtils.jar-complete.jar \
    -Dstream-name=<stream name> -Dscaling-action=resize \
    -Dcount=<new shard count> ScalingClient

GitHub: Kinesis Scaling Utils

Page 36:

Scaling and Amazon Kinesis Storm Spout

• Assigning tasks to shards

• Required a topology restart so that ZooKeeper could refresh the shard list

• Solved in 1.1.1; now only requires “storm rebalance”

• Need to be sure that the withEmptyRecordListBackoffMillis setting is adequately low (defaults to 5 ms after 1.1.0)

• Loss of state

• Restarting or rebalancing causes tasks to lose their state

• Breaks topology operations that require state, such as unique counts and joins

Page 37:

Redis to the rescue

• Redis is a scalable, in-memory key-value store

• Solution: Store long running state to Redis

• Count unique values using “sets”

• Perform joins using key-value hashes

• Easy deployment / management using Amazon ElastiCache

Page 38:

Redis unique counter: Local aggregator

@Override
public void execute(Tuple tuple) {
    if (!TupleHelpers.isTickTuple(tuple)) {
        // Buffer IDs locally between ticks ("id" is an assumed field name).
        addItemToSet(listOfIds, tuple.getStringByField("id"));
    } else {
        // On each tick, flush the buffered IDs to Redis.
        emit(listOfIds);
    }
    collector.ack(tuple);
}

private void addItemToSet(HashSet<String> listOfIds, String id) {
    listOfIds.add(id);
}

private void emit(HashSet<String> listOfIds) {
    String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
    int ttlSec = 60 * 60 * 24;
    // try-with-resources returns the connection to the pool even on failure.
    try (Jedis jedis = redisPool.getResource()) {
        // Pipeline the SADD calls to avoid one network round trip per ID.
        Pipeline redisPipeline = jedis.pipelined();
        for (String id : listOfIds) {
            redisPipeline.sadd(redisKeyName, id);
        }
        redisPipeline.sync();
        // A TTL of -1 means the key exists but has no expiry yet.
        if (jedis.ttl(redisKeyName) == -1) {
            jedis.expire(redisKeyName, ttlSec);
        }
    }
    listOfIds.clear();
}

Page 39:

Redis unique counter: Global aggregator

@Override
public void execute(Tuple tuple) {
    String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
    Double uniqueCount;
    // try-with-resources returns the connection to the pool even if SCARD fails.
    try (Jedis jedis = redisPool.getResource()) {
        // SCARD returns the number of distinct IDs added by the local aggregators.
        uniqueCount = jedis.scard(redisKeyName).doubleValue();
    }
    emitToWherever(redisKeyName, uniqueCount);
    collector.ack(tuple);
}

@Override
public Map<String, Object> getComponentConfiguration() {
    // Ask Storm to send this bolt a tick tuple every emitFrequencyInSeconds.
    Map<String, Object> conf = new HashMap<String, Object>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
    return conf;
}
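The two-stage pattern (local buffering, shared set, global cardinality) can be simulated without Storm or Redis. In this sketch an in-memory concurrent set stands in for the Redis set, and every class name is hypothetical; it only illustrates why the count survives the loss of a local aggregator once its buffer has been flushed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class UniqueCountSketch {
    /** Stand-in for the Redis set: addAll approximates SADD, cardinality approximates SCARD. */
    static final class SharedSet {
        private final Set<String> members = ConcurrentHashMap.newKeySet();

        void addAll(Set<String> ids) {
            members.addAll(ids);
        }

        long cardinality() {
            return members.size();
        }
    }

    /** Buffers IDs in task-local memory and flushes them to the shared set on each tick. */
    static final class LocalAggregator {
        private final Set<String> buffer = new HashSet<>();
        private final SharedSet shared;

        LocalAggregator(SharedSet shared) {
            this.shared = shared;
        }

        void onEvent(String userId) {
            buffer.add(userId);
        }

        void onTick() {
            shared.addAll(buffer);
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        SharedSet dau = new SharedSet();
        LocalAggregator taskA = new LocalAggregator(dau);
        LocalAggregator taskB = new LocalAggregator(dau);

        taskA.onEvent("u1");
        taskA.onEvent("u2");
        taskB.onEvent("u2"); // the same user seen by another task
        taskB.onEvent("u3");
        taskA.onTick();
        taskB.onTick();

        // A replacement task after a rebalance starts empty, but the shared set keeps the state.
        LocalAggregator restartedTaskA = new LocalAggregator(dau);
        restartedTaskA.onEvent("u1");
        restartedTaskA.onTick();

        System.out.println("unique users: " + dau.cardinality());
    }
}
```

IDs that are still sitting in a local buffer when a task dies are lost until the user's next event arrives, so the tick interval bounds how much a restart can undercount.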

Page 40:

In closing

• Amazon Kinesis Connector Library makes basic consumer applications simple

• Amazon Kinesis Storm Spout enables real-time processing

• Optimize Hadoop file size with CombineFileInputFormat

• Geo lookup service supplies the client IP data that the Amazon Kinesis API does not provide

• Scale with Amazon Kinesis Scaling Utils and Storm Spout 1.1.1

Page 41:

Thank you!

GAM406

Page 42:

Remember to complete your evaluations!