Tomer Elmalem - GraphQL APIs: REST in Peace - Codemotion Milan 2017

Post on 21-Jan-2018

GraphQL at Yelp: REST in Peace

Tomer Elmalem
tomer@yelp.com

@zehtomer

Yelp’s Mission: Connecting people with great local businesses.

In the beginning

Enter GraphQL

{ business(id: "yelp") { name alias rating } }

{ "data": { "business": { "name": "Yelp", "alias": "yelp-sf", "rating": 4.9 } } }

{ business(id: "yelp") { name alias rating reviews { text } } }

{ "data": { "business": { "name": "Yelp", "alias": "yelp-sf", "rating": 4.9, "reviews": [{ "text": "some review" }] } } }

{ business(id: "yelp") { name rating reviews { text } hours { ... } }}

{ "data": { "business": { "name": "Yelp", "rating": 4.9, "reviews": [{ "text": "some review" }], "hours": [{ ... }] } } }

{ b1: business(id: "yelp") { name rating reviews { text } } b2: business(id: "sforza") { name rating reviews { text } }}

{ "data": { "b1": { "name": "Yelp", "rating": 4.9, "reviews": [{ "text": "some review" }] }, "b2": { "name": "Sforza Castle", "rating": 5.0, "reviews": [{ "text": "awesome art" }] } } }
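The aliased query above can also be generated from a list of IDs. A minimal sketch, assuming a hypothetical helper named build_aliased_query (not from the talk); it does no escaping, so a real client should pass IDs as GraphQL variables instead of formatting them into the query text:

```python
def build_aliased_query(business_ids, fields=("name", "rating")):
    """Build one GraphQL query that fetches several businesses via aliases."""
    selection = " ".join(fields)
    parts = [
        # b1, b2, ... are aliases, so the same field can appear more than once
        f'b{index}: business(id: "{business_id}") {{ {selection} }}'
        for index, business_id in enumerate(business_ids, start=1)
    ]
    return "{ " + " ".join(parts) + " }"


query = build_aliased_query(["yelp", "sforza"])
print(query)
```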

{ business(id: "yelp") { reviews { users { reviews { business { categories } } } } }}

{ "data": { "business": { "reviews": [{ "users": [{ "reviews": [ { "business": { "categories": ["food"] } }, { "business": { "categories": ["media"] } } ] }] }] } } }

Let’s start with some vocab

Query: The representation of the data you want returned

query { business(id: "yelp") { name rating reviews { text } hours { ... } }}

Schema: The representation of your data structure

class Business(ObjectType):

    name = graphene.String()
    alias = graphene.String()
    reviews = graphene.List(Review)

    def resolve_name(root, ...):
        return "Yelp"

    def resolve_alias(root, ...):
        return "yelp-sf"

    def resolve_reviews(root, ...):
        return [
            Review(...) for review in reviews
        ]

Fields: The attributes available on your schema

{ "business": { "name": "Yelp", "alias": "yelp-sf", "rating": 4.9 } }

{ business(id: "yelp") { name alias rating }}

Resolvers: Functions that retrieve data for a specific field in a schema
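To make the field/resolver relationship concrete, here is a toy executor that is deliberately not Graphene: it looks up a resolve_<field> method for each requested field and calls it. The execute function and its calling convention are assumptions for illustration only.

```python
class Business:
    """Toy schema type: each field has a matching resolve_<field> method."""

    def resolve_name(self):
        return "Yelp"

    def resolve_alias(self):
        return "yelp-sf"

    def resolve_rating(self):
        return 4.9


def execute(obj, requested_fields):
    """Call the resolver for each requested field and collect the results."""
    return {
        field: getattr(obj, f"resolve_{field}")()
        for field in requested_fields
    }


result = execute(Business(), ["name", "alias"])
print(result)
```

Only the resolvers for the requested fields run, which is why a query that omits rating never pays for loading it.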

So how does this all work?

Our Setup
• Dedicated public API service
• Python 3.6
• Graphene (Python GraphQL library)
• Pyramid + uWSGI
• Complex microservices architecture
• No attached database

{ business(id: "yelp") { name reviews { text } }}

POST /v3/graphql

The View

@view_config(
    route_name='api.graphql',
    renderer='json',
    decorator=verify_api_access,
)
def graphql(request):
    schema = graphene.Schema(
        query=Query,
    )

    locale = request.headers.get(
        'Accept-Language'
    )

    context = {
        'request': request,
        'client': request.client,
        'dataloaders': DataLoaders(locale),
    }

    return schema.execute(
        request.body,
        context_value=context
    )

def verify_api_access(wrapped):

    def wrapper(context, request):
        access_token = _validate_authorization_header(request)

        # assumption: the bare `path` on the slide is the request path
        response = _validate_token(
            access_token,
            request.path,
            request.client_addr
        )

        request.client = response

        if response.valid:
            return wrapped(context, request)
        else:
            raise UnauthorizedAccessToken()

    return wrapper

class Query(graphene.ObjectType):

    business = graphene.Field(
        Business,
        alias=graphene.String(),
    )

    search = graphene.Field(
        Businesses,
        term=graphene.String(),
        location=graphene.String(),
        # ...
    )

    # ...

class Query(graphene.ObjectType):

    # ...

    @verify_limited_graphql_access('graphql')
    def resolve_business(root, args, context, info):
        alias = args.get('alias')
        internalapi_client = get_internal_api_client()

        business = internalapi_client.business.get_business(
            business_alias=alias
        ).result()

        return context['dataloaders'].businesses.load(business.id)

The Schema

class Business(graphene.ObjectType):

    name = graphene.String()
    reviews = graphene.List(Review)

    def resolve_name(root, context, ...):
        return root.name

    def resolve_reviews(root, context, ...):
        return [
            Review(...) for review in root.reviews
        ]

Dataloaders…?

The N+1 Problem: The inefficient loading of data by making individual, sequential queries

cats = load_cats()
cat_hats = [
    load_hats_for_cat(cat) for cat in cats
]

# SELECT * FROM cat WHERE ...
# SELECT * FROM hat WHERE catID = 1
# SELECT * FROM hat WHERE catID = 2
# SELECT * FROM hat WHERE catID = ...
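The cost difference is easy to see by counting queries. A runnable sketch, assuming an in-memory HATS table and a hypothetical bulk function load_hats_for_cats that stands in for a single IN (...) query:

```python
query_log = []

# Stand-in data: cat id -> hats owned by that cat.
HATS = {1: ["top hat"], 2: ["beanie", "fez"], 3: []}


def load_cats():
    query_log.append("SELECT * FROM cat")
    return [1, 2, 3]


def load_hats_for_cat(cat_id):
    query_log.append(f"SELECT * FROM hat WHERE catID = {cat_id}")
    return HATS[cat_id]


def load_hats_for_cats(cat_ids):
    ids = ", ".join(str(cat_id) for cat_id in cat_ids)
    query_log.append(f"SELECT * FROM hat WHERE catID IN ({ids})")
    return {cat_id: HATS[cat_id] for cat_id in cat_ids}


# N+1: one query for the cats, then one more per cat.
cats = load_cats()
cat_hats = [load_hats_for_cat(cat) for cat in cats]
n_plus_one_queries = len(query_log)

# Batched: one query for the cats, one for all of their hats.
query_log.clear()
cats = load_cats()
hats_by_cat = load_hats_for_cats(cats)
batched_queries = len(query_log)

print(n_plus_one_queries, batched_queries)
```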

query { b1: business(id: "yelp") { name } b2: business(id: "moma") { name } b3: business(id: "sushi") { name } b4: business(id: "poke") { name } b5: business(id: "taco") { name } b6: business(id: "pizza") { name }}

GET /internalapi/yelp
GET /internalapi/moma
GET /internalapi/sushi
GET /internalapi/poke
GET /internalapi/taco
GET /internalapi/pizza

Dataloaders!
• An abstraction layer to load data in your resolvers
• Handle batching ids and deferring execution until all of your data has been aggregated

query { b1: business(id: "yelp") { name } b2: business(id: "moma") { name } b3: business(id: "sushi") { name } b4: business(id: "poke") { name } b5: business(id: "taco") { name } b6: business(id: "pizza") { name }}

GET /internalapi/yelp,moma,sushi,poke,
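The batching idea can be sketched without any library. This is a simplified, synchronous stand-in, not the promise-based DataLoader used in the talk: BatchLoader, fetch_businesses, and the comma-separated URL shape are all assumptions for illustration. load() only queues a key, and the first time a value is actually needed, every queued key is fetched in one call.

```python
class BatchLoader:
    """Collects keys from load() calls and fetches them in one batch."""

    def __init__(self, batch_load_fn):
        self._batch_load_fn = batch_load_fn
        self._pending = []   # keys requested since the last dispatch
        self._results = {}   # key -> loaded value

    def load(self, key):
        """Queue a key and return a zero-argument function yielding its value."""
        if key not in self._results and key not in self._pending:
            self._pending.append(key)
        return lambda: self._get(key)

    def _get(self, key):
        if self._pending:
            self._dispatch()
        return self._results[key]

    def _dispatch(self):
        keys, self._pending = self._pending, []
        values = self._batch_load_fn(keys)  # must match the order of keys
        self._results.update(zip(keys, values))


requests_made = []


def fetch_businesses(aliases):
    # Hypothetical bulk endpoint: one request for every queued alias.
    requests_made.append("GET /internalapi/" + ",".join(aliases))
    return [{"name": alias.title()} for alias in aliases]


loader = BatchLoader(fetch_businesses)
deferred = [loader.load(alias) for alias in ["yelp", "moma", "sushi"]]
names = [get()["name"] for get in deferred]
print(requests_made)
```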

class DataLoaders:

    def __init__(self, locale):
        self.businesses = BusinessDataLoader(locale)
        self.coordinates = CoordinatesDataLoader()
        self.hours = HoursDataLoader()
        self.photos = PhotosDataLoader()
        self.events = EventDataLoader()
        self.reviews = ReviewsDataLoader(locale)
        self.venues = VenueDataLoader()

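Dataloader

# imports assumed to come from the `promise` package that Graphene builds on
from promise import Promise
from promise.dataloader import DataLoader

class BusinessDataLoader(DataLoader):

    def __init__(self, locale, **kwargs):
        super().__init__(**kwargs)
        self._locale = locale

    def batch_load_fn(self, biz_ids):
        # one bulk call for every id queued during this execution
        businesses = get_businesses_info(
            biz_ids,
            self._locale
        ).result()

        biz_id_map = self._biz_map(
            businesses
        )

        # results must line up with biz_ids, in the same order
        return Promise.resolve([
            biz_id_map.get(biz_id) for biz_id in biz_ids
        ])

    def _biz_map(self, businesses):
        return {
            biz.id: biz for biz in businesses
        }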

Considerations
• Caching
• Performance
• Complexity
• Rate limiting
• Security
• Error handling

Caching
• Edge caching is hard
• Greater diversity of requests
• Many caching strategies don't fit

query { business(id: "yelp") { name }}

query { business(id: "yelp") { name rating }}

query { search(term: "burrito", latitude: 30.000, longitude: 30.000) { ... }}

query { search(term: "burrito", latitude: 30.001, longitude: 30.001) { ... }}

query { search(term: "Burrito", latitude: 30.000, longitude: 30.000) { ... }}
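A short sketch of why these near-identical requests defeat a response cache. It assumes a naive scheme (not described in the talk) that keys cached responses on a hash of the exact query text:

```python
import hashlib


def naive_cache_key(query):
    """Key a cached response on the exact query text."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


queries = [
    'query { search(term: "burrito", latitude: 30.000, longitude: 30.000) { ... } }',
    'query { search(term: "burrito", latitude: 30.001, longitude: 30.001) { ... } }',
    'query { search(term: "Burrito", latitude: 30.000, longitude: 30.000) { ... } }',
]

# A tiny coordinate shift or a capital letter yields a brand new key.
distinct_keys = {naive_cache_key(query) for query in queries}
print(len(distinct_keys))
```

Each of the three searches misses the cache even though users would expect nearly the same results.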

Service Caching
• Network caching proxy
• Generic caching service
• Can wrap any service, applies to everyone

What about bulk data?

Caching in Bulk
• ID-based caching: set up a key: value cache map
• Parse and cache individual models, don't cache the entire response as-is
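A minimal sketch of ID-based caching for bulk requests. The names get_users and fetch_users_from_backend are hypothetical, the cache is a plain dict, and TTL handling is left out. Each model is cached under its own ID, so a later bulk request only fetches the IDs that are missing.

```python
cache = {}          # id -> cached model
backend_calls = []  # ids sent to the (hypothetical) backend on each call


def fetch_users_from_backend(ids):
    backend_calls.append(list(ids))
    return {user_id: {"id": user_id, "name": f"user-{user_id}"} for user_id in ids}


def get_users(ids):
    """Serve cached models and fetch only the missing ids in bulk."""
    missing = [user_id for user_id in ids if user_id not in cache]
    if missing:
        cache.update(fetch_users_from_backend(missing))
    return [cache[user_id] for user_id in ids]


get_users([1, 2, 3])
second = get_users([2, 3, 4])
print(backend_calls)
```

Caching the whole first response as-is would have made the second request a complete miss; caching per model turns it into a fetch for one ID.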

cached_endpoints:
    user.v2: {
        ttl: 3600,
        pattern: "(^/user/v2(?:\\?|\\?.*&)ids=)((?:\\d|%2C)+)(&.*$|$)",
        bulk_support: true,
        id_identifier: 'id'
    }
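The pattern has three capture groups: everything up to ids=, the URL-encoded ID list, and the rest of the query string. A sketch of how a caching proxy might use it to pull out individual IDs; split_bulk_request and the sample path are assumptions, not the proxy's real code:

```python
import re
from urllib.parse import unquote

# Same pattern as the config entry, written as a Python raw string.
PATTERN = re.compile(r"(^/user/v2(?:\?|\?.*&)ids=)((?:\d|%2C)+)(&.*$|$)")


def split_bulk_request(path):
    """Return (prefix, ids, suffix) for a bulk URL, or None if it doesn't match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    prefix, raw_ids, suffix = match.groups()
    # %2C is a URL-encoded comma, so decode before splitting.
    return prefix, unquote(raw_ids).split(","), suffix


parts = split_bulk_request("/user/v2?ids=1%2C2%2C3&locale=en_US")
print(parts)
```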

Request Budgets

X-Ctx-Request-Budget 1000

sleep(0.470)

X-Ctx-Request-Budget 530

X-Ctx-Request-Budget 1000

sleep(1.470)

X-Ctx-Request-Budget -530

Complexity

{ business(id: "yelp") { reviews { users { reviews { business { categories } } } } }}

{ "data": { "business": { "reviews": [{ "users": [{ "reviews": [ { "business": { "categories": ["food"] } }, { "business": { "categories": ["media"] } } ] }] }] } } }
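One common way to put a number on this kind of query is its nesting depth. The talk does not say which measure Yelp uses, so the following is only a sketch: it counts brace nesting in the query text and does not handle escaped quotes inside string arguments.

```python
def query_depth(query):
    """Return the deepest level of nested selection sets in a query string."""
    depth = deepest = 0
    in_string = False
    for char in query:
        if char == '"':
            in_string = not in_string  # ignore braces inside string arguments
        elif in_string:
            continue
        elif char == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif char == "}":
            depth -= 1
    return deepest


nested = (
    '{ business(id: "yelp") { reviews { users { reviews '
    "{ business { categories } } } } } }"
)
print(query_depth(nested))
```

A server could reject queries whose depth exceeds some limit before executing any resolver.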

Rate Limiting

Normally

GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search
GET https://api.yelp.com/v3/search

GraphQL

POST https://api.yelp.com/v3/graphql

query { business(id: "yelp-san-francisco") { name } }

query { b1: business(id: "yelp-san-francisco") { name } b2: business(id: "garaje-san-francisco") { name } b3: business(id: "moma-san-francisco") { name } }

query { search(term: "burrito", location: "sf") { business { name reviews { rating text } } } }

Node-based
• Count individual nodes returned by the request sent to the API

POST https://api.yelp.com/v3/graphql

query { search(term: "burrito", location: "sf") { business { name reviews { rating text } } } }

Field-based
• Count each individual field returned by the request sent to the API

POST https://api.yelp.com/v3/graphql

query { search(term: "burrito", location: "sf") { business { name id } } }

{ "data": { "search": { "business": [ { "name": "El Farolito", "id": "el-farolito-san-francisco-2" }, { "name": "La Taqueria", "id": "la-taqueria-san-francisco-2" }, { "name": "Taqueria Guadalajara", "id": "taqueria-guadalajara-san-francisco" }, { "name": "Taqueria Cancún", "id": "taqueria-cancún-san-francisco-5" }, { "name": "Little Taqueria", "id": "little-taqueria-san-francisco" }, { "name": "Pancho Villa Taqueria", "id": "pancho-villa-taqueria-san-francisco" }, { "name": "Tacorea", "id": "tacorea-san-francisco" }, { "name": "El Burrito Express - San Francisco", "id": "el-burrito-express-san-francisco-san-francisco" }, { "name": "El Burrito Express", "id": "el-burrito-express-san-francisco" }, ... ] } } }
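Both counting schemes can be sketched over a response like the one above. The definitions here are assumptions, since the slides do not spell them out: a "node" is taken to be each returned object, and a "field" each scalar value.

```python
def count_nodes(value):
    """Count objects (dicts) in a response, including nested ones."""
    if isinstance(value, dict):
        return 1 + sum(count_nodes(child) for child in value.values())
    if isinstance(value, list):
        return sum(count_nodes(item) for item in value)
    return 0


def count_fields(value):
    """Count scalar leaf values in a response."""
    if isinstance(value, dict):
        return sum(count_fields(child) for child in value.values())
    if isinstance(value, list):
        return sum(count_fields(item) for item in value)
    return 1


response = {
    "data": {
        "search": {
            "business": [
                {"name": "El Farolito", "id": "el-farolito-san-francisco-2"},
                {"name": "La Taqueria", "id": "la-taqueria-san-francisco-2"},
            ]
        }
    }
}

nodes = count_nodes(response["data"]["search"])    # search + 2 businesses
fields = count_fields(response["data"]["search"])  # 2 names + 2 ids
print(nodes, fields)
```

Under field-based counting, asking for one more field per business raises the cost of the request; under node-based counting it does not.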

Securing the API
• Bulk endpoints to minimize the number of queries
• Network-level caching
• Daily rate limiting
• Limiting the maximum query size
• Per-resolver level authentication
• Persisted queries

class MaxQuerySizeMiddleware:
    MAX_SIZE = 2000

    def __init__(self):
        self.resolvers_executed = 0

    def resolve(self, next, root, info, **args):
        # did we hit the max for this query? nope
        if self.resolvers_executed < self.MAX_SIZE:
            self.resolvers_executed += 1
            return next(root, info, **args)
        # we hit the max for this query
        return None
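The middleware's behavior can be checked without a GraphQL server by calling resolve directly. This sketch uses a configurable limit and a stand-in resolver (resolve_name), both of which are assumptions for illustration; because the counter lives on the instance, a fresh instance would be needed for each request.

```python
class MaxQuerySizeMiddleware:
    """Stop resolving fields once a per-query limit is reached."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.resolvers_executed = 0

    def resolve(self, next, root, info, **args):
        if self.resolvers_executed < self.max_size:
            self.resolvers_executed += 1
            return next(root, info, **args)
        return None  # limit reached: the field resolves to null


def resolve_name(root, info, **args):
    # Stand-in for the next resolver in the chain.
    return root["name"]


middleware = MaxQuerySizeMiddleware(max_size=2)
results = [
    middleware.resolve(resolve_name, {"name": "Yelp"}, None)
    for _ in range(3)
]
print(results)
```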

Easy* failure handling and retries
• GraphQL requests can partially succeed!

{ business(id: "yelp") { name rating reviews { text } hours { ... } }}

{ "data": { "business": { "name": "Yelp", "rating": 4.9, "reviews": [{ "text": "some review" }], "hours": null } }, "errors": [ { "description": "could not load hours", "error_code": "HOURS_FAILED" } ] }

HTTP 200

• But… that makes some other failure cases trickier

{ business(id: "123-fake-street") { name rating reviews { text } hours { ... } }}

{ "data": null, "errors": [ { "description": "business not found", "error_code": "BUSINESS_NOT_FOUND" } ] }

HTTP 200
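Since both outcomes arrive as HTTP 200, a client has to inspect the body. A minimal sketch of that check; classify_response and its three labels are assumptions, not part of the Yelp API:

```python
def classify_response(body):
    """Classify a GraphQL response body; the HTTP status alone is not enough."""
    has_errors = bool(body.get("errors"))
    has_data = body.get("data") is not None
    if not has_errors:
        return "success"
    return "partial" if has_data else "failed"


partial = {
    "data": {"business": {"name": "Yelp", "hours": None}},
    "errors": [{"description": "could not load hours", "error_code": "HOURS_FAILED"}],
}
failed = {
    "data": None,
    "errors": [{"description": "business not found", "error_code": "BUSINESS_NOT_FOUND"}],
}

print(classify_response(partial), classify_response(failed))
```

A "partial" result can still be rendered while only the failed field is retried; a "failed" result has nothing usable in it.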

def talk(): return end

{ questions? { answers } }
