Multi-criteria Queries on a Cassandra Application Jérôme Mainaud

Uploaded by ippon on 26-Jan-2017


Multi-criteria Queries on a Cassandra Application

Jérôme Mainaud

Ippon Technologies © 2015#CassandraSummit

Who am I

Jérôme Mainaud

➔ @jxerome

➔ Software Architect at Ippon Technologies, Paris

➔ DataStax Solution Architect Certified


Ippon Technologies

● 200 software engineers in France and the US

➔ Paris, Nantes, Bordeaux

➔ Richmond (Virginia), Washington (DC)

● Expertise

➔ Digital, Big Data and Cloud

➔ Java & Agile

● Open-source projects:

➔ JHipster

➔ Tatami …

● @ipponusa

Agenda

1. Context

2. Technical Stack

3. Data Modeling

4. Implementation

5. Results

Warning

The following slideshow features data patterns and code performed by professionals. Accordingly, Ippon and conference organisers must insist that no one attempt to recreate any data pattern and code performed in this slideshow.


Once upon a time, an app …

Invoice application delivered as SaaS

➔ A single database for all users

➔ Data isolation for each user

High data volume

➔ 1 year of data

➔ 500 million invoices

➔ 2 billion invoice lines


Back-end evolution

Technical Stack


JHipster

➔ Spring Boot + AngularJS Application Generator

➔ Supports JPA and MongoDB

➔ and now Cassandra!

Let us generate the first version very quickly

➔ Application skeleton ready in 5 minutes

➔ Adds entity tables, objects and mappings

➔ Configuration, build, log management, etc.

➔ Gatling tests ready to use

http://jhipster.github.io


Technical Stack

Spring Boot

➔ Built on Spring

➔ Convention over configuration

➔ Many “starters” ready to use

Web Services

➔ CXF instead of Spring MVC REST

Cassandra

➔ DataStax Enterprise

Java 8


JHipster — Code generator

● But

➔ Cassandra was not yet supported

➔ No AngularJS, no front end

➔ CXF instead of Spring MVC


● JHipster alpha generator

➔ Secret generator used to validate concepts before writing the Yeoman generator


JHipster — Code generator

Julien Dubois: Code Generator


Cassandra Driver Configuration

Spring Boot Configuration

➔ No integration of the DataStax Java Driver in Spring Boot

➔ We created a Spring Boot auto-configuration for the DataStax Java Driver

➔ Uses the standard YAML file

Proposed for Spring Boot 1.3

➔ GitHub issue #2064 “Add a spring-boot-starter-data-cassandra”

➔ Still open at the time of the talk

Improved by the Community

➔ The JHipster version was improved through pull requests

➔ Authentication, load-balancer configuration

Data Model


Conceptual Model


Physical Model


create table invoice (
    invoice_id timeuuid,
    user_id uuid static,
    firstname text static,
    lastname text static,
    invoice_date timestamp static,
    payment_date timestamp static,
    total_amount decimal static,
    delivery_address text static,
    delivery_city text static,
    delivery_zipcode text static,
    item_id timeuuid,
    item_label text,
    item_price decimal,
    item_qty int,
    item_total decimal,
    primary key (invoice_id, item_id)
);


Multi-criteria Search


Mandatory Criteria

➔ User (implicit)

➔ Invoice date (range of dates)

Additional Criteria

➔ Client lastname

➔ Client firstname

➔ City

➔ Zipcode

Paginated results
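The search code later in the deck calls a Criteria object (getUserId, getFirstDay, getLastDay, getLimit, getLastName, hasIndexedCriteria, ...) without showing it. A minimal sketch of such a holder, assuming the mandatory criteria are the user id plus an inclusive day range and every additional criterion is an optional String where null means "not filled". The fields, the constructor and the withXxx setters are hypothetical; the page-offset getters are omitted.

```java
import java.time.LocalDate;
import java.util.UUID;

// Hypothetical sketch: getter names mirror the calls made on Criteria in
// the code slides; fields, constructor and withXxx setters are assumptions.
final class Criteria {
    private final UUID userId;        // mandatory, implicit (the current user)
    private final LocalDate firstDay; // mandatory date range, both ends inclusive
    private final LocalDate lastDay;
    private final int limit;          // page size
    private String lastName;          // additional criteria: null = not filled
    private String firstName;
    private String city;
    private String zipCode;

    Criteria(UUID userId, LocalDate firstDay, LocalDate lastDay, int limit) {
        if (firstDay.isAfter(lastDay)) {
            throw new IllegalArgumentException("firstDay must not be after lastDay");
        }
        this.userId = userId;
        this.firstDay = firstDay;
        this.lastDay = lastDay;
        this.limit = limit;
    }

    Criteria withLastName(String value) { this.lastName = value; return this; }
    Criteria withFirstName(String value) { this.firstName = value; return this; }
    Criteria withCity(String value) { this.city = value; return this; }
    Criteria withZipCode(String value) { this.zipCode = value; return this; }

    UUID getUserId() { return userId; }
    LocalDate getFirstDay() { return firstDay; }
    LocalDate getLastDay() { return lastDay; }
    int getLimit() { return limit; }
    String getLastName() { return lastName; }
    String getFirstName() { return firstName; }
    String getCity() { return city; }
    String getZipCode() { return zipCode; }

    // True when at least one additional (indexed) criterion is filled
    boolean hasIndexedCriteria() {
        return lastName != null || firstName != null || city != null || zipCode != null;
    }
}
```

In this sketch, hasIndexedCriteria() is what would route a search either to the by-day index alone or to the parallel multi-index path.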


Shall we use Solr?


● Integrated in DataStax Enterprise

● Atomic and automatic index updates

● Full-Text Search


Shall we use Solr?

● We search on static columns

➔ Solr doesn’t support them

● We search for partitions

➔ Solr searches rows


Shall we use secondary indexes?

● Only one index is used per query

● Hard to get good performance with them


Index Table

Use index tables

➔ Partition key: mandatory criteria and one additional criterion

○ user_id

○ invoice day (truncated invoice date)

○ additional criterion

➔ Clustering column: invoice UUID
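To make the layout concrete, here is a sketch of the rows a single invoice would contribute to the index tables: one row per filled additional criterion, keyed by (user_id, invoice_day, value) with the invoice id as clustering column. The invoice_by_lastname and invoice_by_firstname names come from the repositories shown later; invoice_by_city, invoice_by_zipcode, the IndexEntry type and the indexEntries helper are assumptions for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch of the index rows written for one invoice.
final class IndexEntries {

    // One row: partition key (userId, invoiceDay, value), clustering column invoiceId
    static final class IndexEntry {
        final String table;
        final UUID userId;
        final LocalDate invoiceDay;
        final String value;
        final UUID invoiceId;

        IndexEntry(String table, UUID userId, LocalDate invoiceDay, String value, UUID invoiceId) {
            this.table = table;
            this.userId = userId;
            this.invoiceDay = invoiceDay;
            this.value = value;
            this.invoiceId = invoiceId;
        }
    }

    static List<IndexEntry> indexEntries(UUID userId, LocalDate invoiceDay, UUID invoiceId,
                                         String lastName, String firstName,
                                         String city, String zipCode) {
        List<IndexEntry> entries = new ArrayList<>();
        add(entries, "invoice_by_lastname", userId, invoiceDay, lastName, invoiceId);
        add(entries, "invoice_by_firstname", userId, invoiceDay, firstName, invoiceId);
        add(entries, "invoice_by_city", userId, invoiceDay, city, invoiceId);       // assumed name
        add(entries, "invoice_by_zipcode", userId, invoiceDay, zipCode, invoiceId); // assumed name
        return entries;
    }

    // An empty criterion produces no row (same rule as IndexRepository.insert)
    private static void add(List<IndexEntry> entries, String table, UUID userId,
                            LocalDate invoiceDay, String value, UUID invoiceId) {
        if (value != null) {
            entries.add(new IndexEntry(table, userId, invoiceDay, value, invoiceId));
        }
    }
}
```

Because user_id and invoice_day are part of every partition key, each index partition holds one user's invoices for one day and one value, which keeps partitions small and each lookup a single-partition read.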


Materialized view

CREATE MATERIALIZED VIEW invoice_by_firstname AS
    SELECT invoice_id
    FROM invoice
    WHERE firstname IS NOT NULL
    PRIMARY KEY ((user_id, invoice_day, firstname), invoice_id)
    WITH CLUSTERING ORDER BY (invoice_id DESC);

New in Cassandra 3.0


Parallel Search on indexes

In-memory merge by the application


Parallel item detail queries

Result Page (id)


Search

Search on date range

➔ Loop over every day in the range, most recent first, and stop when there are enough results for a page
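The date-range loop can be modeled in memory to check the paging rules before touching Cassandra. In this sketch, assumed for illustration only, a Map stands in for the per-day index queries, ids are longs stored newest first (like invoice_id DESC), and the offset returned with a full page is the first id of the next page, applied inclusively (invoice_id <= offset) to the first day visited. DayPager and Page are hypothetical names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the day-by-day pagination.
final class DayPager {

    static final class Page {
        final List<Long> ids;
        final LocalDate nextDay; // null when there is no next page
        final Long nextId;       // first id of the next page, null when none

        Page(List<Long> ids, LocalDate nextDay, Long nextId) {
            this.ids = ids;
            this.nextDay = nextDay;
            this.nextId = nextId;
        }
    }

    // idsByDay: ids per day, newest first (stands in for one index query per day)
    static Page page(Map<LocalDate, List<Long>> idsByDay, LocalDate firstDay, LocalDate lastDay,
                     LocalDate dayOffset, Long idOffset, int limit) {
        List<Long> result = new ArrayList<>(limit);
        LocalDate day = dayOffset != null ? dayOffset : lastDay; // resume, or start with the newest day
        Long offset = idOffset;
        while (!day.isBefore(firstDay)) {
            Iterator<Long> it = idsByDay.getOrDefault(day, Collections.<Long>emptyList()).iterator();
            while (it.hasNext()) {
                long id = it.next();
                if (offset != null && id > offset) {
                    continue; // mirrors "invoice_id <= ?" on the resumed day
                }
                if (result.size() == limit) {
                    return new Page(result, day, id); // page full: this id starts the next page
                }
                result.add(id);
            }
            day = day.minusDays(1);
            offset = null; // the id offset only applies to the first day visited
        }
        return new Page(result, null, null);
    }
}
```

Resuming with the returned (nextDay, nextId) pair continues exactly where the previous page stopped, without repeating or skipping an id.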


Search Complexity

Query count

➔ For each day in the date range

○ 1 query per filled additional criterion (one partition per query)

➔ 1 query per invoice in the result page (one partition per query)

Search Complexity

➔ Every query reads a single partition

Example: 3 criteria, 7 days, 100 items per page

➔ query count ≤ 3 × 7 + 100 = 121
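The bound is easy to encode. This hypothetical helper applies the slide's formula, with one adjustment taken from the search service: a search with no additional criterion still issues one by-day query per day, so the per-day factor is at least 1. It is an upper bound, since the days loop stops as soon as a page is full.

```java
// Hypothetical helper: upper bound on Cassandra queries needed for one result page.
final class SearchCost {
    static int maxQueries(int filledCriteria, int days, int pageSize) {
        // one index query per filled criterion per day (at least the by-day index),
        // plus one detail query per invoice on the page
        int indexQueriesPerDay = Math.max(filledCriteria, 1);
        return indexQueriesPerDay * days + pageSize;
    }
}
```

With the slide's numbers, SearchCost.maxQueries(3, 7, 100) returns 121; with no additional criterion, maxQueries(0, 7, 100) returns 107.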

Java: Indexes


Index — Instances

@Repository
public class InvoiceByLastNameRepository extends IndexRepository<String> {
    public InvoiceByLastNameRepository() {
        super("invoice_by_lastname", "lastname", Invoice::getLastName, Criteria::getLastName);
    }
}

@Repository
public class InvoiceByFirstNameRepository extends IndexRepository<String> {
    public InvoiceByFirstNameRepository() {
        super("invoice_by_firstname", "firstname", Invoice::getFirstName, Criteria::getFirstName);
    }
}


Index — Parent Class

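public class IndexRepository<T> {

    @Inject
    private Session session;

    private final String tableName;
    private final String valueName;
    private final Function<Invoice, T> valueGetter;
    private final Function<Criteria, T> criteriumGetter;

    private PreparedStatement insertStmt;
    private PreparedStatement findStmt;
    private PreparedStatement findWithOffsetStmt;

    // Called by each concrete index repository: table, column and value extractors
    public IndexRepository(String tableName, String valueName,
            Function<Invoice, T> valueGetter, Function<Criteria, T> criteriumGetter) {
        this.tableName = tableName;
        this.valueName = valueName;
        this.valueGetter = valueGetter;
        this.criteriumGetter = criteriumGetter;
    }

    @PostConstruct
    public void init() { /* initialize PreparedStatements */ }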


Index — Insert

@Override
public void insert(Invoice invoice) {
    T value = valueGetter.apply(invoice);
    if (value != null) {
        session.execute(
            insertStmt.bind(
                invoice.getUserId(),
                Dates.toDate(invoice.getInvoiceDay()),
                value,
                invoice.getId()));
    }
}


Index — Insert — Prepare Statement

insertStmt = session.prepare(
    QueryBuilder.insertInto(tableName)
        .value("user_id", bindMarker())
        .value("invoice_day", bindMarker())
        .value(valueName, bindMarker())
        .value("invoice_id", bindMarker())
);


Index — Insert — Date conversion

public static Date toDate(LocalDate date) {
    // Midnight of the given day in the JVM's default time zone
    return date == null ? null :
        Date.from(date.atStartOfDay().atZone(ZoneId.systemDefault()).toInstant());
}
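toDate relies on the JVM's default time zone, so an invoice_day written by one application node and the day searched by another only designate the same partition if every node runs with the same default zone. A sketch that makes the zone explicit, together with the inverse conversion that truncates an invoice timestamp to its invoice day. ZonedDates is a hypothetical name, and which zone to pin is a deployment decision the slides do not cover.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

// Hypothetical helper: Dates.toDate with an explicit zone, plus its inverse.
final class ZonedDates {

    // Midnight of the given day in the given zone (the invoice_day key)
    static Date toDate(LocalDate day, ZoneId zone) {
        return day == null ? null : Date.from(day.atStartOfDay(zone).toInstant());
    }

    // Truncates a timestamp (e.g. invoice_date) to its calendar day in the given zone
    static LocalDate toLocalDate(Date date, ZoneId zone) {
        return date == null ? null : date.toInstant().atZone(zone).toLocalDate();
    }
}
```

For example, 22 September 2015 at midnight in Europe/Paris is 2015-09-21T22:00Z, which a node running in UTC would file under 21 September.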


Index — Search

@Override
public CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID offset) {
    T criterium = criteriumGetter.apply(criteria);
    if (criterium == null) {
        // Criterion not filled: this index takes no part in the search
        return CompletableFuture.completedFuture(null);
    }
    BoundStatement stmt;
    if (offset == null) {
        stmt = findStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium);
    } else {
        // Resuming a page: only invoice ids <= offset (clustering order is DESC)
        stmt = findWithOffsetStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium, offset);
    }
    return Jdk8.completableFuture(session.executeAsync(stmt))
        .thenApply(rs -> Iterators.transform(rs.iterator(), row -> row.getUUID(0)));
}


Index — Search — Prepare Statement

findWithOffsetStmt = session.prepare(
    QueryBuilder.select()
        .column("invoice_id")
        .from(tableName)
        .where(eq("user_id", bindMarker()))
        .and(eq("invoice_day", bindMarker()))
        .and(eq(valueName, bindMarker()))
        .and(lte("invoice_id", bindMarker()))
);


Index — Search (Guava to Java 8)

public static <T> CompletableFuture<T> completableFuture(ListenableFuture<T> guavaFuture) {
    CompletableFuture<T> future = new CompletableFuture<>();
    Futures.addCallback(guavaFuture, new FutureCallback<T>() {
        @Override
        public void onSuccess(T result) {
            future.complete(result);
        }

        @Override
        public void onFailure(Throwable t) {
            future.completeExceptionally(t);
        }
    });
    return future;
}

Java: Search Service


Service — Class

@Service
public class InvoiceSearchService {

    @Inject
    private InvoiceRepository invoiceRepository;

    @Inject
    private InvoiceByDayRepository byDayRepository;

    @Inject
    private InvoiceByLastNameRepository byLastNameRepository;

    @Inject
    private InvoiceByFirstNameRepository byFirstNameRepository;

    @Inject
    private InvoiceByCityRepository byCityRepository;

    @Inject
    private InvoiceByZipCodeRepository byZipCodeRepository;


Service — Search

public ResultPage findByCriteria(Criteria criteria) {
    return byDateInteval(criteria, (crit, day, offset) -> {
        CompletableFuture<Iterator<UUID>> futureUuidIt;
        if (crit.hasIndexedCriteria()) {
            /*
             * ... Doing multi-criteria search; see next slide ...
             */
        } else {
            futureUuidIt = byDayRepository.find(crit.getUserId(), day, offset);
        }
        return futureUuidIt;
    });
}


Service — Search

// Query all index tables in parallel; an index whose criterion is empty yields null
CompletableFuture<Iterator<UUID>>[] futures = Stream.<IndexRepository> of(
        byLastNameRepository, byFirstNameRepository, byCityRepository, byZipCodeRepository)
    .map(repo -> repo.find(crit, day, offset))
    .toArray(CompletableFuture[]::new);

// Once every query has completed, intersect the id streams (all sorted newest first)
futureUuidIt = CompletableFuture.allOf(futures).thenApply(v ->
    Iterators.intersection(TimeUUIDComparator.desc,
        Stream.of(futures)
            .map(CompletableFuture::join)
            .filter(Objects::nonNull)
            .collect(Collectors.toList())));
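To our knowledge Guava's Iterators class has no intersection method, so Iterators.intersection here is presumably a project helper. Because every index returns invoice ids in the same order (newest first), the intersection can be computed lazily in a single pass instead of loading whole partitions into sets. A sketch of such a helper, assuming non-null elements, no duplicates within one source, and every source sorted by the given comparator; SortedIntersection is a hypothetical name.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the Iterators.intersection helper used above.
final class SortedIntersection {

    static <T> Iterator<T> intersection(Comparator<? super T> cmp,
                                        List<? extends Iterator<T>> sources) {
        return new Iterator<T>() {
            private T next = computeNext();

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public T next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                T result = next;
                next = computeNext();
                return result;
            }

            // Next element present in every source, or null once any source runs out
            private T computeNext() {
                int n = sources.size();
                if (n == 0 || !sources.get(0).hasNext()) {
                    return null;
                }
                T candidate = sources.get(0).next();
                int agreeing = 1; // sources whose current head equals candidate
                int i = 1 % n;
                while (agreeing < n) {
                    Iterator<T> it = sources.get(i);
                    T head;
                    do { // skip elements that sort before the candidate
                        if (!it.hasNext()) {
                            return null;
                        }
                        head = it.next();
                    } while (cmp.compare(head, candidate) < 0);
                    if (cmp.compare(head, candidate) == 0) {
                        agreeing++;
                    } else {
                        candidate = head; // head sorts after candidate: new candidate
                        agreeing = 1;
                    }
                    i = (i + 1) % n;
                }
                return candidate;
            }
        };
    }
}
```

Each source is read at most once, front to back, so the cost is linear in the number of ids the index queries return.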


Service — UUIDs Comparator

/**
 * TimeUUID Comparator equivalent to Cassandra’s Comparator:
 * @see org.apache.cassandra.db.marshal.TimeUUIDType#compare()
 */
public enum TimeUUIDComparator implements Comparator<UUID> {
    desc {
        @Override
        public int compare(UUID o1, UUID o2) {
            long delta = o2.timestamp() - o1.timestamp();
            if (delta != 0)
                return Ints.saturatedCast(delta);
            return o2.compareTo(o1);
        }
    };
}
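A dedicated comparator is needed because UUID.compareTo orders by the raw high bits, and in a version-1 UUID those start with the low 32 bits of the timestamp. The sketch below writes the same descending order with Long.compare, which avoids the Guava dependency, plus a hypothetical timeUuid helper that builds version-1 UUIDs from a raw 60-bit timestamp so the ordering can be tested without a clock. TimeUuids and timeUuid are illustrative names, not part of the application.

```java
import java.util.Comparator;
import java.util.UUID;

// Hypothetical sketch: Guava-free descending TimeUUID comparator plus a test helper.
final class TimeUuids {

    // Newest first; ties broken by UUID.compareTo, as in the slide's comparator
    static final Comparator<UUID> DESC = (o1, o2) -> {
        int byTime = Long.compare(o2.timestamp(), o1.timestamp());
        return byTime != 0 ? byTime : o2.compareTo(o1);
    };

    // Builds a version-1 UUID from a 60-bit timestamp (100 ns units since 1582-10-15)
    static UUID timeUuid(long timestamp, long clockSeqAndNode) {
        long msb = (timestamp & 0xFFFFFFFFL) << 32      // time_low
                | ((timestamp >>> 32) & 0xFFFFL) << 16  // time_mid
                | 0x1000L                               // version 1
                | ((timestamp >>> 48) & 0x0FFFL);       // time_hi
        long lsb = 0x8000000000000000L | (clockSeqAndNode & 0x3FFFFFFFFFFFFFFFL); // RFC 4122 variant
        return new UUID(msb, lsb);
    }
}
```

For instance, with timestamps 1 and 2^32, UUID.compareTo ranks the UUID carrying 2^32 below the one carrying 1 even though it is newer, while DESC puts it first.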


Service — Days Loop

@FunctionalInterface
private static interface DayQuery {
    CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID invoiceIdOffset);
}

private ResultPage byDateInteval(Criteria criteria, DayQuery dayQuery) {
    int limit = criteria.getLimit();
    List<Invoice> resultList = new ArrayList<>(limit);
    LocalDate dayOffset = criteria.getDayOffset();
    UUID invoiceIdOffset = criteria.getInvoiceIdOffset();
    /* ... Loop on days; see next slide ... */
    return new ResultPage(resultList);
}


Service — Days Loop

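// Resume from the page offset if there is one, otherwise start with the most recent day
LocalDate day = dayOffset != null ? dayOffset : criteria.getLastDay();
do {
    Iterator<UUID> uuidIt = dayQuery.find(criteria, day, invoiceIdOffset).join();
    limit -= loadInvoices(resultList, uuidIt, limit);
    if (uuidIt.hasNext()) {
        // Page is full and this day has more ids: the next id is the next page's offset
        return new ResultPage(resultList, day, uuidIt.next());
    }
    day = day.minusDays(1);
    invoiceIdOffset = null; // the id offset only applies to the first day
} while (!day.isBefore(criteria.getFirstDay()));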


Service — Invoices Loading

private int loadInvoices(List<Invoice> resultList, Iterator<UUID> uuidIt, int limit) {
    // Fire up to `limit` invoice queries in parallel...
    List<CompletableFuture<Invoice>> futureList = new ArrayList<>(limit);
    for (int i = 0; i < limit && uuidIt.hasNext(); ++i) {
        futureList.add(invoiceRepository.findOne(uuidIt.next()));
    }
    // ...then wait for each result, keeping the order given by the index
    futureList.stream()
        .map(CompletableFuture::join)
        .forEach(resultList::add);
    return futureList.size();
}

Results


Limits

● Exact-match search only

➔ No full-text search

➔ No “starts with” search

➔ No pattern-based search

● Requires highly discriminating mandatory criteria

➔ user_id & invoice_day

● Pagination doesn’t give the total item count

➔ Could be done at the cost of additional queries

● No sorting available (results come newest first)


Hardware

● Hosted by Ippon Hosting

● 8 nodes

➔ 16 GB RAM

➔ Two 256 GB SSD drives in RAID 0

● 6 nodes dedicated to Cassandra cluster

● 2 nodes dedicated to the application


Application

● 5,000 concurrent users

● 9 months of data loaded

➔ Legacy system: stores 1 year, searches only the last 3 months

➔ Target: 3 years of history

● Real-time search results

➔ Data is available immediately

➔ Legacy system: data available the next day

● Cost Killer

Q & A

Paris, Bordeaux, Nantes, Washington, New York, Richmond

www.ippon-hosting.com - www.ippon-digital.fr

@ippontech

01 46 12 48 48