Multi-criteria queries on a Cassandra application
Ippon Technologies © 2015 #CassandraSummit
Who am I
Jérôme Mainaud
➔ @jxerome
➔ Software Architect at Ippon Technologies, Paris
➔ DataStax Solution Architect Certified
Ippon Technologies
● 200 software engineers in France and the US
➔ Paris, Nantes, Bordeaux
➔ Richmond (Virginia), Washington (DC)
● Expertise
➔ Digital, Big Data and Cloud
➔ Java & Agile
● Open-source projects:
➔ JHipster,
➔ Tatami …
● @ipponusa
Warning
The following slideshow features data patterns and
code performed by professionals.
Accordingly, Ippon and conference organisers must
insist that no one attempt to recreate any data pattern
and code performed in this slideshow.
Once Upon a time an app …
Invoicing application offered as SaaS
➔ A single database for all users
➔ Data isolation for each user
High volume data
➔ 1 year
➔ 500 million invoices
➔ 2 billion invoice lines
Technical Stack
JHipster
➔ Spring Boot + AngularJS Application Generator
➔ Supports JPA, MongoDB
➔ and now Cassandra!
Let us generate the first version very quickly
➔ Application skeleton ready in 5 minutes
➔ Add entity tables, objects and mappings
➔ Configuration, build, logs management, etc.
➔ Gatling Tests ready to use
http://jhipster.github.io
Technical Stack
Spring Boot
➔ Built on Spring
➔ Convention over configuration
➔ Many “starters” ready to use
Web Services
➔ CXF instead of Spring MVC REST
Cassandra
➔ DataStax Enterprise
Java 8
JHipster — Code generator
● But
➔ Cassandra was not yet supported
➔ No AngularJS or any frontend
➔ CXF instead of Spring MVC
● JHipster alpha generator
➔ Secret generator used to
validate concepts before writing
the Yeoman generator
Cassandra Driver Configuration
Spring Boot Configuration
➔ No integration of the DataStax Java Driver in Spring Boot
➔ Created Spring Boot autoconfiguration of DataStax Java Driver
➔ Use the standard YAML File
Offered to Spring Boot 1.3
➔ GitHub ticket #2064 “Add a spring-boot-starter-data-cassandra”
➔ Still open
Improved by the Community
➔ JHipster version was improved by pull requests
➔ Authentication, Load-Balancer config
Table
create table invoice (
    invoice_id timeuuid,
    user_id uuid static,
    firstname text static,
    lastname text static,
    invoice_date timestamp static,
    payment_date timestamp static,
    total_amount decimal static,
    delivery_address text static,
    delivery_city text static,
    delivery_zipcode text static,
    item_id timeuuid,
    item_label text,
    item_price decimal,
    item_qty int,
    item_total decimal,
    primary key (invoice_id, item_id)
);
Multi-criteria Search
Mandatory Criteria
➔ User (implicit)
➔ Invoice date (range of dates)
Additional Criteria
➔ Client lastname
➔ Client firstname
➔ City
➔ Zipcode
Paginated Result
Shall we use Solr?
● Integrated in DataStax Enterprise
● Atomic and Automatic Index update
● Full-Text Search
Shall we use Solr?
● We search on static columns
➔ Solr doesn’t support them
● We search partitions
➔ Solr searches rows
Shall we use secondary indexes?
● Only one index used for a query
● Hard to get good performance with them
Index Table
Use index tables
➔ Partition key: mandatory criteria and one additional criterion
○ user_id
○ invoice day (truncated invoice date)
○ additional criterion
➔ Clustering column: invoice UUID
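The layout above could translate into a schema along these lines. This is a minimal sketch, not the DDL used in the talk: the column types are assumed from the invoice table and from the insert code shown on later slides (invoice_day is bound as a java.util.Date, hence timestamp), and IndexTableDdl is a hypothetical helper name.

```java
/** Hypothetical helper that builds the CQL for one index table. */
final class IndexTableDdl {

    private IndexTableDdl() { }

    /**
     * @param tableName index table name, e.g. "invoice_by_lastname"
     * @param valueName column holding the additional criterion, e.g. "lastname"
     * @param valueType CQL type of that column, e.g. "text" (assumed)
     */
    static String createTable(String tableName, String valueName, String valueType) {
        // Partition key: the two mandatory criteria plus one additional criterion.
        // Clustering column: the invoice id, newest first.
        return String.format(
            "CREATE TABLE %1$s ("
                + " user_id uuid,"
                + " invoice_day timestamp,"
                + " %2$s %3$s,"
                + " invoice_id timeuuid,"
                + " PRIMARY KEY ((user_id, invoice_day, %2$s), invoice_id)"
                + ") WITH CLUSTERING ORDER BY (invoice_id DESC)",
            tableName, valueName, valueType);
    }

    public static void main(String[] args) {
        System.out.println(createTable("invoice_by_lastname", "lastname", "text"));
    }
}
```

Because the criterion value is part of the partition key, a lookup must supply it in full, which is why this design only supports exact-match search.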
Materialized view
CREATE MATERIALIZED VIEW invoice_by_firstname
AS
SELECT invoice_id
FROM invoice
WHERE firstname IS NOT NULL
PRIMARY KEY ((user_id, invoice_day, firstname), invoice_id)
WITH CLUSTERING ORDER BY (invoice_id DESC)
New in Cassandra 3.0
Search
Search on date range
➔ loop over every day in the range and stop
when there are enough results for a page
Search Complexity
Query count
➔ For each day in the date range
○ 1 query per additional criterion filled (one partition per query)
➔ 1 query per item in the result page (one partition per query)
Search Complexity
➔ one partition per query
Example: 3 criteria, 7 days, 100 items per page
➔ query count ≤ 3 × 7 + 100 = 121
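The bound above can be written as a small function. A minimal sketch; QueryCountBound is a hypothetical name, and the use of at least one index query per day reflects the service code on later slides, which falls back to the by-day index when no additional criterion is filled.

```java
/** Hypothetical helper reproducing the query-count bound from the slide. */
final class QueryCountBound {

    private QueryCountBound() { }

    /**
     * Upper bound on the queries issued for one result page:
     * one index query per filled criterion per day, plus one invoice read per item.
     */
    static long upperBound(int filledCriteria, int daysInRange, int pageSize) {
        // With no additional criterion, the by-day index is queried once per day.
        int indexQueriesPerDay = Math.max(1, filledCriteria);
        return (long) indexQueriesPerDay * daysInRange + pageSize;
    }

    public static void main(String[] args) {
        // The slide's example: 3 criteria, 7 days, 100 items per page.
        System.out.println(upperBound(3, 7, 100)); // prints 121
    }
}
```

The bound is reached only when the page is filled on the last day of the range; the loop stops as soon as the page is full.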
Index — Instances
@Repository
public class InvoiceByLastNameRepository extends IndexRepository<String> {
    public InvoiceByLastNameRepository() {
        super("invoice_by_lastname", "lastname", Invoice::getLastName, Criteria::getLastName);
    }
}
@Repository
public class InvoiceByFirstNameRepository extends IndexRepository<String> {
    public InvoiceByFirstNameRepository() {
        super("invoice_by_firstname", "firstname", Invoice::getFirstName, Criteria::getFirstName);
    }
}
Index — Parent Class
public class IndexRepository<T> {
    @Inject
    private Session session;
    private final String tableName;
    private final String valueName;
    private final Function<Invoice, T> valueGetter;
    private final Function<Criteria, T> criteriumGetter;
    private PreparedStatement insertStmt;
    private PreparedStatement findStmt;
    private PreparedStatement findWithOffsetStmt;
    @PostConstruct
    public void init() { /* initialize PreparedStatements */ }
Index — Insert
@Override
public void insert(Invoice invoice) {
    T value = valueGetter.apply(invoice);
    if (value != null) {
        session.execute(
            insertStmt.bind(
                invoice.getUserId(),
                Dates.toDate(invoice.getInvoiceDay()),
                value,
                invoice.getId()));
    }
}
Index — Insert — Prepare Statement
insertStmt = session.prepare(
    QueryBuilder.insertInto(tableName)
        .value("user_id", bindMarker())
        .value("invoice_day", bindMarker())
        .value(valueName, bindMarker())
        .value("invoice_id", bindMarker())
);
Index — Insert — Date conversion
public static Date toDate(LocalDate date) {
    // Midnight at the start of the day, in the JVM's default time zone
    return date == null ? null :
        Date.from(date.atStartOfDay(ZoneId.systemDefault()).toInstant());
}
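The conversion above depends on the JVM's default time zone, so application nodes configured with different zones would compute different invoice_day partition keys for the same invoice. A minimal sketch of a zone-pinned variant follows; pinning to UTC is an assumption rather than something the talk states, and Days is a hypothetical class name.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

/** Hypothetical zone-pinned variant of the date helpers. */
final class Days {

    private Days() { }

    /** Truncates an invoice timestamp to its calendar day in UTC. */
    static LocalDate toInvoiceDay(Instant invoiceDate) {
        return invoiceDate.atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Converts a day to the Date bound to the invoice_day column (midnight UTC). */
    static Date toDate(LocalDate day) {
        return day == null ? null : Date.from(day.atStartOfDay(ZoneOffset.UTC).toInstant());
    }
}
```

Whatever zone is chosen, the same one has to be used when writing index rows and when looping over days during a search.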
Index — Search
@Override
public CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID offset) {
    T criterium = criteriumGetter.apply(criteria);
    if (criterium == null) {
        // Criterion not filled: the caller filters out null results
        return CompletableFuture.completedFuture(null);
    }
    BoundStatement stmt;
    if (offset == null) {
        stmt = findStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium);
    } else {
        stmt = findWithOffsetStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium, offset);
    }
    return Jdk8.completableFuture(session.executeAsync(stmt))
        .thenApply(rs -> Iterators.transform(rs.iterator(), row -> row.getUUID(0)));
}
Index — Search — Prepare Statement
findWithOffsetStmt = session.prepare(
    QueryBuilder.select()
        .column("invoice_id")
        .from(tableName)
        .where(eq("user_id", bindMarker()))
        .and(eq("invoice_day", bindMarker()))
        .and(eq(valueName, bindMarker()))
        .and(lte("invoice_id", bindMarker()))
);
Index — Search (Guava to Java 8)
public static <T> CompletableFuture<T> completableFuture(ListenableFuture<T> guavaFuture) {
    CompletableFuture<T> future = new CompletableFuture<>();
    Futures.addCallback(guavaFuture, new FutureCallback<T>() {
        @Override
        public void onSuccess(T result) {
            future.complete(result);
        }
        @Override
        public void onFailure(Throwable t) {
            future.completeExceptionally(t);
        }
    });
    return future;
}
Service — Class
@Service
public class InvoiceSearchService {
    @Inject
    private InvoiceRepository invoiceRepository;
    @Inject
    private InvoiceByDayRepository byDayRepository;
    @Inject
    private InvoiceByLastNameRepository byLastNameRepository;
    @Inject
    private InvoiceByFirstNameRepository byFirstNameRepository;
    @Inject
    private InvoiceByCityRepository byCityRepository;
    @Inject
    private InvoiceByZipCodeRepository byZipCodeRepository;
Service — Search
public ResultPage findByCriteria(Criteria criteria) {
    return byDateInteval(criteria, (crit, day, offset) -> {
        CompletableFuture<Iterator<UUID>> futureUuidIt;
        if (crit.hasIndexedCriteria()) {
            /*
             * ... Doing multi-criteria search; see next slide ...
             */
        } else {
            futureUuidIt = byDayRepository.find(crit.getUserId(), day, offset);
        }
        return futureUuidIt;
    });
}
Service — Search
CompletableFuture<Iterator<UUID>>[] futures = Stream.<IndexRepository>of(
        byLastNameRepository, byFirstNameRepository, byCityRepository, byZipCodeRepository)
    .map(repo -> repo.find(crit, day, offset))
    .toArray(CompletableFuture[]::new);
futureUuidIt = CompletableFuture.allOf(futures).thenApply(v ->
    Iterators.intersection(TimeUUIDComparator.desc,
        Stream.of(futures)
            .map(CompletableFuture::join)
            .filter(Objects::nonNull)
            .collect(Collectors.toList())));
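The intersection helper called above is not shown in the talk and appears to be a project utility. Since every index query returns invoice ids in the same clustering order, the common ids can be found with a lazy merge. The following is a minimal sketch under that assumption; SortedIterators is a hypothetical name, and inputs are expected to be sorted by the given comparator, without duplicates or null elements.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the intersection helper used by the service. */
final class SortedIterators {

    private SortedIterators() { }

    /** Lazily yields the elements present in every source; sources must be sorted by {@code order}. */
    static <T> Iterator<T> intersection(Comparator<? super T> order, List<Iterator<T>> sources) {
        return new Iterator<T>() {
            private T pending;
            private boolean ready;
            private boolean done = sources.isEmpty();

            @Override
            public boolean hasNext() {
                if (!ready && !done) {
                    advance();
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return pending;
            }

            /** Finds the next element common to all sources, or marks the end. */
            private void advance() {
                int n = sources.size();
                if (!sources.get(0).hasNext()) {
                    done = true;
                    return;
                }
                T candidate = sources.get(0).next();
                int agreed = 1; // consecutive sources known to contain the candidate
                int i = 0;
                while (agreed < n) {
                    i = (i + 1) % n;
                    Iterator<T> source = sources.get(i);
                    T head;
                    int cmp;
                    do { // skip elements that sort before the candidate
                        if (!source.hasNext()) {
                            done = true;
                            return;
                        }
                        head = source.next();
                        cmp = order.compare(head, candidate);
                    } while (cmp < 0);
                    if (cmp == 0) {
                        agreed++;
                    } else { // this source is already past the candidate: restart from its head
                        candidate = head;
                        agreed = 1;
                    }
                }
                pending = candidate;
                ready = true;
            }
        };
    }
}
```

The merge stops as soon as any source is exhausted, so the cost is bounded by the total number of ids returned by the index queries for that day.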
Service — UUIDs Comparator
/**
 * TimeUUID Comparator equivalent to Cassandra’s Comparator:
 * @see org.apache.cassandra.db.marshal.TimeUUIDType#compare()
 */
public enum TimeUUIDComparator implements Comparator<UUID> {
    desc {
        @Override
        public int compare(UUID o1, UUID o2) {
            long delta = o2.timestamp() - o1.timestamp();
            if (delta != 0)
                return Ints.saturatedCast(delta);
            return o2.compareTo(o1);
        }
    };
}
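The comparator has to read the embedded timestamp because java.util.UUID.compareTo compares the raw most significant bits, where the low 32 bits of the timestamp come first, so natural UUID order is not chronological. A stdlib-only sketch of the difference follows; timeUuid is a hypothetical helper that packs a 60-bit timestamp into a version 1 UUID, and Long.compare stands in for Guava's Ints.saturatedCast.

```java
import java.util.Comparator;
import java.util.UUID;

/** Hypothetical demo of timestamp-based ordering for version 1 UUIDs. */
final class TimeUuidOrdering {

    private TimeUuidOrdering() { }

    /** Newest first: compare embedded timestamps, then fall back to reversed UUID order. */
    static final Comparator<UUID> DESC = (a, b) -> {
        int byTime = Long.compare(b.timestamp(), a.timestamp());
        return byTime != 0 ? byTime : b.compareTo(a);
    };

    /** Hypothetical helper: packs a 60-bit timestamp into a version 1 UUID. */
    static UUID timeUuid(long timestamp, long leastSigBits) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHigh = (timestamp >>> 48) & 0x0FFFL;
        long mostSigBits = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHigh;
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        UUID older = timeUuid(0x2L, 0L);          // time_low = 2, time_mid = 0
        UUID newer = timeUuid(0x100000001L, 0L);  // time_low = 1, time_mid = 1
        // true: natural order ranks the newer UUID as the smaller one
        System.out.println(newer.compareTo(older) < 0);
        // true: the timestamp-based comparator puts the newer UUID first
        System.out.println(DESC.compare(newer, older) < 0);
    }
}
```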
Service — Days Loop
@FunctionalInterface
private static interface DayQuery {
    CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID invoiceIdOffset);
}
private ResultPage byDateInteval(Criteria criteria, DayQuery dayQuery) {
    int limit = criteria.getLimit();
    List<Invoice> resultList = new ArrayList<>(limit);
    LocalDate dayOffset = criteria.getDayOffset();
    UUID invoiceIdOffset = criteria.getInvoiceIdOffset();
    /* ... Loop on days ; to be seen in next slide ... */
    return new ResultPage(resultList);
}
Service — Days Loop
LocalDate day = criteria.getLastDay();
do {
    Iterator<UUID> uuidIt = dayQuery.find(criteria, day, invoiceIdOffset).join();
    limit -= loadInvoices(resultList, uuidIt, limit);
    if (uuidIt.hasNext()) {
        // Page is full: the next id becomes the offset of the following page
        return new ResultPage(resultList, day, uuidIt.next());
    }
    day = day.minusDays(1);
    invoiceIdOffset = null;
} while (!day.isBefore(criteria.getFirstDay()));
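The continuation mechanism in this loop (return the day and the first unread id, then resume from there) can be tried without a cluster. The following in-memory sketch replaces the index tables with a map from day to ids sorted in descending order; DayPaging, Page and the integer ids are all hypothetical stand-ins for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the day-by-day pagination. */
final class DayPaging {

    private DayPaging() { }

    /** One page of ids plus the position where the next page starts (null when exhausted). */
    static final class Page {
        final List<Integer> ids = new ArrayList<>();
        LocalDate nextDay;
        Integer nextId;
    }

    /**
     * Walks days from lastDay back to firstDay and collects up to limit ids.
     * idOffset, when not null, applies to lastDay only and is inclusive,
     * like the "invoice_id <= ?" clause of the offset query.
     */
    static Page page(Map<LocalDate, List<Integer>> idsByDay, LocalDate firstDay,
                     LocalDate lastDay, Integer idOffset, int limit) {
        Page page = new Page();
        Integer offset = idOffset;
        for (LocalDate day = lastDay; !day.isBefore(firstDay); day = day.minusDays(1)) {
            for (int id : idsByDay.getOrDefault(day, Collections.<Integer>emptyList())) {
                if (offset != null && id > offset) {
                    continue; // already returned on a previous page
                }
                if (page.ids.size() == limit) {
                    // Page is full: remember where the next page must resume.
                    page.nextDay = day;
                    page.nextId = id;
                    return page;
                }
                page.ids.add(id);
            }
            offset = null; // the offset only applies to the first day visited
        }
        return page;
    }
}
```

To fetch the following page, a caller passes the returned nextDay as the new last day and nextId as the offset, which mirrors the day and invoice id offsets carried by the criteria in the service code.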
Service — Invoices Loading
private int loadInvoices(List<Invoice> resultList, Iterator<UUID> uuidIt, int limit) {
    List<CompletableFuture<Invoice>> futureList = new ArrayList<>(limit);
    for (int i = 0; i < limit && uuidIt.hasNext(); ++i) {
        futureList.add(invoiceRepository.findOne(uuidIt.next()));
    }
    futureList.stream()
        .map(CompletableFuture::join)
        .forEach(resultList::add);
    return futureList.size();
}
Limits
● Exact-match search only
➔ No full-text search
➔ No “starts with” search
➔ No pattern-based search
● Requires highly discriminating mandatory criteria
➔ user_id & invoice_day
● Pagination doesn’t give the total item count
➔ Could be done at the cost of an additional query
● No sorting available
Hardware
● Hosted by Ippon Hosting
● 8 nodes
➔ 16 GB RAM
➔ Two 256 GB SSD drives in RAID 0
● 6 nodes dedicated to Cassandra cluster
● 2 nodes dedicated to the application
Application
● 5,000 concurrent users
● 9 months of data loaded
➔ Legacy system: stores 1 year; searches the last 3 months.
➔ Target: 3 years of history
● Real-time search results
➔ Data are immediately available
➔ Legacy system: data available next day
● Cost Killer
Paris, Bordeaux, Nantes, Washington, New York, Richmond
www.ippon-hosting.com - www.ippon-digital.fr
@ippontech
01 46 12 48 48