The Time to Defer is Now

Michael Diamant, co-author of Scala High Performance Programming

This talk is based on material from the book.

What's our goal?

Let's figure out where we want to be by the end of this talk.

Source: http://www.ew.com/sites/default/files/i/2015/04/15/wall-street-douglas.jpg

We want to be like this guy!

Does anyone know who this is?

Wall Street. I can't guarantee you'll be making Michael Douglas bucks by the end of this talk, but you will know more about financial trading.

Let's try again

Focus areas: functional programming, performance, financial trading

Topics we'll dig into: deferred evaluation, scala.collection.immutable.Queue, JMH, design tradeoffs, order book

Your takeaways: strategies/tools for designing more robust and performant software

Let's start at the bottom and see what we'll cover. We will focus on functional programming and performance, and apply them to the financial trading domain.

In doing so, we'll dig into deferred evaluation (the name of this talk!) and the design of Scala's Queue. Our exploration will cover design tradeoffs and JMH, and we will learn about a central concept in trading: the order book.

I'm excited to share what I've learned with you. When we finish, you will have new strategies and tools in your toolbox for designing robust, performant software.

Let's get started!

Getting to know the order book

Initial order book (price x quantity):

Bid       | Offer
27.01 x 4 | 27.03 x 1
27.00 x 1 | 27.05 x 3
26.97 x 2 | 27.10 x 2

Resting on the book (Buy @ 27.01, ID = 1932):

Bid       | Offer
27.01 x 5 | 27.03 x 1
27.00 x 1 | 27.05 x 3
26.97 x 2 | 27.10 x 2

Crossing the book (Buy @ 27.05, ID = 7389):

Bid       | Offer
27.01 x 4 | 27.05 x 3
27.00 x 1 | 27.10 x 2
26.97 x 2 |

Canceling an order request (Cancel ID = 5502):

Bid       | Offer
27.01 x 3 | 27.03 x 1
27.00 x 1 | 27.05 x 3
26.97 x 2 | 27.10 x 1

The order book is central to trading: it is how traders indicate buy and sell interest. Imagine a stock like Google changing in price. Here's how the price changes.

Resting: adds to the top-of-book bid. Crossing: hits the single offer at 27.03. Canceling: removes a 27.10 offer.

How can we model this concept?

Let's model the order book

class TreeMap[A, +B] private (tree: RB.Tree[A, B]) (implicit val ordering: Ordering[A])

import scala.collection.immutable.{Queue, TreeMap}

case class Price(value: BigDecimal)

object Price {
  // Order prices by their underlying BigDecimal value
  implicit val ordering: Ordering[Price] =
    new Ordering[Price] {
      def compare(x: Price, y: Price): Int =
        Ordering.BigDecimal.compare(x.value, y.value)
    }
}

// BuyLimitOrder and SellLimitOrder are defined elsewhere
case class QueueOrderBook(
    bids: TreeMap[Price, Queue[BuyLimitOrder]],
    offers: TreeMap[Price, Queue[SellLimitOrder]]) {

  def bestBid: Option[BuyLimitOrder] = // highest price
    bids.lastOption.flatMap(_._2.headOption)
  def bestOffer: Option[SellLimitOrder] = // lowest price
    offers.headOption.flatMap(_._2.headOption)
}

TreeMap models the importance of prices by providing fast access to the lowest and highest prices.

The definition shows that the key type must have an Ordering. In our case, Price is ordered by its underlying BigDecimal value.

Each side of the order book is represented with its own TreeMap. Within a side, each price level is represented with Scala's Queue, which keeps orders at the same price in arrival order.
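A minimal, self-contained sketch of this two-level layout is shown below. It is not the book's code: `MiniBook`, its `addBid`/`addOffer` helpers, and the single-field `BuyLimitOrder`/`SellLimitOrder` classes are hypothetical simplifications, and the `Price` ordering is written with `Ordering.by` rather than an anonymous class. It shows `lastOption`/`headOption` on each `TreeMap` picking the best price, while `Queue` preserves arrival order within a level.

```scala
import scala.collection.immutable.{Queue, TreeMap}

// Simplified stand-ins (assumption): the talk's order types carry more fields.
final case class Price(value: BigDecimal)
object Price {
  // Same idea as the talk's Ordering[Price]: compare the underlying BigDecimal
  implicit val ordering: Ordering[Price] = Ordering.by((p: Price) => p.value)
}
final case class BuyLimitOrder(id: Long, price: Price)
final case class SellLimitOrder(id: Long, price: Price)

// Hypothetical minimal book: one TreeMap per side, one FIFO Queue per price level.
final case class MiniBook(
    bids: TreeMap[Price, Queue[BuyLimitOrder]],
    offers: TreeMap[Price, Queue[SellLimitOrder]]) {

  // Highest bid price level; oldest order at that level
  def bestBid: Option[BuyLimitOrder] = bids.lastOption.flatMap(_._2.headOption)

  // Lowest offer price level; oldest order at that level
  def bestOffer: Option[SellLimitOrder] = offers.headOption.flatMap(_._2.headOption)

  // Append to the back of the price level's queue (time priority)
  def addBid(o: BuyLimitOrder): MiniBook =
    copy(bids = bids.updated(o.price, bids.getOrElse(o.price, Queue.empty[BuyLimitOrder]).enqueue(o)))

  def addOffer(o: SellLimitOrder): MiniBook =
    copy(offers = offers.updated(o.price, offers.getOrElse(o.price, Queue.empty[SellLimitOrder]).enqueue(o)))
}

object MiniBook {
  val empty: MiniBook =
    MiniBook(TreeMap.empty[Price, Queue[BuyLimitOrder]], TreeMap.empty[Price, Queue[SellLimitOrder]])

  def main(args: Array[String]): Unit = {
    val book = empty
      .addBid(BuyLimitOrder(1, Price(BigDecimal("27.00"))))
      .addBid(BuyLimitOrder(2, Price(BigDecimal("27.01"))))
      .addOffer(SellLimitOrder(3, Price(BigDecimal("27.05"))))
      .addOffer(SellLimitOrder(4, Price(BigDecimal("27.03"))))
    println(s"best bid: ${book.bestBid}, best offer: ${book.bestOffer}")
  }
}
```

Because both sides share one sorted structure per side, "best price" is just the first or last key, and no separate index is needed.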

Let's get to know the Queue data structure better.

Better know a queue (1/3)

package scala.collection
package immutable

class Queue[+A] protected(???) {
  def enqueue[B >: A](elem: B) = ...
  def dequeue: (A, Queue[A]) = ...
}

enqueue: O(1)

dequeue: amortized O(1)

Here's the Scala Queue with its implementation omitted and its underlying data structure hidden.

The question here is: What data structure or structures does Queue use?


package scala.collection
package immutable

class Queue[+A] protected(
    protected val in: List[A],
    protected val out: List[A]) {

  def enqueue[B >: A](elem: B) = new Queue(elem :: in, out)
  def dequeue: (A, Queue[A]) = out match {
    case Nil if !in.isEmpty =>
      val rev = in.reverse // the deferred O(N) work happens here
      (rev.head, new Queue(Nil, rev.tail))
    case x :: xs => (x, new Queue(in, xs))
    case _ => throw new NoSuchElementException("dequeue on empty queue")
  }
}

Deferred evaluation: List.reverse is O(N)

The first thing to notice is the use of two Lists. Why bother doing that?

A singly linked list supports fast (O(1)) prepends, but accessing the end of the list is slow (O(N)). Because enqueue prepends, the oldest element sits at the end of the list, which is exactly the element dequeue needs.

OK, great. I may have convinced you that a single List is a suboptimal way to represent a FIFO queue. But how does it help to have another List?

Notice that when we enqueue, out is not used. The sausage is made in dequeue.

When out is empty, a single O(N) reverse operation gives us FIFO ordering at the head of out. This is deferred evaluation! Because each element is moved from in to out at most once, the cost of the reversal is spread across the dequeues that follow, which is why dequeue is amortized O(1).


Operation  | in            | out
enqueue(1) | List(1)       | Nil
enqueue(2) | List(2, 1)    | Nil
enqueue(3) | List(3, 2, 1) | Nil
dequeue    | Nil           | List(2, 3)
dequeue    | Nil           | List(3)
enqueue(4) | List(4)       | List(3)
dequeue    | List(4)       | Nil
dequeue    | Nil           | Nil

Let's walk through an example to better understand Queue's behavior.
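The trace in the table can be reproduced with a stripped-down two-list queue. `TwoListQueue` is a hypothetical name for this sketch: it follows the same `in`/`out` idea as the excerpt above, but it is not the standard library class, and it returns an `Option` instead of throwing on an empty queue.

```scala
// Hypothetical two-list FIFO queue (not scala.collection.immutable.Queue itself),
// written to show the same deferred-reversal idea in isolation.
final case class TwoListQueue[A](in: List[A], out: List[A]) {

  // O(1): prepend to `in`; `out` is not touched.
  def enqueue(a: A): TwoListQueue[A] = TwoListQueue(a :: in, out)

  // Amortized O(1): `in` is reversed only when `out` has run dry.
  def dequeueOption: Option[(A, TwoListQueue[A])] = out match {
    case x :: xs => Some((x, TwoListQueue(in, xs)))
    case Nil if in.nonEmpty =>
      val rev = in.reverse // the deferred O(N) step
      Some((rev.head, TwoListQueue(Nil, rev.tail)))
    case Nil => None
  }
}

object TwoListQueue {
  def empty[A]: TwoListQueue[A] = TwoListQueue(Nil, Nil)

  def main(args: Array[String]): Unit = {
    val q3 = empty[Int].enqueue(1).enqueue(2).enqueue(3)
    println(q3)
    q3.dequeueOption.foreach { case (head, rest) => println(s"dequeued $head, now $rest") }
  }
}
```

Each intermediate value matches a row of the table, so the queue's state after every operation can be checked directly.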

Understanding performance: buy limit order arrives

Is there a resting sell order priced at or below the buy price?

def handleCancelOrder(
    currentTime: () => EventInstant,
    ob: LazyCancelOrderBook,
    id: OrderId): (LazyCancelOrderBook, Event) =
  ob.copy(pendingCancelIds = ob.pendingCancelIds + id) ->
    OrderCanceled(currentTime(), id)

Similar to Queue.enqueue


Let's look at the runtime performance of both approaches when an order is canceled. We'll call our new approach LazyCancelOrderBook.

Three linear-time operations become a single, effectively constant-time operation.

We also get a taste of what the code backing LazyCancelOrderBook looks like. To cancel an order, the state is copied with the to-be-canceled ID added to the Set, and an event is returned to signify that the order was canceled.

This operation is analogous to Queue.enqueue.

Source: http://previews.123rf.com/images/kaarsten/kaarsten1102/kaarsten110200033/8723062-Stylized-red-stamp-showing-the-term-mission-accomplished-All-on-white-background--Stock-Photo.jpg

Job done! Presentation over, let's go home.

Not so fast!


Will this unit test pass?

"""Given empty book
  |When cancel order arrives
  |Then OrderCancelRejected""".stripMargin ! Prop.forAll(
  OrderId.genOrderId,
  CommandInstant.genCommandInstant,
  EventInstant.genEventInstant) { (id, ci, ei) =>
  LazyCancelOrderBook.handle(
    () => ei,
    LazyCancelOrderBook.empty,
    CancelOrder(ci, id))._2 ====
    OrderCancelRejected(ei, id)
}

Public API that supports all order book operations:

def handle(
    currentTime: () => EventInstant,
    ob: LazyCancelOrderBook,
    c: Command): (LazyCancelOrderBook, Event)

Here is a unit test that exercises an empty order book. It uses property-based testing to set up its inputs. If this is unfamiliar to you, don't worry.

Let's instead focus on what we conceptually expect. If the order book is empty, there is nothing to cancel.

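This is a canonical case of rejecting order cancels. Is that what we will see? Nope! This version only adds the ID to pendingCancelIds, so it has no way of knowing whether the ID refers to an order in the book, and it always returns OrderCanceled.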

Motivating Design Question #4

Can I change any constraints to allow me to model the problem differently?

This brings us to our final motivating design question:

One constraint that would be great to remove is supporting rejects for cancels that correspond to a nonexistent or already canceled order ID.

Unfortunately for us, rejecting invalid cancel requests is table stakes for building an order book.

But perhaps in your domain there are assumptions you can challenge. They're worth questioning, because doing so might greatly simplify your problem space.

Let's try again

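case class LazyCancelOrderBook(
    activeIds: Set[OrderId],        // IDs of resting orders that have not been canceled
    pendingCancelIds: Set[OrderId], // canceled IDs whose orders are still in the queues
    bids: TreeMap[Price, Queue[BuyLimitOrder]],
    offers: TreeMap[Price, Queue[SellLimitOrder]])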

Since support for rejecting order cancels is a hard requirement, the new implementation needs additional state.

Fine! Let's add another piece of state to help us handle the cancel reject requirement.

Similar to our treatment of to-be-canceled orders, we can use a Set to capture all active orders. This helps us answer the question: does a cancel request refer to an order that's in the order book?

Rejecting invalid cancel requests

QueueOrderBook:
Is there a bid with matching ID? (TreeMap.find: O(N), Queue.exists: O(N))
No: is there an offer with matching ID? (TreeMap.find: O(N), Queue.exists: O(N))
No: yields OrderCancelRejected

LazyCancelOrderBook:
Is there an active order with matching ID? (Set.contains: effectively O(1))
No: yields OrderCancelRejected

def handleCancelOrder(
    currentTime: () => EventInstant,
    ob: LazyCancelOrderBook,
    id: OrderId): (LazyCancelOrderBook, Event) =
  ob.activeIds.contains(id) match {
    case true =>
      // Mark the order inactive; its removal from the book is deferred
      ob.copy(
        activeIds = ob.activeIds - id,
        pendingCancelIds = ob.pendingCancelIds + id) -> OrderCanceled(currentTime(), id)
    case false =>
      // Unknown or already canceled ID: reject without touching the book
      ob -> OrderCancelRejected(currentTime(), id)
  }

In the bottom right, our implementation is updated to reflect the introduction of the Set of active order IDs.

This means canceling an order involves two effectively constant-time operations, which is still faster than the linear-time operations of the original implementation.

Resting buy order requests

Is there a resting sell order priced at or below the buy price?

def handleAddLimitOrder(
    currentTime: () => EventInstant,
    ob: LazyCancelOrderBook,
    lo: LimitOrder): (LazyCancelOrderBook, Event) = lo match {
  case b: BuyLimitOrder =>
    ob.bestOffer.exists(_.price.value <= b.price.value) match {
      case true => ??? // Omitted
      case false =>
        // No cross: rest the order and record its ID as active
        val orders = ob.bids.getOrElse(b.price, Queue.empty)
        ob.copy(
          bids = ob.bids + (b.price -> orders.enqueue(b)),
          activeIds = ob.activeIds + b.id) ->
          LimitOrderAdded(currentTime())
    }
  case s: SellLimitOrder =>
    ??? // Omitted
}

Continuing to defer evaluation of canceled order requests
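To see the deferral end to end, here is a hypothetical, stripped-down model that keeps only the bid side, uses `Long` IDs, and replaces the talk's event types with a tiny `MiniEvent` ADT (all assumptions for illustration). Resting enqueues the order and records it as active; canceling touches only the two `Set`s, so the canceled order stays in its `Queue` until some later operation skips past it.

```scala
import scala.collection.immutable.{Queue, TreeMap}

// Hypothetical simplified model: bid side only, Long IDs, no crossing logic.
final case class Bid(id: Long, price: BigDecimal)

sealed trait MiniEvent
final case class Added(id: Long) extends MiniEvent
final case class Canceled(id: Long) extends MiniEvent
final case class Rejected(id: Long) extends MiniEvent

final case class MiniLazyBook(
    activeIds: Set[Long],
    pendingCancelIds: Set[Long],
    bids: TreeMap[BigDecimal, Queue[Bid]]) {

  // Resting: enqueue at the price level and remember the ID as active.
  def rest(b: Bid): (MiniLazyBook, MiniEvent) = {
    val level = bids.getOrElse(b.price, Queue.empty[Bid])
    (copy(bids = bids.updated(b.price, level.enqueue(b)), activeIds = activeIds + b.id), Added(b.id))
  }

  // Canceling: only Set lookups and updates; the order stays in its Queue for now.
  def cancel(id: Long): (MiniLazyBook, MiniEvent) =
    if (activeIds.contains(id))
      (copy(activeIds = activeIds - id, pendingCancelIds = pendingCancelIds + id), Canceled(id))
    else
      (this, Rejected(id)) // unknown or already canceled ID
}

object MiniLazyBook {
  val empty: MiniLazyBook =
    MiniLazyBook(Set.empty, Set.empty, TreeMap.empty[BigDecimal, Queue[Bid]])

  def main(args: Array[String]): Unit = {
    val (book, _) = empty.rest(Bid(1, BigDecimal("27.01")))
    println(book.cancel(1)._2)
    println(book.cancel(2)._2)
  }
}
```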

Source: http://images.askmen.com/1080x540/2016/03/14-032037-how_to_correctly_roll_up_your_sleeves_basics.jpg

We can't defer anymore

After dealing with cancels and resting orders, we've finally hit that point: we can't defer anymore.

Time to roll up our sleeves and figure out how to implement crossing the book.

def handleAddLimitOrder(
    currentTime: () => EventInstant,
    ob: LazyCancelOrderBook,
    lo: LimitOrder): (LazyCancelOrderBook, Event) = lo match {
  case b: BuyLimitOrder =>
    ob.bestOffer.exists(_.price.value <= b.price.value) match {
      case true =>
        ob.offers.headOption.fold(restLimitOrder) {
          case (p, q) => ??? // We need to fill this in
        }
      case false => restLimitOrder
    }

Goals:
1. Find the active resting sell order to generate an OrderExecuted event
2. Remove canceled resting orders found in front of the active resting sell order

Given:

21.07 -> Canceled (ID = 1), Canceled (ID = 2), Canceled (ID = 3), Active (ID = 4), Canceled (ID = 5), Active (ID = 6)

We want the following final state:

21.07 -> Canceled (ID = 5), Active (ID = 6)

What are our goals?

What are our goals when an order crosses?

Let's use a partial implementation of handling a limit order. In this method, we focus specifically on the case where a buy order arrives and we determine that there is a matching best offer. What do we want to happen here?

Here's a simple example to illustrate what we want. After evaluating the incoming order, we want to remove all canceled orders ahead of the matched active order.

Translating our goals to code (1/3)

@tailrec
def findActiveOrder(
    q: Queue[SellLimitOrder]): (Option[SellLimitOrder], Option[Queue[SellLimitOrder]]) = ???

Optionally, find an active order to generate execution

Optionally, have a non-empty queue remaining after removing matching active order and canceled orders

To start, let's write a method that returns a tuple containing, optionally, the active order that was crossed and, optionally, the orders remaining in the queue.

You'll see that this method is marked with the @tailrec annotation. This is a strong hint that we have a recursive solution to our problem.


// Nested inside the handler, so ob refers to the enclosing order book
@tailrec
def findActiveOrder(
    q: Queue[SellLimitOrder]): (Option[SellLimitOrder], Option[Queue[SellLimitOrder]]) =
  q.dequeueOption match {
    case Some((o, qq)) => ob.pendingCancelIds.contains(o.id) match {
      case true => findActiveOrder(qq)
      case false => (Some(o), if (qq.nonEmpty) Some(qq) else None)
    }
    case None => (None, None)
  }

Found active order; stop recursing

Queue emptied without finding active order; stop recursing

Found canceled order; keep recursing

The method is driven by recursively evaluating the state of the provided queue, q. Evaluation ends once either an active order is found or once the queue is empty.

Each step is effectively constant time, and the recursion runs at most N times for a queue of N orders, so the worst case is linear in the queue size.
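Because the nested method only reads `ob.pendingCancelIds`, it can be lifted out for experimentation. The sketch below assumes a simplified `Sell` order type and passes the pending-cancel set as an explicit parameter (both assumptions, not the book's signature). Run against the six-order example from the goals slide, it skips IDs 1 to 3, returns order 4, and leaves orders 5 and 6 in the queue.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

// Simplified stand-in for SellLimitOrder (assumption).
final case class Sell(id: Long, price: BigDecimal)

object Crossing {
  // Skips canceled orders at the front of the queue and returns the first
  // active order plus whatever remains behind it (None if nothing remains).
  @tailrec
  def findActiveOrder(
      q: Queue[Sell],
      pendingCancelIds: Set[Long]): (Option[Sell], Option[Queue[Sell]]) =
    q.dequeueOption match {
      case Some((o, qq)) if pendingCancelIds.contains(o.id) =>
        findActiveOrder(qq, pendingCancelIds) // canceled: drop it and keep going
      case Some((o, qq)) =>
        (Some(o), if (qq.nonEmpty) Some(qq) else None) // active: stop here
      case None =>
        (None, None) // emptied the level without finding an active order
    }

  def main(args: Array[String]): Unit = {
    val p = BigDecimal("21.07")
    val level = Queue.empty[Sell] ++ (1L to 6L).map(i => Sell(i, p))
    println(findActiveOrder(level, Set(1L, 2L, 3L, 5L)))
  }
}
```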


// Earlier segments omitted
case (p, q) => findActiveOrder(q) match {
  case (Some(o), Some(qq)) =>
    (ob.copy(
      offers = ob.offers + (o.price -> qq),
      activeIds = ob.activeIds - o.id), // the executed order is no longer active
      OrderExecuted(currentTime(),
        Execution(b.id, o.price), Execution(o.id, o.price)))
  case (Some(o), None) =>
    (ob.copy(
      offers = ob.offers - o.price,
      activeIds = ob.activeIds - o.id),
      OrderExecuted(currentTime(),
        Execution(b.id, o.price), Execution(o.id, o.price)))
  case (None, _) =>
    // Only canceled orders were found: drop the level and rest the buy order
    val bs = ob.bids.getOrElse(b.price, Queue.empty).enqueue(b)
    (ob.copy(
      bids = ob.bids + (b.price -> bs),
      offers = ob.offers - p,
      activeIds = ob.activeIds + b.id),
      LimitOrderAdded(currentTime()))
}

Found an active order and queue is non-empty

Found an active order and queue is empty

Since no active order was found, the price level must be empty

Let's return to the initial method definition we need to fill in. We can now invoke findActiveOrder and pattern match on the result:

1. We found a sell order, and orders remain in the price level (denoted qq).
2. We found a sell order, and no orders remain in the price level.
3. We did not find a sell order. By definition, this means that we looked at all the orders in the price level.

In each case we see similar bookkeeping.

Here we pay for the deferred evaluation. While more work is being done, bear in mind that according to the historical data we reviewed, crossing the book is only the third most frequent operation.
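The three-way bookkeeping can also be read as a small pure function from the search result to the updated offer side. In this hypothetical sketch, `updateLevel`, the `Offers` alias, and the simplified `Sell` type are assumptions; it mirrors only how the offers `TreeMap` changes in each case and leaves out the `activeIds` and `bids` updates.

```scala
import scala.collection.immutable.{Queue, TreeMap}

// Simplified stand-in for SellLimitOrder (assumption).
final case class Sell(id: Long, price: BigDecimal)

object LevelBookkeeping {
  type Offers = TreeMap[BigDecimal, Queue[Sell]]

  // Applies the (found, remaining) result of scanning the level at `price`
  // and returns the updated offers plus the executed order, if any.
  def updateLevel(
      offers: Offers,
      price: BigDecimal,
      result: (Option[Sell], Option[Queue[Sell]])): (Offers, Option[Sell]) =
    result match {
      // Active order found, orders remain: replace the level with the remainder
      case (Some(o), Some(rest)) => (offers.updated(price, rest), Some(o))
      // Active order found, level exhausted: drop the level
      case (Some(o), None) => (offers - price, Some(o))
      // Only canceled orders were found: drop the level, nothing executes
      case (None, _) => (offers - price, None)
    }

  def main(args: Array[String]): Unit = {
    val p = BigDecimal("21.07")
    val offers: Offers = TreeMap(p -> Queue(Sell(4, p), Sell(5, p), Sell(6, p)))
    println(updateLevel(offers, p, (Some(Sell(4, p)), Some(Queue(Sell(5, p), Sell(6, p))))))
  }
}
```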

Source: http://www.sssupersports.com/wp-content/uploads/2014/11/12-rowen-f430-dashboard.jpg

But, how fast is it?

We've rolled up our sleeves and re-implemented the order book. The code is complete and the unit tests pass. But how much faster, if at all, is LazyCancelOrderBook than QueueOrderBook?

How can we measure the difference?

How to measure?

The 3 most important rules about microbenchmarking:
1. Use JMH
2. ?
3. ?

Segue into performance and out of functional programming.

One way to measure performance is to write a small app that measures throughput.

How will we know the JVM is warmed up? And how will we ensure the JVM does not optimize away method calls when the return values are not used? How will we instrument state for each test?

JMH!

Review the official JMH samples beforehand, so you have examples ready of why this is a good idea.

How to measure?

The 3 most important rules about microbenchmarking:
1. Use JMH
2. Use JMH
3. ?

How to measure?

The 3 most important rules about microbenchmarking:
1. Use JMH
2. Use JMH
3. Use JMH

The shape of a JMH test

Test state / Test configuration / Benchmarks

Knowing what tool to use is half the battle. How will we use it?

Writing a JMH test involves three parts:
1. Defining the test state
2. Defining the test configuration (e.g. warm-up count, JVM settings)
3. Writing the benchmarks: the actual code under test

Let's build a microbenchmark for the two order book implementations. We'll start by defining the test state.

JMH: Test state (1 of 2)

import org.openjdk.jmh.annotations.{Param, Scope, State}

@State(Scope.Benchmark)
class BookWithLargeQueue {
  @Param(Array("1", "10"))
  var enqueuedOrderCount: Int = 0

  var eagerBook: QueueOrderBook = QueueOrderBook.empty
  var lazyBook: LazyCancelOrderBook = LazyCancelOrderBook.empty
  var cancelLast: CancelOrder =
    CancelOrder(CommandInstant.now(), OrderId(-1))
  // More state to come
}

Which groups of threads share the state defined below?

What test state do we want to control when running the test?

Note var usage to manage state

In the next few slides, our goal is to get a sense of the landscape rather than exhaustively exploring each option.

The state we define is encapsulated in a class. JMH controls configuration via annotations.

We're trying to define state that will allow us to queue up a varying number of orders in a price level within the order book.

As an example of one annotation, @Param allows us to sweep values when we execute the test.

So far, all we have done is define the test state. There is no initialization yet.

JMH: Test state (2 of 2)

class BookWithLargeQueue {
  // Vars defined above

  @Setup(Level.Trial)
  def setup(): Unit = {
    cancelLast = CancelOrder(
      CommandInstant.now(), OrderId(enqueuedOrderCount))
    eagerBook = {
      (1 to enqueuedOrderCount).foldLeft(QueueOrderBook.empty) {
        case (ob, i) =>
          QueueOrderBook.handle(
            () => EventInstant.now(),
            ob,
            AddLimitOrder(
              CommandInstant.now(),
              BuyLimitOrder(OrderId(i), Price(BigDecimal(1.00)))))._1
      }
    }
    lazyBook = ??? // Same as eagerBook
  }
}

How often will the state be re-initialized?

Mutable state is initialized here

Later in the same class, we initialize the mutable state in a method annotated with @Setup, which acts as a lifecycle hook of sorts. With Level.Trial, the setup runs once per trial, that is, once per full run of a benchmark's iterations.

Here we are adding state to the order book based on the number of orders we wish to queue up.

We also record the ID of the final order added to the book so that we can cancel the last order. This allows us to benchmark canceling both the first and the last orders.

At this point, we're done with our first JMH building block: test state. Let's now configure our tests.

JMH: Test configuration

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._
import org.openjdk.jmh.annotations.Mode.Throughput

@BenchmarkMode(Array(Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 30, time = 10, timeUnit = TimeUnit.SECONDS)
@Fork(value = 1, warmups = 1, jvmArgs = Array("-Xms1G", "-Xmx1G"))
class CancelBenchmarks { ... }

In another class, CancelBenchmarks, we will soon define our benchmarks. First, we apply several annotations to define how we want the benchmarks to run.

Can point to a few examples shown in the slide.

It's also worth noting that these values can be provided as command-line arguments. I like defining this configuration via annotations because it ensures a consistent testing profile.
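As a sketch of that alternative, the annotation values above map onto standard JMH command-line options, appended here to the sbt-jmh invocation used later in this talk. This assumes sbt-jmh passes the options through to JMH unchanged; the heap settings are omitted and could be supplied with JMH's `-jvmArgs` option.

```shell
# Throughput mode reported per second; 3 x 5 s warm-up iterations,
# 30 x 10 s measurement iterations, 1 measured fork plus 1 warm-up fork
sbt 'project chapter4' 'jmh:run CancelBenchmarks -foe true -bm thrpt -tu s -wi 3 -w 5s -i 30 -r 10s -f 1 -wf 1'
```

Command-line options override the annotations, which is handy for quick exploratory runs while keeping the annotated profile as the default.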

JMH: Benchmarks

class CancelBenchmarks {
  import CancelBenchmarks._

  @Benchmark
  def eagerCancelLastOrderInLine(b: BookWithLargeQueue): (QueueOrderBook, Event) =
    QueueOrderBook.handle(systemEventTime, b.eagerBook, b.cancelLast)

  @Benchmark
  def eagerCancelFirstOrderInLine(b: BookWithLargeQueue): (QueueOrderBook, Event) =
    QueueOrderBook.handle(systemEventTime, b.eagerBook, b.cancelFirst)

  @Benchmark
  def eagerCancelNonexistentOrder(b: BookWithLargeQueue): (QueueOrderBook, Event) =
    QueueOrderBook.handle(systemEventTime, b.eagerBook, b.cancelNonexistent)

  // Same for LazyCancelOrderBook
}

Much like a JUnit test, each benchmark is marked with an annotation, in this case @Benchmark.

Our benchmarks focus on the three cancel scenarios we've been considering so far:
1. Cancel the first order
2. Cancel the last order
3. Cancel a nonexistent order

Note that each of these benchmarks returns a non-Unit value. JMH consumes returned values, which ensures the JVM does not eliminate the method invocation as dead code.

Also note that because the order books are immutable, every invocation starts from the same state, which keeps the benchmark in a steady state.

Source: http://s.newsweek.com/sites/www.newsweek.com/files/2016/06/06/ai-google-red-button-artificial-intelligence.jpg

sbt 'project chapter4' 'jmh:run CancelBenchmarks -foe true'

Let's push the red button and kick off the tests!

JMH results (1 of 2)

Benchmark                   | Enqueued order count | Throughput (ops per second) | Error (% of throughput)
eagerCancelFirstOrderInLine | 1                    | 6,912,696.09                | 0.44
lazyCancelFirstOrderInLine  | 1                    | 25,676,031.50               | 0.22
eagerCancelFirstOrderInLine | 10                   | 2,332,046.09                | 0.96
lazyCancelFirstOrderInLine  | 10                   | 12,656,750.43               | 0.31
eagerCancelLastOrderInLine  | 1                    | 5,641,784.63                | 0.49
lazyCancelLastOrderInLine   | 1                    | 25,619,665.34               | 0.48
eagerCancelLastOrderInLine  | 10                   | 1,788,885.62                | 0.39
lazyCancelLastOrderInLine   | 10                   | 13,269,215.32               | 0.30

The raw JMH output looks similar. One notable difference is that it does not express the error as a percentage of throughput. I add this column because I find it convenient to review: I scrutinize this value to ensure there is limited variability in the test results.

What takeaways do we have here?

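The highlight is a clear throughput win (higher is better) for LazyCancelOrderBook in every scenario, independent of how many orders are enqueued: canceling the first order is roughly 3.7x faster with one enqueued order and 5.4x faster with ten, and canceling the last order is roughly 4.5x and 7.4x faster, respectively.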
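Both derived figures are simple ratios. In the sketch below, `errorPercent` assumes its input is the ± error value JMH reports next to each score, and the `speedup` example uses the CancelFirstOrderInLine rows with one enqueued order from the results table above; the object name `ThroughputMath` is hypothetical.

```scala
object ThroughputMath {
  // Error column expressed as a percentage of the measured score.
  def errorPercent(score: Double, error: Double): Double = error / score * 100.0

  // How many times higher the lazy throughput is than the eager throughput.
  def speedup(lazyOpsPerSec: Double, eagerOpsPerSec: Double): Double =
    lazyOpsPerSec / eagerOpsPerSec

  def main(args: Array[String]): Unit = {
    // CancelFirstOrderInLine with 1 enqueued order, taken from the results table
    val ratio = speedup(25676031.50, 6912696.09)
    println(f"lazy vs. eager speedup: $ratio%.2fx")
  }
}
```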

Why does throughput decrease as the enqueued order count increases? These are the kinds of questions worth reflecting on to understand the results.

JMH results (2 of 2)

Benchmark                   | Enqueued order count | Throughput (ops per second) | Error (% of throughput)
eagerCancelNonexistentOrder | 1                    | 9,351,630.96                | 0.19
lazyCancelNonexistentOrder  | 1                    | 31,742,147.67               | 0.65
eagerCancelNonexistentOrder | 10                   | 6,897,164.11                | 0.25
lazyCancelNonexistentOrder  | 10                   | 24,102,925.78               | 0.24

Poll

Would you release LazyCancelOrderBook into production?

How about on a Friday?

What else would you need to do in order to be comfortable?

How about designing a test that matches the frequency of operations seen in production? Load testing in a staging environment? Analyzing memory usage? We potentially increased memory usage dramatically and changed GC patterns due to long-lived order IDs.

The goal is to make you think and consider the tradeoffs. Often, pausing to consider is enough to uncover serious flaws.

And if that's not enough, then you can exercise your well-practiced rollback plan.

Motivating design questions

Question: What operations in my system are most performant?
Application to the order book example: Executing an order and resting an order on the book are the most performant operations. We leveraged their fast execution time to perform removals of canceled orders from the book.

Question: Why am I performing all of these steps now?
Application to the order book example: Originally, order removal happened eagerly because it was the most logical way to model the process.

Question: How can I decompose the problem into smaller discrete chunks?
Application to the order book example: The act of canceling was decomposed into identifying the event sent to the requester and removing the canceled order from the book state.

Question: Can I change any constraints to allow me to model the problem differently?
Application to the order book example: Ideally, we would have liked to remove the constraint requiring rejection of cancels for nonexistent orders. Unfortunately, this was out of our control.

Before we conclude, I want to revisit the motivating questions we asked ourselves while working on the order book.

Draw parallels to your own work.

Thank you!

Book repo: https://github.com/PacktPublishing/Scala-High-Performance-Programming

https://www.packtpub.com/application-development/scala-high-performance-programming

Discount codes (valid 15th November to 19th November):
eBook: TLQQBR50 (50% off)
Print: KIPJIU10 (10% off)
