BSC THESIS TOPIC
Kristóf Máté Horváth
Graduating Student in Electrical Engineering
Analyzing Blockchain Data with Deep Learning
Bitcoin, the first cryptocurrency, and the underlying blockchain technology were
established around a decade ago. The blockchain is a distributed database that contains a
chain of blocks; each block contains transaction data, including timestamps, endpoints,
and values. This data is freely accessible, cryptographically secure and immutable. Because
all transactions are public, searching for patterns, anomalies and correlations within the
blockchain, and between the blockchain and additional data sources, poses a challenging
task for data scientists and machine learning researchers.
Due to the significant increase in the amount of available data, the continuous
advancement of high-performance GPUs (Graphics Processing Units) and novel scientific
results, deep learning has become one of the most actively researched areas of machine
learning. Deep architectures with tens or even hundreds of layers are able to
simultaneously learn representations and model the input data efficiently.
The goal of this thesis work is to investigate blockchain technology and analyze the
transaction database, concentrating on deep learning-based methods.
The following subtasks should be elaborated:
• Review the most important scientific papers on deep learning and
blockchain technology.
• Investigate the structure and possible origins of the blockchain database and related
data sources.
• Create a solution for downloading and storing this data in a Linux environment,
considering performance issues.
• Analyze the gathered data and create possible training and test sets for deep learning
systems.
• Implement and test at least one deep neural network to jointly model transactions
and a related data feed (e.g., derived from the asset prices) in a demonstration
system.
• Evaluate the results of the demonstration system.
• Explore the possibilities of extending this solution to more complex deep learning
systems.
• Prepare detailed documentation of the work, summarize the results, and
write a conclusion outlining possible future work.
Academic supervisor: Bálint Gyires-Tóth, PhD
Budapest, 27th September 2018
Gábor Magyar, PhD
/Head of Department/
Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics
Department of Telecommunications and Media Informatics
Kristóf Máté Horváth
ANALYZING BLOCKCHAIN DATA
WITH DEEP LEARNING
ACADEMIC SUPERVISOR
Bálint Gyires-Tóth, PhD
BUDAPEST, 2018
Contents
1. Introduction ................................................................................................................ 1
2. Public-key cryptography ........................................................................................... 2
2.1 One-way hash functions .......................................................................................... 2
2.2 Private and Public Keys .......................................................................................... 4
2.3 The Elliptic Curve Digital Signature Algorithm (ECDSA) .................................... 6
2.3.1 Finite Fields ...................................................................................................... 6
2.3.2 The Finite Field Fp ........................................................................................... 6
2.3.3 Elliptic curves over Finite fields ...................................................................... 6
2.3.4 Group order and group structure ...................................................................... 8
2.3.5 ECDSA domain parameters ............................................................................. 8
2.3.6 Randomly generating an elliptic curve ............................................................. 9
2.3.7 Domain parameter generation ........................................................................ 10
2.3.8 ECDSA Key Pair generation, public key validation and proof of possession
of the private key ............................................................................................ 10
3. The blockchain of the Bitcoin network .................................................................. 13
3.1 Private Keys ......................................................................................................... 13
3.2 Public Keys .......................................................................................................... 13
3.3 Bitcoin addresses .................................................................................................. 14
3.4 Transactions ......................................................................................................... 15
3.4.1 Structure of a transaction ................................................................................ 16
3.4.2 Transaction inputs and outputs ....................................................................... 17
3.4.3 Transaction fees .............................................................................................. 18
3.4.4 Transaction validation conditions .................................................................. 19
3.4.5 Orphan transactions ........................................................................................ 20
3.5 The data structure of the blockchain ..................................................................... 21
3.5.1 A block’s data fields ....................................................................................... 21
3.5.2 Block Header .................................................................................................. 22
3.5.3 Merkle Trees ................................................................................................... 22
3.6 Decentralised consensus through proof of work ................................................... 24
3.6.1 Aggregation of transactions ............................................................................ 24
3.6.2 Proof of work .................................................................................................. 25
3.6.3 Validation of a new block .............................................................................. 26
3.6.4 Blockchain forks ............................................................................................. 27
4. Analyzing Bitcoin’s blockchain with deep learning algorithms........................... 28
4.1 Collecting Bitcoin’s blockchain data .................................................................... 28
4.1.1 Blockchain.com API ...................................................................................... 29
4.1.2 Drawing transaction networks with NetworkX .............................................. 29
4.1.3 Additional features from blocks ..................................................................... 32
4.1.4 Storing data in HDF5 files .............................................................................. 33
4.1.5 Volatility estimators ....................................................................................... 36
4.2 Deep learning ........................................................................................................ 38
4.3 Predicting price and volatility with different architectures ................................... 41
4.3.1 Determining the number of transactions from the transaction graphs ........... 48
4.4 Different approaches for predictions, system usage, extensions .......................... 50
4.4.1 Analyzing the correlations of block features with market data ...................... 50
4.4.2 Long Short-Term Memory network with block features ............................... 54
4.4.3 Application and integration of an operative prediction system ...................... 57
4.4.4 Possible future experiments ............................................................................ 60
5. Summary ................................................................................................................... 62
Acknowledgements ....................................................................................................... 64
References ...................................................................................................................... 65
6. Appendix .................................................................................................................... 68
A.1. Secure Hash Algorithm (SHA) ........................................................................ 68
A.2. The domain parameters of the Koblitz curve, secp256k1 ............................... 74
STUDENT DECLARATION (Hallgatói nyilatkozat)
I, the undersigned Kristóf Máté Horváth, graduating student, declare that I prepared this
thesis myself, without unauthorized assistance, and that I used only the cited sources
(literature, tools, etc.). Every part that I have taken from other sources verbatim, or with
the same meaning but rephrased, is clearly marked with a reference to the source.
I consent to the publication by BME VIK of the basic data of this work (author(s), title,
abstracts in English and Hungarian, year of preparation, name(s) of supervisor(s)) in a
publicly accessible electronic form, and to the publication of the full text of the work
through the university's internal network (or for authenticated users). I declare that the
submitted work and its electronic version are identical. For theses classified with the
Dean's permission, the text of the thesis becomes accessible only after 3 years.
Dated: Budapest, 6 December 2018
...…………………………………………….
Horváth Kristóf Máté
Abstract (Kivonat)
Blockchain is a distributed peer-to-peer network that allows unknown parties to securely
send transactions to each other in the form of digital currencies. All of this happens
without a central supervising authority intervening in the transaction process.
Bitcoin is currently the world's most valuable digital currency, and it is now traded both
on cryptocurrency exchanges and, through futures contracts, on traditional exchanges. The
public ledger of the Bitcoin blockchain creates the opportunity to extract new information
from the data with deep learning algorithms, and to use the statistical analysis of the data
for designing automated trading systems.
In my thesis I first present the fundamental mathematical definitions and operations
that form the basis of public-key, or asymmetric, cryptography. Asymmetric cryptographic
methods are responsible for the security of blockchains, and they enforce mutual trust
between the participants of the distributed system.
In the second chapter I discuss in detail the structure, operation and data structure of
the Bitcoin blockchain.
In the second half of the thesis I present the process through which I collected data
about the Bitcoin network and its market value. I discuss the efficient storage,
transformation and analysis of the data, which I carried out with deep learning algorithms
in order to predict the future values of certain variables. At the end of the thesis I mention
possible further research directions and a use case for an operational prediction system.
Today, the largest share of trades on traditional and cryptocurrency exchanges is
executed by automated systems. These systems have eliminated the emotional errors of
human traders; they better exploit pattern-based recognition, adherence to precise trading
strategies, and extremely fast information processing in order to maximize profit.
Abstract
Blockchain is a distributed peer-to-peer network, which allows clients to anonymously
and securely transfer digital currencies without the intervention of a centralized authority.
Blockchain technology is also called a public ledger, because the network's transactions
are public.
Bitcoin is the most valuable digital asset, and it is traded on cryptocurrency exchanges.
The public nature of the Bitcoin ledger creates an opportunity to combine blockchain data
and deep learning algorithms in order to leverage possible new sources of information for
automated trading. In this thesis, I first introduce the basic definitions, mathematical
formulas and operations of the public-key cryptographic methods that allow blockchain
technology to operate without a central authority and establish so-called decentralised
trust between anonymous parties. Then I discuss in detail the data structure and the
operation of the Bitcoin blockchain. The second half of this thesis presents the process
through which I collected, transformed and analyzed data about Bitcoin, and utilized
deep learning algorithms in order to predict future properties of Bitcoin and its network.
At the end of the document I mention a possible use case of a prediction system and some
opportunities for future investigation.
Traditional stock market and cryptocurrency trading are today mostly based on
algorithmic trading. Trading algorithms exploit pattern recognition, strict adherence to
precise trading strategies and rapid information processing in order to outperform human
traders.
- 1 -
1. Introduction
Blockchain and deep learning are both prominent computer technologies that have
gained importance in recent years. Blockchain is a distributed peer-to-peer network that
allows the network's participants to securely and anonymously transfer digital assets to
each other without the intervention of a central authority. Deep learning is a subfield of
machine learning, and it is used to analyse large sets of data and to map input variables to
a desired output. In this thesis I utilize Bitcoin blockchain data in order to predict
Bitcoin's price and its volatility with deep learning algorithms.
In the second chapter of my thesis I introduce the concepts of cryptographic hash
functions, basic encryption schemes, private and public keys. I discuss in detail the
mathematical background of the Elliptic Curve Digital Signature Algorithm, which is
widely utilized by blockchain technology in order to generate private and public key pairs.
These concepts and innovations are the fundamental blocks of blockchains, which
facilitate the secure operation of the peer to peer networks and eliminate the necessity of
a central authority.
In the third chapter I connect the previously introduced mathematical
background to the Bitcoin blockchain. The key pairs, addresses, the structure of
transactions and the data fields are discussed, which will be used in the subsequent
chapters. The concept of the Merkle tree, which aggregates the network's transactions, is
also explained, and then the proof-of-work algorithm is interpreted, which constitutes the
so-called decentralized consensus and creates the possibility for blockchain forks to occur.
In the fourth chapter I discuss in detail the process of data collection about the Bitcoin
blockchain, the storing, transformation and analysis of the data. I introduce transaction
graphs that I created from each Bitcoin block in order to feed them to deep neural
networks. I experimented with different convolutional neural network architectures to
predict Bitcoin’s price and its volatility from the transaction graphs and I utilized
additional block features to train long short-term memory networks. In this part of my
thesis I also summarize the results of the investigations and the detected correlations
between the Bitcoin network and Bitcoin's market data. In the remainder of the chapter I
introduce a possible use case of an operative prediction system and propose potential
further experiments.
2. Public-key cryptography
Permissionless blockchain protocols like Bitcoin are based on P2P networks,
cryptography and game theory. The participants of blockchain networks reach consensus
on which transactions are valid without the help of a central authority. Cryptography is
used to preserve privacy and transparency at the same time. Public-key cryptography, or
asymmetric cryptography, is a cryptographic system that relies on a pair of keys: a private
key that is kept secret and a public key that can be broadcast to the network. The
cryptographic system ensures the authenticity and integrity of a message. Bitcoin's wallet
creation, the signing and verification of transactions, and the common consensus over the
network are all blockchain activities that rely on public-key cryptography techniques.
2.1 One-way hash functions
The mathematical one-way functions are the key to public-key cryptography. These
functions are the fundamental building blocks of secure communications over an insecure
channel.
One-way functions are easy to compute but practically impossible to reverse. For a given
value x, it is easy to compute f(x), but from f(x) alone, x is not computable.
Only with a brute-force attack (that is, trying every possible value that might
produce f(x)) could one recover the secret data x. For this reason, many
protocols rely on one-way hash functions, because they transform valuable information
into uniquely distinguishable, fixed-length data known as the data's digital
fingerprint [1].
Hash functions take variable-length data as input and produce a hash value with a fixed
length. Hash values may contain many leading zeros, since the output always has the
required fixed length. The following properties must be mathematically satisfied by
cryptographic hash functions that create digital fingerprints [2]:
• Providing hash values for any kind of data quickly
• Being pseudorandom
• Being deterministic
• Being one-way functions
• Being collision resistant
Providing hash values for any kind of data quickly means that the algorithm producing
the fixed-length output should not be computationally intensive for any kind of input,
and the output must be returned quickly.
Definition 1.1 (pseudorandom functions):
A pseudorandom function is an efficient (deterministic) algorithm [3] which, given an
𝑛-bit seed 𝑠 and an 𝑛-bit argument x, returns an 𝑛-bit string, denoted 𝑓𝑠(𝑥), such that it
is infeasible to distinguish the responses of 𝑓𝑠, for a uniformly chosen 𝑠, from the
responses of a truly random function.
The hash value returned by a pseudorandom function changes unpredictably
with any change of the input data. It should be impossible to predict how the output of
the hash function changes when the input data is modified.
Deterministic functions return identical hash values for the same inputs. Equivalent
data given to a hash function must have equivalent digital fingerprints, so that the data
can be identified correctly.
Definition 1.2 (one-way functions):
A function 𝑓: {0, 1}*↦ {0, 1}* is called one-way, if
• easy direction: there is an efficient algorithm which on input x outputs f(x).
• hard direction: given f(x), where x is uniformly selected, it is infeasible to find,
with non-negligible probability, a preimage of f(x). That is, any feasible algorithm
which tries to invert f may succeed only with negligible probability, where the
probability is taken over the choices of x and the algorithm’s coin tosses.
One-way functions are non-invertible; therefore it is practically impossible to recover the
original input data from the hash value alone.
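The properties above can be observed directly with a standard hash function. A minimal sketch using Python's hashlib and SHA-256 (the hash function Bitcoin also relies on), illustrating the fixed output length, determinism, and the unpredictable change of the digest under a small input change:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    # SHA-256 maps variable-length input to a fixed 256-bit (64 hex digit) digest.
    return hashlib.sha256(data).hexdigest()

# Fixed length for any input size
assert len(fingerprint(b"a")) == 64
assert len(fingerprint(b"a" * 10_000)) == 64

# Deterministic: equal inputs yield equal fingerprints
assert fingerprint(b"blockchain") == fingerprint(b"blockchain")

# Pseudorandom: a one-character change alters the digest completely
d1 = fingerprint(b"blockchain")
d2 = fingerprint(b"Blockchain")
differing = sum(c1 != c2 for c1, c2 in zip(d1, d2))
print(differing)  # typically around half of the 64 hex digits differ
```

Inverting the function, i.e. recovering the input bytes from `d1` alone, is believed to be infeasible except by brute force over all candidate inputs.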
Definition 1.3 (Collision-Free Hashing):
Consider a family of hash functions [4], indexed by strings, F ≝ { 𝑓𝛼 : {0, 1}^(2|𝛼|) ↦
{0, 1}^(|𝛼|) }𝛼, such that there exists a polynomial-time algorithm for evaluating F (i.e., on
input 𝛼 and 𝑥 it returns 𝑓𝛼(𝑥)). The family F is called collision-free with respect to (w.r.t.)
complexity 𝑐(∙) if, for every non-uniform family of circuits {𝐶𝑛} with size bounded by
𝑐(∙) and all sufficiently large 𝑛, the probability that 𝐶𝑛, given a uniformly chosen 𝛼 ∈
{0, 1}^𝑛, outputs a pair (𝑥, 𝑦) with 𝑥 ≠ 𝑦 and 𝑓𝛼(𝑥) = 𝑓𝛼(𝑦), is bounded above by 1/𝑐(𝑛).
The family F is called collision-free if it is collision-free w.r.t. all polynomials, and is called
strongly collision-free if, for some ϵ > 0, it is collision-free w.r.t. the function
𝑐(𝑛) ≝ 2^(𝑛^ϵ).
Collision-free functions exist assuming that factoring integers is intractable (i.e., not
possible in polynomial time). Strongly collision-free functions exist if n-bit integers cannot
be factored in time 2^(𝑛^ϵ) for some ϵ > 0. Collision resistance means that the probability
of creating identical hash values from two distinct inputs is approximately zero.
The above conditions must be satisfied for a hash function to create digital fingerprints.
Each property is analogous to a human fingerprint: a human fingerprint is quickly
captured by a proper camera; the fingerprint changes when the finger is injured; every
time the finger is sampled it produces the same pattern; someone who sees only the
fingerprint cannot guess the corresponding person; and two different people, even twins,
never have identical fingerprints.
2.2 Private and Public Keys
The idea of asymmetric cryptography which is also known as public key cryptography
was proposed by Merkle, Diffie and Hellman in the mid-1970s. This cryptographic
standard is a set of techniques that allows two parties to communicate securely by
eliminating the possibilities for eavesdropping, tampering and impersonation attacks.
It provides:
• Encryption
• Tamper detection
• Authentication
• Non-repudiation
Two parties that want to exchange confidential information must encrypt and decrypt
the data that carries the information. The raw data, called plaintext, which represents
readable information, is encrypted by the sender with an encryption algorithm using the
receiver's public key. The encryption algorithm produces an uninterpretable ciphertext,
which is transmitted over a shared medium. The receiver decrypts the ciphertext with the
corresponding private key to read the plaintext. The public and private keys are
interconnected, in the sense that the public key is generated from the private key. The
public key allows anybody to encrypt data, but only the owner of the private key can
decrypt it. The receiver of the information can verify that the data has not been modified
during transmission: an adversary's attempt to modify the data causes a detectable change
in the message. This is called tamper detection. Authentication provides a method to
prove the identity of the sender, and therefore excludes impersonation attacks.
Non-repudiation prevents the sender from later claiming that the information was
never sent.
The mathematical definition of a public-key encryption scheme is given in the
following [5].
Definition 2.2.1
Let κ ∈ ℕ be a security parameter. An encryption scheme is defined by the following
spaces in (all depending on the security parameter κ) and algorithms in Table 1.
Table 1. Spaces and algorithms of an encryption scheme
𝑀κ The space of all possible messages.
𝑃𝐾κ The space of all possible public keys.
𝑆𝐾κ The space of all possible private keys.
𝐶κ The space of all possible ciphertexts.
KeyGen
A randomised algorithm that takes the
security parameter κ, runs in expected
polynomial time (i.e., 𝑂(κ^𝑐) bit operations
for some constant c ∈ ℕ) and outputs a
public key pk ∈ 𝑃𝐾κ and a private key sk
∈ 𝑆𝐾κ.
Encrypt
A randomised algorithm that takes as
input m ∈ 𝑀κ and pk, runs in expected
polynomial time (i.e., 𝑂(κ^𝑐) bit operations
for some constant c ∈ ℕ) and outputs a
ciphertext c ∈ 𝐶κ.
Decrypt
An algorithm (not usually randomised)
that takes c ∈ 𝐶κ and sk, runs in
polynomial time and outputs either m ∈
𝑀κ or the invalid ciphertext symbol ⊥.
It is required that
Decrypt(Encrypt(m, pk), sk) = m
if (pk, sk) is a matching key pair. It is also a requirement that the fastest known attack on
this system should require at least 2^κ bit operations.
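The correctness requirement Decrypt(Encrypt(m, pk), sk) = m can be illustrated with a toy textbook-RSA instantiation of the KeyGen/Encrypt/Decrypt interface. This is a sketch with fixed tiny primes and no padding, purely illustrative (a real scheme derives its key size from the security parameter κ, and RSA is not the scheme Bitcoin itself uses):

```python
import random

def keygen():
    # Toy parameters: two small fixed primes (completely insecure sizes).
    p, q = 61, 53
    n = p * q                      # modulus
    phi = (p - 1) * (q - 1)
    e = 17                         # public exponent, coprime to phi
    d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)
    return (n, e), (n, d)          # (pk, sk)

def encrypt(m, pk):
    n, e = pk
    return pow(m, e, n)            # ciphertext c = m^e mod n

def decrypt(c, sk):
    n, d = sk
    return pow(c, d, n)            # recovers m = c^d mod n

pk, sk = keygen()
m = random.randrange(2, pk[0])
assert decrypt(encrypt(m, pk), sk) == m   # the required correctness property
```

Note that only pk is needed to encrypt, while decryption requires the secret exponent in sk; the ciphertext itself is uninterpretable without it.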
2.3 The Elliptic Curve Digital Signature Algorithm (ECDSA)
The Elliptic Curve Digital Signature Algorithm creates a digital signature of the input
data. The digital signature is used to verify the authenticity of the underlying data without
compromising its security. The following sections discuss the mathematics of ECDSA
in detail.
2.3.1 Finite Fields
A finite field consists of a finite set of elements F [8]. The order of a finite field is the
number of elements in the field. A finite field of order q exists if and only if q is a prime
power. If q is a prime power, then there is essentially only one finite field of order q, and
it is denoted by 𝐹𝑞. If 𝑞 = 𝑝^𝑚, where p is a prime and m is a positive integer, then p is
called the characteristic of 𝐹𝑞 and m is called the extension degree of 𝐹𝑞. Most standards
which specify elliptic curve cryptographic techniques restrict the order of the
underlying finite field to be an odd prime (q = p) or a power of 2 (𝑞 = 2^𝑚).
2.3.2 The Finite Field 𝑭𝒑
Let p be a prime number. The finite field 𝐹𝑝 is called prime field. 𝐹𝑝 consists of the set of
integers {0, 1, 2, …, p-1}. The following operations are defined on 𝐹𝑝:
• Addition: If a, b ∈ 𝐹𝑝, then a + b = r, where r is the remainder when a + b is
divided by p and 0 ≤ r ≤ p – 1. This is known as addition modulo p.
• Multiplication: If a, b ∈ 𝐹𝑝, then a ∙ b = s, where s is the remainder when a ∙ b is
divided by p and 0 ≤ s ≤ p – 1. This is known as multiplication modulo p.
• Inversion: If a is a non-zero element in 𝐹𝑝, the inverse of a modulo p, denoted
𝑎^−1, is the unique element c ∈ 𝐹𝑝 for which a ∙ c ≡ 1 (mod p).
Example 1. The elements of the finite field 𝐹43 are {0, 1, 2, …, 42}. Examples of
addition, multiplication and inversion, respectively: 41 + 22 = 20, 4 ∙ 12 = 5, 9^−1 = 24.
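The three field operations can be checked directly in Python. A minimal sketch with p = 43, matching Example 1; `pow(a, -1, p)` computes the modular inverse (Python 3.8+):

```python
p = 43  # an odd prime, so the integers modulo p form the field F_p

def add(a, b):
    return (a + b) % p          # addition modulo p

def mul(a, b):
    return (a * b) % p          # multiplication modulo p

def inv(a):
    assert a % p != 0           # only non-zero elements are invertible
    return pow(a, -1, p)        # modular inverse via built-in pow

print(add(41, 22))  # 20
print(mul(4, 12))   # 5
print(inv(9))       # 24, since 9 * 24 = 216 = 5 * 43 + 1
```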
2.3.3 Elliptic curves over Finite fields
Let p > 3 be an odd prime. An elliptic curve E over 𝐹𝑝 is defined by an equation of the
form 𝑦² = 𝑥³ + 𝑎𝑥 + 𝑏, where a, b ∈ 𝐹𝑝 and 4𝑎³ + 27𝑏² ≢ 0 (mod p). The set 𝐸(𝐹𝑝)
consists of all points (x, y), x ∈ 𝐹𝑝, y ∈ 𝐹𝑝, which satisfy the defining equation, together
with a special point 𝒪 called the point at infinity.
Addition of two points on an elliptic curve 𝐸(𝐹𝑝) is defined according to the chord-
and-tangent rule. Let P = (x1, y1) and Q = (x2, y2) be two distinct points on an elliptic
curve E. The sum of P and Q, denoted R = (x3, y3), is defined as follows. First draw
the line through P and Q; this line intersects the elliptic curve in a third point. Then R is
the reflection of this point in the x-axis. The geometric description is depicted in Figure
1, where the curve is drawn over the real numbers for illustration and consists of an oval
and an unbounded branch.
Figure 1. Addition of two distinct elliptic curve points (Source: [8])
The double of P = (x1, y1), denoted R = (x3, y3), is defined as follows. A tangent line
is drawn to the curve at P. The second intersection of the tangent line and the elliptic
curve is −R. Then R is the reflection of −R in the x-axis. Figure 2 depicts this process.
Figure 2. The doubling of a point on an elliptic curve (Source: [8])
The algebraic formulas for the sum of two points and the double of a point are derived
from the geometric description.
1. 𝑃 + 𝒪 = 𝒪 + 𝑃 = 𝑃 for all 𝑃 ∈ 𝐸(𝐹𝑝).
2. If 𝑃 = (𝑥, 𝑦) ∈ 𝐸(𝐹𝑝), then (𝑥, 𝑦) + (𝑥, −𝑦) = 𝒪. (The point (𝑥, −𝑦) is
denoted by −𝑃, is called the negative of 𝑃, and it is indeed a point on the curve.)
3. (Point addition) Let 𝑃 = (𝑥1, 𝑦1), 𝑄 = (𝑥2, 𝑦2) ∈ 𝐸(𝐹𝑝), where 𝑃 ≠ ±𝑄.
Then 𝑃 + 𝑄 = (𝑥3, 𝑦3), where
𝑥3 = ((𝑦2 − 𝑦1)/(𝑥2 − 𝑥1))² − 𝑥1 − 𝑥2 and 𝑦3 = ((𝑦2 − 𝑦1)/(𝑥2 − 𝑥1))(𝑥1 − 𝑥3) − 𝑦1.
4. (Point doubling) Let 𝑃 = (𝑥1, 𝑦1) ∈ 𝐸(𝐹𝑝), where 𝑃 ≠ −𝑃. Then 2𝑃 = (𝑥3, 𝑦3), where
𝑥3 = ((3𝑥1² + 𝑎)/(2𝑦1))² − 2𝑥1 and 𝑦3 = ((3𝑥1² + 𝑎)/(2𝑦1))(𝑥1 − 𝑥3) − 𝑦1.
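The formulas above translate directly into code, with each division becoming a modular inversion. A minimal sketch over a small prime field, using the illustrative toy curve 𝑦² = 𝑥³ + 2𝑥 + 2 over 𝐹17 (parameters chosen here purely for demonstration, not taken from the thesis):

```python
p, a, b = 17, 2, 2          # toy curve y^2 = x^3 + 2x + 2 over F_17
O = None                    # the point at infinity

def on_curve(P):
    if P is O:
        return True
    x, y = P
    return (y * y - (x ** 3 + a * x + b)) % p == 0

def point_add(P, Q):
    # Chord-and-tangent addition implementing rules 1-4.
    if P is O:
        return Q                                           # rule 1
    if Q is O:
        return P                                           # rule 1
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                           # rule 2: P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # rule 4: tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # rule 3: chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

P = (5, 1)                  # a point on the curve: 1 = 125 + 10 + 2 (mod 17)
print(point_add(P, P))      # (6, 3), i.e. 2P
```

The doubling formula is the addition formula with the chord replaced by the tangent; note that for doubling, 𝑥3 = λ² − 2𝑥1 is exactly λ² − 𝑥1 − 𝑥2 with 𝑥2 = 𝑥1.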
2.3.4 Group order and group structure
Let E be an elliptic curve over a finite field 𝐹𝑞. According to Hasse's theorem, the number
of points on an elliptic curve (including the point at infinity) is #𝐸(𝐹𝑞) = 𝑞 + 1 − 𝑡,
where |𝑡| ≤ 2√𝑞. #𝐸(𝐹𝑞) is called the order of E and t is called the trace of E. In other
words, the order of an elliptic curve 𝐸(𝐹𝑞) is approximately equal to the size q of the
underlying field.
𝐸(𝐹𝑞) is an abelian group of rank 1 or 2. 𝐸(𝐹𝑞) is isomorphic to ℤ𝑛1 × ℤ𝑛2, where n2
divides n1, for unique positive integers n1 and n2. ℤ𝑛 denotes the cyclic group of order
n. Moreover, n2 divides q − 1. If n2 = 1, then 𝐸(𝐹𝑞) is said to be cyclic. In this case 𝐸(𝐹𝑞)
is isomorphic to ℤ𝑛1, and there exists a point 𝑃 ∈ 𝐸(𝐹𝑞) such that 𝐸(𝐹𝑞) = {𝑘𝑃 : 0 ≤ k
≤ n1 − 1}. Such a point 𝑃 is called a generator point of 𝐸(𝐹𝑞).
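For a small prime field the group order can be counted by brute force and checked against Hasse's bound. A sketch on the illustrative toy curve 𝑦² = 𝑥³ + 2𝑥 + 2 over 𝐹17 (an assumed example, not a curve used in practice):

```python
import math

p, a, b = 17, 2, 2

# Count all affine points (x, y) satisfying the curve equation, plus O.
order = 1  # start at 1 for the point at infinity
for x in range(p):
    for y in range(p):
        if (y * y - (x ** 3 + a * x + b)) % p == 0:
            order += 1

t = p + 1 - order                        # the trace of E
print(order, t)                          # prints "19 -1"
# Hasse: |t| <= 2*sqrt(p); checked with a slightly loosened integer bound.
assert abs(t) <= 2 * math.isqrt(p) + 1
```

The count here is 19, a prime, so the group is cyclic and every point other than 𝒪 is a generator; the trace t = −1 is well inside the Hasse interval |t| ≤ 2√17 ≈ 8.2.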
2.3.5 ECDSA domain parameters
The domain parameters for ECDSA are an elliptic curve E, defined over a finite field 𝐹𝑞
of characteristic p, and a base point 𝐺 ∈ 𝐸(𝐹𝑞). Restrictions are placed on the
underlying field size q, the representation of the elements of 𝐹𝑞, the elliptic curve E,
and the order of the base point. These restrictions are necessary to facilitate
interoperability and to avoid known attacks.
1. The field size q, usually an odd prime q = p, in which case the underlying finite
field is 𝐹𝑝, the integers modulo p.
2. An indication FR of the representation used for the elements of 𝐹𝑞.
3. An optional bit string seedE of length at least 160 bits.
4. Two field elements a and b in 𝐹𝑞 which define the equation of the elliptic curve E
over 𝐹𝑞.
5. Two field elements 𝑥𝐺 and 𝑦𝐺 in 𝐹𝑞 which define a finite point G = (𝑥𝐺, 𝑦𝐺) (also
called the generator point) of prime order in 𝐸(𝐹𝑞).
6. The order n of the point G, with n > 2^160 and n > 4√𝑞.
7. The cofactor ℎ = #𝐸(𝐹𝑞)/𝑛.
2.3.6 Randomly generating an elliptic curve
The following algorithm is a verifiably random method to generate an elliptic curve. The
algorithm will be referenced as Algorithm 1 in further explanations. The notations 𝑡 =
⌈log₂ 𝑝⌉, 𝑠 = ⌊(𝑡 − 1)/160⌋, and 𝑣 = 𝑡 − 160 ∙ 𝑠 are used.
Algorithm 1: Generating a random elliptic curve over 𝐹𝑝.
Input: A field size p, where p is an odd prime.
Output: A bit string seedE of length at least 160 bits and field elements 𝑎, 𝑏 ∈ 𝐹𝑝
which define an elliptic curve 𝐸 over 𝐹𝑝.
1. Choose an arbitrary bit string seedE of length 𝑔 ≥ 160 bits.
2. Compute 𝐻 = SHA256(𝑠𝑒𝑒𝑑𝐸) and let 𝑐0 denote the bit string of length 𝑣 bits
obtained by taking the 𝑣 rightmost bits of 𝐻.
3. Let 𝑊0 denote the bit string of length 𝑣 bits obtained by setting the leftmost bit of
𝑐0 to 0. (This ensures that 𝑟 < 𝑝.)
4. Let 𝑧 be the integer whose binary expansion is given by the 𝑔-bit string 𝑠𝑒𝑒𝑑𝐸.
5. For i from 1 to s do:
- Let 𝑠𝑖 be the 𝑔-bit string which is the binary expansion of the integer
(𝑧 + 𝑖) mod 2^𝑔.
- Compute 𝑊𝑖 = SHA256(𝑠𝑖).
6. Let 𝑊 be the bit string obtained by concatenating 𝑊0, 𝑊1, …, 𝑊𝑠 as follows:
𝑊 = 𝑊0 ∥ 𝑊1 ∥ ⋯ ∥ 𝑊𝑠.
7. Let 𝑟 be the integer whose binary expansion is given by 𝑊.
8. If 𝑟 = 0 or if 4𝑟 + 27 ≡ 0 (mod 𝑝), then go to step 1.
9. Choose arbitrary integers 𝑎, 𝑏 ∈ 𝐹𝑝, not both 0, such that 𝑟 ∙ 𝑏² ≡ 𝑎³ (mod 𝑝).
10. The elliptic curve chosen over 𝐹𝑝 is 𝐸 : 𝑦² = 𝑥³ + 𝑎𝑥 + 𝑏.
11. Output (𝑠𝑒𝑒𝑑𝐸, 𝑎, 𝑏)
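A minimal Python sketch of Algorithm 1 using hashlib. Two simplifications are my own assumptions: the full 256-bit SHA-256 digests are concatenated (so r may exceed p and is reduced mod p, whereas the 160-bit bookkeeping in the algorithm text stems from SHA-1-era standards), and step 9 uses the simple choice a = b = r mod p, which trivially satisfies r·b² ≡ a³ (mod p):

```python
import hashlib

def generate_curve(p: int, seed: bytes):
    """Derive curve coefficients a, b over F_p from a seed (Algorithm 1 sketch)."""
    t = p.bit_length()                 # t = ceil(log2 p) for non-power-of-two p
    s = (t - 1) // 160
    v = t - 160 * s
    g = 8 * len(seed)                  # seed length in bits (must be >= 160)
    H = int.from_bytes(hashlib.sha256(seed).digest(), "big")
    c0 = H & ((1 << v) - 1)            # the v rightmost bits of H
    W0 = c0 & ~(1 << (v - 1))          # clear the leftmost of the v bits
    z = int.from_bytes(seed, "big")
    W = W0
    for i in range(1, s + 1):
        s_i = ((z + i) % (1 << g)).to_bytes(g // 8, "big")
        W_i = int.from_bytes(hashlib.sha256(s_i).digest(), "big")
        W = (W << 256) | W_i           # concatenate the full 256-bit digests
    r = W
    if r == 0 or (4 * r + 27) % p == 0:
        raise ValueError("retry with a new seed")   # step 8's "go to step 1"
    a = b = r % p                      # step 9: r * b^2 = r^3 = a^3 (mod p)
    return a, b

# Example with the 192-bit prime 2^192 - 2^64 - 1 and an arbitrary 160-bit seed.
p = 2**192 - 2**64 - 1
a, b = generate_curve(p, seed=b"\x01" * 20)
assert (4 * a**3 + 27 * b**2) % p != 0   # the resulting curve is non-singular
```

Because a, b are derived deterministically from seedE via a hash, anyone can re-run the derivation and verify that the curve was not chosen with a hidden special structure.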
2.3.7 Domain parameter generation
There are several ways to generate cryptographically secure domain parameters. Some of
the methods used in practice are Koblitz curves [9], the Atkin-Morain method [10] and
Schoof's algorithm [11]. The following method is one way to generate secure domain
parameters:
1. Select coefficients a and b from 𝐹𝑞 verifiably at random using Algorithm 1. Let E
be the curve 𝑦² = 𝑥³ + 𝑎𝑥 + 𝑏.
2. Compute 𝑁 = #𝐸(𝐹𝑞).
3. Verify that 𝑁 is divisible by a large prime 𝑛 (𝑛 > 2^160 and 𝑛 > 4√𝑞). If not,
then go to step 1.
4. Verify that 𝑛 does not divide 𝑞^𝑘 − 1 for each 𝑘, 1 ≤ 𝑘 ≤ 20. If not, then go to
step 1.
5. Verify that 𝑛 ≠ 𝑞. If not, then go to step 1.
6. Select an arbitrary point 𝐺′ ∈ 𝐸(𝐹𝑞) and set 𝐺 = (𝑁/𝑛)𝐺′. Repeat until 𝐺 ≠ 𝒪.
2.3.8 ECDSA Key Pair generation, public key validation and proof of
possession of the private key
A specific ECDSA key pair is characterized by the elliptic curve domain parameters D = (q, FR, a, b, G, n, h). The entity that possesses the key pair must ensure that the domain parameters are valid. ECDSA key pair generation consists of the following three steps:
1. Select a random or pseudorandom integer d in the interval [1, 𝑛 − 1].
2. Compute 𝑄 = 𝑑𝐺.
3. The public key is 𝑄, the private key is d.
The private key is a randomly generated integer and the public key is derived from it by multiplying the base point by the private key.
The validation of the public key is required to avoid known attacks and errors, such as
malicious insertion of an invalid public key and inappropriate coding or transmission.
The following algorithm (Algorithm 2) validates that a public key is consistent with the domain parameters. It does not, however, guarantee that the corresponding private key exists or that the claimed owner possesses it.
Algorithm 2.: Explicit validation of an ECDSA public key.
Input: A public key 𝑄 = (𝑥𝑄 , 𝑦𝑄) associated with valid domain parameters:
(𝑞, 𝐹𝑅, 𝑎, 𝑏, 𝐺, 𝑛, ℎ).
Output: Acceptance or rejection of the validity of 𝑄.
1. Check that Q ≠ O.
2. Check that x_Q and y_Q are properly represented elements of F_q (integers in the interval [0, q − 1]).
3. Check that Q lies on the elliptic curve defined by a and b.
4. Check that nQ = O.
5. If any check fails, then Q is invalid; otherwise Q is valid.
The ECDSA signature generation and verification is the consequence of all previously
mentioned methods. Transmission of information between two parties and proof that the
message was originated from a trusted and authentic source is described in the following.
ECDSA signature generation is the signing of a message m. An entity A with domain
parameters 𝐷 = (𝑞, 𝐹𝑅, 𝑎, 𝑏, 𝐺, 𝑛, ℎ) and associated key pair (𝑑, 𝑄) does the following:
1. Select a random or pseudorandom integer 𝑘, 1 ≤ 𝑘 ≤ 𝑛 − 1.
2. Compute kG = (x1, y1) and convert x1 to an integer x1′.
3. Compute r = x1′ mod n. If r = 0, then go to step 1.
4. Compute 𝑘−1 𝑚𝑜𝑑 𝑛.
5. Compute 𝑆𝐻𝐴256(𝑚) and convert this bit string to an integer 𝑒.
6. Compute 𝑠 = 𝑘−1(𝑒 + 𝑑𝑟) 𝑚𝑜𝑑 𝑛. If 𝑠 = 0 then go to step 1.
7. A’s signature for the message m is (𝑟, 𝑠).
In order to verify A’s signature (𝑟, 𝑠) on m, B obtains an authentic copy of A’s domain
parameters 𝐷 = (𝑞, 𝐹𝑅, 𝑎, 𝑏, 𝐺, 𝑛, ℎ) and associated public key 𝑄. It is also recommended
for B to validate the domain parameters D and the public key Q. To verify the signature
B does the following:
1. Verify that 𝑟 and 𝑠 are integers in the interval [1, 𝑛 − 1].
2. Compute 𝑆𝐻𝐴256(𝑚) and convert this bit string to an integer 𝑒.
3. Compute 𝑤 = 𝑠−1 𝑚𝑜𝑑 𝑛.
4. Compute u1 = ew mod n and u2 = rw mod n.
5. Compute 𝑋 = 𝑢1𝐺 + 𝑢2𝑄.
6. If X = O, reject the signature. Otherwise convert the x-coordinate x1 of X to an integer x1′ and compute v = x1′ mod n.
7. Accept the signature if and only if 𝑣 = 𝑟.
If a signature (r, s) on a message m was indeed generated by A, then s = k^−1(e + dr) mod n. Rearranging gives
k ≡ s^−1(e + dr) ≡ s^−1 e + s^−1 rd ≡ we + wrd ≡ u1 + u2 d (mod n).
Thus u1 G + u2 Q = (u1 + u2 d)G = kG, and so v = r as required.
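The signing and verification steps, and the identity that makes them work, can be demonstrated with a short Python sketch (Python 3.8+ for `pow(x, -1, m)`). To keep the numbers readable it uses a toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G = (5, 1) of order n = 19, a textbook example rather than a curve used in practice, and SHA-256 as the hash.

```python
import hashlib

p, a, n = 17, 2, 19            # toy curve y^2 = x^3 + 2x + 2 over F_17
G = (5, 1)                     # base point of order n = 19

def ec_add(P, Q):
    """Add two curve points; None represents the point at infinity O."""
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = (3 * P[0] ** 2 + a) * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def ec_mul(k, P):
    """Double-and-add scalar multiplication kP."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def sign(d, m):
    """Sign message m with private key d (signing steps 1-7 above)."""
    e = int.from_bytes(hashlib.sha256(m).digest(), "big") % n
    for k in range(1, n):      # deterministic scan; use a random k in practice
        r = ec_mul(k, G)[0] % n
        if r == 0:
            continue           # step 3: retry with another k
        s = pow(k, -1, n) * (e + d * r) % n
        if s == 0:
            continue           # step 6: retry with another k
        return (r, s)
    raise RuntimeError("no valid k found")

def verify(Q, m, sig):
    """Verify signature (r, s) on m against public key Q (steps 1-7)."""
    r, s = sig
    if not (1 <= r < n and 1 <= s < n):
        return False
    e = int.from_bytes(hashlib.sha256(m).digest(), "big") % n
    w = pow(s, -1, n)
    X = ec_add(ec_mul(e * w % n, G), ec_mul(r * w % n, Q))
    return X is not None and X[0] % n == r

d = 7                          # example private key
Q = ec_mul(d, G)               # public key Q = dG
sig = sign(d, b"message")
```

Running `verify(Q, b"message", sig)` reproduces the algebraic identity above: X = u1 G + u2 Q equals kG, so its x-coordinate reduces to r.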
3. The blockchain of the Bitcoin network
Bitcoin is a virtual network with separated participants, rules and a digital currency. The
network consists of participants that operate the network, follow the same rules and thus
eliminate the need for a central authority. Users of the network can transfer digital
currency, called bitcoin, to other peers. Every transaction is validated by the network's operators and added to the public ledger, called the blockchain [22]. The blockchain is a
chain of blocks, which aggregate transactions. The security of the network is maintained
despite its publicity through emergent decentralised consensus between the network
operators or mining nodes by using cryptographic hash functions and by taking the
advantages of these functions. Public keys play the role of traditional bank account numbers: they are used to generate public addresses that can receive currency. Private keys represent the ownership of funds, which can be transferred to other peers of the network. I relied on the book Mastering Bitcoin: Unlocking Digital Cryptocurrencies [6] in the following investigation of the Bitcoin blockchain.
3.1 Private Keys
A private key authorizes its owner to access and spend the bitcoin funds, which belong to
a specific account or bitcoin address. A private key is a random number generated by a
cryptographically secure source of entropy. It can be any number between 1 and 1.1568 × 10^77 − 1, slightly less than 2^256 − 1. This number is the order of the elliptic curve that secp256k1 defines. In general, the SHA-256 algorithm (see Appendix, A.1. Secure
Hash Algorithm (SHA) for further details) is used to generate this number by feeding the
algorithm with a large string of random bits. The private key is almost never shown to the
owner. Different software wallets use different methods for the generation of a private
key, like using the underlying operating system random number generators to produce
256 bits of entropy or using the user's mouse movements for generation. A robust way is to feed the collected entropy through a one-way hash function, which produces a uniformly random-looking sequence of bits.
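This generation step can be sketched with Python's `secrets` module as the cryptographically secure entropy source, hashing the entropy with SHA-256 as described; the rejection loop enforces the valid range. The constant below is the published order n of secp256k1.

```python
import hashlib
import secrets

# Order n of the secp256k1 curve: valid private keys lie in [1, n-1]
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def generate_private_key() -> int:
    """Draw 256 random bits, hash them, and retry until the result
    falls in the valid range [1, n-1]."""
    while True:
        entropy = secrets.token_bytes(32)   # 256 bits of CSPRNG output
        d = int.from_bytes(hashlib.sha256(entropy).digest(), "big")
        if 1 <= d < N:
            return d

d = generate_private_key()
```

Because n is very close to 2^256, the rejection loop almost never has to repeat; it merely guards against the rare out-of-range draw.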
3.2 Public Keys
The public key is calculated from the private key using elliptic curve multiplication,
as described previously. The secp256k1 Koblitz curve (for further details, see
Appendix, A.2. The domain parameters of the Koblitz curve, secp256k1) with its
predefined properties is used to produce irreversible steps on an elliptic curve and to
create the public key. 𝐾 = 𝑘 ∗ 𝐺, where 𝑘 is the private key, 𝐺 is the generator point and
K is the public key. The operation is non-invertible in practice: it is computationally infeasible to recover the private key from the public key. Because G is the same for all bitcoin users, a private key
multiplied by G will always result in the same public key. The multiplication of the
generator point 𝐺 with 𝑘 is the same as adding 𝐺 to itself 𝑘 times in a row, according to
the mathematics of elliptic curves over finite fields. Figure 3. below shows the iterative process of drawing a tangent line at the point G, finding where it intersects the curve, and then reflecting that point on the x-axis. This procedure repeats itself for 2G, 4G, …, k·G.
Figure 3. Visualization of the multiplication of a point G by an integer k on an elliptic
curve (Source: [6])
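The derivation K = k·G can be sketched in Python with the published secp256k1 constants and textbook double-and-add point multiplication (Python 3.8+ for `pow(x, -1, m)`). This is an illustration only; production wallets use constant-time library implementations, and the example private key is an arbitrary choice.

```python
# secp256k1 domain parameters (published constants)
P  = 2 ** 256 - 2 ** 32 - 977                  # prime field modulus
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
G  = (Gx, Gy)

def ec_add(A, B):
    """Point addition on y^2 = x^3 + 7 over F_P (None = point at infinity)."""
    if A is None: return B
    if B is None: return A
    if A[0] == B[0] and (A[1] + B[1]) % P == 0:
        return None
    if A == B:
        lam = 3 * A[0] ** 2 * pow(2 * A[1], -1, P) % P
    else:
        lam = (B[1] - A[1]) * pow(B[0] - A[0], -1, P) % P
    x = (lam * lam - A[0] - B[0]) % P
    return (x, (lam * (A[0] - x) - A[1]) % P)

def pubkey(k: int):
    """K = k*G via double-and-add: G is added to itself per set bit of k."""
    R, A = None, G
    while k:
        if k & 1:
            R = ec_add(R, A)
        A = ec_add(A, A)
        k >>= 1
    return R

K = pubkey(112233445566778899)   # arbitrary example private key
```

Double-and-add needs only about 256 doublings and additions instead of k repeated additions, which is what makes K = kG computable while the reverse direction stays infeasible.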
3.3 Bitcoin addresses
A bitcoin address is like a bank account number. It represents an account that is eligible
for receiving bitcoins. It can be shared with anyone who wants to send bitcoins to the
owner of the account and it is also publicly available in bitcoin’s public ledger. Anyone
can query a specific address and the corresponding holdings; however, the account owner remains anonymous. Bitcoin addresses are produced from the public keys and begin with the
digit 1. SHA-256 and RIPEMD-160 hash functions are used in combination with the
public key K to produce a bitcoin address A. Equation (1) represents the generation of
A.
𝐴 = 𝑅𝐼𝑃𝐸𝑀𝐷160(𝑆𝐻𝐴256(𝐾)) (1)
Because the outer function is RIPEMD-160 [27], the resulting address is a 160-bit (20-byte) number. For the convenience of the user, Base58Check encoding is used by software wallets to represent a bitcoin address in a human-readable and shorter format. This type of encoding was developed for use in bitcoin. It is based on Base-64, which uses 26 lowercase letters, 26 capital letters, 10 numerals and two additional characters. Base-58 is Base-64 without 0 (number zero), O (capital o), l (lower L), I (capital i) and the symbols +, /. Base58Check additionally introduces a four-byte built-in
wallets check mistyped bitcoin addresses, so they do not get accepted as a valid
destination for a transaction, therefore the funds cannot get lost this way.
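The Base58Check step can be sketched with the Python standard library. The checksum is the first four bytes of a double SHA-256 over the version byte plus payload; the 20-byte input below is a placeholder of all zeros, where a real wallet would supply RIPEMD160(SHA256(K)).

```python
import hashlib

# Base-58 alphabet: no 0, O, l, I and no +, /
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload: bytes, version: bytes = b"\x00") -> str:
    """Encode version byte + payload + 4-byte double-SHA256 checksum."""
    data = version + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    data += checksum
    # Each leading zero byte is encoded as the character '1'
    leading = len(data) - len(data.lstrip(b"\x00"))
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    return "1" * leading + out

addr = base58check(b"\x00" * 20)   # placeholder hash160 of all zeros
```

Flipping any character of the result changes the recomputed checksum, which is how wallets detect mistyped addresses before accepting them as a destination.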
3.4 Transactions
Bitcoin transactions represent a transfer of value from one party to another. Like traditional currencies, bitcoin can be divided into smaller units, called satoshis. One bitcoin is equal to 10^8 satoshis. Bitcoin's decentralized system is resistant to traditional inflationary effects, because the maximum number of bitcoins is fixed at 21 million. However, not all of this quantity is circulating yet. New bitcoins are added to the system by the activity of miners, while they validate the genuineness of spendable transactions, until the circulating bitcoins reach 21 million.
When someone would like to send bitcoins to another party, the transaction has to be signed cryptographically by the appropriate private key, representing ownership. The
signed transaction is then propagated to a few nodes of the bitcoin network. These nodes validate the signature, and if the transaction is validated successfully, it is broadcast to more peers until it reaches every node. Transactions do not contain any confidential information about the users, so they can be propagated through insecure channels like NFC, Wi-Fi, etc. Once a transaction becomes valid, it is sent to a common pool that collects
transactions, called the memory pool. Operators of the bitcoin networks, called miners,
compete to summarize the collected transactions in a block, which is then added to the
blockchain, also called a public ledger. It is public, because every peer in the network can
check and query information about the anonymous transactions. The only thing that matters is that a person who wants to spend bitcoins actually has the right to spend them. The incentive behind the reliable activity of the miners is the newly created bitcoins issued with every new block. After every 210,000 blocks, the reward amount is halved, until the total number of circulating coins reaches the fixed amount. When the network
was launched with the mining of the so-called genesis block, the reward was 50 bitcoins
for every new block. As of 22 October 2018, the reward is 12.5 bitcoins.
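The reward schedule described above can be expressed in a few lines of Python, with values in satoshis so that 50 BTC is 5,000,000,000 satoshis; the example height of 546,000 is approximately the chain height in October 2018.

```python
def block_reward(height: int) -> int:
    """Block subsidy in satoshis: 50 BTC at launch, halved every
    210,000 blocks until it eventually rounds down to zero."""
    halvings = height // 210_000
    return (50 * 100_000_000) >> halvings

genesis_reward = block_reward(0)        # 50 BTC for the genesis era
reward_2018 = block_reward(546_000)     # two halvings in: 12.5 BTC
```

The right shift halves the integer subsidy exactly, which is also why the total supply converges to just under 21 million coins.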
3.4.1 Structure of a transaction
In general, there are two kinds of transactions: normal and Coinbase transactions. Normal transactions are used by parties to transfer value to each other. These transactions have inputs and outputs. Unlike traditional bank accounts, these input and output values belong to private keys instead of identities. Once a private key is lost, the corresponding funds are also lost forever, because of the huge address space. A Coinbase transaction is the first transaction in every new block and it only has outputs, usually to the address of the miner who successfully created the block.
Table 2. describes the data structure of a transaction.
Table 2. Data structure of a transaction
Field | Description | Size
Version | Version control for software updates and developments | 4 bytes
Input Counter | Number of inputs | 1-9 bytes (VarInt)
Inputs | Transaction inputs | Variable
Output Counter | Number of outputs | 1-9 bytes (VarInt)
Outputs | Transaction outputs | Variable
Locktime | Unix timestamp or block number | 4 bytes
Transactions form chains of arbitrary length. They lock spendable bitcoins, which change their owner from time to time. The chain can be inspected with an online block explorer by following transaction inputs recursively. The lock time field defines timing conditions for when the transaction can be added to the blockchain. If the field's value is above 500 million, it is interpreted as a Unix Epoch timestamp and the transaction is not included in the blockchain prior to the specified date. If the lock time is between zero and 500 million, it is interpreted as a block height (blocks are indexed by integer numbers, called the block height), which specifies the block index from which the transaction can be included in the blockchain.
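The interpretation rule for the lock time field can be captured in a small helper; the 500-million threshold is the protocol's rule, while the function itself is an illustrative sketch.

```python
LOCKTIME_THRESHOLD = 500_000_000   # below: block height; at or above: timestamp

def interpret_locktime(locktime: int):
    """Classify a transaction's lock time field per the rule above."""
    if locktime == 0:
        return ("none", 0)                      # no time lock at all
    if locktime < LOCKTIME_THRESHOLD:
        return ("block_height", locktime)       # earliest including block
    return ("unix_timestamp", locktime)         # earliest Unix Epoch time
```

A value of zero, the common case, means the transaction can be mined immediately.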
3.4.2 Transaction inputs and outputs
At the lowest level, every circulating quantity of bitcoin is locked to its owner by unspent transaction outputs, called UTXOs. UTXOs are the basic elements of every transaction. Hundreds or thousands of UTXOs can belong to an identity who wants to spend bitcoins. Wallet software, which provides convenient methods for using the bitcoin network, collects all UTXOs that belong to a specific person to display the available balance. The bitcoin network nodes also maintain a database that contains every UTXO and ownership pair. When a transaction is created, it consumes an adequate amount of UTXOs, unlocks them with the signature of the current owner, creates new UTXOs and locks them to the new owners. Although transactions are anonymous, with sophisticated methods the frequent use of the same bitcoin addresses can lead to a traceback to the owner. In consequence, mature wallet software takes advantage of different public address creation methods and the available address space by creating a change address for every transaction. This change address will hold the remaining value that is not spent by the user.
UTXOs are tracked by full-node bitcoin clients and stored in a database held in memory, called the UTXO pool. New transactions are created by consuming one or more of these unspent outputs. Locking scripts are used to specify the conditions which must be satisfied to spend the outputs, or coins. Table 3. describes the data fields of a transaction output.
Table 3. Data structure of a transaction output
Field | Description | Size
Amount | Transferable value denominated in satoshis | 8 bytes
Locking-script size | Locking script length in bytes | 1-9 bytes (VarInt)
Locking-script | A script that defines the conditions required to spend the output | Variable
Transactions are identified by their hashes, which are produced by the SHA-256 hash function. A transaction's inputs are pointers to UTXOs: each references a transaction hash and an output index that identifies the UTXO record in the blockchain. UTXOs can only be spent if the unlocking script satisfies the required conditions. The script contains a signature which proves ownership of the address that the UTXOs belong to. Table 4. describes the data fields of a transaction's input.
Table 4. Data structure of a transaction input
Field | Description | Size
Transaction hash | Pointer to the transaction containing the UTXO being spent | 32 bytes
Output index | The index number of the UTXO, starting from 0 | 4 bytes
Unlocking-script size | Unlocking-script length in bytes | 1-9 bytes (VarInt)
Unlocking-script | A script that satisfies the spending conditions | Variable
Sequence number | Currently disabled feature | 4 bytes
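Several fields in Tables 2-4 use bitcoin's variable-length integer (VarInt) encoding: values below 0xfd fit in a single byte, while larger values get a one-byte prefix (0xfd, 0xfe or 0xff) followed by 2, 4 or 8 little-endian bytes. A sketch of the encoder:

```python
def encode_varint(n: int) -> bytes:
    """Encode n with bitcoin's variable-length integer scheme."""
    if n < 0xFD:
        return n.to_bytes(1, "little")          # 1 byte, value itself
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")  # prefix + 2 bytes
    if n <= 0xFFFF_FFFF:
        return b"\xfe" + n.to_bytes(4, "little")  # prefix + 4 bytes
    return b"\xff" + n.to_bytes(8, "little")      # prefix + 8 bytes
```

This is why the counter fields occupy "1-9 bytes": a small transaction with a handful of inputs needs only one byte per counter.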
3.4.3 Transaction fees
Miners compete for bitcoin rewards that they earn by the successful summarization of the
transactions into a new block. The new block is then appended to the chain of blocks. The
winning miner also earns transaction fees for each transaction that is summarized into the
block. Mining fees and block rewards serve as an incentive to prevent any malicious
activity or abuse against the network. Transaction fees are calculated based on the transaction size in kilobytes. However, the network users who spend their bitcoins can also determine the fees that they are willing to pay to the miners. Miners prioritize transactions by fee, so common market forces between the peers prevail. There is a minimum fee that is currently fixed at 0.0001 bitcoin.
Transaction fees are calculated as the sum of the input UTXOs minus the sum of the output UTXOs, as described by Equation (2).
Fees = sum(inputs) − sum(outputs) (2)
For this reason, wallet software calculates the fees based on the current market
conditions that determine the prevailing fees on the market. In most applications fees are
also adjustable by the users, giving them the opportunity to prioritize their urgency.
The age of the UTXO-s that are being spent in a transaction input also determines the
priority of the transaction.
The priority is calculated with Equation (3).
Priority = sum(input value × input age) / transaction size (3)
The value is denominated in satoshis and the age of an input is measured in blocks elapsed since the transaction was recorded on the network; the age therefore expresses how many blocks deep in the blockchain the transaction is. High-priority transactions can be validated without any fees if the transaction fits into the remaining space of the block. The original block size was 1 megabyte, but with the adoption of a new bitcoin protocol extension called SegWit, the effective size is increased to about 2 megabytes. A transaction is considered high priority if its priority exceeds 57,600,000, which corresponds to one bitcoin (10^8 satoshis) aged one day (approximately 144 blocks) in a transaction with a size of 250 bytes, as described by Equation (4).
(100,000,000 satoshis × 144 blocks) / 250 bytes = 57,600,000 (4)
In every bitcoin block the first 50 kilobytes are reserved for high priority transactions,
without the consideration of the transaction fees. The remaining space is filled with
transactions that pay the minimum fee, prioritizing the highest fees on a per kilobyte basis.
The transactions that remain in the memory pool get older as new blocks are added to the
chain.
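Equations (2)-(4) translate directly into code. The UTXO values and ages below are made up for illustration; the 57,600,000 threshold comes from Equation (4).

```python
def transaction_fee(input_values, output_values):
    """Equation (2): fee = sum(inputs) - sum(outputs), in satoshis."""
    return sum(input_values) - sum(output_values)

def transaction_priority(inputs, tx_size_bytes):
    """Equation (3): priority = sum(value * age in blocks) / size in bytes."""
    return sum(value * age for value, age in inputs) / tx_size_bytes

HIGH_PRIORITY = 57_600_000   # threshold from Equation (4)

# Illustrative transaction: two inputs spent into one output
fee = transaction_fee([60_000_000, 50_000_000], [100_000_000])

# One input of 1 BTC aged 144 blocks in a 250-byte transaction
prio = transaction_priority([(100_000_000, 144)], 250)
```

The second example reproduces Equation (4) exactly, so `prio` lands right on the high-priority boundary.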
3.4.4 Transaction validation conditions
A bitcoin node verifies several criteria to consider a transaction valid. If the transaction satisfies all conditions, it is propagated to the connected nodes; otherwise it is discarded. The following list of criteria is validated when a transaction is received by a node:
• The transaction’s syntax and data structure must be correct
• Neither the list of inputs nor the list of outputs is empty
• The transaction size in bytes is less than MAX_BLOCK_SIZE
• Each output value and the total must be within the allowed range of values (more
than 0 and less than 21 million coins)
• None of the inputs is a Coinbase input (Coinbase transactions must not be relayed)
• nLockTime is less than or equal to INT_MAX
• The transaction size in bytes is greater than or equal to 100
• The number of signature operations contained in the transaction is less than the
signature operation limit
• The unlocking script (called scriptSig) can only push numbers on the stack and the locking script (called scriptPubkey) must match the standard forms (rejection of nonstandard transactions)
• A matching transaction in the pool or in a block in the main branch must exist
• For each input, if the referenced output exists in any other transaction in the pool, reject the transaction (prevention of double spending)
• For each input, look in the main branch and the transaction pool to find the
referenced output transaction. If the output transaction is missing for any input,
this will be an orphan transaction. Add this transaction to the orphan transactions
pool, if it is not already in the pool
• For each input, if the referenced output transaction is a Coinbase input, it must
have at least COINBASE_MATURITY (100) confirmations
• For each input there must be a referenced output that is not spent
• Reject if the sum of input values < sum of output values
• Reject if the transaction fee would be too low to get into an empty block
• The unlocking scripts for each input must validate against the corresponding
output locking scripts
3.4.5 Orphan transactions
Transactions form a chain: the parent transaction's outputs are spent by the child transaction, the child's outputs by the grandchild, and so on. There are different kinds of complex transactions, like CoinJoin transactions, where transactions are joined together by multiple parties to protect their privacy. In such cases, chains of transactions that depend on each other can arise. Transactions are transmitted between peers and do not always arrive in the same order. Because the child's signature is required before the parent is signed, a situation can emerge when a child references a parent transaction that is not yet known to the node. Instead of rejecting the transaction, the node puts it into a temporary pool known as the orphan pool. The transaction then waits in the pool until its parent arrives with the correct UTXO reference. The orphan pool is stored in memory and for this reason the total number of
transactions that can be stored is fixed by a constant called
MAX_ORPHAN_TRANSACTIONS.
3.5 The data structure of the blockchain
The blockchain forms a back-linked list of blocks, each containing an aggregation of transactions. Each block in the chain is identified by a hash, generated using the SHA-256 hash function, and by an integer index called the block height. Each block contains a reference to the previous block, called the parent block, within its header field. The links pointing back to the previous hashes constitute a chain in which every element is cryptographically connected to the others. The hashes, from the most basic level of aggregating the transactions up to linking the blocks to each other, are calculated based on the previous values. Therefore, if anyone tries to forge a value in a transaction or anywhere in the blocks, all subsequent links of the chain also change. Such changes are detected immediately by the validating nodes and the forged data is rejected. The property that every piece of information is encapsulated in a chain and relies on previous elements provides bitcoin with strong security.
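The tamper-evidence property can be demonstrated with a simplified hash chain; the "blocks" here are plain strings rather than real bitcoin blocks, and the genesis parent hash is an arbitrary placeholder.

```python
import hashlib

def block_hash(prev_hash: str, data: str) -> str:
    """Hash a simplified block: the parent's hash is part of the input,
    so changing any ancestor changes every later hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

def build_chain(blocks):
    """Link a sequence of block payloads into a chain of hashes."""
    hashes, prev = [], "00" * 32          # placeholder genesis parent
    for data in blocks:
        prev = block_hash(prev, data)
        hashes.append(prev)
    return hashes

original = build_chain(["tx: A->B 1", "tx: B->C 2", "tx: C->D 3"])
tampered = build_chain(["tx: A->B 9", "tx: B->C 2", "tx: C->D 3"])
```

Forging the very first payload changes its hash, and because each later block hashes its parent's hash, every subsequent hash changes too, which is exactly what validating nodes detect.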
3.5.1 A block’s data fields
A block is characterized by four data fields, each with a different length and meaning. The Block Size field contains the size of the block in bytes. The Block Header contains several fields whose combined size is 80 bytes. The Transaction Counter is
a variable integer and indicates the number of transactions that are settled in a block. The
Transactions field contains the recorded transactions for the block with a variable length.
Table 5. describes a block’s data fields.
Table 5. Data fields of a bitcoin block
Field | Description | Size
Block Size | The size of the block in bytes | 4 bytes
Block Header | Different fields that form the block's header | 80 bytes
Transaction Counter | Number of transactions in the block | Variable, from 1 to 9 bytes
Transactions | The transactions that construct the Merkle Tree | Variable
3.5.2 Block Header
The block header consists of six fields, each containing different metadata. Table 6. represents the data fields of a block's header.
Table 6. Data fields of a bitcoin block's header
Field | Description | Size
Previous Block Hash | A hash reference to the previous block in the chain | 32 bytes
Merkle Root | Hash of the Merkle Tree's root, summarizing the block's transactions | 32 bytes
Timestamp | The estimated creation time of this block (Unix Epoch) | 4 bytes
Difficulty Target | Difficulty target of the proof of work algorithm | 4 bytes
Nonce | A counter used for the proof of work algorithm | 4 bytes
Version | Software version number | 4 bytes
3.5.3 Merkle Trees
A block summarizes transactions in a data structure called Merkle Tree. A Merkle Tree
is a Binary Hash Tree which is used to efficiently summarize and verify the integrity of
large datasets. Merkle Tree’s structure is similar to the mathematical tree structure, except
it contains cryptographic hashes.
Transactions collected by the network have to be validated before they are encapsulated in a block. A Merkle tree is constructed in a recursive manner. Transactions that are collected in a pool are used as inputs to a one-way cryptographic hash function, usually the Secure Hash Algorithm 2 with 256-bit output (SHA-256). After hashing the transactions individually, the hashes are concatenated in binary pairs and the concatenations are hashed again. This process repeats recursively until only one hash remains: the Merkle tree's root.
Let's consider a simple example by constructing a Merkle tree. There are four transactions collected in the pool: A, B, C and D. Each transaction's data is hashed by applying SHA-256 twice:
HA = SHA256(SHA256(Transaction A))
HB = SHA256(SHA256(Transaction B))
The same procedure is repeated on every remaining transaction, in this example on C and D. These hashes are the leaves of the Merkle tree. A parent node is constructed from every binary pair by concatenating the two 32-byte hashes, producing a 64-byte string. On this string SHA-256 is applied twice to produce the parent node's hash, a 32-byte string.
HAB = SHA256(SHA256(HA + HB))
This process is repeated for every remaining leaf pair and then for the parents as well, as illustrated in Figure 4.
Figure 4. Aggregation of transactions in a Merkle tree structure
The top node of the Merkle tree is the Merkle root and it is stored in the block header, summarizing all of the underlying transactions' data. No matter how many transactions are included in a block, the Merkle root always summarizes them in 32 bytes.
The recursive construction of a Merkle tree generalizes to any even number of transactions, so trees of any size can be built. If there is an odd number of transactions, the last transaction hash is duplicated to create an even number of leaves, resulting in a balanced tree.
This data structure is very efficient for verification, because at most 2 × log2(N) calculations are needed to check whether a specific element is included in the tree.
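The recursive construction above, including the duplication of the last hash for an odd number of leaves, can be sketched as follows; the transaction payloads are placeholder byte strings.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used throughout bitcoin."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions):
    """Reduce a list of raw transactions to the 32-byte Merkle root."""
    level = [sha256d(tx) for tx in transactions]      # the leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:                       # odd count: duplicate last
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1])     # hash concatenated pairs
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx A", b"tx B", b"tx C", b"tx D"])
```

However long the input list, the loop halves it at each level, so the result is always a single 32-byte digest.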
3.6 Decentralised consensus through proof of work
Bitcoin mining is a process through which transactions are validated and added to the
public ledger by the network's mining nodes. Mining is incentivized by mining rewards that the competitors can earn with every new block creation. This reward is halved approximately every four years, or 210,000 blocks, until the reward reaches 1 satoshi. After about the year 2140, new bitcoins will no longer be issued and miners will exclusively receive rewards through mining fees. The main purpose of mining is to secure the bitcoin network
by forcing the network’s participants to individually validate every transaction. Validated
transactions become part of a block that is added to the blockchain. From then on, new owners of bitcoin can spend their received currency. New blocks are added to the blockchain by miners, who solve cryptographic hash puzzles by computing trillions of hashes, searching for the appropriate hash that matches the network's so-called difficulty target. This process is called proof of work, an algorithm through which decentralised consensus emerges and propagates through the bitcoin network.
Bitcoin has no central authority. Every node stores a copy of the public ledger it can trust. The decentralised consensus emerges from the independent operation of the mining nodes. Although their operation is independent, they follow the same rules. Mining nodes independently verify each transaction based on the list of criteria described in section 3.4.4. The transactions are aggregated into new blocks and a field value is added to the
block header which proves that the miner satisfied the work that is required to add a new
block. Every new block is verified by every node, then the new block is added to each
miner’s chain independently. The nodes select the main chain with the most cumulative
computation.
3.6.1 Aggregation of transactions
Transactions are validated immediately when they are received. Valid transactions are then added to the memory pool, where they are kept until they are mined. When a node receives a new block, it checks whether transactions in the memory pool are included in the new block and, if so, removes them. Transactions are prioritized by the age of the
UTXO that is being spent in their inputs. Transactions with high priority can be sent
without any fees. In every block the first 50 kilobytes of the transaction space are reserved
for high priority transactions, regardless of fees. The rest of the block is filled with
transactions that pay the minimum fee, preferring those with the highest fee on a per
kilobyte basis. If there is a remaining space in the block it can be filled with transactions
without fees. Transactions that remain in the memory pool get older, therefore their
priority will increase over time. Transactions are aggregated in a Merkle tree structure as described in section 3.5.3. When a node solves the hash puzzle, it constructs a generation transaction, or Coinbase transaction, that has no inputs and whose output references the miner's bitcoin address. The reward is calculated based on the block height and on the halving that occurs every 210,000 blocks. The mining fees are added to the reward and together they represent the output of the Coinbase transaction. The generation transaction has the
following data structure, as described with Table 7.
Table 7. Data structure of a Coinbase transaction
Field | Description | Size
Transaction hash | All bits are zero, because it is not a transaction hash reference | 32 bytes
Output index | All bits are ones | 4 bytes
Coinbase data size | Length of Coinbase data | 1-9 bytes (VarInt)
Coinbase data | Arbitrary data used for extra nonce and mining tags | Variable
Sequence number | All bits are ones | 4 bytes
3.6.2 Proof of work
Mining is the process through which trillions of hash values are created with the SHA-256 hash function [22]. After transactions are aggregated by a node, it creates a block header with the appropriate fields as described in section 3.5.2. The node then repeatedly hashes the block header, changing the Nonce field at every iteration, until the hash matches a criterion. Because the output of the hash function is unpredictable, the solution can only be found by trial and error, similar to a brute-force approach. The criterion that the hash value of the block header must satisfy is the network's difficulty target. The difficulty target is represented in a coefficient/exponent format, where the first two hexadecimal digits represent the exponent and the next six hexadecimal digits represent the coefficient. Equation (5). is used to calculate the difficulty target:
target = coefficient × 2^(8 × (exponent − 3)) (5)
Let's consider an example. A block explorer website, https://www.blockchain.com/, is used to search for block 318,516. On the site the bits field specifies 405675096, which in hexadecimal is 0x182E1C58. The exponent is 0x18 and the coefficient is 0x2E1C58. Using the formula, the target is 3,021,912 × 2^168 in decimal. Represented on 256 bits, this value has 66 leading zero bits. The miner who created block 318,516 had to produce a hash value that is less than this target, i.e., a hash value with at least 66 leading zero bits. While creating a new block, the winning miner produces hash values from the block header, varying the nonce field of the header until the hash value is less than the network's difficulty target. When the correct nonce is found, the block is created and propagated through the network, where each node independently validates the new block and adds it to its own blockchain.
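Equation (5) and the worked example can be reproduced with a short decoder; the bits value 0x182E1C58 is the one quoted above for block 318,516.

```python
def bits_to_target(bits: int) -> int:
    """Decode the compact 'bits' field: the top byte is the exponent,
    the remaining three bytes are the coefficient (Equation (5))."""
    exponent = bits >> 24
    coefficient = bits & 0xFFFFFF
    return coefficient * 2 ** (8 * (exponent - 3))

target = bits_to_target(0x182E1C58)             # block 318,516
leading_zero_bits = 256 - target.bit_length()   # zeros a valid hash must have
```

A valid block hash, read as a 256-bit integer, must be below this target, which for this block means at least 66 leading zero bits.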
The bitcoin network's difficulty target is dynamically adjusted by the protocol based on the computational or hashing power that operates the system. Blocks are created on average every 10 minutes and the difficulty is adjusted to keep this pace. If block creation is slower, the difficulty decreases; otherwise it increases. Because the difficulty is independent of the number of transactions, the hashing power reflects market forces as new miners enter the market to earn the reward.
3.6.3 Validation of a new block
Once a mining node finds a solution for the hash puzzle, it propagates the new block
through the network. The peers independently verify the block by checking the following
criteria:
• The block data structure is valid
• The block header hash is less than the target difficulty
• The block timestamp is less than two hours in the future
• The block size is within the limit
• The first transaction is a Coinbase transaction
• All transactions are valid
If a block does not satisfy the conditions, each node rejects it. If a block is rejected, the competition restarts; otherwise the race begins for the next block. The individual validation of every transaction and block enforces a common consensus among the nodes, preventing any node from cheating the system.
- 27 -
The decentralised consensus is achieved through the rules that every node follows to
validate transactions and blocks.
3.6.4 Blockchain forks
The bitcoin network’s topology is a loosely connected mesh like object where every node
is interconnected with a few other peers. Because peers are not connected with every other
node, the information propagation is limited in time. A situation can consist for a short
time, when two different newly mined blocks are added to the same chain, or in other
words, two different chains compete to be considered as the main chain. Due to the
bitcoin’s network protocol this situation happens on average every week. However,
according to the protocol nodes must select the longest chain with the most cumulative
difficulty that represents the most proof of work. Blockchain forks under normal
conditions are temporary inconsistencies between versions of the blockchain, which are
resolved by the reconvergence as new blocks are added to one of the forks. Blockchain
forks can also occur when there is an upgrade in the network’s protocol and a considerable
percentage of the nodes decide to follow the new rules. In this case, there is no
reconvergence and both chains will exist. This incident is called a hard fork.
4. Analyzing Bitcoin’s blockchain with deep learning
algorithms
Machine learning is a data analysis method that automates analytical model building. The purpose of machine learning algorithms is to let software applications predict outcomes without being explicitly programmed to do so. These algorithms operate on huge datasets, each containing millions of records described by several features. The learning algorithms require preprocessed datasets that fulfil the requirements of the specific algorithm in order to operate correctly and produce meaningful results. Several model architectures exist that can solve different mathematical problems by recognising hidden patterns in the datasets.
In this thesis I utilize the Bitcoin blockchain data in order to predict bitcoin's price and volatility. Each record of the dataset belongs to a specific date, thus it is a temporal dataset. It is not a time-series dataset, because the time intervals between the samples are not equal. The dataset must be split into train, validation and test sets in order to train and evaluate machine learning models. Because of the temporal property of the dataset, the three separate sets must be ordered sequentially in time. Different models are trained on the train set and, during training, their performance is evaluated on the validation set. After each model has finished the training process, it is applied to the test set in order to make predictions for bitcoin's price and volatility values. The train, validation and test sets contain 51,712, 6,463 and 6,459 records, covering 2017.01.02. to 2018.01.02., 2018.01.02. to 2018.02.15. and 2018.02.15. to 2018.04.05., respectively.
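A temporally ordered split like the one above can be sketched with pandas (the timestamp column name is an assumption for illustration):

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, n_train: int, n_val: int):
    """Split a chronologically ordered DataFrame into train, validation
    and test sets without shuffling, so each set covers a later period
    than the previous one."""
    df = df.sort_values("creation_time")   # hypothetical timestamp column
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test
```

With the thesis proportions this would be called with n_train=51712 and n_val=6463.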
In this thesis, first a graph representation of the transactions in each block is examined in order to forecast the aforementioned target values. The main idea behind this approach is the assumption that if unique structures characterise each block's transaction network under different market conditions, then bitcoin's price and volatility could be predicted from these transaction networks.
4.1 Collecting Bitcoin’s blockchain data
Data mining is a process through which data is collected, processed and transformed in order to feed machine learning algorithms with properly formatted data.
4.1.1 Blockchain.com API
Blockchain.com is a bitcoin block explorer website that provides an application programming interface (API)1. An API is a set of standardized requests that defines the proper way for an application to request services from another application. Because the bitcoin blockchain's size is hundreds of gigabytes, I used Blockchain.com's API to query bitcoin blocks.
Python is the most widely used language for machine learning problems, therefore I exploited its capabilities in this research to achieve my goals. Blocks, transactions, addresses and balances can be queried through blockchain.com's API in different ways. At first, I queried every block from 2017.01.02. to 2018.04.05. There is a specific HTTPS request provided by the API which enables users to query blocks. I wrote a function that generates datetime objects using the Python datetime library. The function generates datetime objects from the start to the end date, day by day, and then converts the dates to milliseconds, which is the date format the API query requires. The function then returns a list of millisecond timestamps. I made an HTTPS request for every element of the list to get the blocks mined on the specified days. For one call, the API responds with the blocks' heights, block header hashes and approximate creation times. The time property is in UNIX epoch format, a common format worldwide that measures the time elapsed since 1970.01.01. 00:00:00 UTC. The block hashes are needed for further data collection. Another API call that blockchain.com provides is a request through which individual blocks can be queried by their block header hashes. The individual blocks contain all the information described in the previous sections. Each block contains a list of transactions, with further lists of the inputs and outputs belonging to each transaction. I separated these transactions from each block's data in order to build and visualize them as mathematical graph structures.
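The date generation and block queries described above can be sketched as follows (the endpoint URL follows the blockchain.info API documentation; the response fields should be checked against it):

```python
import datetime

def daily_timestamps_ms(start: datetime.date, end: datetime.date):
    """Yield one UNIX-epoch millisecond timestamp per day, the format
    the block-listing query expects."""
    day = start
    while day <= end:
        dt = datetime.datetime(day.year, day.month, day.day,
                               tzinfo=datetime.timezone.utc)
        yield int(dt.timestamp() * 1000)
        day += datetime.timedelta(days=1)

def blocks_for_day(ms: int):
    """Fetch the summaries (height, hash, time) of blocks mined
    on the given day."""
    import requests  # third-party HTTP library
    url = f"https://blockchain.info/blocks/{ms}?format=json"
    return requests.get(url, timeout=30).json()
```

Each block summary's hash can then be passed to the per-block endpoint to download the full transaction lists.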
4.1.2 Drawing transaction networks with NetworkX
NetworkX is a Python package for the creation, manipulation and study of the structure, dynamics and functions of mathematical graphs [13]. Almost every graph structure and algorithm used for analyzing networks is implemented in this library. I used NetworkX to build a transaction network from each bitcoin block's transactions, separately.
1 https://blockchain.info/api downloaded at: 2018.09.20
I chose a class called MultiDiGraph, which is a graph type with directed edges that permits multiple directed edges between the same pair of nodes. In a bitcoin transaction, the inputs and outputs do not correspond to each other explicitly. Inputs are collected from remaining UTXOs and can be spent to different destinations, like multiple addresses and a change address that is used to provide more anonymity for the user. For this reason, I added an auxiliary node to every transaction that collects inputs from and emits outputs to addresses. Figure 5. illustrates the problem and my solution.
Figure 5. Illustration of a bitcoin transaction
Nodes of the MultiDiGraph network represent bitcoin addresses and edges represent transactions between them. The edges also store the bitcoin amount transferred between addresses, although these cannot be visualized efficiently because of the density of the networks.
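The auxiliary-node construction can be sketched with NetworkX as follows (the simplified (address, amount) pairs stand in for the real API fields):

```python
import networkx as nx

def add_transaction(g: nx.MultiDiGraph, tx_id: str, inputs, outputs):
    """Insert one transaction through an auxiliary node: input addresses
    point to the node, the node points to output addresses, and each
    edge stores the transferred amount."""
    aux = f"tx:{tx_id}"                    # auxiliary transaction node
    for addr, amount in inputs:
        g.add_edge(addr, aux, value=amount)
    for addr, amount in outputs:
        g.add_edge(aux, addr, value=amount)

g = nx.MultiDiGraph()
add_transaction(g, "abc123",
                inputs=[("addr_in1", 0.01615), ("addr_in2", 0.1897)],
                outputs=[("addr_out1", 0.2), ("addr_change", 0.00585)])
```

The amounts here mirror the example transaction discussed below Figure 6.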
Figure 6. depicts an example bitcoin transaction with two inputs and two outputs, which I drew and visualized with NetworkX and Matplotlib.
Figure 6. Illustration of a Bitcoin transaction created with NetworkX
It can be seen that the locking script, represented by the middle red node with a long character sequence, collects two UTXOs of 0.01615 and 0.1897 bitcoins, which were sent to two output addresses. One output address received 0.2 bitcoins and the other received the remainder of the input UTXOs. Presumably, the latter address was the change address, where the original owner of the coins kept the unspent remainder of his UTXOs.
I drew the graphs with the Fruchterman-Reingold algorithm, which is implemented in NetworkX. It is a force-directed algorithm; the goal of force-directed algorithms is to display graphs with a huge number of nodes and edges in an aesthetically pleasing way [14]. The algorithm places the nodes in two- or three-dimensional space so that as few edges as possible intersect each other. This is achieved by applying Hooke's law to the nodes: every pair of nodes repels each other, while adjacent nodes also attract each other. Alternatively, the acting force between the nodes can be calculated with the Kamada-Kawai algorithm, which assigns force values to node pairs proportional to the shortest path between them. After the system has converged to an equilibrium state, adjacent nodes have edges of roughly equal length, while non-adjacent nodes are placed farther from each other. In total, I drew 64,636 pictures of the transaction networks in Bitcoin blocks. Each picture depicts an individual block. Figure 7. shows some of the pictures with different densities.
Figure 7. Transaction networks in Bitcoin blocks
These pictures are trimmed with the Python Imaging Library (PIL), because NetworkX's plotting originally left superfluous white space at the edges. Each picture is identified by the corresponding block's height. The files were saved with the png extension, in 1024x1024 resolution.
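Drawing and trimming one block's picture can be sketched like this (layout and styling parameters are illustrative, not the exact settings used):

```python
import matplotlib
matplotlib.use("Agg")                      # render without a display
import matplotlib.pyplot as plt
import networkx as nx
from PIL import Image, ImageChops

def save_block_graph(g: nx.MultiDiGraph, height: int) -> str:
    """Draw a transaction graph with the Fruchterman-Reingold (spring)
    layout, save it as <height>.png, then crop the white margins."""
    pos = nx.spring_layout(g)              # Fruchterman-Reingold layout
    fig = plt.figure(figsize=(10.24, 10.24), dpi=100)   # 1024x1024 px
    nx.draw(g, pos, node_size=20, width=0.5)
    path = f"{height}.png"
    fig.savefig(path)
    plt.close(fig)
    img = Image.open(path).convert("RGB")
    background = Image.new("RGB", img.size, (255, 255, 255))
    bbox = ImageChops.difference(img, background).getbbox()
    if bbox:                               # crop to the non-white region
        img.crop(bbox).save(path)
    return path
```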
4.1.3 Additional features from blocks
I created additional features from each block’s data. These features are described in
Table 8.
Table 8. Features calculated from Bitcoin blocks
• Block height: the index of the block in the time order of creation
• Creation time: an approximation of when the block was created
• Number of transactions: the number of transactions contained in the block
• Block size: the size of the block in kilobytes
• Nonce: the data that solves the hash puzzle
• Block hash: the header hash of the block
• Average transaction size: the average transaction size in the block
• Mining fee: the mining fee denominated in Bitcoin
• Mining fee in USD: the mining fee denominated in USD
• All reward: the block creation reward plus the mining fee, denominated in Bitcoin
• All reward in USD: the block creation reward plus the mining fee, denominated in USD
• Difficulty target: the network's difficulty
• Total BTC output: all Bitcoin transferred in the block, denominated in Bitcoin
• Total BTC output in USD: all Bitcoin transferred in the block, denominated in USD
The first six features in the table can be extracted directly from each block's data fields.
The average transaction size can be calculated by iterating through every transaction and extracting its size from the 'size' data field.
The mining fee can be calculated from the Coinbase transaction, which is the first transaction in every block. The zero-indexed transaction's 'out' field's zero-indexed 'value' field contains the miner's earnings for the creation of the block. The earnings are denominated in Satoshis, thus I divided the value by 10^8 to get the Bitcoin representation. The mining reward was 12.5 Bitcoin throughout the period that I investigated, so I subtracted it from the earnings in order to get the mining fee.
The difficulty can be calculated as described in section 3.6.2. The block's 'bits' field contains the compact value, whose first two hexadecimal digits represent the exponent and the remaining six the coefficient. The result of the equation can then be calculated in the binary, hexadecimal or decimal numeral system. The result is an extremely large number, and for machine learning I had to represent it in decimal instead of hexadecimal.
The total bitcoin output of each block can be calculated with nested loops. The block's 'tx' field contains every transaction, and each transaction's 'out' field contains its outputs. These can be summed to get all the Satoshis transferred in the block, which can be converted to Bitcoin as previously described.
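The fee and output calculations can be sketched as below; the field names ('tx', 'out', 'value') follow the raw-block responses described earlier and should be treated as assumptions:

```python
SATOSHI = 10 ** 8
BLOCK_REWARD_BTC = 12.5      # block subsidy during the examined period

def block_features(block: dict) -> dict:
    """Derive the mining fee and the total output value from one raw
    block dictionary."""
    txs = block["tx"]
    # Coinbase output: the miner's earnings = subsidy + fees, in Satoshis
    earnings_btc = txs[0]["out"][0]["value"] / SATOSHI
    mining_fee_btc = earnings_btc - BLOCK_REWARD_BTC
    # sum every output of every transaction for the total BTC moved
    total_satoshi = sum(out["value"] for tx in txs for out in tx["out"])
    return {"mining_fee_btc": mining_fee_btc,
            "total_btc_output": total_satoshi / SATOSHI}
```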
Features denominated in USD were calculated by multiplying the appropriate features by Bitcoin's USD market value at the block's creation time. I obtained a minutely sampled price dataset from Kaggle, as I will describe in a later section. The features were collected in a Pandas DataFrame and saved as a comma-separated values ('.csv') file.
4.1.4 Storing data in HDF5 files
Hierarchical Data Format version 5 (HDF5) is a file format developed to store and hierarchically organize huge amounts of data [15].
The file structure is built from two kinds of objects:
• Datasets, which are multidimensional arrays of a single data type
• Groups, which contain datasets and further groups
Using this structure, a completely hierarchical file layout is created in which the stored data is accessed with a POSIX-like syntax: /path/to/resource. Additional metadata is stored in user-defined attributes, which are attached to either groups or datasets. The power of HDF5 lies in its ability to read and write huge amounts of data efficiently. An example of this is chunked storage, by which the user can pre-define an arbitrary smaller block size for a bigger dataset. These smaller chunks can then be accessed instead of the whole dataset, which might hardly fit into memory. For example, an image of size 1024x1024 can be stored in blocks of 64x64 pixels. The chunks are indexed with a B-tree to preserve their order. Different filter operations and compression techniques can be defined on the chunks. The filter operations include checksums, added metadata, or any other operation to be applied to the chunks. For compression, GZIP, SZIP, LZF or other third-party filters can be chosen.
My storing solution creates a file named 'BTC_dataset.hdf5', if it does not already exist, in the directory from which the main program is run. After creating the file, the program checks whether the file contains a group named transaction_matrices. If not, it creates one; then it appends the current block's network graph adjacency matrix and the graph picture to the transaction_matrices group, identified by the corresponding block height. I attached additional metadata to the datasets, like creation time, block height, nonce, total number of transactions, aggregated transaction fees, average cost per transaction, total output value and estimated transaction value. Some of these additional features are calculated from each block independently. The block height also identifies the blocks sequentially in time. I stored the adjacency matrices of the graphs with GZIP compression and the shuffle filter turned on. GZIP is the simplest and most portable compression method: every HDF5 version includes it and it operates on every HDF5 file. The shuffle filter reorders the bytes of the data to improve the compression ratio. I tried all built-in compression methods, and the smallest file sizes were achieved with GZIP. Figure 8. illustrates the storing structure of my solution.
Figure 8. The storing structure of blockchain data
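The storing routine can be sketched with h5py (the dataset naming and attribute keys follow my own conventions above):

```python
import numpy as np
import h5py

def store_block(path: str, height: int, adjacency: np.ndarray, meta: dict):
    """Append one block's adjacency matrix to the transaction_matrices
    group with GZIP compression and the shuffle filter, attaching the
    metadata as HDF5 attributes."""
    with h5py.File(path, "a") as f:            # create file if missing
        group = f.require_group("transaction_matrices")
        ds = group.create_dataset(str(height), data=adjacency,
                                  compression="gzip", shuffle=True,
                                  chunks=True)
        for key, value in meta.items():
            ds.attrs[key] = value

store_block("BTC_dataset.hdf5", 318516,
            np.zeros((8, 8), dtype=np.uint8),
            {"n_tx": 4, "creation_time": 1412899877})
```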
I also created another HDF5 file that contains the separated train, validation and test sets. These datasets can be represented as arrays with shapes of (51712, 128, 128, 3), (6463, 128, 128, 3) and (6459, 128, 128, 3), respectively. The first number of each shape is the number of pictures in the train, validation and test sets. The images of the block transaction networks were originally created in 1024x1024 pixel resolution, which is quite large for machine learning algorithms because of their GPU memory requirements. Therefore, I resized the pictures to 128x128 with the OpenCV (cv2) Python library [21]. The number 3 in the shapes corresponds to the 3 colour channels, RGB. The resized versions of the pictures previously introduced in Figure 7. can be seen in Figure 9.
Figure 9. Resized transaction graph pictures
I added six more vectors to the HDF5 file, which contain the target values for every dataset. Each dataset is coupled with weighted prices (the market value of Bitcoin denominated in USD) and volatilities, thus two target values. The test set also contains target values in order to verify and evaluate the machine learning models' performance after they have completed their training.
The array representation is needed because some learning models require time-distributed sequences of data. These models have a built-in memory and make predictions after sequences, rather than after each input.
4.1.5 Volatility estimators
In finance, volatility is the degree of variation of a trading price series over time [16]. It can be measured by the standard deviation or variance of returns. A tradable asset, like a security, currency or market index, is considered riskier the higher its volatility is. Historical volatility is measured from a time series of past market prices, while implied volatility is compared against historical volatility to judge whether options on an underlying asset are cheap or expensive.
Market data describing an underlying asset can be obtained at different time resolutions, for example by ticks (usually meaning seconds), minutely, hourly, 4-hourly, daily or weekly. These are the most common resolutions. Time intervals larger than tick-level are represented by 5 values: the underlying asset's open, high, low and close prices and its volume (OHLCV). Let's consider an example: for an interval from 14:00 to 15:00, the open price is the price at 14:00, the high and low prices are the highest and lowest prices of the asset during the examined period, and the close price is the price at 15:00. Volume is the quantity of the asset that changed hands during the trading period.
In the following, a few volatility estimators, their advantages and disadvantages will be
introduced. Table 9. contains notations that will be used.
Table 9. Notations used in volatility estimator formulas
• N: the chosen sample size
• F: a scaling factor, equal to the number of trading days in a year
• $o_i$: the $i$-th open price in a time interval
• $h_i$: the $i$-th highest price in a time interval
• $l_i$: the $i$-th lowest price in a time interval
• $c_i$: the $i$-th close price in a time interval
• $x'$: the average of the $x_i$ values, also called the drift
Volatility is defined as the annualised standard deviation of logarithmic returns. Close-to-close volatility is the usual measure for historical volatility. It requires at least 5 samples to be used.
Close-to-close volatility is calculated with Equation (6):

$\sigma_{cc} = \sqrt{\dfrac{F}{N-1}\sum_{i=1}^{N}\left(x_i - x'\right)^2}, \qquad x_i = \mathrm{Ln}\!\left(\dfrac{c_i}{c_{i-1}}\right)$  (6)
The Parkinson estimator is the first advanced volatility estimator, created by Parkinson in 1980. It uses high and low prices instead of closing prices. The drawback of the estimator is that it assumes continuous trading, therefore it underestimates the volatility, as potential movements while the market is shut are ignored. Today, there are exchanges that provide pre- and after-hours trading, which is isolated from normal trading hours and markets. These markets are characterized by high volatility and low liquidity.
The formula of the Parkinson estimator is represented by Equation (7):

$\sigma_{P} = \sqrt{\dfrac{F}{N}\cdot\dfrac{1}{4\,\mathrm{Ln}(2)}\sum_{i=1}^{N}\left(\mathrm{Ln}\dfrac{h_i}{l_i}\right)^2}$  (7)
An extension of the Parkinson estimator is the Garman-Klass estimator, which
includes opening and closing prices. It also underestimates the volatility because it
ignores overnight jumps.
The Garman-Klass estimator is represented by Equation (8):

$\sigma_{GK} = \sqrt{\dfrac{F}{N}\sum_{i=1}^{N}\left[\dfrac{1}{2}\left(\mathrm{Ln}\dfrac{h_i}{l_i}\right)^2 - \left(2\,\mathrm{Ln}(2)-1\right)\left(\mathrm{Ln}\dfrac{c_i}{o_i}\right)^2\right]}$  (8)
The Garman-Klass estimator was modified by Yang and Zhang in order to handle overnight jumps. The measurement assumes zero drift, hence it overestimates the volatility if the underlying asset has a non-zero mean return.
The modified formula is described by Equation (9):

$\sigma_{GKYZ} = \sqrt{\dfrac{F}{N}\sum_{i=1}^{N}\left[\left(\mathrm{Ln}\dfrac{o_i}{c_{i-1}}\right)^2 + \dfrac{1}{2}\left(\mathrm{Ln}\dfrac{h_i}{l_i}\right)^2 - \left(2\,\mathrm{Ln}(2)-1\right)\left(\mathrm{Ln}\dfrac{c_i}{o_i}\right)^2\right]}$  (9)
Bitcoin is traded on so-called cryptocurrency exchanges. These exchanges allow customers to trade digital currencies, like Bitcoin, for other digital assets or traditional fiat money. The main difference between traditional and crypto exchanges is that the latter operate continuously, without closing hours. Crypto exchanges usually provide functional APIs to their customers for implementing automated trading based on various strategies.
In my experiment, I used the so-called BVOL Annualized Historical Volatility Index, which is a common estimator for Bitcoin's volatility in the crypto community.
The calculation of the index is represented by Equation (10):

$\mathrm{BVOL\ Index} = \mathrm{Stdev}\left(\mathrm{Ln}\dfrac{P_1}{P_0},\ \mathrm{Ln}\dfrac{P_2}{P_1},\ \ldots,\ \mathrm{Ln}\dfrac{P_i}{P_{i-1}}\right)\cdot\sqrt{365}$  (10)
For $P_i$, I used the weighted prices provided in the minutely sampled Bitcoin price dataset, which I obtained from Kaggle2. This dataset, called Bitcoin Historical Data, contains one-minute Bitcoin price data from the Bitstamp and Coinbase exchanges and is updated frequently. I resampled it to 5-minute intervals, which means I obtained a dataset that contains a volatility value for every 5 minutes. In Equation (10), 365 denotes the trading days of Bitcoin in a year; I replaced this value with 288, which is the number of 5-minute intervals in a day.
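The index calculation with the adjusted scaling factor can be sketched with pandas (the rolling-window handling is illustrative; the Kaggle column names should be checked against the dataset):

```python
import numpy as np
import pandas as pd

def bvol_index(prices: pd.Series, window: int, periods: int) -> pd.Series:
    """Rolling BVOL-style volatility: the standard deviation of
    logarithmic returns over `window` samples, scaled by the square
    root of the number of periods (288 for 5-minute sampling)."""
    log_returns = np.log(prices / prices.shift(1))
    return log_returns.rolling(window).std() * np.sqrt(periods)

# e.g., with minutely weighted prices resampled to 5-minute intervals:
# vol = bvol_index(weighted_5min, window=288, periods=288)
```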
4.2 Deep learning
Artificial intelligence, or AI, was born in the 1950s. The field first emerged in order to automate, with computers, intellectual tasks that humans normally perform. AI is a general field that encompasses machine learning and deep learning, but also includes other approaches that do not involve any learning [23].
Symbolic AI was the dominant paradigm in AI from the 1950s to the late 1980s. In this approach, experts believed that human-level artificial intelligence could be achieved with a sufficiently large set of explicit rules programmed into machines. Symbolic AI provided satisfactory solutions to logical problems, such as playing board games. However, it turned out to be unsuitable for more complex tasks where explicit rules could not be given, like fuzzy problems, image classification, categorization tasks, speech recognition and language translation. A new subset of AI emerged, with the help of mathematicians, called machine learning. Questions like whether a computer could automatically learn rules by looking at data, and learn on its own how to perform a specified task, led to a new programming paradigm. In machine learning, humans input the data together with the answers expected from the data, and the models create the rules. After the models are trained, these rules can be applied to new data to produce answers. Machine learning models are trained on huge amounts of data to find statistical structures in them and come up with rules that automate specific tasks. Therefore, they are trained rather than explicitly programmed. Machine learning started to become more popular in the 1990s,
2 https://www.kaggle.com/mczielinski/bitcoin-historical-data#bitstampUSD_1- min_data_2012-01-
01_to_2018-06-27.csv, downloaded at: 2018.09.20.
when hardware capable of performing the large number of calculations these models need became available.
Every machine learning task requires input data that properly describes the feature space and from which the output can presumably be calculated. Examples of the expected outputs are needed to bind the specific inputs to the desired outputs. Mathematical functions are used to measure the performance of the machine learning algorithms and to make adjustments that optimize them. This optimization, during which the parameters of the algorithm are updated, is called learning. Machine learning models consist of layers. These layers learn different, unique representations of the input data to associate inputs with outputs. Each layer is parameterized by its weights. During training, the optimal parameters of the layers are searched for, so that the whole network correctly maps the inputs to the associated targets. The loss function measures how far the network's current output is from the true target by calculating a distance score. An optimizer then adjusts the weights in a direction that lowers the loss score for the current inputs. The gradients that drive these weight updates are computed with backpropagation, the algorithm that made training multi-layer networks practical and enabled machine learning to gain ground. The weights are initially set to random values and are updated after a batch of inputs. A batch can be the whole dataset, a subset of it, or even a single input. The typical batch sizes used are powers of two. There is no explicit rule to determine the batch size that results in the best performance; it is a matter of trial and error. An epoch is one cycle through which the whole training dataset is fed to the network once. Usually, models are trained for tens or hundreds of epochs, as long as their performance improves. Figure 10. shows a block diagram of a machine learning model's training process.
Figure 10. A block diagram of a machine learning model
Deep learning is a specific subfield of machine learning. It is called deep because networks in this field often have tens or even hundreds of successive layers. The main idea behind this approach is that different layers learn different patterns and hierarchical representations of the input data. A model's depth is the number of layers it has. Neural networks are the most often used models in deep learning. They were initially developed from theories based on the understanding of the human brain, but currently there is no evidence that the brain implements mechanisms like those used in deep learning models.
In the early 2000s, companies focusing on the production of massively parallel chips called graphics processing units, or GPUs, developed products that became capable of running huge deep neural networks, which require millions of matrix multiplications and tensor operations. These chips were first used by gamers to render complex 3D scenes in real time, but later AI experts wrote implementations of neural networks to run on GPUs. Today, the most advanced chips are capable of executing hundreds of teraflops, where one teraflop is 10^12 floating-point operations per second. Deep learning has reached many real-world applications, and large companies have started to develop specialized hardware, like Google's tensor processing unit, or TPU. Nowadays, several libraries contain implementations of deep learning models, like Keras, Theano, Tensorflow and Pytorch. These libraries can be installed to support CPU or GPU devices, depending on the user's equipment.
The most advanced deep learning applications and pioneering breakthroughs are the
followings:
• Language translation
• Near-human level autonomous driving
• Digital assistants
• Improved ad targeting systems
• Improved search engines
• Chatbots
• Board and computer games played by AI, which defeats humans
Artificial intelligence continuously transforms how people live. Although there are pioneering achievements that are the results of AI, the technology's true potential has likely not yet surfaced.
4.3 Predicting price and volatility with different architectures
Convolutional neural networks, also known as convnets, are used in computer vision applications. These networks are commonly used for image-classification problems. Different convnet architectures have been designed and implemented by groups of artificial intelligence experts from large technology companies. The mathematical building blocks of these models are implemented in several machine learning libraries, like Keras. The models are trained on huge datasets containing millions of images. After the models complete training, their inner state with the learned weights is saved to files. The saved files can later be loaded with the learned weights in order to reuse these models. Alternatively, the models can be retrained and modified arbitrarily, to exploit only their architectures and reuse them to solve different problems. In this thesis, I chose the following architectures in order to feed them with the transaction graph pictures:
• Inceptionv3 [17]
• MobileNet [18]
• NASNet Mobile [19]
• DenseNet 121 [20]
In Keras, models and layers can be stacked on top of each other. Therefore, in all cases when I used the aforementioned models, the implementation of the final architecture was the same. First, I added the convolutional base to the sequence, for example Inceptionv3. I loaded the model with a 128x128x3 input shape and specified in the arguments not to include pre-trained weights or the top layer. In the original architectures, the top layer was used for classification tasks, but volatility and weighted price prediction is a regression problem where the target values are continuous. After the convolutional base, I added a Flatten layer, which turns tensors into one-dimensional vectors. It is followed by a fully-connected Dense layer of size 512 with a rectifier activation function, called ReLU. The top layer I finally added is a Dense layer with 1 neuron and a linear activation function. I set the convolutional base to trainable in order to make its weights updatable.
I chose the Root Mean Square Propagation (RMSprop) optimizer to update the model weights during training. I left the optimizer parameters at their defaults, as suggested by the developers of Keras. The loss function that I monitored was the Mean Squared Error, or MSE.
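The stacked architecture and compilation step can be sketched in Keras as follows (import paths may differ between Keras/TensorFlow versions):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

# convolutional base without pre-trained weights or the top layer
base = InceptionV3(weights=None, include_top=False,
                   input_shape=(128, 128, 3))
base.trainable = True                       # keep the base updatable

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="linear"),   # single regression output
])
model.compile(optimizer="rmsprop", loss="mse")
```

Swapping the first element of the Sequential list for MobileNet, NASNet Mobile or DenseNet 121 yields the other three variants.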
Callbacks are functions that can be applied to influence the models at given stages of the training procedure. They can be used to view internal states and make pre-defined adjustments. I used EarlyStopping to monitor the validation loss during training with a patience value of 20: this function stopped the training process if the validation loss had not decreased for 20 epochs, or training cycles. I set the initial learning rate to 0.1, which was reduced by a factor of 0.02 if the validation loss had not improved for the last 5 epochs. This was achieved with the ReduceLROnPlateau function. I exploited ModelCheckpoint's capabilities to save the best model with the corresponding weights to an HDF5 file during training. Each model's learning attributes were saved to comma-separated values ('.csv') files with CSVLogger after each epoch.
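The callback configuration can be sketched like this (the file names are illustrative placeholders):

```python
from tensorflow.keras.callbacks import (CSVLogger, EarlyStopping,
                                        ModelCheckpoint, ReduceLROnPlateau)

callbacks = [
    EarlyStopping(monitor="val_loss", patience=20),
    ReduceLROnPlateau(monitor="val_loss", factor=0.02, patience=5),
    ModelCheckpoint("best_model.hdf5", monitor="val_loss",
                    save_best_only=True),
    CSVLogger("training_log.csv"),
]
# the list is passed to model.fit(..., callbacks=callbacks)
```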
In order to feed each model with data, I used the ImageDataGenerator class. I rescaled each image's pixels to between 0 and 1, which can be done at the initialization of the class. ImageDataGenerator has different methods to generate batches of data. I chose the 'flow_from_dataframe' method, because my target values were stored in '.csv' files with the corresponding block height values, which also identify the pictures of the transaction networks. Keras automatically infers the file extension from the names if the extension is not provided.
Table 10. depicts the Pandas DataFrame object, which was used to generate the input
and target values for each convnet.
Table 10. Table of input and target values
In the generator function I set the class mode to 'other', which is the parameter that should be used for regression; the shuffle parameter to 'False', in order to keep the temporal property of the dataset; the target size to 128x128; and the batch size to 16. I tried larger batch sizes as well, but all of them resulted in resource-exhausted errors due to lack of GPU memory.
For the test generator, I moved the test dataset pictures to a separate directory called test. Keras' 'flow_from_directory' method yielded the test files from this directory in order to test the models after training.
I initially set the number of epochs to 100, but all training processes exited after around 30 epochs, since the validation loss had not improved and the callback function therefore stopped the training.
The following figures, Figure 11. and Figure 12. summarize the performance of the
models during the training:
Figure 11. Train and validation loss, when the price of Bitcoin was the target value
Figure 12. Train and validation loss, when the volatility of Bitcoin's price was the target
value
It can be seen in Figures 11-12. that the train and validation losses decreased from very
large initial value ranges. They converged after a few epochs to intervals where they
settled and oscillated for the remaining epochs.
Figures 13-14. show the convergence of the loss functions in more detail.
Figure 13. Enlarged picture of train and validation loss, when weighted price was the
target
Figure 14. Enlarged picture of train and validation loss, when volatility was the target
Although the loss of each model decreased, a large amount of loss remained. It was
therefore expected that the models could not learn unique patterns from the transaction
networks to predict the target values, i.e., the weighted prices and volatilities.
Figures 15-18. illustrate the predictions for both target values. I zoomed into some
diagrams for better visibility.
Figure 15. Each model’s prediction for Bitcoin's weighted price
Figure 16. Each model's prediction for the volatility of Bitcoin's price
Figure 17. Each model's prediction for the volatility of Bitcoin's price
Figure 18. Each model's prediction for the volatility of Bitcoin's price
Figures 15-18. confirm the previous assumption that the models were not able to learn
the target values from the pictures. In the case of the weighted price targets, only
DenseNet could predict a notable range of values; the other models averaged the targets
and predicted constant values. For the volatilities, NasNet could predict highly
oscillating values.
The training of each model took up to 2-3 days on an Nvidia Titan X GPU with 12 GB
of memory, access to which was provided by my department. I tried a data augmentation
technique (random rotation of the pictures), training with grayscale images, and dividing
the datasets into volatile and non-volatile periods as well, but none of the attempts
ended with different results. Training the models with images of a higher resolution
than 128x128x3 pixels could also be tried, but due to the lack of GPU memory it was
not an option.
4.3.1 Determining the number of transactions from the transaction graphs
As the previous section revealed, the different architectures were not able to associate
the pictures of the transaction networks with price fluctuations and price values.
However, I carried out further experiments in which I attached different target values to
the convnets, such as the number of transactions in each block.
Figure 19. shows the predictions of the different convnet architectures for the number of
transactions. It is very interesting that three different models were able to determine the
number of transactions from the pictures alone. This means that the pictures have some
representational ability. The best predictors were NasNet and InceptionV3; both
achieved a test Root Mean Squared Error (RMSE) of 157.
Unfortunately, this experiment has no practical application, because the number of
transactions in a block can be explicitly queried from the blockchain. However, the results
are interesting and provide reasons for further research.
Figure 19. Predictions of convnets for the number of transactions
4.4 Different approaches for predictions, system usage, extensions
I devote this chapter to introducing new experiments and results, future investigation
opportunities and the application of an operative prediction system.
Recurrent neural networks (RNN) process sequences of data by iterating through the
elements of the sequences. These networks have an internal loop in order to maintain a
state that contains information about the input sequence. Simple RNNs are unable to
learn long-term dependencies due to the vanishing gradient problem[28], which arises
from the layer depth of neural networks. The Long Short-Term Memory (LSTM)
algorithm was developed to solve the vanishing gradient problem. It is capable of
carrying information across several timesteps; hence the algorithm has a built-in
memory.
4.4.1 Analyzing the correlations of block features with market data
The block features that I calculated from each Bitcoin block and presented previously are
continuous variables. The values of these variables provide information about the Bitcoin
blockchain.
The diagrams in Figure 20. reflect the values and the corresponding 150-long moving
averages of the block features, Bitcoin's price and volatility. It can clearly be seen that
Bitcoin's price started a long-term rally around July 2017. There is a correlation between
the upward tendency of the number of transactions in a block, the mining fee and the
total Bitcoin amount in a block and Bitcoin's price, although this does not hold for the
whole bull market, when Bitcoin's price was on an uptrend. The number of transactions
and the mining fee started to increase around August 2017. The Bitcoin amount
transferred in each block started to increase, with a little lag, around October 2017. It
reached its peak value well before the bear market started, around 2017.12.18.
It is noticeable that as Bitcoin's price started a downward tendency, the usage of the
network also started to drop. The drop in the number of transactions and in the amount
of transferred Bitcoins confirms this statement. As Bitcoin mining became less
profitable, the network's difficulty also decreased, because significant hashing power
left the system. The network automatically adjusted its difficulty target according to the
hashing power present in the system.
Figure 20. Diagrams of block features and their 150 long moving averages
The following matrix on Figure 21. describes the correlation between the previously
mentioned variables:
Figure 21. Correlation matrix of block features and market data
The correlation matrix in Figure 21. represents the correlation coefficients between
the variables. The closer a coefficient is to 1, the stronger the correlation; coefficients in
the negative territory represent negative correlation. I created the matrix by resampling
the dataset at a daily frequency, which means that I averaged the variables on a per-day
basis. The matrix shows that there are positive correlations between Bitcoin's price, the
mining fee, the size of the transactions and, trivially, the volatility. However, because
this matrix was created from both bull and bear market data, the correlation values are
misleading.
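The daily resampling and correlation computation can be sketched with Pandas on a synthetic stand-in dataset; the column names and values below are illustrative assumptions, not the thesis's exact block features.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the per-block dataset: in the thesis the rows are
# Bitcoin blocks; the column names are illustrative assumptions.
rng = np.random.default_rng(0)
idx = pd.date_range('2017-07-01', periods=30 * 144, freq='10min')
trend = np.linspace(0.0, 1.0, len(idx))
df = pd.DataFrame({
    'weighted_price': 2500 + 2000 * trend + rng.normal(0, 30, len(idx)),
    'mining_fee': 0.5 + 0.4 * trend + rng.normal(0, 0.05, len(idx)),
    'n_transactions': 1500 + 800 * trend + rng.normal(0, 50, len(idx)),
}, index=idx)

# Resample to a daily frequency by averaging the variables per day,
# then compute the Pearson correlation matrix.
daily = df.resample('D').mean()
corr = daily.corr()
print(corr.round(2))
```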
The following two figures, Figure 22-23. represent the correlation matrices of
separated bull and bear markets:
Figure 22. Correlation matrix of Bitcoin’s bull market
Figure 20. clearly shows the fluctuation of the correlating features during the bull trend
of Bitcoin's price. Therefore, in Figure 22. the aforementioned correlations are not as
strong as in the following bear market, where the value of the reward, the block size, the
mining fee, the number and size of transactions in a block and the total circulating
Bitcoins in the network strongly correlated with the downward movement of Bitcoin's
price.
The correlations during the bear market are visible in Figure 23. The conclusion of
the analysis is the following: during a bull market, the utilization of Bitcoin's blockchain
by the network's users increases, while during a bear market it decreases.
Figure 23. Correlation matrix of Bitcoin’s bear market
4.4.2 Long Short-Term Memory network with block features
LSTM networks are designed to process sequential or time series data. These networks
are capable of utilizing previous values of a sequence in order to forecast the next values.
The lengths of the input and output sequences, as well as the time lag, can be defined
arbitrarily. The time lag is the number of time steps left out between the input and the
output.
I trained LSTM models with 300 neurons. I used every block feature, with different
sequence lengths, in order to predict the next price value. I divided the dataset into 10-
and 50-long sequences.
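A minimal sketch of such an LSTM regressor, assuming eight block features and the Adam optimizer (the text does not name the optimizer); the windowing helper implements the one-step-ahead targeting described above.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQ_LEN = 50       # the other sequence length that was tried is 10
N_FEATURES = 8     # assumed number of block features

def make_sequences(features, prices, seq_len):
    """Slice the temporal dataset into (sequence, next-price) pairs."""
    X, y = [], []
    for i in range(len(features) - seq_len):
        X.append(features[i:i + seq_len])
        y.append(prices[i + seq_len])   # one time step ahead
    return np.array(X), np.array(y)

# One LSTM layer with 300 neurons and a linear output for regression.
model = Sequential([
    Input(shape=(SEQ_LEN, N_FEATURES)),
    LSTM(300),
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# Smoke test with random data standing in for the real block features.
X, y = make_sequences(np.random.rand(60, N_FEATURES), np.random.rand(60), SEQ_LEN)
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:2], verbose=0).shape)  # (2, 1)
```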
Figure 24. demonstrates the training and validation loss of the models; Figures 25-28.
show their corresponding predictions.
Figure 24. Training and validation loss of LSTM models with different sequence lengths
Figure 25. Predictions of the LSTM model trained on 50-long sequences for 100 epochs
Figure 26. Predictions of the LSTM model trained on 50-long sequences for 400 epochs
Figure 27. Predictions of the LSTM model trained on 10-long sequences for 400 epochs
Figure 28. Predictions of the LSTM model trained on 10-long sequences for 1000 epochs
There is visible underfitting to the dataset in Figure 25. The LSTM model trained on
50-long sequences produced better results when the training took more epochs. The
same statement holds true for the LSTM model trained on 10-long sequences, although
the latter obviously produced better results after 100 epochs than the model trained on
50-long sequences.
These experiments were carried out with different lengths of sequential input data, and
only a one-step time lag was attached to each input. It can clearly be seen in Figure 28.
that the LSTM was able to forecast drops in the price several times before they actually
happened. However, additional investigation is needed in order to predict multiple time
lags.
4.4.3 Application and integration of an operative prediction system
An operational prediction system that can accurately predict the price of a crypto asset
like Bitcoin, or the volatility of its price, can be effectively used for profitable trading.
Such a system could be integrated into the strategy module of an event-driven trade
system.
Event-driven trade systems are built in order to realize semi-automated and fully
automated trading[24]. Semi-automated systems produce signals about evolving entry
points on markets, which are utilized by users in order to open new positions. Fully
automated systems are capable of opening positions on their own upon receiving a
signal. Essentially, an event-driven trade system operates like a computer game. All
calculations are generated from an infinitely running cycle in which different objects are
placed at the frequency of the incoming data. Because market data flows continuously,
the system has to operate at a high frequency. Cryptocurrency exchanges afford suitable
data access for this task, while traditional stock exchanges impose strict conditions and
require large amounts of money to provide real-time market data and support for
automated trading.
An event-driven trade system has several advantages:
• The source code of the system is reusable. Its components can easily be replaced
to test the system on historical data or to use it for real-time trading.
• Look-ahead bias is excluded: future data cannot be used, because the data flows
with the event objects sequentially, so the system operates like a real-time system.
• The system works realistically. Any trade order, including commissions, can be
simulated arbitrarily.
Figure 29. illustrates an event-driven trade system. The components of the system,
called objects, are its most standard elements, which I describe in detail.
• Event – Every object's reaction at the adequate time is based on the reception of
event objects. The essential types of event objects are the Market, Signal, Order
and Fill objects. The different objects inherit the properties of an abstract base
class.
• Event Queue – A Python Queue object stored in memory, which stores every
descendant event object generated by the other classes as reactions to the data
flow.
Figure 29. The block diagram of an event-driven trade system
• DataHandler – An abstract base class (ABC). It provides a common interface for
handling historical and real-time data. In this way, the strategy and portfolio
objects are reusable for both approaches. The DataHandler object generates
MarketEvent objects at the Backtest Event Queue's frequency, which are then
handled by the Strategy.
• Strategy – The Strategy object is also an ABC, with an interface that
communicates with the DataHandler. It interprets the market data adequately and
generates SignalEvent objects accordingly. When it signals a new position, the
SignalEvent contains the trading asset's ticker symbol (like BTC-USD), the
direction of the position (long or short) and a timestamp. In the case of
cryptocurrencies, the direction of the trade is mostly a long position, i.e., a buy
order or the closing of an open long position.
• Portfolio – It maintains a database of the user's balance and of the historical
trades, together with statistics. The Portfolio object also calculates the size of each
position, which is proportionate to the total available balance (excluding already
invested money).
• ExecutionHandler – It simulates the connection to an exchange or, in the case of
real-time trading, realizes the connection. The ExecutionHandler is the gateway of
the system, which is used to connect to the API interface of a specific exchange.
The handler also receives orders from the queue, which are then transmitted to the
API. If an order is filled, the handler generates a FillEvent object that contains the
information about the filled order: the filled quantity, the commission that was
paid and a possible drift (slippage). Drift can only happen in the case of market
orders, when the desired trading price is not fixed.
• Backtest – Every object is collected in a common event cycle, from where
different events are directed to the adequate components of the system.
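The interplay of the components above can be sketched in Python; the class and method names below are illustrative, not the implementation used in the thesis.

```python
import queue
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Event objects: every component reacts to descendants of this base class.
@dataclass
class Event:
    type: str

@dataclass
class MarketEvent(Event):
    type: str = 'MARKET'

@dataclass
class SignalEvent(Event):
    type: str = 'SIGNAL'
    symbol: str = 'BTC-USD'   # ticker symbol of the traded asset
    direction: str = 'LONG'   # long or short
    timestamp: float = 0.0

class Strategy(ABC):
    """ABC that turns market data into SignalEvent objects."""
    @abstractmethod
    def calculate_signals(self, event, events): ...

class BuyEveryTick(Strategy):
    # Toy strategy standing in for a prediction-based one.
    def calculate_signals(self, event, events):
        events.put(SignalEvent(timestamp=1.0))

# The backtest loop: an infinitely running cycle draining the event queue.
events = queue.Queue()
strategy = BuyEveryTick()
signals = 0
for _ in range(3):                 # three incoming market data updates
    events.put(MarketEvent())
    while not events.empty():
        e = events.get()
        if e.type == 'MARKET':
            strategy.calculate_signals(e, events)
        elif e.type == 'SIGNAL':
            signals += 1           # a Portfolio object would size the order
print(signals)  # 3
```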
The strategies of an event-driven trade system mostly consist of signals that are based
on technical indicators. These indicators are mathematical calculations based on the
price, volume or open interest of a security or contract. Several strategies use multiple
indicators and combinations of them. A prediction system that can predict the price
movements or the volatility of the price, even from transaction networks or block
features, could be integrated into the Strategy component of a trade system.
4.4.4 Possible future experiments
I propose two different methods to further investigate the topic of training deep learning
networks on the pictures of transaction graphs. The first method suggests a combination
of a convolutional and a recurrent neural network and the second method describes a data
separation process to train different models on the segregated data.
The combination of a convolutional and a recurrent neural network is usually used as a
next-frame predictor for video input. The CNN is used as a deep hierarchical feature
extractor, and the LSTM is capable of recognizing and synthesizing temporal dynamics.
A long-term recurrent convolutional network (LRCN)[25] or a convolutional, long
short-term memory, fully connected deep neural network (CLDNN)[26] could be
applied to process sequences of transaction graph pictures with the corresponding
weighted price or volatility target sequences. The sequence length is an arbitrarily
adjusted parameter, which can be optimized by trial and error. The final layer of these
architectures must be modified in order to adapt them to regression.
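Such a combined architecture can be sketched in Keras; this is a minimal LRCN-style illustration with assumed layer sizes and a reduced 64x64x3 resolution, not a proposed final architecture.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     TimeDistributed, LSTM, Dense)

SEQ_LEN = 5          # number of consecutive transaction-graph pictures
IMG = (64, 64, 3)    # reduced resolution for the sketch

# LRCN-style model: a small CNN feature extractor applied to every frame
# (TimeDistributed), followed by an LSTM and a linear regression head.
model = Sequential([
    Input(shape=(SEQ_LEN, *IMG)),
    TimeDistributed(Conv2D(16, 3, activation='relu')),
    TimeDistributed(MaxPooling2D()),
    TimeDistributed(Flatten()),
    LSTM(64),
    Dense(1),        # regression output: weighted price or volatility
])
model.compile(optimizer='adam', loss='mse')

# Smoke test on random data in place of real picture sequences.
x = np.random.rand(2, SEQ_LEN, *IMG).astype('float32')
print(model.predict(x, verbose=0).shape)  # (2, 1)
```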
An additional possible research direction could be to train different models on separated
time intervals where the sequences of the price and volatility correlate with each other.
In order to demonstrate this idea, I divided my temporal dataset into 15-long subsets,
each of them containing an equal number of elements. Then, I iterated through every
subset and searched for matching subsets with a correlation value higher than 0.8.
Table 11. illustrates the resulting DataFrame:
Table 11. DataFrame of correlating subintervals
The interpretation of the DataFrame in Table 11. is identical to that of a correlation
matrix of features. The third column, with index 2, represents the third subset of my
temporal dataset, which has a higher than 0.8 correlation with the 175th, 434th, 681st,
691st, etc. subsets. There are 15 groups of subsets in the DataFrame which have more
than 40 elements. This means 15 groups with 40 elements each, where every element
contains 15 records of temporal data.
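The subset search can be sketched as follows on a synthetic series standing in for the temporal dataset; the series and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

WINDOW = 15       # length of each temporal subset
THRESHOLD = 0.8   # minimum correlation to count as a match

# Synthetic series standing in for the temporal dataset.
rng = np.random.default_rng(1)
series = np.sin(np.linspace(0, 40, 600)) + rng.normal(0, 0.1, 600)

# Split the series into consecutive, non-overlapping 15-long subsets.
n = len(series) // WINDOW
windows = series[:n * WINDOW].reshape(n, WINDOW)

# Pairwise correlations between the subsets, then the matches per subset.
corr = np.corrcoef(windows)
matches = {
    i: [j for j in range(n) if j != i and corr[i, j] > THRESHOLD]
    for i in range(n)
}

# One column per subset, listing the indices of its correlating subsets.
result = pd.DataFrame({i: pd.Series(m, dtype='Int64')
                       for i, m in matches.items()})
print(result.shape[1])  # 40
```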
The diagrams in Figures 30-31. illustrate the elements of the previous DataFrame's 3rd
and 53rd columns:
Figure 30. Correlating time intervals, 53rd column of the DataFrame on Table 11.
Figure 31. Correlating time intervals, 3rd column of the DataFrame on Table 11.
It can be seen in Figures 30-31. that these time intervals are highly correlated. The
previously mentioned CNN architectures could produce different results if they were
trained on such separated datasets.
5. Summary
In this thesis, I introduced the basic mathematical background that blockchain networks
rely on. I discussed in detail the data structure of the Bitcoin blockchain, its protocol and
its operation. I presented the process through which I collected, stored, analyzed and
transformed the Bitcoin network's data in order to feed them into deep learning networks
and make predictions for future price, volatility and transaction quantity.
During my work I learned about public-key cryptography. This field of cryptography
builds on cryptographic hash functions, finite fields and mathematical operations on
elliptic curves. These innovations jointly secure blockchain networks and allow them to
operate without a central authority. Blockchain networks are recent inventions that create
trust between untrusting parties. Today, there are several untapped possibilities for which
blockchain technology could be used. However, most blockchain networks provide
digital or cryptocurrencies and make these tokens transferable between two parties. The
value of cryptocurrencies is also denominated in fiat currencies, and therefore they are
traded on so-called cryptocurrency exchanges. The publicly available historical data of
every blockchain creates new opportunities for trading strategies.
Deep learning is a subfield of machine learning. Artificial neural networks with several
layers are capable of learning interrelations between input data and the desired output
which are otherwise impossible to associate explicitly with traditional functions. I
collected one and a half years of temporal data about the Bitcoin network and stored
them in HDF5 files and in the other data structures required by deep learning networks.
From every Bitcoin block, I created graph pictures of the transaction networks in order
to investigate the possible relationships between unique graph structures and subsequent
price and volatility data. I also carried out experiments to determine the number of
transactions from the graph pictures with the help of different convolutional neural
network architectures.
Because the convolutional neural network architectures were not able to learn weighted
prices and volatilities from the transaction graphs, I used a recurrent neural network, a
long short-term memory, to predict the desired targets from the temporal sequences of
block features. The LSTM could learn the sequences, but there was a time lag between
the true values and its predictions, and therefore it is not perfectly usable for predictions
in a real-time environment.
In the last chapters of this thesis, I introduced the application of an operational
prediction system in an event-driven trade system's strategy module. I also proposed
future investigation opportunities for processing sequences of transaction networks, in
the form of a combined convolutional neural network and long short-term memory
architecture. Another possibility for further investigating the topic is to train different
deep learning models on separated but correlating time intervals.
Acknowledgements
I would first like to thank my thesis advisor, Dr. Bálint Gyires-Tóth of the Department of
Telecommunications and Media Informatics at Budapest University of Technology and
Economics. He helped me a lot to find the direction that fits my interests. He consistently
allowed me to do my own work, steered me in the right directions, and his door was
always open to discuss current topics whenever I got stuck in a subtask.
I would also like to thank my friend Róbert M. Németh, who helped me to create the
illustrations and figures with his graphic designer experience.
Finally, I must express my very profound gratitude to my Dad, Mom, Grandma and
other family members for instilling energy in me, providing me with constant support
and love, and making my studies possible. They cooked me delicious dishes, which
obviously helped me to tackle this road.
References
[1] Schneier, B., 1996. Applied cryptography-protocols, algorithms, and source code in
C. John Wiley & Sons., pp. 56-57.
[2] Drescher, D., 2017. Blockchain basics. Apress. pp. 71-81.
[3] Goldreich, O., 1998. Modern cryptography, probabilistic proofs and
pseudorandomness (Vol. 17). Springer Science & Business Media. pp. 11.
[4] Goldreich, O., 1998. Modern cryptography, probabilistic proofs and
pseudorandomness (Vol. 17). Springer Science & Business Media. pp. 65-66.
[5] Galbraith, S.D., 2012. Mathematics of public key cryptography. Cambridge
University Press. pp. 4-7.
[6] Antonopoulos, A.M., 2014. Mastering Bitcoin: unlocking digital
cryptocurrencies. O'Reilly Media, Inc.
[7] Standard, S.H., 2002. FIPS PUB 180-2. National Institute of Standards and
Technology.
[8] Johnson, D., Menezes, A. and Vanstone, S., 2001. The elliptic curve digital
signature algorithm (ECDSA). International Journal of Information Security, 1(1),
pp.36-63.
[9] Koblitz, N., 1991, August. CM-curves with good cryptographic properties.
In Annual International Cryptology Conference. Springer, Berlin, Heidelberg, pp.
279-287.
[10] Morain, F., 1991, April. Building cyclic elliptic curves modulo large primes.
In Workshop on the Theory and Application of Cryptographic Techniques.
Springer, Berlin, Heidelberg, pp. 328-336.
[11] Koblitz, N., 1990, August. Constructing elliptic curve cryptosystems in
characteristic 2. In Conference on the Theory and Application of Cryptography,
Springer, Berlin, Heidelberg, pp. 156-167.
[12] Qu, M., 1999. SEC 2: Recommended elliptic curve domain parameters. Certicom
Res., Mississauga, ON, Canada, Tech. Rep. SEC2-Ver-0.6.
[13] Hagberg, A., Schult, D. and Swart, P., 2012. NetworkX Reference. Python
Package.
[14] Kobourov, S.G., 2004. Force-directed drawing algorithms. University of Arizona,
pp. 383-403.
[15] Collette, A., 2013. Python and HDF5: Unlocking Scientific Data. O'Reilly Media,
Inc, pp. 21-110.
[16] Bollen, B. and Inder, B., 2002. Estimating daily volatility in financial markets
utilizing intraday data. Journal of Empirical Finance, 9(5), pp.551-562.
[17] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z., 2016. Rethinking
the inception architecture for computer vision. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 2818-2826.
[18] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T.,
Andreetto, M. and Adam, H., 2017. Mobilenets: Efficient convolutional neural
networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
[19] Zoph, B., Vasudevan, V., Shlens, J. and Le, Q.V., 2017. Learning transferable
architectures for scalable image recognition. arXiv preprint
arXiv:1707.07012, 2(6).
[20] Huang, G., Liu, S., van der Maaten, L. and Weinberger, K.Q., 2017. CondenseNet:
An Efficient DenseNet using Learned Group Convolutions.
[21] Mordvintsev, A. and Abid, K., 2014. OpenCV-Python tutorials
documentation. Available at: https://media.readthedocs.org/pdf/opencv-python-
tutroals/latest/opencv-python-tutroals.pdf.
[22] Nakamoto, S., 2008. Bitcoin: A peer-to-peer electronic cash system, pp. 1-9.
[23] Chollet, F., 2017. Deep learning with Python. Manning Publications Co, pp. 3-93.
[24] Kim, K., 2010. Electronic and algorithmic trading technology: the complete guide.
Academic Press.
[25] Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S.,
Saenko, K. and Darrell, T., 2015. Long-term recurrent convolutional networks for
visual recognition and description. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition pp. 2625-2634.
[26] Sainath, T.N., Vinyals, O., Senior, A. and Sak, H., 2015, April. Convolutional, long
short-term memory, fully connected deep neural networks. In IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4580-
4584.
[27] Dobbertin, H., Bosselaers, A. and Preneel, B., 1996, February. RIPEMD-160: A
strengthened version of RIPEMD. Springer, Berlin, Heidelberg. In International
Workshop on Fast Software Encryption, pp. 71-82.
[28] Hochreiter, S., 1998. The vanishing gradient problem during learning recurrent
neural nets and problem solutions. International Journal of Uncertainty, Fuzziness
and Knowledge-Based Systems, 6(02), pp.107-116.
6. Appendix
A.1. Secure Hash Algorithm (SHA)
Secure Hash Algorithm is a hash function that was developed by the National Institute of
Standards and Technology (NIST) and published as a Federal Information Processing
Standard (FIPS 180) after approval by the Secretary of Commerce pursuant to Section
5131 of the Information Technology Management Reform Act of 1996 (Public Law 104-
106) and the Computer Security Act of 1987 (Public Law 100-235). Weaknesses were
discovered in SHA, therefore revised versions were issued in the following years.
SHA-256, SHA-384 and SHA-512 are the 256-, 384- and 512-bit versions, respectively;
they were introduced by NIST in 2002. The document that describes SHA-256, and
which is the subject of the subsequent investigation, is known as FIPS 180-2 [7].
The FIPS 180-2 standard specifies the SHA-256 hash function for generating message
digests. Digests can be used to detect changes in a message after the digest was
generated. SHA-256 is considered secure because it is computationally infeasible to find
a message that corresponds to a given message digest (the one-way property) or to find
two different messages that produce the same message digest. Two messages with even
a small dissimilarity will produce different message digests with very high probability.
For this reason, a change in the message will result in a verification failure when the
algorithm is used with a digital signature algorithm.
SHA-256 has properties that are used by the algorithm for the generation of the message
digest. The following table illustrates these properties and the conditions that should be
met.

Algorithm   Message Size (bits)   Block Size (bits)   Word Size (bits)   Message Digest Size (bits)
SHA-256     < 2^64                512                 32                 256
A bit indicates a binary digit with a value of 0 or 1. A byte is a group of eight bits
and a word is a group of 32 bits.
The following table describes the parameters that are used by the secure hash
algorithm.
a, b, c, …, h   Variables that are w-bit words used in the computation of the hash values, H(i).
H(i)            The i-th hash value. H(0) is the initial and H(N) is the final hash value. They are used in the construction of the message digest.
H_j(i)          The j-th word of the i-th hash value, where H_0(i) is the left-most word of hash value i.
K_t             Constant value used for iteration t of the hash computation.
k               The number of zeroes appended to a message during the padding step.
l               The length of the message M in bits.
m               The number of bits in a message block, M(i).
M               The message to be hashed.
M(i)            Message block i, with a size of m bits.
M_j(i)          The j-th word of the i-th message block, where M_0(i) is the left-most word of message block i.
n               The number of bits to be rotated or shifted when a word is operated upon.
N               The number of blocks in the padded message.
T               Temporary w-bit word used in the hash computation.
w               The number of bits in a word.
W_t             The t-th w-bit word of the message schedule.
The following symbols represent binary operators; each operates on w-bit words.

<<   Left-shift operator, where x << n means that every bit is shifted to the left by n positions, discarding the left-most n bits of x and padding the result with n zeroes on the right.
>>   Right-shift operator, where x >> n means that every bit is shifted to the right by n positions, discarding the right-most n bits of x and padding the result with n zeroes on the left.
∧    Bitwise AND operator.
∨    Bitwise OR operator.
¬    Bitwise complement operator.
⊕    Bitwise XOR operator.
+    Addition modulo 2^w.
These symbols are general in computer science. The following operators are specific
to the specification of SHA-256.

ROTL^n(x)   Rotate-left (circular left shift) operator, where x is a w-bit word and n is an integer with 0 ≤ n < w, defined by ROTL^n(x) = (x << n) ∨ (x >> (w - n)).
ROTR^n(x)   Rotate-right (circular right shift) operator, where x is a w-bit word and n is an integer with 0 ≤ n < w, defined by ROTR^n(x) = (x >> n) ∨ (x << (w - n)).
SHR^n(x)    Right-shift operator, where x is a w-bit word and n is an integer with 0 ≤ n < w, defined by SHR^n(x) = x >> n.

The following equivalence relationships exist between the rotation operators:
ROTL^n(x) ≡ ROTR^(w-n)(x)
ROTR^n(x) ≡ ROTL^(w-n)(x)
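Assuming w = 32, these operators and the equivalence relationships can be checked directly in Python (the function names mirror the notation above):

```python
W = 32                    # SHA-256 operates on 32-bit words
MASK = (1 << W) - 1       # truncates results to w bits

def ROTL(x, n):
    """Circular left shift: (x << n) OR (x >> (w - n))."""
    return ((x << n) | (x >> (W - n))) & MASK

def ROTR(x, n):
    """Circular right shift: (x >> n) OR (x << (w - n))."""
    return ((x >> n) | (x << (W - n))) & MASK

def SHR(x, n):
    """Plain right shift: the right-most n bits are discarded."""
    return x >> n

x = 0x82af7129            # an example 32-bit word
print(hex(ROTL(x, 4)))    # 0x2af71298

# The equivalence relationships stated above:
assert ROTL(x, 7) == ROTR(x, W - 7)
assert ROTR(x, 7) == ROTL(x, W - 7)
```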
The abovementioned notations require some explanations:
• A hexadecimal digit is an element of the set {0, 1, 2, …, 9, a, b, c, d, e, f} and is
the representation of a 4-bit string.
• A word is a w-bit string that can be represented as a sequence of hexadecimal digits
by converting 4-bit strings to their hexadecimal equivalents. For example, the 32-bit
string 1000 0010 1010 1111 0111 0001 0010 1001 can be expressed as 82af7129.
Within each word the 'big-endian' convention is used, so the most significant bit
is stored in the left-most bit position.
• A word or a pair of words can represent an integer. The padding technique used in
the SHA-256 algorithm requires the message length, l, in bits, to be represented as
a word or a pair of words. An integer between 0 and 2^32 - 1 inclusive can be
represented as a 32-bit word. The least significant four bits of the integer are
represented by the right-most hexadecimal digit of the word. For example, the
integer 314 = 2^8 + 2^5 + 2^4 + 2^3 + 2^1 = 256 + 32 + 16 + 8 + 2 can be
represented by the word 0000013a.
• The following property is used by SHA-256: if Z is an integer with 0 ≤ Z < 2^64,
then Z = 2^32 · X + Y, where 0 ≤ X < 2^32 and 0 ≤ Y < 2^32. Let x and y be the
word representations of X and Y, respectively; then the pair of words (x, y) is the
representation of Z.
• The addition modulo 2^w operation x + y is defined as Z = (X + Y) mod 2^w,
where X and Y are integers represented by the words x and y, respectively. For
positive integers U and V, U mod V is the remainder when dividing U by V. For
Z it holds that 0 ≤ Z < 2^w, so the integer Z can be converted to a word z, and
z = x + y is defined.
• SHA-256 operates on 32-bit words (w = 32).
Several functions are used by SHA-256 in order to hash the message. Each function
operates on 32-bit words; these words are represented by x, y and z. The following table
defines the functions; each outputs a new 32-bit word.

Ch(x, y, z)  = (x ∧ y) ⊕ (¬x ∧ z)
Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)
Σ256_0(x) = ROTR^2(x) ⊕ ROTR^13(x) ⊕ ROTR^22(x)
Σ256_1(x) = ROTR^6(x) ⊕ ROTR^11(x) ⊕ ROTR^25(x)
σ256_0(x) = ROTR^7(x) ⊕ ROTR^18(x) ⊕ SHR^3(x)
σ256_1(x) = ROTR^17(x) ⊕ ROTR^19(x) ⊕ SHR^10(x)
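The six functions can be implemented directly from the definitions above (a Python sketch with w = 32):

```python
MASK = 0xffffffff  # keep every result a 32-bit word (w = 32)

def ROTR(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def SHR(x, n):
    return x >> n

def Ch(x, y, z):
    # "Choose": bits of y where x is 1, bits of z where x is 0.
    return ((x & y) ^ (~x & z)) & MASK

def Maj(x, y, z):
    # "Majority": the bit value that occurs in at least two of x, y, z.
    return (x & y) ^ (x & z) ^ (y & z)

def Sigma0(x):  # Σ256_0
    return ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22)

def Sigma1(x):  # Σ256_1
    return ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25)

def sigma0(x):  # σ256_0
    return ROTR(x, 7) ^ ROTR(x, 18) ^ SHR(x, 3)

def sigma1(x):  # σ256_1
    return ROTR(x, 17) ^ ROTR(x, 19) ^ SHR(x, 10)

print(hex(Ch(0xffffffff, 0x12345678, 0x9abcdef0)))  # 0x12345678
```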
SHA-256 uses sixty-four constant 32-bit words in the computation of the hash value.
These constant words are denoted K256_0, K256_1, K256_2, …, K256_63, and they
represent the first thirty-two bits of the fractional parts of the cube roots of the first
sixty-four prime numbers. The superscript of each K indicates the 256-bit version of
SHA, because the different versions of SHA use different constants. The following table
represents these constants in hexadecimal format.
428a2f98 71374491 b5c0fbcf e9b5dba5 3956c25b 59f111f1 923f82a4 ab1c5ed5
d807aa98 12835b01 243185be 550c7dc3 72be5d74 80deb1fe 9bdc06a7 c19bf174
e49b69c1 efbe4786 0fc19dc6 240ca1cc 2de92c6f 4a7484aa 5cb0a9dc 76f988da
983e5152 a831c66d b00327c8 bf597fc7 c6e00bf3 d5a79147 06ca6351 14292967
27b70a85 2e1b2138 4d2c6dfc 53380d13 650a7354 766a0abb 81c2c92e 92722c85
a2bfe8a1 a81a664b c24b8b70 c76c51a3 d192e819 d6990624 f40e3585 106aa070
19a4c116 1e376c08 2748774c 34b0bcb5 391c0cb3 4ed8aa4a 5b9cca4f 682e6ff3
748f82ee 78a5636f 84c87814 8cc70208 90befffa a4506ceb bef9a3f7 c67178f2
The algorithm starts by preprocessing the message M to be hashed. First, padding is applied so that the length of the message becomes a multiple of 512 bits. Suppose that the length of M is l bits, where l < 2^64. A ‘1’ bit is appended to the end of M, followed by k zero bits, where k is the smallest non-negative solution of l + 1 + k ≡ 448 (mod 512). Then a 64-bit block containing the binary representation of l is appended. For example, consider the message ‘halo’, where each character is coded in 8-bit ASCII:
M = 01101000 01100001 01101100 01101111 = ‘halo’ in ASCII coding.
A ‘1’ bit is appended to the end of M, followed by 415 zero bits, because 448 − (32 + 1) = 415; then the length l = 32 is appended as a 64-bit binary block. The message thus becomes a single 512-bit padded message. Whatever the original length, the padded message is always a multiple of 512 bits.
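The padding step can be sketched in a few lines of Python (the helper name `sha256_pad` is my own; the byte 0x80 supplies the ‘1’ bit followed by seven of the required zero bits):

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a message for SHA-256: a '1' bit, k zero bits, then l as 64 bits."""
    l = len(message) * 8                    # message length l in bits
    padded = message + b'\x80'              # the '1' bit plus 7 zero bits
    # add whole zero bytes until l + 1 + k = 448 (mod 512)
    padded += b'\x00' * ((56 - len(padded)) % 64)
    return padded + l.to_bytes(8, 'big')    # l as a 64-bit big-endian block

padded = sha256_pad(b'halo')
print(len(padded) * 8)   # 512: one padded block, as in the example above
```

For the four-byte message ‘halo’ this produces exactly the 512-bit block derived by hand above.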
When padding is completed, the padded message is parsed into N 512-bit blocks, M^{(1)}, M^{(2)}, …, M^{(N)}. Each input block is expressed as sixteen 32-bit words, where the first 32 bits of message block i are denoted M_0^{(i)}, the next 32 bits are M_1^{(i)}, and so on up to M_15^{(i)}.
The computation of the hash value requires an initial hash value, H^{(0)}, to be set. It is made up of eight 32-bit words with the following values:
H_0^{(0)} = 6a09e667
H_1^{(0)} = bb67ae85
H_2^{(0)} = 3c6ef372
H_3^{(0)} = a54ff53a
H_4^{(0)} = 510e527f
H_5^{(0)} = 9b05688c
H_6^{(0)} = 1f83d9ab
H_7^{(0)} = 5be0cd19
During the hash computation, SHA-256 uses a message schedule of sixty-four 32-bit words labeled W_0, W_1, …, W_63, eight 32-bit working variables labeled a, b, c, d, e, f, g, h, and a hash value of eight 32-bit words labeled H_0^{(i)}, H_1^{(i)}, …, H_7^{(i)}, which initially holds the initial hash value H^{(0)}. After each message block is processed, the hash value is replaced by an intermediate hash value H^{(i)}, until the iteration ends with H^{(N)}, the final hash value. Two temporary words, T_1 and T_2, are also used by the algorithm.
The following steps are repeated N times, until all message blocks have been processed. The result is the 256-bit message digest, a digital fingerprint of the message M, in the form
H_0^{(N)} ∥ H_1^{(N)} ∥ H_2^{(N)} ∥ H_3^{(N)} ∥ H_4^{(N)} ∥ H_5^{(N)} ∥ H_6^{(N)} ∥ H_7^{(N)}
The message blocks M^{(1)}, M^{(2)}, …, M^{(N)} are processed in order, using the following steps:
For i = 1 to N:
{
    W_t = M_t^{(i)}                                                    for 0 ≤ t ≤ 15
    W_t = σ1^{256}(W_{t−2}) + W_{t−7} + σ0^{256}(W_{t−15}) + W_{t−16}  for 16 ≤ t ≤ 63
    a = H_0^{(i−1)}
    b = H_1^{(i−1)}
    c = H_2^{(i−1)}
    d = H_3^{(i−1)}
    e = H_4^{(i−1)}
    f = H_5^{(i−1)}
    g = H_6^{(i−1)}
    h = H_7^{(i−1)}
    For t = 0 to 63:
    {
        T_1 = h + Σ1^{256}(e) + Ch(e, f, g) + K_t^{256} + W_t
        T_2 = Σ0^{256}(a) + Maj(a, b, c)
        h = g
        g = f
        f = e
        e = d + T_1
        d = c
        c = b
        b = a
        a = T_1 + T_2
    }
    H_0^{(i)} = a + H_0^{(i−1)}
    H_1^{(i)} = b + H_1^{(i−1)}
    H_2^{(i)} = c + H_2^{(i−1)}
    H_3^{(i)} = d + H_3^{(i−1)}
    H_4^{(i)} = e + H_4^{(i−1)}
    H_5^{(i)} = f + H_5^{(i−1)}
    H_6^{(i)} = g + H_6^{(i−1)}
    H_7^{(i)} = h + H_7^{(i−1)}
}
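Putting the padding, message schedule and compression loop together, the whole algorithm fits in a short pure-Python sketch. The variable names mirror the pseudocode above; this is an illustrative implementation only, with the standard library's `hashlib` used to cross-check the result:

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

# The sixty-four constants K_t^{256} from the table above.
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
]

def sha256(message: bytes) -> str:
    # Preprocessing: append the '1' bit, k zero bits, then l as a 64-bit block.
    l = len(message) * 8
    padded = message + b'\x80'
    padded += b'\x00' * ((56 - len(padded)) % 64)
    padded += l.to_bytes(8, 'big')

    # Initial hash value H^(0).
    H = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
         0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

    for i in range(0, len(padded), 64):          # one 512-bit block at a time
        W = list(struct.unpack('>16I', padded[i:i + 64]))
        for t in range(16, 64):                  # message schedule W_16..W_63
            s0 = rotr(W[t-15], 7) ^ rotr(W[t-15], 18) ^ (W[t-15] >> 3)
            s1 = rotr(W[t-2], 17) ^ rotr(W[t-2], 19) ^ (W[t-2] >> 10)
            W.append((s1 + W[t-7] + s0 + W[t-16]) & MASK32)
        a, b, c, d, e, f, g, h = H
        for t in range(64):                      # compression loop
            T1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
                  + ((e & f) ^ (~e & g)) + K[t] + W[t]) & MASK32
            T2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
                  + ((a & b) ^ (a & c) ^ (b & c))) & MASK32
            h, g, f, e, d, c, b, a = g, f, e, (d + T1) & MASK32, c, b, a, (T1 + T2) & MASK32
        H = [(v + w) & MASK32 for v, w in zip(H, (a, b, c, d, e, f, g, h))]

    return ''.join(format(v, '08x') for v in H)

# Cross-check against the standard library implementation.
assert sha256(b'halo') == hashlib.sha256(b'halo').hexdigest()
```

The final assertion confirms that the sketch agrees with `hashlib` on the ‘halo’ example used throughout this appendix.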
A.2. The domain parameters of the Koblitz curve, secp256k1
The elliptic curve called secp256k1 is a Koblitz curve. Its domain parameters over F_p are specified by the sextuple T = (p, a, b, G, n, h), where the finite field F_p is defined by [12]:
p = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE FFFFFC2F
  = 2^256 − 2^32 − 2^9 − 2^8 − 2^7 − 2^6 − 2^4 − 1
The curve E: y^2 = x^3 + ax + b over F_p is defined by:
a = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
b = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000007
The compressed form of the base point G is:
G = 02 79BE667E F9DCBBAC 55A06295 CE870B07 029BFCDB 2DCE28D9 59F2815B 16F81798
The order n of G is:
n = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE BAAEDCE6 AF48A03B BFD25E8C D0364141
The cofactor h is:
h = 01
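These parameters can be sanity-checked in plain Python. The sketch below (variable names are my own) verifies the sparse form of p and decompresses the 02-prefixed G: since p ≡ 3 (mod 4), a square root of x^3 + 7 modulo p is obtained as (x^3 + 7)^((p+1)/4) mod p, and the 02 prefix selects the even root.

```python
# Prime field modulus of secp256k1 and its sparse form.
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
assert p == 2**256 - 2**32 - 2**9 - 2**8 - 2**7 - 2**6 - 2**4 - 1

# x-coordinate of G, taken from the compressed form above (02-prefix dropped).
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798

# Decompress: y^2 = x^3 + 7 over F_p; p = 3 (mod 4) allows sqrt by exponentiation.
rhs = (pow(Gx, 3, p) + 7) % p
y = pow(rhs, (p + 1) // 4, p)
if y % 2 != 0:            # the 02 prefix means the even root
    y = p - y

assert (y * y) % p == rhs  # G lies on the curve E: y^2 = x^3 + 7
print(hex(y))              # the recovered y-coordinate of G
```

This confirms that the compressed base point indeed encodes a valid point on E over F_p.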