sds podcast episode 7 with artem vladimirov · 2018-06-01 · kirill: this is episode number 7,...

SDS PODCAST

EPISODE 7

WITH

ARTEM

VLADIMIROV

http://www.superdatascience.com/7

Kirill: This is episode number 7, with top analytics consultant,

Artem Vladimirov.

(background music plays)

Welcome to the SuperDataScience podcast. My name is Kirill

Eremenko, data science coach and lifestyle entrepreneur.

And each week, we bring you inspiring people and ideas to

help you build your successful career in data science.

Thanks for being here today and now let’s make the complex

simple.


Hello everybody, and welcome to this super special episode

of the SuperDataScience podcast. I hope you're ready for a

crazy rollercoaster. This episode is with one of my best

friends, Artem Vladimirov. Artem and I go way back. We

studied together in our Masters degrees starting from 2010,

and then in 2012, we joined Deloitte and worked there

together in the data science department, and then our paths

split in around 2014. I went into industry and Artem went

into another consulting firm, which is called the Boston

Consulting Group, which is a top tier consulting firm, and

he continued doing data science there.

And it was so great to catch up now. It's been a long time.

We do talk occasionally, but I don't keep track as much of

what he does and his career, and today I learned so much

about how he has grown and what a great consultant,

world-class consultant, he has become.

I'll give you a few examples. Just in the past two years,

Artem has travelled to at least six different countries,

ranging from Spain, Hong Kong, India, Italy, all over the


place, performing consulting engagements with large clients.

And we're talking deals and projects that range from a million

dollars, and sometimes even more than a million dollars,

which is normal for this organisation, the Boston Consulting

Group. So you can kind of like tell what calibre of consultant

he is.

The interesting, the very interesting thing that I learned from

this podcast is what Artem does is actually not entirely

classified as data science. It is a mix of different approaches

and methodologies, and it's actually called "Advanced

Analytics". And Advanced Analytics is a bit different to data

science, and I think you will find this very interesting.

Advanced Analytics involves more of a simulation type

approach. So I don't know if you've ever heard of, or even

played this game called Sim City. Back when we were kids,

there was a game, Sim City, where you would be able to build

a city and then stuff would happen in the city, and then

your fire trucks would dispatch, and they would go, and you

would be controlling the city from a bird's eye view.

So that is the simplified way of the way I imagine what he

does, is he builds these simulation models which are

actually little miniature models on your computer of, for

example, a supply chain. Or of a warehouse. Or of a

company that's producing something on conveyor belts, and

they have some bottlenecks. And adjusting certain

parameters on this simulation model, he can identify where

the potential bottlenecks are, where the challenges in the

supply chains are, how the company should place its

warehouses, and so on.

And it's great that in this podcast, Artem actually goes into

several case studies in a lot of detail. So Artem will walk us


through, like literally walk us through a case study of a

project that he did for a bank, where they were performing

some modelling, and he will explain exactly which algorithm

they were using, it was random forest, and how he thought

about it, and I really drill into the questions, and I ask him a

lot about the way he thinks about it, what the overall

business challenge was, and we learn a lot from that.

Then there'll be a case study about some warehouses which

he was optimising somewhere in Europe, the placement of

warehouses or storage facilities. So that was also a valuable

thing. And regardless of what you're using analytics for,

whether you're pursuing a career in analytics, or you're

building an analytics culture, an environment, or you're an

executive and you want to leverage analytics some more in

your business, you will find a lot of value in this podcast. We

go into all of these different details and ways you can be

applying analytics.

And also, Artem will share a little bit of his background with

us. And it's very interesting because I actually knew this,

but I forgot, and he reminded me that Artem's background

isn't actually in data science or analytics. So the stuff he

studied at uni was economics and finance. And when he

went to Deloitte, he really had to develop these skills, such

as R programming, AnyLogic, and SQL from scratch. So he

didn't have any of these skills, and his career is a great

testament to the saying that where there's a will, there's a

way.

So just by looking at how he approached this challenge in

his life of becoming a data scientist from scratch after

university, you will be very inspired to go and do the same.

Because if he had the determination and the willpower to


persevere and actually achieve the results that he has

achieved, and build this super successful career for himself,

then you should be inspired to find that same willpower,

that same perseverance, and determination to build a career

for yourself just like Artem did.

Can't wait for you to check out all of the value inside this

class. You'll notice that we went a bit over time. That is

because we just got so carried away in all of these

discussions. This is a super exciting episode, and without

further ado, I bring to you Artem Vladimirov of the Boston

Consulting Group.


Hey guys, welcome to this podcast. I'm super excited, as you

can probably tell by my voice. I've got my good friend Artem

Vladimirov here. Artem, hey mate, how you going?

Artem: Hi Kirill! Great to talk to you. I am good. And you?

Kirill: Great, great, thanks. For those of you who don't know,

Artem and I go way back. We met -- when was it? Like back

in 2010, yeah?

Artem: Yeah, 2010-2011, back at uni times.

Kirill: Yeah. We both went to the same university and studied

pretty much the same degree. And do you remember that

crazy story of how we met?

Artem: Something like you introduced yourself to someone from

Zimbabwe? Yeah, I do remember something like that.

Kirill: I tell people still. It is like the stupidest thing ever.

Remember, it was our first lecture of our first class. And we

both went to the wrong building. Remember that?


Artem: Yeah, yeah, that was true!

Kirill: We were supposed to go to a statistics class, and we went

and we were the only two out of our statistics class, we went

to the biology class or something. And we didn't recognise

anybody, we didn't understand what was going on. We were

just sitting there like two idiots. It was so cool. It was such a

coincidence back in the day. And then since then, our paths

still crossed a lot. We got a lot of time to bond

and connect. We did all our assignments together, especially

for economics, the group assignments, that was fun. And

then we started working at Deloitte together, yeah?

Artem: That's true. We worked together for several years at Deloitte.

Kirill: It was a fun time. It was -- what was the department? It was

called Data Analytics at first, and then it was called Decision

Science, yeah?

Artem: Yeah, Data Analytics, and then it was renamed to Decision

Science and Analytics.

Kirill: Yeah, I remember how they called it DADS for a while?

Artem: Yeah, Decision Science and -- what was it called?

Kirill: Deloitte Analytics and Decision Science.

Artem: Yeah, that was it.

Kirill: Yeah, that wasn't the best choice of name. It's just DADS.

Yeah, good old days. Anyway, and since then, I left Deloitte

and I went into industry, and then quit that, and now I do

this. And you moved to a very exciting and new role which,

personally, I don't even know much about. You moved to

BCG. Boston Consulting Group. Right?


Artem: That's right. As it happened, I also left Deloitte shortly after

you left, but I didn't leave the industry, I stayed within

consulting and I joined the Boston Consulting Group,

working in the same team in Big Data and Advanced

Analytics, with a slightly different office than I had before.

But I'm happy to discuss it in more detail.

Kirill: Awesome. Yeah, that'd be great. If you can tell us a bit about

it. Because all I can hear from you is like when I want to

catch up, or have a chat with you, you're like, oh, I'm in

India. Or I'm in America today. Or I'm in Japan. You're like

all over the place. And what do you do for a job?

Artem: I'm working in the Big Data and Advanced Analytics team.

It's an expert team which provides expertise to our case

teams, so we provide the expertise, so we do cases for the

clients. And I personally work at the intersection of data and

advanced analytics techniques, so geospatial modelling,

dynamic simulation, mathematical optimisations, and with

practical applications of my work including things like

network design and optimisation for financial institutions or

for retail stores, things like supply chain optimisation pretty

much for any industry, debottlenecking of manufacturing

facilities, so you may guess that it's more advanced analytics

rather than big data. I would say probably 25% data

science, and 75% is advanced analytics.

Kirill: Very interesting. And before the podcast, you actually

mentioned to me that you're doing more advanced analytics

than data science. Could you tell us a bit more, what is the

difference between advanced analytics and data science?

Artem: Yeah, sure. So for instance, as I mentioned, I am doing

dynamic simulation. So that's programming in Java and


Java is one of the tools that I use. That's creating models

which look like simplified versions of computer games,

where you can see things moving around. So I create these

models for industrial shops, where they produce stuff, like

metals for instance, and then I use these models to test

various scenarios, like if they're going to change some

production logic, how it would impact the total production in

terms of pounds.

And to develop these things, you don't need data per se. So

you just need estimates. So let's say, what's your average

processing time, what distribution does it follow? What's

your maintenance logic for your equipment? So whether

you're taking down your equipment, let's say, once in a

month for planned maintenance and then there is a certain

probability for unplanned maintenance.

And it's just literally a few numbers for each of these rules.

So let's say 15 minute average time, and 5 minutes standard

deviation for processing time for equipment A, etc. etc. But

in order to get these estimates, of course you need to do

some data crunching. So that's where data analytics comes

in to help as well. But I usually ask someone else to do this.

But just to develop these models, you don't need to do

this data crunching, you can just use dummy variables to

see how it's working. Like I can use, let's say, 30 minutes,

having no idea what the real processing time may be, to

develop this model and then I just feed the estimates into

this model. Does it make sense?
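
The idea of building a model from a handful of estimates can be sketched in a few lines of Python. This is only an illustration, not the AnyLogic or Java models discussed in the episode. The 15 minute mean and 5 minute standard deviation come from the conversation; the breakdown probability, repair time, shift length, and all function names are assumptions made up for the example, and planned maintenance is left out.

```python
import random

def simulate_shift(minutes=8 * 60, mean=15.0, sd=5.0,
                   breakdown_prob=0.01, repair_minutes=60.0, seed=0):
    """Count items finished by one machine in a shift of `minutes`."""
    rng = random.Random(seed)
    clock, produced = 0.0, 0
    while True:
        # Processing time drawn from a normal distribution, floored at 1 minute.
        duration = max(1.0, rng.gauss(mean, sd))
        # Unplanned maintenance: an assumed fixed chance of a breakdown per item.
        if rng.random() < breakdown_prob:
            duration += repair_minutes
        if clock + duration > minutes:
            return produced
        clock += duration
        produced += 1

def average_output(runs=200, **params):
    """Average the shift output over many random replications."""
    return sum(simulate_shift(seed=s, **params) for s in range(runs)) / runs

baseline = average_output()          # placeholder estimates
faster = average_output(mean=12.0)   # scenario: a faster process
print(baseline, faster)
```

Swapping the placeholder numbers for real estimates later does not change the model itself, which is the point made above.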

Kirill: Yeah, yeah, it makes sense. So you're kind of like building a

little mini version, or a computer version of the factory, or of

the supply chain, or of the network, or something, so that

you can model it and speed up the process and understand


where the issues and bottlenecks will occur in real life. Is

that correct?

Artem: Exactly. Or take for instance geospatial modelling. So that's

taking geography into consideration in your

analysis. You don't need huge data sets to do geospatial

modelling. So what you need to know is locations of your,

let's say source, or points of interest, and locations of your

competitors, things like road distances, and then you can

find out what the optimal location of your warehouse is, for

example, to minimise total transportation costs for your

client.
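
A minimal sketch of the warehouse question, assuming a short list of candidate sites and using straight-line (great-circle) distance in place of real road distances. The coordinates, volumes, cost rate, and function names are all invented for illustration.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def best_warehouse(candidates, customers, cost_per_km=1.0):
    """Return (name, cost) of the candidate with the lowest total transport cost."""
    def total_cost(site):
        return sum(cost_per_km * volume * haversine_km(site, loc)
                   for loc, volume in customers)
    name = min(candidates, key=lambda n: total_cost(candidates[n]))
    return name, total_cost(candidates[name])

# Made-up coordinates and delivery volumes, for illustration only.
candidates = {"north": (-33.70, 151.10), "south": (-34.05, 151.05)}
customers = [((-33.75, 151.15), 10), ((-33.80, 151.00), 5), ((-34.00, 151.10), 2)]
print(best_warehouse(candidates, customers))
```

Only locations and volumes go in, which matches the remark that this kind of analysis does not need huge data sets.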

Kirill: Very cool. The way I imagine it in my head right now is like a

little Sim City. You know that game, Sim City? Where it's

like you're building a city, and then something happens, like

a fire breaks out, and your little fire truck has to get there

on time, and you kind of like model it. You can try to rebuild

a big city like New York, or something like that. So yeah,

that's the way I think about it.

Artem: Pretty much, except for we don't have disasters!

Kirill: Alright. So that's pretty cool. So in that sense, Advanced

Analytics isn't like -- because I thought, when you

mentioned Advanced Analytics, I thought it was like a step

up from data science. From what you're explaining, it

sounds like just something that's parallel to data science,

right? Is that correct?

Artem: I would say that it's something that supplements data

analytics, yeah. It's definitely something separate to data

analytics, but at the same time, they often complement each

other.


Kirill: Ok. Ok. That sounds pretty cool. And so if you can do this

all on your computer, why do you have to always go and

actually visit the client, whether it's India, Japan, America,

and all these other crazy places that you've been to recently?

Artem: That's a good question. So as part of my role, I look after the

Asia Pacific region. So I do projects for the client not only in

Australia, but in the whole Asia Pacific region, and with

some occasions for global projects. So for example, recently,

I had been in Europe to do a project for one of our European

clients. And the reason why I have to travel is because while

I can develop all these models remotely from Sydney, where I

currently live, which I often do, an important part of my

work is to understand the business rules and, let's say, the

rules of the game. So you need to understand what is the

business problem first, and then what are the business

rules? What are the business constraints that can shape

this solution or this problem. And you need to discuss it

with the client. So the most efficient way is to discuss these

things face to face with the client, and then start

developing the model.

And then, after you have, let's say a first version of the

model ready, with some preliminary insights, or results, you

need to validate it. You need to make sure these results

make sense in the context of the business. So you do some

validation of [14:51] how they can use it with the client. You

sit with the people in the business to understand whether

this result makes sense, and more often than not, the

results from the first version would not make total sense,

just because it's very hard from the first time to take into

account every single business rule that can shape your

solution. So you will probably miss something in the initial


iteration. And then you will try to understand the case. If

your solution does not really make sense, does not make

sense 100%, what do you do? What did you miss? What can

you add to the model that's realistic, and that will shape this

solution?

Kirill: Ok, yeah, makes sense. And so you're kind of like visiting

these places and talking to these people, and actually seeing

the place to develop a certain level of domain knowledge. Is

that correct?

Artem: Yes. That's correct. I even visited some metal plant, metal

making plant, so that I worked with the case team on the

ground there, and I also had to visit a shop itself so that I

basically know exactly what I am modelling.

Kirill: Can you tell us, just so that our listeners can get a feel for

your lifestyle, what countries have you been to in the past

two years?

Artem: So I work in Australia, but in the past, I've been to the

States, I've been to Japan, I've been to Hong Kong, Italy,

Spain, Singapore, India, Russia. That was pretty much for

holidays though.

Kirill: That's crazy.

Artem: That's pretty much it.

Kirill: That's so cool. Even I'm a bit jealous. Like in a good way. I'm

happy for you. I'm really happy that you get to travel.

Artem: Thank you.

Kirill: Do you find it stressful, travelling all the time for work? Or

do you find yourself doing -- what do you do on the plane?

It's such a long flight from Australia to Italy and Spain, and


so on. What do you find yourself doing? Do you just keep

working all the time?

Artem: If I have to work, I have to work. But very often, it's night

flights, so in general, if I'm in business class, I can sleep

well.

Kirill: Nice. Nice. Yeah.

Artem: Sometimes you just feel like you can watch movies.

Kirill: Your frequent flier miles must be through the roof!

Artem: Oh yeah!

Kirill: Which airlines are you with?

Artem: I'm with Qantas. But actually, to be honest, I have not

travelled much in the last 6 months, so they're probably

going to downgrade me from Gold to Silver, and then I'm

also Gold with Singapore Airlines, I also have some status

with Emirates and Etihad.

Kirill: Everything. A little bit of everything.

Artem: Yeah.

Kirill: Sounds like fun. You mentioned cases. You said you're with

the case team, or you're working on a case. That sounds like

a police case, or a legal case. I'm assuming it's not, of

course. But what is a case? Because I remember at Deloitte,

we never had that term. What do you mean by case?

Artem: It's a project. So every company has its own terminology for

a project. I think at Deloitte, we had an engagement for this,

so we called them engagements. So here they're called cases. I think

in some other place it's called project. But it's essentially a

project for the client.


Kirill: Ok. Now that our listeners have envisaged who Artem is, and

that he's obviously just without a doubt, you're very

successful in what you do in your career, and you sound

very happy about what you do, and who wouldn't be,

travelling the world and doing all these exciting projects, can

you tell us please a little bit more about your background so

that our listeners can understand what pathway you took to

get to where you are.

Artem: Sure. So my first degree was in economics, and then I got

my postgraduate degree in finance, which is completely

different from what I'm doing right now, to be honest. And

then I started to work at Deloitte in the data analytics team,

and to be honest, in the first three months, I thought I was

going to leave it, just because I had used a bit of programming,

but not much, so things like R I didn't know about. I had to

learn it from scratch. I had to learn SQL from scratch. So

most of the tools I had to learn from scratch, and then let's

say I was thrown in the ocean and I was looking at the SQL

scripts that they had in place, and I just didn't understand

anything, to be honest. And I thought I'm not going to

survive for long.

But then I kind of started to understand everything. I spent

a lot of time after office hours trying to understand all the

procedures, all the scripts, trying to learn the language, and

I kind of liked it. And that's how I became a data scientist.

Kirill: Wow, I love it. That's a great story. And I actually forgot

about that. Because you had told me those things, that you

needed to learn even SQL from scratch. So it's a great

example that yes, you do have two degrees. You have a

Bachelors and you have a Masters, but they're completely in

unrelated fields. Yes, it's economics and finance, so it's


somewhat related. But you still had to build your data

science skill set from scratch. And I think that's going to

stand as a lot of inspiration to a lot of people who are going

to be listening to this podcast who don't know where to

start. Like, you're a great example of a person who didn't

give up, who actually just pushed through it, and like you

say, late nights, and perseverance, and actually learning all

these tools from scratch. So yeah, that's great to hear.

What would you say is the one biggest piece of advice you

can give to somebody who's going to be in the same shoes

you were in 4 years ago, or was it 6 years ago?

Artem: Do what you like. If you don't like the area that you are

working in, or the area that you are studying, think hard

about it, whether you should continue. Because the main

strengths are in what you like. If you like it, then you will

find inspiration, you will find strength to do it. And just

don't give up.

Kirill: That's fantastic. Thanks a lot for that. And speaking of

learning R programming, because recently we chatted, and

we were talking about R. And if you don't mind me

mentioning, you said that you don't use R much, and you're

slowly starting to forget that skill. Do you think it's easy if

you want to recover it? Do you think it will take you a long

time to recover R programming now?

Artem: I don't know, really quickly? In a few hours, I can pretty

much remember everything. I think the reason why I'm not

using R so much is that I now switched to another tool,

which is called Alteryx, which is kind of a mixture of SQL

and R. So basically, what I could do is a combination of

these two tools, SQL and R, I can now do in one. So Alteryx


has an in-built module, and in-built residues for R. So there

are in-built things like regressions, Random Forests, things

like that, which are based on R code. So you can basically

run R code in Alteryx. It's quite cool.

Kirill: Ok. Can you tell us a bit more about Alteryx, is it a free

software? And also these models that are incorporated, do

you need to download libraries to install them, or do they

come pre-packaged and stuff like that?

Artem: So it's not free, it's commercial software. To be honest, I'm

not sure how much a licence costs. As far as I'm aware, it's

not too expensive. It's definitely cheaper than some of the

other software that we use. And what it can do, so it's very

good in data manipulation. So things like queries, data

restructuring, joining tables, things like that. Aggregations,

grouping. But then it also has some other modules, in-built

modules, which allow you to do some additional things.

So for example, there is a module called Statistical Model,

which is linked to R, which can do regressions, Random

Forests, neural networks, and it's very easy to set up. So

easy you don't need to program, you just drag and drop

different elements together and create a diagram. And then

you can also do simple geospatial modelling in Alteryx as

well. So for example, if I have a client who is a retailer, and I

know the locations of their stores. And I know competitors,

locations of their competitors. I can pretty quickly derive

something like 10 minute drive time radius based on actual

drive time network, based on actual road network, and I can

understand what kind of population lives within 10 minutes

of our client's stores and within 10 minutes

of competitors' stores. So what are the demographics of


these catchments, compare it, and do some analytics on

that.
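
The drive-time catchment idea can be sketched outside Alteryx as a shortest-path search over a road network. This is not how Alteryx implements it; the toy network, the populations, and the function names below are assumptions for illustration only.

```python
import heapq

def drive_times(roads, start):
    """Dijkstra's algorithm: shortest drive time in minutes from `start` to each node."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, minutes in roads.get(node, []):
            candidate = t + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best

def catchment_population(roads, population, store, limit=10.0):
    """Total population of areas reachable from `store` within `limit` minutes."""
    times = drive_times(roads, store)
    return sum(population.get(area, 0) for area, t in times.items() if t <= limit)

# Toy road network: node -> [(neighbour, drive minutes)], listed in both directions.
roads = {
    "A": [("B", 4), ("C", 7)],
    "B": [("A", 4), ("C", 2), ("D", 8)],
    "C": [("A", 7), ("B", 2), ("D", 5)],
    "D": [("B", 8), ("C", 5)],
}
population = {"A": 1200, "B": 800, "C": 1500, "D": 2000}
print(catchment_population(roads, population, "A"))  # client's store at A
print(catchment_population(roads, population, "D"))  # competitor's store at D
```

Replacing the population count with demographic breakdowns per area would give the kind of catchment comparison described above.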

Kirill: Wow, and that all happens within Alteryx?

Artem: Yeah.

Kirill: That's really cool. So is that what you predominantly use it

for? Or do you also utilise the Random Forest, and neural

network algorithms that you mentioned?

Artem: Yeah, I also use it for statistical modelling.

Kirill: Ok, ok. That's very interesting. What are the probably 2 or 3

most used modelling algorithms, or which ones are the ones

that you use most?

Artem: So if we are talking about statistics, then it will be linear

regressions and Random Forests, or GLMs, or boosted

models. Sometimes you would use things like Random

Forests, which are essentially black boxes, right. You don't

know exactly how they operate. Well you know roughly how

they operate, but you don't know exactly how they transform

the input data into the final recommendations, so you don't

know exactly how each of these different attributes that you

put into this model, how does it affect your final output. You

can do some sensitivities, but it's effectively black box. And

sometimes I use that, like if I don't need to explain how

exactly I got to this result, if I just need to predict

something. So, for instance, recently, I used it to predict

total value that a bank can get from each area in Australia

based on their current customer base. So let's say they have

current customers distributed across the whole of Australia.

They don't have customers everywhere, in every single

region, and we had 55,000 different areas that we can split


Australia into, and they obviously don't have customers in

every single area.

Now, what we can do, based on the current customer base,

based on the demographics and value of the products that

they take, we can infer what the other areas are

worth for this bank if they put in a branch in these areas.

And I did that using the Random Forest, just because I had

to make a prediction. I didn't need to explain exactly which

demographics attribute results in uplifting this metric in my

final metrics.
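
For readers who have not met the technique, here is a deliberately tiny sketch of the averaging idea behind a Random Forest: many small trees, each fitted to a bootstrap resample, with their predictions averaged. It uses one feature and single-split trees, so strictly speaking it is bagging rather than a full Random Forest (which also samples features at each split). The data and function names are invented, and this is not the model used on the bank project.

```python
import random

def fit_stump(data):
    """Best single-split regression tree (a 'stump') on one numeric feature."""
    best = None
    xs = sorted({x for x, _ in data})
    for lo, hi in zip(xs, xs[1:]):
        split = (lo + hi) / 2
        left = [y for x, y in data if x <= split]
        right = [y for x, y in data if x > split]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    if best is None:  # all x identical: predict the mean
        mean = sum(y for _, y in data) / len(data)
        return lambda x: mean
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def fit_bagged_stumps(data, n_trees=50, seed=0):
    """Average many stumps, each fitted to a bootstrap resample of the data."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(tree(x) for tree in trees) / len(trees)

# Made-up data: customer age -> annual value, with a jump at age 40.
data = [(age, 100.0 if age < 40 else 300.0) for age in range(20, 61, 2)]
model = fit_bagged_stumps(data)
print(model(25), model(55))
```

The averaged prediction is easy to compute but hard to attribute to any one input, which is the black-box trade-off described above.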

However, there are other situations when you would discard

Random Forests, or GLMs, or whatever other model you are

using, in favour of a much simpler model, something like a

linear regression. It can have a bit less predictive power, but

its major strength is in the fact that it's interpretable. You

can easily interpret it in terms of coefficients. So let's say

each of your predictors will have a coefficient associated with

it, and the value of that coefficient will basically indicate

what's the impact of this predictor on your final outcome.

Which can be very, very useful assuming you're doing your

statistical analysis right. Assuming there is no

multicollinearity and other very scary statistical things.

Kirill: Homoskedasticity, yeah?

Artem: Yeah, yeah, something like that. Assuming all major

assumptions hold, the coefficients are pretty interpretable

and you can basically say how much each predictor, what's

the impact of each predictor on your final outcome.
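
To make the point about interpretable coefficients concrete, here is a small sketch of ordinary least squares in plain Python. The customer data is generated from a known, made-up relationship (100 + 2 per year of age + 50 per product) so the fitted coefficients can be checked against it. The function and variable names are assumptions for the example.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0, *row] for row in rows]          # prepend an intercept column
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up customers: (age, number of products) -> annual value.
rows = [(25, 1), (40, 2), (35, 3), (60, 2), (50, 4), (30, 2)]
targets = [100 + 2 * age + 50 * products for age, products in rows]
intercept, per_year_of_age, per_product = fit_linear(rows, targets)
print(round(intercept, 6), round(per_year_of_age, 6), round(per_product, 6))
```

Each coefficient reads directly as "value added per unit of this predictor, holding the others fixed", provided the usual assumptions hold.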

Kirill: That’s really cool. And it’s interesting that you mention that

because -- probably this is more for our listeners -- that if

you're interested in learning about any of those algorithms,


like Random Forests, or linear regressions, and all of the

interpretation, you might want to check out the courses on

SuperDataScience, which are the machine learning course,

and Data Science A to Z. We discuss all those things in a lot

of detail, so a lot of the students listening to this podcast

actually should be quite familiar with these concepts. The

only one I would ask you to clarify a little bit is GLM. What

does GLM stand for?

Artem: Generalized Linear Model which is just a more sophisticated

version of linear models like that. It can take up different

combinations, so it’s not just linear relationships that it can

test.

Kirill: That’s really cool. And Random Forests, linear regressions —

and that was a great example about the bank and how you

would predict the outcome for the bank. I’m still trying to get

my head around how do you think of that problem in a way

to say, ‘Oh, it actually makes sense. I probably should use a

Random Forests algorithm.’ Because at the end of the day,

Random Forests is a combination of decision tree

algorithms, right? So it’s just many decision trees and then

their averaged out outcome from there. So how would you go

about thinking a business problem to come up with a

conclusion that the Random Forest is the way to go in this

situation?

Artem: Well, you start with the business problem as a whole. So you

need to understand what the business problem is, what the

business implications are. And then what I do is I spend a

few hours in front of a whiteboard trying to basically pencil

out a solution, drawing out a methodology, how I would

approach this problem. And let’s say -- the example that I

gave you -- it was actually a part of the problem. So that


was, let’s say, a first step to solving a much larger problem.

And the problem was that we had to determine best

locations for their branches, so to do a network optimisation

for that bank and to determine best locations for the

branches. And then I thought about it, about this problem.

How do you solve it? Let’s say, in order to make a decision

on where would you put branches geographically, you need

to know: a) how is the value distributed geographically, so

what is the potential of each area. And then you need to

know how do these branches capture this value.

Having these two pieces together, you can then run

mathematical optimisation, maximising the value that these

branches capture based on their locations. However, what

you don’t know, like you don’t know these two pieces in

advance. So you need to determine the value, like what’s the

potential of each area. And that’s where I use this Random

Forests technique. So I thought about it: how can I derive

the potential of each area in terms of value per bank? Of

course, we have a current customer base, we know their

profitability, we know which products they take, we know

where they live roughly, we know the demographics. Again,

roughly, their age, their gender, etc. And we can use this to

understand whether these demographics, whether these

attributes affect the profitability of a customer. So whether,

for example, older people take more valuable products, take

more mortgages, things like that. And then you use these

insights to run a statistical model to make a prediction.

Again, now that I know this information about my

demographics, how it impacts my profitability of my

customers, now I also know population of each area in

Australia in terms of number of people and their associated


demographics from the census data, and now I can run a

model, a statistical model, which basically will predict what

is the potential of each area based on things that I know.
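
Once each area has a predicted potential and there is a rule for which areas a branch can capture, the location choice becomes an optimisation. The sketch below uses brute force over every combination of sites as a stand-in for a proper mathematical optimiser, so it is only practical for a handful of sites. The potentials, the reach of each site, and the function names are hypothetical.

```python
from itertools import combinations

def captured_value(branches, potential, reach):
    """Value captured by a set of branches.

    An area's potential counts once if at least one chosen branch reaches it.
    """
    covered = set().union(*(reach[b] for b in branches))
    return sum(potential[area] for area in covered)

def best_network(reach, potential, n_branches):
    """Try every combination of `n_branches` sites and keep the most valuable."""
    return max(combinations(sorted(reach), n_branches),
               key=lambda combo: captured_value(combo, potential, reach))

# Hypothetical inputs: predicted potential per area, and areas each site can serve.
potential = {"a1": 50, "a2": 30, "a3": 40, "a4": 20, "a5": 10}
reach = {
    "site_x": {"a1", "a2"},
    "site_y": {"a2", "a3"},
    "site_z": {"a3", "a4", "a5"},
}
choice = best_network(reach, potential, 2)
print(choice, captured_value(choice, potential, reach))
```

Counting each area only once is what stops the search from placing two branches on top of the same customers.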

And then I thought about a statistical model, what statistical

model I can use. So, obviously, you have a range of different

statistical models, such as GLM, Random Forests, and even

neural networks, or simple linear models. The choice of

which model, so what I often do is I often run several models

at once and then I compare the predictions. So I compare

the performance of each model and then I select the best

model, the model which performs the best, subject to certain

considerations in regards to whether I need to interpret,

whether I need an interpretable model or not, in which case I

would not use Random Forests. Like, what kind of an

output, what kind of an outcome, an output variable do I

test on it again? So for example, if it’s a categorical variable

that I want to predict or whether it’s a numerical variable, it

will all shape the choice of the final technique.
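
The habit of running several models and keeping the best one, subject to whether the result has to be interpretable, can be sketched as follows. The three models are simple stand-ins chosen to keep the example short: a mean baseline, a one-variable linear fit, and a nearest-neighbour lookup playing the role of a black box such as a Random Forest. The data is synthetic and every name is an assumption.

```python
import random

def mean_model(train):
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def linear_model(train):
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)

def nearest_neighbour_model(train):
    # Predict the target of the closest training point (black-box stand-in).
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def rmse(model, data):
    return (sum((model(x) - y) ** 2 for x, y in data) / len(data)) ** 0.5

def pick_model(fitters, train, test, need_interpretable=False):
    """fitters: name -> (fit function, is_interpretable). Returns (name, scores)."""
    scores = {name: rmse(fit(train), test)
              for name, (fit, interpretable) in fitters.items()
              if interpretable or not need_interpretable}
    return min(scores, key=scores.get), scores

# Synthetic data: a linear relationship with a little noise.
rng = random.Random(1)
data = [(x, 3.0 * x + 10.0 + rng.gauss(0, 0.5)) for x in range(60)]
rng.shuffle(data)
train, test = data[:40], data[40:]
fitters = {
    "mean": (mean_model, True),
    "linear": (linear_model, True),
    "nearest_neighbour": (nearest_neighbour_model, False),
}
print(pick_model(fitters, train, test))
```

Setting `need_interpretable=True` removes the black-box candidate before the comparison, mirroring the case where the result has to be explained to the client.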

Kirill: That really explains it well. Thank you so much, especially

that part where you started mentioning different

characteristics that you have about the customers of the

bank -- their age, their gender, and other knowledge that

you have about them. That kind of in my head now makes

sense where the decision tree comes from. So you're kind of

like, decision trees, like ‘Are they over 30 or under 30? Are

they male or female? Do they work in white collar or blue

collar?’ and things like that. And then based on that, you

would get a Random Forests algorithm to work on through

using those decision trees and kind of like -- it makes sense

how it would work. So it was a great case study. So thanks a

lot for walking us, literally walking us through this case


study. I think it’s a lot of value for people studying statistics

and especially machine learning. Another question I had,

since we’re talking about some of the work that you’ve done,

is what would you say – if you can share this information, of

course, because I understand BCG has certain non-

disclosure statements and stuff like that – but whatever you

can share, what has been your recent biggest win, you

would say, in the space of data science or advanced

analytics, and your biggest challenge in your day-to-day

role?

Artem: Tricky question. I can probably mention one of my previous

cases that I’ve done, which was in Europe. It was a network

design for a European utility company which I did last year.

And I can probably say it was a recent win that I can share

with you. And the reason why it was a very big win for me

personally is the work involved very advanced models,

modelling techniques. So it was very technically

sophisticated because it was including an optimisation

model -- so mathematical optimisation -- and a simulation

model just to solve one problem for a client, which is quite a

rare case, to be honest. It’s a very rare occasion that you

would need to. Most often you would just use either

optimisation, for example, if you want to understand what

are the best locations for your warehouses, or you would use

a simulation to test certain scenarios. So for example, if you

want to test how certain production initiatives will impact

your total productivity at the plant. But in that particular

case, the problem required the use of two different

techniques, including some just spatial modelling as well,

which together had very tangible results achieved for the

client. And client appreciation for the whole project made it a


very enjoyable case for me. So it was a big win for the client.

It was a big win for us. A very nice case, very nice team to

work with.

Kirill: Wow, that’s fantastic! And that’s also a good example. I

know you probably can’t go into a lot of detail about the

project itself, but a good example for those who are listening

who have their own businesses or who are in managerial or

even executive positions. Like, when you think about it, you

can just place warehouses anywhere, right? You can just

place warehouses wherever it’s cheaper. But then, why

would you do that if you can run some optimisation, supply

chain optimisation, and other analytics to understand what

is actually the best location for your warehouse. It’s just

something that doesn’t come to mind right away and maybe

for those listening who have their own businesses, maybe

there’s other parts of your businesses that you are just like

placing or going about based on your intuition, or gut feel, or

just based on some common standards, acceptable ways of

conducting business. But at the same time, maybe there’s a

better approach through data to actually come up with a

more optimised solution. So, thanks for that. And what

would you say is your biggest challenge?

Artem: I think it’s a very, very good point that you just mentioned

because very often, and I see that a lot with our clients etc.,

that people just use their gut feeling to make certain

decisions, right? So they base them either on Excel

spreadsheets, which don’t take into account all the business

rules, etc. They base their decisions on gut feelings and their

intuition based on how the business did it in the past, which

is most often not the best way to do things. And let’s say, for

example, let’s take again this example of warehouses. You


need to put 10 warehouses across the whole country and

you need to put it in such a way as to -- you also want to

minimise your costs, your supply chain costs. And then

there can be lots and lots of different considerations that can

shape, that can impact this transportation cost. So

obviously you want to minimise your transportation

costs, whether it’s road, or rail, or whatever else. You

want to minimise your inventory costs. Then there can be

other business rules you need to take into account.

So, for example, I had a case when I had to take into

account that when you transport your materials from a

warehouse to a customer, and if they’re in the same state,

there is no tax applied. But if they’re in a different state,

there’s an interstate tax that has to be applied, that the

client pays for. Which basically has lots and lots of impact

on your final solution because it basically incentivises you to

put warehouses in the same states as where customers are.

And then there can be lots of these different business rules

that you need to take into account. You may have capacities

of your factories playing a huge role. You may have even

limited capacities on your, let’s say, railroad transportation

or road transportation. And all of this shapes your solution.

You just can’t take all of this into account if you make your

decision based on your intuition, or based on the historical,

how your company did it in the past. So mathematical

optimisations, they have become so powerful in the last

decade with the computer power basically rising, hugely

increasing in power, and optimisations have become much

easier to solve just from the pure processing time and

algorithms. Algorithms have developed intensely over the

last decade. And then these mathematical optimisations,


they can be applied to solve business problems as well,

taking as an example this warehouses problem.

So you can describe all these business rules and constraints

in the form of mathematical equations and mathematical

problems. You have an objective, you have your levers,

things you can pull to shape your solutions. So in this case

it will be locations of your warehouses, which the model can

change, and then your constraints. Your business rules are

constraints such as it can be capacities of factories, it can be

capacities of your transportation, it can be different taxes

etc. And then you can formulate this problem in the

mathematical form that the computer will understand. And

it will try to optimise it, or try to find, let's say, a minimised

cost, so the least cost, subject to all the constraints that

you’ve put in, and it will determine what is the best -- in our

case, what is the best location of the warehouses which

minimises the cost, but at the same time satisfies all the

constraints that you’ve put in.
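
The objective, levers and constraints described above can be sketched on a toy problem. This is only an illustration under stated assumptions: the site coordinates, customer demands, cost rate and the names `transport_cost` and `best_layout` are all hypothetical, and the search is plain enumeration rather than the mathematical optimisation solvers Artem is talking about.

```python
from itertools import combinations

# Hypothetical candidate warehouse sites (x, y) and customers ((x, y), demand in tons).
SITES = {"A": (0, 0), "B": (10, 0), "C": (5, 8), "D": (2, 9)}
CUSTOMERS = [((1, 1), 40), ((9, 1), 30), ((5, 7), 20), ((3, 8), 10)]
COST_PER_TON_KM = 1.0   # assumed transport rate
NUM_WAREHOUSES = 2      # constraint: exactly this many warehouses are opened


def distance(p, q):
    """Straight-line distance between two points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5


def transport_cost(open_sites):
    """Objective: each customer is served from the nearest open warehouse."""
    total = 0.0
    for location, demand in CUSTOMERS:
        nearest = min(distance(location, SITES[s]) for s in open_sites)
        total += COST_PER_TON_KM * demand * nearest
    return total


def best_layout(k):
    """Lever: which k sites to open. Try every combination and keep the cheapest."""
    return min(combinations(sorted(SITES), k), key=transport_cost)


layout = best_layout(NUM_WAREHOUSES)
print(layout, round(transport_cost(layout), 1))
```

Enumeration only works because this toy has six possible layouts. With realistic numbers of sites and extra business rules such as capacities or taxes, the same objective and constraints would be handed to a solver instead.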

And it is so powerful and we see -- so, we’ve implemented

these techniques with so many clients, and we can see huge

benefits from just using these techniques. You can do things

slightly different, so slightly differently than you do now, and

you can save lots of money just because historically you just

don’t make optimal decisions. And optimisations can

choose from an enormous range of alternative solutions. So

you can have hundreds of thousands of different locations in

the country where you can potentially put a warehouse. You

can choose either. And optimisation will choose the best

ones.

Kirill: That’s definitely something you can’t do just with gut feel or

on a piece of paper.


Artem: Exactly. And most standard approaches would involve just

tweaking the things, so testing different scenarios. Let’s say

if I just move a warehouse from location A to location B. You

would recalculate all the costs, etc. You would compare

these two scenarios, and then ok, if it's better, you’d say, ‘We

need to move this warehouse to another location just

because it will improve the costs.’ But then what you don’t

have the ability to check is whether there is another location

which is slightly better than the location which I found,

which would improve your costs even further. Let’s say even if

you have just a hundred locations, and if you have ten

warehouses -- I can’t do math in my head right now, but I

believe the number of possible combinations where you

could put ten warehouses out of a hundred locations is

enormous, more than trillions of different combinations.
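
The count Artem could not do in his head is a single binomial coefficient. A quick check in Python, using the standard library's `math.comb`, puts it at roughly 17 trillion ways to choose ten warehouse sites from a hundred candidate locations.

```python
import math

# Number of ways to choose 10 warehouse sites out of 100 candidate locations.
combinations_count = math.comb(100, 10)
print(f"{combinations_count:,}")  # 17,310,309,456,440
```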

Kirill: You’ve raised a couple of very interesting points when you

were describing this problem. So mathematical equations, in

my understanding, it might sound complex, it might sound

like Fourier transformations or some crazy high level

mathematics, but it’s actually not, right? Am I right in

saying that the mathematical equations you’re talking about

are very straightforward, like eleventh grade or tenth grade

mathematics? Is that right?

Artem: Yeah. It’s pretty much right. So, the equations are pretty

simple. The trickiness is to formulate this problem into a

form that this optimisation will understand. And there are

also pitfalls. The most common approach is to use linear

programming. That’s when all the equations and constraints

are set up in a linear form. That’s the easiest way to solve it

because there are lots of algorithms that just basically

crack this problem easily if it’s formulated in a linear way.


But then some problems, or most of the problems in reality,

are non-linear in nature. And there are a few ways how you

can approach that. So, one way—there are tricks. You can

basically transform a non-linear problem into a linear form

just using some binary variables, using some tricks. Then

you can use non-linear optimisation techniques, but that’s

slightly harder to use. And finally, you can use something

like genetic algorithms.

Kirill: Yeah, well familiar with those. Those are very popular in the

financial world. So can you give us an example of a non-

linear problem and a trick that you would use to change it

into a linear problem? I know it must be a hard question,

but something simple just so that we get a better

understanding of what you mean by non-linear.

Artem: Sure. So let’s say you have a fixed cost for a warehouse. I’m

just going to stick with this warehouse example. Which

means that if you are using a warehouse, if you are putting

a warehouse in a certain location, then there are certain

costs associated. There are certain variable costs which are

dependent on your throughput, so the more commodities

you transport via this warehouse, the more you pay.

Because it’s handling, it’s inventory costs, etc. So, it depends

on your throughput.

Then there are also things like your fixed costs. So basically

what it means is that if you have a warehouse, whether you

rent it or whether you own it, you pay some money,

irrespective of how much you use it. So whether your

throughput is 1,000 tons or 100,000 tons, you would still

pay the same amount of money to use this warehouse. It’s a

fixed cost. It’s a fixed cost on the business and it’s effectively

non-linear in nature, while variable costs are linear. So let’s


say if you have a unit cost of $1 per ton, then if you have

1,000 tons of throughput it will be $1,000; if you have

100,000 tons of throughput it will be $100,000. That’s

linear. But then fixed costs, or rather incorporating fixed

costs into this formulation, is basically non-linear.

And then there is a technique, like a trick which you can use

to transform this non-linearity into a linear problem. So

what you can do is you can introduce a separate binary

variable which says ‘Okay, if we have a warehouse in this

location then it’s 1; if we don’t have a warehouse in this

location then it’s 0.’ And then what you do is you multiply.

So you use a product of this binary variable and your fixed

cost, so a fixed cost is $25,000 per month for one

warehouse. And then what happens is the model will choose, for

each location it will choose either 1 or 0, whether we’d like to

put a warehouse or not, and then you multiply this by your

fixed cost. If it’s 1 then it will be multiplied by 25,000, so it

will be a fixed cost. If it’s 0, it will be 0.
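
The binary-variable trick can be written down as a small cost function. This is a sketch, not the formulation from any real engagement: the $25,000 fixed cost and $1 per ton come from the conversation, while `BIG_M` (an assumed upper bound on throughput) and the function name `warehouse_cost` are hypothetical. The linking rule "no throughput through a closed warehouse" is an added assumption that such formulations usually need so the fixed cost cannot be dodged.

```python
FIXED_COST = 25_000   # dollars per month for one open warehouse (from the conversation)
UNIT_COST = 1.0       # dollars per ton of throughput (from the conversation)
BIG_M = 100_000       # hypothetical upper bound on throughput per warehouse, in tons


def warehouse_cost(is_open, throughput):
    """Cost that is linear in both variables: FIXED_COST * y + UNIT_COST * x.

    is_open is the binary variable y (1 = warehouse at this location, 0 = none),
    throughput is x in tons.
    """
    if is_open not in (0, 1):
        raise ValueError("is_open must be 0 or 1")
    # Linking constraint, also linear: x <= BIG_M * y.
    if throughput < 0 or throughput > BIG_M * is_open:
        raise ValueError("throughput violates x <= BIG_M * y")
    return FIXED_COST * is_open + UNIT_COST * throughput


print(warehouse_cost(0, 0))        # closed warehouse costs nothing
print(warehouse_cost(1, 1_000))    # fixed cost plus $1,000 of variable cost
print(warehouse_cost(1, 100_000))  # same fixed cost, more variable cost
```

Both the cost expression and the linking constraint are linear in `is_open` and `throughput`, which is what lets a linear (mixed-integer) solver handle the fixed cost.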

Kirill: So that’s how you transform a non-linear problem into a

linear one?

Artem: Yeah, so you just introduce additional variables, and most

often it’s binary variables which basically introduce some

additional logic side.

Kirill: Okay. Yeah. No, that totally makes sense and it’s actually a

very interesting example. I think I learned a bit about that

myself like that. You don’t think about it but really, these

constant costs for the warehouse in this scenario, they are

non-linear. So they don’t increase with your throughput. So

you do need to come up with a way to deal with them. So

that might be a handy trick. Thanks for that. It kind of


reminds me of dummy variables in regression, when you

have a categorical variable in your regression and you need

to introduce a dummy variable like 1 or 0.

Artem: Yeah, exactly. And another good example is when you have,

let’s say, a constraint which has to be met, not

simultaneously but let’s say if there is a condition that the

first constraint is met, then the second constraint has to

also be met. There are tricks how to transform this non-

linear logic into a linear set of equations as well. I don’t

remember off the top of my head how exactly to do it, but

again, introducing one or two binary variables can solve this,

can transform it into a linear problem.
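
Artem does not spell the trick out, so the following is one common textbook way to do it, offered as an assumption rather than his method. If binary variables `y1` and `y2` indicate whether the first and second conditions hold, the rule "if the first is met, the second must be met" becomes the linear inequality `y1 <= y2`. The function names are hypothetical, and tying each indicator to its underlying constraint takes further steps not shown here.

```python
from itertools import product


def implication_holds(y1, y2):
    """The logical rule: if condition 1 is met, condition 2 must be met."""
    return (not y1) or bool(y2)


def linear_form_holds(y1, y2):
    """The same rule as a linear inequality over binary variables."""
    return y1 <= y2


# Check all four combinations of the two binary indicators.
truth_table = [
    (y1, y2, linear_form_holds(y1, y2)) for y1, y2 in product((0, 1), repeat=2)
]
for row in truth_table:
    print(row)
```

The only combination ruled out is `y1 = 1, y2 = 0`, exactly the case the logical rule forbids.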

Kirill: All right, thanks. It sounds like a very interesting field. So

we’ll get back to that. I have some other questions in terms

of career wise. But also I wanted to make a comment on

what you said earlier, that it is so easy that people,

especially business owners who do not use data and

data science in their decision making process are making an

unforgivable mistake because computers have developed so

rapidly and also algorithms, with the computers, have

developed so rapidly over the past decade that you should be

using them.

One of the things that pops to mind on that topic is that

before, back in the day, decision trees, when they were first

brought into life, they were popular. But then they kind of

died off because more sophisticated algorithms took their

place, like linear regressions, logistic regressions, support

vector machines, and so on. But now, decision trees, even

though they’re not as powerful, because the computers are

getting so powerful, now we’ve got algorithms like Random

Forests or gradient boosting, which actually employ those


previously used methods such as decision trees, but they

use them in an ensemble way. So instead of having one decision

tree you have like 500 or 50,000 decision trees working for

you. And as an ensemble, they make better predictions than

one individual decision tree. So it is exactly the case that

both algorithms and computers have developed so rapidly

over the past decade that it is so easy to come up with a

model, or even to just hire somebody like BCG or any other

consulting firm to help you out, place those warehouses or

whatever you’re trying to solve using data science. So that

was a great comment and I totally agree with you on that

one. So moving back to what we started talking about, what

is your daily challenge? What is the most challenging thing

in your role?
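
On the ensemble point above, a rough back-of-the-envelope sketch of why many weak voters can beat one. It assumes every tree is right with the same probability and that trees err independently, which real Random Forests do not satisfy (their trees are correlated), so the numbers are an idealised upper bound. The function name `majority_vote_accuracy` and the 60 percent figure are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_single):
    """Probability that a strict majority of n independent voters is correct.

    Each voter is assumed to be right with probability p_single, independently.
    Use an odd n_trees so that ties cannot occur.
    """
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_single**k * (1 - p_single) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


for n in (1, 3, 51, 501):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these idealised assumptions the vote of a few hundred trees is almost always right even though each tree alone is only slightly better than a coin flip.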

Artem: I think the challenging thing, so very often I have projects

that use huge amounts of data that I need to handle. So, for

example, my previous project involved something like over

30-40 different data sets that I had to manage pretty much

on a daily basis. So I need to remember what kind of

information is located in which data set. I need to remember

how to link all these data sets, what do these different fields

mean. If I need to pull out some additional information that I

didn’t have in my analytics data set before, which original

data set do I go into. And that was quite a tough

challenge to handle just because I was the only

person on this case doing the data analytics and Advanced

Analytics stuff. Yeah, that was quite challenging. And the

way I overcame it is that I just used—like, I built in very,

very quick additional tools for myself, very basic ones in

Excel where I basically just had a list of all the data sets I

had with the corresponding business owner from the client


side, so who can I go back to if I had any questions or if the

data is slightly off. And then I had a list of comments across

fields and general comments like ‘Oh, this field is not

reliable. Don’t use it,’ etc.
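
Artem kept this catalogue in Excel. The same idea can be sketched in a few lines of Python for anyone who prefers to keep it next to their code. The class `DatasetEntry`, the helper functions and every data set, owner and field name below are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One row of the catalogue: a data set, who owns it, and notes on it."""
    name: str
    business_owner: str                                   # who to ask on the client side
    field_comments: dict = field(default_factory=dict)    # field name -> note
    general_comments: list = field(default_factory=list)  # notes on the whole data set


# Hypothetical catalogue entries.
catalogue = [
    DatasetEntry(
        "transport_rates",
        "Logistics manager",
        {"rate_per_km": "Reliable", "carrier_code": "Not reliable. Don't use it."},
    ),
    DatasetEntry(
        "customer_orders",
        "Sales operations lead",
        {"customer_id": "Join key to the customer master", "order_date": "Reliable"},
        ["Check with the owner before using older records"],
    ),
]


def datasets_with_field(entries, field_name):
    """Which original data sets contain a given field?"""
    return [entry.name for entry in entries if field_name in entry.field_comments]


def who_to_ask(entries, dataset_name):
    """Who is the business owner for a given data set?"""
    for entry in entries:
        if entry.name == dataset_name:
            return entry.business_owner
    raise KeyError(dataset_name)


print(datasets_with_field(catalogue, "customer_id"))
print(who_to_ask(catalogue, "transport_rates"))
```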

Kirill: Okay. And is that something that you continue doing on new

projects now?

Artem: It depends. Sometimes it’s not. I generally don’t work with

lots of data sets now that I’m at BCG. So if I have, let’s say, a

supply chain optimisation case, of course you need data,

you need things like transportation data to understand what

the historical price rates are, just as an example. But

usually, I have someone else do it for me and then I just use

these calculations for my modelling. So I don’t usually work

with large data sets, I just only use this technique or trick if

I have very, very large data sets to work on.

Kirill: Okay. All right, that’s a good example of a challenge and

maybe some of our listeners can learn from that, that you

shouldn’t get lost in all the data sets that you have, so make

sure to keep track of them from the very, very start. And

now that we know a little bit more about exactly what you do

and this new style of analytics, I’m sure that a lot of our

listeners will find that this is a new kind of approach or a

new field in analytics that they haven’t explored before, this

simulation type of analytics, and Advanced Analytics. What

would you suggest, what would you say the one most

important thing is for somebody to look into to get into this

field? Because not everybody has to go through the same

pathway that you went through – data science through

Deloitte and learning R programming and so on. This

sounds like a field where you can get into even if you don’t

have a passion for R programming or Python or SQL, that


you could probably—just if you have that mindset, you

could probably get into this field. What would you say is that

one thing that people should focus on in order to break into

the field of Advanced Analytics?

Artem: Mindset is very important. Again, you have to like this thing

in order to start learning it and you incentivise yourself to

learn it, otherwise you have no chance. And first of all, for

example, the moment I saw this first simulation model,

which was an animated supply chain with trains moving

around, I literally loved it. I want to be doing this! I wanted

to build something like that! And at that time, I thought

about doing something like that in R, which was pretty

much impossible, and then I learned about these other

software tools that are available that can do these things and I

started to learn that, which helped me a lot.

And then, to some particular things, so for example, if we

take simulation modelling, you also need to know some

programming because all of the software that I know, they

are based on some kind of programming language. And the

one that I use is based on Java, for instance. You don’t need

to know hardcore Java, you don’t need to be a hardcore

Java programmer, but you need to know basics. That’s the

minimum. Ideally, in the beginning, you need to have an

intermediate level of programming in that language which a

tool is based on.

Kirill: Okay, that’s a good one. And are there any open source tools

or software or maybe even just websites where people who

want to try their skills out in this field, they can go to or they

can download these open and free tools just to get a feel for

it, you know, like a playground? Can you suggest any tools

that are free?


Artem: As far as I know this is a very commercial, commercially savvy

area, so all of the tools that I know about are commercial

and they are not free, unfortunately. So the one that I use,

for instance, is AnyLogic. They have a trial version which is

available on their website which is free. They also have a so-

called student or an educational version, which if you are a

student at university, and you are writing a coursework

which may require some simulation modelling, then they

can provide it for free, I believe as well, which you can try.

Then there is also a website. They have a website called

runthemodel.com which is the repository of the models built

in AnyLogic and it has models across all industries, whether

it’s supply chain, whether it’s manufacturing, whether it’s

finance. You can find lots and lots of different models there,

and I highly encourage you, if it is something that you might

want to look at, just go to this website and check different

models that they have just to get a feel of what it is and

whether it’s something that you may want to try or not.

And also, in order to do these kinds of things, you also need

to have business acumen. No one wants these models per se.

No one cares if you build a simulation model or optimisation

model. And consulting actually is very, very tough on that.

Like, 99 percent of consulting is all about delivering value

for our clients. And in 99 percent of cases, this value is

expressed in dollar terms. No one is interested in the

simulation or an optimisation model per se. Companies are

interested in how they can use these models, or the insights

from the analysis, to generate more revenue or to reduce

their costs. So that’s where my economic background and,

more broadly, my knowledge of how businesses operate

helps a lot. But then if you want to try something like that,


you also need to have this business acumen. So, no one is

interested in just the model. People are interested in what

they can do with these models, how these models can be

used to drive their profitability, increasing their revenue or

decreasing their costs, so that’s where knowledge of how

business operates helps.

Kirill: That’s awesome. So it’s very good advice. It’s very easy to get

carried away doing the analytics and not actually thinking

how the business is going to drive dollars. Because it might

sound a bit cynical, it might sound a bit too money-focused

and money-driven, but that’s the world we live in. We live in

a capitalistic world and a lot of the time, or most of the time,

people, especially businesses, are going to care about the

dollar value. So it’s very important when you’re building a

model to keep that in mind and, as you say, business

acumen helps a lot.

And the other thing that you mentioned, the

runthemodel.com, super excited about that. Everybody

who’s listening to this, jump on your browsers and go to

runthemodel.com and check out those AnyLogic models. I’m

going to personally do that as well. I’m really curious,

because I’ve seen some of those AnyLogics that you’ve

created, Artem. Those were very powerful and even very

exciting to look at. So I would love to see some more of that

and understand how they work as well.

Artem: And just to add to my previous point, I have heard one

saying once, which I quite like, and I would slightly

paraphrase it. So imagine that we have a chart where an X-

axis represents time, and the Y-axis represents level of

granularity of your work, so how deep you go into the rabbit

hole. Let’s say bottom of the chart being very, very granular


level of detail, and top of the chart being C-suite level, so like

CEO and CFO, etc. Did you imagine that? There may be

different opinions on that matter, but I say that you start

working as a data scientist, and especially in consulting.

You start very high on the Y-axis. So you start with the big

picture of the problem, and what are the business

implications, and then you go very deep into the data, into

the level of detail. You crunch the data, and you analyse it,

you derive some insights. Then you go back to the high level

with some preliminary insights or some results. You start to

check these, validate it at the high level. You go down, you

go back down to the number crunching and so on and so

forth. So you’re almost never in the middle. A lot of the time

you are spending cutting the trees in the bottom, if you like,

but you also jump high to see the whole forest and mustn’t

get lost in these trees. So that’s what I quite like in my area,

this velocity. You need to go down and then you need to go

up again. Pretty interesting.

Kirill: Yeah, that’s a great analogy. I’m just drawing it and yeah,

it’s how it looks.

Artem: Of course, in industries it may be slightly different. Like, you

won't go—if you have a boss, you won't bypass your boss

and go directly to the CEO or CFO to present your findings.

Unfortunately, you need to go to this middle level.

Kirill: Yeah, yeah, middleman. Thank you very much for that,

Artem. I just have two last questions. So the first one is,

where can our listeners find you, how can they follow you if

they want to learn more about your career and maybe

connect with you?


Artem: Probably LinkedIn would be the best option. So if they can

find my name, it's the name in LinkedIn. I wouldn’t expect

many people having the same name and surname as me

popping up, so hopefully you will find me pretty quickly.

Kirill: That’s great. We’ll leave that in the show notes at

SuperDataScience.com. You’ll be able to find the show notes

for this show and we’ll have a link to Artem’s LinkedIn there.

And one final question, what is the book? So we usually ask

about the book, but in this case we had a specific request

from Bo – big shout out to you, Bo in the U.S. – who is

interested in learning more about statistics and he would

like to know a book on statistics that could help him get into

the field and develop some advanced knowledge. Can you

recommend a book on statistics for our listeners, including

Bo?

Artem: Can you give me—sorry Kirill, can you give me a bit more

detail whether it’s like basic statistics or advanced statistics,

whether it’s a particular technique that Bo is interested in? I

have a few different options depending on what kind it is.

Kirill: So Bo, when we spoke with Bo, he said he was interested in

more of an advanced level of statistics, so his problem was

that his organisation uses a lot of—like, it presents findings

to—I think he was working with Microsoft, actually, like as a

consultant or something like that. And the findings that he

presented weren’t—like, the company that he was working

with didn’t like the results simply because he didn’t present

them in a statistical enough fashion. They didn’t have

distributions, he didn’t talk about standard deviations, so he

just gave them like numbers and charts, but the company

on the other end wanted some actual more deep statistical

backing and to actually prove that these were statistically


significant results. So something more on the advanced level

of statistics.

Artem: In this case, I can probably suggest the book "Statistical

Models" by David Freedman. It actually includes some basic

stuff as well, but it’s a good overview of all statistical models,

and in fact it’s one of the classic statistical books in a few

universities including Berkeley, so I would recommend it.

Then there are also lots of various books on different

techniques. This area is becoming very advanced and even

things like Random Forests, GLMs or boosted models, in fact

they can be—they will have their own books just devoted to

this one technique. So let’s say—if you’re interested in

GLMs, then there is a very good book called "Categorical

Data Analysis" by Agresti, I believe it’s pronounced, so have

a look at that if you’re interested.

Kirill: Okay, fantastic! So that’s "Statistical Models" by David

Freedman and "Categorical Data Analysis" by Agresti so I

will definitely put those into the show notes. Do you have

any final comments? Maybe other books or maybe

something that you’d like to wish our listeners on their way

into becoming data scientists as successful as yourself.

Artem: That’s a huge compliment from your side, Kirill! Thank you.

Look, I think my last piece of advice will be just don’t get lost

in the tricks because data science is a very handy area just

in terms of number crunching, and you can easily get lost in

the data, in the numbers, etc. And I saw many people do

that. But just remember somewhere in your head that you’re

all doing this just because there is a business problem

required to do this. What people and what company

executives are interested in is how they can use your work

and the results of your work to improve their balance sheet


or to improve their profit and loss. And once you understand

that, once you will be able to understand what the business

problems are, and even identify business problems yourself,

so identifying the areas, just being proactive and identifying

areas where you can add value as a data scientist. Because

very often people don’t know how they can use data science

to improve their current operations. Actually, it’s part of

your job to tell them, ‘Look, we can do this or something, for

example, on customer segmentation which can allow us to

do this, this and this. This will allow us to improve our

market income plan, etc.’ Just be proactive, think about the

business problems that you can use your skills to solve, and

proactively engage with the business stakeholders to use

your analysis to solve these problems.

Kirill: Fantastic! Thank you very much. So guys, advice is

basically, to sum it up, keep the endgame in mind.

Remember, always remember, why you’re doing what you’re

doing. Thank you very much, Artem. I really appreciate you

taking this time out of your busy schedule to share your

knowledge and insights. This was a fantastic catch-up. I’m

very excited about this, and I’m sure a lot of our listeners

will learn so much from what you’ve shared. Thank you so

much.


Artem: Thanks, Kirill. It was a pleasure for me.

Kirill: And there you have it. I am still so excited about this

episode. I hope you derived so much value from here.

Personally, I learned a lot. Personally, for me, the most

mind-blowing thing was the whole concept of Advanced

Analytics and how it’s different to data science, and that you


don’t really need to develop those data science skills if you

want to get into advanced analytics. Yes, you will need to

know modelling. Yes, you’ll need to know a bit of stats. But

ultimately, you don’t have to go the same pathway that

Artem did. You don’t need to first study R programming, and

then do data science for two years, and then only discover

Advanced Analytics for yourself.

The website that Artem recommended, runthemodel.com, if

you check it out, so I had a quick look, but if you check it

out, you will see these models there that other people have

built. So you’ll see examples, and maybe that will inspire

you to research this type, or this field of – I won’t even call it

data science because it’s not data science – this field of

analytics that is completely different. And maybe you will be

so interested in it that you will decide to build your career

around that. So I highly encourage you to check out that

website, maybe get a trial for AnyLogic. And at the end of the

day it’s just a good thing to know that this part of analytics

exists.

And it’s interesting how we previously had the episode with

Dmitry Korneev, which was episode number 5, and there we

learned about data science and forensics and fraud

investigation. Here we are also learning about a whole new

field, which is Advanced Analytics and Artem was kind

enough to take some time out of his day and show us a

glimpse from this field. And if you found it interesting, then I

highly encourage you to research it further and see if you

like it. And maybe this is something that you will decide to

somehow include in your career.

And as always you can get the show notes at

www.superdatascience.com/7, so just a number 7. There


you’ll find the transcript for this episode, you’ll be able to

subscribe on iTunes and Stitcher. Also, at the bottom, leave

us a comment. Let Artem and me know how you felt about this

episode, what new things you learned from this episode.

And also you’ll find a link to Artem’s LinkedIn. Make sure to

hit him up, show him some love and connect with him,

follow his career. I’m sure he’s going to be up to some

extraordinary things in the coming years. I look forward to

seeing you next time. Until then, happy analysing.

