SDS PODCAST

EPISODE 375: UTILIZING ORACLE CLOUD AS AN ENTERPRISE, SMALL BUSINESS, OR DEVELOPER

http://www.superdatascience.com/375

Kirill Eremenko: This is episode number 375 with Senior Vice President

and Chief Technology Officer at Oracle Cloud Platform,

Greg Pavlik.

Kirill Eremenko: Welcome to the SuperDataScience podcast. My name

is Kirill Eremenko, Data Science Coach and Lifestyle

Entrepreneur. And each week we bring inspiring

people and ideas to help you build your successful

career in data science. Thanks for being here today,

and now let's make the complex simple.

Kirill Eremenko: Welcome back to the SuperDataScience podcast

everybody. Super pumped to have you back here on

the show, because today we're talking about the cloud.

As data scientists, we don't often think about the

concepts and mechanics behind the cloud that we

often do use for our computations and data storage.

We don't often stop to think what it's all about, what

are the different vendors, how does it all work, what is

the future, what are the trends in this space? Today is

a great episode to educate yourself. I personally was a

learner on this episode, I was learning and soaking up

all this knowledge. And who is better positioned to

teach about the cloud than the Senior Vice President

and Chief Technology Officer at Oracle Cloud

Infrastructure, Greg Pavlik.

Kirill Eremenko: So in this episode you will learn a ton about the cloud.

For instance we'll talk about AutoML and what it

means for the future of data science. The trends in

data science, for example big data and how we saw the

rise and fall of Hadoop, the number of data scientists

growing in the world, natural language processing, and

why it is starting to dominate cloud computations.


Data science and business intelligence, and what that

intersection means for the profession. Small data versus

big data, and much, much more.

Kirill Eremenko: So this is an episode to jump in and learn. It will at

times, or it may at times feel complex. I definitely

found it quite complex in certain areas, but that's why

I asked a lot of questions. And this is an opportunity

to educate yourself about the cloud and understand

the future, understand where all these trends are

going, and take your professional skills as a data

scientist to the next level in a domain that we

constantly use for our work. And so on that note, I

can't wait for you to check out this episode. Without

further ado, I bring to you Greg Pavlik, who is the

Senior Vice President and Chief Technology Officer at

Oracle Cloud Infrastructure.

Kirill Eremenko: Welcome back to SuperDataScience podcast

everybody. Super pumped to have you back here on

the show. Today I've got a very special guest, Greg

Pavlik, calling in. I believe you're from the West Coast of

the U.S., right, Greg?

Greg Pavlik: Yep, yep. We're in the Bay Area. So good place to be for

technology, good place to be for machine learning.

Kirill Eremenko: Fantastic. How long have you been there for?

Greg Pavlik: About 12 years now.

Kirill Eremenko: 12 years?

Greg Pavlik: Yeah. Though we're not really long timers, we showed

up from the East Coast about 12 years back. We're

from the New Jersey area.


Kirill Eremenko: What made you move?

Greg Pavlik: Work. Yeah, yeah. I got to a point where I flew out once

a month, then it was every two weeks, then it was

every week. So after about a year of flying out weekly-

Kirill Eremenko: Wow.

Greg Pavlik: ... we decided it was time to move. Yeah, I think it was

50 out of 52 weeks out of the year I was on the

road.


Greg Pavlik: Now because of the pandemic I don't travel at all. But

the baseline for travel now is usually mostly around

the West Coast and mostly once a month.

Kirill Eremenko: Very interesting. I once met a gentleman who was

flying in and out of mining sites, also like every week.

Because he was a dragline operator, those massive

machines. And they're very rare or hard to come by,

these people hardly return. So this mining center was

flying him in and out, every week for seven years. So

yeah, when I saw him on the plane I felt a little bit

sorry for him, because the shape of his back perfectly

fit into the seat of the plane.

Greg Pavlik: What I found when I was traveling all the time is, you

get out on Monday morning on the 6:00 AM flight, and

it's the same people, every week, week after week. They

just had their pattern. A lot of consultants, sometimes

managers, but it's not a good lifestyle if you can avoid

it. I definitely recommend something a little bit more

stable.


Kirill Eremenko: Yeah, I got you. And do you miss it now with the

pandemic that you have to stay at home? Is it

something you reminisce?

Greg Pavlik: Well, the big issue for me is, I don't miss travel, but it's

more of the face-to-face teamwork. I mean, one of the

things I've always felt is, the whiteboard is hard to

beat as the number one engineering tool. And I've still

not found a great substitute for a whiteboard in a

face-to-face conversation.

Kirill Eremenko: That's interesting.

Greg Pavlik: And there's that social capital you build talking to

people, when you're really in the same room, sharing a

cup of coffee. Then the other problem is, and I think

this is one that people underestimate, the value of

these ad hoc hallway conversations, especially not so

much when you're trying to do a technical problem,

but when you're trying to work across teams and get

teams to coordinate, keeping people on the same page.

There's a lot that happens informally. And it's very

difficult to do the informal thing, when you have to

start a Zoom meeting in order to start a conversation. I

think there's a lot of discussions that just don't

happen or if they do happen, they're email exchanges

that can be interpreted in different ways.

Greg Pavlik: So that's been a bit of a tax. The flip side is, and this is

actually a concern I have, is people seem to be working

more hours now than ever. About a

week ago, we gave everybody a mandatory day off in

the organization. And we'll probably do that again in

another four to six weeks just to let people pace


themselves. Because there's this tendency, get up in

the morning, login, start working, then take a break to

get something to eat, work, work, work, work, another

break to eat, work, and then next, your day is over. It's

great in terms of trying to advance the ball and moving

things forward, up until you start to hit burnout. So

we're really trying to figure out ways to keep people

productive, but also make sure they don't wear

themselves out.

Kirill Eremenko: Absolutely, yeah. Definitely something. Our team is

fully remote. So definitely it was something we noticed

as well. People need to take a break.

Greg Pavlik: Yeah. One of the things we've been trying to do is start

to take lessons from companies and organizations that

do work fully remote all the time. We have some people

that have come in through open source communities,

and we're trying to adopt best practices from open

source, especially from the Apache Software

Foundation, in terms of how we do our internal

development. That's helping, I think improving things

from a quality perspective, overall. But learning more

from organizations that have been, especially

companies that have been remote full time, is

something that we're working on as well. It's really

important.

Greg Pavlik: And it's not the same. Things go reasonably well, I

think as people adapt, but as far as really getting

things dialed in and making sure that we're keeping

the bar high, from a quality perspective, and from a

work life balance perspective, are probably the two

biggest challenges we have right now.


Kirill Eremenko: Absolutely. And coming back a bit to the points you

mentioned about the value of those ad hoc

conversations in the hallways, it was very interesting

to hear that coming from you, since you are in charge

of a big part of Oracle to do with the cloud. And one of

the goals is to move on premise to the cloud. Question,

do you think sometime in the future, maybe triggered

by this pandemic, maybe just over the course of time,

we will be able to come up with a solution, whether it's

VR or AR, where we will move those ad hoc talks, for

instance, we could all wake up and put on virtual

reality goggles and be walking around a virtual office?

Greg Pavlik: Yeah, it's possible. And I would certainly say the way

things have developed, there's a global search for

talent. You can't just go to any one country, any one

state and say, "Hey, this is the talent pool we want." So

I think that there's a strong potential for more and

more organizations to adopt VR for things like

international team integration. When you're local to an

office, though, I think there's just a human element

that it's hard to replace unless the VR gets

sophisticated enough that you can't distinguish

between reality and the virtual environment, I think

people are still going to want to have the face-to-faces.

Greg Pavlik: When I was at the last company I was at, our

management team was pretty distributed. But we

made a real point to get together at an offsite every

quarter, at least once a quarter. And it was an

interpersonal relationship dynamic that got

reestablished quarterly. And I think those are hard to

replace with current technologies. But yeah, I think


we're going to see a lot more technology evolution

toward facilitating better team dynamics. Right now, in

some ways, the state of the art seems to be Slack,

which Slack is great, but it's also a strange, interrupt

driven technology. It's not the same thing as I'm

walking down the hall, to get a cup of coffee, and I run

into someone. So you're already both out of the zone of

work and trying to get something else done that's not

quite the same thing as a hardcore problem-solving focus.

So that kind of thing I haven't seen a way yet to really

replace.

Kirill Eremenko: Maybe Oracle can build something.

Greg Pavlik: Yep. [inaudible 00:10:25].

Kirill Eremenko: Gotcha. I hope you're enjoying this amazing episode,

we'll get straight back to it after this super quick

announcement. DataScienceGO Virtual. Have you

registered to attend yet? If not, make sure to check it

out at datasciencego.com/virtual. The dates are coming

up, June 20th to 21st. It's a weekend. On Saturday

we've got talks and workshops for newcomers and

transitioners. And on the Sunday we've got talks and

workshops for practitioners and managers. So

whatever level you are, this is the virtual event for you.

And it's absolutely free. Yes, it's absolutely free. But

the number of seats is limited so apply to attend now,

you can find the event at datasciencego.com/virtual.

Come, enjoy the talks, have lots of fun, network with

your peers. Even if you don't manage to get in for

whatever reason, you will get the recordings afterwards

if you register for the event. Once again, the website is

datasciencego.com/virtual. No reason not to attend, no


reason not to register, so make sure to jump on this

opportunity, it's only a matter of days left until this

happens. And I look forward to seeing you there and

now let's jump straight back into this amazing episode.

Kirill Eremenko: Well, Greg, you are a senior VP and CTO at Oracle

Cloud Platform. What I'd love to dig in is to

understand your journey. So you've had a very

interesting career, just judging by your LinkedIn,

and you spent over, I was counting, over 12 years at

Oracle in total. So-

Greg Pavlik: In total, yeah. I wound-

Kirill Eremenko: Could you walk us through-

Greg Pavlik: ... up here by accident to be honest.

Kirill Eremenko: Sorry?

Greg Pavlik: I say I wound up here by accident.

Kirill Eremenko: How did that happen?

Greg Pavlik: So my background is not actually in computer science.

It's really solid state physics and physical chemistry.

Kirill Eremenko: Oh, wow.

Greg Pavlik: And I took a job to develop high temperature ceramics

for satellite nose cones, back in the '90s in Colorado.

And I showed up at the job on day one, and they said,

"Well, you can do this ceramics engineering work that

you've got prepped up and ready to go, or we need

people to do software development. And with this

project, we're building a simulation for a spacecraft,

really interesting stuff." I said, "Well..." And, "Oh, by


the way, we'll pay you more." And I said, "Well," I'll tell

you, I said, "I'm willing to take more money, but would you

guys be willing to put me through a master's in

computer science?" So they said yes. And-


Greg Pavlik: ... I just shifted my focus quite a bit. But it was a

really great project actually, we developed basically a

simulation of not only the spacecraft, but also the full

space environment. So that when they took the actual

command and control hardware, and they plugged it

into the software simulation, it thought it was

controlling a spacecraft. And as the spacecraft was

doing things, moving solar panels or firing off reaction

control thrusters, the simulation was then producing

all the dynamics you would expect in the space

environment, for full testing.

Greg Pavlik: So it was really, really a cool project. One of my

favorite work projects I've done in my career. And that

started me down the journey of software, and I wound

up going through a series of startups. The last one

before the first day at Oracle was a company called

Bluestone Software, which was an early app server, in the

heyday of the dot com boom and app server mania.

And so we were one of probably four vendors at the

time that were pure plays in the app server side. The

incumbent that really won the day was BEA Systems,

who had WebLogic Server, and then they eventually

were acquired by Oracle. So they wound up at Oracle

too.


Greg Pavlik: But when the dot com market busted, we wound up

being acquired by HP, that didn't go very well. And

Oracle was looking for a team of distributed systems

and middleware engineers to start to build out their

own app server platform. So I wound up taking a job at

Oracle and thought [crosstalk 00:15:03].


Greg Pavlik: There were a couple of years and I wound up. Let's

just say it's been about 12 years in total.


Greg Pavlik: I think nine and a half the first go around and almost

three now in my second-

Kirill Eremenko: Yeah. You had a bit of a break from Oracle for some

time. What happened there?

Greg Pavlik: Yeah, I think we had gotten... When I joined Oracle the

first time, there were about 200 people in the

middleware division. By the time I left it was probably

between 4,000 and 5,000.


Greg Pavlik: We really built that business up, both organically and

then incrementally by acquisition, and eventually

consolidated that whole Java middleware space

between the BEA acquisition and then Sun

Microsystems with Java itself. And we [crosstalk

00:15:49].

Kirill Eremenko: Sorry. What is middleware?


Greg Pavlik: Oh, middleware. Middleware is your connectivity

software that sits between the application logic and

your backend systems and databases. So app servers

or messaging systems, Kafka. In some sense,

Kubernetes is now playing the role of a middleware in

a lot of systems. I think the heavyweight app servers

have become largely displaced. People are moving more

toward containerized applications. But back in the day,

for modern app development, the Java Enterprise

Edition app server environment

was the normative standard. And then that started to

get displaced by the open source Spring Framework.

And then, I think, while Spring is still around, people have

gotten much more freeform in the technologies they're

using for app implementations.

Greg Pavlik: I mean, it was a great journey, very interesting. We really

got to develop the market, the business. But we got to

a phase where, this was probably around 2011, late

2010, where Oracle was really focused on ingesting

and integrating all the acquisitions they had done and

consolidating their platform around the app portfolio.

Which is important work for the business, but I'm a

hardcore technologist at heart. And I was getting more

and more interested in the emerging big data segment.

And so it was clear that at the time to really go out and

work with Hadoop and HBase, and a bunch of other

technologies that were coming together in that whole

ecosystem, that that was really going to happen

outside the company.

Greg Pavlik: So I wound up getting hooked up with the team that

was spinning out of Yahoo that had built Hadoop from


day one. And building out one of the two pure plays in

the market around big data, specific to the Hadoop

ecosystem. So we went on a tear there, that

company IPOed remarkably fast. I think from inception

to IPO was probably three and a half years.


Greg Pavlik: And things were going quite well until I'd say 2016.

And there was a pretty dramatic shift. If you think

about Hadoop, I suppose an important

evolutionary technology, it opened up a lot of new use

cases for non-specialists really, say your typical

enterprise business to start to deal with both

multi-structured data and very, very large data sets in ways

that they couldn't before. Economically couldn't ever

because the technologies really didn't cater to their use

cases, but Hadoop opened that up. The problem was that

Hadoop was this big monolithic system that was hard

to stabilize, hard to run and just expensive. The open

source bits were really the least expensive part of the

equation because you had to rack and stack all these

machines, put them in your data centers or in a colo,

and pay for power all the time.

Greg Pavlik: And by 2016, I think people got comfortable enough

with the public cloud infrastructure, they began to

take the same data sets and just put them into object

storage, which in that case, you basically shift the

whole operational problem off to the cloud vendor, and

you're only really paying for what you use. The object

storage, it's pretty cheap. So-

Kirill Eremenko: What is object storage?


Greg Pavlik: Something like S3 in Amazon. Every cloud platform

has some variant. We just call ours the Object Storage

service, at Oracle Cloud Infrastructure. Azure has had

a couple of different permutations in their

environment, but the latest they're calling it Azure

Data Lake Storage. But every cloud platform has this

ability to take binary objects and just put them into-

Kirill Eremenko: Okay. And so they're not... Whether S3 Amazon,

Azure, they don't use Hadoop, in the backend?

Greg Pavlik: You can. I mean, it's one option. So if you put the data

into object storage, you can spin up a Hadoop cluster,

pull it from object storage, process it, shut the cluster

down. It's a very heavyweight infrastructure to do that.

The approach we've taken... One of the things, when I

came into Oracle is, like I say, I really saw a lot of

value in this space for end users, on the one hand. On

the other hand, the technology just seemed really too

cumbersome and difficult to use. So what I wanted to

really do was step back and say, "How do we maintain

and preserve all the good parts of this ecosystem, but

eliminate the overhead, eliminate the cumbersome

nature of it? The unwieldy nature of it."

Greg Pavlik: So we've taken a very different approach. We have a

cloud service called Data Flow. And it uses Apache

Spark to do the data processing, which is the

dominant data crunching framework in that whole

Apache Hadoop ecosystem. But it's entirely clusterless.

It's not just serverless, it's clusterless. We pre-

allocated a bunch of resources in the backend, and all

you have to do as a user is say, "Okay, I want to run

this job, I want to use this much processing power,


and I want to touch this data." And then within

seconds or tens of seconds, we're off processing

arbitrary workloads.

Greg Pavlik: But the beauty of it is, not only at the storage layer, do

you have nothing to maintain, or deal with as an end

user from an operational perspective, but even at the

data processing level. It's about as close as you're

going to get to a zero ops model. The difference with

Hadoop, you can do the same workload with Hadoop

over object storage but to spin up Hadoop clusters

probably takes five, 10 minutes. Like I say it's a lot of

overhead. And you really don't get any real benefits

beyond what you [inaudible 00:21:59] process with the

actual Spark packages.
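To put rough numbers on the overhead point, here is a back-of-the-envelope sketch. The 5 to 10 minute cluster spin-up and the "tens of seconds" start come from the conversation; the 2-minute job length and the specific 30-second figure are assumptions chosen purely for illustration, and `startup_overhead_fraction` is a hypothetical helper, not part of any Oracle or Hadoop API.

```python
def startup_overhead_fraction(startup_s: float, work_s: float) -> float:
    """Fraction of total wall-clock time spent waiting for the platform to start."""
    return startup_s / (startup_s + work_s)


WORK_S = 120.0  # assumed: a job that needs 2 minutes of real processing

scenarios = {
    "cluster spin-up, 5 min": 5 * 60.0,
    "cluster spin-up, 10 min": 10 * 60.0,
    "pre-allocated pool, 30 s (assumed)": 30.0,
}

for name, startup_s in scenarios.items():
    frac = startup_overhead_fraction(startup_s, WORK_S)
    print(f"{name}: {frac:.0%} of wall-clock time is startup")
```

For short jobs the startup time dominates, which is why a fixed multi-minute spin-up matters most for small, frequent workloads.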

Greg Pavlik: So we tried to take a look at this as a Gen two

approach to learn from what other people have done

both good and bad, and scrap the bad part. So I'm

pretty excited about this, I look at this as big data

done right, and really Oracle being the first vendor to

go out and not just utilize the open source technology

as it was designed for the on-premise data center, but

to really re-envision it for cloud native use cases that

are actually tractable for real businesses. Enterprise

businesses, you go to your, say typical steel

manufacturing or insurance company and so forth,

you'll have specialists. You'll have people for example

in insurance that are good for data science, because

they come in with strong statistical backgrounds. But

you're not going to get the same kind of population of

technologists that you would have in an eBay or a

PayPal or backend for Apple, where people are doing


lots of data management, data crunching with a staff

that specializes in distributed systems, experts in open

source, fully resourced to keep this machinery

running.

Greg Pavlik: So I think that the goal is to not really lose anything in

terms of the capabilities that those companies can

bring to bear on the problems they're trying to

address, but at the same time, make it tractable for

pretty much universal population.

Kirill Eremenko: Got you. Wow, thank you for the description. I

remember, between 2012 and '14 or '15, I was

working at some point with a company that was about

to invest on the order of tens of millions of dollars

to spin up an on-premise Hadoop cluster, and that's

when Hadoop was big, and cloud was only getting

bigger, only becoming popular and they were like,

"Should we go to the cloud? Should we make Hadoop

on-premise?" From what you just described, I gather

that the age of Hadoop has gone. It's had its rise, it

had its fall and now we're moving to something post

Hadoop.

Greg Pavlik: Yeah. I think that, like I say it was evolutionary

technology. I think it was important. But I think that,

and I'll be honest with you, the rise of the cloud,

cloud-based data lakes, I didn't see it happening in

2014. If you go back to 2014, Hadoop was in its

heyday. I think we IPOed in 2014 actually. So it was an

exciting year.

Kirill Eremenko: Good timing.


Greg Pavlik: But the cloud platforms at that point were seen as less

stable and less secure. So I think there was a lot of

skepticism that people were going to be able to take

mission critical datasets, and just have them live in

the cloud. I think by 2016, things had flipped over.

There was a lot of hardening, a lot of maturation and

the cloud platforms were starting to become the de

facto data lake infrastructure of choice. And I think

that's only continued to strengthen.

Greg Pavlik: So I think, yeah, the days of Hadoop are effectively

over. But look, there are still organizations

that for one reason or another, are not able or ready to

make that transition into the cloud yet. And from an

on-premise, scale-out, multi-structured data

management perspective, there aren't really good

alternatives to Hadoop. So there's still a market there,

and I think there will be for the foreseeable future. But

our mantra at the time was 50% of the world's data in

Hadoop in 10 years. And I think 50% of world's data

will wind up in the cloud, not in Hadoop.

Kirill Eremenko: Probably, probably.

Greg Pavlik: But again, all these markings and the stuff that

happened there, they were super important. I mean,

they really helped-

Kirill Eremenko: Oh, of course.

Greg Pavlik: ... to open up a tremendous amount of value for not

just the tech industry, but I think for all industries.

And that was one of the interesting things with the big

data landscape. We speculated at the time, that there

were certain industries that would be very heavily


investing in big data and a lot of industries that

wouldn't. Actually, that wasn't the case. Retail,

healthcare, finance, manufacturing, we had a really

strong presence across just about every vertical. So I

think a very important technology, we learned a lot

from, but now we're moving into a world, really where

there is a platform in the sense that you've got to

manage your data and be able to access it, keep it

secure, govern it. But the frameworks and tools that

you apply over top of that data set, highly variable.

Within an organization, the great thing about cloud

infrastructure is it doesn't really constrain you, you

can run whatever you want, and have it access the

data in the object store.

Greg Pavlik: So for example, we did the Serverless Spark

Infrastructure, it's one way to access the data. But it's

not the only way. You can bring in your own

frameworks, you could spin up a neural network and

grab GPUs, crunch the data through a whole bunch of

training exercises, release the GPUs when you're done

with the training, and maybe a month later you're doing

something different. There's almost this infinite

flexibility that the cloud opens up in terms of the tools

that you can bring to bear to the problem domain. And

as you know, with, especially machine learning a lot of

evolution in the toolset. A lot of advances in

algorithms.

Kirill Eremenko: Yeah.

Greg Pavlik: And that'll continue apace.


Kirill Eremenko: And also helps smaller companies get started faster.

Because a lot of startups which are crunching huge

data sets and are also IPOing, not because they have a

huge team or lots of money to spend on servers. No,

because they can use Amazon servers.

Greg Pavlik: Yeah.

Kirill Eremenko: Or your servers.

Greg Pavlik: Or OCI. Yeah.


Greg Pavlik: No, that's absolutely the case. And like I said, with

Hadoop it was interesting patterns we were developing.

People wanted to start doing more with machine

learning and started to do more with, say, TensorFlow.

The problem was Hadoop assumed that storage and

compute were conjoined. They were having [inaudible

00:28:31]. So that we had, at the time seen

organizations that were going and buying

Nvidia appliances and sitting them next to a Hadoop

cluster, copying a bunch of data out of the Hadoop cluster

into this Nvidia thing. And these were expensive and

unwieldy architectures to do what was becoming more

and more fundamental work. Like I say, now, you're

on the cloud, I can spin up a neural network over top

on a set of GPUs, process the data. I don't pre-spend

anything. I spend what I use.


Greg Pavlik: There's a lot of flexibility. And I think the economics

tend to be much better if they're done in a controlled

way. I mean, the flip side to it is, if you get into the


cloud, and you're not careful about managing the

compute availability to your consumption, for when

you're using it, and then releasing it,

you can drive some pretty substantial bills. So-

Kirill Eremenko: Yeah, yeah. You got to be careful.

Greg Pavlik: ... this almost shifts the problem in terms of operations

from keeping infrastructure running to managing the

finances of the organization. Which is healthy, I mean,

that's the way it should be.
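The point about releasing compute can be made concrete with simple arithmetic. This is a sketch under stated assumptions: the hourly rate and the training hours below are hypothetical numbers for illustration, not any vendor's actual pricing.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(hourly_rate: float, hours_held: float) -> float:
    """Pay-per-use bill: you are charged for every hour the resource is held."""
    return hourly_rate * hours_held


GPU_RATE = 3.00       # hypothetical $/hour for one GPU instance
TRAINING_HOURS = 40   # hypothetical hours of actual training in the month

released = monthly_cost(GPU_RATE, TRAINING_HOURS)       # released when training ends
left_running = monthly_cost(GPU_RATE, HOURS_PER_MONTH)  # never released

print(f"released after training: ${released:,.2f}")
print(f"left running all month:  ${left_running:,.2f}")
print(f"ratio: {left_running / released:.1f}x")
```

The model is the same in both cases; only the discipline of releasing the resource changes the bill.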

Kirill Eremenko: Yeah, yeah. That's true.

Greg Pavlik: And I think the same thing now with a lot of data

science, is you get more and more teams looking

closely at the business problem, as opposed to the

algorithm problem. In, say a typical enterprise

organization. So these convergent trends are really

more and more toward meeting the goals of the

business versus trying to wrestle with the technology,

which is where we want things to be heading toward.

Kirill Eremenko: Those are two very valuable insights. Thank you for

that. So one of Hadoop's problems was that it

assumed that storage and compute are together. By

separating those out, we have now cloud platforms,

which are much more efficient. And in addition, using

cloud platforms, allows the objectives of this data

science machine learning to be aligned with the

objectives of the business financially.
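As a toy illustration of separating storage from compute: the sketch below keeps data in a long-lived store and allocates compute only for the duration of one job. `ObjectStore` and `ComputePool` are hypothetical in-memory stand-ins written for this example; they are not the API of S3, OCI Object Storage, or any other real service.

```python
from contextlib import contextmanager


class ObjectStore:
    """Toy stand-in for a cloud object store: named blobs that outlive any compute."""

    def __init__(self):
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class ComputePool:
    """Toy stand-in for on-demand compute; tracks how many workers are held."""

    def __init__(self):
        self.active_workers = 0

    @contextmanager
    def allocate(self, n_workers: int):
        """Hold n_workers for the duration of one job, then release them."""
        self.active_workers += n_workers
        try:
            yield n_workers
        finally:
            self.active_workers -= n_workers


store = ObjectStore()
store.put("logs/day1.txt", b"error ok ok error ok")
pool = ComputePool()

# Compute exists only while the job runs; the data stays in the store.
with pool.allocate(n_workers=4):
    words = store.get("logs/day1.txt").split()
    error_count = words.count(b"error")

print(error_count, pool.active_workers)
```

After the `with` block exits, no workers are held, but the object is still retrievable for the next job, whatever framework that job uses.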

Greg Pavlik: Well, yeah. So I think the cloud element helps quite a

bit on the data science side. I think the other thing is,

the state of the toolset available to data scientists has


changed quite a bit. If I go back four years ago, you

didn't have things like ubiquitous AutoML. So if I'm a

data scientist four years ago, even if I'm using a pre-

implemented algorithm, I still have to bring a lot more

art, this dark art of trying to do feature engineering,

algorithm selection, hyperparameter tuning. And if

you look at where things have progressed with the

availability of these AutoML capabilities, the

machinery and the tools around the data science

toolkit can do a reasonably good job, in many cases, as

good a job as humans to actually get you to production

of a good model.
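To make "algorithm selection" and "hyperparameter tuning" concrete, here is a minimal sketch of the idea using only the Python standard library. It is not Oracle's AutoML toolkit or any real AutoML library: it swaps in a tiny exhaustive search over two hand-written candidate "algorithms" (a mean baseline and k-nearest-neighbours with several values of k), scored by leave-one-out cross-validation on made-up toy data. All function names are hypothetical.

```python
from statistics import mean


def predict_mean(train, x):
    """Baseline 'algorithm': ignore x and predict the average training target."""
    return mean(y for _, y in train)


def make_knn(k):
    """k-nearest-neighbours regressor; k is the hyperparameter being tuned."""
    def predict(train, x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def loo_mse(predict, data):
    """Leave-one-out cross-validation: mean squared error over held-out points."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors.append((predict(train, x) - y) ** 2)
    return mean(errors)


def auto_select(data, candidates):
    """Score every candidate and return the best name plus all scores."""
    scores = {name: loo_mse(fn, data) for name, fn in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: y = x^2 for x = 0..9 (made up for illustration).
data = [(float(x), float(x * x)) for x in range(10)]

candidates = {"mean": predict_mean}
candidates.update({f"knn(k={k})": make_knn(k) for k in (1, 2, 3, 5)})

best, scores = auto_select(data, candidates)
for name, score in scores.items():
    print(f"{name}: LOO MSE = {score:.2f}")
print("selected:", best)
```

The loop does mechanically what a person would otherwise do by hand, which is the shift described above: the practitioner's effort moves from tuning toward understanding the data and the business problem.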

Greg Pavlik: So then what does it mean for me as a data scientist?

It means as a data scientist, I spend less time trying to

do a lot of tweaking and tuning and instinctual

adaptation of the tools and libraries and more focus on

the actual data, understanding the data,

understanding the business problem, and moving

more and more into the business domain in terms of

getting a focus on better results. That to me has been

a big sea change, for sure. And we've been, I mean,

just again, not to be too vendor specific per se, but

one of the great things about Oracle is after we did the

Sun acquisition, we got a large research organization.

Greg Pavlik: And so Oracle Labs, one of their main pillars of focus

is machine learning. And we work really closely with

the Labs group around an AutoML toolkit, which we think

is getting pretty much better results than what you

can get in the public domain. But we package it

together with open source technologies and make it a

part of a collaborative platform. So if you come into


Oracle Cloud, you have a platform for data scientists

to work together as teams. But just built into it for

free, for all intents and purposes, you have all these

AutoML capabilities just as a default part of the

Python toolkit we provide.

Kirill Eremenko: Wow. Fantastic. Just before the podcast, your PR

director Victoria told me about the new division that

you're heading in data science and AI. Is that what

we're talking about now or is that something else?

Greg Pavlik: Yeah, we've started a fairly substantial investment.

Well, actually, Oracle has a lot of investment in

machine learning overall. It goes from labs, all the way

up through the apps. Now there's a whole division of

our applications organization that is basically just

developing models for domain problems specific to the

application so if you're doing HCM, HR type

applications we'll do resume matching. Or supply

chain optimization, all kinds of problems-

Kirill Eremenko: So products? Effectively.

Greg Pavlik: Yeah. We deliver... You consume the benefits of the

machine learning models, but you don't have to go

build them yourself.

Kirill Eremenko: Yeah, yeah.

Greg Pavlik: And that's always, I think that's clearly where we're

going to see the most uptake of machine learning from

end users. At the end of the day, it's the same thing,

you pick up your phone, and you've got image

recognition and all that. You've got billions of people


now, using machine learning models, but they don't

even know it.


Greg Pavlik: At the same time, at the cloud team, we've started up a

fairly significant investment around both data

scientist enablement within the cloud infrastructure.

So adjacent to the big data space, adjacent to the data

warehousing space. And that's really derivative of

an acquisition that we did about two years ago,

datascience.com. So we brought in this platform that

allows you to take standard notebooks, standard

Python libraries, stand them up and make them

available for your team, but it puts an overlay

wrapper around it, that ties it into source code control,

helps you do easy model deployment. You get a

manager or an administrator-

Kirill Eremenko: What does-

Greg Pavlik: ... for the project.

Kirill Eremenko: What does that mean for data scientists?

Greg Pavlik: Well, so one of the things we saw a lot with data

scientists is that they love open source. There's a lot

out there for free, it's all great. And so they would grab

it, they'd put it on their laptop, they'd go grab some

data, and they'd start mucking around and building

models, and then pop out something and it's well,

three months later, we've got a great model, what were

the datasets used? How did you get here? What was

the history? Can I reproduce it? We want to, in some

ways, bring the more mature practices that you would


see in software development and apply them in, I'd say

non intrusive ways to the data scientist. So if you

come into our environment, you'll start up a session,

working on a notebook. It'll be all the tools and

libraries data scientists are familiar with. But you're-

Kirill Eremenko: With open source?

Greg Pavlik: Open source, yeah, for sure.

Kirill Eremenko: That's really cool.

Greg Pavlik: Yeah. We provide I mean, we do provide additional

libraries, we have this accelerated data science toolkit,

which is Python add-ons, makes it easy to connect to

the cloud resources. So if I want to do something like

access data in a cloud based data lake, or I want to

spin up GPUs to run algorithms more efficiently. Those

kinds of convenience tools are there, we have the

AutoML capabilities that I talked about before. And

then we also have a bunch of capabilities for model

explainability. Those are fairly in some visualization as

well.

Greg Pavlik: So we do add in and augment with IP that we've

developed, but there's nothing that constrains you to

use that. You can work with the open source tools. I

think the real benefit for teams is that now there's a

single environment, you can share notebooks, you can

publish models into a model catalog. So you start to

bring all this governance and control and source code

management into an environment. So as a data

scientist, you don't really lose anything you have

everything you like, and you're familiar with. But at

the same time, if you're running a data science project,


now you've got a little bit more accountability and I

think much better collaboration and consistency.
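
The reproducibility questions raised above (which datasets were used, how the model got here, whether it can be reproduced) come down to recording provenance alongside each model. Here is a minimal sketch of what such a record could contain, assuming a plain dictionary serialized as JSON. It does not use the OCI model catalog API; `dataset_fingerprint`, `build_catalog_entry`, and the field names are hypothetical.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone

def dataset_fingerprint(rows):
    """Hash the dataset contents so a later run can confirm it used the same data."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()

def build_catalog_entry(model_name, params, rows, metrics):
    """Collect the facts someone would need to reproduce this model later."""
    return {
        "model_name": model_name,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "params": params,
        "dataset_sha256": dataset_fingerprint(rows),
        "dataset_rows": len(rows),
        "metrics": metrics,
    }

# Hypothetical toy dataset and model settings, for illustration only.
rows = [{"customer_id": 1, "churned": 0}, {"customer_id": 2, "churned": 1}]
entry = build_catalog_entry("churn-baseline", {"max_depth": 3}, rows, {"accuracy": 0.5})
print(json.dumps(entry, indent=2))
```

Storing the hash rather than the data keeps the record small while still answering whether a later run used exactly the same dataset.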

Kirill Eremenko: What would you say to the comments which I've heard

in various forms, previously, quite a few times that,

Oracle is more suited for larger organizations that have

a large budget or enterprise level companies. Is Oracle

suitable or beneficial and some of the things you're

talking about are amazing. I don't have to have GitHub

separately to my Jupyter notebooks to where I'm

storing the data, all that is integrated. That'd be really

cool. But what if I'm a small organization, a startup

type level. Can I also get the benefit of these tools?

Greg Pavlik: Yeah. So it's a great question. I mean, so first of all, if

you look at Oracle historically, that's substantially

true. The statement you made is pretty accurate. The

cloud business, we built it from scratch, de novo. And

we did it with the intention of providing a hyper scale

cloud that is as accessible as an Amazon or an Azure

or Google. And that was the assumption from day one.

So if you want to come in as a developer, there's a free

tier, you can get started. It doesn't cost you anything.

If you're a small organization, it's really easy to get

bootstrapped, you can get on board with a credit card

and start to work in the environment.

Greg Pavlik: So there is a certain sense in which the historical on-

premise portfolio really was targeted more at the

enterprise level, a step up from the SMB segment. I don't

think that's true for the cloud. In the cloud, clearly, we

want to be the best at the enterprise game. And that's

really not the strengths of the other players in the

cloud market. But at the same time, you'll never get


there with the enterprise unless you win the hearts

and minds of developers, and really your average user.

And what you'll see now is with the cloud capabilities,

our customer profile has shifted quite a bit.

Greg Pavlik: So there's a lot of customers that were never going to

be large Oracle customers or even small Oracle

customers, which have been onboarding into OCI. Lots

of startups, just imagine the machinery taking

advantage of our services, for a couple of reasons.

One, we again, even with the cloud overall, we had this

advantage of what we call a gen two approach. So we

brought in a lot of architects and implementers that

had worked on other hyper scale clouds, and the

attraction for coming in to work on OCI was, "You get a

chance to solve the problems that you realize you

couldn't solve because you had engineered your way

into a corner." So it was a clean room environment

where a lot of the engineers had an opportunity to

learn from the mistakes in the first generation and just

do a better job.

Greg Pavlik: So we wound up with both a more efficient

environment, especially strong at the network level.

But also pricing wise, I think it's more attractive than

the competitors. Again, because we have the ability to

do a more streamlined implementation, really, at the

base IaaS level. So that's been a real boon for us in terms

of just attracting a new set of users into the cloud. It's

not just startups, so not just small businesses, I

mean, it's also individuals and developers, students,

much different than what you would have seen

certainly five years ago in terms of the customer


spread that was typical for Oracle. The other thing, I

will say, this is true that in terms of the SMB segment,

not just at OCI, not just on our cloud-

Kirill Eremenko: OCI is Oracle Cloud Infrastructure?

Greg Pavlik: Oracle Cloud Infrastructure, yeah. So that's really our-

Kirill Eremenko: And that's the same as OCP?

Greg Pavlik: ... IaaS.

Kirill Eremenko: Oracle Cloud Platform?

Greg Pavlik: [crosstalk 00:41:26] a whole bunch of rebranding.

Kirill Eremenko: Okay.

Greg Pavlik: So the standard unified term that we use now is OCI.

Kirill Eremenko: Got you. Got you.

Greg Pavlik: All cloud services done right in the gen two approach. I

will say though, we've also picked up quite a few SMB

customers, small businesses, medium-sized

businesses, just in our SaaS portfolio as well. Partially

because that was a sweet spot for NetSuite which is

now a part of Oracle, but even in the more

conventional segments for Oracle Applications on the

SaaS side. Quite a few startups, quite a few younger

companies have gone with Oracle. A lot of competition

with Workday and others.

Kirill Eremenko: That's great. So by SaaS, you mean the applications

you mentioned like for instance, resume matching

those type of things? Ready products?


Greg Pavlik: Yeah. Your HR apps, all that could be financials, could

be supply chain management.

Kirill Eremenko: Okay. Okay. Very interesting. You actually answered

my next question, which was about the differences

with Amazon and Azure. Sounds like you've been

able... Because you're building it from scratch and

laser-

Greg Pavlik: I think there's two fundamental differences in my view.

One is at, we might say at the base infrastructure, at

the IaaS layer, we've had a chance to really do this clean

room gen two implementation. And if you start looking

at benchmarks you look at price performance. And in

fact, there's a new price calculator that's up on

Oracle's website. I mean, the differences are dramatic.

So that's been a big draw, not just for smaller

businesses, but large businesses that are getting these

huge bills from Amazon, you come in, you can do your

cost calculation, in some cases, save 10s of millions of

dollars.

Greg Pavlik: That's why you'll see companies like Zoom, or others

that are doing video conferencing are moving over to

OCI because they're getting better, much better cost

performance outcomes. On the one hand, on the other

hand, what those vendors are lacking and one of the

core strengths of Oracle has, of course, always been

this enterprise readiness at the cloud infrastructure

level. From a security perspective, from a governance

perspective, from an accountability perspective, but

you marry that together with the apps and you really

have a complete environment to run the entirety of the

business. And today, to a large extent Amazon and


Azure are just missing, they don't have those core

capabilities moving up into that SaaS or apps tier. So

Oracle really does I think have the first cloud that it'd

be fair to classify as an enterprise cloud. All in.

Kirill Eremenko: Okay, very interesting. Do you think are they catching

up, Amazon and Azure?

Greg Pavlik: Well, who knows what's going to happen with

acquisitions? Organic development in this space is

hard. To build out an [inaudible 00:44:36] portfolio,

you're talking about... In the mature apps vendor

cases you're talking about decades of investment. And

even in quote unquote, startups that have come in

from a SaaS perspective, so Workday, Salesforce,

they're no longer young companies. So it's a big

investment over a long period of time. I doubt that

organic investments can fill those gaps for some of the

other competitors.

Kirill Eremenko: Got you. We've talked a bit about trends. And we

talked about big data or Hadoop for that matter,

having its rise and fall, cloud picking up, gen two

cloud. We talked about data science that with AutoML

data science is probably going to become more of a soft

skill type of profession where you need to do like get

the business knowledge and understand what the

questions are and how to communicate them. What

other trends are you seeing in the space of data

science or data management?

Greg Pavlik: Yeah. That's a great question. One is the number of

data scientists, functional data scientists. It's just

exploded. And that's great. Because it means


you're... Let me go back to, say we talked about 2014,

we used to talk about data scientists being unicorns.

The best you can do is go into university and hire

somebody with a PhD or master's in statistics, and

hope to train them up. The toolsets weren't really

there. So you had this really wonky problem, and

that's changed quite a bit. I mean, the tools that are

available have gotten a lot more sophisticated. And

then just the number of people that are capable of

doing meaningful work has exploded. That for us,

especially as vendors is great, because it means we

can bring more and more people into the platform, do

more and more useful workloads.

Greg Pavlik: The other thing is NLP. One of my leads for the

accelerated data science toolkit I mentioned, he likes

to say that text is now as fundamental for businesses

as ints and floats were just 20 years ago. And it's a lot

of... It will be continued innovation, but the results

that we're seeing in terms of text summarization, topic

modeling, etc. I mean, they're infinitely better than

they were a few years ago. We've been doing a lot of

work with BERT and other techniques. And we expect

to see that continue to accelerate in ways that I think

businesses haven't even yet started to tap into. Think

about all the contracts, emails, documents, Word

documents.

Kirill Eremenko: Phone calls.

Greg Pavlik: Everything is sitting there waiting to be mined. And I

always like to say the real promise here from an

analytics perspective or from machine learning is that

you can start to answer the questions you didn't even


know you were going to be able to ask. And I think

that that's been a sea change over the last couple of

years. And we're doing... For example one of my

groups on the cognitive services side of the equation is

heavily focused on text analytics. And we'll be looking

at applying that both inside of our own applications

more and more aggressively, but also just opening it

up to end users to use directly.

Kirill Eremenko: Very interesting. Why would you say that we are

seeing a rise of NLP?

Greg Pavlik: I think it's just the convergence of enough investment,

enough innovation and enough hardware based

acceleration that is almost like a perfect storm event.

But that's a big one. The other thing, as I say, people

are comfortable working with terabytes, petabytes of

data. Again, that was hard before. So I think this big

data thing continues to be important. But it's just not

constrained by a technology footprint that was hard to

utilize or stand up. That's certainly part of the cloud

trend that's enabling these use cases to unfold. I'm

trying to think.

Greg Pavlik: The other thing about this is we are seeing more and

more bleed over into the conventional BI analytics side

of the equation where you've got people who were

looking at business problems, but largely data

warehouses, largely SQL oriented, that are starting to

also pull in and mind meld with data science groups.

So that's, again, pulling the core ML capabilities closer

into the lines of business in useful ways. I mean it's a

fantastic time to be working in this space right now.


Kirill Eremenko: Yeah, absolutely. So I'm really glad you mentioned

this, Business Intelligence, merging with data science

we're getting closer because yeah, a lot of times it

depends on your definition. People say data science

and they actually mean dashboards or they mean

Tableau and Power BI and those tools. It depends.

Greg Pavlik: Yeah, that's right. So that's a bit of confusion that's

going on as well. On the one hand, on the other hand,

that community is starting to draw from the work of

data scientists, more and more. So you will see ML

powered dashboards for sure. One of the things,

Oracle has got a large analytics business, the Oracle

Analytics Cloud, on our data science service, you can

publish models into model catalog, you can browse

and consume those models from within the analytics

tools. So you can start to build predictive analytics

directly into your dashboarding and reports, with

more sophisticated models than you would

typically have been able to do even just a year ago.

Greg Pavlik: So there's a kind of, it's not so much a convergence.

Just think about a Venn diagram, and you'll see an

area of overlap, an area of synergy. But at the same

time, I don't see the world of Tableau specialists

suddenly becoming data scientists overnight either. I

think you'll see the intersection points. I should

mention, we talk a lot about big data, but we're also getting

really good at building good models with small sets of

data. There's more sophistication in transfer

learning.

Greg Pavlik: So while big data has played a role in terms of

acceleration of quality of models, we're seeing more


and more the case that you can do progressively good

models for your own specific problem domain with

relatively small data sets, which are often... For

example, let's say you're trying to deal with a problem

that is specific to an application that you've developed

in-house and you're collecting some data and that

you've got accessible within an operational database

under the app, there may not be tons of data there.

But if you can start to apply transfer learning

techniques, you can often exploit the smaller data sets

in conjunction with work that's already been done in

terms of initial seed training and get good results as

well.

Greg Pavlik: So I think you're going to see more attention paid to

how to get more effective models for specific problems

with less and less data as well. So I think that's going

to be another area that's going to be hugely beneficial

overall from an enterprise perspective.
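
The transfer learning point can be illustrated with a toy example. The sketch below assumes the simplest form of transfer (fine-tuning): weights are first learned on a large, related source dataset, then used as the starting point for a few training steps on a small target dataset. It uses a two-feature linear model and synthetic data in plain Python; the tasks, sample sizes, and helper names are all assumptions made for the demonstration, not anything described in the conversation.

```python
import random

def make_data(rng, weights, n, noise=0.05):
    """Synthetic regression data: y = weights . x + Gaussian noise."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in weights]
        y = sum(w * xi for w, xi in zip(weights, x)) + rng.gauss(0, noise)
        data.append((x, y))
    return data

def train(data, init, steps, lr=0.1):
    """Plain gradient descent on mean squared error, starting from `init`."""
    w = list(init)
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def mse(data, w):
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(42)
source = make_data(rng, [3.0, -2.0], n=500)        # large, related task
target_train = make_data(rng, [3.2, -1.8], n=8)    # small in-house dataset
target_test = make_data(rng, [3.2, -1.8], n=200)

# "Initial seed training" on the source task, then a short fine-tune on the target.
pretrained = train(source, init=[0.0, 0.0], steps=200)
fine_tuned = train(target_train, init=pretrained, steps=20)

# Baseline: same small dataset and step budget, but no pretrained starting point.
from_scratch = train(target_train, init=[0.0, 0.0], steps=20)

transfer_mse = mse(target_test, fine_tuned)
scratch_mse = mse(target_test, from_scratch)
print(f"fine-tuned: {transfer_mse:.4f}  from scratch: {scratch_mse:.4f}")
```

With the same eight examples and the same training budget, the run that starts from pretrained weights should land much closer to the target task. The same principle, at far larger scale, is what makes fine-tuning pretrained language models such as BERT practical on modest datasets.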

Kirill Eremenko: Very interesting. So a lot of these things that we talked

about, again, going back to the question of enterprise

versus a smaller business, quite clear, and I even see

now that as a small business, I could come onto

Oracle and benefit from all of these features, especially

the gen two type of cloud. Question is, apart from the

compute side of things, if I have all these free tools

available to me, if I can technically do the things on

my laptop and I can get version control through free

software like GitHub and things or like tools like

GitHub, why would I choose Oracle and stick with

Oracle, as opposed to not choosing anything? And just

going with all the open source tools all the time?


Greg Pavlik: Yeah. I don't think it's either or. You want a hub in a

sense, where you can bring the work together. And you

want to make sure you've got the resources that you

need to actually do training effectively. And that

changes over time. So if you're trying to roll your own,

you're stuck in this static snapshot-

Kirill Eremenko: Okay, got you.

Greg Pavlik: ... versus you come into a cloud platform, you can use

all the open source stuff. But you're not constrained in

the same way. You can continue to evolve, you can

evolve from a hardware perspective, you can evolve

from a software perspective.

Kirill Eremenko: Got you.

Greg Pavlik: As I said, we take for example, on the data science side

of the equation, when we developed the data science

service, the idea was, make sure that you're not taking

anything away from data scientists. Foundationally

open source is the center of the model. And just make

sure that it works well so that you can do a well

managed, collaborative set of projects, on the one

hand, that you can share those models and outputs

with other parts of the business easily. And then you

can continue to just begin to leverage and uptake

every new wave of hardware, every new wave of

software as it becomes available. So for us it's a hub

that facilitate those things rather than a competition

with them.

Kirill Eremenko: Fantastic. And I love that because I worked a bit

with... I don't know if it's still around. There was a

provider of Hadoop called Greenplum. And they


acquired Pivotal which was I think a consulting firm.

And in order to work with their instance of R on

Greenplum, Hadoop, you had to learn not R but

Pivotal R and it was like-

Greg Pavlik: Kind of a data warehouse. Yeah.

Kirill Eremenko: Yeah. So all right that-

Greg Pavlik: Look, all these data warehouses do have legitimate

need to include libraries and capabilities for

algorithms, running algorithms directly on the data. I

mean, you have data in a data warehouse, there's a

time and a place for that and all the major data

warehousing vendors provide that. But I don't think

that's also the general purpose data science problem. I

think that's a specialized problem specific to the data

warehousing domain.

Kirill Eremenko: Got you. Okay. Understood. So we talked a bit about

existing trends and things that are becoming hot or

important picking up traction. How do you see the

future? If we took a snapshot of the future, in three

years from now, not too far, but not too close. Three

years from now, what will the future of data

management look like?

Greg Pavlik: Well, okay, so let me go out a little further than that.

Kirill Eremenko: Sure.

Greg Pavlik: Because I think present trends are going to persist in

the near term. And like I said, I think what we're really

focused on is driving people more and more toward a

zero ops model.


Kirill Eremenko: What is zero ops?

Greg Pavlik: Where you're not managing infrastructure.

Kirill Eremenko: Got it.

Greg Pavlik: We want people to basically say, "I've got data, I'm

going to be able to put the data under management

and I may be able to process it with the focus being on

problem solving, not on infrastructure." And I think

you're going to see that be one of our main focuses,

same thing with our data warehousing, autonomous

data warehouse. The idea here is that the data

warehouse is actually being run by machine learning

models by and large. So things that DBAs used to do,

index management, tuning and so forth. The data

warehouse is just getting better and better in doing

that itself.

Kirill Eremenko: Just to clarify, so data warehousing is the storage.

Data ops is the processing.

Greg Pavlik: Yeah, I think data warehousing is... Today when you have

a data warehouse, you start up a database. So it's

typically a scale-out, multi-node database. The ops

around the database, for most organizations, is a

combination of IT and DBAs. We want to drive as

much of that overhead down to zero.

Kirill Eremenko: Okay, got you.

Greg Pavlik: So if you're in a relational data warehouse, we want to

make your focus be how do you get the most out of

your data? Not how do you invest the most on IT and

in running databases and tuning databases. When it

comes to these big data workloads, same idea. Put


your data in object store. Things like data flow with a

serverless implementation lets you get the value out of

the data without having to run a bunch of machinery

and maintain a bunch of big IT staff to keep a bunch

of clusters going.

Greg Pavlik: So I think that will continue over the next three to five

years, to be the major trend in the industry. I think

we've got a big head start on both those

dimensions. I think you'll see others start to follow suit

over time. The thing that... The reason I said let's look

out longer than that, I think ultimately where we want

to be is to think about the cloud as your database so

to speak. So you don't think about individual

technologies for storing the data. And ultimately, you

don't think about individual technologies for

processing the data. You just push your data to the

cloud, how and where it gets stored behind the cloud

interface is entirely a vendor problem. And then you

will more and more want to be able to just ask

questions about your data without having to project

into technologies that are very specific to data

processing.

Greg Pavlik: So you can imagine where I can come in, and I can

speak to my computer, which is hooked up to the

cloud, say, "Hey, I want to see how sales forecasts

were compared to actuals in North America for April."

And the result comes back. Almost like when you go

into Google, and you just type in, you type a search,

you get back a result. And the algorithms in Google

are trying to figure out as best they can what are the

most relevant results for your need, but you lack


precision. And you're, today at least, there's a degree

of personalization, but it's not hyper personalized. I

think over time, you'll be able to get almost the same

interaction that you have with Google except you'll be able

to ask very specific and very sophisticated questions

and get very specific and very sophisticated responses

back.


Greg Pavlik: A response may be a spreadsheet that comes back.

Okay thinking of it-

Kirill Eremenko: Fantastic.

Greg Pavlik: ... I didn't have to say, it just knows I work with

spreadsheets, this is going to be the best outcome for

you as a user. And you're not looking at individual

databases, you're not looking at trying to parse

through abstruse data structures and so forth. That's

the level of sophistication that you're going to get out

of the cloud in another decade or so. And I think the

thing about that, it goes back to also is language

processing. If you think about speech, if you think

about text analytics, just being able to say something,

having that interpreted, in some sense understood,

having that translated into an optimal set of queries that

happen in the backend, and then coming back with an

optimal set of results that will largely be driven

through machine learning.
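
The pipeline described here (a spoken or typed question is interpreted, translated into backend queries, and the result returned) can be sketched in miniature. The example below is a deliberately simple stand-in: it uses keyword matching instead of machine learning, and an in-memory SQLite table instead of a cloud data service. The `sales` schema, the region codes, and the function names are hypothetical.

```python
import re
import sqlite3

# Hypothetical vocabulary the toy parser understands.
REGIONS = {"north america": "NA", "europe": "EU"}
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

def parse_question(question):
    """Extract a region code and month number from free text (keyword matching only)."""
    text = question.lower()
    region = next((code for name, code in REGIONS.items() if name in text), None)
    month = next((i for i, name in enumerate(MONTHS, start=1)
                  if re.search(rf"\b{name}\b", text)), None)
    if region is None or month is None:
        raise ValueError("could not find a region and a month in the question")
    return region, month

def answer(conn, question):
    """Translate the question into a parameterized SQL query and run it."""
    region, month = parse_question(question)
    row = conn.execute(
        "SELECT SUM(forecast), SUM(actual) FROM sales WHERE region = ? AND month = ?",
        (region, month),
    ).fetchone()
    return {"region": region, "month": month, "forecast": row[0], "actual": row[1]}

# Toy data standing in for whatever the cloud would hold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, forecast REAL, actual REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("NA", 4, 120.0, 110.0),
    ("NA", 4, 80.0, 95.0),
    ("NA", 5, 100.0, 90.0),
    ("EU", 4, 70.0, 75.0),
])

result = answer(conn, "Hey, I want to see how sales forecasts were compared "
                      "to actuals in North America for April.")
print(result)
```

The vision in the conversation replaces the keyword parser with learned language understanding and the single table with whatever storage the vendor chooses, but the shape of the flow (interpret, translate to queries, return results) is the same.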

Kirill Eremenko: Well, thank you. That's a great vision. I have one more

topic that just popped to mind that I wanted to touch

on, 5G and Edge computing. And what I've heard, I'm

not an expert in this by any means, but what I've


heard is 5G is here to partially enable Edge computing

and Edge computing is computing things, for instance

Siri right now, it won't work if you have no internet

connection. But if we have on device computing, then

it will work. Whereas Edge computing is somewhere in

between. It's between the cloud and it's locally in your

area. So is Edge computing going to disrupt Oracle's

business model?

Greg Pavlik: No. I mean, I think in general, the capabilities of the

cloud will progressively look more and more like

they're just a part of the natural landscape we work in.

But you're still going to need to do a lot of core data

processing, a lot of core data management at scale,

within a centralized context. The promise, at

least in the near term, with Edge computing is that

you can start to externalize what you might call

auxiliary processing down toward devices. And I think

5G... I mean, 5G will be important because it's opening

up bandwidth. But it's also going to be processing

power at the Edge, which is going to be a determining

factor for what we can do over time as well. But for

sure, I mean, we'll certainly see quite a bit of model

execution occurring outside of a centralized context.

Kirill Eremenko: Is Oracle planning on becoming part of that Edge

computing game?

Greg Pavlik: Yeah. I mean, it's unavoidable now. So it's all part and

parcel. Right now we've got a whole bunch of stuff

around digital assistants, chatbots, and so forth.

Those things will be the first wave you'll see projected

more toward the Edge, app functionality, disconnected

modes, etc. Those are all going to be things that we'll


see moving more and more into the Edge. I still think,

it's not going to be either or. This is a complementary set

of developments which will allow us to do things that

frankly, as a matter of just what would have been

impossible today will be doable on the Edge but it's

unlikely anytime soon that you're going to supplant

the need for internal systems, centralized systems. I

think 30 years out, who knows? A different question.

But in the near term, I think that these are more or

less entirely complementary.

Kirill Eremenko: Okay. Understood. Yeah. So that wraps up all my

questions and we're also running out of time. But I

wanted to ask you, before we wrap up, for some guidance.

Because a lot of people listening to this are data

scientists, aspiring data scientists who want to

progress their careers and learn as much as possible.

And personally, I've learned a lot from you today. For

me, it was a very insightful conversation to get up to

speed with the world of cloud because normally as a

data scientist, you don't think about it that much.

You're not up-to-date with these trends and things

that are going on.

Kirill Eremenko: So what is your recommendation or wish, if you

could make one wish for people listening to this or

data scientists, in terms of their relationship with the

cloud and them being up-to-date with what's going on

in the cloud. What would your recommendation be?

Greg Pavlik: Well, I think there's a lot of advantages to thinking

about the ability to have a hub so that teams can work

together, so you get more productive because the

better outcomes we get, the more auditability we get, the


more control the teams have, and traceability in terms

of libraries and versions, and so forth the more

ubiquitous the outputs from data science teams are

going to be in organizations that might otherwise have

been a bit conservative about accepting work that was

harder to understand provenance.

Greg Pavlik: And like I say, I think the ability to keep up with the

demand, the processing demands for doing a lot of

artificial intelligence work are going to be impossible

unless you progress rapidly or are able to take advantage

of the latest hardware. So if I want the latest

generation of GPUs, yeah, some organizations will buy

them if they're building out large HPC clusters, things

like that. But for most businesses, it's just not

practical. I think that looking at the cloud as an

enabling tool and not as a... And it shouldn't be looked

at as an impediment. It doesn't take anything away. It

only makes the job easier to get good results as

opposed to being stuck in the laptop based world.

Greg Pavlik: The other thing I would say, just in general, for data

scientists is don't be afraid of getting close to the line

of business. Because again, the value of the technology

is what's going to drive investment which is what's

going to drive innovation. So we need to continue to

really be driving powerful outcomes. And I know as a

technologist, it's easy for me to just get excited about

technology. But on the other hand, we all need this

stuff funded. So getting closer and understanding the

business because we've seen... A couple of examples.

One of the areas Oracle has worked with was the

health system in the UK, and they brought a bunch of


machine learning algorithms in from our Oracle

Machine Learning Platform.

Greg Pavlik: And the turnaround there, they applied it to patient

outcomes, they applied it to the fraud detection, and

they were saving within, I think within a year with a

20 person team, something like a billion pounds plus.

It was a billion pounds plus, in net savings on a year

over year basis.


Greg Pavlik: So you can show those kind of results for an

organization, where you get this return on investment,

that you're just not going to see it through any other

mechanisms, that's really going to build up business

confidence to continue to invest and continue to

really make sure that this whole ecosystem is

becoming more and more of the mainstream.

Kirill Eremenko: Wow fantastic. Great advice. Thank you. Thank you

very much. Cloud doesn't take away from your

experience, but adds to it. And make sure to keep the

business objectives in mind. Greg, on that note, it's

been a huge pleasure. And before I let you go, could

you please help us out, where can we follow you, get in

touch or learn more about Oracle Cloud Platform or

Oracle Cloud Infrastructure?

Greg Pavlik: Yeah. So I have I guess, periodically. I don't do as good

a job as I should but I'll put up snippets and updates

and news of interest on LinkedIn. So it's probably the

easiest place to quickly follow what I'm up to, when I'm

not heads down in terms of our development

work. As for Oracle Cloud, the easiest thing to do is just


go to Oracle Cloud and open up a free account and

start to play with it. I think people will be impressed

right off the bat.

Kirill Eremenko: Fantastic. Fantastic. And one final question, do you

have a book that you can recommend to our listeners?

Greg Pavlik: It depends where you're at, in terms of maturity in the

industry from a data science perspective. One of the

books that we've found to be pretty helpful for our

customers has been one of the O'Reilly books, Data

Science from Scratch. It's a Python oriented book, and

I think Python's others going down a little bit Python's

been on the upswing. So I think in terms of languages

to really try to get a mastery around from a data

science perspective, Python's pretty much where it's at

for today. Who knows, five years ago, or five years from

now if that's the case. And it really walks you through

building out algorithms, understanding how to really

get value from data in a fundamental way. So it's a

good starting point.

Kirill Eremenko: Great, thank you. Data Science from Scratch, right?

Greg Pavlik: Yep.

Kirill Eremenko: Got you. Data Science from Scratch by O'Reilly. On

that note, thank you very much, Greg, for coming on

the show. It's been a huge pleasure. And I personally

learned a lot, and I'm sure many, many other people

will as well.

Greg Pavlik: Yeah. Thanks for having me.

Kirill Eremenko: So there you have it everybody, that was Greg Pavlik,

who is the Senior Vice President and Chief Technology


Officer at Oracle Cloud Infrastructure. And I hope you

enjoyed this episode as much as I did. And I hope you

learned quite a few things about the cloud and were

able to pick up on some of the interesting trends that

are going on in the world, what the future of the cloud

looks like, how to compare between the different

vendors, and why this service actually exists. What's

the purpose of encapsulating everything together? And

personally, that was my favorite part of the episode,

the whole notion of not just using open source tools,

but having a wrapper around them, that allows you to

scale with time because indeed, having your data on

the laptop only takes you so far. And then you need

to start thinking about, "Okay, how do I add cloud

services to this? How do I add traceability or

versioning of the different algorithms that I'm writing?

And also, which data am I using?" Things like that.

Kirill Eremenko: And to me, it sounds quite exciting that solutions like

this, like what Oracle is providing under object store,

exist and can actually benefit the community. And I'm

curious too, as to what was your favorite part of the

episode. There were definitely lots of interesting gems

that Greg shared. And as usual, you can find all the

show notes at our website, at

superdatascience.com/375. That's

superdatascience.com/375. There you can find the

transcript for this episode, any materials that were

mentioned on the show, plus the URLs to Greg's

LinkedIn and the Oracle Cloud Infrastructure website

where you can check out all the amazing things that

we talked about today.


Kirill Eremenko: And on that note, thank you so much for sharing your

time today with us and for being here and learning

together on this journey. Hopefully the insights were

exciting and interesting to you and I look forward to

seeing you back here next time. Until then, happy

analyzing.

