TRANSCRIPT
SDS PODCAST
EPISODE 375:
UTILIZING ORACLE CLOUD AS AN ENTERPRISE, SMALL BUSINESS, OR DEVELOPER
Kirill Eremenko: This is episode number 375 with Senior Vice President
and Chief Technology Officer at Oracle Cloud Platform,
Greg Pavlik.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name
is Kirill Eremenko, Data Science Coach and Lifestyle
Entrepreneur. And each week we bring inspiring
people and ideas to help you build your successful
career in data science. Thanks for being here today,
and now let's make the complex simple.
Kirill Eremenko: Welcome back to the SuperDataScience podcast
everybody. Super pumped to have you back here on
the show, because today we're talking about the cloud.
As data scientists, we don't often think about the
concepts and mechanics behind the cloud that we
often do use for our computations and data storage.
We don't often stop to think what it's all about, what
are the different vendors, how does it all work, what is
the future, what are the trends in this space? Today is
a great episode to educate yourself. I personally was a
learner on this episode; I was learning and soaking up
all this knowledge. And who is better positioned to
teach about the cloud than the Senior Vice President
and Chief Technology Officer at Oracle Cloud
Infrastructure, Greg Pavlik?
Kirill Eremenko: So in this episode you will learn a ton about the cloud.
For instance, we'll talk about AutoML and what it
means for the future of data science; the trends in
data science, for example big data and how we saw the
rise and fall of Hadoop; the growing number of data
scientists in the world; natural language processing, and
why it is starting to dominate cloud computations;
data science and business intelligence, and what that
intersection means for the profession; small data versus
big data; and much, much more.
Kirill Eremenko: So this is an episode to jump in and learn. It may
at times feel complex. I definitely found it quite
complex in certain areas, but that's why I asked a lot
of questions. And this is an opportunity to educate
yourself about the cloud, understand the future and
where all these trends are going, and take your
professional skills as a data scientist to the next level
in a domain that we constantly use for our work. And
so on that note, I can't wait for you to check out this
episode. Without further ado, I bring to you Greg
Pavlik, who is the Senior Vice President and Chief
Technology Officer at Oracle Cloud Infrastructure.
Kirill Eremenko: Welcome back to the SuperDataScience podcast,
everybody. Super pumped to have you back here on
the show. Today I've got a very special guest, Greg
Pavlik, calling in. I believe you're on the West Coast of
the U.S., right, Greg?
Greg Pavlik: Yep, yep. We're in the Bay Area. So it's a good place to be
for technology, a good place to be for machine learning.
Kirill Eremenko: Fantastic. How long have you been there for?
Greg Pavlik: About 12 years now.
Kirill Eremenko: 12 years?
Greg Pavlik: Yeah. Though we're not really long-timers; we showed
up from the East Coast about 12 years back. We're
from the New Jersey area.
Kirill Eremenko: What made you move?
Greg Pavlik: Work. Yeah, yeah. I got to a point where I flew out once
a month, then it was every two weeks, then it was
every week. So after about a year of flying out weekly-
Kirill Eremenko: Wow.
Greg Pavlik: ... we decided it was time to move. Yeah, I think I was
on the road 50 out of the 52 weeks of the year.
Kirill Eremenko: Wow.
Greg Pavlik: Now, because of the pandemic, I don't travel at all. But
my normal baseline for travel is mostly around the
West Coast and mostly once a month.
Kirill Eremenko: Very interesting. I once met a gentleman who was
flying in and flying out to mining sites, also every week,
because he was a dragline operator for those massive
machines. They're very rare and hard to come by,
these people. So this mining center was
flying him in and out every week for seven years. So
yeah, when I saw him on the plane I felt a little bit
sorry for him, because the shape of his back perfectly
fit into the seat of the plane.
Greg Pavlik: What I found when I was traveling all the time is, you
get out on Monday morning on the 6:00 AM flight, and
it's the same people, every week, week after week. They
just had their pattern. A lot of consultants, sometimes
managers, but it's not a good lifestyle if you can avoid
it. I definitely recommend something a little bit more
stable.
Kirill Eremenko: Yeah, I got you. And do you miss it now that, with the
pandemic, you have to stay at home? Is it
something you reminisce about?
Greg Pavlik: Well, the big issue for me is, I don't miss travel, but I
miss the face-to-face teamwork. I mean, one of the
things I've always felt is that the whiteboard is hard to
beat; it's the number one engineering tool. And I still
haven't found a great substitute for a face-to-face
conversation at a whiteboard.
Kirill Eremenko: That's interesting.
Greg Pavlik: And there's that social capital you build talking to
people when you're really in the same room, sharing a
cup of coffee. Then the other problem, and I think
this is one that people underestimate, is the value of
these ad hoc hallway conversations, not so
much when you're trying to solve a technical problem,
but when you're trying to work across teams, get
teams to coordinate, and keep people on the same page.
There's a lot that happens informally. And it's very
difficult to do the informal thing when you have to
start a Zoom meeting in order to start a conversation. I
think there are a lot of discussions that just don't
happen, or if they do happen, they're email exchanges
that can be interpreted in different ways.
Greg Pavlik: So that's been a bit of a tax. The flip side, and this is
actually a concern I have, is that people seem to be working
more hours now than ever. About a
week ago, we gave everybody in the organization a
mandatory day off, and we'll probably do that again in
another four to six weeks just to let people pace
themselves. Because there's this tendency: get up in
the morning, log in, start working, take a break to
get something to eat, work, work, work, work, another
break to eat, work, and next thing you know, your day is over. It's
great in terms of trying to advance the ball and move
things forward, up until you start to hit burnout. So
we're really trying to figure out ways to keep people
productive, but also make sure they don't wear
themselves out.
Kirill Eremenko: Absolutely, yeah. Our team is fully remote, so it was
definitely something we noticed as well.
People need to take a break.
Greg Pavlik: Yeah. One of the things we've been trying to do is start
to take lessons from companies and organizations that
work fully remote all the time. We have some people
who came in through open source communities,
and we're trying to adopt best practices from open
source, especially from the Apache Software
Foundation, in terms of how we do our internal
development. I think that's helping improve things
from a quality perspective overall. But learning more
from organizations, especially
companies, that have been remote full time is
something that we're working on as well. It's really
important.
Greg Pavlik: And it's not the same. Things go reasonably well, I
think, as people adapt, but really getting
things dialed in and making sure that we're keeping
the bar high, from a quality perspective and from a
work-life balance perspective, are probably the two
biggest challenges we have right now.
Kirill Eremenko: Absolutely. And coming back a bit to the point you
made about the value of those ad hoc
conversations in the hallways, it was very interesting
to hear that coming from you, since you are in charge
of a big part of Oracle to do with the cloud, and one of
the goals is to move on-premise workloads to the cloud. Question:
do you think that sometime in the future, maybe triggered
by this pandemic, maybe just over the course of time,
we will be able to come up with a solution, whether it's
VR or AR, where we move those ad hoc talks online? For
instance, could we all wake up, put on virtual
reality goggles, and be walking around a virtual office?
Greg Pavlik: Yeah, it's possible. And I would certainly say, the way
things have developed, there's a global search for
talent. You can't just go to any one country or any one
state and say, "Hey, this is the talent pool we want." So
I think that there's a strong potential for more and
more organizations to adopt VR for things like
international team integration. When you're local to an
office, though, I think there's just a human element
that's hard to replace. Unless the VR gets
sophisticated enough that you can't distinguish
between reality and the virtual environment, I think
people are still going to want to have the face-to-faces.
Greg Pavlik: At the last company I was at, our
management team was pretty distributed. But we
made a real point to get together at an offsite
at least once a quarter. And it was an
interpersonal relationship dynamic that got
reestablished quarterly. And I think those are hard to
replace with current technologies. But yeah, I think
we're going to see a lot more technology evolution
toward facilitating better team dynamics. Right now, in
some ways, the state of the art seems to be Slack,
and Slack is great, but it's also a strange, interrupt-driven
technology. It's not the same thing as
walking down the hall to get a cup of coffee and
running into someone. Then you're both already out of the zone of
work and trying to get something else done that's not
quite as intense as hardcore, focused problem solving.
That kind of thing I haven't seen a way to really
replace yet.
Kirill Eremenko: Maybe Oracle can build something.
Greg Pavlik: Yep. [inaudible 00:10:25].
Kirill Eremenko: Gotcha. I hope you're enjoying this amazing episode,
we'll get straight back to it after this super quick
announcement. DataScienceGO Virtual. Have you
registered to attend yet? If not, make sure to check it
out at datasciencego.com/virtual. The dates are coming
up: June 20th to 21st. It's a weekend. On Saturday
we've got talks and workshops for newcomers and
transitioners, and on Sunday we've got talks and
workshops for practitioners and managers. So
whatever level you are, this is the virtual event for you.
And it's absolutely free. Yes, it's absolutely free. But
the number of seats is limited so apply to attend now,
you can find the event at datasciencego.com/virtual.
Come, enjoy the talks, have lots of fun, network with
your peers. Even if you don't manage to get in for
whatever reason, you will get the recordings afterwards
if you register for the event. Once again, the website is
datasciencego.com/virtual. No reason not to attend, no
reason not to register, so make sure to jump on this
opportunity; it's only a matter of days until this
happens. I look forward to seeing you there, and
now let's jump straight back into this amazing episode.
Kirill Eremenko: Well, Greg, you are a senior VP and CTO at Oracle
Cloud Platform. What I'd love to dig into is
your journey. You've had a very
interesting career, judging by your LinkedIn,
and you spent, I was counting, over 12 years at
Oracle in total. So-
Greg Pavlik: In total, yeah. I wound-
Kirill Eremenko: Could you walk us through-
Greg Pavlik: ... up here by accident to be honest.
Kirill Eremenko: Sorry?
Greg Pavlik: I say I wound up here by accident.
Kirill Eremenko: How did that happen?
Greg Pavlik: So my background is not actually in computer science.
It's really solid state physics and physical chemistry.
Kirill Eremenko: Oh, wow.
Greg Pavlik: And I took a job developing high-temperature ceramics
for satellite nose cones, back in the '90s in Colorado.
And I showed up at the job on day one, and they said,
"Well, you can do this ceramics engineering work that
you've got prepped up and ready to go, or we need
people to do software development. And with this
project, we're building a simulation for a spacecraft,
really interesting stuff." I said, "Well..." And, "Oh, by
the way, we'll pay you more." And I said, "Well," I'll tell
you, I said, "I'm willing to take more money, but would you
guys be willing to put me through a master's in
computer science?" So they said yes. And-
Kirill Eremenko: Wow.
Greg Pavlik: ... I just shifted my focus quite a bit. But it was a
really great project, actually. We basically developed a
simulation of not only the spacecraft, but also the full
space environment, so that when they took the actual
command and control hardware and plugged it
into the software simulation, it thought it was
controlling a spacecraft. And as the spacecraft was
doing things, moving solar panels or firing off reaction
control thrusters, the simulation was then producing
all the dynamics you would expect in the space
environment, for full testing.
Greg Pavlik: So it was really, really a cool project, one of my
favorite work projects I've done in my career. And that
started me down the journey of software, and I wound
up going through a series of startups. The last one
before my first day at Oracle was a company called
Bluestone Software, which was an early app server vendor in the
heyday of the dot com boom and app server mania.
And so we were one of probably four vendors at the
time that were pure plays on the app server side. The
incumbent that really won the day was BEA Systems,
who had launched a server and then eventually
were acquired by Oracle. So they wound up at Oracle
too.
Greg Pavlik: But when the dot com market busted, we wound up
being acquired by HP, and that didn't go very well. And
Oracle was looking for a team of distributed systems
and middleware engineers to start to build out their
own app server platform. So I wound up taking a job at
Oracle and thought [crosstalk 00:15:03].
Kirill Eremenko: Wow.
Greg Pavlik: There were a couple of years, and I wound up... Let's
just say it's been about 12 years in total.
Kirill Eremenko: Wow.
Greg Pavlik: I think nine and a half years the first go-around and almost
three now in my second-
Kirill Eremenko: Yeah. You had a bit of a break from Oracle for some
time. What happened there?
Greg Pavlik: Yeah, I think we had gotten... When I joined Oracle the
first time, there were about 200 people in the
middleware division. By the time I left it was probably
between 4,000 and 5,000.
Kirill Eremenko: Wow.
Greg Pavlik: We really built that business up, both organically and
then incrementally by acquisition, and eventually
consolidated that whole Java middleware space
between the BEA acquisition and then Sun
Microsystems with Java itself. And we [crosstalk
00:15:49].
Kirill Eremenko: Sorry. What is middleware?
Greg Pavlik: Oh, middleware. Middleware is your connectivity
software that sits between the application logic and
your backend systems and databases. So app servers,
or messaging systems like Kafka. In some sense,
Kubernetes is now playing the role of middleware in
a lot of systems. I think the heavyweight app servers
have become largely displaced; people are moving more
toward containerized applications. But back in the day,
for modern app development, the
Java Enterprise Edition app server environment
was the normative standard. Then that started to
get displaced by the open source Spring Framework.
And while Spring is still around, I think people have
gotten much more freeform in the technologies they're
using for app implementations.
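Greg describes middleware as connectivity software that sits between application logic and backend systems, with messaging systems such as Kafka as one example. As a minimal sketch of that decoupling idea (an in-process stand-in built on Python's standard `queue` module, not Kafka, an app server, or any vendor API), the application publishes messages to a broker and a separate consumer writes them to a toy backend. The names `broker`, `application_logic`, and `backend_writer` are hypothetical.

```python
import queue
import threading

# A toy "backend system": a dict standing in for a database table.
database: dict[str, int] = {}

# The middleware layer: a thread-safe FIFO queue that decouples the
# application logic (producer) from the backend writer (consumer).
broker: queue.Queue = queue.Queue()


def application_logic(orders: list[tuple[str, int]]) -> None:
    """Producer: publishes messages without knowing how they are stored."""
    for order in orders:
        broker.put(order)
    broker.put(None)  # sentinel: no more messages


def backend_writer() -> None:
    """Consumer: drains the queue and persists each message."""
    while True:
        message = broker.get()
        if message is None:
            break
        order_id, quantity = message
        database[order_id] = database.get(order_id, 0) + quantity


consumer = threading.Thread(target=backend_writer)
consumer.start()
application_logic([("A-1", 2), ("B-7", 1), ("A-1", 3)])
consumer.join()
print(database)  # {'A-1': 5, 'B-7': 1}
```

The producer never touches `database` directly, so the storage side could change without any change to the application code, which is the role the conversation assigns to middleware.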
Greg Pavlik: I mean, it was a great journey, very interesting. We really
got to develop the market, the business. But we got to
a phase, probably around late 2010 or 2011, where
Oracle was really focused on ingesting
and integrating all the acquisitions they had done and
consolidating their platform around the app portfolio.
That's important work for the business, but I'm a
hardcore technologist at heart. And I was getting more
and more interested in the emerging big data segment.
And it was clear at the time that really going out to
work with Hadoop and HBase and a bunch of other
technologies that were coming together in that whole
ecosystem was going to happen
outside the company.
Greg Pavlik: So I wound up getting hooked up with the team that
was spinning out of Yahoo that had built Hadoop from
day one, and we built out one of the two pure plays in
the big data market, specific to the Hadoop
ecosystem. So we went on a tear there; that
company IPOed remarkably fast. I think from inception
to IPO was probably three and a half years.
Kirill Eremenko: Wow.
Greg Pavlik: And things were going quite well until, I'd say, 2016,
when there was a pretty dramatic shift. If you think
about Hadoop, it was an important
evolutionary technology. It opened up a lot of new use
cases for non-specialists, say your typical
enterprise business, to start to deal with both multi-structured
data and very, very large data sets in ways
that they couldn't before. Economically they never could,
because the technologies really didn't cater to their use
cases, but Hadoop opened that up. The problem with
Hadoop was that it was a big monolithic system that was hard
to stabilize, hard to run, and just expensive. The open
source bits were really the least expensive part of the
equation, because you had to rack and stack all these
machines, put them in your data centers or in a colo, and
pay for power all the time.
Greg Pavlik: And by 2016, I think people got comfortable enough
with public cloud infrastructure that they began to
take the same data sets and just put them into object
storage, in which case you basically shift the
whole operational problem off to the cloud vendor, and
you're only really paying for what you use. And object
storage is pretty cheap. So-
Kirill Eremenko: What is object storage?
Greg Pavlik: Something like S3 in Amazon. Every cloud platform
has some variant; at Oracle Cloud Infrastructure we just
call ours the Object Storage service. Azure has had
a couple of different permutations in their
environment, but the latest they're calling
Azure Data Lake Storage. But every cloud platform has this
ability to take binary objects and just put them into-
Kirill Eremenko: Okay. And so, whether it's Amazon S3 or
Azure, they don't use Hadoop in the backend?
Greg Pavlik: You can. I mean, it's one option. So if you put the data
into object storage, you can spin up a Hadoop cluster,
pull the data from object storage, process it, and shut the cluster
down. But it's a very heavyweight infrastructure to do that.
The approach we've taken... One of the things when I
came into Oracle was, like I say, I really saw a lot of
value in this space for end users, on the one hand. On
the other hand, the technology just seemed too
cumbersome and difficult to use. So what I really wanted to
do was step back and say, "How do we maintain
and preserve all the good parts of this ecosystem, but
eliminate the overhead, eliminate the cumbersome,
unwieldy nature of it?"
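Object storage, as described here, takes binary objects and stores them under keys, while compute comes and goes separately. The toy in-memory model below is only an illustration of those semantics under that assumption; `ToyObjectStore` and its `put`, `get`, and `list_prefix` methods are hypothetical and are not the S3, OCI, or Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """In-memory model of object storage: a flat bucket of key -> bytes.

    There are no real directories; "folders" are just shared key prefixes.
    """

    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # Objects are written whole; there is no in-place partial update.
        self.objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def list_prefix(self, prefix: str) -> list[str]:
        # Listing by prefix is how a "directory" of part files is found.
        return sorted(k for k in self.objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("sales/2020/part-0000.csv", b"region,amount\nwest,10\n")
store.put("sales/2020/part-0001.csv", b"region,amount\neast,7\n")
store.put("logs/app.log", b"started\n")

# A short-lived compute job reads only the objects it needs, then exits;
# the data stays in the store after the compute is gone.
total = 0
for key in store.list_prefix("sales/2020/"):
    rows = store.get(key).decode().splitlines()[1:]  # skip the header row
    total += sum(int(line.split(",")[1]) for line in rows)
print(total)  # 17
```

The loop plays the part of the "spin up, pull, process, shut down" job: nothing about the stored data depends on the compute that read it.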
Greg Pavlik: So we've taken a very different approach. We have a
cloud service called Data Flow, and it uses Apache
Spark to do the data processing, which is the
dominant data crunching framework in that whole
Apache Hadoop ecosystem. But it's entirely clusterless.
It's not just serverless, it's clusterless. We
pre-allocate a bunch of resources in the backend, and all
you have to do as a user is say, "Okay, I want to run
this job, I want to use this much processing power,
and I want to touch this data." And then within
seconds or tens of seconds, we're off processing
arbitrary workloads.
Greg Pavlik: But the beauty of it is, not only at the storage layer do
you have nothing to maintain or deal with as an end
user from an operational perspective, but even at the
data processing level. It's about as close as you're
going to get to a zero-ops model. The difference with
Hadoop: you can do the same workload with Hadoop
over object storage, but spinning up a Hadoop cluster
probably takes five to 10 minutes. Like I say, it's a lot of
overhead. And you really don't get any real benefits
beyond what you [inaudible 00:21:59] process with the
actual Spark packages.
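To put the startup difference in perspective, a little arithmetic helps. This sketch assumes a 5-minute Spark job (an assumption, not a figure from the conversation) and uses 30 seconds as a stand-in for "seconds or tens of seconds", against the five-to-10-minute cluster spin-up Greg mentions.

```python
def startup_overhead_share(startup_s: float, job_s: float) -> float:
    """Fraction of end-to-end wall-clock time spent waiting for compute."""
    return startup_s / (startup_s + job_s)


JOB_S = 5 * 60  # assumption: the Spark job itself runs for 5 minutes

# Startup times: 30 s stands in for "seconds or tens of seconds";
# 5 and 10 minutes are the cluster spin-up range from the conversation.
STARTUP_S = {
    "pre-allocated, clusterless (30 s)": 30,
    "Hadoop cluster (5 min)": 5 * 60,
    "Hadoop cluster (10 min)": 10 * 60,
}

for label, startup in STARTUP_S.items():
    share = startup_overhead_share(startup, JOB_S)
    print(f"{label}: {share:.0%} of wall-clock time is startup")
```

Under these assumptions the shares come out to roughly 9%, 50%, and 67%. The shorter the job, the more the fixed spin-up cost dominates, which is why pre-allocated capacity matters most for short, frequent workloads.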
Greg Pavlik: So we tried to take a look at this as a Gen 2
approach, to learn from what other people have done,
both good and bad, and scrap the bad part. So I'm
pretty excited about this. I look at this as big data
done right, and really Oracle being the first vendor to
go out and not just utilize the open source technology
as it was designed for the on-premise data center, but
to really re-envision it for cloud native use cases that
are actually tractable for real businesses. Enterprise
businesses: you go to, say, your typical steel
manufacturer or insurance company and so forth, and
you'll have specialists. You'll have people, for example
in insurance, who are good at data science, because
they come in with strong statistical backgrounds. But
you're not going to get the same kind of population of
technologists that you would have at an eBay or a
PayPal or the backend for Apple, where people are doing
lots of data management and data crunching with a staff
that specializes in distributed systems, experts in open
source, fully resourced to keep this machinery
running.
Greg Pavlik: So I think that the goal is to not really lose anything in
terms of the capabilities that those companies can
bring to bear on the problems they're trying to
address, but at the same time, make it tractable for
a pretty much universal population.
Kirill Eremenko: Got you. Wow, thank you for the description. I
remember, between 2012 and '14 or '15, I was
working with a company that was about
to invest on the order of tens of millions of dollars
to spin up an on-premise Hadoop cluster. That was
when Hadoop was big, and the cloud was only getting
bigger, only becoming popular, and they were asking,
"Should we go to the cloud? Should we build Hadoop
on-premise?" From what you just described, I gather
that the age of Hadoop has gone. It had its rise, it
had its fall, and now we're moving to something post-Hadoop.
Greg Pavlik: Yeah. Like I say, it was an evolutionary
technology. I think it was important. But, and
I'll be honest with you, the rise of the cloud and
cloud-based data lakes, I didn't see it happening in
2014. If you go back to 2014, Hadoop was in its
heyday. I think we IPOed in 2014, actually. So it was an
exciting year.
Kirill Eremenko: Good timing.
Greg Pavlik: But the cloud platforms at that point were seen as less
stable and less secure. So I think there was a lot of
skepticism that people were going to be able to take
mission-critical datasets and just have them live in
the cloud. I think by 2016, things had flipped over.
There was a lot of hardening, a lot of maturation, and
the cloud platforms were starting to become the de
facto data lake infrastructure of choice. And I think
that's only continued to strengthen.
Greg Pavlik: So I think, yeah, the days of Hadoop are effectively
over. But look, there are still organizations
that, for one reason or another, are not able or ready to
make that transition into the cloud yet. And from an
on-premise, scale-out, multi-structured data
management perspective, there aren't really good
alternatives to Hadoop. So there's still a market there,
and I think there will be for the foreseeable future. But
our mantra at the time was 50% of the world's data in
Hadoop in 10 years. And I think 50% of the world's data
will wind up in the cloud, not in Hadoop.
Kirill Eremenko: Probably, probably.
Greg Pavlik: But again, all these markings and the stuff that
happened there, they were super important. I mean,
they really helped-
Kirill Eremenko: Oh, of course.
Greg Pavlik: ... to open up a tremendous amount of value, not
just for the tech industry, but I think for all industries.
And that was one of the interesting things with the big
data landscape. We speculated at the time that
certain industries would be investing very heavily
in big data and a lot of industries
wouldn't. Actually, that wasn't the case. Retail,
healthcare, finance, manufacturing: we had a really
strong presence across just about every vertical. So I
think it was a very important technology that we learned a lot
from, but now we're moving into a world where
there is a platform, in the sense that you've got to
manage your data, be able to access it, keep it
secure, and govern it. But the frameworks and tools that
you apply over top of that data set are highly variable
within an organization. The great thing about cloud
infrastructure is it doesn't really constrain you; you
can run whatever you want and have it access the
data in the object store.
Greg Pavlik: So for example, we did the serverless Spark
infrastructure; it's one way to access the data, but it's
not the only way. You can bring in your own
frameworks. You could spin up a neural network,
grab GPUs, crunch the data through a whole bunch of
training exercises, release the GPUs when you're done
with the training, and maybe a month later you're doing
something different. There's almost this infinite
flexibility that the cloud opens up in terms of the tools
that you can bring to bear on the problem domain. And
as you know, especially with machine learning, there's a lot of
evolution in the toolset and a lot of advances in
algorithms.
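The "grab GPUs, train, release them when you're done" workflow is easiest to keep disciplined when allocation and release are tied to one scope. Below is a minimal sketch of that pattern using Python's `contextlib.contextmanager`; `rented_gpus` is hypothetical and only updates a local counter instead of calling any real cloud provisioning API, and `train` is a placeholder computation.

```python
from contextlib import contextmanager
from typing import Iterator

# Hypothetical bookkeeping of what is currently allocated (and billed).
allocated: dict[str, int] = {"gpu": 0}


@contextmanager
def rented_gpus(count: int) -> Iterator[None]:
    """Hold `count` GPUs for the duration of a block, then always release them.

    A stand-in for a real provisioning API: it only updates a dict.
    """
    allocated["gpu"] += count
    try:
        yield
    finally:
        # Runs even if training raises, so nothing is left allocated.
        allocated["gpu"] -= count


def train(batches: list[list[float]]) -> float:
    # Placeholder "training": the average of per-batch means.
    return sum(sum(b) / len(b) for b in batches) / len(batches)


batches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
with rented_gpus(4):
    print("GPUs in use during training:", allocated["gpu"])  # 4
    result = train(batches)
print("GPUs in use afterwards:", allocated["gpu"])  # 0
print("result:", result)  # 3.5
```

Putting the release in a `finally` clause means a failed training run does not leave GPUs allocated, which is exactly the situation that produces surprise bills.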
Kirill Eremenko: Yeah.
Greg Pavlik: And that'll continue apace.
Kirill Eremenko: And it also helps smaller companies get started faster.
A lot of startups are crunching huge
data sets and also IPOing, not because they have a
huge team or lots of money to spend on servers, but
because they can use Amazon's servers.
Greg Pavlik: Yeah.
Kirill Eremenko: Or your servers.
Greg Pavlik: Or OCI. Yeah.
Kirill Eremenko: Yeah.
Greg Pavlik: No, that's absolutely the case. And like I said, with
Hadoop there were interesting patterns developing.
People wanted to start doing more with machine
learning and started to do more with, say, TensorFlow.
The problem was that Hadoop assumed storage and
compute were conjoined. They were having [inaudible
00:28:31]. So at the time we had seen
organizations that were going out and buying
Nvidia appliances, sitting them next to a Hadoop
cluster, and copying a bunch of data out of the Hadoop cluster
into this Nvidia thing. These were expensive and
unwieldy architectures for what was becoming more
and more fundamental work. As I say, now you're
on the cloud: I can spin up a neural network over top,
on GPUs, and process the data. I don't pre-spend
anything. I spend what I use.
Kirill Eremenko: Yeah.
Greg Pavlik: There's a lot of flexibility. And I think the economics
tend to be much better if things are done in a controlled
way. I mean, the flip side is, if you get into the
cloud and you're not careful about matching
compute to your consumption, using it
when you need it and then releasing it,
you can drive some pretty substantial bills. So-
Kirill Eremenko: Yeah, yeah. You got to be careful.
Greg Pavlik: ... this almost shifts the operations problem
from keeping infrastructure running to managing the
organization's finances. Which is healthy; I mean,
that's the way it should be.
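The "spend what you use" point, and the warning about substantial bills, comes down to simple arithmetic. The hourly rate below is a hypothetical figure chosen only for illustration, not any provider's price; 730 hours is the usual monthly approximation (24 × 365 / 12).

```python
# Hypothetical figures for illustration only; not any provider's real prices.
GPU_RATE_PER_HOUR = 3.00  # assumed $/hour for one GPU instance
HOURS_PER_MONTH = 730     # common approximation: 24 * 365 / 12 = 730


def pay_per_use_cost(rate_per_hour: float, hours_billed: float) -> float:
    """Cloud compute is billed for the hours it is allocated, used or not."""
    return rate_per_hour * hours_billed


# Scenario A: allocate for 40 hours of training, then release.
released = pay_per_use_cost(GPU_RATE_PER_HOUR, 40)
# Scenario B: same training, but the instance is never released.
forgotten = pay_per_use_cost(GPU_RATE_PER_HOUR, HOURS_PER_MONTH)

print(f"released after training: ${released:,.2f}")   # $120.00
print(f"left running all month:  ${forgotten:,.2f}")  # $2,190.00
print(f"ratio: {forgotten / released:.2f}x")          # 18.25x
```

The work done is identical in both scenarios; only the discipline around releasing compute differs, which is why the operational concern shifts toward managing spend.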
Kirill Eremenko: Yeah, yeah. That's true.
Greg Pavlik: And I think the same thing is happening now with a lot of data
science: you get more and more teams looking
closely at the business problem, as opposed to the
algorithm problem, in, say, a typical enterprise
organization. So these convergent trends are really
moving more and more toward meeting the goals of the
business versus trying to wrestle with the technology,
which is where we want things to be heading.
Kirill Eremenko: Those are two very valuable insights. Thank you for
that. So, one of Hadoop's problems was that it
assumed storage and compute are together. By
separating those out, we now have cloud platforms,
which are much more efficient. And in addition, using
cloud platforms allows the objectives of data
science and machine learning to be aligned with the
objectives of the business financially.
Greg Pavlik: Well, yeah. So I think the cloud element helps quite a
bit on the data science side. I think the other thing is,
the state of the toolset available to data scientists has
changed quite a bit. If I go back four years, you
didn't have things like ubiquitous AutoML. So if I'm a
data scientist four years ago, even if I'm using a
pre-implemented algorithm, I still have to bring a lot more
art, this dark art of trying to do feature engineering,
algorithm selection, and hyperparameter tuning. And if
you look at where things have progressed with the
availability of these AutoML capabilities, the
machinery and the tools around the data science
toolkit can do a reasonably good job, in many cases as
good a job as humans, to actually get you to production
with a good model.
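At its core, the algorithm selection and hyperparameter tuning that AutoML automates is a search over candidate models scored on held-out data. The deliberately tiny, standard-library-only sketch below is not Oracle's AutoML toolkit or any library's API: it tries a mean baseline, k-nearest-neighbours with several values of k, and a least-squares line, scores each on a holdout split, and keeps the best. The data and candidate list are made up; real AutoML also searches over features and typically uses cross-validation.

```python
from statistics import mean
from typing import Callable

Model = Callable[[float], float]
Fitter = Callable[[list[float], list[float]], Model]


def fit_mean(xs: list[float], ys: list[float]) -> Model:
    """Baseline algorithm: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def make_knn(k: int) -> Fitter:
    """k-nearest-neighbours regression; k is the hyperparameter being tuned."""
    def fit(xs: list[float], ys: list[float]) -> Model:
        pairs = list(zip(xs, ys))

        def predict(x: float) -> float:
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)

        return predict

    return fit


def fit_linear(xs: list[float], ys: list[float]) -> Model:
    """Ordinary least squares for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def mse(model: Model, xs: list[float], ys: list[float]) -> float:
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# The search space: which algorithm, and which hyperparameter value.
CANDIDATES: dict[str, Fitter] = {
    "mean": fit_mean,
    "knn(k=1)": make_knn(1),
    "knn(k=3)": make_knn(3),
    "knn(k=5)": make_knn(5),
    "linear": fit_linear,
}


def auto_select(xs: list[float], ys: list[float], holdout_every: int = 4) -> tuple[str, dict[str, float]]:
    """Fit each candidate on a training split and score it on a holdout split."""
    rows = list(enumerate(zip(xs, ys)))
    train = [p for i, p in rows if i % holdout_every != holdout_every - 1]
    valid = [p for i, p in rows if i % holdout_every == holdout_every - 1]
    tx, ty = [p[0] for p in train], [p[1] for p in train]
    vx, vy = [p[0] for p in valid], [p[1] for p in valid]
    scores = {name: mse(fit(tx, ty), vx, vy) for name, fit in CANDIDATES.items()}
    best = min(scores, key=lambda name: scores[name])
    return best, scores


# Synthetic data (made up for illustration): a linear trend plus small noise.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + ((i * 7) % 5 - 2) * 0.1 for i, x in enumerate(xs)]

best, scores = auto_select(xs, ys)
print("selected:", best)  # selected: linear
```

Because the synthetic data follows a straight line, the search picks the linear model. Swapping in different data or candidates changes the winner without any manual tuning, which is the part of the "dark art" this kind of tooling takes over.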
Greg Pavlik: So then what does it mean for me as a data scientist?
It means I spend less time trying to
do a lot of tweaking and tuning and instinctual
adaptation of the tools and libraries, and more time focusing on
the actual data, understanding the data,
understanding the business problem, and moving
more and more into the business domain in terms of
getting better results. That to me has been
a big sea change, for sure. And, I mean, not to be
too vendor-specific per se, but
one of the great things about Oracle is that after we did the
Sun acquisition, we got a large research organization.
Greg Pavlik: And so Oracle Labs, one of their main pillars of focus
is machine learning. And we work really closely with
the Labs group around an AutoML toolkit, which we think
is getting better results than what you
can get in the public domain. But we package it
together with open source technologies and make it
part of a collaborative platform. So if you come into
Oracle Cloud, you have a platform for data scientists
to work together as teams. And built into it, for
free for all intents and purposes, you have all these
AutoML capabilities as a default part of the
Python toolkit we provide.
Kirill Eremenko: Wow. Fantastic. Just before the podcast, your PR
director Victoria told me about the new division that
you're heading in data science and AI. Is that what
we're talking about now or is that something else?
Greg Pavlik: Yeah, we've started a fairly substantial investment.
Well, actually, Oracle has a lot of investment in
machine learning overall. It goes from Labs all the way
up through the apps. There's a whole division of
our applications organization that is basically just
developing models for domain problems specific to the
applications. So if you're doing HCM, HR-type
applications, we'll do resume matching. Or supply
chain optimization, all kinds of problems-
Kirill Eremenko: So products? Effectively.
Greg Pavlik: Yeah. We deliver... You consume the benefits of the
machine learning models, but you don't have to go
build them yourself.
Kirill Eremenko: Yeah, yeah.
Greg Pavlik: And that's always, I think that's clearly where we're
going to see the most uptake of machine learning from
end users. At the end of the day, it's the same thing,
you pick up your phone, and you've got image
recognition and all that. You've got billions of people
now, using machine learning models, but they don't
even know it.
Kirill Eremenko: Yeah.
Greg Pavlik: At the same time, on the cloud team, we've started up a
fairly significant investment around data
scientist enablement within the cloud infrastructure,
adjacent to the big data space and adjacent to the data
warehousing space. And that's really derived from an
acquisition that we did about two years ago,
datascience.com. So we brought in this platform that
allows you to take standard notebooks and standard
Python libraries, stand them up, and make them
available for your team, but it puts an overlay
wrapper around them that ties into source code control and
helps you do easy model deployment. You get a
manager or an administrator-
Kirill Eremenko: What does-
Greg Pavlik: ... for the project.
Kirill Eremenko: What does that mean for data scientists?
Greg Pavlik: Well, so one of the things we saw a lot with data
scientists is that they love open source. There's a lot
out there for free, it's all great. And so they would grab
it, they'd put it on their laptop, they'd go grab some
data, and they'd start mucking around and building
models, and then pop out something and it's well,
three months later, we've got a great model, what were
the datasets used? How did you get here? What was
the history? Can I reproduce it? We want to, in some
ways, bring the more mature practices that you would
see in software development and apply them in, I'd say
non-intrusive ways to the data scientist. So if you
come into our environment, you'll start up a session,
working on a notebook. It'll be all the tools and
libraries data scientists are familiar with. But you're-
Kirill Eremenko: With open source?
Greg Pavlik: Open source, yeah, for sure.
Kirill Eremenko: That's really cool.
Greg Pavlik: Yeah. I mean, we do provide additional
libraries. We have this Accelerated Data Science toolkit,
which is Python add-ons that make it easy to connect to
the cloud resources. So if I want to do something like
access data in a cloud-based data lake, or I want to
spin up GPUs to run algorithms more efficiently, those
kinds of convenience tools are there. We have the
AutoML capabilities that I talked about before. And
then we also have a bunch of capabilities for model
explainability, and some visualization as
well.
Greg Pavlik: So we do add in and augment with IP that we've
developed, but there's nothing that constrains you to
use that. You can work with the open source tools. I
think the real benefit for teams is that now there's a
single environment, you can share notebooks, you can
publish models into a model catalog. So you start to
bring all this governance and control and source code
management into an environment. So as a data
scientist, you don't really lose anything; you have
everything you like and are familiar with. But at
the same time, if you're running a data science project,
now you've got a little bit more accountability and I
think much better collaboration and consistency.
Kirill Eremenko: What would you say to the comments which I've heard
in various forms, previously, quite a few times that,
Oracle is more suited for larger organizations that have
a large budget, or enterprise-level companies? Is Oracle
suitable or beneficial? Some of the things you're
talking about are amazing. I don't have to have GitHub
separate from my Jupyter notebooks and from where I'm
storing the data; all that is integrated. That'd be really
cool. But what if I'm a small organization, at a startup
type level? Can I also get the benefit of these tools?
Greg Pavlik: Yeah. So it's a great question. I mean, so first of all, if
you look at Oracle historically, that's substantially
true. The statement you made is pretty accurate. The
cloud business, we built it from scratch, de novo. And
we did it with the intention of providing a hyperscale
cloud that is as accessible as an Amazon or an Azure
or Google. And that was the assumption from day one.
So if you want to come in as a developer, there's a free
tier, you can get started. It doesn't cost you anything.
If you're a small organization, it's really easy to get
bootstrapped, you can get on board with a credit card
and start to work in the environment.
Greg Pavlik: So there is a certain sense in which the historical
on-premise portfolio really was targeted more at the
enterprise level, a step up from the SMB segment. I don't
think that's true for the cloud. In the cloud, clearly, we
want to be the best at the enterprise game. And that's
really not the strength of the other players in the
cloud market. But at the same time, you'll never get
there with the enterprise unless you win the hearts
and minds of developers, and really your average user.
And what you'll see now is with the cloud capabilities,
our customer profile has shifted quite a bit.
Greg Pavlik: So there's a lot of customers that were never going to
be large Oracle customers or even small Oracle
customers, which have been onboarding into OCI. Lots
of startups are taking
advantage of our services, for a couple of reasons.
One, with the cloud overall, we had this
advantage of what we call a gen two approach. So we
brought in a lot of architects and implementers that
had worked on other hyperscale clouds, and the
draw for coming in to work on OCI was, "You get a
chance to solve the problems that you realize you
couldn't solve because you had engineered your way
into a corner." So it was a clean room environment
where a lot of the engineers had an opportunity to
learn from the mistakes in the first generation and just
do a better job.
Greg Pavlik: So we wound up with both a more efficient
environment, especially strong at the network level.
But also pricing wise, I think it's more attractive than
the competitors. Again, because we have the ability to
do a more streamlined implementation, really, at the
base IaaS level. So that's been a real boon for us in terms
of just attracting a new set of users into the cloud. It's
not just startups, so not just small businesses, I
mean, it's also individuals and developers, students,
much different than what you would have seen
certainly five years ago in terms of the customer
spread that was typical for Oracle. The other thing, I
will say, this is true that in terms of the SMB segment,
not just at OCI, not just on our cloud-
Kirill Eremenko: OCI is Oracle Cloud Infrastructure?
Greg Pavlik: Oracle Cloud Infrastructure, yeah. So that's really our-
Kirill Eremenko: And that's the same as OCP?
Greg Pavlik: ... IaaS.
Kirill Eremenko: Oracle Cloud Platform?
Greg Pavlik: [crosstalk 00:41:26] a whole bunch of rebranding.
Kirill Eremenko: Okay.
Greg Pavlik: So the standard unified term that we use now is OCI.
Kirill Eremenko: Got you. Got you.
Greg Pavlik: All cloud services done right in the gen two approach. I
will say though, we've also picked up quite a few SMB
customers, small businesses, medium-sized
businesses, just in our SaaS portfolio as well. Partially
because that was a sweet spot for NetSuite which is
now a part of Oracle, but even in the more
conventional segments for Oracle Applications on the
SaaS side. Quite a few startups, quite a few younger
companies have gone with Oracle. A lot of competition
with Workday and others.
Kirill Eremenko: That's great. So by SaaS, you mean the applications
you mentioned like for instance, resume matching
those type of things? Ready products?
Greg Pavlik: Yeah. Your HR apps; it could be financials, it could
be supply chain management.
Kirill Eremenko: Okay. Okay. Very interesting. You actually answered
my next question, which was about the differences
with Amazon and Azure. Sounds like you've been
able... Because you're building it from scratch and
laser-
Greg Pavlik: I think there's two fundamental differences in my view.
One is at, we might say at the base infrastructure, at
the IaaS layer, we've had a chance to really do this clean
room gen two implementation. And if you start looking
at benchmarks, you look at price performance. And in
fact, there's a new price calculator on
Oracle's website. I mean, the differences are dramatic.
So that's been a big draw, not just for smaller
businesses, but large businesses that are getting these
huge bills from Amazon, you come in, you can do your
cost calculation, in some cases, save tens of millions of
dollars.
Greg Pavlik: That's why you'll see companies like Zoom, or others
that are doing video conferencing, moving over to
OCI: they're getting better, much better cost
performance outcomes. That's on the one hand. On the other
hand, what those vendors are lacking, and one of the
core strengths Oracle has of course always had, is
this enterprise readiness at the cloud infrastructure
level. From a security perspective, from a governance
perspective, from an accountability perspective, but
you marry that together with the apps and you really
have a complete environment to run the entirety of the
business. And today, to a large extent, Amazon and
Azure are just missing that; they don't have those core
capabilities moving up into that SaaS or apps tier. So
Oracle really does, I think, have the first cloud that it'd
be fair to classify as an enterprise cloud. All in.
Kirill Eremenko: Okay, very interesting. Do you think they are catching
up, Amazon and Azure?
Greg Pavlik: Well, who knows what's going to happen with
acquisitions? Organic development in this space is
hard. To build out an [inaudible 00:44:36] portfolio,
you're talking about... In the mature apps vendor
cases you're talking about decades of investment. And
even in quote unquote, startups that have come in
from a SaaS perspective, so Workday, Salesforce,
they're no longer young companies. So it's a big
investment over a long period of time. I doubt that
organic investments can fill those gaps for some of the
other competitors.
Kirill Eremenko: Got you. We've talked a bit about trends. And we
talked about big data or Hadoop for that matter,
having its rise and fall, cloud picking up, gen two
cloud. We talked about data science, and that with AutoML
data science is probably going to become more of a soft
skill type of profession where you need to
get the business knowledge and understand what the
questions are and how to communicate them. What
other trends are you seeing in the space of data
science or data management?
Greg Pavlik: Yeah. That's a great question. One is the number of
data scientists, functional data scientists, has just
exploded. And that's great. Because it means
you're... Let me go back to, say, 2014.
We used to talk about data scientists being unicorns.
The best you can do is go into university and hire
somebody with a PhD or master's in statistics, and
hope to train them up. The toolsets weren't really
there. So you had this really wonky problem, and
that's changed quite a bit. I mean, the tools that are
available have gotten a lot more sophisticated. And
then just the number of people that are capable of
doing meaningful work has exploded. That for us,
especially as vendors is great, because it means we
can bring more and more people into the platform, do
more and more useful workloads.
Greg Pavlik: The other thing is NLP. One of my leads for the
accelerated data science toolkit I mentioned, he likes
to say that text is now as fundamental for businesses
as ints and floats were just 20 years ago. And it's a lot
of... It will be continued innovation, but the results
that we're seeing in terms of text summarization, topic
modeling, etc. I mean, they're infinitely better than
they were a few years ago. We've been doing a lot of
work with BERT and other techniques. And we expect
to see that continue to accelerate in ways that I think
businesses haven't even yet started to tap into. Think
about all the contracts, emails, documents, Word
documents.
Kirill Eremenko: Phone calls.
Greg Pavlik: Everything is sitting there waiting to be mined. And I
always like to say the real promise here from an
analytics perspective or from machine learning is that
you can start to answer the questions you didn't even
know you were going to be able to ask. And I think
that that's been a sea change over the last couple of
years. And we're doing... For example, one of my
groups on the cognitive services side of the equation is
heavily focused on text analytics. And we'll be looking
at applying that both inside of our own applications
more and more aggressively, but also just opening it
up to end users to use directly.
Kirill Eremenko: Very interesting. Why would you say that we are
seeing a rise of NLP?
Greg Pavlik: I think it's just the convergence of enough investment,
enough innovation and enough hardware based
acceleration that is almost like a perfect storm event.
But that's a big one. The other thing, as I say, people
are comfortable working with terabytes, petabytes of
data. Again, that was hard before. So I think this big
data thing continues to be important. But it's just not
constrained by a technology footprint that was hard to
utilize or stand up. That's certainly part of the cloud
trend that's enabling these use cases to unfold. I'm
trying to think.
Greg Pavlik: The other thing about this is we are seeing more and
more bleed over into the conventional BI analytics side
of the equation where you've got people who were
looking at business problems, but largely data
warehouses, largely SQL oriented, that are starting to
also pull in and mind meld with data science groups.
So that's, again, pulling the core ML capabilities closer
into the lines of business in useful ways. I mean it's a
fantastic time to be working in this space right now.
Kirill Eremenko: Yeah, absolutely. So I'm really glad you mentioned
this, Business Intelligence merging with data science,
the two getting closer, because yeah, a lot of times it
depends on your definition. People say data science
and they actually mean dashboards, or they mean
Tableau and Power BI and those tools. It depends.
Greg Pavlik: Yeah, that's right. So that's a bit of confusion that's
going on as well. On the one hand, on the other hand,
that community is starting to draw from the work of
data scientists, more and more. So you will see ML
powered dashboards for sure. One of the things,
Oracle has got a large analytics business, the Oracle
Analytics Cloud, on our data science service, you can
publish models into model catalog, you can browse
and consume those models from within the analytics
tools. So you can start to build predictive analytics
directly into your dashboarding and reports,
with more sophisticated models than you would
typically have been able to use even just a year ago.
Greg Pavlik: So there's a kind of, it's not so much a convergence.
Just think about a Venn diagram, and you'll see an
area of overlap, an area of synergy. But at the same
time, I don't see the world of Tableau specialists
suddenly becoming data scientists overnight either. I
think you'll see the intersection points. I should
mention, we talk a lot about big data, but also getting
really good at building good models with small sets of
data. There's more sophistication in transfer
learning.
Greg Pavlik: So while big data has played a role in terms of
acceleration of quality of models, we're seeing more
and more the case that you can do progressively good
models for your own specific problem domain with
relatively small data sets, which are often... For
example, let's say you're trying to deal with a problem
that is specific to an application that you've developed
in-house and you're collecting some data and that
you've got accessible within an operational database
under the app, there may not be tons of data there.
But if you can start to apply transfer learning
techniques, you can often exploit the smaller data sets
in conjunction with work that's already been done in
terms of initial seed training and get good results as
well.
Greg Pavlik: So I think you're going to see more attention paid to
how to get more effective models for specific problems
with less and less data as well. So I think that's going
to be another area that's going to be hugely beneficial
overall from an enterprise perspective.
Kirill Eremenko: Very interesting. So a lot of these things that we talked
about, again, going back to the question of enterprise
versus a smaller business, quite clear, and I even see
now that as a small business, I could come onto
Oracle and benefit from all of these features, especially
the gen two type of cloud. Question is, apart from the
compute side of things, if I have all these free tools
available to me, if I can technically do the things on
my laptop and I can get version control through free
software like GitHub and things or like tools like
GitHub, why would I choose Oracle and stick with
Oracle, as opposed to not choosing anything? And just
going with all the open source tools all the time?
Greg Pavlik: Yeah. I don't think it's either or. You want a hub in a
sense, where you can bring the work together. And you
want to make sure you've got the resources that you
need to actually do training effectively. And that
changes over time. So if you're trying to roll your own,
you're stuck in this static snapshot-
Kirill Eremenko: Okay, got you.
Greg Pavlik: ... versus you come into a cloud platform, you can use
all the open source stuff. But you're not constrained in
the same way. You can continue to evolve, you can
evolve from a hardware perspective, you can evolve
from a software perspective.
Kirill Eremenko: Got you.
Greg Pavlik: As I said, we take for example, on the data science side
of the equation, when we developed the data science
service, the idea was, make sure that you're not taking
anything away from data scientists. Foundationally
open source is the center of the model. And just make
sure that it works well so that you can do a well
managed, collaborative set of projects, on the one
hand, that you can share those models and outputs
with other parts of the business easily. And then you
can continue to just begin to leverage and uptake
every new wave of hardware, every new wave of
software as it becomes available. So for us it's a hub
that facilitates those things rather than a competition
with them.
Kirill Eremenko: Fantastic. And I love that because I worked a bit
with... I don't know if it's still around. There was a
Hadoop provider called Greenplum. And they
acquired Pivotal, which was I think a consulting firm.
And in order to work with their instance of R on
Greenplum Hadoop, you had to learn not R but
Pivotal R, and it was like-
Greg Pavlik: Kind of a data warehouse. Yeah.
Kirill Eremenko: Yeah. So all right that-
Greg Pavlik: Look, all these data warehouses do have a legitimate
need to include libraries and capabilities for
algorithms, executing algorithms directly on the data. I
mean, you have data in a data warehouse; there's a
time and a place for that, and all the major data
warehousing vendors provide that. But I don't think
that's the general-purpose data science problem. I
think that's a specialized problem specific to the data
warehousing domain.
Kirill Eremenko: Got you. Okay. Understood. So we talked a bit about
existing trends and things that are becoming hot or
important picking up traction. How do you see the
future? If we took a snapshot of the future, in three
years from now, not too far, but not too close. Three
years from now, what will the future of data
management look like?
Greg Pavlik: Well, okay, so let me go out a little further than that.
Kirill Eremenko: Sure.
Greg Pavlik: Because I think present trends are going to persist in
the near term. And like I said, I think what we're really
focused on is driving people more and more toward a
zero ops model.
Kirill Eremenko: What is zero ops?
Greg Pavlik: Where you're not managing infrastructure.
Kirill Eremenko: Got it.
Greg Pavlik: We want people to basically say, "I've got data, I'm
going to be able to put the data under management
and I may be able to process it with the focus being on
problem solving, not on infrastructure." And I think
you're going to see that be one of our main focuses,
same thing with our data warehousing, autonomous
data warehouse. The idea here is that the data
warehouse is actually being run by machine learning
models by and large. So things that DBAs used to do,
index management, tuning and so forth. The data
warehouse is just getting better and better in doing
that itself.
Kirill Eremenko: Just to clarify, so data warehousing is the storage.
Data ops is the processing.
Greg Pavlik: Yeah, I think data warehousing is... Today when you have
a data warehouse, you start up a database, typically
a scale-out, multi-node database. The ops
around the database, in most organizations, is a
combination of IT and DBAs. We want to drive as
much of that overhead down to zero.
Kirill Eremenko: Okay, got you.
Greg Pavlik: So if you're in a relational data warehouse, we want to
make your focus be how do you get the most out of
your data? Not how do you invest the most on IT and
in running databases and tuning databases. When it
comes to these big data workloads, same idea. Put
your data in object store. Things like Data Flow, with a
serverless implementation, let you get the value out of
the data without having to run a bunch of machinery
and maintain a bunch of big IT staff to keep a bunch
of clusters going.
Greg Pavlik: So I think that will continue over the next three to five
years, to be the major trend in the industry. I think
the workloads got a big head start on both those
dimensions. I think you'll see others start to follow suit
over time. The thing that... The reason I said let's look
out longer than that, I think ultimately where we want
to be is to think about the cloud as your database so
to speak. So you don't think about individual
technologies for storing the data. And ultimately, you
don't think about individual technologies for
processing the data. You just push your data to the
cloud, and how and where it gets stored behind the cloud
interface is entirely a vendor problem. And then you
will more and more want to be able to just ask
questions about your data without having to project
into technologies that are very specific to data
processing.
Greg Pavlik: So you can imagine where I can come in, and I can
speak to my computer, which is hooked up to the
cloud, say, "Hey, I want to see how sales forecasts
were compared to actuals in North America for April."
And the result comes back. Almost like when you go
into Google, and you just type in, you type a search,
you get back a result. And the algorithms in Google
are trying to figure out as best they can what are the
most relevant results for your need, but you lack
precision. And you're, today at least, there's a degree
of personalization, but it's not hyper personalized. I
think over time, you'll be able to get almost the same
interaction that you have with Google, except you'll be able
to ask very specific and very sophisticated questions
and get very specific and very sophisticated responses
back.
Kirill Eremenko: Wow.
Greg Pavlik: A response may be a spreadsheet that comes back.
Okay, thinking of it-
Kirill Eremenko: Fantastic.
Greg Pavlik: ... I didn't have to say, it just knows I work with
spreadsheets, this is going to be the best outcome for
you as a user. And you're not looking at individual
databases, you're not looking at trying to parse
through abstruse data structures and so forth. That's
the level of sophistication that you're going to get out
of the cloud in another decade or so. And I think the
thing about that is that it also goes back to language
processing. If you think about speech, if you think
about text analytics: just being able to say something,
having that interpreted, in some sense understood,
having that translated into an optimal set of queries that
happen in the backend, and then coming back with an
optimal set of results. That will largely be driven
through machine learning.
Kirill Eremenko: Well, thank you. That's a great vision. I have one more
topic that just popped to mind that I wanted to touch
on, 5G and Edge computing. And what I've heard, I'm
not an expert in this by any means, but what I've
heard is 5G is here to partially enable Edge computing
and Edge computing is computing things... For instance,
Siri right now won't work if you have no internet
connection. But if we have on-device computing, then
it will work. Whereas Edge computing is somewhere in
between. It's between the cloud and the device; it's local to your
area. So is Edge computing going to disrupt Oracle's
business model?
Greg Pavlik: No. I mean, I think in general, the capabilities of the
cloud will progressively look more and more like
they're just a part of the natural landscape we work in.
But you're still going to need to do a lot of core data
processing, a lot of core data management at scale,
within a centralized context. What the promise, at
least in the near term with Edge computing is, is that
you can start to externalize what you might call
auxiliary processing down toward devices. And I think
5G... I mean, 5G will be important because it's opening
up bandwidth. But it's also going to be processing
power at the Edge, which is going to be a determining
factor for what we can do over time as well. But for
sure, I mean, we'll certainly see quite a bit of model
execution occurring outside of a centralized context.
Kirill Eremenko: Is Oracle planning on becoming part of that Edge
computing game?
Greg Pavlik: Yeah. I mean, it's unavoidable now. So it's all part and
parcel. Right now we've got a whole bunch of stuff
around digital assistants, chatbots, and so forth.
Those things will be the first wave you'll see projected
more toward the Edge, app functionality, disconnected
modes, etc. Those are all going to be things that we'll
see moving more and more into the Edge. I still think
it's not going to be either/or. This is a complementary set
of developments which will allow us to do things that
frankly would be
impossible today. They'll be doable on the Edge, but it's
unlikely anytime soon that you're going to supplant
the need for internal systems, centralized systems. I
think 30 years out, who knows? That's a different question.
But in the near term, I think that these are more or
less entirely complementary.
Kirill Eremenko: Okay. Understood. Yeah. So that wraps up all my
questions and we're also running out of time. But I
wanted to ask you, before we wrap up, for some guidance.
Because a lot of people listening to this are data
scientists, aspiring data scientists who want to
progress their careers and learn as much as possible.
And personally, I've learned a lot from you today. For
me, it was a very insightful conversation to get up to
speed with the world of cloud because normally as a
data scientist, you don't think about it that much.
You're not up-to-date with these trends and things
that are going on.
Kirill Eremenko: So what would your recommendation or wish be, if you
could make one wish for people listening to this or
data scientists, in terms of their relationship with the
cloud and them being up-to-date with what's going on
in the cloud. What would your recommendation be?
Greg Pavlik: Well, I think there are a lot of advantages to thinking
about the ability to have a hub so that teams can work
together and get more productive, because the
better outcomes we get, the more auditability we get, the
more control the teams have, and traceability in terms
of libraries and versions and so forth, the more
ubiquitous the outputs from data science teams are
going to be in organizations that might otherwise have
been a bit conservative about accepting work whose
provenance was harder to understand.
Greg Pavlik: And like I say, I think keeping up with the
demand, the processing demands for doing a lot of
artificial intelligence work, is going to be impossible
unless you progress rapidly and are able to take advantage
of the latest hardware. So if I want the latest
generation of GPUs, yeah, some organizations will buy
them if they're building out large HPC clusters, things
like that. But for most businesses, it's just not
practical. I think that looking at the cloud as an
enabling tool and not as a... And it shouldn't be looked
at as an impediment. It doesn't take anything away. It
only makes the job easier to get good results as
opposed to being stuck in the laptop based world.
Greg Pavlik: The other thing I would say, just in general, for data
scientists is don't be afraid of getting close to the line
of business. Because again, the value of the technology
is what's going to drive investment which is what's
going to drive innovation. So we need to continue to
really be driving powerful outcomes. And I know as a
technologist, it's easy for me to just get excited about
technology. But on the other hand, we all need this
stuff funded. So getting closer and understanding the
business because we've seen... A couple of examples.
One of the areas Oracle has worked with was the
health system in the UK, and they brought a bunch of
machine learning algorithms in from our Oracle
Machine Learning Platform.
Greg Pavlik: And the turnaround there, they applied it to patient
outcomes, they applied it to the fraud detection, and
they were saving within, I think within a year with a
20 person team, something like a billion pounds plus.
It was a billion pounds plus, in net savings on a year
over year basis.
Kirill Eremenko: Wow.
Greg Pavlik: So you can show those kind of results for an
organization, where you get this return on investment,
that you're just not going to see it through any other
mechanisms, that's really going to build up business
confidence to continue to invest and to continue to
really make sure that this whole ecosystem is
becoming more and more of the mainstream.
Kirill Eremenko: Wow fantastic. Great advice. Thank you. Thank you
very much. Cloud doesn't take away from your
experience, but adds to it. And make sure to keep the
business objectives in mind. Greg, on that note, it's
been a huge pleasure. And before I let you go, could
you please help us out, where can we follow you, get in
touch or learn more about Oracle Cloud Platform or
Oracle Cloud Infrastructure?
Greg Pavlik: Yeah. So I post, I guess, periodically. I don't do as good
a job as I should, but I'll put up snippets and updates
and news of interest on LinkedIn. So it's probably the
easiest place to quickly follow what I'm up to, when I'm
not heads down in terms of our development
work. For Oracle Cloud, the easiest thing to do is just
go to Oracle Cloud and open up a free account and
start to play with it. I think people will be impressed
right off the bat.
Kirill Eremenko: Fantastic. Fantastic. And one final question, do you
have a book that you can recommend to our listeners?
Greg Pavlik: It depends where you're at, in terms of maturity in the
industry from a data science perspective. One of the
books that we've found to be pretty helpful for our
customers have been one of the O'Reilly books, Data
Science from Scratch. It's a Python oriented book, and
I think, while others are going down a little bit, Python's
been on the upswing. So I think in terms of languages
to really try to get a mastery around from a data
science perspective, Python's pretty much where it's at
for today. Who knows whether that will be the case five years from
now. And it really walks you through
building out algorithms, understanding how to really
get value from data in a fundamental way. So it's a
good starting point.
Kirill Eremenko: Great, thank you. Data Science from Scratch, right?
Greg Pavlik: Yep.
Kirill Eremenko: Got you. Data Science from Scratch by O'Reilly. On
that note, thank you very much, Greg, for coming on
the show. It's been a huge pleasure. And I personally
learned a lot, and I'm sure many, many other people
will too, as well.
Greg Pavlik: Yeah. Thanks for having me.
Kirill Eremenko: So there you have it, everybody, that was Greg Pavlik,
who is the Senior Vice President and Chief Technology
Officer at Oracle Cloud Infrastructure. And I hope you
enjoyed this episode as much as I did. And I hope you
learned quite a few things about the cloud and were
able to pick up on some of the interesting trends that
are going on in the world, what the future of the cloud
looks like, how to compare between the different
vendors, and why this service actually exists. What's
the purpose of encapsulating everything together? And
personally, that was my favorite part of the episode,
the whole notion of not just using open source tools,
but having a wrapper around them, that allows you to
scale with time because indeed, having your data on
the laptop only takes you that far. And then you need
to start thinking about, "Okay, how do I add cloud
services to this? How do I add traceability or
versioning of the different algorithms that I'm writing?
And also which data I'm using," things like that.
Kirill Eremenko: And to me, it sounds quite exciting that solutions like
this, like what Oracle is providing under object store
exist and can actually benefit the community. And I'm
curious too, as to what was your favorite part of the
episode. There was definitely lots of interesting gems
that Greg shared. And as usual, you can find all the
show notes at our website, at
superdatascience.com/375. That's
superdatascience.com/375. There you can find the
transcript for this episode, any materials that were
mentioned on the show, plus the URLs to Greg's
LinkedIn and the Oracle Cloud Infrastructure website
where you can check out all the amazing things that
we talked about today.
Kirill Eremenko: And on that note, thank you so much for sharing your
time today with us and for being here and learning
together on this journey, hopefully the insights were
exciting and interesting to you and I look forward to
seeing you back here next time. Until then, happy
analyzing.