The Professional Data Scientist Intelligence Report

The Data Scientists' Guide to a Fulfilling and Balanced Career

By Troy Sadkowsky

Inside this report, I'm going to explain to you how I've tapped into one of the hottest new jobs of today, the data scientist, and share with you how the role is purpose-built for those who love learning the latest technologies and have a passion for making a difference in the world. The data scientist role is a doorway to a career that allows you to jump out of bed in the morning to literally go and play at work, and get paid for it.

The information in this report has enabled me to work in a way that is fulfilling, balanced and purpose driven. As a professional data scientist I have a large amount of freedom in scheduling my own time and can work from anywhere that has an Internet connection. I have more control over who I work for and for what purpose. Since becoming a data scientist I have worked for organisations from one end of Australia to the other (and now even overseas) and I have done that work from “office” locations ranging from beach fronts in Australia to Internet cafes in the middle of Poland. My data scientist career has enabled me to travel from one end of the world to the other without the need to take “time off”.

It’s been a long road though. For a long time I just felt like I was drifting along not knowing what I really wanted to do or where I could make a big difference with my life.

I've got a list of past job titles that ranges from brickie's labourer to artificial intelligence programmer, and I have worked in various industries that are as different as chalk and cheese.

No matter what position or job title I was given I always felt that I had more to offer and that there was a better way for me to spend my time. On average I was spending 50 hours a week of my life doing something I didn't really like, for someone else (whom I mostly didn't really like). And even at the time it didn't make sense to me, but I was conditioned to think that this was what had to be done.

Over the years I've been able to talk to others that felt the same way about their positions. But no one seemed to know how to get around it. It was just the way it was and that’s what you did. So I kept at it, but all the while I was forever searching for a way out.

Years went by and I slowly built up a reputation for myself as someone that delivers results, because even though I felt that I wasn’t in the right place I took pride in what I did and always did my best to deliver. And because I was never afraid of learning something new I had established myself as more of a jack-of-all-trades than a specialised expert. I was jumping from job to job and from industry to industry, desperately searching for a place to settle into and take hold of. I was on a road that not only seemed endless, but also had an endless number of forks in it, and at every fork I had to painstakingly choose which way to go.

Until I found it...

To be more accurate, I would have to say it found me. I was at a conference in November 2008 and, being in between jobs, I was doing a bit more networking than I usually did. I had just finished answering the typical “what do you do?” question when the person I was talking to said, “You know, over in the States, we have a name for you guys, we call you Data Scientists”. At that moment I felt like the heavens had opened up and God himself had descended from above and said “Here you go son, here is your name”. And it wasn’t just a name, it was a purpose. It was the thing I’d been searching for, the sanctuary I could settle into and a path that felt so right that I didn’t care how long I would be on it. I was now on a road to mastery, mastery of the Data Scientist role.

A data scientist's mission is to serve and inspire others in the organising, packaging and delivering of information. In short the data scientist turns data into value.

The data scientist role draws upon skills and attitudes found in today’s professional software developers, data managers, lifestyle coaches, artists and entrepreneurs.

And I believe I've discovered the formula for bringing all these unique skills and attributes into one manageable role in a fulfilling and balanced way.

How to Play and Win Big as a Data Scientist in the Game of Life

By the time you’re finished reading this report, you will not just have an education in how to become a data scientist, you will also know what it takes to be a high performance data scientist. The information in this report has taken me hundreds of thousands of dollars and more than a decade to learn.

If you already happen to be a data scientist, that is awesome! This report will show you how to take the success of your role to the next level.

So let’s do it...

Quick Warning

This is not the report for you if you are looking for information about how to set up a multinode cluster on Ubuntu. Nor will it teach you how to develop a data management plan for an international cancer research study. And it’s not even going to teach you how to normalise, interlink and integrate data between an XML file and a MySQL database. You get my point, right? This report is not about using specific tools or performing a specific task. (Btw, multinode clusters, Linux distros and file formats are also things I love to talk about, so if you are looking for discussions on any of the latest technologies, tools and frameworks let’s talk later.) Also, this is not the report for you if you are looking for a quick fix solution. It is very unlikely that you will turn into a professional data scientist overnight.

This report is for people looking for ways of bringing the data scientist’s way of doing things into whatever they might be doing right now. It’s a report on how to take an existing role and transform it into the role of a high performance data scientist.

Now, when it comes to being a high performance data scientist there are things you need to do in your Inner Game and there are things you need to do in your Outer Game. What I have learnt from years of experience and education is that if you don't get your Inner Game right, it will be much more difficult to get your Outer Game right. The Outer Game takes place externally and involves interactions with other people and tools. The Inner Game takes place in the mind and involves you and you alone. This report is focused on the Inner Game stuff.

Inner Game.

What is your “Inner Game”? Your Inner Game is your beliefs and your attitude towards things. These beliefs and attitudes can be relatively unconscious; most of the time we don’t even realise that we have them. And if you don’t get your beliefs and your unconscious mindset right, everything that you do to position yourself as a high performance data scientist won't work.

As an example, the Inner Game is the battle that goes on in your head when your morning alarm goes off and you (1.) wake up, then (2.) think of everything that you need to do today, which causes you to (3.) get so overwhelmed that you decide to hit the snooze button, then (4.) repeat from step 1 until you are late.

So, let’s get into it...

The Most Important Thing... Don't Panic

The Data Scientist role is one of the most challenging roles in the world today. And those that take on the challenge will be rewarded with the one thing that everyone yearns for...

Freedom.

Freedom to create and innovate in any area they choose and to whatever level they choose. However, with freedom comes responsibility. So we’ve got to prove ourselves worthy of that responsibility by overcoming the challenges the role presents.

A data scientist gives everything and expects little in return. As an emergent data scientist you will not get the recognition and support that you need. Your innovative solutions will ultimately result in getting more and more responsibility to “make it work” with little or no added compensation or reward. You will most likely get frustrated.

However, Don't Panic... This is normal.

You will need to work with ambiguity and scarcity. You will be under pressure to deliver quality results on time and on budget. You will be expected to make magical things happen. And even with your best efforts, there will be cases where you will not succeed. Things don't always go to plan, schedules slip, designs have flaws, assumptions are incorrectly made, bugs are discovered.

However, Don't Panic... This is normal.

The role of the data scientist is so new that the true value of the role has not yet been fully recognised. And, as with all things new, the role is not trusted. The benefits are not commonly known and people are naturally sceptical. Those you work with may be looking, hunting, ready to pounce on any chance they get to diminish the role. You will not be trusted.

However, Don't Panic... This is normal.

You will need to deal with humans. Be aware that we will most likely be 10 times more impacted by something going wrong than by something going right. When it comes to digital data and technology we have all experienced something “going wrong”. Most people have scars to show and stories to tell that are still fresh in their minds. Socially, there will be barriers, walls and boundaries that will need to be overcome, and these will not just be between the cultures that don't quite get IT. You will need to deal with culture clashes at every turn.

However, Don't Panic... This is normal.

Before you get too discouraged, I hope you can read between the lines and see that there is some good news here. There is an easy-to-do action that will allow you to avoid all unnecessary stress. When everything that could possibly go wrong does, when the pressure is on and it looks like you are going to miss the deadline, when everyone is pointing the finger at you or when things start to fail because of flaws you suggested be fixed months ago, the last thing you should do is panic. Simply don't panic. Now, I know what you’re saying... that's all well and good advice but it's much easier said than done, right? Well yes, it is easier said than done. So here is a one-step process that I have found to be invaluable as a data scientist, the one I use for combating the anxiety and depression caused by immense pressure and the lack of recognition. The one-step process can be explained in one word...

Acceptance.

It is so simple it is often overlooked. Eliminate the stress and negative thoughts by accepting what is, without reservation.

What would it mean to you if your stress never affected your performance in a negative way again?

Permit me to say it one more time to drive it home, the most important thing you need to do as a data scientist is...

Don't Panic

And a great method for avoiding the panic attacks is...

Acceptance

Pure and simple.

Practise the art of acceptance and not panicking at your very next experience of something “not going right” (don't worry, I'm sure you won’t have to wait long).

Next, Find More Courage Than You Currently Have And Then Take Action.

Okay so now you are primed for action.

Let’s do it, let’s take action. Yeah! Wooh!

With the adoption of the “don't panic” rule comes power. You may be able to feel it already. It’s not the power of having power over someone or something else but the power that comes from being fearless. Be careful though, not to confuse fearlessness with recklessness.

Okay so quick, let’s find something to do.

Think of a task right now. It can be a personal task or a task related to your current job, it doesn't matter.

BTW there are three main categories that encompass all the tasks you ever do in life and those are 1. Tasks related to increasing your social network, 2. Tasks related to increasing your knowledge, and 3. Tasks related to increasing your experience. And to be a high performance data scientist you need to have a healthy balance of all three.

However, getting back to the task, think of it then write it down now...

And now next to it write down how you are going to solve it. Take whatever time you need but do it now.

Now that you’ve got your task and solution, say to yourself, “If I had more courage than what I currently have, what more could I do in performing this task? Would I take a new approach? Would I spend more time on it? Would I ask someone else to do it?”

What more would and could you put towards performing the task if you had more courage than you have right now?

Another way of looking at this is to work out what you would be comfortable doing and then notch it up a few levels. Why should you make yourself uncomfortable? It sounds a bit counter-intuitive, doesn’t it? The reality is, nothing great ever happens unless you are outside your comfort zone.

Okay, so this is nothing new and you’ve heard this all before, right? Get outside your comfort zone and be innovative. However as a Data Scientist, there is something very important to watch out for, and that is to ensure you don't cross the reckless line.

The reckless line is the line between the courage zone and the reckless zone (see figure 1). There is a reason why we have a natural instinct not to be too courageous and that is so we don’t end up injured or dead. Luckily for us though, there haven't been any reported deaths due to the recklessness of a data scientist (not that I am aware of anyway). However, it is important to know where your boundary is between being courageous and being reckless.

Figure 1: Innovation and Fearlessness forming the Zones of a Data Scientist

In order to remain innovative a data scientist continually crosses the comfort line and works within the courage zone, and needs to use intuition and personal judgement to ensure they don’t get too close to crossing the reckless line. However, before moving anywhere near the reckless line there is something we can do to minimise the impact if something were to go wrong. And that is to establish a safe-fail environment.

Safe-Fail Environments.

What is a Safe-Fail Environment? A safe-fail environment is your exception handling. It allows you to fail gracefully and is established so that it not only handles your known exceptions but also your unknown exceptions.
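For readers who think in code, the exception handling analogy can be taken literally. The following is only a minimal Python sketch under stated assumptions, not a prescribed implementation: `run_safely` and its `fallback` parameter are hypothetical names invented for this illustration, and `ValueError` and `KeyError` simply stand in for whatever failures you already know about.

```python
import logging

logger = logging.getLogger("safe_fail")


def run_safely(task, fallback=None):
    """Run a task and fail gracefully, whatever kind of failure occurs.

    `task` is any zero-argument callable (a hypothetical stand-in for the
    piece of work you are attempting). `fallback` is what gets returned
    when the task fails.
    """
    try:
        return task()
    except (ValueError, KeyError) as known:
        # Known exceptions: failures we anticipated and planned for.
        logger.warning("Known problem, using fallback: %r", known)
        return fallback
    except Exception as unknown:
        # Unknown exceptions: everything we did not anticipate.
        # Logging the full traceback keeps the failure visible to others.
        logger.exception("Unexpected problem, using fallback: %r", unknown)
        return fallback
```

Both branches record what happened before falling back, so a failure is handled gracefully without being hidden.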

My number one method for establishing a safe-fail environment is to know that it is impossible to fail. But let’s leave that one alone for now before we get into an infinite loop of contradictions (e.g. why would I need a safe-fail environment if it was impossible to fail?).

So the next best method for establishing a safe-fail environment is transparency.

Establishing a fully transparent working environment means that everyone can see what you are doing. Everyone knows, or at least can very easily find out, what you plan on doing, and everyone sees the results that have come out of your actions, good and bad.

Transparency is essential for establishing and maintaining trust within your working environment. Having a culture built on high levels of trust is the absolute perfect environment for being a high performance data scientist.

Establishing a culture built on trust is something I provide more detail about in my audio downloads which you can get from the members area of DataScientists.Net.

There is another very important thing to watch out for when working in the courage zone. Because of its high levels of innovation, it becomes easier to fall into the trap of the infinite loop of perfection.

The infinite loop of perfection.

The infinite loop of perfection is what happens when you get so caught up in making things better that you forget about actually delivering. It has happened to me many times. I get so caught up in finding new and innovative ways to solve a problem that by the time I finish implementing the solution the original need for it has totally dissipated. So be careful that by the time you say “here you go, here is the perfect solution to your problem”, you don’t get met with a response of “oh, you know what, we don’t actually need that anymore”. The world is experiencing change at such a rapid rate that if you are not timely then value will be lost. Know that value cannot be realised until something is delivered and that the infinite loop of perfection prevents delivery.

A good preventative measure against the infinite loop of perfection is Time Boxing.

Time Boxing.

So, what is Time Boxing? Time Boxing means that you set a fixed duration of time for performing your task. It allows you to more accurately split the task up into researching the solution and getting a solution implemented. And Time Boxing forces delivery. Time boxing is my number one method for working effortlessly and productively. The Agile Manifesto and iterative development resources are a great place to start if you are new to time boxing. And I've got some great techniques and tools that provide step-by-step guidance on how to get the most out of your time, which will be a topic for another report.
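To make the idea concrete, here is a rough Python sketch of a time box wrapped around an open-ended "keep improving it" loop. It is an illustration under assumptions rather than a recommended tool: `time_boxed`, `improve`, `budget_seconds` and `clock` are all hypothetical names, and `improve` stands for whatever refinement step you would otherwise repeat forever.

```python
import time


def time_boxed(improve, initial, budget_seconds, clock=time.monotonic):
    """Refine a solution until the time box closes, then deliver it.

    `improve` takes the current solution and returns a better one, or
    None when it has nothing left to add. `clock` is injectable so the
    behaviour can be checked without actually waiting.
    """
    deadline = clock() + budget_seconds
    best = initial
    rounds = 0
    while clock() < deadline:
        candidate = improve(best)
        if candidate is None:
            break  # finished early, no need to use up the whole box
        best = candidate
        rounds += 1
    # The loop always ends, so something always gets delivered.
    return best, rounds
```

The point of the design is that the exit condition is the clock, not the quality of the solution, which is what breaks the infinite loop of perfection.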

Okay, so to sum it up...

A high performance data scientist, when taking action, will challenge themselves and will always be serving to inspire others. Don't forget that you can serve to inspire yourself as much as you can serve to inspire others.

Courage, Safe-Fail environments and Time boxing will breed innovation and motivation.

Now go back to the task and proposed solution you wrote down earlier. Imagine a solution you could devise while working within your courage zone, using time boxing and with a safe-fail environment established. Make note of any differences between your solutions as well as your thoughts towards the problem and proposed solutions.

Finally, Enable Feedback.

Feedback is the data scientist's gold. Remember that the data scientist’s mission is to turn data into value? Well value is in the eye of the beholder, so the longer you go without getting feedback the longer you could be hiking down the wrong road. This is why it is critical to get feedback. Without feedback every next step is a guess.

From my experience, feedback doesn't come easy; you need to work for it. And if you put in the work you will get quality feedback. Quality feedback is essential if you want to be successful in delivering solutions that tend to the direct needs of those you are serving. And the level of success is directly proportional to the level of quality feedback you get. So a data scientist needs to do whatever it takes to enable quality feedback.

So what is Quality Feedback anyway?

When a person speaks the truth about their very own personal opinion on a system or solution, then you have quality feedback. The key points in that definition are “truth” and “own personal opinion”.

Quality feedback is not having someone tell you “yeah it's pretty good... I can see how it might be useful for someone who does [abc] and wants [xyz]”.

For quality feedback you want the person to tell you exactly how the solution is serving them and what faults they see.

So why doesn't quality feedback come easy?

There are two major hurdles that I have seen that continually provide additional challenges to obtaining quality feedback. The first is that we don't naturally want to be the bearer of bad news. And the second is that most of the time we don't actually know what we want.

Let me tell you a story.

For me, getting over the first hurdle and actually getting people to tell me that there is a minor fault in the system has been especially difficult. I am not totally sure why, but I think it has something to do with the environment I work in. You see I love working and serving those in the cancer research industry.

People that work in the cancer research industry tend to see the good in everything. And if there is a minor fault they will overlook it and may not even consider mentioning it. It was crazy, because people were actually making my job harder by trying not to hurt my feelings.

I always remember the time I built a very small biospecimen tracking system for an ovarian cancer researcher. It was a very simple system that allowed the researcher to enter a participant number and the type of biospecimen (e.g. blood or tissue) and then click a button. The system then validated the number against the main database to ensure that the participant existed, generated a series of biospecimen numbers (e.g. for a blood sample it would generate a number for each of the appropriate blood components, i.e. aliquots) and told the researcher the best place to store them in the fridge (based on a small algorithm that considered biospecimen types, fridge types and availability of space).

Three days after I delivered the system to the researcher we had a general monthly meeting, and at the meeting I asked, “So how is the system I built?” He said, “I've been using it all week and it's great, it is so much easier than manually entering these records into the spreadsheet”.

Two weeks later, I was in his lab talking to another colleague and I saw he was on the system. I could see his monitor, and from the corner of my eye I saw an exception page show up in his browser. I stopped mid-sentence and ran over to him. By the time I got there he had already closed the page, and our conversation went something like this. Me: “What was that exception?” Researcher: “What?” Me: “That error message that just showed up in the browser.” Researcher: “Oh that. That always happens, I usually just click the back button to continue.” At that moment I had a sinking feeling. He had been entering data for two weeks and there was a possibility that the system had been tracking biospecimens all wrong.

After a bit more conversation and investigation it turned out that it actually was a very minor issue: a small error in the final redirect after the user clicked the button, which was still sending the page to the development server.

So why didn't he tell me about it? He said he didn't want to worry me about it; he could see that it was working and that's all that mattered. And in this case that was true, but my mind just kept thinking: what if it had been causing a back-end problem, and the system was just appearing to work but really wasn't?

After thinking about this I realised that I hadn't actually given him the opportunity to tell me about things like this. That sinking feeling stayed with me long enough to give me the motivation to find out how to ask for feedback in a different way, one that ensured that I not only knew the system was fully functional but also knew if there were any minor issues.

I discovered that rather than asking questions like “how’s the system?”, I should always ask “how could the system be made better?” You will get a lot more people telling you about the minor faults if you allow them to tell you how you can make it better rather than asking them to tell you what's wrong with it.

In addition I found out that people are generally happy to tell someone other than the developer what’s wrong with a system. Whenever I got the chance I would hold off telling anyone that I was the developer of the system until after I had asked them about it.

So wherever possible ask people how to make the system better rather than asking them what is wrong with it. And never identify yourself with the system before you have to.

So let’s talk about the second hurdle. Why don't people know what they want? Whatever the case may be, for most of us, in today's society it is very easy to become paralysed by choice. Psychologists call it the “paradox of choice”. The human brain is wired to feel it needs more choices and more options. However, once it gets past a certain threshold of choices or options we get stressed out and are less satisfied with the final choice. And no matter how many choices or options we have, in the back of our minds we are continuously wondering what would have happened if we had taken a different one.

So, we remain vague and ambiguous. We sit on the fence for as long as possible. We frequently jump from one side to the other and back again. Which is all well and good but it is very hard to achieve a goal if the goal posts keep moving.

Okay so as a data scientist, how can I combat this?

Through in-depth questioning and documentation until all vagueness and ambiguities are resolved. Right...?

Wrong!

Odds are the more you question the further down the wrong track you will get. The number one method for removing vagueness and ambiguities is to...

Deliver!

Deliver your solution and deliver it fast. By delivering a solution you enable people to talk about it in more detail, you enable them to point to its parts, you enable them to see it work (or see it fail).

A word of warning though: never call your first release version 1. Version 1 should be reserved for when all (or at least most) vagueness and ambiguities have been removed. Until then call it something like version 0.00.001 beta, where the number of points and zeros is directly proportional to the degree of uncertainty in the supplied requirements. This versioning style is a great conversation starter and can open the doorway to more clarity and specificity in your requirements. This simple naming convention can give those you deliver to an in-your-face visual representation of the quality of the solution they are building. And make sure you reward them with a jump from 0.00.[whatever] to 0.01 whenever you get some more clarity on what they want.
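If you want to generate these version labels rather than type them by hand, something like the Python sketch below would do it. This is just one possible reading of the convention, and the details are assumptions made for illustration: `uncertain_version` is a hypothetical helper, `uncertainty` is a whole number you pick yourself (1 for fairly clear requirements, higher for vaguer ones), and the rule that each extra level adds one more point and a longer run of zeros was chosen only so that level 2 reproduces the 0.00.001 example and level 1 reproduces 0.01.

```python
def uncertain_version(uncertainty: int, build: int = 1) -> str:
    """Build a beta version label that gets longer as uncertainty grows.

    Level 1 gives "0.01 beta", level 2 gives "0.00.001 beta", and so on.
    The digit-width rule is an assumption made for this sketch.
    """
    if uncertainty < 1:
        raise ValueError("uncertainty must be at least 1")
    if build < 1:
        raise ValueError("build must be at least 1")
    # One all-zero segment per extra level of uncertainty, each a digit wider.
    zero_segments = ["0" * width for width in range(2, uncertainty + 1)]
    # The last segment carries the build number, padded with leading zeros.
    final_segment = str(build).zfill(uncertainty + 1)
    return ".".join(["0", *zero_segments, final_segment]) + " beta"
```

Dropping the uncertainty level by one after a clarifying conversation shortens the label, which is the "reward" jump described above.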

So to gain quality feedback:

work for it

always ask how it can be improved

remove your identification with it

keep delivering.

Summary

So in summary, the data scientist role can bring you a level of fulfilment in job satisfaction that enables you to work in the industry that you are passionate about. However, it’s not a quick fix. You need to work at it and as long as you take the approach of working smarter rather than working harder you won’t get burnt out by the demands that the role brings. Start with the inner game. Bring more control and direction in how you handle situations. Push yourself out of your comfort zone. And listen. Once you have mastered these then the rest is easy.

If you are interested in learning more, please join my mailing list at DataScientists.Net. I’ve set up DataScientists.Net for those interested in this new field to share experiences and knowledge so that we may serve to inspire and instruct others in organising, packaging and delivering information. My vision for DataScientists.Net is to be the number one online collaboration resource for data scientists to Meet, Learn and Teach.

Hope to talk to you again soon.

Troy Sadkowsky

One More Thing...

I'd like to talk about someone I see as a pioneer in the field of data science. Someone that depicts the true essence of what it means to be a data scientist. If you want to know what it takes to become a successful data scientist, follow the lead of Jim Gray.

It is not Jim's doctorates or even his 30 years' experience working at companies like IBM, Tandem Computers, DEC and Microsoft that matter most. From looking at Jim’s life and work you can see that what is important is not the outer game, it’s the inner game. It’s important to build rapport with those that you work with and for. It’s important to give far more than what you take. It’s important to take an egoless approach to building worldwide solutions for worldwide challenges. And if there was just one message coming through from reviewing Jim’s work and life it would be that it is important to have passion about what you do.

Before Jim was lost at sea, he formulated several informal rules (now known as Gray's laws) that codify how to approach data engineering challenges in a data intensive world. Live by these laws and as a data scientist you cannot go wrong:

1. Respect the value in the data. (Information is everything)

2. Build scalable solutions. (If you can't build on top of it, it will be lost)

3. Don't move data. (Bring the computations to the data)

4. Know the question(s). (What are the top 20 queries you want answered from the data?)

5. Iterate your way from working solution to working solution. (Be agile and expect change)