who is this talk for? - home | illinois physics · resume “data science”, “machine...

24

Upload: others

Post on 25-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,
Page 2: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Who is this talk for?

"As a student I would like a job that pays well and satisfies my intellectual and social needs.”

"As a professor I would like to secure influential and prestigious careers for my students.”

“As a scientist I would like to learn about a different industry that uses similar skills.”

“As a younger student I would like to learn about marketable skills in ‘Big Data’."

Page 3: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Section 1: What is “Big Data”?

Section 2: Scientists in the Silicon Valley.

Section 3: Securing a job in this industry.

Outline

Page 4: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

What is Big Data?

Page 5: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Memory is Cheap

So why not store everything?

Page 6: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

A Big Data Example

Video Players

Highly AvailableData Store

Storage for ProcessedData

Data collected and stored in real time

Data periodically processed

PersonalizedVideo

PublisherAnalytics

InternalAnalysis

Page 7: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Components of Big Data

Storage and Collection

High Availability

Distributed Data Stores Cassandra

Automated data collection over the internet

Processing

Distributed Computing Hadoop - batched

Storm - streaming

Analysts (this doesn’t scale well)

Page 8: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Hadoop

Always writable

Key, Value Pairs

No structure

Denormalized

Page 9: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Schemaless Data Stores -Cassandra

Structured around queries

Distributed, denormalized

Always readable

Eventually consistant

Page 10: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

RDBMS

Difficult to scale

Normalized

Schema’s allow complex data sctructures

SQL – structured query language

RDBMS – relational database management systems, e.g. MySQL

Page 11: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Digression

What do we mean by “Industry” and “Career”?

Page 12: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Scientists in the Silicon Valley

Page 13: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Job description

What do you do? What’s the job like?

Page 14: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

How I use my PhD

Programming, math, stats every day

Communicating complex information clearly to a diverse audience

Work ethic

People in tech respect an Illinois PhD

Page 15: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

The Bay Area

Environment

San Francisco

Mountains

Ocean

Public Transportation

Universities/Culture/Wealthy People

Companies

Big Data: Google, Twitter, Facebook, LinkedIn, Salesforce, Palantir, Seattle: Amazon

Medicine: Genentech

Hardware/Software: Apple, Seattle: Microsoft

Startups

Page 16: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

The Office

Google inspired work places are the norm

Free food!

Page 17: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Example Career Paths

Management Marketing, Pricing, Product Development, Recruiting

Software Engineer Building Products and Tools, Coding

Entrepreneur Fund Raising, Marketing, PR, Coding

Actual Scientist Fundraising, Turning Wrenches, Math, Physics

Page 18: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

If I could go back to grad school…

Record everything

Learn to use databases, SQL is really useful

ABCDE – Always be collecting data earnestly

Page 19: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Physicists could improve on…

Shorter development cycles Multiple papers per year, less time between starting a

project and seeing the results

Place more emphasis on projects with quick returns, only do something big if it is really justified

Do PR More communication outside of papers and press

releases

Really understand who you are trying to influence and why

Page 20: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Getting your first job

Page 21: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Making Connections

Visit, make friends, get warm introductions Talk to Matt or email him after the talk

[email protected]

Get an internship Its ok to take time off

Less pressure on both employer and employee

LinkedIn is effective at finding openings

Page 22: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Resume

“Data Science”, “Machine Learning”

Emphasize experience with programming languages, data bases, analysis techniques, distributed computing

You’ve built amazing projects, emphasize the responsibility and scope of what you can accomplish without much supervision

Statistics and programs like R

Page 23: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Consider the following

Physics

Become the best in the world in an important subject

Few job opportunities

Many more years of investment before any payoff

Industry

More pay, more flexibility

Smaller faster projects

More interaction with people

Opportunities to branch out into other parts of business

Page 24: Who is this talk for? - Home | ILLINOIS PHYSICS · Resume “Data Science”, “Machine Learning” Emphasize experience with programming languages, data bases, analysis techniques,

Come Talk!

I’ll be in Loomis 265 Thursday and Friday afternoons. Send me an email at

[email protected].