Data in Online Business D. Pejčoch (eBay UK Ltd. + KIZI VŠE Praha)

Post on 26-Feb-2022


Page 1: Data in Online Business

Data in Online Business

D. Pejčoch

(eBay UK Ltd. + KIZI VŠE Praha)

Page 2: Data in Online Business

Outline

• What is online business

• Examples of business models

• Typical processes in online business

• Big volume of processed data

• Specific problems with data

• What is Big Data?

• Potential sources of Big Data

• Specific issues in data quality

Page 3: Data in Online Business

What is online (.com) business?

Page 4: Data in Online Business

Examples of Business Models

• eShop: revenue = selling price – buying price (retailer with own stock); for a marketplace, revenue = insertion fee + final value fee + feature fee + subscription fee + ...

• Google: 98% of revenue from selling ad space

• Facebook and others: advertising, payment revenues, ... and ...?

• Zynga, Geewa, ...: virtual goods, advertising

• Instagram (FB), ...: who knows? ... perhaps they sell customer data?

• LinkedIn: Freemium business model (InMails, Profile Stats Pro, ...)

• Slideshare (LI): Freemium as well

• Foursquare: still has no clue

Page 5: Data in Online Business

Comparison of processed volume of data

• Retail: < 1 TB

• Insurance: < 5 TB

• Banks: < 10 TB

• Telco: < 100 TB

• Online: PBs

http://www.economist.com/node/15557443

2010: Exabytes

Page 6: Data in Online Business

http://www.worldwidewebsize.com/

The Indexed Web contains at least 4.23 billion pages (Saturday, 05 October, 2013).

Page 7: Data in Online Business

Specific problems with volume of data

• Huge data centers

• High availability

• Millions of rows per day

• Even a robust RDBMS is not enough

• Limited DB space and resources assigned to a single analyst

• Extreme effort spent on optimization of SQL queries

• Long queues of queries waiting for execution

• Long time to process a single query

• Logical + physical partitioning of data

• Tables with 1% samples

http://data-arts.appspot.com/globe-search/
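The "tables with 1% samples" idea above can be sketched in a few lines. This is only an illustration, assuming the sample is chosen by hashing a row key so that the same keys land in the sample on every run and in every table. The function name `in_sample` and the `user-N` keys are hypothetical and do not come from the slides.

```python
import hashlib

def in_sample(key: str, percent: float = 1.0) -> bool:
    """Return True if `key` falls into a deterministic `percent` % sample."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent = 1.0 keeps buckets 0..99, i.e. 100 of 10,000 buckets.
    return bucket < percent * 100

# Hypothetical user IDs standing in for the rows of a large table.
user_ids = [f"user-{i}" for i in range(100_000)]
sample = [u for u in user_ids if in_sample(u)]
print(len(sample))  # roughly 1 % of the rows
```

Because membership depends only on the key, a sampled user keeps all related rows (searches, bids, transactions) across tables, so joins between sample tables stay consistent.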

Page 8: Data in Online Business

Elementary Table of Online Business: e-shop

[Diagram tiles: Registration, Search, Guest, B2C, C2C, Unknown, Listing, Transactions, Feedback, Unpaid Items, Shipping, Campaigns, Segmentation, Stores, Deals, Fixed Price, Auctions, Buy It Now, Householding, Deduplication, Cross Border, Transfer Bids, CS support, Commit To Buy, Basket]

Page 9: Data in Online Business

Classification of Processed Data

• Metadata:

– Technical: e.g. description of huge data models

– (Business: labels, description in reports)

• Master Data:

– Customers, their hierarchies and interactions (e.g. Facebook)

– Product Categories

– Stores

• Transactional Data:

– Search

– Listings

– Bids

– Transactions

– Revenue

– Shipping

– Payment

– Feedback

– Campaigns

– Promos

– Deals

– ...

• External Data + LOVs

[Diagram: Search, Listings, Transactions, Feedback, CS]

Page 10: Data in Online Business

Global World of Internet Business: eBay Transaction

• Seller from the US, registered on the IT site, listed the item on the UK site

• Buyer from Australia purchased the item on the UK site

• The purchased item is located in China

• Italian, US or UK seller?
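One way to make the ambiguity in this example concrete is to store each geographic attribute separately rather than a single "seller country". The sketch below is a hypothetical record layout, not eBay's data model; all field and function names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    # Hypothetical attributes; each one could answer "which country?".
    seller_residence: str
    seller_registration_site: str
    listing_site: str
    buyer_residence: str
    item_location: str

def seller_country_candidates(t: Transaction) -> set[str]:
    """Every value a report might call 'the seller's country'."""
    return {t.seller_residence, t.seller_registration_site, t.listing_site}

# The transaction from the slide: US seller, IT registration, UK listing,
# Australian buyer, item located in China.
tx = Transaction("US", "IT", "UK", "AU", "CN")
print(sorted(seller_country_candidates(tx)))  # ['IT', 'UK', 'US']
```

A report then has to state which of these attributes it groups by, instead of relying on an implicit definition of "seller country".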

Page 11: Data in Online Business

What is Big Data?

• Data that is impossible or inefficient to process using a common RDBMS

• Gartner's Vs

• Big Data = Hadoop

• Processing of large unstructured data

• Different sources

• Big Data = Small Data + Big Garbage?

[Diagram: Volume, Variety, Velocity, VALUE !!!, Complexity ?, Veracity ?]

Page 12: Data in Online Business

Big Data and online business

• Hadoop grew out of online business (modelled on Google's MapReduce and GFS papers, developed largely at Yahoo!)

• Typical use: search indexing

• Secondary use: when an RDBMS is not sufficient, when faster delivery is critical, when additional knowledge is needed, ...

• Big Data Quality: Just a New Buzzword or a Serious Topic?

– Quality within Hadoop => reactive DQM => spaghetti code, weakness of Hive SQL, DQM decentralization, ...

– Quality of sources => proactive DQM => missing metadata, lost control, ...

– Big Data as reference source

– Hadoop as DQM platform

http://markosun.wordpress.com/2012/10/21/inside-googles-giant-data-centers/

Big Data is just another kind of data under a global Governance initiative !!!
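The "typical use" mentioned on this slide, search indexing, is the classic map/reduce job. The sketch below imitates the map, shuffle and reduce steps in plain Python on three made-up documents. It does not use the Hadoop API, and the function names `map_phase` and `reduce_phase` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(doc_id: str, text: str) -> Iterator[tuple[str, str]]:
    """Emit one (word, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id

def reduce_phase(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group pairs by word (the 'shuffle') and collect distinct doc IDs."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for word, doc_id in pairs:
        grouped[word].add(doc_id)
    return {word: sorted(ids) for word, ids in grouped.items()}

# Made-up item titles standing in for crawled pages or listings.
docs = {"d1": "red bike for sale", "d2": "Red car", "d3": "bike helmet"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
print(index["red"])   # ['d1', 'd2']
print(index["bike"])  # ['d1', 'd3']
```

In a real cluster the map calls run in parallel on separate blocks of input and the framework performs the grouping; the data flow is the same.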

Page 13: Data in Online Business

My Tips for Potential Sources of Big Data

• Social Networks => Word-of-mouth (e.g. testing change of fees), Loyalty program

• Registers of Debtors => UPI prevention

• Shipping data (using RFID) => Shipping optimization

• Weather information => predicting BYR / SLR behavior => better targeting campaigns and promo actions

• Current news => explaining anomalies in production metrics

• External LOVs => demographic and behavioral information => better setup of campaigns and Next Best Offers, better segmentation based on client profiles

• Information about competitors (results of Competitive Intelligence)

• External metadata => better validation process

• Analysis of plaintext attributes

– call center: needs, validation, process maturity improvement, new attributes

– item descriptions: avoid IAD Low DSRs

• Product information (e.g. EAN) => better analysis of products (e.g. CBT, product-specific MOTs)

Page 14: Data in Online Business

Specifics of Data Quality / Governance

• Non-standardized data (fields in item description / title, tweets, fbk statements, ...)

• A lot of unstructured data => specific problems with quality

• Hadoop accuracy

• Product catalogue on the basis of product categories (low level of granularity)

• Deduplication of users (guests vs. standard users, different accounts, purchases without registration, ...) ... thank God for cookies! FOAF: thanks for emails

• Customer >= User = Account

• Importance of metadata and management of relevant knowledge

• Household identification (no care => no verification)

• No online MDM: performance (too complex)

• Global environment (mix of Locales)

• Total Costs of Information <= too big data

• Regulations: SOX, Data Quality Act, ... no Basel, no Solvency, ...

• Governance: not all data under (direct) control

• Stewardship: complex knowledge => data scientists?
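The deduplication bullet above (guests vs. registered users, several accounts per person, cookies and emails as linking keys) can be illustrated with a small union-find sketch. It is a simplified assumption of how such matching might work, not the process used at any particular company. The record layout (`account_id`, `email`, `cookies`) and the sample data are hypothetical.

```python
from collections import defaultdict

def find(parent: dict[str, str], x: str) -> str:
    """Return the representative account of x's group (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def deduplicate(accounts: list[dict]) -> list[set[str]]:
    """Merge accounts that share a normalized email or a cookie ID."""
    parent = {a["account_id"]: a["account_id"] for a in accounts}
    first_seen: dict[tuple[str, str], str] = {}
    for a in accounts:
        keys = [("email", a["email"].strip().lower())] if a.get("email") else []
        keys += [("cookie", c) for c in a.get("cookies", [])]
        for key in keys:
            if key in first_seen:
                # Same email or cookie seen before: link the two groups.
                root_a = find(parent, a["account_id"])
                root_b = find(parent, first_seen[key])
                parent[root_a] = root_b
            else:
                first_seen[key] = a["account_id"]
    groups: dict[str, set[str]] = defaultdict(set)
    for acc_id in parent:
        groups[find(parent, acc_id)].add(acc_id)
    return list(groups.values())

accounts = [
    {"account_id": "A1", "email": "Jan@Example.com", "cookies": ["c1"]},
    {"account_id": "G7", "email": None, "cookies": ["c1"]},  # guest checkout
    {"account_id": "A2", "email": "jan@example.com ", "cookies": ["c9"]},
    {"account_id": "A3", "email": "eva@example.com", "cookies": ["c5"]},
]
customers = deduplicate(accounts)
print(sorted(sorted(g) for g in customers))  # [['A1', 'A2', 'G7'], ['A3']]
```

The result reflects the "Customer >= User = Account" relation from the slide: one customer can own several accounts. Shared devices make cookie matches unreliable, so a production process would weigh such evidence rather than merge on any single match.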

Page 15: Data in Online Business