data in online business
TRANSCRIPT
![Page 1: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/1.jpg)
Data in Online Business
D. Pejčoch
(eBay UK Ltd. + KIZI VŠE Praha)
![Page 2: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/2.jpg)
Outline
• What is online business
• Examples of business models
• Typical processes in online business
• Big volume of processed data
• Specific problems with data
• What is Big Data?
• Potential sources of Big Data
• Specific issues in data quality
![Page 3: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/3.jpg)
What is online (.com) business?
![Page 4: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/4.jpg)
Examples of Business Models
• eShop: revenue = buying price – selling price = insertion fee + final value fee + feature fee + subscription fee + ...
• Google: 98% of revenue from selling ad space
• Facebook and others: advertising, payment revenues, ... and ...?
• Zynga, Geewa, ...: virtual goods, advertising
• Instagram (FB), ...: who knows? ... perhaps they sell customer data?
• LinkedIn: Freemium business model (InMails, Profile Stats Pro, ...)
• Slideshare (LI): Freemium as well
• Foursquare: still has no clue
![Page 5: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/5.jpg)
Comparison of processed volume of data
< 1TB < 5 TB < 10 TB
< 100 TB
PBs
Retail Insurance Banks Telco Online
http://www.economist.com/node/15557443
2010: Exabytes
![Page 6: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/6.jpg)
http://www.worldwidewebsize.com/
The Indexed Web contains at least 4.23 billion pages
(Saturday, 05 October, 2013).
![Page 7: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/7.jpg)
Specific problems with volume of data
• Huge data centers
• High availability
• Millions rows per day
• Even robust RDBMS are not enough
• Limited DB space and resources assigned to single analyst
• Extreme effort in optimalization of SQL queries
• Long queues of queries waiting for execution
• Long time to process single query
• Logical + physical partitioning in data
• Tables with 1% samples
http://data-arts.appspot.com/globe-search/
![Page 8: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/8.jpg)
Elementary Table of Online Business: e-shop
Registration
Search
Guest
B2C
C2C
Unknown
Listing
Transactions
Feedback
Unpaid Items
Shipping
Campaigns
Segmentation
Stores
Deals
Fix Price
Auctions
Buy It Now
Householding
Deduplication
Cross Border
Transfer Bids
CS support
Commit To
Buy
Basket
![Page 9: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/9.jpg)
Classification of Processed Data
• Metadata: – Technical: e.g. description of huge data models
– (Business: labels, description in reports)
• Master Data: – Customers, their hierarchies and interactions (e.g. Facebook)
– Product Categories
– Stores
• Transactional Data: – Search
– Listings
– Bids
– Transactions
– Revenue
– Shipping
– Payment
– Feedback
– Campaigns
– Promos
– Deals
– ...
• External Data + LOVs
Search Listings Trans. Feedback CS
![Page 10: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/10.jpg)
Global World of Internet Business: eBay Transaction
Seller from US registered on IT site
listed on UK site
Buyer from Australia purchased item on
UK site
Purchased item is located in China
Italian, US or UK seller?
![Page 11: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/11.jpg)
What is Big Data? • Data which is impossible or not
effective to process using common RDBMS
• Gartner‘s Vs
• Big Data = Hadoop
• Processing of large unstructured data
• Different sources
• Big Data = Small Data + Big Garbage?
Volume
Variety
Velocity
VALUE !!!
Complexity
?
Veracity ?
![Page 12: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/12.jpg)
Big Data and online business
• Hadoop has been developed within online business (Google)
• Typical use: search indexation
• Secondary use: when RDBMS is not sufficient, when faster delivery is critical, when aditional knowledge is needed, ...
• Big Data Quality: Just a New Buzzword or Serious Topic? – Quality within Hadoop => re-active DQM => spaghetti code, weakness of Hive SQL, DQM
decentralization ...
– Quality of sources => pro-active DQM => missing metadata, lost control, ...
– Big Data as reference source
– Hadoop as DQM platform
http
://marko
sun
.wo
rdp
ress.com
/20
12
/10
/21/in
side-
goo
gles-giant-d
ata-centers/
Big Data is only another kind of data
under global Governance
initiative !!!
![Page 13: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/13.jpg)
My Tips for Potential Sources of Big Data
• Social Networks => Word-of-mouth (e.g. testing change of fees), Loyalty program
• Registers of Debtors => UPI prevention
• Shipping data (using RFID) => Shipping optimization
• Weather information => predicting BYR / SLR behavior => better targeting campaigns and promo actions
• Actual news => explaining production metrics anomalies
• External LOVs => Demographic and behavioral informations => better set up of campaigns and Next Best Offers, better segmentation based on client profiles
• Informations about Competitors (results of Competitive Intelligence)
• External Metadata => better validation process
• Analysis of plaintext attributes
– call center: needs, validation, process maturity improvement, new attributes
– items description: avoid IAD Low DSRs
• Product information (e.g. EAN) => better analysis of product (e.g. CBT, product specific MOTs)
![Page 14: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/14.jpg)
Specifics of Data Quality / Governance
• Not standardized data: fields in item description / title, tweets, fbk statements, ...)
• A lot of not structuralized data => specific problems with quality
• Hadoop accuracy
• Product catalogue on basis of product categories (small level of granularity)
• Deduplication of users (guests vs. standard users, different accounts, purchases without registration, ...) ... thanks God for cookies! FOAF: thanks for emails
• Customer >= User = Account
• Importance of metadata and management of relevant knowledge
• Household identification (no care => no verification)
• Not online MDM: performance (too complex)
• Global environment (mix of Locales)
• Total Costs of Information <= too big data
• Regulations: SOX, Data Quality Act, ... no Basel, no Solvency, ...
• Governance: not all data under (direct) control
• Stewardship: complex knowledge => data scientists?
![Page 15: Data in Online Business](https://reader031.vdocuments.mx/reader031/viewer/2022022700/621962691553de603c3d83b4/html5/thumbnails/15.jpg)