Big Data and Risk Management - Ido Lustig, PayPal

Post on 16-Jul-2015

TRANSCRIPT

  • 2014 PayPal Inc. All rights reserved. Confidential and proprietary.

    Big Data and Financial Risk Management

    December 1, 2014

  • PayPal at a glance

    143M active accounts in nearly 200 countries

    $180B payment volume in 2013; 24% YoY growth

    2B+ events per day

    12 TB of new data added per day

    7.5M payments per day, about 5,000 every minute

    500K+ real-time queries per second

    Less than 100 ms average response time

    We are talking about a lot of data: Big Data!

  • Risk/fraud is often regarded as a major threat to online/mobile commerce

    Key dangers stated by experts (partial list):

    Buyer fraud:

    A good account is taken over by a fraudster (e.g., using phishing)

    Identity theft (e.g., using a stolen credit card in a new account)

    Insufficient funds in the bank account

    Seller fraud:

    Order never arrives / merchant doesn't send it

    Product is significantly not what you ordered (e.g., a picture of an iPhone and not an iPhone)

    There are also AUP (acceptable use policy) violations, AML (anti-money laundering), terror funding, etc.

  • However, reality (at least with PayPal) is so much better

    The percentage of transactions with fraudulent activity is very close to ZERO!

    Well ahead of our competitors (both traditional and start-up)

    PayPal gives protection:

    Full buyer/purchase protection (if the seller was fraudulent)

    Full seller protection for tangible goods

    Global expansion of seller protection to digital/non-tangible goods as well

  • [Diagram: payments linking recently created accounts and legitimate PayPal users, all within two minutes]

    Recently created accounts (creation date):

    Car.. Pat.. (Aug 30, 2013)
    Jam.. Lo.. (Aug 30, 2013)
    Tom.. Men.. (Aug 30, 2013)
    Alb.. Rich.. (Aug 30, 2013)
    Don.. Li.. (Aug 30, 2013)
    Anj.. Por.. (Aug 30, 2013)
    Al Wo. (Sep 3, 2013)
    Fio Jec. (Sep 3, 2013)

    Legit PayPal users: P Tho, D Mar, J Smi, I Le

    Payments (time offset, amount):

    t0            $686.55
    t+0:23 min.   $686.55
    t+0:35 min.   $686.55
    t+0:58 min.   $686.55
    t+1:03 min.   $673.54
    t+1:21 min.   $686.55
    t+1:47 min.   $686.55
    t+2:00 min.   $686.55
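    The pattern listed above (several payments of nearly the same amount within two minutes) can be checked mechanically. The following is a minimal sketch and not PayPal's method: `Payment`, `largest_burst`, the 120-second window, and the 5% amount tolerance are assumptions made for illustration. Only the time offsets and amounts come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    offset_s: int   # seconds after the first payment (t0)
    amount: float   # payment amount in dollars


def largest_burst(payments, window_s=120, rel_tol=0.05):
    """Return the largest group of payments that occur within `window_s`
    seconds of some starting payment and whose amounts differ from that
    payment's amount by at most `rel_tol` (relative difference)."""
    ordered = sorted(payments, key=lambda p: p.offset_s)
    best = []
    for i, first in enumerate(ordered):
        group = [
            p for p in ordered[i:]
            if p.offset_s - first.offset_s <= window_s
            and abs(p.amount - first.amount) <= rel_tol * first.amount
        ]
        if len(group) > len(best):
            best = group
    return best


# Offsets and amounts as listed on the slide.
SLIDE_PAYMENTS = [
    Payment(0, 686.55), Payment(23, 686.55), Payment(35, 686.55),
    Payment(58, 686.55), Payment(63, 673.54), Payment(81, 686.55),
    Payment(107, 686.55), Payment(120, 686.55),
]

burst = largest_burst(SLIDE_PAYMENTS)
print(len(burst), "payments with similar amounts inside the window")
```

    A real system would also need to consider who pays whom, which the extracted slide text no longer shows.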

  • [Diagram: identity attributes]

    Labels: Identity, Name, Email, Phone, Phone num., Device, IP, Location, Connection

    Actions: Change IP, Change num.

  • [Diagram: identity attributes, extended]

    Labels: Identity, Name, Email, Phone, Phone num., Device, IP, Location, Connection

    Actions: Change IP, Change num.

    Additional labels: IP b-class, IP whois, IP geo, Phone geo, Phone type
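    One possible reading of the two diagrams above is that accounts can still be connected after an IP or phone number changes, because coarser attributes (such as the IP's B-class) stay the same. The sketch below illustrates that idea with a union-find over shared attribute values. It is an assumption-laden illustration, not the method from the talk: `ip_b_class` assumes "IP b-class" means the first two octets of an IPv4 address, and the account records, IDs, and field names are hypothetical.

```python
from collections import defaultdict


def ip_b_class(ip: str) -> str:
    """First two octets of a dotted IPv4 address (assumed meaning of 'IP b-class')."""
    return ".".join(ip.split(".")[:2])


def link_accounts(accounts):
    """Group account IDs that share a phone number or an IP B-class.

    `accounts` maps account ID -> {"ip": str, "phone": str}.
    Returns a list of sets of linked account IDs.
    """
    parent = {acc: acc for acc in accounts}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute name, value) -> first account with that value
    for acc, attrs in accounts.items():
        keys = [("phone", attrs["phone"]), ("ip_b_class", ip_b_class(attrs["ip"]))]
        for key in keys:
            if key in first_seen:
                union(first_seen[key], acc)
            else:
                first_seen[key] = acc

    groups = defaultdict(set)
    for acc in accounts:
        groups[find(acc)].add(acc)
    return sorted(groups.values(), key=sorted)


# Hypothetical accounts; IPs are taken from documentation-only ranges.
ACCOUNTS = {
    "a1": {"ip": "203.0.113.5", "phone": "555-0100"},
    "a2": {"ip": "203.0.113.77", "phone": "555-0101"},  # new IP, same B-class as a1
    "a3": {"ip": "198.51.100.9", "phone": "555-0101"},  # new IP, same phone as a2
    "a4": {"ip": "192.0.2.1", "phone": "555-0199"},     # unrelated
}

print(link_accounts(ACCOUNTS))
```

    Coarse keys such as a B-class will also link many unrelated users, so in practice they would be one weak signal among several rather than proof of a connection.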

  • Fraudsters are people like you and me; they also have habits

    Scaling attacks requires them to generate a lot of information

  • Example: five accounts with shared patterns

    ana para. pro
      Account type: Premier - Mexican Verified
      Email: borigita...@hotmail.com
      Time created: Jun 20, 2013 10:24:26
      Contact information: julion 8 huniero culiacan, Sinaloa 895932 Mexico
      Home: +52 5278.9700808
      Date of birth: 1981

    gab.. noc sino
      Account type: Premier - Mexican Verified
      Email: elmxmx@hotmail.com
      Time created: Jun 20, 2013 10:43:33
      Contact information: culubire 8 amada culiacan, Sinaloa 49693 Mexico
      Home: +52 53749978989
      Date of birth: 1981

    mor ind ca
      Account type: Premier - Mexican Verified
      Email: dlxlx@hotmail.com
      Time created: Jun 20, 2013 09:46:35
      Contact information: poier 9 esdirre culiacan, Sinaloa 59879 Mexico
      Home: +52 52..697998
      Date of birth: 1981

    sir. bon pas.
      Account type: Premier - Mexican Verified
      Email: ju.pxpx@hotmail.com
      Time created: Jun 20, 2013 10:33:41
      Contact information: camjutuli 4 indio culiacan, Sinaloa 43869 Mexico
      Home: +52 529798
      Date of birth: 1981

    mar sin pit
      Account type: Premier - Mexican Verified
      Email: cipepe@hotmail.com
      Time created: Jun 20, 2013 10:16:19
      Contact information: esburgos 9. pancho culiacan, Sinaloa 38692 Mexico
      Home: +52 5279969798
      Date of birth: 1981

    Behavioral patterns (same actions, same dates!):

    Jun 20, 2013: Signup
    Aug 10, 2013: Added CC
    Jan 11, 2014: Confirmed CC

    Shared attributes: name pattern, account type, country, email pattern, city, date of birth
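    The shared attributes above suggest a simple way to surface such groups: build a signature from the fields that repeat and look for signatures shared by unusually many accounts. The sketch below is only an illustration under that assumption. The `signature` fields, the `min_size` threshold, and the sixth record are hypothetical; the first five records use values from the slide.

```python
from collections import defaultdict


def signature(account):
    """Fields that repeat across the example accounts on the slide."""
    return (
        account["account_type"],
        account["country"],
        account["city"],
        account["birth_year"],
        account["signup_date"],
        account["email_domain"],
    )


def suspicious_clusters(accounts, min_size=3):
    """Return lists of account names that share one signature,
    keeping only clusters with at least `min_size` members."""
    groups = defaultdict(list)
    for account in accounts:
        groups[signature(account)].append(account["name"])
    return [names for names in groups.values() if len(names) >= min_size]


def slide_account(name):
    # All five slide accounts share these values.
    return {
        "name": name,
        "account_type": "Premier - Mexican Verified",
        "country": "Mexico",
        "city": "culiacan",
        "birth_year": 1981,
        "signup_date": "2013-06-20",
        "email_domain": "hotmail.com",
    }


ACCOUNTS = [
    slide_account(n)
    for n in ["ana para. pro", "gab.. noc sino", "mor ind ca",
              "sir. bon pas.", "mar sin pit"]
]
# Hypothetical unrelated account, added only to show that it is not clustered.
ACCOUNTS.append({
    "name": "hypothetical user",
    "account_type": "Personal",
    "country": "Mexico",
    "city": "monterrey",
    "birth_year": 1975,
    "signup_date": "2013-06-21",
    "email_domain": "example.com",
})

print(suspicious_clusters(ACCOUNTS))
```

    Exact-match signatures are brittle; the name and email patterns on the slide would need fuzzier matching than this sketch attempts.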

  • Input generation → Algorithm execution → Output generation

    Input generation: ~3 hour run-time; output: 26 weeks x 1M accounts

    Algorithm execution: 5 min per week; memory based (not MapReduce)

    Output generation: ~150k accounts a week; ~15M accounts from the last two years
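    The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that "5 min per week" is the algorithm's run time per weekly batch and that the weekly account count is roughly constant over two years; both readings are assumptions, and the variable names are illustrative.

```python
# Algorithm execution time for a full 26-week run at 5 minutes per week.
WEEKS_PER_RUN = 26
MINUTES_PER_WEEK = 5
algorithm_minutes = WEEKS_PER_RUN * MINUTES_PER_WEEK

# Accounts accumulated over two years at ~150k accounts per week.
ACCOUNTS_PER_WEEK = 150_000
WEEKS_IN_TWO_YEARS = 2 * 52
accounts_two_years = ACCOUNTS_PER_WEEK * WEEKS_IN_TWO_YEARS

print(algorithm_minutes, "minutes of algorithm time per run")
print(accounts_two_years, "accounts over two years")
```

    Under these assumptions the algorithm itself accounts for a little over two hours per run, and the weekly rate adds up to about 15.6M accounts, in line with the slide's "~15M".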

  • Merchant Industry in PayPal data is partial and incorrect

    Required only for business accounts

    Incorrect user inputs (lying? / negligent?)

    Example 1: declared Sports & Recreation/General; sells tickets

    Example 2: declared Business to Business/Accounting; sells fashion

    Example 3: ambiguous categories:

    Retail (not elsewhere classified)/Chemicals and allied products
    Business/General
    Hardware and Software

  • True Industry

    An automated system for merchant website categorization:

    Identifies seller websites using URLs known to PayPal

    Crawls seller websites for terms and additional attributes

    Categorizes sites into industry categories by applying statistical modeling

    The training process is done offline, based on examples, to produce a predictive model
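    The train-offline, categorize-by-terms steps above can be sketched as follows. The slide says only "statistical modeling"; a multinomial naive Bayes classifier with add-one smoothing is substituted here as an assumption, and the training terms, category names, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Offline training: `examples` is a list of (terms, category) pairs."""
    term_counts = defaultdict(Counter)  # category -> term frequencies
    doc_counts = Counter()              # category -> number of example sites
    vocab = set()
    for terms, category in examples:
        term_counts[category].update(terms)
        doc_counts[category] += 1
        vocab.update(terms)
    return {"term_counts": term_counts, "doc_counts": doc_counts, "vocab": vocab}


def classify(model, terms):
    """Return the category with the highest log-probability for `terms`."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_category, best_score = None, -math.inf
    for category, n_docs in model["doc_counts"].items():
        counts = model["term_counts"][category]
        total_terms = sum(counts.values())
        score = math.log(n_docs / total_docs)  # category prior
        for term in terms:
            if term in model["vocab"]:         # ignore terms never seen in training
                score += math.log((counts[term] + 1) / (total_terms + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training examples (terms crawled from labeled sites).
EXAMPLES = [
    (["flight", "hotel", "booking", "tour"], "travel"),
    (["hotel", "resort", "flight"], "travel"),
    (["dress", "shoes", "jacket"], "fashion"),
    (["shoes", "handbag", "dress"], "fashion"),
    (["concert", "tickets", "seats"], "tickets"),
    (["tickets", "stadium", "match"], "tickets"),
]

model = train(EXAMPLES)
print(classify(model, ["cheap", "flight", "hotel", "deals"]))
```

    Keeping `train` separate from `classify` mirrors the slide's split between an offline training process and the predictive model it produces.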

  • Category probability estimation example: Travel category

    [Chart series: random site scores from training set; travel site scores from training set]

    For each site, we estimate the probability of it belonging to each category given its weight
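    One way to turn the two score distributions above into a probability is Bayes' rule over binned scores. The slide does not say how the estimate is made, so the following is a sketch under assumptions: scores lie in [0, 1], histograms use ten bins with add-one smoothing, and the training scores and the 10% prior are hypothetical.

```python
def bin_index(score, n_bins=10):
    """Map a score in [0, 1] to a histogram bin."""
    return min(int(score * n_bins), n_bins - 1)


def histogram(scores, n_bins=10):
    """Smoothed probability of each bin, estimated from training scores."""
    counts = [1] * n_bins  # add-one smoothing so no bin has zero probability
    for score in scores:
        counts[bin_index(score, n_bins)] += 1
    total = sum(counts)
    return [count / total for count in counts]


def category_probability(score, category_scores, random_scores, prior, n_bins=10):
    """P(category | score) via Bayes' rule, comparing the category's score
    distribution with the distribution of scores for random sites."""
    b = bin_index(score, n_bins)
    p_score_given_category = histogram(category_scores, n_bins)[b]
    p_score_given_random = histogram(random_scores, n_bins)[b]
    numerator = prior * p_score_given_category
    return numerator / (numerator + (1 - prior) * p_score_given_random)


# Hypothetical training-set scores for the Travel category.
TRAVEL_SCORES = [0.72, 0.81, 0.85, 0.90, 0.93, 0.64, 0.88, 0.79]
RANDOM_SCORES = [0.05, 0.12, 0.20, 0.08, 0.31, 0.15, 0.40, 0.22]

high = category_probability(0.85, TRAVEL_SCORES, RANDOM_SCORES, prior=0.1)
low = category_probability(0.15, TRAVEL_SCORES, RANDOM_SCORES, prior=0.1)
print(f"score 0.85 -> {high:.2f}, score 0.15 -> {low:.2f}")
```

    The same calculation would be repeated per category, each with its own training scores and prior.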

  • True Industry Performance

    [Bar chart. Series: PayPal and Agent; True Industry. Metrics: Accuracy, Coverage. Vertical axis: 0% to 100%. Bar labels as extracted: 73%, 91%, 39%, 84%]

  • Tiered Big-Data strategy

    Effective decision = func(accuracy, speed, cost)

    [Diagram. Labels: cost, speed; data volume, accuracy. Data age axis: seconds, hours, years. Tiers: Data in-motion, Data in-use]

  • Working HL (high-level) flow

    [Diagram labels: Crawler, Crawler Manager, URL list, Crawled data, Sellers Potential Details, External Data source, Queries, API data, Attributes output, Common Infra, The Web, Web Services, DB/Files]
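    The step from crawled data to attributes in the flow above could look like the following. This is a minimal sketch of term extraction from a crawled page using Python's standard `html.parser`; the class and function names, the sample page, and the "longer than two letters" filter are assumptions, and the talk does not describe the actual crawler.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_terms(html):
    """Return term frequencies for the visible text of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(word for word in words if len(word) > 2)


# Hypothetical crawled page.
SAMPLE_PAGE = (
    "<html><head><title>Cheap Flights</title>"
    "<script>var hotel = 1;</script></head>"
    "<body><h1>Flights and hotels</h1><p>Book flights today.</p></body></html>"
)

print(page_terms(SAMPLE_PAGE).most_common(3))
```

    Term counts like these are the kind of input a categorization model could consume.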

  • Thank You!

    Ido Lustig ilustig@paypal.com