risk factors for consumer loan default: a censored ...mille/riskfactors.pdf · risk factors for...

27
Risk Factors for Consumer Loan Default: A Censored Quantile Regression Analysis Sarah Miller * June 26, 2014 Abstract The most widely-used econometric technique for analyzing default behavior in con- sumer credit markets is the proportional hazard model, which assumes that borrower characteristics increase or decrease default probability in a similar way over the life of a loan. In this paper, I employ an alternative method using censored quantile regression (Portnoy (2003)) on a data set of over 17,000 loans. This approach evaluates the ef- fect of a borrower’s financial characteristics on the entire distribution of default times, allowing co-variates to influence early and late defaulters in different ways. I find that several typical predictors of credit-worthiness influence default probability differently depending on the amount of time that a loan has been active. I illustrate the impor- tance of this heterogeneity by comparing predicted default probability and expected profit across the two models. Because the quantile regression model takes into account the effect of characteristics on the timing of default, rather than the overall probabil- ity of default, it finds a significantly higher expected profit for low- and medium-risk * University of Michigan Stephen M. Ross School of Business. Email address: [email protected]. I would like to thank Roger Koenker for his generous advice and guidance. I am grateful to an anonymous referee whose comments substantially improved this paper. This paper also benefited from helpful discussions with Dan Bernhardt and Darren Lubotsky and comments from seminar participants at the University of Illinois, Urbana-Champaign. 1

Upload: lymien

Post on 28-Apr-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Risk Factors for Consumer Loan Default:

A Censored Quantile Regression Analysis

Sarah Miller∗

June 26, 2014

Abstract

The most widely-used econometric technique for analyzing default behavior in con-

sumer credit markets is the proportional hazard model, which assumes that borrower

characteristics increase or decrease default probability in a similar way over the life of a

loan. In this paper, I employ an alternative method using censored quantile regression

(Portnoy (2003)) on a data set of over 17,000 loans. This approach evaluates the ef-

fect of a borrower’s financial characteristics on the entire distribution of default times,

allowing co-variates to influence early and late defaulters in different ways. I find that

several typical predictors of credit-worthiness influence default probability differently

depending on the amount of time that a loan has been active. I illustrate the impor-

tance of this heterogeneity by comparing predicted default probability and expected

profit across the two models. Because the quantile regression model takes into account

the effect of characteristics on the timing of default, rather than the overall probabil-

ity of default, it finds a significantly higher expected profit for low- and medium-risk

∗University of Michigan Stephen M. Ross School of Business. Email address: [email protected]. I would

like to thank Roger Koenker for his generous advice and guidance. I am grateful to an anonymous referee

whose comments substantially improved this paper. This paper also benefited from helpful discussions with

Dan Bernhardt and Darren Lubotsky and comments from seminar participants at the University of Illinois,

Urbana-Champaign.

1

loans, suggesting that neglecting to model these effects would lead lenders to over-value

high-risk loans relative to low-risk loans.

1 Introduction

The timing of loan defaults has important implications for loan recovery rates and lender

profitability. This paper explores how the impact of borrower characteristics on default

probability changes over the life of a loan. Standard empirical models of loan performance

analyze whether borrower characteristics, such as credit score, increase or decrease the prob-

ability that a loan will default on average.1 However, the effect of typical credit-worthiness

indicators may change over the life of the loan if, for example, very early or late defaulters

behave in systematically different ways. Neglecting to model these differences may lead re-

searchers to conclude that no effect is present, when in fact strong effects exist for early and

late defaulters but have different signs.

I apply the censored quantile regression technique developed by Portnoy (2003) to esti-

mate the effects of borrower characteristics over the complete distribution of default timing

for over 17,000 personal loans from a peer-to-peer lending website. I compare the conclu-

sions from a censored quantile regression model of default risk to those derived from the

widely-used Cox proportional hazard model. The censored quantile regression model reveals

significant heterogeneity among early and late defaulters in the effect of traditional predic-

tors of loan default, such as credit score, past bankruptcies, and number of credit report

inquiries, that cannot be captured by a proportional hazard model.

I illustrate the importance of modeling these time-varying effects with two examples

relevant to credit markets. First, I consider a typical high-, medium-, and low-risk loan

1Some recent examples are Adams et al. (2009), Ravina (2012), Pope and Sydnor (2011), Duarte et al.

(2009), Lin, Prabhala, and Viswanathan (2012), Agarwal et al. (2009), Agarwal et al. (2011), and Bellotti

and Crook (2007), who use a proportional hazard model to analyze personal loan default. Pennington-Cross

(2003) and Ciochetti et al. (2003) apply the model to mortgage default. Glennon and Nigro (2005) use a

similar approach to model small-business loan default rates.

2

to be sold to a third party after two years. Although the proportional hazard model and

the censored quantile regression estimate similar probabilities that these loans ever default,

the proportional hazard model finds a substantially lower probability of default in the last

year. The difference is most pronounced for the low-risk loan: the quantile regression model

estimates a probability of default in the last year that is about four percentage points, or 70

percent, higher than that predicted by the proportional hazard model.

Second, I evaluate the expected net profit of these loans using both the proportional

hazard and the censored quantile regression model. Because the censored quantile regression

model takes into account the difference in default timing for high-quality borrowers, it finds

significantly higher expected net profit for the low- and medium-risk loans and lower profit

for high-risk loans. In this example, failing to account for the heterogeneity in the effects of

borrower characteristics over time would lead to an over-valuation of high-risk loans relative

to low-risk loans.

2 Credit Market Context

My paper uses publicly-available data on loans made to first-time borrowers on the peer-to-

peer lending website Prosper.com between February 2007 and November 2009. This website

connects borrowers and lenders in the United States to facilitate small loans. Potential

borrowers post a request for a loan and also specify the maximum interest rate they are

willing to pay. Prosper pulls the borrower’s credit history from a credit bureau and posts

it with the loan request. Lenders can then choose to invest in the loan. If enough lenders

invest such that the full amount requested by the borrower is funded, then lenders compete

with each other by bidding down the interest rate below the borrower-set maximum. When

the bidding ends, the lowest interest rate on the marginal winning bid becomes the interest

rate on the loan and Prosper facilitates the transaction. If too few lenders are attracted and

the loan is not sufficiently funded, the loan request is canceled and no money changes hands.

The website’s peer-to-peer format attracts many high-risk borrowers who are unable to

secure credit from traditional lending institutions. As such, Prosper loans have a relatively

3

high default rate: over 35% of the approximately 17,600 loans in my sample are currently in

default. Although the default probability for a Prosper loan is high, it is similar to default

rates in other high-risk credit markets (e.g., Adams et al. (2009)). Clearly, avoiding loans

that will eventually default is critical for the profitability of lenders. In the event of a loan

default, however, the timing of default has a large impact on net losses. Loans that default

in the first year have an average recovery rate of only 10.5%; that is, among these early

defaulters, only 10.5% of the principal is recovered. In contrast, loans that default after the

second year have an average recovery rate of 70.2%. The median time to default for bad

loans is 413 days from loan origination.

There is a large and growing body of literature that uses Prosper data to explore various

aspects of credit markets. Many of these studies focus on the role of “soft” information

unique to peer-to-peer lending. Ravina (2012) and Pope and Sydnor (2011) find evidence

that borrowers receive preferential loan terms if they post photographs of attractive people;

Duarte et al. (2009) find that lenders are influenced by the trustworthiness conveyed in the

posted photograph. Freedman and Jin (2008) and Lin, Prabhala, and Viswanathan (2012)

focus on the impact of Prosper online social groups on loan outcomes such as interest rate,

funding probability and loan performance. Iyer et al. (2013) and Miller (2012) analyze

how lenders use a wide range of information to infer the quality of potential borrowers.

Ramcharan and Crowe (2013) use Prosper data to analyze how changes in house prices

across states affect interest rates, delinquency, and credit rationing for Prosper borrowers.

Rigbi (2011) analyzes the effect of usury laws on access to credit, interest rates, and default

probability, using an exogeneous change in usury laws that apply to Prosper borrowers.

In this study, I use the same financial borrower characteristics as other studies using

Prosper data, but focus on how inference is affected by modeling default behavior more

flexibly using quantile regression. To this end, I use all of the verified financial information

available to lenders on Prosper, and supplement it with information about social group

membership and self-reported employment status, but do not include other unverified or

subjective information, such as the beauty or trustworthiness conveyed by the photo included

in the listing.

4

Prosper makes its data publicly available, and every financial variable available to the

lenders is observed. The financial characteristics posted by Prosper are the current number

of delinquent accounts, the total number of credit lines in the borrower’s name (and how

many of these are paid on time and currently open), the number of delinquent accounts in

the last seven years, the number of credit inquiries in the last six months, self-reported em-

ployment status, debt to income ratio, homeownership status, the number of public records

(for example, bankruptcies) on a borrower’s credit report in the last year and the last 10

years, the total dollar amount that the borrower is currently delinquent, their revolving

credit balance, and the ratio of their total debt to their current available credit. Data are

available on the amount of the loan and the interest rate of the loan, as well as the complete

repayment history of all loans for all borrowers. All transactions on Prosper are three-year,

uncollateralized loans. Table 1 presents descriptive statistics, as well as a brief description

of each variable.

In addition to data on borrower’s credit history, I also use 20-point Experian credit score

ranges. To facilitate the estimation, I map these ranges to the median credit score within

the bin, as described in Table 2, to get a value for credit score.2 Credit bureaus create

credit scores by estimating the relationship between observed borrower credit-worthiness

metrics and default probability, then rescaling and ordering the predicted values in such a

way that lenders can use them as a convenient measure of credit risk. As such, it is not

obvious that the credit score necessarily provides different or better information than the

borrower characteristics on their own. Furthermore, because these credit scores were not

created specifically for Prosper, it is doubtful that they accurately capture the expected

default probability in this market. However, because of their convenience, most lenders rely

on credit scores for guidance. I include the credit score in my analysis of default probability

in order to reflect the information set that lenders actually use when making a decision about

funding a loan.

2I find similar results when randomly assigning values within these ranges or assigning high-risk borrowers

to lower than average credit scores within the bin. These results are available upon request.

5

3 Cox Proportional Hazard Model

The standard approach to modeling default probability is the proportional hazard model,

first proposed by Cox (1974). This section uses the proportional hazard model to analyze

the effect of borrower characteristics on default probability, or default “hazard.” The model

for default probability at time t, h(t), for borrower i with characteristics xi is:

h(t) = h0(t) exp(x>i β), (1)

or, after a log transformation,

log(h(t)) = α(t) + β1x1i + ...+ βkxki. (2)

Here α(t) = log(h0(t)). In the Cox model, the baseline hazard function, α(t), is the same

for all borrowers but varies over time. The model makes no assumptions about the shape of

α(t), but assumes a parametric time-constant form on the effects of observed characteristics.

A key assumption is that the effect of the covariates is separable from the temporal effect

embodied in α(t), so that a covariate xj simply shifts the underlying probability of default

up or down depending on the sign of βj.

The borrower characteristics and loan terms, xi, are described in Table 1. Loan char-

acteristics are the amount of the loan, the interest rate on the loan, and the interest rate

squared. Loans that are not in default have censored default times.

The Cox estimates are found in Table 3. The first column presents the estimated coef-

ficients, β̂. The second column reports eβ̂ for each coefficient. There are two observations

about the estimates that may appear unusual. The first is that having a public record, such

as a bankruptcy, in the last year has a small, negative impact on default probability, although

it is not statistically significant. This result is surprising because it implies that recent legal

entanglements due to poor credit behavior may have no effect on – or even decrease – the

probability that a borrower will default on a new loan.

The second observation is that once other financial characteristics are controlled for,

credit score does not measurably impact default probability. Credit score is a function of

6

both the financial variables included in the model and other financial characteristics that are

not displayed on Prosper. The Cox estimates suggest that the available variables capture

the aspects of credit score that influence default probability and thus the remaining marginal

information contained in credit score is negligible. Even in a model that includes only credit

score and credit score squared, the proportional hazard model does not find individually

significant effects, although a model with only credit score finds a strong negative relationship

between credit score and default probability.3

Other indicators of credit-worthiness predict lower default probabilities in the propor-

tional hazard model. Borrowers with full-time employment when the loan is originated are

about 11% less likely to default than those who are not and those who list a city as their

place of residence when they request a loan are about 13% less likely to default. Having open

lines of credit also is associated with lower default risk, as borrowers are able to fall back

on other lending sources to make payments. Alternatively, credit card companies may have

access to additional credit-worthiness metrics that are not observed on Prosper, and the low

default probability associated with open credit lines may be related to these measures of

credit risk. In contrast, no significant effects are found for current and total credit lines.

A high debt to income ratio is associated with higher default risk, indicating that bor-

rowers that have significant debt at the time of their loan request are more likely to default

than other borrowers with similar characteristics. Revolving credit balance and amount

delinquent do not significantly affect default probability. Membership in a Prosper group is

also associated with higher credit risk, although it is not statistically significant.4 A larger

number of credit inquiries, the number of times banks or businesses request the borrower’s

credit report, is associated with higher default rates, as is home ownership. Interest rate has

a positive effect on the probability a loan will default and the coefficient on its quadratic

term is negative. Loan size also is associated with higher default risk.

Although the Cox model provides a good overview of the effect of borrower characteristic

on default rates, relaxing the assumptions of the model may lead to more nuanced insights.

3The estimates from these models are available upon request.4See Lin, Prabhala, and Viswanathan (2012) or Freedman and Jin (2008) for more information on Prosper

groups.

7

In the next section, I explore whether the analysis of defaults can be improved using quantile

regression.

4 Censored Quantile Regression Model

The Cox proportional hazard model requires that the covariates shift the baseline hazard up

or down monotonically and does not identify changes in the effects over the lifetime of a loan.5

In this section, I relax that assumption and allow the effects of borrower characteristics to

affect early and late borrowers differently. I begin by briefly summarizing the methodology

introduced by Portnoy (2003). I present the results in Section 4.2.

4.1 Methodology

Koenker and Bassett (1978) introduced quantile regression, a method for modeling of con-

ditional quantiles (or percentiles) of a dependent variable rather than the conditional mean.

In this paper, I use quantile regression for survival analysis, allowing the variables to have a

different effect on early (low quantile) and late (high quantile) default. I model conditional

quantiles of the log of default time, log(T ), for borrower i with characteristics xi as

Qlog(T )(τ |xi) = α(τ) + β1(τ)x1i + ...+ βk(τ)xki. (3)

I estimate this model for a wide range of quantiles τ ∈ (0, 1), allowing for a full picture

of the default timing distribution for any given loan. The quantile τ is the value of log(T )

associated with a specific cumulative probability of default. For example, if the predicted

quantile for loan i, log(Ti), is equal to 3 at τ = 0.05 or, equivalently,6 Ti = e3 ≈ 20, then loan

5The validity of this assumption has been explored in other duration applications. For example, in a

study of the mortality of fruitflies, Koenker and Geling (2001) find that at high survival times, the effect of

gender reverses: early in their lives, male fruitflies have better survival rates than females, but if they make it

to a sufficiently advanced age, the gender effect reverses sign and female flies have better survival prospects.

These crossover effects are not possible to capture in a traditional Cox proportional hazard model.6Quantiles are preserved under monotone transformations; i.e., for any monotone function g(.), g(Qt(τ)) =

Qg(t)(τ).

8

i has a 5 percent chance to default by day 20. The estimated coefficients vary across τ , so the

observed borrower characteristics may effect low values of τ , associated with early default,

differently than high values of τ , associated with late default. From the untransformed

estimated conditional quantile function Q̂T , one can plot the probability of surviving passed

time t, the survival function S(t), as a function from Q̂T (τ |x) to 1− τ .

A common issue in survival analysis is censored observations. In earlier quantile duration

studies, such as those by Koenker and Geling (2001) and Koenker and Bilias (2001), random

censoring is not an issue because the “event time” is known for all observations. However,

in the Prosper data, all loans that are not in default are considered censored. As only about

35% of the loans are in default, censoring is present in a significant degree.

In the present application, the data are right-censored; i.e. the event time Ti is observed

if Ti < Ci, where Ci indicates a censoring time. Time to default, Ti, is measured in days

from the origination of the loan, and I use log(Ti) as the right-hand side variable. The data

set consists of a vector of pairs (T̃i, xi), where T̃i = min{Ci, Ti}.

In a situation where the censoring times Ci are known for all observations (i.e., “fixed

censoring”), Powell (1986) provides a means of consistently estimating the conditional quan-

tile function for each τ . In the context of loan defaults, however, censoring times Ci are

only observed for the observations that do not default. To further complicate matters, the

probability of being censored may depend on the covariates themselves. In this case, the

probability of being censored–that is, of not defaulting–is positively correlated with a bor-

rower’s credit-worthiness.

Portnoy (2003) suggests an approach to estimating censored quantile regression models

with random censoring that extends Efron (1967) estimate to a regression setting. Portnoy

solves a sequence of weighted quantile regression problems starting at τ near zero and grad-

ually increasing τ toward one while adapting the weights. When an observation (T̃i, xi) is

encountered that is censored, it is split into two parts. The first part remains at the censoring

time, (Ci, xi), while the second part is moved to (T ∗i , xi) where T ∗i is any value lying above all

possible fitted values. In theory, T ∗i could be chosen to be +∞, but computational efficacy

may require a very large but finite value. The relative weight for these pseudo observations

9

is given by

wi(τ) =τ − τ̂i1− τ̂i

. (4)

where τ̂i is the largest τ for which the residual at Ci is positive. That is,

τ̂i = maxτ{x>i β̂(τ) < Ci}. (5)

The estimates of the coefficients β(τ) for each quantile τ are chosen to solve the mini-

mization problem

minβ∈Rp

∑i/∈K

ρτ (Ti − x>i β) +∑i∈K

wi(τ)ρτ (Ci − x>i β) + (1− wi(τ))ρτ (T∗ − x>i β), (6)

where K denotes the set of censored observations that have been encountered up to quantile

τ , ρτ (u) = u(τ − I(u < 0)) and I(.) is the indicator function. In order to compare results to

the proportional hazard model, which employs a log transformation of Ti, I replace Ti with

log(Ti) in the estimation procedure above to obtain my quantile regression estimates.

The estimated quantile function Q̂T (τ |x) can be used to produce a conditional hazard

function h(Q̂T (τ |x)) (Koenker and Geling (2001)). For tj = Q̂T (τj|x), tj+1 = Q̂T (τj+1|x),

and t̄j = (tj + tj+1)/2, the conditional hazard can be plotted as a function of t̄j:

h(t̄j|x) =∆τ/∆Q̂T (τ |x)

1− (τj + τj+1)/2. (7)

The intercept α(τ) is used to derive a baseline hazard function analogous to the h0(t) in the

Cox model (equation (2)) by evaluating Q̂T (τ |x) at x1 = x2 = ... = xk = 0. As in the Cox

model, this “baseline hazard” function is a completely flexible function of time. However,

a critical distinction between the two models is that while the Cox model only has a time-

varying baseline function, the quantile regression model allows both the intercept and the

coefficients to have time-varying effects on the hazard function.7

7Recently, partial likelihood methods have been developed to extend the Cox model to incorporate time-

varying coefficients (see Cai et al. (2007), Tian et al. (2005), Sun et al. (2009)). Although they are not yet

widely-used in the economics literature, these approaches will also capture time-varying effects in a loan

default survival model.

10

4.2 Results

Quantile regression provides a more complete picture of consumer default behavior and sheds

light on some initially puzzling features of the Cox model. Using the Portnoy (2003) method,

as implemented in the R package quantreg, I estimate a censored quantile regression model

of default timing. Quantile regression results are illustrated in Figure 1. The vertical axis

plots the size of the parameter estimate of β(τ) from equation (3) for each variable. The

horizontal axis indicates the different quantiles τ at which the effect is estimated. Positive

effects indicate that the covariate increases the amount of time before the loan has a τ

probability of default, or, analogously, that it decreases the probability of default over that

time interval. Similarly, negative effects indicate that less time must pass before the loan

has a τ probability of default, or that default probability increases with that variable. Lower

values of τ are therefore associated with earlier default times. The grey area indicates the 95

percent confidence interval of the parameter estimate, produced from an xy-pairs bootstrap

of size 500.

In addition to reporting the estimated parameter from equation (3), I also present the

marginal effects of borrower characteristics on the probability of default during three time

intervals in Table 4. These marginal effects are calculated at the sample median and 95

percent confidence intervals are reported under the point estimate.8 The first column shows

how each variable affects default probability within the first 90 days. Although only about 2

percent of Prosper loans default in the first 90 days, these early defaults are costly to lenders

because they are associated with very low recovery rates. The second and third columns

display the marginal effect of each borrower characteristic on the probability of default in the

first year and on the probability that the borrower will ever default, respectively. High-risk

borrowers tend to default early, and results from the quantile regression model highlight

not only which characteristics are associated with higher default rates overall, but which

variables have a strong effect on early defaults in particular.

The estimates from the quantile regression model show significant heterogeneity in the

8Marginal effects are calculated by differencing conditional hazard functions described in equation (7).

Confidence intervals are calculated from an xy-pairs bootstrap of size 500.

11

effect of borrower characteristics on default probability over the life of the loan. Both the

parameter estimates (Figure 1) and the predicted marginal effects (Table 4) indicate that

some borrower information is substantially more effective at predicting early defaults than

later defaults. For example, examining the effect of the full-time employment indicator

reveals that if the borrower is employed full time when the loan is originated, he or she

is approximately 2.2 percentage points – or 18 percent – less likely to default during the

first year of loan repayment. However, employment status at time of loan origination is

associated with only a 1.6 percentage point – or 8 percent – reduction in default probability

during the last two years of the loan. The parameter estimates in Figure 1 indicate that

full-time employment has almost no predictive powers for quantiles above the median default

time. Similar patterns are observed for credit score, city residency and the number of credit

inquiries. It is not surprising that some borrower characteristics are more informative for

defaults that occur around the time the information is observed, as a borrower’s situation

may change over time. However, accounting for the heterogeneity in predictive power over the

life of the loan may significantly affect expected profit. I explore how much these differences

matter in practice in Section 5.

Furthermore, the quantile regression estimates provide an interesting perspective on how

past public records, such as bankruptcies or liens, affect the probability of default. Many

equilibrium models of default assume that a debtor will be excluded from the credit market

following a bankruptcy (see Athreya and Janicki (2009) for an overview). However, using

data from a large credit bureau, Cohen-Cole et al. (2009) find that 90% of bankruptcy filers

have access to credit within 18 months of filing for bankruptcy, and the majority have access

to unsecured credit. Indeed, survey evidence in Porter (2008) indicates that individuals are

aggressively solicited for credit post-bankruptcy. Individuals might be targeted for further

credit after a bankruptcy because they are not allowed to file for Chapter 7 bankruptcy

more than once every eight years, or Chapter 13 bankruptcy more than once every two

years. Because lenders are guaranteed that these borrowers will not declare bankruptcy

again for a fixed period of time, these borrowers may temporarily be better credit risks. As

these restrictions expire, and borrowers are again allowed to declare bankruptcy, they may

become more risky.

12

The results in Table 4 and Figure 1 provide direct evidence for this phenomenon. Having

a public record, such as a bankruptcy, in the last 10 years is initially associated with less

default. In the first 90 days, having an additional public record is associated with a reduction

in default probability of 0.56 percentage points. Because the baseline probability of default

in the first 90 days is only 2 percentage points, this effect represents a nearly 30 percent

reduction in the probability of a very early default. For later defaulters, this effect reverses:

public records have a negative effect on survival probability for later quantiles between

τ = 0.60 and τ = 0.80, and the marginal effect of a public record on the probability a loan

will ever default is positive (although statistically insignificant). Public records in the last 12

months decrease the probability of default for most of the default timing distribution; I only

find negative point estimates (corresponding to higher default probabilities) for the highest

quantiles, none of which are significant. These results call in to question credit market

models that assume that borrowers will be exogenously excluded from credit markets after

a bankruptcy, as it may be in the interest of individual lenders to extend credit immediately

after a bankruptcy, knowing that the borrower will not be able to default again for a set

amount of time.

Other “crossover” effects are present for the number of current credit lines and mem-

bership in Prosper social groups. The proportional hazard model found no significant effect

of these variables on default probability. However, when modeled separately for different

points in the default timing distribution, these variables are found to have effects on default

probability that have a different sign for early and late defaulters. Membership in social

groups prevents early defaults but is associated with more defaults overall. In the first 90

days, social group members are 0.06 percentage points (32 percent) less likely to default than

non-social group members, but they are 1.5 percentage points more likely to default overall.

One reason for this may be that early defaults are especially embarrassing for social groups

and social pressure is more effective in preventing them. Alternatively, Prosper groups may

behave like the social groups that screen labor market participants in Popov and Bernhardt

(2012): the groups are unattractive to the highest quality borrowers, but able screen out

the very worst quality borrowers. A large number of current credit lines is also associated

with less default initially and more default later, conjuring the image of a frazzled borrower

13

paying credit card bills off with other credit cards – the strategy works for a while, but

eventually the debt becomes overwhelming.

Finally, the effect of some borrower characteristics does not appear to change substantially

over the life of the loan. For example, more credit delinquencies has a strong and negative

effect on the probability of repayment at every point in the default timing distribution.

5 Empirical Implications

5.1 Example: Secondary Market for Loans

In this section, I use an example of loan resales to illustrate how the difference between the

proportional hazard model and the censored quantile regression model matter in practice.

Since the mid-1990s, the secondary market for loans has grown considerably in the United

States (Santos and Nigro (2007)), requiring third parties to assess the default risk of loans

long after the original transaction occurred. Prosper.com launched a secondary market for

Prosper loans in 2010. Although the Prosper secondary market updates information on the

borrower’s credit score range, all other financial variables are fixed at the level observed at

the time that the loan was first requested. I compare the predicted default rates derived

from the proportional hazard and quantile regression models for loans in their third year,

mirroring the risk evaluation that a trader in the secondary market for loans must perform.

I consider an example loan for $10,000 to the average low-, medium-, and high-risk

borrower, as defined by their credit score. Low-risk borrowers are defined as borrowers with

credit scores over 760, medium borrowers have credit scores in the range 640 to 679, and

high-risk borrowers fall in to the credit score range of 520 to 559. I choose the interest rate

as the average interest rate within these groups for loans between $9500 and $10500.9 Other

borrower characteristics are evaluated at the mean of each risk group. Table 5 presents the

covariate values for the example loans.

9I allow for a small range in amount of the loan when assessing the average interest rate because relatively

few loans in each group are for exactly $10,000.

14

Table 6 shows predicted default probabilities derived from the proportional hazard and

censored quantile regression models estimated in Sections 3 and 4. Columns 2 and 3 display

the predicted probability from each model that the loan will default at some point during

the 3 year repayment period. Both the proportional hazard and quantile regression models

estimate remarkably similar probabilities of the loan ever defaulting. However, Columns 4

and 5 indicate that the quantile regression model finds different expected probabilities of

default in the last year. The expected probability of defaulting in the last year for a low-risk

loan is almost 70% higher when using estimates from the quantile regression model. This

difference is significant at the 0.01 level. Because the quantile regression model takes into

account the change in the effect of certain characteristics over time (see Figure 1), it results

in different predictions for the timing of loan default in ways that have practical implications

for loans resold on the secondary market.

5.2 Example: Expected Profit

Loans that default early lose close to the entire principal, whereas loans that default near

the end of their term may still generate a positive profit. Because default timing is crucial to

lender profitability, using a flexible approach such as quantile regression may be especially

useful in credit markets. In this section, I present an example of how expected profit of a

loan differs between the proportional hazard and the quantile regression model.

I use the predicted default probabilities at each month during the 36 month period from

the censored quantile regression model in Section 4 and the proportional hazard model in

Section 3 to estimate the expected profit of a loan for $10,000 made to a typical high-,

medium-, and low-risk borrower with the characteristics and interest rate described in Table

5. Table 7 reports expected profit, with 95% confidence intervals, for each loan. Although the

proportional hazard and the censored quantile regression estimate similar probabilities that

each loan will ever default (see Columns 1 and 2 of Table 6), the two models find significantly

different expected profit for the same loan because the effect of certain characteristics varies

significantly for early and late defaulters.

15

Because the quantile regression model predicts that defaults will occur later for high-

quality borrowers, the low- and medium-risk loans have significantly higher expected profit

when the predicted probabilities from the quantile regression model are used relative to

the proportional hazard model. Expected profit on low-risk loans is almost 45% higher

when predicted probabilities from the quantile regression are used. For medium-risk loans,

the quantile regression predicts an expected profit about 15% higher than the proportional

hazard model, although both models still predict a loss. The differences for both low- and

medium-risk loans are significant at the 0.01 level. Both the proportional hazard and the

quantile regression model predict equally abysmal losses for the high-risk loans.

6 Conclusion

This paper analyzes the default behavior of loans made on a peer-to-peer website using two

econometric methods: the widely-used proportional hazard model of Cox (1974) and the

more flexible censored quantile model described by Portnoy (2003) that allows the effects of

covariates to vary over the life of the loan. I find evidence that several traditional predictors

of default affect early and late defaulters differently, including “crossover” effects that cannot

be captured by a proportional hazard model. Incorporating this heterogeneity is especially

important in credit markets, where the timing of loan default substantially impacts recovery

rates and lender profitability.

I further illustrate the empirical implications of these differences with an example of

how the evaluation of a typical high-, medium-, and low-risk loan on the secondary market

changes when each model is used. While both models predict a similar probability that each

loan will ever default, the censored quantile regression model, because it takes into account

time-variation in the covariate effects, predicts a higher probability that low-risk loans will

default later. As a result, the expected profit for low- and medium-risk loans is substantially

higher when predicted default probabilities from the quantile regression are used rather than

the proportional hazard predicted probabilities. In this example, neglecting to account for

the time-varying effect of the covariates would lead to undervaluing low- and medium-risk

16

loans relative to high risk loans.

References

Adams, W., L. Einav, and J. Levin (2009). Liquidity constraints and imperfect information

in subprime lending. American Economic Review 99 (1), 49–84.

Agarwal, S., S. Chomsisengphet, and C. Liu (2011). Consumer bankruptcy and default: The

role of individual social capital. Journal of Economic Psychology 32, 632650.

Agarwal, S., S. Chomsisengphet, C. Liu, and N. S. Souleles (2009). Benefits of relation-

ship banking: Evidence from consumer credit markets. Federal Reserve Bank of Chicago

Working Paper Series.

Athreya, K. B. and H. P. Janicki (2009). Credit exclusion in quantitative models of

bankruptcy: Does it matter? Economic Quarterly 92 (1), 17–49.

Bellotti, T. and J. Crook (2007). Credit scoring with macroeconomic variables using survival

analysis. Working Paper. University of Edinburgh, Credit Research Centre.

Cai, J., J. Fan, H. Zhou, and Y. Zhou (2007). Hazard models with varying coefficients for

multivariate failure time data. The Annals of Statistics 35 (1), 324–354.

Ciochetti, B., R. Yao, and Y. Deng (2003). A proportional hazards model of commercial

loan mortgage default with originator bias. Journal of Real Estate Finance and Eco-

nomics 27 (1), 5–23.

Cohen-Cole, E., B. Duygan-Bump, and J. Montoriol-Garriga (2009). Forgive and forget:

Who gets credit after bankruptcy and why? Working Paper Series: Federal Reserve Bank

of Boston.

Cox, D. (1974). Regression models with life tables. Journal of the Royal Statistical Society 34,

187–220.

17

Duarte, J., S. Siegel, and L. Young (2009). Trust and credit. Working Paper, University of

Washington Business School.

Efron, B. (1967). The two-sample problem with censored data. Proc. Fifth Berkeley Sym-

posium in Mathematical Statistics 4.

Freedman, S. and G. Jin (2008). Do social networks solve information problems for peer-to-

peer lending? Evidence from Prosper.com. Working Paper, University of Maryland.

Glennon, D. and P. Nigro (2005). Measuring the default risk of small business loans: A

survival analysis approach. Journal of Money, Credit, and Banking 37 (5), 923–947.

Iyer, R., A. Khwaja, E. Luttmer, and K. Shue (2013). Screening peers softly: Inferring the

quality of small borrowers. NBER Working Paper No. 15242.

Koenker, R. and G. W. Bassett (1978). Regression quantiles. Econometrica 46 (1), 33–50.

Koenker, R. and Y. Bilias (2001). Quantile regression for duration data: A reappraisal of

the Pennsylvania re-employment bonus experiments. Empirical Economics 26, 199–200.

Koenker, R. and O. Geling (2001). Reappraising medfly longevity: A quantile regression

survival analysis. Journal of the American Statistical Association 96, 458–468.

Lin, M., N. Prabhala, and S. Viswanathan (2012). Judging borrowers by the company they

keep: Friendship networks and information asymmetry in online peer-to-peer lending.

forthcoming, Management Science.

Miller, S. (2012). Information and default in consumer credit markets: Evidence from a

natural experiment. Working Paper. University of Illinois Department of Economics.

Pennington-Cross, A. (2003). Credit history and the performance of prime and nonprime

mortgages. Journal of Real Estate Finance and Economics 27 (3), 279–301.

Pope, D. and J. Sydnor (2011). What’s in a picture? Evidence of discrimination from

Prosper.com. Journal of Human Resources 46 (1), 53–92.

18

Popov, S. and D. Bernhardt (2012). Fraternities and labor market outcomes. American

Economic Journal: Microeconomics 4 (1), 116–141.

Porter, K. (2008). Bankrupt profits: The credit industrys business model for postbankruptcy

lending. Iowa Law Review 93, 1369–1421.

Portnoy, S. (2003). Censored regression quantiles. Journal of the American Statistical As-

sociation 98 (464), 1001–1012.

Powell, J. (1986). Censored regression quantiles. Journal of Econometrics 32, 143–155.

Ramcharan, R. and C. Crowe (2013). The impact of house prices on consumer credit:

Evidence from an internet bank. Journal of Money, Credit, and Banking 45 (6), 10851115.

Ravina, E. (2012). Love and loans: The effect of beauty and personal characteristics in credit

markets. Working Paper, Columbia University Graduate School of Business.

Rigbi, O. (2011). The effects of usury laws: Evidence from the online loan market. forth-

coming, Review of Economics and Statistics.

Santos, J. A. and P. Nigro (2007). Is the secondary loan market valuable for borrowers?

Working Paper. Available at SSRN: http://ssrn.com/abstract=1026788.

Sun, Y., R. Sundaram, and Y. Zhao (2009). Empirical likelihood inference for the Cox

model with time-dependent coefficients via local partial likelihood. Scandinavian Journal

of Statistics 36 (3), 444–462.

Tian, L., D. Zucker, and L.-J. Wei (2005). On the Cox model with time-varying regression

coefficients. Journal of the American Statistical Association 100, 172–183.

19

Fig

ure

1:C

ondit

ional

Quan

tile

Eff

ects

onT

ime

toD

efau

lt

0.2

0.4

0.6

0.8

−0.040.000.04

Cre

ditS

core

oooooooooooooo

ooo

o

0.2

0.4

0.6

0.8

−4e−050e+004e−05

Cre

ditS

core

2

ooooooooooooooooo

o

0.2

0.4

0.6

0.8

−0.40.00.2G

roup

Dum

my

ooooooooooooooooo

o

0.2

0.4

0.6

0.8

−7e−05−4e−05

Am

ount

oooooooooooo

oo

oooo

0.2

0.4

0.6

0.8

−0.4−0.20.0

Bor

row

erR

ate

ooooooooooooo

o

ooo

o

0.2

0.4

0.6

0.8

−0.0040.0020.006

Bor

row

erR

ate2

oooooooooooooo

ooo

o

0.2

0.4

0.6

0.8

−0.100.00

Deb

tToI

ncom

eRat

io

o

ooooooo

oooooo

ooo

o

0.2

0.4

0.6

0.8

−0.6−0.2

Hom

e

oooooooooooooo

o

o

o

o

0.2

0.4

0.6

0.8

−0.08−0.040.00

Cur

rent

Del

inqu

enci

es

ooooooooooooooo

ooo

0.2

0.4

0.6

0.8

0.000.020.04

Del

inqu

enci

esLa

st7Y

ears

oooooooooooooooo

o

o

0.2

0.4

0.6

0.8

−0.08−0.05−0.02

Inqu

iries

Last

6Mon

ths

oooooooooooo

oooo

o

o

0.2

0.4

0.6

0.8

−0.100.00Publ

icR

ecor

dsLa

st10

Year

s

o

oooo

ooooooooooooo

0.2

0.4

0.6

0.8

−0.3−0.10.10.3Publ

icR

ecor

dsLa

st12

Mon

ths

ooooooooooo

ooo

oo

oo

0.2

0.4

0.6

0.8

0.000.050.10

Ope

nCre

ditL

ines

oooooooooooooooo

o

o

0.2

0.4

0.6

0.8

−0.20.00.20.4

City

Dum

my

oooooooooooooooo

o

o

0.2

0.4

0.6

0.8

−0.20.00.20.4

Fullt

ime

oooooooooooooooooo

0.2

0.4

0.6

0.8

−0.40.00.4

Ban

kcar

dUtil

izat

ion

oooooooooooooooo

o

o

0.2

0.4

0.6

0.8

−5.0e−061.0e−05

Rev

olvi

ngC

redi

tBal

ance

ooooooooooooooooo

o

0.2

0.4

0.6

0.8

−1e−051e−05

Am

ount

Del

inqu

ent

oooooooooooooo

oooo

0.2

0.4

0.6

0.8

−0.0100.0000.010

Tota

lCre

ditL

ines

ooooooooooooo

o

ooo

o

0.2

0.4

0.6

0.8

−0.100.00

Cur

rent

Cre

ditL

ines

oooooooooooooooo

oo

0.2

0.4

0.6

0.8

−5515

Inte

rcep

t

oooooooooooooo

ooo

o

20

Tab

le1:

Des

crip

tive

Sta

tist

ics

Var

iable

Nam

eD

escr

ipti

onM

ean

Min

/Max

Am

ount

Dol

lar

amou

nt

oflo

an65

8310

00/2

5000

Bor

row

erR

ate

Inte

rest

rate

pai

dby

bor

row

er17.9

80.

01/3

6.00

Cit

yD

um

my

Bin

ary

vari

able

takin

ga

valu

eof

1if

the

bor

row

eris

are

siden

tof

aci

ty0.

140/

1C

urr

ent

Del

inquen

cies

Num

ber

ofcr

edit

lines

curr

entl

ylist

edas

del

inquen

t1.

030/

83D

ebt

toIn

com

eR

atio

Rat

ioof

dol

lar

valu

eof

outs

tandin

gdeb

tto

self

-rep

orte

din

com

e0.

340/

10D

elin

quen

cies

Las

t7

Yea

rsN

um

ber

ofdel

inquen

tac

counts

inth

ela

stse

ven

year

s5.

100/

99F

ullti

me

Bin

ary

vari

able

takin

ga

valu

eof

1if

the

bor

row

erre

por

tsth

eyar

e0.

880/

1em

plo

yed

full-t

ime

Gro

up

Dum

my

Bin

ary

vari

able

takin

ga

valu

eof

1if

the

bor

row

eris

am

emb

erof

0.31

0/1

aP

rosp

erso

cial

grou

pH

ome

Bin

ary

vari

able

takin

ga

valu

eof

1if

the

bor

row

eris

ahom

eow

ner

0.47

0/1

Inquir

ies

Las

t6

Mon

ths

Num

ber

ofti

mes

the

cred

itre

por

thas

bee

nre

ques

ted

2.63

0/63

Public

Rec

ords

Las

t10

Yea

rsN

um

ber

ofpublic

reco

rds,

such

asban

kru

ptc

ies

orlien

s,list

edon

0.38

0/30

the

bor

row

er’s

cred

itre

por

tin

the

last

10ye

ars

Public

Rec

ords

Las

t12

Mon

ths

Num

ber

ofpublic

reco

rds,

such

asban

kru

ptc

ies

orlien

s,list

edon

0.04

0/7

the

bor

row

er’s

cred

itre

por

tin

the

last

year

Op

enC

redit

Lin

esN

um

ber

ofac

tive

lines

ofcr

edit

inth

eb

orro

wer

’snam

e8.

210/

45C

urr

ent

Cre

dit

Lin

esN

um

ber

ofop

enor

clos

edac

counts

inth

eb

orro

wer

’snam

eth

atth

e9.

412/

129

bor

row

eris

pay

ing

onti

me

Tot

alC

redit

Lin

esT

otal

num

ber

ofcr

edit

lines

list

edin

the

bor

row

er’s

cred

itre

por

t24.3

52/

129

Am

ount

Del

inquen

tD

olla

rva

lue

ofto

tal

del

inquen

cies

1163

0/44

4745

Ban

kcar

dU

tiliza

tion

Ban

kcar

ddeb

tdiv

ided

by

avai

lable

cred

it0.

550/

5.95

Rev

olvin

gC

redit

Bal

ance

Outs

tandin

gbal

ance

oncr

edit

card

sor

other

revo

lvin

gcr

edit

acco

unts

1135

20/

9999

4Sam

ple

Siz

e:17

677

Sam

ple

incl

ud

esall

loan

sm

ad

eby

firs

t-ti

me

borr

ow

ers

issu

edth

rou

gh

Pro

sper

bet

wee

nF

ebru

ary

17,

2007

an

dN

ovem

ber

30,

2009.

Pro

sper

data

an

dE

xp

eria

ncr

edit

score

ran

ges

dow

nlo

ad

edby

the

au

thor

from

Pro

sper

.com

on

02/11/2011.

21

Table 2: Credit Grade-Category Mapping

Experian Scorex PLUS Credit Range Credit Grade Credit Score Assigned Percent of Sample880− 899 AA 889.5 0.04860− 879 AA 869.5 0.03840− 859 AA 849.5 0.59820− 839 AA 829.5 1.31800− 819 AA 809.5 1.96780− 799 AA 789.5 3.51760− 779 AA 769.5 5.17740− 759 A 749.5 5.37720− 739 A 729.5 7.16700− 719 B 709.5 7.03680− 699 B 689.5 9.32660− 679 C 669.5 8.59640− 659 C 649.5 12.84620− 639 D 629.5 9.70600− 619 D 609.5 9.29580− 599 E 589.5 3.91560− 579 E 569.5 4.77540− 559 HR 549.5 3.47520− 539 HR 529.5 5.62

Sample includes all loans made by first-time borrowers issued through Prosper between February 17, 2007 and November 30, 2009

Prosper data and Experian credit score ranges downloaded by the author from Prosper.com on 02/11/2011.

22

Table 3: Cox Proportional Hazard ModelCoef exp(Coef) se(Coef) Z p-value

Credit Score -0.00 1.00 0.00 -1.24 0.22Credit Score2 0.00 1.00 0.00 0.46 0.65

Group Dummy 0.04 1.04 0.03 1.36 0.17Amount 0.00 1.00 0.00 16.21 0.00

Borrower Rate 0.18 1.20 0.01 14.01 0.00Borrower Rate2 -0.00 1.00 0.00 -10.80 0.00

Debt To Income Ratio 0.04 1.04 0.01 3.60 0.00Home 0.21 1.24 0.03 7.12 0.00

Current Delinquencies 0.04 1.04 0.00 8.05 0.00Delinquencies Last 7 Years -0.01 0.99 0.00 -4.34 0.00

Inquiries Last 6 Months 0.05 1.05 0.00 19.22 0.00Public Records Last 10 Years 0.03 1.03 0.01 2.28 0.02

Public Records Last 12 Months -0.07 0.93 0.05 -1.41 0.16Open Credit Lines -0.02 0.98 0.01 -2.35 0.02

City Dummy -0.14 0.87 0.04 -3.60 0.00Fulltime -0.11 0.89 0.04 -2.83 0.00

Bankcard Utilization -0.17 0.84 0.04 -4.26 0.00Revolving Credit Balance 0.00 1.00 0.00 0.62 0.53

Amount Delinquent -0.00 1.00 0.00 -0.77 0.44Total Credit Lines 0.00 1.00 0.00 0.44 0.66

Current Credit Lines 0.01 1.01 0.01 1.44 0.15Sample size: 17677

Sample includes all loans made by first-time borrowers issued through Prosper between February 17, 2007

and November 30, 2009. Prosper data and Experian credit score ranges downloaded by the author from

Prosper.com on 02/11/2011.

23

Table 4: Estimated Marginal Effects of Borrower Characteristics in Default Probabilities byTime, Censored Quantile Regression Model

Effect on default: First 90 Days First Year EverCredit Score -0.0002 -0.0003 -0.0004

[-0.0003, 0.0001] [-0.0005, -0.0002]*** [-0.0005,-0.0002]***Group Dummy -0.006 -0.011 0.0152

[-0.0133, 0.0002]* [-0.0188,-0.0019]** [0.0012,0.0292]**Debt To Income Ratio -0.0016 0.0053 0.0059

[-0.0014, 0.0046] [0.0007,0.0094]** [0.0025, 0.0129]**Home -0.0027 0.0196 0.0477

[-0.0100, 0.0037] [0.0113, 0.0306]*** [0.0340, 0.0628]***Current Delinquencies 0.0026 0.0064 0.0102

[-0.0018, 0.0041]* [0.0041,0.0094]*** [0.0074, 0.015]***Delinquencies Last 7 Years 0.0004 -0.0007 -0.0016

[-0.0009, 0.0090] [-0.0013,-0.0001]** [-0.0024, -0.0005]**Inquiries Last 6 Months 0.0016 0.0059 0.0116

[-0.0019, 0.0034] [0.0044, 0.0077]*** [0.0096, 0.0149]***Public Records Last 10 Years -0.0056 -0.0019 0.0018

[-0.0050, 0.0006]* [-0.0040, 0.0041] [-0.0015, 0.0119]Public Records Last 12 Months -0.0030 -0.0118 -0.0199

[-0.0075, 0.0108] [-0.0224, 0.0029] [-0.0462, 0.0022]City Dummy -0.0051 -0.0192 -0.0374

[-0.0106, 0.0051] [-0.0270, -0.0085]*** [-0.0518, -0.0232]***Fulltime -0.0001 -0.0212 -0.0376

[-0.0093, 0.0074] [-0.0324, -0.0097]*** [-0.0567,-0.0181]***Bankcard Utilization -0.0112 -0.0315 -0.0425

[-0.0164, -0.0011]** [-0.0423, -0.0212]** [-0.0640,-0.0022]***Revolving Credit Balance 0.0010 0.0003 0.00002

[-0.0012, 0.0007] [-0.0001, 0.0006] [-0.0002, 0.0012]Amount Delinquent -0.0031 -0.0002 -0.0029

[-0.0010, 0.0013] [-0.0010, 0.0009] [-0.0023, 0.0004]Open Credit Lines -0.0003 -0.0022 -0.0082

[-0.0059, 0.0075] [-0.0089, 0.0046] [-0.0189, 0.0013]Total Credit Lines -0.0038 0.0002 -0.0019

[-0.0021, 0.0014] [-0.0009, 0.0017] [-0.0009, 0.0042]Current Credit Lines -0.0047 -0.0030 0.0020

[-0.0100,0.0031] [-0.0094, 0.0043] [-0.0089, 0.0117]Baseline probability of default (τ): 0.019 0.116 0.318Sample size: 17677This table displays the marginal effects of Credit Score (increase of 1 point), Current Delinquencies (increase of 1), Revolving Credit

Balance (increase of $1000), Amount Delinquent (increase of $1000), Debt to Income Ratio (increase of 1), Total, Open, and Current Credit

Lines (increase of 3 lines), Bankcard Utilization (increase of 1). Binary variables show the effect of moving from 0 to 1. Marginal

effects computed at the sample median. Sample includes all loans made to first-time borrowers issued through Prosper between February

17, 2007 and November 30, 2009. Prosper data and Experian credit score ranges downloaded by the author from Prosper.com on 02/11/2011.

24

Table 5: Mean Borrower Characteristics by Group

Covariates Low risk Medium risk High riskCredit Score 793.7 657.5 537.1Group Member=1 0.18 0.27 0.49Debt to Income Ratio 0.30 0.34 0.28Homeowner=1 0.77 0.50 0.19Current Delinquencies 0.13 0.71 3.97Delinquencies Last 7 Years 0.74 5.12 11.11Credit Inquiries Last 5 Months 1.65 2.53 4.22Public Records Last 10 Years 0.11 0.41 0.70Public Records Last 12 Months 0.01 0.04 0.09Open Credit Lines 9.69 7.97 4.95City Resident=1 0.12 0.13 0.24Employed Full Time=1 0.87 0.88 0.90Bankcard Utilization 0.23 0.60 0.68Revolving Credit Balance $12931.93 $11700.41 $3177.98Amount Delinquent $283.60 $881.27 $3297.50Total Credit Lines 26.60 24.60 19.40Current Credit Lines 11.18 9.44 5.64Borrower Rate 9.96 19.53 25.68Low-risk loans have credit scores between 760 and 899. Medium-risk loans have credit scores between 640 and 679.

High-risk loans have credit scores between 520 and 559. Average interest rate for a $10,000 is based on the average

interest rate by risk category for loans between $9,500 and $10,500. Sample includes all loans made by first-time

borrowers issued through Prosper between February 17, 2007 and November 30, 2009. Prosper data and Experian

credit score ranges downloaded by the author from Prosper.com on 02/11/2011.

25

Tab

le6:

Exam

ple

:T

he

Sec

ondar

yM

arke

tfo

rL

oans

Pre

dic

ted

Pro

bab

ilit

yof

Def

ault

inth

eT

hir

dY

ear

Pro

bab

ilit

yth

eL

oan

Wil

lE

ver

Def

ault

Pro

bab

ilit

yof

Def

ault

inL

ast

Yea

rD

iffer

ence

Pro

por

tion

alH

azar

dC

enso

red

Quan

tile

Reg

ress

ion

Pro

por

tion

alH

azar

dC

enso

red

Qu

anti

leR

egre

ssio

nC

ol4

-C

ol5

Low

risk

loan

0.21

0.23

0.05

0.09

-0.0

4[0

.19,

0.22

][0

.21,

0.25

][0

.05,

0.06

][0

.07,

0.10

][-

0.05

,-0

.02]

***

Med

ium

risk

loan

0.52

0.51

0.11

0.11

-0.0

0[0

.50,

0.53

][0

.49,

0.53

][0

.10,

0.13

][0

.10,

0.13

][-

0.02

,0.

01]

Hig

hri

sklo

an0.

770.

780.

120.

15-0

.03

[0.7

4,0.

79]

[0.6

9,0.

84]

[0.1

1,0.

13]

[0.0

8,0.

21]

[-0.

09,

0.03

]P

red

icte

dp

rob

ab

ilit

yof

def

au

ltev

alu

ate

dat

mea

nch

ara

cter

isti

csby

gro

up

.S

eeT

ab

le5.

95%

con

fid

ence

inte

rvals

are

pre

sente

din

bra

cket

sb

enea

thp

red

icte

dvalu

es.

Con

fid

ence

inte

rvals

are

esti

mate

dw

ith

an

xy-p

air

sb

oots

trap

size

500

(sam

ple

size

is17677).

Inth

efi

fth

colu

mn

,st

ars

ind

icate

stati

stic

all

ysi

gn

ifica

nt

diff

eren

cefr

om

zero

.S

ign

ifica

nce

level

s:*

=10%

,**

=5%

,***=

1%

26

Tab

le7:

Exam

ple

:E

xp

ecte

dP

rofit/

Los

sE

xp

ecte

dN

etP

rofi

tD

iffer

ence

Pro

por

tion

alH

azar

dC

enso

red

Qu

anti

leR

egre

ssio

nC

ol2

-C

ol3

Low

risk

loan

$245

.46

$357

.90

-$11

2.44

[$14

9.05

,$3

42.8

0][$

240.

80,

$475

.80]

[-$1

79.4

7,-$

50.2

5]**

*M

ediu

mri

sklo

an-$

845.

77-$

723.

12-$

122.

65[-

$963

.68,

-$71

7.41

][-

$847

.04,

-$58

1.69

][-

$181

.70,

-$66

.17]

***

Hig

hri

sklo

an-$

2793

.94

-$28

87.4

0$9

3.46

[-$3

096.

87,

-$24

40.5

2][-

$329

9.81

,-$

2373

.36]

[-$1

67.8

8,$3

09.5

5]P

red

icte

dp

rob

ab

ilit

yof

def

au

ltev

alu

ate

dat

mea

nch

ara

cter

isti

csby

gro

up

.S

eeT

ab

le5.

95%

con

fid

ence

inte

rvals

are

pre

sente

din

bra

cket

sb

enea

thp

red

icte

dvalu

es.

Con

fid

ence

inte

rvals

are

esti

mate

dw

ith

an

xy-p

air

sb

oots

trap

size

500

(sam

ple

size

is17677).

Inth

efo

urt

hco

lum

n,

stars

ind

icate

stati

stic

all

ysi

gn

ifica

nt

diff

eren

cefr

om

zero

.

Sig

nifi

can

cele

vel

s:*

=10%

,**

=5%

,***=

1%

27