Here at P2P Analytics, we’ve received some emails asking the following:
1. What strategy does the Lending Club selection algorithm implement to rank notes?
2. And what exactly is the score that we see in the distribution? What does this score signify?
To answer the first question:
In constructing the algorithm to estimate default probability, the first step was to find the top 12 factors that most strongly influence default rates:
Grade
Housing status (Home, Mortgage, Rent)
Job time length
Loan Term
Borrower Text description (and whether it was entered)
Total Credit Lines
Income
Type of Loan (Consolidation, Green, Car, etc)
Earliest Credit Line
Number of Credit Inquiries
DTI ratio
Amount Requested
Months since last delinquency
The second step was to check the effect of each of these variables within each grade. For example, Housing status was checked for its effect on default rates across all A, B, C, etc. loans, and a table was built that describes how much each factor influences default rates (for better or for worse). Each time a variable was checked within a grade, all of the notes ever issued in that grade (except the very recent notes) were used to make the determination.
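The per-grade check described above can be sketched roughly as follows. This is only an illustration, not the actual P2P Analytics code: the function name `factor_effects_by_grade`, the field names (`grade`, `housing`, `defaulted`), the choice of a ratio against the grade's overall default rate as the "effect", and the four sample loans are all assumptions made up for the example.

```python
from collections import defaultdict


def factor_effects_by_grade(loans, factor):
    """Return {grade: {factor_value: effect}} for one factor.

    The effect is assumed here to be the default rate of loans with that
    factor value divided by the overall default rate of the same grade.
    Values below 1.0 are better than the grade average, above 1.0 worse.
    """
    totals = defaultdict(lambda: [0, 0])  # grade -> [defaults, count]
    cells = defaultdict(lambda: [0, 0])   # (grade, value) -> [defaults, count]
    for loan in loans:
        grade, value = loan["grade"], loan[factor]
        for bucket in (totals[grade], cells[(grade, value)]):
            bucket[0] += int(loan["defaulted"])
            bucket[1] += 1

    effects = defaultdict(dict)
    for (grade, value), (defaults, count) in cells.items():
        grade_rate = totals[grade][0] / totals[grade][1]
        if grade_rate > 0:  # skip grades with no defaults at all
            effects[grade][value] = (defaults / count) / grade_rate
    return dict(effects)


# Made-up toy data; a real table would use thousands of loans per grade.
sample_loans = [
    {"grade": "A", "housing": "MORTGAGE", "defaulted": False},
    {"grade": "A", "housing": "MORTGAGE", "defaulted": False},
    {"grade": "A", "housing": "RENT", "defaulted": True},
    {"grade": "A", "housing": "RENT", "defaulted": False},
]
effects = factor_effects_by_grade(sample_loans, "housing")
```

Running the same function once per factor would give one table per variable, each broken down by grade.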
A few things to keep in mind about the Lending Club algorithm:
Each variable was built across all loans ever issued for each grade. In other words, thousands of samples were used to determine how each variable correlates with default rates. Because twelve correlating factors are used, the algorithm’s determinations are more accurate than they would be with fewer. Imagine if the algorithm only took into account 3 or 4 factors. It would probably be a better strategy than just guessing when selecting notes, but it would still be a very coarse selection method.
To answer the second question, the score is the output of the algorithm. It’s a number that is a rough estimate of the default probability of a particular note. The lower this number is, the better. As the algorithm currently stands, more tweaking is needed before the score closely matches actual default rates, but overall the algorithm is very good at comparing notes that are currently in funding.
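As a rough sketch of how such a score might be assembled: the post does not say how the individual factor effects are combined, so the code below simply assumes that a grade's baseline default rate is multiplied by one independent multiplier per factor. The function `note_score`, the baseline rate, and every multiplier are hypothetical values chosen for illustration.

```python
def note_score(note, base_rates, effects):
    """Estimate a default score for one note (lower is better).

    Assumption: start from the grade's baseline default rate and multiply
    by each factor's per-grade multiplier. Unknown values count as 1.0.
    """
    grade = note["grade"]
    score = base_rates[grade]
    for factor, table in effects.items():
        score *= table.get(grade, {}).get(note.get(factor), 1.0)
    return min(score, 1.0)  # keep the result usable as a probability


# Hypothetical numbers, not taken from real Lending Club data.
base_rates = {"B": 0.08}
effects = {
    "housing": {"B": {"MORTGAGE": 0.8, "RENT": 1.2}},
    "term": {"B": {36: 0.9, 60: 1.3}},
}
notes = [
    {"id": "n1", "grade": "B", "housing": "MORTGAGE", "term": 36},
    {"id": "n2", "grade": "B", "housing": "RENT", "term": 60},
    {"id": "n3", "grade": "B", "housing": "MORTGAGE", "term": 60},
]

# Rank notes from lowest (best) to highest (worst) score.
ranked = sorted(notes, key=lambda n: note_score(n, base_rates, effects))
ranked_ids = [n["id"] for n in ranked]
```

Because each multiplier is applied on its own, a note with two favorable factors always scores better than a note with only one of them, which is also why this kind of scheme cannot capture interactions between factors.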


August 11th, 2012
jasonejacks
Are the 12 factors analysed independently or collectively against all notes?
We may see borrowers with mortgages perform better than renters overall, and 36-month loans perform better than 60-month loans overall as well. However, together, 60-month and mortgage may perform the best of all.
Yes, all factors are analyzed independently. The algorithm does exactly what you mention: A note that has two independently favorable factors (such as mortgage + 60 month term) does yield a better rank than a note with just one of the factors.
My point is that it may not always hold true that independent analysis justifies the best collective results. You could see underperforming independent factors perform better when combined with other factors. In my example, I said 36-month loans were performing better than 60-month loans if measured independently (I’m not sure if this is true, just an example), but that 60-month loans could perform better than 36-month loans if only looking at loans with people with a mortgage. In that case, shouldn’t a note with mortgage + 60 month yield better, even though 36-month is better when looked at independently?