Analysis of the CFPB consumer complaints database

Overview: data, process and objective

This page presents a preliminary analysis of the Consumer Financial Protection Bureau's (CFPB) online complaint database. The dataset comprises 1.2 million consumer complaints lodged against financial institutions. Each complaint record includes the date, company, product, issue category, consumer comments (optional), and company response.

Prior to analysis, the CFPB dataset was merged with demographic data corresponding to the 3-digit zip code of the complainant. Additionally, consumer comments were run through a sentiment analysis algorithm to assign each comment a polarity score and a subjectivity score.
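The merge step described above might look roughly like the sketch below. It is an illustration only, not the code used for this analysis: the field names (zip_code, zip3, median_income, population) are assumptions, and the join is written as a left join so that complaints with a missing or unusable zip code are kept with empty demographic fields.

```python
def zip3(zip_code):
    """Return the 3-digit zip prefix, or None if the value is unusable."""
    prefix = str(zip_code).strip()[:3]
    return prefix if len(prefix) == 3 and prefix.isdigit() else None


def merge_demographics(complaints, demographics):
    """Left-join complaint records to demographic rows keyed by 3-digit zip.

    Both arguments are lists of dicts; the key names are hypothetical.
    """
    by_zip3 = {row["zip3"]: row for row in demographics}
    merged = []
    for record in complaints:
        key = zip3(record.get("zip_code"))
        demo = by_zip3.get(key, {})  # empty dict when there is no match
        merged.append({
            **record,
            "zip3": key,
            "median_income": demo.get("median_income"),
            "population": demo.get("population"),
        })
    return merged
```

Partially masked values such as "300XX" still yield a usable prefix ("300"), because only the first three characters are inspected.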

The analysis below has two parts: a static analysis, focused on understanding the representativeness of the data and teasing out patterns in financial institution behavior, and a predictive analysis, focused on predicting where complaints are likely to arise in the future.




Summary

Preliminary analysis revealed the following findings:




Complaints and resolution by income

The graph below shows complaints against quartiles of median income at the 3-digit zip code level. The graph indicates that individuals in the wealthiest zip codes are more than twice as likely to file complaints as those in the lowest-income zip codes, which may create selection bias in the overall complaint data.

[Figure: combined_zipcode]
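A quartile comparison of this kind could be computed along the lines of the sketch below. The record keys (median_income, population, complaints) are assumptions, and the per-10,000 normalization is borrowed from the map section of this page rather than confirmed as the measure behind the graph.

```python
import statistics


def income_quartile_rates(areas):
    """Complaints per 10,000 residents for each median-income quartile.

    `areas` holds one dict per 3-digit zip code with the hypothetical
    keys median_income, population and complaints.
    """
    incomes = [a["median_income"] for a in areas]
    q1, q2, q3 = statistics.quantiles(incomes, n=4)  # three cut points

    totals = {q: [0, 0] for q in (1, 2, 3, 4)}  # quartile -> [complaints, population]
    for a in areas:
        inc = a["median_income"]
        q = 1 if inc <= q1 else 2 if inc <= q2 else 3 if inc <= q3 else 4
        totals[q][0] += a["complaints"]
        totals[q][1] += a["population"]

    # Pool complaints and population within each quartile before dividing.
    return {q: 10_000 * c / p for q, (c, p) in totals.items() if p > 0}
```

Pooling within each quartile, rather than averaging per-area rates, keeps sparsely populated zip codes from dominating the result.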



Complaints per 10,000 population, by year

The maps show complaints at the 3-digit zip code level. The heat map highlights areas at or above the 90th, 95th and 99th percentiles of complaints for each year (i.e., "trouble spots").
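As a rough sketch of how such trouble spots could be flagged for a single year, the code below computes a per-10,000 rate and assigns each 3-digit zip code the highest percentile band it reaches. The nearest-rank percentile definition and the function names are assumptions made for illustration; the original analysis may have used a different percentile method.

```python
import math


def complaints_per_10k(complaints, population):
    """Complaint count scaled to a population of 10,000."""
    return 10_000 * complaints / population


def percentile_threshold(values, pct):
    """Nearest-rank percentile of a collection of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def trouble_spots(rates_by_zip3, percentiles=(90, 95, 99)):
    """Map each zip3 to the highest percentile band its rate reaches, or None."""
    thresholds = {
        p: percentile_threshold(rates_by_zip3.values(), p) for p in percentiles
    }
    bands = {}
    for zip3, rate in rates_by_zip3.items():
        reached = [p for p in percentiles if rate >= thresholds[p]]
        bands[zip3] = max(reached) if reached else None
    return bands
```

To match the per-year maps, trouble_spots would be called once per year on that year's rates, so that thresholds are not shared across years.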




Complaints by product over time

The graph below shows complaints by product (y-axis) against time (x-axis). The graph suggests some cross-product correlation in complaints (vertical bands) as well as persistence within each product over time (horizontal bands).

[Figure: predict_heatmap]
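The grid behind a heat map like this one is a product-by-period count matrix. The sketch below builds one with monthly periods; the monthly granularity, the ISO "YYYY-MM-DD" date strings, and the key names product and date are all assumptions.

```python
from collections import Counter


def product_month_matrix(complaints):
    """Count complaints per (product, month).

    Returns (products, months, matrix), where matrix[i][j] is the number
    of complaints for products[i] in months[j]. Rows correspond to the
    y-axis (product) and columns to the x-axis (time).
    """
    counts = Counter((c["product"], c["date"][:7]) for c in complaints)
    products = sorted({p for p, _ in counts})
    months = sorted({m for _, m in counts})
    matrix = [[counts.get((p, m), 0) for m in months] for p in products]
    return products, months, matrix
```

In this layout a vertical band is a column with high counts across many rows, and a horizontal band is a row with high counts across many columns.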



Complaint prediction at the product level

Graphs show predicted complaints in test data (2018 and 2019) using various lagged variables as predictors.
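This page does not state which model or which lags were used, so the following is only a minimal sketch of the general idea: fit a least-squares relationship between each period's complaint count and the previous period's count on the training years, then predict the test years one step at a time. The single lag and the function names are assumptions.

```python
def fit_lag1(series):
    """Least-squares fit of y[t] = a + b * y[t-1] on a list of counts."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def one_step_predictions(train, test, a, b):
    """Predict each test value from the actual value one period earlier."""
    previous = [train[-1]] + test[:-1]
    return [a + b * p for p in previous]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted counts."""
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / len(actual)
```

Here train would hold the counts before 2018 and test the counts for 2018 and 2019; longer lags (for example, the same month one year earlier) would follow the same pattern with a different choice of "previous" value.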

All products

[Figures: predict_all, long_predict_all, year_predict_all]

Mortgage

[Figures: predict_all, long_predict_all, year_predict_all]

Debt collection

[Figures: predict_all, long_predict_all, year_predict_all]



Complaint prediction at the company level

Graphs show predicted complaints in test data (2018 and 2019) using various lagged variables as predictors.

All companies

[Figures: predict_all, long_predict_all, year_predict_all]

Wells Fargo

[Figures: predict_all, long_predict_all, year_predict_all]

Citi

[Figures: predict_all, long_predict_all, year_predict_all]

TransUnion

[Figures: predict_all, long_predict_all, year_predict_all]



Next steps

Potential next steps for this analysis include: