Statistical models generally assume that the modeling data set is representative of the population to which the model will be applied. However, this assumption does not hold in the case of application scoring. The modeling data set is inherently biased because applicants perceived to be “good” are approved while those perceived to be “bad” are rejected, so outcomes are observed only for the applicants the existing policy favored.
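To make this selection bias concrete, the following minimal Python sketch simulates a hypothetical applicant population. Everything in it is an assumption for illustration: the latent risk variable, the noisy score, the logistic link between risk and default, and the approval cutoff of 0.0. Because approval depends on the score, the approved (KGB) sample shows a lower bad rate than the full through-the-door population, while the rejected applicants, whose outcomes a lender would never observe, show a higher one.

```python
import math
import random


def simulate_applicants(n, seed=0):
    """Hypothetical through-the-door population of (score, is_bad) pairs."""
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        risk = rng.gauss(0.0, 1.0)                   # assumed latent risk, never observed by the lender
        score = -risk + rng.gauss(0.0, 0.5)          # assumed noisy scorecard view: higher means safer
        p_bad = 1.0 / (1.0 + math.exp(1.5 - risk))   # assumed logistic link: riskier -> more likely bad
        applicants.append((score, rng.random() < p_bad))
    return applicants


def bad_rate(rows):
    """Share of rows whose outcome is bad."""
    return sum(is_bad for _, is_bad in rows) / len(rows)


applicants = simulate_applicants(50_000)
cutoff = 0.0  # hypothetical approval cutoff on the score
approved = [row for row in applicants if row[0] >= cutoff]
rejected = [row for row in applicants if row[0] < cutoff]

print(f"through-the-door bad rate: {bad_rate(applicants):.3f}")
print(f"approved (KGB) bad rate:   {bad_rate(approved):.3f}")
print(f"rejected bad rate (unobservable in practice): {bad_rate(rejected):.3f}")
```

A model fitted only to the approved rows learns from a population that is systematically safer than the one it will later be asked to score.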
Performance is known only for the approved population, which is unlikely to behave the same way as the rejected population; hence, any judgment about the rejected applicants based on approved outcomes alone is questionable. Notably, selection bias is not a problem if the model is used only to estimate future bad rates within the approved population. However, because the model is applied to the whole applicant population to decide whom to approve and whom to decline, the bias becomes a very important consideration. Reject inference techniques are used to account for and correct this sample bias.
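The text does not name a particular reject inference technique. One commonly described method is parceling: rejects are grouped into score bands, and within each band they are assigned good or bad outcomes in proportion to the bad rate observed among approved accounts, often inflated because rejects are presumed riskier. The sketch below is a simplified, deterministic version under stated assumptions; the band labels, the `inflation` factor, the fallback rule for bands with no approved accounts, and the toy data are all hypothetical. Descriptions of parceling typically assign outcomes at random within each band and define the bands using the KGB model's score.

```python
from collections import Counter


def parcel_rejects(approved, rejected, inflation=1.0):
    """Parceling-style reject inference (illustrative sketch).

    approved:  list of (band, is_bad) pairs with known outcomes (KGB data)
    rejected:  list of score-band labels for declined applicants
    inflation: hypothetical factor >= 1 encoding the belief that rejects are
               riskier than approved applicants in the same band
    Returns the KGB rows followed by the rejects with inferred outcomes.
    """
    totals = Counter(band for band, _ in approved)
    bads = Counter(band for band, is_bad in approved if is_bad)
    kgb_rate = {band: bads[band] / totals[band] for band in totals}
    fallback = max(kgb_rate.values())  # hypothetical conservative rate for unseen bands

    augmented = list(approved)
    for band, count in sorted(Counter(rejected).items()):
        rate = min(1.0, kgb_rate.get(band, fallback) * inflation)
        n_bad = round(count * rate)  # number of rejects in this band labelled "bad"
        augmented += [(band, True)] * n_bad + [(band, False)] * (count - n_bad)
    return augmented


# Hypothetical score bands A (safest) to C; band C has no approved accounts.
approved_rows = [("A", False)] * 9 + [("A", True)] + [("B", False)] * 7 + [("B", True)] * 3
rejected_bands = ["A"] * 10 + ["B"] * 10 + ["C"] * 4
augmented = parcel_rejects(approved_rows, rejected_bands, inflation=2.0)
inferred_bads = Counter(band for band, is_bad in augmented[len(approved_rows):] if is_bad)
print(dict(inferred_bads))
```

The augmented data set can then be used to refit the model on the full applicant population. The deterministic allocation keeps the example reproducible, and the inflation factor is a judgment call rather than something estimated from the data.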
Consequently, any statistical model built only on the known good-bad (KGB) outcomes of approved loan applicants has a gap, because those outcomes carry a strong sampling bias. In fact, any analysis of characteristics is biased by the ‘cherry-picking’ of prospective good customers. If a characteristic truly describes bad rates across the whole population, then approval rates by that characteristic should be inversely related to its bad rates. For example, if customers have serviced their loans without any problem over the past year, the overall bad rate of that segment should be relatively low, and the approval rate for the segment should be high. Conversely, customers with at least four bad loans in the previous year should be treated as a high credit risk, so their segment should show a high bad rate and a low approval rate.
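This inverse relationship can be checked segment by segment. The minimal sketch below assumes hypothetical records of the form (segment, approved, is_bad), with is_bad set to None for rejected applicants because their outcomes are never observed. The segment names and counts are illustrative only, loosely mirroring the two examples above. Note that the bad rate it reports is the KGB bad rate of approved accounts; because approvals within a high-risk segment are cherry-picked, that figure likely understates the segment's true bad rate.

```python
from collections import defaultdict


def segment_rates(records):
    """Return {segment: (approval_rate, kgb_bad_rate)} from (segment, approved, is_bad) records."""
    applied = defaultdict(int)
    approved = defaultdict(int)
    bad = defaultdict(int)
    for segment, was_approved, is_bad in records:
        applied[segment] += 1
        if was_approved:
            approved[segment] += 1
            bad[segment] += bool(is_bad)
    rates = {}
    for segment in applied:
        approval_rate = approved[segment] / applied[segment]
        # The bad rate can only be measured on approved (KGB) accounts.
        bad_rate = bad[segment] / approved[segment] if approved[segment] else None
        rates[segment] = (approval_rate, bad_rate)
    return rates


def inversely_ordered(rates):
    """True if segments with higher approval rates never show higher KGB bad rates."""
    known = sorted(((a, b) for a, b in rates.values() if b is not None), reverse=True)
    return all(known[i][1] <= known[i + 1][1] for i in range(len(known) - 1))


# Hypothetical segments: a clean 12-month history vs. four or more bad loans in 12 months.
records = (
    [("clean_12m", True, False)] * 87 + [("clean_12m", True, True)] * 3
    + [("clean_12m", False, None)] * 10
    + [("4plus_bad_12m", True, False)] * 6 + [("4plus_bad_12m", True, True)] * 4
    + [("4plus_bad_12m", False, None)] * 40
)
rates = segment_rates(records)
print(rates, inversely_ordered(rates))
```

A segment that breaks the expected ordering is a signal that its KGB bad rate is distorted by selective approval and is a candidate for closer review during reject inference.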