| Author(s) | Collection number | Pages |
|---|---|---|
| Oliyarnyk T. I., Kudriashova A. V. | № 2 (71) | 26-32 |
The article addresses a simple but dangerous problem: CRM data sometimes contains a small admixture of records that look plausible and pass the usual checks, yet quietly shift the model's decisions. Events are collected from website forms, mobile applications, the call center, and partner interfaces, and these channels are where fake calls, bot traffic, and false confirmations of actions most often appear, because an event created by a person or a partner is difficult to distinguish from a genuine signal. As a result, the system is more likely to err about whom to write to or call, what discount to offer, and how to allocate the budget. This leads to unnecessary costs, worse customer retention, and lower forecast accuracy. We propose a practical framework, CRMPGuard, that works on top of existing processes and does three things. First, it checks the origin and plausibility of the data and sends suspicious batches to quarantine for verification. Second, it looks for atypical clusters and individual records that have a disproportionate influence on training, and reduces their contribution. Third, it updates the model on cleaned subsets in a safe loop, compares the results in two branches, and only then returns the updated model to operation. All steps are logged for audit and for compliance with personal data protection requirements.
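The first two steps can be illustrated with a minimal sketch. It is not the CRMPGuard implementation: the `Event` fields, the `TRUSTED_SOURCES` allow-list, the plausibility bounds, and the median-based weighting rule are all assumptions chosen for illustration, and the per-record scores stand in for whatever influence measure the framework actually uses.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical allow-list; the abstract names the channels but not this set.
TRUSTED_SOURCES = {"web_form", "mobile_app", "call_center", "partner_api"}


@dataclass
class Event:
    source: str
    customer_id: str
    amount: float  # hypothetical numeric field used for the plausibility check


def is_plausible(event: Event) -> bool:
    """Assumed checks: known origin, non-empty customer id, amount in a sane range."""
    return (
        event.source in TRUSTED_SOURCES
        and bool(event.customer_id)
        and 0 <= event.amount <= 10_000
    )


def split_batch(batch):
    """Step 1: return (accepted, quarantined) lists of events."""
    accepted = [e for e in batch if is_plausible(e)]
    quarantined = [e for e in batch if not is_plausible(e)]
    return accepted, quarantined


def robust_weights(scores, cutoff=3.0):
    """Step 2: shrink the training weight of records whose score is atypical.

    `scores` could be per-record losses. Records within `cutoff` robust
    standard deviations of the median keep weight 1.0; others get cutoff / z.
    """
    med = median(scores)
    deviations = [abs(s - med) for s in scores]
    scale = 1.4826 * (median(deviations) or 1e-9)  # MAD rescaled to a std estimate
    weights = []
    for s in scores:
        z = abs(s - med) / scale
        weights.append(1.0 if z <= cutoff else cutoff / z)
    return weights
```

The quarantined events would go to manual or automated verification, and the weights could be passed to any trainer that accepts per-sample weights.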
We present the results in an accessible form. When the share of contaminated records is small, the model error grows roughly in proportion to that share. After suspicious examples are cut off, the error decreases. In other words, even in the presence of contamination, decision quality stays under control. For example, suppose about one percent of fake reviews appear in a small segment: the indicator rises, the suspicious records are isolated, the model is retrained on the cleaned data, and a two-branch check confirms that the accuracy and consistency of predictions have been restored. In practice this means fewer misdirected contacts and discounts given to the wrong customers, more stable campaign performance, and faster recovery from incidents.
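A toy calculation shows the proportionality claim and the two-branch check in the simplest possible setting. Everything here is an assumption made for illustration: the "model" is just a mean predictor, the numbers are invented, the suspicious flags are taken as given, and `promote` is a hypothetical acceptance rule rather than the one used in the article.

```python
from statistics import mean


def fit_mean(values):
    """Toy stand-in for model training: predict the mean of the training values."""
    return mean(values)


def holdout_mae(prediction, holdout):
    """Mean absolute error of a constant prediction on a trusted holdout set."""
    return mean([abs(prediction - y) for y in holdout])


def trim_suspicious(values, flags):
    """Drop the records that the earlier checks flagged as suspicious."""
    return [v for v, bad in zip(values, flags) if not bad]


def promote(baseline_err, candidate_err, tolerance=0.0):
    """Two-branch rule (assumed): accept the candidate only if it is not worse."""
    return candidate_err <= baseline_err + tolerance


# Invented data: 99 genuine records and 1 fake one (a 1 % contamination share).
train = [100.0] * 99 + [1100.0]
flags = [False] * 99 + [True]
holdout = [100.0] * 20

# Branch A: model trained on the contaminated data.
err_contaminated = holdout_mae(fit_mean(train), holdout)
# Branch B: model retrained after trimming the flagged record.
err_cleaned = holdout_mae(fit_mean(trim_suspicious(train, flags)), holdout)

print(err_contaminated, err_cleaned, promote(err_contaminated, err_cleaned))
```

In this toy case the error of the contaminated branch equals the contamination share times the shift of the fake record (0.01 × 1000), and trimming removes it, so the cleaned branch is promoted.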
Keywords: CRM models, data contamination, origin verification, anomaly detection, learning with limited exposure to suspicious records, prediction quality.
doi: 10.32403/1998-6912-2025-2-71-13-25