Fraud Detection with Artificial Intelligence

January 02, 2005

From 1999 to 2004, I collected information on the topic of ‘Fraud detection’ on my website.

When I started this in 1999 as a research assistant at the University of Karlsruhe, there was not much information available on the topic of ‘Data Science’. Back then, it was more commonly referred to as ‘Knowledge Discovery in Databases’ (KDD) in academic circles or ‘Data Mining’ in the business world.

In November 2001, my web pages on fraud detection were the site of the month (Le site du mois) of the website web-datamining.net, which no longer exists. These pages are gone and are only available at the Internet Archive.

My web pages on fraud are referenced in the books

Investigative Data Mining for Security and Criminal Detection by Jesus Mena
Computer and Intrusion Forensics by George Mohay et al.

Introduction

Due to technological advancements, more and more areas of daily life are being permeated by computers. Examples include digital communication, internet commerce (E-Commerce), and online banking.

Due to the complexity of these systems, it is very difficult and also very expensive to find all security vulnerabilities before the systems go into operation. Criminals can thus discover security gaps and exploit them for their (often financial) advantage. For example, digital payment systems have been used for money laundering.

When technical systems are misused, methods are needed to detect this misuse and prevent further fraud.

In the field of fraud detection, user data is analyzed to reconstruct user behavior and identify misuse. Fraud management goes a step further and also includes preventative measures, such as stronger access controls.

Many thanks to the following individuals for their support: Heinz Cech, Tom Fawcett, Carlos Santa Cruz Fernandez, Al Guiva, Reinhold Huber, Andreas Lenk, and Alexey Vasilyev.

Types of fraud

Misuse appears in various areas, but the task remains the same: Based on the available data about user behavior, fraudulent cases must be distinguished from normal cases.

The theory of fraud detection

Numerous algorithms have already been developed in the fields of Knowledge Discovery in Databases (KDD), Data Mining, Machine Learning, and Statistics.

Many of these methods are very general and have been successfully applied in various areas. However, in the field of fraud detection, there are some peculiarities that make the application of these existing methods either impossible or unprofitable.

One peculiarity is that fraud cases make up only a very small proportion of the total data volume. In statistics, this is referred to as a skewed distribution.
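To see why this skew matters, consider a minimal sketch with made-up numbers. The fraud rate of 0.1 % and the number of cases are assumptions for illustration, not figures from a real system. A detector that never raises an alarm still reaches a very high accuracy while catching no fraud at all:

```python
# Illustrative only: the number of cases and the fraud rate are assumed values.
n_cases = 100_000
n_fraud = 100  # 0.1 % of all cases

# Ground truth: True marks a fraudulent case.
labels = [True] * n_fraud + [False] * (n_cases - n_fraud)

# A trivial "detector" that never raises an alarm.
predictions = [False] * n_cases

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / n_cases
detected = sum(p and y for p, y in zip(predictions, labels))

print(f"accuracy: {accuracy:.3f}")                        # accuracy: 0.999
print(f"detected fraud cases: {detected} of {n_fraud}")   # detected fraud cases: 0 of 100
```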

For each method of fraud, it is usually necessary to develop a specific detection algorithm, whose parameters must be specially adapted to this ‘pattern of fraud’.

On the other hand, fraudsters change their methods slightly, so that they are no longer detected. Therefore, the detection algorithm must be continuously adjusted.

To limit the damage, fraud detection systems need a fast response time. In the case of credit card fraud, for example, it is best if detection occurs in real time.

In binary classification (Normal Usage vs. Fraud), there are two different types of errors: false alarms (also known as false positives) and undetected fraud (also known as false negatives). See the following table.

	Fraud	No fraud
Alarm	correct	false alarm
No alarm	undetected fraud	correct

When a fraud detection system triggers an alarm, it often needs to be reviewed by an employee. The costs for the two types of misdiagnoses are therefore different. With a false alarm, an employee works in vain on a case and wastes valuable working time, and with undetected fraud, the fraud continues. Therefore, cost-sensitive methods are needed.
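The trade-off can be made explicit by attaching a cost to each type of error in the table above. The following is a minimal sketch. The function name `total_cost`, the error counts, and the cost values are hypothetical and chosen only for illustration.

```python
def total_cost(false_alarms, undetected_fraud, cost_false_alarm, cost_undetected_fraud):
    """Total cost of a detector's errors; correct decisions are assumed to cost nothing."""
    return false_alarms * cost_false_alarm + undetected_fraud * cost_undetected_fraud

# Hypothetical costs: reviewing a false alarm costs 5 units of staff time,
# an undetected fraud case causes 200 units of damage.
COST_FALSE_ALARM = 5
COST_UNDETECTED_FRAUD = 200

# Two hypothetical detectors evaluated on the same data.
cautious = total_cost(false_alarms=400, undetected_fraud=10,
                      cost_false_alarm=COST_FALSE_ALARM,
                      cost_undetected_fraud=COST_UNDETECTED_FRAUD)
lenient = total_cost(false_alarms=50, undetected_fraud=40,
                     cost_false_alarm=COST_FALSE_ALARM,
                     cost_undetected_fraud=COST_UNDETECTED_FRAUD)

print(cautious)  # 400 * 5 + 10 * 200 = 4000
print(lenient)   # 50 * 5 + 40 * 200 = 8250
```

With these assumed costs, the lenient detector makes far fewer errors in total (90 instead of 410) but is roughly twice as expensive, which is why the plain error count is a poor guide here.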

The constantly changing and skewed distributions and the need for cost-sensitive methods complicate the evaluation of a detection method. Even with “normal” classification methods, several difficulties must be considered when evaluating the success of detection [Sal97]. Common metrics, such as error rate, accuracy, and ROC curves, are not suitable for fraud detection [PFK98, PF01]. A technique specifically developed for fraud detection is the ROC Convex Hull [PF01].
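The geometric idea behind the ROC Convex Hull can be sketched in a few lines. Each classifier is a point (false positive rate, true positive rate) in ROC space, and only classifiers on the upper convex hull of these points, together with the trivial classifiers (0, 0) and (1, 1), can be optimal for some combination of class distribution and error costs. The sketch below illustrates that idea only and is not the implementation from [PF01]; the function name `roc_convex_hull` and the example points are assumptions.

```python
def roc_convex_hull(points):
    """Return the upper convex hull of ROC points, from (0, 0) to (1, 1).

    Each point is a (false_positive_rate, true_positive_rate) pair.
    """
    # The trivial classifiers "never alarm" and "always alarm" are always available.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})

    def cross(o, a, b):
        # > 0 if o -> a -> b turns counter-clockwise, < 0 if it turns clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Hypothetical classifiers as (false positive rate, true positive rate).
classifiers = [(0.1, 0.5), (0.2, 0.55), (0.3, 0.8), (0.5, 0.6), (0.6, 0.95)]
hull = roc_convex_hull(classifiers)
print(hull)
# [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (0.6, 0.95), (1.0, 1.0)]
```

In this example, the classifiers at (0.2, 0.55) and (0.5, 0.6) lie below the hull, so under any class distribution and cost ratio one of the remaining classifiers performs at least as well.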

In traditional databases, data is typically analyzed in three steps: “Load the data, create the indexes, and then query the data”. Loading and index creation in particular can be very time-consuming with large volumes of data, which makes real-time processing impossible. To handle large volumes of data better, a new data model has been designed: continuous data streams. This area is still a subject of research, but there are already prototype data stream management systems, stream processing engines, and an extension of SQL known as Continuous Query Language (CQL).
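The difference to the load, index, and query approach can be illustrated with a small sketch in plain Python. It is neither CQL nor a data stream management system, only an illustration of evaluating a query continuously while the data arrives. The function name `alarms_on_stream`, the window length, the threshold, and the example data are assumptions.

```python
from collections import defaultdict, deque

def alarms_on_stream(transactions, window_seconds=60, max_per_window=3):
    """Yield (timestamp, card_id) whenever a card exceeds max_per_window
    transactions within the last window_seconds.

    `transactions` is any iterable of (timestamp, card_id) pairs in time order;
    each element is processed exactly once, as it arrives.
    """
    recent = defaultdict(deque)  # card_id -> timestamps inside the current window
    for timestamp, card_id in transactions:
        window = recent[card_id]
        window.append(timestamp)
        # Expire timestamps that have fallen out of the sliding window.
        while window[0] <= timestamp - window_seconds:
            window.popleft()
        if len(window) > max_per_window:
            yield timestamp, card_id

# Hypothetical stream: (seconds since start, card id).
stream = [(0, "A"), (5, "B"), (10, "A"), (20, "A"), (30, "A"), (100, "A"), (101, "B")]
alarms = list(alarms_on_stream(stream))
print(alarms)  # [(30, 'A')]
```

Only the timestamps inside the current window are kept per card, so an alarm can be raised as soon as the fourth transaction arrives, without first loading and indexing the complete data set.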

References

Articles

Tom Fawcett created a bibliography.
Fraud detection in mobile communication networks in the ASPeCT project.
Some articles from the Statistics Group at Lucent.
A bibliography on fraud detection at the University of Karlsruhe.
Computer Fraud & Security

Workshops and conferences

1997 AAAI Workshop "AI Approaches to Fraud Detection and Risk Management"
1998 AAAI Workshop "The Methodology of Applying Machine Learning"
1998 AI Fall Symposium on Artificial Intelligence and Link Analysis
International Conference on Fighting Mobile Fraud, London, 1997
Research Priorities in Wireless and Mobile Communications and Networking. Report of a Workshop held in March 1997, sponsored by the National Science Foundation, Division of Networking and Communications Research and Infrastructure.

Bibliography

[AFR97] Emin Aleskerov, Bernd Freisleben, Bharat Rao. CARDWATCH: A Neural Network Based Database Mining System for Credit Card Fraud Detection. In: Proceedings of Computational Intelligence for Financial Engineering (CIFEr), pp. 220--226, 1997.
[AME98] Dean W. Abbott, I. Philip Matkovsky, John F. Elder. An Evaluation of High-End Data Mining Tools for Fraud Detection. In: Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2836-2841, 1998.
[ATW97] Suhaya Abu-Hakima, Mansour Toloo, Tony White. A Multi-Agent Systems Approach for Fraud Detection in Personal Communication Systems. In: [Faw97], 1997.
[Axe99] Stefan Axelsson. The Base-Rate Fallacy and its Implications for the Difficulty of Intrusion Detection. In: Proceedings of the 6th ACM Conference on Computer and Communications Security, pp. 1-7, 1999.
[BH] Richard J. Bolton, David J. Hand. Statistical Fraud Detection: A Review. Statistical Science, 17(3), pp. 235-255.
[BLH99a] R. Brause, T. Langsdorf, M. Hepp. Credit Card Fraud Detection by Adaptive Neural Data Mining. Internal Report 7/99, FB Informatik, University of Frankfurt a.M., 1999.
[BLH99b] R. Brause, T. Langsdorf, M. Hepp. Neural Data Mining for Credit Card Fraud Detection. In: Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence. pp. 103--106. 1999.
[BS97a] Peter Burge, John Shawe-Taylor. Detecting Cellular Fraud Using Adaptive Prototypes. In: [Faw97].
[BS97b] Peter Burge, John Shawe-Taylor. Fraud-Management Tools: First Prototype. ASPeCT -- Project, January 1997. See [ASPeCT].
[BSCMPS97] P. Burge, J. Shawe-Taylor, C. Cooke, Y. Moreau, B. Preneel, C. Stoermann. Fraud Detection and Management in Mobile Telecommunications Networks.
[CCLPS00] Michael Cahill, Fei Chen, Diane Lambert, José Pinheiro, Don X. Sun. Detecting Fraud in the Real World. In: Handbook of Massive Datasets. Kluwer, 2002.
[CFPS99] Philip K. Chan, Wei Fan, Andreas L. Prodromidis, Salvatore J. Stolfo. Distributed Data Mining in Credit Card Fraud Detection. In: IEEE Intelligent Systems, vol. 14, no. 6, pp. 67--74, 1999.
[CLPS99] Fei Chen, Diane Lambert, José Pinheiro, Don Sun. Reducing Transaction Databases, Without Lagging Behind the Data or Losing Information. Unpublished, 1999.
[DB98] Steven K. Donoho, Scott W. Bennett. Fraud Detection and Discovery.
[DC98] J. R. Dorronsoro, C. Santa Cruz. Discrimination of overlapping data and credit card fraud detection. Technical report, Department of Computer Engineering, Universidad de Madrid, 1998.
[DGSC97] Jose R. Dorronsoro, Francisco Ginel, Carmen Sanchez, Carlos Santa Cruz. Neural Fraud Detection in Credit Card Operations. In: IEEE Transactions on Neural Networks, vol. 8, no. 4, July 1997.
[EN96] Kazuo J. Ezawa, Steven W. Norton. Constructing Bayesian Networks to Predict Uncollectible Telecommunications Accounts. IEEE Expert, vol. 11, no. 5, pp. 45--51, 1996.
[Faw97] Tom Fawcett. AI Approaches to Fraud Detection & Risk Management --- Papers from the 1997 AAAI Workshop, Technical Report WS-97-07, July 1997, AAAI Press.
[FP97a] Tom Fawcett, Foster Provost. Adaptive Fraud Detection. Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 291-316, 1997.
[FP97b] Tom Fawcett, Foster Provost. Combining Data Mining and Machine Learning for Effective Fraud Detection. In: [Faw97].
[Gos97] Phil Gosset. Fraud Detection Concepts: Final Report. ASPeCT -- Project, November 1997. See [ASPeCT].
[GH99] Phil Gosset, Mark Hyland. Classification, Detection and Prosecution of Fraud on Mobile Networks. Proceedings of ACTS Mobile Summit, Sorrento, Italy, June 1999.
[GR94] Sushmito Ghosh, Douglas L. Reilly. Credit Card Fraud Detection with a Neural-Network. In: Proceedings of the 27th Hawaii International Conference on Information Systems, pp. 621--630, 1994.
[HDA98] Mark Hyland, Jos Dumortier, Diana Alonso Blas. Legal Aspects of Fraud Detection. ASPeCT-Project. See [ASPeCT].
[HS08] Constantinos S. Hilas, Paris As. Mastorocostas. An Application of Supervised and Unsupervised Learning Approaches to Telecommunications Fraud Detection. Knowledge-Based Systems, 21, pp. 721-726, 2008. doi:10.1016/j.knosys.2008.03.026.
[HS09] Constantinos S. Hilas, Paris As. Mastorocostas. Designing an expert system for fraud detection in a private telecommunications network. Expert Systems with Applications, 2009. doi:10.1016/j.eswa.2009.03.031.
[HS05] Constantinos S. Hilas, John N. Sahalos. User profiling for fraud detection in telecommunication networks. In: 5th International Conference on Technology and Automation, Thessaloniki, Greece, October 2005, pp. 382-387.
[HS06] Constantinos S. Hilas, John N. Sahalos. Testing the fraud detection ability of different user profiles by means of FFNN classifiers. In: Collias St. et al. (Eds.): Lecture Notes in Computer Science, vol. 4132, Part II, 2006, pp. 872-883.
[HS07] Constantinos S. Hilas, John N. Sahalos. An application of decision trees for rule extraction towards telecommunications fraud detection. In: B. Apolloni et al. (Eds.): KES 2007/ WIRN 2007, Lecture Notes in Artificial Intelligence, vol. 4693, Part II, Springer. 2007, pp. 1112–1121.
[Jen97] David Jensen. Prospective Assessment of AI Technologies for Fraud Detection: A Case Study.
[KKN99] Daniel A. Keim, Eleftherios E. Koutsofios, Stephen C. North. Visual Exploration of Large Telecommunication Data Sets. In: User Interfaces to Data Intensive Systems, pp. 12--20, 1999.
[MP96] Yves Moreau, Bart Preneel. Definition of Fraud Detection Concepts. ASPeCT -- Project, August 1996. See [ASPeCT].
[OTA95] U. S. Congress, Office of Technology Assessment. Information Technologies for Control of Money Laundering. U. S. Government Printing Office, OTA-ITC-630, Washington DC, September 1995.
[PF97] Foster Provost, Tom Fawcett. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), 1997.
[PF01] Foster Provost, Tom Fawcett. Robust Classification for Imprecise Environments. In: Machine Learning, vol. 42, no. 3, pp. 203-231, 2001.
[PFK98] Foster Provost, Tom Fawcett, Ron Kohavi. The Case Against Accuracy Estimation for Comparing Induction Algorithms. Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98), July 1998.
[Sal97] Steven Salzberg. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. In: Data Mining and Knowledge Discovery, no. 3, pp. 317--328, 1997.
[SFLPC97] Salvatore J. Stolfo, David W. Fan, Wenke Lee, Andreas L. Prodromidis, Philip K. Chan. Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results. In: [Faw97].
[Stö97] Christof Störmann. Fraud Management Tool: Evaluation Report. ASPeCT -- Project, October 1997. See [ASPeCT].

Fraud management systems and services

This list contains Fraud Management Systems, not individual components such as Data Mining tools. It is in alphabetical order and is not complete. The names of the software products are given in square brackets.

ACI Worldwide
ACL Services
Advanced Software Applications (A.S.A)
Alcatel
The ai corporation
Amdocs
Beck Computer Systems
Brighterion
Carreker Corporation (now Fiserv)
ChoicePoint
Communications Expert
CyberSource
Ectel [FraudView]
Equinox Information Systems [Protector, Guardian]
FICO (formerly, Fair, Isaac and Company, formerly HNC Software)
FML
i2
infoRate
Inform GmbH [RiskShield]
Inforsud [TimRisk]
Mahindra - British Telecom
Metavante
NFC Global, Inc.
NetMap Analytics
Neural Technologies
Oskar Kilo Ltd.
ReD Retail Decisions
Secure Science Corporation
Subex Systems [Ranger]
Telemate
Telesciences [Sterling]
VerifyFraud
Vips [STARS]
Visual Analytics [VisuaLinks]
Xanalys [Watson]
Xtract

Components of fraud management systems

Fraud Management Systems are often built from standard software components, such as databases, Data Mining tools, or visualization tools. The list is in alphabetical order and is not complete.

KXEN [KXEN Analytic Framework]
Oracle Corporation [Darwin]
SAS Institute [SAS Enterprise Miner]
- SAS Fraud Prevention and Detection for Financial Services
SPSS [Clementine]
- ClearCommerce, online transaction software.
- Lloyds TSB, credit card fraud.
Computer Associates [CleverPath, Neugent]

People and research groups

Research groups

ASPECT, Advanced Security for Personal Communications Technologies

People

Fraud Detection & Prevention at AAAI

Artificial Intelligence

Artificial Intelligence Resources at the Institute for Information Technology
David W. Aha's Machine Learning Resources
Computational Learning Theory (COLT)
Evaluation of Intelligent Systems
ILPnet2 : Inductive Logic Programming Net
Kernel Based Learning Methods
Knowledge Discovery central
KDNuggets : Data Mining, Web Mining, Knowledge Discovery, and CRM guide
MLnet OiS : Machine Learning network Online Information Service
Mixture Modelling : Cluster
Recursive Partitioning

Statistics

Intrusion Detection Systems

List of intrusion detection systems
National Info-Sec Technical Baseline "Intrusion Detection and Response"

