
Fraud Detection in Telecommunications: History and Lessons Learned

Richard A. Becker, Chris Volinsky, and Allan R. Wilks

Statistics Research Department, AT&T Labs-Research

Florham Park, NJ 07932 ([email protected]; [email protected]; [email protected])

Fraud detection is an increasingly important and difficult task in today's technological environment. As consumers are putting more of their personal information online and transacting much more business over computers, the potential for losses from fraud is in the billions of dollars, not to mention the damage done by identity theft. This paper reviews the history of fraud detection at AT&T, one of the first companies to address fraud in a systematic way to protect its revenue stream. We discuss some of the major fraud schemes and the techniques used to address them, leading to generic conclusions about fraud detection. Specifically, we advocate the use of simple, understandable models, heavy use of visualization, and a flexible environment, and emphasize the importance of data management and the need to keep humans in the loop.

KEY WORDS: Automated message accounting; Communities of interest; Data stream; Graph matching; Signatures; Social networks; Top-k.

    1. INTRODUCTION

"Food gained by fraud tastes sweet to a man, but he ends up with a mouth full of gravel." Proverbs 20:17 (New International Version)

Fraud, the act of deceiving others for personal gain, is certainly as old as civilization itself. The word comes from the Latin fraudem, meaning deceit or injury, and over the years has come to represent a wide array of injustices, including forged artwork, confidence schemes, academic plagiarism, and email advance-fee frauds (such as the well-known Nigerian-email scam). Although these forms of fraud are very different in nature, they all have in common a dishonest attempt to convince an innocent party that a legitimate transaction is occurring when in fact it is not. Usually the fraud is for monetary gain, but not always, as fraud may be perpetrated for political causes (e.g., electoral fraud), personal prestige (e.g., plagiarism), or self-preservation (e.g., perjury).

In the twentieth century, fraud matured in the area of transactional businesses, most notably in the telecommunications and credit card industries. Due to the sheer volume of transactions in these businesses, fraud could go unnoticed fairly easily, because it was such a small proportion of the overall business. In the early days of the telecommunications business, "social engineering" was used to convince telephone operators to give access to lines or complete calls that were not authorized. In the 1950s, AT&T started automating direct-dial long distance calling, exposing themselves to the first generation of hackers. Because fraud could now be perpetrated without speaking to a human, it could be automated.

In the early days of fraud, perpetrators (hereinafter referred to as fraudsters) were able to inflict significant losses on telecommunications companies, resulting in billions of dollars in uncollectable revenue. As time went on, these companies tried many techniques to combat the fraudsters, with some success. Thus began an "arms race" that has continued to this day, with the legitimate businesses and the fraudsters aiming to stay one step ahead of each other.

AT&T has played a significant role in fraud detection over the years. As a telephone monopoly, and one of the first companies to use technology automating communications, AT&T was a prime target of the technologically curious. A new subculture of telecommunications fraudsters (known as phreakers) was born when Joe Engressia, a famous early phreaker, stumbled into the world of fraud in 1957. He realized that by whistling at certain frequencies, he could control the trunks that automated call routing (Rosenbaum 1971). In the following years, he perfected the art of breaking into AT&T's phone system. Joe is quoted as saying:

I want to work for Ma Bell. I don't hate Ma Bell the way some phone phreaks do. I don't want to screw Ma Bell. With me it's the pleasure of pure knowledge. There's something beautiful about the system when you know it intimately the way I do.

Although Engressia's interest was purely academic, others used similar knowledge to create devices to hack into the phone systems and provide free calls to users at AT&T's expense.

In the decades that followed, communications became increasingly complicated, as did the fraud schemes that targeted them. Large company telecom systems called "private branch exchanges" (PBX), cellular communications, and virtual private networks are a few examples of some of the new technologies that telecommunications companies have had to protect from fraudsters. Today, AT&T continues to be a main target of fraudsters. Although the incremental cost of carrying a single call has become quite small in today's large networks, significant money is lost through foreign settlement costs and

© 2010 American Statistical Association and the American Society for Quality

TECHNOMETRICS, FEBRUARY 2010, VOL. 52, NO. 1, DOI 10.1198/TECH.2009.08136


domestic access charges between companies. The Communications Fraud Control Association (cfca.org) periodically estimates the extent of worldwide telecommunications fraud. In 1999 this estimate was $12 billion, in 2003 it was between $35 and $40 billion, in 2006 it was between $55 and $60 billion, and in 2009 it was between $70 and $78 billion. Other industries, most notably the credit card industry, also have been challenged with fighting sophisticated fraud schemes, with varying levels of success.

One characteristic of fighting fraud that reaches across most transaction-based industries is the massive scale of the problem. "Massive" is a hard word to define and is clearly relative to the analyst's resources and the era. In the case of current telecommunications and credit card fraud, there are billions of transactions per day plus associated metadata. Clearly, data storage, compression, and management must be a well-designed part of any fraud detection scheme.

With the rise of the Internet and Internet-based businesses in the 1990s and early 2000s, new types of fraud began to emerge. More transactions and purchases were taking place online, and new companies, such as eBay.com and Amazon.com, had to concern themselves with fraud detection. The Internet era has spawned new types of fraud, including click-fraud, email spam, phishing schemes, and denial of service attacks. In addition, the Internet provides access to data about individuals that was previously much more difficult to collect and consolidate, resulting in the serious problem of identity theft.

In recent years, fraud modeling and detection has become an academic pursuit; attempts to summarize the state of the art include reviews by Bolton and Hand (2002) and Phua et al. (2005). In this article we review some of the strategies that AT&T has used to fight fraud. Although fraud cuts across many industries and takes many shapes and forms, we believe some of the lessons learned over the decades of experience at AT&T are relevant, and that there are some common philosophies that apply to fighting many different kinds of fraud. Some of these philosophies are specific to problems involving massive data sets and thus are quite relevant to new types of fraud emerging in the twenty-first century.

The article is organized as follows. Section 2 discusses the nature of fraud. Section 3 explores how we have gone about fighting fraud over the years. Section 4 covers issues in implementation of fraud systems. Finally, Section 5 summarizes the lessons learned at AT&T and how these might apply to the fraud detection problems of today and tomorrow.

    2. THE NATURE OF FRAUD

The most difficult aspect of fighting fraud is identifying it. In the context of telecommunications, a fraudulent phone call is one in which there is no intent to pay—theft of service. When fraudulent phone usage is found, the normal response is to shut down whatever avenue the fraudster found, although in egregious cases, law enforcement may be involved.

Usage management is a business function closely related to fraud detection, dealing with accounts that are likely to be uncollectable. There is no sharp line between intent to pay and ability to pay; thus it is often reasonable to have the same detection tools look for both fraud and usage management problems. In fact, fraud often will masquerade as a usage management problem; rather than seeking clever ways to subvert the telecom networks, the fraudster will simply subscribe to a legitimate service, but with no intention to pay.

Historically, fraud takes many different forms, ranging from teenagers hacking into systems from their bedrooms to sophisticated organized crime rings (Angus and Blackwell 1993). Our goal was to create a fraud management system that was powerful enough to handle the many different types of fraud that we encountered and flexible enough to potentially apply to things we had not seen yet. We next provide examples of some common varieties of fraud in the telecommunications world.

1. Subscription fraud. Subscription fraud happens when someone signs up for service (e.g., a new phone, extra lines) with no intent to pay. In this case, all calls associated with the given fraudulent line are fraudulent but are consistent with the profile of the user.

2. Intrusion fraud. This occurs when an existing, otherwise legitimate account, typically a business, is compromised in some way by an intruder, who subsequently makes or sells calls on this account. In contrast to subscription fraud, the legitimate calls may be interspersed with fraudulent calls, calling for an anomaly detection algorithm.

3. Fraud based on loopholes in technology. Consider voice mail systems as an example. Voice mail can be configured in such a way that calls can be made out of the voice mail system (e.g., to return a call after listening to a message), as a convenience for the user. However, if inadequate passwords are used to secure the mailboxes, it creates a vulnerability. The fraudster looks for a way into a corporate voice mail system, compromises a mailbox (perhaps by guessing a weak password), and then uses the system to make outgoing calls. Legally, the owner of the voice mail system is liable for the fraudulent calls; after all, it is the owner that sets the security policy for the voice mail system.

4. Social engineering. Instead of exploiting technological loopholes, social engineering exploits human interaction with the system. In this case the fraudster pretends to be someone he or she is not, such as the account holder, or a phone repair person, to access a customer's account. Recently, this technique has been used by "pretexters" in some high-profile cases of accessing phone records to spy on fellow board members and reporters (Kaplan 2006).

5. Fraud based on new technology. New technology, such as Voice over Internet Protocol (VoIP), enables international telephony at very low cost and allows users to carry their US-based phone number to other countries. Fraudsters realized that they could purchase the service at a low price and then resell it illegally at a higher price to consumers who were unaware of the new service, unable to get it themselves, or technologically unsophisticated. Detecting this requires monitoring and correlating telephony usage, IP traffic, and ordering systems.

6. Fraud based on new regulation. Occasionally, regulations intended to promote fairness end up spawning new types


of fraud. In 1996, the FCC modified payphone compensation rules, requiring payphone operators to be compensated by the telecommunication providers. This allowed these operators to help cover the cost of providing access to phone lines, such as toll-free numbers, which do not generate revenue for the payphone operator. This spawned a new type of fraud—payphone owners or their associates placing spurious calls from payphones to toll-free numbers simply to bring in compensation income from the carriers.

7. Masquerading as another user. Credit card numbers can be stolen by various means (e.g., "shoulder surfing"—looking over someone's shoulder at a bank of payphones, say) and used to place calls masquerading as the cardholder.

There are many more fraud techniques, some of which are quite sophisticated and combine more than one known method. Telecommunications fraud is not static; new techniques evolve as the telecom companies put up defenses against existing ones. The fraudsters are smart opponents, continually looking for exploitable weaknesses in the telecom infrastructure. Part of their motivation is accounted for by the fact that once an exploit is defined, there are thousands (or millions) of potential targets. New types of fraud appear regularly, and these schemes evolve and adapt to attempts to stop them.

    3. DETECTING FRAUD

A fraud detection system must be flexible to respond quickly and effectively to a variety of fraud types. There is one simple rule to follow in detecting fraud: "Follow the money." When the notorious Willie Sutton was asked "Why do you rob banks?," he replied "Because that's where the money is." Similarly, telecom fraud most often involves international calls, because those calls are the most costly. Although early phone phreakers often did their exploits for the "glory," much of current telecom fraud is involved with making money. If a fraudster can steal a phone call and then sell it to someone else, that is pure profit. This is also why there are various schemes associated with audiotext numbers: high-cost, international chat lines or lines with adult content. They are expensive services that can generate lots of revenue quickly, and fraudsters can receive a cut of the proceeds from these expensive calls.

Fighting fraud is complicated by the existence of multiple telecom carriers. Since equal access was instituted in 1982 (Kellogg, Huber, and Thorne 1999), hundreds of long-distance carriers have come on the scene, and they can be used on a per-call basis by dialing a carrier prefix when placing a call. Thus fraudsters can easily migrate from one carrier to another; just as paying customers have a choice of long-distance carriers, so do the fraudsters. That means that once a carrier detects and stops fraudulent usage on its network, that usage is likely to migrate onto another network. Notifying a PBX owner to secure the compromised system is often the only way to ensure that all of the leaks are plugged.

There is an interesting phenomenon associated with the differential ability of telecom companies to carry out this task. In the fraud detection game, when one company is better than the competition at detecting and stopping fraud, the fraud tends to move to the competitors, where the fraudsters' "cost of doing business" is lower.

Ideally, a fraudulent call is detected as soon as it is made, but depending on the type of fraud, detection may require a sequence of several calls. In any case, the sooner the fraud is detected, the sooner corrective action can be taken. If a person is involved in the detection/correction phase of the process, it is important that this person be available quickly as well.

Interestingly, as with any consumer-related transactional service, there is an excellent distributed fraud detection system: customer bills. Every month, millions of customers look over their bills and complain if they see fraudulent calls there. Unfortunately, however, the 1-month billing cycle may allow fraud to run rampant for a few weeks, which is unacceptable from the company's perspective. Also, with more consumers turning to online billing, increasing numbers of people never look at their bill. Nonetheless, customer bills provide a great safety net, identifying fraud that might have been overlooked by a fraud detection system. Recently there has been an increased academic interest in modeling fraud and documenting the successes (Fawcett and Provost 1997; Moreau et al. 1997; Rosset et al. 1999; Bolton and Hand 2002; Cahill et al. 2002; Phua et al. 2005). Most of these attempts in the statistical literature focus on the core fraud detection algorithm. In our experience at AT&T in the context of fraud detection for massive data streams, there is no way to effectively separate the data analysis from the larger issues of data delivery, data management, and data storage. In this section we describe the framework that has developed over the years and how the data analysis fits into this broader scheme.

There are several key components to a fraud detection system:

• A continuing source of call detail data
• A database to store the data
• A set of detection algorithms
• People to verify fraud and implement corrective measures
• Visualization tools to help people make diagnoses.

    We next describe our experience in each of these areas.

    3.1 Data Sources

The data that we analyze in our fraud detection system comprises telecommunication records—more simply, records of phone calls. The essence of a phone call can be described by a small number of variables: originating and terminating telephone numbers, time stamp and duration of the call, and perhaps some other flags denoting the type of call it was (e.g., toll-free, operator-assisted). But even for something as simple as a phone call, the scale of the network and the legacy of its many, varied components combine to create a large collection of obscure and complex data sources.

The most common source of data describing telephone calls is produced in the industry-standard automated message accounting (AMA) format used to create phone bills. Because this format is designed for billing, the AMA data are comprehensive, as described in the 2300-page GR-1100-CORE (Telcordia Technologies 2005). AT&T collects information on billions of


local, long distance, cellular, and international calls each day, amounting to hundreds of gigabytes of data. For billing purposes, daily receipt of AMA data is sufficient, but because fraud detection often may occur on a much shorter time scale, daily receipt is too slow. Originally AMA was sent from the network switches to the billing systems on magnetic tape, but now it is transmitted electronically and often can be sent in real time. Note that even with real-time AMA data, there is still the drawback that the details for a call are available only after the call is complete.

Data from cellular calls also are available, although typically not in a standardized form like AMA. Instead, the equipment vendors for cellular switches typically have their own proprietary and complex call detail formats, each of which must be understood on a deep level.

Another source of data is the SS7 signalling data used to set up calls (International Telecommunication Union 1988, 1993). The Common Channel signalling network uses the SS7 protocol. It is independent of the voice network, and it was designed, at least in part, to prevent fraud by keeping the signalling away from the voice channel. Data collected on the signalling network is similar in volume to the AMA data and has the same advantage of real-time collection; it has the additional advantage that it can be sampled at call setup, avoiding the delay until call completion inherent in AMA.

Another data source is the Line Information Database (LIDB), which gives the status of other calls, such as bill-to-third-party, collect, and calling card calls (Telcordia Technologies 2000).

Along with all of these data, we also must deal with a continuous stream of metadata in the form of dozens of different tables in many different formats. Each of these provides insight into the meaning of the call details.

It is normal to think in terms of volume in discussions of large data streams. What we have learned over the years is that, at least in this application, a large data stream brings greatly increased complexity. This repeatedly forces us to dig deeper to understand the meaning of the record fields themselves, which in turn leads to changes in how we codify and process the data. This tight integration of data analysis and data management is a direct result of the size (volume and complexity) of our data sources.

Incidentally, data produced by deterministic software (as ours is) might be expected to be easier to understand. In fact, this is not true. Often the programs that run in telecommunication switches have many millions of lines of code and thus can produce data that is just as demanding to interpret as that produced from natural phenomena—determinism is effectively an illusion.

    3.2 Storage

In a modern computing environment, it makes economic sense to store all of the call detail data for an extended period. Storing these data in a separate database simplifies the fraud detection system; when call detail is needed, it is a simple matter to query it from the database.

Database technology is well established, although the volume of data and the need for real-time loading make selection of the database management system (DBMS) important. We required a database that could be updated in real time and that could hold vast amounts of historical data in minimal disk space. The ability to retrieve thousands of call records associated with a particular phone number quickly was important. We also required efficiency in machine resources, so that we could get by on standard commercially available hardware. Because it fulfilled these requirements, we chose to use the Daytona DBMS (Greer 1999).

Because Daytona uses both field-level and record-level compression, the actual disk space for the database, including extensive indexes, is smaller than the size of the raw data alone. The compression also speeds up retrieval time for queries, because fewer bytes are accessed from disk. A typical query that would retrieve the most recent 2,000 international calls for a particular phone number would take less than a minute.

Our strategy is to store all of the call detail for several years to provide a lengthy historical reference. In 2005, our call detail database was recognized as the largest data warehouse, based on normalized data volume, in the Winter Corporation's survey of the world's largest databases (Winter Corporation 2005).

As the call detail arrives, each record is augmented with supporting information, such as the estimated retail cost of the call, the type of service (direct dial, collect, bill to third party), and whether the calling number is a business or a residence. In addition, it is important to ensure the quality of the data in the database by making as many consistency checks as possible.

Once the call detail database is established, it fills many needs beyond providing information for the fraud system. Having a database allows for retrospective analyses and provides the means of testing new algorithms on historical data. It also provides the historical view of individual calling patterns, so that deviations from normal can be seen.

    3.3 Detection

Because a fraud detection system processes huge volumes of data, detection is a "needle in a haystack"–type problem. The system must automatically filter the data looking for unusual usage patterns, and then algorithms or people can look at the unusual parts to zero in on the potential fraud.

3.3.1 Early Threshold-Based Alerting. Early fraud detection algorithms often used thresholds, delivering alerts when the number of messages and the number of minutes of calling within a specific time period exceeded a preset threshold. A drawback to this method is that if the time period is, say, an hour, then the fraud continues on average for 30 minutes without an alert. Another important drawback is that this method treats customers identically; if a threshold is set so that it finds fraud for a small business line, then a large business is likely to trip the threshold routinely.
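This style of windowed thresholding can be sketched in a few lines. The counter names, the specific threshold values, and the hourly reset are illustrative assumptions, not details from the original systems:

```python
from collections import defaultdict

# Hypothetical per-window thresholds; the early AT&T systems grew to
# roughly 30,000 finer-grained values (hour of day, day of week,
# country called) in an attempt to tune alerting.
MAX_CALLS_PER_WINDOW = 50
MAX_MINUTES_PER_WINDOW = 200.0

class ThresholdAlerter:
    """Alert when calls or minutes within the current window exceed a preset threshold."""

    def __init__(self):
        # phone number -> [call count, total minutes] for the current window
        self.window = defaultdict(lambda: [0, 0.0])

    def observe(self, number, duration_minutes):
        counts = self.window[number]
        counts[0] += 1
        counts[1] += duration_minutes
        return counts[0] > MAX_CALLS_PER_WINDOW or counts[1] > MAX_MINUTES_PER_WINDOW

    def reset_window(self):
        # Called at the top of each window (say, each hour); fraud inside
        # the window goes unnoticed until a threshold is crossed.
        self.window.clear()
```

The key weakness is visible in the code: the same constants apply to every customer, so a threshold tight enough to catch fraud on a small business line will be tripped routinely by a large PBX.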

To better identify fraud, thresholds moved to finer granularity. Separate thresholds were provided for hour of the day, day of the week, and country called. Ultimately, one of the early AT&T fraud systems had about 30,000 threshold values because of this desire to fine-tune alerting, which led to difficulties in managing all of the thresholds while still failing to treat customers individually.

Even with such a fine-grained set of thresholds, there is a wide variety of usage from customer to customer, so that thresholds set tightly enough to catch fraud often catch lots of legitimate use as well. In addition, as more independent thresholding variables are introduced, eventually there will be more thresholds than customers (i.e., the curse of dimensionality).


As this realization dawned on us, it seemed like a good idea to specialize thresholds to individual customers. We would keep a signature for each customer phone number, containing a rough idea of that phone's current calling characteristics. This is how several of us at AT&T Bell Laboratories got involved in fraud detection in 1995—working to devise a customer-specific signature-based alerting system. Details of signature-based systems have been reported previously (Cortes and Pregibon 2001; Lambert, Pinheiro, and Sun 2001; Cahill et al. 2002); we give a brief description here.

3.3.2 Signature-Based Alerting. The basic idea behind signature-based alerting is that as each call comes in, it is compared to the current signature for that customer and also to a generic fraud signature. If the call looks more like the customer usage pattern than fraud, then the characteristics of the call are used to update the signature according to an adaptive exponentially weighted moving average (EWMA). But if the call looks substantially more like fraud than like the customer signature, then the fraud score will be increased for the phone number. Once that per-customer fraud score passes a threshold, an alert is generated.

Notice that the signatures are self-updating; as new data are seen, the signature becomes more representative of the customer's current calling behavior and is able to track evolutionary changes in the customer's behavior.
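The per-call alerting loop can be sketched as follows. The likelihood functions, the log-ratio scoring, and the alert threshold are placeholders for internals not published in the paper:

```python
import math

def process_call(call, state, p_customer, p_fraud, update, alert_threshold=10.0):
    """One step of signature-based alerting (a sketch, not the AT&T system).

    p_customer and p_fraud score how well the call matches the customer
    signature and a generic fraud signature; update() folds the call into
    the signature via the EWMA of eq. (1). All three are placeholders.
    Returns True when the accumulated fraud score crosses the threshold.
    """
    lc = p_customer(call, state["signature"])
    lf = p_fraud(call)
    if lf > lc:
        # Call looks more like fraud: raise the per-customer fraud score
        # and do not let the call contaminate the customer signature.
        state["fraud_score"] += math.log(lf / lc)
    else:
        # Call looks like the customer: update the signature instead.
        state["signature"] = update(state["signature"], call)
    return state["fraud_score"] > alert_threshold
```

Keeping fraudulent-looking calls out of the signature update is the important design choice here; otherwise a sustained fraud episode would gradually teach the signature that the fraud is normal.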

    Signatures are based on various univariate measures:

• Calling rate (calls/hour)
• Distribution of calls by day of week
• Distribution of calls by hour of day (for both weekday and weekend)
• Distribution of call duration
• Regions of the world called
• Most frequent countries called
• Most frequent phone numbers called
• Whether calls are charged, and the billing numbers used.

All of this information is stored in about 500 bytes per customer and contains simple statistical summaries of the properties of that particular user, as shown in Figure 1.

Each parameter X in the signature is updated by a version of an EWMA,

X_n = θ·D_c + (1 − θ)·X_p,  (1)

where X_p and X_n are the previous and new values of the parameter and D_c is the information that comes from the current call.
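In code, eq. (1) is a single line per parameter. A minimal sketch for a scalar parameter such as a calling-rate estimate (the numeric values below are illustrative, not from the paper):

```python
def ewma_update(x_prev, d_current, theta):
    """Eq. (1): blend the current call's contribution into the stored
    signature parameter. A larger theta forgets history faster."""
    return theta * d_current + (1.0 - theta) * x_prev

# Example: a stored rate estimate of 2.0 calls/hour drifts toward a
# sustained burst of activity over three updates.
rate = 2.0
for observed in [10.0, 10.0, 10.0]:
    rate = ewma_update(rate, observed, theta=0.5)
# rate is now 9.0: 2.0 -> 6.0 -> 8.0 -> 9.0
```

Note that the update needs only the stored value and the current observation, which is exactly the streaming property the text highlights: no previously seen data need be re-read.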

These types of EWMA models (Winters 1960) have been widely used in statistical forecasting and statistical process management. The properties of EWMA models have been studied extensively and are covered in standard textbooks (e.g., Abraham and Ledolter 1983). EWMAs update parameters in a smoother fashion than simple moving averages and allow for streaming updating of parameters without the need to access data that have already been seen.

Two important parameters must be set in an EWMA: the parameter θ and the updating interval. Together, these determine how quickly new information washes out old information. If θ is large, then the decay curve is steep and new information quickly washes out old information. If the updating interval is small, as in call-based updating, then θ might have to be adjusted so that signal from frequent callers does not get washed out too fast. In applications with time-based updating, it is often sensible to set θ globally for all signatures. As an example, consider an application with daily updating. With θ = 0.85, the parameter value will effectively discount all data (by reaching 0.1% of its value) after 30 days, while θ = 0.5 will reach the same value in just over a week. The type of fraud that we are trying to detect might inform which of these is more reasonable.

The most time-sensitive applications tend to use event-based updating, with eq. (1) applied call by call. Here we set a different θ for each signature as a nonlinear function of the calling rate, to ensure that profiles with heavy activity are not diluted. To address this, we make θ smaller as the calling rate r, measured in calls per week, increases; basically, we multiply θ by 1/log2(r) to ensure that old information is not washed out of the estimate too quickly. Setting a different θ for each signature adds a significant amount of complexity to the modeling. A middle ground between global parameter setting and individually varying parameters is to segment the data into relevant categories (e.g., residences and businesses) and fit the parameter separately for each segment.
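One plausible reading of the rate-dependent adjustment is sketched below. The clamping of rates below two calls per week is our assumption; the paper does not say how that edge case (where log2(r) ≤ 1 would inflate rather than shrink θ) is handled:

```python
import math

def effective_theta(theta, calls_per_week):
    """Shrink theta as the calling rate grows, per the 1/log2(r) rule,
    so a heavy caller's history is not washed out call by call."""
    r = max(calls_per_week, 2.0)  # assumed guard so log2(r) >= 1
    return theta / math.log2(r)
```

For example, with a base θ of 0.5, a line making 128 calls per week gets an effective θ of 0.5/7, so each individual call moves the signature only slightly, while a two-call-per-week line keeps the full θ = 0.5.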

Another way of setting θ was proposed by Hill et al. (2006). They optimized θ by looking at the signature as a predictive function. The best value of the parameter is the one that best predicts future behavior of the phone number. From this perspective, θ can be learned by segmenting the data into training and test sets and using machine learning methods to minimize the predictive error.

Another challenge in the computation of (1) is maintaining a "top-k" list. For example, we might wish to track the most frequently called countries or phone numbers or the most frequently used billing numbers. We do this essentially by giving each number not in the current top-k list a small chance of replacing the last item on the list. This problem has been studied extensively in recent years (see, e.g., Metwally, Agrawal, and Abbadi 2005).
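The probabilistic replacement just described can be sketched as follows. The replacement probability is our assumption (the paper says only "a small chance"), and Metwally et al.'s Space-Saving algorithm is a deterministic alternative:

```python
import random

def update_top_k(top_k, item, k=10, p_replace=0.01, rng=random):
    """Maintain an approximate top-k list of (item, count) pairs in place.

    Items already on the list get their counts bumped; an unseen item
    has a small chance of displacing the last (least-frequent) entry.
    """
    for i, (name, count) in enumerate(top_k):
        if name == item:
            top_k[i] = (name, count + 1)
            top_k.sort(key=lambda entry: -entry[1])  # keep descending by count
            return top_k
    if len(top_k) < k:
        top_k.append((item, 1))
    elif rng.random() < p_replace:
        top_k[-1] = (item, 1)
    return top_k
```

The appeal for a signature stored in roughly 500 bytes is that memory stays fixed at k entries per tracked quantity, while frequently recurring items still climb the list quickly.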

An interesting special case is how to initialize the signature for a phone number. When there is no prior signature, data values from the first two calls are used to select an initial signature from a set of empirically built signatures. In addition, old signatures that are not reinforced by new calls within a month or so are dropped completely, because after that much time without calls, the phone number could well have been reassigned.

As mentioned earlier, the overall fraud score for a particular phone number may increase call by call. To complement this growth, the fraud scores are decreased every day, leading old scores to decay away unless they are reinforced by new indications of fraud. In this way, the fraud score is an indicator of recent fraud activity.
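A nightly decay pass might look like the sketch below. The multiplicative decay factor and the cutoff for dropping negligible scores are assumptions; the paper says only that scores are decreased every day:

```python
def decay_scores(scores, factor=0.5, floor=0.01):
    """Nightly pass: shrink every per-number fraud score so that old
    alerts fade unless reinforced by fresh indications of fraud.
    Scores that decay below a small floor are dropped entirely."""
    return {number: s * factor
            for number, s in scores.items()
            if s * factor > floor}
```

Combined with the per-call score increments, this makes the fraud score behave like a leaky accumulator: sustained fraudulent activity keeps it high, while a one-off anomaly fades within days.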

    Signature-based alerting tends to do a good job finding changes in calling patterns that indicate fraud. No one detection algorithm can detect all types of fraud, however. In our system, we use many detection algorithms, each of which tends to be simple, rather than a single complex algorithm. There is power in many simple models as opposed to a single complex model. Various detection algorithms may generate an alert on the same phone number, and these alerts are combined into a single case to be investigated. The case manager is a component of the system and can decide which cases should be investigated first, based on the type of fraud and the likelihood of fraud (as opposed to a false alarm).

    TECHNOMETRICS, FEBRUARY 2010, VOL. 52, NO. 1


    FRAUD DETECTION IN TELECOMMUNICATIONS 25

    Figure 1. Statistical profile for a typical legitimate customer.

    Signatures extend naturally to help solve other problems in the telecom business. For example, a common question asked about a phone number is whether it belongs to a business or to a residence. Various characteristics of each call tend to mark its likelihood as a business or residence (e.g., hour of day, duration). By keeping a signature that records how business-like a phone number is, we can determine empirically how each phone is behaving.

    3.3.3 Moving to Graph-Based Signatures. Statistics-based signatures, such as those shown in Figure 1, have proven very useful in the type of anomaly detection-based alerting that we need to catch many types of fraud. Social networks of known fraudsters are another source of information that can help identify new fraud cases. We discovered that fraudsters often belonged to communities, in that they either communicated with one another or communicated through intermediaries that could be identified through a network analysis. To effectively use social networks to fight fraud, we had to conceptualize the call detail data as a graph, in which nodes are phone numbers and directed edges represent communication between those numbers. We call this the callgraph network, and let G_t denote the callgraph network at time t. The size of the callgraph network for all telephone numbers is on the order of hundreds of millions of nodes and billions of edges, yet the social network of any individual number consists of the small subset of nodes that are “close” to that node. Social networks for an individual number are usually on the order of dozens, but we need to be able to extract that social network easily for any one of the hundreds of millions of numbers of which we are aware. To better cope with the scale, we developed a framework known as the community of interest (COI) signature for each telephone number (Cortes, Pregibon, and Volinsky 2001). This signature includes the top numbers (top-k) called by the target number and the top numbers that call the target number. This allows us to look at the massive graph at a local scale, where each phone number has its own small graph and the union of these graphs makes up the bulk of G_t.


    26 RICHARD A. BECKER, CHRIS VOLINSKY, AND ALLAN R. WILKS

    This signature has the additional challenge of dealing with the fact that phone numbers are transient—many new phone numbers appear regularly as new lines are provisioned, and many old ones disappear as people move or transition to other services. A 2003 study (Cortes, Pregibon, and Volinsky 2003) showed that in a given week, about 1% of all phone numbers disappear, and a similar percentage of new phone numbers appears. Over the course of several months, there is the potential for a large turnover of the phone number space, which we account for in the graph signatures by a variant of the exponential smoothing in eq. (1). Let Ĝ_{t−1} denote the top-k approximation to G_{t−1} at time t − 1, and let g_t denote the graph derived from the new transactions at time step t. The approximation to G_t is formed from Ĝ_{t−1} and g_t, node by node, using a top-k approximation to eq. (1),

    Ĝ_t = top-k{θ g_t ⊕ (1 − θ)Ĝ_{t−1}},   (2)

    where the ⊕ operator is a graph sum operation that takes the union of the nodes and edges in the two graphs for the aggregate graph. The “top-k” part of the updating is a pruning function that includes only the neighbor nodes with the highest weight in the COI signature. Everything that is not included in the top-k edges gets aggregated into an overflow bin called other. Figure 2 shows an example of this updating scheme. Here k = 9, and the COI signature contains the top 9 other numbers called. The middle panel shows the calls that were made today; these calls include one number that does not currently exist in the signature. The final panel shows the resulting blend of old information and today’s data. The new number has knocked a low-weight edge into the other bin, and the other weights have either been updated or decayed based on whether a call was seen to that number today.

    Truncation of the signature in this fashion ensures that only the most relevant nodes will make it into the signature. In general, calling behavior is heavily skewed, such that the vast majority of calling minutes are concentrated on the caller’s top few friends. We set θ and k such that typically 95% of all communication behavior is accounted for in the top-k links (Hill et al. 2006).
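The update in eq. (2) and Figure 2 can be sketched on dict-valued signatures (neighbor → weight, with an "other" overflow key); the function name and parameter defaults are ours, with θ = 1/6 roughly matching the rounded weights in the figure:

```python
def update_coi(old, today, theta=1/6, k=9):
    """Top-k approximation to the exponentially smoothed edge set,
    Ghat_t = top-k{ theta*g_t (+) (1-theta)*Ghat_{t-1} }  (eq. 2).
    Weight that falls out of the top k is folded into 'other'."""
    blended = {n: (1 - theta) * w for n, w in old.items()}
    for n, w in today.items():
        blended[n] = blended.get(n, 0.0) + theta * w
    other = blended.pop('other', 0.0)
    kept = sorted(blended.items(), key=lambda kv: -kv[1])
    new = dict(kept[:k])
    new['other'] = other + sum(w for _, w in kept[k:])
    return new
```

The graph sum ⊕ becomes a key-wise addition of the two weight dicts, and the pruning step is just a sort followed by a cut at k.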

    Once we have COI signatures for each phone number, automatically updated through the exponential weighting process, these serve as a surrogate for doing analysis on the entire graph. Conceptually, we have now taken our communications graph G_t, which is of the order of hundreds of millions of nodes and billions of edges, and replaced it with a database of hundreds of millions of graphs, with each graph stored in an indexed database and readily retrievable. As such, we can extract more sophisticated graphs by applying this process recursively and including nodes in the extended COI signature that are two, three, or more hops away from the target node.
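Because every number's top-k neighbors are directly retrievable, a multi-hop extraction is just a short breadth-first walk over stored signatures (a sketch; `coi_db` and the function name are our stand-ins for the indexed signature database):

```python
def extended_coi(coi_db, target, hops=2):
    """Extract the multi-hop social network around `target` by
    recursively looking up stored COI signatures.  `coi_db` maps each
    number to the set of its top-k neighbors."""
    seen = {target}
    frontier = {target}
    for _ in range(hops):
        nxt = set()
        for number in frontier:
            nxt |= coi_db.get(number, set()) - seen
        seen |= nxt
        frontier = nxt
    return seen
```

Each hop touches only the few dozen neighbors stored per signature, so even a three-hop extraction is cheap despite the billions of edges in the full graph.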

    These graph-based signatures have proven very useful in detecting different types of fraud. In one example, owners of adult-content chat lines attempted to set up fraudulent toll-free phone numbers that terminated at their chat lines. Typically, these customers would pretend to be a legitimate business, such as a flower shop or candy store, perhaps guaranteeing the account with a stolen credit card. The service would be activated and the number disseminated to the consumers of the service, who would call into the service for free. After a while, the bill goes unpaid and the number is shut down, but because we have phony information, there is no one to prosecute, leaving the fraudulent business owner free to start over again with a new chat line.

    Although detecting the bad guys is difficult, monitoring the COI signatures of the users of the service led us to the next fraudulent line scheme. These customers are not necessarily fraudulent themselves, but they belong to a group of people who use fraudulent services. In addition, the fraudsters themselves tend to communicate, or at least to belong to an affinity group of people who try to scam phone companies.

    Based on examples like the foregoing, we established a “guilt by association” module to lead us to new fraud cases. Figure 3 represents the network for a typical case. In this example, there is a new suspicious number (labeled “suspect”). After this number has established its COI signature through calling behavior, we build an extended social network and visualize it in the graph shown in the figure. By coloring in the nodes of known fraudulent numbers, we see that this suspect has 5 known fraudsters in her network. Our models translate the number of known fraudsters and their connectivity with the suspect into a probability of fraud. This probability of fraud gets calculated for all new numbers, and a ranked list is sent to our fraud investigators.
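The paper does not spell out the exact model; as a hedged illustration, a score of this kind could be a logistic function of a connectivity-weighted count of known fraudsters in the suspect's network (the function name, the intercept of −2.0, and the weights are all invented for the example; the real models were fit to investigator-labeled cases):

```python
import math

def guilt_score(network, known_fraud, weights):
    """Illustrative guilt-by-association score: a logistic function of
    the known-fraudulent nodes in the suspect's network, each weighted
    by how strongly it is connected to the suspect."""
    evidence = sum(weights.get(n, 0.0) for n in network if n in known_fraud)
    return 1.0 / (1.0 + math.exp(-(evidence - 2.0)))  # intercept assumed
```

Scores computed this way for all new numbers can then be sorted to produce the ranked list handed to investigators.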

    (1 − θ) × Old top-k edges:
      652-5467 5.2, 756-2656 5.0, 652-4132 4.5, 653-4231 2.3, 624-3142 1.9,
      735-4212 1.8, 423-1423 0.8, 534-2312 0.5, 526-4532 0.2, Other 0.1

    + θ × Today’s edges:
      543-6547 10.0, 756-2656 6.2, 652-5467 2.0, 652-4132 0.8, Other 0.0

    = New top-k edges:
      756-2656 5.2, 652-5467 4.6, 652-4132 3.9, 653-4231 2.0, 624-3142 1.6,
      735-4212 1.5, 543-6547 1.5, 423-1423 0.7, 534-2312 0.4, Other 0.3

    Figure 2. Computing a new top-k edge set from the old top-k edge set and today’s edges. Note how a new edge enters the top-k edge set, forcing the addition of an old edge to “other.”


    Figure 3. A guilt-by-association plot for the node labeled “suspect.” The elliptical nodes correspond to toll-free accounts, and the rectangular nodes are conventional accounts. Shaded nodes have been labeled as fraudulent by network security associates.

    3.3.4 Catching Fraud via Graph Matching. The COI signatures serve as a communications fingerprint, a way to characterize the usage not of a phone number, but rather of the person behind the phone number. As such, they have been useful in tracking down miscreants who try to cover their tracks by changing their phone number, name, or address. One example of this is our repetitive debtors database (RDD), designed to keep a running database of COI signatures of delinquent customers in an attempt to track them down if they tried to assume other identities. Consider the case where a customer is disconnected due to nonpayment. In some cases this customer might try to sign up for a new account under another identity using a different name, a different billing address, or a stolen credit card. Sometimes just permuting letters in the name is enough to avoid detection by the sales representatives. But in this case, even with different identifying information, the new account belongs to an individual likely to have the same communications fingerprint as the delinquent account. Our intuition is that if we build COI signatures on all new accounts and match them to the signatures of the delinquent accounts, we can find these fraudsters. This approach entails building a distance metric between COI signature graphs. The distance function is based primarily on the overlap between the two graphs (i.e., phone numbers appearing in both signatures) and the proportion of the overall communication accounted for by those overlapping nodes. In this way, we account for both the quality and the quantity of the overlap. Hill et al. (2006) proposed two such functions for graph distance. The first of these uses the Dice criterion (Dice 1945), which is commonly used in information retrieval for measuring similarity between documents and queries. For two sets A and B, the Dice criterion is

    D(A, B) = 2|A ∩ B| / (|A| + |B|),

    or twice the cardinality of the intersection of the two sets divided by the sum of the cardinalities of the sets. The Dice criterion is bounded between 0 and 1, with a value of 1 when the two sets are identical. We use a weighted version of this criterion to account for the weights on the edges.

    The second predictive criterion is based on the Hellinger distance (Beran 1977), which was designed to measure distances between statistical distributions. Applied to graph matching, the Hellinger distance becomes

    HD(A, B) = Σ_{j ∈ A∩B} √(w_A(j) w_B(j)),

    where w represents the weights on the edges in the overlapping part of the graphs. This sum is also bounded by 0 and 1 and is maximized when all elements of the predicted set appear in the test set.

    Both of these metrics are based on simple calculations of intersections and unions of graphs and can be efficiently calculated for large numbers of candidate pairs. In our application, we tested tens of thousands of new accounts daily to see if they matched up with a stored collection of thousands of known fraudulent COI signatures. This generated a daily case list of suspected “repetitive debtors” that would get handed off to the fraud team for further investigation.
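On COI signatures stored as weight dicts, both similarities reduce to a few lines. This is a sketch: the paper does not give the exact weighting of the Dice criterion, so we assume the common min-weight variant, and the Hellinger-based measure assumes each signature's weights are normalized to sum to 1:

```python
import math

def dice(a, b):
    """Weighted Dice similarity between two COI signatures (dicts of
    neighbor -> weight); overlap mass is the min of the two weights."""
    overlap = sum(min(a[j], b[j]) for j in a.keys() & b.keys())
    return 2.0 * overlap / (sum(a.values()) + sum(b.values()))

def hellinger_affinity(a, b):
    """Hellinger-based similarity: sum over overlapping neighbors of
    sqrt(w_A(j) * w_B(j)), for weights normalized to sum to 1."""
    return sum(math.sqrt(a[j] * b[j]) for j in a.keys() & b.keys())
```

Both functions touch only the (small) key intersections, so matching a new account against thousands of stored delinquent signatures stays cheap.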

    3.4 The Role of Humans

    Over the years, we have witnessed several attempts to build a system with intelligent “agents” that can learn the actions that a human would take in a given situation and apply them automatically thereafter. These agents are usually proposed as part of a system attempting to take the humans “out of the loop” and perhaps introduce some cost savings. It could be argued that detecting fraud and stopping it is best done solely by computer-controlled algorithms; however, in many cases, correctly diagnosing fraud is difficult without a person investigating. The same calling pattern could be fraud for one customer and normal usage for another. Humans are sensitive and creative in assessing calling behavior—they often can tell when a change in calling pattern makes sense and when it does not. In our experience, this intuition for different scenarios is extremely difficult to codify. Another good reason to have people in the loop is that it means that the fraud detection software does not have to be perfect (and thus difficult and complex to produce). Instead, the detection algorithms can provide alerts that function as clues rather than fully weighed conclusions. One way to think of this is that the algorithms point to potential crime scenes, and the person is an investigator looking for additional clues. The false alarm rate for alerts does not have to be zero, as it would need to be if algorithms alone were used to shut down a business customer’s telecommunications due to suspected fraud. (Usually only a human can strike the balance between letting fraud continue and potentially shutting down an essential business tool.) Personal verification (or disproof) of fraud provides a solid answer for each case, and those definitive answers can be used to help tune the detection algorithms for better performance.

    The task of working a case involves many user-controlled actions. Cases are assigned to analysts based on their areas of expertise, such as business, residential, or credit card. Initially, the analyst retrieves the call detail related to the key number and looks at it, either as a table for small sets of call detail or graphically (for larger data sets). The analyst can detect any suspicious patterns in the call detail (either with or without looking at the alerts generated by the system). If the analyst suspects fraud, he or she deals with it appropriately. For example, in the case of a business customer where fraud involves a PBX or voice mail system, the analyst will contact the customer and assist them in securing their equipment, offering to block traffic until the system is secure. Through the entire process, the analyst documents what has been done (e.g., call customer, leave voice mail, receive callback, implement a block) by interacting with the case manager software. Note that the case manager, which contains data on all events involved in that case (e.g., analyst notes, all alerts), now becomes another source of data that must be stored appropriately to be accessed for future cases.

    We find that the experienced case worker uses a significant amount of intuition in working a case. For instance, the worker can look at a case and understand that there is a trade-off in terms of the amount of money that can be recovered by working a case, the amount of effort required to investigate it, and the probability that this case is a false positive. We decided not to attempt to code this information into a single fraud “importance” score, but rather to provide the analyst with as much of the relevant information as possible, with a powerful case manager that allows him or her to sort and filter on relevant variables. One of the most important examples of this point involves visualization. Visualizing the data associated with a given case is not always appropriate or efficient (e.g., if only a very small number of relevant calls are associated with an alert). Several factors are considered when determining whether a sophisticated interactive visualization tool is the best way to investigate the data, or whether a simple table might suffice. We do not mandate visualization of every case; instead, we provide the analyst with all of the tools necessary to determine whether it is warranted in a given case.

    3.5 Visualization

    In many instances, the call detail associated with a given case is large enough that a visualization tool can be of great help. This tool enables a fraud analyst to view and understand perhaps thousands of calls at once. The goal is to display all the recent call detail, not just the part that looks “fraudy,” to allow the analyst to see the potentially fraudulent calls in the context of the normal calling pattern. For this, we provide a tool based on the Yoix language (Drechsler and Mocenigo 2007), which itself is built atop Java. The tool provides a plot with a time axis to show each call and also provides interactive histograms of various call characteristics that allow the analyst to display interesting subsets of the data.

    In the plots that follow, the horizontal axis is time, and the vertical axis is call duration. Each call is shown as a vertical spike. In these displays, spike color is determined by country called. Axes below the main display show tick marks that indicate when calls overlap one another (a potential sign of compromised use), when terminating phone numbers are on a list of known high-fraud numbers, and when the operator is involved. Shading (if present) shows normal business hours in gray and out-of-hours in white.

    We next present some examples. Figure 4 shows a year of call detail for seven phone numbers associated with a single customer. Even though 655 calls are shown, 2 heavy spikes stand out, emphasizing 2 periods during which fraudulent calling occurred.

    Figure 5 shows an egregious fraud event, in which the normal calling pattern is overlaid by many calls to a foreign country historically associated with fraud. Even though 7,000 calls are displayed here, detecting the fraudulent calls is easy, particularly because many of them are of the same duration.

    Figure 4. Calls from a set of seven phone numbers associated with a single customer over a 1-year period. Two sets of spikes are quite high, associated with two fraud incidents in the year.


    Figure 5. Light-colored blocks show calls to a specic foreign country occurring out of hours, starting on January 15 and continuing toJanuary 28. The display shows 7,000 calls. Notice the uniform height of calls at 45 minutes.

    Figure 6 shows another instance of fraud in this same country. In this case, the fraudulent calls occur out of normal business hours (which are shown by the shaded rectangles), in contrast to the normal calling pattern.

    The next example, shown in Figure 7, demonstrates how intense fraudulent activity can contrast with infrequent normal behavior. Identifying the fraud pattern here is particularly easy.

    Finally, Figure 8 shows a single social-engineering event superimposed over a year of regular calling. These calls are often of very long duration.

    These displays are static and monochromatic. But when analysts are investigating fraud, the displays are fully interactive and use color to encode categorical variables, such as “called country.” The analyst may, for example, pop up a histogram that shows the number of calls going to each country. A menu item can deactivate all countries, and a button click can reactivate the country that appears to be the recipient of the fraudulent calls. With just those calls displayed, the horizontal axis can be adjusted to narrow the time window. A rectangle swept over the tip of one or more of the call spikes causes the full call detail for those calls to be displayed in another window. If the analyst sees that one of the particularly long fraudy-looking calls is operator-assisted, then it is easy to use a histogram to display all of the operator-assisted calls.

    This tool is natural and powerful, allowing the analyst to perform an interactive data analysis session on the call detail, posing and answering questions that arise from seeing the calls.

    4. IMPLEMENTATION

    In 1998, AT&T implemented a new fraud detection system called the Global Fraud Management System (GFMS), built mostly by statisticians. How did a group of statisticians get involved in writing a production fraud detection system? That was not our plan. The initial concept was that we would provide a signature-based alerting algorithm to plug into a new fraud system to be built by other organizations within AT&T. But the new system did not get built through standard development channels, and we were left with a stand-alone detection algorithm. At that time, it became apparent that we would have to provide the entire fraud system to deliver the signature alerting. This was also under a strict deadline due to the Y2K effort within the company; the older fraud detection system was not Y2K-compliant.

    We succeeded because we had a small team and used simple components, many of which we had developed to carry out full-volume testing of the signature alerting. In particular, we had already constructed the call detail database and various tools to help with the huge flow of data. We also used extreme programming teams (Nosek 1998), in which two programmers worked jointly to produce code quickly and effectively.

    Figure 6. A 2-day set of fraudulent calls to a specific foreign country on the middle Tuesday and Wednesday. Note how these calls are primarily out of business hours, and that the general pattern for this phone is calls only during business hours.

    Figure 7. One year of calling behavior for a single phone line. The pattern of intense activity at the center is fraudulent, contrasting with the infrequent call pattern the rest of the year.

    The systems that we built all shared several important characteristics. They were implemented in UNIX-like environments, they used basic tools in those environments in small pieces of code, and these pieces were connected to one another in ways natural to the operating system. As an example, one of the paradigms that we used extensively to manage the flow of data was what we term “hose parts.” The word “hose” is intended to recall the well-known analogy comparing obtaining an MIT education with taking a drink from a firehose—processing continuously flowing data on a massive scale is like this. In our infrastructure, a “hose” is a UNIX directory with subdirectory “parts.” Each part is a place where data is received, stored, processed, and eventually deleted. The part has subdirectories to accomplish these tasks, and by using, for example, the inherently atomic nature of the UNIX “mv” command (which renames a file), we were able to elaborate a relatively complex set of generic operations with a surprisingly small amount of code.
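The atomicity trick can be sketched in a few lines (the function name and directory layout are invented for illustration; the real parts have more subdirectories for processing, auditing, and deletion):

```python
import os

def deliver(tmp_path, part_dir, name):
    """Deliver a finished file into a hose part's incoming directory.
    The file is written elsewhere first and then moved in with a single
    rename, so readers of `incoming` never see a partial file (rename
    is atomic within one filesystem, like the UNIX mv it mirrors)."""
    incoming = os.path.join(part_dir, 'incoming')
    os.makedirs(incoming, exist_ok=True)
    os.replace(tmp_path, os.path.join(incoming, name))
```

The same rename-based handoff moves a batch from one processing stage's directory to the next, which is what lets a generic set of directory operations stand in for a queueing system.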

    Our “vetter” is a typical user of the hose paradigm. We receive hundreds of thousands of files of AMA data daily from hundreds of telecommunication switches around the country. Each switch has a mechanism in place for serializing the data that it sends—marking it in such a way that we can recognize missed or duplicated records. Our vetter receives these many files in a single hose part and bundles them into larger batches, after performing extensive per-switch serialization checks. It also attempts to parse each record to detect corruption as soon as possible. Out of this hose part comes a single, parsable stream of batches with its own reliable serialization and with no duplicated or missing records. Other parts in this hose deal with compressing data for transmission to vetter clients, feedback mechanisms to the switches when problems are discovered, audit trails, and so on.

    Figure 8. One year of calls, showing a spike in the middle due to social engineering, with long-duration operator-assisted calls to a known high-fraud country.

    A second major paradigm that we use is the “streamer,” software that reads the call detail record by record and for each record passes it through a set of plug-ins. Each plug-in has three functions: initialization (once per file of records), processing (once per record), and wrap-up (once per file). The plug-ins are independent of one another; each is designed to look for a particular kind of call detail record and extract the relevant information from it. The initialization routine of a plug-in often opens a file to hold the extracted information. The processing function attempts to make a very fast decision on each record, rejecting it quickly if it is not of interest. Otherwise it extracts a few fields from the call detail record and writes them to the file opened by the initialization function. Finally, the wrap-up function closes any open files (and removes files that were never written to).
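A stripped-down sketch of the streamer and one plug-in (the class and method names are ours, and the production code was not Python):

```python
class Streamer:
    """Read call-detail records one at a time and pass each through
    every registered plug-in: initialize once per file, process once
    per record, wrap up once per file."""
    def __init__(self, plugins):
        self.plugins = plugins

    def run(self, records):
        for p in self.plugins:
            p.initialize()
        for rec in records:
            for p in self.plugins:
                p.process(rec)
        for p in self.plugins:
            p.wrapup()

class LongCallPlugin:
    """Example plug-in: extract calls over a duration threshold."""
    def __init__(self, threshold_minutes=180):
        self.threshold = threshold_minutes
    def initialize(self):
        self.hits = []
    def process(self, rec):
        if rec.get('duration', 0) > self.threshold:  # fast reject otherwise
            self.hits.append(rec)
    def wrapup(self):
        pass  # production plug-ins close their extract files here
```

Because plug-ins share only this tiny interface, a new detection idea becomes a new class dropped into the list, which is what made the 45-minute turnaround mentioned below possible.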

    The vetter is implemented as a streamer, as is the fraud detection software. In the latter case, files extracted from the call stream by each plug-in are typically processed by a job scheduled to run at regular time intervals. Often these jobs will apply thresholds of various sorts, producing alerts when appropriate. Each detection algorithm comprises a plug-in and the code that processes the extracted data to generate alerts. Plug-ins can themselves produce alerts if they detect from a single call that an alert is justified; an example of this situation is an alert based on a call of very long duration.

    As new fraud patterns arise, detection algorithms are generally implemented by adding a new plug-in and some associated logic, allowing a quick response to the changing face of fraud. We have been able to add new plug-ins, fully tested, in as little as 45 minutes.

    Our system is validated in a few different ways. The cases that we detect are handed off to an internal fraud investigation department, which makes a final determination as to whether or not to take action on a particular case. The results of these investigations allow for regular assessment of our detection algorithms. Because the system is modular, it is easy to adjust a model and reimplement it quickly if necessary. Another method of validation comes from the fact that the impact of fraud often shows up on the bills of innocent customers. Therefore, we have millions of customers acting as fraud detectors when they inspect their bills carefully on a monthly basis. Reports from customers regarding incorrect charges often lead us to determine that some fraud has occurred. Keeping track of such reports allows us to get a high-level sense of the number of false negatives (i.e., undetected cases of fraud) in our system. These measures ensure that our models effectively address the fraud problem over time.

    5. CONCLUSION

    In our effort to develop world-class fraud management tools from massive data streams, we have attacked problems that on the surface appear to be quite different: subscription fraud, intrusion detection, repetitive debtors, access management fraud, and the like, each with its own characteristics that hinder detection. Nonetheless, over the years we have found some common themes that propagate across these different classes of fraud, specifically in cases dealing with extremely large data sets:

    • The need to join data analysis and data management. In large data analysis projects, there is no way to decouple the analysis of the data from the storage, management, and retrieval of the data. In our early days, we thought we would simply be building an algorithm, but at massive scales of data, even simple exploratory data analysis methods are challenging. Questions like how much data exist, what the distributions are, and whether outliers exist become difficult. We discovered that having a stake in data management allowed us to answer these questions more easily and also led to new possibilities and learning. In addition, after observing how the fraud team used our tools, it was clear that one of the most useful tools that they had was the ability to take high-level summaries of the data, hone in on interesting patterns through interactive visualization, and access the raw call data underneath. The ability to access the full data records underlying the patterns could result in identification of important attributes in the data that were not considered important previously. Tying the data management and the analysis aspects of the system together allowed for this close interplay between fraud detection and the underlying data that would not be possible had the two systems been developed by different teams.

    • Inadequacy of large models. In the early days, it took all of our effort simply to be able to keep up with the data coming in and attempt to do real-time signature updating. We were not able to fit any sophisticated models, because the computational time and space were just too costly. As a result, our fraud system was built on simple models: simple distributions fit to profiles, likelihood ratio tests, and basic regressions. As new types of fraud emerged, the emphasis was on fighting it quickly, and simple fixes would go into the system. We ended up with a modular system in which many small models came together to create a complex system. As the years went by, we found that many of the more sophisticated techniques that we tried were not worth the extra time and complexity in terms of their performance against the collection of smaller models. In addition, some of our best-performing models were simple ones based on hot-lists of known bad numbers.

    Many new data mining tools promise a one-size-fits-all approach to fraud detection, in which a single tool can analyze any type of data, without much thought from the analyst. We believe that a better approach is to solve little parts of the problem, one at a time, and put these together to create a single robust, adaptable system. This approach has worked well in the world of spam detection, where there is a similar “arms race” going on between perpetrators and enforcers (Goodman, Cormack, and Heckerman 2007). Most world-class spam detection systems are huge rule-based systems in which every technique that a spammer uses is counteracted by a single small module which looks for that specific type of spam, and nothing else. Hundreds upon hundreds of these rules weigh in on every e-mail, looking for signs of spam. When the spammers figure out a way around the current rules, the detectors simply write a new piece of logic looking for that specific loophole. The interesting statistical problem comes in combining the output of all of these detectors. With the abundance of training data in the spam domain, learning optimal models to reach acceptable values of false positives and false negatives is easy—but what are the acceptable values? This depends on the priorities of the business unit manager. Each fraud unit must balance the cost of having people to work cases with the estimated value of the fraudulent uncollectables recovered. In some cases we have found that a manager will accept a 95% false-positive rate to be assured of catching all of the fraud.

    • Necessity of humans. Seemingly every day, a new data analysis tool comes on the market claiming to do more “automated data mining.” Our goal in building systems is not to take humans out of the loop, but rather to make the work of the humans easier. It remains true that the best pattern recognizer is the human brain. Most new fraud schemes are discovered by people who have broad domain knowledge and experience noticing that something is “just not right” with the data. The best that a fraud detection system can do is point the experts toward cases that might be fraudulent, but usually the investigation into the fraud requires a sophisticated sequence of deduction, analysis, integration between organizations, social interaction, and decision making that can be done only by people. Fraud detection systems should be built with an eye toward helping the experts, not replacing them.

    • Need for fast feedback loops. The success of many of our tools depends on the ability to get quick feedback from a fraud management team that is investigating cases. For instance, in the repetitive debtors example (Section 3.3.4), we would provide the fraud team with a sorted list of cases that they would investigate in depth by calling the consumer and doing other background and data checks that only a human could do. At the end of the day, we would be given the results of these investigations, providing us with a labeled data set. In this way, our models could be improved incrementally over time.

    • Importance of flexibility. One of our design philosophies from an early stage was to build a powerful system from small components, each of which did a specific small job well. Small tools are built using simple scripts for data quality checking, data distribution, data storage, and analysis tasks. These small components are gathered together using a UNIX-style “pipe,” in which components are plugged into the flow of the system. This approach provides extreme flexibility, as components can get updated, switched around, or removed with little impact on the whole system. From an analysis standpoint, the system is very modular. Once a new type of fraud is identified, a statistical model is built to detect it, and it is easily plugged into the system and applied to the full data stream to identify new cases. Over the 10 years that this system has been operational, its built-in flexibility has allowed our fraud team to work nimbly to catch new types of fraud with new types of data that did not exist when the system was developed. New types of fraud are always popping up, and any system designed to catch fraud must be sufficiently lightweight and flexible to keep up with the arms race between the fraudsters and the fraud detectors. Of course, we do not know what types of fraud will emerge in the coming years, and new challenges in data management and analysis certainly will present themselves. The lesson that we have learned is that a flexible system built with lightweight components will provide the best opportunity to adjust quickly to whatever comes our way.
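    The pipe-of-small-components design can be mimicked in Python with generator stages. This is a sketch under our own assumptions, not the authors' scripts: the record fields and the stage logic are invented, but the point, that stages can be plugged in, swapped, or removed without touching the rest of the flow, carries over.

```python
# Sketch: a UNIX-pipe-style analysis flow built from small, replaceable
# generator stages. Record fields and rules are hypothetical.
from typing import Callable, Iterable, Iterator

Record = dict  # one call record per item in the stream

def quality_check(records: Iterable[Record]) -> Iterator[Record]:
    # Drop malformed records before they reach any analysis stage.
    for r in records:
        if r.get("duration_min", -1) >= 0 and 0 <= r.get("hour", -1) <= 23:
            yield r

def enrich(records: Iterable[Record]) -> Iterator[Record]:
    # Derive features; downstream stages depend only on the fields they read.
    for r in records:
        r = dict(r)
        r["night"] = r["hour"] >= 23 or r["hour"] < 5
        yield r

def detect(records: Iterable[Record]) -> Iterator[Record]:
    # One small detector; others could be piped in alongside it.
    for r in records:
        r = dict(r)
        r["suspicious"] = r["night"] and r["duration_min"] > 60
        yield r

def pipeline(source: Iterable[Record],
             *stages: Callable[[Iterable[Record]], Iterator[Record]]
             ) -> Iterable[Record]:
    # Chain the stages like shell pipes; add or remove stages freely.
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream
```

A new fraud model becomes just another stage passed to `pipeline(...)`, e.g. `list(pipeline(iter(data), quality_check, enrich, detect))`, so the rest of the system is untouched when it is added or retired.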

    We believe that our experience in building scalable, robust, and effective fraud detection modules has relevance in fighting the challenges that will undoubtedly arise with new technologies. The methods that we have described here can be applied to such fraud schemes as online auction fraud, click-stream fraud, and hopefully other types of fraud that emerge in the twenty-first century, and we hope that our observations will be helpful for those attempting to stop the fraudsters from succeeding.

    ACKNOWLEDGMENTS

    The authors thank the referees and editors for their many helpful comments.

    [Received July 2008. Revised April 2009.]

    REFERENCES

    Abraham, B., and Ledolter, J. (1983), Statistical Methods for Forecasting, New York: Wiley. [24]

    Angus, I., and Blackwell, G. (1993), Phone Pirates, Ajax, Ontario, Canada: Telemanagement Press. [21]

    Beran, R. (1977), “Minimum Hellinger Distance Estimates for Parametric Models,” The Annals of Statistics, 5 (3), 445–463. [27]

    Bolton, R., and Hand, D. (2002), “Statistical Fraud Detection: A Review,” Statistical Science, 17 (3), 235–255. [21,22]

    Cahill, M. H., Lambert, D., Pinheiro, J. C., and Sun, D. X. (2002), Detecting Fraud in the Real World, Norwell, MA: Kluwer Academic. [22,24]

    Cortes, C., and Pregibon, D. (2001), “Signature-Based Methods for Data Streams,” Data Mining and Knowledge Discovery, 5 (3), 167–182. [24]

    Cortes, C., Pregibon, D., and Volinsky, C. (2001), “Communities of Interest,” in Advances in Intelligent Data Analysis, Lecture Notes in Computer Science, Vol. 2189, Cascais, Portugal, pp. 105–114. [25]

    Cortes, C., Pregibon, D., and Volinsky, C. (2003), “Computational Methods for Dynamic Graphs,” Journal of Computational and Graphical Statistics, 12, 950–970. [26]

    Dice, L. R. (1945), “Measures of the Amount of Ecologic Association Between Species,” Ecology, 26 (3), 297–302. [27]

    Drechsler, R. L., and Mocenigo, J. M. (2007), “The Yoix Scripting Language: A Different Way of Writing Java Applications,” Software: Practice and Experience, 37, 643–667. [28]

    Fawcett, T., and Provost, F. (1997), “Adaptive Fraud Detection,” Data Mining and Knowledge Discovery, 1 (3), 291–316. [22]

    Goodman, J., Cormack, G. V., and Heckerman, D. (2007), “Spam and the Ongoing Battle for the Inbox,” Communications of the ACM, 50 (2), 24–33. [31]

    Greer, R. (1999), “Daytona and the Fourth-Generation Language Cymbal,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia, PA: ACM Press, pp. 525–526. [23]

    Hill, S., Agarwal, D., Bell, R., and Volinsky, C. (2006), “Building an Effective Representation for Dynamic Networks,” Journal of Computational and Graphical Statistics, 15 (3), 584–608. [24,26,27]

    International Telecommunication Union (1988), “Recommendation Q.761, Signalling System no. 7,” technical report, ITU. [23]

    (1993), “Recommendation Q.764, Signalling System no. 7—ISDN User Part Signalling Procedures,” technical report, ITU-T. [23]

    Kaplan, D. A. (2006), “Intrigue in High Places: To Catch a Leaker, Hewlett-Packard’s Chairwoman Spied on the Home-Phone Records of Its Board of Directors,” Newsweek (September). [21]

    Kellogg, M. K., Huber, P. W., and Thorne, J. (1999), Federal Telecommunications Law (2nd ed.), New York: Aspen Law and Business. [22]

    Lambert, D., Pinheiro, J. C., and Sun, D. X. (2001), “Estimating Millions of Dynamic Timing Patterns in Real Time,” Journal of the American Statistical Association, 96, 316–330. [24]

    Metwally, A., Agrawal, D., and Abbadi, A. E. (2005), “Efficient Computation of Frequent and Top-k Elements in Data Streams,” in Database Theory—ICDT 2005, eds. T. Eiter and L. Libkin, Berlin–Heidelberg: Springer, pp. 398–412. [24]

    Moreau, Y., Preneel, B., Burge, P., Shawe-Taylor, J., Stoermann, C., and Cooke, C. (1997), “Novel Techniques for Fraud Detection in Mobile Telecommunication Networks,” in ACTS Mobile Summit, Grenada, Spain, Brussels, Belgium: ACTS. [22]

    Nosek, J. T. (1998), “The Case for Collaborative Programming,” Communications of the ACM, 41 (3), 105–108. [30]

    Phua, C., Lee, V., Smith, K., and Gayler, R. (2005), “A Comprehensive Survey of Data Mining-Based Fraud Detection Research,” technical report, Clayton School of Information Technology, Monash University, available at http://clifton.phua.googlepages.com/fraud-detection-survey.pdf. [21,22]

    Rosenbaum, R. (1971), “Secrets of the Little Blue Box,” Esquire, 76, 117–125, 222–226. [20]

    Rosset, S., Murad, U., Neumann, E., Idan, Y., and Pinkas, G. (1999), “Discovery of Fraud Rules for Telecommunications—Challenges and Solutions,” in KDD’99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York: ACM Press, pp. 409–413. [22]

    TECHNOMETRICS, FEBRUARY 2010, VOL. 52, NO. 1


    Telcordia Technologies (2000), “Line Information Database (LIDB) Enhanced Expanded Measurement (EEM) Generic Requirements,” GR-3104-CORE, Issue 1. [23]

    (2005), “Billing Automatic Message Accounting Format (BAF) Generic Requirements,” GR-1100-CORE, Issue 10. [22]

    Winter Corporation (2005), “2005 Top Ten Award Winners,” available at http://www.wintercorp.com/. [23]

    Winters, P. (1960), “Forecasting Sales by Exponentially Weighted Moving Averages,” Management Science, 6, 324–342. [24]
