[ieee 22nd international conference on data engineering (icde'06) - atlanta, ga, usa...

Download [IEEE 22nd International Conference on Data Engineering (ICDE'06) - Atlanta, GA, USA (2006.04.3-2006.04.7)] 22nd International Conference on Data Engineering (ICDE'06) - Automatic Sales Lead Generation from Web Data

Post on 07-Mar-2017




0 download

Embed Size (px)


  • Automatic Sales Lead Generation from Web Data

    Ganesh RamakrishnanIBM India Research Labganramkr@in.ibm.com

    Sachindra JoshiIBM India Research Labjsachind@in.ibm.com

    Sumit NegiIBM India Research Labnegsum@in.ibm.com

    Raghu KrishnapuramIBM India Research Labkraghura@in.ibm.com

    Sreeram BalakrishnanIBM India Research Lab



    Speed to market is critical to companies that aredriven by sales in a competitive market. The earliera potential customer can be approached in the deci-sion making process of a purchase, the higher are thechances of converting that prospect into a customer.Traditional methods to identify sales leads such as com-pany surveys and direct marketing are manual, expen-sive and not scalable.

    Over the past decade the World Wide Web has growninto an information-mesh, with most important factsbeing reported through Web sites. Several news pa-pers, press releases, trade journals, business magazinesand other related sources are on-line. These sourcescould be used to identify prospective buyers automati-cally. In this paper, we present a system called ETAP(Electronic Trigger Alert Program) that extracts triggerevents from Web data that help in identifying prospec-tive buyers. Trigger events are events of corporate rel-evance and indicative of the propensity of companiesto purchase new products associated with these events.Examples of trigger events are change in management,revenue growth and mergers & acquisitions. The un-structured nature of information makes the extractiontask of trigger events difficult. We pose the problem oftrigger events extraction as a classification problem anddevelop methods for learning trigger event classifiersusing existing classification methods. We present meth-ods to automatically generate the training data requiredto learn the classifiers. We also propose a method offeature abstraction that uses named entity recognitionto solve the problem of data sparsity. We score andrank the trigger events extracted from ETAP for easybrowsing. Our experiments show the effectiveness ofthe method and thus establish the feasibility of auto-

    matic sales lead generation using the Web data.

    1 Introduction

    Various marketing techniques have been used bycompanies to target and entice prospective buyers inorder to increase their sales. The process of identifica-tion of potential buyers is also known as sales lead gen-eration. Live seminars, trade shows, cold calling, massmailing, advertising and partner referrals are some ex-amples of sales lead generation. These methods can beclassified into two types1: 1) reactive methods, and 2)proactive methods. Reactive methods are the ones inwhich a customer contacts a company following an ad-vertisement such as one in yellow pages. Subsequently,sales representatives handle the sale. On the otherhand, proactive methods are the ones in which a pos-sible customer is contacted cold by a company. Salesrepresentatives then try to verify that that the possi-ble customer is actually a prospect. Identifying possiblecustomers based on certain information about the cus-tomers is the most important part of proactive meth-ods. Several data mining methods that use customerinformation for the identification of likely buyers canbe found in the literature [9]. However, these meth-ods cannot be used when relevant information aboutcustomers is not readily available. Automatic methodsto gather relevant information about customers are re-quired to help identify possible customers.

    In this paper, we propose a proactive method of saleslead generation that gathers new or evolving informa-tion about customers from the Web. Our approach isinspired by the growth of information (such as companydescriptions, life cycles, profits and revenue changes) on



    Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE

  • the Web, posted by several news papers, press releases,trade journals, business magazines and other relatedsources. We present a system called ETAP (ElectronicTrigger Alert Program) that uses this information todiscover sales leads. The system is suited for iden-tification of prospective companies as opposed to theidentification of individual buyers, making it suitablefor B2B (business to business) scenarios. Analysis ofsales leads is a complex process that requires skill, timeand effort. The proposed system is aimed at gather-ing sales leads from the Web and presenting them todomain specialists for the final validation.

    ETAP is based on a traditional method of sales leadgeneration. In the traditional method, companies iden-tify a set of sales drivers for their products/services. Asales driver represents a class of events whose existenceindicates a strong propensity to buy products/servicesof companies associated with the events. Revenuegrowth, change in management and mergers & acqui-sitions are examples of sales drivers. The set of salesdrivers could be different for different industries. As anexample, mergers & acquisitions could be a sales driverfor the IT industry but may not be a sales driver forthe steel industry. This is because mergers and ac-quisitions of companies could lead to the integrationof IT systems of the companies thereby generating de-mand for new IT products. Typically, the drivers fora specific industry are determined by expert opinions.Suitable questionnaires are developed and surveys areconducted to gather information about the existence oftrigger events related to the known sales drivers. Trig-ger events are events that occur in the context of acompany (or its environment) and belong to a givensales driver. As an example, given mergers & acqui-sitions as a sales driver for the IT industry, an eventsuch as Company X plans to acquire Company Y laterthis year is a trigger event. Similarly, given revenuegrowth as a sales driver, an event such as Company Xreported a revenue growth of 10% in the fourth quar-ter is a trigger event. Identification of such a trig-ger event makes Company X a prospective buyer of ITproducts. The traditional method of gathering triggerevents is to call representatives of different companiesor conduct interviews with them. This method, beingmanual, tedious and expensive, is not scalable.

    In our work, we assume that the set of sales driversfor the industry is known before hand. Given a setof sales drivers for the industry, ETAP automaticallyidentifies trigger events on the Web. The ETAP sys-tem crawls the Web to gather documents related tocompanies and financial news, extracts trigger eventsfrom the crawled documents and then ranks the triggerevents based on a scoring function for easy browsing.

    Thus, ETAP assists sales representatives in prioritizingcompanies that need to be approached.

    Event extraction is the most challenging part of thesystem, due to the unstructured nature of the data.Traditional information extraction [4, 5, 13] deals withextracting entities, relations and events from docu-ments according to pre-defined templates, and can beused to identify events. In our case, identifying trig-ger events for the mergers & acquisitions sales driverwould involve extracting entities that play the rolesof the acquired company, and the acquiring com-pany. Most existing systems for information extrac-tion employ linguistic patterns to fill the slots corre-sponding to entities and relations. As an example, alinguistic pattern company acquired company can beused to extract information about acquired and acquir-ing companies. Since several different patterns need tobe created manually for a single information extrac-tion task, this method involves a large amount of ef-fort. Learning based methods for information extrac-tion have been proposed. However, they suffer frompoor precision and recall2.

    We argue that for identifying trigger events, it suf-fices to distill from a large collection of documents,snippets (a collection of sentences) that are related toa particular sales driver and present them in a rankedorder of relevance. In ETAP, events are associated withsnippets of text. We refer to a snippet that containsinformation relevant to a sales driver as a trigger event.As an example, to find information pertaining to merg-ers & acquisitions, it is good enough to determine snip-pets consisting of one or more sentences that describethe event. In our proposed method, we formulate theproblem of trigger event identification as a two-classclassification problem. The two classes are: a positiveclass of data relevant to the sales driver and a negativeclass comprising a large random sample of data fromthe Web. Since traditional text classification based ona bag-of-words model suffers from data sparsity prob-lem [7], we address this drawback by using feature ab-straction. We obtain feature abstraction by replacingcertain words with their type information. The typeswe consider in this paper are restricted to named entitytypes and part-of-speech types.

    Another issue with training a classifier is the avail-ability of relevant training data. We solve this problemby querying the Web to generate noisy training data,and then using a method similar to Carla Brodley [3]to train the classifier using this data. Other classifiersthat can learn in presence of noise could also be usedinstead.


    projects/muc/proceedings/muc 7 proceedings/overview.html


    Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE

  • The number of extracted trigger events could belarge. Therefore, the ranking component of ETAPranks the extracted trigger events based on a scoringfunction for easy browsing. Thus the ranking compo-nent of ETA