using publicly available data 20 th meeting course name: business intelligence year: 2009


Upload: vincent-kelly

Post on 02-Jan-2016



Using Publicly Available Data
20th Meeting

Course Name: Business Intelligence
Year: 2009


Bina Nusantara University


Source of this Material

Loshin, David (2003). Business Intelligence: The Savvy Manager’s Guide, Chapter 15.


The Business Case

It is very simple to make the case for using public data. Data that has been collected and made available by government resources is available at low cost; the only costs involve storage management and integration with other BI data. In any company that has set up a BI environment, the processes for importing, managing, and integrating data have already been streamlined for aggregating internal data sets, so the only increase is in the variable costs of executing those processes. Against these modest costs, in the right circumstances, enhancing existing data with publicly available data can deliver significant value.
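As a minimal illustration of data enhancement, the Python sketch below appends public attributes to internal customer records through a shared geographic key. It assumes that internal records carry a ZIP code and that a public summary data set has already been loaded into a dictionary keyed by ZIP code; the record layouts, the `enhance` function, and all values are hypothetical.

```python
# Hypothetical internal customer records from an existing BI data set.
internal_customers = [
    {"customer_id": 1, "name": "A. Smith", "zip": "10001"},
    {"customer_id": 2, "name": "B. Jones", "zip": "94105"},
    {"customer_id": 3, "name": "C. Lee", "zip": "00000"},
]

# Hypothetical public summary attributes keyed by ZIP code. The field names
# and numbers are placeholders, not real published statistics.
public_zip_summary = {
    "10001": {"median_income": 85000, "population": 21000},
    "94105": {"median_income": 120000, "population": 9000},
}


def enhance(records, public_by_zip):
    """Return copies of records with the public attributes appended.

    Records whose ZIP code has no public match get None for every public
    field, so unmatched rows stay visible instead of being dropped.
    """
    fields = sorted({f for attrs in public_by_zip.values() for f in attrs})
    enhanced = []
    for rec in records:
        attrs = public_by_zip.get(rec["zip"], {})
        row = dict(rec)  # copy so the internal records are not modified
        for f in fields:
            row[f] = attrs.get(f)
        enhanced.append(row)
    return enhanced


enhanced = enhance(internal_customers, public_zip_summary)
# The third customer's ZIP code has no public match, so its
# median_income and population values are None.
```

The sketch keeps customers with no public match rather than dropping them, which mirrors a left join; whether that is the right choice depends on how the enhanced data will be used.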


Management Issues

There are three major management issues associated with the use of publicly available data: integration, privacy, and lack of structure. The first, integration, can demand real effort; in fact, there are a number of companies whose business is to enhance and improve public data sets and then resell them based on their added value.

The second major issue revolves around personal privacy. There is a perception that any organization that collects data about individuals and then tries to exploit that information is invading a person’s privacy.

The third major issue is that much publicly available data is not in a nicely structured form that is easily adaptable. Frequently, this data is semistructured, which means that it requires some manipulation before it can be properly integrated.
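One small piece of the integration effort is matching internal records against public records, such as incorporation filings, whose business names are written differently. The Python sketch below is a hypothetical illustration, not a method taken from the source: it normalizes business names into a comparison key and then matches on that key. The suffix list, record fields, and function names are all assumptions.

```python
import re

# Hypothetical set of legal-form words to ignore when comparing names.
# A production integration would need a far more complete list.
LEGAL_SUFFIXES = {"INC", "INCORPORATED", "CORP", "CORPORATION", "CO", "LLC", "LTD"}


def normalize_name(name: str) -> str:
    """Reduce a business name to a simple comparison key."""
    # Uppercase, then turn every character other than A-Z, 0-9, or space into a space.
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    # Drop legal-form words and collapse runs of whitespace.
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def link_records(internal, public):
    """Map each internal id to the public registry ids whose names match."""
    index = {}
    for rec in public:
        index.setdefault(normalize_name(rec["name"]), []).append(rec["registry_id"])
    return {rec["id"]: index.get(normalize_name(rec["name"]), []) for rec in internal}


# Hypothetical sample records.
internal = [
    {"id": "C1", "name": "Acme Widgets, Inc."},
    {"id": "C2", "name": "Blue Sky Travel"},
]
public = [
    {"registry_id": "R-100", "name": "ACME WIDGETS CORPORATION"},
    {"registry_id": "R-200", "name": "Harbor Foods LLC"},
]

links = link_records(internal, public)
# C1 matches R-100; C2 has no match and maps to an empty list.
```

Exact matching on a normalized key is the simplest possible linkage rule; real public data often needs approximate matching and manual review on top of it.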


Public Data

There is a large amount of easily accessible public data, and exploring all of it could fill an entire book. What is important here is the process of locating the available data resources and determining how that data can be used.

There are many ways to categorize data sets, but we will break the realm of public data into these areas:

• Personal Information
Any data that attributes information to a person could be called personal information.

• Business Information
Aside from personal information, there is a lot of data that can be attributed to business entities. These public records frequently relate to rules and regulations imposed on business operations by federal or state government jurisdictions. This kind of data includes the following:
  ◦ Incorporations
  ◦ Uniform Commercial Code (UCC)
  ◦ Bankruptcy Filings
  ◦ Professional Licensing
  ◦ Securities Filings
  ◦ Regulatory Licensing
  ◦ Patents and Trademarks

• Legal Information
A large number of legal cases are accessible online, providing the names of the parties involved as well as free text describing each case. These documents, many of which have been indexed and made available for search, contain embedded psychographic and geographic enhancement potential, along with opportunities for entity extraction and entity linkage. Those linkages may represent either personal or business relationships.

• Factual Information
There is an abundance of factual information embedded in available data sets. Although there may be restrictions on specific uses of some of this data, much business value can still be derived from data sets such as the following:
  ◦ Census Summary
  ◦ Topologically Integrated Geographic Encoding and Referencing (TIGER) database
  ◦ Federal Election Commission
  ◦ Bureau of Labor Statistics (BLS)
  ◦ Pharmaceutical Data
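The entity linkage mentioned under Legal Information can be sketched as a co-occurrence index: any two parties named in the same case are linked, and the case ids record the evidence for each link. This is a minimal Python sketch under stated assumptions: the party names are taken as already extracted, and the case records, ids, and `build_links` function are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical case records; in practice the party names would first have
# to be extracted from the case documents themselves.
cases = [
    {"case_id": "2008-001", "parties": ["Acme Widgets", "J. Doe"]},
    {"case_id": "2008-002", "parties": ["Acme Widgets", "Harbor Foods", "J. Doe"]},
    {"case_id": "2008-003", "parties": ["Harbor Foods", "R. Roe"]},
]


def build_links(case_records):
    """Link every pair of parties that appear in the same case.

    Returns a dict mapping an (a, b) pair, with a < b, to the list of
    case ids that connect them.
    """
    links = defaultdict(list)
    for case in case_records:
        # Deduplicate and sort so each pair has one canonical key.
        parties = sorted(set(case["parties"]))
        for a, b in combinations(parties, 2):
            links[(a, b)].append(case["case_id"])
    return dict(links)


links = build_links(cases)
# ("Acme Widgets", "J. Doe") is supported by two cases: 2008-001 and 2008-002.
```

Whether a given link reflects a personal or a business relationship still has to be judged from the case text; the index only records that the two parties appeared together.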

Data Resources

There are basically two approaches: gather data from the original source, or pay a data aggregator for a value-added data set.

• Original Source
As mentioned in the previous sections, the government is a very good source of publicly available data. Publicly available information may also be provided by third parties in a form that is not meant for exploitation. A good example is a Web site, which may contain useful data but not in a directly usable form. Another interesting source of publicly available data is the subject of that data itself.

• Data Aggregators
We use the term data aggregator to refer to any organization that collects data from one or more sources, provides some value-added processing, and repackages the result in a usable form. Aggregated data can also be provided through a query-and-delivery process.

Semistructured Data

When the content is limited to a vocabulary or a format that can be reasonably modeled, it is possible, with some degree of certainty, to extract bits and pieces of information from semistructured data. Although the data has not been broken down into a distinct set of attributes and their assigned values, some predictable context appears frequently enough to allow an application to extract information.
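The idea of relying on predictable context can be sketched with regular expressions that anchor on labels recurring in the text. The Python example below assumes a hypothetical filing notice whose layout is loose but whose field labels (`Debtor:`, `Secured Party:`, `Filing Date:`) appear reliably; the notice text, the patterns, and the `extract_fields` function are illustrative assumptions, not a real filing format.

```python
import re

# Hypothetical notice text: loose layout, but the labels recur predictably.
notice = """NOTICE OF FILING
Debtor: Harbor Foods LLC, 12 Pier Road, Portsmouth
Secured Party:  First Coastal Bank
Filing Date: 03/14/2008
Remarks: collateral includes all inventory"""

# Each pattern anchors on a label at the start of a line.
PATTERNS = {
    "debtor": re.compile(r"^Debtor:\s*([^,\n]+)", re.MULTILINE),
    "secured_party": re.compile(r"^Secured Party:\s*(.+)$", re.MULTILINE),
    "filing_date": re.compile(r"^Filing Date:\s*(\d{2}/\d{2}/\d{4})", re.MULTILINE),
}


def extract_fields(text):
    """Pull attribute values out of semistructured text.

    A field whose label is missing comes back as None rather than raising.
    """
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


fields = extract_fields(notice)
# debtor is "Harbor Foods LLC" because the pattern stops at the first comma.
```

Returning None for a missing label, instead of failing, reflects the "some degree of certainty" caveat: extraction from semistructured data will sometimes come back incomplete.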

The Myth of Privacy

• Fear of Invasion
The truth is, as BI professionals, we are to some extent responsible for collecting customer information and manipulating that information for marketing purposes, but are we really guilty of invasion of privacy?

• The Value and Cost of Privacy
Exchanges in which the consumer is compensated in some way in return for providing information demonstrate an interesting model of information valuation.

• The “Privacy” Statement
Issuing a privacy statement does not imply that your data is being treated as private data. These statements actually do the opposite: they tell the consumer how the information is not being kept private.

• The Good News for Business Intelligence
The dissemination of personal information brings many benefits to society, such as the ability to track down criminals, detect fraud, provide channels for improved customer relationship management, and even track down terrorists. As BI professionals, we have a twofold opportunity with respect to the privacy issue. The first is to raise awareness of the consumer’s value proposition with respect to data provision, leading to greater awareness of both the legality and the propriety of BI analysis and information use. The second is to build better BI applications.


End of Slide
