accern whitepaper article volume distribution


Upload: accern-corporation

Post on 20-Jan-2017



Accern API Whitepaper

Article Volume Distribution Analysis

This paper examines the Accern data set, which contains 14,230,577 records of financial news articles spanning August 2012 to August 2016, and how those records are distributed across metrics such as sentiment and market sector.


1. INTRODUCTION

2. TOTAL ARTICLE VOLUME

A. Volume of Articles Per Year

3. MARKET SECTORS

A. Total Volume of Articles by Sector

B. Annual Distribution Breakdown of Articles by Sector

C. Cumulative Annual Breakdown of Articles by Sector

4. INDUSTRY CATEGORIES

A. 20 Industries with the Highest Volume of Articles

5. EVENT GROUP ANALYSIS

A. Total Number of Articles per Year for Event Group

6. OVERALL SOURCE RANK

A. Number of Articles for Overall Source Ranks

7. ARTICLE TYPE

A. Annual Breakdown of Articles by Article Type

B. Percentage Breakdown of Articles by Article Type

8. ARTICLE SENTIMENT

A. Breakdown of Article Sentiment

B. Annual Breakdown of Total Volume of Articles by Article Sentiment

9. EVENT IMPACT SCORE ON ENTITY

A. Annual Breakdown of Volume of Articles by Event Impact Score


1. INTRODUCTION

Accern monitors over 20 million public websites in real time and uses proprietary AI algorithms to help financial market investors find important stories to act on. The company derives metrics such as sentiment, impact, source rankings, and more from every relevant article. This study examines a data set of 14,230,577 records of financial news articles spanning August 2012 to August 2016. Each record includes information about the article, along with metrics derived from Accern's technology. This brief, basic study is limited to how the article records are distributed with respect to a selected set of fields in the records.


2. TOTAL ARTICLE VOLUME

The graph below shows the distribution of all the articles across the examined time period. Both 2012 and 2016 are partial years: 2012 begins on August 25 (approximately 4 months of data) and 2016 ends on July 31 (exactly 7 months). For the years with a full set of data (2013, 2014, and 2015), the totals are relatively consistent.

A. Volume of Articles Per Year
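Tallying records per calendar year is the first step behind a chart like this one. The sketch below is an illustration only: it assumes each record is a Python dict whose publication time is an ISO 8601 string stored under a hypothetical `published_at` key, since the actual Accern field names are not given in this paper.

```python
from collections import Counter
from datetime import datetime


def articles_per_year(records):
    """Count article records per calendar year.

    Assumes each record is a dict with a hypothetical ISO 8601
    'published_at' field, e.g. '2013-05-02T14:31:00'.
    """
    counts = Counter()
    for record in records:
        # Parse the timestamp and bucket the record by its year.
        year = datetime.fromisoformat(record["published_at"]).year
        counts[year] += 1
    # Return the years in chronological order.
    return dict(sorted(counts.items()))
```

Run over the whole data set, this would return one count per year from 2012 through 2016, with the two partial years listed alongside the complete ones.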


For 2012, the total for the 4-month period is about 6 percent lower relative to the full years in the data set. For 2016, however, the total for the 7-month period shows a relative increase of approximately 26 percent. Because the 2012 data reflects the last 4 months of the year while the 2016 data reflects the first 7 months, there may be a seasonal component to how much news is generated over the course of a year.
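Comparing a partial year with a full one is only meaningful once the totals are put on a common footing. The paper does not state how the 6 and 26 percent figures were computed, so the sketch below assumes a per-month rate comparison: each partial period's total is divided by the number of months it covers and compared with the average monthly volume of the complete years. The function names are illustrative.

```python
def monthly_rate(total_articles, months_covered):
    """Average number of articles per month over a period."""
    return total_articles / months_covered


def relative_change(partial_total, partial_months, full_year_totals):
    """Percent difference between a partial period's monthly rate and
    the average monthly rate of the complete years.

    A negative result means the partial period was quieter per month;
    a positive result means it was busier.
    """
    # Average monthly volume across all complete (12-month) years.
    full_rate = sum(full_year_totals) / (12 * len(full_year_totals))
    partial_rate = monthly_rate(partial_total, partial_months)
    return 100.0 * (partial_rate - full_rate) / full_rate
```

Because 2012 starts on August 25, `partial_months` can be passed as a fraction (roughly 4.2) instead of being rounded to 4.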

3. MARKET SECTORS

The sector and industry category associated with each record are the primary means of categorizing the articles in the data. Each record includes information about the article and metrics derived from technology utilized by Accern.

A. Total Volume of Articles by Sector


The above chart shows the volume distribution of articles among stock market sectors and reveals a clearly discernible pattern: the technology and consumer services sectors together account for as much as 58 percent of the news articles in the entire data set.
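A combined share like the 58 percent figure is a ratio of per-sector counts to the overall total. The following is a minimal sketch, assuming each record carries its sector label under a hypothetical `sector` key; the sector name strings are placeholders, not Accern's exact labels.

```python
from collections import Counter


def sector_shares(records):
    """Return each sector's share of all articles, as a percentage."""
    # 'sector' is a hypothetical field name.
    counts = Counter(record["sector"] for record in records)
    total = sum(counts.values())
    return {sector: 100.0 * n / total for sector, n in counts.items()}


def combined_share(shares, sectors):
    """Total percentage accounted for by a group of sectors.

    Sectors absent from the data contribute 0.
    """
    return sum(shares.get(sector, 0.0) for sector in sectors)
```

For example, `combined_share(sector_shares(records), ["Technology", "Consumer Services"])` would give the joint share of the two sectors discussed here.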


B. Annual Distribution Breakdown of Articles by Sector


Furthermore, this pattern is consistent across the entire duration of the data. The chart above shows multiple years for each sector and clearly indicates that the technology and consumer services sectors account for a significant share of the articles in the data set in every year.
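One way to put a number on the year-to-year consistency described above is to compute each sector's share separately for every year and see how much it moves. The sketch below assumes hypothetical `year` and `sector` keys on each record and uses the range (highest minus lowest annual share) as a rough stability measure; this statistic is an illustrative choice, since the paper relies on visual inspection of the chart.

```python
from collections import Counter, defaultdict


def shares_by_year(records):
    """Map year -> {sector: percentage of that year's articles}."""
    per_year = defaultdict(Counter)
    for record in records:
        # 'year' and 'sector' are hypothetical field names.
        per_year[record["year"]][record["sector"]] += 1
    result = {}
    for year, counts in per_year.items():
        total = sum(counts.values())
        result[year] = {s: 100.0 * n / total for s, n in counts.items()}
    return result


def share_range(yearly_shares, sector):
    """Spread between a sector's highest and lowest annual share.

    A small value means the sector's share barely changes year to year.
    """
    values = [shares.get(sector, 0.0) for shares in yearly_shares.values()]
    return max(values) - min(values)
```

Restricting the input to 2013 through 2015 would limit the comparison to the complete years.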


C. Cumulative Annual Breakdown of Articles by Sector


Additional confirmation is provided by the above chart, in which the shaded areas for the years with complete data are very similar. This indicates that the total number of articles, and how they are distributed across sectors, is consistent across the years with complete data.

This concentration of articles in the technology and consumer services sectors may be the result of several factors. First, year to year, these two sectors are among the most volatile and most traded (in terms of volume) sectors in the stock market. Second, both sectors are associated with products and services that are ubiquitous in American society and throughout the world, and these same products and services are also pervasive in other sectors. Lastly, news is, in many ways, a profit-generating industry. News outlets and journalists therefore have a propensity to write stories on popular topics to increase readership, hence the emphasis placed on “popular” sectors like technology.


4. INDUSTRY CATEGORIES

Stock market industry categories are a subset of market sectors. The chart below, which breaks down the top 20 industry categories by volume of articles, further reinforces the conclusions drawn from the sector charts. Four of the top five industries are directly related to the technology sector (the sector with the highest article volume).

A. 20 Industries with the Highest Volume of Articles


Virtually all of the industry categories listed on the chart utilize technology in one way or another to manufacture, deliver, or provide a product or service, which makes the technology sector interrelated with most industries. The ample representation of the consumer services sector is also confirmed: numbers 4, 6, and 7 on the industry categories list are all directly related to it. This makes seven of the top 20 industries on the list directly related to the technology or consumer services sectors. A closer inspection also reveals that a total of 12 of the 20 industries on the list belong to these same two sectors. This concentration of article volume is consistently reflected throughout the data set.
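The count of top industries belonging to the technology and consumer services sectors can be reproduced from an industry-to-sector lookup. In the sketch below, the `industry` key, the industry names, and the mapping are hypothetical placeholders; the paper does not publish its classification scheme.

```python
from collections import Counter


def top_industries(records, n=20):
    """The n industries with the most articles, highest first."""
    # 'industry' is a hypothetical field name.
    counts = Counter(record["industry"] for record in records)
    return [industry for industry, _ in counts.most_common(n)]


def count_in_sectors(industries, industry_to_sector, sectors):
    """How many of the given industries map to one of the target sectors.

    Industries missing from the lookup are not counted.
    """
    targets = set(sectors)
    return sum(1 for ind in industries if industry_to_sector.get(ind) in targets)
```

With a complete lookup table, `count_in_sectors(top_industries(records), lookup, ["Technology", "Consumer Services"])` would yield the figure discussed above.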


5. EVENT GROUP ANALYSIS

News articles are written in response to events that are considered worth reporting to the general public. The type of news event helps determine the impact the news will have on markets. Each record includes a field that categorizes the news article according to the type of event.

A. Total Number of Articles per Event Group


The events that prompt the news articles are categorized into the 16 event groups listed on the horizontal axis of the above chart. Company earnings and general business actions account for the highest concentration of articles among the event groups. These categories are also very common and periodic in nature. Events that are less common and unexpected, e.g. disasters and criminal and legal actions, are the ones that have a greater impact.
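A per-year breakdown by event group is a two-key count. The sketch below assumes hypothetical `event_group` and `year` fields; the group labels used with it are placeholders loosely based on the categories named above, not Accern's exact values.

```python
from collections import Counter, defaultdict


def articles_per_group_per_year(records):
    """Map event_group -> {year: article count}."""
    table = defaultdict(Counter)
    for record in records:
        # 'event_group' and 'year' are hypothetical field names.
        table[record["event_group"]][record["year"]] += 1
    return {group: dict(years) for group, years in table.items()}


def busiest_groups(table, n=2):
    """The n event groups with the most articles over all years."""
    totals = {group: sum(years.values()) for group, years in table.items()}
    return sorted(totals, key=totals.get, reverse=True)[:n]
```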


6. OVERALL SOURCE RANK

The source from which each news article is obtained is also ranked, which helps to determine the validity of the article. The source of each article is ranked on a scale of 1 to 10. The charts below show how all of the articles in the data are distributed across these ranks. The upper chart indicates a very strong concentration in the range of 4 to 9, in sharp contrast with the rest of the scale, and the lower chart shows the consistency of this pattern across all of the years.


A. Number of Articles for Overall Source Rank
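The concentration in ranks 4 to 9 can be expressed as the fraction of articles whose source rank falls in that band. The sketch below assumes the rank is an integer from 1 to 10 stored under a hypothetical `source_rank` key.

```python
from collections import Counter


def rank_histogram(records):
    """Number of articles for each source rank, ranks 1 through 10.

    Index 0 holds the count for rank 1, index 9 the count for rank 10.
    """
    # 'source_rank' is a hypothetical field name.
    counts = Counter(r["source_rank"] for r in records)
    return [counts.get(rank, 0) for rank in range(1, 11)]


def share_in_band(histogram, low=4, high=9):
    """Percentage of articles whose rank lies in [low, high], inclusive."""
    total = sum(histogram)
    # Ranks low..high live at indices low-1..high-1.
    in_band = sum(histogram[low - 1:high])
    return 100.0 * in_band / total
```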


7. ARTICLE TYPE

Another field in each record, also related to the source of the article, identifies whether the article was sourced from a news feed or a blog. Along with other data about the article, this field is critical in determining the overall validity and reliability of the article and its source.

A. Annual Breakdown of Articles by Article Type (News vs. Blog)

B. Percentage Breakdown of Articles by Article Type (News vs. Blog)

Both of the above charts clearly indicate that news feeds are the source of the bulk of the articles. The upper chart further indicates that this has not changed much over the years. While the total number of articles has varied, the percentage breakdown has remained consistent from year to year.
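The percentage breakdown normalizes each year's news and blog counts by that year's total, which is why the shares can stay flat while the totals vary. The sketch below assumes hypothetical `year` and `article_type` keys, with the illustrative values "news" and "blog".

```python
from collections import Counter, defaultdict


def type_percentages(records):
    """Map year -> {article_type: percentage of that year's articles}."""
    per_year = defaultdict(Counter)
    for r in records:
        # 'year' and 'article_type' are hypothetical field names.
        per_year[r["year"]][r["article_type"]] += 1
    result = {}
    for year in sorted(per_year):
        counts = per_year[year]
        total = sum(counts.values())
        result[year] = {t: 100.0 * n / total for t, n in counts.items()}
    return result
```

Two years with different totals but the same mix of news and blog articles produce identical percentage rows, which is the pattern the percentage chart shows.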


8. ARTICLE SENTIMENT

An important metric derived from Accern technology is article sentiment. This metric measures the positive, negative, or neutral sentiment of each article and assigns a number from -1, indicating the highest degree of negative sentiment, to +1, indicating the highest degree of positive sentiment.
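The paper does not state the cut-offs that separate the three sentiment categories. The sketch below assumes a symmetric neutral band around 0; its width of 0.05 is an arbitrary, hypothetical choice, not Accern's documented rule.

```python
def sentiment_category(score, neutral_band=0.05):
    """Label a sentiment score in [-1, 1] as negative, neutral or positive.

    neutral_band is a hypothetical threshold: scores within
    [-neutral_band, +neutral_band] are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"sentiment score out of range: {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```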

The chart below breaks down the three sentiment categories across all years.

A. Breakdown of Article Sentiment


Articles with a greater degree of sentiment neutrality tend toward a score of “0” on the scale, which is the score with the highest concentration of articles in both sentiment charts. Another noticeable characteristic is the higher volume of articles on the positive side of the sentiment scale, along with their wider distribution.
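Seeing where scores concentrate on a continuous scale requires bucketing them. The sketch below counts scores in equal-width bins over [-1, 1]; the default of 21 bins is an assumption (the charts' bin widths are not stated), chosen as an odd number so that 0 sits at the centre of a bin rather than on an edge.

```python
def sentiment_histogram(scores, bins=21):
    """Count scores in equal-width bins spanning [-1, 1].

    Returns a list of (lower_edge, count) pairs. A score of exactly
    1.0 is placed in the last bin.
    """
    width = 2.0 / bins
    counts = [0] * bins
    for s in scores:
        if not -1.0 <= s <= 1.0:
            raise ValueError(f"sentiment score out of range: {s}")
        # Shift to [0, 2] and divide by the bin width to get the index.
        index = min(int((s + 1.0) / width), bins - 1)
        counts[index] += 1
    return [(-1.0 + i * width, c) for i, c in enumerate(counts)]
```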


B. Annual Breakdown of Total Volume of Articles by Sentiment


Although there are slight variations, this pattern is also consistent year by year. This consistency may be attributable to the upward trend of the stock market over the time period covered by the data.


9. EVENT IMPACT SCORE ON ENTITY

Each article record also includes a field that measures the impact of the article on the entity the news article is about. The impact is scored on a scale of 1 to 100. The chart below shows a peak in the volume of articles with an impact score in the 26 to 30 range, followed by a steady decrease all the way to the top of the scale.

A. Annual Breakdown of Volume of Articles by Event Impact Score on Entity
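Reading a peak in the "26 to 30 range" implies five-point bins over the 1 to 100 scale. The sketch below assumes integer impact scores and bins of 1-5, 6-10, ..., 96-100; the function names are illustrative.

```python
def impact_bins(scores, width=5):
    """Count integer impact scores (1-100) in bins 1-5, 6-10, ..., 96-100.

    Returns a dict keyed by (low, high) tuples, including empty bins.
    """
    counts = {(low, low + width - 1): 0 for low in range(1, 101, width)}
    for score in scores:
        if not 1 <= score <= 100:
            raise ValueError(f"impact score out of range: {score}")
        # Map e.g. 26..30 to the bin whose lower edge is 26.
        low = ((score - 1) // width) * width + 1
        counts[(low, low + width - 1)] += 1
    return counts


def peak_bin(counts):
    """Bin with the most articles."""
    return max(counts, key=counts.get)
```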


The distribution pattern is very consistent from year to year, as shown in the lower chart. This consistency should make it easier to determine an optimal impact-score threshold on which to base a trading decision.
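A consistent distribution means that a given impact-score cut-off selects roughly the same share of articles every year. The sketch below computes that share per year for a candidate threshold, assuming hypothetical `year` and `impact_score` fields. Judging whether a threshold is optimal for trading would additionally require evaluating trading outcomes, which this sketch does not attempt.

```python
from collections import defaultdict


def share_above_threshold(records, threshold):
    """Map year -> percentage of articles with impact_score >= threshold."""
    totals = defaultdict(int)
    above = defaultdict(int)
    for r in records:
        # 'year' and 'impact_score' are hypothetical field names.
        totals[r["year"]] += 1
        if r["impact_score"] >= threshold:
            above[r["year"]] += 1
    return {year: 100.0 * above[year] / totals[year] for year in sorted(totals)}
```

Sweeping `threshold` over the 1 to 100 scale and checking how little each year's share differs would be one way to confirm the stability described above.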