Mapping the Blogosphere in America
CS406 Assignment - Group Presentation

Brian McGee, Craig Murray, Piers Thorogood, Emlyn Whittick

Agenda
• Summary of the paper
• Paper’s key focuses
  – Geolocation of blogs
  – Indexing blogs to city units
• Related Work
  – Geolocation in general
  – Alternative mapping of the blogosphere
• Conclusion
• Questions

Summary of the Paper

“Mapping the Blogosphere in America”

• Presented at the WWW2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics

• Dr. Alexander Halavais & Dr. Jia Lin
• University at Buffalo School of Informatics, NY

Summary of the Paper
• Initial phase of a long-term project
• Long-term goals:
  – Examination of American urban culture
  – Based on information found in personal blogs
  – Observe localised political agenda and opinion
• Short-term goals:
  – Extracting geographic information from blogs
  – Indexing blogs to ‘city units’

Geolocation of Blogs
• No single method to calculate the location of a blog...
• Self-hosted blogs (dedicated domain name):
  – Registrant’s address found in domain registry
• Hosted using blog-hosting service:
  – Location perhaps included in user profile
  – Blog perhaps registered with regional blog-hosting service, e.g. ‘NYCblogger.com’

Geolocation of Blogs

• What if there is no explicit location information?

• Answer: Data Mining...
  – Links to a CV or biography containing location information
  – Location found from links to local weather, school, church or other communities

Geolocation of Blogs
• Manual pilot run on 1500 US blogs
  – 60% successful identification for self-hosted blogs
  – 30% for blogs on blog-hosting sites
• Working on an automatic algorithm
• Current approach...
  1. GeoURL metadata, if available
  2. Whois query for unrecognised domains
  3. Profile information, if available
  4. Blogchalking, if available
  5. Text on index page (bio / resume / regionalised links)
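
The ordered approach above can be read as a fallback chain: try each source in turn and stop at the first one that yields a location. The sketch below illustrates only that control flow and is not the authors' algorithm. The `Blog` record, its field names and the extractor labels are hypothetical, and each extractor simply reads a value assumed to have been collected already rather than querying GeoURL or Whois.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Blog:
    """Hypothetical record of evidence already gathered for one blog."""
    url: str
    geourl_meta: Optional[str] = None       # coordinates from GeoURL metadata
    whois_address: Optional[str] = None     # registrant address from a Whois query
    profile_location: Optional[str] = None  # location field in a hosted profile
    blogchalk: Optional[str] = None         # blogchalking keywords
    index_page_hint: Optional[str] = None   # location mined from bio/resume/local links


# Extractors in the priority order listed on the slide.
EXTRACTORS: List[Tuple[str, Callable[[Blog], Optional[str]]]] = [
    ("geourl", lambda b: b.geourl_meta),
    ("whois", lambda b: b.whois_address),
    ("profile", lambda b: b.profile_location),
    ("blogchalk", lambda b: b.blogchalk),
    ("index_page", lambda b: b.index_page_hint),
]


def locate(blog: Blog) -> Optional[Tuple[str, str]]:
    """Return (method, location) from the first extractor that yields a value."""
    for method, extract in EXTRACTORS:
        location = extract(blog)
        if location:
            return method, location
    return None  # no source produced a location


# Usage: a hosted blog with only a profile location falls through to step 3.
hosted = Blog("http://example.org/blog", profile_location="Buffalo, NY")
result = locate(hosted)  # ("profile", "Buffalo, NY")
```

Recording which method succeeded alongside the location makes it possible to compare success rates per source, as in the pilot figures above.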

Indexing Blogs to City Units
• How do we standardise geolocation data?
• Varying levels of detail...
  – Self-hosted blog: Precise
    • Street address
    • 9-digit zip code
  – Blog-hosting site: Can be vague
    • City, state, or even nation
    • Local links can provide telephone area codes

Indexing Blogs to City Units
• How to convert this to a standard unit?
• Labelling by city is vague
  – Expansion of city limits
  – Emergence of ‘second cities’ between big cities
• Requirement for an urban unit
  – “Geographic clusters consisting of certain sizes of population sharing physical proximity” [1]

Indexing Blogs to City Units

• The 3-digit zip code
  – Widely used in marketing and political strategies
  – Represents 4 different types of area:
    • Metropolitan city
    • Cluster of suburban cities and towns
    • Cluster of cities not immediately adjacent to a metropolitan area
    • Metropolitan cities plus embedded cities and towns
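
Reducing a precise zip code to the 3-digit unit is a simple prefix operation. The helper below is a minimal sketch under the assumption that the input is a US ZIP code written as 5 digits or as ZIP+4, with or without a hyphen. The function name `zip3` is ours, not the paper's.

```python
import re
from typing import Optional

# A 5-digit ZIP with an optional 4-digit extension (ZIP+4), hyphen optional.
_ZIP_RE = re.compile(r"^\s*(\d{5})(?:-?(\d{4}))?\s*$")


def zip3(zip_code: str) -> Optional[str]:
    """Return the 3-digit prefix of a 5- or 9-digit US ZIP code, else None."""
    match = _ZIP_RE.match(zip_code)
    return match.group(1)[:3] if match else None


print(zip3("14260"))       # 142
print(zip3("14260-1020"))  # 142
print(zip3("Buffalo"))     # None
```

Vaguer inputs such as a bare city or state name would need a separate lookup table before they could be assigned to a unit. That step is not shown here.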

Indexing Blogs to City Units
• Preliminary examination of blog distribution in the US
  – Users taken from LiveJournal and Diaryland
  – Both services include location in user profile
• 797 different 3-digit zip codes found
• Overall distribution consistent with population distribution and concentrations of high socio-economic status
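
A distribution like this can be tallied by counting profiles per 3-digit unit. The sketch below uses invented sample zip codes purely for illustration. It is not the paper's data or code, and the function name `blogs_per_unit` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def blogs_per_unit(zip_codes: Iterable[str]) -> Counter:
    """Count blogs per 3-digit zip unit; entries without 5 leading digits are skipped."""
    counts: Counter = Counter()
    for code in zip_codes:
        prefix = code.strip()[:5]
        if len(prefix) == 5 and prefix.isdigit():
            counts[prefix[:3]] += 1
    return counts


# Invented profile values, including one with no usable zip code.
sample = ["14260", "14201-1234", "10001", "unknown", "10027"]
counts = blogs_per_unit(sample)
distinct_units = len(counts)  # number of different 3-digit units seen
```

Comparing these counts with census population per unit would be the next step in checking the consistency claim above.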

Indexing Blogs to City Units

Figure 1. Distribution of blogs in sample [4]

Limitations of the Paper
• Authenticity of quoted geographic information is questionable
• 3-digit zip codes
  – Overstate the number of bloggers in metropolitan cities
  – Many small cities can be grouped into one unit, despite no evidence of common traits or social cohesion
  – Paper suggests dividing units by socio-economic profile

Related Work
• Geolocation in general
• Non-Geographical Mapping of the Blogosphere:
  – Hyperlink Maps
  – Kohonen Self-Organising Maps

Geolocation
• Geographic Information Systems (GIS)
  – Geoparsing
  – Geocoding
• Methods
  – “Whois” records
  – Blogging sites requiring registration
  – Postal addresses and telephone numbers
  – Geographic feature names
  – Hyperlinks
  – Metadata
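
As a rough illustration of geoparsing from postal addresses and telephone numbers, the sketch below pulls candidate US zip codes and telephone area codes out of page text with regular expressions. The patterns are simplified assumptions and will miss many real-world formats. Turning the extracted clues into coordinates (geocoding) is not shown.

```python
import re
from typing import Dict, List

# 5-digit ZIP with an optional hyphenated 4-digit extension.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
# North American numbers such as (716) 555-0100 or 716-555-0100;
# the capture group is the area code.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[\s.-]?\d{3}[\s.-]\d{4}\b")


def location_clues(text: str) -> Dict[str, List[str]]:
    """Collect zip codes and telephone area codes mentioned in the text."""
    return {
        "zip_codes": ZIP_RE.findall(text),
        "area_codes": PHONE_RE.findall(text),
    }


page = "Call (716) 555-0100 or write to 12 Main St, Buffalo, NY 14260-1020."
clues = location_clues(page)
```

A plain 5-digit number is not always a zip code, so in practice these matches would be treated as hints to be cross-checked against other evidence.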

Geolocation
• Uses
  – Information retrieval based on geographic criteria
  – Tailoring of advertising
  – Sociological and political trends: mapping the ‘buzz’ of a topic shows which areas are most interested in it
• Problems
  – Increasing number of mobile devices

Geolocation
• Trends
  – Blogging hotspots
  – More widespread blogging in Eastern US

Figure 2. Blogging Hotspots [6]

Geolocation
• Trends
  – Analysing blogs by geography shows where interest lies
  – You can see a correlation between blogs and restaurant locations

Figure 3. Steak n Shake Restaurants [6]

Mapping Blogosphere

• Other methods of mapping the blogosphere:
  – Mapping hyperlinks
  – Self-organising maps
  – Mapping communities

Mapping Hyperlinks

Figure 5. Outbound links [1]
Figure 6. Inbound links [1]

• Cybermap showing outbound and inbound links from www.littlegreenfootballs.com in 3D hyperbolic space
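
The data behind an outbound/inbound link map is a directed graph of blogs. The sketch below builds both views from a list of (source, target) links. The blog names are invented, and this is not how the cybermap in [1] was produced; the 3D hyperbolic layout itself is a separate visualisation step that is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def link_maps(
    edges: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build outbound and inbound adjacency sets from (source, target) links."""
    outbound: Dict[str, Set[str]] = defaultdict(set)
    inbound: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        if source != target:  # ignore self-links
            outbound[source].add(target)
            inbound[target].add(source)
    return dict(outbound), dict(inbound)


# Invented example links between three blogs.
edges = [
    ("blog-a.example", "blog-b.example"),
    ("blog-a.example", "blog-c.example"),
    ("blog-b.example", "blog-c.example"),
    ("blog-c.example", "blog-c.example"),  # self-link, ignored
]
outbound, inbound = link_maps(edges)
in_degree = {blog: len(sources) for blog, sources in inbound.items()}
```

Counting inbound links per blog (`in_degree`) gives one simple measure of prominence in the link graph.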

Self-Organising Maps
• Neural networks such as Kohonen SOMs can be used to map the blogosphere
• Advantages
  – Performs clustering of input data
  – Maps this onto 2D surface for easy visualisation

Figure 7. Kohonen Map of Blogs [7]
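
To make the clustering-onto-a-grid idea concrete, here is a minimal Kohonen SOM in plain Python. It is a teaching sketch and not the implementation used in [7]. The class name `TinySOM`, the linear decay schedule and the blog feature vectors (assumed to be shares of posts per topic) are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


class TinySOM:
    """Minimal Kohonen self-organising map on a rows x cols grid."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so runs are repeatable
        self.rows, self.cols = rows, cols
        self.weights: List[List[float]] = [
            [rng.random() for _ in range(dim)] for _ in range(rows * cols)
        ]

    def bmu(self, x: Sequence[float]) -> Tuple[int, int]:
        """Grid coordinates of the best-matching unit (closest weight vector)."""
        best = min(
            range(len(self.weights)),
            key=lambda i: sum((w - v) ** 2 for w, v in zip(self.weights[i], x)),
        )
        return divmod(best, self.cols)

    def train(self, data: Sequence[Sequence[float]], epochs: int = 50,
              lr0: float = 0.5) -> None:
        sigma0 = max(self.rows, self.cols) / 2
        for epoch in range(epochs):
            decay = 1.0 - epoch / epochs      # linear decay from 1 towards 0
            lr = lr0 * decay                  # learning rate shrinks over time
            sigma = max(sigma0 * decay, 0.5)  # neighbourhood radius shrinks too
            for x in data:
                br, bc = self.bmu(x)
                for i, w in enumerate(self.weights):
                    r, c = divmod(i, self.cols)
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])  # pull unit towards the sample


# Invented feature vectors: share of posts on (politics, technology, food).
blogs = {
    "politics_a": [0.9, 0.1, 0.0],
    "politics_b": [0.8, 0.1, 0.1],
    "tech_a": [0.1, 0.9, 0.0],
    "food_a": [0.0, 0.1, 0.9],
}
som = TinySOM(rows=3, cols=3, dim=3)
som.train(list(blogs.values()))
placement = {name: som.bmu(vec) for name, vec in blogs.items()}
```

Each blog ends up assigned to a grid cell in `placement`. Similar blogs would be expected to land on the same or neighbouring cells, although that depends on the parameters and is not guaranteed by this sketch.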

Mapping Communities
• Location, friendships and communities are all interrelated

Figure 4. The importance of location, interest and age in forming blogging communities [5]

Conclusion
• Summary of the Paper
• Geolocation
  – Methods
  – Uses
  – Trends
• Alternative Mapping Methods
  – Hyperlink Mapping
  – Self-Organising Maps

References
[1] R. Ackland, “Mapping the U.S. Political Blogosphere: Are Conservative Bloggers More Prominent?”, 2005.

[2] O. Buyukkokten et al., “Exploiting geographical location information of Web pages,” In Proceedings of WebDB-99, the 1999 ACM SIGMOD Workshop on the Web and Databases, 1999.

[3] B. Gueye et al., “Constraint-Based Geolocation of Internet Hosts,” In Proceedings of IMC ’04, Sicily, 2004.

[4] A. Halavais and J. Lin, “Mapping the Blogosphere in America,” In Proceedings of the Thirteenth International World Wide Web Conference (WWW2004), New York, 2004.

[5] R. Kumar et al., “Structure and Evolution of Blogspace,” In Communications of the ACM, 47:12, pp. 35-39, 2004.

[6] L. Lloyd, P. Kaulgud, and S. Skiena, “Newspapers vs. blogs: Who gets the scoop?” In AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs (AAAI-CAAW), California, 2006.

[7] J. Merelo-Guervos et al., “Mapping weblog communities,” Depto. Arquitectura y Tecnología de Computadores, Universidad de Granada, 2006.
