Science and Web 2.0

Upload: ian-mulvany

Post on 17-Jan-2015

DESCRIPTION

This is an edited version of a talk that I gave on the 11th of February to some PhD students from the University of Utrecht at a seminar on science and communication.

TRANSCRIPT

Page 1: Science and Web2.0

Science and Web 2.0

Ian Mulvany, Nature Publishing Group

Hi.

I hope that the take-home message from this short talk will be that web technologies could have an important role in science, that they are in their very early days at the moment, and that the outlook over the next few years is good.

I’m going to tell you a little bit about myself, about what Web 2.0 is, and why Nature is interested in these approaches.

After that little overview, I'm going to run you through a short demo of some of the things we have been developing at Nature for scientists.

Page 2: Science and Web2.0

My name is Ian Mulvany, and I’m a product development manager with the Web Publishing Group at Nature.

I began a PhD in Astronomy, but then moved into academic publishing. Last year I began working with this group at Nature because I am really excited about the opportunities the web has for science.

I’m telling you this because you are mostly a cell biology audience, so I’m interested in hearing from you what tools you use and what we are doing right or wrong from your perspective.

If you want to keep up with what our group is doing we have a public blog at http://blogs.nature.com/wp/nascent/

Page 3: Science and Web2.0

What is Web 2.0?

Historically, the term was first used as the name of an O'Reilly conference held in 2004 that discussed the re-emergence of internet business after the dot-com bubble.

What has this got to do with science?

We will see that the defining methods that new tech companies use may be applied to issues of concern to scientists.

At the heart of the Web 2.0 approach is getting and using data, so for a scientist it should make perfect sense.

Page 4: Science and Web2.0

Web 1.0 → Web 2.0
DoubleClick → Google AdSense
Ofoto → Flickr
Akamai → BitTorrent
mp3.com → Napster
Britannica Online → Wikipedia
personal websites → blogging
evite → upcoming.org and EVDB
domain name speculation → search engine optimization
page views → cost per click
screen scraping → web services
publishing → participation
CMS → wikis
directories (taxonomy) → tagging (folksonomy)
stickiness → syndication

The Web 2.0 meme was shaped by comparing companies from before the dot-com bubble with companies that emerged after it.

What is the difference between the list on the left and the list on the right?

Let’s take the example of Britannica vs Wikipedia.

The information in Britannica is centrally controlled. It has a relatively small number of contributors, and the workload per contributor is high.

Anyone can contribute to Wikipedia. A collaboration of thousands can produce a work of equal quality to one made by a more centrally controlled method.

Britannica’s revenues decreased from 650M to 50M over a 10-year period!

The new sites make it easy to add information and to use that information to answer questions or solve problems for people.

Page 5: Science and Web2.0

Slide: data formats placed on two axes, ease of contributing and ease of mining (easy to hard): plain text and emails, academic papers, microformats, semantic web, hyperlinks, tags, views, citations?

Let’s look at data formats in terms of how easy they are to create and how easy they are to mine for interesting information.

Plain text is the easiest to create, but it is very hard to mine.

Text that has been rigorously annotated (semantic-web stuff) is very easy to mine, but it is hard to get people to produce this sort of data in the day-to-day activity of their lives.

(One of the things that helped Wikipedia succeed was a tool that let people easily add articles, which were then converted into nice-looking web pages.)

Unfortunately, academic papers are really hard to write and are usually only available as PDF: the worst of both worlds.

Hyperlinks, page views, tags and possibly academic citations are easy to create and easy to mine. I’ll show you how some people have used these to build cornerstones of the web today.
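To make the contrast concrete: once text carries lightweight markup such as hyperlinks or the rel-tag microformat, mining it takes only a few lines. This is a minimal sketch using Python's standard `html.parser`; the sample HTML and the example.org URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class LinkAndTagParser(HTMLParser):
    """Collect hyperlink targets and rel="tag" labels from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []  # every href found on an <a> element
        self.tags = []   # tag names taken from links marked rel="tag"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        self.links.append(href)
        # rel may hold several space-separated values, e.g. rel="tag nofollow".
        if "tag" in (attributes.get("rel") or "").split():
            # rel-tag convention: the last path segment of the URL is the tag name.
            path = urlparse(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


page = """<p>We discussed <a href="http://example.org/paper">this paper</a>,
tagged <a rel="tag" href="http://example.org/tag/astronomy">astronomy</a>.</p>"""

parser = LinkAndTagParser()
parser.feed(page)
print(parser.links)
print(parser.tags)
```

Doing the same with free-form prose ("a paper about astronomy I read last week") would need guesswork; here the markup states the relationships outright.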

Page 6: Science and Web2.0

Google created their search engine by looking at hyperlinks.

If lots of pages link to one particular page, then that page is probably important (the big red ball here).

If that important page links to another page, that link carries weight too (the link from the red ball to the orange ball).

This mirrors the academic citation system. Sergey Brin and Larry Page were PhD students at Stanford when they founded Google.

You can see a conference paper that they wrote about their search engine here: http://infolab.stanford.edu/~backrub/google.html
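The idea above can be sketched as a simplified, textbook power-iteration version of PageRank, normalised so that all ranks sum to 1 and using a damping factor of 0.85 (the value suggested in that paper). The toy link graph loosely mirrors the slide and its page names are invented; Google's production ranking is far more involved.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Many pages point at "red"; "red" points at "orange".
web = {
    "a": ["red"], "b": ["red"], "c": ["red"], "d": ["red", "e"],
    "red": ["orange"], "orange": ["red"], "e": [],
}
ranks = pagerank(web)
ranking = sorted(ranks, key=ranks.get, reverse=True)
print(ranking[:2])  # red first, orange second
```

The orange page has only one incoming link, yet it ranks second because that single link comes from the most important page, which is exactly the citation-like behaviour described above.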

Page 7: Science and Web2.0

Amazon use page views and a database of user purchases to find things you might like.

Again, here they are using data that they get for free from people using their site.
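A minimal "people who bought this also bought" sketch, assuming nothing more than a list of past orders. This is plain co-occurrence counting, not Amazon's actual algorithm, and the item names are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def also_bought(orders):
    """For each item, count how many orders also contained each other item."""
    co_counts = defaultdict(Counter)
    for order in orders:
        # set() ignores an item bought twice in the same order.
        for a, b in combinations(sorted(set(order)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def recommend(co_counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(k)]


orders = [
    ["pipettes", "gloves", "agar"],
    ["pipettes", "gloves"],
    ["pipettes", "gloves", "goggles"],
    ["pipettes", "agar"],
]
co_counts = also_bought(orders)
print(recommend(co_counts, "pipettes"))
```

The only input is data that users generate for free simply by shopping, which is the point of the example.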

Page 8: Science and Web2.0

Google, again.

The last two examples used essentially static data sets. The data is updated, but real-time information is more or less unnecessary for finding the best web page on a topic or making a purchase recommendation (searching for news is very different).

Google AdSense is different.

Putting ads on a web page does need almost real-time information. Where has the person looking at this page been, and where are they physically located? How much time have they spent looking at different things?

The more accurate your matching based on behavior, the more money you are going to make.
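As a toy illustration of behaviour-based matching (not how AdSense actually works), the sketch below builds an interest profile from recent page views, weighting longer and more recent visits more heavily, and then picks the ad whose keywords fit best. The half-life, the view layout and the ad names are all hypothetical.

```python
def interest_profile(views, now, half_life=600.0):
    """Turn recent page views into keyword weights.

    Each view is (timestamp_seconds, seconds_spent, keywords). Longer visits count
    more, and a view's weight halves every `half_life` seconds (an assumed value).
    """
    profile = {}
    for timestamp, seconds_spent, keywords in views:
        decay = 0.5 ** ((now - timestamp) / half_life)
        for word in keywords:
            profile[word] = profile.get(word, 0.0) + seconds_spent * decay
    return profile


def best_ad(ads, profile):
    """Pick the ad whose keywords carry the most weight in the profile."""
    return max(ads, key=lambda name: sum(profile.get(w, 0.0) for w in ads[name]))


views = [
    (0, 30, ["telescope", "astronomy"]),   # short visit, long ago
    (900, 120, ["microscope", "cells"]),   # long visit, recent
]
ads = {
    "telescope-sale": ["telescope"],
    "microscope-sale": ["microscope", "cells"],
}
profile = interest_profile(views, now=1000)
print(best_ad(ads, profile))
```

The decay term is what makes freshness matter: the same views scored an hour later would give a different profile.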

Page 9: Science and Web2.0

We have seen how mining user provided data can help to solve problems about information on the Internet.

Are there any problems in science that these kinds of approaches could help with?

I think there are lots, and I think they break down into a few different categories. Let’s have a look at a few here:

- information management for the individual scientist
- communicating with the public
- mining data

On this slide we have our researcher scanning his favorite journal.

Page 10: Science and Web2.0

But the proliferation of journals has led to a problem: what to read?

Page 11: Science and Web2.0

How do we get the reader back in control?

Aside from journals, there are also lots of other places that the scientific conversation has moved to.

Page 12: Science and Web2.0

Discussion groups and mailing lists contain a huge amount of information, from snippets of computer code to long discussions about topics.

MarkMail, from MarkLogic, is a site that mines this information. Here we see a comparison of a search for FORTRAN vs a search for Java.

At the moment these kinds of archives are mainly relevant in the computer science area, but these kinds of conversations are going on all the time in every field.

http://markmail.org/
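The comparison on the slide boils down to counting how often a term appears in an archive over time. A minimal sketch, assuming the archive is already available as (year, message body) pairs; this is not how MarkMail itself is built, and the sample messages are invented.

```python
import re
from collections import Counter


def mentions_per_year(messages, term):
    """Count messages per year whose body mentions `term` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, body in messages:
        if pattern.search(body):
            counts[year] += 1
    return dict(sorted(counts.items()))


messages = [
    (1998, "Porting our FORTRAN solver to Java?"),
    (1998, "fortran 77 compiler flags"),
    (2005, "Java generics question"),
    (2005, "JavaScript is not Java"),
]
print(mentions_per_year(messages, "fortran"))
print(mentions_per_year(messages, "java"))
```

The word-boundary match keeps "JavaScript" from counting as a mention of Java on its own.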

Page 13: Science and Web2.0

Science blogs represent another medium where science discussions are happening.

Page 14: Science and Web2.0

These are just some of the sources that might begin to load up your reading inbox, but as I mentioned before, the web also offers great opportunities to communicate and share your scientific excitement with the general public.

Bugscope shares a scanning electron microscope with school classes across the world.

Page 15: Science and Web2.0

The Faulkes telescope does the same with an astronomical telescope.

Page 16: Science and Web2.0

Some solutions to these problems are already out there.

The Chemical Blogspace, run by Egon Willighagen from Wageningen University, automatically collects articles from blogs about chemistry and looks for the most popular articles.

http://cb.openmolecules.net/
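The "most popular articles" step can be sketched as: pull DOIs out of blog posts and rank papers by how many different blogs mention them. This is an assumption about the general approach, not Chemical Blogspace's actual code. The regular expression follows Crossref's commonly recommended pattern for modern DOIs, and the blog names and `10.5555/...` DOIs are placeholders.

```python
import re
from collections import defaultdict

# Based on Crossref's recommended pattern for modern DOIs (case-insensitive).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text):
    """Find DOIs in free text, trimming trailing sentence punctuation (a heuristic)."""
    return {m.group(0).rstrip(".,;:)").lower() for m in DOI_PATTERN.finditer(text)}


def most_discussed(posts):
    """Rank DOIs by how many different blogs mention them.

    `posts` is a list of (blog_name, post_text) pairs.
    """
    blogs_per_doi = defaultdict(set)
    for blog, text in posts:
        for doi in extract_dois(text):
            blogs_per_doi[doi].add(blog)
    # Most blogs first; ties broken alphabetically for stable output.
    return sorted(blogs_per_doi, key=lambda d: (-len(blogs_per_doi[d]), d))


posts = [
    ("blog-a", "Great paper: doi:10.5555/demo.1."),
    ("blog-b", "See https://doi.org/10.5555/demo.1 and 10.5555/demo.2"),
    ("blog-b", "More on 10.5555/demo.1"),
    ("blog-c", "Nothing to cite today"),
]
print(most_discussed(posts))
```

Counting distinct blogs rather than raw mentions stops one enthusiastic blogger from dominating the ranking; DOIs are lower-cased because they are case-insensitive.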

Page 17: Science and Web2.0

Jean-Claude Bradley from Drexel University does open notebook science.

You can see the data and lab notes for each experiment as it is done in his group through their blog http://usefulchem.blogspot.com/.

Part of his motivation is that all the data that never gets through to a published paper can also be very valuable for the community.

In arguments about precedence, where there is an existing trail of discovery, getting scooped is going to be harder.

Page 18: Science and Web2.0

There are also semantic-web approaches to science. The Crystal Eye project from the University of Cambridge is trying to automate the recognition of crystallographic data in academic papers.

Page 19: Science and Web2.0

Open Science, Web 2.0, Semantic Web

Though not exactly the same, Web 2.0, open science and the semantic web work well together, and they share some common traits, namely the sharing and openness of information.

Page 20: Science and Web2.0

This is leading to a brave new world in the space of scientific conversations.

So why is Nature involved in these kinds of non-journal initiatives, and what are the initiatives that we are involved with?

Page 21: Science and Web2.0

• "It is intended, first, to place before the general public the grand results of scientific work and scientific discovery"

• "to aid scientific men ... by affording them an opportunity of discussing the various scientific questions that arise from time to time"

This is Norman Lockyer, the first editor of Nature, and these are snippets of our mission statement.

One way of looking at the whole issue is that journal publication is only one aspect of communication in science.

As we have seen, there are a lot of new channels for communication apart from journals, and if Nature wants to remain relevant then we have to engage with these newly emerging conversations.

We have a group in Nature, web publishing, whose goal is to be experimental and try to both keep up to date with what is going on out there in terms of new developments, as well as trying to create new tools for researchers.

Page 22: Science and Web2.0

Nature Web Publishing group

The main products that we have developed so far are:

- database gateways
- OTMI (Open Text Mining Interface)
- podcasts
- Scintilla
- Nature Network
- Nature Precedings
- Connotea

Page 23: Science and Web2.0

Nature Network is a place for hosting discussions and forums about science related topics.

Page 24: Science and Web2.0

Second Life Nature

- lectures in Second Life

As my colleague Jo, who works on Second Life, likes to say, SL is a platform that offers a lot of potential, but not many people know the best way to use it yet. We have been hosting a couple of scientific projects there:

- UCL Centre for Advanced Spatial Analysis
- Drexel chemical reactions
- an artificial ecosystem (some of the creatures escaped our island and were found in the wild)

I'll be happy to answer questions about these later, but so far the most successful thing we have done is to use it as a location for hosting talks for the general public.

Page 25: Science and Web2.0

Scintilla is like the Chemical Blogspace in that it aggregates content from over 700 science-related blogs.

You can tag, share and rate stories that you like on this site.

Page 26: Science and Web2.0

Precedings is a preprint server for the life sciences.

If you have a presentation or a poster that you will not later submit to a journal, you could place it here for people to access. When an item is uploaded to Precedings it receives a digital object identifier (DOI), which can be used to cite the material later.

Page 27: Science and Web2.0

Connotea is the tool that I am responsible for. It is a citation and bookmark management service.

If you are interested in finding out how it works, the best thing to do is probably just to create an account and play with it a bit. It's pretty easy to use, and I'll give you a quick demo of how it works.

Social bookmarking sites are often write-only, but by making collections public and tagged you provide a resource for others. For example, I read what my boss is tagging.

This is kind of an orthogonal use case for these kinds of services, even before we add in more complex algorithms.

For example, I'll show you later a tool that joins folksonomies with ontologies (Entity Describer).

In a social environment you create a collection that might represent your interests. By adding some intelligence to the back end, we hope that we can encourage serendipity by highlighting related things such as related tags, users and content.

It might also provide a means to keep track of the scientific conversation around a topic outside the citations in the literature, for example by tying blog comments to the DOIs of papers.
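The "related tags" idea can be sketched with Jaccard similarity: two tags are related if they tend to be applied to the same bookmarks, normalised so that very popular tags do not swamp everything. This is a hypothetical illustration, not Connotea's back end; the (user, url, tags) layout and the sample data are assumptions.

```python
from collections import defaultdict


def tag_index(bookmarks):
    """Map each (lower-cased) tag to the set of URLs it has been applied to.

    `bookmarks` is a list of (user, url, tags) triples.
    """
    index = defaultdict(set)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag.lower()].add(url)
    return index


def related_tags(index, tag, k=3):
    """Rank other tags by Jaccard similarity of the URL sets they label."""
    tag = tag.lower()
    target = index.get(tag, set())
    scored = []
    for other, urls in index.items():
        shared = target & urls
        if other == tag or not shared:
            continue
        scored.append((len(shared) / len(target | urls), other))
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _score, other in scored[:k]]


bookmarks = [
    ("alice", "http://example.org/p1", ["protein-folding", "simulation"]),
    ("bob", "http://example.org/p1", ["protein-folding", "md"]),
    ("bob", "http://example.org/p2", ["simulation", "md"]),
    ("alice", "http://example.org/p3", ["astronomy"]),
]
index = tag_index(bookmarks)
print(related_tags(index, "simulation"))
```

Note that "md" and "simulation" come out as related even though no single user applied both to the same item at first; the link emerges from the combined collections.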

Page 28: Science and Web2.0

The Chemical Blogspace can read tags in Connotea for items that have a specific chemical tag. This way we can begin to automate the aggregation of social information around pieces of scientific information.

Page 29: Science and Web2.0

Mirko Gontek at the University of Cologne: information visualisation of links in Connotea.

These social links can create networks of information on top of the basic information.

This is what we want to use to start building collaborative intelligence into these systems.
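One way to build such a network, sketched under assumptions: treat two users as linked when they have bookmarked the same URL, weight the link by how many URLs they share, and then look for connected groups. This is an illustrative graph analysis, not the Cologne visualisation or anything Connotea ships; the (user, url) layout and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def user_network(bookmarks):
    """Link users who bookmarked the same URL; edge weight = number of shared URLs.

    `bookmarks` is a list of (user, url) pairs.
    """
    users_by_url = defaultdict(set)
    for user, url in bookmarks:
        users_by_url[url].add(user)
    edges = defaultdict(int)
    for users in users_by_url.values():
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return dict(edges)


def communities(edges, all_users):
    """Group users into connected components of the shared-bookmark network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(all_users):
        if start in seen:
            continue
        # Depth-first search from each user not yet assigned to a group.
        stack, group = [start], set()
        while stack:
            user = stack.pop()
            if user in group:
                continue
            group.add(user)
            stack.extend(neighbours[user] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


bookmarks = [
    ("alice", "http://example.org/u1"), ("bob", "http://example.org/u1"),
    ("alice", "http://example.org/u2"), ("bob", "http://example.org/u2"),
    ("carol", "http://example.org/u2"), ("dave", "http://example.org/u3"),
]
edges = user_network(bookmarks)
groups = communities(edges, {user for user, _url in bookmarks})
print(edges)
print(groups)
```

The edge weights are the "social links" layered on top of the plain bookmarks; richer analyses (centrality, clustering) would start from the same graph.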

Page 30: Science and Web2.0

Graph Analysis?

This is the graph of items for one user in Connotea.

Page 31: Science and Web2.0

There are also other tools out there that are doing the same kind of thing, but I’m partial.

Page 32: Science and Web2.0

http://www.connotea.org/blog

http://www.connotea.org/wiki/ConnoteaTools

Page 33: Science and Web2.0

Thanks!

I hope that you take away from this talk that tools are beginning to emerge. We are in the very early days of their development, and they can only get more powerful as expertise in developing these kinds of sites becomes more commonplace.

A frequent complaint that I get is that people have no time to investigate new ways of working. However, if you look at how you manage your information now, a small investment (for example, posting your talks to a community site, or keeping all of your bookmarks in one location) could begin to make a difference.

Thank you!