issue 15 pdf

10
page 4 news from the EGI community ISSUE 15 APRIL 2014 (Matti Mattila / wikicommons) TOP STORIES page 2 MORE 09 DataAvenue: tool for data transfer European Grid Infrastructure www.egi.eu Grid to the masses - the EGI MOOC page 3 Case study: designer bacterias page 5 Inspired VAPOR - toolkit for VOs 08 The EGI Solutions Portfolio Can Open Science boost EGI's impact? page 7 06 Searching high and LOFAR pulsars 01 Lightning talks at the Community Forum 2014 The EGI Engagement Strategy

Upload: tranminh

Post on 31-Dec-2016

238 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Issue 15 PDF

page 4

news from the EGI communityISSUE 15APRIL 2014

(Mat

tiM

attil

a/w

ikic

omm

ons)

TOP STORIES

page 2

MORE

09 DataAvenue: tool for data transfer

European GridInfrastructure

www.egi.eu

Grid to the masses - the EGI MOOCpage 3

Case study: designer bacteriaspage 5

Inspired

VAPOR - toolkit for VOs

08 The EGI Solutions Portfolio

Can Open Science boost EGI's impact?page 7

06 Searching high and LOFAR pulsars

01 Lightning talks at the Community Forum 2014

The EGI Engagement Strategy

Page 2: Issue 15 PDF

In this issue...> Franck Michel introduces VAPOR, a new set of tools to make

VOs easier and cheaper to run> Jan Bot writes about the experience of creating a MOOC

from scratch> Gergely Sipos unveils the EGI Engagement Strategy> Anna Slater writes about the use of grid computing to design

a toxic chemical-eating bacteria> Neasan O'Neill interviews astronomer Thijs Coenen> Sergio Andreozzi asks if Open Science can boost EGI's impact> Javier Jiménez writes about a new iteration of the EGI

Solutions> Kitti Varga and Akos Ajnal introduce DataAvenuePlease send your comments, feedback and suggestions to:

[email protected] Thanks!

This Issue

The Lutheran Cathedral is just across theHelsinki University Main Building, where

the Community Forum will be held.(Illustration: Matti Mattila / wikicommons)

1Inspired // Issue #15 April 2014

Lightning talks in HelsinkiIt's almost time for the EGI Community Forum 2014 (19-23 May)

The final preparations for theEGI Community Forum 2014 inHelsinki (19-23 May) are wellunder way.The timetable is available onlineat: http://go.egi.eu/CF2014The list of keynote speakers isfinalised and we have abstractsand more information aboutthe speakers on the conferencewebsite: http://go.egi.eu/keynotesLightning talksOne of the new features of theEGI Community Forum 2014 willbe a series of Lightning Talks byresearchers who can demons-trate the scientific impact of gridcomputing.After a successful call forparticipation, the organisersselected the following talks forthe Opening Plenary (19 May):> Towards the big data strategiesfor EISCAT-3D, presented byIngemar Haggstrom> Implementing an air qualityforecasting system on the grid,presented by Anastasia Poupkou

> Large-scale search for periodicsolutions to the Newtonian three-body problem, presented byMilovan SuvakovThe range of talks aims to coverdifferent areas benefitting fromthe support of EGI as an e-Infra-structure for science. Ingemar’stalk highlights the importance ofEGI for large-scale, collaborative,pan-European projects in theESFRI roadmap. Anastasia’scontribution demonstrates theimportance of distributedcomputing to tackle societalchallenges. And Milovan’s workis an example of excellentscience with high scientificimpact in its field.And more...We will also have a dedicatedsession for networking, postersand demos on Wednesday 21May 11:00-12:30. The NetworkingSession will be the ideal occasionto interact with colleagues, seewhat is new and brainstormabout ideas and futurecollaborations.

http://cf2014.egi.eu

Page 3: Issue 15 PDF

2

The VAPOR portal: toolkit for VO management

Franck Michel introduces a new set of tools to make VOs easier and cheaper to run

Virtual Organisation (VO) mana-gers and their support teamshave a large variety of tools andportals available to assist them inmanaging their community andoperating resources. A fewexamples are GStat to browsethe information system, SAMNagios to monitor availability, theCERN Experiment Dashboard orthe VO Operations Portal.Although generic, some of thesetools are often designed to meetthe needs of somewhat specificcontexts (e.g. a large communitywith well-established resourcemanagement policies), and theiroperation requires solid ITsupport.For small or newly emergingcommunities, starting up theoperations of a VO, having anoutlook on the set of storage andcomputing resources that supportthe VO, monitoring activity,dealing with the files spread outby users, can be difficult tasks.VAPORThe Vo Administration and ope-rations PORtal, was developed tohelp small to medium-sized griduser communities to performcommon VO administration andoperation tasks, at a reducedhuman cost. Such communitiestypically have little or nodedicated IT support and mayuse resources either in reservedor opportunistic manner. VAPORis generic, experiment-independent, and complementsexisting tools such as SAM Nagiosand the VO Operations Portal.VAPOR’s features were designedto (1) provide statistical reportsand status indicators about theresources supporting a VO, and

(2) facilitate the management ofVO users’ data.With VAPOR, VO managers can:

(1) generate on-demandreports to (i) visualise status in-formation of resources suppor-ting a VO, (ii) monitor the runningand waiting jobs to help detectpeaks of activity and bottlenecks,(iii) monitor computing elementssuccess rate and time response,and (iv) compute a so-calledwhite list of computing elementsto feed a job submission system.

(2) use data managementfeatures to (i) monitor storageelements to prevent them fromfeeling up, (ii) check and fix incon-

sistencies between the storageelements and the file catalogue(dark data, lost files), and (iii)clean-up files left behind byformer users.VAPOR has been progressivelydeployed during the first quarterof 2014 and is already supportingthe VOs biomed, compchem,enmr.eu, vlemed, shiwa-workflow.eu and vo.france-grilles.fr.We believe that VAPOR will helpthose existing communities indaily administrative and opera-tional tasks, and will possiblyenable the mutualisation of tasksbetween several communities.

How to use VAPOR

VOs can start using VAPOR on demand by sending a request [email protected].

Or they can easily deploy their own instance of VAPOR usingdetailed documentation: http://go.egi.eu/vapor-install

Project wiki: https://wiki.egi.eu/wiki/VT_VAPOR

© timsackton / Fotopedia / CC-BY-SA 2.0

Inspired // Issue #15 April 2014

Page 4: Issue 15 PDF

3

Bringing grid computing to the masses: the SURFsara& EGI Massive Open Online Course

Jan Bot on the experience of creating a MOOC from scratch

Massive Open Online Courses(MOOCs) are complete coursestaught to many students using anonline platform, with the contentbeing freely available to anyoneinterested. This new form ofeducation has recently becomepopular and many universitieshave experimented teachingtheir courses online, usingeducational platforms such asCoursera, EdX or Udacity.Grid computing is perceived bymany as being hard to learn. Thetheoretical background in distri-buted computing and the needfor hands-on experience is hardto obtain for many scientistsworking in fields with little or noexperience in large-scalecomputing.Seeing this opportunity, wedeveloped a MOOC to help thesescientists taking their first fewsteps on grid computing.Grid computing is well suited tobe taught as a MOOC, due to itsdistributed and internationalnature. With the backing of theEGI-InSPIRE project, we set out tocreate a course which wouldexplain some of the fundamentalconcepts behind grid computing,while also providing the hands-on experience needed to startworking with it directly.The first step was to work on thesupport materials: slide decks,quiz questions and assignments.Then, we asked different peoplefrom the EGI community tocontribute use cases to showhow grid computing is currentlybeing used. The use cases were,just as the course’s lectures,

recorded as screencasts withvoice-overs to explain the materialand covered a variety of scientifictopics, including astronomy, lifesciences and climate modelling.The course ran from November2013 until the end of January2014. Over 350 people joined, ofwhich 30 completed the finalassignment. A 10% graduationrate is actually to be expectedwhen working on a MOOC, asmost students are only interes-ted in exploring some parts ofthe content.Our students included PhD stu-dents from different fields,programmers based at Dutchmedical centres and astronomersfrom LOFAR.The students had the opportunityto get hands-on experience ongrid computing by using a pre-configured Virtual Machine (VM)with all the necessary softwaretools installed. The VM providedthe grid environment so that thestudents could test their under-standing on quizzes, weekly

exercises and the final test. Thequizzes were graded andstudents with an average scoreabove 60% advanced to the finalassignment.The feedback we received isencouraging. Of the studentswho completed the final survey,92% (23) gave the MOOC a 4 or 5out of 5 grade, and about half ofthese said they will use gridcomputing for their research.The MOOC course materials havebeen added to the SURFsarawebsite where they are now partof the grid documentation. Thematerial created for the course isstill available and can be used byanyone interested.

The MOOC team

Anatoli Danezi,Jeroen Schot,Tijs de Kler,Nico Kruithof,and Jan Bot

url: http://mooc.uva.nl/portal

© Marie-Lan Nguyen / Wikimedia Commons / CC-BY 2.5

Inspired // Issue #15 April 2014

Page 5: Issue 15 PDF

4

The EGI Engagement Strategy

Gergely Sipos summarises plans for future engagement with user communities

Sustainability is essential for e-Infrastructures and the scientificcommunities that they support.Many of these scientificcommunities have researchagendas measured in decadesand need to be assured of thecontinued operational presenceof the e-Infrastructures that theyadopt.EGI’s sustainability plans havebecome increasingly coupledwith its long-term strategy toconnect researchers from all fieldsof science across the wholeEuropean Research Area (ERA) withthe reliable and innovative ICTservices from EGI that they need toundertake their collaborativeworld-class and world-inclusiveresearch.Engagement is a key element ofthis EGI strategy with thefollowing goals:

1. Identify scientific commu-nities from the ERA that canbenefit from the use of solutionsprovided by the EGI community.

2. Reach out and engage withcommunities about ICTtechnologies to capture theirrequirements.

3. Help communities toaddress scientific challengeswith existing EGI solutions, or todevelop new solutions as needed.

4. Support scientific commu-nities to become self-sufficientusers of the EGI e-Infrastructureservices.The Engagement Strategydescribes the goals and targetsof EGI engagement activities andprovides information about thehuman networks, online

resources and tools that willhelp us to implement theseactivities.In the next two years a growingnumber of Research Infrastru-ctures (RIs) from the ESFRI andnational roadmaps are expectedto reach implementation oroperational stage. These RIs arealready exploring the currentand future needs of their usercommunities and they bringtogether a wide diversity ofstakeholders looking for solutionsto many of the problems scienceis facing today.The ESFRI RIs, their preparatoryprojects, and other multinationalscientific collaborations areconsidered as the primarypotential beneficiaries of EGIservices and therefore the primetargets of EGI Engagementactivities.A second target group are smallresearch collaborations andresearch networks. Unlike RIs,these groups may not be awareof the benefits of e-Infrastructuresto science, so discussions haveto start at a more basic level.

The EGI Engagement workflowaligns elements into a processthat helps EGI to reach newusers, and support themthrough the use of EGI services.This workflow consists of threephases:1. Outreach: Using marketingand communication, this phaseaims to identify ERA memberswhose work could be lifted tothe next level by EGI’s e-Infrastructure services, andinform them about the EGIsolutions and possibleengagement options.2. Scoping: In this phase,engagement with new users isdeepened; detailedrequirements from their e-Infrastructure use cases arecaptured and translated intofocussed support project plans.3. Implementation: This phaseexecutes Virtual Team projectsthat, after successfulcompletion, will enable the usersto reach new frontiers inscience. The projects will alsoindirectly result in a diversifieduse of EGI’s solutions.

Inspired // Issue #15 April 2014

Page 6: Issue 15 PDF

5

Designing a toxic chemical-eating bacteria, betweenthe lab and the computer

Anna Slater explains why grid computing was important

For centuries man has harnessedvarious chemicals in industry andresearch. They are not all benignthough and contamination ofland and water is a potential risk.Millions of years of evolutionhave created an astonishingdiversity of bacteria capable ofprocessing countless chemicalsinto more harmless substances.These too have been takenadvantage of, but some chemicalcompounds used by man do notexist naturally.TCP (1,2,3-trichloropropane) isone such compound. Firstcreated during the industrialrevolution it is now used as apesticide and in chemical manu-facturing. It is also toxic andpersistent, once released in togroundwater, it can stay there forover 100 years. A potentialsolution is clearly a bacteria thatcould convert TCP into a harmlesscompound, quickly and cost-effectively. Yet there hasn’t beensufficient time for one to evolve.Now a group of scientists atLoschmidt Laboratories ofMasaryk University in the CzechRepublic have used the grid tohelp evolution, by engineeringsynthetic bacteria that canmetabolise TCP into harmlessglycerol.The team, led by Jan Brezovskyand Zbynek Prokop, used acombination of grid-poweredmathematical modelling andlaboratory experiments to designand test a five-step metabolicpathway for a new bacterium.The key is balancing the activityof multiple enzymes to keep thebacterium alive, while sustaining

a high processing rate. The teamwere also able to engineer newenzyme variants, which addsfurther complications of whatvariants to create, and their mosteffective combinations andconcentrations.The scientists used computer-assisted directed evolution topredict and plan the creation ofthe enzyme variants. These werethen created and tested for theirperformance. Using these valuesthe team then created a model tosimulate the vast number ofpossible enzyme ratios, in orderto find the optimum combination,which would balance toxicity andglycerol production.The team consumed around 100CPU days using the grid compu-ting resources provided byMetacentrum in the CzechRepublic. The grid-assistedapproach offered a twofoldadvantage. “We could explorecombinations in much higherdetail since we could afford torun variants with minimal mutualdifferences,” said Jiri Damborsky,

one of the scientists in the group.More importantly, grid resourcesenabled tight interaction betweenrounds of experimental work andthe ongoing refinements andpredictions of the model.“Taking into account therequirements for high accuracyand the time demand ofcomputationally intensivecalculations of differentialequations, grid calculation wasthe only reasonable choice.”The predictions of the modelwere used to create bacterialstrains containing the optimalenzyme ratios. These real-lifebacteria behaved very closely towhat was predicted. The modelalso revealed limitations ofspecific enzyme variants, andpredicted ways to improvepathway function. “Our studyhighlights the potential offorward engineering of micro-organisms for the degradation oftoxic anthropogenic compounds,and the essential role of gridcomputing to design complexbiological systems,” Jiri concludes.

Inspired // Issue #15 April 2014

Diagram of the TCP molecule, with Cl atoms in green.(Source: wikicommons)

Page 7: Issue 15 PDF

6

Searching high and LOFAR pulsars

Neasan O'Neill interviews Thijs Coenen about his research and the added value ofgrid computing

More information

url: http://www.lofar.org/

Clouds of charged particles move along the pulsar's magnetic field lines(blue) and create a lighthouse-like beam of gamma rays (purple).

(Image: NASA via wikicommons)

Inspired // Issue #15 April 2014

What is your research?I am an astronomy researcher

with the LOFAR Pulsar WorkingGroup based at the Universityof Amsterdam. My workfocuses on discovering nearbyweak pulsars using data fromthe LOFAR Pilot Pulsar Survey(LPPS) and the LOFAR Tied-Array Survey (LOTAS).

What are pulsars?Sometimes near the end of a

star’s life their core gets superdense and collapses under itsown weight. This results in amassive explosion, or super-nova. These in turn createneutron stars that rotate athigh speeds and emit radiosignals, light and radiation, likea cosmic lighthouse. These areknown as pulsating radio starsor pulsars.

So what is LOFAR?The Low Frequency Array (LOFAR)

is a new radio telescope thatconsists of receivers spread outover Western Europe, withmost being based in the northof the Netherlands. It uses alarge number of antennas thatare combined throughsoftware to form one largetelescope to detect weak, lowenergy radio signals. LOFARhas a unique view of the sky, itis much more sensitive thanother observatories operatingat similar frequencies and isone of the largest in the world.

Why study pulsars?Pulsars are valuable natural

laboratories for astronomersbecause the conditions theyare host to cannot be createdon Earth. Since their discoveryin 1967 they have given usinsight into gravitational fields,

planets outside our solar systemand how matter behaves atextremely high density.

How has the grid been useful?Processing the data from pulsar

surveys is a largecomputational task. The twopilot surveys consisted ofseveral thousands ofobservations of the sky that allneeded to be searched forperiodic signals and singlebright pulses. Using the Dutchresources operated bySURFsara and connected to theEuropean grid I couldefficiently search the data. Itmade my life a lot easier bothin terms of the overall timetaken by the computationaltask and the time spentinteracting with and babysittingthe data processing.

What kind of results have youhad?

Back in November as part of thedefence of my PhD I was able

to announce the first newpulsars discovered by LOFAR.It not only was a big part of myPhD but also validates theapproach we have taken forthe LOFAR pulsar surveys andbodes well for the full-strengthpulsar survey currentlyunderway

And what does the future hold?I continue to work with LOFAR

on pulsars and I expect we willhave more announcementsabout new pulsars as timemoves on. However it is a largeproject and LOFAR will have animpact on many other areas ofastronomy from our planet’sown ionosphere to massive,distant, galaxies formed at thevery beginning of the universe.

Page 8: Issue 15 PDF

7

Can Open Science boost impact in a 'Publish or Perish'reality and make EGI’s research outputs more visible?

Sergio Andreozzi and Ivo Grigorov on the advantages of Open Science

Excellence in science, brightideas and innovative thinkingwill always be a top skill whenembarking on a research career.But in this highly-competitiveage, 'how' we do research cancontribute as much to impact asthe ingenuity of the researchideas themselves.The definition of impact ischanging in the eyes of thefunders from simple citationcounts to how much societalengagement is achieved by theresearch results. Funders are nolonger just satisfied with cleverideas, but want to see themaccessible to citizen scientists,the private sector, and society atlarge. The Horizon 2020 criteriaare just one last example of howopen research results receiveequal importance to theinnovative character of thatresearch.Open Science provides concreteways to improve the day-to-dayresearch routine. Instead ofbeing just another burdennecessary to pass the reviewprocess, Open Science in theInternet age offers the potentialto make individual researchmore visible, talked about,followed and understood by thewider scientific community,inviting serendipity and potentialnew collaborations.How to optimise the chances forthe evaluation process, whileboosting the personal profile ofthe individual researcher, is thetopic of session 'Open Access toEGI Research Outputs' duringthe upcoming EGI CommunityForum in Helsinki (21 May 2014).

The session will also cover theservices EGI is developing tosupport the implementation ofthe Open Access policy and toallow researchers to publish andshare the research outputs thatare possible thanks to EGI. Theaim is to increase discoverabilityand reuse in order to accelerateinnovation and research.Making EGI’s value visibleOpen Access also allows EGI todemonstrate its scientific impactby linking research outputs to itsservices and resources, or to theservices and resources providedby the National Grid Initiativesand virtual organisations.The collaboration with theOpenAIRE project is key to thisgoal and the new functionalitiesthat this project has been deve-loping for EGI will also be pre-sented at the Community Forum.EGI.eu will present its plans fordeveloping a federation of digitalrepositories for open data to beoffered to the communities.With this, EGI aims to stimulate achange in the mind set of

researchers and highlight thatresearch and e-Infrastructuresshould become a primaryconcept that should be citableand linked to publications, data,applications and so on torecognise their value, but also toincrease the visibility as enablersof new scientific discovery fortoday’s researchers andtomorrow’s citizen scientists.

More information

Sergio Andreozzi is Strategy& Policy Manager at [email protected]

Ivo Grigorov is a projectmanager for TechnicalUniversity of Denmark andmember of the FP7 FOSTERSteering [email protected]

Open Science workshop@CF2014http://go.egi.eu/egicf14-oa

Inspired // Issue #15 April 2014

© Marie-Lan Nguyen / Wikimedia Commons / CC-BY 2.5

Page 9: Issue 15 PDF

8

The EGI Solutions Portfolio

Javier Jiménez writes about the new version of the EGI solutions

The European Grid Infrastructure,as a community, serves theneeds of multiple resourceproviders, research communitiesand the National Grid Initiatives(NGIs). All these consumers havedifferent needs, problems andexpectations, which are evolvingconstantly and quickly.The EGI solution portfolio wasdeveloped during 2013 topresent dedicated answers tospecific user needs. But as theusers’ requirements evolvequickly, so do the EGI Solutions.The EGI solutionsThe Community-DrivenInnovation & Support solutionaddresses the way EGI respondsto the researchers’ supportqueries. Whenever researchersencounter a challenge accessingEGI resources, they can, asbefore, knock on many doors.But if the problem requires anew technology, it is nowpossible to summon a group ofexperts to put their brainstogether and create aninnovative answer. This will,then, become part of the pool ofpreviously existing applications,workflows or any other alreadyexisting approach.The Federated Cloud solution isthe long-awaited response tothe demand for a Europeanfederation of academic clouds.With this solution, researchersobtain a single cloud system fortheir research activities, whichthey are able to scale to theirrequirements, which is fullyresilient and free from vendorlock-in. The user-researchers canfocus on their core work andobtain new approaches to theirwork.

The Federated Operationssolution is aimed at theresource infrastructureproviders of EGI, to make theiroperations even more efficientand effective, or at those wishingto become members, toguarantee seamless integration.This solution relies on thelightweight standards family forservice management in ITservice provision, known asFitSM, which EGI has helpedshape. This is a breakthrough inthe implementation of servicemanagement routines in allfederated IT service provisioning,not necessarily related withscientific production. With thefocus on sustainability, theexpertise achieved within EGImight also be valuable for otherIT services organisations in theirstruggle for quality managementin a federated environment.The High-Throughput DataAnalysis solution represents thecore of the EGI activity, which isthe provision of high qualitydata- and computation intensiveresources in a distributedinfrastructure.

New tools for make yourwork easierThe solutions are described in aseries of white papers describingthe problems they address, whoare the beneficiaries from thesolution and how the solutionsare built to provide value.These documents can be usedand delivered to your contacts atall levels. They aim to help you inexplaining what is what EGI does,and what EGI’s value propositionis.

White papers

The EGI Solutions aredescribed in the followingwhite papers:

Federated Operationshttp://go.egi.eu/2196

Federated Cloudhttp://go.egi.eu/2197

HTC Data Analysishttp://go.egi.eu/2198

Community Supporthttp://go.egi.eu/2199

Inspired // Issue #15 April 2014

Page 10: Issue 15 PDF

9

Data Avenue: a new tool for data transfer

Kitti Varga and Akos Hajnal on a new file commander tool for data transfer

Data Avenue is a file commandertool for data transfer, enablingeasy data movement betweenstorage services such as grids,clouds, clusters andsupercomputers using variousprotocols, such as HTTP, HTTPS,SFTP, GSIFTP, SRM, and S3.If you have ever faced problemstransferring data to theenvironment where yourexperiment is running, thenData Avenue is made for you.The Data Avenue interfaceallows you to browse, downloadand upload data to and from thesupported data storages, andmove data easily between themeven if they are accessed bydifferent protocols.The storage solutions usedtoday in distributed computingenvironments have beendeveloped to serve specific goalsby fulfilling specific criteria.Accessing these storageresources, however, requiresdedicated protocols and tools toallow users to organise, manage,up- or download data via agraphical interface or commandline. Selecting the most suitablestorage resource for ourrequirements thus involvessetting up the appropriatesoftware environment, andlearning how to use the relatedtools.Data Avenue offers a great, easy-to-use solution to this problem.Data Avenue has an easy-to-

understand user interface and itis very similar to a filecommander tool that everyonecan use. It has two parallelwindows to open the origin andthe target of the file to becopied. The user has toauthenticate both sides andafter the successful log in, thefiles can be transferred betweenthe sources without using theuser’s local computer storage.Data Avenue can be accessedfrom everywhere, using acomputer with an up-to-datebrowser and an internetconnection.The Data Avenue family consistof three members.The Data Avenue@SZTAKI is astand-alone installation of theData Avenue portlet hosted atMTA SZTAKI that can be usedeasily and quickly to try out thefeatures of Data Avenue withoutany installation.The Integrated Data Avenue is aportlet integrated into WS-

PGRADE/gUSE that offers a web-based interface for managingdata storages with the help ofData Avenue Blacktop.For developers, Data AvenueBlacktop is the core service thatcan be used to perform differentoperations related to storagesthrough a web service interface.

More information

url: https://data-avenue.eu/

If you would like to learn moreabout the Data Avenue service,there will be a live demo of theData Avenue service at theEuropean Grid InfrastructureCommunity Forum 2014 inHelsinki, Finland.

Inspired // Issue #15 April 2014