
Building Blocks for a Simple TeraGrid Science Gateway: Issues to Consider in Development (40min)

Anurag Shankar
TeraGrid Science Gateways Team
Indiana University

TeraGrid 2007
Madison, WI


June 2007 TeraGrid 2007 2

This unit will try to answer the following questions:

• What is a science gateway?
• What questions to ask before building one?
• What problems scientists face when computing?
• What can gateways do to help?
• What technologies can be used?
• How to ensure that the gateway will be used?
• When is using the TeraGrid appropriate?
• What resources do I need to build one?


What do you really mean by a science gateway?

– A (web-based) GUI that allows a scientist to do some sort of computation by clicking buttons.

– The computation requires resource(s) at the back end to carry it out, e.g. storage, CPU cycles, databases, etc.

– These resources could be modest (perhaps just a PC) or significant (a compute cluster or a grid).

– Pardon the subsequent, implicit CPU-cycle-centricity. The gateway could as well be a data repository, etc.
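
The front-end/back-end split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_backend`, `gateway_app`, `call_gateway`, and the form field `n` are hypothetical names, and the "back end" here is a trivial local function standing in for a PC, cluster, or grid.

```python
from urllib.parse import parse_qs


def run_backend(n):
    """Stand-in for the back-end resource: a trivial local computation."""
    return sum(i * i for i in range(1, n + 1))


def gateway_app(environ, start_response):
    """WSGI front end: read the form value, call the back end, return text."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        n = int(params.get("n", ["10"])[0])
    except ValueError:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"n must be an integer\n"]
    result = run_backend(n)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"sum of squares up to {n}: {result}\n".encode("utf-8")]


def call_gateway(query):
    """Drive the app without a network socket (handy for quick checks)."""
    statuses = []
    chunks = gateway_app(
        {"QUERY_STRING": query},
        lambda status, headers: statuses.append(status),
    )
    return statuses[0], b"".join(chunks).decode("utf-8")
```

To try it in a browser, the same `gateway_app` could be served with the standard library's `wsgiref.simple_server.make_server`; swapping `run_backend` for a call to a remote resource is where the real gateway work begins.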


What is a TeraGrid science gateway?

– A web interface
  • with science users in the front and TeraGrid services in back (a traditional TG SGW).
  • that bridges an existing non-TeraGrid science grid and the TeraGrid (a grid-bridging SGW).
  • that allows applications running on a user’s desktop to access TeraGrid services (a personal TG SGW).

Will TeraGrid build a science gateway for me?

• Nope. But we will gladly help you build one.


Why call them gateways and not portals?

– We could, but we distinguish the two here for the sake of clarity.

– We will use the word “portal” generally, to refer to an entry point, a URL on the web. A portal could be an aggregation point for information, services, or tools, a means to allow ubiquitous access or the ability to customize, etc.

– We define a “science gateway” as a portal designed specifically for (or by) a specific science community.


Ok, so I think I want to build a science gateway. Before we even start, what are the crucial questions to ask?

What is it that we are trying to do here?

• Is it to lessen the pain? For whom?
• Is it to build something because it uses cool technology users will love?
• Is it to get the damn thing done so we can write that quarterly report?
• etc. …


Questions to ask …

1. Will the gateway add value for the user?

– For example, a command line user can perform every task that a gateway can, often with far more control and without the obfuscation layer a gateway adds.
  • You will wrest the command line from these users only from their dead fingers. A gateway must add serious value to be successful here.


Questions to ask …

1.1. Precisely how will the gateway add value? (aka why would the user want to use my gateway?)

– Will it solve an existing problem?
– Will it add new functionality?
– Will it save user time?
– etc.

1.2. How am I going to find out?


Questions to ask …

2. If yes to (1), what technologies can and should be used?
– 2.1. How?

3. Cost?

4. Validation (that I accomplished what I set out to do)? How?


1. What problems can gateways solve?

• What are the common problems facing scientific users?
  – Increasing complexity of ITs.
  – No time to master increasingly complex ITs.
  – Repetitive tasks waste a lot of time.
    • No simple workflow tools.
    • No easy to use “toolboxes” for frequent tasks.
  – An alien HPC culture for many new entrants.
  – Command line interface too distasteful (for many).
  – No native clients to do useful things.


1. Problems …

– No GUI for tasks that are done a lot easier graphically.

– Frequent reinvention of the wheel (redundancy of effort).

– (Insert your favorite here).

All problems can be reduced to: not being able to have the data I need delivered, here, now.


1. Problems …

• Ok, so which of these problems can gateways solve/address? What can they add?
  – Save users from back-end complexity.
    • especially those that do not speak HPC
  – Provide a simple interface to many tasks.
  – Save user time by providing tools for repetitive tasks.
  – Provide standard tools for a discipline/group of users.
  – Provide a GUI when/where appropriate.
  – Provide statefulness, persistence, historical data, etc.
  – Allow ubiquitous access.
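
As one illustration of the "statefulness, persistence, historical data" item above, a gateway might keep a per-user job history in a small database. The sketch below uses Python's standard `sqlite3` module; the table layout and the function names (`open_history`, `record_job`, `update_status`, `history_for`) are assumptions made for this example, not part of any gateway toolkit.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the job-history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "owner TEXT NOT NULL, "
        "tool TEXT NOT NULL, "
        "status TEXT NOT NULL)"
    )
    return conn


def record_job(conn, owner, tool, status="submitted"):
    """Store one job and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (owner, tool, status) VALUES (?, ?, ?)",
        (owner, tool, status),
    )
    conn.commit()
    return cur.lastrowid


def update_status(conn, job_id, status):
    """Change the stored status of one job."""
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    conn.commit()


def history_for(conn, owner):
    """Return the owner's jobs, oldest first, as (id, tool, status) tuples."""
    rows = conn.execute(
        "SELECT id, tool, status FROM jobs WHERE owner = ? ORDER BY id",
        (owner,),
    )
    return rows.fetchall()
```

Passing a file path instead of the default `":memory:"` makes the history survive restarts, which is the point of persistence for a returning user.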


1. Problems …

•Is a science gateway always the right approach?

– No.
  • For example, a PI with a small research group, all involved in extensive code modification, development, and/or testing is unlikely to benefit from a gateway.

– Gateways are best used when a large group of users (community) make use of the same computational tools.

– Fields using common data formats (astronomy, climate modeling, etc.) also lend themselves to gateway-ing.


2. What technologies can I use?

• Common off the shelf (COTS)
  – Usually PHP/MySQL, Perl, Ruby, or Python based.
  – Very popular, open source, portal building toolkits such as Mambo, Joomla, Drupal, e107, PHP-Nuke, etc.
  – Also new “web operating systems (WebOS)” like eyeOS.
• Standards based
  – Portlets (JSR 168).
  – Globus (Globus Toolkit 4, COG kit).
  – Web services (WSDL, WSRF, WSRP, etc.).
    • Globus web services (WS-MDS, WS-GRAM).
  – Grid services (OGSA).



2. COTS technologies …

• When are COTS technologies appropriate?
  – When the portal needs to be built yesterday.
  – When the portal needs to be built yesterday and there is exactly one undergrad to do it.
  – When the undergrad has just taken his first programming class.

When resources are limited (time, people, expertise) and/or when the project has modest needs.


2. Standards based technologies

• What are all these terms and acronyms? Portlets? JSR 168/286? WSRP? WSRF? COG? OGSA?
  – COG kit = Commodity Grid kit
  – JSR = Java Specification Request
  – OGSA = Open Grid Services Architecture
  – WS-GRAM = Web Services - Grid Resource Allocation Manager
  – WS-MDS = Web Services - Monitoring and Discovery System
  – WSRF = Web Services Resource Framework
  – WSRP = Web Services for Remote Portlets


2. Standards based …

–The acronym maze alone will give you a headache, even on a good day.

–Let’s try an evolutionary approach to see if it helps.

–For good bedtime reading, check out my “Portals 101” document, created in desperation:

http://www.gridsphere.org/gridsphere/gridsphere/html/docsTab/r/


2. Evolution of portal technologies …

[Figure: timeline of the evolution of portal technologies. Stages shown: Prehistory, HTML, Static, Dynamic, Services based, Web Services, WSRP, Stateful Web Services. Dates shown: late 1980s, 1993, 1994, 1995, 1997, 2003.]


2. Evolution of grid technologies …

[Figure: timeline of the evolution of grid technologies. Labels shown: Prehistory (Distributed Computing), Globus, Web Services, Grid Services, GT 1.0 (1997), GT 2.0 (2002), GT 3.0 (2003), Global Grid Forum, Open Grid Services Architecture, Open Grid Services Infrastructure, Java COG kit (1997), Grid middleware, API for Globus. Other dates shown: 1997, 2000, 2003, 2005.]


2. Evolution of standards …

• Web: HTML CSS XHTML XML (W3C)
• Modular web: Servlets Portlets (JCP/Sun)
• SOA: WS WSDL WS-x, WSRF (OASIS)
• Portlets: JSR 168 JSR 286 (JCP/Sun)
• Grid: Globus OGSI OGSA (GGF/OGF)

– JCP = Java Community Process (creates Java Specification Requests or JSRs)
– W3C = World Wide Web Consortium
– SOA = Service-Oriented Architecture
– WS-x = Various web services standards, or specifications in the process of becoming standards (maybe), such as WS-Notification, WS-Security, etc.


2. Problem with evolution …

Evolution according to creationists


2. Evolution …

Man’s Evolution from the Prehistoric to Post Fast Food

Is it or is it not evolution? Depends on who you ask.


2. Portlets

• Standardized Java components (special servlets) that can be put together quickly to create a complete portal page.
  – Plug and play. Transportable.
  – Generate fragments of markup.
  – Follow the JSR 168 standard.
• JSR 168 defines
  – How to bundle portlets
  – How the portlet lifecycle is managed
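
The two ideas above (components that emit markup fragments, and a container that manages their lifecycle) can be pictured with a small conceptual model. Note that this is not the JSR 168 API: real portlets are Java classes, and the class and method names below (`Portlet`, `TextPortlet`, `Container`, `deploy`, `render_page`, `shutdown`) are hypothetical stand-ins written in Python purely for illustration.

```python
class Portlet:
    """Conceptual stand-in for a portlet: a component with a lifecycle."""

    def __init__(self, title):
        self.title = title
        self.ready = False

    def init(self):
        # Called once by the container before any rendering.
        self.ready = True

    def render(self):
        """Return a markup fragment, never a whole page."""
        raise NotImplementedError

    def destroy(self):
        # Called once by the container when the portlet is taken down.
        self.ready = False


class TextPortlet(Portlet):
    """A trivial portlet whose fragment is a single paragraph."""

    def __init__(self, title, text):
        super().__init__(title)
        self.text = text

    def render(self):
        return f"<p>{self.text}</p>"


class Container:
    """Owns the lifecycle and assembles fragments into one page."""

    def __init__(self):
        self.portlets = []

    def deploy(self, portlet):
        portlet.init()
        self.portlets.append(portlet)

    def render_page(self):
        parts = ["<html><body>"]
        for p in self.portlets:
            parts.append(f"<div><h2>{p.title}</h2>{p.render()}</div>")
        parts.append("</body></html>")
        return "\n".join(parts)

    def shutdown(self):
        for p in self.portlets:
            p.destroy()
        self.portlets.clear()
```

The design point is that each portlet knows nothing about the page it lands on; the container decides when it is initialized, rendered, and destroyed, which is what makes the components "plug and play".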


2. Portlets …

– Run inside a “portlet container”. Two popular JSR 168 compliant containers are
  • Gridsphere
  • Apache Pluto
– The portlet container runs inside a “servlet container”. The most popular container is
  • Apache Tomcat
– The servlet container may work with a webserver such as Apache httpd.


2. Portlets & the grid

• What is the connection between portlets and the grid?
  – None. Portlets are merely generic components.
  – Some portlets (grid portlets) might perform grid tasks.
• What about Gridsphere? It has the word grid in it.
  – Nope. It is simply a strategic name chosen by the Gridsphere developers.
  – Gridsphere is a generic, JSR 168 compliant portlet container.
  – It can thus run JSR 168 compliant (or not) portlets that do some grid task(s).


2. Practical (standards-based) tools

• Enough! I have a headache already. Tell me something I can actually use with TeraGrid.

– COG kits
– Open Grid Computing Environment (OGCE)
– (Gridsphere) GridPortlets
– Clarens is a web services approach to the grid
– IN-VIGO virtualizes the grid
– Application Hosting Environment (AHE) runs unmodified apps on the grid


2. Globus API

• Java Commodity Grid kit (COG kit)
  – An abstraction layer (via a Java API) that hides the underlying middleware (Globus toolkit/different toolkit versions - GT2/GT4).
  – Provides command line tools as well.
• Also Python COG kit.

http://wiki.cogkit.org/
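
The idea of an abstraction layer that hides which middleware version sits underneath can be sketched as follows. This is not the COG kit API: `Gt2Style`, `Gt4Style`, and `JobClient` are hypothetical names, and the two job-description formats are simplified illustrations rather than exact middleware syntax.

```python
class Gt2Style:
    """Hypothetical adapter: builds a single-string job description."""

    name = "GT2"

    def describe(self, executable, args):
        text = f"&(executable={executable})"
        if args:
            text += f"(arguments={' '.join(args)})"
        return text


class Gt4Style:
    """Hypothetical adapter: builds an XML-like job description."""

    name = "GT4"

    def describe(self, executable, args):
        items = "".join(f"<argument>{a}</argument>" for a in args)
        return f"<job><executable>{executable}</executable>{items}</job>"


class JobClient:
    """Uniform interface: callers never see which adapter is in use."""

    def __init__(self, adapter):
        self._adapter = adapter

    def submit(self, executable, *args):
        # A real client would send the description to the middleware;
        # this sketch only returns what would be sent.
        return {
            "middleware": self._adapter.name,
            "description": self._adapter.describe(executable, list(args)),
        }
```

Gateway code written against `JobClient.submit` stays the same when the adapter is swapped, which is the benefit the slide attributes to an abstraction layer.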


2. Portal Creation Environments

• Open Grid Computing Environment (OGCE)
  – A complete Java environment that allows you to develop JSR 168 portlets, Gridsphere included.
  – Uses the COG kit.
  – Provides a number of bundled portlets
    • Job submission and monitoring
    • File transfer
    • Collaboration tools, etc.
  – Current version: 2.0.4.

http://www.collab-ogce.org/


2. Portal creation …

• GridPortlets
  – GridPortlets is the name of the package. The package includes grid portlets, but note the difference.
  – A specific, JSR 168 compliant Java implementation.
  – Runs under Gridsphere (not included).
  – Uses the COG kit but provides an abstraction layer (API) on top of the COG kit.
  – Uses (depends on) Gridsphere’s simple API for creating a GUI.
  – Provides an “action” model for creating portlets.
  – Current version: 1.4.

http://www.gridsphere.org/gridsphere/gridsphere/guest/download/r/


3. Tips for building a usable gateway

• How can I make sure that my gateway will actually be used?
  – By keeping in mind the three most important factors: a) users, b) users, and c) users.
  – Let users dictate; don’t assume.
  – If users can’t, spend time with them; observe what they do and how they do it.
  – Test, test some more, then test until you drop.

The assumption that an IT person/developer, removed from the user/discipline, can “build it and they will come” is doomed from the get go.


3. Usability tip #1: Determine what users want/need

• Some users know and come seeking help.
• Others have no idea; they don’t know what’s possible. How do I help them?
• Try this:
  – “Can I come over and see your lab (or how you do X)?” X might be
    • process data/run simulation/handle results
    • submit/run/monitor jobs, etc.
  – “Ah, that’s how you do it. What if I can Y?” Y might be
    • make it 100x faster
    • make it a lot easier, etc.


3. Usability tip #2: Design/build a good user interface

• Otherwise why would Microsoft spend zillions of dollars on developing and testing its user interfaces?
  – The UI can be a make or break factor.
• How do I ensure that I have a usable UI?
  – Formal usability testing in a usability lab
  – Scour the web to learn about usability/testing
  – Read the “Usability 101” document: http://dhruv.uits.indiana.edu/portals/usability-101.doc
  – Perform poor man’s usability testing


3. Developer/user UI disconnect …

* From “DON’T MAKE ME THINK: A Common Sense Approach to Web Usability” by Steve Krug.


3. Usability tip #3: Follow best practices

• Refer to the TeraGrid Science Gateways Primer

http://www.teragridforum.org/mediawiki/index.php?title=TeraGrid_Science_Gateways_Primer


4. Scaling up

• Do I need to scale up?
  – Not necessarily.
  – Many scientific applications provided in a gateway may require only local resources (compute cluster, storage, databases, etc.).
  – Many existing science gateways use quite modest back ends for compute resources.
  – Some even have nothing to do with CPU cycles or grid at all.


4. Scaling up …

• Ok, so when do I need more powerful resources (such as the TeraGrid)?
  – Reactively:
    • too many new users, analyses, etc.
    • processing is too slow to be useful
    • local resources no longer sufficient
    • users yelling at you?
  – Proactively:
    • possible future growth designed in from the get go
    • close monitoring of trends
    • etc. …
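
"Close monitoring of trends" can be as simple as watching a usage metric and projecting it one step ahead. The sketch below is one hypothetical way to do that: the function name `needs_scale_up`, the choice of weekly average queue wait as the metric, and the linear projection are all assumptions made for this example.

```python
from statistics import mean


def needs_scale_up(weekly_wait_minutes, limit_minutes, window=4):
    """Flag a scale-up when recent waits exceed the limit or are about to.

    weekly_wait_minutes: average job wait per week, oldest first.
    limit_minutes: the longest wait users are assumed to tolerate.
    window: how many recent weeks to look at (must be at least 2).
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(weekly_wait_minutes) < window:
        return False  # not enough history to judge a trend
    recent = weekly_wait_minutes[-window:]
    # Reactive signal: the recent average is already over the limit.
    average = mean(recent)
    # Proactive signal: extend the recent slope by one more week.
    slope = (recent[-1] - recent[0]) / (window - 1)
    projected = recent[-1] + slope
    return average > limit_minutes or projected > limit_minutes
```

For instance, waits of 10, 15, 22, and 28 minutes against a 30-minute limit are all still acceptable, yet the projection for the following week is 34 minutes, so the check fires before users start yelling.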


4. Scaling up …

• Why should I use the TeraGrid?
  – Virtually unlimited resources (CPU cycles, storage, databases, etc.)
  – Many services available.
  – Easy to get access.
  – TG support staff ready to help.
  – A production, national grid infrastructure (looks good on grant too)


4. Scaling up …

• Ok, I am convinced that I need to scale up. What do I do next?

– Nancy Wilkins-Diehr will be addressing this later today.


Still awake? Had enough?


5. Local resources needed

• So what will it take locally for me to build one of these gateways?

– People
– Expertise
– Time
– Hardware
– Software


5. Local resources needed …

• How many people?
  – Depends. For a complex, grid-based gateway, 1-2 FTEs. Much less if modest effort (an undergrad).
• What level of expertise?
  – For need-it-now projects, interpreted language (PHP, Perl, Ruby, etc.) programming skills + some DB (MySQL, etc.) knowledge.
  – For a well designed, high-end gateway, Java programming skills are a must. Also some database and UI experience.
• How much time?
  – Anywhere from 3-6 undergrad months for a simple gateway to roughly 2 FTE-years for one that is fairly complex; this includes the learning curve (modest to high).
• What hardware?
  – Anywhere from a Unix/Linux box to an entire Linux cluster depending on development needs.


5. Local resources needed …

• What software?
  – Programming language(s): Perl, Python, Ruby, PHP, Java, JavaScript, etc.
  – Development environment (compilers, editors, debuggers, etc.).
  – Databases: MySQL, PostgreSQL, etc.
  – Server environment: Apache httpd, Apache Tomcat, etc.
  – Grid middleware: Globus toolkit, COG kit, etc.
  – Portlet container: Gridsphere, Pluto.
  – Portlet building toolkit: OGCE, GridPortlets.
  – Web services: WSRF, WSRP, etc.
  – Popular portal building toolkit: Joomla/Drupal/Mambo/e107, etc.
