The Main e-Social Science Issues

28
L C o E e S S

Upload: myrna

Post on 11-Jan-2016


DESCRIPTION

The Main e-Social Science Issues. Applications: Many large-scale research questions in the social sciences may only be answered fully using a multi-disciplinary computationally-intensive analysis; - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: The Main e-Social Science Issues


Page 2: The Main e-Social Science Issues

The Main e-Social Science Issues

• Applications: Many large-scale research questions in the social sciences may only be answered fully using a multi-disciplinary computationally-intensive analysis;

• Data: The complexity of observational social science data can make data curation, data management and the subsequent analysis particularly difficult;

• Methodologies: Much of the quantitative technology presently used in the social sciences dates back to the 1960s and 1970s. Many assumptions in this technology were made in order to minimise computation;

• Computational Culture: Currently, most social scientists in the UK perform their analyses using standard packages or software written for single processors, limiting the scope of the substantive research questions.

Page 3: The Main e-Social Science Issues

Lancaster’s Infrastructure

• Lancaster’s HPC

• NW Trunk

Page 4: The Main e-Social Science Issues

Lancaster’s HPC

• Funded by the ESRC, EPSRC and HEFCE (£1.2M), it consists of an array of 103 dual-processor Sun Blade workstations, each having between 1 and 8 gigabytes of memory.

• Fileserver with 1300 gigabytes of disk storage.

• Sixteen of the workstations have "Myrinet" cards installed to allow very high speed communication between them, supporting parallel programs which distribute large amounts of data.

• Jobs are submitted to the array from the HPC frontend machine through the Sun Grid Engine/Codine queuing system or via Globus.

• This in turn distributes each submitted job to one of the many execution hosts, or holds it until a host becomes available.
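The dispatch-or-hold behaviour described above can be illustrated with a toy model. This is only a sketch of the general idea, not Sun Grid Engine's scheduler: the class `ToyQueue`, the host names and the job names are all hypothetical, and real queuing systems also weigh priorities, resource requests and load.

```python
from collections import deque


class ToyQueue:
    """Toy model of a batch queue: run a job on a free host, or hold it."""

    def __init__(self, hosts):
        self.free_hosts = deque(hosts)   # execution hosts with nothing running
        self.held_jobs = deque()         # jobs waiting for a host, oldest first
        self.running = {}                # job name -> host it is running on

    def submit(self, job):
        """Dispatch the job if a host is free; otherwise hold it."""
        if self.free_hosts:
            self.running[job] = self.free_hosts.popleft()
        else:
            self.held_jobs.append(job)

    def finish(self, job):
        """Release the job's host and give it to the oldest held job, if any."""
        host = self.running.pop(job)
        if self.held_jobs:
            self.running[self.held_jobs.popleft()] = host
        else:
            self.free_hosts.append(host)


# Hypothetical hosts and jobs: two hosts, three submissions.
queue = ToyQueue(hosts=["node01", "node02"])
for name in ["jobA", "jobB", "jobC"]:
    queue.submit(name)   # jobC is held because both hosts are busy
queue.finish("jobA")     # node01 is released and passed to jobC
```

With two hosts and three jobs, the third job waits until the first one finishes and then takes over its host.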

Page 5: The Main e-Social Science Issues

Lancaster’s HPC

Page 6: The Main e-Social Science Issues

Lancaster’s HPC

Page 7: The Main e-Social Science Issues

Rob Allan’s HPCGrid InfoPortal web page: http://esc.dl.ac.uk/InfoPortal/

We are normally visible here, but it is not picking us up at the moment, as there seem to be Monitoring and Discovery Service (MDS) registration problems at grid-support.

Page 8: The Main e-Social Science Issues

Lancaster’s HPC details can be found at: http://giis.globus.org/ldapbrowser/login.php

Page 9: The Main e-Social Science Issues

Lancaster’s HPC

• Running Globus 2.4 with the following enhancements:

- Andrew McNab's GridPP Pool Account patch (http://www.gridpp.ac.uk/authz/gridmapdir/) to accommodate external job submissions from users without a local HPC account

- a modified version of the original release of Marko Krznaric's SGE Integration Package (new version is at http://www.lesc.ic.ac.uk/projects/epic-gt-sge.html)

• Currently investigating adding GT3 functionality to HPC services

• The new LESC EPIC package adds SGE job-manager functionality to GT3
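The pool-account idea mentioned above (serving external users who have no local HPC account) can be sketched as a lease table. This is an illustration of the concept only, not the GridPP patch's actual mechanism; the class `PoolAccountMap`, the account names and the certificate subject names are all hypothetical.

```python
class PoolAccountMap:
    """Lease local pool accounts to grid identities (certificate subject names)."""

    def __init__(self, pool_accounts):
        self.unleased = list(pool_accounts)  # accounts not yet tied to anyone
        self.leases = {}                     # subject name -> leased account

    def account_for(self, subject):
        """Return the account leased to this subject, leasing one if needed."""
        if subject not in self.leases:
            if not self.unleased:
                raise RuntimeError("no free pool accounts")
            self.leases[subject] = self.unleased.pop(0)
        return self.leases[subject]


# Hypothetical account names and subject names, for illustration only.
pool = PoolAccountMap(["pool001", "pool002"])
first = pool.account_for("/C=UK/O=eScience/OU=Example/CN=alice")
again = pool.account_for("/C=UK/O=eScience/OU=Example/CN=alice")
other = pool.account_for("/C=UK/O=eScience/OU=Example/CN=bob")
```

The point of the design is that a returning user keeps the same local account, while a new user receives the next unused one without an administrator creating it by hand.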

Page 10: The Main e-Social Science Issues

NW Trunk

• Funded by NWDA (£1.77M)

• Four 10GbE links
– 10GbE Carlisle to Lancaster
– 2x 10GbE Lancaster to SJIV C-PoP at Warrington
– 10GbE Lancaster to Daresbury Labs

• Eight 1GbE links
– Carlisle-Lancaster, Carlisle-Penrith
– Penrith-Kendal, Kendal-Lancaster
– Lancaster-Preston, Lancaster-Chorley
– Lancaster-SJIV C-PoP at Warrington

Page 11: The Main e-Social Science Issues


Page 12: The Main e-Social Science Issues

Existing Lancaster Projects

• A Training and Support Environment for Advanced Quantitative Methods in the Social Sciences

• An OGSA Component Based Approach to Middleware for Statistical Modelling

• JISC-funded e-Social Science ReDRESS portal

Page 13: The Main e-Social Science Issues

A Training and Support Environment for Advanced Quantitative Methods in the Social Sciences (ESRC)

Short Courses and Masterclasses (£154k over 2 yrs).

1. Courses cover the main methods of data collection, fundamental aspects of research design, and statistical methods of data analysis;

2. Courses viewed on-line via web browser;

3. Software courses to cover packages and languages ranging from PC to HPC specific software, such as SAS, SPSS, GAUSS and LIMDEP, and programming languages such as C++, FORTRAN and parallel programming.

• National Consultancy Service (£39k over 2 yrs)

Page 14: The Main e-Social Science Issues

An OGSA Component Based Approach to Middleware for Statistical Modelling (£100k)

• SABRE: Statistical software written in Fortran designed to model recurrent events. Standard generalised linear models can be fitted as well as various mixture models with random effects

• R: A free-to-use language and environment for statistical computing and graphics providing a wide variety of statistical and graphical techniques

• Middleware for e-Social Science: Development of a parallel, multilevel, multiprocess (OGSA) implementation of SABRE as an R object to enable Social Scientists to disentangle the full stochastic complexity of socio-economic processes
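One reason a random-effects likelihood lends itself to parallel evaluation is that the log-likelihood is a sum of independent per-case terms. The sketch below illustrates this for a simple random-intercept logistic model; it is an assumption-laden toy, not SABRE's algorithm. The model, the three-point quadrature rule, the function names and the data are all chosen for illustration, and Python threads merely stand in for the separate processors a real implementation would use.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Three-point Gauss-Hermite rule for a standard normal random effect:
# nodes 0 and +/- sqrt(3), with weights 2/3, 1/6 and 1/6.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def case_loglik(responses, beta, sigma):
    """Marginal log-likelihood of one case's repeated 0/1 responses.

    Assumed model: logit P(y = 1 | u) = beta + sigma * u, with u ~ N(0, 1)
    shared by all responses from the same case.
    """
    total = 0.0
    for node, weight in zip(NODES, WEIGHTS):
        p = 1.0 / (1.0 + math.exp(-(beta + sigma * node)))
        conditional = 1.0
        for y in responses:
            conditional *= p if y == 1 else 1.0 - p
        total += weight * conditional  # integrate the random effect out
    return math.log(total)


def loglik(cases, beta, sigma, workers=4):
    """Sum the per-case contributions, evaluating them concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: case_loglik(r, beta, sigma), cases)
        return sum(parts)


# Made-up data: each inner list holds one case's repeated binary outcomes.
cases = [[1, 1, 0], [0, 0, 0], [1, 0, 1, 1]]
value = loglik(cases, beta=0.2, sigma=1.0)
```

Because each case is handled independently, the cases can be split across workers in any way without changing the result; only the final sum needs to be gathered in one place.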

Page 15: The Main e-Social Science Issues

Multilevel, multiprocess models

• Most random effect models are for responses of a single type, either dichotomous, ordinal or count. A single link function and family are specified. (Fitting these can take 2 days on a 2GHz P4 with 0.5MB RAM.)

• Multi-process models are models with two or more substantively different outcomes, linked by correlated random effects.

• Some examples of two process models include health status and mortality, or getting pregnant and finishing school. Each process may, but need not, include repeated outcomes.

• The models can also be used when the data possess a hierarchical structure, e.g. multi-stage cluster sample, where the responses at the lower levels are more correlated than those higher up, e.g. responses on individual pupils in the same class are more correlated than those between classes at the same school.
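A generic two-process model of the kind described above can be written as follows. The notation is an assumption made for illustration (it does not come from the slides): $i$ indexes repeated outcomes within unit $j$, $g_1$ and $g_2$ are the link functions of the two processes, and $\rho$ is the correlation between the two unit-level random effects.

```latex
\begin{align}
g_1\!\left(\mathrm{E}\!\left[y^{(1)}_{ij} \mid u^{(1)}_{j}\right]\right)
  &= \mathbf{x}^{(1)\top}_{ij}\boldsymbol{\beta}^{(1)} + u^{(1)}_{j}, \\
g_2\!\left(\mathrm{E}\!\left[y^{(2)}_{ij} \mid u^{(2)}_{j}\right]\right)
  &= \mathbf{x}^{(2)\top}_{ij}\boldsymbol{\beta}^{(2)} + u^{(2)}_{j}, \\
\begin{pmatrix} u^{(1)}_{j} \\ u^{(2)}_{j} \end{pmatrix}
  &\sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix}
      \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\
      \rho\,\sigma_1\sigma_2 & \sigma_2^2
    \end{pmatrix}
  \right).
\end{align}
```

When $\rho = 0$ the two processes can be fitted separately; a non-zero $\rho$ is what ties them together and makes joint estimation necessary.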

Page 16: The Main e-Social Science Issues

ReDRESS Portal (Content)

• Introductory material from roadshows

• Specific material from the Agenda Setting Workshops

• On-line demonstrators

• Course timetables/notes

• Video/audio material

• Associated reference material and FAQs

• Links to JISC national collections

• Links to partner institutions in Social Science

• World wide links

• E-mail for students/staff

• Additional help for self learners

• Examination and monitoring results

Page 17: The Main e-Social Science Issues

ReDRESS Portal (functionality)

• Single sign-on/certificate-based authentication (same as the Grid and Athens)

• Role-based authorisation (students, staff, managers, developers etc.)

• Database back end for managing users and resources (OGSA-DAI)

• Content management for staff and developers

• Active portal services for Grid-based demonstrations (OGSA, Web services)

• Active monitoring suite to capture workflow and mine for enhanced requirements

• XML/XSLT-driven dynamic pages

• uPortal or Jetspeed framework with services based on BlackBoard, HPCPortal and DataPortal

Page 18: The Main e-Social Science Issues

ReDRESS will use/contribute to this technology

Page 19: The Main e-Social Science Issues

ReDRESS Content: (Existing Tools)

• Nesstar is a web-based facility that allows 66 major datasets to be explored online, and allows simple sub-setting and simple analyses.

• Only uses one data set at a time;

• Has very limited facilities for sub-setting and none for fusing;

• Restricted statistical facilities, e.g. descriptive analysis, linear regression;

• No facilities for handling missing data;

• Not currently Grid enabled.

Page 20: The Main e-Social Science Issues

ReDRESS Content: (Existing Tools)

• Rweb is a free web-based service using R, allowing users to submit R jobs and get output back to their web session;

• Rweb needs more menus; R has a very extensive statistical library available that is not used in Rweb;

• Rweb uses R and not Rmpi. For use in a Grid environment we would need these hooks to extend functionality;

• R also lacks some of the key multiprocess/multilevel and selection model frameworks appropriate to social science data; these are being developed.

Page 21: The Main e-Social Science Issues

Content: New Tools / Middleware

1. Social scientists have much less experience and expertise in the use of the Grid than those typically from other research council areas;

2. There is a significant intellectual gap between such disciplines and computer science;

3. Distributed systems are also inherently complex and associated middleware products are not easy to use;

4. The Open Middleware Infrastructure Institute (OMII) will provide (open-source) middleware and associated services, but these are not specifically targeted at the social science community;

5. Need to build a more computer-literate collaborative culture for Social Science.

Page 22: The Main e-Social Science Issues

Content: New Tools / Middleware

We propose:

1. To promote the use of component-based software development and visual composition tools and scripting languages for ease of use;

2. To offer a middleware consultancy service for application developers;

3. To exploit local expertise to develop bespoke middleware solutions for customers;

4. To develop exemplar e-Social Science demonstrators for end users;

5. To exploit state-of-the-art software development technologies such as aspect-oriented programming to enhance flexibility.

Page 23: The Main e-Social Science Issues

New Tools: Ex 1. VIDGRID

[Diagram: Video Corpus; Researcher A; Researcher B; Researcher C]

Multiple video streams can be delivered into an AG or portlet environment

Page 24: The Main e-Social Science Issues

New Tools: Ex 2. The Analysis Cycle

Main ESDS Data Sets

Select Data Set and Appropriate Variables:

TTWA Data, NOMIS

Merge Files: Add Variables

Working Data

Contextual Data

Results
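The "Merge Files: Add Variables" step of the cycle above can be sketched as attaching area-level contextual variables to individual records through a shared area code. This is a minimal sketch under stated assumptions: the function name, the `ttwa` key, the variable names and all values are hypothetical, and real ESDS or NOMIS extracts would need their own parsing and checking.

```python
def add_contextual_variables(individuals, area_data, key="ttwa"):
    """Attach area-level variables to each individual record via a shared key."""
    merged = []
    for person in individuals:
        context = area_data.get(person[key], {})  # no matching area: add nothing
        merged.append({**person, **context})      # build a new record, leave inputs intact
    return merged


# Made-up records; the variable names and values are illustrative only.
individuals = [
    {"id": 1, "ttwa": "A", "employed": 1},
    {"id": 2, "ttwa": "B", "employed": 0},
]
area_data = {
    "A": {"unemployment_rate": 4.1},
    "B": {"unemployment_rate": 7.3},
}
working_data = add_contextual_variables(individuals, area_data)
```

The result plays the role of the "Working Data" box: one record per individual, now carrying the contextual variables of the area they belong to.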

Page 25: The Main e-Social Science Issues

New Tools: Ex 2. Linking Components

[Diagram: Data Management A, B, C; Analysis A, B, C; Middleware]

Page 26: The Main e-Social Science Issues

The ReDRESS Community

Key components will be accessible on the GRID and linked into the portal and demonstrators

Lancaster/Daresbury

Other Contributors/Steering Committee

… plus other contributors in the UK, from the USA & Europe

Page 27: The Main e-Social Science Issues

New Lancaster Projects

• NWDA NW-GRID (400K kit, 4 staff over 3 years, starts Dec 2003)

A collaboration between Lancaster (£1.0M), Daresbury (£1.0M), Liverpool and Manchester. Staff and equipment (Grids) at each site.

Projects at Lancaster in Env. Science, Physics, Computing, Sociology, Economics, Applied Statistics and Grid Training

Page 28: The Main e-Social Science Issues

The e-Social Science Future

• Our existing quantitative tools rely heavily on assumptions; they come out of a technology that was formed in the 60s and 70s, when computers were 10**9 times slower.

• What new research agendas are now relevant? The 3 exponentials will change everything.

• The new opportunities for collaboration and evidence-based research will lead to new (e)science, not just making legacy approaches faster.

• We can now move away from the assumption-ridden technologies and develop robust nonparametric procedures for decomposing the complexity of socio-economic processes.

• There will be amazing opportunities to make a difference, test policy instruments and address some grand challenges, be they in reducing drug abuse, crime and poverty, or improving educational attainment.