Microsoft HPC

Leor Dilmanian, David Wollenhaupt, Nurzhan Kirbassov

http://blackrose02.rit.edu/wiki/doku.php?id=grid:seminar1:ms_hpc_technology

Post on 20-Dec-2015



Outline

• What is Microsoft HPC?
• What's new in HPC?
• Administrative Console
• Deployment
• Security
• Building SOA
• Parallel Programming with .NET
• HPC in the news
• Conclusion


What is Microsoft HPC?

• Head node:
  – Single point of management, deployment, and job scheduling for the cluster
• Utilizes existing corporate Active Directory infrastructure to accomplish security and account management
  – Implementation of LDAP directory services by Microsoft for use with Windows
  – Provides central authentication and authorization
• Currently in the beta stage, with a free evaluation available.

What's new in HPC?

• Administrative tools
  – New administrator console that integrates all aspects of cluster management
  – Configuring network topology, user configuration, monitoring jobs, health of cluster
  – “Node Templates” leverage “Windows Deployment Services” to simplify compute node deployment
  – Diagnostics testing, reporting
  – Job Scheduling Features

What's new in HPC?

• End User Tools:
  – “PowerShell”: scheduling and managing jobs, with 130 command-line tools to automate system administration tasks
  – “Job Manager” interface supports parametric commands
  – Job view filtering

Administrative Console

• Used for deployment & includes:
  – Configurations
  – Node management
  – Job Management
  – Change History
  – Diagnostics
  – Reports
  – & more

Deployment Outline

• Select Topology
• Install Windows Server 2008 on Head Node
• Install HPC Pack (HPCP) on Head
• Configure Head
• Deploy Compute Nodes
• Add users, groups and administrators
• Add node groups
• Diagnostics Tests
• Job Template
• Monitoring begins
• Job creation
• Job submission
• Troubleshooting

Cluster Topology

Network topology affects application performance.

• Multiple interconnects & solutions are provided, varying in price, latency, and performance.
• Application is very loosely coupled.
  – Select a Topology (Cisco: Solution 5)
• Public network connects to all nodes; all traffic (MPI, Private and Public) runs over Ethernet (10/100/1000)

Installation on Head

• Install x64 Windows Server 2008 on Head
  – DVD or Network Location
  – Join Appropriate Active Directory Domain
  – Verify Network Connections
  – Stop Server Roles

Configure Head

• Consider Selected Topology
  – Affects usability, performance, access, and application performance.
  – Physically set up network interfaces according to topology
  – Configure head node to recognize physical topology (Network Services like DHCP, RRAS, NAT & Firewall)
  – Compute node naming series
  – Deploy O.S. Images to compute nodes
  – Images may be configured with additional drivers

Configure Head

• Node Templates
• Install HPC Pack 2008, applications, drivers, security patches and configurations on nodes
• Different O.S./drivers/patches/etc. combinations.
• Bring Head Online

Deploy Compute Node

• PXE service on compute cluster connects to CcpManagement service on Head node
• Templates are assigned to compute clusters
• Status of nodes changes as templates get installed from Head
• Finally, the head node places the compute node "online".
• Same console used to re-image the node.
  – (offline -> re-image -> provisioning -> take online)
• Compute nodes can be deployed manually
• Centralized diagnosis: reverting
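The re-image cycle above can be read as a small state machine. The following Python sketch is only an illustration of that reading: the `ComputeNode` class, the state names, and the online -> offline step are assumptions made for this example and are not part of HPC Pack.

```python
# Allowed transitions, following the cycle on the slide:
# offline -> re-image -> provisioning -> online.
# The online -> offline step is an assumption so the cycle can repeat.
TRANSITIONS = {
    "offline": {"re-image"},
    "re-image": {"provisioning"},
    "provisioning": {"online"},
    "online": {"offline"},
}


class ComputeNode:
    """Hypothetical model of a compute node tracked by the head node."""

    def __init__(self, name, state="offline"):
        self.name = name
        self.state = state

    def move_to(self, new_state):
        """Change state only if the transition is allowed."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: cannot go from {self.state} to {new_state}")
        self.state = new_state


if __name__ == "__main__":
    node = ComputeNode("node001")
    for step in ("re-image", "provisioning", "online"):
        node.move_to(step)
        print(node.name, "is now", node.state)
```

Keeping the transitions in a table makes it easy to reject out-of-order steps, such as bringing a node online before it has been provisioned.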

User and Node Groups

• Add users, groups and administrators
  – Security Policy
• Add node groups
  – Apply Management actions
  – Default: Head, compute & router nodes
  – Router nodes dispatch services to compute nodes

Diagnostics Tests

• Test the following
  – services running
  – service health
  – active directory
  – node-to-node connectivity
  – name resolution
  – memory bandwidth

Job Templates

• Not to be confused with Node Templates
• Administrators easily manage submitted jobs
  – Defaults and constraints on job terms
  – Job to node assignment
  – Job terms to template assignment
  – Job template security policy: Which users can add what to each template?
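To make "defaults and constraints on job terms" concrete, here is a minimal Python sketch of how a template might fill in missing terms and reject out-of-range values. The `JobTemplate` class, the term names (`cores`, `runtime_minutes`), and the simple per-template user list are all hypothetical; the slides do not describe the actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class JobTemplate:
    """Hypothetical template: default values plus (min, max) limits per job term."""
    defaults: dict = field(default_factory=dict)
    limits: dict = field(default_factory=dict)
    allowed_users: set = field(default_factory=set)

    def apply(self, user, terms):
        """Fill in missing terms from the defaults and enforce the limits."""
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not submit with this template")
        merged = {**self.defaults, **terms}  # submitted terms override defaults
        for name, (low, high) in self.limits.items():
            value = merged.get(name)
            if value is not None and not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        return merged


if __name__ == "__main__":
    template = JobTemplate(
        defaults={"cores": 4, "runtime_minutes": 60},
        limits={"cores": (1, 16)},
        allowed_users={"alice"},
    )
    print(template.apply("alice", {"cores": 8}))
```

A job that asks for 32 cores, or a submission from a user outside `allowed_users`, would be rejected before it reaches the scheduler.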

Monitoring Cluster

• Detect deviance from normal state/performance
  – Illustrate behavior of some subset of nodes through "heatmap"
  – Viewing details of a node
  – Performance charts, event viewers
  – Change history
  – Auditing operations (& sub-operations)
  – Logging
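The slides do not say how the console decides that a node deviates from normal. As one possible illustration, the sketch below flags nodes whose metric lies more than a chosen number of standard deviations from the cluster mean. The z-score rule, the `deviant_nodes` helper, and the sample CPU figures are assumptions for this example only.

```python
from statistics import mean, pstdev


def deviant_nodes(metric_by_node, threshold=2.0):
    """Return the names of nodes whose metric is far from the cluster mean.

    'Far' means more than `threshold` population standard deviations,
    which is a stand-in rule chosen for this sketch.
    """
    values = list(metric_by_node.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every node reports the same value
    return sorted(
        name for name, value in metric_by_node.items()
        if abs(value - mu) / sigma > threshold
    )


if __name__ == "__main__":
    cpu_percent = {"n1": 50, "n2": 52, "n3": 48, "n4": 51, "n5": 49, "n6": 99}
    print(deviant_nodes(cpu_percent))
```

A heatmap view could color each node by the same score rather than returning a list.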

Job Scheduling

• Improved, with new functionality
  – improved resource allocation, performance
  – efficient scheduling of jobs on large clusters of multi-core nodes.
  – SOA communications and workloads
  – Resource matching
  – Access Control List with templates
  – Multilevel Computing Resource Allocation - optimal placement of memory intensive jobs, less contention
  – Grow and Shrink Job Scheduling (resource allocation)
  – node and socket level allocation
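As a rough illustration of resource matching, the sketch below places each job on the first node that still has enough free cores and memory. This first-fit rule is a textbook stand-in and not the scheduler's actual algorithm; the `first_fit` helper and the `cores` and `memory_gb` fields are hypothetical.

```python
def first_fit(jobs, nodes):
    """Assign each job to the first node with enough free cores and memory.

    `jobs` and `nodes` map names to {"cores": int, "memory_gb": int}.
    Returns {job_name: node_name or None}.
    """
    free = {name: dict(res) for name, res in nodes.items()}  # copy, keep input intact
    placement = {}
    for job_name, need in jobs.items():
        for node_name, have in free.items():
            if have["cores"] >= need["cores"] and have["memory_gb"] >= need["memory_gb"]:
                have["cores"] -= need["cores"]
                have["memory_gb"] -= need["memory_gb"]
                placement[job_name] = node_name
                break
        else:
            placement[job_name] = None  # no node can host the job right now
    return placement


if __name__ == "__main__":
    nodes = {
        "node1": {"cores": 4, "memory_gb": 8},
        "node2": {"cores": 8, "memory_gb": 32},
    }
    jobs = {
        "a": {"cores": 4, "memory_gb": 4},
        "b": {"cores": 2, "memory_gb": 16},
        "c": {"cores": 8, "memory_gb": 8},
    }
    print(first_fit(jobs, nodes))
```

A job left unplaced would wait in the queue until resources free up; socket-level allocation would track the same quantities per socket instead of per node.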

Job Creation and Submission

• Done in administrator or user console
• Microsoft: Job = Term(resources) + Tasks(work)
• Sweep Task: Start-Index, End-Index, Increment
• Job Filters: View Jobs by certain criteria
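The three sweep parameters can be illustrated with a short Python sketch that expands them into one command line per index. The `expand_sweep` helper, the `*` placeholder, the inclusive end index, and the `render.exe` command are assumptions made for this example; the slides do not specify them.

```python
def expand_sweep(command, start, end, increment=1):
    """Return one command line per sweep index (end index assumed inclusive)."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    # Assumed convention: every '*' in the command is replaced by the index.
    return [command.replace("*", str(i)) for i in range(start, end + 1, increment)]


if __name__ == "__main__":
    for line in expand_sweep("render.exe frame*.dat", 1, 9, 4):
        print(line)
```

Each generated command is independent of the others, which is why sweep tasks suit embarrassingly parallel workloads.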

Security

• Active directory domain services (AD DS)
  – Role based security for job submission and cluster administration
  – Credentials encrypted and stored with jobs, deleted upon completion
  – Jobs access network resources using credentials
  – AD DS used to apply and audit security policies
• Encryption, Authentication used for end to end security
  – MPI traffic not secure

Misc: Building SOA

• Service Oriented Application/Architecture
• Prerequisites:
  – Visual Studio 2005 or higher
  – Microsoft SDK 3.0
  – Compute Cluster Pack v2 SDK
• Building
  – create service
  – deploy to cluster
  – create client

• Example in Getting Started Guide

Parallel Programming with .NET

• Traditionally, master-slave code written in C or Fortran.

• Use MPI or PVM library for interprocess communication between master and worker.

• Programmer Concerns:
  – message passing, data management, distributed process management & security, parallel job scheduler
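The master/worker pattern described above can be sketched in a few lines of Python. This is a single-machine illustration only: a thread pool stands in for worker processes on cluster nodes, and no MPI or PVM calls are involved. The `worker` and `master` functions and the sum-of-squares workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(chunk):
    """Do one independent unit of work: here, sum the squares of a chunk."""
    return sum(x * x for x in chunk)


def master(data, n_workers=4, chunk_size=100):
    """Split the data, hand the chunks to workers, and combine the results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))  # results come back in chunk order
    return sum(partials)


if __name__ == "__main__":
    print(master(list(range(1000))))
```

On a real cluster the master would also have to handle the concerns listed above, such as moving data to the workers and scheduling them on nodes, which the pool hides here.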

Parallel Programming with .NET

• .NET object oriented approach
  – Worker is an instantiated object or web service
  – Master: standard web interface or desktop application, requests objects seamlessly over network.
  – .NET Cluster Architecture finds best resource, verifies credentials

HPC in the news

• Actuary's challenge
• Oil and Gas Industry
• Simplification of IT Infrastructure
• AIDS research
• Cancer research
• Interactive digital media & rendering farms
• Data visualization

Summary or Conclusion

• Centralized cluster administration
• Highly productive, feature rich set of tools
• Suitable for Embarrassingly Parallel Problems
• Improved job scheduling
• Turnkey solution
• .Net
  – Streamlined development
  – Elimination of programmer concerns

References

• [1] David Lifka, Lucia Walle, Veaceslav Zaloj, and John Zollweg. Increasing the Accessibility of Parallel Processing with Microsoft .NET. HPC Cluster Environment, 2007, Microsoft Corporation.

• [2] Delivering a Service-Oriented Programming Model and Runtime System for Interactive HPC Applications. 2007, Microsoft Corporation.

• [3] Depner, A. Getting Started Guide for Windows HPC Server 2008 Beta 1. 2007, Microsoft Corporation.

• [4] http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns500/net_implementation_white_paper0900aecd804cbe16.html

• [5] http://www.microsoft.com/hpc