National systems



CCDB (Compute Canada Database) descriptions

General descriptions are also available on CCDB (the Compute Canada Database).

Compute

Overview

Cedar (GP2) and Graham (GP3) are almost identical systems, with minor differences in the mix of large memory, small memory and GPU nodes.

| Name | Description | Capacity | Status |
|---|---|---|---|
| CC-Cloud Resources: Arbutus/west.cloud (GP1), east.cloud | OpenStack IaaS cloud | 7,640 cores | In production (integrated with west.cloud) |
| Cedar (GP2) | Heterogeneous, general-purpose cluster: serial and small parallel jobs; GPU and big memory nodes; small cloud partition | 27,696 cores | In production |
| Graham (GP3) | Heterogeneous, general-purpose cluster: serial and small parallel jobs; GPU and big memory nodes; small cloud partition | 33,376 cores | In production |
| GP4 | Heterogeneous, general-purpose cluster: may have GPUs, large memory, etc. | Approx. 40,000 cores | RFP closes in May 2018 |
| Niagara (LP) | Homogeneous, large parallel cluster: designed for large parallel jobs > 1000 cores | ~60,000 cores | Vendor negotiations |

Note that GP1, GP2, GP3, GP4 and LP will all have large, high-performance attached storage.


Cedar

Availability: Compute RAC2017 allocations started June 30, 2017
Login node: cedar.computecanada.ca
Globus endpoint: computecanada#cedar-dtn (Globus is a file transfer service, https://www.globus.org/)
System Status Page: https://www.westgrid.ca/support/system_status

Cedar is a heterogeneous cluster suitable for a variety of workloads; it is located at Simon Fraser University. It is named for the Western Red Cedar, B.C.’s official tree, which is of great spiritual significance to the region's First Nations people. It was previously known as "GP2" and is still identified as such in the 2017 RAC documentation.

Cedar is sold and supported by Scalar Decisions, Inc. The node manufacturer is Dell, the high performance temporary space (scratch) is from DDN, and the interconnect is from Intel. It is entirely liquid cooled, using rear-door heat exchangers.

Getting started with Cedar

Attached storage

Home space
250TB total volume
  • Location of home directories.
  • Each home directory has a small fixed quota.
  • Not allocated via RAS or RAC. Larger requests go to Project space.
  • Has daily backup.
Scratch space
3.7PB total volume
Parallel high-performance filesystem
  • For active or temporary (/scratch) storage.
  • Not allocated.
  • Large fixed quota per user.
  • Inactive data will be purged.
Project space
10PB total volume
External persistent storage

Scratch storage is a Lustre filesystem based on DDN model ES14K technology. It includes 640 8TB NL-SAS disk drives, and dual redundant metadata controllers with SSD-based storage.

High-performance interconnect

Intel Omni-Path (version 1) interconnect (100 Gbit/s bandwidth).

A low-latency high-performance fabric connecting all nodes and temporary storage.

By design, Cedar supports multiple simultaneous parallel jobs of up to 1024 Broadwell cores (32 nodes) or 1536 Skylake cores (32 nodes) in a fully non-blocking manner. For larger jobs the interconnect has a 2:1 blocking factor, so even jobs running on several thousand cores still benefit from a high-performance interconnect.
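As a rough illustration of the island sizes quoted above, the following Python sketch computes how many whole nodes a job of a given core count would occupy and whether that many nodes could fit inside one 32-node non-blocking island. The function name and dictionary layout are hypothetical; only the cores-per-node and nodes-per-island figures come from the text, and the sketch says nothing about how the scheduler actually places jobs.

```python
import math

# Figures from the text: islands of 32 nodes; 32 cores per Broadwell node,
# 48 cores per Skylake node.
NODES_PER_ISLAND = 32
CORES_PER_NODE = {"broadwell": 32, "skylake": 48}


def fits_in_one_island(cores_requested, node_type):
    """Return (nodes_needed, fits) for a job using whole nodes of one type."""
    per_node = CORES_PER_NODE[node_type]
    nodes_needed = math.ceil(cores_requested / per_node)
    return nodes_needed, nodes_needed <= NODES_PER_ISLAND


for cores, kind in [(1024, "broadwell"), (1536, "skylake"), (2048, "broadwell")]:
    nodes, fits = fits_in_one_island(cores, kind)
    print(f"{cores} {kind} cores -> {nodes} nodes, single island: {fits}")
```

The first two cases fill exactly one island; the third needs 64 nodes and would therefore cross the 2:1 blocking boundary.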

Node types and characteristics

Cedar has a total of 58,416 CPU cores for computation, and 584 GPU devices.

| Count | Node type | Cores | Available memory | Hardware detail |
|---|---|---|---|---|
| 576 | base "128G" | 32 | 125G or 128000M | two Intel E5-2683 v4 "Broadwell" at 2.1 GHz |
| 128 | large "256G" | 32 | 250G or 257000M | (same as base nodes) |
| 24 | large "512G" | 32 | 502G or 515000M | (same as base nodes) |
| 24 | bigmem1500 "1.5T" | 32 | 1510G or 1547000M | (same as base nodes) |
| 4 | bigmem3000 "3T" | 32 | 3022G or 3095000M | four Intel E7-4809 v4 "Broadwell" at 2.1 GHz |
| 114 | base GPU | 24 | 125G or 128000M | two E5-2650 v4 at 2.2 GHz + four NVIDIA P100 Pascal GPUs (12GB HBM2 memory) |
| 32 | large GPU | 24 | 250G or 257000M | two E5-2650 v4 at 2.2 GHz + four NVIDIA P100 Pascal GPUs (16GB HBM2 memory) |
| 640 | Skylake | 48 | 187G or 192000M | two Intel Platinum 8160F "Skylake" at 2.1 GHz |
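
The stated totals can be cross-checked against the node table. The short Python sketch below simply multiplies node counts by cores and GPUs per node; the dictionary keys are informal labels chosen for this example, and the numbers are transcribed from the table.

```python
# (node count, cores per node, GPUs per node), transcribed from the node table.
CEDAR_NODES = {
    "base 128G": (576, 32, 0),
    "large 256G": (128, 32, 0),
    "large 512G": (24, 32, 0),
    "bigmem1500": (24, 32, 0),
    "bigmem3000": (4, 32, 0),
    "base GPU": (114, 24, 4),
    "large GPU": (32, 24, 4),
    "Skylake": (640, 48, 0),
}

total_cores = sum(count * cores for count, cores, _ in CEDAR_NODES.values())
total_gpus = sum(count * gpus for count, _, gpus in CEDAR_NODES.values())

print(total_cores)  # 58416
print(total_gpus)   # 584
```

Both results agree with the 58,416 CPU cores and 584 GPU devices quoted for Cedar.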

Note that the amount of available memory is less than the "round number" suggested by the hardware configuration. For instance, "base" nodes do have 128 GiB of RAM, but some of it is permanently occupied by the kernel and OS. To avoid wasting time by swapping/paging, the scheduler will never allocate jobs whose memory requirements exceed the amount of "available" memory shown above.
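
To make the consequence concrete, here is a minimal Python sketch that looks up the smallest Cedar CPU node type able to hold a given memory request. The helper name is hypothetical and the values are the "M" figures from the table above; it only illustrates the comparison, not the scheduler's actual logic.

```python
# "Available memory" per node type, in the M units used in the table above.
AVAILABLE_MB = {
    "base": 128000,
    "large 256G": 257000,
    "large 512G": 515000,
    "bigmem1500": 1547000,
    "bigmem3000": 3095000,
}


def smallest_node_for(mem_mb):
    """Return the smallest CPU node type whose available memory covers mem_mb,
    or None if no node type can hold the request."""
    for name, avail in sorted(AVAILABLE_MB.items(), key=lambda kv: kv[1]):
        if mem_mb <= avail:
            return name
    return None


print(smallest_node_for(128000))      # base
print(smallest_node_for(128 * 1024))  # large 256G (131072M exceeds 128000M)
print(smallest_node_for(4_000_000))   # None
```

The second call shows why asking for the "round number" matters: a request for a full 128 GiB no longer fits on a base node.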

All nodes have local (on-node) temporary storage. GPU nodes have a single 800GB SSD drive. All other compute nodes have two 480GB SSD drives, for a total raw capacity of 960GB. The best practice for accessing node-local storage is to use the directory generated by Slurm, $SLURM_TMPDIR.
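
A minimal Python sketch of using that directory from inside a job follows. The helper name and the file name are hypothetical, and the fallback to the system temporary directory is an assumption added only so the example also runs outside a Slurm job.

```python
import os
import tempfile
from pathlib import Path


def job_scratch_dir():
    """Return the node-local directory for this job.

    Uses $SLURM_TMPDIR when it is set (inside a Slurm job); otherwise falls
    back to the system temporary directory so the sketch runs anywhere.
    """
    return Path(os.environ.get("SLURM_TMPDIR", tempfile.gettempdir()))


scratch = job_scratch_dir()
work_file = scratch / "intermediate.dat"  # hypothetical file name
work_file.write_bytes(b"temporary results")
print(work_file.read_bytes())
work_file.unlink()  # remove the file once it is no longer needed
```

Results that must outlive the job should be copied from this directory to persistent storage before the job ends.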

Choosing a node type

Most applications will run on either Broadwell or Skylake nodes, and performance differences are expected to be small compared to job waiting times. Therefore we recommend that you do not select a specific node type for your jobs. If it is necessary, use --constraint=skylake or --constraint=broadwell. See Specifying a CPU architecture.
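
For scripted submissions, the recommendation above can be sketched in Python as follows: the constraint is omitted by default and added only on request. The helper name and the script name `job.sh` are hypothetical; the sketch only builds the `sbatch` command line and does not run it.

```python
import shlex

ALLOWED = {"broadwell", "skylake"}  # constraint values named in the text


def sbatch_command(script, constraint=None):
    """Build an sbatch argument list; add --constraint only when requested."""
    cmd = ["sbatch"]
    if constraint is not None:
        if constraint not in ALLOWED:
            raise ValueError(f"unknown node type: {constraint}")
        cmd.append(f"--constraint={constraint}")
    cmd.append(script)
    return cmd


print(shlex.join(sbatch_command("job.sh")))
print(shlex.join(sbatch_command("job.sh", "skylake")))
```

Leaving `constraint` unset keeps the job eligible for both node types, which is the recommended default.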

Performance

The theoretical peak double-precision performance of Cedar is 936 teraflops for CPUs plus 2,744 teraflops for GPUs, for a total of over 3.6 petaflops. There are 22 fully connected "islands" of 32 base or large nodes each; every island has 1024 cores in a fully non-blocking topology (Omni-Path fabric) and is designed to yield over 30 teraflops of double-precision performance (measured with High Performance LINPACK). There is a 2:1 blocking factor between the 1024-core islands. The Skylake nodes similarly span 20 non-blocking islands of 32 nodes each, with 1536 cores per island.
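
The CPU and GPU figures can be reproduced with simple arithmetic under two assumptions that the text does not state: 16 double-precision floating-point operations per cycle per Broadwell core, and 4.7 teraflops peak per P100 GPU. The Python sketch below uses node counts and clock speeds from the node table. With these assumptions the CPU figure is matched by the Broadwell-based nodes alone, which suggests (but does not confirm) that the Skylake nodes are not part of the 936-teraflop number.

```python
# Assumptions (not stated in the text): 16 double-precision FLOP per cycle
# per Broadwell core, and 4.7 teraflops peak per P100 GPU.
FLOP_PER_CYCLE = 16
P100_TFLOPS = 4.7

# Core and GPU counts summed from the node table.
broadwell_cores = (576 + 128 + 24 + 24 + 4) * 32  # CPU nodes at 2.1 GHz
gpu_node_cores = (114 + 32) * 24                  # GPU-node CPUs at 2.2 GHz
gpus = (114 + 32) * 4                             # P100 devices

# cores * GHz * FLOP/cycle gives gigaflops; divide by 1000 for teraflops.
cpu_tflops = (broadwell_cores * 2.1 + gpu_node_cores * 2.2) * FLOP_PER_CYCLE / 1000
gpu_tflops = gpus * P100_TFLOPS
island_tflops = 1024 * 2.1 * FLOP_PER_CYCLE / 1000

print(f"CPU peak:    {cpu_tflops:.1f} TF")
print(f"GPU peak:    {gpu_tflops:.1f} TF")
print(f"Total:       {(cpu_tflops + gpu_tflops) / 1000:.2f} PF")
print(f"Island peak: {island_tflops:.1f} TF")
```

Under the same assumptions a 1024-core island has a theoretical peak of roughly 34 teraflops, which is consistent with a measured LINPACK target of over 30 teraflops.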



Graham

Availability: In production since June 2017
Login node: graham.computecanada.ca
Globus endpoint: computecanada#graham-dtn (Globus is a file transfer service, https://www.globus.org/)
Data mover node (rsync, scp, sftp,...): gra-dtn1.computecanada.ca

Graham is a heterogeneous cluster, suitable for a variety of workloads, and located at the University of Waterloo. It is named after Wes Graham, the first director of the Computing Centre at Waterloo. It was previously known as "GP3" and is still identified as such in the 2017 RAC documentation.

The parallel filesystem and external persistent storage (NDC-Waterloo) are similar to Cedar's. The interconnect is different and there is a slightly different mix of compute nodes.

The Graham system is sold and supported by Huawei Canada, Inc. It is entirely liquid cooled, using rear-door heat exchangers.

Getting started with Graham

How to run jobs

Transferring data

Site-specific policies

By policy, Graham's compute nodes cannot access the internet. If you need an exception to this rule, contact technical support and describe what you need to access and why.

Crontab is not offered on Graham.

Attached storage systems

Home space
  • Location of home directories.
  • Each home directory has a small, fixed quota.
  • Not allocated via RAS or RAC. Larger requests go to Project space.
  • Has daily backup.
Scratch space
3.6PB total volume
Parallel high-performance filesystem
  • For active or temporary (/scratch) storage.
  • Not allocated.
  • Large fixed quota per user.
  • Inactive data will be purged.
Project space
External persistent storage

High-performance interconnect

Mellanox FDR (56Gb/s) and EDR (100Gb/s) InfiniBand interconnect. FDR is used for GPU and cloud nodes, EDR for other node types. A central 324-port director switch aggregates connections from islands of 1024 cores each for CPU and GPU nodes. The 56 cloud nodes are a variation on CPU nodes, and are on a single larger island sharing 8 FDR uplinks to the director switch.

A low-latency, high-bandwidth InfiniBand fabric connects all nodes and scratch storage.

Nodes configurable for cloud provisioning also have a 10Gb/s Ethernet network, with 40Gb/s uplinks to scratch storage.

Graham is designed to support multiple simultaneous parallel jobs of up to 1024 cores in a fully non-blocking manner.

For larger jobs the interconnect has an 8:1 blocking factor, so even jobs running across multiple islands still have a high-performance interconnect.

[Figure: Graham high-performance interconnect diagram]

Node types and characteristics

A total of 36,160 cores and 320 GPU devices, spread across 1,127 nodes of different types.

Processor type: All nodes except bigmem3000 have Intel E5-2683 v4 CPUs running at 2.1 GHz.

GPU type: NVIDIA P100 (12 GB)

| Count | Node type | Cores | Available memory | Hardware detail |
|---|---|---|---|---|
| 884 | base "128G" | 32 | 125G or 128000M | two Intel E5-2683 v4 "Broadwell" at 2.1 GHz; 960GB SATA SSD |
| 24 | large "512G" | 32 | 502G or 514500M | (same as base nodes) |
| 56 | large/cloud | 32 | 250G or 256500M | (same as base nodes); may be reserved for cloud use |
| 3 | bigmem3000 "3T" | 64 | 3022G or 3095000M | like base nodes but four E7-4850 v4 "Broadwell" CPUs at 2.1 GHz |
| 160 | GPU | 32 | 124G or 127518M | like base nodes but also two NVIDIA P100 Pascal GPUs (12GB HBM2 memory), 1.6TB NVMe SSD |

Best practice for local on-node storage is to use the temporary directory generated by Slurm, $SLURM_TMPDIR. Note that this directory and its contents will disappear upon job completion.

Note that the amount of available memory is less than the "round number" suggested by the hardware configuration. For instance, "base" nodes do have 128 GiB of RAM, but some of it is permanently occupied by the kernel and OS. To avoid wasting time by swapping/paging, the scheduler will never allocate jobs whose memory requirements exceed the specified amount of "available" memory. Please also note that the memory allocated to the job must be sufficient for the I/O buffering performed by the kernel and filesystem. This means that an I/O-intensive job will often benefit from requesting somewhat more memory than the aggregate size of its processes.
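
One way to apply this advice is to add a margin to the measured process size before choosing a memory request. The Python sketch below does that and checks the result against a Graham base node. The helper name and the default 10% margin are hypothetical, not a documented recommendation; a suitable margin depends on the workload.

```python
GRAHAM_BASE_AVAILABLE_MB = 128000  # "available memory" of a base node, from the table


def memory_request_mb(process_mb, headroom_percent=10):
    """Suggest a memory request: process size plus a margin for I/O buffering.

    headroom_percent is a hypothetical default; tune it for the real workload.
    Raises ValueError if the request would not fit on a base node.
    """
    extra = (process_mb * headroom_percent + 99) // 100  # integer round-up
    request = process_mb + extra
    if request > GRAHAM_BASE_AVAILABLE_MB:
        raise ValueError(
            f"{request}M exceeds the {GRAHAM_BASE_AVAILABLE_MB}M available on a base node"
        )
    return request


print(memory_request_mb(100000))  # 110000
try:
    memory_request_mb(120000)
except ValueError as err:
    print(err)
```

In the second case the padded request (132000M) no longer fits on a base node, so the job would need a larger node type or a smaller margin.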



National Data Cyberinfrastructure (NDC)

Currently deployed in Production State:

| Location | File system | Type | Current capacity | Commissioned | Detail |
|---|---|---|---|---|---|
| Simon Fraser University (cedar.computecanada.ca) | Project | Lustre | 10 PB | June 2017 | Backed up to tape; allocated through the RAC; persistent |
| Simon Fraser University (cedar.computecanada.ca) | Home | Lustre | 250 TB | June 2017 | Backed up to tape; persistent |
| Simon Fraser University (cedar.computecanada.ca) | Scratch | Lustre | 3.5 PB | June 2017 | Inactive data is purged |
| University of Waterloo (graham.computecanada.ca) | Project | Lustre | 12 PB | June 2017 | Backed up to tape; all groups get a default allocation, with larger awards allocated through the RAC; persistent |
| University of Waterloo (graham.computecanada.ca) | Home | NFS | 64 TB | June 2017 | Backed up to tape; persistent |
| University of Waterloo (graham.computecanada.ca) | Scratch | Lustre | 3.2 PB | June 2017 | Inactive data is purged |
| University of Waterloo (graham.computecanada.ca) | Nearline | Lustre | 80 PB | July 2018 | File contents are moved to tape, leaving stub files whose contents are recalled transparently from tape when the files are accessed; all groups get a default allocation, with larger awards allocated through the RAC; persistent |
| University of Toronto (niagara.computecanada.ca) | Project | GPFS | 2 PB | April 2018 | Backed up to tape; allocated through the RAC; persistent |
| University of Toronto (niagara.computecanada.ca) | Home | GPFS | 200 TB | April 2018 | Backed up to tape; persistent |
| University of Toronto (niagara.computecanada.ca) | Scratch | GPFS | 7 PB | April 2018 | Inactive data is purged |
| University of Toronto (niagara.computecanada.ca) | Burst buffer | GPFS | 232 TB | April 2018 | Inactive data is purged |
| University of Toronto (niagara.computecanada.ca) | Archive | HPSS | 10 PB | April 2018 | Tape-backed HSM; allocated through the RAC; persistent |