National systems

From CC Doc

CCDB (Compute Canada Data Base) descriptions

General descriptions are also available on CCDB (Compute Canada Data Base).

Compute

Overview

Cedar (GP2) and Graham (GP3) are almost identical systems. They differ in their interconnects and slightly in their mix of large-memory, small-memory and GPU nodes.

CC-Cloud Resources
Arbutus/west.cloud (GP1)
east.cloud
OpenStack IaaS cloud
Capacity: 7,640 cores
Status: In production (integrated with west.cloud)

Cedar (GP2)
heterogeneous, general-purpose cluster
  • Serial and small parallel jobs
  • GPU and big memory nodes
  • Small cloud partition
Capacity: 27,696 cores
Status: In production

Graham (GP3)
heterogeneous, general-purpose cluster
  • Serial and small parallel jobs
  • GPU and big memory nodes
  • Small cloud partition
Capacity: 33,376 cores
Status: In production

GP4
heterogeneous, general-purpose cluster
  • May have GPUs, large-memory nodes, etc.
Capacity: approx. 40,000 cores
Status: In preliminary design stages

Niagara (LP)
homogeneous, large parallel cluster
  • Designed for large parallel jobs (more than 1,000 cores)
Capacity: ~60,000 cores
Status: Vendor negotiations

Note that GP1, GP2, GP3, GP4 and LP will all have large, high-performance attached storage.


Cedar

Availability: Compute RAC 2017 allocations started June 30, 2017
Login node: cedar.computecanada.ca
Globus endpoint: computecanada#cedar-dtn (Globus is a file transfer service: https://www.globus.org/)
System Status Page: https://www.westgrid.ca/support/system_status

Cedar is a heterogeneous cluster suitable for a variety of workloads; it is located at Simon Fraser University. It is named for the Western Red Cedar, B.C.’s official tree, which is of great spiritual significance to the region's First Nations people. It was previously known as "GP2" and is still identified as such in the 2017 RAC documentation.

Cedar is sold and supported by Scalar Decisions, Inc. The node manufacturer is Dell, the high performance temporary space (scratch) is from DDN, and the interconnect is from Intel. It is entirely liquid cooled, using rear-door heat exchangers.

Getting started with Cedar

As part of the second phase of the CFI Cyberinfrastructure Challenge 2 program, Cedar will be considerably expanded. Initial discussions with the vendor are in progress, and the expansion is expected to be carried out in winter 2018. This should nearly double Cedar's capacity.

Attached storage

Home space
250 TB total volume
  • Location of home directories.
  • Each home directory has a small, fixed quota.
  • Not allocated via RAS or RAC. Larger requests go to Project space.
  • Backed up daily.
Scratch space
3.7 PB total volume
Parallel high-performance filesystem
  • For active or temporary (/scratch) storage.
  • Not allocated.
  • Large fixed quota per user.
  • Inactive data will be purged.
Project space
10 PB total volume
External persistent storage (NDC-SFU)
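Inactive data on scratch will be purged, so it can help to list files you have not touched recently and copy anything you still need to project space. The following is a minimal sketch. It assumes that "inactive" can be approximated by a file's access time (atime). The document does not state the actual purge criterion or threshold, so any threshold you pass in is hypothetical.

```python
import os
import stat
import time
from typing import Iterator, Tuple

SECONDS_PER_DAY = 86_400


def purge_candidates(root: str, max_idle_days: float) -> Iterator[Tuple[str, float]]:
    """Yield (path, idle_days) for regular files not accessed in max_idle_days.

    ASSUMPTION: "inactive" is approximated by access time (atime); the real
    purge criterion and threshold are site policy and are not defined here.
    """
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # only report regular files
            idle_days = (now - st.st_atime) / SECONDS_PER_DAY
            if idle_days > max_idle_days:
                yield path, idle_days
```

As a usage example, `purge_candidates(os.environ["SCRATCH"], 60)` would list files idle for more than a hypothetical 60 days. Both the `SCRATCH` environment variable and the 60-day figure are assumptions here.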

High-performance interconnect

Intel Omni-Path (version 1) interconnect (100 Gbit/s bandwidth).

A low-latency, high-performance fabric connects all nodes and the temporary (scratch) storage.

By design, Cedar supports multiple simultaneous parallel jobs of up to 1,024 cores in a fully non-blocking manner. For larger jobs the interconnect has a 2:1 blocking factor. Even so, jobs running on several thousand cores still get a high-performance interconnect.
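The blocking factor can be made concrete with a simplified model of a two-level fat tree. Inside a 1,024-core island every node gets its full link speed. Traffic leaving the island shares uplinks that are oversubscribed by the blocking factor. This sketch assumes that an N:1 factor means an island's uplinks carry 1/N of the island's aggregate link bandwidth. It also assumes that every node communicates off-island at once, which is the worst case. Real placement and traffic patterns are usually kinder.

```python
# Simplified worst-case model of per-node bandwidth on a fat tree with
# fully non-blocking islands and an N:1 blocking factor between them.
ISLAND_CORES = 1024     # cores per non-blocking island (from the text)
LINK_GBPS = 100.0       # Omni-Path link speed (from the text)
BLOCKING_FACTOR = 2.0   # 2:1 between islands (from the text)


def fits_in_one_island(job_cores: int, island_cores: int = ISLAND_CORES) -> bool:
    """True if the job could, in principle, be placed inside a single island."""
    return job_cores <= island_cores


def worst_case_node_bandwidth(job_cores: int,
                              link_gbps: float = LINK_GBPS,
                              blocking: float = BLOCKING_FACTOR,
                              island_cores: int = ISLAND_CORES) -> float:
    """Per-node Gbit/s if all nodes talk across islands simultaneously.

    ASSUMPTION: a job that fits in one island is actually placed in one
    island by the scheduler; otherwise it also crosses the uplinks.
    """
    if fits_in_one_island(job_cores, island_cores):
        return link_gbps            # non-blocking within the island
    return link_gbps / blocking     # shared, oversubscribed uplinks


for cores in (512, 1024, 4096):
    print(cores, "cores:", worst_case_node_bandwidth(cores), "Gbit/s per node")
```

The same function accepts a different `blocking` argument. Passing `blocking=8.0`, for instance, models an 8:1 design.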

Node types and characteristics

Cedar has a total of 27,696 CPU cores for computation and 584 GPU devices. Total theoretical peak double-precision performance is 936 teraflops for CPUs plus 2,744 teraflops for GPUs, for over 3.6 petaflops in all. The 704 base and large nodes are grouped into 22 fully connected "islands" of 32 nodes (1,024 cores) each. Each island uses a fully non-blocking topology (Omni-Path fabric) and is designed to yield over 30 teraflops of double-precision performance (measured with High-Performance LINPACK). There is a 2:1 blocking factor between the 1,024-core islands.

base nodes: 576 nodes, 128 GB of memory, 16 cores/socket, 2 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4.
large nodes: 128 nodes, 256 GB of memory, 16 cores/socket, 2 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4.
GPU base nodes: 114 nodes, 128 GB of memory, 12 cores/socket, 2 sockets/node, 4 NVIDIA P100 Pascal GPUs/node (12 GB HBM2 memory), 2 GPUs per PCI root. Intel "Broadwell" CPUs at 2.2 GHz, model E5-2650 v4.
GPU large nodes: 32 nodes, 256 GB of memory, 12 cores/socket, 2 sockets/node, 4 NVIDIA P100 Pascal GPUs/node (16 GB HBM2 memory), all GPUs on the same PCI root. Intel "Broadwell" CPUs at 2.2 GHz, model E5-2650 v4.
bigmem500 nodes: 24 nodes, 0.5 TB (512 GB) of memory, 16 cores/socket, 2 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4.
bigmem1500 nodes: 24 nodes, 1.5 TB of memory, 16 cores/socket, 2 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4.
bigmem3000 nodes: 4 nodes, 3 TB of memory, 8 cores/socket, 4 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E7-4809 v4.
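The totals and peak figures quoted for Cedar can be cross-checked against the node table above. This is a back-of-the-envelope sketch under two assumptions that the table does not state. First, each Broadwell core retires 16 double-precision FLOPs per cycle (two AVX2 FMA units, 4 doubles each, 2 operations per FMA). Second, each PCIe P100 peaks at 4.7 DP TFLOPS.

```python
# Cross-check Cedar's core/GPU totals and theoretical DP peak from the node table.
# ASSUMPTIONS (not in the table): 16 DP FLOPs/cycle per Broadwell core
# (2 AVX2 FMA units x 4 doubles x 2 ops) and 4.7 DP TFLOPS per PCIe P100.
FLOPS_PER_CYCLE = 16
P100_DP_TFLOPS = 4.7

# node type: (node count, cores per node, clock in GHz, GPUs per node)
CEDAR_NODES = {
    "base":       (576, 32, 2.1, 0),
    "large":      (128, 32, 2.1, 0),
    "gpu_base":   (114, 24, 2.2, 4),
    "gpu_large":  (32, 24, 2.2, 4),
    "bigmem500":  (24, 32, 2.1, 0),
    "bigmem1500": (24, 32, 2.1, 0),
    "bigmem3000": (4, 32, 2.1, 0),
}

total_cores = sum(n * c for n, c, _, _ in CEDAR_NODES.values())
total_gpus = sum(n * g for n, _, _, g in CEDAR_NODES.values())
# cores x GHz x FLOPs/cycle = GFLOPS; divide by 1000 for TFLOPS
cpu_tflops = sum(n * c * ghz * FLOPS_PER_CYCLE
                 for n, c, ghz, _ in CEDAR_NODES.values()) / 1000
gpu_tflops = total_gpus * P100_DP_TFLOPS

print(f"{total_cores} cores, {total_gpus} GPUs")
print(f"CPU {cpu_tflops:.1f} TF + GPU {gpu_tflops:.1f} TF "
      f"= {(cpu_tflops + gpu_tflops) / 1000:.2f} PF")
```

Under these assumptions the CPU peak comes to roughly 936 TF and the GPU peak to just under 2,745 TF, which matches the figures quoted for Cedar.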

All of the above nodes have local (on-node) temporary storage. GPU nodes have a single 800 GB SSD. All other compute nodes have two 480 GB SSDs, for a total raw capacity of 960 GB. The best practice for accessing node-local storage is to use the directory generated by Slurm, $SLURM_TMPDIR.
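Node-local SSDs hold only 800 GB to 960 GB raw, so it is worth checking that a dataset fits before staging it into $SLURM_TMPDIR. This sketch reads `SLURM_TMPDIR` from the environment and compares the dataset size against the free space reported by `shutil.disk_usage`. When run outside a job, it falls back to the system temporary directory; that fallback exists only so the sketch runs anywhere. The 90% headroom is an arbitrary, hypothetical safety margin.

```python
import os
import shutil
import tempfile


def local_scratch_dir() -> str:
    """Node-local job directory, or the system temp dir outside a Slurm job."""
    return os.environ.get("SLURM_TMPDIR") or tempfile.gettempdir()


def fits_locally(nbytes: int, headroom: float = 0.9) -> bool:
    """True if nbytes fits within `headroom` of the local directory's free space.

    The default headroom of 0.9 is a hypothetical margin, not a site rule.
    """
    free = shutil.disk_usage(local_scratch_dir()).free
    return nbytes <= free * headroom
```

For example, `fits_locally(200 * 1024**3)` checks whether a 200 GiB input could be copied to the node before the job starts working on it.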

Scratch storage is a Lustre filesystem based on DDN model ES14K technology. It includes 640 NL-SAS disk drives of 8 TB each and dual redundant metadata controllers with SSD-based storage.
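A quick arithmetic comparison of the raw disk capacity with the 3.7 PB scratch volume quoted for Cedar shows how much of the raw space is usable. The gap is presumably taken up by parity, spares and filesystem overhead, plus the difference between decimal and binary units. The drive layout is not described here, so treat the breakdown as unknown.

```python
# Raw scratch capacity versus the usable volume quoted for Cedar's scratch space.
DISKS = 640
DISK_TB = 8          # decimal terabytes per NL-SAS drive (from the text)
USABLE_PB = 3.7      # "3.7 PB total volume" for scratch (from the text)

raw_pb = DISKS * DISK_TB / 1000      # decimal petabytes
usable_fraction = USABLE_PB / raw_pb

print(f"raw {raw_pb:.2f} PB, usable about {usable_fraction:.0%} of raw")
```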



Graham

Availability: In production. RAC 2017 allocations implemented June 30, 2017
Login node: graham.computecanada.ca
Globus endpoint: computecanada#graham-dtn

Graham is a heterogeneous cluster, suitable for a variety of workloads, located at the University of Waterloo. It is named after Wes Graham, the first director of the Computing Centre at Waterloo. It was previously known as "GP3" and is still identified as such in the 2017 RAC documentation.

The parallel filesystem and external persistent storage (NDC-Waterloo) are similar to Cedar's. The interconnect is different and there is a slightly different mix of compute nodes.

The Graham system is sold and supported by Huawei Canada, Inc. It is entirely liquid cooled, using rear-door heat exchangers.

Getting started with Graham

How to run jobs

Attached storage systems

Home space
  • Location of home directories.
  • Each home directory has a small, fixed quota.
  • Not allocated via RAS or RAC. Larger requests go to Project space.
  • Has daily backup.
Scratch space
3.6 PB total volume
Parallel high-performance filesystem
  • For active or temporary (/scratch) storage.
  • Not allocated.
  • Large fixed quota per user.
  • Inactive data will be purged.
Project space
External persistent storage (NDC-Waterloo)

High-performance interconnect

Mellanox FDR (56 Gb/s) and EDR (100 Gb/s) InfiniBand interconnect. FDR is used for GPU and cloud nodes, and EDR for all other node types. A central 324-port director switch aggregates connections from islands of 1,024 cores each for CPU and GPU nodes. The 56 cloud nodes are a variation on CPU nodes. They sit on a single larger island that shares 8 FDR uplinks to the director switch.
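Sharing 8 FDR uplinks among 56 cloud nodes implies a substantial oversubscription on that island. This sketch estimates it under an assumption the text does not state: each cloud node has exactly one FDR link (56 Gb/s) into the island.

```python
# Oversubscription of the Graham cloud island's uplinks to the director switch.
# ASSUMPTION: each cloud node has a single FDR link into the island;
# the text only states the node count and the 8 shared FDR uplinks.
FDR_GBPS = 56
CLOUD_NODES = 56
UPLINKS = 8

downlink_total = CLOUD_NODES * FDR_GBPS   # aggregate node bandwidth into the island
uplink_total = UPLINKS * FDR_GBPS         # aggregate bandwidth toward the director
ratio = downlink_total / uplink_total     # oversubscription, expressed as N:1

print(f"{downlink_total} Gbit/s in, {uplink_total} Gbit/s up, {ratio:.1f}:1")
```

Under this assumption the cloud island is oversubscribed about 7:1 when all of its nodes send traffic off-island at the same time.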

A low-latency, high-bandwidth InfiniBand fabric connects all nodes and scratch storage.

Nodes configurable for cloud provisioning also have a 10 Gb/s Ethernet network, with 40 Gb/s uplinks to scratch storage.

Graham is designed to support multiple simultaneous parallel jobs of up to 1,024 cores in a fully non-blocking manner.

For larger jobs the interconnect has an 8:1 blocking factor. Even so, jobs running on multiple islands still get a high-performance interconnect.

Graham high performance interconnect diagram

Node types and characteristics

Graham has a total of 35,520 CPU cores and 320 GPU devices, spread across 1,107 nodes of different types.

Processor type: all nodes except bigmem3000 have Intel E5-2683 v4 CPUs running at 2.1 GHz.

GPU type: NVIDIA P100 Pascal (12 GB HBM2 memory)

base nodes: 864 nodes, 128 GB of memory, 16 cores/socket, 2 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4. 960 GB SATA SSD.
large nodes (cloud configuration): 56 nodes, 256 GB of memory, 16 cores/socket, 2 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4. 960 GB SATA SSD.
GPU nodes: 160 nodes, 128 GB of memory, 16 cores/socket, 2 sockets/node, 2 NVIDIA P100 Pascal GPUs/node (12 GB HBM2 memory). Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4. 1.6 TB NVMe SSD.
bigmem500 nodes: 24 nodes, 0.5 TB (512 GB) of memory, 16 cores/socket, 2 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E5-2683 v4. 960 GB SATA SSD.
bigmem3000 nodes: 3 nodes, 3 TB of memory, 16 cores/socket, 4 sockets/node. Intel "Broadwell" CPUs at 2.1 GHz, model E7-4850 v4. 960 GB SATA SSD.
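The Graham totals (35,520 cores, 320 GPUs, 1,107 nodes) follow directly from the per-type counts in the table above. The short check below confirms this; the dictionary keys are just illustrative labels.

```python
# Cross-check Graham's node, core and GPU totals from the node table.
# node type: (node count, sockets per node, cores per socket, GPUs per node)
GRAHAM_NODES = {
    "base":        (864, 2, 16, 0),
    "large_cloud": (56, 2, 16, 0),
    "gpu":         (160, 2, 16, 2),
    "bigmem500":   (24, 2, 16, 0),
    "bigmem3000":  (3, 4, 16, 0),
}

total_nodes = sum(n for n, _, _, _ in GRAHAM_NODES.values())
total_cores = sum(n * s * c for n, s, c, _ in GRAHAM_NODES.values())
total_gpus = sum(n * g for n, _, _, g in GRAHAM_NODES.values())

print(f"{total_nodes} nodes, {total_cores} cores, {total_gpus} GPUs")
```

The overview table lists 33,376 cores for Graham, whereas the per-type counts here add up to 35,520.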

The best practice for local on-node storage is to use the temporary directory generated by Slurm, $SLURM_TMPDIR. Note that this directory and its contents disappear when the job completes.
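Because $SLURM_TMPDIR is removed at the end of the job, the usual pattern is "stage in, compute locally, stage out." This is a minimal sketch of that pattern. The line-counting "work" step and the `result.txt` name are hypothetical placeholders. Outside a Slurm job the code falls back to a fresh temporary directory, an assumption for testing only, and that directory is not cleaned up.

```python
import os
import shutil
import tempfile


def run_with_local_staging(input_path: str, output_dir: str) -> str:
    """Copy input to node-local storage, process it there, copy the result back.

    Anything left only in SLURM_TMPDIR is lost when the job ends, so results
    must be copied to persistent storage (e.g. project space) before exit.
    """
    local_root = os.environ.get("SLURM_TMPDIR") or tempfile.mkdtemp()
    local_in = shutil.copy(input_path, local_root)         # stage in
    local_out = os.path.join(local_root, "result.txt")     # hypothetical name

    # Placeholder work: count the input's lines. Replace with the real job.
    with open(local_in) as src, open(local_out, "w") as dst:
        dst.write(f"{sum(1 for _ in src)}\n")

    os.makedirs(output_dir, exist_ok=True)
    return shutil.copy(local_out, output_dir)              # stage out
```

Doing the heavy I/O against the node-local SSD and copying only the inputs and final results keeps traffic on the shared filesystems to two bulk transfers.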



National Data Cyberinfrastructure (NDC)

Initially the NDC will be composed of large disc and tape systems at SFU and Waterloo.

Project space
Location: SFU, Waterloo, Toronto
Initial capacity: 10 PB at SFU and 13 PB at Waterloo. Space in Toronto to be confirmed.
Availability: Available (as of June 30, 2017)
  • Mounted on the site's cluster (login and compute nodes)
  • Backed up to tape
  • Allocated through the RAC
  • Persistent

Nearline space (long-term tape)
Location: SFU and Waterloo (NDC-SFU, NDC-Waterloo)
Initial capacity: 30 PB each
Availability: Tape systems in production. No definite schedule for automation.
  • Basic tape archive systems are in service, but the "nearline" automation is still in preparation.
    • Nearline RAC 2017 awards are being allocated to /project on demand. Send requests to technical support.
  • Files moved to a "nearline" disc-cache location will be automatically moved to tape.
  • Tape copies will be replicated between the SFU and Waterloo sites.
  • Allocated through the RAC
  • Restores will be made upon request to technical support.
  • Persistent

Object Store
Location: All sites, including UVic and Toronto in the future (NDC-Object)
Initial capacity: Small to start (a few PB usable)
Availability: No definite schedule.
DDN's "WOS" product has been purchased by the Compute Canada consortia and is being deployed. Full implementation and access details are not yet available. Expectations include:
  • Fully distributed, redundant storage pools
  • Accessible anywhere (based on access control choices)
  • Allows for redundant, high-availability architectures
  • S3 is planned. Swift and other access interfaces are being investigated, but there is no timeline or guarantee that these will be put into production.
  • This is a new service aimed at experimental and observational data.
  • Allocated through the RAC

Special Purpose
Location: Various
Initial capacity: ~3.5 PB
Availability: Customized plans
  • Special purpose for major projects.
  • dCache and other customized configurations
  • Allocated through the RAC

Note that, due to the decommissioning of Silo, it has been necessary to provide interim storage while the NDC is being developed.