WestGrid Legacy Systems Migration

Parent page: Migration from Legacy Regional Systems to Compute Canada National Systems 2016-18

Updates

v1.8 Dec.11, 2017 Bugaboo defunding delayed. Removed details for previously defunded systems.

WestGrid RAC 2017 (Current) Implementation

The RAC 2017 allocations were implemented on the continuing WestGrid systems and the new Cedar system on June 30, 2017. Parallel, Grex, Orcinus and Bugaboo were included in RAC 2017, but all of these systems will be defunded by Mar.31, 2018, so they will not be available for allocations in RAC 2018.

WestGrid Legacy Systems Migration Process 2016-2018

WestGrid will be defunding the following legacy systems. See also Where to Migrate? on the WestGrid home pages for further details.

System   | Site                           | Defunding Date | Recommended Migration Date(*)
bugaboo  | Simon Fraser University        | Jan.31, 2018   | Now
Parallel | University of Calgary          | Mar.31, 2018   | Jan.1, 2018
orcinus  | University of British Columbia | Mar.31, 2018   | Jan.1, 2018
grex     | University of Manitoba         | Mar.31, 2018   | Jan.1, 2018

Notes (bugaboo): An email with migration instructions will be sent to all affected users. Non-RAC users are responsible for their own data (see below). To begin preparing your files, read these General Directives for Migration.
(*) We recommend starting the migration process as soon as possible, to leave plenty of time for data migration, job migration and familiarization with the new systems.

"Defunded" means that the system is no longer funded, operated or maintained as part of Compute Canada's national cyberinfrastructure platform. For defunded systems, the host institution assumes full control of the system after the defunding date, including managing the storage file systems and backups. Users should contact the Local Site Support for defunded systems for further information about the host institution's data retention and deletion policies.

Users on the above systems will need to migrate their data and jobs to the new systems before the defunding date.

We recommend that users move their data well in advance of the relevant defunding date to avoid network bottlenecks with file transfers. Any remaining data MAY BE DELETED after the Data Deletion Dates. Please note that due to privacy constraints WestGrid will not retain copies of user data. Users should ensure they take the appropriate steps to comply with any data management requirements set by their institution.

Please email support@westgrid.ca to request help with moving data or any other concerns with this migration policy.

Please visit the Migration Process page on WestGrid's website for further details about migrating off WestGrid’s legacy systems.

Data Retention and Deletion Policy for Legacy Systems

User data on defunded systems will be deleted. Users are responsible for migrating their data to alternate storage. WestGrid will give as much advance notice of data deletion dates as possible. WestGrid will keep users of defunded systems informed about timelines for migration, and will provide support for the migration process.

For defunded systems, the host institution assumes full control of the system, including managing the storage systems and backups. Users should contact Local Site Support of defunded systems for further information about the host institution’s data retention and deletion policies.

IMPORTANT: Data on defunded systems will be deleted after the published deletion dates. WestGrid has arranged with the host institutions to keep data until the published deletion date, but can make no guarantees about data retention after that date. WestGrid will not retain any long-term or backup copies of user data, and as noted above, users must arrange for migration of their data. Users should also ensure they take the appropriate steps to comply with any data management requirements set by their institution or project.

Where to Migrate?

Users with RAC 2017 (current) Allocations

We have a conflict here: users with current allocations will naturally want to use their allocations and associated priority right to the end of the allocation year (Mar.31), but they must also migrate their applications, and especially their data, before the defunding date (also Mar.31).

Users should be able to do both:

  1. Continue to submit jobs on the legacy systems right to the end.
  2. In parallel, migrate applications and data to a new system in preparation for the final move shortly before the defunding date.

Storage Resources

The National Data Cyberinfrastructure (NDC) provides a backed-up, reliable storage system (the /project space) mounted on each of Cedar and Graham. Generally, users should copy their data to one of those two systems.

A few special notes:

  • If you have a RAC 2017 award, you may have been allocated to Cedar or Graham; if so, choose the /project storage attached to that system.
  • If you do not have a RAC 2017 award, you may use Rapid Access Service (RAS) storage amounts at either SFU or Waterloo, at your discretion.

Niagara at Toronto will also have extensive storage resources, but these are still being designed, so users should generally move data to Cedar or Graham.

Compute Resources

Both Cedar and Graham are now in production and running with RAC 2017 resource allocations and priorities. All users have accounts, so users on legacy systems can migrate to Cedar or Graham. Note that the legacy systems will continue to operate until Mar.31, 2018 under the RAC 2017 allocations, so users with allocations will want to continue running on the legacy systems.

The Niagara large parallel system in Toronto is currently (Dec.11/2017) being delivered and is being allocated for RAC 2018. We expect the system to become available later in the winter. Those of you with large parallel requirements will be allocated to Niagara and will be notified as usual through the RAC process (mid-March).

Users Applying for RAC 2018 Awards

WestGrid's legacy systems will be defunded by Mar.31/2018 and many users have already migrated to the new Cedar or Graham systems. All RAC 2018 allocations will be implemented on the new systems. Continuing RAC awards will be moved to the new systems during the RAC 2018 allocation process. We will do our best to satisfy specific requests but may have to juggle a few awards to ensure reasonably consistent usage.

Software Available on New Systems

Software lists have been compiled and installation scripts developed so that software can be (mostly) automatically installed on Cedar and Graham as soon as they are available. The software list is continuously updated: see the current list of available software.

Some commercial software will be licensed nationally. This is still under discussion within Compute Canada. Please check back for updates.

Code and Job Migration

Please keep in mind that you may need to re-compile or re-install your software and any required packages on new systems. This can be a time-consuming process. Our support staff have considerable expertise in such tasks, so please feel free to contact support@westgrid.ca for help.

See Code and job migration from legacy systems for various details to keep in mind.

How to migrate data?

User Responsibility for Data Migration

We would like to emphasize that each user is responsible for copying their data from the to-be-defunded systems. WestGrid has no ability to track individual data transfers, so we cannot send individual reminders, and the defunding dates are hard deadlines. You must copy your data to a new, reliable site well before the defunding date; WestGrid recommends starting at least three months in advance.

General File Management Best Practices and Suggestions

  1. Users must move their own data to a new system BEFORE the defunding dates.
  2. Delete any unnecessary data and files.
  3. Refrain from keeping multiple copies of your data on multiple systems.
  4. Move any remaining data not currently being used to a long-term storage site.
  5. Scratch storage is not backed up and has no long-term availability guarantees; inactive data in scratch storage areas is subject to being purged. We have noticed that some users appear to be storing important data on scratch, so please check your scratch storage and move anything important to permanent storage (one way to audit scratch is sketched after this list).
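As a concrete starting point for items 2 and 5, the following shell sketch shows one way to see where your scratch space is going and which files have sat untouched. The /global/scratch/$USER path is an assumption; substitute the scratch location on your system:

    # Summarize usage per top-level directory under your scratch space,
    # largest first (path is illustrative):
    du -sh /global/scratch/$USER/* 2>/dev/null | sort -rh | head -20

    # List files not accessed in roughly the last six months, largest first:
    find /global/scratch/$USER -type f -atime +180 -printf '%s\t%p\n' \
        | sort -rn | head -50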

Recommendations for Large Numbers of Files

We have a few users with very large numbers of files (many thousands). Transferring that many individual files with the usual file-transfer utilities is notoriously inefficient. We strongly recommend tarring up your files before transfer, as sketched below. For a detailed discussion and examples of compressing and extracting files with the tar command on Linux, see the Compute Canada Wiki page archiving_and_compressing_files.
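A minimal sketch of the tar workflow (the directory and archive names are illustrative):

    # Bundle a directory into a single compressed archive before transfer:
    tar -czvf myproject.tar.gz myproject/

    # After transferring the archive, extract it on the destination system:
    tar -xzvf myproject.tar.gz

Transferring one large archive avoids the per-file overhead that makes many-thousand-file transfers so slow.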

Globus for Data Transfer

Globus (https://www.globus.org/) is the ideal tool, as it significantly improves transfer performance and reduces the time spent managing transfers. Users have reported 10x or even 100x improvements over other transfer methods such as scp.

Globus can be used to move data between any two WestGrid resources. All WestGrid resources are already configured as Globus endpoints.

See the Globus page for more information on using Globus file transfer. Refer to the Best practices for data migration page for detailed instructions on moving your data to a new system.
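Most users will drive transfers from the Globus web interface, but transfers can also be scripted. The sketch below uses the Globus CLI; the endpoint UUIDs and paths are placeholders, not real identifiers:

    # Look up the endpoint UUIDs for your source and destination systems:
    globus endpoint search "cedar"

    # Start a recursive, server-managed transfer (UUIDs and paths are
    # placeholders; substitute your own):
    SRC=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee    # legacy-system endpoint
    DST=ffffffff-0000-1111-2222-333333333333    # Cedar or Graham endpoint
    globus transfer --recursive --label "legacy-to-cedar" \
        "$SRC:/home/$USER/data" "$DST:/project/mygroup/$USER/data"

Because Globus manages the transfer server-side, it retries failed files automatically and you can log out while the transfer runs.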

Bugaboo (Simon Fraser University)

New Defunding Date: Jan.31/2018

The defunding date for Bugaboo is now Jan.31, 2018. Bugaboo users will be emailed instructions for how to migrate their data off the system before this defunding date. To begin preparing your files, please review these General Directives for Migration. If you have questions, email support@westgrid.ca.

WestGrid appreciates Simon Fraser University's ongoing support of this system, which incurs significant costs to the institution for power and maintenance.

Bugaboo is not being continued by SFU after the defunding date, and will be decommissioned.

Storage

Bugaboo has large /home and /global/scratch filesystems. Users should migrate /home to either Cedar or Graham. The /global/scratch filesystem is not backed up and was intended to provide high-performance scratch space. However, we have noticed that some users have what looks like important or permanent data on /global/scratch. Please migrate any such data to /project on Cedar or Graham.

Migration

Bugaboo is a general-purpose system used for both serial and parallel computation. In particular, Bugaboo had a large storage system and a few large-memory nodes. All the new general-purpose systems have much larger storage systems and also provide a mix of large-memory nodes, so any of them would be suitable for typical Bugaboo users. Since Bugaboo and Cedar are in the same datacentre, it will be faster to transfer data locally, and many users may already have working relationships with local support staff.

Currently (Dec.14, 2017) we are waiting for additional capacity to be installed on Cedar so that we can move RAC allocations. Note that all users (RAC and non-RAC) have access to Cedar, so you can self-migrate at any time.

Draft Bugaboo Migration Process (subject to change)

Details and updates will be sent out to Bugaboo users in January.

The overall process will be:

  1. Install additional capacity on Cedar (Jan.31, 2018)
  2. Move allocated (RAC) /global/scratch user directories to /project/<groupname> on Cedar.
  3. Ask non-RAC users to self-migrate their data to Cedar.

There is an issue with /home: many users have already migrated to Cedar and we do not want to overwrite any existing data. Bugaboo home directories will therefore not be auto-migrated; users will be responsible for self-migration (one cautious approach is sketched below).
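One cautious way to self-migrate a home directory is to copy it into a dedicated subdirectory on the new system, so nothing existing can be clobbered. A minimal sketch with rsync (the hostname and target directory are assumptions; substitute your own):

    # Copy the legacy home directory into a separate folder on Cedar,
    # skipping any files that already exist at the destination:
    rsync -avP --ignore-existing \
        ~/ $USER@cedar.computecanada.ca:migrated-from-bugaboo/

Files can then be merged into the new home directory by hand, which keeps decisions about conflicting copies in the user's hands.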

Immediate Recommendations (TBD)

  1. All users should start migrating their applications and home directories to Cedar as soon as possible.
    • Feel free to continue running jobs on Bugaboo until the last minute, but have everything ready to go on Cedar.
  2. Non-RAC users should migrate everything, including /global/scratch on Bugaboo, to Cedar.

Owncloud and Database Services

WestGrid also provides a database service at SFU. This service does not depend on Bugaboo or its storage, and it is not scheduled for defunding.

Similarly, Owncloud is a separate service which is not scheduled for defunding.

Orcinus (University of British Columbia)

Continuation to Mar.31/2018

The University of British Columbia agreed to continue operating Orcinus for the Compute Canada user community until Mar.31, 2018. This provides a lengthy overlap period with the new systems and gives users ample time to migrate their data and codes. WestGrid very much appreciates this offer, which incurs significant costs to the institution for power and maintenance.

UBC is considering continuing Orcinus as a UBC service for UBC members. Please contact your UBC ARC support services for future plans.

Storage

Orcinus has a small attached disk system providing /home and /globalscratch. Users should migrate /home to either Cedar or Graham. The Orcinus /globalscratch is not backed up and was intended to provide high-performance scratch space. However, we have noticed that some users have what looks like important or permanent data on /globalscratch. Please migrate any such data to Cedar or Graham.

Migration

Orcinus is a general-purpose system used for both serial and parallel computation. Any of the new general-purpose systems would be suitable for typical Orcinus users, and the new large parallel system Niagara will be available for RAC 2018 (April 1, 2018).

Grex (University of Manitoba)

Continuation to Mar.31/2018

The University of Manitoba agreed to continue operating Grex for the Compute Canada user community until Mar.31, 2018. This provides a lengthy overlap period with the new systems and gives users ample time to migrate their data and codes. WestGrid very much appreciates this offer, which incurs significant costs to the institution for power and maintenance.

UManitoba is considering continuing Grex as a service for local UManitoba researchers. Please ask your local UManitoba HPC support for future plans.

Storage

Grex has small local /home and /global/scratch filesystems. Users should migrate /home to either Cedar or Graham. The /global/scratch filesystem is not backed up and was intended to provide high-performance scratch space. However, we have noticed that some users have what looks like important or permanent data on /global/scratch. Please migrate any such data to Cedar or Graham.

Migration

Grex is a general-purpose system aimed mainly at parallel computation (non-blocking InfiniBand interconnect). Grex also provided a number of large-memory nodes. The new general-purpose systems are at least as performant, although they are aimed more at smaller jobs (less than 1,000 cores). However, this is amply sufficient for most Grex users.

The new large parallel system Niagara will be available for the RAC 2018 year and would be an excellent alternative for those users with large parallel ambitions.

Parallel (University of Calgary)

Continuation to Mar.31/2018

The University of Calgary agreed to continue operating Parallel for the Compute Canada user community until Mar.31, 2018. This provides a lengthy overlap period with the new systems and gives users ample time to migrate their data and codes. WestGrid very much appreciates this offer, which incurs significant costs to the institution for power and maintenance.

Please ask your local UCalgary IT services for future plans.

Storage

Parallel has small local /home and /global/scratch filesystems. Users should migrate /home to either Cedar or Graham. The /global/scratch filesystem is not backed up and was intended to provide high-performance scratch space. However, we have noticed that some users have what looks like important or permanent data on /global/scratch. Please migrate any such data to Cedar or Graham.

Migration

Parallel is aimed at parallel computation and also provides GPU nodes. The new general-purpose systems are at least as performant, although they are aimed more at smaller jobs (less than 1,000 cores). However, this is amply sufficient for most Parallel users. Note that jobs of more than 1,000 cores will run, but they will communicate between islands, so there will be additional latency.

The new general-purpose systems also provide a large number of modern GPU nodes (NVIDIA P100s). Parallel GPU users can use any of the general-purpose systems.

The new large parallel system Niagara should be available for the RAC 2018 year (April 1) and would be an excellent alternative for users with large parallel ambitions.

Support & Other Links

Please email support@westgrid.ca for general help. See also WestGrid Institutional Support Contacts for help with systems that will be continued by their institutions.


Previously Defunded Systems Notes

System              | Site                   | Defunding Date                | Notes
Nestor and Hermes   | University of Victoria | June 1, 2017 (COMPLETE)       | UVic may keep Hermes as a local system. For details, please ask local IT support.
Breezy and Lattice  | University of Calgary  | August 31, 2017 (COMPLETE)    | Parallel and the shared storage system will continue until Mar.31, 2018. UCalgary may keep Breezy, Lattice and Parallel as local systems. For details, please ask local IT support.
Jasper and Hungabee | University of Alberta  | September 30, 2017 (COMPLETE) | UAlberta intends to keep Jasper and Hungabee running. For details, please ask local IT support.

Document History

v1.8 Dec.11, 2017 Bugaboo defunding delayed. Removed details for previously defunded systems.
v1.7 Oct.19, 2017 Bugaboo migration and defunding plans updated.
v1.6 Sep.12, 2017 2018 migration and defunding plans added.
v1.5 Aug 22, 2017 Some new systems are now available and in production. The text has been updated to reflect actuals.
v1.4 June 30, 2017 RAC 2017 allocations have been implemented on the continuing WG legacy systems.
v1.3 May 31, 2017 Schedule updates and various details for systems to be defunded in 2017.
v1.2 Mar.20, 2017 Link to software availability documentation
v1.1 Mar.9, 2017 Nestor has been extended to June 1, 2017.
v1.0 Mar.8, 2017 UofC dates revised to Aug.31. General clean-up of the docs.
v0.92 Mar.7, 2017 Revised dates. University of Alberta details updated. User access to Jasper & Hungabee extended to September 30, 2017.
v0.91 Mar 3, 2017 UVic nestor/hermes details updated and confirmed (to June 1, 2017)