General directives for migration



<translate>
<translate>
<!--T:1-->
This page is for users of Compute Canada clusters concerned about data migration. It explains issues related to transferring your data between Compute Canada facilities and its regional partners ([http://www.ace-net.ca/ ACENET], [http://www.calculquebec.ca/en/ Calcul Quebec], [http://computeontario.ca/ Compute Ontario] and [https://www.westgrid.ca/ WestGrid]).  


<!--T:2-->
If you are in any doubt about details of the following advice, contact [mailto:support@computecanada.ca support@computecanada.ca] for help.


== What to do before the migration starts? == <!--T:3-->
Make sure you know whether you are responsible for your own data migration, or whether Compute Canada staff will be migrating your data. Migration of certain legacy systems like [[Migration2016:Silo|Silo]] is being handled by staff. If you are in any doubt, write [mailto:support@computecanada.ca support@computecanada.ca].


<!--T:4-->
If you haven't used [[Globus]] before, read about it now and verify that it works on the system you are migrating from. Test any other tools you will use (like [http://www.howtogeek.com/248780/how-to-compress-and-extract-files-using-the-tar-command-on-linux/ tar], [https://www.gnu.org/software/gzip/manual/gzip.html gzip], [https://www.cyberciti.biz/faq/how-to-create-a-zip-file-in-unix/ zip]) on test data to ensure you know how they work before using them on important data.  


<!--T:5-->
Do not wait until the last minute to start your migration. Depending on how much data you have and how much load there is on the machines and network, you may be surprised at how long it will take to finish a large transfer. Expect hundreds of gigabytes to take hours to transfer, but give yourself days in case there is a problem. Expect terabytes to take days.
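
To plan around these durations, a rough estimate can be computed from the amount of data and a sustained transfer rate. The following is a minimal sketch; the function name and the 50 MB/s rate are illustrative assumptions, not measured values for any Compute Canada system. Measure the rate you actually get on a small test transfer before relying on the result.

```python
def estimate_transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Return the hours needed to move size_gb gigabytes at a sustained rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    # Decimal units: 1 GB = 1000 MB.
    seconds = (size_gb * 1000.0) / rate_mb_per_s
    return seconds / 3600.0


# Assumed sustained rate of 50 MB/s (hypothetical; real rates vary with load).
for size_gb in (300, 5000):
    hours = estimate_transfer_hours(size_gb, 50.0)
    print(f"{size_gb} GB at 50 MB/s: about {hours:.1f} hours")
```

This estimate ignores interruptions and restarts, so treat it as a lower bound and leave a generous margin.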


=== Clean up === <!--T:6-->
It is a good practice to look at your files regularly and see what can be deleted, but unfortunately many of us do not have the habit. A major data migration is a good reminder to clean up your files and directories. Moving less data will take less time, and storage space even on new systems is in great demand and should not be wasted.
* If you compile programs and keep source code, delete any intermediate files. One or more of <code>make clean</code>, <code>make realclean</code>, or <code>rm *.o</code> might be appropriate, depending on your [[Make|makefile]].
* If you find any large files named like <code>core.12345</code> and you don't know what they are, they are probably [https://en.wikipedia.org/wiki/Core_dump core dumps] and can be deleted.
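
The checks above can be sketched as a short script that lists likely clean-up candidates without deleting anything. The function name and the two patterns (files named <code>core.</code> followed by digits, and files ending in <code>.o</code>) are assumptions chosen for illustration; review the list yourself before removing any file.

```python
import os
import re

# Matches names such as core.12345 (assumed core-dump naming convention).
CORE_PATTERN = re.compile(r"^core\.\d+$")


def find_cleanup_candidates(root: str) -> list[tuple[str, int]]:
    """Return (path, size in bytes) for likely core dumps and object files,
    largest first. Nothing is deleted."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if CORE_PATTERN.match(name) or name.endswith(".o"):
                path = os.path.join(dirpath, name)
                # lstat does not follow symbolic links, so broken links are safe.
                candidates.append((path, os.lstat(path).st_size))
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

Calling <code>find_cleanup_candidates(".")</code> in a project directory returns the candidates sorted by size, which makes the largest savings easy to spot.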


=== Archive and compress === <!--T:7-->
Most file transfer programs move one file of a reasonable size more efficiently than thousands of small files of equal total size. If you have directories or directory trees containing many small files, use [[Archiving and compressing files|tar]] to combine (archive) them.
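
For readers who prefer a script to the <code>tar</code> command line, the same archiving step can be sketched with Python's standard <code>tarfile</code> module. The function name and its arguments are illustrative assumptions. Write the archive outside the directory being archived so that it does not include itself.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir: str, archive_path: str, compress: bool = False) -> int:
    """Pack source_dir into a tar archive and return the number of entries
    (files and directories) stored in it."""
    mode = "w:gz" if compress else "w"
    source = Path(source_dir).resolve()
    with tarfile.open(archive_path, mode) as archive:
        # arcname keeps only the directory name, not the full absolute path.
        archive.add(str(source), arcname=source.name)
    # Re-open the archive to confirm that it is readable and count its entries.
    with tarfile.open(archive_path, "r:*") as archive:
        return len(archive.getnames())
```

Comparing the returned count with the number of files and directories in the source is a quick sanity check before the original files are removed.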


<!--T:8-->
Large files can benefit from compression in some cases, especially text files, which can usually be compressed a great deal. Compressing a file ''only'' for the purpose of transferring it, and then decompressing it at the end of the transfer, will not necessarily save time, though. It depends on how much the file can be compressed, how long it takes to compress it, and the transfer bandwidth. The calculation is described in the "Data Compression and transfer discussion" of [https://bluewaters.ncsa.illinois.edu/data-transfer-doc this document] from the US National Center for Supercomputing Applications.
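
The trade-off can be illustrated with a simple model: compare the time to send the file as it is with the time to compress it, send the smaller file, and decompress it. This sketch is our own simplification and not necessarily the exact calculation in the linked document; the function name and all the rates in the example are hypothetical.

```python
def compression_saves_time(
    size_mb: float,
    transfer_mb_per_s: float,
    compression_ratio: float,
    compress_mb_per_s: float,
    decompress_mb_per_s: float,
) -> bool:
    """Return True if compress + transfer + decompress is faster than
    transferring the uncompressed file.

    compression_ratio is original size divided by compressed size.
    Compression and decompression speeds are measured against the
    uncompressed size.
    """
    plain_seconds = size_mb / transfer_mb_per_s
    compressed_seconds = (
        size_mb / compress_mb_per_s
        + (size_mb / compression_ratio) / transfer_mb_per_s
        + size_mb / decompress_mb_per_s
    )
    return compressed_seconds < plain_seconds


# Hypothetical numbers: a 100 GB text file that compresses 4:1.
print(compression_saves_time(100_000, 10, 4.0, 50, 200))    # slow link
print(compression_saves_time(100_000, 1000, 4.0, 50, 200))  # fast link
```

In this model compression pays off on a slow link and costs time on a fast one, which is why it is worth measuring on a sample of your own data.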


<!--T:9-->
If you decide compression is worthwhile, you can again use [[Archiving and compressing files|tar]] for this, or [https://www.gnu.org/software/gzip/manual/gzip.html gzip].


=== Avoid duplication === <!--T:10-->
Try not to move the same data twice. If you are migrating from more than one existing system to one new system and you have data duplicated on the sources, choose one and only move the duplicate data from that one.  


<!--T:11-->
Beware of files that have the same name but do not contain the same information. Ensure that you will not accidentally overwrite one file with another of the same name.
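
When two directory trees are both reachable from one machine, name collisions of this kind can be detected before anything is merged. The following is a minimal sketch under that assumption; the function names are illustrative, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root: str) -> dict[str, str]:
    """Map each file's path relative to root onto its full path."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found[os.path.relpath(full, root)] = full
    return found


def conflicting_names(root_a: str, root_b: str) -> list[str]:
    """Return relative paths present in both trees whose contents differ."""
    files_a = relative_files(root_a)
    files_b = relative_files(root_b)
    shared = sorted(set(files_a) & set(files_b))
    return [rel for rel in shared
            if file_digest(files_a[rel]) != file_digest(files_b[rel])]
```

Any path returned by <code>conflicting_names</code> should be renamed or moved to a separate directory before the two trees are combined.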


== What to do during the migration process? == <!--T:12-->
If it is supported at your source site, use [[Globus|Globus Online]] to set up your file transfer. It is the most user-friendly and efficient tool we know of for this task. Globus is designed to recover from network interruptions automatically. We recommend you select the following options at the bottom of the "Transfer files" screen:
* preserve source file modification times
* verify file integrity after transfer


<!--T:13-->
If Globus is not supported at your source site, then the advice to compress data and avoid duplication is even more important. If you must use one of [[scp]], [[sftp]], or [[rsync]], then:
* Make a schedule to migrate your data part by part. If the transfer stops for any reason you will be able to try again starting from the incomplete file, but you will not have to re-transfer files that are already complete. An organized list of files will help here.
* Check regularly to see that the transfer process has not stopped. File size is a good indicator of progress. If no files have changed size for several minutes, then something may have gone wrong. If restarting the transfer does not work, contact [mailto:support@computecanada.ca support@computecanada.ca].
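
An organized list of files can double as a progress check. The sketch below assumes you have recorded, for each file, its path relative to the top of the transfer and its size in bytes at the source; the manifest format and the function name are our own illustration, not a feature of scp, sftp, or rsync. Run at the destination, it reports which files are missing or still incomplete.

```python
import os


def files_still_to_transfer(manifest: dict[str, int], dest_root: str) -> list[str]:
    """Return the manifest entries that are absent at the destination or
    whose size there differs from the size recorded at the source."""
    pending = []
    for relative_path, expected_size in manifest.items():
        dest_path = os.path.join(dest_root, relative_path)
        if not os.path.isfile(dest_path):
            pending.append(relative_path)
        elif os.path.getsize(dest_path) != expected_size:
            pending.append(relative_path)
    return pending
```

If the returned list is the same on two checks several minutes apart and no file has grown, the transfer has probably stalled. A matching size does not prove that the contents are intact; that is what checksums are for.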


<!--T:14-->
Be patient. Even with Globus, transferring large volumes of data can be time consuming. Specific transfer speeds will vary a lot, but expect hundreds of gigabytes to take hours and terabytes to take days.


== What to do after migration? == <!--T:15-->
If you did not use Globus, or if you did but did not check "verify file integrity", make sure that the data you have transferred are not corrupted. A crude way to do this is to compare file sizes at the source with file sizes at the destination. For greater confidence you can use [http://man7.org/linux/man-pages/man1/cksum.1.html cksum] or [http://man7.org/linux/man-pages/man1/md5sum.1.html md5sum] at each end, and see that the results match. Any files with mismatching sizes or checksums should be transferred again.
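
The checksum comparison can be sketched as follows, as an alternative to running <code>md5sum</code> by hand. The idea is to build a table of checksums at each end and compare the two tables. The function names are illustrative assumptions, and moving one table to the other machine (for example as a small text file) is left to the reader.

```python
import hashlib
import os


def checksum_manifest(root: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Map each file's path relative to root onto its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def mismatched_files(source: dict[str, str], destination: dict[str, str]) -> list[str]:
    """Return source files that are missing at the destination or whose
    checksum there differs."""
    return sorted(rel for rel, digest in source.items()
                  if destination.get(rel) != digest)
```

Every path returned by <code>mismatched_files</code> should be transferred again and then re-checked.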


== Where and how to get help? == <!--T:16-->
* To learn how to use the various archiving and compression utilities, use Linux commands such as <code>man <command></code> or <code><command> --help</code>.
* Email [mailto:support@computecanada.ca support@computecanada.ca]


</translate>
</translate>