General directives for migration

{{Draft}}

== Notes for users to prepare for Data Migration ==


This page is for users of Compute Canada clusters who are affected by data migration. It is also useful to anyone who needs to transfer data between Compute Canada facilities and those of its regional partners ([http://www.ace-net.ca/ ACENET], [http://www.calculquebec.ca/en/ Calcul Quebec], [http://computeontario.ca/ Compute Ontario] and [https://www.westgrid.ca/ WestGrid]). It collects best practices and useful links related to [https://docs.computecanada.ca/wiki/Archiving_and_Compressing_Data_for_Migration data archiving] and [https://docs.computecanada.ca/wiki/Globus data transfer]. Cleaning your directories and archiving your data are part of the migration process, so the sections below give information and tips on what to do before, during and after the migration.
 
== What to do before the migration starts? ==
 
It is good practice to review your working directories to see what data you have and how it is organized. On most clusters, each Compute Canada user has two working directories: '''/home/your_user_name''' and '''/global/scratch/your_user_name''' (on some clusters the paths may be different). Go through these directories and organize and clean them as needed before you start the migration. Reduce the amount of data you need to migrate by removing unnecessary files and by using [[Archiving_and_Compressing_Data_for_Migration|archiving utilities]].
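
As a starting point for this review, the following is a minimal Python sketch that reports the number of files and the total size of each subdirectory directly under a given directory. The function names <code>survey</code> and <code>print_survey</code> are hypothetical and chosen only for this illustration; standard tools such as <code>du</code> and <code>find</code> give similar information.

```python
import os


def survey(top):
    """Return {subdirectory: (file_count, total_bytes)} for each directory directly under top."""
    report = {}
    for entry in os.scandir(top):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only subdirectories are summarized
        count, size = 0, 0
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                path = os.path.join(root, name)
                if os.path.islink(path):
                    continue  # symbolic links are not counted
                count += 1
                size += os.path.getsize(path)
        report[entry.name] = (count, size)
    return report


def print_survey(top):
    """Print the subdirectories of top, largest first."""
    rows = sorted(survey(top).items(), key=lambda kv: kv[1][1], reverse=True)
    for name, (count, size) in rows:
        print(f"{size / 1e6:10.1f} MB  {count:8d} files  {name}")
```

Calling <code>print_survey("/home/your_user_name")</code> lists the largest subdirectories first. Directories with a very high file count and a small total size are good candidates for archiving.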
 
Time spent organizing and cleaning your directories now is not wasted: it will considerably speed up the migration and make it easier to manage your data and find your files on the new system(s). These are some recommended practices:
* Look at your data and see how it is structured and stored in your directories.
* Clean your directories by removing any files you do not need.
* If you build programs in your home directory, remove all object files and keep only the source and configuration files, so that you can rebuild your applications on the new clusters.
* The goal of cleaning and archiving is to make your data easier to keep track of and the migration easier to handle. It is easier for secure copy or other file transfer programs to move one archive of a reasonable size than thousands of small files, and archives can be compressed to reduce their size. To avoid interrupting or slowing down the migration, transfer archives rather than whole directories file by file.
* Identify large data sets that can be compressed separately (this will save space and speed up the migration).
* Identify directories that contain a large number of small files, then archive and compress them. Transferring one archive file (archive.tar.gz or archive.tar.bz2) that contains 1000 small files is much faster than transferring the 1000 files individually. Compression also saves space: some files, especially those in text format, can be reduced by more than 50 % of their initial size. Other files compress poorly, but if you have hundreds or thousands of them you may still reduce your space by 5 to 10 % or more. These numbers are only indicative, because the compression ratio depends on the type of data and the format in which it is written: it is very low for images and binaries, but can exceed 60 % for text data.
* If you are migrating data from more than one place to a single destination, make sure that files and directories from different origins do not share the same name. Otherwise, the data you moved first will be overwritten by the data you move later. To avoid this, prepare a separate destination directory for each origin. This will also help you recognize where your data came from.
* Choose archive names carefully and do not give the same name to two different archives, especially if you put them in the same directory.
* Aim for clean, well-structured directories.
* Check whether your data is duplicated. There is no need to transfer the same file twice.
* Read carefully the instructions on how to prepare data archives on [https://docs.computecanada.ca/wiki/Archiving_and_Compressing_Data_for_Migration this web page].
* Practise on some test files to see how the archiving utilities and the transfer tools work.
* Read carefully the instructions on how to use [https://docs.computecanada.ca/wiki/Globus Globus] for file transfer. Other tools can be used, but Globus is the recommended way to migrate your data between Compute Canada facilities. Start by transferring a few files to find out how it works.
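
The archiving advice above can be tried on a test directory with a short script. The following is only a sketch using Python's standard <code>tarfile</code> module, not the procedure prescribed for the migration; the function name <code>archive_directory</code> and the example archive name are hypothetical. It refuses to overwrite an existing archive, in line with the advice on unique names, and returns the sizes before and after so you can see how well your data compresses.

```python
import os
import tarfile


def archive_directory(src_dir, archive_path):
    """Pack src_dir into a gzip-compressed tar archive.

    Returns (original_bytes, archive_bytes). Refuses to overwrite an
    existing archive so that two archives never share one name.
    """
    if os.path.exists(archive_path):
        raise FileExistsError(f"{archive_path} already exists; choose another name")

    # Total size of the regular files before archiving.
    original = 0
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                original += os.path.getsize(path)

    # Store paths relative to the directory name rather than the full path.
    top_name = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top_name)

    return original, os.path.getsize(archive_path)
```

For example, <code>archive_directory("results", "clusterA_results.tar.gz")</code> packs a directory named results and encodes the cluster of origin in the archive name (both names are placeholders). Write the archive outside the directory being archived, and remove the original files only after you have checked the archive.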
 
== What to do during the migration process? ==
 
* Be patient. Migrating data from one site to another can take a long time. Depending on the amount of data you have and how many users are migrating their data, the process can be spread over a few days.
* If you have a large amount of data to transfer, do not wait until the last minute to start the migration. Depending on how much data you have, the transfer can take a while to finish. This is another reason to prepare archives before migrating your data.
* Once your directories are cleaned and your data compressed, you can start the migration to the new facilities using Globus.
* Do not try to migrate all of your data at once. Depending on the number of users and the amount of data being migrated, the file system can slow down or even stop responding, and the migration will then take longer.
* Make a schedule to migrate your data part by part, so that you control what is migrated and when. If the system stops or your connection is interrupted, you only need to retry the part that failed instead of starting the whole transfer again.
* Check regularly that the transfer has not stopped after you started it. This is another reason to migrate your data part by part.
* Keep track of what has already been migrated, especially if the migration takes place over several sessions. One way is to create a directory named, for example, Data_Migrated and to move data into it once you are sure it has been transferred. The next time you continue the migration, you only need to look at the data outside this directory and do not have to check the destination directory again. You may also keep a record of the data you have moved, to see what is left to migrate and to schedule it.
* Make sure that the data you transfer is not corrupted. This can be checked in different ways: compare the file size in the source directory and in the destination directory, try to uncompress the archive, or use Linux utilities.
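
One way to combine record keeping with a corruption check is to write a checksum manifest for your archives before transferring them. The sketch below is an illustration under assumptions, not a required step: the function names and the manifest file name are hypothetical. It writes one line per archive in the "checksum, two spaces, file name" layout that the Linux <code>sha256sum -c</code> utility reads.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(archive_paths, manifest_path):
    """Write one 'digest  filename' line per archive (hypothetical helper).

    Only the base name is recorded, so the manifest can be checked from
    whatever directory the archives end up in at the destination.
    """
    with open(manifest_path, "w") as out:
        for path in archive_paths:
            out.write(f"{sha256_of(path)}  {os.path.basename(path)}\n")
```

If the manifest is transferred together with the archives, running <code>sha256sum -c</code> on it in the destination directory reports any archive whose content changed in transit. The manifest also serves as a list of what has been sent so far.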
== What to do after the migration? ==
 
 
Connect to the remote machine, check that your data is there and compare the size of each archive with the original. If a problem occurred during the migration, only part of an archive may have been copied: you will see the archive name, but its size will differ from the original. In that case the data may be corrupted and you may have to transfer that file again.
You can also try to untar your files to check that the data is not corrupted. More details on how to check that your data was not corrupted during the migration can be found on the [https://docs.computecanada.ca/wiki/Archiving_and_Compressing_Data_for_Migration page on archiving and compressing data], which covers preparing archives and using tar.
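
The two checks described above, comparing sizes and trying to read the archive, can be sketched in Python as follows. The function names are hypothetical, and the expected size is assumed to be a number you noted on the source system before the transfer. Reading every member to the end catches truncated or damaged compressed archives, but it is a weaker guarantee than comparing checksums.

```python
import os
import tarfile
import zlib


def archive_is_readable(archive_path):
    """Return True if every regular file in the archive can be read to the end."""
    try:
        with tarfile.open(archive_path, "r:*") as tar:
            for member in tar:
                if member.isfile():
                    stream = tar.extractfile(member)
                    while stream.read(1 << 20):
                        pass  # discard the data; only readability matters
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False


def check_archive(archive_path, expected_size):
    """Compare the size with the one noted at the source, then try to read the archive."""
    if os.path.getsize(archive_path) != expected_size:
        return "size mismatch"
    if not archive_is_readable(archive_path):
        return "unreadable"
    return "ok"
```

Any result other than "ok" suggests that the archive should be transferred again.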
 
 
 
 
== Where and how to get help? ==
 
 
* Use man <command> to see the options of the archiving utilities and how to use them.
* Read the information about archiving again.
* Do some tests.
* Ask the people around you for help if necessary.
* Contact the people involved in the migration process and ask for their support by email: info.migration@westgrid.ca or support@westgrid.ca