General directives for migration: Difference between revisions

→‎What to do before the migration starts?: subheaders, simplify, remove redundancy
(remove unnecessary headers)
(→‎What to do before the migration starts?: subheaders, simplify, remove redundancy)
Line 5: Line 5:
== What to do before the migration starts? ==


It is always good practice to review your working directories to see what data you have and how it is organized. On most clusters, each Compute Canada (CC) user has two working directories: '''/home/your_user_name''' and '''/global/scratch/your_user_name''' (on some clusters the paths may be different). Review the structure of your directories and organize and clean them as needed, especially before proceeding with data migration. Reduce the number of files you need to migrate by using [[Archiving_and_Compressing_Data_for_Migration|archiving utilities]] and by removing any unnecessary data.
Test any tools you will use (like [[tar]], [[gzip]], [[zip]], or [[Globus]]) on small test data so you know how they work before starting a large transfer. If you are in any doubt about details of the following advice, contact [[support]] for help.


Time spent organizing and cleaning your directories now is not wasted: it will considerably speed up your migration and let you manage your data better on the new system(s), so that you can find your files more quickly later. These are some recommended practices:
=== Clean up ===
* Look at your data and see how your directories are structured.
It is a good practice to look at your files regularly and see what can be deleted, but unfortunately many of us do not have the habit. A major data migration is a good reminder to clean up your files and directories. Moving less data will take less time, and storage space even on new systems is in great demand and should not be wasted.
* Clean your directories by removing any files you do not need.
* If you compile programs and keep source code, delete any intermediate files. One or more of <code>make clean</code>, <code>make realclean</code>, or <code>rm *.o</code> might be appropriate, depending on your [[Make|makefile]].
* If you build programs in your home directory, start by removing all object files. Keep only the source files and configuration files you need to rebuild your applications on the new clusters.
* If you find any large files named like <code>core.12345</code> and you don't know what they are, they are probably [https://en.wikipedia.org/wiki/Core_dump core dumps] and can be deleted.
* The main goal of cleaning and archiving your data is to make the data easier to keep track of and the migration easier to handle. It is easier for the secure copy protocol or other file transfer programs to move one archive file of a reasonable size than thousands of small files. Small files can be archived and compressed to reduce their total size. To avoid interrupting or slowing down the migration, transfer archives rather than whole directories of individual files.
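
The clean-up steps above can be sketched in the shell as follows. This is only an illustration under stated assumptions: the directory <code>$demo</code> and the files in it are hypothetical placeholders created for the demonstration, and the name patterns (<code>*.o</code>, <code>core.[0-9]*</code>) should be adapted to your own projects. Always review the list of candidates before deleting anything.

```shell
# Hypothetical demo tree; replace "$demo" with your own project directory.
demo=$(mktemp -d)
mkdir -p "$demo/src"
touch "$demo/src/main.c" "$demo/src/main.o" "$demo/core.12345"

# Step 1: list object files and probable core dumps, and review the list.
find "$demo" -type f \( -name '*.o' -o -name 'core.[0-9]*' \) -print

# Step 2: once the list looks right, delete the same set of files.
find "$demo" -type f \( -name '*.o' -o -name 'core.[0-9]*' \) -exec rm -- {} +

# Step 3: check how much space the directory still uses.
du -sh "$demo"
```

Running the listing step separately from the deletion step gives you a chance to spot files that match the pattern but should be kept.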
 
* Identify large files that can be compressed separately (this will save space and speed up the migration process).
=== Compress and archive ===
* Identify directories with a large number of small files and use archiving and compressing utilities on them. When transferring data from one system to another, it is much faster to transfer one archive file (<code>archive.tar.gz</code> or <code>archive.tar.bz2</code>) containing, for example, 1000 small files than to transfer the 1000 files individually. Compression also helps: some files, especially those in text format, can be reduced by more than 50 % of their initial size. Other files compress only slightly, but if you have hundreds or thousands of them you may still reduce your space by 5 to 10 % or more. These numbers are only indicative, because the compression ratio depends on the type of data and the format in which it is written. For images and binaries, for example, the gain is usually very small, whereas for text data it can be more than 60 %.
Most file transfer programs move one file of a reasonable size more efficiently than thousands of small files of equal total size. If you have directories or directory trees containing many small files, use [[tar]] or [[zip]] to combine (archive) and compress them.
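
As a minimal sketch of combining many small files into one compressed archive with <code>tar</code>: the directory <code>$work/results</code>, the file names and the archive name below are hypothetical and exist only for this demonstration.

```shell
# Hypothetical demo data: 1000 small text files in one directory.
work=$(mktemp -d)
mkdir -p "$work/results"
i=1
while [ "$i" -le 1000 ]; do
    echo "sample line $i" > "$work/results/file_$i.txt"
    i=$((i + 1))
done

# -c create, -z compress with gzip, -f archive file name.
# -C changes into "$work" first so paths inside the archive are relative.
tar -czf "$work/results.tar.gz" -C "$work" results

# Verify the archive by listing its contents and counting the .txt entries.
archived=$(tar -tzf "$work/results.tar.gz" | grep -c '\.txt$')
echo "files in archive: $archived"

# Compare the size of the directory with the size of the archive.
du -sh "$work/results" "$work/results.tar.gz"
```

Checking the file count in the archive against the original directory before deleting or transferring anything is a cheap way to catch an incomplete archive.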
* If you are migrating data from more than one place to a single destination, plan how to avoid name clashes. If two files or directories share a name, the data you moved first will be overwritten by the data you moved last. To avoid this, consider preparing a separate destination directory for each source. This will also help you recognize the origin of your data later.
 
* Choose archive names carefully, and do not give the same name to two different archives, especially if you put them in the same directory.
Large files can also benefit from compression in many cases, especially text files or numeric data stored as human-readable text. You can again use [[tar]] for this, or [[gzip]] or [[zip]].
* Try to keep your directories clean and well structured.
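
For a single large text file, a sketch with <code>gzip</code> might look like the following. The file <code>data.txt</code> is a hypothetical stand-in for numeric data stored as human-readable text, and the actual saving depends entirely on your data.

```shell
# Hypothetical demo file: numeric data written as plain text.
big=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i }' > "$big/data.txt"
before=$(($(wc -c < "$big/data.txt")))

# Write the compressed copy to a new file and keep the original for now.
gzip -c "$big/data.txt" > "$big/data.txt.gz"
after=$(($(wc -c < "$big/data.txt.gz")))

# Test the integrity of the compressed file.
gzip -t "$big/data.txt.gz"

# Confirm that decompressing gives back exactly the original bytes.
gzip -dc "$big/data.txt.gz" | cmp - "$big/data.txt" && echo "round trip OK"

echo "bytes before: $before, bytes after: $after"
```

Keeping the original until the compressed copy has been tested means a failed or interrupted compression cannot cost you data.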
 
* Check if your data are duplicated. It is not necessary to transfer the same file twice.
=== Avoid duplication ===
* Read carefully the instructions on how to prepare data archives on [https://docs.computecanada.ca/wiki/Archiving_and_Compressing_Data_for_Migration this web page].
Try not to move the same data twice. If you are migrating from more than one existing system to one new system and you have data duplicated on the two sources, choose one and only move the duplicate data from that one.  
* Experiment with some test files to learn how the archiving utilities and the transfer tools work.
 
* Read carefully the instructions on how to use [https://docs.computecanada.ca/wiki/Globus Globus] for file transfer. Other tools can be used, but Globus is the recommended way to migrate your data between Compute Canada facilities. You can start by transferring a few files to find out how it works.
Beware of files with duplicate names, but which do not contain duplicate information. Ensure that you will not accidentally over-write one file with another of the same name.
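
One possible way to check for both problems before a transfer is sketched below. It assumes the GNU coreutils <code>sha256sum</code> command is available, and the directories <code>clusterA</code> and <code>clusterB</code> and their files are hypothetical stand-ins for data copied from two source systems.

```shell
# Hypothetical demo data standing in for two source systems.
src=$(mktemp -d)
mkdir -p "$src/clusterA" "$src/clusterB"
echo "same content" > "$src/clusterA/run1.dat"
echo "same content" > "$src/clusterB/run1_copy.dat"
echo "version A"    > "$src/clusterA/notes.txt"
echo "version B"    > "$src/clusterB/notes.txt"

# Duplicate content: checksum every file, sort so that equal checksums
# are adjacent, then print each group of files sharing a checksum.
dups=$(find "$src" -type f -exec sha256sum {} + | sort | awk '
    $1 == prev_hash { if (!shown) print prev_line; print; shown = 1 }
    $1 != prev_hash { shown = 0 }
    { prev_hash = $1; prev_line = $0 }
')
echo "files with identical content:"
echo "$dups"

# Duplicate names: strip the directory part and report any file name
# that occurs more than once, whatever the file contains.
collisions=$(find "$src" -type f | sed 's|.*/||' | sort | uniq -d)
echo "file names that occur more than once:"
echo "$collisions"
```

Files in the first list need to be moved only once. Files in the second list need different names or separate destination directories so that one does not over-write the other.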


== What to do during the migration process? ==