AI and Machine Learning

<!--T:35-->
* If your dataset is around 10 GB or below, it can probably fit in memory, depending on how much memory your job has. Load it into memory once at the start of the job; you should not read the data from disk during your machine learning task.
* If your dataset is around 100 GB or below, it can fit in the local storage of the compute node; please transfer it there at the beginning of the job. This storage is orders of magnitude faster and more reliable than shared storage (home, project, scratch). A temporary directory is available for each job at $SLURM_TMPDIR. An example is given in [[Tutoriel_Apprentissage_machine/en|our tutorial]]. A caveat of local node storage is that another job might already have filled it, leaving you no space (we are currently studying this problem). However, you might also get lucky and have a whole terabyte at your disposal.
* If your dataset is larger, you may have to leave it in the shared storage. You can leave your datasets permanently in your project space. Scratch space can be faster, but it is not for permanent storage. Also, all shared storage (home, project, scratch) is meant for storing and reading large chunks of data at low frequency (intervals of 1 second or more).
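
The second case above (copying the dataset to node-local storage at the start of the job) can be sketched in Python as follows. This is only an illustration, not the method from our tutorial: the helper names <code>dir_size_bytes</code> and <code>stage_dataset</code> are hypothetical, and the fallback to shared storage when the local disk lacks space is an assumption about how you might want to handle the caveat mentioned above. It requires Python 3.8 or later.

```python
import os
import shutil
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Return the total size of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def stage_dataset(source: str, env_var: str = "SLURM_TMPDIR") -> Path:
    """Copy a dataset directory to node-local storage if it fits.

    Hypothetical helper: returns the path of the local copy, or the
    original path when the environment variable is unset or the local
    disk does not have enough free space.
    """
    src = Path(source)
    local_root = os.environ.get(env_var)
    if local_root is None:
        # Not running inside a job: keep reading from shared storage.
        return src

    needed = dir_size_bytes(src)
    free = shutil.disk_usage(local_root).free
    if needed > free:
        # Another job may have filled the local disk (assumed fallback).
        return src

    dest = Path(local_root) / src.name
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

At the top of a training script you might call <code>data_dir = stage_dataset("/path/to/your/dataset")</code> (a placeholder path) and then point your data loader at <code>data_dir</code>, so that all reads during training hit local storage whenever the copy succeeded.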

