AI and Machine Learning

<!--T:24-->
<b>Switching to virtualenv is easy in most cases. Just install all the same packages, except CUDA, CuDNN and other low-level libraries, which are already installed on our clusters.</b>
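A minimal sketch of what this can look like, assuming an illustrative Python module version, environment path and package list (adjust these to your own setup):

<pre>
# Load one of the Python modules provided on the cluster (version is an example)
module load python/3.10

# Create and activate a virtual environment in your home directory
virtualenv --no-download $HOME/mlenv
source $HOME/mlenv/bin/activate

# Install your usual packages; CUDA, cuDNN and other low-level libraries
# are already installed on the clusters, so they are not listed here.
# --no-index installs from the wheels provided on the cluster.
pip install --no-index --upgrade pip
pip install --no-index torch torchvision
</pre>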


== Useful information about software packages == <!--T:5-->
* If your dataset is around 10 GB or less, it can probably fit in memory, depending on how much memory your job has. You should not read data from disk during your machine learning tasks.
* If your dataset is around 100 GB or less, it can fit in the local storage of the compute node; please transfer it there at the beginning of the job (see the sketch after this list). This storage is orders of magnitude faster and more reliable than shared storage (home, project, scratch). A temporary directory is available for each job at $SLURM_TMPDIR. An example is given in [[Tutoriel_Apprentissage_machine/en|our tutorial]]. A caveat of local node storage is that a job from another user might fill it completely, leaving you no space (we are currently studying this problem). However, you might also get lucky and have a whole terabyte at your disposal.
* If your dataset is larger, you may have to leave it in shared storage. You can keep your datasets permanently in your project space. Scratch space can be faster, but it is not for permanent storage. Also, all shared storage (home, project, scratch) is meant for reading and writing at low frequencies (e.g. one large chunk every 10 seconds, rather than 10 small chunks every second).
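As mentioned above, a job can stage its dataset onto node-local storage before training. A minimal sketch of such a job script, assuming a hypothetical archive my_dataset.tar in $SCRATCH and a hypothetical train.py (the resource requests are examples):

<pre>
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=32G
#SBATCH --time=03:00:00

# Copy the dataset, archived as a single large file, to the node-local
# storage; one big transfer is far faster than many small ones
cp $SCRATCH/my_dataset.tar $SLURM_TMPDIR/
tar -xf $SLURM_TMPDIR/my_dataset.tar -C $SLURM_TMPDIR

# Train while reading data from the fast local storage
python train.py --data $SLURM_TMPDIR/my_dataset
</pre>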


=== Datasets containing lots of small files (e.g. image datasets) === <!--T:11-->


<!--T:28-->
[[Weights & Biases (wandb)]] and [[Comet.ml]] can help you get the most out of your compute allocation.
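As a minimal sketch, one common pattern for wandb on clusters whose compute nodes may lack Internet access is to log offline during the job and sync afterwards (train.py is a hypothetical script):

<pre>
# Run training with wandb in offline mode; metrics are written to
# local files under ./wandb instead of being sent over the network
WANDB_MODE=offline python train.py

# Later, from a node with network access, upload the logged runs
wandb sync ./wandb/offline-run-*
</pre>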

