MAFFT
This site replaces the former Compute Canada documentation site, and is now being managed by the Digital Research Alliance of Canada. Ce site remplace l'ancien site de documentation de Calcul Canada et est maintenant géré par l'Alliance de recherche numérique du Canada. |
MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
Single node[edit]
MAFFT can benefit from multiple cores on a single node. For more information: https://mafft.cbrc.jp/alignment/software/multithreading.html
Note: The MAFFT_TMPDIR is set to $SLURM_TMPDIR/maffttmp when you load the module.
#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=0
module load gcc/9.3.0 mafft
mafft --globalpair --thread $SLURM_CPUS_PER_TASK input > output
Multiple nodes (MPI)[edit]
MAFFT can use MPI to align a large number of sequences: https://mafft.cbrc.jp/alignment/software/mpi.html
Note: MAFFT_TMPDIR is set to $SCRATCH/maffttmp when you load the module. If you change this temporary directory, it must be shared by all hosts.
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=12G
module load gcc/9.3.0 mafft-mpi
srun mafft --mpi --large --globalpair --thread $SLURM_CPUS_PER_TASK input > output