- 1 Introduction
- 2 Licensing
- 3 Documentation
- 4 Configuring your own license file
- 5 Cluster Batch Job Submission
- 6 Site Specific Usage
- 7 Additive Manufacturing
Compute Canada is a hosting provider for ANSYS. This means that we have ANSYS software installed on our clusters, but we do not provide a generic license accessible to everyone. However, many institutions, faculties, and departments already have licenses that can be used on our clusters. Once the legal aspects of licensing are worked out, some technical aspects remain: the license server on your end must be reachable by our compute nodes, which requires our technical team to get in touch with the technical people managing your license software. In some cases, this has already been done. You should then be able to load the ANSYS modules, which should find the license automatically. If this is not the case, please contact our Technical support so that we can arrange this for you.
Available modules are: fluent/16.1, ansys/16.2.3, ansys/17.2, ansys/18.1, ansys/18.2, ansys/19.1, ansys/19.2, ansys/2019R2, ansys/2019R3.
The full ANSYS documentation (for the latest version) can be accessed by following these steps:
- connect to gra-vdi.computecanada.ca with tigervnc as described in VDI Nodes
- open a terminal window and start workbench:
- module load CcEnv StdEnv ansys
- in the upper pulldown menu click the sequence:
- Help -> ANSYS Workbench Help
- once the ANSYS Help page appears click:
Configuring your own license file
Our module for ANSYS is designed to look for license information in a few places. One of those places is your home folder. If you have your own license server, write the information needed to access it into the file $HOME/.licenses/ansys.lic using the following format:
setenv("ANSYSLMD_LICENSE_FILE", "<port>@<hostname>")
setenv("ANSYSLI_SERVERS", "<port>@<hostname>")
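As a minimal sketch, the file could be generated from the shell as shown here. The function name write_ansys_lic and the hostname license.example.org are placeholders invented for illustration. The sketch also assumes that ANSYSLMD_LICENSE_FILE takes the flex port and ANSYSLI_SERVERS takes the licensing interconnect port; confirm both values with your license server administrator.

```bash
#!/bin/bash
# Hypothetical helper: write an ansys.lic file in the two-line setenv format.
# Arguments: <directory> <hostname> <flex port> <interconnect port>
write_ansys_lic() {
    dir=$1; host=$2; flex_port=$3; li_port=$4
    mkdir -p "$dir" || return 1
    cat > "$dir/ansys.lic" <<EOF
setenv("ANSYSLMD_LICENSE_FILE", "${flex_port}@${host}")
setenv("ANSYSLI_SERVERS", "${li_port}@${host}")
EOF
}

# Example (placeholder hostname and ports, not a real server):
# write_ansys_lic "$HOME/.licenses" license.example.org 1055 2325
```

Passing the target directory as an argument makes it easy to write the file somewhere temporary first and inspect it before copying it into ~/.licenses.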
Any researcher may use the non-free CMC license server or the free SHARCNET license server by simply configuring their ~/.licenses/ansys.lic file with the cluster-specific settings shown in the following table. Researchers who purchase a CMC license subscription must also send their Compute Canada username to <firstname.lastname@example.org>; otherwise license checkouts will fail.
In some situations you may need to ensure your ANSYS configuration on the Compute Canada clusters gives priority to the right kind of license, for example to choose a research license instead of a teaching license or vice versa. You may configure your account using the anslic_admin command as explained in this section. This must be done for each ansys module version you plan to use. A custom file named license.preferences.xml will be placed under the directory $HOME/.ansys/v201/licensing/, assuming you are using the ansys/2020R1 module. ANSYS license servers (such as those with a Multiphysics Campus Solution license) provide both Research and Teaching license types. When an ANSYS job starts it will (by default) be assigned an unlimited Academic Research license (aa_r or aa_r_cfd). If a Research license is not available at runtime, a Teaching license (aa_t_a) with the following limits will be assigned instead:
o Mechanical solver limit: 32,000 nodes or elements
o CFD solver limit: 512,000 nodes, cells, elements
o Geometry model limit: 50 bodies and 300 faces
Since Research licenses are typically in short supply and very expensive, researchers are encouraged to use Teaching licenses whenever possible. For example, in the case of the SHARCNET Ansys license, there are 250 limited Teaching licenses but only 25 unlimited Research licenses. To configure ANSYS to only use Teaching licenses perform the following steps on each cluster where you run jobs:
- connect to a login node with X forwarding (ssh -Y, PuTTY, MobaXTerm) or TigerVNC
- load an Ansys version such as:
module load ansys/2020R1
- click "Set License Preferences for User" button
- tick the module version you will be using, click OK
- tick Use Academic Licenses, click the Solver tab
- first: highlight "ANSYS Academic Research Mechanical and CFD"
- press: the small down arrow to specify Don't Use =
- second: highlight "ANSYS Academic Teaching Mechanical and CFD"
- press: the small down arrow to specify Use =
- click OK, File -> exit
If an ANSYS job starts with a Teaching license but exceeds one of the above limits, an error message will be written to its slurm output file before the job immediately terminates, for example:
[gra-login1:~/projects/path/to/my/ansys/jobs] cat -n slurm-38493219.out | grep -A 20 "Error at host"
131 Error at host: This is an educational executable, and can only be used
132 with cases containing less than 512000 cells.
133 Please exit this fluent session and start another session to continue.
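To check several jobs at once, something along the following lines may help. It is only a sketch: the function name hit_teaching_limit is made up, and the search string is taken from the Fluent message shown above, so other ANSYS products may word the error differently.

```bash
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if a Slurm output file contains the
# educational-limit message quoted in the Fluent example above.
hit_teaching_limit() {
    grep -q "This is an educational executable" "$1"
}

# Report every Slurm output file in the current directory that hit the limit.
for f in slurm-*.out; do
    [ -e "$f" ] || continue          # skip when no file matches the pattern
    if hit_teaching_limit "$f"; then
        echo "$f: exceeded a Teaching license limit"
    fi
done
```

Jobs flagged this way need a Research license (or a smaller model) before they are resubmitted.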
Local License Servers
Before a local institutional ANSYS license server can be reached from Compute Canada systems, firewall configuration changes will need to be made on both the institution side and the Compute Canada side. To start this process, contact your local ANSYS license server administrator and obtain the following information: 1) the fully qualified hostname of the local ANSYS license server, 2) the ANSYS flex port (commonly 1055), 3) the ANSYS licensing interconnect port (commonly 2325) and 4) the ANSYS static vendor port (site specific). Ensure the administrator is willing to open the firewall on these three ports to accept license checkout requests from your ANSYS jobs running on Compute Canada systems. Next, open a ticket with <email@example.com>, send us the four pieces of information, and indicate which system(s) you want to run ANSYS on, for example Cedar, Beluga, Graham/Gra-vdi or Niagara.
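Once both sides report that the firewall changes are complete, a quick connectivity test from the cluster can confirm that the ports answer. The sketch below is an illustration only: check_port is a made-up function name, license.example.org is a placeholder hostname, and the test relies on bash's /dev/tcp pseudo-device together with the coreutils timeout command. An open port does not by itself prove that license checkouts will succeed.

```bash
#!/bin/bash
# Hypothetical helper: report whether a TCP port on a host accepts connections.
# Uses bash's /dev/tcp pseudo-device and the coreutils timeout command.
check_port() {
    if [ "$#" -ne 2 ]; then
        echo "usage: check_port <hostname> <port>" >&2
        return 2
    fi
    if timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 NOT reachable"
        return 1
    fi
}

# Example (placeholder hostname; 1055 and 2325 are only the common defaults,
# and your site-specific vendor port should be added to the list):
# for port in 1055 2325; do check_port license.example.org "$port"; done
```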
Cluster Batch Job Submission
The ANSYS software suite comes with multiple implementations of MPI to support parallel computation. Unfortunately, none of them supports our Slurm scheduler. For this reason, we need special instructions for each ANSYS package on how to start a parallel job. In the sections below, we give examples of submission scripts for some of the packages. If one is not covered and you want us to investigate and help you start it, please contact our Technical support.
Typically you would use the following procedure for running Fluent on one of the Compute Canada clusters:
- Prepare your Fluent job using Fluent from the "ANSYS Workbench" on your Desktop machine up to the point where you would run the calculation.
- Export the "case" file "File > Export > Case..." or find the folder where Fluent saves your project's files. The "case" file will often have a name like FFF-1.cas.gz.
- If you already have data from a previous calculation that you want to continue, export a "data" file as well (File > Export > Data...) or find it in the same project folder (FFF-1.dat.gz).
- Transfer the "case" file (and if needed the "data" file) to a directory on the project or scratch filesystem on the cluster. When exporting, you should save the file(s) under a more instructive name than FFF-1.* or rename them when uploading them.
- Now you need to create a "journal" file. Its purpose is to load the case file (and optionally the data file), run the solver and finally write the results. See examples below and remember to adjust the filenames and desired number of iterations.
- If jobs frequently fail to start due to license shortages (and manual resubmission of failed jobs is not convenient), consider modifying your slurm script to requeue your job (up to 4 times) as shown in the following "Fluent Slurm Script (by node + requeue)" tab. Be aware that doing this will also requeue simulations that fail due to non-license related issues (such as divergence), resulting in lost compute time. Therefore it is strongly recommended to monitor and inspect each slurm output file to confirm each requeue attempt is license related. When it is determined that a job was requeued due to a simulation issue, immediately kill the job progression manually with scancel jobid and correct the problem.
- After running the job you can download the "data" file and import it back into Fluent with File > import > Data....
#!/bin/bash
#SBATCH --account=def-group   # Specify account
#SBATCH --time=00-06:00       # Specify time limit dd-hh:mm
#SBATCH --ntasks=16           # Specify total number cores
#SBATCH --mem-per-cpu=4G      # Specify memory per core
#SBATCH --cpus-per-task=1     # Do not change

module load ansys/2020R1

slurm_hl2hl.py --format ANSYS-FLUENT > machinefile
NCORES=$((SLURM_NTASKS * SLURM_CPUS_PER_TASK))

fluent 3d -t $NCORES -cnf=machinefile -mpi=intel -affinity=0 -g -i sample.jou
#!/bin/bash
#SBATCH --account=def-group   # Specify account
#SBATCH --time=00-06:00       # Specify time limit dd-hh:mm
#SBATCH --nodes=1             # Specify number compute nodes (1 or more)
#SBATCH --cpus-per-task=32    # Specify number cores per node (graham 32 or 44, cedar 32 or 48, beluga 40)
#SBATCH --mem=0               # Do not change (allocates all memory per compute node)
#SBATCH --ntasks-per-node=1   # Do not change

module load ansys/2020R1

slurm_hl2hl.py --format ANSYS-FLUENT > machinefile
NCORES=$((SLURM_NTASKS * SLURM_CPUS_PER_TASK))

fluent 3d -t $NCORES -cnf=machinefile -mpi=intel -affinity=0 -g -i sample.jou
#!/bin/bash
#SBATCH --account=def-group   # Specify account
#SBATCH --time=00-06:00       # Specify time limit dd-hh:mm
#SBATCH --nodes=1             # Specify number compute nodes (1 or more)
#SBATCH --cpus-per-task=32    # Specify number cores per node (graham 32 or 44, cedar 32 or 48, beluga 40)
#SBATCH --array=1-4%1         # Specify number requeue attempts (2 or more)
#SBATCH --mem=0               # Do not change (allocates all memory per compute node)
#SBATCH --ntasks-per-node=1   # Do not change

module load ansys/2020R1

slurm_hl2hl.py --format ANSYS-FLUENT > machinefile
NCORES=$((SLURM_NTASKS * SLURM_CPUS_PER_TASK))

fluent 3d -t $NCORES -cnf=machinefile -mpi=intel -affinity=0 -g -i sample.jou
if [ $? -eq 0 ]; then
    echo "Job completed successfully! Exiting now."
    scancel $SLURM_ARRAY_JOB_ID
else
    echo "Job failed due to license or simulation issue!"
    if [ $SLURM_ARRAY_TASK_ID -lt $SLURM_ARRAY_TASK_COUNT ]; then
        echo "Resubmitting now ..."
    else
        echo "Exiting now."
    fi
fi
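All of the Fluent scripts above derive the process count passed to fluent -t from the same product of two Slurm variables. The sketch below reproduces that arithmetic outside a job so the two request styles can be compared; the function name fluent_ncores is invented for illustration, and the values passed in stand for SLURM_NTASKS and SLURM_CPUS_PER_TASK.

```bash
#!/bin/bash
# Hypothetical helper: reproduce the NCORES arithmetic used in the scripts above.
# Arguments: <ntasks> <cpus per task>
fluent_ncores() {
    echo $(( $1 * $2 ))
}

fluent_ncores 16 1    # per-core request: --ntasks=16, --cpus-per-task=1  -> prints 16
fluent_ncores 1 32    # whole-node request: 1 node x 1 task, 32 cpus/task -> prints 32
fluent_ncores 2 32    # whole-node request on 2 nodes                     -> prints 64
```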
Fluent Journal files can include basically any command from Fluent's Text-User-Interface (TUI); commands can be used to change simulation parameters like temperature, pressure and flow speed. With this you can run a series of simulations under different conditions with a single case file, by only changing the parameters in the Journal file. Refer to the Fluent User's Guide for more information and a list of all commands that can be used.
; SAMPLE FLUENT JOURNAL FILE - STEADY SIMULATION
; ----------------------------------------------
; lines beginning with a semicolon are comments

; Read input file (FFF-in.cas):
/file/read-case FFF-in

; Run the solver for this many iterations:
/solve/iterate 1000

; Overwrite output files by default:
/file/confirm-overwrite n

; Write final output file (FFF-out.dat):
/file/write-data FFF-out

; Write simulation report to file (optional):
/report/summary y "My_Simulation_Report.txt"

; Exit fluent:
exit
; SAMPLE FLUENT JOURNAL FILE - STEADY SIMULATION
; ----------------------------------------------
; lines beginning with a semicolon are comments

; Read compressed input files (FFF-in.cas.gz & FFF-in.dat.gz):
/file/read-case-data FFF-in.gz

; Write a compressed data file every 100 iterations:
/file/auto-save/data-frequency 100

; Retain data files from 5 most recent iterations:
/file/auto-save/retain-most-recent-files y

; Write data files to output sub-directory (appends iteration)
/file/auto-save/root-name output/FFF-out.gz

; Run the solver for this many iterations:
/solve/iterate 1000

; Write final compressed output files (FFF-out.cas.gz & FFF-out.dat.gz):
/file/write-case-data FFF-out.gz

; Write simulation report to file (optional):
/report/summary y "My_Simulation_Report.txt"

; Exit fluent:
exit
; SAMPLE FLUENT JOURNAL FILE - TRANSIENT SIMULATION
; -------------------------------------------------
; lines beginning with a semicolon are comments

; Read only the input case file:
/file/read-case "FFF-transient-inp.gz"

; For continuation (restart) read in both case and data input files:
;/file/read-case-data "FFF-transient-inp.gz"

; Write a data (and maybe case) file every 100 time steps:
/file/auto-save/data-frequency 100
/file/auto-save/case-frequency if-case-is-modified

; Retain only the most recent 5 data (and maybe case) files:
; [saves disk space if only a recent continuation file is needed]
/file/auto-save/retain-most-recent-files y

; Write to output sub-directory (appends flowtime and timestep)
/file/auto-save/root-name output/FFF-transient-out-%10.6f.gz

; ##### settings for Transient simulation : ######

; Set the magnitude of the (physical) time step (delta-t)
/solve/set/time-step 0.0001

; Set the maximum number of iterations per time step:
/solve/set/max-iterations-per-time-step 20

; Set the number of iterations for which convergence monitors are reported:
/solve/set/reporting-interval 1

; ##### End of settings for Transient simulation. ######

; Initialize using the hybrid initialization method:
/solve/initialize/hyb-initialization

; Perform unsteady iterations for a specified number of time steps:
/solve/dual-time-iterate 1000

; Write final case and data output files:
/file/write-case-data "FFF-transient-out.gz"

; Write simulation report to file (optional):
/report/summary y "Report_Transient_Simulation.txt"

; Exit fluent:
exit
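Because a journal is plain text, a series of runs from one case file can be prepared by generating the journals from a small shell template. The sketch below is an illustration rather than a recommended workflow: make_journal and the run-N.jou file names are made up, and it varies only the iteration count because it reuses only the TUI commands from the first steady-state sample above. Commands for changing physical parameters should be taken from the Fluent User's Guide.

```bash
#!/bin/bash
# Hypothetical helper: print a steady-state journal for a given case name and
# iteration count, using only TUI commands from the first sample above.
make_journal() {
    case_name=$1; iterations=$2
    cat <<EOF
; Generated journal: ${case_name}, ${iterations} iterations
/file/read-case ${case_name}
/solve/iterate ${iterations}
/file/confirm-overwrite n
/file/write-data ${case_name}-${iterations}-out
exit
EOF
}

# Example: one journal per iteration count (file names are placeholders).
# for n in 500 1000 2000; do make_journal FFF-in "$n" > "run-$n.jou"; done
```

Each generated file can then be passed to fluent with -i in place of sample.jou.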
#!/bin/bash
#SBATCH --account=def-group   # Specify account name
#SBATCH --time=00-06:00       # Specify time limit dd-hh:mm
#SBATCH --nodes=1             # Specify number compute nodes (1 or more)
#SBATCH --cpus-per-task=32    # Specify number cores per node (graham 32 or 44, cedar 32 or 48, beluga 40)
#SBATCH --mem=0               # Do not change (allocates all memory per compute node)
#SBATCH --ntasks-per-node=1   # Do not change

module load ansys/2020R1

NNODES=$(slurm_hl2hl.py --format ANSYS-CFX)

cfx5solve -def YOURFILE.def -start-method "Intel MPI Distributed Parallel" -par-dist $NNODES <other options>
Note that you may get the following errors in your output file: /etc/tmi.conf: No such file or directory. They do not seem to affect the computation.
Site Specific Usage
On May 31, 2020 the SHARCNET license was upgraded from a CFD-only (Research CFD) license to an MCS (Multiphysics Campus Solution) license including the following ANSYS Academic Research products: HF, EM, Electronics HPC, Mechanical and CFD. The SHARCNET ANSYS license supports a total of 275 running jobs, consisting of 25 aa_r unlimited simulation size Research tasks and 250 aa_t_a limited simulation size Teaching tasks. There is no limit to the number of jobs a researcher can run using the Teaching tasks. There is, however, a 2 job limit when using the Research tasks. A total of 384 aa_r_hpc cores are available to all running ANSYS jobs, with a limit of 64 cores per user. Researchers are asked to use Teaching tasks whenever possible, as described in the License Preferences section above. This license has been renewed for over 10 years and there is no reason to expect it will not be renewed again in coming years.
The SHARCNET license can be used by any Compute Canada user on any Compute Canada system for the purpose of Publishable Academic Research. The license is made available on a first-come, first-served basis. Should a large number of ANSYS jobs attempt to start on a given day, some jobs may fail to start due to insufficient tokens being available; such jobs will need to be resubmitted. If guaranteed (dedicated) token access is required for your research to progress, open a ticket and request a quote for the quantity of hpc tokens needed. Up to 2 aa_r tokens will be reserved from the main license per group to start jobs reliably if the group purchases their own dedicated aa_r_hpc tokens (128 or more). Should you need to reliably start/run more than 2 jobs, you would also want to purchase a block of 5 aa_r or the much cheaper aa_r_cfd tokens. The quote will be obtained from Simutech to ensure compatibility with the existing license (customer #446422). Prices would be at cost plus applicable taxes, and the actual purchase would be done directly by the PI with Simutech after that point. Neither LS-DYNA nor Lumerical is included with the SHARCNET ANSYS license. Tokens for these products may be added to the SHARCNET server for dedicated use by similarly opening a ticket and requesting a quote.
License Server File
To use the Sharcnet ansys license configure your ansys.lic file as follows:
[gra-login1:~/.licenses] cat ansys.lic
setenv("ANSYSLMD_LICENSE_FILE", "firstname.lastname@example.org")
setenv("ANSYSLI_SERVERS", "email@example.com")
Query License Server
Check how many licenses your username currently has in use from all features:
ssh graham.computecanada.ca
module load ansys
lmutil lmstat -c $ANSYSLMD_LICENSE_FILE -a | grep "Users of\|$USER"
Check how many jobs are running and the total cores currently in use from the global pool:
ssh graham.computecanada.ca
module load ansys
lmutil lmstat -c $ANSYSLMD_LICENSE_FILE -a | grep "aa_r:\|aa_r_hpc:"
where lines beginning with ...
o your username (if any) represent the licenses currently in use by your running jobs
o Users of aa_r: represents the total number of ANSYS Academic Research tasks in use by all users (maximum 25 jobs running)
o Users of aa_t_a: represents the total number of ANSYS Academic Teaching tasks in use by all users (maximum 250 jobs running)
o Users of aa_r_hpc: represents the total number of ANSYS hpc licenses in use by all users (maximum 384 hpc cores = 640 total - 256 reserved). Please note the total number of aa_r_hpc licenses required to run a parallel job is calculated by subtracting 16 from the total requested in your slurm script. Therefore lmutil will report (for example) that only 16 aa_r_hpc are in use for a running 32 core job.
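The aa_r_hpc arithmetic described in the last item can be sketched as a small shell function. The name hpc_licenses_needed is hypothetical, and returning zero for jobs of 16 cores or fewer is an assumption, since the text only gives the subtraction rule and the 32 core example.

```bash
#!/bin/bash
# Hypothetical helper: estimate the aa_r_hpc licenses a job will check out,
# following the "requested cores minus 16" rule described above.
# Assumption: jobs requesting 16 cores or fewer need no aa_r_hpc licenses.
hpc_licenses_needed() {
    cores=$1
    if [ "$cores" -gt 16 ]; then
        echo $(( cores - 16 ))
    else
        echo 0
    fi
}

hpc_licenses_needed 32   # the 32 core job from the text -> prints 16
```

Comparing this estimate with the free aa_r_hpc count reported by lmstat gives a rough idea of whether a job is likely to start.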
If you discover any licenses unexpectedly in use by your username (usually due to ansys not exiting cleanly on gra-vdi), connect to the node where it is running, open a terminal window and run the following command to terminate the rogue processes:
pkill -9 -e -u $USER -f "ansys"
after which your licenses should be freed. Note that gra-vdi consists of two nodes (gra-vdi3 and gra-vdi4) which researchers are randomly placed on when connecting to gra-vdi.computecanada.ca with tigervnc. Therefore it is necessary to specify the full hostname (gra-vdi3.sharcnet.ca or gra-vdi4.sharcnet.ca) when connecting with tigervnc to ensure you log in to the correct node before running pkill.
1) Using global Compute Canada cluster modules:
- Connect to gra-vdi.computecanada.ca with TigerVNC
module load CcEnv StdEnv/2016.4
module load ansys/2020R1
2) Using local gra-vdi modules (may provide better graphics performance):
- Connect to gra-vdi.computecanada.ca with TigerVNC
module load SnEnv
module load ansys/2019R3
- Press y then enter to accept the two conditions.
- Press enter to continue using the SHARCNET license by default (or, in the case of runwb2, your ansys.lic file if one is present)
where the cfx5 command allows starting the following components:
1) CFX-Launcher (cfx5launcher)
2) CFX-Pre (start cfx5pre directly)
3) CFD-Post (start cfx5post directly)
4) CFX-Solver (start cfx5solve directly)
To get started, configure your ~/.licenses/ansys.lic file to point to a license server that has a valid ANSYS Mechanical License. This must be done on all systems where you plan to run the software.
To enable ANSYS Additive Manufacturing in your project do the following 3 steps:
- connect to gra-vdi.computecanada.ca with TigerVNC
- module load CcEnv StdEnv ansys/2019R3
- cd to the directory where your test.wbpj file is located
On a cluster:
- connect to a cluster compute node with TigerVNC
- module load ansys/2019R3
- cd to the directory where your test.wbpj file is located
- click Extensions -> Install Extension
- specify the following /path/to/AdditiveWizard.wbex then click Open: /cvmfs/restricted.computecanada.ca/easybuild/software/2017/Core/ansys/2019R3/v195/aisol/WBAddins/MechanicalExtensions/AdditiveWizard.wbex
- click Extensions -> Manage Extensions and tick Additive Wizard
- click the ACT Start Page tab X to return to your Project tab
ANSYS Additive Manufacturing can be run in GUI mode on gra-vdi with up to eight cores for 24 hours as follows:
On gra-vdi, as described above in the Enable Additive section:
- click File -> Open and select test.wbpj then click Open
- click View -> reset workspace if you get a grey screen
- start Mechanical, Clear Generated Data, tick Distributed, specify Cores
- click File -> Save Project -> Solve
- open another terminal and run:
top -u $USER
- kill rogue processes from previous runs if required:
pkill -9 -e -u $USER -f "ansys"
To submit an Additive job to a cluster queue, you must first prepare your additive simulation to run on Compute Canada clusters. To do this, open then save your simulation (on gra-vdi OR the cluster you are working on in a salloc session) to initialize the project's internal path configuration, as described above in the Enable Additive section. Next create a slurm script in the directory where your project file is located (similar to the one below) and submit it to the queue by doing:
sbatch script.txt
Be sure that the value of --ntasks in the slurm script matches the Cores value last set in Mechanical, in particular if moving the project to a different cluster. To change the Cores value on a cluster without opening your simulation, follow the "Open Mechanical on login node" section found near the bottom of this page.
#!/bin/bash
#SBATCH --account=def-account
#SBATCH --time=00-06:00        # Time (DD-HH:MM)
#SBATCH --ntasks=8             # Number of cores
#SBATCH --mem-per-cpu=2G       # Memory per core

unset SLURM_GTIDS

rm -f test_files/.lock

module load ansys/2019R3

export KMP_AFFINITY=balanced
export I_MPI_HYDRA_BOOTSTRAP=ssh
export PATH=/cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin:$PATH

runwb2 -B -F test.wbpj -E "Update();Save(Overwrite=True)"
For parametric studies, change Update() to UpdateAllDesignPoints() in the last line of your slurm script. For initial performance testing, one can prevent the solution from being written by specifying Overwrite=False in the slurm script, so that further runs can be conducted without needing to reopen the simulation in Workbench (and Mechanical) to clear the solution and recreate the design points. Another option is to create a replay script once in Workbench to perform these tasks, then run it on the cluster between runs as follows. The replay file can be used in different directories by changing its internal FilePath setting accordingly.
module load ansys/2019R3
rm -f test_files/.lock
runwb2 -R myreplay.wbjn
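For parametric studies, the switch from Update() to UpdateAllDesignPoints() in the runwb2 line of the submission script could also be made with sed instead of an editor. This is a sketch only: the function name use_all_design_points and the output file name are invented, and the pattern assumes the runwb2 line is written exactly as in the sample submission script.

```bash
#!/bin/bash
# Hypothetical helper: print a copy of a Slurm script in which the runwb2
# command calls UpdateAllDesignPoints() instead of Update().
use_all_design_points() {
    sed 's/-E "Update();/-E "UpdateAllDesignPoints();/' "$1"
}

# Example (script.txt is the submission script name used above):
# use_all_design_points script.txt > script-parametric.txt
```

Writing to a new file leaves the original script untouched, so both variants can be submitted side by side.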
Once your additive job has been running for a few minutes, a snapshot of its resource utilization on the compute node(s) can be obtained with the following srun command. Sample output corresponding to the above eight core submission script is as follows, where it can be seen that two nodes were selected by the scheduler:
[demo@gra-login1:~] srun --jobid=jobnumber top -bn1 -u $USER | grep R | grep -v top
  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
22843 demo  20   0 2272124 256048  72796 R  88.0  0.2 1:06.24 ansys.e
22849 demo  20   0 2272118 256024  72822 R  99.0  0.2 1:06.37 ansys.e
22838 demo  20   0 2272362 255086  76644 R  96.0  0.2 1:06.37 ansys.e
  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
 4310 demo  20   0 2740212 271096 101892 R 101.0  0.2 1:06.26 ansys.e
 4311 demo  20   0 2740416 284552  98084 R  98.0  0.2 1:06.55 ansys.e
 4304 demo  20   0 2729516 268824 100388 R 100.0  0.2 1:06.12 ansys.e
 4305 demo  20   0 2729436 263204 100932 R 100.0  0.2 1:06.88 ansys.e
 4306 demo  20   0 2734720 431532  95180 R 100.0  0.3 1:06.57 ansys.e
After a job completes, its elapsed time can be found from the "Job Wall-clock time" line in the output of seff jobid. One can use this value to perform scaling tests. If the wall-clock time decreases by ~50% when the number of cores is doubled (for example from "#SBATCH --ntasks=8" to "#SBATCH --ntasks=16"), further core doubling can be investigated. While jobs may run faster when the number of cores is increased, the wait time will also increase significantly unless the research group has a RAC award.
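The scaling check can be put into numbers with a short calculation. The sketch below assumes the two wall-clock times have already been converted to seconds by hand; the function name scaling_efficiency and the timings in the example are made up for illustration. A result near 100 corresponds to the ~50% reduction per core doubling mentioned above.

```bash
#!/bin/bash
# Hypothetical helper: parallel efficiency (percent) of a larger run relative
# to a smaller one. Arguments: <cores1> <seconds1> <cores2> <seconds2>
scaling_efficiency() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
        'BEGIN { printf "%.0f\n", 100 * (n1 * t1) / (n2 * t2) }'
}

# Example with made-up timings: 8 cores took 3600 s, 16 cores took 2000 s.
scaling_efficiency 8 3600 16 2000    # prints 90
```

Which efficiency is still acceptable is a judgment call that also depends on queue wait times and license availability.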
Open mechanical on login node:
This procedure explains how to initialize your Mechanical environment on a cluster by opening the simulation on a cluster login node. If the simulation requires more than 8GB, which is the typical login node memory limit, then a cluster compute node will need to be used. When a simulation is moved to a different cluster, the project will need to be opened and saved again if the path and directory location have changed.
* Log in to a cluster login node with TigerVNC
* Open a terminal window in vncviewer and run:
[demo@beluga3:~] module load ansys/2019R3
[demo@beluga3:~] runwb2
o start Mechanical by clicking Component Systems -> Mechanical Model -> Model
o under Solve for My Computer enter Cores: 8
o under Solve for My Computer tick Distributed
o quit Mechanical by clicking File -> Close Mechanical
o quit Workbench by clicking File -> Exit (do not save the current project)
Message Passing Interface