Talk:R

From CC Doc

Remarks (stubbsda)

  • I think the best approach would be to tell people explicitly to run the install.packages command on the login node. This is presumably assumed already, but it should be made explicit.
  • I suspect the problem of rogue R jobs creating too many threads comes less from users writing their own parallelized R scripts. In most cases it likely comes from users running R code found elsewhere (installed via install.packages, cloned from GitHub, etc.), where those scripts or packages spawn threads greedily.

Changes wanted

  • Refer to Utiliser des modules (Using modules) instead of giving a lengthy explanation of modules
  • Running R on the login node should be limited to (1) small tests and (2) package installation
  • Use sbatch for non-interactive work and salloc for interactive work; refer to Running jobs
  • Note that install.packages() will fail on Graham compute nodes
  • Should we recommend CRAN mirrors? e.g. http://cran.utstat.utoronto.ca/, http://cran.stat.sfu.ca/, http://cloud.r-project.org/
    • Is https: preferred to http: in this context?
    • Could we open paths to these from Graham compute nodes?
  • In "Exploiting parallelism":
    • Acknowledge the variety of solutions listed at https://cran.r-project.org/web/views/HighPerformanceComputing.html
    • Don't assume the reader already understands MPI, threading, "multicore", etc.
    • Mention library(parallel)
    • Acknowledge the terminology clash: "Following snow, a pool of worker processes listening via sockets for commands from the master is called a 'cluster' of nodes."
    • I think a
Ross Dickson (talk) 20:25, 20 December 2018 (UTC)