CESM2.X on ARC4
The Community Earth System Model (CESM) is a coupled climate model. It has seven components: atmosphere, ocean, land, sea ice, land ice, river runoff and ocean wave. The Common Infrastructure for Modeling the Earth (CIME) is a Python-based tool for coupling these components together. The latest release is CESM 2.1.3, which uses CIME 5.6. The centralised installation has been configured to use the Intel compiler and Intel MPI.
Advanced Research Computing node 4 (ARC4) is the high performance computing facility provided by the Research Computing group at the University of Leeds.
Getting started on ARC4
All postgraduate researchers are entitled to an account on ARC4. You can request an account from IT, as detailed on the ARC4 docs page. Taught students may also be entitled to an account but must speak to their supervisor before submitting a request.
Once logged on, you can set up the correct module environment using
$ module use /nobackup/CESM/module_env
$ module load cesm
By default, this will load cesm2.1.3, but you can specify a version using cesm/<version>. Each time you log off ARC4 the environment is reset, so you will need to start each new session by setting up the environment. Loading the cesm module also loads the Earth System Modeling Framework (ESMF) module, a high-performance, flexible software infrastructure for building and coupling Earth system models. Both modules are found within the module_env directory.
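For example, to pin the environment to the version that is currently installed centrally, you can name it explicitly:
$ module load cesm/2.1.3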
Batch Jobs
ARC4 uses a batch scheduler called Sun Grid Engine (SGE), which allocates resources and prioritises jobs that are submitted to the compute nodes through the queue. There is a fair share policy in place, so a user's priority decreases as they use more resources. More information on writing job submission scripts (job scripts) can be found here: ARC4 job scripts.
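For reference, a plain SGE job script looks roughly like the sketch below; the resource values and executable name are placeholders, and the definitive set of options is given in the ARC4 job scripts documentation. For CESM itself you will not normally write these by hand, since CIME generates and submits the job script for you via ./case.submit.
#!/bin/bash
#$ -cwd -V
#$ -l h_rt=01:00:00
#$ -pe smp 4
./my_program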
First time usage
The home directory for CESM2 code is located at /nobackup/CESM on ARC4, where you can find the ported versions (currently 2.1.3), module environments and input data directories. The ported versions are configured to run on ARC4, with the correct batch and compiler information, and these are the versions available for use. You do not need to download your own copy of the model code unless you want to use a different version, or make changes to the code. There are a few things you need to do to prepare your account for using these versions.
To use the case scripts, such as create_newcase, you will need to be added to the UNIX group arc-cesm2, which will give you the correct group permissions. Please contact CEMAC.
Make a directory in your $HOME directory to store case setups.
$ mkdir $HOME/cesm_prep
Scripts from the central installation (/nobackup/CESM/cesm2.1.3) are copied here for use with a specific case, e.g. case.setup and env_run.xml. Your run submission outputs, run.*.o* and run.*.e*, will also be created and updated here.
Data output and detailed log files are written to a directory in /nobackup. If you do not already have a user directory there, you need to make one using
$ mkdir /nobackup/<username>
replacing <username> with your ARC4 username.
It is important to note that any files written to your /nobackup directory will not be backed up, as the name implies. This is temporary storage: any output that you wish to keep must be moved to more appropriate storage, otherwise it will be deleted after 90 days without access. Your $HOME directory is backed up once a week, however it is limited to 10GB per user.
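As an illustration, output can be copied off /nobackup with a standard tool such as rsync; both paths below are placeholders for your own output directory and whatever backed-up storage you have access to:
$ rsync -av /nobackup/<username>/<output_directory>/ /path/to/backed-up/storage/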
Preparing a case
Change directory to your cesm_prep directory and use the create_newcase script, located in the central installation, to create your case,
$ cd $HOME/cesm_prep
$ $CESM_HOME/cime/scripts/create_newcase --case $HOME/cesm_prep/<case_name> --compset <compset> --res <grid_resolution>
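For example, to create a case with the F2000climo compset on the f19_g17 grid (an illustrative compset/grid pairing only, with a placeholder case name; see the configuration links at the end of this page for the full list):
$ $CESM_HOME/cime/scripts/create_newcase --case $HOME/cesm_prep/test_f2000 --compset F2000climo --res f19_g17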
Change working directory into the case directory
$ cd $HOME/cesm_prep/<case_name>
Run setup and build
$ ./case.setup
$ ./case.build
After running ./case.setup, you can use ./preview_run to see the information that will be used at run time. This includes the number of nodes requested, the commands for setting the environment and the run commands. It is useful to check that the run command is submitting your job to the correct queue, with the correct resources, and whether archiving is set up correctly.
Input data
The model is configured to look for the input data in a central directory, /nobackup/CESM/inputdata, so that all users can share downloaded files. This is currently read-only to all users except CEMAC staff. If you find your case is missing files, please contact CEMAC (cemac-support@leeds.ac.uk) and include the compset and resolution configuration that you want.
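You can check which input data directory your case is pointing at by querying the standard CIME variable for this path from the case directory:
$ ./xmlquery DIN_LOC_ROOT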
Archiving
The CIME framework allows for short-term and long-term archiving of model output. This is particularly useful when the model is configured to output to a small storage space and large files may need to be moved during larger simulations. On ARC4, the model is configured to use short-term archiving, but not yet configured for long-term archiving. Short-term archiving is on by default for compsets and can be toggled on and off by setting the DOUT_S parameter to True or False (see Making changes to a case). When DOUT_S=TRUE, calling ./case.submit will automatically submit a “st_archive” job to the batch system that will be held in the queue until the main job is complete. This can be configured in the same way as the main job for a different queue, wallclock time, etc., however the default should be appropriate in most cases. Note that the main job and the archiving job share some parameter names, so a flag (--subgroup) specifying which you want to change, if not both, should be used.
The archive is currently set up to move .nc files and logs from /nobackup/<username>/case_sims/<case_root> to /nobackup/<username>/case_sims/archive. As such, the quota being used is the communal /nobackup space whether archiving is switched on or off. There is a lot of storage available in this space, however it is not backed up and files should only be left there for short periods, as they will be removed after 90 days. If you want to archive your files directly to a different location, this can be set using the DOUT_S_ROOT parameter.
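For example, to switch short-term archiving off, or to redirect the archive to a different location (the path below is just a placeholder), you could run the following from the case directory:
$ ./xmlchange DOUT_S=FALSE
$ ./xmlchange DOUT_S_ROOT=/nobackup/<username>/my_archive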
Making changes to a case
After creating a new case, the CIME functions can be used to make changes to the case setup, such as changing the wallclock time, number of cores etc. ARC4 has a maximum job time limit of 24 hours and has 40 cores per node.
You can query settings using the function
$ ./xmlquery <name_of_setting>
Adding the -p flag allows you to look up partial names, e.g.
$ ./xmlquery -p JOB
Output:
Results in group case.run
JOB_QUEUE: 40core-192G.q
JOB_WALLCLOCK_TIME: 01:30:00
Results in group case.st_archive
JOB_QUEUE: 40core-192G.q
JOB_WALLCLOCK_TIME: 0:20:00
When you know which setting you want to change, you can do so using
$ ./xmlchange <name_of_setting>=<new_value>
For example, to change the wallclock time to 30 minutes without knowing the exact name, you could do
$ ./xmlquery -p WALLCLOCK
Output:
Results in group case.run
JOB_WALLCLOCK_TIME: 01:30:00
Results in group case.st_archive
JOB_WALLCLOCK_TIME: 0:20:00
$ ./xmlchange JOB_WALLCLOCK_TIME=00:30:00 --subgroup case.run
$ ./xmlquery JOB_WALLCLOCK_TIME
Output:
Results in group case.run
JOB_WALLCLOCK_TIME: 00:30:00
Results in group case.st_archive
JOB_WALLCLOCK_TIME: 0:20:00
Note
The flag --subgroup case.run is used to change only the main job wallclock without affecting the st_archive wallclock.
Note
If you try to set a parameter equal to a value that is not known to the program, it might suggest using a --force flag. This may be useful, for example, when using a queue that has not yet been configured, but use it with care!
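As a hypothetical example, forcing through a queue name that CIME does not recognise would look like this (the queue name is a placeholder):
$ ./xmlchange JOB_QUEUE=<name_of_unconfigured_queue> --force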
Some changes to the case must be made before calling ./case.setup or ./case.build, otherwise the case will need to be reset or cleaned using ./case.setup --reset and ./case.build --clean-all. These are as follows:
- Before calling ./case.setup, changes to NTASKS, NTHRDS, ROOTPE, PSTRID and NINST must be made, as well as any changes to the env_mach_specific.xml file, which contains some configuration for the module environment and environment variables (see the example after this list).
- Before calling ./case.build, ./case.setup must have been called and any changes to env_build.xml and Macros.make must have been made. This applies whether you have edited the file directly or used ./xmlchange to alter the variables.
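For instance, if you change the task count after having already run ./case.setup, the case needs to be reset and rebuilt; a minimal sketch (with a placeholder task count) is:
$ ./xmlchange NTASKS=80
$ ./case.setup --reset
$ ./case.build --clean-all
$ ./case.build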
Many of the namelist variables can be changed just before calling ./case.submit.
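As an aside, namelist variables are typically set through the user_nl_<component> files in the case directory (user_nl_cam for the atmosphere, user_nl_clm for the land, and so on); the variable and value below are purely illustrative:
$ echo " nhtfrq = -24" >> user_nl_cam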
Submitting a job
To use ARC4 compute nodes, you can submit a job to the queue through a batch scheduler. This requires you to request the number of cores (or nodes), memory and time needed, and to point to the program you wish to run. The batch scheduler takes your resource requests into account and allocates your job a place in the queue. More information on writing job submission scripts (job scripts) can be found here: ARC4 job scripts.
For CESM2, the CIME framework has been configured for ARC4 so that you can use the package functions to submit jobs. Before you do so, you can preview commands that will be run at submission time using
$ ./preview_run
You can submit the job using
$ ./case.submit
The default queue is 40core-192G, which has 5960 cores available for use (ARC4 has 40 cores per node). There are other queues available, though most are privately owned. If you have access to another queue through a specific project, you can choose it when preparing your case using ./xmlchange. On ARC4, you would usually specify only the project, and the scheduler would automatically route your submitted job to the queue assigned to that project. However, the way CESM2 is set up, if only the project is specified the job will default to the main queue, causing it to hang. Both the project and the associated queue must be specified, e.g.
$ ./xmlchange PROJECT=<name_of_project>
$ ./xmlchange JOB_QUEUE=<name_of_queue>
Note
This will change the PROJECT and JOB_QUEUE for the short-term archive job (st_archive) as well. If you only want to change the queue of your main job, you can add the --subgroup case.run flag as in Making changes to a case.
Note
If you try to set a queue name that is not known, you will need to use the --force flag.
You can check the qsub command again using ./preview_run, and the case can then be run using the submit command, as before, with ./case.submit.
Monitoring jobs
You can check the status of all your jobs using the command
$ qstat -u $USER
Also, if you want to check the demand on the queues you can use
$ qstat -g c
to see a table of node availability.
Jobs can be cancelled using
$ qdel <job_id>
For further information, see Monitoring jobs on ARC4 in the ARC4 docs.
Troubleshooting
If a run fails, the first place to check is your run submission outputs, run.*.o* and run.*.e*, within $HOME/cesm_prep/<case_name>. In particular, the .e file (for error output) may give some indication of the problem and may point you to the cesm log files in /nobackup/<username>/cesm_sims/<case_name>/run/ for more information. Additionally, in the same directory there are log files for each of the coupled models (atm, lnd, etc.) where you can check for errors.
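A quick way to scan these logs for obvious problems is something along the lines of the following (the path mirrors the run directory above; adjust the case name, and note that older logs may have been compressed or archived):
$ grep -il "error" /nobackup/<username>/cesm_sims/<case_name>/run/*.log.*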
Note
If you are using short-term archiving (DOUT_S=True), these log files will be located at /nobackup/<username>/cesm_sims/archive/<case_name>/logs/.
For more information on the job from the batch system, you can use qacct -j <job_id> at any time, or qstat -j <job_id> while the job is queueing or running.
Helpful links
Documentation
CIME (the CIME framework is used to couple the models that make up CESM)
Configurations
Information on CESM2 configurations
Available configurations and grids (select model version in the top right corner)
GitHub repositories
Contact CEMAC
CEMAC has ported CESM2.1.3 to ARC4 and maintains both these docs and the CESM directory on ARC4. To begin using CESM on ARC4, or to get help, please contact CEMAC support.
Email: cemac-support@leeds.ac.uk.