Size: 4305
Comment:
|
Size: 5282
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
= MESSy/CLaMS = | = MESSy/CLaMS: Restarts = |
Line 3: | Line 3: |
== Restarts == | If the restart event is triggered, the status of the model is dumped with full precision to restart files |
Line 5: | Line 5: |
* At the end of the run, restart files are written | === Write restart files === |
Line 7: | Line 7: |
* Restart files can additionally be written at a given time interval. The corresponding settings are made in ''messy/nml/DEFAULTS/timer.nml'': |
* At the end of a MESSy simulation restart files are written. * Restart files can be written in a given simulation time interval. The simulation can be interrupted and restarted automatically when a given number of cycles is reached (TIMER-User-Manual, 4.4). The interval and the number of cycles can be specified in the messy-script and are replaced in '''timer.nml''', e.g.:<<BR>><<BR>> timer.nml: |
Line 10: | Line 13: |
IO_RERUN_EV = 1,'month','first',0, NO_CYCLES = 12 ! restart cycles without break |
IO_RERUN_EV = ${RESTART_INTERVAL},'${RESTART_UNIT}','last',0, NO_CYCLES = ${NO_CYCLES}, |
Line 13: | Line 16: |
=> Restart files every month; after 12 cycles (in the example, after one year) the run is interrupted and restarted (see TIMER-User-Manual) | messy-script: {{{ ### INTERVAL FOR WRITING (REGULAR) RESTART FILES ### Note: This has only an effect, if it is not explicitly overwritten ### in your timer.nml; i.e., make sure that in timer.nml ### IO_RERUN_EV = ${RESTART_INTERVAL},'${RESTART_UNIT}','last',0, ### is active! ### RESTART_UNIT: steps, hours, days, months, years RESTART_INTERVAL=1 RESTART_UNIT=months NO_CYCLES=12 }}} => Restart files are written at the beginning of a new month, and after 12 months the simulation is interrupted and restarted automatically. |
Line 15: | Line 30: |
* After a certain run time (number of CPU hours), an automatic restart can be triggered. The number of compute hours is specified in ''qtimer.nml'', e.g.: |
* If the model start time is noon and restart files should be written at noon:<<BR>> use unit 'hours' {{{ IO_RERUN_EV = 240,'hours','first',0 }}} or use unit 'days' and set offset to 86400 [sec] {{{ IO_RERUN_EV = 10,'days','first',86400 }}} This offset is not allowed for units 'years' and 'months'. * If the job is submitted to a queue manager, it might be necessary to split the simulation into chain elements. The submodel QTIMER triggers the restart just before the maximum time reserved by the scheduler is reached (Development cycle 2 of the Modular Earth Submodel System, section 4). The queue time limit QWCH can be specified in the messy-script and is replaced in '''qtimer.nml''', e.g.:<<BR>><<BR>> qtimer.nml |
Line 19: | Line 50: |
QTIME = 4,0,0, ! queue time limit (hh,mm,se); 0,0,0: no limit QCLOCK = 'wall', ! 'wall'|'cpu'|'user'|'sys' QFRAC = 0.95 ! fraction of the time limit after which the run is interrupted and restart files are written }}} => When 95% of the 4 hours of compute time are reached, restart files are written and the run is restarted at this point <<BR>> (see Development cycle 2 of the Modular Earth Submodel System) <<BR>> |
QTIME = $QWCH,0,0, ! queue time limit (hh,mm,se); 0,0,0 to switch off QCLOCK = 'wall', ! queue clock type (wall|cpu|user|sys) QFRAC = 0.95 ! usable fraction of queue time limit }}} |
Line 27: | Line 55: |
* The following restart files are created in a CLaMS run: | messy-script: {{{ QWCH=4 }}} |
Line 29: | Line 60: |
* restart_cccc_clams.nc: * dnparts * grid_switch * year_uvfirst, month_uvfirst, day_uvfirst, hour_uvfirst * chemical species * restart_cccc_CLAMS.nc * restart_cccc_clamstraj.nc * JULSEC * LAT, LAT_OLD, LAT_OLD_MIX * LON, LON_OLD, LON_OLD_MIX * LEV, LEV_OLD * TEMP, TEMP_OLD * PRESS, PRESS_OLD * restart_cccc_winddata.nc * LAT, LON, LEV * UDT, VDT, WDT, LEVELDT * UFUT, VFUT, WFUT, LEVELFUT * PREDATA_TEMP, FUTDATA_TEMP * PREDATA_PRESS, FUTDATA_PRESS * Restart files are created for up to 5 cycles each (because ''messy_write_output'' is called in each of the corresponding SMIL routines!) : * after traj * after dissoc * after chem * after mix * after bmix => For each of these submodels the output (and thus also the restart file) can be switched off (in the corresponding namelist: ''loutput_paketname=.false.'') |
=> When 95% of the 4-hour queue time limit is reached (wall-clock time, since QCLOCK = 'wall'), restart files are written and the next chain element is started. |
Line 57: | Line 62: |
* The number of the last run is stored in the file ''MSH_NO''. The restart files to be used from the current time on are linked into the output directory. <<BR>> * A restart resumes from there <<BR>> * To start again from the beginning, the file ''MSH_NO'' must be deleted <<BR>> * To resume at a different point, the corresponding settings can be made with the following script: |
=== Restart model === * If the file '''MSH_NO''' is in the working-directory, the model is started in rerun-mode. MSH_NO contains the number of the last chain-element. <<BR>> If you want to run the simulation again from the beginning, remove file MSH_NO before starting the run script. * All files needed for a rerun starting from a specific chain element are saved in the subdirectory ''save/NNNN'' of the working directory.<<BR>> NNNN is the 4-digit number of the last complete chain element. <<BR>> The restart files of the last chain-element are linked into the working directory.<<BR>><<BR>> In order to start a rerun with chain element NNNN+1, the script '''messy/util/init_restart''' can be used to link the correct restart files: {{{ messy-dir/messy/util/init_restart -r NNNN -c MMMM [-d dir] }}} NNNN: restart number (number of batch job in a job chain) <<BR>> MMMM: cycle number (number of restart within a batch job) * Get restart and cycle number <<BR>><<BR>> The script |
Line 63: | Line 85: |
init_restart -r nnnn -c mmmm [-d dir] }}} nnnn: restart number <<BR>> mmmm: cycle number |
messy-dir/messy/util/show_restarts.tcsh -c clams }}} (called in working directory) lists all restart and cycle numbers and corresponding dates in the subdirectory ''save'' * Continue a previous simulation: |
Line 68: | Line 92: |
* Output to restart files in MESSy-CLaMS: | * Create a new output directory and call ''init_restart'' '''from within this directory''': {{{ mkdir newdir cd newdir ~/messy-2.54.0-clams/messy/util/init_restart -d olddir -r NNNN -c CCCC }}} * The executable and the messy-script are copied to the new directory * The save-directory with namelists is copied and the namelists for the specified rerun and cycle number are linked * If you do not create a new output directory: * check that the namelists with the correct rerun and cycle number are linked * remove '''END''' files |
Line 70: | Line 104: |
* In individual submodels (from SMIL routines), ''messy_write_output'' (''messy_main_control_clams.f90'') is called <<BR>> * In ''messy_write_output'', ''messy_channel_write_output'' (''messy_main_channel_bi.f90'') is called with ''IOMODE_OUT'' <<BR>> => write channels/channel objects to the output file *In ''messy_write_output'', ''messy_channel_write_output'' is called with ''IOMODE_RST'' if ''l_rerun=.true.'' <<BR>> => write channels/channel objects to the restart file (if the restart event is set) * program clams (clams_main.f90), inside the time loop: <<BR>> -> sub. messy_global_start (messy_main_control_clams.f90) <<BR>> -> sub. main_timer_global_start (messy_main_timer_bi.f90) <<BR>> -> set l_rerun: (true if the next rerun output time step or the maximum compute time is reached) |
* Changes in messy-script: * Change WORKDIR to new output directory * Change the stop date * Do not change the start date ! |
Line 80: | Line 109: |
* Reading the restart files in MESSy-CLaMS: | * If you want to modify the namelists, edit the namelist files '''in subdirectory ''nml'' ''' |
Line 82: | Line 111: |
clams_main <<BR>> -> messy_channel_read_restart (messy_main_channel_bi.f90) <<BR>> -> channel_read_data (messy_main_channel_io.f90) <<BR>> |
* Start the messy-script in the new output directory |
Line 86: | Line 113: |
* Restart files in ECHAM: | * Restarts after abnormal termination: |
Line 88: | Line 115: |
messy/echam5/bmil/messy_main_control_e5.f90: <<BR>> sub. messy_write_output <<BR>> -> messy_channel_write_output(IOMODE_OUT) (messy_main_channel_bi.f90) <<BR>> in echam5/src/stepon.f90 <<BR>> -> messy_channel_write_output(IOMODE_RST) <<BR>> |
If the model stops due to an error or a hardware problem, it can be restarted manually.<<BR>> In this case all restart files are located in the working directory and not saved to subdirectories ''save/NNNN''.<<BR>> To clean up the working directory and move the restart files to subdirectories call the run-script with option '-c': {{{ xmessy_mmd -c }}} |
Line 94: | Line 122: |
E.g. restart files once per month: <<BR>> In ''messy/nml/DEFAULTS/timer.nml'': <<BR>> IO_RERUN_EV = 1,'months','first',0, |
Then the recent restart files can be linked into the working directory {{{ messy-dir/messy/util/init_restart -r NNNN -c CCCC }}} Now the model can be restarted. * The name of the experiment (''EXP_NAME'' in run-script) must not contain the substring ''restart''. <<BR>> All files ''*restart*'' are removed before linking the current restart files. |
MESSy/CLaMS: Restarts
When the restart event is triggered, the model state is dumped with full precision to restart files.
Write restart files
- At the end of a MESSy simulation, restart files are written.
- Restart files can be written at a given simulation time interval. The simulation can be interrupted and restarted automatically when a given number of cycles is reached (TIMER-User-Manual, 4.4). The interval and the number of cycles can be specified in the messy-script and are substituted into timer.nml, e.g.:
  timer.nml:
    IO_RERUN_EV = ${RESTART_INTERVAL},'${RESTART_UNIT}','last',0,
    NO_CYCLES = ${NO_CYCLES},
  messy-script:
    ### INTERVAL FOR WRITING (REGULAR) RESTART FILES
    ### Note: This has only an effect, if it is not explicitly overwritten
    ###       in your timer.nml; i.e., make sure that in timer.nml
    ###       IO_RERUN_EV = ${RESTART_INTERVAL},'${RESTART_UNIT}','last',0,
    ###       is active!
    ### RESTART_UNIT: steps, hours, days, months, years
    RESTART_INTERVAL=1
    RESTART_UNIT=months
    NO_CYCLES=12
  => Restart files are written at the beginning of a new month, and after 12 months the simulation is interrupted and restarted automatically.
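The substitution described above can be illustrated with a minimal shell sketch. It only mimics the effect of replacing the messy-script variables in the two timer.nml entries; how the actual run script performs the replacement is not shown in this page, and the helper name `render_timer_lines` is hypothetical.

```shell
#!/bin/sh
# Sketch only: mimics substituting messy-script variables into timer.nml.
# The real run script may implement this differently.
RESTART_INTERVAL=1
RESTART_UNIT=months
NO_CYCLES=12

# render_timer_lines is a hypothetical helper, not part of MESSy.
# The unquoted here-document expands the variables; the single quotes
# around the unit are kept literally.
render_timer_lines() {
    cat <<EOF
IO_RERUN_EV = ${RESTART_INTERVAL},'${RESTART_UNIT}','last',0,
NO_CYCLES = ${NO_CYCLES},
EOF
}

render_timer_lines
```

With the values above, the first line printed is `IO_RERUN_EV = 1,'months','last',0,`.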
- If the model start time is noon and restart files should be written at noon:
  - use unit 'hours'
      IO_RERUN_EV = 240,'hours','first',0
  - or use unit 'days' and set the offset to 86400 [sec]
      IO_RERUN_EV = 10,'days','first',86400
  This offset is not allowed for units 'years' and 'months'.
- If the job is submitted to a queue manager, it might be necessary to split the simulation into chain elements. The submodel QTIMER triggers the restart just before the maximum time reserved by the scheduler is reached (Development cycle 2 of the Modular Earth Submodel System, section 4). The queue time limit QWCH can be specified in the messy-script and is substituted into qtimer.nml, e.g.:
  qtimer.nml:
    &CTRL
    QTIME = $QWCH,0,0, ! queue time limit (hh,mm,se); 0,0,0 to switch off
    QCLOCK = 'wall', ! queue clock type (wall|cpu|user|sys)
    QFRAC = 0.95 ! usable fraction of queue time limit
  messy-script:
    QWCH=4
  => When 95% of the 4-hour queue time limit is reached (wall-clock time, since QCLOCK = 'wall'), restart files are written and the next chain element is started.
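As a worked example of the numbers above, the following sketch computes the elapsed time at which the restart would be triggered, under the assumption that the trigger time is simply QFRAC times QTIME. That assumption and the helper name `trigger_seconds` are illustrative and not taken from the QTIMER source.

```shell
#!/bin/sh
# Sketch: elapsed time after which a restart would be triggered,
# ASSUMING trigger time = QFRAC * QTIME (not verified against QTIMER).
QWCH=4            # queue time limit in hours, as set in the messy-script
QFRAC_PERCENT=95  # QFRAC = 0.95 written as an integer percentage

# trigger_seconds is a hypothetical helper; it uses integer arithmetic only.
# $1: time limit in hours, $2: usable fraction in percent
trigger_seconds() {
    echo $(( $1 * 3600 * $2 / 100 ))
}

secs=$(trigger_seconds "$QWCH" "$QFRAC_PERCENT")
printf '%d s = %d h %d min\n' "$secs" $((secs / 3600)) $((secs % 3600 / 60))
```

For QWCH=4 and QFRAC = 0.95 this prints `13680 s = 3 h 48 min`, which leaves 12 minutes of the queue slot for writing the restart files.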
Restart model
- If the file MSH_NO is in the working directory, the model is started in rerun mode. MSH_NO contains the number of the last chain element.
  If you want to run the simulation again from the beginning, remove the file MSH_NO before starting the run script.
- All files needed for a rerun starting from a specific chain element are saved in the subdirectory save/NNNN of the working directory.
  NNNN is the 4-digit number of the last complete chain element.
  The restart files of the last chain element are linked into the working directory.
  In order to start a rerun with chain element NNNN+1, the script messy/util/init_restart can be used to link the correct restart files:
    messy-dir/messy/util/init_restart -r NNNN -c MMMM [-d dir]
  NNNN: restart number (number of batch job in a job chain)
  MMMM: cycle number (number of restart within a batch job)
- Get restart and cycle number
  The script
    messy-dir/messy/util/show_restarts.tcsh -c clams
  (called in the working directory) lists all restart and cycle numbers and the corresponding dates in the subdirectory save.
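Before submitting a job it can be useful to check whether the run would start in rerun mode. The sketch below inspects a working directory for MSH_NO; it assumes that the file holds a single integer, and `describe_start` is a hypothetical helper that is not part of the MESSy utilities.

```shell
#!/bin/sh
# Sketch: report how a run would start, based on MSH_NO in a working directory.
# ASSUMPTION: MSH_NO contains a single integer (number of the last chain element).
# describe_start is a hypothetical helper, not part of MESSy.
describe_start() {
    workdir=$1
    if [ -f "$workdir/MSH_NO" ]; then
        # Strip blanks and the trailing newline from the stored number.
        last=$(tr -d ' \n' < "$workdir/MSH_NO")
        echo "rerun mode: last chain element $last"
    else
        echo "fresh start: no MSH_NO found"
    fi
}

# Check the directory given as first argument, or the current directory.
describe_start "${1:-.}"
```

The helper only reads the file; removing MSH_NO to force a fresh start remains a deliberate manual step.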
- Continue a previous simulation:
  - Create a new output directory and call init_restart from within this directory:
      mkdir newdir
      cd newdir
      ~/messy-2.54.0-clams/messy/util/init_restart -d olddir -r NNNN -c CCCC
    - The executable and the messy-script are copied to the new directory
    - The save directory with namelists is copied and the namelists for the specified rerun and cycle number are linked
  - If you do not create a new output directory:
    - check that the namelists with the correct rerun and cycle number are linked
    - remove END files
  - Changes in messy-script:
    - Change WORKDIR to new output directory
    - Change the stop date
    - Do not change the start date!
  - If you want to modify the namelists, edit the namelist files in subdirectory nml
  - Start the messy-script in the new output directory
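The first step of the procedure above can be sketched as a dry run that only prints the commands instead of executing them, so the restart and cycle numbers can be checked first. The function name `plan_continue` and all argument values are placeholders; only the `init_restart` options `-d`, `-r` and `-c` are taken from this page.

```shell
#!/bin/sh
# Dry run: print the commands for continuing a simulation in a new directory.
# plan_continue is a hypothetical helper; nothing is created or linked here.
# $1: MESSy installation directory   $2: old output directory
# $3: new output directory           $4: restart number   $5: cycle number
plan_continue() {
    printf 'mkdir %s\n' "$3"
    printf 'cd %s\n' "$3"
    printf '%s/messy/util/init_restart -d %s -r %s -c %s\n' "$1" "$2" "$4" "$5"
}

# Placeholder values for illustration only.
plan_continue /opt/messy olddir newdir 0003 0002
```

The remaining steps (adjusting WORKDIR and the stop date in the messy-script, then starting it) are edits to the script itself and are left to the user.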
- Restarts after abnormal termination:
  If the model stops due to an error or a hardware problem, it can be restarted manually.
  In this case all restart files are located in the working directory and are not saved to subdirectories save/NNNN.
  To clean up the working directory and move the restart files to subdirectories, call the run script with option '-c':
    xmessy_mmd -c
  Then the most recent restart files can be linked into the working directory:
    messy-dir/messy/util/init_restart -r NNNN -c CCCC
  Now the model can be restarted.
- The name of the experiment (EXP_NAME in run-script) must not contain the substring restart.
  All files *restart* are removed before linking the current restart files.
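The naming rule can be checked before a run is set up. The following sketch rejects experiment names that contain the substring restart; `check_exp_name` is a hypothetical helper, and the match is case-sensitive, which is an assumption about how the *restart* file pattern is applied.

```shell
#!/bin/sh
# Sketch: reject experiment names containing the substring "restart".
# check_exp_name is a hypothetical helper, not part of MESSy.
# ASSUMPTION: the match is case-sensitive, like the *restart* file pattern.
check_exp_name() {
    case $1 in
        *restart*) echo "invalid: '$1' contains 'restart'"; return 1 ;;
        *)         echo "ok: '$1'"; return 0 ;;
    esac
}

check_exp_name "clams_test"
check_exp_name "my_restart_run" || true
```

A name such as my_restart_run would otherwise match *restart*, so output files of the experiment could be removed together with old restart files.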