This lesson is being piloted (Beta version)

DUNE Computing Training 2024 Online Update: Submit grid jobs with JustIn

PLEASE USE THE NEW JUSTIN SYSTEM INSTEAD OF POMS

The JustIn Tutorial is currently in docdb at: JustIn Tutorial

The JustIn system is described in detail at:

JustIn Home

JustIn Docs

Note: More documentation is coming soon.

justIN

justIN ties together:

  1. MetaCat search queries that obtain lists of files to process

  2. Rucio knowledge of where replicas of files are

  3. a table of site-to-storage distances to make best choices about where to run each type of job

To process data using justIN:

You need to provide a jobscript (a shell script) that performs some basic tasks, then submit it with:

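A minimal jobscript skeleton is sketched below. This is illustrative only (see the real examples in the dune-prod-utils repository); the one-line `<did> <pfn> <rse>` reply format from `justin-get-file` is an assumption modeled on those examples, and the reply is faked here so the parsing can be shown.

```shell
#!/bin/bash
# Minimal jobscript skeleton (illustrative sketch, not a production script).
# justIN runs this script once per job on a worker node.

# On a real worker node you would first set up the software:
#   source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
#   setup dunesw "$DUNE_VERSION" -q "$DUNE_QUALIFIER"

# Ask justIN for the next input file. Real jobscripts call
# $JUSTIN_PATH/justin-get-file; here we fake its assumed reply format:
did_pfn_rse="fardet-hd:example.root root://eospublic.cern.ch//example.root DUNE_CERN_EOS"

# Extract the physical file name (second field) to hand to lar:
pfn=$(echo "$did_pfn_rse" | cut -d' ' -f2)
echo "processing $pfn"
```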

justin simple-workflow <args...>

Once you run the command, you get back the workflow ID.

In case of any problem, you can stop your workflow by running

justin finish-workflow --workflow-id <ID>

Next topics:

  1. Understand how a jobscript is structured

  2. Process data using standard code

  3. Process data using customized fcl files and/or customized code

  4. Select the input dataset

  5. Specify where your output should go (jobs writing to scratch)

Examples of jobscripts are provided in the GitHub production repository.

A jobscript checklist is available in the backup.

Two general remarks:

Note: ALWAYS test your code and jobscript before sending jobs to the grid.

For any large processing (MC or DATA) producing large output that has to be shared within the Collaboration, please contact the production group.

Things you can do

https://wiki.dunescience.org/wiki/Data_Collections_Manager/data_sets

Example: Let’s say you want to run mergeana for electron neutrinos.

First: Where is the data?

for example:

fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nue_dune10kt_1x2x6__out1__validation

Dataset names tend to be self-explanatory and include the detector type, the fcl files used to produce the dataset, the software version, the data tier, and a tag; in this case, the tag is validation.

files from fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nue_dune10kt_1x2x6__out1__validation ordered limit 100
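Since dataset names are long, it can help to assemble the MQL query from its parts in the shell. This is just a readability sketch; the namespace and dataset name are the ones from the query above.

```shell
# Build the MQL query string from its components (names from the example above).
NAMESPACE="fardet-hd"
DATASET="fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nue_dune10kt_1x2x6__out1__validation"
MQL="files from ${NAMESPACE}:${DATASET} ordered limit 100"
echo "$MQL"
# The same string can later be passed on as:  justin simple-workflow --mql "$MQL" ...
```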

example jobscript

https://github.com/DUNE/dune-prod-utils/blob/main/justIN-examples/submit_ana.jobscript

# fcl file and DUNE software version/qualifier to be used
FCL_FILE=${FCL_FILE:-standard_ana_dune10kt_1x2x6.fcl}
DUNE_VERSION=${DUNE_VERSION:-v09_81_00d02}
DUNE_QUALIFIER=${DUNE_QUALIFIER:-e26:prof}
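The `${VAR:-default}` expansions above mean: use the value passed in from the environment (e.g. via justIN's `--env` option, as shown later in this lesson) if it is set, otherwise fall back to the default after `:-`. A quick demonstration:

```shell
# With FCL_FILE unset, the default after :- is used.
unset FCL_FILE
FCL_FILE=${FCL_FILE:-standard_ana_dune10kt_1x2x6.fcl}
echo "$FCL_FILE"     # the default fcl file name

# With FCL_FILE already set, the existing value wins.
FCL_FILE=my_custom.fcl
FCL_FILE=${FCL_FILE:-standard_ana_dune10kt_1x2x6.fcl}
echo "$FCL_FILE"     # my_custom.fcl
```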

A bit further down:

# Setup DUNE environment
source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
setup dunesw "$DUNE_VERSION" -q "$DUNE_QUALIFIER"

and here is how you do the actual processing:

# Here is where the LArSoft command is called 
(
# Do the scary preload stuff in a subshell!
export LD_PRELOAD=${XROOTD_LIB}/libXrdPosixPreload.so
echo "$LD_PRELOAD"

lar -c $FCL_FILE $events_option -o $outFile "$pfn" > ${fname}_ana_${now}.log 2>&1
)

The scary preload allows xroot to read HDF5 files.

Process data (submit a job to the grid) when you are just using code from the base release and you don’t actually modify any of it:

$ USERF=$USER
$ FNALURL='https://fndcadoor.fnal.gov:2880/dune/scratch/users'
$ justin simple-workflow --mql "files from fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nu_dune10kt_1x2x6__out1__validation skip 5 limit 5 ordered" --jobscript submit_ana.jobscript --rss-mb 4000 --output-pattern "*_ana*.root:$FNALURL/$USERF"

You can check your job status on the justIN dashboard: https://justin.dune.hep.ac.uk/dashboard/?method=list-workflows

Custom fcl file

$ tar cvzf my_fcls.tar my_fcls
$ source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
$ setup justin
$ rm -f /tmp/x509up_u`id -u`
$ kx509
$ INPUT_TAR_DIR_LOCAL=`justin-cvmfs-upload my_fcls.tar`

Wait a few minutes, then check the files:

$ ls -l $INPUT_TAR_DIR_LOCAL
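The steps for preparing the tarball can be sketched end to end. The directory name follows the example above; the dummy fcl content is illustrative only.

```shell
# Build a my_fcls directory with one (dummy) fcl override and pack it the
# way justin-cvmfs-upload expects.
mkdir -p my_fcls
echo '#include "standard_ana_dune10kt_1x2x6.fcl"' > my_fcls/my_ana.fcl
tar cvzf my_fcls.tar my_fcls

# List the archive contents to verify the layout before uploading.
tar tzf my_fcls.tar
```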

You can look at the example at https://github.com/DUNE/dune-prod-utils/blob/main/justIN-examples/submit_local_fcl.jobscript

The key part of the code is the following:

justin simple-workflow --mql "files from fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nu_dune10kt_1x2x6__out1__validation skip 5 limit 5 ordered" --jobscript submit_local_fcl.jobscript --rss-mb 4000 --env INPUT_TAR_DIR_LOCAL="$INPUT_TAR_DIR_LOCAL"

Things you can do


Process data (submit a job to the grid) when you are NOT using code from the base release and you want to use customized code.

You are probably developing some reconstruction algorithm and want to check the results on a large sample before committing your software to GitHub.

You can use your customized software (e.g. a local installation of dunereco) and use justIN to process the data with your new LArSoft module.

Similar to the previous part, you will need to provide all the pieces in a tar file and put them in cvmfs:

$ tar cvzf my_code.tar my_code

Here my_code.tar includes a directory with my_fcls files and one with your local products (e.g. localProducts_larsoft_v09_85_00_e26_prof). This is similar to what you used to do when running jobsub with customized code.
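The assumed tarball layout can be sketched as follows; the directory names are illustrative, following the example above.

```shell
# One directory for fcl files, one for the local products area, both under
# a single my_code top-level directory that gets tarred up.
mkdir -p my_code/my_fcls my_code/localProducts_larsoft_v09_85_00_e26_prof
tar cvzf my_code.tar my_code

# Inspect the archive to confirm both pieces are present.
tar tzf my_code.tar | sort
```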

Things you can do

How to navigate the justIN dashboard. Example: you want to check outputs/logs for jobs from workflow 1850.

To access full statistics:

  - sites where jobs ran

  - storage used for input/output

To access details of each job

To access log files

For each file, you can see where it was processed and which Rucio Storage Element it came from.

What it looks like when there are failed jobs

To list storage elements (where data can be)

Backup

How to set up MetaCat, Rucio, and justIN (on a dunegpvm). First run:

/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/bin/apptainer shell --shell=/bin/bash -B /cvmfs,/exp,/nashome,/pnfs/dune,/opt,/run/user,/etc/hostname,/etc/hosts,/etc/krb5.conf --ipc --pid /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-dev-sl7:latest

Then:

source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
setup python v3_9_15
setup rucio
kx509
export RUCIO_ACCOUNT=$USER
export METACAT_SERVER_URL=https://metacat.fnal.gov:9443/dune_meta_prod/app
export METACAT_AUTH_SERVER_URL=https://metacat.fnal.gov:8143/auth/dune
setup metacat
setup justin
justin version
rm -f /var/tmp/justin.session.`id -u`
justin time

Links

MetaCat web interface: https://metacat.fnal.gov:9443/dune_meta_prod/app/auth/login

justIN: https://justin.dune.hep.ac.uk/docs/

Slack channels: #workflow