merge_rse

Keep track of RSEs where files are stored

class merge_utils.merge_rse.MergeRSE(valid: bool, nearline_dist: float = 0)

Class to store information about an RSE

distance(site: str) → float

Get the distance to a site

get_ping() → float

Get the ping time to the RSE in ms

nearest_site(sites: list | None = None) → tuple

Get the nearest merging site to this RSE

Parameters:

sites – list of merging sites to consider (default: all sites)

Returns:

tuple of (site, distance)
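
The selection described above can be sketched as a minimum search over per-site distances. This is only an illustration, not the library's implementation: the standalone function, the `distances` mapping, and the site names are all hypothetical, and treating unknown sites as infinitely far away is an assumption.

```python
from __future__ import annotations

import math


def nearest_site(distances: dict[str, float],
                 sites: list[str] | None = None) -> tuple[str | None, float]:
    """Return (site, distance) for the closest candidate site.

    `distances` is a hypothetical mapping of site name to distance.
    When `sites` is None, every known site is considered.
    """
    candidates = list(distances) if sites is None else sites
    best_site, best_dist = None, math.inf
    for site in candidates:
        # Assumption: a site with no recorded distance is unreachable.
        dist = distances.get(site, math.inf)
        if dist < best_dist:
            best_site, best_dist = site, dist
    return best_site, best_dist


# Hypothetical site names and distances, for illustration only.
example = {"site_a": 12.5, "site_b": 3.0, "site_c": 40.0}
print(nearest_site(example))
print(nearest_site(example, ["site_a", "site_c"]))
```

Returning `(None, inf)` when no candidate is reachable is one possible convention; the real method may signal that case differently.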

property pfns: dict

Get the PFNs for this RSE as a dictionary

class merge_utils.merge_rse.MergeRSEs

Class to keep track of a set of RSEs

async add_pfn(did: str, pfn: str, info: dict) → float

Add a file PFN to the corresponding RSE

Parameters:
  • did – file DID

  • pfn – physical file name

  • info – dictionary with RSE information

Returns:

distance from the RSE to the nearest site

async add_replicas(did: str, replicas: dict) → int

Add a set of file replicas to the RSEs

Parameters:
  • did – file DID

  • replicas – Rucio replica dictionary

Returns:

number of PFNs added

cleanup() → None

Remove RSEs with no files

async connect() → None

Download the list of RSEs from Rucio and determine the distance to each one

get_pfns(files: Iterable | None = None) → dict

Determine the best RSE for each file

optimize_pfns(pfns: dict, files: Iterable, sites: list) → dict

Find the best PFNs for a set of sites.

Parameters:
  • pfns – dictionary of PFNs for all sites

  • files – collection of files to process

  • sites – list of sites to consider

Returns:

dictionary of best PFNs for each site

set_rse_sites() → None

Query Rucio for the site associated with each RSE

site_pfns(site: str, files: Iterable) → dict

Find the file replicas with the shortest distance to a given merging site.

Parameters:
  • site – merging site name

  • files – collection of files to process

Returns:

dictionary of PFNs for the site
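
A shortest-distance replica search of this kind might look like the sketch below. It does not reproduce the method's internals or its return layout: the function name, the `replicas` and `rse_distances` structures, and all DIDs, PFNs, and RSE names are hypothetical, chosen only to make the idea concrete.

```python
from __future__ import annotations

import math
from typing import Iterable


def closest_replicas(site: str,
                     files: Iterable[str],
                     replicas: dict[str, list[tuple[str, str]]],
                     rse_distances: dict[str, dict[str, float]]
                     ) -> dict[str, tuple[str, float]]:
    """Pick, for each file, the replica whose RSE is closest to `site`.

    replicas:      hypothetical map of DID -> list of (pfn, rse_name)
    rse_distances: hypothetical map of rse_name -> {site: distance}
    Returns DID -> (pfn, distance); files with no reachable replica
    are left out.
    """
    result = {}
    for did in files:
        best = None
        for pfn, rse in replicas.get(did, []):
            # Assumption: a missing distance means the RSE is unreachable.
            dist = rse_distances.get(rse, {}).get(site, math.inf)
            if best is None or dist < best[1]:
                best = (pfn, dist)
        if best is not None and math.isfinite(best[1]):
            result[did] = best
    return result


# Hypothetical replica catalogue and distance table.
replicas = {
    "scope:f1": [("root://a/f1", "RSE_A"), ("root://b/f1", "RSE_B")],
    "scope:f2": [("root://a/f2", "RSE_A")],
    "scope:f3": [("root://c/f3", "RSE_C")],
}
rse_distances = {
    "RSE_A": {"site_x": 5.0, "site_y": 1.0},
    "RSE_B": {"site_x": 2.0},
}
print(closest_replicas("site_x", replicas, replicas, rse_distances))
```

Omitting files that have no reachable replica keeps the result usable as a direct input list for a job at that site; whether the real method does the same is not stated in this reference.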

merge_utils.merge_rse.check_path(path: str) → str

Check the status of a file path.

Parameters:

path – file path to check. Can be a local file or a remote URL.

Returns:

status of the file (‘ONLINE’, ‘NEARLINE’, ‘NONEXISTENT’, or ‘UNKNOWN’)
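
A caller will usually branch on the four status strings listed above. The sketch below shows one way to do that without calling the real function; the `plan_action` helper and the action names are hypothetical, and the choice of action per status is an assumption rather than documented behavior.

```python
# Hypothetical mapping from a check_path-style status to a follow-up action.
ACTIONS = {
    "ONLINE": "read",        # file can be read directly
    "NEARLINE": "stage",     # assumed: file must be staged before reading
    "NONEXISTENT": "skip",   # nothing to read at this path
    "UNKNOWN": "retry",      # status could not be determined
}


def plan_action(status: str) -> str:
    """Translate a status string into an action name."""
    try:
        return ACTIONS[status]
    except KeyError:
        raise ValueError(f"unexpected status: {status!r}") from None


for status in ("ONLINE", "NEARLINE", "NONEXISTENT", "UNKNOWN"):
    print(status, "->", plan_action(status))
```

Raising on an unrecognized string, rather than silently treating it as 'UNKNOWN', makes a future change in the set of statuses visible immediately.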