retriever
FileRetriever classes
- class retriever.FileRetriever
Base class for retrieving metadata from a source
- async add(files: list, dids: list | None = None) → dict
Add the metadata for a list of files to the set.
- Parameters:
files – list of dictionaries with file metadata
dids – optional list of DIDs requested, used to check for missing files
- Returns:
dict of MergeFile objects that were added
- property dupes: dict
Return the set of duplicate files from the source
- abstract async input_batches() → AsyncGenerator[dict, None]
Asynchronously retrieve metadata for the next batch of files.
- Returns:
yields, for each batch, a dict of MergeFile objects that were added
- property missing: dict
Return the set of missing files from the source
- output_chunks() → Generator[MergeChunk, None, None]
Yield chunks of files for merging.
- Returns:
yields a series of MergeChunk objects
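The abstract async generator pattern used by input_batches() can be sketched as follows. This is a minimal, self-contained illustration and not the module's own code: SketchRetriever, ListRetriever, collect, and the "name" key are hypothetical stand-ins, and only the method names add and input_batches are taken from the documentation above.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncGenerator


class SketchRetriever(ABC):
    """Hypothetical stand-in that mirrors only the documented method names."""

    def __init__(self) -> None:
        self.files: dict = {}

    async def add(self, files: list, dids: list | None = None) -> dict:
        # Assumption: each metadata dict has a "name" key that serves as its key.
        # The dids argument is accepted but unused in this sketch.
        added = {f["name"]: f for f in files}
        self.files.update(added)
        return added

    @abstractmethod
    def input_batches(self) -> AsyncGenerator[dict, None]:
        """Subclasses implement this as an async generator, one dict per batch."""


class ListRetriever(SketchRetriever):
    """Hypothetical subclass that serves metadata from an in-memory list."""

    def __init__(self, records: list, batch_size: int = 2) -> None:
        super().__init__()
        self.records = records
        self.batch_size = batch_size

    async def input_batches(self) -> AsyncGenerator[dict, None]:
        # Slice the records into batches and yield what add() reports as added.
        for i in range(0, len(self.records), self.batch_size):
            yield await self.add(self.records[i:i + self.batch_size])


async def collect(retriever: SketchRetriever) -> list:
    # Drain the async generator into a list of per-batch dicts.
    return [batch async for batch in retriever.input_batches()]


records = [{"name": f"file{i}.root"} for i in range(5)]
batches = asyncio.run(collect(ListRetriever(records)))
print([len(b) for b in batches])  # [2, 2, 1]
```

Because input_batches() is an async generator, callers consume it with `async for` rather than awaiting it directly.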
- class retriever.LocalRetriever(filelist: list, meta_dirs: list | None = None)
FileRetriever for local files
Initialize the LocalRetriever with a list of files and optional metadata directories.
- Parameters:
filelist – list of input data files
meta_dirs – optional list of directories to search for metadata files
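How a file list and optional metadata directories might combine can be sketched as below. Everything here is an assumption made for illustration: find_metadata is a hypothetical helper, and both the "<data file name>.json" naming convention and the search order (next to the data file first, then each entry of meta_dirs) are invented. The actual LocalRetriever may locate metadata differently.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def find_metadata(filelist: list, meta_dirs: list | None = None) -> dict:
    """Map each data file name to its parsed metadata, or None if none is found.

    Assumption: metadata files are named "<data file name>.json".
    """
    found = {}
    for data_file in filelist:
        path = Path(data_file)
        # Assumed search order: the data file's own directory, then meta_dirs.
        search_dirs = [path.parent, *(Path(d) for d in meta_dirs or [])]
        found[path.name] = None
        for directory in search_dirs:
            candidate = directory / (path.name + ".json")
            if candidate.is_file():
                found[path.name] = json.loads(candidate.read_text())
                break
    return found


# Build a throwaway layout: two data files, metadata for only one of them.
with tempfile.TemporaryDirectory() as tmp:
    data_dir, meta_dir = Path(tmp, "data"), Path(tmp, "meta")
    data_dir.mkdir()
    meta_dir.mkdir()
    (data_dir / "a.root").touch()
    (data_dir / "b.root").touch()
    (meta_dir / "a.root.json").write_text(json.dumps({"size": 10}))
    result = find_metadata(
        [str(data_dir / "a.root"), str(data_dir / "b.root")],
        [str(meta_dir)],
    )

print(result)  # {'a.root': {'size': 10}, 'b.root': None}
```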