fcollections.core#
Files collections.
This module brings together the classic operations performed on a collection of files (in other words, a dataset stored in multiple files). These operations are: layout (structure of the folders) and minimal walk; file names: parsing, filtering and interpretation; file reading: data loading, combination and post-processing.
Functions
|
Compose multiple functions that preprocess an xarray Dataset. |
Extract metadata from a netcdf dataset. |
|
|
Recursive walk of a file system tree. |
Classes
|
|
|
Numpy datetime value. |
|
Numpy datetime value. |
|
|
|
Enum field for files selection. |
|
Float value. |
|
Integer value. |
|
|
|
Period value. |
|
ISO8601 duration codes field (PT1D, P1W, ...) |
|
Parse or generate filenames with a convention definition. |
Interface for reading multiple files on a specific file system. |
|
|
Xarray implementation of IFilesReader interface. |
|
Abstract database mapping. |
|
|
|
|
|
Utility class for filtering values. |
|
Datetime value given as a julian day. |
|
|
|
Metadata for a group of variables. |
|
Metadata of a variable. |
|
|
Interface for defining a complex predicate. |
|
|
Implements a ILayout with a succession of conventions. |
|
Coder-Decoder interface. |
|
Compare two objects of types U and T. |
|
Information about multiple tree levels. |
|
Representation of a file system path. |
|
File node of a file system tree. |
|
Directory node of a file system tree. |
|
Visitor processing an INode. |
|
Visitor with node interpretation and branch exploration hints. |
|
Visitor with file node interpretation only. |
|
Possibilities when a folder or file node does not match any layout.
Visitor for producing the equivalent of fsspec.spec.AbstractFileSystem.walk(). |
|
|
Result of a visit. |
|
Filtered discovery and aggregation of filesystem metadata. |
Exceptions
Raised when an error occurs during files discovery and interpretation. |
|
Raised by a codec if a string cannot be properly decoded. |
|
Raised by the visitor during node visit. |
|
Raised if all layouts do not match the actual file system structure. |
- exception fcollections.core.DecodingError[source]#
Bases:
Exception
Raised by a codec if a string cannot be properly decoded.
- class fcollections.core.Deduplicator(unique: 'tuple[str, ...]', auto_pick_last: 'tuple[str, ...]' = <factory>)[source]#
Bases:
object
- class fcollections.core.DirNode(name, info, fs: AbstractFileSystem, level: int, follow_symlinks: bool = False)[source]#
Bases:
INode
Directory node of a file system tree.
- Parameters:
name – Name of the node. Not to be confused with the full path that should be contained in the info parameter
info – Additional information. The entry name, representing the full path, is expected to be in this parameter. Other information will depend on the fsspec implementation
level – Nesting level of the current node with respect to the tree root
fs – File system hosting the node. Useful to list the children
follow_symlinks – If False, symbolic links will be marked as file nodes instead of directory nodes, and will not be explored
- accept(visitor: LayoutVisitor) VisitResult[source]#
Accept a visitor.
This method should trigger operations in the visitor. The visitor computes the desired result, and the node is responsible for emitting that result to the walk operation.
- Returns:
The visit result
See also
walk
Walk operation handling the tree traversal
- class fcollections.core.DiscreteTimesMixin(sampling: np.timedelta64 | None = None)[source]#
Bases:
ITemporalMixin
- class fcollections.core.DownloadMixin[source]#
Bases:
ABC
- download(files: list[str], local_path: str, force_download: bool = False)[source]#
Retrieve files from FTP to local path.
- abstract property fs: AbstractFileSystem#
The mixin relies on this attribute to build new functionalities.
- exception fcollections.core.FileListingError[source]#
Bases:
Exception
Raised when an error occurs during files discovery and interpretation.
- class fcollections.core.FileNameConvention(regex: Pattern, fields: list[FileNameField], generation_string: str | None = None)[source]#
Bases:
object
Parse or generate filenames with a convention definition.
The convention is expressed as both a regex and a simple string to handle both parsing and generation. The generation string can be omitted and set to None if the convention is only used to parse files.
- fields: list[FileNameField]#
List of fields, each field name must correspond to a group in the regex pattern.
- generation_string: str | None = None#
String that will be formatted with the input objects to generate a string.
The string can use the formatting language described in help(‘FORMATTING’). In addition, formatting can be delegated to each field’s encode method through the conversion part of the replacement field (!f in the example below). This allows handling more complex objects such as Period. For example, with a FileNameFieldInteger and a FileNameFieldPeriod defined: ‘{cycle_number:>03d}_{period!f}’ -> ‘003_20230102_20240201’
- get_field(name: str) FileNameField[source]#
Retrieve a field from its name.
Only the first matching field is returned. It is assumed that field names are unique within the convention.
- Parameters:
name – Name of the field to seek
- Returns:
The requested FileNameField
- Raises:
KeyError – In case the requested field has no match in the convention
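The dual role of a convention (a regex for parsing, a format string for generation) can be illustrated with the standard library alone. The sketch below does not use FileNameConvention; the pattern, the field names and the helper functions `parse` and `generate` are hypothetical and only mirror the idea described above.

```python
import re

# Hypothetical convention: a 3-digit cycle number followed by a period
# written as two YYYYMMDD dates. Names and layout are illustrative only.
PATTERN = re.compile(r"(?P<cycle_number>\d{3})_(?P<start>\d{8})_(?P<stop>\d{8})")
GENERATION_STRING = "{cycle_number:>03d}_{start}_{stop}"


def parse(filename: str) -> dict:
    """Decode a file name into a record, one entry per regex group."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match the convention")
    record = match.groupdict()
    record["cycle_number"] = int(record["cycle_number"])
    return record


def generate(**fields) -> str:
    """Build a file name from field values with the generation string."""
    return GENERATION_STRING.format(**fields)


name = generate(cycle_number=3, start="20230102", stop="20240201")
record = parse(name)
```

Parsing the generated name gives back the input fields, which is the round trip a convention with both a regex and a generation string is meant to support.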
- class fcollections.core.FileNameField(name: str, default: T | None = None, description: str = '')[source]#
- class fcollections.core.FileNameFieldDateDelta(name: str, date_fmt: str | list, delta: timedelta64, include_stop: bool = False, default: Period | None = None, description: str = '')[source]#
Bases:
FileNameField, PeriodTester, PeriodDeltaCodec, DateTimeCodec
Numpy datetime value.
- delta#
time delta
- Type:
np.timedelta64
- class fcollections.core.FileNameFieldDateJulian(name: str, reference: datetime64, default: Period | None = None, description: str = '', julian_day_format: str = 'days_hours')[source]#
Bases:
FileNameField, DateTimeTester, JulianDayCodec
- class fcollections.core.FileNameFieldDateJulianDelta(name: str, delta: timedelta64, reference: datetime64, include_stop: bool = False, default: Period | None = None, description: str = '', julian_day_format: str = 'days')[source]#
Bases:
FileNameField, PeriodTester, PeriodDeltaCodec, JulianDayCodec
Datetime value given as a julian day.
- delta#
time delta
- Type:
np.timedelta64
- reference#
Reference date for the given julian days
- julian_day_format#
Whether the julian day is expected as ‘days’, ‘days_hours’ or ‘fractional’. For example 24000, 24000_06 or 24000.25
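The three julian day formats listed above can be illustrated with a small standalone decoder. This is a sketch under assumptions, not the library's JulianDayCodec: the reference date is hypothetical, the part after the underscore in ‘days_hours’ is assumed to be a number of hours, and `datetime` stands in for `np.datetime64` to avoid dependencies.

```python
from datetime import datetime, timedelta

# Hypothetical reference date; the real one depends on the data product.
REFERENCE = datetime(1950, 1, 1)


def decode_julian_day(text: str, julian_day_format: str = "days") -> datetime:
    """Convert a julian day string to a datetime relative to REFERENCE."""
    if julian_day_format == "days":
        return REFERENCE + timedelta(days=int(text))
    if julian_day_format == "days_hours":
        # Assumption: the part after the underscore is a number of hours.
        days, hours = text.split("_")
        return REFERENCE + timedelta(days=int(days), hours=int(hours))
    if julian_day_format == "fractional":
        return REFERENCE + timedelta(days=float(text))
    raise ValueError(f"Unknown julian day format: {julian_day_format!r}")


whole = decode_julian_day("24000", "days")
with_hours = decode_julian_day("24000_06", "days_hours")
fractional = decode_julian_day("24000.25", "fractional")
```

Under these assumptions, ‘24000_06’ and ‘24000.25’ decode to the same instant, six hours after ‘24000’.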
- class fcollections.core.FileNameFieldDatetime(name: str, date_fmt: str | list, default: timedelta64 | None = None, description: str = '')[source]#
Bases:
FileNameField, DateTimeTester, DateTimeCodec
Numpy datetime value.
- class fcollections.core.FileNameFieldEnum(name: str, enum_cls: type[Enum], case_type_decoded: CaseType | None = None, case_type_encoded: CaseType | None = None, underscore_encoded: bool = True, default: type[Enum] | None = None, description: str = '')[source]#
Bases:
FileNameField, EnumTester, EnumCodec
Enum field for files selection.
- enum_cls#
enum class
- class fcollections.core.FileNameFieldFloat(name: str, default: T | None = None, description: str = '')[source]#
Bases:
FileNameField, FloatTester, FloatCodec
Float value.
- class fcollections.core.FileNameFieldISODuration(name: str, default: T | None = None, description: str = '')[source]#
Bases:
FileNameField, ISODurationCodec
ISO8601 duration codes field (PT1D, P1W, …)
- sanitize(reference: str | ISODuration) ISODuration[source]#
Cast to one of the types handled by this tester.
- Parameters:
reference – The reference object to cast
- Returns:
The input cast to the proper type
- property type: type[ISODuration]#
Type of the tested field.
- class fcollections.core.FileNameFieldInteger(name: str, default: T | None = None, description: str = '')[source]#
Bases:
FileNameField, IntegerTester, IntegerCodec
Integer value.
- class fcollections.core.FileNameFieldPeriod(name: str, date_fmt: str, separator='_', default: Period | None = None, description: str = '')[source]#
Bases:
FileNameField, PeriodTester, PeriodCodec
Period value.
- class fcollections.core.FileNameFieldString(name: str, default: T | None = None, description: str = '')[source]#
Bases:
FileNameField, StringTester, StringCodec
- class fcollections.core.FileNode(name: str, info: dict[str, Any], level: int)[source]#
Bases:
INode
File node of a file system tree.
- accept(visitor: LayoutVisitor) VisitResult[source]#
Accept a visitor.
This method should trigger operations in the visitor. The visitor computes the desired result, and the node is responsible for emitting that result to the walk operation.
- Returns:
The visit result
See also
walk
Walk operation handling the tree traversal
- class fcollections.core.FileSystemMetadataCollector(layouts: list[Layout], root_node: INode)[source]#
Bases:
object
Filtered discovery and aggregation of filesystem metadata.
Notes
The aggregation has yet to be implemented.
Only files’ metadata can be collected in the current implementation
- Parameters:
path – path of directory containing files
layouts – Succession of conventions describing how to interpret the folder and file nodes
root_node – Root node representing an explorable tree. Usually represent the parent directory on a file system
- discover(predicates: tuple[Callable[[tuple[Any, ...]], bool], ...] = (), stat_fields: tuple[str] = (), enable_layouts: bool = True, **filters) DataFrame[source]#
- Parameters:
predicates – Complex predicates for filtering a file’s record
stat_fields – Name of the file metadata fields that should be returned in the record. The info that can be retrieved is dependent on the file system implementation. Check the filesystem ls method to get the available stat fields
enable_layouts – Set to True to use the layouts for directory names parsing. This will speed up the listing, but may raise an error if some directories do not match the declared layouts. Set to False to scan the entire directory and parse the files only
**filters – filters for files/folders selection over the fields declared in the layouts. Each field can accept a different filter value depending on the underlying FileNameField subclass
- Yields:
The record matching the files
- Raises:
KeyError – In case some of the requested stat_fields are not available for the current file system
LayoutMismatchError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
- to_dataframe(predicates: tuple[Callable[[tuple[Any, ...]], bool], ...] = (), stat_fields: tuple[str] = (), enable_layouts: bool = True, **filters) DataFrame[source]#
- Parameters:
predicates – Complex predicates for filtering a file or folder’s record
stat_fields – Name of the file or folder metadata fields that should be returned in the record. The info that can be retrieved is dependent on the file system implementation. Check the filesystem ls method to get the available stat fields
enable_layouts – Set to True to use the layouts for directory names parsing. This will speed up the listing, but may raise an error if some directories do not match the declared layouts. Set to False to scan the entire directory and parse the files only
**filters – filters for files/folders selection over the fields declared in the file name convention and layout (optional). Each field can accept a different filter value depending on the underlying FileNameField subclass
- Yields:
A pandas DataFrame containing all selected filenames, plus one column per requested field
- Raises:
KeyError – In case some of the requested stat_fields are not available for the current file system
VisitError – In case enable_layouts is True and a mismatch between the layouts and the actual files is detected
- class fcollections.core.FilesDatabase(path: str, fs: AbstractFileSystem = LocalFileSystem(), enable_layouts: bool = True, follow_symlinks: bool = False)[source]#
Bases:
object
Abstract database mapping.
- Parameters:
path – path to a directory containing NetCDF files
fs – File system hosting the files. Can be used to access local or remote (S3, FTP, …) file systems. Underlying readers may not be compatible with all file systems implementations
enable_layouts – Set to True to use the layouts for directory names parsing. This will speed up the listing, but may raise an error if some directory does not match the pre-configured layouts. Set to False to scan the entire directory and parse the files only
follow_symlinks – If False, symbolic links will be marked as file nodes instead of directory nodes, and will not be explored
- discoverer#
File discoverer. Walks in a folder (can be on a remote file system), parses the listed files and filters them.
- deduplicator: Deduplicator | None = None#
Deduplicate the file metadata table of a unique subset (after unmixing).
- layouts: list[Layout] | None = None#
Semantic describing how the files are organized.
Useful to extract information and have an efficient file system scanning. The pre-configured layouts can mismatch the current files organization, in which case the user can build their own or set enable_layouts to False.
- list_files(*args, **kwargs)#
- map(*args, **kwargs)#
- metadata_injection: dict[str, tuple[str, ...]] | None = None#
Configures how metadata from the files listing can be injected in a dataset returned from the read.
The keys are the columns of the file metadata table; each value is a tuple of dimensions for insertion.
- property parser: FileNameConvention#
Amongst all name conventions, get the one managing the files.
- Returns:
Files name parser
- predicate_classes: list[type[IPredicate]] | None = None#
List of predicates that are built at each query.
The predicates intercept the input parameters to build a custom record predicate. Usually, it is a complex test involving auxiliary data, such as ground track footprints or half_orbit/periods tables.
- query(*args, **kwargs)#
- reader: IFilesReader | None = None#
Files reader.
- sort_keys: list[str] | str | None = None#
Keys that specify the fields used to sort the records extracted from the filenames.
Useful to order the files prior to reading them.
- unmixer: SubsetsUnmixer | None = None#
Specify how to interpret the file metadata table to unmix subsets.
- variables_info(*args, **kwargs)#
- class fcollections.core.GroupMetadata(name: str, variables: list[VariableMetadata], subgroups: list[GroupMetadata], attributes: dict[str, str], dimensions: dict[str, int])[source]#
Bases:
object
Metadata for a group of variables.
A dataset may be organized as a simple set of variables, or adopt a more complex tree-like structure. This dataclass reflects the most complex case where we can have an indefinite number of nesting levels. The simplest case (no concept of groups) is naturally well contained within this model.
- apply(callable: tp.Callable[[GroupMetadata]])[source]#
Apply a callable to the metadata tree.
Useful to modify the tree in place.
- Parameters:
callable – The function to apply to each node
- flatten() list[dict[str, Any]][source]#
Flatten the tree structure to a dictionary.
Group names will be converted to absolute paths with ‘/’ separator.
- Returns:
A dictionary containing all the groups, with keys containing paths linked to the tree structure
- nodes(path: str) list[GroupMetadata][source]#
Walk the metadata tree and retrieve the nodes along a given path.
- Parameters:
path – Absolute path for the node to find. The path separator is ‘/’. For example, a path [root, first_level, second_level] can be given as root/first_level/second_level or /root/first_level/second_level (the leading ‘/’ will be stripped)
- Returns:
List of nodes that are part of the path, starting with the root node and ending with the last node of the path
- Raises:
ValueError – In case nodes are missing in the path
- subgroups: list[GroupMetadata]#
Nested groups.
- variables: list[VariableMetadata]#
List of variables contained in the group.
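The tree operations described above (flattening groups to absolute paths, and collecting the nodes along a path) can be sketched on a simplified stand-in. The `Group` dataclass and both functions below are hypothetical and much smaller than GroupMetadata; in particular, this `flatten` returns a single dictionary keyed by path, which is an assumption about a convenient shape rather than the library's return type.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for a metadata group (not GroupMetadata itself)."""
    name: str
    variables: list[str] = field(default_factory=list)
    subgroups: list["Group"] = field(default_factory=list)


def flatten(group: Group, parent: str = "") -> dict[str, list[str]]:
    """Map each group's absolute path to the variables it contains."""
    path = f"{parent}/{group.name}"
    flat = {path: group.variables}
    for subgroup in group.subgroups:
        flat.update(flatten(subgroup, parent=path))
    return flat


def nodes(root: Group, path: str) -> list[Group]:
    """Return the groups along a '/'-separated path, root first."""
    names = path.lstrip("/").split("/")
    if names[0] != root.name:
        raise ValueError(f"Root {names[0]!r} not found")
    found = [root]
    for name in names[1:]:
        children = {g.name: g for g in found[-1].subgroups}
        if name not in children:
            raise ValueError(f"Node {name!r} is missing in {path!r}")
        found.append(children[name])
    return found


tree = Group("root", ["time"], [Group("first_level", ["ssh"], [Group("second_level", ["sla"])])])
flat = flatten(tree)
along_path = nodes(tree, "/root/first_level/second_level")
```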
- class fcollections.core.ICodec[source]#
-
Coder-Decoder interface.
A codec defines how to encode/decode strings to/from a given T class object.
- abstractmethod decode(input_string: str) T[source]#
Decode an input string and generate a Generic[T] object.
- Parameters:
input_string – The input string
- Returns:
The decoded Generic[T] object
- Raises:
DecodingError – If the input string decoding fails
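As an illustration of the codec idea, here is a minimal standalone sketch. It defines its own `Codec` base class and `DecodingError` so that it runs without the library; the `encode` signature is an assumption, since only `decode` is documented above.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar("T")


class DecodingError(Exception):
    """Local stand-in for the library's exception of the same name."""


class Codec(ABC, Generic[T]):
    """Hypothetical codec interface with both directions spelled out."""

    @abstractmethod
    def decode(self, input_string: str) -> T: ...

    @abstractmethod
    def encode(self, data: T) -> str: ...


class IntegerCodec(Codec[int]):
    """Codec between strings such as '042' and Python integers."""

    def decode(self, input_string: str) -> int:
        try:
            return int(input_string)
        except ValueError as exc:
            # Translate the low-level failure into the codec's own error.
            raise DecodingError(f"Cannot decode {input_string!r}") from exc

    def encode(self, data: int) -> str:
        return str(data)


codec = IntegerCodec()
value = codec.decode("042")
```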
- class fcollections.core.IFilesReader[source]#
Bases:
ABC
Interface for reading multiple files on a specific file system.
Implementations of this interface can be called to peek at the metadata of a dataset, or to read a selection of variables.
- abstractmethod metadata(file: str, fs: AbstractFileSystem = fs_loc.LocalFileSystem()) GroupMetadata[source]#
Load the metadata of the given file.
Useful to get information about the structure of the dataset, and which variables, dimensions and coordinates are available for reading.
- Parameters:
file – File from which the metadata is read
fs – File system hosting the file
- Returns:
A GroupMetadata containing the variables, dimensions, attributes and subgroups
- abstractmethod read(files: list[str] | list[list[str]], selected_variables: list[str] | None = None, fs: AbstractFileSystem = fs_loc.LocalFileSystem(), **kwargs: Any) Dataset[source]#
Read a list of files.
- Parameters:
files – List of the files to read
selected_variables – Variables that need to be read. Set to None to read everything
fs – File system hosting the files
- Returns:
An xarray dataset containing the selected variables
- class fcollections.core.ILayout[source]#
Bases:
object
Information about multiple tree levels.
Given a tree (e.g. a filesystem) with a structure of N homogeneous levels, the layout will associate each level with a FileNameConvention to extract useful information. This information can then be leveraged by building filters to speed up the tree visitation.
For example, let’s consider a set of altimetry data files, organized in pre-defined folders: v1/Expert/cycle_001, v1/Expert/cycle_002, v2/Basic/cycle_001, … The first level contains information about the version, the second level about the subset, and the last level about the cycle number. The layout will declare three FileNameConvention to ‘know’ about the tree structure. Then, filters - for example subset=’Expert’ - can be set to select only a subpart of the tree, greatly improving the visitation performance.
- abstractmethod generate(root: str, **fields: Any) str[source]#
Generate a path from the fields.
- Parameters:
root – The root path
fields – key/values for interpolating the conventions
- Returns:
A path
- Raises:
ValueError – In case one of the fields required to generate the path is missing
ValueError – In case one of the fields required to generate the path has an improper value
- abstractmethod set_filters(**references: Any)[source]#
Set filters used to check if a path complies with the layout.
- Parameters:
**references – Key/values matching at least one of the underlying conventions
- abstractmethod test(level: int, node: str) bool[source]#
Checks if a path part matches the current filters.
- Parameters:
node – Path part that needs to be checked
level – Level of the current path part among the layout conventions
- Returns:
True if the path part is selected with the current filters, False otherwise
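The altimetry example above (version / subset / cycle folders) can be sketched with one regex per level. The code below is a standalone illustration, not the ILayout implementation: the regexes, field names and helper functions are hypothetical, and it filters complete paths, whereas a real tree walk would stop descending as soon as one level fails.

```python
import re

# One hypothetical regex per tree level: version / subset / cycle.
LEVELS = [
    re.compile(r"v(?P<version>\d+)"),
    re.compile(r"(?P<subset>Expert|Basic)"),
    re.compile(r"cycle_(?P<cycle_number>\d{3})"),
]


def matches_level(level: int, part: str, filters: dict[str, str]) -> bool:
    """Check one path part against the filters relevant to its level."""
    match = LEVELS[level].fullmatch(part)
    if match is None:
        return False
    fields = match.groupdict()
    # Filters on fields that belong to other levels are ignored here.
    return all(fields[k] == v for k, v in filters.items() if k in fields)


def select(paths: list[str], **filters: str) -> list[str]:
    """Keep the paths whose every level passes the filters."""
    return [
        path
        for path in paths
        if all(matches_level(i, part, filters) for i, part in enumerate(path.split("/")))
    ]


paths = ["v1/Expert/cycle_001", "v1/Expert/cycle_002", "v2/Basic/cycle_001"]
expert_only = select(paths, subset="Expert")
```

With `subset="Expert"`, the whole `v2/Basic` branch is rejected at its second level, which is where the speed-up comes from when the test is applied during traversal.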
- class fcollections.core.INode(name: str, info: dict[str, Any], level: int)[source]#
Bases:
ABC
Representation of a file system path.
- Parameters:
name – Name of the node. Not to be confused with the full path that should be contained in the info parameter
info – Additional information. The entry name, representing the full path, is expected to be in this parameter. Other information will depend on the fsspec implementations
level – Nesting level of the current node with respect to the tree root
- abstractmethod accept(visitor: LayoutVisitor) VisitResult[source]#
Accept a visitor.
This method should trigger operations in the visitor. The visitor computes the desired result, and the node is responsible for emitting that result to the walk operation.
- Returns:
The visit result
See also
walk
Walk operation handling the tree traversal
- class fcollections.core.IPredicate[source]#
Bases:
ABC
Interface for defining a complex predicate.
This predicate will be used to filter records from file names listing and parsing.
- indexes#
Attributes
- *args
Any input that will be used to create the predicate
- class fcollections.core.ITemporalMixin[source]#
Bases:
ABC
- abstractmethod list_files(*args, **kwargs) pda_t.DataFrame[source]#
The mixin relies on this method to build new functionalities.
- class fcollections.core.ITester[source]#
-
Compare two objects of types U and T.
This interface can be used to define filters that need to compare objects with different but close types. For example, an integer with another integer or a list of integers.
In addition to the testing functionality, this interface also provides a way to cast an object to one of the expected U types. This is useful for sanitizing user inputs that are given in the simplest possible types. One example is the automatic building of a np.datetime64 from a string given by the user (‘2024-01-01’).
- sanitize(reference: Any) U[source]#
Cast to one of the types handled by this tester.
- Parameters:
reference – The reference object to cast
- Returns:
The input cast to the proper type
- test(reference: U, tested: T) bool[source]#
Compare two objects of similar types.
- Parameters:
reference – The reference object
tested – The tested object
- Returns:
True if the test is successful, False otherwise
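The sanitize-then-test flow can be sketched as follows. The `DateTester` class is hypothetical and does not derive from ITester; it uses `datetime.date` instead of `np.datetime64` to stay within the standard library, and it assumes a reference is either a single value or a list of accepted values.

```python
from datetime import date


class DateTester:
    """Hypothetical tester: the reference may be a date or a list of dates."""

    def sanitize(self, reference):
        """Cast ISO strings (or lists of them) to date objects."""
        if isinstance(reference, (list, tuple)):
            return [self.sanitize(item) for item in reference]
        if isinstance(reference, str):
            return date.fromisoformat(reference)
        return reference

    def test(self, reference, tested: date) -> bool:
        """Equality for a single reference, membership for a list."""
        if isinstance(reference, list):
            return tested in reference
        return tested == reference


tester = DateTester()
reference = tester.sanitize(["2024-01-01", "2024-01-02"])
is_selected = tester.test(reference, date(2024, 1, 1))
```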
- class fcollections.core.IVisitor[source]#
Bases:
ABC
Visitor processing an INode.
Visitors interpret a node and return information from it. It is up to the implementation to define which information it can get from the node. Some implementations will only return the node path, others will try to interpret it using semantic definitions.
An important characteristic of the visitor is its ability to advance from a previous visit result. This gives flexibility to implement specific states during the tree traversal.
Additional metadata about the visit is also returned by the visitor. This information should be used for tree traversal and visitor advancement only, and not returned by the walk operation.
- abstractmethod advance(result: VisitResult) IVisitor[source]#
Advance the visitor.
The advancement can either return a reference or a copy of the visitor. If a per-branch state is needed, it is advised to return a copy.
- Parameters:
result – Previous result of a visit. Originally intended to be the visit result of the parent node.
- Returns:
The current visitor or a copy with a modified state
- abstractmethod visit_dir(dir_node: DirNode) VisitResult[source]#
Visits a directory node.
- Parameters:
dir_node – The directory node to visit
- Returns:
Node information and visit metadata.
- abstractmethod visit_file(file_node: DirNode) VisitResult[source]#
Visits a file node.
- Parameters:
file_node – The file node to visit
- Returns:
Node information and visit metadata.
- class fcollections.core.Layout(conventions: list[FileNameConvention])[source]#
Bases:
ILayout
Implements an ILayout with a succession of conventions.
- Parameters:
conventions – List of conventions, with the first element matching the tree root, and the last element matching the last level before the leaves
- generate(root: str, **fields: Any) str[source]#
Generate a path from the fields.
- Parameters:
root – The root path
fields – key/values for interpolating the conventions
- Returns:
A path
- Raises:
ValueError – In case one of the fields required to generate the path is missing
ValueError – In case one of the fields required to generate the path has an improper value
- parse_node(level: int, node: str) tuple[Any, ...][source]#
Interprets a node name.
- Parameters:
level – Depth in the layout. Depth in the layout is the depth of the node with respect to its root minus 1. There is no semantic for the root node, which explains this discrepancy between layout depth and tree depth
node – Node name (not its full path)
- Returns:
Structure information about the node
- set_filters(**references: Any)[source]#
Set filters used to check if a path complies with the layout.
- Parameters:
**references – Key/values matching at least one of the underlying conventions
- test_record(level: int, record: tuple[Any, ...]) bool[source]#
Checks if the node information matches the filters.
The test will look for filters at the considered layout depth, and apply them on the record.
- Parameters:
level – Depth in the layout. Depth in the layout is the depth of the node with respect to its root minus 1. There is no semantic for the root node, which explains this discrepancy between layout depth and tree depth
record – Interpreted node information
- Returns:
True if the node matches the filters, False otherwise
- exception fcollections.core.LayoutMismatchError[source]#
Bases:
VisitError
Raised if all layouts do not match the actual file system structure.
- class fcollections.core.LayoutMismatchHandling(*values)[source]#
Bases:
Enum
Possibilities when a folder or file node does not match any layout.
- IGNORE = 3#
Ignore the mismatch.
- RAISE = 1#
Raise an exception.
- WARN = 2#
Warn the user.
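How a caller might react to each option can be sketched as below. This is an illustration only: the enum is a local mirror of the three documented members, `handle_mismatch` is a hypothetical helper, and `ValueError` stands in for the library's LayoutMismatchError.

```python
import warnings
from enum import Enum


class MismatchHandling(Enum):
    """Local mirror of the three documented options."""
    RAISE = 1
    WARN = 2
    IGNORE = 3


def handle_mismatch(node_name: str, policy: MismatchHandling) -> None:
    """React to a node that matches no layout, following the policy."""
    message = f"{node_name!r} does not match any layout"
    if policy is MismatchHandling.RAISE:
        # Stand-in for the library's LayoutMismatchError.
        raise ValueError(message)
    if policy is MismatchHandling.WARN:
        warnings.warn(message, UserWarning)
    # IGNORE: nothing to do.


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    handle_mismatch("unexpected_dir", MismatchHandling.WARN)
```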
- class fcollections.core.LayoutVisitor(layouts: list[Layout], stat_fields: Iterable[str] = tuple(), on_mismatch_directory: LayoutMismatchHandling = LayoutMismatchHandling.RAISE, on_mismatch_file: LayoutMismatchHandling = LayoutMismatchHandling.IGNORE)[source]#
Bases:
IVisitor
Visitor with node interpretation and branch exploration hints.
The layouts will try to interpret a node and get a record of structured information. Layouts also include filters that are applied to give a hint about tree exploration: if all layouts exclude the current node, exploration should not continue.
- Parameters:
layouts – Semantic definitions for interpreting and testing node meanings
stat_fields – List of node metadata to add to the record
on_mismatch_directory – Behavior on mismatch for directories
on_mismatch_file – Behavior on mismatch for files
- advance(result: VisitResult) LayoutVisitor[source]#
Advance the visitor.
The advancement can either return a reference or a copy of the visitor. If a per-branch state is needed, it is advised to return a copy.
- Parameters:
result – Previous result of a visit. Originally intended to be the visit result of the parent node.
- Returns:
The current visitor or a copy with a modified state
- visit_dir(dir_node: DirNode) VisitResult[source]#
Visits a directory node.
The directory node path is parsed into a structured node. If none of the layouts is able to parse the node, it means we are in uncharted territory: tree traversal hint in the visit result will state we should not continue exploring.
In addition, layout filters are applied on the node information. If all layouts exclude the node, it means no nodes of interest are in this branch: we want to terminate the current branch exploration as soon as possible to speed up the walk operation.
Multiple layouts means multiple semantics are possible. This is the case in a heterogeneous folder. When exploring a branch, some layouts may not match the branch semantic. These are pruned as soon as possible, but only for the current branch.
- Warns:
UserWarning – In case the dir_node does not match any configured layout and on_mismatch is set to WARN
- Raises:
LayoutMismatchError – In case the dir_node does not match any configured layout and on_mismatch is set to RAISE
- Returns:
Node information and visit metadata. The visit metadata includes a tree traversal hint for further exploration, and the surviving layouts that match the current branch
- visit_file(file_node: FileNode) VisitResult[source]#
Visits a file node.
The file node is interpreted to generate a record of structured information. The content of this record depends on the layouts definition. If the interpretation fails, the visit result will not include any information about the node.
Layout filters are also applied to the node record. If all layouts exclude the node, the visit result will not include any information about the node.
- Raises:
KeyError – If the requested stat_fields keys are unknown for the given fsspec implementation
- Returns:
Node information and visit metadata. For file node, no further exploration should be needed. In this case, surviving layouts are not relevant and will not be included in the visit result.
- class fcollections.core.NoLayoutVisitor(convention: FileNameConvention, record_filter: RecordFilter, stat_fields: Iterable[str] = tuple())[source]#
Bases:
IVisitor
Visitor with file node interpretation only.
The given convention will interpret the file nodes, the folders are not interpreted.
- Parameters:
convention – Semantic definitions for interpreting a file node
record_filter – Tester for the file node information
stat_fields – List of node metadata to add to the record
- advance(result: VisitResult) IVisitor[source]#
Advance the visitor.
The advancement can either return a reference or a copy of the visitor. If a per-branch state is needed, it is advised to return a copy.
- Parameters:
result – Previous result of a visit. Originally intended to be the visit result of the parent node.
- Returns:
The current visitor or a copy with a modified state
- visit_dir(dir_node: DirNode) VisitResult[source]#
Visits a directory node.
Transparent visit of a directory node. The visit will not return any information about the node. The metadata will always hint at continuing the branch exploration.
- Parameters:
dir_node – The directory node to visit
- Returns:
Node information and visit metadata.
- visit_file(file_node: DirNode) VisitResult[source]#
Visits a file node.
- Parameters:
file_node – The file node to visit
- Returns:
Node information and visit metadata.
- class fcollections.core.OpenMfDataset(xarray_options: dict[str, str] | None = None)[source]#
Bases:
IFilesReader
Xarray implementation of the IFilesReader interface.
This implementation is a simple wrapper around the xarray.open_mfdataset function. The function parameters are expected to be given as a dictionary to the reader, except for the preprocess argument, which should be given to the read method.
- Parameters:
xarray_options – xarray.open_mfdataset reading options. Set to None to keep xarray defaults
See also
xarray.open_mfdataset
The wrapped reading function
- metadata(file: str, fs: AbstractFileSystem = fs_loc.LocalFileSystem()) GroupMetadata[source]#
Load the metadata of the given file.
Useful to get information about the structure of the dataset, and which variables, dimensions and coordinates are available for reading.
- Parameters:
file – File from which the metadata is read
fs – File system hosting the file
- Returns:
A GroupMetadata containing the variables, dimensions, attributes and subgroups
- read(files: list[str] | list[list[str]], selected_variables: list[str] | None = None, fs: AbstractFileSystem = fs_loc.LocalFileSystem(), preprocess: Callable[[Dataset], Dataset] | None = None, **kwargs: Any) Dataset[source]#
Read a list of files.
- Parameters:
files – List of the files to read
selected_variables – Variables that need to be read. Set to None to read everything
fs – File system hosting the files
preprocess – Preprocessor for open_mfdataset
- Returns:
An xarray dataset containing the selected variables
- class fcollections.core.PeriodMixin[source]#
Bases:
ITemporalMixin
- class fcollections.core.RecordFilter(fields: list[FileNameField], **references)[source]#
Bases:
object
Utility class for filtering values.
- fields#
the fields to filter
- Type:
List[FileNameField]
**references
the values of fields used for selection
- class fcollections.core.StandardVisitor[source]#
Bases:
IVisitor
Visitor for producing the equivalent of fsspec.spec.AbstractFileSystem.walk().
The useful information is a tuple (root, dirs, files) that mimics the standard output of a walk operation.
No additional metadata related to the visit itself is returned.
- advance(result: VisitResult) StandardVisitor[source]#
Advance the visitor.
The advancement can either return a reference or a copy of the visitor. If a per-branch state is needed, it is advised to return a copy.
- Parameters:
result – Previous result of a visit. Originally intended to be the visit result of the parent node.
- Returns:
The current visitor or a copy with a modified state
- visit_dir(dir_node: DirNode) VisitResult[source]#
Visits a directory node.
- Parameters:
dir_node – The directory node to visit
- Returns:
Node information and visit metadata.
- visit_file(file_node: DirNode) VisitResult[source]#
Visits a file node.
- Parameters:
file_node – The file node to visit
- Returns:
Node information and visit metadata.
- class fcollections.core.SubsetsUnmixer(partition_keys: 'tuple[str, ...] | dict[str, tp.Callable | None]', auto_pick_last: 'tuple[str, ...]' = <factory>)[source]#
Bases:
object
- class fcollections.core.VariableMetadata(name: str, dtype: dtype, dimensions: tuple[str, ...], attributes: dict[str, str])[source]#
Bases:
object
Metadata of a variable.
- exception fcollections.core.VisitError[source]#
Bases:
Exception
Raised by the visitor during node visit.
- class fcollections.core.VisitResult(explore_next: bool, payload: ~typing.Any | None = None, surviving_layouts: list[~fcollections.core._listing.Layout] = <factory>)[source]#
Bases:
object
Result of a visit.
The result type is defined by the IVisitor implementations.
Additional information related to the semantic definitions contained in layouts (Layout) is given for further advancement of the visitors.
Tree traversal can also use exploration hints given by the visitors to decide if the current branch should be explored.
See also
walk
Handle tree traversal
- surviving_layouts: list[Layout]#
LayoutVisitor only. Used to know which semantics are still valid for the current branch.
- fcollections.core.compose(func1: Callable[[Dataset], Dataset], *func2: Callable[[Dataset], Dataset] | None) Callable[[Dataset], Dataset][source]#
Compose multiple functions that preprocess an xarray Dataset.
Before calling xr.open_mfdataset, it is useful to set up various preprocessing steps. For example, one might want to crop a subset of the dataset, and then create an index before the xarray combination step kicks in.
This function is a utility that makes it easier to chain such preprocessors.
The call order is the same as the input arguments: func1 is called first, func2[0] is called second and so on.
- Parameters:
func1 – First preprocessor. Cannot be None
*func2 – Subsequent preprocessors. None elements will be ignored
- Returns:
The chained functions
See also
xarray.open_mfdataset
method that takes chained preprocessors as an input
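The chaining behaviour described for compose (left-to-right call order, None entries ignored) can be sketched with `functools.reduce`. The function `compose_sketch` below is a hypothetical re-creation, not the library's implementation, and plain lists stand in for xarray datasets to keep it dependency-free.

```python
from functools import reduce
from typing import Callable, Optional, TypeVar

D = TypeVar("D")


def compose_sketch(
    func1: Callable[[D], D], *others: Optional[Callable[[D], D]]
) -> Callable[[D], D]:
    """Chain preprocessors left to right, skipping None entries."""
    functions = [func1, *(f for f in others if f is not None)]

    def chained(data: D) -> D:
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), functions, data)

    return chained


def crop(values: list) -> list:
    """Keep the first three elements (stands in for a dataset crop)."""
    return values[:3]


def double(values: list) -> list:
    """Double every element (stands in for a later preprocessing step)."""
    return [2 * v for v in values]


pipeline = compose_sketch(crop, None, double)
result = pipeline([1, 2, 3, 4])
```

Here `crop` runs first and `double` second, so the fourth element is dropped before doubling.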
- fcollections.core.group_metadata_from_netcdf(nds: nc4.Dataset) GroupMetadata[source]#
Extract metadata from a netcdf dataset.
- Parameters:
nds – The netcdf dataset from which we want the metadata
- Returns:
The associated GroupMetadata
- fcollections.core.walk(node: INode, visitor: IVisitor) Iterator[Any][source]#
Recursive walk of a file system tree.
This is a reimplementation of the similar os.walk() and fsspec.spec.AbstractFileSystem.walk(). The motivation for the reimplementation is that we need to inject some complex logic (node parsing and branch exploration) during the tree traversal.
- Parameters:
node – File or folder node representing a path on the filesystem
visitor – Visitor that will process the node and produce some results
- Raises:
VisitError – Raised by the visitor to signal something went wrong during a node visit
- Yields:
The results of all visits in the tree. The result type will depend on the visitor implementation
See also
StandardVisitor
Visitor returning (root, dirs, files) tuples similar to a conventional walk
LayoutVisitor
Visitor that can interpret the node paths and return structured information
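The interplay between a walk and a visitor that gives exploration hints can be sketched on an in-memory tree. Everything below is hypothetical and standalone: `Result`, `PrefixVisitor` and `toy_walk` are simplified stand-ins for VisitResult, an IVisitor implementation and walk, and a nested dictionary replaces the file system.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Result:
    """Toy visit result: a payload plus an exploration hint."""
    explore_next: bool
    payload: Optional[Any] = None


class PrefixVisitor:
    """Hypothetical visitor: skip directories starting with a prefix."""

    def __init__(self, skipped_prefix: str):
        self.skipped_prefix = skipped_prefix

    def visit_dir(self, name: str) -> Result:
        # Directories carry no payload, only a hint about exploration.
        return Result(explore_next=not name.startswith(self.skipped_prefix))

    def visit_file(self, name: str) -> Result:
        return Result(explore_next=False, payload=name)


def toy_walk(name: str, children: Optional[dict], visitor: PrefixVisitor) -> Iterator[Any]:
    """Depth-first traversal; children is None for a file, a dict for a directory."""
    if children is None:
        result = visitor.visit_file(name)
    else:
        result = visitor.visit_dir(name)
    if result.payload is not None:
        yield result.payload
    if children is not None and result.explore_next:
        for child_name, grandchildren in children.items():
            yield from toy_walk(child_name, grandchildren, visitor)


tree = {"v1": {"a.nc": None, "b.nc": None}, "tmp_cache": {"c.nc": None}}
files = list(toy_walk("root", tree, PrefixVisitor("tmp")))
```

The `tmp_cache` branch is never entered because its visit result says not to explore further, so `c.nc` is not yielded.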