| Title: | HDF5-backed DataFrame objects and methods |
|---|---|
| Description: | HDF5DataFrame is an R/Bioconductor package for HDF5-backed DataFrame objects and methods. This allows HDF5-backed data to be easily used as data frames with arbitrary sets of columns. |
| Authors: | Artür Manukyan [aut, cre, fnd] (ORCID: <https://orcid.org/0000-0002-0441-9517>) |
| Maintainer: | Artür Manukyan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.3 |
| Built: | 2026-06-04 07:22:21 UTC |
| Source: | https://github.com/BIMSBbioinfo/HDF5DataFrame |
Represent a column of an HDF5-based data frame as a 1-dimensional DelayedArray. This allows HDF5-backed data to be used inside a DataFrame without loading it into memory.

    HDF5ColumnSeed(path, name, column, type = NULL, length = NULL)

    HDF5ColumnVector(x, ...)

    ## S4 method for signature 'HDF5ColumnSeed'
    DelayedArray(seed)

    ## S4 method for signature 'HDF5ColumnSeed'
    dim(x)

    ## S4 method for signature 'HDF5ColumnSeed'
    type(x)

    ## S4 method for signature 'HDF5ColumnSeed'
    path(object)

    ## S4 method for signature 'HDF5ColumnSeed'
    extract_array(x, index)

| path | String containing a path to a HDF5-based data frame. |
|---|---|
| name | String containing the HDF5 group of the h5 file. |
| column | String containing the name of the column inside the file. |
| type | String specifying the type of the data. If |
| length | Integer containing the number of rows. If |
| x | Either a string containing the path to an HDF5-based data frame file (to be used as |
| ... | Further arguments to be passed to the |
| seed, object | A HDF5ColumnSeed object |
| index | An unnamed list of integer vectors, one per dimension in x. See extract_array |
For HDF5ColumnSeed, an HDF5ColumnSeed object is returned.
For HDF5ColumnVector, a HDF5ColumnVector is returned.
Artür Manukyan

    # libraries
    library(rhdf5)
    library(HDF5Array)
    library(HDF5DataFrame)

    # h5: create the file and the group that will hold the columns
    output_hdf5 <- tempfile(fileext = ".h5")
    h5createFile(output_hdf5)
    h5createGroup(output_hdf5, group = "metadata")

    # data
    data("chickwts")
    metadata <- chickwts

    # set metadata: write each column as a 1-dimensional HDF5 dataset
    meta.data_list <- list()
    for (i in seq_len(ncol(metadata))) {
      cur_column <- metadata[[i]]
      # factors and strings are stored as character data
      if (is.character(cur_column) || is.factor(cur_column))
        cur_column <- as.character(cur_column)
      cur_column <- as.array(cur_column)
      meta.data_list[[colnames(metadata)[i]]] <-
        writeHDF5Array(cur_column, output_hdf5,
                       name = paste0("metadata", "/", colnames(metadata)[i]),
                       with.dimnames = FALSE)
    }

    # define hdf5columnseed for the first column
    # (column, path and type all refer to the same column)
    columnseed <- HDF5ColumnSeed(path = path(meta.data_list[[1]]),
                                 name = "metadata",
                                 column = colnames(metadata)[1],
                                 type = type(meta.data_list[[1]]))

    # methods
    dim(columnseed)
    path(columnseed)
    type(columnseed)

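The usage section lists a `DelayedArray()` method for HDF5ColumnSeed, which the example does not exercise. The following is a minimal sketch of that step, assuming the on-disk layout used in the example (one dataset per column inside the `metadata` group) and assuming that `as.vector()` realizes the column through the documented `extract_array()` method. The variable names `h5_file`, `weights`, `written`, `seed` and `vec` are illustrative only.

```r
library(rhdf5)
library(HDF5Array)
library(HDF5DataFrame)

# Write a single numeric column to <file>/metadata/weight
h5_file <- tempfile(fileext = ".h5")
h5createFile(h5_file)
h5createGroup(h5_file, group = "metadata")
weights <- as.array(chickwts$weight)
written <- writeHDF5Array(weights, h5_file,
                          name = "metadata/weight",
                          with.dimnames = FALSE)

# Describe the column with a seed, then wrap it in a DelayedArray
seed <- HDF5ColumnSeed(path = h5_file,
                       name = "metadata",
                       column = "weight",
                       type = type(written))
vec <- DelayedArray(seed)

# The data stay on disk until they are explicitly realized
length(vec)
head(as.vector(vec))
```

Nothing is read from the file when `vec` is created; values are only loaded when they are requested, here by `as.vector()`.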
The HDF5ColumnSeed class for HDF5ColumnVector.
| path | The path (as a single string or H5File object) to the HDF5 file where the dataset is located. |
|---|---|
| name | The name of the dataset in the HDF5 file. |
| column | The names of the columns, see HDF5ColumnVector. |
| length | The length of the HDF5Array. |
The HDF5ColumnVector class, used for each column of an HDF5DataFrame.
| seed | An HDF5ColumnSeed object. |
|---|---|
Create a HDF5-backed DataFrame, where the data are kept on disk until requested.

    HDF5DataFrame(filepath, name = "", columns = NULL)

    ## S4 method for signature 'HDF5DataFrame'
    nrow(x)

    ## S4 method for signature 'HDF5DataFrame'
    length(x)

    ## S4 method for signature 'HDF5DataFrame'
    path(object)

    ## S4 method for signature 'HDF5DataFrame'
    rownames(x)

    ## S4 method for signature 'HDF5DataFrame'
    names(x)

    ## S4 replacement method for signature 'HDF5DataFrame'
    rownames(x) <- value

    ## S4 replacement method for signature 'HDF5DataFrame'
    names(x) <- value

    ## S4 method for signature 'HDF5DataFrame'
    x[[i, j, ...]]

    ## S4 replacement method for signature 'HDF5DataFrame'
    x[[i, j, ...]] <- value

    ## S4 method for signature 'HDF5DataFrame'
    cbind(..., deparse.level = 1)

    ## S4 method for signature 'HDF5DataFrame'
    as.data.frame(x, row.names = NULL, optional = FALSE, ...)

| filepath | NULL or the path (as a single string) to the (new or existing) HDF5 file in which to write the dataset. |
|---|---|
| name | Name of the HDF5 group of the h5 file. |
| columns | Character vector containing the names of columns in a HDF5-based data frame. If |
| x, object | A set of HDF5Arrays that are the columns of the HDF5DataFrame object. |
| value | Row names, names or new columns for the HDF5DataFrame object. |
| i | Depends on the usage. |
| j | Depends on the usage. |
| ... | Arguments passed to other methods. |
| deparse.level | See ?base::cbind for a description of this argument. |
| row.names, optional | See ?base::as.data.frame for a description of these arguments. |
A HDF5DataFrame object where each column is a HDF5ColumnVector.
HDF5DataFrame object
number of rows of HDF5DataFrame object
length of HDF5DataFrame object
path to hdf5 file of HDF5DataFrame object
rownames of HDF5DataFrame object
names of columns of HDF5DataFrame object
HDF5DataFrame object
data.frame object
Artür Manukyan

    # libraries
    library(rhdf5)
    library(HDF5Array)
    library(HDF5DataFrame)

    # h5
    output_hdf5 <- tempfile(fileext = ".h5")

    # data
    data("chickwts")
    metadata <- chickwts

    # write data frame to HDF5
    metadata_large <- writeHDF5DataFrame(metadata,
                                         filepath = output_hdf5,
                                         name = "metadata",
                                         replace = TRUE)
    metadata_large <- HDF5DataFrame(filepath = output_hdf5, name = "metadata")

    # coerce to data.frame
    metadata_large <- as.data.frame(metadata_large)

    # cbind
    metadata_large <- cbind(metadata_large, metadata)

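The accessor methods listed in the usage section (`nrow`, `length`, `names`, `path` and `[[`) are not covered by the example. A short sketch of how they might be called is given below. It assumes that `writeHDF5DataFrame()` returns an HDF5DataFrame, as stated in its value section, and that `[[` returns a single column as an HDF5ColumnVector. The names `h5_file`, `hdf5_df` and `weight_col` are illustrative.

```r
library(rhdf5)
library(HDF5Array)
library(HDF5DataFrame)

# Write the chickwts data frame and keep the returned HDF5DataFrame
h5_file <- tempfile(fileext = ".h5")
hdf5_df <- writeHDF5DataFrame(chickwts,
                              filepath = h5_file,
                              name = "metadata")

# Basic accessors documented for HDF5DataFrame
nrow(hdf5_df)    # number of rows
length(hdf5_df)  # length of the object
names(hdf5_df)   # column names
path(hdf5_df)    # path to the backing HDF5 file

# Extract one column by name; the data remain on disk until realized
weight_col <- hdf5_df[["weight"]]
head(as.vector(weight_col))
```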
The HDF5DataFrame class is a DataFrame subclass for representing datasets with arbitrary collections of columns stored in HDF5.
| path | The path (as a single string or H5File object) to the HDF5 file where the dataset is located. |
|---|---|
| name | The name of the group in the HDF5 file. |
| columns | The names of the columns, see HDF5ColumnVector. |
Low-level utility functions and classes to support subsetting of vector-like objects. They are not intended to be used directly. See extractROWS.

    ## S4 method for signature 'HDF5DataFrame,ANY'
    extractROWS(x, i)

    ## S4 method for signature 'HDF5DataFrame'
    extractCOLS(x, i)

    ## S4 method for signature 'HDF5DataFrame'
    replaceROWS(x, i, value)

    ## S4 method for signature 'HDF5DataFrame'
    replaceCOLS(x, i, value)

| x | HDF5DataFrame object |
|---|---|
| i | row/column index or name |
| value | vector to be replaced |
HDF5DataFrame object
HDF5DataFrame object
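Because these methods are not meant to be called directly, they are normally reached through the standard `[` operator, which for DataFrame objects dispatches to `extractROWS()` and `extractCOLS()`. The sketch below illustrates this under the assumption that HDF5DataFrame inherits the usual DataFrame `[` behaviour. The names `h5_file`, `hdf5_df`, `first_rows` and `feed_only` are illustrative.

```r
library(rhdf5)
library(HDF5Array)
library(HDF5DataFrame)

# Build an HDF5-backed data frame from chickwts
h5_file <- tempfile(fileext = ".h5")
hdf5_df <- writeHDF5DataFrame(chickwts,
                              filepath = h5_file,
                              name = "metadata")

# Row subsetting goes through extractROWS()
first_rows <- hdf5_df[1:5, ]
nrow(first_rows)

# Column subsetting goes through extractCOLS();
# drop = FALSE keeps the result a data frame rather than a single column
feed_only <- hdf5_df[, "feed", drop = FALSE]
names(feed_only)
```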
A function for writing data frames to an HDF5 file.

    writeHDF5DataFrame(x, filepath, name = "", replace = FALSE)

| x | data.frame |
|---|---|
| filepath | NULL or the path (as a single string) to the (new or existing) HDF5 file where to write the dataset. See writeHDF5Array |
| name | NULL or the name of the HDF5 group to write columns of the dataset. |
| replace | replace |
HDF5DataFrame object

    # libraries
    library(rhdf5)
    library(HDF5Array)
    library(HDF5DataFrame)

    # h5
    output_hdf5 <- tempfile(fileext = ".h5")

    # data
    data("chickwts")
    metadata <- chickwts

    # write data frame to HDF5
    metadata_large <- writeHDF5DataFrame(metadata,
                                         filepath = output_hdf5,
                                         name = "metadata")

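Once a data frame has been written, the file can be reopened with the `HDF5DataFrame()` constructor. The sketch below reopens only one of the written columns through the `columns` argument. It assumes that `columns` restricts the resulting object to the named columns, which is an interpretation of the argument description rather than confirmed behaviour. The names `h5_file`, `written` and `weights_only` are illustrative.

```r
library(rhdf5)
library(HDF5Array)
library(HDF5DataFrame)

# Write all columns of chickwts into the "metadata" group
h5_file <- tempfile(fileext = ".h5")
written <- writeHDF5DataFrame(chickwts,
                              filepath = h5_file,
                              name = "metadata")

# Reopen the same file, requesting a single column
# (assumption: `columns` selects which columns are exposed)
weights_only <- HDF5DataFrame(filepath = h5_file,
                              name = "metadata",
                              columns = "weight")

names(weights_only)
nrow(weights_only)
```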