@Namespace(value="arrow::dataset") @NoOffset @Properties(inherit=arrow_dataset.class) public class ParquetDatasetFactory extends DatasetFactory
Creates a FileSystemDataset from a custom _metadata cache file.
Dask and other systems will generate a cache metadata file by concatenating
the RowGroupMetaData of multiple parquet files into a single parquet file
that only contains metadata and no ColumnChunk data.
ParquetDatasetFactory creates a FileSystemDataset composed of
ParquetFileFragment where each fragment is pre-populated with the exact
number of row groups and statistics for each column.
Nested classes inherited from class Pointer: Pointer.CustomDeallocator, Pointer.Deallocator, Pointer.NativeDeallocator, Pointer.ReferenceCounter
| Constructor and Description |
|---|
| ParquetDatasetFactory(Pointer p): Pointer cast constructor. |
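The class description says that a _metadata file concatenates the RowGroupMetaData of many parquet files, and that each resulting fragment knows its exact row groups. The following is a minimal sketch of that idea in plain Java, not the Arrow implementation: `RowGroupEntry`, `groupByFile`, and the file names are hypothetical, and the row count stands in for the per-column statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these types belong to the Arrow API.
final class MetadataGrouping {

    // Hypothetical stand-in for one row group entry of a _metadata file.
    static final class RowGroupEntry {
        final String filePath;   // data file that holds the row group
        final int rowGroupIndex; // position of the row group in that file
        final long numRows;      // row count, standing in for statistics

        RowGroupEntry(String filePath, int rowGroupIndex, long numRows) {
            this.filePath = filePath;
            this.rowGroupIndex = rowGroupIndex;
            this.numRows = numRows;
        }
    }

    // Groups entries by data file, keeping first-seen file order, so that
    // each key plays the role of one fragment with its row groups attached.
    static Map<String, List<RowGroupEntry>> groupByFile(List<RowGroupEntry> entries) {
        Map<String, List<RowGroupEntry>> fragments = new LinkedHashMap<>();
        for (RowGroupEntry entry : entries) {
            fragments.computeIfAbsent(entry.filePath, k -> new ArrayList<>()).add(entry);
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<RowGroupEntry> entries = Arrays.asList(
            new RowGroupEntry("part-0.parquet", 0, 100),
            new RowGroupEntry("part-0.parquet", 1, 50),
            new RowGroupEntry("part-1.parquet", 0, 75));
        groupByFile(entries).forEach((path, groups) ->
            System.out.println(path + ": " + groups.size() + " row group(s)"));
    }
}
```

Because the row groups are already known per file in this model, a consumer would not need to open each data file footer to learn how many row groups it holds, which is the benefit the description attributes to the pre-populated fragments.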
| Modifier and Type | Method and Description |
|---|---|
| DatasetResult | Finish(FinishOptions options): Create a Dataset with the given options. |
| SchemaVectorResult | InspectSchemas(InspectOptions options): Get the schemas of the Fragments and Partitioning. |
| static DatasetFactoryResult | Make(BytePointer metadata_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options) |
| static DatasetFactoryResult | Make(FileSource metadata, BytePointer base_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options) |
| static DatasetFactoryResult | Make(FileSource metadata, String base_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options): Create a ParquetDatasetFactory from a metadata source. |
| static DatasetFactoryResult | Make(String metadata_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options): Create a ParquetDatasetFactory from a metadata path. |
Methods inherited from class DatasetFactory: Finish, Finish, Inspect, Inspect, root_partition, SetRootPartition
Methods inherited from class Pointer: address, asBuffer, asByteBuffer, availablePhysicalBytes, calloc, capacity, capacity, close, deallocate, deallocate, deallocateReferences, deallocator, deallocator, equals, fill, formatBytes, free, getDirectBufferAddress, getPointer, getPointer, getPointer, getPointer, hashCode, interruptDeallocatorThread, isNull, isNull, limit, limit, malloc, maxBytes, maxPhysicalBytes, memchr, memcmp, memcpy, memmove, memset, offsetAddress, offsetof, offsetof, parseBytes, physicalBytes, physicalBytesInaccurate, position, position, put, realloc, referenceCount, releaseReference, retainReference, setNull, sizeof, sizeof, toString, totalBytes, totalCount, totalPhysicalBytes, withDeallocator, zero
public ParquetDatasetFactory(Pointer p)
Pointer cast constructor. Invokes Pointer(Pointer).
@ByVal public static DatasetFactoryResult Make(@StdString String metadata_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
Create a ParquetDatasetFactory from a metadata path.
The metadata_path will be read from filesystem. The path of each RowGroup
listed in the metadata file is interpreted relative to dirname(metadata_path).
Parameters:
metadata_path - path of the metadata parquet file
filesystem - filesystem from which to open and read the path
format - format used to read the file
options - see ParquetFactoryOptions
@ByVal public static DatasetFactoryResult Make(@StdString BytePointer metadata_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
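The rule that RowGroup paths are relative to dirname(metadata_path) can be illustrated with a short sketch. This is not how the library resolves paths internally: `resolveAgainstMetadataDir` is a hypothetical helper, the example paths are made up, and `java.nio.file` is used here only as a convenient stand-in for the filesystem's own path handling.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper only; not part of the Arrow API.
final class MetadataPathResolution {

    // Resolves a data file path recorded in the metadata file against the
    // directory that contains the metadata file, i.e. dirname(metadataPath).
    static String resolveAgainstMetadataDir(String metadataPath, String relativeFilePath) {
        Path parent = Paths.get(metadataPath).getParent();
        Path resolved = (parent == null)
            ? Paths.get(relativeFilePath)       // metadata file has no parent directory
            : parent.resolve(relativeFilePath); // prefix with dirname(metadataPath)
        // Use forward slashes so the result looks the same on every platform.
        return resolved.normalize().toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        // Hypothetical layout: the metadata file sits at the dataset root.
        System.out.println(resolveAgainstMetadataDir(
            "/data/sales/_metadata", "year=2021/part-0.parquet"));
    }
}
```

With this layout, a data file recorded as year=2021/part-0.parquet would be looked up under /data/sales, which is why the metadata file is normally kept at the root of the dataset directory.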
@ByVal public static DatasetFactoryResult Make(@Const @ByRef FileSource metadata, @StdString String base_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
Create a ParquetDatasetFactory from a metadata source.
Parameters:
metadata - source from which to open the metadata parquet file
base_path - prefix applied to the path of every parquet file referenced
filesystem - filesystem from which to read the referenced files
format - format used to read the file
options - see ParquetFactoryOptions
@ByVal public static DatasetFactoryResult Make(@Const @ByRef FileSource metadata, @StdString BytePointer base_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
@ByVal public SchemaVectorResult InspectSchemas(@ByVal InspectOptions options)
Get the schemas of the Fragments and Partitioning.
Overrides: InspectSchemas in class DatasetFactory
@ByVal public DatasetResult Finish(@ByVal FinishOptions options)
Create a Dataset with the given options.
Overrides: Finish in class DatasetFactory
Copyright © 2022. All rights reserved.