@Namespace(value="arrow::dataset") @NoOffset @Properties(inherit=arrow_dataset.class) public class ParquetDatasetFactory extends DatasetFactory
Creates a FileSystemDataset from a custom _metadata cache file.
Dask and other systems will generate a cache metadata file by concatenating
the RowGroupMetaData of multiple parquet files into a single parquet file
that only contains metadata and no ColumnChunk data.
ParquetDatasetFactory creates a FileSystemDataset composed of
ParquetFileFragment instances, where each fragment is pre-populated with the
exact number of row groups and the statistics for each column.
Nested classes/interfaces inherited from class Pointer: Pointer.CustomDeallocator, Pointer.Deallocator, Pointer.NativeDeallocator, Pointer.ReferenceCounter
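The relationship between a concatenated metadata file and the resulting fragments can be pictured with a small, library-free model. This is only a conceptual sketch: `MetadataCacheSketch`, `RowGroupEntry`, and `groupByFile` are hypothetical names invented for illustration, not Arrow classes, and Arrow's actual implementation is not shown here. The sketch assumes each row group entry records the data file it belongs to, and groups the entries so that each file maps to one fragment with a known row group count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: these types are hypothetical and are not Arrow classes.
final class MetadataCacheSketch {

    /** Hypothetical stand-in for one RowGroupMetaData entry in a _metadata file. */
    static final class RowGroupEntry {
        final String filePath; // data file this row group belongs to
        final long numRows;    // row count recorded in the metadata

        RowGroupEntry(String filePath, long numRows) {
            this.filePath = filePath;
            this.numRows = numRows;
        }
    }

    /** Groups entries by data file, preserving the order in which files first appear. */
    static Map<String, List<RowGroupEntry>> groupByFile(List<RowGroupEntry> entries) {
        Map<String, List<RowGroupEntry>> byFile = new LinkedHashMap<>();
        for (RowGroupEntry entry : entries) {
            byFile.computeIfAbsent(entry.filePath, k -> new ArrayList<>()).add(entry);
        }
        return byFile;
    }

    public static void main(String[] args) {
        List<RowGroupEntry> entries = List.of(
            new RowGroupEntry("part-0.parquet", 1000),
            new RowGroupEntry("part-0.parquet", 500),
            new RowGroupEntry("part-1.parquet", 800));
        // One map entry per data file, i.e. one fragment per file in this model.
        groupByFile(entries).forEach((file, groups) ->
            System.out.println(file + ": " + groups.size() + " row group(s)"));
    }
}
```

In this model the number of row groups per fragment is known without opening any data file, which is the property the class description attributes to the pre-populated fragments.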
Constructor and Description |
---|
ParquetDatasetFactory(Pointer p) - Pointer cast constructor. |
Modifier and Type | Method and Description |
---|---|
DatasetResult | Finish(FinishOptions options) - Create a Dataset with the given options. |
SchemaVectorResult | InspectSchemas(InspectOptions options) - Get the schemas of the Fragments and Partitioning. |
static DatasetFactoryResult | Make(BytePointer metadata_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options) |
static DatasetFactoryResult | Make(FileSource metadata, BytePointer base_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options) |
static DatasetFactoryResult | Make(FileSource metadata, String base_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options) - Create a ParquetDatasetFactory from a metadata source. |
static DatasetFactoryResult | Make(String metadata_path, FileSystem filesystem, ParquetFileFormat format, ParquetFactoryOptions options) - Create a ParquetDatasetFactory from a metadata path. |
Methods inherited from class DatasetFactory: Finish, Finish, Inspect, Inspect, root_partition, SetRootPartition
Methods inherited from class Pointer: address, asBuffer, asByteBuffer, availablePhysicalBytes, calloc, capacity, capacity, close, deallocate, deallocate, deallocateReferences, deallocator, deallocator, equals, fill, formatBytes, free, getDirectBufferAddress, getPointer, getPointer, getPointer, getPointer, hashCode, interruptDeallocatorThread, isNull, isNull, limit, limit, malloc, maxBytes, maxPhysicalBytes, memchr, memcmp, memcpy, memmove, memset, offsetAddress, offsetof, offsetof, parseBytes, physicalBytes, physicalBytesInaccurate, position, position, put, realloc, referenceCount, releaseReference, retainReference, setNull, sizeof, sizeof, toString, totalBytes, totalCount, totalPhysicalBytes, withDeallocator, zero
public ParquetDatasetFactory(Pointer p)
Pointer cast constructor. Invokes Pointer(Pointer).
@ByVal public static DatasetFactoryResult Make(@StdString String metadata_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
Create a ParquetDatasetFactory from a metadata path.
The metadata_path will be read from filesystem. Each RowGroup contained in the metadata file will be relative to dirname(metadata_path).
Parameters:
metadata_path - [in] path of the metadata parquet file
filesystem - [in] from which to open/read the path
format - [in] to read the file with.
options - [in] see ParquetFactoryOptions
@ByVal public static DatasetFactoryResult Make(@StdString BytePointer metadata_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
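The rule that each RowGroup is relative to dirname(metadata_path) can be illustrated with plain string handling. The following is a minimal sketch under stated assumptions, not Arrow's path logic: `MetadataPathSketch`, `dirname`, and `resolve` are hypothetical helpers, paths are assumed to use `/` as separator, and the example paths are invented.

```java
// Illustration only: hypothetical helpers, not part of the Arrow API.
final class MetadataPathSketch {

    /** Returns the parent directory of a '/'-separated path ("" if there is none). */
    static String dirname(String path) {
        int slash = path.lastIndexOf('/');
        if (slash < 0) {
            return "";   // bare file name, no directory component
        }
        if (slash == 0) {
            return "/";  // file directly under the root
        }
        return path.substring(0, slash);
    }

    /** Resolves a row group's relative file path against dirname(metadataPath). */
    static String resolve(String metadataPath, String rowGroupFilePath) {
        String dir = dirname(metadataPath);
        if (dir.isEmpty()) {
            return rowGroupFilePath;
        }
        return dir.endsWith("/") ? dir + rowGroupFilePath : dir + "/" + rowGroupFilePath;
    }

    public static void main(String[] args) {
        // Invented example paths.
        System.out.println(resolve("/data/sales/_metadata", "year=2021/part-0.parquet"));
    }
}
```

With these assumptions, a metadata file at `/data/sales/_metadata` that references `year=2021/part-0.parquet` resolves to `/data/sales/year=2021/part-0.parquet`.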
@ByVal public static DatasetFactoryResult Make(@Const @ByRef FileSource metadata, @StdString String base_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
Create a ParquetDatasetFactory from a metadata source.
Parameters:
metadata - [in] source to open the metadata parquet file from
base_path - [in] used as the prefix of every parquet file referenced
filesystem - [in] from which to read the files referenced.
format - [in] to read the file with.
options - [in] see ParquetFactoryOptions
@ByVal public static DatasetFactoryResult Make(@Const @ByRef FileSource metadata, @StdString BytePointer base_path, @SharedPtr FileSystem filesystem, @SharedPtr ParquetFileFormat format, @ByVal ParquetFactoryOptions options)
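When the metadata comes from a FileSource, there is no metadata path to derive a directory from, so base_path is supplied explicitly as the prefix. A rough sketch of that prefixing follows; `BasePathSketch` and `prefixAll` are hypothetical names, the `/` separator is an assumption, and how Arrow itself joins the paths may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not part of the Arrow API.
final class BasePathSketch {

    /** Prepends basePath to every referenced file path, adding one '/' if needed. */
    static List<String> prefixAll(String basePath, List<String> referencedPaths) {
        String prefix = (basePath.isEmpty() || basePath.endsWith("/"))
            ? basePath
            : basePath + "/";
        List<String> result = new ArrayList<>();
        for (String path : referencedPaths) {
            result.add(prefix + path);
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented example values.
        List<String> files = List.of("part-0.parquet", "part-1.parquet");
        prefixAll("/warehouse/events", files).forEach(System.out::println);
    }
}
```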
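@ByVal public SchemaVectorResult InspectSchemas(@ByVal InspectOptions options)
Description copied from class: DatasetFactory
Get the schemas of the Fragments and Partitioning.
Overrides: InspectSchemas in class DatasetFactory
@ByVal public DatasetResult Finish(@ByVal FinishOptions options)
Description copied from class: DatasetFactory
Create a Dataset with the given options.
Overrides: Finish in class DatasetFactory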
Copyright © 2022. All rights reserved.