@Namespace(value="arrow::dataset") @NoOffset @Properties(inherit=arrow_dataset.class) public class Scanner extends Pointer
\brief A scanner glues together several dataset classes to load data.

The dataset contains a collection of fragments and partitioning rules. The fragments identify independently loadable units of data: each fragment has a potentially unique schema and possibly even a different format, and it should be possible to read fragments in parallel if desired. The fragment's format contains the logic necessary to create a task that loads the fragment into memory. That task may or may not support parallel execution of its own.

The scanner is then responsible for creating scan tasks from every fragment in the dataset and (potentially) sequencing the loaded record batches together. The scanner should not buffer the entire dataset in memory (unless asked), instead yielding record batches as soon as they are ready to scan. Various readahead properties control how much data may be scanned before pausing to let a slow consumer catch up. Today the scanner also handles projection and filtering, although that may change in the future.
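The readahead behaviour described above can be pictured as a bounded buffer between the scanner and its consumer. The following is a minimal conceptual sketch using only the Java standard library; `ReadaheadSketch`, `scan`, and the integer "batches" are hypothetical names chosen for illustration and are not part of the Arrow API. It assumes a single producer thread and models each record batch as an integer id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual model only: none of these names belong to the Arrow API.
final class ReadaheadSketch {
    // Sentinel value that marks the end of the stream.
    private static final int END = -1;

    /**
     * Produces batch ids 0..totalBatches-1 on a background thread while holding
     * at most `readahead` undelivered batches. Returns the ids in the order the
     * consumer received them and records the largest queue size it observed.
     */
    static List<Integer> scan(int totalBatches, int readahead, AtomicInteger maxBuffered) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(readahead);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < totalBatches; i++) {
                    queue.put(i); // blocks once `readahead` batches are pending
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();

        List<Integer> consumed = new ArrayList<>();
        try {
            while (true) {
                maxBuffered.accumulateAndGet(queue.size(), Math::max);
                int batch = queue.take();
                if (batch == END) {
                    break;
                }
                consumed.add(batch);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        AtomicInteger peak = new AtomicInteger();
        List<Integer> batches = scan(10, 2, peak);
        System.out.println(batches.size() + " batches consumed, peak buffered = " + peak.get());
    }
}
```

Because the queue has a fixed capacity, the producer pauses whenever the consumer falls behind, so memory use stays bounded regardless of dataset size. The real scanner's readahead options are configured elsewhere and may behave differently in detail.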
Nested classes/interfaces inherited from class Pointer: Pointer.CustomDeallocator, Pointer.Deallocator, Pointer.NativeDeallocator, Pointer.ReferenceCounter
Constructor and Description |
---|
Scanner(Pointer p) Pointer cast constructor. |
Modifier and Type | Method and Description |
---|---|
LongResult | CountRows() \brief Count rows matching a predicate. |
Dataset | dataset() \brief Get the dataset that this scanner will scan. |
TableResult | Head(long num_rows) \brief Get the first N rows. |
ScanOptions | options() \brief Get the options for this scan. |
ScanTaskIteratorResult | Scan() Deprecated. |
Status | Scan(arrow_dataset.TaggedRecordBatchVisitor visitor) \brief Apply a visitor to each RecordBatch as it is scanned. |
TaggedRecordBatchIteratorResult | ScanBatches() \brief Scan the dataset into a stream of record batches. |
TaggedRecordBatchGeneratorResult | ScanBatchesAsync() |
EnumeratedRecordBatchIteratorResult | ScanBatchesUnordered() \brief Scan the dataset into a stream of record batches. |
EnumeratedRecordBatchGeneratorResult | ScanBatchesUnorderedAsync() |
TableResult | TakeRows(Array indices) \brief A convenience to synchronously load the given rows by index. |
RecordBatchReaderSharedResult | ToRecordBatchReader() \brief Convert the Scanner to a RecordBatchReader so it can be easily used with APIs that expect a reader. |
TableResult | ToTable() \brief Convert a Scanner into a Table. |
Methods inherited from class Pointer: address, asBuffer, asByteBuffer, availablePhysicalBytes, calloc, capacity, capacity, close, deallocate, deallocate, deallocateReferences, deallocator, deallocator, equals, fill, formatBytes, free, getDirectBufferAddress, getPointer, getPointer, getPointer, getPointer, hashCode, interruptDeallocatorThread, isNull, isNull, limit, limit, malloc, maxBytes, maxPhysicalBytes, memchr, memcmp, memcpy, memmove, memset, offsetAddress, offsetof, offsetof, parseBytes, physicalBytes, physicalBytesInaccurate, position, position, put, realloc, referenceCount, releaseReference, retainReference, setNull, sizeof, sizeof, toString, totalBytes, totalCount, totalPhysicalBytes, withDeallocator, zero
public Scanner(Pointer p)
Pointer cast constructor. Invokes Pointer(Pointer).
@Deprecated @ByVal public ScanTaskIteratorResult Scan()
@ByVal public Status Scan(@ByVal arrow_dataset.TaggedRecordBatchVisitor visitor)
@ByVal public TableResult ToTable()
@ByVal public TaggedRecordBatchIteratorResult ScanBatches()
@ByVal public TaggedRecordBatchGeneratorResult ScanBatchesAsync()
@ByVal public EnumeratedRecordBatchIteratorResult ScanBatchesUnordered()
@ByVal public EnumeratedRecordBatchGeneratorResult ScanBatchesUnorderedAsync()
@ByVal public TableResult TakeRows(@Const @ByRef Array indices)
@ByVal public TableResult Head(@Cast(value="int64_t") long num_rows)
@ByVal public LongResult CountRows()
@ByVal public RecordBatchReaderSharedResult ToRecordBatchReader()
@SharedPtr public ScanOptions options()
@Const @SharedPtr @ByRef public Dataset dataset()
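The difference between the ordered and unordered scan methods listed above can be illustrated with a small model of position-tagged batches. This is a conceptual sketch using only the Java standard library; `ResequenceSketch`, `EnumeratedBatch`, and its fields are hypothetical stand-ins and do not reflect the actual layout of the Arrow enumerated record batch type. It assumes each batch carries the index of its fragment and its index within that fragment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Conceptual model only: these types are hypothetical, not Arrow classes.
final class ResequenceSketch {
    /** A batch tagged with the fragment it came from and its index within that fragment. */
    static final class EnumeratedBatch {
        final int fragmentIndex;
        final int batchIndex;
        final String payload;

        EnumeratedBatch(int fragmentIndex, int batchIndex, String payload) {
            this.fragmentIndex = fragmentIndex;
            this.batchIndex = batchIndex;
            this.payload = payload;
        }
    }

    /** Restores scan order by sorting on (fragmentIndex, batchIndex). */
    static List<String> resequence(List<EnumeratedBatch> unordered) {
        List<EnumeratedBatch> sorted = new ArrayList<>(unordered);
        sorted.sort(Comparator
                .comparingInt((EnumeratedBatch b) -> b.fragmentIndex)
                .thenComparingInt(b -> b.batchIndex));
        List<String> payloads = new ArrayList<>();
        for (EnumeratedBatch b : sorted) {
            payloads.add(b.payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        // Batches arrive in completion order, not dataset order.
        List<EnumeratedBatch> arrived = Arrays.asList(
                new EnumeratedBatch(1, 0, "c"),
                new EnumeratedBatch(0, 1, "b"),
                new EnumeratedBatch(0, 0, "a"));
        System.out.println(resequence(arrived)); // prints [a, b, c]
    }
}
```

Sorting the full list is the simplest way to show the idea, but it buffers every batch first. A streaming resequencer would instead emit each batch as soon as all earlier positions have arrived, which is closer in spirit to a scanner that avoids holding the whole dataset in memory.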
Copyright © 2022. All rights reserved.