T - data type for rowSplits() output

public final class UnicodeDecodeWithOffsets<T extends Number> extends PrimitiveOp
The character codepoints for all strings are returned using a single vector `char_values`, with strings expanded to characters in row-major order. Similarly, the character start byte offsets are returned using a single vector `char_to_byte_starts`, with strings expanded in row-major order.
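The `row_splits` tensor indicates where the codepoints and start offsets for each input string begin and end within the `char_values` and `char_to_byte_starts` tensors. In particular, the values for the `i`th string (in row-major order) are stored in the slice `[row_splits[i]:row_splits[i+1]]`. Thus, `char_values[row_splits[i]+j]` is the codepoint of the `j`th character in the `i`th string, `char_to_byte_starts[row_splits[i]+j]` is the start byte offset of that character, and `row_splits[i+1] - row_splits[i]` is the number of characters in the `i`th string.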
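The layout of the three output vectors can be illustrated without TensorFlow. The following is a minimal plain-Java sketch, not the operation's implementation: the class `RowSplitsSketch`, the holder `Decoded`, and the helpers `decode` and `codepointsOf` are hypothetical names introduced here, and the sketch assumes well-formed Java strings whose byte offsets are measured in UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowSplitsSketch {

    /** Hypothetical holder for the three flat output vectors. */
    static final class Decoded {
        final int[] charValues;        // models char_values
        final long[] charToByteStarts; // models char_to_byte_starts
        final long[] rowSplits;        // models row_splits

        Decoded(int[] charValues, long[] charToByteStarts, long[] rowSplits) {
            this.charValues = charValues;
            this.charToByteStarts = charToByteStarts;
            this.rowSplits = rowSplits;
        }
    }

    /** Flattens the inputs into codepoints, UTF-8 byte starts, and row splits. */
    static Decoded decode(String[] inputs) {
        List<Integer> values = new ArrayList<>();
        List<Long> starts = new ArrayList<>();
        long[] rowSplits = new long[inputs.length + 1];
        for (int i = 0; i < inputs.length; i++) {
            long byteOffset = 0; // byte offsets restart at 0 for every input string
            int count = 0;
            for (int cp : inputs[i].codePoints().toArray()) {
                values.add(cp);
                starts.add(byteOffset);
                // Assumption: offsets are counted in UTF-8 bytes.
                byteOffset += new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8).length;
                count++;
            }
            rowSplits[i + 1] = rowSplits[i] + count;
        }
        return new Decoded(
                values.stream().mapToInt(Integer::intValue).toArray(),
                starts.stream().mapToLong(Long::longValue).toArray(),
                rowSplits);
    }

    /** Returns the slice [rowSplits[i], rowSplits[i+1]) of charValues. */
    static int[] codepointsOf(Decoded d, int i) {
        return Arrays.copyOfRange(
                d.charValues, (int) d.rowSplits[i], (int) d.rowSplits[i + 1]);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" has 5 characters in 6 UTF-8 bytes; "\u65e5\u672c" has 2 in 6.
        Decoded d = decode(new String[] {"h\u00e9llo", "\u65e5\u672c"});
        System.out.println(Arrays.toString(d.rowSplits));        // [0, 5, 7]
        System.out.println(Arrays.toString(d.charToByteStarts)); // [0, 1, 3, 4, 5, 0, 3]
        System.out.println(Arrays.toString(codepointsOf(d, 1)));
    }
}
```

The second string's codepoints are recovered purely from the flat vector and the two adjacent `rowSplits` entries, which is the same indexing a caller would apply to the tensors returned by `charValues()` and `rowSplits()`.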
Modifier and Type | Class and Description |
---|---|
static class | `UnicodeDecodeWithOffsets.Options` Optional attributes for `UnicodeDecodeWithOffsets` |
operation
Modifier and Type | Method and Description |
---|---|
`Output<Long>` | `charToByteStarts()` A 1D int64 Tensor containing the byte index in the input string where each character in `char_values` starts. |
`Output<Integer>` | `charValues()` A 1D int32 Tensor containing the decoded codepoints. |
`static <T extends Number> UnicodeDecodeWithOffsets<T>` | `create(Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, UnicodeDecodeWithOffsets.Options... options)` Factory method to create a class wrapping a new UnicodeDecodeWithOffsets operation. |
`static UnicodeDecodeWithOffsets<Long>` | `create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecodeWithOffsets.Options... options)` Factory method to create a class wrapping a new UnicodeDecodeWithOffsets operation using default output types. |
`static UnicodeDecodeWithOffsets.Options` | `errors(String errors)` |
`static UnicodeDecodeWithOffsets.Options` | `replaceControlCharacters(Boolean replaceControlCharacters)` |
`static UnicodeDecodeWithOffsets.Options` | `replacementChar(Long replacementChar)` |
`Output<T>` | `rowSplits()` A 1D tensor of type `T` containing the row splits. |
equals, hashCode, op, toString
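The three option factories in the method summary (`errors`, `replacementChar`, `replaceControlCharacters`) control how malformed input and control characters are handled. As a rough analogy only, the sketch below models the documented policies with the JDK's `CharsetDecoder` instead of the ICU converters the operation uses. The class `DecodeOptionsSketch` and its `decode` helper are hypothetical, the input is assumed to be UTF-8, and the number of replacement codepoints emitted for a given malformed sequence may differ from what the operation produces.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DecodeOptionsSketch {

    /**
     * Decodes UTF-8 bytes to codepoints under an errors policy of
     * "strict", "replace", or "ignore" (names taken from this page).
     */
    static int[] decode(byte[] utf8, String errors, int replacementChar,
                        boolean replaceControlCharacters) {
        if (!errors.equals("strict") && !errors.equals("replace") && !errors.equals("ignore")) {
            throw new IllegalArgumentException("unknown errors policy: " + errors);
        }
        // REPORT makes the decoder stop at each malformed sequence so the
        // policy can be applied by hand.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(utf8);
        CharBuffer out = CharBuffer.allocate(Math.max(1, utf8.length));
        List<Integer> codepoints = new ArrayList<>();

        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            // Collect whatever was decoded before the decoder stopped.
            out.flip();
            for (int cp : out.toString().codePoints().toArray()) {
                codepoints.add(cp);
            }
            out.clear();
            if (result.isUnderflow()) {
                break; // all input consumed
            }
            if (result.isError()) {
                if (errors.equals("strict")) {
                    // Stands in for the InvalidArgument error of the operation.
                    throw new IllegalArgumentException(
                            "invalid input at byte " + in.position());
                }
                if (errors.equals("replace")) {
                    codepoints.add(replacementChar);
                }
                // "ignore" emits nothing; in every case skip the bad bytes.
                in.position(in.position() + result.length());
            }
            // On overflow, loop again with the cleared output buffer.
        }

        int[] result = new int[codepoints.size()];
        for (int i = 0; i < result.length; i++) {
            int cp = codepoints.get(i);
            // C0 control characters are 0x00 through 0x1F.
            result[i] = (replaceControlCharacters && cp <= 0x1F) ? replacementChar : cp;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bad = {0x61, (byte) 0xFF, 0x62}; // 'a', an invalid byte, 'b'
        System.out.println(decode(bad, "replace", 0xFFFD, false).length); // 3
        System.out.println(decode(bad, "ignore", 0xFFFD, false).length);  // 2
    }
}
```

With the actual operation, the corresponding choices would be passed as `UnicodeDecodeWithOffsets.Options` values to `create`.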
public static <T extends Number> UnicodeDecodeWithOffsets<T> create(Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, UnicodeDecodeWithOffsets.Options... options)

scope - current scope
input - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
Tsplits -
options - carries optional attribute values

public static UnicodeDecodeWithOffsets<Long> create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecodeWithOffsets.Options... options)
scope - current scope
input - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
options - carries optional attribute values

public static UnicodeDecodeWithOffsets.Options errors(String errors)
errors - Error handling policy when there is invalid formatting found in the input. The value of 'strict' will cause the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) will cause the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. A value of 'ignore' will cause the operation to skip any invalid formatting in the input and produce no corresponding output character.

public static UnicodeDecodeWithOffsets.Options replacementChar(Long replacementChar)
replacementChar - The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode codepoint may be used. The default value is the default Unicode replacement character, 0xFFFD (U+FFFD, decimal 65533).

public static UnicodeDecodeWithOffsets.Options replaceControlCharacters(Boolean replaceControlCharacters)
replaceControlCharacters - Whether to replace the C0 control characters (00-1F) with the `replacement_char`. Default is false.

Copyright © 2022. All rights reserved.