T - data type for rowSplits() output
public final class UnicodeDecode<T extends Number> extends PrimitiveOp
The character codepoints for all strings are returned using a single vector `char_values`, with strings expanded to characters in row-major order.
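The `row_splits` tensor indicates where the codepoints for each input string begin and end within the `char_values` tensor. In particular, the values for the `i`th string (in row-major order) are stored in the slice `[row_splits[i]:row_splits[i+1]]`. Thus, `char_values[row_splits[i]+j]` is the `j`th character of the `i`th string, and `row_splits[i+1] - row_splits[i]` is the number of characters in the `i`th string.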
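The layout can be illustrated with a minimal plain-Java sketch. It does not call the TensorFlow op: the names `RowSplitsSketch`, `Decoded`, `decode`, and `row` are hypothetical, and because Java strings are already UTF-16 there is no `inputEncoding` step here. It only shows how a flat codepoint vector and a row-splits vector relate to each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the char_values / row_splits layout.
// It does not call TensorFlow; all names here are made up for the sketch.
final class RowSplitsSketch {

    /** Flattened codepoints plus the offsets delimiting each input string. */
    static final class Decoded {
        final int[] charValues;
        final long[] rowSplits;

        Decoded(int[] charValues, long[] rowSplits) {
            this.charValues = charValues;
            this.rowSplits = rowSplits;
        }

        /** Codepoints of the i-th string: charValues[rowSplits[i] : rowSplits[i+1]]. */
        int[] row(int i) {
            return Arrays.copyOfRange(charValues, (int) rowSplits[i], (int) rowSplits[i + 1]);
        }
    }

    static Decoded decode(String[] inputs) {
        List<Integer> values = new ArrayList<>();
        long[] splits = new long[inputs.length + 1]; // splits[0] stays 0
        for (int i = 0; i < inputs.length; i++) {
            // Append this string's codepoints, then record where it ends.
            inputs[i].codePoints().forEach(cp -> values.add(cp));
            splits[i + 1] = values.size();
        }
        int[] flat = values.stream().mapToInt(Integer::intValue).toArray();
        return new Decoded(flat, splits);
    }

    public static void main(String[] args) {
        // The third string holds U+1F600 (a surrogate pair in UTF-16) and '!'.
        Decoded d = decode(new String[] {"hi", "", "\uD83D\uDE00!"});
        System.out.println(Arrays.toString(d.charValues)); // [104, 105, 128512, 33]
        System.out.println(Arrays.toString(d.rowSplits));  // [0, 2, 2, 4]
        System.out.println(Arrays.toString(d.row(2)));     // [128512, 33]
    }
}
```

Note that the empty string contributes no codepoints, so its start and end offsets coincide, and that the splits vector always has one more entry than there are input strings.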
Modifier and Type | Class and Description |
---|---|
static class | UnicodeDecode.Options: Optional attributes for UnicodeDecode |
Fields inherited from class PrimitiveOp: operation
Modifier and Type | Method and Description |
---|---|
Output<Integer> | charValues(): A 1D int32 Tensor containing the decoded codepoints. |
static <T extends Number> UnicodeDecode<T> | create(Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, UnicodeDecode.Options... options): Factory method to create a class wrapping a new UnicodeDecode operation. |
static UnicodeDecode<Long> | create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecode.Options... options): Factory method to create a class wrapping a new UnicodeDecode operation using default output types. |
static UnicodeDecode.Options | errors(String errors) |
static UnicodeDecode.Options | replaceControlCharacters(Boolean replaceControlCharacters) |
static UnicodeDecode.Options | replacementChar(Long replacementChar) |
Output<T> | rowSplits(): A 1D tensor of type T containing the row splits. |
Methods inherited from class PrimitiveOp: equals, hashCode, op, toString
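The three policies accepted by `errors(String)` ('strict', 'replace', 'ignore'), together with `replacementChar(Long)` and `replaceControlCharacters(Boolean)`, can be approximated in plain Java. The sketch below is an analogy only: it uses the JDK's `java.nio.charset.CharsetDecoder` rather than ICU or the TensorFlow op, the class and method names are hypothetical, it handles UTF-8 input only, and it assumes a replacement codepoint in the Basic Multilingual Plane because the JDK decoder accepts only a single-`char` replacement. How many replacement characters are emitted for a given malformed sequence may differ from the op.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the errors / replacementChar /
// replaceControlCharacters options, built on the JDK decoder (not ICU).
final class ErrorPolicySketch {

    static int[] decode(byte[] utf8, String errors, int replacementChar,
                        boolean replaceControlCharacters) {
        CodingErrorAction action;
        switch (errors) {
            case "strict":  action = CodingErrorAction.REPORT;  break;
            case "replace": action = CodingErrorAction.REPLACE; break;
            case "ignore":  action = CodingErrorAction.IGNORE;  break;
            default: throw new IllegalArgumentException("unknown errors policy: " + errors);
        }
        // Sketch limitation: the JDK UTF-8 decoder takes a one-char replacement.
        if (!Character.isBmpCodePoint(replacementChar)) {
            throw new IllegalArgumentException("sketch supports BMP replacements only");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .replaceWith(String.valueOf((char) replacementChar));
        String text;
        try {
            text = decoder.decode(ByteBuffer.wrap(utf8)).toString();
        } catch (CharacterCodingException e) {
            // Only reachable under "strict"; loosely mirrors an InvalidArgument error.
            throw new IllegalArgumentException("invalid input formatting", e);
        }
        // Optionally replace C0 control characters (00-1F) after decoding.
        return text.codePoints()
                .map(cp -> (replaceControlCharacters && cp <= 0x1F) ? replacementChar : cp)
                .toArray();
    }

    public static void main(String[] args) {
        byte[] bad = {'a', (byte) 0xFF, 'b'}; // 0xFF can never appear in UTF-8
        System.out.println(Arrays.toString(decode(bad, "replace", 0xFFFD, false)));
        System.out.println(Arrays.toString(decode(bad, "ignore", 0xFFFD, false)));
    }
}
```

With the input above, 'replace' yields three codepoints with the replacement in the middle, 'ignore' yields two, and 'strict' throws.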
public static <T extends Number> UnicodeDecode<T> create(Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, UnicodeDecode.Options... options)
scope - current scope
input - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
Tsplits - data type for the rowSplits() output
options - carries optional attribute values
public static UnicodeDecode<Long> create(Scope scope, Operand<String> input, String inputEncoding, UnicodeDecode.Options... options)
scope - current scope
input - The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
options - carries optional attribute values
public static UnicodeDecode.Options errors(String errors)
errors - Error handling policy when there is invalid formatting found in the input. The value of 'strict' will cause the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) will cause the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. A value of 'ignore' will cause the operation to skip any invalid formatting in the input and produce no corresponding output character.
public static UnicodeDecode.Options replacementChar(Long replacementChar)
replacementChar - The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533).
public static UnicodeDecode.Options replaceControlCharacters(Boolean replaceControlCharacters)
replaceControlCharacters - Whether to replace the C0 control characters (00-1F) with the `replacement_char`. Default is false.
Copyright © 2022. All rights reserved.