public static class FixedUnigramCandidateSampler.Options extends Object
Optional attributes for FixedUnigramCandidateSampler
Modifier and Type | Method and Description |
---|---|
FixedUnigramCandidateSampler.Options | distortion(Float distortion) |
FixedUnigramCandidateSampler.Options | numReservedIds(Long numReservedIds) |
FixedUnigramCandidateSampler.Options | numShards(Long numShards) |
FixedUnigramCandidateSampler.Options | seed(Long seed) |
FixedUnigramCandidateSampler.Options | seed2(Long seed2) |
FixedUnigramCandidateSampler.Options | shard(Long shard) |
FixedUnigramCandidateSampler.Options | unigrams(List<Float> unigrams) |
FixedUnigramCandidateSampler.Options | vocabFile(String vocabFile) |
public FixedUnigramCandidateSampler.Options vocabFile(String vocabFile)
vocabFile
- Each valid line in this file (which should have a CSV-like format)
corresponds to a valid word ID. IDs are in sequential order, starting from
num_reserved_ids. The last entry in each line is expected to be a value
corresponding to the count or relative probability. Exactly one of vocab_file
and unigrams needs to be passed to this op.
public FixedUnigramCandidateSampler.Options distortion(Float distortion)
distortion
- The distortion is used to skew the unigram probability distribution.
Each weight is first raised to the power of the distortion before being added
to the internal unigram distribution. As a result, distortion = 1.0 gives regular
unigram sampling (as defined by the vocab file), and distortion = 0.0 gives
a uniform distribution.
public FixedUnigramCandidateSampler.Options numReservedIds(Long numReservedIds)
numReservedIds
- Optionally, users can reserve IDs in the range [0, num_reserved_ids).
One use case is reserving ID 0 for a special unknown-word token.
Reserved IDs have a sampling probability of 0.
public FixedUnigramCandidateSampler.Options numShards(Long numShards)
numShards
- A sampler can be used to sample from a subset of the original range
in order to speed up the whole computation through parallelism. This parameter
(together with 'shard') indicates the number of partitions that are being
used in the overall computation.
public FixedUnigramCandidateSampler.Options shard(Long shard)
shard
- A sampler can be used to sample from a subset of the original range
in order to speed up the whole computation through parallelism. This parameter
(together with 'num_shards') indicates the particular partition number of a
sampler op, when partitioning is being used.
public FixedUnigramCandidateSampler.Options unigrams(List<Float> unigrams)
unigrams
- A list of unigram counts or probabilities, one per ID in sequential
order. Exactly one of vocab_file and unigrams should be passed to this op.
public FixedUnigramCandidateSampler.Options seed(Long seed)
seed
- If either seed or seed2 is set to a non-zero value, the random number
generator is seeded by the given seed. Otherwise, it is seeded by a
random seed.
public FixedUnigramCandidateSampler.Options seed2(Long seed2)
seed2
- A second seed to avoid seed collision.
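The vocabFile and numReservedIds descriptions above can be read together as a rule for building the initial weight list: reserved IDs come first with weight 0, then one weight per valid line, taken from the last field. The following is a minimal sketch of that reading in plain Java, not the op's actual parser. The class and method names are hypothetical, and treating blank lines as the only invalid lines is an assumption, since the page does not define what makes a line valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VocabWeightsSketch {
    // Builds a weight per ID: zeros for the reserved IDs, then the last
    // comma-separated field of each non-blank line, in file order.
    static float[] weightsFromLines(List<String> lines, int numReservedIds) {
        List<Float> weights = new ArrayList<>();
        // Reserved IDs occupy [0, numReservedIds) and are never sampled.
        for (int id = 0; id < numReservedIds; id++) {
            weights.add(0.0f);
        }
        for (String line : lines) {
            // Assumption: blank lines are skipped and every other line is valid.
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",");
            if (fields.length == 0) {
                continue; // a line made only of commas has no usable field
            }
            // The last entry is the count or relative probability.
            weights.add(Float.parseFloat(fields[fields.length - 1].trim()));
        }
        float[] out = new float[weights.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = weights.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // With one reserved ID, "the" gets ID 1 and "of" gets ID 2.
        List<String> lines = Arrays.asList("the,10", "of,7.5");
        System.out.println(Arrays.toString(weightsFromLines(lines, 1)));
    }
}
```

Passing the same values through unigrams(List<Float>) instead of a file would skip the parsing step; only one of the two sources may be supplied.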
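The effect of the distortion option can be checked with a few lines of arithmetic. The sketch below raises each weight to the distortion power and then normalizes, which reproduces the two cases named above: 1.0 keeps the unigram proportions and 0.0 yields a uniform distribution. It is an illustration only; DistortionSketch and distort are hypothetical names, and the explicit normalization step is an assumption made so the output reads as probabilities.

```java
import java.util.Arrays;

class DistortionSketch {
    // Raises each weight to the given power, then divides by the total so the
    // results sum to 1. Expects at least one non-zero weight.
    static double[] distort(float[] weights, float distortion) {
        double[] out = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            out[i] = Math.pow(weights[i], distortion);
            total += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] counts = {8f, 1f, 1f};
        // distortion = 1.0: probabilities stay proportional to the counts.
        System.out.println(Arrays.toString(distort(counts, 1.0f)));
        // distortion = 0.0: every weight becomes 1, so the result is uniform.
        System.out.println(Arrays.toString(distort(counts, 0.0f)));
        // Values in between flatten the distribution without making it uniform.
        System.out.println(Arrays.toString(distort(counts, 0.75f)));
    }
}
```

Note that Math.pow(0, 0) is 1 in Java, so a sketch like this should not be applied to the zero weights of reserved IDs if they are meant to keep a sampling probability of 0.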
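The numShards and shard options describe splitting the ID range into partitions so that several samplers can work in parallel, each on its own subset. The page does not say how IDs are assigned to partitions, so the sketch below assumes a simple modulo rule (an ID belongs to partition id % numShards) purely for illustration. ShardSketch, idsForShard, and the rangeMax parameter are hypothetical names that do not appear on this page.

```java
import java.util.ArrayList;
import java.util.List;

class ShardSketch {
    // Lists the IDs in [0, rangeMax) owned by one partition under the ASSUMED
    // rule id % numShards == shard. The real op may partition differently.
    static List<Long> idsForShard(long rangeMax, long numShards, long shard) {
        if (numShards <= 0 || shard < 0 || shard >= numShards) {
            throw new IllegalArgumentException("need 0 <= shard < numShards");
        }
        List<Long> ids = new ArrayList<>();
        for (long id = shard; id < rangeMax; id += numShards) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        long rangeMax = 7;
        long numShards = 3;
        // Each sampler would be configured with the same numShards and a
        // distinct shard number; together the partitions cover every ID once.
        for (long shard = 0; shard < numShards; shard++) {
            System.out.println("shard " + shard + ": " + idsForShard(rangeMax, numShards, shard));
        }
    }
}
```

Whatever the actual assignment rule, the two options only make sense as a pair: numShards is the same for every sampler in the computation, while shard identifies one of them.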