Package io.github.jbellis.jvector.pq
Class ProductQuantization
java.lang.Object
io.github.jbellis.jvector.pq.ProductQuantization
- All Implemented Interfaces:
VectorCompressor<byte[]>
Product Quantization for float vectors. Supports arbitrary source and target dimensionality;
in particular, the source does not need to be evenly divisible by the target.
Codebook cluster count is fixed at 256.
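The description above says the source dimensionality need not divide evenly by the number of subspaces. One way this could work is sketched below; the rule used here (the first `dimension % m` subspaces each take one extra component) and the names `subspaceSizes` and `subspaceOffsets` are assumptions made for illustration, not something this page confirms about the class.

```java
import java.util.Arrays;

class SubspaceSplitSketch {
    /**
     * Splits `dimension` components into `m` contiguous subspaces.
     * Assumption: the first (dimension % m) subspaces get one extra component.
     */
    static int[] subspaceSizes(int dimension, int m) {
        if (m <= 0 || m > dimension) {
            throw new IllegalArgumentException("m must be in [1, dimension]");
        }
        int base = dimension / m;
        int remainder = dimension % m;
        int[] sizes = new int[m];
        for (int i = 0; i < m; i++) {
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }

    /** Start index of each subspace within the full vector. */
    static int[] subspaceOffsets(int[] sizes) {
        int[] offsets = new int[sizes.length];
        for (int i = 1; i < sizes.length; i++) {
            offsets[i] = offsets[i - 1] + sizes[i - 1];
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] sizes = subspaceSizes(10, 4);
        System.out.println(Arrays.toString(sizes));                  // [3, 3, 2, 2]
        System.out.println(Arrays.toString(subspaceOffsets(sizes))); // [0, 3, 6, 8]
    }
}
```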
-
Field Summary
Modifier and Type | Field | Description
static final int | MAX_PQ_TRAINING_SET_SIZE |
-
Method Summary
Modifier and Type | Method | Description
static ProductQuantization | compute(RandomAccessVectorValues<float[]> ravv, int M, boolean globallyCenter) | Initializes the codebooks by clustering the input data using Product Quantization.
static ProductQuantization | compute(RandomAccessVectorValues<float[]> ravv, int M, boolean globallyCenter, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor) | Initializes the codebooks by clustering the input data using Product Quantization.
 | createCompressedVectors(Object[] compressedVectors) |
void | decode(byte[] encoded, float[] target) | Decodes the quantized representation (byte array) to its approximate original vector.
byte[] | encode(float[] vector) | Encodes the input vector using the PQ codebooks.
byte[][] | encodeAll(List<float[]> vectors, ForkJoinPool simdExecutor) | Encodes the given vectors in parallel using the PQ codebooks.
boolean | equals(Object) |
float[] | getCenter() |
int | getClusterCount() |
int | getSubspaceCount() |
int | hashCode() |
static ProductQuantization | load |
long | memorySize() |
ProductQuantization | refine(RandomAccessVectorValues<float[]> ravv) | Create a new PQ by fine-tuning this one with the data in `ravv`
ProductQuantization | refine(RandomAccessVectorValues<float[]> ravv, int lloydsRounds, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor) | Create a new PQ by fine-tuning this one with the data in `ravv`
String | toString() |
void | write(DataOutput out) |
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface io.github.jbellis.jvector.pq.VectorCompressor
encodeAll
-
Field Details
-
MAX_PQ_TRAINING_SET_SIZE
public static final int MAX_PQ_TRAINING_SET_SIZE
-
Method Details
-
compute
public static ProductQuantization compute(RandomAccessVectorValues<float[]> ravv, int M, boolean globallyCenter)
Initializes the codebooks by clustering the input data using Product Quantization.
- Parameters:
ravv - the vectors to quantize
M - number of subspaces
globallyCenter - whether to center the vectors globally before quantization (not recommended when using the quantization for dot product)
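To illustrate the caution about `globallyCenter`, the sketch below subtracts the mean vector from a toy data set and compares similarities before and after. It is a stand-alone illustration with hypothetical helper names (`centroid`, `subtract`), not the library's centering code: Euclidean distances are unchanged by the shift, while dot products are not.

```java
class GlobalCenteringSketch {
    /** Component-wise mean of all vectors (the "global center"). */
    static float[] centroid(float[][] vectors) {
        float[] mean = new float[vectors[0].length];
        for (float[] v : vectors) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += v[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= vectors.length;
        }
        return mean;
    }

    static float[] subtract(float[] v, float[] center) {
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - center[i];
        }
        return out;
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float squaredDistance(float[] a, float[] b) {
        float[] diff = subtract(a, b);
        return dot(diff, diff);
    }

    public static void main(String[] args) {
        float[][] data = {{1f, 2f}, {3f, 4f}};
        float[] c = centroid(data);
        float[] a = subtract(data[0], c);
        float[] b = subtract(data[1], c);
        // Distance survives the shift; the dot product does not.
        System.out.println(squaredDistance(data[0], data[1]) + " vs " + squaredDistance(a, b));
        System.out.println(dot(data[0], data[1]) + " vs " + dot(a, b));
    }
}
```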
-
compute
public static ProductQuantization compute(RandomAccessVectorValues<float[]> ravv, int M, boolean globallyCenter, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor)
Initializes the codebooks by clustering the input data using Product Quantization.
- Parameters:
ravv - the vectors to quantize
M - number of subspaces
globallyCenter - whether to center the vectors globally before quantization (not recommended when using the quantization for dot product)
simdExecutor - ForkJoinPool instance for SIMD operations; ideally a pool sized to the number of physical cores
parallelExecutor - ForkJoinPool instance for parallel stream operations
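A minimal sketch of preparing the two pools follows. The JDK reports only logical processors through `Runtime.availableProcessors()`, so the halving used here to approximate physical cores is a guess that holds only on machines with two hardware threads per core; `poolOfSize` is a hypothetical helper, and the sizing of the second pool is likewise an assumption.

```java
import java.util.concurrent.ForkJoinPool;

class ExecutorSetupSketch {
    /** Creates a pool with at least one worker thread. */
    static ForkJoinPool poolOfSize(int parallelism) {
        return new ForkJoinPool(Math.max(1, parallelism));
    }

    public static void main(String[] args) {
        int logical = Runtime.getRuntime().availableProcessors();
        // Assumption: two hardware threads per physical core. Adjust for your hardware.
        int physicalGuess = Math.max(1, logical / 2);

        ForkJoinPool simdExecutor = poolOfSize(physicalGuess);
        ForkJoinPool parallelExecutor = poolOfSize(logical);
        try {
            System.out.println("simd pool parallelism: " + simdExecutor.getParallelism());
            System.out.println("parallel pool parallelism: " + parallelExecutor.getParallelism());
            // The pools would be passed to compute(...) here.
        } finally {
            simdExecutor.shutdown();
            parallelExecutor.shutdown();
        }
    }
}
```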
-
refine
public ProductQuantization refine(RandomAccessVectorValues<float[]> ravv)
Creates a new PQ by fine-tuning this one with the data in `ravv`.
-
refine
public ProductQuantization refine(RandomAccessVectorValues<float[]> ravv, int lloydsRounds, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor)
Creates a new PQ by fine-tuning this one with the data in `ravv`.
- Parameters:
lloydsRounds - number of Lloyd's iterations to run against the new data. Suggested values are 1 or 2.
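For readers unfamiliar with the term, one Lloyd's iteration assigns every point to its nearest centroid and then moves each centroid to the mean of its assigned points. The sketch below shows a single round on plain arrays; it is a generic k-means step with hypothetical names, not the library's refinement code, and leaving empty clusters unchanged is a choice made for this example.

```java
class LloydsRoundSketch {
    /** Index of the centroid closest to `point` by squared Euclidean distance. */
    static int nearest(float[] point, float[][] centroids) {
        int best = 0;
        float bestDist = Float.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            float dist = 0f;
            for (int d = 0; d < point.length; d++) {
                float diff = point[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Runs one assignment step followed by one update step; returns new centroids. */
    static float[][] lloydsRound(float[][] points, float[][] centroids) {
        int k = centroids.length;
        int dim = centroids[0].length;
        float[][] sums = new float[k][dim];
        int[] counts = new int[k];
        for (float[] p : points) {
            int c = nearest(p, centroids);
            counts[c]++;
            for (int d = 0; d < dim; d++) {
                sums[c][d] += p[d];
            }
        }
        float[][] updated = new float[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                updated[c] = centroids[c].clone(); // empty cluster: keep the old centroid
            } else {
                updated[c] = new float[dim];
                for (int d = 0; d < dim; d++) {
                    updated[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return updated;
    }
}
```

Starting from existing centroids rather than random ones is what makes one or two rounds plausible for fine-tuning: the centroids only need to shift toward the new data.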
-
createCompressedVectors
- Specified by:
createCompressedVectors in interface VectorCompressor<byte[]>
- Parameters:
compressedVectors - must match the type T for this VectorCompressor, but it is declared as Object because we want callers to be able to use this without committing to a specific type T.
-
encodeAll
public byte[][] encodeAll(List<float[]> vectors, ForkJoinPool simdExecutor)
Encodes the given vectors in parallel using the PQ codebooks.
- Specified by:
encodeAll in interface VectorCompressor<byte[]>
-
encode
public byte[] encode(float[] vector)
Encodes the input vector using the PQ codebooks.
- Specified by:
encode in interface VectorCompressor<byte[]>
- Returns:
one byte per subspace
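Conceptually, encoding replaces each subvector with the index of its nearest codebook centroid, which gives the one byte per subspace noted above. The following is a minimal sketch of that idea; the `codebooks[m][c]` and `offsets[m]` layouts are assumptions for illustration and may not match the class's internal representation. The toy example uses two centroids per subspace rather than 256.

```java
import java.util.Arrays;

class PqEncodeSketch {
    /**
     * codebooks[m][c] is centroid c of subspace m; offsets[m] is where subspace m
     * starts in the full vector. Both layouts are assumptions made for this sketch.
     */
    static byte[] encode(float[] vector, float[][][] codebooks, int[] offsets) {
        byte[] code = new byte[codebooks.length]; // one byte per subspace
        for (int m = 0; m < codebooks.length; m++) {
            int best = 0;
            float bestDist = Float.POSITIVE_INFINITY;
            for (int c = 0; c < codebooks[m].length; c++) {
                float[] centroid = codebooks[m][c];
                float dist = 0f;
                for (int d = 0; d < centroid.length; d++) {
                    float diff = vector[offsets[m] + d] - centroid[d];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            code[m] = (byte) best; // indices 128..255 are stored as negative bytes
        }
        return code;
    }

    public static void main(String[] args) {
        float[][][] codebooks = {
            {{0f, 0f}, {1f, 1f}},   // subspace 0
            {{5f, 5f}, {9f, 9f}}    // subspace 1
        };
        int[] offsets = {0, 2};
        byte[] code = encode(new float[]{0.9f, 1.2f, 8f, 8.5f}, codebooks, offsets);
        System.out.println(Arrays.toString(code)); // [1, 1]
    }
}
```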
-
decode
public void decode(byte[] encoded, float[] target)
Decodes the quantized representation (byte array) to an approximation of the original vector.
-
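Decoding can be pictured as the reverse lookup: each byte selects a centroid, and the selected centroids are concatenated into the target array. The sketch below assumes the same hypothetical `codebooks[m][c]` and `offsets[m]` layouts and is not the library's implementation. Because Java bytes are signed, the index is masked with `0xFF` so that all 256 cluster indices are reachable.

```java
import java.util.Arrays;

class PqDecodeSketch {
    /**
     * Writes the approximate vector into `target`.
     * codebooks[m][c] and offsets[m] are assumed layouts for this sketch.
     */
    static void decode(byte[] encoded, float[][][] codebooks, int[] offsets, float[] target) {
        for (int m = 0; m < encoded.length; m++) {
            int index = encoded[m] & 0xFF; // treat the byte as unsigned: 0..255
            float[] centroid = codebooks[m][index];
            System.arraycopy(centroid, 0, target, offsets[m], centroid.length);
        }
    }

    public static void main(String[] args) {
        float[][][] codebooks = {
            {{0f, 0f}, {1f, 1f}},   // subspace 0
            {{5f, 5f}, {9f, 9f}}    // subspace 1
        };
        int[] offsets = {0, 2};
        float[] target = new float[4];
        decode(new byte[]{1, 0}, codebooks, offsets, target);
        System.out.println(Arrays.toString(target)); // [1.0, 1.0, 5.0, 5.0]
    }
}
```

If the vectors were globally centered before quantization, a decoder would presumably also need to add the center back; that step is omitted here.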
getSubspaceCount
public int getSubspaceCount()
- Returns:
the number of subspaces, which is also the number of bytes each vector is compressed to
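Since each subspace is stored as a single byte, the subspace count gives the compressed size directly. A small arithmetic sketch of the resulting compression ratio follows; the dimension and subspace count in `main` are arbitrary example values, and the calculation ignores the memory taken by the codebooks themselves.

```java
class CompressionRatioSketch {
    /** Size in bytes of an uncompressed float vector. */
    static long rawBytes(int dimension) {
        return (long) dimension * Float.BYTES;
    }

    /** Raw size divided by compressed size (one byte per subspace). */
    static double ratio(int dimension, int subspaceCount) {
        return (double) rawBytes(dimension) / subspaceCount;
    }

    public static void main(String[] args) {
        int dimension = 1536;      // example value
        int subspaceCount = 192;   // example value
        System.out.println(rawBytes(dimension) + " bytes raw, "
                + subspaceCount + " bytes compressed, ratio "
                + ratio(dimension, subspaceCount));
    }
}
```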
-
write
- Specified by:
write in interface VectorCompressor<byte[]>
- Throws:
IOException
-
load
- Throws:
IOException
-
equals
-
hashCode
public int hashCode()
-
getCenter
public float[] getCenter()
-
memorySize
public long memorySize()
-
getClusterCount
public int getClusterCount()
-
toString
-