Class ProductQuantization

java.lang.Object
io.github.jbellis.jvector.pq.ProductQuantization
All Implemented Interfaces:
VectorCompressor<byte[]>

public class ProductQuantization extends Object implements VectorCompressor<byte[]>
Product Quantization for float vectors. Supports arbitrary source and target dimensionality; in particular, the source dimension does not need to be evenly divisible by the target dimension (the number of subspaces).

The number of clusters in each codebook is fixed at 256, so each subspace is encoded as a single byte.
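
As an illustration of how a source dimension that is not a multiple of the subspace count can still be partitioned, the following minimal sketch spreads the remainder over the leading subspaces. The class name SubspaceLayout and the method sizesAndOffsets are hypothetical and not part of this API; the library's actual partitioning may differ.

```java
import java.util.Arrays;

// Hypothetical helper, not part of jvector: one plausible way to split
// `dimension` components into `m` contiguous subspaces.
final class SubspaceLayout {
    /** Returns {size, offset} for each of the m subspaces. */
    static int[][] sizesAndOffsets(int dimension, int m) {
        if (m <= 0 || m > dimension) {
            throw new IllegalArgumentException("m must be in [1, dimension]");
        }
        int base = dimension / m;      // size every subspace gets at minimum
        int remainder = dimension % m; // leftover components to distribute
        int[][] layout = new int[m][];
        int offset = 0;
        for (int i = 0; i < m; i++) {
            // The first `remainder` subspaces receive one extra component.
            int size = base + (i < remainder ? 1 : 0);
            layout[i] = new int[] {size, offset};
            offset += size;
        }
        return layout;
    }

    public static void main(String[] args) {
        // 10 dimensions over 3 subspaces: sizes 4, 3, 3 at offsets 0, 4, 7.
        System.out.println(Arrays.deepToString(sizesAndOffsets(10, 3)));
    }
}
```

With this layout every component belongs to exactly one subspace, and subspace sizes differ by at most one.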

  • Method Details

    • compute

      public static ProductQuantization compute(RandomAccessVectorValues<float[]> ravv, int M, boolean globallyCenter)
      Initializes the codebooks by clustering the input data using Product Quantization.
      Parameters:
      ravv - the vectors to quantize
      M - number of subspaces
      globallyCenter - whether to center the vectors globally before quantization (not recommended when using the quantization for dot product)
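
      The caution about dot product can be made concrete with a small sketch. It assumes that global centering means subtracting the mean of all input vectors from each vector (an assumption; the class GlobalCenteringSketch and its methods are hypothetical, not part of this API). Subtracting a common center leaves Euclidean distances unchanged but changes dot products, which is one likely reason for the recommendation.

```java
// Hypothetical illustration, not part of jvector.
final class GlobalCenteringSketch {
    /** Component-wise mean of a non-empty set of equal-length vectors. */
    static float[] mean(float[][] vectors) {
        float[] center = new float[vectors[0].length];
        for (float[] v : vectors) {
            for (int i = 0; i < center.length; i++) {
                center[i] += v[i];
            }
        }
        for (int i = 0; i < center.length; i++) {
            center[i] /= vectors.length;
        }
        return center;
    }

    static float[] subtract(float[] v, float[] center) {
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - center[i];
        }
        return out;
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[][] data = {{1f, 2f}, {3f, 4f}};
        float[] center = mean(data);
        float[] a = subtract(data[0], center);
        float[] b = subtract(data[1], center);
        // The distance is the same before and after centering; the dot product is not.
        System.out.println("squared distance: " + squaredDistance(data[0], data[1])
                + " -> " + squaredDistance(a, b));
        System.out.println("dot product: " + dot(data[0], data[1]) + " -> " + dot(a, b));
    }
}
```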
    • compute

      public static ProductQuantization compute(RandomAccessVectorValues<float[]> ravv, int M, boolean globallyCenter, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor)
      Initializes the codebooks by clustering the input data using Product Quantization.
      Parameters:
      ravv - the vectors to quantize
      M - number of subspaces
      globallyCenter - whether to center the vectors globally before quantization (not recommended when using the quantization for dot product)
      simdExecutor - ForkJoinPool instance for SIMD operations; for best results, use a pool sized to the number of physical cores
      parallelExecutor - ForkJoinPool instance for parallel stream operations
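
      A minimal sketch of preparing the two pools follows. Java has no standard API that reports physical cores, so the helper estimatePhysicalCores below uses a hypothetical heuristic (two hardware threads per core) that will be wrong on machines without simultaneous multithreading; the class name PoolSizing is likewise an assumption, not part of this API.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper, not part of jvector.
final class PoolSizing {
    /** ASSUMPTION: two hardware threads per physical core; adjust for the target machine. */
    static int estimatePhysicalCores(int logicalProcessors) {
        return Math.max(1, logicalProcessors / 2);
    }

    public static void main(String[] args) {
        // availableProcessors() reports logical processors, not physical cores.
        int logical = Runtime.getRuntime().availableProcessors();
        ForkJoinPool simdExecutor = new ForkJoinPool(estimatePhysicalCores(logical));
        ForkJoinPool parallelExecutor = new ForkJoinPool(logical);
        try {
            // These pools would be supplied as the simdExecutor and
            // parallelExecutor arguments of compute(...).
            System.out.println("simd parallelism: " + simdExecutor.getParallelism());
            System.out.println("parallel parallelism: " + parallelExecutor.getParallelism());
        } finally {
            simdExecutor.shutdown();
            parallelExecutor.shutdown();
        }
    }
}
```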
    • createCompressedVectors

      public CompressedVectors createCompressedVectors(Object[] compressedVectors)
      Specified by:
      createCompressedVectors in interface VectorCompressor<byte[]>
      Parameters:
      compressedVectors - must match the type T of this VectorCompressor (byte[] for this class); it is declared as Object[] so that callers can use this method without committing to a specific type T.
    • encodeAll

      public byte[][] encodeAll(List<float[]> vectors, ForkJoinPool simdExecutor)
      Encodes the given vectors in parallel using the PQ codebooks.
      Specified by:
      encodeAll in interface VectorCompressor<byte[]>
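
      The general pattern of encoding a list in parallel on a caller-supplied pool can be sketched as below. This is not the library's implementation: the class ParallelEncodeSketch and the stand-in encoder are hypothetical. The sketch also relies on the widely used (but not formally specified) behavior that a parallel stream started from inside a ForkJoinPool task runs on that pool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;

// Hypothetical illustration, not part of jvector.
final class ParallelEncodeSketch {
    /** Applies `encoder` to every vector on `pool`, preserving list order in the result. */
    static byte[][] encodeAll(List<float[]> vectors,
                              Function<float[], byte[]> encoder,
                              ForkJoinPool pool) {
        // Submitting the stream pipeline as a task makes the parallel work run in `pool`.
        return pool.submit(
                () -> vectors.parallelStream().map(encoder).toArray(byte[][]::new)
        ).join();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            // Stand-in encoder for the demo: one byte holding the sign of the first component.
            Function<float[], byte[]> signEncoder =
                    v -> new byte[] {(byte) (v[0] >= 0 ? 1 : 0)};
            List<float[]> vectors = Arrays.asList(
                    new float[] {1f}, new float[] {-1f}, new float[] {2f});
            System.out.println(Arrays.deepToString(encodeAll(vectors, signEncoder, pool)));
        } finally {
            pool.shutdown();
        }
    }
}
```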
    • encode

      public byte[] encode(float[] vector)
      Encodes the input vector using the PQ codebooks.
      Specified by:
      encode in interface VectorCompressor<byte[]>
      Returns:
      one byte per subspace
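
      Conceptually, product quantization encodes a vector by finding, for each subspace, the nearest codebook centroid and storing its index. The sketch below shows that idea with tiny hand-written codebooks (two centroids per subspace instead of 256, for readability). The class PqEncodeSketch and the codebooks array layout are assumptions for illustration, not this class's internals.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of jvector.
final class PqEncodeSketch {
    /**
     * codebooks[m][k] is centroid k of subspace m. Subspaces are laid out
     * contiguously in the input vector, in order.
     */
    static byte[] encode(float[] vector, float[][][] codebooks) {
        byte[] code = new byte[codebooks.length];
        int offset = 0;
        for (int m = 0; m < codebooks.length; m++) {
            float[][] centroids = codebooks[m];
            int subDim = centroids[0].length;
            int best = 0;
            float bestDist = Float.POSITIVE_INFINITY;
            for (int k = 0; k < centroids.length; k++) {
                // Squared Euclidean distance between the subvector and centroid k.
                float dist = 0f;
                for (int d = 0; d < subDim; d++) {
                    float diff = vector[offset + d] - centroids[k][d];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = k;
                }
            }
            // With 256 centroids, indices 128..255 wrap to negative byte values.
            code[m] = (byte) best;
            offset += subDim;
        }
        return code;
    }

    public static void main(String[] args) {
        float[][][] codebooks = {
                {{0f, 0f}, {1f, 1f}}, // subspace 0 covers components 0..1
                {{0f, 0f}, {5f, 5f}}  // subspace 1 covers components 2..3
        };
        float[] vector = {0.1f, 0.2f, 4f, 6f};
        System.out.println(Arrays.toString(encode(vector, codebooks))); // prints [0, 1]
    }
}
```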
    • decode

      public void decode(byte[] encoded, float[] target)
      Decodes the quantized representation (byte array) into target, producing an approximation of the original vector.
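
      Decoding in product quantization amounts to looking up the centroid selected by each byte and concatenating the centroids. Because Java bytes are signed and a codebook has 256 entries, the byte must be converted to an unsigned index first. The sketch below illustrates this; the class PqDecodeSketch and the codebooks layout are hypothetical, and any re-adding of a global center is omitted.

```java
// Hypothetical illustration, not part of jvector.
final class PqDecodeSketch {
    /** Writes the concatenated centroids selected by `encoded` into `target`. */
    static void decode(byte[] encoded, float[][][] codebooks, float[] target) {
        int offset = 0;
        for (int m = 0; m < encoded.length; m++) {
            // Map the signed byte back to a codebook index in 0..255.
            int index = Byte.toUnsignedInt(encoded[m]);
            float[] centroid = codebooks[m][index];
            System.arraycopy(centroid, 0, target, offset, centroid.length);
            offset += centroid.length;
        }
    }

    public static void main(String[] args) {
        // One 1-dimensional subspace with 256 centroids, where centroid k is {k}.
        float[][][] codebooks = new float[1][256][];
        for (int k = 0; k < 256; k++) {
            codebooks[0][k] = new float[] {k};
        }
        byte[] encoded = {(byte) 200}; // stored as -56 in a signed byte
        float[] target = new float[1];
        decode(encoded, codebooks, target);
        System.out.println(target[0]); // prints 200.0
    }
}
```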
    • getSubspaceCount

      public int getSubspaceCount()
      Returns:
      the number of subspaces, which is also the number of bytes in each compressed vector
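
      Since each subspace contributes one byte, the per-vector compression ratio follows from simple arithmetic. The sketch below works this out; the class PqFootprint is hypothetical, and the figures ignore the storage needed for the codebooks themselves.

```java
// Hypothetical illustration, not part of jvector.
final class PqFootprint {
    /** Bytes used by an uncompressed float vector of the given dimension. */
    static long rawBytes(int dimension) {
        return (long) dimension * Float.BYTES;
    }

    /** Raw size divided by compressed size (one byte per subspace). */
    static double compressionRatio(int dimension, int subspaceCount) {
        return (double) rawBytes(dimension) / subspaceCount;
    }

    public static void main(String[] args) {
        // A 1536-dimensional float vector is 6144 bytes; 192 subspaces give 192 bytes.
        System.out.println(compressionRatio(1536, 192)); // prints 32.0
    }
}
```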
    • write

      public void write(DataOutput out) throws IOException
      Specified by:
      write in interface VectorCompressor<byte[]>
      Throws:
      IOException
    • load

      public static ProductQuantization load(RandomAccessReader in) throws IOException
      Throws:
      IOException
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • getCenter

      public float[] getCenter()
    • memorySize

      public long memorySize()
    • toString

      public String toString()
      Overrides:
      toString in class Object