Class GraphIndexBuilder<T>

java.lang.Object
io.github.jbellis.jvector.graph.GraphIndexBuilder<T>
Type Parameters:
T - the type of vector

public class GraphIndexBuilder<T> extends Object
Builder for a concurrent GraphIndex. See GraphIndex for a high-level overview, and the comments on `addGraphNode` for details on the concurrent building approach.
  • Constructor Details

    • GraphIndexBuilder

      public GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha)
      Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
      Parameters:
      vectorValues - the vectors whose relations are represented by the graph. Must provide a different view over those vectors than the one passed to addGraphNode.
      M - the maximum number of connections a node can have
      beamWidth - the size of the beam search to use when finding nearest neighbors.
      neighborOverflow - the ratio of extra neighbors to allow temporarily when inserting a node. Larger values build more efficiently but use more memory.
      alpha - how aggressive the pruning of diverse neighbors should be. Set alpha > 1.0 to allow longer edges. If alpha = 1.0, the equivalent of the lowest level of an HNSW graph is created, which is usually not what you want.
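
To make the effect of alpha concrete, here is a minimal sketch of a Vamana/DiskANN-style diversity pruning rule. It is an illustration under assumptions, not this library's implementation: the class `AlphaPruneSketch`, its methods, and the use of Euclidean distance are all hypothetical, and the library itself works with a VectorSimilarityFunction rather than raw distances. A candidate is dropped when an already selected neighbor lies closer to it, by a factor of alpha, than the node being inserted does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not part of the GraphIndexBuilder API.
public class AlphaPruneSketch {
    /** Euclidean distance between two points of equal dimension. */
    public static double distance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Selects up to maxDegree neighbors for node, visiting candidates nearest first.
     * A candidate c is rejected when some already selected neighbor s satisfies
     * alpha * distance(s, c) <= distance(node, c).
     */
    public static List<float[]> prune(float[] node, List<float[]> candidates, int maxDegree, float alpha) {
        List<float[]> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((float[] c) -> distance(node, c)));
        List<float[]> selected = new ArrayList<>();
        for (float[] c : sorted) {
            if (selected.size() >= maxDegree) {
                break;
            }
            boolean covered = false;
            for (float[] s : selected) {
                if (alpha * distance(s, c) <= distance(node, c)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                selected.add(c);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        float[] node = {0f, 0f};
        List<float[]> candidates = Arrays.asList(
                new float[]{1f, 0f}, new float[]{2f, 0f}, new float[]{0f, 1.5f});
        System.out.println("kept with alpha=1.0: " + prune(node, candidates, 8, 1.0f).size());
        System.out.println("kept with alpha=2.5: " + prune(node, candidates, 8, 2.5f).size());
    }
}
```

In this toy data the point (2, 0) sits directly behind (1, 0), so the strict rule (alpha = 1.0) discards it, while a larger alpha keeps it as a longer edge.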
    • GraphIndexBuilder

      public GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor)
      Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
      Parameters:
      vectorValues - the vectors whose relations are represented by the graph. Must provide a different view over those vectors than the one passed to addGraphNode.
      M - the maximum number of connections a node can have
      beamWidth - the size of the beam search to use when finding nearest neighbors.
      neighborOverflow - the ratio of extra neighbors to allow temporarily when inserting a node. Larger values build more efficiently but use more memory.
      alpha - how aggressive the pruning of diverse neighbors should be. Set alpha > 1.0 to allow longer edges. If alpha = 1.0, the equivalent of the lowest level of an HNSW graph is created, which is usually not what you want.
      simdExecutor - ForkJoinPool instance for SIMD operations. Ideally, use a pool sized to the number of physical cores.
      parallelExecutor - ForkJoinPool instance for parallel stream operations
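
The two executor parameters can be prepared along the following lines. This is a sketch under stated assumptions: `ExecutorSketch`, `poolOf`, and `sumOfSquares` are hypothetical names, and because the JDK only reports logical processors (`Runtime.availableProcessors()`), halving that number to approximate physical cores is a guess that holds only on machines with two hardware threads per core. The sketch also shows the common technique of running a parallel stream from inside a specific ForkJoinPool. That technique relies on observed JDK behavior rather than a documented guarantee.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

// Hypothetical illustration; not part of the GraphIndexBuilder API.
public class ExecutorSketch {
    /** Creates a pool with the requested parallelism, clamped to at least 1. */
    public static ForkJoinPool poolOf(int parallelism) {
        return new ForkJoinPool(Math.max(1, parallelism));
    }

    /** Submits a parallel stream computation to the given pool and waits for the result. */
    public static long sumOfSquares(ForkJoinPool pool, int n) {
        Callable<Long> task = () -> LongStream.rangeClosed(1, n).parallel().map(i -> i * i).sum();
        return pool.submit(task).join();
    }

    public static void main(String[] args) {
        int logical = Runtime.getRuntime().availableProcessors();
        // Assumption: two hardware threads per physical core.
        int physicalGuess = Math.max(1, logical / 2);
        ForkJoinPool simdExecutor = poolOf(physicalGuess);
        ForkJoinPool parallelExecutor = poolOf(logical);
        try {
            System.out.println("simd parallelism: " + simdExecutor.getParallelism());
            System.out.println("parallel parallelism: " + parallelExecutor.getParallelism());
            System.out.println("sum of squares 1..100: " + sumOfSquares(parallelExecutor, 100));
        } finally {
            simdExecutor.shutdown();
            parallelExecutor.shutdown();
        }
    }
}
```

Pools built this way would then be passed as the last two constructor arguments. The caller remains responsible for shutting them down.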
  • Method Details

    • build

      public OnHeapGraphIndex<T> build()
    • cleanup

      public void cleanup()
      Cleans up the graph by completing removal of marked-for-delete nodes, trimming neighbor sets to the advertised degree, and updating the entry node.

      Uses the default thread pool to process nodes in parallel. There is currently no way to restrict this to a single thread.

      Must be called before writing to disk.

      May be called multiple times, but should not be called during concurrent modifications to the graph.

    • getGraph

      public OnHeapGraphIndex<T> getGraph()
    • insertsInProgress

      public int insertsInProgress()
      Number of inserts in progress, across all threads.
    • addGraphNode

      public long addGraphNode(int node, RandomAccessVectorValues<T> vectors)
      Inserts a node with the given vector value into the graph.

      To allow correctness under concurrency, we track in-progress updates in a ConcurrentSkipListSet. After adding ourselves, we take a snapshot of this set, and consider all other in-progress updates as neighbor candidates.

      Parameters:
      node - the node ID to add
      vectors - the set of vectors
      Returns:
      an estimate of the number of extra bytes used by the graph after adding the given node
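
The in-progress tracking described for addGraphNode can be pictured with the following sketch. It shows only the bookkeeping pattern (register, snapshot, unregister) on a ConcurrentSkipListSet. The class `InProgressSketch` and its method names are hypothetical, and the real method does considerably more work (searching, linking, pruning) between registering and unregistering a node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical illustration; not part of the GraphIndexBuilder API.
public class InProgressSketch {
    private final ConcurrentSkipListSet<Integer> inProgress = new ConcurrentSkipListSet<>();

    /**
     * Registers node as in progress, then returns the other nodes that were
     * in flight when the set was scanned. The iterator is weakly consistent,
     * so the scan is safe while other threads add or remove entries.
     */
    public List<Integer> begin(int node) {
        inProgress.add(node);
        List<Integer> others = new ArrayList<>();
        for (int n : inProgress) {
            if (n != node) {
                others.add(n);
            }
        }
        return others;
    }

    /** Unregisters node once its insertion is complete. */
    public void finish(int node) {
        inProgress.remove(node);
    }

    /** Number of inserts currently registered, across all threads. */
    public int insertsInProgress() {
        return inProgress.size();
    }

    public static void main(String[] args) {
        InProgressSketch tracker = new InProgressSketch();
        System.out.println("others seen by 3: " + tracker.begin(3));
        System.out.println("others seen by 7: " + tracker.begin(7));
        tracker.finish(3);
        System.out.println("in progress: " + tracker.insertsInProgress());
    }
}
```

Because each inserter adds itself before scanning, two concurrent inserts cannot both miss each other: at least one of them sees the other in its snapshot and can consider it as a neighbor candidate.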
    • improveConnections

      public void improveConnections(int node)
    • markNodeDeleted

      public void markNodeDeleted(int node)
    • scoreBetween

      protected float scoreBetween(T v1, T v2)
    • load

      public void load(RandomAccessReader in) throws IOException
      Throws:
      IOException