Package io.github.jbellis.jvector.graph
Class GraphIndexBuilder<T>
java.lang.Object
io.github.jbellis.jvector.graph.GraphIndexBuilder<T>
- Type Parameters:
T- the type of vector
Builder for a concurrent GraphIndex. See
GraphIndex for a high-level overview, and the
comments to `addGraphNode` for details on the concurrent building approach.
Constructor Summary
Constructors
- GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha)
  Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
- GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor)
  Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
-
Method Summary
Modifier and Type, Method, Description
- long addGraphNode(int node, RandomAccessVectorValues<T> vectors): Inserts a node with the given vector value to the graph.
- build()
- void cleanup(): Cleans up the graph by completing removal of marked-for-delete nodes, trimming neighbor sets to the advertised degree, and updating the entry node.
- getGraph()
- void improveConnections(int node)
- int insertsInProgress(): Number of inserts in progress, across all threads.
- void load
- void markNodeDeleted(int node)
- protected float scoreBetween(T v1, T v2)
-
Constructor Details
-
GraphIndexBuilder
public GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha)
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
- Parameters:
vectorValues- the vectors whose relations are represented by the graph; must provide a different view over those vectors than the one used to add via addGraphNode.
M- the maximum number of connections a node can have.
beamWidth- the size of the beam search to use when finding nearest neighbors.
neighborOverflow- the ratio of extra neighbors to allow temporarily when inserting a node. Larger values will build more efficiently, but use more memory.
alpha- how aggressive the pruning of neighbors for diversity should be. Set alpha > 1.0 to allow longer edges. If alpha = 1.0 then the equivalent of the lowest level of an HNSW graph will be created, which is usually not what you want.
-
GraphIndexBuilder
public GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor)
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
- Parameters:
vectorValues- the vectors whose relations are represented by the graph; must provide a different view over those vectors than the one used to add via addGraphNode.
M- the maximum number of connections a node can have.
beamWidth- the size of the beam search to use when finding nearest neighbors.
neighborOverflow- the ratio of extra neighbors to allow temporarily when inserting a node. Larger values will build more efficiently, but use more memory.
alpha- how aggressive the pruning of neighbors for diversity should be. Set alpha > 1.0 to allow longer edges. If alpha = 1.0 then the equivalent of the lowest level of an HNSW graph will be created, which is usually not what you want.
simdExecutor- ForkJoinPool instance for SIMD operations; a pool sized to the number of physical cores works best.
parallelExecutor- ForkJoinPool instance for parallel stream operations.
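To make the effect of alpha concrete, the following is a minimal sketch of an alpha-controlled diversity rule in the style of DiskANN's robust pruning. It is an assumption for illustration, not jvector's actual code: the class `AlphaPruneSketch`, its methods, and the use of Euclidean distance (rather than a VectorSimilarityFunction score) are all hypothetical. A candidate is kept only if no already-selected neighbor is much closer to it than the base node is, with alpha setting how strict that test is.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of jvector.
class AlphaPruneSketch {
    // Euclidean distance between two vectors of equal length.
    static double distance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Selects up to maxDegree neighbors for base, nearest candidates first.
    // A candidate c is kept only if, for every already-selected neighbor s,
    // alpha * distance(s, c) > distance(base, c).
    // Larger alpha makes this test easier to pass, so longer edges survive.
    static List<float[]> selectDiverse(float[] base, List<float[]> candidates,
                                       int maxDegree, float alpha) {
        List<float[]> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(c -> distance(base, c)));
        List<float[]> selected = new ArrayList<>();
        for (float[] c : sorted) {
            if (selected.size() >= maxDegree) {
                break;
            }
            boolean diverse = true;
            for (float[] s : selected) {
                if (alpha * distance(s, c) <= distance(base, c)) {
                    diverse = false; // s already covers c well enough
                    break;
                }
            }
            if (diverse) {
                selected.add(c);
            }
        }
        return selected;
    }
}
```

In this sketch, with base (0, 0) and candidates (1, 0), (2, 0), and (0, 1.5), alpha = 1.0 drops (2, 0) because (1, 0) lies between it and the base, while alpha = 2.5 keeps all three.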
-
-
Method Details
-
build
-
cleanup
public void cleanup()
Cleans up the graph by completing removal of marked-for-delete nodes, trimming neighbor sets to the advertised degree, and updating the entry node.
Uses the default thread pool to process nodes in parallel. There is currently no way to restrict this to a single thread.
Must be called before writing to disk.
May be called multiple times, but should not be called during concurrent modifications to the graph.
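The three cleanup steps can be pictured with a toy adjacency-list graph. This is only a sketch under stated assumptions: `CleanupSketch` and its fields are hypothetical and are not jvector's data structures, the trimming rule here simply keeps the first maxDegree edges, the replacement entry node is chosen arbitrarily as the smallest remaining ID, and the work is done sequentially rather than on a thread pool.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy graph; not jvector's data structure.
class CleanupSketch {
    final Map<Integer, List<Integer>> neighbors = new TreeMap<>();
    final Set<Integer> deleted = new HashSet<>();
    final int maxDegree;
    int entryNode = -1; // -1 means the graph is empty

    CleanupSketch(int maxDegree) {
        this.maxDegree = maxDegree;
    }

    void addNode(int node, List<Integer> edges) {
        neighbors.put(node, new ArrayList<>(edges));
        if (entryNode < 0) {
            entryNode = node;
        }
    }

    // Only records the intent; the node stays in the graph until cleanup().
    void markNodeDeleted(int node) {
        deleted.add(node);
    }

    void cleanup() {
        // 1. Complete removal of marked nodes and of edges pointing at them.
        neighbors.keySet().removeAll(deleted);
        for (List<Integer> edges : neighbors.values()) {
            edges.removeAll(deleted);
            // 2. Trim to the advertised degree (toy rule: keep the first edges).
            if (edges.size() > maxDegree) {
                edges.subList(maxDegree, edges.size()).clear();
            }
        }
        // 3. Pick a new entry node if the old one was removed.
        if (deleted.contains(entryNode)) {
            entryNode = neighbors.isEmpty() ? -1 : neighbors.keySet().iterator().next();
        }
        deleted.clear(); // a second cleanup() call is then a no-op
    }
}
```

Because the pending-delete set is cleared at the end, calling `cleanup()` again without further changes leaves the toy graph untouched, which mirrors the "may be called multiple times" note above.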
-
getGraph
-
insertsInProgress
public int insertsInProgress()
Number of inserts in progress, across all threads.
-
addGraphNode
Inserts a node with the given vector value to the graph.
To allow correctness under concurrency, we track in-progress updates in a ConcurrentSkipListSet. After adding ourselves, we take a snapshot of this set, and consider all other in-progress updates as neighbor candidates.
- Parameters:
node- the node ID to add
vectors- the set of vectors
- Returns:
an estimate of the number of extra bytes used by the graph after adding the given node
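The in-progress tracking described above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's implementation: the class `InProgressSketch` and the methods `beginInsert` and `endInsert` are hypothetical names. Each inserting thread registers its node first and only then takes the snapshot, so of any two overlapping inserts at least one is guaranteed to see the other.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical illustration class; not part of jvector.
class InProgressSketch {
    // Node IDs whose insertion has started but not yet finished.
    private final ConcurrentSkipListSet<Integer> inProgress = new ConcurrentSkipListSet<>();

    // Registers the node, then returns a snapshot of the OTHER in-progress
    // nodes, which the caller would treat as extra neighbor candidates.
    Set<Integer> beginInsert(int node) {
        inProgress.add(node);                 // add ourselves first
        ConcurrentSkipListSet<Integer> snapshot = inProgress.clone();
        snapshot.remove(node);                // exclude the node being inserted
        return snapshot;
    }

    // Called once the node is fully linked into the graph.
    void endInsert(int node) {
        inProgress.remove(node);
    }

    // Number of inserts in progress, across all threads.
    int insertsInProgress() {
        return inProgress.size();
    }
}
```

A caller would wrap the actual graph search and linking between `beginInsert` and `endInsert`, ideally in a try/finally so the node is always unregistered.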
-
improveConnections
public void improveConnections(int node)
-
markNodeDeleted
public void markNodeDeleted(int node)
-
scoreBetween
-
load
- Throws:
IOException
-