Class GraphIndexBuilder<T>
- Type Parameters:
T- the type of vector
See GraphIndex for a high-level overview, and the comments on `addGraphNode` for details on the concurrent building approach.
A GraphIndexBuilder (GIB) allocates scratch space and a copy of the RandomAccessVectorValues for each thread that calls `addGraphNode`. These allocations are retained until the GIB itself is no longer referenced. In most cases this needs no attention, but it does mean that spawning a new Thread per call is inadvisable. This includes virtual threads.
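The retention behaviour described above can be illustrated with a small stand-in. This is a minimal sketch, not the library's code: `PerThreadScratchDemo`, `insert`, and the 128-float buffer are hypothetical, and the map keyed by thread is only an assumed model of "per-thread allocations retained for the builder's lifetime". It compares a fixed pool with a new thread per call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a builder that keeps per-thread scratch space
// for as long as the builder itself is alive. Not the library's code.
final class PerThreadScratchDemo {
    // One scratch buffer per calling thread, never evicted (assumed model).
    private final Map<Thread, float[]> scratch = new ConcurrentHashMap<>();

    // Stand-in for addGraphNode: allocates scratch on a thread's first call.
    void insert(int node) {
        scratch.computeIfAbsent(Thread.currentThread(), t -> new float[128])[0] = node;
    }

    int retainedScratchBuffers() {
        return scratch.size();
    }

    // Inserts `count` nodes through a fixed pool of `threads` workers.
    static int buildWithPool(int count, int threads) {
        PerThreadScratchDemo builder = new PerThreadScratchDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < count; i++) {
            final int node = i;
            pool.submit(() -> builder.insert(node));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return builder.retainedScratchBuffers();
    }

    // Inserts `count` nodes using a brand-new thread for every call.
    static int buildWithThreadPerCall(int count) {
        PerThreadScratchDemo builder = new PerThreadScratchDemo();
        for (int i = 0; i < count; i++) {
            final int node = i;
            Thread t = new Thread(() -> builder.insert(node));
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return builder.retainedScratchBuffers();
    }

    public static void main(String[] args) {
        System.out.println("fixed pool retains:      " + buildWithPool(1000, 4));
        System.out.println("thread per call retains: " + buildWithThreadPerCall(1000));
    }
}
```

In this model the fixed pool retains at most one buffer per worker, while the thread-per-call variant retains one buffer for every call, even though each of those threads has already terminated.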
-
Constructor Summary
Constructors

| Constructor | Description |
| --- | --- |
| GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha) | Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph. |
| GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor) | Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph. |

-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| long | addGraphNode(int node, RandomAccessVectorValues<T> vectors) | Inserts a node with the given vector value to the graph. |
| | build() | |
| void | cleanup() | Cleanup the graph by completing removal of marked-for-delete nodes, trimming neighbor sets to the advertised degree, and updating the entry node. |
| | getGraph() | |
| void | improveConnections(int node) | |
| int | insertsInProgress() | Number of inserts in progress, across all threads. |
| void | | |
| void | markNodeDeleted(int node) | |
| protected float | scoreBetween(T v1, T v2) | |
-
Constructor Details
-
GraphIndexBuilder
public GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha)
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
- Parameters:
vectorValues- the vectors whose relations are represented by the graph; must provide a different view over those vectors than the one used to add via addGraphNode.
M- the maximum number of connections a node can have.
beamWidth- the size of the beam search to use when finding nearest neighbors.
neighborOverflow- the ratio of extra neighbors to allow temporarily when inserting a node. Larger values will build more efficiently, but use more memory.
alpha- how aggressive pruning diverse neighbors should be. Set alpha > 1.0 to allow longer edges. If alpha = 1.0 then the equivalent of the lowest level of an HNSW graph will be created, which is usually not what you want.
-
GraphIndexBuilder
public GraphIndexBuilder(RandomAccessVectorValues<T> vectorValues, VectorEncoding vectorEncoding, VectorSimilarityFunction similarityFunction, int M, int beamWidth, float neighborOverflow, float alpha, ForkJoinPool simdExecutor, ForkJoinPool parallelExecutor)
Reads all the vectors from vector values, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
- Parameters:
vectorValues- the vectors whose relations are represented by the graph; must provide a different view over those vectors than the one used to add via addGraphNode.
M- the maximum number of connections a node can have.
beamWidth- the size of the beam search to use when finding nearest neighbors.
neighborOverflow- the ratio of extra neighbors to allow temporarily when inserting a node. Larger values will build more efficiently, but use more memory.
alpha- how aggressive pruning diverse neighbors should be. Set alpha > 1.0 to allow longer edges. If alpha = 1.0 then the equivalent of the lowest level of an HNSW graph will be created, which is usually not what you want.
simdExecutor- ForkJoinPool instance for SIMD operations; ideally a pool sized to the number of physical cores.
parallelExecutor- ForkJoinPool instance for parallel stream operations.
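To make the effect of alpha concrete, here is a sketch of alpha-controlled diversity pruning in the style of DiskANN's RobustPrune. It is an illustration under stated assumptions, not this class's implementation: it works on Euclidean distances rather than a VectorSimilarityFunction, and `AlphaPruneSketch`, `dist`, and `selectNeighbors` are hypothetical names. A candidate is dropped when an already-kept neighbor is close enough to it that alpha times that distance does not exceed the candidate's distance to the node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of alpha-controlled diversity pruning (DiskANN-style rule).
// Assumption: Euclidean distance; the real builder may use a different rule.
final class AlphaPruneSketch {
    static double dist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns indices into `candidates` that are kept as neighbors of `node`.
    static List<Integer> selectNeighbors(double[] node, double[][] candidates,
                                         int maxDegree, double alpha) {
        // Visit candidates from nearest to farthest.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(i -> dist(node, candidates[i])));

        List<Integer> kept = new ArrayList<>();
        for (int c : order) {
            if (kept.size() >= maxDegree) {
                break;
            }
            boolean diverse = true;
            for (int k : kept) {
                // Drop c when kept neighbor k already "covers" it.
                if (alpha * dist(candidates[k], candidates[c]) <= dist(node, candidates[c])) {
                    diverse = false;
                    break;
                }
            }
            if (diverse) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] node = {0, 0};
        double[][] candidates = {{1, 0}, {2, 0}, {0, 1.5}};
        System.out.println("alpha=1.0 keeps " + selectNeighbors(node, candidates, 3, 1.0));
        System.out.println("alpha=2.5 keeps " + selectNeighbors(node, candidates, 3, 2.5));
    }
}
```

With alpha = 1.0 the farther point on the same axis is pruned because a nearer kept neighbor covers it; a larger alpha makes that test harder to satisfy, so the longer edge survives.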
-
-
Method Details
-
build
-
cleanup
public void cleanup()
Cleans up the graph by completing removal of marked-for-delete nodes, trimming neighbor sets to the advertised degree, and updating the entry node.
Uses the default thread pool to process nodes in parallel. There is currently no way to restrict this to a single thread.
Must be called before writing to disk.
May be called multiple times, but should not be called during concurrent modifications to the graph or while executing concurrent searches on the graph.
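One way a caller might enforce the "not during concurrent modifications or searches" rule is a read-write lock: inserts and searches share the read lock, and cleanup takes the write lock. This is a sketch of a caller-side pattern only. `CleanupGuard` and its methods are hypothetical and not part of this API, and the Runnable arguments stand in for the real calls.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical caller-side guard: graph access may overlap with other graph
// access (read lock), while cleanup runs alone (write lock).
final class CleanupGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Wrap any call that modifies or searches the graph.
    void duringGraphAccess(Runnable action) {
        lock.readLock().lock();
        try {
            action.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Wrap the cleanup call; blocks until no graph access is in flight.
    void duringCleanup(Runnable cleanup) {
        lock.writeLock().lock();
        try {
            cleanup.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // True while some thread holds the write lock, i.e. cleanup is running.
    boolean cleanupRunning() {
        return lock.isWriteLocked();
    }

    public static void main(String[] args) {
        CleanupGuard guard = new CleanupGuard();
        guard.duringGraphAccess(() -> System.out.println("insert or search here"));
        guard.duringCleanup(() -> System.out.println("cleanup here, then write to disk"));
    }
}
```

A thread must not call `duringCleanup` from inside `duringGraphAccess`: ReentrantReadWriteLock does not support upgrading a read lock to a write lock, so that call would block forever.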
-
getGraph
-
insertsInProgress
public int insertsInProgress()
Number of inserts in progress, across all threads.
-
addGraphNode
Inserts a node with the given vector value to the graph.
To allow correctness under concurrency, we track in-progress updates in a ConcurrentSkipListSet. After adding ourselves, we take a snapshot of this set, and consider all other in-progress updates as neighbor candidates.
- Parameters:
node- the node ID to add
vectors- the set of vectors
- Returns:
an estimate of the number of extra bytes used by the graph after adding the given node
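The tracking scheme described for addGraphNode can be sketched in isolation. The snippet below is a minimal illustration rather than the actual implementation: `InProgressTracker`, `begin`, and `finish` are hypothetical names, and the graph search and linking steps are omitted. It shows only the ordering "add ourselves first, then snapshot".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch of tracking in-progress inserts with a
// ConcurrentSkipListSet. Only the bookkeeping is shown.
final class InProgressTracker {
    private final ConcurrentSkipListSet<Integer> inProgress = new ConcurrentSkipListSet<>();

    // Registers `node`, then snapshots the set and returns every OTHER
    // in-progress node as an extra neighbor candidate.
    List<Integer> begin(int node) {
        inProgress.add(node);                               // add ourselves first
        List<Integer> others = new ArrayList<>(inProgress); // then take the snapshot
        others.remove(Integer.valueOf(node));               // exclude ourselves
        return others;
    }

    // Called once the node is fully linked into the graph.
    void finish(int node) {
        inProgress.remove(node);
    }

    // Mirrors the idea of insertsInProgress(): how many inserts are active.
    int insertsInProgress() {
        return inProgress.size();
    }

    public static void main(String[] args) {
        InProgressTracker tracker = new InProgressTracker();
        System.out.println(tracker.begin(1));
        System.out.println(tracker.begin(2));
        tracker.finish(1);
        System.out.println(tracker.insertsInProgress());
    }
}
```

Because each insert registers itself before taking its snapshot, two overlapping inserts should not both miss each other: whichever one snapshots later sees the other, as long as that other insert has not yet finished.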
-
improveConnections
public void improveConnections(int node)
-
markNodeDeleted
public void markNodeDeleted(int node)
-
scoreBetween
-
load
- Throws:
IOException
-