Index

What is it for?

To perform actual searches on Spans (i.e., portions of Documents), you need to create an Index. The most important method of an Index is find, which runs a query against the indexed Spans and returns the best-matching results, as in the following example:

my_index.find("railway museum", n=1)

Why is an Index needed? For some kinds of searches, it allows the Vectorian to perform various optimizations under the hood, much like an index in a database system. Some kinds of Index objects are expensive to create; these can be saved and loaded, but doing so is beyond the scope of this introduction.
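To make the database analogy concrete, here is a toy sketch of one kind of optimization an index can enable: work done once at construction time (tokenizing every span and building an inverted index) lets each query skip spans that cannot possibly match. This is a hypothetical illustration only. ToyInvertedIndex, its find method and the Jaccard scoring are assumptions for this example, not Vectorian's API and not a description of what the Vectorian does internally.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Hypothetical illustration: not part of the Vectorian API."""

    def __init__(self, spans):
        self.spans = list(spans)
        self.token_sets = []
        # Built once, at construction time: token -> ids of spans containing it.
        self.postings = defaultdict(set)
        for span_id, text in enumerate(self.spans):
            tokens = set(text.lower().split())
            self.token_sets.append(tokens)
            for token in tokens:
                self.postings[token].add(span_id)

    def find(self, query, n=1):
        query_tokens = set(query.lower().split())
        # Candidate pruning: only spans sharing a token with the query are scored.
        candidates = set()
        for token in query_tokens:
            candidates |= self.postings.get(token, set())
        scored = []
        for span_id in candidates:
            tokens = self.token_sets[span_id]
            # Jaccard similarity between the query and span token sets.
            score = len(query_tokens & tokens) / len(query_tokens | tokens)
            scored.append((score, span_id))
        # Highest score first; ties broken by span position for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.spans[span_id], score) for score, span_id in scored[:n]]


index = ToyInvertedIndex([
    "The railway museum opens at nine",
    "A museum of modern art",
    "Trains run every hour",
])
print(index.find("railway museum", n=1))
```

Here the third span is never scored for "railway museum", because it shares no token with the query. A real system would use far richer similarity measures, but the trade-off is the same: pay once when building the index, save on every search.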

Constructing an Index

An Index is created from two components, a Partition and a SpanSimilarity:

  • The Partition determines the granularity of the search, i.e. which items get indexed for searching (see the section on Documents for more details). In short, a Partition models how Spans are created from Documents.
  • The SpanSimilarity models how the similarity of two Spans is computed (e.g., with a specific kind of alignment). See the section on Span Similarity for more details.
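The division of labor between the two components can be sketched in plain Python: a partition function decides which spans exist, a similarity function decides how a query and a span are compared, and the index simply combines the two. All names below (partition_by_document, partition_by_sentence, token_overlap, ToyIndex) are hypothetical stand-ins chosen for illustration; they are not Vectorian classes, and the naive sentence splitter and token-overlap score are deliberately simplistic assumptions.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical stand-ins, not Vectorian's API:
# a "partition" turns documents into spans, a "span similarity" scores two spans.
Partition = Callable[[Iterable[str]], List[str]]
SpanSimilarity = Callable[[str, str], float]


def partition_by_document(documents):
    # Coarse granularity: each whole document becomes one span.
    return list(documents)


def partition_by_sentence(documents):
    # Finer granularity: naive split after ".", "!" or "?" followed by whitespace.
    spans = []
    for doc in documents:
        spans.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip())
    return spans


def token_overlap(a, b):
    # Toy similarity: fraction of the first span's words that also occur in the second.
    a_tokens = set(re.findall(r"\w+", a.lower()))
    b_tokens = set(re.findall(r"\w+", b.lower()))
    return len(a_tokens & b_tokens) / len(a_tokens) if a_tokens else 0.0


class ToyIndex:
    def __init__(self, documents: Iterable[str], partition: Partition, span_sim: SpanSimilarity):
        self.spans = partition(documents)  # granularity: what gets indexed
        self.span_sim = span_sim           # how a query and a span are compared

    def find(self, query, n=1):
        # Score every span against the query; stable sort keeps ties in span order.
        scored = sorted(
            ((self.span_sim(query, span), span) for span in self.spans),
            key=lambda pair: -pair[0],
        )
        return scored[:n]


docs = ["We visited the railway museum. Then we had lunch by the river."]
by_doc = ToyIndex(docs, partition_by_document, token_overlap)
by_sentence = ToyIndex(docs, partition_by_sentence, token_overlap)
print(by_doc.find("railway museum", n=1))
print(by_sentence.find("railway museum", n=1))
```

With the same documents and the same similarity, the document-level index returns the whole text, while the sentence-level index returns only "We visited the railway museum.", which shows how the choice of partition alone changes what a search can return.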

Here is an example (my_span_sim is an instance of SpanSimilarity, and nlp is the spaCy pipeline used to process query text):

my_index = session.partition("document").index(my_span_sim, nlp)