Index
What is it for?
In order to perform actual searches on Spans, i.e. Document portions, you need to create an Index. The most important method in an Index is the find method, which returns results, as in the following example:
my_index.find("railway museum", n=1)
Why does the Vectorian need an Index? For some kinds of searches, an Index allows the Vectorian to perform various optimizations under the hood, quite similar to an index in a database system. Certain kinds of Index objects that are expensive to create can also be saved and loaded, but this is beyond the scope of this introduction.
Constructing an Index
An Index is created from two components, a Partition and a SpanSimilarity:
- The given Partition indicates the granularity of search and which items should get indexed for searching (see the section on Documents for more details). In short, Partition models how to create Spans from Documents.
- The SpanSimilarity models the approach taken to compute the similarity of two Spans (e.g. a specific sort of alignment). See the section on Span Similarity for more details.
Here is an example (my_span_sim is an instance of SpanSimilarity):
my_index = session.partition("document").index(my_span_sim, nlp)
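
To make the division of labour between the two components more concrete, here is a minimal, self-contained sketch of the general idea. It is not the Vectorian's implementation and does not use its API: ToyIndex, sentence_partition and jaccard are hypothetical names invented for this illustration, spans are plain strings, and the similarity is a simple token-set overlap rather than an alignment. It only shows how a partition (how to cut documents into spans) and a similarity (how to score two spans) might combine into an object with a find method, and how work done once at construction time can be reused by every query.

```python
from typing import Callable, FrozenSet, List, Tuple

# NOTE: every name in this block is hypothetical and made up for illustration;
# none of it is part of the Vectorian API.

Span = str  # in this toy model a span is just a piece of text


def sentence_partition(document: str) -> List[Span]:
    """Toy partition: split a document into sentence-like spans at full stops."""
    return [s.strip() for s in document.split(".") if s.strip()]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Toy span similarity: token-set overlap in the range [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class ToyIndex:
    def __init__(
        self,
        documents: List[str],
        partition: Callable[[str], List[Span]],
        similarity: Callable[[FrozenSet[str], FrozenSet[str]], float],
    ) -> None:
        self.similarity = similarity
        # Done once at construction time: create the spans and tokenize them,
        # so that individual queries do not have to repeat this work.
        self.spans = [span for doc in documents for span in partition(doc)]
        self.tokens = [frozenset(span.lower().split()) for span in self.spans]

    def find(self, query: str, n: int = 1) -> List[Tuple[float, Span]]:
        """Return the n best-scoring spans as (score, span) pairs."""
        q = frozenset(query.lower().split())
        scored = [
            (self.similarity(q, tokens), span)
            for tokens, span in zip(self.tokens, self.spans)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:n]


docs = [
    "We visited the railway museum. It rained all day.",
    "The art museum was closed.",
]
toy_index = ToyIndex(docs, sentence_partition, jaccard)
best = toy_index.find("railway museum", n=1)
```

In this sketch, swapping sentence_partition for a function that returns whole documents would change the granularity of the results without touching the similarity, which mirrors the separation between Partition and SpanSimilarity described above.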