Vespa sample application - text search tutorial

This sample application contains the code for the text search tutorial. Please refer to the text search tutorial for more information.

See also the MS Marco Ranking sample application for ranking using state-of-the-art retrieval and ranking methods. There is also a Ranking with Transformers sample application.

The steps below deploy the end-to-end application, including a custom front-end.

Prerequisites

  • Docker Desktop installed and running, with at least 10 GB of memory dedicated to Docker (the default is 2 GB on Macs). Refer to Docker memory for details and troubleshooting.
  • Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
  • Architecture: x86_64 or arm64
  • Homebrew to install Vespa CLI, or download a Vespa CLI release from GitHub releases
  • Python 3
  • Java 17
  • Apache Maven
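
As a convenience, the prerequisites above can be checked from a shell before starting. The following is a minimal sketch, not part of the sample application: the function names `check_tool` and `check_prerequisites` are hypothetical, and the executable names (`docker`, `python3`, `java`, `mvn`) are assumptions about how the tools are installed on your system.

```shell
# Sketch: report which prerequisite tools are missing from PATH.
# check_tool and check_prerequisites are illustrative names, not part of this repo.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

check_prerequisites() {
  missing=0
  # Assumed executable names; adjust if your installation uses different ones.
  for tool in docker python3 java mvn; do
    check_tool "$tool" || missing=1
  done
  return "$missing"
}
```

Calling `check_prerequisites` prints one line per tool and returns non-zero if any is missing. It does not check versions or the memory available to Docker; `docker info | grep "Total Memory"` is one way to inspect the latter.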

Installing vespa-cli

This tutorial uses Vespa CLI, the official command-line client for Vespa.ai. It is a single binary without any runtime dependencies and is available for Linux, macOS and Windows.

$ brew install vespa-cli 
$ vespa clone text-search text-search && cd text-search
$ ./bin/convert-msmarco.sh
$ docker run --detach --name vespa-msmarco --hostname vespa-msmarco \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19112:19112 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
$ vespa deploy --wait 300 
$ vespa feed ext/vespa.json
$ vespa query 'yql=select title,url,id from msmarco where userQuery()' 'query=what is dad bod' 
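
The same query can also be sent directly to Vespa's HTTP query endpoint, which may help when wiring up a front-end or debugging without the CLI. The sketch below assumes the container started above is publishing port 8080 on localhost. The function name `query_vespa` and the `VESPA_ENDPOINT` and `CURL` variables are hypothetical helpers introduced here for illustration.

```shell
# Sketch: run the tutorial query over HTTP with curl instead of the Vespa CLI.
# VESPA_ENDPOINT, CURL and query_vespa are illustrative names, not part of this repo.
VESPA_ENDPOINT="${VESPA_ENDPOINT:-http://localhost:8080}"

query_vespa() {
  # $1 is the user query text; --data-urlencode escapes each parameter,
  # and --get sends them as a query string to /search/.
  "${CURL:-curl}" --silent --get \
    --data-urlencode 'yql=select title,url,id from msmarco where userQuery()' \
    --data-urlencode "query=$1" \
    "$VESPA_ENDPOINT/search/"
}
```

For example, `query_vespa 'what is dad bod'` should return the result set as JSON once the application is deployed and fed. Setting `CURL=echo` prints the arguments instead of sending the request, which is handy for checking what would be sent.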

Using Logstash to feed data

Instead of using the vespa feed command above, we can use Logstash to feed data. This way:

  • You don't need to convert the data to JSON via ./bin/convert-msmarco.sh.
  • You can more easily adapt this sample application to your own data (e.g. by making Logstash read from a different file or database).

You'll need to install Logstash. Then:

  1. Install Logstash Output Plugin for Vespa via:

bin/logstash-plugin install logstash-output-vespa_feed

  2. Change logstash.conf to point to the absolute path of msmarco-docs.tsv.

  3. Run Logstash with the modified logstash.conf:

bin/logstash -f $PATH_TO_LOGSTASH_CONF/logstash.conf

Delete container

Remove app and data:

$ docker rm -f vespa-msmarco