Welcome to the Search.io beta. This guide will help you get started with the new experience and highlight some of the new capabilities that are available in search.io. Let’s get started!

Pick your data set

First, you will need to decide what data you want to use. You can download an example data set provided by us or bring your own data.

Top 1000 movies from IMDB

It’s best to explore search.io with a data set you are familiar with. If you aren’t bringing your own data, we are sure you will be familiar with some of the movies on the IMDB top 1000 list.


Bring your own

If you decide to explore search.io with your own data, make sure it is in JSON format with no nested fields, so that each record is a flat set of key-value pairs.

For example, a record should look like this:

    {
        "show_id": 80192018,
        "type": "Movie",
        "title": "Star Wars: Episode VIII: The Last Jedi",
        "director": "Rian Johnson",
        "cast": "Mark Hamill, Carrie Fisher, Adam Driver, Daisy Ridley, John Boyega, Oscar Isaac, Andy Serkis, Lupita Nyong'o, Domhnall Gleeson, Anthony Daniels, Gwendoline Christie, Kelly Marie Tran, Laura Dern, Frank Oz, Benicio Del Toro, Warwick Davis, Noah Segan, Jimmy Vee, Joonas Suotamo, Joseph Gordon-Levitt, Tim Rose, Paul Kasey, Matthew Sharp, Adrian Edmondson, Amanda Lawrence, Justin Theroux",
        "country": "United States",
        "date_added": "June 26, 2018",
        "release_year": 2017,
        "rating": "PG-13",
        "duration": "152 min",
        "listed_in": "Action & Adventure, Children & Family Movies, Sci-Fi & Fantasy",
        "description": "As the remnants of the Resistance flee Kylo Ren and the First Order, Rey seeks out Luke Skywalker – but he wants nothing more to do with the Force."
    }
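If you are preparing your own file, a quick pre-flight check can catch nested fields before you upload. The following is a minimal Python sketch, assuming your data is a JSON array of records. The function names `is_flat` and `find_nested` are hypothetical, and the sketch assumes that lists of plain values are acceptable (search.io can detect list fields) while nested objects, or lists containing objects, are not.

```python
import json

# Plain JSON value types: string, number, boolean, and null (None).
SCALARS = (str, int, float, bool, type(None))


def is_flat(record: dict) -> bool:
    """Return True if no field holds a nested object or a list containing one.

    Assumption: lists of plain scalar values are allowed; nested objects are not.
    """
    for value in record.values():
        if isinstance(value, dict):
            return False
        if isinstance(value, list) and not all(isinstance(v, SCALARS) for v in value):
            return False
    return True


def find_nested(records: list[dict]) -> list[int]:
    """Return the positions of records that contain nested fields."""
    return [i for i, record in enumerate(records) if not is_flat(record)]


if __name__ == "__main__":
    data = json.loads("""[
        {"show_id": 80192018, "title": "Star Wars: Episode VIII: The Last Jedi", "release_year": 2017},
        {"show_id": 1, "title": "Nested example", "details": {"rating": "PG-13"}}
    ]""")
    # Only the second record has a nested object, so its position is reported.
    print(find_nested(data))
```

To check a real file, replace the inline string with `json.load` on your own file handle.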

Create your account

Head over to https://www.search.io and sign up. If you already have an account, sign in with your existing account.

Create a new collection

Collections store the records that you want to search through.

They also contain the configuration associated with your data including pipelines, rules, synonyms, authorized domains, and analytics. Each collection has an associated schema that designates field names, field types, and whether a field's data is indexed for text search.

Click Activate the private beta to enable beta access for your account.

Once beta access is activated, select Join Beta to create a new App or Store collection.

Uploading your data

After entering a name for your collection, you can upload your data set. Simply drag and drop the movie data set or your own file onto the screen to upload it.

Once the file is uploaded, click Generate schema, and search.io will infer the schema from the data you just provided.

Verify your schema

Although search.io will do its best to identify field types, lists, and unique fields, it’s important to verify that the fields are correct. If the fields don’t match the structure of the records you are uploading, the records will be rejected.
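To see why a mismatched record gets rejected, here is a simplified Python sketch of inferring a schema from sample records and then checking new records against it. It is an illustration under stated assumptions, not search.io’s actual inference logic: the type names (`integer`, `float`, `string`, `boolean`, `list`), the rule that a field takes the type of its first non-null value, and the functions `infer_schema` and `validate` are all hypothetical, and the check covers field types only.

```python
def field_type(value) -> str:
    """Map a JSON value to a hypothetical schema type name."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, list):
        return "list"
    return "string"


def infer_schema(records: list[dict]) -> dict[str, str]:
    """Infer a field -> type mapping from the first non-null value seen for each field."""
    schema: dict[str, str] = {}
    for record in records:
        for name, value in record.items():
            if value is not None and name not in schema:
                schema[name] = field_type(value)
    return schema


def validate(record: dict, schema: dict[str, str]) -> list[str]:
    """Return type mismatches; an empty list means the record fits the schema."""
    problems = []
    for name, value in record.items():
        expected = schema.get(name)
        if expected is not None and value is not None and field_type(value) != expected:
            problems.append(f"{name}: expected {expected}, got {field_type(value)}")
    return problems


if __name__ == "__main__":
    sample = [{"show_id": 80192018, "title": "Star Wars: Episode VIII: The Last Jedi", "release_year": 2017}]
    schema = infer_schema(sample)
    print(schema)
    # show_id arrives as a string here, so this record would not match the inferred schema.
    print(validate({"show_id": "80192018", "title": "Jaws"}, schema))
```

A mismatch like the quoted `show_id` above is exactly the kind of problem worth catching while you review the generated schema.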

Select searchable fields and train query suggestions

Select which fields you want to use for searching, and rank them in order of priority. The order will determine the weight assigned to each of those fields for the initial configuration of the search algorithm.

Query suggestion fields are typically a subset of the searchable fields. They are used to train the suggestions search.io makes when users type a query into the search box. Common choices here are titles, product names, companies, brands, or categories.


You’ve completed the initial setup!

Optimize your search pipeline

In search.io, you configure your search algorithm using pipelines. Pipelines are easily configurable YAML-based scripts that define a series of steps that are executed sequentially when indexing a record (record pipeline) or performing a query (query pipeline). The configuration of an intelligent search algorithm can be extremely complicated. Pipelines break down this problem into smaller pieces that can be easily mixed, matched, and combined to create an incredibly powerful search experience.

Select Pipelines (beta) in the navigation to get started.

In the editor you can find the query pipeline that was generated by the onboarding process. A few things have already been set up for you automatically.

At the top of the file are a number of default steps including filters and pagination that allow you to customize those aspects of the search. Additionally, steps for spelling and synonyms have already been added and will work out of the box.

Lastly, a number of boosts have been added to account for the weighting you gave different fields in the searchable fields setup step.

Learn more about pipelines

Pipeline types

search.io uses two types of pipelines to provide this flexibility at indexing time and at query time.

Record pipelines

The record pipeline can update and augment records as they are indexed. Steps can include:

  • data transformation - e.g. trimming a title

  • data enrichment - e.g. generating a latitude and longitude from an address

  • classification - labeling content with a category based on an existing model

  • vectorization - clustering uncategorized records

  • image recognition - detecting objects and faces, reading printed and handwritten text, and extracting metadata

Query pipelines

Query pipelines define the query execution and results ranking strategies used when searching the records in your collection. Steps in a query pipeline can be used for:

  • Query understanding - query rewrites, spelling, NLP, …

  • Filtering results - based on any attribute in the index, for example location or customer-specific results.

  • Changing the relevance logic - dynamically boost different aspects based on the search query, parameters or data models

  • Constructing the engine query - unlike the input query, the engine query is what is actually executed, and it can be extremely complex


A step is a unit of work in the pipeline flow that is responsible for performing one of the individual tasks listed above.

Steps are made up of several components:

  1. Constants - Constants are used to configure steps. They are fixed and can't be changed at query time.

  2. Params - Params are key-value pairs that are initialized with the request and passed from step to step. Each step can add or modify params and pass them on to subsequent steps. Once the pipeline has been executed, the modified params become available as output values of the pipeline.

  3. Conditions - Each step can be conditionally executed based on the input values. Conditions are boolean expressions that use operators (AND/OR and =, ~, >, <, !=, etc.) to evaluate the pipeline param values. If the condition is satisfied, the step executes; otherwise, it is bypassed.
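The way constants, params, and conditions interact can be modelled in a few lines of Python. This is a toy sketch, not search.io’s pipeline engine: the `Step` class, the `run_pipeline` function, the step names, and the filter string format are hypothetical, and conditions are written as Python functions over the params rather than in the pipeline’s own condition syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Params = dict[str, str]


@dataclass
class Step:
    id: str
    # Constants configure the step and stay fixed at query time.
    constants: dict[str, str] = field(default_factory=dict)
    # Condition evaluated against the current params; None means the step always runs.
    condition: Optional[Callable[[Params], bool]] = None
    # The step's work: receives its constants and the params, returns updated params.
    action: Callable[[dict[str, str], Params], Params] = lambda constants, params: params


def run_pipeline(steps: list[Step], request_params: Params) -> Params:
    """Run steps in order, passing params along, and return the final params."""
    params = dict(request_params)  # params are initialized from the request (copied, not mutated)
    for step in steps:
        if step.condition is not None and not step.condition(params):
            continue  # condition not satisfied: the step is bypassed
        params = step.action(step.constants, params)
    return params  # the modified params become the pipeline's output values


if __name__ == "__main__":
    steps = [
        Step(id="lowercase-query", action=lambda c, p: {**p, "q": p["q"].lower()}),
        Step(
            id="country-filter",
            constants={"field": "country"},
            condition=lambda p: "country" in p,  # only runs when a country param is sent
            action=lambda c, p: {**p, "filter": f"{c['field']} = '{p['country']}'"},
        ),
    ]
    print(run_pipeline(steps, {"q": "Star Wars"}))
    print(run_pipeline(steps, {"q": "Star Wars", "country": "United States"}))
```

The first call skips the filter step because its condition is not met; the second call passes a `country` param, so the filter step runs and adds a `filter` param to the output.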

Pre-steps and Post-steps

Pre-steps and post-steps split the pipeline into two parts: one that runs before the request is sent to the search index, and another that runs afterward.

Query pipelines

When running a query, the pipeline post-steps have access to the result set. This makes it possible to act on the results before sending them back to the caller.

Record pipelines

For indexing operations, pre-steps are used to update and augment the record before it is stored in the index. The pipeline post-steps only run when creating new records. They do not execute when updating records.
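The split between pre-steps and post-steps, and the rule that record post-steps run only when a record is created, can be sketched as follows. This is a hypothetical Python model: `search_index`, `store_record`, and `trim_title` are stand-ins rather than search.io APIs, and each step is simplified to a plain function.

```python
from typing import Callable


def run_query_pipeline(
    query: str,
    pre_steps: list[Callable[[str], str]],
    search_index: Callable[[str], list],
    post_steps: list[Callable[[list], list]],
) -> list:
    """Pre-steps shape the request; post-steps can act on the result set before it is returned."""
    for step in pre_steps:
        query = step(query)
    results = search_index(query)  # the request reaches the index between the two phases
    for step in post_steps:
        results = step(results)
    return results


def run_record_pipeline(
    record: dict,
    is_new: bool,
    pre_steps: list[Callable[[dict], dict]],
    store_record: Callable[[dict], None],
    post_steps: list[Callable[[dict], None]],
) -> dict:
    """Pre-steps update the record before it is stored; post-steps run only for new records."""
    for step in pre_steps:
        record = step(record)
    store_record(record)
    if is_new:  # post-steps are skipped when an existing record is updated
        for step in post_steps:
            step(record)
    return record


def trim_title(record: dict) -> dict:
    """Example data-transformation pre-step: trim whitespace around the title."""
    return {**record, "title": record["title"].strip()}


if __name__ == "__main__":
    stored, post_step_calls = [], []
    run_record_pipeline({"title": "  Jaws "}, True, [trim_title], stored.append, [post_step_calls.append])
    run_record_pipeline({"title": " Jaws"}, False, [trim_title], stored.append, [post_step_calls.append])
    print(stored)                # both records are stored with a trimmed title
    print(len(post_step_calls))  # post-steps ran only for the newly created record
```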

Boosting popularity

Let’s use an example search for the term star wars. (This assumes you selected the movie data set above; if not, simply follow along with an example that matches your own data set.)

The search preview will show the following results:

Let’s be honest: the likelihood that somebody searching for star wars is interested in “Star Wars: Episode I - The Phantom Menace” is quite low. 😉

We can fix that by taking popularity into account, giving more weight to movies that match the search term and also have higher popularity.

Simply add the following step to your pipeline anywhere before the postSteps: section.

- id: range-boost
    - value: popularity   # field to boost
    - value: "0.4"        # score, reached at or above the end value
    - value: "0"          # start value
    - value: "100"        # end value

The step above adds a boost for a particular value range of a specified field. In this case, the field is popularity and we give it a score of 0.4. The boost scales linearly from the start value of 0, and the full score of 0.4 is only reached when popularity equals or exceeds the end value of 100.
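The scaling described above can be written as a small Python function. It is a sketch assuming the boost grows linearly between the start and end values and is clamped to zero below the start value; the text only states the linear scaling from the start value and the cap at the end value, so the below-start behavior is an assumption. The function name `range_boost` mirrors the step id but is hypothetical, and how search.io combines this boost with the text-match score is not modelled here.

```python
def range_boost(value: float, score: float, start: float, end: float) -> float:
    """Linear boost between start and end, capped at `score` once value >= end.

    Assumption: values at or below `start` receive no boost.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    fraction = (value - start) / (end - start)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp the position within the range to [0, 1]
    return score * fraction


if __name__ == "__main__":
    # Settings from the step above: field popularity, score 0.4, start 0, end 100.
    for popularity in (0, 25, 50, 100, 150):
        print(popularity, range_boost(popularity, 0.4, 0, 100))
```

With these settings, a movie with popularity 50 gets half of the maximum boost, while any popularity of 100 or more gets the full 0.4.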

Once we add this step, the results instantly change to the following:

Looking a lot better!

Range boost is only one of many powerful steps that can be used to improve results and ultimately create a better search experience for your customers. Using live relevance editing and seeing results in real-time makes it easy to explore the different steps and understand the impact a change will have on search results.

API Reference

The REST API enables you to sync your data continuously with search.io.


If you want to explore and play with the API, download the OpenAPI Spec and import it into a tool like Postman.

Additional SDKs

We are currently working on a Node SDK and plan to add support for .NET and PHP soon after. Please let us know if you require an SDK in a particular language.