Beginner's Guide to use Postgres full text search in Golang

Why Postgres full-text search?

Postgres full-text search might not be the first preference for developers looking to perform the extensive search, so you might wonder what's the purpose of using it in your application over popular search engines like Elastic search or Solr. Let's have a look at few reasons why you might want to use it.

  • It's lightweight
  • Gets the job done
  • If you are already using PostgreSQL in your app and do not intend to add extra dependencies.


Searching contents in nested models

Let us develop a full-text search feature which searches for contents in nested models (A model consisting of fields having Foreign key associations with another model).

Here at Aircto, we provide on-demand technical interviewers and speed up the hiring process of our clients.

Interviews involve applicants who have applied to our client's company.
Now "applicant" here is a combination of job (which they applied to) and candidate (which has a foreign key association with the user).

Since we are using Golang along with postgres, the following codes are written in Golang. You can easily convert them into raw sql statements if you follow up with these examples.

Let us create a model for the applicant first.

type Applicant struct {  
    Model
    CandidateId      uint       `json:"candidate_id"`
    Candidate        *User      `json:"candidate,omitempty"`
    Status           string     `json:"status" sql:"not null"`
    }

Note that I've ignored all other fields and considered only an excerpt of the actual data for the sake of simplicity. You'll also need to db.save(), add foreign key associations, and perform migrations to make it work.

Model struct would look something like this and it is reusable.

type Model struct {  
    Id        uint       `json:"id" sql:"primary_key"`
    CreatedAt time.Time  `json:"created_at"`
    UpdatedAt time.Time  `json:"updated_at"`
    DeletedAt *time.Time `json:"deleted_at"`
}

Candidate will have foreign key association with the user model.

type User struct {  
    Model
    Username      string   `json:"username"`
    FirstName     string   `json:"first_name"`
    LastName      string   `json:"last_name"`
    Email         string   `json:"email"`
    Password      string   `json:"password,omitempty"`
}

Now, our goal would be to have an applicant search feature which takes in FirstName,LastName or Email of the user as a query parameter and outputs a list of applicants as the search result.

Basics

Before implementing this feature, let us understand a few terminology and concepts related to postgres full-text search

What is a document?

"A document is a unit of searching in a full-text search system; for example,a magazine article or email message. The text search engine must be able to parse documents and store associations of lexemes (key words) with their parent document."

--postgresql.org

As it has been impeccably described in the official documentation, A document can be considered as the data we prepare against which we make the query to obtain results.
In our case, the "document" must contain FirstName, LastName and Email of the applicant.

What is tsvector and tsquery?

A tsvector is sorted list of distinct lexemes, and lexemes is the basic lexical unit of a language which may consist of several distinct words, for example, words "processing" and "processed" are derived from the same basic word "process".

#SELECT 'processing a process which cannot be processed '::tsvector;
                           tsvector                           
--------------------------------------------------------------
 'a' 'be' 'cannot' 'process' 'processed' 'processing' 'which'
(1 row)

Note that tsvector type itself does not perform any normalization. We use to_tsvector() to normalize raw document text

#SELECT to_tsvector('processing a process which cannot be processed');
        to_tsvector         
----------------------------
 'cannot':5 'process':1,3,7



A tsquery stores lexeme values that are meant to be searched. We can make use of & (AND), | (OR), and ! (NOT) operators to combine several words and parenthesis to group the operators.

# SELECT 'car & (bike | plane)'::tsquery;
           tsquery            
------------------------------
 'car' & ( 'bike' | 'plane' )
(1 row)


Match Operator @@

Full-text search is primarily based on match operator @@, tsvector(document) is matched against tsquery(query) and it returns true if it matches.

#SELECT 'a car , bike and a plane'::tsvector @@ 'car & bike'::tsquery;
 ?column? 
----------
 t
(1 row)


Using ts_debug to understand working

ts_debug() can be a useful tool to understand how the document is interpreted by tsvector.
For example, consider a simple word being debugged.

#SELECT ts_debug('english','interview');
                                    ts_debug                                     
---------------------------------------------------------------------------------
 (asciiword,"Word, all ASCII",interview,{english_stem},english_stem,{interview})
(1 row)

Now, we know it is being interpreted as a Word.

Let us try passing an email.

# SELECT ts_debug('english','nagaraj@aircto.com');
                                    ts_debug                                     
---------------------------------------------------------------------------------
 (email,"Email address",nagaraj@aircto.com,{simple},simple,{nagaraj@aircto.com})
(1 row)

This reveals something interesting, The presence of @ and .com makes its type as "Email address". This could pose some issues if you're dealing with search feature which takes in email as a query or document. We'll explore this issue further in the end with a possible solution for it.


Now that we have a basic understanding of its working, let us try to solve the problem stated in the beginning.

Gorm

Gorm is the most popular ORM library for Golang, and supports almost all the basic functionalities you would expect from an ORM.

In our case, the "document" would consist of user's email, first name and last name and "query" would be q.

db.Joins("JOIN users ON users.id = applicants.candidate_id").  
Where("to_tsvector(users.email) || to_tsvector(users.first_name) || to_tsvector(users.last_name) @@ to_tsquery(?)", q).  
        Group("applicants.id").
        Preload("Candidate").
        Find(&applicants)

Note that you can also create a separate column to store "document" values. In addition to_tsvector() can be preceded by setweight() to set priority if necessary.

Although this works fine, there are a few issues that need to be addressed.

  • There's no support for prefix matching,suppose the database contains "bruce" and you search for "bru",it doesn't respond with any result.

  • It doesn't handle spaces in the query.

    To fix these, we can write the following piece of code before the query statement,

    qlist := strings.Fields(q)
        for i := range qlist {
            qlist[i] = qlist[i] + ":*"
            q = strings.Join(qlist[:], "|")
        }

strings.Fields() splits up the string based on white-spaces(one or more), while :* takes care of prefix matching, and finally the list is joined using |(OR) condition.


Searching emails

There's one more issue that still persists which we had discussed earlier.

Prefix matching cannot be performed because emails are interpreted as type Email Address and not as Word, as we observed using ts_debug().

One workaround for this could be to use replace() to replace @ with empty string both in the "document" as well as "query".

 s.db.Joins("JOIN users ON users.id = applicants.candidate_id").
Where("(to_tsvector(replace(users.email,'@','')) || to_tsvector(users.first_name) || to_tsvector(users.last_name) @@ to_tsquery(replace(?,'@','')))", q).  
        Group("applicants.id").
        Preload("Candidate").
        Find(&applicants)

Further optimisation can be done by indexing using gin/gist.

Hope you found this brief walk-through beneficial enough to use postgres full-text search in your application.You can find a workable example here.

Testing is an important tool to maintain code quality. If you want to know about writing testable codes in Golang then you can read this article - Writing testable code in Golang .