Sunday, November 24, 2013

Meteor Alternative to MongoDB Full Text Search

I was going to title this post "Meteor Impact: What To Do When Your App Blows Up!"  I chose otherwise because the issue we ran into had nothing to do with Meteor and everything to do with how I had designed the application.  It also sent me down an interesting rabbit hole in search of a solution to the problems we were experiencing within our application, and I felt the title needed to be a bit more indicative of that.

It was Friday, September 13th, and all 17 of my stores were running a Friday the 13th special consisting of $13 Friday the 13th tattoos and $13 off select body piercings.  This is something that a lot of tattoo studios do.  Each year has a different number of Friday the 13ths and they fall in different months.  So, it ends up being a cult-like event for our customers and they look forward to it.  Some stores have lines before the staff arrive, and they tattoo and pierce customers late into the night.  The designs are limited to small bad luck symbols, #13, black cats, Jason masks and various other Friday the 13thy type subject matter.  In other words, it is a really busy day across all stores.

Our application, EtherPOS, had been behaving a bit sluggishly and I was experimenting with different AWS instance sizes and monitoring CPU usage to determine what the problem might be.  I already knew that one problem was that I publish all of the documents in some collections.  For example, I was publishing all Customers to all clients so that the look up would be fast, so that the interface could be very intuitive with just typeaheads and no buttons, and so that I could avoid server side paging.  I had already experimented with taking the Customers collection up to 15,000, 20,000 and even 50,000 records on the client and I knew that as you approach 10,000 things start to get sluggish.

The truly Friday the 13th part about this was that I was in Colorado for a buddy's wedding and I had planned on hitting some serious single track downhill and trying to break a bone or two when the proverbial shit hit the fan.  Go figure, it was Friday the 13th after all!

About mid-morning I started getting text messages and phone calls that it was taking more than a minute or two to add a new Customer or check a customer out.  Within a short time it was taking 5 to 10 minutes or the application would not respond at all.  At this point, I knew that there was definitely no way I was gonna have time to hit the mountains and show up at the wedding with a trophy cast or at least some trophy stitches unless I did something fast.

I checked the servers and the CPUs were indeed toppling over and the server was restarting.  I logged into the application and the client was as sluggish as molasses on a cold winter morning in upstate New York.  I knew that the culprit was the Customers collection.  A quick server side count showed 15,000 customers and it was climbing fast.  In fact, we added almost 2,000 new customers on that day which would also mean at least 2,000 invoices, invoice items and invoice payments.  So a total of 8,000 to 10,000 new docs in 12 to 18 hours.

I quickly scaled the server sizes up and added more servers behind the ELB which resolved the server overload issues but the client was still slow because of the size of the Customers collection.

I immediately thought of MongoDB Full Text Search (FTS) or the alternative Model Data to support Keyword Search.  After researching the Model Data to support Keyword Search it was very clear that it would not solve the problem because that would work better for articles, blogs or applications that are more content oriented.  We are on Mongo version 2.4.3 so using FTS would be possible, but FTS is in beta and I wasn't comfortable with that.  I didn't want to implement a beta and have to worry about what might go wrong next as I was roaring down a mountain on my bike, or having a cast set or stitches put in, or while I was at my buddy's wedding.  I don't mind throwing caution to the wind but not when that caution is interfering with literal wind in my hair! I certainly wasn't going to have time to monitor the potential space or memory issues and even 10gen was suggesting not to use FTS in production systems. Finally, the writing about using FTS with Meteor was pretty much limited to a couple of SO posts here and here.  While both are great posts and provide awesome solutions, my gut said I needed to roll up my sleeves and roll my own solution.

It was important that the solution be responsive enough to meet the intuitive, quick type and no button interface on the client.  My staff needed to be able to quickly type parts of the customer names while the view renders the customers list so they can quickly find and select a customer.  It could not be like Google search where you type a phrase, hit a button and then receive results only to find you have to type a new search.  No, it had to be as quick and intuitive as the current typeahead which was implemented with all the Customer docs published to the client but of course the new implementation must work without publishing all the docs to the client.

First, I created a search by keyword method on the Meteor collection prototype.  In previous articles I wrote about extending the Meteor.Collection.prototype so I continued on that same strategy and created a simple searchByKeyword method.  This would allow me to implement it across any Collection in the future.

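Meteor.Collection.prototype.searchByKeyword = function(options){
    if(options && options.keywords){
      // Validate the type before touching the string.
      check(options.keywords, String);
      if(options.keywords.length >= 3){
        var query = {};
        var keywordArray = options.keywords.split(' ');
        // Pair each field with the keyword typed in the same position.
        for (var i = 0; i < options.fields.length; i++) {
          // Skip fields that have no keyword yet.
          if(keywordArray[i]){
            // regExpQuoted is a helper defined elsewhere that escapes regexp special characters.
            query[options.fields[i]] = new RegExp('^' + regExpQuoted(keywordArray[i]), 'i' );
          }
        }
        return this.find(query);
      } else {
        return null;
      }
    } else {
      return null;
    }
};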

As you can see it queries only on the fields and keywords passed to it, pairing each keyword with the field in the same position.  I put this in my /proj/lib directory as part of all of my Meteor.Collection.prototype extensions.  That way I can use it on the client as well as the server.
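The regExpQuoted helper is not shown in this post.  A minimal sketch of what such a helper might look like, assuming its only job is to backslash-escape regexp special characters so that user input is matched literally (the body below is my assumption, not the original implementation):

```javascript
// Hypothetical stand-in for the regExpQuoted helper used by searchByKeyword.
// It backslash-escapes every character that has special meaning in a RegExp.
function regExpQuoted(str) {
  return String(str).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// The dots in a phone number are now matched literally, not as "any character".
var phonePattern = new RegExp('^' + regExpQuoted('555.555.5555'), 'i');
console.log(phonePattern.test('555.555.5555')); // true
console.log(phonePattern.test('555x555x5555')); // false
```

Without the escaping, a keyword such as a phone number with dots or an email with a plus sign would be interpreted as a pattern rather than as plain text.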

One caveat is that case insensitive regexp queries cannot use an index efficiently.  So, remove the 'i' from the regexp if you want to get the full benefit of the index.  I am sticking with case insensitive because I ran an explain on the collection and the query takes 120ms case insensitive and 1ms case sensitive.  The 120ms is not noticeable to my staff, and it is easier if they don't have to worry about case sensitivity.

This is especially important because for some reason some of them think everything should be typed in all capitals and some of them think capitals are not necessary.  I think some of them think the computer doesn't work unless you press the Caps Lock key and others think the Shift button will cause the computer to move across the counter as in literally shift!  Very few of the customer names are typed correctly.  I imagine one day, I might have to remove the 'i' or make it an options argument so it can be turned on and off.  When that day comes, I will probably also have to force all upper case programmatically.
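If that day comes, one way it might look is sketched below.  This is not what EtherPOS does today; it assumes names are upper-cased before they are stored, so that the search can use a case sensitive prefix regexp.  The function names normalizeName and prefixQuery are made up for illustration.

```javascript
// Hypothetical helper: trim and upper-case a name before it is stored.
function normalizeName(value) {
  return String(value).trim().toUpperCase();
}

// Build a case sensitive prefix query (no 'i' flag) against the normalized field.
function prefixQuery(field, keyword) {
  var escaped = normalizeName(keyword).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var query = {};
  query[field] = new RegExp('^' + escaped);
  return query;
}

var q = prefixQuery('name', 'steeve');
console.log(q.name.source);                        // ^STEEVE
console.log(q.name.test(normalizeName('Steeve'))); // true
```

The staff could keep typing in whatever case they like, because both the stored value and the keyword pass through the same normalization.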

The method takes an options argument like this.

{
   fields: ['name', 'last_name', 'email', 'phone'],
   keywords:  'steeve cannon 555.555.5555'
}
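To make the pairing of fields and keywords concrete, here is a standalone sketch of the query that those options would produce.  buildKeywordQuery is a hypothetical name used only for this illustration; it mirrors the loop in searchByKeyword but runs without Meteor.

```javascript
// Sketch of the field/keyword pairing: the first keyword goes with the
// first field, the second with the second field, and so on.
function buildKeywordQuery(fields, keywords) {
  var words = keywords.split(' ');
  var query = {};
  for (var i = 0; i < fields.length; i++) {
    if (words[i]) {
      // Escape regexp special characters so the keyword is matched literally.
      var escaped = words[i].replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      query[fields[i]] = new RegExp('^' + escaped, 'i');
    }
  }
  return query;
}

var query = buildKeywordQuery(
  ['name', 'last_name', 'email', 'phone'],
  'steeve cannon 555.555.5555'
);
console.log(Object.keys(query)); // name, last_name and email only
```

Note that with this positional pairing the third keyword is matched against email, not phone, so the order in which the staff type keywords has to follow the order of the fields.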

Second, I rewrote the customer search template and changed the typeahead to an input field so that it just sets a session variable, which reactively triggers the subscription, which is wrapped in an autorun.  In order to abstract this sufficiently to use this design across other tools and collections I implemented a keyword_search template.  So, I can drop this in any template to implement keyword search and trigger the autorun to rerun the subscription, and the template can then render the results in real-time based on the keywords the user is typing.  Hyper intuitive and responsive!

<template name='keyword_search'>
    <input tabindex="1" type="text" name="txtKeyWordSearch" id="txtKeyWordSearch" class="input-block-level" rel="tooltip" title="Enter keywords separated by spaces corresponding to the column names to find records." placeholder="Keywords separated by spaces corresponding to the column names." autofocus>
</template>

Template.keyword_search.events({
  'keyup #txtKeyWordSearch': function (event, template) {
    // Only search once at least 3 characters have been typed.
    if(event.currentTarget.value.length >= 3){
      Session.set('keywords', event.currentTarget.value);
    } else {
      Session.set('keywords', null);
    }
  }
});


So, I can just drop it into any template with {{> keyword_search}}

Third, I rewrote the subscription in autorun.  If you are working with multiple collections and reusing reactive session variables you will need to prevent reactive triggering of multiple subscriptions by wrapping your subscriptions in some logic.  If you aren't careful about this you will cause unnecessary resubscription and resource consumption.  For example, I usually just check my route.

if( Session.equals('route', 'customers') ){
  Meteor.subscribe('customers', {keywords:Session.get('keywords')});
}

Fourth, I rewrote my customers publication.  As you can see, I specify the fields to search server side but because the publish can take an options argument you could allow the client to define the fields to search which could come in handy if you are allowing them to define what columns to view as well.

Meteor.publish("customers", function(options) {
     // The searchable fields are fixed on the server, not taken from the client.
     options.fields = ['name', 'last_name', 'email', 'phone'];
     return Customers.searchByKeyword(options);
});
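If you do let the client choose the fields to search, it is probably wise to check them against a whitelist on the server first.  A rough sketch, assuming the same four fields as above; ALLOWED_FIELDS and sanitizeFields are hypothetical names and not part of EtherPOS:

```javascript
// Fields the server is willing to search (assumed list for illustration).
var ALLOWED_FIELDS = ['name', 'last_name', 'email', 'phone'];

// Keep only the requested fields that are whitelisted.
// Fall back to the full list when nothing usable was requested.
function sanitizeFields(requested) {
  if (!Array.isArray(requested)) {
    return ALLOWED_FIELDS.slice();
  }
  var safe = requested.filter(function (field) {
    return ALLOWED_FIELDS.indexOf(field) !== -1;
  });
  return safe.length > 0 ? safe : ALLOWED_FIELDS.slice();
}

console.log(sanitizeFields(['email', 'password'])); // only 'email' survives
console.log(sanitizeFields(undefined).length);      // 4
```

In the publication you would then set options.fields to the sanitized list instead of the hard coded array, so a client can narrow the search but never reach a field you did not intend to expose.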

Fifth, I implemented an index on the collection on server startup.  You could just do this at the Mongo console as well.

Meteor.startup(function () {
  Customers._ensureIndex({name: 1, last_name: 1, email: 1, phone: 1}, {background: true});
});

As a Florida flat lander I resent there weren't lifts!

Sixth, I checked the query plan on Mongo to see what kind of performance I would receive.  You can do this from the Mongo console.

db.customers.find({ active: true, name: /^steeve/i }).explain()

Resulted in 120ms with 15,000 records and case insensitive.  I can live with that.

db.customers.find({ active: true, name: /^steeve/ }).explain()

Resulted in 1ms with 15,000 records and case sensitive.

Seventh, maybe 30 to 45 minutes into the issue I deployed!

2,000 ft drop in 20 minutes with good flow, rad air and no walkers!

I logged in and the client was hella blazing awesome fast!  Even better, the client interface hadn't really changed and the staff didn't even notice anything had changed whatsoever.

Of course, then I hit the mountains! No broken bones, no stitches but I did manage to blow out both tires on my rental bike, bend the back rim and the bike mechs had to check the frame for stress damage!  Doh!

Does anyone have a spare tube?

I was also able to enjoy my buddy's wedding without the nagging anxiety and worry I would have had if I had implemented FTS on a mission critical production system.

Will I consider FTS in the future?

Sure, absolutely!

In fact, I can't wait to dig into some of the information and examples out there!

Do I think I will use it in all cases?

Probably not.

FTS performance will have to sufficiently outweigh the storage and memory costs.  As well, I believe that FTS can only return the top 100 scoring documents based on this article, which would suffice in most cases but is still a consideration.

Ultimately, it always comes down to the right tool or solution for the right job!  By the way, speaking of tools can someone pass me a tire lever and a spoke wrench?
