Wednesday, May 18, 2016

How to write a Custom Solr Query Parser for Solr 6

Introduction

Solr comes pre-installed with a bunch of great query parsers, so if you're starting out, there's a push to learn and use their syntax.  However, many times we are not starting from scratch: we already have a historical query language, and converting to a new one is not an option.  This article is meant to assist those embarking on this voyage.

Solr advertises the fact that it supports extending its base functionality through plugins, but there are not many examples out there of a query parser from start to finish.  With this, my goal is to get the plumbing out of the way so that you can focus on implementing your particular parsing algorithm.

Overview

Here's the bird's-eye view of what we need to do.

  • Download and compile Solr 6 in Eclipse
  • Create a separate project for your plugin
  • Export your parser as a JAR file
  • Install the JAR file in Solr
  • Configure Solr to use the JAR
  • Use the custom Query Parser


Create a separate project for your plugin

It is assumed that you followed these instructions on how to download and compile Solr 6 in Eclipse.
At this point, you should have Eclipse happy with the Solr code base (no red error marks).
  1. Collapse the solr root folder in Package Explorer
  2. Right-click in the whitespace in Package Explorer
  3. New > Java Project
  4. Project Name: HelloWorldParser
  5. My execution Environment JRE happened to be JavaSE-1.8
  6. Next
  7. Click on the Projects tab
  8. Add...
  9. Check the solr source code project name and press OK
  10. Click Finish
  11. Right-click HelloWorldParser's src folder > New > Package
  12. Name: org.mycompany.lucene.search
  13. Click Finish
  14. Right-Click the new package created > New > Class
  15. Name: HelloWorldQParserPlugin
  16. Click Finish
  17. Here's the code for our simple HelloWorldQParserPlugin.java file
package org.mycompany.lucene.search;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;

public class HelloWorldQParserPlugin extends QParserPlugin {
  public static final String NAME = "helloWorld";
      
  @Override
  public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() {
        final IndexSchema schema = req.getSchema();
        final String defaultField = QueryParsing.getDefaultField(schema, getParam(CommonParams.DF));
        // When you implement your Query Parser, you may want to read up on the commented line items.
        // I left them here to give you a jump-off point, but we don't need them in this example.
        //final Analyzer analyzer = schema.getQueryAnalyzer();
        //final SolrCoreParser solrParser = new SolrCoreParser(defaultField, analyzer, req);
               
        // Yes, at the end of the day, this HelloWorldQParserPlugin is nothing more than a wrapper for a TermQuery
        // I wanted to allow some functionality, but not get too crazy because you're likely to replace it, anyway :D
        TermQuery tq = new TermQuery(new Term(defaultField,qstr));
        return tq;
      }
    };
  }
}

Eclipse generally compiles your code as soon as you save it--let's make sure.  Open a Windows Explorer window (Windows Key + E) and navigate to your code for this plugin.  Then click through bin\org\mycompany\lucene\search.  Verify that you see two class files there: HelloWorldQParserPlugin$1.class and HelloWorldQParserPlugin.class.
If they're there, then we're set to export this to a JAR file.

Export your parser as a JAR file

  1. Go back to Eclipse
  2. Right-click on the HelloWorldParser project in Package Explorer > Export...
  3. Expand the Java folder and select JAR file and click Next
  4. I left everything as default (just my HelloWorldParser's src folder was checked)
  5. JAR file: Choose where you want to export your JAR file to.  I'll choose a directory where I keep all of them in a backup.  I like to put a version number at the end so I know I'm working with the latest version in Solr, but that's entirely up to you.  I named my JAR HelloWorldParser-0.0.JAR.
  6. Click Finish
  7. Verify that it generated your JAR file.
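
If you'd rather check the JAR's contents than just its existence, here is a minimal sketch that uses only the standard java.util.jar API. The class name JarContentsCheck and its helper methods are hypothetical, made up for this illustration (they are not part of Solr or Eclipse), and the sketch assumes the package and class names used earlier in this post.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Solr): checks that an exported JAR contains a class.
final class JarContentsCheck {

  // Converts a fully qualified class name into the entry path used inside a JAR.
  static String toEntryName(String className) {
    return className.replace('.', '/') + ".class";
  }

  // Returns true if the JAR at jarPath has an entry for className.
  static boolean containsClass(String jarPath, String className) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      return jar.getEntry(toEntryName(className)) != null;
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 1) {
      System.out.println("usage: java JarContentsCheck <path to exported JAR>");
      return;
    }
    // Assumed names: the plugin class from this post and its anonymous QParser.
    String plugin = "org.mycompany.lucene.search.HelloWorldQParserPlugin";
    System.out.println(plugin + " present: " + containsClass(args[0], plugin));
    System.out.println(plugin + "$1 present: " + containsClass(args[0], plugin + "$1"));
  }
}
```

Both entries should be reported as present, matching the two class files we saw in the bin folder.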

Install the JAR file in Solr

It is assumed that you have compiled Solr from source, so you know your plugin will work with the version of Solr you have installed.  The idea is that you've compiled Solr 6.0.0 in this case and your HelloWorldParser references that very version, so you shouldn't have to worry about your parser code being out of date with Solr, which can happen and has happened to me.  Let's save you that frustration. :)
If you haven't done it yet, follow these instructions, specifically the section Using the build.xml files in the Project.  It explains how to use the Apache Ant build.xml file to build the server (make sure you use the build.xml file located under the solr subfolder, not the one at the top level).

  1. Open a Command window and navigate to the root of your Solr source
  2. cd solr
  3. bin\solr.cmd start -e cloud -noprompt
  4. Navigate to the URL provided (e.g. http://localhost:8983/solr)
  5. This sets us up with a gettingstarted collection as a Solr Cloud
  6. Now, let's index some documents.
    java -Dc=gettingstarted -Dauto=yes -Drecursive=yes -jar example\exampledocs\post.jar example\exampledocs
  7. Now we need to shut it down and install the JAR file
    bin\solr.cmd stop -all

Okay--NOW you should be at a point where you can install the JAR file.
Since this is Solr Cloud, there are better ways of installing your JAR, but we just want to get it loaded and test it.  Please keep this in mind for later, as Solr Cloud has a way of distributing your JAR files through its Blob Store API (see here and here).
But for now, we're not going to pay attention to "best practices" and just get it loaded.
  1. Go to Windows Explorer (Windows Key + E) and navigate to <your solr source root>\solr\example\cloud\node1\solr
  2. Create a new folder called lib
  3. Paste your JAR file in there
  4. For each of the remaining nodes, copy the lib folder you just made to nodeN\solr
  5. Go back to the command prompt and start solr back up
    bin\solr.cmd start -e cloud -noprompt
  6. Check the log to make sure it loaded our JAR file--important!
    1. The log file we're looking for is located:
      <solr source>\solr\example\cloud\node1\logs\solr.log
    2. Search for HelloWorld
      You should see something like:
      Adding 'file:/D:/solr-6.0.0/solr/example/cloud/node1/solr/lib/HelloWorldParser-0.0.jar' to classloader
      NOTE: For some reason, it didn't load the first time I tried. I renamed the JAR file from a .JAR extension to a .jar extension (it shouldn't make a difference, but it loaded the second time). So, if you're having a problem, maybe that's it.
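
Searching the log by hand works fine, but the same check can be sketched in a few lines of plain Java. This is only an illustration: SolrLogSearch and linesContaining are hypothetical names, and the sketch assumes the log is readable as UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Solr): finds log lines that mention a given string.
final class SolrLogSearch {

  // Returns every line of the file at logPath that contains needle, in file order.
  static List<String> linesContaining(String logPath, String needle) throws IOException {
    try (Stream<String> lines = Files.lines(Paths.get(logPath), StandardCharsets.UTF_8)) {
      return lines.filter(line -> line.contains(needle)).collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 1) {
      System.out.println("usage: java SolrLogSearch <path to solr.log>");
      return;
    }
    // "HelloWorld" matches the JAR name used in this post; change it if you named yours differently.
    linesContaining(args[0], "HelloWorld").forEach(System.out::println);
  }
}
```

If nothing is printed, the JAR was most likely not picked up by the class loader on that node.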

Configure Solr to use the JAR

Awesome, so our JAR file is loaded--now we need to hook into it and use the parser.  Since we're using Solr Cloud, we will need to use the ZooKeeper API to play with the configs.
  1. Go to your command prompt (you should still be at the <solr-src>\solr subdirectory)
  2. Get the solrconfig.xml (rename it so we know it's our local version):
    server\scripts\cloud-scripts\zkcli.bat -cmd getfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983
  3. Open it up:
    notepad solrconfiglocal.xml
  4. Notepad doesn't do a good job of formatting this file, so be careful.  You may want to use a better text editor, but Notepad will work
  5. Search for <queryParser
  6. Copy that example and paste it just below the comment it's contained within and make the following changes:
    <queryParser name="helloWorld" class="org.mycompany.lucene.search.HelloWorldQParserPlugin"/>
    The name attribute will be used when we specify which parser we want.  The class attribute is the class that specifically points to our QParserPlugin.  Note that you don't specify the path to the JAR file--it's already loaded by the class loader and it should be found by the class name.
  7. Save the file and exit Notepad
  8. Now, push it back to ZooKeeper:
    server\scripts\cloud-scripts\zkcli.bat -cmd putfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983
  9. We need to reload the collection in the Admin UI, so hop on your web browser
  10. From the left-hand side, select Collections, then select gettingstarted
  11. Click the Reload button
  12. You should see the Reload button turn green
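
The Reload button has an HTTP equivalent in Solr's Collections API (action=RELOAD). Here is a rough sketch of calling it from Java, assuming Solr is listening at http://localhost:8983/solr as in the steps above. CollectionReload and its methods are hypothetical names for illustration, and the sketch assumes the collection name contains no characters that need URL encoding.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper (not part of Solr): reloads a collection over HTTP.
final class CollectionReload {

  // Builds the Collections API URL that reloads the named collection.
  // Assumes the collection name needs no URL encoding.
  static String reloadUrl(String baseUrl, String collection) {
    return baseUrl + "/admin/collections?action=RELOAD&name=" + collection;
  }

  // Sends a GET request to the given URL and returns the HTTP status code.
  static int send(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    try {
      return conn.getResponseCode();
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    String url = reloadUrl("http://localhost:8983/solr", "gettingstarted");
    System.out.println("GET " + url);
    // Only contact the server when asked to, so the sketch is safe to run without Solr.
    if (args.length > 0 && args[0].equals("--send")) {
      System.out.println("HTTP status: " + send(url));
    }
  }
}
```

Run it with --send while Solr is up to actually perform the reload; without the flag it only prints the URL.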

Use the custom Query Parser

  1. Now select gettingstarted from the core selector from the left drop-down
  2. Select Query
  3. Change the q field to:
    {!helloWorld}hello
  4. Click Execute Query
  5. I got one result, so if you didn't get any, change the q field to {!helloWorld}test
  6. Now, let's try to specify a field to query:
    1. Set the df field to id
    2. Copy the id field value from one of your search results and paste it over the q field like this:
      {!helloWorld}UTF8TEST
    3. Click the Execute Query button and notice that it correctly identifies that document!
Hallelujah!  It works!
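
The Query screen is just building an HTTP request for you. As a rough sketch, this is how the same q and df values could be encoded into a request against the collection's /select handler, assuming Solr is at http://localhost:8983/solr and the collection exposes /select (the Admin UI's default request handler). HelloWorldQueryUrl and selectUrl are hypothetical names for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of Solr): builds a query URL for the helloWorld parser.
final class HelloWorldQueryUrl {

  // Builds a /select URL with URL-encoded q and df parameters.
  static String selectUrl(String baseUrl, String collection, String q, String df) {
    try {
      return baseUrl + "/" + collection + "/select"
          + "?q=" + URLEncoder.encode(q, "UTF-8")
          + "&df=" + URLEncoder.encode(df, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The {!helloWorld} prefix selects our parser by the name registered in solrconfig.xml.
    System.out.println(selectUrl(
        "http://localhost:8983/solr", "gettingstarted", "{!helloWorld}UTF8TEST", "id"));
  }
}
```

Note that the curly braces and exclamation mark of the {!helloWorld} prefix have to be percent-encoded when the query travels in a URL; the Admin UI does that for you.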

Conclusion

Wow, what an adventure that was, right?  The exciting part is that we now have a base from which to develop our very own query parser, which is quite an adventure in and of itself.


Be blessed.

2 comments:

  1. can i add logging in the plugin?

    1. Employee,
      You should be able to do anything you want to do inside the plugin. Now, how you do that--I'm not exactly sure.
      Depending on what you need, though, you should be able to pull it off.

      I did a bit of poking around in the source and it looks like some are using a stream.println function. How you get access to the "right stream," though, I don't know.

      This sounds like a good topic for a future post, doesn't it? :)
