Coding Art: How to write a Custom Solr Query Parser for Solr 6

Introduction

Solr comes pre-installed with a bunch of great query parsers, so if you're starting out, there's a push to learn and use that syntax. However, many times we are not starting from scratch: we already have a historical query language, and converting to a new one is not an option. This article is meant to assist those embarking on this voyage.

Solr advertises the fact that it supports extending its base functionality through plugins, but there are not many examples out there of a query parser from start to finish. With this, my goal is to get the plumbing out of the way so that you can focus on implementing your particular parsing algorithm.

Overview

Here's the bird's-eye view of what we need to do.

Download and compile Solr 6 in Eclipse
Create a separate project for your plugin
Export your parser as a JAR file
Install the JAR file in Solr
Configure Solr to use the JAR
Use the custom Query Parser

Create a separate project for your plugin

It is assumed that you followed these instructions on how to download and compile Solr 6 in Eclipse.

At this point, Eclipse should be happy with the Solr code base (no red error marks).

Collapse the solr root folder in Package Explorer
Right-click in the whitespace in Package Explorer
New > Java Project
Project Name: HelloWorldParser
Execution environment JRE: mine happened to be JavaSE-1.8
Next
Click on the Projects tab
Add...
Check the solr source code project name and press OK
Click Finish
Right-click HelloWorldParser's src folder > New > Package
Name: org.mycompany.lucene.search
Click Finish
Right-Click the new package created > New > Class
Name: HelloWorldQParserPlugin
Click Finish
Here's the code for our simple HelloWorldQParserPlugin.java file

package org.mycompany.lucene.search;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;

public class HelloWorldQParserPlugin extends QParserPlugin {

    // The name we will register this parser under in solrconfig.xml
    public static final String NAME = "helloWorld";

    @Override
    public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() {
                final IndexSchema schema = req.getSchema();
                // Use the df parameter if one was supplied, otherwise the schema's default field
                final String defaultField = QueryParsing.getDefaultField(schema, getParam(CommonParams.DF));

                // When you implement your Query Parser, you may want to read up on the commented line items.
                // I left them here to give you a jump-off point, but we don't need them in this example.
                //final Analyzer analyzer = schema.getQueryAnalyzer();
                //final SolrCoreParser solrParser = new SolrCoreParser(defaultField, analyzer, req);

                // Yes, at the end of the day, this HelloWorldQParserPlugin is nothing more than a wrapper for a TermQuery
                // I wanted to allow some functionality, but not get too crazy because you're likely to replace it, anyway :D
                TermQuery tq = new TermQuery(new Term(defaultField, qstr));
                return tq;
            }
        };
    }
}

Eclipse generally compiles your code as soon as you save it--let's make sure. Open a Windows Explorer window (Windows Key + E) and navigate to your code for this plugin. Then click through bin\org\mycompany\lucene\search. Verify that you see two class files there: HelloWorldQParserPlugin$1.class and HelloWorldQParserPlugin.class.

If they're there, then we're set to export this to a JAR file.

Export your parser as a JAR file

Go back to Eclipse
Right-click on the HelloWorldParser project in Package Explorer > Export...
Expand the Java folder and select JAR file and click Next
I left everything as default (only my HelloWorldParser's src folder was checked)
JAR file: Choose where you want to export your JAR file to. I'll choose a directory where I keep all of them in a backup. I like to put a version number at the end so I know I'm working with the latest version in Solr, but that's entirely up to you. I named my JAR HelloWorldParser-0.0.JAR.
Click Finish
Verify that it generated your JAR file.
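If you'd rather check the JAR's contents than just its existence, here is a minimal sketch using only the JDK's java.util.jar classes. The JarContentsCheck class and its method names are made up for this illustration (they are not part of Solr or Eclipse); the entry names simply follow from the package and class names used above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical helper (not part of Solr): reports which expected entries a JAR lacks. */
final class JarContentsCheck {

    // Entry names derived from the package and class names used in this article.
    static final List<String> EXPECTED = Arrays.asList(
        "org/mycompany/lucene/search/HelloWorldQParserPlugin.class",
        "org/mycompany/lucene/search/HelloWorldQParserPlugin$1.class");

    /** Returns the expected entries that are NOT present in the given JAR. */
    static List<String> missingEntries(String jarPath, List<String> expected) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (String name : expected) {
                if (jar.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarContentsCheck <path-to-jar>");
            return;
        }
        List<String> missing = missingEntries(args[0], EXPECTED);
        System.out.println(missing.isEmpty() ? "All plugin classes found" : "Missing: " + missing);
    }
}
```

Run it with the path to your exported JAR as the only argument. An empty "missing" list means both class files made it into the archive.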

Install the JAR file in Solr

It is assumed that you have compiled Solr from the source, so that you know that your plugin will work with the version of Solr you have installed. The idea is that you've compiled Solr 6.0.0 in this case and you have your HelloWorldParser referencing that very version of Solr, so you shouldn't have to worry about your parser code being out of date with Solr, which can happen and has happened to me. Let's save you that frustration. :)

If you haven't done it yet, follow these instructions--specifically, Using the build.xml files in the Project. In there it tells how to use the Apache Ant build.xml file to build the server (make sure you get the correct build.xml file located under the solr subfolder--not the one at the top level).

Open a Command window and navigate to the root of your Solr source
cd solr
bin\solr.cmd start -e cloud -noprompt
Navigate to the URL provided (i.e. http://localhost:8983/solr)
This sets us up with a gettingstarted collection as a Solr Cloud
Now, let's index some documents.
java -Dc=gettingstarted -Dauto=yes -Drecursive=yes -jar example\exampledocs\post.jar example\exampledocs
Now we need to shut it down and install the JAR file
bin\solr.cmd stop -all

Okay--NOW you should be at a point where you can install the JAR file.

Since this is Solr Cloud, there are better ways of installing your JAR, but we just want to get it loaded and test it. Please keep this in mind for later, as Solr Cloud has a way of distributing your JAR files through its Blob Store API (see here and here).

But for now, we're not going to pay attention to "best practices" and just get it loaded.

Go to Windows Explorer (Windows Key + E) and navigate to <your solr source root>\solr\example\cloud\node1\solr
Create a new folder called lib
Paste your JAR file in there
For each of the remaining nodes, copy the lib folder you just made to nodeN\solr
Go back to the command prompt and start solr back up
bin\solr.cmd start -e cloud -noprompt
Check the log to make sure it loaded our JAR file--important!
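If you find yourself re-copying the JAR every time you rebuild it, the copy steps above can be scripted. This is only a sketch under assumptions: LibInstaller is a hypothetical helper of my own naming, and it assumes every nodeN folder under example\cloud that has a solr subfolder should receive the JAR.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Solr): copies a plugin JAR into each node's solr/lib folder. */
final class LibInstaller {

    /**
     * For every directory matching node* under cloudDir that contains a "solr" folder,
     * creates solr/lib if needed and copies the JAR there. Returns the copied file paths.
     */
    static List<Path> install(Path jar, Path cloudDir) throws IOException {
        List<Path> targets = new ArrayList<>();
        try (DirectoryStream<Path> nodes = Files.newDirectoryStream(cloudDir, "node*")) {
            for (Path node : nodes) {
                Path solrHome = node.resolve("solr");
                if (!Files.isDirectory(solrHome)) {
                    continue; // not a node layout we recognize; skip it
                }
                Path lib = Files.createDirectories(solrHome.resolve("lib"));
                Path target = lib.resolve(jar.getFileName());
                Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
                targets.add(target);
            }
        }
        return targets;
    }
}
```

Stop Solr before running something like this and start it again afterwards, exactly as in the manual steps.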

The log file we're looking for is located:
<solr source>\solr\example\cloud\node1\logs\solr.log
Search for HelloWorld
You should see something like:
Adding 'file:/D:/solr-6.0.0/solr/example/cloud/node1/solr/lib/HelloWorldParser-0.0.jar' to classloader
NOTE: For some reason, it didn't load the first time I tried. I renamed the JAR file from a .JAR extension to a .jar extension, and it loaded the second time. It shouldn't make a difference, but if you're having a problem, maybe that's it.
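Searching the log by hand works fine, but the same check can be sketched in a few lines of Java if you want to repeat it for each node. LogSearch is a hypothetical helper name; it does nothing Solr-specific and simply returns the lines of a text file that contain a given string.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Solr): finds log lines containing a search string. */
final class LogSearch {

    /** Returns every line of the file that contains the given text, in file order. */
    static List<String> linesContaining(Path log, String needle) throws IOException {
        // Assumes the log is UTF-8 text; Files.lines streams it without loading it all at once.
        try (Stream<String> lines = Files.lines(log, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(needle)).collect(Collectors.toList());
        }
    }
}
```

Calling linesContaining on node1\logs\solr.log with "HelloWorld" should return the "to classloader" line if the JAR was picked up, and an empty list if it was not.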

Configure Solr to use the JAR

Awesome, so our JAR file is loaded--now we need to hook into it and use the parser. Since we're using Solr Cloud, we will need to use the ZooKeeper API to play with the configs.

Go to your command prompt (you should still be at the <solr-src>\solr subdirectory)
Get the solrconfig.xml (rename it so we know it's our local version):
server\scripts\cloud-scripts\zkcli.bat -cmd getfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983
Open it up:
notepad solrconfiglocal.xml
Notepad doesn't do a good job of formatting this file, so be careful. You may want to use a better text editor, but Notepad will work.
Search for <queryParser
Copy that example and paste it just below the comment it's contained within and make the following changes:
<queryParser name="helloWorld" class="org.mycompany.lucene.search.HelloWorldQParserPlugin"/>
The name attribute will be used when we specify which parser we want. The class attribute is the class that specifically points to our QParserPlugin. Note that you don't specify the path to the JAR file--it's already loaded by the class loader and it should be found by the class name.
Save the file and exit Notepad
Now, push it back to ZooKeeper:
server\scripts\cloud-scripts\zkcli.bat -cmd putfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983
We need to reload the collection in the Admin UI, so hop on your web browser
From the left-hand side, select Collections, then select gettingstarted
Click the Reload button
You should see the Reload button turn green
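An easy mistake in the editing step is to leave the new queryParser element inside the XML comment it was copied from. As a sanity check before pushing the file to ZooKeeper, here is a minimal sketch that parses the local file with the JDK's DOM parser and looks up the registration. QueryParserConfigCheck is a hypothetical name of my own; elements that sit inside XML comments are not part of the DOM, so they are ignored.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of Solr): looks up a queryParser registration in a solrconfig file. */
final class QueryParserConfigCheck {

    /** Returns the class attribute of the queryParser element with the given name, or null if none exists. */
    static String registeredClass(File solrConfig, String parserName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(solrConfig);
        NodeList parsers = doc.getElementsByTagName("queryParser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            if (parserName.equals(parser.getAttribute("name"))) {
                return parser.getAttribute("class");
            }
        }
        return null; // not registered, or still commented out
    }
}
```

For the file edited above, registeredClass(new File("solrconfiglocal.xml"), "helloWorld") should return org.mycompany.lucene.search.HelloWorldQParserPlugin.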

Use the custom Query Parser

Now select gettingstarted from the core selector from the left drop-down
Select Query
Change the q field to:
{!helloWorld}hello
Click Execute Query
I got one result, so if you didn't get any, change the q field to {!helloWorld}test
Now, let's try to specify a field to query:

Set the df field to id
Copy the id field value from one of your search results and paste it over the q field like this:
{!helloWorld}UTF8TEST
Click the Execute Query button and notice that it correctly identifies that document!
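The Admin UI is just building a request to the collection's select handler, so the same queries can be issued from code. Below is a small sketch that builds such a URL with the JDK's URLEncoder. HelloWorldQueryUrl and its build method are hypothetical names; the base URL and collection come from the steps above, and I am assuming the standard /select handler path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (not part of Solr): builds a select URL that uses a named query parser. */
final class HelloWorldQueryUrl {

    /**
     * Builds baseUrl/collection/select?q={!parserName}text, URL-encoding the q value.
     * If df is non-null it is appended as the df request parameter.
     */
    static String build(String baseUrl, String collection, String parserName, String text, String df)
            throws UnsupportedEncodingException {
        // The {!name} prefix is Solr's local-params syntax for choosing a parser by its registered name.
        String q = "{!" + parserName + "}" + text;
        StringBuilder url = new StringBuilder(baseUrl)
            .append('/').append(collection)
            .append("/select?q=").append(URLEncoder.encode(q, "UTF-8"));
        if (df != null) {
            url.append("&df=").append(URLEncoder.encode(df, "UTF-8"));
        }
        return url.toString();
    }
}
```

For example, build("http://localhost:8983/solr", "gettingstarted", "helloWorld", "UTF8TEST", "id") yields http://localhost:8983/solr/gettingstarted/select?q=%7B%21helloWorld%7DUTF8TEST&df=id, which you can paste into a browser. Keep in mind that our parser wraps the text in a TermQuery as-is, so the value has to match an indexed term exactly.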

Hallelujah! It works!

Conclusion

Wow, what an adventure that was, right? The exciting part is that we now have a base from which to develop our very own query parser, which is quite an adventure in and of itself.

So, there you have it. But don't leave just yet: answer the most important question we can ask in this life--where will we go when we die? After that, enjoy this free movie available on YouTube!

Be blessed.

5 comments:

Employee, November 16, 2016 at 2:30 AM
can i add logging in the plugin?

Christian Ernst, January 24, 2018 at 8:49 AM
Brandon, great job thanks. With this instruction its very easy to handel it

Cheers

Anonymous, December 5, 2018 at 1:27 PM
if we want to modify a param and forward it on to a regular parser like edismax how would one do that?


Wednesday, May 18, 2016
