Wednesday, July 6, 2016

Running Solr 6.1 as a Windows Service

Goal

Get Solr 6.1 running as a Windows Service and have it recover from OutOfMemoryError crashes.

Install Solr 6.1 as a Service

  1. Download NSSM
  2. Extract nssm.exe somewhere
  3. Create solr_start_6.1.0.cmd (this is a Windows Command file that does all of my configuration--yours will definitely be different):
    C:\apache\solr-6.1.0\bin\solr start -f -h node3 -c -p 8983 -z "zk1:2181,zk2:2181,zk3:2181" -m 128m -s C:\apache\solr-6-cores
    NOTE: The -f flag runs the script in the foreground.  I set the JVM heap size to 128 MB (we want this thing to crash and burn to test the OutOfMemoryError restart!).
  4. Test your script to make sure it starts solr and you can access your Solr Admin UI in a web browser
  5. Open a command window and navigate to your nssm.exe directory
  6. nssm.exe install "Apache - Solr 6.1"
    1. Application Tab
      1. Path: Select your solr_start_6.1.0.cmd from earlier
      2. Startup Directory: set it to the directory containing your script (should populate by default)
    2. Details Tab
      1. Display Name: Apache - Solr 6.1
      2. Startup type: Automatic
    3. Log on Tab
      1. Make sure you specify an account that has administrator-level permissions (Use your account if you're stuck here--but make sure to set it to something production-worthy later)
    4. I/O Tab
      1. I/O Redirection
        1. Output (stdout): Set this to something like path\to\my\solr\cmd\script\dir\solr-6.1.0-out.txt
        2. Error (stderr): path\to\my\solr\cmd\script\dir\solr-6.1.0-error.txt
      2. File rotation
        1. Check Rotate files
        2. Check Rotate while service is running
        3. Restrict rotation to files bigger than: (use common sense here, I did 5 MB, so 5242880 went into the box)
    5. Click Install Service
  7. Open Component Services and select Apache - Solr 6.1
  8. Start the service
  9. Validate that it came up by going to your Admin UI webpage
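
If you'd rather script the install than click through the GUI, nssm also accepts settings on the command line. Here is a rough Python sketch that only builds the command lines matching the steps above; it doesn't run anything. The nssm.exe location and the log directory are made-up paths, and I left out the Log on account because that needs a password.

```python
import ntpath  # Windows path rules, so this behaves the same on any OS


def nssm_commands(nssm, service, script, log_dir, rotate_bytes=5 * 1024 * 1024):
    """Return nssm command lines (argv lists) mirroring the GUI settings."""
    settings = [
        ("AppDirectory", ntpath.dirname(script)),   # Startup Directory
        ("DisplayName", service),                   # Details tab
        ("Start", "SERVICE_AUTO_START"),            # Startup type: Automatic
        ("AppStdout", ntpath.join(log_dir, "solr-6.1.0-out.txt")),
        ("AppStderr", ntpath.join(log_dir, "solr-6.1.0-error.txt")),
        ("AppRotateFiles", "1"),                    # Rotate files
        ("AppRotateOnline", "1"),                   # Rotate while running
        ("AppRotateBytes", str(rotate_bytes)),      # 5 MB = 5242880 bytes
    ]
    commands = [[nssm, "install", service, script]]
    commands += [[nssm, "set", service, key, value] for key, value in settings]
    return commands


# Hypothetical paths, for illustration only.
for argv in nssm_commands(r"C:\tools\nssm.exe", "Apache - Solr 6.1",
                          r"C:\apache\solr_start_6.1.0.cmd", r"C:\apache\logs"):
    print(argv)
```

Each printed list could be handed to subprocess.run from an elevated prompt once you've checked the paths.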

Make Solr Service respond to Out Of Memory Exceptions

  1. Navigate to this JIRA ticket
  2. Download oom_win.cmd and place it in your solr\bin directory next to solr.cmd
  3. Open solr.cmd in a text editor
  4. Find all the places where the script starts the server:
    1. Search for /solr_gc.log
  5. Immediately after /solr_gc.log, paste the following:
    -XX:OnOutOfMemoryError="%SOLR_SERVER_DIR%\..\bin\oom_win.cmd %SOLR_PORT% !SOLR_LOGS_DIR!"
    1. I had to replace two lines.  NOTE: this is just the manual way of applying the patch file attached to the JIRA ticket above.  Apply the patch however you prefer.
  6. Now that we've made our changes, go ahead and restart our new Solr 6.1 service so it knows to kill the process on OutOfMemory errors.
  7. To force an OutOfMemoryError, query *:* and return 1000000 rows
    1. If you have a decent amount of content, this should force an OutOfMemoryError.  If you don't have a lot of content, do whatever you can to make Solr do a lot of memory-intensive work.  Consider lowering the JVM heap even further, too.
    2. You should see the web server go offline temporarily and then come back online
  8. Now that you've seen it restart and come back online, let's give the JVM a good amount of RAM so that it doesn't run out of memory every other request.  Just edit your solr_start_6.1.0.cmd file and change the -m 128m to -m 4g (128 MB to 4 GB)
  9. Save and restart the service
  10. Confirm that you have the new amount of RAM for the JVM by visiting the Dashboard tab in the Admin UI
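
For step 7, the memory-hungry request can be built as a plain URL. This sketch assumes the node3 host and port 8983 from the start script; the collection name "mycollection" is a placeholder for whatever yours is called.

```python
from urllib.parse import urlencode


def select_url(host, port, collection, rows):
    """Build a /select URL that matches every document and asks for `rows` of them."""
    params = urlencode({"q": "*:*", "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


# "mycollection" is a hypothetical collection name.
print(select_url("node3", 8983, "mycollection", 1000000))
```

Paste the printed URL into a browser, or fetch it with urllib.request.urlopen, and watch the service go down and come back.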

Logs

When the OutOfMemory Killer runs, it generates a log file in the normal log directory.  Navigate to that directory and you should see a file named something like solr_oom_killer-9000-2016-07-06_13_59_39.  This tells you when the script ran, so you can anticipate future occurrences or make changes to avoid them.
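
If you want to track these restarts over time, the file name itself carries the port and the timestamp. This is a small sketch that pulls both out; the name format is inferred from the single example above, so treat the pattern as an assumption.

```python
import re
from datetime import datetime

# Pattern guessed from one sample name: solr_oom_killer-<port>-<date>_<time>
_OOM_LOG = re.compile(r"^solr_oom_killer-(\d+)-(\d{4}-\d{2}-\d{2}_\d{2}_\d{2}_\d{2})")


def parse_oom_log_name(name):
    """Return (port, timestamp) from an OOM killer log name, or None if it doesn't fit."""
    match = _OOM_LOG.match(name)
    if match is None:
        return None
    port = int(match.group(1))
    when = datetime.strptime(match.group(2), "%Y-%m-%d_%H_%M_%S")
    return port, when


print(parse_oom_log_name("solr_oom_killer-9000-2016-07-06_13_59_39"))
```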

Final step (important!)

Do a happy dance!

Tuesday, May 24, 2016

Apache Solr - RangeQuery

RangeQuery appears to be a range query that operates on strings.  I'm not sure if it only works on strings or if the strings are a way for it to auto-determine what field type it's going to work with.
If you know how to properly use this, please leave a comment below.  I'm stumped on this one.

Fields

  • fieldName (required in hierarchy here or a parent)
  • lowerTerm - string value
  • upperTerm - string value
  • includeLower (optional, default true)
  • includeUpper (optional, default true)

Examples

Simply paste the following into the q= field in the Admin UI.

Here's a query that I came up with, but it doesn't work as I'd expect.

{!xmlparser}
<RangeQuery
 fieldName="price"
 lowerTerm="1.00"
 upperTerm="3.00"
 includeLower="true"
 includeUpper="true">
</RangeQuery>

The above query returns no results, which isn't what I was expecting.


{!xmlparser}
<RangeQuery
 fieldName="price"
 lowerTerm="0"
 upperTerm="10.00"
 includeLower="true"
 includeUpper="true">
</RangeQuery>

The above query returns 30 documents, some of which have prices > 10.00, which isn't what I expected.
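
One possible piece of the puzzle, assuming RangeQuery really does compare terms as strings: string ordering is not numeric ordering. The sketch below is plain Python, not Solr, and the prices are invented. It will not reproduce the exact results above, since a numeric field type indexes encoded terms rather than the text you typed, but it shows how a string range can include and exclude values you wouldn't expect.

```python
def in_string_range(value, lower, upper, include_lower=True, include_upper=True):
    """Range test using string (character-by-character) ordering."""
    above = value >= lower if include_lower else value > lower
    below = value <= upper if include_upper else value < upper
    return above and below


# Invented sample prices, stored as text.
prices = ["0.99", "2.50", "3.00", "10.00", "25.00", "100.00"]

# "2.50" and "3.00" fall outside ["0", "10.00"] because "2" and "3" sort after "1".
print([p for p in prices if in_string_range(p, "0", "10.00")])

# "10.00", "25.00" and "100.00" fall inside ["1.00", "3.00"] for the same reason.
print([p for p in prices if in_string_range(p, "1.00", "3.00")])
```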

Apache Solr - MatchAllDocsQuery

MatchAllDocsQuery is probably the simplest query there is.  All it does is quickly match all documents in an index.

Example

Simply paste the following into the q= field in the Admin UI.

{!xmlparser}
<MatchAllDocsQuery></MatchAllDocsQuery>


Or, using the standard lucene syntax: *:*

Apache Solr - LegacyNumericRangeQuery

NumericRangeQuery was renamed to LegacyNumericRangeQuery around Lucene 6.0.0 and marked deprecated. On older versions, use NumericRangeQuery; from 6.0.0 on, use LegacyNumericRangeQuery.

It looks like NumericRangeQuery has been replaced with PointRangeQuery.

Fields

  • fieldName (required here or in a parent node)
  • lowerTerm (optional, default null)
  • upperTerm (optional, default null)
  • includeLower (optional, default true)
  • includeUpper (optional, default true)
  • precisionStep (optional, default 16)
  • type (optional, default "int") - long | int | double | float

Examples

Simply paste the following into the q= field in the Admin UI.
NOTE: These examples will not work in Solr since Solr does not support Point types.

Lucene 6.0.0+:
{!xmlparser}
<LegacyNumericRangeQuery
 fieldName="price"
 lowerTerm="0"
 upperTerm="10"
 includeLower="true"
 includeUpper="true"
 precisionStep="16"
 type="float">
</LegacyNumericRangeQuery>

Before Lucene 6.0.0:
{!xmlparser}
<NumericRangeQuery
 fieldName="price"
 lowerTerm="0"
 upperTerm="10"
 includeLower="true"
 includeUpper="true"
 precisionStep="16"
 type="float">
</NumericRangeQuery>

Apache Solr - PointRangeQuery

PointRangeQuery is new as of roughly Lucene/Solr 6.0 and is meant to replace the now-deprecated NumericRangeQuery (renamed to LegacyNumericRangeQuery).

NOTE: The following may be confusing, and here's why.  The PointRangeQueryBuilder used by XmlQueryParser requires both lowerTerm and upperTerm, so neither of them can fall back to a default.  The code nevertheless covers its bases in case lowerTerm or upperTerm is missing, defaulting to the smallest and largest values for the respective type.  As of 6.0.0, those defaults cannot be reached.  If you are reading this after the attributes become optional, the defaults listed below may be useful to you.  Until then, you must enter a value for both.

Fields

  • fieldName (required in hierarchy here or in a parent node)
  • lowerTerm (required)
  • upperTerm (required)
  • type (optional, default "int") - long | int | double | float
    • long
      • If lowerTerm is not specified, default to Long.MIN_VALUE (-9223372036854775808)
      • If upperTerm is not specified, default to Long.MAX_VALUE (9223372036854775807)
    • int
      • If lowerTerm is not specified, default to Integer.MIN_VALUE (-2147483648)
      • If upperTerm is not specified, default to Integer.MAX_VALUE (2147483647)
    • double
      • If lowerTerm is not specified, default to Double.NEGATIVE_INFINITY
      • If upperTerm is not specified, default to Double.POSITIVE_INFINITY
    • float
      • If lowerTerm is not specified, default to Float.NEGATIVE_INFINITY
      • If upperTerm is not specified, default to Float.POSITIVE_INFINITY
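
To make the fallback rules above concrete, here is a small Python sketch of the same lookup. It mirrors the list, not the actual Lucene source, and the function name is mine.

```python
# Default (lower, upper) bounds per type, as listed above.
DEFAULT_BOUNDS = {
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "float": (float("-inf"), float("inf")),
    "double": (float("-inf"), float("inf")),
}


def effective_bounds(type_name="int", lower=None, upper=None):
    """Return the (lower, upper) pair after filling in any missing bound."""
    low_default, high_default = DEFAULT_BOUNDS[type_name]
    return (low_default if lower is None else lower,
            high_default if upper is None else upper)


print(effective_bounds("long"))
print(effective_bounds("float", upper=100.0))
```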

Example

Simply paste the following into the q= field in the Admin UI.
NOTE: This example should work, but it doesn't in my Solr 6.0.0 build because Point types aren't supported in Solr yet.  I do know that the PointRangeQuery code is running when I submit this query, because it gives me error messages when I don't specify required parameters.  However, somewhere in between, something isn't supported and no documents are returned.  Please leave a comment if you know more.

{!xmlparser}
<PointRangeQuery
 fieldName="price"
 lowerTerm="0.00"
 upperTerm="100.00"
 type="float">
</PointRangeQuery>

Apache Solr - TermsQuery

TermsQuery is essentially a list of TermQuery with a couple of extra options.
It allows you to specify a group of terms on a field and require a minimum number of matches.

Fields

  • fieldName (required here or in a parent node)
  • disableCoord (optional, default false)
  • minimumNumberShouldMatch (optional, default 0)
  • boost (optional, default 1.0)
  • Value: list of tokens (space-delimited will do it)

Example

Simply paste the following into the q= field in the Admin UI.
{!xmlparser}
<TermsQuery
 fieldName="series_t"
 disableCoord="true"
 boost="1.2"
 minimumNumberShouldMatch="2">song ice fire ender black company</TermsQuery>
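
As I understand it, minimumNumberShouldMatch means a document has to contain at least that many of the listed terms. Here is a toy Python sketch of that rule; it ignores analysis and scoring entirely, and the sample titles are invented.

```python
def terms_query_matches(doc_tokens, query_terms, minimum_should_match=0):
    """True if enough of the query terms appear among the document's tokens."""
    present = set(query_terms) & set(doc_tokens)
    # Assumption: with a minimum of 0, at least one term still has to match.
    required = max(1, minimum_should_match)
    return len(present) >= required


terms = "song ice fire ender black company".split()
print(terms_query_matches("a song of ice and fire".split(), terms, 2))
print(terms_query_matches("ender s game".split(), terms, 2))
```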

Monday, May 23, 2016

Apache Solr - TermQuery

TermQuery is a very simple query that matches documents containing a term.
A term represents a word found in a field.  A term has a field property and a text property.

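Fields

  • fieldName (required here or in a parent node)
  • boost (optional, default 1.0)
  • Value: the term text
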
Examples

Simply paste the following into the q= field in the Admin UI.
XmlQueryParser:
{!xmlparser}
<TermQuery fieldName="_text_" boost="1.3">test</TermQuery>

Lucene:
_text_:test^1.3

Apache Solr - XML Query Parser

Introduction
The XML Query Parser (XmlQueryParser) supports a very wide range of Apache Solr
search queries, more than any other query parser that ships with it.
This article attempts to survey that range as released with Solr 6.0.0.
I will be adding separate articles (and linking to them) for the different types of queries so that
each can get more detail without overwhelming this main thread.

De-Facto Example
<BooleanQuery fieldName="description">
    <Clause occurs="must">
        <TermQuery>shirt</TermQuery>
    </Clause>
    <Clause occurs="mustnot">
        <TermQuery>plain</TermQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery>cotton</TermQuery>
    </Clause>
    <Clause occurs="must">
        <BooleanQuery fieldName="size">
            <Clause occurs="should">
                <TermsQuery>S M L</TermsQuery>
            </Clause>
        </BooleanQuery>
    </Clause>
</BooleanQuery>
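
Hand-typing nested XML gets error-prone quickly, so here is a sketch of building a cut-down version of the query above (the first three clauses only) with Python's standard xml.etree module. The helper names are mine. One nice side effect is that special characters in terms get escaped for you.

```python
import xml.etree.ElementTree as ET


def term_clause(parent, occurs, text, field=None):
    """Append <Clause occurs=...><TermQuery>text</TermQuery></Clause> to parent."""
    clause = ET.SubElement(parent, "Clause", occurs=occurs)
    attrs = {"fieldName": field} if field else {}
    term = ET.SubElement(clause, "TermQuery", attrs)
    term.text = text
    return clause


def shirt_query():
    """Build the must/mustNot/should part of the example as a q= value."""
    root = ET.Element("BooleanQuery", fieldName="description")
    term_clause(root, "must", "shirt")
    term_clause(root, "mustNot", "plain")
    term_clause(root, "should", "cotton")
    return "{!xmlparser}" + ET.tostring(root, encoding="unicode")


print(shirt_query())
```

The printed string can be pasted into the q= field as-is.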


Difficulties
  • How do I get highlighting to work?

Top-Level
  • BooleanQuery
    • disableCoord (optional, false)
    • minimumNumberShouldMatch (optional, 0)
    • boost (optional, 1.0)
    • Value
      • Clause
        • occurs: should | must | mustNot | filter
        • Value (Note: Many of the following can also have children, explained later)
          • TermQuery
          • TermsQuery
          • MatchAllDocsQuery
          • BooleanQuery
          • LegacyNumericRangeQuery (deprecated)
          • PointRangeQuery
          • DisjunctionMaxQuery
          • UserQuery
          • ConstantScoreQuery
          • SpanNear
          • BoostingTermQuery
          • SpanTerm
          • SpanOr
          • SpanOrTerms
          • SpanFirst
          • SpanNot
        • NOTE: Only the first child of a Clause is recognized--others will get silently ignored!
      • Ignores any other element types at this level--i.e. only Clause is recognized, no exceptions thrown if it finds something else

  • MatchAllDocsQuery - Matches all documents in an index
  • TermQuery
  • TermsQuery
  • [Legacy]NumericRangeQuery (deprecated in lucene 6.0.0ish)
    • Not supported as of Solr 6 (solr doesn't support point types yet)
  • PointRangeQuery (new in 6.0ish)
    • Not supported as of Solr 6 (solr doesn't support point types yet)
  • RangeQuery
  • DisjunctionMaxQuery
    • tieBreaker (optional, 0.0)
    • boost (optional, 1.0)
    • Value
      • May contain multiple queries of any type of Query defined in this list (i.e. DisjunctionMaxQuery, RangeQuery, …)
  • UserQuery
    • fieldName (optional, defaults to defaultField)
    • Value
      • Text is passed into QueryParser.parse
      • This appears to support the classic query syntax
    • NOTE: Wraps the query into a BoostQuery
  • ConstantScoreQuery
    • boost (optional, 1.0)
    • Value
      • Only gets the first child
      • Child may be any query in this list
  • SpanNear
    • boost (optional, 1.0)
    • slop
    • inOrder (optional, false)
    • Value
      • A collection of various types of SpanQuery
  • BoostingTermQuery
    • fieldName (required either here or in a parent)
    • boost (optional, 1.0)
    • Value: fieldName value
  • SpanTerm
    • fieldName (required either here or in a parent)
    • boost (optional, 1.0)
    • Value: fieldName value
  • SpanOr
    • boost (optional, 1.0)
    • Value: a collection of various types of SpanQuery
  • SpanOrTerms
    • fieldName (required either here or in a parent)
    • boost (optional, 1.0)
    • Value: terms commonly separated by a space
    • Wraps the terms in a SpanOr query
  • SpanFirst
    • This limits span matches to the first N (specified by the end parameter below) positions
      • More specifically, match spans in the subquery whose end position is less than or equal to end.
    • boost (optional, 1.0)
    • end (optional, 1, integer)
    • Value:
      • Gets the first child, which must be a SpanQuery
      • All other children are ignored
  • SpanNot
    • boost (optional, 1.0)
    • Include - First child element called Include must contain a SpanQuery
    • Exclude - First child element called Exclude must contain a SpanQuery



BooleanQuery
TermQuery
{!xmlparser}
<BooleanQuery fieldName="headline">
  <Clause occurs="must">
    <TermQuery>york</TermQuery>
  </Clause> 
</BooleanQuery>

{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <TermQuery fieldName="headline">york</TermQuery>
  </Clause>
</BooleanQuery>

SpanNear
// Headline: new pre/3 york
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanTerm>york</SpanTerm>
    </SpanNear>
  </Clause>
</BooleanQuery>

// Headline: new pre/3 (york or car)
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanOr>
        <SpanTerm>york</SpanTerm>
        <SpanTerm>car</SpanTerm>
      </SpanOr>
    </SpanNear>
  </Clause>
</BooleanQuery>

// Headline: new pre/3 (york or (car w/3 arrives))
// Match: "headline":"New York. Hongkong. Wunsiedel"
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanOr>
        <SpanTerm>york</SpanTerm>
        <SpanNear slop="3" inOrder="false">
          <SpanTerm>car</SpanTerm>
          <SpanTerm>arrives</SpanTerm>
        </SpanNear>
      </SpanOr>
    </SpanNear>
  </Clause>
</BooleanQuery>

// Headline: new pre/3 (daybook or (employee w/3 onboarding))
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanOr>
        <SpanTerm>daybook</SpanTerm>
        <SpanNear slop="3" inOrder="false">
          <SpanTerm>employee</SpanTerm>
          <SpanTerm>onboarding</SpanTerm>
        </SpanNear>
      </SpanOr>
    </SpanNear>
  </Clause>
</BooleanQuery>

DisjunctionMaxQuery
{!xmlparser}
<DisjunctionMaxQuery
 tieBreaker="1"
 boost="2">
    <UserQuery fieldName="headline">uber</UserQuery>
    <TermsQuery fieldName="headline">new york times</TermsQuery>
</DisjunctionMaxQuery>

UserQuery
{!xmlparser}
<UserQuery fieldName="headline">
"new computer*"~15
</UserQuery>

ConstantScoreQuery
{!xmlparser}
<ConstantScoreQuery boost="1.0">
    <UserQuery fieldName="headline">tesla</UserQuery>
</ConstantScoreQuery>

SpanNear
{!xmlparser}
<SpanNear fieldName="headline" slop="3" inOrder="true">
  <SpanTerm>new</SpanTerm>
  <SpanTerm>computer</SpanTerm>
</SpanNear>

BoostingTermQuery
{!xmlparser}
<BoostingTermQuery
  fieldName="headline"
  boost="1.2">
tesla
</BoostingTermQuery>

SpanTerm
{!xmlparser}
<SpanTerm
  fieldName="headline"
  boost="1.2">
tesla
</SpanTerm>

SpanOr
{!xmlparser}
<SpanOr fieldName="headline"
  boost="1.2">
<SpanTerm>pizza</SpanTerm>
<SpanTerm>milk</SpanTerm>
</SpanOr>

SpanOrTerms
{!xmlparser}
<SpanOrTerms
  fieldName="headline"
  boost="1.2">
pizza milk
</SpanOrTerms>

SpanFirst
{!xmlparser}
<SpanFirst
  fieldName="headline"
  end="1"
  boost="1.2">
<SpanTerm>tesla</SpanTerm>
</SpanFirst>
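
To see what end="1" means here, this is a toy Python model of SpanFirst. It assumes a span is a (start, end) pair of token positions with an exclusive end, so a single term at position i is the span (i, i + 1). The headlines are invented.

```python
def term_spans(tokens, term):
    """Each occurrence of `term` at position i becomes the span (i, i + 1)."""
    return [(i, i + 1) for i, token in enumerate(tokens) if token == term]


def span_first(spans, end):
    """Keep only spans whose end position is less than or equal to `end`."""
    return [span for span in spans if span[1] <= end]


# With end=1, only a term at the very first position survives.
print(span_first(term_spans("tesla unveils new model".split(), "tesla"), 1))
print(span_first(term_spans("new tesla model".split(), "tesla"), 1))
```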

SpanNot -- TODO: Redo this--I'm getting some headlines with york in them
{!xmlparser}
<SpanNot fieldName="headline">
  <Include>
    <SpanTerm>new</SpanTerm>
  </Include>
  <Exclude>
    <SpanTerm>york</SpanTerm>
  </Exclude>
</SpanNot>
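
A likely reason headlines with york still show up: as far as I can tell, SpanNot only drops Include spans that overlap an Exclude span, and "new" and "york" sit at different positions, so they never overlap. Here is a toy Python model of that rule, with spans as (start, end) token positions, exclusive end. It is a sketch of the semantics, not Lucene's code.

```python
def term_spans(tokens, term):
    """Each occurrence of `term` at position i becomes the span (i, i + 1)."""
    return [(i, i + 1) for i, token in enumerate(tokens) if token == term]


def overlaps(a, b):
    """True if two half-open (start, end) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def span_not(include_spans, exclude_spans):
    """Keep include spans that do not overlap any exclude span."""
    return [s for s in include_spans
            if not any(overlaps(s, e) for e in exclude_spans)]


tokens = "new york times".split()
# "new" is at (0, 1) and "york" at (1, 2): no overlap, so "new" still matches.
print(span_not(term_spans(tokens, "new"), term_spans(tokens, "york")))
```

If that reading is right, the Exclude clause would need to cover the same positions as the Include clause (for example a SpanNear of new and york) before anything gets removed.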