Coding Art: 2016

Wednesday, July 6, 2016

Running Solr 6.1 as a Windows Service

Goal

Get Solr 6.1 running as a Windows Service and have it recover on OutOfMemory Exceptions.

Install Solr 6.1 as a Service

Download NSSM
Extract nssm.exe somewhere
Create solr_start_6.1.0.cmd (this is a Windows Command file that does all of my configuration--yours will definitely be different):
C:\apache\solr-6.1.0\bin\solr start -f -h node3 -c -p 8983 -z "zk1:2181,zk2:2181,zk3:2181" -m 128m -s C:\apache\solr-6-cores
NOTE: The -f flag runs the script in the foreground. I set the JVM heap size to 128 MB (we want this thing to crash and burn so we can test the OutOfMemoryError restart!).
Test your script to make sure it starts solr and you can access your Solr Admin UI in a web browser
Open a command window and navigate to your nssm.exe directory
nssm.exe install "Apache - Solr 6.1"

Application Tab

Path: Select your solr_start_6.1.0.cmd from earlier
Startup Directory: set it to the directory containing your script (should populate by default)

Details Tab

Display Name: Apache - Solr 6.1
Startup type: Automatic

Log on Tab

Make sure you specify an account that has administrator-level permissions (Use your account if you're stuck here--but make sure to set it to something production-worthy later)

I/O Tab

I/O Redirection

Output (stdout): Set this to something like path\to\my\solr\cmd\script\dir\solr-6.1.0-out.txt
Error (stderr): path\to\my\solr\cmd\script\dir\solr-6.1.0-error.txt

File rotation

Check Rotate files
Check Rotate while service is running
Restrict rotation to files bigger than: (use common sense here, I did 5 MB, so 5242880 went into the box)

Click Install Service
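The GUI settings above can probably also be applied from the command line. Here's a minimal sketch that only builds the nssm command lines without running them. The parameter names (AppDirectory, DisplayName, Start, AppStdout, AppStderr, AppRotateFiles, AppRotateOnline, AppRotateBytes) are taken from the NSSM usage documentation as I understand it, the paths in the usage note are placeholders, and the Log on account is left out on purpose, so check everything against your NSSM version first.

```python
from typing import List


def nssm_commands(service: str, script: str, work_dir: str,
                  stdout_log: str, stderr_log: str,
                  rotate_bytes: int = 5 * 1024 * 1024) -> List[List[str]]:
    """Build (but do not run) nssm command lines mirroring the GUI tabs."""

    def set_param(name: str, value) -> List[str]:
        return ["nssm.exe", "set", service, name, str(value)]

    return [
        ["nssm.exe", "install", service, script],   # Application tab: Path
        set_param("AppDirectory", work_dir),        # Application tab: Startup directory
        set_param("DisplayName", service),          # Details tab
        set_param("Start", "SERVICE_AUTO_START"),   # Details tab: Automatic
        set_param("AppStdout", stdout_log),         # I/O tab
        set_param("AppStderr", stderr_log),         # I/O tab
        set_param("AppRotateFiles", 1),             # Rotate files
        set_param("AppRotateOnline", 1),            # Rotate while service is running
        set_param("AppRotateBytes", rotate_bytes),  # 5 MB = 5242880 by default
    ]
```

To apply them, each list could be passed to subprocess.run(cmd, check=True) from an elevated prompt in the nssm.exe directory.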

Open Component Services and select Apache - Solr 6.1
Start the service
Validate that it came up by going to your Admin UI webpage
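If you'd rather script that last check than open a browser, here's a rough sketch. It assumes the node3 host and 8983 port from the start script above and uses Solr's system info endpoint (/solr/admin/info/system); the function names are made up for illustration.

```python
import urllib.error
import urllib.request


def system_info_url(host: str = "node3", port: int = 8983) -> str:
    """URL of the Solr system info endpoint for the given host and port."""
    return f"http://{host}:{port}/solr/admin/info/system?wt=json"


def solr_is_up(host: str = "node3", port: int = 8983, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(system_info_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling solr_is_up() right after starting the service may return False for a few seconds while Solr boots, so retry a couple of times before giving up.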

Make Solr Service respond to Out Of Memory Exceptions

Navigate to this JIRA ticket
Download oom_win.cmd and place it in your solr\bin directory next to solr.cmd
Open solr.cmd in a text editor
Find all the places where the script starts the server:

Search for /solr_gc.log

Immediately after /solr_gc.log, paste the following:
-XX:OnOutOfMemoryError="%SOLR_SERVER_DIR%\..\bin\oom_win.cmd %SOLR_PORT% !SOLR_LOGS_DIR!"

I had to change two lines. NOTE: this is just the manual way of applying the patch file attached to the JIRA ticket above; apply the patch however you prefer.
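To double-check that you found every spot before editing, a small sketch like this lists the lines of solr.cmd that mention /solr_gc.log. The function name is hypothetical and it only reads text; it does not modify the script.

```python
from typing import List, Tuple

MARKER = "/solr_gc.log"


def find_marker_lines(script_text: str, marker: str = MARKER) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped line) for every line containing the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if marker in line
    ]
```

For example, find_marker_lines(open(r"C:\apache\solr-6.1.0\bin\solr.cmd").read()) should report the lines you need to touch (two, in my reading of the step above).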

Now that we've made our changes, go ahead and restart our new Solr 6.1 service so it knows to kill the process on OutOfMemory errors.
To force an OutOfMemoryError, query *:* and return 1000000 rows

If you have a decent amount of content, this should force an OutOfMemory exception. If you don't have a lot of content, do whatever you can to make it do a lot of memory-intensive work. Perhaps consider lowering the JVM memory, too.
You should see the web server go offline temporarily and then come back online
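Here's a small sketch that builds the memory-hungry request as a URL, in case you want to fire it from a script instead of the Admin UI. The host and port come from the start script above; the collection name is a placeholder you'd replace with your own.

```python
from urllib.parse import urlencode


def oom_query_url(collection: str, host: str = "node3", port: int = 8983,
                  rows: int = 1_000_000) -> str:
    """Build a select URL that matches every document and asks for a huge page of rows."""
    params = urlencode({"q": "*:*", "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"
```

Fetching that URL (for example with urllib.request.urlopen) against a node with a 128 MB heap and enough content is what should trigger the restart.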

Now that you've seen it restart and come back online, let's give the JVM a good amount of RAM so that it doesn't run out of memory every other request. Just edit your solr_start_6.1.0.cmd file and change the -m 128m to -m 4g (128 MB to 4 GB)
Save and restart the service
Confirm that you have the new amount of RAM for the JVM by visiting the Dashboard tab in the Admin UI

Logs

When the OutOfMemory Killer runs, it generates a log file in the normal log directory. Navigate to that directory and you should see a file that looks something like: solr_oom_killer-9000-2016-07-06_13_59_39. Now you know when the script ran, and you can hopefully anticipate it in the future or make changes to avoid it.
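If you want to track these restarts, the file name itself carries the port and the timestamp. Here's a sketch that pulls both out, assuming every log follows the same pattern as the example name above (the function name is made up).

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Pattern inferred from the example name: solr_oom_killer-<port>-<yyyy-MM-dd_HH_mm_ss>
_OOM_LOG = re.compile(r"solr_oom_killer-(\d+)-(\d{4}-\d{2}-\d{2}_\d{2}_\d{2}_\d{2})")


def parse_oom_log_name(name: str) -> Optional[Tuple[int, datetime]]:
    """Return (port, timestamp) for an OOM killer log name, or None if it doesn't match."""
    match = _OOM_LOG.search(name)
    if match is None:
        return None
    port = int(match.group(1))
    when = datetime.strptime(match.group(2), "%Y-%m-%d_%H_%M_%S")
    return port, when
```

Running it over os.listdir() of the Solr log directory gives you a quick history of when the killer fired.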

Final step (important!)

Do a happy dance!

Tuesday, May 24, 2016

Apache Solr - RangeQuery

RangeQuery appears to be a range query that operates on strings. I'm not sure if it only works on strings or if the strings are a way for it to auto-determine what field type it's going to work with.

If you know how to properly use this, please leave a comment below. I'm stumped on this one.
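If RangeQuery really does compare terms as plain strings, its ranges will not behave like numeric ranges. The sketch below only illustrates that difference in ordering; it is not Solr code, and whether it explains the results in the examples depends on how the price field is actually indexed, which I'm not assuming here.

```python
def in_string_range(value: str, lower: str, upper: str) -> bool:
    """Inclusive range check using plain string (lexicographic) ordering."""
    return lower <= value <= upper


def in_numeric_range(value: str, lower: str, upper: str) -> bool:
    """Inclusive range check after converting everything to numbers."""
    return float(lower) <= float(value) <= float(upper)
```

With string ordering, "25.00" falls between "1.00" and "3.00" because only the leading characters are compared, while the numeric check rejects it.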

Fields

fieldName (required in hierarchy here or a parent)
lowerTerm - string value
upperTerm - string value
includeLower (optional, default true)
includeUpper (optional, default true)

Examples

Simply paste the following into the q= field in the Admin UI.

Here's a query that I came up with, but it doesn't work as I'd expect.

{!xmlparser}

<RangeQuery

fieldName="price"

lowerTerm="1.00"

upperTerm="3.00"

includeLower="true"

includeUpper="true">

</RangeQuery>

The above query returns no results, which isn't what I was expecting.

{!xmlparser}

<RangeQuery

fieldName="price"

lowerTerm="0"

upperTerm="10.00"

includeLower="true"

includeUpper="true">

</RangeQuery>

The above query returns 30 documents, including some with prices > 10.00, which isn't what I expected.

Apache Solr - MatchAllDocsQuery

MatchAllDocsQuery is probably the simplest query there is. All it does is quickly match all documents in an index.

Example

Simply paste the following into the q= field in the Admin UI.

{!xmlparser}

<MatchAllDocsQuery></MatchAllDocsQuery>

Or, using the standard lucene syntax: *:*
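Outside the Admin UI, the same query has to be URL-encoded. Here's a minimal sketch of a helper that does that; the host, port, and collection name are placeholders and the function name is made up.

```python
from urllib.parse import urlencode


def xmlparser_select_url(xml: str, collection: str,
                         host: str = "localhost", port: int = 8983) -> str:
    """Build a select URL whose q parameter is the XML prefixed with {!xmlparser}."""
    params = urlencode({"q": "{!xmlparser}" + xml, "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"
```

xmlparser_select_url("<MatchAllDocsQuery></MatchAllDocsQuery>", "mycollection") produces a URL you can open in a browser or fetch from a script.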

Apache Solr - LegacyNumericRangeQuery

NumericRangeQuery was renamed to LegacyNumericRangeQuery and marked deprecated around Lucene 6.0.0. On older versions, use NumericRangeQuery; from 6.0.0 on, use LegacyNumericRangeQuery.

It looks like NumericRangeQuery has been replaced with PointRangeQuery.

Fields

fieldName (required here or in a parent node)
lowerTerm (optional, default null)
upperTerm (optional, default null)
includeLower (optional, default true)
includeUpper (optional, default true)
precisionStep (optional, default 16)
type (optional, default "int") - long | int | double | float

Examples

Simply paste the following into the q= field in the Admin UI.

NOTE: These examples will not work in Solr since Solr does not support Point types.

Lucene 6.0.0+:

{!xmlparser}

<LegacyNumericRangeQuery

fieldName="price"

lowerTerm="0"

upperTerm="10"

includeLower="true"

includeUpper="true"

precisionStep="16"

type="float">

</LegacyNumericRangeQuery>

Before Lucene 6.0.0:

{!xmlparser}

<NumericRangeQuery

fieldName="price"

lowerTerm="0"

upperTerm="10"

includeLower="true"

includeUpper="true"

precisionStep="16"

type="float">

</NumericRangeQuery>
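Since the two examples differ only in the element name, a small generator can pick the right one. This is just a sketch: the lucene_major parameter and the function name are made up, and the 6.0.0 cut-over is the approximate one described above.

```python
import xml.etree.ElementTree as ET

VALID_TYPES = ("long", "int", "double", "float")


def numeric_range_xml(field: str, lower, upper, value_type: str = "int",
                      precision_step: int = 16, lucene_major: int = 6) -> str:
    """Return an {!xmlparser} numeric range query using the element name for the Lucene version."""
    if value_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}")
    tag = "LegacyNumericRangeQuery" if lucene_major >= 6 else "NumericRangeQuery"
    elem = ET.Element(tag, {
        "fieldName": field,
        "lowerTerm": str(lower),
        "upperTerm": str(upper),
        "includeLower": "true",
        "includeUpper": "true",
        "precisionStep": str(precision_step),
        "type": value_type,
    })
    return "{!xmlparser}" + ET.tostring(elem, encoding="unicode")
```

numeric_range_xml("price", 0, 10, "float") gives the Lucene 6.0.0+ form; passing lucene_major=5 gives the older one.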

Apache Solr - PointRangeQuery

PointRangeQuery is new as of about Apache Solr 6.0 and is meant to replace the now-deprecated NumericRangeQuery (which was renamed to LegacyNumericRangeQuery).

NOTE: The following may be confusing, and here's why. The PointRangeQueryBuilder used by XmlQueryParser requires both lowerTerm and upperTerm, so neither can fall back to a default. The code nevertheless covers the case where lowerTerm or upperTerm is missing by defaulting to the minimum or maximum value for the respective type. As of 6.0.0 those defaults cannot be reached, so you must enter a value for both. If you are reading this after they become optional, the defaults listed below may be useful to you.

Fields

fieldName (required in hierarchy here or in a parent node)
lowerTerm (required)
upperTerm (required)
type (optional, default "int") - long | int | double | float

long

If lowerTerm is not specified, default to Long.MIN_VALUE (-9223372036854775808)
If upperTerm is not specified, default to Long.MAX_VALUE (9223372036854775807)

int

If lowerTerm is not specified, default to Integer.MIN_VALUE (-2147483648)
If upperTerm is not specified, default to Integer.MAX_VALUE (2147483647)

double

If lowerTerm is not specified, default to Double.NEGATIVE_INFINITY
If upperTerm is not specified, default to Double.POSITIVE_INFINITY

float

If lowerTerm is not specified, default to Float.NEGATIVE_INFINITY
If upperTerm is not specified, default to Float.POSITIVE_INFINITY
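The defaults above boil down to a small lookup table. Here's a sketch of that table in Python; it models only the fallback behavior described here (which, per the note above, cannot be reached as of 6.0.0), and the names are made up.

```python
import math

# Fallback bounds per type, mirroring the Java constants listed above.
DEFAULT_BOUNDS = {
    "long": (-2**63, 2**63 - 1),        # Long.MIN_VALUE, Long.MAX_VALUE
    "int": (-2**31, 2**31 - 1),         # Integer.MIN_VALUE, Integer.MAX_VALUE
    "double": (-math.inf, math.inf),    # Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY
    "float": (-math.inf, math.inf),     # Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY
}


def effective_bounds(value_type: str = "int", lower=None, upper=None):
    """Return (lower, upper), substituting the type's default for any missing bound."""
    default_lower, default_upper = DEFAULT_BOUNDS[value_type]
    return (
        default_lower if lower is None else lower,
        default_upper if upper is None else upper,
    )
```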

Example

Simply paste the following into the q= field in the Admin UI.

NOTE: This example should work, but it doesn't in my Solr 6.0.0 build because Point types aren't supported in Solr yet. I do know that the PointRangeQuery code is running when I submit this query, because it gives me error messages when I don't specify required parameters. However, somewhere in between, something isn't supported and no documents are returned. Please leave a comment if you know more.

{!xmlparser}

<PointRangeQuery

fieldName="price"

lowerTerm="0.00"

upperTerm="100.00"

type="float">

</PointRangeQuery>

Apache Solr - TermsQuery

TermsQuery is essentially a list of TermQuery with a couple of extra options.
It allows you to specify a group of terms on a field and require a minimum number of matches.

Fields

fieldName (required here or in a parent node)
disableCoord (optional, default false)

more about coord-factor in scoring
If disabled, resulting score gets multiplied by 1.0

minimumNumberShouldMatch (optional, default 0)
boost (optional, default 1.0)

more about query-boost in scoring

Value: list of tokens (space-delimited will do it)

Example

Simply paste the following into the q= field in the Admin UI.

{!xmlparser}

<TermsQuery

fieldName="series_t"

disableCoord="true"

boost="1.2"

minimumNumberShouldMatch="2">song ice fire ender black company</TermsQuery>
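To make minimumNumberShouldMatch concrete, here's a toy model of the matching rule: count how many of the query's terms appear in a document and compare against the minimum. It ignores scoring, boost, and analysis entirely, and it assumes at least one term must always match even when the minimum is 0, which is my understanding of how all-optional clauses behave.

```python
from typing import Iterable


def terms_query_matches(doc_tokens: Iterable[str], query_terms: str,
                        minimum_should_match: int = 0) -> bool:
    """True if enough distinct query terms (space-delimited) occur in the document tokens."""
    hits = len(set(query_terms.split()) & set(doc_tokens))
    return hits >= max(1, minimum_should_match)
```

With the terms from the example and a minimum of 2, a title tokenized as "a song of ice and fire" matches (song, ice, fire), while one containing only "ender" does not.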

Monday, May 23, 2016

Apache Solr - TermQuery

TermQuery is a very simple query that matches documents containing a term.
A term represents a word found in a field. A term has a field property and a text property.

Fields

fieldName (required)

boost (optional, 1.0 default)

more about query-boost in scoring

Examples

Simply paste the following into the q= field in the Admin UI.

Query Parser	Syntax
XmlQueryParser	{!xmlparser} <TermQuery fieldName="_text_" boost="1.3">test</TermQuery>
Lucene	_text_:test^1.3

Apache Solr - XML Query Parser

Introduction

The XML Query Parser (XmlQueryParser) supports a wider range of Apache Solr search queries than any other query parser that ships with it. This article attempts to survey what it offers as of Solr 6.0.0.

I will be adding separate articles (and linking to them) for the different query types so that each can get more detail without overwhelming this main article.

De-Facto Example

<BooleanQuery fieldName="description">

    <Clause occurs="must">

        <TermQuery>shirt</TermQuery>

    </Clause>

    <Clause occurs="mustnot">

        <TermQuery>plain</TermQuery>

    </Clause>

    <Clause occurs="should">

        <TermQuery>cotton</TermQuery>

    </Clause>

    <Clause occurs="must">

        <BooleanQuery fieldName="size">

            <Clause occurs="should">

                <TermsQuery>S M L</TermsQuery>

            </Clause>

        </BooleanQuery>

    </Clause>

</BooleanQuery>
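Notice that none of the TermQuery elements above carry their own fieldName. As I read the "required here or in a parent node" notes in the field lists, each query picks up the fieldName of its nearest ancestor. Here's a sketch that resolves those inherited names for a query document; it is an illustration of the lookup, not the parser's actual code.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def effective_field_names(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (text, fieldName) for each TermQuery/TermsQuery, inheriting from ancestors."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    resolved = []
    for elem in root.iter():
        if elem.tag not in ("TermQuery", "TermsQuery"):
            continue
        node = elem
        # Walk upward until some element declares fieldName (or we run out of parents).
        while node is not None and node.get("fieldName") is None:
            node = parent_of.get(node)
        field = node.get("fieldName") if node is not None else None
        resolved.append(((elem.text or "").strip(), field))
    return resolved
```

For the example above, shirt, plain, and cotton resolve to description, while S M L resolves to size because the inner BooleanQuery overrides the outer one.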

Difficulties

How do I get highlighting to work?

Top-Level

BooleanQuery

disableCoord (optional, false)
minimumNumberShouldMatch (optional, 0)
boost (optional, 1.0)
Value

Clause

occurs: should | must | mustNot | filter
Value (Note: Many of the following can also have children, explained later)

TermQuery
TermsQuery
MatchAllDocsQuery
BooleanQuery
LegacyNumericRangeQuery (deprecated)
PointRangeQuery
DisjunctionMaxQuery
UserQuery
ConstantScoreQuery
SpanNear
BoostingTermQuery
SpanTerm
SpanOr
SpanOrTerms
SpanFirst
SpanNot

NOTE: Only the first child element of a Clause is recognized--any others get silently ignored!

Ignores any other element types at this level--i.e. only Clause is recognized, no exceptions thrown if it finds something else

MatchAllDocsQuery - Matches all documents in an index
TermQuery

TermsQuery

[Legacy]NumericRangeQuery (deprecated in lucene 6.0.0ish)

Not supported as of Solr 6 (solr doesn't support point types yet)

PointRangeQuery (new in 6.0ish)

Not supported as of Solr 6 (solr doesn't support point types yet)

RangeQuery

DisjunctionMaxQuery

tieBreaker (optional, 0.0)
boost (optional, 1.0)
Value

May contain multiple queries of any type of Query defined in this list (i.e. DisjunctionMaxQuery, RangeQuery, …)
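The tieBreaker is easiest to see with numbers. As I understand disjunction-max scoring, a document gets the score of its best-matching subquery plus tieBreaker times the scores of the other matching subqueries. Here's a sketch of that arithmetic, ignoring boost; the function name is made up.

```python
from typing import Sequence


def dismax_score(sub_scores: Sequence[float], tie_breaker: float = 0.0) -> float:
    """Best subquery score plus tie_breaker times the sum of the remaining scores."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)
```

With a tieBreaker of 0.0 only the best subquery counts; with 1.0 (as in the DisjunctionMaxQuery example later in this article) the result is simply the sum of all matching subquery scores.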

UserQuery

fieldName (optional, defaults to defaultField)
Value

Text is passed into QueryParser.parse
This appears to support the classic query syntax

NOTE: Wraps the query into a BoostQuery

ConstantScoreQuery

boost (optional, 1.0)
Value

Only gets the first child
Child may be any query in this list

SpanNear

boost (optional, 1.0)
slop
inOrder (optional, false)
Value

A collection of various types of SpanQuery

BoostingTermQuery

fieldName (required either here or in a parent)
boost (optional, 1.0)
Value: fieldName value

SpanTerm

fieldName (required either here or in a parent)
boost (optional, 1.0)
Value: fieldName value

SpanOr

boost (optional, 1.0)
Value: a collection of various types of SpanQuery

SpanOrTerms

fieldName (required either here or in a parent)
boost (optional, 1.0)
Value: terms commonly separated by a space
Wraps the terms in a SpanOr query

SpanFirst

This limits span matches to the first N (specified by the end parameter below) positions

More specifically, match spans in the subquery whose end position is less than or equal to end.

boost (optional, 1.0)
end (optional, 1, integer)
Value:

Gets the first child, which must be a SpanQuery
All other children are ignored
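Here's a toy model of the end rule for the simplest case, a single term as the subquery. It assumes positions are 0-based and that a one-term span at position p ends at p + 1, which is how I understand span positions; real analysis (lowercasing, stop words, and so on) is ignored.

```python
from typing import Sequence


def span_first_matches(tokens: Sequence[str], term: str, end: int = 1) -> bool:
    """True if the term occurs at a position whose span end (position + 1) is <= end."""
    return any(
        token == term and position + 1 <= end
        for position, token in enumerate(tokens)
    )
```

So with end="1" the term has to be the very first token of the field, and with end="2" it may be the first or second.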

SpanNot

boost (optional, 1.0)
Include - First child element called Include must contain a SpanQuery
Exclude - First child element called Exclude must contain a SpanQuery

BooleanQuery

TermQuery

{!xmlparser}

</Clause>

</BooleanQuery>

{!xmlparser}

</Clause>

</BooleanQuery>

SpanNear

// Headline: new pre/3 york

{!xmlparser}

</SpanNear>

</Clause>

</BooleanQuery>

// Headline: new pre/3 (york or car)

{!xmlparser}

</SpanOr>

</SpanNear>

</Clause>

</BooleanQuery>

// Headline: new pre/3 (york or (car w/3 bart))

// Match: "headline":"New York. Hongkong. Wunsiedel"

{!xmlparser}

<SpanTerm>arrives</SpanTerm>

</SpanNear>

</SpanOr>

</SpanNear>

</Clause>

</BooleanQuery>

// Headline: new pre/3 (daybook or (employee w/3 onboarding))

{!xmlparser}

<SpanTerm>daybook</SpanTerm>

<SpanTerm>employee</SpanTerm>

<SpanTerm>onboarding</SpanTerm>

</SpanNear>

</SpanOr>

</SpanNear>

</Clause>

</BooleanQuery>

DisjunctionMaxQuery

{!xmlparser}

<DisjunctionMaxQuery

tieBreaker="1"

boost="2">

<TermsQuery fieldName="headline">new york times</TermsQuery>

</DisjunctionMaxQuery>

UserQuery

{!xmlparser}

"new computer*"~15

</UserQuery>

ConstantScoreQuery

{!xmlparser}

<UserQuery fieldName="headline">tesla</UserQuery>

</ConstantScoreQuery>

SpanNear

{!xmlparser}

<SpanTerm>computer</SpanTerm>

</SpanNear>

BoostingTermQuery

{!xmlparser}

<BoostingTermQuery

fieldName="headline"

boost="1.2">

tesla

</BoostingTermQuery>

SpanTerm

{!xmlparser}

<SpanTerm

fieldName="headline"

boost="1.2">

tesla

</SpanTerm>

SpanOr

{!xmlparser}

<SpanOr fieldName="headline"

boost="1.2">

<SpanTerm>pizza</SpanTerm>

</SpanOr>

SpanOrTerms

{!xmlparser}

<SpanOrTerms

fieldName="headline"

boost="1.2">

pizza milk

</SpanOrTerms>

SpanFirst

{!xmlparser}

<SpanFirst

fieldName="headline"

end="1"

boost="1.2">

<SpanTerm>tesla</SpanTerm>

</SpanFirst>

SpanNot -- TODO: Redo this--I'm getting some headlines with york in them

{!xmlparser}

</Include>

</Exclude>

</SpanNot>
