Coding Art: May 2016

Tuesday, May 24, 2016

Apache Solr - RangeQuery

RangeQuery appears to be a range query that operates on strings. I'm not sure if it only works on strings or if the strings are a way for it to auto-determine what field type it's going to work with.

If you know how to properly use this, please leave a comment below. I'm stumped on this one.

Fields

fieldName (required in hierarchy here or a parent)
lowerTerm - string value
upperTerm - string value
includeLower (optional, default true)
includeUpper (optional, default true)

Examples

Simply paste the following into the q= field in the Admin UI.

Here's a query that I came up with, but it doesn't work as I'd expect.

{!xmlparser}

<RangeQuery

fieldName="price"

lowerTerm="1.00"

upperTerm="3.00"

includeLower="true"

includeUpper="true">

</RangeQuery>

The above query returns no results, which isn't what I was expecting.

{!xmlparser}

<RangeQuery

fieldName="price"

lowerTerm="0"

upperTerm="10.00"

includeLower="true"

includeUpper="true">

</RangeQuery>

The above query returns 30 documents, some of which have prices > 10.00, which isn't what I expected.
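One possible explanation, offered as an assumption rather than something I have verified against this index: if RangeQuery compares lowerTerm and upperTerm to the indexed terms as strings, the ordering is lexicographic, not numeric. The Python sketch below only illustrates string ordering. in_string_range is a hypothetical helper, not a Solr or Lucene API, and the real outcome also depends on how the price field is indexed.

```python
def in_string_range(value, lower, upper, include_lower=True, include_upper=True):
    """Return True if value lies between lower and upper when compared as strings."""
    above = value >= lower if include_lower else value > lower
    below = value <= upper if include_upper else value < upper
    return above and below


# Strings are compared character by character, so "25.00" sorts between
# "1.00" and "3.00" even though 25 is not between 1 and 3.
prices = ["0.99", "2.50", "10.00", "25.00", "9.99"]
matches = [p for p in prices if in_string_range(p, "1.00", "3.00")]
print(matches)
```

Under the same ordering, "9.99" falls outside the string range "0" to "10.00" because "9" sorts after "1".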

Apache Solr - MatchAllDocsQuery

MatchAllDocsQuery is probably the simplest query there is. All it does is quickly match all documents in an index.

Example

Simply paste the following into the q= field in the Admin UI.

{!xmlparser}

<MatchAllDocsQuery></MatchAllDocsQuery>

Or, using the standard lucene syntax: *:*
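If you would rather send the query over HTTP than paste it into the Admin UI, the q value has to be percent-encoded. Here is a minimal sketch that only builds the URL. The host, port 8983, the gettingstarted collection and the /select handler are assumptions about a default local install, and build_select_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed values: a local Solr on its default port and the gettingstarted collection.
BASE_URL = "http://localhost:8983/solr/gettingstarted/select"


def build_select_url(query, rows=10):
    """Build a /select URL with the query percent-encoded."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return BASE_URL + "?" + urlencode(params)


url = build_select_url("{!xmlparser}<MatchAllDocsQuery></MatchAllDocsQuery>")
print(url)
```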

Apache Solr - LegacyNumericRangeQuery

NumericRangeQuery was renamed to LegacyNumericRangeQuery around Lucene 6.0.0 and marked deprecated. If you're on an older version, you will find it as NumericRangeQuery. From 6.0.0 onward, LegacyNumericRangeQuery is the name to use.

It looks like NumericRangeQuery has been replaced with PointRangeQuery.

Fields

fieldName (required here or in a parent node)
lowerTerm (optional, default null)
upperTerm (optional, default null)
includeLower (optional, default true)
includeUpper (optional, default true)
precisionStep (optional, default 16)
type (optional, default "int") - long | int | double | float

Examples

Simply paste the following into the q= field in the Admin UI.

NOTE: These examples will not work in Solr since Solr does not support Point types.

Lucene 6.0.0+:

{!xmlparser}

<LegacyNumericRangeQuery

fieldName="price"

lowerTerm="0"

upperTerm="10"

includeLower="true"

includeUpper="true"

precisionStep="16"

type="float">

</LegacyNumericRangeQuery>

Before Lucene 6.0.0:

{!xmlparser}

<NumericRangeQuery

fieldName="price"

lowerTerm="0"

upperTerm="10"

includeLower="true"

includeUpper="true"

precisionStep="16"

type="float">

</NumericRangeQuery>

Apache Solr - PointRangeQuery

PointRangeQuery is new as of about Apache Solr 6.0 and is meant to replace the now-deprecated NumericRangeQuery (which was renamed to LegacyNumericRangeQuery).

NOTE: The following may be confusing, and here's why. The PointRangeQueryBuilder, used by XmlQueryParser, requires that lowerTerm and upperTerm are specified, which means that neither of them may default. However, in the code, it covers its bases in case lowerTerm or upperTerm was not specified. In these cases, it defaults to the MIN_VALUE and MAX_VALUE for the respective types. Until the code is updated after 6.0.0, the defaults cannot be reached. If you are reading this after they become optional, the following may be beneficial to you. Until then, you will be required to enter a value for both.

Fields

fieldName (required in hierarchy here or in a parent node)
lowerTerm (required)
upperTerm (required)
type (optional, default "int") - long | int | double | float

long

If lowerTerm is not specified, default to Long.MIN_VALUE (-9223372036854775808)
If upperTerm is not specified, default to Long.MAX_VALUE (9223372036854775807)

int

If lowerTerm is not specified, default to Integer.MIN_VALUE (-2147483648)
If upperTerm is not specified, default to Integer.MAX_VALUE (2147483647)

double

If lowerTerm is not specified, default to Double.NEGATIVE_INFINITY
If upperTerm is not specified, default to Double.POSITIVE_INFINITY

float

If lowerTerm is not specified, default to Float.NEGATIVE_INFINITY
If upperTerm is not specified, default to Float.POSITIVE_INFINITY
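The fallback rules above can be sketched as a small lookup. This is only an illustration of the listed defaults in Python, not the PointRangeQueryBuilder code. DEFAULT_BOUNDS and resolve_bounds are hypothetical names.

```python
import math

# Fallback bounds per type, mirroring the Java constants listed above.
DEFAULT_BOUNDS = {
    "long": (-2**63, 2**63 - 1),      # Long.MIN_VALUE, Long.MAX_VALUE
    "int": (-2**31, 2**31 - 1),       # Integer.MIN_VALUE, Integer.MAX_VALUE
    "double": (-math.inf, math.inf),  # Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY
    "float": (-math.inf, math.inf),   # Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY
}


def resolve_bounds(type_name="int", lower=None, upper=None):
    """Return (lower, upper), substituting the type's default for a missing bound."""
    default_lower, default_upper = DEFAULT_BOUNDS[type_name]
    return (
        default_lower if lower is None else lower,
        default_upper if upper is None else upper,
    )


print(resolve_bounds("long"))
print(resolve_bounds("float", lower=0.0))
```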

Example

Simply paste the following into the q= field in the Admin UI.

NOTE: This example should work, but it doesn't in my Solr 6.0.0 build because Point types aren't supported in Solr yet. I do know that the PointRangeQuery code is running when I submit this query, because it gives me error messages when I don't specify required parameters. However, somewhere in between, something isn't supported and no documents are returned. Please leave a comment if you know more.

{!xmlparser}

<PointRangeQuery

fieldName="price"

lowerTerm="0.00"

upperTerm="100.00"

type="float">

</PointRangeQuery>

Apache Solr - TermsQuery

TermsQuery is essentially a list of TermQuery with a couple of extra options.
It allows you to specify a group of terms on a field and require a minimum number of matches.

Fields

fieldName (required here or in a parent node)
disableCoord (optional, default false)

more about coord-factor in scoring
If disabled, resulting score gets multiplied by 1.0

minimumNumberShouldMatch (optional, default 0)
boost (optional, default 1.0)

more about query-boost in scoring

Value: list of tokens (space-delimited will do it)

Example

Simply paste the following into the q= field in the Admin UI.

{!xmlparser}

<TermsQuery

fieldName="series_t"

disableCoord="true"

boost="1.2"

minimumNumberShouldMatch="2">song ice fire ender black company</TermsQuery>
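To make minimumNumberShouldMatch concrete, here is a rough Python model of the matching rule: a document qualifies when enough of the listed terms occur in the field. It ignores analysis, scoring, boost and coord, and matches_terms_query is a hypothetical helper, not how Lucene implements it.

```python
def matches_terms_query(doc_tokens, query_terms, minimum_should_match=0):
    """Return True if enough distinct query terms occur among the document's tokens."""
    present = set(query_terms) & set(doc_tokens)
    # Even with a minimum of 0, at least one term has to match for a hit.
    required = max(1, minimum_should_match)
    return len(present) >= required


terms = "song ice fire ender black company".split()
print(matches_terms_query(["a", "song", "of", "ice", "and", "fire"], terms, 2))
print(matches_terms_query(["ender", "game"], terms, 2))
```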

Monday, May 23, 2016

Apache Solr - TermQuery

TermQuery is a very simple query that matches documents containing a term.
A term represents a word found in a field. A term has a field property and a text property.

Fields

fieldName (required)

boost (optional, default 1.0)

more about query-boost in scoring

Examples

Simply paste the following into the q= field in the Admin UI.

Query Parser	Syntax
XmlQueryParser	{!xmlparser} <TermQuery fieldName="_text_" boost="1.3">test</TermQuery>
Lucene	_text_:test^1.3
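If the term comes from user input, it is safer to generate the XML than to concatenate strings, because characters such as & and < must be escaped. A small sketch using Python's standard library follows. Both functions are hypothetical helpers, and the Lucene variant does no escaping of query syntax characters.

```python
import xml.etree.ElementTree as ET


def term_query_xml(field, text, boost=1.0):
    """Build the XmlQueryParser form; ElementTree escapes the text for us."""
    elem = ET.Element("TermQuery", {"fieldName": field, "boost": str(boost)})
    elem.text = text
    return "{!xmlparser}" + ET.tostring(elem, encoding="unicode")


def term_query_lucene(field, text, boost=1.0):
    """Build the equivalent standard Lucene syntax (no escaping applied)."""
    return f"{field}:{text}^{boost}"


print(term_query_xml("_text_", "test", 1.3))
print(term_query_lucene("_text_", "test", 1.3))
```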

Apache Solr - XML Query Parser

Introduction

The XML Query Parser (XmlQueryParser) supports a very wide range of Apache Solr
search queries--more than any other query parser that ships with it.
This article attempts to cover the breadth of what it supports as released with Solr 6.0.0.

I will be adding separate articles (and linking to them) for the different types of queries so that
more detail may be devoted to each without overwhelming this main thread.

De-Facto Example

<BooleanQuery fieldName="description">

    <Clause occurs="must">

        <TermQuery>shirt</TermQuery>

    </Clause>

    <Clause occurs="mustnot">

        <TermQuery>plain</TermQuery>

    </Clause>

    <Clause occurs="should">

        <TermQuery>cotton</TermQuery>

    </Clause>

    <Clause occurs="must">

        <BooleanQuery fieldName="size">

            <Clause occurs="should">

                <TermsQuery>S M L</TermsQuery>

            </Clause>

        </BooleanQuery>

    </Clause>

</BooleanQuery>
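To read the example above: a document must contain shirt, must not contain plain, may contain cotton (which only affects the score), and must have a size of S, M or L. The Python sketch below models just the match/no-match logic under simplifying assumptions: no analysis, no scoring, and minimumNumberShouldMatch left at 0. matches and defacto_example are hypothetical helpers.

```python
def matches(doc, must=(), must_not=(), should=()):
    """doc maps field name -> set of tokens; each clause is a (field, term) pair."""
    def hit(clause):
        field, term = clause
        return term in doc.get(field, set())

    if any(hit(c) for c in must_not):
        return False
    if not all(hit(c) for c in must):
        return False
    # should clauses are optional when a must clause exists;
    # without any must clause, at least one should clause has to match.
    if not must:
        return any(hit(c) for c in should)
    return True


def defacto_example(doc):
    """Model of the nested query: outer clauses plus the inner size query."""
    size_ok = matches(doc, should=[("size", s) for s in ("S", "M", "L")])
    outer_ok = matches(
        doc,
        must=[("description", "shirt")],
        must_not=[("description", "plain")],
        should=[("description", "cotton")],
    )
    return outer_ok and size_ok


print(defacto_example({"description": {"cotton", "shirt"}, "size": {"M"}}))
print(defacto_example({"description": {"plain", "shirt"}, "size": {"M"}}))
```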

Difficulties

How do I get highlighting to work?

Top-Level

BooleanQuery

disableCoord (optional, false)
minimumNumberShouldMatch (optional, 0)
boost (optional, 1.0)
Value

Clause

occurs: should | must | mustNot | filter
Value (Note: Many of the following can also have children, explained later)

TermQuery
TermsQuery
MatchAllDocsQuery
BooleanQuery
LegacyNumericRangeQuery (deprecated)
PointRangeQuery
DisjunctionMaxQuery
UserQuery
ConstantScoreQuery
SpanNear
BoostingTermQuery
SpanTerm
SpanOr
SpanOrTerms
SpanFirst
SpanNot

NOTE: Only the first child of a Clause is recognized; any others are silently ignored!

Ignores any other element types at this level--i.e. only Clause is recognized, no exceptions thrown if it finds something else

MatchAllDocsQuery - Matches all documents in an index
TermQuery

TermsQuery

[Legacy]NumericRangeQuery (deprecated in lucene 6.0.0ish)

Not supported as of Solr 6 (solr doesn't support point types yet)

PointRangeQuery (new in 6.0ish)

Not supported as of Solr 6 (solr doesn't support point types yet)

RangeQuery

DisjunctionMaxQuery

tieBreaker (optional, 0.0)
boost (optional, 1.0)
Value

May contain multiple queries of any type of Query defined in this list (i.e. DisjunctionMaxQuery, RangeQuery, …)
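How tieBreaker affects the score can be shown with a little arithmetic: the document gets the best sub-query score plus tieBreaker times the sum of the remaining sub-query scores. The sketch below leaves boost and normalization out, and dismax_score is a hypothetical helper for illustration only.

```python
def dismax_score(sub_scores, tie_breaker=0.0):
    """Best sub-query score plus tie_breaker times the sum of the other scores."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


# tieBreaker 0.0 keeps only the best match; 1.0 behaves like a plain sum.
print(dismax_score([0.5, 0.2], tie_breaker=0.0))
print(dismax_score([0.5, 0.2], tie_breaker=1.0))
```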

UserQuery

fieldName (optional, defaults to defaultField)
Value

Text is passed into QueryParser.parse
This appears to support the classic query syntax

NOTE: Wraps the query into a BoostQuery

ConstantScoreQuery

boost (optional, 1.0)
Value

Only gets the first child
Child may be any query in this list

SpanNear

boost (optional, 1.0)
slop
inOrder (optional, false)
Value

A collection of various types of SpanQuery

BoostingTermQuery

fieldName (required either here or in a parent)
boost (optional, 1.0)
Value: fieldName value

SpanTerm

fieldName (required either here or in a parent)
boost (optional, 1.0)
Value: fieldName value

SpanOr

boost (optional, 1.0)
Value: a collection of various types of SpanQuery

SpanOrTerms

fieldName (required either here or in a parent)
boost (optional, 1.0)
Value: terms commonly separated by a space
Wraps the terms in a SpanOr query

SpanFirst

This limits span matches to the first N (specified by the end parameter below) positions

More specifically, match spans in the subquery whose end position is less than or equal to end.

boost (optional, 1.0)
end (optional, 1, integer)
Value:

Gets the first child, which must be a SpanQuery
All other children are ignored
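The end rule is easier to see with positions written out. In this sketch a single-term span at position p is treated as covering p up to, but not including, p + 1, so its end is p + 1. That is my reading of how span positions work, so treat it as an assumption. span_first_matches is a hypothetical helper, not a Lucene API.

```python
def span_first_matches(tokens, term, end=1):
    """Return start positions of single-term spans whose end position is <= end."""
    # A term at position p spans [p, p + 1), so its end position is p + 1.
    return [p for p, tok in enumerate(tokens) if tok == term and p + 1 <= end]


print(span_first_matches(["tesla", "unveils", "model"], "tesla", end=1))
print(span_first_matches(["new", "tesla", "model"], "tesla", end=1))
print(span_first_matches(["new", "tesla", "model"], "tesla", end=2))
```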

SpanNot

boost (optional, 1.0)
Include - First child element called Include must contain a SpanQuery
Exclude - First child element called Exclude must contain a SpanQuery

BooleanQuery

TermQuery

{!xmlparser}

</Clause>

</BooleanQuery>

{!xmlparser}

</Clause>

</BooleanQuery>

SpanNear

// Headline: new pre/3 york

{!xmlparser}

</SpanNear>

</Clause>

</BooleanQuery>
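As a sketch only, a query in the spirit of "new pre/3 york" could be assembled like this. The field name headline, occurs="must", slop="3" and inOrder="true" are all my assumptions about what such a query would need, and span_near_query is a hypothetical helper. Building the XML with ElementTree at least guarantees that every tag is opened and closed.

```python
import xml.etree.ElementTree as ET


def span_near_query(field, first, second, slop=3, in_order=True):
    """Build BooleanQuery > Clause > SpanNear with two SpanTerm children."""
    root = ET.Element("BooleanQuery", {"fieldName": field})
    clause = ET.SubElement(root, "Clause", {"occurs": "must"})
    near = ET.SubElement(
        clause,
        "SpanNear",
        {"slop": str(slop), "inOrder": str(in_order).lower()},
    )
    for term in (first, second):
        ET.SubElement(near, "SpanTerm").text = term
    return "{!xmlparser}" + ET.tostring(root, encoding="unicode")


print(span_near_query("headline", "new", "york"))
```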

// Headline: new pre/3 (york or car)

{!xmlparser}

</SpanOr>

</SpanNear>

</Clause>

</BooleanQuery>

// Headline: new pre/3 (york or (car w/3 bart))

// Match: "headline":"New York. Hongkong. Wunsiedel"

{!xmlparser}

<SpanTerm>arrives</SpanTerm>

</SpanNear>

</SpanOr>

</SpanNear>

</Clause>

</BooleanQuery>

// Headline: new pre/3 (daybook or (employee w/3 onboarding))

{!xmlparser}

<SpanTerm>daybook</SpanTerm>

<SpanTerm>employee</SpanTerm>

<SpanTerm>onboarding</SpanTerm>

</SpanNear>

</SpanOr>

</SpanNear>

</Clause>

</BooleanQuery>

DisjunctionMaxQuery

{!xmlparser}

<DisjunctionMaxQuery

tieBreaker="1"

boost="2">

<TermsQuery fieldName="headline">new york times</TermsQuery>

</DisjunctionMaxQuery>

UserQuery

{!xmlparser}

"new computer*"~15

</UserQuery>

ConstantScoreQuery

{!xmlparser}

<UserQuery fieldName="headline">tesla</UserQuery>

</ConstantScoreQuery>

SpanNear

{!xmlparser}

<SpanTerm>computer</SpanTerm>

</SpanNear>

BoostingTermQuery

{!xmlparser}

<BoostingTermQuery

fieldName="headline"

boost="1.2">

tesla

</BoostingTermQuery>

SpanTerm

{!xmlparser}

<SpanTerm

fieldName="headline"

boost="1.2">

tesla

</SpanTerm>

SpanOr

{!xmlparser}

<SpanOr fieldName="headline"

boost="1.2">

<SpanTerm>pizza</SpanTerm>

</SpanOr>

SpanOrTerms

{!xmlparser}

<SpanOrTerms

fieldName="headline"

boost="1.2">

pizza milk

</SpanOrTerms>

SpanFirst

{!xmlparser}

<SpanFirst

fieldName="headline"

end="1"

boost="1.2">

<SpanTerm>tesla</SpanTerm>

</SpanFirst>

SpanNot -- TODO: Redo this--I'm getting some headlines with york in them

{!xmlparser}

</Include>

</Exclude>

</SpanNot>

Important Apache Solr 6+ Commands for Windows

Sometimes what we really need is a quick reference to common commands we use on a somewhat-daily basis.

We are a Windows shop and sometimes Windows doesn't receive the same love as the *nix world in Solr.

Note that most of these commands are to be executed from the root directory of your Apache Solr installation (or solr-src\solr directory if you compiled from source). This article will be updated as we get more involved with Apache Solr. Not all of these commands are Windows-specific, but many are.

ZooKeeper

Solr Cloud

One thing that's unique about Solr Cloud is that everything lives in a "cloud," so instead of editing a file on the filesystem, you are expected to use the Config API or to download the config file, change it, and upload it again so that it is distributed throughout the cloud. That process took a while to discover. The procedure is: download the working version of the Solr Cloud solrconfig.xml, put it in source control (optional, but recommended), make your changes, push them back to the cloud by uploading the file to ZooKeeper, then check the file back into source control.

Download the solrconfig.xml for the gettingstarted collection

server\scripts\cloud-scripts\zkcli.bat -cmd getfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983

Upload the solrconfig.xml for the gettingstarted collection

server\scripts\cloud-scripts\zkcli.bat -cmd putfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983
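If you script the download/edit/upload cycle, it helps to build both commands from one place so the paths stay in sync. A minimal sketch that only assembles the argument lists used above, without running anything. zkcli_command is a hypothetical helper.

```python
ZKCLI = r"server\scripts\cloud-scripts\zkcli.bat"


def zkcli_command(action, zk_path, local_file, zkhost="localhost:9983"):
    """Return the argument list for a zkcli getfile or putfile call."""
    if action not in ("getfile", "putfile"):
        raise ValueError("action must be 'getfile' or 'putfile'")
    return [ZKCLI, "-cmd", action, zk_path, local_file, "-zkhost", zkhost]


config_path = "/configs/gettingstarted/solrconfig.xml"
download = zkcli_command("getfile", config_path, "solrconfiglocal.xml")
upload = zkcli_command("putfile", config_path, "solrconfiglocal.xml")
print(" ".join(download))
print(" ".join(upload))
```

Either list could then be handed to subprocess.run from the Solr root directory.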

Apache Solr

Management

Start

Start Solr for the first time to create a cloud collection (called gettingstarted by default). The same command also starts Solr on later runs, after the collection has been created.

(from solr dir)

bin\solr.cmd start -e cloud -noprompt

Stop

bin\solr.cmd stop -all

Index documents (from the file system)

java -Dc=gettingstarted -Dauto=yes -Drecursive=yes -jar example\exampledocs\post.jar example\exampledocs

Search

Use a different query parser

Let's say you want to use the surround query parser, which comes with Solr.

q={!surround}test
