Tuesday, May 24, 2016

Apache Solr - RangeQuery

RangeQuery is a range query that operates on strings.  As far as I can tell, the XML parser turns it into a Lucene TermRangeQuery, which compares terms as strings rather than as numbers, but I'm not sure how it is meant to be used against a numeric field such as price.
If you know how to properly use this, please leave a comment below.  I'm stumped on this one.

Fields

  • fieldName (required here or in a parent node)
  • lowerTerm - string value
  • upperTerm - string value
  • includeLower (optional, default true)
  • includeUpper (optional, default true)

Examples

Simply paste the following into the q= field in the Admin UI.

Here's a query that I came up with, but it doesn't work as I'd expect.

{!xmlparser}
<RangeQuery
 fieldName="price"
 lowerTerm="1.00"
 upperTerm="3.00"
 includeLower="true"
 includeUpper="true">
</RangeQuery>

The above query returns no results, which isn't what I was expecting.


{!xmlparser}
<RangeQuery
 fieldName="price"
 lowerTerm="0"
 upperTerm="10.00"
 includeLower="true"
 includeUpper="true">
</RangeQuery>

The above query returns 30 documents, some of which have prices > 10.00, which isn't what I expected.
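
My best guess, and it is only a guess, is that the bounds are being compared as strings, so the ordering is lexicographic rather than numeric.  The Python sketch below only illustrates that difference.  The helper names and sample prices are made up, and it does not reproduce how Solr actually encodes a numeric price field.

```python
def in_string_range(value, lower, upper):
    """Inclusive range check using plain string (lexicographic) ordering."""
    return lower <= value <= upper


def in_numeric_range(value, lower, upper):
    """Inclusive range check after converting every bound to a float."""
    return float(lower) <= float(value) <= float(upper)


# Hypothetical price values, written the way they might appear as text.
prices = ["2.50", "10.50", "5.00", "25.00"]

for price in prices:
    print(price,
          in_string_range(price, "1.00", "3.00"),
          in_numeric_range(price, "1.00", "3.00"))
```

Under string ordering, "10.50" and "25.00" both fall between "1.00" and "3.00", which is the kind of surprise a string-based range can produce on numbers.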

Apache Solr - MatchAllDocsQuery

MatchAllDocsQuery is probably the simplest query there is.  All it does is quickly match all documents in an index.

Example

Simply paste the following into the q= field in the Admin UI.

{!xmlparser}
<MatchAllDocsQuery></MatchAllDocsQuery>


Or, using the standard Lucene syntax: *:*

Apache Solr - LegacyNumericRangeQuery

NumericRangeQuery was renamed to LegacyNumericRangeQuery and marked deprecated around Lucene 6.0.0. On older versions, use NumericRangeQuery. From 6.0.0 on, use LegacyNumericRangeQuery.

It looks like NumericRangeQuery has been replaced with PointRangeQuery.

Fields

  • fieldName (required here or in a parent node)
  • lowerTerm (optional, default null)
  • upperTerm (optional, default null)
  • includeLower (optional, default true)
  • includeUpper (optional, default true)
  • precisionStep (optional, default 16)
  • type (optional, default "int") - long | int | double | float

Examples

Simply paste the following into the q= field in the Admin UI.
NOTE: These examples will not work in Solr since Solr does not support Point types.

Lucene 6.0.0+:
{!xmlparser}
<LegacyNumericRangeQuery
 fieldName="price"
 lowerTerm="0"
 upperTerm="10"
 includeLower="true"
 includeUpper="true"
 precisionStep="16"
 type="float">
</LegacyNumericRangeQuery>

Before Lucene 6.0.0:
{!xmlparser}
<NumericRangeQuery
 fieldName="price"
 lowerTerm="0"
 upperTerm="10"
 includeLower="true"
 includeUpper="true"
 precisionStep="16"
 type="float">
</NumericRangeQuery>

Apache Solr - PointRangeQuery

PointRangeQuery is new as of about Apache Solr 6.0 and is meant to replace the now-deprecated NumericRangeQuery (which was renamed to LegacyNumericRangeQuery).

NOTE: The following may be confusing, and here's why.  The PointRangeQueryBuilder, used by XmlQueryParser, requires that both lowerTerm and upperTerm are specified, so neither can fall back to a default.  However, the code also handles the case where lowerTerm or upperTerm is missing by defaulting to the smallest and largest values for the respective type (listed below).  As of 6.0.0, those defaults cannot be reached.  If you are reading this after the terms become optional, the defaults below may be useful to you.  Until then, you must enter a value for both.

Fields

  • fieldName (required here or in a parent node)
  • lowerTerm (required)
  • upperTerm (required)
  • type (optional, default "int") - long | int | double | float
    • long
      • If lowerTerm is not specified, default to Long.MIN_VALUE (-9223372036854775808)
      • If upperTerm is not specified, default to Long.MAX_VALUE (9223372036854775807)
    • int
      • If lowerTerm is not specified, default to Integer.MIN_VALUE (-2147483648)
      • If upperTerm is not specified, default to Integer.MAX_VALUE (2147483647)
    • double
      • If lowerTerm is not specified, default to Double.NEGATIVE_INFINITY
      • If upperTerm is not specified, default to Double.POSITIVE_INFINITY
    • float
      • If lowerTerm is not specified, default to Float.NEGATIVE_INFINITY
      • If upperTerm is not specified, default to Float.POSITIVE_INFINITY
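
The defaults listed above can be summarized as a small lookup.  This is a Python sketch of the described behavior only, not the PointRangeQueryBuilder code.  resolve_bounds is a hypothetical helper, and the Java float and double types are both simplified to Python floats.

```python
import math

# Fallback bounds per type, as listed above (Java constant names in comments).
DEFAULT_BOUNDS = {
    "long": (-2**63, 2**63 - 1),      # Long.MIN_VALUE, Long.MAX_VALUE
    "int": (-2**31, 2**31 - 1),       # Integer.MIN_VALUE, Integer.MAX_VALUE
    "double": (-math.inf, math.inf),  # Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY
    "float": (-math.inf, math.inf),   # Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY
}


def resolve_bounds(lower_term=None, upper_term=None, type_name="int"):
    """Return (lower, upper), using the per-type default for a missing term."""
    default_lower, default_upper = DEFAULT_BOUNDS[type_name]
    convert = int if type_name in ("long", "int") else float
    lower = convert(lower_term) if lower_term is not None else default_lower
    upper = convert(upper_term) if upper_term is not None else default_upper
    return lower, upper


print(resolve_bounds("0.00", "100.00", "float"))
print(resolve_bounds(None, "10", "int"))
```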

Example

Simply paste the following into the q= field in the Admin UI.
NOTE: This example should work, but it doesn't in my Solr 6.0.0 build because Point types aren't supported in Solr yet.  I know the PointRangeQuery code runs when I submit this query, because I get error messages when I omit required parameters.  However, something further along isn't supported, and no documents are returned.  Please leave a comment if you know more.

{!xmlparser}
<PointRangeQuery
 fieldName="price"
 lowerTerm="0.00"
 upperTerm="100.00"
 type="float">
</PointRangeQuery>

Apache Solr - TermsQuery

TermsQuery is essentially a list of TermQuery with a couple of extra options.
It allows you to specify a group of terms on a field and require a minimum number of matches.

Fields

  • fieldName (required here or in a parent node)
  • disableCoord (optional, default false)
  • minimumNumberShouldMatch (optional, default 0)
  • boost (optional, default 1.0)
  • Value: list of tokens (space-delimited will do it)

Example

Simply paste the following into the q= field in the Admin UI.
{!xmlparser}
<TermsQuery
 fieldName="series_t"
 disableCoord="true"
 boost="1.2"
 minimumNumberShouldMatch="2">song ice fire ender black company</TermsQuery>
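
As a rough mental model of minimumNumberShouldMatch, the value is split into tokens and a document qualifies when it contains at least that many of them.  The Python sketch below illustrates that idea with made-up documents.  It ignores analysis, scoring, and boost, and the function name is hypothetical.

```python
def matches_terms_query(doc_terms, query_text, minimum_should_match=0):
    """Return True when the document contains enough of the query tokens."""
    tokens = set(query_text.split())
    hits = len(tokens & set(doc_terms))
    # Even with a minimum of 0, at least one token has to match.
    return hits >= max(1, minimum_should_match)


query = "song ice fire ender black company"
print(matches_terms_query(["a", "song", "of", "ice", "and", "fire"], query, 2))
print(matches_terms_query(["the", "black", "tower"], query, 2))
```

The first document contains three of the tokens and passes a minimum of 2.  The second contains only one and does not.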

Monday, May 23, 2016

Apache Solr - TermQuery

TermQuery is a very simple query that matches documents containing a term.
A term represents a word found in a field.  A term has a field property and a text property.

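Fields

  • fieldName (required here or in a parent node)
  • boost (optional, default 1.0)
  • Value: the term to match
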
Examples

Simply paste the following into the q= field in the Admin UI.
XmlQueryParser:
{!xmlparser}
<TermQuery fieldName="_text_" boost="1.3">test</TermQuery>

Lucene:
_text_:test^1.3

Apache Solr - XML Query Parser

Introduction
The XML Query Parser (XmlQueryParser) supports a very wide range of available Apache Solr
search queries--more so than any other query parser that ships with it.
This article surveys the queries it supports as of Solr 6.0.0.
I will be adding separate articles (and linking to them) for the different query types so that
each can get more detail without overwhelming this main article.

De-Facto Example
<BooleanQuery fieldName="description">
    <Clause occurs="must">
        <TermQuery>shirt</TermQuery>
    </Clause>
    <Clause occurs="mustnot">
        <TermQuery>plain</TermQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery>cotton</TermQuery>
    </Clause>
    <Clause occurs="must">
        <BooleanQuery fieldName="size">
            <Clause occurs="should">
                <TermsQuery>S M L</TermsQuery>
            </Clause>
        </BooleanQuery>
    </Clause>
</BooleanQuery>
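
Hand-writing nested XML gets error-prone, so one option is to generate it.  The following Python sketch builds the same structure with the standard library's xml.etree.ElementTree.  The clause() helper is a hypothetical convenience of mine, not part of Solr.

```python
import xml.etree.ElementTree as ET


def clause(parent, occurs):
    """Append a <Clause occurs="..."> element to parent and return it."""
    return ET.SubElement(parent, "Clause", occurs=occurs)


root = ET.Element("BooleanQuery", fieldName="description")
ET.SubElement(clause(root, "must"), "TermQuery").text = "shirt"
ET.SubElement(clause(root, "mustnot"), "TermQuery").text = "plain"
ET.SubElement(clause(root, "should"), "TermQuery").text = "cotton"

# Nested BooleanQuery on a different field, wrapped in its own must clause.
size_query = ET.SubElement(clause(root, "must"), "BooleanQuery", fieldName="size")
ET.SubElement(clause(size_query, "should"), "TermsQuery").text = "S M L"

# Prefix with the local-params selector used in the other examples.
query = "{!xmlparser}" + ET.tostring(root, encoding="unicode")
print(query)
```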


Difficulties
  • How do I get highlighting to work?

Top-Level
  • BooleanQuery
    • disableCoord (optional, false)
    • minimumNumberShouldMatch (optional, 0)
    • boost (optional, 1.0)
    • Value
      • Clause
        • occurs: should | must | mustNot | filter
        • Value (Note: Many of the following can also have children, explained later)
          • TermQuery
          • TermsQuery
          • MatchAllDocsQuery
          • BooleanQuery
          • LegacyNumericRangeQuery (deprecated)
          • PointRangeQuery
          • DisjunctionMaxQuery
          • UserQuery
          • ConstantScoreQuery
          • SpanNear
          • BoostingTermQuery
          • SpanTerm
          • SpanOr
          • SpanOrTerms
          • SpanFirst
          • SpanNot
        • NOTE: Only the first child element of each Clause is recognized--any others are silently ignored!
      • Any other element type at this level is ignored--only Clause is recognized, and no exception is thrown for anything else

  • MatchAllDocsQuery - Matches all documents in an index
  • TermQuery
  • TermsQuery
  • [Legacy]NumericRangeQuery (deprecated in lucene 6.0.0ish)
    • Not supported as of Solr 6 (solr doesn't support point types yet)
  • PointRangeQuery (new in 6.0ish)
    • Not supported as of Solr 6 (solr doesn't support point types yet)
  • RangeQuery
  • DisjunctionMaxQuery
    • tieBreaker (optional, 0.0)
    • boost (optional, 1.0)
    • Value
      • May contain multiple queries of any type of Query defined in this list (i.e. DisjunctionMaxQuery, RangeQuery, …)
  • UserQuery
    • fieldName (optional, defaults to defaultField)
    • Value
      • Text is passed into QueryParser.parse
      • This appears to support the classic query syntax
    • NOTE: Wraps the query into a BoostQuery
  • ConstantScoreQuery
    • boost (optional, 1.0)
    • Value
      • Only gets the first child
      • Child may be any query in this list
  • SpanNear
    • boost (optional, 1.0)
    • slop
    • inOrder (optional, false)
    • Value
      • A collection of various types of SpanQuery
  • BoostingTermQuery
    • fieldName (required either here or in a parent)
    • boost (optional, 1.0)
    • Value: the term to match in fieldName
  • SpanTerm
    • fieldName (required either here or in a parent)
    • boost (optional, 1.0)
    • Value: the term to match in fieldName
  • SpanOr
    • boost (optional, 1.0)
    • Value: a collection of various types of SpanQuery
  • SpanOrTerms
    • fieldName (required either here or in a parent)
    • boost (optional, 1.0)
    • Value: terms commonly separated by a space
    • Wraps the terms in a SpanOr query
  • SpanFirst
    • This limits span matches to the first N (specified by the end parameter below) positions
      • More specifically, match spans in the subquery whose end position is less than or equal to end.
    • boost (optional, 1.0)
    • end (optional, 1, integer)
    • Value:
      • Gets the first child, which must be a SpanQuery
      • All other children are ignored
  • SpanNot
    • boost (optional, 1.0)
    • Include - First child element called Include must contain a SpanQuery
    • Exclude - First child element called Exclude must contain a SpanQuery



BooleanQuery
TermQuery
{!xmlparser}
<BooleanQuery fieldName="headline">
  <Clause occurs="must">
    <TermQuery>york</TermQuery>
  </Clause> 
</BooleanQuery>

{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <TermQuery fieldName="headline">york</TermQuery>
  </Clause>
</BooleanQuery>

SpanNear
// Headline: new pre/3 york
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanTerm>york</SpanTerm>
    </SpanNear>
  </Clause>
</BooleanQuery>

// Headline: new pre/3 (york or car)
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanOr>
        <SpanTerm>york</SpanTerm>
        <SpanTerm>car</SpanTerm>
      </SpanOr>
    </SpanNear>
  </Clause>
</BooleanQuery>

// Headline: new pre/3 (york or (car w/3 arrives))
// Match: "headline":"New York. Hongkong. Wunsiedel"
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanOr>
        <SpanTerm>york</SpanTerm>
        <SpanNear slop="3" inOrder="false">
          <SpanTerm>car</SpanTerm>
          <SpanTerm>arrives</SpanTerm>
        </SpanNear>
      </SpanOr>
    </SpanNear>
  </Clause>
</BooleanQuery>

// Headline: new pre/3 (daybook or (employee w/3 onboarding))
{!xmlparser}
<BooleanQuery>
  <Clause occurs="must">
    <SpanNear fieldName="headline" slop="3" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanOr>
        <SpanTerm>daybook</SpanTerm>
        <SpanNear slop="3" inOrder="false">
          <SpanTerm>employee</SpanTerm>
          <SpanTerm>onboarding</SpanTerm>
        </SpanNear>
      </SpanOr>
    </SpanNear>
  </Clause>
</BooleanQuery>

DisjunctionMaxQuery
{!xmlparser}
<DisjunctionMaxQuery
 tieBreaker="1"
 boost="2">
    <UserQuery fieldName="headline">uber</UserQuery>
    <TermsQuery fieldName="headline">new york times</TermsQuery>
</DisjunctionMaxQuery>
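
For what tieBreaker does to the score: as I understand Lucene's DisjunctionMaxQuery, a document gets the score of its best-matching sub-query, plus tieBreaker times the scores of the other matching sub-queries.  The Python sketch below shows only that arithmetic.  The sub-scores are made-up numbers, and treating boost as a plain multiplier is my simplification.

```python
def dismax_score(sub_scores, tie_breaker=0.0, boost=1.0):
    """Combine sub-query scores: best score plus tie_breaker times the rest."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return boost * (best + tie_breaker * others)


# Hypothetical scores from two matching sub-queries.
print(dismax_score([2.0, 1.0]))                              # only the best counts
print(dismax_score([2.0, 1.0], tie_breaker=1.0, boost=2.0))  # behaves like a boosted sum
```

With tieBreaker="1", as in the example above, every matching sub-query contributes fully, so the result is effectively a sum rather than a max.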

UserQuery
{!xmlparser}
<UserQuery fieldName="headline">
"new computer*"~15
</UserQuery>

ConstantScoreQuery
{!xmlparser}
<ConstantScoreQuery boost="1.0">
    <UserQuery fieldName="headline">tesla</UserQuery>
</ConstantScoreQuery>

SpanNear
{!xmlparser}
<SpanNear fieldName="headline" slop="3" inOrder="true">
  <SpanTerm>new</SpanTerm>
  <SpanTerm>computer</SpanTerm>
</SpanNear>
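
To make slop and inOrder more concrete, here is a simplified Python model for two single-token terms.  It assumes slop counts the positions between the two terms and that inOrder requires the first term to come first.  The function and the sample headline are hypothetical, and real SpanNear handles nested spans that this sketch does not.

```python
def span_near_two_terms(tokens, first, second, slop, in_order):
    """Simplified SpanNear for two single-token terms.

    Matches when at most `slop` positions lie between the two terms;
    with in_order=True, `first` must also appear before `second`.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    for a in first_positions:
        for b in second_positions:
            if a == b:
                continue
            if in_order and a > b:
                continue
            if abs(b - a) - 1 <= slop:
                return True
    return False


headline = "new laptop and desktop computer".split()
print(span_near_two_terms(headline, "new", "computer", slop=3, in_order=True))
print(span_near_two_terms(headline, "computer", "new", slop=3, in_order=True))
print(span_near_two_terms(headline, "computer", "new", slop=3, in_order=False))
```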

BoostingTermQuery
{!xmlparser}
<BoostingTermQuery
  fieldName="headline"
  boost="1.2">
tesla
</BoostingTermQuery>

SpanTerm
{!xmlparser}
<SpanTerm
  fieldName="headline"
  boost="1.2">
tesla
</SpanTerm>

SpanOr
{!xmlparser}
<SpanOr fieldName="headline"
  boost="1.2">
  <SpanTerm>pizza</SpanTerm>
  <SpanTerm>milk</SpanTerm>
</SpanOr>

SpanOrTerms
{!xmlparser}
<SpanOrTerms
  fieldName="headline"
  boost="1.2">
pizza milk
</SpanOrTerms>

SpanFirst
{!xmlparser}
<SpanFirst
  fieldName="headline"
  end="1"
  boost="1.2">
  <SpanTerm>tesla</SpanTerm>
</SpanFirst>
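
With end="1", this should only match headlines whose first token is tesla.  The Python sketch below models that rule for a single-token span, assuming a span at position p ends at p + 1.  The function name and headlines are made up.

```python
def span_first(tokens, term, end):
    """Return True if a single-token span for term ends at or before `end`."""
    for position, token in enumerate(tokens):
        span_end = position + 1  # a one-token span covers [position, position + 1)
        if token == term and span_end <= end:
            return True
    return False


print(span_first("tesla unveils new model".split(), "tesla", end=1))
print(span_first("new tesla model".split(), "tesla", end=1))
print(span_first("new tesla model".split(), "tesla", end=2))
```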

SpanNot -- TODO: Redo this--I'm getting some headlines with york in them
{!xmlparser}
<SpanNot fieldName="headline">
  <Include>
    <SpanTerm>new</SpanTerm>
  </Include>
  <Exclude>
    <SpanTerm>york</SpanTerm>
  </Exclude>
</SpanNot>
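
A possible reason headlines containing york still come back: as far as I understand Lucene's SpanNotQuery, it only drops Include spans that overlap an Exclude span.  It does not drop whole documents that contain the excluded term.  The Python sketch below models that with single-token spans.  The helper names and the headline are hypothetical.

```python
def span_not(include_spans, exclude_spans):
    """Keep include spans that do not overlap any exclude span.

    Spans are (start, end) token positions with an exclusive end.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    return [inc for inc in include_spans
            if not any(overlaps(inc, exc) for exc in exclude_spans)]


def term_spans(tokens, term):
    """Single-token spans for every occurrence of term."""
    return [(i, i + 1) for i, t in enumerate(tokens) if t == term]


tokens = "new york opens new museum".split()
kept = span_not(term_spans(tokens, "new"), term_spans(tokens, "york"))
print(kept)
```

In this model the span for new never overlaps the span for york, because they sit at different positions, so nothing is excluded and the headline still matches.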

Important Apache Solr 6+ Commands for Windows

Sometimes what we really need is a quick reference to common commands we use on a somewhat-daily basis.
We are a Windows shop and sometimes Windows doesn't receive the same love as the *nix world in Solr.

Note that most of these commands are to be executed from the root directory of your Apache Solr installation (or solr-src\solr directory if you compiled from source).  This article will be updated as we get more involved with Apache Solr.  Not all of these commands are Windows-specific, but many are.

ZooKeeper

Solr Cloud

One thing that's unique about Solr Cloud is that everything is in a "cloud," so instead of editing a file on the filesystem, you are expected to use the Config API or to download the config file, edit it, and upload it again so it is distributed throughout the cloud.  That process took a while to discover.  The procedure is to download the working version of the Solr Cloud solrconfig.xml, put it in source control (optional, but recommended), make your changes, push those changes back to the cloud by uploading the file to ZooKeeper, and check the file back into source control.


Download the solrconfig.xml for the gettingstarted collection
server\scripts\cloud-scripts\zkcli.bat -cmd getfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983


Upload the solrconfig.xml for the gettingstarted collection
server\scripts\cloud-scripts\zkcli.bat -cmd putfile /configs/gettingstarted/solrconfig.xml solrconfiglocal.xml -zkhost localhost:9983

Apache Solr

Management

Start
The first run creates a cloud collection (named gettingstarted by default).  Use the same command to start Solr on later runs.
(from solr dir)
bin\solr.cmd start -e cloud -noprompt

Stop
bin\solr.cmd stop -all

Index documents (from the file system)
java -Dc=gettingstarted -Dauto=yes -Drecursive=yes -jar example\exampledocs\post.jar example\exampledocs

Search

Use a different query parser
Let's say you want to use the surround query parser, which comes with Solr.
q={!surround}test
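
If you are sending the query from code rather than the Admin UI, the {!...} prefix has to be URL-encoded along with the rest of q.  Here is a minimal Python sketch that builds such a URL.  The host, port 8983, the /select path, and the wt=json parameter are assumptions about a default local install, and build_select_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed for illustration: a local Solr on port 8983 and the
# gettingstarted collection; adjust both for your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/gettingstarted/select"


def build_select_url(parser, query, base_url=SOLR_SELECT_URL):
    """Return a select URL whose q parameter picks a parser via {!name}."""
    params = {"q": "{!" + parser + "}" + query, "wt": "json"}
    return base_url + "?" + urlencode(params)


print(build_select_url("surround", "test"))
```

The same helper works for the XML examples by passing "xmlparser" as the parser name and the XML as the query.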