Introduction to Apache Solr 4.0 with Apache Tomcat

In this post we introduce Solr 4.0 and its features, and look at why Solr is a useful search server for full-text search. Solr is a configuration-driven implementation of full-text search built on top of the Lucene libraries. Let's take a look at how the two relate.

Lucene was written by Doug Cutting in 1999 and became an Apache Software Foundation open source project in 2001. Lucene is a Java-based search library with a stable, mature API that has been improved steadily for more than 10 years. It provides fast, lightweight indexing and querying for full-text search, and it can be embedded in enterprise applications easily and effectively.

Solr was created by Yonik Seeley and became an Apache open source project in 2006. It provides a fully capable search engine on top of the Lucene search libraries by adding the features of a full enterprise search server. Solr exposes Lucene's programming libraries as a managed, easily configured service. Solr runs in a Java servlet container such as Apache Tomcat and is deployed as a standalone Java WAR file.

Features of Solr 
1 - Advanced full text search - Solr makes it easy to search text within documents or within database records. For a database, this is achieved by importing the data and indexing it with a variety of field types.

2 - Standard open interfaces - Search results can be returned from Solr in a variety of formats, including XML/XSLT, JSON, Python, Ruby, PHP, Velocity, CSV and binary.
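
As a rough sketch of how the response format is chosen, the snippet below builds /select URLs where the `wt` parameter picks the response writer. The host, port and core name (`collection1`) are assumptions for a local Tomcat setup, and `player_name` is the field from the schema example later in this post.

```python
from urllib.parse import urlencode

# Assumed base URL: Solr running in Tomcat on port 8080 with a core named "collection1".
SOLR_SELECT = "http://localhost:8080/solr/collection1/select"

def select_url(query, writer="json", rows=10):
    """Build a /select URL; `wt` names the response writer (json, xml, csv, ...)."""
    params = {"q": query, "wt": writer, "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)

json_url = select_url("player_name:smith")
csv_url = select_url("player_name:smith", writer="csv")
print(json_url)
print(csv_url)
```

The same query is sent in both cases; only the `wt` value changes, so a client can switch formats without touching the index.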

3 - HTML-based administrator dashboard - Solr provides an easy-to-use HTML interface for controlling most of its features, including a UI-based data importer, filter analyser and query builder. The admin dashboard also provides comprehensive statistics on cache utilization, updates, the index and queries, along with logging control.


4 - Auto indexing - Using the Data Import Handler, data can be indexed automatically. Solr drives this from a minimal set of XML configuration files such as schema.xml and solrconfig.xml.
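
To give an idea of how an import is triggered, here is a minimal sketch that builds Data Import Handler command URLs. It assumes the handler is registered at `/dataimport` in solrconfig.xml and that Solr runs on a local Tomcat with a core named `collection1`; adjust all three to your own setup.

```python
from urllib.parse import urlencode

# Assumptions: Tomcat on localhost:8080, a core named "collection1", and the
# Data Import Handler registered under the path /dataimport.
DIH_URL = "http://localhost:8080/solr/collection1/dataimport"

def dih_command_url(command, **options):
    """Build a Data Import Handler URL for a command such as full-import or status."""
    params = {"command": command}
    params.update(options)
    return DIH_URL + "?" + urlencode(params)

# Re-index everything, clearing the old index first and committing at the end.
full_import = dih_command_url("full-import", clean="true", commit="true")
# Ask the handler how the current or last import went.
status = dih_command_url("status")
print(full_import)
print(status)
```

Opening the first URL in a browser (or from a scheduled job) starts the import; the second one reports its progress.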


5 - Strong faceting - Search results can be filtered using Solr's faceting capabilities. The fields to facet on are declared in schema.xml (Solr also makes it very easy to define dynamic fields on the fly), and the facets themselves are requested at query time. Facet counts can be computed from unique field values, explicit queries, date ranges, numeric ranges or pivots.
<field name="player_name" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="player_age" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="player_description" type="text_general" indexed="true" stored="true" multiValued="true"/>
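
With fields like these in place, a facet request could look roughly like the sketch below. It builds the query string for a field facet on `player_name` and converts the flat `[value, count, value, count, ...]` list that Solr's JSON writer returns by default into a dictionary. The sample response is made up for illustration.

```python
from urllib.parse import urlencode

def facet_params(query, field, mincount=1):
    """Query-string parameters for a field facet; rows=0 skips the documents themselves."""
    return urlencode({
        "q": query,
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": mincount,
    })

def pair_up(flat):
    """Turn a flat [value, count, value, count, ...] list into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))

qs = facet_params("*:*", "player_name")

# Made-up fragment shaped like the facet_counts section of a wt=json response.
sample = {"facet_counts": {"facet_fields": {"player_name": ["smith", 4, "jones", 2]}}}
counts = pair_up(sample["facet_counts"]["facet_fields"]["player_name"])
print(qs)
print(counts)
```

Note that faceting on a tokenized type such as text_general counts individual terms, so a string-typed copy of the field is often a better facet target.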

6 - Text highlighting - Solr can highlight matching context snippets. It wraps the search keywords in user-defined HTML tags so that they can be identified and highlighted in the search results.
<!-- Highlighting defaults, set on a request handler in solrconfig.xml -->
<str name="hl">on</str>
<str name="hl.fl">content features title name</str>
<str name="hl.encoder">html</str>
<!-- Tags wrapped around each matched keyword -->
<str name="hl.simple.pre">&lt;b&gt;</str>
<str name="hl.simple.post">&lt;/b&gt;</str>
<!-- Per-field overrides: fragsize 0 highlights the whole field value -->
<str name="f.title.hl.fragsize">0</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
<str name="f.content.hl.snippets">3</str>
<str name="f.content.hl.fragsize">200</str>
<str name="f.content.hl.alternateField">content</str>
<str name="f.content.hl.maxAlternateFieldLength">750</str>
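
The same highlighting parameters can also be passed per request instead of being fixed in the configuration. The sketch below builds such a query string and decodes it again to show what Solr would receive; the query text and field list are only examples.

```python
from urllib.parse import urlencode, parse_qs

def highlight_query(query, fields, pre="<b>", post="</b>"):
    """Build a query string that turns highlighting on for the given fields."""
    params = {
        "q": query,
        "wt": "json",
        "hl": "true",
        "hl.fl": ",".join(fields),   # fields to highlight
        "hl.simple.pre": pre,        # tag inserted before each match
        "hl.simple.post": post,      # tag inserted after each match
    }
    return urlencode(params)

qs = highlight_query("content:solr", ["content", "title"])
decoded = parse_qs(qs)
print(qs)
print(decoded["hl.simple.pre"], decoded["hl.simple.post"])
```

In the response, the highlighted snippets come back in a separate `highlighting` section keyed by document id, next to the normal result list.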

7 - Spell checking - Solr provides a spell check component that adds full-featured spelling suggestions for user queries.

8 - Multi-language support

9 - Near real-time updates - Solr 4.0 supports near real-time search: through soft commits, newly indexed documents become searchable almost immediately after they are added.

10 - RESTful implementation - Solr is deployed on the server as a standalone WAR file, and queries and their responses travel over a REST-like HTTP interface.
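
To illustrate the HTTP interface and the near real-time updates together, here is a minimal sketch that prepares (but does not send) an update request. The URL, the core name and the `id` field are assumptions; `commitWithin` asks Solr to make the documents searchable within the given number of milliseconds.

```python
import json
from urllib.request import Request

# Assumed update endpoint for a local Tomcat deployment with a core named "collection1".
UPDATE_URL = "http://localhost:8080/solr/collection1/update"

def build_update(docs, commit_within_ms=1000):
    """Prepare an HTTP POST that adds `docs` (a list of dicts) as JSON."""
    body = json.dumps(docs).encode("utf-8")
    url = f"{UPDATE_URL}?commitWithin={commit_within_ms}"
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

# "id" is a hypothetical unique key; player_name comes from the schema example above.
req = build_update([{"id": "p1", "player_name": "smith"}])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

Because everything is plain HTTP, the same request could be made from any language or even from the command line.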

Solr has many more features, including date math (specifying dates relative to "NOW" in queries and updates), numeric field statistics such as min, max, average and standard deviation, auto-suggest functionality for completing user queries, a simple join capability between two document types, and the ability to create and delete document collections dynamically without restarting.
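
As a small taste of two of these, the sketch below combines a date math filter with numeric field statistics. The field names `signed_on` (a date field) and `goals` (a numeric field) are hypothetical and would need to exist in schema.xml with matching types.

```python
from urllib.parse import urlencode, parse_qs

def recent_stats_query(date_field, numeric_field, days=7):
    """Restrict results to the last `days` days and request min/max/mean etc. for a field."""
    params = [
        ("q", "*:*"),
        # Date math: round NOW down to the start of the day, then go back `days` days.
        ("fq", f"{date_field}:[NOW/DAY-{days}DAYS TO NOW]"),
        ("stats", "true"),
        ("stats.field", numeric_field),
        ("rows", 0),
    ]
    return urlencode(params)

# Both field names are placeholders for this example.
qs = recent_stats_query("signed_on", "goals")
decoded = parse_qs(qs)
print(decoded["fq"][0])
```

Solr evaluates the `NOW` expression on the server at query time, so the client never has to compute the dates itself.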

In upcoming posts we will walk through setting up Solr 4.0 with Apache Tomcat and implementing its most useful features in an application.