How to use the Solr Data Import Handler to index a MySQL database table

In this article we will go through Importing and indexing My Sql database table data in solr4 using Data Import Handler. In our previous discussions we went through the Integration of solr4 with tomcat, now its time to provide solr4 some data on which the search will be done. In solr4 database table data can be imported and indexed using a number of ways. The most easiest and effective way of doing this is using solr's Data Import Handler. Data Import Handler is not there in solr4 package directly so we need to add a handler entry in solrconfig.xml to let solr4 know about our database configuration.

If you are new to solr4, I recommend you to go through following discussion before we start.

Introduction to Apache Solr 4.0 with Apache Tomcat

In this particular blog i am assuming that you have a basic knowledge of How Solr 4 Works. If you have configured your solr setup correctly than you will be able to see solr dashboard something like this screen.

In case you are not able to see solr dashboard than go through Apache Solr 4.0 with Apache Tomcat 7 in Ubuntu Linux.

Now to import and index data in solr4 you need to follow two step process and everything will be done.

1 Configuring data source

Our first step is to configure data source to do this thing just add a configuration file to your solr-home like this /home/javaguy/solr-home/collection1/conf/data-config.xml

<dataConfig><dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/indian_team" user="root" password="root" batchSize="1" />    <document name="players">        <entity name="players_data" query="SELECT * FROM players_data">            <field column="player_id" name="id" />            <field column="player_name" name="player_name" /><field column="player_age" name="player_age" /><field column="player_description" name="player_description" />        </entity>    </document></dataConfig>
This file tells solr about your database credential and the fields of table that you want to import in order to enable search in solr. Document name and entity name can be given any value you want , in field columns add your table columns that you want to be imported as solr indexed data. column attribute represents the column name in your table and name attribute value represents related data fields value in schema.xml.

2 . Adding data import handler in solr-config.xml

Our next and final step is to add a data import handler entry in solr-config.xml to make solr aware of solr-config.xml and to point data source that is going to be imported. This can be done by adding following code in your solr-config.xml like this.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">   <lst name="defaults">  <str name="config">data-config.xml</str></lst> </requestHandler>

Make sure that you have added this handler entry at right place in your config, the best way is to add this just after any other handler entry.

Here you are all done with your data import configuration, now restart the server and go to Dataimport under collection 1, choose full-import from Command drop down, select your entity under Entity drop down and click  Execute Import.  If your import is successful you will see a green entry saying Indexing Completed as shown in the figure below.
By default solr indexed 0 to10 rows from database if you want to index more or less tables than change Start,Rows value before executing import.
We are all done

Here we are all done with importing and indexing data in solr. To check if your data is indexed correctly just go to Query tab under collection1 and execute query with default values and you will see something your indexed data in xml form like this screen.

This is all about importing and indexing data in solr using Data Import Handler. In upcoming blogs we will learn more tips and tricks about solr4 and full text searching.