Elastic Stack with SQL Server

IMHO simplest way to populate Elasticsearch with data from DMBS, assuming there is a driver for JDBC, is to use Logstash. Of course it's nowhere near to proper sync that should handle  not only inserts and updates but deletes as well but in most cases this will be enough for prototyping, testing or learning concepts.

These instructions are for Microsoft SQL Server but same can be done easily with other DBMSes like MySQL or PostgreSQL. I'm assuming Elasticsearch is already installed somewhere. If it's listening on localhost head to /etc/elasticsearch/elasticsearch.yml and add:

network.host: _site_
discovery.type: single-node

Restart elastic instance. Now lets install Logstash

curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update
apt install logstash

To be able to talk with SQL Server we will need to use appropriate JDBC driver. Most recent version available at time of writing is 9.2 and supports SQL Server versions starting from 2012 and newer. Logstash is currently packaged with Java 11.

mkdir -p /opt/sqljdbc_9.2/enu/
wget https://github.com/microsoft/mssql-jdbc/releases/download/v9.2.1/mssql-jdbc-9.2.1.jre11.jar -P /opt/sqljdbc_9.2/enu/

Logstash allows to select column that will be used to sync from SQL Server only inserted/updated records, in this case it's "updated_at". By default this value is stored in ~/.logstash_jdbc_last_run. Location of this file can be changed using last_run_metadata_path variable - I like to have all things related to particular service stored in one place so lets put it in /var/lib/logstash/jdbc_last_runs/ directory (should be taken from path.data)

mkdir /var/lib/logstash/jdbc_last_runs

Here's sample config file:

input {
    jdbc {
        jdbc_connection_string => "jdbc:sqlserver://192.168.1.124\MSSQLSERVER;database=Test;user=sa;password=Passwd123"
        jdbc_user => nil
        jdbc_driver_library => "/opt/sqljdbc_9.2/enu/mssql-jdbc-9.2.1.jre11.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"

        statement => "SELECT * FROM dbo.sampletable WHERE updated_at >= :sql_last_value"
        # statement_filepath
        last_run_metadata_path => "/var/lib/logstash/jdbc_last_runs/sampletable"
        use_column_value => true
        tracking_column => "@timestamp"
        shedule => "0 * * * *"
    }
}

output {
    elasticsearch {
        hosts => ["192.168.1.121:9200"]
        index => "test"
    }
    stdout { codec => rubydebug }
}
/etc/logstash/conf.d/sampletable.conf

If there is no id column in our table we should add it to update rows that we have already fetched instead of inserting them again. Something like "SELECT [table id] as id, * FROM..." should do the job.  Now lets test our settings doing initial import:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/sampletable.conf

If there were no errors we should have our data in Elasticsearch.

Useful links

https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server
https://docs.microsoft.com/en-us/sql/connect/jdbc/microsoft-jdbc-driver-for-sql-server-support-matrix