Big Data Analytics — Getting Started With Elasticsearch

March 31, 2019

The Elastic Stack has recently risen to fame in the realm of Big Data analytics and machine learning. It is a suite of tools (Elasticsearch, Logstash, Kibana, and Beats) for analyzing large quantities of data in real time. In this article, we’ll briefly cover Elasticsearch.

Elasticsearch is an open source, distributed, RESTful search engine. There’s a lot of information in that short sentence so let’s break it down.

  • Open source — The source code for Elasticsearch is on GitHub and you can contribute if you’d like. It’s worth mentioning that the company Elastic has built a business around Elasticsearch and the rest of the Elastic Stack.
  • Distributed — Elasticsearch is designed to horizontally scale using node clusters. In other words, it can run on top of multiple computer systems.
  • RESTful — REST is an architectural style for APIs in which actions are performed by making HTTP requests (GET, POST, PUT, DELETE) against resource endpoints.
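To make the REST pattern concrete, here is a generic sketch of how HTTP verbs map to actions (the host and resource here are hypothetical, not Elasticsearch endpoints):

```shell
# Hypothetical REST API: each HTTP verb performs a different action on a resource.
curl -X GET    http://api.example.com/books/1                          # read book 1
curl -X POST   http://api.example.com/books   -d '{"title": "Dune"}'   # create a new book
curl -X PUT    http://api.example.com/books/1 -d '{"title": "Dune"}'   # replace book 1
curl -X DELETE http://api.example.com/books/1                          # delete book 1
```

Elasticsearch follows this same pattern, as we’ll see below when we store and query documents.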

Elasticsearch allows you to store, search, and analyze big volumes of data in near real time. You might see Elasticsearch used for things like a web store that allows their customers to search for products, a business that wants to analyze and visualize consumer trends or a company that wants to aggregate, parse and perform queries on a set of logs.

Java

Elasticsearch runs on top of the JVM. Ergo, we need to have Java installed prior to installing Elasticsearch. You can verify if Java is installed by running java -version. In the event it isn’t already installed, you can install it by running the following command.

sudo apt-get install default-jdk

Next, we need to make sure that the JAVA_HOME environment variable is set.

echo $JAVA_HOME

If nothing comes back, then you’ll want to add the following line to your environment, using sudo vi /etc/environment.

JAVA_HOME="/usr/lib/jvm/<java-version>"

To load the new variable into your current session, run source /etc/environment.
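If you’re not sure which path to use for JAVA_HOME, one way to discover it is to resolve the java binary’s real location (a sketch that assumes a typical Debian layout, where the JVM lives under /usr/lib/jvm/):

```shell
# Resolve the symlink chain behind the `java` binary, then strip the
# trailing /bin/java to recover the JVM home directory.
readlink -f "$(which java)" | sed 's|/bin/java$||'
```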

Elasticsearch

To start, download and install the public signing key.

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

You may need to install the apt-transport-https package on Debian before proceeding.

sudo apt-get install apt-transport-https

The following line adds Elastic’s Debian package repository.

echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

Finally, pull down and install the package.

sudo apt-get update && sudo apt-get install elasticsearch

Elasticsearch isn’t automatically started after installation. You can start it by running the following command.

sudo systemctl start elasticsearch.service

To configure Elasticsearch to start automatically when the system boots up, you can run the following commands.

sudo systemctl daemon-reload  
sudo systemctl enable elasticsearch.service

By default, Elasticsearch runs on port 9200. We can verify that it’s working by running curl localhost:9200 | jq '.' (if jq isn’t installed, sudo apt-get install jq will add it). If everything is working, you should see output similar to the following.

{  
  "name" : "zP41Q2p",  
  "cluster_name" : "elasticsearch",  
  "cluster_uuid" : "z8FCPGNXTZymP8hmcas-YQ",  
  "version" : {  
    "number" : "6.7.0",  
    "build_flavor" : "default",  
    "build_type" : "deb",  
    "build_hash" : "8453f77",  
    "build_date" : "2019-03-21T15:32:29.844721Z",  
    "build_snapshot" : false,  
    "lucene_version" : "7.7.0",  
    "minimum_wire_compatibility_version" : "5.6.0",  
    "minimum_index_compatibility_version" : "5.0.0"  
  },  
  "tagline" : "You Know, for Search"  
}
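The root endpoint confirms the node is up; for a summary of cluster state, Elasticsearch also exposes a _cluster/health endpoint (shown here against the default localhost:9200 install from above):

```shell
# "status" will be green, yellow, or red. On a single-node install, yellow is
# normal, since replica shards have no second node to be assigned to.
curl -s localhost:9200/_cluster/health | jq '.'
```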

Next, let’s create a data.json file with the following content.

{  
 "firstname": "John",  
 "lastname": "Doe"  
}

To add data, we make a POST request to <host>/<index>/<type>/<id>.

curl -d "@data.json" -H "Content-Type: application/json" -X POST localhost:9200/accounts/person/1 | jq '.'

We can verify that it was successful by making a GET request to the same endpoint.

curl localhost:9200/accounts/person/1 | jq '.'

We can update a document by making a POST request to the _update endpoint.

curl -d '{"doc":{"age": 42}}' -H "Content-Type: application/json" -X POST localhost:9200/accounts/person/1/_update | jq '.'

We can verify that it was successful by making a GET request to the same endpoint.

curl localhost:9200/accounts/person/1 | jq '.'

Let’s create a data2.json file with the following content.

{  
 "firstname": "Jane",  
 "lastname": "Smith",  
 "age": 28  
}

Run the following command to add it to the store.

curl -d "@data2.json" -H "Content-Type: application/json" -X POST localhost:9200/accounts/person/2 | jq '.'

We can also search for data using query strings.

curl 'localhost:9200/_search?q=john' | jq '.'

The following command returns all documents with an age field equal to 42.

curl 'localhost:9200/_search?q=age:42' | jq '.'
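Query strings are handy for quick lookups, but Elasticsearch’s main search interface is its JSON query DSL, sent in the request body. Here is the same age search expressed as a match query against the accounts index created above (a sketch of the DSL form, equivalent in effect to the query string version):

```shell
# Search the accounts index with the query DSL instead of a query string.
curl -s -H "Content-Type: application/json" \
  -d '{"query": {"match": {"age": 42}}}' \
  localhost:9200/accounts/_search | jq '.'
```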

We can delete a specific document by making a DELETE request.

curl -X DELETE localhost:9200/accounts/person/1 | jq '.'

Finally, we can delete the full index.

curl -X DELETE localhost:9200/accounts | jq '.'

Cory Maklin