In this article we will talk about the basic concepts of Elasticsearch. I recommend that you read the following articles to install Elasticsearch on CentOS 7 and to create an index in Elasticsearch through a sample scenario.
Elasticsearch is a search engine in which data can be both stored and searched.
The data is stored as JSON documents and can be searched very quickly. In the background, Elasticsearch uses Apache Lucene, a search library written in Java.
For text search, it typically performs much better than the text search features of database management systems.
An index corresponds to the database concept in database management systems in versions before 6.0. Starting with version 6.0, an index can contain only one type (the counterpart of a table in a DBMS), and later versions drop types altogether. As a result, we can say that from version 6.0 on an index plays the role of both the database and the table.
A type corresponds to the table concept in database management systems. Before version 6.0, an index could contain several types, and when we wanted to add a record to an index, we had to specify the type name. From version 6.0, an index can contain only one type, for which the name "_doc" is recommended. From version 7.0, we no longer have to specify the type during the insert operation, and types were removed completely in version 8.0.
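To make the difference between the older typed requests and the newer typeless requests more concrete, here is a minimal sketch that only builds the request path and body and does not contact a cluster. The helper `document_path`, the index name `articles`, the type name `article`, and the document fields are all hypothetical names chosen for illustration.

```python
import json


def document_path(index, doc_id, doc_type=None):
    """Return the REST path used to index a single document.

    With doc_type=None the typeless form (/<index>/_doc/<id>) is returned;
    passing a type name gives the older form that required a type in the URL.
    """
    if doc_type is None:
        return f"/{index}/_doc/{doc_id}"
    return f"/{index}/{doc_type}/{doc_id}"


# Hypothetical document; the field names are illustrative only.
body = json.dumps({"title": "Basic concepts", "views": 10})

print("PUT", document_path("articles", 1, doc_type="article"))  # older, typed form
print("PUT", document_path("articles", 1))                      # typeless form
print(body)
```

In both cases the JSON body is the same; only the path changes.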
The data is divided into parts called shards, which are distributed across multiple servers. In this way, more I/O performance can be achieved by using the resources of more servers.
Multiple copies of the data are stored as replicas, and this provides protection against data loss.
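Note: We set the number of shards and replicas when creating an index. The number of replicas can be changed later, but the number of primary shards of an existing index cannot be changed in place. To better understand these concepts, I suggest you read the article “How to Create an Elasticsearch Indice using Kibana”.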
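As a rough illustration of the note above, the following sketch builds the settings body that would be sent when creating an index and counts how many shard copies the cluster then has to place. The setting names `number_of_shards` and `number_of_replicas` are the real Elasticsearch index settings; the helper functions and the values 3 and 1 are assumptions made for this example.

```python
import json


def index_settings(primary_shards, replicas):
    """Build the request body for creating an index (PUT /<index-name>)."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary shard is stored once plus `replicas` additional copies."""
    return primary_shards * (1 + replicas)


body = index_settings(primary_shards=3, replicas=1)
print(json.dumps(body, indent=2))
print("shard copies to place:", total_shard_copies(3, 1))
```

With 3 primary shards and 1 replica there are 6 shard copies in total, so the data nodes need enough disk and memory for twice the size of the primary data.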
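Master nodes are the servers that manage the cluster: they keep track of which nodes belong to the cluster and which shard is located on which data node. In a setup where clients connect to the master nodes, the node that receives a request also acts as the coordinating node: it knows which shard the requested data is in and forwards the read request to a data node holding that shard. When data is written, the target shard is derived from the routing value of the document (by default its ID), and the request is forwarded to the data node holding that shard.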
In some cases, we may want data to be stored on certain data nodes. I will share the details of this process in the article “How to Create an Elasticsearch Indice using Kibana”.
Because the data is written to shards on different nodes, a read can be served by several nodes in parallel, which can increase I/O performance.
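The way a document ends up on a particular shard can be sketched as a hash of its routing value taken modulo the number of primary shards. This is a simplified model of the rule described in the Elasticsearch documentation, not its actual implementation: `zlib.crc32` is used here only as a stand-in for the hash function Elasticsearch uses internally, and the function names and document IDs are hypothetical.

```python
import zlib


def pick_shard(routing_value, primary_shards):
    """Map a routing value (by default the document ID) to a shard number.

    crc32 is a stand-in hash for illustration; Elasticsearch uses its own
    hash function, so real shard numbers will differ.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % primary_shards


def distribute(doc_ids, primary_shards):
    """Group document IDs by the shard they would be routed to."""
    shards = {number: [] for number in range(primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, primary_shards)].append(doc_id)
    return shards


doc_ids = [f"doc-{n}" for n in range(10)]
for shard, ids in distribute(doc_ids, primary_shards=3).items():
    print(f"shard {shard}: {ids}")
```

Because the result depends on the number of primary shards, the same document would map to a different shard if that number changed, which is one reason the shard count is fixed when the index is created.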
When you install multiple master nodes (as a best practice there should be more than one master node, and three are generally recommended), only one of them is the elected master at any given time, but all of them can accept incoming requests. If one of the master nodes is down, incoming requests are forwarded to the data nodes through the other master nodes, and a new master is elected if the failed node was the elected one.
Note: You can place the master nodes behind a load balancer and give the software developers its virtual IP. This way, when one of the nodes is down, the software developers do not need to change the connection string.
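The effect of the load balancer can be approximated on the client side with a simple failover loop, sketched below. This is only an illustration of the idea, not how a load balancer is implemented: the node addresses are hypothetical, and `send` is a placeholder for whatever function actually issues the HTTP request, replaced here by a fake so the example runs without a cluster.

```python
def send_with_failover(nodes, send):
    """Try each node in order and return (node, response) for the first success."""
    errors = []
    for node in nodes:
        try:
            return node, send(node)
        except ConnectionError as exc:
            # Remember the failure and move on to the next node.
            errors.append((node, str(exc)))
    raise ConnectionError(f"all nodes failed: {errors}")


def fake_send(node):
    """Stand-in for a real HTTP call; pretends the first node is down."""
    if node == "http://es-master-1:9200":
        raise ConnectionError("node is down")
    return {"status": "ok"}


# Hypothetical addresses of three master nodes.
nodes = [
    "http://es-master-1:9200",
    "http://es-master-2:9200",
    "http://es-master-3:9200",
]
print(send_with_failover(nodes, fake_send))
```

With a load balancer in front of the nodes, this logic lives in one place and the applications only need to know the virtual IP.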
Data nodes are the servers where the data is stored and the queries run. Take this into account when sizing the servers: the actual load is on the data nodes, not on the master nodes.