I. Data Operations

See the chapter "es: Data and Clusters".

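II. Cluster Overview

1. Cluster characteristics

1. A write made through any node in the cluster is synchronized to the other nodes.
2. The head plugin can connect to any machine in the cluster.
3. Data (shards) is redistributed across the nodes automatically.
4. If a primary shard fails, a replica shard is automatically promoted to primary.
5. If the master node fails, another master-eligible node is automatically elected master.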

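2. View cluster information

# List the nodes
GET _cat/nodes
# Check cluster health
GET _cat/health

# Show which node is the master
GET /_cat/master
# List the indices
GET /_cat/indices?v
# List all shards
GET _cat/shards
# List the shards of a specific index
GET _cat/shards/linux7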

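3. Notes

1. For node discovery you do not need to list every IP in the cluster; the local IP plus the IP of any one other cluster member is enough.

2. Election quorum setting (number of master-eligible nodes / 2 + 1)
discovery.zen.minimum_master_nodes: 2
# If this value is set too low the cluster is prone to split brain; to reduce the risk further, the fault-detection settings below can be added
discovery.zen.fd.ping_timeout: 60s        # ping timeout
discovery.zen.fd.ping_interval: 10s       # interval between pings
discovery.zen.fd.ping_retries: 6          # number of retries

3. By default ES creates 5 primary shards and 1 replica per index (ES 6.x and earlier).

4. Multi-node cluster with three nodes
    With no replicas, no node may fail.
    With 1 replica, the cluster can in the best case keep serving with up to two nodes down.
    With 2 replicas, two nodes can fail at the same time.

5. After an index is created, its number of shards cannot be changed; its number of replicas can.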
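
The quorum formula and the replica rules above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the failure count considers data availability only (whether at least one copy of every shard survives), not whether a master can still be elected.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum from the notes above: nodes / 2 + 1, using integer division."""
    return master_eligible_nodes // 2 + 1


def max_simultaneous_failures(nodes, replicas):
    """Nodes that can fail at once without losing every copy of some shard.

    Each shard exists as 1 primary + `replicas` copies on distinct nodes,
    so at most min(replicas, nodes - 1) nodes may disappear together.
    """
    return min(replicas, nodes - 1)


for replicas in (0, 1, 2):
    tolerated = max_simultaneous_failures(3, replicas)
    print(replicas, "replica(s):", tolerated, "simultaneous failure(s) tolerated")
print("minimum_master_nodes for 3 nodes:", minimum_master_nodes(3))
```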

III. Modifying Cluster Settings

1. Change the replica count of a specific index

PUT linux7/_settings
{
  "number_of_replicas": 2
}

2. Change the replica count of all indices

PUT _all/_settings
{
  "number_of_replicas": 2
}
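
Outside the Kibana console, the same settings update could be issued from a script. The sketch below only builds the HTTP request with the standard library and prints it; the helper name is hypothetical and the host 10.0.0.51 is simply borrowed from the monitoring script later in these notes.

```python
import json
import urllib.request


def replica_update_request(host, index, replicas):
    """Build (but do not send) a PUT <index>/_settings request."""
    body = json.dumps({"number_of_replicas": replicas}).encode("utf-8")
    return urllib.request.Request(
        url="http://%s:9200/%s/_settings" % (host, index),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = replica_update_request("10.0.0.51", "linux7", 2)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Passing "_all" as the index name would build the request for every index.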

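3. Set the default shard and replica counts in the configuration file

index.number_of_shards: 5
Sets the default number of primary shards per index; the default is 5.

index.number_of_replicas: 1
Sets the default number of replicas per index; the default is 1.

Note: ES 5.0 and later reject index-level settings such as these in elasticsearch.yml; on those versions set the defaults through an index template instead.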

4. Set the shard and replica counts when creating an index

PUT /qiudao
{
  "settings": {
    "number_of_replicas": 2, 
    "number_of_shards": 3
  }
}

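# Notes:
1. Every shard consumes additional resources.
2. Every shard holds a number of file handles (too many shards can lead to "too many open files").
3. A query is routed to the relevant shards by a hash calculation; the more shards there are, the less data each one holds and the more it costs to query across them.

5. How to choose settings in production

1. Discuss with the developers.
2. Consider how many nodes there will be.
    2 nodes: the defaults are fine.
    3 nodes: 2 replicas and 5 shards for important data; 1 replica and 5 shards for less important data.
3. At the start, a good rule is to create 1.5 to 3 times as many shards as you have nodes.
    For example, with 3 nodes, create at most 9 (3 x 3) shards.
4. Indices that hold a lot of data can have more shards; indices that hold little data can have fewer.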
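
The 1.5x to 3x rule of thumb is easy to turn into a quick calculation. The helper below is a hypothetical illustration; rounding the lower bound up is a choice made here, not something the rule itself specifies.

```python
import math


def suggested_shard_range(nodes):
    """Return (low, high) shard counts for the 1.5x to 3x rule of thumb."""
    low = math.ceil(nodes * 1.5)  # round up so the lower bound is a whole shard
    high = nodes * 3
    return low, high


low, high = suggested_shard_range(3)
print("3 nodes: between", low, "and", high, "shards")
```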

IV. Cluster Monitoring

1. Monitoring

1. Monitor the number of nodes
    GET _cat/nodes
2. Monitor the cluster status
    GET _cat/health
Raise an alert if either of them changes.
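
The alert rule can be sketched as a small check. This example assumes the health data has already been fetched as a dictionary shaped like the JSON returned by the _cluster/health API (which carries both "status" and "number_of_nodes"); the function name and the sample values are made up for illustration.

```python
def health_alerts(health, expected_nodes):
    """Return a list of alert messages; an empty list means nothing changed."""
    alerts = []
    nodes = health.get("number_of_nodes")
    if nodes != expected_nodes:
        alerts.append("node count is %s, expected %s" % (nodes, expected_nodes))
    status = health.get("status")
    if status != "green":
        alerts.append("cluster status is %s" % status)
    return alerts


# Sample data only: a three-node cluster that has lost one node.
sample = {"status": "yellow", "number_of_nodes": 2}
for message in health_alerts(sample, expected_nodes=3):
    print("ALERT:", message)
```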

2. Status monitoring script

# Write the Python script
[root@elkstack01 ~]# vim es_cluster_status.py
#!/usr/bin/env python3
# coding: utf-8
# Author: _DriverZeng_
# Date:

import json
import subprocess

clusterip = "10.0.0.51"
url = "http://" + clusterip + ":9200/_cluster/health"
# Query the cluster health API with curl and parse the JSON response
# (json.loads replaces eval, which is unsafe on data read from the network)
data = subprocess.check_output(["curl", "-s", "-XGET", url])
status = json.loads(data).get("status")
if status == "green":
    print("\033[1;32m Cluster is running normally \033[0m")
elif status == "yellow":
    print("\033[1;33m Replica shards are missing \033[0m")
else:
    print("\033[1;31m Primary shards are missing \033[0m")

# Sample output
[root@elkstack01 ~]# python3 es_cluster_status.py
 Cluster is running normally

3. Enhanced plugin: X-Pack

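V. ES Optimization

1. Limit memory

1. Do not give the heap more than 32 GB.
2. Give half of the server's memory to the ES heap and leave the rest for Lucene (the OS file-system cache). Set the minimum and maximum heap to the same value.
    [root@redis01 ~]# vim /etc/elasticsearch/jvm.options
    -Xms4g
    -Xmx4g

3. Start with a smaller heap; when data grows and memory runs short, first ask the developers to delete data that is no longer needed.
4. Add memory only when nothing more can be deleted.
5. When no more memory can be added, add machines.
6. Disable swap.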
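
The two sizing rules (half of the RAM, never above 32 GB) can be combined into one small calculation. This is a sketch with hypothetical helper names; the default cap of 31 GB is an assumption chosen here to stay safely under the 32 GB limit mentioned above.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Half of the machine's RAM, at least 1 GB, never above the cap."""
    return max(1, min(total_ram_gb // 2, cap_gb))


def jvm_options(heap_gb):
    """Lines for jvm.options with minimum and maximum heap set equal."""
    return ["-Xms%dg" % heap_gb, "-Xmx%dg" % heap_gb]


for ram in (8, 64, 128):
    print(ram, "GB RAM ->", " ".join(jvm_options(recommended_heap_gb(ram))))
```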

2. Tune file descriptors

# Edit the limits file
[root@elkstack01 ~]# vim /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 131072
* hard nofile 131072

# Edit the drop-in limits file (CentOS 6)
[root@elkstack01 ~]# vim /etc/security/limits.d/90-nproc.conf
*          soft    nproc     2048
root       soft    nproc     unlimited
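
After logging in again, one way to confirm that the new nofile limit is in effect is to read it back from inside a process. The sketch below uses Python's standard resource module (Unix only); the helper name is hypothetical and the threshold of 131072 is the value configured above.

```python
import resource


def nofile_ok(soft_limit, required=131072):
    """True if the soft open-files limit meets the required value."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # unlimited always satisfies the requirement
    return soft_limit >= required


# Read the limits of the current process (inherited from the login session).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft nofile:", soft, "hard nofile:", hard, "ok:", nofile_ok(soft))
```

Note that this reports the limit of the shell or service that runs the script, so it should be run as the same user that runs Elasticsearch.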

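3. Query optimization

1. For conditional queries, prefer term queries where possible and keep range queries to a minimum.
2. When indexing, store coarser-grained values where possible, so that queries can match exact terms instead of scanning ranges.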
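
To make the difference concrete, the sketch below builds both kinds of query body as Python dictionaries. The field names are hypothetical: "day" is assumed to hold a date already rounded to the day at index time, while "created_at" is assumed to hold a full timestamp.

```python
import json


def term_query(field, value):
    """Exact-match query body."""
    return {"query": {"term": {field: value}}}


def range_query(field, gte, lt):
    """Range query body covering gte <= field < lt."""
    return {"query": {"range": {field: {"gte": gte, "lt": lt}}}}


# One day of data, expressed both ways.
print(json.dumps(term_query("day", "2020-01-01")))
print(json.dumps(range_query("created_at", "2020-01-01T00:00:00", "2020-01-02T00:00:00")))
```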

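VI. Data Backup and Restore

1. Install the npm environment

2. Install the backup tool

[root@redis01 ~]# npm install elasticdump -g

3. Backup commands

1) Backup parameters

# What we need to know as operations engineers
--input: source file or URL
--output: destination file or URL
--type: what to back up (settings, analyzer, data, mapping, alias, template)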

2) Back up to another ES node

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=http://staging.es.com:9200/test \
  --type=analyzer

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=http://staging.es.com:9200/test \
  --type=mapping

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=http://staging.es.com:9200/test \
  --type=data

3) Back up data to JSON files

elasticdump \
  --input=http://10.0.0.51:9200/test \
  --output=/data/test_mapping.json \
  --type=mapping

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=/data/test_data.json \
  --type=data

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=/data/test_alias.json \
  --type=alias

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=/data/test_template.json \
  --type=template

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=/data/test_analyzer.json \
  --type=analyzer

4) Back up to a compressed file

# When the export is only for archiving, not for immediate use, it can be compressed
elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=$ | gzip > /data/test_data.json.gz
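
A compressed dump can still be inspected without unpacking it to disk. The sketch below assumes the dump is line-delimited JSON (one document per line); the function name is hypothetical, and the sample file is generated inside the example purely so that it runs on its own.

```python
import gzip
import json
import os
import tempfile


def read_dump(path):
    """Yield one parsed document per non-empty line of a gzipped dump."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Create a tiny stand-in for /data/test_data.json.gz (sample data only).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    handle.write('{"_id": "1", "_source": {"name": "lhd"}}\n')

docs = list(read_dump(sample_path))
print(len(docs), "document(s); first name:", docs[0]["_source"]["name"])
```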

5) Back up only the data that matches a condition

elasticdump \
  --input=http://10.0.0.91:9200/test \
  --output=/data/test_query.json \
  --searchBody='{"query":{"term":{"name": "lhd"}}}'
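
Hand-writing the JSON for --searchBody inside shell quotes is error-prone. One option, sketched below, is to build the command in Python and let json.dumps produce the query string; the helper name is hypothetical, and only flags already shown in these notes are used.

```python
import json
import shlex


def dump_command(source, output, search_body=None, dump_type="data"):
    """Return the elasticdump argument list for one export."""
    cmd = ["elasticdump", "--input=" + source, "--output=" + output, "--type=" + dump_type]
    if search_body is not None:
        # json.dumps guarantees valid JSON regardless of shell quoting
        cmd.append("--searchBody=" + json.dumps(search_body))
    return cmd


cmd = dump_command(
    "http://10.0.0.91:9200/test",
    "/data/test_query.json",
    search_body={"query": {"term": {"name": "lhd"}}},
)
print(" ".join(shlex.quote(part) for part in cmd))
# To run it: subprocess.run(cmd, check=True)
```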

4. Import commands

elasticdump \
  --input=/data/test_alias.json \
  --output=http://10.0.0.91:9200/test \
  --type=alias

elasticdump \
  --input=/data/test_analyzer.json \
  --output=http://10.0.0.91:9200/test \
  --type=analyzer

elasticdump \
  --input=/data/test_data.json \
  --output=http://10.0.0.91:9200/test \
  --type=data

elasticdump \
  --input=/data/test_template.json \
  --output=http://10.0.0.91:9200/test \
  --type=template

elasticdump \
  --input=/data/test_mapping.json \
  --output=http://10.0.0.91:9200/test \
  --type=mapping

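# Note: on restore, data that already exists is overwritten; if no such data exists, nothing else is affected

5. Backup script

#!/bin/bash
# Usage: pass the IP of the ES node to back up as the first argument
echo "Host to back up: ${1}"
# Indices to back up, one per line
index_name='
test
student
linux7
'
for index in $index_name
do
    echo "start backing up index ${index}"
    # settings and mapping are dumped as well, because the import script expects those files
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_settings.json --type=settings &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_alias.json --type=alias &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_analyzer.json --type=analyzer &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_data.json --type=data &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_mapping.json --type=mapping &> /dev/null
    elasticdump --input=http://${1}:9200/${index} --output=/data/${index}_template.json --type=template &> /dev/null
done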

6. Import script

#!/bin/bash
# Usage: pass the IP of the target ES node as the first argument
echo "Host to import into: ${1}"
# Names of the indices to import, one per line
index_name='
test
'
for index in $index_name
do
    echo "start importing index ${index}"
    elasticdump --input=/data/${index}_settings.json --output=http://${1}:9200/${index} --type=settings
    elasticdump --input=/data/${index}_alias.json --output=http://${1}:9200/${index} --type=alias
    elasticdump --input=/data/${index}_analyzer.json --output=http://${1}:9200/${index} --type=analyzer
    # Import the mapping before the data so documents are indexed with the saved mapping
    elasticdump --input=/data/${index}_mapping.json --output=http://${1}:9200/${index} --type=mapping
    elasticdump --input=/data/${index}_data.json --output=http://${1}:9200/${index} --type=data
    elasticdump --input=/data/${index}_template.json --output=http://${1}:9200/${index} --type=template
done
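
The order of the import steps matters, because data loaded before its mapping gets a dynamically generated mapping instead of the saved one. The sketch below generates the commands from an explicit ordered list with mapping placed before data; the ordering, the helper name, and the /data directory default are choices made for this illustration.

```python
# Import order used in this sketch: mapping is placed before data.
IMPORT_ORDER = ["settings", "alias", "analyzer", "mapping", "data", "template"]


def import_commands(host, index, data_dir="/data", order=IMPORT_ORDER):
    """Return one elasticdump argument list per dump type, in import order."""
    target = "http://%s:9200/%s" % (host, index)
    return [
        [
            "elasticdump",
            "--input=%s/%s_%s.json" % (data_dir, index, kind),
            "--output=" + target,
            "--type=" + kind,
        ]
        for kind in order
    ]


for cmd in import_commands("10.0.0.91", "test"):
    print(" ".join(cmd))
# To run them: subprocess.run(cmd, check=True) for each cmd
```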

VII. Chinese Analyzer

https://github.com/medcl/elasticsearch-analysis-ik/

1. Insert test data

POST /index/text/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST /index/text/2
{"content":"公安部：各地校车将享最高路权"}

POST /index/text/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST /index/text/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

2. Test the search

POST /index/_search
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
  }
}

3. Configure the Chinese analyzer

1) Install the plugin (run on every machine in the cluster)

[root@redis01 ~]# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip

2) Create an index

PUT /news

3) Add the mapping

curl -XPOST http://localhost:9200/news/text/_mapping -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
}'
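
To see how the two analyzers named in the mapping split a sentence, the text can be posted to the _analyze API. The sketch below only builds and prints the requests with the standard library; the helper name is hypothetical, and it assumes the IK plugin is installed on the node at localhost.

```python
import json
import urllib.request


def analyze_request(host, analyzer, text):
    """Build (but do not send) a POST /_analyze request."""
    payload = {"analyzer": analyzer, "text": text}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url="http://%s:9200/_analyze" % host,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


for name in ("ik_max_word", "ik_smart"):
    req = analyze_request("localhost", name, "中国驻洛杉矶领事馆")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# To send one: urllib.request.urlopen(req).read()
```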

4) Add our own custom Chinese words

[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionaries here -->
    <entry key="ext_dict">/etc/elasticsearch/analysis-ik/my.dic</entry>
</properties>

[root@redis01 ~]# vim /etc/elasticsearch/analysis-ik/my.dic
中国

[root@redis01 ~]# chown -R elasticsearch:elasticsearch /etc/elasticsearch/analysis-ik/my.dic

5) Re-insert the data

POST /news/text/1
{"content":"美国留给伊拉克的是个烂摊子吗"}

POST /news/text/2
{"content":"公安部：各地校车将享最高路权"}

POST /news/text/3
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}

POST /news/text/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}

6) Test again

POST /news/_search
{
  "query" : { "match" : { "content" : "中国" }},
  "highlight" : {
      "pre_tags" : ["<tag1>", "<tag2>"],
      "post_tags" : ["</tag1>", "</tag2>"],
      "fields" : {
          "content" : {}
      }
  }
}
