Analyzing a yellow Elasticsearch cluster

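Checking the cluster health shows a status of yellow, which means that all primary shards are active but at least one replica shard has not been allocated. Here unassigned_shards is 1, so exactly one shard is still unassigned.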

root@ubuntu:~# curl -X GET nes01-giio.nes.cn-east-1.internal:9200/_cluster/health?pretty
{
  "cluster_name" : "nes01",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1,
  "active_shards" : 1,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
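
When this check has to be repeated, the three fields that matter here (status, number_of_nodes, unassigned_shards) can be pulled out of the response with a small helper. This is only a sketch: the function name health_summary is made up for illustration, and it assumes python3 is available on the machine running the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a _cluster/health JSON document on stdin
# and print the fields that matter when diagnosing a yellow cluster.
health_summary() {
  python3 -c '
import json, sys
h = json.load(sys.stdin)
print("status={} nodes={} unassigned={}".format(
    h["status"], h["number_of_nodes"], h["unassigned_shards"]))
'
}

# Usage against the cluster from this article (not executed here):
#   curl -s "nes01-giio.nes.cn-east-1.internal:9200/_cluster/health" | health_summary
```

For the response above this prints a yellow status with one node and one unassigned shard, which already hints that the replica has no second node to live on.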

So which index does that shard belong to? The following command shows it:

root@ubuntu:~# curl -X GET nes01-giio.nes.cn-east-1.internal:9200/_cluster/health?level=indices | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   579  100   579    0     0  63361      0 --:--:-- --:--:-- --:--:-- 64333
{
    "active_primary_shards": 1,
    "active_shards": 1,
    "active_shards_percent_as_number": 50.0,
    "cluster_name": "nes01",
    "delayed_unassigned_shards": 0,
    "indices": {
        ".kibana": {
            "active_primary_shards": 1,
            "active_shards": 1,
            "initializing_shards": 0,
            "number_of_replicas": 1,
            "number_of_shards": 1,
            "relocating_shards": 0,
            "status": "yellow",
            "unassigned_shards": 1
        }
    },
    "initializing_shards": 0,
    "number_of_data_nodes": 1,
    "number_of_in_flight_fetch": 0,
    "number_of_nodes": 1,
    "number_of_pending_tasks": 0,
    "relocating_shards": 0,
    "status": "yellow",
    "task_max_waiting_in_queue_millis": 0,
    "timed_out": false,
    "unassigned_shards": 1
}

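The level=indices parameter makes the cluster-health API add a list of indices to the cluster information, with details for each index (status, number of shards, number of unassigned shards, and so on).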
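
On a cluster with many indices, reading this output by eye gets tedious. The sketch below filters the level=indices response down to the indices that are not green. The function name non_green_indices is hypothetical, and python3 is assumed to be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a _cluster/health?level=indices JSON document
# on stdin and print "<index> <status> <unassigned_shards>" for every
# index whose status is not green.
non_green_indices() {
  python3 -c '
import json, sys
health = json.load(sys.stdin)
for name, info in sorted(health.get("indices", {}).items()):
    if info.get("status") != "green":
        print(name, info.get("status"), info.get("unassigned_shards"))
'
}

# Usage against the cluster from this article (not executed here):
#   curl -s "nes01-giio.nes.cn-east-1.internal:9200/_cluster/health?level=indices" | non_green_indices
```

Sorting by index name keeps the output stable between runs, so two checks can be compared with diff.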

Resolution

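On investigation, the cause was that port 9300, the default transport port Elasticsearch nodes use to talk to each other, was blocked on the network between the cluster's machines. The nodes could not see each other (the health output above reports number_of_nodes as 1), and a replica is never allocated on the same node as its primary, so the .kibana replica stayed unassigned. Opening port 9300 in the firewall resolved it.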
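
A quick way to confirm this kind of problem is to test from one node whether the transport port on another node accepts TCP connections. The sketch below uses bash's /dev/tcp redirection. The function name port_open and the 3-second timeout are arbitrary choices, and the firewall command in the comments assumes ufw only because the prompt shows an Ubuntu host; the original does not say which firewall was in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a TCP connection to host:port
# succeeds within 3 seconds, non-zero otherwise.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (replace the placeholder with the address of another node):
#   port_open <other-node-address> 9300 && echo "9300 reachable" || echo "9300 blocked"
#
# If the port is blocked and the host uses ufw (an assumption), a rule
# like this, run on the node that should accept the connection, opens it:
#   sudo ufw allow from <other-node-ip> to any port 9300 proto tcp
```

Once the nodes can reach each other on 9300, re-running the cluster health check should show number_of_nodes above 1 and the status moving to green as the replica is allocated.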