Monitoring Metrics Reference

KubeSphere resource monitoring is organized into eight levels: Cluster, Node, Workspace, Namespace (project), Workload, Pod, Container, and Component (KubeSphere core components).

Cluster

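Metric	Description	Unit
cluster_cpu_utilisation	Cluster CPU utilisation
cluster_cpu_usage	Cluster CPU usage	Core
cluster_cpu_total	Total cluster CPU	Core
cluster_load1	Cluster 1-minute CPU load average¹
cluster_load5	Cluster 5-minute CPU load average
cluster_load15	Cluster 15-minute CPU load average
cluster_memory_utilisation	Cluster memory utilisation
cluster_memory_available	Cluster available memory	Byte
cluster_memory_total	Total cluster memory	Byte
cluster_memory_usage_wo_cache	Cluster memory usage²	Byte
cluster_net_utilisation	Cluster network data transfer rate	Byte/s
cluster_net_bytes_transmitted	Cluster network transmit rate	Byte/s
cluster_net_bytes_received	Cluster network receive rate	Byte/s
cluster_disk_read_iops	Cluster disk reads per second	count/s
cluster_disk_write_iops	Cluster disk writes per second	count/s
cluster_disk_read_throughput	Cluster disk data read per second	Byte/s
cluster_disk_write_throughput	Cluster disk data written per second	Byte/s
cluster_disk_size_usage	Cluster disk space used	Byte
cluster_disk_size_utilisation	Cluster disk space utilisation
cluster_disk_size_capacity	Total cluster disk capacity	Byte
cluster_disk_size_available	Cluster available disk space	Byte
cluster_disk_inode_total	Total number of inodes in the cluster
cluster_disk_inode_usage	Number of inodes used in the cluster
cluster_disk_inode_utilisation	Cluster inode utilisation
cluster_node_online	Number of online cluster nodes
cluster_node_offline	Number of offline cluster nodes
cluster_node_offline_ratio	Ratio of offline cluster nodes
cluster_node_total	Total number of cluster nodes
cluster_pod_count	Number of scheduled³ Pods in the cluster
cluster_pod_quota	Sum of the maximum Pod capacity⁴ of all cluster nodes
cluster_pod_utilisation	Utilisation of the cluster's maximum Pod capacity
cluster_pod_running_count	Number of Pods in the Running phase⁵ in the cluster
cluster_pod_succeeded_count	Number of Pods in the Succeeded phase in the cluster
cluster_pod_abnormal_count	Number of abnormal Pods⁶ in the cluster
cluster_pod_abnormal_ratio	Ratio of abnormal Pods⁷ in the cluster
cluster_ingresses_extensions_count	Number of Ingresses in the cluster
cluster_cronjob_count	Number of CronJobs in the cluster
cluster_pvc_count	Number of PersistentVolumeClaims in the cluster
cluster_daemonset_count	Number of DaemonSets in the cluster
cluster_deployment_count	Number of Deployments in the cluster
cluster_endpoint_count	Number of Endpoints in the cluster
cluster_hpa_count	Number of Horizontal Pod Autoscalers in the cluster
cluster_job_count	Number of Jobs in the cluster
cluster_statefulset_count	Number of StatefulSets in the cluster
cluster_replicaset_count	Number of ReplicaSets in the cluster
cluster_service_count	Number of Services in the cluster
cluster_secret_count	Number of Secrets in the cluster
cluster_namespace_count	Number of Namespaces in the cluster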

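Notes

¹ The average number of processes per CPU that are in a runnable or uninterruptible state in the run queue over the given time window. A value greater than 1 means the CPU cannot keep up with demand and some processes are waiting.

² Excludes buffers and cache.

³ The Pod has been scheduled to a node, that is, status.conditions.PodScheduled = true. See: Pod Lifecycle

⁴ The maximum Pod capacity of a node defaults to 110 Pods. See: kubelet Options

⁵ The Running phase means the Pod has been bound to a node and all of its containers have been created. At least one container is running, or is in the process of starting or restarting. See: Pod Lifecycle

⁶ Abnormal Pods: if a Pod's status.conditions.ContainersReady field is false, the Pod is unavailable. When deciding whether a Pod is abnormal, we also have to account for Pods that are still in the ContainerCreating state or have already reached the Succeeded phase. Taking these cases together, the number of abnormal Pods is computed as: Abnormal Pods = Total Pods - ContainersReady Pods - ContainerCreating Pods - Succeeded Pods.

⁷ Abnormal Pod ratio: number of abnormal Pods / number of non-Succeeded Pods.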
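The two formulas in notes ⁶ and ⁷ can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not KubeSphere's implementation: the function and parameter names are hypothetical, and returning 0.0 when there are no non-Succeeded Pods is an assumption made here to avoid a division by zero.

```python
def abnormal_pod_count(total, containers_ready, container_creating, succeeded):
    """Abnormal Pods = Total - ContainersReady - ContainerCreating - Succeeded."""
    return total - containers_ready - container_creating - succeeded


def abnormal_pod_ratio(total, containers_ready, container_creating, succeeded):
    """Abnormal Pods divided by the number of non-Succeeded Pods."""
    non_succeeded = total - succeeded
    if non_succeeded == 0:
        # Assumption: report 0.0 when there is nothing to compare against.
        return 0.0
    abnormal = abnormal_pod_count(
        total, containers_ready, container_creating, succeeded
    )
    return abnormal / non_succeeded


# Example: 20 Pods, 14 ready, 2 still creating containers, 3 succeeded.
count = abnormal_pod_count(20, 14, 2, 3)
ratio = abnormal_pod_ratio(20, 14, 2, 3)
```

With these inputs one Pod is abnormal, and the ratio is taken over the 17 Pods that are not in the Succeeded phase.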

Node

Metric	Description	Unit
node_cpu_utilisation	Node CPU utilisation
node_cpu_total	Total node CPU	Core
node_cpu_usage	Node CPU usage	Core
node_load1	Node 1-minute CPU load average
node_load5	Node 5-minute CPU load average
node_load15	Node 15-minute CPU load average
node_memory_utilisation	Node memory utilisation
node_memory_usage_wo_cache	Node memory usage¹	Byte
node_memory_available	Node available memory	Byte
node_memory_total	Total node memory	Byte
node_net_utilisation	Node network data transfer rate	Byte/s
node_net_bytes_transmitted	Node network transmit rate	Byte/s
node_net_bytes_received	Node network receive rate	Byte/s
node_disk_read_iops	Node disk reads per second	count/s
node_disk_write_iops	Node disk writes per second	count/s
node_disk_read_throughput	Node disk data read per second	Byte/s
node_disk_write_throughput	Node disk data written per second	Byte/s
node_disk_size_capacity	Total node disk capacity	Byte
node_disk_size_available	Node available disk space	Byte
node_disk_size_usage	Node disk space used	Byte
node_disk_size_utilisation	Node disk space utilisation
node_disk_inode_total	Total number of inodes on the node
node_disk_inode_usage	Number of inodes used on the node
node_disk_inode_utilisation	Node inode utilisation
node_pod_count	Number of scheduled Pods on the node
node_pod_quota	Maximum Pod capacity of the node
node_pod_utilisation	Utilisation of the node's maximum Pod capacity
node_pod_running_count	Number of Pods in the Running phase on the node
node_pod_succeeded_count	Number of Pods in the Succeeded phase on the node
node_pod_abnormal_count	Number of abnormal Pods on the node
node_pod_abnormal_ratio	Ratio of abnormal Pods on the node

Notes

¹ Excludes buffers and cache.

Workspace

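Metric	Description	Unit
workspace_cpu_usage	Workspace CPU usage	Core
workspace_memory_usage	Workspace memory usage (including cache)	Byte
workspace_memory_usage_wo_cache	Workspace memory usage (excluding cache)	Byte
workspace_net_bytes_transmitted	Workspace network transmit rate	Byte/s
workspace_net_bytes_received	Workspace network receive rate	Byte/s
workspace_pod_count	Number of non-terminated Pods¹ in the workspace
workspace_pod_running_count	Number of Pods in the Running phase in the workspace
workspace_pod_succeeded_count	Number of Pods in the Succeeded phase in the workspace
workspace_pod_abnormal_count	Number of abnormal Pods in the workspace
workspace_pod_abnormal_ratio	Ratio of abnormal Pods in the workspace
workspace_ingresses_extensions_count	Number of Ingresses in the workspace
workspace_cronjob_count	Number of CronJobs in the workspace
workspace_pvc_count	Number of PersistentVolumeClaims in the workspace
workspace_daemonset_count	Number of DaemonSets in the workspace
workspace_deployment_count	Number of Deployments in the workspace
workspace_endpoint_count	Number of Endpoints in the workspace
workspace_hpa_count	Number of Horizontal Pod Autoscalers in the workspace
workspace_job_count	Number of Jobs in the workspace
workspace_statefulset_count	Number of StatefulSets in the workspace
workspace_replicaset_count	Number of ReplicaSets in the workspace
workspace_service_count	Number of Services in the workspace
workspace_secret_count	Number of Secrets in the workspace
workspace_all_project_count	Total number of projects in the workspace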

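Notes

¹ Non-terminated Pods are Pods in the Pending, Running, or Unknown phase. This excludes Pods that terminated successfully and Pods that the system terminated after they exited with a non-zero status. See: Pod Lifecycle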
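As a rough illustration of the definition in note ¹, the following Python sketch counts non-terminated Pods from a list of Pod phase strings. The function name and the input shape (a plain list of phase names) are assumptions chosen for the example; they do not describe how KubeSphere collects the metric.

```python
# Phases that count as non-terminated according to note 1.
NON_TERMINATED_PHASES = {"Pending", "Running", "Unknown"}


def count_non_terminated(phases):
    """Count Pods whose phase is Pending, Running, or Unknown."""
    return sum(1 for phase in phases if phase in NON_TERMINATED_PHASES)


# Hypothetical sample: Succeeded and Failed Pods are excluded from the count.
sample_phases = ["Pending", "Running", "Succeeded", "Failed", "Unknown", "Running"]
non_terminated = count_non_terminated(sample_phases)
```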

If the Workspace Monitoring API is called with the query parameter type set to statistics, it returns workspace statistics:

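Metric	Description	Unit
workspace_all_organization_count	Total number of workspaces in the cluster
workspace_all_account_count	Total number of accounts in the cluster
workspace_all_project_count	Total number of projects in the cluster
workspace_all_devops_project_count¹	Total number of DevOps projects in the cluster
workspace_namespace_count	Total number of projects in the workspace
workspace_devops_project_count	Total number of DevOps projects in the workspace
workspace_member_count	Number of workspace members
workspace_role_count²	Number of workspace roles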

Notes

¹ The first four metrics apply to /kapis/devops.kubesphere.io/v1alpha2/workspaces

² The last four metrics apply to /kapis/devops.kubesphere.io/v1alpha2/workspaces/{workspace}
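To make the two endpoints concrete, here is a small Python sketch that builds the request URL with the type=statistics query parameter. It only assembles strings; the host name, the function name, and the assumption that the parameter is passed as a plain query string are illustrative and should be checked against your own deployment.

```python
from urllib.parse import quote, urlencode

# Path taken from the notes above.
BASE_PATH = "/kapis/devops.kubesphere.io/v1alpha2/workspaces"


def statistics_url(host, workspace=None):
    """Build a statistics URL for the cluster (no workspace) or for one workspace."""
    path = BASE_PATH
    if workspace is not None:
        # Percent-encode the workspace name so it is safe inside a URL path.
        path = f"{BASE_PATH}/{quote(workspace, safe='')}"
    query = urlencode({"type": "statistics"})
    return f"{host.rstrip('/')}{path}?{query}"


# "ks.example.com" and "demo" are placeholders, not real endpoints.
cluster_url = statistics_url("http://ks.example.com")
workspace_url = statistics_url("http://ks.example.com", "demo")
```

Under these assumptions, the first URL would return the cluster-wide counts (the first four metrics) and the second the counts for a single workspace (the last four).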

Namespace

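Metric	Description	Unit
namespace_cpu_usage	Project CPU usage	Core
namespace_memory_usage	Project memory usage (including cache)	Byte
namespace_memory_usage_wo_cache	Project memory usage (excluding cache)	Byte
namespace_net_bytes_transmitted	Project network transmit rate	Byte/s
namespace_net_bytes_received	Project network receive rate	Byte/s
namespace_pod_count	Number of non-terminated Pods in the project
namespace_pod_running_count	Number of Pods in the Running phase in the project
namespace_pod_succeeded_count	Number of Pods in the Succeeded phase in the project
namespace_pod_abnormal_count	Number of abnormal Pods in the project
namespace_pod_abnormal_ratio	Ratio of abnormal Pods in the project
namespace_cronjob_count	Number of CronJobs in the project
namespace_pvc_count	Number of PersistentVolumeClaims in the project
namespace_daemonset_count	Number of DaemonSets in the project
namespace_deployment_count	Number of Deployments in the project
namespace_endpoint_count	Number of Endpoints in the project
namespace_hpa_count	Number of Horizontal Pod Autoscalers in the project
namespace_job_count	Number of Jobs in the project
namespace_statefulset_count	Number of StatefulSets in the project
namespace_replicaset_count	Number of ReplicaSets in the project
namespace_service_count	Number of Services in the project
namespace_secret_count	Number of Secrets in the project
namespace_ingresses_extensions_count	Number of Ingresses in the project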

Workload

Metric	Description	Unit
workload_pod_cpu_usage	Workload¹ CPU usage	Core
workload_pod_memory_usage	Workload memory usage (including cache)	Byte
workload_pod_memory_usage_wo_cache	Workload memory usage (excluding cache)	Byte
workload_pod_net_bytes_transmitted	Workload network transmit rate	Byte/s
workload_pod_net_bytes_received	Workload network receive rate	Byte/s
workload_deployment_replica	Desired number of Deployment replicas
workload_deployment_replica_available	Number of available Deployment replicas²
workload_deployment_unavailable_replicas_ratio	Ratio of unavailable Deployment replicas³
workload_statefulset_replica	Desired number of StatefulSet replicas
workload_statefulset_replica_available	Number of available StatefulSet replicas
workload_statefulset_unavailable_replicas_ratio	Ratio of unavailable StatefulSet replicas
workload_daemonset_replica	Desired number of DaemonSet replicas
workload_daemonset_replica_available	Number of available DaemonSet replicas
workload_daemonset_unavailable_replicas_ratio	Ratio of unavailable DaemonSet replicas

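Notes

¹ The workload types currently supported are Deployment, StatefulSet, and DaemonSet.

² An available replica is a Pod created by the workload that is in an available state, that is, the Pod's status.conditions.ContainersReady field is true.

³ Ratio of unavailable replicas: number of unavailable replicas / desired number of replicas.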
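The ratio in note ³ can be sketched as follows. The source defines only the ratio itself, so two things here are assumptions: that the number of unavailable replicas equals desired minus available replicas, and that a workload with zero desired replicas reports 0.0. The function name is hypothetical.

```python
def unavailable_replicas_ratio(desired, available):
    """Unavailable replicas divided by desired replicas."""
    if desired <= 0:
        # Assumption: a workload scaled to zero has no unavailable replicas.
        return 0.0
    # Assumption: unavailable = desired - available, never below zero.
    unavailable = max(desired - available, 0)
    return unavailable / desired


# Example: a Deployment that wants 4 replicas and has 3 available.
ratio = unavailable_replicas_ratio(4, 3)
```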

Pod

Metric	Description	Unit
pod_cpu_usage	Pod CPU usage	Core
pod_memory_usage	Pod memory usage (including cache)	Byte
pod_memory_usage_wo_cache	Pod memory usage (excluding cache)	Byte
pod_net_bytes_transmitted	Pod network transmit rate	Byte/s
pod_net_bytes_received	Pod network receive rate	Byte/s

Container

Metric	Description	Unit
container_cpu_usage	Container CPU usage	Core
container_memory_usage	Container memory usage (including cache)	Byte
container_memory_usage_wo_cache	Container memory usage (excluding cache)	Byte

Component

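Metric	Description	Unit
etcd_server_list	List of etcd cluster nodes¹
etcd_server_total	Total number of etcd cluster nodes
etcd_server_up_total	Number of online etcd cluster nodes
etcd_server_has_leader	Whether each etcd node has a leader²
etcd_server_leader_changes	Number of leader changes observed by each etcd node (within 1h)
etcd_server_proposals_failed_rate	Average rate of failed³ proposals across etcd nodes	count/s
etcd_server_proposals_applied_rate	Average rate of applied proposals across etcd nodes	count/s
etcd_server_proposals_committed_rate	Average rate of committed proposals across etcd nodes	count/s
etcd_server_proposals_pending_count	Average number of pending proposals across etcd nodes
etcd_mvcc_db_size	Average database size across etcd nodes	Byte
etcd_network_client_grpc_received_bytes	Rate of data the etcd cluster receives from gRPC clients	Byte/s
etcd_network_client_grpc_sent_bytes	Rate of data the etcd cluster sends to gRPC clients	Byte/s
etcd_grpc_call_rate	etcd cluster gRPC request rate	count/s
etcd_grpc_call_failed_rate	etcd cluster gRPC failed request rate	count/s
etcd_grpc_server_msg_received_rate	etcd cluster gRPC stream message receive rate	count/s
etcd_grpc_server_msg_sent_rate	etcd cluster gRPC stream message send rate	count/s
etcd_disk_wal_fsync_duration	Average WAL fsync duration across etcd nodes	Second
etcd_disk_wal_fsync_duration_quantile	Average WAL fsync duration for the etcd cluster (by quantile)⁴	Second
etcd_disk_backend_commit_duration	Average backend commit (database sync) duration⁵ across etcd nodes	Second
etcd_disk_backend_commit_duration_quantile	Average backend commit (database sync) duration across etcd nodes (by quantile)	Second
apiserver_up_sum	Number of online APIServer⁶ instances
apiserver_request_rate	Requests received by the APIServer per second
apiserver_request_by_verb_rate	Requests received by the APIServer per second (grouped by HTTP request method)
apiserver_request_latencies	Average APIServer request latency	Second
apiserver_request_by_verb_latencies	Average APIServer request latency (grouped by HTTP request method)	Second
scheduler_up_sum	Number of online scheduler⁷ instances
scheduler_schedule_attempts	Cumulative number of scheduling attempts⁸
scheduler_schedule_attempt_rate	Scheduler scheduling rate	count/s
scheduler_e2e_scheduling_latency	Scheduler scheduling latency	Second
scheduler_e2e_scheduling_latency_quantile	Scheduler scheduling latency (by quantile)	Second
controller_manager_up_sum	Number of online Controller Manager⁹ instances
coredns_up_sum	Number of online CoreDNS instances
coredns_cache_hits	CoreDNS cache hit rate	count/s
coredns_cache_misses	CoreDNS cache miss rate	count/s
coredns_dns_request_rate	CoreDNS requests per second
coredns_dns_request_duration	CoreDNS request duration	Second
coredns_dns_request_duration_quantile	CoreDNS request duration (by quantile)	Second
coredns_dns_request_by_type_rate	CoreDNS requests per second (grouped by request type)
coredns_dns_request_by_rcode_rate	CoreDNS requests per second (grouped by rcode)
coredns_panic_rate	CoreDNS panic rate	count/s
coredns_proxy_request_rate	CoreDNS proxy requests per second
coredns_proxy_request_duration	CoreDNS proxy request duration	Second
coredns_proxy_request_duration_quantile	CoreDNS proxy request duration (by quantile)	Second
prometheus_up_sum	Number of online Prometheus instances
prometheus_tsdb_head_samples_appended_rate	Samples stored by Prometheus per second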

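Notes

¹ For each node, a value of 1 means the etcd node is online and 0 means it is offline.

² A value of 0 for a node means that node has no leader and cannot be used. If no node in the cluster has a leader, the entire cluster is unavailable.

³ Terms used in these metrics: consensus proposals, failed proposals, committed proposals, applied proposals, pending proposals.

⁴ Three quantiles are supported: the 99th percentile, the 90th percentile, and the median.

⁵ Reflects disk I/O latency. A high value usually indicates a disk problem.

⁶ Refers to kube-apiserver.

⁷ Refers to kube-scheduler.

⁸ Grouped by scheduling result: error (Pods that could not be scheduled because of a scheduler error), scheduled (Pods that were scheduled successfully), unschedulable (Pods that could not be scheduled).

⁹ Refers to kube-controller-manager.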
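The rule in note ² for etcd_server_has_leader can be illustrated with a short Python sketch. It assumes the per-node values have already been collected into a dictionary keyed by node name; that input shape, the function name, and treating an empty result as "unavailable" are all assumptions for the example rather than part of any KubeSphere API.

```python
def etcd_cluster_status(has_leader_by_node):
    """Return (cluster_available, nodes_without_leader).

    has_leader_by_node maps a node name to its etcd_server_has_leader value
    (1 = the node sees a leader, 0 = it does not).
    """
    nodes_without_leader = sorted(
        node for node, value in has_leader_by_node.items() if value == 0
    )
    # The cluster is unavailable only when no node has a leader.
    # Assumption: an empty input is also reported as unavailable.
    cluster_available = any(value != 0 for value in has_leader_by_node.values())
    return cluster_available, nodes_without_leader


# Hypothetical node names: one node has lost its leader, the cluster still works.
status = etcd_cluster_status({"etcd-a": 1, "etcd-b": 0, "etcd-c": 1})
```

A node listed in the second element is individually unusable even when the cluster as a whole is still available.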
