EFK: A Log Collection Solution

liliane • 2023-01-02 • Cloud Technology Community

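Introduction

This article walks through setting up EFK, a common log collection stack for distributed systems, and how to handle some frequently encountered problems.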

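Overview

EFK (ElasticSearch, Fluentd, Kibana) is a common log collection solution for distributed systems. es stores the data; kibana visualizes it and supports a wide range of searches and aggregations across dimensions. fluentd is the log collector: it gathers data from various sources, filters, parses, transforms, and structures it, and then writes it to es.

You may have heard more about ELK. The L stands for logstash, another log collection tool. The two are functionally similar; there are performance comparison articles online, and on the whole they favor fluentd.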

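Setting Up the Components

Background

With everything moving to the cloud, the es component can simply be purchased as a cloud resource. For creating and using Tencent Cloud es, refer to the ES product pages on the cloud provider's official site. Once an es instance is created, the kibana visualization tool is ready to use (enabling public access to kibana is a security risk; either allow private-network access only, or set an allowlist of source network ranges).

Our team runs containerized deployments on k8s. To collect logs from inside containers, we mount data volumes that map each container's log path to a fixed location on the node's local disk. To make sure logs on every node are collected promptly, we deploy fluentd as a daemonset so that each node runs one log collection process. Following its configuration file, fluentd tails the logs and writes them to the target es instance at a set interval.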

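Key Steps

1. Deploy the td-agent.conf configuration

The most troublesome part of deploying fluentd is the configuration file. After collecting logs, fluentd processes and outputs them according to this file. So the first step is to deploy a configmap that mounts the configuration file into the container at the specified path (/etc/fluent/config.d) under the file name td-agent.conf (subPath). In the configmap, the file content is written as a multi-line value using |-.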

data:
    td-agent.conf: |-

Below is a sample td-agent.conf:

<match fluent.**>
  @type null
</match>
<source>
  @id xx-containers.log
  @type tail
  path /var/log/**/log.log
  pos_file /var/log/xx.log.pos
  tag log.**
  <parse>
    @type multiline
    # a new record starts at a line beginning with "[YYYY/MM/DD "
    format_firstline /^\[\d{4}\/\d{2}\/\d{2} /
    format1 /^\[(?<logtime>[^\]]*)\] \[(?<level>[^ ]*)\] (?<position>[\s\S\d]*): \[Action=(?<action>[^ ]*)\|RequestId=(?<reqId>[^|]*)\|AppId=(?<appid>[^ ]*)\|TfAid=(?<aid>[^|]*)\|TfUid=(?<uid>[^|]*)\|(?<context>[^\]]*)\] (?<message>[\s\S]*)$/
  </parse>
  emit_unmatched_lines false
  read_from_head false
</source>
<source>
  # @id must be unique across plugin instances
  @id xx-containers.vlog
  @type tail
  path /var/log/**/vlog.log
  # a separate pos_file per tail source keeps read positions from colliding
  pos_file /var/log/xx.vlog.pos
  tag xx.*
  <parse>
    @type regexp
    expression /(?<logtime>\d{4}-[01]\d-[0-3]\d [0-2]\d:[0-5]\d:[0-5]\d\.\d{3})[\t ]*(?<level>[^\t ]*)[\t ]*(?<position>[^\t ]*)[\t ]*(?<message>[\s\S]*)$/
  </parse>
  emit_unmatched_lines false
  read_from_head false
</source>
<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record.dig("position")}:${record.dig("message")}
  </record>
</filter>
<filter **>
  @type record_transformer
  <record>
    namespace ${tag_parts[4]}
  </record>
</filter>
<filter **>
  @type record_transformer
  <record>
    module ${tag_parts[5]}
  </record>
</filter>
<filter **>
  @type record_transformer
  <record>
    service_name ${tag_parts[4]}:${tag_parts[5]}
  </record>
</filter>
<match **>
   @type elasticsearch_dynamic
   @log_level debug
   include_tag_key true
   type_name _doc
   host xx
   port xx
   user xx
   password xx
   logstash_format true
   logstash_prefix logstash-xx-test
   <buffer>
        @type file
        path /var/log/fluentd/buffer
        flush_interval 5s
        flush_thread_count 10
        retry_forever false
        retry_max_times 1
        retry_wait 1s
        chunk_limit_size 8MB
        total_limit_size 256MB
        queue_limit_length 32
        compress gzip
   </buffer>
</match>

source

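A source block defines an input. Key settings: @type tail selects the tail plugin, which reads the contents of files under path (wildcards are supported). pos_file records the position read so far in each file. The content goes through the parse plugin; @type multiline parses it in multi-line mode, where format_firstline is the regular expression that identifies the first line of a record and format1 is the regular expression for the record content. Multiple sources can be configured. Because the log format here is fairly complex, it is worth validating the regular expressions with an online tool first to avoid repeated failed attempts.

read_from_head: when true, reading starts from the beginning of the file; the default is false. Note that on first onboarding, the volume of historical logs may trigger a circuit_breaking_exception in es. Setting it to false is recommended. If the file specified by pos_file already exists, it must be deleted before fluentd will start reading from the end of the file.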

filter

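A filter block processes the records whose tag matches its pattern. Here we use the @type record_transformer plugin, which can transform the fields of a record: adding, deleting, and modifying them. If expressions need to be evaluated, enable_ruby true must be set. If a field may be empty, use dig, for example ${record.dig("position")}, to avoid exceptions. See https://docs.fluentd.org/filter/record_transformer for details.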
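
As a rough sketch of the points above, the four record_transformer filters in the sample could be folded into a single block, since one <record> section can set several fields. This assumes each placeholder is evaluated against the incoming record and tag, so the fields do not depend on one another; the field names are the ones from the sample config.

```
<filter **>
  @type record_transformer
  # needed because record.dig(...) is a Ruby expression
  enable_ruby true
  <record>
    # dig returns nil instead of raising when the field is missing
    message ${record.dig("position")}:${record.dig("message")}
    # which tag segment holds what depends on the tail path layout
    namespace ${tag_parts[4]}
    module ${tag_parts[5]}
    service_name ${tag_parts[4]}:${tag_parts[5]}
  </record>
</filter>
```

Keeping the fields in one filter means each record is transformed once, at the cost of enabling Ruby evaluation for all of them.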

match

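A match block matches tags and sends the records to an output, here specified with @type elasticsearch. If the settings described below involve expressions, such as logstash_prefix logstash-${tag_parts[3]}-test, use @type elasticsearch_dynamic instead.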
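
A minimal sketch of such a match block, assuming the same placeholder connection values (xx) as the sample config. Which tag segment sits at index 3 depends on how the tail path maps onto the tag, so the index here is only illustrative.

```
<match **>
  # the dynamic variant evaluates ${...} expressions per record
  @type elasticsearch_dynamic
  host xx
  port xx
  user xx
  password xx
  type_name _doc
  logstash_format true
  # index prefix derived from one segment of the tag
  logstash_prefix logstash-${tag_parts[3]}-test
</match>
```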

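Pay attention to configuration details that depend on the es version. For es 7.5 and above, type_name is required; otherwise the system reports the error: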

as the final mapping would have more than 1 type: [_doc, fluentd]

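host, port, user, password: fill in the details of the ES cluster you have provisioned.
logstash_format: when true, fluentd forwards the structured log data in logstash format.
logstash_prefix: the index name prefix. A ${tag} placeholder can be added to distinguish data sources, for example data from different environments.
Alternatively, specify the index name with index_name logstash.${tag}.%Y%m%d.
buffer: configures the buffer; its contents are flushed to es periodically.
If the log volume is too large, a BufferOverflowError may occur. Setting overflow_action drop_oldest_chunk in the buffer section (the parameter was named buffer_queue_full_action in Fluentd v0.12) resolves this by discarding the oldest chunks.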
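
The index_name and buffer overflow options above can be sketched together as follows. This assumes Fluentd v1 buffer syntax, where the overflow setting is called overflow_action, and assumes that the ${tag} and %Y%m%d placeholders in index_name need tag and time as buffer chunk keys; the timekey value is a hypothetical choice, and logstash_format is left at its default so that index_name takes effect.

```
<match **>
  @type elasticsearch
  host xx
  port xx
  # placeholders are resolved from the chunk keys declared on <buffer>
  index_name logstash.${tag}.%Y%m%d
  <buffer tag, time>
    @type file
    path /var/log/fluentd/buffer
    # group events into one-hour chunks (assumed value)
    timekey 1h
    total_limit_size 256MB
    # when the buffer is full, drop the oldest chunk instead of raising
    overflow_action drop_oldest_chunk
  </buffer>
</match>
```

Dropping the oldest chunks trades log completeness for keeping the collector running, so it suits logs where recent data matters most.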

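In my experience, fluentd configuration can run into quite a few problems, and answers to most of them can be found in the official documentation.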

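2. Deploy the fluentd service

Mount the log path

The disk path where the business services write their logs must be mounted into the fluentd container as a data volume backed by a local host path.

Mount the configuration file

Mount the configmap created in step 1 into the container. Use the fluentd-elasticsearch image for fluentd. For the data volume mounts, pay attention to the volumes and volumeMounts settings.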

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "6"
    description: fluentd log collection component
  generation: 7
  labels:
    k8s-app: fluentd-es
    qcloud-app: fluentd-es
  name: fluentd-es
  namespace: kube-public
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: fluentd-es
      qcloud-app: fluentd-es
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: fluentd-es
        qcloud-app: fluentd-es
    spec:
      containers:
      - image: ccr.ccs.tencentyun.com/k8s-comm/fluentd-elasticsearch:v2.5.2
        imagePullPolicy: IfNotPresent
        name: fluentd-es
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 256Mi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log
          name: varlog
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
        - mountPath: /etc/fluent/config.d
          name: config-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: efk
      serviceAccountName: efk
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/log
          type: DirectoryOrCreate
        name: varlog
      - hostPath:
          path: /var/lib/docker/containers
          type: DirectoryOrCreate
        name: varlibdockercontainers
      - configMap:
          defaultMode: 420
          name: td-agent-config
        name: config-volume
  updateStrategy:
    type: OnDelete