mplachter.flume
An Ansible role for deploying and configuring Apache Flume
- Deploys Apache Flume
- Configures Apache Flume
- Creates the apache-flume service
  - Runs only the "agent" configuration
Requirements
- To run
  - Ansible 2.3+
- To test
  - Docker/Vagrant
  - Molecule = 1.2.5
Role Variables
- Java
    java_heap_xms: 125
    java_heap_xmx: 250
- Apache Flume
    mirror_url: http://apache.mirrors.ionfish.org/flume
    version: 1.7.0
- Linux installation folders/paths
    download_path: /tmp
    installation_path: /usr/local
    owner: root
    group: root
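- Apache Flume configuration
  - Please refer to the Flume User Guide
  - The current configuration allows
    - Agents
      - Multiple agents
      - A service is created only for "agent"
        - For now, additional services must be created manually to run the other agents
    - Sources
      - Currently only one source per agent
    - Channels
      - Currently only one channel per agent
    - Sinks
      - Multiple sinks
    - Sink groups
      - All sinks are added to the sink group
  - Because Flume has a large number of configuration options, please read the following
    - In Flume property names, replace "." with "_"
      - Values do not need to be changed
      - Example
          kafka_consumer_group_id: testflume
      - Result
          agent.source.kafka.consumer.group.id = testflume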
    - The apache_flume_config variable can be passed to copy an existing configuration file
      - Example
          apache_flume_config: file/flume-conf.properties
      - Result
        - flume-conf.properties is copied from your files directory to the target machine
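The naming rule above can be illustrated with a trimmed-down agent definition. This is a minimal sketch rather than a tested configuration: the key layout follows the example variables in this README, while the names memoryChannel and hdfsSink and the HDFS path are hypothetical placeholders. Each comment names the Flume property that the key stands for once "_" is read as ".".

```yaml
# Hypothetical minimal agent definition; values are illustrative only.
agents:
  - name: agent
    source:
      name: kafkaSource
      type: org.apache.flume.source.kafka.KafkaSource
      kafka_consumer_group_id: testflume   # Flume property: kafka.consumer.group.id
      kafka_bootstrap_servers:             # Flume property: kafka.bootstrap.servers
        - 127.0.0.1:9092
      kafka_topics:                        # Flume property: kafka.topics
        - topic1
    channel:
      name: memoryChannel                  # hypothetical channel name
      type: memory
      capacity: 1000                       # no "." in the Flume name, so nothing to replace
    sinks:
      - name: hdfsSink                     # hypothetical sink name
        type: hdfs
        hdfs_path: hdfs://namenode.example.com/flume/events   # hdfs.path (hypothetical path)
        hdfs_rollInterval: 30              # Flume property: hdfs.rollInterval
    sink_group:
      name: sinkgroup1
      processor_type: load_balance         # Flume property: processor.type
      processor_selector: round_robin      # Flume property: processor.selector
```

Only the property names are rewritten; values such as testflume or round_robin are passed through unchanged.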
- Extra variables
  - HDFS native libraries
      hdfs_libs: true
    - This downloads the HDFS native libraries and places them in plugin.d/hdfs/native/
- Example variables
    mirror_url: http://apache.mirrors.ionfish.org/flume
    version: 1.7.0
    download_path: /tmp
    installation_path: /usr/local
    owner: root
    group: root
    java_heap_xms: 125
    java_heap_xmx: 250
    hdfs_libs: true
    agents:
      - name: agent
        source:
          name: kafkaSource
          type: org.apache.flume.source.kafka.KafkaSource
          kafka_consumer_group_id: flume
          kafka_consumer_auto_offset_reset: latest
          kafka_consumer_max_partition_fetch_bytes: 1048576
          kafka_consumer_heartbeat_interval_ms: 3000
          kafka_consumer_session_timeout_ms: 30000
          kafka_consumer_request_timeout_ms: 40000
          kafka_consumer_fetch_max_wait_ms: 500
          kafka_bootstrap_servers:
            - 127.0.0.1:9092
            - 0.0.0.0:9092
          kafka_topics:
            - topic1
            - topic2
        channel:
          name: kakfaChannel
          type: memory
          capacity: 1000000
          transactionCapacity: 100000
        sinks:
          - name: kafkaHDFSSink1
            type: hdfs
            hdfs_path: "s3n://GFGJFSHFJHFGFHSBJ:fdjhSFUYGSF65678+-saigfew123@hdfs/%{topic}/%y/%m/%d/%H"
            hdfs_filePrefix: FlumeData
            hdfs_inUseSuffix: .tmp
            hdfs_rollInterval: 30
            hdfs_rollSize: 1024
            hdfs_rollCount: 10
            hdfs_idleTimeout: 0
            hdfs_batchSize: 100
            hdfs_fileType: "SequenceFile"
            hdfs_maxOpenFiles: 5000
            hdfs_callTimeout: 10000
            hdfs_threadsPoolSize: 10
            hdfs_rollTimerPoolSize: 1
            hdfs_round: false
            hdfs_roundValue: 1
            hdfs_roundUnit: second
            hdfs_timeZone: Local Time
            hdfs_useLocalTimeStamp: false
            hdfs_closeTries: 0
            hdfs_retryInterval: 180
          - name: kafkaHDFSSink2
            type: hdfs
            hdfs_path: "s3n://GFGJFSHFJHFGFHSBJ:fdjhSFUYGSF65678+-saigfew123@hdfs/%{topic}/%y/%m/%d/%H"
            hdfs_filePrefix: FlumeData
            hdfs_inUseSuffix: .tmp
            hdfs_rollInterval: 30
            hdfs_rollSize: 1024
            hdfs_rollCount: 10
            hdfs_idleTimeout: 0
            hdfs_batchSize: 100
            hdfs_fileType: "SequenceFile"
            hdfs_maxOpenFiles: 5000
            hdfs_callTimeout: 10000
            hdfs_threadsPoolSize: 10
            hdfs_rollTimerPoolSize: 1
            hdfs_round: false
            hdfs_roundValue: 1
            hdfs_roundUnit: second
            hdfs_timeZone: Local Time
            hdfs_useLocalTimeStamp: false
            hdfs_closeTries: 0
            hdfs_retryInterval: 180
        sink_group:
          name: sinkgroup1
          processor_type: load_balance
          processor_backoff: false
          processor_selector: round_robin
Dependencies
- andrewrothstein.java-oracle-jre
Example Playbook
- hosts: all
  roles:
    - role: mplachter.flume
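Role variables can also be supplied next to the role entry in the playbook. The following is a minimal sketch, assuming the role reads apache_flume_config and hdfs_libs as described under Role Variables and that flume-conf.properties already exists at the path the role expects; it has not been run against this role.

```yaml
# Sketch only: passes two of the documented variables as role parameters.
- hosts: all
  roles:
    - role: mplachter.flume
      # Copy a ready-made Flume configuration (path taken from the README example).
      apache_flume_config: file/flume-conf.properties
      # Also download the HDFS native libraries.
      hdfs_libs: true
```

Passing the values as role parameters keeps them scoped to this role, and role parameters take precedence over the role's own defaults and vars.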
License
MIT
Author Information
Matthew Plachter