09.04 Kafka Source Code: KafkaConsumer Consumption Handling

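The Kafka consumer client reads messages from a Kafka cluster and processes them.

A Kafka consumer can either bind itself manually to specific partitions of a topic, or call subscribe on a topic and have partitions bound to it automatically. Once bound to a partition, the consumer connects to that partition's leader, sends fetch requests, and processes the messages it receives.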

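Offset management

In Kafka's consumption model, a partition is consumed by at most one consumer within a consumer group, and the offset is controlled by the consumer: for example, seek can move forward or backward to a given offset.

On first connection, when there is no committed offset yet, the auto.offset.reset setting of KafkaConsumer decides whether consumption starts from the latest position (the default) or from the earliest position.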

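By default enable.auto.commit is true, so the KafkaConsumer client commits offsets periodically. One thing to watch out for: if the ConsumerRecords returned by poll are processed asynchronously, an offset may be committed before processing has finished, and if the process then dies, messages are lost after the restart. There are two ways to deal with this. The first is to process synchronously after poll; the commit is then only attempted on the next poll, which guarantees at-least-once semantics. The second is to turn off enable.auto.commit and commit offsets manually with KafkaConsumer.commitSync.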
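
The loss scenario above comes down to the order of "commit" and "finish processing". The following is a toy model of that ordering, not code that talks to Kafka: CommitOrderSketch, run and its parameters are hypothetical names, the log is just the integers 0..n-1, and the crash is assumed to happen after exactly one record of the first batch has been processed.

```java
import java.util.ArrayList;
import java.util.List;

class CommitOrderSketch {
    // Returns the offsets that were processed across one crash and one restart.
    // 'committed' models the offset a restarted consumer resumes from.
    static List<Integer> run(int logSize, int batchSize, boolean commitBeforeProcessing) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;

        // First run: fetch one batch.
        int start = committed;
        int end = Math.min(start + batchSize, logSize);
        if (commitBeforeProcessing) {
            committed = end; // offset committed while the batch is still being worked on
        }
        // Assumed crash point: only the first record of the batch finishes.
        if (start < end) {
            processed.add(start);
        }
        // With commit-after-processing the commit is never reached, so 'committed' stays at 'start'.

        // Restart: resume from the committed offset and process everything that remains.
        for (int offset = committed; offset < logSize; offset++) {
            processed.add(offset);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(5, 3, true));  // prints [0, 3, 4]: offsets 1 and 2 are lost
        System.out.println(run(5, 3, false)); // prints [0, 0, 1, 2, 3, 4]: offset 0 is repeated, nothing is lost
    }
}
```

The second call shows why committing only after processing gives at-least-once rather than exactly-once behaviour: a record may be handled twice, so processing should tolerate duplicates.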

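The max.poll.interval.ms setting is the maximum time a Kafka consumer may spend handling the results of one poll, that is, the maximum interval between two poll calls (300 s by default). If this time is exceeded, the consumer is considered dead and a rebalance is triggered.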
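
As a rough illustration, the settings discussed in this section could be collected in a java.util.Properties object like the one below. The property keys are the ones named in the text plus the usual connection settings; the broker address, group id, deserializer choice and the specific values are assumptions made for the example.

```java
import java.util.Properties;

class ConsumerConfigSketch {
    // Builds a configuration with auto-commit disabled and an explicit reset policy.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "demo-group");              // assumed group name
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("auto.offset.reset", "earliest");   // start from the earliest offset when none is committed
        props.setProperty("enable.auto.commit", "false");     // commit manually, e.g. with commitSync
        props.setProperty("max.poll.interval.ms", "300000");  // the 300 s default, written out explicitly
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

An object like this can be passed to the KafkaConsumer constructor that accepts Properties.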

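Consumer threading

There are a few ways to use multiple threads on the consumer side:

One thread per consumer. The advantages are that ordering within a partition is preserved and the implementation is simple; the drawback is that concurrency is limited by the number of partitions.
Separate consumption from processing: the consumer fetches a message and hands it to another thread or a thread pool. This increases concurrency, but ordering is no longer guaranteed across the worker threads, the consumer may commit an offset before the pool has finished processing, and slow processing can cause the in-memory queue of the pool to back up.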
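
One possible way to keep the in-memory backlog under control in the second approach is a thread pool with a bounded queue. The sketch below uses only java.util.concurrent; BoundedWorkerPoolSketch, processAll and handle are hypothetical names, and the list of strings stands in for records returned by poll.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedWorkerPoolSketch {
    // Placeholder for real message processing.
    static void handle(String rec) {
    }

    // Hands every record to a pool with a bounded queue and returns how many were processed.
    static int processAll(List<String> records, int workers, int queueCapacity) {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting (polling) thread runs the task itself,
                // which slows fetching down instead of letting the backlog grow.
                new ThreadPoolExecutor.CallerRunsPolicy());
        for (String rec : records) {
            pool.execute(() -> {
                handle(rec);
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            // Offsets for this batch should only be committed once all workers have finished.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "b", "c", "d", "e"), 2, 1));
    }
}
```

Running tasks on the polling thread delays the next poll, so in a real consumer the batch still has to finish within max.poll.interval.ms.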

KafkaConsumer.subscribe

Subscribes to the given topics.

subscribe(Collection<String> topics, ConsumerRebalanceListener listener)

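When the consumer relies on the Kafka cluster to manage group membership, the ConsumerRebalanceListener is invoked on every consumer rebalance. A rebalance happens whenever the set of consumers or the consumption assignment changes:

a consumer process dies
a new consumer process joins
the number of partitions changes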

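A common use of this listener is to persist the latest consumed offset of each partition. In void onPartitionsRevoked(java.util.Collection<TopicPartition> partitions), save the current partitions and their offsets to a database. Once reassignment completes, void onPartitionsAssigned(java.util.Collection<TopicPartition> partitions) reads the previous positions back from the database and calls seek to continue consuming from there.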
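
The save-on-revoke, restore-on-assign flow can be modelled without the Kafka client. In the sketch below everything is a stand-in: OffsetStoreSketch is a hypothetical class, a shared Map plays the role of the database, partitions are plain strings such as "orders-0", and starting at offset 0 for a partition with no saved position is an assumption of the example. A real implementation would put the same logic inside a ConsumerRebalanceListener and call seek on the consumer.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class OffsetStoreSketch {
    private final Map<String, Long> database;                   // stand-in for an external offset table
    private final Map<String, Long> positions = new HashMap<>(); // stand-in for the consumer's positions

    OffsetStoreSketch(Map<String, Long> database) {
        this.database = database;
    }

    // Records that the next offset to read from this partition is nextOffset.
    void advance(String partition, long nextOffset) {
        positions.put(partition, nextOffset);
    }

    // Mirrors onPartitionsRevoked: persist the position of every partition being taken away.
    void onPartitionsRevoked(Collection<String> partitions) {
        for (String p : partitions) {
            Long pos = positions.remove(p);
            if (pos != null) {
                database.put(p, pos);
            }
        }
    }

    // Mirrors onPartitionsAssigned: look up the saved position and "seek" to it.
    void onPartitionsAssigned(Collection<String> partitions) {
        for (String p : partitions) {
            positions.put(p, database.getOrDefault(p, 0L)); // 0 is an assumed default
        }
    }

    long position(String partition) {
        return positions.get(partition);
    }

    public static void main(String[] args) {
        Map<String, Long> db = new HashMap<>();
        OffsetStoreSketch first = new OffsetStoreSketch(db);
        first.advance("orders-0", 42L);
        first.onPartitionsRevoked(Arrays.asList("orders-0"));

        OffsetStoreSketch second = new OffsetStoreSketch(db);
        second.onPartitionsAssigned(Arrays.asList("orders-0"));
        System.out.println(second.position("orders-0")); // prints 42
    }
}
```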

KafkaConsumer.poll

    public ConsumerRecords<K, V> poll(long timeout) {
        // KafkaConsumer is not thread-safe
        acquireAndEnsureOpen();
        try {
            if (timeout < 0)
                throw new IllegalArgumentException("Timeout must not be negative");

            if (this.subscriptions.hasNoSubscriptionOrUserAssignment())
                throw new IllegalStateException("Consumer is not subscribed to any topics or assigned any partitions");

            // poll for new data until the timeout expires
            long start = time.milliseconds();
            long remaining = timeout;
            do {
                Map<TopicPartition, List<ConsumerRecord<K, V>>> records = pollOnce(remaining);
                if (!records.isEmpty()) {
                    // before returning the fetched records, we can send off the next round of fetches
                    // and avoid block waiting for their responses to enable pipelining while the user
                    // is handling the fetched records.
                    // NOTE: since the consumed position has already been updated, we must not allow
                    // wakeups or any other errors to be triggered prior to returning the fetched records.
                    if (fetcher.sendFetches() > 0 || client.hasPendingRequests())
                        client.pollNoWakeup();

                    if (this.interceptors == null)
                        return new ConsumerRecords<>(records);
                    else
                        return this.interceptors.onConsume(new ConsumerRecords<>(records));
                }

                long elapsed = time.milliseconds() - start;
                remaining = timeout - elapsed;
            } while (remaining > 0);

            return ConsumerRecords.empty();
        } finally {
            release();
        }
    }
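
The timeout handling in poll is a small pattern of its own: try once, and keep retrying with the remaining budget until data arrives or the budget is used up. A stripped-down version, with no Kafka types, might look like this. TimedPollSketch and the LongFunction parameter are hypothetical; System.currentTimeMillis replaces the client's internal clock.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.LongFunction;

class TimedPollSketch {
    // Calls pollOnce with the remaining time budget until it returns data or the timeout is used up.
    // The do/while guarantees at least one attempt, even when timeoutMs is 0.
    static <T> List<T> poll(long timeoutMs, LongFunction<List<T>> pollOnce) {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("Timeout must not be negative");
        }
        long start = System.currentTimeMillis();
        long remaining = timeoutMs;
        do {
            List<T> records = pollOnce.apply(remaining);
            if (!records.isEmpty()) {
                return records;
            }
            long elapsed = System.currentTimeMillis() - start;
            remaining = timeoutMs - elapsed;
        } while (remaining > 0);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical source that is empty twice and then yields one record.
        List<String> result = TimedPollSketch.<String>poll(1000, remaining ->
                ++calls[0] < 3 ? Collections.<String>emptyList() : Collections.singletonList("record"));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```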

pollOnce handling

    private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollOnce(long timeout) {
        client.maybeTriggerWakeup();

        // let the coordinator poll once; inside, auto.commit.interval.ms decides whether offsets are auto-committed
        coordinator.poll(time.milliseconds(), timeout);

        // fetch positions if we have partitions we're subscribed to that we
        // don't know the offset for
        if (!subscriptions.hasAllFetchPositions())
            updateFetchPositions(this.subscriptions.missingFetchPositions());

        // if data is available already, return it immediately
        Map<TopicPartition, List<ConsumerRecord<K, V>>> records = fetcher.fetchedRecords();
        if (!records.isEmpty())
            return records;

        // send any new fetches (won't resend pending fetches)
        fetcher.sendFetches();

        long now = time.milliseconds();
        long pollTimeout = Math.min(coordinator.timeToNextPoll(now), timeout);

        // wait for the fetch response
        client.poll(pollTimeout, now, new PollCondition() {
            @Override
            public boolean shouldBlock() {
                // since a fetch might be completed by the background thread, we need this poll condition
                // to ensure that we do not block unnecessarily in poll()
                return !fetcher.hasCompletedFetches();
            }
        });

        // after the long poll, we should check whether the group needs to rebalance
        // prior to returning data so that the group can stabilize faster
        if (coordinator.needRejoin())
            return Collections.emptyMap();

        // return the fetched records
        return fetcher.fetchedRecords();
    }
