Kinesis Consumers

Joshua Callis
3 min readApr 14, 2023

A useful overview of the Kinesis consumer / all the ways to consumer data. I’ve kept it to the point.

Kinesis SDK

  • Records are pulled directly from a shard.
  • Each shard has a maximum 2MB/s total aggregate (grouped data) throughput.
  • Example: if we had 3 shards, that would mean we are able to pull 6MB data for downstream per second.
  • Pull mechanism. GetRecords call can return up-to 10MB of data from a single shard. But because this is greater than the 2MB/s limit, you will have to wait for 5 seconds before you can make another call with GetRecords. Alternatively can return up-to 10,000 records.
  • Maximum of 5 GetRecord calls per shard per second, with a 200ms latency. This will mean that your consumer cannot continuously call GetRecords, it can only do this 5 times per second.
  • If you have multiple consumers ‘consuming’ from the same shard, each consumer will still have the 2MB/s limit. Kinesis fan out will solve this.

Kinesis Consumer Enhanced Fan Out

  • Works with KCL 2.0 and AWS Lambda ≥ 2018
  • By default can have 20 consumers per stream.
  • Each consumer can consume 2MB/s per shard.
  • Without using enhanced fan out, you will have a 2MB/s limit per shard.
  • With enhanced fan out, each consumer will increase the amount of data that can be consumed per second. Example, if we had 10 consumers reading from one shard we would get 20MB/s per shard as the consumers will be aggregated. This is done via SubscribeToShard()
  • Uses HTTP/2 to push data to consumers and reduce latency as a result.
  • Compared to standard consumers 200ms latency. Fan out provides on average 70ms latency.
  • Use a standard consumer when you have a low number of consumers and can tolerate 200ms latency/ 5 GetRecords per second. Use fan out when you have multiple consumers for the same stream, require very low latency and can’t have the 5 GetRecord call limit with a 200ms latency for the next invocation.

Kinesis Consumer Library (KCL)

  • Usually used with Java but available to use with other languages.
  • Share multiple shards with multiple consumers, in one “group” — this is called shard discovery.
  • Checkpointing feature, this ensures if a consumer fails. It will resume progress when healthy again.
  • Leverages DynamoDB for coordination and checkpointing. You will end up with one row per shard.
  • Be sure to provision your DynamoDB correctly or if unsure on the read/write capacity needed. Use the serverless on-demand offering. If DynamoDB doesn’t have enough write/read capacity, it will lag your consumers etc. You will get an ExpiredIteratorEception, meaning you need to increase your write capacity.

Kinesis Connector Library

  • Older Java Library from 2016, uses KCL under the hood.
  • Used to write data directly to Amazon S3, DynamoDB, OpenSearch, Redshift.
  • Must run on an EC2 instance.
  • Worth noting, but this has been replaced by Kinesis Firehose/lambda

Lambda

  • Read records from a data stream
  • Has a small library to de-aggregate from the KPL
  • Used to do light weight ETL for Amazon s3, Redshift etc
  • Good to use when you want to read data from Kinesis data streams in real time and trigger notifications, such as sending SMS messages/emails etc.
  • Can set a batch size, this will allow you to tell lambda how much data it should read from the stream per second.

--

--

Joshua Callis

Converted DevOps Engineer at oso.sh, Previously a Senior Software Engineer.