Apache Kafka has default message size limits that can be configured to handle larger payloads, but there are important considerations and best practices to follow.
## Default message size limits

By default, Kafka enforces a limit of roughly 1 MB per message at every stage of the pipeline:

- Producer: 1 MB (`max.request.size`)
- Broker: 1 MB (`message.max.bytes`)
- Topic: inherits the broker setting (`max.message.bytes`)
- Consumer: 1 MB (`max.partition.fetch.bytes`)
## Configuring Kafka for large messages

To send messages larger than 1 MB, you must raise the limit on every component the message passes through: producer, broker, topic, and consumer.

### Producer configuration

```properties
# Maximum request size for the producer (10 MB)
max.request.size=10485760

# Increase buffer memory if needed (64 MB)
buffer.memory=67108864
```
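If you configure the producer in application code instead of a properties file, the same keys go into the client `Properties`. A minimal Java sketch, with an illustrative topic name and placeholder payload:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class LargeMessageProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        props.put("max.request.size", "10485760"); // 10 MB; must not exceed broker/topic limits
        props.put("buffer.memory", "67108864");    // 64 MB of producer-side buffering

        byte[] largePayload = new byte[5 * 1024 * 1024]; // placeholder 5 MB payload

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("large-topic", "key-1", largePayload));
        } // close() flushes any buffered records
    }
}
```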
### Broker configuration

These settings go in each broker's `server.properties`:

```properties
# Maximum message (record batch) size the broker will accept (10 MB)
message.max.bytes=10485760

# Maximum fetch size for replication; must be at least message.max.bytes
# or replicas may be unable to fetch large messages
replica.fetch.max.bytes=10485760

# Larger TCP socket buffers help move big payloads over the network (1 MB)
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
```
### Topic configuration

Per-topic overrides take precedence over the broker-wide default:

```bash
kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name large-topic \
  --add-config max.message.bytes=10485760
```
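The same override can be applied programmatically with the Kafka `AdminClient`; a sketch, assuming the topic `large-topic` already exists:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RaiseTopicLimit {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "large-topic");
            AlterConfigOp raiseLimit = new AlterConfigOp(
                    new ConfigEntry("max.message.bytes", "10485760"), AlterConfigOp.OpType.SET);
            // Apply the per-topic override and block until the brokers acknowledge it.
            admin.incrementalAlterConfigs(Map.of(topic, Collections.singleton(raiseLimit)))
                 .all().get();
        }
    }
}
```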
### Consumer configuration

```properties
# Maximum data fetched per partition per request (10 MB)
max.partition.fetch.bytes=10485760

# Maximum data returned per fetch request across all partitions (50 MB)
fetch.max.bytes=52428800
```
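In application code the consumer side looks like this; a minimal sketch with an illustrative group id and topic name:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LargeMessageConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "large-message-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());
        props.put("max.partition.fetch.bytes", "10485760"); // 10 MB per partition
        props.put("fetch.max.bytes", "52428800");           // 50 MB per fetch request

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("large-topic"));
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, byte[]> record : records) {
                System.out.printf("offset=%d size=%d bytes%n",
                        record.offset(), record.value().length);
            }
        }
    }
}
```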
## Performance implications

Sending large messages through Kafka has several performance implications:
### Memory usage
- Larger messages consume more memory on brokers, producers, and consumers
- Can lead to increased garbage collection pressure
- May require tuning JVM heap sizes
### Network bandwidth
- Large messages consume more network bandwidth
- Can lead to network congestion and timeouts
- May require adjusting network buffer sizes
### Disk I/O
- Larger messages increase the volume of data written to and read from disk
- Can impact log compaction performance
- May require faster storage systems
### Throughput impact
- Large messages generally reduce overall throughput
- Kafka is optimized for high-throughput, small messages
- Consider producer batching strategies, e.g. tuning `batch.size` and `linger.ms` (see the sketch below)
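One way to recover throughput is producer-side batching. The values below are illustrative starting points, not recommendations; tune them against your own workload:

```java
import java.util.Properties;

public class BatchingTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Larger batches amortize per-request overhead across many records.
        props.put("batch.size", "131072"); // 128 KB buffer per partition
        // Let the producer wait briefly so batches have time to fill.
        props.put("linger.ms", "20");
        // Compression is applied per batch, so fuller batches compress better.
        props.put("compression.type", "zstd");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```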
## Alternative approaches

Instead of sending large messages directly, consider these alternatives:
### 1. External storage pattern

Store large payloads in an external system (often called the claim-check pattern) and send only references through Kafka:

```json
{
  "id": "message-123",
  "timestamp": "2023-01-01T00:00:00Z",
  "data_location": "s3://bucket/path/to/large-file.json",
  "metadata": {
    "size": 50000000,
    "checksum": "abc123"
  }
}
```
Benefits:
- Keeps Kafka messages small and fast
- Allows for separate scaling of storage and messaging
- Enables efficient caching strategies
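A minimal producer-side sketch of this pattern. It assumes the large payload has already been uploaded to object storage by some other client; only the small reference document travels through Kafka, and the topic name is illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ClaimCheckProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Reference document pointing at the externally stored payload.
        String reference = """
                {
                  "id": "message-123",
                  "data_location": "s3://bucket/path/to/large-file.json",
                  "metadata": { "size": 50000000, "checksum": "abc123" }
                }""";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "message-123", reference));
        }
    }
}
```

Consumers read the reference, fetch the payload from storage on demand, and can verify it against the checksum.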
### 2. Message splitting

Break large messages into smaller chunks and reassemble them on the consumer side:

```json
{
  "message_id": "msg-123",
  "chunk_id": "chunk-1",
  "total_chunks": 5,
  "chunk_data": "...",
  "sequence": 1
}
```
Benefits:
- Works within default Kafka limits
- Allows for parallel processing
- Provides better error recovery
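A producer-side sketch of chunking, with the chunk metadata carried in record headers rather than the JSON envelope above; the topic name and sizes are illustrative. Keying every chunk with the same message id routes all chunks to one partition, so they arrive in order:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ChunkingProducer {
    static final int CHUNK_SIZE = 900 * 1024; // stays under the 1 MB default with headroom

    public static void main(String[] args) {
        byte[] payload = new byte[5 * 1024 * 1024]; // placeholder 5 MB payload

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        int totalChunks = (payload.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < totalChunks; i++) {
                byte[] chunk = Arrays.copyOfRange(
                        payload, i * CHUNK_SIZE, Math.min((i + 1) * CHUNK_SIZE, payload.length));
                // Same key for every chunk -> same partition -> in-order delivery.
                ProducerRecord<String, byte[]> record =
                        new ProducerRecord<>("chunked-topic", "msg-123", chunk);
                record.headers()
                      .add("chunk_id", Integer.toString(i).getBytes(StandardCharsets.UTF_8))
                      .add("total_chunks",
                              Integer.toString(totalChunks).getBytes(StandardCharsets.UTF_8));
                producer.send(record);
            }
        }
    }
}
```

A consumer buffers records by key until all `total_chunks` pieces have arrived, then concatenates them.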
### 3. Compression

Enable compression on the producer to shrink messages before they reach the broker:

```properties
# Producer compression codec (gzip, lz4, and zstd are also supported)
compression.type=snappy
```
Benefits:
- Reduces network bandwidth usage
- Decreases storage requirements
- Often improves throughput
## Best practices

### Recommendations for large messages
- **Avoid large messages when possible**: Kafka is optimized for small, high-throughput messages
- **Use external storage**: store large payloads externally and reference them in Kafka messages
- **Enable compression**: always enable compression for large messages
- **Monitor memory usage**: ensure adequate heap sizing for all components
- **Test thoroughly**: verify the performance impact in your specific environment
### Configuration checklist

When configuring for large messages, make sure all of these settings are aligned; in particular, `replica.fetch.max.bytes` should be at least as large as `message.max.bytes`:

- ✅ Producer `max.request.size`
- ✅ Broker `message.max.bytes`
- ✅ Topic `max.message.bytes`
- ✅ Consumer `max.partition.fetch.bytes`
- ✅ Consumer `fetch.max.bytes`
- ✅ Broker `replica.fetch.max.bytes`
### Monitoring considerations
Monitor these metrics when working with large messages:
- Memory usage on brokers, producers, and consumers
- Network bandwidth utilization
- Disk I/O patterns and latency
- Garbage collection frequency and duration
- Message throughput and latency
> **Performance impact:** Large messages can significantly impact Kafka performance. Always test in a staging environment that mirrors your production setup before deploying large message configurations.