Choosing the right replication factor and partition count is crucial for Kafka topic performance, availability, and scalability. This guide covers the key considerations and best practices.
Replication factor
The replication factor determines how many copies of your data are maintained across the Kafka cluster.
How replication works
- Each topic partition has one leader and zero or more followers
- All reads and writes go through the leader
- Followers replicate data from the leader
- If a leader fails, one of the followers becomes the new leader
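As a minimal sketch (the broker address matches the examples later in this guide; the topic name orders is hypothetical), creating a topic with three replicas and then describing it shows which broker leads each partition and which replicas are in sync:
# Create a topic with 3 replicas, then inspect leader/follower placement
kafka-topics --create \
  --topic orders \
  --partitions 3 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092

# The Leader, Replicas, and Isr columns show the leader broker and the in-sync replicas
kafka-topics --describe \
  --topic orders \
  --bootstrap-server localhost:9092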
Choosing replication factor
For production environments:
- Minimum: 3 (recommended)
- Common: 3-5 depending on cluster size and requirements
- Maximum: Generally not more than 5-7
For development/testing:
- Minimum: 1 (acceptable for non-critical data)
- Recommended: 2-3 for realistic testing
Replication factor considerations
Fault tolerance formula: With a replication factor of N, you can tolerate up to N-1 broker failures while maintaining data availability.
- Replication factor 1: No fault tolerance (data loss if broker fails)
- Replication factor 3: Tolerates 2 broker failures
- Replication factor 5: Tolerates 4 broker failures
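Note that fault tolerance for writes also depends on producer and topic settings. A common pairing (shown here as a sketch, with a hypothetical topic name) is replication factor 3 with min.insync.replicas=2 and producers using acks=all, which keeps writes available and durable through a single broker failure:
# RF 3 with min.insync.replicas=2: acks=all writes succeed while at least
# 2 of the 3 replicas are in sync, so one broker can fail without losing acknowledged data
kafka-topics --create \
  --topic payment-events \
  --partitions 6 \
  --replication-factor 3 \
  --config min.insync.replicas=2 \
  --bootstrap-server localhost:9092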
Trade-offs:
Higher replication factor:
- ✅ Better fault tolerance and availability
- ✅ Higher data durability
- ❌ Increased storage requirements
- ❌ Higher network overhead
- ❌ Increased replication lag
Lower replication factor:
- ✅ Lower storage costs
- ✅ Reduced network overhead
- ❌ Reduced fault tolerance
- ❌ Higher risk of data loss
Partition count
Partitions enable Kafka to scale and parallelize data processing across multiple brokers and consumers.
How partitions work
- Topics are divided into partitions
- Each partition is an ordered, immutable sequence of messages
- Partitions are distributed across brokers
- Consumers can process partitions in parallel
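One practical consequence worth seeing firsthand: messages with the same key always hash to the same partition, which preserves per-key ordering. A quick sketch with the console producer (the topic name user-events is hypothetical):
# Produce keyed messages; records sharing a key land in the same partition
kafka-console-producer \
  --topic user-events \
  --bootstrap-server localhost:9092 \
  --property parse.key=true \
  --property key.separator=:
# Then type, for example:
#   user-42:signed_up
#   user-42:made_purchase   (same key -> same partition -> ordering preserved)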
Choosing partition count
Starting recommendations:
- Small topics (< 1GB/day): 1-3 partitions
- Medium topics (1-10GB/day): 6-12 partitions
- Large topics (> 10GB/day): 20+ partitions
Factors to consider:
1. Throughput requirements
Partitions needed ≥ Target throughput / Single partition throughput
Example: If you need 100MB/s and each partition handles 10MB/s, you need at least 10 partitions.
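To get the single-partition throughput number for this formula, one approach is to benchmark a one-partition test topic with the bundled perf tool (topic name, record count, and record size below are illustrative):
# Measure how many MB/s a single partition sustains on your hardware
kafka-topics --create \
  --topic perf-single-partition \
  --partitions 1 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092

kafka-producer-perf-test \
  --topic perf-single-partition \
  --num-records 1000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 acks=all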
2. Consumer parallelism
- Maximum consumers in a consumer group = Number of partitions
- More partitions = More potential parallelism
- Fewer partitions = Less parallelism but simpler management
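To see how partitions are spread across the members of a group (the group name my-consumer-group is hypothetical), describe the consumer group; any consumer beyond the partition count shows no assignment and sits idle:
# Shows which consumer owns which partition, plus per-partition lag
kafka-consumer-groups --describe \
  --group my-consumer-group \
  --bootstrap-server localhost:9092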
3. Broker distribution
- Partitions should be evenly distributed across brokers
- Each broker should handle a reasonable number of partitions
- Avoid having too many partitions per broker (recommended: < 4000)
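A rough way to check leader balance across brokers (the exact describe output format can vary slightly between Kafka versions):
# Count how many partition leaders each broker currently hosts
kafka-topics --describe --bootstrap-server localhost:9092 \
  | grep -o 'Leader: [0-9]*' \
  | sort | uniq -c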
Partition count considerations
Trade-offs:
More partitions:
- ✅ Higher potential throughput
- ✅ Better parallelism for consumers
- ✅ Better load distribution
- ❌ More overhead (file handles, memory)
- ❌ Longer leader election times
- ❌ More complex partition management
Fewer partitions:
- ✅ Lower resource overhead
- ✅ Simpler management
- ✅ Faster leader elections
- ❌ Limited throughput potential
- ❌ Less consumer parallelism
Best practices
Planning for growth
Partition planning: Start with enough partitions to handle 2-3x your expected peak throughput. It’s easier to start with more partitions than to add them later.
Partitions cannot be decreased after topic creation, so plan for growth:
- Estimate peak throughput needs for the next 2-3 years
- Add 50-100% buffer for unexpected growth
- Consider data retention and storage requirements
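If you do outgrow the initial count, partitions can be increased later (never decreased); keep in mind that adding partitions changes which partition a given key maps to, which can break per-key ordering on existing topics. A sketch, reusing the hypothetical orders topic:
# Raise the partition count to 24 (it can never be reduced afterwards)
kafka-topics --alter \
  --topic orders \
  --partitions 24 \
  --bootstrap-server localhost:9092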
Production recommendations
Replication factor:
- Use replication factor 3 for most production workloads
- Use replication factor 5 for critical data that cannot tolerate loss
- Never use replication factor 1 in production
Partition count:
- Start with 6-12 partitions for most new topics
- Scale up based on observed throughput requirements
- Aim for 10-100MB per partition per day as a rough guideline
Before finalizing partition and replication settings:
- Test with realistic data volumes
- Measure single partition throughput
- Test consumer group scaling
- Monitor resource usage (CPU, memory, disk I/O, network)
- Test failure scenarios (broker failures, network partitions)
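For the consumer-scaling and failure-scenario checks, the bundled consumer perf tool and the under-replicated-partitions filter are useful starting points (topic name and message count are illustrative):
# Measure end-to-end consumer throughput against a test topic
kafka-consumer-perf-test \
  --topic perf-single-partition \
  --messages 1000000 \
  --bootstrap-server localhost:9092

# After stopping a broker, this list should drain back to empty once replicas catch up
kafka-topics --describe \
  --under-replicated-partitions \
  --bootstrap-server localhost:9092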
Monitoring and adjustment
Key metrics to monitor:
- Throughput per partition
- Consumer lag by partition
- Broker resource utilization
- Replication lag
- Leader election frequency
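As a sketch of where to find these numbers (MBean names reflect recent Kafka broker versions and may differ in older releases; the consumer group name is hypothetical):
# Broker JMX MBeans that map to the metrics above:
#   Bytes in per topic      kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=<topic>
#   Under-replicated count  kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
#   Leader elections        kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs

# Per-partition consumer lag is easiest to read from the LAG column of:
kafka-consumer-groups --describe \
  --group my-consumer-group \
  --bootstrap-server localhost:9092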
Partition limits: Be aware of cluster-wide partition limits:
- Each broker has limits on total partitions (typically 2000-4000)
- ZooKeeper has metadata overhead for each partition
- Too many partitions can impact cluster stability
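A quick way to see how close a cluster is to those limits is to count partitions from the describe output (the format may vary slightly by version; per-broker counts can be checked with the leader-count pipeline shown earlier):
# Total number of partitions across all topics in the cluster
kafka-topics --describe --bootstrap-server localhost:9092 \
  | grep -c 'Partition: '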
Common patterns
High-throughput topics
# Large topic with many partitions for high throughput
kafka-topics --create \
  --topic high-throughput-topic \
  --partitions 24 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092
Low-volume, critical topics
# Small topic with high replication for critical data
kafka-topics --create \
  --topic critical-events \
  --partitions 3 \
  --replication-factor 5 \
  --bootstrap-server localhost:9092
Development/testing topics
# Simple topic for development
kafka-topics --create \
  --topic dev-topic \
  --partitions 1 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092