Each UMH Classic instance comes with a dedicated Redpanda deployment configured with default settings that may not match specific application requirements. Understanding Redpanda's retention mechanisms - such as retention.ms and segment.bytes - and customizing these settings is essential to optimize storage utilization and ensure high-performance data streaming within the UMH ecosystem.
Understanding Retention in Redpanda
Redpanda keeps data for a set period before deleting it. It works based on time (retention.ms) and size (segment.bytes). Here's a breakdown of how retention works:
- Topics and Partitions: Data in Redpanda is organized into topics, which are further divided into partitions. Each partition handles a subset of the data. See also “How we use Kafka in the United Manufacturing Hub” for an explanation of how Kafka works.
- Segments: Each partition is stored in its own directory (e.g., 0_18, 1_18) and is divided into segments, which are physical log files that store the records.
- Active and closed segments: Only one segment per partition is active (open for read/write). Once a segment reaches a specified size (segment.bytes) or age (segment.ms), it's closed and a new segment is created.
- Retention Triggers: After a segment is closed, the retention.ms timer starts. When the retention period expires, the segment is eligible for deletion.
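The retention trigger described above can be sketched as a simple check. This is a simplified model of the rule as stated in this guide, not Redpanda's actual implementation; the function name segment_eligible and the timestamps are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model: a closed segment becomes eligible for deletion once
# retention.ms has elapsed since it was closed. The active segment is never
# eligible, which is why rollover (segment.bytes / segment.ms) matters.

# Returns success (exit code 0) if the closed segment is eligible for deletion.
# Arguments: closed_at_ms now_ms retention_ms
segment_eligible() {
  local closed_at_ms=$1 now_ms=$2 retention_ms=$3
  [ $(( now_ms - closed_at_ms )) -ge "$retention_ms" ]
}

HOUR_MS=$(( 60 * 60 * 1000 ))

# Example: a segment closed at t=0, checked 2 hours later with a 1-hour retention.
if segment_eligible 0 $(( 2 * HOUR_MS )) "$HOUR_MS"; then
  echo "eligible for deletion"
else
  echo "still retained"
fi
```

The point of the sketch is that the clock only starts once a segment is closed, so a segment that never rolls over is never deleted.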
UMH Default Retention Settings
When running Redpanda within UMH, the default retention settings are as follows:
- Retention Duration (retention.ms): 7 days
- Segment Duration (segment.ms): 14 days
- Segment Size (segment.bytes): 128 MiB
These defaults apply to all new topics created within the UMH environment.
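Since these properties are set in milliseconds and bytes, it can help to see the defaults as raw values. The sketch below converts them and also derives an upper bound for how long a record can remain on disk. That bound assumes a low-throughput partition where segments are closed only by segment.ms, and it follows the model in this guide in which the retention timer starts when a segment is closed.

```shell
#!/usr/bin/env bash
# Convert the UMH default retention settings into raw property values.

DAY_MS=$(( 24 * 60 * 60 * 1000 ))

RETENTION_MS=$(( 7 * DAY_MS ))            # 7 days
SEGMENT_MS=$(( 14 * DAY_MS ))             # 14 days
SEGMENT_BYTES=$(( 128 * 1024 * 1024 ))    # 128 MiB

echo "retention.ms  = $RETENTION_MS"
echo "segment.ms    = $SEGMENT_MS"
echo "segment.bytes = $SEGMENT_BYTES"

# Assumption: the segment is closed by age only (it never reaches segment.bytes).
# The oldest record then waits segment.ms for the rollover plus retention.ms.
MAX_AGE_DAYS=$(( (SEGMENT_MS + RETENTION_MS) / DAY_MS ))
echo "worst-case record age: $MAX_AGE_DAYS days"
```

Under that assumption, data on a quiet topic can stay on disk for up to 21 days even though retention.ms is 7 days.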
Why adjust retention?
The default retention settings may lead to significant storage usage, especially under high message throughput. For instance:
- Example Scenario:
- Message Rate: 10,000 messages per second
- Retention Duration: 7 days
- Potential Storage Usage: ~400 GB per week
If your infrastructure lacks sufficient storage, adjusting retention settings can help manage and reduce disk usage effectively.
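The scenario above can be reproduced with simple arithmetic. The sketch below assumes an average of about 66 bytes per message on disk, which is the size implied by ~400 GB for 10,000 messages per second over 7 days; your real message size will differ. It ignores replication, compression, and index overhead, and the function name estimate_storage_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate: bytes on disk = messages/s * bytes/message * retention seconds.

# Arguments: messages_per_second bytes_per_message retention_seconds
estimate_storage_bytes() {
  local rate=$1 bytes_per_msg=$2 retention_s=$3
  echo $(( rate * bytes_per_msg * retention_s ))
}

DAY_S=$(( 24 * 60 * 60 ))
WEEK_S=$(( 7 * DAY_S ))
GB=1000000000

# Assumed average message size: 66 bytes (derived from the example above).
WEEK_BYTES=$(estimate_storage_bytes 10000 66 "$WEEK_S")
DAY_BYTES=$(estimate_storage_bytes 10000 66 "$DAY_S")

echo "7 days of retention:   $(( WEEK_BYTES / GB )) GB"
echo "24 hours of retention: $(( DAY_BYTES / GB )) GB"
```

With these assumptions, shortening retention from 7 days to 24 hours reduces the estimate from roughly 399 GB to roughly 57 GB.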
Step-by-Step Guide to Reducing Retention
1. Access the Redpanda Console
Start by accessing the Redpanda management console:
- Open your web browser.
- Go to http://<IP>:8090, replacing <IP> with your UMH's IP address.
2. Go to Topics
Once logged into the console:
- Click Topics on the left sidebar.
- You'll see a list of topics. Select the topic you want to configure, especially those using a lot of data.
3. Select and Configure a Topic
After selecting a topic:
- Click the Configuration tab for the chosen topic.
- Here you can adjust the topic's retention settings, as described in the following steps.
4. Adjust segment.ms and retention.ms
To optimize retention:
- Set segment.ms:
  - Default: 14 days
  - Recommended Change: 1 hour
  This makes a new segment every hour, so older segments can be deleted based on retention policies.
- Set retention.ms:
  - Default: 7 days
  - Recommended Change: 24 hours
  This reduces the retention period, ensuring that data older than 24 hours is eligible for deletion.
  retention.ms = 86400000 # 24 hours in milliseconds
- Understand segment.bytes:
  - Default: 128 MiB
  - A segment is closed once it reaches this size, even if segment.ms has not yet elapsed. Each closed segment is deleted after the retention period.
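If you prefer the command line to the console, the same change can be sketched with rpk, Redpanda's CLI. This assumes rpk is available where you run it (for example, inside the Redpanda pod) and can reach the broker; the topic name is the example used in this guide. The script only prints the command so you can review it before running it yourself.

```shell
#!/usr/bin/env bash
# Build (but do not execute) an rpk command that applies the recommended values.

TOPIC="umh.v1.enterprise-of-kings"   # example topic name; replace with your own

# Convert hours to milliseconds.
hours_to_ms() {
  echo $(( $1 * 60 * 60 * 1000 ))
}

SEGMENT_MS=$(hours_to_ms 1)       # roll a new segment every hour
RETENTION_MS=$(hours_to_ms 24)    # closed segments become deletable after 24 hours

ALTER_CMD="rpk topic alter-config $TOPIC --set segment.ms=$SEGMENT_MS --set retention.ms=$RETENTION_MS"

# Dry run: print the command for review instead of running it.
echo "$ALTER_CMD"

# Afterwards, the topic configuration can be inspected with:
#   rpk topic describe "$TOPIC" -c
```

Printing the command first is a deliberate choice here, since a mistyped retention value on a production topic can delete data sooner than intended.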
5. Apply Size-Based Triggers (Optional)
You can also manage retention based on size.
- Use size-based triggers instead of time-based ones.
- This approach deletes old segments based on the total size rather than time, which can be beneficial for certain use cases.
retention.bytes = <desired_size_in_bytes>
Note: When testing these settings, ensure you use a sufficiently large topic or adjust segment.ms to a smaller value to trigger segment rollover. Remember, changes to segment.ms affect only new segments; existing segments remain active until rolled over.
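Because retention.bytes takes a raw byte count, a small helper makes the value easier to derive. The sketch below assumes a limit of 10 GiB and a topic with 6 partitions; both numbers are illustrative only. Note that retention.bytes applies per partition, so the approximate cap for the whole topic is the per-partition value multiplied by the partition count.

```shell
#!/usr/bin/env bash
# Derive a retention.bytes value from a size in GiB.

gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

PER_PARTITION_GIB=10   # illustrative limit per partition
PARTITIONS=6           # illustrative partition count

RETENTION_BYTES=$(gib_to_bytes "$PER_PARTITION_GIB")
TOPIC_CAP_GIB=$(( PER_PARTITION_GIB * PARTITIONS ))

echo "retention.bytes = $RETENTION_BYTES"
echo "approximate cap for the whole topic: $TOPIC_CAP_GIB GiB"
```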
Verifying Your Configuration
To confirm that your retention settings are correctly applied:
- Access the Redpanda Pod:
  - Use tools like k9s or OpenLens to access the pod.
- Navigate to the Topic Directory:
  ls -al /var/lib/redpanda/data/kafka/<your_topic_name>/
  Replace <your_topic_name> with your actual topic name (e.g., umh.v1.enterprise-of-kings).
- Inspect Partitions and Segments:
  ls -al /var/lib/redpanda/data/kafka/<your_topic_name>/<partition>_18/
  Example:
  ls -al /var/lib/redpanda/data/kafka/umh.v1.enterprise-of-kings/0_18/
- Monitor Logs for Segment Activities: Check Redpanda pod logs to see segment creation and deletion events.
- Segment Creation:
INFO 2024-09-23 13:38:47,247 [shard 0] storage - segment.cc:759 - Creating new segment /var/lib/redpanda/data/kafka/<your_topic_name>/<partition>/new_segment.log
- Segment Deletion:
INFO 2024-09-23 13:39:10,209 [shard 0] storage - disk_log_impl.cc:1479 - Removing "/var/lib/redpanda/data/kafka/<your_topic_name>/<partition>/old_segment.log"
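To pick these events out of a busy log, a small filter can help. The sketch below matches the two message patterns shown above; the exact wording may vary between Redpanda versions, so treat the patterns as assumptions. The function name segment_events is hypothetical, and <namespace> and <redpanda-pod> are placeholders for your own environment.

```shell
#!/usr/bin/env bash
# Keep only segment creation and deletion events from Redpanda logs read on stdin.
segment_events() {
  grep -E 'Creating new segment|Removing "'
}

# Typical usage against a live pod (placeholders, adjust to your cluster):
#   kubectl logs -n <namespace> <redpanda-pod> | segment_events

# Self-contained demo using the sample lines from this guide plus one unrelated line.
printf '%s\n' \
  'INFO 2024-09-23 13:38:47,247 [shard 0] storage - segment.cc:759 - Creating new segment /var/lib/redpanda/data/kafka/<your_topic_name>/<partition>/new_segment.log' \
  'some unrelated log line' \
  'INFO 2024-09-23 13:39:10,209 [shard 0] storage - disk_log_impl.cc:1479 - Removing "/var/lib/redpanda/data/kafka/<your_topic_name>/<partition>/old_segment.log"' \
  | segment_events
```

After lowering segment.ms and retention.ms, you would expect creation events at roughly the segment interval and deletion events once closed segments pass the retention period.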
Related Resources
- Reducing Database Size in TimescaleDB: Learn how to manage database size using compression and retention strategies. See “Reduce Database Size in TimescaleDB”.
- Understanding Kafka Segment Retention: Although specific to Apache Kafka, the principles apply to Redpanda as well. See “Kafka Segment Retention Explained”.