How to Manage Retention in Redpanda on UMH

Learn to use Redpanda on UMH with our technical guide. Find out how to manage storage and improve performance. Perfect for UMH users and Redpanda fans.

Each UMH Classic instance comes with a dedicated Redpanda deployment configured with default settings that may not match specific application requirements. Understanding Redpanda's retention mechanisms - such as retention.ms and segment.bytes - and customizing these settings is essential to optimize storage utilization and ensure high-performance data streaming within the UMH ecosystem.

Understanding Retention in Redpanda

Redpanda keeps data for a set period before deleting it. Retention can be driven by time (retention.ms) or by size (retention.bytes), while segment.bytes and segment.ms control when a segment is closed. Here's a breakdown of how retention works:

  • Topics and Partitions: Data in Redpanda is organized into topics, which are further divided into partitions. Each partition handles a subset of the data. See also “How we use Kafka in the United Manufacturing Hub” for an explanation of how Kafka works.
  • Segments: Each partition is stored in its own directory (e.g., 0_18, 1_18) and is divided into segments, which are the physical log files that store the records.
  • Active and closed segments: Only one segment per partition is active, meaning it is the one currently being written to. Once a segment reaches a specified size (segment.bytes) or age (segment.ms), it's closed and a new segment is created.
  • Retention Triggers: After a segment is closed, the retention.ms timer starts. When the retention period expires, the segment is eligible for deletion.
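
The rules above can be sketched as two small shell checks. This is a simplified model of the behaviour described in this list, not Redpanda's actual implementation. The function names should_roll and eligible_for_deletion are hypothetical, and all times are in milliseconds.

```bash
#!/usr/bin/env bash
# Simplified model of segment rollover and retention (illustration only).

# should_roll SIZE_BYTES AGE_MS SEGMENT_BYTES SEGMENT_MS
# Succeeds when the active segment has reached its size or age limit.
should_roll() {
  local size=$1 age=$2 max_size=$3 max_age=$4
  (( size >= max_size || age >= max_age ))
}

# eligible_for_deletion CLOSED_AT_MS NOW_MS RETENTION_MS
# Succeeds when a closed segment has outlived the retention period.
eligible_for_deletion() {
  local closed_at=$1 now=$2 retention=$3
  (( now - closed_at >= retention ))
}

# UMH defaults: segment.bytes = 128 MiB, segment.ms = 14 days.
if should_roll 134217728 0 134217728 1209600000; then
  echo "active segment is full: close it and start a new one"
fi
```

For example, `eligible_for_deletion 0 604800000 604800000` succeeds: a segment closed at time 0 becomes eligible for deletion once the default 7 days (604800000 ms) have passed.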

UMH Default Retention Settings

When running Redpanda within UMH, the default retention settings are as follows:

  • Retention Duration (retention.ms): 7 days
  • Segment Duration (segment.ms): 14 days
  • Segment Size (segment.bytes): 128 MiB

These defaults apply to all new topics created within the UMH environment.

Why adjust retention?

The default retention settings may lead to significant storage usage, especially under high message throughput. For instance:

  • Example Scenario:
    • Message Rate: 10,000 messages per second
    • Retention Duration: 7 days
    • Potential Storage Usage: ~400 GB per week

If your infrastructure lacks sufficient storage, adjusting retention settings can help manage and reduce disk usage effectively.
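
To see where a figure of this size comes from, the estimate can be reproduced with a few lines of shell arithmetic. The article does not state an average message size, so the 66 bytes used here is an assumption chosen to match the ~400 GB figure. The helper name estimate_storage_bytes is hypothetical, and the calculation ignores replication, compression, and index overhead.

```bash
#!/usr/bin/env bash
# Rough storage estimate: rate * message size * retention time.

# estimate_storage_bytes MSGS_PER_SEC AVG_MSG_BYTES RETENTION_SECONDS
estimate_storage_bytes() {
  local rate=$1 msg_bytes=$2 seconds=$3
  echo $(( rate * msg_bytes * seconds ))
}

avg_msg_bytes=66                  # ASSUMPTION: not given in the article
week_seconds=$(( 7 * 24 * 3600 ))

total=$(estimate_storage_bytes 10000 "$avg_msg_bytes" "$week_seconds")
echo "$(( total / 1000000000 )) GB"   # prints "399 GB"
```

Under the same assumptions, a 24-hour retention period (86400 seconds) comes to roughly 57 GB.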

Step-by-Step Guide to Reducing Retention

1. Access the Redpanda Console

Start by accessing the Redpanda management console:

  1. Open your web browser.
  2. Go to http://<IP>:8090, replacing <IP> with your UMH's IP address.

2. Go to Topics

Once logged into the console:

  1. Click Topics on the left sidebar.
  2. You'll see a list of topics. Select the topic you want to configure, especially those using a lot of data.

3. Select and Configure a Topic

After selecting a topic:

  1. Click the Configuration tab for the chosen topic.
  2. Adjust the retention settings as described in the next step.

4. Adjust segment.ms and retention.ms

To optimize retention:

  1. Set segment.ms:
    • Default: 14 days
    • Recommended Change: 1 hour

This closes the active segment and starts a new one every hour, so that older segments become eligible for deletion under the retention policy.

  2. Set retention.ms:
    • Default: 7 days
    • Recommended Change: 24 hours

This reduces the retention period, ensuring that data older than 24 hours is eligible for deletion.

   retention.ms = 86400000  # 24 hours in milliseconds

  3. Understand segment.bytes:
    • Default: 128 MiB
    • A segment is also closed once it reaches this size. Each closed segment is deleted after the retention period expires.
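
Both settings are entered in milliseconds, so a small conversion helps avoid typos. The sketch below also adds the two values together: if the retention timer only starts once a segment is closed, as described in the Retention Triggers point earlier, the oldest record in a segment can be up to segment.ms + retention.ms old before it becomes eligible for deletion. This is a simplification that ignores size-based rollover, which can close a segment earlier. The helper name hours_to_ms is hypothetical.

```bash
#!/usr/bin/env bash
# Convert the recommended durations to milliseconds.
hours_to_ms() { echo $(( $1 * 60 * 60 * 1000 )); }

segment_ms=$(hours_to_ms 1)      # 3600000
retention_ms=$(hours_to_ms 24)   # 86400000

# Simplified upper bound on the age of a record at deletion time.
max_age_ms=$(( segment_ms + retention_ms ))

echo "segment.ms   = ${segment_ms}"
echo "retention.ms = ${retention_ms}"
echo "oldest record at deletion: up to $(( max_age_ms / 3600000 )) hours"
```

With the recommended values this gives up to 25 hours. With the defaults (14 days and 7 days), the same reasoning gives up to 21 days on a low-volume topic that never reaches segment.bytes.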

5. Apply Size-Based Triggers (Optional)

You can also manage retention based on size.

  1. Set retention.bytes to use a size-based trigger instead of, or alongside, the time-based one.
  2. This approach deletes the oldest closed segments once a partition grows beyond the configured size, regardless of their age, which can be beneficial for certain use cases.
   retention.bytes = <desired_size_in_bytes>

Note: When testing these settings, use a topic that receives enough data, or set segment.ms to a smaller value, so that segments actually roll over. Remember, changes to segment.ms affect only new segments; the existing active segment stays open until it is rolled over.
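
A size budget usually has to be converted to bytes before it can be entered. The sketch below assumes that retention.bytes applies to each partition rather than to the whole topic (as it does in Kafka), and it ignores replication. The helper name, the 48 GiB budget, and the 6 partitions are made-up illustration values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a total disk budget (in GiB) evenly across
# partitions and print the per-partition value for retention.bytes.
retention_bytes_per_partition() {
  local budget_gib=$1 partitions=$2
  echo $(( budget_gib * 1024 * 1024 * 1024 / partitions ))
}

retention_bytes_per_partition 48 6   # prints 8589934592 (8 GiB)
```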

Verifying Your Configuration

To confirm that your retention settings are correctly applied:

  1. Access the Redpanda Pod: Use tools like k9s or OpenLens to access the pod.
  2. Navigate to the Topic Directory:
   ls -al /var/lib/redpanda/data/kafka/<your_topic_name>/
     Replace <your_topic_name> with your actual topic name (e.g., umh.v1.enterprise-of-kings).
  3. Inspect Partitions and Segments:
   ls -al /var/lib/redpanda/data/kafka/<your_topic_name>/<partition>_18/
     Example:
   ls -al /var/lib/redpanda/data/kafka/umh.v1.enterprise-of-kings/0_18/
  4. Monitor Logs for Segment Activities: Check Redpanda pod logs to see segment creation and deletion events.
    • Segment Creation:
   INFO 2024-09-23 13:38:47,247 [shard 0] storage - segment.cc:759 - Creating new segment /var/lib/redpanda/data/kafka/<your_topic_name>/<partition>/new_segment.log
    • Segment Deletion:
   INFO 2024-09-23 13:39:10,209 [shard 0] storage - disk_log_impl.cc:1479 - Removing "/var/lib/redpanda/data/kafka/<your_topic_name>/<partition>/old_segment.log"
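
Pod logs can be noisy, so it may help to filter them down to the two event types shown above. The function below is a minimal sketch that matches on the message text from these sample lines; the function name is hypothetical, and the exact log wording may differ between Redpanda versions.

```bash
#!/usr/bin/env bash
# Reads Redpanda log lines on stdin and keeps only segment
# creation and removal events.
filter_segment_events() {
  grep -E 'Creating new segment|Removing "'
}

# Usage (pod name and namespace are placeholders for your own values):
#   kubectl logs <redpanda-pod> -n <namespace> | filter_segment_events
```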
