Reduce database size

This page describes how to reduce the size of the United Manufacturing Hub database.

Over time, time-series data can consume a large amount of disk space. To reduce the amount of disk space used by time-series data, there are three options:

  • Enable data compression. This reduces the required disk space by applying mathematical compression to the data. This compression is lossless, so the data is not changed in any way. However, it will take more time to compress and decompress the data. For more information, see how TimescaleDB compression works.
  • Enable data retention. This deletes old data that is no longer needed, by setting policies that automatically delete data older than a specified time. This can be beneficial for managing the size of the database, as well as adhering to data retention regulations. However, by definition, data loss will occur. For more information, see how TimescaleDB data retention works.
  • Downsampling. This is a method of reducing the amount of data stored by aggregating data points over a period of time. For example, you can aggregate data points over a 30-minute period, instead of storing each data point. If exact data is not required, downsampling can be useful to reduce database size. However, data may be less accurate.

Before you begin

You need to have a UMH cluster. If you do not already have a cluster, you can create one by following the Getting Started guide.

You also need to access the system where the cluster is running, either by logging into it or by using a remote shell.

Open the database shell

sudo $(which kubectl) exec -it $(sudo $(which kubectl) get pods --kubeconfig /etc/rancher/k3s/k3s.yaml -n united-manufacturing-hub -l app.kubernetes.io/component=timescaledb -o jsonpath="{.items[0].metadata.name}") --kubeconfig /etc/rancher/k3s/k3s.yaml -n united-manufacturing-hub -- psql -U postgres

This command will open a psql shell connected to the default postgres database.

Connect to the corresponding database:


  \c factoryinsight
  


  \c umh_v2
  

Enable data compression

You can find sample SQL commands to enable data compression here.

  1. The first step is to turn on data compression on the target table, and set the compression options. Refer to the TimescaleDB documentation for a full list of options.

    
          -- set "asset_id" as the key for the compressed segments and orders the table by "valuename".
          ALTER TABLE processvaluetable SET (timescaledb.compress, timescaledb.compress_segmentby = 'asset_id', timescaledb.compress_orderby = 'valuename');
        

    
          -- set "asset_id" as the key for the compressed segments and orders the table by "name".
          ALTER TABLE tag SET (timescaledb.compress, timescaledb.compress_segmentby = 'asset_id', timescaledb.compress_orderby = 'name');
        
  2. Then, you have to create the compression policy. The interval determines the age that the chunks of data need to reach before being compressed. Read the official documentation for more information.

    
          -- set a compression policy on the "processvaluetable" table, which will compress data older than 7 days.
          SELECT add_compression_policy('processvaluetable', INTERVAL '7 days');
        

    
          -- set a compression policy on the "tag" table, which will compress data older than 2 weeks.
          SELECT add_compression_policy('tag', INTERVAL '2 weeks');
        

Enable data retention

You can find sample SQL commands to enable data retention here.

Sample command for factoryinsight and umh_v2 databases:

Enabling data retention consists in only adding the policy with the desired retention interval. Refer to the official documentation for more detailed information about these queries.


  -- Set a retention policy on the "processvaluetable" table, which will delete data older than 7 days.
  SELECT add_retention_policy('processvaluetable', INTERVAL '7 days');
  


  -- set a retention policy on the "tag" table, which will delete data older than 3 months.
  SELECT add_retention_policy('tag', INTERVAL '3 months');
  

What’s next