Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: Add an ENV to set a configurable data retention period, keeping only recent data and automatically removing older entries #70

Open
amirashrafy opened this issue Jul 11, 2024 · 1 comment

Comments

@amirashrafy
Copy link

I have been running this script for 50 days, and the database has reached over 500GB. To address this rapid growth and manage storage more effectively, I suggest the following:
Add an ENV to set a configurable data retention period, keeping only recent data and automatically removing older entries. This approach will help to:

  • Control database size: Regularly removing old data can prevent unchecked growth.

  • Optimize performance: A smaller database typically leads to faster query times and better overall performance.

  • Reduce storage costs: Limiting data retention will help manage our storage expenses.

Implementing this change should significantly slow down the database growth rate and help me maintain a more manageable system.

@kdimentionaltree
Copy link
Collaborator

Hello, @amirashrafy. Thank you for your feedback.

The main purpose of this indexer is to store whole history of the blockchain so we didn't consider a data retention period feature.

However, one can index blockchain and store history from some specific masterchain block. In this case here is some suggestions of how to prune data and prevent unchecked growth of the database.

  1. Select some masterchain block seqno to use it as threshold to prune data. Here is a query to find the threshold to keep last 1000 blocks: select max(seqno) - 1000 from blocks where workchain = -1;.
  2. Drop outdated transactions: delete from transactions where mc_block_seqno < {threshold};.
  3. Drop outdated blocks: delete from blocks where mc_block_seqno < {threshold};.
  4. Drop masterchain blocks from blocks table: delete from blocks where workchain = -1 and seqno < {threshold}; (one may suppose that command (3) will clean them, but in old versions of ton-indexer mc_block_seqno for masterchain blocks may be NULL).

Caveats:

  1. The algorithm above will clean blocks and transactions tables (these tables requires a lot of storage). But, this algorithm will lead you to data orphans in tables "messages", "nft_transfers", "jetton_transfers", "jetton_burns".
  2. When you'll try to delete transactions, which represents NFT or Jetton transfers, you may face the issue "foreign key violation". In this case you need to drop the existing foreign key and recreate it with ON DELETE CASCADE clause. To get the exact foreign key creation SQL-queries, please inspect database schema using some tool like DataGrip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants