Introduce cache layer to improve performance #15937

Closed
wy65701436 opened this issue Nov 3, 2021 · 9 comments

Comments

@wy65701436
Contributor

wy65701436 commented Nov 3, 2021

In the current design, we found that in large-scale usage scenarios Harbor will, in some particular cases, reach the DB connection limit and show high CPU usage.

However, not all get/list requests have to go directly to the database. If a cache layer is introduced in front of the DB, it will significantly improve concurrency capability.
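To make the idea concrete, here is a minimal sketch of a cache sitting in front of the database for reads, with invalidation on writes. Everything in it is an assumption made for illustration: `Project`, `fakeDB`, and `CachedStore` are hypothetical names, the cache is an in-process map rather than any particular backend, and none of this reflects Harbor's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// Project is a hypothetical record type used only for this illustration.
type Project struct {
	ID   int64
	Name string
}

// fakeDB stands in for the real database and counts the reads that reach it.
type fakeDB struct {
	mu    sync.Mutex
	rows  map[int64]Project
	reads int
}

// Get simulates a database read and records that it happened.
func (d *fakeDB) Get(id int64) (Project, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.reads++
	p, ok := d.rows[id]
	if !ok {
		return Project{}, fmt.Errorf("project %d not found", id)
	}
	return p, nil
}

// Update simulates a database write.
func (d *fakeDB) Update(p Project) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.rows[p.ID] = p
}

// Reads reports how many reads reached the database.
func (d *fakeDB) Reads() int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.reads
}

// CachedStore answers reads from memory when possible and falls back
// to the database only on a cache miss.
type CachedStore struct {
	mu    sync.RWMutex
	cache map[int64]Project
	db    *fakeDB
}

func NewCachedStore(db *fakeDB) *CachedStore {
	return &CachedStore{cache: make(map[int64]Project), db: db}
}

// Get checks the cache first; on a miss it reads the database and
// stores the result for later requests.
func (s *CachedStore) Get(id int64) (Project, error) {
	s.mu.RLock()
	p, ok := s.cache[id]
	s.mu.RUnlock()
	if ok {
		return p, nil
	}
	p, err := s.db.Get(id)
	if err != nil {
		return Project{}, err
	}
	s.mu.Lock()
	s.cache[id] = p
	s.mu.Unlock()
	return p, nil
}

// Update writes to the database and drops the cached copy so the next
// read observes the new value.
func (s *CachedStore) Update(p Project) {
	s.db.Update(p)
	s.mu.Lock()
	delete(s.cache, p.ID)
	s.mu.Unlock()
}
```

With this shape, repeated `Get` calls for the same ID cost one database read until the record is updated, which is the effect the proposal is after.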

@Vad1mo
Member

Vad1mo commented Nov 3, 2021

We should also evaluate the option of using a third-party cache at the DB layer, such as one provided by a connection pooler (PgBouncer vs. Pgpool-II).

Also, if you plan to use a cache, we need an in-code distributed cache (ICDC) such as https://github.com/buraksezer/olric in order to scale horizontally.

I also propose switching to pgx, the Go PostgreSQL driver, which has some caching options as well.
Related: #15209

@buraksezer

Hi all,

Thank you for mentioning Olric. I'm the author of that library. If you have any questions, it will be my pleasure to help you.

@xaleeks
Contributor

xaleeks commented Nov 9, 2021

Let's aim for a concise problem statement and a design doc completed by v2.5, including an estimate of the performance improvement. Assigning to @wy65701436

@chlins
Member

chlins commented Jan 20, 2022

Moved to 2.6 after discussion. cc @xaleeks

@chlins
Member

chlins commented Mar 3, 2022

Action Items

@schrej

schrej commented Mar 11, 2022

Have you considered implementing connection pooling and reusing existing connections instead of opening one per request?
In our deployment we mainly have issues with the number of connections opened by Harbor, which seems to be one for each individual request. With authentication enabled, that leads to pretty significant CPU overhead for authentication.

IMO it would be a better approach to try optimising the usage of the database before adding an additional layer.

Edit: After digging through the code, it seems like connection pooling is already enabled. Why does it need that many connections, then? Do they get locked up by transactions?
How does the database interaction work when uploading images, for example? Does it create a transaction that lasts as long as the upload?

@github-actions

github-actions bot commented Jul 5, 2022

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions github-actions bot added the Stale label Jul 5, 2022
@Vad1mo Vad1mo removed the Stale label Jul 5, 2022
@chlins
Member

chlins commented Jul 6, 2022

@schrej The cache layer is an abstract concept built into the Harbor codebase; no additional component will be introduced. We just cache the most frequently used resources in Redis for quick lookup and to reduce database connections. The questions you raised need to be analyzed case by case, so feel free to file an issue when you hit a DB connection problem and describe your scenario. Thanks.

@chlins
Member

chlins commented Jul 6, 2022

Closing this epic as the engineering story has been completed.

@chlins chlins closed this as completed Jul 6, 2022