-
Notifications
You must be signed in to change notification settings - Fork 79
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
proposal: async update pull_time and pull_count
Signed-off-by: chlins <chenyuzh@vmware.com>
- Loading branch information
Showing
7 changed files
with
278 additions
and
0 deletions.
There are no files selected for viewing
278 changes: 278 additions & 0 deletions
278
proposals/async-update-artifact-pull_time-and-pull_count.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,278 @@ | ||
# Proposal: Asynchronous update of artifact's `pull_time` and `pull_count` | ||
|
||
Author: ChenYu Zhang / [chlins](https://github.com/chlins) | ||
|
||
## Background | ||
|
||
Currently, if the user pulls an artifact from the harbor, the `harbor-core` component | ||
will raise two goroutines to update the artifact's `pull_time` and `pull_count` | ||
in the background for every `pull` request, and since it is a concurrent | ||
operation on a single record of the same table, problems such as goroutines | ||
piling up and database lock waiting can occur in the case of a high concurrent pull, | ||
this will also take up the number of connections to the database and thus reduce | ||
the performance of harbor. | ||
|
||
## Abstract | ||
|
||
Asynchronous update and merge the update operations of artifact's `pull_time` and `pull_count` to reduce | ||
the duplication of update work to the database. | ||
|
||
## Goals | ||
|
||
* Reduce the number of database locks. | ||
|
||
* Fewer operations on the database update. | ||
|
||
* Improve the throughput of the harbor in the case of high concurrent pull. | ||
|
||
## Non-Goals | ||
|
||
* Fully accurate `pull_time` and `pull_count` of artifact. | ||
|
||
* Immediate synchronized `pull_time` and `pull_count` of artifact after pulling. | ||
|
||
## Implementation | ||
|
||
Related code path: `src/controller/event/handler/internal/artifact.go` | ||
|
||
### Rationale | ||
|
||
* Cache the operation of pull into the memory. | ||
|
||
* Merge the operations and only cache the final state. | ||
|
||
* Periodically flush data from memory to the database. | ||
|
||
### Model | ||
|
||
Extend the previous artifact event Handler. | ||
|
||
```go | ||
// Handler preprocess artifact event data | ||
type Handler struct { | ||
// once for initialize operation | ||
once sync.Once | ||
// pullCountStore caches the pull count group by repository | ||
// map[repositoryID]counts | ||
pullCountStore map[int64]uint64 | ||
// pullCountLock mutex for pullCountStore | ||
pullCountLock sync.Mutex | ||
// pullTimeStore caches the latest pull time group by artifact | ||
// map[artifactID:tagName]time | ||
pullTimeStore map[string]time.Time | ||
// pullTimeLock mutex for pullTimeStore | ||
pullTimeLock sync.Mutex | ||
} | ||
``` | ||
|
||
Adjust the logic of the `onPull` function. | ||
|
||
```go | ||
func (a *Handler) onPull(ctx context.Context, event *event.ArtifactEvent) error { | ||
a.once.Do(func() { | ||
a.pullTimeStore = make(map[string]time.Time) | ||
a.pullCountStore = make(map[int64]uint64) | ||
|
||
go a.asyncFlushPullTime(orm.Context()) | ||
go a.asyncFlushPullCount(orm.Context()) | ||
}) | ||
|
||
a.updatePullTime(ctx, event) | ||
a.addPullCount(ctx, event) | ||
return nil | ||
} | ||
``` | ||
|
||
Refactor `updatePullTime` and `addPullCount` to cache and merge operations in | ||
memory. | ||
|
||
```go | ||
func (a *Handler) updatePullTime(ctx context.Context, event *event.ArtifactEvent) { | ||
var tagName string | ||
if len(event.Tags) != 0 { | ||
tagName = event.Tags[0] | ||
} | ||
|
||
key := fmt.Sprintf("%d:%s", event.Artifact.ID, tagName) | ||
|
||
a.pullTimeLock.Lock() | ||
defer a.pullTimeLock.Unlock() | ||
|
||
a.pullTimeStore[key] = time.Now() | ||
} | ||
|
||
func (a *Handler) addPullCount(ctx context.Context, event *event.ArtifactEvent) { | ||
a.pullCountLock.Lock() | ||
defer a.pullCountLock.Unlock() | ||
|
||
a.pullCountStore[event.Artifact.RepositoryID] = a.pullCountStore[event.Artifact.RepositoryID] + 1 | ||
} | ||
``` | ||
|
||
Implement new functions for async update `pull_time` and `pull_count` | ||
periodically. | ||
|
||
```go | ||
|
||
var ( | ||
// maybe we can expose this variable to harbor configuration or environment variable. | ||
asyncFlushDuration = 60 * time.Second | ||
) | ||
|
||
|
||
func (a *Handler) asyncFlushPullCount(ctx context.Context) { | ||
for { | ||
<-time.After(asyncFlushDuration) | ||
a.pullCountLock.Lock() | ||
|
||
for repositoryID, count := range a.pullCountStore { | ||
if err := repository.Ctl.AddPullCount(ctx, repositoryID, count); err != nil { | ||
log.Debugf("failed to add pull count repository %d, %v", repositoryID, err) | ||
} | ||
} | ||
|
||
a.pullCountStore = make(map[int64]uint64) | ||
a.pullCountLock.Unlock() | ||
} | ||
} | ||
|
||
func (a *Handler) asyncFlushPullTime(ctx context.Context) { | ||
for { | ||
<-time.After(asyncFlushDuration) | ||
a.pullTimeLock.Lock() | ||
|
||
for key, time := range a.pullTimeStore { | ||
var tagID int64 | ||
|
||
keys := strings.Split(key, ":") | ||
artifactID, err := strconv.ParseInt(keys[0], 10, 64) | ||
if err != nil { | ||
log.Errorf("failed to parse artifact id %s, %v", key, err) | ||
continue | ||
} | ||
// with tag name | ||
if len(keys) > 1 && keys[1] != "" { | ||
tags, err := tag.Ctl.List(ctx, &q.Query{ | ||
Keywords: map[string]interface{}{ | ||
"ArtifactID": artifactID, | ||
"Name": keys[1], | ||
}, | ||
}, nil) | ||
if err != nil { | ||
log.Infof("failed to list tags when to update pull time, %v", err) | ||
} else { | ||
if len(tags) != 0 { | ||
tagID = tags[0].ID | ||
} | ||
} | ||
} | ||
|
||
if err := artifact.Ctl.UpdatePullTime(ctx, artifactID, tagID, time); err != nil { | ||
log.Debugf("failed to update pull time form artifact %d, %v", artifactID, err) | ||
} | ||
} | ||
|
||
a.pullTimeStore = make(map[string]time.Time) | ||
a.pullTimeLock.Unlock() | ||
} | ||
} | ||
``` | ||
|
||
### Comparison of optimization | ||
|
||
Scenarios | ||
|
||
Harbor deployed by `docker-compose` in the host which with 16 cpu cores and 16G | ||
memory, use `k6` to simulate 5k concurrency to pull the same image manifest. | ||
|
||
performance test tool: [perf](https://github.com/goharbor/perf) | ||
|
||
#### summary | ||
|
||
before | ||
|
||
```bash | ||
running (0d00h16m53.4s), 0000/5000 VUs, 500000 complete and 0 interrupted iterations | ||
default ✓ [======================================] 5000 VUs 0d00h16m53.4s/24h0m0s 500000/500000 shared iters | ||
|
||
|
||
data_received..................: 660 MB 652 kB/s | ||
data_sent......................: 99 MB 98 kB/s | ||
http_req_blocked...............: avg=24.54ms min=745ns med=2µs max=3.92s p(90)=3.48µs p(95)=4.61µs | ||
http_req_connecting............: avg=990.65µs min=0s med=0s max=281.21ms p(90)=0s p(95)=0s | ||
http_req_duration..............: avg=10.06s min=77.31ms med=7.64s max=1m0s p(90)=18.9s p(95)=25.36s | ||
{ expected_response:true }...: avg=9.95s min=77.31ms med=7.63s max=59.99s p(90)=18.73s p(95)=24.97s | ||
http_req_failed................: 0.21% ✓ 1072 ✗ 498928 | ||
http_req_receiving.............: avg=58.79µs min=0s med=54.41µs max=24.57ms p(90)=82.5µs p(95)=92.77µs | ||
http_req_sending...............: avg=16.02µs min=5.52µs med=13.24µs max=19.21ms p(90)=24.34µs p(95)=29.34µs | ||
http_req_tls_handshaking.......: avg=23.54ms min=0s med=0s max=3.78s p(90)=0s p(95)=0s | ||
http_req_waiting...............: avg=10.06s min=76.85ms med=7.64s max=1m0s p(90)=18.9s p(95)=25.36s | ||
http_reqs......................: 500000 493.390077/s | ||
iteration_duration.............: avg=10.08s min=84.13ms med=7.67s max=1m0s p(90)=18.93s p(95)=25.37s | ||
✓ { scenario:default }.........: avg=10.08s min=84.13ms med=7.67s max=1m0s p(90)=18.93s p(95)=25.37s | ||
iterations.....................: 500000 493.390077/s | ||
success........................: 99.78% ✓ 498928 ✗ 1072 | ||
vus............................: 211 min=211 max=5000 | ||
vus_max........................: 5000 min=5000 max=5000 | ||
``` | ||
after | ||
```bash | ||
running (0d00h08m36.4s), 0000/5000 VUs, 500000 complete and 0 interrupted iterations | ||
default ✓ [======================================] 5000 VUs 0d00h08m36.4s/24h0m0s 500000/500000 shared iters | ||
|
||
|
||
data_received..................: 660 MB 1.3 MB/s | ||
data_sent......................: 99 MB 191 kB/s | ||
http_req_blocked...............: avg=21.86ms min=794ns med=1.67µs max=3.05s p(90)=3.12µs p(95)=4.15µs | ||
http_req_connecting............: avg=1.24ms min=0s med=0s max=312.9ms p(90)=0s p(95)=0s | ||
http_req_duration..............: avg=5.12s min=16.33ms med=4.87s max=20.81s p(90)=7.56s p(95)=8.42s | ||
{ expected_response:true }...: avg=5.12s min=16.33ms med=4.87s max=20.81s p(90)=7.56s p(95)=8.42s | ||
http_req_failed................: 0.00% ✓ 0 ✗ 500000 | ||
http_req_receiving.............: avg=46.43µs min=17.24µs med=40.67µs max=37.29ms p(90)=66.4µs p(95)=77.75µs | ||
http_req_sending...............: avg=13.89µs min=5.22µs med=9.93µs max=17.08ms p(90)=21µs p(95)=26.15µs | ||
http_req_tls_handshaking.......: avg=20.6ms min=0s med=0s max=2.94s p(90)=0s p(95)=0s | ||
http_req_waiting...............: avg=5.12s min=16.28ms med=4.87s max=20.81s p(90)=7.56s p(95)=8.42s | ||
http_reqs......................: 500000 968.318185/s | ||
iteration_duration.............: avg=5.14s min=16.43ms med=4.89s max=20.81s p(90)=7.59s p(95)=8.46s | ||
✓ { scenario:default }.........: avg=5.14s min=16.43ms med=4.89s max=20.81s p(90)=7.59s p(95)=8.46s | ||
iterations.....................: 500000 968.318185/s | ||
success........................: 100.00% ✓ 500000 ✗ 0 | ||
vus............................: 1023 min=1023 max=5000 | ||
vus_max........................: 5000 min=5000 max=5000 | ||
``` | ||
#### pg_stat_activity | ||
before | ||
![pg_stat_activity](images/async-update-pull_count/before-pg-activity.png) | ||
after | ||
![pg_stat_activity](images/async-update-pull_count/after-pg-activity.png) | ||
#### pg_locks_count | ||
before | ||
![pg_locks_count](images/async-update-pull_count/before-pg-lock.png) | ||
after | ||
![pg_locks_count](images/async-update-pull_count/after-pg-lock.png) | ||
#### harbor_core_goroutines | ||
before | ||
![harbor_core_goroutines](images/async-update-pull_count/before-goroutine.png) | ||
after | ||
![harbor_core_goroutines](images/async-update-pull_count/after-goroutine.png) | ||
## Open issues | ||
<https://github.com/goharbor/harbor/issues/15963> |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.