This issue was moved to a discussion.

Pipeline-scoped (temporary) volumes [solves caching, preparation steps, inter step communication, ...] #1452

Closed
3 tasks done
smainz opened this issue Nov 27, 2022 · 4 comments
Labels
cache proposals ideas for make caching work seamles feature add new functionality

Comments

@smainz
Contributor

smainz commented Nov 27, 2022

Clear and concise description of the problem

In some pipelines/workflows it is necessary to store data created by one step outside the source directory for use in other steps. E.g.

  • one step downloads dependencies for use in the next steps
  • a step sets up credentials (ssh-keys) to be used in later steps

At the moment the only way to share this between the steps is to either

  • store this in temporary directories in the workspace (does not play well with git push)
  • share a volume (but this is not local to a pipeline and requires privileged pipelines)
  • redo everything in each step.

Suggested solution

Main idea

Take the idea Drone has already implemented: add a pipeline-scoped configuration for volumes, and change the semantics of the volume configuration in a step so that it references these volumes.

On pipeline/workflow level:

volumes:
  - name: my-temp-volume
    type: temp

In services / steps:

steps:
  - name: Interact with docker in docker
    volumes:
      - name: my-temp-volume   # this references a pipeline scoped volume
        path: /var/run/docker  # this is the path in the container to mount the volume to
    commands:
     - docker ps -a

services:
  - name: Docker in Docker
    image: docker/dind
    volumes:
      - name: my-temp-volume   # this references a pipeline scoped volume
        path: /var/run/docker  # this is the path in the container to mount the volume to

Temporary volumes should be created before the pipeline is run and deleted after the last step of a pipeline has finished.
The use of temporary volumes should not require privileged pipelines.
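
With this change, a step or service can only mount a volume that is declared at the pipeline level. A minimal sketch of that reference check follows, assuming the configuration has already been parsed into plain dicts and lists. The function name `undefined_volume_refs` is hypothetical and not part of Woodpecker.

```python
from typing import Any


def undefined_volume_refs(config: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (step or service name, volume name) pairs whose volume is not declared."""
    declared = {volume["name"] for volume in config.get("volumes", [])}
    missing = []
    for section in ("steps", "services"):
        for entry in config.get(section, []):
            for mount in entry.get("volumes", []):
                if mount["name"] not in declared:
                    missing.append((entry["name"], mount["name"]))
    return missing


# The example from the proposal, written as the dict a YAML parser would produce.
config = {
    "volumes": [{"name": "my-temp-volume", "type": "temp"}],
    "steps": [
        {
            "name": "Interact with docker in docker",
            "volumes": [{"name": "my-temp-volume", "path": "/var/run/docker"}],
            "commands": ["docker ps -a"],
        }
    ],
    "services": [
        {
            "name": "Docker in Docker",
            "image": "docker/dind",
            "volumes": [{"name": "my-temp-volume", "path": "/var/run/docker"}],
        }
    ],
}

unresolved = undefined_volume_refs(config)  # empty when every reference resolves
```

A linter could reject the pipeline before any volume is created whenever this list is not empty.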

Possible enhancements

Volumes of different kinds

This could be enhanced on the pipeline level to provide different types of volumes

volumes:
  - name: my-host-volume
    type: host
    path: /some/absolute/path
  - name: my-docker-volume
    type: docker
    volume: some-docker-volume
    create: false     # only works, if such a volume is already created

To solve the caching issue auto-magically, there could be volumes of type cache, which could be handled by some woodpecker magic, if still required. I consider a general caching solution a hard problem, but it could work something like this:

volumes:
  - name: some-kind-of-cache
    type: cache
    refresh:   ...
    additional-cache-config: ...

Volumes of different scopes

One could provide volumes for different scopes (pipeline vs. workflow). This would require the ability to provide configuration at the pipeline level (across multiple workflows).

Quota for volumes

The only bad thing a step could do is fill up the disk where the volumes are stored. There could be limits on that, defined on the agent level or in the project settings. How and what to limit needs further ideas.
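
One simple option is an after-the-fact check that measures what a volume directory holds and compares it to a limit. The sketch below assumes the agent can read the volume's backing directory. The names `directory_size_bytes` and `exceeds_quota` and the byte limit are hypothetical. This only detects overuse between steps; it does not stop a step from filling the disk while it runs.

```python
import os
import tempfile


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files below root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file disappeared between listing and stat; ignore it.
                continue
    return total


def exceeds_quota(root: str, limit_bytes: int) -> bool:
    """Report whether the directory currently uses more than limit_bytes."""
    return directory_size_bytes(root) > limit_bytes


# Usage with a throwaway directory standing in for a volume.
with tempfile.TemporaryDirectory() as volume_dir:
    with open(os.path.join(volume_dir, "artifact.bin"), "wb") as handle:
        handle.write(b"x" * 10)
    used = directory_size_bytes(volume_dir)
    over = exceeds_quota(volume_dir, limit_bytes=5)
```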

To be discussed

To be discussed: What would be the requirement for matrix builds?
E.g.

matrix:
  GO_VERSION:
    - 1.4
    - 1.3
volumes:
  - name: my-volume-${GO_VERSION}
    type: temp

or something else?
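
One possible reading of the matrix question is that every matrix combination gets its own set of volumes, with the matrix variables substituted into the volume names. The sketch below illustrates only that reading and is not a decided design. `expand_volume_names` is a hypothetical helper, and the versions are written as strings to avoid float parsing.

```python
import itertools
from string import Template


def expand_volume_names(
    matrix: dict[str, list[str]], name_templates: list[str]
) -> list[list[str]]:
    """Return one list of concrete volume names per matrix combination."""
    keys = list(matrix)
    expanded = []
    for combination in itertools.product(*(matrix[key] for key in keys)):
        variables = dict(zip(keys, combination))
        expanded.append(
            [Template(template).substitute(variables) for template in name_templates]
        )
    return expanded


matrix = {"GO_VERSION": ["1.4", "1.3"]}
names = expand_volume_names(matrix, ["my-volume-${GO_VERSION}"])
# names == [["my-volume-1.4"], ["my-volume-1.3"]]
```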

Alternative

At the moment the following is possible (I don't know whether this is intentional):

  - name: Test volumes (1)
    image: alpine     
    volumes:
      - volume-test_${CI_PIPELINE_NUMBER}:/x
    commands: 
      - ls -lah /x
      - touch /x/file
      - ls -lah /x

  - name: Test volumes (2)
    image: alpine     
    volumes:
      - volume-test_${CI_PIPELINE_NUMBER}:/x
    commands: 
      - ls -lah /x
      - touch /x/file2
      - ls -lah /x

Executing the pipeline creates a docker volume on the host (or reuses an existing one) and mounts it into each step's container.

But this

  • requires a privileged pipeline,
  • does not clean up the volume-test_${CI_PIPELINE_NUMBER} docker volumes, and
  • makes it hard to get unique volume names per pipeline execution.

Additional context

There are already quite a few issues on this or similar topics, but no common solution.

And a PoC on caching:

But I would like to see a common and consistent solution for most (all?) of these issues.

Validations

@anbraten
Member

anbraten commented Nov 27, 2022

Some thoughts:

  • we need to make sure the user can only use his own volumes, not the ones from others / existing volumes from non-Woodpecker systems => maybe by using some prefix?
  • we should make sure volumes get removed after a pipeline is done or by some cleanup, otherwise the agent's system will fill up with dangling volumes
  • how should a backend without volume support (local, ssh, ...) handle volumes?
  • should different / parallel steps be able to use the same volume? For k8s sharing rw volumes is normally a problem
  • Could we "just" place the workflow folder into a sub-directory, so the user can write to some folders like /workspace/my-cache while the repo is at /workspace/code...

@smainz
Contributor Author

smainz commented Nov 27, 2022

  • we need to make sure the user can only use his own volumes, not the ones from others / existing volumes from non-Woodpecker systems => maybe by using some prefix?

Temporary volumes should be created when a pipeline is executed. To make the names unique, we could use a random identifier behind a fixed prefix, e.g. wp_volume_<some UUID per pipeline run>_<volume number in config>. This would make them unique enough and easy to recognize.
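
A minimal sketch of such a naming scheme, assuming one UUID per pipeline run and volumes numbered from 1 in configuration order. The helper names and the `wp_volume_` constant are illustrative only.

```python
import uuid
from typing import Optional

VOLUME_PREFIX = "wp_volume_"


def temp_volume_names(volume_count: int, run_id: Optional[str] = None) -> list[str]:
    """Build one name per configured volume, numbered from 1, for a single run."""
    run_id = run_id or uuid.uuid4().hex
    return [f"{VOLUME_PREFIX}{run_id}_{index}" for index in range(1, volume_count + 1)]


def belongs_to_run(volume_name: str, run_id: str) -> bool:
    """Check whether a volume name was generated for the given pipeline run."""
    return volume_name.startswith(f"{VOLUME_PREFIX}{run_id}_")


names = temp_volume_names(2, run_id="abc")
# names == ["wp_volume_abc_1", "wp_volume_abc_2"]
```

Because every generated name carries the prefix and the run identifier, a backend could refuse to mount or delete any volume for which `belongs_to_run` is false.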

For the other types of volumes, you can use some config to allow volumes with names / paths matching some pattern, but this is the same problem we have now with the current volume implementation. So other types of volumes will require trusted projects for now.
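
The trust rule described here can be sketched as a small check, assuming that only volumes of type temp are usable by untrusted projects. The type names come from the proposal. The function names and the treatment of every non-temp type as trusted-only are assumptions.

```python
from typing import Any

# Assumption: "temp" is the only volume type an untrusted project may use.
UNTRUSTED_VOLUME_TYPES = frozenset({"temp"})


def volumes_requiring_trust(volumes: list[dict[str, Any]]) -> list[str]:
    """Return the names of declared volumes that need a trusted project."""
    return [v["name"] for v in volumes if v["type"] not in UNTRUSTED_VOLUME_TYPES]


def may_run(volumes: list[dict[str, Any]], project_is_trusted: bool) -> bool:
    """A pipeline may run if the project is trusted or it only uses temp volumes."""
    return project_is_trusted or not volumes_requiring_trust(volumes)


volumes = [
    {"name": "my-temp-volume", "type": "temp"},
    {"name": "my-host-volume", "type": "host", "path": "/some/absolute/path"},
]
needs_trust = volumes_requiring_trust(volumes)
```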

  • we should make sure volumes get removed after a pipeline is done or by some cleanup, otherwise the agent's system will fill up with dangling volumes

That is the idea of temporary volumes:

docker volume create wp_volume_...._1

docker run -v  wp_volume_...._1:/path/configured/in/step/for/volume/1
...
docker volume remove wp_volume_...._1
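
The create / run / remove sequence above can be sketched as a context manager that removes the volumes even when a step fails. This is only an illustration and not how the Woodpecker docker backend works. The docker subcommands used are `volume create` and `volume rm`. The `run` parameter is a hypothetical hook so that the sketch can be exercised without a docker daemon.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def docker_runner(args: Sequence[str]) -> None:
    """Run a docker CLI command and raise if it fails."""
    subprocess.run(["docker", *args], check=True)


@contextmanager
def temporary_volumes(names: Sequence[str], run: Runner = docker_runner) -> Iterator[None]:
    """Create the volumes, hand control to the pipeline, then remove them again."""
    created = []
    try:
        for name in names:
            run(["volume", "create", name])
            created.append(name)
        yield
    finally:
        # Removal also happens when a step raised. A failing removal stops this
        # loop, so a separate cleanup job would still be useful.
        for name in reversed(created):
            run(["volume", "rm", name])


# Usage with a recording runner instead of a real docker daemon.
calls = []
with temporary_volumes(["wp_volume_example_1"], run=calls.append):
    calls.append(["run", "--rm", "-v", "wp_volume_example_1:/x", "alpine", "ls", "-lah", "/x"])
```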

For other types of volumes, we have the same problem as today. Maybe a cleanup job will help.

  • how should a backend without volume support (local, ssh, ...) handle volumes?

Probably Woodpecker has to take the same route Drone has taken: have different config options per backend. I do not see how a local / ssh backend could support it, but for those you can use a directory with some naming convention, since steps can write anywhere the user has permissions.

BTW: What are people using local / ssh backends for?

  • should different / parallel steps be able to use the same volume? For k8s sharing rw volumes is normally a problem

Yes please. Maybe we will eventually need an external volume provider for these scenarios.

  • Could we "just" place the workflow folder into a sub-directory, so the user can write to some folders like /workspace/my-cache while the repo is at /workspace/code...

My use case is to put some files in the $HOME directory (ssh keys, .npmrc, settings.xml, ...) and to share a docker socket with a service container. Having this in the workspace does not help much, as you then have to do a lot of configuration instead of just using a third-party program off the shelf.

@6543 6543 added the cache proposals ideas for make caching work seamles label Dec 23, 2022
@maxkratz

maxkratz commented Feb 6, 2023

This would ease the sharing of a Docker socket with a service container. It would be a nice feature!

@lafriks
Contributor

lafriks commented Feb 9, 2023

Need to keep in mind that docker volumes won't help much if multiple agents are deployed on different hosts, or in a swarm where agents can move to a different swarm node on restart.

Also, if you are allowed to specify a host path for a volume, that would definitely create security issues.

@anbraten anbraten added feature add new functionality and removed pending:feature labels Feb 28, 2023
@anbraten anbraten changed the title Pipeline-scoped (temporary) voumes [solves caching, preparation steps, inter step communication, ...] Pipeline-scoped (temporary) volumes [solves caching, preparation steps, inter step communication, ...] Aug 20, 2023
@woodpecker-ci woodpecker-ci locked and limited conversation to collaborators Aug 20, 2023
@anbraten anbraten converted this issue into discussion #2272 Aug 20, 2023
