'av' - Audio/Video/Media module #5

Open
cadebrown opened this issue Jan 3, 2021 · 1 comment
Labels: enhancement (New feature or request), module (Relating to a standard module)

Comments

@cadebrown
Owner

'mm' should be able to interface with (i.e. read, write, and probe) standard multimedia formats, such as images, video, and audio.

It should use the types present in nx (i.e. tensors, nx.array) whenever possible. For example, reading an image should return just the raw pixel data, or a tuple of the raw data and relevant metadata.

There should also be stream writers for outputting long streams, as well as iteration through frames. We should use libav when available, with fallbacks for whatever formats we can support without it (but this module won't be very useful when libav isn't present).

Types:

  • mm.ImageStream - From a video, yields the frames of the video like an iterable
  • mm.AudioStream - From an audio or video file, yields chunks of the audio like an iterable
@cadebrown added the enhancement (New feature or request) label Jan 3, 2021
@gcrois added the module (Relating to a standard module) label Jan 3, 2021
@gcrois changed the title from "'mm' - Multi-media library" to "'mm' - Multi-media module" Jan 3, 2021
@cadebrown changed the title from "'mm' - Multi-media module" to "'av' - Audio/Video/Media module" Jan 7, 2021
@cadebrown
Owner Author

Renamed the module to av, since it is more apt (multimedia may refer more to something like SDL or SFML).

It could also be called media, but I like av since it is short and concise. Also, the types have changed: there is only one main type that av will support, alongside utility functions and proxy objects. The main type is av.IO, a general-purpose media IO object which can have multiple streams.

av.open(src, mode) works a lot like the builtin open function, except that it returns an av.IO, which is like a media container. Iterating over it returns frames from its streams (so, for an audio+video file, iterating over the opened media container will produce single video frames and chunks of audio). They are returned as a tuple of (streamidx, val), where streamidx is the index of the stream and val is an nx.array/nx.view object describing the raw data.
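As a rough illustration of the iteration protocol described above, here is a small Python model (not the actual module). `FakeIO`, `fake_open`, and `group_by_stream` are hypothetical names invented for this sketch, and plain lists stand in for nx.array values.

```python
from collections import defaultdict


class FakeIO:
    """Hypothetical stand-in for av.IO; holds packets that are already 'decoded'."""

    def __init__(self, packets):
        self._packets = list(packets)

    def __iter__(self):
        # Yield (streamidx, val) tuples in container order.
        return iter(self._packets)


def fake_open(src, mode="r"):
    """Hypothetical stand-in for av.open(); ignores src and returns canned data."""
    if mode != "r":
        raise ValueError("only mode 'r' is modeled in this sketch")
    return FakeIO([
        (0, [[0, 0], [0, 0]]),  # stream 0: a tiny 2x2 'video frame'
        (1, [0.1, 0.2]),        # stream 1: a chunk of 'audio samples'
        (0, [[1, 1], [1, 1]]),
        (1, [0.3, 0.4]),
    ])


def group_by_stream(container):
    """Collect the values of an interleaved container per stream index."""
    grouped = defaultdict(list)
    for streamidx, val in container:
        grouped[streamidx].append(val)
    return dict(grouped)


by_stream = group_by_stream(fake_open("movie.mkv"))
```

The point is only that a consumer sees one interleaved sequence and has to dispatch on streamidx itself.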

Thoughts on improvement:

  • An internal queue on av.IO objects, so that irrelevant packets from streams can be skipped but not discarded (iterating over an av.IO will first yield the contents of the queue, and only after it is exhausted resume decoding)
  • Single-stream objects obtained from the av.IO, which act like proxy objects. They internally iterate over av.IO packets, yield those with the correct stream index, and push the others onto the back of the queue. This would make it easy to iterate over a single video stream
  • av.open() should also be able to create writable streams. The problem is that creating a media container is a complicated task in general, so we may need specialty functions (see the section below)
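The queue and proxy ideas above could be modeled roughly as follows. This is a Python sketch under assumptions, not the module's implementation: `QueuedIO`, `StreamProxy`, and `decode_next` are hypothetical names, and the sketch assumes the proxy pulls new packets straight from the decoder (rather than through the queue-first iterator), so that it never re-reads packets it has just queued.

```python
from collections import deque


class QueuedIO:
    """Hypothetical model of an av.IO with an internal packet queue."""

    def __init__(self, packets):
        self._decoder = iter(packets)  # stands in for the real decoder
        self.queue = deque()           # packets skipped by stream proxies

    def decode_next(self):
        """Decode one new packet, bypassing the queue; None at end of input."""
        return next(self._decoder, None)

    def __iter__(self):
        return self

    def __next__(self):
        # Drain queued packets first, then resume decoding.
        if self.queue:
            return self.queue.popleft()
        packet = self.decode_next()
        if packet is None:
            raise StopIteration
        return packet


class StreamProxy:
    """Hypothetical proxy that yields only the values of one stream."""

    def __init__(self, io, streamidx):
        self.io = io
        self.streamidx = streamidx

    def __iter__(self):
        return self

    def __next__(self):
        # Assumption: a packet for this stream may already be queued
        # (e.g. by another proxy), so look there first.
        for i, (idx, val) in enumerate(self.io.queue):
            if idx == self.streamidx:
                del self.io.queue[i]
                return val
        # Otherwise decode new packets, queueing those of other streams.
        while True:
            packet = self.io.decode_next()
            if packet is None:
                raise StopIteration
            idx, val = packet
            if idx == self.streamidx:
                return val
            self.io.queue.append(packet)


io = QueuedIO([(0, "v0"), (1, "a0"), (0, "v1"), (1, "a1")])
video = list(StreamProxy(io, 0))  # consume only stream 0
leftover = list(io)               # skipped packets are still available
```

After the video proxy is exhausted, the audio packets it skipped are still yielded by the container, in their original order.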

av.open(src, mode='r')

Right now, only the default mode 'r' is supported, but I'd like to support 'w', as well as selecting a specific container via 'w:'.

Readable streams behave as expected (as described above), but writable streams are another problem. In general there are lots of ways people may want to create media, and we don't want to lock them into a specific paradigm (e.g. it would be too limiting to always assume a video file contains one video stream and one audio stream at a certain resolution).

We should probably have reasonable defaults, but we could have something like:

av.open(src, mode='w', streams=[ ('encoder-name', (shape,), {extra: meta}), ... ])

where the streams argument is an iterable of stream descriptors, each a tuple of an encoder name, a shape, and a dictionary of extra metadata.
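One way the descriptor tuples could be normalized before a container is created is sketched below in Python. `StreamSpec` and `parse_streams` are hypothetical names, the metadata element is assumed to be optional, and the encoder names, shapes, and metadata keys are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class StreamSpec:
    """Hypothetical normalized form of one stream descriptor."""
    encoder: str
    shape: Tuple[int, ...]
    meta: Dict[str, Any] = field(default_factory=dict)


def parse_streams(streams):
    """Turn an iterable of (encoder, shape[, meta]) tuples into StreamSpec objects."""
    specs = []
    for desc in streams:
        encoder, shape, *rest = desc
        if len(rest) > 1:
            raise ValueError(f"too many fields in stream descriptor: {desc!r}")
        # Assumption: the metadata dictionary may be omitted.
        meta = dict(rest[0]) if rest else {}
        specs.append(StreamSpec(str(encoder), tuple(shape), meta))
    return specs


specs = parse_streams([
    ("h264", (1080, 1920, 3), {"fps": 30}),  # illustrative video stream
    ("aac", (2,)),                           # illustrative audio stream, no metadata
])
```

Validating descriptors up front like this would let a writable av.open() fail early on malformed input, before any output file is touched.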
