'av' - Audio/Video/Media module #5

cadebrown · 2021-01-03T19:24:22Z

'mm' should be able to interface with (i.e. read, write, and probe) standard multimedia formats, such as images, video, and audio.

It should use the types present in nx (i.e. tensors, nx.array) whenever possible. For example, as soon as you read an image, the result should be just the raw pixel data, or a tuple of the raw data and relevant metadata.

There should also be stream writers for outputting long streams, as well as iteration through frames. We should use libav when available, with fallbacks for whatever formats we can support without it (but this module won't be super useful when libav isn't present).

Types:

mm.ImageStream - from a video, yields frames of the video like an iterable
mm.AudioStream - from an audio/video file, yields chunks of the audio like an iterable


cadebrown · 2021-01-07T06:09:31Z

Renamed the module to av, since it is more apt (multimedia may refer more to something like SDL or SFML).

Could also be called media, but I like av since it is short yet descriptive. Also, the types have changed: there is only one main type av will support, but it will also provide utility functions and proxy objects. The main type is av.IO, a general-purpose media IO object which can have multiple streams.

av.open(src, mode) works a lot like the builtin open function, except it returns an av.IO, which is like a media container. Iterating over it returns frames from its streams (so, for an audio+video file, iterating over the opened media container will produce single video frames and chunks of audio). They are returned as a tuple of (streamidx, val), where streamidx is the index of the stream and val is an nx.array/nx.view object describing the raw data.
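To make the iteration contract concrete, here is a rough Python sketch of what consuming such a container could look like. This is not the module's code: `ToyIO` and `split_by_stream` are hypothetical names, the packets are pre-decoded placeholder strings rather than nx.array values, and nothing is actually read from a file.

```python
from typing import Dict, Iterator, List, Tuple

Packet = Tuple[int, object]  # (streamidx, val), as described above


class ToyIO:
    """Hypothetical stand-in for av.IO: replays a fixed list of packets."""

    def __init__(self, packets: List[Packet]):
        self._packets = list(packets)
        self._pos = 0

    def __iter__(self) -> Iterator[Packet]:
        return self

    def __next__(self) -> Packet:
        if self._pos >= len(self._packets):
            raise StopIteration
        item = self._packets[self._pos]
        self._pos += 1
        return item


def split_by_stream(io) -> Dict[int, List[object]]:
    """Group values by stream index, keeping the order within each stream."""
    out: Dict[int, List[object]] = {}
    for streamidx, val in io:
        out.setdefault(streamidx, []).append(val)
    return out


# Stream 0 plays the role of video frames, stream 1 of audio chunks.
demo = ToyIO([(0, "frame0"), (1, "chunk0"), (1, "chunk1"), (0, "frame1")])
grouped = split_by_stream(demo)
```

The point is only that the caller has to dispatch on streamidx, since frames and audio chunks arrive interleaved.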

Thoughts on improvement:

Internal queue on av.IO objects, so irrelevant packets from streams can be skipped but not discarded (iterating over an av.IO will first yield the contents of the queue, and only after it is exhausted will it resume decoding).
Single-stream objects from the av.IO, which act as proxy objects. They internally iterate over av.IO packets, yield those with the correct stream index, and push the others onto the back of the queue. This would make it easy to iterate over a single video stream.
av.open() should also be able to create writable streams. The problem is that creating a media container is a complicated task in general, so we may need specialty functions (see the markdown below in this post).
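The queue and proxy ideas above could be sketched roughly as follows, in Python. All names (`QueuedIO`, `StreamProxy`, `decode_next`, `stream`) are hypothetical, and the "decoder" is just an iterator over prepared packets. One assumption goes beyond the description: the proxy checks the queue for its own packets before decoding anything new, and it pulls fresh packets straight from the decoder rather than through the container's iterator. Otherwise packets it had just pushed back would be handed to it again.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Packet = Tuple[int, object]  # (streamidx, val)


class QueuedIO:
    """Hypothetical stand-in for av.IO with a queue of skipped packets."""

    def __init__(self, source: Iterable[Packet]):
        self._decoder = iter(source)   # pretend this is the real decoder
        self.queue: List[Packet] = []  # packets skipped by stream proxies

    def decode_next(self) -> Optional[Packet]:
        """Return the next freshly decoded packet, or None at end of input."""
        return next(self._decoder, None)

    def __iter__(self) -> Iterator[Packet]:
        # Yield queued packets first; only resume decoding once it is empty.
        while True:
            if self.queue:
                yield self.queue.pop(0)
                continue
            pkt = self.decode_next()
            if pkt is None:
                return
            yield pkt

    def stream(self, idx: int) -> "StreamProxy":
        return StreamProxy(self, idx)


class StreamProxy:
    """Iterates over the values of one stream of a QueuedIO."""

    def __init__(self, io: QueuedIO, idx: int):
        self.io = io
        self.idx = idx

    def __iter__(self) -> "StreamProxy":
        return self

    def __next__(self) -> object:
        # 1. Take the oldest queued packet belonging to this stream, if any.
        for i, (streamidx, val) in enumerate(self.io.queue):
            if streamidx == self.idx:
                del self.io.queue[i]
                return val
        # 2. Otherwise decode until one matches, queueing everything else.
        while True:
            pkt = self.io.decode_next()
            if pkt is None:
                raise StopIteration
            if pkt[0] == self.idx:
                return pkt[1]
            self.io.queue.append(pkt)


io = QueuedIO([(0, "f0"), (1, "a0"), (1, "a1"), (0, "f1"), (1, "a2")])
frames = list(io.stream(0))   # only stream 0; stream 1 packets get queued
leftover = list(io)           # the queued stream 1 packets, in order
```

A real implementation would probably want a bound on the queue, since reading one stream of a long file to the end keeps every packet of the other streams in memory.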

av.open(src, mode='r')

Right now, only the default mode 'r' is supported, but I'd like to support 'w', as well as specific containers specifiable via 'w:'.

Readable streams behave as expected (as described above), but writable streams are another problem. In general, there are lots of ways people may want to create media, and we don't want to lock them into a specific paradigm (e.g. it would be too limiting to always assume a video file is one video stream and one audio stream at a certain resolution).

We should probably have reasonable defaults, but we could have something like:

av.open(src, mode='w', streams=[ ('encoder-name', (shape,), {extra: meta}), ... ])

where the streams argument is an iterable of stream descriptors.
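As a rough illustration of how such descriptors might be normalized before any container is created, here is a Python sketch. `StreamSpec` and `parse_streams` are hypothetical names, the encoder strings in the demo are placeholders that are not checked against any encoder list, and allowing the metadata dict to be omitted is an assumption made in the spirit of "reasonable defaults".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class StreamSpec:
    """Normalized form of one ('encoder-name', (shape,), {extra: meta}) entry."""
    encoder: str
    shape: Tuple[int, ...]
    meta: Dict[str, Any] = field(default_factory=dict)


def parse_streams(streams: Iterable[tuple]) -> List[StreamSpec]:
    """Validate an iterable of stream descriptors and return StreamSpec objects."""
    specs: List[StreamSpec] = []
    for i, desc in enumerate(streams):
        if len(desc) == 2:
            encoder, shape = desc
            meta: Dict[str, Any] = {}   # assumed default: no extra metadata
        elif len(desc) == 3:
            encoder, shape, meta = desc
        else:
            raise ValueError(f"stream {i}: expected 2 or 3 items, got {len(desc)}")
        if not isinstance(encoder, str) or not encoder:
            raise ValueError(f"stream {i}: encoder name must be a non-empty string")
        shape = tuple(int(d) for d in shape)
        if any(d <= 0 for d in shape):
            raise ValueError(f"stream {i}: shape dimensions must be positive")
        specs.append(StreamSpec(encoder, shape, dict(meta)))
    return specs


# Placeholder descriptors: one video-like stream and one audio-like stream.
specs = parse_streams([
    ("h264", (720, 1280, 3), {"fps": 30}),
    ("aac", (2,)),
])
```

Keeping the descriptor a plain tuple keeps the call site short, while the normalized form gives the writer one place to apply defaults and reject malformed input early.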

cadebrown added the enhancement New feature or request label Jan 3, 2021

gcrois assigned cadebrown Jan 3, 2021

gcrois added the module Relating to a standard module label Jan 3, 2021

gcrois changed the title ~~'mm' - Multi-media library~~ 'mm' - Multi-media module Jan 3, 2021

cadebrown changed the title ~~'mm' - Multi-media module~~ 'av' - Audio/Video/Media module Jan 7, 2021
