'mm' should be able to interface with (i.e. read, write, and probe) standard multimedia formats, such as images, video, and audio.
It should use the types present in nx (i.e. tensors, nx.array) whenever possible. For example, reading an image should return just the raw pixel data, or a tuple of the raw data and relevant metadata.
There should also be stream writers for outputting long streams, as well as support for iterating through frames. We should use libav when available, with fallbacks for whatever formats we can support without it (though this module won't be very useful when libav isn't present).
Types:
mm.ImageStream - From a video, yields frames of the video like an iterable
mm.AudioStream - From an audio or video file, yields chunks of the audio like an iterable
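A minimal sketch of the two iterable types, written in Python for illustration only. It assumes frames and samples are already decoded: `frames` stands in for a libav decoder, frames are plain nested lists rather than nx arrays, and `chunk_size` is a hypothetical parameter.

```python
from typing import Iterable, Iterator, List, Sequence

Frame = List[List[int]]  # hypothetical: one frame as rows of raw pixel values


class ImageStream:
    """Yields video frames one at a time; `frames` stands in for a real decoder."""

    def __init__(self, frames: Iterable[Frame]) -> None:
        self._frames = frames

    def __iter__(self) -> Iterator[Frame]:
        # A real implementation would decode lazily (e.g. via libav); here we forward frames.
        yield from self._frames


class AudioStream:
    """Yields fixed-size chunks of audio samples; the last chunk may be shorter."""

    def __init__(self, samples: Sequence[float], chunk_size: int) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self._samples = samples
        self._chunk_size = chunk_size

    def __iter__(self) -> Iterator[Sequence[float]]:
        for start in range(0, len(self._samples), self._chunk_size):
            yield self._samples[start:start + self._chunk_size]
```

For example, `list(AudioStream([0.0, 0.1, 0.2, 0.3, 0.4], 2))` gives `[[0.0, 0.1], [0.2, 0.3], [0.4]]`.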
Renamed the module to av, since it is more apt (multimedia may refer more to something like SDL or SFML).
It could also be called media, but I like av since it is short, yet concise. Also, the types have changed: av will support only one main type, but will provide utility functions and proxy objects. The main type is av.IO, a general-purpose media IO object that can have multiple streams.
av.open(src, mode) works much like the builtin open function, except that it returns an av.IO, which acts like a media container. Iterating over it returns frames from its streams (so, for an audio+video file, iterating over the opened media container produces single video frames and chunks of audio). Each item is a tuple (streamidx, val), where streamidx is the index of the stream and val is an nx.array/nx.view object describing the raw data.
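The iteration behavior described above can be modeled roughly as follows. This is a Python sketch, not the proposed av API: `MediaIO` is a hypothetical stand-in for av.IO, the `packets` iterable stands in for libav demuxing and decoding, and payloads are plain strings instead of nx.array/nx.view objects.

```python
from typing import Any, Iterable, Iterator, Tuple

Packet = Tuple[int, Any]  # (streamidx, val)


class MediaIO:
    """Hypothetical container: iterating yields (streamidx, val) in decode order."""

    def __init__(self, packets: Iterable[Packet], nstreams: int) -> None:
        self._packets = iter(packets)  # stands in for a libav demuxer/decoder
        self.nstreams = nstreams

    def __iter__(self) -> Iterator[Packet]:
        return self

    def __next__(self) -> Packet:
        # StopIteration from the underlying source ends iteration over the container.
        streamidx, val = next(self._packets)
        if not 0 <= streamidx < self.nstreams:
            raise ValueError(f"bad stream index {streamidx}")
        return streamidx, val
```

In an audio+video file with video as stream 0 and audio as stream 1, iterating would interleave items such as `(0, frame)` and `(1, audio_chunk)` in whatever order they were decoded.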
Thoughts on improvement:
Internal queue on av.IO objects, so packets from irrelevant streams can be skipped without being discarded (iterating over an av.IO first yields the queued packets, and only once the queue is exhausted does it resume decoding)
Single-stream objects obtained from an av.IO, which act as proxy objects. They internally iterate over the av.IO's packets, yield those with the matching stream index, and push the others onto the back of the queue. This would make it easy to iterate over a single video stream
av.open() should also be able to create writable streams. The problem is that creating a media container is, in general, a complicated task, so we may need specialty functions (see the notes on av.open below)
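The queue and proxy ideas above could look something like this Python sketch. All names (`MediaIO`, `stream`, `StreamProxy`, `_take_for`) are hypothetical, and `packets` again stands in for libav decoding. One design choice differs slightly from the description: a proxy first searches the queue for its own stream and only then decodes new packets, deferring the others. If the proxy instead re-read the container's own iterator (queue first), it could spin forever over a queue full of other streams' packets.

```python
from collections import deque
from typing import Any, Deque, Iterable, Iterator, Tuple

Packet = Tuple[int, Any]  # (streamidx, val)


class MediaIO:
    def __init__(self, packets: Iterable[Packet]) -> None:
        self._decoder = iter(packets)          # stands in for libav demux/decode
        self._queue: Deque[Packet] = deque()   # packets set aside by stream proxies

    def __iter__(self) -> Iterator[Packet]:
        return self

    def __next__(self) -> Packet:
        # Drain queued packets before decoding anything new.
        if self._queue:
            return self._queue.popleft()
        return next(self._decoder)

    def _take_for(self, streamidx: int) -> Packet:
        # Prefer an already-queued packet belonging to this stream.
        for i, pkt in enumerate(self._queue):
            if pkt[0] == streamidx:
                del self._queue[i]  # safe: we return immediately after mutating
                return pkt
        # Otherwise decode, pushing other streams' packets onto the back of the queue.
        while True:
            pkt = next(self._decoder)  # StopIteration ends the proxy's iteration
            if pkt[0] == streamidx:
                return pkt
            self._queue.append(pkt)

    def stream(self, streamidx: int) -> "StreamProxy":
        return StreamProxy(self, streamidx)


class StreamProxy:
    """Iterates over a single stream of a MediaIO, yielding only its values."""

    def __init__(self, io: MediaIO, streamidx: int) -> None:
        self._io = io
        self.streamidx = streamidx

    def __iter__(self) -> Iterator[Any]:
        return self

    def __next__(self) -> Any:
        return self._io._take_for(self.streamidx)[1]
```

For example, reading only the video stream of `MediaIO([(0, "v0"), (1, "a0"), (0, "v1"), (1, "a1")])` yields `"v0"` and `"v1"`, and the audio packets remain queued, so a later plain iteration over the container still returns them.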
av.open(src, mode='r')
Right now, only the default mode 'r' is supported, but I'd like to support 'w', as well as specifying a particular container via 'w:'.
Readable streams behave as expected (as described above), but writable streams are another matter. There are many ways people may want to create media, and we don't want to lock them into a specific paradigm (e.g. it would be too limiting to always assume a video file has exactly one video and one audio stream at a certain resolution).
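One way the mode string might be validated is sketched below in Python. It assumes, purely for illustration, that whatever follows 'w:' is a container name (the name `parse_mode` and the return shape are hypothetical); when no container is given, an implementation could fall back to guessing from the file extension.

```python
from typing import Optional, Tuple


def parse_mode(mode: str = "r") -> Tuple[str, Optional[str]]:
    """Split a mode such as 'r', 'w', or 'w:<container>' into (access, container).

    The '<container>' suffix interpretation is an assumption for illustration.
    """
    access, sep, container = mode.partition(":")
    if access not in ("r", "w"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if sep and access != "w":
        raise ValueError("a container may only be specified in 'w' mode")
    if sep and not container:
        raise ValueError("missing container name after 'w:'")
    return access, (container or None)
```

So `parse_mode()` returns `("r", None)`, while `parse_mode("w:mp4")` would return `("w", "mp4")` under this assumed syntax.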
We should probably have reasonable defaults, but we could have something like: