9/26/2024 NOTE:

This documentation is not fully updated. We are moving from the old S3 buckets to new sponsored open data buckets in fall 2024! Most importantly, the live lossy audio data are no longer stored in streaming-orcasound-net but are instead streamed to audio-orcasound-net...

Overview

Orcasound hydrophone data are stored in publicly accessible Amazon Web Services (AWS) Simple Storage Service (S3) buckets. The buckets have both public-list and public-read enabled, which means you can use the AWS CLI to connect directly to the buckets, list the available files, and download them without any special credentials.

There are three types of buckets, two of which have live and dev versions:

Streaming -- lossy compressed data for live listening (e.g. HLS and/or DASH)
Archive -- lossless compressed data for nodes with sufficient bandwidth (FLAC format)
Sandboxes -- data sets for machine learning and other analyses

The two versions of the streaming bucket support three versions of the Orcasound app (as depicted in this evolution model): dev-streaming-orcasound-net is for end-to-end tests where the audio source is stable/known; streaming-orcasound-net is used both for beta-testing new app features with realistic audio data from existing nodes and for the public production version at live.orcasound.net.

Installing AWS CLI

pip install awscli

On Linux or macOS, you may also use a package manager such as apt-get or Homebrew. Or, for a friendlier UI, check out SAWS. Either way, if the aws command works, then you are ready to go!
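A quick way to confirm the installation is to check that the shell can find the aws command. The sketch below is only illustrative; the function name check_aws_cli is hypothetical and is not part of the AWS CLI.

```shell
# Hypothetical helper: report whether the aws command is on the PATH.
check_aws_cli() {
  if command -v aws >/dev/null 2>&1; then
    echo "aws found at $(command -v aws)"
  else
    echo "aws not found; try installing it, e.g. pip install awscli" >&2
    return 1
  fi
}

# Usage: check_aws_cli && aws --version
```

The function returns a non-zero status when aws is missing, so it can guard later commands in a script.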

Connecting to the buckets

No credentials are necessary to connect to the publicly accessible buckets; just use the --no-sign-request flag instead. For example, the command to access the lossy compressed audio stream segments (HLS format) in the streaming-orcasound-net bucket is:

aws --no-sign-request s3 ls streaming-orcasound-net

Practical example: If you take a look at the live stream for a particular node using the network tab of your browser's development console, you may be able to note the URL of the audio data segments.

From that URL, you should be able to derive variable $1 -- the node name (one string with underscores, e.g. bush_point) and variable $2 -- the UNIX timestamp of the desired S3 folder within the node's hls folder. Then you can construct a command like this to download all the available data for that period:

aws --no-sign-request s3 sync s3://streaming-orcasound-net/rpi_$1/hls/$2/ .

That is the general form. In this case of Bush Point in the evening of 27 Sep 2020, the command is:

aws --no-sign-request s3 sync s3://streaming-orcasound-net/rpi_bush_point/hls/1601253021/ .
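The derivation of the node name and timestamp can be sketched in shell. This is a minimal sketch, assuming the segment URL seen in the browser contains the same rpi_<node>/hls/<timestamp>/ path that appears in the S3 commands; the function names and the example URL (its host and the segment file name live000.ts) are hypothetical. The last helper only prints the download command, with the --no-sign-request flag described earlier, so you can review it before running it.

```shell
# Hypothetical helpers; they assume the URL contains .../rpi_<node>/hls/<timestamp>/...

# Print the node name (e.g. bush_point) found in a segment URL.
node_from_url() {
  printf '%s\n' "$1" | sed -n 's|.*/rpi_\([^/]*\)/hls/[0-9][0-9]*/.*|\1|p'
}

# Print the UNIX timestamp of the HLS folder found in a segment URL.
timestamp_from_url() {
  printf '%s\n' "$1" | sed -n 's|.*/rpi_[^/]*/hls/\([0-9][0-9]*\)/.*|\1|p'
}

# Print a UNIX timestamp as a UTC date (tries GNU date, then BSD/macOS date).
timestamp_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC' 2>/dev/null ||
    date -u -r "$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Print (without running) the sync command for a node and timestamp.
# The optional third argument overrides the bucket name.
sync_command() {
  printf 'aws --no-sign-request s3 sync s3://%s/rpi_%s/hls/%s/ .\n' \
    "${3:-streaming-orcasound-net}" "$1" "$2"
}

# Example with a made-up URL that follows the assumed pattern.
url='https://example.com/rpi_bush_point/hls/1601253021/live000.ts'
node=$(node_from_url "$url")
ts=$(timestamp_from_url "$url")
timestamp_to_utc "$ts"
sync_command "$node" "$ts"
```

For the Bush Point example, the timestamp 1601253021 is 2020-09-28 00:30:21 UTC, which is 17:30 Pacific Daylight Time on 27 Sep 2020. The bucket override exists because the bucket that holds the live data may change, as the note at the top of this page describes.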

7/28/2022 note: here is a shell script that Scott uses on OSX to grab 6-24 periods of live-streamed data that contain Orcasound bioacoustic bouts identified by human and/or machine detectors. There is a more programmatic approach, initiated by the OrcaHello realtime inference hackathon teams, that was built upon by Dimtry during the 2021 Google Summer of Code. Prakruti and Valentina know the most about these efforts to improve and automate programmatic access to the Orcasound realtime data streams.

For nodes that have sufficient bandwidth, the lossless compressed audio data (FLAC format) can be found in the archive-orcasound-net bucket here:

aws --no-sign-request s3 ls archive-orcasound-net

Available buckets

Bucket	Description
streaming-orcasound-net	Production streaming data
dev-streaming-orcasound-net	Dev streaming data
archive-orcasound-net	Production lossless compressed data
dev-archive-orcasound-net	Dev lossless compressed data
acoustic-sandbox	Acoustic machine learning labeled data & models
visual-sandbox	Visual machine learning labeled data & models
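
To survey all of the buckets in the table, a loop can generate one listing command per bucket. This is a minimal sketch, assuming the bucket names above are still current (see the note at the top of this page); the variable and function names are hypothetical, and the function only prints the commands rather than running them.

```shell
# Bucket names copied from the table above.
ORCASOUND_BUCKETS="streaming-orcasound-net dev-streaming-orcasound-net archive-orcasound-net dev-archive-orcasound-net acoustic-sandbox visual-sandbox"

# Hypothetical helper: print one anonymous listing command per bucket.
print_list_commands() {
  for bucket in $ORCASOUND_BUCKETS; do
    echo "aws --no-sign-request s3 ls s3://$bucket/"
  done
}

print_list_commands
```

To run the listings instead of printing them, pipe the output to a shell, for example print_list_commands | sh.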

AWS CLI syntax

To learn how to use the AWS CLI to download Orcasound data, please see Using Amazon S3 with the AWS CLI.

Browsing Orcasound data via Quilt

An alternative to listing the contents of Orcasound's S3 buckets via the AWS CLI is browsing the buckets via open.quiltdata.com (thanks, Praful!). For example, you can examine the live-streamed data via https://open.quiltdata.com/b/streaming-orcasound-net. Substitute other bucket names as listed above to explore all of our raw and labeled data, and other open resources.
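
The substitution described above can be sketched as a small shell function. The function name quilt_url is hypothetical; it only builds the browsing URL from a bucket name and does not check that the bucket exists.

```shell
# Hypothetical helper: print the Quilt browsing URL for a bucket name.
quilt_url() {
  printf 'https://open.quiltdata.com/b/%s\n' "$1"
}

quilt_url archive-orcasound-net
```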

Accessing machine learning resources

The AWS CLI can be used to acquire training and testing data if you're interested in developing machine learning algorithms. Please refer to the Orcadata wiki for further information.
