podscript

podscript is a tool to generate transcripts for podcasts (and other similar audio files) using LLMs and Speech-to-Text (STT) APIs. Currently, OpenAI (ChatGPT), Anthropic, Deepgram and Groq are supported.

Prerequisites

You need an API key for at least one of the following services to use podscript:

  • ChatGPT API Key or Anthropic API Key, to clean up and transcribe YouTube autogenerated captions using either OpenAI's ChatGPT model or Anthropic's Claude model.
  • Deepgram API Key to transcribe any podcast audio file. Deepgram has excellent, inexpensive STT models and offers free signup with $200 in credit to get started.
  • Groq API Key to clean up and transcribe YouTube autogenerated captions, or to transcribe an audio file with Groq's whisper-v3-large model.
  • More APIs (e.g. OpenAI Whisper) will be supported in the future. Contributions are welcome.

Install

> go install github.com/deepakjois/podscript@latest

> ~/go/bin/podscript --help

Configure

The configure command prompts you to enter API keys for the supported services and writes them to $HOME/.podscript.toml.

> podscript configure

Alternatively, you can set keys in environment variables prefixed with PODSCRIPT_, e.g. PODSCRIPT_OPENAI_API_KEY and PODSCRIPT_DEEPGRAM_API_KEY.
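
For example, to set the two environment variables named above in a shell session (the key values are placeholders):

> export PODSCRIPT_OPENAI_API_KEY="sk-..."
> export PODSCRIPT_DEEPGRAM_API_KEY="..."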

Usage

Transcript from YouTube autogenerated captions

For podcasts on YouTube with autogenerated captions (e.g. Andrew Huberman and Cal Newport), use the ytt subcommand to download the captions from the YouTube video and feed them to an LLM to generate a clean transcript. You can customise the model used for transcription using the --model flag, which can be one of gpt-4o-mini (default if omitted), gpt-4o, claude-3-5-sonnet-20240620 or llama-3.1-70b-versatile.

> podscript ytt https://www.youtube.com/watch?v=aO1-6X_f74M
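
For example, to clean up the captions with Claude instead of the default gpt-4o-mini, pass one of the model values listed above via --model:

> podscript ytt --model claude-3-5-sonnet-20240620 https://www.youtube.com/watch?v=aO1-6X_f74M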

To customise the output path and add a recognizable suffix to the transcript filenames, use the --path and --suffix options:

> podscript ytt https://www.youtube.com/watch?v=aO1-6X_f74M --suffix short --path ~/Downloads

Sample Output:

wrote raw autogenerated captions to /Users/deepak/Downloads/raw_transcript_2024-07-05-170548_short.txt
transcribed part 1/1…
wrote cleaned up transcripts to /Users/deepak/Downloads/cleaned_transcript_2024-07-05-170548_short.txt

Transcript from Deepgram API

Use the deepgram subcommand to generate transcripts that are of a higher quality than YouTube autogenerated captions. Deepgram provides a great API (with $200 free signup credit!) and excellent, fast models for transcribing audio files.

Locate the audio file link for any podcast on ListenNotes and use the --from-url option:

> podscript deepgram --from-url  https://audio.listennotes.com/e/p/d6cc86364eb540c1a30a1cac2b77b82c/

Sample Output:

podscript deepgram --from-url  https://audio.listennotes.com/e/p/d6cc86364eb540c1a30a1cac2b77b82c/
wrote raw JSON API response to deepgram_api_response_2024-07-05-173538.json
wrote transcript to deepgram_api_response_2024-07-05-173538.json

Alternatively, you can pass a local audio file by using --from-file instead of --from-url. You can also customise the path and add a recognizable suffix with the --path and --suffix options.
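
For example, transcribing a local file with a custom output directory and suffix (the filename below is illustrative):

> podscript deepgram --from-file huberman.mp3 --path ~/Downloads --suffix huberman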

Tip

You can find the audio download link for a podcast on ListenNotes under the More menu.


Transcript from Groq Whisper API

Use the groq subcommand to generate transcripts using the whisper-v3-large model from Groq's API endpoint (which as of Jul 2024 is in beta and free to use within your rate limits).

> podscript groq huberman.mp3

Sample Output:

wrote raw JSON API response to groq_whisper_api_response_2024-07-11-145154.json
wrote transcript to groq_whisper_api_transcript_2024-07-11-145154.txt

Use the --verbose flag to dump timestamps for audio segments in the raw JSON response.
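
For example, reusing the audio file from the earlier command (exact flag placement may vary):

> podscript groq --verbose huberman.mp3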

Feedback

Feel free to drop me a note on X or email me.

License

MIT
