AI Engineer Guide

How to Optimize OpenAI transcriptions faster and cheaper

I came across this wonderful tip by George Mandis where he has covered step by step process on how to do it

How does it works?

Speed up the audio by 2x or 3x and get results faster and cheaper.

Note: gpt-4o-transcribe charges by a min.

Dependency

brew install yt-dlp ffmpeg 

How to do it?

1. Extract the audio from video

# Extract the audio from the video
yt-dlp -f 'bestaudio[ext=m4a]' --extract-audio --audio-format m4a -o 'video-audio.m4a' "https://www.youtube.com/watch?v=LCEmiRjPEtQ" -k;

2. Create a low-bitrate MP3 version at 3x speed

ffmpeg -i "video-audio.m4a" -filter:a "atempo=3.0" -ac 1 -b:a 64k video-audio-3x.mp3;

3. Transcription

# Send it along to OpenAI for a transcription
curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: multipart/form-data' \
  --form file=@video-audio-3x.mp3 \
  --form model=gpt-4o-transcribe > video-transcript.txt;

4. Summarize of the transcripte

# Send it along to OpenAI for a transcription
curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: multipart/form-data' \
  --form file=@video-audio-3x.mp3 \
  --form model=gpt-4o-transcribe > video-transcript.txt;

References

#OpenAI

Want to stay updated?
Sign up for the newsletter.

No spam. Unsubscribe at any time.