Building a Telegram Bot That Podcasts Journal Articles with AI

python
AI
tools
NLP
Author

Jihong Zhang

Published

March 12, 2026

Modified

March 12, 2026

Overview

I built a small personal tool that turns academic journal articles into audio podcast summaries and delivers them to my phone via Telegram — automatically, every morning.

The source code lives on GitHub (private for now).

This post walks through the architecture, the AI pipeline, and the practical details of wiring it all together.

1 Motivation

Reading journal articles is time-consuming, and skimming abstracts often misses the substance. I wanted something that could:

  1. Accept a paper (URL, pasted text, or PDF)
  2. Distill the 5 key takeaways into a spoken, conversational summary
  3. Deliver a polished MP3 directly to my phone
  4. Run on autopilot every morning with the latest relevant paper from Semantic Scholar

The result is a Python bot that lives on my laptop and communicates through Telegram.

2 Architecture

The project is three focused modules:

bot.py          # Telegram webhook server + message handlers + daily scheduler
processor.py    # Article fetch / PDF extract → Claude AI → OpenAI TTS → MP3
scholar.py      # Semantic Scholar API client for the daily paper feed

The flow looks like this:

User sends URL / text / PDF
  ↓
bot.py receives message
  ↓
processor.py fetches & extracts text
  ↓
Claude generates podcast script (~500 words, 5 takeaways)
  ↓
OpenAI TTS renders MP3 (voice: nova)
  ↓
Bot sends audio back to Telegram

3 The AI Pipeline (processor.py)

3.1 Step 1 — Text extraction

Three input types are handled:

  • URL: requests fetches the page; BeautifulSoup strips scripts/styles and joins the <p> tags. Paywalled sites return a friendly error asking the user to paste the text directly.
  • Plain text: passed through directly.
  • PDF: PyMuPDF (fitz) extracts raw text page by page.

In all cases text is trimmed to ~8,000 characters to stay within token limits.
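The URL branch can be sketched with the standard library alone. The real module uses requests + BeautifulSoup, so treat this stdlib `html.parser` version as an illustration of the same idea (collect `<p>` text, ignore script/style content, trim to ~8,000 characters), not the project's actual code:

```python
from html.parser import HTMLParser

MAX_CHARS = 8000  # trim limit mentioned above


class ParagraphExtractor(HTMLParser):
    """Collect text inside <p> tags, ignoring <script>/<style> content."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._skip = 0
        self._buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "p":
            if self._in_p and self._buf:
                self.paragraphs.append("".join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip:
            self._buf.append(data)


def extract_article_text(html: str) -> str:
    """Join all paragraph text and trim to the token-budget-friendly cap."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n\n".join(parser.paragraphs)[:MAX_CHARS]
```

BeautifulSoup tolerates far messier real-world HTML, which is why the project uses it; the trimming step at the end is the part that matters for the token budget.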

3.2 Step 2 — Script generation with Claude

PODCAST_PROMPT = """\
You are a podcast scriptwriter. Given the following article, extract the 5 most
important takeaways and rewrite them as a short, engaging, conversational podcast
segment (no headers, no bullet points — flowing speech only). Keep it under 500 words.

ARTICLE:
{text}"""

The model used is claude-sonnet-4-6 via the Anthropic Python SDK. The 500-word cap keeps the output comfortably under the OpenAI TTS input limit of 4,096 characters.
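The generation step might look like the sketch below. The prompt and model name come from the post; the call shape follows the Anthropic Messages API, but the function names and `max_tokens` value are assumptions, not the project's exact code:

```python
PODCAST_PROMPT = """\
You are a podcast scriptwriter. Given the following article, extract the 5 most
important takeaways and rewrite them as a short, engaging, conversational podcast
segment (no headers, no bullet points — flowing speech only). Keep it under 500 words.

ARTICLE:
{text}"""

MAX_CHARS = 8000  # trim limit from the extraction step


def build_prompt(article_text: str) -> str:
    """Trim the article and interpolate it into the podcast prompt."""
    return PODCAST_PROMPT.format(text=article_text[:MAX_CHARS])


def generate_script(article_text: str) -> str:
    """Ask Claude for a ~500-word conversational script."""
    # Imported here so the pure helper above stays usable without the SDK.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    message = client.messages.create(
        model="claude-sonnet-4-6",   # model named in the post
        max_tokens=1024,             # assumed budget; ~500 words fits easily
        messages=[{"role": "user", "content": build_prompt(article_text)}],
    )
    return message.content[0].text
```

Keeping the prompt as a module-level constant makes it easy to iterate on the wording without touching the call site.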

3.3 Step 3 — Text-to-speech with OpenAI

The MP3 is saved to a temp audio/ directory, sent to Telegram, then immediately deleted to save disk space.

response = _openai.audio.speech.create(
    model="tts-1",
    voice="nova",   # alloy | echo | fable | onyx | nova | shimmer
    input=script,
)
response.stream_to_file(str(mp3_path))
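The save → send → delete lifecycle described above can be sketched with the standard library. The function and directory names here are hypothetical; the point is that the MP3 only exists on disk for the duration of the send:

```python
import tempfile
from pathlib import Path


def with_temp_mp3(audio_bytes: bytes, send) -> None:
    """Write audio to a temp file, hand its path to `send`, then delete it."""
    audio_dir = Path(tempfile.mkdtemp(prefix="audio_"))
    mp3_path = audio_dir / "episode.mp3"
    try:
        mp3_path.write_bytes(audio_bytes)
        send(mp3_path)  # e.g. upload to Telegram
    finally:
        # Clean up even if the send fails, so failed runs don't leak disk space.
        mp3_path.unlink(missing_ok=True)
        audio_dir.rmdir()
```

Deleting in a `finally` block is the design choice that matters: a Telegram timeout mid-upload otherwise leaves orphaned MP3s accumulating in the temp directory.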

4 The Daily Paper Feed (scholar.py)

Every morning at a scheduled time, the bot:

  1. Queries the Semantic Scholar API for recent papers matching a custom keyword query.
  2. Sorts results by publicationDate (newest first).
  3. Skips papers already sent (tracked in sent_papers.json by paper ID and lowercase title).
  4. Formats the paper metadata + abstract and passes it through the same AI → TTS pipeline.
  5. Sends a text summary card followed by the MP3 to a configured Telegram chat.
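The dedup step (3) might look like the sketch below. The filename `sent_papers.json` and the ID-plus-lowercase-title keying come from the post, but the internal JSON layout is an assumption:

```python
import json
from pathlib import Path

SENT_FILE = Path("sent_papers.json")  # filename from the post


def already_sent(paper_id: str, title: str, path: Path = SENT_FILE) -> bool:
    """True if this paper was delivered before, by ID or lowercase title."""
    if not path.exists():
        return False
    sent = json.loads(path.read_text())
    return paper_id in sent.get("ids", []) or title.lower() in sent.get("titles", [])


def mark_sent(paper_id: str, title: str, path: Path = SENT_FILE) -> None:
    """Record a delivered paper so tomorrow's run skips it."""
    sent = json.loads(path.read_text()) if path.exists() else {"ids": [], "titles": []}
    sent["ids"].append(paper_id)
    sent["titles"].append(title.lower())
    path.write_text(json.dumps(sent, indent=2))
```

Tracking the lowercase title as well as the ID guards against the same paper appearing under two Semantic Scholar IDs (e.g. a preprint and its published version).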

Rate-limit handling (HTTP 429) uses exponential backoff: 15 → 30 → 60 → 120 seconds between retries.
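The backoff schedule above can be sketched as follows. The `sleep` parameter is injected purely to make the schedule testable; the retry count and delays match the post, while the function names are illustrative:

```python
import time

BACKOFF_SECONDS = [15, 30, 60, 120]  # schedule from the post


def get_with_backoff(fetch, is_rate_limited, sleep=time.sleep):
    """Call `fetch`; on a rate-limited response, retry after growing delays."""
    response = None
    for delay in [0] + BACKOFF_SECONDS:
        if delay:
            sleep(delay)
        response = fetch()
        if not is_rate_limited(response):
            return response
    return response  # still rate-limited after the final retry
```

Doubling the delay each time (15 → 30 → 60 → 120 s) is the classic exponential-backoff shape: it backs off fast enough to respect the API while still retrying within a few minutes.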

5 The Telegram Bot (bot.py)

Built on python-telegram-bot 21.* with a local webhook (tunnelled via ngrok on the free tier).

Handlers registered:

Handler                           Trigger
/start                            Welcome message with usage instructions
MessageHandler(TEXT)              URL or pasted article text
MessageHandler(Document.PDF)      PDF file upload
APScheduler CronTrigger(hour=8)   Daily paper job

The scheduler runs inside the same async event loop via AsyncIOScheduler from APScheduler <4 (the 4.x release has a different API).
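Putting the table and the scheduler note together, the wiring in bot.py might look like this sketch. The handler/job function names are hypothetical stubs; the API calls follow python-telegram-bot 21.x and APScheduler 3.x, with the scheduler started from `post_init` so it joins the bot's own event loop:

```python
DAILY_HOUR = 8  # CronTrigger(hour=8) from the handler table


async def start(update, context):
    """Stub: welcome message for /start."""


async def handle_text(update, context):
    """Stub: URL or pasted article text."""


async def handle_pdf(update, context):
    """Stub: PDF file upload."""


async def daily_paper_job():
    """Stub: fetch and podcast the day's paper."""


def main() -> None:
    # Third-party imports live inside main() so the stubs above are
    # importable without the libraries installed.
    import os
    from telegram.ext import Application, CommandHandler, MessageHandler, filters
    from apscheduler.schedulers.asyncio import AsyncIOScheduler
    from apscheduler.triggers.cron import CronTrigger

    async def start_scheduler(app):
        scheduler = AsyncIOScheduler()
        scheduler.add_job(daily_paper_job, CronTrigger(hour=DAILY_HOUR))
        scheduler.start()

    app = (Application.builder()
           .token(os.environ["TELEGRAM_TOKEN"])
           .post_init(start_scheduler)   # scheduler shares the bot's loop
           .build())
    app.add_handler(CommandHandler("start", start))
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_text))
    app.add_handler(MessageHandler(filters.Document.PDF, handle_pdf))
    app.run_webhook(listen="0.0.0.0", port=8080,
                    webhook_url=os.environ["WEBHOOK_URL"])
```

Running the cron job inside the same `asyncio` loop (rather than a separate thread) is what lets it call the bot's async send methods directly.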

6 Environment Setup

Dependencies:

python-telegram-bot[webhooks]==21.*
anthropic
openai
requests
beautifulsoup4
python-dotenv
pymupdf
apscheduler>=3.10,<4

Required environment variables (stored in .env, never committed):

TELEGRAM_TOKEN=<your bot token from @BotFather>
ANTHROPIC_API_KEY=<your Anthropic key>
OPENAI_API_KEY=<your OpenAI key>
WEBHOOK_URL=https://<ngrok-subdomain>.ngrok-free.app
TELEGRAM_CHAT_ID=<numeric ID of your personal chat>
TIMEZONE=America/New_York   # adjust to your local timezone
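The project loads these with python-dotenv; for illustration, a minimal stdlib loader looks like the sketch below (it skips quoting and inline comments, which python-dotenv handles properly):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value lines, '#' comment lines, blanks ignored."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: real environment variables win over .env entries.
        os.environ.setdefault(key.strip(), value.strip())
```

Keeping secrets in `.env` (and out of version control) means the same code runs unchanged on any machine that supplies its own keys.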

Running it:

# Terminal 1 — expose local port via ngrok
ngrok http 8080

# Terminal 2 — run the bot (caffeinate prevents Mac sleep)
caffeinate -i venv/bin/python bot.py

Note: ngrok generates a new subdomain on every restart (free tier). Update WEBHOOK_URL in .env and restart the bot each time.

7 Known Limitations

Issue                                    Notes
Paywalled sites (e.g. Wiley, Elsevier)   Return 403 — paste the article text directly
ngrok URL changes on restart             Free tier only; a paid static domain avoids this
Mac must stay awake                      caffeinate handles this while the terminal is open
OpenAI TTS input limit                   Scripts capped at 500 words to stay within 4,096 characters

8 Reflections

The project took an afternoon to wire together. The most interesting part was getting the prompt right — Claude’s first drafts were too listy and report-like, so the explicit instruction “flowing speech only, no bullet points” was necessary to get something that sounds natural when spoken aloud.

The daily paper feature turned out to be the most useful part in practice. Getting a two-minute audio digest of a new paper with my morning coffee is a habit I didn’t know I wanted.
