CLI reference
python -m thesisagents (also installed as the thesisagents
console script) is the canonical entrypoint. It has two mutually
exclusive modes:
- Search mode: `--query <keywords>` runs the search pipeline against the requested source(s).
- Single-paper mode: `--paper <identifier>` resolves one paper by arXiv ID / URL, DOI, PMID, or IEEE document URL.
Usage
thesisagents (--query KEYWORDS | --paper IDENTIFIER)
[--source SOURCES] [--exclude-source SOURCES]
[--max N]
[--year-from YEAR] [--year-to YEAR] [--min-citations N]
[--export FORMATS]
[--out DIR]
[--filename-stem STEM]
[--no-abstract]
[--lang LANG]
[--enrich] [--lightweight]
[--llm-model MODEL]
[--all-venues]
[--paywall-threshold FLOAT] [--yes]
[--max-slides N] [--light-mode]
[--quiet]
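The mutual exclusion in the synopsis above can be sketched with the standard-library `argparse` module. This is only an illustration, not the project's actual parser: `build_parser` and `bounded_int` are hypothetical names, only three flags are modelled, and the 1..200 range is assumed to belong to `--max`.

```python
import argparse


def bounded_int(low: int, high: int):
    """Return an argparse type that accepts integers in [low, high]."""
    def parse(text: str) -> int:
        value = int(text)  # a ValueError here is reported by argparse as a usage error
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"must be in {low}..{high}, got {value}")
        return value
    return parse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only --query, --paper and --max."""
    parser = argparse.ArgumentParser(prog="thesisagents")
    # Exactly one of the two modes must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--query", metavar="KEYWORDS")
    mode.add_argument("--paper", metavar="IDENTIFIER")
    # Assumption: the 1..200 range applies to --max; no default is set here.
    parser.add_argument("--max", type=bounded_int(1, 200), metavar="N")
    return parser
```

With this sketch, passing both `--query` and `--paper`, or neither, makes `parse_args` exit with a usage error, which matches the `(--query KEYWORDS | --paper IDENTIFIER)` notation.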
Flags
| Flag | Default | Notes |
|---|---|---|
| `--query` | — | Keywords; mutually exclusive with `--paper`. |
| `--paper` | — | arXiv ID / URL, DOI, PMID, or IEEE document URL. |
| `--source` | default mix | Comma-separated. Available: |
| `--exclude-source` | — | Comma-separated sources to remove from the mix, subtracted after |
| `--max` | | Range 1..200. |
| `--year-from` / `--year-to` | — | Inclusive year filter. |
| `--min-citations` | — | Drop papers below this citation count, enforced across all sources (not just Semantic Scholar). Papers whose source reports no count are kept. Omit for no minimum. |
| `--export` | mode-specific | Any of |
| | — | Print the available search sources / export formats and exit (no query needed). |
| `--out` | | Created if missing. |
| `--filename-stem` | auto | |
| `--no-abstract` | off | Drops abstracts and any LLM summary content; the deck shows only title / author / link slides. |
| `--lang` | | Slide-deck template language. Supported: |
| `--enrich` | auto-on when | Fetch each paper’s PDF and have the Anthropic API write a structured summary; the deck switches to thesis-style layout. Requires |
| `--lightweight` | off | Force the abstract-only deck even when |
| `--llm-model` | | Override the default model used when |
| `--all-venues` | off | Restrict results to the curated top-tier CS venue whitelist (S&P / CCS / NDSS / USENIX Security / NeurIPS / ICML / ICSE / SIGMOD / SIGCOMM / CHI / etc.) + arXiv pass-through. Off by default so IEEE / ACM workshop papers (which dominate “LLM × security” / “LLM × X” topics) survive. |
| | off | Skip the open-access PDF resolver step that runs after dedup. By default the pipeline looks up every paper without |
| `--paywall-threshold` | | Fraction of paywalled results above which the search-mode pipeline asks the user before generating per-paper PPTs. |
| `--yes` | off | Auto-accept the paywall prompt. |
| `--max-slides` | | Per-paper slide cap. Pass |
| `--light-mode` | off | Render the pptx in light mode (white slide background + navy text). Dark mode is the default; the exporter swaps the brand palette via a post-build pass (slide bg |
| `--quiet` | off | Suppress the per-paper one-line printout to stdout. |
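The filter semantics described in the table (inclusive year bounds, a citation minimum that keeps papers with no reported count, and the paywall prompt threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's real code: the `Paper` record and both function names are hypothetical, and "above" is read as strictly greater than the threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Paper:
    """Hypothetical record; the project's real model may differ."""
    title: str
    year: int
    citations: Optional[int] = None  # None means the source reported no count
    paywalled: bool = False


def apply_filters(
    papers: Iterable[Paper],
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    min_citations: Optional[int] = None,
) -> List[Paper]:
    """Keep papers inside the inclusive year range and above the citation floor."""
    kept = []
    for paper in papers:
        if year_from is not None and paper.year < year_from:
            continue
        if year_to is not None and paper.year > year_to:
            continue
        # Papers with an unknown citation count are kept, as the table states.
        if (
            min_citations is not None
            and paper.citations is not None
            and paper.citations < min_citations
        ):
            continue
        kept.append(paper)
    return kept


def needs_paywall_prompt(papers: List[Paper], threshold: float, assume_yes: bool = False) -> bool:
    """Return True when the paywalled fraction exceeds the threshold and --yes is not set."""
    if assume_yes or not papers:
        return False
    fraction = sum(p.paywalled for p in papers) / len(papers)
    return fraction > threshold  # assumption: strictly greater than
```

For example, with `min_citations=10` a paper reporting 2 citations is dropped while a paper whose source reports no count stays in the result list.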
Examples
Keyword search
# Default exports: pptx + xlsx + bib
thesisagents --query "diffusion models" --source arxiv --max 10 \
--out ./exports/
# Restrict to recent work + custom filename
thesisagents --query "graph neural network drug discovery" \
--year-from 2022 --year-to 2025 \
--max 15 --export pptx,xlsx,bib --out ./exports/ \
--filename-stem gnn-drug-review
Single paper
thesisagents --paper 2401.08741 --out ./exports/
thesisagents --paper "https://arxiv.org/abs/1706.03762" \
--filename-stem attention --out ./exports/
thesisagents --paper "https://pubmed.ncbi.nlm.nih.gov/34567890/" \
--out ./exports/
thesisagents --paper "https://ieeexplore.ieee.org/document/10965643" \
--out ./exports/
Local PDF (single or batch)
# One PDF — title / authors / year / DOI / arXiv ID / real abstract are
# extracted heuristically from the PDF front matter.
thesisagents --pdf ./papers/attention.pdf --out ./exports/
# Override any extracted field with a flag (only applies when exactly
# one PDF is passed).
thesisagents --pdf ./papers/preprint.pdf \
--title "Custom Title" --authors "A. Smith, B. Jones" \
--year 2025 --venue "NeurIPS 2025" \
--out ./exports/
# Directory — every *.pdf is read, metadata-extracted, and emitted as its
# own deck named after its BibTeX key (e.g. wang2024diffusion.pptx).
thesisagents --pdf ./papers/ --out ./exports/
Localised deck
thesisagents --paper 1706.03762 --lang zh-tw --out ./exports/
thesisagents --paper 1706.03762 --lang ja --out ./exports/
thesisagents --paper 1706.03762 --lang fr --out ./exports/
thesisagents --paper 1706.03762 --lang de --out ./exports/
thesisagents --paper 1706.03762 --lang ko --out ./exports/
# Also supported: es, pt, ru, it, vi, hi, id
Enriched thesis-style deck (Python pipeline)
export ANTHROPIC_API_KEY=sk-ant-...
thesisagents --paper "https://arxiv.org/abs/1706.03762" \
--enrich --lang zh-tw --out ./exports/
When --enrich is on, ThesisAgents downloads the PDF, sends the body
text + paper metadata to Claude (claude-opus-4-7 by default), parses
back a structured PaperSummary (motivation, contributions, method,
results, limitations, takeaways — plus the rich tier: pain points,
research question, KPI metrics, technique table, literature
positioning, per-RQ result tables, …), and the PPT exporter renders the
thesis-style layout.
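The "parse back a structured summary" step can be pictured with a small sketch. Everything here is an assumption made for illustration: the real `PaperSummary` class, its field types, and the wire format returned by the model are not shown in this reference, so the dataclass below only mirrors the six base fields named above and assumes a JSON object as input. `parse_summary` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PaperSummary:
    """Hypothetical shape mirroring the base fields listed in the text."""
    motivation: str
    contributions: List[str]
    method: str
    results: str
    limitations: str
    takeaways: List[str]


def parse_summary(raw: str) -> PaperSummary:
    """Parse a JSON object into a PaperSummary, rejecting incomplete replies."""
    data = json.loads(raw)
    names = [f.name for f in fields(PaperSummary)]
    missing = [name for name in names if name not in data]
    if missing:
        raise ValueError(f"summary is missing fields: {', '.join(missing)}")
    # Extra keys (for example rich-tier sections) are ignored in this sketch.
    return PaperSummary(**{name: data[name] for name in names})
```

Failing loudly on a missing field is one possible design choice; it keeps a half-filled summary from reaching the slide exporter.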
Exit codes
| Code | Meaning |
|---|---|
| | Success: every requested export was written. |
| | Search returned zero results, or the single paper had no metadata. |
| | Validation error (unknown source, malformed identifier, bad year range, missing API key when |
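A script that wraps the CLI can branch on the exit status. The sketch below uses only `subprocess.run` from the standard library; `run_cli` and `describe` are hypothetical helper names, and it assumes the usual convention that 0 means success while any non-zero status is a failure, without relying on specific non-zero values.

```python
import subprocess
from typing import Sequence


def run_cli(argv: Sequence[str]) -> int:
    """Run a command and return its exit status without raising on failure."""
    completed = subprocess.run(list(argv), check=False)
    return completed.returncode


def describe(status: int) -> str:
    """Turn an exit status into a short message (assumes 0 means success)."""
    return "success" if status == 0 else f"failed with exit code {status}"


# Example call, not executed here:
# status = run_cli(["thesisagents", "--paper", "2401.08741", "--out", "./exports/"])
# print(describe(status))
```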
Output structure
exports/
├── diffusion-models-20260515-001027.pptx
├── diffusion-models-20260515-001027.xlsx
├── diffusion-models-20260515-001027.bib
└── diffusion-models-20260515-001027.json # only when --export includes json
Filenames are derived from a sanitised slug of the keyword + timestamp;
pass --filename-stem to fix the stem. The .pptx file produced here
can be edited via the pptx_* MCP tools or the pptx_edit Python
module — see pptx editing.
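The default naming scheme can be approximated as below. This is a sketch, not the exporter's real code: `slugify` and `default_stem` are hypothetical names, the exact sanitisation rules are an assumption, and the timestamp format is inferred from the sample filenames in the listing above.

```python
import re
from datetime import datetime
from typing import Optional


def slugify(keywords: str) -> str:
    """Lower-case the keywords and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", keywords.lower())
    return slug.strip("-")


def default_stem(keywords: str, now: Optional[datetime] = None) -> str:
    """Build '<slug>-<YYYYMMDD-HHMMSS>' for the given keywords."""
    now = now or datetime.now()
    return f"{slugify(keywords)}-{now.strftime('%Y%m%d-%H%M%S')}"
```

Calling `default_stem("diffusion models", datetime(2026, 5, 15, 0, 10, 27))` yields `diffusion-models-20260515-001027`, the stem used in the listing.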
Source plugin opt-ins
Some plugins are opt-in either because their upstream terms restrict automated traffic, or because the upstream service needs an API key that we cannot ship in the repo:
# IEEE — official API path (anonymous-safe, no Chrome needed)
export THESISAGENTS_IEEE_API_KEY=...
thesisagents --paper "https://ieeexplore.ieee.org/document/10965643" --out ./exports/
# IEEE — default visible-Chrome path (no key needed; works if you have VPN/subscription)
# IEEE is default-ON; opt out only on CI / no-Chrome:
# export THESISAGENTS_DISABLE_IEEE_SCRAPING=1
thesisagents --paper "https://ieeexplore.ieee.org/document/10965643" --out ./exports/
# Springer Nature — free API key from https://dev.springernature.com/
export THESISAGENTS_SPRINGER_API_KEY=...
thesisagents --query "diffusion models" --source springer --out ./exports/
# Google Scholar — default-ON via visible Chrome
# Opt out (e.g. on CI) with: export THESISAGENTS_DISABLE_SCHOLAR_SCRAPING=1
thesisagents --query "attention mechanism" --source scholar --out ./exports/
# Persistent Chrome profile — set once, VPN/SSO + Google sign-in survive across runs
export THESISAGENTS_CHROME_PROFILE_DIR=~/.cache/thesisagents-chrome
thesisagents --query "speculative decoding" --out ./exports/
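Before a run it can help to check which path each opt-in source would take given the current environment. The sketch below reads only the variables named above; the function names, the returned labels, the rule that an API key takes precedence over the opt-out switch, and the treatment of `1` as the only opt-out value are all assumptions for illustration, not documented behaviour.

```python
import os
from typing import Dict, Mapping


def ieee_mode(env: Mapping[str, str]) -> str:
    """Guess the IEEE path: official API, visible Chrome, or disabled."""
    if env.get("THESISAGENTS_IEEE_API_KEY"):
        return "api"  # assumption: a configured key takes precedence
    if env.get("THESISAGENTS_DISABLE_IEEE_SCRAPING") == "1":
        return "disabled"
    return "chrome"


def source_status(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Summarise the expected path for the opt-in sources."""
    scholar_off = env.get("THESISAGENTS_DISABLE_SCHOLAR_SCRAPING") == "1"
    return {
        "ieee": ieee_mode(env),
        "scholar": "disabled" if scholar_off else "chrome",
        "springer": "api" if env.get("THESISAGENTS_SPRINGER_API_KEY") else "needs-key",
    }
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise on CI, where the Chrome-based paths are typically switched off.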
Other source-related env vars (all optional):
| Variable | Effect |
|---|---|
| | Higher rate limit on Semantic Scholar. |
| | Raises PubMed’s anonymous limit (3 → 10 req/s). |
| | Sent to Crossref / OpenAlex ( |
| | Attached as |
| | Path to a Netscape-format |