Configuration reference
Every knob the user can turn — environment variables, CLI flags that map to env vars, GUI Settings page fields, and the on-disk QSettings store. This page is the single source of truth; other docs link here.
Environment variables
All env vars are read at the moment a fetcher or extra is
constructed, not lazily, so running set ANTHROPIC_API_KEY=...
after the CLI has started has no effect. Set them in the shell
before launching, or use the GUI’s Settings page, which mirrors
each value into os.environ before any fetcher initialises.
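The construction-time behaviour can be illustrated with a minimal sketch. ExampleFetcher is a hypothetical class, not one of the project's fetchers; it only shows why a value set after construction is never seen.

```python
import os


class ExampleFetcher:
    """Hypothetical fetcher that snapshots its API key at construction time."""

    def __init__(self) -> None:
        # Read once, here. Later changes to os.environ are not picked up.
        self.api_key = os.environ.get("ANTHROPIC_API_KEY")


os.environ["ANTHROPIC_API_KEY"] = "key-set-before-construction"
fetcher = ExampleFetcher()

# Changing the variable afterwards does not affect the existing instance.
os.environ["ANTHROPIC_API_KEY"] = "key-set-too-late"
print(fetcher.api_key)
```

The instance keeps the value that was present when it was built, which is why the variables need to be in place before launch.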
LLM enrichment
| Variable | Default | Effect |
|---|---|---|
|  | unset | Required for the Python |
|  |  | Override the model used when |
Source plugin keys
| Variable | Default | Effect |
|---|---|---|
|  | unset | Higher rate limit on the Semantic Scholar plugin (1/s anonymous → 10/s with key). Also used by the OA resolver’s S2 |
|  | unset | Raises PubMed’s anonymous limit from 3 req/s to 10 req/s. |
|  | unset | Switches the IEEE plugin from the scrape fallback to the official Xplore API ( |
|  | unset | IEEE plugin is now default-ON. Set |
|  | unset | Free key from https://dev.springernature.com/. Required for the Springer plugin; it raises |
|  | unset | Crossref Plus subscriber token. Attached to requests as |
|  | unset | Scholar plugin is now default-ON. Set |
|  | unset | Scholar + IEEE plugins + the PDF downloader for paywalled publisher CDNs all default to driving a real visible Chrome through WebRunner ( |
|  | unset | When set, passes |
|  | unset | Free key from https://core.ac.uk/services/api. Enables both the OA resolver’s CORE.ac.uk lookup step (200M+ institutional / regional OA repository items) and the |
|  | unset | Sent to Crossref / OpenAlex as the |
PDF download
| Variable | Default | Effect |
|---|---|---|
|  | unset | Path to a Netscape-format |
Logging / debug
| Variable | Default | Effect |
|---|---|---|
|  |  | Set to |
Qt / GUI
These are read by Qt itself (not by ThesisAgents), but the GUI sets sensible defaults when they’re absent. See the GUI doc for the full HiDPI story.
| Variable | Default | Effect |
|---|---|---|
|  |  | Enables Qt’s HiDPI scaling. |
|  |  | Lets fractional scale factors (125%, 150%) flow through unchanged. |
|  | OS default | Set to |
QSettings on-disk store
When the GUI’s Settings → Save is clicked, the values land
in a per-OS persistent store and are mirrored into
os.environ for the current process. Each subsequent launch
of the GUI calls apply_saved_env() at startup, so the env
vars are re-applied automatically.
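The mirroring step can be sketched as below. This is an illustration under stated assumptions, not the project's apply_saved_env(): the key names and env var names in EXAMPLE_KEY_TO_ENV are hypothetical, a plain mapping stands in for the QSettings store so the sketch runs without Qt, and skipping blank values is an assumed policy.

```python
import os
from typing import Mapping, MutableMapping

# Hypothetical QSettings-key -> env-var mapping, for illustration only.
EXAMPLE_KEY_TO_ENV = {
    "example/api_key": "EXAMPLE_API_KEY",
    "example/cookie_file": "EXAMPLE_COOKIE_FILE",
}


def apply_saved_env_sketch(
    saved: Mapping[str, str],
    environ: MutableMapping[str, str] = os.environ,
) -> list[str]:
    """Copy every non-empty saved value into environ; return the env vars set."""
    applied = []
    for settings_key, env_name in EXAMPLE_KEY_TO_ENV.items():
        value = saved.get(settings_key, "")
        if value:  # assumed policy: blank entries leave the environment untouched
            environ[env_name] = value
            applied.append(env_name)
    return applied
```

In a Qt-based version the saved mapping would presumably be replaced by reads through QSettings.value(key); the precedence between a saved value and one already exported in the shell is a design choice this sketch does not settle.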
Storage locations
| OS | Path |
|---|---|
| Windows |  |
| macOS |  |
| Linux |  |
The QSettings organisation name is ThesisAgents and the
application name is ThesisAgents. Tests inject a temporary
storage path via QSettings.setPath(...) so they never touch
the user’s real store.
Keys
| QSettings key | Env var mirrored to | Type |
|---|---|---|
|  |  | string |
|  |  | string |
|  |  | string |
|  |  | string |
|  |  | string |
|  |  | string |
|  |  | string |
|  |  | string (absolute path) |
|  | (not mirrored) | string: a BCP-47 code from the 14 supported languages |
CLI flags ↔ env var equivalents
A CLI flag can have an environment-variable counterpart so that unattended deployments don’t need to pass long argument lists.
| CLI flag | Env var | Notes |
|---|---|---|
|  |  | CLI wins if both are set. |
That’s the only one today; everything else is per-source plugin
config above. The CLI explicitly does NOT read API keys via
flags (no --anthropic-key) — passing a key on the command
line writes it to shell history, which is a security footgun.
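The "CLI wins if both are set" rule can be sketched with argparse. The flag --example-dir and the variable EXAMPLE_DIR are hypothetical placeholders, not the project's real names; only the precedence order is the point.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_setting(
    argv: Sequence[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the CLI value if given, else the env var, else None."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag; default None lets us tell "not passed" from "passed".
    parser.add_argument("--example-dir", default=None)
    args = parser.parse_args(argv)

    if args.example_dir is not None:
        return args.example_dir  # an explicit flag takes precedence
    return environ.get("EXAMPLE_DIR")  # fall back to the environment
```

For instance, resolve_setting(["--example-dir", "a"], {"EXAMPLE_DIR": "b"}) yields "a", while an empty argument list falls back to "b".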
CLI defaults
Behaviour you don’t see in --help because it’s hard-coded:
| Setting | Value | Where |
|---|---|---|
| Default page size ( |  |  |
| Max results per source ( |  |  |
| Default cache TTL |  |  |
| Default output dir |  |  |
| Default export formats ( |  |  |
| Default export formats ( |  |  |
| Default slide language ( |  |  |
| Default max slides per paper |  |  |
| Paywall warning threshold |  |  |
| Top-tier-only filter | on (off via |  |
Per-source rate limits
Defined in each plugin’s config.py; enforced by a per-source
token bucket in thesisagents.fetchers.rate_limit. The
defaults match each upstream’s published or observed soft limit:
| Source | Rate | Jitter | Notes |
|---|---|---|---|
|  | 1 req / 3 s | 0.5 s | Matches arXiv’s API ToS. |
|  | 1 req / s | 0.1 s | Anonymous limit; 10/s with API key. |
|  | 10 req / s | 0.1 s | Polite pool (with |
|  | 3 req / s | 0.1 s | Anonymous; 10/s with |
|  | 50 req / s | 0.05 s | Crossref public; Plus token raises further. |
|  | 1 req / 2 s | 0.5 s | Conservative: DBLP is single-server. |
|  | 50 req / s | 0.05 s | Polite pool with |
|  | 2 req / s | 0.2 s | Conservative: OpenAIRE rate limits are not published. |
|  | 5 req / s | 0.2 s | Half of Europe PMC’s stated 10 req/s ceiling. |
|  | 2 req / s | 0.3 s | DOAJ asks for ≤ 2 req/s from a single client. |
|  | 2 req / s | 0.3 s | Polite: HAL publishes no per-second cap. |
|  | 10 req / s | 0.1 s | Per the IEEE Xplore API ToS. |
|  | 1 req / 5 s | 1.0 s | ToS-grey, so extra-conservative. |
|  | 5 req / s | 0.2 s | Per the Springer Meta API ToS. |
|  | 1 req / s | 0.3 s | Free-tier headroom; needs |
|  | 1 req / 10 s | 2.0 s | ToS forbids scraping, so extra-conservative. |
These are enforced by a decorator on the HTTP client; retries on 429 / 5xx also go through the bucket so a burst can’t slip past the limit.
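A token bucket with jitter, applied through a decorator, might look roughly like the sketch below. It is not the code in thesisagents.fetchers.rate_limit: the class and function names, the single-token capacity, and the way jitter is added to the wait are all assumptions made for illustration. The clock and sleep functions are injectable so the behaviour can be checked without real delays.

```python
import functools
import random
import time
from typing import Callable


class TokenBucket:
    """Minimal token bucket: refills at rate_per_s, holds at most capacity tokens."""

    def __init__(
        self,
        rate_per_s: float,
        jitter_s: float,
        capacity: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate_per_s
        self.jitter = jitter_s
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        self._refill()
        if self.tokens < 1.0:
            # Wait for the missing fraction of a token, plus random jitter.
            wait = (1.0 - self.tokens) / self.rate + random.uniform(0.0, self.jitter)
            self.sleep(wait)
            self._refill()
        self.tokens -= 1.0


def rate_limited(bucket: TokenBucket) -> Callable:
    """Decorator: every call to the wrapped function first takes a token."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bucket.acquire()
            return func(*args, **kwargs)

        return wrapper

    return decorator


def get_with_retries(send: Callable[[], int], max_attempts: int = 3) -> int:
    """Retry on 429 / 5xx. send is the rate-limited callable, so retries pay too."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for _ in range(max_attempts):
        status = send()
        if status != 429 and status < 500:
            break
    return status
```

Because get_with_retries calls the decorated function rather than the raw one, each retry takes a token like any other request, which is the property that stops a burst of retries from exceeding the limit.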
Cache layout
thesisagents.core.cache (used internally by fetchers) keys
every raw response by sha256(source + normalised_query + page)
and stores under ${XDG_CACHE_HOME:-~/.cache}/thesisagents/.
Override via:
| Variable | Default | Effect |
|---|---|---|
|  |  | Override the cache root. The autouse |
Clear the cache by deleting the directory; ThesisAgents re-creates it on demand.
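The key derivation and default root described in this section can be sketched as follows. The normalisation rule (lowercase, collapsed whitespace) and the plain string concatenation are assumptions; thesisagents.core.cache may normalise differently or insert separators between the fields.

```python
import hashlib
import os
from pathlib import Path
from typing import Mapping


def normalise_query(query: str) -> str:
    # Assumed normalisation: lowercase and collapse runs of whitespace.
    return " ".join(query.lower().split())


def cache_key(source: str, query: str, page: int) -> str:
    """sha256(source + normalised_query + page), hex-encoded."""
    material = f"{source}{normalise_query(query)}{page}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def default_cache_root(environ: Mapping[str, str] = os.environ) -> Path:
    """Mirror ${XDG_CACHE_HOME:-~/.cache}/thesisagents/ (unset or empty falls back)."""
    base = environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "thesisagents"
```

Under this normalisation, "Graph  Neural Networks" and "graph neural networks" map to the same key for a given source and page, so differently typed but equivalent queries reuse one cached response.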
Suppressing Scholar captchas with a persistent Chrome profile
Google flags an IP after a few automated Scholar requests even with WebRunner’s real-browser path. The reliable workaround is to seed a persistent Chrome profile with a real Google sign-in once; subsequent headless runs reuse the same session cookies, which Google trusts.
One-time setup:
# 1. Pick a directory anywhere on disk
$env:THESISAGENTS_CHROME_PROFILE_DIR = "D:\thesisagents-scholar-profile"
# 2. Open Chrome visibly and trigger one Scholar request
$env:THESISAGENTS_CHROME_HEADLESS = "0"
thesisagents --query "any keywords" --source scholar --max 1 --out .\tmp\
# Chrome opens. Sign into your Google account, accept any consent
# banners, complete any captcha. The window holds open for 60s.
Every run after that:
$env:THESISAGENTS_CHROME_PROFILE_DIR = "D:\thesisagents-scholar-profile"
Remove-Item Env:\THESISAGENTS_CHROME_HEADLESS # back to headless
thesisagents --query "..." --out .\exports\
Chrome boots headless but loads the same profile dir, sends your authenticated Google session cookie, and Scholar serves real results instead of a captcha page.
Caveats:
Only one Chrome process can hold the profile dir at a time. If you have a regular Chrome open on the same profile path, the WebRunner instance will fail to start. Use a dedicated path.
The session cookie is a real authentication credential. Treat the profile directory like a secret — back it up if you re-image the machine, restrict file permissions.
The cookie eventually expires (roughly 1-2 months for Google). Repeat the interactive sign-in when it does.
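For scripted runs, the two modes above can be expressed as a small helper that builds the environment for a child process. The function name scholar_env is hypothetical; the two variable names and the "0 means visible, unset means headless" convention are taken from the commands in this section.

```python
import os
from typing import Mapping


def scholar_env(
    profile_dir: str,
    interactive: bool,
    base: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Return a copy of base configured for a visible or headless Scholar run."""
    env = dict(base)
    env["THESISAGENTS_CHROME_PROFILE_DIR"] = profile_dir
    if interactive:
        # One-time setup: show the Chrome window so you can sign in.
        env["THESISAGENTS_CHROME_HEADLESS"] = "0"
    else:
        # Later runs: drop the override so Chrome goes back to headless.
        env.pop("THESISAGENTS_CHROME_HEADLESS", None)
    return env
```

The returned dict can be passed as the env argument of subprocess.run when launching the thesisagents command, which keeps the profile settings out of the parent shell.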
Settings the project explicitly does NOT have
By design — listing them so a contributor doesn’t accidentally add them.
No --anthropic-key / --ieee-key / etc. CLI flag. Keys on the command line land in shell history, a screen-sharing reveal, and process listings. Use env vars or the GUI.
No global rate-limit override. Per-source buckets are the enforcement boundary; a global override would let users break a single source’s ToS by upping the wrong limit.
No “skip robots.txt” flag. The scrape-based plugins are off-by-default precisely because of robots / ToS concerns.
No --insecure / --allow-http flag. All egress is HTTPS, enforced by the project’s transport wrapper. Plain-HTTP URLs are refused even after a redirect.
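The HTTPS-only rule, including the redirect case, can be sketched as a scheme check applied to every URL in a redirect chain. This is not the project's transport wrapper; InsecureURLError, require_https, and check_redirect_chain are hypothetical names used only to show the idea.

```python
from typing import Iterable
from urllib.parse import urlsplit


class InsecureURLError(ValueError):
    """Raised when a URL does not use the https scheme."""


def require_https(url: str) -> str:
    """Return url unchanged if it is HTTPS, otherwise raise."""
    if urlsplit(url).scheme.lower() != "https":
        raise InsecureURLError(f"refusing non-HTTPS URL: {url}")
    return url


def check_redirect_chain(urls: Iterable[str]) -> list[str]:
    """Validate the original URL and every redirect target, in order."""
    return [require_https(u) for u in urls]
```

Checking each hop, rather than only the first URL, is what keeps an HTTPS request from being silently downgraded by a redirect to a plain-HTTP location.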