Contributing
How to add a feature, fix a bug, or improve docs without breaking the project’s invariants.
Quick start
git clone https://github.com/Integration-Automation/ThesisAgents.git
cd ThesisAgents
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows PowerShell
# source .venv/bin/activate # macOS / Linux
python -m pip install --upgrade pip
pip install -e ".[dev]"
Make a branch off dev, push, open a PR against main. The
project auto-bumps the patch version and publishes on every
merge to main that passes CI — see Releases.
Definition of Done
Every change must satisfy all of the following before commit. No exceptions — incomplete work stays on the working copy until the gates pass.
Unit tests written and passing. New code without new tests is incomplete; see “Test coverage expected” below.
pytest tests/ runs clean. Pre-existing skips are OK; new skips need a written reason.
ruff check . reports no new errors.
bandit -c pyproject.toml -r thesisagents/ sources/ reports "No issues identified". The -c flag is mandatory: without it bandit ignores the project skip config.
End-to-end smoke for changes touching sources/, thesisagents/exporters/, thesisagents/intelligence/, or thesisagents/mcp/:
  Search: run thesisagents --query "transformer attention" --source arxiv --max 3 --out ./exports/smoke/ and confirm that .pptx, .xlsx, and .bib land on disk and the deck opens without warnings.
  PPTX changes: also regenerate an enriched / thesis-style deck against a known paper (see scripts/regen_*.py) and run the headless overflow check.
  MCP changes: run python -c "from thesisagents.mcp import build_server; import asyncio; print(asyncio.run(build_server().list_tools()))" and confirm that every documented tool is present.
No live network calls in tests. Use recorded fixtures under tests/fixtures/<source>/. Re-recording is a manual step (scripts/record_fixture.py) and the recorded file is committed.
Commit message contains no AI tool/model names and no Co-Authored-By line.
Commit message rules
NEVER add Co-Authored-By lines.
NEVER mention “Claude”, “Claude Code”, “AI-generated”, “GPT”, “Copilot”, or any AI tool / model name anywhere, including commit messages, PR titles, PR descriptions, code comments, and documentation.
Use imperative mood for the subject line (“Add”, “Fix”, “Remove” — not “Added”, “Fixes”).
Keep the subject ≤ 72 chars; wrap the body at 72.
Explain why, not just what. The diff already shows what.
Good:
Reject Windows-style absolute paths on POSIX in resolve_safe

Path("C:/evil/path.txt").is_absolute() is False on Linux because the
host Path class is PosixPath and "C:" is just treated as a directory
name. test_resolve_safe_rejects_absolute was passing on Windows but
failing on the Linux CI runner.

Cross-check the reference against both PurePosixPath and
PureWindowsPath so a drive-letter prefix is rejected regardless of
host OS.
Bad:
fix bug
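The cross-check described in the good commit message can be sketched as follows. This is a minimal illustration and not the project's actual resolve_safe; the helper name looks_absolute_on_any_os is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def looks_absolute_on_any_os(reference: str) -> bool:
    """Return True if `reference` is absolute under POSIX or Windows rules.

    Hypothetical helper for illustration; the project's resolve_safe may
    apply further checks.
    """
    windows = PureWindowsPath(reference)
    # A drive-letter prefix ("C:") is rejected even for drive-relative paths.
    return (
        PurePosixPath(reference).is_absolute()
        or windows.is_absolute()
        or bool(windows.drive)
    )
```

Because the Pure* classes never touch the filesystem and do not depend on the host OS, the same result is produced on the Windows and Ubuntu CI runners.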
Test coverage expected
For every change:
| Coverage | What |
|---|---|
| Happy path | Representative input, expected output. |
| Edge cases | Empty / one-item / missing-optional-field inputs. |
| Error handling | Every |
| Boundary conditions | Values just inside and outside any limit (max keyword length, max results, year filter). |
| Round-trips | |
Test placement:
tests/test_<module>.py for core modules.
tests/sources/<name>/test_<name>.py for fetchers (with recorded fixtures under tests/fixtures/<source>/).
tests/exporters/test_<format>.py for exporters.
tests/gui/test_<page>.py for GUI pages (uses pytest-qt).
Use the shared fixtures in tests/conftest.py (http_recorder,
fake_cache, sample_papers, tmp_export_root). Do not roll
your own async loop or httpx client.
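As a rough illustration of the coverage categories listed in this section, the sketch below tests a small helper for the happy path, both sides of a limit, and the error path. The helper clamp_max_results and the constant MAX_RESULTS_LIMIT are hypothetical and exist only for this example; they are not part of the project.

```python
MAX_RESULTS_LIMIT = 50  # hypothetical limit, for illustration only


def clamp_max_results(requested: int) -> int:
    """Hypothetical helper: validate and cap a requested result count."""
    if requested < 1:
        raise ValueError(f"max results must be >= 1, got {requested}")
    return min(requested, MAX_RESULTS_LIMIT)


def test_happy_path() -> None:
    assert clamp_max_results(3) == 3


def test_boundary_just_inside_limit() -> None:
    assert clamp_max_results(MAX_RESULTS_LIMIT) == MAX_RESULTS_LIMIT


def test_boundary_just_outside_limit() -> None:
    assert clamp_max_results(MAX_RESULTS_LIMIT + 1) == MAX_RESULTS_LIMIT


def test_error_handling_rejects_zero() -> None:
    # In a real test module this would be `with pytest.raises(ValueError):`.
    try:
        clamp_max_results(0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a non-positive count")
```

Each test exercises exactly one category, so a failure points directly at the behaviour that regressed.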
Code quality rules
Mirrored from the SonarQube / Codacy / pylint / flake8 / ruff default rule sets:
Complexity
Cognitive complexity ≤ 15 per function (SonarQube S3776).
Cyclomatic complexity ≤ 10 (pylint R1260, radon C).
Function length ≤ 75 logical lines.
File length ≤ 1000 lines (S104).
Parameter count ≤ 7 (group into a dataclass when exceeded).
Nesting depth ≤ 4 (use early returns / guards).
Return statements ≤ 6 per function.
Local variables ≤ 15 per function.
Style
snake_case for functions / methods / variables / modules.
PascalCase for classes.
UPPER_CASE_WITH_UNDERSCORES for module constants.
_leading_underscore for private attributes / methods.
No single-letter names except loop indices (i, j, k) or well-known short forms (q for query in obvious local scope).
Errors
Never except: (bare). Always specify exception type.
Never except Exception: pass without a logged reason + comment.
Never catch BaseException directly.
Use specific exception types from thesisagents.core.exceptions. Chain with raise X from err to preserve context (ruff B904).
Never use assert for runtime validation (assertions are stripped under python -O). Use explicit raise instead.
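The error-handling rules above can be illustrated with a short sketch. ConfigError is a hypothetical stand-in defined locally; real code would import a suitable type from thesisagents.core.exceptions, whose class names are not shown here.

```python
import json


class ConfigError(Exception):
    """Hypothetical stand-in for a type from thesisagents.core.exceptions."""


def parse_max_results(raw: str) -> int:
    """Parse a JSON object such as '{"max": 3}' and return the 'max' value."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as err:
        # Catch the specific exception and chain it so the cause is preserved.
        raise ConfigError(f"config is not valid JSON: {raw!r}") from err
    # Explicit raise instead of assert: this check survives `python -O`.
    if not isinstance(payload, dict) or "max" not in payload:
        raise ConfigError("config must be an object with a 'max' key")
    value = payload["max"]
    if not isinstance(value, int) or value < 1:
        raise ConfigError(f"'max' must be a positive integer, got {value!r}")
    return value
```

With the `from err` chain in place, the traceback shows both the ConfigError and the original JSONDecodeError.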
Smells
No unused imports / variables / params (prefix unused params with _).
No commented-out code (git preserves history).
No print() in production code; use thesisagents.utils.logging.
No TODO / FIXME / XXX in merged code; file a ticket.
No magic numbers; extract to UPPER_CASE constants. Exceptions: 0, 1, -1, 2 in obvious contexts.
is None / is not None, never == None.
isinstance(x, T), never type(x) == T.
No mutable default args (def f(x=[])).
Prefer f-strings over .format() or %.
Always use context managers (with / async with).
Security
pickle.load(s) on untrusted data forbidden. Cache uses JSON or msgpack.
yaml.load without SafeLoader forbidden; use yaml.safe_load.
MD5 / SHA-1 forbidden for security purposes; use SHA-256+. Allowed for cache keys / dedup hashes only with usedforsecurity=False.
subprocess with shell=True forbidden when any argument comes from user input.
Never eval / exec / compile on dynamic input.
Never tempfile.mktemp(); use tempfile.mkstemp() or NamedTemporaryFile.
Network binds default to 127.0.0.1, not 0.0.0.0.
XML parsing uses defusedxml, never stdlib xml.etree on untrusted input.
HTML parsing uses beautifulsoup4 with lxml parser.
Random for security uses secrets, not random.
All urlopen / httpx calls go through the project HTTPS-only transport via get_client(source).
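Several of the security rules above map directly onto standard-library calls. The following is a minimal sketch using only hashlib, secrets, tempfile, and os; the function names are hypothetical and not taken from the project.

```python
import hashlib
import os
import secrets
import tempfile


def cache_key(url: str) -> str:
    """MD5 for a non-security purpose, flagged as such for linters."""
    return hashlib.md5(url.encode("utf-8"), usedforsecurity=False).hexdigest()


def content_digest(data: bytes) -> str:
    """SHA-256 for anything with a security or integrity purpose."""
    return hashlib.sha256(data).hexdigest()


def new_session_token() -> str:
    """Security-sensitive randomness comes from `secrets`, not `random`."""
    return secrets.token_urlsafe(32)


def write_temp(data: bytes) -> str:
    """Create the temp file atomically with mkstemp instead of mktemp."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

mkstemp creates and opens the file in one step, which closes the race window that mktemp leaves between choosing a name and creating the file.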
Typing & docs
Public functions and methods MUST have type hints on parameters and return type. Use pydantic models or dataclasses for structured payloads: list[Paper], not bare list.
Public modules and classes SHOULD have a one-line docstring.
Private helpers may omit docstrings if names are self-explanatory.
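A small sketch of what these typing rules look like in practice. The Paper dataclass here is a hypothetical two-field record invented for the example; the project's real model may be a pydantic class with different fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical minimal paper record, for illustration only."""

    title: str
    year: int


def filter_by_year(papers: list[Paper], min_year: int) -> list[Paper]:
    """Return the papers published in or after `min_year`."""
    return [paper for paper in papers if paper.year >= min_year]
```

Annotating the payload as list[Paper] rather than bare list lets a type checker catch a caller that passes raw dicts.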
Suppression comments
| Tool | Comment | Notes |
|---|---|---|
| ruff / flake8 | | Must list specific codes. |
| bandit | | ruff’s |
| SonarCloud | | Use for hotspots that can’t be config-skipped. |
| pylint | | Prefer refactor over suppression. |
Every suppression must include a brief justification on the same
line (# nosec B310 # scheme validated immediately above).
Branch / PR conventions
main is the release branch. Every CI-success push to main triggers an auto-bump → PyPI publish → GitHub Release.
dev is the integration branch. Open PRs against main from dev (or from a feature branch off dev).
Feature branches are fine for non-trivial work: branch off dev, push, open PR to dev, then dev → main.
[skip release] in a commit message gates off the auto-bump for that push. Use it for docs / typo / refactor commits where you don’t want to burn a version number.
PR title + body:
Title: imperative mood, ≤ 72 chars.
Body: ## Summary (1-3 sentences on the change) + ## Test plan (a checklist of what was verified). Reference the issue number if one exists.
Do NOT mention AI tools / models anywhere.
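The mechanical parts of the commit message rules (subject length, body wrap, no Co-Authored-By trailer) lend themselves to a quick local check before pushing. The sketch below is one possible way to do that; check_commit_message is a hypothetical helper, not a tool shipped with the project, and it does not attempt to judge imperative mood.

```python
SUBJECT_MAX = 72
BODY_WRAP = 72
FORBIDDEN_TRAILER = "co-authored-by:"


def check_commit_message(message: str) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems: list[str] = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        problems.append("subject line is empty")
    if len(subject) > SUBJECT_MAX:
        problems.append(f"subject is {len(subject)} chars, limit is {SUBJECT_MAX}")
    # Body lines are numbered from 2 so messages match editor line numbers.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > BODY_WRAP:
            problems.append(f"line {number} exceeds {BODY_WRAP} chars")
        if line.strip().lower().startswith(FORBIDDEN_TRAILER):
            problems.append(f"line {number} is a Co-Authored-By trailer")
    return problems
```

The subject-length check applies equally to a PR title, since both share the 72-character limit.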
Local CI reproduction
Before pushing, reproduce each gate locally:
# bandit (the -c flag is mandatory)
python -m bandit -c pyproject.toml -r thesisagents/ sources/
# ruff
python -m ruff check .
# pytest
python -m pytest tests/
# search-mode smoke
thesisagents --query "diffusion models" --source arxiv --max 3 --out ./smoke/
# single-paper smoke
thesisagents --paper "https://arxiv.org/abs/1706.03762" --out ./smoke/single/
# (only when touching pptx / i18n) overflow check
python -c "from thesisagents.exporters.pptx import inspect_overflow; \
inspect_overflow('./smoke/your-deck.pptx')"
CI runs the same set on Ubuntu + Windows × Python 3.12 / 3.13 / 3.14 (6 jobs). If your change touches Linux-specific code paths that pass on Windows locally, the Ubuntu cells will catch it.
What lives where (for adding code)
| Adding… | Goes to… |
|---|---|
| A new source | |
| A new export format | |
| A new MCP tool | |
| A new CLI flag | |
| A new GUI tab | |
| A new i18n key | Both tables (UI + deck) if user-facing on both surfaces; one if surface-specific. Always fill all 14 languages in one commit; the coverage tests block partial PRs. |
| A new env var | |
What NOT to add
A direct httpx.get / requests.get / urllib.request.urlopen call. Always use get_client(source).
A pip install-only dependency in core. Heavy / optional deps belong in an extra ([intelligence], [gui], [mcp], [web]).
A --<service>-api-key CLI flag. Keys go through env vars or the GUI Settings page.
A feature flag for the auto-bump / release flow. The pipeline is intentionally minimal-state.
A new top-level entry point (console script). The three we have (thesisagents, thesisagents-mcp, thesisagents-gui) cover every surface; new functionality goes as subcommands or tools.
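For the API-key rule above, reading the key from the environment instead of a CLI flag might look like the sketch below. Everything named here is an assumption: the THESISAGENTS_<SERVICE>_API_KEY naming scheme, the MissingApiKeyError type, and both helper functions are invented for illustration and are not the project's documented convention.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Hypothetical error type for an unset API key."""


def api_key_env_name(service: str) -> str:
    """Build the env var name; the naming scheme is an assumption."""
    return f"THESISAGENTS_{service.upper().replace('-', '_')}_API_KEY"


def load_api_key(service: str) -> str:
    """Read a service key from the environment, failing with a clear message."""
    name = api_key_env_name(service)
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingApiKeyError(f"set {name} to use the {service} source")
    return value
```

Keeping keys out of CLI flags means they do not end up in shell history or process listings.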
Reviewing PRs
Look for:
The change has tests (Definition of Done #1).
The diff is focused — no unrelated reformatting / “while I was here” cleanups (those go in a separate PR).
The commit message explains why.
No Co-Authored-By lines, no AI tool / model names.
No # type: ignore / # noqa / # nosec without justification.
The change respects the rate-limit policy of any source it touches.
Releases
See Releases for the auto-bump flow, the
[skip release] escape hatch, and the Nuitka exe pipeline.
Asking for help
Open an issue at https://github.com/Integration-Automation/ThesisAgents/issues or a draft PR with a question label. The maintainers are happy to weigh in on design questions before you write the code.