Packaging with Nuitka
Nuitka compiles Python to C and then to a native executable. Compared to PyInstaller it produces smaller and faster binaries but takes much longer to build. Use it when:
You want startup that feels like native code (under a second).
You want some level of source protection (the compiled binary is harder to decompile than a PyInstaller archive, which bundles ordinary Python bytecode).
You’re shipping the binary to end users and the build time is paid once in CI, not every iteration.
| Trade-off | Notes |
|---|---|
| Build time | 5–15 minutes on this project (C compilation per module). |
| Executable size | ~80–150 MB, roughly half of a PyInstaller build. |
| Startup | <1 second: true native code. --standalone builds have no temp-dir unpack; --onefile adds roughly 0.5 s for one. |
| Cross-platform | You must build on the target OS. Cross-compilation is not supported. |
| Source protection | Medium. Python becomes C, so recovering the original source needs serious effort. Not security, but not trivial either. |
Install
pip install nuitka
Nuitka also needs a C compiler. On Windows you can use the
MinGW-w64 toolchain that Nuitka offers to download on first run; on
Linux any gcc works; on macOS install Xcode Command Line Tools.
The project-specific gotcha
Source plugins are loaded by name at runtime. The pipeline calls
importlib.import_module("thesisagents.sources.<name>"), which Nuitka’s
static analysis can’t follow. The fix is a single
--include-package=thesisagents: it force-includes every sub-module of
the package, including all of thesisagents.sources.*, so the dynamic
import resolves at runtime.
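To see why static analysis misses these modules, here is a minimal sketch of a name-based loader. The load_plugin helper is hypothetical (it is not the project's actual function), and json.decoder stands in for thesisagents.sources.<name> so the snippet runs without the project installed.

```python
import importlib
from types import ModuleType


def load_plugin(package: str, name: str) -> ModuleType:
    """Import `package.name`, where `name` is only known at runtime."""
    # The module path is assembled from a string, so there is no literal
    # `import package.name` statement for a compiler to discover statically.
    return importlib.import_module(f"{package}.{name}")


# Stand-in for load_plugin("thesisagents.sources", "arxiv").
module = load_plugin("json", "decoder")
print(module.__name__)  # prints "json.decoder"
```

Because nothing in the code names the target module literally, the packager has to be told about it explicitly, which is the job --include-package does.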
Note (2026-05 migration). Older build commands listed each source with its own
--include-package=arxiv flag and shipped a sibling sources/ directory via --include-data-dir=sources=sources. Both are gone: the plugins now live inside the package (thesisagents/sources/), there is no sibling sources/ directory (so --include-data-dir=sources=sources is a fatal “directory does not exist” error), and the old sys.path injection was removed. One --include-package=thesisagents replaces all of it.
Build the CLI (thesisagents)
python -m nuitka `
--standalone `
--onefile `
--output-filename=thesisagents `
--include-package=thesisagents `
--include-package-data=pptx `
--include-package-data=openpyxl `
--assume-yes-for-downloads `
thesisagents/__main__.py
--include-package=thesisagents is what bundles the source plugins: the
pipeline loads them dynamically via
importlib.import_module("thesisagents.sources.<name>"), which Nuitka’s
static analysis can’t see, but --include-package pulls in every
sub-module of the package (including thesisagents.sources.*) regardless.
So you do not list the sources individually, and there is no
--include-data-dir=sources=sources — the plugins live inside the package
(thesisagents/sources/) since the 2026-05 migration, not in a sibling
sources/ directory.
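Linux / macOS use the same flags (Nuitka’s flag syntax is
OS-neutral; the ; separator quirk that PyInstaller has does not
apply here). The trailing backtick on each line is PowerShell’s line
continuation; in bash or zsh, end each line with a backslash instead.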
The binary lands at ./thesisagents (thesisagents.exe on Windows) in the
working directory. Subsequent rebuilds reuse the cache under
./<entrypoint>.build/, so the first build is slow but later builds
take 2–3 minutes.
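Since Nuitka's flag syntax is OS-neutral, one way to run the same build from any shell is to assemble the command as an argument list in Python. The sketch below assumes a hypothetical helper named nuitka_command and a hypothetical --build switch for the script; the Nuitka flags themselves are only the ones shown in the command above.

```python
import subprocess
import sys


def nuitka_command(entry, output_name, data_packages=(), onefile=True):
    """Return the Nuitka invocation as an argument list (no shell quoting)."""
    cmd = [sys.executable, "-m", "nuitka", "--standalone"]
    if onefile:
        cmd.append("--onefile")
    cmd += [
        f"--output-filename={output_name}",
        "--include-package=thesisagents",  # pulls in thesisagents.sources.*
    ]
    cmd += [f"--include-package-data={pkg}" for pkg in data_packages]
    cmd += ["--assume-yes-for-downloads", entry]
    return cmd


cli_cmd = nuitka_command(
    "thesisagents/__main__.py", "thesisagents", data_packages=("pptx", "openpyxl")
)

if __name__ == "__main__" and "--build" in sys.argv:
    # Only build when asked explicitly; a full build takes minutes.
    subprocess.run(cli_cmd, check=True)
```

Passing a list to subprocess.run avoids line-continuation and quoting differences between PowerShell and POSIX shells.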
Build the MCP server (thesisagents-mcp)
python -m nuitka `
--standalone `
--onefile `
--output-filename=thesisagents-mcp `
--include-package=thesisagents `
--assume-yes-for-downloads `
thesisagents/mcp/__main__.py
Standalone vs onefile
Nuitka has two modes worth knowing:
--standalone: build a directory (thesisagents.dist/) that contains the exe plus every DLL/SO it needs. Distribute the whole folder. No unpack delay on launch. Best for users who can install a folder.
--onefile: wrap the standalone dir in a single self-extracting binary that unpacks to a temp dir on launch. Slightly slower startup (~0.5 s) because of the unpack step. Best when you have to ship a single file.
Either form is fine; the commands above include both flags so you
get the onefile binary. Drop --onefile to keep the dist folder
instead.
Verify the executable works
.\thesisagents.exe --query "transformer" --source arxiv --max 3 `
--out .\smoke-nuitka\
Confirm:
No ModuleNotFoundError: all 15 source plugins were bundled.
.pptx + .xlsx + .bib land under ./smoke-nuitka/.
Startup is sub-second (the main win over PyInstaller).
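The first two checks can be scripted. This is a rough sketch: the helper names missing_outputs and smoke_test are hypothetical, and it assumes the CLI writes its output files somewhere under the --out directory and reports a missing module on stderr.

```python
import subprocess
from pathlib import Path

EXPECTED_SUFFIXES = (".pptx", ".xlsx", ".bib")


def missing_outputs(out_dir):
    """Return the expected file suffixes that have no file under out_dir."""
    root = Path(out_dir)
    if not root.is_dir():
        return list(EXPECTED_SUFFIXES)
    found = {p.suffix for p in root.rglob("*") if p.is_file()}
    return [s for s in EXPECTED_SUFFIXES if s not in found]


def smoke_test(binary, out_dir):
    """Run the packaged CLI once and return a list of problems (empty = OK)."""
    result = subprocess.run(
        [binary, "--query", "transformer", "--source", "arxiv",
         "--max", "3", "--out", str(out_dir)],
        capture_output=True, text=True, errors="replace",
    )
    problems = []
    if result.returncode != 0:
        problems.append(f"exit code {result.returncode}")
    if "ModuleNotFoundError" in result.stderr:
        problems.append("a module was not bundled")
    problems += [f"no {s} file produced" for s in missing_outputs(out_dir)]
    return problems
```

For example, smoke_test(r".\thesisagents.exe", "smoke-nuitka") returns an empty list when the run succeeded and all three file types were written. Startup time still needs to be judged by hand.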
Optional dependencies ([intelligence], [mcp])
Nuitka bundles whatever’s importable in the build venv. To produce a
binary that supports --enrich:
pip install -e .[intelligence]
python -m nuitka ... --include-package=pypdf --include-package=anthropic `
--include-package=pymupdf ...
pymupdf is the most problematic dep across both packagers; see
the “Common issues” section below if the build fails on it.
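Since Nuitka can only bundle what the build venv can import, a quick pre-build check may save a long failed build. A minimal sketch, assuming the import names match the --include-package flags above; missing_modules is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names not found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names taken from the --include-package flags above.
needed = ["pypdf", "anthropic", "pymupdf"]
absent = missing_modules(needed)
if absent:
    print("Not importable in this venv:", ", ".join(absent))
```

Run it with the same interpreter that will run python -m nuitka, otherwise it checks the wrong environment.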
Common issues
Build runs out of memory. Nuitka’s C compilation can use 4–8 GB
of RAM on a project this size. On a 16 GB machine close other apps;
on smaller systems add --jobs=1 to compile serially and lower the
peak.
pymupdf fails to import in the binary. pymupdf wraps the MuPDF C
library and ships precompiled native binaries that Nuitka
sometimes mis-routes. Add --include-package-data=pymupdf so the
.so/.dll files land next to the Python module. If it is still broken,
exclude [intelligence] from the build venv and skip pymupdf; the
LLM-as-agent flow over MCP doesn’t need it.
lxml build slowness or link errors. lxml’s C extension is
already compiled in the wheel; Nuitka just needs to copy it.
--include-package-data=lxml covers it. If the link fails on Linux
ensure libxml2-dev and libxslt1-dev are installed.
Slow CI builds. Cache the Nuitka build dir between runs:
- name: Restore Nuitka cache
  uses: actions/cache@v4
  with:
    path: |
      ~/.nuitka
      thesisagents.build
    key: nuitka-${{ runner.os }}-${{ hashFiles('pyproject.toml') }}
Cuts subsequent builds from 15 minutes to 2–3 minutes.
Console encoding on Windows. Same trick as the PyInstaller doc:
set PYTHONUTF8=1 or chcp 65001 before running the binary, so CJK
paper titles in the search output render correctly.
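An in-program alternative, offered only as a sketch and not something the project is known to do: Python text streams can be switched to UTF-8 at startup with TextIOWrapper.reconfigure. The force_utf8 helper is hypothetical, and whether it fully replaces the environment settings above in a compiled binary is an assumption to verify on a real Windows console.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        # errors="replace" avoids a crash if a character still cannot be shown.
        reconfigure(encoding="utf-8", errors="replace")
    return stream


# Demonstration on an in-memory stream that starts out as cp1252.
buffer = io.BytesIO()
demo = io.TextIOWrapper(buffer, encoding="cp1252")
force_utf8(demo)
demo.write("论文标题")
demo.flush()

# In a real entry point the call would target the console streams instead:
# force_utf8(sys.stdout); force_utf8(sys.stderr)
```

The demo stream would have raised UnicodeEncodeError under cp1252; after the switch the CJK text is encoded as UTF-8 bytes.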
Optional CI job
Mirror the PyInstaller CI sketch but with Nuitka. Note the longer
build time — make it workflow_dispatch-only:
# .github/workflows/packaging.yml (sketch, manual trigger)
on: workflow_dispatch
jobs:
  nuitka:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 30
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install -e ".[dev]" nuitka
      - name: Cache Nuitka build
        uses: actions/cache@v4
        with:
          path: |
            ~/.nuitka
            thesisagents.build
          key: nuitka-${{ runner.os }}-${{ hashFiles('pyproject.toml') }}
      # Use bash on every runner: the backslash continuations below break
      # under the default PowerShell shell on windows-latest.
      - shell: bash
        run: |
          python -m nuitka --standalone --onefile \
            --output-filename=thesisagents \
            --include-package=thesisagents \
            --assume-yes-for-downloads \
            thesisagents/__main__.py
      - uses: actions/upload-artifact@v4
        with:
          name: thesisagents-${{ matrix.os }}
          path: |
            thesisagents
            thesisagents.exe
When to pick which
| Scenario | Use |
|---|---|
| You want a single binary tomorrow with minimal fuss. | PyInstaller |
| Build time is paid in CI, end users run it many times. | Nuitka |
| Bundle size or startup matters. | Nuitka |
| You want some source protection for a commercial release. | Nuitka |
| You iterate on the build script frequently. | PyInstaller |