Maintainers’ Guide

This document outlines essential guidelines for maintaining the Clusterium project. It provides instructions for testing, building, and deploying the package, as well as managing CI workflows.

Overview

The Clusterium project is managed via Poetry and adheres to modern Python packaging standards. This guide assumes familiarity with GitHub Actions, Poetry, and common Python development workflows.

Key configurations:

  • Python Versions Supported: >= 3.11 (tested on 3.11, 3.12, and 3.13)

  • Build Tool: poetry >= 2.0

  • Primary Dependencies: numpy, sentence-transformers, scipy, matplotlib

  • Documentation Tool: sphinx with the Read the Docs theme

  • Testing Tools: pytest, coverage

  • Linting Tools: black, flake8, isort, pylint

Note

While the project provides a Makefile to simplify common development tasks, all operations can also be performed using direct commands without requiring the make program. This guide includes both Makefile commands and their direct command equivalents.

Development Environment

This section covers the prerequisites, the initial setup, and the make commands available for day-to-day development.

Prerequisites

To use the provided Makefile commands, you need to have the make program installed on your system:

  • Linux/macOS: Usually pre-installed or available through package managers (apt-get install make, brew install make)

  • Windows: Available through tools like MSYS2, MinGW, Cygwin, or Windows Subsystem for Linux (WSL)

If you don’t have make installed or prefer not to use it, equivalent direct commands are provided throughout this guide.
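A quick way to confirm the prerequisites is a small script like the one below. It is only a sketch and not part of the project: the name check_environment is hypothetical, and it assumes the tools are invoked as poetry and make on your PATH. The minimum Python version is the one stated in the Overview.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in this guide


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a mapping of requirement name to whether it is satisfied.

    ``version_info`` and ``which`` are injectable so the check can be
    exercised without touching the real interpreter or PATH.
    """
    return {
        "python>=3.11": tuple(version_info[:2]) >= MIN_PYTHON,
        "poetry": which("poetry") is not None,
        "make (optional)": which("make") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'ok' if ok else 'missing':8} {name}")
```

A missing make is not an error here, because every make target in this guide has a direct command equivalent.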

Setting Up

Clone the repository and install dependencies:

git clone https://github.com/sergeyklay/clusterium.git
cd clusterium
make install

If you don’t have make installed, you can use the equivalent Poetry command:

git clone https://github.com/sergeyklay/clusterium.git
cd clusterium
poetry install

This will install the package and all its dependencies using Poetry.

Note

For details on installing the development environment, see the Installation section.

Available Make Commands

The project includes several make commands to streamline development:

make help              # Show available commands and environment information
make install           # Install the package and dependencies
make test              # Run tests
make ccov              # Generate combined coverage reports
make format            # Format code using black and isort
make format-check      # Check code formatting
make lint              # Run linters
make docs              # Test and build documentation
make clean             # Remove build artifacts and directories

For each make command, equivalent direct commands are provided in the relevant sections below.

Testing the Project

Unit tests and coverage reporting are managed using pytest and coverage.

Running Tests Locally

Run tests using the make command:

make test

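Or run the equivalent commands directly inside the Poetry environment (for example, by prefixing each command with poetry run):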

coverage erase

coverage run -m pytest ./clusx ./tests

coverage combine
coverage report

Generate Coverage Reports

Generate HTML, XML, and LCOV coverage reports:

make ccov

This will create reports in the coverage/ directory with subdirectories for each format.

Without make, run these commands inside the Poetry environment (for example, by prefixing each command with poetry run):

mkdir -p coverage/html coverage/xml coverage/lcov
coverage combine || true
coverage report
coverage html -d coverage/html
coverage xml -o coverage/xml/coverage.xml
coverage lcov -o coverage/lcov/coverage.lcov   # LCOV report; output file name is assumed

CI Workflow

Tests run automatically on Ubuntu against every supported Python version (3.11, 3.12, and 3.13). See the configuration in .github/workflows/ci.yml.

The CI workflow includes:

  • Code formatting verification

  • Linting checks

  • Unit tests with coverage reporting

  • Coverage report upload to Codecov

Building the Package

The clusx package is distributed in wheel and sdist formats.

Local Build

Install build dependencies:

poetry install

Build the package:

poetry build

Verify the built package:

pip install dist/*.whl
clusx --help

CI Workflow

The build workflow in .github/workflows/cd.yml ensures the package is built and verified across multiple Python versions.

Documentation Management

Documentation is written using sphinx with the Read the Docs theme.

Building Documentation Locally

Install documentation dependencies:

poetry install --with docs

Build the documentation using the Makefile from the root directory:

make docs

Or build directly with sphinx:

# Test documentation files
python -m doctest CONTRIBUTING.rst README.rst

# Build HTML documentation
python -m sphinx \
   --jobs auto \
   --builder html \
   --nitpicky \
   --show-traceback \
   --fail-on-warning \
   --doctree-dir docs/build/doctrees \
   docs/source docs/build/html

View the documentation:

# On macOS
open docs/build/html/index.html

# On Linux
xdg-open docs/build/html/index.html

# On Windows
start docs/build/html/index.html

Other Documentation Formats

The docs Makefile supports various output formats:

cd docs
make epub      # Build EPUB documentation
make man       # Build man pages
make clean     # Clean build directory

Without make, use these sphinx-build commands:

cd docs

# Build EPUB documentation
sphinx-build -b epub source build/epub

# Build man pages
sphinx-build -b man source build/man

# Clean build directory
rm -rf build/

CI Workflow

The docs workflow automatically builds and validates documentation on pushes and pull requests. See .github/workflows/docs.yml.

Linting and Code Quality Checks

Code quality is enforced using black, flake8, isort, and pylint.

Running Locally

Format code and run linters using make commands:

make format       # Format code with black and isort
make format-check # Check formatting without making changes
make lint         # Run flake8 and pylint

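Or run the equivalent commands directly inside the Poetry environment (for example, by prefixing each command with poetry run):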

# Format code (equivalent to make format)
isort --profile black --python-version auto ./
black . ./clusx ./tests

# Check formatting without changes (equivalent to make format-check)
isort --check-only --profile black --python-version auto --diff ./
black --check . ./clusx ./tests

# Run linters (equivalent to make lint)
flake8 ./
pylint ./clusx

Pre-commit Hooks

The project uses pre-commit hooks to ensure code quality before commits:

# Install pre-commit hooks
pre-commit install

# Run pre-commit hooks on all files
pre-commit run --all-files

CI Workflow

The CI workflow in .github/workflows/ci.yml includes formatting and linting checks. Pull requests with formatting issues will show the diff of improperly formatted files.

Release Process

The release process involves version tagging and package publishing to PyPI.

Steps for Release

  1. Ensure all tests pass and documentation builds successfully

  2. Update CHANGELOG.md with the changes in the new version

  3. Tag the version using git and push the tag to GitHub:

    git tag -a v0.x.y -m "Release v0.x.y"
    git push origin v0.x.y
    
  4. Build and publish the package (skip this step if the release workflow publishes it for you when the tag is pushed):

    poetry build
    poetry publish
    

CI Workflow

The release workflow is triggered when a new tag matching the pattern v* is pushed to GitHub. It builds the package and publishes it to PyPI.
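Because the pattern is so broad, it helps to see which tag names it would accept. The sketch below uses Python's fnmatch as a stand-in for the GitHub Actions filter; this is an approximation, since GitHub's own glob syntax differs in details (for instance, its * does not match a / character). The function name triggers_release is hypothetical.

```python
from fnmatch import fnmatchcase

RELEASE_TAG_PATTERN = "v*"  # pattern quoted in this guide


def triggers_release(tag: str) -> bool:
    """Return True if ``tag`` matches the release tag pattern (case-sensitive)."""
    return fnmatchcase(tag, RELEASE_TAG_PATTERN)


if __name__ == "__main__":
    for tag in ("v0.1.0", "v1.0.0rc1", "0.1.0", "release-0.1.0", "vnext"):
        status = "release" if triggers_release(tag) else "ignored"
        print(f"{tag:15} {status}")
```

Note that any tag beginning with a lowercase v matches, not only version-like ones, so avoid pushing scratch tags such as vnext.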

Continuous Integration and Deployment

CI/CD is managed via GitHub Actions, with workflows for:

  • Testing: Ensures functionality and compatibility across Python 3.11, 3.12, and 3.13 on Ubuntu

  • Linting: Maintains code quality with black, flake8, isort, and pylint

  • Documentation: Validates and builds project documentation

  • Building: Verifies the package’s integrity

  • Release: Publishes the package to PyPI

The CI workflow includes:

  • Caching of dependencies to speed up builds

  • Automatic code formatting verification

  • Coverage reporting to Codecov

  • JUnit XML test results

Development Guidelines

Code Style

The project follows the Black code style. Configuration is in pyproject.toml:

[tool.black]
line-length = 88
target-version = ["py312"]

Import Sorting

Imports should be sorted using isort with the Black profile:

[tool.isort]
profile = "black"
py_version = 312

Type Annotations

Use type annotations for all function parameters and return values:

def process_text(text: str, threshold: float = 0.5) -> list[str]:
    """Process the input text and return a list of tokens."""
    # Implementation
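For a version with a body, here is a minimal sketch of a fully annotated function. The function select_above_threshold is hypothetical and does not exist in the project. It uses the built-in generic types (list[str], dict[str, float]), which are available on every Python version the project supports.

```python
def select_above_threshold(
    scores: dict[str, float], threshold: float = 0.5
) -> list[str]:
    """Return the keys whose score is at least ``threshold``, in input order."""
    return [name for name, score in scores.items() if score >= threshold]


if __name__ == "__main__":
    print(select_above_threshold({"a": 0.9, "b": 0.1, "c": 0.5}))
```

Annotating the default-valued parameter as well as the return type lets a type checker catch a caller that passes, say, a string threshold.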

Documentation Standards

  • Use NumPy-style docstrings for all public functions, classes, and methods

  • Include examples in docstrings where appropriate

  • Keep the documentation up to date with code changes

Example docstring:

def calculate_similarity(text1: str, text2: str) -> float:
    """Calculate the semantic similarity between two texts.

    Parameters
    ----------
    text1 : str
        The first text string
    text2 : str
        The second text string

    Returns
    -------
    float
        A float between 0 and 1 representing similarity

    Examples
    --------
    >>> calculate_similarity("Hello world", "Hi world")
    0.85
    """
    # Implementation
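If docstring examples are ever executed with doctest (an assumption about the setup; this guide only shows doctest being run on the .rst files), the expected output must be exactly reproducible. A model-based similarity score such as 0.85 usually is not. The sketch below shows the same docstring layout on a hypothetical, deterministic function, token_overlap, which is not part of the project.

```python
def token_overlap(text1: str, text2: str) -> float:
    """Calculate the Jaccard overlap between the word sets of two texts.

    Parameters
    ----------
    text1 : str
        The first text string.
    text2 : str
        The second text string.

    Returns
    -------
    float
        A value between 0 and 1; 0.0 if both texts are empty.

    Examples
    --------
    >>> token_overlap("hello world", "hello")
    0.5
    >>> token_overlap("", "")
    0.0
    """
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    union = words1 | words2
    if not union:
        return 0.0
    return len(words1 & words2) / len(union)


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

Running the file directly executes the two examples and prints nothing when they pass.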

Troubleshooting

Common Development Issues

  1. Poetry environment issues:

    # Recreate the virtual environment
    rm -rf .venv
    poetry env remove --all
    poetry install
    
  2. Pre-commit hook failures:

    # Update pre-commit hooks
    pre-commit autoupdate
    
    # Run hooks manually
    pre-commit run --all-files
    
  3. Documentation build errors:

    # Clean build directory
    cd docs
    make clean
    
    # Rebuild with verbose output (paths are relative to docs/ after the cd above)
    sphinx-build -v --nitpicky --show-traceback --fail-on-warning --builder html source build/html
    
  4. Test failures:

    # Run tests with verbose output
    pytest -vvv ./clusx ./tests
    
    # Run a specific test
    pytest -vvv ./tests/test_specific_file.py::test_specific_function
    
  5. Cleaning build artifacts without make:

    # Remove Python cache files
    find ./ -name '__pycache__' -delete -o -name '*.pyc' -delete
    
    # Remove pytest cache
    rm -rf ./.pytest_cache
    
    # Remove coverage reports
    rm -rf ./coverage