Installation

Overview

The Clusterium project provides two main components:

  • Python package clusx: A comprehensive library that can be imported and used in your Python applications for text clustering and analysis.

  • Command-line utility clusx: A convenient command-line interface that provides direct access to the package’s functionality without writing Python code.

Both components are installed simultaneously when following the instructions below, allowing you to choose the most appropriate interface for your specific needs.

Requirements

Before installing clusx, ensure you have the following prerequisites:

  • Python 3.11 or higher

  • pip (for PyPI installation)

  • Poetry 2.0 or higher (for development installation only)

  • Git (for cloning the repository)

Python Version Compatibility

clusx requires Python 3.11 or higher. This requirement ensures access to the latest language features and optimizations. The project is tested with Python 3.11, 3.12, and 3.13.

If you’re using an older version of Python, you’ll need to upgrade before installing clusx:

# Check your current Python version
python --version

Warning

Python Command Differences Across Operating Systems

The python and pip commands may behave differently across operating systems:

  • macOS: By default, python often points to Python 2.x. You should use python3 and pip3 commands instead.

  • Linux: Many distributions now default python to Python 3.x, but some still maintain python as Python 2.x. Use python3 and pip3 to ensure you’re using the correct version.

  • Windows: Recent installations typically have python pointing to Python 3.x, but it’s best to verify with python --version.

To ensure you’re using the correct Python version, always check:

# Check Python version
python --version  # or python3 --version

# Check pip version
pip --version  # or pip3 --version

If python points to Python 2.x on your system, replace all python commands in this guide with python3 and all pip commands with pip3.

Finding Your Python Installations

To check where Python is installed and how many Python versions you have on your system:

On Unix-based systems (Linux, macOS):

# Find the location of the python/python3 executable
which python
which python3

# Alternative command to find executable location
command -v python
command -v python3

# List all instances of python in your PATH
type -a python
type -a python3

# Check if you have multiple Python installations
ls -l /usr/bin/python*
ls -l /usr/local/bin/python*

On Windows:

# Find the location of Python executable
where python
where python3

# Check Python version and installation path
py -0

Installation Methods

There are several ways to install clusx depending on your needs:

Installing from GitHub Releases

Another way to install package is to download it from GitHub Releases page:

  1. Visit the GitHub Releases page

  2. Download the desired release artifacts (both .whl and/or .tar.gz files)

  3. Download the corresponding checksum files (SHA256SUMS, SHA512SUMS, or MD5SUMS)

  4. Verify the integrity of the downloaded files:

    # Verify with SHA256 (recommended)
    sha256sum -c SHA256SUMS
    
  5. Install the verified package:

    # Create a directory for the download
    mkdir clusx-download && cd clusx-download
    
    # Download the latest release artifacts and checksums (replace X.Y.Z with the actual version)
    # You can use wget or curl
    wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/clusx-X.Y.Z-py3-none-any.whl
    wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/clusx-X.Y.Z.tar.gz
    wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/SHA256SUMS
    
    # Verify the integrity of the downloaded files
    sha256sum -c SHA256SUMS
    
    # Create and activate a virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
    # Install the verified package (choose one)
    pip install clusx-X.Y.Z-py3-none-any.whl  # Wheel file (recommended)
    # OR
    pip install clusx-X.Y.Z.tar.gz  # Source distribution
    
    # If python points to Python 2.x on your system
    pip3 install clusx-X.Y.Z-py3-none-any.whl
    # Or
    pip3 install clusx-X.Y.Z.tar.gz
    
    # Verify the installation
    clusx --version
    

Installing the Development Version

If you need the latest unreleased features, you can install directly from the GitHub repository:

# Install the latest development version
python -m pip install -e git+https://github.com/sergeyklay/clusterium.git#egg=clusx

# If python points to Python 2.x on your system
python3 -m pip install -e git+https://github.com/sergeyklay/clusterium.git#egg=clusx

Note

The main branch will always contain the latest unstable version, so the experience might not be as smooth. If you wish to use a stable version, consider installing from PyPI or switching to a specific tag.

Installing for Development

If you plan to contribute to the project or need to modify the code, follow these steps:

  1. Clone the repository:

    git clone https://github.com/sergeyklay/clusterium.git
    cd clusterium
    
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
    # If python points to Python 2.x on your system
    python3 -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
  3. Install with Poetry:

    # Install Poetry if you haven't already
    # See https://python-poetry.org/docs/#installation
    
    # Install dependencies
    poetry install
    

Installation Options with Poetry

Poetry allows for flexible installation options based on your specific needs:

Full Development Environment

To install all dependency groups, including development tools, testing frameworks, and documentation generators:

poetry install --with dev,testing,docs

Production Installation

For production environments where you only need the core functionality:

poetry install --without dev,testing,docs

Custom Installation

You can customize which dependency groups to include:

# For development without documentation tools
poetry install --with dev,testing --without docs

# For documentation work only
poetry install --with docs --without dev,testing

Verifying Installation

To verify that the installation was successful, run:

clusx --version

Or using the Python module:

python -m clusx --version
# If python points to Python 2.x on your system
python3 -m clusx --version

You should see the version information and a brief copyright notice.

Dependencies

Core Dependencies

These dependencies are installed by default and are required for the basic functionality:

  • numpy: For numerical operations

  • sentence-transformers: For text embeddings

  • scipy: For distance calculations

  • matplotlib: For visualization

  • torch: For deep learning operations

  • tqdm: For progress bars

  • click: For command-line interface

  • pandas: For data manipulation

  • powerlaw: For statistical analysis

  • scikit-learn: For machine learning algorithms

Optional Dependency Groups

When installing with Poetry, you can choose specific dependency groups:

Development Dependencies

Tools for development and code quality:

  • black: Code formatter

  • debugpy: Debugging tool

  • flake8: Linter

  • isort: Import sorter

  • pre-commit: Git hooks manager

Testing Dependencies

Tools for testing the codebase:

  • pytest: Testing framework

  • coverage: Code coverage tool

Documentation Dependencies

Tools for building documentation:

  • sphinx: Documentation generator

  • sphinx-rtd-theme: Read the Docs theme for Sphinx

Troubleshooting

Common Issues

If you encounter any issues during installation:

  1. Ensure you have the correct Python version (3.11+)

  2. Make sure you’re using the latest version of pip or Poetry

  3. Check for any error messages during the installation process

PyTorch Installation Issues

If you encounter issues with PyTorch installation:

# Install PyTorch separately with CUDA support if needed
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Then continue with the installation
pip install clusx

Dependency Conflicts

If you encounter dependency conflicts:

# For pip installations, try:
pip install --upgrade pip
pip install clusx --no-deps
pip install -r <(pip freeze | grep -v clusx)

# For Poetry installations:
poetry self update
poetry lock --no-update
poetry install