Installation
Overview
The Clusterium project provides two main components:
Python package
clusx: A comprehensive library that can be imported and used in your Python applications for text clustering and analysis.
Command-line utility
clusx: A convenient command-line interface that provides direct access to the package’s functionality without writing Python code.
Both components are installed simultaneously when following the instructions below, allowing you to choose the most appropriate interface for your specific needs.
Requirements
Before installing clusx, ensure you have the following prerequisites:
Python 3.11 or higher
pip (for PyPI installation)
Poetry 2.0 or higher (for development installation only)
Git (for cloning the repository)
Python Version Compatibility
clusx requires Python 3.11 or higher. This requirement ensures access to the latest language features and optimizations. The project is tested with Python 3.11, 3.12, and 3.13.
If you’re using an older version of Python, you’ll need to upgrade before installing clusx:
# Check your current Python version
python --version
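The same check can be done from inside the interpreter, which helps when several Python versions are installed and it is unclear which one a script runs under. This is a minimal sketch; the helper name meets_minimum is hypothetical and not part of clusx.

```python
import sys

# Minimum version taken from the requirements listed above.
MIN_VERSION = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough for clusx")
    else:
        print(f"Python {current} is too old; 3.11 or higher is required")
```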
Warning
Python Command Differences Across Operating Systems
The python and pip commands may behave differently across operating systems:
macOS: python may point to Python 2.x on older releases, and recent releases may not provide a python command at all. Use the python3 and pip3 commands instead.
Linux: Many distributions now default python to Python 3.x, but some still maintain python as Python 2.x. Use python3 and pip3 to ensure you’re using the correct version.
Windows: Recent installations typically have python pointing to Python 3.x, but it’s best to verify with python --version.
To ensure you’re using the correct Python version, always check:
# Check Python version
python --version # or python3 --version
# Check pip version
pip --version # or pip3 --version
If python points to Python 2.x on your system, replace all python commands in this guide with python3 and all pip commands with pip3.
Finding Your Python Installations
To check where Python is installed and how many Python versions you have on your system:
On Unix-based systems (Linux, macOS):
# Find the location of the python/python3 executable
which python
which python3
# Alternative command to find executable location
command -v python
command -v python3
# List all instances of python in your PATH
type -a python
type -a python3
# Check if you have multiple Python installations
ls -l /usr/bin/python*
ls -l /usr/local/bin/python*
On Windows:
# Find the location of Python executable
where python
where python3
# List the Python versions known to the py launcher (use py -0p to include paths)
py -0
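A platform-independent way to see which interpreter commands resolve on your PATH is to ask Python itself. The sketch below uses only the standard library; the function name find_interpreters and the default list of command names are illustrative assumptions.

```python
import shutil
import sys


def find_interpreters(names=("python", "python3", "py")):
    """Map each command name to its resolved path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # sys.executable is the interpreter running this script.
    print(f"Running interpreter: {sys.executable}")
    for name, path in find_interpreters().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

This reports only the first match for each name, so it complements rather than replaces type -a and where.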
Installation Methods
There are several ways to install clusx depending on your needs:
Installing from PyPI (Recommended)
clusx is a Python package hosted on PyPI.
The recommended installation method is using pip to install into a virtual environment:
# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install clusx
python -m pip install clusx
# Alternative commands if python points to Python 2.x on your system
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
python3 -m pip install clusx
# or
pip3 install clusx
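If you are unsure whether the virtual environment is actually active, the interpreter can tell you. This is a small sketch based on the standard sys.prefix and sys.base_prefix attributes; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return prefix != base_prefix


if __name__ == "__main__":
    state = "active" if in_virtualenv() else "not active"
    print(f"Virtual environment is {state} ({sys.prefix})")
```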
After installation, the clusx command will be available from the command line:
# Verify installation
clusx --version
More information about pip and PyPI can be found in their official documentation.
Installing from GitHub Releases
Another way to install the package is to download it from the GitHub Releases page:
Visit the GitHub Releases page
Download the desired release artifacts (.whl and/or .tar.gz files)
Download the corresponding checksum files (SHA256SUMS, SHA512SUMS, or MD5SUMS)
Verify the integrity of the downloaded files:
# Verify with SHA256 (recommended)
sha256sum -c SHA256SUMS
Install the verified package:
# Create a directory for the download
mkdir clusx-download && cd clusx-download
# Download the latest release artifacts and checksums (replace X.Y.Z with the actual version)
# You can use wget or curl
wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/clusx-X.Y.Z-py3-none-any.whl
wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/clusx-X.Y.Z.tar.gz
wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/SHA256SUMS
# Verify the integrity of the downloaded files
sha256sum -c SHA256SUMS
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the verified package (choose one)
pip install clusx-X.Y.Z-py3-none-any.whl # Wheel file (recommended)
# OR
pip install clusx-X.Y.Z.tar.gz # Source distribution
# If python points to Python 2.x on your system
pip3 install clusx-X.Y.Z-py3-none-any.whl
# Or
pip3 install clusx-X.Y.Z.tar.gz
# Verify the installation
clusx --version
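On systems without the sha256sum utility (for example, a default Windows shell), the same verification can be approximated with Python's hashlib. The sketch below assumes the manifest uses the usual sha256sum layout of one "digest  filename" pair per line. The function names are illustrative and not part of clusx.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sums_file):
    """Check every file listed in a sha256sum-style manifest.

    Returns a dict mapping file name to True (match), False (mismatch),
    or None (file not present next to the manifest).
    """
    sums_path = Path(sums_file)
    results = {}
    for line in sums_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # a leading "*" marks binary mode
        target = sums_path.parent / name
        if not target.exists():
            results[name] = None
        else:
            results[name] = sha256_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    if Path("SHA256SUMS").exists():
        for name, ok in verify_checksums("SHA256SUMS").items():
            label = {True: "OK", False: "FAILED", None: "missing"}[ok]
            print(f"{name}: {label}")
```

Only install an artifact whose entry reports OK. A missing entry simply means that file was not downloaded.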
Installing the Development Version
If you need the latest unreleased features, you can install directly from the GitHub repository:
# Install the latest development version
python -m pip install -e git+https://github.com/sergeyklay/clusterium.git#egg=clusx
# If python points to Python 2.x on your system
python3 -m pip install -e git+https://github.com/sergeyklay/clusterium.git#egg=clusx
Note
The main branch will always contain the latest unstable version, so the experience
might not be as smooth. If you wish to use a stable version, consider installing from PyPI
or switching to a specific tag.
Installing for Development
If you plan to contribute to the project or need to modify the code, follow these steps:
Clone the repository:
git clone https://github.com/sergeyklay/clusterium.git
cd clusterium
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# If python points to Python 2.x on your system
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
Install with Poetry:
# Install Poetry if you haven't already
# See https://python-poetry.org/docs/#installation
# Install dependencies
poetry install
Installation Options with Poetry
Poetry allows for flexible installation options based on your specific needs:
Full Development Environment
To install all dependency groups, including development tools, testing frameworks, and documentation generators:
poetry install --with dev,testing,docs
Production Installation
For production environments where you only need the core functionality:
poetry install --without dev,testing,docs
Custom Installation
You can customize which dependency groups to include:
# For development without documentation tools
poetry install --with dev,testing --without docs
# For documentation work only
poetry install --with docs --without dev,testing
Verifying Installation
To verify that the installation was successful, run:
clusx --version
Or using the Python module:
python -m clusx --version
# If python points to Python 2.x on your system
python3 -m clusx --version
You should see the version information and a brief copyright notice.
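The installed version can also be read from package metadata without invoking the command-line tool, which is handy in scripts or CI checks. This sketch relies on the standard importlib.metadata module and assumes the distribution is named clusx, as on PyPI. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="clusx"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"clusx {found}" if found else "clusx is not installed in this environment")
```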
Dependencies
Core Dependencies
These dependencies are installed by default and are required for the basic functionality:
numpy: For numerical operations
sentence-transformers: For text embeddings
scipy: For distance calculations
matplotlib: For visualization
torch: For deep learning operations
tqdm: For progress bars
click: For command-line interface
pandas: For data manipulation
powerlaw: For statistical analysis
scikit-learn: For machine learning algorithms
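To see which of these core dependencies are present in the current environment, one option is to query the installed package metadata. The sketch below copies the distribution names from the list above. The names CORE_DEPENDENCIES and dependency_report are illustrative, not part of clusx.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the core dependency list above.
CORE_DEPENDENCIES = (
    "numpy", "sentence-transformers", "scipy", "matplotlib", "torch",
    "tqdm", "click", "pandas", "powerlaw", "scikit-learn",
)


def dependency_report(names=CORE_DEPENDENCIES):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, found in dependency_report().items():
        print(f"{name}: {found or 'MISSING'}")
```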
Optional Dependency Groups
When installing with Poetry, you can choose specific dependency groups:
Development Dependencies
Tools for development and code quality:
black: Code formatter
debugpy: Debugging tool
flake8: Linter
isort: Import sorter
pre-commit: Git hooks manager
Testing Dependencies
Tools for testing the codebase:
pytest: Testing framework
coverage: Code coverage tool
Documentation Dependencies
Tools for building documentation:
sphinx: Documentation generator
sphinx-rtd-theme: Read the Docs theme for Sphinx
Troubleshooting
Common Issues
If you encounter any issues during installation:
Ensure you have the correct Python version (3.11+)
Make sure you’re using the latest version of pip or Poetry
Check for any error messages during the installation process
PyTorch Installation Issues
If you encounter issues with PyTorch installation:
# Install PyTorch separately with CUDA support if needed
pip install torch --index-url https://download.pytorch.org/whl/cu118
# Then continue with the installation
pip install clusx
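After installing PyTorch separately, it can help to confirm that it imports and whether it sees a CUDA device. The sketch below reports that without raising if torch is absent; the function name torch_status is hypothetical.

```python
def torch_status():
    """Summarize the local PyTorch install without failing if it is absent."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "version": None, "cuda_available": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_status().items():
        print(f"{key}: {value}")
```

A result of cuda_available: False on a machine with a GPU often points to a CPU-only wheel or a driver mismatch, but other causes are possible.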
Dependency Conflicts
If you encounter dependency conflicts:
# For pip installations, try (the last command uses process substitution, so it needs bash or zsh):
pip install --upgrade pip
pip install clusx --no-deps
pip install -r <(pip freeze | grep -v clusx)
# For Poetry installations:
poetry self update
# Poetry 2.x keeps already-locked versions by default, so no --no-update flag is needed
poetry lock
poetry install