============
Installation
============

Overview
========

The Clusterium project provides two main components:

* **Python package** ``clusx``: A library that can be imported and used in your
  Python applications for text clustering and analysis.
* **Command-line utility** ``clusx``: A command-line interface that provides
  direct access to the package's functionality without writing Python code.

Both components are installed together when you follow the instructions below,
so you can choose whichever interface suits your needs.

Requirements
============

Before installing ``clusx``, ensure you have the following prerequisites:

* Python 3.11 or higher
* `pip `_ (for PyPI installation)
* `Poetry `_ 2.0 or higher (for development installation only)
* `Git `_ (for cloning the repository)

Python Version Compatibility
----------------------------

``clusx`` requires Python 3.11 or higher. This requirement gives the project
access to recent language features and interpreter optimizations. The project
is tested with Python 3.11, 3.12, and 3.13.

If you're using an older version of Python, you'll need to upgrade before
installing ``clusx``:

.. code-block:: bash

   # Check your current Python version
   python --version

.. warning::

   **Python Command Differences Across Operating Systems**

   The ``python`` and ``pip`` commands may behave differently across operating
   systems:

   * **macOS**: Older releases shipped ``python`` as Python 2.x, and recent
     releases may not provide a ``python`` command at all. Use ``python3`` and
     ``pip3`` instead.
   * **Linux**: Many distributions now default ``python`` to Python 3.x, but
     some still maintain ``python`` as Python 2.x. Use ``python3`` and ``pip3``
     to ensure you're using the correct version.
   * **Windows**: Recent installations typically have ``python`` pointing to
     Python 3.x, but it's best to verify with ``python --version``.

   To ensure you're using the correct Python version, always check:

   .. code-block:: bash

      # Check Python version
      python --version
      # or
      python3 --version

      # Check pip version
      pip --version
      # or
      pip3 --version

   If ``python`` points to Python 2.x on your system, replace all ``python``
   commands in this guide with ``python3`` and all ``pip`` commands with
   ``pip3``.

**Finding Your Python Installations**

To check where Python is installed and how many Python versions you have on
your system:

**On Unix-based systems (Linux, macOS):**

.. code-block:: bash

   # Find the location of the python/python3 executable
   which python
   which python3

   # Alternative command to find executable location
   command -v python
   command -v python3

   # List all instances of python in your PATH
   type -a python
   type -a python3

   # Check if you have multiple Python installations
   ls -l /usr/bin/python*
   ls -l /usr/local/bin/python*

**On Windows (Command Prompt):**

.. code-block:: bash

   # Find the location of Python executable
   where python
   where python3

   # List installed Python versions with their paths (py launcher)
   py -0p

Installation Methods
====================

There are several ways to install ``clusx`` depending on your needs:

Installing from PyPI (Recommended)
----------------------------------

``clusx`` is a Python package `hosted on PyPI `_. The recommended installation
method is to use `pip `_ to install it into a virtual environment:

.. code-block:: bash

   # Create and activate a virtual environment (recommended)
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate

   # Install clusx
   python -m pip install clusx

   # Alternative commands if python points to Python 2.x on your system
   python3 -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   python3 -m pip install clusx
   # or
   pip3 install clusx

After installation, the ``clusx`` command will be available from the command
line:

.. code-block:: bash

   # Verify installation
   clusx --version

More information about ``pip`` and PyPI can be found here:

* `Install pip `_
* `Python Packaging User Guide `_

Installing from GitHub Releases
-------------------------------

Another way to install the package is to download it from the GitHub Releases
page:

1. Visit the `GitHub Releases page `_
2. Download the desired release artifacts (the ``.whl`` file, the ``.tar.gz``
   file, or both)
3. Download the corresponding checksum files (``SHA256SUMS``, ``SHA512SUMS``,
   or ``MD5SUMS``)
4. Verify the integrity of the downloaded files:

   .. code-block:: bash

      # Verify with SHA256 (recommended)
      # Add --ignore-missing if you downloaded only some of the listed files
      sha256sum -c SHA256SUMS

5. Install the verified package. The complete sequence, from download to
   installation, looks like this:

   .. code-block:: bash

      # Create a directory for the download
      mkdir clusx-download && cd clusx-download

      # Download the latest release artifacts and checksums
      # (replace X.Y.Z with the actual version)
      # You can use wget or curl
      wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/clusx-X.Y.Z-py3-none-any.whl
      wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/clusx-X.Y.Z.tar.gz
      wget https://github.com/sergeyklay/clusterium/releases/download/X.Y.Z/SHA256SUMS

      # Verify the integrity of the downloaded files
      sha256sum -c SHA256SUMS

      # Create and activate a virtual environment
      python -m venv .venv
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate

      # Install the verified package (choose one)
      pip install clusx-X.Y.Z-py3-none-any.whl  # Wheel file (recommended)
      # OR
      pip install clusx-X.Y.Z.tar.gz  # Source distribution

      # If python points to Python 2.x on your system
      pip3 install clusx-X.Y.Z-py3-none-any.whl
      # Or
      pip3 install clusx-X.Y.Z.tar.gz

      # Verify the installation
      clusx --version

Installing the Development Version
----------------------------------

If you need the latest unreleased features, you can install directly from the
GitHub repository:

.. code-block:: bash

   # Install the latest development version
   python -m pip install -e git+https://github.com/sergeyklay/clusterium.git#egg=clusx

   # If python points to Python 2.x on your system
   python3 -m pip install -e git+https://github.com/sergeyklay/clusterium.git#egg=clusx

.. note::

   The ``main`` branch always contains the latest unstable version, so it may
   be less reliable than a release. If you want a stable version, install from
   PyPI or switch to a specific `tag `_.

Installing for Development
--------------------------

If you plan to contribute to the project or need to modify the code, follow
these steps:

1. Clone the repository:

   .. code-block:: bash

      git clone https://github.com/sergeyklay/clusterium.git
      cd clusterium

2. Create and activate a virtual environment:

   .. code-block:: bash

      python -m venv .venv
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate

      # If python points to Python 2.x on your system
      python3 -m venv .venv
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install with Poetry:

   .. code-block:: bash

      # Install Poetry if you haven't already
      # See https://python-poetry.org/docs/#installation

      # Install dependencies
      poetry install

Installation Options with Poetry
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Poetry supports several installation options, depending on which dependency
groups you need:

**Full Development Environment**

To install all dependency groups, including development tools, testing
frameworks, and documentation generators:

.. code-block:: bash

   poetry install --with dev,testing,docs

**Production Installation**

For production environments where you only need the core functionality:

.. code-block:: bash

   poetry install --without dev,testing,docs

**Custom Installation**

You can customize which dependency groups to include:

.. code-block:: bash

   # For development without documentation tools
   poetry install --with dev,testing --without docs

   # For documentation work only
   poetry install --with docs --without dev,testing

Verifying Installation
======================

To verify that the installation was successful, run:

.. code-block:: bash

   clusx --version

Or using the Python module:

.. code-block:: bash

   python -m clusx --version

   # If python points to Python 2.x on your system
   python3 -m clusx --version

You should see the version information and a brief copyright notice.

Dependencies
============

Core Dependencies
-----------------

These dependencies are installed by default and are required for the basic
functionality:

* ``numpy``: For numerical operations
* ``sentence-transformers``: For text embeddings
* ``scipy``: For distance calculations
* ``matplotlib``: For visualization
* ``torch``: For deep learning operations
* ``tqdm``: For progress bars
* ``click``: For command-line interface
* ``pandas``: For data manipulation
* ``powerlaw``: For statistical analysis
* ``scikit-learn``: For machine learning algorithms

Optional Dependency Groups
--------------------------

When installing with Poetry, you can choose specific dependency groups:

Development Dependencies
^^^^^^^^^^^^^^^^^^^^^^^^

Tools for development and code quality:

* ``black``: Code formatter
* ``debugpy``: Debugging tool
* ``flake8``: Linter
* ``isort``: Import sorter
* ``pre-commit``: Git hooks manager

Testing Dependencies
^^^^^^^^^^^^^^^^^^^^

Tools for testing the codebase:

* ``pytest``: Testing framework
* ``coverage``: Code coverage tool

Documentation Dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^

Tools for building documentation:

* ``sphinx``: Documentation generator
* ``sphinx-rtd-theme``: Read the Docs theme for Sphinx

Troubleshooting
===============

Common Issues
-------------

If you encounter any issues during installation:

1. Ensure you have the correct Python version (3.11+)
2. Make sure you're using the latest version of pip or Poetry
3. Check the installation output for error messages

PyTorch Installation Issues
---------------------------

If you encounter issues with PyTorch installation:

.. code-block:: bash

   # Install PyTorch separately with CUDA support if needed
   # (this index provides CUDA 11.8 builds)
   pip install torch --index-url https://download.pytorch.org/whl/cu118

   # Then continue with the installation
   pip install clusx

Dependency Conflicts
--------------------

If you encounter dependency conflicts:

.. code-block:: bash

   # For pip installations, try:
   pip install --upgrade pip
   pip install clusx --no-deps
   # The next line uses process substitution, which requires bash or zsh
   pip install -r <(pip freeze | grep -v clusx)

   # For Poetry installations:
   poetry self update
   # Poetry 2.x keeps locked versions by default; the old --no-update flag was removed
   poetry lock
   poetry install
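
When working through any of the issues above, it can help to confirm which
interpreter is active and whether it can see the package. The following is a
minimal sketch using only the standard library. The helper name
``check_environment`` and the script itself are illustrative and are not part
of ``clusx``. The sketch assumes the import name ``clusx`` and the Python 3.11
minimum stated earlier in this guide.

.. code-block:: python

   """Illustrative environment check; not shipped with clusx."""
   import sys
   from importlib import util

   MIN_PYTHON = (3, 11)  # minimum version listed under Requirements


   def check_environment(package="clusx", version_info=None):
       """Return a list of problems found; an empty list means no problems."""
       # Compare only (major, minor); default to the running interpreter.
       major_minor = tuple(version_info or sys.version_info)[:2]
       problems = []

       if major_minor < MIN_PYTHON:
           problems.append(
               "Python %d.%d found, but %d.%d or higher is required"
               % (major_minor + MIN_PYTHON)
           )

       # find_spec returns None when a top-level package cannot be imported.
       if util.find_spec(package) is None:
           problems.append(
               "%s is not importable from %s" % (package, sys.executable)
           )

       return problems


   if __name__ == "__main__":
       for problem in check_environment():
           print("problem:", problem)

Run the script with the same interpreter you used for installation, for
example from inside the activated virtual environment. No output means both
checks passed. If the package is reported as not importable, the usual cause
is that ``pip`` installed it into a different Python installation than the one
running the script.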