Contributing to RegressionLab

Thank you for your interest in contributing to RegressionLab! This guide will help you get started with contributing code, documentation, or other improvements to the project.

Ways to Contribute

There are many ways to contribute to RegressionLab:

🐛 Report bugs: Help us identify issues.
💡 Suggest features: Share your ideas for improvements.
📝 Improve documentation: Fix typos, add examples, clarify explanations.
🔧 Fix bugs: Submit patches for known issues.
✨ Add features: Implement new functionality.
🧪 Write tests: Improve code coverage.
🌍 Add translations: Support more languages.
📊 Add equations: Contribute new fitting functions.
🎨 Improve UI/UX: Enhance user experience.

Getting Started

1. Fork and Clone

# Fork the repository on GitHub (click "Fork" button)

# Clone your fork
git clone https://github.com/<your-username>/RegressionLab.git
cd RegressionLab

# Add upstream remote
git remote add upstream https://github.com/DOKOS-TAYOS/RegressionLab.git

2. Set Up Development Environment

# Create virtual environment
python -m venv .venv

# Activate virtual environment
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate     # Windows

# Install dependencies including development tools (pytest, ruff, black, mypy, pre-commit)
pip install -r requirements-dev.txt
# Or install from pyproject.toml with dev extras:
# pip install -e ".[dev]"

# Install the package in editable mode (not needed if you used the pyproject.toml option above)
pip install -e .

3. Create a Branch

# Update main branch
git checkout main
git pull upstream main

# Create feature branch
git checkout -b feature/your-feature-name
# or for bug fixes:
git checkout -b fix/issue-description

Branch Naming Conventions:

feature/ - New features.
fix/ - Bug fixes.
docs/ - Documentation updates.
refactor/ - Code refactoring.
test/ - Adding or updating tests.
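
The prefixes above lend themselves to a quick automated check, for example in a local Git hook. The following is a minimal sketch and not part of RegressionLab: the helper name is_valid_branch_name and the rule that the part after the slash is a lowercase slug (letters, digits, dots, underscores, hyphens) are assumptions made for illustration.

```python
import re

# Prefixes taken from the branch naming list above.
ALLOWED_PREFIXES = ("feature", "fix", "docs", "refactor", "test")

# Assumption: the description after the slash is a non-empty lowercase slug.
_BRANCH_RE = re.compile(
    r"^(?:" + "|".join(ALLOWED_PREFIXES) + r")/[a-z0-9][a-z0-9._-]*$"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if name looks like '<prefix>/<slug>' (hypothetical helper)."""
    return _BRANCH_RE.match(name) is not None


if __name__ == "__main__":
    for candidate in ("feature/your-feature-name", "fix/", "my-branch"):
        print(candidate, is_valid_branch_name(candidate))
```

A stricter or looser slug rule may suit the project better; only the five prefixes come from this guide.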

Development Guidelines

Code Style

RegressionLab follows PEP 8 with some project-specific conventions:

1. Line Length

Maximum 100 characters per line.
For long strings, use implicit concatenation or textwrap.

# Good
message = (
    "This is a very long message that spans "
    "multiple lines for better readability."
)

# Bad
message = "This is a very long message that spans multiple lines for better readability."
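
For the textwrap option mentioned above, the sketch below shows two standard-library helpers: textwrap.dedent keeps a multi-line string indented with the surrounding code, and textwrap.fill wraps a long string for display. The function name build_notice and the message texts are placeholders chosen for illustration.

```python
import textwrap


def build_notice() -> str:
    """Return a multi-line message with the source indentation removed."""
    return textwrap.dedent(
        """\
        The fit did not converge.
        Check the initial parameters and try again.
        """
    )


long_message = (
    "This is a very long message that spans "
    "multiple lines for better readability."
)

# Wrap the text for display so that no output line exceeds 40 characters.
wrapped = textwrap.fill(long_message, width=40)

if __name__ == "__main__":
    print(build_notice())
    print(wrapped)
```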

2. Type Hints

Always include type hints for function signatures:

from typing import Tuple

import numpy as np
from numpy.typing import NDArray

def process_data(
    data: NDArray[np.floating],
    threshold: float = 0.5,
    normalize: bool = True
) -> Tuple[NDArray[np.floating], float]:
    """Process data with optional normalization."""
    ...

3. Docstrings

Use Google-style docstrings:

from typing import Tuple

import numpy as np
import pandas as pd

def fit_curve(data: pd.DataFrame, equation: str) -> Tuple[np.ndarray, float]:
    """
    Fit a curve to the provided data.
    
    This function performs nonlinear least squares fitting using
    scipy.optimize.curve_fit with automatic initial parameter estimation.
    
    Args:
        data: DataFrame containing x, y, and optional uncertainty columns
        equation: Name of the equation to fit (e.g., 'linear_function')
        
    Returns:
        Tuple containing:
            - Fitted parameters as ndarray
            - R-squared value as float
            
    Raises:
        FittingError: If the fitting algorithm fails to converge
        ValueError: If equation name is not recognized
        
    Examples:
        >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 4, 6]})
        >>> params, r2 = fit_curve(data, 'linear_function')
        >>> params
        array([2.])
        >>> r2
        1.0
    """
    ...
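
The Examples section of a Google-style docstring can double as an executable check through the standard-library doctest module. The sketch below uses a small hypothetical function, mean_and_range, which is not part of RegressionLab, so that it runs without any third-party packages.

```python
import doctest


def mean_and_range(values: list[float]) -> tuple[float, float]:
    """Compute the mean and the range of a list of numbers.

    Args:
        values: Non-empty list of numbers.

    Returns:
        Tuple containing:
            - Arithmetic mean as float
            - Difference between the largest and smallest value as float

    Raises:
        ValueError: If values is empty

    Examples:
        >>> mean_and_range([1.0, 2.0, 6.0])
        (3.0, 5.0)
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values), max(values) - min(values)


if __name__ == "__main__":
    # Run every '>>>' example in this module and report any mismatch.
    doctest.testmod()
```

Doctest compares printed output literally, so examples that return NumPy arrays or fitted floats need the exact repr NumPy would print.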

4. Imports

Organize imports in three groups, separated by blank lines:

# Standard library
import os
import sys
from pathlib import Path
from typing import Optional, List

# Third-party packages
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Local imports
from config import AVAILABLE_EQUATION_TYPES
from fitting.fitting_utils import generic_fit
from utils.exceptions import FittingError
from utils.logger import get_logger

5. Naming Conventions

Functions/variables: snake_case
Classes: PascalCase
Constants: UPPER_SNAKE_CASE
Private members: _leading_underscore

# Good
from typing import List

import pandas as pd

MAX_ITERATIONS = 1000

class DataLoader:
    def __init__(self) -> None:
        self._cache = {}
    
    def load_file(self, file_path: str) -> pd.DataFrame:
        ...
    
    def _parse_header(self, line: str) -> List[str]:
        ...

6. Comments

Use comments sparingly; prefer self-documenting code.
Comments should explain why, not what.
Keep comments up to date when the code changes.

# Good - explains why
# Use absolute_sigma=True to treat uncertainties as absolute values,
# not relative weights, which is correct for experimental data
popt, pcov = curve_fit(func, x, y, sigma=uy, absolute_sigma=True)

# Bad - states the obvious
# Call curve_fit function
popt, pcov = curve_fit(func, x, y)

Testing

Writing Tests

Location: Place tests in the tests/ directory
Naming: Test files should match source files: test_<module>.py
Structure: Use pytest conventions

# tests/test_fitting_functions.py
import numpy as np
import pandas as pd
import pytest
from fitting.fitting_functions import func_lineal, ajlineal


class TestFuncLineal:
    """Tests for func_lineal mathematical function."""
    
    def test_scalar_input(self):
        """Test with scalar input."""
        result = func_lineal(5.0, 2.0)
        assert result == 10.0
    
    def test_array_input(self):
        """Test with array input."""
        t = np.array([1, 2, 3])
        result = func_lineal(t, 2.0)
        expected = np.array([2, 4, 6])
        np.testing.assert_array_equal(result, expected)
    
    @pytest.mark.parametrize("t,m,expected", [
        (0, 5, 0),
        (3, 2, 6),
        (-2, 4, -8),
    ])
    def test_various_inputs(self, t, m, expected):
        """Test with various parameter combinations."""
        assert func_lineal(t, m) == expected


class TestAjlineal:
    """Tests for ajlineal fitting function."""
    
    def test_perfect_linear_fit(self):
        """Test fitting with perfect linear data."""
        x = np.linspace(0, 10, 50)
        y = 3.0 * x  # Perfect linear relationship
        
        data = pd.DataFrame({'x': x, 'y': y})
        
        param_text, y_fitted, equation, r_squared = ajlineal(data, 'x', 'y')
        
        # R² should be nearly 1 for perfect fit
        assert r_squared > 0.9999
        
        # Fitted values should match original data
        np.testing.assert_array_almost_equal(y_fitted, y, decimal=10)
    
    def test_noisy_data(self):
        """Test fitting with noisy data."""
        np.random.seed(42)  # Reproducibility
        x = np.linspace(0, 10, 100)
        y = 2.5 * x + np.random.normal(0, 0.5, 100)
        
        data = pd.DataFrame({'x': x, 'y': y})
        
        param_text, y_fitted, equation, r_squared = ajlineal(data, 'x', 'y')
        
        # Should still get good fit despite noise
        assert r_squared > 0.95
    
    def test_with_uncertainties(self):
        """Test fitting with uncertainty columns."""
        x = np.linspace(0, 10, 50)
        y = 3.0 * x
        
        data = pd.DataFrame({
            'x': x,
            'y': y,
            'ux': np.ones_like(x) * 0.1,
            'uy': np.ones_like(y) * 0.2
        })
        
        param_text, y_fitted, equation, r_squared = ajlineal(data, 'x', 'y')
        
        assert r_squared > 0.99
    
    def test_raises_on_invalid_data(self):
        """Test that appropriate errors are raised for invalid data."""
        # Missing 'y' column
        data = pd.DataFrame({'x': [1, 2, 3]})
        with pytest.raises(KeyError):
            ajlineal(data, 'x', 'y')
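
The scalar tests above compare floats with ==, which works for values such as 5.0 * 2.0 that are exactly representable. For general results a tolerance-based comparison is safer. The sketch below uses only the standard library; linear_model is a hypothetical stand-in for a function under test, not RegressionLab's func_lineal. In a pytest suite, pytest.approx serves the same purpose.

```python
import math


def linear_model(t: float, m: float) -> float:
    """Hypothetical stand-in for a function under test: y = m * t."""
    return m * t


def test_exact_for_representable_values() -> None:
    # 5.0 * 2.0 is exactly representable, so == is safe here.
    assert linear_model(5.0, 2.0) == 10.0


def test_tolerance_for_general_values() -> None:
    result = linear_model(3.0, 0.1)
    # Binary floating point makes 3.0 * 0.1 differ slightly from 0.3.
    assert result != 0.3
    assert math.isclose(result, 0.3, rel_tol=1e-9)


if __name__ == "__main__":
    test_exact_for_representable_values()
    test_tolerance_for_general_values()
```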

Running Tests

# Run all tests
pytest tests/
# Or: python tests/run_tests.py
# Or use launcher: bin\run_tests.bat (Windows) / bin/run_tests.sh (Linux/macOS)

# Run specific test file
pytest tests/test_fitting_functions.py

# Run specific test
pytest tests/test_fitting_functions.py::TestFuncLineal::test_scalar_input

# Run with coverage (requires pytest-cov)
pytest tests/ --cov=src --cov-report=html

# Run with verbose output
pytest tests/ -v

# Run tests in parallel (requires pytest-xdist)
pytest tests/ -n auto

Check coverage:

pytest tests/ --cov=src --cov-report=term-missing

Adding New Fitting Functions

See Extending RegressionLab for a detailed guide.

Summary:

Add mathematical and fitting functions in src/fitting/functions/ (e.g. special.py, polynomials.py)
Export the fit function from src/fitting/functions/__init__.py, then register in src/config/equations.yaml (add entry with function, formula, format, param_names)
Add translations to src/locales/ (en.json, es.json, de.json)
Write tests in tests/test_fitting_functions.py
Update documentation
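
As a rough illustration of the first two steps, the sketch below pairs a model function with a registry entry that carries the four fields named above (function, formula, format, param_names). Everything else is an assumption: the names func_exponential_decay, fit_exponential_decay, and missing_fields are hypothetical, the field values are placeholders, and the real schema of equations.yaml may differ. The tests earlier in this guide pass NumPy arrays to func_lineal, so real model functions presumably need to accept arrays; this sketch uses scalars to stay dependency-free.

```python
import math

# Field names taken from the summary above.
REQUIRED_FIELDS = ("function", "formula", "format", "param_names")


def func_exponential_decay(t: float, a: float, k: float) -> float:
    """Hypothetical model function: y = a * exp(-k * t)."""
    return a * math.exp(-k * t)


# Hypothetical registry entry; values are placeholders, not the real schema.
example_entry = {
    "function": "fit_exponential_decay",
    "formula": "y = a * exp(-k * t)",
    "format": "y = {a} * exp(-{k} * t)",
    "param_names": ["a", "k"],
}


def missing_fields(entry: dict) -> list[str]:
    """Return the required fields that are absent from an entry."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


if __name__ == "__main__":
    print(func_exponential_decay(0.0, 2.0, 1.5))
    print(missing_fields(example_entry))
```

The Extending RegressionLab guide remains the reference for the actual function signatures and YAML layout.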

Adding Translations

To add a new language:

Create locale file: src/locales/<language_code>.json

{
  "menu": {
    "normal_fitting": "Your translation",
    "multiple_datasets": "Your translation"
  },
  "dialog": {
    "select_file": "Your translation"
  }
}

Update config: Add the language code to SUPPORTED_LANGUAGE_CODES and LANGUAGE_ALIASES in src/config/constants.py
Test thoroughly: Check all UI elements in both interfaces
Update documentation: Add language to README and docs
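
Before testing the UI by hand, it can help to confirm that a new locale file defines every key the reference locale has. The following is a minimal sketch under stated assumptions: the helpers flatten_keys and missing_translations are hypothetical, the dictionaries stand in for parsed JSON files, and the nested key layout is inferred from the sample above.

```python
from typing import Any


def flatten_keys(tree: dict[str, Any], prefix: str = "") -> set[str]:
    """Collect dotted paths of all leaf keys in a nested dictionary."""
    keys: set[str] = set()
    for name, value in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
        else:
            keys.add(path)
    return keys


def missing_translations(reference: dict[str, Any], candidate: dict[str, Any]) -> list[str]:
    """Return keys present in the reference locale but absent from the candidate."""
    return sorted(flatten_keys(reference) - flatten_keys(candidate))


# Stand-ins for the parsed contents of two locale files.
reference_locale = {
    "menu": {"normal_fitting": "Normal fitting", "multiple_datasets": "Multiple datasets"},
    "dialog": {"select_file": "Select file"},
}
new_locale = {
    "menu": {"normal_fitting": "Your translation", "multiple_datasets": "Your translation"},
}

if __name__ == "__main__":
    print(missing_translations(reference_locale, new_locale))
```

With real files, each dictionary would come from json.load on the corresponding file in src/locales/.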

Documentation

Updating Documentation

Documentation lives in the docs/ directory
Use Markdown format
Follow existing structure and style
Include code examples where appropriate
Add screenshots for UI changes (place in docs/images/)

Building Documentation

# Install documentation dependencies
pip install -r sphinx-docs/requirements.txt

# Build HTML documentation
cd sphinx-docs
./build_docs.sh  # Linux/macOS
build_docs.bat   # Windows

# View documentation
./open_docs.sh   # Linux/macOS
open_docs.bat    # Windows

Development Setup

Recommended Tools

IDE: Visual Studio Code, PyCharm, or your preference
Git Client: Command line, GitKraken, or GitHub Desktop
Python Version: 3.12 or higher
Virtual Environment: Always use virtual environments

VS Code Setup

Recommended extensions:

Python (Microsoft)
Pylance
Python Test Explorer
GitLens
Markdown All in One

Workspace settings (.vscode/settings.json):

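{
    "ruff.enable": true,
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter"
    },
    "python.testing.pytestEnabled": true,
    "editor.rulers": [100],
    "files.trimTrailingWhitespace": true,
    "files.insertFinalNewline": true
}

The project uses Ruff for linting (see requirements-dev.txt and the optional [dev] dependencies in pyproject.toml). Ruff reads its configuration from pyproject.toml. Install the Ruff extension for VS Code. Formatting with Black requires the Black Formatter extension (ms-python.black-formatter); the older python.formatting.provider setting is no longer supported by recent versions of the Python extension.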

Project Structure

Understanding the codebase:

RegressionLab/
├── src/                          # Source code
│   ├── config/                  # Configuration (env, theme, paths, constants, equations.yaml)
│   ├── i18n.py                  # Internationalization
│   ├── main_program.py          # Tkinter entry point
│   ├── fitting/                 # Curve fitting (functions/, fitting_utils, workflow_controller)
│   ├── frontend/                # Tkinter UI (ui_main_menu, image_utils, ui_dialogs/)
│   ├── loaders/                 # Data loaders, CSV/Excel
│   ├── plotting/                # Plot utilities
│   ├── streamlit_app/           # Streamlit web app (app.py, sections/)
│   ├── locales/                 # Translation JSON (en, es, de)
│   └── utils/                   # Exceptions, logger, validators
├── tests/                       # Pytest suite (run_tests.py, conftest.py, test_*.py)
├── docs/                        # User documentation (Markdown)
├── sphinx-docs/                 # Sphinx sources and build scripts
├── input/                       # Sample datasets
├── output/                      # Generated plots
├── bin/                         # Launchers (run, run_streamlit, run_tests)
├── scripts/                     # Helper scripts (clean, generate_test_datasets, generate_multi_var_dataset)
├── install.bat                  # Windows installation script
├── install.sh                   # Linux/macOS installation script
├── setup.bat                    # Windows setup script
├── setup.sh                     # Linux/macOS setup script
├── .env.example                 # Sample environment configuration (dotenv)
├── .gitignore                   # git ignore rules
├── requirements.txt             # Python dependencies (runtime + Streamlit, Pillow)
├── requirements-dev.txt         # Developer dependencies (pytest, ruff, black, mypy, pre-commit)
├── pyproject.toml               # Project metadata, build config, and optional [dev] / [docs] deps
├── README.md                    # Project overview/readme
├── CHANGELOG.md                 # Project changelog
└── LICENSE                      # License file

Communication

Asking Questions

GitHub Discussions: For general questions and ideas
GitHub Issues: For bug reports and feature requests
Email: For private inquiries

Reporting Bugs

Use the GitHub issue template and include:

Title: Clear, specific description
Version: RegressionLab version
Environment: OS, Python version
Steps to reproduce: Exact steps
Expected behavior: What should happen
Actual behavior: What actually happens
Error messages: Full traceback
Sample data: If possible
Screenshots: For UI issues
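
The Environment details can be collected with a few standard-library calls. The following is a small sketch, not a script shipped with RegressionLab; the function name environment_summary is hypothetical.

```python
import platform
import sys


def environment_summary() -> str:
    """Return OS and Python details suitable for pasting into a bug report."""
    return "\n".join(
        [
            f"OS: {platform.platform()}",
            f"Python: {platform.python_version()} ({platform.python_implementation()})",
            f"Executable: {sys.executable}",
        ]
    )


if __name__ == "__main__":
    print(environment_summary())
```

Running it inside the activated virtual environment reports the interpreter that RegressionLab actually uses.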

Suggesting Features

Use the GitHub feature request template:

Problem description: What problem does this solve?
Proposed solution: How should it work?
Alternatives: Other solutions considered
Additional context: Examples, mockups, etc.

Code of Conduct

Our Standards

Be respectful: Treat everyone with respect.
Be constructive: Provide helpful feedback.
Be patient: Everyone is learning.
Be professional: Keep discussions on-topic.

Unacceptable Behavior

Harassment or discriminatory language.
Personal attacks.
Trolling or inflammatory comments.
Publishing private information.
Other unprofessional conduct.

Enforcement

Violations may result in:

Warning.
Temporary ban.
Permanent ban.

Report issues to: alejandro.mata.ali@gmail.com

License

By contributing to RegressionLab, you agree that your contributions will be licensed under the MIT License.

Recognition

Contributors may be recognized in:

A CONTRIBUTORS.md file (if added to the project).
Release notes.
Documentation credits.

Thank you for contributing to RegressionLab! 🎉

Questions about contributing? Open a GitHub Discussion or email alejandro.mata.ali@gmail.com.