RAGLib Plugin System¶

RAGLib supports a powerful plugin architecture that allows you to extend the framework with custom techniques, adapters, and other components. This guide explains how to create, distribute, and use plugins with RAGLib.

Overview¶

The plugin system enables:

Technique plugins: Custom RAG techniques and processing components
Adapter plugins: New data sources and storage backends
Architecture plugins: Novel RAG architectures and patterns
Automatic discovery: Plugins are automatically found and registered
Entry point support: Standard Python packaging with setuptools
Local development: Load plugins from local directories

Quick Start¶

Creating Your First Plugin¶

Create a new Python package with your custom technique:

# my_plugin/techniques.py
from raglib.techniques.base import RAGTechnique
from raglib.schemas import Document, RagResult

class CustomRetriever(RAGTechnique):
    """Example custom retrieval technique."""

    def __init__(self, similarity_threshold=0.7):
        super().__init__()
        self.similarity_threshold = similarity_threshold

    def process(self, query: str, docs: list[Document]) -> RagResult:
        # Your custom retrieval logic
        filtered_docs = []
        for doc in docs:
            if self._compute_similarity(query, doc.text) > self.similarity_threshold:
                filtered_docs.append(doc)

        return RagResult(
            documents=filtered_docs,
            metadata={"threshold": self.similarity_threshold}
        )

    def _compute_similarity(self, query: str, text: str) -> float:
        # Simple word overlap example
        query_words = set(query.lower().split())
        text_words = set(text.lower().split())

        if not query_words:
            return 0.0

        overlap = len(query_words.intersection(text_words))
        return overlap / len(query_words)

Plugin Package Structure¶

my_raglib_plugin/
├── setup.py                 # Package configuration with entry points
├── my_raglib_plugin/
│   ├── __init__.py          # Plugin initialization
│   ├── techniques.py        # Custom techniques
│   ├── adapters.py          # Custom adapters (optional)
│   └── utils.py            # Utility functions (optional)
└── tests/
    └── test_techniques.py   # Plugin tests

setup.py Configuration¶

Configure entry points for automatic discovery:

# setup.py
from setuptools import setup, find_packages

setup(
    name="my-raglib-plugin",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "raglib>=0.1.0",
    ],
    entry_points={
        # Technique plugins
        "raglib.techniques": [
            "custom_retriever = my_raglib_plugin.techniques:CustomRetriever",
            "custom_ranker = my_raglib_plugin.techniques:CustomRanker",
        ],
        # Adapter plugins (optional)
        "raglib.adapters": [
            "custom_vectorstore = my_raglib_plugin.adapters:CustomVectorStore",
        ],
        # Architecture plugins (optional)
        "raglib.architectures": [
            "custom_rag = my_raglib_plugin.architectures:CustomRAGArchitecture",
        ],
    },
    author="Your Name",
    author_email="your.email@example.com",
    description="Custom RAGLib plugin with specialized techniques",
    python_requires=">=3.8",
)

Installing and Using Plugins¶

# Install your plugin
pip install my-raglib-plugin

# Or install in development mode
pip install -e /path/to/my-raglib-plugin

# Use your plugin techniques
from raglib.plugins import PluginLoader
from raglib.registry import TechniqueRegistry

# Discover and register plugins
loader = PluginLoader()
loader.discover()

# Your custom technique is now available
technique = TechniqueRegistry.get("custom_retriever")
result = technique.process("sample query", documents)

Plugin Development Guide¶

Technique Plugins¶

Custom techniques should inherit from the appropriate base class:

from raglib.techniques.base import RAGTechnique, Chunker, Retriever, Generator

class MyCustomChunker(Chunker):
    """Custom document chunking strategy."""

    def process(self, text: str) -> list[Document]:
        # Your chunking logic
        chunks = self._smart_chunk(text)

        return [
            Document(id=f"chunk_{i}", text=chunk)
            for i, chunk in enumerate(chunks)
        ]

    def _smart_chunk(self, text: str) -> list[str]:
        # Custom chunking algorithm
        return text.split('\n\n')  # Simple example

class MyCustomGenerator(Generator):
    """Custom response generation."""

    def __init__(self, model_name="gpt-3.5-turbo"):
        super().__init__()
        self.model_name = model_name

    def process(self, query: str, docs: list[Document]) -> RagResult:
        # Your generation logic
        context = "\n".join(doc.text for doc in docs)
        response = self._generate_response(query, context)

        return RagResult(
            response=response,
            documents=docs,
            metadata={"model": self.model_name}
        )

    def _generate_response(self, query: str, context: str) -> str:
        # Your LLM integration
        return f"Generated response for: {query}"

Adapter Plugins¶

Create custom data adapters for different sources:

from raglib.adapters.base import VectorStore, Embedder

class RedisVectorStore(VectorStore):
    """Redis-based vector storage."""

    def __init__(self, host="localhost", port=6379):
        import redis
        self.client = redis.Redis(host=host, port=port)

    def store_embeddings(self, docs: list[Document], embeddings: list[list[float]]):
        for doc, embedding in zip(docs, embeddings):
            self.client.hset(
                f"doc:{doc.id}",
                mapping={
                    "text": doc.text,
                    "embedding": json.dumps(embedding),
                    "metadata": json.dumps(doc.metadata or {})
                }
            )

    def similarity_search(self, query_embedding: list[float], top_k: int = 5) -> list[Document]:
        # Your similarity search implementation
        # This would typically use Redis search capabilities
        pass

class HuggingFaceEmbedder(Embedder):
    """HuggingFace transformer embeddings."""

    def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        embeddings = self.model.encode(texts)
        return embeddings.tolist()

    def embed_query(self, query: str) -> list[float]:
        return self.model.encode([query])[0].tolist()

Architecture Plugins¶

Implement complete RAG architectures:

from raglib.architectures.base import RAGArchitecture

class MultiStepRAG(RAGArchitecture):
    """Multi-step reasoning RAG architecture."""

    def __init__(self, retriever, reranker, generator):
        self.retriever = retriever
        self.reranker = reranker  
        self.generator = generator

    def process(self, query: str, docs: list[Document]) -> RagResult:
        # Step 1: Initial retrieval
        retrieval_result = self.retriever.process(query, docs)

        # Step 2: Re-ranking
        rerank_result = self.reranker.process(query, retrieval_result.documents)

        # Step 3: Multi-step generation
        steps = self._decompose_query(query)
        final_result = None

        for step in steps:
            step_result = self.generator.process(step, rerank_result.documents)
            # Accumulate results...
            final_result = step_result

        return final_result

    def _decompose_query(self, query: str) -> list[str]:
        # Query decomposition logic
        return [query]  # Simplified

Plugin Distribution¶

Publishing to PyPI¶

# Build distribution
python setup.py sdist bdist_wheel

# Upload to PyPI
twine upload dist/*

Version Compatibility¶

Specify RAGLib version requirements:

# setup.py
install_requires=[
    "raglib>=0.1.0,<0.2.0",  # Compatible with 0.1.x
]

Plugin Metadata¶

Add rich metadata for discovery:

setup(
    name="my-raglib-plugin",
    # ... other setup args ...
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Topic :: Scientific/Engineering :: Artificial Intelligence",
    ],
    keywords="rag, retrieval, augmented, generation, nlp",
    project_urls={
        "Homepage": "https://github.com/yourusername/my-raglib-plugin",
        "Documentation": "https://my-raglib-plugin.readthedocs.io/",
        "Bug Reports": "https://github.com/yourusername/my-raglib-plugin/issues",
    },
)

Local Development¶

Development Installation¶

# Install RAGLib in development mode
pip install -e /path/to/raglib

# Install your plugin in development mode
pip install -e /path/to/my-plugin

Local Plugin Discovery¶

Load plugins from local directories without installation:

from raglib.plugins import PluginLoader

# Load plugins from specific directories
loader = PluginLoader()
loader.discover(local_dirs=[
    "/path/to/my-plugin",
    "/path/to/another-plugin"
])

# Plugins are now available in registry
print(TechniqueRegistry.list())

Testing Plugins¶

Create comprehensive tests for your plugins:

# tests/test_techniques.py
import pytest
from raglib.schemas import Document
from my_raglib_plugin.techniques import CustomRetriever

def test_custom_retriever():
    """Test custom retriever functionality."""
    retriever = CustomRetriever(similarity_threshold=0.5)

    docs = [
        Document(id="1", text="machine learning algorithms"),
        Document(id="2", text="deep learning networks"), 
        Document(id="3", text="cooking recipes")
    ]

    result = retriever.process("machine learning", docs)

    # Should retrieve relevant documents
    assert len(result.documents) >= 1
    assert result.documents[0].id == "1"
    assert result.metadata["threshold"] == 0.5

def test_custom_retriever_no_matches():
    """Test retriever with no matching documents."""
    retriever = CustomRetriever(similarity_threshold=0.9)

    docs = [Document(id="1", text="completely unrelated content")]
    result = retriever.process("machine learning", docs)

    assert len(result.documents) == 0

Plugin Discovery API¶

PluginLoader Class¶

from raglib.plugins import PluginLoader

# Initialize loader
loader = PluginLoader()

# Discover all plugins
discovered = loader.discover(
    local_dirs=[],           # Optional local directories
    force_reload=False       # Force re-discovery of entry points
)

print(f"Discovered {discovered['techniques']} techniques")
print(f"Discovered {discovered['adapters']} adapters") 
print(f"Discovered {discovered['architectures']} architectures")

Manual Plugin Registration¶

Register plugins manually if needed:

from raglib.registry import TechniqueRegistry
from my_plugin.techniques import CustomTechnique

# Register directly
TechniqueRegistry.register("my_custom", CustomTechnique)

# Use immediately
technique = TechniqueRegistry.get("my_custom")

Plugin Information¶

Get information about loaded plugins:

# List all available techniques
techniques = TechniqueRegistry.list()
print("Available techniques:", techniques)

# Get technique class
technique_class = TechniqueRegistry.get_class("custom_retriever")
print(f"Class: {technique_class.__name__}")
print(f"Module: {technique_class.__module__}")

Advanced Plugin Features¶

Configuration Support¶

Add configuration support to your plugins:

class ConfigurableRetriever(RAGTechnique):
    """Retriever with external configuration support."""

    @classmethod
    def from_config(cls, config: dict):
        return cls(
            threshold=config.get("threshold", 0.7),
            max_results=config.get("max_results", 10)
        )

    def __init__(self, threshold=0.7, max_results=10):
        super().__init__()
        self.threshold = threshold
        self.max_results = max_results

Plugin Dependencies¶

Handle optional dependencies gracefully:

class OptionalDependencyTechnique(RAGTechnique):
    """Technique with optional external dependencies."""

    def __init__(self):
        super().__init__()
        self._check_dependencies()

    def _check_dependencies(self):
        try:
            import optional_library
            self.has_optional = True
        except ImportError:
            self.has_optional = False
            print("Warning: optional_library not found, using fallback")

    def process(self, query: str, docs: list[Document]) -> RagResult:
        if self.has_optional:
            return self._advanced_process(query, docs)
        else:
            return self._fallback_process(query, docs)

Lifecycle Hooks¶

Implement plugin lifecycle methods:

class LifecycleAwareTechnique(RAGTechnique):
    """Technique with initialization and cleanup hooks."""

    def initialize(self):
        """Called after plugin discovery."""
        self.setup_resources()

    def cleanup(self):
        """Called on shutdown.""" 
        self.release_resources()

    def setup_resources(self):
        # Initialize expensive resources
        pass

    def release_resources(self):
        # Clean up resources
        pass

Best Practices¶

Plugin Design¶

Single responsibility: Each plugin should have a focused purpose
Clear interfaces: Follow RAGLib base class contracts
Error handling: Handle errors gracefully with informative messages
Documentation: Include docstrings and usage examples
Testing: Comprehensive test coverage for all functionality

Performance¶

Lazy loading: Don't load expensive resources until needed
Caching: Cache expensive computations when appropriate
Memory management: Clean up resources properly
Batch processing: Support batch operations when possible

Compatibility¶

Version pinning: Specify compatible RAGLib versions
Python versions: Test with multiple Python versions
Dependencies: Minimize required dependencies
Graceful degradation: Handle missing optional dependencies

Distribution¶

Semantic versioning: Use proper version numbering
Change logs: Document changes between versions
License: Include appropriate license
Metadata: Rich package metadata for discoverability

Troubleshooting¶

Common Issues¶

Plugin not discovered:

# Check entry points
import pkg_resources
for entry_point in pkg_resources.iter_entry_points("raglib.techniques"):
    print(entry_point.name, entry_point.module_name)

Import errors:

# Verify plugin installation
import my_raglib_plugin
print(my_raglib_plugin.__file__)

Registration conflicts:

# Check for name collisions
existing = TechniqueRegistry.list()
print("Existing techniques:", existing)

Local plugin not found:

# Verify directory structure
import os
plugin_dir = "/path/to/plugin" 
print("Files:", os.listdir(plugin_dir))
print("Python files:", [f for f in os.listdir(plugin_dir) if f.endswith('.py')])

Debug Mode¶

Enable verbose plugin discovery:

loader = PluginLoader(verbose=True)
discovered = loader.discover()
# Prints detailed discovery information

For comprehensive examples, see examples/plugin_example/ and tests/test_plugins.py.