Notes 15: Advanced RAG - Advanced Techniques
Table of Contents
- Introduction to Advanced RAG
- Basic RAG vs Advanced RAG
- Query Understanding
- Retrieval Optimization
- Re-ranking и Fusion
- Multi-hop Reasoning
- Adaptive RAG
- Contextual Compression
- Knowledge Graph Integration
- Practical Examples
- Best Practices
Introduction to Advanced RAG
What is Advanced RAG?
Definition: advanced techniques for improving the quality of Retrieval-Augmented Generation (RAG) systems
Problems with basic RAG:
Basic RAG (a minimal code sketch follows the problem list below):
Query → Embed → Search → Retrieve top-k → LLM → Answer
Problems:
❌ May retrieve the wrong documents
❌ May return irrelevant chunks
❌ Does not understand complex queries
❌ Cannot perform multi-step reasoning
❌ Handles rare questions poorly
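A minimal sketch of the baseline pipeline above, assuming hypothetical embedder, vector_index, and llm objects (any embedding model, vector store, and LLM client with these methods would do):
# Baseline RAG sketch: embed the query, fetch top-k chunks, ask the LLM.
# embedder, vector_index, and llm are assumed stand-ins, not a specific library.
def basic_rag(query: str, embedder, vector_index, llm, top_k: int = 5) -> str:
    query_embedding = embedder.embed(query)               # Embed
    chunks = vector_index.search(query_embedding, top_k)  # Search + Retrieve top-k
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm.predict(prompt)                            # LLM → Answer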
The Advanced RAG solution:
Advanced RAG:
Query → Understand intent → Smart retrieval → Re-rank →
Fuse multiple sources → Compress context → LLM → Answer
Improvements:
✅ Understands what the user actually needs
✅ Finds better documents
✅ Can combine multiple sources
✅ Multi-hop reasoning
✅ Handles rare and complex queries
A Brief History of RAG
2020: RAG foundations (Lewis et al.)
- Simple retriever + LLM
- BERT for embeddings
2021-2022: Dense retrieval (DPR, ColBERT)
- Better embeddings
- More accurate search
2023: Advanced RAG techniques
- Query expansion
- Multi-modal retrieval
- Reranking
2024: Hybrid RAG
- Text + Knowledge graphs
- Multi-hop reasoning
- Adaptive retrieval
2025: Reasoning-augmented RAG
- Integration with o1/o3
- Complex reasoning over documents
Basic RAG vs Advanced RAG
Architectural comparison
BASIC RAG:
User Query
↓
Embed Query
↓
Vector Search
↓
Top-K Retrieval
↓
LLM Generation
↓
Answer
Simple and fast, but often wrong
ADVANCED RAG:
User Query
↓
Query Understanding (NER, intent detection)
↓
Query Expansion (paraphrase, decomposition)
↓
Multi-source Retrieval:
├─ Vector search
├─ BM25 (keyword search)
├─ Knowledge graph
└─ Structured data
↓
Re-ranking (cross-encoder)
↓
Fusion (combine results)
↓
Contextual Compression
↓
LLM Generation
↓
Answer Verification
↓
Answer
Improvement metrics
| Metric            | Basic RAG | Advanced RAG | Change          |
|-------------------|-----------|--------------|-----------------|
| Retrieval Success | 70%       | 92%          | +22 pp          |
| Answer Accuracy   | 75%       | 88%          | +13 pp          |
| Latency           | 2 sec     | 5 sec        | +150% (slower)  |
| Hallucination     | 15%       | 5%           | -67%            |
| Context Quality   | 6/10      | 9/10         | +50%            |
Conclusion: quality is higher, but the pipeline is slower
Solution: smart caching and parallel processing (a sketch follows below)
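A hedged sketch of both mitigations, assuming the retriever interface used later in these notes; the cache is a simple in-process dict with a TTL, and query variants are retrieved concurrently with a thread pool:
import time
from concurrent.futures import ThreadPoolExecutor
# Sketch only: in-process TTL cache + parallel retrieval over query variants.
_cache: dict[str, tuple[float, list]] = {}
CACHE_TTL = 3600  # seconds
def cached_parallel_retrieve(queries: list[str], retriever, top_k: int = 5) -> list:
    key = "|".join(sorted(queries))
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL:
        return hit[1]  # repeated question: skip retrieval entirely
    # Retrieve all query variants in parallel to hide per-call latency
    with ThreadPoolExecutor(max_workers=max(len(queries), 1)) as pool:
        result_lists = list(pool.map(lambda q: retriever.retrieve(q, top_k=top_k), queries))
    results = [r for lst in result_lists for r in lst]
    _cache[key] = (time.time(), results)
    return results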
Query Understanding
Intent Detection
Idea: understand what the user is actually looking for
from enum import Enum
class QueryIntent(Enum):
FACTUAL = "factual_qa" # "What is N8N?"
ANALYTICAL = "analysis" # "Compare X vs Y"
INSTRUCTIONAL = "how_to" # "How to use N8N?"
REASONING = "reasoning" # "Why is X better?"
CREATIVE = "creative" # "Generate ideas for..."
def detect_intent(query: str, llm) -> QueryIntent:
"""
Determine what type of query this is
"""
prompt = f"""
Classify this query intent:
Query: {query}
Options:
- FACTUAL: Simple fact lookup
- ANALYTICAL: Comparison/analysis
- INSTRUCTIONAL: How-to/tutorial
- REASONING: Why/explain
- CREATIVE: Ideation/generation
Intent:
"""
response = llm.predict(prompt).strip()
return QueryIntent[response]
# Usage
intent = detect_intent("How does N8N handle workflows?", llm)
# → INSTRUCTIONAL
Named Entity Recognition (NER)
from spacy import load
nlp = load("en_core_web_lg")
def extract_entities(query: str) -> dict:
"""
Extract important entities from query
"""
doc = nlp(query)
        entities = {
            "PERSON": [],
            "ORG": [],
            "PRODUCT": [],
            "GPE": [],   # spaCy's label for countries/cities (it has no "LOCATION" label)
            "DATE": []
        }
for ent in doc.ents:
if ent.label_ in entities:
entities[ent.label_].append(ent.text)
return entities
# Usage
query = "How does OpenAI's GPT-4 work in production?"
entities = extract_entities(query)
# → {"PERSON": [], "ORG": ["OpenAI"], "PRODUCT": ["GPT-4"]}
Query Expansion
Idea: generate several variants of the question to improve retrieval
def expand_query(query: str, llm, num_expansions: int = 3) -> list[str]:
"""
Generate multiple versions of the query
"""
prompt = f"""
Generate {num_expansions} paraphrases of this query:
Original: {query}
Paraphrases (one per line):
"""
response = llm.predict(prompt)
paraphrases = [line.strip() for line in response.split('\n') if line.strip()]
return [query] + paraphrases # Include original
# Usage
original = "How to use N8N for automation?"
expanded = expand_query(original, llm)
# → [
# "How to use N8N for automation?",
# "How can I use N8N to automate workflows?",
# "Guide to using N8N for task automation",
# "What is the process of automating tasks with N8N?"
# ]
# Now search with every variant
results = []
for query_variant in expanded:
    results.extend(vector_search(query_variant, top_k=2))
# More relevant results!
Retrieval Optimization
Hybrid Retrieval (Vector + BM25)
from sklearn.metrics.pairwise import cosine_similarity
from rank_bm25 import BM25Okapi
class HybridRetriever:
def __init__(self, documents, embeddings_model):
self.documents = documents
self.embeddings = embeddings_model
# Vector search
self.document_embeddings = [
embeddings_model.embed(doc) for doc in documents
]
# BM25 search
tokenized_docs = [doc.split() for doc in documents]
self.bm25 = BM25Okapi(tokenized_docs)
def retrieve(self, query: str, top_k: int = 5) -> list[dict]:
"""
Retrieve using both vector and keyword search
"""
# Vector search
query_embedding = self.embeddings.embed(query)
vector_scores = cosine_similarity(
[query_embedding],
self.document_embeddings
)[0]
# BM25 search
bm25_scores = self.bm25.get_scores(query.split())
# Normalize and combine
vector_scores_norm = (vector_scores - vector_scores.min()) / (vector_scores.max() - vector_scores.min())
bm25_scores_norm = (bm25_scores - bm25_scores.min()) / (bm25_scores.max() - bm25_scores.min())
combined_scores = 0.7 * vector_scores_norm + 0.3 * bm25_scores_norm
# Get top-k
top_indices = combined_scores.argsort()[-top_k:][::-1]
return [
{
"document": self.documents[i],
"score": combined_scores[i],
"vector_score": vector_scores_norm[i],
"bm25_score": bm25_scores_norm[i]
}
for i in top_indices
]
# Usage
retriever = HybridRetriever(documents, embeddings_model)
results = retriever.retrieve("How to automate workflows?", top_k=5)
Multi-modal Retrieval
from sentence_transformers import util
class MultimodalRetriever:
def __init__(self, documents, images, embeddings_model):
self.documents = documents
self.images = images
self.embeddings = embeddings_model
# Embed all modalities
self.doc_embeddings = [embeddings_model.embed(doc) for doc in documents]
self.img_embeddings = [embeddings_model.embed(img) for img in images]
def retrieve(self, query: str, top_k: int = 5) -> list[dict]:
"""
Retrieve both text and images
"""
query_embedding = self.embeddings.embed(query)
# Search documents
        doc_scores = util.cos_sim(query_embedding, self.doc_embeddings)[0]
        top_docs = doc_scores.argsort(descending=True)[:top_k]  # descending sort: torch tensors don't support [::-1] slicing
        # Search images
        img_scores = util.cos_sim(query_embedding, self.img_embeddings)[0]
        top_imgs = img_scores.argsort(descending=True)[:top_k]
results = []
for idx in top_docs:
results.append({
"type": "text",
"content": self.documents[idx],
"score": doc_scores[idx]
})
for idx in top_imgs:
results.append({
"type": "image",
"content": self.images[idx],
"score": img_scores[idx]
})
return sorted(results, key=lambda x: x["score"], reverse=True)[:top_k]
Re-ranking and Fusion
Cross-Encoder Re-ranking
from sentence_transformers import CrossEncoder
class RAGReranker:
def __init__(self):
self.model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
def rerank(self, query: str, documents: list[str], top_k: int = 5) -> list[dict]:
"""
Re-rank documents using cross-encoder
"""
# Pair query with each document
pairs = [[query, doc] for doc in documents]
# Get scores
scores = self.model.predict(pairs)
# Sort by score
ranked = sorted(
zip(documents, scores),
key=lambda x: x[1],
reverse=True
)
return [
{
"document": doc,
"score": float(score)
}
for doc, score in ranked[:top_k]
]
# Usage
reranker = RAGReranker()
# First retrieve with fast method (100 docs)
initial_results = vector_search(query, top_k=100)
# Then rerank (slower but more accurate)
final_results = reranker.rerank(query, initial_results, top_k=5)
Result Fusion (RRF - Reciprocal Rank Fusion)
def reciprocal_rank_fusion(result_lists: list[list[dict]], k: int = 60) -> list[dict]:
"""
Combine results from multiple retrievers
"""
fused_scores = {}
for result_list in result_lists:
for rank, result in enumerate(result_list, 1):
            doc_id = result["document"]  # key on the document text so duplicates from different retrievers fuse
# RRF formula: 1 / (k + rank)
rrf_score = 1 / (k + rank)
if doc_id not in fused_scores:
fused_scores[doc_id] = {"document": result["document"], "score": 0}
fused_scores[doc_id]["score"] += rrf_score
# Sort by combined score
return sorted(
fused_scores.values(),
key=lambda x: x["score"],
reverse=True
)
# Usage
vector_results = vector_search(query, top_k=10)
bm25_results = bm25_search(query, top_k=10)
knowledge_graph_results = kg_search(query, top_k=10)
# Combine all
fused = reciprocal_rank_fusion([vector_results, bm25_results, knowledge_graph_results])
Multi-hop Reasoning
Idea: complex questions require several retrieval steps
class MultiHopRAG:
def __init__(self, retriever, llm):
self.retriever = retriever
self.llm = llm
def answer_multi_hop(self, query: str, max_hops: int = 3) -> dict:
"""
Answer complex questions through multiple retrieval steps
"""
context = []
current_query = query
for hop in range(max_hops):
# 1. Retrieve for current query
results = self.retriever.retrieve(current_query, top_k=3)
context.extend([r["document"] for r in results])
# 2. Check if we have enough info
enough_info = self.llm.predict(f"""
Based on this context, can we answer: {query}?
Context: {' '.join(context)}
Answer YES or NO and suggest next search:
""")
if "YES" in enough_info:
break
# 3. Extract next search query
next_query = self.llm.predict(f"""
What should we search next to answer: {query}?
Current context: {' '.join(context)}
Next search query:
""")
current_query = next_query.strip()
# Generate final answer
answer = self.llm.predict(f"""
Answer this question: {query}
Using context: {' '.join(context)}
Answer:
""")
return {
"query": query,
"hops": min(hop + 1, max_hops),
"context": context,
"answer": answer
}
# Illustrative example (names and values are hypothetical):
# Q: "What year was the founder of N8N born?"
# Hop 1: Search "Who founded N8N?" → founder's name
# Hop 2: Search "<founder> birth year" → birth year
# Answer: the retrieved birth year
Adaptive RAG
Idea: choose the retrieval strategy based on the type of question
class AdaptiveRAG:
def __init__(self, retriever, llm, knowledge_graph=None):
self.retriever = retriever
self.llm = llm
self.kg = knowledge_graph
def retrieve_adaptively(self, query: str) -> dict:
"""
Choose retrieval strategy based on query type
"""
# 1. Determine complexity
complexity = self.llm.predict(f"""
Is this query simple, moderate, or complex?
Query: {query}
Answer: (simple/moderate/complex)
""").strip()
# 2. Determine knowledge type needed
needs_facts = "factual" in query.lower() or "what" in query.lower()
needs_reasoning = "why" in query.lower() or "how" in query.lower()
needs_comparison = "vs" in query.lower() or "compare" in query.lower()
# 3. Choose strategy
if complexity == "simple" and needs_facts:
# Fast path: simple keyword search
strategy = "fast_retrieval"
results = self.retriever.retrieve(query, top_k=3, method="bm25")
elif complexity == "moderate":
# Balanced path: hybrid search + reranking
strategy = "hybrid_retrieval"
results = self.retriever.retrieve(query, top_k=10, method="hybrid")
results = rerank(query, results, top_k=5)
elif needs_comparison:
# Comparison path: multiple searches
strategy = "comparison_retrieval"
            entities = extract_entities(query)
            results = []
            # Search for each extracted entity value (iterating the dict directly would yield only label names)
            for entity in [e for values in entities.values() for e in values]:
                results.extend(self.retriever.retrieve(entity, top_k=3))
elif self.kg and needs_reasoning:
# Knowledge graph path: structured reasoning
strategy = "kg_retrieval"
results = self.kg.retrieve(query)
else:
# Complex path: multi-hop
strategy = "multi_hop_retrieval"
results = multi_hop_retrieval(query)
return {
"strategy": strategy,
"complexity": complexity,
"results": results
}
# Usage
rag = AdaptiveRAG(retriever, llm, knowledge_graph)
result = rag.retrieve_adaptively("Compare N8N and Zapier")
# → Automatically chooses comparison_retrieval strategy
Contextual Compression
Idea: compress the context so the LLM sees only what is relevant
class ContextCompressor:
def __init__(self, llm):
self.llm = llm
def compress(self, query: str, documents: list[str]) -> str:
"""
Extract only relevant parts from documents
"""
combined = "\n\n".join([f"[Doc {i+1}]\n{doc}" for i, doc in enumerate(documents)])
prompt = f"""
Extract only the parts relevant to: "{query}"
Documents:
{combined}
Compressed context (only relevant parts):
"""
return self.llm.predict(prompt)
def summarize(self, documents: list[str]) -> str:
"""
Summarize documents while keeping relevance
"""
combined = "\n\n".join(documents)
prompt = f"""
Summarize these documents in 3-5 sentences:
{combined}
Summary:
"""
return self.llm.predict(prompt)
# Usage
compressor = ContextCompressor(llm)
# Instead of passing full documents
full_context = "\n".join(documents) # Can be 10K tokens!
# Compress first
compressed = compressor.compress(query, documents) # Now 2K tokens
response = llm.predict(f"""
Answer: {query}
Context: {compressed}
""")
# Token savings: roughly 80% (see the measurement sketch below)
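The 80% figure is illustrative; a quick way to check the actual savings on your own data is to count tokens before and after compression, here with tiktoken (assuming an OpenAI-style cl100k_base tokenizer):
import tiktoken
def token_savings(full_context: str, compressed: str) -> float:
    """Return the fraction of tokens saved by compression (0.0 to 1.0)."""
    enc = tiktoken.get_encoding("cl100k_base")
    before = len(enc.encode(full_context))
    after = len(enc.encode(compressed))
    return 1 - after / before if before else 0.0
# Example: 0.8 means the compressed context is 80% smaller
print(f"Token savings: {token_savings(full_context, compressed):.0%}")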
Knowledge Graph Integration
class KnowledgeGraphRAG:
def __init__(self, kg, embeddings_model):
self.kg = kg # Neo4j, etc.
self.embeddings = embeddings_model
def retrieve_with_kg(self, query: str) -> dict:
"""
Combine semantic search with structured knowledge
"""
# 1. Extract entities from query
entities = extract_entities(query)
# 2. Find related entities in KG
related_entities = []
for entity in entities["PRODUCT"]:
related = self.kg.query(f"""
MATCH (n {{name: '{entity}'}})-[r]->(m)
RETURN m.name, r.type
LIMIT 10
""")
related_entities.extend(related)
# 3. Retrieve documents for main + related entities
documents = []
for entity in entities["PRODUCT"] + [e[0] for e in related_entities]:
docs = vector_search(entity, top_k=3)
documents.extend(docs)
# 4. Build knowledge context
kg_context = f"""
        Key entities: {', '.join(entities['PRODUCT'])}
Related concepts:
"""
for entity, rel_type in related_entities[:5]:
kg_context += f"\n- {entity} ({rel_type})"
return {
"entities": entities,
"documents": documents,
"kg_context": kg_context
}
# Usage
kg_rag = KnowledgeGraphRAG(neo4j_connection, embeddings)
result = kg_rag.retrieve_with_kg("How does N8N integrate with Zapier?")
# → Finds both direct documents AND related services from KG
Practical Examples
Example 1: Production RAG Pipeline
import hashlib
import json
class ProductionRAG:
def __init__(self, config: dict):
self.retriever = HybridRetriever(...)
self.reranker = RAGReranker()
self.compressor = ContextCompressor(...)
self.llm = OpenAI(...)
self.cache = Redis()
def query(self, user_query: str) -> dict:
"""Full RAG pipeline"""
# 1. Check cache
cache_key = hashlib.md5(user_query.encode()).hexdigest()
cached = self.cache.get(cache_key)
if cached:
return json.loads(cached)
# 2. Understand query
intent = detect_intent(user_query, self.llm)
expanded_queries = expand_query(user_query, self.llm)
# 3. Retrieve (parallel)
all_results = []
for q in expanded_queries:
results = self.retriever.retrieve(q, top_k=20)
all_results.extend(results)
# 4. Re-rank
reranked = self.reranker.rerank(user_query, all_results, top_k=10)
# 5. Compress context
documents = [r["document"] for r in reranked]
compressed_context = self.compressor.compress(user_query, documents)
# 6. Generate answer
prompt = f"""
Question: {user_query}
Context:
{compressed_context}
Answer:
"""
answer = self.llm.predict(prompt)
# 7. Verify answer
verification = self.llm.predict(f"""
Is this answer supported by the context?
Answer: {answer}
Context: {compressed_context}
YES or NO:
""")
result = {
"query": user_query,
"intent": intent.value,
"answer": answer,
"verified": "YES" in verification,
"source_count": len(reranked),
"sources": documents[:3]
}
# 8. Cache result
self.cache.setex(cache_key, 3600, json.dumps(result))
return result
Example 2: Multi-hop RAG System
import json  # used in the final synthesis step
class InvestigativeRAG:
"""For complex research questions"""
def __init__(self, retriever, llm):
self.retriever = retriever
self.llm = llm
self.research_path = []
def research(self, research_question: str, max_depth: int = 5) -> dict:
"""
Conduct multi-step research
"""
findings = {
"question": research_question,
"steps": [],
"final_answer": "",
"evidence": []
}
current_questions = [research_question]
for depth in range(max_depth):
step_findings = []
next_questions = []
for q in current_questions:
# Retrieve
results = self.retriever.retrieve(q, top_k=5)
# Ask follow-up questions
followup = self.llm.predict(f"""
To answer "{q}", what else should we know?
Current findings: {results}
Next questions (3 max, one per line):
""")
step_findings.append({
"question": q,
"results": results,
"followup": followup
})
next_questions.extend(followup.split('\n')[:3])
findings["steps"].append({
"depth": depth + 1,
"findings": step_findings
})
if not next_questions:
break
current_questions = next_questions
# Final synthesis
all_evidence = []
for step in findings["steps"]:
for finding in step["findings"]:
all_evidence.extend(finding["results"])
findings["final_answer"] = self.llm.predict(f"""
Based on this research:
{json.dumps(findings, indent=2)}
Answer the original question: {research_question}
Final answer:
""")
findings["evidence"] = all_evidence
return findings
Best Practices
Do's ✅
- Use hybrid retrieval
  Vector + BM25 beats either one alone
- Always re-rank results
  A cross-encoder gives better quality
- Compress the context before the LLM
  Fewer tokens = cheaper and faster
- Cache results
  The same questions repeat frequently
- Monitor retrieval quality
  Track search success rate (see the sketch after this list)
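A minimal sketch of such monitoring, assuming a small labeled evaluation set of queries with known relevant document IDs, and a retriever whose results carry a hypothetical "doc_id" field:
# Sketch: offline retrieval-quality check over a labeled evaluation set.
def evaluate_retrieval(retriever, eval_set: list[dict], top_k: int = 5) -> dict:
    hits, recalls = 0, []
    for item in eval_set:
        results = retriever.retrieve(item["query"], top_k=top_k)
        retrieved_ids = {r["doc_id"] for r in results}
        relevant_ids = set(item["relevant_doc_ids"])
        found = retrieved_ids & relevant_ids
        hits += bool(found)                                     # hit rate: any relevant doc in top-k
        recalls.append(len(found) / max(len(relevant_ids), 1))  # recall@k for this query
    n = len(eval_set)
    return {"hit_rate": hits / n, "recall_at_k": sum(recalls) / n}
# A drop in these numbers over time usually means stale documents or an index/embedding problem.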
Don'ts ❌
- Don't pass the entire context to the LLM
  Compress it before sending
- Don't ignore unusual query types
  Use adaptive strategies
- Don't rely on vector search alone
  A hybrid approach works better
- Don't forget about latency
  Advanced RAG is slower - use caching and parallel processing
Advanced RAG Checklist
Retrieval:
☑️ Hybrid retrieval is configured
☑️ Query expansion is implemented
☑️ Multi-source retrieval works
Processing:
☑️ Re-ranking is applied
☑️ Contextual compression works
☑️ Result fusion is configured
Quality:
☑️ Retrieval quality is monitored
☑️ Answer verification is enabled
☑️ Hallucination detection works
Performance:
☑️ Caching is implemented
☑️ Parallel processing is applied
☑️ Latency is optimized
Maintenance:
☑️ Documents are updated regularly
☑️ Metrics are tracked
☑️ A/B testing is conducted (see the sketch below)
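A minimal sketch of A/B routing between two pipeline variants, assuming both expose the same query() interface as the ProductionRAG example above and that users are bucketed deterministically by ID:
import hashlib
# Sketch: deterministic A/B split between two RAG pipeline variants.
def ab_route(user_id: str, query: str, rag_a, rag_b, split: float = 0.5) -> dict:
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    variant, pipeline = ("A", rag_a) if bucket < split * 100 else ("B", rag_b)
    result = pipeline.query(query)
    result["ab_variant"] = variant  # log the variant alongside quality metrics
    return result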
Created: December 2025
Version: 1.0
Author: Pavel
Applications: Information Retrieval, Knowledge Integration, Production Search Systems