Hybrid Search with Qdrant & n8n, Legal AI: Retrieval

Name: Hybrid Search with Qdrant & n8n, Legal AI: Retrieval
Rating: 4.5 (10 reviews)
Author: Jenny

Advanced

This automation contains 17 nodes. It mainly uses Set, Merge, Filter, SplitOut, and Qdrant nodes to run hybrid search with Qdrant and n8n (Legal AI: Retrieval).
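
For readers new to the fusion step this workflow relies on, the following is a minimal sketch of Reciprocal Rank Fusion (RRF), the method the workflow's notes name for merging the keyword and semantic result lists. The function name, the sample chunk IDs, and the constant `k=60` (a value commonly used in the literature) are assumptions for illustration; Qdrant's internal constant and tie-breaking may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists into one list, best first.

    Each ID receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed, conventional value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings, standing in for the BM25 and dense-vector results.
keyword_hits = ["c3", "c1", "c7"]
semantic_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Chunks that rank well in both lists rise to the top, which is why the workflow can take only the first fused result for its evaluation.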

Prerequisites

• Qdrant server connection details
• Authentication credentials may be required for the target API

Nodes used (17)

Export workflow

Copy the following JSON configuration into n8n to import and use this workflow

{
  "id": "h81ddl7uooV3eLBq",
  "meta": {
    "instanceId": "d975180a7308eb9e1d0eb6c8833136580b02ced551ba46ad477d3b76dff98527",
    "templateCredsSetupCompleted": true
  },
  "name": "Hybrid Search with Qdrant & n8n, Legal AI: Retrieval",
  "tags": [],
  "nodes": [
    {
      "id": "eb8d4dd7-f40b-4524-a9de-f9ef9eef0eca",
      "name": "Indexar Dataset desde HuggingFace",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -256,
        400
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "78030f66-5331-463f-ad22-9d09f477e3f9",
      "name": "Dividir Todos los Elementos",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        176,
        400
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "splits"
      },
      "typeVersion": 1
    },
    {
      "id": "e6d1e789-1293-480b-a163-992b0c7a2ae8",
      "name": "Obtener Divisiones del Dataset",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -32,
        400
      ],
      "parameters": {
        "url": "https://datasets-server.huggingface.co/splits",
        "options": {},
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "dataset",
              "value": "={{ $json.dataset }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "cb58241e-4579-4c5b-bd65-4a20f6cf3698",
      "name": "Dividir por Fila",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        816,
        400
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "rows"
      },
      "typeVersion": 1
    },
    {
      "id": "2ee71c28-71fd-4cba-87c7-c9886fb403c7",
      "name": "Conservar División de Prueba",
      "type": "n8n-nodes-base.filter",
      "position": [
        384,
        400
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "52e3d8e2-825f-4e43-9d5f-e275d196b442",
              "operator": {
                "name": "filter.operator.equals",
                "type": "string",
                "operation": "equals"
              },
              "leftValue": "={{ $json.split }}",
              "rightValue": "test"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "484a82fe-93d8-439b-bbb5-e96a4b5d7861",
      "name": "Obtener Consultas de Prueba",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        592,
        400
      ],
      "parameters": {
        "url": "=https://datasets-server.huggingface.co/rows",
        "options": {},
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "dataset",
              "value": "={{ $json.dataset }}"
            },
            {
              "name": "config",
              "value": "={{ $json.config }}"
            },
            {
              "name": "split",
              "value": "={{ $json.split }}"
            },
            {
              "name": "length",
              "value": "=100"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "20f67ae7-6631-4602-aa20-42a382db12ae",
      "name": "Consultar Puntos",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        2144,
        416
      ],
      "parameters": {
        "limit": 1,
        "query": "{\n  \"fusion\": \"rrf\"\n}",
        "prefetch": "=[\n  {\n    \"query\": {\n      \"text\": \"{{ $json.question }}\",\n      \"model\": \"mixedbread-ai/mxbai-embed-large-v1\"\n    },\n    \"using\": \"mxbai_large\",\n    \"limit\": 25\n  },\n  {\n    \"query\": {\n      \"text\": \"{{ $json.question }}\",\n      \"model\": \"qdrant/bm25\"\n    },\n    \"using\": \"bm25\",\n    \"limit\": 25\n  }\n]",
        "resource": "search",
        "operation": "queryPoints",
        "collectionName": {
          "__rl": true,
          "mode": "list",
          "value": "legalQA_test",
          "cachedResultName": "legalQA_test"
        },
        "requestOptions": {}
      },
      "credentials": {
        "qdrantApi": {
          "id": "LVjhdCt8pAJjLyt5",
          "name": "Qdrant account 2"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "445ace25-f900-4bcf-9f7d-9bc1db662867",
      "name": "Combinar",
      "type": "n8n-nodes-base.merge",
      "position": [
        2320,
        608
      ],
      "parameters": {
        "mode": "combine",
        "options": {},
        "combineBy": "combineAll"
      },
      "typeVersion": 3.2
    },
    {
      "id": "c631ce99-a672-499f-bbf3-e740ef431884",
      "name": "Iterar sobre Elementos",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        1776,
        400
      ],
      "parameters": {
        "options": {
          "reset": false
        }
      },
      "typeVersion": 3
    },
    {
      "id": "c075a745-ed50-459f-87dd-101a559e4523",
      "name": "Conservar Preguntas con Respuestas en el Dataset",
      "type": "n8n-nodes-base.filter",
      "position": [
        1056,
        400
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "d1120153-1852-42c0-8b0a-084e8c3190d3",
              "operator": {
                "type": "number",
                "operation": "gt"
              },
              "leftValue": "={{ $json.row.answers.length }}",
              "rightValue": 0
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "94f78e9b-f9eb-4179-9844-d6e23bc79751",
      "name": "Conservar Preguntas e IDs",
      "type": "n8n-nodes-base.set",
      "position": [
        1280,
        400
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "961c95d9-c803-404b-b4b6-cb66a8a33928",
              "name": "id_qa",
              "type": "string",
              "value": "={{ $json.row.id }}"
            },
            {
              "id": "0fefba06-4567-479c-9eb5-efbb3e13e743",
              "name": "question",
              "type": "string",
              "value": "={{ $json.row.question }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "08f67e31-ba8f-47fd-bc78-f352a160d4fd",
      "name": "Agregar Evaluaciones",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        2032,
        224
      ],
      "parameters": {
        "options": {},
        "aggregate": "aggregateAllItemData",
        "destinationFieldName": "eval"
      },
      "typeVersion": 1
    },
    {
      "id": "413ade44-d27d-4c49-862f-afb9d4e18bf6",
      "name": "Porcentaje de isHits en Evaluaciones",
      "type": "n8n-nodes-base.set",
      "position": [
        2256,
        224
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "5bca1a50-3e41-4f50-8362-cb7b185b50f6",
              "name": "Hits percentage",
              "type": "number",
              "value": "={{ ($json.eval.filter(item => item.isHit).length * 100) / $json.eval.length}}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "c0840c22-8954-4937-80d9-f32741b81e1e",
      "name": "Nota Adhesiva 2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -96,
        144
      ],
      "parameters": {
        "color": 5,
        "width": 1520,
        "height": 464,
        "content": "## Get Questions to Eval Retrieval from Hugging Face Dataset (Already Indexed to Qdrant)\n\nFetching questions from a sample Q&A dataset on Hugging Face using the [Dataset Viewer API](https://huggingface.co/docs/dataset-viewer/quick_start).  \n**Dataset:** [LegalQAEval (isaacus)](https://huggingface.co/datasets/isaacus/LegalQAEval)\n\n1. **Retrieve dataset splits**.  \n2. **Get a small subsample of questions from the `test` split**.  \n   To fetch the full split, apply [pagination in HTTP node](https://docs.n8n.io/code/cookbook/http-node/pagination/#enable-pagination), as shown in Part 1.  \n3. **Keep only questions that have a paired text chunk answering them**, so evaluation remains fair.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "742e68ae-0013-4dda-a818-4485ff80a986",
      "name": "Nota Adhesiva 4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1696,
        -256
      ],
      "parameters": {
        "color": 5,
        "width": 1088,
        "height": 1120,
        "content": "## Check Quality of Simple Hybrid Search on Legal Q&A Dataset\nFor each question in the evaluation set, using the Qdrant collection created and indexed in Part 1:\n1. **Perform a Hybrid Search in Qdrant**  \n   - Get 25 results with [**BM25-based keyword retrieval**](https://en.wikipedia.org/wiki/Okapi_BM25) (exact word matches).  \n     - Sparse representations for BM25 are created automatically by Qdrant.  \n   - Get 25 results with [**mxbai-embed-large-v1**](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) semantic search (meaning-based matches).  \n     - Here we use [**Qdrant Cloud Inference**](https://qdrant.tech/documentation/cloud/inference/), so conversion of questions to vectors and searching are handled by the Qdrant node.  \n     - To use an external provider (e.g. OpenAI), see Part 1 for an example on how to adapt this template.  \n   - Fuse both result lists with **Reciprocal Rank Fusion (RRF)**.  \n   - Select the **top-1 result**.  \n2. **Check the top-1 result**  \n   - Verify if the text chunk contains the correct answer. This is done by checking if the question ID is present in the list of question IDs related to the text chunk (created in Part 1).  \n3. **Aggregate results**  \n   - Calculate the **hits@1**: percentage of evaluation questions where the top-1 retrieved chunk contained the answer.  \n\n- If results are good → you can reuse the **Qdrant Query Points** node as a tool for an **agentic legal AI RAG** system.  \n- If results are poor → don’t worry. This is the *simplest* hybrid query setup. You can improve quality with [various tooling for hybrid search in Qdrant](https://qdrant.tech/documentation/concepts/hybrid-queries/):  \n  - Reranking  \n  - Score boosting  \n  - Tuning vector index parameters  \n  - …  \n\n\nExperiment! 🙂\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "d4a32298-02ef-4f9e-b22d-3f30e9b74eb2",
      "name": "Nota Adhesiva 1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1344,
        -128
      ],
      "parameters": {
        "width": 1008,
        "height": 960,
        "content": "## Evaluate Hybrid Search on Legal Dataset\n*This is the second part of **\"Hybrid Search with Qdrant & n8n, Legal AI.\"**\nThe first part, **\"Indexing,\"** covers preparing and uploading the dataset to Qdrant.*\n\n### Overview\nThis pipeline demonstrates how to perform **Hybrid Search** on a [Qdrant collection](https://qdrant.tech/documentation/concepts/collections/#collections) using `question`s and `text` chunks (containing answers) from the  \n[LegalQAEval dataset (isaacus)](https://huggingface.co/datasets/isaacus/LegalQAEval).\n\nOn a small subset of questions, it shows:  \n- How to set up hybrid retrieval in Qdrant with:  \n  - [BM25](https://en.wikipedia.org/wiki/Okapi_BM25)-based keyword retrieval;\n  - [mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) semantic retrieval;  \n  - **Reciprocal Rank Fusion (RRF)**, a simple zero-shot fusion of the two searches;\n- How to run a basic evaluation:  \n  - Calculate **hits@1**: the percentage of evaluation questions where the top-1 retrieved text chunk contains the correct answer  \n\n\nAfter running this pipeline, you will have a quality estimate of a simple hybrid retrieval setup.  \nFrom there, you can reuse Qdrant’s **Query Points** node to build a **legal RAG chatbot**.  \n\n### Embedding Inference\n- By default, this pipeline uses [**Qdrant Cloud Inference**](https://qdrant.tech/documentation/cloud/inference/) to convert questions to embeddings.  \n- You can also use an **external embedding provider** (e.g. OpenAI).  \n  - In that case, minimally update the pipeline, similar to the adjustments shown in **Part 1: Indexing**.  \n\n### Prerequisites\n- **Completed Part 1 pipeline**, *\"Hybrid Search with Qdrant & n8n, Legal AI: Indexing\"*, and the collection created in it;\n- All the requirements of **Part 1 pipeline**;\n\n### Hybrid Search\nThe example here is a **basic hybrid query**.  \nYou can extend/enhance it with:\n- Reranking strategies;  \n- Different fusion techniques;\n- Score boosting based on metadata;\n- ...  \n\nMore details: [Hybrid Queries in Qdrant](https://qdrant.tech/documentation/concepts/hybrid-queries/).  \n\n#### P.S.\n- To ask questions about retrieval in Qdrant, join the [Qdrant Discord](https://discord.gg/ArVgNHV6).  \n- Star [Qdrant n8n community node repo](https://github.com/qdrant/n8n-nodes-qdrant) <3\n"
      },
      "typeVersion": 1
    },
    {
      "id": "56a5efd8-ed3f-46f7-85c8-966536f24a13",
      "name": "isHit = Si Encontramos la Respuesta Correcta",
      "type": "n8n-nodes-base.set",
      "position": [
        2512,
        608
      ],
      "parameters": {
        "include": "selected",
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "80089820-cc55-4b74-966e-b50a3f4b6e36",
              "name": "isHit",
              "type": "boolean",
              "value": "={{ $json.result.points[0].payload.ids_qa.includes($json.id_qa) }}"
            }
          ]
        },
        "includeFields": "id_qa,question",
        "includeOtherFields": true
      },
      "typeVersion": 3.4
    }
  ],
  "active": false,
  "pinData": {
    "Indexar Dataset desde HuggingFace": [
      {
        "json": {
          "dataset": "isaacus/LegalQAEval"
        }
      }
    ]
  },
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "20b38566-7985-4139-98a3-6b275e85a9cb",
  "connections": {
    "445ace25-f900-4bcf-9f7d-9bc1db662867": {
      "main": [
        [
          {
            "node": "56a5efd8-ed3f-46f7-85c8-966536f24a13",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "20f67ae7-6631-4602-aa20-42a382db12ae": {
      "main": [
        [
          {
            "node": "445ace25-f900-4bcf-9f7d-9bc1db662867",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "cb58241e-4579-4c5b-bd65-4a20f6cf3698": {
      "main": [
        [
          {
            "node": "c075a745-ed50-459f-87dd-101a559e4523",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "08f67e31-ba8f-47fd-bc78-f352a160d4fd": {
      "main": [
        [
          {
            "node": "413ade44-d27d-4c49-862f-afb9d4e18bf6",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "2ee71c28-71fd-4cba-87c7-c9886fb403c7": {
      "main": [
        [
          {
            "node": "484a82fe-93d8-439b-bbb5-e96a4b5d7861",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "c631ce99-a672-499f-bbf3-e740ef431884": {
      "main": [
        [
          {
            "node": "08f67e31-ba8f-47fd-bc78-f352a160d4fd",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "445ace25-f900-4bcf-9f7d-9bc1db662867",
            "type": "main",
            "index": 1
          },
          {
            "node": "20f67ae7-6631-4602-aa20-42a382db12ae",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "484a82fe-93d8-439b-bbb5-e96a4b5d7861": {
      "main": [
        [
          {
            "node": "cb58241e-4579-4c5b-bd65-4a20f6cf3698",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "e6d1e789-1293-480b-a163-992b0c7a2ae8": {
      "main": [
        [
          {
            "node": "78030f66-5331-463f-ad22-9d09f477e3f9",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "78030f66-5331-463f-ad22-9d09f477e3f9": {
      "main": [
        [
          {
            "node": "2ee71c28-71fd-4cba-87c7-c9886fb403c7",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "94f78e9b-f9eb-4179-9844-d6e23bc79751": {
      "main": [
        [
          {
            "node": "c631ce99-a672-499f-bbf3-e740ef431884",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "eb8d4dd7-f40b-4524-a9de-f9ef9eef0eca": {
      "main": [
        [
          {
            "node": "e6d1e789-1293-480b-a163-992b0c7a2ae8",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "56a5efd8-ed3f-46f7-85c8-966536f24a13": {
      "main": [
        [
          {
            "node": "c631ce99-a672-499f-bbf3-e740ef431884",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "c075a745-ed50-459f-87dd-101a559e4523": {
      "main": [
        [
          {
            "node": "94f78e9b-f9eb-4179-9844-d6e23bc79751",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
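
The core of the workflow above (the hybrid query, the `isHit` check, and the final percentage) can be sketched outside n8n as follows. This is an illustration, not the node's implementation: the function names and the sample response are hypothetical, while the model names, vector names (`mxbai_large`, `bm25`), limits, and the `ids_qa` payload field are taken from the JSON. Whether the payload is returned by default depends on settings the JSON does not show.

```python
import json


def build_hybrid_query(question, limit=1, prefetch_limit=25):
    """Build a request body mirroring the node's prefetch and fusion settings."""
    return {
        "prefetch": [
            {   # semantic (dense) candidates
                "query": {"text": question,
                          "model": "mixedbread-ai/mxbai-embed-large-v1"},
                "using": "mxbai_large",
                "limit": prefetch_limit,
            },
            {   # keyword (BM25) candidates
                "query": {"text": question, "model": "qdrant/bm25"},
                "using": "bm25",
                "limit": prefetch_limit,
            },
        ],
        "query": {"fusion": "rrf"},
        "limit": limit,
    }


def is_hit(response, id_qa):
    """True if the top-1 point lists id_qa among its related question IDs."""
    points = response.get("result", {}).get("points", [])
    if not points:
        return False
    payload = points[0].get("payload") or {}
    return id_qa in payload.get("ids_qa", [])


def hits_percentage(evals):
    """Share of evaluations with isHit set, as a percentage (hits@1)."""
    if not evals:
        return 0.0
    return sum(1 for e in evals if e["isHit"]) * 100 / len(evals)


# Hypothetical question and response, for illustration only.
body = build_hybrid_query('What is "consideration" in contract law?')
sample_response = {"result": {"points": [{"id": 1, "payload": {"ids_qa": ["q1", "q7"]}}]}}
evals = [{"id_qa": q, "isHit": is_hit(sample_response, q)} for q in ("q1", "q2")]
print(json.dumps(body, indent=2))
print(hits_percentage(evals))
```

Building the body as a dictionary and serializing it with `json.dumps` also handles questions that contain double quotes, which plain string interpolation into a JSON template would not escape.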

Frequently asked questions

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then adjust the credential settings as needed.
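
Before importing, it can help to confirm that the copied text parses as JSON and to see which nodes carry credentials that will need reconfiguring. The sketch below assumes only the `nodes`, `name`, `type`, and `credentials` keys visible in the export; the function name and the trimmed sample are hypothetical.

```python
import json
from collections import Counter


def summarize_workflow(workflow_json):
    """Parse an exported workflow and report node types and credential needs."""
    workflow = json.loads(workflow_json)  # raises ValueError on malformed JSON
    nodes = workflow.get("nodes", [])
    type_counts = Counter(node["type"] for node in nodes)
    needs_credentials = {
        node["name"]: sorted(node["credentials"])
        for node in nodes
        if node.get("credentials")
    }
    return type_counts, needs_credentials


# Trimmed, hypothetical sample; in practice pass the full copied JSON text.
sample = json.dumps({
    "nodes": [
        {"name": "Consultar Puntos", "type": "n8n-nodes-qdrant.qdrant",
         "credentials": {"qdrantApi": {"id": "example-id", "name": "Qdrant account"}}},
        {"name": "Combinar", "type": "n8n-nodes-base.merge"},
    ]
})
type_counts, needs_credentials = summarize_workflow(sample)
print(type_counts)
print(needs_credentials)
```

Node types that do not start with `n8n-nodes-base.` (here `n8n-nodes-qdrant.qdrant`) likely come from a community package that must be installed in the n8n instance before the import will resolve them.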

Which scenarios is this workflow suited for?

Advanced

Is it paid?

This workflow is completely free; you can import and use it directly. Note, however, that third-party services used in the workflow (such as the OpenAI API) may require payment on your own account.