Extract website data from form input (Gemini 2.5 Flash + Gmail)

Intermediate

This is an automation workflow in the AI Summarization and Multimodal AI domains. It contains 13 nodes (7 functional nodes plus 6 sticky notes) and mainly uses the Html, Gmail, FormTrigger, HttpRequest, and ChainLlm node types. A form collects a source URL and a description of the data to extract; the workflow fetches the page, passes its body text to Gemini 2.5 Flash, and emails the extracted result via Gmail.
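To make the data flow concrete, here is a minimal Python sketch of the same chain (HTTP fetch, `body` text extraction, LLM prompt, parsing of the `{"result": ...}` answer). It is an illustration under stated assumptions, not the template's code: `call_llm` is a hypothetical callable standing in for the Gemini model plus Structured Output Parser, `fetch` and `default_fetch` are hypothetical helpers, the prompt is abridged from the template's, and the body-text extraction (which skips `<script>`/`<style>`) only approximates what the HTML node returns for the `body` selector.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from typing import Callable
from urllib.request import urlopen


class BodyTextParser(HTMLParser):
    """Collects text inside <body>, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html: str) -> str:
    # Rough stand-in for the HTML node's "body" CSS selector (text output).
    # Returns "" for documents without an explicit <body> tag.
    parser = BodyTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_prompt(request: str, body_text: str) -> str:
    # Abridged version of the prompt in the Data Extractor LLM Chain node.
    return (
        "Your task is to extract the exact information specified by the user.\n\n"
        f"User's extraction request:\n\"{request}\"\n\n"
        "Rules: extract ONLY the requested data, join multiple matches with commas, "
        'and if nothing matches return {"result": "No data found"}.\n'
        'Always answer as {"result": "extracted value(s)"}.\n\n'
        f"Here is the source data:\n{body_text}\n"
    )


def parse_result(raw: str) -> str:
    # Mirrors the Structured Output Parser schema: {"result": "..."}.
    data = json.loads(raw)
    result = data.get("result") if isinstance(data, dict) else None
    if not isinstance(result, str):
        raise ValueError(f"LLM output has no string 'result' field: {raw!r}")
    return result


def default_fetch(url: str) -> str:
    # Hypothetical helper: plain GET request, body decoded as UTF-8.
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_pipeline(
    source_url: str,
    request: str,
    call_llm: Callable[[str], str],
    fetch: Callable[[str], str] = default_fetch,
) -> str:
    html = fetch(source_url)
    prompt = build_prompt(request, extract_body_text(html))
    return parse_result(call_llm(prompt))
```

Passing `fetch` and `call_llm` as parameters keeps the sketch testable offline; in a real script `call_llm` would wrap whichever Gemini client you use and return the model's raw text.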

Prerequisites
  • Google account and Gmail API credentials
  • The target website or API may require authentication credentials
  • Google Gemini API key
Export workflow
Copy the following JSON configuration into n8n to import and use this workflow.
{
  "meta": {
    "instanceId": "d1786ab0d745a7498abf13a9c2cdabb1374c006e889b79eef64ce0386b8f8a41",
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "6d85bf32-59a5-4644-b5b0-d31aaf677bda",
      "name": "Structured Output Parser",
      "type": "@n8n/n8n-nodes-langchain.outputParserStructured",
      "position": [
        640,
        200
      ],
      "parameters": {
        "jsonSchemaExample": "{\n    \"result\": \"extracted value(s)\"\n}"
      },
      "typeVersion": 1.2
    },
    {
      "id": "d011794a-d4bf-4750-9ad1-3cb0df662aff",
      "name": "Get HTML from Source URL",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        40,
        0
      ],
      "parameters": {
        "url": "={{ $json['Source URL'] }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "389ac2ce-39d8-4bc9-a1af-7ea3dba0240d",
      "name": "Data Extractor LLM Chain",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        460,
        0
      ],
      "parameters": {
        "text": "=Your task is to extract the exact information specified by the user.\n\nUser’s extraction request:\n\"{{ $('Web Scraper form submission').item.json['Data to extract'] }}\"\n\nRules:\n1. Extract ONLY the requested information.\n2. If multiple matches exist, combine them into a single string separated by commas.\n3. Do NOT add explanations or extra text—output only the extracted data.\n4. Maintain the original values unless formatting is requested.\n5. If no matches are found, return: { \"result\": \"No data found\" }.\n6. Always return the response in this format:\n{\n    \"result\": \"extracted value(s)\"\n}\n\nHere is the source data:\n{{ $json.body }}\n",
        "promptType": "define",
        "hasOutputParser": true
      },
      "typeVersion": 1.6
    },
    {
      "id": "a73e7657-4a79-4b50-973f-e27d406f0278",
      "name": "Gmail - Send Result",
      "type": "n8n-nodes-base.gmail",
      "position": [
        880,
        0
      ],
      "webhookId": "fa29cdcc-e8e9-449a-a6a4-88a874e2a0c5",
      "parameters": {
        "sendTo": "template_data_extactor_replace_me@yopmail.com",
        "message": "=Your web scraping task has been completed.\n\nSource URL:\n{{ $('Web Scraper form submission').item.json['Source URL'] }}\n\nData Requested:\n{{ $('Web Scraper form submission').item.json['Data to extract'] }}\n\nExtracted Result:\n{{ $json.output.result }}\n\nThank you for using our web scraping automation.",
        "options": {
          "appendAttribution": false
        },
        "subject": "=✅ Web Scraping Result for {{ $('Web Scraper form submission').item.json['Source URL'] }}",
        "emailType": "text"
      },
      "credentials": {
        "gmailOAuth2": {
          "id": "CeBpTZBQSAMKVKJY",
          "name": "Gmail account (Billy Email 2)"
        }
      },
      "typeVersion": 2.1
    },
    {
      "id": "ec13f750-6015-4c17-b062-9036b0ae8697",
      "name": "Web Scraper form submission",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        -160,
        0
      ],
      "webhookId": "a757a352-5ab2-4fa7-a8ee-08bb5d3448cc",
      "parameters": {
        "options": {},
        "formTitle": "Web Scraper Form",
        "formFields": {
          "values": [
            {
              "fieldLabel": "Source URL"
            },
            {
              "fieldLabel": "Data to extract"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "1ed56fc9-5124-454c-8b4d-ee2c9e72076c",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1100,
        -260
      ],
      "parameters": {
        "color": 4,
        "width": 380,
        "height": 760,
        "content": "# 👋 Hi, I’m Billy!\n\nI help businesses build **n8n workflows** & **AI automation projects**.  \nNeed help with n8n or AI Automation projects? \nContact me and let’s build your automation together.\n\n📩 **Email:** billychartanto@gmail.com  \n🤝 **n8n Creator:** [n8n.io/creators/billy](https://n8n.io/creators/billy/)\n🌐 **My n8n Projects:** [billychristi.com/n8n](https://www.billychristi.com/n8n)  \n\n\n\n---\n💡 Feel free to get in touch if you’d like help on your next automation project or if you have any feedback or thoughts to share.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "4b1b75f3-95fa-4a42-8180-cb47ef7c3a02",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -760,
        -80
      ],
      "parameters": {
        "color": 4,
        "width": 500,
        "height": 360,
        "content": "## SETUP REQUIRED\n\nWorkflow Configurations:\n- Update the email recipient in the Gmail node (currently set to template_data_extactor_replace_me@yopmail.com)\n- Adjust the JSON schema in the Structured Output Parser if you need different output formats\n- Modify the LLM prompt in the Data Extractor LLM Chain based on your specific extraction requirements\n\nRequired Credentials:\n- Google Gemini API Key (Google PaLM API account)\n- Gmail Credential for sending result emails"
      },
      "typeVersion": 1
    },
    {
      "id": "35116b5c-c3ab-4c47-9914-c6cecdf3e3b4",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -860,
        420
      ],
      "parameters": {
        "color": 4,
        "width": 600,
        "height": 400,
        "content": "## 🔍Extract Specific Website Data with Form Input, Gemini 2.5 flash and Gmail Delivery\n\nWhat This Template Does:\n\n- Provides a web form interface for users to submit scraping requests\n- Accepts any website URL and custom data extraction requirements\n- Fetches HTML content from the specified source URL\n- Uses Google Gemini AI to intelligently extract only the requested information\n- Processes raw HTML content and returns structured JSON results\n- Automatically sends extraction results via Gmail with detailed reporting\n- Handles various data types and formats while maintaining original values unless formatting is requested\n"
      },
      "typeVersion": 1
    },
    {
      "id": "b1526d24-29c6-4552-8964-84953933494b",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -220,
        420
      ],
      "parameters": {
        "color": 4,
        "width": 1000,
        "height": 300,
        "content": "## 📋 WORKFLOW PROCESS OVERVIEW\n\nStep 1: 📝 Web Scraper Form Submission triggers the workflow when users submit URL and extraction requirements\nStep 2: 🌐 Get HTML from Source URL fetches the complete HTML content from the provided website\nStep 3: 🔧 HTML Extractor processes the raw HTML and extracts the body content for analysis\nStep 4: 🤖 Data Extractor LLM Chain uses Google Gemini AI to analyze content and extract only the specific data requested by the user\nStep 5: 📊 Structured Output Parser formats the AI response into clean JSON structure with standardized format\nStep 6: 📧 Gmail Send Result delivers the extraction results via email including:\n  - Original source URL\n  - Data extraction request details  \n  - Clean extracted results\n  - Professional formatting with success confirmation"
      },
      "typeVersion": 1
    },
    {
      "id": "461e0675-1755-41eb-b445-20fd0d733d8c",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        380,
        -180
      ],
      "parameters": {
        "color": 4,
        "width": 400,
        "height": 560,
        "content": "## Data Extractor LLM Chain  \nThis is where we extract the content based on the user request  \n\nConfiguration:  \nYou can update the prompt and the model here to adjust to your use case.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "e147282c-51f0-4f76-8416-bfeb00a47f64",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        800,
        -160
      ],
      "parameters": {
        "color": 4,
        "width": 260,
        "height": 340,
        "content": "## Gmail - Send Results  \n\nConfiguration:  \nUpdate the target email  \nUpdate the email subject and body  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "adb7c780-8210-4dab-ad4f-9e7fc366cd16",
      "name": "Google Gemini Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        440,
        200
      ],
      "parameters": {
        "options": {},
        "modelName": "models/gemini-2.5-flash"
      },
      "credentials": {
        "googlePalmApi": {
          "id": "gdaO8lU3HwsldifM",
          "name": "Google Gemini(PaLM) Api account"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "25e49c2a-9018-4503-8ce1-a95699e9941c",
      "name": "HTML Extractor",
      "type": "n8n-nodes-base.html",
      "position": [
        220,
        0
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "body",
              "cssSelector": "body"
            }
          ]
        }
      },
      "typeVersion": 1.2
    }
  ],
  "pinData": {},
  "connections": {
    "HTML Extractor": {
      "main": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Data Extractor LLM Chain": {
      "main": [
        [
          {
            "node": "Gmail - Send Result",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get HTML from Source URL": {
      "main": [
        [
          {
            "node": "HTML Extractor",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Structured Output Parser": {
      "ai_outputParser": [
        [
          {
            "node": "Data Extractor LLM Chain",
            "type": "ai_outputParser",
            "index": 0
          }
        ]
      ]
    },
    "Web Scraper form submission": {
      "main": [
        [
          {
            "node": "Get HTML from Source URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
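n8n resolves both the `connections` map and `$('…')` expressions by node name, so a renamed or translated node silently breaks the graph. A small pre-import check like the following can catch that. It is a sketch, not an n8n tool: `find_dangling_refs`, `check_file`, and the file path you pass in are hypothetical, and the regular expression only recognizes the `$('Node Name')` form used in this template.

```python
from __future__ import annotations

import json
import re

# Matches $('Node Name') or $("Node Name") inside n8n expressions.
NODE_REF = re.compile(r"""\$\(\s*(['"])(.+?)\1\s*\)""")


def iter_strings(value):
    """Yield every string nested anywhere inside a JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def find_dangling_refs(workflow: dict) -> list[str]:
    """Return human-readable problems; an empty list means every name resolves."""
    nodes = workflow.get("nodes", [])
    names = {node["name"] for node in nodes}
    problems: list[str] = []
    # 1. Connection sources and targets must be existing node names.
    for source, outputs in workflow.get("connections", {}).items():
        if source not in names:
            problems.append(f"connection source not found: {source}")
        for branches in outputs.values():  # e.g. "main", "ai_languageModel"
            for branch in branches:
                for link in branch or []:
                    if link["node"] not in names:
                        problems.append(f"connection target not found: {link['node']}")
    # 2. $('...') expressions in node parameters must name an existing node.
    for node in nodes:
        for text in iter_strings(node.get("parameters", {})):
            for _quote, ref in NODE_REF.findall(text):
                if ref not in names:
                    problems.append(f"{node['name']}: expression references missing node {ref!r}")
    return problems


def check_file(path: str) -> list[str]:
    """Load a workflow export from disk and check it."""
    with open(path, encoding="utf-8") as fh:
        return find_dangling_refs(json.load(fh))
```

For example, a workflow whose trigger node was renamed without updating the `$('Web Scraper form submission')` expressions would be reported once per expression that still uses the old name.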
Frequently asked questions

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, and import it (for example, by pasting the JSON onto the editor canvas, or by saving it as a file and using Import from File). Then update the credential settings as needed.
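If you prefer to script the import instead of using the editor, n8n's public REST API can create a workflow from the same JSON. The sketch below only prepares the request, and it rests on assumptions you should verify against the API reference for your n8n version: that the public API is enabled, that the endpoint is `POST /api/v1/workflows` authenticated with an `X-N8N-API-KEY` header, and that the body accepts only `name`, `nodes`, `connections`, and `settings` at the top level. The helper names, base URL, API key, and workflow name are placeholders.

```python
from __future__ import annotations

import json
import urllib.request

# Top-level fields this sketch assumes POST /api/v1/workflows accepts.
# Template exports also carry "meta" and "pinData", which are dropped here.
ALLOWED_KEYS = ("nodes", "connections", "settings")


def build_payload(template: dict, name: str) -> dict:
    payload = {key: template[key] for key in ALLOWED_KEYS if key in template}
    payload["name"] = name
    payload.setdefault("settings", {})  # this template's JSON has no settings object
    return payload


def build_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    # Only builds the request; pass it to urllib.request.urlopen() to send it.
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/workflows",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-N8N-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

For example, `urllib.request.urlopen(build_request("http://localhost:5678", api_key, build_payload(template, "Website data extractor")))` would submit it. The credential IDs embedded in the template refer to the author's instance, so the Gmail and Gemini nodes still need your own credentials after import.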

Which scenarios is this workflow suited for?

Intermediate level; AI Summarization and Multimodal AI use cases.

Does it cost anything?

This workflow is completely free; you can import and use it directly. Note, however, that the third-party services it uses (such as the Google Gemini API) may charge you on your own account.

Workflow information
Difficulty level
Intermediate
Number of nodes: 13
Categories: 2
Node types: 8
Difficulty description

Suitable for users with intermediate experience: medium-complexity workflows with 6-15 nodes.

Author
Billy Christi

@billy

I build scalable automation systems with n8n to help businesses save time and cut costs. 💼 n8n expert available for new projects 📩 billychartanto@gmail.com
