Automated Web Crawler: Detailed Job/Product Monitoring with Telegram Notifications

Advanced

This is a 6-node automation workflow in the Market Research and AI Summarization categories. It mainly uses the If, Cron, Function, Telegram, HtmlExtract, and HTTP Request nodes.

Prerequisites
  • Telegram Bot Token
  • Credentials for the target API may be required
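
Besides the bot token, the Telegram node also needs a chat ID, which the note on the Send Telegram Alert node suggests reading from the bot's `getUpdates` response. The sketch below shows one way to pull chat IDs out of such a response once it has been parsed as JSON. The helper name `extractChatIds` and the sample payload are assumptions made for illustration; only the `result[].message.chat.id` layout comes from the Telegram Bot API.

```javascript
// Hypothetical helper: collect the distinct chat IDs found in a parsed
// Telegram getUpdates response ({ ok, result: [ { message: { chat: { id } } } ] }).
function extractChatIds(updatesResponse) {
  const ids = new Set();
  for (const update of updatesResponse.result || []) {
    // The chat can sit under message, edited_message, or channel_post.
    const msg = update.message || update.edited_message || update.channel_post;
    if (msg && msg.chat && msg.chat.id !== undefined) {
      ids.add(msg.chat.id);
    }
  }
  return [...ids];
}

// Sample payload, made up for illustration.
const sample = {
  ok: true,
  result: [
    { update_id: 1, message: { message_id: 10, chat: { id: 123456789, type: "private" }, text: "hi" } },
    { update_id: 2, message: { message_id: 11, chat: { id: 123456789, type: "private" }, text: "again" } },
  ],
};

console.log(extractChatIds(sample)); // [ 123456789 ]
```

If the result is empty, the bot has probably not received a message yet; send it one and fetch the updates again.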
Export Workflow
Copy the following JSON configuration and import it into n8n:
{
  "nodes": [
    {
      "name": "Hourly Monitor Trigger",
      "type": "n8n-nodes-base.cron",
      "notes": {
        "text": "### 1. Hourly Monitor Trigger\n\nThis `Cron` node will trigger the workflow automatically every **hour**.\n\n**To change the schedule:** Adjust the 'Mode' or set specific 'Hour' and 'Minute' values to match how often you want to check the website (e.g., every 4 hours, daily).",
        "position": "right"
      },
      "position": [
        240,
        300
      ],
      "parameters": {
        "mode": "everyHour",
        "options": {}
      },
      "typeVersion": 1,
      "id": "St-ndlicher-Monitor-Trigger-0"
    },
    {
      "name": "Fetch Webpage Content",
      "type": "n8n-nodes-base.httpRequest",
      "notes": {
        "text": "### 2. Fetch Webpage Content\n\nThis `HTTP Request` node downloads the entire HTML content of the target webpage.\n\n**Setup:**\n1.  **URL:** **IMPORTANT:** Change `https://www.n8n.io/blog/` to the exact URL of the job board, product page, or any webpage you want to monitor.\n2.  **Response Format:** Ensure this is set to `string` (for HTML content).\n\n**Considerations:**\n* If the website requires login, you might need to add authentication headers or cookies (more advanced).\n* If the content loads dynamically with JavaScript after the initial page load, this method might not capture it. You'd need more advanced tools (like Puppeteer/Playwright in a `Code` node).",
        "position": "right"
      },
      "position": [
        460,
        300
      ],
      "parameters": {
        "url": "https://www.n8n.io/blog/",
        "options": {},
        "responseFormat": "string"
      },
      "typeVersion": 3,
      "id": "Webseiteninhalt-abrufen-1"
    },
    {
      "name": "Extract Job/Product Info",
      "type": "n8n-nodes-base.htmlExtract",
      "notes": {
        "text": "### 3. Extract Specific Data (`HTML Extract` - Key Node!)\n\nThis `HTML Extract` node is the core of the web scraping. It parses the HTML and pulls out specific data points based on CSS Selectors.\n\n**Setup (CRITICAL!):**\n1.  **HTML:** This field is already set to `{{ $node[\"Fetch Webpage Content\"].json.data }}`, taking the HTML from the previous node.\n2.  **Extract Operations:**\n    * **Change or Add Operations:** You'll need to define exactly *what* to extract.\n    * **Selector:** This is the most important part. You need to find the correct CSS selector for the data you want. \n        * **How to find:** Open the target webpage in your browser (Chrome/Firefox). Right-click on the specific text/element (e.g., a job title, a product price) and choose 'Inspect' or 'Inspect Element'. In the developer tools panel, right-click on the highlighted HTML code, then select 'Copy' -> **'Copy selector'** (not 'Copy XPath', because this field expects a CSS selector). Paste this into the 'Selector' field.\n    * **Attribute:** Usually `textContent` for visible text, or `href` for links, `src` for image URLs, etc.\n    * **Property Name:** Give it a meaningful name (e.g., `JobTitle`, `JobLink`, `ProductName`, `StockStatus`).\n\n**Example (from n8n blog):**\n* `h3.BlogItem_title__d78Xb` for blog post titles (`textContent`)\n* `a.BlogItem_blogItem__a_H6E` for blog post links (`href`)\n\n**Test this node carefully!** Run the workflow up to this point and inspect its output to ensure it extracts what you expect.",
        "position": "right"
      },
      "position": [
        700,
        300
      ],
      "parameters": {
        "html": "={{ $node[\"Fetch Webpage Content\"].json.data }}",
        "extractOperations": [
          {
            "options": {},
            "selector": "h3.BlogItem_title__d78Xb",
            "attribute": "textContent",
            "operation": "extract",
            "propertyName": "JobTitle"
          },
          {
            "options": {},
            "selector": "a.BlogItem_blogItem__a_H6E",
            "attribute": "href",
            "operation": "extract",
            "propertyName": "JobLink"
          }
        ]
      },
      "typeVersion": 1,
      "id": "Stellen--Produktinformationen-extrahieren-2"
    },
    {
      "name": "If Items Found",
      "type": "n8n-nodes-base.if",
      "notes": {
        "text": "### 4. If Items Found (Conditional Check)\n\nThis `If` node checks if the 'Extract Job/Product Info' node actually found any items. If it did, the workflow continues down the 'True' path to send a notification.\n\n**No configuration needed**; it checks if the array of extracted items is not empty.",
        "position": "right"
      },
      "position": [
        940,
        300
      ],
      "parameters": {
        "conditions": [
          {
            "value1": "={{ $json.length }}",
            "value2": "0",
            "operation": "notEqual"
          }
        ]
      },
      "typeVersion": 1,
      "id": "Wenn-Eintr-ge-gefunden-3"
    },
    {
      "name": "Format Notification Message",
      "type": "n8n-nodes-base.function",
      "notes": {
        "text": "### 5. Format Notification Message\n\nThis `Function` node takes the extracted data and formats it into a human-readable message for your Telegram alert.\n\n**Customization:**\n* **Adjust `item.json.JobTitle`, `item.json.JobLink`, etc.:** Make sure these match the 'Property Name' you defined in the 'Extract Job/Product Info' node.\n* You can add more details or change the formatting here.\n\n**No configuration needed if your property names match the example.**",
        "position": "right"
      },
      "position": [
        1180,
        220
      ],
      "parameters": {
        "options": {},
        "function": "let summary = \"\";\n\nif (items.length > 0) {\n  // Telegram's legacy Markdown marks bold with single asterisks\n  summary = `*Found ${items.length} new/updated items!*\\n\\n`;\n  for (const item of items) {\n    // Assuming you extracted 'JobTitle' and 'JobLink' from HTML Extract\n    const title = item.json.JobTitle || item.json.ProductName || 'N/A';\n    const link = item.json.JobLink || 'No link';\n    const otherInfo = item.json.StockStatus ? ` (Status: ${item.json.StockStatus})` : '';\n    summary += `- *${title}*${otherInfo}\\n  Link: ${link}\\n\\n`;\n  }\n} else {\n  summary = \"No new items found during this check.\";\n}\n\nreturn [{ json: { notificationMessage: summary } }];"
      },
      "typeVersion": 1,
      "id": "Benachrichtigungsnachricht-formatieren-4"
    },
    {
      "name": "Send Telegram Alert",
      "type": "n8n-nodes-base.telegram",
      "notes": {
        "text": "### 6. Send Telegram Alert\n\nThis `Telegram` node sends the formatted notification message to your Telegram chat.\n\n**Setup:**\n1.  **Telegram Credential:** Click 'Credentials' and select 'New Credential'. Choose 'Telegram API'.\n    * You'll need a **Bot Token** from BotFather on Telegram (search for '@BotFather' in Telegram, type `/newbot`, follow instructions).\n2.  **Chat ID:** **IMPORTANT: You need your specific Telegram Chat ID.**\n    * **How to get it:** Send a message to your new bot. Then, open this URL in your browser: `https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates` (replace `<YOUR_BOT_TOKEN>` with your bot's token). Look for the `\"chat\": {\"id\": ...}` field; that's your Chat ID.\n    * Paste this ID into the 'Chat ID' field.\n3.  **Text:** This pulls the message from the 'Format Notification Message' node.\n4.  **Parse Mode:** Set to `Markdown` for bolding (`*bold*`) and links.\n\n**Test this node by running the workflow (from the 'Hourly Monitor Trigger') and checking your Telegram!**",
        "position": "right"
      },
      "position": [
        1420,
        220
      ],
      "parameters": {
        "text": "={{ $json.notificationMessage }}",
        "chatId": "YOUR_TELEGRAM_CHAT_ID",
        "options": {},
        "parseMode": "Markdown"
      },
      "credentials": {
        "telegramApi": {
          "id": "YOUR_TELEGRAM_CREDENTIAL_ID",
          "resolve": false
        }
      },
      "typeVersion": 1,
      "id": "Send-Telegram-Alert-5"
    }
  ],
  "pinData": {},
  "version": 1,
  "connections": {
    "Wenn-Eintr-ge-gefunden-3": {
      "main": [
        [
          {
            "node": "Benachrichtigungsnachricht-formatieren-4",
            "type": "main"
          }
        ],
        []
      ]
    },
    "Webseiteninhalt-abrufen-1": {
      "main": [
        [
          {
            "node": "Stellen--Produktinformationen-extrahieren-2",
            "type": "main"
          }
        ]
      ]
    },
    "St-ndlicher-Monitor-Trigger-0": {
      "main": [
        [
          {
            "node": "Webseiteninhalt-abrufen-1",
            "type": "main"
          }
        ]
      ]
    },
    "Stellen--Produktinformationen-extrahieren-2": {
      "main": [
        [
          {
            "node": "Wenn-Eintr-ge-gefunden-3",
            "type": "main"
          }
        ]
      ]
    },
    "Benachrichtigungsnachricht-formatieren-4": {
      "main": [
        [
          {
            "node": "Send-Telegram-Alert-5",
            "type": "main"
          }
        ]
      ]
    }
  }
}
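
The formatting step is easier to test outside n8n. The sketch below is a standalone version of the Function node's logic, assuming the same property names (`JobTitle`, `JobLink`, `ProductName`, `StockStatus`) and n8n's `[{ json: {...} }]` item shape. The function name `formatNotification` and the sample items are hypothetical. It marks bold text with single asterisks, which is what Telegram's legacy `Markdown` parse mode expects.

```javascript
// Standalone sketch of the notification formatter (hypothetical name).
// `items` mimics the n8n item shape: [{ json: { ... } }, ...].
function formatNotification(items) {
  if (items.length === 0) {
    return "No new items found during this check.";
  }
  // Single asterisks mark bold text in Telegram's legacy Markdown.
  let summary = `*Found ${items.length} new/updated items!*\n\n`;
  for (const item of items) {
    const title = item.json.JobTitle || item.json.ProductName || "N/A";
    const link = item.json.JobLink || "No link";
    const otherInfo = item.json.StockStatus ? ` (Status: ${item.json.StockStatus})` : "";
    summary += `- *${title}*${otherInfo}\n  Link: ${link}\n\n`;
  }
  return summary;
}

// Sample items, made up for illustration.
const message = formatNotification([
  { json: { JobTitle: "Backend Engineer", JobLink: "https://example.com/jobs/1" } },
  { json: { ProductName: "Widget", StockStatus: "In stock" } },
]);
console.log(message);
```

Titles or links that themselves contain `*` or `_` may still confuse the Markdown parser, so a production version would probably need to escape them first.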
Frequently Asked Questions

How do I use this workflow?

Copy the JSON code above, create a new workflow in your n8n instance, and choose "Import from JSON". Paste the configuration, then adjust the credentials and placeholder values (such as the chat ID and the target URL) as needed.
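
Before importing, it can help to scan the JSON for values that still need attention. The sketch below is one possible check, not part of n8n: it flags string values that start with `YOUR_` and `$node["..."]` expressions that point at a node name missing from the workflow. The function name `checkWorkflow` and the sample workflow are assumptions made for illustration.

```javascript
// Hypothetical pre-import check for a parsed n8n workflow object.
function checkWorkflow(workflow) {
  const names = new Set(workflow.nodes.map((n) => n.name));
  const problems = [];

  // Recursively visit every string value, tracking a dotted path to it.
  function walk(value, path) {
    if (typeof value === "string") {
      if (value.startsWith("YOUR_")) {
        problems.push(`${path}: placeholder "${value}"`);
      }
      // Find expressions such as $node["Fetch Webpage Content"].
      for (const m of value.matchAll(/\$node\["([^"]+)"\]/g)) {
        if (!names.has(m[1])) {
          problems.push(`${path}: unknown node "${m[1]}"`);
        }
      }
    } else if (value && typeof value === "object") {
      for (const [key, child] of Object.entries(value)) {
        walk(child, `${path}.${key}`);
      }
    }
  }

  // Only parameters and credentials are scanned; notes are free text.
  for (const node of workflow.nodes) {
    walk({ parameters: node.parameters, credentials: node.credentials }, node.name);
  }
  return problems;
}

// Sample workflow, made up for illustration.
const sampleWorkflow = {
  nodes: [
    { name: "Fetch Webpage Content", parameters: { url: "https://example.com" } },
    { name: "Extract", parameters: { html: '={{ $node["Fetch Page"].json.data }}' } },
    { name: "Send Telegram Alert", parameters: { chatId: "YOUR_TELEGRAM_CHAT_ID" } },
  ],
};
console.log(checkWorkflow(sampleWorkflow));
```

For the sample workflow, the check reports the mistyped node reference in `Extract` and the unfilled chat ID in `Send Telegram Alert`.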

Which scenarios is this workflow suited for?

Advanced - Market research, AI summarization

Does it cost anything?

This workflow is completely free. Note, however, that third-party services used in a workflow (such as the OpenAI API) may charge fees. This particular workflow only calls the target website and the Telegram Bot API.

Workflow Information
Difficulty: Advanced
Number of nodes: 6
Categories: 2
Node types: 6
Difficulty description: For experienced users; moderately complex workflows with 6-15 nodes

Author
Piotr Sobolewski

@piotrsobolewski

AI PhD with 7 years experience as a game dev CEO, currently teaching, helping others and building something new.
