Automated Web Scraper: Targeted Job/Product Monitoring with Telegram Alerts

Intermediate

This is a 6-node automation workflow in the Market Research and AI Summarization categories. It uses the Cron, HTTP Request, HTML Extract, If, Function, and Telegram nodes to scrape a web page on a schedule and send job or product updates as Telegram alerts.

Prerequisites
  • Telegram Bot Token
  • Credentials for the target website, if it requires authentication
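
Besides the bot token, the Telegram node also needs a chat ID, which the workflow's own notes suggest reading from the bot's `getUpdates` response. The following is a minimal sketch of that lookup, assuming you have already fetched and parsed the response yourself. The function names and the sample payload are hypothetical and only illustrate the `"chat": {"id": ...}` field.

```javascript
// Build the getUpdates URL described in the workflow notes.
function buildGetUpdatesUrl(botToken) {
  return `https://api.telegram.org/bot${botToken}/getUpdates`;
}

// Collect the distinct chat IDs found in a parsed getUpdates response.
function extractChatIds(updatesResponse) {
  const ids = new Set();
  for (const update of updatesResponse.result || []) {
    const message = update.message || update.channel_post;
    if (message && message.chat && message.chat.id !== undefined) {
      ids.add(message.chat.id);
    }
  }
  return [...ids];
}

// Hypothetical sample payload, trimmed to the fields used here.
const sampleResponse = {
  ok: true,
  result: [
    { update_id: 1, message: { chat: { id: 123456789, type: "private" }, text: "hi" } },
    { update_id: 2, message: { chat: { id: 123456789, type: "private" }, text: "again" } },
  ],
};

console.log(extractChatIds(sampleResponse)); // [ 123456789 ]
```

If the list comes back empty, send a message to the bot first and fetch the updates again.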
Export the workflow
Copy the following JSON configuration into n8n to import and use this workflow.
{
  "nodes": [
    {
      "name": "Hourly Monitor Trigger",
      "type": "n8n-nodes-base.cron",
      "notes": {
        "text": "### 1. Hourly Monitor Trigger\n\nThis `Cron` node will trigger the workflow automatically every **hour**.\n\n**To change the schedule:** Adjust the 'Mode' or set specific 'Hour' and 'Minute' values to match how often you want to check the website (e.g., every 4 hours, daily).",
        "position": "right"
      },
      "position": [
        240,
        300
      ],
      "parameters": {
        "mode": "everyHour",
        "options": {}
      },
      "typeVersion": 1,
      "id": "D-clencheur-de-surveillance-horaire-0"
    },
    {
      "name": "Fetch Webpage Content",
      "type": "n8n-nodes-base.httpRequest",
      "notes": {
        "text": "### 2. Fetch Webpage Content\n\nThis `HTTP Request` node downloads the entire HTML content of the target webpage.\n\n**Setup:**\n1.  **URL:** **IMPORTANT:** Change `https://www.n8n.io/blog/` to the exact URL of the job board, product page, or any webpage you want to monitor.\n2.  **Response Format:** Ensure this is set to `string` (for HTML content).\n\n**Considerations:**\n* If the website requires login, you might need to add authentication headers or cookies (more advanced).\n* If the content loads dynamically with JavaScript after the initial page load, this method might not capture it. You'd need more advanced tools (like Puppeteer/Playwright in a `Code` node).",
        "position": "right"
      },
      "position": [
        460,
        300
      ],
      "parameters": {
        "url": "https://www.n8n.io/blog/",
        "options": {},
        "responseFormat": "string"
      },
      "typeVersion": 3,
      "id": "R-cup-ration-du-contenu-web-1"
    },
    {
      "name": "Extract Job/Product Info",
      "type": "n8n-nodes-base.htmlExtract",
      "notes": {
        "text": "### 3. Extract Specific Data (`HTML Extract` - Key Node!)\n\nThis `HTML Extract` node is the core of the scraper. It parses the HTML and pulls out specific data points based on CSS selectors.\n\n**Setup (CRITICAL!):**\n1.  **HTML:** This field is already set to `{{ $node[\"Fetch Webpage Content\"].json.data }}`, taking the HTML from the previous node.\n2.  **Extract Operations:**\n    * **Change or Add Operations:** You'll need to define exactly *what* to extract.\n    * **Selector:** This is the most important part. You need to find the correct CSS selector for the data you want.\n        * **How to find:** Open the target webpage in your browser (Chrome/Firefox). Right-click on the specific text/element (e.g., a job title, a product price) and choose 'Inspect' or 'Inspect Element'. In the developer tools panel, right-click on the highlighted HTML code, then select 'Copy' -> **'Copy selector'** (not 'Copy XPath', since this node expects CSS selectors). Paste this into the 'Selector' field.\n    * **Attribute:** Usually `textContent` for visible text, or `href` for links, `src` for image URLs, etc.\n    * **Property Name:** Give it a meaningful name (e.g., `JobTitle`, `JobLink`, `ProductName`, `StockStatus`).\n\n**Example (from n8n blog):**\n* `h3.BlogItem_title__d78Xb` for blog post titles (`textContent`)\n* `a.BlogItem_blogItem__a_H6E` for blog post links (`href`)\n\n**Test this node carefully!** Run the workflow up to this point and inspect its output to ensure it extracts what you expect.",
        "position": "right"
      },
      "position": [
        700,
        300
      ],
      "parameters": {
        "html": "={{ $node[\"Fetch Webpage Content\"].json.data }}",
        "extractOperations": [
          {
            "options": {},
            "selector": "h3.BlogItem_title__d78Xb",
            "attribute": "textContent",
            "operation": "extract",
            "propertyName": "JobTitle"
          },
          {
            "options": {},
            "selector": "a.BlogItem_blogItem__a_H6E",
            "attribute": "href",
            "operation": "extract",
            "propertyName": "JobLink"
          }
        ]
      },
      "typeVersion": 1,
      "id": "Extraction-des-informations-postes-produits-2"
    },
    {
      "name": "If Items Found",
      "type": "n8n-nodes-base.if",
      "notes": {
        "text": "### 4. If Items Found (Conditional Check)\n\nThis `If` node checks whether the 'Extract Job/Product Info' node actually found anything. If it did, the workflow continues down the 'True' path to send a notification.\n\n**No configuration needed if you keep the `JobTitle` property**; the condition passes when `JobTitle` is not empty. If you rename that property, update the condition to match.",
        "position": "right"
      },
      "position": [
        940,
        300
      ],
      "parameters": {
        "conditions": {
          "string": [
            {
              "value1": "={{ $json.JobTitle }}",
              "operation": "isNotEmpty"
            }
          ]
        }
      },
      "typeVersion": 1,
      "id": "Si-l-ments-trouv-s-3"
    },
    {
      "name": "Format Notification Message",
      "type": "n8n-nodes-base.function",
      "notes": {
        "text": "### 5. Format Notification Message\n\nThis `Function` node takes the extracted data and formats it into a human-readable message for your Telegram alert.\n\n**Customization:**\n* **Adjust `item.json.JobTitle`, `item.json.JobLink`, etc.:** Make sure these match the 'Property Name' you defined in the 'Extract Job/Product Info' node.\n* You can add more details or change the formatting here.\n\n**No configuration needed if your property names match the example.**",
        "position": "right"
      },
      "position": [
        1180,
        220
      ],
      "parameters": {
        "options": {},
        "function": "let summary = \"\";\n\nif (items.length > 0) {\n  // Telegram's legacy Markdown parse mode uses single asterisks for bold\n  summary = `*Found ${items.length} new/updated items!*\\n\\n`;\n  for (const item of items) {\n    // Assuming you extracted 'JobTitle' and 'JobLink' from HTML Extract\n    const title = item.json.JobTitle || item.json.ProductName || 'N/A';\n    const link = item.json.JobLink || 'No link';\n    const otherInfo = item.json.StockStatus ? ` (Status: ${item.json.StockStatus})` : '';\n    summary += `- *${title}*${otherInfo}\\n  Link: ${link}\\n\\n`;\n  }\n} else {\n  summary = \"No new items found during this check.\";\n}\n\nreturn [{ json: { notificationMessage: summary } }];"
      },
      "typeVersion": 1,
      "id": "Formatage-du-message-de-notification-4"
    },
    {
      "name": "Send Telegram Alert",
      "type": "n8n-nodes-base.telegram",
      "notes": {
        "text": "### 6. Send Telegram Alert\n\nThis `Telegram` node sends the formatted notification message to your Telegram chat.\n\n**Setup:**\n1.  **Telegram Credential:** Click 'Credentials' and select 'New Credential'. Choose 'Telegram API'.\n    * You'll need a **Bot Token** from BotFather on Telegram (search for '@BotFather' in Telegram, type `/newbot`, follow instructions).\n2.  **Chat ID:** **IMPORTANT: You need your specific Telegram Chat ID.**\n    * **How to get it:** Send a message to your new bot. Then, open this URL in your browser: `https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates` (replace `<YOUR_BOT_TOKEN>` with your bot's token). Look for the `\"chat\": {\"id\": ...}` field; that's your Chat ID.\n    * Paste this ID into the 'Chat ID' field.\n3.  **Text:** This pulls the message from the 'Format Notification Message' node.\n4.  **Parse Mode:** Set to `Markdown` for bold text (`*text*`) and links.\n\n**Test this node by running the workflow (from the 'Hourly Monitor Trigger') and checking your Telegram!**",
        "position": "right"
      },
      "position": [
        1420,
        220
      ],
      "parameters": {
        "text": "={{ $json.notificationMessage }}",
        "chatId": "YOUR_TELEGRAM_CHAT_ID",
        "options": {},
        "parseMode": "Markdown"
      },
      "credentials": {
        "telegramApi": {
          "id": "YOUR_TELEGRAM_CREDENTIAL_ID",
          "resolve": false
        }
      },
      "typeVersion": 1,
      "id": "Send-Telegram-Alert-5"
    }
  ],
  "pinData": {},
  "version": 1,
  "connections": {
    "If Items Found": {
      "main": [
        [
          {
            "node": "Format Notification Message",
            "type": "main"
          }
        ],
        []
      ]
    },
    "Fetch Webpage Content": {
      "main": [
        [
          {
            "node": "Extract Job/Product Info",
            "type": "main"
          }
        ]
      ]
    },
    "Hourly Monitor Trigger": {
      "main": [
        [
          {
            "node": "Fetch Webpage Content",
            "type": "main"
          }
        ]
      ]
    },
    "Extract Job/Product Info": {
      "main": [
        [
          {
            "node": "If Items Found",
            "type": "main"
          }
        ]
      ]
    },
    "Format Notification Message": {
      "main": [
        [
          {
            "node": "Send Telegram Alert",
            "type": "main"
          }
        ]
      ]
    }
  }
}
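
The Function node in the workflow builds the alert text from the extracted items. Below is a standalone sketch of the same idea that can be run and tested outside n8n. It is a variant and not the workflow's exact code: it escapes the characters that Telegram's legacy `Markdown` parse mode treats as markup (`_`, `*`, `` ` ``, `[`). Because that mode does not allow escaping inside a bold or italic span, this sketch leaves titles unbolded. The helper names `escapeLegacyMarkdown` and `formatNotification` are hypothetical.

```javascript
// Escape characters that Telegram's legacy Markdown treats as markup.
function escapeLegacyMarkdown(text) {
  return String(text).replace(/([_*`\[])/g, "\\$1");
}

// Build the alert text from n8n-style items ({ json: {...} }).
function formatNotification(items) {
  if (items.length === 0) {
    return "No new items found during this check.";
  }
  // Only the header is bold; item fields are escaped plain text.
  let summary = `*Found ${items.length} new/updated items!*\n\n`;
  for (const item of items) {
    const title = item.json.JobTitle || item.json.ProductName || "N/A";
    const link = item.json.JobLink || "No link";
    const status = item.json.StockStatus
      ? ` (Status: ${escapeLegacyMarkdown(item.json.StockStatus)})`
      : "";
    summary += `- ${escapeLegacyMarkdown(title)}${status}\n`;
    summary += `  Link: ${escapeLegacyMarkdown(link)}\n\n`;
  }
  return summary;
}

// Hypothetical sample item, shaped like the HTML Extract output.
const sampleItems = [
  { json: { JobTitle: "Senior_Dev *Remote*", JobLink: "https://example.com/jobs/1" } },
];
console.log(formatNotification(sampleItems));
```

Inside a Function node, the equivalent would be to return `[{ json: { notificationMessage: formatNotification(items) } }]`.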
Frequently asked questions

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and adjust the authentication settings as needed.

Which scenarios is this workflow suited to?

Intermediate - Market Research, AI Summarization

Is it paid?

This workflow is completely free and can be used as is. Note that third-party services you connect to it (such as the website you monitor) may charge for access.
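
Before importing, it can help to check that every entry under `connections` points at a node that actually exists in `nodes`. The sketch below assumes that connections are matched by each node's `name` field, which is how n8n exports are typically structured. The function name `findDanglingConnections` and the sample object are hypothetical.

```javascript
// Return every connection endpoint that does not match a node name.
function findDanglingConnections(workflow) {
  const names = new Set(workflow.nodes.map((node) => node.name));
  const problems = [];
  for (const [source, outputs] of Object.entries(workflow.connections || {})) {
    if (!names.has(source)) {
      problems.push(`unknown source: ${source}`);
    }
    for (const branch of outputs.main || []) {
      for (const target of branch) {
        if (!names.has(target.node)) {
          problems.push(`unknown target: ${target.node}`);
        }
      }
    }
  }
  return problems;
}

// Hypothetical two-node workflow with one valid and one broken link.
const sampleWorkflow = {
  nodes: [{ name: "Fetch Webpage Content" }, { name: "If Items Found" }],
  connections: {
    "Fetch Webpage Content": {
      main: [[{ node: "If Items Found", type: "main" }]],
    },
    "If Items Found": {
      main: [[{ node: "Missing Node", type: "main" }], []],
    },
  },
};

console.log(findDanglingConnections(sampleWorkflow));
// [ 'unknown target: Missing Node' ]
```

An empty result means every connection resolves to a node; anything else names the entries to fix before importing.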

Workflow information
Difficulty level: Intermediate
Number of nodes: 6
Categories: 2
Node types: 6
Difficulty description: Suited to experienced users; medium-complexity workflows with 6-15 nodes

Author
Piotr Sobolewski

@piotrsobolewski

AI PhD with 7 years experience as a game dev CEO, currently teaching, helping others and building something new.
