Monitor and Extract Seed-Round Startup Data to Excel Using RSS, GPT-4.1-MINI, and BrightData

Name: Monitor and Extract Seed-Round Startup Data to Excel Using RSS, GPT-4.1-MINI, and BrightData
Rating: 4.5 (10 reviews)
Author: Eumentis

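Intermediate

This is an automation workflow in the Lead Generation and AI Summarization categories. It contains 14 nodes and mainly uses the Set, Code, Markdown, HttpRequest, and OpenAi nodes. It monitors an RSS feed, fetches each article through BrightData, extracts seed-round startup data with GPT-4.1-MINI, and appends the results to an Excel table.

Prerequisites

• Authentication credentials for the target APIs may be required (the workflow calls BrightData and, via OAuth2, Microsoft Graph)
• OpenAI API key

Nodes used (14)

Categories

Lead Generation

AI Summarization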

Workflow preview

RSS Feed Trigger → Refactor article link → Get article Page → Add article link → Markdown → Message a model → Edit Fields → Filter company data → Add data into excel sheet
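
The "Refactor article link" Code node in the exported JSON uses a regular expression to pull the real article address out of the `url` query parameter of each redirect link. As an illustration only, the same unwrapping can be sketched in plain JavaScript with the standard `URL` class. The helper name `unwrapRedirect` and the sample links are made up for this example, and whether `URL` is available inside the n8n Code node sandbox is not confirmed here.

```javascript
// Hypothetical helper (not part of the workflow): return the decoded `url`
// query parameter of a redirect link, or the link itself when there is no
// such parameter or the link cannot be parsed.
function unwrapRedirect(link) {
  try {
    // searchParams.get() already percent-decodes the value.
    const target = new URL(link).searchParams.get('url');
    return target || link;
  } catch {
    // Not a parseable absolute URL: leave it untouched.
    return link;
  }
}

// Made-up sample links for illustration.
const redirect =
  'https://www.google.com/url?rct=j&sa=t&url=https%3A%2F%2Fwww.example.com%2Fsample-article&ct=ga';

console.log(unwrapRedirect(redirect));
// → https://www.example.com/sample-article
console.log(unwrapRedirect('https://www.example.com/direct'));
// → https://www.example.com/direct
```

Like the regex version, this falls back to the original link when no `url` parameter is present, so links that are not redirects pass through unchanged.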

Export workflow

Copy the following JSON configuration and import it into n8n to use this workflow

{
  "meta": {
    "instanceId": "588297c7214f1c4e25d370806d33145b7a547bf66f8157b64edb38d64fc3c5f2",
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "66526413-badf-48cc-b08d-29a87490bf75",
      "name": "Edit Fields",
      "type": "n8n-nodes-base.set",
      "notes": "Filter the",
      "position": [
        1024,
        176
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "28ed03f9-1e17-432f-9438-6484aab19e35",
              "name": "",
              "type": "array",
              "value": "={{ $json.choices.map(choice => choice.message.content) }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "c527cac4-fb30-48f0-82f2-f516aa266ce5",
      "name": "Message a model",
      "type": "@n8n/n8n-nodes-langchain.openAi",
      "notes": "Get seed-funded company data",
      "position": [
        640,
        176
      ],
      "parameters": {
        "modelId": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-4.1-mini",
          "cachedResultName": "GPT-4.1-MINI"
        },
        "options": {},
        "messages": {
          "values": [
            {
              "role": "system",
              "content": "You are an AI designed to extract key information from a specified news article related to startup funding. You will receive the link to the article and its content in markdown format. Your task is to meticulously gather relevant data concerning startup funding as outlined below.\n\n### Input:\n- The URL of the news article discussing recent startup funding events.\n- The complete markdown text of the article.\n\n### Tasks:\n1. Review the provided article content and extract necessary information regarding companies that have received seed funding. If the article contains multiple instances of seed funding data, ensure you gather details for each company without addressing any generic explanations of seed funding itself.\n2. Do not use the article URL for extracting the data. Use it only for output.\n\n3. Extract the following information in JSON format for each company reported in the article:\n\n   - **companyName**: Name of the startup company.\n   - **companyWebsite**: Official website of the company (do not reference any URLs provided in the markdown).\n   - **companyLinkedIn**: URL of the company's LinkedIn page.\n   - **fundingAmount**: The total amount raised in this funding round (e.g., \"£950,000\" or \"$1.2 million\").\n   - **founderName**: An array containing the full names of all the founders.\n   - **founderLinkedIn**: An array of LinkedIn profile URLs for each founder (set to null if not available).\n   - **articleUrl**: Return the input article URL instead of the article content.\n\n### Output Format:\n- Provide your output strictly in JSON format, ensuring proper structure even if some fields contain null values. If multiple companies are mentioned, return an array of objects, each representing a different company.\n\n### JSON Example:\n```json\n[\n  {\n    \"companyName\": \"Sample Startup 1\",\n    \"companyWebsite\": \"https://www.samplestartup1.com\",\n    \"companyLinkedIn\": \"https://www.linkedin.com/company/sample-startup-1\",\n    \"fundingAmount\": \"$1.5 million\",\n    \"founderName\": [\"John Doe\", \"Jane Smith\"],\n    \"founderLinkedIn\": [\"https://www.linkedin.com/in/johndoe\", null],\n    \"articleUrl\": \"https://www.example.com/sample-article\"\n  },\n  {\n    \"companyName\": \"Sample Startup 2\",\n    \"companyWebsite\": \"https://www.samplestartup2.com\",\n    \"companyLinkedIn\": \"https://www.linkedin.com/company/sample-startup-2\",\n    \"fundingAmount\": \"$950,000\",\n    \"founderName\": [\"Alice Johnson\"],\n    \"founderLinkedIn\": [null],\n    \"articleUrl\": \"https://www.example.com/sample-article\"\n  }\n]\n```\n\n### Guidelines:\n- Utilize only verified information from the article provided.\n- Set any unavailable fields to null.\n- Avoid seeking additional information from external websites.\n- Refrain from providing interpretations; stick strictly to the facts as presented in the markdown content."
            },
            {
              "content": "=\narticle content in markdown format : {{ $json.data }}\narticle link : {{ $json.link }}\n"
            }
          ]
        },
        "simplify": false,
        "jsonOutput": true
      },
      "notesInFlow": true,
      "typeVersion": 1.8
    },
    {
      "id": "4c6baf0d-c407-40cb-8d4c-89f1a716de24",
      "name": "Markdown",
      "type": "n8n-nodes-base.markdown",
      "onError": "continueRegularOutput",
      "position": [
        416,
        176
      ],
      "parameters": {
        "html": "={{ $json.body }}",
        "options": {
          "ignore": "head, script, img",
          "useLinkReferenceDefinitions": true
        }
      },
      "typeVersion": 1
    },
    {
      "id": "a0d497c0-81cb-4835-a632-42035ddc01e8",
      "name": "Refactor article link",
      "type": "n8n-nodes-base.code",
      "notes": "Get the redirect URL",
      "position": [
        -256,
        176
      ],
      "parameters": {
        "jsCode": "/**\n * Extract the real article URL from the redirect URL of each item.\n */\nfor (const item of $input.all()) {\n  /** Redirect URL from the RSS item */\n  const rawLink = item.json.link;\n\n  /** Leave items without a string link untouched */\n  if (typeof rawLink !== 'string') continue;\n\n  let extractedUrl = rawLink;\n\n  /**\n   * The real URL is the value of the \"url\" query parameter:\n   * everything after \"?url=\" or \"&url=\" up to the next \"&\".\n   */\n  const match = rawLink.match(/[?&]url=([^&]+)/);\n\n  if (match && match[1]) {\n    /** Decode the URL-encoded value */\n    extractedUrl = decodeURIComponent(match[1]);\n  }\n\n  /** Replace the redirect URL with the real URL */\n  item.json.link = extractedUrl;\n}\nreturn $input.all();"
      },
      "notesInFlow": true,
      "typeVersion": 2
    },
    {
      "id": "8df2524a-ba30-48a0-afe7-059187fd334b",
      "name": "Add article link",
      "type": "n8n-nodes-base.code",
      "position": [
        192,
        176
      ],
      "parameters": {
        "jsCode": "/**\n * Merge the article link from the \"Refactor article link\" node into the output of the \"Get article Page\" node.\n */\n\n/** Input of the \"Get article Page\" node */\nconst inputForBrightData = $items(\"Refactor article link\");\n\n/** Output of the \"Get article Page\" node (the BrightData response) */\nconst outputOfBrightData = $input.all();\n\nreturn outputOfBrightData.map((item, index) => {\n  const input = inputForBrightData[index].json;\n  const output = item.json;\n\n  return {\n    json: {\n      ...output,\n      link: input.link // Add the link from the original input\n    }\n  };\n});"
      },
      "typeVersion": 2
    },
    {
      "id": "0d725bba-0d9e-4e18-a6e8-3fb10bb5835a",
      "name": "RSS Feed Trigger",
      "type": "n8n-nodes-base.rssFeedReadTrigger",
      "position": [
        -464,
        176
      ],
      "parameters": {
        "feedUrl": "https://www.google.co.in/alerts/feeds/02881064610578318478/7170584238880554951",
        "pollTimes": {
          "item": [
            {
              "mode": "everyX",
              "unit": "minutes",
              "value": 5
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "30fa6ec5-7cb1-45cb-bfce-169dc5e284f6",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -528,
        -32
      ],
      "parameters": {
        "width": 420,
        "height": 380,
        "content": "## Trigger & article discovery"
      },
      "typeVersion": 1
    },
    {
      "id": "c705f45e-9063-4738-85d8-2f67bea53ea5",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -64,
        -32
      ],
      "parameters": {
        "width": 620,
        "height": 380,
        "content": "## Content fetching & preparation"
      },
      "typeVersion": 1
    },
    {
      "id": "5cdcd2db-5e38-4245-8da1-281e28f238cc",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        592,
        -32
      ],
      "parameters": {
        "width": 380,
        "height": 380,
        "content": "## Data extraction using AI"
      },
      "typeVersion": 1
    },
    {
      "id": "ba4045f1-628b-4b6c-aef3-40f00c3f0e4e",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        992,
        -32
      ],
      "parameters": {
        "width": 380,
        "height": 380,
        "content": "## Extract valid startup entries from nested data"
      },
      "typeVersion": 1
    },
    {
      "id": "c5a79532-75a6-40b5-b7c1-a911ecb5cf82",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1392,
        -32
      ],
      "parameters": {
        "width": 280,
        "height": 380,
        "content": "## Append data to Excel sheet"
      },
      "typeVersion": 1
    },
    {
      "id": "edad49fb-c923-4374-b9e2-87eeb1e2630a",
      "name": "Get article Page",
      "type": "@brightdata/n8n-nodes-brightdata.brightData",
      "onError": "continueRegularOutput",
      "position": [
        -32,
        176
      ],
      "parameters": {
        "url": "={{ $json.link }}",
        "zone": {
          "__rl": true,
          "mode": "list",
          "value": "web_unlocker1"
        },
        "format": "json",
        "country": {
          "__rl": true,
          "mode": "list",
          "value": "us"
        },
        "requestOptions": {}
      },
      "retryOnFail": false,
      "typeVersion": 1
    },
    {
      "id": "684530ce-8472-4312-a3c6-6fc0c9c8ff84",
      "name": "Filter company data",
      "type": "n8n-nodes-base.code",
      "position": [
        1232,
        176
      ],
      "parameters": {
        "jsCode": "/**\n * Build the array of company details from the raw, unstructured data of the previous node.\n * Duplicate entries (same normalized company name) are removed.\n */\n\nconst results = [];\nconst seenCompanyNames = new Set();\n\nfunction extractValidStartups(obj) {\n  if (Array.isArray(obj)) {\n    for (const item of obj) {\n      extractValidStartups(item);\n    }\n  } else if (typeof obj === 'object' && obj !== null) {\n    // Skip if it's an error object\n    if (obj.error) return;\n\n    // Check if it looks like a startup object\n    if (obj.companyName) {\n      const key = obj.companyName.trim().toLowerCase(); // normalize name\n      if (!seenCompanyNames.has(key)) {\n        seenCompanyNames.add(key);\n        results.push({ json: obj });\n      }\n      return;\n    }\n\n    // Otherwise, recursively search its values\n    for (const key in obj) {\n      extractValidStartups(obj[key]);\n    }\n  }\n}\n\nfor (const item of $input.all()) {\n  // The Edit Fields node stores the model output under an empty field name\n  const root = item.json[\"\"];\n  if (!Array.isArray(root)) continue;\n\n  for (const entry of root) {\n    extractValidStartups(entry);\n  }\n}\n\nreturn results;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "8c50058f-58c4-424e-a45a-ea27df89a47d",
      "name": "Add data into excel sheet",
      "type": "n8n-nodes-base.httpRequest",
      "onError": "continueErrorOutput",
      "position": [
        1456,
        176
      ],
      "parameters": {
        "url": "https://graph.microsoft.com/v1.0/drives/{{drive-id}}/items/{{file-id}}/workbook/tables/{{sheet-id}}/rows",
        "method": "POST",
        "options": {
          "batching": {
            "batch": {
              "batchSize": 1,
              "batchInterval": 3000
            }
          }
        },
        "jsonBody": "={\n  \"values\": [\n    {{ $input.all().map((item, index) => \n      `${index > 0 ? ',' : ''}[` +\n      `\"${$now.format('yyyy-MM-dd \\'at\\' T')}\",` +\n      `\"${item.json.companyName || \"-\"}\",` +\n      `\"${item.json.companyWebsite || \"-\"}\",` +\n      `\"${item.json.companyLinkedIn || \"-\"}\",` +\n      `\"${item.json.fundingAmount || \"-\"}\",` +\n      `\"${Array.isArray(item.json.founderName) && item.json.founderName.filter(n => n).length > 0 \n          ? item.json.founderName.filter(n => n).join(', ') \n          : \"-\" }\",` +\n      `\"${Array.isArray(item.json.founderLinkedIn) && item.json.founderLinkedIn.filter(n => n).length > 0 \n          ? item.json.founderLinkedIn.filter(n => n).join(', ') \n          : \"-\" }\",` +\n      `\"${item.json.articleUrl || \"-\"}\"` +\n      `]`\n    ).join('\\n') }}\n  ]\n}",
        "sendBody": true,
        "specifyBody": "json",
        "authentication": "genericCredentialType",
        "genericAuthType": "oAuth2Api"
      },
      "executeOnce": true,
      "retryOnFail": true,
      "typeVersion": 4.2
    }
  ],
  "pinData": {},
  "connections": {
    "Markdown": {
      "main": [
        [
          {
            "node": "Message a model",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Edit Fields": {
      "main": [
        [
          {
            "node": "Filter company data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Message a model": {
      "main": [
        [
          {
            "node": "Edit Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Add article link": {
      "main": [
        [
          {
            "node": "Markdown",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get article Page": {
      "main": [
        [
          {
            "node": "Add article link",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "RSS Feed Trigger": {
      "main": [
        [
          {
            "node": "Refactor article link",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter company data": {
      "main": [
        [
          {
            "node": "Add data into excel sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Refactor article link": {
      "main": [
        [
          {
            "node": "Get article Page",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
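
The `jsonBody` expression of the last node assembles the request body by concatenating strings, so a company name or URL that contains a double quote would produce invalid JSON. A minimal sketch of a more defensive approach, written as plain JavaScript rather than as an n8n expression, is to build the rows as arrays and let `JSON.stringify` handle the escaping. The helpers `toCell` and `buildRowsBody` and the sample item are hypothetical and not part of the workflow. Unlike the original expression, a non-array `founderName` is passed through here instead of being replaced by "-".

```javascript
// Hypothetical helper: normalize one field into a single cell value.
function toCell(value) {
  // Arrays (founder names / LinkedIn URLs): drop null entries, join the rest.
  if (Array.isArray(value)) {
    const parts = value.filter(Boolean);
    return parts.length > 0 ? parts.join(', ') : '-';
  }
  // Missing or empty scalar values become "-", as in the original expression.
  return value || '-';
}

// Hypothetical helper: build the `{ "values": [[...], ...] }` body that the
// workflow posts to the table's /rows endpoint.
function buildRowsBody(items, timestamp) {
  const values = items.map(({ json }) => [
    timestamp,
    toCell(json.companyName),
    toCell(json.companyWebsite),
    toCell(json.companyLinkedIn),
    toCell(json.fundingAmount),
    toCell(json.founderName),
    toCell(json.founderLinkedIn),
    toCell(json.articleUrl),
  ]);
  // JSON.stringify escapes quotes and backslashes inside the values.
  return JSON.stringify({ values });
}

// Made-up sample item whose company name contains double quotes.
const sample = [
  {
    json: {
      companyName: 'Acme "AI" Labs',
      fundingAmount: '$1.2 million',
      founderName: ['Jane Doe', null],
      founderLinkedIn: [null],
      articleUrl: 'https://www.example.com/sample-article',
    },
  },
];

const body = buildRowsBody(sample, '2025-01-01 at 09:00');
console.log(body);
```

The column order matches the one used by the workflow's expression: timestamp, company name, website, LinkedIn page, funding amount, founders, founder LinkedIn profiles, article URL.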

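FAQ

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then adjust the credential settings as needed.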
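
In the exported JSON, the `connections` object refers to nodes by their `name`, so a renamed or translated node name leaves a connection pointing at nothing. Before importing, a quick sanity check along the following lines may help. It is only a sketch: `findDanglingNodeNames` is a hypothetical helper, and the miniature workflow object is made up to keep the example short.

```javascript
// Hypothetical helper: list every node name that appears in `connections`
// (as a source or as a target) but has no matching entry in `nodes`.
function findDanglingNodeNames(workflow) {
  const known = new Set(workflow.nodes.map((node) => node.name));
  const missing = new Set();

  for (const [source, outputs] of Object.entries(workflow.connections)) {
    if (!known.has(source)) missing.add(source);
    // outputs looks like { main: [ [ { node, type, index }, ... ], ... ] }
    for (const branches of Object.values(outputs)) {
      for (const branch of branches) {
        for (const target of branch || []) {
          if (!known.has(target.node)) missing.add(target.node);
        }
      }
    }
  }
  return [...missing];
}

// Made-up miniature workflow: the trigger points at a node that is absent.
const demo = {
  nodes: [{ name: 'RSS Feed Trigger' }, { name: 'Markdown' }],
  connections: {
    'RSS Feed Trigger': {
      main: [[{ node: 'Refactor article link', type: 'main', index: 0 }]],
    },
  },
};

console.log(findDanglingNodeNames(demo));
// → [ 'Refactor article link' ]
```

To check a real export, parse the copied JSON with `JSON.parse` and pass the result to the helper; an empty array means every connection resolves to a node.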

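What scenarios is this workflow suited for?

Intermediate - Lead Generation, AI Summarization

Does it cost anything?

This workflow is completely free and can be imported and used directly. Note, however, that third-party services used in the workflow (such as the OpenAI API) may require you to pay for them yourself.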
