Live Automated Scraping of Y Combinator Startups with Apify and Google Sheets

Advanced

This is a 9-node automation workflow in the Lead Generation and Multimodal AI categories. It mainly uses the Google Sheets, Apify, and Manual Trigger nodes. It scrapes Y Combinator startups with Apify and logs the results to Google Sheets.

Prerequisites
  • Google Sheets API credentials
  • Apify API credentials
Export workflow
Copy the following JSON configuration and import it into n8n
{
  "id": "f0l6j5GkLScFOfqK",
  "meta": {
    "instanceId": "1a54c41d9050a8f1fa6f74ca858828ad9fb97b9fafa3e9760e576171c531a787",
    "templateCredsSetupCompleted": true
  },
  "name": "Live-Automate Scraping Y Combinator Startups with Apify & Google Sheets",
  "tags": [],
  "nodes": [
    {
      "id": "4d88b9f9-6909-47c8-91a5-c27ebc97de49",
      "name": "Actor ausführen",
      "type": "@apify/n8n-nodes-apify.apify",
      "position": [
        1632,
        1632
      ],
      "parameters": {
        "actorId": {
          "__rl": true,
          "mode": "list",
          "value": "XXsXDaNQLjoF4lgmU",
          "cachedResultUrl": "https://console.apify.com/actors/XXsXDaNQLjoF4lgmU/input",
          "cachedResultName": "Y Combinator Directory Scraper | Fast & Reliable | $4.5 / 1K (fatihtahta/y-combinator-directory-scraper)"
        },
        "customBody": "{\n  \"maxCompanies\": 5,\n  \"startUrls\": \"{https://www.ycombinator.com/companies?industry=Fintech&regions=America%20%2F%20Canada&team_size=%5B%221%22%2C%2225%22%5D}\",\n  \"proxyConfiguration\": {\n    \"useApifyProxy\": true\n  }\n}"
      },
      "credentials": {
        "apifyApi": {
          "id": "8decwrzbYTySCGCT",
          "name": "Apify account 4"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "e524c759-a193-42b6-9553-683656413431",
      "name": "Datensatzelemente abrufen",
      "type": "@apify/n8n-nodes-apify.apify",
      "position": [
        2432,
        1968
      ],
      "parameters": {
        "resource": "Datasets",
        "datasetId": "={{ $json.defaultDatasetId }}"
      },
      "credentials": {
        "apifyApi": {
          "id": "8decwrzbYTySCGCT",
          "name": "Apify account 4"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "4eea9bab-911c-4480-9073-831b8ac46571",
      "name": "Haftnotiz",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        608,
        1744
      ],
      "parameters": {
        "width": 528,
        "height": 336,
        "content": "### **Step 1 – Manual Trigger**\n\n- The workflow begins with a **Manual Trigger node**, allowing you to start the process on demand.  \n- This approach ensures full control over when company data from **Y Combinator** is scraped and logged.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "b5814a97-7dd1-4488-8af3-6bf0af555d51",
      "name": "Workflow starten",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        816,
        1936
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "3eacc0a3-ca74-4405-ad0e-a25b9b4b964e",
      "name": "Haftnotiz1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1392,
        1424
      ],
      "parameters": {
        "color": 3,
        "width": 592,
        "height": 368,
        "content": "### **Step 2 – Apify Actor (Scrape Company Data)**\n\n- This step uses an **Apify Actor node** to scrape details of companies listed on **Y Combinator**.  \n- You need to provide the **URL of the Y Combinator search page** with your desired filters applied (e.g., industry, location, funding stage).  \n- The actor then extracts structured company data, including names, descriptions, websites, and other available details, preparing it for downstream logging and processing.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "d67e5ff1-ff84-4196-9a76-cc59215e4061",
      "name": "Haftnotiz2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2176,
        1760
      ],
      "parameters": {
        "color": 4,
        "width": 592,
        "height": 368,
        "content": "### **Step 3 – Apify Get Dataset Items**\n\n- This step uses the **Apify Get Dataset Items node** to fetch the actual company data generated by the Apify Actor in the previous step.  \n- The node requires the **Dataset ID** returned by the Apify Actor to retrieve structured results.  \n- The output includes detailed company information (e.g., name, description, website, location, sector), which is then prepared for logging into Google Sheets.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "04149226-1821-419d-b7c6-f2288de0f4cc",
      "name": "Haftnotiz3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3040,
        1104
      ],
      "parameters": {
        "color": 5,
        "width": 640,
        "height": 720,
        "content": "### **Step 4 – Add or Update Row in Google Sheet**\n\n- This step uses the **Google Sheets (Add or Update Row) node** to log the company data into a connected Google Sheet.  \n- You must **select the target Google Document and specific Sheet** where the data will be stored.  \n- Ensure the following columns are already created in the sheet (**case-sensitive**):  \n  - Company  \n  - Location  \n  - Website  \n  - LinkedIn  \n  - Founded  \n  - Description  \n  - Industry Tags  \n  - Founder 1 Name  \n  - Founder 1 LinkedIn  \n  - Founder 2 Name  \n  - Founder 2 LinkedIn  \n\n- The node will automatically add new rows or update existing entries, keeping the sheet clean and up to date with the latest scraped company details.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "e0cff6ae-ea8b-47c6-8cc1-884459e8224e",
      "name": "Daten zu Google Sheet hinzufügen",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3312,
        1616
      ],
      "parameters": {
        "columns": {
          "value": {
            "Company": "={{ $json.company_name }}",
            "Founded": "={{ $json.year_founded }}",
            "Website": "={{ $json.website }}",
            "LinkedIn": "={{ $json.company_linkedin }}",
            "Location": "={{ $json.company_location }}",
            "Description": "={{ $json.long_description }}",
            "Industry Tags": "={{ $json['tags/0'] }} {{ $json['tags/1'] }} {{ $json['tags/2'] }} {{ $json['tags/3'] }}",
            "Founder 1 Name": "={{ $json['founders/0/name'] }}",
            "Founder 2 Name": "={{ $json['founders/1/name'] }}",
            "Founder 1 LinkedIn": "={{ $json['founders/0/linkedin'] }}",
            "Founder 2 LinkedIn": "={{ $json['founders/1/linkedin'] }}"
          },
          "schema": [
            {
              "id": "Company",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Company",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Location",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Location",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Website",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Website",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "LinkedIn",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "LinkedIn",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founded",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founded",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Description",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Description",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Industry Tags",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Industry Tags",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 1 Name",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 1 Name",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 1 LinkedIn",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 1 LinkedIn",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 2 Name",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 2 Name",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 2 LinkedIn",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 2 LinkedIn",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "Company"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "appendOrUpdate",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit#gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit?usp=drivesdk",
          "cachedResultName": "YCom Apify Scrapped "
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "id": "dZG6jp43p2oX45HG",
          "name": "Google Sheets account 4-Smit"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "c8f614e2-2aa5-4f4a-8be9-090fb24bf616",
      "name": "Haftnotiz4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        368,
        944
      ],
      "parameters": {
        "color": 3,
        "width": 768,
        "height": 672,
        "content": "### **Step 0 – Prerequisites**\n\nBefore running the workflow, ensure the following configurations are complete:\n\n- **Apify Setup:**\n  - Connect your Apify account in n8n.  \n  - Select the **Y Combinator Directory Scraper** actor.  \n  - Paste the Y Combinator search URL (with filters applied) into the `startUrls` parameter.  \n  - Adjust the `maxCompanies` parameter to control the number of companies scraped per run.  \n\n- **Google Sheets Setup:**\n  - Connect your Google account using **OAuth2 credentials** with both **Google Sheets** and **Google Drive** features enabled.  \n  - Ensure the target Google Sheet is created in advance with the following column headers (**case-sensitive**):  \n    - Company  \n    - Location  \n    - Website  \n    - LinkedIn  \n    - Founded  \n    - Description  \n    - Industry Tags  \n    - Founder 1 Name  \n    - Founder 1 LinkedIn  \n    - Founder 2 Name  \n    - Founder 2 LinkedIn  \n\n- **n8n Configuration:**\n  - Confirm that both Apify and Google integrations are properly authenticated and available in your workflow.\n"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "pinData": {},
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "36ae4ec1-b59a-49a4-b4e6-0f80bd2111f3",
  "connections": {
    "Actor ausführen": {
      "main": [
        [
          {
            "node": "Datensatzelemente abrufen",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Workflow starten": {
      "main": [
        [
          {
            "node": "Actor ausführen",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Datensatzelemente abrufen": {
      "main": [
        [
          {
            "node": "Daten zu Google Sheet hinzufügen",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
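The column mapping in the Google Sheets node reads flattened keys such as `tags/0` and `founders/0/name`, and the `appendOrUpdate` operation matches rows on the `Company` column. The following is a minimal Python sketch of that logic, not the node's actual implementation. It assumes dataset items arrive already flattened, as the expressions above suggest. `FIELD_MAP`, `to_sheet_row`, and `append_or_update` are hypothetical names, and an in-memory list stands in for the sheet.

```python
from typing import Any

# Sheet header -> flattened dataset key, copied from the node's column mapping.
FIELD_MAP = {
    "Company": "company_name",
    "Location": "company_location",
    "Website": "website",
    "LinkedIn": "company_linkedin",
    "Founded": "year_founded",
    "Description": "long_description",
    "Founder 1 Name": "founders/0/name",
    "Founder 1 LinkedIn": "founders/0/linkedin",
    "Founder 2 Name": "founders/1/name",
    "Founder 2 LinkedIn": "founders/1/linkedin",
}


def _text(value: Any) -> str:
    """Render a missing value as an empty cell, anything else as text."""
    return "" if value is None else str(value)


def to_sheet_row(item: dict[str, Any]) -> dict[str, str]:
    """Map one flattened dataset item to the sheet's column headers."""
    row = {header: _text(item.get(key)) for header, key in FIELD_MAP.items()}
    # The template joins tags/0..tags/3 with spaces; this sketch also skips
    # missing tags so the cell has no stray spaces (an assumption, not the
    # node's documented behavior).
    tags = [_text(item.get(f"tags/{i}")) for i in range(4)]
    row["Industry Tags"] = " ".join(tag for tag in tags if tag)
    return row


def append_or_update(sheet: list[dict[str, str]], row: dict[str, str],
                     key: str = "Company") -> str:
    """Update the first row whose key column matches, otherwise append."""
    for existing in sheet:
        if existing.get(key) == row[key]:
            existing.update(row)
            return "updated"
    sheet.append(dict(row))
    return "appended"
```

Because matching is on `Company`, running the workflow twice with the same filters should refresh existing rows rather than duplicate them, provided the company names are unchanged.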
Frequently asked questions

How do I use this workflow?

Copy the JSON above, create a new workflow in your n8n instance, and choose "Import from JSON". Paste the configuration and adjust the credentials as needed.
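After importing, you will probably want your own filters in the actor input. Here is a minimal Python sketch of how the `customBody` could be assembled. It assumes the query parameter names seen in the template's URL (`industry`, `regions`, `team_size`); `build_search_url` and `build_actor_input` are hypothetical helpers. The sketch mirrors the template's `startUrls` value, braces included, so check the actor's input schema for the shape it actually expects.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://www.ycombinator.com/companies"


def build_search_url(industry: str, regions: str, team_size: tuple[int, int]) -> str:
    """Build a filtered directory URL using the parameter names from the template."""
    low, high = team_size
    params = {
        "industry": industry,
        "regions": regions,
        # The template encodes the range as a compact JSON array of strings.
        "team_size": json.dumps([str(low), str(high)], separators=(",", ":")),
    }
    # quote (rather than the default quote_plus) turns spaces into %20,
    # matching the URL in the template.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


def build_actor_input(url: str, max_companies: int = 5) -> str:
    """Return a JSON body with the same three fields as the template's customBody."""
    body = {
        "maxCompanies": max_companies,
        # Mirrors the template verbatim: the URL wrapped in braces, as a string.
        "startUrls": "{" + url + "}",
        "proxyConfiguration": {"useApifyProxy": True},
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    url = build_search_url("Fintech", "America / Canada", (1, 25))
    print(build_actor_input(url, max_companies=5))
```

The printed body can be pasted into the actor node's custom body field; raise `max_companies` only after a small test run looks right.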

Which scenarios is this workflow suited for?

Advanced - Lead Generation, Multimodal AI

Does it cost anything?

This workflow is completely free. Note, however, that third-party services used in the workflow (here Apify, whose actor listing quotes "$4.5 / 1K") may incur charges.
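As a rough guide to what a run might cost, the actor's listing name in the JSON above quotes "$4.5 / 1K". Below is a minimal sketch of the arithmetic. It assumes that figure means USD 4.50 per 1,000 scraped companies and ignores any other platform charges, so check the actor's page for the actual pricing. `estimate_actor_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Assumption: "$4.5 / 1K" in the actor listing means USD per 1,000 companies.
PRICE_PER_1K = Decimal("4.5")


def estimate_actor_cost(max_companies: int,
                        price_per_1k: Decimal = PRICE_PER_1K) -> Decimal:
    """Estimate the actor charge in USD for one run capped at max_companies."""
    if max_companies < 0:
        raise ValueError("max_companies must be non-negative")
    return price_per_1k * max_companies / 1000


if __name__ == "__main__":
    # The template sets maxCompanies to 5.
    print(f"Estimated cost: ${estimate_actor_cost(5)}")
```

Under that assumption, the template's default of 5 companies costs only a few cents per run, while a run of 1,000 companies would cost about $4.50.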

Workflow information
Difficulty
Advanced
Number of nodes: 9
Categories: 2
Node types: 4
Difficulty description

For experienced users; medium-complexity workflows with 6-15 nodes

Author
Intuz

@intuz

Workflow automation can help automate your routine activities and help save $$$ as well as hours of time. As a boutique tech consulting company, Intuz helps businesses with custom AI/ML, AI workflow automations, and software development. Automate your business workflows for: Sales, Marketing, Accounting, Finance, Operations, E-Commerce, Customer Support, Admin & Backoffice, Logistics & Supply Chain.
