Real-Time Scraping of Y Combinator Startups with Apify and Google Sheets

Name: Real-Time Scraping of Y Combinator Startups with Apify and Google Sheets
Rating: 4.5 (10 reviews)
Author: Intuz

Intermediate

This is a 9-node automation workflow in the Lead Generation and Multimodal AI categories. It mainly uses Google Sheets, Apify, and Manual Trigger nodes to scrape Y Combinator startups with Apify and log the results to Google Sheets.

Prerequisites

• Google Sheets API credentials

• Apify API credentials (both Apify nodes in the JSON reference an `apifyApi` credential)

Nodes used (9)

Category

Lead Generation

Multimodal AI

Workflow preview

The connected nodes run in this order (node names as they appear in the JSON):

Iniciar flujo de trabajo (Manual Trigger) → Ejecutar un Actor (Apify, run an Actor) → Obtener elementos del dataset (Apify, get dataset items) → Agregar datos a la hoja Google (Google Sheets, append or update row)

Export workflow

Copy the following JSON configuration into n8n to import and use this workflow.

{
  "id": "f0l6j5GkLScFOfqK",
  "meta": {
    "instanceId": "1a54c41d9050a8f1fa6f74ca858828ad9fb97b9fafa3e9760e576171c531a787",
    "templateCredsSetupCompleted": true
  },
  "name": "Live-Automate Scraping Y Combinator Startups with Apify & Google Sheets",
  "tags": [],
  "nodes": [
    {
      "id": "4d88b9f9-6909-47c8-91a5-c27ebc97de49",
      "name": "Ejecutar un Actor",
      "type": "@apify/n8n-nodes-apify.apify",
      "position": [
        1632,
        1632
      ],
      "parameters": {
        "actorId": {
          "__rl": true,
          "mode": "list",
          "value": "XXsXDaNQLjoF4lgmU",
          "cachedResultUrl": "https://console.apify.com/actors/XXsXDaNQLjoF4lgmU/input",
          "cachedResultName": "Y Combinator Directory Scraper | Fast & Reliable | $4.5 / 1K (fatihtahta/y-combinator-directory-scraper)"
        },
        "customBody": "{\n  \"maxCompanies\": 5,\n  \"startUrls\": \"{https://www.ycombinator.com/companies?industry=Fintech&regions=America%20%2F%20Canada&team_size=%5B%221%22%2C%2225%22%5D}\",\n  \"proxyConfiguration\": {\n    \"useApifyProxy\": true\n  }\n}"
      },
      "credentials": {
        "apifyApi": {
          "id": "8decwrzbYTySCGCT",
          "name": "Apify account 4"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "e524c759-a193-42b6-9553-683656413431",
      "name": "Obtener elementos del dataset",
      "type": "@apify/n8n-nodes-apify.apify",
      "position": [
        2432,
        1968
      ],
      "parameters": {
        "resource": "Datasets",
        "datasetId": "={{ $json.defaultDatasetId }}"
      },
      "credentials": {
        "apifyApi": {
          "id": "8decwrzbYTySCGCT",
          "name": "Apify account 4"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "4eea9bab-911c-4480-9073-831b8ac46571",
      "name": "Nota adhesiva",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        608,
        1744
      ],
      "parameters": {
        "width": 528,
        "height": 336,
        "content": "### **Step 1 – Manual Trigger**\n\n- The workflow begins with a **Manual Trigger node**, allowing you to start the process on demand.  \n- This approach ensures full control over when company data from **Y Combinator** is scraped and logged.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "b5814a97-7dd1-4488-8af3-6bf0af555d51",
      "name": "Iniciar flujo de trabajo",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        816,
        1936
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "3eacc0a3-ca74-4405-ad0e-a25b9b4b964e",
      "name": "Nota adhesiva1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1392,
        1424
      ],
      "parameters": {
        "color": 3,
        "width": 592,
        "height": 368,
        "content": "### **Step 2 – Apify Actor (Scrape Company Data)**\n\n- This step uses an **Apify Actor node** to scrape details of companies listed on **Y Combinator**.  \n- You need to provide the **URL of the Y Combinator search page** with your desired filters applied (e.g., industry, location, funding stage).  \n- The actor then extracts structured company data, including names, descriptions, websites, and other available details, preparing it for downstream logging and processing.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "d67e5ff1-ff84-4196-9a76-cc59215e4061",
      "name": "Nota adhesiva2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2176,
        1760
      ],
      "parameters": {
        "color": 4,
        "width": 592,
        "height": 368,
        "content": "### **Step 3 – Apify Get Dataset Items**\n\n- This step uses the **Apify Get Dataset Items node** to fetch the actual company data generated by the Apify Actor in the previous step.  \n- The node requires the **Dataset ID** returned by the Apify Actor to retrieve structured results.  \n- The output includes detailed company information (e.g., name, description, website, location, sector), which is then prepared for logging into Google Sheets.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "04149226-1821-419d-b7c6-f2288de0f4cc",
      "name": "Nota adhesiva3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3040,
        1104
      ],
      "parameters": {
        "color": 5,
        "width": 640,
        "height": 720,
        "content": "### **Step 4 – Add or Update Row in Google Sheet**\n\n- This step uses the **Google Sheets (Add or Update Row) node** to log the company data into a connected Google Sheet.  \n- You must **select the target Google Document and specific Sheet** where the data will be stored.  \n- Ensure the following columns are already created in the sheet (**case-sensitive**):  \n  - Company  \n  - Location  \n  - Website  \n  - LinkedIn  \n  - Founded  \n  - Description  \n  - Industry Tags  \n  - Founder 1 Name  \n  - Founder 1 LinkedIn  \n  - Founder 2 Name  \n  - Founder 2 LinkedIn  \n\n- The node will automatically add new rows or update existing entries, keeping the sheet clean and up to date with the latest scraped company details.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "e0cff6ae-ea8b-47c6-8cc1-884459e8224e",
      "name": "Agregar datos a la hoja Google",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3312,
        1616
      ],
      "parameters": {
        "columns": {
          "value": {
            "Company": "={{ $json.company_name }}",
            "Founded": "={{ $json.year_founded }}",
            "Website": "={{ $json.website }}",
            "LinkedIn": "={{ $json.company_linkedin }}",
            "Location": "={{ $json.company_location }}",
            "Description": "={{ $json.long_description }}",
            "Industry Tags": "={{ $json['tags/0'] }} {{ $json['tags/1'] }} {{ $json['tags/2'] }} {{ $json['tags/3'] }}",
            "Founder 1 Name": "={{ $json['founders/0/name'] }}",
            "Founder 2 Name": "={{ $json['founders/1/name'] }}",
            "Founder 1 LinkedIn": "={{ $json['founders/0/linkedin'] }}",
            "Founder 2 LinkedIn": "={{ $json['founders/1/linkedin'] }}"
          },
          "schema": [
            {
              "id": "Company",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Company",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Location",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Location",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Website",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Website",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "LinkedIn",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "LinkedIn",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founded",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founded",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Description",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Description",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Industry Tags",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Industry Tags",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 1 Name",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 1 Name",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 1 LinkedIn",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 1 LinkedIn",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 2 Name",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 2 Name",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Founder 2 LinkedIn",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Founder 2 LinkedIn",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "Company"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "appendOrUpdate",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit#gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit?usp=drivesdk",
          "cachedResultName": "YCom Apify Scrapped "
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "id": "dZG6jp43p2oX45HG",
          "name": "Google Sheets account 4-Smit"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "c8f614e2-2aa5-4f4a-8be9-090fb24bf616",
      "name": "Nota adhesiva4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        368,
        944
      ],
      "parameters": {
        "color": 3,
        "width": 768,
        "height": 672,
        "content": "### **Step 0 – Prerequisites**\n\nBefore running the workflow, ensure the following configurations are complete:\n\n- **Apify Setup:**\n  - Connect your Apify account in n8n.  \n  - Select the **Y Combinator Directory Scraper** actor.  \n  - Paste the Y Combinator search URL (with filters applied) into the `startUrls` parameter.  \n  - Adjust the `maxCompanies` parameter to control the number of companies scraped per run.  \n\n- **Google Sheets Setup:**\n  - Connect your Google account using **OAuth2 credentials** with both **Google Sheets** and **Google Drive** features enabled.  \n  - Ensure the target Google Sheet is created in advance with the following column headers (**case-sensitive**):  \n    - Company  \n    - Location  \n    - Website  \n    - LinkedIn  \n    - Founded  \n    - Description  \n    - Industry Tags  \n    - Founder 1 Name  \n    - Founder 1 LinkedIn  \n    - Founder 2 Name  \n    - Founder 2 LinkedIn  \n\n- **n8n Configuration:**\n  - Confirm that both Apify and Google integrations are properly authenticated and available in your workflow.\n"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "pinData": {},
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "36ae4ec1-b59a-49a4-b4e6-0f80bd2111f3",
  "connections": {
    "Ejecutar un Actor": {
      "main": [
        [
          {
            "node": "Obtener elementos del dataset",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Iniciar flujo de trabajo": {
      "main": [
        [
          {
            "node": "Ejecutar un Actor",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Obtener elementos del dataset": {
      "main": [
        [
          {
            "node": "Agregar datos a la hoja Google",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
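
The Step 3 and Step 4 notes in the JSON describe fetching flattened dataset items and writing them to the sheet with `appendOrUpdate`, matched on the `Company` column. The sketch below illustrates that mapping and matching logic in plain Python, outside n8n. The column names and dataset keys are copied from the Google Sheets node; the helper names (`to_sheet_row`, `append_or_update`) and the sample item are hypothetical, and the in-memory list only stands in for the sheet. Two simplifications are assumptions of this sketch rather than confirmed node behavior: empty tags are skipped when joining, and a matching row is replaced whole.

```python
from typing import Any

# Sheet column -> flattened dataset key, copied from the Google Sheets node mapping.
COLUMN_KEYS = {
    "Company": "company_name",
    "Location": "company_location",
    "Website": "website",
    "LinkedIn": "company_linkedin",
    "Founded": "year_founded",
    "Description": "long_description",
    "Founder 1 Name": "founders/0/name",
    "Founder 1 LinkedIn": "founders/0/linkedin",
    "Founder 2 Name": "founders/1/name",
    "Founder 2 LinkedIn": "founders/1/linkedin",
}
TAG_KEYS = ["tags/0", "tags/1", "tags/2", "tags/3"]


def to_sheet_row(item: dict[str, Any]) -> dict[str, str]:
    """Build one sheet row from a flattened dataset item; missing keys become ""."""
    row = {col: str(item.get(key, "")) for col, key in COLUMN_KEYS.items()}
    # The node joins up to four tags with spaces; this sketch skips empty ones.
    row["Industry Tags"] = " ".join(str(item[k]) for k in TAG_KEYS if item.get(k))
    return row


def append_or_update(rows: list[dict[str, str]], new_row: dict[str, str]) -> None:
    """Replace the row whose "Company" value matches, otherwise append."""
    for i, existing in enumerate(rows):
        if existing["Company"] == new_row["Company"]:
            rows[i] = new_row
            return
    rows.append(new_row)


# Made-up sample item, for illustration only.
sample = {
    "company_name": "Acme Pay",
    "website": "https://example.com",
    "tags/0": "Fintech",
    "tags/1": "Payments",
    "founders/0/name": "Jane Doe",
}
sheet: list[dict[str, str]] = []
append_or_update(sheet, to_sheet_row(sample))
# A second run with the same company updates the row instead of adding one.
append_or_update(sheet, to_sheet_row({**sample, "website": "https://example.org"}))
print(len(sheet), sheet[0]["Website"], sheet[0]["Industry Tags"])
```

Because matching uses only the `Company` column, two different startups with the same name would overwrite each other in this sketch; matching on `Website` instead may be worth considering if that matters for your data.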

Frequently asked questions

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then adjust the credential settings as needed.
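
Besides credentials, the Step 0 note in the JSON asks you to supply your own filtered Y Combinator search URL and adjust `maxCompanies`. The sketch below shows one way to build such a URL and a matching `customBody` string in Python before pasting it into the Apify node. The filter names (`industry`, `regions`, `team_size`) and the body shape are taken from the workflow's own `customBody`; the function names are hypothetical, and whether the actor accepts other filters or a different `startUrls` shape is not confirmed here.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://www.ycombinator.com/companies"


def build_search_url(industry: str, regions: str, team_size: tuple[int, int]) -> str:
    """Encode directory filters as query parameters, mirroring the workflow's URL."""
    low, high = team_size
    params = {
        "industry": industry,
        "regions": regions,
        # team_size is written as a JSON-style list of quoted numbers, e.g. ["1","25"].
        "team_size": f'["{low}","{high}"]',
    }
    # quote (rather than quote_plus) encodes spaces as %20; urlencode's default
    # safe="" means "/" is percent-encoded as well.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


def build_custom_body(url: str, max_companies: int) -> str:
    """Return a customBody string shaped like the one in the exported Apify node."""
    body = {
        "maxCompanies": max_companies,
        # The exported workflow wraps the URL in braces inside a string; kept as-is.
        "startUrls": "{" + url + "}",
        "proxyConfiguration": {"useApifyProxy": True},
    }
    return json.dumps(body, indent=2)


url = build_search_url("Fintech", "America / Canada", (1, 25))
print(url)
print(build_custom_body(url, max_companies=5))
```

With the arguments shown, the generated URL is the same one that appears in the exported `customBody`, so the function can be checked against the original before you change the filters.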

Which scenarios is this workflow suited for?

Intermediate - Lead Generation, Multimodal AI

Is it paid?

The workflow itself is completely free; you can import and use it directly. Note, however, that third-party services used in the workflow may require payment on your own account. Here that means Apify: the selected actor's listing name shows "$4.5 / 1K".
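
As a rough guide to what a run might cost, the actor selected in the JSON shows "$4.5 / 1K" in its listing name, and the exported input sets `maxCompanies` to 5. The sketch below turns that into an estimate. It ASSUMES the listed price is charged per 1,000 scraped companies and ignores any other usage the Apify platform may bill separately; neither point is confirmed by the source, and the function name is hypothetical.

```python
PRICE_PER_THOUSAND_USD = 4.5  # from the actor's listing name: "$4.5 / 1K"


def estimate_run_cost(max_companies: int) -> float:
    """Estimated actor cost in USD, ASSUMING the price is per 1,000 companies."""
    return max_companies / 1000 * PRICE_PER_THOUSAND_USD


# Compare the exported setting (5) with larger runs.
for n in (5, 200, 1000):
    print(f"{n} companies: about ${estimate_run_cost(n):.4f}")
```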

Recommended related workflows

Live automated Upwork proposal generation using Apify, Google Gemini, and Sheets

Automate AI-written Upwork proposal generation with Apify, Google Gemini, and Sheets

CB-funded companies and information research

Automated lead generation and email outreach: Apify, Apollo.io, GPT-4, and Google Sheets

Automated AI-driven lead generation from LinkedIn jobs with Apify, Apollo.io, and Google Gemini

LinkedIn jobs lead-generation automation: Apify, Apollo.io, and Google Gemini

Live automated LinkedIn profile research and AI outreach (using Apify, Gemini, and Sheets)

Automated LinkedIn profile research and email outreach with Apify, Gemini, and Sheets

Hyper-personalized email campaigns with AI, Gmail, and Google Sheets

Hyper-personalized email campaigns using AI, Gmail, and Google Sheets

Sales development automation using LinkedIn job signals, Apify, Apollo.io, and Google Gemini

Generate personalized sales outreach from LinkedIn job signals with Apify and Google Gemini

Workflow information

Difficulty level

Intermediate

Number of nodes: 9

Categories: 2

Node types: 4

Difficulty description

Suited to users with intermediate experience; medium-complexity workflows with 6-15 nodes

Author

Intuz

@intuz

Workflow automation can take over your routine activities and save you money as well as hours of time. As a boutique tech consulting company, Intuz helps businesses with custom AI/ML, AI workflow automation, and software development. Automate your business workflows for: Sales, Marketing, Accounting, Finance, Operations, E-Commerce, Customer Support, Admin & Backoffice, Logistics & Supply Chain.
