Real-time scraping of Y Combinator startups with Apify and Google Sheets
This is a Lead Generation, Multimodal AI automation workflow containing 9 nodes. It mainly uses the GoogleSheets, Apify, and ManualTrigger nodes. It automates scraping Y Combinator startups with Apify and Google Sheets.
- Google Sheets API credentials
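Besides credentials, the workflow's own notes ask for a sheet whose header row already contains eleven case-sensitive column names. As a minimal sketch, the check below compares a header row against that list. The function name `missing_columns` and the example header row are hypothetical, and reading the header row from the real sheet is left out.

```python
# Column headers the workflow's notes require (case-sensitive).
REQUIRED_COLUMNS = [
    "Company", "Location", "Website", "LinkedIn", "Founded",
    "Description", "Industry Tags",
    "Founder 1 Name", "Founder 1 LinkedIn",
    "Founder 2 Name", "Founder 2 LinkedIn",
]


def missing_columns(header_row):
    """Return required headers absent from the sheet's first row (exact match)."""
    present = set(header_row)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical header row with a wrong-case "Linkedin" column.
example_header = ["Company", "Location", "Website", "Linkedin", "Founded"]
print(missing_columns(example_header))
```

Because the match is exact, a header such as "Linkedin" is reported as missing even though it differs only in case.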
Nodes used (9)
{
"id": "f0l6j5GkLScFOfqK",
"meta": {
"instanceId": "1a54c41d9050a8f1fa6f74ca858828ad9fb97b9fafa3e9760e576171c531a787",
"templateCredsSetupCompleted": true
},
"name": "Live-Automate Scraping Y Combinator Startups with Apify & Google Sheets",
"tags": [],
"nodes": [
{
"id": "4d88b9f9-6909-47c8-91a5-c27ebc97de49",
"name": "Ejecutar un Actor",
"type": "@apify/n8n-nodes-apify.apify",
"position": [
1632,
1632
],
"parameters": {
"actorId": {
"__rl": true,
"mode": "list",
"value": "XXsXDaNQLjoF4lgmU",
"cachedResultUrl": "https://console.apify.com/actors/XXsXDaNQLjoF4lgmU/input",
"cachedResultName": "Y Combinator Directory Scraper | Fast & Reliable | $4.5 / 1K (fatihtahta/y-combinator-directory-scraper)"
},
"customBody": "{\n \"maxCompanies\": 5,\n \"startUrls\": \"{https://www.ycombinator.com/companies?industry=Fintech&regions=America%20%2F%20Canada&team_size=%5B%221%22%2C%2225%22%5D}\",\n \"proxyConfiguration\": {\n \"useApifyProxy\": true\n }\n}"
},
"credentials": {
"apifyApi": {
"id": "8decwrzbYTySCGCT",
"name": "Apify account 4"
}
},
"typeVersion": 1
},
{
"id": "e524c759-a193-42b6-9553-683656413431",
"name": "Obtener elementos del dataset",
"type": "@apify/n8n-nodes-apify.apify",
"position": [
2432,
1968
],
"parameters": {
"resource": "Datasets",
"datasetId": "={{ $json.defaultDatasetId }}"
},
"credentials": {
"apifyApi": {
"id": "8decwrzbYTySCGCT",
"name": "Apify account 4"
}
},
"typeVersion": 1
},
{
"id": "4eea9bab-911c-4480-9073-831b8ac46571",
"name": "Nota adhesiva",
"type": "n8n-nodes-base.stickyNote",
"position": [
608,
1744
],
"parameters": {
"width": 528,
"height": 336,
"content": "### **Step 1 – Manual Trigger**\n\n- The workflow begins with a **Manual Trigger node**, allowing you to start the process on demand. \n- This approach ensures full control over when company data from **Y Combinator** is scraped and logged. \n"
},
"typeVersion": 1
},
{
"id": "b5814a97-7dd1-4488-8af3-6bf0af555d51",
"name": "Iniciar flujo de trabajo",
"type": "n8n-nodes-base.manualTrigger",
"position": [
816,
1936
],
"parameters": {},
"typeVersion": 1
},
{
"id": "3eacc0a3-ca74-4405-ad0e-a25b9b4b964e",
"name": "Nota adhesiva1",
"type": "n8n-nodes-base.stickyNote",
"position": [
1392,
1424
],
"parameters": {
"color": 3,
"width": 592,
"height": 368,
"content": "### **Step 2 – Apify Actor (Scrape Company Data)**\n\n- This step uses an **Apify Actor node** to scrape details of companies listed on **Y Combinator**. \n- You need to provide the **URL of the Y Combinator search page** with your desired filters applied (e.g., industry, location, funding stage). \n- The actor then extracts structured company data, including names, descriptions, websites, and other available details, preparing it for downstream logging and processing.\n"
},
"typeVersion": 1
},
{
"id": "d67e5ff1-ff84-4196-9a76-cc59215e4061",
"name": "Nota adhesiva2",
"type": "n8n-nodes-base.stickyNote",
"position": [
2176,
1760
],
"parameters": {
"color": 4,
"width": 592,
"height": 368,
"content": "### **Step 3 – Apify Get Dataset Items**\n\n- This step uses the **Apify Get Dataset Items node** to fetch the actual company data generated by the Apify Actor in the previous step. \n- The node requires the **Dataset ID** returned by the Apify Actor to retrieve structured results. \n- The output includes detailed company information (e.g., name, description, website, location, sector), which is then prepared for logging into Google Sheets.\n"
},
"typeVersion": 1
},
{
"id": "04149226-1821-419d-b7c6-f2288de0f4cc",
"name": "Nota adhesiva3",
"type": "n8n-nodes-base.stickyNote",
"position": [
3040,
1104
],
"parameters": {
"color": 5,
"width": 640,
"height": 720,
"content": "### **Step 4 – Add or Update Row in Google Sheet**\n\n- This step uses the **Google Sheets (Add or Update Row) node** to log the company data into a connected Google Sheet. \n- You must **select the target Google Document and specific Sheet** where the data will be stored. \n- Ensure the following columns are already created in the sheet (**case-sensitive**): \n - Company \n - Location \n - Website \n - LinkedIn \n - Founded \n - Description \n - Industry Tags \n - Founder 1 Name \n - Founder 1 LinkedIn \n - Founder 2 Name \n - Founder 2 LinkedIn \n\n- The node will automatically add new rows or update existing entries, keeping the sheet clean and up to date with the latest scraped company details.\n"
},
"typeVersion": 1
},
{
"id": "e0cff6ae-ea8b-47c6-8cc1-884459e8224e",
"name": "Agregar datos a la hoja Google",
"type": "n8n-nodes-base.googleSheets",
"position": [
3312,
1616
],
"parameters": {
"columns": {
"value": {
"Company": "={{ $json.company_name }}",
"Founded": "={{ $json.year_founded }}",
"Website": "={{ $json.website }}",
"LinkedIn": "={{ $json.company_linkedin }}",
"Location": "={{ $json.company_location }}",
"Description": "={{ $json.long_description }}",
"Industry Tags": "={{ $json['tags/0'] }} {{ $json['tags/1'] }} {{ $json['tags/2'] }} {{ $json['tags/3'] }}",
"Founder 1 Name": "={{ $json['founders/0/name'] }}",
"Founder 2 Name": "={{ $json['founders/1/name'] }}",
"Founder 1 LinkedIn": "={{ $json['founders/0/linkedin'] }}",
"Founder 2 LinkedIn": "={{ $json['founders/1/linkedin'] }}"
},
"schema": [
{
"id": "Company",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "Company",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Location",
"type": "string",
"display": true,
"required": false,
"displayName": "Location",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Website",
"type": "string",
"display": true,
"required": false,
"displayName": "Website",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "LinkedIn",
"type": "string",
"display": true,
"required": false,
"displayName": "LinkedIn",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founded",
"type": "string",
"display": true,
"required": false,
"displayName": "Founded",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Description",
"type": "string",
"display": true,
"required": false,
"displayName": "Description",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Industry Tags",
"type": "string",
"display": true,
"required": false,
"displayName": "Industry Tags",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 1 Name",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 1 Name",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 1 LinkedIn",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 1 LinkedIn",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 2 Name",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 2 Name",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "Founder 2 LinkedIn",
"type": "string",
"display": true,
"required": false,
"displayName": "Founder 2 LinkedIn",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "defineBelow",
"matchingColumns": [
"Company"
],
"attemptToConvertTypes": false,
"convertFieldsToString": false
},
"options": {},
"operation": "appendOrUpdate",
"sheetName": {
"__rl": true,
"mode": "list",
"value": "gid=0",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit#gid=0",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1AEOYMIRNgxYN3gihT1bIrGswnkCzuWbFljX2ac4XjUU/edit?usp=drivesdk",
"cachedResultName": "YCom Apify Scrapped "
}
},
"credentials": {
"googleSheetsOAuth2Api": {
"id": "dZG6jp43p2oX45HG",
"name": "Google Sheets account 4-Smit"
}
},
"typeVersion": 4.7
},
{
"id": "c8f614e2-2aa5-4f4a-8be9-090fb24bf616",
"name": "Nota adhesiva4",
"type": "n8n-nodes-base.stickyNote",
"position": [
368,
944
],
"parameters": {
"color": 3,
"width": 768,
"height": 672,
"content": "### **Step 0 – Prerequisites**\n\nBefore running the workflow, ensure the following configurations are complete:\n\n- **Apify Setup:**\n - Connect your Apify account in n8n. \n - Select the **Y Combinator Directory Scraper** actor. \n - Paste the Y Combinator search URL (with filters applied) into the `searchUrls` parameter. \n - Adjust the `maxCompanies` parameter to control the number of companies scraped per run. \n\n- **Google Sheets Setup:**\n - Connect your Google account using **OAuth2 credentials** with both **Google Sheets** and **Google Drive** features enabled. \n - Ensure the target Google Sheet is created in advance with the following column headers (**case-sensitive**): \n - Company \n - Location \n - Website \n - LinkedIn \n - Founded \n - Description \n - Industry Tags \n - Founder 1 Name \n - Founder 1 LinkedIn \n - Founder 2 Name \n - Founder 2 LinkedIn \n\n- **n8n Configuration:**\n - Confirm that both Apify and Google integrations are properly authenticated and available in your workflow.\n"
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "36ae4ec1-b59a-49a4-b4e6-0f80bd2111f3",
"connections": {
"Ejecutar un Actor": {
"main": [
[
{
"node": "Obtener elementos del dataset",
"type": "main",
"index": 0
}
]
]
},
"Iniciar flujo de trabajo": {
"main": [
[
{
"node": "Ejecutar un Actor",
"type": "main",
"index": 0
}
]
]
},
"Obtener elementos del dataset": {
"main": [
[
{
"node": "Agregar datos a la hoja Google",
"type": "main",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then update the credential settings as needed.
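To see which nodes will need their credentials re-pointed after import, a short script can scan the exported JSON. The sketch below assumes only the structure visible in the JSON above (a top-level `nodes` list whose entries may carry a `credentials` object). The function name `credentials_to_replace` and the trimmed `sample` are illustrative, not part of n8n.

```python
import json


def credentials_to_replace(workflow):
    """Map each node name to the credential types it references."""
    result = {}
    for node in workflow.get("nodes", []):
        creds = node.get("credentials")
        if creds:
            result[node["name"]] = sorted(creds)
    return result


# Trimmed-down stand-in for the exported workflow JSON above.
sample = json.loads("""
{
  "nodes": [
    {"name": "Ejecutar un Actor",
     "credentials": {"apifyApi": {"id": "8decwrzbYTySCGCT", "name": "Apify account 4"}}},
    {"name": "Iniciar flujo de trabajo", "parameters": {}},
    {"name": "Agregar datos a la hoja Google",
     "credentials": {"googleSheetsOAuth2Api": {"id": "dZG6jp43p2oX45HG", "name": "Google Sheets account 4-Smit"}}}
  ]
}
""")

for node_name, cred_types in credentials_to_replace(sample).items():
    print(f"{node_name}: {', '.join(cred_types)}")
```

The Google Sheets node also stores the original author's `documentId` and `sheetName`, so those values would likely need to be replaced with your own sheet as well.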
Which scenarios is this workflow suited to?
Intermediate - Lead Generation, Multimodal AI
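For a different lead segment, the filtered Y Combinator search URL passed to the actor can be assembled programmatically. This is a sketch that assumes the directory keeps the query parameters seen in the workflow's `customBody` (`industry`, `regions`, `team_size`). The helper `build_search_url` is hypothetical, and other filters are not covered.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.ycombinator.com/companies"


def build_search_url(industry, regions, team_size_range):
    """Build a filtered directory URL; team_size_range is a (min, max) pair."""
    low, high = team_size_range
    params = {
        "industry": industry,
        "regions": regions,
        # The workflow's URL encodes the range as a JSON-style list of strings.
        "team_size": f'["{low}","{high}"]',
    }
    # quote (not quote_plus) keeps spaces as %20, matching the workflow's URL.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


print(build_search_url("Fintech", "America / Canada", (1, 25)))
```

Called with the values above, the function reproduces the query string used in the workflow's Apify node.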
Is it paid?
This workflow is completely free; you can import and use it directly. Note, however, that third-party services used in the workflow (such as the Apify actor) may require payment on your own account.
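As a rough guide to third-party cost, the actor's display name in the JSON quotes "$4.5 / 1K". The sketch below assumes that figure means USD 4.50 per 1,000 scraped companies and that pricing scales linearly. Actual Apify billing may differ, and the function name `estimated_actor_cost` is hypothetical.

```python
def estimated_actor_cost(max_companies, price_per_thousand=4.5):
    """Rough actor cost in USD, assuming linear per-result pricing."""
    return max_companies / 1000 * price_per_thousand


# maxCompanies is 5 in the workflow's customBody.
print(f"${estimated_actor_cost(5):.4f}")
print(f"${estimated_actor_cost(2000):.2f}")
```

Raising `maxCompanies` in the Apify node's input increases the estimate proportionally under this assumption.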
Intuz (@intuz)
Workflow automation can help automate your routine activities and save money as well as hours of time. As a boutique tech consulting company, Intuz helps businesses with custom AI/ML, AI workflow automation, and software development. Automate your business workflows for: Sales, Marketing, Accounting, Finance, Operations, E-Commerce, Customer Support, Admin & Backoffice, Logistics & Supply Chain.