Airline Web Check-in Scraper with AI and Vector Database Storage Using n8n

Name: Airline Web Check-in Scraper with AI and Vector Database Storage Using n8n
Rating: 4.5 (10 reviews)
Author: Oneclick AI Squad
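Intermediate
This is an automation workflow in the Document Extraction and AI RAG categories, containing 14 nodes. It mainly uses nodes such as Wait, HttpRequest, GoogleSheets, SplitInBatches, and ChainLlm. It extracts airline web check-in data using Ollama AI, Google Sheets, and a Postgres vector database.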
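Prerequisites
• Authentication credentials for the target API may be required
• Google Sheets API credentials
• Ollama API credentials (used by the chat model and embeddings nodes)
• Postgres credentials (used by the PGVector vector store node)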
Nodes used (14)

DocumentDefaultDataLoader
TextSplitterTokenSplitter
Categories

Document Extraction
AI RAG (retrieval-augmented generation)
Workflow preview
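Loop Over Items
Fetch Airline URLs
Chat Trigger - Start
Scrape Airline Webpage
Extract Info with LLM
Chat Model
Wait for Response
Store Extracted Info
Generate Embeddings
Prepare Text for Vector DB
Split Long Text
Save to Vector DB
Wait Before Next Batch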
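The node list above is not in execution order. As a rough illustration, the sketch below walks the `main` connections transcribed from the `connections` section of the workflow JSON to print the order in which items flow. The dictionary is hand-copied for this example (trailing tab characters that appear in some connection names are omitted), and `walk_main_path` is a hypothetical helper, not part of n8n. Chat Model, Generate Embeddings, Prepare Text for Vector DB, and Split Long Text do not appear here because they attach as `ai_*` sub-nodes rather than through `main` connections.

```python
# Main-path edges, transcribed by hand from the "connections" section of the
# workflow JSON (assumption: trailing tabs in some node names are dropped).
MAIN_EDGES = {
    "Chat Trigger - Start": ["Fetch Airline URLs"],
    "Fetch Airline URLs": ["Loop Over Items"],
    "Loop Over Items": ["Scrape Airline Webpage"],
    "Scrape Airline Webpage": ["Extract Info with LLM"],
    "Extract Info with LLM": ["Wait for Response"],
    "Wait for Response": ["Store Extracted Info"],
    "Store Extracted Info": ["Save to Vector DB"],
    "Save to Vector DB": ["Wait Before Next Batch"],
    "Wait Before Next Batch": ["Loop Over Items"],  # loops back for the next batch
}


def walk_main_path(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Follow the first outgoing edge from each node until a node repeats."""
    order: list[str] = []
    seen: set[str] = set()
    node = start
    while node is not None and node not in seen:
        order.append(node)
        seen.add(node)
        targets = edges.get(node, [])
        node = targets[0] if targets else None
    return order


path = walk_main_path("Chat Trigger - Start", MAIN_EDGES)
print(" -> ".join(path))
```

The walk stops when it reaches Loop Over Items a second time, which corresponds to the point where the workflow starts the next batch.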
Export workflow
Copy the JSON configuration below and import it into n8n to use this workflow
{
  "id": "FLn2skSh92HNO2SS",
  "meta": {
    "instanceId": "dd69efaf8212c74ad206700d104739d3329588a6f3f8381a46a481f34c9cc281",
    "templateCredsSetupCompleted": true
  },
  "name": "Airline Web Check-in Scraper with AI and Vector Database Storage Using n8n",
  "tags": [],
  "nodes": [
    {
      "id": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
      "name": "Loop Over Items",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        -220,
        175
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 3
    },
    {
      "id": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
      "name": "Fetch Airline URLs\t",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        -440,
        175
      ],
      "parameters": {
        "options": {},
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": 2125635496,
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
          "cachedResultName": "airline_faq_urls"
        },
        "authentication": "serviceAccount"
      },
      "credentials": {
        "googleApi": {
          "id": "ScSS2KxGQULuPtdy",
          "name": "Google Sheets- test"
        }
      },
      "typeVersion": 4.5
    },
    {
      "id": "7e2ca713-229f-490c-bd2e-481cf8f18184",
      "name": "Chat Trigger - Start\t",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "position": [
        -660,
        175
      ],
      "webhookId": "6c85024c-928b-4f43-82b3-d1469283586f",
      "parameters": {
        "public": true,
        "options": {}
      },
      "typeVersion": 1.1
    },
    {
      "id": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
      "name": "Scrape Airline Webpage\t",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        0,
        0
      ],
      "parameters": {
        "url": "=https://r.jina.ai/{{ $json['WEB CHECK IN URL'] }}",
        "method": "POST",
        "options": {},
        "jsonHeaders": "{\n  \"Cookie\": \"cookie-keyname1=cookie-value1; cookie-keyname2=cookie-value2; cookie-keyname3=cookie-value3\"\n}\n",
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "specifyHeaders": "json",
        "genericAuthType": "httpHeaderAuth"
      },
      "credentials": {
        "httpHeaderAuth": {
          "id": "KCqBydsOZHvzNKAI",
          "name": "Header Auth account"
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "27072e20-58dc-49e2-ae6b-1053750607f9",
      "name": "Extract Info with LLM\t",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        220,
        0
      ],
      "parameters": {
        "text": "={{ $json.data }}",
        "messages": {
          "messageValues": [
            {
              "message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n  * \"mobile\\_boarding\\_pass\\_available\": true/false\n  * \"printed\\_boarding\\_pass\\_required\": true/false\n  * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n  * \"phone\": \"<Phone Number>\"\n  * \"email\": \"<Email>\"\n  * \"support\\_url\": \"<Support URL>\"\n  * \"chat\\_url\": \"<Chat URL>\"\n  * \"operating\\_hours\": \"<Hours>\"\n  * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n  * \"hand\\_baggage\":\n\n    * \"weight\\_limit\": \"<Weight Limit>\"\n    * \"size\\_limit\": \"<Size Limit>\"\n    * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n    * \"special\\_conditions\": \"<Any special baggage conditions>\"\n  * \"checked\\_baggage\":\n\n    * \"general\\_rules\": \"<Baggage Rules>\"\n    * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n    * \"baggage\\_calculator\\_url\": \"<URL>\"\n    * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n  * \"conditions\": \"<Refund conditions>\"\n  * \"processing\\_time\": \"<Processing Time>\"\n  * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n  * \"conditions\": \"<Cancellation conditions>\"\n  * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n  * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n  * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n  * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n  * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n  * \"faq\\_url\": \"<FAQ Page URL>\"\n  * \"questions\\_answers\": \\[\n    {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n    {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n    ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
            },
            {
              "type": "AIMessagePromptTemplate",
              "message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n  * \"mobile\\_boarding\\_pass\\_available\": true/false\n  * \"printed\\_boarding\\_pass\\_required\": true/false\n  * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n  * \"phone\": \"<Phone Number>\"\n  * \"email\": \"<Email>\"\n  * \"support\\_url\": \"<Support URL>\"\n  * \"chat\\_url\": \"<Chat URL>\"\n  * \"operating\\_hours\": \"<Hours>\"\n  * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n  * \"hand\\_baggage\":\n\n    * \"weight\\_limit\": \"<Weight Limit>\"\n    * \"size\\_limit\": \"<Size Limit>\"\n    * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n    * \"special\\_conditions\": \"<Any special baggage conditions>\"\n  * \"checked\\_baggage\":\n\n    * \"general\\_rules\": \"<Baggage Rules>\"\n    * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n    * \"baggage\\_calculator\\_url\": \"<URL>\"\n    * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n  * \"conditions\": \"<Refund conditions>\"\n  * \"processing\\_time\": \"<Processing Time>\"\n  * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n  * \"conditions\": \"<Cancellation conditions>\"\n  * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n  * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n  * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n  * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n  * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n  * \"faq\\_url\": \"<FAQ Page URL>\"\n  * \"questions\\_answers\": \\[\n    {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n    {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n    ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
            }
          ]
        },
        "promptType": "define"
      },
      "typeVersion": 1.5
    },
    {
      "id": "ba090b45-e6e8-434a-9577-51d281dd4a5b",
      "name": "Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOllama",
      "position": [
        308,
        220
      ],
      "parameters": {
        "options": {}
      },
      "credentials": {
        "ollamaApi": {
          "id": "7td3WzXCW2wNhraP",
          "name": "Ollama - test"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "d557adab-856e-460e-aa81-f929a66ca465",
      "name": "Wait for Response",
      "type": "n8n-nodes-base.wait",
      "position": [
        580,
        0
      ],
      "webhookId": "b29f8fd3-b6ff-43ee-878b-17de4b411f99",
      "parameters": {},
      "typeVersion": 1.1
    },
    {
      "id": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
      "name": "Store Extracted Info\t",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        816,
        0
      ],
      "parameters": {
        "columns": {
          "value": {
            "row_number": "={{ $('Loop Over Items').item.json.row_number }}",
            "web check in details": "={{ $json.text.removeTags().replace(/^```json|```$/g, '').trim() }}"
          },
          "schema": [
            {
              "id": "Airline",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Airline",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "WEB CHECK IN URL",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "WEB CHECK IN URL",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "web check in details",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "web check in details",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "output",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "output",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "row_number",
              "type": "string",
              "display": true,
              "removed": false,
              "readOnly": true,
              "required": false,
              "displayName": "row_number",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "row_number"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "update",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": 2125635496,
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
          "cachedResultName": "airline_faq_urls"
        },
        "authentication": "serviceAccount"
      },
      "credentials": {
        "googleApi": {
          "id": "ScSS2KxGQULuPtdy",
          "name": "Google Sheets- test"
        }
      },
      "typeVersion": 4.5
    },
    {
      "id": "866e9eca-68ad-419e-acf0-c28141bf7727",
      "name": "Generate Embeddings\t",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
      "position": [
        1036,
        220
      ],
      "parameters": {},
      "credentials": {
        "ollamaApi": {
          "id": "7td3WzXCW2wNhraP",
          "name": "Ollama - test"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "56553cae-a61f-4b64-8709-06dbab314bce",
      "name": "Prepare Text for Vector DB\t",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "position": [
        1156,
        222.5
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 1
    },
    {
      "id": "82da65d6-9ecd-451a-b2f8-466795cd07a0",
      "name": "Split Long Text\t",
      "type": "@n8n/n8n-nodes-langchain.textSplitterTokenSplitter",
      "position": [
        1244,
        420
      ],
      "parameters": {
        "chunkSize": 10000
      },
      "typeVersion": 1
    },
    {
      "id": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
      "name": "Save to Vector DB\t",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePGVector",
      "position": [
        1052,
        0
      ],
      "parameters": {
        "mode": "insert",
        "options": {
          "collection": {
            "values": {
              "useCollection": true
            }
          }
        }
      },
      "credentials": {
        "postgres": {
          "id": "4Y4qEFGqF2krfRHZ",
          "name": "Postgres-test"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "7c4941f0-4dff-49d0-ac9b-901a23987686",
      "name": "Wait Before Next Batch\t",
      "type": "n8n-nodes-base.wait",
      "position": [
        1532,
        175
      ],
      "webhookId": "43d5c764-27a7-4b37-b879-96ebd8c84fce",
      "parameters": {
        "amount": 15
      },
      "typeVersion": 1.1
    },
    {
      "id": "0fef87ce-bfc3-4edd-aff8-8e10a0e7489a",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -740,
        -860
      ],
      "parameters": {
        "width": 660,
        "height": 860,
        "content": ""
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "pinData": {},
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "b785e5c9-19bf-42c0-8c99-1659b1c2509b",
  "connections": {
    "Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Extract Info with LLM\t",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over Items": {
      "main": [
        [],
        [
          {
            "node": "Scrape Airline Webpage\t",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Long Text\t": {
      "ai_textSplitter": [
        [
          {
            "node": "Prepare Text for Vector DB\t",
            "type": "ai_textSplitter",
            "index": 0
          }
        ]
      ]
    },
    "Wait for Response": {
      "main": [
        [
          {
            "node": "Store Extracted Info\t",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Save to Vector DB\t": {
      "main": [
        [
          {
            "node": "Wait Before Next Batch\t",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Airline URLs\t": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Generate Embeddings\t": {
      "ai_embedding": [
        [
          {
            "node": "Save to Vector DB\t",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "Chat Trigger - Start\t": {
      "main": [
        [
          {
            "node": "Fetch Airline URLs\t",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Store Extracted Info\t": {
      "main": [
        [
          {
            "node": "Save to Vector DB\t",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Info with LLM\t": {
      "main": [
        [
          {
            "node": "Wait for Response",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Scrape Airline Webpage\t": {
      "main": [
        [
          {
            "node": "Extract Info with LLM\t",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Wait Before Next Batch\t": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Prepare Text for Vector DB\t": {
      "ai_document": [
        [
          {
            "node": "Save to Vector DB\t",
            "type": "ai_document",
            "index": 0
          }
        ]
      ]
    }
  }
}
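Two expressions in the JSON above do most of the string handling: the HTTP Request node prefixes each sheet URL with `https://r.jina.ai/`, and the Google Sheets update strips a Markdown code fence from the LLM output before storing it. The following is a minimal Python sketch of that logic for readers who want to test it outside n8n. It is an approximation, not the n8n implementation: `reader_url` and `strip_json_fence` are hypothetical names, the example row and URL are made up, the `removeTags()` step is omitted, and the text is trimmed before the fences are removed (the original expression trims only afterwards).

```python
import json
import re

JINA_READER_PREFIX = "https://r.jina.ai/"  # prefix used in the HTTP Request node URL
FENCE = "`" * 3  # Markdown code fence marker, built here to avoid a literal fence


def reader_url(row: dict) -> str:
    """Build the request URL from a sheet row, mirroring the node's URL expression."""
    return JINA_READER_PREFIX + row["WEB CHECK IN URL"]


def strip_json_fence(text: str) -> str:
    """Remove a leading json fence and a trailing fence, then trim whitespace."""
    text = text.strip()
    text = re.sub("^" + re.escape(FENCE) + "json", "", text)
    text = re.sub(re.escape(FENCE) + "$", "", text)
    return text.strip()


# Hypothetical sheet row and LLM reply, for illustration only.
row = {"Airline": "Example Air", "WEB CHECK IN URL": "https://example.com/check-in"}
llm_reply = FENCE + 'json\n{"web_checkin_available": true}\n' + FENCE

print(reader_url(row))
cleaned = strip_json_fence(llm_reply)
print(json.loads(cleaned))
```

Parsing the cleaned text with `json.loads` is a quick way to confirm that the model actually returned valid JSON before the row is written back to the sheet.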
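FAQ
How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then adjust the credential settings as needed.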
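Before importing, it can help to see which credentials the export expects and whether every name used under `connections` matches a node `name` exactly, since n8n links nodes by name and even a trailing tab makes two names differ. The sketch below is one way to check this in Python, assuming the export has been loaded into a dictionary (for example with `json.load`). Both helper names are hypothetical, and `sample` is a trimmed, made-up workflow used only to show the output shape.

```python
def credential_names(workflow: dict) -> dict[str, set[str]]:
    """Map each credential type to the credential names that nodes reference."""
    found: dict[str, set[str]] = {}
    for node in workflow.get("nodes", []):
        for cred_type, cred in node.get("credentials", {}).items():
            found.setdefault(cred_type, set()).add(cred["name"])
    return found


def unknown_connection_names(workflow: dict) -> set[str]:
    """Return names used in "connections" that match no node "name" exactly."""
    node_names = {node["name"] for node in workflow.get("nodes", [])}
    connections = workflow.get("connections", {})
    used = set(connections)  # source node names are the top-level keys
    for outputs in connections.values():
        for branches in outputs.values():
            for branch in branches:
                used.update(target["node"] for target in branch)
    return used - node_names


# Trimmed, hypothetical workflow: the connection key carries a trailing tab
# that the node name lacks, so the two do not match.
sample = {
    "nodes": [
        {
            "name": "Fetch Airline URLs",
            "credentials": {"googleApi": {"id": "1", "name": "Google Sheets- test"}},
        },
        {"name": "Loop Over Items"},
    ],
    "connections": {
        "Fetch Airline URLs\t": {
            "main": [[{"node": "Loop Over Items", "type": "main", "index": 0}]]
        }
    },
}

print(credential_names(sample))
print(unknown_connection_names(sample))
```

Any name reported by `unknown_connection_names` points at a connection that is likely to be dropped or left dangling after import, so it is worth fixing in the JSON first.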
What scenarios is this workflow suited for?

Intermediate - Document Extraction, AI RAG (retrieval-augmented generation)
Is it paid?

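This workflow is completely free, and you can import and use it directly. Note, however, that third-party services used in the workflow (for example, the Jina Reader API behind r.jina.ai) may require payment on your side.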