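Airline Web Check-in Scraper with AI and Vector Database Storage Using n8n
This is an automation workflow in the Document Extraction and AI RAG categories, containing 14 nodes. It mainly uses the Wait, HttpRequest, GoogleSheets, SplitInBatches, and ChainLlm nodes. It extracts airline web check-in data using Ollama AI, Google Sheets, and a Postgres vector database.
- May require authentication credentials for the target API
- Google Sheets API credentials
Nodes used (14)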
{
"id": "FLn2skSh92HNO2SS",
"meta": {
"instanceId": "dd69efaf8212c74ad206700d104739d3329588a6f3f8381a46a481f34c9cc281",
"templateCredsSetupCompleted": true
},
"name": "使用 n8n 的航空公司网上值机抓取器与 AI 和向量数据库存储",
"tags": [],
"nodes": [
{
"id": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"name": "遍历项目",
"type": "n8n-nodes-base.splitInBatches",
"position": [
-220,
175
],
"parameters": {
"options": {}
},
"typeVersion": 3
},
{
"id": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
"name": "获取航空公司 URL",
"type": "n8n-nodes-base.googleSheets",
"position": [
-440,
175
],
"parameters": {
"options": {},
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "7e2ca713-229f-490c-bd2e-481cf8f18184",
"name": "聊天触发器 - 开始",
"type": "@n8n/n8n-nodes-langchain.chatTrigger",
"position": [
-660,
175
],
"webhookId": "6c85024c-928b-4f43-82b3-d1469283586f",
"parameters": {
"public": true,
"options": {}
},
"typeVersion": 1.1
},
{
"id": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
"name": "抓取航空公司网页",
"type": "n8n-nodes-base.httpRequest",
"position": [
0,
0
],
"parameters": {
"url": "=https://r.jina.ai/{{ $json['WEB CHECK IN URL'] }}",
"method": "POST",
"options": {},
"jsonHeaders": "{\n \"Cookie\": \"cookie-keyname1=cookie-value1; cookie-keyname2=cookie-value2; cookie-keyname3=cookie-value3\"\n}\n",
"sendHeaders": true,
"authentication": "genericCredentialType",
"specifyHeaders": "json",
"genericAuthType": "httpHeaderAuth"
},
"credentials": {
"httpHeaderAuth": {
"id": "KCqBydsOZHvzNKAI",
"name": "Header Auth account"
}
},
"typeVersion": 4.2
},
{
"id": "27072e20-58dc-49e2-ae6b-1053750607f9",
"name": "使用 LLM 提取信息",
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"position": [
220,
0
],
"parameters": {
"text": "={{ $json.data }}",
"messages": {
"messageValues": [
{
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. 
Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
},
{
"type": "AIMessagePromptTemplate",
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. 
Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
}
]
},
"promptType": "define"
},
"typeVersion": 1.5
},
{
"id": "ba090b45-e6e8-434a-9577-51d281dd4a5b",
"name": "聊天模型",
"type": "@n8n/n8n-nodes-langchain.lmChatOllama",
"position": [
308,
220
],
"parameters": {
"options": {}
},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "d557adab-856e-460e-aa81-f929a66ca465",
"name": "等待响应",
"type": "n8n-nodes-base.wait",
"position": [
580,
0
],
"webhookId": "b29f8fd3-b6ff-43ee-878b-17de4b411f99",
"parameters": {},
"typeVersion": 1.1
},
{
"id": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
"name": "存储提取的信息",
"type": "n8n-nodes-base.googleSheets",
"position": [
816,
0
],
"parameters": {
"columns": {
"value": {
"row_number": "={{ $('Loop Over Items').item.json.row_number }}",
"web check in details": "={{ $json.text.removeTags().replace(/^```json|```$/g, '').trim() }}"
},
"schema": [
{
"id": "Airline",
"type": "string",
"display": true,
"required": false,
"displayName": "Airline",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "WEB CHECK IN URL",
"type": "string",
"display": true,
"removed": true,
"required": false,
"displayName": "WEB CHECK IN URL",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "web check in details",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "web check in details",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "output",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "output",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "row_number",
"type": "string",
"display": true,
"removed": false,
"readOnly": true,
"required": false,
"displayName": "row_number",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "defineBelow",
"matchingColumns": [
"row_number"
],
"attemptToConvertTypes": false,
"convertFieldsToString": false
},
"options": {},
"operation": "update",
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "866e9eca-68ad-419e-acf0-c28141bf7727",
"name": "生成嵌入向量",
"type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
"position": [
1036,
220
],
"parameters": {},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "56553cae-a61f-4b64-8709-06dbab314bce",
"name": "准备向量数据库文本",
"type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
"position": [
1156,
222.5
],
"parameters": {
"options": {}
},
"typeVersion": 1
},
{
"id": "82da65d6-9ecd-451a-b2f8-466795cd07a0",
"name": "分割长文本",
"type": "@n8n/n8n-nodes-langchain.textSplitterTokenSplitter",
"position": [
1244,
420
],
"parameters": {
"chunkSize": 10000
},
"typeVersion": 1
},
{
"id": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"name": "保存到向量数据库",
"type": "@n8n/n8n-nodes-langchain.vectorStorePGVector",
"position": [
1052,
0
],
"parameters": {
"mode": "insert",
"options": {
"collection": {
"values": {
"useCollection": true
}
}
}
},
"credentials": {
"postgres": {
"id": "4Y4qEFGqF2krfRHZ",
"name": "Postgres-test"
}
},
"typeVersion": 1
},
{
"id": "7c4941f0-4dff-49d0-ac9b-901a23987686",
"name": "下一批次前等待",
"type": "n8n-nodes-base.wait",
"position": [
1532,
175
],
"webhookId": "43d5c764-27a7-4b37-b879-96ebd8c84fce",
"parameters": {
"amount": 15
},
"typeVersion": 1.1
},
{
"id": "0fef87ce-bfc3-4edd-aff8-8e10a0e7489a",
"name": "便签",
"type": "n8n-nodes-base.stickyNote",
"position": [
-740,
-860
],
"parameters": {
"width": 660,
"height": 860,
"content": ""
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "b785e5c9-19bf-42c0-8c99-1659b1c2509b",
"connections": {
"Chat Model": {
"ai_languageModel": [
[
{
"node": "Extract Info with LLM\t",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"Loop Over Items": {
"main": [
[],
[
{
"node": "Scrape Airline Webpage\t",
"type": "main",
"index": 0
}
]
]
},
"Split Long Text\t": {
"ai_textSplitter": [
[
{
"node": "Prepare Text for Vector DB\t",
"type": "ai_textSplitter",
"index": 0
}
]
]
},
"Wait for Response": {
"main": [
[
{
"node": "Store Extracted Info\t",
"type": "main",
"index": 0
}
]
]
},
"Save to Vector DB\t": {
"main": [
[
{
"node": "Wait Before Next Batch\t",
"type": "main",
"index": 0
}
]
]
},
"Fetch Airline URLs\t": {
"main": [
[
{
"node": "Loop Over Items",
"type": "main",
"index": 0
}
]
]
},
"Generate Embeddings\t": {
"ai_embedding": [
[
{
"node": "Save to Vector DB\t",
"type": "ai_embedding",
"index": 0
}
]
]
},
"Chat Trigger - Start\t": {
"main": [
[
{
"node": "Fetch Airline URLs\t",
"type": "main",
"index": 0
}
]
]
},
"Store Extracted Info\t": {
"main": [
[
{
"node": "Save to Vector DB\t",
"type": "main",
"index": 0
}
]
]
},
"Extract Info with LLM\t": {
"main": [
[
{
"node": "Wait for Response",
"type": "main",
"index": 0
}
]
]
},
"Scrape Airline Webpage\t": {
"main": [
[
{
"node": "Extract Info with LLM\t",
"type": "main",
"index": 0
}
]
]
},
"Wait Before Next Batch\t": {
"main": [
[
{
"node": "Loop Over Items",
"type": "main",
"index": 0
}
]
]
},
"Prepare Text for Vector DB\t": {
"ai_document": [
[
{
"node": "Save to Vector DB\t",
"type": "ai_document",
"index": 0
}
]
]
}
}
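}
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, choose "Import from JSON", paste the configuration, and then adjust the credential settings as needed.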
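n8n keys each entry under `connections` by node name, so an imported workflow only wires up correctly when those names match the `name` fields under `nodes` exactly, including any trailing whitespace such as `\t`. A minimal Python sketch of that check follows. It uses a tiny stand-in workflow rather than the full JSON above, and the function name `find_unmatched_names` and the sample nodes `A` and `B` are hypothetical.

```python
from typing import Any


def find_unmatched_names(workflow: dict[str, Any]) -> list[str]:
    """Return connection names that match no node's `name`, sorted."""
    node_names = {node["name"] for node in workflow.get("nodes", [])}
    unmatched: set[str] = set()
    for source, outputs in workflow.get("connections", {}).items():
        if source not in node_names:
            unmatched.add(source)
        # Each output type (main, ai_languageModel, ...) holds a list of
        # output slots; each slot is a list of target descriptors.
        for slots in outputs.values():
            for slot in slots:
                for target in slot or []:
                    if target["node"] not in node_names:
                        unmatched.add(target["node"])
    return sorted(unmatched)


# Hypothetical two-node workflow: the connection targets "B" with a
# trailing tab, which does not equal the node name "B".
sample = {
    "nodes": [{"name": "A"}, {"name": "B"}],
    "connections": {
        "A": {"main": [[{"node": "B\t", "type": "main", "index": 0}]]},
    },
}

print(find_unmatched_names(sample))
```

To check a real export, the same function could be applied to the dictionary returned by `json.load` on the saved file.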
What scenarios is this workflow suited for?
Intermediate - Document Extraction, AI RAG (retrieval-augmented generation)
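In the document-extraction step, the node that writes to Google Sheets strips a leading `json` code-fence marker and a trailing fence from the model output, and the prompt asks the model to leave out empty fields. The sketch below is a rough Python equivalent of that cleanup, with an extra pruning pass in case the model returns empty values anyway. The helper names `strip_code_fence` and `prune_empty` and the sample reply are hypothetical, and n8n's `removeTags()` call is not reproduced.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # three backticks, built here to keep this listing readable
EMPTY = (None, "", [], {})


def strip_code_fence(text: str) -> str:
    """Remove a leading fence+json marker and a trailing fence, then trim."""
    return re.sub(rf"^{FENCE}json|{FENCE}$", "", text.strip()).strip()


def prune_empty(value: Any) -> Any:
    """Recursively drop None, empty strings, empty lists and empty dicts."""
    if isinstance(value, dict):
        cleaned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in EMPTY}
    if isinstance(value, list):
        cleaned_items = [prune_empty(item) for item in value]
        return [item for item in cleaned_items if item not in EMPTY]
    return value


# Hypothetical model reply wrapped in a Markdown code fence.
raw = (
    FENCE + "json\n"
    '{"web_checkin_available": true, "checkin_url": "", '
    '"faq": {"questions_answers": []}}\n' + FENCE
)

cleaned = prune_empty(json.loads(strip_code_fence(raw)))
print(cleaned)
```

Boolean `false` and numeric `0` are kept, since only `None` and empty strings, lists and dicts count as empty here.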
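Does it cost anything?
This workflow is completely free, and you can import and use it directly. Note, however, that third-party services the workflow relies on (for example the r.jina.ai endpoint called by the scraping node, or Google Sheets) may require you to pay for them yourself.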
Oneclick AI Squad (@oneclick-ai)
The AI Squad Initiative is a pioneering effort to build, automate and scale AI-powered workflows using n8n.io. Our mission is to help individuals and businesses integrate AI agents seamlessly into their daily operations, from automating tasks and enhancing productivity to creating innovative, intelligent solutions. We design modular, reusable AI workflow templates that empower creators, developers and teams to supercharge their automation with minimal effort and maximum impact.