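Airline Web Check-in Scraper with AI & Vector DB Storage using n8n
This is an automation workflow in the Document Extraction and AI RAG categories, consisting of 14 nodes. It mainly uses the Wait, HttpRequest, GoogleSheets, SplitInBatches, and ChainLlm nodes, and it extracts airline web check-in data using Ollama AI, Google Sheets, and a Postgres vector database.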
- Authentication credentials for the target API may be required
- Google Sheets API credentials
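Before the extracted details are written back to Google Sheets, the workflow JSON further down cleans the model's reply with an n8n expression that removes tags, strips a leading Markdown `json` fence and a trailing fence, and trims whitespace. The following is a minimal Python sketch of that cleanup, not the workflow's own code. The function name `clean_llm_output`, the tag-removal regex (a rough stand-in for n8n's `removeTags()`), and the sample reply are assumptions made for illustration.

```python
import json
import re


def clean_llm_output(text: str) -> str:
    """Approximate the cleanup expression used before storing the LLM reply."""
    # Rough stand-in for removeTags(): drop anything that looks like <...>.
    without_tags = re.sub(r"<[^>]*>", "", text)
    # Strip a fence opener at the very start and a fence closer at the very end.
    # \A and \Z anchor to the whole string, like ^ and $ without the multiline flag.
    without_fences = re.sub(r"\A```json|```\Z", "", without_tags)
    return without_fences.strip()


# Hypothetical model reply wrapped in a Markdown code fence.
raw = '```json\n{"web_checkin_available": true, "checkin_url": "https://example.com/checkin"}\n```'

cleaned = clean_llm_output(raw)
details = json.loads(cleaned)  # raises ValueError if the reply is not valid JSON
print(details["checkin_url"])
```

Parsing the cleaned text with `json.loads` is an extra check that the workflow itself does not perform; it can help catch replies where the model ignored the "always return valid JSON" rule before they reach the sheet or the vector store.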
Nodes used (14)
{
"id": "FLn2skSh92HNO2SS",
"meta": {
"instanceId": "dd69efaf8212c74ad206700d104739d3329588a6f3f8381a46a481f34c9cc281",
"templateCredsSetupCompleted": true
},
"name": "Airline Web Check-in Scraper with AI & Vector DB Storage using n8n",
"tags": [],
"nodes": [
{
"id": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"name": "항목 순회",
"type": "n8n-nodes-base.splitInBatches",
"position": [
-220,
175
],
"parameters": {
"options": {}
},
"typeVersion": 3
},
{
"id": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
"name": "항공사 URL 가져오기",
"type": "n8n-nodes-base.googleSheets",
"position": [
-440,
175
],
"parameters": {
"options": {},
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "7e2ca713-229f-490c-bd2e-481cf8f18184",
"name": "채팅 트리거 - 시작",
"type": "@n8n/n8n-nodes-langchain.chatTrigger",
"position": [
-660,
175
],
"webhookId": "6c85024c-928b-4f43-82b3-d1469283586f",
"parameters": {
"public": true,
"options": {}
},
"typeVersion": 1.1
},
{
"id": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
"name": "항공사 웹페이지 스크래핑",
"type": "n8n-nodes-base.httpRequest",
"position": [
0,
0
],
"parameters": {
"url": "=https://r.jina.ai/{{ $json['WEB CHECK IN URL'] }}",
"method": "POST",
"options": {},
"jsonHeaders": "{\n \"Cookie\": \"cookie-keyname1=cookie-value1; cookie-keyname2=cookie-value2; cookie-keyname3=cookie-value3\"\n}\n",
"sendHeaders": true,
"authentication": "genericCredentialType",
"specifyHeaders": "json",
"genericAuthType": "httpHeaderAuth"
},
"credentials": {
"httpHeaderAuth": {
"id": "KCqBydsOZHvzNKAI",
"name": "Header Auth account"
}
},
"typeVersion": 4.2
},
{
"id": "27072e20-58dc-49e2-ae6b-1053750607f9",
"name": "LLM으로 정보 추출",
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"position": [
220,
0
],
"parameters": {
"text": "={{ $json.data }}",
"messages": {
"messageValues": [
{
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. 
Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
},
{
"type": "AIMessagePromptTemplate",
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. 
Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
}
]
},
"promptType": "define"
},
"typeVersion": 1.5
},
{
"id": "ba090b45-e6e8-434a-9577-51d281dd4a5b",
"name": "채팅 모델",
"type": "@n8n/n8n-nodes-langchain.lmChatOllama",
"position": [
308,
220
],
"parameters": {
"options": {}
},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "d557adab-856e-460e-aa81-f929a66ca465",
"name": "응답 대기",
"type": "n8n-nodes-base.wait",
"position": [
580,
0
],
"webhookId": "b29f8fd3-b6ff-43ee-878b-17de4b411f99",
"parameters": {},
"typeVersion": 1.1
},
{
"id": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
"name": "추출된 정보 저장",
"type": "n8n-nodes-base.googleSheets",
"position": [
816,
0
],
"parameters": {
"columns": {
"value": {
"row_number": "={{ $('Loop Over Items').item.json.row_number }}",
"web check in details": "={{ $json.text.removeTags().replace(/^```json|```$/g, '').trim() }}"
},
"schema": [
{
"id": "Airline",
"type": "string",
"display": true,
"required": false,
"displayName": "Airline",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "WEB CHECK IN URL",
"type": "string",
"display": true,
"removed": true,
"required": false,
"displayName": "WEB CHECK IN URL",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "web check in details",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "web check in details",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "output",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "output",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "row_number",
"type": "string",
"display": true,
"removed": false,
"readOnly": true,
"required": false,
"displayName": "row_number",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "defineBelow",
"matchingColumns": [
"row_number"
],
"attemptToConvertTypes": false,
"convertFieldsToString": false
},
"options": {},
"operation": "update",
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "866e9eca-68ad-419e-acf0-c28141bf7727",
"name": "임베딩 생성",
"type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
"position": [
1036,
220
],
"parameters": {},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "56553cae-a61f-4b64-8709-06dbab314bce",
"name": "벡터 DB용 텍스트 준비",
"type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
"position": [
1156,
222.5
],
"parameters": {
"options": {}
},
"typeVersion": 1
},
{
"id": "82da65d6-9ecd-451a-b2f8-466795cd07a0",
"name": "긴 텍스트 분할",
"type": "@n8n/n8n-nodes-langchain.textSplitterTokenSplitter",
"position": [
1244,
420
],
"parameters": {
"chunkSize": 10000
},
"typeVersion": 1
},
{
"id": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"name": "벡터 DB에 저장",
"type": "@n8n/n8n-nodes-langchain.vectorStorePGVector",
"position": [
1052,
0
],
"parameters": {
"mode": "insert",
"options": {
"collection": {
"values": {
"useCollection": true
}
}
}
},
"credentials": {
"postgres": {
"id": "4Y4qEFGqF2krfRHZ",
"name": "Postgres-test"
}
},
"typeVersion": 1
},
{
"id": "7c4941f0-4dff-49d0-ac9b-901a23987686",
"name": "다음 배치 전 대기",
"type": "n8n-nodes-base.wait",
"position": [
1532,
175
],
"webhookId": "43d5c764-27a7-4b37-b879-96ebd8c84fce",
"parameters": {
"amount": 15
},
"typeVersion": 1.1
},
{
"id": "0fef87ce-bfc3-4edd-aff8-8e10a0e7489a",
"name": "스티커 노트",
"type": "n8n-nodes-base.stickyNote",
"position": [
-740,
-860
],
"parameters": {
"width": 660,
"height": 860,
"content": "\n\n### 📝 Web Check-in Details Extractor (LLM Prompt Guide)\n\n#### ✅ What is this?\n\nThis is a powerful AI prompt used inside the **\"Basic LLM Chain\"** node. It tells the AI how to **extract structured airline web check-in data** (like check-in time, baggage policy, cancellation rules) from messy airline webpages.\n\n#### 🎯 Why is it used?\n\nAirline websites often present data in unstructured formats. This LLM-based step:\n\n* Cleans the content scraped from airline URLs.\n* Extracts important travel-related info in a consistent JSON format.\n* Helps automate the enrichment of airline data stored in your Google Sheets and Vector DB.\n\n#### 🛠️ How to use it?\n\n1. **Input**: This node receives raw webpage content from the airline’s \"Web Check-in URL\".\n2. **Prompt**: It applies a fixed set of rules (in natural language) to guide the AI to convert the unstructured data into clean JSON.\n3. **Output**: The AI returns a **structured JSON** object with fields like:\n\n * `checkin_url`\n * `baggage_allowance`\n * `refund_policy`\n * `faq`\n * `additional_info`\n4. The next nodes save this output to:\n\n * Google Sheet (for visibility)\n * PGVector (for semantic search)\n\n💡 **Pro Tip:** This works best when the HTML content is readable and includes useful labels like “Check-in”, “Cancellation”, “Support”, etc.\n\n\n"
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "b785e5c9-19bf-42c0-8c99-1659b1c2509b",
"connections": {
"ba090b45-e6e8-434a-9577-51d281dd4a5b": {
"ai_languageModel": [
[
{
"node": "27072e20-58dc-49e2-ae6b-1053750607f9",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"ee7a49bf-a2dc-4d12-aef0-9add291a398c": {
"main": [
[],
[
{
"node": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
"type": "main",
"index": 0
}
]
]
},
"82da65d6-9ecd-451a-b2f8-466795cd07a0": {
"ai_textSplitter": [
[
{
"node": "56553cae-a61f-4b64-8709-06dbab314bce",
"type": "ai_textSplitter",
"index": 0
}
]
]
},
"d557adab-856e-460e-aa81-f929a66ca465": {
"main": [
[
{
"node": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
"type": "main",
"index": 0
}
]
]
},
"a670a7c1-af95-452d-92e5-a82d5be2d0a5": {
"main": [
[
{
"node": "7c4941f0-4dff-49d0-ac9b-901a23987686",
"type": "main",
"index": 0
}
]
]
},
"049cfbd5-bbc7-483c-964e-a32cdab1e6b8": {
"main": [
[
{
"node": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"type": "main",
"index": 0
}
]
]
},
"866e9eca-68ad-419e-acf0-c28141bf7727": {
"ai_embedding": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "ai_embedding",
"index": 0
}
]
]
},
"7e2ca713-229f-490c-bd2e-481cf8f18184": {
"main": [
[
{
"node": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
"type": "main",
"index": 0
}
]
]
},
"279b24fc-e1f3-4a1c-9c70-0177b13f32d8": {
"main": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "main",
"index": 0
}
]
]
},
"27072e20-58dc-49e2-ae6b-1053750607f9": {
"main": [
[
{
"node": "d557adab-856e-460e-aa81-f929a66ca465",
"type": "main",
"index": 0
}
]
]
},
"c11c66ea-3e36-4c12-a263-109d03d8be1a": {
"main": [
[
{
"node": "27072e20-58dc-49e2-ae6b-1053750607f9",
"type": "main",
"index": 0
}
]
]
},
"7c4941f0-4dff-49d0-ac9b-901a23987686": {
"main": [
[
{
"node": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"type": "main",
"index": 0
}
]
]
},
"56553cae-a61f-4b64-8709-06dbab314bce": {
"ai_document": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "ai_document",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, select "Import from JSON", paste the configuration, and then adjust the credential settings as needed.
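Since the imported workflow references credentials by ID and name from the original author's instance, it can help to list which credential types each node expects before editing them. The sketch below is one way to do that in Python. The helper name `credentials_by_type` is hypothetical, and `SAMPLE_WORKFLOW` is a trimmed excerpt with illustrative node names rather than the full export.

```python
import json

# Trimmed, illustrative excerpt of a workflow export: only the fields this check reads.
SAMPLE_WORKFLOW = """
{
  "nodes": [
    {"name": "Fetch Airline URLs", "type": "n8n-nodes-base.googleSheets",
     "credentials": {"googleApi": {"id": "ScSS2KxGQULuPtdy", "name": "Google Sheets- test"}}},
    {"name": "Chat Model", "type": "@n8n/n8n-nodes-langchain.lmChatOllama",
     "credentials": {"ollamaApi": {"id": "7td3WzXCW2wNhraP", "name": "Ollama - test"}}},
    {"name": "Create Embeddings", "type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
     "credentials": {"ollamaApi": {"id": "7td3WzXCW2wNhraP", "name": "Ollama - test"}}},
    {"name": "Loop Over Items", "type": "n8n-nodes-base.splitInBatches"}
  ]
}
"""


def credentials_by_type(workflow: dict) -> dict:
    """Map each credential type to the names of the nodes that reference it."""
    usage = {}
    for node in workflow.get("nodes", []):
        # Nodes without a "credentials" key (such as the loop node) are skipped.
        for cred_type in node.get("credentials", {}):
            usage.setdefault(cred_type, []).append(node["name"])
    return usage


usage = credentials_by_type(json.loads(SAMPLE_WORKFLOW))
for cred_type, node_names in sorted(usage.items()):
    print(f"{cred_type}: {', '.join(node_names)}")
```

Run against the full export, the same function would also surface the `httpHeaderAuth` and `postgres` entries, each of which needs a matching credential in your own instance.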
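What scenarios is this workflow suited for?
Intermediate - Document Extraction, AI RAG
Is it paid?
This workflow is completely free and can be imported and used directly. However, third-party services that the workflow relies on may charge their own fees, which you pay yourself.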
Oneclick AI Squad
@oneclick-ai
The AI Squad Initiative is a pioneering effort to build, automate and scale AI-powered workflows using n8n.io. Our mission is to help individuals and businesses integrate AI agents seamlessly into their daily operations from automating tasks and enhancing productivity to creating innovative, intelligent solutions. We design modular, reusable AI workflow templates that empower creators, developers and teams to supercharge their automation with minimal effort and maximum impact.