Airline Web Check-in Scraper with AI & Vector DB Storage using n8n
This is an automation workflow in the Document Extraction, AI RAG category, containing 14 nodes. It mainly uses nodes such as Wait, HttpRequest, GoogleSheets, SplitInBatches, and ChainLlm. It extracts airline online check-in data using Ollama AI, Google Sheets, and a Postgres vector database.
- May require authentication credentials for the target API
- Google Sheets API credentials
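The first requirement refers to the workflow's HTTP Request node, which sends a POST to `https://r.jina.ai/` followed by the airline's check-in URL and authenticates with a header credential. The sketch below builds that request in Python without sending it. The `Authorization: Bearer` header is an assumption (the workflow only stores a generic header-auth credential), and `build_reader_request` is a hypothetical helper name.

```python
from urllib.request import Request

# Prefix used by the workflow's HTTP Request node
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(checkin_url: str, api_key: str) -> Request:
    """Build (but do not send) the POST request issued by the scraping node."""
    return Request(
        JINA_READER_PREFIX + checkin_url,
        method="POST",
        # Assumed header name and scheme; adjust to match your credential
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Placeholder URL and key, for illustration only
req = build_reader_request("https://example.com/check-in", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual call; that step is left out here so the sketch runs offline.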
{
"id": "FLn2skSh92HNO2SS",
"meta": {
"instanceId": "dd69efaf8212c74ad206700d104739d3329588a6f3f8381a46a481f34c9cc281",
"templateCredsSetupCompleted": true
},
"name": "Airline Web Check-in Scraper with AI & Vector DB Storage using n8n",
"tags": [],
"nodes": [
{
"id": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"name": "Loop Over Items",
"type": "n8n-nodes-base.splitInBatches",
"position": [
-220,
175
],
"parameters": {
"options": {}
},
"typeVersion": 3
},
{
"id": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
"name": "Fetch Airline URLs",
"type": "n8n-nodes-base.googleSheets",
"position": [
-440,
175
],
"parameters": {
"options": {},
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "7e2ca713-229f-490c-bd2e-481cf8f18184",
"name": "Chat Trigger - Start",
"type": "@n8n/n8n-nodes-langchain.chatTrigger",
"position": [
-660,
175
],
"webhookId": "6c85024c-928b-4f43-82b3-d1469283586f",
"parameters": {
"public": true,
"options": {}
},
"typeVersion": 1.1
},
{
"id": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
"name": "Scrape Airline Web Page",
"type": "n8n-nodes-base.httpRequest",
"position": [
0,
0
],
"parameters": {
"url": "=https://r.jina.ai/{{ $json['WEB CHECK IN URL'] }}",
"method": "POST",
"options": {},
"jsonHeaders": "{\n \"Cookie\": \"cookie-keyname1=cookie-value1; cookie-keyname2=cookie-value2; cookie-keyname3=cookie-value3\"\n}\n",
"sendHeaders": true,
"authentication": "genericCredentialType",
"specifyHeaders": "json",
"genericAuthType": "httpHeaderAuth"
},
"credentials": {
"httpHeaderAuth": {
"id": "KCqBydsOZHvzNKAI",
"name": "Header Auth account"
}
},
"typeVersion": 4.2
},
{
"id": "27072e20-58dc-49e2-ae6b-1053750607f9",
"name": "Extract Information with LLM",
"type": "@n8n/n8n-nodes-langchain.chainLlm",
"position": [
220,
0
],
"parameters": {
"text": "={{ $json.data }}",
"messages": {
"messageValues": [
{
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
},
{
"type": "AIMessagePromptTemplate",
"message": "=You are an intelligent parser trained to extract structured data from messy airline webpages.\n\nYour task is to extract and return well-structured airline check-in and policy details from raw text. Always return the result as a clean, valid JSON object using the exact schema described below.\n\n---\n\nExtraction Guidelines:\n\n* Ensure consistent JSON structure for every airline.\n* If a key has no value or content, **remove it** (do not return null, empty arrays, or empty objects).\n* Include any other useful data under `\"additional_info\"` if it doesn’t fit existing keys.\n* Always extract **direct URLs** wherever available.\n* Your output should be compact, valid, and readable JSON.\n\n---\n\nJSON Structure Format:\n\n1. Web Check-in Details\n\n* \"web\\_checkin\\_available\": true/false\n* \"checkin\\_url\": \"<URL>\"\n* \"checkin\\_methods\": \\[\"Online\", \"Mobile App\", \"Kiosk\", \"Airport Counter\"]\n* \"checkin\\_start\": \"<Start Time>\"\n* \"checkin\\_deadline\": \"<Deadline Time>\"\n* \"boarding\\_pass\\_options\":\n\n * \"mobile\\_boarding\\_pass\\_available\": true/false\n * \"printed\\_boarding\\_pass\\_required\": true/false\n * \"additional\\_checkin\\_info\": \"<Extra instructions>\"\n\n2. Customer Support\n\n* \"customer\\_support\":\n\n * \"phone\": \"<Phone Number>\"\n * \"email\": \"<Email>\"\n * \"support\\_url\": \"<Support URL>\"\n * \"chat\\_url\": \"<Chat URL>\"\n * \"operating\\_hours\": \"<Hours>\"\n * \"additional\\_help\\_channels\": \\[\"WhatsApp\", \"Twitter Support\", \"Chatbot\"]\n\n3. Baggage Allowance\n\n* \"baggage\\_allowance\":\n\n * \"hand\\_baggage\":\n\n * \"weight\\_limit\": \"<Weight Limit>\"\n * \"size\\_limit\": \"<Size Limit>\"\n * \"additional\\_items\\_allowed\": \\[\"Handbag\", \"Laptop\", \"Baby Items\", \"Medical Equipment\"]\n * \"special\\_conditions\": \"<Any special baggage conditions>\"\n * \"checked\\_baggage\":\n\n * \"general\\_rules\": \"<Baggage Rules>\"\n * \"class\\_specific\\_limits\": \"<Limits for different travel classes>\"\n * \"baggage\\_calculator\\_url\": \"<URL>\"\n * \"oversized\\_special\\_baggage\": \"\\<Details on sports/music equipment>\"\n\n4. Refund & Cancellation Policy\n\n* \"refund\\_policy\":\n\n * \"conditions\": \"<Refund conditions>\"\n * \"processing\\_time\": \"<Processing Time>\"\n * \"refund\\_policy\\_url\": \"<URL>\"\n* \"cancellation\\_policy\":\n\n * \"conditions\": \"<Cancellation conditions>\"\n * \"fees\\_or\\_penalties\": \"<Cancellation Fees>\"\n * \"cancellation\\_policy\\_url\": \"<URL>\"\n\n5. Airport & Travel Guidelines\n\n* \"airport\\_travel\\_guidelines\":\n\n * \"security\\_immigration\\_rules\": \"\\<Security & Immigration Rules>\"\n * \"airport\\_checkin\\_requirements\": \"<Check-in document requirements>\"\n * \"special\\_services\": \\[\"Priority Boarding\", \"Lounge Access\", \"Wheelchair Assistance\"]\n\n6. Frequently Asked Questions (FAQ)\n\n* \"faq\":\n\n * \"faq\\_url\": \"<FAQ Page URL>\"\n * \"questions\\_answers\": \\[\n {\"question\": \"<Q1>\", \"answer\": \"<A1>\"},\n {\"question\": \"<Q2>\", \"answer\": \"<A2>\"}\n ]\n\n7. Additional Details\n\n* \"additional\\_info\": \"<Any extra info from the page not captured above>\"\n\n---\n\nOutput Rules Summary:\n\n* Always return valid JSON.\n* Do **not** include empty fields or structures (null, {}, or \\[]).\n* Place unrelated or extra info under \"additional\\_info\".\n\nUse this structure for every page, even if some values are missing. Just remove the missing fields completely in the output.\n\n\n"
}
]
},
"promptType": "define"
},
"typeVersion": 1.5
},
{
"id": "ba090b45-e6e8-434a-9577-51d281dd4a5b",
"name": "Chat Model",
"type": "@n8n/n8n-nodes-langchain.lmChatOllama",
"position": [
308,
220
],
"parameters": {
"options": {}
},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "d557adab-856e-460e-aa81-f929a66ca465",
"name": "Wait for Response",
"type": "n8n-nodes-base.wait",
"position": [
580,
0
],
"webhookId": "b29f8fd3-b6ff-43ee-878b-17de4b411f99",
"parameters": {},
"typeVersion": 1.1
},
{
"id": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
"name": "Store Extracted Information",
"type": "n8n-nodes-base.googleSheets",
"position": [
816,
0
],
"parameters": {
"columns": {
"value": {
"row_number": "={{ $('Loop Over Items').item.json.row_number }}",
"web check in details": "={{ $json.text.removeTags().replace(/^```json|```$/g, '').trim() }}"
},
"schema": [
{
"id": "Airline",
"type": "string",
"display": true,
"required": false,
"displayName": "Airline",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "WEB CHECK IN URL",
"type": "string",
"display": true,
"removed": true,
"required": false,
"displayName": "WEB CHECK IN URL",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "web check in details",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "web check in details",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "output",
"type": "string",
"display": true,
"removed": false,
"required": false,
"displayName": "output",
"defaultMatch": false,
"canBeUsedToMatch": true
},
{
"id": "row_number",
"type": "string",
"display": true,
"removed": false,
"readOnly": true,
"required": false,
"displayName": "row_number",
"defaultMatch": false,
"canBeUsedToMatch": true
}
],
"mappingMode": "defineBelow",
"matchingColumns": [
"row_number"
],
"attemptToConvertTypes": false,
"convertFieldsToString": false
},
"options": {},
"operation": "update",
"sheetName": {
"__rl": true,
"mode": "list",
"value": 2125635496,
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit#gid=2125635496",
"cachedResultName": "Sheet1"
},
"documentId": {
"__rl": true,
"mode": "list",
"value": "1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8",
"cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ws8YonQyc32SveWQdfihYOW_OzOS-2REIrwSYS37oQ8/edit?usp=drivesdk",
"cachedResultName": "airline_faq_urls"
},
"authentication": "serviceAccount"
},
"credentials": {
"googleApi": {
"id": "ScSS2KxGQULuPtdy",
"name": "Google Sheets- test"
}
},
"typeVersion": 4.5
},
{
"id": "866e9eca-68ad-419e-acf0-c28141bf7727",
"name": "Generate Embeddings",
"type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
"position": [
1036,
220
],
"parameters": {},
"credentials": {
"ollamaApi": {
"id": "7td3WzXCW2wNhraP",
"name": "Ollama - test"
}
},
"typeVersion": 1
},
{
"id": "56553cae-a61f-4b64-8709-06dbab314bce",
"name": "Prepare Text for Vector Database",
"type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
"position": [
1156,
222.5
],
"parameters": {
"options": {}
},
"typeVersion": 1
},
{
"id": "82da65d6-9ecd-451a-b2f8-466795cd07a0",
"name": "Split Long Texts",
"type": "@n8n/n8n-nodes-langchain.textSplitterTokenSplitter",
"position": [
1244,
420
],
"parameters": {
"chunkSize": 10000
},
"typeVersion": 1
},
{
"id": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"name": "Save to Vector Database",
"type": "@n8n/n8n-nodes-langchain.vectorStorePGVector",
"position": [
1052,
0
],
"parameters": {
"mode": "insert",
"options": {
"collection": {
"values": {
"useCollection": true
}
}
}
},
"credentials": {
"postgres": {
"id": "4Y4qEFGqF2krfRHZ",
"name": "Postgres-test"
}
},
"typeVersion": 1
},
{
"id": "7c4941f0-4dff-49d0-ac9b-901a23987686",
"name": "Wait Before Next Batch",
"type": "n8n-nodes-base.wait",
"position": [
1532,
175
],
"webhookId": "43d5c764-27a7-4b37-b879-96ebd8c84fce",
"parameters": {
"amount": 15
},
"typeVersion": 1.1
},
{
"id": "0fef87ce-bfc3-4edd-aff8-8e10a0e7489a",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
-740,
-860
],
"parameters": {
"width": 660,
"height": 860,
"content": "\n\n### 📝 Web Check-in Details Extractor (LLM Prompt Guide)\n\n#### ✅ What is this?\n\nThis is a powerful AI prompt used inside the **\"Basic LLM Chain\"** node. It tells the AI how to **extract structured airline web check-in data** (like check-in time, baggage policy, cancellation rules) from messy airline webpages.\n\n#### 🎯 Why is it used?\n\nAirline websites often present data in unstructured formats. This LLM-based step:\n\n* Cleans the content scraped from airline URLs.\n* Extracts important travel-related info in a consistent JSON format.\n* Helps automate the enrichment of airline data stored in your Google Sheets and Vector DB.\n\n#### 🛠️ How to use it?\n\n1. **Input**: This node receives raw webpage content from the airline’s \"Web Check-in URL\".\n2. **Prompt**: It applies a fixed set of rules (in natural language) to guide the AI to convert the unstructured data into clean JSON.\n3. **Output**: The AI returns a **structured JSON** object with fields like:\n\n * `checkin_url`\n * `baggage_allowance`\n * `refund_policy`\n * `faq`\n * `additional_info`\n4. The next nodes save this output to:\n\n * Google Sheet (for visibility)\n * PGVector (for semantic search)\n\n💡 **Pro Tip:** This works best when the HTML content is readable and includes useful labels like “Check-in”, “Cancellation”, “Support”, etc.\n\n\n"
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "b785e5c9-19bf-42c0-8c99-1659b1c2509b",
"connections": {
"ba090b45-e6e8-434a-9577-51d281dd4a5b": {
"ai_languageModel": [
[
{
"node": "27072e20-58dc-49e2-ae6b-1053750607f9",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"ee7a49bf-a2dc-4d12-aef0-9add291a398c": {
"main": [
[],
[
{
"node": "c11c66ea-3e36-4c12-a263-109d03d8be1a",
"type": "main",
"index": 0
}
]
]
},
"82da65d6-9ecd-451a-b2f8-466795cd07a0": {
"ai_textSplitter": [
[
{
"node": "56553cae-a61f-4b64-8709-06dbab314bce",
"type": "ai_textSplitter",
"index": 0
}
]
]
},
"d557adab-856e-460e-aa81-f929a66ca465": {
"main": [
[
{
"node": "279b24fc-e1f3-4a1c-9c70-0177b13f32d8",
"type": "main",
"index": 0
}
]
]
},
"a670a7c1-af95-452d-92e5-a82d5be2d0a5": {
"main": [
[
{
"node": "7c4941f0-4dff-49d0-ac9b-901a23987686",
"type": "main",
"index": 0
}
]
]
},
"049cfbd5-bbc7-483c-964e-a32cdab1e6b8": {
"main": [
[
{
"node": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"type": "main",
"index": 0
}
]
]
},
"866e9eca-68ad-419e-acf0-c28141bf7727": {
"ai_embedding": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "ai_embedding",
"index": 0
}
]
]
},
"7e2ca713-229f-490c-bd2e-481cf8f18184": {
"main": [
[
{
"node": "049cfbd5-bbc7-483c-964e-a32cdab1e6b8",
"type": "main",
"index": 0
}
]
]
},
"279b24fc-e1f3-4a1c-9c70-0177b13f32d8": {
"main": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "main",
"index": 0
}
]
]
},
"27072e20-58dc-49e2-ae6b-1053750607f9": {
"main": [
[
{
"node": "d557adab-856e-460e-aa81-f929a66ca465",
"type": "main",
"index": 0
}
]
]
},
"c11c66ea-3e36-4c12-a263-109d03d8be1a": {
"main": [
[
{
"node": "27072e20-58dc-49e2-ae6b-1053750607f9",
"type": "main",
"index": 0
}
]
]
},
"7c4941f0-4dff-49d0-ac9b-901a23987686": {
"main": [
[
{
"node": "ee7a49bf-a2dc-4d12-aef0-9add291a398c",
"type": "main",
"index": 0
}
]
]
},
"56553cae-a61f-4b64-8709-06dbab314bce": {
"ai_document": [
[
{
"node": "a670a7c1-af95-452d-92e5-a82d5be2d0a5",
"type": "ai_document",
"index": 0
}
]
]
}
}
}
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, select "Import from JSON", paste the configuration, and adjust the authentication settings as needed.
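Before adjusting the authentication settings, it can help to see which credentials the imported JSON references. The following is a minimal sketch, assuming the export keeps the `nodes[].credentials` layout shown above; `list_credentials` is a hypothetical helper, and the sample is a trimmed stand-in for the full workflow.

```python
import json


def list_credentials(workflow: dict) -> dict[str, set[str]]:
    """Map each credential name to the node types that reference it."""
    usage: dict[str, set[str]] = {}
    for node in workflow.get("nodes", []):
        # Nodes without credentials (e.g. sticky notes) are skipped
        for cred in node.get("credentials", {}).values():
            usage.setdefault(cred["name"], set()).add(node["type"])
    return usage


# Trimmed stand-in for the exported workflow (three of the 14 nodes)
sample = json.loads("""
{"nodes": [
  {"type": "@n8n/n8n-nodes-langchain.lmChatOllama",
   "credentials": {"ollamaApi": {"id": "7td3WzXCW2wNhraP", "name": "Ollama - test"}}},
  {"type": "@n8n/n8n-nodes-langchain.embeddingsOllama",
   "credentials": {"ollamaApi": {"id": "7td3WzXCW2wNhraP", "name": "Ollama - test"}}},
  {"type": "n8n-nodes-base.stickyNote"}
]}
""")

creds = list_credentials(sample)
for name, node_types in sorted(creds.items()):
    print(name, "->", sorted(node_types))
```

Run against the full export, this should surface the Google Sheets, header-auth, Ollama, and Postgres credentials, each of which has to be recreated in your own instance because credential IDs do not carry over between installations.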
Which scenarios is this workflow suited for?
Intermediate - Document Extraction, AI RAG
Is it paid?
This workflow is completely free and can be used as-is. Note that third-party services used in the workflow (such as the OpenAI API) may require payment on your part.
Oneclick AI Squad
@oneclick-ai
The AI Squad Initiative is a pioneering effort to build, automate, and scale AI-powered workflows using n8n.io. Our mission is to help individuals and businesses integrate AI agents seamlessly into their daily operations, from automating tasks and enhancing productivity to creating innovative, intelligent solutions. We design modular, reusable AI workflow templates that empower creators, developers, and teams to supercharge their automation with minimal effort and maximum impact.