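Parse, normalize, and extract PDF content, then store it in Pinecone for use in RAG.
This is an automation workflow in the AI RAG and Multimodal AI categories, with 18 nodes. It mainly uses If, Code, Wait, Google Drive, and HTTP Request nodes. It combines LlamaIndex Cloud parsing, OpenAI embeddings, and the Pinecone vector database to build the indexing half of a PDF question-answering system.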
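Parsing is asynchronous. The upload node returns a job ID, and a Wait node pauses (the export sets `amount` to 30). The status-check node then reads the job, and the If node either fetches the Markdown (status `SUCCESS`) or waits again (`amount` 60) and re-checks. The sketch below reproduces that loop in plain JavaScript (Node.js 18+) for testing outside n8n. It is an illustration under stated assumptions, not the template's own code. The following are all hypothetical additions:

- `waitForMarkdown`, `makeGetJson`, and the `LLAMA_CLOUD_API_KEY` variable name
- reading the Wait amounts as seconds
- the `maxAttempts` cap
- treating any status other than `PENDING` or `SUCCESS` as final

The cap matters because the exported workflow retries forever if a job never reaches `SUCCESS`.

```javascript
// Sketch of the LlamaParse polling loop that the workflow builds from Wait, HTTP Request and If nodes.
// Hypothetical: function names, maxAttempts, and treating non-PENDING statuses as failures.

const PARSING_BASE = "https://api.cloud.llamaindex.ai/api/v1/parsing";

// Stand-in for the Wait nodes.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Build an authenticated JSON GET helper (Node.js 18+ global fetch).
function makeGetJson(apiKey) {
  return async (url) => {
    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${apiKey}`, accept: "application/json" },
    });
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
    return res.json();
  };
}

// Poll a parse job until it succeeds, then return its Markdown.
async function waitForMarkdown(jobId, getJson, options = {}) {
  const {
    firstDelayMs = 30_000, // assumes the first Wait node's amount (30) means seconds
    retryDelayMs = 60_000, // assumes the retry Wait node's amount (60) means seconds
    maxAttempts = 10, // hypothetical cap; the exported workflow has none
    wait = sleep,
  } = options;

  await wait(firstDelayMs);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJson(`${PARSING_BASE}/job/${jobId}`);
    if (job.status === "SUCCESS") {
      const result = await getJson(`${PARSING_BASE}/job/${jobId}/result/markdown`);
      return result.markdown;
    }
    if (job.status !== "PENDING") {
      // Assumption: any other status (for example ERROR) will not change.
      throw new Error(`Parse job ${jobId} ended with status ${job.status}`);
    }
    if (attempt < maxAttempts) await wait(retryDelayMs);
  }
  throw new Error(`Parse job ${jobId} still pending after ${maxAttempts} checks`);
}

// Usage: const markdown = await waitForMarkdown(jobId, makeGetJson(process.env.LLAMA_CLOUD_API_KEY));
```

Injecting `getJson` and `wait` keeps the loop testable without network access or real delays.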
Requirements:
- Google Drive API credentials
- LlamaIndex Cloud API key (used as a bearer token by the HTTP Request nodes)
- OpenAI API key
- Pinecone API key
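The LlamaIndex Cloud key is sent as a bearer token. The following sketch roughly illustrates what the upload node transmits, building the same multipart request with Node.js 18+ built-ins (`FormData`, `Blob`, `Request`); `buildUploadRequest` is a hypothetical helper.

The sketch deliberately omits a manual `Content-Type` header, so the runtime adds `multipart/form-data` together with the `boundary` parameter the server needs. The upload node in the JSON sets `Content-Type: multipart/form-data` by hand. If uploads fail with a boundary-related error, removing that header is a reasonable first thing to try.

```javascript
// Sketch of the multipart upload sent by the LlamaIndex Cloud upload node.
// Requires Node.js 18+ (global FormData, Blob, Request, fetch).

const UPLOAD_URL = "https://api.cloud.llamaindex.ai/api/v1/parsing/upload";

// Build, but do not send, the upload request for one PDF.
function buildUploadRequest(apiKey, pdfBytes, fileName) {
  const form = new FormData();
  // Field name "file" matches the node's formBinaryData body parameter.
  form.append("file", new Blob([pdfBytes], { type: "application/pdf" }), fileName);
  return new Request(UPLOAD_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // LlamaIndex Cloud API key
      accept: "application/json",
      // No Content-Type here: the runtime sets multipart/form-data plus its boundary.
    },
    body: form,
  });
}

// Usage: const job = await (await fetch(buildUploadRequest(key, bytes, "policy.pdf"))).json();
// job.id is the value the status-check node reads.
```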
Workflow JSON (18 nodes)
{
"id": "xDiuqZUZnShKpPzX",
"meta": {
"instanceId": "70273a2379644db63ce659827cfd8abac2d0b189210eafa02dd5376e3a62cd1d",
"templateCredsSetupCompleted": true
},
"name": "Parse, Normalize, Extract, and Store PDF Content for RAG in Pinecone",
"tags": [],
"nodes": [
{
"id": "19b009db-a418-458c-a216-bdcc9af6fd2f",
"name": "Google Drive Trigger",
"type": "n8n-nodes-base.googleDriveTrigger",
"position": [
-1504,
2080
],
"parameters": {
"event": "fileCreated",
"options": {},
"pollTimes": {
"item": [
{
"mode": "everyMinute"
}
]
},
"triggerOn": "specificFolder",
"folderToWatch": {
"__rl": true,
"mode": "list",
"value": ""
}
},
"credentials": {
"googleDriveOAuth2Api": {
"id": "aU33fzddE6s3ZQw6",
"name": "LearnBy-Google-Drive"
}
},
"typeVersion": 1
},
{
"id": "ff933f76-d719-40b5-b193-8a29e5fa2197",
"name": "Download File",
"type": "n8n-nodes-base.googleDrive",
"position": [
-1248,
2096
],
"parameters": {
"fileId": {
"__rl": true,
"mode": "id",
"value": "={{ $json.id }}"
},
"options": {},
"operation": "download"
},
"credentials": {
"googleDriveOAuth2Api": {
"id": "aU33fzddE6s3ZQw6",
"name": "LearnBy-Google-Drive"
}
},
"typeVersion": 3
},
{
"id": "127b41ed-ad45-4234-b87f-4f3c2b6ea531",
"name": "Default Data Loader",
"type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
"position": [
528,
2192
],
"parameters": {
"options": {},
"textSplittingMode": "custom"
},
"typeVersion": 1.1
},
{
"id": "0316c7d4-449f-4275-a9b1-8848545beba8",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
336,
1712
],
"parameters": {
"width": 736,
"height": 832,
"content": "## Save to Vector DB"
},
"typeVersion": 1
},
{
"id": "b06702b5-c322-4a5a-949a-855c8b97dadc",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1088,
1720
],
"parameters": {
"color": 4,
"width": 1392,
"height": 656,
"content": "## Prepare data - Parse and Normalize\n"
},
"typeVersion": 1
},
{
"id": "05034e35-f6bf-45a6-860e-94f4da566daf",
"name": "Wait",
"type": "n8n-nodes-base.wait",
"position": [
-720,
2088
],
"webhookId": "a0518843-31f8-44f9-bd8e-1189e16de0f1",
"parameters": {
"amount": 30
},
"typeVersion": 1.1
},
{
"id": "9bb49bf6-a02e-4cf7-a1d1-ca4addff2bc6",
"name": "If",
"type": "n8n-nodes-base.if",
"position": [
-272,
2016
],
"parameters": {
"options": {},
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "7a07aec1-fc5f-4b76-94d9-6fa8f509ac8e",
"operator": {
"name": "filter.operator.equals",
"type": "string",
"operation": "equals"
},
"leftValue": "={{ $json.status }}",
"rightValue": "SUCCESS"
}
]
}
},
"typeVersion": 2.2
},
{
"id": "28654aff-e603-4080-9e7a-e706aaee47c4",
"name": "Wait2",
"type": "n8n-nodes-base.wait",
"position": [
-48,
2184
],
"webhookId": "8da5da31-1ebd-4c82-8c6a-476d5d277cdd",
"parameters": {
"amount": 60
},
"typeVersion": 1.1
},
{
"id": "ba935542-d2c0-4781-b6f9-5e1e007a9740",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
0,
1584
],
"parameters": {
"width": 288,
"height": 352,
"content": "## Normalized Content\n\n* Removes noise\n* Reduces duplication\n* Improves retrieval quality \n* Preserves context \n* Consistent format \n* Prevents wasted tokens \n\n**Note: Update the code based on your requirements.**"
},
"typeVersion": 1
},
{
"id": "45efcdf9-89a9-4638-a9ed-cac39506270f",
"name": "Sticky Note3",
"type": "n8n-nodes-base.stickyNote",
"position": [
-2080,
1472
],
"parameters": {
"width": 464,
"height": 1200,
"content": "## Try It Out! \n### This n8n template demonstrates how to normalize, index, and query insurance PDFs using AI and Pinecone for a full **RAG (Retrieval-Augmented Generation)** workflow. \n\n### Use cases include: creating **chatbots or Q&A systems** for structured documents, extracting insights from insurance policies, or managing compliance/legal PDFs efficiently. \n\n---\n\n## How it works\n* New PDFs are automatically detected in a **Google Drive** folder. \n* PDFs are sent to **LlamaIndex Cloud** for parsing, which returns clean Markdown text. \n* Text is normalized to remove headers, footers, page numbers, and formatting artifacts. \n* The normalized text is split into chunks (~1200 characters with 150-character overlap) for better embedding. \n* **OpenAI embeddings** are generated for each chunk. \n* Chunks and metadata are stored in **Pinecone** for semantic search. \n* A **Chat Agent** (not included in this workflow) can then query Pinecone to retrieve answers from your document vector database. \n\n---\n\n### How to use\n* Select the folder to watch in the Google Drive trigger node. \n* Place a PDF file in that Google Drive folder.\n* Customize the regex in the text-normalization Code node (see the **Normalized Content** note) to match headers/footers specific to your documents. \n* Adjust the chunk size in the text splitter node, or the index and namespace in the Pinecone node, to fit your project needs. \n\n---\n\n### Requirements\n* Google Drive account for PDF source files. \n* **LlamaIndex Cloud** account (parsing API key). \n* **Pinecone** account for vector storage. \n* **OpenAI** account for embeddings. \n\n---\n\n### Need Help? \nAsk in the [n8n Forum](https://community.n8n.io/)! \n\nHappy Automating! 🚀\n"
},
"typeVersion": 1
},
{
"id": "34f5ba5e-7f4e-4c94-a4e8-41bfbaf163a1",
"name": "Upload to Llama Cloud",
"type": "n8n-nodes-base.httpRequest",
"position": [
-944,
2088
],
"parameters": {
"url": "https://api.cloud.llamaindex.ai/api/v1/parsing/upload",
"method": "POST",
"options": {},
"sendBody": true,
"contentType": "multipart-form-data",
"sendHeaders": true,
"authentication": "genericCredentialType",
"bodyParameters": {
"parameters": [
{
"name": "file",
"parameterType": "formBinaryData",
"inputDataFieldName": "data"
}
]
},
"genericAuthType": "httpBearerAuth",
"headerParameters": {
"parameters": [
{
"name": "accept",
"value": "application/json"
},
{
"name": "Content-Type",
"value": "multipart/form-data"
}
]
}
},
"credentials": {
"httpBearerAuth": {
"id": "FlAAm17M7G6as02l",
"name": "learnby_llama_cloud"
}
},
"executeOnce": false,
"retryOnFail": true,
"typeVersion": 4.2,
"alwaysOutputData": false
},
{
"id": "1199e4ff-1952-4225-b655-1f63875f8903",
"name": "Check Parsing Status",
"type": "n8n-nodes-base.httpRequest",
"position": [
-496,
2088
],
"parameters": {
"url": "=https://api.cloud.llamaindex.ai/api/v1/parsing/job/{{ $('Upload to Llama Cloud').item.json.id }}",
"options": {},
"sendHeaders": true,
"authentication": "genericCredentialType",
"genericAuthType": "httpBearerAuth",
"headerParameters": {
"parameters": [
{
"name": "accept",
"value": "application/json"
}
]
}
},
"credentials": {
"httpBearerAuth": {
"id": "FlAAm17M7G6as02l",
"name": "learnby_llama_cloud"
}
},
"retryOnFail": true,
"typeVersion": 4.2
},
{
"id": "cfaf9e10-0297-423a-b3f4-c25561c92078",
"name": "Extract Markdown from Llama Cloud",
"type": "n8n-nodes-base.httpRequest",
"position": [
-48,
1968
],
"parameters": {
"url": "=https://api.cloud.llamaindex.ai/api/v1/parsing/job/{{ $json.id }}/result/markdown",
"options": {},
"sendHeaders": true,
"authentication": "genericCredentialType",
"genericAuthType": "httpBearerAuth",
"headerParameters": {
"parameters": [
{
"name": "accept",
"value": "application/json"
}
]
}
},
"credentials": {
"httpBearerAuth": {
"id": "FlAAm17M7G6as02l",
"name": "learnby_llama_cloud"
}
},
"retryOnFail": true,
"typeVersion": 4.2
},
{
"id": "564a0930-e80b-4db6-a62a-5224248e5cd9",
"name": "Normalize Text",
"type": "n8n-nodes-base.code",
"position": [
176,
1968
],
"parameters": {
"mode": "runOnceForEachItem",
"jsCode": "// Get the input text from the previous node\nconst input = $json.markdown || $json.text || \"\";\n\n// Remove repeated document-title headers such as \"Car Insurance Policy 3\"\nlet text = input.replace(/Car Insurance Policy\\s*\\d+/gi, \"\");\n\n// Remove \"Page X\" markers\ntext = text.replace(/Page\\s*\\d+/gi, \"\");\n\n// Replace --- dividers with a single newline\ntext = text.replace(/-{3,}/g, \"\\n\");\n\n// Decode entities and clean up artifacts\ntext = text.replace(/&amp;/g, \"&\"); // decode the &amp; HTML entity\ntext = text.replace(/[ⓤ]/g, \"-\"); // replace bullet symbols with dashes\n\n// Collapse whitespace\ntext = text.replace(/\\n{2,}/g, \"\\n\\n\"); // keep paragraph breaks\ntext = text.replace(/[ \\t]+/g, \" \"); // collapse spaces\n\n// Trim leading and trailing whitespace\ntext = text.trim();\n\n// Output for next node\nreturn { json: { normalizedText: text } };\n"
},
"typeVersion": 2
},
{
"id": "fe32694a-2cbc-4ad4-88aa-4eb3dba0256c",
"name": "Chunk Text",
"type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter",
"position": [
608,
2400
],
"parameters": {
"options": {
"splitCode": "markdown"
},
"chunkSize": 1200,
"chunkOverlap": 150
},
"typeVersion": 1
},
{
"id": "b399c9fa-03d7-4126-9026-674d091b9ddf",
"name": "Generate Embeddings",
"type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
"position": [
400,
2192
],
"parameters": {
"options": {}
},
"credentials": {
"openAiApi": {
"id": "Yj4Rt75fspowAEru",
"name": "nextweb-openai"
}
},
"typeVersion": 1.2
},
{
"id": "1da27207-b77e-41d0-a249-5096ec8ac259",
"name": "Store in Pinecone",
"type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
"position": [
432,
1968
],
"parameters": {
"mode": "insert",
"options": {
"pineconeNamespace": "rag"
},
"pineconeIndex": {
"__rl": true,
"mode": "id",
"value": "demo"
}
},
"credentials": {
"pineconeApi": {
"id": "uo1lZDPNWTsMAeOC",
"name": "learnby-PineconeApi-account"
}
},
"notesInFlow": false,
"typeVersion": 1.3
},
{
"id": "4f65ec8b-f936-41cb-b05c-0cc710df1c9e",
"name": "Sticky Note4",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1568,
1728
],
"parameters": {
"color": 6,
"width": 464,
"height": 640,
"content": "## Extract Data"
},
"typeVersion": 1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "5ec0ee83-34cd-423d-8bd5-41400bde4a4a",
"connections": {
"9bb49bf6-a02e-4cf7-a1d1-ca4addff2bc6": {
"main": [
[
{
"node": "cfaf9e10-0297-423a-b3f4-c25561c92078",
"type": "main",
"index": 0
}
],
[
{
"node": "28654aff-e603-4080-9e7a-e706aaee47c4",
"type": "main",
"index": 0
}
]
]
},
"05034e35-f6bf-45a6-860e-94f4da566daf": {
"main": [
[
{
"node": "1199e4ff-1952-4225-b655-1f63875f8903",
"type": "main",
"index": 0
}
]
]
},
"28654aff-e603-4080-9e7a-e706aaee47c4": {
"main": [
[
{
"node": "1199e4ff-1952-4225-b655-1f63875f8903",
"type": "main",
"index": 0
}
]
]
},
"fe32694a-2cbc-4ad4-88aa-4eb3dba0256c": {
"ai_textSplitter": [
[
{
"node": "127b41ed-ad45-4234-b87f-4f3c2b6ea531",
"type": "ai_textSplitter",
"index": 0
}
]
]
},
"ff933f76-d719-40b5-b193-8a29e5fa2197": {
"main": [
[
{
"node": "34f5ba5e-7f4e-4c94-a4e8-41bfbaf163a1",
"type": "main",
"index": 0
}
]
]
},
"564a0930-e80b-4db6-a62a-5224248e5cd9": {
"main": [
[
{
"node": "1da27207-b77e-41d0-a249-5096ec8ac259",
"type": "main",
"index": 0
}
]
]
},
"1da27207-b77e-41d0-a249-5096ec8ac259": {
"main": [
[]
]
},
"127b41ed-ad45-4234-b87f-4f3c2b6ea531": {
"ai_document": [
[
{
"node": "1da27207-b77e-41d0-a249-5096ec8ac259",
"type": "ai_document",
"index": 0
}
]
]
},
"b399c9fa-03d7-4126-9026-674d091b9ddf": {
"ai_embedding": [
[
{
"node": "1da27207-b77e-41d0-a249-5096ec8ac259",
"type": "ai_embedding",
"index": 0
}
]
]
},
"1199e4ff-1952-4225-b655-1f63875f8903": {
"main": [
[
{
"node": "9bb49bf6-a02e-4cf7-a1d1-ca4addff2bc6",
"type": "main",
"index": 0
}
]
]
},
"19b009db-a418-458c-a216-bdcc9af6fd2f": {
"main": [
[
{
"node": "ff933f76-d719-40b5-b193-8a29e5fa2197",
"type": "main",
"index": 0
}
]
]
},
"34f5ba5e-7f4e-4c94-a4e8-41bfbaf163a1": {
"main": [
[
{
"node": "05034e35-f6bf-45a6-860e-94f4da566daf",
"type": "main",
"index": 0
}
]
]
},
"cfaf9e10-0297-423a-b3f4-c25561c92078": {
"main": [
[
{
"node": "564a0930-e80b-4db6-a62a-5224248e5cd9",
"type": "main",
"index": 0
}
]
]
}
}
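}
How do I use this workflow?
Copy the JSON configuration above, create a new workflow in your n8n instance, and import the JSON (for example, by pasting it onto the editor canvas). Then update the credential settings for your own accounts as needed.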
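Credential IDs in the export (for example `aU33fzddE6s3ZQw6`) belong to the author's n8n instance. Every node that references a credential therefore has to be re-linked after import. The Google Drive trigger's `folderToWatch` value is also empty and must be set.

A small script can list that work before you open the editor. This is a sketch that assumes the export has been saved locally; the filename `workflow.json` and the `listSetupTasks` helper are hypothetical, and the script only reads fields visible in the JSON above.

```javascript
// Sketch: list the setup work an imported copy of this workflow still needs.
// Reads only nodes[].credentials and resource-locator parameters shaped like
// { "__rl": true, "value": "" }, both of which appear in the exported JSON.

function listSetupTasks(workflow) {
  const tasks = [];
  for (const node of workflow.nodes ?? []) {
    // Credentials referenced by ID belong to the exporting instance.
    for (const [type, cred] of Object.entries(node.credentials ?? {})) {
      tasks.push(`${node.name}: select your own ${type} credential (was "${cred.name}")`);
    }
    // Resource locators left empty, such as the watched Drive folder.
    for (const [param, value] of Object.entries(node.parameters ?? {})) {
      if (value && typeof value === "object" && value.__rl === true && value.value === "") {
        tasks.push(`${node.name}: set parameter "${param}"`);
      }
    }
  }
  return tasks;
}

// Usage (Node.js; "workflow.json" is a hypothetical local copy of the export):
//   const fs = require("fs");
//   const wf = JSON.parse(fs.readFileSync("workflow.json", "utf8"));
//   console.log(listSetupTasks(wf).join("\n"));
```

The Pinecone index (`demo`) and namespace (`rag`) are not flagged because they have values. They still need to match an index in your own Pinecone project.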
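Which scenarios is this workflow suited for?
Advanced: AI RAG, multimodal AI.
Is it paid?
This workflow is completely free and can be imported and used directly. However, the third-party services it calls (OpenAI, LlamaIndex Cloud, and Pinecone) may charge you for usage.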
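Embedding cost grows with the amount of text embedded, including the overlap that is re-embedded between neighboring chunks. The hypothetical helper below gives a back-of-the-envelope count for one document using the workflow's splitter settings (`chunkSize` 1200, `chunkOverlap` 150). It rests on two assumptions:

- Chunks are treated as fixed-size windows. The recursive splitter actually cuts at separators such as Markdown headings, so real counts differ.
- Roughly four characters make one token.

Prices change, so no rate is hard-coded.

```javascript
// Rough estimate of embedding volume for one document, using the workflow's
// splitter settings. Assumptions: fixed-size windows and ~4 characters per token.

function estimateEmbeddingLoad(textLength, chunkSize = 1200, overlap = 150, charsPerToken = 4) {
  if (textLength <= 0) return { chunks: 0, approxTokens: 0 };
  const stride = chunkSize - overlap; // new characters contributed by each extra chunk
  const chunks = textLength <= chunkSize ? 1 : Math.ceil((textLength - overlap) / stride);
  // Every chunk after the first re-embeds `overlap` characters.
  const embeddedChars = textLength + (chunks - 1) * overlap;
  return { chunks, approxTokens: Math.ceil(embeddedChars / charsPerToken) };
}

// Multiply approxTokens by your embedding model's current per-token price.
```

For example, a 2,250-character document yields 2 chunks. The second chunk repeats 150 characters, so about 2,400 characters (roughly 600 tokens) are embedded.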
Alok Kumar
@alokkumar
I am a Principal Software Engineer based in Ireland with a deep passion for AI and emerging technologies. With extensive experience in designing and implementing scalable software solutions, I focus on leveraging artificial intelligence to solve real-world problems. I enjoy exploring innovative applications of AI, from intelligent automation to data-driven insights, and I’m dedicated to building systems that are both efficient and impactful.