PDF 내용을 분석, 정규화, 추출하여 Pinecone에 저장하여 RAG에 사용합니다.

Name: PDF 내용을 분석, 정규화, 추출하여 Pinecone에 저장하여 RAG에 사용합니다.
Rating: 4.5 (10 reviews)
Author: Alok Kumar

고급

이것은AI RAG, Multimodal AI분야의자동화 워크플로우로, 18개의 노드를 포함합니다.주로 If, Code, Wait, GoogleDrive, HttpRequest 등의 노드를 사용하며. LlamaIndex, OpenAI 임베딩, Pinecone 벡터 데이터베이스를 사용하여 PDF 질문 응답 시스템을 구축합니다.

사전 요구사항

•Google Drive API 인증 정보
•대상 API의 인증 정보가 필요할 수 있음
•OpenAI API Key
•Pinecone API Key

사용된 노드 (18)

DocumentDefaultDataLoader

TextSplitterRecursiveCharacterTextSplitter

카테고리

AI RAG

멀티모달 AI

워크플로우 미리보기

노드 연결 관계를 시각적으로 표시하며, 확대/축소 및 이동을 지원합니다

Google Drive 트리거

파일 다운로드

기본 데이터 로더

대기

조건문

대기2

Llama 클라우드에 업로드

파싱 상태 확인

Llama 클라우드에서 마크다운 추출

텍스트 정규화

텍스트 청킹

임베딩 생성

Pinecone에 저장

React Flow

워크플로우 내보내기

다음 JSON 구성을 복사하여 n8n에 가져오면 이 워크플로우를 사용할 수 있습니다

{
  "id": "xDiuqZUZnShKpPzX",
  "meta": {
    "instanceId": "70273a2379644db63ce659827cfd8abac2d0b189210eafa02dd5376e3a62cd1d",
    "templateCredsSetupCompleted": true
  },
  "name": "Parse, Normalize, Extract, and Store PDF Content for RAG in Pinecone",
  "tags": [],
  "nodes": [
    {
      "id": "19b009db-a418-458c-a216-bdcc9af6fd2f",
      "name": "Google Drive 트리거",
      "type": "n8n-nodes-base.googleDriveTrigger",
      "position": [
        -1504,
        2080
      ],
      "parameters": {
        "event": "fileCreated",
        "options": {},
        "pollTimes": {
          "item": [
            {
              "mode": "everyMinute"
            }
          ]
        },
        "triggerOn": "specificFolder",
        "folderToWatch": {
          "__rl": true,
          "mode": "list",
          "value": ""
        }
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "id": "aU33fzddE6s3ZQw6",
          "name": "LearnBy-Google-Drive"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "ff933f76-d719-40b5-b193-8a29e5fa2197",
      "name": "파일 다운로드",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        -1248,
        2096
      ],
      "parameters": {
        "fileId": {
          "__rl": true,
          "mode": "id",
          "value": "={{ $json.id }}"
        },
        "options": {},
        "operation": "download"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "id": "aU33fzddE6s3ZQw6",
          "name": "LearnBy-Google-Drive"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "127b41ed-ad45-4234-b87f-4f3c2b6ea531",
      "name": "기본 데이터 로더",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "position": [
        528,
        2192
      ],
      "parameters": {
        "options": {},
        "textSplittingMode": "custom"
      },
      "typeVersion": 1.1
    },
    {
      "id": "0316c7d4-449f-4275-a9b1-8848545beba8",
      "name": "스티커 노트",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        336,
        1712
      ],
      "parameters": {
        "width": 736,
        "height": 832,
        "content": "## Save to Vector DB"
      },
      "typeVersion": 1
    },
    {
      "id": "b06702b5-c322-4a5a-949a-855c8b97dadc",
      "name": "스티커 노트1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1088,
        1720
      ],
      "parameters": {
        "color": 4,
        "width": 1392,
        "height": 656,
        "content": "## Prepare data - Parse and Normalize\n"
      },
      "typeVersion": 1
    },
    {
      "id": "05034e35-f6bf-45a6-860e-94f4da566daf",
      "name": "대기",
      "type": "n8n-nodes-base.wait",
      "position": [
        -720,
        2088
      ],
      "webhookId": "a0518843-31f8-44f9-bd8e-1189e16de0f1",
      "parameters": {
        "amount": 30
      },
      "typeVersion": 1.1
    },
    {
      "id": "9bb49bf6-a02e-4cf7-a1d1-ca4addff2bc6",
      "name": "조건문",
      "type": "n8n-nodes-base.if",
      "position": [
        -272,
        2016
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "7a07aec1-fc5f-4b76-94d9-6fa8f509ac8e",
              "operator": {
                "name": "filter.operator.equals",
                "type": "string",
                "operation": "equals"
              },
              "leftValue": "={{ $json.status }}",
              "rightValue": "SUCCESS"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "28654aff-e603-4080-9e7a-e706aaee47c4",
      "name": "대기2",
      "type": "n8n-nodes-base.wait",
      "position": [
        -48,
        2184
      ],
      "webhookId": "8da5da31-1ebd-4c82-8c6a-476d5d277cdd",
      "parameters": {
        "amount": 60
      },
      "typeVersion": 1.1
    },
    {
      "id": "ba935542-d2c0-4781-b6f9-5e1e007a9740",
      "name": "스티커 노트2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        0,
        1584
      ],
      "parameters": {
        "width": 288,
        "height": 352,
        "content": "## Normalized Content\n\n* Removes noise\n* Reduces duplication\n* Improves retrieval quality \n* Preserves context \n* Consistent format \n* Prevents wasted tokens \n\n**Note : Update the code based in your requirement**"
      },
      "typeVersion": 1
    },
    {
      "id": "45efcdf9-89a9-4638-a9ed-cac39506270f",
      "name": "스티커 노트3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2080,
        1472
      ],
      "parameters": {
        "width": 464,
        "height": 1200,
        "content": "## Try It Out!  \n### This n8n template demonstrates how to normalize, index, and query insurance PDFs using AI and Pinecone for a full **RAG (Retrieval-Augmented Generation)** workflow.  \n\n### Use cases include: creating **chatbots or Q&A systems** for structured documents, extracting insights from insurance policies, or managing compliance/legal PDFs efficiently.  \n\n---\n\n## How it works\n* New PDFs are automatically detected from a **Google Drive** folder.  \n* PDFs are sent to **LlamaIndex Cloud** for parsing → returns clean Markdown text.  \n* Text is normalized to remove headers, footers, page numbers, and formatting artifacts.  \n* The normalized text is split into chunks (~1200 characters with 150-character overlap) for better embedding.  \n* **OpenAI embeddings** are generated for each chunk.  \n* Chunks and metadata are stored in **Pinecone** for semantic search.  \n* A **Chat Agent** queries Pinecone to retrieve answers from your document vector database.  \n\n---\n\n### How to use\n* Update the folder name in google drive trigger node. \n* Place a pdf file in the same folder in google drive.\n* Customize the `Normalized Content` function node to adjust regex for headers/footers specific to your documents.  \n* Adjust chunk size or metadata namespace in the Pinecone node to fit your project needs.  \n\n---\n\n### Requirements\n* Google Drive account for PDF source files.  \n* **LlamaIndex Cloud** account (parsing API key).  \n* **Pinecone** account for vector storage.  \n* **OpenAI** account for model and embeddings.   \n\n---\n\n### Need Help?  \nask in the [n8n Forum](https://community.n8n.io/)!  \n\nHappy Automating! 🚀\n"
      },
      "typeVersion": 1
    },
    {
      "id": "34f5ba5e-7f4e-4c94-a4e8-41bfbaf163a1",
      "name": "Llama 클라우드에 업로드",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -944,
        2088
      ],
      "parameters": {
        "url": "https://api.cloud.llamaindex.ai/api/v1/parsing/upload",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "contentType": "multipart-form-data",
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "bodyParameters": {
          "parameters": [
            {
              "name": "file",
              "parameterType": "formBinaryData",
              "inputDataFieldName": "data"
            }
          ]
        },
        "genericAuthType": "httpBearerAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            },
            {
              "name": "Content-Type",
              "value": "multipart/form-data"
            }
          ]
        }
      },
      "credentials": {
        "httpBearerAuth": {
          "id": "FlAAm17M7G6as02l",
          "name": "learnby_llama_cloud"
        }
      },
      "executeOnce": false,
      "retryOnFail": true,
      "typeVersion": 4.2,
      "alwaysOutputData": false
    },
    {
      "id": "1199e4ff-1952-4225-b655-1f63875f8903",
      "name": "파싱 상태 확인",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -496,
        2088
      ],
      "parameters": {
        "url": "=https://api.cloud.llamaindex.ai/api/parsing/job/{{ $('Upload to Llama Cloud').item.json.id }}",
        "options": {},
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "genericAuthType": "httpBearerAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        }
      },
      "credentials": {
        "httpBearerAuth": {
          "id": "FlAAm17M7G6as02l",
          "name": "learnby_llama_cloud"
        }
      },
      "retryOnFail": true,
      "typeVersion": 4.2
    },
    {
      "id": "cfaf9e10-0297-423a-b3f4-c25561c92078",
      "name": "Llama 클라우드에서 마크다운 추출",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -48,
        1968
      ],
      "parameters": {
        "url": "=https://api.cloud.llamaindex.ai/api/v1/parsing/job/{{ $json.id }}/result/markdown",
        "options": {},
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "genericAuthType": "httpBearerAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        }
      },
      "credentials": {
        "httpBearerAuth": {
          "id": "FlAAm17M7G6as02l",
          "name": "learnby_llama_cloud"
        }
      },
      "retryOnFail": true,
      "typeVersion": 4.2
    },
    {
      "id": "564a0930-e80b-4db6-a62a-5224248e5cd9",
      "name": "텍스트 정규화",
      "type": "n8n-nodes-base.code",
      "position": [
        176,
        1968
      ],
      "parameters": {
        "mode": "runOnceForEachItem",
        "jsCode": "// Get the input text from the previous node\nconst input = $json.markdown || $json.text || \"\";\n\nlet text = input.replace(/Car Insurance Policy\\s*\\d+/gi, \"\");\n\n// Remove \"Page X\" markers\ntext = text.replace(/Page\\s*\\d+/gi, \"\");\n\n// Replace --- dividers with a single newline\ntext = text.replace(/-{3,}/g, \"\\n\");\n\n// Decode & cleanup artifacts\ntext = text.replace(/&#x26;/g, \"&\");   // fix HTML entities\ntext = text.replace(/[ⓤ]/g, \"-\");      // replace bullet symbols with dashes\n\n// Collapse whitespace\ntext = text.replace(/\\n{2,}/g, \"\\n\\n\"); // keep paragraph breaks\ntext = text.replace(/[ \\t]+/g, \" \");    // collapse spaces\n\n// Step 5: Trim\ntext = text.trim();\n\n// Output for next node\nreturn { json: { normalizedText: text } };\n"
      },
      "typeVersion": 2
    },
    {
      "id": "fe32694a-2cbc-4ad4-88aa-4eb3dba0256c",
      "name": "텍스트 청킹",
      "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter",
      "position": [
        608,
        2400
      ],
      "parameters": {
        "options": {
          "splitCode": "markdown"
        },
        "chunkSize": 1200,
        "chunkOverlap": 150
      },
      "typeVersion": 1
    },
    {
      "id": "b399c9fa-03d7-4126-9026-674d091b9ddf",
      "name": "임베딩 생성",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "position": [
        400,
        2192
      ],
      "parameters": {
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "id": "Yj4Rt75fspowAEru",
          "name": "nextweb-openai"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "1da27207-b77e-41d0-a249-5096ec8ac259",
      "name": "Pinecone에 저장",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
      "position": [
        432,
        1968
      ],
      "parameters": {
        "mode": "insert",
        "options": {
          "pineconeNamespace": "rag"
        },
        "pineconeIndex": {
          "__rl": true,
          "mode": "id",
          "value": "demo"
        }
      },
      "credentials": {
        "pineconeApi": {
          "id": "uo1lZDPNWTsMAeOC",
          "name": "learnby-PineconeApi-account"
        }
      },
      "notesInFlow": false,
      "typeVersion": 1.3
    },
    {
      "id": "4f65ec8b-f936-41cb-b05c-0cc710df1c9e",
      "name": "스티커 노트4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1568,
        1728
      ],
      "parameters": {
        "color": 6,
        "width": 464,
        "height": 640,
        "content": "## Extract Data"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "pinData": {},
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "5ec0ee83-34cd-423d-8bd5-41400bde4a4a",
  "connections": {
    "9bb49bf6-a02e-4cf7-a1d1-ca4addff2bc6": {
      "main": [
        [
          {
            "node": "cfaf9e10-0297-423a-b3f4-c25561c92078",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "28654aff-e603-4080-9e7a-e706aaee47c4",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "05034e35-f6bf-45a6-860e-94f4da566daf": {
      "main": [
        [
          {
            "node": "1199e4ff-1952-4225-b655-1f63875f8903",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "28654aff-e603-4080-9e7a-e706aaee47c4": {
      "main": [
        [
          {
            "node": "1199e4ff-1952-4225-b655-1f63875f8903",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "fe32694a-2cbc-4ad4-88aa-4eb3dba0256c": {
      "ai_textSplitter": [
        [
          {
            "node": "127b41ed-ad45-4234-b87f-4f3c2b6ea531",
            "type": "ai_textSplitter",
            "index": 0
          }
        ]
      ]
    },
    "ff933f76-d719-40b5-b193-8a29e5fa2197": {
      "main": [
        [
          {
            "node": "34f5ba5e-7f4e-4c94-a4e8-41bfbaf163a1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "564a0930-e80b-4db6-a62a-5224248e5cd9": {
      "main": [
        [
          {
            "node": "1da27207-b77e-41d0-a249-5096ec8ac259",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "1da27207-b77e-41d0-a249-5096ec8ac259": {
      "main": [
        []
      ]
    },
    "127b41ed-ad45-4234-b87f-4f3c2b6ea531": {
      "ai_document": [
        [
          {
            "node": "1da27207-b77e-41d0-a249-5096ec8ac259",
            "type": "ai_document",
            "index": 0
          }
        ]
      ]
    },
    "b399c9fa-03d7-4126-9026-674d091b9ddf": {
      "ai_embedding": [
        [
          {
            "node": "1da27207-b77e-41d0-a249-5096ec8ac259",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "1199e4ff-1952-4225-b655-1f63875f8903": {
      "main": [
        [
          {
            "node": "9bb49bf6-a02e-4cf7-a1d1-ca4addff2bc6",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "19b009db-a418-458c-a216-bdcc9af6fd2f": {
      "main": [
        [
          {
            "node": "ff933f76-d719-40b5-b193-8a29e5fa2197",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "34f5ba5e-7f4e-4c94-a4e8-41bfbaf163a1": {
      "main": [
        [
          {
            "node": "05034e35-f6bf-45a6-860e-94f4da566daf",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "cfaf9e10-0297-423a-b3f4-c25561c92078": {
      "main": [
        [
          {
            "node": "564a0930-e80b-4db6-a62a-5224248e5cd9",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

자주 묻는 질문

이 워크플로우를 어떻게 사용하나요?

위의 JSON 구성 코드를 복사하여 n8n 인스턴스에서 새 워크플로우를 생성하고 "JSON에서 가져오기"를 선택한 후, 구성을 붙여넣고 필요에 따라 인증 설정을 수정하세요.

이 워크플로우는 어떤 시나리오에 적합한가요?

고급 - AI RAG, 멀티모달 AI

유료인가요?

이 워크플로우는 완전히 무료이며 직접 가져와 사용할 수 있습니다. 다만, 워크플로우에서 사용하는 타사 서비스(예: OpenAI API)는 사용자 직접 비용을 지불해야 할 수 있습니다.

PDF 내용을 분석, 정규화, 추출하여 Pinecone에 저장하여 RAG에 사용합니다.

사용된 노드 (18)

카테고리

이 워크플로우를 어떻게 사용하나요?

이 워크플로우는 어떤 시나리오에 적합한가요?

유료인가요?

관련 워크플로우 추천

카테고리