GPT-4, PDFVector, PostgreSQL를 사용하여 문서에서 데이터 추출

Name: GPT-4, PDFVector, PostgreSQL를 사용하여 문서에서 데이터 추출
Rating: 4.5 (10 reviews)
Author: PDF Vector

중급

이것은Document Extraction, Multimodal AI분야의자동화 워크플로우로, 9개의 노드를 포함합니다.주로 Code, OpenAi, Switch, Postgres, PdfVector 등의 노드를 사용하며. GPT-4、PDFVector와 PostgreSQL을 사용하여 문서에서 데이터를 추출하여 내보내기

사전 요구사항

•OpenAI API Key
•PostgreSQL 데이터베이스 연결 정보

사용된 노드 (9)

카테고리

문서 추출

멀티모달 AI

워크플로우 미리보기

노드 연결 관계를 시각적으로 표시하며, 확대/축소 및 이동을 지원합니다

폴더 감시

PDF Vector - 문서 구문 분석

구조화된 데이터 추출

데이터 검증 및 정리

문서 유형별 라우팅

인보이스 데이터 저장

기타 문서 저장

Export to CSV

React Flow

워크플로우 내보내기

다음 JSON 구성을 복사하여 n8n에 가져오면 이 워크플로우를 사용할 수 있습니다

{
  "meta": {
    "instanceId": "placeholder"
  },
  "nodes": [
    {
      "id": "workflow-info",
      "name": "파이프라인 정보",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        250,
        150
      ],
      "parameters": {
        "content": "## Document Extraction Pipeline\n\nExtracts structured data from:\n- Invoices\n- Contracts\n- Reports\n- Forms\n\nCustomize extraction rules in the AI node"
      },
      "typeVersion": 1
    },
    {
      "id": "file-trigger",
      "name": "폴더 감시",
      "type": "n8n-nodes-base.localFileTrigger",
      "notes": "Triggers when new documents arrive",
      "position": [
        450,
        300
      ],
      "parameters": {
        "path": "/documents/incoming",
        "events": [
          "file:created"
        ]
      },
      "typeVersion": 1
    },
    {
      "id": "pdfvector-parse",
      "name": "PDF Vector - 문서 구문 분석",
      "type": "n8n-nodes-pdfvector.pdfVector",
      "notes": "Parse with LLM for better extraction",
      "position": [
        650,
        300
      ],
      "parameters": {
        "useLlm": "always",
        "resource": "document",
        "operation": "parse",
        "documentUrl": "={{ $json.filePath }}"
      },
      "typeVersion": 1
    },
    {
      "id": "extract-data",
      "name": "구조화된 데이터 추출",
      "type": "n8n-nodes-base.openAi",
      "position": [
        850,
        300
      ],
      "parameters": {
        "model": "gpt-4",
        "options": {
          "responseFormat": {
            "type": "json_object"
          }
        },
        "messages": {
          "values": [
            {
              "content": "Extract the following information from this document:\n\n1. Document Type (invoice, contract, report, etc.)\n2. Date/Dates mentioned\n3. Parties involved (names, companies)\n4. Key amounts/values\n5. Important terms or conditions\n6. Reference numbers\n7. Addresses\n8. Contact information\n\nDocument content:\n{{ $json.content }}\n\nReturn as structured JSON."
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "validate-data",
      "name": "데이터 검증 및 정리",
      "type": "n8n-nodes-base.code",
      "position": [
        1050,
        300
      ],
      "parameters": {
        "functionCode": "// Validate and clean extracted data\nconst extracted = JSON.parse($json.content);\nconst validated = {};\n\n// Validate document type\nvalidated.documentType = extracted.documentType || 'unknown';\n\n// Parse and validate dates\nif (extracted.date) {\n  const date = new Date(extracted.date);\n  validated.date = isNaN(date) ? null : date.toISOString();\n}\n\n// Clean monetary values\nif (extracted.amounts) {\n  validated.amounts = extracted.amounts.map(amt => {\n    const cleaned = amt.replace(/[^0-9.-]/g, '');\n    return parseFloat(cleaned) || 0;\n  });\n}\n\n// Validate email addresses\nif (extracted.emails) {\n  validated.emails = extracted.emails.filter(email => \n    /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(email)\n  );\n}\n\nvalidated.raw = extracted;\nvalidated.fileName = $node['Watch Folder'].json.fileName;\nvalidated.processedAt = new Date().toISOString();\n\nreturn validated;"
      },
      "typeVersion": 1
    },
    {
      "id": "route-by-type",
      "name": "문서 유형별 라우팅",
      "type": "n8n-nodes-base.switch",
      "position": [
        1250,
        300
      ],
      "parameters": {
        "conditions": {
          "string": [
            {
              "value1": "={{ $json.documentType }}",
              "value2": "invoice",
              "operation": "equals"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "store-invoice",
      "name": "인보이스 데이터 저장",
      "type": "n8n-nodes-base.postgres",
      "position": [
        1450,
        250
      ],
      "parameters": {
        "table": "invoices",
        "columns": "invoice_number,vendor,amount,date,raw_data",
        "operation": "insert"
      },
      "typeVersion": 1
    },
    {
      "id": "store-other",
      "name": "기타 문서 저장",
      "type": "n8n-nodes-base.postgres",
      "position": [
        1450,
        350
      ],
      "parameters": {
        "table": "documents",
        "columns": "type,content,metadata,processed_at",
        "operation": "insert"
      },
      "typeVersion": 1
    },
    {
      "id": "export-csv",
      "name": "Export to CSV",
      "type": "n8n-nodes-base.writeBinaryFile",
      "position": [
        1650,
        300
      ],
      "parameters": {
        "fileName": "extracted_data_{{ $now.format('yyyy-MM-dd') }}.csv",
        "fileContent": "={{ $items().map(item => item.json).toCsv() }}"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "file-trigger": {
      "main": [
        [
          {
            "node": "pdfvector-parse",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "store-invoice": {
      "main": [
        [
          {
            "node": "export-csv",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "store-other": {
      "main": [
        [
          {
            "node": "export-csv",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "validate-data": {
      "main": [
        [
          {
            "node": "route-by-type",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "route-by-type": {
      "main": [
        [
          {
            "node": "store-invoice",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "store-other",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "extract-data": {
      "main": [
        [
          {
            "node": "validate-data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "pdfvector-parse": {
      "main": [
        [
          {
            "node": "extract-data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

자주 묻는 질문

이 워크플로우를 어떻게 사용하나요?

위의 JSON 구성 코드를 복사하여 n8n 인스턴스에서 새 워크플로우를 생성하고 "JSON에서 가져오기"를 선택한 후, 구성을 붙여넣고 필요에 따라 인증 설정을 수정하세요.

이 워크플로우는 어떤 시나리오에 적합한가요?

중급 - 문서 추출, 멀티모달 AI

유료인가요?

이 워크플로우는 완전히 무료이며 직접 가져와 사용할 수 있습니다. 다만, 워크플로우에서 사용하는 타사 서비스(예: OpenAI API)는 사용자 직접 비용을 지불해야 할 수 있습니다.

GPT-4, PDFVector, PostgreSQL를 사용하여 문서에서 데이터 추출

사용된 노드 (9)

카테고리

이 워크플로우를 어떻게 사용하나요?

이 워크플로우는 어떤 시나리오에 적합한가요?

유료인가요?

관련 워크플로우 추천

카테고리