GPT-4、PDFVector、PostgreSQLエクスポートを使用して文書からデータを抽出

中級

これはDocument Extraction, Multimodal AI分野の自動化ワークフローで、9個のノードを含みます。主にCode, OpenAi, Switch, Postgres, PdfVectorなどのノードを使用。 ドキュメントからデータをGPT-4、PDFVector、PostgreSQLで出力する

前提条件
  • OpenAI API Key
  • PostgreSQLデータベース接続情報
ワークフロープレビュー
ノード接続関係を可視化、ズームとパンをサポート
ワークフローをエクスポート
以下のJSON設定をn8nにインポートして、このワークフローを使用できます
{
  "meta": {
    "instanceId": "placeholder"
  },
  "nodes": [
    {
      "id": "workflow-info",
      "name": "パイプライン情報",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        250,
        150
      ],
      "parameters": {
        "content": "## Document Extraction Pipeline\n\nExtracts structured data from:\n- Invoices\n- Contracts\n- Reports\n- Forms\n\nCustomize extraction rules in the AI node"
      },
      "typeVersion": 1
    },
    {
      "id": "file-trigger",
      "name": "フォルダ監視",
      "type": "n8n-nodes-base.localFileTrigger",
      "notes": "Triggers when new documents arrive",
      "position": [
        450,
        300
      ],
      "parameters": {
        "path": "/documents/incoming",
        "events": [
          "file:created"
        ]
      },
      "typeVersion": 1
    },
    {
      "id": "pdfvector-parse",
      "name": "PDF Vector - ドキュメント解析",
      "type": "n8n-nodes-pdfvector.pdfVector",
      "notes": "Parse with LLM for better extraction",
      "position": [
        650,
        300
      ],
      "parameters": {
        "useLlm": "always",
        "resource": "document",
        "operation": "parse",
        "documentUrl": "={{ $json.filePath }}"
      },
      "typeVersion": 1
    },
    {
      "id": "extract-data",
      "name": "構造化データ抽出",
      "type": "n8n-nodes-base.openAi",
      "position": [
        850,
        300
      ],
      "parameters": {
        "model": "gpt-4",
        "options": {
          "responseFormat": {
            "type": "json_object"
          }
        },
        "messages": {
          "values": [
            {
              "content": "Extract the following information from this document:\n\n1. Document Type (invoice, contract, report, etc.)\n2. Date/Dates mentioned\n3. Parties involved (names, companies)\n4. Key amounts/values\n5. Important terms or conditions\n6. Reference numbers\n7. Addresses\n8. Contact information\n\nDocument content:\n{{ $json.content }}\n\nReturn as structured JSON."
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "validate-data",
      "name": "データ検証・クリーニング",
      "type": "n8n-nodes-base.code",
      "position": [
        1050,
        300
      ],
      "parameters": {
        "functionCode": "// Validate and clean extracted data\nconst extracted = JSON.parse($json.content);\nconst validated = {};\n\n// Validate document type\nvalidated.documentType = extracted.documentType || 'unknown';\n\n// Parse and validate dates\nif (extracted.date) {\n  const date = new Date(extracted.date);\n  validated.date = isNaN(date) ? null : date.toISOString();\n}\n\n// Clean monetary values\nif (extracted.amounts) {\n  validated.amounts = extracted.amounts.map(amt => {\n    const cleaned = amt.replace(/[^0-9.-]/g, '');\n    return parseFloat(cleaned) || 0;\n  });\n}\n\n// Validate email addresses\nif (extracted.emails) {\n  validated.emails = extracted.emails.filter(email => \n    /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(email)\n  );\n}\n\nvalidated.raw = extracted;\nvalidated.fileName = $node['Watch Folder'].json.fileName;\nvalidated.processedAt = new Date().toISOString();\n\nreturn validated;"
      },
      "typeVersion": 1
    },
    {
      "id": "route-by-type",
      "name": "ドキュメント種別によるルーティング",
      "type": "n8n-nodes-base.switch",
      "position": [
        1250,
        300
      ],
      "parameters": {
        "conditions": {
          "string": [
            {
              "value1": "={{ $json.documentType }}",
              "value2": "invoice",
              "operation": "equals"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "store-invoice",
      "name": "請求書データ保存",
      "type": "n8n-nodes-base.postgres",
      "position": [
        1450,
        250
      ],
      "parameters": {
        "table": "invoices",
        "columns": "invoice_number,vendor,amount,date,raw_data",
        "operation": "insert"
      },
      "typeVersion": 1
    },
    {
      "id": "store-other",
      "name": "その他ドキュメント保存",
      "type": "n8n-nodes-base.postgres",
      "position": [
        1450,
        350
      ],
      "parameters": {
        "table": "documents",
        "columns": "type,content,metadata,processed_at",
        "operation": "insert"
      },
      "typeVersion": 1
    },
    {
      "id": "export-csv",
      "name": "Export to CSV",
      "type": "n8n-nodes-base.writeBinaryFile",
      "position": [
        1650,
        300
      ],
      "parameters": {
        "fileName": "extracted_data_{{ $now.format('yyyy-MM-dd') }}.csv",
        "fileContent": "={{ $items().map(item => item.json).toCsv() }}"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "file-trigger": {
      "main": [
        [
          {
            "node": "pdfvector-parse",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "store-invoice": {
      "main": [
        [
          {
            "node": "export-csv",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "store-other": {
      "main": [
        [
          {
            "node": "export-csv",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "validate-data": {
      "main": [
        [
          {
            "node": "route-by-type",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "route-by-type": {
      "main": [
        [
          {
            "node": "store-invoice",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "store-other",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "extract-data": {
      "main": [
        [
          {
            "node": "validate-data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "pdfvector-parse": {
      "main": [
        [
          {
            "node": "extract-data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
よくある質問

このワークフローの使い方は?

上記のJSON設定コードをコピーし、n8nインスタンスで新しいワークフローを作成して「JSONからインポート」を選択、設定を貼り付けて認証情報を必要に応じて変更してください。

このワークフローはどんな場面に適していますか?

中級 - 文書抽出, マルチモーダルAI

有料ですか?

このワークフローは完全無料です。ただし、ワークフローで使用するサードパーティサービス(OpenAI APIなど)は別途料金が発生する場合があります。

ワークフロー情報
難易度
中級
ノード数9
カテゴリー2
ノードタイプ8
難易度説明

経験者向け、6-15ノードの中程度の複雑さのワークフロー

作成者
PDF Vector

PDF Vector

@pdfvector

A fully featured PDF APIs for developers - Parse any PDF or Word document, extract structured data, and access millions of academic papers - all through simple APIs.

外部リンク
n8n.ioで表示

このワークフローを共有

カテゴリー

カテゴリー: 34