
Daily RAG Research Paper Hub with arXiv, Gemini AI, and Notion

Advanced

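This is an automation workflow in the Content Creation and Multimodal AI category, containing 22 nodes. It mainly uses the If, Code, Gmail, Notion, and Switch nodes. On a daily schedule it queries the arXiv API for RAG papers, has Gemini summarize and label them, stores the results in Notion, and sends the daily summary through Gmail and Feishu.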
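The retrieval step can be sketched outside n8n as plain JavaScript. This is a minimal sketch: the helper names `buildDateWindow` and `buildArxivUrl` are hypothetical and do not appear in the workflow. It assumes the same one-day window format (`YYYYMMDD0000` to `YYYYMMDD2359`) and the same `search_query` string that the workflow's date Code node and "arXiv API" node use.

```javascript
// Hypothetical helpers illustrating the date window and query URL.
// They mirror the logic of the workflow's date Code node and HTTP Request node.

// Return the submission window for the day `offsetDays` before `now`,
// formatted as YYYYMMDDHHMM strings in the machine's local time.
function buildDateWindow(now, offsetDays) {
  const day = new Date(now);
  day.setDate(day.getDate() - offsetDays); // setDate handles month rollover
  const y = day.getFullYear();
  const m = String(day.getMonth() + 1).padStart(2, '0');
  const d = String(day.getDate()).padStart(2, '0');
  return { from: `${y}${m}${d}0000`, to: `${y}${m}${d}2359` };
}

// Build the arXiv API query URL for papers matching "RAG" in that window.
function buildArxivUrl({ from, to }) {
  return 'https://export.arxiv.org/api/query?search_query=' +
    `all:RAG+AND+submittedDate:[${from}+TO+${to}]`;
}

// Example: a run at 06:00 on 15 September 2025 with a 2-day offset.
const exampleWindow = buildDateWindow(new Date(2025, 8, 15, 6, 0), 2);
console.log(exampleWindow.from, exampleWindow.to);
console.log(buildArxivUrl(exampleWindow));
```

The offset is a parameter here because the exported workflow subtracts two days, although its date node is named T-1. Adjust it to match the day you want to query.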

Prerequisites
  • Google account and Gmail API credentials
  • Notion API key
  • Authentication credentials for the target API may be required
  • Google Gemini API key
Export Workflow
Copy the JSON configuration below and import it into n8n to use this workflow.
{
  "meta": {
    "instanceId": "a6011e4876c6b1225fa48dae1dbfa92e1932a633b3186bbb7bfd5c9e6ad2d878"
  },
  "nodes": [
    {
      "id": "7e9f18f1-edfe-4af6-835b-12fe16a99034",
      "name": "基础 LLM 链",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        272,
        0
      ],
      "parameters": {
        "text": "={{ $json.data }}",
        "batching": {},
        "messages": {
          "messageValues": [
            {
              "message": "You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:\n\n1. RAG Relevance and Labeling:\n   - Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.\n   - For each data item, add three new fields:\n     - `RAG_TF`: \"T\" if related, \"F\" if not\n     - `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty\n     - `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / …); otherwise, leave empty\n\n2. RAG Method Extraction:\n   - Analyze the `summary` and extract the RAG method proposed in the paper.\n   - Store it in the new field `RAG_NAME`.\n\n3. External Link Extraction:\n   - Analyze the `summary` content for `github` or `huggingface` links.\n   - If present, extract the URLs and populate the existing `github` and `huggingface` fields.\n   - If not present, leave them unchanged.\n\nOutput Format: standard JSON\n\nExample:\n\nGiven a data item with the following `summary`:\n\n\"summary\":\"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. 
MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer\n"
            }
          ]
        },
        "promptType": "define"
      },
      "typeVersion": 1.7
    },
    {
      "id": "92d37dc1-aaaf-47ec-987a-e6d23c93e055",
      "name": "Google Gemini 聊天模型",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        272,
        144
      ],
      "parameters": {
        "options": {},
        "modelName": "=models/gemini-2.5-flash"
      },
      "credentials": {
        "googlePalmApi": {
          "id": "ra9slZSGvLJTHQw1",
          "name": "Google Gemini(PaLM) Api account"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
      "name": "提交日期:T-1",
      "type": "n8n-nodes-base.code",
      "position": [
        -1664,
        320
      ],
      "parameters": {
        "jsCode": "// Function 节点代码\nconst now = new Date();\nconst yesterday = new Date(now);\nyesterday.setDate(now.getDate() - 2);\n\nconst y = yesterday.getFullYear();\nconst m = String(yesterday.getMonth() + 1).padStart(2, '0');\nconst d = String(yesterday.getDate()).padStart(2, '0');\n\nreturn [\n  {\n    json: {\n      from: `${y}${m}${d}0000`,\n      to: `${y}${m}${d}2359`\n    }\n  }\n];\n"
      },
      "typeVersion": 2
    },
    {
      "id": "c3685631-8bbd-409a-978a-fbb3e9847115",
      "name": "如果",
      "type": "n8n-nodes-base.if",
      "position": [
        -160,
        16
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "de0a5a7e-67dd-4dd0-8ccc-3406e17bd09c",
              "operator": {
                "type": "number",
                "operation": "notEquals"
              },
              "leftValue": "={{ $json.paperCount }}",
              "rightValue": 0
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "4dd24343-1872-472d-8d7d-4cd28a9dbabe",
      "name": "计划触发器",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        -1856,
        320
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "triggerAtHour": 6
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
      "name": "飞书",
      "type": "n8n-nodes-base.switch",
      "position": [
        576,
        720
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "7b804f5e-6702-4d4a-99b9-3f06f8eb20d4",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $json.type }}",
                    "rightValue": "feishu"
                  }
                ]
              }
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 3.2
    },
    {
      "id": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
      "name": "飞书 POST",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        800,
        720
      ],
      "parameters": {
        "url": "=",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "bodyParameters": {
          "parameters": [
            {
              "name": "msg_type",
              "value": "={{ $json.msg_type }}"
            },
            {
              "name": "content",
              "value": "={{ $json.content }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
      "name": "Gmail",
      "type": "n8n-nodes-base.switch",
      "position": [
        576,
        544
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "3222832c-bbf2-46a2-abd8-2bb14095b7bf",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $json.type }}",
                    "rightValue": "gmail"
                  }
                ]
              }
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 3.2
    },
    {
      "id": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
      "name": "发送消息",
      "type": "n8n-nodes-base.gmail",
      "position": [
        800,
        544
      ],
      "webhookId": "cb0a1f30-59e0-4505-af24-db689d9c1f23",
      "parameters": {
        "sendTo": "xing.adam@gmail.com",
        "message": "={{ $json.message }}",
        "options": {},
        "subject": "={{ $json.subject }}"
      },
      "credentials": {
        "gmailOAuth2": {
          "id": "WoyY5hj4D93bD2Fp",
          "name": "Gmail account"
        }
      },
      "typeVersion": 2.1
    },
    {
      "id": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
      "name": "向模型发送消息",
      "type": "@n8n/n8n-nodes-langchain.googleGemini",
      "position": [
        -1040,
        320
      ],
      "parameters": {
        "modelId": {
          "__rl": true,
          "mode": "list",
          "value": "models/gemini-2.5-flash-lite",
          "cachedResultName": "models/gemini-2.5-flash-lite"
        },
        "options": {},
        "messages": {
          "values": [
            {
              "role": "model",
              "content": "You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:\n\n1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary\n2. Set the daily date field `Date`: yyyy-mm-dd\n3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.\n4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries reflect the comprehensive summary of all papers for the day.\n5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.\n\nExample: If there are papers:\n{\n  \"Number of papers\":\"2025-09-13 paper summary\",\n  \"Date\":2025-09-13,\n  \"Number of papers\": 2,\n  \"SUMMARY_CN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.\",\n  \"SUMMARY_EN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. 
The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency.\"\n}\n\nIf the number of papers is 0, maintain the JSON structure:\n{\n  \"Number of papers\":\"2025-09-13 paper summary\",\n  \"Date\":2025-09-13,\n  \"Number of papers\": 0,\n  \"SUMMARY_CN\": \"\",\n  \"SUMMARY_EN\": \"\"\n}"
            },
            {
              "content": "={{ $json.data }}"
            }
          ]
        },
        "simplify": false
      },
      "credentials": {
        "googlePalmApi": {
          "id": "ra9slZSGvLJTHQw1",
          "name": "Google Gemini(PaLM) Api account"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "024c6399-857e-45a3-a15d-8b733e16da67",
      "name": "RAG 每日论文摘要",
      "type": "n8n-nodes-base.notion",
      "position": [
        800,
        320
      ],
      "parameters": {
        "title": "={{ $json.title }}",
        "simple": false,
        "options": {},
        "resource": "databasePage",
        "databaseId": {
          "__rl": true,
          "mode": "list",
          "value": "26fa136d-cee4-8092-8b85-cf9e9cbc424f",
          "cachedResultUrl": "https://www.notion.so/26fa136dcee480928b85cf9e9cbc424f",
          "cachedResultName": "RAG Daily Paper Summary"
        },
        "propertiesUi": {
          "propertyValues": [
            {
              "key": "DATE|date",
              "date": "={{ $json.date }}"
            },
            {
              "key": "Number of papers|number",
              "numberValue": "={{ $json.paperCount }}"
            },
            {
              "key": "SUMMARY_EN|rich_text",
              "textContent": "={{ $json.summaryEN }}"
            },
            {
              "key": "SUMMARY_CN|rich_text",
              "textContent": "={{ $json.summaryCN }}"
            }
          ]
        }
      },
      "credentials": {
        "notionApi": {
          "id": "BNsFk38kgqvRDJpX",
          "name": "Notion account"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
      "name": "JSON 格式",
      "type": "n8n-nodes-base.code",
      "position": [
        -688,
        320
      ],
      "parameters": {
        "jsCode": "const items = $input.all();\nconst response = items[0].json;\n\ntry {\n  // Extract text content from Gemini API response\n  // Note: response is directly an object, not an array\n  const text = response.candidates[0].content.parts[0].text;\n  \n  // Extract JSON content\n  const jsonMatch = text.match(/```json\\n([\\s\\S]*?)\\n```/);\n  const jsonStr = jsonMatch[1];\n  \n  // Parse JSON\n  const data = JSON.parse(jsonStr);\n  \n  // Manually handle duplicate keys - extract from original string\n  const titleMatch = jsonStr.match(/\"Number of papers\":\\s*\"([^\"]+)\"/);\n  const countMatch = jsonStr.match(/\"Number of papers\":\\s*(\\d+)/);\n  \n  // Construct result\n  items[0].json = {\n    title: titleMatch ? titleMatch[1] : '',\n    date: data.Date || '',\n    paperCount: countMatch ? parseInt(countMatch[1]) : 0,\n    summaryCN: data.SUMMARY_CN || '',\n    summaryEN: data.SUMMARY_EN || ''\n  };\n  \n} catch (error) {\n  items[0].json = {\n    error: error.message,\n    originalData: response\n  };\n}\n\nreturn items;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "f1a331fa-d830-4656-b108-7e18e7430b04",
      "name": "便签3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1984,
        544
      ],
      "parameters": {
        "width": 736,
        "height": 768,
        "content": "## 1. 数据检索"
      },
      "typeVersion": 1
    },
    {
      "id": "ae855e91-2363-4b97-8933-761934b269fe",
      "name": "arXiv API",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -1440,
        320
      ],
      "parameters": {
        "url": "=https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[{{$json[\"from\"]}}+TO+{{$json[\"to\"]}}]",
        "options": {},
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "={{ $json.from }}"
            },
            {
              "name": "={{ $json.to }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
      "name": "消息构建",
      "type": "n8n-nodes-base.code",
      "position": [
        -128,
        528
      ],
      "parameters": {
        "jsCode": "// Get current date\nconst now = new Date();\nconst year = now.getFullYear();\nconst month = String(now.getMonth() + 1).padStart(2, '0');\nconst day = String(now.getDate()).padStart(2, '0');\nconst date = `${year}-${month}-${day}`;\n\n// Get input data\nconst inputData = $input.first().json;\n\n// Generate message content\nconst messageContent = inputData.SUMMARY_CN;\n\n// Gmail message body\nconst gmailMessage = {\n    subject: inputData.title || `Daily Paper Summary - ${date}`,\n    message: `<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">\n<head>\n    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n    <title> RAG Daily Paper Summary - ${date}</title>\n    <style type=\"text/css\">\n        /* Gmail safe styles */\n        body {\n            font-family: Arial, sans-serif;\n            line-height: 1.4;\n            margin: 0;\n            padding: 0;\n            background-color: #f9f9f9;\n            color: #333333;\n        }\n        \n        table {\n            border-collapse: collapse;\n            mso-table-lspace: 0pt;\n            mso-table-rspace: 0pt;\n        }\n        \n        .email-wrapper {\n            width: 100%;\n            background-color: #f9f9f9;\n            padding: 40px 20px;\n        }\n        \n        .email-container {\n            width: 100%;\n            max-width: 600px;\n            margin: 0 auto;\n            background-color: #ffffff;\n            border-radius: 8px;\n            box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);\n        }\n        \n        .header {\n            background-color: #2563eb;\n            padding: 24px;\n            text-align: center;\n            border-radius: 8px 8px 0 0;\n        }\n        \n        .header h1 {\n            
margin: 0 0 8px 0;\n            font-size: 24px;\n            font-weight: 600;\n            color: #ffffff;\n        }\n        \n        .date {\n            font-size: 14px;\n            color: #ffffff;\n            opacity: 0.9;\n        }\n        \n        .stats {\n            background-color: #f1f5f9;\n            padding: 16px 24px;\n            font-size: 14px;\n            color: #64748b;\n        }\n        \n        .content {\n            padding: 32px 24px 40px 24px;\n        }\n        \n        .section {\n            margin-bottom: 24px;\n        }\n        \n        .section-title {\n            font-size: 16px;\n            font-weight: 600;\n            color: #1e293b;\n            margin-bottom: 12px;\n            padding-bottom: 8px;\n            border-bottom: 1px solid #e2e8f0;\n        }\n        \n        .flag {\n            display: inline-block;\n            width: 20px;\n            height: 14px;\n            margin-right: 8px;\n            border-radius: 2px;\n            vertical-align: middle;\n        }\n        \n        .flag-cn {\n            background-color: #de2910;\n        }\n        \n        .flag-en {\n            background-color: #012169;\n        }\n        \n        .summary {\n            font-size: 14px;\n            line-height: 1.6;\n            color: #475569;\n            padding: 16px;\n            background-color: #f8fafc;\n            border-radius: 6px;\n            border-left: 3px solid #2563eb;\n        }\n        \n        .divider {\n            height: 1px;\n            background-color: #e2e8f0;\n            margin: 20px 0;\n            border: none;\n        }\n        \n        /* Mobile responsive */\n        @media screen and (max-width: 600px) {\n            .email-wrapper {\n                padding: 20px 10px !important;\n            }\n            \n            .header, .stats {\n                padding: 20px 16px !important;\n            }\n            \n            .content {\n            
    padding: 24px 16px 32px 16px !important;\n            }\n            \n            .email-container {\n                border-radius: 0;\n            }\n        }\n        \n        /* Gmail specific fixes */\n        .gmail-fix {\n            display: none;\n        }\n        \n        /* Outlook specific fixes */\n        .ExternalClass {\n            width: 100%;\n        }\n        \n        .ExternalClass,\n        .ExternalClass p,\n        .ExternalClass span,\n        .ExternalClass font,\n        .ExternalClass td,\n        .ExternalClass div {\n            line-height: 100%;\n        }\n    </style>\n    <!--[if mso]>\n    <style type=\"text/css\">\n        .email-container {\n            width: 600px !important;\n        }\n    </style>\n    <![endif]-->\n</head>\n<body>\n    <table role=\"presentation\" class=\"email-wrapper\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n        <tr>\n            <td align=\"center\">\n                <table role=\"presentation\" class=\"email-container\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n                    <!-- Header -->\n                    <tr>\n                        <td class=\"header\">\n                            <h1>RAG Daily Papers</h1>\n                            <div class=\"date\">${inputData.Date || date}</div>\n                        </td>\n                    </tr>\n                    \n                    <!-- Stats -->\n                    <tr>\n                        <td class=\"stats\">\n                            <strong>${inputData[\"Number of papers\"] || inputData.paperCount || 0} papers</strong> reviewed today\n                        </td>\n                    </tr>\n                    \n                    <!-- Content -->\n                    <tr>\n                        <td class=\"content\">\n                            <!-- Chinese Section -->\n                            <div class=\"section\">\n                                <h2 
class=\"section-title\">\n                                  🇨🇳 Chinese\n                                </h2>\n                                <div class=\"summary\">\n                                    ${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}\n                                </div>\n                            </div>\n                            \n                            <!-- Divider -->\n                            <hr class=\"divider\">\n                            \n                            <!-- English Section -->\n                            <div class=\"section\">\n                                <h2 class=\"section-title\">\n                                    🇺🇸 English\n                                </h2>\n                                <div class=\"summary\">\n                                    ${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}\n                                </div>\n                            </div>\n                        </td>\n                    </tr>\n                </table>\n            </td>\n        </tr>\n    </table>\n</body>\n</html>`\n};\n\n// Feishu message body\nconst feishuMessage = {\n    msg_type: \"text\",\n    content: {\n        text: `Today ${$input.first().json.date} ${$input.first().json.paperCount}  papers. ${$input.first().json.summaryEN} ${$input.first().json.summaryCN}`\n    }\n};\n\n// n8n output format\nreturn [\n    { json: { type: \"gmail\", ...gmailMessage } },\n    { json: { type: \"feishu\", ...feishuMessage } }\n];\n"
      },
      "typeVersion": 2
    },
    {
      "id": "2582c7df-9b15-4473-bc47-91cf6f7304e0",
      "name": "便签",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -176,
        896
      ],
      "parameters": {
        "width": 1152,
        "height": 576,
        "content": "## 5. 消息推送"
      },
      "typeVersion": 1
    },
    {
      "id": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
      "name": "RAG 每日论文",
      "type": "n8n-nodes-base.notion",
      "position": [
        800,
        0
      ],
      "parameters": {
        "title": "={{ $json.title }}",
        "simple": false,
        "blockUi": {
          "blockValues": [
            {
              "textContent": "={{ $json.summary }}"
            }
          ]
        },
        "options": {},
        "resource": "databasePage",
        "databaseId": {
          "__rl": true,
          "mode": "list",
          "value": "26ba136d-cee4-8029-ad3d-e0e8ac64993f",
          "cachedResultUrl": "https://www.notion.so/26ba136dcee48029ad3de0e8ac64993f",
          "cachedResultName": "RAG DAILY"
        },
        "propertiesUi": {
          "propertyValues": [
            {
              "key": "published|date",
              "date": "={{ $json.published }}"
            },
            {
              "key": "summary|rich_text",
              "textContent": "={{ $json.summary }}"
            },
            {
              "key": "id|rich_text",
              "textContent": "={{ $json.id }}"
            },
            {
              "key": "html_url|url",
              "urlValue": "={{ $json.html_url }}"
            },
            {
              "key": "pdf_url|url",
              "urlValue": "={{ $json.pdf_url }}"
            },
            {
              "key": "primary_category|rich_text",
              "textContent": "={{ $json.primary_category }}"
            },
            {
              "key": "github|url",
              "urlValue": "={{ $json.github }}",
              "ignoreIfEmpty": true
            },
            {
              "key": "huggingface|url",
              "urlValue": "={{ $json.huggingface }}",
              "ignoreIfEmpty": true
            },
            {
              "key": "RAG_TF|rich_text",
              "textContent": "={{ $json.RAG_TF }}"
            },
            {
              "key": "RAG_REASON|rich_text",
              "textContent": "={{ $json.RAG_REASON }}"
            },
            {
              "key": "RAG_Category|rich_text",
              "textContent": "={{ $json.RAG_Category }}"
            },
            {
              "key": "RAG_NAME|rich_text",
              "textContent": "={{ $json.RAG_NAME }}"
            },
            {
              "key": "updated|date",
              "date": "={{ $json.updated }}"
            },
            {
              "key": "author|multi_select",
              "multiSelectValue": "={{ $json.authors }}"
            },
            {
              "key": "category|multi_select",
              "multiSelectValue": "={{ $json.categories }}"
            }
          ]
        }
      },
      "credentials": {
        "notionApi": {
          "id": "BNsFk38kgqvRDJpX",
          "name": "Notion account"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
      "name": "数据提取",
      "type": "n8n-nodes-base.code",
      "position": [
        112,
        0
      ],
      "parameters": {
        "jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n    return [{\n        json: {\n            error: \"XML data not found. Please ensure the input contains XML content\",\n            message: \"Check the field names in the input data\",\n            success: false\n        }\n    }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n    if (!isoString) return '';\n    \n    try {\n        const date = new Date(isoString);\n        if (isNaN(date.getTime())) return '';\n        \n        const year = date.getFullYear();\n        const month = String(date.getMonth() + 1).padStart(2, '0');\n        const day = String(date.getDate()).padStart(2, '0');\n        const hours = String(date.getUTCHours()).padStart(2, '0');\n        const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n        const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n        \n        return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n    } catch (error) {\n        return '';\n    }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n    const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n    const match = xml.match(regex);\n    return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n    // Fixed link extraction to fit actual XML format\n    // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n    const patterns = [\n        new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n        new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n    const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n    const authors = [];\n    \n    for (const block of authorBlocks) {\n        const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n        if (nameMatch && nameMatch[1].trim()) {\n            authors.push(nameMatch[1].trim());\n        }\n    }\n    \n    return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n    const categories = [];\n    const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n    let match;\n    \n    while ((match = regex.exec(entryXml)) !== null) {\n        if (match[1]) {\n            categories.push(match[1]);\n        }\n    }\n    \n    return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n    // Handle namespace-prefixed primary category extraction\n    const patterns = [\n        /primary_category[^>]*term=\"([^\"]*)\"/i,\n        /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// New: extract arxiv comment\nfunction 
extractArxivComment(entryXml) {\n    const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n    return commentMatch ? commentMatch[1].trim() : '';\n}\n\ntry {\n    // Extract all entry blocks\n    const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n    const entries = [];\n    let match;\n    \n    while ((match = entryRegex.exec(xmlData)) !== null) {\n        entries.push(match[1]);\n    }\n    \n    if (entries.length === 0) {\n        return [{\n            json: {\n                error: \"No <entry> elements found\",\n                message: \"Please check if the XML data format is correct\",\n                success: false\n            }\n        }];\n    }\n\n    // Process each entry\n    const processedData = [];\n    let processedCount = 0;\n\n    for (let i = 0; i < entries.length; i++) {\n        const entryXml = entries[i];\n        \n        try {\n            const item = {\n                id: extractTagContent(entryXml, 'id'),\n                updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n                published: formatDateTime(extractTagContent(entryXml, 'published')),\n                title: extractTagContent(entryXml, 'title'),\n                summary: extractTagContent(entryXml, 'summary'),\n                authors: extractAuthors(entryXml), // field name changed to authors, returns array\n                html_url: extractLink(entryXml, 'text/html'),\n                pdf_url: extractLink(entryXml, 'application/pdf'),\n                primary_category: extractPrimaryCategory(entryXml),\n                categories: extractCategories(entryXml), // field name changed to categories\n                arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n                github: '',\n                huggingface: ''\n            };\n\n            // Validate required fields\n            if (item.id && item.title) {\n                processedData.push(item);\n                
processedCount++;\n            }\n            \n        } catch (error) {\n            console.log(`Error processing entry ${i+1}: ${error.message}`);\n            // Continue processing next entry\n        }\n    }\n\n    // Return processed results\n    return [{\n        json: {\n            success: true,\n            message: `Successfully processed ${processedCount} entries`,\n            data: processedData,\n            processing_time: new Date().toISOString()\n        }\n    }];\n\n} catch (error) {\n    // Error handling\n    return [{\n        json: {\n            error: \"An error occurred during processing\",\n            message: error.message,\n            success: false\n        }\n    }];\n}\n"
      },
      "typeVersion": 2
    },
    {
      "id": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
      "name": "JSON 格式",
      "type": "n8n-nodes-base.code",
      "position": [
        592,
        0
      ],
      "parameters": {
        "jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n    return [{\n        json: {\n            error: \"XML data not found. Please ensure the input contains XML content\",\n            message: \"Check the field names in the input data\",\n            success: false\n        }\n    }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n    if (!isoString) return '';\n    \n    try {\n        const date = new Date(isoString);\n        if (isNaN(date.getTime())) return '';\n        \n        const year = date.getFullYear();\n        const month = String(date.getMonth() + 1).padStart(2, '0');\n        const day = String(date.getDate()).padStart(2, '0');\n        const hours = String(date.getUTCHours()).padStart(2, '0');\n        const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n        const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n        \n        return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n    } catch (error) {\n        return '';\n    }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n    const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n    const match = xml.match(regex);\n    return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n    // Fixed link extraction to fit actual XML format\n    // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n    const patterns = [\n        new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n        new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n    const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n    const authors = [];\n    \n    for (const block of authorBlocks) {\n        const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n        if (nameMatch && nameMatch[1].trim()) {\n            authors.push(nameMatch[1].trim());\n        }\n    }\n    \n    return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n    const categories = [];\n    const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n    let match;\n    \n    while ((match = regex.exec(entryXml)) !== null) {\n        if (match[1]) {\n            categories.push(match[1]);\n        }\n    }\n    \n    return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n    // Handle namespace-prefixed primary category extraction\n    const patterns = [\n        /primary_category[^>]*term=\"([^\"]*)\"/i,\n        /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// New: extract arxiv comment\nfunction 
extractArxivComment(entryXml) {\n    const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n    return commentMatch ? commentMatch[1].trim() : '';\n}\n\ntry {\n    // Extract all entry blocks\n    const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n    const entries = [];\n    let match;\n    \n    while ((match = entryRegex.exec(xmlData)) !== null) {\n        entries.push(match[1]);\n    }\n    \n    if (entries.length === 0) {\n        return [{\n            json: {\n                error: \"No <entry> elements found\",\n                message: \"Please check if the XML data format is correct\",\n                success: false\n            }\n        }];\n    }\n\n    // Process each entry\n    const processedData = [];\n    let processedCount = 0;\n\n    for (let i = 0; i < entries.length; i++) {\n        const entryXml = entries[i];\n        \n        try {\n            const item = {\n                id: extractTagContent(entryXml, 'id'),\n                updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n                published: formatDateTime(extractTagContent(entryXml, 'published')),\n                title: extractTagContent(entryXml, 'title'),\n                summary: extractTagContent(entryXml, 'summary'),\n                authors: extractAuthors(entryXml), // field name changed to authors, returns array\n                html_url: extractLink(entryXml, 'text/html'),\n                pdf_url: extractLink(entryXml, 'application/pdf'),\n                primary_category: extractPrimaryCategory(entryXml),\n                categories: extractCategories(entryXml), // field name changed to categories\n                arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n                github: '',\n                huggingface: ''\n            };\n\n            // Validate required fields\n            if (item.id && item.title) {\n                processedData.push(item);\n                
processedCount++;\n            }\n            \n        } catch (error) {\n            console.log(`Error processing entry ${i+1}: ${error.message}`);\n            // Continue processing next entry\n        }\n    }\n\n    // Return processed results\n    return [{\n        json: {\n            success: true,\n            message: `Successfully processed ${processedCount} entries`,\n            data: processedData,\n            processing_time: new Date().toISOString()\n        }\n    }];\n\n} catch (error) {\n    // Error handling\n    return [{\n        json: {\n            error: \"An error occurred during processing\",\n            message: error.message,\n            success: false\n        }\n    }];\n}\n"
      },
      "typeVersion": 2
    },
    {
      "id": "8fbefc67-e9f7-4597-b935-d5f5895cf93c",
      "name": "便签1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -160,
        -224
      ],
      "parameters": {
        "width": 656,
        "height": 192,
        "content": "## 3. 数据处理"
      },
      "typeVersion": 1
    },
    {
      "id": "884f2c40-4628-4376-a040-709e2db34c48",
      "name": "便签2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1024,
        16
      ],
      "parameters": {
        "width": 624,
        "height": 368,
        "content": "## 4. 数据存储:Notion 数据库"
      },
      "typeVersion": 1
    },
    {
      "id": "4991129d-9406-4c52-bd8f-87e2721c4a6f",
      "name": "便签4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1088,
        544
      ],
      "parameters": {
        "width": 624,
        "height": 912,
        "content": "## 2. **数据提取**"
      },
      "typeVersion": 1
    }
  ],
  "pinData": {},
  "connections": {
    "If": {
      "main": [
        [
          {
            "node": "Data Extraction",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "gmail": {
      "main": [
        [
          {
            "node": "Send a message",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "FEISHU": {
      "main": [
        [
          {
            "node": "FEISHU POST",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "arXiv API": {
      "main": [
        [
          {
            "node": "Message a model",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "JSON FORMAT": {
      "main": [
        [
          {
            "node": "RAG Daily Paper Summary",
            "type": "main",
            "index": 0
          },
          {
            "node": "If",
            "type": "main",
            "index": 0
          },
          {
            "node": "Message Construction",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "JSON Format": {
      "main": [
        [
          {
            "node": "RAG Daily papers",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Basic LLM Chain": {
      "main": [
        [
          {
            "node": "JSON Format",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Data Extraction": {
      "main": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Message a model": {
      "main": [
        [
          {
            "node": "JSON FORMAT",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Schedule Trigger": {
      "main": [
        [
          {
            "node": "submittedDate:T-1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "submittedDate:T-1": {
      "main": [
        [
          {
            "node": "arXiv API",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Message Construction": {
      "main": [
        [
          {
            "node": "gmail",
            "type": "main",
            "index": 0
          },
          {
            "node": "FEISHU",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    }
  }
}
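The Code nodes in the export format arXiv timestamps as `YYYY-MM-DD HH:mm:ss`. The sketch below is a standalone variant for trying that formatting outside n8n. The name `formatDateTimeUtc` is hypothetical, and reading every component with UTC getters is an assumption made here so that the output does not depend on the server's time zone.

```javascript
// Hypothetical standalone variant of the Code node's date formatting helper.
// Assumption: every component is read with UTC getters, so the result is the
// same regardless of the time zone the n8n host runs in.
function formatDateTimeUtc(isoString) {
  if (!isoString) return '';
  const date = new Date(isoString);
  if (Number.isNaN(date.getTime())) return '';

  const pad = (n) => String(n).padStart(2, '0');
  const datePart = [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
  ].join('-');
  const timePart = [
    pad(date.getUTCHours()),
    pad(date.getUTCMinutes()),
    pad(date.getUTCSeconds()),
  ].join(':');
  return `${datePart} ${timePart}`;
}

console.log(formatDateTimeUtc('2025-01-31T23:30:05Z')); // 2025-01-31 23:30:05
```

Empty or unparseable input returns an empty string, matching the behavior of the helper in the export.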
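FAQ

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, and choose "Import from JSON". Paste the configuration, then adjust the credential settings (Gmail, Notion, Google Gemini) as needed.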
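Before importing, it may help to confirm that every name used in `connections` matches a node `name` in `nodes`, since a renamed or translated node name would leave a connection pointing at nothing. The following is a minimal sketch of such a check. The function `findUnknownConnectionNodes` and the sample data are hypothetical and are not part of n8n.

```javascript
// Hypothetical pre-import check (not part of n8n itself): lists connection
// endpoints that do not match any node "name" in a workflow export.
function findUnknownConnectionNodes(workflow) {
  const nodeNames = new Set((workflow.nodes || []).map((node) => node.name));
  const unknown = new Set();

  for (const [sourceName, outputsByType] of Object.entries(workflow.connections || {})) {
    if (!nodeNames.has(sourceName)) unknown.add(sourceName);
    // Each connection type (e.g. "main") holds an array of output branches,
    // and each branch is an array of { node, type, index } targets.
    for (const branches of Object.values(outputsByType)) {
      for (const branch of branches || []) {
        for (const target of branch || []) {
          if (!nodeNames.has(target.node)) unknown.add(target.node);
        }
      }
    }
  }
  return [...unknown].sort();
}

// Tiny made-up workflow, for illustration only.
const sample = {
  nodes: [{ name: 'Schedule Trigger' }, { name: 'arXiv API' }],
  connections: {
    'Schedule Trigger': { main: [[{ node: 'arXiv API', type: 'main', index: 0 }]] },
    'arXiv API': { main: [[{ node: 'Message a model', type: 'main', index: 0 }]] },
  },
};

console.log(findUnknownConnectionNodes(sample)); // [ 'Message a model' ]
```

To check a real export, parse the copied JSON with `JSON.parse` and pass the result to the function. An empty array means every connection endpoint has a matching node.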

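What scenarios is this workflow suited for?

Advanced - Content Creation, Multimodal AI

Does it cost anything?

The workflow itself is completely free and can be imported and used directly. Note that the third-party services it calls (such as the Google Gemini API) may charge fees that you pay yourself.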

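Workflow information
Difficulty level: Advanced
Node count: 22
Categories: 2
Node types: 11
Difficulty notes

Suitable for advanced users; a complex workflow with 16+ nodes.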
