Daily RAG Research Paper Hub with arXiv, Gemini AI, and Notion

Expert

This is an automation workflow in the Content Creation and Multimodal AI categories, with 22 nodes. It mainly uses If, Code, Gmail, Notion, and Switch nodes, among others. It builds a daily RAG research paper hub with arXiv, Gemini AI, and Notion.

Prerequisites
  • Google account + Gmail API credentials
  • Notion API key
  • Credentials for the target API may be required
  • Google Gemini API key
Export workflow
Copy the following JSON configuration and import it into n8n:
{
  "meta": {
    "instanceId": "a6011e4876c6b1225fa48dae1dbfa92e1932a633b3186bbb7bfd5c9e6ad2d878"
  },
  "nodes": [
    {
      "id": "7e9f18f1-edfe-4af6-835b-12fe16a99034",
      "name": "Basic LLM Kette",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        272,
        0
      ],
      "parameters": {
        "text": "={{ $json.data }}",
        "batching": {},
        "messages": {
          "messageValues": [
            {
              "message": "You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:\n\n1. RAG Relevance and Labeling:\n   - Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.\n   - For each data item, add three new fields:\n     - `RAG_TF`: \"T\" if related, \"F\" if not\n     - `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty\n     - `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / …); otherwise, leave empty\n\n2. RAG Method Extraction:\n   - Analyze the `summary` and extract the RAG method proposed in the paper.\n   - Store it in the new field `RAG_NAME`.\n\n3. External Link Extraction:\n   - Analyze the `summary` content for `github` or `huggingface` links.\n   - If present, extract the URLs and populate the existing `github` and `huggingface` fields.\n   - If not present, leave them unchanged.\n\nOutput Format: standard JSON\n\nExample:\n\nGiven a data item with the following `summary`:\n\n\"summary\":\"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. 
MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer\n"
            }
          ]
        },
        "promptType": "define"
      },
      "typeVersion": 1.7
    },
    {
      "id": "92d37dc1-aaaf-47ec-987a-e6d23c93e055",
      "name": "Google Gemini-Chat-Modell",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        272,
        144
      ],
      "parameters": {
        "options": {},
        "modelName": "=models/gemini-2.5-flash"
      },
      "credentials": {
        "googlePalmApi": {
          "id": "ra9slZSGvLJTHQw1",
          "name": "Google Gemini(PaLM) Api account"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
      "name": "submittedDate:T-1",
      "type": "n8n-nodes-base.code",
      "position": [
        -1664,
        320
      ],
      "parameters": {
        "jsCode": "// Function node code\n// Build the arXiv submittedDate window (YYYYMMDDHHMM) covering one full day.\n// Note: the offset is 2 days back, even though the node is named T-1.\nconst now = new Date();\nconst targetDay = new Date(now);\ntargetDay.setDate(now.getDate() - 2);\n\nconst y = targetDay.getFullYear();\nconst m = String(targetDay.getMonth() + 1).padStart(2, '0');\nconst d = String(targetDay.getDate()).padStart(2, '0');\n\nreturn [\n  {\n    json: {\n      from: `${y}${m}${d}0000`,\n      to: `${y}${m}${d}2359`\n    }\n  }\n];\n"
      },
      "typeVersion": 2
    },
    {
      "id": "c3685631-8bbd-409a-978a-fbb3e9847115",
      "name": "If",
      "type": "n8n-nodes-base.if",
      "position": [
        -160,
        16
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "de0a5a7e-67dd-4dd0-8ccc-3406e17bd09c",
              "operator": {
                "type": "number",
                "operation": "notEquals"
              },
              "leftValue": "={{ $json.paperCount }}",
              "rightValue": 0
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "4dd24343-1872-472d-8d7d-4cd28a9dbabe",
      "name": "Zeitplan-Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        -1856,
        320
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "triggerAtHour": 6
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
      "name": "FEISHU",
      "type": "n8n-nodes-base.switch",
      "position": [
        576,
        720
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "7b804f5e-6702-4d4a-99b9-3f06f8eb20d4",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $json.type }}",
                    "rightValue": "feishu"
                  }
                ]
              }
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 3.2
    },
    {
      "id": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
      "name": "FEISHU POST",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        800,
        720
      ],
      "parameters": {
        "url": "=",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "bodyParameters": {
          "parameters": [
            {
              "name": "msg_type",
              "value": "={{ $json.msg_type }}"
            },
            {
              "name": "content",
              "value": "={{ $json.content }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
      "name": "gmail",
      "type": "n8n-nodes-base.switch",
      "position": [
        576,
        544
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "3222832c-bbf2-46a2-abd8-2bb14095b7bf",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $json.type }}",
                    "rightValue": "gmail"
                  }
                ]
              }
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 3.2
    },
    {
      "id": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
      "name": "Send a message",
      "type": "n8n-nodes-base.gmail",
      "position": [
        800,
        544
      ],
      "webhookId": "cb0a1f30-59e0-4505-af24-db689d9c1f23",
      "parameters": {
        "sendTo": "xing.adam@gmail.com",
        "message": "={{ $json.message }}",
        "options": {},
        "subject": "={{ $json.subject }}"
      },
      "credentials": {
        "gmailOAuth2": {
          "id": "WoyY5hj4D93bD2Fp",
          "name": "Gmail account"
        }
      },
      "typeVersion": 2.1
    },
    {
      "id": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
      "name": "Message a model",
      "type": "@n8n/n8n-nodes-langchain.googleGemini",
      "position": [
        -1040,
        320
      ],
      "parameters": {
        "modelId": {
          "__rl": true,
          "mode": "list",
          "value": "models/gemini-2.5-flash-lite",
          "cachedResultName": "models/gemini-2.5-flash-lite"
        },
        "options": {},
        "messages": {
          "values": [
            {
              "role": "model",
              "content": "You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:\n\n1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary\n2. Set the daily date field `Date`: yyyy-mm-dd\n3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.\n4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries reflect the comprehensive summary of all papers for the day.\n5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.\n\nExample: If there are papers:\n{\n  \"Number of papers\":\"2025-09-13 paper summary\",\n  \"Date\":2025-09-13,\n  \"Number of papers\": 2,\n  \"SUMMARY_CN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.\",\n  \"SUMMARY_EN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. 
The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency.\"\n}\n\nIf the number of papers is 0, maintain the JSON structure:\n{\n  \"Number of papers\":\"2025-09-13 paper summary\",\n  \"Date\":2025-09-13,\n  \"Number of papers\": 0,\n  \"SUMMARY_CN\": \"\",\n  \"SUMMARY_EN\": \"\"\n}"
            },
            {
              "content": "={{ $json.data }}"
            }
          ]
        },
        "simplify": false
      },
      "credentials": {
        "googlePalmApi": {
          "id": "ra9slZSGvLJTHQw1",
          "name": "Google Gemini(PaLM) Api account"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "024c6399-857e-45a3-a15d-8b733e16da67",
      "name": "RAG Daily Paper Summary",
      "type": "n8n-nodes-base.notion",
      "position": [
        800,
        320
      ],
      "parameters": {
        "title": "={{ $json.title }}",
        "simple": false,
        "options": {},
        "resource": "databasePage",
        "databaseId": {
          "__rl": true,
          "mode": "list",
          "value": "26fa136d-cee4-8092-8b85-cf9e9cbc424f",
          "cachedResultUrl": "https://www.notion.so/26fa136dcee480928b85cf9e9cbc424f",
          "cachedResultName": "RAG Daily Paper Summary"
        },
        "propertiesUi": {
          "propertyValues": [
            {
              "key": "DATE|date",
              "date": "={{ $json.date }}"
            },
            {
              "key": "Number of papers|number",
              "numberValue": "={{ $json.paperCount }}"
            },
            {
              "key": "SUMMARY_EN|rich_text",
              "textContent": "={{ $json.summaryEN }}"
            },
            {
              "key": "SUMMARY_CN|rich_text",
              "textContent": "={{ $json.summaryCN }}"
            }
          ]
        }
      },
      "credentials": {
        "notionApi": {
          "id": "BNsFk38kgqvRDJpX",
          "name": "Notion account"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
      "name": "JSON FORMAT",
      "type": "n8n-nodes-base.code",
      "position": [
        -688,
        320
      ],
      "parameters": {
        "jsCode": "const items = $input.all();\nconst response = items[0].json;\n\ntry {\n  // Extract text content from Gemini API response\n  // Note: response is directly an object, not an array\n  const text = response.candidates[0].content.parts[0].text;\n  \n  // Extract JSON content; fall back to the raw text when no ```json fence is present\n  const jsonMatch = text.match(/```json\\n([\\s\\S]*?)\\n```/);\n  const jsonStr = jsonMatch ? jsonMatch[1] : text.trim();\n  \n  // Parse JSON\n  const data = JSON.parse(jsonStr);\n  \n  // Manually handle duplicate keys - extract from original string\n  const titleMatch = jsonStr.match(/\"Number of papers\":\\s*\"([^\"]+)\"/);\n  const countMatch = jsonStr.match(/\"Number of papers\":\\s*(\\d+)/);\n  \n  // Construct result\n  items[0].json = {\n    title: titleMatch ? titleMatch[1] : '',\n    date: data.Date || '',\n    paperCount: countMatch ? parseInt(countMatch[1], 10) : 0,\n    summaryCN: data.SUMMARY_CN || '',\n    summaryEN: data.SUMMARY_EN || ''\n  };\n  \n} catch (error) {\n  items[0].json = {\n    error: error.message,\n    originalData: response\n  };\n}\n\nreturn items;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "f1a331fa-d830-4656-b108-7e18e7430b04",
      "name": "Haftnotiz3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1984,
        544
      ],
      "parameters": {
        "width": 736,
        "height": 768,
        "content": "## 1. Data Retrieval\n### arXiv API\n\nThe arXiv provides a public API that allows users to query research papers by topic or by predefined categories.\n\n[arXiv API User Manual](https://info.arxiv.org/help/api/user-manual.html#arxiv-api-users-manual)\n\n**Key Notes:**\n\n1. **Response Format**: The API returns data as a typical *Atom Response*.\n2. **Timezone & Update Frequency**:  \n   - The arXiv submission process operates on a 24-hour cycle.  \n   - Newly submitted articles become available in the API only at midnight *after* they have been processed.  \n   - Feeds are updated daily at midnight Eastern Standard Time (EST).  \n   - Therefore, a single request per day is sufficient.  \n3. **Request Limits**:  \n   - The maximum number of results per call (`max_results`) is **30,000**,  \n   - Results must be retrieved in slices of at most **2,000** at a time, using the `max_results` and `start` query parameters.  \n4. **Time Format**:  \n   - The expected format is `[YYYYMMDDTTTT+TO+YYYYMMDDTTTT]`,  \n   - `TTTT` is provided in 24-hour time to the minute, in GMT.\n\n### Scheduled Task\n\n- **Execution Frequency**: Daily  \n- **Execution Time**: 6:00 AM  \n- **Time Parameter Handling (JS)**:  \n  According to arXiv’s update rules, the scheduled task should query the **previous day’s (T-1)** `submittedDate` data.\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "ae855e91-2363-4b97-8933-761934b269fe",
      "name": "arXiv API",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -1440,
        320
      ],
      "parameters": {
        "url": "=https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[{{$json[\"from\"]}}+TO+{{$json[\"to\"]}}]",
        "options": {},
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "={{ $json.from }}"
            },
            {
              "name": "={{ $json.to }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
      "name": "Message Construction",
      "type": "n8n-nodes-base.code",
      "position": [
        -128,
        528
      ],
      "parameters": {
        "jsCode": "// Get current date\nconst now = new Date();\nconst year = now.getFullYear();\nconst month = String(now.getMonth() + 1).padStart(2, '0');\nconst day = String(now.getDate()).padStart(2, '0');\nconst date = `${year}-${month}-${day}`;\n\n// Get input data\nconst inputData = $input.first().json;\n\n// Generate message content\nconst messageContent = inputData.SUMMARY_CN;\n\n// Gmail message body\nconst gmailMessage = {\n    subject: inputData.title || `Daily Paper Summary - ${date}`,\n    message: `<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">\n<head>\n    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n    <title> RAG Daily Paper Summary - ${date}</title>\n    <style type=\"text/css\">\n        /* Gmail safe styles */\n        body {\n            font-family: Arial, sans-serif;\n            line-height: 1.4;\n            margin: 0;\n            padding: 0;\n            background-color: #f9f9f9;\n            color: #333333;\n        }\n        \n        table {\n            border-collapse: collapse;\n            mso-table-lspace: 0pt;\n            mso-table-rspace: 0pt;\n        }\n        \n        .email-wrapper {\n            width: 100%;\n            background-color: #f9f9f9;\n            padding: 40px 20px;\n        }\n        \n        .email-container {\n            width: 100%;\n            max-width: 600px;\n            margin: 0 auto;\n            background-color: #ffffff;\n            border-radius: 8px;\n            box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);\n        }\n        \n        .header {\n            background-color: #2563eb;\n            padding: 24px;\n            text-align: center;\n            border-radius: 8px 8px 0 0;\n        }\n        \n        .header h1 {\n            
margin: 0 0 8px 0;\n            font-size: 24px;\n            font-weight: 600;\n            color: #ffffff;\n        }\n        \n        .date {\n            font-size: 14px;\n            color: #ffffff;\n            opacity: 0.9;\n        }\n        \n        .stats {\n            background-color: #f1f5f9;\n            padding: 16px 24px;\n            font-size: 14px;\n            color: #64748b;\n        }\n        \n        .content {\n            padding: 32px 24px 40px 24px;\n        }\n        \n        .section {\n            margin-bottom: 24px;\n        }\n        \n        .section-title {\n            font-size: 16px;\n            font-weight: 600;\n            color: #1e293b;\n            margin-bottom: 12px;\n            padding-bottom: 8px;\n            border-bottom: 1px solid #e2e8f0;\n        }\n        \n        .flag {\n            display: inline-block;\n            width: 20px;\n            height: 14px;\n            margin-right: 8px;\n            border-radius: 2px;\n            vertical-align: middle;\n        }\n        \n        .flag-cn {\n            background-color: #de2910;\n        }\n        \n        .flag-en {\n            background-color: #012169;\n        }\n        \n        .summary {\n            font-size: 14px;\n            line-height: 1.6;\n            color: #475569;\n            padding: 16px;\n            background-color: #f8fafc;\n            border-radius: 6px;\n            border-left: 3px solid #2563eb;\n        }\n        \n        .divider {\n            height: 1px;\n            background-color: #e2e8f0;\n            margin: 20px 0;\n            border: none;\n        }\n        \n        /* Mobile responsive */\n        @media screen and (max-width: 600px) {\n            .email-wrapper {\n                padding: 20px 10px !important;\n            }\n            \n            .header, .stats {\n                padding: 20px 16px !important;\n            }\n            \n            .content {\n            
    padding: 24px 16px 32px 16px !important;\n            }\n            \n            .email-container {\n                border-radius: 0;\n            }\n        }\n        \n        /* Gmail specific fixes */\n        .gmail-fix {\n            display: none;\n        }\n        \n        /* Outlook specific fixes */\n        .ExternalClass {\n            width: 100%;\n        }\n        \n        .ExternalClass,\n        .ExternalClass p,\n        .ExternalClass span,\n        .ExternalClass font,\n        .ExternalClass td,\n        .ExternalClass div {\n            line-height: 100%;\n        }\n    </style>\n    <!--[if mso]>\n    <style type=\"text/css\">\n        .email-container {\n            width: 600px !important;\n        }\n    </style>\n    <![endif]-->\n</head>\n<body>\n    <table role=\"presentation\" class=\"email-wrapper\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n        <tr>\n            <td align=\"center\">\n                <table role=\"presentation\" class=\"email-container\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n                    <!-- Header -->\n                    <tr>\n                        <td class=\"header\">\n                            <h1>RAG Daily Papers</h1>\n                            <div class=\"date\">${inputData.Date || date}</div>\n                        </td>\n                    </tr>\n                    \n                    <!-- Stats -->\n                    <tr>\n                        <td class=\"stats\">\n                            <strong>${inputData[\"Number of papers\"] || inputData.paperCount || 0} papers</strong> reviewed today\n                        </td>\n                    </tr>\n                    \n                    <!-- Content -->\n                    <tr>\n                        <td class=\"content\">\n                            <!-- Chinese Section -->\n                            <div class=\"section\">\n                                <h2 
class=\"section-title\">\n                                  🇨🇳 Chinese\n                                </h2>\n                                <div class=\"summary\">\n                                    ${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}\n                                </div>\n                            </div>\n                            \n                            <!-- Divider -->\n                            <hr class=\"divider\">\n                            \n                            <!-- English Section -->\n                            <div class=\"section\">\n                                <h2 class=\"section-title\">\n                                    🇺🇸 English\n                                </h2>\n                                <div class=\"summary\">\n                                    ${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}\n                                </div>\n                            </div>\n                        </td>\n                    </tr>\n                </table>\n            </td>\n        </tr>\n    </table>\n</body>\n</html>`\n};\n\n// Feishu message body\nconst feishuMessage = {\n    msg_type: \"text\",\n    content: {\n        text: `Today ${$input.first().json.date} ${$input.first().json.paperCount}  papers. ${$input.first().json.summaryEN} ${$input.first().json.summaryCN}`\n    }\n};\n\n// n8n output format\nreturn [\n    { json: { type: \"gmail\", ...gmailMessage } },\n    { json: { type: \"feishu\", ...feishuMessage } }\n];\n"
      },
      "typeVersion": 2
    },
    {
      "id": "2582c7df-9b15-4473-bc47-91cf6f7304e0",
      "name": "Haftnotiz",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -176,
        896
      ],
      "parameters": {
        "width": 1152,
        "height": 576,
        "content": "## 5. Message Push\n\nSet up two channels for message delivery: **EMAIL** and **IM**, and define the message format and content.\n\n### Email: Gmail\n\n**GMAIL OAuth 2.0 – Official Documentation**  \n[Configure your OAuth consent screen](https://docs.n8n.io/integrations/builtin/credentials/google/oauth-single-service/?utm_source=n8n_app&utm_medium=credential_settings&utm_campaign=create_new_credentials_modal#configure-your-oauth-consent-screen)\n\n**Steps:**\n- Enable Gmail API  \n- Create OAuth consent screen  \n- Create OAuth client credentials  \n- Audience: Add **Test users** under Testing status  \n\n**Message format**: HTML  \n(Model: OpenAI GPT — used to design an HTML email template)\n\n### IM: Feishu (LARK)\n\n**Bots in groups**  \n[Use bots in groups](https://www.larksuite.com/hc/en-US/articles/360048487736-use-bots-in-groups)\n"
      },
      "typeVersion": 1
    },
    {
      "id": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
      "name": "RAG Daily papers",
      "type": "n8n-nodes-base.notion",
      "position": [
        800,
        0
      ],
      "parameters": {
        "title": "={{ $json.title }}",
        "simple": false,
        "blockUi": {
          "blockValues": [
            {
              "textContent": "={{ $json.summary }}"
            }
          ]
        },
        "options": {},
        "resource": "databasePage",
        "databaseId": {
          "__rl": true,
          "mode": "list",
          "value": "26ba136d-cee4-8029-ad3d-e0e8ac64993f",
          "cachedResultUrl": "https://www.notion.so/26ba136dcee48029ad3de0e8ac64993f",
          "cachedResultName": "RAG DAILY"
        },
        "propertiesUi": {
          "propertyValues": [
            {
              "key": "published|date",
              "date": "={{ $json.published }}"
            },
            {
              "key": "summary|rich_text",
              "textContent": "={{ $json.summary }}"
            },
            {
              "key": "id|rich_text",
              "textContent": "={{ $json.id }}"
            },
            {
              "key": "html_url|url",
              "urlValue": "={{ $json.html_url }}"
            },
            {
              "key": "pdf_url|url",
              "urlValue": "={{ $json.pdf_url }}"
            },
            {
              "key": "primary_category|rich_text",
              "textContent": "={{ $json.primary_category }}"
            },
            {
              "key": "github|url",
              "urlValue": "={{ $json.github }}",
              "ignoreIfEmpty": true
            },
            {
              "key": "huggingface|url",
              "urlValue": "={{ $json.huggingface }}",
              "ignoreIfEmpty": true
            },
            {
              "key": "RAG_TF|rich_text",
              "textContent": "={{ $json.RAG_TF }}"
            },
            {
              "key": "RAG_REASON|rich_text",
              "textContent": "={{ $json.RAG_REASON }}"
            },
            {
              "key": "RAG_Category|rich_text",
              "textContent": "={{ $json.RAG_Category }}"
            },
            {
              "key": "RAG_NAME|rich_text",
              "textContent": "={{ $json.RAG_NAME }}"
            },
            {
              "key": "updated|date",
              "date": "={{ $json.updated }}"
            },
            {
              "key": "author|multi_select",
              "multiSelectValue": "={{ $json.authors }}"
            },
            {
              "key": "category|multi_select",
              "multiSelectValue": "={{ $json.categories }}"
            }
          ]
        }
      },
      "credentials": {
        "notionApi": {
          "id": "BNsFk38kgqvRDJpX",
          "name": "Notion account"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
      "name": "Data Extraction",
      "type": "n8n-nodes-base.code",
      "position": [
        112,
        0
      ],
      "parameters": {
        "jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n    return [{\n        json: {\n            error: \"XML data not found. Please ensure the input contains XML content\",\n            message: \"Check the field names in the input data\",\n            success: false\n        }\n    }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n    if (!isoString) return '';\n    \n    try {\n        const date = new Date(isoString);\n        if (isNaN(date.getTime())) return '';\n        \n        // Use UTC getters for every component so the date and time parts agree\n        const year = date.getUTCFullYear();\n        const month = String(date.getUTCMonth() + 1).padStart(2, '0');\n        const day = String(date.getUTCDate()).padStart(2, '0');\n        const hours = String(date.getUTCHours()).padStart(2, '0');\n        const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n        const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n        \n        return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n    } catch (error) {\n        return '';\n    }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n    const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n    const match = xml.match(regex);\n    return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n    // Fixed link extraction to fit actual XML format\n    // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n    const patterns = [\n        new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n        new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n    const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n    const authors = [];\n    \n    for (const block of authorBlocks) {\n        const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n        if (nameMatch && nameMatch[1].trim()) {\n            authors.push(nameMatch[1].trim());\n        }\n    }\n    \n    return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n    const categories = [];\n    const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n    let match;\n    \n    while ((match = regex.exec(entryXml)) !== null) {\n        if (match[1]) {\n            categories.push(match[1]);\n        }\n    }\n    \n    return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n    // Handle namespace-prefixed primary category extraction\n    const patterns = [\n        /primary_category[^>]*term=\"([^\"]*)\"/i,\n        /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// New: extract arxiv comment\nfunction 
extractArxivComment(entryXml) {\n    const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n    return commentMatch ? commentMatch[1].trim() : '';\n}\n\ntry {\n    // Extract all entry blocks\n    const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n    const entries = [];\n    let match;\n    \n    while ((match = entryRegex.exec(xmlData)) !== null) {\n        entries.push(match[1]);\n    }\n    \n    if (entries.length === 0) {\n        return [{\n            json: {\n                error: \"No <entry> elements found\",\n                message: \"Please check if the XML data format is correct\",\n                success: false\n            }\n        }];\n    }\n\n    // Process each entry\n    const processedData = [];\n    let processedCount = 0;\n\n    for (let i = 0; i < entries.length; i++) {\n        const entryXml = entries[i];\n        \n        try {\n            const item = {\n                id: extractTagContent(entryXml, 'id'),\n                updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n                published: formatDateTime(extractTagContent(entryXml, 'published')),\n                title: extractTagContent(entryXml, 'title'),\n                summary: extractTagContent(entryXml, 'summary'),\n                authors: extractAuthors(entryXml), // field name changed to authors, returns array\n                html_url: extractLink(entryXml, 'text/html'),\n                pdf_url: extractLink(entryXml, 'application/pdf'),\n                primary_category: extractPrimaryCategory(entryXml),\n                categories: extractCategories(entryXml), // field name changed to categories\n                arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n                github: '',\n                huggingface: ''\n            };\n\n            // Validate required fields\n            if (item.id && item.title) {\n                processedData.push(item);\n                
processedCount++;\n            }\n            \n        } catch (error) {\n            console.log(`Error processing entry ${i+1}: ${error.message}`);\n            // Continue processing next entry\n        }\n    }\n\n    // Return processed results\n    return [{\n        json: {\n            success: true,\n            message: `Successfully processed ${processedCount} entries`,\n            data: processedData,\n            processing_time: new Date().toISOString()\n        }\n    }];\n\n} catch (error) {\n    // Error handling\n    return [{\n        json: {\n            error: \"An error occurred during processing\",\n            message: error.message,\n            success: false\n        }\n    }];\n}\n"
      },
      "typeVersion": 2
    },
    {
      "id": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
      "name": "JSON Format",
      "type": "n8n-nodes-base.code",
      "position": [
        592,
        0
      ],
      "parameters": {
        "jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n    return [{\n        json: {\n            error: \"XML data not found. Please ensure the input contains XML content\",\n            message: \"Check the field names in the input data\",\n            success: false\n        }\n    }];\n}\n\n// Format an ISO timestamp as 'YYYY-MM-DD hh:mm:ss' in UTC\nfunction formatDateTime(isoString) {\n    if (!isoString) return '';\n    \n    try {\n        const date = new Date(isoString);\n        if (isNaN(date.getTime())) return '';\n        \n        // Use UTC getters for every component so date and time stay consistent\n        const year = date.getUTCFullYear();\n        const month = String(date.getUTCMonth() + 1).padStart(2, '0');\n        const day = String(date.getUTCDate()).padStart(2, '0');\n        const hours = String(date.getUTCHours()).padStart(2, '0');\n        const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n        const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n        \n        return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n    } catch (error) {\n        return '';\n    }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n    const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n    const match = xml.match(regex);\n    return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n    // Fixed link extraction to fit actual XML format\n    // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n    const patterns = [\n        new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n        new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n    const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n    const authors = [];\n    \n    for (const block of authorBlocks) {\n        const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n        if (nameMatch && nameMatch[1].trim()) {\n            authors.push(nameMatch[1].trim());\n        }\n    }\n    \n    return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n    const categories = [];\n    const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n    let match;\n    \n    while ((match = regex.exec(entryXml)) !== null) {\n        if (match[1]) {\n            categories.push(match[1]);\n        }\n    }\n    \n    return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n    // Handle namespace-prefixed primary category extraction\n    const patterns = [\n        /primary_category[^>]*term=\"([^\"]*)\"/i,\n        /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// New: extract arxiv comment\nfunction 
extractArxivComment(entryXml) {\n    const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n    return commentMatch ? commentMatch[1].trim() : '';\n}\n\ntry {\n    // Extract all entry blocks\n    const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n    const entries = [];\n    let match;\n    \n    while ((match = entryRegex.exec(xmlData)) !== null) {\n        entries.push(match[1]);\n    }\n    \n    if (entries.length === 0) {\n        return [{\n            json: {\n                error: \"No <entry> elements found\",\n                message: \"Please check if the XML data format is correct\",\n                success: false\n            }\n        }];\n    }\n\n    // Process each entry\n    const processedData = [];\n    let processedCount = 0;\n\n    for (let i = 0; i < entries.length; i++) {\n        const entryXml = entries[i];\n        \n        try {\n            const item = {\n                id: extractTagContent(entryXml, 'id'),\n                updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n                published: formatDateTime(extractTagContent(entryXml, 'published')),\n                title: extractTagContent(entryXml, 'title'),\n                summary: extractTagContent(entryXml, 'summary'),\n                authors: extractAuthors(entryXml), // field name changed to authors, returns array\n                html_url: extractLink(entryXml, 'text/html'),\n                pdf_url: extractLink(entryXml, 'application/pdf'),\n                primary_category: extractPrimaryCategory(entryXml),\n                categories: extractCategories(entryXml), // field name changed to categories\n                arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n                github: '',\n                huggingface: ''\n            };\n\n            // Validate required fields\n            if (item.id && item.title) {\n                processedData.push(item);\n                
processedCount++;\n            }\n            \n        } catch (error) {\n            console.log(`Error processing entry ${i+1}: ${error.message}`);\n            // Continue processing next entry\n        }\n    }\n\n    // Return processed results\n    return [{\n        json: {\n            success: true,\n            message: `Successfully processed ${processedCount} entries`,\n            data: processedData,\n            processing_time: new Date().toISOString()\n        }\n    }];\n\n} catch (error) {\n    // Error handling\n    return [{\n        json: {\n            error: \"An error occurred during processing\",\n            message: error.message,\n            success: false\n        }\n    }];\n}\n"
      },
      "typeVersion": 2
    },
    {
      "id": "8fbefc67-e9f7-4597-b935-d5f5895cf93c",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -160,
        -224
      ],
      "parameters": {
        "width": 656,
        "height": 192,
        "content": "## 3. Data Processing\n\nAnalyze and summarize paper data using AI, then standardize output as JSON.\n\n### Single Paper Basic Information Analysis and Enhancement  \n### Daily Paper Summary and Multilingual Translation"
      },
      "typeVersion": 1
    },
    {
      "id": "884f2c40-4628-4376-a040-709e2db34c48",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1024,
        16
      ],
      "parameters": {
        "width": 624,
        "height": 368,
        "content": "## 4. Data Storage: Notion Database\n\n- Create a corresponding database in Notion with the same predefined field names.  \n- In Notion, create an integration under **Integrations** and grant access to the database. Obtain the corresponding **Secret Key**.  \n- Use the Notion **\"Create a database page\"** node to configure the field mapping and store the data.  \n\n**Notes**  \n- **\"Create a database page\"** only adds new entries; data will not be updated.  \n- The `updated` and `published` timestamps of arXiv papers are in **UTC**.  \n- Notion **single-select** and **multi-select** fields only accept arrays. They do not automatically parse comma-separated strings. You need to format them as proper arrays.  \n- Notion does not accept `null` values, which causes a **400 error**.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "4991129d-9406-4c52-bd8f-87e2721c4a6f",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1088,
        544
      ],
      "parameters": {
        "width": 624,
        "height": 912,
        "content": "## 2. **Data Extraction**\n\n### Data Cleaning Rules (Convert to Standard JSON)\n\n1. **Remove Header**  \n   - Keep only the `<entry></entry>` blocks representing paper items.\n\n2. **Single Item**  \n   - Each `<entry></entry>` represents a single item.\n\n3. **Field Processing Rules**  \n   - `<id></id>` ➡️ `id`  \n     Extract content.  \n     Example: `<id>http://arxiv.org/abs/2409.06062v1</id>` → `http://arxiv.org/abs/2409.06062v1`  \n   - `<updated></updated>` ➡️ `updated`  \n     Convert timestamp to `yyyy-mm-dd hh:mm:ss`  \n   - `<published></published>` ➡️ `published`  \n     Convert timestamp to `yyyy-mm-dd hh:mm:ss`  \n   - `<title></title>` ➡️ `title`  \n     Extract text content  \n   - `<summary></summary>` ➡️ `summary`  \n     Keep text, remove line breaks  \n   - `<author></author>` ➡️ `authors`  \n     Combine all authors into an array  \n     Example: `[ \"Ernest Pusateri\", \"Anmol Walia\" ]` (for Notion multi-select field)  \n   - `<arxiv:comment></arxiv:comment>` ➡️ `arxiv_comment`  \n     Extract text content  \n   - `<link type=\"text/html\">` ➡️ `html_url`  \n     Extract URL  \n   - `<link type=\"application/pdf\">` ➡️ `pdf_url`  \n     Extract URL  \n   - `<arxiv:primary_category term=\"cs.CL\">` ➡️ `primary_category`  \n     Extract `term` value  \n   - `<category>` ➡️ `categories`  \n     Merge all `<category>` values into an array  \n     Example: `[ \"eess.AS\", \"cs.SD\" ]` (for Notion multi-select field)  \n\n4. **Add Empty Fields**  \n   - `github`  \n   - `huggingface`\n"
      },
      "typeVersion": 1
    }
  ],
  "pinData": {},
  "connections": {
    "c3685631-8bbd-409a-978a-fbb3e9847115": {
      "main": [
        [
          {
            "node": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "9151ab18-379f-4d3b-8ca2-cf65c547e78d": {
      "main": [
        [
          {
            "node": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "a38b1b58-a6f6-4c6b-ba6e-f153980a220d": {
      "main": [
        [
          {
            "node": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "ae855e91-2363-4b97-8933-761934b269fe": {
      "main": [
        [
          {
            "node": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d": {
      "main": [
        [
          {
            "node": "024c6399-857e-45a3-a15d-8b733e16da67",
            "type": "main",
            "index": 0
          },
          {
            "node": "c3685631-8bbd-409a-978a-fbb3e9847115",
            "type": "main",
            "index": 0
          },
          {
            "node": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "ae2d8994-7a52-4f7b-81fd-61c0538ba380": {
      "main": [
        [
          {
            "node": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Basic LLM Chain": {
      "main": [
        [
          {
            "node": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "5d897d4d-968b-4336-bbee-d1d3b4dcae06": {
      "main": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "3df82b76-e9c8-4b0b-a552-428f2fc12c97": {
      "main": [
        [
          {
            "node": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Schedule Trigger": {
      "main": [
        [
          {
            "node": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "aaa67776-c308-443e-98f6-e1fe7035cbb5": {
      "main": [
        [
          {
            "node": "ae855e91-2363-4b97-8933-761934b269fe",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "6f3df3be-a376-42e9-b0be-32c4fba5a8e2": {
      "main": [
        [
          {
            "node": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
            "type": "main",
            "index": 0
          },
          {
            "node": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    }
  }
}
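The Notion notes in the workflow say that multi-select fields need proper arrays rather than comma-separated strings, and that `null` values cause a 400 error. A minimal sketch of a clean-up step that could run before the Notion node is shown below. The helper name `toNotionSafe` and its default field list are assumptions for illustration and are not part of the exported workflow.

```javascript
// Hypothetical helper, not part of the exported workflow.
// Prepares one parsed paper item before it is mapped to Notion properties.
function toNotionSafe(item, multiSelectFields = ['authors', 'categories']) {
  const safe = {};
  for (const [key, value] of Object.entries(item)) {
    if (multiSelectFields.includes(key)) {
      // Accept an array or a comma-separated string; always emit an array.
      const list = Array.isArray(value) ? value : String(value ?? '').split(',');
      safe[key] = list.map((v) => String(v).trim()).filter((v) => v !== '');
    } else {
      // Replace null/undefined with an empty string.
      safe[key] = value ?? '';
    }
  }
  return safe;
}

// Example input with a comma-separated author string and a null link field.
const example = toNotionSafe({
  title: 'Example paper',
  authors: 'Ernest Pusateri, Anmol Walia',
  categories: ['eess.AS', 'cs.SD'],
  github: null,
});
console.log(example);
```

In n8n this logic would sit in a Code node between the AI step and the Notion node; where exactly depends on how the field mapping is configured.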
Frequently Asked Questions

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, and choose "Import from JSON". Paste the configuration and adjust the credentials as needed.
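Before importing, it can help to check that every entry under `connections` points at a node that exists under `nodes`. In this export, connection entries refer to nodes sometimes by `id` and sometimes by `name`, so the sketch below accepts either. The function name `findDanglingConnections` is hypothetical, and the sample object is made up for illustration.

```javascript
// Hypothetical helper: lists connection endpoints that match no node id or name.
function findDanglingConnections(workflow) {
  const known = new Set();
  for (const node of workflow.nodes ?? []) {
    known.add(node.id);
    known.add(node.name);
  }

  const dangling = [];
  for (const [source, outputs] of Object.entries(workflow.connections ?? {})) {
    if (!known.has(source)) dangling.push(source);
    // outputs looks like { main: [[{ node, type, index }, ...], ...] }
    for (const branches of Object.values(outputs)) {
      for (const branch of branches) {
        for (const target of branch ?? []) {
          if (!known.has(target.node)) dangling.push(target.node);
        }
      }
    }
  }
  return [...new Set(dangling)];
}

// Made-up sample: "Missing Node" is referenced but never defined.
const sample = {
  nodes: [
    { id: 'n1', name: 'Trigger' },
    { id: 'n2', name: 'Parser' },
  ],
  connections: {
    Trigger: {
      main: [[
        { node: 'n2', type: 'main', index: 0 },
        { node: 'Missing Node', type: 'main', index: 0 },
      ]],
    },
  },
};
console.log(findDanglingConnections(sample));
```

To run it against the real export, parse the copied JSON with `JSON.parse` and pass the resulting object in. Any names it reports are worth checking against the node list before importing.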

Which scenarios is this workflow suited for?

Expert - Content Creation, Multimodal AI

Does it cost anything?

This workflow is completely free. Note, however, that third-party services used in the workflow (such as the Google Gemini API) may incur charges.

Workflow Information
Difficulty
Expert
Number of nodes: 22
Categories: 2
Node types: 11
Difficulty description

For advanced users: complex workflows with 16 or more nodes
