Build a speech-to-text API with OpenAI GPT4o-mini transcription

Intermediate

This is an automation workflow in the Document Extraction / Multimodal AI domain, containing 10 nodes. It primarily uses nodes such as Set, Webhook, HttpRequest, and RespondToWebhook. It builds a speech-to-text API using OpenAI GPT4o-mini transcription.

Prerequisites
  • An HTTP webhook endpoint (n8n generates this automatically)
  • Authentication credentials may be required for the target API (here, an OpenAI API key)
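For reference, the "Transcribe with OpenAI" node in this workflow is equivalent to a direct multipart POST to OpenAI's `/v1/audio/transcriptions` endpoint. The sketch below uses only the Python standard library; the API key and audio bytes are placeholders, and the `file`/`model` field names come straight from the node's `bodyParameters`:

```python
import io
import urllib.request
import uuid

OPENAI_API_KEY = "sk-..."  # placeholder: your OpenAI API key

def build_transcription_request(audio_bytes: bytes) -> urllib.request.Request:
    """Mirror the workflow's HTTP Request node: a multipart body with a
    'file' part and a 'model' part set to gpt-4o-mini-transcribe."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # File part (the node maps its binary input field 'audio_file' to 'file')
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="file"; filename="audio.webm"\r\n')
    body.write(b"Content-Type: audio/webm\r\n\r\n")
    body.write(audio_bytes)
    # Model part
    body.write(f"\r\n--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(b"gpt-4o-mini-transcribe")
    body.write(f"\r\n--{boundary}--\r\n".encode())
    req = urllib.request.Request(
        "https://api.openai.com/v1/audio/transcriptions",
        data=body.getvalue(),
        method="POST",
    )
    req.add_header("Authorization", f"Bearer {OPENAI_API_KEY}")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return req

# Actually sending the request requires a real key and real audio, e.g.:
# with urllib.request.urlopen(build_transcription_request(audio)) as r:
#     print(json.load(r)["text"])  # the workflow extracts this as 'Transcript'
```

The workflow handles all of this inside n8n via the predefined OpenAI credential; this snippet only illustrates what the node does on the wire.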
Export workflow
Copy the following JSON configuration into n8n ("Import from JSON") to use this workflow:
{
  "nodes": [
    {
      "id": "6b7dd876-ed21-47f6-877b-d6c45f8bc9b3",
      "name": "Transcribir con OpenAI",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        560,
        140
      ],
      "parameters": {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "contentType": "multipart-form-data",
        "authentication": "predefinedCredentialType",
        "bodyParameters": {
          "parameters": [
            {
              "name": "file",
              "parameterType": "formBinaryData",
              "inputDataFieldName": "audio_file"
            },
            {
              "name": "model",
              "value": "gpt-4o-mini-transcribe"
            }
          ]
        },
        "nodeCredentialType": "openAiApi"
      },
      "credentials": {
        "openAiApi": {
          "id": "dMiSy27YCK6c6rra",
          "name": "Duv's OpenAI"
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "26543502-9e91-4d70-af12-df78ac5ba630",
      "name": "Extraer transcripción",
      "type": "n8n-nodes-base.set",
      "position": [
        840,
        140
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "51b3d82e-6ef0-4b0b-86aa-33cf8203a24e",
              "name": "Transcript",
              "type": "string",
              "value": "={{ $json.text }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "d12de568-f2b8-4757-b45f-f79bc579ee36",
      "name": "Webhook que contiene audio para transcribir",
      "type": "n8n-nodes-base.webhook",
      "position": [
        220,
        140
      ],
      "webhookId": "6a06f5e4-9105-4780-9840-9b7619a25647",
      "parameters": {
        "path": "audio-to-transcribe",
        "options": {},
        "httpMethod": "POST",
        "responseMode": "responseNode"
      },
      "typeVersion": 2
    },
    {
      "id": "e5de7e17-64a7-4466-a381-0dfb2e9d9711",
      "name": "Responder a Webhook con transcripción",
      "type": "n8n-nodes-base.respondToWebhook",
      "position": [
        1100,
        140
      ],
      "parameters": {
        "options": {
          "responseCode": 200
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "883dbfbe-7330-41e6-bc7e-6dda8385250c",
      "name": "Nota adhesiva",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        120,
        700
      ],
      "parameters": {
        "color": 4,
        "width": 580,
        "height": 120,
        "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    <title>Transcribe Your Audio</title>\n    <script src=\"https://cdn.tailwindcss.com\"></script>\n    <style>\n        body {\n            font-family: sans-serif;\n            background-color: #f0f4f8;\n            display: flex;\n            justify-content: center;\n            align-items: center;\n            min-height: 100vh;\n            margin: 0;\n        }\n        .container {\n            background-color: #ffffff;\n            border-radius: 0.75rem; /* Equivalent to rounded-xl */\n            box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06); /* Equivalent to shadow-md */\n            padding: 2rem; /* Equivalent to p-8 */\n            width: 100%;\n            max-width: 28rem; /* Equivalent to max-w-sm */\n            position: relative;\n        }\n        .loading-overlay {\n            position: absolute;\n            top: 0;\n            left: 0;\n            right: 0;\n            bottom: 0;\n            background: rgba(255, 255, 255, 0.8);\n            backdrop-filter: blur(4px);\n            display: flex;\n            flex-direction: column;\n            align-items: center;\n            justify-content: center;\n            border-radius: 0.75rem;\n            z-index: 10;\n            opacity: 0;\n            visibility: hidden;\n            transition: opacity 0.3s ease, visibility 0.3s ease;\n        }\n        .loading-overlay.visible {\n            opacity: 1;\n            visibility: visible;\n        }\n        .spinner {\n            border: 4px solid rgba(0, 0, 0, 0.1);\n            border-left-color: #000;\n            border-radius: 50%;\n            width: 32px;\n            height: 32px;\n            animation: spin 1s linear infinite;\n        }\n        @keyframes spin {\n            to { transform: rotate(360deg); 
}\n        }\n    </style>\n</head>\n<body>\n    <div class=\"container\">\n        <h2 class=\"text-2xl font-bold text-center mb-6\">Transcribe Your Audio</h2>\n\n        <!-- Audio Recording Section -->\n        <div id=\"audio-section\" class=\"space-y-4\">\n            <div class=\"flex flex-col items-center\">\n                <button id=\"record-btn\" class=\"bg-blue-500 hover:bg-blue-600 text-white font-bold py-3 px-6 rounded-full text-lg mb-4 transition-colors\">\n                    Start Recording\n                </button>\n                <p id=\"recording-status\" class=\"text-gray-600 text-sm\">Press to record</p>\n                <p id=\"timer\" class=\"text-xl font-mono text-gray-700 mt-2\">00:00</p>\n            </div>\n\n            <!-- Audio Playback -->\n            <div id=\"audio-playback-container\" class=\"hidden flex flex-col items-center mt-4\">\n                <audio id=\"audio-player\" controls class=\"w-full\"></audio>\n                <div class=\"flex space-x-4 mt-4\">\n                    <button id=\"re-record-btn\" class=\"bg-gray-200 hover:bg-gray-300 text-gray-800 font-semibold py-2 px-4 rounded-md transition-colors\">\n                        Re-record\n                    </button>\n                    <button id=\"submit-audio-btn\" class=\"bg-green-500 hover:bg-green-600 text-white font-semibold py-2 px-4 rounded-md transition-colors\">\n                        Transcribe\n                    </button>\n                </div>\n            </div>\n        </div>\n\n        <!-- Transcript Display Section -->\n        <div id=\"transcript-section\" class=\"hidden mt-6 space-y-4\">\n            <label for=\"transcript-input\" class=\"block text-gray-700 font-semibold mb-1\">Your Transcript</label>\n            <textarea id=\"transcript-input\" rows=\"6\" class=\"w-full p-3 border border-gray-300 rounded-md focus:ring-2 focus:ring-blue-500 focus:border-transparent resize-y\"></textarea>\n            <button 
id=\"transcribe-another-btn\" class=\"w-full bg-blue-500 hover:bg-blue-600 text-white font-bold py-2 px-4 rounded-md transition-colors\">\n                Transcribe Another Audio\n            </button>\n        </div>\n\n        <!-- Loading Overlay -->\n        <div id=\"loading-overlay\" class=\"loading-overlay\">\n            <div class=\"spinner\"></div>\n            <p id=\"loading-text\" class=\"mt-4 text-gray-700\">Processing...</p>\n        </div>\n    </div>\n\n    <script>\n        const recordBtn = document.getElementById('record-btn');\n        const recordingStatus = document.getElementById('recording-status');\n        const timerEl = document.getElementById('timer');\n        const audioPlaybackContainer = document.getElementById('audio-playback-container');\n        const audioPlayer = document.getElementById('audio-player');\n        const reRecordBtn = document.getElementById('re-record-btn');\n        const submitAudioBtn = document.getElementById('submit-audio-btn');\n        const transcriptSection = document.getElementById('transcript-section');\n        const transcriptInput = document.getElementById('transcript-input');\n        const transcribeAnotherBtn = document.getElementById('transcribe-another-btn'); // Re-added\n        const loadingOverlay = document.getElementById('loading-overlay');\n        const loadingText = document.getElementById('loading-text');\n\n        let mediaRecorder;\n        let audioChunks = [];\n        let recordedAudioBlob = null;\n        let timerInterval;\n        let seconds = 0;\n        let isRecording = false;\n\n        const WEBHOOK_URL = 'YOUR WEBHOOK URL';\n\n        // --- Section Management ---\n        const sections = {\n            'audio-section': document.getElementById('audio-section'),\n            'transcript-section': document.getElementById('transcript-section')\n        };\n\n        const showSection = (sectionId) => {\n            for (const id in sections) {\n                if 
(sections.hasOwnProperty(id)) {\n                    if (id === sectionId) {\n                        sections[id].classList.remove('hidden');\n                    } else {\n                        sections[id].classList.add('hidden');\n                    }\n                }\n            }\n        };\n\n        // --- Utility Functions ---\n        const formatTime = (time) => {\n            const minutes = Math.floor(time / 60).toString().padStart(2, '0');\n            const seconds = Math.floor(time % 60).toString().padStart(2, '0');\n            return `${minutes}:${seconds}`;\n        };\n\n        const showLoading = (message) => {\n            loadingText.textContent = message;\n            loadingOverlay.classList.add('visible');\n        };\n\n        const hideLoading = () => {\n            loadingOverlay.classList.remove('visible');\n        };\n\n        const updateTimer = () => {\n            seconds++;\n            timerEl.textContent = formatTime(seconds);\n        };\n\n        const resetRecordingState = () => {\n            isRecording = false;\n            clearInterval(timerInterval);\n            seconds = 0;\n            timerEl.textContent = '00:00';\n            recordBtn.textContent = 'Start Recording';\n            recordBtn.classList.remove('bg-red-500', 'hover:bg-red-600');\n            recordBtn.classList.add('bg-blue-500', 'hover:bg-blue-600');\n            recordingStatus.textContent = 'Press to record';\n            audioPlaybackContainer.classList.add('hidden');\n            recordBtn.style.display = 'block';\n            transcriptInput.value = '';\n        };\n\n        // --- Event Handlers ---\n        recordBtn.addEventListener('click', async () => {\n            if (isRecording) {\n                mediaRecorder.stop();\n            } else if (recordBtn.textContent === 'Record New Audio') {\n                resetRecordingState();\n                showSection('audio-section');\n            } else {\n                try {\n    
                const stream = await navigator.mediaDevices.getUserMedia({ audio: true });\n                    mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });\n                    audioChunks = [];\n\n                    mediaRecorder.ondataavailable = (event) => {\n                        audioChunks.push(event.data);\n                    };\n\n                    mediaRecorder.onstop = () => {\n                        recordedAudioBlob = new Blob(audioChunks, { type: 'audio/webm' });\n                        const audioURL = URL.createObjectURL(recordedAudioBlob);\n                        audioPlayer.src = audioURL;\n                        stream.getTracks().forEach(track => track.stop());\n\n                        recordBtn.style.display = 'none';\n                        recordingStatus.textContent = 'Recording finished.';\n                        audioPlaybackContainer.classList.remove('hidden');\n                        clearInterval(timerInterval);\n                    };\n\n                    mediaRecorder.start();\n                    isRecording = true;\n                    recordBtn.textContent = 'Stop Recording';\n                    recordBtn.classList.remove('bg-blue-500', 'hover:bg-blue-600');\n                    recordBtn.classList.add('bg-red-500', 'hover:bg-red-600');\n                    recordingStatus.textContent = 'Recording...';\n                    seconds = 0;\n                    timerEl.textContent = '00:00';\n                    timerInterval = setInterval(updateTimer, 1000);\n                } catch (error) {\n                    console.error('Error accessing microphone:', error);\n                    alert('Could not access microphone. 
Please allow access.'); // Using alert for simplicity as per previous instructions\n                    recordingStatus.textContent = 'Error: Microphone access denied.';\n                }\n            }\n        });\n\n        reRecordBtn.addEventListener('click', () => {\n            resetRecordingState();\n            showSection('audio-section');\n        });\n\n        submitAudioBtn.addEventListener('click', async () => {\n            if (!recordedAudioBlob) {\n                alert('Please record audio first.');\n                return;\n            }\n\n            showLoading('Transcribing audio...');\n\n            const formData = new FormData();\n            formData.append('audio_file', recordedAudioBlob, 'audio.webm');\n\n            try {\n                const response = await fetch(WEBHOOK_URL, {\n                    method: 'POST',\n                    body: formData,\n                });\n\n                if (!response.ok) {\n                    throw new Error(`HTTP error! 
status: ${response.status}`);\n                }\n\n                const result = await response.json();\n                console.log(\"Webhook response:\", result);\n\n                transcriptInput.value = result.Transcript || 'No transcript received.';\n                showSection('transcript-section');\n                \n                audioPlaybackContainer.classList.add('hidden');\n                \n                recordBtn.style.display = 'block'; \n                recordBtn.textContent = 'Record New Audio'; \n                recordBtn.classList.remove('bg-red-500', 'hover:bg-red-600');\n                recordBtn.classList.add('bg-blue-500', 'hover:bg-blue-600');\n                recordingStatus.textContent = 'Audio transcribed!';\n                timerEl.textContent = '00:00';\n                \n            } catch (error) {\n                console.error('Error sending audio to webhook:', error);\n                alert(`Failed to transcribe audio: ${error.message}`);\n                resetRecordingState();\n                showSection('audio-section');\n            } finally {\n                hideLoading();\n            }\n        });\n\n        // Event listener for the new \"Transcribe Another Audio\" button\n        transcribeAnotherBtn.addEventListener('click', () => {\n            resetRecordingState(); // Reset recording state\n            showSection('audio-section'); // Go back to the audio recording section\n        });\n\n        // Initial setup when the page loads\n        document.addEventListener('DOMContentLoaded', () => {\n            showSection('audio-section');\n            resetRecordingState(); // Ensure initial state is clean\n        });\n    </script>\n</body>\n</html>\n"
      },
      "typeVersion": 1
    },
    {
      "id": "9c06f4c8-ae6e-43a9-9eda-a1452d81e17f",
      "name": "Nota adhesiva1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -320,
        -40
      ],
      "parameters": {
        "width": 380,
        "height": 860,
        "content": "## Speech Transcription API Endpoint\n\nThis workflow exposes a webhook that transcribes any audio file sent to it.\n\n**How to use**\n\n1.  **Add credentials:** Select the **Transcribe with OpenAI** node and add your OpenAI API key.\n2.  **Get your endpoint URL:**\n\n    * Make sure the workflow is **Active**.\n\n    * Click the **Webhook** node and copy the **Production URL**.\n\n3.  **Connect the frontend:**\n\n    * Find the sticky note labeled \"Example Frontend Code Below\". Copy the code from the note beneath it.\n\n    * In the code, replace the `YOUR WEBHOOK URL` placeholder with the URL you copied in step 2.\n\n\nThe provided snippet below is a great starting point. Feel free to adapt it and build the interface you need!\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "b92cc8ab-64c9-4b24-a222-aa542b4bb710",
      "name": "Nota adhesiva2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        120,
        380
      ],
      "parameters": {
        "color": 4,
        "width": 580,
        "height": 300,
        "content": "## Example Frontend Code Below\nThe sticky note directly below contains a complete HTML file that creates a functional audio recording interface. It's a plug-and-play example to demonstrate how to call this webhook.\n\n**To use it:**\n\n1.  Copy the entire code block from the note below.\n2.  Save it as an `.html` file.\n3.  **Remember to replace** the `YOUR WEBHOOK URL` placeholder inside the code with your actual URL from the Webhook node.\n4.  Open the file in your browser to test."
      },
      "typeVersion": 1
    },
    {
      "id": "883ba3ee-2a32-477f-8493-da931847a9cb",
      "name": "Nota adhesiva3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        120,
        -40
      ],
      "parameters": {
        "color": 7,
        "width": 280,
        "height": 360,
        "content": "## The webhook to call from your app\nPOST the audio as \"audio_file\" to this webhook to start the workflow."
      },
      "typeVersion": 1
    },
    {
      "id": "40e86f5a-c472-4801-9235-3a2f8e3b0088",
      "name": "Nota adhesiva4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        460,
        -40
      ],
      "parameters": {
        "color": 7,
        "width": 280,
        "height": 360,
        "content": "## AI transcription with OpenAI GPT4o-mini transcribe"
      },
      "typeVersion": 1
    },
    {
      "id": "4d9d11f7-ebfa-4277-bf41-9070b6d052b1",
      "name": "Nota adhesiva5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1020,
        -20
      ],
      "parameters": {
        "color": 7,
        "width": 280,
        "height": 340,
        "content": "## Sending the transcript back to your app\nYour app should expect the key \"Transcript\" in the body of the webhook response."
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "26543502-9e91-4d70-af12-df78ac5ba630": {
      "main": [
        [
          {
            "node": "e5de7e17-64a7-4466-a381-0dfb2e9d9711",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "6b7dd876-ed21-47f6-877b-d6c45f8bc9b3": {
      "main": [
        [
          {
            "node": "26543502-9e91-4d70-af12-df78ac5ba630",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "d12de568-f2b8-4757-b45f-f79bc579ee36": {
      "main": [
        [
          {
            "node": "6b7dd876-ed21-47f6-877b-d6c45f8bc9b3",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
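Once the workflow is imported and active, any HTTP client can call the endpoint. Here is a minimal Python client sketch; the webhook URL is a placeholder, while the `audio_file` field name and the `Transcript` response key come from the workflow's sticky notes:

```python
import io
import json
import urllib.request
import uuid

# Placeholder: replace with the Production URL from the Webhook node
WEBHOOK_URL = "https://your-n8n-instance/webhook/audio-to-transcribe"

def encode_audio_form(audio_bytes: bytes) -> tuple[bytes, str]:
    """Encode audio as multipart/form-data under the field name
    'audio_file', which is what the Webhook node expects."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(b'Content-Disposition: form-data; name="audio_file"; filename="audio.webm"\r\n')
    buf.write(b"Content-Type: audio/webm\r\n\r\n")
    buf.write(audio_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

def transcribe(audio_bytes: bytes) -> str:
    """POST the audio to the webhook and return the transcript."""
    body, content_type = encode_audio_form(audio_bytes)
    req = urllib.request.Request(WEBHOOK_URL, data=body, method="POST")
    req.add_header("Content-Type", content_type)
    with urllib.request.urlopen(req) as resp:
        # The workflow responds with {"Transcript": "..."}
        return json.load(resp).get("Transcript", "")
```

This is the same call the example frontend in the sticky note makes with `fetch` and `FormData`, just expressed server-side.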
Frequently asked questions

How do I use this workflow?

Copy the JSON configuration above, create a new workflow in your n8n instance, select "Import from JSON", paste the configuration, and then adjust the credential settings as needed.

Which scenarios is this workflow suited for?

Intermediate: Document Extraction, Multimodal AI

Is it paid?

This workflow is completely free; you can import and use it directly. Note, however, that third-party services it uses (such as the OpenAI API) may charge fees to your own account.

Recommended related workflows

  • Multi-level document approval and audit workflow: automate document approval across multiple levels with Supabase and Gmail (If, Set, Form · 38 nodes · Alok Kumar · Document Extraction)
  • Extract text from images and PDFs to Markdown using Mistral OCR via Telegram (If, Set, Code · 40 nodes · Rostislav · Document Extraction)
  • Automatically log and back up workflows using GPT-4 and Airtable (If, N8n, Set · 38 nodes · Guillaume Duvernay · AI Summarization)
  • Create fact-based articles from knowledge sources using Lookio and OpenAI GPT (Set, Split Out, Aggregate · 19 nodes · Guillaume Duvernay · Content Creation)
  • Create fact-based articles from knowledge sources using Super RAG and GPT-5 (Set, Split Out, Aggregate · 19 nodes · Guillaume Duvernay · AI RAG)
  • Create AI-powered data digests in Slack on a regular schedule using Super Assistant (Set, Slack, Http Request · 9 nodes · Guillaume Duvernay · AI RAG)
Workflow information
Difficulty level: Intermediate
Number of nodes: 10
Categories: 2
Node types: 5
Difficulty description

Suitable for users with intermediate experience; medium-complexity workflows with 6-15 nodes

Author

Guillaume Duvernay (@duv)

AI and automation expert

External links
View on n8n.io
