How to Build a Browser-Based PDF Metadata Editor with JavaScript

Yammbo
· 5 min read
javascript pdf editor browser pdf tools edit pdf properties pdf-lib tutorial client-side pdf editing
PDF files contain more than the visual content you see on the page. Most PDF documents also carry metadata: structured information that describes the document itself, such as the title, author, subject, keywords, creation date, and the application used to create it. Managing this metadata matters for document organization, searchability, and proper identification when files are shared. This tutorial walks through building a browser-based PDF metadata editor in JavaScript that lets users upload a PDF, view and edit its metadata, and download the updated file, all without server-side processing.

Step 1: Project Setup and Dependencies

To begin, you'll need a basic HTML file to serve as your application's interface and a JavaScript file for the core logic. We'll also integrate two essential libraries via Content Delivery Networks (CDNs): PDF-lib for reading and writing PDF metadata, and PDF.js for rendering PDF previews in the browser.

Create index.html:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>PDF Metadata Editor</title>

  <!-- PDF.js for rendering PDF previews -->
  <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.10.377/pdf.min.js"></script>
  <script>
    pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.10.377/pdf.worker.min.js';
  </script>

  <!-- PDF-lib for reading and writing PDF metadata -->
  <script src="https://unpkg.com/pdf-lib/dist/pdf-lib.min.js"></script>
</head>
<body>
  <h1>Browser-Based PDF Metadata Editor</h1>

  <input type="file" id="pdfInput" accept=".pdf">

  <div id="pdfPreview">
    <h3>PDF Preview</h3>
    <canvas id="pdfCanvas"></canvas>
  </div>

  <div id="metadataEditor">
    <h3>Edit Metadata</h3>
    <form id="metadataForm">
      <p><label for="title">Title:</label><input type="text" id="title"></p>
      <p><label for="author">Author:</label><input type="text" id="author"></p>
      <p><label for="subject">Subject:</label><input type="text" id="subject"></p>
      <p><label for="keywords">Keywords:</label><input type="text" id="keywords"></p>
      <p><label for="creator">Creator:</label><input type="text" id="creator"></p>
      <p><label for="creationDate">Creation Date:</label><input type="text" id="creationDate" disabled></p>
      <p><label for="modDate">Modification Date:</label><input type="text" id="modDate" disabled></p>
      <button type="submit">Save & Download PDF</button>
    </form>
  </div>

  <script src="app.js"></script>
</body>
</html>

The pdfInput element allows users to select a PDF file, pdfCanvas will display a preview, and metadataForm contains input fields for editing document properties.

Create app.js:

Create an empty file named app.js in the same directory as index.html. This is where all your JavaScript logic will reside.
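
Because both libraries are loaded from CDNs, a blocked or mistyped script URL would only surface later as a confusing ReferenceError. As a minimal sketch, a guard like the following could sit at the top of app.js. The helper name missingLibraries is hypothetical; the global names pdfjsLib and PDFLib are the ones the scripts above expose.

```javascript
// Hypothetical guard for the top of app.js (missingLibraries is an illustrative name).
// It reports which of the expected CDN globals are not defined on the given scope.
function missingLibraries(scope) {
  const required = ['pdfjsLib', 'PDFLib']; // globals used throughout this tutorial
  return required.filter((name) => typeof scope[name] === 'undefined');
}

// In the browser, globalThis is the window object.
const missing = missingLibraries(globalThis);
if (missing.length > 0) {
  console.warn(`Missing libraries: ${missing.join(', ')}. Check the <script> tags.`);
}
```

Taking the scope as a parameter keeps the check easy to try outside the browser, for example by passing a plain object.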

Step 2: Handling PDF Upload and Preview

The first step in our JavaScript logic is to handle file selection and display a preview of the uploaded PDF. This ensures the user has selected the correct document before proceeding to edit its metadata.

// app.js
const pdfInput = document.getElementById('pdfInput');
const pdfCanvas = document.getElementById('pdfCanvas');
const metadataForm = document.getElementById('metadataForm');

const metadataFields = {
  title: document.getElementById('title'),
  author: document.getElementById('author'),
  subject: document.getElementById('subject'),
  keywords: document.getElementById('keywords'),
  creator: document.getElementById('creator'),
  creationDate: document.getElementById('creationDate'),
  modDate: document.getElementById('modDate'),
};

let currentPdfBytes = null; // To store the original PDF bytes

pdfInput.addEventListener('change', async (event) => {
  const file = event.target.files[0];
  if (!file) return;

  // Store the original PDF bytes
  currentPdfBytes = await file.arrayBuffer();

  // Display PDF preview
  await renderPdfPreview(currentPdfBytes);

  // Read and display metadata
  await readAndDisplayMetadata(currentPdfBytes);
});

async function renderPdfPreview(pdfBytes) {
  const loadingTask = pdfjsLib.getDocument({ data: pdfBytes });
  const pdf = await loadingTask.promise;
  const page = await pdf.getPage(1); // Get the first page

  const viewport = page.getViewport({ scale: 1.5 });
  const context = pdfCanvas.getContext('2d');
  pdfCanvas.height = viewport.height;
  pdfCanvas.width = viewport.width;

  const renderContext = {
    canvasContext: context,
    viewport: viewport,
  };
  await page.render(renderContext).promise;
}

In this code:

  • We get references to our HTML elements.
  • An event listener on pdfInput triggers when a file is selected.
  • The file's content is read into an ArrayBuffer.
  • renderPdfPreview uses PDF.js to load the PDF data and draw the first page onto the pdfCanvas. You can find more details on PDF.js usage in its official documentation.
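
The accept=".pdf" attribute only filters the file picker, so a renamed non-PDF file can still reach the change handler. One possible safeguard, sketched below, is to check for the "%PDF-" signature before handing the bytes to either library. The helper name looksLikePdf is hypothetical, and the sketch assumes the signature sits at the very first byte, which may reject the occasional file with leading junk that lenient readers accept.

```javascript
// Hypothetical helper: returns true when the buffer starts with the "%PDF-" signature.
function looksLikePdf(arrayBuffer) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // ASCII codes for "%PDF-"
  if (arrayBuffer.byteLength < signature.length) return false;
  // View only the first five bytes; no copy of the file is made.
  const head = new Uint8Array(arrayBuffer, 0, signature.length);
  return signature.every((byte, i) => head[i] === byte);
}

// Possible use inside the change handler, right after file.arrayBuffer():
//   if (!looksLikePdf(currentPdfBytes)) {
//     alert('That file does not look like a PDF.');
//     currentPdfBytes = null;
//     return;
//   }
```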

Step 3: Reading and Displaying Existing Metadata

Once the PDF is loaded and previewed, the next step is to extract its existing metadata using PDF-lib and populate the editing form. This allows users to see the current values before making changes.

// app.js (continued)
async function readAndDisplayMetadata(pdfBytes) {
  // updateMetadata: false keeps pdf-lib from overwriting the producer and
  // modification date on load, so the form shows the file's stored values.
  const pdfDoc = await PDFLib.PDFDocument.load(pdfBytes, { updateMetadata: false });

  metadataFields.title.value = pdfDoc.getTitle() || '';
  metadataFields.author.value = pdfDoc.getAuthor() || '';
  metadataFields.subject.value = pdfDoc.getSubject() || '';
  metadataFields.keywords.value = pdfDoc.getKeywords() || '';
  metadataFields.creator.value = pdfDoc.getCreator() || '';

  const creationDate = pdfDoc.getCreationDate();
  metadataFields.creationDate.value = creationDate ? creationDate.toLocaleString() : '';

  const modDate = pdfDoc.getModificationDate();
  metadataFields.modDate.value = modDate ? modDate.toLocaleString() : '';
}

Here's how it works:

  1. PDFLib.PDFDocument.load(pdfBytes) asynchronously loads the PDF document from the provided byte array. For more information on PDF-lib, refer to its GitHub repository.
  2. Methods like getTitle(), getAuthor(), etc., are called on the pdfDoc object to retrieve the respective metadata.
  3. The retrieved values are then assigned to the value property of their corresponding input fields in the metadataForm. We use || '' to fall back to an empty string when a metadata field is not present; the getters return undefined in that case, which would otherwise appear as the text "undefined" in the input.
  4. Creation and Modification dates are Date objects and are converted to a localized string for display. These fields are disabled as they are typically system-generated and not directly editable by users.
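
The fallback and date handling described above could also be folded into one small helper, which keeps the assignments uniform as more fields are added. This is only a sketch: toFieldValue is a hypothetical name, and the extra check for an invalid Date is an assumption about malformed files rather than something the original code handles.

```javascript
// Hypothetical helper: converts a metadata value into text suitable for an input field.
function toFieldValue(value) {
  // Absent fields become an empty string instead of "undefined".
  if (value === undefined || value === null) return '';
  if (value instanceof Date) {
    // Guard against an unparseable date, which would display as "Invalid Date".
    return Number.isNaN(value.getTime()) ? '' : value.toLocaleString();
  }
  return String(value);
}

// Possible use inside readAndDisplayMetadata:
//   metadataFields.title.value = toFieldValue(pdfDoc.getTitle());
//   metadataFields.creationDate.value = toFieldValue(pdfDoc.getCreationDate());
```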

Step 4: Updating and Saving the PDF

The final step involves capturing the user's changes from the form, applying them to the PDF document using PDF-lib, and then generating a new PDF file for download.

// app.js (continued)
metadataForm.addEventListener('submit', async (event) => {
  event.preventDefault(); // Prevent default form submission

  if (!currentPdfBytes) {
    alert('Please upload a PDF first.');
    return;
  }

  const pdfDoc = await PDFLib.PDFDocument.load(currentPdfBytes);

  // Update metadata fields
  pdfDoc.setTitle(metadataFields.title.value);
  pdfDoc.setAuthor(metadataFields.author.value);
  pdfDoc.setSubject(metadataFields.subject.value);
  // setKeywords expects an array of strings, so the form value is wrapped in one
  pdfDoc.setKeywords([metadataFields.keywords.value]);
  pdfDoc.setCreator(metadataFields.creator.value);

  // Save the modified PDF
  const modifiedPdfBytes = await pdfDoc.save();

  // Create a Blob and URL for download
  const blob = new Blob([modifiedPdfBytes], { type: 'application/pdf' });
  const url = URL.createObjectURL(blob);

  const a = document.createElement('a');
  a.href = url;
  a.download = 'edited_document.pdf';
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);

  URL.revokeObjectURL(url); // Clean up the object URL
});

In this section:

  1. An event listener on the metadataForm's submit event prevents the default browser form submission.
  2. We reload the original PDF bytes into a new PDFDocument instance. This ensures we're working with the original document's content before applying new metadata.
  3. Methods like setTitle(), setAuthor(), etc., are used to update the PDF's metadata with the values from the form input fields.
  4. pdfDoc.save() serializes the modified PDF document back into a Uint8Array (byte array).
  5. A Blob is created from these bytes, and URL.createObjectURL() generates a temporary URL.
  6. An anchor (<a>) element is programmatically created, its href set to the temporary URL, and its download attribute specifies the filename.
  7. Simulating a click on this anchor element triggers the browser's download prompt.
  8. Finally, the temporary URL is revoked to release memory resources.
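
The download attribute in the code above is a fixed string, so every saved file gets the same name. If you would rather base it on the uploaded file, a helper along these lines is one option. The name editedFileName and the "_edited" suffix are illustrative choices, not part of the original code.

```javascript
// Hypothetical helper: derives a download name such as "report_edited.pdf"
// from the uploaded file's name, falling back to "document" when none is usable.
function editedFileName(originalName, suffix = '_edited') {
  const fallback = `document${suffix}.pdf`;
  if (typeof originalName !== 'string' || originalName.trim() === '') return fallback;
  // Strip a trailing ".pdf" in any letter case before appending the suffix.
  const base = originalName.trim().replace(/\.pdf$/i, '');
  return `${base || 'document'}${suffix}.pdf`;
}

// Possible use: remember file.name in the change handler (for example in a
// variable named currentFileName, also hypothetical) and then set
//   a.download = editedFileName(currentFileName);
```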

This completes the functionality for a client-side PDF metadata editor. Users can now upload a PDF, modify its descriptive information, and download the updated version, all within their browser without any data leaving their device.

Managing digital documents and their properties effectively can streamline workflows. For more tools and guides on enhancing your online presence and managing digital assets, explore the resources available at Yammbo.