How to Build a Browser-Based PDF Metadata Editor with JavaScript
PDF files contain more than just the visual content you see on the page. Hidden within every PDF document is metadata—structured information that describes the document itself. This can include details like the title, author, subject, keywords, creation date, and the application used to create it. Understanding and managing this metadata is crucial for document organization, searchability, and proper identification when files are shared. This tutorial will guide you through building a browser-based PDF metadata editor using JavaScript, allowing users to upload a PDF, view and edit its metadata, and download the updated file, all without server-side processing.
Step 1: Project Setup and Dependencies
To begin, you'll need a basic HTML file to serve as your application's interface and a JavaScript file for the core logic. We'll also integrate two essential libraries via Content Delivery Networks (CDNs): PDF-lib for reading and writing PDF metadata, and PDF.js for rendering PDF previews in the browser.
Create index.html:
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>PDF Metadata Editor</title>
        <!-- PDF.js for rendering PDF previews -->
        <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.10.377/pdf.min.js"></script>
        <script>
            pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.10.377/pdf.worker.min.js';
        </script>
        <!-- PDF-lib for reading and writing PDF metadata -->
        <script src="https://unpkg.com/pdf-lib/dist/pdf-lib.min.js"></script>
    </head>
    <body>
        <h1>Browser-Based PDF Metadata Editor</h1>
        <input type="file" id="pdfInput" accept=".pdf">
        <div id="pdfPreview">
            <h3>PDF Preview</h3>
            <canvas id="pdfCanvas"></canvas>
        </div>
        <div id="metadataEditor">
            <h3>Edit Metadata</h3>
            <form id="metadataForm">
                <p><label for="title">Title:</label><input type="text" id="title"></p>
                <p><label for="author">Author:</label><input type="text" id="author"></p>
                <p><label for="subject">Subject:</label><input type="text" id="subject"></p>
                <p><label for="keywords">Keywords:</label><input type="text" id="keywords"></p>
                <p><label for="creator">Creator:</label><input type="text" id="creator"></p>
                <p><label for="creationDate">Creation Date:</label><input type="text" id="creationDate" disabled></p>
                <p><label for="modDate">Modification Date:</label><input type="text" id="modDate" disabled></p>
                <button type="submit">Save & Download PDF</button>
            </form>
        </div>
        <script src="app.js"></script>
    </body>
    </html>

The pdfInput element allows users to select a PDF file, pdfCanvas will display a preview, and metadataForm contains input fields for editing document properties.
Create app.js:
Create an empty file named app.js in the same directory as index.html. This is where all your JavaScript logic will reside.
Step 2: Handling PDF Upload and Preview
The first step in our JavaScript logic is to handle file selection and display a preview of the uploaded PDF. This ensures the user has selected the correct document before proceeding to edit its metadata.
    // app.js
    const pdfInput = document.getElementById('pdfInput');
    const pdfCanvas = document.getElementById('pdfCanvas');
    const metadataForm = document.getElementById('metadataForm');
    const metadataFields = {
        title: document.getElementById('title'),
        author: document.getElementById('author'),
        subject: document.getElementById('subject'),
        keywords: document.getElementById('keywords'),
        creator: document.getElementById('creator'),
        creationDate: document.getElementById('creationDate'),
        modDate: document.getElementById('modDate'),
    };

    let currentPdfBytes = null; // To store the original PDF bytes

    pdfInput.addEventListener('change', async (event) => {
        const file = event.target.files[0];
        if (!file) return;

        // Store the original PDF bytes
        currentPdfBytes = await file.arrayBuffer();

        // Display PDF preview
        await renderPdfPreview(currentPdfBytes);

        // Read and display metadata
        await readAndDisplayMetadata(currentPdfBytes);
    });

    async function renderPdfPreview(pdfBytes) {
        const loadingTask = pdfjsLib.getDocument({ data: pdfBytes });
        const pdf = await loadingTask.promise;
        const page = await pdf.getPage(1); // Get the first page
        const viewport = page.getViewport({ scale: 1.5 });
        const context = pdfCanvas.getContext('2d');

        pdfCanvas.height = viewport.height;
        pdfCanvas.width = viewport.width;

        const renderContext = {
            canvasContext: context,
            viewport: viewport,
        };
        await page.render(renderContext).promise;
    }

In this code:
- We get references to our HTML elements.
- An event listener on pdfInput triggers when a file is selected.
- The file's content is read into an ArrayBuffer.
- renderPdfPreview uses PDF.js to load the PDF data and draw the first page onto the pdfCanvas. You can find more details on PDF.js usage in its official documentation.
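Before handing the bytes to PDF.js, it can help to check that the selected file really is a PDF. The following is a minimal sketch of one way to do that, not part of the original tutorial: looksLikePdf and cloneBytes are hypothetical helper names. The first scans for the %PDF- signature near the start of the file. The second copies the buffer, on the assumption that some PDF.js versions may transfer the buffer to their worker and leave the original unusable when PDF-lib needs it later.

```javascript
// Hypothetical helpers; neither is part of PDF.js or PDF-lib.

// Returns true if the "%PDF-" signature appears near the start of the buffer.
// Assumption: scanning the first 1024 bytes tolerates files that carry a few
// stray bytes before the header.
function looksLikePdf(arrayBuffer) {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  const length = Math.min(arrayBuffer.byteLength, 1024);
  const bytes = new Uint8Array(arrayBuffer, 0, length);
  for (let i = 0; i + signature.length <= bytes.length; i++) {
    if (signature.every((byte, j) => bytes[i + j] === byte)) {
      return true;
    }
  }
  return false;
}

// Returns an independent copy of the buffer so that one consumer cannot
// detach or modify the bytes another consumer still needs.
function cloneBytes(arrayBuffer) {
  return arrayBuffer.slice(0);
}
```

Inside the change handler, one option would be to return early with a message when looksLikePdf(currentPdfBytes) is false, and to pass cloneBytes(currentPdfBytes) to renderPdfPreview so that currentPdfBytes stays intact for the later steps.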
Step 3: Reading and Displaying Existing Metadata
Once the PDF is loaded and previewed, the next step is to extract its existing metadata using PDF-lib and populate the editing form. This allows users to see the current values before making changes.
    // app.js (continued)
    async function readAndDisplayMetadata(pdfBytes) {
        // updateMetadata: false keeps PDF-lib from refreshing the producer and
        // modification date (and filling in a missing creator or creation date)
        // while loading, so the form shows the values actually stored in the file.
        const pdfDoc = await PDFLib.PDFDocument.load(pdfBytes, { updateMetadata: false });

        metadataFields.title.value = pdfDoc.getTitle() || '';
        metadataFields.author.value = pdfDoc.getAuthor() || '';
        metadataFields.subject.value = pdfDoc.getSubject() || '';
        metadataFields.keywords.value = pdfDoc.getKeywords() || '';
        metadataFields.creator.value = pdfDoc.getCreator() || '';

        const creationDate = pdfDoc.getCreationDate();
        metadataFields.creationDate.value = creationDate ? creationDate.toLocaleString() : '';

        const modDate = pdfDoc.getModificationDate();
        metadataFields.modDate.value = modDate ? modDate.toLocaleString() : '';
    }

Here's how it works:
- PDFLib.PDFDocument.load() asynchronously loads the PDF document from the provided byte array. For more information on PDF-lib, refer to its GitHub repository.
- Methods like getTitle(), getAuthor(), etc., are called on the pdfDoc object to retrieve the respective metadata.
- The retrieved values are then assigned to the value property of their corresponding input fields in the metadataForm. We use || '' to fall back to an empty string when a metadata field is not present, so the literal text "undefined" never appears in the input fields.
- Creation and modification dates are Date objects and are converted to a localized string for display. These fields are disabled as they are typically system-generated and not directly editable by users.
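The reading step above can also be expressed as a small function that returns a plain object, which is easier to test without a browser. This is only a sketch under stated assumptions: extractMetadata is a hypothetical helper name, it accepts any object that exposes the getter methods used in this step, and it formats dates with toISOString() rather than toLocaleString() so the output does not depend on the user's locale.

```javascript
// Hypothetical helper: collects metadata from any object exposing the
// getters used in this tutorial (getTitle, getAuthor, and so on).
function extractMetadata(pdfDoc) {
  // Missing text fields become empty strings.
  const text = (value) =>
    value === undefined || value === null ? '' : String(value);

  // Valid Date objects become ISO 8601 strings; anything else becomes ''.
  const date = (value) =>
    value instanceof Date && !Number.isNaN(value.getTime())
      ? value.toISOString()
      : '';

  return {
    title: text(pdfDoc.getTitle()),
    author: text(pdfDoc.getAuthor()),
    subject: text(pdfDoc.getSubject()),
    keywords: text(pdfDoc.getKeywords()),
    creator: text(pdfDoc.getCreator()),
    creationDate: date(pdfDoc.getCreationDate()),
    modDate: date(pdfDoc.getModificationDate()),
  };
}
```

Because the returned keys match the keys of metadataFields, the form could then be filled by looping over Object.entries(extractMetadata(pdfDoc)) and assigning each value to metadataFields[key].value.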
Step 4: Updating and Saving the PDF
The final step involves capturing the user's changes from the form, applying them to the PDF document using PDF-lib, and then generating a new PDF file for download.
    // app.js (continued)
    metadataForm.addEventListener('submit', async (event) => {
        event.preventDefault(); // Prevent default form submission

        if (!currentPdfBytes) {
            alert('Please upload a PDF first.');
            return;
        }

        const pdfDoc = await PDFLib.PDFDocument.load(currentPdfBytes);

        // Update metadata fields
        pdfDoc.setTitle(metadataFields.title.value);
        pdfDoc.setAuthor(metadataFields.author.value);
        pdfDoc.setSubject(metadataFields.subject.value);
        // setKeywords() expects an array of strings and joins them with spaces,
        // so the raw text is wrapped in a one-element array to store it unchanged.
        pdfDoc.setKeywords([metadataFields.keywords.value]);
        pdfDoc.setCreator(metadataFields.creator.value);

        // Save the modified PDF
        const modifiedPdfBytes = await pdfDoc.save();

        // Create a Blob and URL for download
        const blob = new Blob([modifiedPdfBytes], { type: 'application/pdf' });
        const url = URL.createObjectURL(blob);

        const a = document.createElement('a');
        a.href = url;
        a.download = 'edited_document.pdf';
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
        URL.revokeObjectURL(url); // Clean up the object URL
    });

In this section:
- An event listener on the metadataForm's submit event prevents the default browser form submission.
- We reload the original PDF bytes into a new PDFDocument instance. This ensures we're working with the original document's content before applying new metadata.
- Methods like setTitle(), setAuthor(), etc., are used to update the PDF's metadata with the values from the form input fields.
- pdfDoc.save() serializes the modified PDF document back into a Uint8Array (byte array).
- A Blob is created from these bytes, and URL.createObjectURL() generates a temporary URL.
- An anchor (<a>) element is programmatically created, its href set to the temporary URL, and its download attribute specifies the filename.
- Simulating a click on this anchor element triggers the browser's download prompt.
- Finally, the temporary URL is revoked to release memory resources.
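Two small refinements to the save step are sketched below; both are optional and neither comes from the original tutorial. PDF-lib's setKeywords() takes an array of strings, so parseKeywords (a hypothetical helper) turns comma- or semicolon-separated text into that form. editedFileName (also hypothetical) derives the download name from the uploaded file's name instead of a fixed 'edited_document.pdf'.

```javascript
// Hypothetical helpers for the save step; not part of PDF-lib.

// Splits free-form keyword text on commas or semicolons, trims each entry,
// and drops empty entries. The result is an array of strings.
function parseKeywords(raw) {
  return String(raw || '')
    .split(/[,;]/)
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
}

// Builds a download name such as "report_edited.pdf" from "report.pdf".
// Falls back to "document.pdf" when no original name is available.
function editedFileName(originalName) {
  const name = String(originalName || 'document.pdf');
  const base = name.replace(/\.pdf$/i, '');
  return `${base}_edited.pdf`;
}
```

To use editedFileName, the change handler would need to remember file.name (for example in a variable such as currentFileName, which is an assumed name) so the submit handler can set a.download from it. Note that PDF-lib joins the keyword array with spaces when writing, so the separators the user typed are not preserved.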
This completes the functionality for a client-side PDF metadata editor. Users can now upload a PDF, modify its descriptive information, and download the updated version, all within their browser without any data leaving their device.