SummerEase

PDF Summarizer & Highlight Extractor

Project Overview

I started this project with a straightforward idea: extract the highlights from a PDF and present them in one place. As I developed it, I considered going a step further and generating a summary from those highlights. Over time, as I built and iterated on the web app, I ended up with a slightly different version: the app extracts highlights and also provides a clean summary of the PDF itself.

Final Features

Drag & Drop PDF Upload: Intuitive UI for seamless file uploads (stored locally for performance)
Two Core Functions:
- Get Highlights – Extracts annotations and highlight tags from the PDF
- Get Summary – Uses PyPDF2 and networkx to generate a concise summary
Caching System: Improves performance using hashed file caching for recent uploads
Feedback Form: Sends user feedback via email using Nodemailer
Python Integration: Uses Python where Node.js fell short on accurate PDF parsing and processing
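
The caching feature above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the cache key is a SHA-256 digest of the uploaded file's bytes and that only a small number of recent results are kept in memory. The class name HashedFileCache, the max_entries limit, and the compute callback are hypothetical names introduced for this example.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class HashedFileCache:
    """Small in-memory LRU cache keyed by the SHA-256 digest of a file's bytes.

    Hypothetical helper for illustration; the real app may store results differently.
    """

    def __init__(self, max_entries: int = 32) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def key_for(data: bytes) -> str:
        # Identical file contents always hash to the same key,
        # regardless of the uploaded file's name.
        return hashlib.sha256(data).hexdigest()

    def get_or_compute(self, data: bytes, compute: Callable[[bytes], str]) -> str:
        key = self.key_for(data)
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the expensive work.
            self._entries.move_to_end(key)
            return self._entries[key]
        result = compute(data)
        self._entries[key] = result
        if len(self._entries) > self.max_entries:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return result
```

Keying on the file contents rather than the file name means a re-upload of the same PDF under a different name still hits the cache, while an edited PDF with the same name does not.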

Tech Stack

Frontend: HTML, Tailwind CSS, JavaScript
Backend: Node.js, Express, Multer, Nodemailer, pdf-parse
Python: PyMuPDF, PyPDF2, networkx for parsing and summarization logic

Challenges & Solutions

PDF Parsing Limitations in JavaScript → Solved by integrating Python scripts executed via child_process, ensuring better accuracy and flexibility
Handling Cold Starts for Python Scripts → Preloaded Python environments with a minimal process pool to reduce latency on first run
Cross-Platform Development → Wrote adaptive code for OS-specific Python paths (Windows, macOS, Linux, and Render hosting)
Performance Optimization → Implemented caching for repeated file processing and added timeout guards to prevent server hangs
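
Two of the ideas above, OS-specific interpreter lookup and timeout guards, can be sketched as follows. The project launches its Python scripts from Node.js via child_process; this example shows the same pattern in Python with the standard subprocess module, purely for illustration. The function names find_python and run_script, the candidate executable names, and the 30-second default are assumptions, not the project's actual values.

```python
import shutil
import subprocess
import sys
from typing import List, Optional


def find_python(candidates: Optional[List[str]] = None) -> str:
    """Return the first Python executable found on PATH.

    Falls back to the interpreter running this script if none is found.
    """
    if candidates is None:
        # Assumed defaults: Windows commonly exposes "python" or the "py" launcher,
        # while macOS and Linux usually expose "python3".
        if sys.platform.startswith("win"):
            candidates = ["python", "py"]
        else:
            candidates = ["python3", "python"]
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return sys.executable


def run_script(
    args: List[str],
    timeout_s: float = 30.0,
    python: Optional[str] = None,
) -> Optional[str]:
    """Run a Python child process and return its stdout, or None if it times out."""
    executable = python or find_python()
    try:
        done = subprocess.run(
            [executable, *args],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when the timeout expires
            check=True,         # a non-zero exit status raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout
```

Returning None on timeout lets the caller answer the request with an error instead of leaving it hanging on a stuck parser.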

Impact & Learnings

Deployed a fully working end-to-end summarization tool with scalable architecture
Designed with minimalist UX to simplify interaction while offering powerful features
Balanced user needs, technical constraints, and API limitations through quick iteration and agile debugging
Gained experience in full-stack development, process optimization, and backend performance tuning

Project Links

Visit Site
GitHub Repository