I started this project with a straightforward idea: extract the highlights
from a PDF and present them in one place. As I developed it, I figured, why
not take it a step further and also generate a summary based on those highlights?
Over time, as I built and iterated on the web app, I ended up with a slightly different version,
one where the app extracts highlights and also provides a clean summary of the PDF itself.
Final Features
Drag & Drop PDF Upload: Intuitive UI for seamless file uploads (stored locally for performance)
Two Core Functions:
Get Highlights – Extracts annotations and highlight tags from the PDF
Get Summary – Uses PyPDF2 and networkx to generate a concise summary
Caching System: Improves performance using hashed file caching for recent uploads
Feedback Form: Sends user feedback via email using Nodemailer
Python Integration: Used Python where Node.js fell short for accurate PDF parsing and processing
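The hashed file caching mentioned above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: it assumes the cache key is a SHA-256 digest of the file contents combined with the requested operation, and the names file_digest, RecentResultCache, and get_or_compute are hypothetical. It is written in Python for consistency with the parsing scripts; the app may well implement the same idea on the Node.js side.

```python
import hashlib
from collections import OrderedDict


def file_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies a file by its content."""
    return hashlib.sha256(data).hexdigest()


class RecentResultCache:
    """Keep results for recently used files, keyed by content hash (hypothetical helper)."""

    def __init__(self, max_entries: int = 32):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_compute(self, data: bytes, operation: str, compute):
        # The operation name is part of the key so that "highlights" and
        # "summary" results for the same file do not collide.
        key = (file_digest(data), operation)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = compute(data)
        self._entries[key] = result
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result
```

Keying on the content hash rather than the filename means a re-uploaded file is recognized even if it was renamed, and the size cap keeps memory use bounded.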
Tech Stack
Frontend: HTML, Tailwind CSS, JavaScript
Backend: Node.js, Express, Multer, Nodemailer, pdf-parse
Python: PyMuPDF, PyPDF2, networkx for parsing and summarization logic
Challenges & Solutions
PDF Parsing Limitations in JavaScript → Solved by integrating Python scripts executed via child_process, ensuring better accuracy and flexibility
Handling Cold Starts for Python Scripts → Preloaded Python environments with a minimal process pool to reduce latency on first run
Cross-Platform Development → Wrote adaptive code for OS-specific Python paths (Windows, macOS, Linux, and Render hosting)
Performance Optimization → Implemented caching for repeated file processing and added timeout guards to prevent server hangs
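Two of the ideas above, OS-specific interpreter selection and timeout guards, can be sketched as follows. The app launches its scripts from Node.js via child_process; this is only a Python analogue of the same pattern using the standard subprocess module. The function names, the "python" versus "python3" convention, and the 30-second default are assumptions for illustration, not details taken from the project.

```python
import subprocess
import sys


def python_command(platform: str = sys.platform) -> str:
    """Pick an interpreter name for the given OS (assumed convention)."""
    # Windows installs usually expose "python"; macOS and Linux usually
    # expose "python3". A hosted environment may need its own override.
    return "python" if platform.startswith("win") else "python3"


def run_script(args, timeout_seconds: float = 30.0, interpreter: str = sys.executable):
    """Run a Python script and return its stdout, raising on failure or timeout."""
    try:
        completed = subprocess.run(
            [interpreter, *args],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"script exceeded {timeout_seconds}s") from exc
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip() or "script failed")
    return completed.stdout
```

A call such as run_script(["extract.py", "file.pdf"], interpreter=python_command()) would then either return the script's output or fail fast instead of leaving a request hanging; extract.py here is a placeholder name.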
Impact & Learnings
Deployed a fully working end-to-end summarization tool with scalable architecture
Designed with minimalist UX to simplify interaction while offering powerful features
Balanced user needs, technical constraints, and API limitations through quick iteration and agile debugging
Gained experience in full-stack development, process optimization, and backend performance tuning