What is an efficient workflow to separate and organize bulk scanned PDF documents? (At work; software is limited.)

endless@lemmy.ml · 4 hours ago

I am not going to learn how to train an AI for this task. It is non-trivial to install anything and I cannot use any remote/online tools. I would need to find an appropriate local AI (DeepSeek?) and learn how to use it from scratch.

I could write a bash script to modify filenames at home on my Linux machine. But at work I just have Windows. It has… PowerShell? I guess. I’ve never used that and to be honest I have no desire to. I would have to install something to cut up the PDFs. There is ocrmypdf, which could do everything, and there are various other CLI PDF manipulation tools in the repos. But I would have to ask to have it installed, along with any other required dependencies. Not gonna happen.

I want a way to easily go through hundreds of pages, look at them and quickly tag them. That is a perfect task for a GUI. To use a script I would have to scroll through the PDF in one application then switch back and forth into a text editor, to manually create a text document specifying which pages are in what document, and what category etc. I’d sooner do it on paper. But I’m sure there is a solution for this, I just don’t know what it is.
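For illustration, the text document described above could be one line per output file. This is a minimal sketch in Python, assuming a hypothetical line format of `PAGES CATEGORY NAME` (for example `1-3 invoices acme march`); the format, the `Entry` class and `parse_manifest` are made up for this example. It uses only the standard library and only reads the list; the actual cutting of the PDF is not shown and would still need some PDF tool.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    first_page: int   # 1-based, inclusive
    last_page: int    # 1-based, inclusive
    category: str
    name: str


def parse_manifest(text):
    """Parse lines of the (hypothetical) form 'PAGES CATEGORY NAME'.

    PAGES is either a single page ('4') or a range ('1-3').
    Everything after a '#' is treated as a comment; blank lines are skipped.
    """
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 2)  # NAME may contain spaces
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 'PAGES CATEGORY NAME'")
        pages, category, name = parts
        first, _, last = pages.partition("-")
        first_page = int(first)
        last_page = int(last) if last else first_page
        if first_page < 1 or last_page < first_page:
            raise ValueError(f"line {line_no}: bad page range {pages!r}")
        entries.append(Entry(first_page, last_page, category, name))
    return entries


if __name__ == "__main__":
    sample = """
    1-3 invoices acme march
    4   letters  notice   # single page
    """
    for entry in parse_manifest(sample):
        print(entry)
```

This does not remove the back-and-forth between the viewer and the editor; it only shows how little the list itself would need to contain.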

endless@lemmy.ml · 4 hours ago

Originals are unavailable, I only have the scans. Which have been printed out of tradition. I could scan them again but it would take a very long time and further decrease the quality. And I don’t have the ability to sit by the scanner to catch the files as they come in. Scanner and workstation are in different locations. Manually separating the existing PDFs by using “print to PDF” would be faster.

endless@lemmy.ml · 6 hours ago

I don’t have the volume where learning a completely new technology would be worthwhile. I would have to manually verify each one anyway because it has to be perfect. The documents do not have any format as nice as a heading at the top. I’m willing to put in the time to go through each page, I just need a fast way to tag them, then automate separation and renaming.
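To make the "tag, then automate" part concrete, here is a rough sketch of the automated half in Python. It assumes the tagging step produces one tag per page in scan order, where a category name marks the first page of a new document and an empty string means "same document as the previous page". That convention, the `plan_documents` name and the `category_NNN.pdf` naming scheme are all assumptions for the example. It only computes which pages go into which file; writing the files would still need a PDF tool.

```python
from collections import Counter


def plan_documents(page_tags):
    """Turn per-page tags into (filename, first_page, last_page) tuples.

    page_tags: one string per page, in scan order.
      - a non-empty string starts a new document in that category
      - an empty string continues the previous document
    Pages are numbered from 1 and ranges are inclusive.
    """
    plan = []
    counts = Counter()  # running number per category, used in the filename
    for page, tag in enumerate(page_tags, start=1):
        if tag:
            counts[tag] += 1
            plan.append([f"{tag}_{counts[tag]:03d}.pdf", page, page])
        elif plan:
            plan[-1][2] = page  # extend the current document by one page
        else:
            raise ValueError("the first page needs a tag")
    return [tuple(item) for item in plan]


if __name__ == "__main__":
    tags = ["invoice", "", "", "letter", "invoice", ""]
    for filename, first, last in plan_documents(tags):
        print(f"{filename}: pages {first}-{last}")
```

Since every page still gets looked at by hand, the only thing the script decides is the numbering, and that is easy to check against the printed plan before anything is split.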

endless@lemmy.ml · 6 hours ago

There is no way I’ll be able to install Docker and whatever else is needed, run a server, etc.

endless@lemmy.ml · 6 hours ago

that’s how I know how tall it is printed out! lol

endless@lemmy.ml · 10 hours ago

They print it out on paper, organize it with sticky notes and paper clips, then have someone re-scan it and name/organize files digitally according to the sticky notes. I don’t want to do it that way.

endless@lemmy.ml · 15 hours ago

What is an efficient workflow to separate and organize bulk scanned PDF documents? (At work; software is limited.)