pointless

  • 0 Posts
  • 12 Comments
Joined 1 year ago
Cake day: June 23rd, 2023

  • Another vote for Tesseract. Just to clarify the terminology, though: PDF is a fragile format best treated as read-only, so you really don’t want to edit a PDF; you want to make a new one using the same (or cleaned-up) bitmaps and a new OCR text layer.

    Now, Tesseract is excellent at recognizing glyphs, but its layout detection falters, especially if the scanned image is a little fuzzy; and when it falters, you get redundant line breaks and chunks of text in the wrong order, all of which gets incredibly annoying for searching and copying purposes. So if you can spare the time, and the text requires it, you may need to mark regions (mainly paragraphs and titles) on the bitmap image manually. A few frontends to Tesseract help with a task like that; check out, e.g., gImageReader (https://github.com/manisandro/gImageReader). Inside single-paragraph blocks of text, Tesseract doesn’t get confused as easily, and the text output comes out in the correct reading order, without redundant breaks.
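
If you'd rather script the region-by-region approach than click through a frontend, here is a rough sketch of the idea, not a tested workflow. It assumes each marked region has already been cropped into its own image file and that the page is single-column; the `Region` class, the file names, and the helper functions are all made up for illustration. The only Tesseract-specific parts are the command-line options `-l` and `--psm 6` ("assume a single uniform block of text") and the `stdout` output base.

```python
import re
from dataclasses import dataclass


@dataclass
class Region:
    """One manually marked block (paragraph or title) on a scanned page."""
    crop_path: str  # hypothetical: image file holding just this block
    left: int       # x of the block's top-left corner on the page, in pixels
    top: int        # y of the block's top-left corner on the page, in pixels


def reading_order(regions):
    """Sort blocks top to bottom, then left to right (single-column pages only)."""
    return sorted(regions, key=lambda r: (r.top, r.left))


def tesseract_cmd(crop_path, lang="eng"):
    """Build a Tesseract command line for one cropped block.

    --psm 6 tells Tesseract to assume a single uniform block of text;
    using "stdout" as the output base prints the recognized text.
    """
    return ["tesseract", crop_path, "stdout", "-l", lang, "--psm", "6"]


def unwrap(text):
    """Collapse the line-by-line OCR output of one block into a single line."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)   # rejoin words split by a hyphen
    return re.sub(r"\s*\n\s*", " ", text).strip()  # remaining breaks become spaces
```

To actually run it, you would pass each command to `subprocess.run(cmd, capture_output=True, text=True)` in the order given by `reading_order`, and feed the captured output through `unwrap`. Note that the hyphen rule will also merge genuine compounds that happen to break at a line end, so the result still wants a quick proofread.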



  • Yeah, I’m really happy with my Leopold, which I’ve been using for the past 3 months. I used to have a Unicomp before that; and while its typing feel was a little better than the brown switches I currently have on the Leopold, its build quality was lower, and eventually it just died on me thanks to what I later found out was a notoriously failure-prone controller they were using back then. I’m told that Unicomp’s build quality has improved a lot since then.

    … though the frustrating thing is that I was able to get the Unicomp only because I was living in the US at the time, and I got the Leopold thanks to relatives in S. Korea. Where I live, ‘mechanical keyboard’ is treated like a synonym for ‘gamer keyboard’, with all the BS associated with that.

    So excellent off-the-shelf brands exist, though one has to do some local research first.