Almost Timely News: 🗞️ Using Local AI for Document Scanning (2026-02-01)Mundane? Yes. Useful? Also yes.Almost Timely News: 🗞️ Using Local AI for Document Scanning (2026-02-01) :: View in Browser The Big PlugTwo new things to try out this week: 1. Got a stuck AI project? Try out Katie’s new, free AI Readiness Assessment tool. A simple quiz to help predict project success. Content Authenticity Statement100% of this week’s newsletter content was originated by me, the human. Learn why this kind of disclosure is a good idea and might be required for anyone doing business in any capacity with the EU in the near future. Watch This Newsletter On YouTube 📺Click here for the video 📺 version of this newsletter on YouTube » Click here for an MP3 audio 🎧 only version » What’s On My Mind: Using Local AI for Document ScanningThis week, let’s dig into a very specific application of local AI. Last week we covered how to get started with private, local AI which I recommend you review and do. We’ll be building on that. If you don’t want or can’t get a local AI model running, consider using an infrastructure provider like DeepInfra or Groq (note the spelling) as they can provide low cost access to today’s best models, often with zero data retention APIs. The specific application of local AI we’re looking at this week is something seemingly mundane: document scanning. Now, you might say, “Chris, that is the more boring, mundane, unsexiest use of generative AI, don’t you have anything more interesting?” But something like document scanning is the epitome of the Shirky Principle: once a technology is technologically boring, it can be societally interesting. Using generative AI for document scanning is boring, but there are plenty of documents in the world that are very difficult to read through normal scanning. Photographic scans of paperwork. Documents with charts and graphs and images embedded in them. Weirdly formatted tables. Partially redacted documents. All those are things that can throw regular document scanners for a loop. Generative AI models trained to be document scanners can overcome many of those issues. So let’s dig in. Part 1: A Pre-Emptive GlossaryBefore we dig into the how-to, let’s take a few moments to describe the what. Document scanning is a profession unto itself and has a lot of lingo and jargon - jargon that, if you know, makes it easier to work with AI. The most common term you’ll hear is OCR - optical character recognition. This is what a lot of computer vision software started with, the need to scan letters and convert analog text (like the printed page) into digital text. Additionally, as models got more powerful and compute got bigger, OCR improved to start reading handwriting. Today, you can take a photo of handwritten text even from centuries past and most AI models will be able to transcribe it. As a fun aside, I tried that with some Sumerian text from a local museum, from the Museum of Fine Arts in Boston, and Google’s Gem and I was actually able to read it with reasonable accuracy. Transcribe itself is a specific word in document scanning, something you’ll want to note for AI prompts. To transcribe is to write down text, word for word, as closely as possible to the original. In general, you often want AI to transcribe something first before doing anything else, so you can check the quality of its work. Many people make the mistake of trying to have AI do too much and just process an entire document in one shot, rather than break down the steps of the workflow. When we talk about using AI for document scanning, we are often talking about VLMs, vision language models. VLMs are models that can work with images as well as text (and sometimes video). They can “see” in ways that a text model cannot, because they’ve been trained on images as well as text. When we’re doing document scanning, we want to make sure we’re using a vision language model for the processing part. Speaking of which, there is a distinct workflow for doing document scanning with AI:
A few other terms you’ll want to know: SQLite is one of the most useful database formats there is, because it’s a single file that lives on your computer. Unlike bigger systems that require servers (MySQL, PostgreSQL, Microsoft SQL Server, Google BigQuery, etc.) SQLite is just a single flat file that lives in any folder. You can pick it up and move it around if needed. It’s also a database format that AI is especially fluent in and knows how to manipulate, which comes in handy when we’re talking about document scanning and storage. Open source software is any software that is licensed for other people to use and modify, often for free, even for commercial use (depending on the license). Many of the world’s top systems and software are open source, such as the Apache web server, the Linux operating system, many programming languages, and other core technologies. Often abbreviated FOSS (free and open source software), open source is what powers a lot of the modern Internet. Generative AI has extensive knowledge of open source software, which comes in handy for not reinventing the wheel. Python is probably the most common programming language in the world now, and certainly the most popular language in open source software. Python version 3.12 and 3.13 are the versions that many libraries (basically like plugins) that AI tools depend upon. Finally, context window. All AI has two kinds of memory, long term and short term. Long term memory is the data the AI has been trained on, and as of today, when you use any AI model, that memory does not change. It’s why so many AI tools integrate web search. The short term memory, or working memory, is called the context window. For the purposes of building and working with your own local AI, the bigger you make it (you set it in your software, like LM Studio/AnythingLLM) the more memory your AI consumes and the slower it runs. That’s another reason why lots of small tasks are better than one big task - it’s far more resource efficient. Okay, now that we’ve got the book learning out of the way, let’s dig in. Part 2: Choosing a ModelAssuming you completed the setup from last week’s newsletter, you should have either LM Studio or AnythingLLM set up on your computer. We now need to find an AI model that will work for document scanning. There are a ton of excellent choices out there, some of which include:
|