I am not Tolkien or Isaac Asimov, so my narrative may not be so appealing. But I can promise you this: after every Mission you will have a new, useful personal tool.
An AI-powered one, running on your PC, where you control the knobs and you connect the dots.
In today's mission you will learn to use Python to search the web and fetch fresh articles about a topic. No payments, no affiliate links: only free resources.
The end goal: build an app that does things with AI for us automatically, in a clear workflow and with predictable results.
Here is a small recap:
🟢 Your AI your rules - Introduction
🕵️♂️ Mission 1: How I taught Python to search the web and fetch fresh articles
🧠 Mission 2: How AI reads those articles and summarizes them in plain English
📋 Mission 3: Auto-generating a clean table of contents
🔑 Mission 4: Extracting keywords like a digital bloodhound
📬 Mission 5: Sending a daily recap to my inbox
🔗 Mission 6: Stitching it all together into one smooth, automated pipeline.
🤖 Mission 7: Suggest me image prompts and generate these images for free
🎨 Mission 8: 4 images are too much for an email - collage and attach!
Let’s start
Search for me - I review later
I have always been amazed by the ChatGPT option to search the web for relevant sources. This feature is no longer limited to ChatGPT: it is now also part of Qwen, Anthropic and other big AI tech products.
I don’t know what kind of search engine they use… but I wanted to do the same.
I am a writer, a teacher and a Process Engineer, so I am always looking for relevant new data and information on the world wide (often too wide…) web.
What if Python and AI could do this for me? What if, every week, I could skim through recent news and articles and get the web search results in my inbox?
I can always decide later what is relevant (or what caught my eye)…
AI web search
Python is the perfect programming language for this kind of thing, and we need to install very few dependencies. To be fair, I had to go through quite the (web) search to find free resources for the job.
Almost all the demos around the LLM community are built around the SERP API (Google search, a paid service) or Tavily… with clear limits and costs. Here is an example from the LangChain docs:
Screenshot from the LangChain documentation
That being said, you can explore other options, but here I will show you how to use DuckDuckGo. If you have never heard of it, DDG is a search engine (and a browser) that does not track user data and performs web searches privately, keeping ads to a minimum:
Use DuckDuckGo to help protect yourself online. Ever get ads so creepy it feels like your phone is listening to you? Or see ads for something you searched for once following you around everywhere? ... For decades, they’ve been tracking your searches, embedding trackers in their browser, and hiding even more trackers on the most-popular websites. This type of mass collection of your personal information can make you vulnerable to hackers, scammers, and other privacy-invasive companies.
# ddgs for search, rich for pretty printing
pip install ddgs rich
And you can start searching with 4 lines of code:
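from ddgs import DDGS
from rich import print  # rich's print pretty-prints nested lists and dicts

# timelimit="m" keeps only results from the last month; timeout is set on the client
results = DDGS(timeout=20).text(
    "Token Order Prediction in Generative AI",
    max_results=10,
    timelimit="m",
    region="us-en",
    backend="google",
)
print(results)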
The results are returned as a list of dictionaries, each containing the title, the URL and a short text snippet (only about 50 words of the full text).
Here is an example…
[
{
'title': 'Predicting the Order of Upcoming Tokens Improves ...',
'href': 'https://arxiv.org/html/2508.19228v1',
'body': '27 Aug 2025 — 1. We introduce Token Order Prediction (TOP), a novel auxiliary training loss in addition
to NTP to improve language modeling in general. 2.'
},
{
'title': 'Predicting the Order of Upcoming Tokens Improves ...',
'href': 'https://arxiv.org/pdf/2508.19228',
'body': 'by ZMK Zuhri · 2025 — It has been shown that MTP improves performance of language models on certain
generative tasks that require some level of look-ahead, such as coding and ...'
},
{
'title': 'A new generative AI approach to predicting chemical ...',
'href': 'https://news.mit.edu/2025/generative-ai-approach-to-predicting-chemical-reactions-0903',
'body': '2 days ago — The new FlowER generative AI system may improve the prediction of chemical reactions. The
approach, developed at MIT, could provide realistic predictions ...'
},
{
'title': "How Generative LLM Prompts Work (or Don't Work)",
'href': 'https://www.linkedin.com/pulse/how-generative-llm-prompts-work-dont-rob-dods-dx5bc',
'body': "Prediction Loop – The model builds a reply by predicting the most likely 'next token', identifying
which token in the training data most often followed token ..."
},
{
'title': 'Apple taught an LLM to predict tokens up to 5x faster in ...',
'href': 'https://www.reddit.com/r/apple/comments/1mlae0p/apple_taught_an_llm_to_predict_tokens_up_to_5x/',
'body': 'The technique, called “multi-token prediction,” uses mask tokens to allow the model to speculate on
upcoming words while ensuring accuracy.'
},
{
'title': 'Chatting with organisational data: a generative AI ...',
'href': 'https://link.springer.com/article/10.1007/s10115-025-02551-x',
'body': 'by MA Bashar · 2025 — It does this by analysing the encoded information and determining the most
probable word or token that should follow. This prediction is not a one-time process; ...'
}
]
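Since every result is a plain dictionary with 'title', 'href' and 'body' keys, a few lines are enough to turn the list into a tidy set of links. Below is a minimal sketch, not part of the original script: the helper name unique_by_title is my own invention, and the sample data is copied from the output above with the bodies shortened. It drops repeated titles (the same arXiv paper shows up twice in the example, once as HTML and once as PDF) and prints one link per title.

```python
# Sample data copied from the output above (bodies shortened for readability).
sample_results = [
    {
        "title": "Predicting the Order of Upcoming Tokens Improves ...",
        "href": "https://arxiv.org/html/2508.19228v1",
        "body": "We introduce Token Order Prediction (TOP), a novel auxiliary training loss ...",
    },
    {
        "title": "Predicting the Order of Upcoming Tokens Improves ...",
        "href": "https://arxiv.org/pdf/2508.19228",
        "body": "It has been shown that MTP improves performance of language models ...",
    },
    {
        "title": "A new generative AI approach to predicting chemical ...",
        "href": "https://news.mit.edu/2025/generative-ai-approach-to-predicting-chemical-reactions-0903",
        "body": "The new FlowER generative AI system may improve the prediction ...",
    },
]


def unique_by_title(results):
    """Keep only the first result for each title (hypothetical helper)."""
    seen = set()
    unique = []
    for item in results:
        title = item.get("title", "")
        if title in seen:
            continue  # same title already kept, skip the duplicate
        seen.add(title)
        unique.append(item)
    return unique


# Build (title, url) pairs and print them one per entry.
links = [(r["title"], r["href"]) for r in unique_by_title(sample_results)]
for title, href in links:
    print(f"{title}\n  {href}")
```

Matching on the title is a deliberately crude rule: two different articles with the same truncated title would be merged, so treat it as a starting point.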
The good thing is: we can use these initial search results to retrieve the full text of every source! We need a Python loop to iterate over the list and use Newspaper3k to get the text for us.
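The loop just described could look roughly like this. It is a sketch under a few assumptions: Newspaper3k is installed with pip install newspaper3k, the function names fetch_with_newspaper and add_full_text are hypothetical, and any page that fails to download or parse (PDF links or sites that block scrapers are likely candidates) simply gets an empty text instead of stopping the run.

```python
def fetch_with_newspaper(url):
    """Download and parse one page with Newspaper3k and return its plain text."""
    # Imported here so the rest of the code works even before the library is installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def add_full_text(results, fetch=fetch_with_newspaper):
    """Return copies of the search results with an extra 'text' key.

    `fetch` is any function that takes a URL and returns text; it defaults to
    the Newspaper3k helper above and can be swapped out for testing.
    """
    enriched = []
    for item in results:
        try:
            text = fetch(item["href"])
        except Exception as err:  # one bad page should not stop the whole loop
            print(f"Skipping {item['href']}: {err}")
            text = ""
        enriched.append({**item, "text": text})
    return enriched
```

With the search results from before, calling articles = add_full_text(results) would give back the same list of dictionaries, each with the article body under 'text'. Passing the fetcher as a parameter is only a convenience: it lets you try the loop with a fake function before hitting real websites.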