
Building a Personal AI Assistant: Experiments with Google Gemini and Lessons from the Battlefield

Oct 3, 2025 | AI

Introduction

The goal of any serious AI implementation in business (or really, of anything, anywhere) is to solve a specific problem.

Of course, the TONS of HYPE and FAME cause people to implement AI whether it makes sense in a given case or not, but that’s a different story.

In my case, the problem was the complexity of data in a non-professional project: a science fiction novel I’m writing because, as I’ve mentioned before, I enjoy writing for relaxation. I decided to use Gemini to create a personalized AI assistant (a “Gem,” so to speak), which was intended to function as an interactive knowledge base about the world and plot of my story.

Whenever I wrote longer pieces (and even shorter ones), I ran into the same problem: after a few pages I could no longer remember whether a character had brown hair or green eyes, or whether a certain lady was a lieutenant or perhaps a colonel 😉 Seriously, I have a great memory, but it’s short.

The experiment provided a key conclusion: the potential is enormous and I’ve started using it, but the RAG technology that such solutions are based on has fundamental limitations in precise data retrieval. Understanding these limitations is essential for judging what these products can and cannot do.

This case study shows why.

1. The Hypothesis: AI as the Guardian of Fictional Canon

Managing information in an extensive project, whether it’s technical documentation, a legal regulation database, or, as in this case, the world of a novel, is a challenge. My hypothesis was simple: a language model with access to a dedicated file containing the scenario, character descriptions, and the text of the novel itself should be able to answer precise questions about facts.

Goal: To create a tool that, when asked, “Which characters were involved in the incident at the station over Titan?” returns an exact list, eliminating the need to manually search through pages.

2. Solution Architecture: What is RAG and How Does It Work?

The tool I used was Google Gemini (actually, it’s probably my favorite LLM – except for programming, where the Sonnet series wins). This system, much like Custom GPTs from OpenAI, bases its operation on an architecture called RAG (Retrieval-Augmented Generation).

To understand why my assistant made mistakes, we need to understand how this process works. It’s not “magic.” It’s a two-step, logical pipeline.

Imagine RAG as the work of an analyst with limited resources:

  • Step 1: Retrieval. When you ask a question, the system doesn’t immediately pass it to the main AI model. First, a component called the “Retriever” gets to work. Its sole task is to search the connected knowledge base (in my case, a Google Docs file) and find a few fragments of text (“chunks”) that seem to best match your query. It acts like a very fast but not very bright assistant, looking for keywords and semantic similarity, not deep understanding.
  • Step 2: Augmented Generation. The main language model (the Generator, e.g., Gemini) receives a specially prepared package on its “desk”: your original question AND those few text fragments found by the Retriever. This is key: the model’s task is to answer your question using ONLY these provided fragments as the source of truth. Everything else from its vast general knowledge is treated as secondary.

So, to sum up: RAG is both a method of retrieving data and a way of providing it as context. The system first retrieves the relevant fragments and then supplies them as the sole permissible context for the model that generates the final answer. This architecture aims to limit AI “hallucinations,” but as my experiment showed, it also creates new, subtle problems.
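To make the two steps concrete, here is a minimal sketch of such a pipeline in Python. It is deliberately simplified and not Gemini’s actual machinery: the retriever scores chunks by plain word overlap instead of real vector embeddings, the sample text and the character name in it are invented, and generate() is only a placeholder for an LLM API call.

```python
# Minimal sketch of the two-step RAG pipeline described above.
# Simplifications: fixed-size chunks, word-overlap scoring instead of embeddings,
# and generate() as a placeholder for the actual LLM call.

def split_into_chunks(text: str, size: int = 300) -> list[str]:
    """Cut the knowledge base into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def relevance(query: str, chunk: str) -> float:
    """Toy score: the fraction of the query's words that appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / max(len(query_words), 1)

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Step 1: Retrieval - return the k chunks that best match the query."""
    return sorted(chunks, key=lambda c: relevance(query, c), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Step 2: Augmented Generation - the model may use ONLY this context."""
    joined = "\n---\n".join(context)
    return (f"Answer the question using ONLY the context below.\n\n"
            f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:")

novel = "Chapter 1. The station over Titan shook as Lieutenant Ada Lem ran..."
question = "Which characters were involved in the incident at the station over Titan?"

top_chunks = retrieve(question, split_into_chunks(novel))
prompt = build_prompt(question, top_chunks)
# answer = generate(prompt)  # placeholder for the actual LLM call
print(prompt)
```

The weak link is visible right away: if the decisive fragment does not land in top_chunks, the Generator never sees it, no matter how capable the underlying model is.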

3. Tests and Brutal Verification: Where the System Works and Where It Fails Spectacularly

The test results were unequivocal and showed a dichotomy in the model’s abilities. Honestly, despite knowing the limitations, I was shocked a few times by how poorly it performed.

Successes: The model excelled at tasks requiring synthesis and creativity based on a general context. For example:

  • Generating suggestions for developing dialogue in the style of a specific character.
  • Research and general brainstorming. For instance, a scene took place near the UN, and I asked where the nearest and best place for a helicopter landing was. It gave me a location, I checked it on Google Maps, and there it was—a perfect spot for the action.
  • Analyzing a character’s motivations based on a description of their actions. This was also a form of brainstorming, discussing whether a character’s behavior seemed consistent with their previous actions, and similar topics.

Failure (Case Study: The President Identification Problem): The system failed most spectacularly on a task that seemed trivial—precisely querying for a specific fact. Here is the test scenario:

  • Context: In my novel, which begins in 2035, a character who is the president of Poland appears. Instead of creating a fictional character, I used a real person from the current political scene, giving his first name and two unique traits that unambiguously identify him. Anyone from Poland would recognize him.
  • Control Test: When I pasted the same text fragment into a standard chat window with the model, without access to the file, it correctly identified the person described. The model used its broad, general knowledge.
  • The Real Test (using RAG): When I asked my specialized “Gem” the same question, its logic collapsed. The assistant first incorrectly identified the character as Marian Banaś, and in a subsequent attempt as… Andrzej Duda!

Diagnosis: This is not a simple mistake. It is a fundamental error resulting from the RAG architecture. The model, in its general mode, knows that a presidential term limit prevents Andrzej Duda from taking office again in 2035. However, in RAG mode, its reasoning was “trapped.” The Retriever likely failed to find the precise fragment with the identifying features, providing the Generator with incomplete or misleading context. As a result, the model, forced to answer based only on this weak data, generated an absurd conclusion.
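The mechanism is easy to reproduce with invented fragments (none of this is the actual novel text, and the name and traits below are made up). Using the same toy word-overlap scoring as in the earlier sketch, the decisive fragment describes the character but never contains the word “president,” so it loses to a chunk that merely mentions the word in passing:

```python
# Reconstruction of the failure mode with invented fragments (not the real novel text).
# The decisive chunk identifies the character by name and traits but never uses the
# word "president", so a similarity-based retriever can rank it below a useless chunk.

def relevance(query: str, chunk: str) -> float:
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / max(len(query_words), 1)

query = "Who is the president of Poland in the novel?"

chunks = {
    "decisive": "Marek set down the old violin and smiled wearily at the cameras.",
    "filler":   "The president of the mining consortium denied the Titan reports.",
}

for name, text in chunks.items():
    print(name, round(relevance(query, text), 2))
# "filler" scores higher: it shares "president", "of" and "the" with the query,
# while the decisive fragment shares almost nothing. The Generator is then handed
# the wrong context and has to guess - which is how an absurd answer gets produced.
```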

4. Business Conclusions: From a Sci-Fi Novel to Corporate Implementations

The experience from this project translates directly to the business world. Companies that want to implement AI assistants to analyze internal documentation, product knowledge bases, or HR procedures must be aware of this challenge.

The key lesson is this: It’s not enough to “feed” AI a folder of PDF files. That’s a straight path to frustration and unreliable results.

Effective implementation requires architectural work:

  • Data Preparation and Structuring: Clean, well-organized sources are the foundation. Sometimes a simple database is better than a thousand unstructured documents (a short sketch of this follows the list).
  • Advanced RAG Strategies: It may be necessary to implement more complex pipelines that are better at indexing and retrieving facts (e.g., by creating knowledge graphs).
  • Design Oriented Around Limitations: Building systems that leverage the strengths of LLMs (synthesis, creativity) and minimize risks in areas where they are weak (factual precision).
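As a minimal illustration of the first point, here is what “a simple database” can look like in practice: exact facts (ranks, eye colors) live in a structured store and are answered deterministically, while the LLM is kept for synthesis and brainstorming. The character, field names, and values are invented for illustration.

```python
# Sketch of "structure the data first": precise facts live in a small structured
# store and never pass through a retriever at all; the LLM handles synthesis.
# Character names, fields, and values are invented for illustration.

from dataclasses import dataclass

@dataclass
class Character:
    name: str
    rank: str
    eye_color: str
    hair_color: str

characters = {
    "Ada Lem": Character("Ada Lem", "lieutenant", "green", "brown"),
}

def fact(name: str, field: str) -> str:
    """Deterministic lookup for questions like 'what rank does she hold?'"""
    return getattr(characters[name], field)

print(fact("Ada Lem", "rank"))  # -> lieutenant
```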

The era of personal AI assistants is already here. However, the critical competence of the decade will not be just using them, but the ability to design and implement them correctly, with full awareness of the limits of their capabilities.
