Because AI Counts Every Word, Even the Ones You Regret Typing
You sit down with an AI tool, type out a few prompts, and boom—it spits out a paragraph. But have you ever wondered how AI processes your words? Or why it sometimes cuts off mid-sentence like a writer who ran out of caffeine?
Welcome to the world of tokens—the bite-sized chunks of text that AI actually “sees.” If you’ve ever thought, “Why did my AI response stop?” or “Why am I paying so much for this AI session?”, this guide is for you.
At Future Fiction Academy, we believe creators need control. That includes understanding the cost of AI and how to maximize every word. So, let’s break down tokens, how AI counts them, and why it matters for writers.
A Quick Refresher: What Is an LLM?
In last week’s post, we covered the basics of large language models (LLMs)—AI that predicts words based on patterns. Instead of understanding meaning, AI simply guesses the most likely next word, one token at a time.
But what exactly is a token, and why should you care?
What Are Tokens?
AI doesn’t process text in full words or sentences. Instead, it breaks everything down into tokens—small chunks of text that could be complete words, parts of words, or even punctuation.
Here’s how AI sees some common text:
- “Hello” → 1 token
- “Fantastic” → 1 token
- “Shouldn’t” → 3 tokens (“should”, “n”, “’t”)
- “Artificial intelligence” → 2 tokens (“Artificial”, “intelligence”)
Even spaces and punctuation can affect token counts. That means you’re paying for every little piece AI processes, whether it’s useful or not.
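Want to check these counts yourself? Here’s a minimal sketch using OpenAI’s open-source tiktoken library. Counts vary by tokenizer (the cl100k_base encoding is used here as one common choice), so your numbers may differ slightly from the list above:

```python
# Minimal token-counting sketch using OpenAI's tiktoken library.
# Different models use different encodings; cl100k_base is one common choice.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello", "Fantastic", "Shouldn't", "Artificial intelligence"]:
    tokens = enc.encode(text)                    # text -> list of token IDs
    pieces = [enc.decode([t]) for t in tokens]   # each ID back to its text chunk
    print(f"{text!r} -> {len(tokens)} token(s): {pieces}")
```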
How Tokens Affect Input, Output & Costs
Every time you interact with AI, both your input (prompt) and its output (response) count toward your total token usage. That means:
- Longer prompts use more tokens. If you give AI a detailed, multi-paragraph instruction, those words count toward your total token usage.
- Longer responses use more tokens. AI doesn’t generate free words; every sentence it returns costs tokens.
- Even chat history uses tokens. In a chatbot setting, previous messages remain in memory and count toward the total usage.
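Because both sides of the exchange count, you can ballpark what a single request costs. Here’s a rough sketch; the per-token prices below are placeholders, not real rates, so check your provider’s current pricing:

```python
# Rough cost sketch. These prices are placeholders for illustration only --
# real rates vary by model and provider.
INPUT_PRICE_PER_1K = 0.0010   # hypothetical dollars per 1,000 prompt tokens
OUTPUT_PRICE_PER_1K = 0.0020  # hypothetical dollars per 1,000 response tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single request."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# Example: a 1,500-token prompt that gets back an 800-token scene.
print(f"${estimate_cost(1500, 800):.4f}")  # $0.0031
```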
If you’ve ever had AI cut off mid-sentence, it’s usually because the response hit the output token limit for that turn.
How AI Uses Tokens (And Why It Sometimes Forgets Things)
Most AI models have a context window, meaning they can only “remember” a certain number of tokens at a time. Once that limit is reached, the AI starts forgetting earlier parts of the conversation.
For example:
- GPT-4 (in its extended 32k version) has a 32,000-token limit (around 24,000 words). Great for long-form storytelling!
- GPT-3.5 has a 4,096-token limit (around 3,000 words). Best for shorter responses.
- Claude 2 has a 100,000-token limit (huge), meaning it remembers a whole 75,000-word novel’s worth of text!
But here’s the catch: The context window includes both the input (your prompt) and the output (the AI’s response).
When using AI through an API, this is straightforward. You input text, the AI generates a response, and both count toward the total token limit.
In a chat setting, however, it’s more complicated. The entire chat history is included in the context window, meaning every past message takes up space. The more you chat, the sooner AI will start forgetting earlier parts of the conversation.
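Here’s a simplified sketch of that trimming in action. The count_tokens helper is assumed (it could be built on a tokenizer like tiktoken), and real chat tools each handle this a little differently:

```python
# Simplified sketch of how a chat interface might decide what to "remember."
# count_tokens is an assumed helper, e.g. built on tiktoken.
def fit_to_window(messages, max_tokens, count_tokens):
    """Keep the newest messages that fit inside the context window."""
    kept, used = [], 0
    for msg in reversed(messages):    # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                     # everything older gets "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))       # restore chronological order
```

Some tools summarize old messages instead of dropping them outright, but the token budget math works the same way.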
How to Tell When AI Is Forgetting Details
If AI starts acting confused or contradicting itself, it’s probably reached its context window limit. Signs include:
🚩 Repeating information you already provided
🚩 Forgetting key details mentioned earlier
🚩 Contradicting itself within the same conversation
🚩 Losing track of complex multi-step instructions
To work around this, you may need to summarize key details in your prompt, provide summaries or “real-time story development guides” for previous chapters, or reset the conversation to start fresh.
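Here’s what that workaround can look like in practice: a compact, hand-written recap stands in for the chapters the model can no longer see. The story details below are invented purely for illustration:

```python
# Sketch of the summary workaround: send a short recap instead of the
# full manuscript. The story details here are made up for illustration.
story_so_far = (
    "POV: Mira, a smuggler. She has just learned the relic she stole "
    "is a forgery, and her buyer is already en route."
)
prompt = (
    f"Story so far: {story_so_far}\n\n"
    "Write the next scene: Mira decides whether to warn the buyer."
)
```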
How to Use AI Efficiently (Without Wasting Tokens)
Want to get the most out of AI without burning through tokens? Try these tricks:
- Be concise in your prompts. The longer your instructions, the more tokens you use. Keep it clear and direct.
- Avoid unnecessary words. AI doesn’t need polite phrases like “Can you please” (unless tone matters). Just say “Rewrite this using simple language.” (See the comparison sketch after this list.)
- Use summaries for long prompts. Instead of feeding AI every detail, provide a short summary of previous text.
- Trim the fluff in AI responses. If AI gives you overwritten prose, tell it to “be more concise” or “summarize in one sentence.”
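To see what trimming the fluff actually saves, here’s a quick comparison sketch using tiktoken. Exact counts vary by tokenizer, but the gap is the point:

```python
# Comparing a wordy prompt to a concise one. Exact counts vary by tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

wordy = ("Can you please take a look at the following passage and, "
         "if it isn't too much trouble, rewrite it using simpler language?")
concise = "Rewrite this using simple language."

print(len(enc.encode(wordy)))    # noticeably more tokens
print(len(enc.encode(concise)))  # same instruction, fewer tokens
```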
Want to Try It Yourself?
At Future Fiction Academy, we built RaptorWrite to give authors full control over their AI experience. Unlike other AI “wrapper” tools that hide how many tokens you’re using, or that slip guidance prompts between you and the LLM to steer the responses you get, RaptorWrite lets you see your usage, adjust responses, and work efficiently. You’ll just need to bring your own OpenRouter API key to access AI models.
💡 Try it for free: 👉 Use RaptorWrite Here
By understanding tokens, you’re not just saving money; you’re taking control of your AI writing process. And when you control AI, you control your creativity.
Prefer to Learn in a Different Format?
📺 Prefer a visual guide? Check out our free Introduction to AI Basics for Fiction Writers course: Watch here
📖 Want everything in one place? Grab our AI Basics for Fiction Authors book: Get it on Amazon
TL;DR (Because Writers Love a Summary)
- AI processes text in tokens, not words.
- More tokens = higher cost and more memory use.
- AI forgets old text when it hits its token limit.
- Be concise to maximize efficiency.
Next up: AI Hyperparameters: How to Tame the Beast!
Last post: How AI Writes: What Large Language Models (LLMs) Mean for Authors
Want to read more of our blog posts? Check out the archive.