Title: Engage's balance is broken at the formula level. I rebuilt it from scratch and I'm tuning enemies with an LLM reading playtest logs — tell me where this falls apart

Engage’s lategame isn’t “undertuned,” it’s structurally broken. The avoid stat stacks with no ceiling (terrain +30, doubled to +60 on Covert units, plus supports) until enemy hit reaches ~0%. And their “fix” is the same band-aid slapped over every kind of collapse at once: enemies just skip any unit they can’t meaningfully hurt. 0% to hit, 0 damage, doesn’t matter, same patch. It doesn’t fix anything; it hides every failure mode under one rule. It’s a band-aid on a corpse that’s already been through a meat grinder. The avoid stacking is still there, the damage cliff is still there; the game just looks away from both.
I built a from-scratch SRPG (not a GBAFE hack) so I can change the formulas. My fixes:

・Hit floor at 30%, no ceiling. I clamp the floor only — permanent sources (base avoid, terrain, supports) can’t push enemy hit below 30%. Only temporary effects (active skills, consumables) can go under, down to a second floor of 10%. I’m deliberately not capping the top: I kill avoid-stacking by capping the avoid inputs, not by hiding hit behind a ceiling. A displayed-hit ceiling is the same symptom-hiding as Engage’s skip hack — it leaves the real cause (avoid with no ceiling) untouched.
・No subtractive-damage cliff. Instead of letting DEF cross ATK and pin damage at 0/1, I cap final DEF (all bonuses included) so the wall-unit-takes-0 case can’t exist.
・HP scales to a per-turn survival line, not a flat curve. Enemy burst damage compounds as the game goes on; HP tracks it so a unit survives exactly one turn of hits, and AoE healing covers the damage that accumulates across turns. HP has a hard cap at ~1.3–1.5× the survival line so the rare triple-rolled tank can’t go invincible.
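
For anyone who wants the three rules in concrete form, here is a minimal Python sketch. Only the 30% and 10% floors and the ~1.3–1.5× HP cap come from the post. The function names, the order in which permanent and temporary avoid are applied, the `def_cap` parameter, and the 140% default are assumptions for illustration, not the formulas in the actual build.

```python
PERMANENT_FLOOR = 30  # % floor from the post: permanent avoid cannot push hit below this
TEMPORARY_FLOOR = 10  # % second floor from the post: temporary effects stop here


def enemy_hit(base_hit, permanent_avoid, temporary_avoid=0):
    """Two-tier floor with no ceiling (assumed order: permanent first, then temporary)."""
    hit = max(base_hit - permanent_avoid, PERMANENT_FLOOR)
    return max(hit - temporary_avoid, TEMPORARY_FLOOR)


def damage(atk, defense, def_cap):
    """Subtractive damage with final DEF capped; def_cap is a hypothetical tuning value."""
    return max(atk - min(defense, def_cap), 0)


def capped_hp(raw_hp, survival_line, cap_pct=140):
    """Hard HP cap as a percentage of the survival line (140 is an assumed midpoint)."""
    return min(raw_hp, survival_line * cap_pct // 100)
```

With these definitions, a dodge tank with 100 permanent avoid against 90 base hit still faces 30%, and only a temporary effect can take that down to 10%. The top end is left alone, so `enemy_hit(150, 20)` stays at 130.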

Here’s the part some of you will hate: I’m not hand-tuning enemy stats. I auto-playtest each map, dump the logs (hit distribution, hits-to-kill, clear turns), and feed them to an LLM that adjusts enemy stats and re-runs. The human writes the rules (the floor, the survival-line target); the model just chases the target from data instead of me eyeballing it. Build is fully playable, ch01 generated end to end. Maps are rough because the log corpus is still thin.

■Questions for people who’ve actually shipped/playtested balance:

1:Does a hard 30% floor feel blunt in play, or is Thracia’s 1–99 hated for the inputs swinging too wide rather than the clamp itself?

2:Final-DEF cap vs. minimum-guaranteed-damage vs. sub-1 DEF coefficient — which holds up best lategame in your experience?

3:For anyone who’s tried data-driven / automated tuning: where does “tune to a target metric” break down? My fear is it converges to “technically survivable” maps that are boring — that the metric can’t see fun.

4:The big one. Everything FE does to patch lategame collapse is symptom-hiding: “enemies skip 0-damage units,” guaranteed-hit weapons to dodge the avoid problem, etc. — they hide the broken number instead of fixing why it broke. My clamps are arguably one tier up (capping inputs instead of masking outputs), but I don’t think they’re the root answer either. So: has anyone found a genuinely structural fix? Something that stops player power and enemy threat from diverging in the first place — not a clamp that catches them after they’ve already split. I’m thinking about the player-scales-multiplicatively / enemies-scale-additively asymmetry as the actual disease. How do you treat that, short of just rubber-banding enemies to player stats (which kills the reward of getting stronger)?

(English isn’t my first language — this is partly machine-translated, so apologies if anything reads oddly.)

This feels like something you’d want to do with either occasional manual testing or an Excel spreadsheet, not an LLM. LLMs have a rough time with math (and even counting) and have no experience actually playing Fire Emblem. You’re basically showing your damage logs to a 1st grader who’s never played FE before (but still spends time on the r/FE subreddit) and asking them if the numbers look right.

3 Likes

Then yeah, the answer is probably to build the environment so an LLM/agent can actually play the game and dump structured logs.

I don’t think an LLM can evaluate “fun” directly. Fun is not in the log. But it can absolutely help produce something plausible before the human pass: no 0% hit collapse, no 0-damage wall, hits-to-kill inside the target band, clear turns not exploding, no unit being skipped forever, etc.
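
Those checks are mechanical enough to write down. A minimal sketch in Python, assuming the playtest log has already been reduced to a handful of numbers per map; the `MapMetrics` fields, the hits-to-kill band, and the turn limits are all hypothetical placeholders, not values from the real project.

```python
from dataclasses import dataclass


@dataclass
class MapMetrics:
    min_enemy_hit: float       # lowest enemy hit % seen in the log
    min_enemy_damage: int      # lowest damage any enemy attack would deal
    avg_hits_to_kill: float    # average enemy hits needed to kill a player unit
    clear_turns: int           # turns taken to clear the map
    longest_skip_streak: int   # most consecutive turns one unit was ignored


def sanity_failures(m, htk_band=(2.0, 4.0), max_clear_turns=20, max_skip_streak=3):
    """Return the list of failed checks; an empty list means 'plausible'."""
    failures = []
    if m.min_enemy_hit <= 0:
        failures.append("0% hit collapse")
    if m.min_enemy_damage <= 0:
        failures.append("0-damage wall")
    if not htk_band[0] <= m.avg_hits_to_kill <= htk_band[1]:
        failures.append("hits-to-kill outside target band")
    if m.clear_turns > max_clear_turns:
        failures.append("clear turns exploding")
    if m.longest_skip_streak > max_skip_streak:
        failures.append("unit skipped too long")
    return failures
```

An empty result only says the map is not broken in the ways listed. It says nothing about whether the map is fun.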

So the loop should be:

play → log → adjust → replay

The model does not need to “understand” the game. It just needs to read histograms and move numbers toward a target.
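
As a toy version of that loop in Python: `play` is a stand-in for the real auto-playtest and `adjust` is a deterministic stand-in for the model's edit, so this is not an LLM and not the actual pipeline. The single-stat setup, the 3 to 4 hits-to-kill band, and every number here are assumptions chosen to keep the sketch small.

```python
def play(enemy_atk, player_hp=40, player_def=10):
    """Toy playtest: the only 'logged' metric is hits-to-kill on one player unit."""
    dmg = max(enemy_atk - player_def, 1)
    return -(-player_hp // dmg)  # ceiling division


def adjust(enemy_atk, hits_to_kill, target=(3, 4)):
    """Stand-in for the model: nudge ATK by 1 toward the target band."""
    low, high = target
    if hits_to_kill > high:
        return enemy_atk + 1  # enemy too weak, raise ATK
    if hits_to_kill < low:
        return enemy_atk - 1  # enemy too strong, lower ATK
    return enemy_atk


def tune(enemy_atk, max_iters=50):
    """play -> log -> adjust -> replay until the metric sits inside the band."""
    for _ in range(max_iters):
        hits_to_kill = play(enemy_atk)
        new_atk = adjust(enemy_atk, hits_to_kill)
        if new_atk == enemy_atk:
            break
        enemy_atk = new_atk
    return enemy_atk, play(enemy_atk)
```

Starting from an underpowered enemy, `tune(12)` walks ATK up until the unit dies in 4 hits. The iteration cap is there because a real target metric may not be reachable.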

Honestly, for post-Awakening FE-style systems, this should have been mandatory from the start. Once you let units jump through countless classes and stack huge numbers of skills, manual tuning becomes fake. Stats grow like compound interest, and the number of possible combinations explodes in a way even a child can understand.

If you remove limits, player power keeps multiplying forever. If enemies are still tuned additively by hand, the lategame will collapse. That is not a mysterious balance problem. It is just combination explosion.
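
A quick way to see the divergence, as a Python sketch with made-up numbers: both sides start at 10 power, the player gains 10% per level and the enemy gains a flat +1 per level. None of these values come from any actual FE game.

```python
def player_power(level, base=10.0, rate=0.10):
    """Multiplicative growth: each level multiplies power by (1 + rate)."""
    return base * (1 + rate) ** level


def enemy_power(level, base=10.0, step=1.0):
    """Additive growth: each level adds a flat step."""
    return base + step * level


def power_gap(level):
    """Player power divided by enemy power at the same level."""
    return player_power(level) / enemy_power(level)


if __name__ == "__main__":
    for level in (0, 10, 20, 30):
        print(level, round(power_gap(level), 2))
```

With these illustrative rates the two sides are equal at level 0 and the player is more than 4 times stronger by level 30, even though the early levels look almost identical.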

And no, there is no Komuro inside the LLM trying to privatize the series. That alone is already an improvement.

Wouldn’t it be much easier to just, like, make a Python script for those tests? Like, a program can very easily simulate the relevant equations without needing to simulate/play the actual game. Teaching an LLM to play Fire Emblem seems like a massive overcomplication.
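
Something like this is the kind of script meant here: a minimal Monte Carlo sketch, assuming a single random roll per attack (not the averaged two-roll system some FE games use) and a fixed damage value. The function name and defaults are hypothetical.

```python
import random


def average_attacks_to_kill(hit_pct, dmg, target_hp, trials=10_000, seed=0):
    """Estimate how many attacks it takes to drop target_hp, by simulation."""
    if hit_pct <= 0 or dmg <= 0:
        raise ValueError("target can never be killed with 0 hit or 0 damage")
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    total_attacks = 0
    for _ in range(trials):
        hp = target_hp
        while hp > 0:
            total_attacks += 1
            if rng.random() * 100 < hit_pct:  # single roll against displayed hit
                hp -= dmg
    return total_attacks / trials
```

The `ValueError` guard is exactly the 0% hit and 0-damage cases from the first post: the simulation would never terminate, which is the same collapse the formulas are trying to avoid.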

You don’t need to fine-tune balance to that degree, though? Like, it’s fine if some options end up being better than others. If your hack’s meta ends up being unbalanced to a toxic degree, you can just… nerf the outliers that are being reported as overperforming.

1 Like

Pick up a pencil

8 Likes

私は、RNG(ランダム要素)はタクティカルRPGの難易度を調整するための適切な手段ではないと思う。

運はプレイヤーが解決すべき予測不能な問題を生み出すべきであって、その問題を代わりに解決するべきではない。幸運な回避や会心の一撃が悪い判断の代償を帳消しにするべきではないし、不運なミスの直後に敵の必殺が発生して良い判断そのものが無意味になるべきでもない。

「どこまでのRNGなら許容できるのか」という議論は、しばしば本質を見失っていると思う。問題は敵が5%、10%、あるいは20%の確率で戦況をひっくり返し、プレイヤーの計画を無意味にしてしまうことではない。問題は、意味のある戦術的な挑戦の代わりとしてランダムな結果を用いていることだ。

そもそも、こうしたRPGは何のためにあるのだろうか。

なぜ私たちはそれを遊ぶのだろうか。

RNGが最も効果的に機能するのは、情報や計画に不確実性を与える時だ。戦場を予測しにくくするのは良い。しかし、敵の攻撃がまるでスロットマシンのようになり、突然二連続で必殺が発生したり、即死スキルが次々と連鎖したりするようなものではないはずだ。

では、タクティカルRPGは何を試すゲームなのだろうか。

『ファイアーエムブレム』はレベル上げを繰り返し、パッシブ効果を積み重ね、最終的には統計的に負けようがないほど強くなることを目指すゲームなのだろうか。それとも、プレッシャーのかかる状況で興味深い判断を下し、適切なユニットを選び、最善の行動を取り、刻々と変化する状況に適応していくことを試すゲームなのだろうか。

もし答えが後者であるならば、難易度はプレイヤーの意思決定から生まれるべきであって、そのターンにサイコロが味方したかどうかによって決まるべきではない。

Or, to put this in English…

I don’t think RNG is a valid lever for difficulty in tactical RPG design.

Luck should create unpredictable problems for the player to solve, not solve those problems for them. A lucky miss or critical hit shouldn’t erase the consequences of a bad decision, and an unlucky miss followed by an enemy crit shouldn’t invalidate a good decision.

Debates about where to draw the line on acceptable RNG often miss the bigger issue. The problem isn’t whether an enemy has a 5%, 10%, or 20% chance to swing a battle and make your planning irrelevant. The problem is treating random outcomes as a substitute for meaningful tactical challenge.

What are these RPGs for?

Why do we play them?

RNG is at its best when it creates uncertainty around information and planning. It should make the battlefield harder to predict, not turn every attack into a slot machine where enemies can suddenly crit twice and chain multiple instant-kill effects.

What are tactical RPGs supposed to test?

Is Fire Emblem about grinding levels, stacking passive bonuses, and eventually becoming so untouchable and powerful that it’s statistically impossible for you to lose? Or is it about making interesting decisions under pressure, choosing the right units, making the right moves, and adapting to evolving situations?

If the answer is the latter, then difficulty should come primarily from the decisions players make, not from whether the dice happened to like them that turn.

6 Likes

I think your premise is wrong, or you are just dismissing it without really trying it.

Someone at my technical level can already make an LLM write the Python scripts for those tests. And I am not even an engineer. I had never touched code before this.

So the issue is not that this is impossible. It is just that IS cannot do it. At this point, even a child can make an SRPG with an LLM if the environment is prepared properly.

If you are making only one game alone, then maybe manual tuning is faster. But if several people are making two or three games or more, the equation flips very quickly.

Most people throw away their production know-how once a game is finished. But if that know-how is treated as shared infrastructure, other people can reuse it. And an LLM is basically a mass of shared knowledge.

Also, Fates or Engage collapsing is not just a matter of “patch the outliers.”

Even if you pretend the 100% avoid ring or the 0-damage unit does not exist, there is still a fundamental flaw: the player grows on a compound-interest curve, while enemies grow on a simple linear curve tied to growth rates.

That is not something you fix by lightly nerfing a few reported outliers. You would need to inject an absurdly strong corrective drug into the system, and in most cases it would probably be faster to rebuild the system from scratch.

jason you’re goated and all but why did you type this in japanese and then translate it to english

1 Like

You’ll either end up spending more time doublechecking the machine’s work or presenting garbage.

Not gonna trust a machine playing FE number madlibs lol

4 Likes

Ore thought it would be funny desu.

4 Likes

this is the kind of shit a Pokémon villain would say before throwing out a level 45 magnezone. Very funny quote.

Anyway, in my opinion you aren’t going to get any answers to your questions, because they aren’t attached to anything. Any formula can work if the rest of the game (units, maps, enemy placement, supplementary mechanics) is designed with it in mind. “Which of these formulas works better for the endgame in your experience” is a meaningless question because the endgame of each hack is different.

Make a demo. Release it. Let people play it. They’ll tell you if it’s fun or not.

8 Likes

Most people throw away their production know-how once a game is finished. But if that know-how is treated as shared infrastructure, other people can reuse it. And an LLM is basically a mass of shared knowledge.

I don’t know what you’re trying to say with this. The FE games are extremely iterative, “know-how” from a previous title is always being reincorporated into the next game, else we wouldn’t see such a clear evolution of IS’s design priorities through the years. They are very obviously learning from each game and reimplementing those lessons and ideas into their next titles, as are most developers in the scene. Several of IS’s games are even obviously on the exact same engine, they’re not throwing anything out.

Even if you pretend the 100% avoid ring or the 0-damage unit does not exist, there is still a fundamental flaw: the player grows on a compound-interest curve, while enemies grow on a simple linear curve tied to growth rates.

I think the simple answer is that FE is not a game purely about pitting numbers against numbers. I find it interesting that you’re using Engage as your primary example for games where the player mercilessly outstrips enemy progress, when one of the most notable things about Engage is that on Maddening difficulty, enemies outstat you so consistently that by endgame you’re almost entirely relying on tools in your arsenal rather than numbers on your units. The additive scaling actually outstrips the player scaling, and requires solutions that get around those numbers rather than brute forcing them. A lot of people complain about Engage lategame, but I actually really enjoyed it for this reason, since it meant the game still had bite right up to the very end of the game where most FE games fall apart at a certain point.

If you want to design a lategame to challenge players, you don’t need to algorithmically perfect the numbers based on a, bluntly, arbitrary definition of perfection. You just need to create challenges for the player that they can’t solve by reaching some kind of obscene stat benchmark. Being able to avoid stack / defense stack to extreme levels in Engage actually very rarely matters in my experience on Maddening, and I think calling it a band-aid fix that enemies ignore you if they can’t hurt you ignores the obvious tactical consideration that comes into play there. Reaching arbitrarily high avoid has a specific purpose now, which is allowing your unit to spend that turn being ignored by the enemies. None of the maps are solved by this 0 hit strategy, so spending a massive amount of effort fixing it is just simply not important compared to creating maps intended to challenge a player with a fully built team of units.

6 Likes

If you really want to make a good game, play Tactical Breach Wizards and do everything it did except making the story bad.

cyrus upon sending out his weavile

2 Likes

i knew this one person who’d do this in the discord vc and it stressed me the fuck out. id be like “why are you doing that” people would be like “oh shes practicing” practicing for what??