Why Good Trading Strategies Fail Before They Go Live

There is a moment every quant team eventually hits. The backtest looks promising. The Sharpe is clean. The drawdowns are manageable. Then someone asks: "What does this look like on a point-in-time universe?" — and three days disappear. Not because the question is hard. Because the data layer was never built to answer it cleanly.

This is the central dysfunction of most quantitative research operations, from a two-person prop desk in BKC to a mid-sized AIF running ₹500 crore in systematic strategies. The signal research — the actual intellectual work — is fast. The infrastructure surrounding it is not. And when the infrastructure is slow, the team starts optimising for comfort rather than discovery: they rerun the same backtest with slightly different parameters, they avoid asset classes where the data is messy, they stop questioning assumptions that are expensive to re-examine. The alpha decays quietly while the pipeline stays unchanged.

The 70% Problem

Ask any honest quant how their week broke down and the answer is usually the same: roughly 70% of their time goes to data work. Fetching, validating, adjusting for corporate actions, reconciling vendor feeds, debugging why the adjusted close around a 2019 bonus issue does not match what the exchange bhavcopy says. The remaining 30% (the research, the hypothesis testing, the signal refinement) happens in the gaps.

This is not a new observation. A 2023 survey by a US-based quantitative research firm found that data preparation consumed between 60% and 80% of researcher time across teams of all sizes. The number is stubbornly consistent regardless of headcount. Adding researchers does not fix it; it scales the dysfunction. Each new hire inherits the same broken pipeline, learns to work around the same gaps, and adds more bespoke scripts that the next hire will also have to learn to ignore.

In India, the problem is structurally worse. NSE bhavcopy data is freely available but comes with no guaranteed corporate action adjustment. Bonus issues, splits, rights offerings, and demergers all require a separate adjustment layer that most teams build themselves — inconsistently, once, never revisited. Symbol mapping between NSE and BSE is a persistent source of silent errors, and survivorship-bias-free historical universes are not a default — they must be explicitly constructed and maintained. A team running a momentum backtest on the Nifty 500 without delisted constituents is not running a backtest. They are running a fantasy.
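One way to make NSE/BSE mapping errors loud rather than silent is to join the two listings on ISIN and inspect whatever fails to match. The sketch below assumes pandas; the symbols, BSE codes, ISINs, and column names are all made-up placeholders, not real securities or a real vendor schema.

```python
import pandas as pd

# Hypothetical listing tables; every identifier here is a placeholder.
nse = pd.DataFrame(
    {
        "nse_symbol": ["AAA", "BBB", "CCC"],
        "isin": ["INE000A01011", "INE000B01019", "INE000C01017"],
    }
)
bse = pd.DataFrame(
    {
        "bse_code": [500001, 500002, 500004],
        "isin": ["INE000A01011", "INE000B01019", "INE000D01015"],
    }
)

# An outer join keeps rows that exist on only one exchange, and
# validate="one_to_one" raises if an ISIN appears twice on either side.
mapping = nse.merge(bse, on="isin", how="outer", indicator=True, validate="one_to_one")

# Anything not marked "both" needs a human decision before it enters a backtest.
unmatched = mapping[mapping["_merge"] != "both"]
print(unmatched[["isin", "nse_symbol", "bse_code", "_merge"]])
```

An inner join would have dropped the two unmatched rows without a trace, which is exactly the failure mode described above.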

What Survivorship Bias Actually Costs You in India

Most prior studies documenting the performance of Fama-French factors in India failed to address survivorship bias in their analysis and may have provided potentially misleading conclusions. This is academic language for a practical disaster: strategies that look like they compound at 22% CAGR on a clean Nifty 500 universe often drop to 14–15% once you include the stocks that were in the index in 2012 and are no longer around to be counted.

The Indian small and mid-cap space is particularly treacherous. Between 2018 and 2020, dozens of Nifty Midcap 100 constituents were suspended, were delisted, or saw their market caps collapse by 70–90%. A momentum or quality strategy backtested on the current constituents of that index would completely miss this episode, because the stocks that blew up are no longer in the index to be included in the historical test.
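The fix is to ask "who was in the index on this date?" rather than "who is in it now?". A minimal sketch with pandas follows, assuming a membership table with one row per stint in the index. The table layout, symbols, and dates are invented for illustration.

```python
import pandas as pd

# Hypothetical membership history: one row per stint in the index.
# A missing end_date means the stock is still a constituent.
membership = pd.DataFrame(
    {
        "symbol": ["AAA", "BBB", "CCC", "DDD"],
        "start_date": pd.to_datetime(
            ["2015-01-01", "2016-04-01", "2015-01-01", "2019-10-01"]
        ),
        "end_date": pd.to_datetime(["2019-09-30", None, None, None]),
    }
)


def constituents_on(membership: pd.DataFrame, as_of: str) -> list[str]:
    """Return the symbols that were index members on the given date."""
    date = pd.Timestamp(as_of)
    started = membership["start_date"] <= date
    not_ended = membership["end_date"].isna() | (membership["end_date"] >= date)
    return sorted(membership.loc[started & not_ended, "symbol"])


# AAA left the index in 2019, so it only appears in the earlier snapshot.
print(constituents_on(membership, "2018-06-29"))
print(constituents_on(membership, "2020-06-30"))
```

A backtest that calls a function like this on every rebalance date keeps the casualties in the sample for as long as they were actually investable.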

The Corporate Action Gap

The second India-specific data hazard is the corporate action gap. India has an unusually high frequency of bonus issues relative to global peers — companies use them as a retail signalling mechanism rather than a genuine capital return tool. Each bonus issue, if not correctly back-adjusted, creates a phantom price drop in the historical series that looks exactly like a sharp drawdown. Momentum models trained on unadjusted data learn to avoid these "drawdowns" — which were never real — and the resulting signal has a hidden structural flaw that only surfaces in live trading.

A proper corporate action engine handling splits, bonuses, and mergers with intraday re-basing is not optional infrastructure — it is the foundation without which no other research is reliable.
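To make the phantom drawdown concrete, here is a minimal back-adjustment sketch for share-count events such as bonuses and splits. It assumes pandas and a hypothetical actions table with `ex_date`, `shares_before`, and `shares_after` columns. It deliberately ignores dividends, rights issues, and demergers, which a real engine would also need to handle.

```python
import pandas as pd


def back_adjust(close: pd.Series, actions: pd.DataFrame) -> pd.Series:
    """Scale prices before each ex-date by shares_before / shares_after."""
    factor = pd.Series(1.0, index=close.index)
    for action in actions.itertuples():
        before_ex = close.index < action.ex_date
        factor.loc[before_ex] *= action.shares_before / action.shares_after
    return close * factor


# Toy series: a 1:1 bonus (one new share per share held) goes ex on 6 March,
# so the raw price roughly halves overnight.
dates = pd.to_datetime(["2019-03-04", "2019-03-05", "2019-03-06", "2019-03-07"])
raw_close = pd.Series([1000.0, 1010.0, 505.0, 510.0], index=dates)
actions = pd.DataFrame(
    {
        "ex_date": pd.to_datetime(["2019-03-06"]),
        "shares_before": [1],
        "shares_after": [2],
    }
)

adjusted = back_adjust(raw_close, actions)
print(raw_close.pct_change().iloc[2])  # about -50%: the phantom drawdown
print(adjusted.pct_change().iloc[2])   # about 0%: what holders actually experienced
```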

The Org Structure That Kills Alpha

The most common quant team structure looks like this: one or two senior researchers who understand the theory, two or three junior researchers who do the data work, and a separate engineering team that "owns the infrastructure." The researchers and the engineers have different priorities, different sprint cycles, and different definitions of done. Every research request that touches infrastructure becomes a negotiation.

This structure is almost perfectly designed to slow signal discovery. The researchers cannot move without the engineers. The engineers cannot prioritise research requests over production system maintenance. The result is a queue — and a queue in a research operation is not a neutral inconvenience. It is where alpha ideas go to become stale.

The teams that consistently ship signals have a different configuration. Every researcher owns their own data environment. The shared infrastructure is minimal and opinionated: a canonical data store, an agreed backtest framework, and a signal review protocol. Individual researchers have full autonomy within those constraints. There is no ticket raised, no sprint dependency, no waiting for someone else to unblock you.

At WorldQuant, this philosophy is taken to an extreme — researchers operate almost entirely independently, submitting alphas to a centralised evaluation system without coordinating with other researchers at all. The insight is that coordination overhead compounds faster than most teams expect. The Alpha-GPT study, which evaluated research efficiency by comparing AI-assisted and human quant workflows, found that consistency and throughput — not raw intelligence — were the primary differentiators between high-output and low-output researchers.

The Right Stack for a Lean Quant Team

The technology choices a quant team makes in its first six months tend to calcify. Teams that start with Excel-based analysis keep Excel in the loop forever, even when it becomes the bottleneck. Teams that build their first backtest in a bespoke framework spend years maintaining it instead of improving it. The stack decision is a long-term org design decision.

For a lean team — two to six researchers — the right stack has three layers and nothing else.

Layer 1: The Data Store. A single canonical source of truth for price, volume, fundamentals, and corporate actions. Parquet files on object storage with a lightweight cataloguing layer work well. The critical constraint: every record must carry a point-in-time stamp, meaning the date on which the value became known, not just the period it describes. If you cannot reconstruct what the data looked like on any given date in history, the backtest is not trustworthy. For Indian equities, this means maintaining a separate historical universe file that records index additions and deletions by date. NSE does publish constituent history, but keeping a usable file requires active maintenance.
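The point-in-time requirement can be sketched with an as-of join: for each backtest date, pick the latest value that had already been published. The example assumes pandas and a hypothetical fundamentals table carrying both the period a figure describes (`period_end`) and the date it became public (`known_from`). All values are invented.

```python
import pandas as pd

# Hypothetical fundamentals: period_end is what the number describes,
# known_from is when it was actually published.
fundamentals = pd.DataFrame(
    {
        "symbol": ["AAA", "AAA"],
        "period_end": pd.to_datetime(["2023-03-31", "2023-06-30"]),
        "known_from": pd.to_datetime(["2023-05-20", "2023-08-10"]),
        "eps": [12.0, 15.0],
    }
)

backtest_dates = pd.DataFrame(
    {
        "symbol": ["AAA", "AAA", "AAA"],
        "date": pd.to_datetime(["2023-05-01", "2023-07-15", "2023-08-31"]),
    }
)

# direction="backward" takes the most recent row with known_from <= date,
# so the backtest never sees a figure before it was published.
pit = pd.merge_asof(
    backtest_dates.sort_values("date"),
    fundamentals.sort_values("known_from"),
    left_on="date",
    right_on="known_from",
    by="symbol",
    direction="backward",
)
print(pit[["date", "period_end", "eps"]])
```

On 15 July the June quarter has ended but has not been reported, so the join correctly returns the March figure. Joining on `period_end` instead would leak the June number into July.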

Layer 2: The Backtest Framework. Vectorised first, event-driven second. A vectorised framework — Pandas, Polars, or NumPy-based — runs fast enough for signal research. Event-driven simulation is necessary only for execution modelling and transaction cost analysis. Most teams build event-driven frameworks first because they feel more "real." The result is a framework that takes 40 minutes to run a simple momentum backtest, so the researchers run fewer backtests.
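For a sense of how little code a vectorised research backtest needs, here is a minimal sketch in pandas. The function name, the 60-day lookback, and the top-2 rule are illustrative choices, and the prices are a seeded random walk rather than market data. It ignores transaction costs and universe membership, which is precisely the part the event-driven layer is for.

```python
import numpy as np
import pandas as pd


def momentum_backtest(
    prices: pd.DataFrame, lookback: int = 60, top_n: int = 2
) -> pd.Series:
    """Daily returns of an equal-weight, long-only top-N momentum portfolio."""
    returns = prices.pct_change()
    signal = prices.pct_change(lookback)              # trailing return
    ranks = signal.rank(axis=1, ascending=False)      # 1 = strongest
    weights = (ranks <= top_n).astype(float) / top_n  # equal weight in top N
    # Shift weights by one day so today's return uses yesterday's signal.
    return (weights.shift(1) * returns).sum(axis=1)


# Synthetic prices: five hypothetical symbols following a random walk.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2022-01-03", periods=250)
symbols = ["AAA", "BBB", "CCC", "DDD", "EEE"]
prices = pd.DataFrame(
    100 * np.exp(rng.normal(0, 0.01, size=(250, 5)).cumsum(axis=0)),
    index=dates,
    columns=symbols,
)

daily = momentum_backtest(prices)
equity = (1 + daily).cumprod()
print(equity.iloc[-1])
```

The one-day shift on the weights is the line that matters most: without it the portfolio trades on information from the same close it is being marked at.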

Layer 3: The Signal Registry. Every signal that gets tested is logged, whether it passes or is rejected: hypothesis, feature set, in-sample period, out-of-sample performance, decay rate, correlation to existing live signals. The registry does two things. It prevents duplicate work: someone rediscovering a signal that was tested and rejected eighteen months ago is pure waste. And it forces researchers to be explicit about what they expect a signal to do before they look at the results, which is one of the few practical defences against overfitting.
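A registry does not need to be elaborate. The sketch below uses only the Python standard library. The class names, field names, and example values are hypothetical, chosen to mirror the items listed above rather than any particular team's schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json


@dataclass
class SignalRecord:
    """One registry entry; field names are illustrative."""
    name: str
    hypothesis: str              # written down before looking at results
    features: list[str]
    in_sample: tuple[str, str]   # (start, end) as ISO dates
    oos_ic: float                # out-of-sample information coefficient
    half_life_days: float        # rough estimate of the decay rate
    max_corr_to_live: float      # highest correlation to any live signal
    status: str = "candidate"    # candidate, live, rejected or retired
    logged_on: str = field(default_factory=lambda: date.today().isoformat())


class SignalRegistry:
    def __init__(self) -> None:
        self._records: dict[str, SignalRecord] = {}

    def register(self, record: SignalRecord) -> None:
        # Refuse duplicates so an existing entry cannot be silently overwritten.
        if record.name in self._records:
            raise ValueError(f"signal {record.name!r} is already registered")
        self._records[record.name] = record

    def find_by_feature(self, feature: str) -> list[SignalRecord]:
        """Look up earlier work that used the same input."""
        return [r for r in self._records.values() if feature in r.features]

    def to_json(self) -> str:
        return json.dumps([asdict(r) for r in self._records.values()], indent=2)


registry = SignalRegistry()
registry.register(
    SignalRecord(
        name="delivery_ratio_5d",
        hypothesis="High delivery share predicts next-week mid-cap returns",
        features=["delivery_qty", "traded_qty"],
        in_sample=("2015-01-01", "2019-12-31"),
        oos_ic=0.03,           # placeholder numbers, not real results
        half_life_days=90.0,
        max_corr_to_live=0.2,
        status="rejected",
    )
)
print([r.name for r in registry.find_by_feature("delivery_qty")])
```

Searching by feature before starting new work is what turns the registry from an archive into a guard against repeating an old test.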

Signal Decay and the Review Cadence Problem

Alpha decays. This is one of the few truly reliable empirical facts in quantitative finance. A momentum signal that delivered an information coefficient (IC) of 0.08 on NSE mid-caps in 2018 likely delivers something closer to 0.04 today. The decay is not linear. It tends to accelerate at inflection points: regime changes, new market participants, shifts in retail trading behaviour driven by SEBI regulation.

Most quant teams do not have a formal signal review cadence. Signals go live and stay live until they are obviously broken — meaning they have already cost the book money. The right structure is a scheduled quarterly review of every live signal, measuring realised IC against the IC that justified deployment. Signals whose IC has decayed below a threshold get reduced or retired. Signals that have held up get increased allocation.
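The review itself can be mechanical. The sketch below, assuming pandas, computes realised IC as the average cross-sectional rank correlation between signal and forward returns, then compares it with the IC recorded at deployment. The function names and the 25% and 60% cut-offs are hypothetical, the toy data is constructed by hand, and the rule assumes the deployment IC was positive.

```python
import pandas as pd


def realised_ic(signal: pd.DataFrame, forward_returns: pd.DataFrame) -> float:
    """Mean cross-sectional Spearman correlation across dates."""
    # Ranking each row first turns the Pearson correlation into a rank correlation.
    daily_ic = signal.rank(axis=1).corrwith(forward_returns.rank(axis=1), axis=1)
    return float(daily_ic.mean())


def review_action(
    deployment_ic: float,
    current_ic: float,
    retire_below: float = 0.25,   # hypothetical threshold
    reduce_below: float = 0.60,   # hypothetical threshold
) -> str:
    """Map the share of deployment IC still being realised to an action."""
    ratio = current_ic / deployment_ic
    if ratio < retire_below:
        return "retire"
    if ratio < reduce_below:
        return "reduce"
    return "keep_or_increase"


# Toy panel: the signal ranks returns correctly on two dates and backwards on one.
dates = pd.to_datetime(["2024-01-05", "2024-01-12", "2024-01-19"])
symbols = ["AAA", "BBB", "CCC", "DDD"]
signal = pd.DataFrame([[1, 2, 3, 4]] * 3, index=dates, columns=symbols, dtype=float)
forward_returns = pd.DataFrame(
    [
        [0.01, 0.02, 0.03, 0.04],
        [0.04, 0.03, 0.02, 0.01],
        [0.01, 0.02, 0.03, 0.04],
    ],
    index=dates,
    columns=symbols,
)

print(realised_ic(signal, forward_returns))
print(review_action(deployment_ic=0.08, current_ic=0.04))
```

Writing the thresholds down in code before the review removes the temptation to argue a decaying signal back into the book.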

The biggest source of alpha destruction in most quant operations is not a bad signal — it is a good signal that was never retired.

This requires someone to own the review process — not as a part-time responsibility but as a defined role. In a small team, this is often the senior researcher. The instinct is to skip the review when markets are calm and the book is performing. That instinct is exactly backwards. The review is most important when performance is good, because that is when decaying signals are hardest to see.

How AI Is Changing the Research Loop

The Alpha-GPT 2.0 study introduces a paradigm that directly attacks the research bottleneck: Human-in-the-Loop AI for quantitative investment — an interactive multi-agent system where large language models and human experts co-drive the entire quant pipeline, from alpha discovery to risk filtering. The practical implication is not that AI replaces the researcher. It is that AI can compress the translation cost between a research hypothesis and a testable signal expression.

The part of signal research that most consumes junior researcher time is not conceptual — it is syntactic. Translating "I want to test whether the ratio of delivery volume to total volume predicts next-week returns in mid-caps" into a correctly specified, error-free backtest expression takes time that has nothing to do with the quality of the idea. LLM-assisted code generation compresses this dramatically. In structured evaluations, AI-assisted researchers showed substantially higher throughput and consistency in alpha expression quality compared to junior human researchers working independently.
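As an illustration of how small the translated hypothesis is once written down, here is a sketch of the delivery-volume idea in pandas. It assumes wide frames indexed by date with one column per symbol. The function name, argument names, and toy numbers are hypothetical, and a five-trading-day horizon stands in for "next week".

```python
import pandas as pd


def delivery_ratio_signal(
    deliverable_qty: pd.DataFrame,
    traded_qty: pd.DataFrame,
    close: pd.DataFrame,
    horizon: int = 5,
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (delivery ratio, forward return over `horizon` trading days)."""
    ratio = deliverable_qty / traded_qty
    # shift(-horizon) pulls the future close back onto today's row, so the
    # last `horizon` rows have no forward return and stay NaN.
    forward_return = close.shift(-horizon) / close - 1
    return ratio, forward_return


# Toy data for two hypothetical symbols over seven business days.
dates = pd.bdate_range("2024-01-01", periods=7)
close = pd.DataFrame(
    {"AAA": [100, 101, 102, 103, 104, 110, 111], "BBB": [50.0] * 7}, index=dates
)
traded_qty = pd.DataFrame({"AAA": [100.0] * 7, "BBB": [60.0] * 7}, index=dates)
deliverable_qty = pd.DataFrame({"AAA": [40.0] * 7, "BBB": [30.0] * 7}, index=dates)

ratio, forward_return = delivery_ratio_signal(deliverable_qty, traded_qty, close)
print(ratio.iloc[0])
print(forward_return.iloc[0])
```

The idea fits in two lines of arithmetic. The hours go into getting the alignment right, which is why the forward-looking shift is isolated and commented.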

For Indian market research specifically, the practical gains are in data manipulation — writing the corporate action adjustment logic, constructing the point-in-time universe filters, handling the NSE-to-ISIN mapping. These are solved problems that should not consume researcher hours. They are exactly the kind of structured, rule-based work that LLM-assisted coding handles well.

SB Research Findings

Across our analysis of quant team structures and signal pipeline architectures, the pattern is consistent: teams that invest in data infrastructure first — canonical stores, point-in-time universes, automated corporate action adjustment — produce more signals per researcher per quarter than teams that invest in model sophistication first. The research bottleneck in India is not a shortage of ideas or modelling ability; it is the compounding cost of bad data hygiene. A team running survivorship-bias-free NSE data with clean corporate action history is starting each backtest from a fundamentally more honest position than one that is not, and the gap between their out-of-sample performance and their in-sample performance is correspondingly narrower. SB Signal is built on exactly this principle — every momentum signal in the scanner runs against a point-in-time universe with back-adjusted prices, because a signal that only looks good on biased data is not a signal.

Build the Plumbing or Lose the Alpha

The quant teams that will compound fastest over the next five years are not the ones with the best models. Models are increasingly commoditised — the theory is public, the implementations are open source, the academic literature is searchable. The edge is in the infrastructure that lets you test more ideas, faster, on cleaner data, with a formal process for retiring what stops working.

Two researchers with a canonical data store, a fast vectorised backtest framework, a signal registry, and a quarterly review cadence will outrun a ten-person team without those things. Not because they are smarter. Because they are not spending their best hours on work that should have been automated. The alpha is there. The question is whether your pipeline is fast enough to find it before someone else does.