Prediction market platforms each write their own question titles. Polymarket might list "Will BTC reach $100k by 2026?" while Limitless lists "Bitcoin to hit $100,000 before 2026?" and Manifold phrases it differently again. Before Predexy can compare prices or detect arbitrage, it must first establish that these markets refer to the same outcome. That is the job of the matching system.

Why title-matching alone fails

Simple text matching — comparing words directly — breaks down quickly in practice. Synonyms, date formats, capitalization choices, and varying levels of specificity all produce titles that mean the same thing but look different to a string-comparison algorithm. Naive keyword matching also creates false positives: two questions that share words but ask about different events can appear more similar than they actually are.
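To make the failure mode concrete, here is a minimal sketch of naive word-overlap matching (Jaccard similarity over tokens). This is not Predexy's code; the function names are hypothetical, and the ETH title is an invented counter-example used only for illustration.

```python
import re


def tokens(title: str) -> set[str]:
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Fraction of distinct tokens shared by the two titles (0 to 1)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Two titles that describe the same event but share almost no words.
same_event = jaccard(
    "Will BTC reach $100k by 2026?",
    "Bitcoin to hit $100,000 before 2026?",
)

# Two titles about different assets that share most of their words.
# (The ETH title is a made-up example.)
different_event = jaccard(
    "Will BTC reach $100k by 2026?",
    "Will ETH reach $10k by 2026?",
)

print(f"same event:      {same_event:.2f}")
print(f"different event: {different_event:.2f}")
```

The pair describing the same event shares only the token "2026", while the pair about different assets shares four of eight distinct tokens, so word overlap ranks the wrong pair higher.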

How semantic matching works

Predexy uses semantic embedding vectors to compare market titles. When a new market arrives, its title is converted into a vector representation and compared against existing canonical questions. Semantically similar titles cluster together even when the exact words differ. Vector similarity is then combined with lexical and structural signals — entity hints, time window alignment, and category — to produce a composite confidence score. This hybrid approach reduces both false positives (unrelated markets incorrectly matched) and false negatives (related markets missed because the text looks different).
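The scoring idea can be sketched as follows. Cosine similarity is the standard formula; everything else here is an assumption made for illustration: the signal names, the weights, and the weighted-sum combination are hypothetical, since the text does not say how Predexy actually combines its signals.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


# Hypothetical weights, chosen only for this example; they sum to 1.
WEIGHTS = {
    "semantic": 0.6,
    "entity": 0.2,
    "time_window": 0.1,
    "category": 0.1,
}


def composite_confidence(
    semantic: float, entity: float, time_window: float, category: float
) -> float:
    """Weighted sum of per-signal scores (each in 0..1), clamped to 0..1."""
    signals = {
        "semantic": semantic,
        "entity": entity,
        "time_window": time_window,
        "category": category,
    }
    score = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return max(0.0, min(1.0, score))
```

In a sketch like this, a high embedding similarity alone is not enough: if the entity or time-window signal disagrees, the composite score drops, which is how a hybrid score can filter out titles that merely sound alike.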

Three match methods

Every linked market-to-question pair carries a match_method that tells you how the connection was established:
Method     How it works
semantic   Matched automatically using embedding similarity and lexical signals
manual     Confirmed or corrected by a human reviewer
exact      Matched by an identical platform market ID or title string

Confidence scores

The matching engine assigns a confidence score between 0 and 1 to every proposed link. Three bands determine what happens next:
Confidence   Action
> 0.85       Auto-accepted: the link is created without human review
0.70–0.84    Queued for manual review: a human confirms or rejects
< 0.70       Auto-rejected: the markets are not linked
You can see the confidence score for each linked market in the markets[] array returned by GET /api/v1/questions/{id}. Use it to gauge how certain Predexy is that two listings represent the same event.
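The three bands can be expressed as a small function. This is a sketch, not Predexy's implementation: the function name and return labels are hypothetical, and because the published bands do not say what happens between 0.84 and 0.85, the code assumes those values are queued for review.

```python
def review_action(confidence: float) -> str:
    """Map a confidence score in 0..1 to one of the three documented actions."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    if confidence > 0.85:
        return "auto_accept"
    # Assumption: scores from 0.70 up to and including 0.85 go to manual
    # review, which also covers the 0.84-0.85 gap the bands leave open.
    if confidence >= 0.70:
        return "manual_review"
    return "auto_reject"


for score in (0.92, 0.78, 0.55):
    print(score, review_action(score))
```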

QuestionMarket fields

Each entry in the markets[] array of a question detail response includes matching metadata alongside pricing data:
Field                Type                       Description
confidence           number (0–1)               Composite match confidence score
match_method         semantic | manual | exact  How the match was produced
semantic_similarity  number (0–1)               Raw cosine similarity from the embedding comparison
A semantic_similarity close to 1.0 means the market titles are nearly identical in meaning; a value closer to 0.7 indicates a borderline match that may have required human review.
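As a rough illustration of reading these fields, the sketch below parses the matching metadata out of a question detail response. The sample payload is invented: it contains only the three fields listed above, its values are made up, and a real response also carries pricing data and possibly other fields not shown here. The MatchInfo class and parse_match_info function are hypothetical names, and the code assumes all three fields are present on every entry.

```python
import json
from dataclasses import dataclass

VALID_METHODS = {"semantic", "manual", "exact"}


@dataclass
class MatchInfo:
    confidence: float
    match_method: str
    semantic_similarity: float


def parse_match_info(question_json: str) -> list[MatchInfo]:
    """Extract matching metadata from each entry of the markets[] array."""
    payload = json.loads(question_json)
    result = []
    for market in payload["markets"]:
        method = market["match_method"]
        if method not in VALID_METHODS:
            raise ValueError(f"unexpected match_method: {method!r}")
        result.append(
            MatchInfo(
                confidence=float(market["confidence"]),
                match_method=method,
                semantic_similarity=float(market["semantic_similarity"]),
            )
        )
    return result


# Invented sample data, for illustration only.
SAMPLE = """
{
  "markets": [
    {"confidence": 0.93, "match_method": "semantic", "semantic_similarity": 0.91},
    {"confidence": 0.78, "match_method": "manual", "semantic_similarity": 0.74}
  ]
}
"""

for info in parse_match_info(SAMPLE):
    print(info)
```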

Why this matters for arbitrage

Arbitrage detection runs only on questions where at least two platforms are matched. A false match, linking two markets that actually refer to different events, would create a phantom arbitrage opportunity. Because the two markets resolve independently, the positions no longer offset each other and both legs can lose. Acting on a false match is therefore not a risk-free trade; it is an uncovered bet.
Always check the confidence and match_method on both legs of an arbitrage opportunity. A semantic match with confidence near 0.70 is borderline — you may want to verify the market titles manually before committing capital.
Strict matching thresholds are a risk control as much as a data-quality feature. Predexy deliberately errs on the side of rejecting uncertain matches rather than passing them downstream to the arbitrage scanner.
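The check on both legs can be automated before any order is placed. The sketch below is one possible approach, not part of Predexy: the function name is hypothetical, each leg is assumed to be a dict holding the confidence and match_method fields from the markets[] array, and the 0.85 default is an arbitrary caller-chosen cutoff rather than a documented recommendation.

```python
def legs_needing_manual_check(
    legs: list[dict], min_confidence: float = 0.85
) -> list[dict]:
    """Return legs that were matched semantically with confidence below the cutoff.

    Legs matched by the "manual" or "exact" methods are not flagged, on the
    assumption that a human or an identical ID already confirmed them.
    """
    return [
        leg
        for leg in legs
        if leg["match_method"] == "semantic" and leg["confidence"] < min_confidence
    ]


# Invented example legs, for illustration only.
opportunity = [
    {"platform": "A", "match_method": "exact", "confidence": 1.0},
    {"platform": "B", "match_method": "semantic", "confidence": 0.72},
]

flagged = legs_needing_manual_check(opportunity)
if flagged:
    print("Verify these market titles by hand before trading:", flagged)
```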