Why do supplier scorecards fail?

The main reason: organisations measure what is easy to collect, not what is important to the relationship. On-time delivery and invoice accuracy are good data points but they tell you nothing about the supplier's reliability during a disruption, their willingness to absorb cost pressure, or their responsiveness when something goes wrong. If your scorecard only has metrics your ERP exports automatically, it is not a management tool — it is a reporting exercise.

How many KPIs should a supplier scorecard have?

Between 8 and 12 active KPIs per supplier is the practical range for suppliers that matter strategically. More than that and nobody reviews the scores seriously. Fewer than 5 and you miss dimensions that matter, although two or three KPIs are enough for a purely transactional supplier. The key is having one primary-signal KPI in each category (quality, delivery, commercial, relationship, compliance) and 1-2 supporting metrics per category.

Can you build a supplier scorecard in Excel?

Yes. A well-built Excel or Google Sheets scorecard can outperform many commercial supplier management tools for mid-size procurement teams. The logic is straightforward: a scoring tab where raw data gets converted to 1-5 ratings per KPI, a weight tab where you assign category importance, and a dashboard tab that multiplies scores by weights and shows composite results. The challenge is not the tool. It is the process of collecting the inputs consistently.

Supplier Scoreboard Template & KPI Guide

Here is a situation you have probably seen. The company decides it needs better supplier performance visibility. Someone builds a scorecard. It tracks on-time delivery, invoice accuracy, maybe quality rejects. It runs for two quarters. Then the reviews stop happening because nobody has time, the data is always late, and the scores do not match what people actually experience with those suppliers.

The scorecard gets shelved. Or it keeps running as a reporting ritual that everyone ignores except during audits.

The scorecard was not the problem. The approach was. This guide covers how to build one that does not end up that way — without hiring a consultant, without expensive software, and without spending six months on a framework that never gets used.

Why 95% of supplier scorecards end up unused

The most common mistake is building a scorecard around data that is easy to get, not data that is useful to act on. ERP systems export on-time delivery percentages and invoice error rates automatically. So those become the KPIs. The scorecard fills with numbers that are statistically clean but strategically empty.

On-time delivery at 96% sounds good. But it says nothing about whether the supplier communicated proactively when they had a production issue last month, whether they absorbed a 3% material cost increase without pushing it straight to you, or whether they are investing in capacity that matters for your category over the next two years.

The second mistake is too many KPIs. A scorecard with 25 metrics per supplier is not a management tool — it is a spreadsheet that takes two hours to update every quarter and produces a composite score that nobody trusts because the inputs are inconsistent.

The signal problem

If your scorecards are consistently telling you that suppliers are performing at 85-90% when you know operationally that three of them are causing you real pain, your metrics are wrong. Good KPIs produce scores that match what your team already knows — and occasionally reveal things that were invisible.

The third mistake is treating the scorecard as a procurement tool rather than a relationship tool. When suppliers find out they are being scored, they react in one of two ways: they either start gaming the metrics, or they disengage from the process entirely. The review meeting becomes a one-sided presentation of numbers instead of a conversation about what is actually working.

The framework: 5 categories, 8–12 active KPIs

The right number of KPIs depends on the strategic importance of the supplier. For a tier-1 strategic supplier, you want the full set. For a transactional supplier, two or three KPIs are enough.

The categories and what belongs in each:

Category	Default weight	Primary KPI	Supporting KPIs
Delivery	25%	On-time delivery rate	Lead time vs. quoted; advance notice of delays
Quality	25%	Defect / non-conformance rate	Corrective action response time; repeat issues
Commercial	20%	Price competitiveness vs. market	Cost transparency; invoice accuracy; payment compliance
Relationship	20%	Responsiveness (measured)	Proactive communication; willingness to collaborate
Compliance	10%	Documentation up to date	ESG/sustainability reporting; audit readiness

Adjust the weights for your context. In a category where you are buying regulated components for production, quality might be 40% and commercial might drop to 15%. For indirect services suppliers, relationship and responsiveness might carry more weight than delivery.

On the relationship category

This is the one most procurement teams skip because it feels subjective. But "responsiveness" is completely measurable: average email response time in hours, percentage of issues acknowledged within 24 hours, number of escalations that required a second contact. You do not need to be vague about it. You need to track it.
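The three measures named above can be computed from a plain issue log. The following is a minimal Python sketch, assuming a hypothetical log of `(raised_at, acknowledged_at, contacts_needed)` tuples; the field layout and the function name are illustrative, not something the template prescribes.

```python
from datetime import datetime

# Hypothetical issue log: (raised_at, acknowledged_at, contacts_needed)
issues = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 15, 0), 1),     # 6 h
    (datetime(2024, 1, 15, 10, 0), datetime(2024, 1, 17, 10, 0), 2),  # 48 h
    (datetime(2024, 2, 2, 8, 0), datetime(2024, 2, 2, 20, 0), 1),     # 12 h
    (datetime(2024, 2, 20, 14, 0), datetime(2024, 2, 21, 8, 0), 1),   # 18 h
]


def responsiveness_metrics(log):
    """Return the three responsiveness measures for a non-empty issue log."""
    hours = [(ack - raised).total_seconds() / 3600 for raised, ack, _ in log]
    return {
        "avg_response_hours": sum(hours) / len(hours),
        "pct_ack_within_24h": 100 * sum(h <= 24 for h in hours) / len(hours),
        "second_contact_count": sum(1 for _, _, contacts in log if contacts > 1),
    }


print(responsiveness_metrics(issues))
```

With the sample log this gives an average of 21 hours, 75% acknowledged within 24 hours, and one issue that needed a second contact.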

The scoring scale that actually works

Use a 1–5 scale with pre-defined anchors per KPI. The anchors are what make the difference between a score that means something and a score that reflects whoever entered the data that day.

1 Unacceptable
2 Below expectations
3 Meets requirements
4 Exceeds
5 Benchmark

For on-time delivery, the anchors might be: 1 = below 70%, 2 = 70–84%, 3 = 85–94%, 4 = 95–98%, 5 = 99%+. For responsiveness: 1 = consistently unresponsive or requires multiple follow-ups, 3 = responds within 24-48h, 5 = same-day response with substantive answer.

Write the anchors before you start scoring. If you define them after, you will unconsciously write them to match the scores you already gave.
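Anchors can also be kept as data rather than buried in logic, which makes them easier to review before scoring starts. This is a minimal Python sketch, assuming the on-time delivery thresholds given above; `OTD_ANCHORS` and `score_from_anchors` are illustrative names.

```python
from bisect import bisect_right

# Lower bounds for scores 2, 3, 4 and 5, as fractions of deliveries on time.
# Mirrors the anchors in the text: below 70% = 1, 70-84% = 2, 85-94% = 3,
# 95-98% = 4, 99%+ = 5.
OTD_ANCHORS = [0.70, 0.85, 0.95, 0.99]


def score_from_anchors(value: float, anchors: list[float]) -> int:
    """Return a 1-5 score: 1 plus the number of lower bounds the value reaches."""
    return 1 + bisect_right(anchors, value)


print(score_from_anchors(0.96, OTD_ANCHORS))  # 4
print(score_from_anchors(0.70, OTD_ANCHORS))  # 2
```

The same function works for any quantitative KPI where higher is better. Only the list of bounds changes.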

Building it in Excel or Google Sheets

Three tabs. That is all you need.

Tab 1: Scoring

One row per KPI. Columns: KPI name, Category, Weight %, Raw input, Score (1-5), Weighted score. Raw input is the actual number or text observation. Score is either calculated automatically (for quantitative KPIs) or entered manually (for qualitative ones). Weighted score = Score × Weight.

// On-time delivery score formula (auto-calculated from raw %)
=IF(D4<0.70, 1, IF(D4<0.85, 2, IF(D4<0.95, 3, IF(D4<0.99, 4, 5))))

Tab 2: Weights configuration

Category name, Total weight %, list of KPIs in this category. The category weights on this tab should sum to 100%. Keeping it separate means you can adjust category weights without touching the scoring logic. When your category strategy shifts (for instance, sustainability becomes a bigger factor), you change this tab only.
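The 100% rule is easy to break when only one or two weights are adjusted. Here is a small Python sketch of the check, using the default weights from the table above; the function name is hypothetical, and weights are held as whole percentages to avoid rounding noise.

```python
DEFAULT_WEIGHTS = {
    "Delivery": 25,
    "Quality": 25,
    "Commercial": 20,
    "Relationship": 20,
    "Compliance": 10,
}


def validate_weights(weights: dict[str, int]) -> None:
    """Raise ValueError unless the category weights sum to exactly 100%."""
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"Category weights sum to {total}%, expected 100%")


validate_weights(DEFAULT_WEIGHTS)  # passes silently

# Raising quality to 40% and dropping commercial to 15% without touching
# the other categories leaves the total at 110%, so the check fails.
regulated = {**DEFAULT_WEIGHTS, "Quality": 40, "Commercial": 15}
try:
    validate_weights(regulated)
except ValueError as err:
    print(err)
```

In a spreadsheet the equivalent is a single cell that sums the weight column and turns red when the total is not 100%.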

Tab 3: Dashboard

One row per supplier. Columns: Supplier name, composite score (sum of all weighted scores), score per category, trend vs. previous quarter, RAG status. The RAG thresholds: below 2.5 = Red, 2.5 to below 3.5 = Amber, 3.5 and above = Green. These are starting points. Calibrate to your portfolio after the first two quarters.

// Composite score: sum all weighted KPI scores for this supplier
// F column = weighted scores
=SUMPRODUCT(Scoring!F4:F15)

// RAG status
=IF(B4<2.5, "Red", IF(B4<3.5, "Amber", "Green"))
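To make the arithmetic concrete, here is a worked example in Python. It is a sketch under simplifying assumptions: one score per category rather than per KPI, the default category weights, and made-up scores for a fictional supplier.

```python
# Category weights as whole percentages (the defaults from the framework).
WEIGHTS = {"Delivery": 25, "Quality": 25, "Commercial": 20,
           "Relationship": 20, "Compliance": 10}


def composite_score(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted average on the 1-5 scale; weights are percentages summing to 100."""
    return sum(scores[cat] * pct for cat, pct in weights.items()) / 100


def rag_status(composite: float) -> str:
    """Same thresholds as the spreadsheet formula: <2.5 Red, <3.5 Amber, else Green."""
    if composite < 2.5:
        return "Red"
    if composite < 3.5:
        return "Amber"
    return "Green"


# Illustrative scores, not real data.
scores = {"Delivery": 4, "Quality": 3, "Commercial": 3,
          "Relationship": 2, "Compliance": 5}
total = composite_score(scores, WEIGHTS)
print(total, rag_status(total))  # 3.25 Amber
```

Note how a strong delivery score does not lift the supplier to Green here: the relationship score of 2 pulls the composite down, which is the behaviour the weighting is meant to produce.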

One file per supplier, or one file for all?

One master file with a tab per supplier gets unmanageable fast. Better structure: one master dashboard file that pulls from individual supplier files via IMPORTRANGE (Sheets) or Power Query (Excel). Each supplier file can be shared with the supplier directly — they see their own scores and can add context. You keep the dashboard.

The review meeting: partnership, not audit

This is where most supplier performance management actually fails, regardless of how good the scorecard is. The meeting format sets the tone for the entire relationship.

A bad review meeting looks like this: procurement presents the scores, the supplier spends the meeting explaining why the numbers are unfair or wrong, both sides leave with a vague action plan that nobody follows up on. The next review starts the same way.

A good review meeting structure:

1. Share the scorecard 48 hours before the meeting. The supplier should not be seeing their scores for the first time in the room. Give them time to prepare context.
2. Open with what went well. Not as a formality: genuinely cover one or two things that worked. If nothing went well, the problem is bigger than a review meeting can fix.
3. On low scores, ask before you explain. "We scored you a 2 on responsiveness this quarter. Before I share our data, I'd like to understand what you saw from your side." Suppliers often have context that changes the picture. Sometimes they do not, and that is also useful information.
4. One action item per problem, with an owner and a date. Not a list of six things the supplier promises to improve. One thing, owned by a named person, with a specific deadline.
5. Tell them what is at stake. If the score is dropping toward a threshold that triggers a sourcing review, say so. Vague consequences produce vague responses.

The best outcome of a review meeting is that the supplier leaves knowing exactly what they need to do and why it matters to your business — not that they feel judged.

Connecting it to AI: automated scoring from emails and documents

Once the framework is stable and you have two or three quarters of data, you can start reducing the manual input burden. This is where AI adds real value — not in building the scorecard, but in populating it.

The two highest-ROI automation targets:

Responsiveness scoring from email

You already have the data — it is in your inbox. A simple script (Power Automate, Google Apps Script, or direct API call) can scan emails from a given supplier domain, calculate average response times per month, flag threads where the supplier required a second follow-up, and write the results directly into the scoring tab. No manual input. The data is more accurate than memory-based ratings.
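The core of such a script is small once the messages are out of the mailbox. The sketch below is in Python and covers only the calculation, not mailbox access. It assumes each message has already been reduced to a `(thread_id, sender, sent_at)` tuple with `sender` set to `"us"` or `"supplier"`; that input shape and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def response_stats(messages):
    """Return (average response hours per month, threads needing a second follow-up)."""
    threads = defaultdict(list)
    for thread_id, sender, sent_at in messages:
        threads[thread_id].append((sent_at, sender))

    hours_by_month = defaultdict(list)
    flagged = set()
    for thread_id, msgs in threads.items():
        waiting_since = None  # time of our oldest unanswered message
        for sent_at, sender in sorted(msgs):
            if sender == "us":
                if waiting_since is None:
                    waiting_since = sent_at
                else:
                    flagged.add(thread_id)  # we had to chase before any reply
            elif waiting_since is not None:
                elapsed = (sent_at - waiting_since).total_seconds() / 3600
                hours_by_month[waiting_since.strftime("%Y-%m")].append(elapsed)
                waiting_since = None

    monthly_avg = {month: sum(h) / len(h) for month, h in hours_by_month.items()}
    return monthly_avg, flagged


# Illustrative messages, not real data.
messages = [
    ("T1", "us", datetime(2024, 3, 4, 9, 0)),
    ("T1", "supplier", datetime(2024, 3, 4, 13, 0)),
    ("T2", "us", datetime(2024, 3, 11, 10, 0)),
    ("T2", "us", datetime(2024, 3, 13, 10, 0)),
    ("T2", "supplier", datetime(2024, 3, 13, 16, 0)),
    ("T3", "us", datetime(2024, 4, 2, 8, 0)),
    ("T3", "supplier", datetime(2024, 4, 2, 18, 0)),
]
print(response_stats(messages))
```

Response time is measured from the first unanswered message, so a chased thread counts the full wait rather than only the time since the reminder. Hours are calendar hours; a working-hours version would need a business calendar.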

Invoice accuracy from AP data

If you have access to AP data export (even just a CSV from your ERP), a prompt like the following turns it into a score in seconds:

// Prompt for invoice accuracy scoring from AP export
You are a procurement analyst. I will give you invoice data for supplier [X] for Q[N] [YEAR].
Calculate:
- Total invoices processed
- Number with pricing discrepancies vs. PO
- Number with missing or incorrect references
- Number requiring manual intervention before payment
Return a brief summary and a score from 1-5 using these thresholds: 5 = 0% error rate, 4 = above 0% and under 2%, 3 = 2% to under 5%, 2 = 5% to 10%, 1 = above 10%.
Invoice data: [paste CSV rows]

For larger supplier bases, the same logic works as a batch process — one API call per supplier per quarter, outputs written back to the scorecard. The scoring criteria are defined by you, the anchor thresholds are yours, and the AI is doing arithmetic and pattern recognition — not making judgment calls.
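When the export is already structured, the same thresholds can also be applied without a model at all, which is a useful cross-check on the AI output. This is a Python sketch under stated assumptions: the column names (`price_mismatch`, `bad_reference`, `manual_fix`, each `0` or `1`) are hypothetical, an invoice counts as an error if any flag is set, and a rate exactly on a boundary takes the lower score.

```python
import csv
import io


def invoice_accuracy_score(csv_text: str) -> tuple[float, int]:
    """Return (error rate in %, 1-5 score) for an AP export given as CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("No invoices in export")
    flags = ("price_mismatch", "bad_reference", "manual_fix")
    errors = sum(1 for row in rows if any(row[f] == "1" for f in flags))
    rate = 100 * errors / len(rows)
    if rate == 0:
        score = 5
    elif rate < 2:
        score = 4
    elif rate < 5:
        score = 3
    elif rate <= 10:
        score = 2
    else:
        score = 1
    return rate, score


# Illustrative export: 25 invoices, one with a pricing discrepancy.
lines = ["invoice_id,price_mismatch,bad_reference,manual_fix"]
for i in range(1, 26):
    mismatch = 1 if i == 7 else 0
    lines.append(f"INV-{i:03d},{mismatch},0,0")
sample = "\n".join(lines)

print(invoice_accuracy_score(sample))  # (4.0, 3)
```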

Start with the manual version first

Do not try to automate the scoring before you have done two quarters manually. Manual scoring teaches you where the data is unreliable, which KPIs are actually informative, and where the human judgment is irreplaceable. Then you automate the parts that are mechanical. Automating a broken process just produces wrong answers faster.

What to do in the first 30 days

Pick three suppliers. Not your top 20 — three. One strategic, one that has been causing pain, one that has been performing well. Build the scorecard for those three manually, run one review meeting each, and see what you learn.

If the scores match your operational reality and the meetings produce at least one useful conversation, the framework is working. Then you scale. If the scores feel wrong or the meetings feel like a waste of time, fix the framework before adding more suppliers to it.

A scorecard that runs for three years on 10 suppliers is worth more than one that launches across 50 suppliers and collapses after two quarters because nobody has bandwidth to maintain it.

