How AI Classifies HS Codes: The Technology Behind Automated Tariff Classification

AI classification tools are everywhere right now, and every software vendor wants to sell you one. Some of them are actually useful. Some of them will get you a CBSA compliance review and a very bad Tuesday. Knowing the difference means understanding what's actually happening under the hood — not the marketing version, the real version.

This isn't a pitch for any particular tool. It's an explanation of how these systems work, where they're genuinely good, and where they'll confidently give you the wrong answer with a smile.

One thing worth flagging before we get into the technology: CBSA updated its trade compliance verification priorities in 2026 to specifically target goods subject to retaliatory tariffs. If your AI tool is classifying goods anywhere near those categories, the stakes for getting it right just went up considerably.

What "AI Classification" Actually Means

When someone says their software uses AI to classify HS codes, they usually mean one of three things — and they're not the same thing.

Rules-based automation: Not really AI. It's a lookup table with logic. "If product description contains 'cotton t-shirt', assign 6109.10." Fast, consistent, brittle. One unusual product breaks it.
Machine learning classification: A model trained on historical data — usually millions of past customs entries — that predicts the most likely HS code based on your product description. This is what most commercial tools actually use.
Large language model (LLM) integration: Newer systems that use GPT-style models to read product descriptions, technical specs, and even documents, then reason through the tariff schedule the way a classifier would. More flexible, but also more capable of hallucinating a plausible-sounding wrong answer.

Most tools on the market right now are some combination of the second and third. A few are still just the first one with a chatbot bolted on.
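To make the first category concrete, here is a minimal sketch of a rules-based lookup. The rule table and the function name `classify_by_rules` are hypothetical, invented for illustration; the single rule is the cotton t-shirt example from above.

```python
from typing import Optional

# Toy rule table: (keywords that must all appear, HS code).
# Illustrative only; a real rule set would be far larger.
RULES = [
    (("cotton", "t-shirt"), "6109.10"),
]

def classify_by_rules(description: str) -> Optional[str]:
    """Return the code of the first rule whose keywords all appear."""
    text = description.lower()
    for keywords, code in RULES:
        if all(keyword in text for keyword in keywords):
            return code
    return None  # no rule matched

print(classify_by_rules("Men's cotton T-shirt, crew neck"))   # 6109.10
print(classify_by_rules("Organic cotton tee, short sleeve"))  # None
```

The second call shows the brittleness: the same kind of garment, worded slightly differently, matches nothing.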

How the Machine Learning Approach Works

The core of most AI classification systems is a text classification model. Here's the basic process:

You feed it a product description — say, "polyester mesh back office chair with adjustable lumbar support and chrome base."
The model converts that text into a numerical representation (called an embedding) that captures semantic meaning.
It compares that representation against patterns learned from training data — typically millions of previously classified goods.
It returns the HS code (or a ranked list of codes) that most closely matches those patterns, along with a confidence score.

That chair probably lands at 9401.39: swivel seats with variable height adjustment, other than of wood. (HS 2022 split the old 9401.30 into 9401.31 and 9401.39.) A well-trained model gets that right almost every time. The problems start when your product doesn't look like anything in the training data, or when the classification depends on technical details that aren't obvious from a description.
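The four steps above can be sketched in a few lines. This is a toy under stated assumptions, not how any vendor's model works: word counts stand in for a learned embedding, four invented rows stand in for millions of entries, and the names `embed`, `cosine`, and `classify` are hypothetical. The codes attached to the rows are illustrative.

```python
import math
from collections import Counter

# Toy "training data": (description, HS code). Illustrative rows only.
TRAINING = [
    ("mesh back office chair with adjustable lumbar support", "9401.39"),
    ("swivel office chair with chrome base and armrests", "9401.39"),
    ("men's cotton t-shirt crew neck short sleeve", "6109.10"),
    ("women's knitted cotton t-shirt", "6109.10"),
]

def embed(text: str) -> Counter:
    """Stand-in embedding: word counts instead of a learned dense vector."""
    return Counter(text.lower().replace(",", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(description: str) -> list:
    """Rank candidate codes by their best similarity to any training row."""
    query = embed(description)
    best = {}
    for text, code in TRAINING:
        score = cosine(query, embed(text))
        best[code] = max(best.get(code, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

ranked = classify(
    "polyester mesh back office chair with adjustable lumbar support and chrome base"
)
print(ranked[0][0])  # 9401.39
```

Note that the score here is a raw similarity, not a calibrated probability. A description that resembles nothing in `TRAINING` still gets a ranked list back, which is the failure mode described above.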

A furniture importer we worked with had a line of "ergonomic kneeling stools." The AI kept classifying them as 9401.39, the residual subheading for swivel seats with variable height adjustment. Technically defensible, but the correct subheading was actually 9401.80, the "other seats" provision that covers seats of a different construction. The difference was $0 in duty in that case, but on a medical device or a textile, that kind of miss costs real money.

The Training Data Problem

Every AI model is only as good as what it learned from. For customs classification, that means the training data is everything.

The best systems are trained on actual customs entry data from multiple countries — US ACE data, CBSA historical filings, EU customs declarations, and so on. That gives the model exposure to how real classifiers handle real goods. Some vendors also train on CBSA advance rulings, WCO classification opinions, and national tariff schedules.

The problem is that training data reflects past decisions — including wrong ones. If a classification was consistently applied incorrectly across thousands of entries before CBSA issued a D-memo correction, the model learned the wrong answer. Garbage in, garbage out, just at scale.

There's also a recency problem. The HS schedule was updated in 2022 (HS 2022). Any model trained heavily on pre-2022 data has gaps. Chapter 85 in particular changed significantly: a new heading for flat panel display modules (85.24), new subheadings for photovoltaic cells under 85.41, and a new heading for electrical and electronic waste (85.49). If your vendor can't tell you when their training data was last updated and how they handle schedule changes, that's a red flag.

The retaliatory tariff situation makes this even more pointed right now. Canada's surtax remission orders have been amended multiple times in 2025 and 2026 — CBSA extended the surtax remission for two additional months as recently as this spring. A model that isn't current on those changes could be classifying goods into or out of surtax exposure incorrectly. Ask your vendor specifically how they handle regulatory changes between major HS updates.

Where LLMs Change the Game

The newer generation of tools uses large language models — think GPT-4 class or similar — either on their own or layered on top of a traditional classification model. This matters for a few reasons.

LLMs can read and reason, not just pattern-match. You can give them a full product specification sheet — materials, dimensions, function, end use — and they'll work through the General Rules of Interpretation the way a trained classifier would. They can handle products they've never seen before because they understand language and logic, not just statistical patterns.

They're also better at explaining their reasoning. A good LLM-based tool will tell you: "Classified under 8471.30 because this is a portable automatic data processing machine weighing not more than 10 kg, per GRI 1 and Note 5(A) to Chapter 84." That's auditable. You can check the logic.

The downside is hallucination. LLMs can generate confident, well-structured, completely wrong answers. I've seen demo outputs where the model cited a tariff note that doesn't exist, or applied a rule from the US HTS that has no equivalent in the Canadian tariff. The reasoning sounded perfect. The answer was wrong.

The better vendors mitigate this by grounding the LLM in the actual tariff schedule text — forcing it to cite real notes and headings rather than generating from memory. That's called retrieval-augmented generation (RAG), and it's the difference between a useful tool and a liability.
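One small piece of that grounding can be sketched as a post-check: every legal note the model cites must appear in the schedule text that was retrieved for it. This covers only the verification half of RAG, not retrieval itself. The store `RETRIEVED_NOTES`, its placeholder contents, and the function `unsupported_citations` are all hypothetical.

```python
import re

# Hypothetical store of notes retrieved for the candidate chapters, keyed by
# a normalised citation. The value is a placeholder, not real tariff wording.
RETRIEVED_NOTES = {
    "note 5(a) to chapter 84": "<text of the note as retrieved from the schedule>",
}

# Matches citations shaped like "Note 5(A) to Chapter 84".
CITATION = re.compile(
    r"note\s+\d+(?:\([a-z]\))?\s+to\s+chapter\s+\d+", re.IGNORECASE
)

def unsupported_citations(llm_answer: str) -> list:
    """Return citations in the answer that are absent from the retrieved text."""
    cited = [" ".join(m.lower().split()) for m in CITATION.findall(llm_answer)]
    return [c for c in cited if c not in RETRIEVED_NOTES]

good = "Classified under 8471.30 per GRI 1 and Note 5(A) to Chapter 84."
bad = "Classified under 8471.30 per Note 12(C) to Chapter 84."
print(unsupported_citations(good))  # []
print(unsupported_citations(bad))   # ['note 12(c) to chapter 84']
```

An answer with a non-empty result would be rejected or sent to a person rather than shown as-is.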

Vision-Based Document Processing

Here's where things get genuinely interesting for day-to-day operations.

The newest systems don't just accept text descriptions. They can read documents — product spec sheets, invoices, technical drawings, even images of the actual goods. This is called multimodal AI, and it matters because the information you need to classify correctly is often locked inside a PDF that your supplier sent you.

A good vision-based system can:

Extract the relevant product details from a commercial invoice automatically
Read a technical datasheet and pull out material composition, function, and end-use
Look at a product photo and identify physical characteristics that affect classification
Cross-reference multiple documents to resolve conflicting information

Practically speaking, this cuts the manual data entry out of the classification workflow. Instead of your team reading a 12-page spec sheet and typing a description into a classification tool, the tool reads the spec sheet itself.

The accuracy on document extraction varies a lot by vendor. Tables are hard. Handwritten notes are hard. Poorly scanned PDFs are hard. Before you commit to any tool, test it on the actual documents your suppliers send — not the clean sample PDFs in the demo.

Confidence Scores and When to Trust Them

Most AI classification tools return a confidence score alongside the HS code. "9401.39, 94% confidence." This sounds reassuring. It's more complicated than that.

Confidence scores in machine learning models measure how certain the model is relative to its own training — not how likely the answer is to be correct in absolute terms. A model can be 97% confident and still be wrong if the product type is underrepresented in its training data.

A better way to think about confidence scores: they're useful for triaging your review queue. High confidence on a straightforward product — a cotton t-shirt, a steel bolt, a glass bottle — probably doesn't need a second look. Low confidence on anything, or high confidence on a complex product, warrants human review.

Set your internal threshold based on your risk profile. If you're importing goods with duty rates under 3% and no trade remedy exposure, you can probably accept higher automation rates. If you're importing goods subject to SIMA duties, anti-dumping measures, retaliatory surtaxes, or controlled goods regulations, you want a human eye on every classification regardless of what the AI says.
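The triage logic described above can be written down as a short routing function. This is a sketch, not a recommendation: the `Prediction` fields, the `route` function, and the 0.90 default threshold are assumptions, and the threshold in particular should come from your own risk profile.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    hs_code: str
    confidence: float            # model-reported score, 0.0 to 1.0
    complex_product: bool        # e.g. electronics, chemicals, composite goods
    trade_remedy_exposure: bool  # SIMA, surtax, controlled goods, and so on

def route(pred: Prediction, threshold: float = 0.90) -> str:
    """Decide whether a prediction is auto-accepted or goes to a person."""
    if pred.trade_remedy_exposure:
        return "human_review"  # always reviewed, whatever the score says
    if pred.complex_product:
        return "human_review"  # high confidence on complex goods isn't enough
    if pred.confidence < threshold:
        return "human_review"
    return "auto_accept"

print(route(Prediction("6109.10", 0.97, False, False)))  # auto_accept
print(route(Prediction("8471.30", 0.97, True, False)))   # human_review
```

Putting the trade remedy check first means no confidence score, however high, can bypass review for those goods.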

Accuracy Benchmarks: What's Actually Achievable

Vendors love to quote accuracy numbers. "98% accurate at the 6-digit level." That sounds great until you understand what it means.

Six-digit accuracy is the easy version. The HS 2022 schedule has roughly 5,600 6-digit subheadings. Most products cluster into a small fraction of those. A model trained on typical commercial goods will hit 95%+ at 6 digits without breaking a sweat.

Eight-digit accuracy — which is what you actually need for a Canadian B3 — is harder. The Canadian Customs Tariff has roughly 8,000 8-digit tariff items. More granularity means more ways to be wrong.

Ten-digit accuracy, where statistical reporting codes come in, is harder still.

Honest benchmarks from independent testing (not vendor marketing) put current AI tools at roughly:

6-digit level: 90–96% on common commercial goods
8-digit level: 80–90% on common commercial goods
8-digit level on complex or technical goods: 60–75%

That 60–75% range for complex goods is where most of the real-world problems live. Electronics, chemicals, medical devices, textiles with specific fibre compositions — these are the classifications that require judgment, and judgment is still where humans outperform the models.
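If you want to check a vendor's numbers against your own entries, accuracy at each level is just a prefix comparison. A minimal sketch: the helper names are hypothetical, and the result pairs, including their eight-digit endings, are invented for illustration.

```python
def digits(code: str, n: int) -> str:
    """Keep the first n digits of a code, ignoring the dots."""
    return code.replace(".", "")[:n]

def accuracy_at(pairs: list, n: int) -> float:
    """Share of (predicted, verified) pairs that agree on the first n digits."""
    hits = sum(digits(pred, n) == digits(truth, n) for pred, truth in pairs)
    return hits / len(pairs)

# Made-up (predicted, broker-verified) pairs, for illustration only.
results = [
    ("9401.39.10", "9401.39.10"),
    ("9401.39.10", "9401.39.90"),  # right at 6 digits, wrong at 8
    ("6109.10.00", "6109.10.00"),
    ("8471.30.00", "8471.41.00"),  # wrong at both levels
]

print(accuracy_at(results, 6))  # 0.75
print(accuracy_at(results, 8))  # 0.5
```

The same four predictions score 75% at six digits and 50% at eight, which is why a headline accuracy figure means little until you know the level it was measured at.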

How AI Tools Handle CBSA Advance Rulings

This is an underused feature in most tools, and it's one of the most valuable ones.

CBSA publishes advance rulings — official tariff classification decisions — through the CERS (Customs Electronic Ruling System). These are binding determinations. If CBSA has already ruled on a product substantially similar to yours, that ruling is your best evidence of the correct classification.

Better AI classification systems are trained on or can search against the CBSA advance ruling database. When you classify a product, the tool can surface relevant rulings and show you how CBSA actually decided similar cases. That's not just useful for accuracy — it's useful for your audit defence.

If you're ever in a CBSA compliance review and you can show that your classification matched an existing advance ruling for a substantially similar product, you're in a much better position than if you just say "the AI told us so."

If your AI tool doesn't reference advance rulings at all, ask why. Either they don't have access to the database, or they haven't built that feature. Both are worth knowing.

The Retaliatory Tariff Layer Nobody Talks About

This is new enough that most AI classification tools haven't caught up to it yet.

Since 2025, Canada has been running a retaliatory surtax regime on a significant range of US-origin goods. The surtax remission orders have been amended repeatedly — CBSA has issued guidance specifically to narrow the scope of relief, and compliance verification priorities have been updated to target goods in these categories. As of June 2026, this is still an active, moving situation.

Here's the problem for AI tools: getting the HS code right is only step one. Whether that code is subject to the surtax, eligible for remission, or caught by CBSA's updated verification priorities is a separate analysis. Most classification tools don't do that second step automatically. They'll give you the code and stop there.

If you're importing anything from the US — or anything that might be US-origin even if it's shipped from elsewhere — you need a workflow that checks the classification output against the current surtax schedule. Your broker should have that. If they don't, that's a conversation worth having.
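That second step can be as simple as a lookup keyed on tariff item and origin, run after classification. The sketch below uses a placeholder schedule: the entries are not real surtax listings, and `SurtaxEntry`, `SURTAX_SCHEDULE`, and `surtax_status` are hypothetical names. In practice the table would be loaded from the current orders and refreshed on every amendment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SurtaxEntry:
    tariff_item: str
    origin: str
    remission_available: bool

# Placeholder data only: these tariff items are NOT real surtax listings.
SURTAX_SCHEDULE = {
    ("9999.00.01", "US"): SurtaxEntry("9999.00.01", "US", False),
    ("9999.00.02", "US"): SurtaxEntry("9999.00.02", "US", True),
}

def surtax_status(tariff_item: str, origin: str) -> str:
    """Check a classified item against the surtax table for its origin."""
    entry: Optional[SurtaxEntry] = SURTAX_SCHEDULE.get((tariff_item, origin))
    if entry is None:
        return "not_listed"
    if entry.remission_available:
        return "surtax_check_remission"
    return "surtax_applies"

print(surtax_status("9999.00.01", "US"))  # surtax_applies
print(surtax_status("9999.00.02", "US"))  # surtax_check_remission
print(surtax_status("9999.00.01", "MX"))  # not_listed
```

Keying on origin as well as tariff item matters here, because the same code can be exposed or not depending on where the goods originate.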

What AI Can't Do (Yet)

Be honest with yourself about the limits. AI classification tools are genuinely useful — I'm not dismissing them. But there are things they consistently struggle with:

End-use provisions: Some tariff items depend on what the goods will actually be used for. The AI doesn't know your customer's factory floor. You do.
Composite goods and sets: GRI 3 is hard. When you have a product made of multiple materials or a set of items packed together, the classification rules get complicated fast. Most AI tools handle simple cases fine and complex cases poorly.
New product categories: The HS schedule is updated every five years. Novel products — new battery chemistries, new materials, new technologies — often don't fit neatly anywhere. The AI will classify them somewhere, but "somewhere" might not be right.
Trade remedy exposure: Knowing the HS code is one thing. Knowing whether that code is subject to a SIMA finding, a safeguard, a retaliatory surtax, or a remission order requires a separate layer of analysis that most classification tools don't do automatically.

The practical answer: use AI for the first pass, especially on high-volume, repetitive classifications. Use a human — your broker, your in-house trade team, or a classification specialist — for anything complex, high-value, or high-risk.

Practical Steps for Evaluating an AI Classification Tool

If you're looking at tools right now, here's what to actually test:

Run your own products through it. Not the vendor's demo products — your actual SKUs, with your actual supplier descriptions. See what comes out.
Test the hard ones. Pick five products you know were difficult to classify. See if the tool gets them right and whether the reasoning makes sense.
Ask about training data recency. When was the model last updated? Does it reflect HS 2022? How do they handle tariff schedule amendments and mid-cycle regulatory changes like surtax orders?
Ask about advance ruling integration. Can it surface relevant CBSA rulings? Can it search WCO opinions?
Check the explainability. Does it tell you why it chose that code? Can you follow the logic back to the tariff schedule?
Understand the confidence threshold controls. Can you set your own review thresholds? Can you route low-confidence items to a human queue automatically?
Ask about trade remedy flagging. Does the tool flag when a classified code is subject to SIMA, surtax, or remission orders? If not, how does your workflow handle that gap?

Frequently Asked Questions

Is AI classification legally binding in Canada?

No. Only a CBSA advance ruling is binding. AI classification is a tool to help you arrive at the correct answer — the legal responsibility for what goes on your B3 is yours, or your broker's. If CBSA audits you and your classification is wrong, "the AI said so" is not a defence. It's actually worse than no defence, because it suggests you didn't apply professional judgment.

How accurate is AI classification compared to a trained customs broker?

On straightforward, common goods — textiles, basic hardware, standard consumer products — a well-trained AI tool is competitive with a junior classifier and faster than anyone. On complex goods, technical products, or anything requiring interpretation of tariff notes and GRI rules, an experienced broker still outperforms the current generation of tools. The honest answer is: it depends heavily on the product category and the quality of the tool.

Can AI tools handle the Canadian tariff specifically, or are they mostly built for the US HTS?

Most of the major tools were built primarily on US HTS data because the US market is larger and the ACE database is more accessible. Some have added Canadian tariff coverage, but the depth of training data for Canadian-specific classifications varies a lot, and so does coverage of Canadian tariff treatments such as the General Preferential Tariff (GPT), the Least Developed Country Tariff (LDCT), and CETA preferential rates. Ask your vendor specifically about Canadian coverage and test it on goods where the Canadian and US classifications diverge.

What happens if I rely on AI classification and CBSA disagrees?

You'll owe the difference in duties plus interest, and potentially a penalty under the Administrative Monetary Penalty System (AMPS). Penalties under AMPS for misclassification start at $150 for a first offence and can reach $25,000 per occurrence for repeated or flagrant violations. The duty recovery itself is often the bigger number, because CBSA can reassess entries going back four years. One importer we know had a systematic misclassification on industrial fasteners that ran for three years. The recovery was over $180,000 before penalties.

Should I use AI classification for goods subject to SIMA duties or anti-dumping measures?

Use it as a starting point, but never as the final answer. SIMA findings are product-specific and often turn on technical details — steel grade, dimensions, end use — that require careful analysis. A wrong classification on SIMA-subject goods isn't just a duty underpayment; it can trigger a CBSA investigation. The same applies to goods caught by retaliatory surtaxes right now — CBSA has explicitly updated its verification priorities to focus there. Get a human expert involved on anything where trade remedies are in play.

My supplier gives me terrible product descriptions. Can AI still classify from those?

Sometimes, but not reliably. "Electronic component" or "plastic part" isn't enough for any classification system — human or AI — to do a good job. The better tools will flag when the input description is too vague to classify with confidence, which is actually useful. The worse ones will confidently assign a code anyway. If your suppliers consistently give you vague descriptions, the real fix is upstream — work with them on what information you actually need on the commercial invoice. Your broker can help you build a template.

With all the tariff changes happening right now, how do I know if my AI tool is current?

Ask directly. Get a written answer. The question isn't just whether they've incorporated HS 2022 — it's whether they're tracking the Canadian surtax orders, the remission amendments, and CBSA's updated verification priorities. Those have been changing on a near-monthly basis through 2025 and into 2026. A tool that was accurate in January might be giving you stale guidance in June. If your vendor can't tell you their update cadence for regulatory changes, that's your answer.