logo
podcast Podcast
get help Get Unstuck

Which collation and character set should I pick?

The answer has changed over time

Joel Clermont
Joel Clermont
2024-12-06

This question came up recently in the Mastering Laravel community: What is the right collation and character set to pick for typical Laravel applications with MySQL?

The answer has changed over time, and it's a good question to revisit. And as the question frames it, this is purely from the perspective of MySQL. Other databases have their own defaults and considerations.

First, some definitions:

  • A character set defines the set of characters you can use (a, b, 🤩, and so on) and how they're encoded when stored and retrieved.
  • A collation defines how those characters in the character set are compared and sorted, like when using a WHERE clause or ORDER BY in a query.

If you're working with a modern version of MySQL (8.0 or newer), I'd recommend sticking with the default utf8mb4 character set and utf8mb4_0900_ai_ci collation.

In the past, there were other character sets like utf8mb3 (also aliased as just utf8) that are no longer used and will likely be removed in the future. If you go back even further (5.7 and before), latin1 was the default, which is why you'd get weird artifacts when storing emojis or other non-Latin characters.

The utf8mb4 character set supports the full range of Unicode characters, so it's a reasonable default to stick with. But what about the collation?

First, what does utf8mb4_0900_ai_ci mean?

  • All collations belong to a specific character set, so the utf8mb4_ prefix is just saying that this collation is for the utf8mb4 character set.
  • 0900 is the version of the Unicode Collation Algorithm (UCA) that this collation is based on. This is getting too nerdy, turn back now.
  • ai stands for accent-insensitive, meaning that it treats à or ä the same as a.
  • ci stands for case-insensitive, meaning that it treats B the same as b.

So putting that all together, utf8mb4_0900_ai_ci is a good default collation for most applications.

If you want to see what other collations are available for utf8mb4, run this query:

SHOW COLLATION WHERE Charset = 'utf8mb4';

The list is big! But now that you know what the different sections of the name mean, you can probably make better sense of it. If you want to go deeper, the MySQL docs have you covered.

You may have valid reasons to pick a different collation for your application, but if not, stick with the defaults and save yourself some headaches.

Here to help,

Joel

P.S. This was just one of many interesting discussions from the Mastering Laravel community this week. You should check it out!

Toss a coin in the jar if you found this helpful.
Want a tip like this in your inbox every weekday? Sign up below 👇🏼
email
No spam. Only real-world advice you can use.