This question came up recently in the Mastering Laravel community: What is the right collation and character set to pick for typical Laravel applications with MySQL?
The answer has changed over time, and it's a good question to revisit. And as the question frames it, this is purely from the perspective of MySQL. Other databases have their own defaults and considerations.
First, some definitions:
- A character set defines the set of characters you can use (
a
,b
,🤩
, and so on) and how they're encoded when stored and retrieved. - A collation defines how those characters in the character set are compared and sorted, like when using a
WHERE
clause orORDER BY
in a query.
If you're working with a modern version of MySQL (8.0 or newer), I'd recommend sticking with the default utf8mb4
character set and utf8mb4_0900_ai_ci
collation.
In the past, there were other character sets like utf8mb3
(also aliased as just utf8
) that are no longer used and will likely be removed in the future.
If you go back even further (5.7 and before), latin1
was the default, which is why you'd get weird artifacts when storing emojis or other non-Latin characters.
The utf8mb4
character set supports the full range of Unicode characters, so it's a reasonable default to stick with.
But what about the collation?
First, what does utf8mb4_0900_ai_ci
mean?
- All collations belong to a specific character set, so the
utf8mb4_
prefix is just saying that this collation is for theutf8mb4
character set. -
0900
is the version of the Unicode Collation Algorithm (UCA) that this collation is based on. This is getting too nerdy, turn back now. -
ai
stands for accent-insensitive, meaning that it treatsà
orä
the same asa
. -
ci
stands for case-insensitive, meaning that it treatsB
the same asb
.
So putting that all together, utf8mb4_0900_ai_ci
is a good default collation for most applications.
If you want to see what other collations are available for utf8mb4
, run this query:
SHOW COLLATION WHERE Charset = 'utf8mb4';
The list is big! But now that you know what the different sections of the name mean, you can probably make better sense of it. If you want to go deeper, the MySQL docs have you covered.
You may have valid reasons to pick a different collation for your application, but if not, stick with the defaults and save yourself some headaches.
Here to help,
Joel
P.S. This was just one of many interesting discussions from the Mastering Laravel community this week. You should check it out!