I have a lot of thoughts on validation. Recently, someone asked me about how Laravel handles validating string length when multibyte characters are used, and I thought I'd share my answer here with everyone.
First, a little context. What is a multibyte character?
The original ASCII character set, and the 8-bit extensions that followed it, could represent each character in a single byte of data, but that only allowed 256 possible characters (just 128 for ASCII itself).
That got us the Roman alphabet, numbers, some symbols, and a handful of accented characters, but written language is more complex than that.
Then along came Unicode and the UTF-8 encoding, which can take up to four bytes to represent a single character. For example, é takes 2 bytes, € takes 3 bytes, and an emoji like 😀 takes 4 bytes.
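To see the difference between bytes and characters for yourself, here's a minimal sketch in plain PHP. It assumes the mbstring extension is installed, and the sample characters are just ones I picked for illustration.

```php
<?php
// strlen() counts bytes; mb_strlen() counts characters (code points).
// The "\u{...}" escapes (PHP 7+) make the exact code points explicit.
$samples = [
    'A'       => "A",          // U+0041
    'e-acute' => "\u{00E9}",   // é
    'euro'    => "\u{20AC}",   // €
    'emoji'   => "\u{1F600}",  // 😀
];

$byteCounts = [];
foreach ($samples as $label => $char) {
    $byteCounts[$label] = strlen($char);
    printf(
        "%s: %d byte(s), %d character(s)\n",
        $label,
        strlen($char),
        mb_strlen($char, 'UTF-8')
    );
}
// $byteCounts is ['A' => 1, 'e-acute' => 2, 'euro' => 3, 'emoji' => 4]
```

Every sample is one character, but the byte count climbs from 1 to 4.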
With that background out of the way, let's get back to the Laravel question.
Laravel's max rule handles strings, numbers, arrays, and files differently.
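Here's a rough sketch of what that looks like in a rules array. The field names are hypothetical, and the comments reflect my reading of how the max rule treats each type.

```php
<?php
// Hypothetical fields: the same max rule means something different for each type.
$rules = [
    'name'   => ['string', 'max:255'],   // at most 255 characters
    'age'    => ['integer', 'max:120'],  // a value no greater than 120
    'tags'   => ['array', 'max:5'],      // at most 5 items
    'avatar' => ['file', 'max:2048'],    // at most 2048 kilobytes
];
```

Note that a number is only compared by value when the field also has a numeric or integer rule; without one, the input is measured as a string.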
So when we say max:255 on a piece of string data, what precisely does it mean?
It means 255 characters, not 255 bytes. So it doesn't matter if you're using ASCII characters or fancy 4-byte ancient Sumerian cuneiform symbols. A character is a character, regardless of how many bytes it takes to represent it.
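As far as I know, Laravel measures string size with PHP's mb_strlen function. The sketch below is not Laravel's code; fitsMaxCharacters is a hypothetical helper that mimics a character-based check in plain PHP, assuming the mbstring extension is available.

```php
<?php
// Hypothetical helper, not Laravel's implementation:
// a length check that counts characters rather than bytes.
function fitsMaxCharacters(string $value, int $max): bool
{
    return mb_strlen($value, 'UTF-8') <= $max;
}

// U+12000 is a cuneiform sign that takes 4 bytes in UTF-8.
$cuneiform = str_repeat("\u{12000}", 255);

$byteLength = strlen($cuneiform);                          // 1020 bytes
$fits       = fitsMaxCharacters($cuneiform, 255);          // true: 255 characters
$tooLong    = fitsMaxCharacters($cuneiform . 'A', 255);    // false: 256 characters

var_dump($byteLength, $fits, $tooLong);
```

One caveat: mb_strlen counts code points, so an emoji that is built from several code points joined together will count as more than one character.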
Laravel has our back, and handles this the way we probably intended. Now just make sure your database is also configured properly to store multibyte characters.
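If you're on MySQL, that usually means the utf8mb4 character set, since MySQL's older utf8 charset stores at most 3 bytes per character. Here's a trimmed-down sketch of the relevant part of config/database.php, assuming a MySQL connection; your connection name and other settings may differ.

```php
<?php
// Excerpt in the shape of config/database.php (MySQL assumed; other keys omitted).
return [
    'connections' => [
        'mysql' => [
            'driver'    => 'mysql',
            'charset'   => 'utf8mb4',            // allows 4-byte characters such as emoji
            'collation' => 'utf8mb4_unicode_ci',
        ],
    ],
];
```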
I'll leave you with a puzzle: When would you probably NOT want max to treat a multibyte character as a single character?
Send me a reply if you think you know the answer.
Here to help,
Joel
P.S. Imagine how much cleaner your data would be if you read our validation book!