Primitive Obsession is a code smell. Consider this code:
type User struct {
	FirstName string
	LastName  string
	Email     string
}
We’re using the type string everywhere. If we wanted a function to create a new user, its signature would be obtuse:
func NewUser(first, last, email string) User { ... }
It’s easy to get confused and swap first and last name, or worse. The smell is called “primitive obsession” because we tend to use primitive types such as strings or integers for everything.
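To make the risk concrete, here is a minimal sketch showing that a swapped call compiles and runs without complaint. The body of NewUser is an assumption, since the original elides it.

```go
package main

import "fmt"

type User struct {
	FirstName string
	LastName  string
	Email     string
}

// NewUser is a guess at the elided constructor body: it simply
// copies its arguments into the struct.
func NewUser(first, last, email string) User {
	return User{FirstName: first, LastName: last, Email: email}
}

func main() {
	// First and last name are swapped, yet the compiler accepts this,
	// because all three parameters have the same type.
	u := NewUser("Doe", "Jane", "jane@example.com")
	fmt.Println(u.FirstName) // prints "Doe"
}
```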
Luckily, in Go it’s easy to define custom types that behave like a primitive, but are considered distinct by the compiler:
type FirstName string
type LastName string
type EmailAddress string

type User struct {
	First FirstName
	Last  LastName
	Email EmailAddress
}

func NewUser(
	first FirstName,
	last LastName,
	email EmailAddress,
) User {
	...
}
Now it’s much more difficult to swap first and last name, or pass a name in place of an email address. This code does not compile:
func main() {
	first := FirstName("Jane")
	last := LastName("Doe")
	email := EmailAddress("jane@example.com")
	NewUser(last, first, email) // type error!
}
And it’s much more expressive: if you are looking to express the model of the domain in your code, as suggested by DDD, this is the way to go.
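One caveat is worth sketching: the compiler only catches swaps between typed values. Untyped string constants are converted implicitly to whatever parameter type is expected, so a call written with bare literals still compiles. The example below is a small illustration under the assumption that NewUser just fills in the struct (the original elides its body).

```go
package main

import "fmt"

type FirstName string
type LastName string
type EmailAddress string

type User struct {
	First FirstName
	Last  LastName
	Email EmailAddress
}

// NewUser's body is assumed; the original elides it.
func NewUser(first FirstName, last LastName, email EmailAddress) User {
	return User{First: first, Last: last, Email: email}
}

func main() {
	// Untyped string constants convert implicitly to each parameter
	// type, so this swapped call still compiles.
	u := NewUser("Doe", "Jane", "jane@example.com")
	fmt.Println(u.First)

	// A variable of type string, by contrast, is rejected:
	//   s := "Jane"
	//   NewUser(s, "Doe", "jane@example.com") // does not compile
}
```

The protection is therefore strongest when values arrive as variables, for example from parsed input, which is the usual case in real programs.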
*
But wait, there’s more! Does it make sense for an email address to be one million characters long? Or to be empty? Or to consist entirely of whitespace? Probably not, yet our EmailAddress type allows it. We should do something about it; we should limit the minimum and maximum length of an EmailAddress, and probably also restrict which characters it may contain. At a minimum, an EmailAddress must contain an @ sign.
Why go to this trouble? On the one hand, it prevents other parts of our system from breaking; on the other hand, it’s also a matter of security. If we’re building our data from input coming from the outside world, we should prevent attackers from cramming an attack vector inside an EmailAddress; this is a great lesson I learned from the Secure by Design book. Make sure every data field is validated for length and content.
The first step is to write a function that will create a new EmailAddress only if the input string passes some basic checks.
// thanks https://stackoverflow.com/a/201447/164802
// The dot before the last \S+ is escaped so it matches a literal "." only.
var validEmailAddress = regexp.MustCompile("^\\S+@\\S+\\.\\S+$")

func NewEmailAddress(s string) (EmailAddress, error) {
	const maxLength = 320 // RFC 5321 and RFC 5322
	const minLength = 5
	// EmailAddress is a string type, so its zero value is "", not nil.
	if len(s) < minLength || len(s) > maxLength {
		return "", errors.New("invalid EmailAddress length")
	}
	if !validEmailAddress.MatchString(s) {
		return "", errors.New("invalid EmailAddress")
	}
	return EmailAddress(s), nil
}
The first thing we do is validate the input string length; this check is fast and cheap. If this passes, we validate the input string against a regexp. This regexp is simplistic, but it does its job.
Why two separate checks? Couldn’t we check for length in the regexp? We could, but it would be riskier; matching a regexp against a possibly very long string (remember, this input string could be crafted by an attacker to be 100K characters long) is computationally more expensive than just checking its length beforehand. And we, or a future maintainer, could bungle the regexp, as regexps can be tricky. A length check is cheap and very hard to get wrong.
Now we have a function that will only create an EmailAddress if the input string is valid, but nothing is preventing other parts of our program from creating an EmailAddress directly with EmailAddress(str). We want to make sure that the only way to create an EmailAddress, outside of this package, is the NewEmailAddress function.
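The loophole is easy to demonstrate. As long as EmailAddress is an exported string type, any code can build one with a conversion or a constant assignment, and no validation runs. A minimal sketch (the bogus values are made up):

```go
package main

import "fmt"

type EmailAddress string

func main() {
	// A plain conversion skips validation entirely.
	bogus := EmailAddress("definitely not an email")

	// So does assigning an untyped constant.
	var empty EmailAddress = ""

	fmt.Printf("%q %q\n", bogus, empty)
}
```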
The way to go is to change the definition of type EmailAddress to be an interface, and define a non-exported type emailAddress to represent an email address concretely.
// An EmailAddress is something that can be represented
// as a string
type EmailAddress interface {
	String() string
}

// The actual implementation of an email address is a
// wrapper type around string
type emailAddress string

// Make emailAddress implement the EmailAddress interface
func (e emailAddress) String() string {
	return string(e)
}

func NewEmailAddress(s string) (EmailAddress, error) {
	// ...
	// we build a non-exported, lowercase emailAddress
	return emailAddress(s), nil
}
Now we’re good, aren’t we? We are not leaking any information about how the EmailAddress is implemented internally. Unfortunately, the interface EmailAddress can be implemented by anything that has a String() string method… which defeats our initial idea that an EmailAddress can be created only through our exported NewEmailAddress function.
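Here is a short sketch of the problem. The impostor type and the send function are hypothetical names invented for this illustration; impostor stands in for a type defined in some other package.

```go
package main

import "fmt"

type EmailAddress interface {
	String() string
}

// impostor stands in for a type defined in another package.
// It never went through any validation.
type impostor struct{}

func (impostor) String() string { return "not validated at all" }

// send is a hypothetical consumer that trusts its argument.
func send(to EmailAddress) string {
	return "sending to " + to.String()
}

func main() {
	// The compiler accepts impostor wherever an EmailAddress is expected.
	fmt.Println(send(impostor{}))
}
```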
The trick is to extend the EmailAddress interface in a way that cannot be implemented by any other type defined elsewhere.
type EmailAddress interface {
	String() string
	// Unexported method: only types declared in this package
	// can implement EmailAddress
	implementsEmailAddress()
}

// Make emailAddress implement EmailAddress
func (e emailAddress) implementsEmailAddress() {
}
Now the EmailAddress interface contains an unexported method implementsEmailAddress that cannot be defined outside our package. This method does nothing except ensure that only our package can define an EmailAddress. Thanks to Brett Slatkin for this tip!
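Putting the pieces together, the final shape might look like the sketch below. It is collapsed into a single file so it can be run directly; in a real project the types would live in their own package, which is what makes the unexported method unreachable from outside. The compile-time assertion line and the escaped dot in the pattern are additions made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

type EmailAddress interface {
	String() string
	implementsEmailAddress()
}

type emailAddress string

func (e emailAddress) String() string          { return string(e) }
func (e emailAddress) implementsEmailAddress() {}

// Compile-time check that emailAddress satisfies EmailAddress.
var _ EmailAddress = emailAddress("")

var validEmailAddress = regexp.MustCompile(`^\S+@\S+\.\S+$`)

func NewEmailAddress(s string) (EmailAddress, error) {
	const maxLength = 320
	const minLength = 5
	if len(s) < minLength || len(s) > maxLength {
		// EmailAddress is an interface now, so nil is a valid return value.
		return nil, errors.New("invalid EmailAddress length")
	}
	if !validEmailAddress.MatchString(s) {
		return nil, errors.New("invalid EmailAddress")
	}
	return emailAddress(s), nil
}

func main() {
	addr, err := NewEmailAddress("jane@example.com")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(addr.String())
}
```

Since the zero value of an interface is nil, callers should check the returned error before using the address.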
*
In summary:
- Make your code more expressive
  - Avoid using strings and integers directly. Wrap them with small, domain-specific types.
- Make your code safer and more secure
  - Ensure these types can only be created through a validating function
  - If the underlying type is a string, check for length first; then check for valid structure and characters.
See a complete working example from my TodoMVC example code.