
Which is good practice? To store data as a comma separated list in the database or have multiple rows?

I have a table for accounts, classes, and enrolments. If the enrolment table has 3 fields: ID, AccountID and ClassID, is it better for ClassID to be a varchar containing a comma separated list such as this: "24,21,182,12" or for it to be just an int and have one entry per enrolment?

    Proper design would have you create a lookup table that contains the UID of the account and the class, basically creating a map. The comma-separated list method is too unstable (you can never know how long the data will be). Commented Aug 6, 2012 at 18:05

4 Answers

3

tldr: Don't do this. That is, don't use a "packed array" here.

Use a correctly normalized design with "multiple rows". This is likely a good candidate for a Many-to-Many relationship. Consider this structure:

Classes 1:M Enrollments(Class,Student) M:1 Students

Following a properly normalized design will reduce pain. In addition, here are some other advantages:

  • Referential integrity (use InnoDB)
    • Consistent model described with relationships
    • Type enforcement (can't have "foo,,")
  • JOIN and query without needing custom code
    • "What are the names of the students in class A?"
    • "Who is taking more than one class?"
    • Columns can be usefully indexed (query performance)
    • Generally faster than handling locally in code
  • More flexible and consistent
    • Can attach attributes to enrollments such as status
    • No need to have code to handle serialization at access sites
    • More accommodating of placeholders and ORMs
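
To make the structure above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB. The table and column names, the status column, and the sample rows are all assumptions made up for illustration; they are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE classes  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One row per enrollment; extra attributes (here: status) attach naturally.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(id),
    class_id   INTEGER NOT NULL REFERENCES classes(id),
    status     TEXT NOT NULL DEFAULT 'active',
    PRIMARY KEY (student_id, class_id)
);
INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO classes  VALUES (24, 'Class A'), (21, 'Class B');
INSERT INTO enrollments (student_id, class_id) VALUES (1, 24), (1, 21), (2, 24);
""")

# "What are the names of the students in class A?"
names_in_a = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM students s
    JOIN enrollments e ON e.student_id = s.id
    JOIN classes c     ON c.id = e.class_id
    WHERE c.title = ?
    ORDER BY s.name""", ("Class A",))]

# "Who is taking more than one class?"
multi = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM students s
    JOIN enrollments e ON e.student_id = s.id
    GROUP BY s.id, s.name
    HAVING COUNT(*) > 1""")]
```

Both questions are answered by plain SQL, with no string splitting in application code.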

10 Comments

+1 - eventually you'll regret storing multiple values in a single column. It's just a bad idea all around.
If you know with certainty that the list will only ever be required as a complete query, why would you want to impose the performance issues of a JOIN? (serious question, not being snarky)
@kingjeffrey it's pretty clear that the comma-separated string is a list of IDs, which will have to be queried again in order to find what they refer to.
@kingjeffrey What "performance issue"? Modern relational databases eat this sort of query up for breakfast. Yumm!
3

Never ever ever cram multiple values into a single database field by combining them with some sort of delimiter, like a comma, or fixed length substrings. In the rare cases where this clearly gives a benefit in storage requirements or performance ... see rule #1: never ever ever. Ever.

When you cram multiple values into a single field, you sabotage all the clever features built into the database engine to help you retrieve and manipulate values.

Like let's say you have this -- I guess it's some sort of student database.

Plan A

student (student_id, account_id, class_id_mash)

Plan B

student (student_id, account_id)
student_class (student_id, class_id)

Okay, let's say you want a list of all the students taking class #27. With Plan B you write

select student.student_id
from student join student_class on student.student_id=student_class.student_id
where class_id=27

Easy.

How would you do it with Plan A? You might think

select student_id
from student
where class_id_mash like '%27%'

But that will not only find all students in class 27, but also all those in class 127 or 272.

Okay, how about:

select student_id
from student
where class_id_mash like '%,27,%'

There, now we won't find 127 or 272! But, oops, we also won't find it if the 27 happens to be the first or last one in the list, because then there aren't commas on both sides.

So okay, maybe we could get around that with more rules about delimiters or with a more complex matching expression. But it would be unnecessarily complex and painful.
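
For the curious, here is a rough sketch of what that "more complex matching expression" could look like, run through Python's sqlite3 module. The sample rows and the ids helper are made up for illustration. The trick shown is to pad the stored string with a comma on each side before matching; MySQL also offers FIND_IN_SET for this kind of lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, class_id_mash TEXT);
INSERT INTO student VALUES
    (1, '27,30'),     -- 27 is first in the list
    (2, '12,27'),     -- 27 is last in the list
    (3, '5,27,9'),    -- 27 is in the middle
    (4, '127,272');   -- not in class 27 at all
""")

def ids(where, params=()):
    # Hypothetical helper: run one WHERE clause and return matching student ids.
    sql = f"SELECT student_id FROM student WHERE {where} ORDER BY student_id"
    return [row[0] for row in conn.execute(sql, params)]

# First attempt: also matches 127 and 272.
naive = ids("class_id_mash LIKE '%27%'")

# Second attempt: misses 27 at the start or end of the list.
strict = ids("class_id_mash LIKE '%,27,%'")

# Workaround: wrap the stored value in commas so every id has one on both sides.
padded = ids("',' || class_id_mash || ',' LIKE '%,' || ? || ',%'", ("27",))
```

The padded version returns the right students, but every caller now has to remember the padding trick, and the expression still cannot use an ordinary index.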

And even if we did, every search for a class ID would have to be a full sequential scan of the table. With one value per field and multiple records, you can create an index on the class_id field for fast, efficient retrieval. (Some database engines have ways to index into the middle of text fields, but again, why get into complicated solutions when there's an easy solution?)
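
One way to see the difference is to ask the engine for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the index name is an assumption, and the exact plan wording varies between SQLite versions (MySQL has its own EXPLAIN with a different output format).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, class_id_mash TEXT);
CREATE TABLE student_class (student_id INTEGER, class_id INTEGER);
CREATE INDEX idx_student_class_class_id ON student_class (class_id);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

# Plan A: the leading wildcard forces the engine to read every row.
plan_a = plan("SELECT student_id FROM student WHERE class_id_mash LIKE '%,27,%'")

# Plan B: an equality test on an indexed column can use the index.
plan_b = plan("SELECT student_id FROM student_class WHERE class_id = 27")

print(plan_a)
print(plan_b)
```

On the SQLite builds I am aware of, the first plan reports a scan of student and the second a search using idx_student_class_class_id.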

How do we validate the class IDs? With separate fields, we can say "class_id references class" and the database engine will ensure that we don't enter an illegal value. With the mash, there is no such free validation.
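
A minimal sketch of that free validation, again with Python's sqlite3 module standing in for whatever engine you use. The class table and the sample ids are assumptions for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and MySQL needs a storage engine such as InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE class   (class_id   INTEGER PRIMARY KEY);
CREATE TABLE student (student_id INTEGER PRIMARY KEY);
CREATE TABLE student_class (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    class_id   INTEGER NOT NULL REFERENCES class(class_id)
);
INSERT INTO class   VALUES (27);
INSERT INTO student VALUES (1);
""")

# A valid enrollment is accepted.
conn.execute("INSERT INTO student_class VALUES (1, 27)")

# An enrollment in a class that does not exist is refused by the engine.
try:
    conn.execute("INSERT INTO student_class VALUES (1, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student_class").fetchone()[0]
```

With class_id_mash, the string '27,999' would be stored without complaint.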


2

I have done both, but instead of storing the information in the database as comma separated, I use another delimiter, such as | (so that I don't worry about formatting on insert into the db). It's more about how often you will query the data.

3 Comments

If you use commas, you can also use PHP's str_getcsv to easily split the CSV, including escapes, assuming the original content was escaped as the CSV format allows.
@kingjeffrey is there a problem with using | though?
If the delimiter could feasibly be used in the content, you will need to escape it. As such, there is no significant benefit to using any particular delimiter (| or ,). The sole exception is that , is the default delimiter and more likely to be immediately understood by someone trying to understand the content of the field.
1

If you are only going to need the complete list, it is fine to store it as a comma separated value. But if you need to query the list, they should be stored separately.
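
If you start with the packed form and later find you do need to query it, moving to one row per enrolment is not much work. Here is a hedged sketch in Python with sqlite3; the enrolment table layout, the unpack_ids helper and the sample data are assumptions, not taken from the question.

```python
import sqlite3

def unpack_ids(packed):
    # "24,21,182,12" -> [24, 21, 182, 12]; empty pieces are skipped,
    # and a non-numeric piece raises ValueError instead of being stored.
    return [int(piece) for piece in packed.split(",") if piece.strip()]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment ("
    " account_id INTEGER NOT NULL,"
    " class_id INTEGER NOT NULL,"
    " PRIMARY KEY (account_id, class_id))"
)

# Hypothetical legacy data: (AccountID, packed ClassID string).
legacy_rows = [(1, "24,21,182,12"), (2, "21")]
for account_id, packed in legacy_rows:
    conn.executemany(
        "INSERT INTO enrolment VALUES (?, ?)",
        [(account_id, class_id) for class_id in unpack_ids(packed)],
    )

# The complete list for one account is still a single simple query.
full_list = [row[0] for row in conn.execute(
    "SELECT class_id FROM enrolment WHERE account_id = ? ORDER BY class_id", (1,))]
```

So the "complete list" use case is still covered after the split, and the individual ids become queryable as well.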

