
I'm developing a program and ran into a bug: when I insert a value larger than Integer.MAX_VALUE into a table column of type INT, the database throws an error saying the number is too large. I read that the fix for this is quite simply to alter the column to BIGINT, and that should fix it. But that got me thinking: why don't all programmers just use the max column sizes (such as VARCHAR(255), BIGINT, etc.) rather than something smaller like VARCHAR(30) or INT?
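For reference, the fix I read about looks something like this (I'm guessing at SQL Server syntax here; the table and column names are just placeholders):

    -- Widen the column from INT to BIGINT (placeholder names)
    -- Note: restate NOT NULL if the column had it, or SQL Server
    -- will make the altered column nullable.
    ALTER TABLE dbo.MyTable
    ALTER COLUMN MyValue BIGINT NOT NULL;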

Wouldn't this almost completely eliminate an error like mine from occurring when you're not sure what's going to be inserted, especially if it's based on user input? Are there any cons to just using the largest possible type for your columns? Would the table size be bigger even if you only store "2" in a BIGINT column (even though that would work with INT)? Is there a performance loss?

Thanks!

  • Which database are you using? Commented Sep 2, 2017 at 3:24
  • Please edit your question tags and label them with the actual database you are using. Commented Sep 2, 2017 at 3:42
  • @TimBiegeleisen is right; there is no general answer to this sort of question, only various database-specific ones. Commented Sep 2, 2017 at 3:47
  • varchar(255) is most definitely not the "maximum column size" for varchar columns. Commented Jul 23, 2019 at 13:18

3 Answers


For VARCHAR, the reason you generally don't just use MAX is that it's stored differently and puts limitations on your index maintenance operations. For instance, on older versions of SQL Server you cannot rebuild an index "online" when a VARCHAR(MAX) field is part of it. While there's a little hand-waving involved, VARCHAR(MAX) data basically gets stored off-row, so there's overhead in maintaining that extra data store.
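A minimal sketch of that limitation, assuming SQL Server with made-up table and index names (recent versions allow this rebuild, but older ones error out because the index carries a LOB column):

    CREATE TABLE dbo.Notes (
        NoteId INT NOT NULL,
        Body   VARCHAR(MAX) NULL   -- LOB column; large values are stored off-row
    );

    CREATE CLUSTERED INDEX IX_Notes ON dbo.Notes (NoteId);

    -- A clustered index includes every column, Body included, so on older
    -- SQL Server versions this online rebuild fails with an error.
    ALTER INDEX IX_Notes ON dbo.Notes REBUILD WITH (ONLINE = ON);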

For numeric types, the main thing is space. BIGINT is an 8-byte signed integer, whereas an INT is only 4 bytes. If you don't need values beyond about 2.1 billion, that's just wasted space (and often a lot of it if you have, say, a couple of billion rows of data).
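The width is fixed regardless of the value stored, which answers the "what if I only store 2" part of the question. A quick check in SQL Server:

    -- Fixed-width types occupy their full width even for tiny values
    SELECT DATALENGTH(CAST(2 AS INT))    AS int_bytes,    -- 4
           DATALENGTH(CAST(2 AS BIGINT)) AS bigint_bytes; -- 8

Across a couple of billion rows, those 4 extra bytes per row come to roughly 8 GB of additional storage before you even count indexes.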

Data compression can mitigate some of those issues, but not without the cost of having to decompress the data when it's queried.

So the reasons are varied, but with the possible exception of somewhat oversized varchars (not VARCHAR(MAX)), picking the "right" data type for your data is just a good idea.


Comments

Thank you for the information! What do you suggest I do? I need to somehow store numbers larger than an INT can hold in my table, but not always (maybe 1 in 20 times).
Do you think it might be a good idea to find out which database the OP is using before posting an answer?
If you need BIGINT, you need BIGINT. There's nothing wrong with using it when you need it; you just don't want to automatically use BIGINT when you don't. If you need REALLY big numbers, you might look at floats.
@TimBiegeleisen do you think the advice is going to change?
MySQL doesn't even have a varchar max, and I'll bet the answer for each database would have substantial differences.

I can't speak to any RDBMS other than SQL Server (but I imagine this applies to all of them)... A BIGINT takes up twice as much space as an INT, which means less data fits onto a page, meaning less data in cache, meaning slower performance.

In SQL Server there are actually 4 integer types:

  • TINYINT (1 byte)
  • SMALLINT (2 bytes)
  • INT (4 bytes)
  • BIGINT (8 bytes)

A good database developer will put very careful thought into choosing the proper data type based on the data that's expected to be put in the column. Aside from the issue of storage space, data types function as data constraints. So if I choose TINYINT as my data type, that means I only expect to see values between 0 and 255 and will reject anything that falls outside of that range.
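As a quick sketch of that type-as-constraint idea (SQL Server syntax; the table is made up):

    CREATE TABLE dbo.Example (Age TINYINT NOT NULL);

    INSERT INTO dbo.Example (Age) VALUES (42);   -- succeeds
    INSERT INTO dbo.Example (Age) VALUES (300);  -- fails: arithmetic overflow for TINYINT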

If a coworker were to submit a table design with all VARCHAR(255)s & BIGINTs, I'd reject it and have them size everything appropriately. It's lazy thinking like that which causes huge problems on the DB side of the house.

4 Comments

I understand the situation now. When you mention a good database developer evaluating the possible situations and picking the best option, what if he's in my situation, where I can't tell you the output for certain, but I know from my own testing there have been cases where (let's say 1 in 20 times) the code tries to insert something greater than Integer.MAX_VALUE? Would the developer just go to BIGINT, or do something sneaky like storing the number in a VARCHAR that takes less space than a BIGINT and converting back and forth?
Before I started changing data types, I'd at least try to figure out where the offending value came from and make sure it's a legitimate value. Given that the INT data type can hold values ranging from -2,147,483,648 to 2,147,483,647, something falling outside that range should at least raise an eyebrow. If my program were simply generating random numbers, I'd lean toward narrowing the allowable range rather than switching to BIGINT. Also, I'm not saying that BIGINT is never appropriate. I'm simply saying, only use it when it is appropriate.
If BIGINT is appropriate in the particular case you're working on, then by all means use it... But... That wasn't the point of your original post. You posed the question, "why don't all programmers just use the max column sizes (such as VARCHAR(255), BIGINT, etc.) rather than something smaller like VARCHAR(30) or INT?" And that was the question I was addressing.
@Nicster15 I totally agree with Jason A. Long. Having said this, I'd like to add the following aspect: Obviously you have been surprised that your code tries to insert values that don't fit into an INT. That indicates that it doesn't check user input the right way, or that it contains other bugs. Thus, even if you change to BIGINT, there are still chances that your code will try to insert values which even don't fit into that. So I would advise you to examine your application until you have fully understood the cause of the problem and make your decision after doing so.

why don't all programmers just use the max column sizes (such as VARCHAR(255), BIGINT, etc.) rather than something smaller like VARCHAR(30) or INT?

Some do exactly that. It's also not at all uncommon to see developers store numeric or date/time values in varchar columns.

I often see performance and storage costs called out as the reason not to do this. Those are considerations (which vary by DBMS) but a more important one in the world of relational databases is data integrity. The chosen datatype is a critical part of the data model because it determines the domain of data that can be stored. On top of that, relational databases provide check, referential, and NULL constraints to further limit column values.

Wouldn't this almost completely eliminate an error like mine from occurring when you're not sure what's going to be inserted, especially if it's based on user input?

Of course, but why stop at a 64-bit integer? Why not NUMERIC(1000)? That's a rhetorical question to point out that one must know the business domain so data can be properly modeled and validation rules enforced. A 64-bit integer is certainly overkill for storing a person's number of children, but you may end up with a value of several billion due to careless data entry. The column data type is the last line of defense against bad data, which is especially important when values come from user input.
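As a sketch of modeling it that way (generic SQL; the names and the upper bound are invented for illustration):

    -- The type bounds what can physically be stored;
    -- the CHECK constraint bounds the business domain.
    CREATE TABLE person (
        person_id  INT PRIMARY KEY,
        n_children SMALLINT NOT NULL,
        CONSTRAINT ck_person_children CHECK (n_children BETWEEN 0 AND 30)
    );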

All that being said, one can use an RDBMS as nothing more than a dumb storage engine and enforce data integrity rules (if any) in application code. In that case, storage and performance are the only considerations.

