
I'm developing a program and ran into a bug: when I insert a value larger than Integer.MAX_VALUE into a table column of type INT, the database throws an error saying the number is too large. I read that the fix for this is quite simply to alter the column to BIGINT, and that should fix it. But that got me thinking: why don't all programmers just use the max column sizes (such as VARCHAR(255), BIGINT, etc.) rather than something smaller like VARCHAR(30) or INT?
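For reference, the fix I read about looks something like this (I'm guessing at SQL Server syntax here; the table and column names are just placeholders):

    -- Widen the column from INT to BIGINT (placeholder names)
    -- Note: restate NOT NULL if the column had it, or SQL Server
    -- will make the altered column nullable.
    ALTER TABLE dbo.MyTable
    ALTER COLUMN MyValue BIGINT NOT NULL;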

Wouldn't this almost completely eliminate an error like mine from occurring when you're not sure what's going to be inserted, especially if it's based on user input? Are there any cons to just using the largest possible type for your columns? Would the table size be bigger even if you only store "2" in a BIGINT column (even though that would work with INT)? Is there a performance loss?

Thanks!

  • Which database are you using? Commented Sep 2, 2017 at 3:24
  • Please edit your question tags and label them with the actual database you are using. Commented Sep 2, 2017 at 3:42
  • @TimBiegeleisen is right; there is no general answer to this sort of question, only various database-specific ones. Commented Sep 2, 2017 at 3:47
  • varchar(255) is most definitely not the "maximum column size" for varchar columns. Commented Jul 23, 2019 at 13:18

3 Answers


For VARCHAR, the reason you generally don't just use MAX is that it's stored differently and puts limitations on your index maintenance operations. For instance, on older versions of SQL Server you cannot rebuild an index "online" when a VARCHAR(MAX) field is part of it. While there's a little hand-waving involved, VARCHAR(MAX) data basically gets stored off-row, so there's overhead in maintaining that extra data store.
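A minimal sketch of that limitation, assuming SQL Server with made-up table and index names (recent versions allow this rebuild, but older ones error out because the index carries a LOB column):

    CREATE TABLE dbo.Notes (
        NoteId INT NOT NULL,
        Body   VARCHAR(MAX) NULL   -- LOB column; large values are stored off-row
    );

    CREATE CLUSTERED INDEX IX_Notes ON dbo.Notes (NoteId);

    -- A clustered index includes every column, Body included, so on older
    -- SQL Server versions this online rebuild fails with an error.
    ALTER INDEX IX_Notes ON dbo.Notes REBUILD WITH (ONLINE = ON);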

For numeric types, the main thing is space. BIGINT is an 8-byte signed integer, whereas an INT is only 4 bytes. If you don't need values beyond about 2.1 billion, that's just wasted space (and often a lot of it if you have, say, a couple of billion rows of data).
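The width is fixed regardless of the value stored, which answers the "what if I only store 2" part of the question. A quick check in SQL Server:

    -- Fixed-width types occupy their full width even for tiny values
    SELECT DATALENGTH(CAST(2 AS INT))    AS int_bytes,    -- 4
           DATALENGTH(CAST(2 AS BIGINT)) AS bigint_bytes; -- 8

Across a couple of billion rows, those 4 extra bytes per row come to roughly 8 GB of additional storage before you even count indexes.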

Data compression can mitigate some of those issues, but not without the cost of having to decompress the data when it's queried.

So the reasons are varied, but with the possible exception of somewhat oversized varchars (not VARCHAR(MAX)), picking the "right" data type for your data is just a good idea.


Comments

Thank you for the information! What do you suggest I do? I need to somehow store numbers larger than an INT can hold in my table, but not always (maybe 1 in 20 times).
Do you think it might be a good idea to find out which database the OP is using before posting an answer?
If you need BIGINT, you need BIGINT. There's nothing wrong with using it when you need it; you just don't want to automatically use BIGINT when you don't. If you need REALLY big numbers, you might look at floats.
@TimBiegeleisen do you think the advice is going to change?
MySQL doesn't even have a varchar max, and I'll bet the answer for each database would have substantial differences.

I can't speak to any RDBMS other than SQL Server (but I imagine this applies to all of them)... A BIGINT takes up twice as much space as an INT, which means less data fits onto a page, meaning less data in cache, meaning slower performance.

In SQL Server there are actually 4 integer types:

  • TINYINT (1 byte)
  • SMALLINT (2 bytes)
  • INT (4 bytes)
  • BIGINT (8 bytes)

A good database developer will put very careful thought into choosing the proper data type based on the data that's expected to be put in the column. Aside from the issue of storage space, data types function as data constraints. So if I choose TINYINT as my data type, that means I only expect to see values between 0 and 255 and will reject anything that falls outside of that range.
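As a quick sketch of that type-as-constraint idea (SQL Server syntax; the table is made up):

    CREATE TABLE dbo.Example (Age TINYINT NOT NULL);

    INSERT INTO dbo.Example (Age) VALUES (42);   -- succeeds
    INSERT INTO dbo.Example (Age) VALUES (300);  -- fails: arithmetic overflow for TINYINT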

If a coworker were to submit a table design with all VARCHAR(255)s & BIGINTs, I'd reject it and have them size everything appropriately. It's lazy thinking like that which causes huge problems on the DB side of the house.

4 Comments

I understand the situation now. When you mention a good database developer evaluating the possible situations and picking the best option, what if he's in my situation, where I can't tell you the output for certain, but I know from my own testing there have been cases where (let's say 1 in 20 times) the code tries to insert something greater than Integer.MAX_VALUE? Would the developer just go to BIGINT, or do something sneaky like storing the number in a VARCHAR that takes less space than a BIGINT and converting back and forth?
Before I started changing data types, I'd at least try to figure out where the offending value came from and make sure it's a legitimate value. Given that the INT data type can hold values ranging from -2,147,483,648 to 2,147,483,647, something falling outside that range should at least raise an eyebrow. If my program were simply generating random numbers, I'd lean toward narrowing the allowable range rather than switching to BIGINT. Also, I'm not saying that BIGINT is never appropriate. I'm simply saying, only use it when it is appropriate.
If BIGINT is appropriate in the particular case you're working on, then by all means use it... But... That wasn't the point of your original post. You posed the question, "why don't all programmers just use the max column sizes (such as VARCHAR(255), BIGINT, etc.) rather than something smaller like VARCHAR(30) or INT?" And that was the question I was addressing.
@Nicster15 I totally agree with Jason A. Long. Having said this, I'd like to add the following aspect: Obviously you have been surprised that your code tries to insert values that don't fit into an INT. That indicates that it doesn't check user input the right way, or that it contains other bugs. Thus, even if you change to BIGINT, there are still chances that your code will try to insert values which even don't fit into that. So I would advise you to examine your application until you have fully understood the cause of the problem and make your decision after doing so.

why don't all programmers just use the max column sizes (such as VARCHAR(255), BIGINT, etc.) rather than something smaller like VARCHAR(30) or INT?

Some do exactly that. It's also not at all uncommon to see developers store numeric or date/time values in varchar columns.

I often see performance and storage costs called out as the reason not to do this. Those are considerations (which vary by DBMS) but a more important one in the world of relational databases is data integrity. The chosen datatype is a critical part of the data model because it determines the domain of data that can be stored. On top of that, relational databases provide check, referential, and NULL constraints to further limit column values.

Wouldn't this almost completely eliminate an error like mine from occurring when you're not sure what's going to be inserted, especially if it's based on user input?

Of course, but why stop at a 64-bit integer? Why not NUMERIC(1000)? That's a rhetorical question to point out that one must know the business domain so data can be properly modeled and validation rules enforced. A 64-bit integer is certainly overkill for storing a person's number of children, but you may end up with a value of several billion due to careless data entry. The column data type is the last line of defense against bad data, which is especially important when values come from user input.
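As a sketch of modeling it that way (generic SQL; the names and the upper bound are invented for illustration):

    -- The type bounds what can physically be stored;
    -- the CHECK constraint bounds the business domain.
    CREATE TABLE person (
        person_id  INT PRIMARY KEY,
        n_children SMALLINT NOT NULL,
        CONSTRAINT ck_person_children CHECK (n_children BETWEEN 0 AND 30)
    );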

All that being said, one can use an RDBMS as nothing more than a dumb storage engine and enforce data integrity rules (if any) in application code. In that case, storage and performance are the only considerations.

