will I burn in hell for using all VARCHAR(MAX)?

M

mike

so I keep optimizing my fields down to the minimum character length
necessary i.e., varchar(15), then I find out a month later its gotta
get bigger, then a few months later, bigger again, etc. Nowadays on
sql server 2005 and on, how bad is it really to use varchar(max)? Is
there really a big performance or storage hit or is it negligible?

-Mike
 
B

Bob Barrows [MVP]

mike said:
so I keep optimizing my fields down to the minimum character length
necessary i.e., varchar(15), then I find out a month later its gotta
get bigger, then a few months later, bigger again, etc. Nowadays on
sql server 2005 and on, how bad is it really to use varchar(max)? Is
there really a big performance or storage hit or is it negligible?

-Mike

Don't be lazy. There are several reasons for not doing this in the
comments for this blog article:
http://sqljunkies.com/WebLog/simons/archive/2006/02/28/Why_use_anything_but_varchar_max.aspx
 
A

Anthony Jones

Bob Barrows said:
Don't be lazy. There are several reasons for not doing this in the
comments for this blog article:
http://sqljunkies.com/WebLog/simons/archive/2006/02/28/Why_use_anything_but_varchar_max.aspx


Hmm.. Interesting. I'm not sure some of the commenters were making a
distinction between varchar(somevalue) and varchar(max)? They seemed to be
comparing varchar with char. Others were concerned unexpected growth but if
there is a cause for that in the lifetime of a DB then the alternative is
errors downstream where data entered doesn't fit in the defined field.
Personally I'd rather see things slow down instead of break.

I agree it does seem a bit lazy though.
 
A

Anthony Jones

Bob Barrows said:
I'm not sure what you were looking at. all the comments seemed to be on
target.

The final comment by werner de jong seems to be entirely based on the
difference between char and varchar.

Also others references performance issues. I'm wondering where there is a
cost in performance between varchar(x) and varchar(max)?

Could be that I'm ignorant of some new details of 2005 internals. What is
the difference internally in the way these fields are handled.

Belsteak says 'unpredictable growth of the DB' and 'unwanted results in
column', what does that mean??

Anthony.
 
B

Bob Barrows [MVP]

Anthony said:
The final comment by werner de jong seems to be entirely based on the
difference between char and varchar.

No, I believe he was talking about the impact of having many varchar columns
as opposed to a few. He is likely assuming that at least some of the
varchar(max) columns could be char, since the blog post he was replying to
made no reference to leaving existing char columns alone. Or using
varchar(max) to store other data types.
Also others references performance issues. I'm wondering where there
is a
cost in performance between varchar(x) and varchar(max)?

I think Adam Machanic's article covers this: "Remember that the query
optimizer uses column size as one of the many metrics for determining
optimal query plans. Given this table, the optimizer would have very few
options in that regard. "
Could be that I'm ignorant of some new details of 2005 internals.
What is
the difference internally in the way these fields are handled.

Again, from Adam's article:
The MAX data types, by default, use a hybrid of the TEXT/IMAGE overflow
behavior and the behavior of the normal (sized) VARCHAR/VARBINARY types. If
a column's data, plus the data in all of the other columns in the table, has
a total size of less than 8060 bytes, the data is stored in-row. If the data
exceeds 8060 bytes, the data in the MAX column will be stored off-row.
Belsteak says 'unpredictable growth of the DB' and 'unwanted results
in
column', what does that mean??

With varchar(max) one cannot be sure how much data is going to be stored in
it. Thus database growth cannot be predicted, making any forecasts of
database size meaningless. Also, every time a record's size exceeds a page
(8000 b), the record gets split. Every time the split occurs, performance
suffers, both for maintaining and for reading the data. Since we are not
controlling how much data gets stored, we cannot predict how many splits
will occur. "Unwanted results" I believe is referring to the lack of control
of data size removing one key validation method for preventing unwanted
data.

Many of these arguments are the same ones used to counter the old "why not
make every column a varchar(8000) column" proposal in the SQL2000 days.
 
A

Anthony Jones

Bob Barrows said:
No, I believe he was talking about the impact of having many varchar columns
as opposed to a few. He is likely assuming that at least some of the
varchar(max) columns could be char, since the blog post he was replying to
made no reference to leaving existing char columns alone. Or using
varchar(max) to store other data types.

I see. I didn't read it that way. It seems to me the blogger is talking
about varchar(n) vs varchar(max).
I think Adam Machanic's article covers this: "Remember that the query
optimizer uses column size as one of the many metrics for determining
optimal query plans. Given this table, the optimizer would have very few
options in that regard. "

That's a good point. Statistics are not by default built for varchar(max)
fields but I believe they can be specified.
Again, from Adam's article:
The MAX data types, by default, use a hybrid of the TEXT/IMAGE overflow
behavior and the behavior of the normal (sized) VARCHAR/VARBINARY types. If
a column's data, plus the data in all of the other columns in the table, has
a total size of less than 8060 bytes, the data is stored in-row. If the data
exceeds 8060 bytes, the data in the MAX column will be stored off-row.

Cool. A pragmatic solution. I like.
With varchar(max) one cannot be sure how much data is going to be stored in
it. Thus database growth cannot be predicted, making any forecasts of
database size meaningless.

Using varchar(max) doesn't mean you haven't got a reasonable idea of the
size of data going into the field. It means you don't want to limit the size
that might end up in there. I'm being devils adovacate here, for small
fields, titles, descriptions, names etc I wouldn't use varchar(max) either
(I agreed it is lazy) but for anything over say 512 I would have been
tempted but:-
Also, every time a record's size exceeds a page
(8000 b), the record gets split. Every time the split occurs, performance
suffers, both for maintaining and for reading the data. Since we are not
controlling how much data gets stored, we cannot predict how many splits
will occur.

Ouch. Yes that's a killer reason.
"Unwanted results" I believe is referring to the lack of control
of data size removing one key validation method for preventing unwanted
data.

Good point.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top