How Would you Store this Data?

Discussion in 'ASP .Net' started by Jonathan Wood, Feb 28, 2010.

  1. I'm thinking about building a site that would include user-contributed
    articles, file attachments, images, and maybe videos.

    I'm trying to determine the best way to store this data. Specifically:

    A) What are the pros and cons of storing the file attachments, images, and
    videos in a database vs. storing them as separate files?

    B) I've decided user-contributed articles should not be submitted as HTML.
    I'd have some code to convert newlines to <br /> and otherwise convert the
    articles to HTML. I'm thinking about separate database columns for both the
    original and the converted articles. The advantages would be that the site
    would be fast because no conversion would be needed when serving the
    articles, and it would be possible for a contributor to obtain the original
    if they wanted to edit it and resubmit. I'd be curious if others thought
    this was the best approach.

    I'm interested in hearing all ideas.

    Thanks for any suggestions

    --
    Jonathan Wood
    SoftCircuits Programming
    http://www.softcircuits.com
     
    Jonathan Wood, Feb 28, 2010
    #1
    1. Advertising

  2. Jonathan Wood

    Mr. Arnold Guest

    Jonathan Wood wrote:
    > I'm thinking about building a site that would include user-contributed
    > articles, file attachments, images, and maybe videos.
    >
    > I'm trying to determine the best way to store this data. Specifically:
    >
    > A) What are the pros and cons of storing the file attachments, images,
    > and videos in a database vs. storing them as separate files?
    >
    > B) I've decided user-contributed articles should not be submitted as
    > HTML. I'd have some code to convert newlines to <br /> and otherwise
    > convert the articles to HTML. I'm thinking about separate database
    > columns for both the original and the converted articles. The advantages
    > would be that the site would be fast because no conversion would be
    > needed when serving the articles, and it would be possible for a
    > contributor to obtain the original if they wanted to edit it and
    > resubmit. I'd be curious if others thought this was the best approach.
    >
    > I'm interested in hearing all ideas.


    Those are files and you need a control that does file uploads and
    downloads. The files should be kept on a file share or file share array.

    The database table would only have a record that has the name of the
    file, and the path to the file on the file share a pointer record.

    You need a viewer control so when the file is selected for viewing, it
    takes the file extension and makes a determination as to what
    application would be used to allow the viewing of the file. If is's a
    *.doc, then the viewer is going to select MS Word to view the file, etc,
    etc.

    The various application used to view a file will reside on the Web
    server like a file association that the Windows O/S does is most likely
    going to be invoked.

    That's how you do a document repository that's controlled by a Web
    application using a browser.
     
    Mr. Arnold, Feb 28, 2010
    #2
    1. Advertising

  3. On Feb 28, 4:18 am, "Jonathan Wood" <> wrote:
    > I'm thinking about building a site that would include user-contributed
    > articles, file attachments, images, and maybe videos.
    >
    > I'm trying to determine the best way to store this data. Specifically:
    >
    > A) What are the pros and cons of storing the file attachments, images, and
    > videos in a database vs. storing them as separate files?
    >
    > B) I've decided user-contributed articles should not be submitted as HTML..
    > I'd have some code to convert newlines to <br /> and otherwise convert the
    > articles to HTML. I'm thinking about separate database columns for both the
    > original and the converted articles. The advantages would be that the site
    > would be fast because no conversion would be needed when serving the
    > articles, and it would be possible for a contributor to obtain the original
    > if they wanted to edit it and resubmit. I'd be curious if others thought
    > this was the best approach.
    >
    > I'm interested in hearing all ideas.
    >
    > Thanks for any suggestions
    >
    > --
    > Jonathan Wood
    > SoftCircuits Programminghttp://www.softcircuits.com


    A) It depends on size of attachments and many other things. You should
    estimate how many users, files and traffic you will get. Imagine that
    your database would be 10 TB, are you sure that you can manage it? In
    most cases you will get an answer that going for a file share is the
    easiest and scalable way.

    B) Replacing <br> on-the-fly should not be a performance problem
     
    Alexey Smirnov, Feb 28, 2010
    #3
  4. Jonathan Wood

    Andy B. Guest

    "Jonathan Wood" <> wrote in message
    news:...
    > I'm thinking about building a site that would include user-contributed
    > articles, file attachments, images, and maybe videos.
    >
    > I'm trying to determine the best way to store this data. Specifically:
    >
    > A) What are the pros and cons of storing the file attachments, images, and
    > videos in a database vs. storing them as separate files?
    >

    There are lots of differences in storing in databases VS file system files.
    The main thing for me is if the files them selves are going to be somewhat
    on the huge size, consider file system storage. The good side of database
    storage is that your files/videos are secured by default (you have to do
    almost no extra work to keep them away from prying fingers) except the
    streaming factor from the database to the browser. In file system storage,
    you need to implement file security (and that can be quite a difficult area
    to deal with). Either way you decide, make sure not to put your files that
    will be accessed through the browser in the app_data folder. I tried this
    and coldn't figure out why nothing worked. Come to find out, it has special
    permissions on it preventing browser access of any kind.

    > B) I've decided user-contributed articles should not be submitted as HTML.
    > I'd have some code to convert newlines to <br /> and otherwise convert the
    > articles to HTML. I'm thinking about separate database columns for both
    > the original and the converted articles. The advantages would be that the
    > site would be fast because no conversion would be needed when serving the
    > articles, and it would be possible for a contributor to obtain the
    > original if they wanted to edit it and resubmit. I'd be curious if others
    > thought this was the best approach.
    >

    Converting from html to plain text on the fly shouldn't be a problem. Here
    is a good article that shows you how to do it with Regex.
    http://www.4guysfromrolla.com/webtech/042501-1.shtml

    If you don't want people sending articles and stuff in html format, there
    isn't any point in keeping the html version. If you don't be careful, it
    could leave room for hackers into your database. Just be careful to close
    all your holes in database access code if you do decide to keep the original
    html version. You also might want to consider scanning the incoming article
    text for any kind of scripting languages as well as any sql code. That way
    you disarm someone if the intent is to mess with your website. Wish I had
    some links for the scripting/sql scanning/removal code, but I use DotNetNuke
    5.2 which has the security features I talked about already built in. Any
    html/scripting/sql code will be rendered as plain text. The link above just
    removes the html/xml code alltogether.
     
    Andy B., Feb 28, 2010
    #4
  5. Alexey Smirnov wrote:

    > A) It depends on size of attachments and many other things. You should
    > estimate how many users, files and traffic you will get. Imagine that
    > your database would be 10 TB, are you sure that you can manage it? In
    > most cases you will get an answer that going for a file share is the
    > easiest and scalable way.


    What do you mean by "are you sure that you can manage it?" What part of
    managing do you question if I can do? (Sure not disk space as that would be
    needed if I used files as well.)

    > B) Replacing <br> on-the-fly should not be a performance problem


    There would be other changes too, which could possibly include color-coding
    source code.

    Thanks.

    --
    Jonathan Wood
    SoftCircuits Programming
    http://www.softcircuits.com
     
    Jonathan Wood, Feb 28, 2010
    #5
  6. Andy B. wrote:

    > There are lots of differences in storing in databases VS file system
    > files. The main thing for me is if the files them selves are going to be
    > somewhat on the huge size, consider file system storage. The good side of
    > database storage is that your files/videos are secured by default (you
    > have to do almost no extra work to keep them away from prying fingers)
    > except the streaming factor from the database to the browser. In file
    > system storage, you need to implement file security (and that can be quite
    > a difficult area to deal with). Either way you decide, make sure not to
    > put your files that will be accessed through the browser in the app_data
    > folder. I tried this and coldn't figure out why nothing worked. Come to
    > find out, it has special permissions on it preventing browser access of
    > any kind.


    Thanks for the last tip.

    I can't think of any case where access to the files would be restricted to
    anybody--they would be available to all anonymous users. If that's what you
    meant by "secured", then that won't be an issue.

    >> B) I've decided user-contributed articles should not be submitted as
    >> HTML. I'd have some code to convert newlines to <br /> and otherwise
    >> convert the articles to HTML. I'm thinking about separate database
    >> columns for both the original and the converted articles. The advantages
    >> would be that the site would be fast because no conversion would be
    >> needed when serving the articles, and it would be possible for a
    >> contributor to obtain the original if they wanted to edit it and
    >> resubmit. I'd be curious if others thought this was the best approach.
    >>

    > Converting from html to plain text on the fly shouldn't be a problem. Here
    > is a good article that shows you how to do it with Regex.
    > http://www.4guysfromrolla.com/webtech/042501-1.shtml


    Thanks for the link, but the articles won't initially have HTML. They might
    be closer to Wikipedia articles that have their own coding, which must then
    be converted to HTML. Also, as I mentioned elsewhere, my conversion might be
    fairly involved and include color coding source code and other processing.

    > html version. You also might want to consider scanning the incoming
    > article text for any kind of scripting languages as well as any sql code.
    > That way you disarm someone if the intent is to mess with your website.
    > Wish I had some links for the scripting/sql scanning/removal code, but I
    > use DotNetNuke 5.2 which has the security features I talked about already
    > built in. Any html/scripting/sql code will be rendered as plain text.


    Right, that's the main reason I don't want to accept articles that already
    have HTML. (Any existing HTML not removed on review will simply be
    HTML-encoded.)

    Thanks.

    Jon
     
    Jonathan Wood, Feb 28, 2010
    #6
  7. Well, that doesn't add much to the discuss about the pros and cons of
    storing as files vs. storing in the database, but thanks for your comments.

    Jon

    "Mr. Arnold" <> wrote in message
    news:...
    > Jonathan Wood wrote:
    >> I'm thinking about building a site that would include user-contributed
    >> articles, file attachments, images, and maybe videos.
    >>
    >> I'm trying to determine the best way to store this data. Specifically:
    >>
    >> A) What are the pros and cons of storing the file attachments, images,
    >> and videos in a database vs. storing them as separate files?
    >>
    >> B) I've decided user-contributed articles should not be submitted as
    >> HTML. I'd have some code to convert newlines to <br /> and otherwise
    >> convert the articles to HTML. I'm thinking about separate database
    >> columns for both the original and the converted articles. The advantages
    >> would be that the site would be fast because no conversion would be
    >> needed when serving the articles, and it would be possible for a
    >> contributor to obtain the original if they wanted to edit it and
    >> resubmit. I'd be curious if others thought this was the best approach.
    >>
    >> I'm interested in hearing all ideas.

    >
    > Those are files and you need a control that does file uploads and
    > downloads. The files should be kept on a file share or file share array.
    >
    > The database table would only have a record that has the name of the file,
    > and the path to the file on the file share a pointer record.
    >
    > You need a viewer control so when the file is selected for viewing, it
    > takes the file extension and makes a determination as to what application
    > would be used to allow the viewing of the file. If is's a *.doc, then the
    > viewer is going to select MS Word to view the file, etc, etc.
    >
    > The various application used to view a file will reside on the Web server
    > like a file association that the Windows O/S does is most likely going to
    > be invoked.
    >
    > That's how you do a document repository that's controlled by a Web
    > application using a browser.
    >
    >
    >
     
    Jonathan Wood, Feb 28, 2010
    #7
  8. Jonathan Wood

    Andy B. Guest

    "Jonathan Wood" <> wrote in message
    news:...
    > Andy B. wrote:
    >
    >> There are lots of differences in storing in databases VS file system
    >> files. The main thing for me is if the files them selves are going to be
    >> somewhat on the huge size, consider file system storage. The good side of
    >> database storage is that your files/videos are secured by default (you
    >> have to do almost no extra work to keep them away from prying fingers)
    >> except the streaming factor from the database to the browser. In file
    >> system storage, you need to implement file security (and that can be
    >> quite a difficult area to deal with). Either way you decide, make sure
    >> not to put your files that will be accessed through the browser in the
    >> app_data folder. I tried this and coldn't figure out why nothing worked.
    >> Come to find out, it has special permissions on it preventing browser
    >> access of any kind.

    >
    > Thanks for the last tip.
    >
    > I can't think of any case where access to the files would be restricted to
    > anybody--they would be available to all anonymous users. If that's what
    > you meant by "secured", then that won't be an issue.
    >


    That's what I originally meant. The thing you would want to decide is this:
    1. Do you want the files to be accessed directly from a URL? Something like
    www.yourDomain.com/Filename.mv4. This would allow anyone to bookmark/put it
    in their favorites. Depending, this [might] cause some possible security
    issues but I wouldn't know at this point what ones they would be. or
    2. Have the website be a proxy for the files themselves. Possibly have a
    generic handler dealing with the files directly. This would give you better
    control on what happens to the files as they go to the browser stream.
    Meaning you can tell the server what to do with the files before they are
    downloaded. A good example would be when you see an mv4 file that is 300MB
    in size on a website, you download it but find out it was actually zipped
    and is now 200MB instead of the original uncompressed video. I want to use
    this idea for my rss feeds sometime down the road. or
    3. Database option. The link would be something like
    www.yourDomain.com/fileID=12948&compress=1. This would get the file from the
    database and set a query string flag to let you compress the file (if you
    wanted to provide that feature). Database option for files has some
    drawbacks:
    A. It takes more load on the server especially for larger files because the
    webserver has to make 2 trips. 1 to the database to get the file stream and
    1 to the browser to give the user their content. All of a sudden, a 300MB
    file just used 600MB of bandwidth. If you have to worry about bandwidth with
    huge files, this might not be for you.
    B. You have to write the code to get the files (binary) out of the database,
    figure out how to temporarly store it (if you don't just stream it directly)
    and then how you are going to get it to the user.
    - Are you going to directly stream it with Response.Write? or
    - Write it to the filesystem and generate a page with a link on it for
    download.
     
    Andy B., Mar 1, 2010
    #8
  9. On Feb 28, 4:30 pm, "Jonathan Wood" <> wrote:
    > Alexey Smirnov wrote:
    > > A) It depends on size of attachments and many other things. You should
    > > estimate how many users, files and traffic you will get. Imagine that
    > > your database would be 10 TB, are you sure that you can manage it? In
    > > most cases you will get an answer that going for a file share is the
    > > easiest and scalable way.

    >
    > What do you mean by "are you sure that you can manage it?" What part of
    > managing do you question if I can do? (Sure not disk space as that would be
    > needed if I used files as well.)
    >
    > > B) Replacing <br> on-the-fly should not be a performance problem

    >
    > There would be other changes too, which could possibly include color-coding
    > source code.
    >
    > Thanks.
    >
    > --
    > Jonathan Wood
    > SoftCircuits Programminghttp://www.softcircuits.com


    Managing large database could be a challenge. It's not only about disk
    space but also hardware and software requirements, and technical
    personnel. The database server for a large database would need more
    support than a file share.

    I have a project with a small 1GB SQL Server database where I have
    about 20,000 small pictures (~50-100 Kb). It works well. This way was
    chosen because of data replication over multiple web and SQL Servers,
    easy backup and security. In order to decrease the load on the
    database and to improve the system's performance, I use pre-cached
    files for most requested pictures.
     
    Alexey Smirnov, Mar 1, 2010
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    1,162
  2. Alexander
    Replies:
    620
    Views:
    8,484
    Seebs
    Nov 9, 2010
  3. Alexander
    Replies:
    68
    Views:
    1,546
  4. Ramon F Herrera
    Replies:
    3
    Views:
    6,536
    Cambridge Ray
    Jul 27, 2011
  5. Ryan Macy
    Replies:
    4
    Views:
    110
    Jan Svitok
    Jul 27, 2006
Loading...

Share This Page