extracting data from a file

Discussion in 'C Programming' started by Tony Clarke, Jul 1, 2003.

  1. Tony Clarke

    Tony Clarke Guest

    Hi All,

    I have been trying to extract data from a text file using the fscanf()
    and sscanf() functions. The file contains character and integer fields
    separated by semicolons. The problem is that the lines vary in length,
    and so do the fields within each line. Is there a way to check the first
    field and, depending on its value, extract data from particular fields
    on that line? An example of the data is given below. For instance, when
    the first field is "1031", I want to extract the time, "15:09:27", or
    some other details. Could anyone suggest an appropriate method for
    approaching this? I think the problem is that the lines are not all
    formatted the same way.

    1031;00005882;admin;5;Printer;2;103001-;STD;Lodg
    ;12.06.2003;15:09:27;13.06.2003;08:30:31;1;1

    1032;00005882;;;;;;;

    1040;00005882;12.06.2003;15:09:33;12.06.2003;17:01:21;1;0;;3;12400;0;;;12400
    ;0;0;;11366

    1041;00005882;1;1
     
    Tony Clarke, Jul 1, 2003
    #1

  2. On Tue, 1 Jul 2003 14:20:38 +0100, Tony Clarke <> wrote:
    > Hi All,
    >
    > I have been trying to extract data from a text file using the fscanf()
    > functions and sscanf() functions. The file is of various characters and
    > integers separated by semicolons, the problem I'm having is that each line
    > is of varying length and the fields separated by semicolons are of varying
    > length also. Is there a way that I could check the first field and depending
    > on this extract data from certain fields contained in this line. An example
    > of the type of information in the text file is given below. What I want to
    > do is depending on the first field i.e. "1031" extract the time i.e.
    > "15:09:27" or some other details. I'm just wondering if anyone could suggest
    > an appropriate method for approaching this. I think the problem is that each
    > line is not formatted the same.
    >


    You have to be able to describe your desired system. Try writing some
    pseudo-code and work out what you need to do, *then* write the program.

    Sounds to me like you need to check a field (based on how many ';' have
    been read in). If it's, say, the 4th field, then read the rest;
    otherwise just skip on through until you hit '\n' and then start
    testing again.

    Anyway, write the pseudo-code first!
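
    The field-counting idea above could be sketched roughly as follows. This
    is only an illustration: the function name copy_field and the 0-based
    field numbering are assumptions made for the example, and it treats each
    record as a single line (the wrapped lines in the sample are assumed to
    come from the posting software).

```c
#include <stddef.h>

/* Hypothetical helper: copy field number `want` (0-based) of a
 * ';'-separated line into out, counting ';' characters on the way.
 * Returns 1 if the field exists (it may be empty), 0 if the line has
 * fewer fields. The copy is truncated to outsize - 1 characters. */
int copy_field(const char *line, int want, char *out, size_t outsize)
{
    int field = 0;   /* number of ';' seen so far */
    size_t n = 0;    /* characters copied so far */

    if (outsize == 0)
        return 0;
    for (; *line != '\0' && *line != '\n'; line++) {
        if (*line == ';') {
            if (field == want)
                break;              /* end of the wanted field */
            field++;
        } else if (field == want && n + 1 < outsize) {
            out[n++] = *line;       /* inside the wanted field */
        }
    }
    out[n] = '\0';
    return field == want;
}
```

    With the sample data, copy_field(line, 0, ...) gives the record type, and
    for a "1031" record the time is field 10 when counting from zero.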

    Good luck,

    --
    Ben Fitzgerald
    London, UK
     
    Ben Fitzgerald, Jul 1, 2003
    #2

  3. Dan Pop

    Dan Pop Guest

    In <kTfMa.20832$> "Tony Clarke" <> writes:

    >I have been trying to extract data from a text file using the fscanf()
    >functions and sscanf() functions. The file is of various characters and
    >integers separated by semicolons, the problem I'm having is that each line
    >is of varying length and the fields separated by semicolons are of varying
    >length also. Is there a way that I could check the first field and depending
    >on this extract data from certain fields contained in this line. An example
    >of the type of information in the text file is given below. What I want to
    >do is depending on the first field i.e. "1031" extract the time i.e.
    >"15:09:27" or some other details. I'm just wondering if anyone could suggest
    >an appropriate method for approaching this. I think the problem is that each
    >line is not formatted the same.
    >
    >1031;00005882;admin;5;Printer;2;103001-;STD;Lodg
    >;12.06.2003;15:09:27;13.06.2003;08:30:31;1;1
    >
    >1032;00005882;;;;;;;
    >
    >1040;00005882;12.06.2003;15:09:33;12.06.2003;17:01:21;1;0;;3;12400;0;;;12400
    >;0;0;;11366
    >
    >1041;00005882;1;1


    The easiest solution is to use a regexp (regular expression) library.
    There are some portable ones floating around.

    Depending on what the rest of the application consists of, you may want
    to use a language with built-in support for regular expressions, like
    Perl.
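
    No particular library is named above. As one possible illustration, the
    sketch below assumes the POSIX <regex.h> interface (regcomp, regexec,
    regfree), which is not part of standard C and may not exist on every
    platform. The function name find_time and the hh:mm:ss pattern are
    choices made for the example.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first hh:mm:ss substring in line and
 * copy it into out. Returns 1 on a match that fits in out, 0 otherwise.
 * Assumes POSIX extended regular expressions are available. */
int find_time(const char *line, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m;
    int found = 0;

    if (regcomp(&re, "[0-9]{2}:[0-9]{2}:[0-9]{2}", REG_EXTENDED) != 0)
        return 0;                       /* pattern failed to compile */
    if (regexec(&re, line, 1, &m, 0) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        if (len < outsize) {
            memcpy(out, line + m.rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

    This finds a time wherever it sits on the line, but it does not check the
    record type in the first field; that test would still have to be done
    separately.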

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
     
    Dan Pop, Jul 1, 2003
    #3
  4. Morris Dovey

    Morris Dovey Guest

    Tony Clarke wrote:

    > What I want to do is depending on the first field i.e. "1031"
    > extract the time i.e. "15:09:27" or some other details. I'm
    > just wondering if anyone could suggest an appropriate method
    > for approaching this. I think the problem is that each line is
    > not formatted the same.


    Tony...

    I have some code at http://www.iedu.com/mrd/c/tokenize.c and
    http://www.iedu.com/mrd/c/tokfile.c that might provide a usable
    approach to the problem.

    --
    Morris Dovey
    West Des Moines, Iowa USA
    C links at http://www.iedu.com/c
     
    Morris Dovey, Jul 1, 2003
    #4
  5. Kevin Easton

    Kevin Easton Guest

    Tony Clarke <> wrote:
    > Hi All,
    >
    > I have been trying to extract data from a text file using the fscanf()
    > functions and sscanf() functions. The file is of various characters and
    > integers separated by semicolons, the problem I'm having is that each line
    > is of varying length and the fields separated by semicolons are of varying
    > length also. Is there a way that I could check the first field and depending
    > on this extract data from certain fields contained in this line. An example
    > of the type of information in the text file is given below. What I want to
    > do is depending on the first field i.e. "1031" extract the time i.e.
    > "15:09:27" or some other details. I'm just wondering if anyone could suggest
    > an appropriate method for approaching this. I think the problem is that each
    > line is not formatted the same.
    >
    > 1031;00005882;admin;5;Printer;2;103001-;STD;Lodg
    > ;12.06.2003;15:09:27;13.06.2003;08:30:31;1;1
    >
    > 1032;00005882;;;;;;;
    >
    > 1040;00005882;12.06.2003;15:09:33;12.06.2003;17:01:21;1;0;;3;12400;0;;;12400
    > ;0;0;;11366
    >
    > 1041;00005882;1;1


    Read each line using fgets(), then use strchr() to find the ';'
    characters, replacing them with a '\0' and retaining a pointer to the
    following character. Then you'll end up with something like:

    f1 => "1031"
    f2 => "00005882"
    f3 => "admin"
    f4 => "5"
    f5 => "Printer"
    f6 => "2"
    f7 => "103001-"
    f8 => "STD"
    f9 => "Lodg"

    where f1 to f9 are char * objects, and => denotes what they are pointing
    to.
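
    A minimal sketch of that splitting step might look like the code below.
    The name split_fields and the array-of-pointers interface are assumptions
    made for the example (the post uses separate f1 to f9 pointers), and the
    line is modified in place.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place at each ';', storing a
 * pointer to the start of every field in fields[]. At most max
 * pointers are stored; any fields beyond that are dropped.
 * Returns the number of pointers stored. */
size_t split_fields(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';  /* drop the newline fgets() keeps */
    while (n < max) {
        char *sep = strchr(p, ';');
        fields[n++] = p;                 /* field starts here */
        if (sep == NULL)
            break;                       /* that was the last field */
        *sep = '\0';                     /* terminate the field */
        p = sep + 1;                     /* next field follows the ';' */
    }
    return n;
}
```

    Unlike strtok(), which treats a run of delimiters as one, this keeps
    empty fields, so a line such as "1032;00005882;;;;;;;" yields nine
    fields, seven of them empty strings. After splitting, the record type can
    be tested with something like strcmp(fields[0], "1031") == 0.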

    After that it should be pretty simple to code the logic you want. You
    will need to decide what to do about really long lines - depending on
    your data source, you may be able to set a fixed maximum line length and
    silently or non-silently truncate or ignore lines that exceed it.
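
    One way to detect over-long lines with fgets() is sketched below. The
    function name read_line and its return codes are assumptions made for
    the example. An over-long line is reported and its remainder discarded,
    so the next call starts at the following line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf.
 * Returns 1 for a usable line, 0 at end of file or on a read error,
 * and -1 if the line did not fit (buf then holds its first part and
 * the rest of the line has been skipped). */
int read_line(FILE *fp, char *buf, size_t size)
{
    int c;

    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    if (strchr(buf, '\n') != NULL)
        return 1;               /* whole line, newline included */

    /* No newline in buf: see whether anything is left on this line. */
    c = getc(fp);
    if (c == EOF || c == '\n')
        return 1;               /* the text fit; only the newline did not */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;                       /* discard the remainder of the line */
    return -1;
}
```

    Whether a -1 result should be skipped quietly or reported is the policy
    decision described above.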

    - Kevin.
     
    Kevin Easton, Jul 2, 2003
    #5
