Text Parser Help Please

Discussion in 'Ruby' started by Bucco, Jul 2, 2006.

  1. Bucco

    Bucco Guest

    I am trying to put together a simple script that will parse a text file
    that contains a list of tasks. Each line could be different in format
    from the other. Most lines have words that are marked and can be
    pulled out with a regex. Here is a simple example:

    (A) @home Mow lawn d:6/30/06
    @phone call home
    (B) p:program @pc @desk Add text parser to the program

    Basically, each line is a task in a list of todos. They can have one
    of three priority rankings (A), (B), or (C). The priority is always
    first on the line if it is present. Then there can be a project name
    that the task is related to, "p:program". The next item on the line is
    a context and starts with the @ symbol. Each task can have more than
    one context. After this is the task description that consists of one
    or more words and has no definitive marker. Some tasks may have a due
    date after the task that is marked by a d: followed by a date.
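
Because the fields appear in a fixed order (priority, project, contexts, description, due date), the whole line can be described by one regular expression with named captures. The following is only a sketch under that ordering assumption; the names `TASK_LINE` and `parse_task` are made up for illustration and do not come from the thread.

```ruby
# Hypothetical pattern for the line format described above.
# Assumes the fields always appear in this order.
TASK_LINE = /
  \A
  (?:\((?<priority>[ABC])\)\s+)?   # optional priority, always first
  (?:p:(?<project>\S+)\s+)?        # optional project name
  (?<contexts>(?:@\S+\s+)*)        # zero or more contexts
  (?<description>.*?)              # free-form description, no marker
  (?:\s+d:(?<due>\S+))?            # optional due date at the end
  \z
/x

# Hypothetical helper: turns one line into a hash of its fields.
def parse_task(line)
  m = TASK_LINE.match(line.strip)
  return nil unless m

  {
    priority: m[:priority],
    project: m[:project],
    contexts: m[:contexts].split.map { |c| c.sub(/\A@/, "") },
    description: m[:description],
    due: m[:due]
  }
end
```

Fields that are absent from a line come back as `nil` (or an empty array for contexts), so the caller does not need a separate branch per marker.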

    So basically, the program will read in the text file and process each
    line so that each task is written to a new project file, due file,
    and/or context file. When processing each line, I thought of breaking
    it down by whitespace into an array, then using a regex to match the
    easy items, remove them from the array, and use them as hash keys for
    the task.

    I guess the best way might be to extract each marker, assign it to a hash
    as a key, and then extract the task and assign it to the hash as the
    value. I can't seem to get to this point without a lot of if
    statements. I was wondering if anyone else had a cleaner way of doing
    this.
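
One way to avoid a chain of if statements is to keep the markers in a lookup table and let the table drive the matching. The sketch below assumes the line is split on whitespace as described; `MARKERS`, `extract`, and `index_tasks` are hypothetical names, and any token that looks like a marker is treated as one, wherever it appears on the line.

```ruby
# Hypothetical marker table: field name => pattern that recognizes a token.
# Supporting a new marker means adding a row here, not another `if`.
MARKERS = {
  priority: /\A\(([ABC])\)\z/,
  project:  /\Ap:(\S+)\z/,
  context:  /\A@(\S+)\z/,
  due:      /\Ad:(\S+)\z/
}

# Splits a line on whitespace, pulls out the marked tokens, and returns
# [fields, task] where task is whatever words were left over.
def extract(line)
  fields = Hash.new { |h, k| h[k] = [] }
  words = line.split.reject do |token|
    name, pattern = MARKERS.find { |_, re| re.match?(token) }
    fields[name] << token[pattern, 1] if name
    name # truthy for marker tokens, so they are removed from the words
  end
  [fields, words.join(" ")]
end

# Builds a hash keyed by [marker, value] whose values are lists of tasks.
def index_tasks(lines)
  index = Hash.new { |h, k| h[k] = [] }
  lines.each do |line|
    fields, task = extract(line)
    fields.each do |name, values|
      values.each { |value| index[[name, value]] << task }
    end
  end
  index
end
```

Each key such as `[:context, "pc"]` or `[:project, "program"]` could then name an output file, with the task list under that key written into it.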

    Thanks:)
    SA
     
    Bucco, Jul 2, 2006
    #1

  2. Bucco

    vasudevram Guest

    If the format of the text file is not hardwired (i.e. you have the
    freedom to change it), why not try this:

    Instead of your current format, use Ruby data (hashes, arrays, etc.) as
    the format for the task list content - in a text file. That way, you
    can directly read in the content - which is Ruby code - and eval it.
    Almost no programming needed for parsing this way - the Ruby
    interpreter will do the parsing for you. All you have to do is design
    the data structure and a little code to read in and eval the text file.
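
A minimal sketch of what this could look like, assuming the task file holds a single Ruby array literal of hashes. The layout, the key names, and the helper names are all invented for illustration; in practice the text would come from something like `File.read` rather than a heredoc.

```ruby
# Hypothetical contents of a task file written as Ruby data.
TASKS_SOURCE = <<~RUBY
  [
    { priority: "A", contexts: ["home"], task: "Mow lawn", due: "6/30/06" },
    { contexts: ["phone"], task: "call home" },
    { priority: "B", project: "program", contexts: ["pc", "desk"],
      task: "Add text parser to the program" }
  ]
RUBY

# eval runs whatever code the text contains, so this is only reasonable
# for files you wrote yourself.
def load_tasks(source)
  eval(source)
end

# Groups task descriptions by context name.
def group_by_context(tasks)
  grouped = Hash.new { |h, k| h[k] = [] }
  tasks.each do |t|
    t[:contexts].each { |c| grouped[c] << t[:task] }
  end
  grouped
end
```

The Ruby interpreter does all of the parsing here; the only code left to write is the grouping.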

    Vasudev
    ---
    Vasudev Ram
    Independent software consultant
    http://www.geocities.com/vasudevram
    PDF conversion toolkit:
    http://sourceforge.net/projects/xtopdf
    ---


     
    vasudevram, Jul 2, 2006
    #2

  3. Bucco

    ccahua Guest

    Hi,

    I'm still learning to 'put' :), but I found this script very handy and
    it might fit your needs.
    My Fiendish Plan - http://www.sedumphotos.net/nfagerlund/fmp/ from Mr.
    Fagerlund

    When run, it parses lines prefixed with a ^ symbol and category name
    exporting them as separate text files. A text file with all your todos
    categorized by project, context or whatever category is broken out into
    their respective text files.

    Example: ^project Learn Ruby in 10 years becomes project.txt with
    'Learn Ruby in 10 years' as the content.

    hth,
    tony


     
    ccahua, Jul 2, 2006
    #3
  4. Bucco

    snowball Guest

    vasudevram wrote:
    > If the format of the text file is not hardwired (i.e. you have the
    > freedom to change it), why not try this:
    >
    > Instead of your current format, use Ruby data (hashes, arrays, etc.) as
    > the format for the task list content - in a text file. That way, you
    > can directly read in the content - which is Ruby code - and eval it.
    > Almost no programming needed for parsing this way - the Ruby
    > interpreter will do the parsing for you. All you have to do is design
    > the data structure and a little code to read in and eval the text file.
    >


    Another option might be to write the file in yaml (http://www.yaml.org)
    and parse the data into Ruby using Syck.
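
A rough sketch of the YAML route, assuming a layout of one mapping per task (the key names are invented for illustration). In current Ruby versions `require "yaml"` loads Psych rather than Syck, so the example uses the standard `YAML` module without naming either backend.

```ruby
require "yaml"

# Hypothetical task list in YAML; the layout is an assumption.
TASKS_YAML = <<~YAML
  - priority: A
    contexts: [home]
    task: Mow lawn
    due: "6/30/06"
  - contexts: [phone]
    task: call home
  - priority: B
    project: program
    contexts: [pc, desk]
    task: Add text parser to the program
YAML

# Returns an array of hashes with string keys.
def load_tasks_yaml(text)
  YAML.safe_load(text)
end
```

Unlike `eval`, loading YAML this way only builds plain data (arrays, hashes, strings), which is safer for files that other people may edit.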
     
    snowball, Jul 2, 2006
    #4
  5. Bucco

    Bucco Guest

    snowball wrote:

    > Another option might be to write the file in yaml (http://www.yaml.org)
    > and parse the data into Ruby using Syck.


    I do not disagree that changing the format of the text file would be
    easier, but that is not an option at this time. I think if I can
    extract the marked words, I could then use them as keys in a hash with
    the task as the value. I just can't think of an easy way to do it
    without a lot of if statements.

    Thanks:)
    SA
     
    Bucco, Jul 2, 2006
    #5
  6. Bucco

    Guest

    Perl has a great parser, much like yacc, written by Damian
    Conway. There is a book out that describes using it as well.
    I don't think Ruby has this type of thing yet, but it would be nice.
    I have used the Perl parser and it works great once you figure it
    out; I have also used yacc, which is similar. It's
    based on compiler theory. You could build a C, Java, or Ruby parser
    with it or use it for simpler parsing.

    Here is the URL:

    http://search.cpan.org/dist/Parse-RecDescent/lib/Parse/RecDescent.pod
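
For a grammar as small as the task format in this thread, a hand-written scanner that consumes the line left to right, in grammar order, may be enough. The sketch below uses `StringScanner` from Ruby's standard library; it is not a port of Parse::RecDescent, and `scan_task` is a hypothetical name.

```ruby
require "strscan"

# Walks one task line in grammar order:
#   line := priority? project? context* description due?
def scan_task(line)
  s = StringScanner.new(line.strip)
  task = { contexts: [] }

  # priority := "(" ("A" | "B" | "C") ")"
  task[:priority] = s[1] if s.scan(/\(([ABC])\)\s*/)
  # project := "p:" word
  task[:project] = s[1] if s.scan(/p:(\S+)\s*/)
  # context := "@" word, repeated
  task[:contexts] << s[1] while s.scan(/@(\S+)\s*/)

  # Whatever remains is the description, optionally ending in "d:" date.
  rest = s.rest
  if (m = rest.match(/(?:\A|\s)d:(\S+)\z/))
    task[:description] = m.pre_match.strip
    task[:due] = m[1]
  else
    task[:description] = rest
  end
  task
end
```

Each rule is a single `scan` call, so adding a new marker to the grammar is one more line rather than a restructured set of conditionals.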


     
    , Jul 3, 2006
    #6
  7. Bucco

    Bucco Guest

    Exactly what I was looking for. This would allow me to dump to
    specific files based upon different parameters. Thank you all for your
    help.

    Thanks:)
    SA
     
    Bucco, Jul 3, 2006
    #7
