Reading files, splitting on a delimiter and newlines.

Discussion in 'Python' started by Bruno Desthuilliers, Jul 22, 2007.

  1. wrote:
    > Hello,
    >
    > I have a situation where I have a file that contains text similar to:
    >
    > myValue1 = contents of value1
    > myValue2 = contents of value2 but
    > with a new line here
    > myValue3 = contents of value3
    >
    > My first approach was to open the file, use readlines to split the
    > lines on the "=" delimiter into a key/value pair (to be stored in a
    > dict).
    >
    > After processing a couple of files I noticed it's possible that a newline
    > can be present in the value as shown in myValue2.
    >
    > In this case it's not an option to, say, remove the newlines if it's a
    > "multi line" value as the value data needs to stay intact.
    >
    > I'm a bit confused as to how to go about getting this to work.
    >
    > Any suggestions on an approach would be greatly appreciated!
    >


    data = {}
    key = None
    with open('yourfile.txt') as f:
        for line in f:
            line = line.strip()
            if not line:
                # skip empty lines
                continue
            if '=' in line:
                # split on the first '=' only, so a value may contain '='
                key, value = map(str.strip, line.split('=', 1))
                data[key] = value
            elif key is None:
                # first non-empty line without a '='
                raise ValueError("invalid format")
            else:
                # multiline: continuation of the previous value
                data[key] += "\n" + line

    print(data)
    # => {'myValue1': 'contents of value1',
    #     'myValue2': 'contents of value2 but\nwith a new line here',
    #     'myValue3': 'contents of value3'}
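A minimal sketch of the same loop wrapped in a function (the name parse_lines is hypothetical) that accepts any iterable of lines, so it can be tried on an in-memory sample via io.StringIO instead of a real file:

```python
import io


def parse_lines(lines):
    """Parse 'key = value' lines; a line without '=' continues the previous value."""
    data = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if '=' in line:
            # split on the first '=' only, so a value may contain '='
            key, value = map(str.strip, line.split('=', 1))
            data[key] = value
        elif key is None:
            # a continuation line with no key before it
            raise ValueError("invalid format")
        else:
            # continuation of a multiline value
            data[key] += "\n" + line
    return data


sample = io.StringIO(
    "myValue1 = contents of value1\n"
    "myValue2 = contents of value2 but\n"
    "with a new line here\n"
    "myValue3 = contents of value3\n"
)
print(parse_lines(sample))
```

One consequence worth noting: because each line is strip()ped, any leading indentation on continuation lines is lost. If that whitespace must stay intact, strip only the trailing newline (line.rstrip('\n')) before appending.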

    HTH
    Bruno Desthuilliers, Jul 22, 2007

  2. wrote:
    > On Jul 25, 8:46 am, wrote:
    >
    >>Hello,
    >>
    >>I have a situation where I have a file that contains text similar to:
    >>
    >>myValue1 = contents of value1
    >>myValue2 = contents of value2 but
    >> with a new line here
    >>myValue3 = contents of value3
    >>
    >>My first approach was to open the file, use readlines to split the
    >>lines on the "=" delimiter into a key/value pair (to be stored in a
    >>dict).
    >>
    >>After processing a couple of files I noticed it's possible that a newline
    >>can be present in the value as shown in myValue2.
    >>
    >>In this case it's not an option to, say, remove the newlines if it's a
    >>"multi line" value as the value data needs to stay intact.
    >>
    >>I'm a bit confused as to how to go about getting this to work.
    >>
    >>Any suggestions on an approach would be greatly appreciated!
    >
    > Check the length of the list returned from split; this allows
    > you to append to the previously extracted value if need be.
    >
    > import io
    > import pprint
    >
    > buf = """\
    > myValue1 = contents of value1
    > myValue2 = contents of value2 but
    > with a new line here
    > myValue3 = contents of value3
    > """
    >
    > mockfile = io.StringIO(buf)
    >
    > record = dict()
    >
    > for line in mockfile:
    >     kvpair = line.split('=', 2)


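    You want:
    kvpair = line.split('=', 1)

    With a maxsplit of 2, a value that itself contains '=' yields three
    items, so len(kvpair) != 2 and the line is wrongly appended to the
    previous value:

    >>> toto = "x = 42 = 33"
    >>> toto.split('=', 2)
    ['x ', ' 42 ', ' 33']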


    >     if len(kvpair) == 2:
    >         key, value = kvpair
    >         record[key] = value
    >     else:
    >         record[key] += line


    Also, this won't handle the case where the first line doesn't contain an
    '=' (NameError: name 'key' is not defined).
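Putting both corrections together, a minimal sketch of the length-check loop might look like this; parse_records and its error message are hypothetical names chosen for illustration. Keys and values are left exactly as split, spaces and newlines included, as in the original loop:

```python
import io


def parse_records(lines):
    record = {}
    key = None  # no key seen yet
    for line in lines:
        kvpair = line.split('=', 1)  # at most two items, even if the value has '='
        if len(kvpair) == 2:
            key, value = kvpair
            record[key] = value
        elif key is None:
            # explicit error instead of a NameError on the first line
            raise ValueError("continuation line before any key: %r" % line)
        else:
            record[key] += line  # keep the raw text, newline included
    return record


buf = io.StringIO("x = 42 = 33\nmore text\n")
print(parse_records(buf))
```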
    Bruno Desthuilliers, Jul 23, 2007

  3. Original poster

    Hello,

    I have a situation where I have a file that contains text similar to:

    myValue1 = contents of value1
    myValue2 = contents of value2 but
    with a new line here
    myValue3 = contents of value3

    My first approach was to open the file, use readlines to split the
    lines on the "=" delimiter into a key/value pair (to be stored in a
    dict).

    After processing a couple of files I noticed it's possible that a newline
    can be present in the value as shown in myValue2.

    In this case it's not an option to, say, remove the newlines if it's a
    "multi line" value as the value data needs to stay intact.

    I'm a bit confused as to how to go about getting this to work.

    Any suggestions on an approach would be greatly appreciated!
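To make the problem concrete, here is a minimal sketch of the first approach as described (naive_parse is a hypothetical name; the list stands in for what readlines() would return). The continuation line contains no '=', so split() returns a single item and unpacking it into a key and a value raises ValueError:

```python
def naive_parse(lines):
    """Split every line on '=' and store the pair; no continuation handling."""
    result = {}
    for line in lines:
        key, value = line.split('=', 1)  # ValueError if the line has no '='
        result[key.strip()] = value.strip()
    return result


sample = [
    "myValue1 = contents of value1\n",
    "myValue2 = contents of value2 but\n",
    "with a new line here\n",
    "myValue3 = contents of value3\n",
]

try:
    naive_parse(sample)
except ValueError as exc:
    # "with a new line here" splits into one item, which cannot be unpacked
    print("failed:", exc)
```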
    , Jul 25, 2007
  4. Mike

    On Jul 25, 10:46 am, wrote:
    > Hello,
    >
    > I have a situation where I have a file that contains text similar to:
    >
    > myValue1 = contents of value1
    > myValue2 = contents of value2 but
    > with a new line here
    > myValue3 = contents of value3
    >
    > My first approach was to open the file, use readlines to split the
    > lines on the "=" delimiter into a key/value pair (to be stored in a
    > dict).
    >
    > After processing a couple of files I noticed it's possible that a newline
    > can be present in the value as shown in myValue2.
    >
    > In this case it's not an option to, say, remove the newlines if it's a
    > "multi line" value as the value data needs to stay intact.
    >
    > I'm a bit confused as to how to go about getting this to work.
    >
    > Any suggestions on an approach would be greatly appreciated!


    I'm confused. You don't want the newline to be present, but you can't
    remove it because the data has to stay intact? If you don't want to
    change it, then what's the problem?

    Mike
    , Jul 25, 2007
  5. Stargaming

    On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:

    > On Jul 25, 10:46 am, wrote:
    >> Hello,
    >>
    >> I have a situation where I have a file that contains text similar to:
    >>
    >> myValue1 = contents of value1
    >> myValue2 = contents of value2 but
    >> with a new line here
    >> myValue3 = contents of value3
    >>
    >> My first approach was to open the file, use readlines to split the
    >> lines on the "=" delimiter into a key/value pair (to be stored in a
    >> dict).
    >>
    >> After processing a couple of files I noticed it's possible that a newline
    >> can be present in the value as shown in myValue2.
    >>
    >> In this case it's not an option to, say, remove the newlines if it's a
    >> "multi line" value as the value data needs to stay intact.
    >>
    >> I'm a bit confused as to how to go about getting this to work.
    >>
    >> Any suggestions on an approach would be greatly appreciated!
    >
    > I'm confused. You don't want the newline to be present, but you can't
    > remove it because the data has to stay intact? If you don't want to
    > change it, then what's the problem?
    >
    > Mike


    It's obvious that simple line-by-line filtering won't handle multi-line
    statements.

    You could solve that by saving the last item you added something to and,
    if the line currently being handled doesn't look like an assignment,
    appending it to that item. You might run into problems with data such as:

    foo = modern maths
    proved that 1 = 1
    bar = single

    If your dataset always has indentation on subsequent lines, you might use
    that; likewise if the key's name is always just one word.
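A minimal sketch of the indentation variant, assuming (hypothetically) that every continuation line starts with a space or tab while every new assignment starts at column 0; parse_indented is an illustrative name. Under that assumption, an '=' inside a continuation line no longer matters:

```python
def parse_indented(lines):
    data = {}
    key = None
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            continue  # skip blank lines
        if line[0] in ' \t':
            # indented: continuation of the previous value, '=' or not
            if key is None:
                raise ValueError("indented line before any key: %r" % line)
            data[key] += '\n' + line.strip()
        else:
            # unindented: must be a new 'key = value' assignment
            key, sep, value = line.partition('=')
            if not sep:
                raise ValueError("expected 'key = value': %r" % line)
            key = key.strip()
            data[key] = value.strip()
    return data


sample = [
    "foo = modern maths\n",
    "    proved that 1 = 1\n",
    "bar = single\n",
]
print(parse_indented(sample))
```

The indentation itself is dropped from the stored value here; keep it instead if it is part of the data.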

    HTH,
    Stargaming
    Stargaming, Jul 25, 2007
  6. John Machin

    On Jul 26, 3:08 am, Stargaming <> wrote:
    > On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:
    > > On Jul 25, 10:46 am, wrote:
    > >> Hello,
    > >>
    > >> I have a situation where I have a file that contains text similar to:
    > >>
    > >> myValue1 = contents of value1
    > >> myValue2 = contents of value2 but
    > >> with a new line here
    > >> myValue3 = contents of value3
    > >>
    > >> My first approach was to open the file, use readlines to split the
    > >> lines on the "=" delimiter into a key/value pair (to be stored in a
    > >> dict).
    > >>
    > >> After processing a couple of files I noticed it's possible that a newline
    > >> can be present in the value as shown in myValue2.
    > >>
    > >> In this case it's not an option to, say, remove the newlines if it's a
    > >> "multi line" value as the value data needs to stay intact.
    > >>
    > >> I'm a bit confused as to how to go about getting this to work.
    > >>
    > >> Any suggestions on an approach would be greatly appreciated!
    > >
    > > I'm confused. You don't want the newline to be present, but you can't
    > > remove it because the data has to stay intact? If you don't want to
    > > change it, then what's the problem?
    > >
    > > Mike
    >
    > It's obvious that simple line-by-line filtering won't handle multi-line
    > statements.
    >
    > You could solve that by saving the last item you added something to and,
    > if the line currently being handled doesn't look like an assignment,
    > appending it to that item. You might run into problems with data such as:
    >
    > foo = modern maths
    > proved that 1 = 1
    > bar = single
    >
    > If your dataset always has indentation on subsequent lines, you might use
    > that; likewise if the key's name is always just one word.
    >


    My take: all of the above, plus: given that you want to extract stuff
    of the form <LHS> = <RHS>, I'd suggest developing a fairly precise
    regular expression for LHS, maybe even for RHS, and trying it on as
    many of these files as you can.
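A minimal sketch of the LHS regular-expression idea, assuming (hypothetically) that a key is a Python-style identifier at the very start of the line. Any line whose left-hand side does not match is treated as a continuation, so Stargaming's "proved that 1 = 1" line is no longer mistaken for an assignment. The ASSIGNMENT pattern and parse_with_regex are illustrative names; the pattern would need tuning against real files:

```python
import re

# Hypothetical LHS rule: an identifier at the start of the line, then '='.
ASSIGNMENT = re.compile(r'^([A-Za-z_]\w*)[ \t]*=[ \t]*(.*)$')


def parse_with_regex(lines):
    data = {}
    key = None
    for line in lines:
        line = line.rstrip('\n')
        match = ASSIGNMENT.match(line)
        if match:
            key, value = match.groups()
            data[key] = value
        elif key is None:
            raise ValueError("no key before line: %r" % line)
        else:
            data[key] += '\n' + line  # continuation, kept as-is
    return data


sample = [
    "foo = modern maths\n",
    "proved that 1 = 1\n",
    "bar = single\n",
]
print(parse_with_regex(sample))
```

This is only as good as the LHS rule: the next example in this post still defeats it, because "REs = trouble" has a perfectly valid-looking left-hand side.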

    Why an RE for RHS? Consider:

    foo = somebody said "I think that
    REs = trouble
    maybe_better = pyparsing"

    :)
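One way to act on the RHS point, sketched under the assumption that values may be wrapped in double quotes that never nest and are never escaped: while a double quote is open, every following line belongs to the current value, whatever it looks like. parse_quoted and the odd-count rule are illustrative only; a real format would need its own quoting rules, or a parser library such as pyparsing, as suggested above:

```python
def parse_quoted(lines):
    data = {}
    key = None
    in_quotes = False  # True while a '"' in the current value is unclosed
    for line in lines:
        line = line.rstrip('\n')
        if not in_quotes and '=' in line:
            key, value = map(str.strip, line.split('=', 1))
            data[key] = value
        elif key is None:
            raise ValueError("no key before line: %r" % line)
        else:
            value = line
            data[key] += '\n' + line
        # an odd number of '"' in this value text toggles the open/closed state
        if value.count('"') % 2 == 1:
            in_quotes = not in_quotes
    return data


sample = [
    'foo = somebody said "I think that\n',
    'REs = trouble\n',
    'maybe_better = pyparsing"\n',
]
print(parse_quoted(sample))
```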
    John Machin, Jul 25, 2007
  7. Steven

    On Jul 25, 8:46 am, wrote:
    > Hello,
    >
    > I have a situation where I have a file that contains text similar to:
    >
    > myValue1 = contents of value1
    > myValue2 = contents of value2 but
    > with a new line here
    > myValue3 = contents of value3
    >
    > My first approach was to open the file, use readlines to split the
    > lines on the "=" delimiter into a key/value pair (to be stored in a
    > dict).
    >
    > After processing a couple of files I noticed it's possible that a newline
    > can be present in the value as shown in myValue2.
    >
    > In this case it's not an option to, say, remove the newlines if it's a
    > "multi line" value as the value data needs to stay intact.
    >
    > I'm a bit confused as to how to go about getting this to work.
    >
    > Any suggestions on an approach would be greatly appreciated!

    Check the length of the list returned from split; this allows
    you to append to the previously extracted value if need be.

    import io
    import pprint

    buf = """\
    myValue1 = contents of value1
    myValue2 = contents of value2 but
    with a new line here
    myValue3 = contents of value3
    """

    mockfile = io.StringIO(buf)

    record = dict()

    for line in mockfile:
        kvpair = line.split('=', 2)
        if len(kvpair) == 2:
            key, value = kvpair
            record[key] = value
        else:
            # no '=' on this line: it continues the previous value
            record[key] += line

    pprint.pprint(record)

    # strip() keys and values to remove surrounding spaces and newlines if needed ...
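The closing comment points at a cleanup step. For the sample above, the loop stores keys such as 'myValue1 ' (with a trailing space) and values with a leading space and a trailing newline. One way to tidy them, sketched here rather than taken from the thread, is a single pass with str.strip() after the loop; strip() removes only leading and trailing whitespace, so the newline inside the multi-line value survives:

```python
# Raw result of the loop above for this sample: each key keeps the space
# before '=', and each value keeps the space after '=' plus its newlines.
record = {
    'myValue1 ': ' contents of value1\n',
    'myValue2 ': ' contents of value2 but\nwith a new line here\n',
    'myValue3 ': ' contents of value3\n',
}

# strip() trims only the ends, so the inner '\n' of myValue2 is kept
cleaned = {key.strip(): value.strip() for key, value in record.items()}
print(cleaned)
```

If a value may legitimately start or end with spaces, value.rstrip('\n') is the more conservative choice.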

    --
    Hope this helps,
    Steven
    , Jul 26, 2007
  8. Original poster

    On Jul 25, 7:56 pm, ""
    <> wrote:
    > On Jul 25, 8:46 am, wrote:
    >
    > > Hello,
    > >
    > > I have a situation where I have a file that contains text similar to:
    > >
    > > myValue1 = contents of value1
    > > myValue2 = contents of value2 but
    > > with a new line here
    > > myValue3 = contents of value3
    > >
    > > My first approach was to open the file, use readlines to split the
    > > lines on the "=" delimiter into a key/value pair (to be stored in a
    > > dict).
    > >
    > > After processing a couple of files I noticed it's possible that a newline
    > > can be present in the value as shown in myValue2.
    > >
    > > In this case it's not an option to, say, remove the newlines if it's a
    > > "multi line" value as the value data needs to stay intact.
    > >
    > > I'm a bit confused as to how to go about getting this to work.
    > >
    > > Any suggestions on an approach would be greatly appreciated!
    >
    > Check the length of the list returned from split; this allows
    > you to append to the previously extracted value if need be.
    >
    > import io
    > import pprint
    >
    > buf = """\
    > myValue1 = contents of value1
    > myValue2 = contents of value2 but
    > with a new line here
    > myValue3 = contents of value3
    > """
    >
    > mockfile = io.StringIO(buf)
    >
    > record = dict()
    >
    > for line in mockfile:
    >     kvpair = line.split('=', 2)
    >     if len(kvpair) == 2:
    >         key, value = kvpair
    >         record[key] = value
    >     else:
    >         # no '=' on this line: it continues the previous value
    >         record[key] += line
    >
    > pprint.pprint(record)
    >
    > # strip() keys and values to remove surrounding spaces and newlines if needed ...
    >
    > --
    > Hope this helps,
    > Steven


    Great, thank you! That was the logic I was looking for.
    , Jul 26, 2007
  9. <> wrote:

    > On Jul 25, 10:46 am, wrote:
    > > Hello,
    > >
    > > I have a situation where I have a file that contains text similar to:
    > >
    > > myValue1 = contents of value1
    > > myValue2 = contents of value2 but
    > > with a new line here
    > > myValue3 = contents of value3
    > >
    > > My first approach was to open the file, use readlines to split the
    > > lines on the "=" delimiter into a key/value pair (to be stored in a
    > > dict).
    > >
    > > After processing a couple of files I noticed it's possible that a newline
    > > can be present in the value as shown in myValue2.
    > >
    > > In this case it's not an option to, say, remove the newlines if it's a
    > > "multi line" value as the value data needs to stay intact.
    > >
    > > I'm a bit confused as to how to go about getting this to work.
    > >
    > > Any suggestions on an approach would be greatly appreciated!
    >
    > I'm confused. You don't want the newline to be present, but you can't
    > remove it because the data has to stay intact? If you don't want to
    > change it, then what's the problem?


    I think the OP's trouble is that the value he wants gets split up by the
    newline at the end of the line when he reads the file with readlines().

    One can try appending the lone item to the value of the previous
    key/value pair when the split does not yield two items; it's a bit
    hackish, but given structured input data it might work.
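A different route around the problem described here is not to read line by line at all: read the whole text and let a multi-line regular expression find each key, with the value running until the next line that starts a new key. The pattern below assumes, hypothetically, that keys are single words (\w+) at the start of a line; ENTRY and parse_text are illustrative names, and any text before the first key is silently ignored:

```python
import re

# A key at the start of a line, '=', then a lazily matched value that
# runs until the next 'key =' line or the end of the text.
ENTRY = re.compile(
    r'^(\w+)[ \t]*=[ \t]*(.*?)(?=^\w+[ \t]*=|\Z)',
    re.MULTILINE | re.DOTALL,
)


def parse_text(text):
    # strip() trims the trailing newline but keeps the inner ones
    return {key: value.strip() for key, value in ENTRY.findall(text)}


text = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""
print(parse_text(text))
```

Like the line-based versions, this still misreads a continuation line that happens to start with a single word followed by '='.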

    - Hendrik
    Hendrik van Rooyen, Jul 26, 2007