Parsing of a file

Discussion in 'Python' started by Tommy Grav, Aug 6, 2008.

  1. Tommy Grav

    Tommy Grav Guest

    I have a file with the format

    Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
    Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
    Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
    Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
    Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
    Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
    Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
    Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
    Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
    Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10

    I would like to parse this file by extracting the field id, ra, dec
    and mjd from each line. It is not, however, certain that the width of
    each value of the field id, ra, dec or mjd is the same in each line.
    Is there a way to do this such that even if there were a line where
    Ra=****** and MJD=******** were swapped, it would be parsed correctly?

    Cheers
    Tommy
    Tommy Grav, Aug 6, 2008
    #1
  2. On Aug 6, 1:55 pm, Tommy Grav <> wrote:
    > I have a file with the format
    > [snip]
    > Is there a way to do this such that even if there was a line where
    > Ra=****** and MJD=******** was swapped it would be parsed correctly?


    I'm sure Python can handle this. Try the PyParsing module or learn
    Python regular expression syntax.

    http://pyparsing.wikispaces.com/

    You could probably do it very crudely by just iterating over each line
    and then using the string's find() method.
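
A rough sketch of that find() idea, in case it helps. The helper name `find_value` is made up for illustration, and the code assumes each value is followed by a space or the end of the line:

```python
# Hypothetical helper: pull the text that follows "KEY=" using only str.find().
def find_value(line, key):
    start = line.find(key + "=")
    if start == -1:
        return None              # key is not present on this line
    start += len(key) + 1        # skip past "KEY="
    end = line.find(" ", start)  # assumed: values are space-delimited
    if end == -1:
        end = len(line)          # value runs to the end of the line
    return line[start:end]

line = "Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1"
print(find_value(line, "Ra"), find_value(line, "Dec"), find_value(line, "MJD"))
```

Because each key is searched for by name, the order of Ra, Dec and MJD on the line should not matter.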

    Mike
    Mike Driscoll, Aug 6, 2008
    #2
  3. Tommy Grav

    John Machin Guest

    On Aug 7, 6:02 am, Mike Driscoll <> wrote:
    > On Aug 6, 1:55 pm, Tommy Grav <> wrote:
    > > [snip]
    >
    > I'm sure Python can handle this. Try the PyParsing module or learn
    > Python regular expression syntax.
    >
    > http://pyparsing.wikispaces.com/
    >
    > You could probably do it very crudely by just iterating over each line
    > and then using the string's find() method.
    >


    Perhaps you and the OP could spend some time becoming familiar with
    built-in functions and str methods. In particular, str.split is your
    friend:

    C:\junk>type tommy_grav.py
    # Look, Ma, no imports!

    guff = """\
    Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
    Field f31448: MJD=53370.06811620123 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
    Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
    Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
    Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5

    Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
    Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
    Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
    Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
    Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10

    """

    is_angle = {
        'ra': True,
        'dec': True,
        'mjd': False,
    }

    def convert_angle(text):
        deg, min, sec = map(float, text.split(':'))
        return (sec / 60. + min) / 60. + deg

    def parse_line(line):
        t = line.split()
        assert t[0].lower() == 'field'
        assert t[1].startswith('f')
        assert t[1].endswith(':')
        field_id = t[1].rstrip(':')
        rdict = {}
        for f in t[2:]:
            parts = f.split('=')
            if len(parts) == 2:
                key = parts[0].lower()
                value = parts[1]
                assert key not in rdict
                if is_angle[key]:
                    rvalue = convert_angle(value)
                else:
                    rvalue = float(value)
                rdict[key] = rvalue
        return field_id, rdict['ra'], rdict['dec'], rdict['mjd']

    for line in guff.splitlines():
        line = line.strip()
        if not line:
            continue
        field_id, ra, dec, mjd = parse_line(line)
        print field_id, ra, dec, mjd


    C:\junk>tommy_grav.py
    f29227 20.3962611111 67.5 53370.0679769
    f31448 20.4161472222 79.6621944444 53370.0681162
    f31226 20.4126388889 78.4458888889 53370.0682386
    f31004 20.4181333333 77.2296944444 53370.0683602
    f30782 20.4310944444 76.0135 53370.0684821
    f30560 20.4505055556 74.7973055556 53370.068604
    f30338 20.4756527778 73.5811111111 53370.0687262
    f30116 20.5060277778 72.3648888889 53370.0688489
    f29894 20.5412611111 71.1486111111 53370.0689707
    f29672 20.5810805556 69.9323888889 53370.0690935
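
One caveat about convert_angle: it adds the minutes and seconds to the leading number, so a negative declination such as -67:30:00.0 would come out as -66.5 rather than -67.5. The sample data has no negative declinations, so the output above is unaffected. A sign-aware variant could look like the sketch below. The function names are made up for illustration, and the conversion of Ra to degrees assumes Ra is given in hours:minutes:seconds, which the thread does not actually state.

```python
def sexagesimal_to_decimal(text):
    """Convert an optionally signed 'a:b:c' string to a decimal number."""
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    # Drop the sign before splitting so that "-00:30:00" is handled too.
    whole, minutes, seconds = (float(p) for p in text.lstrip("+-").split(":"))
    return sign * (whole + minutes / 60.0 + seconds / 3600.0)

def ra_hours_to_degrees(text):
    # Assumption: Ra is in hours, so one hour corresponds to 15 degrees.
    return sexagesimal_to_decimal(text) * 15.0

print(sexagesimal_to_decimal("+67:30:00.0"))
print(sexagesimal_to_decimal("-67:30:00.0"))
```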

    Cheers,
    John
    John Machin, Aug 6, 2008
    #3
  4. Tommy Grav

    Guest

    Using something like PyParsing is probably better, but if you don't
    want to use it you may use something like this:

    raw_data = """
    Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
    Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
    Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
    Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
    Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
    Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
    Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
    Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
    Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
    Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10"""

    # from each line extract the fields: id, ra, dec, mjd
    # even if they are swapped

    data = []
    for line in raw_data.lower().splitlines():
        if line.startswith("field"):
            parts = line.split()
            record = {"id": int(parts[1][1:-1])}
            for part in parts[2:]:
                if "=" in part:
                    title, field = part.split("=")
                    record[title] = field
            data.append(record)
    print data

    -----------------

    Stefan Behnel:
    >You can use named groups in a single regular expression.<


    Can you show how to use them in this situation when fields can be
    swapped?
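
One possible way to do it, offered as a sketch rather than as what Stefan had in mind: put each named group inside its own lookahead, so every key is searched for independently of where it sits on the line. The pattern name `PATTERN` and the group names are chosen here for illustration.

```python
import re

# Each lookahead scans the rest of the line for one key, so the order of
# Ra, Dec and MJD is irrelevant. Groups captured inside a lookahead are kept.
PATTERN = re.compile(
    r"Field\s+(?P<id>\S+):"
    r"(?=.*\bRa=(?P<ra>\S+))"
    r"(?=.*\bDec=(?P<dec>\S+))"
    r"(?=.*\bMJD=(?P<mjd>\S+))"
)

line = "Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2"
m = PATTERN.match(line)
print(m.groupdict())
```

A line that lacks any one of the three keys does not match at all, so `PATTERN.match` returns None for it.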

    Bye,
    bearophile
    , Aug 6, 2008
    #4
  5. Tommy Grav

    John Machin Guest

    On Aug 7, 7:06 am, John Machin <> wrote:
    > [snip]
    > Perhaps you and the OP could spend some time becoming familiar with
    > built-in functions and str methods. In particular, str.split is your
    > friend:
    >
    > [snip code and output]


    Slightly less ugly:

    C:\junk>diff tommy_grav.py tommy_grav_2.py
    18,23d17
    < is_angle = {
    <     'ra': True,
    <     'dec': True,
    <     'mjd': False,
    < }
    <
    27a22,27
    > converter = {
    >     'ra': convert_angle,
    >     'dec': convert_angle,
    >     'mjd': float,
    > }
    >
    41,44c41
    <             if is_angle[key]:
    <                 rvalue = convert_angle(value)
    <             else:
    <                 rvalue = float(value)
    ---
    >             rvalue = converter[key](value)
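
For readers who would rather not apply the diff by hand, the result looks roughly like the Python 3 sketch below. The use of `str.partition`, the renamed `minutes` variable and the omitted asserts are simplifications added here, not part of the script above.

```python
def convert_angle(text):
    deg, minutes, sec = map(float, text.split(':'))
    return (sec / 60. + minutes) / 60. + deg

# One converter per key replaces the is_angle flag and the if/else.
converter = {
    'ra': convert_angle,
    'dec': convert_angle,
    'mjd': float,
}

def parse_line(line):
    t = line.split()
    field_id = t[1].rstrip(':')
    rdict = {}
    for f in t[2:]:
        key, sep, value = f.partition('=')
        if sep:  # only tokens of the form key=value
            rdict[key.lower()] = converter[key.lower()](value)
    return field_id, rdict['ra'], rdict['dec'], rdict['mjd']

print(parse_line("Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2"))
```

Supporting another key then means adding one entry to `converter`.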
    John Machin, Aug 6, 2008
    #5
  6. On Aug 6, 3:55 pm, Tommy Grav <> wrote:
    > I have a file with the format
    > [snip]
    > Is there a way to do this such that even if there was a line where
    > Ra=****** and MJD=******** was swapped it would be parsed correctly?


    Did you consider changing the file format in the first place, so that
    you don't have to do any contortions to parse it?

    Anyway, here is a solution with regular expressions (I'm a beginner
    with regular expressions in Python, so please correct it if it is wrong
    and suggest better solutions):

    import re
    s = """Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
    Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
    Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
    Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
    Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
    Field f30560: Dec=+74:47:50.3 Ra=20:27:01.82 MJD=53370.06860400 Frames 5 Set 6
    Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
    Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
    Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
    Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10"""

    s = s.split('\n')
    # Accept either "Ra=... Dec=..." or "Dec=... Ra=...", followed by MJD.
    r = re.compile(r'Field (\S+): (?:(?:Ra=(\S+) Dec=(\S+))|'
                   r'(?:Dec=(\S+) Ra=(\S+))) MJD=(\S+)')
    for i in s:
        match = r.findall(i)
        field = match[0][0]
        Ra = match[0][1] or match[0][4]
        Dec = match[0][2] or match[0][3]
        MJD = match[0][5]
        print field, Ra, Dec, MJD
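
The pattern above covers Ra and Dec trading places, but it still expects MJD last. If MJD can move as well, one alternative is to stop encoding the order in the pattern and collect every key=value pair instead. This is a sketch in Python 3; the names `PAIR` and `parse` are made up for illustration.

```python
import re

PAIR = re.compile(r"(\w+)=(\S+)")  # any key=value token, in any position

def parse(line):
    field = re.match(r"Field\s+(\S+):", line)
    if field is None:
        return None                # not a Field line
    record = {key.lower(): value for key, value in PAIR.findall(line)}
    record["id"] = field.group(1)
    return record

print(parse("Field f30560: MJD=53370.06860400 Dec=+74:47:50.3 Ra=20:27:01.82 Frames 5 Set 6"))
```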
    Henrique Dante de Almeida, Aug 7, 2008
    #6
  7. Tommy Grav wrote:
    > I have a file with the format
    > [snip]
    > Is there a way to do this such that even if there was a line where
    > Ra=****** and MJD=******** was swapped it would be parsed correctly?


    Q&D:

    src = open('/path/to/yourfile.ext')
    parsed = []
    for line in src:
        line = line.strip()
        if not line:
            continue
        head, rest = line.split(':', 1)
        field_id = head.split()[1]
        data = dict(field_id=field_id)
        parts = rest.split()
        for part in parts:
            try:
                key, val = part.split('=')
            except ValueError:
                continue
            data[key] = val
        parsed.append(data)
    src.close()
    Bruno Desthuilliers, Aug 7, 2008
    #7
  8. On Aug 6, 4:06 pm, John Machin <> wrote:
    > [snip]
    >
    > Perhaps you and the OP could spend some time becoming familiar with
    > built-in functions and str methods. In particular, str.split is your
    > friend:
    >


    I'm well aware of the split() method and built-ins, however since this
    appeared to be a homework-type question and I was at work, I didn't
    spend any time on the issue. The only reason I mentioned McGuire's
    PyParsing module was because I had just finished reading his article
    on the subject in Python Magazine and it sounded like something the OP
    might find interesting.

    Here's my own implementation based on what's already been done here.
    I'm sure one could have some fun doing it with itertools or list
    comprehensions if one wanted to get really fancy.

    <code>

    raw_data = """
    Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
    Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
    Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
    Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
    Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
    Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
    Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
    Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
    Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
    Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10
    """.splitlines()

    myList = []
    for line in raw_data:
        items = line.split()
        myDict = {}
        for item in items:
            if '=' in item:
                key, value = item.split('=')
                myDict[key] = value
            elif item[:1].lower() == 'f' and item[-1:] == ':':
                myDict['id'] = item[1:-1]
        myList.append(myDict)

    print myList

    </code>

    This doesn't have any type checking or error handling, but it works
    with the data provided.
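
For the curious, here is roughly what a comprehension-based version with some basic error handling might look like. It is a Python 3 sketch; `REQUIRED`, `parse_line` and `parse_text` are names invented for this example, and the checks assume every record starts with "Field" followed by an id ending in a colon.

```python
REQUIRED = ("Ra", "Dec", "MJD")

def parse_line(line):
    items = line.split()
    if len(items) < 2 or items[0] != "Field" or not items[1].endswith(":"):
        raise ValueError("not a Field line: %r" % line)
    # Build the record from every key=value token, whatever their order.
    record = dict(item.split("=", 1) for item in items[2:] if "=" in item)
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        raise ValueError("missing %s in %r" % (", ".join(missing), line))
    record["id"] = items[1][1:-1]  # drop the leading 'f' and trailing ':'
    return record

def parse_text(text):
    # Blank lines are skipped instead of producing empty records.
    return [parse_line(line) for line in text.splitlines() if line.strip()]

sample = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
"""
print(parse_text(sample))
```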

    Mike
    Mike Driscoll, Aug 7, 2008
    #8
  9. Tommy Grav

    Tommy Grav Guest

    On Aug 7, 2008, at 12:52 PM, Mike Driscoll wrote:
    > I'm well aware of the split() method and built-ins, however since this
    > appeared to be a homework-type question and I was at work, I didn't
    > spend any time on the issue. The only reason I mentioned McGuire's
    > PyParsing module was because I had just finished reading his article
    > on the subject in Python Magazine and it sounded like something the OP
    > might find interesting.


    Thanks to everyone who responded; I learned a lot about text parsing
    from the responses. I just wanted to respond to Mike and let him know
    that this was not a homework problem. I was given a file in this format
    by a colleague for a project that I am working on (it contains a list
    of fields observed by the LINEAR asteroid search project during 2005
    and 2006). I could have parsed it using slices of each line, but the
    unusual format of each line got me thinking about whether there was
    another way to do it. I had tried a few approaches, but I had not
    considered .split() and .split("="). Of course the list members
    quickly came up with a simple and elegant solution. And I learned a
    lot in the process :)

    Cheers
    Tommy Grav
    +---------------------------------------------------------------+
    Associate Research Scientist Dept. of Physics and Astronomy
    Johns Hopkins University Bloomberg 243
    3400 N. Charles St.
    (410) 516-7683 Baltimore, MD 21218
    +---------------------------------------------------------------+
    Tommy Grav, Aug 8, 2008
    #9