Text Processing

Discussion in 'Python' started by Yigit Turgut, Dec 20, 2011.

  1. Yigit Turgut

    Yigit Turgut Guest

    Hi all,

    I have a text file containing data like this:

             A                B               C
    -------------------------------------------------------
    -2.0100e-01    8.000e-02    8.000e-05
    -2.0000e-01    0.000e+00   4.800e-04
    -1.9900e-01    4.000e-02    1.600e-04

    But I only need column B, and I need to change the notation to:

    8.000e-02 = 0.08
    0.000e+00 = 0.00
    4.000e-02 = 0.04

    The text file is approximately 10 MB in size. I looked around for a
    quick and dirty workaround, but there are lots of modules and lots of
    options, and I am confused.

    Which module is most suitable for this task?
    Yigit Turgut, Dec 20, 2011
    #1

  2. Yigit Turgut

    Dave Angel Guest

    On 12/20/2011 02:17 PM, Yigit Turgut wrote:
    > Hi all,
    >
    > I have a text file containing such data ;
    >
    > A B C
    > -------------------------------------------------------
    > -2.0100e-01 8.000e-02 8.000e-05
    > -2.0000e-01 0.000e+00 4.800e-04
    > -1.9900e-01 4.000e-02 1.600e-04
    >
    > But I only need Section B, and I need to change the notation to ;
    >
    > 8.000e-02 = 0.08
    > 0.000e+00 = 0.00
    > 4.000e-02 = 0.04
    >
    > Text file is approximately 10MB in size. I looked around to see if
    > there is a quick and dirty workaround but there are lots of modules,
    > lots of options.. I am confused.
    >
    > Which module is most suitable for this task ?

    You probably don't need anything but sys (to parse the command options)
    and os (maybe).

    open the file
    for each line:
        if it is one of the header lines, continue
        separate out the part you want
        print it, formatted as you like

    Then just run the script with its stdout redirected, and you've got your
    new file.

    The details depend on what your experience with Python is, and what
    version of Python you're running.
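
    The outline above can be sketched roughly as follows. This is only an illustration, assuming two header lines and whitespace-separated columns; the function name `print_column_b` and the inline `SAMPLE` text are made up for the example and are not from the thread.

```python
import io
import sys

# Inline sample standing in for the real file (assumption for the demo).
SAMPLE = """\
         A                B               C
-------------------------------------------------------
-2.0100e-01    8.000e-02    8.000e-05
-2.0000e-01    0.000e+00   4.800e-04
-1.9900e-01    4.000e-02    1.600e-04
"""


def print_column_b(lines, out=sys.stdout, header_lines=2):
    """Write the second whitespace-separated field of each data line as %.2f."""
    for lineno, line in enumerate(lines):
        if lineno < header_lines:
            continue  # column names or the dashed rule
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or short line, nothing to extract
        out.write("%.2f\n" % float(fields[1]))


# Reads line by line, so a 10 MB file is never held in memory at once.
print_column_b(io.StringIO(SAMPLE))
```

    For a real file, the same function can be fed an open file object, for instance `with open(sys.argv[1]) as f: print_column_b(f)`, and the script run with stdout redirected to the new file.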

    --

    DaveA
    Dave Angel, Dec 20, 2011
    #2

  3. Yigit Turgut

    Jérôme Guest

    Tue, 20 Dec 2011 11:17:15 -0800 (PST)
    Yigit Turgut wrote:

    > Hi all,
    >
    > I have a text file containing such data ;
    >
    > A B C
    > -------------------------------------------------------
    > -2.0100e-01 8.000e-02 8.000e-05
    > -2.0000e-01 0.000e+00 4.800e-04
    > -1.9900e-01 4.000e-02 1.600e-04
    >
    > But I only need Section B, and I need to change the notation to ;
    >
    > 8.000e-02 = 0.08
    > 0.000e+00 = 0.00
    > 4.000e-02 = 0.04
    >
    > Text file is approximately 10MB in size. I looked around to see if
    > there is a quick and dirty workaround but there are lots of modules,
    > lots of options.. I am confused.
    >
    > Which module is most suitable for this task ?


    You could try to do it yourself.

    You'd need to know what separates the data: tab characters? Spaces?

    Example:

    Input file
    ----------

    A B C
    -------------------------------------------------------
    -2.0100e-01 8.000e-02 8.000e-05
    -2.0000e-01 0.000e+00 4.800e-04
    -1.9900e-01 4.000e-02 1.600e-04


    Python code
    -----------

    # Open file
    with open('test1.plt', 'r') as f:

        b_values = []

        # Skip the two header lines, then read the first data line
        line = f.readline()
        line = f.readline()
        line = f.readline()

        while line:
            #start = line.find(u"\u0009", 0) + 1 #seek Tab
            start = line.find("    ", 0) + 4 #seek 4 spaces
            #end = line.find(u"\u0009", start)
            end = line.find("    ", start)
            b_values.append(float(line[start:end].strip()))
            line = f.readline()

    print(b_values)

    It gets trickier if the number of spaces is not constant. In that case I
    would try regular expressions. Perhaps a regexp would be more efficient in
    any case.
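
    A minimal sketch of the regular-expression idea, assuming every data line holds numbers in the scientific notation shown above, separated by any mix of spaces and tabs. The names `NUMBER`, `ROW` and `column_b` are invented for the illustration, and no claim is made here about speed.

```python
import re

# One token in scientific notation, e.g. -2.0100e-01 or 8.000e-02
NUMBER = r"[-+]?\d+\.\d+[eE][-+]?\d+"
# Column A, any run of spaces/tabs, then column B (captured)
ROW = re.compile(r"^\s*" + NUMBER + r"\s+(" + NUMBER + r")")


def column_b(lines):
    """Return column B as floats; lines that do not look like data are skipped."""
    values = []
    for line in lines:
        match = ROW.match(line)
        if match:  # header and dashed lines do not match
            values.append(float(match.group(1)))
    return values


sample = [
    "         A                B               C\n",
    "-------------------------------------------------------\n",
    "-2.0100e-01    8.000e-02    8.000e-05\n",
    "-2.0000e-01\t0.000e+00   4.800e-04\n",   # tab, then three spaces
    "-1.9900e-01 4.000e-02    1.600e-04\n",   # single space
]
print(column_b(sample))  # [0.08, 0.0, 0.04]
```

    Because the header lines simply fail to match, there is no need to count how many lines to skip.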

    --
    Jérôme
    Jérôme, Dec 20, 2011
    #3
  4. Yigit Turgut

    Nick Dokos Guest

    Jérôme wrote:

    > Tue, 20 Dec 2011 11:17:15 -0800 (PST)
    > Yigit Turgut wrote:
    >
    > > Hi all,
    > >
    > > I have a text file containing such data ;
    > >
    > > A B C
    > > -------------------------------------------------------
    > > -2.0100e-01 8.000e-02 8.000e-05
    > > -2.0000e-01 0.000e+00 4.800e-04
    > > -1.9900e-01 4.000e-02 1.600e-04
    > >
    > > But I only need Section B, and I need to change the notation to ;
    > >
    > > 8.000e-02 = 0.08
    > > 0.000e+00 = 0.00
    > > 4.000e-02 = 0.04
    > >
    > > Text file is approximately 10MB in size. I looked around to see if
    > > there is a quick and dirty workaround but there are lots of modules,
    > > lots of options.. I am confused.
    > >
    > > Which module is most suitable for this task ?

    >
    > You could try to do it yourself.
    >


    Does it have to be python? If not, I'd go with something similar to

    sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'

    Nick
    Nick Dokos, Dec 20, 2011
    #4
  5. Yigit Turgut

    Alexander Kapps Guest

    On 20.12.2011 22:04, Nick Dokos wrote:

    >>> I have a text file containing such data ;
    >>>
    >>> A B C
    >>> -------------------------------------------------------
    >>> -2.0100e-01 8.000e-02 8.000e-05
    >>> -2.0000e-01 0.000e+00 4.800e-04
    >>> -1.9900e-01 4.000e-02 1.600e-04
    >>>
    >>> But I only need Section B, and I need to change the notation to ;
    >>>
    >>> 8.000e-02 = 0.08
    >>> 0.000e+00 = 0.00
    >>> 4.000e-02 = 0.04


    > Does it have to be python? If not, I'd go with something similar to
    >
    > sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'
    >


    Why sed and awk? awk alone can skip the header lines:

    awk 'NR>2 {printf("%.2f\n", $2);}' data.txt

    And in Python:

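    with open("data.txt") as f:
        f.readline()  # skip the column-name line
        f.readline()  # skip the dashed separator line
        for line in f:
            # "%.2f" always prints two decimals (0.00);
            # "%02s" would only pad the string and print 0.0
            print("%.2f" % float(line.split()[1]))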
    Alexander Kapps, Dec 21, 2011
    #5
  6. Yigit Turgut

    Yigit Turgut Guest

    On Dec 21, 2:01 am, Alexander Kapps wrote:
    > On 20.12.2011 22:04, Nick Dokos wrote:
    >
    > >>> I have a text file containing such data ;

    >
    > >>>          A                B               C
    > >>> -------------------------------------------------------
    > >>> -2.0100e-01    8.000e-02    8.000e-05
    > >>> -2.0000e-01    0.000e+00   4.800e-04
    > >>> -1.9900e-01    4.000e-02    1.600e-04

    >
    > >>> But I only need Section B, and I need to change the notation to ;

    >
    > >>> 8.000e-02 = 0.08
    > >>> 0.000e+00 = 0.00
    > >>> 4.000e-02 = 0.04

    > > Does it have to be python? If not, I'd go with something similar to

    >
    > >     sed 1,2d foo.data | awk '{printf("%.2f\n", $2);}'

    >
    > Why sed and awk:
    >
    > awk 'NR>2 {printf("%.2f\n", $2);}' data.txt
    >
    > And in Python:
    >
    > f = open("data.txt")
    > f.readline()    # skip header
    > f.readline()    # skip header
    > for line in f:
    >      print "%02s" % float(line.split()[1])


    @Jérôme: Your suggestion produced a floating point error; it might need
    some slight modification.
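
    The thread does not say what the error was. One plausible cause, assuming the real file has uneven spacing like the second data row of the sample, is that slicing on a fixed four-space separator picks up two columns at once. The helper `slice_column_b` below is a hypothetical reduction of that approach, written only to show the effect.

```python
def slice_column_b(line, sep="    "):
    """Take the text between the first and second occurrence of sep."""
    start = line.find(sep) + len(sep)
    end = line.find(sep, start)  # -1 when no second separator follows
    return line[start:end].strip()


even = "-2.0100e-01    8.000e-02    8.000e-05\n"
uneven = "-2.0000e-01    0.000e+00   4.800e-04\n"  # only three spaces before C

print(float(slice_column_b(even)))  # 0.08

try:
    float(slice_column_b(uneven))  # slice is "0.000e+00   4.800e-04"
except ValueError as exc:
    print("ValueError:", exc)

# Splitting on any run of whitespace does not depend on the spacing.
print(float(uneven.split()[1]))  # 0.0
```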

    @Nick: Sorry mate, it needs to be in Python. But I noted your solution
    in case I need it for another task.

    @Alexander: Works as expected.

    Thank you all for the replies.
    Yigit Turgut, Dec 22, 2011
    #6
