read file with multiple data per line

Eduardo

Hello all,

I googled a lot but couldn't find anything that I could consider a
possible solution (though I am fairly new to the language, and I think
this is the main cause of my failure).

This is the beginning of the file I have to parse:

 Modified System
       32728
    2NHST1   C1   56   3.263   2.528  16.345

and this is the end:

    3.6539    6.4644   20.0000

This line has 7 formatted fields [5-digit integer, 5-character string,
5-character string, 5-digit integer, three %8.3f fields]:

    2NHST1   C1   56   3.263   2.528  16.345

and this one has 3 %10.4f fields:

    3.6539    6.4644   20.0000

Those rules cannot be ignored, or the programs I use to simulate and
analyze the results won't work.

This file describes the xyz coordinates and atom type of all the atoms
of the "system" I wish to simulate, but I must sort all groups of
molecules together, and that's what I planned to do with Python code.
I tried to accomplish this task using Fortran, which is my main coding
skill, but it proved to be unstable, so I decided to handle files
using a more appropriate language while keeping the number-crunching
tasks written in Fortran.

Thanks in advance, and I apologise for any typos.

Eduardo Martins
 
Steven D'Aprano

Eduardo said:
E> Hello all,
E> I googled a lot but couldn't find anything that I could consider a
E> possible solution (though I am fairly new to the language, and I think
E> this is the main cause of my failure).


You haven't actually said what the problem is. What are you having
trouble doing?
 
Piet van Oostrum

Eduardo said:
E> Hello all,
E> I googled a lot but couldn't find anything that I could consider a
E> possible solution (though I am fairly new to the language, and I think
E> this is the main cause of my failure).
E> This is the beginning of the file I have to parse:
E>  Modified System
E>        32728
E>     2NHST1   C1   56   3.263   2.528  16.345
E> and this is the end:
E>     3.6539    6.4644   20.0000
E> This line has 7 formatted fields [5-digit integer, 5-character string,
E> 5-character string, 5-digit integer, three %8.3f fields]:
E>     2NHST1   C1   56   3.263   2.528  16.345
E> and this one has 3 %10.4f fields:
E>     3.6539    6.4644   20.0000
E> Those rules cannot be ignored, or the programs I use to simulate and
E> analyze the results won't work.
E> This file describes the xyz coordinates and atom type of all the atoms
E> of the "system" I wish to simulate, but I must sort all groups of
E> molecules together, and that's what I planned to do with Python code.
E> I tried to accomplish this task using Fortran, which is my main coding
E> skill, but it proved to be unstable, so I decided to handle files
E> using a more appropriate language while keeping the number-crunching
E> tasks written in Fortran.

I understand that the first two lines are special and that the third
line, or the third and fourth lines are repeated.

Something like this will parse the lines. After each line you can
process the f* variables.

inp = open('testinput', 'rt')

# The first two lines (title and atom count) are read separately.
line1 = inp.readline()
line2 = inp.readline()

for line in inp:
    line = line.rstrip('\n')
    if len(line) == 44:
        # Atom record: 5 + 5 + 5 + 5 + 3 * 8 = 44 characters
        f1 = int(line[0:5])
        f2 = line[5:10]
        f3 = line[10:15]
        f4 = int(line[15:20])
        f5 = float(line[20:28])
        f6 = float(line[28:36])
        f7 = float(line[36:44])
        print(f1, f2, f3, f4, f5, f6, f7)
    elif len(line) == 30:
        # Final line: 3 * 10 = 30 characters
        f1 = float(line[0:10])
        f2 = float(line[10:20])
        f3 = float(line[20:30])
        print(f1, f2, f3)
    else:
        print("Sorry, I don't understand this format: %s" % line)

inp.close()
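
Since the downstream programs depend on the exact field widths, the parsed values eventually have to be written back in the same layout. The following is only a sketch of that step, built from the widths Eduardo listed. The function names and the parameter names are made up for illustration, and left-justifying the molecule name while right-justifying the atom name is an assumption (the sample record only shows the atom name padded on the left).

```python
# Field layout from the original post: %5d, two 5-character fields,
# %5d, then three %8.3f coordinates (44 characters in total).
# Assumption: molecule name left-justified, atom name right-justified.
ATOM_FMT = "%5d%-5s%5s%5d%8.3f%8.3f%8.3f"
BOX_FMT = "%10.4f%10.4f%10.4f"


def format_atom(mol_num, mol_name, atom_name, atom_num, x, y, z):
    """Return one atom record in the fixed-width layout."""
    return ATOM_FMT % (mol_num, mol_name, atom_name, atom_num, x, y, z)


def format_last_line(a, b, c):
    """Return the final line made of three %10.4f fields."""
    return BOX_FMT % (a, b, c)


atom_line = format_atom(2, "NHST1", "C1", 56, 3.263, 2.528, 16.345)
last_line = format_last_line(3.6539, 6.4644, 20.0)
print(repr(atom_line), len(atom_line))
print(repr(last_line), len(last_line))
```

Checking the lengths (44 and 30) after formatting is a cheap way to catch a value that overflows its field before the file reaches the simulation programs.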
 
Eduardo

Steven D'Aprano said:
S> You haven't actually said what the problem is. What are you having
S> trouble doing?

Sorry for that, Steven. My main problem is to devise a way to read all
the content of that file into a dictionary or other structure where I
could group atoms by molecule name.

Thank you very much, Piet. I will try your suggestion.
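
One possible way to get from the column-slicing approach to the grouping Eduardo asks for is a dictionary keyed by molecule name. This is a rough sketch, not a tested solution for the real file: `parse_atom` and `group_by_molecule` are hypothetical names, the "SOL" and "C2" records are invented test data, and the caller is assumed to have skipped the two header lines already.

```python
from collections import defaultdict

# Same layout as described in the thread; used here only to build test data.
ATOM_FMT = "%5d%-5s%5s%5d%8.3f%8.3f%8.3f"


def parse_atom(line):
    """Slice one 44-character atom record into a tuple of typed fields."""
    return (
        int(line[0:5]),        # molecule number
        line[5:10].strip(),    # molecule name
        line[10:15].strip(),   # atom name
        int(line[15:20]),      # atom number
        float(line[20:28]),    # x
        float(line[28:36]),    # y
        float(line[36:44]),    # z
    )


def group_by_molecule(lines):
    """Map each molecule name to the list of its parsed atom records."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if len(line) == 44:    # anything else (e.g. the last line) is skipped
            atom = parse_atom(line)
            groups[atom[1]].append(atom)
    return groups


# Invented records, except the first one, which comes from the thread.
sample = [
    ATOM_FMT % (2, "NHST1", "C1", 56, 3.263, 2.528, 16.345),
    ATOM_FMT % (3, "SOL", "OW", 57, 1.0, 2.0, 3.0),
    ATOM_FMT % (2, "NHST1", "C2", 58, 3.3, 2.6, 16.4),
    "%10.4f%10.4f%10.4f" % (3.6539, 6.4644, 20.0),
]
groups = group_by_molecule(sample)
for name in sorted(groups):
    print(name, len(groups[name]))
```

Iterating over `sorted(groups)` and writing each list out in turn would place all atoms of one molecule type together. Whether the atom numbers then need renumbering depends on the simulation programs and is not covered here.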
 
woooee

If this is the record, then you can use split to get a list of the
individual fields and then convert to int or float where necessary.
rec = "2NHST1 C1 56 3.263 2.528 16.345 "
rec_split = rec.split()
print(rec_split)
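
One caveat with whitespace splitting is worth checking against the real data. With the field widths Eduardo described, the 5-digit molecule number and the 5-character molecule name sit directly next to each other, so `split()` returns them as a single token. The small comparison below uses the sample record laid out according to those widths; the variable names are illustrative only.

```python
# Sample atom record laid out with the field widths given in the question.
rec = "    2NHST1   C1   56   3.263   2.528  16.345"

# Whitespace splitting: number and name have no space between them,
# so they come back fused into one token (6 items instead of 7).
by_split = rec.split()
print(by_split)

# Column slicing keeps the two fields apart.
by_slice = [rec[0:5], rec[5:10], rec[10:15], rec[15:20],
            rec[20:28], rec[28:36], rec[36:44]]
stripped = [field.strip() for field in by_slice]
print(stripped)
```

If every molecule name in the file is shorter than 5 characters and left-padded, `split()` may still work, but slicing by column does not depend on that.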

If you want to read two records at a time, then use
all_data = open(name, "r").readlines()
to read all data into memory, given that the file isn't huge. You can
then use a for loop, step=2, to access 2 records at a time.
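
The step-2 loop described above could look roughly like this. It is a sketch only: `io.StringIO` stands in for `open(name, "r")` so the example runs without a file, and the record contents are invented.

```python
import io

# Stand-in for open(name, "r"); the four records are invented.
fake_file = io.StringIO("rec 1a\nrec 1b\nrec 2a\nrec 2b\n")
all_data = fake_file.readlines()

pairs = []
# Step through the list two records at a time; stopping at len - 1
# ignores a trailing record that has no partner.
for i in range(0, len(all_data) - 1, 2):
    first = all_data[i].rstrip("\n")
    second = all_data[i + 1].rstrip("\n")
    pairs.append((first, second))

print(pairs)
```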
 
