Reading a portion of a file

cmfvulcanius

I am using a script with a single file containing all data in multiple
sections. Each section begins with "#VS:CMD:command:START" and ends
with "#VS:CMD:command:STOP". There is a blank line in between each
section. I'm looking for the best way to grab one section at a time.
Will I have to read the entire file to a string and parse it further
or is it possible to grab the section directly when doing a read? I'm
guessing regex is the best possible way. Any help is greatly
appreciated.

Thanks
 
Rune Strand

Seems like something along these lines will do:

_file_ = "filepart.txt"

begin_tag = '#VS:CMD:command:START'
end_tag = '#VS:CMD:command:STOP'

sections = []
new_section = []
for line in open(_file_):
    line = line.strip()
    if begin_tag in line:
        new_section = []
    elif end_tag in line:
        sections.append(new_section)
    else:
        if line: new_section.append(line)

for s in sections: print s

If you want more control, perhaps flagging "inside_section",
"outside_section" is an idea.
 
Jordan

You probably don't want to use regex for something this simple; it's
likely to make things even more complicated. Is there a space between
the begin_tag and the first word of a section (same question with the
end_tag)?
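
To illustrate the regex-free, line-by-line approach together with the "inside_section" flag idea mentioned earlier in the thread, here is a minimal sketch. It is written for Python 3 (the snippets posted in the thread are Python 2), and the names `iter_sections`, `start_suffix` and `stop_suffix` are made up for this example. It assumes every tag line starts with "#VS:" and ends with ":START" or ":STOP".

```python
def iter_sections(lines, start_suffix=":START", stop_suffix=":STOP"):
    """Yield one list of content lines per section, in input order."""
    inside_section = False   # the explicit flag: are we between START and STOP?
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#VS:") and line.endswith(start_suffix):
            inside_section = True
            current = []
        elif line.startswith("#VS:") and line.endswith(stop_suffix):
            if inside_section:
                yield current
            inside_section = False
        elif inside_section and line.strip():
            # Only non-blank lines seen between the tags are kept.
            current.append(line)


# Hypothetical sample input; an open file object would work the same way.
sample = [
    "#VS:CMD:command:START\n",
    "first line\n",
    "second line\n",
    "#VS:CMD:command:STOP\n",
    "\n",
    "stray text outside any section\n",
    "#VS:CMD:command:START\n",
    "third line\n",
    "#VS:CMD:command:STOP\n",
]

for section in iter_sections(sample):
    print(section)
```

Because it is a generator, this reads one section at a time and never holds the whole file in memory. Text that appears outside a START/STOP pair is ignored rather than being attached to the next section.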
 
Jordan

Sent the post too soon. What is the endline character for the file
type? What type of file is it? An example section would be nice
too. Cheers.
 
cmfvulcanius

Ok, regex was my first thought because I used to use grep with Perl
and shell scripting to grab everything from one pattern to another
pattern. The file is just an unformatted file. What is below is
exactly what is in the file. There are no spaces between the beginning
and ending tags and the content. Would you recommend using spaces
there? And if so, why?

A sample of the file:

#VS:COMMAND:df:START
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vzfs 20971520 517652 20453868 3% /
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
tmpfs 2016032 0 2016032 0% /dev/shm
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
#VS:COMMAND:df:STOP

#VS:FILE:/proc/loadavg:START
0.00 0.00 0.00 1/32 14543
#VS:FILE:/proc/loadavg:STOP

#VS:FILE:/proc/meminfo:START
MemTotal: 524288 kB
MemFree: 450448 kB
Buffers: 0 kB
Cached: 0 kB
SwapCached: 0 kB
Active: 0 kB
Inactive: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 524288 kB
LowFree: 450448 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 73840 kB
Slab: 0 kB
CommitLimit: 0 kB
Committed_AS: 248704 kB
PageTables: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
#VS:FILE:/proc/meminfo:STOP

#VS:FILE:/proc/stat:START
cpu 67188 0 26366 391669264 656686 0 0
cpu0 24700 0 10830 195807826 373309 0 0
cpu1 42488 0 15536 195861438 283376 0 0
intr 0
swap 0 0
ctxt 18105366807
btime 1171391058
processes 26501285
procs_running 1
procs_blocked 0
#VS:FILE:/proc/stat:STOP

#VS:FILE:/proc/uptime:START
1962358.88 1577059.05
#VS:FILE:/proc/uptime:STOP
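
For comparison, the grep-style "everything from one pattern to another" idea can be sketched with Python's `re` module. This is only an illustration under stated assumptions: it is Python 3, it needs the whole file in one string, and the names `SECTION_RE`, `kind`, `name` and `body` are invented for the example. It assumes each STOP tag repeats the kind and name of its START tag exactly, as in the sample above.

```python
import re

SECTION_RE = re.compile(
    r"^#VS:(?P<kind>\w+):(?P<name>[^\n]+):START\n"  # opening tag line
    r"(?P<body>.*?)"                                 # content, as little as possible
    r"^#VS:(?P=kind):(?P=name):STOP$",               # closing tag with the same kind/name
    re.MULTILINE | re.DOTALL,
)

# Shortened copy of the sample data, used here as a stand-in for file.read().
text = """\
#VS:COMMAND:df:START
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vzfs 20971520 517652 20453868 3% /
#VS:COMMAND:df:STOP

#VS:FILE:/proc/loadavg:START
0.00 0.00 0.00 1/32 14543
#VS:FILE:/proc/loadavg:STOP
"""

# Map each section name to its list of content lines.
sections = {
    m.group("name"): m.group("body").splitlines()
    for m in SECTION_RE.finditer(text)
}

print(sections["/proc/loadavg"])
```

The named backreferences `(?P=kind)` and `(?P=name)` tie each STOP tag to its own START tag, so one section cannot accidentally swallow the next.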
 
attn.steven.kuo

You can use iterators:

import StringIO
import itertools

def group(line):
    if line[-6:-1] == 'START':
        group.current = group.current + 1
    return group.current

group.current = 0

data = """
#VS:COMMAND:df:START
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vzfs 20971520 517652 20453868 3% /
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
tmpfs 2016032 0 2016032 0% /dev/shm
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
#VS:COMMAND:df:STOP

#VS:FILE:/proc/loadavg:START
0.00 0.00 0.00 1/32 14543
#VS:FILE:/proc/loadavg:STOP

#VS:FILE:/proc/meminfo:START
MemTotal: 524288 kB
MemFree: 450448 kB
Buffers: 0 kB
Cached: 0 kB
SwapCached: 0 kB
Active: 0 kB
Inactive: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 524288 kB
LowFree: 450448 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 73840 kB
Slab: 0 kB
CommitLimit: 0 kB
Committed_AS: 248704 kB
PageTables: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
#VS:FILE:/proc/meminfo:STOP

#VS:FILE:/proc/stat:START
cpu 67188 0 26366 391669264 656686 0 0
cpu0 24700 0 10830 195807826 373309 0 0
cpu1 42488 0 15536 195861438 283376 0 0
intr 0
swap 0 0
ctxt 18105366807
btime 1171391058
processes 26501285
procs_running 1
procs_blocked 0
#VS:FILE:/proc/stat:STOP

#VS:FILE:/proc/uptime:START
1962358.88 1577059.05
#VS:FILE:/proc/uptime:STOP
""".lstrip("\n");

fh = StringIO.StringIO(data)

sections = itertools.groupby(itertools.ifilter(lambda line: len(line) > 1, fh),
                             lambda line: group(line))

for key, section in sections:
    for line in section:
        print key, line,
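
The snippet above is Python 2: the `StringIO` module and `itertools.ifilter` no longer exist in Python 3. A rough Python 3 rendering of the same groupby idea might look like the sketch below. The closure-based counter (`make_section_key`) is a substitution of my own for the function attribute `group.current`, and the sample data is shortened.

```python
import io
import itertools


def make_section_key():
    """Return a key function whose value increases at every START line."""
    state = {"current": 0}

    def key(line):
        if line.rstrip("\n").endswith(":START"):
            state["current"] += 1
        return state["current"]

    return key


data = """\
#VS:COMMAND:df:START
Filesystem 1K-blocks Used Available Use% Mounted on
#VS:COMMAND:df:STOP

#VS:FILE:/proc/loadavg:START
0.00 0.00 0.00 1/32 14543
#VS:FILE:/proc/loadavg:STOP
"""

fh = io.StringIO(data)                           # stands in for an open file
non_blank = (line for line in fh if line.strip())

# Consecutive lines that share a counter value form one section.
grouped = {
    key: [line.rstrip("\n") for line in lines]
    for key, lines in itertools.groupby(non_blank, make_section_key())
}

for key, lines in grouped.items():
    print(key, lines)
```

As in the original, the START and STOP tag lines stay inside each group; drop the first and last element of a group if only the content is wanted.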
 
Vulcanius

Here is the code I've come up with. Please feel free to critique it
and let me know what you would change. Also, as you can see I call
"open(SERVER,'r')" twice; but I want to only call it once, what would
the best way to do this be?

------------------------------------------------------------

import re

SERVER = "192.168.1.60"

# Pull all data from server file.
FILE = open(SERVER,'r')
ALLINFO = FILE.read()

# Grab a list of all sections in the server file.
SECTIONS = re.findall("(?m)^\#VS:\w*:.*:", ALLINFO)

# Remove duplicates from the list.
if SECTIONS:
    SECTIONS.sort()
    LAST = SECTIONS[-1]
    for I in range(len(SECTIONS)-2, -1, -1):
        if LAST==SECTIONS[I]: del SECTIONS[I]
        else: LAST=SECTIONS[I]

# Pull data from each section and assign it a dictionary item.
# Data can be called using SECTIONDICT['section'] i.e. SECTIONDICT['df']
SECTIONDICT = {}
for SECT in SECTIONS:
    PRESECTNAME1 = SECT[9:len(SECT) - 1]
    PRESECTNAME2 = PRESECTNAME1.split("/")
    SECTNAME = PRESECTNAME2[len(PRESECTNAME1.split("/")) - 1]
    START = SECT + "START"
    STOP = SECT + "STOP"
    for LINE in open(SERVER,'r'):
        LINE = LINE.strip()
        if START in LINE:
            SECTIONLISTTEMP = []
        elif STOP in LINE:
            SECTIONDICT[SECTNAME] = SECTIONLISTTEMP
            SECTIONLISTTEMP = []
            print "-" * 80
            print "SECTION: %s" % SECTNAME
            print SECTIONDICT[SECTNAME]
        else:
            if LINE:
                SECTIONLISTTEMP.append(LINE)

FILE.close()

------------------------------------------------------------
 
Gabriel Genellina

You got a reply yesterday from rune.strand@g... without regexps that looks
pretty functional; have you seen it?

SECTIONDICT = {}
for SECT in SECTIONS:
    PRESECTNAME1 = SECT[9:len(SECT) - 1]
    PRESECTNAME2 = PRESECTNAME1.split("/")

Ugh... don't use UPPERCASE names for variables, please!
Better to follow this style guide: http://www.python.org/dev/peps/pep-0008/
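
Putting those two points together, here is one possible sketch of a single-pass version with PEP 8 style lowercase names. It is Python 3, reads the input only once, and needs neither a regex nor a separate pass to collect section names. The function name `read_sections` and the rule for deriving the dictionary key (the last path component of the tag's target) are assumptions made for this example, not something settled in the thread.

```python
import io


def read_sections(lines):
    """Build {section_name: [content lines]} in a single pass over lines."""
    sections = {}
    name = None  # name of the section being read, or None outside a section
    for raw in lines:
        line = raw.strip()
        if line.startswith("#VS:") and line.endswith(":START"):
            # "#VS:FILE:/proc/loadavg:START" -> "/proc/loadavg" -> "loadavg"
            target = line[len("#VS:"):-len(":START")].partition(":")[2]
            name = target.split("/")[-1]
            sections[name] = []
        elif line.startswith("#VS:") and line.endswith(":STOP"):
            name = None
        elif name is not None and line:
            sections[name].append(line)
    return sections


# Shortened sample; with a real file this would be:
#     with open("192.168.1.60") as fh:
#         sections = read_sections(fh)
sample = io.StringIO("""\
#VS:COMMAND:df:START
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vzfs 20971520 517652 20453868 3% /
#VS:COMMAND:df:STOP

#VS:FILE:/proc/loadavg:START
0.00 0.00 0.00 1/32 14543
#VS:FILE:/proc/loadavg:STOP
""")

sections = read_sections(sample)
print(sections["loadavg"])
```

Because the names are taken from the START tags as they are encountered, the duplicate-removal step is no longer needed. Two sections whose targets share a last path component would overwrite each other under this naming rule.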
 
