Parsing/Splitting Line

acatejr

I have a text file and each line is a list of values. The values are
not delimited, but every four characters is a value. How do I get
python to split this kind of data? Thanks.
 
John Machin

acatejr said:
I have a text file and each line is a list of values. The values are
not delimited, but every four characters is a value. How do I get
python to split this kind of data? Thanks.

1. Look for "slicing" or "slice" or "slices" in the Python tutorial.
2. Write some code.
3. Run it.
 
Gabriel Genellina

acatejr said:
I have a text file and each line is a list of values. The values are
not delimited, but every four characters is a value. How do I get
python to split this kind of data? Thanks.
...     print line[j:j+4], int(line[j:j+4])
...
1234 1234
0001 1
2 2
-3 -3

--
Gabriel Genellina
Softlab SRL
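
A minimal sketch of the slicing loop that the fragment above appears to come from. The original session is not fully preserved, so the sample `line` value and the helper name `split_fixed` are assumptions chosen to be consistent with the printed output. It is written to run on Python 3 as well.

```python
def split_fixed(line, width=4):
    """Return consecutive width-character slices of line."""
    return [line[j:j + width] for j in range(0, len(line), width)]


# Hypothetical sample: four 4-character fields, two padded with spaces.
line = "12340001   2  -3"

for field in split_fixed(line):
    # int() ignores surrounding whitespace, so a padded field still converts.
    print("%s %d" % (field, int(field)))
```

Slicing never raises on a short final field, so a line whose length is not a multiple of `width` simply ends with a shorter last slice.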

 
Neil Cerutti

acatejr said:
I have a text file and each line is a list of values. The
values are not delimited, but every four characters is a value.
How do I get python to split this kind of data? Thanks.

Check out _Text Processing in Python_, Chapter 2, "PROBLEM:
Column statistics for delimited or flat-record files".
URL:http://gnosis.cx/TPiP/
 
Manuel Kaufmann

On Tuesday, 21 November 2006 02:59, (e-mail address removed) wrote:
I have a text file and each line is a list of values. The values are
not delimited, but every four characters is a value. How do I get
python to split this kind of data? Thanks.

You can easily define a function to do this. For example:

# split-line in 'n' characters
import sys

def splitLine(line, n):
    """split line in 'n' characters"""
    x = 0
    y = 0
    while line >= n:
        y = x + n
        if line[x:y] == '':
            break
        yield line[x:y]
        x += n

if __name__ == '__main__':
    # i get the line-split from the command line
    # but you can get it from a file
    for x in splitLine(sys.argv[1], int(sys.argv[2])):
        print x
 
John Machin

Neil said:
Check out _Text Processing in Python_, Chapter 2, "PROBLEM:
Column statistics for delimited or flat-record files".
URL:http://gnosis.cx/TPiP/

Hmmmm ... the elementary notion "do line[start:end] in a loop" is well
buried, just behind this:

# Adjust offsets to Python zero-based indexing,
# and also add final position after the line
num_positions = len(self.column_positions)
offsets = [(pos-1) for pos in self.column_positions]
offsets.append(len(line))

Folk who are burdened with real-world flat files (for example, several
hundred thousand lines, each 996 bytes wide) might want to consider
moving the set-up of "offsets" out of the once-per-line splitter()
method and into the __init__() method :)

Cheers,
John
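
The suggestion above can be sketched roughly as follows. The class name `FlatRecordSplitter` and its `split` method are hypothetical and are not taken from _Text Processing in Python_; the only idea carried over is that the offsets are computed once in `__init__()` rather than once per line. The sketch assumes the last field runs to the end of the line.

```python
class FlatRecordSplitter(object):
    """Split fixed-position records using offsets computed once."""

    def __init__(self, column_positions):
        # column_positions are 1-based start columns, as in the quoted code.
        starts = [pos - 1 for pos in column_positions]
        # An end of None means "to the end of the line" for the last field.
        ends = starts[1:] + [None]
        # Build the slice objects a single time, not once per line.
        self.slices = [slice(a, b) for a, b in zip(starts, ends)]

    def split(self, line):
        """Return the list of fields for one record."""
        return [line[s] for s in self.slices]


splitter = FlatRecordSplitter([1, 5, 9])
fields = splitter.split("abcdefghijkl")
```

With several hundred thousand lines, the per-line work is then only the slicing itself.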
 
John Machin

Manuel said:
On Tuesday, 21 November 2006 02:59, (e-mail address removed) wrote:

You can easily define a function to do this. For example:

# split-line in 'n' characters
import sys

def splitLine(line, n):
    """split line in 'n' characters"""
    x = 0
    y = 0
    while line >= n:

The intent appears to be that "line" refers to a str object, while "n"
refers to an int object. Comparison of such disparate objects is
guaranteed to produce a reproducible (but not necessarily meaningful)
result.

For example:

| >>> '' > 2
| True

You need to reconsider what you really want to have happen when there
is a trailing short slice. Possibilities are:

(a) silently ignore it -- what I guess your intent was, but the least
attractive IMO
(b) raise an exception -- overkill IMO
(c) just tack it on the end (which is what your code is currently doing
*accidentally*) -- and mention this in the docs and let the caller do
what they want with it.
        y = x + n
        if line[x:y] == '':
            break
        yield line[x:y]
        x += n

if __name__ == '__main__':
    # i get the line-split from the command line
    # but you can get it from a file
    for x in splitLine(sys.argv[1], int(sys.argv[2])):
        print x

HTH,
John
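
One way to make the three possibilities (a) to (c) explicit is sketched below. The function name `split_line` and the `tail` parameter with its values are assumptions introduced for illustration, not part of the code under review. The `line >= n` test is replaced by a loop over start offsets, which also avoids the `TypeError` that Python 3 raises when a `str` is compared with an `int`.

```python
def split_line(line, n, tail="keep"):
    """Yield n-character slices of line.

    tail controls a short final slice (hypothetical option names):
      "keep"  -- yield it as the last item, option (c)
      "drop"  -- silently ignore it, option (a)
      "error" -- raise ValueError, option (b)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if tail not in ("keep", "drop", "error"):
        raise ValueError("unknown tail policy: %r" % (tail,))
    for start in range(0, len(line), n):
        chunk = line[start:start + n]
        if len(chunk) < n:
            if tail == "drop":
                return
            if tail == "error":
                raise ValueError("trailing slice %r is shorter than %d"
                                 % (chunk, n))
        yield chunk
```

Because this is a generator, the argument checks run when the first item is requested, not when `split_line()` is called.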
 
Noah Rawlins

acatejr said:
I have a text file and each line is a list of values. The values are
not delimited, but every four characters is a value. How do I get
python to split this kind of data? Thanks.

I'm a nut for regular expressions and obfuscation...

import re
def splitline(line, size=4):
    return re.findall(r'.{%d}' % size, line)

>>> splitline("helloiamsuperman")
['hell', 'oiam', 'supe', 'rman']


or if you care about remainders...

import re
def splitline(line, size=4):
    return re.findall(r'.{%d}|.+$' % size, line)

>>> splitline("helloiamsupermansd")
['hell', 'oiam', 'supe', 'rman', 'sd']


noah
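
For comparison, a small sketch that puts the regular-expression version next to plain slicing. The names `split_regex` and `split_slices` are hypothetical. The two approaches treat a trailing newline differently ('.' does not match a newline by default, while a slice keeps it), so this sketch strips the line ending first.

```python
import re


def split_regex(line, size=4):
    """Split with a regular expression, keeping a short remainder."""
    line = line.rstrip("\r\n")
    return re.findall(r".{%d}|.+$" % size, line)


def split_slices(line, size=4):
    """Split with plain slicing, keeping a short remainder."""
    line = line.rstrip("\r\n")
    return [line[i:i + size] for i in range(0, len(line), size)]


sample = "helloiamsupermansd\n"
same = split_regex(sample) == split_slices(sample)
```

On input like this both functions return the same list, so the choice between them is mostly a matter of readability.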
 
Fredrik Lundh

Noah Rawlins wrote:

I'm a nut for regular expressions and obfuscation...

import re
def splitline(line, size=4):
    return re.findall(r'.{%d}' % size, line)

>>> splitline("helloiamsuperman")
['hell', 'oiam', 'supe', 'rman']

there are laws against such use of regular expressions in certain
jurisdictions.

</F>
 
Georg Brandl

Fredrik said:
Noah Rawlins wrote:

I'm a nut for regular expressions and obfuscation...

import re
def splitline(line, size=4):
    return re.findall(r'.{%d}' % size, line)

>>> splitline("helloiamsuperman")
['hell', 'oiam', 'supe', 'rman']

there are laws against such use of regular expressions in certain
jurisdictions.

... and in particularly bad cases, you will be punished by Perl
for not less than 5 years ...

Georg
 
