Python help for a C++ programmer

mlimber

I'm writing a text processing program to process some survey results.
I'm familiar with C++ and could write it in that, but I thought I'd
try out Python. I've got a handle on the file I/O and regular
expression processing, but I'm wondering about building my array of
classes (I'd probably use a struct in C++ since there are no methods,
just data).

I want something like (C++ code):

    struct Response
    {
        std::string name;
        int age;
        int iData[ 10 ];
        std::string sData;
    };

    // Prototype
    void Process( const std::vector<Response>& );

    int main()
    {
        std::vector<Response> responses;

        while( /* not end of file */ )
        {
            Response r;

            // Fill struct from file
            r.name = /* get the data from the file */;
            r.age = /* ... */;
            r.iData[0] = /* ... */;
            // ...
            r.sData = /* ... */;
            responses.push_back( r );
        }

        // Do some processing on the responses
        Process( responses );
    }

What is the preferred way to do this sort of thing in Python?

Thanks in advance! --M
 
Lutz Horn

Hi,

You could try something like this.

    #!/usr/bin/env python

    class Response:
        def __init__(self, name, age, iData, sData):
            self.name = name
            self.age = age
            self.iData = iData
            self.sData = sData

    def sourceOfResponses():
        # Stand-in for the real input: one [name, age, iData, sData] list per response.
        return [["you", 42, [1, 2, 3], ["foo", "bar", "baz"]],
                ["me", 23, [1, 2, 3], ["ham", "spam", "eggs"]]]

    if __name__ == "__main__":
        responses = []
        # Call the function, then unpack each inner list into its four fields.
        for name, age, iData, sData in sourceOfResponses():
            response = Response(name, age, iData, sData)
            responses.append(response)
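
For a data-only class like this, newer Python versions (3.7 and later, which postdate this thread) can generate `__init__` and `__repr__` for you. The following is only a sketch of that alternative, not part of the reply above; the sample rows are made up and `sData` is kept as a single string to match the C++ struct.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Response:
    # Field names mirror the C++ struct from the original question.
    name: str
    age: int
    iData: List[int] = field(default_factory=list)  # a list, not a fixed int[10]
    sData: str = ""


# Hypothetical sample rows standing in for data parsed from the survey file.
rows = [("you", 42, [1, 2, 3], "foo"), ("me", 23, [4, 5, 6], "spam")]
responses = [Response(*row) for row in rows]

for r in responses:
    print(r)
```

The list `responses` plays the role of `std::vector<Response>`, and `append` (or a list comprehension, as here) replaces `push_back`.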

Lutz
 
Neil Cerutti

It depends on the format of your data (Python provides lots of
shortcuts for handling lots of kinds of data), but perhaps something
like this, if you do all the parsing manually:

    class Response(object):
        def __init__(self, extern_rep):
            # parse or translate extern_rep into ...
            self.name = ...
            self.age = ...
            # Use a dictionary instead of parallel lists.
            self.data = {...}

        def process(self):
            # Do what you need to do.
            pass

    # The with-block closes the file once the loop finishes.
    with open('thedatafile') as fstream:
        for line in fstream:
            # This assumes each line is one response.
            Response(line).process()
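
To make the outline above concrete, here is one possible filling-in. The line layout (`name|age|key=value,...|free text`), the `comment` attribute, and the summing done in `process` are all invented for illustration; the real survey format will differ.

```python
class Response(object):
    def __init__(self, extern_rep):
        # Assumed layout: name|age|key=value,key=value|free text
        name, age, pairs, comment = extern_rep.rstrip("\n").split("|")
        self.name = name
        self.age = int(age)
        # One dictionary keyed by question id, instead of parallel lists.
        self.data = {}
        for pair in pairs.split(","):
            key, value = pair.split("=")
            self.data[key] = int(value)
        self.comment = comment

    def process(self):
        # Placeholder processing: total of the numeric answers.
        return sum(self.data.values())


# Hypothetical sample lines standing in for the data file.
sample_lines = ["you|42|q1=1,q2=2,q3=3|foo\n", "me|23|q1=4,q2=5,q3=6|spam\n"]
totals = [Response(line).process() for line in sample_lines]
print(totals)
```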
 
Tim Chase

Without knowing more about the details involved with parsing the
file, here's a first-pass whack at it:

    class Response(object):
        def __init__(self, name, age, iData, sData):
            self.name = name
            self.age = age
            self.iData = iData
            self.sData = sData

        def __repr__(self):
            # Two placeholders need a two-item tuple.
            return '%s (%s)' % (self.name, self.age)

    def parse_response_from_line(line):
        # Expects four tab-separated fields; every field stays a string.
        name, age, iData, sData = line.rstrip('\n').split('\t')
        return Response(name, age, iData, sData)

    def process(response):
        print('Processing %r' % response)

    responses = [parse_response_from_line(line)
                 for line in open('input.txt')]

    for response in responses:
        process(response)


That last pair might be condensed to just

    for line in open('input.txt'):
        process(parse_response_from_line(line))
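
Note that splitting a line leaves every field as a string, so `age` and `iData` still need converting before any arithmetic. A rough sketch of that step follows; the helper name `parse_fields` is made up, and it assumes (without any evidence from the real file) that the `iData` column holds comma-separated integers.

```python
def parse_fields(line):
    """Split one tab-separated line and convert each field to a useful type."""
    name, age, iData, sData = line.rstrip("\n").split("\t")
    # Assumption: the iData column holds comma-separated integers.
    numbers = [int(item) for item in iData.split(",")]
    return name, int(age), numbers, sData


print(parse_fields("you\t42\t1,2,3\tfoo\n"))
```

The returned tuple can be unpacked straight into a constructor that takes the same four arguments. `int()` raises `ValueError` on malformed input, which is probably what you want for survey data that should be clean.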

Things get a bit hairier if your input is multi-line. You might
have to do something like

    def getline(fp):
        # readline() returns '' at end of file, so this yields '' there too.
        return fp.readline().rstrip('\n')

    def response_generator(fp):
        name = None
        # Stops at end of file (or at the first record with a blank name line).
        while name != '':
            name = getline(fp)
            age = getline(fp)
            iData = getline(fp)
            sData = getline(fp)
            if name and age and iData and sData:
                yield Response(name, age, iData, sData)

    fp = open('input.txt')
    for response in response_generator(fp):
        process(response)

which you can modify accordingly.
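
One such modification, sketched here under the assumption that every record is exactly four lines long: `itertools.islice` can pull a fixed number of lines per record. Unlike the generator above, this version keeps records that contain empty fields and only stops at end of file. The name `record_generator` and the sample data are invented for the example.

```python
import io
from itertools import islice


def record_generator(fp, lines_per_record=4):
    """Yield one tuple of stripped lines per complete record."""
    while True:
        chunk = [line.rstrip("\n") for line in islice(fp, lines_per_record)]
        if len(chunk) < lines_per_record:
            return  # end of file, or a truncated trailing record
        yield tuple(chunk)


# io.StringIO stands in for an open file so the sketch runs on its own.
sample = io.StringIO("you\n42\n1,2,3\nfoo\nme\n23\n4,5,6\nspam\n")
records = list(record_generator(sample))
print(records)
```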

-tkc
 
Bruno Desthuilliers

mlimber wrote:
> I'm writing a text processing program to process some survey results.
> I'm familiar with C++ and could write it in that, but I thought I'd
> try out Python. I've got a handle on the file I/O and regular
> expression processing,

FWIW, and depending on your text format, there may be better solutions
than regexps.

> but I'm wondering about building my array of
> classes (I'd probably use a struct in C++ since there are no methods,
> just data).

If you have no methods and you're sure you won't ever have any, then
just use a dict (name-indexed record) or a tuple (position-indexed record).
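
As a rough illustration of the two record styles just mentioned, with `collections.namedtuple` added as a middle ground that this reply does not itself suggest (the values are made-up sample data):

```python
from collections import namedtuple

# Name-indexed record: a plain dict.
as_dict = {"name": "you", "age": 42, "iData": [1, 2, 3], "sData": "foo"}

# Position-indexed record: a plain tuple.
as_tuple = ("you", 42, [1, 2, 3], "foo")

# A namedtuple allows both positional and attribute access.
Response = namedtuple("Response", ["name", "age", "iData", "sData"])
as_named = Response(*as_tuple)

print(as_dict["age"], as_tuple[1], as_named.age)
```

The dict reads best when fields are looked up by name, while the tuple is more compact; the namedtuple keeps the tuple's layout but lets you write `as_named.age` instead of remembering index 1.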

> What is the preferred way to do this sort of thing in Python?

# assuming you're using a line-oriented format, and not
# worrying about exception handling etc...

    def extract(line):
        data = dict()
        data['name'] = ...  # get the name
        data['age'] = ...   # get the age
        data['data'] = ...  # etc...
        return data


    def process(responses):
        # code here
        pass

    if __name__ == '__main__':
        import sys
        path = sys.argv[1]
        responses = [extract(line) for line in open(path)]
        process(responses)

If you have a very large dataset, you may want to use tuples instead of
dicts (less overhead), a more stream-oriented approach based on
generators, or both. Generators only apply, of course, if you don't need
to extract all the results before you start processing.
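
A minimal sketch of that stream-oriented variant, assuming a tab-separated layout of name, age and free text (the layout, the `iter_responses` helper and the sample data are all hypothetical):

```python
import io


def extract(line):
    # Assumption: tab-separated name, age and free-text columns.
    name, age, text = line.rstrip("\n").split("\t")
    return (name, int(age), text)  # a tuple is lighter than a dict


def iter_responses(lines):
    """Yield one record at a time instead of building the whole list."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield extract(line)


# io.StringIO stands in for open(path) so the sketch runs on its own.
source = io.StringIO("you\t42\tfoo\n\nme\t23\tspam\n")
total_age = sum(age for _name, age, _text in iter_responses(source))
print(total_age)
```

Only one record is held in memory at a time here, which is the point of the generator; the trade-off is that the records can be walked once, not indexed or revisited.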

HTH
 
