Translate tab-delimited to fixed width text

Discussion in 'Python' started by mike beck, Sep 21, 2004.

  1. mike beck

    mike beck Guest

    I'm writing a script to take a tab-delimited input file where the
    format is known and transform it to a fixed-width text file. I thought
    that the script might be easier to troubleshoot or edit later (if the
    file formats change, for example) if I specified the column names for
    the input and output files, which is why I didn't just map input
    column 1 -> export column 14.

    Even though I've spent some quality time with the archives and the
    fine manual, I'm pretty sure that I'm doing some WRONG THINGS. Or
    perhaps some SUB-OPTIMAL THINGS. And possibly REALLY WEIRD THINGS. Or
    just maybe THINGS THAT WILL WORK FOR A LITTLE WHILE BUT WILL SURELY
    CAUSE DIRE PROBLEMS AT SOME LATER, INCREDIBLY INCONVENIENT TIME
    RESULTING IN HORRIBLE, TRAGIC CONSEQUENCES FOR EVERYONE INVOLVED.

    The script works, but the use of multiple dictionaries seems ... how
    should I say ... unintuitive. Is there a better way to move values
    from one dictionary to another than what I've done here? I'd
    appreciate any other tips (especially those that fit into the
    CATEGORIES listed above).

    And I've omitted the read from file and write to file business for the
    sake of brevity...

    Standing on the shoulders of giants (meaning, you all!),
    Mike

    ---

    """
    Transforms a tab delimited text file to a fixed-width text file.
    """
    import copy, string

    DEFFIELDSEP = '\t';
    DEFLINESEP = '\n';
    DEFHASHEADER = 1; # 1 if yes, 0 if no

    import_field_order = ( 'Email',
    'First Name(s)',
    'Last Name',
    'Address Line 1',
    'Address Line 2',
    'City',
    'State',
    'Zip Code', );

    # export columns (keys) to import columns (values)
    mapped_fields = { 'EMAIL_ADDRESS_1':'Email',
    'FIRST_NAME_1':'First Name(s)',
    'LAST_NAME_1':'Last Name',
    'ADDRESS_1':'Address Line 1',
    'CITY':'City',
    'STATE':'State',
    'ZIP_CODE':'Zip Code', };

    # required length and justification of export fields
    export_fields = { 'RECORD_TYPE':[1,'left'],
    'ACCOUNT_ID':[8,'left'],
    'TITLE_1':[8,'left'],
    'FIRST_NAME_1':[20,'left'],
    'MIDDLE_NAME_1':[20,'left'],
    'LAST_NAME_1':[20,'left'],
    'SUFFIX_1':[8,'left'],
    'NAME_ID_1':[2,'left'],
    'ADDRESS_1':[256,'left'],
    'CITY':[30,'left'],
    'STATE':[2,'left'],
    'ZIP_CODE':[5,'left'],
    'ZIP_EXTENSION':[4,'left'],
    'EMAIL_ADDRESS_1':[100,'left'],
    'EMAIL_ADDRESS_TYPE_1':[2,'left'], };

    export_field_order = ( 'RECORD_TYPE',
    'ACCOUNT_ID',
    'TITLE_1',
    'FIRST_NAME_1',
    'MIDDLE_NAME_1',
    'LAST_NAME_1',
    'SUFFIX_1',
    'NAME_ID_1',
    'ADDRESS_1',
    'CITY',
    'STATE',
    'ZIP_CODE',
    'ZIP_EXTENSION',
    'EMAIL_ADDRESS_1',
    'EMAIL_ADDRESS_TYPE_1', );

    def pad ( elem, max_length, justification=None):
    """
    Pad a string to a predetermined length. Truncate the string
    if necessary. By default, left justify and pad with spaces.
    """
    padded_value = '';
    # what if elem is none?
    if (justification is 'right'):
    padded_value = elem.rjust(max_length)[:max_length]
    else:
    padded_value = elem.ljust(max_length)[:max_length]
    return padded_value

    def glue ( var, ref, order ):
    """
    Build a string with padded dictionary values.
    Var and ref are dictionaries, order is a tuple describing the
    field order.
    """
    finished_line = []
    # pad the items
    for k in var.keys():
    # pad (value, length, justification)
    var[k] = pad( str(var[k]), ref[k][0], ref[k][1])
    # build the line in order
    for elem in order:
    finished_line.append(dict.get(var, elem))
    # join the elements and return a string
    return ''.join([x for x in finished_line])

    def build ( line, field_sep = DEFFIELDSEP ):
    """
    Create a single record by transposing fields via
    multiple dictionaries.
    """
    fields = string.split(line, field_sep);
    # trim whitespace
    fields = map(string.strip, fields)
    # create dict of import column names (keys) and import
    # data (values)
    step_one = dict([(q,a) for q,a in zip(import_field_order,
    fields)])
    # create a 'throwaway' copy of the mapped fields dict to
    # do the transform
    step_two = copy.deepcopy(mapped_fields)
    # replace the column names of the import file with the
    # values from the import file
    for k,v in step_two.items():
    step_two[k] = dict.get(step_one, v)
    # create an empty dict with the export column names
    step_three = dict([(q,'') for q in export_field_order])
    dict.update(step_three, step_two)
    # special cases
    step_three['RECORD_TYPE'] = 'D'
    step_three['EMAIL_ADDRESS_TYPE_1'] = 'I'
    return glue(step_three, export_fields, export_field_order)

    def decapitate(infile,has_header=DEFHASHEADER,line_sep=DEFLINESEP):
    """
    Return file without header row
    """
    if (has_header):
    sans_head = string.split(infile, line_sep)[1:]
    else:
    sans_head = string.split(infile, line_sep)
    return sans_head

    if __name__ == '__main__':

    # init vars
    out_lst = []

    # sample data. Names have been changed ... ;)
    fin = {'data': 'Email\tFirst Name(s)\tLast Name\tAddress Line
    1\tAddress Line 2\tCity\tState\tZip
    Code\\tBill\tMunroe\t416
    Main\t\tManhattan\tNY\t10471\\tGeorge\tScott\t111
    Blue Street 2005\t\tSan
    Francisco\tCA\t94144\\tMike\tBork\t22 Oak
    Rd\t\tAlbuquerque\tNM\t01720-5303\\tIma\tCrazy\t1111 E
    Maple Rd\t\tDenver\tCo\t80206-6139'};

    # get a list of lines without a header row
    lines = decapitate(fin['data']);

    for line in lines:
    # check for blank lines
    if (line):
    out_lst.append(build(line))

    print out_lst;
     
    mike beck, Sep 21, 2004
    #1
    1. Advertising

  2. mike beck

    Larry Bates Guest

    What's up with all the semi-colon (;) line terminators?
    Seems like some mixture of Javascript and Python here.

    I tend to do this with classes. Then I can take
    advantage of using the __str__ method to build the
    output string.

    class outputrec:
    #
    # default = default value if not given (also defines length)
    # fmt = format string to build output (%s, %i, %.2f, etc.)
    # justify = None (left) or right
    # values = list of allowable values
    #

    _record={'Email': {'default':100*' ',
    'fmt': '%s',
    'justify': None,
    'values': None}},
    'FirstName':{'default':20*' ',
    'fmt': '%s',
    'justify': None,
    'values': None}},
    #
    # Insert remainder of fields here
    #
    }

    _attributes=('Email','FirstName',...)

    def __init__(self, **kwargs):
    #-----------------------------------------------------------------------
    # If any keyword arguments were passed in, update local attributes
    with
    # their values.
    #-----------------------------------------------------------------------
    for attribute in [a for a in kwargs.keys() if a in
    self._attributes]:
    self.__setattr__(attribute, kwargs[attribute])

    return

    def __setattr__(self, attribute, value):
    '''
    Override the __setattr__ method so that I can limit the attributes
    to only those in the _record defined and so I can properly justify
    the attributes according to the justification entry.
    '''
    #-----------------------------------------------------------------------
    # Create shortcut to self.__dict__
    #-----------------------------------------------------------------------
    D=self.__dict__
    #-----------------------------------------------------------------------
    # Make sure that caller is setting a legal attribute
    #-----------------------------------------------------------------------
    if attribute not in self._attributes:
    raise AttributeError, "accpacPRbase: Illegal attribute=%s" % \
    attribute

    #-----------------------------------------------------------------------
    # See if the attribute has a list of allowable values and check the
    # value against that list.
    #-----------------------------------------------------------------------
    values=self._record[attribute]['values']
    if values and (value not in values):
    print "attribute=%s, value=%s not in " \
    "values list=%s" % (attribute, value, values))
    sys.exit(2)

    #-----------------------------------------------------------------------
    # Check the length of string attributes to make sure they are not
    too
    # long
    #-----------------------------------------------------------------------
    if not self._record[attribute]['fmt'] == '"%s"': D[attribute]=value
    else:
    max_length=len(self._record[attribute]['default'])
    if len(value) > max_length:
    print "attribute=%s, too long, " \
    "value='%s' truncated" % (attribute, value))
    print "len(%s)=%i, maxlen=%i" % \
    (attribute, len(attribute), max_length))
    #---------------------------------------------------------------
    # Chop it off at the maximum length
    #---------------------------------------------------------------
    value=value[:max_length]

    #-------------------------------------------------------------------
    # See if this value is to be justified to the right, if so
    justify
    # it to the length of the default value.
    #-------------------------------------------------------------------
    if self._record[attribute]['justify']:
    D[attribute]=value.rstrip().rjust(max_length)
    else:
    D[attribute]=value.rstrip().ljust(max_length)


    return

    def __str__(self):
    '''
    Create and return a string representation of this record for output.
    '''
    #-----------------------------------------------------------------------
    # Create shortcut to self.__dict__
    #-----------------------------------------------------------------------
    D=self.__dict__
    outputlist=[]
    #-----------------------------------------------------------------------
    # Loop over every possible attribute and use the defined format
    string
    # to format it for output record. Output record is then joined and
    # returned.
    #-----------------------------------------------------------------------
    for attribute in self._attributes:
    outputlist.append(self._record[attribute]['fmt'] % D[attribute])

    return ''.join(outputlist)



    OREC=outputrec()
    OREC.Email=''
    OREC.FirstName='Whatever'
    #
    # Put more assignments here
    #
    # or you can do in single statement
    #
    OREC=outputrec(Email='', FirstName='Whatever')

    Then you can get output record as

    f.writelines(OREC)

    There are LOTS of ways to accomplish this and I've tried most of
    them, this method just works well for me. I do basically the same
    thing for the input records. Once the record classes are debugged
    the program is very easy to write. It also "hides" the record
    formats from the main program which makes it easy should they
    change or if you want to support multiple formats (just write
    another one for the new format and conditionally instantiate it
    in your main program). I have actually created a base class for
    this and I define my record class with it as the baseclass. This
    way much of the baseclass code (that is debugged) is reusable.

    Hope this helps.

    Larry Bates



    "mike beck" <> wrote in message
    news:...
    > I'm writing a script to take a tab-delimited input file where the
    > format is known and transform it to a fixed-width text file. I thought
    > that the script might be easier to troubleshoot or edit later (if the
    > file formats change, for example) if I specified the column names for
    > the input and output files, which is why I didn't just map input
    > column 1 -> export column 14.
    >
    > Even though I've spent some quality time with the archives and the
    > fine manual, I'm pretty sure that I'm doing some WRONG THINGS. Or
    > perhaps some SUB-OPTIMAL THINGS. And possibly REALLY WEIRD THINGS. Or
    > just maybe THINGS THAT WILL WORK FOR A LITTLE WHILE BUT WILL SURELY
    > CAUSE DIRE PROBLEMS AT SOME LATER, INCREDIBLY INCONVENIENT TIME
    > RESULTING IN HORRIBLE, TRAGIC CONSEQUENCES FOR EVERYONE INVOLVED.
    >
    > The script works, but the use of multiple dictionaries seems ... how
    > should I say ... unintuitive. Is there a better way to move values
    > from one dictionary to another than what I've done here? I'd
    > appreciate any other tips (especially those that fit into the
    > CATEGORIES listed above).
    >
    > And I've omitted the read from file and write to file business for the
    > sake of brevity...
    >
    > Standing on the shoulders of giants (meaning, you all!),
    > Mike
    >
    > ---
    >
    > """
    > Transforms a tab delimited text file to a fixed-width text file.
    > """
    > import copy, string
    >
    > DEFFIELDSEP = '\t';
    > DEFLINESEP = '\n';
    > DEFHASHEADER = 1; # 1 if yes, 0 if no
    >
    > import_field_order = ( 'Email',
    > 'First Name(s)',
    > 'Last Name',
    > 'Address Line 1',
    > 'Address Line 2',
    > 'City',
    > 'State',
    > 'Zip Code', );
    >
    > # export columns (keys) to import columns (values)
    > mapped_fields = { 'EMAIL_ADDRESS_1':'Email',
    > 'FIRST_NAME_1':'First Name(s)',
    > 'LAST_NAME_1':'Last Name',
    > 'ADDRESS_1':'Address Line 1',
    > 'CITY':'City',
    > 'STATE':'State',
    > 'ZIP_CODE':'Zip Code', };
    >
    > # required length and justification of export fields
    > export_fields = { 'RECORD_TYPE':[1,'left'],
    > 'ACCOUNT_ID':[8,'left'],
    > 'TITLE_1':[8,'left'],
    > 'FIRST_NAME_1':[20,'left'],
    > 'MIDDLE_NAME_1':[20,'left'],
    > 'LAST_NAME_1':[20,'left'],
    > 'SUFFIX_1':[8,'left'],
    > 'NAME_ID_1':[2,'left'],
    > 'ADDRESS_1':[256,'left'],
    > 'CITY':[30,'left'],
    > 'STATE':[2,'left'],
    > 'ZIP_CODE':[5,'left'],
    > 'ZIP_EXTENSION':[4,'left'],
    > 'EMAIL_ADDRESS_1':[100,'left'],
    > 'EMAIL_ADDRESS_TYPE_1':[2,'left'], };
    >
    > export_field_order = ( 'RECORD_TYPE',
    > 'ACCOUNT_ID',
    > 'TITLE_1',
    > 'FIRST_NAME_1',
    > 'MIDDLE_NAME_1',
    > 'LAST_NAME_1',
    > 'SUFFIX_1',
    > 'NAME_ID_1',
    > 'ADDRESS_1',
    > 'CITY',
    > 'STATE',
    > 'ZIP_CODE',
    > 'ZIP_EXTENSION',
    > 'EMAIL_ADDRESS_1',
    > 'EMAIL_ADDRESS_TYPE_1', );
    >
    > def pad ( elem, max_length, justification=None):
    > """
    > Pad a string to a predetermined length. Truncate the string
    > if necessary. By default, left justify and pad with spaces.
    > """
    > padded_value = '';
    > # what if elem is none?
    > if (justification is 'right'):
    > padded_value = elem.rjust(max_length)[:max_length]
    > else:
    > padded_value = elem.ljust(max_length)[:max_length]
    > return padded_value
    >
    > def glue ( var, ref, order ):
    > """
    > Build a string with padded dictionary values.
    > Var and ref are dictionaries, order is a tuple describing the
    > field order.
    > """
    > finished_line = []
    > # pad the items
    > for k in var.keys():
    > # pad (value, length, justification)
    > var[k] = pad( str(var[k]), ref[k][0], ref[k][1])
    > # build the line in order
    > for elem in order:
    > finished_line.append(dict.get(var, elem))
    > # join the elements and return a string
    > return ''.join([x for x in finished_line])
    >
    > def build ( line, field_sep = DEFFIELDSEP ):
    > """
    > Create a single record by transposing fields via
    > multiple dictionaries.
    > """
    > fields = string.split(line, field_sep);
    > # trim whitespace
    > fields = map(string.strip, fields)
    > # create dict of import column names (keys) and import
    > # data (values)
    > step_one = dict([(q,a) for q,a in zip(import_field_order,
    > fields)])
    > # create a 'throwaway' copy of the mapped fields dict to
    > # do the transform
    > step_two = copy.deepcopy(mapped_fields)
    > # replace the column names of the import file with the
    > # values from the import file
    > for k,v in step_two.items():
    > step_two[k] = dict.get(step_one, v)
    > # create an empty dict with the export column names
    > step_three = dict([(q,'') for q in export_field_order])
    > dict.update(step_three, step_two)
    > # special cases
    > step_three['RECORD_TYPE'] = 'D'
    > step_three['EMAIL_ADDRESS_TYPE_1'] = 'I'
    > return glue(step_three, export_fields, export_field_order)
    >
    > def decapitate(infile,has_header=DEFHASHEADER,line_sep=DEFLINESEP):
    > """
    > Return file without header row
    > """
    > if (has_header):
    > sans_head = string.split(infile, line_sep)[1:]
    > else:
    > sans_head = string.split(infile, line_sep)
    > return sans_head
    >
    > if __name__ == '__main__':
    >
    > # init vars
    > out_lst = []
    >
    > # sample data. Names have been changed ... ;)
    > fin = {'data': 'Email\tFirst Name(s)\tLast Name\tAddress Line
    > 1\tAddress Line 2\tCity\tState\tZip
    > Code\\tBill\tMunroe\t416
    > Main\t\tManhattan\tNY\t10471\\tGeorge\tScott\t111
    > Blue Street 2005\t\tSan
    > Francisco\tCA\t94144\\tMike\tBork\t22 Oak
    > Rd\t\tAlbuquerque\tNM\t01720-5303\\tIma\tCrazy\t1111 E
    > Maple Rd\t\tDenver\tCo\t80206-6139'};
    >
    > # get a list of lines without a header row
    > lines = decapitate(fin['data']);
    >
    > for line in lines:
    > # check for blank lines
    > if (line):
    > out_lst.append(build(line))
    >
    > print out_lst;
     
    Larry Bates, Sep 21, 2004
    #2
    1. Advertising

  3. mike beck

    mike beck Guest

    This is very cool and useful. Thanks for the help!

    > I tend to do this with classes. Then I can take
    > advantage of using the __str__ method to build the
    > output string.
     
    mike beck, Sep 30, 2004
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. David Lozzi

    Export to Tab Delimited Text File

    David Lozzi, Mar 31, 2006, in forum: ASP .Net
    Replies:
    2
    Views:
    4,938
    David Lozzi
    Apr 1, 2006
  2. johnp
    Replies:
    4
    Views:
    3,676
    Toby Inkster
    May 23, 2005
  3. RyanL
    Replies:
    6
    Views:
    690
    Paul McGuire
    Aug 28, 2007
  4. digz
    Replies:
    5
    Views:
    2,152
    Tim Chase
    Mar 19, 2009
  5. Srikant
    Replies:
    1
    Views:
    500
    Greg Bacon
    Sep 29, 2007
Loading...

Share This Page