Formatting Results so that They Can be Nicely Imported into a Spreadsheet.

Discussion in 'Python' started by SMERSH009, Aug 5, 2007.

  1. SMERSH009

    SMERSH009 Guest

    Hi All.
    Let's say I have some badly formatted text called doc:


    doc=
    """
    friendid
    Female

    23 years old

    Los Gatos

    United States
    friendid
    Male

    24 years old

    San Francisco, California

    United States
    """

    How would I get these results to be displayed in a format similar to:
    friendid;Female;23 years old;Los Gatos;United States
    friendid;Male; 24 years old;San Francisco, California;United States


    The latter is a lot easier to organize and can be quickly imported
    into Excel's column format.

    Thanks Much,
    Sam
     
    SMERSH009, Aug 5, 2007
    #1
    1. Advertising

  2. SMERSH009

    John Machin Guest

    On Aug 5, 9:35 am, SMERSH009 <> wrote:
    > Hi All.
    > Let's say I have some badly formatted text called doc:
    >
    > doc=
    > """
    > friendid
    > Female
    >
    > 23 years old
    >
    > Los Gatos
    >
    > United States
    > friendid
    > Male
    >
    > 24 years old
    >
    > San Francisco, California
    >
    > United States
    > """
    >
    > How would I get these results to be displayed in a format similar to:
    > friendid;Female;23 years old;Los Gatos;United States
    > friendid;Male; 24 years old;San Francisco, California;United States
    >
    > The latter is a lot easier to organize and can be quickly imported
    > into Excel's column format.
    >


    You write a script to read your input file and write it out either in
    CSV format, using the Python csv module, or just use
    ';'.join(output_row) if you are sure that there are no ';' characters
    in the data.

    Is this a homework question?
     
    John Machin, Aug 5, 2007
    #2
    1. Advertising

  3. SMERSH009

    Guest

    On Aug 4, 6:35?pm, SMERSH009 <> wrote:
    > Hi All.
    > Let's say I have some badly formatted text called doc:
    >
    > doc=
    > """
    > friendid
    > Female
    >
    > 23 years old
    >
    > Los Gatos
    >
    > United States
    > friendid
    > Male
    >
    > 24 years old
    >
    > San Francisco, California
    >
    > United States
    > """
    >
    > How would I get these results to be displayed in a format similar to:
    > friendid;Female;23 years old;Los Gatos;United States
    > friendid;Male; 24 years old;San Francisco, California;United States
    >
    > The latter is a lot easier to organize and can be quickly imported
    > into Excel's column format.
    >
    > Thanks Much,
    > Sam


    d = doc.split('\n')

    f = [i.split() for i in d if i]

    g = [' '.join(i) for i in f]

    rec = []
    temprec = []
    for i in g:
    if i:
    if i == 'friendid':
    rec.append(temprec)
    temprec =
    else:
    temprec.append(i)
    rec.append(temprec)

    output = [';'.join(i) for i in rec if i]

    for i in output: print i

    ## friendid;Female;23 years old;Los Gatos;United States
    ## friendid;Male;24 years old;San Francisco, California;United
    States
     
    , Aug 5, 2007
    #3
  4. SMERSH009

    Jim Langston Guest

    <> wrote in message
    news:...
    > On Aug 4, 6:35?pm, SMERSH009 <> wrote:
    >> Hi All.
    >> Let's say I have some badly formatted text called doc:
    >>
    >> doc=
    >> """
    >> friendid
    >> Female
    >>
    >> 23 years old
    >>
    >> Los Gatos
    >>
    >> United States
    >> friendid
    >> Male
    >>
    >> 24 years old
    >>
    >> San Francisco, California
    >>
    >> United States
    >> """
    >>
    >> How would I get these results to be displayed in a format similar to:
    >> friendid;Female;23 years old;Los Gatos;United States
    >> friendid;Male; 24 years old;San Francisco, California;United States
    >>
    >> The latter is a lot easier to organize and can be quickly imported
    >> into Excel's column format.
    >>
    >> Thanks Much,
    >> Sam

    >
    > d = doc.split('\n')
    >
    > f = [i.split() for i in d if i]
    >
    > g = [' '.join(i) for i in f]
    >
    > rec = []
    > temprec = []
    > for i in g:
    > if i:
    > if i == 'friendid':
    > rec.append(temprec)
    > temprec =
    > else:
    > temprec.append(i)
    > rec.append(temprec)
    >
    > output = [';'.join(i) for i in rec if i]
    >
    > for i in output: print i
    >
    > ## friendid;Female;23 years old;Los Gatos;United States
    > ## friendid;Male;24 years old;San Francisco, California;United States


    also, I would suggest you use CSV format. CSV stands for "Comma Seperated
    Variable" and Excel can load such a sheet directly.

    Instead of seperating using ; seperate using , Of course, this provides a
    problem when there is a , in a string. Resolution is to quote the string.
    Being such, you can just go ahead and quote all strings. So you would want
    the output to be:

    "friendid","Female","23 years old","Los Gatos","United States"
    "friendid","Male","24 years old","San Francisco, California","United States"

    Numbers should not be quoted if you wish to treat them as numeric and not
    text.
     
    Jim Langston, Aug 5, 2007
    #4
  5. SMERSH009

    Guest

    On Aug 4, 9:21?pm, "Jim Langston" <> wrote:
    > <> wrote in message
    >
    > news:...
    >
    >
    >
    >
    >
    > > On Aug 4, 6:35?pm, SMERSH009 <> wrote:
    > >> Hi All.
    > >> Let's say I have some badly formatted text called doc:

    >
    > >> doc=
    > >> """
    > >> friendid
    > >> Female

    >
    > >> 23 years old

    >
    > >> Los Gatos

    >
    > >> United States
    > >> friendid
    > >> Male

    >
    > >> 24 years old

    >
    > >> San Francisco, California

    >
    > >> United States
    > >> """

    >
    > >> How would I get these results to be displayed in a format similar to:
    > >> friendid;Female;23 years old;Los Gatos;United States
    > >> friendid;Male; 24 years old;San Francisco, California;United States

    >
    > >> The latter is a lot easier to organize and can be quickly imported
    > >> into Excel's column format.

    >
    > >> Thanks Much,
    > >> Sam

    >
    > > d = doc.split('\n')

    >
    > > f = [i.split() for i in d if i]

    >
    > > g = [' '.join(i) for i in f]

    >
    > > rec = []
    > > temprec = []
    > > for i in g:
    > > if i:
    > > if i == 'friendid':
    > > rec.append(temprec)
    > > temprec =
    > > else:
    > > temprec.append(i)
    > > rec.append(temprec)

    >
    > > output = [';'.join(i) for i in rec if i]

    >
    > > for i in output: print i

    >
    > > ## friendid;Female;23 years old;Los Gatos;United States
    > > ## friendid;Male;24 years old;San Francisco, California;United States

    >
    > also, I would suggest you use CSV format.


    Well, the OP asked for a specific format. One is not
    always at liberty to change it.

    > CSV stands for "Comma Seperated
    > Variable" and Excel can load such a sheet directly.


    And Excel can load the shown format directly also,
    just specify the delimiter.

    >
    > Instead of seperating using ; seperate using , Of course, this provides a
    > problem when there is a , in a string.


    Which explains the popularity of using tabs as delimiters.
    The data deliverable specification I use at work
    uses the pipe character | which never appears as data
    in this particular application.

    > Resolution is to quote the string.


    Which makes the file bigger and isn't necessary
    when tabs and pipes are used as delimiters.

    > Being such, you can just go ahead and quote all strings. So you would want
    > the output to be:
    >
    > "friendid","Female","23 years old","Los Gatos","United States"
    > "friendid","Male","24 years old","San Francisco, California","United States"


    Which I would do if I had a specification that
    demanded it or was making files for others. For my
    own use, I wouldn't bother as it's unnecessary work.

    >
    > Numbers should not be quoted if you wish to treat them as numeric and not
    > text.


    A good reason not to use quotes at all. Besides which,
    Excel can handle that also.
     
    , Aug 5, 2007
    #5
  6. SMERSH009

    SMERSH009 Guest

    On Aug 4, 8:25 pm, "" <> wrote:
    > On Aug 4, 9:21?pm, "Jim Langston" <> wrote:
    >
    >
    >
    > > <> wrote in message

    >
    > >news:...

    >
    > > > On Aug 4, 6:35?pm, SMERSH009 <> wrote:
    > > >> Hi All.
    > > >> Let's say I have some badly formatted text called doc:

    >
    > > >> doc=
    > > >> """
    > > >> friendid
    > > >> Female

    >
    > > >> 23 years old

    >
    > > >> Los Gatos

    >
    > > >> United States
    > > >> friendid
    > > >> Male

    >
    > > >> 24 years old

    >
    > > >> San Francisco, California

    >
    > > >> United States
    > > >> """

    >
    > > >> How would I get these results to be displayed in a format similar to:
    > > >> friendid;Female;23 years old;Los Gatos;United States
    > > >> friendid;Male; 24 years old;San Francisco, California;United States

    >
    > > >> The latter is a lot easier to organize and can be quickly imported
    > > >> into Excel's column format.

    >
    > > >> Thanks Much,
    > > >> Sam

    >
    > > > d = doc.split('\n')

    >
    > > > f = [i.split() for i in d if i]

    >
    > > > g = [' '.join(i) for i in f]

    >
    > > > rec = []
    > > > temprec = []
    > > > for i in g:
    > > > if i:
    > > > if i == 'friendid':
    > > > rec.append(temprec)
    > > > temprec =
    > > > else:
    > > > temprec.append(i)
    > > > rec.append(temprec)

    >
    > > > output = [';'.join(i) for i in rec if i]

    >
    > > > for i in output: print i

    >
    > > > ## friendid;Female;23 years old;Los Gatos;United States
    > > > ## friendid;Male;24 years old;San Francisco, California;United States

    >
    > > also, I would suggest you use CSV format.

    >
    > Well, the OP asked for a specific format. One is not
    > always at liberty to change it.
    >
    > > CSV stands for "Comma Seperated
    > > Variable" and Excel can load such a sheet directly.

    >
    > And Excel can load the shown format directly also,
    > just specify the delimiter.
    >
    >
    >
    > > Instead of seperating using ; seperate using , Of course, this provides a
    > > problem when there is a , in a string.

    >
    > Which explains the popularity of using tabs as delimiters.
    > The data deliverable specification I use at work
    > uses the pipe character | which never appears as data
    > in this particular application.
    >
    > > Resolution is to quote the string.

    >
    > Which makes the file bigger and isn't necessary
    > when tabs and pipes are used as delimiters.
    >
    > > Being such, you can just go ahead and quote all strings. So you would want
    > > the output to be:

    >
    > > "friendid","Female","23 years old","Los Gatos","United States"
    > > "friendid","Male","24 years old","San Francisco, California","United States"

    >
    > Which I would do if I had a specification that
    > demanded it or was making files for others. For my
    > own use, I wouldn't bother as it's unnecessary work.
    >
    >
    >
    > > Numbers should not be quoted if you wish to treat them as numeric and not
    > > text.

    >
    > A good reason not to use quotes at all. Besides which,
    > Excel can handle that also.


    Thanks for all your posts guys.
    mensanator's was the most helpful, and I only ended up needing to use
    a few lines from that code.
    The only question that remains for me--and this is just for my
    knowledge-- what does the "if i" mean in this code snippet?
    f = [i.split() for i in d if i]
    How is it helpful to leave a dangling "if i"? Why not just f =
    [i.split() for i in d]?

    And yes John, this was indeed a "homework question." It was for my
    daughter's preschool. You are going to help her ace her beginner
    Python class! (No, this was not a homework question).
     
    SMERSH009, Aug 5, 2007
    #6
  7. SMERSH009

    Guest

    On Aug 5, 4:06?am, SMERSH009 <> wrote:
    > On Aug 4, 8:25 pm, "" <> wrote:
    >
    >
    >
    >
    >
    > > On Aug 4, 9:21?pm, "Jim Langston" <> wrote:

    >
    > > > <> wrote in message

    >
    > > >news:...

    >
    > > > > On Aug 4, 6:35?pm, SMERSH009 <> wrote:
    > > > >> Hi All.
    > > > >> Let's say I have some badly formatted text called doc:

    >
    > > > >> doc=
    > > > >> """
    > > > >> friendid
    > > > >> Female

    >
    > > > >> 23 years old

    >
    > > > >> Los Gatos

    >
    > > > >> United States
    > > > >> friendid
    > > > >> Male

    >
    > > > >> 24 years old

    >
    > > > >> San Francisco, California

    >
    > > > >> United States
    > > > >> """

    >
    > > > >> How would I get these results to be displayed in a format similar to:
    > > > >> friendid;Female;23 years old;Los Gatos;United States
    > > > >> friendid;Male; 24 years old;San Francisco, California;United States

    >
    > > > >> The latter is a lot easier to organize and can be quickly imported
    > > > >> into Excel's column format.

    >
    > > > >> Thanks Much,
    > > > >> Sam

    >
    > > > > d = doc.split('\n')

    >
    > > > > f = [i.split() for i in d if i]

    >
    > > > > g = [' '.join(i) for i in f]

    >
    > > > > rec = []
    > > > > temprec = []
    > > > > for i in g:
    > > > > if i:
    > > > > if i == 'friendid':
    > > > > rec.append(temprec)
    > > > > temprec =
    > > > > else:
    > > > > temprec.append(i)
    > > > > rec.append(temprec)

    >
    > > > > output = [';'.join(i) for i in rec if i]

    >
    > > > > for i in output: print i

    >
    > > > > ## friendid;Female;23 years old;Los Gatos;United States
    > > > > ## friendid;Male;24 years old;San Francisco, California;United States

    >
    > > > also, I would suggest you use CSV format.

    >
    > > Well, the OP asked for a specific format. One is not
    > > always at liberty to change it.

    >
    > > > CSV stands for "Comma Seperated
    > > > Variable" and Excel can load such a sheet directly.

    >
    > > And Excel can load the shown format directly also,
    > > just specify the delimiter.

    >
    > > > Instead of seperating using ; seperate using , Of course, this provides a
    > > > problem when there is a , in a string.

    >
    > > Which explains the popularity of using tabs as delimiters.
    > > The data deliverable specification I use at work
    > > uses the pipe character | which never appears as data
    > > in this particular application.

    >
    > > > Resolution is to quote the string.

    >
    > > Which makes the file bigger and isn't necessary
    > > when tabs and pipes are used as delimiters.

    >
    > > > Being such, you can just go ahead and quote all strings. So you would want
    > > > the output to be:

    >
    > > > "friendid","Female","23 years old","Los Gatos","United States"
    > > > "friendid","Male","24 years old","San Francisco, California","United States"

    >
    > > Which I would do if I had a specification that
    > > demanded it or was making files for others. For my
    > > own use, I wouldn't bother as it's unnecessary work.

    >
    > > > Numbers should not be quoted if you wish to treat them as numeric and not
    > > > text.

    >
    > > A good reason not to use quotes at all. Besides which,
    > > Excel can handle that also.

    >
    > Thanks for all your posts guys.
    > mensanator's was the most helpful, and I only ended up needing to use
    > a few lines from that code.
    > The only question that remains for me--and this is just for my
    > knowledge-- what does the "if i" mean in this code snippet?
    > f = [i.split() for i in d if i]


    d is a list of strings, some of which are empty (caused
    by the previous split). "if i" evaluates False for
    an empty string, so that particular i will not get
    split and ends up ommitted from f.

    > How is it helpful to leave a dangling "if i"? Why not just f =
    > [i.split() for i in d]?


    Compare the result of having "if i":
    [[], ['friendid'], ['Female'], ['23', 'years', 'old'], ['Los',
    'Gatos'], ['United', 'States'], ['friendid'], ['Male'], ['24',
    'years', 'old'], ['San', 'Francisco,', 'California'], ['United',
    'States']]

    to not having it:
    >>> ff = [i.split() for i in d]
    >>> ff

    [[], ['friendid'], ['Female'], [], [], ['23', 'years', 'old'], [], [],
    ['Los', 'Gatos'], [], [], ['United', 'States'], ['friendid'],
    ['Male'], [], [], ['24', 'years', 'old'], [], [], ['San',
    'Francisco,', 'California'], [], [], ['United', 'States'], []]

    All the blank lines became empty lists which actually
    wouldn't change anything if they get joined. But that
    won't always be the case. Knowing how to use conditionals
    in list comprehensions is worth knowing.

    Even with the "if i" included, we end up with an
    empty list at the start. This because the first "blank"
    line wasn't blank, it was a space, so it passes the
    "if i" test.

    >
    > And yes John, this was indeed a "homework question." It was for my
    > daughter's preschool. You are going to help her ace her beginner
    > Python class! (No, this was not a homework question).
     
    , Aug 5, 2007
    #7
  8. <> wrote:
    ...
    > Even with the "if i" included, we end up with an
    > empty list at the start. This because the first "blank"
    > line wasn't blank, it was a space, so it passes the
    > "if i" test.


    ....and you can fix that by changing the test to [... if i.split()].


    Alex
     
    Alex Martelli, Aug 5, 2007
    #8
  9. Re: Formatting Results so that They Can be Nicely Imported into aSpreadsheet.

    En Sun, 05 Aug 2007 06:06:54 -0300, SMERSH009 <>
    escribió:

    > The only question that remains for me--and this is just for my
    > knowledge-- what does the "if i" mean in this code snippet?
    > f = [i.split() for i in d if i]
    > How is it helpful to leave a dangling "if i"? Why not just f =
    > [i.split() for i in d]?


    `if i` means `if i is considered true`. In this case we are talking about
    strings: "" is false and all other strings are true. So this is a way to
    say `if i is not the empty string`, effectively filtering out empty lines.
    Perhaps using more meaningful names for variables makes the code more
    clear:

    exploded_lines = [line.split() for line in lines if line]

    --
    Gabriel Genellina
     
    Gabriel Genellina, Aug 12, 2007
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Paul F. Johnson
    Replies:
    4
    Views:
    345
  2. Martin M.
    Replies:
    4
    Views:
    345
    Simon Brunning
    Dec 15, 2005
  3. Dun Peal
    Replies:
    10
    Views:
    461
    Chris Rebert
    May 3, 2011
  4. ben
    Replies:
    1
    Views:
    89
    Ray Costanzo [MVP]
    Sep 10, 2004
  5. Volker Nicolai
    Replies:
    9
    Views:
    966
    Fabian Pilkowski
    Jul 4, 2005
Loading...

Share This Page