Function to examine the contents of a directory

Discussion in 'Python' started by Tigerstyle, Sep 6, 2012.

  1. Tigerstyle

    Tigerstyle Guest

    Hi guys,

    I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)

    This is the code so far:
    --
    import os

    path = "v:\\workspace\\Python2_Homework03\\src\\"
    dirs = os.listdir(path)
    filenames = {"this.txt", "that.txt", "the_other.txt", "this.doc", "that.doc", "this.pdf", "first.txt", "that.pdf"}
    extensions = []
    for filename in filenames:
        f = open(filename, "w")
        f.write("Some text\n")
        f.close()
        name, ext = os.path.splitext(f.name)
        extensions.append(ext)

    # This would print all the files and directories
    for file in dirs:
        print(file)

    for ext in extensions:
        print("Count for %s: " % ext, extensions.count(ext))

    --

    When I try to get the module to print how many files have each extension, it prints the count for each extension multiple times. Like this:

    this.pdf
    the_other.txt
    this.doc
    that.txt
    this.txt
    that.pdf
    first.txt
    that.doc
    Count for .pdf: 2
    Count for .txt: 4
    Count for .doc: 2
    Count for .txt: 4
    Count for .txt: 4
    Count for .pdf: 2
    Count for .txt: 4
    Count for .doc: 2

    Any help is appreciated.

    T
     
    Tigerstyle, Sep 6, 2012
    #1

  2. Ian Foote

    Ian Foote Guest

    On 06/09/12 15:56, Tigerstyle wrote:
    > Hi guys,
    >
    > I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
    >
    > This is the code so far:
    > --
    > import os
    >
    > path = "v:\\workspace\\Python2_Homework03\\src\\"
    > dirs = os.listdir( path )
    > filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
    > extensions = []

    Try using a set here instead of a list:
    extensions = set()
    > for filename in filenames:
    >     f = open(filename, "w")
    >     f.write("Some text\n")
    >     f.close()
    >     name, ext = os.path.splitext(f.name)
    >     extensions.append(ext)

    and use:
    extensions.add(ext)

    This should take care of duplicates for you.
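One caveat with this suggestion: a set has no count() method, so the final loop still needs the full list to get the totals. A minimal sketch, assuming the list is kept for counting and the set is used only to decide which extensions to print (the name `unique_extensions` is made up for illustration, and no files are created here):

```python
import os

filenames = ["this.txt", "that.txt", "the_other.txt", "this.doc",
             "that.doc", "this.pdf", "first.txt", "that.pdf"]

# Keep every extension, duplicates included, so they can still be counted.
extensions = [os.path.splitext(name)[1] for name in filenames]

# A set collapses the duplicates, so each extension is reported only once.
unique_extensions = set(extensions)

for ext in sorted(unique_extensions):
    print("Count for %s: " % ext, extensions.count(ext))
```

Sorting the set is optional; it only makes the output order predictable.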

    Regards,
    Ian
     
    Ian Foote, Sep 6, 2012
    #2

  3. MRAB

    MRAB Guest

    On 06/09/2012 15:56, Tigerstyle wrote:
    > Hi guys,
    >
    > I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
    >
    > This is the code so far:
    > --
    > import os
    >
    > path = "v:\\workspace\\Python2_Homework03\\src\\"
    > dirs = os.listdir( path )
    > filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
    > extensions = []
    > for filename in filenames:
    >     f = open(filename, "w")
    >     f.write("Some text\n")
    >     f.close()
    >     name, ext = os.path.splitext(f.name)
    >     extensions.append(ext)
    >
    > # This would print all the files and directories
    > for file in dirs:
    >     print(file)
    >
    > for ext in extensions:
    >     print("Count for %s: " % ext, extensions.count(ext))
    >
    > --
    >
    > When I'm trying to get the module to print how many files each extension has, it prints the count of each ext multiple times for each extension type. Like this:
    >
    > this.pdf
    > the_other.txt
    > this.doc
    > that.txt
    > this.txt
    > that.pdf
    > first.txt
    > that.doc
    > Count for .pdf: 2
    > Count for .txt: 4
    > Count for .doc: 2
    > Count for .txt: 4
    > Count for .txt: 4
    > Count for .pdf: 2
    > Count for .txt: 4
    > Count for .doc: 2
    >

    That's because each extension can occur multiple times in the list.

    Try the Counter class:

    from collections import Counter

    for ext, count in Counter(extensions).items():
        print("Count for %s: " % ext, count)
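For anyone who wants to try the Counter approach without touching the disk, here is a self-contained sketch. It assumes the same eight file names as the original post and only splits the names; nothing is written or read.

```python
import os
from collections import Counter

filenames = ["this.txt", "that.txt", "the_other.txt", "this.doc",
             "that.doc", "this.pdf", "first.txt", "that.pdf"]

# Counter tallies each distinct extension in a single pass over the list.
counts = Counter(os.path.splitext(name)[1] for name in filenames)

for ext, count in sorted(counts.items()):
    print("Count for %s: " % ext, count)

# Looking up an extension that never occurred gives 0 instead of a KeyError.
print("Count for .zip: ", counts[".zip"])
```

Each extension appears once in the output because a Counter, like a dict, holds one entry per key.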
     
    MRAB, Sep 6, 2012
    #3
  4. Tigerstyle

    Tigerstyle Guest

    Thanks, just what I was looking for :)

    T

    On Thursday, 6 September 2012 at 17:20:27 UTC+2, MRAB wrote:
    > On 06/09/2012 15:56, Tigerstyle wrote:
    > > Hi guys,
    > >
    > > I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
    > >
    > > This is the code so far:
    > > --
    > > import os
    > >
    > > path = "v:\\workspace\\Python2_Homework03\\src\\"
    > > dirs = os.listdir( path )
    > > filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
    > > extensions = []
    > > for filename in filenames:
    > >     f = open(filename, "w")
    > >     f.write("Some text\n")
    > >     f.close()
    > >     name, ext = os.path.splitext(f.name)
    > >     extensions.append(ext)
    > >
    > > # This would print all the files and directories
    > > for file in dirs:
    > >     print(file)
    > >
    > > for ext in extensions:
    > >     print("Count for %s: " % ext, extensions.count(ext))
    > >
    > > --
    > >
    > > When I'm trying to get the module to print how many files each extension has, it prints the count of each ext multiple times for each extension type. Like this:
    > >
    > > this.pdf
    > > the_other.txt
    > > this.doc
    > > that.txt
    > > this.txt
    > > that.pdf
    > > first.txt
    > > that.doc
    > > Count for .pdf: 2
    > > Count for .txt: 4
    > > Count for .doc: 2
    > > Count for .txt: 4
    > > Count for .txt: 4
    > > Count for .pdf: 2
    > > Count for .txt: 4
    > > Count for .doc: 2
    > >
    >
    > That's because each extension can occur multiple times in the list.
    >
    > Try the Counter class:
    >
    > from collections import Counter
    >
    > for ext, count in Counter(extensions).items():
    >     print("Count for %s: " % ext, count)
     
    Tigerstyle, Sep 6, 2012
    #4
  6. On Thu, 6 Sep 2012 07:56:29 -0700 (PDT), Tigerstyle declaimed the
    following in gmane.comp.python.general:


    > extensions.append(ext)
    >

    Don't append an ext if it is already in the list...

    if ext not in extensions: extensions.append(ext)
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Sep 6, 2012
    #6
  7. On Fri, Sep 7, 2012 at 12:56 AM, Tigerstyle wrote:
    > I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)


    If you haven't already, look into the Python 'dict' type; you may find
    it easier to work with for this sort of job. You can map an extension
    ("txt") to its count (4) directly.
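A minimal sketch of the dict idea, assuming a helper named `count_by_extension` (a made-up name) that takes a list of file names. Note that os.path.splitext keeps the leading dot, so the keys here are ".txt" rather than "txt".

```python
import os


def count_by_extension(filenames):
    """Map each extension (with its leading dot) to how often it occurs."""
    counts = {}
    for name in filenames:
        ext = os.path.splitext(name)[1]
        # get() supplies 0 the first time an extension is seen.
        counts[ext] = counts.get(ext, 0) + 1
    return counts


names = ["this.txt", "that.txt", "the_other.txt", "this.doc",
         "that.doc", "this.pdf", "first.txt", "that.pdf"]

for ext, count in sorted(count_by_extension(names).items()):
    print("Count for %s: " % ext, count)
```

Because a dict has exactly one entry per key, each extension is printed once.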

    ChrisA
     
    Chris Angelico, Sep 6, 2012
    #7
  8. Tigerstyle

    Tigerstyle Guest

    On Thursday, 6 September 2012 at 16:56:29 UTC+2, Tigerstyle wrote:
    > Hi guys,
    >
    > I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
    >
    > This is the code so far:
    > --
    > import os
    >
    > path = "v:\\workspace\\Python2_Homework03\\src\\"
    > dirs = os.listdir( path )
    > filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
    > extensions = []
    > for filename in filenames:
    >     f = open(filename, "w")
    >     f.write("Some text\n")
    >     f.close()
    >     name, ext = os.path.splitext(f.name)
    >     extensions.append(ext)
    >
    > # This would print all the files and directories
    > for file in dirs:
    >     print(file)
    >
    > for ext in extensions:
    >     print("Count for %s: " % ext, extensions.count(ext))
    >
    > --
    >
    > When I'm trying to get the module to print how many files each extension has, it prints the count of each ext multiple times for each extension type. Like this:
    >
    > this.pdf
    > the_other.txt
    > this.doc
    > that.txt
    > this.txt
    > that.pdf
    > first.txt
    > that.doc
    > Count for .pdf: 2
    > Count for .txt: 4
    > Count for .doc: 2
    > Count for .txt: 4
    > Count for .txt: 4
    > Count for .pdf: 2
    > Count for .txt: 4
    > Count for .doc: 2
    >
    > Any help is appreciated.
    >
    > T
     
    Tigerstyle, Sep 7, 2012
    #8
  9. Tigerstyle

    Tigerstyle Guest

    Ok I'm now totally stuck.

    This is the code:

    ---
    import os
    from collections import Counter

    path = ":c\\mypath\dir"
    dirs = os.listdir( path )
    filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
    extensions = []
    for filename in filenames:
        f = open(filename, "w")
        f.write("Some text\n")
        f.close()
        name, ext = os.path.splitext(f.name)
        extensions.append(ext)

    # This would print all the files and directories
    for file in dirs:
        print(file)

    for ext, count in Counter(extensions).items():
        print("Count for %s: " % ext, count)

    ---

    I need to make this module into a function and write a separate module to verify by testing that the function gives correct results.

    Help and pointers are much appreciated.

    T
     
    Tigerstyle, Sep 7, 2012
    #9
  10. On Fri, 7 Sep 2012 07:28:03 -0700 (PDT), Tigerstyle declaimed the
    following in gmane.comp.python.general:

    > Ok I'm now totally stuck.
    >
    > This is the code:
    >

    This code is full of errors...

    > ---
    > import os
    > from collections import Counter
    >
    > path = ":c\\mypath\dir"


    Not a valid Windows path. The format should be "c:\mypath\dir"
    (actually, to use \ you should probably declare it a raw string -- much
    simpler, since all the python/OS functions don't care, is to use / -- as
    in "c:/mypath/dir")
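To illustrate the point about path spelling, here is a short sketch. It uses ntpath, the standard-library module that implements Windows path rules and can be imported on any platform; the directory itself is hypothetical and is never accessed.

```python
import ntpath  # Windows flavour of os.path, usable on any platform

escaped = "c:\\mypath\\dir"   # every backslash doubled
raw = r"c:\mypath\dir"        # raw string: backslashes taken literally
forward = "c:/mypath/dir"     # forward slashes, also accepted on Windows

print(escaped == raw)                    # True
print(ntpath.normpath(forward) == raw)   # True

# Why a bare backslash is risky: "\t" is read as a tab character.
print(len("c:\temp"))    # 6, not 7
print(len(r"c:\temp"))   # 7
```

The forward-slash form avoids the escaping question altogether, which is why it is often the simplest choice.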

    > dirs = os.listdir( path )


    Warning, this will also list items that are not files (like
    subdirectories). (hence "dirs" is a misleading name)
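One way to keep subdirectories out of the result is to test each name with os.path.isfile. The sketch below is only an illustration: `list_files` is a made-up helper name, and a throwaway temporary directory stands in for the real one.

```python
import os
import tempfile


def list_files(path):
    """Return the sorted names of regular files directly inside path."""
    return sorted(
        name for name in os.listdir(path)
        # listdir() returns bare names, so join them with path before testing.
        if os.path.isfile(os.path.join(path, name))
    )


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))
    open(os.path.join(tmp, "a.txt"), "w").close()

    print(sorted(os.listdir(tmp)))  # includes "subdir"
    print(list_files(tmp))          # only "a.txt"
```

The join matters: calling os.path.isfile on the bare name would test it against the current working directory rather than against path.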


    > filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
    > extensions = []
    > for filename in filenames:
    >     f = open(filename, "w")
    >     f.write("Some text\n")
    >     f.close()
    >     name, ext = os.path.splitext(f.name)
    >     extensions.append(ext)
    >
    > # This would print all the files and directories
    > for file in dirs:
    >     print(file)


    This prints the file/directory /name/

    NOTE: you grabbed the list of names BEFORE you created your test
    data files, so...

    >
    >
    >
    > for ext, count in Counter(extensions).items():
    >     print("Count for %s: " % ext, count)
    >

    .... this is not really a count of files grouped by extension IN the
    directory -- this is only the count based on the file names you defined
    to be created.

    I'm not going to create test files, nor a test suite, and what I
    have done is still too much... but...

    -=-=-=-=-
    import os
    import collections

    PATH = "e:/userdata/wulfraed/my documents/python progs"

    # Names of everything in PATH (files and subdirectories), sorted
    fids = os.listdir(PATH)
    fids.sort()

    # Width of the longest name, used to right-justify the name column
    nmlen = max([len(f) for f in fids])
    fmt = "%%%ss %%10s" % nmlen

    cntr = collections.Counter()

    for fid in fids:
        prefix, ext = os.path.splitext(fid)
        print(fmt % (prefix, ext))
        cntr.update([ext])

    print("\n\n")

    for ext, cnt in cntr.items():
        print("%10s %10s" % (ext, cnt))
    -=-=-=-=-

    .project
    .pydevproject
    .settings
    ABA .py
    ADC .py
    BookList .zip
    CGIServer
    DGen .py
    DiskCatalog .py
    DiskCatalog .pyc
    Dload .py
    Firearms .csv
    GWhist .py
    HTML .py
    Hanoi .py
    Hanoi .pyc
    HierHead .py
    Intervals .py
    MBX_Split .py
    MySQLTest .py
    MySQLTest .pyc
    MySQLdb .html
    MySQLdb_files
    NIM1 .py
    NumberPrinter .py
    PhotoFrame .py
    Probability .py
    ProgressBar .py
    ProgressBar2 .py
    RandomScores .py
    SQL .py
    SQLiteTest .py
    SampleData .txt
    SampleFormat .tsv
    Script1 .py
    Script2 .py
    Script3 .py
    Script3 .pyc
    Sociable_Chain .py
    Sociable_Chain .pyc
    Stereo .py
    TAGS .py
    azel_interp .py
    binadd .py
    binadd2 .py
    bsddb-test .py
    cgiform .py
    chessclock .py
    counter .py
    counterthread .py
    cp .py
    data .txt
    databasetest .py
    databasetest2 .py
    dbfail .py
    dbg .py
    dbg .pyc
    dbtst .py
    dirwalk .py
    execsub .py
    extractor .py
    filecnt .py
    filter .py
    fulldicttest .py
    h2b .py
    h2b .pyc
    headers .py
    highScore .py
    htmlparse .py
    i2b .py
    i2b .pyc
    infile1 .tsv
    infile2 .tsv
    infile3 .tsv
    int2wrd .py
    int2wrd .pyc
    int2wrd2 .py
    int2wrd2 .pyc
    intervalfile .txt
    invoice .csv
    junk .py
    justify .py
    linkedlist .py
    llist .py
    main .py
    make_ou_class .py
    make_ou_class .pyc
    mileage .py
    minmax .py
    mofn .py
    mofn.py .zip
    movefiles .py
    moving .py
    mptest1 .py
    myhtmlparser .py
    myhtmlparser .pyc
    mytest .py
    mytest .pyc
    node .py
    node .pyc
    pcdtojpeg .py
    pst .py
    queens1 .py
    queens2 .py
    queens2.py .zip
    query .py
    railroad .py
    rpg .py
    run .py
    s .txt
    sample .tsv
    scramble .py
    scratch .db
    script1 .html
    script1 .sql
    script2 .html
    setuptools-0.6c6-py2.4 .egg
    sgml .py
    spam .py
    sqltest .py
    sqrot .py
    src
    sub .py
    sub_p1 .py
    sub_p3 .py
    sudoku .py
    sudoku.py .bak
    sudoku .pyc
    summup_dict1
    summup_dict2
    summup_dict2b
    summup_dict3
    summup_list
    t .dat
    t .py
    tabspace .py
    tabspace .pyc
    tdriver .py
    test .csd
    test .db
    test .sql
    test .txt
    testABA .py
    testABA .pyc
    tgsetup .py
    thread .py
    threadsample .py
    threadswap .py
    timetest .py
    timing .py
    trips .dat
    update_log
    ut_00 .py
    wordprob .py



    12
    .pyc 17
    .bak 1
    .sql 2
    .tsv 5
    .csv 2
    .db 2
    .dat 2
    .py 98
    .txt 5
    .html 3
    .csd 1
    .egg 1
    .zip 3
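For the remaining part of the question (turning the script into a function and testing it from a separate module), here is one possible sketch, not a definitive answer to the homework. The names `count_extensions` and `CountExtensionsTest` are made up; the function and the test are shown in one block for brevity but would normally live in two modules. The test builds its files in a temporary directory, so it does not depend on what is already on disk.

```python
import os
import tempfile
import unittest
from collections import Counter


def count_extensions(path="."):
    """Count the regular files directly inside path, grouped by extension."""
    counts = Counter()
    for name in os.listdir(path):
        # Skip subdirectories; only regular files are counted.
        if os.path.isfile(os.path.join(path, name)):
            counts[os.path.splitext(name)[1]] += 1
    return counts


class CountExtensionsTest(unittest.TestCase):
    def test_counts_match_created_files(self):
        names = ["this.txt", "that.txt", "the_other.txt", "this.doc",
                 "that.doc", "this.pdf", "first.txt", "that.pdf"]
        with tempfile.TemporaryDirectory() as tmp:
            # Create the test files first, then examine the directory.
            for name in names:
                with open(os.path.join(tmp, name), "w") as f:
                    f.write("Some text\n")
            # A directory whose name looks like a file must be ignored.
            os.mkdir(os.path.join(tmp, "folder.txt"))

            self.assertEqual(dict(count_extensions(tmp)),
                             {".txt": 4, ".doc": 2, ".pdf": 2})
```

Saved as a module, the test can be run with `python -m unittest` followed by the module name. Creating the files before calling the function avoids the ordering problem noted above, where the directory listing was taken before the test data existed.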
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Sep 7, 2012
    #10