Downloading multiple files based on info extracted from CSV

Matt Graves

I have a CSV file containing a bunch of URLs that I have to download files from for clients (column 7), along with the clients' names (column 0). I tried making a script to go down the .csv file, download each file from column 7, and save it as [clientname].csv.

I am relatively new to python, so this may be way off but…






import urllib
import csv
urls = []
clientname = []

###This will set column 7 to be a list of urls
with open('clients.csv', 'r') as f:
    reader = csv.reader(f)
    for column in reader:
        urls.append(column[7])

###And this will set column 0 as a list of client names
with open('clients.csv', 'r') as g:
    reader = csv.reader(g)
    for column in reader:
        clientname.append(column[0])

###This SHOULD plug in the URL for F, and the client name for G.
def downloadFile(urls, clientname):
    urllib.urlretrieve(f, "%g.csv") % clientname


downloadFile(f,g)



When I run it, I get: AttributeError: 'file' object has no attribute 'strip'
 
Mark Lawrence

I have a CSV file containing a bunch of URLs I have to download a file from for clients (Column 7) and the clients names (Column 0) I tried making a script to go down the .csv file and just download each file from column 7, and save the file as [clientname].csv

I am relatively new to python, so this may be way off but…

import urllib
import csv
urls = []
clientname = []

I assume you mean clientnames (plural), since it holds a list.
###This will set column 7 to be a list of urls
with open('clients.csv', 'r') as f:
    reader = csv.reader(f)
    for column in reader:
        urls.append(column[7])

###And this will set column 0 as a list of client names
with open('clients.csv', 'r') as g:
    reader = csv.reader(g)
    for column in reader:
        clientname.append(column[0])

You could do the above in one hit.

with open('clients.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        urls.append(row[7])
        clientnames.append(row[0])

Note that you're reading rows, not columns.
###This SHOULD plug in the URL for F, and the client name for G.

What makes you think this? f and g are file handles.
def downloadFile(urls, clientname):
    urllib.urlretrieve(f, "%g.csv") % clientname

If you want one file at a time you'd want url, clientname.
downloadFile(f,g)

I think you want something like:

for url, clientname in zip(urls, clientnames):
    downloadFile(url, clientname)

When I run it, I get: AttributeError: 'file' object has no attribute 'strip'

When you get a traceback like this, please cut and paste all of it, not
just the last line. Here it seems likely that your call to downloadFile
doesn't like you passing in the file handles, as I've explained above (I
hope :)
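
For anyone following along, the two suggestions above (read each row once, then pair the lists with zip) can be sketched roughly as below. This is only an illustration under some assumptions: it uses Python 3 syntax, the in-memory SAMPLE data and the example.com URLs are made up, and read_clients is a hypothetical helper name that does not appear in the original script. It prints the pairs instead of downloading anything.

```python
import csv
import io

# Hypothetical sample data standing in for clients.csv: the client name is
# in column 0 and the URL in column 7, as described in the original post.
SAMPLE = (
    "Acme,a,b,c,d,e,f,http://example.com/acme.csv\n"
    "Globex,a,b,c,d,e,f,http://example.com/globex.csv\n"
)


def read_clients(f):
    """Read the file once, collecting both columns from each row."""
    urls = []
    clientnames = []
    for row in csv.reader(f):
        urls.append(row[7])
        clientnames.append(row[0])
    return urls, clientnames


urls, clientnames = read_clients(io.StringIO(SAMPLE))

# zip() pairs the n-th URL with the n-th client name, one pair per iteration.
pairs = list(zip(urls, clientnames))
for url, clientname in pairs:
    print(clientname, "<-", url)
```

With a real file you would pass an open file object instead of the StringIO, and call the download function inside the loop where the print is.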
 
Chris Angelico

###This SHOULD plug in the URL for F, and the client name for G.
def downloadFile(urls, clientname):
    urllib.urlretrieve(f, "%g.csv") % clientname

downloadFile(f,g)

When I run it, I get : AttributeError: 'file' object has no attribute 'strip'

When showing errors like this, you really need to copy and paste.
Fortunately, I can see where the problem is, here. You're referencing
the file object still in f, which is now a closed file object, instead
of the parameter urls.

But you're also passing f and g as parameters, instead of urls and
clientname. In fact, the downloadFile function isn't really achieving
much; you'd do better to simply inline its code into the main routine
and save yourself the hassle.

While you're at it, there are two more problems in that line of code.
Firstly, you're going to save everything into a file called "%g.csv",
and then try to modulo the return value of urlretrieve with the
clientname; I think you want the close parens at the very end of that
line. And secondly, %g is a floating-point encoder - you want %s here,
or simply use string concatenation:

urllib.urlretrieve(urls, clientname + ".csv")

Except that those are your lists, so that won't work without another
change. We'll fix that later...
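
The filename point can be checked in isolation, without touching the network. The snippet below is a small sketch in Python 3 syntax; the client name "Acme" is a made-up value and g_failed is just a flag introduced for the demonstration.

```python
clientname = "Acme"  # hypothetical client name, for illustration only

# %s formats any value as a string; here it matches plain concatenation.
with_percent_s = "%s.csv" % clientname
with_concat = clientname + ".csv"

# %g expects a number, so giving it a string raises TypeError.
try:
    "%g.csv" % clientname
    g_failed = False
except TypeError:
    g_failed = True

print(with_percent_s, with_concat, g_failed)
```

Either of the first two forms gives a usable filename; which one to prefer is mostly a matter of taste.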
###This will set column 7 to be a list of urls
with open('clients.csv', 'r') as f:
    reader = csv.reader(f)
    for column in reader:
        urls.append(column[7])

###And this will set column 0 as a list of client names
with open('clients.csv', 'r') as g:
    reader = csv.reader(g)
    for column in reader:
        clientname.append(column[0])

You're reading the file twice. There's no reason to do that; you can
read both columns at once. (By the way, what you're iterating over is
actually rows; for each row that comes out of the reader, do something
with one element from it. So calling it "column" is a bit confusing.)

So now we come to a choice. Question: Is it okay to hold the CSV file
open while you do the downloading? If it is, you can simplify the code
way way down:

import urllib
import csv

# You actually could get away with not using a with
# block here, but may as well keep it for best practice
with open('clients.csv') as f:
    for client in csv.reader(f):
        urllib.urlretrieve(client[7], client[0] + ".csv")

Yep, that's it! That's all you need. But retrieving all that might
take a long time, so it might be better to do all your CSV reading
first and only *then* start downloading. In that case, I'd make a
single list of tuples:

import urllib
import csv

clients = []
with open('clients.csv') as f:
    for client in csv.reader(f):
        clients.append((client[7], client[0] + ".csv"))

for client in clients:
    urllib.urlretrieve(client[0], client[1])

And since the "iterate and append to a new list" idiom is so common,
it can be simplified down to a list comprehension; and since "call
this function with this tuple of arguments" is so common, it has its
own syntax. So the code looks like this:

import urllib
import csv

with open('clients.csv') as f:
    # The parentheses are required: each list element is a (url, filename) tuple
    clients = [(client[7], client[0] + ".csv") for client in csv.reader(f)]

for client in clients:
    urllib.urlretrieve(*client)

Again, it's really that simple! :)

Enjoy!

ChrisA
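
A note for readers on Python 3: the code in this thread is Python 2, where urlretrieve lives directly in urllib. In Python 3 it moved to urllib.request. The sketch below shows the read-first, download-later approach in Python 3, assuming the same column layout as the original post (name in column 0, URL in column 7). The function names load_clients and download_all are made up for this example, and there is no error handling, so treat it as a starting point only.

```python
import csv
import urllib.request


def load_clients(csv_path):
    """Return a list of (url, filename) tuples read from the CSV file."""
    with open(csv_path, newline="") as f:
        # The parentheses make each list element a (url, filename) tuple.
        return [(row[7], row[0] + ".csv") for row in csv.reader(f)]


def download_all(clients):
    """Fetch each URL and save it under the matching filename."""
    for client in clients:
        # *client unpacks the tuple into the url and filename arguments.
        urllib.request.urlretrieve(*client)
```

Usage would be something like download_all(load_clients("clients.csv")). Because the CSV file is closed before any download starts, a slow download does not keep it open.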
 
John Gordon

Matt Graves said:
import urllib
import csv
urls = []
clientname = []
###This will set column 7 to be a list of urls
with open('clients.csv', 'r') as f:
    reader = csv.reader(f)
    for column in reader:
        urls.append(column[7])
###And this will set column 0 as a list of client names
with open('clients.csv', 'r') as g:
    reader = csv.reader(g)
    for column in reader:
        clientname.append(column[0])
###This SHOULD plug in the URL for F, and the client name for G.
def downloadFile(urls, clientname):
    urllib.urlretrieve(f, "%g.csv") % clientname

When I run it, I get : AttributeError: 'file' object has no attribute
'strip'

I think you're passing the wrong arguments to downloadFile(). You're
calling downloadFile(f, g), but f and g are file objects. Don't you want
to pass urls and clientname instead?

Even if the correct arguments are passed to downloadFile, I think you're
using them incorrectly. You don't even use the urls argument, and
clientname is supposed to be a list, so why aren't you looping through
it?

You aren't using string interpolation correctly on the call to urlretrieve.
Assuming your intent was to build a string and pass it as the second
argument, you have the close-parenthesis in the wrong place. The call
should look like this:

urllib.urlretrieve(f, "%g.csv" % clientname)

("%g" formats a floating-point value. Did you mean "%s" instead?)
 
Dennis Lee Bieber

def downloadFile(urls, clientname):
    urllib.urlretrieve(f, "%g.csv") % clientname

The most blatant error is this line...

You are calling urllib.urlretrieve passing it arguments of f and the
string "%g.csv".

THEN you are doing a string interpolation on the RESULT.

The line should probably be

urllib.urlretrieve(f, "%s.csv" % clientname)

to apply the string interpolation first, and pass that as the second
argument (also note the %s for /string/ [which, in Python, tends to accept
any argument and produce a general string from it -- using any other format
specification tends to be for cases where you need to match a particular
format... %4.4x if you want zero-filled hex, for example]).
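
The format specifiers mentioned above can be compared side by side. This is just a small sketch in Python 3 syntax with made-up values; the variable names are only there for the demonstration.

```python
# %s calls str() on its argument, so it accepts numbers and other objects too.
name_from_str = "%s.csv" % "Acme"
name_from_int = "%s.csv" % 42

# %g is for floating-point values and picks a compact representation.
compact_float = "%g" % 2.50

# %4.4x formats an integer as hexadecimal, zero-filled to four digits.
padded_hex = "%4.4x" % 255

print(name_from_str, name_from_int, compact_float, padded_hex)
```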
 
Matt Graves

import urllib
import csv

# You actually could get away with not using a with
# block here, but may as well keep it for best practice
with open('clients.csv') as f:
    for client in csv.reader(f):
        urllib.urlretrieve(client[7], client[0] + ".csv")

Yep, that's it! That's all you need.


Worked perfect. Thank you!
 
