Download the "head" of a large file?

erikcw · Jul 27, 2009

I'm trying to figure out how to download just the first few lines of a
large (50 MB) text file from a server to save bandwidth. Can Python do
this?

Something like the Python equivalent of:

curl http://url.com/file.xml | head -c 2048

Thanks!
Erik

Ben Charrow · Jul 27, 2009

erikcw said:
...download just the first few lines of a large (50 MB) text file from a
server to save bandwidth... Something like the Python equivalent of
curl http://url.com/file.xml | head -c 2048

If you're OK calling curl and head from within python:

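from subprocess import Popen, PIPE
url = "http://docs.python.org/"
# curl writes the page to a pipe; head keeps only the first 1024 bytes
p1 = Popen(["curl", url], stdout=PIPE, stderr=PIPE)
p2 = Popen(["head", "-c", "1024"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # so curl receives SIGPIPE once head exits
output = p2.communicate()[0]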

If you want a pure python approach:

import urllib2
url = "http://docs.python.org/"
req = urllib2.Request(url)
f = urllib2.urlopen(req)
f.read(1024)

HTH,
Ben

John Yeung · Jul 27, 2009

I'm trying to figure out how to download just the first few lines of a
large (50 MB) text file from a server to save bandwidth. Can Python do
this?

Something like the Python equivalent of:

curl http://url.com/file.xml | head -c 2048

urllib.urlopen gives you a file-like object, which you can then read
line by line or in fixed-size chunks. For example:

import urllib
chunk = urllib.urlopen('http://url.com/file.xml').read(2048)

At that point, chunk is just bytes, which you can write to a local
file, print, or whatever it is you want.

John
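
John's urllib.urlopen example is written for Python 2. A rough Python 3 sketch of the same idea follows, covering both the "fixed-size chunk" and the "line by line" reading he mentions. The function names read_first_bytes and read_first_lines are made up for illustration, and note that reading only part of the response does not by itself guarantee that the server sends nothing more before the connection is closed.

```python
from itertools import islice
from urllib.request import urlopen


def read_first_bytes(url, num_bytes=2048):
    """Return at most the first num_bytes bytes of the resource at url."""
    with urlopen(url) as response:
        # read(n) stops after n bytes (or at end of data, if shorter)
        return response.read(num_bytes)


def read_first_lines(url, num_lines=10):
    """Return at most the first num_lines lines (as bytes) of the resource."""
    with urlopen(url) as response:
        # The response object is iterable line by line, like a file
        return list(islice(response, num_lines))
```

For example, chunk = read_first_bytes('http://url.com/file.xml') gives a bytes object that can be written to a local file or decoded and printed.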

Gabriel Genellina · Jul 28, 2009

On Mon, 27 Jul 2009 19:40:25 -0300, John Yeung wrote:

urllib.urlopen gives you a file-like object, which you can then read
line by line or in fixed-size chunks. For example:

import urllib
chunk = urllib.urlopen('http://url.com/file.xml').read(2048)

At that point, chunk is just bytes, which you can write to a local
file, print, or whatever it is you want.

As the OP wants to save bandwidth, it's better to request exactly the
amount of data needed. That is, add a Range header field [1] to the
request, and inspect the response for a corresponding Content-Range
header [2].

py> import urllib2
py> url = "http://www.python.org/"
py> req = urllib2.Request(url)
py> req.add_header('Range', 'bytes=0-10239') # first 10K
py> f = urllib2.urlopen(req)
py> data = f.read()
py> print repr(data[-30:]), len(data)
'\t <a href="http://www.zope.' 10240
py> f.headers['Content-Range']
'bytes 0-10239/18196'
py> f.getcode()
206 # 206=Partial Content
py> f.close()

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35

[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16
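
The session above uses Python 2's urllib2. A minimal Python 3 sketch of the same Range request might look like the following. The function name fetch_first_bytes and its return shape are assumptions made for illustration. Servers are free to ignore the Range header and answer 200 with the full body, so the sketch still caps the read at the requested size and returns the status for the caller to inspect.

```python
from urllib.request import Request, urlopen


def fetch_first_bytes(url, num_bytes=10240):
    """Request only the first num_bytes bytes of url.

    Returns (data, status, content_range). status is 206 when the server
    honored the Range header, and typically 200 when it ignored it.
    """
    # Byte ranges are inclusive, so the first 10240 bytes are 0-10239
    request = Request(url, headers={"Range": "bytes=0-%d" % (num_bytes - 1)})
    with urlopen(request) as response:
        # Cap the read in case the server sends the whole body anyway
        data = response.read(num_bytes)
        return data, response.status, response.headers.get("Content-Range")
```

A caller can then check status == 206 and look at content_range (for example 'bytes 0-10239/18196' in the session above) to confirm that only part of the file was sent.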

Dennis Lee Bieber · Jul 28, 2009

I'm trying to figure out how to download just the first few lines of a
large (50 MB) text file from a server to save bandwidth. Can Python do
this?

Something like the Python equivalent of:

curl http://url.com/file.xml | head -c 2048

Presuming that | is a shell pipe operation, then doesn't that
command line use "curl" to download the entire file, and "head" to
display just the first 2k?
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/

Ben Charrow · Jul 28, 2009

Dennis said:
Presuming that | is a shell pipe operation, then doesn't that
command line use "curl" to download the entire file, and "head" to
display just the first 2k?

No, the entire file is not downloaded. My understanding (which could be
wrong) is that the output of curl is piped to head, and once head has read
the first 2k it exits, which closes its end of the pipe. Then, when curl
tries to write to the pipe again, it is sent the SIGPIPE signal, at which
point it exits.

Cheers,
Ben
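
The pipe behavior Ben describes can be tried out without curl or head. The sketch below is POSIX-only and uses a stand-in: PRODUCER is a hypothetical child program that writes forever, and the parent plays the role of head by reading a fixed number of bytes and then closing its end of the pipe. Python normally ignores SIGPIPE, so the child restores the default handling to behave like an ordinary command-line tool.

```python
import signal
import subprocess
import sys

# Stand-in for curl: restore default SIGPIPE handling, then write forever.
PRODUCER = (
    "import signal, sys\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    sys.stdout.write('x' * 1024)\n"
)


def head_of_endless_producer(num_bytes=2048):
    """Read num_bytes from an endless writer, then close the pipe on it."""
    proc = subprocess.Popen([sys.executable, "-c", PRODUCER],
                            stdout=subprocess.PIPE)
    data = proc.stdout.read(num_bytes)  # take only the first num_bytes
    proc.stdout.close()                 # close the read end, as head does on exit
    returncode = proc.wait()            # the writer stops on a later write
    return data, returncode
```

On a POSIX system the returned code should be negative, indicating that the child was terminated by a signal, and it can be compared against -signal.SIGPIPE. The writer may have pushed more than num_bytes into the pipe before being stopped, which matches the idea that somewhat more than 2k can be transferred even though the full file is not.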
