Need Estimate of Programming Effort


Jeff Sheffel

I'm looking for a simple estimate of a "level of effort", for a Perl
programming task.

The estimate should be in hours. (Maybe a range of programming hours -
based on level of Perl experience.) Any other additional estimate
information, and design comments are appreciated. (Please do not ask
questions about the requirements, since I did not write them; make any
assumptions necessary.)

Program Requirements:
---------------------
Design a web scraping utility to scrape information from various shopping
sites. The code should be written in Object Oriented Perl, with use strict
and warnings enabled.

The initial sites used should be http://www.shopzilla.com and
http://www.shopping.com. However, the program should be designed in a
manor that will allow other sites to be added in the future.

The minimum requirement for output is: Site scraped from, product name,
short description, low price, high price. For simplicity scrapings can be
limited to 60 items or less from each target site.

Optional features that can be added are throttling and threading.
Throttling will limit the number of hits to a particular site in a given
time period and threading would allow the program to make several requests
simultaneously.
The program should be fully documented and run without warnings.
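For context, here is the kind of class skeleton I picture for it. This is a
rough sketch only; the package, method, and field names are my own guesses,
not anything in the requirements:

  # rough sketch only -- names are assumptions, not part of the spec
  package Scraper::Site;
  use strict;
  use warnings;

  # Base class: each shopping site gets its own subclass that knows how
  # to build a search URL and pull fields out of the result page.
  sub new {
      my ($class, %args) = @_;
      my $self = { max_items => $args{max_items} || 60 };
      return bless $self, $class;
  }

  # Subclasses override these two methods for their particular site.
  sub search_url { die "search_url() must be implemented by a subclass" }
  sub parse      { die "parse() must be implemented by a subclass" }

  # Common driver: fetch the search page and return a list of hashrefs
  # holding the required output fields.
  sub scrape {
      my ($self, $term) = @_;
      require LWP::UserAgent;
      my $ua   = LWP::UserAgent->new( timeout => 30 );
      my $resp = $ua->get( $self->search_url($term) );
      die "fetch failed: ", $resp->status_line unless $resp->is_success;

      my @items = $self->parse( $resp->decoded_content );
      splice @items, $self->{max_items} if @items > $self->{max_items};

      # each item: { site => ..., name => ..., description => ...,
      #              low_price => ..., high_price => ... }
      return @items;
  }

  1;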
 

Uri Guttman

JS> I'm looking for a simple estimate of a "level of effort", for a Perl
JS> programming task.

you need to hire someone just for this task alone.

JS> The estimate should be in hours. (Maybe a range of programming hours -
JS> based on level of Perl experience.) Any other additional estimate
JS> information, and design comments are appreciated. (Please do not ask
JS> questions about the requirements, since I did not write them; make any
JS> assumptions necessary.)

you can't do that. i have written crawlers before and the client will
ALWAYS make many changes as it is developed. these projects cannot be
properly estimated without a very clear and precise spec. you are asking
for a world of trouble otherwise and this comes from deep experience.

JS> Design a web scraping utility to scrape information from various
JS> shopping sites. The code should be written in Object Oriented
JS> Perl, with use strict and warnings enabled.

oh boy! strict and warnings add many hours to any project. a stupid
requirement which doesn't help at all with estimates. so many design
questions will need to be asked and answered. this is not a toy.

JS> The initial sites used should be http://www.shopzilla.com and
JS> http://www.shopping.com. However, the program should be designed in a
JS> manor that will allow other sites to be added in the future.

s/manor/manner/

and you can't crawl those sites as is. they are shopping search engines
so you would need to know the product names/etc to locate them.

JS> The minimum requirement for output is: Site scraped from, product name,
JS> short description, low price, high price. For simplicity scrapings can be
JS> limited to 60 items or less from each target site.

which 60 items? is there a list? will it grow? more unasked questions.

JS> Optional features that can be added are throttling and threading.
JS> Throttling will limit the number of hits to a particular site in a
JS> given time period and threading would allow the program to make
JS> several requests simultaneously.

parallel requests can be done without threading and in several
ways. threading is a design issue and not a requirement. throttling is a
requirement and if you didn't do it, any decent site will notice and
block you. are these 'optional' features to be designed in now or bolted
on (poorly) later? again, crawling large scale is not for
kiddies. prototype crawlers will not scale unless they are designed for
it from the beginning. so you have a major requirements conflict here
about whether this is a kiddie toy or a professional scalable crawler.
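to make that concrete, here is a rough sketch of one way to get parallel
fetches without perl threads, plus the crudest possible per-site throttle.
the module choice (Parallel::ForkManager) and the numbers are mine, not
anything in your spec:

  use strict;
  use warnings;
  use LWP::UserAgent;
  use URI;
  use Parallel::ForkManager;

  my @urls = @ARGV;

  # group the urls by host so each site can be throttled on its own
  my %by_host;
  push @{ $by_host{ URI->new($_)->host } }, $_ for @urls;

  my $pm = Parallel::ForkManager->new( scalar keys %by_host );

  for my $host ( keys %by_host ) {
      $pm->start and next;          # one child per site, all sites in parallel

      my $ua = LWP::UserAgent->new( timeout => 30 );
      for my $url ( @{ $by_host{$host} } ) {
          my $resp = $ua->get($url);
          warn "$url: ", $resp->status_line, "\n" unless $resp->is_success;
          sleep 2;                  # throttle: at most one hit every 2s per site
      }

      $pm->finish;                  # child exits
  }
  $pm->wait_all_children;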

JS> The program should be fully documented and run without warnings.

they want documentation too? unheard of!! how about some properly
written requirements first?

if you really need professional help (and i think you do) have them
contact me directly as i have actually created 2 major crawler systems
and can at least ask the right questions. but there is no way in hell i
would provide a time estimate on such a frivolous set of
requirements. you can't make assumptions as this could be a week long
kiddie thing or 6-12 man-months which is a pretty wide range of
estimates.

i await the call from your client (or yourself). (not holding my
breath).

uri
 

Jeff Sheffel

Uri,
Thank you for your time. Your comments are valuable, and I agree with the
points you're making.

Your summary of programming effort, i.e. 1 week (= 40 man hours?) minimum,
is the quick answer I was looking for.

I should have clarified that this is not for a client, but a
"homework exercise" used by a potential employer for employment screening.
So creative and rapid programming disciplines are being called for here.

I don't see why you state that "the shopping sites can't be crawled."
I think they can, but not easily, and each site will require specific
methods. The product names ARE the search terms (expressed by the user on
the command line). They (i.e. the client) used the term threading, loosely,
as a design requirement for parallelism.
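So the driver I picture is roughly this (a sketch only; the Scraper::* class
names are the same guesses as in my first post, not real modules):

  use strict;
  use warnings;

  # hypothetical per-site subclasses of the Scraper::Site sketch above
  use Scraper::Shopzilla;
  use Scraper::ShoppingCom;

  my @terms = @ARGV or die "usage: $0 <search term> [more terms ...]\n";
  my @sites = ( Scraper::Shopzilla->new, Scraper::ShoppingCom->new );

  for my $term (@terms) {
      for my $site (@sites) {
          # scrape() returns hashrefs with the required output fields
          for my $item ( $site->scrape($term) ) {
              print join( "\t",
                  @{$item}{qw(site name description low_price high_price)} ), "\n";
          }
      }
  }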

I don't think I'm up for this test...
aren't there plenty of Perl jobs today?
Jeff
 

Uri Guttman

JS> Your summary of programming effort, i.e. 1 week (= 40 man hours?)
JS> minimum, is the quick answer I was looking for.

JS> I should have clarified that this is not for a client, but a
JS> "homework exercise" used by a potential employer for employment
JS> screening. So creative and rapid programming disciplines are
JS> being called for here.

why didn't you say that to begin with? that is the most important info
in the whole story.

JS> I don't see why you state that "the shopping sites can't be
JS> crawled." I think they can, but not easily, and each site will
JS> require specific methods. The product names ARE the search terms
JS> (expressed by the user on the command line). They (i.e. the
JS> client) used the term threading, loosely, as a design requirement
JS> for parallelism.

this sounds like a very large homework assignment. i would be wary of
working for them if they require such projects to apply for a
job. unless they are expecting a toy which can be done in little time if
you don't care about scaling. there are dinky crawlers on cpan and if
you just drive them with some search terms the rest is parsing the web
pages (also cpan) and various amounts of driver and glue code.
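something like this is about all the driver and glue comes to for a toy
version. a sketch only, with made-up placeholders: the search url and the
class name are not the real layout of either site:

  use strict;
  use warnings;
  use WWW::Mechanize;
  use HTML::TreeBuilder;

  my $term = shift @ARGV or die "usage: $0 <search term>\n";

  my $mech = WWW::Mechanize->new( autocheck => 1 );
  # placeholder url; a real version would also url-escape the term
  $mech->get( "http://www.example.com/search?q=$term" );

  my $tree = HTML::TreeBuilder->new_from_content( $mech->content );

  # pull out whatever elements hold the product listings. the class name
  # here is invented; the real one comes from looking at the actual page.
  for my $item ( $tree->look_down( _tag => 'div', class => 'product' ) ) {
      print $item->as_trimmed_text, "\n";
  }

  $tree->delete;   # HTML::TreeBuilder trees want explicit cleanup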

JS> I don't think I'm up for this test...
JS> aren't there plenty of Perl jobs today?

ever look at jobs.perl.org?

and you should tell this employer to also post there.

and i do some perl job placement as well.

uri
 

Charlton Wilbur

JS> I should have clarified that this is not for a client, but a
JS> "homework exercise" used by a potential employer for
JS> employment screening. So creative and rapid programming
JS> disciplines are being called for here.

Someone who wants to employ *you* asks *you* this question, and you
pass it on to Usenet to do *your* homework for you?

Charlton
 

Charlton Wilbur

UG> this sounds like a very large homework assignment. i would be
UG> wary of working for them if they require such projects to
UG> apply for a job.

My impression is that the employer wanted a back-of-the-envelope
estimate as homework, not that they wanted the whole crawler.

Charlton
 

Jens Thoms Toerring

Uri Guttman said:
JS> Design a web scraping utility to scrape information from various
JS> shopping sites. The code should be written in Object Oriented
JS> Perl, with use strict and warnings enabled.
UG> oh boy! strict and warnings add many hours to any project. a stupid
UG> requirement which doesn't help at all with estimates.

Awfully sorry for chiming in like that but I am labouring under the
impression that using strict and warnings actually saves me a lot
of time since it helps to catch my more stupid errors. Can you help
me to find out about the errors of my ways and write a bit more about
why you think it would "add many hours to any project" instead?
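A tiny example of the kind of mistake I mean (the variable name is made up,
of course):

  use strict;
  use warnings;

  my $count = 0;
  $cuont++;   # typo: with strict this refuses to compile
              # ("Global symbol "$cuont" requires explicit package name");
              # without strict it would silently create a new variable
  print "count is $count\n";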
UG> Uri Guttman ------ (e-mail address removed) -------- http://www.stemsystems.com

This might be a temporary problem but when I try to go to the URL
at the end I got "403 Forbidden".

Regards, Jens
 

Uri Guttman

JS> Design a web scraping utility to scrape information from various
JS> shopping sites. The code should be written in Object Oriented
JS> Perl, with use strict and warnings enabled.

JTT> Awfully sorry for chiming in like that but I am labouring under the
JTT> impression that using strict and warnings actually saves me a lot
JTT> of time since it helps to catch my more stupid errors. Can you help
JTT> me to find out about the errors of my ways and write a bit more about
JTT> why you think it would "add many hours to any project" instead?

i was being sarcastic about estimating a project schedule when strict
and warnings are enabled. of course i endorse their use all the time but
it was silly for the requirements to specify them. and considering that
this was job application homework it is even sillier. how would using
strict and warnings affect project time estimation?


JTT> This might be a temporary problem but when I try to go to the URL
JTT> at the end I got "403 Forbidden".

it is down until i can redo the site as its hosting was moved. i should
put up an under construction thing already. an interview i did for
perlcast last summer was just broadcast last week and i got some emails
about it being down.

uri
 
