C
chadda
I'll eventually have the input file filled with 350 million items.
Right now there is only one:
$more input
3308191
The following program reads the number from the file named 'input'
and builds a URL from it. I then have lynx dump the page into a file
called 'out' and just grep the entire thing for the Product ID,
Item ID, SKU, UPC, and weight.
m-net% more parse.pl
#!/usr/bin/perl -w
my (@shit, $read, $build, @product, @id, @sku, @upc, @weight);
my $temp;
open(IN, '<', 'input') || die "cant open: $!";
$read = <IN>;
chomp($read);
$build = "http://www.doba.com/members/catalog/".$read.".html";
$temp = `lynx -accept_all_cookies -dump $build`;
open(OUTFILE, '>', 'out') || die "cant open: $!";
print OUTFILE $temp;
close OUTFILE;
open(OUT, '<', 'out') || die "cant open: $!";
@shit = <OUT>;
@product = grep(/Product ID/, @shit);
@id = grep(/Item ID/, @shit);
@sku = grep(/SKU/, @shit);
@upc = grep(/UPC/, @shit); # this part doesn't grep UPC correctly; I get some extra data after UPC.
@weight = grep(/Weight/, @shit);
print @product;
print @id;
print @sku;
print @upc;
print @weight;
% ./parse.pl
Product ID: 3308191
Item ID: 3653992
SKU: 8930
UPC: 896207999816 Condition: refurbished
Weight: 4.7 lbs.
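
The extra data after the UPC shows up because grep returns whole matching lines, and in the lynx dump "Condition: refurbished" sits on the same line as the UPC. One possible way around it is to capture only the token after the label instead of keeping the line. This is a rough sketch, not a tested fix for the real page: extract_field is a hypothetical helper name, the sample lines are copied from the output above, and it assumes every value is a single whitespace-free token following "Label:".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines copied from the output above; in the real script these
# would be the lines read back from the lynx dump.
my @lines = (
    "Product ID: 3308191\n",
    "Item ID: 3653992\n",
    "SKU: 8930\n",
    "UPC: 896207999816 Condition: refurbished\n",
    "Weight: 4.7 lbs.\n",
);

# extract_field is a hypothetical helper (not part of the original script).
# It returns the first whitespace-delimited token that follows "<label>:",
# or nothing (undef in scalar context) if no line contains the label.
sub extract_field {
    my ($label, @text) = @_;
    for my $line (@text) {
        return $1 if $line =~ /\Q$label\E:\s*(\S+)/;
    }
    return;
}

my $upc = extract_field('UPC', @lines);
print "UPC: $upc\n";    # prints "UPC: 896207999816"
```

With the same assumption, extract_field('Weight', @lines) gives just 4.7 without the unit, so fields whose values contain spaces would need a different pattern.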