Extract all hotmail email addresses in a file and store in a separate file

Dennis

Hi, I have a text file that contains a list of email addresses like
this:

"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"

I'd like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. Extract the hotmail addresses and store them in another file.
The hotmail addresses should then be deleted from the original file.

Thanks for any help
 
cartercc

open INFILE, "<all_emails.txt";
open HOTMAIL, ">hotmail_only.txt";
open NOTHOTMAIL, ">not_hotmail.txt";
while(<INFILE>)
{
$_ =~ s/"//g;
print HOTMAIL if $_ =~ /hotmail/i;
print NOTHOTMAIL if $_ != /hotmail/i;
}
close INFILE;
close HOTMAIL;
close NOTHOTMAIL;
 
szr

Dennis said:
Hi, I have a text file that contents a list of email addresses like
this:

"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

Thanks for any help

To get you started, assuming there are no escaped quotes in between:

while (m!"([^"]+?)"!g) {
    if ($1 =~ m!\@hotmail\.com!) {
        ... do something with $1
    }
    else { ... }
}


Or if you are using an array:

foreach my $email (map { s!"([^"]+?)"!$1!g; $_; } @email_list) {
    if ($email =~ m!\@hotmail\.com!) {
        ...
    }
    else { ... }
}

(Note, untested, but should give a starting point.)
 
Antoninus Twink

open INFILE, "<all_emails.txt";
open HOTMAIL, ">hotmail_only.txt";
open NOTHOTMAIL, ">not_hotmail.txt";
while(<INFILE>)
{
$_ =~ s/"//g;
print HOTMAIL if $_ =~ /hotmail/i;
print NOTHOTMAIL if $_ != /hotmail/i;
}
close INFILE;
close HOTMAIL;
close NOTHOTMAIL;

Firstly, you mean !~ instead of !=. Secondly, referring to $_ all the
time is unnecessary. Try:

open INFILE, "< all_emails.txt";
open HOTMAIL, "> hotmail_only.txt";
open NOTHOTMAIL, "> not_hotmail.txt";
while(<INFILE>)
{
    s/"//g;
    if (/hotmail/i) {
        print HOTMAIL;
    } else {
        print NOTHOTMAIL;
    }
}
close INFILE;
close HOTMAIL;
close NOTHOTMAIL;

You might also like to include some error-checking, and avoid
hard-coding the paths.
 
Bartc

Dennis said:
Hi, I have a text file that contents a list of email addresses like
this:

"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.
2. extract out the hotmail addresses and store it into another file.
The hotmail addresses in the original file would be deleted.

You have perl solutions so you won't need this. But it was an interesting
little snippet:

/* Sort email addresses (possibly for some nefarious purpose) from file
   "input" */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void error(void) {puts("File error"); exit(EXIT_FAILURE);}

int main(void) {
    char line[200];
    char *p;
    size_t n;

    FILE *in,*hot,*nothot;

    in=fopen("input","r");
    if (in==0) error();

    hot=fopen("hotmail","w");
    if (hot==0) {fclose(in); error();}

    nothot=fopen("nothotmail","w");
    if (nothot==0) {fclose(in); fclose(hot); error();}

    /* fgets returns NULL at end of file or on a read error */
    while (fgets(line,sizeof(line),in)) {

        n=strlen(line);
        p=line;
        /* drop the trailing newline, if there is one */
        if (n && line[n-1]=='\n') {line[n-1]=0; --n;}
        if (n) {
            /* drop the closing quote and skip the opening one */
            if (line[n-1]=='"') {line[n-1]=0; --n;}
            if (*p=='"') ++p;

            if (strstr(p,"@hotmail.com"))
                fprintf(hot,"%s\n",p);
            else
                fprintf(nothot,"%s\n",p);
        }
    }

    fclose(in);
    fclose(hot);
    fclose(nothot);

    return 0;
}
 
Kenny McCormack

<snip />

Why are you cross-posting this to C and Perl newsgroups?


Rui Maciel

I assume because he is interested in a C/Perl solution to his problem.
 
Tomás Ó hÉilidhe

1. Strip out the " characters and just leave the email addresses on
each line.

char const *const original = "\"(e-mail address removed)\"";

char buf[50];

strcpy(buf,original);

buf[strlen(original) - 1] = 0;

2. extract out the hotmail addresses and store it into another file.


Take the last 12 characters, make them all lowercase, and then compare
with "@hotmail.com".

I'm not big up on the file access functions, I usually just consult
the reference at dinkumware.com when I want to use them.
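
The suffix test described above could be sketched in C roughly as follows. This is only an illustration: ends_with_nocase is a made-up helper name, and it assumes the surrounding quotes and the newline have already been removed from the address.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if addr ends with suffix, ignoring
   case, and 0 otherwise. */
static int ends_with_nocase(const char *addr, const char *suffix)
{
    size_t alen = strlen(addr);
    size_t slen = strlen(suffix);
    size_t i;

    if (alen < slen)
        return 0;

    addr += alen - slen;    /* point at the last slen characters */
    for (i = 0; i < slen; i++) {
        /* cast to unsigned char: tolower is undefined for negative values */
        if (tolower((unsigned char)addr[i]) !=
            tolower((unsigned char)suffix[i]))
            return 0;
    }
    return 1;
}
```

Called as ends_with_nocase(addr, "@hotmail.com"), this accepts an address such as someone@HotMail.com (a made-up example) but rejects one that merely contains "hotmail" somewhere before the domain.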
 
Jürgen Exner

Dennis said:
Hi, I have a text file that contents a list of email addresses like
this:

"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"
"(e-mail address removed)"

I like to

1. Strip out the " characters and just leave the email addresses on
each line.

'perldoc perlop' and look for either tr/// in combination with the 'd'
option or s/// in combination with the 'g' option. Maybe in combination
with anchoring the RE.

Or you can use substr() to grab anything between the first and last
character, excluding both.

2. extract out the hotmail addresses

perldoc -f grep

and store it into another file.

perldoc -f open

The hotmail addresses in the original file would be deleted.

perldoc -q "delete a line"

jue
 
vippstar

1. Strip out the " characters and just leave the email addresses on
each line.

char const *const original = "\"(e-mail address removed)\"";

char buf[50];

strcpy(buf,original);

buf[strlen(original) - 1] = 0;

That does not strip both of the " characters.
char const is confusing, and the second const is unnecessary.
fix:
const char *original = "\"...\"";

Take the last 12 characters, make them all lowercase, and then compare
with "@hotmail.com".

@ does not belong to C's basic character set, so, that's not possible.
 
Martijn Lievaart

perl -ni -e 'if (/\@hotmail\.com"$/) { s/"//g; print; }' text_file

HTH,
M4
 
Bartc

Wow - All that just to separate @hotmail.com from anything else ? I'm
glad I stuck with perl :)

I think pete just enjoys writing huge amounts of C code. Or showing off..

I thought my 50-line answer (posted to comp.lang.c only) might have been a
bit long because it didn't make clever use of scanf(), but at least it could
deal with /any number/ of email addresses from a file.

This code I /think/ only deals with the 4 email addresses in the OP's
example..
 
vippstar

I think pete just enjoys writing huge amounts of C code. Or showing off..

Or using concrete functions he has written in the past to write
concrete programs.
<snip>
 
Bartc

Or using concrete functions he has written in the past to write
concrete programs.

I thought it was some sort of unwritten rule here that when posting code
solutions you tend not to import large elements of your own library.
Otherwise everyone would post their own different version of getline() and
so on.

And also there's the possibility, as seems to have happened here, of using
something inappropriate just because it's there. There's no reason at all to
use a linked list to read all the input into memory (and risking
out-of-memory or thrashing for large input).

(Although I suspect pete may have created this over-the-top solution on
purpose..)
concrete programs.

Which is more concrete, this code which has a memory requirement of N or
code using fixed memory?
 
vippstar

I thought it was some sort of unwritten rule here that when posting code
solutions you tend not to import large elements of your own library.
Otherwise everyone would post their own different version of getline() and
so on.

There's no such rule.

And also there's the possibility, as seems to have happened here, of using
something inappropriate just because it's there. There's no reason at all to
use a linked list to read all the input into memory (and risking
out-of-memory or thrashing for large input).

What do you mean by thrashing? The code risks nothing, as all the calls to
malloc etc. are checked.

(Although I suspect pete may have created this over-the-top solution on
purpose..)

Yes, presumably the purpose was to provide the newbie with a concrete
example.

Which is more concrete, this code which has a memory requirement of N or
code using fixed memory?

It doesn't matter as long as error checking is there.
 
Bartc

What do you mean by thrashing? The code risks nothing, as all the calls to
malloc etc. are checked.

I mean the slow-down that occurs when memory gets nearly full.

It doesn't matter as long as error checking is there.

No, "Sorry out of memory" is just as acceptable as "Task completed"!
 
vippstar

I mean the slow-down that occurs when memory gets nearly full.

While true, this has nothing to do with C.

No, "Sorry out of memory" is just as acceptable as "Task completed"!

A concrete example of code is one that cannot "break", i.e. behave
unexpectedly.
 
Tomás Ó hÉilidhe

That does not strip both of the " characters


Wups, meant to write strcpy(buf,original+1);
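
For comparison, a bounds-checked way to drop both quotes might look like the sketch below. strip_quotes is a hypothetical helper, not something from the posts above, and it assumes at most one quote at each end of the string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without one leading and one
   trailing double quote. Returns 0 on success, -1 if dst is too small. */
static int strip_quotes(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (len > 0 && src[0] == '"') {        /* skip the opening quote */
        src++;
        len--;
    }
    if (len > 0 && src[len - 1] == '"')    /* drop the closing quote */
        len--;

    if (len + 1 > dstsize)                 /* need room for the '\0' */
        return -1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

Checking the destination size before copying avoids the overflow that a bare strcpy into a 50-character buffer would risk on a long address.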

char const is confusing, and the second const is unnecessary.
fix:
const char *original = "\"...\"";


"char const" is confusing? :-O

You're right that the second const is unnecessary, just like my
breakfast this morning was unnecessary. I

@ does not belong to C's basic character set, so, that's not possible.


I had a feeling it mightn't be.

One might argue that if you're dealing with strings that have an @
symbol in them on a particular platform, that the compiler for that
platform will have the @ character.
 
santosh

Bartc said:

I thought it was some sort of unwritten rule here that when posting
code solutions you tend not to import large elements of your own
library. Otherwise everyone would post their own different version of
getline() and so on.

As it is, everyone does post different versions of code for the same
task (as this thread itself has brilliantly illustrated), so as long as
the post contains all the code to compile into a working program in a
self-sufficient manner, I don't see any harm in including something
from a personal library.

And pete has pre-written functions to read files into linked lists. He
occasionally posts a link here in clc to his website, which contains
this and other C code.

And also there's the possibility, as seems to have happened here, of
using something inappropriate just because it's there. There's no
reason at all to use a linked list to read all the input into memory
(and risking out-of-memory or thrashing for large input).

Well, reading a file into a linked list isn't exactly inappropriate, but
it may be overkill for the small fragment that the OP posted. But it
could be that the OP's actual file contains hundreds or thousands of
email addresses. Constructing a linked list will obviously take more
storage than a plain linear array, but it makes some tasks like sorting
lines, inserting lines, deleting lines, etc., much easier. I
suspect that this is the reason why pete uses them.
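
To illustrate why insertion and deletion are cheap with a linked list of lines, here is a minimal sketch. It is not pete's code (which is not shown in this thread); the struct and function names are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one input line per node. */
struct line_node {
    char *text;
    struct line_node *next;
};

/* Insert a copy of text after prev. If prev is NULL, a standalone node
   is created. Returns the new node, or NULL if allocation fails. */
static struct line_node *insert_after(struct line_node *prev,
                                      const char *text)
{
    struct line_node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;

    node->text = malloc(strlen(text) + 1);
    if (node->text == NULL) {
        free(node);
        return NULL;
    }
    strcpy(node->text, text);

    if (prev != NULL) {
        node->next = prev->next;   /* splice in: no other line moves */
        prev->next = node;
    } else {
        node->next = NULL;
    }
    return node;
}

/* Unlink and free the node that follows prev, if there is one. */
static void delete_after(struct line_node *prev)
{
    struct line_node *victim = prev->next;
    if (victim == NULL)
        return;
    prev->next = victim->next;
    free(victim->text);
    free(victim);
}
```

Both operations only touch a couple of pointers, whereas an array of lines would have to shift every later element. The cost is one extra pointer and one extra allocation per line.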

(Although I suspect pete may have created this over-the-top solution
on purpose..)

Hmm.


Which is more concrete, this code which has a memory requirement of N
or code using fixed memory?

Either code could run out of memory on a sufficiently memory-starved
system. Besides, the linked-list approach has other advantages (which
may not be very pertinent to the particular task the OP wanted) which
must be considered in a fair comparison.
 
