Additional regex question

Bryan · Nov 18, 2003

I looked in the FAQ, which was helpful, but I still couldn't figure out a
clean way to do something more advanced (for me):

I have a file with this kind of stuff in it:

+some identifying string 1
aaabbbbcccccdddd
eeeffffaaabcbcbaad
jjkalddd

+some identifying string 2
ggaadryyyyssaaad
ddddeeeakkkkalllla
asdfffff

I need to process the file and dump the results into a new file. The
file should be processed in the following manner:
1. Any line that starts with '+' should be untouched and dumped to the
new file
2. Any lines that are not empty should be joined with whatever lines are
not empty following them, up to the empty line.
3. The joined line needs to be searched for a pattern and then truncated
after the pattern.

So if my search string was (case insensitive) ddddeee, the output file
would look like this:
+some identifying string 1
aaabbbbcccccddddeee

+some identifying string 2
ggaadryyyyssaaadddddeee

Using index and substr, I can match and get the truncated version of the
joined string... but I am not sure how to loop over my file, and in some
cases use just one line and in others join lines.
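For reference, the index/substr truncation described above might look roughly like the sketch below. It assumes the search string is a literal (not a regex) and plain ASCII data; truncate_after is a hypothetical helper name, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $joined off right after the first
# case-insensitive occurrence of the literal string $search.
sub truncate_after {
    my ( $joined, $search ) = @_;
    my $pos = index( lc($joined), lc($search) );   # compare lowercased copies
    return $joined if $pos < 0;                    # no match: leave unchanged
    return substr( $joined, 0, $pos + length($search) );
}

# The second record from the sample file, already joined into one line.
my $joined = 'ggaadryyyyssaaad' . 'ddddeeeakkkkalllla' . 'asdfffff';
print truncate_after( $joined, 'ddddeee' ), "\n";
```

This only covers step 3; the looping and joining is the part still in question.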

I tried fiddling with $/ = "", or $/ = "+", but couldn't get what I wanted.

Suggestions appreciated,
B

Austin P. So (Hae Jin) · Nov 19, 2003

Bryan said:
I have a file with this kind of stuff in it:

+some identifying string 1
aaabbbbcccccdddd
eeeffffaaabcbcbaad
jjkalddd

+some identifying string 2
ggaadryyyyssaaad
ddddeeeakkkkalllla
asdfffff

Geez...are you trying to mask your homework problem?

This is a common thing for biological databases...best thing to do is to
convert things from FASTA format into a HASH (or array I suppose) and
then play with that information as needed...

i.e. convert:

identifier1
sequence
sequence
sequence

identifier2
sequence
sequence
sequence

to:

$hash{$identifier1}=$sequence.$sequence.$sequence...
$hash{$identifier2}=$sequence.$sequence.$sequence...
....

Are you sure this isn't a homework problem or something? Seems like you
are asking everyone to manipulate FASTA sequences for you...

Oh well...in this case, use a flag to indicate a new sequence and read
the lines into a hash as above...
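A rough sketch of that flag-and-hash idea, under some assumptions: header lines start with '+', blank lines separate records, and the names parse_records, %seq and @order are only illustrative. The sample input is fed through an in-memory filehandle so the snippet stands alone.

```perl
use strict;
use warnings;

# Read records into a hash keyed by header line. $id doubles as the
# "inside a record" flag; @order remembers the original header order.
sub parse_records {
    my ($fh) = @_;
    my ( %seq, @order, $id );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^\+/ ) {                    # header starts a new record
            $id = $line;
            push @order, $id;
            $seq{$id} = '';
        }
        elsif ( $line =~ /\S/ && defined $id ) {   # non-empty sequence line
            $seq{$id} .= $line;
        }
    }
    return ( \%seq, \@order );
}

my $input = <<'END';
+some identifying string 1
aaabbbbcccccdddd
eeeffffaaabcbcbaad
jjkalddd

+some identifying string 2
ggaadryyyyssaaad
ddddeeeakkkkalllla
asdfffff
END

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
my ( $seq, $order ) = parse_records($fh);
for my $id (@$order) {
    ( my $joined = $seq->{$id} ) =~ s/(ddddeee).*/$1/i;   # truncate after first match
    print "$id\n$joined\n\n";
}
```

Once the sequences are in the hash, the truncation is a one-line substitution per entry, and any other processing can reuse the same structure.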

Good luck...

Austin

Jay Tilton · Nov 19, 2003

: I have a file with this kind of stuff in it:
:
: +some identifying string 1
: aaabbbbcccccdddd
: eeeffffaaabcbcbaad
: jjkalddd
:
: +some identifying string 2
: ggaadryyyyssaaad
: ddddeeeakkkkalllla
: asdfffff
:
: I need to process the file and dump the results into a new file. The
: file should be processed in the following manner:
: 1. Any line that starts with '+' should be untouched and dumped to the
: new file
: 2. Any lines that are not empty should be joined with whatever lines are
: not empty following them, up to the empty line.
: 3. The joined line needs to be searched for a pattern and then truncated
: after the pattern.
:
: So if my search string was (case insensitive) ddddeee, the output file
: would look like this:
: +some identifying string 1
: aaabbbbcccccddddeee
:
: +some identifying string 2
: ggaadryyyyssaaadddddeee
:
: Using index and substr, I can match and get the truncated version of the
: joined string... but I am not sure how to loop over my file, and in some
: cases use just one line and in others join lines.

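#!perl
use warnings;
use strict;
{
    local $/ = '';    # paragraph mode: each read returns one blank-line-separated record
    while (<DATA>) {
        chomp;
        # grep prints each '+' header line as a side effect and filters it out;
        # the remaining lines are joined, then truncated after the pattern.
        (my $lines =
            join '',
            grep !( /^\+/ && print "$_\n" ),
            split /\n/
        ) =~ s/(ddddeee).+/$1/i;    # /i because the search is case insensitive
        print "$lines\n\n";
    }
}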
__DATA__
+some identifying string 1
aaabbbbcccccdddd
eeeffffaaabcbcbaad
jjkalddd

+some identifying string 2
ggaadryyyyssaaad
ddddeeeakkkkalllla
asdfffff

Tad McClellan · Nov 19, 2003

Bryan said:
I need to process the file and dump the results into a new file. The
file should be processed in the following manner:
1. Any line that starts with '+' should be untouched and dumped to the
new file
2. Any lines that are not empty should be joined with whatever lines are
not empty following them, up to the empty line.
3. The joined line needs to be searched for a pattern and then truncated
after the pattern.

Suggestions appreciated,

------------------------------------------
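#!/usr/bin/perl
use strict;
use warnings;

my $search = 'ddddeee';
local $/ = '';    # enable paragraph mode
while ( <DATA> ) {
    my($first, $rest) = split /\n/, $_, 2;    # header line, then everything else
    $rest =~ tr/\n//d;                        # join the remaining lines
    $rest =~ s/(\Q$search\E).*/$1/i;          # truncate after the pattern, ignoring case
    print "$first\n$rest\n\n";                # blank line between records
}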

__DATA__
+some identifying string 1
aaabbbbcccccdddd
eeeffffaaabcbcbaad
jjkalddd

+some identifying string 2
ggaadryyyyssaaad
ddddeeeakkkkalllla
asdfffff
