Remove Parts of a String

Dan __

Alright, this is probably a really simple question to answer, but I just
can't think of how to do it at the moment. Say I had a string similar
to the following:

12343:3,73820:1,183874:8

What I would like to do is remove all the colons and everything
between each colon and the following comma, and then split the string at
the commas. (I would then split the original string at the commas as well
and deal with the numbers after the colons, but that's the easy part, since
I'll already know which index of the array to look at.)

I know it's kind of an odd thing to want to do, but it helps keep things
organized and in one place. Any ideas on how to do this?
 
Stefano Crocco

I'm not sure whether I understand you correctly. At any rate, this code will
transform your example string into "12343,73820,183874", then split it into
["12343","73820","183874"]:

str.gsub(/:[^,]*(?:$|(?=,))/,'').split( ',')
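
A possible simplification, offered as a sketch rather than as the poster's method: since `[^,]*` can never match a comma, it already stops at the next comma or at the end of the string, so for input shaped like the example the trailing group appears to be redundant. The helper name `ids_only` is made up for this illustration.

```ruby
# Hypothetical helper: keep only the part before each colon.
def ids_only(str, pattern)
  str.gsub(pattern, '').split(',')
end

original = /:[^,]*(?:$|(?=,))/   # pattern from the post above
shorter  = /:[^,]*/              # assumed-equivalent simplification

s = "12343:3,73820:1,183874:8"
p ids_only(s, original)   # => ["12343", "73820", "183874"]
p ids_only(s, shorter)    # => ["12343", "73820", "183874"]
```

Both patterns give the same array here; the longer one only spells out where the match has to stop.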

I hope this helps

Stefano
 
Eric I.

Dan __ wrote:
Thanks very much Stefano :) That's exactly what I wanted, and it works
great :)

Hi Dan,

You're the expert in your domain, and I obviously don't understand
the larger picture. But the idea of getting access to the data after
the colons at a later stage using indexing into the original string
sounds unnecessarily complex. So here's another approach.

If you run this, you'll see what you get at every step.

====

s = "12343:3,73820:1,183874:8"

# break it up into an array of arrays of strings
a = s.split(',').map { |v| v.split(':') }
p a

# now you can do lots of things

# just want the numbers before the colons?
b1 = a.map { |v| v.first }
p b1

# want them as integers rather than strings?
b2 = a.map { |v| v.first.to_i }
p b2

# want it as a string w/ commas? this seems to be part of your original request
b3 = a.map { |v| v.first }.join(',')
p b3

# want to regenerate the original string?
b4 = a.map { |v| v.join(':') }.join(',')
p b4

====

Maybe (or maybe not) that's helpful....

Eric

====

LearnRuby.com offers Rails & Ruby HANDS-ON public & ON-SITE
workshops.
Ruby Fundamentals Wkshp June 16-18 Ann Arbor, Mich.
Ready for Rails Ruby Wkshp June 23-24 Ann Arbor, Mich.
Ruby on Rails Wkshp June 25-27 Ann Arbor, Mich.
Ruby Plus Rails Combo Wkshp June 23-27 Ann Arbor, Mich
Please visit http://LearnRuby.com for all the details.
 
Dan __

Hi Eric, thanks for the reply :)

Your solution seems great for more complex data uses. However, I'm
using this as part of a simple page-rating system in a Rails application
right now. Basically, the number before the colon is the ID of a page
that has been rated, and the number after the colon is the rating. The
entire string stores every page and rating the user has ever given. So
by splitting the string, and only looking at the number before the
colon, I can compare the ID of the current page to the IDs of the pages
the user has rated, and then display the rating they've given.
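
A minimal sketch of the lookup described above, assuming the ratings string has the same shape as the example. The helper name `rating_for` and the page IDs passed to it are invented for illustration; this is not code from the application.

```ruby
# Hypothetical helper: look up the rating for one page ID using the
# "strip, split, then index back into the original" approach.
def rating_for(ratings, page_id)
  ids   = ratings.gsub(/:[^,]*(?:$|(?=,))/, '').split(',')
  index = ids.index(page_id.to_s)
  return nil if index.nil?

  # The same index points at the matching "id:rating" pair.
  ratings.split(',')[index].split(':').last
end

ratings = "12343:3,73820:1,183874:8"
p rating_for(ratings, 73820)   # => "1"
p rating_for(ratings, 77777)   # => nil
```

Note that the rating comes back as a string; call `to_i` on it if a number is needed.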

It seems to me that your solution and Stefano's are equally
simple for what I'm using them for. If I needed to expand to a more
complex system, I'd definitely use yours. I'm fairly positive that my
use for the data won't expand beyond what it is now, and I've already
got Stefano's solution implemented, so I'm gonna stick with that one ;)

Thanks very much for offering your solution to the problem though :)
 
7stud --

Dan __ wrote:
then display the rating they've given.
It seems to me that your solution and Stefano's are equally
simple for what I'm using them for.

I disagree. This solution is an abomination:

str.gsub(/:[^,]*(?:$|(?=,))/,'').split( ',')

Your problem is very simple. If you split() this string on commas:

"12343:3,73820:1,183874:8"

you get this array:

["12343:3", "73820:1", "183874:8"]

Then you just have to split() each of the strings in the array, e.g.
"12343:3", on the colon:

arr.each do |pair|
  results = pair.split(/:/)
  p results
end


Here it is altogether:

str = "12343:3,73820:1,183874:8"

arr = str.split(/,/)
p arr

# the block variable gets its own name so it does not clash with str above
arr.each do |pair|
  results = pair.split(/:/)
  p results
end

["12343:3", "73820:1", "183874:8"]
["12343", "3"]
["73820", "1"]
["183874", "8"]

Note: it's clearer to write split(",") but as is being discussed in
another thread, the code will execute faster if you use a regex:
split(/,/). If you specify an argument for split(), then use a regex
rather than a string to make your code more efficient. If you value
code clarity more than a slight improvement in efficiency, then use a
string.
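
Which form is faster may depend on the Ruby version, so it is probably worth measuring rather than assuming. Here is a rough sketch using the standard `benchmark` library; the sample size and run count are arbitrary choices made for this example, and no particular outcome is implied.

```ruby
require 'benchmark'

# Hypothetical sample: the thread's example repeated so the timings are measurable.
SAMPLE = (["12343:3,73820:1,183874:8"] * 100).join(',')
RUNS = 1_000

Benchmark.bm(8) do |bm|
  bm.report('string:') { RUNS.times { SAMPLE.split(',') } }
  bm.report('regexp:') { RUNS.times { SAMPLE.split(/,/) } }
end

# Both forms return the same fields; only the timing may differ.
p SAMPLE.split(',') == SAMPLE.split(/,/)
```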
 
Eric I.

Hi Dan,

You are most welcome! You know, it sounds like what you need is a
Hash, indexed by ID with the ratings being the values. Lookup becomes
trivial then. And if this is a case where the user logs in to your
Rails app, you can pull this string from your DB, convert it to a
hash, and store it in the session, so you don't have to re-process
that string each time.

Here's some more code (pretty much unsolicited at this point, eh?)
respectfully offered for you to consider:

====

ratings = "12343:3,73820:1,183874:8"

ratings_array = ratings.split(/[,:]/).map { |str| str.to_i }
# we now have array w/ ids & ratings as integers (not strings) interleaved:
# [12343, 3, 73820, 1, 183874, 8]

ratings_by_id = Hash[*ratings_array]
# we convert that into a hash where keys are ids and values are ratings:
# {12343=>3, 73820=>1, 183874=>8}

# and now we can look up an id and get either its rating or, if there
# is no rating, nil

r1 = ratings_by_id[183874]
p r1 # prints 8

r2 = ratings_by_id[77777]
p r2 # prints nil

====

I converted all the data to integers, but you could leave them as
strings if that made more sense for your app. But integers are better
from a performance standpoint for hash look-up.
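
`Hash[*ratings_array]` relies on the flat array having an even number of elements. A variation that keeps each ID and rating together as a pair is sketched below; `parse_ratings` is a name invented here, and `Array#to_h` assumes Ruby 2.1 or later, which is newer than this thread.

```ruby
# Hypothetical helper: "id:rating,id:rating" -> { id => rating }
def parse_ratings(str)
  str.split(',')
     .map { |pair| pair.split(':').map(&:to_i) }   # [[12343, 3], [73820, 1], ...]
     .to_h
end

ratings_by_id = parse_ratings("12343:3,73820:1,183874:8")
p ratings_by_id[183874]   # => 8
p ratings_by_id[77777]    # => nil
```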

Best,

Eric

 
Stefano Crocco

already spoke, after all, I am just learning Ruby. However, I do find that
to understand:

str.gsub(/:[^,]*(?:$|(?=,))/,'').split( ',')

you need a PhD in "General Expresionology". Ruby is supposed to be simple.
Here you have 1 line of code but need 1 page to explain it. I do find it
elegant, although I don't understand it. I only wish I could do that too!

Victor

To understand it, you need to know regexps. The first argument of gsub is a
regexp which matches every substring that starts with a colon, continues with
any number of characters other than a comma, and ends either at the end of the
line ($) or just before a comma. The comma itself is not included in the match,
which is why the lookahead construct (?=,) is used. The second argument of gsub
says that the parts of the string matching the regexp should be replaced with
an empty string. The returned string is then split at commas.
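
One way to see what the pattern matches, as a small sketch: `String#scan` returns every match, so it shows exactly which pieces `gsub` will delete. `PATTERN` and `removed_parts` are made-up names for this example.

```ruby
PATTERN = /:[^,]*(?:$|(?=,))/

# Hypothetical helper: list the substrings that gsub would remove.
def removed_parts(str)
  str.scan(PATTERN)
end

s = "12343:3,73820:1,183874:8"
p removed_parts(s)       # => [":3", ":1", ":8"]
p s.gsub(PATTERN, '')    # => "12343,73820,183874"
```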

If you want a quick reference on regexps in ruby, you can look here:
http://www.zenspider.com/Languages/Ruby/QuickRef.html#11. If you need a more
complete introduction, there's a long description in the free edition of the
Pickaxe: http://www.ruby-doc.org/docs/ProgrammingRuby/html/intro.html#S5

I hope this helps

Stefano
 
Victor Reyes

Stefano, thank you for the quick explanation. It really helps. The link was
also very helpful.
It is like everything else in life: once you understand it, the mystique
is gone!

Thanks again,

Victor
 
Dan __

Sorry for the slow response from me, I haven't had access to my Rails
app and the internet at the same time for the past few days (still
don't, actually).

Eric, your hash idea seems very interesting. I'm assuming I'd have to
update the hash every time I changed the entry in the database though,
correct?

7stud - Your way does seem a bit simpler than the way I used at first.
Thanks :)
 
Eric I.

Or you can do it the other way around, if that's more convenient. You
can add a new item to the Hash, and regenerate the String, and then
save it to your db. See below where I assign to the variable
back_to_string.

====

ratings = "12343:3,73820:1,183874:8"

ratings_array = ratings.split(/[,:]/).map { |str| str.to_i }
# we now have array w/ ids & ratings interleaved:
# [12343, 3, 73820, 1, 183874, 8]

ratings_by_id = Hash[*ratings_array]
# we convert that into a hash where keys are ids and values are ratings:
# {12343=>3, 73820=>1, 183874=>8}

# and now we can look up an id and get either its rating or, if there
# is no rating, nil

r1 = ratings_by_id[183874]
p r1

r2 = ratings_by_id[77777]
p r2

# add a new rating to the hash
ratings_by_id[62541] = 4

# regenerate the String
back_to_string = ratings_by_id.map { |k, v| "#{k}:#{v}"}.join(',')
puts back_to_string

====

Eric

 
Dan __

The more I look at your solution, Eric, the more I like it :) I can see
this type of solution helping out in other places in my app, not just the
one I was asking about (and in other apps too). Thanks very much for
posting this for me, I'm definitely bookmarking this thread :)

-Dan
 
Dave Bass

Victor said:
to understand:

str.gsub(/:[^,]*(?:$|(?=,))/,'').split( ',')

you need a PhD in "General Expresionology". Ruby is supposed to be
simple.

At a glance you can see that gsub does something to str, and then split
does something to the result. So the top level's quite straightforward.

The gsub regexp is not easy to understand however, and this is one of
the problems with regexps in general. They're concise but obscure.
Generally speaking one builds them up a little at a time, using quite a
bit of trial and error -- or at least, that's how I do it, maybe other
people get there in one!

A regexp like this deserves several lines of comments to explain what's
going on. For some reason coders hate writing comments.

I would tend to do a job like this in several stages rather than in one
line. I've learnt by bitter experience that my future self finds it
difficult to understand concise code that my past self has written!
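
As a sketch of what such comments might look like, Ruby's `x` (extended) flag lets a regexp be spread over several lines with a comment on each part, and the work can be split into named stages. The constant and method names are invented for this example; the pattern itself is the one quoted above.

```ruby
# The one-liner's pattern, written with the x flag so each part can be commented.
AFTER_COLON = /
  :         # a literal colon
  [^,]*     # any run of characters other than a comma
  (?:       # then the match must stop either
    $       #   at the end of the line
    |       #   or
    (?=,)   #   just before a comma, which is left in place
  )
/x

# Hypothetical helper doing the job in two named stages.
def page_ids(str)
  without_ratings = str.gsub(AFTER_COLON, '')   # "12343,73820,183874"
  without_ratings.split(',')
end

p page_ids("12343:3,73820:1,183874:8")   # => ["12343", "73820", "183874"]
```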
 
Victor Reyes

beautiful and concise, it is difficult to understand unless you do it often
or document it very well.
Perhaps its elegance has to do with its complexity or something like that!

Victor
 
