Separating elements from a list according to preceding element

Rob Cowie

I'm having a bit of trouble with this, so any help would be gratefully
received...

After splitting up a URL I have a string of the form
'tag1+tag2+tag3-tag4', or '-tag1-tag2' etc. The first tag is only
preceded by an operator if that operator is '-'; if it is preceded by
nothing, '+' is to be assumed.

Using re.split, I can generate a list that looks thus:
['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']

I wish to derive two lists - each containing either tags to be
included, or tags to be excluded. My idea was to take an element,
examine what element precedes it and accordingly, insert it into the
relevant list. However, I have not been successful.

Is there a better way that I have not considered? If this method is
suitable, how might I implement it?

Thanks all,

Rob Cowie
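The post doesn't say which pattern was passed to re.split. Here is a minimal sketch that assumes a capturing group around the two operators. re.split keeps the text matched by capturing groups in its result, which gives exactly the list shown. A string that starts with '-' yields an empty first element. Several of the list-based approaches in the replies would treat that element as a tag unless it is filtered out first.

```python
import re

# Assumed pattern: the parentheses form a capturing group, so re.split
# keeps each '+' or '-' separator in the output list.
OPERATOR_SPLIT = re.compile(r'([+-])')

print(OPERATOR_SPLIT.split('tag1+tag2+tag3-tag4'))
# ['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']

# When the string starts with a separator, the result starts with ''.
print(OPERATOR_SPLIT.split('-tag1-tag2'))
# ['', '-', 'tag1', '-', 'tag2']

# Dropping empty strings gives the operator-first form used in the replies.
parts = [p for p in OPERATOR_SPLIT.split('-tag1-tag2') if p]
print(parts)
# ['-', 'tag1', '-', 'tag2']
```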
 
Gerard Flanagan

Rob said:
[...]

a = [ '+', 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4' ]

import itertools

b = list(itertools.islice(a,0,8,2))
c = list(itertools.islice(a,1,8,2))

result1 = [x[1] for x in itertools.izip(b,c) if x[0] == '+']
result2 = [x[1] for x in itertools.izip(b,c) if x[0] == '-']

print
print result1
print result2


Gerard
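Gerard's snippet is Python 2 code (print statements, itertools.izip). Python 3 removed izip because the built-in zip is already lazy. The sketch below ports the same stride-and-filter idea to Python 3 and uses len(a) instead of the hard-coded 8. Like the original, it assumes the list already begins with an operator.

```python
import itertools

a = ['+', 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4']

# Operators sit at even indices, tags at odd indices.
ops = list(itertools.islice(a, 0, len(a), 2))
tags = list(itertools.islice(a, 1, len(a), 2))

# Python 3: the built-in zip takes the place of itertools.izip.
result1 = [tag for op, tag in zip(ops, tags) if op == '+']
result2 = [tag for op, tag in zip(ops, tags) if op == '-']

print(result1)  # ['tag1', 'tag2', 'tag4']
print(result2)  # ['tag3']
```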
 
Ben Cartwright

Rob said:
I wish to derive two lists - each containing either tags to be
included, or tags to be excluded. My idea was to take an element,
examine what element precedes it and accordingly, insert it into the
relevant list. However, I have not been successful.

Is there a better way that I have not considered?

Maybe. You could write a couple of regexes, one to find the included
tags and one for the excluded ones, then run re.findall with each.

But there's nothing fundamentally wrong with your method.

If this method is suitable, how might I implement it?

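tags = ['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']

include, exclude = [], []
op = '+'  # no leading operator means '+' is assumed
for cur in tags:
    if cur in '+-':
        op = cur  # remember the most recent operator
    else:
        # a tag: file it under the operator that preceded it
        if op == '+':
            include.append(cur)
        else:
            exclude.append(cur)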

--Ben
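Ben's other suggestion, two regexes each fed to re.findall, might look like the following sketch. The patterns and the function name find_tags are illustrative assumptions, not code from the thread. They assume tags match \w+. A tag counts as included when it begins the string or follows '+', and as excluded when it follows '-'.

```python
import re

# Illustrative patterns (assumption: tags consist of \w characters only).
# Included: a tag at the very start of the string, or right after '+'.
INCLUDE_RE = re.compile(r'(?:^|\+)(\w+)')
# Excluded: a tag right after '-'.
EXCLUDE_RE = re.compile(r'-(\w+)')

def find_tags(source):
    """Hypothetical helper: return (include, exclude) via two findall passes."""
    # With one capturing group, findall returns just the group's text.
    return INCLUDE_RE.findall(source), EXCLUDE_RE.findall(source)

print(find_tags('tag1+tag2+tag3-tag4'))  # (['tag1', 'tag2', 'tag3'], ['tag4'])
print(find_tags('-tag1-tag2'))           # ([], ['tag1', 'tag2'])
```

Working on the raw string rather than the split list sidesteps the implicit-first-operator question, because the ^ alternative covers the unprefixed first tag.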
 
Gerard Flanagan

Gerard said:
[...]
b = list(itertools.islice(a,0,8,2))
c = list(itertools.islice(a,1,8,2))
[...]

'8' is the length of 'a' (len(a))
 
James Stroud

Rob said:
[...]

Unclever way:

alist = ['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']
include, disinclude = [], []
aniter = iter(alist)
if len(alist) % 2:
    include.append(aniter.next())
for asign in aniter:
    if asign == '+':
        include.append(aniter.next())
    else:
        disinclude.append(aniter.next())

A cleverer way would probably use list comprehensions and short-circuit logic.

James
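James's loop calls the Python 2 iterator method aniter.next(), which Python 3 spells next(aniter). The sketch below applies the same parity trick in Python 3. The function name split_by_parity is hypothetical. Like the original, it assumes a list with a leading '-' looks like ['-', 'tag1', ...]. The raw re.split output starts with an empty string in that case, and that string would have to be dropped first.

```python
def split_by_parity(alist):
    """Hypothetical Python 3 port of the iterator approach.

    An odd-length list is taken to start with a tag that has no
    explicit operator (an implicit '+').
    """
    include, exclude = [], []
    aniter = iter(alist)
    if len(alist) % 2:
        # No leading operator: the first tag is implicitly included.
        include.append(next(aniter))
    for sign in aniter:
        # Each operator is immediately followed by its tag, so pull it
        # from the same iterator the for loop is consuming.
        tag = next(aniter)
        if sign == '+':
            include.append(tag)
        else:
            exclude.append(tag)
    return include, exclude

print(split_by_parity(['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']))
# (['tag1', 'tag2', 'tag3'], ['tag4'])
print(split_by_parity(['-', 'tag1', '-', 'tag2']))
# ([], ['tag1', 'tag2'])
```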
 
James Stroud

Gerard said:
[...]

Unfortunately this does not address the complete specification:
>>> a = [ 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4' ]
>>>
>>> import itertools
>>>
>>> b = list(itertools.islice(a,0,len(a),2))
>>> c = list(itertools.islice(a,1,len(a),2))
>>>
>>> result1 = [x[1] for x in itertools.izip(b,c) if x[0] == '+']
>>> result2 = [x[1] for x in itertools.izip(b,c) if x[0] == '-']
>>>
>>> print

>>> print result1
[]
>>> print result2
[]

Need to check for the absence of that first op.

James
 
James Stroud

Bruno said:
[...]

if the_list[0] != "-":
    the_list.insert(0, "+")

Then a possible solution could be:

todo = {'+' : [], '-' : []}
for op, tag in zip(the_list[::2], the_list[1::2]):
    todo[op].append(tag)

But there's surely something better...

Fabulous. Here is a fix:

the_list = ['+'] * (len(the_list) % 2) + the_list
todo = {'+' : [], '-' : []}
for op, tag in zip(the_list[::2], the_list[1::2]):
    todo[op].append(tag)
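James's fix relies on parity. A list that is missing its implicit leading operator has odd length, so ['+'] * (len(the_list) % 2) is ['+'] for an odd length and [] for an even one. Here is a Python 3 sketch that wraps the fix and Bruno's loop in a function. The name partition_tags is hypothetical.

```python
def partition_tags(the_list):
    """Hypothetical wrapper: map '+' and '-' to lists of tags."""
    # Odd length means the implicit leading '+' is missing:
    # ['+'] * 1 prepends it, ['+'] * 0 prepends nothing.
    the_list = ['+'] * (len(the_list) % 2) + the_list
    todo = {'+': [], '-': []}
    # Operators at even indices pair up with the tags at odd indices.
    for op, tag in zip(the_list[::2], the_list[1::2]):
        todo[op].append(tag)
    return todo

print(partition_tags(['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']))
# {'+': ['tag1', 'tag2', 'tag3'], '-': ['tag4']}
print(partition_tags(['-', 'tag1', '-', 'tag2']))
# {'+': [], '-': ['tag1', 'tag2']}
```

Because the padded list is built with +, the caller's list is left unchanged, unlike the insert(0, "+") variant.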
 
Alex Martelli

Gerard Flanagan said:
a = [ '+', 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4' ]

import itertools

b = list(itertools.islice(a,0,8,2))
c = list(itertools.islice(a,1,8,2))

Much as I love itertools, this specific task would be best expressed as

b = a[::2]
c = a[1::2]

Do note that you really don't need the 'list(...)' here, for the
following use:
result1 = [x[1] for x in itertools.izip(b,c) if x[0] == '+']
result2 = [x[1] for x in itertools.izip(b,c) if x[0] == '-']

...would be just as good if b and c were islice objects rather than
lists, except for the issue of _repeating_ (izipping twice). I'd rather
do some variant of a single-loop such as:

results = {'+':[], '-':[]}
for operator, tag in itertools.izip(a[::2], a[1::2]):
    results[operator].append(tag)

and use results['+'] and results['-'] thereafter.

These approaches do not consider the inconvenient fact that the leading
'+' does in fact not appear in list a -- it needs to be assumed, the OP
stated; only a '-' would instead appear explicitly. Little for it but
special-casing depending on whether a[0]=='-', I think -- e.g. in the
above 3-line snippet of mine, insert right after the first line:

if a[0]!='-': results['+'].append(a.pop(0))


Alex
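Here is a Python 3 sketch of Alex's single loop with his special case placed right after the dict is created. The function name split_tags_special_case is hypothetical. It works on a copy because list.pop(0) mutates the list in place. Like the thread's version, it assumes an explicit leading operator can only be '-'.

```python
def split_tags_special_case(a):
    """Hypothetical Python 3 version of the single-loop approach."""
    a = list(a)  # copy: pop(0) below would otherwise mutate the caller's list
    results = {'+': [], '-': []}
    # Special case: if the list doesn't start with '-', the first
    # element is a tag carrying an implicit '+'.
    if a[0] != '-':
        results['+'].append(a.pop(0))
    # The remaining elements alternate operator, tag.
    for operator, tag in zip(a[::2], a[1::2]):
        results[operator].append(tag)
    return results

print(split_tags_special_case(['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']))
# {'+': ['tag1', 'tag2', 'tag3'], '-': ['tag4']}
print(split_tags_special_case(['-', 'tag1', '-', 'tag2']))
# {'+': [], '-': ['tag1', 'tag2']}
```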
 
Gerard Flanagan

James said:
[...]
Unfortunately this does not address the complete specification:
[...]
Need to check for the absence of that first op.

Yes, should have stuck to the spec.

Gerard
 
Gerard Flanagan

Alex said:
Gerard Flanagan said:
a = [ '+', 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4' ]

import itertools

b = list(itertools.islice(a,0,8,2))
c = list(itertools.islice(a,1,8,2))

Much as I love itertools, this specific task would be best expressed as

b = a[::2]
c = a[1::2]

Yes, I thought that when I saw Bruno's solution - I can't say that I've
never seen that syntax before, but I never really understood that this
is what it did.
Do note that you really don't need the 'list(...)' here, for the
following use:
result1 = [x[1] for x in itertools.izip(b,c) if x[0] == '+']
result2 = [x[1] for x in itertools.izip(b,c) if x[0] == '-']

...would be just as good if b and c were islice objects rather than
lists, except for the issue of _repeating_ (izipping twice).

I couldn't get it to work without the 'list(...)'; it seems you have
to 'rewind' the islice, e.g. this works:

b = itertools.islice(a,0,8,2)
c = itertools.islice(a,1,8,2)

result1 = [x[1] for x in itertools.izip(b,c) if x[0] == '+']

b = itertools.islice(a,0,8,2)
c = itertools.islice(a,1,8,2)

result2 = [x[1] for x in itertools.izip(b,c) if x[0] == '-']

but not without that 're-assignment' of b and c.
[...]

Cheers

Gerard
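The behaviour Gerard describes happens because islice returns a one-shot iterator. The first izip (zip in Python 3) consumes b and c, so a second pass over the same objects yields nothing. Re-creating the slices as Gerard did works, and so does materialising them with list(). Another option is itertools.tee, which splits one iterator into independent copies. A Python 3 sketch that reproduces the problem and shows the tee alternative:

```python
import itertools

a = ['+', 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4']

b = itertools.islice(a, 0, len(a), 2)  # operators (one-shot iterator)
c = itertools.islice(a, 1, len(a), 2)  # tags (one-shot iterator)

# The first pass consumes both iterators...
result1 = [tag for op, tag in zip(b, c) if op == '+']
# ...so a second pass over the same objects sees nothing.
result2 = [tag for op, tag in zip(b, c) if op == '-']

print(result1)  # ['tag1', 'tag2', 'tag4']
print(result2)  # []

# itertools.tee gives two independent iterators over one pairing.
pairs1, pairs2 = itertools.tee(
    zip(itertools.islice(a, 0, len(a), 2), itertools.islice(a, 1, len(a), 2)))
include = [tag for op, tag in pairs1 if op == '+']
exclude = [tag for op, tag in pairs2 if op == '-']

print(include)  # ['tag1', 'tag2', 'tag4']
print(exclude)  # ['tag3']
```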
 
Paul McGuire

Rob Cowie said:
[...]

Here's how this would look with pyparsing (download pyparsing at
http://pyparsing.sourceforge.net):


data = 'tag1+tag2+tag3-tag4'

from pyparsing import *
tag = Word(alphas,alphanums)
incl = Literal("+").suppress()
excl = Literal("-").suppress()

inclQual = Optional(incl) + tag
exclQual = excl + tag
qualDef = OneOrMore(
    inclQual.setResultsName("includes",listAllMatches=True ) |
    exclQual.setResultsName("excludes",listAllMatches=True ) )

quals = qualDef.parseString(data)
print quals.includes
print quals.excludes


Prints out:

[['tag1'], ['tag2'], ['tag3']]
[['tag4']]

-- Paul
 
Bruno Desthuilliers

Rob Cowie wrote:
[...]
Is there a better way that I have not considered?

If you're responsible for the original URL, you may consider rewriting
it this way:
scheme://domain.tld/resource?tag1=1&tag2=1&tag3=1&tag4=0

Otherwise - and after you've finished cursing whoever came up with such
an innovative way to use URL parameters - I think the first thing to do
would be to fix the implicit-first-operator mess, so that you have
something consistent:

if the_list[0] != "-":
    the_list.insert(0, "+")

Then a possible solution could be:

todo = {'+' : [], '-' : []}
for op, tag in zip(the_list[::2], the_list[1::2]):
    todo[op].append(tag)

But there's surely something better...
 
Michael Spencer

Rob said:
[...]

Since you're already using a regexp, why not modify it to group the operators
with their tags?

>>> import re
>>> source = "tag1+tag2+tag3-tag4"
>>> tagfinder = re.compile(r"([+-]?)(\w+)")
>>> include = []
>>> exclude = []
>>> for op, tag in tagfinder.findall(source):
...     if op == "-":
...         exclude.append(tag)
...     else:
...         include.append(tag)
...
>>> include
['tag1', 'tag2', 'tag3']
>>> exclude
['tag4']
>>>

(Example assumes that a tag can be matched by \w+ and that there
is no space between the operators and their tags)

Michael
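The operator group in Michael's pattern is optional and matches against the original string, so this approach needs no special case for the implicit leading '+'. A tag without an operator simply comes back with op == ''. The sketch below wraps his loop in a function and runs it on both of the OP's example strings. The function name classify_tags is hypothetical.

```python
import re

# Michael's pattern as a raw string: an optional operator, then a \w+ tag.
tagfinder = re.compile(r"([+-]?)(\w+)")

def classify_tags(source):
    """Hypothetical wrapper: return (include, exclude); no operator means '+'."""
    include, exclude = [], []
    # With two groups, findall yields one (op, tag) tuple per match.
    for op, tag in tagfinder.findall(source):
        if op == "-":
            exclude.append(tag)
        else:
            include.append(tag)
    return include, exclude

print(classify_tags("tag1+tag2+tag3-tag4"))  # (['tag1', 'tag2', 'tag3'], ['tag4'])
print(classify_tags("-tag1-tag2"))           # ([], ['tag1', 'tag2'])
```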
 
Bruno Desthuilliers

Gerard Flanagan wrote:
Alex said:
a = [ '+', 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4' ]

import itertools

b = list(itertools.islice(a,0,8,2))
c = list(itertools.islice(a,1,8,2))

Much as I love itertools, this specific task would be best expressed as

b = a[::2]
c = a[1::2]


Yes, I thought that when I saw bruno's solution - I can't say that I've
never seen that syntax before, but I never really understood that this
is what it did.

It's in fact pretty simple. The full slice syntax is [start:end:step],
with default values of start=0, end=len(seq), step=1. So a[::2] will
retrieve a[0], a[2], a[4] etc, and a[1::2] -> a[1], a[3], a[5] etc.

(snip)
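Bruno's defaults (start=0, end=len(seq)) hold for a positive step. With a negative step, the omitted bounds run from the end of the sequence instead. A quick check on Gerard's list, including the zip that pairs each operator with its tag:

```python
a = ['+', 'tag1', '+', 'tag2', '-', 'tag3', '+', 'tag4']

ops = a[::2]    # a[0], a[2], a[4], a[6]
tags = a[1::2]  # a[1], a[3], a[5], a[7]

print(ops)   # ['+', '+', '-', '+']
print(tags)  # ['tag1', 'tag2', 'tag3', 'tag4']

# Pairing the two slices lines each operator up with the tag after it.
print(list(zip(ops, tags)))
# [('+', 'tag1'), ('+', 'tag2'), ('-', 'tag3'), ('+', 'tag4')]
```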
 
Rob Cowie

Thanks everyone. I assumed there was something I had not considered...
list slicing is that thing.

The pyParsing example looks interesting - but for this case, a little
too heavy. It doesn't really warrant including a third-party module.

Rob C
 
