Ordered list question

Guest

I'm currently working on a project where I'm looping through xml elements,
pulling the 'id' attribute (which will be coerced to a number) as well as the
element tag. I'm needing these elements in numerical order (from the id).
Example xml might look like:

<price id="5">
<copyright id="1">
<address id="3">

There will be cases where elements might be excluded, but I'd still need to
put what I find in id numerical order. In the above example I would need
the order of 1, 3, 5 (or copyright, address, price). In javascript I can easily
index an array, and any preceding elements that don't exist will be set to
'undefined':

-----
var a = [];

a[parseInt('5')] = 'price';
a[parseInt('1')] = 'copyright';
a[parseInt('3')] = 'address';

// a is now [undefined, copyright, undefined, address, undefined, price]
-----

Next, I can loop through the array and remove every 'undefined' in order to
get the ordered array I need:

-----
var newA = [];
for (var x = 0; x < a.length; x++) {
  if (a[x] != undefined) {
    newA.push(a[x]);
  }
}

// newA is now [copyright, address, price]
-----

My question is, does python have a similar way to do something like this?
I'm assuming the best way is to create a dictionary and then sort it by
the keys?

Thanks.

Jay
 
Thomas 'PointedEars' Lahn

(e-mail address removed) wrote:
^^^^^^^^^^^^^^^^^^
Something is missing there.
I'm currently working on a project where I'm looping through xml elements,
pulling the 'id' attribute (which will be coerced to a number)

No, usually it won't.
as well as the element tag.

That's element _type name_.
I'm needing these elements in numerical order (from the id).

Attribute values of type ID MUST NOT start with a decimal digit in XML [1].
Example xml might look like:

<price id="5">
<copyright id="1">
<address id="3">

That is not even well-formed, as the end tags of the `address', `copyright',
and `price' elements (in that order) are missing. Well-formed XML would be
either

<foo>
<price id="5"/>
<copyright id="1"/>
<address id="3"/>
</foo>

or

<foo>
<price id="5">
<copyright id="1"/>
</price>
<address id="3"/>
</foo>

or

<foo>
<price id="5"/>
<copyright id="1">
<address id="3"/>
</copyright>
</foo>

or

<price id="5">
<copyright id="1"/>
<address id="3"/>
</price>

or

<price id="5">
<copyright id="1">
<address id="3"/>
</copyright>
</price>

but neither might be Valid (or make sense). Check your DTD or XML Schema.
There will be cases where elements might be excluded, but I'd still need
to put what I find in id numerical order. In the above example I would
need the order of 1, 3, 5 (or copyright, address, price). In javascript I
can easily index an array, and any preceding elements that don't exist
will be set to 'undefined':

-----
var a = [];

a[parseInt('5')] = 'price';
a[parseInt('1')] = 'copyright';
a[parseInt('3')] = 'address';

// a is now [undefined, copyright, undefined, address, undefined, price]
-----

This is nonsense even in "javascript" (there really is no such language
[2]). In ECMAScript implementations like JavaScript you would write

var a = [];
a[5] = "price";
a[1] = "copyright";
a[3] = "address";

as array indexes are only special object properties, and properties are
stored as strings anyway. However, the highest index you can store this
way, in the sense that it increases the `length' of the array, would be
2³²−2 (as the value of the `length' property ranges from 0 to 2³²−1).

Python's `list' type is roughly equivalent to ECMAScript's `Array' type.
Important differences include that apparently you cannot store as many items
in a Python list as in an ECMAScript Array –
....
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError

[Kids, don't try this at home!]
....
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: range() result has too many items

–, and that you need to add enough items in order to access one (so there
are no sparse lists):
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

(I was not aware of that.) Also, the access parameter must be integer:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str

[Using a non-numeric or out-of-range parameter (see above) for bracket
property access on an ECMAScript Array means that the number of elements in
the array does not increase, but that the Array instance is augmented with a
non-element property, or an existing non-element property is overwritten;
this cannot happen with Python lists.]
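
For anyone who wants to reproduce the two list errors, here is a minimal sketch for Python 3 (the wording of the TypeError message differs slightly from the Python 2 one quoted above). The names `items`, `found`, `slots` and `compact` are made up for illustration. The second half imitates the JavaScript approach from the question by pre-filling a list with None placeholders, which is only one possible workaround.

```python
items = []

# Assigning past the end of a list raises IndexError: lists are never sparse.
try:
    items[5] = "price"
except IndexError as exc:
    index_error = str(exc)

# A string index is rejected with TypeError, even if it looks numeric.
try:
    items["5"] = "price"
except TypeError as exc:
    type_error_name = type(exc).__name__

# Imitating the JavaScript version: allocate placeholders up to the highest id,
# fill in what was found, then drop the placeholders that were never filled.
found = [(5, "price"), (1, "copyright"), (3, "address")]
slots = [None] * (max(index for index, _ in found) + 1)
for index, name in found:
    slots[index] = name

compact = [name for name in slots if name is not None]
print(compact)  # ['copyright', 'address', 'price']
```

Note that the placeholder list grows with the largest id, so this is wasteful when ids are large and few.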
Next, I can loop through the array and remove every 'undefined' in order
to get the ordered array I need:

Or you could be using an ECMAScript Object instance in the first place, and
iterate over its enumerable properties. This would even work with proper
IDs, but if you are not careful (and your statements about the language
suggest that you might not be), you might need further precautions to keep
user-defined enumerable properties inherited from Object.prototype from
showing up:

var o = {
  5: "price",
  1: "copyright",
  3: "address"
};

or programmatically:

var o = {};
o[5] = "price";
o[1] = "copyright";
o[3] = "address";

Then:

for (var prop in o)
{
  /* get prop or o[prop] */
}
-----
var newA = [];
for (var x = 0; x < a.length; x++) {

Unless a.length changes:

for (var x = 0, len = a.length; x < len; ++x) {

The variable name `x' should also be reserved for non-counters, e.g. object
references. Use i, j, k, and so forth in good programming tradition here
instead.
if (a[x] != undefined) {

if (typeof a[x] != "undefined") {

as your variant would also evaluate to `false' if a[x] was `null', and would
throw an exception in older implementations at no advantage (however, you
might want to consider using `a[x] !== undefined' for recent implementations
only).
newA.push(a[x]);
}
}

// newA is now [copyright, address, price]

Or you would be using Array.prototype.push() (or a[a.length] = …) in the
first place instead of this, as contrary to what you stated above you appear
to be only interested in the element type names:

var a = [];
a.push("price");
a.push("copyright");
a.push("address");
-----

My question is, does python have a similar way to do something like this?
I'm assuming the best way is to create a dictionary and then sort it by
the keys?

As with ECMAScript objects, you should not rely on a Python dictionary to
keep its name-value pairs sorted by key: historically its order was
arbitrary, and since Python 3.7 it is insertion order. You would be using
Python's `list' type, and its append() method (foo.append(bar)) or
concatenate two lists instead (foo += [bar]). Then you would sort the list
(see below). (You could also use a dictionary object, call its keys()
method, and then sort the return value. It depends on your use case.)
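
A minimal sketch of the dictionary variant, assuming the ids are purely numeric strings as in the question. The `"10"`/`"footer"` pair is a made-up extra entry, added only to show why the ids are converted with int() before sorting (as strings, "10" would sort before "3").

```python
# (id attribute value, element type name) pairs in document order.
# The ("10", "footer") entry is hypothetical and not part of the question.
pairs = [("5", "price"), ("1", "copyright"), ("10", "footer"), ("3", "address")]

# Key the dictionary by the integer id so that sorting is numeric.
by_id = {int(raw_id): name for raw_id, name in pairs}

# sorted() over a dict iterates its keys; look each name up in key order.
ordered_names = [by_id[key] for key in sorted(by_id)]
print(ordered_names)  # ['copyright', 'address', 'price', 'footer']
```

Sorting the keys explicitly makes the result independent of the order in which the dictionary was filled.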

I am getting the idea here that you intend to apply string parsing on XML.
However, when working with XML you should instead be using an XML parser to
get a document object, then XPath on the document object to retrieve the
`id' attribute values of the elements that have an `id' attribute
('//*[@id]/@id') or the elements themselves ('//*[@id]'), in which case you
would use XPathResult::*ORDERED_NODE_SNAPSHOT_TYPE to apply the
snapshotItem() method to generate a list, and then you would probably simply
say mylist.sort() [but see below]. Different XPath APIs for Python might
also present the result as a list already without you having to call the
snapshotItem() method. You should look into the libxml2 module and lxml.
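
As a rough illustration of the parser-based approach, here is a sketch that uses the standard library's xml.etree.ElementTree instead of libxml2 or lxml. That is a swapped-in choice made only so the example runs without extra packages; ElementTree supports just a subset of XPath, and it returns a plain list, so there is no snapshotItem() step. The sketch assumes the well-formed `foo` wrapper from earlier in this post and purely numeric `id` values as in the question.

```python
import xml.etree.ElementTree as ET

# Assumed input: the first well-formed variant shown earlier in this post.
XML_SOURCE = """<foo>
  <price id="5"/>
  <copyright id="1"/>
  <address id="3"/>
</foo>"""

root = ET.fromstring(XML_SOURCE)

# './/*[@id]' selects every descendant element that has an id attribute.
elements = root.findall(".//*[@id]")

# Sort numerically by the id attribute value (a string until converted).
elements.sort(key=lambda element: int(element.get("id")))

type_names = [element.tag for element in elements]
print(type_names)  # ['copyright', 'address', 'price']
```

With lxml the same idea should carry over through its fuller XPath support, but check its documentation for the exact calls.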

If you are instead interested in finding out, e.g., the element type for a
specific ID, without using XPath again, then you should build and sort a
list of dictionaries –

a = [{"id": "5", "type": "price"},
     {"id": "1", "type": "copyright"},
     {"id": "3", "type": "address"}]

or, programmatically

a = []
a.append({"id": "5", "type": "price"})
a.append({"id": "1", "type": "copyright"})
a.append({"id": "3", "type": "address"})

– which is BTW syntactically exactly the approach that you would use in an
ECMAScript implementation (except for the trailing semicolon that should not
be missing in ECMAScript). The Python solution [3] –

a.sort(cmp=lambda x,y: cmp(x['id'], y['id']))

or (since Python 2.4)

a.sort(key=lambda x: x['id'])

or (since Python 2.4)

sorted(a, cmp=lambda x, y: cmp(x['id'], y['id']))

or (since Python 2.4)

sorted(a, key=lambda x: x['id'])

– only differs from the ECMAScript-based one in the way that the lambda
expression for the comparator is written [4]:

a.sort(function(x, y) {
  var x_id = x.id,
      y_id = y.id;
  return ((x_id < y_id) ? -1 : ((x_id == y_id) ? 0 : 1));
});

(The local variables should improve the efficiency in the worst case. You
may omit some parentheses there at your discretion.)

A difference between sorted() and the other Python ways is that the former
returns a sorted list but leaves the original list as it is. (ECMAScript
does not provide a built-in method to sort an array of objects by the
object's property values, nor does it provide a built-in one that sorts an
array or array-like object not-in-place. But such is easily implemented.)
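
For readers on Python 3, where the `cmp` argument and the built-in cmp() no longer exist, the sorting step might look like the sketch below. It is an illustration under assumptions, not a replacement for the calls above: the helper names `numeric_id` and `compare_ids` are made up, and the ids are converted with int() on the assumption that they are purely numeric, so that they compare as numbers rather than as strings.

```python
from functools import cmp_to_key

records = [
    {"id": "5", "type": "price"},
    {"id": "1", "type": "copyright"},
    {"id": "3", "type": "address"},
]


def numeric_id(record):
    """Sort key: the id attribute value as an integer."""
    return int(record["id"])


# In-place sort on a copy, so that `records` stays untouched for comparison.
in_place = list(records)
in_place.sort(key=numeric_id)

# sorted() returns a new list and leaves its argument as it is.
copy_sorted = sorted(records, key=numeric_id)


def compare_ids(x, y):
    """Old-style comparator returning a negative, zero or positive number."""
    x_id, y_id = int(x["id"]), int(y["id"])
    return (x_id > y_id) - (x_id < y_id)


# A comparator function can still be used by wrapping it with cmp_to_key().
via_cmp = sorted(records, key=cmp_to_key(compare_ids))

print([record["type"] for record in copy_sorted])
# ['copyright', 'address', 'price']
```

The key-function form is usually the simpler choice; cmp_to_key() is mainly useful when an existing comparator has to be reused.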


HTH
__________
[1] <http://www.w3.org/TR/xml/#id>
[2] <http://PointedEars.de/es-matrix>
[3] <http://wiki.python.org/moin/HowTo/Sorting/>
[4] <https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/array/sort>
 
