XML Not good for Big Files (vs Flat Files)

Kent Paul Dolan

Roedy said:
zipped is not suitable for handhelds. It is not suitable for routine
transport. If it were, HTML would be zipped too. It is too time
consuming and too cpu intensive.

Wrong, wrong, and wrong.

1) Unzip is a little bit ugly for memory footprint, around 100Kbytes,
but gzip/gunzip is half that:

-rwxr-xr-x 1 root root 53676 Jul 24 2004 /bin/gzip

Gzipped files are commonly used for transport, especially
for files that will be downloaded multiple times. Nearly
the same logic applies to a group of files that together
make up the data carried across some repeatedly used
interface, especially when the sending end has lots of
CPU cycles available.

2) The unarchiving process is _almost_ universally much
faster than the archiving process.

root@1[/]# time gzip root.tar

real 0m9.207s
user 0m8.729s
sys 0m0.305s
root@1[/]# time gunzip root.tar.gz

real 0m1.884s
user 0m1.512s
sys 0m0.320s

where I kitchen-sinked 72 megabytes of glop accumulated
in the root directory for test data.

This means that for the most common case, files
going _to_ a handheld device, gzipping is a very
effective and low overhead choice.
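
A rough way to check the asymmetry in point 2 on any machine is sketched below. It assumes Python's standard gzip module as a stand-in for the gzip/gunzip binaries, and the sample data is made up for illustration; actual timings will vary with the data and the hardware.

```python
import gzip
import time


def time_gzip_roundtrip(data: bytes, level: int = 6):
    """Return (compress_seconds, decompress_seconds, compressed_size)."""
    start = time.perf_counter()
    packed = gzip.compress(data, compresslevel=level)
    middle = time.perf_counter()
    unpacked = gzip.decompress(packed)
    end = time.perf_counter()
    if unpacked != data:
        raise ValueError("round trip did not reproduce the input")
    return middle - start, end - middle, len(packed)


# Hypothetical test data: repetitive XML-like text, not the 72 MB tar above.
sample = b"<person><name>Ann</name><phone>555-0100</phone></person>\n" * 100_000

pack_s, unpack_s, packed_size = time_gzip_roundtrip(sample)
print(f"original {len(sample)} bytes, compressed {packed_size} bytes")
print(f"compress {pack_s:.3f}s, decompress {unpack_s:.3f}s")
```

The decompress figure is the one that matters for the receiving handheld; the compress cost is paid once by the sender.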

3) I've downloaded many whole HTML web sites with
dozens to hundreds of pages; universally, the site
maintainers use some compression technology
and the files are transported as a compressed
archive.

xanthian.
 
Kent Paul Dolan

Roedy said:
1) Unzip is a little bit ugly for memory
footprint, around 100Kbytes, but gzip/gunzip is
half that: [53K]
and how much RAM does a cellphone have?

Though I'm not interested in cell phones, which I
consider slave fetters, apparently quite a bit. This
is from 2 years ago, the answer is probably larger
today. Note that gzip or equivalent can live in ROM,
only its data space, usually O(64Kbytes) needs to
live in RAM.

"Typically, cell-phone manufacturers now use 8 to 16
Mb of low-power SRAM for data backup; 32 to 128 Mb
of Pseudo-SRAM (PSRAM) for the working area of the
system; and 64 to 128 Mb of NOR Flash for bootable
code storage on basic phone programs. They also
employ 128 to 256 Mb or more additional memory, often
NAND Flash, for application software and the storage
of huge data, such as pictures and music."

http://www.wsdmag.com/Article/ArticleID/7916/7916.html

Notice that 128Mbytes of PSRAM is the RAM memory
size of many laptops 4 years ago.

HTH

xanthian.
 
Mark Thornton

Roedy said:
and how much RAM does a cellphone have?

Do you remember those modems we used to use on analog lines? Most of
those did compression. It isn't a big burden.

For XML you could even consider associating initial dictionaries with
schemas, so when expecting a file with a particular schema you could
preload the relevant dictionary. No benefit on the first occasion you
transfer a file for a given schema, but could help on subsequent transfers.
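
A minimal sketch of the preset-dictionary idea, assuming Python's zlib module and its zdict parameter. The dictionary contents and the "person" schema are hypothetical; in practice the dictionary would be built from the tag names and boilerplate of a real schema, and both ends would have to agree on it in advance.

```python
import zlib

# Hypothetical dictionary for a hypothetical "person" schema: just the
# markup that is expected to recur in every document of that type.
PERSON_DICT = (
    b'<?xml version="1.0"?><people><person><name></name>'
    b"<phone></phone></person></people>"
)


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate data with the shared dictionary preloaded."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inflate data; the receiver must supply the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


message = b"<people><person><name>Ann</name><phone>555-0100</phone></person></people>"
plain = zlib.compress(message, 9)
primed = compress_with_dict(message, PERSON_DICT)
print(len(message), len(plain), len(primed))
```

The gain is largest on short documents, where ordinary compression has not yet seen enough of the markup to build up useful back-references.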

Mark Thornton
 
Roedy Green

Do you remember those modems we used to use on analog lines? Most of
those did compression. It isn't a big burden.

That is not how cellphones talk to the internet anymore is it? Surely
there is some digital protocol. Does it do compression?
 
Andrew McDonagh

Roedy said:
That is not how cellphones talk to the internet anymore is it? Surely
there is some digital protocol. Does it do compression?

FTLOG Roedy. Modems were digital. The name is short for
modulator/demodulator. A modem converts digital signals into analogue for
transmission over an analogue transport medium (the phone line).

But even early modems used compression, because their bandwidth was so
small.

All mobile phones today use basically the same technology. The radio
waves between the handset and the base station are after all - analogue!

The encoding is purely digital, not the signal.

Andrew wanders off shaking his head....
 
Roedy Green

The encoding is purely digital, not the signal.

Andrew wanders off shaking his head....

Cell phones could work by taking the analog signal coming out the
modem, then redigitising it for packet sending to the cell phone
tower, in other words treating it exactly like voice, chewing up
bandwidth the entire time the carrier was on.

Or the data could be sent directly on the packet protocol used by the
cell phone network, some sort of PPP. That protocol might have built
in automatic compression or it might not. I would think you should get
much more payload through per packet that way.
 
Roedy Green

Andrew wanders off shaking his head....

What gets into you that makes you throw in patronising little barbs
like that? It does not add to your persuasiveness.

Perhaps you underestimate their irritating effect. Pretty well the
only way you can deal with them is to pretend not to notice, or lie
low and wait for a chance for revenge.
 
Andrew McDonagh

Roedy said:
Cell phones could work by taking the analog signal coming out the
modem, then redigitising it for packet sending to the cell phone
tower, in other words treating it exactly like voice, chewing up
bandwidth the entire time the carrier was on.

You lost me here - I have no idea what it is you are getting at.

The phone digitizes all signals - including Voice.

The only analogue signal coming out of the modem (or rather its ADC,
Analogue to Digital Converter) is sent to the transceiver for
broadcasting via microwave to the base stations.

All digital signals from the ADC are for use internally within the phone
or for encoding by the ADC for transmission to the network. Upon receipt
by the base stations, ADCs there convert the analogue signal back to its
digital encoding for routing around the network.

All this happens in pseudo-realtime to us humans, because in fact Time
Division Multiplexing is used. There is no actual continuous
transmission occurring. Phones don't transmit unless there is something
to transmit (be it voice, data or signaling - as in signal strength,
quality, handover requests, etc).

Or the data could be sent directly on the packet protocol used by the
cell phone network, some sort of PPP. That protocol might have built
in automatic compression or it might not. I would think you should get
much more payload through per packet that way.

It sort of is. The cell phones use Timeslots for transmission be it
voice or data. For data calls, special signaling from the cell phone
tells the network that the payload of each 'voice' message is actually a
Data message. Various compression algos are used to enhance capacity.

Google GSM Specs for specifics if interested.
 
Roedy Green

Cell phones could work by taking the analog signal coming out the
modem, then redigitising it for packet sending to the cell phone
tower, in other words treating it exactly like voice, chewing up
bandwidth the entire time the carrier was on.

Or the data could be sent directly on the packet protocol used by the
cell phone network, some sort of PPP. That protocol might have built
in automatic compression or it might not. I would think you should get
much more payload through per packet that way.

Let me try two other ways to describe this:

1. What if your cell phone provider were also your IAP? You would not
need a modem in your cell phone to create a "sound" to be
redigitised. The data stream could be interleaved digitally with the
digitized voice stream.

2. Think of a desktop with ADSL or cable connection. It does NOT work
by having a classical modem that creates "sound" that is digitised and
sent off in packets to an IAP. A cell phone should not need a similar
kludge if it has a digital wireless connection.
 
Filip Larsen

Roedy Green wrote
Or the data could be sent directly on the packet protocol used by the
cell phone network, some sort of PPP.

All the GSM data transfer protocols (CSD, HSCSD, and GPRS) use a
special codec on the air interface to achieve high bit rates. For EDGE
even the HF modulation form in the relevant time slot is altered to give
better bit rates. The GSM 06.10 voice codec is optimised for voice and
would only support a very low data bit rate if used for
"voice-modulated" data transfers.

It would be my guess that other similar digital cellular systems use
similar techniques to achieve high data bit rates.


Regards,
 
Oliver Wong

Jhair Tocancipa Triana said:
For decades you can achieve the same result in the example you state
using two files (one for the persons and other for the phone numbers)
and joining its contents (e.g. after loading them to a relational
database).

So XML offers nothing new in the scenario you describe...

To be fair, Joe Attardi's example wasn't meant to show something "new",
but rather to show XML providing something more than a guarantee that the
syntax is right. In this respect, I think Joe's example is successful (in
that it demonstrates hierarchical data in addition to syntax).
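
For illustration only, here is a sketch of the two representations being compared, assuming Python's standard csv and xml.etree.ElementTree modules. The persons and phone numbers are invented (the original example is not reproduced in this thread), and the column and element names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical flat files: one for persons, one for phone numbers.
PERSONS_CSV = "id,name\n1,Ann\n2,Bob\n"
PHONES_CSV = "person_id,number\n1,555-0100\n1,555-0101\n2,555-0199\n"

# The same hypothetical data with the hierarchy written into the document.
PEOPLE_XML = """<people>
  <person name="Ann"><phone>555-0100</phone><phone>555-0101</phone></person>
  <person name="Bob"><phone>555-0199</phone></person>
</people>"""


def phones_from_flat(persons_csv: str, phones_csv: str) -> dict:
    """Join the two flat files on the person id."""
    names = {r["id"]: r["name"] for r in csv.DictReader(io.StringIO(persons_csv))}
    result = {name: [] for name in names.values()}
    for row in csv.DictReader(io.StringIO(phones_csv)):
        result[names[row["person_id"]]].append(row["number"])
    return result


def phones_from_xml(xml_text: str) -> dict:
    """Read the nesting directly; no join key is needed."""
    root = ET.fromstring(xml_text)
    return {
        person.get("name"): [phone.text for phone in person.findall("phone")]
        for person in root.findall("person")
    }


print(phones_from_flat(PERSONS_CSV, PHONES_CSV) == phones_from_xml(PEOPLE_XML))
```

Both routes end at the same structure, which is Jhair's point; the difference is whether the relationship is carried by a key the reader must know about or by the nesting itself.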

- Oliver
 
Andrew McDonagh

Roedy said:
Let me try two other ways to describe this:

1. what if your cell phone provider were also your IAP? You would not
need a modem in your cell phone to create a "sound" to be
redigitised. The data stream could be interleaved digitally with the
digitized voice stream.

Irrelevant unless you can change physics.

All radio communication is analogue. The information it carries
(encodes) may or may not be digital, but the physical transport is analogue.

2. Think of a desktop with ADSL or cable connection. It does NOT work
by having a classical modem that creates "sound" that is digitised and
sent off in packets to an IAP. A cell phone should not need similar
kludge if it has a digital wireless connection.


Well....actually... ADSL does work by sending 'sounds'. It uses
advanced DSPs to exploit the frequencies outside of the standard
telephone range to get that extra capacity. These sounds encode ATM
(Asynchronous Transfer Mode) signals, which is a transport protocol. ATM
itself has a payload that can carry either voice or data. Unlike IP(v4),
ATM was designed with QoS parameterization to aid the network operators
(AT&T, Bell, BT, etc) in tuning their networks for different loads and
customers.
 
Andrew McDonagh

Andrew said:
Irrelevant unless you can change physics.

All radio communication is analogue. The information it carries
(encodes) may or may not be digital, but the physical transport is
analogue.



Well....actually... ADSL does work by sending 'sounds'. It uses
advanced DSPs to exploit the frequencies outside of the standard
telephone range to get that extra capacity. These sounds encode ATM
(Asynchronous Transfer Mode) signals, which is a transport protocol. ATM
itself has a payload that can carry either voice or data. Unlike IP(v4),
ATM was designed with QoS parameterization to aid the network operators
(AT&T, Bell, BT, etc) in tuning their networks for different loads and
customers.

http://www.vocal.com/adslmodems.html
 
Steve Wampler

Oliver said:
To be fair, Joe Attardi's example wasn't meant to show something
"new", but rather to show XML providing something more than a guarantee
that the syntax is right. In this respect, I think Joe's example is
successful (in that it demonstrates hierarchical data in addition to syntax).

Eh? (again) Are you really claiming that you cannot syntactically represent
hierarchical data? Please explain how context-free grammars represent
arithmetic expressions if hierarchy isn't syntax.
 
Oliver Wong

Steve Wampler said:
Eh? (again)

Whether the "syntax is right" and whether the data is hierarchical are two
orthogonal concepts, IMHO. I should have said "in addition to a guarantee of
correct syntax" instead of just "in addition to syntax".

Are you really claiming that you cannot syntactically represent
hierarchical data?

No.

Please explain how context-free grammars represent
arithmetic expressions if hierarchy isn't syntax.

Isn't syntax simply the list of allowable keywords and their parameters?
I don't think syntax in itself is sufficient to represent hierarchy. You
need something like grammatical rules that can reference each other.

E.g., this, syntax, is not enough:

'(', ')', '+', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'

You also need this, a grammar:

EXP -> INT | INT OP INT | '(' EXP ')'
INT -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
OP -> '+' | '-'
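
To make the distinction concrete, here is a small sketch in Python (my choice of language, not anything from the thread) that checks a string two ways: against the bare symbol list, and against the three grammar rules above. The function names are made up for illustration.

```python
SYMBOLS = set("()+-0123456789")  # the bare list of allowed characters
DIGITS = set("0123456789")       # INT -> '0' | ... | '9'
OPS = set("+-")                  # OP  -> '+' | '-'


def uses_only_symbols(text: str) -> bool:
    """True if every character is in the symbol list (no structure checked)."""
    return all(ch in SYMBOLS for ch in text)


def parse_exp(text: str, pos: int = 0) -> int:
    """Consume one EXP starting at pos; return the position just after it."""
    if pos < len(text) and text[pos] == "(":
        # EXP -> '(' EXP ')': the rule refers to itself, giving the nesting.
        pos = parse_exp(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ValueError("expected ')'")
        return pos + 1
    if pos < len(text) and text[pos] in DIGITS:
        pos += 1
        # EXP -> INT OP INT, taken only when an OP and an INT both follow.
        if pos + 1 < len(text) and text[pos] in OPS and text[pos + 1] in DIGITS:
            pos += 2
        return pos
    raise ValueError("expected '(' or a digit")


def is_exp(text: str) -> bool:
    """True if the whole string is derivable from EXP."""
    try:
        return parse_exp(text) == len(text)
    except ValueError:
        return False


for candidate in ["1", "1+2", "((3))", ")1(", "1+"]:
    print(repr(candidate), uses_only_symbols(candidate), is_exp(candidate))
```

A string such as ")1(" passes the first check and fails the second, which is the gap between the symbol list and the rules.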

- Oliver
 
Steve Wampler

Oliver said:
Isn't syntax simply the list of allowable keywords and their
parameters? I don't think syntax in itself is sufficient to represent
hierarchy. You need something like grammatical rules that can reference
each other.

E.g., this, syntax, is not enough:

'(', ')', '+', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'

You also need this, a grammar:

EXP -> INT | INT OP INT | '(' EXP ')'
INT -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
OP -> '+' | '-'

No. Syntax *is* grammar. You're mixing lexics and syntax. Semantics
is the meaning attached to a syntax. (Lexics is one aspect of syntax,
corresponding to the leaf nodes in the grammar.)
 
Andrew McDonagh

Roedy said:
Cut it out. That is not the issue and you know it. You are just
being obtuse.

Eh?

I was just pointing out that sound is used, after you'd said
'...It does NOT work by having a classical modem that creates "sound"
that is digitised and sent off in packets to an IAP.'

What's obtuse about that?
 
