Parsing stream of JSON objects incrementally


Evan Driscoll

I'm interested in writing two programs, A and B, which communicate using
JSON. At a high level, A wants to transfer an array to B.

However, I would very much like to make it possible for A and B to run
in parallel, so my current plan is to have A output and B read a
*sequence* of JSON objects. In other words, instead of
[ {"a": 0}, {"b":0}, {"c": 0} ]
it would just send
{"a": 0}
{"b": 0}
{"c": 0}

I know about the raw_decode() method of the json.JSONDecoder class,
and that gets me most of the way there.
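
For reference, a short sketch of how raw_decode() on a json.JSONDecoder instance behaves when the string holds more than one object. It returns the decoded value together with the index where that value ended, so the remainder can be decoded by a further call. The sample strings are made up for illustration.

```python
import json

decoder = json.JSONDecoder()
text = '{"a": 0} {"b": 0}'

# raw_decode() returns (value, end): the decoded object and the index
# just past it in the input string.
first, end = decoder.raw_decode(text)

# raw_decode() does not skip leading whitespace itself, so strip the
# remainder before decoding the next object.
rest = text[end:].lstrip()
second, _ = decoder.raw_decode(rest)

# Incomplete input raises ValueError, which a reader can treat as
# "need more data".
try:
    decoder.raw_decode('{"c": ')
    incomplete = False
except ValueError:
    incomplete = True
```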

However, what I'm *not* sure about is the best way to get the input to
the raw_decode() function, which expects a "string or buffer":
Traceback (most recent call last):
...
File "json\scanner.py", line 42, in iterscan
match = self.scanner.scanner(string, idx).match
TypeError: expected string or buffer

Now I'm not very familiar with the buffer and how it could be used (and
whether a file or stdin could be used as one in an incremental fashion),
but the best way I can come up with is the following:

1. Read a line of input
2. Try to decode it
3. If that fails, read another line, append it to the buffer, and try again
4. etc.

That seems... inelegant at least.


Some other information:

* I'm looking for a 2.7 solution ideally
* I'd prefer not to use a different JSON library entirely
* As suggested, I *am* willing to wait for a newline to do processing
* However, I don't want to require exactly one object per line (and
  want to allow both multiple objects on one line and newlines within
  an object)

Evan


 

Miki Tebeka

You probably need to accumulate a buffer and try to decode it; when decoding succeeds, yield the object and keep reading. See the example below (note that for sockets, select might be a better option for reading data).

import json
import sys


def json_decoder(fo):
    """Yield JSON objects from file object fo as each one becomes complete."""
    buff = ''
    decode = json.JSONDecoder().raw_decode
    while True:
        line = fo.readline()
        if not line:
            break  # EOF
        # raw_decode() does not skip leading whitespace, so strip it here.
        buff = (buff + line).lstrip()
        # Drain every complete object in the buffer. This handles several
        # objects on one line as well as objects spanning several lines.
        while buff:
            try:
                obj, i = decode(buff)
            except ValueError:
                break  # incomplete object: read another line and retry
            buff = buff[i:].lstrip()
            yield obj


def main():
    for obj in json_decoder(sys.stdin):
        print(obj)


if __name__ == '__main__':
    main()
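
For data that does not arrive line by line (the socket case mentioned above), the same idea can be packaged as a small push-style buffer. This is only a sketch: JSONStreamBuffer and feed are hypothetical names, not part of the json module, and it assumes the chunks passed in are already decoded text.

```python
import json


class JSONStreamBuffer(object):
    """Collect text chunks and hand back every complete JSON object."""

    def __init__(self):
        self._buff = ''
        self._decode = json.JSONDecoder().raw_decode

    def feed(self, chunk):
        """Add chunk to the buffer; return the list of objects now complete."""
        # raw_decode() does not skip leading whitespace, so strip it first.
        self._buff = (self._buff + chunk).lstrip()
        objs = []
        while self._buff:
            try:
                obj, end = self._decode(self._buff)
            except ValueError:
                break  # incomplete (or invalid) data: wait for more input
            objs.append(obj)
            self._buff = self._buff[end:].lstrip()
        return objs


stream = JSONStreamBuffer()
first = stream.feed('{"a": 0} {"b"')   # one full object, one partial
second = stream.feed(': 0}\n{"c":\n')  # completes "b", starts "c"
third = stream.feed(' 0}\n')           # completes "c"
```

One caveat: a top-level number split across two chunks can be decoded too early, so this approach fits streams of objects or arrays, as in the question. Invalid data is also indistinguishable from incomplete data here, so a real reader may want a size limit on the buffer.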
 
 
